Classroom Assessment
In short, we can say that assessment entails much more than testing. It is an ongoing
process that includes many formal and informal activities designed to monitor and
improve teaching and learning.
Evaluation: According to Kizlik (2011), evaluation is the most complex and least
understood of these terms. Hopkins and Antes (1990) defined evaluation as a continuous
inspection of all available information in order to form a valid judgment of students'
learning and/or the effectiveness of an education program.
The central idea in evaluation is "value." When we evaluate a variable, we are basically
judging its worthiness, appropriateness and goodness. Evaluation is always done against a
standard, an objective or a criterion. In the teaching-learning process, teachers make
evaluations of students that are usually done in the context of comparisons between what
was intended (learning, progress, behaviour) and what was obtained.
Activity 1.1: Distinguish among measurement, assessment and evaluation with the
help of relevant examples.
Types of Assessment
"As coach and facilitator, the teacher uses formative assessment to help support and
enhance student learning. As judge and jury, the teacher makes summative judgments
about a student's achievement..."
The National Center for Fair and Open Testing (1999). The Value of Formative
Assessment. http://www.fairtest.org/examarts/winter99/k-forma3.html
Assessment for learning has many unique characteristics. For example, this type of
assessment is treated as "practice": learners should not be graded on skills and concepts
that have just been introduced; they should be given opportunities to practice. Formative
assessment helps teachers determine the next steps during the learning process as
instruction approaches the summative assessment of student learning. A good analogy for
this is the road test required to receive a driver's license. Before the final driving test, or
summative assessment, a learner practices by being assessed again and again so that
deficiencies in the skill can be pointed out.
Another distinctive characteristic of formative assessment is student involvement. If
students are not involved in the assessment process, formative assessment is not practiced
or implemented to its full effectiveness. One of the key components of engaging students
in the assessment of their own learning is providing them with descriptive feedback as
they learn. In fact, research shows descriptive feedback to be the most significant instructional
strategy to move students forward in their learning. Descriptive feedback provides students with
an understanding of what they are doing well. It also gives input on how to reach the next step in
the learning process.
The role of assessment for learning in the instructional process can best be understood
with the help of the following diagram.
Source: http://www.stemresources.com/index.php?option=com_content&view=article&id=52&Itemid=70
Garrison and Ehringhaus (2007) identified some of the instructional strategies that can be
used for formative assessment:
Observations. Observing students' behaviour and work can help the teacher identify
whether students are on task or need clarification. Observations assist teachers in
gathering evidence of student learning to inform instructional planning.
Questioning strategies. Asking better questions allows an opportunity for deeper
thinking and provides teachers with significant insight into the degree and depth of
understanding. Questions of this nature engage students in classroom dialogue that
both uncovers and expands learning.
Self and peer assessment. When students have been involved in criteria and goal
setting, self-evaluation is a logical step in the learning process. With peer evaluation,
students see each other as resources for understanding and checking for quality work
against previously established criteria.
Student record keeping. Keeping records also helps teachers assess beyond a "grade," to
see where the learner started and the progress being made towards the learning goals.
b) Assessment of Learning (Summative Assessment)
Summative assessment, or assessment of learning, is used to evaluate students'
achievement at some point in time, generally at the end of a course. The purpose of this
assessment is to help the teacher, students and parents know how well the student has
completed the learning task. In other words, summative evaluation is used to assign a
grade to a student, which indicates his/her level of achievement in the course or program.
Assessment of learning is basically designed to provide useful information about the
performance of the learners rather than immediate and direct feedback to teachers and
learners; therefore it usually has little direct effect on learning. High-quality summative
information can, however, help guide teachers in organizing their courses and deciding
their teaching strategies, and educational programs can be modified on the basis of the
information it generates.
Many experts believe that all forms of assessment have some formative element. The
difference only lies in the nature and the purpose for which assessment is being
conducted.
Comparing Assessment for Learning and Assessment of Learning
Assessment for Learning: Checks how students are learning and whether there is any
problem in the learning process; it determines what to do next.
Assessment of Learning: Checks what has been learned to date.
Assessment for Learning: Usually uses detailed, specific and descriptive feedback, in a
formal or informal report.
Assessment of Learning: Usually uses numbers, scores or marks as part of a formal
report.
4. Assessment requires attention to outcomes but also and equally to the experiences
that lead to those outcomes.
Information about outcomes is of high importance; where students "end up" matters
greatly. But to improve outcomes, we need to know about student experience along the
way -- about the curricula, teaching, and kind of student effort that lead to particular
outcomes. Assessment can help us understand which students learn best under what
conditions; with such knowledge comes the capacity to improve the whole of their
learning.
Role of Assessment
"Teaching and learning are reciprocal processes that depend on and affect one another.
Thus, the assessment component deals with how well the students are learning and how
well the teacher is teaching" (Kellough & Kellough, 1999).
Assessment does more than allocate a grade or degree classification to students – it plays
an important role in focusing their attention and, as Sainsbury & Walker (2007) observe,
actually drives their learning. Gibbs (2003) states that assessment has 6 main functions:
1. Capturing student time and attention
2. Generating appropriate student learning activity
3. Providing timely feedback which students pay attention to
4. Helping students to internalize the discipline's standards and notions of quality
5. Generating marks or grades which distinguish between students
6. Providing evidence for others outside the course to enable them to judge the
appropriateness of standards on the course.
Surgenor (2010) summarized the role of assessment in learning in the following points:
It fulfills student expectations
It is used to motivate students
It provides opportunities to remedy mistakes
It indicates readiness for progression
Assessment serves as a diagnostic tool
Assessment enables grading and degree classification
Assessment works as a performance indicator for students
It is used as a performance indicator for teachers
Assessment is also a performance indicator for the institution
Assessment facilitates learning in one way or the other.
Activity 1.3: List the different roles of formative and summative assessment in the
teaching-learning process.
Summative Evaluation:
Testing is done at the end of the instructional unit. The test score is seen as the
summation of all knowledge learned during a particular subject unit.
(a) Formative Evaluation:
Testing occurs constantly with learning so that teachers can evaluate the
effectiveness of teaching methods along with the assessment of students' abilities.
(ii) Advantages of Achievement Tests:
One of the main advantages of testing is that it can provide assessments
that are psychometrically valid and reliable, as well as results that are
generalizable and replicable.
Another advantage is aggregation. A well-designed test provides an
assessment of an individual's mastery of a domain of knowledge or skill
which at some level of aggregation will provide useful information. That is,
while individual assessments may not be accurate enough for practical
purposes, the mean scores of classes, schools, branches of a company, or
other groups may well provide useful information because of the reduction of
error accomplished by increasing the sample size.
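The error-reduction point above can be illustrated with a short simulation; the true score, error spread and class size below are invented purely for illustration:

```python
import random
import statistics

def observed_score(true_score, error_sd, rng):
    """One student's test score: true ability plus random measurement error."""
    return true_score + rng.gauss(0, error_sd)

rng = random.Random(42)
true_score, error_sd = 70, 10

# A single assessment can miss the true score by a wide margin...
single = observed_score(true_score, error_sd, rng)

# ...but the mean over a class of 100 students who all share the same
# true score has its error shrunk by roughly sqrt(100) = 10.
class_mean = statistics.mean(
    observed_score(true_score, error_sd, rng) for _ in range(100)
)

print(round(single, 1), round(class_mean, 1))
```

The class mean lands much closer to the true score than any single observation is guaranteed to, which is exactly the aggregation effect the paragraph describes.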
Activity 3.1: Prepare an achievement test on content to be taught in any subject,
following the steps of test construction, and discuss it with your course mates.
Aptitude Tests
Aptitude tests assume that individuals have inherent strengths and weaknesses, and are
naturally inclined toward success or failure in certain areas based on their inherent
characteristics.
Aptitude tests determine a person's ability to learn a given set of information. They do not
test a person's knowledge of existing information. The best way to prepare for aptitude
tests is to take practice tests.
Aptitude and ability tests are designed to assess logical reasoning or thinking
performance. They consist of multiple choice questions and are administered under exam
conditions. They are strictly timed and a typical test might allow 30 minutes for 30 or so
questions. Your test result will be compared to that of a norm (control) group so that
judgments can be made about your abilities.
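Such norm-group comparisons are usually reported as percentiles. A minimal sketch, assuming roughly normally distributed scores; the norm-group mean and standard deviation below are invented:

```python
import math

def percentile_from_norms(raw, norm_mean, norm_sd):
    """Convert a raw score to a percentile against a norm group,
    assuming scores are approximately normally distributed."""
    z = (raw - norm_mean) / norm_sd
    # Standard normal CDF expressed via the error function.
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Illustrative figures: mean 20, standard deviation 4 on a 30-item test.
print(round(percentile_from_norms(24, 20, 4)))  # one SD above the mean -> 84
```

A candidate one standard deviation above the norm-group mean outperforms about 84% of that group, which is the kind of judgment the comparison supports.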
You may be asked to answer the questions either on paper or online. The advantages of
online testing include immediate availability of results and the fact that the test can be
taken at employment agency premises or even at home. This makes online testing
particularly suitable for initial screening as it is obviously very cost-effective.
(a) Instructional
Teachers can use aptitude test results to adapt their curricula to match the level of their
students, or to design assignments for students who differ widely. Aptitude test scores
can also help teachers form realistic expectations of students. Knowing something about
the aptitude level of students in a given class can help a teacher identify which students
are not learning as much as could be predicted on the basis of aptitude scores. For
instance, if a whole class were performing less well than would be predicted from
aptitude test results, then curriculum, objectives, teaching methods, or student
characteristics might be investigated.
(b) Administrative
Aptitude test scores can identify the general aptitude level of a high school, for example.
This can be helpful in determining how much emphasis should be given to college
preparatory programs. Aptitude tests can be used to help identify students to be
accelerated or given extra attention, for grouping, and in predicting job training
performance.
(c) Guidance
Guidance counselors use aptitude tests to help parents develop realistic expectations for
their child's school performance and to help students understand their own strengths and
weaknesses.
Activity 3.2: Discuss with your course mates their aptitudes towards the teaching
profession and analyze their opinions.
Attitude
Attitude is a posture, action or disposition of a figure or a statue. It has also been defined
as a mental and neural state of readiness, organized through experience, exerting a
directive or dynamic influence upon the individual's response to all objects and situations
with which it is related.
Attitude is the state of mind with which you approach a task, a challenge, a person, love,
life in general. Attitude has been defined as "a complex mental state involving beliefs and
feelings and values and dispositions to act in certain ways". These beliefs and feelings
differ because various people interpret the same events differently, and these differences
arise from the inherited characteristics mentioned earlier.
(i) Components of Attitude
1. Cognitive Component:
This refers to that part of attitude which relates to the general knowledge or beliefs of
a person; for example, the belief that smoking is injurious to health. Such an idea held
by a person is called the cognitive component of attitude.
2. Affective Component:
This part of attitude is related to feelings that affect another person. For
example, in an organization a personnel report is given to the general manager. In
the report it is pointed out that the sales staff are not performing their due
responsibilities. The general manager forwards a written notice to the marketing
manager to negotiate with the sales staff.
3. Behavioral Component:
The behavioral component refers to that part of attitude which reflects the
intention of a person in the short or long run. For example, before the production
and launch of a product, a report is prepared by the production department which
sets out its intentions for the near future and the long run, and this report is
handed over to top management for decision.
Intelligence Tests
Intelligence involves the ability to think, solve problems, analyze situations, and
understand social values, customs, and norms. Two main forms of intelligence are
involved in most intelligence assessments:
Verbal Intelligence is the ability to comprehend and solve language-based problems;
and
Nonverbal Intelligence is the ability to understand and solve visual and spatial
problems.
Intelligence is sometimes referred to as intelligence quotient (IQ), cognitive functioning,
intellectual ability, aptitude, thinking skills and general ability.
Intelligence tests are psychological tests that are designed to measure a variety of
mental functions, such as reasoning, comprehension, and judgment.
An intelligence test is often defined as a measure of general mental ability. Of the
standardized intelligence tests, those developed by David Wechsler are among the most
widely used. Wechsler defined intelligence as “the global capacity to act purposefully, to
think rationally, and to deal effectively with the environment.” While psychologists
generally agree with this definition, they don't agree on the operational definition of
intelligence (that is, a statement of the procedures to be used to precisely define the
variable to be measured) or how to accomplish its measurement.
The goal of intelligence tests is to obtain an idea of the person's intellectual potential. The
tests center around a set of stimuli designed to yield a score based on the test maker's
model of what makes up intelligence. Intelligence tests are often given as a part of a
battery of tests.
(ii)Advantages
In general, intelligence tests measure a wide variety of human behaviours better than any
other measure that has been developed. They allow professionals to have a uniform way
of comparing a person's performance with that of other people who are similar in age.
These tests also provide information on cultural and biological differences among people.
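Wechsler-style tests express this same-age comparison as a deviation IQ on a scale with mean 100 and standard deviation 15. A rough sketch of the conversion; the raw-score norms below are invented:

```python
def deviation_iq(raw, age_group_mean, age_group_sd):
    """Wechsler-style deviation IQ: place a raw score on a scale with
    mean 100 and standard deviation 15, relative to same-age peers."""
    z = (raw - age_group_mean) / age_group_sd
    return 100 + 15 * z

# Illustrative raw-score norms for one age band (assumed figures):
print(deviation_iq(55, 50, 10))  # half an SD above the mean -> 107.5
```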
Intelligence tests are excellent predictors of academic achievement and provide an outline
of a person's mental strengths and weaknesses. Many times the scores have revealed
talents in many people, which have led to an improvement in their educational
opportunities. Teachers, parents, and psychologists are able to devise individual curricula
that match a person's level of development and expectations.
(iii) Disadvantages
Some researchers argue that intelligence tests have serious shortcomings. For example,
many intelligence tests produce a single intelligence score. This single score is often
inadequate for explaining the multidimensional nature of intelligence.
Another problem with a single score is the fact that individuals with similar intelligence
test scores can vary greatly in their expression of these talents. It is important to know the
person's performance on the various subtests that make up the overall intelligence test
score. Knowing the performance on these various scales can influence the understanding
of a person's abilities and how these abilities are expressed. For example, two people
have identical scores on intelligence tests. Although both people have the same test score,
one person may have obtained the score because of strong verbal skills while the other
may have obtained the score because of strong skills in perceiving and organizing various
tasks.
Furthermore, intelligence tests only measure a sample of behaviors or situations in which
intelligent behavior is revealed. For instance, some intelligence tests do not measure a
person's everyday functioning, social knowledge, mechanical skills, and/or creativity.
Along with this, the formats of many intelligence tests do not capture the complexity and
immediacy of real-life situations. Therefore, intelligence tests have been criticized for
their limited ability to predict non-test or nonacademic intellectual abilities. Since
intelligence test scores can be influenced by a variety of different experiences and
behaviors, they should not be considered a perfect indicator of a person's intellectual
potential.
Activity 3.4:
Discuss intelligence testing with your course mates, identify the methods used to
measure intelligence, and make a list of problems in measuring intelligence.
Personality Tests
Your personality is what makes you who you are. It is the organized set of unique traits
and characteristics that makes you different from every other person in the world. Not
only does your personality make you special, it makes you, you!
“The particular pattern of behavior and thinking that prevails across
time and contexts, and differentiates one person from another.”
In the classroom, personality assessment can help to:
Increase productivity
Help students get along better with classmates
Help students realize their full potential
Identify teaching strategies suited to students
Help students appreciate other personality types.
The stem may be stated as a direct question or as an incomplete statement. For example:
Direct question
Which is the capital city of Pakistan?------------------------(Stem)
A. Paris. --------------------------------------- (Distracter)
B. Lisbon. -------------------------------------- (Distracter)
C. Islamabad. ---------------------------------- (Key)
D. Rome. --------------------------------------- (Distracter)
Multiple choice questions are composed of one question with multiple possible answers
(options), including the correct answer and several incorrect answers (distracters). Typically,
students select the correct answer by circling the associated number or letter, or filling in the
associated circle on the machine-readable response sheet. Students can generally respond to
these types of questions quite quickly. As a result, they are often used to test students'
knowledge of a broad range of content. Creating these questions can be time consuming
because it is often difficult to generate several plausible distracters. However, they can be
marked very quickly.
8. Avoid Distracters in the Form of "All the answers are correct" or "None of the
Answers is Correct"!
Teachers use these statements most frequently when they run out of ideas for distracters.
Students, knowing what is behind such questions, are rarely misled by them. Therefore, if
you do use such statements, sometimes use them as the key answer. Furthermore, if a
student recognizes that there are two correct answers (out of 5 options), they will be able
to conclude that the key answer is the statement "all the answers are correct", without
knowing the accuracy of the other distracters.
Advantages
Versatility
Multiple-choice test items are appropriate for use in many different subject-matter areas,
and can be used to measure a great variety of educational objectives. They are adaptable
to various levels of learning outcomes, from simple recall of knowledge to more complex
levels, such as the student’s ability to:
• Analyze phenomena
• Apply principles to new situations
• Comprehend concepts and principles
• Discriminate between fact and opinion
• Interpret cause-and-effect relationships
• Interpret charts and graphs
• Judge the relevance of information
• Make inferences from given data
• Solve problems
The difficulty of multiple-choice items can be controlled by changing the alternatives,
since the more homogeneous the alternatives, the finer the distinction the students must
make in order to identify the correct answer. Multiple-choice items are amenable to item
analysis, which enables the teacher to improve the item by replacing distracters that are
not functioning properly. In addition, the distracters chosen by the student may be used to
diagnose misconceptions of the student or weaknesses in the teacher’s instruction.
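Item analysis of this kind can start with a simple tally of how often each option was chosen; a distracter nobody picks is doing no work and is a candidate for replacement. A sketch with hypothetical responses:

```python
from collections import Counter

def distractor_analysis(responses, key, options="ABCD"):
    """Tally how many students chose each option on one item and flag
    distracters that attracted no one (candidates for replacement)."""
    counts = Counter(responses)
    unused = [opt for opt in options if opt != key and counts[opt] == 0]
    return counts, unused

# Hypothetical answer sheet for one item whose key is 'C'.
responses = ["C", "A", "C", "C", "B", "C", "A", "C"]
counts, unused = distractor_analysis(responses, key="C")
print(counts["C"], unused)  # 5 answered correctly; 'D' drew no responses
```

Looking at which wrong options attract the most students is also how the misconception diagnosis mentioned above is done in practice.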
Validity
In general, it takes much longer to respond to an essay test question than it does to
respond to a multiple-choice test item, since the composing and recording of an essay
answer is such a slow process. A student is therefore able to answer many multiple-choice
items in the time it would take to answer a single essay question. This feature enables
the teacher using multiple-choice items to test a broader sample of course contents in a
given amount of testing time. Consequently, the test scores will likely be more
representative of the students’ overall achievement in the course.
Reliability
Well-written multiple-choice test items compare favourably with other test item types on
the issue of reliability. They are less susceptible to guessing than are true-false test items,
and therefore capable of producing more reliable scores. Their scoring is more clear-cut
than short answer test item scoring because there are no misspelled or partial answers to
deal with. Since multiple-choice items are objectively scored, they are not affected by
scorer inconsistencies as are essay questions, and they are essentially immune to the
influence of bluffing and writing ability factors, both of which can lower the reliability of
essay test scores.
Efficiency
Multiple-choice items are amenable to rapid scoring, which is often done by scoring
machines. This expedites the reporting of test results to the student so that any follow-up
clarification of instruction may be done before the course has proceeded much further.
Essay questions, on the other hand, must be graded manually, one at a time. Overall
multiple choice tests are:
Very effective
Versatile at all levels
Minimum of writing for student
Guessing reduced
Can cover broad range of content
Disadvantages
Versatility
Since the student selects a response from a list of alternatives rather than supplying or
constructing a response, multiple-choice test items are not adaptable to measuring certain
learning outcomes, such as the student’s ability to:
• Articulate explanations
• Display thought processes
• Furnish information
• Organize personal thoughts.
Perform a specific task
• Produce original ideas
• Provide examples
Reliability
Although they are less susceptible to guessing than are true-false test items, multiple-
choice items are still affected by it to a certain extent. This guessing factor somewhat
reduces the reliability of multiple-choice item scores, but increasing the number of items
on the test offsets this reduction in reliability.
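A classical way to discount guessing when scoring is the correction-for-guessing formula, score = R - W/(k-1), where R is the number right, W the number wrong, and k the number of options per item. A small sketch (the student's tallies below are invented):

```python
def correction_for_guessing(right, wrong, options):
    """Classical correction-for-guessing (formula scoring): R - W/(k-1).
    Omitted items count as neither right nor wrong."""
    return right - wrong / (options - 1)

# A student attempts 40 four-option items: 28 right, 12 wrong.
print(correction_for_guessing(28, 12, 4))  # 28 - 12/3 = 24.0
```

The subtraction removes, on average, the score a blind guesser would accumulate from wrong attempts.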
Difficulty of Construction
Good multiple-choice test items are generally more difficult and time-consuming to
write than other types of test items. Coming up with plausible distracters requires a
certain amount of skill. This skill, however, may be increased through study, practice,
and experience.
Gronlund (1995) writes that multiple-choice items are difficult to construct. Suitable
distracters are often hard to come by, and the teacher is tempted to fill the void with a
"junk" response. This has the effect of narrowing the range of options available to the
test-wise student. They are also exceedingly time-consuming to fashion, one hour per
question being by no means the exception. Finally, multiple-choice items generally take
students longer to complete (especially items requiring fine discrimination) than do other
types of objective question.
Difficult to construct good test items.
Difficult to come up with plausible distracters/alternative responses.
Activity 4.1: Construct two items of direct question and two items of incomplete
statement while following the rules of multiple items.
True/False Questions
A True-False test item requires the student to determine whether a statement is true or
false. The chief disadvantage of this type is the opportunity for successful guessing.
According to Gronlund (1995), an alternative-response test item consists of a declarative
statement that the pupil is asked to mark true or false, right or wrong, correct or
incorrect, yes or no, fact or opinion, agree or disagree, and the like. In each case there
are only two possible answers. Because the true-false option is the most common, this
type is usually referred to as the true-false type. Students make a judgment about the
validity of the statement. It is also known as a "binary-choice" item because there are
only two options to select from. These items are most effective for assessing knowledge,
comprehension, and application outcomes as defined in the cognitive domain of Bloom's
Taxonomy of educational objectives.
Example
Directions: Circle the correct response to the following statements.
1. Allama Iqbal is the founder of Pakistan. T/F
2. Democracy system is for the people. T/F
3. Quaid-e-Azam was the first Prime Minister of Pakistan. T/F
Good for:
Knowledge level content
Evaluating student understanding of popular misconceptions
Concepts with two logical responses
Advantages:
Easily assess verbal knowledge
Each item contains only two possible answers
Easy to construct for the teacher
Easy to score for the examiner
Helpful for poor students
Can test large amounts of content
Students can answer 3-4 questions per minute
Disadvantages:
It is difficult to discriminate between students who know the material and
students who don't.
Students have a 50-50 chance of getting the right answer by guessing.
A large number of items is needed for high reliability.
Fifty percent guessing factor.
They assess lower-order thinking skills.
They are a poor representation of students' learning achievement.
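The 50-50 guessing factor can be quantified: the chance of reaching a given score by blind guessing follows a binomial distribution, and it shrinks as the number of items grows, which is why many items are needed for reliable true-false scores. A sketch (the test lengths and 60% threshold are illustrative):

```python
from math import comb

def p_pass_by_guessing(items, passing):
    """Probability of scoring at least `passing` correct on a true-false
    test by blind guessing (each item an independent 50-50 coin flip)."""
    return sum(comb(items, k) for k in range(passing, items + 1)) / 2 ** items

# On a 10-item test, chance of 6 or more correct by guessing alone:
p10 = p_pass_by_guessing(10, 6)   # about 0.377
# On a 50-item test the same 60% threshold is far harder to reach:
p50 = p_pass_by_guessing(50, 30)
print(round(p10, 3), round(p50, 3))
```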
Activity 4.2: Enlist five items by indicating them T/F (True & False)
Matching items
According to Cunningham (1998), the matching items consist of two parallel columns.
The column on the left contains the questions to be answered, termed premises; the
column on the right, the answers, termed responses. The student is asked to associate
each premise with a response to form a matching pair.
Matching test items are used to test a student's ability to recognize relationships and to
make associations between terms, parts, words, phrases, clauses, or symbols in one
column with related alternatives in another column. When using this form of test item, it
is a good practice to provide alternatives in the response column that are used more than
once, or not at all, to preclude guessing by elimination. Matching test items may have
either an equal or unequal number of selections in each column.
Matching-Equal Columns. When using this form, providing for some items in the
response column to be used more than once, or not at all, can preclude guessing by
elimination.
Good for:
Knowledge level
Some comprehension level, if appropriately constructed
Types:
Terms with definitions
Phrases with other phrases
Causes with effects
Parts with larger units
Problems with solutions
Advantages:
The chief advantage of matching exercises is that a good deal of factual information can
be tested in minimal time, making the tests compact and efficient. They are especially
well suited to who, what, when and where types of subject matter. Further students
frequently find the tests fun to take because they have puzzle qualities to them.
Maximum coverage at knowledge level in a minimum amount of space/prep time
Valuable in content areas that have a lot of facts
Disadvantages:
The principal difficulty with matching exercises is that teachers often find the subject
matter insufficient in quantity or not well suited for matching terms. An exercise
should be confined to homogeneous items containing one type of subject matter (for
instance, authors-novels; inventions-inventors; major events-dates; terms-definitions;
rules-examples; and the like). Where unlike clusters of questions are used, even an adept
but poorly informed student can often recognize the ill-fitting items by their irrelevant
and extraneous nature (for instance, the inclusion of the names of capital cities in a list
of authors).
The student identifies connected items from two lists. Matching is useful for assessing
the ability to discriminate, categorize, and make associations among similar concepts.
Time consuming for students
Not good for higher levels of learning
Activity 4.3: Keeping in view the nature of matching items, construct at least five
items of matching case about any topic.
Completion Items
Like true-false items, completion items are relatively easy to write. Perhaps the first tests
classroom teachers’ construct and students take completion tests. Like items of all other
formats, though, there are good and poor completion items. Student fills in one or more
blanks in a statement. These are also known as “Gap-Fillers.” Most effective for
assessing knowledge and comprehension learning outcomes but can be written for higher
level outcomes. e.g.
The capital city of Pakistan is-----------------.
I. Word the statement such that the blank is near the end of the sentence rather than
near the beginning. This will prevent awkward sentences.
II. If the problem requires a numerical answer, indicate the units in which it is to be
expressed.
Short Answer
The student supplies a response to a question that might consist of a single word or
phrase. Short-answer items are most effective for assessing knowledge and
comprehension learning outcomes but can be written for higher-level outcomes.
Short-answer items are of two types.
Simple direct questions
Who was the first president of Pakistan?
Completion items
Advantages:
Easy to construct
Good for "who," what," where," "when" content
Minimizes guessing
Encourages more intensive study-student must know the answer vs. recognizing
the answer.
Gronlund (1995) writes that short-answer items have a number of advantages.
They reduce the likelihood that a student will guess the correct answer
They are relatively easy for a teacher to construct.
They are well adapted to mathematics, the sciences, and foreign languages, where
specific types of knowledge are tested (The formula for ordinary table salt is
______).
They are consistent with the Socratic question and answer format frequently
employed in the elementary grades in teaching basic skills.
Disadvantages:
May overemphasize memorization of facts
Take care - questions may have more than one correct answer
Scoring is laborious
According to Gronlund (1995), there are also a number of disadvantages with short-
answer items.
They are limited to content areas in which a student’s knowledge can be
adequately portrayed by one or two words.
They are more difficult to score than other types of objective-item tests since
students invariably come up with unanticipated answers that are totally or
partially correct.
Short answer items usually provide little opportunity for students to synthesize,
evaluate and apply information.
Essay Type Items
Example 1:
List the major similarities and differences in the lives of people living in Islamabad and
Faisalabad.
Example 2:
Compare advantages and disadvantages of lecture teaching method and demonstration
teaching method.
Example 3:
Identify as many different ways to generate electricity in Pakistan as you can. Give
advantages and disadvantages of each. Your response will be graded on its accuracy,
comprehensiveness and practicality.
Good for:
Application, synthesis and evaluation levels
Types:
Extended response: synthesis and evaluation levels; a lot of freedom in answers
Restricted response: more consistent scoring, outlines parameters of responses
Advantages:
Students less likely to guess
Easy to construct
Stimulates more study
Allows students to demonstrate ability to organize knowledge, express opinions,
show originality.
Disadvantages:
Can limit amount of material tested, therefore has decreased validity.
Subjective, potentially unreliable scoring.
Time consuming to score.
Activity 4.6: Develop an essay type test on this unit while covering the levels of
knowledge, application and analysis.
Self Assessment Questions:
1. In an area in which you are teaching or plan to teach, identify several learning
outcomes that can best be measured with objective and subjective type
questions.
2. Critically examine the different types of selection and supply items. In your opinion,
which type is more appropriate for measuring the achievement level of elementary
students?
3. What factors should be considered in deciding whether subjective or objective type
questions should be included in a classroom test?
4. Compare the functions of selection and supply types items.
References/Suggested Readings
Airasian, P. (1994). Classroom Assessment (2nd ed.). New York, NY: McGraw-Hill.
American Psychological Association. (1985). Standards for Educational and
Psychological Testing. Washington, DC: American Psychological Association.
Anastasi, A. (1988). Psychological Testing (6th ed.). New York, NY: MacMillan
Publishing Company.
Cangelosi, J. (1990). Designing Tests for Evaluating Student Achievement. New York,
NY: Addison-Wesley.
Cunningham, G. K. (1998). Assessment in the Classroom. Bristol, PA: Falmer Press.
Gronlund, N. (1993). How to Make Achievement Tests and Assessments (5th ed.). New
York, NY: Allyn and Bacon.
Gronlund, N. E., & Linn, R. L. (1995). Measurement and Assessment in Teaching. New
Delhi: Baba Barkha Nath Printers.
Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice
item-writing rules. Applied Measurement in Education, 2(1), 51-78.
Monahan, T. (1998). The Rise of Standardized Educational Testing in the U.S.: A
Bibliographic Overview.
Ravitch, D. (1985). The uses and misuses of tests. In The Schools We Deserve (pp.
172-181). New York, NY: Basic Books.
Thissen, D., & Wainer, H. (2001). Test Scoring. Mahwah, NJ: Erlbaum.
Ward, A. W., & Murray-Ward, M. (1999). Assessment in the Classroom. Belmont, CA:
Wadsworth Publishing Co.
Wilson, N. (1997). Educational standards and the problem of error. Education Policy
Analysis Archives, 6(10).
UNIT: 03
TEST CONSTRUCTION
Purpose of a Test
Assessment of a student in class is inevitable because it is an integral part of the teaching-
learning process. On one hand assessment provides information to design or redesign
instruction, and on the other hand it promotes learning. Teachers use different techniques
and procedures to assess their students, e.g., tests, observations, questionnaires, interviews,
rating scales, discussion, etc. A teacher develops, administers, and marks academic
achievement tests and other types of tests in order to measure the ability of a student in a
subject or to measure behaviour in class or in school. What are these tests? Does a teacher
really need to know what a test is? Yes, it is very important. The teaching-learning
process remains incomplete if a teacher does not know how well her class is doing
and to what extent her teaching is effective in terms of achievement of predefined
objectives. There are many technical terms associated with assessment. Before we
go any further, it would be beneficial to first define what a test is.
What is a Test?
A test is a device which is used to measure the behaviour of a person for a specific purpose.
Moreover, it is an instrument that typically uses sets of items designed to measure a
domain of learning tasks. Tests are a systematic method of collecting information that
leads to inferences about the characteristics of people or objects. A teacher must
understand that an educational test is a measuring device and therefore involves rules
(administering, scoring) for assigning numbers that will be used to describe the
performance of an individual. You should also keep in mind that it is not possible for a
teacher to test all the subject matter of a course that has been taught to the class in a
semester or in a year. Therefore, a teacher prepares tests by sampling items from a
pool of items in such a way that the test represents the whole subject matter. The teacher
must also understand that the whole content, with the many topics and concepts taught
within a semester or a year, cannot be tested in one or two hours. In simple words, a
test should assess content areas in accordance with the relative importance the teacher has
assigned to them. It is commonly believed that a test means a simple paper-and-pencil
test, but nowadays other testing procedures have been developed and are practiced in
many schools.
Tests are of many types, but they can be placed into two main categories. These are:
(i) Subjective type tests
(ii) Objective type tests
At the elementary level students do not have much proficiency in writing long essay-type
answers to a question; therefore, objective type tests are preferred. Objective type tests are
also called selective-response tests. In this type of test the responses to an item are
provided and the students are required to choose the correct response. The objective types of
tests that are used at the elementary level are:
(i) Multiple choice
(ii) Multiple binary-choice
(iii) Matching items
You will study the development process of each of these item types in the next units. In this
unit you have been given just an idea of what a test means for a teacher. After going
through this discussion you should be able to explain why it is important for a teacher to
know about classroom tests and what purposes they serve. The job of a teacher is to teach
and to test for the following:
Purposes of test:
You have learned that a test is a simple device which measures the achievement level of a
student in a particular subject and grade. Therefore we can say that a test is used to serve
the following purposes:
5. Evaluating Instruction
Students' performance on tests helps teachers to evaluate their own
instructional effectiveness, i.e., to know how effective their teaching has been. Suppose a
teacher teaches a topic for two weeks and, after completing the topic, gives a
test. If the scores obtained by students show that they learned the skills and knowledge
they were expected to learn, the instruction was effective. But if the obtained scores are
poor, the teacher must decide whether to retain, alter or totally discard the current
instructional activities.
Activity-2.1: Visit some schools of your area and perform the following:
The Credit Common Accord for Wales defines learning outcomes as:
Statements of what a learner can be expected to know, understand and/or do as a result of
a learning experience. (QCA /LSC, 2004, p. 12)
Activity-2.4 Differentiate between learning Objective and Outcome with the help of
relevant examples
4. SOLO Taxonomy
The SOLO taxonomy stands for:
Structure of
Observed
Learning
Outcomes
The SOLO taxonomy was developed by Biggs and Collis (1982) and further explained
by Biggs and Tang (2007). This taxonomy is used in Punjab for assessment.
It describes levels of increasing complexity in a student's understanding of a subject
through five stages, and it is claimed to be applicable to any subject area. Not all students
get through all five stages, of course, and indeed not all teaching.
4 Relational level: the student is now able to appreciate the significance of the
parts in relation to the whole.
5 Extended abstract level: the student is making connections not only
within the given subject area but also beyond it, and is able to generalise and
transfer the principles and ideas underlying the specific instance.
SOLO taxonomy
http://www.learningandteaching.info/learning/solo.htm#ixzz1nwXTmNn9
[Figure-2.7 Inadequate representativeness: test items vs. content taught]
In figures 2.5 to 2.9 the shaded area represents the test items which cover the content of
the subject matter, whereas the un-shaded area is the subject matter (learning domain)
which the teacher has taught in the class in the subject of social studies.
Figures 2.5-2.8 show poor or inadequate representativeness of the content by the test
items. For example, in figure 2.5 the test covers only a small portion (shaded area) of the
taught content domain; the rest of the items do not coincide with the taught domain. In
figure 2.6 most of the test items/questions have been taken from a specific part of the
taught domain; therefore, the representation of the taught content domain is inadequate,
even though the test items have been taken from the same content domain. The content of
the test items in figure 2.7 gives a very poor picture of the test: none of the parts of the
taught domain have been assessed, so the test shows zero representativeness.
This implies that the content from which the test items are to be taken should be well
defined and structured. Without setting the boundary of the knowledge, behaviour, or skills
to be measured, the test development task becomes difficult and complex, and as a result
the assessment will produce unreliable results. Therefore a good test represents the taught
content to the maximum extent: a test which is representative of the entire content
domain is a good test. It is thus imperative for a teacher to prepare an outline of the
content that will be covered during instruction. The next step is the selection of subject
matter and the design of instructional activities. All these steps are guided by the
objectives; one must consider the objectives of the unit before selecting the content
domain and subsequently designing a test. It is clear from the above discussion that the
outline of the test content should be based on the following principles:
1. Purpose of the test (diagnostic, classification, placement, or job employment)
2. Representative sampling of the knowledge, behaviour, or skill domain being measured
3. Relevancy of the topics to the content of the subject
4. Language of the content appropriate to the age and grade level of the students
5. Development of a table of specifications
A test which meets the criteria stated in these principles will provide reliable and valid
information for correct decisions regarding the individual. Now, keeping these principles
in view, carry out the following activity.
Activity-2.5:
Visit an elementary school of your area and collect question papers/tests of the sixth
class in any subject developed by the school teachers. Now perform the following:
(1)
a. How many items are related with the content?
b. How many items (what percentage) are not related with the content covered for
the testing period?
c. Is the test representative of the entire content domain?
d. Does the test fulfill the criteria of test construction? Explain.
(2) Share your results electronically with your classmates, and get their opinion on
the clarification of the concepts discussed in unit-2.
Planning a Test
The main objective of classroom assessment is to obtain valid, reliable and useful data
regarding student learning achievement. This requires determining what is to be
measured and then defining it precisely so that assessment tasks to measure the desired
performance can be developed. Classroom tests and assessments can be used for the
following instructional objectives:
i. Pre-testing
Tests and assessments can be given at the beginning of an instructional unit or course to
determine:
whether the students have the prerequisite skills needed for the
instruction (readiness, motivation, etc.)
to what extent the students have already achieved the objectives of the
planned instruction (to determine placement or modification of
instruction)
ii. During the Instruction Testing
provides bases for formative assessment
monitor learning progress
detect learning errors
provide feedback for students and teachers
iii. End of Instruction Testing
measure intended learning outcomes
used for summative assessment
provide the basis for grades, promotion, etc.
Prior to developing an effective test, one needs to determine whether or not a test is the
appropriate type of assessment. If the learning objectives are primarily of the
procedural knowledge type (how to perform a task), then a written test may not be the best
approach. Assessment of procedural knowledge generally calls for a performance
demonstration assessed using a rubric. Where demonstration of a procedure is not
appropriate, a test can be an effective assessment tool.
The first stage of developing a test is planning the test content and length. Planning the
test begins with development of a blueprint or test specifications for the test structured on
the learning outcomes or instructional objectives to be assessed by the test instrument.
For each learning outcome, a weight should be assigned based on the relative importance
of that outcome in the test. The weight will be used to determine the number of items
related to each of the learning outcomes.
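As a rough sketch of this weighting step, the following Python snippet converts relative outcome weights into item counts for a fixed test length. The outcome names and weights are illustrative assumptions, not taken from the text.

```python
# Sketch: allocate test items to learning outcomes in proportion to
# their assigned weights (illustrative names and weights).
def items_per_outcome(weights, total_items):
    """Return the number of items for each outcome, proportional to its weight."""
    total_weight = sum(weights.values())
    return {outcome: round(total_items * w / total_weight)
            for outcome, w in weights.items()}

# A 30-item test weighted 3:2:1 across three outcome levels.
blueprint = items_per_outcome(
    {"knowledge": 3, "comprehension": 2, "application": 1}, 30)
print(blueprint)  # {'knowledge': 15, 'comprehension': 10, 'application': 5}
```

Note that rounding can make the counts sum to slightly more or less than the intended total, so the final blueprint may need a small manual adjustment.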
Test Specifications
When an engineer prepares a design to construct a building and chooses the materials he
intends to use in construction, he usually knows what the building is going to be used for,
and therefore designs it to meet the requirements of its planned inhabitants. Similarly, in
testing, the table of specifications is the blueprint of the assessment, which specifies the
percentages and weightage of test items and the constructs being measured. It includes the
constructs and concepts to be measured, the tentative weightage of each construct, the
number of items for each concept, and a description of the item types to be constructed. It
is not surprising that specifications are also referred to as 'blueprints', for they are literally
architectural drawings for test construction. Fulcher & Davidson (2009) divided test
specifications into the following four elements:
Item specifications: Item specifications describe the items, prompts or tasks, and
any other material such as texts, diagrams, and charts which are used as stimuli.
Typically, a specification at this sub-level contains two key elements: samples of
the tasks to be produced, and guiding language that details all information
necessary to produce the task.
Presentation Model: Presentation model provides information about how the items
and tasks are presented to the test takers.
Assembly Model: Assembly model helps the test developer to combine test
items and tasks to develop a test format.
Delivery Model: Delivery model tells how the actual test is delivered. It
includes information regarding test administration, test security/confidentiality
and time constraints.
According to W. Wiersma and S.G. Jurs (1990) in some matching exercises the number
of premises and responses are the same, termed a balanced or perfect matching exercise.
In others, the numbers of premises and responses may be different.
Advantages
The chief advantage of matching exercises is that a good deal of factual information can
be tested in minimal time, making the tests compact and efficient. They are especially
well suited to "who," "what," "when" and "where" types of subject matter. Further,
students frequently find the tests fun to take because they have puzzle-like qualities.
Disadvantages
The principal difficulty with matching exercises is that teachers often find that the subject
matter is insufficient in quantity or not well suited for matching terms. An exercise
should be confined to homogeneous items containing one type of subject matter (for
instance, authors-novels; inventions-inventors; major events-dates; terms-definitions;
rules-examples and the like). Where unlike clusters of questions are used, the adept but
poorly informed student can often recognize the ill-fitting items by their irrelevant and
extraneous nature (for instance, in a list of authors the inclusion of the names of capital
The student identifies connected items from two lists. This is useful for assessing the
ability to discriminate, categorize, and associate among similar concepts.
Direct question
Which is the capital city of Pakistan? -------- (Stem)
A. Lahore. -------------------------------------- (Distracter)
B. Karachi. ------------------------------------- (Distracter)
C. Islamabad. ----------------------------------- (Answer)
D. Peshawar. ------------------------------------ (Distracter)
Incomplete Statement
The capital city of Pakistan is
A. Lahore.
B. Karachi.
C. Islamabad.
D. Peshawar.
EXAMPLES:
Memory Only Example (Less Effective)
6. Be Grammatically Correct
Use simple, precise and unambiguous wording
Otherwise, students may be able to select the correct answer simply by
finding the option that is grammatically consistent with the stem
9. Use Only One Correct Option (Or be sure the best option is clearly the best
option)
The item should include one and only one correct or clearly best answer
With one correct answer, alternatives should be mutually exclusive and
not overlapping
Using MC with questions containing more than one right answer lowers
discrimination between students
11.Use Only a Single, Clearly-Defined Problem and Include the Main Idea in the
Question
Students must know what the problem is without having to read the
response options
14.Don’t Use MCQ When Other Item Types Are More Appropriate
e.g., when plausible distracters are limited or when assessing problem-solving and creativity
Advantages
The chief advantage of the multiple-choice question according to N. E. Gronlund (1990)
is its versatility. For instance, it is capable of being applied to a wide range of subject
areas. In contrast to short-answer items, which limit the writer to content that can be
stated in one or two words, and matching items, which are necessarily bound to
homogeneous clusters of one type of subject matter, the multiple-choice item is not so
restricted. A multiple-choice question also greatly reduces the opportunity for a student
to guess the correct answer, from one chance in two with a true-false item to one in four
or five, thereby increasing the reliability of the test. Further, since a multiple-choice item
contains plausible incorrect or less correct alternatives, it permits the test constructor to
fine-tune the discriminations (the degree of homogeneity of the responses) it makes.
Disadvantages
N. E. Gronlund (1990) writes that multiple-choice items are difficult to construct. Suitable
distracters are often hard to come by, and the teacher is tempted to fill the void with a
"junk" response, which has the effect of narrowing the range of options available to the
test-wise student. Multiple-choice items are also exceedingly time consuming to fashion,
one hour per question being by no means the exception. Finally, they generally take
students longer to complete (especially items containing fine discriminations) than do
other types of objective questions.
B. Short Answer
The student supplies a response to a question that might consist of a single word or phrase.
Short answer items are most effective for assessing knowledge and comprehension learning
outcomes but can be written for higher-level outcomes. Short answer items are of two types.
Simple direct questions
Who was the first president of Pakistan?
Completion items
Advantages
Norman E. Gronlund (1990) writes that short-answer items have a number of advantages.
They reduce the likelihood that a student will guess the correct answer
They are relatively easy for a teacher to construct.
They are well adapted to mathematics, the sciences, and foreign languages, where
specific types of knowledge are tested (The formula for ordinary table salt is
______).
They are consistent with the Socratic question and answer format frequently
employed in the elementary grades in teaching basic skills.
Disadvantages
According to Norman E. Gronlund (1990) there are also a number of disadvantages with
short-answer items.
They are limited to content areas in which a student’s knowledge can be
adequately portrayed by one or two words.
They are more difficult to score than other types of objective-item tests since
students invariably come up with unanticipated answers that are totally or
partially correct.
Short answer items usually provide little opportunity for students to synthesize,
evaluate and apply information.
All three types of score interpretation are useful, depending on the purpose for which
comparisons are made.
An absolute score merely describes a measure of performance or achievement without
comparing it with any set or specified standard. Scores are not particularly useful without
any kind of comparison. Criterion-referenced scores compare test performance with a
specific standard; such a comparison enables the test interpreter to decide whether the
scores are satisfactory according to established standards. Norm-referenced tests compare
test performance with that of others who were measured by the same procedure. Teachers
are usually more interested in knowing how children compare with a useful standard than
how they compare with other children; but norm-referenced comparisons may also
provide useful insights.
For example, a score at the 60th percentile means that the individual's score is the same
as or higher than the scores of 60% of those who took the test. The 50th percentile is
known as the median and represents the middle score of the distribution.
Percentiles have the disadvantage that they are not equal units of measurement. For
instance, a difference of 5 percentile points between two individual’s scores will have a
different meaning depending on its position on the percentile scale, as the scale tends to
exaggerate differences near the mean and collapse differences at the extremes.
Percentiles cannot be averaged nor treated in any other way mathematically. However,
they do have the advantage of being easily understood and can be very useful when
giving feedback to candidates or reporting results to managers.
If you know your percentile score then you know how it compares with others in the
norm group. For example, if you scored at the 70th percentile, then this means that you
scored the same or better than 70% of the individuals in the norm group.
Percentile scores are easily understood even when scores tend to bunch up around the
average of the group, i.e., when most of the students are of similar ability and their
scores fall within a very small range.
To illustrate this point, consider a typical subject test consisting of 50 questions. Most of
the students, who are a fairly similar group in terms of their ability, will score around 40.
Some will score a few less and some a few more. It is very unlikely that any of them will
score less than 35 or more than 45.
Raw achievement scores are a very poor way of analyzing such results. Percentile
scores, however, can convey them very clearly.
Definition
A percentile is a measure that tells us what percent of the total frequency scored at or
below that measure. A percentile rank is the percentage of scores that fall at or below a
given score. OR
A percentile is a measure that tells us what percent of the total frequency scored below
that measure. A percentile rank is the percentage of scores that fall below a given score.
Both definitions seem to be the same, but statistically they are not. For example:
Example No.1
If Aslam stands 25th out of a class of 150 students, then 125 students are ranked below
Aslam.
Formula:
To find the percentile rank of a score, x, out of a set of n scores, where x is included:

    percentile rank = ((B + 0.5E) / n) × 100

Where B = number of scores below x
E = number of scores equal to x
n = number of scores
Using this formula, Aslam's percentile rank would be:

    percentile rank = ((125 + 0.5(1)) / 150) × 100 ≈ 83.7, i.e., the 84th percentile
Formula:
To find the percentile rank of a score, x, out of a set of n scores, where x is not included:

    percentile rank = (B / n) × 100
Example No.2
The science test scores are: 50, 65, 70, 72, 72, 78, 80, 82, 84, 84, 85, 86, 88, 88, 90, 94,
96, 98, 98, 99 Find the percentile rank for a score of 84 on this test.
Solution:
First rank the scores in ascending or descending order
50, 65, 70, 72, 72, 78, 80, 82, 84, |84, 85, 86, 88, 88, 90, 94, 96, 98, 98, 99
Since there are 2 values equal to 84, assign one to the group "above 84" and the other to
the group "below 84". This puts 9 of the 20 scores below 84, so the percentile rank is
(9 / 20) × 100 = 45, i.e., the 45th percentile.
Example No.3
The science test scores are: 50, 65, 70, 72, 72, 78, 80, 82, 84, 84, 85, 86, 88, 88, 90, 94,
96, 98, 98, 99. Find the percentile rank for a score of 86 on this test.
Solution:
First rank the scores in ascending or descending order
Since there is only one value equal to 86, it will be counted as "half" of a data value for
the group "above 86" as well as the group "below 86".
Solution Using Formula:

    percentile rank = ((B + 0.5E) / n) × 100
                    = ((11 + 0.5(1)) / 20) × 100
                    = (11.5 / 20) × 100 = 57.5 ≈ 58th percentile
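The percentile-rank formula used in the examples above can be sketched in Python as follows; the function name is ours, and the score list is the science test from Examples 2 and 3:

```python
# Percentile rank with ties counted as half: (B + 0.5E) / n * 100,
# where B = scores below x, E = scores equal to x, n = total scores.
def percentile_rank(scores, x):
    below = sum(1 for s in scores if s < x)
    equal = sum(1 for s in scores if s == x)
    # Multiply before dividing to keep the arithmetic exact for these values.
    return (below + 0.5 * equal) * 100 / len(scores)

science = [50, 65, 70, 72, 72, 78, 80, 82, 84, 84,
           85, 86, 88, 88, 90, 94, 96, 98, 98, 99]
print(percentile_rank(science, 84))  # 45.0 (Example 2)
print(percentile_rank(science, 86))  # 57.5, i.e. the 58th percentile (Example 3)
```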
Keep in Mind:
Percentile rank is a number between 0 and 100 indicating the percent of cases
falling at or below that score.
Percentile ranks are usually written to the nearest whole percent: 64.5% = 65%
= 65th percentile
Scores are divided into 100 equally sized groups.
Scores are arranged in rank order from lowest to highest.
There is no 0 percentile rank - the lowest score is at the first percentile.
There is no 100th percentile - the highest score is at the 99th percentile.
Percentiles have the disadvantage that they are not equal units of measurement.
Percentiles cannot be averaged nor treated in any other way mathematically.
You cannot perform the same mathematical operations on percentiles that you can on
raw scores. You cannot, for example, compute the mean of percentile scores, as
the results may be misleading.
Quartiles can be thought of as percentile measures. Remember that quartiles break the
data set into 4 equal parts. If 100% is broken into four equal parts, we have
subdivisions at 25%, 50%, and 75%, creating the first quartile (Q1), the second
quartile (Q2, the median), and the third quartile (Q3).
    % marks = (Marks Obtained / Total Marks) × 100
Example:
The marks detail of Hussan's math test is shown below. Find Hussan's percentage marks.

Question          Q1   Q2   Q3   Q4   Q5   Total
Marks             10   10    5    5   20    50
Marks obtained     8    5    2    3   10    28
Solution:
Hussan's marks obtained = 28
Total marks = 50

    % marks = (Marks Obtained / Total Marks) × 100 = (28 / 50) × 100 = 56%
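The same percentage computation can be sketched in Python for checking marks quickly; the function name is ours:

```python
# Percentage marks: (marks obtained / total marks) * 100.
def percent_marks(obtained, total):
    # Multiply before dividing to keep the arithmetic exact for these values.
    return obtained * 100 / total

print(percent_marks(28, 50))  # 56.0, matching Hussan's result above
```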
For example, a number can be used merely to label or categorize a response. This sort of
number (nominal scale) has a low level of meaning. A higher level of meaning comes
with numbers that order responses (ordinal data). An even higher level of meaning
(interval or ratio data) is present when numbers attempt to present exact scores, such as
when we state that a person got 17 correct out of 20. Although even the lowest scale is
useful, higher level scales give more precise information and are more easily adapted to
many statistical procedures.
Scores can be summarized by using either the mode (most frequent score), the median
(midpoint of the scores), or the mean (arithmetic average) to indicate typical
performance. When reporting data, you should choose the measure of central tendency
that gives the most accurate picture of what is typical in a set of scores. In addition, it is
possible to report the standard deviation to indicate the spread of the scores around the
mean.
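A minimal sketch of these summary measures using Python's statistics module; the score list is invented for illustration:

```python
# Mode (most frequent), median (midpoint), mean (arithmetic average),
# and population standard deviation (spread around the mean).
import statistics

scores = [70, 72, 72, 78, 80, 82, 84, 90]
print(statistics.mode(scores))    # 72
print(statistics.median(scores))  # 79.0
print(statistics.mean(scores))    # 78.5
print(statistics.pstdev(scores))  # spread of the scores around the mean
```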
Scores from measurement processes can be either absolute, criterion referenced, or norm
referenced. An absolute score simply states a measure of performance without comparing
it with any standard. However, scores are not particularly useful unless they are
compared with something. Criterion-referenced scores compare test performance with a
specific standard; such a comparison enables the test interpreter to decide whether the
scores are satisfactory according to established standards. Norm-referenced tests compare
test performance with that of others who were measured by the same procedure. Teachers
are usually more interested in knowing how children compare with a useful standard than
how they compare with other children; but norm referenced comparisons may also
provide useful insights.
Criterion-referenced scores are easy to understand because they are usually
straightforward raw scores or percentages. Norm-referenced scores are often converted to
percentiles or other derived standard scores. A student's percentile score on a test
indicates what percentage of other students who took the same test fell below that
student's score. Derived scores are often based on the normal curve. They use an arbitrary
mean to make comparisons showing how respondents compare with other persons who
took the same test.
Measurement Scales
Measurement is the assignment of numbers to objects or events in a systematic fashion.
Measurement scales are critical because they relate to the types of statistics you can use
to analyze your data. An easy way to have a paper rejected is to have used either an
incorrect scale/statistic combination or to have used a low powered statistic on a high
powered set of data. The following four levels of measurement scales are commonly
distinguished so that the proper analysis can be used on the data.
Nominal Scale
Nominal scales are the lowest scales of measurement. A nominal scale, as the name
implies, is simply some placing of data into categories, without any order or structure.
You are only allowed to examine if a nominal scale datum is equal to some particular
value or to count the number of occurrences of each value. For example, categorization of
blood groups of classmates into A, B, AB, O, etc. The only mathematical operation we
can perform with nominal data is to count. Variables assessed on a nominal scale are
called categorical variables; categorical data are measured on nominal scales which
merely assign labels to distinguish categories. For example, gender is a nominal scale
variable. Classifying people according to gender is a common application of
a nominal scale.
Nominal Data
classification or categorization of data, e.g., male or female
no ordering, e.g. it makes no sense to state that male is greater than female (M >
F) etc
arbitrary labels, e.g., pass=1 and fail=2 etc
Ordinal Scale
Something measured on an "ordinal" scale does have an evaluative connotation. You are
also allowed to examine if an ordinal scale datum is less than or greater than another
value. For example rating of job satisfaction on a scale from 1 to 10, with 10
representing complete satisfaction. With ordinal scales, we only know that 2 is better than
1 or 10 is better than 9; we do not know by how much. It may vary. Hence, you can 'rank'
ordinal data, but you cannot 'quantify' differences between two ordinal values. Nominal
scale properties are included in ordinal scale.
Ordinal Data
ordered, but the sizes of the differences between values are not known;
the differences between adjacent values may or may not be equal
e.g., political parties on left to right spectrum given labels 0, 1, 2
e.g., Likert scales, rank on a scale of 1..5 your degree of satisfaction
e.g., restaurant ratings
Interval Scale
An ordinal scale whose differences between values are quantifiable becomes an interval
scale. You are allowed to quantify the difference between two interval scale values, but there is no
natural zero. A variable measured on an interval scale gives information about more or
better as ordinal scales do, but interval variables have an equal distance between each
value. The distance between 1 and 2 is equal to the distance between 9 and 10. For
example, temperature scales are interval data: 25°C is warmer than 20°C, and a 5°C
difference has some physical meaning. Note that 0°C is arbitrary, so it does not make
sense to say that 20°C is twice as hot as 10°C, but there is exactly the same difference
between 100°C and 90°C as there is between 42°C and 32°C. Students' achievement scores
are measured on an interval scale.
Interval Data
ordered, constant scale, but no natural zero
differences make sense, but ratios do not (e.g., 30°-20° = 20°-10°, but 20° is
not twice as hot as 10°!)
e.g., temperature (C,F), dates
Ratio Scale
Something measured on a ratio scale has the same properties as an interval scale
except that, with ratio scaling, there is an absolute zero point. Temperature measured in
Kelvin is an example: no value below 0 K is possible, because it is absolute
zero. Physical measurements of height, weight, and length are typically ratio variables.
Weight is another example: 0 lbs is a meaningful absence of weight, and ratios such as
"twice as heavy" hold true regardless of the unit in which the object is measured (e.g.,
kilograms or pounds), because there is a natural zero.
Ratio Data
ordered, constant scale, natural zero
e.g., height, weight, age, length
One can think of nominal, ordinal, interval, and ratio as being ranked in their relation to
one another. Ratio is more sophisticated than interval, interval is more sophisticated than
ordinal, and ordinal is more sophisticated than nominal.
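The operations that are meaningful at each level can be illustrated with a short sketch; the variables and values below are illustrative, not taken from the text:

```python
# Which comparisons are meaningful at each level of measurement?
# (Illustrative values only.)

# Nominal: labels -- only equality/inequality is meaningful.
assert "male" != "female"               # fine; "male > female" is meaningless

# Ordinal: order is meaningful, but differences are not.
satisfaction_a, satisfaction_b = 7, 9   # ratings on a 1-10 scale
assert satisfaction_b > satisfaction_a  # fine; the gap 9 - 7 has no fixed size

# Interval: differences are meaningful, ratios are not (no natural zero).
temp_today_c, temp_yesterday_c = 20, 10
assert temp_today_c - temp_yesterday_c == 10  # a meaningful difference
# 20 / 10 == 2, but 20 C is NOT "twice as hot" as 10 C

# Ratio: a natural zero makes ratios meaningful as well.
weight_a_kg, weight_b_kg = 80, 40
assert weight_a_kg / weight_b_kg == 2.0       # genuinely "twice as heavy"
print("all scale comparisons hold")
```

Each later scale supports every comparison of the scales before it, which is exactly the ranking described above.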
Frequency Distribution
Frequency is how often something occurs. The frequency (f) of a particular observation
is the number of times the observation occurs in the data.
Distribution
The distribution of a variable is the pattern of frequencies of the observation.
Frequency Distribution
It is a representation, either in a graphical or tabular format, which displays the number of
observations within a given interval. Frequency distributions are usually used within a
statistical context.
Step 2:
Subtract the minimum data value from the maximum data value. For example,
the IQ list above had a minimum value of 118 and a maximum value of 154,
so:
154 – 118 = 36
Step 3:
Divide your answer in Step 2 by the number of classes you chose in Step 1.
36 / 5 = 7.2
Step 4:
Round the number from Step 3 up to a whole number to get the class width.
Rounded up, 7.2 becomes 8.
Step 5:
Write down your lowest value for your first minimum data value:
The lowest value is 118
Step 6:
Add the class width from Step 4 to the value from Step 5 to get the next lower class limit:
118 + 8 = 126
Step 7:
Repeat Step 6 for the other minimum data values (in other words, keep
adding your class width to the previous minimum data value) until you have created the
number of classes you chose in Step 1. We chose 5 classes, so our 5 minimum
data values are:
118
126 (118 + 8)
134 (126 + 8)
142 (134 + 8)
150 (142 + 8)
Step 8:
Write down the upper class limits. These are the highest values that can be in each
category, so in most cases you can subtract 1 from the class width and add that to the
minimum data value. For example:
118 + (8 – 1) = 125
118 – 125
126 – 133
134 – 141
142 – 149
150 – 157
Step 9:
Add a second column for the number of items in each class, and label the
columns with appropriate headings:
IQ Number
118 – 125
126 – 133
134 – 141
142 – 149
150 – 157
Step 10:
Count the number of items in each class, and put the total in the second column. The
list of IQ scores is: 118, 123, 124, 125, 127, 128, 129, 130, 130, 133, 136, 138,
141, 142, 149, 150, 154.
IQ Number
118 – 125 4
126 – 133 6
134 – 141 3
142 – 149 2
150 – 157 2
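Steps 1 through 10 can be checked with a short Python sketch; using a uniform class width of 8 throughout gives the classes from 118–125 up to 150–157:

```python
import math

# Step 1 data: the 17 IQ scores from the example, with 5 classes chosen
scores = [118, 123, 124, 125, 127, 128, 129, 130, 130, 133,
          136, 138, 141, 142, 149, 150, 154]
num_classes = 5

# Steps 2-4: range of the data, then class width rounded up to a whole number
data_range = max(scores) - min(scores)             # 154 - 118 = 36
class_width = math.ceil(data_range / num_classes)  # 36 / 5 = 7.2 -> 8

# Steps 5-7: the minimum data value of each class
lower_limits = [min(scores) + i * class_width for i in range(num_classes)]

# Step 8: upper class limits (class width minus 1, added to each lower limit)
upper_limits = [lo + class_width - 1 for lo in lower_limits]

# Steps 9-10: count how many scores fall into each class
freq = [sum(lo <= s <= hi for s in scores)
        for lo, hi in zip(lower_limits, upper_limits)]
for lo, hi, f in zip(lower_limits, upper_limits, freq):
    print(f"{lo} - {hi}: {f}")
```

With the uniform width, the third and fourth classes come out as 134–141 and 142–149, and their frequencies are 3 and 2 respectively; the five counts sum to the 17 scores.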
Example 2
A survey was taken in Lahore. In each of 20 homes, people were asked how many cars
were registered to their households. The results were recorded as follows:
1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0
Use the following steps to present this data in a frequency distribution table.
1. Divide the results (x) into intervals, and then count the number of results in each
interval. In this case, the intervals would be the number of households with no
car (0), one car (1), two cars (2) and so forth.
2. Make a table with separate columns for the interval numbers (the number of cars
per household), the tallied results, and the frequency of results in each interval.
Label these columns Number of cars, Tally and Frequency.
3. Read the list of data from left to right and place a tally mark in the appropriate
row. For example, the first result is a 1, so place a tally mark in the row beside
where 1 appears in the interval column (Number of cars). The next result is a 2,
so place a tally mark in the row beside the 2, and so on. When you reach your
fifth tally mark, draw a tally line through the preceding four marks to make your
final frequency calculations easier to read.
4. Add up the number of tally marks in each row and record them in the final
column entitled Frequency.
Your frequency distribution table for this exercise should look like this:
Table 1. Frequency table for the number of cars registered in
each household
Number of cars (x) Tally Frequency (f)
0 |||| 4
1 |||| | 6
2 |||| 5
3 ||| 3
4 || 2
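The tallying in the steps above can also be reproduced programmatically; `collections.Counter` does the counting in a single pass (a sketch of the same exercise):

```python
from collections import Counter

# Number of cars registered in each of the 20 surveyed households
cars = [1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0]
freq = Counter(cars)

print("Number of cars  Frequency")
for value in sorted(freq):
    print(f"{value:>14}  {freq[value]}")
assert sum(freq.values()) == 20  # frequencies must add up to the 20 households
```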
The data from a frequency table can be displayed graphically. A graph can provide a
visual display of the distributions, which gives us another view of the summarized data.
For example, the graphic representation of the relationship between two different test
scores through the use of scatter plots. We learned that we could describe in general
terms the direction and strength of the relationship between scores by visually examining
the scores as they were arranged in a graph. Some other examples of these types of
graphs include histograms and frequency polygons.
A histogram is a bar graph of scores from a frequency table. The horizontal x-axis
represents the scores on the test, and the vertical y-axis represents the frequencies. The
frequencies are plotted as bars.
A frequency polygon could also be used to compare two or more sets of data by
representing each set of scores as a line graph with a different color or pattern. For
example, you might be interested in looking at your students’ scores by gender, or
comparing students’ performance on two tests (see Figure 9.4).
Draw the Y-axis to indicate the frequency of each class. Place a point in the middle of
each class interval at the height corresponding to its frequency. Finally, connect the
points. You should include one class interval below the lowest value in your data and one
above the highest value. The graph will then touch the X-axis on both sides.
A frequency polygon for 642 psychology test scores is shown in Figure 1. The first label
on the X-axis is 35. This represents an interval extending from 29.5 to 39.5. Since the
lowest test score is 46, this interval has a frequency of 0. The point labeled 45 represents
the interval from 39.5 to 49.5. There are three scores in this interval. There are 150 scores
in the interval that surrounds 85.
You can easily discern the shape of the distribution from Figure 1. Most of the scores are
between 65 and 115. It is clear that the distribution is not symmetric inasmuch as good
scores (to the right) trail off more gradually than poor scores (to the left). In the
terminology of Chapter 3 (where we will study shapes of distributions more
systematically), the distribution is skewed.
A cumulative frequency polygon for the same test scores is shown in Figure 2. The graph
is the same as before except that the Y value for each point is the number of students in
the corresponding class interval plus all numbers in lower intervals. For example, there
are no scores in the interval labeled "35," three in the interval "45,"and 10 in the interval
"55."Therefore the Y value corresponding to "55" is 13. Since 642 students took the test,
the cumulative frequency for the last interval is 642.
Figure 2: Cumulative frequency polygon for the psychology test scores.
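The cumulative frequencies plotted in Figure 2 are simply running totals of the class frequencies. A minimal sketch using the three lowest intervals quoted in the text (frequencies 0, 3, and 10):

```python
from itertools import accumulate

# Frequencies for the intervals labeled 35, 45 and 55 (values from the text)
midpoints = [35, 45, 55]
frequencies = [0, 3, 10]

# Each cumulative value is the class frequency plus all lower-interval counts
cumulative = list(accumulate(frequencies))
for mid, cum in zip(midpoints, cumulative):
    print(f"interval {mid}: cumulative frequency {cum}")
# The interval labeled 55 accumulates to 0 + 3 + 10 = 13, the Y value quoted above.
```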
Self Assessment Questions
1. The control group scored 47.26 on the pretest. Does this score represent nominal,
ordinal, or interval scale data?
2. The control group's score of 47.26 on the pretest put it at the 26th percentile.
Does this percentile score represent nominal, ordinal, or interval scale data?
3. The control group had a standard deviation of 7.78 on the pretest. Does this
standard deviation represent nominal, ordinal, or interval scale data?
4. Construct a frequency distribution with suitable class interval size of marks obtained
by 50 students of a class are given below:
23, 50, 38, 42, 63, 75, 12, 33, 26, 39, 35, 47, 43, 52, 56, 59, 64, 77, 15, 21, 51,
54, 72, 68, 36, 65, 52, 60, 27, 34, 47, 48, 55, 58, 59, 62, 51, 48, 50, 41, 57, 65,
54, 43, 56, 44, 30, 46, 67, 53
5. The Lakers scored the following numbers of goals in their last twenty matches: 3, 0,
1, 5, 4, 3, 2, 6, 4, 2, 3, 3, 0, 7, 1, 1, 2, 3, 4, 3. Construct a frequency table for these scores.
6. Which number had the highest frequency?
7. Which letter occurs the most frequently in the following sentence?
The task of grading and reporting students’ progress cannot be separated from the
procedures adopted in assessing students’ learning. If instructional objectives are well
defined in terms of behavioural or performance terms and relevant tests and other
assessment procedures are properly used, grading and reporting become a matter of
summarizing the results and presenting them in an understandable form. Reporting students'
progress is difficult, especially when the data are represented by a single letter grade or a
numerical value (Linn & Gronlund, 2000).
Assigning grades and making referrals are decisions that require information about
individual students. In contrast, curricular and instructional decisions require information
about groups of students, quite often about entire classrooms or schools (Linn &
Gronlund, 2000).
There are three primary purposes of grading students. First, grades are the primary
currency for exchange of many of the opportunities and rewards our society has to offer.
Grades can be exchanged for such diverse entities as adult approval, public recognition,
college and university admission etc. To deprive students of grades means to deprive
them of rewards and opportunities. Second, teachers become habituated to assessing their
students' learning through grades, and if teachers do not award grades, students may not
know how their learning is progressing. Third, grading motivates students: grades
can serve as incentives, and for many students incentives serve a motivating function.
The different functions of grading and reporting systems are given as under:
1. Instructional uses
The focus of grading and reporting should be the student's improvement in learning. This
is most likely to occur when the report: a) clarifies the instructional objectives; b) indicates
the student’s strengths and weaknesses in learning; c) provides information concerning
the student’s personal and social development; and d) contributes to student’s motivation.
The improvement of student learning is probably best achieved by the day-to-day
assessments of learning and the feedback from tests and other assessment procedures. A
portfolio of work developed during the academic year can be displayed to indicate
student’s strengths and weaknesses periodically.
Periodic progress reports can contribute to student motivation by providing short-term
goals and knowledge of results; both are essential features of effective learning. Well-
designed progress reports can also help in evaluating instructional procedures by
identifying areas that need revision. When the reports of a majority of students indicate poor
progress, it may be inferred that there is a need to modify the instructional objectives.
2. Feedback to students
Grading and reporting test results to students has been an ongoing practice in all the
educational institutions of the world. The mechanism or strategy may differ from country
to country or institution to institution, but each institution observes this practice in some
way. Reporting test scores to students has a number of advantages for them. As the
students move up through the grades, the usefulness of the test scores for personal
academic planning and self-assessment increases. For most students, the scores provide
feedback about how much they know and how effective their efforts to learn have been.
They can identify their strengths and the areas that need special attention. Such feedback is
essential if students are expected to be partners in managing their own instructional time
and effort. These results help them to make good decisions for their future professional
development.
Teachers use a variety of strategies to help students become independent learners who are
able to take an increasing responsibility for their own school progress. Self-assessment is
a significant aspect of self-guided learning, and the reporting of test results can be an
integral part of the procedures teachers use to promote self-assessment. Test results help
students to identify areas that need improvement, areas in which progress has been strong,
and areas in which continued strong effort will help maintain high levels of achievement.
Test results can be used with information from teacher’s assessments to help students set
their own instructional goals, decide how they will allocate their time, and determine
priorities for improving skills such as reading, writing, speaking, and problem solving.
When students are given their own test results, they can learn about self-assessment while
doing actual self-assessment. (Iowa Testing Programs, 2011).
Grading and reporting results also provide students an opportunity for developing an
awareness of how they are growing in various skill areas. Self-assessment begins with
self-monitoring, a skill most children have begun developing well before coming to
kindergarten.
1. Raw scores
The raw score is simply the number of points received on a test when the test has been
scored according to the directions. For example, if a student responds to 65 items
correctly on an objective test in which each correct item counts one point, the raw score
will be 65.
Although a raw score is a numerical summary of a student's test performance, it is not very
meaningful without further information. In the above example, what does a
raw score of 65 mean? How many items were in the test? What kinds of problems
were asked? How difficult were the items?
2. Grade norms
Grade norms are widely used with standardized achievement tests, especially at
elementary level. The grade equivalent that corresponds to a particular raw score
identifies the grade level at which the typical student obtains that raw score. Grade
equivalents are based on the performance of students in the norm group in each of two or
more grades.
3. Percentile ranking
A percentile is a score that indicates the rank of the score compared to others (same
grade/age) using a hypothetical group of 100 students. In other words, a percentile rank
(or percentile score) indicates a student’s relative position in the group in terms of
percentage of students.
Percentile rank is interpreted as the percentage of individuals receiving scores equal to or
lower than a given score. A percentile rank of 25 indicates that the student's test performance
equals or exceeds that of 25 out of 100 students on the same measure.
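Under the "equal or lower" definition above, a percentile rank can be computed directly; the helper function and the class of 8 scores below are illustrative, not from the text:

```python
def percentile_rank(score, group_scores):
    """Percentage of scores in the group that are equal to or lower
    than the given score (the 'equal or lower' definition)."""
    return 100 * sum(s <= score for s in group_scores) / len(group_scores)

# Hypothetical scores for a class of 8 students
class_scores = [40, 45, 50, 55, 60, 65, 70, 75]
print(percentile_rank(50, class_scores))  # 37.5: equals or exceeds 3 of 8 scores
```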
4. Standard scores
A standard score is also derived from the raw score, using the norm information
gathered when the test was developed. Instead of indicating a student's rank compared to
others, standard scores indicate how far above or below the average (Mean) an individual
score falls, using a common scale, such as one with an average of 100. Basically, standard
scores express test performance in terms of standard deviations (SD) from the Mean.
Standard scores can be used to compare individuals of different grades or age groups
because all are converted into the same numerical scale. There are various forms of
standard scores such as z-score, T-score, and stanines.
Z-score expresses test performance simply and directly as the number of SD units a raw
score is above or below the Mean. A z-score is always negative when the raw score is
smaller than the Mean. Symbolically: z-score = (X − M)/SD.
T-score refers to any set of normally distributed standard cores that has a Mean of 50 and
SD of 10. Symbolically it can be represented as: T-score = 50+10(z).
Stanines are the simplest form of normalized standard scores that illustrate the process of
normalization. Stanines are single digit scores ranging from 1 to 9. These are groups of
percentile ranks with the entire group of scores divided into nine parts, with the largest
number of individuals falling in the middle stanines, and fewer students falling at the
extremes (Linn & Gronlund, 2000).
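The z-score and T-score formulas above translate directly into code; the Mean of 60 and SD of 8 used below are illustrative values, not from the text:

```python
def z_score(x, mean, sd):
    """z = (X - M) / SD: the raw score expressed in SD units from the Mean."""
    return (x - mean) / sd

def t_score(x, mean, sd):
    """T = 50 + 10z: rescales z to Mean 50 and SD 10, removing negatives."""
    return 50 + 10 * z_score(x, mean, sd)

# Hypothetical test with Mean = 60 and SD = 8
print(z_score(76, 60, 8))   # 2.0  -> two SDs above the Mean
print(z_score(52, 60, 8))   # -1.0 -> negative because the raw score is below the Mean
print(t_score(52, 60, 8))   # 40.0 -> the same standing, with no negative sign
```

Because both scores use a common scale, they allow the cross-group comparisons described above.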
7. Checklist of Objectives
To provide more informative progress reports, some schools have replaced or
supplemented the traditional grading system with a list of objectives to be checked or
rated. This system is more popular at elementary school level. The major advantage of
this system is that it provides a detailed analysis of the students’ strengths and
weaknesses. For example, assessing reading comprehension can involve the following
objectives:
Reads with understanding
Works out meaning and use of new words
Reads well to others
Reads independently for pleasure (Linn & Gronlund, 2000).
8. Rating scales
In many schools students' progress is reported on a rating scale, usually 1 to 10,
instead of letter grades; 1 indicates the poorest performance while 10 indicates
excellent or extraordinary performance. In the true sense, each rating level
corresponds to a specific level of learning achievement. Such rating scales are also used
in the evaluation of students for admission into different programmes at university
level. Various other rating scales are in use across the world.
In rating scales, we generally assess students' abilities in terms of 'how much',
'how often', 'how good', etc. (Anderson, 2003). The continuum may be qualitative, such
as how well a student behaves, or quantitative, such as how many marks a
student got in a test. Developing rating scales has become a common practice nowadays,
but many teachers still do not possess the skill of developing a rating
scale appropriate to their particular learning situations.
9. Letters to parents/guardians
Some schools keep parents informed about the progress of their children by writing letters.
Writing letters to parents is usually done by the few teachers who are most concerned
about their students, as it is a time-consuming activity. At the same time, some good
teachers avoid writing formal letters because they think that many aspects cannot be clearly
conveyed, and some parents do not feel comfortable receiving such letters.
Linn and Gronlund (2000) state that although letters to parents might provide a good
supplement to other types of reports, their usefulness as the sole method of reporting
progress is limited by several of the following factors.
Comprehensive and thoughtful written reports require an excessive amount of time and
energy.
Descriptions of student learning may be misinterpreted by the parents.
Letters fail to provide systematic and organized information.
10. Portfolio
The teachers of some good schools prepare complete portfolios of their students. A portfolio
is a cumulative record of a student which reflects his/her strengths and weaknesses
in different subjects over a period of time. It indicates what strategies were used by
the teacher to overcome the learning difficulties of the student, and it shows the student's
periodic progress, which indicates his/her trend of improvement. Developing a portfolio
is a genuinely hard task for the teacher, as he/she has to keep all records of students, such as
the teacher's lesson plans, tests, students' best pieces of work, and their assessment records
in an academic year.
An effective portfolio is more than simply a file into which student work products are
placed. It is a purposefully selected collection of work that often contains commentary on
the entries by both students and teachers.
No doubt, the portfolio is a good tool for student assessment, but it has three limitations.
First, it is a time-consuming process. Second, the teacher must possess the skill of developing
a portfolio, which is often lacking. Third, it is ideal for small class sizes; in the
Pakistani context, particularly at the elementary level, class sizes are usually large, and hence
the teacher cannot maintain portfolios for a large class.
11. Report Cards
There is a practice of issuing report cards in many good educational institutions in many
countries, including Pakistan. Many parents desire to see report cards or progress
reports in written form issued by the schools. A good report card explains the
achievement of students in terms of scores or marks, conduct and behaviour, participation
in class activities, etc. Well-written comments can offer parents and students suggestions
as to how to make improvements in specific academic or behavioural areas. These
provide teachers opportunities to be reflective about the academic and behavioural
progress of their students. Such reflections may result in teachers gaining a deeper
understanding of each student's strengths and needs for improvement. Brualdi (1998) has
divided words and phrases into three categories concerning what to include in and exclude
from written comments on report cards.
A. Words and phrases that promote positive view of the student
1. Gets along well with people
2. Has a good grasp of …
3. Has improved tremendously
4. Is a real joy to have in class
5. Is well respected by his classmates
6. Works very hard
12. Parent-teacher conferences
Parent-teacher conferences are mostly used in elementary schools. In such conferences
portfolios are discussed. This is a two-way flow of information and provides much
information to the parents. One of the limitations, however, is that many parents do not
attend the conferences. It is also a time-consuming activity and needs sufficient
funds to hold the conferences.
The literature also highlights the 'parent-student-teacher conference' instead of the 'parent-
teacher conference', as the student is also a key participant in this process, being directly
benefitted. In many developed countries, it has become the most important way
of informing parents about their children’s work in school. Parent-teacher conferences are
productive when these are carefully planned and the teachers are skilled and committed.
The parent-teacher conference is an extremely useful tool, but it shares three important
limitations with the informal letter. First, it requires a substantial amount of time and skill.
Second, it does not provide a systematic record of the student's progress. Third, some parents
are unwilling to attend conferences, and they cannot be compelled to do so.
Parent-student-teacher conferences are frequently convened in many states of the USA
and some other advanced countries. In the US, this has become a striking feature of
Charter Schools. Some schools rely more on parent conferences than written reports for
conveying the richness of how students are doing or performing. In such cases, a school
sometimes provides a narrative account of student’s accomplishments and status to
augment the parent conferences. (www.uscharterschools.org).
3. Conduct the conference with student, parent, and advisor. The advisee takes the lead to the
greatest extent possible.
Have a comfortable setting of chairs, tables etc.
Announce a viable timetable for the conferences
Review goals set earlier
Review progress towards goals
Review progress with samples of work from learning activities
Present the student's strong points first
Review attendance and handling of responsibilities at school and
home
Modify goals for balance of the year as necessary
Determine other learning activities to accomplish goals
Describe upcoming events and activities
Discuss how the home can contribute to learning
Parents should be encouraged to share their thoughts on students’
progress
Ask parents and students for questions, new ideas
Activities
Activity 1:
List three pros and three cons of test scores.
Activity 2:
Give a self-explanatory example of each of the types of test scores.
Activity 3:
Write down the different purposes and functions of test scores in order of importance as
per your experience. Add as many more purposes as you can.
Activity 4:
Compare the modes of reporting test scores to parents by MEAP and NCCA. Also
conclude which is relatively more appropriate in the context of Pakistan as per your point
of view.
Activity 5:
In view of the strengths and shortcomings of the different grading and reporting
systems above, how would you briefly comment on the following characteristics of a multiple
grading and reporting system for effective assessment of students' learning?
a) Grading and reporting system should be guided by the functions to be served.
b) It should be developed cooperatively by parents, students, teachers, and other school
personnel.
c) It should be based on clear and specific instructional objectives.
d) It should be consistent with school standards.
e) It should be based on adequate assessment.
f) It should provide detailed information of student’s progress, particularly
diagnostic and practical aspects.
g) It should provide scope for conducting parent-teacher conferences.
Activity 6:
Explain the differences between relative grading and absolute grading by giving an
example of each.
Activity 7:
Faiza Shaheen, a student of MA Education (Secondary) has earned the following marks,
grades and GPA in the 22 courses at the Institute of Education & Research, University of
the Punjab. Calculate her CGPA. Note that the maximum value of GPA in each
course is 4.
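Assuming CGPA is computed as a credit-weighted average of per-course GPAs (the usual convention, though institutions differ), the calculation can be sketched as follows; the four GPA values are purely illustrative, not Faiza Shaheen's actual record:

```python
def cgpa(gpas, credit_hours=None):
    """CGPA as a credit-weighted average of per-course GPAs.
    If no credit hours are given, all courses weigh equally."""
    if credit_hours is None:
        credit_hours = [1] * len(gpas)
    total = sum(g * c for g, c in zip(gpas, credit_hours))
    return total / sum(credit_hours)

# Illustrative GPAs for 4 courses, each out of a maximum of 4.0
print(round(cgpa([4.0, 3.5, 3.0, 3.5]), 2))  # 3.5
```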
Activity 8:
Write Do’s and Don’ts in order of priority as per your perception. You may add more
points or exclude any of those mentioned above.
Part-I: MCQs:
10. Who said that ‘lack of information provided to consumers about test data has
negative and sweeping consequences’?
a) Hopkins & Stanley
b) Anderson
c) Linn & Gronlund
d) Barber et al.
e) Kearney
THE END