
PSC 352 Handout Planning A Test


Planning a Test
1. Outline learning objectives or major concepts to be covered by the test
Test should be representative of objectives and material covered
Major student complaint: Tests do not fairly cover the material that was
supposed to be canvassed on the test.
2. Create a test blueprint
3. Create questions based on blueprint
Match the question type with the appropriate level of learning
4. For each check on the blueprint, jot down 3-4 alternative question ideas and
item types which will get at the same objective
5. Organize questions and/or ideas by item types
6. Eliminate similar questions
7. Organize questions logically
8. Time yourself actually taking the test and then multiply that by about 4 depending
on the level of students
9. Analyze the results (item analysis)

Translating Course Objectives/Competencies into Test Items


Syllabus
Specification table- what was taught/weight areas to be tested
Creating a Test Blueprint
Blueprint- this is the test plan, i.e., which questions test what concept
Plotting the objectives/competencies against some hierarchy representing
levels of cognitive difficulty or depth of processing
(Listing instructional objectives of subject matter for which the test is constructed
Listing the main topics covered or to be covered by the test
Marrying the objectives and list of topics to construct the table of specification)

Thinking Skills
What level of learning corresponds to the course content
Bloom's Taxonomy of Educational Objectives
Knowledge (see handout)
Comprehension
Application
Analysis
Synthesis
Evaluation

Practical Considerations
Representative sample of the course content: not random, but purposeful and based on
the blueprint
Representative sample of skill or cognitive levels across content
Analyze results by level AND content area
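
One way to turn a blueprint into item counts is sketched below. This is only an illustration, not a procedure prescribed by the handout: the topic names, the percentage weights, and the function name build_blueprint are all hypothetical, and the rounding rule (largest remainder) is an assumption chosen so the cell counts add up to the planned test length.

```python
def build_blueprint(total_items, topic_pct, level_pct):
    """Split total_items across topic x level cells in proportion to the weights.

    Weights are whole-number percentages that each sum to 100. Rounding uses the
    largest-remainder method so the cell counts always add up to total_items.
    """
    if sum(topic_pct.values()) != 100 or sum(level_pct.values()) != 100:
        raise ValueError("topic and level percentages must each sum to 100")
    cells = []
    for topic, tp in topic_pct.items():
        for level, lp in level_pct.items():
            # Exact share is total_items * tp * lp / 10000; keep it in integers.
            base, remainder = divmod(total_items * tp * lp, 100 * 100)
            cells.append([topic, level, base, remainder])
    # Hand the items lost to rounding down to the cells with the largest remainders.
    leftover = total_items - sum(cell[2] for cell in cells)
    for cell in sorted(cells, key=lambda c: c[3], reverse=True)[:leftover]:
        cell[2] += 1
    blueprint = {topic: {} for topic in topic_pct}
    for topic, level, count, _ in cells:
        blueprint[topic][level] = count
    return blueprint


# Hypothetical weights: share of class time per topic, share of items per level.
topics = {"Topic A": 50, "Topic B": 30, "Topic C": 20}
levels = {"Knowledge": 40, "Comprehension": 30, "Application": 30}
plan = build_blueprint(20, topics, levels)
for topic, row in plan.items():
    print(topic, row)
```

Each row of the printed plan is one content area and each column one cognitive level, so the same table can later be used to analyze results by level and by content area.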

Choosing between objective and subjective test items
There are two general categories of test items: (1) objective items which require students
to select the correct response from several alternatives or to supply a word or short phrase
to answer a question or complete a statement; and (2) subjective or essay items which
permit the student to organize and present an original answer. Objective items include
multiple-choice, true-false, matching and completion, while subjective items include
short-answer essay, extended-response essay, problem-solving and performance test
items. For some instructional purposes one or the other item type may prove more
efficient and appropriate.

WHEN TO USE ESSAY OR OBJECTIVE TESTS

Essay tests are especially appropriate when:

the group to be tested is small and the test is not to be reused.
you wish to encourage and reward the development of student skill in writing.
you are more interested in exploring the student's attitudes than in measuring
his/her achievement.
you are more confident of your ability as a critical and fair reader than as an
imaginative writer of good objective test items.

Objective tests are especially appropriate when:

the group to be tested is large and the test may be reused.
highly reliable test scores must be obtained as efficiently as possible.
impartiality of evaluation, absolute fairness, and freedom from possible test
scoring influences (e.g., fatigue, lack of anonymity) are essential.
you are more confident of your ability to express objective test items clearly than
of your ability to judge essay test answers correctly.
there is more pressure for speedy reporting of scores than for speedy test
preparation.

Either essay or objective tests can be used to:

measure almost any important educational achievement a written test can measure.
test understanding and ability to apply principles.
test ability to think critically.
test ability to solve problems.
test ability to select relevant facts and principles and to integrate them toward the
solution of complex problems.

In addition to the preceding suggestions, it is important to realize that certain item types
are better suited than others for measuring particular learning objectives. For example,
learning objectives requiring the student to demonstrate or to show, may be better
measured by performance test items, whereas objectives requiring the student to explain
or to describe may be better measured by essay test items. The matching of learning
objective expectations with certain item types can help you select an appropriate kind of
test item for your classroom exam as well as provide a higher degree of test validity (i.e.,
testing what is supposed to be tested). To further illustrate, several sample learning
objectives and appropriate test items are provided on the following page.

After you have decided to use either an objective, essay or both objective and essay
exam, the next step is to select the kind(s) of objective or essay item that you wish to
include on the exam. To help you make such a choice, the different kinds of objective and
essay items are presented in the following section of this booklet. The various kinds of
items are briefly described and compared to one another in terms of their advantages and
limitations for use. Also presented is a set of general suggestions for the construction of
each item variation.

Anatomy of a Multiple-Choice Item


A standard multiple-choice test item consists of two basic parts: a problem (stem) which
identifies the question or problem, and a list of suggested solutions (alternatives). The
stem may be in the form of either a question or an incomplete statement, and the list of
alternatives contains one correct or best alternative (answer) and a number of incorrect or
inferior alternatives (distracters). The purpose of the distracters is to appear as plausible
solutions to the problem for those students who have not achieved the objective being
measured by the test item. Conversely, the distracters must appear as implausible
solutions for those students who have achieved the objective. Only the answer should
appear plausible to these students. Students are asked to select the one alternative that
best completes the statement or answers the question. For example:
(a) Item Stem: Which of the following is a chemical change?
(b) Response Alternatives:
a. Evaporation of alcohol
b. Freezing of water
*c. Burning of oil
d. Melting of wax
(* = correct response)

Guidelines for Writing: Stems


Place most of the subject matter in the Stem
ensures full statement of problem
Eliminate extraneous material from the Stem
goal is to measure student achievement, not to present new material
maximize use of time for demonstrating understanding, not reading ability
Avoid Negatively phrased Stems
students may miss the qualifier
use only when learning outcome requires this type of differentiation
Ensure similarity among alternatives with regard to:
grammatical structure
length
mode of expression
Grammatical errors provide unintentional clues to the answer
When in doubt, students will select the longest alternative as the correct answer
Make one of the alternatives the most clearly correct or best answer
exception: multiple answer form
reduces intrinsic ambiguity
reduces frustration during test
Make distracters plausible
desire to attract students who really do NOT know the answer to the
question
create distracters from elements of the correct response
improves reliability of item
Avoid parallel language between the Stem and the Correct Response
gives clues to keyed response
emphasizes test-wiseness, not knowledge
Randomly distribute answers across the alternative positions
inexperienced test writers emphasize b and c alternatives (hide the
answer!!)
do NOT use an interpretable order of keyed responses
Use qualifiers such as all of the above and none of the above sparingly
test-wise students will use process of elimination to select answer
do NOT use to pad out the distracters because you cannot think of
another one.

Advantages of Multiple Choice Items


allow broader and more adequate sampling of content or objectives.
tend to more effectively structure the problem to be addressed
items can be more efficiently and reliably scored than supply items
different response alternatives can provide diagnostic feedback (item analysis)
items can be constructed to address various levels of cognitive complexity, i.e.,
versatility in measuring all levels of cognitive ability.
highly reliable test scores.
objective measurement of student achievement or ability.
a reduced guessing factor when compared to true-false items.
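
The guessing factor can be made concrete with a short calculation. The sketch below assumes blind guessing with every alternative equally likely and items independent of one another (a simple binomial model, which is an assumption and not part of the handout); the function names are hypothetical.

```python
from math import comb


def chance_score(n_items, n_alternatives):
    """Expected number of items answered correctly by blind guessing."""
    return n_items / n_alternatives


def prob_at_least(n_items, n_alternatives, k):
    """Probability of at least k correct answers by blind guessing (binomial model)."""
    p = 1 / n_alternatives
    return sum(
        comb(n_items, i) * p**i * (1 - p) ** (n_items - i)
        for i in range(k, n_items + 1)
    )


# Compare a 20-item true-false test with a 20-item four-alternative test.
for alternatives in (2, 4):
    print(
        alternatives,
        "alternatives: expected chance score",
        chance_score(20, alternatives),
        "P(at least 12 of 20 by guessing)",
        round(prob_at_least(20, alternatives, 12), 4),
    )
```

Under these assumptions the expected chance score on 20 items drops from 10 with two alternatives to 5 with four, which is the sense in which multiple-choice items reduce the guessing factor.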

Disadvantages of Multiple Choice Items


difficult and time consuming to construct good items
leads to emphasis on other selected response item types
can lead the instructor to favour simple recall of facts
high degree of dependence on students' reading and instructors' writing ability
can be difficult to achieve clarity of expression
measuring synthesis and evaluation can be difficult
inappropriate for measuring outcomes that require skilled performance

SAMPLE EXAMPLES FOR WRITING MULTIPLE-CHOICE TEST ITEMS


The Stem
1. When possible, state the stem as a direct question rather than as an incomplete
statement.
Undesirable: Alloys are ordinarily produced by ...
Desirable: How are alloys ordinarily produced?

2. Present a definite, explicit and singular question or problem in the stem.


Undesirable: Psychology ...
Desirable: The science of mind and behaviour is called ...

3. Eliminate excessive verbiage or irrelevant information from the stem.


Undesirable: While ironing her dress, Jane burned her hand accidentally on the hot iron.
This was due to a transfer of heat by ...
Desirable: Which of the following ways of heat transfer explains why Jane's hand was
burned after she touched a hot iron?

4. Include in the stem any word(s) that might otherwise be repeated in each
alternative.
Undesirable: Consider two steel tapes. Tape A measures 100 m at 70C whereas tape B
measures 200 m at 70C. If the temperature decreases from 70C to 40C,
then

a) the length of tape A will decrease more than the length of tape B.
b) the length of tape B will decrease more than the length of tape A.
c) the length of each tape will decrease by the same amount.

Desirable: Consider two steel tapes. Tape A measures 100 m at 70C whereas tape B
measures 200 m at 70C. If the temperature decreases from 70C to 40C,
then the length of

a) tape A will decrease more than the length of tape B.
*b) tape B will decrease more than the length of tape A.
c) each tape will decrease by the same amount.

5. Use negatively stated stems sparingly. When used, bold or emphasise the negative
word.
Undesirable: Which of the following characters is not acquired through heredity?
A. Language spoken
B. Shape of nose
C. Colour of eyes
D. Temperament
Desirable: Which of the following characters is NOT acquired through heredity?
A. Language spoken
B. Shape of nose
C. Colour of eyes
D. Temperament

6. Make all alternatives plausible and attractive to the less knowledgeable or skillful
student.
Which of the following processes is most nearly the opposite of photosynthesis?
Undesirable          Desirable
a. Digestion         a. Digestion
b. Relaxation        b. Assimilation
*c. Respiration      *c. Respiration
d. Exertion          d. Catabolism

7. Make the alternatives grammatically parallel with each other, and consistent with the
stem.
Undesirable: What would do most to advance the application of atomic discoveries to
medicine?
*a. Standardized techniques for treatment of patients.
b. Train the average doctor to apply radioactive treatments.
c. Remove the restriction on the use of radioactive substances.
d. Establishing hospitals staffed by highly trained radioactive therapy
specialists.

Desirable: What would do most to advance the application of atomic discoveries to
medicine?
*a. Development of standardized techniques for treatment of patients.
b. Training of the average doctor in application of radioactive treatments.
c. Removal of restriction on the use of radioactive substances.
d. Addition of trained radioactive therapy specialists to hospital staffs.

8. Make the alternatives mutually exclusive.


Undesirable: The daily minimum required amount of milk
that a 10 year old child should drink is
a. 1-2 glasses.
*b. 2-3 glasses.
*c. 3-4 glasses.
d. at least 4 glasses.
Desirable: What is the daily minimum required amount of
milk a 10 year old child should drink?
a. 1 glass.
b. 2 glasses.
*c. 3 glasses.
d. 4 glasses.

9. When possible, present alternatives in some logical order (e.g., chronological, most to
least, alphabetical).
At 7 a.m. two trucks leave a diner and travel north. One truck averages 42 miles per hour
and the other truck averages 38 miles per hour. At what time will they be 24 miles apart?
Undesirable      Desirable
a. 6 p.m.        a. 1 a.m.
b. 9 a.m.        b. 6 a.m.
c. 1 a.m.        c. 9 a.m.
*d. 1 p.m.       *d. 1 p.m.
e. 6 a.m.        e. 6 p.m.
10. Be sure there is only one correct or best response to the item.
Undesirable: The two most desired characteristics in a classroom test are validity and
a. precision.
*b. reliability.
c. objectivity.
*d. consistency.
Desirable: The two most desired characteristics in a classroom test are validity and
a. precision.
*b. reliability.
c. objectivity.
d. standardization.
11. Make alternatives approximately equal in length.
Undesirable: Which of the following statements is the best definition of
refraction?
A. Passing through a boundary
*B. Changing direction when crossing a boundary
C. Bouncing off a boundary
D. Changing speed at a boundary

Neurotics are more likely than psychotics to


a. be dangerous to society
b. have delusional symptoms
c. be dangerous to themselves
*d. have insight into their own inappropriate behaviour but
nevertheless feel rather helpless in terms of dealing with their
difficulties

12. Avoid irrelevant clues such as grammatical structure, well known verbal associations
or connections between stem and answer.
Undesirable (grammatical clue): A chain of islands is called an:
*a. archipelago.
b. peninsula.
c. continent.
d. isthmus.
Undesirable (verbal association clue): The reliability of a test can be estimated by a
coefficient of:
a. measurement.
*b. correlation.
c. testing.
d. error.
Undesirable (connection between stem and answer clue): The height to which a water dam
is built depends on
a. the length of the reservoir behind the dam.
b. the volume of water behind the dam.
*c. the height of water behind the dam.
d. the strength of the reinforcing wall.

13. Use at least four alternatives for each item to lower the probability of getting the item
correct by guessing.

14. Randomly distribute the correct response among the alternative positions throughout
the test having approximately the same proportion of alternatives a, b, c, d and e as the
correct response.

15. Avoid using the alternatives "none of the above" and "all of the above".
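
Guideline 14 can be approximated mechanically. The sketch below is one possible approach, not a method given in the handout: it first builds a list of key positions in which every position is used about equally often, shuffles that list so no interpretable pattern remains, and then places each correct answer at its assigned position. The function names are hypothetical, and the approach does not apply to items whose alternatives should stay in a logical order (guideline 9).

```python
import random


def assign_key_positions(n_items, n_alternatives, rng):
    """Return one key position (0-based) per item, each position used about equally."""
    positions = [i % n_alternatives for i in range(n_items)]
    rng.shuffle(positions)  # remove any interpretable order of keyed responses
    return positions


def place_answer(correct, distracters, key_position, rng):
    """Return the alternatives with `correct` at `key_position`, distracters shuffled."""
    alternatives = list(distracters)
    rng.shuffle(alternatives)
    alternatives.insert(key_position, correct)
    return alternatives


rng = random.Random(2024)  # fixed seed so the answer key can be regenerated
key_positions = assign_key_positions(20, 4, rng)
item = place_answer(
    "Burning of oil",
    ["Evaporation of alcohol", "Freezing of water", "Melting of wax"],
    key_positions[0],
    rng,
)
print(key_positions)
print(item)
```

Keeping the seed with the test file makes it possible to rebuild the same answer key later, for example when scoring or reusing the test.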

Alternate Response Items


Involves the selection of one of two alternatives
true / false
yes / no
Mainly for Knowledge & Comprehension
Can be written at higher levels
True / False
Word statements clearly. Vague or ambiguous wording will confuse students.
Avoid over generalizing.
Poor: Heavy smoking causes lung cancer. T F
Better: Heavy smoking often causes lung cancer. T F
Avoid Trick questions.
Do not use trivial statements to pad out the number of questions and marks to
arrive at a predetermined level.
Statements should be entirely true, or entirely false
Avoid using universal descriptors such as never, none, always, and all.
-Test-wise students will recognize that there are few absolutes.
Avoid negative words, as they are often overlooked by students.
Do not include two ideas in one statement unless you are evaluating students'
understanding of cause and effect relationships.
Poor: Porpoises are able to communicate because they are mammals. T F
Better: Porpoises are mammals. T F
Porpoises are able to communicate. T F
Provide a T and F beside each statement and ask students to circle correct
answer.
-Avoids problem of students writing illegible letters.
Include more false than true statements in any given test and vary the number of
false statements from test to test.
-tendency to mark more statements true than false.
-discrimination between those who know the content and those who do not is
greater for false expressions.
Avoid using negative statements.
-Under the demands of the testing situation, students may fail to see the negative
qualifier.

Matching Items
Consist of
a column of premises
a column of responses
directions for matching the two.
Similar to multiple choice, but easier and more efficient to construct
Can be written to assess Knowledge, Comprehension, Application, Analysis level
behaviors
Provide clear instructions on how to indicate the correct answers.
Indicate whether the same response can be used more than once.
Maintain grammatical consistency within and between columns.
within a column: either sentence or point form
between columns: one or the other
Ensure that any matching question appears entirely on one page.

Guidelines for Writing Matching Items


Provide an unequal number of premises and responses
reduces guessing and elimination
increases measure of comprehension
Avoid designing questions which require students to draw lines between premise
and response.
confusing for student and marker
provide space for letter or number answers
Make sure lists are homogeneous.
i.e., do not mix different kinds of items (names, dates, and events) in one list.
Instead, make every response plausible.
Make the wording of the premises longer than the wording of the responses.
Identify the items in one list with numbers and those in the second list with letters.

Short Answer Test Items


Typically, the student is asked to reply with a word, phrase, name, or sentence,
rather than a more extended response.
Direct Questions / Short Answer
Incomplete Sentences / Fill In the Blanks
Items are fairly easy to construct and mark
Assess mainly knowledge, comprehension, and some application.

Guidelines for Writing Short Answer Items


Questions must be carefully worded so that all students understand the specific
nature of the question asked and the answer required.
Word completion or fill-in questions so that missing information is at, or near the
end of, the sentence. Makes reading and responding easier.
Instructions and teachers' expectations about filling in blanks should be made
clear. Indicate whether each blank of equal length represents one word or several
words, whether long blanks require sentences or phrases, and whether
synonymous terms are accepted.
When an answer is to be expressed in numerical units, the unit should be stated.
Do not use too many blanks in completion items. The emphasis should be on
knowledge and comprehension, not mind reading!

As a summary, note the following:

Item Writing General Guidelines


Present a single clearly defined problem that is based on a significant concept
rather than trivial or esoteric ideas
Use simple, precise & unambiguous wording
Exclude extraneous or irrelevant information
Eliminate any systematic pattern of answers that may allow guessing correctly
Avoid presupposed knowledge which favors one group over another (e.g., "fly ball"
favors those who know baseball)
Refrain from providing unnecessary clues to the correct answer.
Avoid negatively phrased items (i.e., except, not)
Arrange answers in alphabetical / numerical order
Avoid None of the above or All of the above type answers
Avoid Both A & B or Neither A nor B type answers

Comparison of Multiple-Choice (MC) & Essay

                    Essay                                   Multiple-choice (MC)
Depth of learning   Can measure application and more        Can be designed to measure
                    complex outcomes. Poor for recall.      application and more complex
                                                            outcomes as well as recall.
Item preparation    Fewer test items, less prep time        Relatively large number of
                                                            items, more prep time
Content sampling    Limited, few items                      Broader content sampling
Encouragement       Encourages organization, integration    Easy to score with consistent
                    & effective expression of ideas         results.
Scoring             Time consuming, requires special        Easy to score with consistent
                    measures for consistent results         results.

Interpreting test scores


Teachers
High scores = good instruction
Low scores = poor students
Students
High scores = smart, well-prepared
Low scores = poor teaching, bad test
High scores- too easy, only measured simple educational objectives, biased scoring,
cheating, unintentional clues to right answers
Low scores- too hard, tricky questions, content not covered in class, grader bias,
insufficient time to complete test

What is your view of this interpretation?

Item Analysis
Main purpose of item analysis is to improve the test
Analyze items to identify:
Potential mistakes in scoring
Ambiguous/tricky items
Alternatives that do not work well
Problems with time limits
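
A minimal sketch of an item analysis is given below. It uses two indices that are common in classical test analysis but are not defined in this handout, so treat them as assumptions: a difficulty index (the proportion of students answering the item correctly) and a discrimination index (that proportion in the top-scoring group minus the proportion in the bottom-scoring group; using about 27% of students per group is a conventional choice). The function name and the sample responses are hypothetical.

```python
from collections import Counter


def item_analysis(responses, key, group_fraction=0.27):
    """Summarize each item from students' selected alternatives.

    responses: one list of chosen alternatives per student, in item order.
    key: the correct alternative for each item.
    """
    n_students = len(responses)
    totals = [sum(a == k for a, k in zip(r, key)) for r in responses]
    ranked = sorted(range(n_students), key=lambda s: totals[s], reverse=True)
    group_size = max(1, round(n_students * group_fraction))
    upper, lower = ranked[:group_size], ranked[-group_size:]
    report = []
    for i, correct in enumerate(key):
        difficulty = sum(r[i] == correct for r in responses) / n_students
        p_upper = sum(responses[s][i] == correct for s in upper) / group_size
        p_lower = sum(responses[s][i] == correct for s in lower) / group_size
        report.append({
            "item": i + 1,
            "difficulty": difficulty,              # proportion answering correctly
            "discrimination": p_upper - p_lower,   # upper group minus lower group
            "choices": dict(Counter(r[i] for r in responses)),  # how often each alternative was picked
        })
    return report


# Made-up responses from four students to a two-item test.
answer_key = ["c", "b"]
sample = [["c", "b"], ["c", "a"], ["a", "b"], ["a", "a"]]
for row in item_analysis(sample, answer_key, group_fraction=0.5):
    print(row)
```

An item with discrimination near zero or below may be ambiguous, tricky, or mis-keyed, and an alternative that almost nobody chooses is a candidate for "alternatives that do not work well".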

Question Arrangement on a Test


Group by question type
Common instructions will save reading time
Limit the number of times students have to change frame of reference
Patterns on test must be logical
Arrange from a content standpoint
Keep similar concepts together
Group by difficulty (easy to hard)
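
The arrangement rules above amount to a three-level sort. The sketch below is only one way to express them: the item records, the ordering of item types, the ordering of topics, and the 1 (easy) to 3 (hard) difficulty scale are all hypothetical.

```python
def arrange(items, type_order, topic_order):
    """Order items by question type, then by topic, then from easy to hard."""
    return sorted(
        items,
        key=lambda item: (
            type_order.index(item["type"]),    # group by question type
            topic_order.index(item["topic"]),  # keep similar concepts together
            item["difficulty"],                # easy to hard within a topic
        ),
    )


# Hypothetical item pool; difficulty is an instructor estimate from 1 (easy) to 3 (hard).
pool = [
    {"id": 1, "type": "essay", "topic": "validity", "difficulty": 2},
    {"id": 2, "type": "multiple-choice", "topic": "reliability", "difficulty": 3},
    {"id": 3, "type": "multiple-choice", "topic": "reliability", "difficulty": 1},
    {"id": 4, "type": "true-false", "topic": "validity", "difficulty": 1},
    {"id": 5, "type": "multiple-choice", "topic": "blueprint", "difficulty": 2},
]
types = ["true-false", "matching", "multiple-choice", "short-answer", "essay"]
topic_sequence = ["blueprint", "reliability", "validity"]
ordered = arrange(pool, types, topic_sequence)
print([item["id"] for item in ordered])
```

Because all items of one type end up together, a single set of instructions can be printed once per group.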

Deciding When Multiple-Choice Items Should Be Used
In order for scores to accurately represent the degree to which a student has attained an
educational objective, it is essential that the form of test item used in the assessment be
suitable for the objective. Multiple-choice test items are often advantageous to use, but
they are not the best form of test item for every circumstance. In general, they are
appropriate to use when the attainment of the educational objective can be measured by
having the student select his or her response from a list of several alternative responses.

If the attainment of the educational objective can be better measured by having
the student supply his or her response, a short-answer item or essay question may be
appropriate.
If there are several homogeneous test items, it may be possible to combine them
into a single matching item for more efficient use of testing time.
If the attainment of the objective can be better measured by having the student do
something, a performance test should be considered.

The Essay Test


Aside from taking notes in class, the type of writing you will be asked to do most often
for courses in nearly all disciplines will probably be answering timed essay questions.
These essays are really not so different from the ones you write as assignments,
except for two significant points: you can't get feedback from a peer or the
instructor, and only rarely are you given the chance to do serious revision.
Essay or subjective tests may include either short answer questions or long
general questions. These tests have no one specific answer per student. They are
usually scored on an opinion basis, although there will be certain facts and
understanding expected in the answer.
The main reason students fail essay tests is not because they cannot write, but
because they fail to answer the questions fully and specifically, and because their
answer is not well organized.
Essay tests require recall learning. Carefully figure out the major content areas to
learn. If you are not caught up, this is not a time to read everything in a frantic
manner. Focus on the key source for the test: notes or textbook, or whatever you
think will be most heavily covered on the test. It's better to understand and know a
few things very well than to have a large quantity of unorganized, poorly learned
material.
These suggestions may help:
List all topics sure to be a part of the test. List important subtopics for each.
Skim all the materials to be covered, checking those to be more intensively
studied.
Write down all the key topics covered in class and in your reading up until the test
date.
Read or reread all materials not understood; use a specific purpose when reading.
Develop a pool of information for each topic. Answering words like "who,"
"what," "where located," "how works," "key characteristics," "cause-effect," and
"examples" for each topic will help to cover the critical information.

Tips for Grading Short Answer/Essay Questions
When grading Short Answer/Essay questions, instructors should focus primarily on two
goals: consistency and fairness. Consistency refers to the extent to which the same points
are awarded or subtracted for comparable information across students. Two students
making the same or comparable misinterpretations should receive the same deductions.
Fairness refers to the extent to which the points assigned or deducted reflect the
weighting of objectives in the test blueprint. If, for example, students are asked to solve a
series of related problems (e.g., the answer from one problem is used to solve another
problem), getting an intermediate step wrong should only result in losing points once.

The second problem presumably relates to a different objective, and if the problem is
solved correctly (given that the wrong initial value was used for one part of the problem),
full points should be awarded. Achieving consistency and fairness while scoring
constructed-response (CR) items is challenging. Below are a few guidelines which may
help achieve these two goals.

1. Construct a detailed scoring rubric that identifies the basis for awarding
or subtracting points at each phase of each item. To do this, it may be
helpful to develop a model answer and think about the essential elements
in producing that answer. Pay careful attention to how to score errors of
omission and commission. While establishing your rubric, be cognizant of
the total number of points available for the item and make sure that it is
not possible to receive lower than zero points or more than the total
number of points for each item.
2. Short Answer/Essay items should be graded anonymously if at all possible
to reduce the subjectivity of graders. That is, graders should not be
informed as to the identity of the examinees whose papers they are
grading.
3. Grade all students' responses to one question before moving on to grade
the second question. This helps the grader maintain a single set of criteria
for awarding points. In addition, it tends to reduce the influence of the
examinee's previous performance on other items. If multiple graders are
used and it is not possible for all graders to rate all items for all students, it
is better to have each grader score a particular problem or two for all
students than to have each grader score all problems for only a subset of
students. This strategy is effective for eliminating effects due to one
person grading harder than another.
4. While grading a question, maintain a log of the types of errors observed
and their corresponding deductions. It is very difficult to anticipate every
error you will see, but this will allow you to maintain consistency across
exams. It may be necessary to re-examine some questions that had
already been graded to verify that the point deductions are consistent and
fair.
5. Unless writing skill is one of the course objectives, do not take off credit
for poor grammar, spelling errors, or failure to punctuate properly, unless
the quality of writing clearly interferes with your ability to understand
whether the student has adequately grasped the material. Never grade on
the basis of penmanship. CR items are difficult and time-consuming to
grade, but with carefully planned and methodically implemented grading
criteria, they can provide a richness of information not available through
only MC items.
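
Guidelines 1 and 4 can be supported with a small scoring helper. The sketch below is an illustration under stated assumptions, not the handout's own procedure: score_response keeps an item score between zero and the item's total points, and DeductionLog (a hypothetical class) records the first deduction used for each type of error so that later papers with the same error lose the same number of points.

```python
def score_response(max_points, credits, deductions):
    """Return rubric credits minus deductions, kept within 0..max_points."""
    raw = sum(credits.values()) - sum(deductions.values())
    return max(0, min(max_points, raw))


class DeductionLog:
    """Remember the deduction first applied to each error type."""

    def __init__(self):
        self._deductions = {}

    def deduction_for(self, error_type, proposed_points):
        # The first grader decision for an error type is reused for every later paper.
        return self._deductions.setdefault(error_type, proposed_points)


log = DeductionLog()
# First paper: the grader decides that missing units cost 1 point.
first = score_response(
    5,
    {"set-up": 2, "calculation": 3},
    {"missing units": log.deduction_for("missing units", 1)},
)
# Later paper: the grader proposes 2 points, but the logged value of 1 is applied.
later = score_response(
    5,
    {"set-up": 2, "calculation": 3},
    {"missing units": log.deduction_for("missing units", 2)},
)
print(first, later)
```

If the log shows that a deduction was changed partway through grading, the papers graded earlier are the ones to re-examine.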

Get feedback from the class about the test. Ask students to tell you what was
particularly difficult or unexpected.
