PSC 352 Handout Planning A Test
Planning a Test
1. Outline learning objectives or major concepts to be covered by the test
Test should be representative of objectives and material covered
Major student complaint: Tests do not fairly cover the material that was
supposed to be canvassed on the test.
2. Create a test blueprint
3. Create questions based on blueprint
Match the question type with the appropriate level of learning
4. For each check on the blueprint, jot down 3-4 alternative question ideas and
item types that will get at the same objective
5. Organize questions and/or ideas by item types
6. Eliminate similar questions
7. Organize questions logically
8. Time yourself actually taking the test, then multiply that time by about 4, depending
on the level of the students
9. Analyze the results (item analysis)
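A test blueprint (step 2) is commonly laid out as a grid of content areas against cognitive levels, with the number of items planned for each cell. The Python sketch below is illustrative only. The content areas, item counts, and helper names are hypothetical, and `estimate_test_time` simply applies the rough factor of 4 from step 8.

```python
# Hypothetical blueprint: content area -> {Bloom level: number of planned items}.
BLOOM_LEVELS = ["Knowledge", "Comprehension", "Application",
                "Analysis", "Synthesis", "Evaluation"]

blueprint = {
    "Item writing":  {"Knowledge": 4, "Comprehension": 3, "Application": 2},
    "Test planning": {"Knowledge": 2, "Application": 3, "Evaluation": 1},
    "Item analysis": {"Comprehension": 2, "Analysis": 3},
}


def items_per_level(bp):
    """Total planned items at each Bloom level, summed over content areas."""
    totals = {level: 0 for level in BLOOM_LEVELS}
    for cells in bp.values():
        for level, count in cells.items():
            totals[level] += count
    return totals


def content_weights(bp):
    """Fraction of all planned items devoted to each content area."""
    grand_total = sum(sum(cells.values()) for cells in bp.values())
    return {area: sum(cells.values()) / grand_total for area, cells in bp.items()}


def estimate_test_time(instructor_minutes, factor=4):
    """Step 8: scale your own completion time by about 4 (adjust for student level)."""
    return instructor_minutes * factor


print(items_per_level(blueprint))
print(content_weights(blueprint))
print(estimate_test_time(12))  # 48
```

Comparing `content_weights` with the class time spent on each topic is one way to check that the test is representative of the material covered.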
Thinking Skills
What level of learning corresponds to the course content?
Bloom's Taxonomy of Educational Objectives
Knowledge (see handout)
Comprehension
Application
Analysis
Synthesis
Evaluation
Practical Considerations
Representative sample of the course content: not random, but purposeful and based
on the blueprint
Representative sample of skill or cognitive levels across content
Analyze results by level AND content area
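One way to analyze results by level AND content area is to tag every item with its blueprint cell and report the percent correct for each grouping. The item tags and response data below are hypothetical. The sketch only assumes that each answer has already been scored 1 (right) or 0 (wrong).

```python
from collections import defaultdict

# Hypothetical tags: item number -> (content area, Bloom level) from the blueprint.
item_tags = {
    1: ("Item writing", "Knowledge"),
    2: ("Item writing", "Application"),
    3: ("Item analysis", "Knowledge"),
    4: ("Item analysis", "Analysis"),
}

# Hypothetical scored answers: one dict per student, item number -> 1 (right) or 0 (wrong).
responses = [
    {1: 1, 2: 0, 3: 1, 4: 0},
    {1: 1, 2: 1, 3: 1, 4: 0},
    {1: 0, 2: 0, 3: 1, 4: 1},
]


def percent_correct_by(tags, scored, group_of):
    """Percent correct for each group, where group_of(area, level) names the group."""
    correct = defaultdict(int)
    attempts = defaultdict(int)
    for student in scored:
        for item, score in student.items():
            group = group_of(*tags[item])
            correct[group] += score
            attempts[group] += 1
    return {group: 100 * correct[group] / attempts[group] for group in attempts}


by_content = percent_correct_by(item_tags, responses, lambda area, level: area)
by_level = percent_correct_by(item_tags, responses, lambda area, level: level)
by_cell = percent_correct_by(item_tags, responses, lambda area, level: (area, level))
print(by_content)
print(by_level)
```

A low score confined to one level (for example, Application items across every topic) points to a different problem than a low score confined to a single content area.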
Choosing between objective and subjective test items
There are two general categories of test items: (1) objective items which require students
to select the correct response from several alternatives or to supply a word or short phrase
to answer a question or complete a statement; and (2) subjective or essay items which
permit the student to organize and present an original answer. Objective items include
multiple-choice, true-false, matching and completion, while subjective items include
short-answer essay, extended-response essay, problem-solving and performance test
items. For some instructional purposes, one item type or the other may prove more
efficient and appropriate.
In addition to the preceding suggestions, it is important to realize that certain item types
are better suited than others for measuring particular learning objectives. For example,
learning objectives requiring the student to demonstrate or to show may be better
measured by performance test items, whereas objectives requiring the student to explain
or to describe may be better measured by essay test items. The matching of learning
objective expectations with certain item types can help you select an appropriate kind of
test item for your classroom exam as well as provide a higher degree of test validity (i.e.,
testing what is supposed to be tested). To further illustrate, several sample learning
objectives and appropriate test items are provided on the following page.
After you have decided to use an objective exam, an essay exam, or both, the next
step is to select the kind(s) of objective or essay item that you wish to
include on the exam. To help you make such a choice, the different kinds of objective and
essay items are presented in the following section of this booklet. The various kinds of
items are briefly described and compared to one another in terms of their advantages and
limitations for use. Also presented is a set of general suggestions for the construction of
each item variation.
Grammatical errors provide unintentional clues to the answer
When in doubt, students will select the longest alternative as the correct answer
Make one of the alternatives the most clearly correct or best answer
exception: multiple answer form
reduces intrinsic ambiguity
reduces frustration during test
Make distracters plausible
desire to attract students who really do NOT know the answer to the
question
create distracters from elements of the correct response
improves reliability of item
Avoid parallel language between the Stem and the Correct Response
gives clues to keyed response
emphasizes test-wiseness, not knowledge
Randomly distribute answers across the alternative positions
inexperienced test writers overuse the b and c positions (to "hide" the
answer)
do NOT use a predictable pattern of keyed responses
Use alternatives such as "all of the above" and "none of the above" sparingly
test-wise students will use the process of elimination to select the answer
do NOT use to pad out the distracters because you cannot think of
another one.
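Shuffling the alternatives with a random-number generator, rather than placing the correct answer by hand, avoids the drift toward the b and c positions noted above. Here is a minimal sketch. It assumes each item is stored as its correct answer plus a list of distinct distracters, which is a hypothetical format, and it reuses the photosynthesis alternatives from guideline 6.

```python
import random
import string


def shuffle_item(correct, distracters, rng=random):
    """Randomly order one item's alternatives; return them with the keyed letter.

    Assumes the correct answer and the distracters are all distinct strings.
    """
    alternatives = [correct, *distracters]
    rng.shuffle(alternatives)
    key_letter = string.ascii_lowercase[alternatives.index(correct)]
    return alternatives, key_letter


rng = random.Random(2024)  # seeded so the same answer key can be regenerated later
alternatives, key = shuffle_item(
    "Respiration", ["Digestion", "Assimilation", "Catabolism"], rng)
for letter, text in zip(string.ascii_lowercase, alternatives):
    print(f"{letter}. {text}")
print("Key:", key)
```

A seeded `random.Random` instance makes the shuffle reproducible, so the printed test and its answer key can always be regenerated together.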
statement.
Undesirable: Alloys are ordinarily produced by ...
Desirable: How are alloys ordinarily produced?
4. Include in the stem any word(s) that might otherwise be repeated in each
alternative.
Undesirable: Consider two steel tapes. Tape A measures 100 m at 70°C whereas tape B
measures 200 m at 70°C. If the temperature decreases from 70°C to 40°C,
then
a) the length of tape A will decrease more than the length of tape B.
b) the length of tape B will decrease more than the length of tape A.
c) the length of each tape will decrease by the same amount.
Desirable: Consider two steel tapes. Tape A measures 100 m at 70°C whereas tape B
measures 200 m at 70°C. If the temperature decreases from 70°C to 40°C,
then the length of
5. Use negatively stated stems sparingly. When used, bold or emphasise the negative
word.
Undesirable: Which of the following characters is not acquired through heredity?
A. Language spoken
B. Shape of nose
C. Colour of eyes
D. Temperament
Desirable: Which of the following characters is NOT acquired through heredity?
A. Language spoken
B. Shape of nose
C. Colour of eyes
D. Temperament
6. Make all alternatives plausible and attractive to the less knowledgeable or skillful
student.
Which of the following processes is most nearly the opposite of photosynthesis?
Undesirable Desirable
a. Digestion        a. Digestion
b. Relaxation       b. Assimilation
*c. Respiration     *c. Respiration
d. Exertion         d. Catabolism
7. Make the alternatives grammatically parallel with each other, and consistent with the
stem.
Undesirable: What would do most to advance the application of atomic discoveries to
medicine?
*a. Standardized techniques for treatment of patients.
b. Train the average doctor to apply radioactive treatments.
c. Remove the restriction on the use of radioactive substances.
d. Establishing hospitals staffed by highly trained radioactive therapy
specialists.
d. 4 glasses.
9. When possible, present alternatives in some logical order (e.g., chronological, most to
least, alphabetical).
At 7 a.m. two trucks leave a diner and travel north. One truck averages 42 miles per hour
and the other truck averages 38 miles per hour. At what time will they be 24 miles apart?
Undesirable Desirable
a. 6 p.m. a. 1 a.m.
b. 9 a.m. b. 6 a.m.
c. 1 a.m. c. 9 a.m.
*d. 1 p.m. *d. 1 p.m.
e. 6 a.m. e. 6 p.m.
10. Be sure there is only one correct or best response to the item
Undesirable: The two most desired characteristics in a classroom test are validity and
a. precision.
*b. reliability.
c. objectivity.
*d. consistency.
Desirable: The two most desired characteristics in a classroom test are validity and
a. precision.
*b. reliability.
c. objectivity.
d. standardization.
11. Make alternatives approximately equal in length.
Undesirable: Which of the following statements is the best definition of
refraction?
A. Passing through a boundary
*B. Changing direction when crossing a boundary
C. Bouncing off a boundary
D. Changing speed at a boundary
12. Avoid irrelevant clues such as grammatical structure, well known verbal associations
or connections between stem and answer.
Undesirable (grammatical clue): A chain of islands is called an:
*a. archipelago.
b. peninsula.
c. continent.
d. isthmus.
Undesirable (verbal association clue): The reliability of a test can be estimated by a
coefficient of:
a. measurement.
*b. correlation.
c. testing.
d. error.
Undesirable (connection between stem and answer clue): The height to which a water
dam is built depends on
a. the length of the reservoir behind the dam.
b. the volume of water behind the dam.
*c. the height of water behind the dam.
d. the strength of the reinforcing wall.
13. Use at least four alternatives for each item to lower the probability of getting the item
correct by guessing.
14. Randomly distribute the correct response among the alternative positions throughout
the test, so that a, b, c, d and e each serve as the correct response in approximately
the same proportion of items.
15. Avoid using the alternatives "none of the above" and "all of the above".
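The keyed answer to guideline 9's truck item can be checked with simple arithmetic. The gap between the trucks grows at 42 - 38 = 4 miles per hour, so reaching 24 miles takes 24 / 4 = 6 hours, and 7 a.m. plus 6 hours is 1 p.m. The sketch below repeats that check and puts time-of-day alternatives into chronological order. `clock_order` is a hypothetical helper that assumes labels written like "6 p.m.".

```python
from datetime import datetime, timedelta

# Guideline 9's truck item: the gap between the trucks grows at the difference in speeds.
start = datetime(2000, 1, 1, 7, 0)        # 7 a.m. (the date is arbitrary)
relative_speed_mph = 42 - 38              # 4 mph
hours_needed = 24 / relative_speed_mph    # 6.0 hours
answer = start + timedelta(hours=hours_needed)
print(answer.time())                      # 13:00:00, i.e. 1 p.m.


def clock_order(labels):
    """Sort labels such as '6 p.m.' from midnight onward (hypothetical helper)."""
    def minutes_after_midnight(label):
        hour_text, period = label.split()
        hour = int(hour_text) % 12        # 12 a.m. maps to 0; 12 p.m. becomes 12 below
        if period == "p.m.":
            hour += 12
        return hour * 60
    return sorted(labels, key=minutes_after_midnight)


# A scrambled copy of the Desirable alternatives.
print(clock_order(["6 p.m.", "1 p.m.", "1 a.m.", "9 a.m.", "6 a.m."]))
```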
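Guideline 11 and the earlier warning that students pick the longest alternative when in doubt suggest a quick check before a test is printed: flag any alternative that is much longer than the others. Here is a minimal sketch applied to the refraction example. The 1.25 ratio is an arbitrary, hypothetical threshold, and a flag only means the item deserves a second look.

```python
from statistics import mean


def overlong_alternatives(alternatives, ratio=1.25):
    """Indices of alternatives longer than `ratio` x the mean length of the rest."""
    flagged = []
    for i, text in enumerate(alternatives):
        others = [len(a) for j, a in enumerate(alternatives) if j != i]
        if len(text) > ratio * mean(others):
            flagged.append(i)
    return flagged


refraction = [
    "Passing through a boundary",
    "Changing direction when crossing a boundary",  # keyed answer B
    "Bouncing off a boundary",
    "Changing speed at a boundary",
]
print(overlong_alternatives(refraction))  # [1], i.e. alternative B
```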
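Guidelines 13 and 14 can both be expressed as numbers. A blind guess on a four-alternative item succeeds with probability 1/4, so a student who guesses on every item of a hypothetical 40-item test would expect about 10 correct. The binomial model below assumes independent, purely random guesses. It gives the chance of reaching a higher score and shows that this chance shrinks as alternatives are added. `balanced_key` is a hypothetical helper that uses each letter as the keyed position equally often (to within one) and then shuffles the order.

```python
import random
from math import comb


def balanced_key(n_items, letters="abcd", rng=random):
    """Keyed positions used (nearly) equally often, in shuffled order (hypothetical helper)."""
    key = [letters[i % len(letters)] for i in range(n_items)]
    rng.shuffle(key)
    return key


def chance_at_least(k, n_items, n_alternatives):
    """Probability that blind guessing yields at least k correct (binomial model)."""
    p = 1 / n_alternatives
    return sum(comb(n_items, j) * p**j * (1 - p)**(n_items - j)
               for j in range(k, n_items + 1))


key = balanced_key(40, rng=random.Random(7))
print({letter: key.count(letter) for letter in "abcd"})  # 10 of each
print(40 / 4)                                             # expected chance score: 10.0
print(chance_at_least(15, 40, 4), chance_at_least(15, 40, 3))
```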
-Test-wise students will recognize that there are few absolutes.
Avoid negative words, as they are often overlooked by students.
Do not include two ideas in one statement unless you are evaluating students'
understanding of cause-and-effect relationships.
Poor: Porpoises are able to communicate because they are mammals. T F
Better: Porpoises are mammals. T F
Porpoises are able to communicate. T F
Provide a T and F beside each statement and ask students to circle the correct
answer.
-Avoids the problem of students writing illegible letters.
Include more false than true statements in any given test and vary the number of
false statements from test to test.
-Students tend to mark more statements true than false.
-Discrimination between those who know the content and those who do not is
greater for false statements.
Avoid using negative statements.
-Under the demands of the testing situation, students may fail to see the negative
qualifier.
Matching Items
Consist of
a column of premises
a column of responses
directions for matching the two.
Similar to multiple choice, but easier and more efficient to construct
Can be written to assess Knowledge, Comprehension, Application, and Analysis level
behaviors
Provide clear instructions on how to indicate the correct answers.
Indicate whether the same response can be used more than once.
Maintain grammatical consistency within and between columns.
within a column: either sentence or point form
between columns: one or the other
Ensure that any matching question appears entirely on one page.
Identify the items in one list with numbers and those in the second list with letters.
Comparison of Multiple-Choice (MC) & Essay
Depth of learning
Essay: Can measure application and more complex outcomes. Poor for recall.
MC: Can be designed to measure application and more complex outcomes as well as recall.
Item preparation
Essay: Fewer test items, less prep time.
MC: Relatively large number of items, more prep time.
Content sampling
Essay: Limited, few items.
MC: Broader content sampling.
Encouragement
Essay: Encourages organization, integration & effective expression of ideas.
MC: Easy to score with consistent results.
Item Analysis
Main purpose of item analysis is to improve the test
Analyze items to identify:
Potential mistakes in scoring
Ambiguous/tricky items
Alternatives that do not work well
Problems with time limits
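The classical statistics behind an item analysis are the difficulty index p, the proportion of students who answer the item correctly, and a discrimination index D, the proportion correct in a high-scoring group minus the proportion correct in a low-scoring group. The sketch below uses the common upper/lower 27% convention. That convention is an assumption, since this handout does not prescribe one, and the data are hypothetical. A negative D, or an alternative that nobody chooses, marks an item worth reviewing.

```python
from collections import Counter


def item_statistics(choices, totals, key, letters="abcd", group_fraction=0.27):
    """Classical statistics for one multiple-choice item.

    choices: letter each student chose on this item
    totals:  each student's total test score, in the same order as choices
    key:     the keyed (correct) letter
    """
    n = len(choices)
    p = sum(c == key for c in choices) / n            # difficulty index

    # Upper and lower groups by total test score.
    ranked = sorted(range(n), key=lambda i: totals[i], reverse=True)
    g = max(1, round(group_fraction * n))
    p_upper = sum(choices[i] == key for i in ranked[:g]) / g
    p_lower = sum(choices[i] == key for i in ranked[-g:]) / g

    counts = Counter(choices)
    return {
        "p": p,
        "D": p_upper - p_lower,                        # discrimination index
        "counts": dict(counts),
        "unused": [x for x in letters if counts[x] == 0],  # alternatives nobody chose
    }


# Hypothetical data for ten students.
choices = ["c", "c", "c", "b", "c", "a", "b", "a", "b", "a"]
totals = [95, 90, 88, 80, 75, 70, 60, 55, 50, 40]
print(item_statistics(choices, totals, key="c"))
```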
Deciding When Multiple-Choice Items Should Be Used
In order for scores to accurately represent the degree to which a student has attained an
educational objective, it is essential that the form of test item used in the assessment be
suitable for the objective. Multiple-choice test items are often advantageous to use, but
they are not the best form of test item for every circumstance. In general, they are
appropriate to use when the attainment of the educational objective can be measured by
having the student select his or her response from a list of several alternative responses.
Tips for Grading Short Answer/Essay Questions
When grading Short Answer/Essay questions, instructors should focus primarily on two
goals: consistency and fairness. Consistency refers to the extent to which the same points
are awarded or subtracted for comparable information across students. Two students
making the same or comparable misinterpretations should receive the same deductions.
Fairness refers to the extent to which the points assigned or deducted reflect the
weighting of objectives in the test blueprint. If, for example, students are asked to solve a
series of related problems (e.g., the answer from one problem is used to solve another
problem), getting an intermediate step wrong should result in losing points only once.
The second problem presumably relates to a different objective, so if it is solved
correctly using the wrong value carried over from the first problem, full points
should be awarded. Achieving consistency and fairness while scoring constructed-response
(CR) items is challenging. Below are a few guidelines that may help achieve these two goals.
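The rule that a student loses points only once for linked problems is sometimes called error-carried-forward scoring. Here is a minimal sketch. It assumes numeric answers, hypothetical point values of 5 per part, and a hypothetical two-part version of the truck item, in which part 1 asks for the relative speed and part 2 asks how many hours it takes the trucks to be 24 miles apart. Part 2 is checked against the student's own part-1 value.

```python
def score_linked_parts(part1, part2, correct_part1, part2_from, points=(5, 5), tol=1e-6):
    """Error-carried-forward scoring for two linked numeric answers.

    part2_from: function returning the correct part-2 answer for a given part-1 value.
    Part 2 is checked against the student's OWN part-1 value, so a mistake in
    part 1 costs points only once.
    """
    score = points[0] if abs(part1 - correct_part1) <= tol else 0
    if abs(part2 - part2_from(part1)) <= tol:
        score += points[1]
    return score


def hours_for_gap(speed):
    """Hypothetical rule: hours for the trucks to be 24 miles apart at a given relative speed."""
    return 24 / speed if speed else float("inf")   # a speed of 0 never reaches the gap


print(score_linked_parts(4, 6.0, 4, hours_for_gap))   # 10: both parts correct
print(score_linked_parts(5, 4.8, 4, hours_for_gap))   # 5: part 1 wrong, part 2 consistent with it
print(score_linked_parts(5, 6.0, 4, hours_for_gap))   # 0: part 2 inconsistent with the student's value
```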
1. Construct a detailed scoring rubric that identifies the basis for awarding
or subtracting points at each phase of each item. To do this, it may be
helpful to develop a model answer and think about the essential elements
in producing that answer. Pay careful attention to how to score errors of
omission and commission. While establishing your rubric, be cognizant of
the total number of points available for the item and make sure that it is
not possible to receive fewer than zero points or more than the total
number of points for the item.
2. Short Answer/Essay items should be graded anonymously if at all possible
to reduce the subjectivity of graders. That is, graders should not be
informed as to the identity of the examinees whose papers they are
grading.
3. Grade all students' responses to one question before moving on to grade
the next question. This helps the grader maintain a single set of criteria
for awarding points. In addition, it tends to reduce the influence of the
examinees' previous performance on other items. If multiple graders are
used and it is not possible for all graders to rate all items for all students, it
is better to have each grader score a particular problem or two for all
students than to have each grader score all problems for only a subset of
students. This strategy is effective for eliminating effects due to one
person grading harder than another.
4. While grading a question, maintain a log of the types of errors observed
and their corresponding deductions. It is very difficult to anticipate every
error you will see, but the log will allow you to maintain consistency across
exams. It may be necessary to re-examine some questions that had
already been graded to verify that the point deductions are consistent and
fair.
5. Unless writing skill is one of the course objectives, do not deduct credit
for poor grammar, spelling errors, or improper punctuation, except where
the quality of writing clearly interferes with your ability to judge
whether the student has adequately grasped the material. Never grade on
the basis of penmanship. CR items are difficult and time-consuming to
grade, but with carefully planned and methodically implemented grading
criteria, they can provide a richness of information not available from
MC items alone.
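Guidelines 1 and 4 can be combined in a simple scoring record with three parts: points for the essential elements of a model answer, a log of error types whose deductions are fixed the first time each type is seen, and a final score clamped to the item's range. This is a hypothetical structure, not a prescribed format. The names `ItemRubric`, `elements`, and `deductions` are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ItemRubric:
    """Hypothetical scoring record for one short-answer/essay item."""
    max_points: int
    elements: dict                                   # essential element -> points if present
    deductions: dict = field(default_factory=dict)   # error log: error type -> points deducted

    def log_error(self, error_type, points):
        """Guideline 4: fix a deduction the first time an error type appears."""
        return self.deductions.setdefault(error_type, points)

    def score(self, elements_present, errors):
        earned = sum(self.elements[e] for e in elements_present)
        lost = sum(self.deductions[e] for e in errors)
        # Guideline 1: never below zero or above the item's total.
        return max(0, min(self.max_points, earned - lost))


rubric = ItemRubric(10, {"thesis": 4, "evidence": 4, "conclusion": 2})
rubric.log_error("misreads the data", 3)
rubric.log_error("misreads the data", 5)   # ignored: the logged deduction (3) stands
print(rubric.score(["thesis", "evidence", "conclusion"], []))   # 10
print(rubric.score(["thesis"], ["misreads the data"]))           # 1
```

If a logged deduction later seems too harsh or too lenient, you can assign a new value to `rubric.deductions[...]` and re-score the papers already graded. This is one way to carry out the re-examination described in guideline 4.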
Get feedback from the class about the test. Ask students to tell you what was
particularly difficult or unexpected.