Lesson 4 - Lecture
Introduction
The previous lesson provided an overview of the whole process of designing and
implementing teacher-made assessment tools, and the emphasis of the whole process is
the writing of the test itself. Teachers, to be effective evaluators of learning, must
therefore be equipped with skills in writing good tests. While it is generally
accepted that good tests do not happen overnight, a set of guidelines for writing specific
types of tests will help a prospective or beginning teacher ensure that
learning targets are assessed properly and appropriately.
This lesson provides guidelines for writing the most commonly used
types of teacher-made tests. These guidelines are not rules, but rather a set of
suggestions that will ensure, at the very least, that test items are written properly and,
more broadly, that the test as a whole is of good quality. Each type of test is described,
its strengths and limitations are identified, and examples are given to illustrate the
guidelines.
Objectives
The most common and effective way to assess knowledge is simply to ask a
question and require the students to answer it from memory. Items in which the students
respond to an incomplete statement are called completion or fill-in-the-blank items. This
type offers the least freedom of student response, calling for one answer at the end of the
sentence. Responses may be in the form of words, numbers or symbols.
Strengths:
1. They are easy to construct
2. Their short response time allows a good sampling of different facts
3. Guessing contributes little to error
4. Scorer reliability is high
5. They can be scored more quickly than short-answer or essay items
6. They provide more valid results than a test with an equal number of selected-
response items (e.g., multiple choice)
Limitations:
Example: The textbook statement is “The criterion that refers to the extent to
which the test yields consistent, dependable and stable scores is called reliability”.
Poor item: The criterion that refers to the extent to which the test yields
consistent, dependable and stable scores is called _____________.
Improved: When a test yields consistent, dependable and stable scores, the test is
said to be _______________.
2. Word the sentence so that only one brief answer is correct. The single greatest error in
writing completion items is to use sentences that can be legitimately completed with
more than one response. This is true when the sentence is open-ended. Avoid
indefinite statements.
Poor: Jose Rizal was born in ________. (“in” may mean year or place)
Improved: Jose Rizal was born in the year ______.
Or Jose Rizal was born in the province of ______.
3. Place one or two blanks at the end of the sentence. If blanks are placed at the
beginning or in the middle, it may be difficult for the students to understand what
response is called for. It is easier to first read the sentence and then determine what
will complete it correctly. (That is why it is called a completion item.)
Improved: The name of the president who decided to have the atomic bomb
dropped on Japan in 1945 was ______.
4. Do not include several blanks in a single sentence. This will confuse students and
measure reasoning skills as much as, if not more than, recall. Avoid overmutilated statements.
Poor: The name of the ______ who decided to have the ______ dropped
on _______ in 1945 was _______.
Improved: The US President who decided to have the atomic bomb dropped
on _______ in 1945 was _______.
Poor: The distance between the moon and the Earth is _______.
Improved: The distance between the moon and the Earth is _______ miles.
6. Do not include clues to the correct answer. Common wording errors are using
singular or plural verbs that give the answer away and wording the sentence so that the
blank is preceded by “a” or “an”.
Poor: The supply-type item used to measure the ability to organize and
integrate material is called an ______.
Improved: Supply-type items used to measure the ability to organize and
integrate materials are called ______.
8. Make the blanks of uniform length. Blanks of unequal length may provide clues to the
answers.
Poor: Jose Rizal wrote the novels ______ and ________________ .
Improved: Jose Rizal wrote the novels _____________ and _____________ .
Poor: The two types of teacher-made tests are ____________ and ____.
Improved: The two types of teacher-made tests are ________ and ________.
9. Allow one point for each correctly filled blank. Avoid fractional credits or unequal
weighting of items based on difficulty or importance, because complicating scoring
usually fails to improve reliability or validity (Stanley, 1964).
Poor: The part of the flower that produces seeds is ________ (2 pts) while
the part that attracts insects is ________ (1 pt).
Poor: The authors of the first performance test of intelligence were ____.
Improved: The first performance test of intelligence was prepared by ______.
11. Avoid unordered series within an item. Such items may be difficult to score. Request
that the answers be listed in a specific order, such as alphabetical or numerical.
Poor: The five major parts of a plant are _______, _______, _______,
________ and ________.
Improved: The five major parts of a plant from the lowest part are 1)
_______, 2)_______, 3)_______, 4) ________ and 5) ________.
12. For easy scoring, prepare a scoring key. Try to choose statements in which there is
only one correct response for the blanks. The required response should be a single
word or a brief phrase. Arrange the items so that the answers are in a column at the
right of the sentences.
13. Refrain from using a particular grammatical form, common expression or well-known
saying as a completion item.
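Guidelines 9 and 12 above (one point per correctly filled blank, scored against a prepared key) can be sketched as a short script. This is only an illustration: the function names, the sample key and the sample answer sheet are hypothetical, and the choice to ignore differences in capitalization and spacing is an assumption, not something the guidelines prescribe.

```python
def normalize(answer):
    """Lowercase and collapse whitespace so trivial differences are not penalized."""
    return " ".join(answer.lower().split())


def score_completion(responses, key):
    """Return one point for each blank whose response matches the scoring key.

    responses and key are dicts mapping item number -> answer string.
    A missing or wrong response earns nothing; no fractional credit is given.
    """
    return sum(
        1
        for item, correct in key.items()
        if normalize(responses.get(item, "")) == normalize(correct)
    )


# Hypothetical scoring key and answer sheet, for illustration only.
key = {1: "reliability", 2: "validity", 3: "essay"}
student = {1: "Reliability", 2: "stability", 3: " essay "}
print(score_completion(student, key))  # prints 2
```

Because every blank carries the same weight, the total is simply a count of matches, which keeps scoring quick and consistent from one scorer to another.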
II. SHORT ANSWER TEST
This type, in which students supply an answer consisting of one word, a few
words, or a sentence or two, is generally preferred to completion items for assessing
recall targets. First, this type is similar to how teachers phrase questions and direct
student behavior during instruction, making questions more natural for the students.
Second, it is easier for teachers to write these items to more accurately measure
knowledge.
Short-answer items are usually stated in the form of a question (e.g., “What is the
largest planet in the solar system?”). They can also be stated as general directions (e.g.,
“Define each of the following terms”), and they can require responses to visual stimulus
materials (e.g., “Name each of the countries identified with arrows A-D”).
Strengths:
1. They are easy to construct
2. Their short response time allows a good sampling of different facts
3. Guessing contributes little to error
4. They can be scored more quickly than essay items
5. They provide more valid results than a test with an equal number of selected-
response items (e.g., multiple choice)
Limitations:
1. Scoring takes longer
2. Scoring is more subjective.
2. State the item so that the required answer is brief. Keep students’ responses to a word
or two, or a short sentence or two if necessary, by properly wording the item, offering
clear directions and providing spaces or blanks that indicate the length of the
responses.
Poor: What does the term reptile mean?
______________________________________________________
______________________________________________________
______________________________________________________
Improved: Name three characteristics of reptiles.
1. ________________
2. ________________
3. ________________
3. Do not use questions verbatim from the textbooks and other instructional materials.
This will discourage students from rote memorization.
4. Designate the units required for the answer. This saves the time students would
otherwise spend trying to figure out what is wanted.
5. State the items using a few words that students understand. Avoid using words or
phrases that may be difficult for some students to understand.
Poor: What was the name of the extraordinary president of the United
States who earlier had used his extensive military skills in a
protracted war with exemplary soldiers from another country?
Improved: Who was the United States general who earlier defeated the British
and later became president?
Poor: What do you call the kind of transfer of pollen grain which
requires the pollinator, either insect or wind, to transfer the pollen
grain from one flower to another?
Improved: What is the process of transfer of pollen grain from one flower to
another?
Generally, short-answer tests are used to measure lower-order thinking skills, but
they can also be used to measure more complex thinking skills. Short-answer
items can assess thinking skills when students are required to supply a brief response to a
question or situation that can be understood only by using the targeted skills.
Reasoning tasks such as decision making and critical thinking, however, are not assessed
very well with short-answer items. The following are examples of short-answer items and
the specific higher-order thinking skills they measure.
Examples
(Comparing)
(Deductive reasoning)
Coach Mike substitutes his basketball players by height, so that the first
substitute is the tallest player on the bench, the next substitute is the next tallest,
and so forth. Reginald is taller than Sam, and Juan is taller than Reginald. Which
of these players should Coach Mike play first?
(Analysis/prediction)
The principal needs to decide if the new block schedule allows teachers to
go into topics in greater detail. He can ask a parent, teacher, or a principal from
another school. Who should he ask to get the most objective answer?
(Investigating)
Several paper towel companies claim that their products absorb more
liquid than the other brands. Design an experiment to test the absorbency of each
brand of paper towel.
(Analysis)
List the anatomical structures of the kidney, explain the function of each
part, and describe how they all work together.
Matching items effectively and efficiently measure the extent to which students
know related facts, associations, and relationships. In a matching item, the entries on the
left are called the premises (the question column), and those on the right are the responses
(the option column). The students’ task is to match the correct response with each
premise.
Strengths:
1. The teacher can obtain a very good sampling of recall knowledge.
2. It is easily and objectively scored.
3. Easier to construct than multiple choice items
4. Reading and response time is short
Limitations:
1. This item type is largely restricted to simple knowledge outcomes based on
association
2. When there is insufficient material to include in the item, the items are weak
measures because irrelevant information is added.
Example 1:
Directions: Match each description in Column A with a planet in
Column B. Write the letter of the planet next to the number of the corresponding
description. Each planet in Column B may be used once, more than once, or not at all.
Column A                                                  Column B
_____ 1. It is the nearest planet to the sun              a. Pluto
_____ 2. It is the largest planet                         b. Neptune
_____ 3. It is also referred to as the red planet         c. Uranus
_____ 4. It is known as the Earth’s sister planet         d. Saturn
_____ 5. It is the third planet from the sun              e. Jupiter
_____ 6. It is considered as the most beautiful planet    f. Mars
_____ 7. It is the farthest planet from the sun           g. Earth
_____ 8. It is the planet with five satellites            h. Venus
                                                          i. Mercury
Example 2:
Directions: Filipino presidents are listed in Column B, and descriptive phrases relating to
their administrations are listed in Column A. Write the letter of the president that each
phrase describes in the space provided. Each match is worth 1 point. Two of the options
cannot be matched to any of the items.
Column A                                                         Column B
___1. His term was described as the Philippines’ Golden Years    a. Emilio Aguinaldo
___2. He was known for the “Filipino First Policy”               b. Manuel Quezon
___3. He was the first president of the Philippine Republic      c. Ramon Magsaysay
___4. He was considered the Father of the National Language      d. Ferdinand Marcos
___5. He established the first Land Reform Law                   e. Diosdado Macapagal
                                                                 f. Carlos Garcia
                                                                 g. Elpidio Quirino
IV. TRUE OR FALSE AND OTHER BINARY-CHOICE ITEMS
When students select an answer from only two response categories, they are
completing a binary-choice item, sometimes called an alternative-response item. The most
popular binary-choice item is the true/false question. Other types of options can be
right/wrong, correct/incorrect, yes/no, fact/opinion, agree/disagree, and so on.
Binary-choice items are constructed in the form of a propositional (declarative) statement,
and the statement must be absolutely true or false, correct or incorrect, and so on.
Strengths:
1. Students are familiar with the items because such questions are similar to
what is asked in class.
2. Short binary items provide for an extensive sampling of knowledge because
students are able to answer many items in a short time
3. Items can be written in short, easy-to-understand sentences.
4. Scoring is objective and quick.
Limitations:
1. It is susceptible to guessing.
2. It is difficult to write items beyond the knowledge level that are free from
ambiguity
3. No diagnostic information is provided by the incorrect answers.
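The first limitation can be made concrete with a little arithmetic. The sketch below assumes a student who guesses blindly and independently on every item, and a hypothetical 10-item test with a passing mark of 6; neither figure comes from the lesson. It compares a true/false test (chance of 1/2 per item) with a four-option multiple-choice test (chance of 1/4 per item) using the binomial formula.

```python
from math import comb


def chance_of_at_least(n_items, min_correct, p_guess):
    """Probability of getting at least min_correct of n_items right by blind guessing."""
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# Hypothetical test length and passing mark, for illustration only.
n_items, passing = 10, 6
true_false = chance_of_at_least(n_items, passing, 1 / 2)   # two choices per item
four_option = chance_of_at_least(n_items, passing, 1 / 4)  # four choices per item
print(f"true/false: {true_false:.3f}, four-option: {four_option:.3f}")
# prints: true/false: 0.377, four-option: 0.020
```

Under these assumptions, a pure guesser reaches the passing mark on the true/false test more than a third of the time, but only about 2% of the time on the four-option test. Lengthening the test lowers both figures.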
1. Include only one central idea in each statement. The decision should not depend
on some subordinate point or trivial detail because students tend to be confused and the
answer is more apt to be influenced by reading ability than the intended outcome.
3. Word the statement so precisely that it can be clearly judged as true or false. Specific
determiners or give-away qualifiers are words that make the statement look
true or false. These should be avoided because they provide clues.
Specific determiners that are most likely to be true are may, some,
possible, seldom, sometimes, usually, often, frequently, generally, as a rule.
Specific determiners that are most likely to be false are all, always, none,
never, no, not and nothing.
4. Use negative statements sparingly and avoid double negatives. The “no” and/or “not”
in a negative statement is frequently overlooked, so the statement is read as positive.
Statements including double negatives tend to be so confusing that they should be
restated in positive form.
5. When cause-effect relationships are being measured, use only true propositions. When
used for this purpose, both propositions should be true, and only the relationship
judged true or false.
If the statement is to test for the truthfulness or falsity of a reason, the main clause
should be true and the reason either true or false.
6. Do not try to trick students. This happens when a word that changes the meaning of an
idea is included. Trick statements appear to be true but are really false because of the
petty insertion of some inconspicuous word, phrase or letter. This practice undermines
your credibility, frustrates students, and provides less valid measures of knowledge.
7. Commands cannot be true or false. They do not state or assert anything; they simply
direct.
10. Limit each statement to the exact point to be tested. Avoid partly true and partly false
statements.
11. Avoid ambiguous statements. An ambiguous statement is one that may be true with one
interpretation and false with another equally plausible interpretation.
13. Wherever possible, avoid imprecise quantitative terms such as few, many, more,
frequent, great, and large.
14. Require the simplest possible method of indicating the response. Indicate by a short
line or by ( ) where the response is to be recorded.
Example: If the second part of the sentence explains why the first sentence is true,
circle T for true; if it does not explain why the first part is true, circle F for false.
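The two lists of specific determiners in guideline 3 above lend themselves to a simple automated check of draft true/false statements. The following is a minimal sketch, assuming whole-word matching is good enough for a first pass. The function names are hypothetical, and a flagged word is only a prompt for the teacher to reread the statement, not proof that the item is faulty.

```python
import re

# Word lists taken from guideline 3 on specific determiners.
LIKELY_TRUE = ["may", "some", "possible", "seldom", "sometimes", "usually",
               "often", "frequently", "generally", "as a rule"]
LIKELY_FALSE = ["all", "always", "none", "never", "no", "not", "nothing"]


def find_terms(statement, terms):
    """Return the terms that occur in the statement as whole words or phrases."""
    text = statement.lower()
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", text)]


def flag_determiners(statement):
    """Report which specific determiners appear in a draft true/false statement."""
    return {
        "likely_true": find_terms(statement, LIKELY_TRUE),
        "likely_false": find_terms(statement, LIKELY_FALSE),
    }


print(flag_determiners("All metals are always solid."))
# prints: {'likely_true': [], 'likely_false': ['all', 'always']}
```

Whole-word matching keeps "all" from being found inside "usually" and "no" from being found inside "not", so each determiner is reported only when it actually appears in the statement.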
V. MULTIPLE-CHOICE ITEMS
Multiple choice items are used widely in schools, even though they may not be
the best method for assessing recall knowledge.
Multiple-choice items have five distinct parts: 1) the stem, in the form of a question or
incomplete statement, 2) the options or alternatives, 3) the distracters (also spelled
distractors), 4) the key, and 5) stimulus materials (in some forms of multiple choice),
which appear in the form of a paragraph, table, graph, picture or illustration. Stimulus
materials provide the information on which the questions are based. The alternatives
contain one correct or best answer (the key) and two or more distracters. For measuring
recall knowledge, it is usually best to use a direct question as the stem because it is
easier to write and its format is familiar to students.
Example:
(The illustration is the stimulus material. The question is the stem. There are 4
alternatives. Option b is the key while a, c and d are the distracters)
Strengths:
1. Learning outcomes from simple to complex can be measured.
2. Highly structured and clear tasks are provided.
3. A broad sample of achievement can be measured
4. Incorrect alternatives provide diagnostic information.
5. Scores are less influenced by guessing than true-false items.
6. Scoring is easy, objective and reliable.
Limitations:
1. Constructing good items is time consuming.
2. It is frequently difficult to find plausible distracters.
3. Scores can be influenced by reading ability.
a. They measure personality traits that make for effective use of one’s
ability, like motivation.
b. They identify the type of activities the individual would tend to select.
c. They estimate the student’s capacity to profit from academic instruction.
d. They appraise the present academic ability of the student.
b. Correct Answer Variety
a. Mars
b. Jupiter
c. Neptune
d. Uranus
a. octagon
b. hexagon
c. pentagon
d. heptagon
2) Setting-and-Option Variety. It uses stimulus materials. The responses to this type of test
are dependent upon a setting or foundation of some sort. Stimulus material can be a
graphical representation, equation, picture, sentence, table, chart and paragraph.
Example:
2. Write the stem as a clearly described question or task. It is best to put as much
information as possible in the stem and not the responses, as long as the stem does not
become too wordy. The stem will usually be longer than the alternatives; in the end, a good
indicator of an effective stem is that students have a tentative answer in mind quickly,
before reading the options.
Improved: The inference made on the basis of the test scores refers to ______.
a. Measurement error
b. Reliability
c. Validity
d. Stability
3. Avoid the use of negatives in the stem. Words like not and except can confuse
students and create anxiety and frustration, so try to word the stem positively.
Example 1:
Poor: What are the differences between invertebrates and vertebrates?
(There are many differences between them, according to structure,
etc.)
Example 2:
Poor: Mosquitoes and flies are disease carriers. How can we protect
ourselves from them?
a. Spray insecticides
b. Destroy their young
c. Clean the surroundings
d. Use mosquito nets and cover our food
Poor: If there are 10 books and 15 children in the library, the library
lacks how many books?
a. 3 b. 5 c. 7
Improved: There are 15 children and 10 books in the library. If each child will
be given one book, how many more books are needed?
a. 3 b. 5 c. 7
6. The question should not be trivial. There should be a consensus on its answer.
Sample text: “Biology is the study of life, from the Greek words bio, which means
life, and logo, which means study….”
Poor: _______ is the study of life, from the Greek words bio, which means
life, and logo, which means study.
a. Biology
b. Chemistry
c. History
8. Avoid ending the stem with the article “a” or “an” in the incomplete-sentence variety.
Improved: What was the pen name used by Dr. Jose Rizal?
A. Di Masalang C. Di Pasisiil
B. Laong Laan D. Both A and B
10. Do not use modifiers that are vague such as much, often, usually.
Improved: About 70% of rice supply in the Philippines came from ______.
a. Nueva Ecija
b. Mt. Province
c. Metro Manila
d. Laguna
1. Teodora Alonzo, the mother of Dr. Jose Rizal, has ____ sisters.
a. 3 b. 5 c. 6 d. 8
….
The two items above (numbers 1 and 9) should not be used at the same time
in a test because the stem in number 1 reveals the answer to item number 9.
13. Avoid presenting sequenced items in the same order as in the text.
Suppose the following terms are discussed in a lesson in this order: photosynthesis,
phototropism and stomata. You should not ask questions about these terms in
the same order, that is, with photosynthesis in number 1, phototropism
in number 2 and stomata in number 3. Why? Doing so may tap only the student’s rote
memorization.
14. Avoid the use of unnecessary words or phrases which are not relevant to the problem
at hand.
Poor: Aling Nena went to the market early in the morning. She met her
childhood friend there and had a chat. After an hour, she went to
Mang Ador, the fish vendor, who happened to be their neighbor in the
village where she lives. She bought 3 kilos of tilapia at 130 pesos
per kilo. How much will Aling Nena pay the fish vendor?
Improved: Aling Nena bought 3 kilos of tilapia from the fish vendor at 130
pesos per kilo. How much will Aling Nena pay?
Stick to the objectives of the test. Consider the appropriateness of the item format
for the kind of objective being tested. If the objective is to test only for knowledge
of terminology, refrain from requiring students to write essays. In the same
manner, if the objective is to test the ability of the students to compare or justify a
decision, refrain from asking for an enumeration of terms.
There is only one Dr. Jose Rizal in the list of Philippine heroes, so there is no need to
be so specific about his name, unless you are asking for the complete name of the hero.
B. Constructing Alternatives
1. Use three alternatives for grades I-III, four for grades IV-VI, and at least four for
high school.
2. Label the stem using a number and label the alternatives using letters.
If the alternatives are dates, arrange them from the earliest to the most recent, or
the other way around, from the most recent to the earliest.
If the alternatives have the same length, arrange them alphabetically. Exceptions to
this rule are alternatives with a natural order (rule 3).
6. Alternatives should be grammatically consistent with the stem. If you are asking for the
name of a person, all alternatives should be names of persons; otherwise, the odd alternative
will be obviously wrong. No one will choose an alternative that is obviously wrong.
Poor: Bahay kubo, kahit munti. Ang halaman doon ay sari-sari. Singkamas at
talong. Sigarilyas at _________.
A. Kamatis C. Mani
B. Bawang D. Aso (not a plant)
Improved: Bahay kubo, kahit munti. Ang halaman doon ay sari-sari.
Singkamas at talong. Sigarilyas at _________.
A. Kamatis C. Luya
B. Bawang D. Mani
7. Avoid responses that overlap or include each other. All responses should be
mutually exclusive.
(Pistil is the collective term for the female part of the flower. It includes the
ovary.)
8. Alternatives such as “None of these”, “Both A and C”, “All of the above” and the like
should be used sparingly and with care. If you use them, they should, at least once in a
while, also be the correct answer.
Refrain from using them as the last choice in all the items in a test, as students
might think that you were only running out of alternatives and might not take them
seriously.
Refrain from using “none of the above” and “all of the above” if the alternatives
are arranged horizontally. Use these only if the alternatives are arranged
vertically.
Students who do not really know the answer may assume that the alternative that is
most fully explained, peculiar or technical-sounding is the correct one.
10. If you want to control the difficulty of the item, you can do so by varying the
homogeneity of responses.
Some students may find the best-answer variety of multiple choice more
difficult than the other variety because of the homogeneity of the alternatives provided.
Write the distracters to be plausible yet clearly wrong. If the distracters are
obviously wrong, they are useless because the intent of a multiple-choice item is to
have students discriminate among plausible answers.
Poor: Which of the following is the largest city in the United States?
a. Michigan
b. New York
c. London
d. Berlin (not a city in the US)
Improved: Which of the following is the largest city in the United States?
a. Los Angeles
b. New York
c. Chicago
d. Miami
This happens when your purpose is to facilitate scoring. Students have the
tendency to look for a pattern in the answers (like a,b,c,d,..a,b,c,d or a,a,a,b,b,b,c,c,c,d,d,d,
etc.). If there is such a pattern, students may get the correct answers without even
reading the items. Take note that guessing reduces the validity of the test.
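One possible way to keep the answer key free of such patterns is to let chance decide the position of each correct answer. The sketch below is an illustration under stated assumptions, not a procedure prescribed by the lesson: the function names are hypothetical, and the cap of two identical keys in a row (`max_run=2`) is an arbitrary choice. The teacher would then place the correct alternative for each item in the position drawn.

```python
import random


def longest_run(key):
    """Length of the longest streak of identical consecutive answers in a key."""
    longest = current = 0
    previous = None
    for letter in key:
        current = current + 1 if letter == previous else 1
        longest = max(longest, current)
        previous = letter
    return longest


def make_key(n_items, options="abcd", max_run=2, seed=None):
    """Draw key positions at random, never repeating a letter more than max_run times in a row."""
    rng = random.Random(seed)
    key = []
    for _ in range(n_items):
        allowed = list(options)
        tail = key[-max_run:]
        # If the last max_run keys are all the same letter, exclude that letter.
        if len(tail) == max_run and len(set(tail)) == 1:
            allowed.remove(tail[0])
        key.append(rng.choice(allowed))
    return key


key = make_key(20, seed=1)
print("".join(key), "longest run:", longest_run(key))
```

The `longest_run` helper can also be used on its own to audit an existing key; for example, the patterned key a,a,a,b,b,b,c,c,c has a longest run of 3.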
Multiple-choice items can also be used for assessing reasoning in two ways. One
is to focus on a particular skill and the other is to assess the extent to which the students
can use their knowledge and skills in performing problem-solving or other reasoning
tasks.
Examples: (Focusing on a particular task)
(Identifying assumptions)
When Patrick Henry said “give me liberty or give me death,” his assumption was
that:
a. Everyone would agree with him
b. Thomas Jefferson would be impressed by the speech
c. If he couldn’t have freedom he might as well die
d. His words would be taught to students for years
(Comparison)
One way in which insects are different from centipedes is that:
a. They are different colors
b. One is an arthropod
c. Centipedes have more legs
d. Insects have two body parts
(Analysis)
Roy decided to go sailing with a friend. He took supplies with him so he could
eat, repair anything that might be broken, and find where on the lake he could sail.
Which of the following supplies would best meet his needs?
a. Bread, hammer, map
b. Milk, bread, screwdriver
c. Map, hammer, pliers, screwdrivers
d. Screwdriver, hammer, pliers
(Synthesis)
What is the main idea in the following paragraph?
Ann picked a pretty blue boat for her first sail. It took her about an hour to
understand all the parts of the boat and another hour to get the sail on. Her first
sail was on a beautiful summer day. She tried to go fast but couldn’t. After
several lessons she was able to make her boat go fast.
a. Sailing is fun
b. Ann’s first sail
c. Sailing is difficult
d. Going fast on a sailboat
(Hypothesizing)
If there were a significant increase in the number of hawks in a given area,
a. The number of plants would increase
b. The number of mice would increase
c. There would be fewer hawk nests
d. The number of mice would decrease
(Problem-solving)
Farmers want to be able to make money from the crops they grow, but too many
farmers are growing too many crops. What can the farmers do to make more
money?
a. Agree to produce fewer crops
b. Reduce the number of farmers
c. Try to convince the public to pay higher prices
d. Work on legislation to turn farmlands into parks
(Critical thinking)
Pablo is deciding which car to buy. He is impressed with the sales representative
for the Toyota, and he likes the color of the Mitsubishi. The Toyota is smaller
and gets more kilometers to the gallon. The Mitsubishi takes larger tires and has a
smaller trunk. More people can ride in the Toyota. Which car should Pablo
purchase if he wants to do everything he can to ensure that his favorite lake does
not become polluted?
a. Toyota
b. Mitsubishi
c. Either car
d. Can’t decide from the information provided
(Predicting)
Suppose that Central Luzon, which grows most of the country’s rice, suffered a
drought for several years and produced much less rice than usual. What could
happen to the price of the rice?
a. The price would rise
b. The price would fall
c. People would eat less rice.
d. The price would stay the same
Strengths:
1. It is possible to measure more reasoning skills in greater depth because there
are many questions about the same information.
2. It is possible to separate the assessment of the reasoning skills from content
knowledge of the subject.
3. It is relatively easy to use materials that students will encounter in everyday
living, such as maps, newspaper articles, and graphs.
4. The results are more reliable because the items provide a standard structure for all
students and are scored objectively.
Limitations:
1. It is time consuming and difficult to write.
2. Unable to assess how students organize their thoughts and ideas
3. Most items rely heavily on reading comprehension
Joy lost her pencil on her way to school. It was red and given to her by her
grandmother. She wanted the teacher to ask the class if anyone found the pencil.
Key: Circle Yes if the information in the sentence will help the class find the pencil.
Circle No if the information in the sentence will not help the class find the pencil.
Figure 1. Number of Elementary, High School and College Students Graduating from
Region III
Based on Figure 1, circle T if the statement is true and F if the statement is false.
The essay question is especially useful for measuring ability to organize, integrate,
and express ideas. It provides freedom of response. It also requires the students to
interpret information, give arguments and explanations, evaluate the merit of the idea,
and conduct other types of reasoning, thus it is an excellent way to measure deep
understanding and mastery of complex information.
Strengths:
1. The highest level of understanding, complex thinking and reasoning skills can
be assessed.
2. Preparation takes less time than for selection-type tests.
3. The integration and application of ideas is emphasized.
4. It motivates better study habits and provides students flexibility in how to
respond.
5. It discourages rote learning and guessing.
Limitations:
1. Reading and scoring are very time-consuming, and scoring is highly subjective and
notoriously unreliable.
2. There is inadequate sampling of achievement due to time needed for
answering the questions.
3. It is difficult to relate to intended learning outcomes because of freedom to
select, organize and express ideas.
4. Scores are raised by writing skills and bluffing, and lowered by poor
handwriting, misspelling and grammatical errors
Examples:
Why are tomatoes better for your health than potato chips?
What is the effect on inflation of raising the prime interest rate?
Describe the relative merits of selection-type test items and essay
questions for measuring learning outcomes at the comprehension level.
Confine your answer to one page.
Examples:
1. Explain how the fertilizers farmers use to grow crops may pollute our
rivers and streams.
2. Describe the major events that led to the People Power Revolution in 1986.
3. Give an example, new to me and not one from class, of how the law of
supply and demand would make prices of some products increase.
4. Write a critical evaluation of this test using the rules and standards for
test construction described in the textbook. Include a detailed
analysis of the test’s strengths and weaknesses and an
evaluation of its overall quality.
5. In teaching a particular lesson, prepare a complete plan for evaluating
student achievement. Be sure to include the procedures you would
follow, the instruments you would use, and the reason for your choices.
1. Construct the item to elicit skills identified in the learning target. A good way to
begin writing the item to match the target is to start with a standard stem. Then
modify it as needed for the subject and level of student ability. Examples are
shown in the table below.
Skills                      Stem
Comparing                   Describe the similarities and differences between…
                            Compare the following two methods of…
Relating Cause and Effect   What are the major causes of…?
                            What would most likely be the effects of…?
Justifying                  Which of the following alternatives do you favor and why?
                            Explain why you agree or disagree with the following statement.
Summarizing                 State the main points included in…
                            Briefly summarize the contents of…
Generalizing                Formulate several valid generalizations from the following data.
                            State a set of principles that can explain the following events.
Inferring                   In light of the facts presented, what is most likely to happen when…?
                            How would Senator X be likely to react to the following issues?
Classifying                 Group the following items according to…
                            What do the following items have in common?
Creating                    List as many ways as you can think of for…
                            Make up a story describing what would happen if…
Applying                    Using the principle of… as a guide, describe how to solve the problem.
                            Describe a situation that illustrates the principle of…
Analyzing                   Describe the reasoning errors in the following paragraph.
                            List and describe the main characteristics of…
Synthesizing                Describe a plan for providing that…
                            Write a well-organized report that shows…
Evaluating                  Describe the strengths and weaknesses of…
                            Using the given criteria, write an evaluation of…
2. Write the item so that the students clearly understand the specific task. If the
students need to interpret what is asked, many answers will be off target.
When students misinterpret the task, you do not know whether they have the targeted
skills or not, which leads to invalid conclusions.
3. Indicate the criteria for scoring their responses. This can be labeled as scoring
plan, scoring criteria, or attributes to be scored.
Examples:
4. Indicate approximately how much time students should spend on each essay item.
You can get an idea by writing draft answers, and as you gain more experience the
responses of previous students to similar questions will be helpful. Make sure
that even the slowest writers can complete their answers satisfactorily in the time
available.
5. Avoid giving students options as to which essay questions they will answer.
When doing this, each student may be taking a different test. Differences in the
difficulty of each question are unknown, thus making scoring problematic.
Guidelines for Scoring Responses in an Essay Item
5. Score the answers anonymously. This will reduce, if not eliminate, bias during
scoring. This can be done by having the students write their names on the back of
the paper or by using code numbers.
6. Whenever possible, have two or more persons grade each answer. Obtain two
independent judgments, especially where the results are to be used for important
and irreversible decisions.
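Guidelines 5 and 6 can be combined in a simple record-keeping routine: answers are identified by code numbers only, and the two independent scores are brought together afterwards. The sketch below is one hypothetical way to do this. Averaging the two scores and flagging answers whose scores differ by more than 2 points (`max_gap=2`) are assumptions made for illustration; the guidelines themselves only call for two independent judgments.

```python
def combine_ratings(rater_a, rater_b, max_gap=2):
    """Average two independent scores per answer and flag large disagreements.

    rater_a and rater_b map a student's code number to that rater's score.
    Returns (averages, flagged), where flagged lists the code numbers whose
    two scores differ by more than max_gap points.
    """
    averages, flagged = {}, []
    for code in rater_a:
        a, b = rater_a[code], rater_b[code]
        averages[code] = (a + b) / 2
        if abs(a - b) > max_gap:
            flagged.append(code)
    return averages, flagged


# Hypothetical scores from two raters, keyed by code number rather than name.
rater_a = {"S01": 8, "S02": 5, "S03": 9}
rater_b = {"S01": 7, "S02": 9, "S03": 9}
averages, flagged = combine_ratings(rater_a, rater_b)
print(averages)  # prints {'S01': 7.5, 'S02': 7.0, 'S03': 9.0}
print(flagged)   # prints ['S02']
```

A flagged answer is a candidate for rereading or for a third opinion before the score is used in any important decision.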