Constructing Written Test Questions
Michael Altman, MD
Larry Cochard, PhD
Northwestern University Feinberg School of Medicine
Adapted from:
Constructing Written Test Questions for the Basic and Clinical Sciences
Susan M. Case & David B. Swanson
National Board of Medical Examiners
We have embarked on a program for improving the quality of our examinations, including
review of the structure of all test questions. This document describes the principles of
constructing test items based on USMLE guidelines. Following these guidelines lets us draw on
NBME research, experience, and wisdom regarding exam construction, and gives our students
experience with USMLE-style exam questions.
The purposes of testing are listed in Table 1. Accomplishing these goals depends on high-quality
items. Definitions of question terminology and question types are given in Table 2.
Table 2. Definitions
1. Exam content should match course/clerkship objectives. Exams can only sample the entirety
of a subject; the sample of items should be representative of the instructional goals.
2. Items should test important material; important topics should be weighted more heavily
than less important topics. Trivial topics should be excluded.
3. One of the most important and most useful guidelines is that a question be worded so that it
can be answered without looking at the options. The stem should provide enough
information to answer the question independent of the options.
5. Include as much of an item in the stem as possible; stems should be long and options short. If
every option begins with the same phrase or word, put it in the stem.
6. Options should be clearly true or clearly false. They should be free of absolutes such as
“always”, “never”, “all”, and of vague terms like “usually” and “frequently”. Incorrect answers
should be clearly inferior to the correct answer.
7. Please do not construct options that include “all of the above”, “none of the above”, “A and B
are correct”, etc.
8. Incorrect answers should be similar to the correct answer in construction and length.
9. The options should be homogeneous in content and plausible and attractive to the
uninformed. While five options are typical and recommended, three good options are actually
sufficient because students quickly eliminate two or three options for most questions. Include
more than three options if the additional ones are good distractors.
10. The grammar should be consistent and logically compatible between stem and options.
11. Questions should be as simple and direct as possible. They should be free of superfluous
information and of “tricky” or overly complex wording. Notorious examples of the latter are
items with double negatives. Please avoid these!
12. Numeric data should be presented consistently. Options in general should be presented in a
logical order.
13. Vague terms should not be used. Language should be consistent across options.
14. Answers should not be “hinged” to the answer of a related item.
15. This last guideline reinforces the first. Try to think of questions that relate to concepts
rather than factual recall. Some terminology-related questions are necessary, but they
should not be about “picky” facts.
NBME questions place an item in a clinical or functional context by adding a brief scenario or
statement at the beginning of the stem. Discrimination between better and poorer students is
increased by adding vignettes to the item. A shorter vignette is usually preferable to a longer
vignette because the latter may add irrelevant difficulty. Questions without vignettes, while
often less discriminating, are perfectly fine for testing knowledge of specific concepts.
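The text does not say how discrimination is measured. One common way to quantify it is an upper-lower discrimination index: the proportion of high-scoring students who answer the item correctly minus the proportion of low-scoring students who do. The sketch below is only an illustration under that assumption; the function name, the 27% group size, and the sample data are all hypothetical and do not come from the NBME guide.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Upper-lower discrimination index for a single item.

    total_scores: each student's total exam score
    item_correct: 1 if that student answered this item correctly, else 0
    fraction: share of students placed in each comparison group
              (0.27 is a common convention, assumed here)
    """
    if len(total_scores) != len(item_correct):
        raise ValueError("inputs must have the same length")
    # Rank students from highest to lowest total score.
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    n = max(1, round(len(order) * fraction))
    upper = order[:n]   # best-performing students
    lower = order[-n:]  # poorest-performing students
    p_upper = sum(item_correct[i] for i in upper) / n
    p_lower = sum(item_correct[i] for i in lower) / n
    return p_upper - p_lower


# Hypothetical data for ten students (made up for illustration).
scores = [95, 90, 88, 80, 75, 70, 65, 60, 50, 40]
answers = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
print(discrimination_index(scores, answers))
```

A value near 1 means the item separates the two groups well; a value near 0 means both groups answer it about equally often.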
The guidelines listed above are intended to minimize irrelevant difficulty related to the
construction of the items so that an exam score will more accurately measure a student’s
mastery of the exam subject. Many types of flaws in the construction of questions provide
inadvertent clues about the right answer, and a “testwise” student is good at detecting these
clues. This is an important aspect of item construction that merits further attention. Categories
of flaws that contribute to testwiseness are listed in Table 4.
1. Grammatical clues. If an option does not follow grammatically from the stem, a testwise
student can infer that it is not the correct answer.
3. Absolute terms such as “always” and “never” are unlikely to be in correct options.
4. Long correct answer: a correct answer is often longer, more specific, or more complete than
other options.
5. Words repeated in the stem and the correct answer are obvious clues.
6. Convergence strategies. If options share elements (e.g., three answers are acids and two
bases), a correct answer is often among the options with the most shared elements. If there
is more than one element category among the options, convergence using shared elements
can often isolate the correct answer.
Extended-matching Items.
Extended-matching items are intended to reduce cues and to emulate an open-answer item.
They are multiple choice questions with one best answer, but it is selected from an extensive if
not exhaustive list. The choice list is often used for multiple items.
The components of extended-matching items include a theme, the option list, a lead-in
statement, and the individual items.
Options:
Lead-in: For each patient with neurologic abnormalities, select the vessel that is most likely to
be involved.
Stems:
1. A 72-year-old right-handed man has weakness and hyperreflexia of the right lower limb, an
extensor plantar response on the right, normal strength of the right arm, and normal facial
movements.
2. A 68-year-old right-handed man has right spastic hemiparesis, an extensor plantar response
on the right, and paralysis of the lower two-thirds of his face on the right. His speech is fluent,
and he has normal comprehension of verbal and written commands.
A 60-year-old alcoholic derelict in status epilepticus is brought to the ER by the police. After
ascertaining that the airway is open, the first step in management should be intravenous
administration of
A. examination of cerebrospinal fluid
B. glucose with vitamin B1
C. CT scan of the head
D. phenytoin
E. diazepam
Crime is
A. equally distributed among the social classes
B. overrepresented among the poor
C. overrepresented among the middle class and rich
D. primarily an indication of psychosexual maladjustment
E. reaching a plateau of tolerability for the nation
Secondary gain is
A. synonymous with malingering
B. a frequent problem in obsessive-compulsive disorder
C. a complication of a variety of illnesses and tends to prolong many of them
D. never seen in organic brain damage
(Long correct answer: the correct answer is longer, more specific, or more complete than the
other options. C is longer and more detailed, hence more likely to be correct.)
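Two of the cues in the item above can be checked mechanically before an exam is released. The sketch below is a hypothetical helper, not part of the NBME guidelines: it flags a keyed answer that is longer than every distractor, and any option containing one of the absolute terms named in this document (“always”, “never”, “all”). The function name and the returned message format are assumptions made for illustration.

```python
import re

# Absolute terms taken from the guidelines in this document.
ABSOLUTE_TERMS = {"always", "never", "all"}


def find_cues(options, key):
    """Return warnings for two testwise cues.

    options: dict mapping option letter to option text
    key: letter of the correct answer
    """
    warnings = []
    # Cue 1: the keyed answer is strictly longer than every distractor.
    key_len = len(options[key])
    if all(key_len > len(text)
           for letter, text in options.items() if letter != key):
        warnings.append(f"{key} is the longest option")
    # Cue 2: an option contains an absolute term as a whole word.
    for letter, text in options.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        hits = sorted(words & ABSOLUTE_TERMS)
        if hits:
            warnings.append(
                f"{letter} contains absolute term(s): {', '.join(hits)}")
    return warnings


# The "secondary gain" item shown above.
secondary_gain = {
    "A": "synonymous with malingering",
    "B": "a frequent problem in obsessive-compulsive disorder",
    "C": "a complication of a variety of illnesses and tends to prolong many of them",
    "D": "never seen in organic brain damage",
}
for warning in find_cues(secondary_gain, key="C"):
    print(warning)
```

Character length is only a rough proxy for "more specific or more complete", so a warning is a prompt for human review rather than proof of a flaw.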
A 59-year-old man with a history of heavy alcohol use and previous psychiatric hospitalization
is confused and agitated. He speaks of experiencing the world as unreal. This symptom is
called
A. derealization
B. depersonalization
C. derailment
D. focal memory deficit
E. signal anxiety
(Word repeats: a word or phrase is included in the stem and in the correct answer.
A is correct; “unreal” leads to “derealization”.)
Local anesthetics are most effective in the
A. anionic form, acting from inside the nerve membrane
B. cationic form, acting from inside the nerve membrane
C. cationic form, acting from outside the nerve membrane
D. uncharged form, acting from inside the nerve membrane
E. uncharged form, acting from outside the nerve membrane
(Convergence strategy: the correct answer includes the most elements in common with the
other options. Three options are charged and three are “inside”, which leaves A or B; select B
because “cationic” appears twice and “anionic” only once. B is correct.)
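The convergence reasoning for the item above can be sketched as a simple count: tally how often each element appears across the options, then score each option by the sum of its elements' tallies. This is a minimal illustration, not a validated method. The tags are hand-assigned, and the extra "charged" tag on options A to C is an assumption added to mirror the "three options are charged" step in the explanation.

```python
from collections import Counter


def convergence_scores(option_tags):
    """Score each option by how often its elements recur across all options."""
    counts = Counter(tag for tags in option_tags.values() for tag in tags)
    return {letter: sum(counts[tag] for tag in tags)
            for letter, tags in option_tags.items()}


# Hand-assigned tags for the local anesthetic item (illustrative assumption).
local_anesthetic = {
    "A": {"anionic", "charged", "inside"},
    "B": {"cationic", "charged", "inside"},
    "C": {"cationic", "charged", "outside"},
    "D": {"uncharged", "inside"},
    "E": {"uncharged", "outside"},
}

scores = convergence_scores(local_anesthetic)
best = max(scores, key=scores.get)
print(scores)
print("Convergence points to option", best)
```

If the top-scoring option is also the keyed answer, as it is here, the item is probably answerable by convergence alone and its options are worth revising.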
Following a second episode of salpingitis, what is the likelihood that a woman is infertile?
A. Less than 20%
B. 20 to 30%
C. Greater than 50%
D. 90%
E. 75%
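The options in the item above mix ranges with single values, are not listed in numeric order, and overlap one another. Checks like these are easy to automate once each option is written as an interval. In the sketch below the function name is hypothetical and the intervals are hand-encoded under the assumption of whole-number percentages, so "Less than 20%" becomes 0 to 19 and "Greater than 50%" becomes 51 to 100.

```python
from itertools import combinations


def check_numeric_options(intervals):
    """Check numeric options for ordering, format, and overlap problems.

    intervals: dict mapping option letter to (low, high), inclusive bounds,
               listed in the order the options appear in the item
    """
    problems = []
    lows = [low for low, _ in intervals.values()]
    if lows != sorted(lows):
        problems.append("options are not in ascending order")
    kinds = {"single value" if low == high else "range"
             for low, high in intervals.values()}
    if len(kinds) > 1:
        problems.append("options mix ranges and single values")
    # Two inclusive intervals overlap when each starts before the other ends.
    for (a, (a_low, a_high)), (b, (b_low, b_high)) in combinations(
            intervals.items(), 2):
        if a_low <= b_high and b_low <= a_high:
            problems.append(f"{a} and {b} overlap")
    return problems


# Hand-encoded version of the salpingitis item (whole percentages assumed).
salpingitis = {
    "A": (0, 19),    # Less than 20%
    "B": (20, 30),   # 20 to 30%
    "C": (51, 100),  # Greater than 50%
    "D": (90, 90),   # 90%
    "E": (75, 75),   # 75%
}
for problem in check_numeric_options(salpingitis):
    print(problem)
```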
Other Considerations
Compare these option sets. They demonstrate how option selection affects difficulty and
quality of an item.
Compare these three stems. They all ask the same question: one with no vignette, one with a
short vignette, and one with a long vignette. As indicated above, discrimination between better
and poorer students is increased by adding either vignette to the item. A shorter vignette is
usually preferable to a longer vignette because the latter may add irrelevant difficulty.
Questions without vignettes, while often less discriminating, are perfectly fine for testing
knowledge of specific concepts.
What is the most likely abnormality in children with nephrotic syndrome and normal renal
function?
A. acute poststreptococcal glomerulonephritis
B. hemolytic-uremic syndrome
C. minimal change nephrotic syndrome
D. nephrotic syndrome due to focal and segmental glomerulosclerosis
E. Schönlein-Henoch purpura with nephritis
A 2-year-old boy has a 1-week history of edema. Blood pressure is 100/60 mm Hg, and there
is generalized edema and ascites. Serum concentrations are: creatinine 0.4 mg/dL, albumin 1.4
g/dL, and cholesterol 569 mg/dL. Urinalysis shows 4+ protein and no blood. What is the most
likely diagnosis?
A. acute poststreptococcal glomerulonephritis
B. hemolytic-uremic syndrome
C. minimal change nephrotic syndrome
D. nephrotic syndrome due to focal and segmental glomerulosclerosis
E. Schönlein-Henoch purpura with nephritis
A 2-year-old African-American child developed swelling of his eyes and ankles over the past
week. Blood pressure is 100/60 mm Hg, pulse 110/min, and respirations 28/min. In addition
to swelling of his eyes and 2+ pitting edema of his ankles, he has abdominal distention with a
positive fluid wave. Serum concentrations are: creatinine 0.4 mg/dL, albumin 1.4 g/dL, and
cholesterol 569 mg/dL. Urinalysis shows 4+ protein and no blood. What is the most likely
diagnosis?
A. acute poststreptococcal glomerulonephritis
B. hemolytic-uremic syndrome
C. minimal change nephrotic syndrome
D. nephrotic syndrome due to focal and segmental glomerulosclerosis
E. Schönlein-Henoch purpura with nephritis
Resources:
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice
item-writing guidelines. Applied Measurement in Education, 15, 309–333.