Using Cognitive Models To Develop Quality Multiple-Choice Questions

Debra Pugh, Andre De Champlain, Mark Gierl, Hollis Lai & Claire Touchie

To cite this article: Debra Pugh, Andre De Champlain, Mark Gierl, Hollis Lai & Claire Touchie (2016): Using cognitive models to develop quality multiple-choice questions, Medical Teacher, DOI: 10.3109/0142159X.2016.1150989
ABSTRACT
With the recent interest in competency-based education, educators are being challenged to develop more assessment opportunities. As such, there is increased demand for exam content development, which can be a very labor-intensive process. An innovative solution to this challenge has been the use of automatic item generation (AIG) to develop multiple-choice questions (MCQs). In AIG, computer technology is used to generate test items from cognitive models (i.e. representations of the knowledge and skills that are required to solve a problem). The main advantage yielded by AIG is the efficiency in generating items. Although technology for AIG relies on a linear programming approach, the same principles can also be used to improve traditional committee-based processes used in the development of MCQs. Using this approach, content experts deconstruct their clinical reasoning process to develop a cognitive model which, in turn, is used to create MCQs. This approach is appealing because it: (1) is efficient; (2) has been shown to produce items with psychometric properties comparable to those generated using a traditional approach; and (3) can be used to assess higher order skills (i.e. application of knowledge). The purpose of this article is to provide a novel framework for the development of high-quality MCQs using cognitive models.
CONTACT Debra Pugh [email protected] The Ottawa Hospital – General Campus, Box 209, 501 Smyth Road, Ottawa, ON K1H 8L6, Canada
© 2016 Informa UK Limited, trading as Taylor & Francis Group

Background

Practice points
- The competency-based education (CBE) model requires frequent assessment opportunities and, therefore, increased exam content.
- Innovative methods for augmenting exam content development will be important to meet the demands of CBE.
- A cognitive model approach can be used to efficiently develop high-quality multiple-choice questions.

The multiple-choice question (MCQ) format is commonly used in high-stakes assessment and, as such, guidelines for the development of MCQs are readily available (Case & Swanson 2002; Touchie 2010; Haladyna 2013). The MCQ format is appealing because it allows for an efficient means of testing large amounts of content. Although often assumed to assess only lower-level skills (e.g., recall), well-constructed MCQs can be used to assess both knowledge and clinical reasoning skills (Schuwirth & van der Vleuten 2004). However, the development of MCQs remains a laborious and expensive process.

In a traditional approach to developing MCQs, the item writer selects a topic, then develops an item consisting of a stem, a lead-in question, a correct answer, and a list of options or distractors (Case & Swanson 2002). The item then undergoes review by a panel of experts to validate the content and identify any potential flaws in the item. Developing quality MCQ items is time consuming, and the associated developmental cost has been estimated to be as high as $1500–$2500 USD per item (Rudner 2010).

In the field of medicine, the renewed interest in competency-based education (CBE) has posed a challenge to educators to develop more assessment opportunities (Holmboe et al. 2010). Because CBE offers flexible educational models in which learners progress at their own pace, more frequent testing is required to assess competency along the continuum. This might take the form of progress or on-demand testing, for example (Mills & Steffen 2000; Schuwirth & van der Vleuten 2012). These measures will necessitate the development of thousands of new, high-quality items in a relatively short period of time. As a result, the need for more examination content continues to increase.

An innovative solution to this challenge has been automatic item generation (AIG). AIG employs computer technology to generate test items from cognitive models (Gierl et al. 2012). In brief, cognitive models are representations of the knowledge and skills that are required to solve a problem. In the creation of cognitive models, content experts deconstruct the clinical reasoning process and, in turn, this information can be entered into a computer program that uses algorithms to generate MCQs. Using this process, MCQs are developed using a common stem that differs only by pre-defined variables (e.g., features on history, physical examination, or investigation) (Gierl & Lai 2013).

The main advantage yielded by AIG is the efficiency with which items can be generated. Typically, several hundred items can be developed for a particular topic using one cognitive model (Gierl et al. 2012). Preliminary results suggest that items for clinical medicine developed using AIG display psychometric properties (i.e. item difficulty and item discrimination) that are similar to those obtained using
traditionally developed MCQs (Gierl et al. 2015). The model also offers a framework for the systematic creation of plausible distractors, in that the content expert not only needs to provide the clinical reasoning underlying a correct response but also the cognitive errors associated with each of the distractors (Lai et al. 2016). Consequently, AIG holds great promise not only in generating high volumes of items, but also in regards to tailoring diagnostic feedback for remedial purposes.

Although technology for AIG relies on a linear programming approach, the same principles can also be used to improve the traditional committee-based process used in the development of MCQs. This approach is appealing for many reasons. First, this method allows item writers to very efficiently develop several items (~5–10) that assess the same construct, using only one model. The items developed from a single cognitive model can then be used on different test forms, attenuating the potential impact of pure recall. Second, the cognitive model approach capitalizes on experts' ability to identify and articulate the critical elements required in problem-solving. In other words, when developing cognitive models, experts explicitly outline their approach to a particular problem by specifying sources of information that can be varied to lead to different conclusions. The explicit representation of the knowledge domain, as captured in the cognitive model, can potentially lead to higher quality items by ensuring that items assess application of knowledge rather than just recall. Third, the use of the cognitive model approach provides a framework for the development of plausible distractors, a task with which even experienced item writers often struggle (Rogausch et al. 2010). Through modeling the common and distinguishing features between the distractors and the correct option, item writers are required to explicitly identify and differentiate how distractors should be presented in each item. Finally, during the review process for items developed from cognitive models, revisions can be made at the model (rather than the individual question) level, which adds to the efficiency of the process.

The purpose of this paper is to provide a novel framework for the development of high-quality MCQs using cognitive models. We will focus on examples from medical education, but this approach can be applied to any educational setting.

Approach to developing MCQs using cognitive models

Although there are several different forms of MCQs (e.g. Pick-Ns, R-types, etc.), this paper will only consider the use of the more conventional multiple-choice formats (i.e. "A-Type" or "Single Best Answer").

Developing a cognitive model

The development of a cognitive model requires that a content expert summarize how they typically approach a given problem. This necessitates that they articulate what sources of information would allow them to solve the problem. For example, given a patient with shock, a clinician might begin by considering the historical and physical examination features that help to differentiate between potential causes (e.g. cardiac history, allergies, fever, etc.).

The process of developing a cognitive model begins by selecting a topic from the overall test blueprint. As with any assessment, the topic should reflect important content. Let's consider the example of the topic of chest pain. For this topic, one could decide to develop questions that assess the candidate's ability to either make a diagnosis or to manage the patient. These are the tasks associated with the model. Other tasks could be developed, but these two tasks work well for most models.

For each task, one then develops a list of associated answers. For example, for the diagnosis task, one might generate a list that includes myocardial infarction (heart attack), pulmonary embolism (blood clot to the lung), costochondritis (musculoskeletal pain), or aortic dissection (ruptured aorta), etc. These can be entered into a grid along the x-axis as illustrated in Figure 1.

The sources of information needed to solve the problem are then identified and added to the y-axis of the grid. These may be historical features, physical examination findings, or investigations (e.g., pain characteristics, vital signs, ECG, etc.). These sources of information are then specified by defining variables for the corresponding diagnoses. For example, for the source of information "pain characteristics", the variable would be "heaviness" for myocardial infarction and "sharp and worse with a deep breath" for pulmonary embolism. The sources of information and variables chosen should be factors that would influence how the content expert (in this case, the physician) would approach the problem. In other words, if chest pain radiates to the back and is associated with a difference in blood pressure between both arms (aortic dissection), the differential diagnosis will be very different than if the pain is worse on inspiration and is associated with low oxygen saturations (pulmonary embolism).

Although other sources of information could be added, such as age or gender, only those that will actually make a difference to how one approaches the problem should be included in the cognitive model. It is also important to note that the variables do not need to address every possible presentation. For example, for myocardial infarction, the associated symptoms could also include shortness of breath, nausea, or dizziness, but providing this much detail is not necessary and may lead to the development of questions that are too easy. It is therefore important to ensure that the variables provide enough information to help a candidate differentiate between the correct answer and the list of distractors, while allowing for the creation of items that are at an appropriate level of difficulty.

Once the model has been developed for one task (e.g. diagnosis), another task can easily be added (e.g. management) using the same sources of information, as demonstrated in Figure 1. This effectively doubles the number of questions that can be developed from a single cognitive model.

Figure 1. Example of a cognitive model grid for the topic of chest pain.
Tasks (x-axis):
  Diagnoses: Myocardial infarction | Pulmonary embolism | Costochondritis | Aortic dissection
  Management: PCI (angioplasty) | Heparin (blood thinner) | Analgesia (pain medication) | Anti-hypertensives (blood pressure medication)
Sources of information (variables) are listed along the y-axis.

At this stage, it is important to solicit input from other content experts about the cognitive model before progressing any further. If any issues are identified in the cognitive model, they can be corrected before any MCQs are produced. For example, the sources of information chosen may not provide enough information to differentiate between the various diagnoses. In this case, additional sources of
information may be added, or the variables may be redefined. Thorough review and revisions at this stage can potentially be much more efficient than when done at the level of the individual item.

Writing the stem

The construction of an MCQ begins with a stem. The stem is generally written as a vignette that includes all the information needed to answer the question. It should not include extraneous information or tricks, and the vocabulary should be kept relatively simple (Haladyna 2013).

A good stem should present information that leads to clinical reasoning rather than the recognition of memorized facts (Case & Swanson 2002). When using the cognitive model approach, the stem should incorporate all the sources of information generated in the cognitive model, as shown in Text Box 1. The sources of information can then be replaced with the appropriate variables for a given answer, as outlined in Text Box 2. This allows for multiple items to be created from a single model, using the same stem.

Writing the lead-in question(s)

After a stem is developed, a lead-in question can then be posed for each of the predefined tasks (e.g., a diagnosis question and a management question in our example). The lead-in question should be carefully worded to ensure that the task is clear, such as selecting the most likely diagnosis or the best next step.

As with MCQs developed using a traditional one-item-at-a-time approach, there are a few ways to verify the quality of items that are being generated. If the stem and lead-in question are well-constructed, then the question should be answerable without referring to the list of options. In addition, the lead-in question should be inextricably linked to the stem so that one could not answer the question without reading the stem (Case & Swanson 2002). Lead-in questions such as "What is a true statement about chest pain?" violate both these principles.

Commonly, the lead-in question begins with "Which one of the following", which clearly indicates that the correct answer is included in the list of options provided (Touchie 2010); see Text Box 3.

By developing two questions, the number of items created is doubled. For example, using the model shown in Figure 1, four MCQs can be developed for diagnosis and four for management.

Generating distractors

The next step is to generate a list of distractors that are associated with the relevant task. Distractors are the incorrect options that are presented along with the correct answer in MCQs. Good distractors should be plausible, yet unequivocally wrong. Furthermore, they should represent clear reasoning errors on the part of weaker candidates who may possess only part or none of the knowledge necessary for a correct response. Distractors should be homogeneous (e.g. all diagnoses, all investigations, etc.), and distractors such as "All of the above" and "None of the above"
should be avoided, as they provide some cueing (Haladyna et al. 2002). Text Box 4 provides examples of possible distractors for the chest pain model. The number of distractors presented is a policy decision.

Text Box 1. Example of an MCQ stem with variable labels (sources of information are indicated in brackets).
A 60-year-old man presents to the Emergency Department with a sudden onset of retrosternal chest pain. He describes it as a [PAIN CHARACTERISTICS] that [RADIATION OF PAIN]. It is associated with [ASSOCIATED SYMPTOMS]. On physical examination, [VITAL SIGNS]. [PHYSICAL FINDINGS]. His ECG is shown in the referenced image [ECG].

Text Box 2. An example of a complete MCQ stem (defined variables are in bold font).
A 60-year-old man presents to the Emergency Department with a sudden onset of retrosternal chest pain. He describes it as a heaviness in his chest that radiates to the jaw. It is associated with nausea. On physical examination, heart rate is 60/minute, blood pressure is 100/80 mmHg, respiratory rate is 16/minute and oxygen saturation is 92% on room air. There are a few bibasilar crackles and the JVP is 4 cm above the sternal angle. His ECG is shown in the referenced image (not shown, but would demonstrate ST elevation in the inferior leads).

Text Box 3. Examples of MCQ lead-in questions.
Which one of the following is the most likely diagnosis?
Which one of the following is the most important step in managing this patient?

Text Box 4. Examples of distractors for diagnosis and management of chest pain.
Diagnosis:
Gastroesophageal reflux disease (heartburn)
Pancreatitis (inflamed pancreas)
Pericarditis (inflammation around the cardiac sac)
Esophageal perforation (ruptured esophagus)
Management:
Pantoprazole infusion
Oral colchicine
Thoracotomy
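The stem-and-variable substitution described above is, mechanically, simple string templating. The sketch below is illustrative only, not the authors' AIG software: the dictionary layout and the `fill_stem` helper are my own assumptions, while the placeholder names follow Text Box 1 and the variable values are abridged from the worked examples in Text Boxes 2 and 5.

```python
# Illustrative sketch of filling a shared stem template with the variables
# defined for each diagnosis in the cognitive model. The data layout and
# helper are hypothetical; placeholder names follow Text Box 1.

STEM_TEMPLATE = (
    "A 60-year-old man presents to the Emergency Department with a sudden "
    "onset of retrosternal chest pain. He describes it as a "
    "[PAIN CHARACTERISTICS] that [RADIATION OF PAIN]. "
    "It is associated with [ASSOCIATED SYMPTOMS]."
)

# Sources of information (y-axis of the grid) mapped to per-diagnosis
# variables (x-axis), abridged from the article's worked examples.
VARIABLES = {
    "myocardial infarction": {
        "PAIN CHARACTERISTICS": "heaviness in his chest",
        "RADIATION OF PAIN": "radiates to the jaw",
        "ASSOCIATED SYMPTOMS": "nausea",
    },
    "aortic dissection": {
        "PAIN CHARACTERISTICS": "tearing pain",
        "RADIATION OF PAIN": "radiates to the back",
        "ASSOCIATED SYMPTOMS": "shortness of breath",
    },
}

def fill_stem(template: str, variables: dict) -> str:
    """Replace each bracketed source of information with its variable."""
    for source, value in variables.items():
        template = template.replace(f"[{source}]", value)
    return template

# One shared stem yields one item variant per answer in the model.
stems = {dx: fill_stem(STEM_TEMPLATE, vs) for dx, vs in VARIABLES.items()}
print(stems["myocardial infarction"])
```

With two tasks (diagnosis and management) and four answers per task, the same loop would yield the eight question variants implied by the Figure 1 grid.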
Text Box 5. Two examples of both a diagnosis and a management question associated with the same stem.
A 60-year-old man presents to the Emergency Department with a sudden onset of retrosternal chest pain. He describes it as a heaviness in his chest that radiates
to the jaw. It is associated with nausea. On physical examination, heart rate is 60/minute, blood pressure is 100/80 mmHg, respiratory rate is 16/minute and oxygen saturation
is 92% on room air. There are a few bibasilar crackles and the JVP is 4 cm above the sternal angle. His ECG is shown in the referenced image (ECG would show ST elevation
in the inferior leads).
Which one of the following is the most likely diagnosis?
(1) Pericarditis
(2) Myocardial infarction*
(3) Aortic dissection
(4) Pulmonary embolism
OR
Which one of the following is the most important step in managing this patient?
(1) Percutaneous coronary intervention*
(2) Intravenous labetalol
(3) Pantoprazole infusion
(4) Thoracotomy
A 60-year-old man presents to the Emergency Department with a sudden onset of retrosternal chest pain. He describes it as a tearing pain that radiates to the
back. It is associated with shortness of breath. On physical examination, heart rate is 100/minute, blood pressure is 172/82 mmHg in the right arm and 150/60 mmHg in the
left arm, respiratory rate is 18/minute and oxygen saturation is 95% on room air. There is a diastolic murmur heard over the 3rd left intercostal space. His ECG is shown in the
referenced image (ECG would be normal).
Which one of the following is the most likely diagnosis?
(1) Aortic dissection*
(2) Pulmonary embolism
(3) Myocardial infarction
(4) Pancreatitis
OR
Which one of the following is the most important step in managing this patient?
(1) Intravenous labetalol*
(2) Pantoprazole infusion
(3) Heparin infusion
(4) Percutaneous coronary intervention
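The final assembly step that yields items like those in Text Box 5 can also be sketched in code. This is a hypothetical illustration, not the authors' system: the `Item` class, its field names, and the shuffling policy are my own choices. Each item pairs a filled stem with a task-specific lead-in, the keyed answer, and distractors drawn from the model.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Item:
    """One A-type MCQ: stem, lead-in question, keyed answer, distractors."""
    stem: str
    lead_in: str
    correct: str
    distractors: list
    options: list = field(default_factory=list)

    def __post_init__(self):
        # Present the keyed answer among the distractors in random order.
        self.options = [self.correct] + list(self.distractors)
        random.shuffle(self.options)

item = Item(
    stem="A 60-year-old man presents ... (stem filled in from the model)",
    lead_in="Which one of the following is the most likely diagnosis?",
    correct="Myocardial infarction",
    distractors=["Pericarditis", "Aortic dissection", "Pulmonary embolism"],
)

for number, option in enumerate(item.options, start=1):
    print(f"({number}) {option}")
```

A management item for the same stem would reuse the class with the management lead-in from Text Box 3 and the management distractors from Text Box 4.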
References

Downing SM, Haladyna TM. 2009. Validity and its threats. In: Downing SM, Yudkowsky R, editors. Assessment in health professions education. 1st ed. New York: Routledge. p. 21–55.
Gierl MJ, Lai H. 2013. Evaluating the quality of medical multiple-choice items created with automated processes. Med Educ. 47:726–733.
Gierl MJ, Lai H, Pugh D, Touchie C, Boulais AP, De Champlain A. 2015. Evaluating the psychometric properties of generated test items. Appl Meas Educ. In press.
Gierl MJ, Lai H, Turner SR. 2012. Using automatic item generation to create multiple-choice test items. Med Educ. 46:757–765.
Haladyna TM. 2013. Developing and validating test items. New York: Routledge.
Haladyna TM, Downing SM. 1993. How many options is enough for a multiple-choice item? Educ Psychol Meas. 53:999–1010.
Haladyna TM, Downing SM, Rodriguez MC. 2002. A review of multiple-choice item-writing guidelines for classroom assessment. Appl Meas Educ. 15:309–333.
Holmboe E, Sherbino J, Long D, Swing S, Frank J. 2010. The role of assessment in competency-based medical education. Med Teach. 32:676–682.
Lai H, Gierl MJ, Pugh D, Touchie C, Boulais AP, De Champlain A. 2016. Using automated item generation to improve the quality of MCQ distractors. Teach Learn Med. [Epub ahead of print]. doi:10.1080/10401334.2016.1146608.
Mills C, Steffen M. 2000. The GRE computer adaptive test: operational issues. In: van der Linden WJ, Glas CAW, editors. Computerized adaptive testing: theory and practice. Norwell: Kluwer. p. 75–99.
Rogausch A, Hofer R, Krebs R. 2010. Rarely selected distractors in high-stakes medical multiple-choice examinations and their recognition by item authors: a simulation and survey. BMC Med Educ. 10:85.
Rudner L. 2010. Implementing the Graduate Management Admission Test computerized adaptive test. In: van der Linden WJ, Glas CAW, editors. Elements of adaptive testing. New York: Springer. p. 151–165.
Schuwirth LWT, van der Vleuten CPM. 2004. Different written assessment methods: what can be said about their strengths and weaknesses? Med Educ. 38:974–979.
Schuwirth LWT, van der Vleuten CPM. 2012. The use of progress testing. Perspect Med Educ. 1:24–30.
Tarrant M, Ware J. 2008. Impact of item-writing flaws in multiple-choice questions on student achievement in high-stakes nursing assessments. Med Educ. 42:198–206.
Touchie C. 2010. Guidelines for the development of multiple-choice questions; [cited 2015 Nov 10]. Available from: http://mcc.ca/wp-content/uploads/multiple-choice-question-guidelines.pdf