Cognitive Interviewing Training Guide
Gordon Willis, Ph.D.
February 15, 2012
PREFACE
This guide is a “third generation” product; it is based on the document Cognitive Interviewing:
A Training Guide, by G. Willis, Research Triangle Institute (1999), which was itself based on
Cognitive Interviewing and Questionnaire Design: A Training Manual, by G. Willis, National
Center for Health Statistics (NCHS Cognitive Methods Staff Working Paper #7, March 1994).
At each revision point, I have borrowed heavily from the earlier version, while also attempting
to “change with the times” as appropriate. In any of these incarnations, the document describes
the cognitive interviewing techniques appropriate for questionnaire development and testing, as
originally developed and practiced at NCHS during the 1990's. Note that there are a variety of
organizations, laboratories, and individual researchers who conduct cognitive interviewing, and
it is well documented that practitioners follow a variety of procedures that may diverge from the
approach I take. I make very limited attempts in this manual to document these varied
procedures. Rather, I simply present what I do, and leave it to the reader to consider alternative
approaches and to develop his or her own personal style.
I would very much like to thank the late Trish Royston and Deborah Trunzo, both formerly of
NCHS, who are responsible for much of what I know about this area.
SECTION 1. INTRODUCTION AND BACKGROUND TO COGNITIVE
INTERVIEWING TECHNIQUES
There is an old adage suggesting that “Everybody talks about the weather, but nobody does
anything about it.” To stretch an analogy, many researchers have talked and written about
cognitive interviewing, but few have provided instruction as to how this activity is actually
carried out. Therefore, the overall purpose of this manual is to provide very explicit instruction
in a number of areas that are relevant to the use of cognitive interviewing techniques for the
purpose of testing and developing survey questions (note that there is also a different type of
“cognitive interviewing” that is mainly used in the law enforcement field to enhance the retrieval
of information by eyewitnesses). A historical review of the topic makes clear that the cognitive
interviewing approach to evaluating sources of response error in survey questionnaires has
origins that reach back to the mid 20th century, well before the advent of cognitive psychology as
a dominant influence in the social sciences. However, the procedures to be described in this
manual were, for the most part, inspired and developed during the 1980's through an
interdisciplinary effort by survey methodologists and psychologists (generally labeled CASM, or
the Cognitive Aspects of Survey Methods). In order to define terms and provide an overview, I
first present some general features of the approach that I will endeavor to describe:
a) The focus is on the questionnaire, rather than the whole survey. In the form
presented here, cognitive interviewing mainly emphasizes the development of the
questionnaire, rather than the entire survey administration process (for example, I do not
consider the choice of administration procedure, such as in-person versus telephone).
However, to the extent that administration mode in particular influences question design,
cognitive testing must take this into account.
c) Terminology: Subject versus Respondent. For the conduct of the cognitive interview,
volunteer subjects are recruited, and are interviewed in a laboratory environment or in
some other private location. The term subject refers to an individual who is tested
through a cognitive interviewing procedure, and respondent defines someone who is
interviewed in a fielded survey, usually after cognitive pretesting has been completed.
d) Use of specialized recruitment. Depending on the nature of the questionnaire, we
often target cognitive interviewing towards persons with specific characteristics
of interest (for example, the elderly, those who have used illicit drugs in the past 12
months, teenagers who have used chewing tobacco, etc.). The capacity to utilize targeted
recruitment is one of the characteristic features of the cognitive interviewing approach.
In the following sections I describe cognitive interviewing in “down in the trenches” detail; this
is intended to benefit readers who would like to apply this technique themselves. Near the end I
attempt to address the issue of evaluation of cognitive techniques, and to suggest likely future
directions.
SECTION 2. THEORETICAL BACKGROUND

This section is brief, as cognitive interviewing experience and literature reviews have led me to
conclude that cognitive interviewing is not, in fact, heavily dependent on a sophisticated body of
theory.1 Further, it may be quite possible to refer to the activity without making explicit
reference to cognition (e.g., "Intensive" Interviewing - see Royston, 1989). However, it is
useful to consider the general cognitive underpinnings of the activity, as this does provide a
useful heuristic framework by which to guide our interviewing practices.
The background theory underlying cognitive interviewing has been represented by various
models (see Jobe and Herrmann, 1996). The most general model is attributable to Tourangeau
(1984), and in brief, consists of the following processes:2

1 Note that this may not necessarily be a positive feature - several authors have criticized
cognitive interviewers for not developing a richer theory to guide practice (see Conrad & Blair, 1996;
Forsyth & Lessler, 1991).

2 Interestingly, the roots of such cognitive modeling predate the "cognitive revolution" in survey
methods. In particular, see Lansing, Ginsberg, and Braaten (1961) as an interesting historical precursor.
1) Comprehension of the question
a) Question intent: What does the respondent believe the question to be asking?
b) Meaning of terms: What do specific words and phrases in the question mean to
the respondent?
2) Retrieval from memory of relevant information

b) Recall strategy: What types of strategies are used to retrieve information? For
example, does the respondent tend to count events by recalling each one
individually, or does he or she use an estimation strategy?
3) Decision/Judgment processes
a) Motivation: Does the respondent devote sufficient mental effort to answer the
question accurately and thoughtfully?
4) Response processes
Mapping the response: Can the respondent match his or her internally generated
answer to the response categories given by the survey question?
For survey questions that are non-trivial, the question-answering process may be very complex,
and necessarily involves a number of cognitive steps. Some of these processes are conscious,
but some are automatic, so that the respondent is not aware of their operation. The cognitive
processes used to answer survey questions may also vary, depending on the type of question
asked. Autobiographical questions may place a heavy burden on retrieval processes (e.g., "For
how many years have you smoked cigarettes every day?"), whereas questions that are
sensitive (for example, "Have you ever smoked marijuana?") may place more demands on the
respondent's decision processes (e.g., the particular decision about whether to check the YES
box for that item).
It is very helpful to keep in mind even the very general cognitive processes listed above, when
conducting cognitive interviewing, or evaluating potential modifications to survey questions.
That is, the interviewer should actively consider questions like the following:
a) Will the respondent understand the terms contained in the question?
b) Is the question itself too long for the respondent to even remember, let alone answer?
c) Will people be likely to remember, or to even know, the answer to the question?
d) Will the respondent know what the response categories are that he/she is supposed to
use to answer the question?
Readers can argue that much of this is common sense, as opposed to the application of theory,
and I admit considerable sympathy for this viewpoint. However, I have found that an explicit
cognitive focus, even at this general level, helps cognitive interviewers to communicate about
findings and problems, and also orients their attention toward the general classes of problems
that they should be attempting to detect and rectify.
An objection that I have sometimes heard, after presenting the general cognitive model of the
survey response process, is that the researcher cannot really tell what laboratory subjects are
thinking - we have no direct means by which to read minds, as opposed to simply listening to
what the subject decides to tell us. This issue has received much general debate in the cognitive
literature (see Ericsson and Simon, 1980). For current purposes, I will attempt to defuse the
argument by simply suggesting that researchers who apply cognitive interviewing techniques
recognize that they cannot know in an absolute sense what transpires in a respondent’s mind as
he or she answers a survey question. Rather, the cognitive interviewer’s goal is to prompt the
individual to reveal information that provides clues and suggests hypotheses as to the types of
processes mentioned above, and in particular, how problems that relate to these processes may
be influenced by the way in which the questions are written. The manner in which we try to go
about this is discussed next.
SECTION 3. OVERVIEW OF BASIC TECHNIQUES: THINK-ALOUD AND
VERBAL PROBING

There are two major approaches to cognitive interviewing methods, referred to as think-aloud
interviewing, and verbal probing techniques.3 These are described in turn.
The Use of Think-Aloud Interviewing
The think-aloud interview derives from psychological procedures described by Ericsson and
Simon (1980), and was advocated for use in the testing of survey questionnaires by Loftus
(1984). Consistent with common usage (see Willis, et al., 1999), I use the term think-aloud to
describe a very specific type of activity, in which subjects are explicitly instructed to "think
aloud" as they answer the tested survey questions. The interviewer reads each question to the
subject, and then records and/or otherwise notes the processes that subject uses in arriving at an
answer. The interviewer interjects little else, except to say "tell me what you're thinking" when
the subject pauses. For example, a portion of a think-aloud interview might consist of the
following:
INTERVIEWER (reading survey question to be tested): How many times have you talked
to a doctor in the last 12 months?
SUBJECT: I guess that depends on what you mean when you say “talked.” I talk to my
neighbor, who is a doctor, but you probably don’t mean that. I go to my doctor about
once a year, for a general check-up, so I would count that one. I’ve also probably been
to some type of specialist a couple of more times in the past year - once to get a bad knee
diagnosed, and I also saw an ENT about a chronic coughing thing, which I’m pretty sure
was in the past year, although I wouldn’t swear to it. I’ve also talked to doctors several
times when I brought my kids in to the pediatrician - I might assume that you don’t want
that included, although I really can’t be sure. Also, I saw a chiropractor, but I don’t
know if you’d consider that to be a doctor in the sense you mean. So, what I’m saying,
overall, is that I guess I’m not sure what number to give you, mostly because I don’t
know what you want.
From this think-aloud protocol, the interviewer may observe that the individual attempts to
answer this question by recalling each visit individually, rather than by estimating. It
might be concluded that the subject has trouble determining whether a visit was really in the last
12 months. If, after interviewing several subjects, it becomes clear that none could really think
through with confidence the number of times they had been to a doctor, one might decide that
the reference period is simply too long to provide adequate answers. More significantly, the
larger problem here seems to be that the subject is clearly unsure about what is to be included
and excluded from the question, as far as both a) whether this refers only to doctor contacts that
pertain to his/her health, and b) the type of physician or other provider that is to be counted.
3 This document covers the major techniques used, rather than the full range. For a comprehensive taxonomy of
procedures, see Forsyth and Lessler (1991).
Generally, the interviewer must teach the subject how to perform the think-aloud procedure, and
this training typically involves some practice at the start of an interview. One training approach
that has sometimes been used is the following:
"Try to visualize the place where you live, and think about how many windows there are
in that place. As you count up the windows, tell me what you are seeing and thinking
about."
Depending on how well the subject responds to this exercise, further training may be necessary,
prior to beginning the core part of the interview.
The think-aloud approach has several potential drawbacks:

a) Need for subject training. Because thinking aloud is somewhat unusual for most
people, the technique may require a non-trivial amount of preliminary training of
subjects, in order to elicit a sufficient amount of think-aloud behavior. Such training
may eat into the amount of productive time that can be devoted to the interview itself.
b) Subject resistance. Even given training, many individuals are simply not proficient
at thinking aloud. In particular, they tend to answer the questions that are asked,
without elaboration.
c) Burden on subject. Related to the point above, the think-aloud activity places the
main burden on the subject. The alternative, as described next, is to place more of the
relative burden on the cognitive interviewer.
d) Tendency for the subject to stray from the task. Under think-aloud, the subject
controls the nature of much of the elaborative discussion. Therefore, it is very easy for a
free-associating individual to wander completely off-track, and to spend a significant
amount of time on one question, often delving into irrelevant areas, so that the
interviewer must struggle to “bring the subject back.” In general, the think-aloud
technique results in relatively few survey questions being tested within a particular
amount of time, relative to alternative approaches (again, see the discussion that follows).
The Use of Verbal Probing Techniques

As an alternative to the think-aloud, verbal probing is the basic technique that has
increasingly come into favor among cognitive researchers (Willis, et al., 1999). After the
interviewer asks the survey question, and the subject answers it, the interviewer then follows up
by asking for other, specific information relevant to the question or to the specific answer given.
In general, the interviewer "probes" further into the basis for the response. The following table
contains basic categories of cognitive probes, and an example of each:
Comprehension/Interpretation probe:  What does the term "outpatient" mean to you?

Paraphrasing4:  Can you repeat the question I just asked in your own words?

Confidence judgment:  How sure are you that your health insurance covers drug and alcohol
treatment?

Recall probe:  How do you remember that you went to the doctor five times in the past 12
months?

Specific probe:  Why do you think that cancer is the most serious health problem?

General probe5:  How did you arrive at that answer? I noticed that you hesitated - tell me what
you were thinking.
Interestingly, verbal probing is not new - in particular, Cantril (1944) described this practice and
gave examples of probe questions virtually identical to those contained in the table above, well
before it became fashionable to employ the term “cognition” when evaluating survey questions.
Verbal probing offers several advantages:

a) Control of the interview. The use of targeted probing to guide the subject tailors the
interchange in a way that is controlled mainly by the interviewer. This practice avoids a
good deal of discussion that may be irrelevant and non-productive. Further, the
interviewer can focus on particular areas that appear to be relevant as potential sources of
response error.
b) Ease of training of the subject. It is fairly easy to induce subjects to answer probe
questions, as these probes often do not differ fundamentally from the survey question
they are otherwise answering. In fact, subjects will sometimes begin to expect probes,
and to offer their own spontaneous thoughts and critiques, so that the interview comes to
resemble a think-aloud.

4 Paraphrasing has been classified by other authors as a specific type of cognitive method, apart from cognitive
interviewing (see Forsyth and Lessler, 1991). In practice, to the degree that one chooses to simply make use of each
method as appropriate, such nomenclature differences have few serious practical implications.

5 Note that the probe "tell me what you were thinking" is virtually identical to the general practice used in
think-aloud interviewing to elicit responding. From this perspective, to the extent that the interviewer uses this type of
probe when conducting a cognitive interview, the "think-aloud" can be conceptualized as a variety of verbal probing.
Verbal probing also has potential drawbacks:

a) Artificiality. Occasionally the criticism is made that the validity of verbal probing
techniques is suspect, because the interjection of probes by interviewers may produce a
situation that is not a meaningful analog to the usual survey interview, in which the
interviewer simply administers questions, and the respondent answers them. However,
note that the verbal probing technique is certainly no more unrealistic than the alternative
of thinking aloud. Further, this criticism may not be particularly relevant; the basic
purpose of the pretest cognitive interview is very different from that of the fielded
interview (the former analyzes questions, the latter collects data). Alternatively, one
might consider making use of retrospective probing (see below).
b) Potential for Bias. A related criticism is that the use of probes may lead the
respondent to particular types of responses. This is of course possible, but can be
minimized through the careful selection of non-leading probing techniques that minimize
bias. For example, in conducting probing, rather than suggesting to the subject one
possibility ("Did you think the question was asking just about physicians?”), it is
preferable to list all reasonable possibilities ("Did you think the question was asking only
about physicians, or about any type of health professional?"). In other words, probes
should be characterized by unbiased phrasing, in the same manner that survey questions
themselves are intended to be.6
My preference (in case this isn’t already clear) is to emphasize verbal probing over think-aloud,
and it appears that current practice, at least within U.S. Federal agencies and contract research
organizations, typically involves much more probing than classical think-aloud. However, note
that these activities are not mutually exclusive - I find it helpful to ask subjects to think aloud as
much as possible (we do want to get them to be talkative), but do not hesitate to jump in with
probe questions whenever I feel this to be appropriate. Some people are naturally gifted think-
alouders, and in this case the best we can do is listen to them, while also trying to keep the
interview moving along. In other cases (the memory that springs to my mind is of male teenage
smokers), the notion of “think-aloud” is completely alien to the subject, and we are lucky to
motivate any type of behavior other than monosyllabic verbal responses and facial/body
gestures. Rather than being dogmatic, I suggest using a flexible approach to probing that can be
adjusted to the particular subject.
To the degree that the interviewer takes on the responsibility of applying probe questions, these
can vary in a number of fundamental ways, as follows.
6 Paradoxically, one might suggest the need for cognitive testing of probe questions themselves
to determine if they are in fact free from response error, which of course leads to an endlessly recursive
logical loop.
Concurrent versus retrospective probing
Two general approaches to probing are: a) concurrent probing, and b) retrospective probing.
With concurrent probing, the interchange is characterized by: a) the interviewer asking the
survey question, b) the subject answering the question, c) the interviewer asking a probe
question, d) the subject answering the probe question, and e) possibly, further cycles of (c-d).
In retrospective probing, on the other hand, the subject is asked the probe questions after the
entire interview has been administered (sometimes in a separate part of the interview known as a
“debriefing session”).
Overall, it appears that concurrent probing is more frequently used at present, mainly because
the information to be asked about is still fresh in the subject's mind at the time of the probing. It
may seem more realistic to wait and to debrief the subject by probing after the questions have
been administered (in order to avoid the potential for bias mentioned above). However, there is
then a significant danger that subjects may no longer remember what they were thinking as they
answered a question, and will instead fabricate an explanation.
Whether probing is done concurrently or retrospectively, there are two basic categories of probe
questions:
a) Pre-planned probes: These are usually for use by all interviewers, and are developed
prior to the interview.

b) Spontaneous probes: These are crafted during the course of the interview itself, as the
interviewer reacts to what the subject says or does.
Pre-planned probes are sometimes referred to as “scripted probes.” However, I would argue
that the fundamental notion is not that they are explicitly scripted, in the sense that they are read
verbatim, but that they are developed prior to the interview, due to a suspicion that a problem
may emerge, even if they are not read exactly as written. Pre-planned probes may be
interviewer-specific (that is, crafted by each individual interviewer on a custom basis), but are
more often meant to be for use by all interviewers who will be conducting interviews, and are
developed before interviewing commences by either a questionnaire development group or a
lead individual. For example, if it is anticipated that a particular term may not be universally
understood, all interviewers can be instructed to apply the probe: "What does ‘vascular’ mean to
you?" These probes are often typed directly into the questionnaire draft, and labeled explicitly
as probes to be administered immediately after the target question is presented.
Pre-planned probes are especially useful when:

a) Resources exist to plan a fairly standardized testing approach (as opposed to having
just a few hours after receiving a testable draft questionnaire in which to prepare probes).

b) Cognitive interviewers are relatively inexperienced and would benefit from the
guidance provided by a structured protocol. In fact, inexperienced interviewers often
appreciate the presence of ready-made probes, given that determining how to probe
spontaneously is somewhat of an acquired skill that requires experience to master.
Just as cognitive interviewing may involve both think-aloud and verbal probing, the most
effective interviews may consist of a combination of scripted and spontaneous probes, rather
than either by itself. By way of analogy, a cognitive interview is similar to a session with a
clinical psychologist; the therapist has certain guiding principles, and perhaps specific questions
or comments to apply during a session with the patient. However, much of the interchange
emerges spontaneously during the course of therapy. The clinical session may be approached in
ways similar to other sessions, and be somewhat scripted, but every interview is different, entails
its own developmental sequence, and makes a unique contribution as far as the diagnosis of
problems is concerned.
In order to better illustrate the above discussion of cognitive techniques, and the use of verbal
probing in particular, a list of examples of survey questions that have undergone cognitive
testing is presented below.7 Each example consists of:

1) The original form of the question;

2) A list of several probes that would be appropriate to use in testing that question;

3) A summary of the results of testing; and

4) A suggested revision, based on those results.
7 These questions were developed during the time the author worked in the Questionnaire Design Research
Laboratory at the National Center for Health Statistics, CDC, in Hyattsville, MD. The tested questions were mainly
intended for use in the National Health Interview Survey (NHIS), an in-person, household-interview-based health survey
conducted annually by NCHS.
EXAMPLE 1

1) Original form:

Has anyone in the household ever received vocational rehabilitation services from-
... The State Vocational Rehabilitation program?
2) Probes:
c) How sure are you that (person) got this type of service?
(To determine the subject's ability to recall information confidently.)
3) Results:
4) Suggested revision:
Note: The question was "decomposed", or divided up, to make it easier to understand.
The term "vocational" was also changed to the more understandable form "job".
EXAMPLE 2
2) Probes:
3) Results:
It was found that for tested individuals whose use was intermittent over a long period of
time, the question was interpreted in two distinctly different ways:
1) "How long has it been since (person) first used the (device)?” For example,
the subject may say: "since 1960, so about 30 years".
2) "For how long, overall, has (person) actually used the device since first having
it?” The subject counts up periods of use within a longer time- for example: "For
two five-year periods since 1960, so 10 years".
Note that the problem identified can be considered a type of comprehension problem, but
doesn't involve a failure of comprehension of a key term, as did the last example.
Rather, subjects simply have alternate, but reasonable, interpretations of the question
intent (a good example of basic question vagueness).
4) Suggested revision:
This required consultation with the client, in order to clarify the objective of the question.
It became clear that the desired expression was:
EXAMPLE 3
1) Original form:
About how many miles from here is the home (child) lived in before (he/she) moved to
this home?
2) Probes:
3) Results:
No one had difficulty understanding the question as posed. However, some subjects
needed to think for a fairly long time before giving an answer. Further, some subjects
struggled needlessly with the level of specificity they thought was required (for example,
deciding whether the distance was closer to 20 or to 25 miles, when this information was
ultimately irrelevant, as the interviewer would mark "1-50 miles" in either case).
The problem can be described as one involving a difficult recall task, as opposed to a
comprehension problem. A rephrasing of the question that incorporated response
alternatives was necessary to make clear to subjects the degree of precision that was
necessary in their answer.
4) Suggested revision:
EXAMPLE 4

1) Original form:
2) Probes:
3) Results:
Subjects found it very difficult to remember back to the time period specified, at the
required level of detail. In fact, it seemed that some subjects really could not even
answer this with respect to their current behavior, let alone their behavior many years
ago. Recall of information (assuming it was ever "learned" in the first place) seemed to
be the dominant problem.
As for the previous example, the cognitive interviewing staff needed to confer with the
sponsor/client to clarify question objectives. We were able to determine that use of a
broad scale of level of activity, comparing past and present behavior, would satisfy the
data objectives:
4) Suggested revision:
EXAMPLE 5

1) Original version:
During a typical work day at your job as an (occupation) for (employer), how much
time do you spend doing strenuous physical activities such as lifting, pushing, or
pulling?
[ CATEGORIES ARE CONTAINED ON A CARD SHOWN TO RESPONDENT ]
___ None
___ Less than 1 hour
___ 1-4 hours
___ 4+ hours
2) Probes:
3) Results:
Careful probing revealed that people who gave reports of 1-4 hours often were office
workers who did little or no heavy physical work. This appeared to be due to biasing
characteristics of the question; saying "none" makes one appear to be entirely "non-
physical", and is therefore somewhat socially undesirable. This problem was seen as
related to respondent decision processes, rather than to comprehension or recall. A
resolution was needed to make it easier for someone to report little work-related physical
activity:
4) Suggested revision:
The next questions are about your job as a (FILL JOB TITLE) for (FILL
EMPLOYER).
Does your job require you to do repeated strenuous physical activities such as lifting,
pushing, or pulling heavy objects?
(IF YES:) During a typical work day, how many minutes or hours altogether do you
spend doing strenuous physical activities?
Note that the results of a field-based survey experiment by Willis and Schechter (1997)
have supported the contention that the revised question form is very likely a better
expression than was the initial version (see Section 9 of this guide).
EXAMPLE 6
1) Original:
Do you believe that prolonged exposure to high levels of radon gas can cause:
                 YES   NO   Don't Know
Headaches?       ___   ___   ___
Asthma?          ___   ___   ___
Arthritis?       ___   ___   ___
Lung Cancer?     ___   ___   ___
Other cancers?   ___   ___   ___
2) Probes:
3) Results:
Simple observation of subjects made it clear that this question is difficult to answer.
Subjects required a long time to respond to each item, and tended to be unsure about
several of the items. Further, probing revealed that the format encouraged a "guessing"
strategy, rather than actual retrieval of information. Finally, people who do not
believe that exposure to radon is harmful found it very tedious, and sometimes even
offensive, to be repeatedly asked about the specific harmful effects of radon.
In this case, it appeared that the subject's decision processes were excessively burdened
by the phrasing of the question.
4) Suggested revision:
Do you believe that prolonged exposure to radon is unhealthy, or do you believe that it
has little or no effect on health?
The revised phrasing provides the respondent with a way to indicate, just once, that he or
she does not believe that radon is harmful. Then, if he/she does believe it to be harmful,
the next question simply allows him/her to "pick and choose" the items that seem
appropriate. The burden on decision processes appeared to be reduced, using this
alternative.
EXAMPLE 7
1) Original:
What is the primary reason you have not tested your home for radon?
2) Probes:
c) How much have you thought about having your home tested?
3) Results:
Although the question is easily enough understood, it was very difficult for subjects to
produce a reasonable answer, especially if they had never given the issue much thought.
Instead of simply saying "I never thought about it", or "I haven't gotten around to it",
they tried to think of answers that were more specific or defensible. Here both recall and
decision processes appeared to be operating.
The sponsor/client agreed that it was not especially useful to ask the reason that someone
had not carried out this activity.
The discussion above has focused on cognitive problems in questionnaires; that is, problems
involving the comprehension, recall, decision, or response processes necessary to adequately
answer the question. However, cognitive interviewing has several overall positive effects, in
addition to the understanding of specific cognitive processes:
a) Learning about the topic. One can explore the nature of the underlying concepts to
be measured in the survey, and the specific topical material, by relying on lab subjects as
substantive "experts". For example, no one is more knowledgeable about the topic of
illicit drug use than those individuals who have used them, and the basic logic of
questions on the use of assistive devices can best be assessed through intensive
discussions with individuals who use canes, wheelchairs, walkers, and so on. By
learning about the topic we are of course in a much better position to write meaningful
survey questions.
Note that it may require no special probing techniques to detect the types of problems mentioned
above, beyond actively attending to the possibility that they can occur.
As an addendum, some of the “non-cognitive” features described above have recently been re-
interpreted, in particular by Eleanor Gerber at the U.S. Census Bureau, as representing the
Ethnographic approach to cognitive interviewing– in which the emphasis is not so much on the
subject’s cognitive processes per se, but on cultural variables, such as belief systems and
everyday practices that determine whether or not a particular question even makes sense to ask a
respondent (Gerber, 1999).
As a concrete example, surveys of physical activity have traditionally focused mainly on leisure-
time activities most appropriate to relatively well-educated office-workers, but have not been
designed with other groups, such as lower-income Hispanic women, in mind (Ainsworth, 2000).
As a result, the questions do not tend to focus on the physical activities that such respondents do
engage in, and may therefore misrepresent their activity. The questionnaire-based flaw is not
primarily due to limitations in the cognitive processing of the question; rather, the wrong
questions are asked in the first place. I will discuss this issue further in a later section related to
future directions in the cognitive interviewing field, but point out here that cognitive
interviewing techniques may be very useful in addressing the types of cultural variables that
influence question design, and that may contribute to error in survey questions that do not allow
for cultural variation. For example, a careful cognitive interviewer might determine, through
probes which ask a female Hispanic subject to describe her day, that questions limited to
physical activities such as aerobics classes, yoga, and jogging, are grossly insufficient.
SECTION 6. PUTTING COGNITIVE INTERVIEWING IN CONTEXT
This section places the techniques described above into the broader context of conducting
cognitive testing within a real-life survey development process. Because it is useful to first
consider the general sequence of events that may occur after a questionnaire is drafted, the
following schematic diagram shows how cognitive interviewing can be incorporated into the
overall developmental and testing plan.
[Schematic diagram: incorporation of cognitive interviewing into the overall questionnaire
development and testing plan]
a) Iteration. Cognitive interviewing appears to function best with multiple small rounds
of interviewing, as opposed to a much larger single round (so, 3 rounds of 9 subjects are
normally preferable to one round of 27).
b) Time. Several critical periods are generally involved: Clearly, overall time for
cognitive testing is important, but in particular, practitioners must allow for time to
develop a testing protocol, to write up testing results, to confer with other interviewers,
and to modify the questionnaire both between rounds and subsequent to the final one.
Given the practical constraints that influence interviewing, the cognitive interviewing team must
consider a number of key issues:
How long should a cognitive interview last?

Although cognitive interviews of up to two hours are possible, a common view is that a one-
hour cognitive interview is optimal; longer periods make excessive demands on subjects (as well
as the interviewer!). In general, the interview process should be flexible, and if the
questionnaire is too long to test in one session, the interviewer should not be required to cover it
in its entirety - questionnaires can sometimes be broken up into appropriate testing pieces,
especially early in development. Predicting the amount that one will get through in the cognitive
interview is difficult, because questionnaires often have skip patterns that result in widely
varying actual questionnaire lengths for different individuals, and subjects vary in overall speed
and in the degree to which they respond in detailed ways to either the survey questions or probe
questions.
Given that one settles on a one-hour period, a question that always seems to arise is: How long
can a questionnaire be, in terms of “field minutes,” in order to be tested in an hour-long
cognitive interview? This of course varies, depending on a) the amount of probing the
interviewer does, and b) subject characteristics which control the amount of the questionnaire
that is “hit” during testing (for example, cognitive testing of a tobacco questionnaire that
involves regular smokers will increase administration time, relative to a fielded survey which
contains many non-smokers who skip most questions). As a general rule of thumb, however, I
normally suggest a 2:1 ratio between “cognitive interview time” and “field time,” so that 60
minutes of cognitive interviewing time will be used to test a questionnaire that requires 30
minutes to complete in the field.
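To make this rule of thumb concrete, the following short sketch (in Python; the function name is
my own, and the code is purely illustrative rather than any established tool) carries out the
arithmetic, under the 2:1 ratio and one-hour session length suggested above:

    # Illustrative sketch of the scheduling rule of thumb described above.
    # Assumption (from the text): cognitive interviewing takes roughly twice
    # as long as field administration of the same questions.

    def testable_field_minutes(session_minutes: float = 60,
                               cog_to_field_ratio: float = 2.0) -> float:
        """Return how many 'field minutes' of questionnaire can be covered
        in one cognitive interview session of the given length."""
        return session_minutes / cog_to_field_ratio

    print(testable_field_minutes(60))   # 30.0: an hour tests a 30-minute instrument
    print(testable_field_minutes(120))  # 60.0: a two-hour session (rarely advisable)

By the same logic, a 45-minute field questionnaire would require roughly 90 minutes of testing
time, and might therefore be split across sessions or trimmed for testing purposes.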
How much time is required for activities other than the interview itself?
Note that even though the interview itself may take only an hour, the interviewing process
requires considerably more time. In all, preparation, interviewing, and writing up results of the
interview usually take at least three hours, and sometimes considerably more. This appears to
be an area in which cognitive interviewers vary considerably. Some organizations require
interviewers to review video- or audio-recordings of every interview, and some have even had
each interview transcribed for subsequent review and analysis. Other organizations favor a
process by which the cognitive interviewer records as many comments as possible during the
interview, and then reviews recordings only as necessary for purposes of clarification.
The issue of how “intensive” our analysis of cognitive interviewing results should be has to my
knowledge not been explicitly addressed, let alone settled. My preference is to put most of my
effort and attention into a) interview preparation (reviewing the questionnaire, formulating
planned probes) and b) interview conduct (listening intently, formulating spontaneous probes,
writing careful comments). Then, in developing comments and recommendations, I study my
written notes across the interviews I have done, attempting to identify common themes in the
potential problems observed, and then to devise suggestions for modifications.
In the best of worlds I might also listen to each interview again - but this is not necessarily the
best use of limited hours in the day.
Because cognitive interviewing can be a taxing activity, especially if the interviewer is actively
listening to the respondent and formulating spontaneous probes (as opposed to just “going
through the motions” of probing) it is recommended that any individual do no more than three
interviews in a single day. There are, as always, exceptions. I have on occasion been engaged
in activities that have been very concentrated - for example, traveling to a drug-treatment clinic
to carry out as many interviews as could be completed in one day. In this case, each interviewer
carried out up to six interviews, and this proved to be an efficient use of time. I would not have
chosen to do this every day for a week, however.
Overall, I strongly believe that background credentials and degrees are much less important
than are personal approach and -- especially -- gaining specific experience with cognitive
interviewing. In general, effective cognitive interviewers:
a) Have experience in questionnaire design, and are knowledgeable about both survey
practice and about the purpose of the questionnaire to be tested. These skills are essential
when the time comes to apply the results of the interviews in revising the questionnaire.
b) Have learned the basic premises of cognitive interviewing, and are familiar with the
ways in which fundamental cognitive processes may influence the survey response.
c) Have been exposed to social science measurement phenomena such as bias, context
effects, scaling effects, and so on.
d) Have good inter-personal skills, and so are capable of putting a subject at ease and
remaining non-judgmental in approach. There is no common agreement concerning how
"professional" versus "friendly" the interviewer should be during the interview itself, in
order to obtain the best quality data (this may in part depend on the personality of the
interviewer, as well as the philosophy of the organization).
e) Finally, and not surprisingly, experience counts; the best interviewers seem to be those
who have done this a lot (see the next section, on interviewer training).
Training is one of the most important, though least well-documented, aspects of cognitive
interviewing. I have seen many cases in which investigators have reported on the use of
cognitive interviewing, but without reporting who did the interviewing, how they did it, or how
they were trained (and by whom). Given the wide variety of activities that “cognitive
interviewing” may entail, this may lead to a number of questions about the degree of similarity
in techniques across practitioners (see Willis, et al., 1999 for a review).
c) Probing. Trainee interviewers are taught the specific probing methods for use in the
interview, by means of a lecture-based training program or written guide. They can then be shown
examples of the way that probing is used to detect problems in survey questions. This
can be in both written form, and through the use of audio- and video-taped recordings of
previous interviews.
Interviewers with a field-interviewing background must in particular learn several new habits:

a) Find problems, rather than work around them. Field interviewers have typically
learned over time "to make a question work", for example, by re-wording it, so that a
confused respondent will ultimately provide a codeable response. It must be emphasized
that our task in the lab is different; to find, rather than to adjust for, flaws in the
questions.
b) Slow down. Interviewers tend to work as fast as possible in the field, usually in order
to complete a very long interview before the respondent becomes uncooperative.
Interviewers must be reminded to work at an unhurried pace in the lab.
d) Be flexible. Field interviewers are taught not to deviate from the instructions
contained in the instrument. In contrast, cognitive interviewers must be comfortable
departing from the questionnaire flow when this appears to be called for. They also must
be able to adjust to a situation in which sequencing instructions are incorrect or
completely absent, which often occurs in the testing of a draft questionnaire.
Health surveys in particular sometimes ask questions on private and personal topics such as
sexual behavior and drug use, and economic surveys often request detailed information about
income and employment history. One might ask whether it is possible to conduct intensive
probing of a question such as “Have you ever given money for sex, or received money for sex?”,
or “How many times in the past 12 months have you been drunk?” I have found that it is
possible to conduct effective cognitive testing of sensitive topics, subject to several restrictions:
b) Focus of probing. In some cases investigators have chosen not to ask the respondent
to directly answer the sensitive questions, as we are not primarily interested in their
answers. Rather, it is possible to ask about a) term interpretation ("What does 'any kind
of sex' sound like it means?"); b) recall ("Does 12 months seem ok, or is it too long to
remember how many times you've been drunk?"); or c) decision processes ("Would you
rather have an interviewer ask you these questions out loud, or use the computer?"). Of
course, the answers to these probes may be informative to the extent that we also
know how the respondent would answer the tested questions - and sometimes subjects
who would otherwise be reticent when asked directly will spontaneously volunteer
their actual behaviors in order to provide clarification. That is, an explicit focus by the
cognitive interviewer on features of the question, rather than of the individual subject,
tends to elicit discussion of the type that one would rarely expect to have with relative
strangers in the course of everyday life.
Introducing the activity: There are several features of laboratory interviewing that are important
for cognitive interviewers to understand, and that are useful to express to the subject, before
beginning a cognitive interview:
a) Focus on questions, not answers. The interviewer should stress to the subject that
he/she is not primarily collecting survey data on them, but rather testing a questionnaire
that has questions that may be difficult to understand, hard to answer, or that make little
sense.
b) Emphasis on finding problems. Make clear that although we are asking the subject to
answer the survey questions as carefully as possible, we are primarily interested in the
ways that they arrived at those answers, and the problems they encountered. Therefore,
any detailed help they can give us is of interest, even if it seems irrelevant or trivial.
Further, attribute any defects to the questions, rather than to the subject: for example,
rather than asking "Do you understand this?", phrase this instead as "Is the question
clear?"
d) Getting the subject to be critical. It also is somewhat helpful to add: "I didn't write
these questions, so don't worry about hurting my feelings if you criticize them - my job is
to find out what's wrong with them". This helps to "bring out" subjects who may
otherwise be sensitive about being overly critical.
Recording the answers to tested questions: One very subtle but critical feature of interviewing
concerns whether it is necessary to obtain quantitative answers to the questions we are
testing, or just to record notes such as: "The term 'practitioner' is technical in nature and should
be simplified.” Sometimes survey sponsors or clients seem to be very fixated on exactly “how
the subjects answered the questions,” when my view may be that “they really couldn’t answer
them, because they’re bad questions.” Cognitive interviewers do seem to vary widely in this
regard – the degree to which they emphasize both:
a) Obtaining an answer to each tested question from the subject, when conducting
interviews, and;
b) Providing aggregate data that are quantitative in nature, when writing up the results of
interviews.
The former issue -- concerning probing style -- may even have some interesting ramifications
concerning the overall type of probing that the interviewer does. Beatty, Willis, and Schechter
(1997) have described Elaborative versus Re-orienting probing; the former induces the subject
to elaborate on the answer given (“Can you tell me more about why you say that you agree?”);
whereas the latter re-focuses attention back to the question (“I just asked about your opinion on
water conservation - can you tell me what types of things come to mind when you think of
that?”). Both are legitimate examples of probes designed to obtain further information.
However, it is possible that these lead to different forms of interviewing write-ups, related to a
focus on the answers, as opposed to the questions. I will not belabor this point, but suggest that
interviewers be cognizant of this distinction in developing their own probing behaviors.
Consistent with my general approach throughout this manual, I favor a mix of both probe types.
SECTION 7. LOGISTICS: RECRUITMENT, STAFFING, PAYMENT,
AND WRITING UP INTERVIEW RESULTS
Recruitment
From a practical point of view, recruitment is the “500 pound gorilla” driving the feasibility of
cognitive interviewing. One may have on hand a questionnaire to test, highly trained
interviewers, and a ready protocol, but in order to conduct empirical testing of a questionnaire,
the most critical requirement is the recruitment of the appropriate subjects. The researcher
initially needs to identify and recruit volunteers from appropriate sub-populations for testing the
survey questionnaire, taking into account several considerations:
a) Getting the right people. Subjects either have characteristics of interest for the survey
(particular status with respect to health, work, age, sex characteristics), or they may be
"general" subjects, for questionnaires that are asked of the general population. However,
even for a questionnaire that is intended for special populations, it is worth testing the
initial screening sections on people who do not exhibit the characteristic(s) of interest.
This practice allows the interviewers to ensure that the questions do not create problems
in the majority of cases in which the questionnaire will be administered (that is, where
the respondent does not have the characteristic). As an example, a questionnaire that is
intended to identify individuals with podiatric conditions might be tested only on
individuals who answer an advertisement for "people with foot problems." However,
failure to test the screening/filter questions on individuals without foot problems could be
catastrophic. If, for example, virtually everyone answers initial screening questions (in
effect asking: “Do you have any foot problems?") in the affirmative, a large number of
inappropriate respondents might wind up passing the filter and be subjected to a series of
completely irrelevant follow-up questions. As a general rule, questionnaires that seek to
identify a particular population should be tested to determine that they adequately 1)
screen in people having the characteristic of interest (that is, they exhibit sensitivity), and
also 2) screen out those who do not (they exhibit specificity); a brief worked sketch of
this logic appears after this list.
b) Looking in the right places. Subjects are recruited through newspapers, flyers, service
agencies, and support groups. If payment will be involved, flyers and newspaper ads
should clearly emphasize this feature (monetary incentives tend to be very effective).
c) Getting a range of subjects. Statistical sampling methods are not normally used in
obtaining laboratory subjects. At most, we use a "quota" sample, in which one attempts
to obtain a range of ages, genders, and socio-economic levels, if possible.
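As referenced under point a) above, the following minimal sketch tallies sensitivity and
specificity for a screening question (in Python; the counts are hypothetical, purely for
illustration, and this is not a procedure prescribed by this guide):

    # Tally sensitivity and specificity of a screening question.
    # Each pair is (truly has foot problems, screened in by the question).
    # All data are hypothetical, for illustration only.
    results = [
        (True, True), (True, True), (True, False),     # true cases
        (False, True), (False, False), (False, False)  # non-cases
    ]

    tp = sum(1 for has, screened in results if has and screened)      # correctly screened in
    fn = sum(1 for has, screened in results if has and not screened)  # missed true cases
    fp = sum(1 for has, screened in results if not has and screened)  # wrongly screened in
    tn = sum(1 for has, screened in results if not has and not screened)

    sensitivity = tp / (tp + fn)  # proportion of true cases screened IN
    specificity = tn / (tn + fp)  # proportion of non-cases screened OUT
    print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")

A screener that virtually everyone answers in the affirmative would show high sensitivity but
very poor specificity - exactly the failure described above.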
Payment (“remuneration”)
As of 2002, the industry standard appears to be $25 - $50 for a one-hour interview, depending
mainly on how difficult it is to induce individuals to participate (and for subjects who represent
“a needle in a haystack,” even more may be necessary). The proper amount to pay depends in
part on variables such as the subjects' travel time and basic inconvenience in traveling to the
location of the interview, if this will be necessary.8 Further, this payment is enough that it is not
simply a token remuneration; this way, we are less likely to recruit only individuals who are
practiced volunteers, and who may participate mainly out of interest in the survey topic, or in
surveys in general (and who may therefore be very different from the usual household survey
respondent). However, the amount of payment should be determined by considering a number
of issues, such as the general demographic status of the types of subjects required, difficulty or
burden imposed by the interviewing task, and so on.
The “generic” cognitive interviewing procedure consists of the conduct of the cognitive
interviews in a face-to-face mode, within a cognitive laboratory environment. However, it is
also possible to conduct these interviews over the phone, once an interview has been scheduled.
It is rare that researchers will call "out of the blue" to someone selected randomly, as in a
Random-Digit-Dial telephone survey. Telephone-based interviews have been reported to be
useful for several specific purposes (see Schechter, Blair, and Vande Hey, 1996):
b) When the subjects to be interviewed are unable to travel to the interviewing location
(e.g., the elderly disabled), and where it is infeasible or costly to travel to them. In
particular, limitations in mobility due to disability should not be a factor in determining
whether an individual is eligible to be interviewed, and it makes sense to provide
flexibility in this regard, through reliance on the telephone.
Generally, in-person interviews may be preferable, overall, because this allows observation of
non-verbal cues, and provides a more natural type of interchange between the subject and the
interviewer than may be possible over the phone. However, I advocate the imaginative use of
many different testing modes (for example, one may even conduct telephone interviews within a
cognitive laboratory environment, in which the interviewer and subject are placed in different
rooms).
8 Note that interviews can also be conducted 'off-site', such as in appropriate clinics, offices of service
agencies, at churches or libraries, or even in subjects' homes. The location of the interview is not nearly as important as
the nature of the activities that are conducted. In determining the interview location, the focus should be mainly on
"what do we have to do to interview the people we need."

Staffing
Based on experience working in both “staff-structure-oriented” (Federal government) and
“project-based” (contract research organization) environments, I believe that there are is no
single type of organizational structure that is necessary in order to support an effective cognitive
interviewing capacity. It is very helpful to develop a cadre of staff members who have a history
of cognitive interviewing experience. As mentioned several times, interviewing skill is an
acquired capacity, and interviewers tend to improve with time. It also helps very much to have a
particular staff member who can be responsible for subject recruitment: placing advertisements,
developing flyers, making phone calls, scheduling, and generally monitoring interviewing
operations (Federal laboratories sometimes employ a dedicated Laboratory Manager). Further,
staff should have experience in relating to clients or sponsors of questionnaires, in order to
communicate the findings from laboratory interviews. Finally, and very importantly, staff must
have the questionnaire design experience necessary to translate laboratory findings into realistic
and practical solutions.
Related to the general issue of staffing, a question that sometimes arises is that of how many
cognitive interviewers should be employed for a particular project, or testing round. Even if the
size of the interviewing sample is small (nine or fewer), it is useful to involve several
interviewers, in order to have a variety of interviewer opinions. That is, it seems more useful to
have three interviewers each conduct three interviews apiece, than to have one interviewer
conduct nine. However, there is little direct evidence on the efficacy of the various interviews-
per-interviewer ratios that might be used, so this is another issue that is open to debate. I have
sometimes been the lone interviewer who has conducted nine interviews, and felt that this level
of involvement has been enlightening. On other occasions, two other interviewers and I have
each conducted three interviews, and I have felt reassured by the level of overall correspondence
between interviewers, or appreciative of the additional perspective gained by interviewing
colleagues (or just happy to be able to share the load!). Very often, logistical issues (such as
availability of other cognitive interviewers) end up driving decisions in this regard.
Although organizations that conduct relatively large numbers of cognitive interviews, such as
NCHS, the Bureau of Labor Statistics, the Census Bureau, Westat, and RTI, have dedicated
laboratory facilities containing video and audio equipment as well as remote observation
capability, cognitive interviewing does not require special physical environments or
sophisticated recording equipment. In fact, as mentioned above, many interviews have been
conducted outside the cognitive laboratory, such as in service organization offices or homes.
Therefore, any quiet room, such as a conference room or empty office, can serve as a
"laboratory" in which to conduct interviews. Equipment needs are also minimal; it is helpful to
have a portable tape-recorder, as it is useful to record interviews (most subjects do not object, as
long as privacy and confidentiality requirements are met9). Video-taping is also commonly done
by the permanent laboratories. If subjects are to be videotaped, it is necessary to hide the
camera, or to make it as unobtrusive as possible (although informed consent from the subject for
taping is of course still necessary). Some organizations also make use of one-way mirrors for
observation; these might also affect the interchange, however, especially when the questions that
are asked are sensitive or potentially embarrassing.
There are a variety of methods used for compiling the results from cognitive interviewing (see
Willis, 1994), and no one way is necessarily best. Some organizations may instruct interviewers
to carefully listen to a taped recording of each interview, whereas others will work only from
written notes. Some will utilize a report for each interview that was conducted, in order to
maintain the “case history” integrity of each interview; others will produce one written report
which aggregates the results across interviews, in order to provide a more summarized, question-
oriented version of the results. Finally, as discussed above, at times one may be very interested
in compiling the exact responses that subjects gave to the questions; at other times the response
data will be much less relevant, as purely qualitative information may be of greater relative
importance.
For readers desiring a specific recommendation in this regard, a fairly efficient means for
processing "data," representing a reasonable trade-off between completeness and timeliness, is
the following:
a) Each interviewer writes comments directly into a copy of the tested questionnaire,
beneath each question, as in the following example:
A1. How far do you routinely travel to get health care? Would you say less
than an hour, one to two hours, or more than two hours?
Comments:
Of the four subjects I tested, all had problems answering this question. Three of
them objected that this really varied, depending on the type of provider they’re
visiting. The fourth one stated that the answer to “how far” would be five miles;
note that the question is internally inconsistent, because the question implies a
distance, while the answer categories are all represented by amounts of time.
Finally, it wasn’t really clear what the reference period is. One subject had been
to the doctor once, in the past year or so, and so didn’t know how to handle the
“routine” part, or how far back he should go in thinking about an answer. We
really need to re-think whether we want to know how long it takes people to see
the provider they saw the most, during the past X months, or how long it takes
them when they go for a routine check-up (assuming they do), or something else
entirely.
Note that this comment is fairly involved, points out several problems, and
instead of simply suggesting a re-wording, explicitly brings out the issue of the
need for better specification of question objectives. Such a result is very
common, as a product of cognitive interviewing.
b) Comments of the type illustrated above are then further aggregated, across
interviewers and across interviews, to provide a complete picture of a particular draft of
the questionnaire (this aggregation step is sketched in programmatic form following this
list).
c) The final annotated questionnaire then becomes the main section of a cognitive
interviewing outcome report, which is prefaced with a description of the specific
purposes of testing, the nature of the subject population, a description of recruitment
methods used, the number of subjects tested, the number of interviewers used, and a
description of the specific procedures used during the interviews. The “protocol” consisting
of the tested questionnaire, along with scripted probes, can also be included as an
appendix (in which case the main report may only contain those questions for which
interviewers had specific comments). Alternatively, the probes can be included along
with question-specific comments. It is also sometimes useful to provide an overall
written summary of the most significant problems that were found, prior to the detailed
question-by-question listing.10
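To make the aggregation step in b) concrete, the following is a minimal illustrative sketch
in Python, not part of any actual laboratory system; the data structures and names are
hypothetical. Comments keyed by question identifier are pooled across interviews and
interviewers, yielding the annotated, question-oriented draft described above:

    from collections import defaultdict

    # Illustrative sketch only: pool per-interview comments, keyed by
    # question ID, into a question-oriented annotated summary.
    def aggregate_comments(interviews):
        """interviews: list of dicts mapping question ID -> interviewer comment."""
        annotated = defaultdict(list)
        for notes in interviews:
            for question_id, comment in notes.items():
                annotated[question_id].append(comment)
        return annotated

    # Hypothetical notes from three interviewers on question A1:
    interviews = [
        {"A1": "Subjects said travel time varies by type of provider."},
        {"A1": "Question says 'how far' but categories are times, not distances."},
        {"A1": "Reference period unclear for a subject with a single visit."},
    ]

    for question_id, comments in sorted(aggregate_comments(interviews).items()):
        print(question_id)
        for comment in comments:
            print("  -", comment)

In practice, of course, this compilation is done by hand, by reading and merging interviewers'
written notes; the sketch is meant only to convey the structure of the question-oriented
aggregation.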
As mentioned above, some researchers prefer to rely on standardized analysis of tape recordings
of interviews. Be cautioned, however, that this is a very time-consuming activity, and the
appropriateness of this activity depends on the nature of the testing. For production work, in
which revisions are made at a fairly quick rate, it is often not possible to devote the resources
necessary to transcribe or analyze taped interviews. In this case, reliance on written outcome
notes alone may be sufficient. Tape-recording is still valuable, however, where project staff or a
sponsor/client may want to listen to the tape to get a first-hand impression of how the
questionnaire is working. Transcription or analysis of these tapes can also be valuable for
purposes of research, in addition to strict questionnaire evaluation and development.
10 Readers should feel free to contact the author to obtain e-mailable samples of cognitive testing reports.
Again, cognitive interviewing outcome data tends to be qualitative, rather than quantitative.
Qualitative trends worth explicitly focusing on include:
a) Dominant trends: Problems that emerge repeatedly, across subjects and across
interviewers, and that therefore appear to be systematic features of the questions rather
than idiosyncrasies of particular interviews.
b) Discoveries: Even if they occur in only a single interview, there are some problems
that prove to be very important, because they can severely threaten data quality in a few
cases, or because these problems are expected to be fairly frequent in the actual survey.
Especially because of the generally small samples involved, there is very little in the way of
“truth in numbers.” That is, one must rely heavily on the interviewer’s “clinical judgment,” in
determining the implications of cognitive interview findings, as these have ramifications for the
fielded survey. For example, one might conclude that a particular interview was very
idiosyncratic, and should be ignored. Or, it may be found that the set of subjects tested was
more highly educated, on average, than the population to be surveyed. In this case, even
relatively modest levels of documented comprehension problems might motivate the designers to
attempt a simplification of the questionnaire. In general, it is dangerous to conclude, for
example, that if problems are found in 25% of lab interviews, then they are to be expected in
25% of field interviews. One must always be careful to apply a type of subjective correction
factor to the interview findings, based on knowledge of the likely differences that exist between
the subjects that were tested, and the respondents who will be surveyed. The capacity of the
interviewing and questionnaire design staff for applying judgment, adjustment, and subjective
corrections is basic to the practice of cognitive interviewing.11

11 The emphasis on subjectivity, clinical judgment, and opinion may strike some readers as undisciplined.
Note, though, that the usual alternative (in effect, the armchair crafting of survey questions) exhibits these same
problems, but on a much greater scale. The recommendation made here is not to ignore empirical evidence, but to put it
in an appropriate context when making decisions about what is likely to constitute best questionnaire design practice.
Because the focus of cognitive interviewing is the detection of questionnaire problems, there is
often a tendency to "get into testing quickly", and then deal with the problems that emerge. It is
imperative, however, that initial meetings, or some type of communication, be carried out prior
to interviewing, to make clear the objectives of the questionnaire, and that interviewers conduct
some type of technical review or appraisal of an initial draft. The placement of an Expert
Appraisal step prior to cognitive interviewing may be a particularly effective practice (Forsyth
and Lessler, 1991; Willis & Lessler, 1999). In fact, experienced cognitive interviewers can often
anticipate the types of difficulties that may be expected, prior to interviewing. Once an initial
review, and perhaps a modification, have been conducted, interviewing can commence. After a
suitable number of interviews are completed, and interviewer notes are compiled, one can
convene a group meeting to discuss findings. Two considerations bear on how many interviews
should be conducted before doing so:
a) Problem seriousness. If it becomes obvious after several interviews that there are
major problems to be rectified, then there is little benefit in conducting more interviews
before modifications are made to the questionnaire. Especially in the very early stages of
development, as few as four interviews may be sufficient to constitute a "round" of
interviewing.
b) Limitations to “beating a dead horse.” Even if it appears that more interviews should
be done, it is seldom necessary to conduct more than 12-15 interviews before meeting
or delivering comments concerning that round of interview results12, unless one is mainly
interested in obtaining quantitative data related to the answers to the survey questions
themselves. In that case, though, it might be better to conduct a small-scale field pretest,
as opposed to cognitive interviewing.

12 Researchers who are subject to OMB Clearance restrictions will be limited to the conduct of nine or fewer
interviews.
At post-interview design meetings, interviewers should discuss their findings in detail with any
questionnaire designer who has not participated in the interviewing process. As a general rule, it
is beneficial if everyone who is actively involved in the questionnaire design process, including
clients, participates in cognitive testing, even if simply as an observer. Clients or sponsors should
be encouraged to observe interviews, or to listen to tape recordings; the impact of a question that
is confusing or long-winded is very difficult to ignore when such evidence is presented. Very
often, where abstract discussions concerning the flaws contained in a questionnaire are
unconvincing, the evidence from only a few laboratory interviews can have a potent impact.
This is a point worth stressing; beyond its strength in identifying problems, a major positive
feature of the cognitive laboratory approach is the relative persuasiveness of the information
it collects.
Meetings should be used both to point out identified problems and to suggest resolutions to these
problems. An advantage of the cognitive approach is that, if one understands the basis for the
failure of a particular question, a resolution to the problem may be readily suggested. For
example, if a term is clearly not understood, the designer(s) may search for an easier-to-
understand substitute. Likewise, if it is found that a reference period for a question is far too
long for subjects to recall information with any confidence, the use of a shorter interval is in
order.
After the questionnaire has been revised, based on the comments from meetings, and on any
discussions with clients or sponsors, a new round of interviewing can be conducted to test the
changes made, and to provide additional testing of questionnaire segments that were not yet
changed. Several issues are pertinent at this stage:
a) The number of testing rounds. There is no fixed number of rounds that is appropriate.
In practice, the number conducted is usually governed by the development schedule and
available resources, and by whether successive rounds continue to uncover significant
problems.
b) Changes in the nature of interviewing over time. As noted earlier, the nature of the
interviewing rounds tends to change over the course of development of a questionnaire.
Early in the process, findings relate not only to particular question wordings, but to more
global issues, such as the appropriateness of the survey measurement of major concepts
that the questionnaire is attempting to cover. It may be determined that a class of
information is simply not available through reliance on respondent knowledge and
memory (for example, in the absence of immunization records, parents appear to have
appallingly bad knowledge of their toddlers' immunization histories). Or, it may be
determined that a vital concept, such as the diagnosis of a complex disease, is much too
complicated to be measured in a few short questions, and that a long series would
actually be required to adequately cover this level of complexity.
Once major conceptual problems have been ironed out, later rounds of interviewing tend to be
focused more exclusively on the appropriateness of individual questions (as in the examples
presented previously). Still, the unit of analysis is not necessarily the particular survey question,
apart from its context; relevant findings may cover a series of questions, and relate to clarity,
appropriateness of the series, biases due to question ordering, and so on. Hence, the focus of
testing is on both the question and the questionnaire. One of the challenges of engaging in a
useful cycle of testing and development activities is that we must be cognizant of all of these
levels, both small- and large-scale, simultaneously.
At some point, researchers and others contemplating the use of cognitive interviewing ask the
very reasonable question - “How do I know it works, or that it’s really worth the trouble?” This
is not an easy question to answer, and as such, it is becoming the focus of a considerable amount
of attention. First, there are several logical issues that can be argued, in the absence of any type
of quantitative evaluation data:
Do laboratory subjects differ from survey respondents? If so, does this render the results
invalid?
Volunteers for cognitive interviews are by definition self-selected for participation, and are
therefore clearly not representative of the survey population as a whole. Most importantly,
unless explicit steps are taken to counter the effect, laboratory volunteers may tend to be higher
in level of education than the average survey respondent. This could have important
ramifications, in that one might miss problems that occur in "real life," so that the laboratory
findings underestimate the severity of problems.
This possibility is not usually seen as a serious flaw. In general, a set of cognitive interviews
does identify a significant number of problems; it is not often reported that “the questions
worked fine - we didn’t find anything wrong.” Further, note that if a question does not “work”
in the cognitive interviews, with presumably more highly able subjects, it will almost certainly
be expected to cause problems in the field, so the significant, and often serious, problems that
surface through cognitive interviewing appear to justify the use of the technique. In any event,
proponents of cognitive interviewing do not argue that such interviewing alone should be used to
evaluate questionnaires. Rather, evaluators should also rely, where possible, on field pretests with
more representative respondents, and on additional forms of pretesting (such as the coding of the
interaction between interviewer and respondents, as described by Oksenberg, Cannell, and
Kalton, 1991).
Does it matter that the “cognitive laboratory” environment is different from that of the
field?
Assuming that one makes use of a cognitive “laboratory,” the physical environment will be
different than it is for a field interview. This is another reason why the cognitive lab is not seen
as a substitute for the field test. To see how the questionnaire works in "real-life" circumstances,
it has to be tested under field conditions, and this is worth doing, even for a small survey, with a
few test respondents. However, the extent to which the differences in question-answering
contexts between lab and field matter may depend greatly on the type of question administered.
For example, comprehension processes appear not to differ greatly between the lab and the
household; if someone does not know the location of his or her abdomen in the lab, it is doubtful
that he or she will know this when at home. Retrieval processes, similarly, will be different
between lab and field to the extent that the home environment provides cues that affect the
nature of the retrieval process. This again does not appear to be a great problem, based on the
experience of researchers who have used cognitive interviewing techniques extensively.
However, as discussed in Section 7, the situation may be much different for survey questions
that ask about sensitive topics. Here, environmental variables appear to be critical, and in fact
often seriously affect respondent decision processes (for example, in the laboratory, there is little
danger that a spouse will overhear a subject’s report of more than one sex partner in the past 12
months). Therefore, one should not generally use laboratory cognitive interviewing techniques
to attempt to directly assess how likely people will be to answer survey questions about such
activities as drug use or sexual behavior. Rather, one might use the lab only as a context for
more indirect, experimental studies, in which we interview individuals about their understanding
of questions, or about conditions they believe would be more or less likely to prompt them to
answer truthfully in a fielded survey.
Are the sample sizes from cognitive interviewing large enough?
It is sometimes argued that the cognitive approach is deficient, compared to the field pretest,
because the samples used are too small to make reasonable conclusions. There are at least three
faults in this argument:
a) The purpose of laboratory interviews is not statistical estimation. One does not
desire sample sizes large enough to supply precision in statistical estimates. Rather, we
strive to interview a variety of individuals.
b) Cognitive interviews are intensive. Each interview yields a detailed picture of how
the questions function, so even a small number of interviews can reveal problems that
are real and consequential.
c) The apparent increased sample size of the field pretest is often illusory. As discussed
previously, questionnaires often contain initial screening/filter questions, and then long
follow-up sections that apply only if one "passes" the screener. However, in cases where
respondents infrequently receive the follow-up questions, the general-population-based
field pretest tends to provide only a few cases in which these follow-up questions are
actually tested. For example, Willis (1989) found that a pretest of 300 households routed
fewer than 12 individuals to an important section on use of assistive devices (canes,
wheelchairs, etc.). On the other hand, prior laboratory testing of the same questionnaire
had specifically incorporated recruitment of subjects who would naturally screen-in to
the follow-up questions, and so the effective sample size of the lab sample turned out to
be larger than that within the field pretest.
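To make the arithmetic behind this last point concrete, here is a minimal sketch; the 4%
incidence figure is an assumption, chosen only to mirror the example above. Under a low-
incidence screener, the expected number of pretest respondents who ever reach the follow-up
section is simply the product of the sample size and the screen-in rate:

    # Illustrative sketch only: expected number of respondents routed to a
    # follow-up section, given a screener with a given screen-in rate.
    def expected_followups(n_respondents, screen_in_rate):
        return n_respondents * screen_in_rate

    # An assumed 4% rate of assistive-device use in a 300-household pretest
    # yields roughly 12 tested cases for that section...
    print(expected_followups(300, 0.04))   # 12.0

    # ...whereas 15 laboratory subjects recruited specifically to screen in
    # all receive the follow-up questions: an effectively larger test sample.
    print(expected_followups(15, 1.0))     # 15.0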
The points made above are generally argumentative in nature, rather than truly evaluative, as no
data are presented to support the contention that there is any relevance of “what happens in the
cognitive interview” to “what happens in the field environment.” Although there are few studies
that purport to directly measure what Lessler, Tourangeau, and Salter (1989) term the carry-over
from the laboratory to the field environment, Lessler et al. did demonstrate, in an initial study of
cognitive interviewing, that the types of problems found in such interviews appeared to be
consistent with findings from a field situation.
More recently, a controlled experiment designed to more directly assess the degree of carry-over
from the cognitive laboratory to the field environment was done by Willis and Schechter (1997).
The study procedure involved the production of alternative versions of survey questions, based
on the results of cognitive testing, where these results also imposed explicit predictions
concerning the nature of the data estimates that would occur from fielding of those questions.
For example, for a question on physical activity, on the basis of cognitive interviewing results, it
was predicted that the inclusion of a screener question asking about whether the respondent
engaged in any physical activity would result in lower overall levels of reporting than would a
single question which simply asked how much activity the person engaged in, as the latter
appeared to reduce the expectation that a non-zero amount should be reported. In three
subsequent field studies of various sizes (ranging from 78 to almost 1,000 respondents), involving
a range of survey types (face-to-face household, telephone, and clinical environment), these
predictions were borne out in striking fashion. Although these results are clearly not definitive,
they buttress the contention that cognitive interviews are similar enough to field interviews that
similar mental processes are involved, and that the directionality of qualitative results from
cognitive interviews can reasonably be expected to be maintained in the field environment to
which they are intended to generalize. Davis and DeMaio (1993) conducted a study related to the
use of cognitive interviews in dietary assessment that is somewhat similar in approach, and is
also supportive.
More extensive discussion of the general issues related to the evaluation of the usefulness of
cognitive interviewing, especially as it relates to other pretesting methods such as interaction
coding, is provided by Campanelli (1997), Lessler and Rothgeb (1999), Presser and Blair
(1994), and Willis et al. (1999). One ramification of these discussions appears to be that it is
not particularly useful to determine which particular pretesting technique is superior, as they are
“stacked up” against one another. Rather, different techniques may best apply for different
purposes, and at different points in the developmental sequence: one does not conduct a field
pretest on an embryonic questionnaire, and on the other hand, the final "dress rehearsal" of a
forms-designed questionnaire is not usually tested with a round of cognitive interviews. The
overall challenge in pretesting is to utilize a cohesive developmental plan that takes advantage of
the strengths of each method. If applied properly, cognitive interviewing is likely to be an
effective means for identifying potential problems before those problems are encountered
repeatedly, and too late, in the fielded survey.
SECTION 9. WHERE IS THE FIELD HEADED? OBSERVATIONS
AND SPECULATIONS
Finally, the reader may find it worth considering the ways in which new challenges in survey
methods may call on cognitive interviewing as a useful tool. In the form that I have presented it,
cognitive interviewing has been used mainly for the purpose of pretesting survey questionnaires
intended for a “demographically normative” population, administered face-to-face, by telephone,
or via self-administration, and using either computerized (CAPI, CATI) or paper-and-pencil
instruments. However, there are several issues that increasingly weigh heavily on questionnaire
design, and that may induce cognitive interviewers to continue to adapt and evolve their methods
as appropriate. So, in this final section, I will attempt to gaze into a crystal ball, in order to
anticipate these trends.
As the U.S. population (as well as that of many countries) continues to diversify, surveys are
increasingly being challenged by the need to utilize questionnaires that appropriately capture a
range of sub-populations, both in terms of a) language translation and b) cultural
appropriateness. Large population-based surveys such as the NCHS National Health Interview
Survey have for years been translated into Spanish, but the trend towards multiple-language
translation is increasing - for example, the 2001 California Health Interview Survey was also
administered in several Asian languages. For the different language versions, researchers have
begun to seriously question whether these demonstrate the characteristic of cross-cultural
equivalence, such that the survey responses obtained from different groups can be meaningfully
compared. Further, even for English-language interviews, an important requirement is that the
questionnaires be culturally appropriate to each group.
Cognitive interviewing has recently been proposed as a possible mechanism for studying cross-
cultural differences related to questionnaire design - for example, cognitive interviews in
Spanish were conducted as part of the development of the 2003 National Cancer Institute’s
Tobacco Use Supplement to the Current Population Survey. Further, several recent studies have
used a variety of quantitative and qualitative techniques, including cognitive interviewing, to
address culturally relevant aspects of survey measurement (McKay, Breslow, Sangster,
Gabbard, Reynolds, Nakamoto, and Tarnai, 1996; Pasick, Stewart, Bird, and D’Onofrio, 2001).
It is to be expected that the use of cognitive interviewing for this purpose will only increase
further, and this presents significant challenges with respect to training and interviewer selection
(given the need to find cognitive interviewers who are bilingual), logistics (how to conduct
interviewing activities with multiple groups, within a prescribed development schedule), and
evaluation (how to determine whether the cognitive interviewing results from different groups
are themselves comparable). Readers who are interested in advancing the field of cognitive
interviewing may be interested in addressing these vital questions.
Further, given that one adjustment by survey researchers to non-response has been to migrate
toward mixed-mode surveys that provide increased procedural flexibility, a further potential
contribution of cognitive interviewing may be to develop a better understanding of the effects of
mode variation. It is well known that “mode matters” – non-trivial differences in response
distributions are obtained depending on whether a telephone or mail questionnaire is used, for
the same set of questions. Cognitive interviewing involving each mode may provide meaningful
information concerning the source, and significance, of these differences.
A common assertion is that “Internet surveys are coming” – eventually. Although there are at
this point severe sampling-oriented restrictions on the use of the Internet for general-purpose,
representative surveys, many researchers look to the World Wide Web as a possibly viable
mechanism, especially for mixed-mode surveys. The use of Web questionnaires presents a
host of usability issues, and cognitive issues in general, and it seems clear that cognitive
interviewing will be important in studying these. Further, other novel computer-based
administration procedures such as ACASI (the Audio Computer-Assisted Self Interview, in
which a laptop computer presents the questionnaire aurally by use of digitized speech, rather
than a live interviewer), similarly pose a number of research questions related to
communication, human factors, and respondent cognition; as such, cognitive interviewing
techniques have begun to be used to develop and evaluate these systems, and this trend should
continue.
Rather than representing a static - or stagnant - endeavor, cognitive interviewing must be seen as
dynamic, changing with the times, and evolving as necessary. I look forward to writing a further
edition of this manual, in order to in turn document and respond to these changes.
REFERENCES
Ainsworth, B.E. (2000). Issues in the assessment of physical activity in women. Research Quarterly for Exercise and Sport, 71, 37-42.
Beatty, P., Willis, G., & Schechter, S. (1997). Evaluating the generalizability of cognitive interview findings. In Seminar on Statistical Methodology in the Public Service: Statistical Policy Working Paper 26 (pp. 353-362). Federal Committee on Statistical Methodology, Office of Management and Budget.
Campanelli, P., Martin, E., & Rothgeb, J.M. (1991). The use of respondent and interviewer debriefing studies as a way to study response error in survey data. The Statistician, 40, 253-264.
DeMaio, T.J., & Rothgeb, J.M. (1996). Cognitive interviewing techniques: In the lab and in the field. In N. Schwarz & S. Sudman (Eds.), Answering Questions: Methodology for Determining Cognitive and Communicative Processes in Survey Research (pp. 177-195). San Francisco: Jossey-Bass.
Dippo, C. (1989). The use of cognitive laboratory techniques for investigating memory retrieval errors in retrospective surveys. Proceedings of the International Association of Survey Statisticians, International Statistical Institute, pp. 323-342.
Ericsson, K.A., & Simon, H.A. (1980). Verbal reports as data. Psychological Review, 87, 215-250.
Esposito, J., & Hess, J. (1992, May). The use of interviewer debriefings to identify problematic questions on alternative questionnaires. Paper presented at the annual meeting of the American Association for Public Opinion Research, St. Petersburg, FL.
Forsyth, B., & Lessler, J.T. (1991). Cognitive laboratory methods: A taxonomy. In P. Biemer, R. Groves, L. Lyberg, N. Mathiowetz, & S. Sudman (Eds.), Measurement Errors in Surveys. New York: Wiley.
Jabine, T.B., Straf, M.L., Tanur, J.M., & Tourangeau, R. (Eds.) (1984). Cognitive Aspects of Survey Methodology: Building a Bridge Between Disciplines. Washington, DC: National Academy Press.
Jobe, J.B., & Mingay, D.J. (1991). Cognition and survey measurement: History and overview. Applied Cognitive Psychology, 5, 175-192.
Jobe, J.B., Tourangeau, R., & Smith, A.F. (1993). Contributions of survey research to the understanding of memory. Applied Cognitive Psychology, 7, 567-584.
Lansing, J.B., Ginsberg, G.P., & Braaten, K. (1961). An Investigation of Response Error. Urbana, IL: Bureau of Economic and Business Research, University of Illinois.
Lessler, J.T., & Rothgeb, J.M. (1999). Integrating cognitive research into household survey design. In M. Sirken, T. Jabine, G. Willis, E. Martin, & C. Tucker (Eds.), A New Agenda for Interdisciplinary Survey Research Methods: Proceedings of the CASM II Seminar (pp. 67-69). National Center for Health Statistics.
Lessler, J.T., & Sirken, M.G. (1985). Laboratory-based research on the cognitive aspects of survey methodology: The goals and methods of the National Center for Health Statistics study. Milbank Memorial Fund Quarterly/Health and Society, 63, 565-581.
Lessler, J.T., Tourangeau, R., & Salter, W. (1989). Questionnaire Design Research in the Cognitive Research Laboratory. Vital and Health Statistics, Series 6, No. 1; DHHS Publication No. PHS-89-1076. Washington, DC: U.S. Government Printing Office.
McKay, R., Breslow, M.J., Sangster, R.L., Gabbard, S.M., Reynolds, R.W., Nakamoto, J.M., & Tarnai, J. (1996). Translating survey questionnaires: Lessons learned. In M.T. Braverman & J.K. Slater (Eds.), Advances in Survey Research (pp. 93-104). San Francisco: Jossey-Bass.
Oksenberg, L., Cannell, C., & Kalton, G. (1991). New strategies for pretesting survey questions. Journal of Official Statistics, 7, 349-365.
Pasick, R.J., Stewart, S.L., Bird, J.A., & D’Onofrio, C.N. (2001). Quality of data in multiethnic surveys. Public Health Reports, 116 (Suppl. 1), 223-243.
Presser, S., & Blair, J. (1994). Survey pretesting: Do different methods produce different results? In P.V. Marsden (Ed.), Sociological Methodology, Vol. 24 (pp. 73-100). Washington, DC: American Sociological Association.
Royston, P., Bercini, D., Sirken, M., & Mingay, D. (1986). Questionnaire Design Research Laboratory. Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 703-707.
Schechter, S., Blair, J., & Vande Hey, J. (1996). Conducting cognitive interviews to test self-administered and telephone surveys: Which methods should we use? Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 10-17.
Sirken, M.G., Herrmann, D.J., Schechter, S., Schwarz, N., Tanur, J.M., & Tourangeau, R. (Eds.) (1999). Cognition and Survey Research. New York: Wiley.
Sudman, S., Bradburn, N.M., & Schwarz, N. (1996). Thinking About Answers: The Application of Cognitive Processes to Survey Methodology. San Francisco: Jossey-Bass.
Willis, G.B. (1994). Cognitive Interviewing and Questionnaire Design: A Training Manual. National Center for Health Statistics: Cognitive Methods Staff, Working Paper No. 7.
Willis, G., DeMaio, T., & Harris-Kojetin, B. (1999). Is the bandwagon headed to the methodological promised land? Evaluating the validity of cognitive interviewing techniques. In M. Sirken, D. Herrmann, S. Schechter, N. Schwarz, J. Tanur, & R. Tourangeau (Eds.), Cognition and Survey Research. New York: Wiley.
Willis, G.B., & Lessler, J.T. (1999). The BRFSS-QAS: A Guide for Systematically Evaluating Survey Question Wording. Research Triangle Institute.
Willis, G.B., Royston, P., & Bercini, D. (1991). The use of verbal report methods in the development and testing of survey questionnaires. Applied Cognitive Psychology, 5, 251-267.
Willis, G.B., & Schechter, S. (1997). Evaluation of cognitive interviewing techniques: Do the results generalize to the field? Bulletin de Méthodologie Sociologique, 55, 40-66.