CBCA by Albert Vrij (Review of 37 Studies) PDF
Lamb, Sternberg, Esplin, Hershkowitz, & Orbach, 1997; Pezdek & Taylor, 2000;
Ruby & Brigham, 1997; Tully, 1999; Vrij, 2000). However, the present review is
the most comprehensive as it includes more studies than previous reviews and
addresses a greater number of issues, such as interrater agreement rates; the
frequency of occurrence of the individual CBCA criteria in statements; the effect
of age, verbal ability, social ability, and interview style on CBCA scores; and
several aspects related to the Validity Checklist (another part of SVA).
as “What happened next?”) or questions (e.g., “You just mentioned a man. What
did he look like?”) are part of such techniques.
criteria (Criteria 1–13; see Figure 1) are likely to indicate genuine experiences as
they are typically too difficult to fabricate. Therefore, statements that are coherent
and consistent (logical structure), whereby the information is not provided in a
chronological time sequence (unstructured production), and that contain a sig-
nificant amount of detail (quantity of detail) are more likely to be true. Regarding
details, accounts are more likely to be truthful if they include contextual embed-
dings (references to time and space: “He approached me for the first time in the
garden during the summer holidays”), descriptions of interactions (“The moment
my mother came into the room, he stopped smiling”), reproduction of speech
(speech in its original form: “And then he asked, ‘Is that your coat?’”), unexpected
complications (elements incorporated in the statement that are somewhat unex-
pected, e.g., the child mentions that the perpetrator had difficulty with starting the
engine of his car), unusual details (details that are uncommon but meaningful,
e.g., a witness who describes that the man she met had a stutter), and superfluous
details (descriptions that are not essential to the allegation, e.g., a witness who
describes that the perpetrator was allergic to cats). Another criterion that might
indicate truthfulness is when a witness speaks of details that are beyond the
horizon of his or her comprehension, for example, when he or she describes the
adult’s sexual behavior but attributes it to a sneeze or to pain (accurately reported
details misunderstood). Finally, possible indicators of truthfulness are if the child
reports details that are not part of the allegation but are related to it (related
external associations, e.g., a witness who describes that the perpetrator talked
about the women he had slept with and the differences between them), describes
his or her feelings or thoughts experienced at the time of the incident (accounts
of subjective mental state), or describes the perpetrator’s feelings, thoughts, or
motives during the incident (attribution of perpetrator’s mental state: “He was
nervous, his hands were shaking”).
Other criteria (Criteria 14–18; see Figure 1) are more likely to occur in
truthful statements for motivational reasons. Truthful persons are not as concerned
with impression management as deceivers. Compared with truth tellers, deceivers
are keener to construct a report that they believe will make a credible
impression on others, and so they leave out information that, in their view, would
damage their image as a sincere person (Köhnken, 1999). As a result, a
truthful statement is more likely to contain information that is inconsistent with
the stereotypes of truthfulness. The CBCA list includes five of these so-called
contrary-to-truthfulness-stereotype criteria (Ruby & Brigham, 1998): spontaneous
corrections (corrections made without prompting from the interviewer: “He wore
black trousers, no, sorry, they were green”), admitting lack of memory (expressing
concern that some parts of the statement might be incorrect: “I think,” “maybe,”
“I am not sure,” etc.), raising doubts about one’s own testimony (anticipated
objections against the veracity of one’s own testimony: “I know this all sounds
really odd”), self-deprecation (mentioning personally unfavorable, self-incrimi-
nating details: “Obviously, it was stupid of me to leave my door wide open
because my wallet was clearly visible on my desk”), and pardoning the perpe-
trator (making excuses for the perpetrator or failing to blame him or her, such as
a girl who says she now feels sympathy for the defendant who possibly faces
imprisonment).
The final criterion relates to details characteristic of the offense. This criterion
interviewee, (b) interviewer’s style, and (c) coaching of the interviewee. These are
discussed in this review. A fourth external factor, verbal skills of the interviewee,
has been examined in CBCA research but is not included in the Validity Check-
list. This factor is also discussed.
Type of Studies
In an attempt to validate the assumptions of CBCA, two types of studies have
been conducted. In field studies, statements made by persons in actual cases of
alleged sexual abuse have been examined, whereas in experimental laboratory
studies, statements of participants who lied or told the truth for the sake of the
experiment have been assessed. Each paradigm has its advantages, and each
paradigm’s strength is the other’s weakness. The statements assessed in field studies have
clear forensic relevance as these are statements derived from real-life cases.
However, it is often difficult to establish the truth or falsity of these statements
beyond doubt.
Typically, criteria such as confession, polygraph results, and conviction have
been used to establish whether a statement is actually true or false. The problem
is that these criteria are often not independent from the quality of the statement
and, therefore, from CBCA scores. For example, statements were classified as
doubtful if the judge dismissed the charges in studies conducted by Esplin,
Boychuk, and Raskin (1988) and Boychuk (1991). However, a dismissal might
simply be the result of the child being unable to express convincingly to the judge
or jury what he or she had experienced; it does not necessarily imply that the child
is lying.
Another criterion often used to establish whether a statement is actually true
or false is a confession (Craig, Scheibe, Raskin, Kircher, & Dodd, 1999). How-
ever, if the only evidence against the guilty defendant is the incriminating
statement of the child, which is often the case in sexual abuse cases, it is unlikely
that the perpetrator will confess to the crime if the incriminating statement is of
poor quality because the perpetrator’s main motivation for confessing to a crime
is the perception that the evidence against him or her is strong (Moston,
Stephenson, & Williamson, 1992). On the other hand, if a false incriminating statement
is persuasive and judged to be truthful by a CBCA expert, the chances of the
innocent defendant’s obtaining an acquittal decrease dramatically, and if there is
no chance of avoiding a guilty verdict, it may be beneficial to plead guilty to
obtain a reduced penalty (Steller & Köhnken, 1989). In summary, poor-quality
(e.g., unconvincing) statements decrease the likelihood of obtaining a confession,
and high-quality (e.g., convincing) statements increase the likelihood of obtaining
a confession, regardless of whether a statement is truthful or fabricated.
Good field studies establish whether the statement is actually true or false on
the basis of criteria that are independent from the witness statement, such as DNA
evidence and medical evidence. However, that type of evidence is often not
available in real-life cases in which CBCA assessments are conducted (Steller &
Köhnken, 1989). For a discussion about difficulties in establishing whether a
statement is true or false in studies of sexual abuse, see Horowitz et al. (1996).
In experimental laboratory studies, there is no difficulty establishing whether
a statement is actually true or false, but experimental situations typically differ
from real-life situations. Recalling a film someone has just seen (a paradigm
sometimes used in laboratory studies) is different from describing a sexual abuse
experience. Therefore, because of this lack of ecological validity, Undeutsch
(1984) believed that laboratory studies are of little use in testing the accuracy of
SVA analyses. Clearly, researchers should attempt to make laboratory studies as
realistic as possible and should try to create situations that mimic elements of
actual child sexual abuse cases.
Steller (1989) has argued that experiences of sexual abuse are characterized
by three important elements: (a) personal involvement, (b) negative emotional
tone of the event, and (c) extensive loss of control over the situation. The first
element could be easily introduced into an experimental study; the latter two
elements are more difficult because of ethical constraints. A popular paradigm in
experimental CBCA research therefore is to invite participants to give an account
of a negative event that they have experienced, such as giving a blood sample,
being bitten by a dog, and so on, or to give a fictitious account of such an event
that they have not actually experienced. Obviously, the experimenter needs to
establish whether the story is actually true or fictitious, for example, by checking
with the participants’ parents, although this does not always happen in experi-
mental research (see, e.g., Ruby & Brigham, 1998).
Different studies have used different paradigms, and the paradigms used are
listed in Table 1. A distinction is made between field studies and laboratory
studies. In the laboratory studies, a further distinction is made between studies in
which respondents actually participated in an event and were asked to tell the truth
or lie about that event afterwards (active), studies in which they were shown a
video and then asked to tell the truth or lie about that video (video), studies in
which they watched a staged event and then were asked to tell the truth or lie
about that event (staged), and studies in which they were asked to tell a truthful
or fictitious story about a previous negative experience in their life (memory).
As mentioned before, CBCA was developed to evaluate statements from
children who are witnesses or alleged victims in sexual abuse cases. Many authors
still describe CBCA as a technique developed solely to evaluate statements made
by children in sexual offense trials (see, e.g., Honts, 1994; Horowitz et al., 1997).
Table 1
Differences Between Truth Tellers and Liars on CBCA Criteria

CBCA criterion
Authors  Age (years)  Event  Status  1 2 3 4 5
Field studies
Boychuk (1991)  4–16  Field  Victim  > > > > >
Craig et al. (1999)  3–16  Field  Victim
Esplin et al. (1988)  3–15  Field  Victim  > > > > >
Lamb, Sternberg, Esplin, Hershkowitz, Orbach, & Hovav (1997)  4–13  Field  Victim  — > > > >
Parker & Brown (2000)  Adult  Field  Victim  — > > — >
Laboratory studies
Akehurst et al. (2001)  7–11/adult  Active  Na  > — > — >
Colwell et al. (2002)  Adult  Staged  Witness  >
Höfer et al. (1996)  Adult  Active  Na  > — > > —
Köhnken et al. (1995)  Adult  Video  Witness  — > > —
Landry & Brigham (1992)  Adult  Memory  Victim  < > > >
Porter & Yuille (1996)  Adult  Active  Suspect  > >
Porter et al. (1999)  Adult  Memory  Victim  —
Ruby & Brigham (1998)  Adult  Memory  Victim  < > — < >
Santtila et al. (2000)  7–14  Memory  Victim  — > > — —
Sporer (1997)  Adult  Memory  Victim  > — — > —
Steller et al. (1988)  6–11  Memory  Victim  > > > —
Tye et al. (1999)  6–10  Active  Witness  — > > > —
Vrij, Edward, et al. (2000)  Adult  Video  Witness  — — > > —
Vrij & Heaven (1999)  Adult  Video  Witness  —
Vrij, Kneller, & Mann (2000)a  Adult  Video  Witness  — — > — —
Vrij et al. (in press)  5–15/adult  Active  Witness/suspect  > > > >
Winkel & Vrij (1995)  8–9  Video  Witness  > > > > >
Total (support/total number of studies ratio)  10/19 9/14 16/20 11/16 9/17
Total support in percentages  53 64 80 69 53

Note. CBCA = Criteria-Based Content Analysis; Na = participants participated in an
activity but were neither victims nor suspects; > = verbal characteristic occurs more
frequently in truthful than in deceptive statements; < = verbal characteristic occurs more
frequently in deceptive than in truthful statements; — = no relationship between the
verbal characteristic and lying/truth telling. Blank cells indicate that the verbal charac-
teristic was not investigated.
a Uninformed liars only.
Table 1 (continued)

CBCA criterion
6 7 8 9 10 11 12 13 14 15 16 17 18 19 Total
> > > > — > > — > — — — > —
>
> > > > — > > > > > — — > > >
> — — — — — — — — >
> — — — — — > > — — — — —
> — — — > — — >
> > — — > — — >
— — — — — — — > —
> — > > > < > > > — >
— — — — — — — >
— > > > < < > > — < <
> — > — — — — > —
— — — — — — — —
— > > > > > — — — — < —
> — — > — — — >
> > — — > > — — — >
>
— > — > > — — >
> — — — — >
> — > — — — >
11/16 5/15 9/17 6/17 1/8 4/10 6/15 5/14 6/17 6/13 2/11 0/6 2/5 1/2 11/12
69 33 53 35 12 40 40 36 35 46 18 0 40 50 92
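The percentage rows at the bottom of Table 1 are simply each criterion's support ratio expressed as a rounded whole percentage. A quick check for criteria 1 to 5:

```python
# Each entry is (studies supporting the criterion, studies that examined it),
# taken from the totals row of Table 1 for criteria 1-5. The percentage row
# is the ratio rounded to the nearest whole percent.
ratios = {1: (10, 19), 2: (9, 14), 3: (16, 20), 4: (11, 16), 5: (9, 17)}

support_pct = {c: round(100 * hits / total) for c, (hits, total) in ratios.items()}
print(support_pct)  # -> {1: 53, 2: 64, 3: 80, 4: 69, 5: 53}
```

These match the printed percentages (53, 64, 80, 69, and 53).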
Others, however, have advocated the additional use of the technique to evaluate
the testimonies of adults who talk about issues other than sexual abuse (Köhnken
et al., 1995; Porter & Yuille, 1996; Ruby & Brigham, 1997; Steller & Köhnken,
1989). These authors have pointed out that the underlying Undeutsch hypothesis
is restricted neither to children, witnesses, and victims nor to sexual abuse. To
shed light on this issue, I have also indicated in Table 1 whether the statements
were derived from children or adults and whether they were victims, witnesses, or
suspects. Participants who discussed a negative life event they had experienced
have been labeled as victims.
denial by the accused. As identified earlier, none of these criteria are independent
case facts.
In her subsequent study, Boychuk (1991) addressed some of these criticisms.
Statements of 75 children between the ages of 4 and 16 years were analyzed
by three raters who were masked with regard to case disposition. She also
included in her sample, apart from confirmed and doubtful groups, a third group:
likely abused. The likely abused were those without medical evidence but with
confessions by the accused or criminal sanctions from a superior court. Unfortu-
nately, in all of her analyses, including the one presented in Table 1, she combined
the confirmed group and the likely abused group. By assessing differences
between the two remaining groups on each criterion, Boychuk found fewer
significant differences than Esplin and colleagues (1988) had (see Table 1), but all
13 differences found were in the expected direction. That is, the criteria were
more often present in the confirmed cases than in the doubtful cases, which again
supports the Undeutsch hypothesis.
CBCA assessments were carried out to assess the veracity of adult rape
allegations in a field study published by Parker and Brown (2000). Differences
were found on several criteria, and all differences were in the expected direction
(see Table 1). However, this study also had serious methodological problems. For
example, the criteria for establishing the actual veracity of the statements
(“convincing evidence of rape,” with no information given as to what was meant
by this, and “corroboration in the legal sense” with a suspect being identified
or charged) are either too vague or are not independent case facts. Also,
only one evaluator examined most of the cases, and it is unclear whether that
person was masked with regard to the facts of the case or if she or he had any
background information about the cases she or he was asked to assess.
In a better controlled field study, Lamb, Sternberg, Esplin, Hershkowitz,
Orbach, and Hovav (1997) selected and analyzed the statements of 98 alleged
victims of child sexual abuse (aged 4–12 years) and included only cases in which
there was (a) evidence of actual physical contact between a known accused and
the child and (b) an element of corroboration present. Using these selection
criteria meant that many other cases needed to be disregarded as the initial sample
consisted of 1,187 interviews.1 They found fewer significant differences than
Boychuk (1991) and Esplin et al. (1988) partly because not all 19 criteria were
included in the assessment. However, again, all differences were in the expected
direction, that is, the criteria were more often present in the plausible group than
in the implausible group. Like Esplin and colleagues, Lamb et al. also calculated
the mean CBCA scores of their two groups. If a criterion was not present in the
statement, it received a score of 0; if it was present, it received a score of 1. Only
14 criteria were used in this study, which meant that the total CBCA score could
vary between 0 and 14. Significantly more criteria were present in the confirmed
1 This is a problem field researchers typically face when they use stringent selection criteria. For
example, Anson et al. (1993) could use only 23 cases that fit their selection criteria out of a sample
of 466 cases. An important issue is whether a small, selective sample means that the sample is
unrepresentative. The fact that the sample is small might affect generalization, but using stringent
selection criteria should not affect representativeness as there are no good reasons to believe that
strong independent corroborative evidence would change the nature of a child’s disclosure.
cases (6.74) than in the doubtful cases (4.85). This difference, however, is much
smaller than the difference found by Esplin and colleagues.
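The binary scoring scheme described above can be sketched in a few lines; the criterion codings below are hypothetical, not data from Lamb et al. or any other study:

```python
# Binary CBCA scoring: each assessed criterion contributes 1 if present,
# 0 if absent. With 14 criteria assessed, the total ranges from 0 to 14.
def cbca_total(presence):
    """presence: dict mapping criterion number (1-14) -> True/False."""
    return sum(1 for is_present in presence.values() if is_present)

# Hypothetical statement in which criteria 1-7 are present and 8-14 absent.
statement = {criterion: criterion <= 7 for criterion in range(1, 15)}
print(cbca_total(statement))  # -> 7
```

Group means such as the 6.74 versus 4.85 reported above are then just averages of these totals over the confirmed and doubtful statements.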
Craig et al. (1999) examined 48 statements from children between the ages of
3 and 16 years who were alleged victims of sexual abuse. A statement was
classified as confirmed if the accused made a confession and/or failed a polygraph
test. A statement was classified as highly doubtful if the child provided a detailed
and credible recantation and/or the accused passed a polygraph test, that is, when
the polygraph test suggested that the accused was innocent. In other words, this
study also did not establish independent case facts. The average CBCA score of
the confirmed cases (7.2) was slightly higher than the average score of the
doubtful cases (5.7). Only 14 criteria were used, and the scores could vary
between 0 and 14. Only total CBCA scores were examined.2
2 In a case study published by Orbach and Lamb (1999), the accuracy of a statement provided
by a 13-year-old sexual abuse victim could be established with more certainty and in greater detail
than in most other studies. Information given by the victim during the interview was compared with
an audiotaped record of that incident. The child had told her mother that her grandfather had
sexually molested her on several occasions, but the mother did not believe the allegations. When one
day the grandfather entered the bathroom while the child was listening to music played on an
audiotape recorder, she pressed the record button and recorded the sexually abusive incident that
was about to unfold. Orbach and Lamb conducted a CBCA analysis on the statement and found that
10 out of 14 criteria they assessed were present in the statement. Obviously, the results of a study
in which only one (truthful) statement is examined do not say much about the validity of CBCA.
Also, the fact that the child knew that there was audiotaped evidence of the incident might have
influenced her statement in an unspecified manner. Nevertheless, the nature and strength of the
corroborative evidence make the study worth mentioning.
typically believe that such criteria are more present when watching a video than
when reading a text (Strömwall & Granhag, 2003). Moreover, research has
demonstrated that people are better at detecting truths and lies when they read a
transcript than when they watch a video (see DePaulo, Stone, & Lassiter, 1985,
for a review). In other words, these studies deviated considerably from the normal
CBCA procedure on certain points that may have affected the CBCA judgments.
Furthermore, Table 1 shows that in both children’s and adults’ narratives, the
criteria emerged more frequently in truthful reports. Age differences were tested
for directly by including statements from both adults and children in experiments
by Akehurst, Köhnken, and Höfer (2001) and Vrij, Akehurst, Soukara, and Bull
(2002). They both found higher total CBCA scores for truth tellers than for liars
in both age groups (children vs. adults); however, they did not examine age
differences on the separate criteria, and the results presented in Table 1 are the
combined scores for adults and children.
Some criteria occurred more frequently in statements from innocent suspects
than in statements from guilty suspects in a study by Porter and Yuille (1996). Vrij
et al. (2002) are the only researchers to have directly compared statements of
suspects and witnesses. They found a higher total CBCA score for truth tellers
than for liars in both suspects and witnesses but did not examine differences on
each criterion.
These findings support the assumption that CBCA ratings are not restricted to
statements of victims and children about sexual abuse but could be used in
different contexts and with different types of interviewee. However, one should
keep in mind that CBCA assessments can be used only for statements that have
been provided in interviews in which free recall was stimulated and prompting
was kept at a minimum. Such an interview style rarely occurs in police interviews
with suspects, which means that conducting CBCA assessments on suspects’
statements would probably often be inappropriate.
Finally, the expected differences were found in CBCA scores between liars
and truth tellers in all experimental research paradigms—actual involvement,
watching a video, statements derived from memory, and so on—which is a further
indication that differences in CBCA scores are rather robust.
A look at the empirical support for each of the 19 criteria shows that Criterion
3 (quantity of detail) received the most support. The amount of detail was
calculated in 20 studies, and in 16 of those studies (80%), truth tellers included
significantly more details in their accounts than liars (see the bottom of Table 1).
Unstructured production (Criterion 2), contextual embeddings (Criterion 4), and
reproduction of conversation (Criterion 6) all received strong support as well. The
so-called motivational criteria, Criteria 14 to 18, received less support than most
cognitive criteria (1–13). In fact, Criterion 17, self-deprecation, has received no
support at all to date. This criterion has been examined in six studies. In two
studies, a significant difference between liars and truth tellers appeared, and both
times, the criterion appeared less often in the truthful statements. Berliner and
Conte (1993) pointed out that Criteria 14 to 16 require the witness to exhibit a lack
of confidence in the account as evidence for truthfulness. This, they noted,
suggests by implication that confidence diminishes the likelihood of truthfulness,
which is an implication they find disputable. As can be seen in Table 1, several
researchers did not examine Criteria 15 to 19 either because of interrater
reliability concerns (Lamb, Sternberg, Esplin, Hershkowitz, Orbach, & Hovav, 1997) or
because they believed these criteria are theoretically unrelated to the basic
memory concept embodied in the Undeutsch hypothesis (Raskin & Esplin,
1991b). Accurately reported details misunderstood (Criterion 10) and raising
doubts about one’s own testimony (Criterion 16) received little support too, perhaps,
as is shown below, because these criteria are not frequently present in statements.
The hypothesis that truth tellers would obtain a higher total CBCA score than
liars was examined in 12 studies. In 11 out of these 12 studies (92%), the
hypothesis was supported.
Table 2
Interrater Agreement Scores

CBCA criterion
Authors  Age (years)  Event  Status  1 2 3 4
Field studies
Anson et al. (1993)  4–12  Field  Victim  .65 .13 .65 .48
Boychuk (1991)  4–16  Field  Victim  >.83 >.83 >.83 >.83
Buck et al. (2002)  2–14  Field  Victim  .67 .79 .32 .77
Craig et al. (1999)  3–16  Field  Victim  >.72 >.72 >.72 >.72
Horowitz et al. (1997)a  2–19  Field  Victim  .77 .50 .58 .75
Laboratory studies
Akehurst et al. (2001)  7–11/adult  Active  Na  .34 .35 .68 .42
Colwell et al. (2002)  Adult  Staged  Witness  .83
Höfer et al. (1996)  Adult  Active  Na
Porter & Yuille (1996)  Adult  Active  Suspect  >.80 >.80
Porter et al. (1999)  Adult  Memory  Victim  >.70 .24
Santtila et al. (2000)  7–14  Memory  Victim  >.63 >.63 Nc >.63
Vrij, Edward, et al. (2000)  Adult  Video  Witness  .55 .65 .90 .85
Vrij, Kneller, & Mann (2000)b  Adult  Video  Witness  >.87 .53 >.87 >.87
Vrij et al. (2004)  5–15/adult  Active  Witness/suspect  .49 .08 .56 .76
Vrij et al. (2001a)  Adult  Video  Witness  1.00 .51 .90 .88
Winkel & Vrij (1995)  8–9  Video  Witness  >.73 >.73 >.73 >.73
Total (goodc interrater scores/number of studies ratio)  11/14 6/12 10/13 10/13 9/12
Total (percentage of good interrater agreement scores)  79 50 77 77 75

Note. CBCA = Criteria-Based Content Analysis; Na = participants participated in
activity but were neither victims nor suspects; Nc = interrater agreement was not
calculated; MAX = Maxwell’s random error coefficient of agreement; KAPPA =
Cohen’s kappa; COR = Pearson or Spearman correlations; AGREE = proportion agree-
ment. Blank cells indicate that the verbal characteristic was not investigated.
a First occasion scores only. b Uninformed liars only. c Good was defined as .60 or
higher.
(table continues)
Table 2 (continued)

CBCA criterion
Authors  5 6 7 8 9 10 11 12
Field studies
Anson et al. (1993)  .13 .65 .56 .39 .48 .83 .22 .13
Boychuk (1991)  >.83 >.83 >.83 >.83 >.83 >.83 >.83 >.83
Buck et al. (2002)  .69 .52 .79 .73 .59 .69 .55 .65
Craig et al. (1999)  >.72 >.72 >.72 >.72 >.72 >.72 >.72 >.72
Horowitz et al. (1997)a  .65 .71 .57 .48 .37 .83 .52 .57
Laboratory studies
Akehurst et al. (2001)  .44 .67 .49 .33 .55 −.04 .62
Colwell et al. (2002)
Höfer et al. (1996)
Porter & Yuille (1996)  >.80 >.80 >.80 >.80 >.80
Porter et al. (1999)
Santtila et al. (2000)  >.63 .87 >.63 >.63 >.63 >.63 >.63 >.63
Vrij, Edward, et al. (2000)  .90 .97 .77 .69 .58
Vrij, Kneller, & Mann (2000)b  >.87 >.87 >.87 >.87
Vrij et al. (2004)  .55 .52 .30 .05 .68

CBCA criterion
13 14 15 16 17 18 19 Total Type
AGREE
.78 COR
.67 >.80 >.80 COR
>.70 COR
>.63 >.63 COR
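Among the agreement indices listed in the note to Table 2 is Cohen's kappa, which corrects raw rater agreement for the agreement expected by chance. A minimal sketch (the codings below are invented for illustration, not taken from any study in the table):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement is derived from each rater's marginal coding frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two raters coding one criterion as present (1) or absent (0) in 10 statements.
rater_1 = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
rater_2 = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
print(round(cohens_kappa(rater_1, rater_2), 2))  # -> 0.58
```

Here the raters agree on 8 of 10 statements (.80 raw agreement), but because both code "present" often, chance agreement is .52, so kappa falls to .58, just below the .60 threshold that the table's note treats as good.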
Table 3
Frequency of Occurrence of the CBCA Criteria (in Percentages)
CBCA criterion
Age
Authors (years) Event Status 1 2 3 4 5
Field studies
Anson et al. (1993)  4–12  Field  Victim  91 70 74 74 48
Boychuk (1991) confirmed  4–16  Field  Victim  100 100 100 96 66
Boychuk (1991) doubtful  4–16  Field  Victim  68 40 48 44 12
Buck et al. (2002)  2–14  Field  Victim  77 13 79 97 30
Esplin et al. (1988) true  3–15  Field  Victim  100 95 100 100 100
Esplin et al. (1988) false  3–15  Field  Victim  55 15 55 35 30
Horowitz et al. (1997)a  2–19  Field  Victim  87 71 77 89 32
Lamb, Sternberg, Esplin, Hershkowitz, Orbach, & Hovav (1997) plausible  4–13  Field  Victim  100 76 97 82 62
Lamb, Sternberg, Esplin, Hershkowitz, Orbach, & Hovav (1997) implausible  4–13  Field  Victim  100 46 77 46 23
Lamers-Winkelman & Buffing (1996)  2–11  Field  Victim  82 45 100 39 31
Total (percentage of occurrence in field studies)b  86 55 85 75 41
Laboratory studies
Landry & Brigham (1992)  Adult  Memory  Victim  86 84 66 71
Tye et al. (1999) true  6–10  Active  Witness  92 92 83 75 42
Tye et al. (1999) false  6–10  Active  Witness  63 44 13 6 13
Vrij & Heaven (1999) true  Adult  Video  Witness
Vrij & Heaven (1999) false  Adult  Video  Witness
Vrij et al. (2001a)  Adult  Video  Witness  100 44 Nc 33 7
Note. CBCA = Criteria-Based Content Analysis; Nc = interrater agreement was not
calculated. Blank cells indicate that the verbal characteristic was not investigated.
a First occasion scores only. b The total scores were calculated as follows. Criteria 1–14
were scored in a total of 543 statements, and the percentages presented are the percentage
of occurrence in these 543 statements (e.g., Criterion 1 was present in 468 [86%] out of
543 statements). Criteria 15–19 were assessed in a total of 445 statements, and the
percentages presented are the percentage of occurrence in these 445 statements (e.g.,
Criterion 15 was present in 203 [46%] out of 445 statements).
Table 3 (continued)

CBCA criterion
6 7 8 9 10 11 12 13 14 15 16 17 18 19
61 24 19 57 9 28 61 17 20 37 0 15 9 65
74 64 52 50 12 42 64 10 86 54 14 16 36 76
20 8 8 24 0 0 24 4 36 52 8 4 12 56
46 14 9 28 8 25 30 5 46 31 1 5 2 43
70 70 95 100 5 90 90 40 100 75 10 25 55 100
0 0 0 5 5 0 30 0 10 35 0 0 5 30
51 29 22 36 10 56 40 18 42 49 1 5 94 97
74 33 41 4 8 4 49 16 26
46 23 15 0 15 8 38 23 8
33 16 23 22 3 42 79 15 21 50 4 3 67 68
50 27 26 29 7 32 51 13 40 46 4 7 45 69
30 43 58 54 85 39 13 4 9 13
67 0 0 75 25 17 8
13 0 0 31 13 6 0
27
5
20 27 99 80 5 3
whereas in studies with laypersons, judgments were typically made on the basis
of watching videotapes with interviewees. As mentioned before, people are better
at detecting truths and lies when they read a transcript than when they watch a
video (DePaulo et al., 1985).
Several researchers have examined the impact of CBCA training directly by
including trained and untrained judges in their samples. Unfortunately, little is
known about what kind of training is actually required to become a CBCA expert.
According to Raskin and Esplin (1991b), a 2- or 3-day workshop is advisable,
whereas Köhnken (1999) recommended a 3-week training course. Moreover,
nobody has tested whether such training actually works.3 Although it is unclear
how much training is required, it seems reasonable that the training should be
rather extensive. Making CBCA/SVA assessments is never a
straightforward task. During CBCA coding, 19 criteria, some of which are
difficult to score, need to be taken into consideration. After the CBCA coding, the
impact of numerous external factors on the final statement needs to be assessed
carefully (Steller, 1989; Wegener, 1989). It is impossible to do all this appropri-
ately without extensive training, and even a 2- or 3-day workshop might be too
short.
All studies that have examined the impact of CBCA training on accuracy
scores clearly fall short of this 2- or 3-day-workshop requirement (Akehurst, Bull,
& Vrij, 1998; Köhnken, 1987; Landry & Brigham, 1992; Ruby & Brigham, 1998;
Santtila, Roppola, Runtti, & Niemi, 2000; Steller, Wellershaus, & Wolf, 1988;
Tye, Amato, Honts, Kevitt, & Peters, 1999). The shortest training session (45
min) was given by Landry and Brigham (1992) and Ruby and Brigham (1998),
though at 90 min, Steller et al.’s (1988) training session did not last much longer.
Akehurst et al.’s (1998) session lasted 2 hr, whereas Köhnken (1987) and Santtila
et al. (2000) did not provide information about the length of their training
sessions. However, their sessions might well have been of similar length because
the content of the training sessions used in those two studies strongly resembled
the training sessions used in the other studies mentioned so far. In a typical study,
trainees are given a handout with information about CBCA criteria. A trainer then
explains the criteria in more depth and provides some examples. Trainees are then
asked to rate one or a few exercise statements, and their ratings are discussed. The
training session was slightly different in Tye et al.’s (1999) study as, rather than
training judges specifically for their experiment, they used a panel of people who
were previously trained in CBCA (no information was given about the training
these previously trained judges had received).
The results of these training studies are mixed. Several researchers have found
that trained judges were better at distinguishing between truths and lies than lay
evaluators (Landry & Brigham, 1992; Steller et al., 1988; Tye et al., 1999). Some
found no training effect (Ruby & Brigham, 1998; Santtila et al., 2000), and others
found that training made judges worse at distinguishing between truths and lies
(Akehurst et al., 1998; Köhnken, 1987). It is probably not fair to discredit CBCA
training on the basis of these findings given the lack of depth of these training
sessions. All one can conclude is that providing judges with such short training
programs has an unpredictable effect on the ability to detect truths and lies.

3 In the only field study related to this issue, Gumpert, Lindblad, and Grann (2002a) compared
expert testimony reports prepared by professionals who had a statement analysis background with
reports prepared by a more clinically oriented group often employed within child and adolescent
psychiatry. They found that the reports of the statement analysis group were generally of higher
quality (see Gumpert, Lindblad, & Grann, 2002b, for how quality was measured). Unfortunately,
this study does not reveal anything about the effectiveness of CBCA training. As the authors
acknowledged, they did not assess the accuracy of the recommendations made in the reports.
Moreover, the groups could have differed in other respects besides training.
Different ways of calculating accuracy rates. In CBCA research, accuracy
rates—the correct classifications of liars and truth tellers—are computed in three
different ways. First, CBCA scores might be subjected to statistical analyses,
typically, discriminant analysis. Although this is a sound way of calculating
accuracy rates, CBCA experts do not use such analyses in real life.
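As a minimal sketch (with invented scores, not data from any reviewed study), a discriminant classification on total CBCA scores reduces, in the one-variable case, to a threshold placed halfway between the group means:

```python
# Minimal sketch of a discriminant-style classification on total CBCA
# scores. The scores below are invented for illustration; the reviewed
# studies ran full discriminant analyses over the individual criteria.

def midpoint_threshold(truth_scores, lie_scores):
    """With one predictor and equal group variances assumed, the linear
    discriminant boundary is the midpoint of the two class means."""
    mean_truth = sum(truth_scores) / len(truth_scores)
    mean_lie = sum(lie_scores) / len(lie_scores)
    return (mean_truth + mean_lie) / 2

def accuracy(truth_scores, lie_scores, threshold):
    """Proportion of statements classified correctly: truths above the
    threshold, lies at or below it."""
    hits = sum(s > threshold for s in truth_scores)
    hits += sum(s <= threshold for s in lie_scores)
    return hits / (len(truth_scores) + len(lie_scores))

truth = [12, 10, 14, 11, 9, 13]  # hypothetical total scores, truth tellers
lies = [7, 8, 5, 10, 6, 4]       # hypothetical total scores, liars

t = midpoint_threshold(truth, lies)
print(round(accuracy(truth, lies, t), 2))  # -> 0.83
```

Note that fitting and evaluating the classification on the same set of statements, as such studies typically do, tends to flatter the resulting accuracy rate somewhat.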
A second method is by asking CBCA experts to make truth–lie classifications.
This method is more realistic as this is what happens in real life. However, it is
also highly subjective because a classification depends on an assessor’s own
interpretation of a statement. The obvious problem with subjectivity is generali-
zation. There is no guarantee that two different CBCA experts who judge the same
statements will make the same decisions. In other words, the accuracy rate
obtained by one expert in a CBCA study does not predict the accuracy rate
obtained by a second expert in the same study.
A third method is by using decision rules. In this case, the truth–lie judgment
is based on fixed rules, such as “the first five criteria should be present plus two
others” (Zaparniuk et al., 1995). The advantage of this method is that it is
objective: Different assessors who apply the same decision rule will obtain the
same accuracy rates. However, it has serious shortcomings. As I mentioned
earlier, CBCA scores depend on factors other than veracity, such as age and
interview style, and these factors are ignored when such decision rules are used.
CBCA experts are therefore opposed to the use of decision rules (Steller &
Köhnken, 1989), but researchers nevertheless sometimes use them, even in field
studies (Parker & Brown, 2000).
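The kind of rule quoted above can be made concrete in a short sketch; the presence codes below are hypothetical, and the exact operationalization used by Zaparniuk et al. (1995) may have differed:

```python
# Sketch of a fixed decision rule of the type described by Zaparniuk
# et al. (1995): classify a statement as truthful if the first five
# CBCA criteria are all present plus at least two of the remaining
# fourteen. Presence codes for the 19 criteria are hypothetical.

def zaparniuk_rule(present):
    """`present` maps criterion number (1-19) to True/False."""
    first_five = all(present[i] for i in range(1, 6))
    others = sum(present[i] for i in range(6, 20))
    return first_five and others >= 2

statement = {i: False for i in range(1, 20)}
for i in (1, 2, 3, 4, 5, 6, 8):  # criteria coded as present by a rater
    statement[i] = True

print(zaparniuk_rule(statement))  # -> True
```

Because the rule is fixed, any two assessors who apply it to the same presence codes reach the same verdict, which is the objectivity the text describes; the subjectivity simply moves into the coding of the individual criteria.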
Accuracy rates in field and laboratory studies. The only field study in which
accuracy rates were reported was conducted by Parker and Brown (2000), who found
a very high 90% overall accuracy rate (see also Table 4). Esplin et al. (1988)
found not a single overlap between the CBCA scores of confirmed and unconfirmed
cases: All scores for the unconfirmed cases were lower than any of the scores for
the confirmed cases, which implies that Esplin et al. found an even higher
(100%) accuracy rate. Although both studies showed tremendous support
for the accuracy of CBCA assessments, as discussed earlier, both studies also had
methodological flaws. I therefore prefer to disregard these results.
Regarding the remaining studies in which accuracy rates were reported (all
laboratory studies), overall accuracy rates in those studies varied from 65% to
90%, with the exception of Landry and Brigham (1992), who obtained a lower
accuracy rate. I have already given several reasons to explain their exceptional
findings—short training, short statements, watching videotapes. In addition to
this, the judges were advised to use a decision rule whereby the presence of more
than five criteria was taken as a good indication of high credibility, which is not what CBCA
experts typically do. If one disregards their findings, Table 4 reveals that accuracy
rates for truths varied between 53% and 89% and accuracy rates for lies between
60% and 100%. The average accuracy rate for truths in those studies is 73%,
similar to the average accuracy rate for lies (72%). Accuracy rates for
children do not seem to differ from accuracy rates for adults, further supporting
the view that CBCA assessments are not restricted to children's statements.
Table 4
Accuracy Rates

Authors                   Age (years)  Event   Status   Assessment      Truth (%)  Lie (%)  Total (%)
Field studies
Esplin et al. (1988)      3–15         Field   Victim   CBCA experts    100        100      100
Parker & Brown (2000)     Adult        Field   Victim   Decision rules  88         92       90
Laboratory studies
Akehurst et al. (2001)    7–11/adult   Active  NA       Discriminant    73         67       70
Akehurst et al. (2001)    7–11         Active  NA       Discriminant                        71
Akehurst et al. (2001)    Adult        Active  NA       Discriminant                        90
Höfer et al. (1996)       Adult        Active  NA       Discriminant    70         73       71
Joffe & Yuille (1992)a    6–9          Active  NA       CBCA experts                        71
Köhnken et al. (1995)     Adult        Video   Witness  Discriminant    89         81       85
Landry & Brigham (1992)   Adult        Memory  Victim   CBCA experts    75         35       55
Ruby & Brigham (1998)     Adult White  Memory  Victim   Discriminant    72         65       69

To my knowledge, Ruby and Brigham (1998) are the only researchers to have
examined the impact of ethnicity on the quality of statements (see also Vrij &
Winkel, 1991, 1994, for ethnic differences in speech style). This issue merits
attention in future studies given potential differences in narrative techniques
between different cultures (Davies, 1994b; Phillips, 1993).
Validity Checklist
To date, Validity Checklist research has concentrated on the impact of three
external factors included in the Validity Checklist (age of the interviewee,
interviewer’s style, and coaching of the interviewee) on CBCA scores.
4 Some studies did not obtain significant age effects (Akehurst et al., 2001; Tye et al., 1999).
However, in Tye et al.'s (1999) study, children's ages were not balanced for true and false
statements. The correlation between age and total CBCA score in Hershkowitz et al.'s (1997) study
was only marginally significant (p < .10).
Table 5
Frequency of Occurrence of the CBCA Criteria (in Percentages) as a Function of Age

                                            CBCA criterion
Authors             Age (years)  Event  Status    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
Field studies
Boychuk (1991)      4–5          Field  Victim   80  80  67  73  20  40  33  27  53  20  27  40   0  53  13  13   0  20  67
Boychuk (1991)      6–7          Field  Victim   87  80  73  80  33  53  27  40  33   7  20  47   7  73  47   7   7  27  73
Boychuk (1991)      8–9          Field  Victim   93  87  87  73  60  60  60  53  40   7  20  47  13  67  60  13  13  27  80
Boychuk (1991)      10–12        Field  Victim   87  73  93  80  60  53  53  40  40   0  53  47   0  73  67  20  13  27  73
Boychuk (1991)      13–16        Field  Victim  100  80  93  87  67  73  73  27  40   7  20  73  13  80  80   7  27  40  87
Buck et al. (2002)  2–3          Field  Victim   65   0  55  65  15  30   5   0   5  10   0  10   0  30   5   0   5   0  25
Buck et al. (2002)  4            Field  Victim   61   0  72  83  22  39   6   6  17  17  17  11   6  32  11   0   0   0  33
perpetrator’s mental state, and spontaneous corrections compared with the oldest
age group (aged 13–14 years old).5
5 The remaining researchers who examined age differences (Anson et al., 1993; Craig et al.,
1999; Davies et al., 2000; Hershkowitz et al., 1997; Horowitz et al., 1997; Vrij et al., 2002) did not
report age differences for individual CBCA criteria.
The CBCA experts did not notice that some participants had been coached and did
not discriminate successfully between truth tellers and coached liars. Perhaps
causing even more concern, in Vrij, Kneller, and Mann’s study, the CBCA experts
still could not indicate which statements belonged to the coached liars even after
they had been informed that some of the participants had been coached.
evaluators assess a case independently of each other. At present, this is not common
practice.6
6 The problem CBCA/SVA evaluators have to deal with—that a witness's response is influenced
not just by the veracity of a statement but also by external factors—is not unique to SVA
assessments but happens in physiological and nonverbal lie detection as well. Those latter lie-
detection techniques attempt to resolve the issue by introducing a baseline response that is a typical,
natural response of the interviewee that the lie detector knows to be a truthful response and that is
provided in circumstances similar to the response under investigation. They then compare the
baseline response with the response under investigation, and because, in that situation, the impact
of external factors on both responses is assumed to be the same, differences between the two
responses may indicate deception. However, the method is complex as creating a good baseline is
often problematic (Vrij, 2002b).
7 Horowitz (1991) pointed out that it is dangerous to form an impression about the veracity of
a statement on the basis of a child’s knowledge about sexual matters as there are no age norms for
such knowledge (Jones & McQuiston, 1989). Moreover, Gordon, Schroeder, and Abrams (1990),
who compared abused children with a matched sample of nonabused children on sexual knowledge,
found no differences between these two groups. Despite this, many professionals consider so-called
age-inappropriate sexual knowledge an important indicator of sexual abuse (Conte, Sorenson,
Fogarty, & Rosa, 1991). As mentioned before, there is not much empirical evidence to support the
idea that Criterion 10 occurs more frequently in truthful responses, perhaps because this criterion is
seldom present in statements at all.
therefore are likely to obtain low CBCA scores (i.e., young children, interviewees
with poor verbal skills, etc.) might well be in a disadvantageous position.
Legal Implications
What are the implications of these findings for the use of CBCA/SVA
assessments as scientific evidence in legal systems? A possible way to answer this
question is by examining to what extent CBCA/SVA assessments meet the criteria
that are required for admitting expert scientific evidence in criminal courts. In
Daubert v. Merrell Dow Pharmaceuticals, Inc. (1993), the United States Supreme
Court promulgated a set of guidelines for admitting expert scientific evidence in
the (American) federal courts. The following guidelines were provided by the
Supreme Court and reported and discussed by Honts (1994): (a) Is the scientific
hypothesis testable, (b) has the proposition been tested, (c) is there a known error
rate, (d) has the hypothesis and/or technique been subjected to peer review and
publication, and (e) is the theory on which the hypothesis and/or technique is
based generally accepted in the appropriate scientific community?
The answer to the first question—Is the scientific hypothesis testable?—is
yes. The Undeutsch hypothesis can be tested in scientific research, although, as
this review has revealed, this is not an easy task. The Undeutsch hypothesis can
easily be tested in experimental laboratory-based research, but the findings might
not be ecologically valid given the artificial nature of such studies. Testing the
Undeutsch hypothesis in field studies is possible in principle; however, in prac-
tice, it is difficult to establish the truth or falsity of statements beyond doubt.
The answer to the second question—Has the proposition been tested?—is also
yes, although with qualifications: Most of the relevant studies have been
experimental laboratory studies, and in most of them, adults rather than children
participated. There are very few properly conducted field studies testing the
Undeutsch hypothesis. In general, the available studies provide empirical support
for the Undeutsch hypothesis. In 11 out of 12 studies in which a total CBCA score
was calculated, the CBCA score was significantly higher for truth tellers than for
liars, which supports the Undeutsch hypothesis. When the individual criteria are
taken into account, the criteria with the strongest support (Criteria 2, 3, 4, and 6)
are all part of the cognitive component of the Undeutsch hypothesis (Criteria
1–13). Support for the motivational component of the hypothesis (Criteria 14–18)
is generally weak.
The answer to the third question—Is there a known error rate?—is no.
Admittedly, there is a known error rate for CBCA judgments made in experimental
laboratory research, which is approximately 30% for both detecting truths and
detecting lies. However, of particular interest here is the error rate of SVA
judgments in field studies. A properly conducted study examining this issue has
not been published to date. As long as the error rate in field studies is unknown,
there is no better alternative than to use the known error rate in CBCA laboratory
studies. This error rate, around 30%, is probably not an unreasonable estimate for
the accuracy of SVA judgments. There are reasons to believe that truth–lie
assessments in real-life situations are as difficult as or even more difficult than
truth–lie assessments in experimental laboratory studies. Research, reviewed in
the Validity Checklist section of this review, has demonstrated that CBCA scores
are affected not only by the veracity of the statement but also by other factors,
such as age, verbal ability, and social skills of the interviewee and the interview
style of the interviewer. In the Validity Checklist section, it has also been argued
that it is difficult in real-life situations to indicate which external factors might
have influenced the quality of the statement. Some external factors (such as
coaching of the interviewee) are difficult to detect, and interviewees who come to
know about the method might therefore dupe evaluators. Other factors (such as
social skills of the interviewee) are not included in the Validity Checklist and are
therefore likely to be ignored by evaluators. Further factors (e.g., whether the
interviewee was suggestible during the interview) are difficult to measure. Finally,
I have raised some concerns about the appropriateness of some factors (such as
looking for consistency between statements).
Moreover, it is difficult to determine the exact impact of these external factors
on a particular statement. For example, even in studies in which raters were
instructed to take the child’s age into account (Lamers-Winkelman & Buffing,
1996), CBCA scores still correlated with age. In one of the very few studies
regarding the Validity Checklist, Gumpert and Lindblad (1999) found that SVA
experts had the tendency to rely heavily on the CBCA outcomes and that a
high-quality statement was often considered to be true and a low-quality statement
was often considered to be false. The combined findings of Lamers-Winkelman
and Buffing (1996) and Gumpert and Lindblad (1999) suggest that young inter-
viewees, who naturally produce low CBCA scores, are in a disadvantageous
position. Other interviewees who naturally produce low-quality statements (such
as interviewees with poor verbal skills, socially inept interviewees, etc.) might be
in a similarly disadvantageous position. Finally, a further complication in making
SVA assessments is that some false allegations (i.e., false narratives that contain
many true elements, false memories, well-prepared lies) are difficult to detect.
In summary, although the error rates for SVA assessments in real-life cases
are unknown, incorrect decisions are likely to occur given the numerous difficul-
ties associated with making SVA assessments. If one takes the known error rate
of 30% as a guideline, then it is clear that SVA evaluators are not able to present
the accuracy of their SVA assessments as being beyond reasonable doubt, which
is the standard of proof often set in criminal courts. In other words, SVA
assessments are not accurate enough to be presented as scientific evidence in
criminal courts.
The answer to the fourth question—Has the hypothesis and/or technique been
subjected to peer review and publication?—is again yes. A growing number of
CBCA studies have now been published in peer reviewed journals, although,
again, most studies were laboratory-based studies in which the participants were
often adults rather than children.
The answer to the fifth and final question—Is the theory on which the
hypothesis and/or technique is based generally accepted in the appropriate scien-
tific community?—is probably no. As already mentioned in the introductory
section, several authors have expressed serious doubts about the method
(Brigham, 1999; Davies, 2001; Lamb, Sternberg, Esplin, Hershkowitz, Orbach, &
Hovav, 1997; Rassin, 1999; Ruby & Brigham, 1997; Wells & Loftus, 1991).
However, a proper survey, similar to the one in which scientific opinion concerning
the polygraph was examined (Iacono & Lykken, 1997), has not been published to date.
Conclusions
SVA evaluations do not meet the Daubert (1993) guidelines for admitting
expert scientific evidence in criminal courts. The two main reasons are that the
error rate is too high and that the method is not undisputed in the relevant
scientific community. Regarding the high error rate, SVA evaluators might
challenge the claim that the error rate is around 30% as this is the known error rate
for CBCA assessments made in laboratory studies rather than the error rate for
SVA evaluations made in real-life situations. However, those SVA evaluators
should realize that if the CBCA laboratory error rates are set aside, all that could then
be concluded is that the error rate is unknown, an outcome that does not meet the
Daubert guideline either.
At present, SVA evaluations are accepted as evidence in criminal courts in
several countries. In those countries, at the very least, SVA experts should present
the problems and limitations of SVA assessments in court so that judges, jurors,
prosecutors, and solicitors can make an informed decision about the validity of
SVA decisions. In addition, although the interrater agreement rates between
CBCA judges are generally adequate, they are not perfect and are likely to be
higher than the interrater agreement rates regarding the Validity Checklist. This
all clearly makes conducting SVA judgments a subjective exercise, and therefore,
more than one expert should judge each statement to establish interrater reliability
between evaluators.
However, true and fabricated stories can be detected above the level of chance
with CBCA/SVA assessments in both children and adults and in contexts other
than sexual abuse incidents, which makes such assessments a valuable tool for
police investigations. They might be useful, for example, in the initial stage of
investigation for forming rough indications of the veracity of various statements
in cases in which police detectives have different opinions about the veracity of
a statement. Thorough training in how to conduct CBCA/SVA assessments is
probably desirable given the erratic effects obtained in previous studies in which
trainees were exposed to less comprehensive training programs.
References
References marked with an asterisk indicate studies included in the literature review.
*Akehurst, L., Bull, R., & Vrij, A. (1998, September). Training British police officers,
social workers and students to detect deception in children using Criteria-Based
Content Analysis. Paper presented at the 8th European Conference of Psychology and
Law, Krakow, Poland.
*Akehurst, L., Köhnken, G., & Höfer, E. (2001). Content credibility of accounts derived
from live and video presentations. Legal and Criminological Psychology, 6, 65–83.
*Anson, D. A., Golding, S. L., & Gully, K. J. (1993). Child sexual abuse allegations:
Reliability of criteria-based content analysis. Law and Human Behavior, 17, 331–341.
Arntzen, F. (1982). Die Situation der Forensischen Aussagenpsychologie in der
Bundesrepublik Deutschland [The state of forensic psychology in the Federal Republic of
Germany]. In A. Trankell (Ed.), Reconstructing the past: The role of psychologists in
criminal trials (pp. 107–120). Deventer, the Netherlands: Kluwer.
Baldry, A. C., Winkel, F. W., & Enthoven, D. S. (1997). Paralinguistic and nonverbal
triggers of biased credibility assessments of rape victims in Dutch police officers: An
experimental study of “nonevidentiary” bias. In S. Redondo, V. Garrido, J. Perze, &
R. Barbaret (Eds.), Advances in psychology and law (pp. 163–174). Berlin, Germany:
Walter de Gruyter.
Berliner, L., & Conte, J. R. (1993). Sexual abuse evaluations: Conceptual and empirical
obstacles. Child Abuse and Neglect, 17, 111–125.
*Boychuk, T. (1991). Criteria-Based Content Analysis of children’s statements about
sexual abuse: A field-based validation study. Unpublished doctoral dissertation,
Arizona State University.
Bradford, R. (1994). Developing an objective approach to assessing allegations of sexual
abuse. Child Abuse Review, 3, 93–101.
Brigham, J. C. (1999). What is forensic psychology, anyway? Law and Human Behavior,
23, 273–298.
*Buck, J. A., Warren, A. R., Betman, S., & Brigham, J. C. (2002). Age differences in
Criteria-Based Content Analysis scores in typical child sexual abuse interviews.
Applied Developmental Psychology, 23, 267–283.
Bull, R. (1992). Obtaining evidence expertly: The reliability of interviews with child
witnesses. Expert Evidence: The International Digest of Human Behaviour Science
and Law, 1, 3–36.
Bull, R. (1995). Innovative techniques for the questioning of child witnesses, especially
those who are young and those with learning disability. In M. Zaragoza (Ed.),
Memory and testimony in the child witness (pp. 179–195). Thousand Oaks, CA: Sage.
Bull, R. (1998). Obtaining information from child witnesses. In A. Memon, A. Vrij, & R.
Bull, Psychology and law: Truthfulness, accuracy and credibility (pp. 188–210).
Maidenhead, England: McGraw-Hill.
Burgess, A. W. (1985). Rape and sexual assault: A research book. London: Garland.
Burgess, A. W., & Holmstrom, L. L. (1974). Rape: Victims of crisis. Bowie, MD: Brady.
Bybee, D., & Mowbray, C. T. (1993). An analysis of allegations of sexual abuse in a
multi-victim day-care center case. Child Abuse and Neglect, 17, 767–783.
Ceci, S. J., & Bruck, M. (1995). Jeopardy in the courtroom. Washington, DC: American
Psychological Association.
Ceci, S. J., Huffman, M. L., Smith, E., & Loftus, E. F. (1994). Repeatedly thinking about
a non-event. Consciousness and Cognition, 3, 388–407.
Ceci, S. J., Loftus, E. F., Leichtman, M. D., & Bruck, M. (1994). The possible role of
source misattributions in the creation of false beliefs among preschoolers.
International Journal of Clinical and Experimental Hypnosis, 17, 304–320.
*Colwell, K., Hiscock, C. K., & Memon, A. (2002). Interviewing techniques and the
assessment of statement credibility. Applied Cognitive Psychology, 16, 287–300.
Conte, J. R., Sorenson, E., Fogarty, L., & Rosa, J. D. (1991). Evaluating children’s reports
of sexual abuse: Results from a survey of professionals. Journal of Orthopsychiatry,
61, 428–437.
*Craig, R. A., Scheibe, R., Raskin, D. C., Kircher, J. C., & Dodd, D. H. (1999).
Interviewer questions and content analysis of children’s statements of sexual abuse.
Applied Developmental Science, 3, 77–85.
Dalenberg, C. J., Hyland, K. Z., & Cuevas, C. A. (2002). Sources of fantastic elements in
allegations of abuse by adults and children. In M. L. Eisen, J. A. Quas, & G. S.
Goodman (Eds.), Memory and suggestibility in the forensic interview (pp. 185–204).
Mahwah, NJ: Erlbaum.
Daubert v. Merrell Dow Pharmaceuticals, Inc., 113 S. Ct. 2786 (1993).
Davies, G. M. (1991). Research on children’s testimony: Implications for interviewing
Gumpert, C. H., Lindblad, F., & Grann, M. (2002a). The quality of written expert
testimony in alleged child sexual abuse: An empirical study. Psychology, Crime, and
Law, 8, 77–92.
Gumpert, C. H., Lindblad, F., & Grann, M. (2002b). A systematic approach to quality
assessment of expert testimony in cases of alleged child sexual abuse. Psychology,
Crime, and Law, 8, 59–75.
Hershkowitz, I. (1999). The dynamics of interviews yielding plausible and implausible
allegations of child sexual abuse. Applied Developmental Science, 3, 28–33.
Hershkowitz, I. (2001). Children’s responses to open-ended utterances in investigative
interviews. Legal and Criminological Psychology, 6, 49–63.
*Hershkowitz, I., Lamb, M. E., Sternberg, K. J., & Esplin, P. W. (1997). The relationships
among interviewer utterance type, CBCA scores and the richness of children’s
responses. Legal and Criminological Psychology, 2, 169–176.
*Höfer, E., Akehurst, L., & Metzger, G. (1996, August). Reality monitoring: A chance for
further development of CBCA? Paper presented at the annual meeting of the European
Association on Psychology and Law, Siena, Italy.
Honts, C. R. (1994). Assessing children’s credibility: Scientific and legal issues in 1994.
North Dakota Law Review, 70, 879–903.
Horowitz, S. W. (1991). Empirical support for Statement Validity Assessment. Behavioral
Assessment, 13, 293–313.
*Horowitz, S. W., Lamb, M. E., Esplin, P. W., Boychuk, T. D., Krispin, O., & Reiter-
Lavery, L. (1997). Reliability of Criteria-Based Content Analysis of child witness
statements. Legal and Criminological Psychology, 2, 11–21.
Horowitz, S. W., Lamb, M. E., Esplin, P. W., Boychuk, T. D., Reiter-Lavery, L., &
Krispin, O. (1996). Establishing ground truth in studies of child sexual abuse. Expert
Evidence: The International Digest of Human Behaviour Science and Law, 4, 42–52.
Iacono, W. G., & Lykken, D. T. (1997). The validity of the lie detector: Two surveys of
scientific opinion. Journal of Applied Psychology, 82, 426–433.
*Joffe, R., & Yuille, J. C. (1992, May). Criteria-Based Content Analysis: An experimental
investigation. Paper presented at the NATO Advanced Study Institute on the Child
Witness in Context: Cognitive, Social and Legal Perspectives, Lucca, Italy.
Jones, D. P. H., & McQuiston, M. (1989). Interviewing the sexually abused child. London:
Gaskell.
Kaufmann, G., Drevland, G. C., Wessel, E., Overskeid, G., & Magnussen, S. (2003). The
importance of being earnest: Displayed emotions and witness credibility. Applied
Cognitive Psychology, 17, 21–34.
Köhnken, G. (1987). Training police officers to detect deceptive eyewitness statements:
Does it work? Social Behaviour, 2, 1–17.
Köhnken, G. (1989). Behavioral correlates of statement credibility: Theories, paradigms
and results. In H. Wegener, F. Lösel, & J. Haisch (Eds.), Criminal behavior and the
justice system: Psychological perspectives (pp. 271–289). New York: Springer-
Verlag.
Köhnken, G. (1996). Social psychology and the law. In G. R. Semin & K. Fiedler (Eds.),
Applied social psychology (pp. 257–282). London: Sage.
Köhnken, G. (1999, July). Statement Validity Assessment. Paper presented at the precon-
ference program of applied courses assessing credibility organized by the European
Association of Psychology and Law, Dublin, Ireland.
Köhnken, G. (2002). A German perspective on children’s testimony. In H. L. Westcott,
G. M. Davies, & R. H. C. Bull (Eds.), Children’s testimony: A handbook of
psychological research and forensic practice (pp. 233–244). Chichester, England:
Wiley.
*Köhnken, G., Schimossek, E., Aschermann, E., & Höfer, E. (1995). The cognitive
Based Content Analysis on their ability to deceive CBCA-raters. Legal and Crimi-
nological Psychology, 5, 57–70.
Vrij, A., & Winkel, F. W. (1991). Cultural patterns in Dutch and Surinam nonverbal
behavior: An analysis of simulated police/citizen encounters. Journal of Nonverbal
Behavior, 15, 169–184.
Vrij, A., & Winkel, F. W. (1994). Perceptual distortions in cross-cultural interrogations:
The impact of skin color, accent, speech style and spoken fluency on impression
formation. Journal of Cross-Cultural Psychology, 25, 284–295.
Walker, A. G., & Warren, A. R. (1995). The language of the child abuse interview: Asking
the questions, understanding the answers. In T. Ney (Ed.), True and false allegations
in child sexual abuse: Assessment and case management (pp. 153–162). New York:
Brunner-Mazel.
Wegener, H. (1989). The present state of statement analysis. In J. C. Yuille (Ed.),
Credibility assessment (pp. 121–134). Dordrecht, the Netherlands: Kluwer.
Wells, G. L., & Loftus, E. F. (1991). Commentary: Is this child fabricating? Reactions to
a new assessment technique. In J. Doris (Ed.), The suggestibility of children’s
recollections (pp. 168–171). Washington, DC: American Psychological Association.
Winkel, F. W., & Koppelaar, L. (1991). Rape victims’ style of self-presentation and
secondary victimization by the environment. Journal of Interpersonal Violence, 6,
29–40.
*Winkel, F. W., & Vrij, A. (1995). Verklaringen van kinderen in interviews: Een
experimenteel onderzoek naar de diagnostische waarde van Criteria-Based Content
Analysis [Statements of children in interviews: An experimental study of the diagnostic
value of Criteria-Based Content Analysis]. Tijdschrift voor Ontwikkelingspsychologie, 22, 61–74.
*Yuille, J. C. (1988a, June). A simulation study of Criteria-Based Content Analysis. Paper
presented at the NATO Advanced Study Institute on Credibility Assessment, Mar-
atea, Italy.
Yuille, J. C. (1988b). The systematic assessment of children’s testimony. Canadian
Psychology, 29, 247–262.
*Zaparniuk, J., Yuille, J. C., & Taylor, S. (1995). Assessing the credibility of true and
false statements. International Journal of Law and Psychiatry, 18, 343–352.