Archives of Clinical Neuropsychology 34 (2019) 1367–1380

Wait, There’s a Baby in this Bath Water! Update on Quantitative and Qualitative Cut-Offs for Rey 15-Item Recall and Recognition
Kellie Poynter1, Kyle Brauer Boone1,*, Annette Ermshar1, Deborah Miora1, Maria Cottingham2,
Tara L. Victor3, Elizabeth Ziegler4, Michelle A. Zeller5, Matthew Wright6

1 California School of Forensic Studies, Alliant International University, Los Angeles, CA 91803, USA
2 Mental Health Care Line, Veterans Administration Tennessee Valley Healthcare System, Nashville, TN 37212, USA
3 California State University, Dominguez Hills, Carson, CA 90747, USA
4 Private Practice, Spokane, WA 99201, USA
5 West Los Angeles Veterans Administration Medical Center, Los Angeles, CA 90073, USA
6 Harbor-UCLA Medical Center, Torrance, CA 90509, USA
*Corresponding author at: Tel.: +1-310-375-5740; fax: +1-310-375-5790.
E-mail address: [email protected] (K.B. Boone)
Editorial Decision 12 October 2018; Accepted 17 October 2018
doi:10.1093/arclin/acy087

Abstract
Objective: Evaluate the effectiveness of Rey 15-item plus recognition data in a large neuropsychological sample.
Method: Rey 15-item plus recognition scores were compared in credible (n = 138) and noncredible (n = 353) neuropsychology referrals.
Results: Noncredible patients scored significantly worse than credible patients on all Rey 15-item plus recognition scores. When cut-offs
were selected to maintain at least 89.9% specificity, cut-offs could be made more stringent, with the highest sensitivity found for recognition
correct (cut-off ≤11; 62.6% sensitivity) and the combination score (recall + recognition – false positives; cut-off ≤22; 60.6% sensitivity),
followed by recall correct (cut-off ≤11; 49.3% sensitivity), and recognition false positive errors (≥3; 17.9% sensitivity). A cut-off of ≥4
applied to a summed qualitative error score for the recall trial resulted in 19.4% sensitivity. Approximately 10% of credible subjects failed
either recall correct or recognition correct, whereas two-thirds of noncredible patients (67.7%) showed this pattern. Thirteen percent of cred-
ible patients failed either recall correct, recognition correct, or the recall qualitative error score, whereas nearly 70% of noncredible patients
failed at least one of the three. Some individual qualitative recognition errors had low false positive rates (<2%) indicating that their pres-
ence was virtually pathognomonic for noncredible performance. Older age (>50) and IQ < 80 were associated with increased false positive
rates in credible patients.
Conclusions: Data on a larger sample than that available in the 2002 validation study show that Rey 15-item plus recognition cut-offs can
be made more stringent, and thereby detect up to 70% of noncredible test takers, but the test should be used cautiously in older individuals
and in individuals with lowered IQ.
Keywords: Rey 15-item plus recognition; Performance validity; Malingering

Introduction

The Rey 15-item Test (Lezak, 1995), originally developed more than 50 years ago and often viewed as outdated, in fact
continues to be commonly used as a free-standing performance validity test (PVT). Survey data from Martin, Schroeder, and
Odland (2015) and LaDuke, Barr, Brodale, and Rabin (2018) showed that among practicing neuropsychologists, nearly a
quarter (23%–24.1%) reported using the Rey 15-item Test, and in a survey of experts in performance validity assessment
(Schroeder, Martin, & Odland, 2016), a third reported using the test. Similarly, the Rey 15-item Test has been reported to be
employed by 43.8% of neuropsychologists practicing in Veterans Administration settings (Young, Roper, & Arentsen, 2016).
The original version of the Rey 15-Item Test involves recall of 15 inter-related items to which test takers are briefly
exposed. Boone, Salazar, Lu, Warner-Chacon, and Razani (2002) found that using the recommended cut-off of <9 items reproduced, specificity in a credible neuropsychology clinic sample was high (97%–100%), but sensitivity was only 47%.
They attempted to increase Rey 15-item sensitivity by adding a recognition task to follow the recall trial that consists of a
page containing the original 15 items along with 15 foils similar to the target items. The validation sample included 49
compensation-seeking noncredible patients, 36 credible non-compensation-seeking neuropsychology clinic patients, 33 learn-
ing disabled college students, and 60 non-clinical controls. A cut-off of <20 applied to an equation involving recall and recog-
nition data (recall plus recognition minus recognition false positives) resulted in 92% specificity and sensitivity of 71%.
Subsequently, Morse, Douglas-Newman, Mandel, and Swirsky-Sacchetti (2013), in examining the equation in 29 noncred-
ible patients, 63 litigating patients with valid neurocognitive performance, 36 learning disabled individuals, and 54 non-
litigating neuropsychological patients, found the equation cut-off could be raised to <21 and still maintain an acceptable false
positive rate (<8%) while achieving 70% sensitivity. In contrast, Bailey, Soble, and O’Rourke (2018), in an investigation of
the Rey 15-item plus recognition equation in a mixed clinical sample of veterans (44 who passed the Word Memory Test
[WMT] and 18 who failed), documented unacceptable specificity using the cut-off of <20 (75%) as well as lowered sensitivity (39%). However, patients with dementia and amnestic disorder were included in the credible sample, and it is unknown
whether motive to feign was present (i.e., whether patients were also seeking financial compensation concurrently with the
clinical evaluation); presence of dementia/amnesia disorder diagnosis and motive to feign could have served to lower test
specificity. Lowered specificity rates for Rey 15-item recall and recognition scores have in fact been documented in patients
with dementia (Dean, Victor, Boone, Philpott, & Hess, 2009), as well as in individuals with lowered intelligence (Dean,
Victor, Boone, & Arnold, 2008; Love, Glassmire, Zanolini, & Wolf, 2014; Marshall & Happe, 2007; Smith et al., 2014), and
in monolingual Spanish-speakers of lowered educational level (Robles, López, Salazar, Boone, & Glaser, 2015).
In summary, current research is contradictory as to appropriate cut-offs and specificity and sensitivity rates for the Rey 15-
item plus recognition. Available research has been hampered by relatively small sample sizes (e.g., n’s of 18–49 in noncred-
ible samples; n’s of 36–54 in credible comparison samples) and it is necessary to confirm the Rey 15-item plus recognition
equation specificity and sensitivity rates in a larger sample. When cut-offs are selected to allow only 10% of a small credible
sample to fail (e.g., 3–5 patients in the case of n’s of 30–50), one or two outlier scores can markedly change cut-offs.
Further, while credible patients with IQ < 70 were excluded from the Boone and colleagues (2002) and Morse and collea-
gues (2013) credible samples due to evidence that individuals with intellectual disability fail PVTs despite performing to true
ability (e.g., patients with IQs between 60 and 69 fail 44% of administered PVTs; Dean, Victor, Boone, & Arnold, 2008), it
is arguably preferable to exclude patients with IQ < 80 from comparison samples. Individuals with borderline IQ (70–79) fail
approximately 17% of PVTs administered, but once IQ exceeds 80, PVT performance across IQ levels (e.g., low average,
average, high average, superior) is comparable (<10% of PVTs administered are failed; Dean et al., 2008). Similarly, Keary
and colleagues (2013) observed much stronger relationships between IQ and PVT performance at lower IQ levels, with the
relationship diminishing as IQ increased. In a recent cross-validation of the Dot Counting Test (McCaul et al., 2018), the E-
score cut-off could be lowered (from 14.80 to 13.80), thereby increasing test sensitivity from 62% to 70%, once credible
patients with borderline IQ were excluded. These data suggest that it is optimal to develop PVT cut-offs for individuals of
low average and higher IQ, but to separately validate customized cut-offs for individuals of lower intelligence, such as was
illustrated by Smith and colleagues (2014). In the current study it was judged that excluding individuals with IQ < 80 would
allow higher cut-offs to be selected, while still maintaining adequate specificity, and thereby increasing test sensitivity.
Additionally, in the Boone and colleagues (2002) study, no attempt was made to analyze types of errors on the recall and
recognition trials; it is hypothesized that such qualitative data might add to test sensitivity in identifying noncredible perfor-
mance. Over 20 years ago Griffin and colleagues (1996) developed a qualitative scoring approach for the original Rey 15-
item Test on a simulator group (n = 90). The following qualitative errors were tabulated: (1) Roman Numeral error (the tally
marks in line 5 drawn as Roman numbers), (2) Dyslexia error (reversing “b” to “d”), (3) Within Row error (items within a
row are rearranged), (4) Between Row error (items from two or more different rows are intermixed [e.g., A 2 II]), (5) Row
Sequence error (rows are not reproduced in the sequence depicted on the stimulus card), (6) Repetition error (duplication of a
correct character or correct set of characters [e.g., AAA]), (7) Indistinct Character error (production of indistinguishable figures), (8) Gestalt error (failure to reproduce a 3 × 5 configuration when 15 items are present), (9) Wrong Item
error (production of an item not present on the stimulus card), and (10) Embellishment error (an elaboration or adornment of
the configuration or any individual item [e.g., placing a smiling face on the B]). One point was assigned for each type of qual-
itative error, regardless of the number of times the error was committed.
The qualitative scoring method was then validated in 91 “possible” malingerers seeking disability for psychological disor-
ders (likely representing a mixed group of credible and noncredible patients), 90 instructed simulators, 53 individuals in resi-
dential care judged permanently psychiatrically disabled and already receiving disability compensation (21 with
schizophrenia, 12 with mood disorder, 11 with substance abuse, and 9 with mental retardation), and 64 normal controls. The
credible disabled patients rarely committed between row, dyslexia, embellishment, gestalt, indistinct character, and Roman
numeral errors (<10%), whereas repetition errors (15.1%), row sequence errors (11.3%), within row errors (32.1%), and
wrong item errors (11.3%) were more common. The errors that were more prominent in the possible malingering group as
compared to the credible disabled group included between row errors (11.0% vs. 3.8%), gestalt errors (14.3% vs. 1.9%),
Roman numeral errors (13.2% vs. 1.9%), row sequence errors (26.4% vs. 11.3%), and wrong item errors (19.8% vs. 11.3%).
The instructed malingerers committed a relatively high percentage of between row errors (26.7%), embellishment errors
(13.3%), repetition errors (28.9%), Roman numeral errors (13.3%), row sequence errors (42.2%), within row errors (42.2%),
and wrong item errors (28.9%).
This study is intriguing but problematic methodologically. First, test instructions were modified to enhance the number of
potential scorable errors (i.e., subjects were instructed to reproduce items “just as they appear on the card”). Second, the credi-
ble patient group was substantially disabled (required residential care) and included individuals with intellectual disability. As
discussed above, Dean and colleagues (2008) found that individuals with IQ in the 60–69 range failed a large percentage of
PVTs administered, despite performing to true ability. Studies that include such low functioning credible subjects produce
cut-scores that are relatively ineffective in identifying feigning because they have to be adjusted to protect the very disabled
subjects who obtain poor scores on PVTs despite performing to true ability. A third concern is that the possible malingering
group included both credible and noncredible subjects, and as such, reported hit rates for detection of noncredible perfor-
mance are likely an underestimate of true effectiveness. Further, individuals in this group were seeking disability compensa-
tion for psychological conditions, and findings may not extrapolate to settings in which brain dysfunction specifically is being
claimed. A final consideration is that a limit was placed on the total possible qualitative score (i.e., each type of qualitative error was awarded one point no matter how many times it appeared), which likely serves to limit the true hit rate of the
technique.
Interestingly, Griffin et al.’s (1996) approach to documenting qualitative indicators on the Rey 15-item Test appears to
have been largely ignored since its publication over 20 years ago, with the exception of Love and colleagues (2014) who tab-
ulated Griffin et al.’s (1996) qualitative errors in a sample of 21 individuals involuntarily committed to an inpatient facility
for patients with intellectual disability.
The purpose of the present study was to examine both quantitative and qualitative Rey 15-Item Test scores in a large sam-
ple of credible and noncredible patients in order to provide more definitive data regarding optimal test cut-offs and associated
specificity and sensitivity levels.

Methods

Participants

Archival data were accessed from outpatients seen for neuropsychological assessment at the Neuropsychological Service at
Harbor-UCLA Medical Center and patients tested in the private forensic practice of the second author. Patients evaluated in
the former setting were referred by treating psychiatrists or neurologists for diagnostic clarification, case management, and/or
determination of appropriateness for disability compensation. Patients tested in the latter setting were either evaluated in the
context of litigation or at the request of private disability carriers. All patients were fluent in English. Use of the archival data
was approved by the IRB at Alliant International University. We did not exclude patients from the Boone and colleagues
(2002) study because qualitative errors were not examined in that publication; however, at least 303 new noncredible patients
and at least 102 new credible patients were examined in the current study.

Credible patients. The 138 patients assigned to the credible group met the following criteria: (a) no motive to feign symptoms
(not in litigation or attempting to secure disability compensation), (b) failure on one or fewer PVTs out of a total of up to nine
administered (listed with cut-offs in Table 1; note: scores from the same test were counted as a single failure; due to the clinical
nature of the data, not all PVTs were available for all patients), and (c) no FSIQ <80 or dementia diagnoses. The nine PVTs
were selected to sample a variety of neurocognitive domains, including processing speed, attention, visual perception, verbal
memory, visual memory, and motor dexterity, to ensure that differing types of neurocognitive symptom feigning were as-
sessed. Patients who failed a single PVT were retained in the credible sample given evidence that failure on a single PVT is
not unusual in credible populations (Victor, Boone, Serpa, Buehler, & Ziegler, 2009), and that the expected number of PVT
failures when nine are administered is one (Davis & Millis, 2014). It was judged that exclusion of patients with no motive to
feign and who failed a single PVT might be problematic because they likely have more actual cognitive dysfunction (leading
to the PVT failure), and removal of these individuals could result in a spurious raising of Rey 15-item plus recognition specificity rates. To more thoroughly evaluate the impact of retention of credible patients with a single PVT failure on study results, those failing versus not failing a single PVT were compared on Rey 15-item plus recognition scores (these analyses are discussed in the Results section below).

Table 1. PVTs used for group assignment, and numbers of patients completing each PVT
Processing Speed:
b Test Roberson and colleagues (2013)
E-score ≥82 (88 credible, 322 noncredible)
Dot Counting Boone and colleagues (2002)
E-score ≥17 (130 credible, 330 noncredible)
Motor Dexterity:
Finger Tapping (dominant hand) Arnold and colleagues (2005)
Men ≤35 (68 credible, 180 noncredible)
Women ≤28 (61 credible, 127 noncredible)
Visual Memory:
Rey Osterrieth Reedy and colleagues (2013)
Effort Equation ≤50 (120 credible, 302 noncredible)
Digit Symbol Kim, N., and colleagues (2010)
recognition equation ≤57 (102 credible, 229 noncredible)
Verbal Memory:
Rey Auditory Verbal Learning Test Boone and colleagues (2005); Sherman and colleagues (2002)
Effort Equation ≤12 (119 credible, 311 noncredible), or
RAVLT/RO discriminant function ≤−.40 (119 credible, 309 noncredible)
Warrington Recognition Memory Test – Words Kim, M., and colleagues (2010)
Total ≤42 (111 credible, 300 noncredible), or
Time ≥207” (94 credible, 267 noncredible)
Attention:
Digit Span Babikian, Boone, Lu, and Arnold (2006)
ACSS ≤5 (138 credible, 325 noncredible)
RDS ≤6 (137 credible, 346 noncredible)
3-digit time >2” (109 credible, 306 noncredible), or
4-digit time >4” (89 credible, 271 noncredible)
Visual Perception:
Picture Completion Solomon and colleagues (2010)
Most Discrepant Index ≤2 (106 credible, 282 noncredible)
Failures on multiple cut-offs from a single test were counted as a single failure.

Participants with a WAIS-III FSIQ of less than 80 or diagnoses of dementia were excluded due to evidence that these
groups fail PVTs at a high rate despite performing to true ability (Dean et al., 2008; Dean, Victor, Boone, Philpott, & Hess,
2009; Smith et al., 2014); retention of these participants results in cut-off scores that have lowered sensitivity. A listing of the
frequencies of final diagnoses is shown in Table 2, and demographic data are reproduced in Table 3.

Noncredible patients. The 353 subjects assigned to the noncredible group met the following criteria: (a) motive to feign
symptoms (in litigation or attempting to secure disability compensation), (b) failure on at least two independent PVTs (listed
with cut-offs in Table 1; due to the clinical nature of the data, not all PVT data were available for all patients), and (c) evi-
dence that low cognitive scores were inconsistent with normal function in activities of daily living. Unfortunately, the same
exclusion criteria (i.e., FSIQ <80 and presenting diagnoses of dementia) used for the credible group could not be employed
because these data are not accurate in an unknown percentage of compensation-seeking participants. Studies have shown that
noncredible compensation-seekers obtain much lower IQ scores than do credible patient groups without motive to feign (e.g.,
Bianchini, Mathias, Greve, Houston, & Crouch, 2001) because the former are not performing to true ability on the IQ mea-
sures. Likewise, noncredible patients can be incorrectly assigned diagnoses of dementia when they do not perform to true abil-
ity on memory testing.
The approach we used to confirm appropriateness of assignment to the noncredible group was to check for a mismatch
between low cognitive scores and evidence of normal function in ADLs (e.g., dementia-level memory scores but able to live
independently, work, drive, handle his or her own finances, etc.). If such a mismatch was present, the participant was retained
in the noncredible group. However, if individuals had verifiable evidence of low cognitive function and adaptive failures out-
side of the evaluation context that could account for their PVT failure (e.g., not able to live independently, had never held
employment or been able to drive, had a guardian or conservator, etc.), they were excluded from the noncredible group.

Table 2. Frequencies of diagnoses by group


Diagnosis Credible (n = 138) Noncredible (n = 353)

Anoxia 4 11
R/O Anoxia 1 —
Anxiety/Panic Disorder 5 4
Asperger’s Syndrome (R/O) 1 —
Attention Deficit Disorder 5 —
R/O Attention Deficit Disorder 1 1
Bipolar Disorder 7 1
Brain Tumor/Abscess 2 2
Chronic Fatigue — 2
Chronic Pain — 7
Cognitive Disorder NOS 2 7
Dementia (Vascular) — 2
R/O Dementia — 2
Depression 27 22
R/O Depression 2 1
Electrical Injury — 2
Epilepsy 7 10
R/O Epilepsy — —
HIV/AIDS 4 —
Hydrocephalus 1 —
Klinefelter Syndrome 2 —
Learning Disability 13 13
R/O Learning Disability 5 1
Liver disease (end stage) 1 —
Lupus — 1
Meningitis — 2
Mental Retardation — 4
Multiple Sclerosis 1 —
Prenatal Substance Exposure 1 —
Psychotic Disorder 8 31
R/O Psychosis 1 1
Post Traumatic Stress Disorder 1 3
Somatoform Disorder 6 3
R/O Somatoform Disorder 10 —
Stroke 3 20
Substance Abuse 10 6
Syncopal Episodes — 1
Toxic Exposure — 13
Traumatic Brain Injury
Mild 1 141
Mild Complicated — 4
Moderate 2 13
Severe 4 22

Table 2 shows the distribution of presenting/claimed diagnoses; cognitive complaints attributed to these conditions were
ultimately determined to have been feigned or exaggerated. Demographic data are provided in Table 3. Tabulation of the fre-
quency of PVT failures (out of a total of 9) revealed that 8.8% of the noncredible sample failed two PVTs (n = 31), 12.7%
failed three PVTs (n = 45), 18.4% failed four PVTs (n = 65), 17.3% failed five PVTs (n = 61), 20.1% failed six PVTs (n =
71), 14.2% failed seven PVTs (n = 50), 6.5% failed eight PVTs (n = 23), and 2.0% failed all nine PVTs (n = 7).
Patients were only assigned to groups if they met all criteria for group assignment; patients failing ≤1 PVT but with
motive to feign, and patients failing two or more PVTs but with no motive to feign, were not included in the study in an
attempt to enhance accuracy of group assignment. For example, we judged that in the subgroup of patients who failed one or
fewer PVTs but had motive to feign, it was not appropriate to assign them to the credible group since they might still have
been feigning (given incentive to do so), but had not been detected (given imperfect PVT sensitivity). Likewise, we judged
that individuals failing two or more PVTs in the absence of motive to feign should not be assigned to the noncredible group
since feigning is rare when there is no incentive to do so, and that this subgroup was most likely populated by patients who
failed PVTs due to true, substantial cognitive dysfunction.

Table 3. Demographic and Rey 15-Item test score comparison data


Credible (n = 138) Noncredible (n = 353) p t d

Age 42.00 ± 13.45 45.20 ± 12.42 .016 −2.418 0.24
(18–75) (17–77)
Years of education 13.29 ± 3.52 12.96 ± 2.96 .348 0.940 0.09
(0–20) (0–21)
Gender 69 m/65 f (4 missing) 197 m/153 f (3 missing)
Ethnicity
Caucasian 73 (52.9%) 152 (43.1%)
African American 12 (8.7%) 88 (24.9%)
Hispanic 32 (23.2%) 71 (20.1%)
Asian 8 (5.8%) 15 (4.2%)
Middle Eastern 6 (4.3%) 14 (4.0%)
Native American 3 (2.2%) 3 (.9%)
East Indian 1 (.7%) 2 (.6%)
Other 3 (2.2%) 3 (.9%)
Missing — 5 (1.4%)
Native Language
English 104 (75.4%) 261 (73.9%)
ESL 24 (17.4%) 84 (23.8%)
Both learned concurrently 9 (6.5%) 5 (1.4%)
Missing 1 (0.7%) 3 (.8%)
Rey 15
Recall Correct 13.91 ± 1.77 10.68 ± 3.56 <.001 9.959 1.83
(6–15) (0–15)
Recall Total Qualitative 0.74 ± 1.74 2.09 ± 3.14 <.001 −4.756 0.78
Errors (0–13) (0–17)
Recognition Correct 13.92 ± 1.47 9.85 ± 3.57 <.001 12.901 2.77
(6–15) (0–15)
Recognition False Positives .46 ± .89 1.14 ± 1.85 <.001 −4.184 0.76
(0–4) (0–10)
Combination Score 27.38 ± 3.17 19.39 ± 7.74 <.001 12.272 2.52
(14–30) (0–30)
Combination Score – 26.65 ± 4.32 17.25 ± 9.45 <.001 11.183 2.18
Revised (3–30) (−18–30)
SD = standard deviation; m = male; f = female; d = effect size.
Combination Score: recall + (recognition minus false positive errors).
Combination Score – Revised: (recall minus recall total qualitative errors) + (recognition minus false positive errors).

Procedure and Scores

All subjects were administered the Rey 15-item plus recognition according to standard procedures (see Boone et al., 2002)
as part of a neuropsychological assessment. Scores used for analysis included: (1) Recall correct, (2) Recognition correct, (3)
Recognition false positive errors, (4) Combination equation (recall + recognition – false positives), (5) frequencies of all 15
recognition errors, and frequencies of various types of qualitative errors committed during recall (listed in Table 6), including
several employed by Griffin and colleagues (1996).

Results

As shown in Table 3, groups did not differ in education, but did differ significantly in age, although the difference was
less than 4 years. Correlations computed in each group separately showed that age was significantly related to recall correct (r
= −.186, p = .029), recognition false positive errors (r = .181, p = .033), and the combination equation (r = −.232, p = .006)
in the credible group; and in the noncredible group was significantly correlated with recall correct (r = −.166, p = .002) and
the combination equation (r = −.135, p = .011). However, age accounted for ≤5% test score variance and was not further
considered in analyses.
As reproduced in Table 3, significant differences were found, using independent t-tests (and confirmed with Mann–
Whitney U analyses given the generally non-normal distribution of test scores) between groups in recall correct, recognition
correct, recognition false positive errors, and the combination equation, with the noncredible group performing worse. When
the previously published combination equation cut-off of <20 was employed, specificity was excellent (97.8%), with sensitiv-
ity of 47.6%. However, as shown in Table 4, when cut-offs were selected to maintain at least 89.9% specificity, the highest
sensitivity was found for recognition correct (cut-off ≤11; 62.6% sensitivity) and the combination score (cut-off ≤22; 60.6%
sensitivity), followed by recall correct (cut-off ≤11; 49.3% sensitivity), and recognition false positive errors (≥3; 17.9% sensi-
tivity). In Table 5 are reproduced the positive and negative predictive power values for the combination score at various base
rates of noncredible performance.
Receiver Operating Characteristic (ROC) curves allow examination of the global utility of each test score regardless of any
particular cut-off point. The area under the ROC curve (AUC) is a measure of how well a parameter can distinguish between
groups. AUCs for the four Rey 15-item scores were .770 (95% CI = .735–.819) for recall correct, .850 (95% CI = .816–.884)
for recognition correct, .843 (95% CI = .808–.878) for the combination score, and .601 (95% CI = .549–.654) for recognition
false positives, confirming superior classification accuracy for recognition correct and the combination score.
Credible patients who failed one (n = 86) versus 0 (n = 52) PVTs were compared on the Rey 15-item plus recognition
scores; mean scores were virtually identical across groups (p >.400) with the exception that group differences approached sig-
nificance for recognition correct. The mean recognition correct score for the group with zero PVT failures was 14.09 ± 1.15,
while the mean recognition correct score in the group with one failure was 13.63 ± 1.85 (t = 1.795; p = .075). Importantly,
the inclusion of the patients with one PVT failure did not impact selection of the optimal cut-off for recognition correct; i.e.,
the cut-off associated with ≥90% specificity in the group with one PVT failure was ≤10.00, while the cut-off at ≥90% speci-
ficity in the group with zero PVT failures was ≤11.0, as was the cut-off which achieved ≥90% specificity in the credible
group as a whole.

Table 4. Specificity and sensitivity rates for Rey 15-item plus recognition cut-offs
Cut-off Recall Correct Recall Qualit. Recog. Correct Recog. FP Combo Score
Spec. Sens. Spec. Sens. Spec. Sens. Spec. Sens. Spec. Sens.

0 100.0 0.3 0 100.0 100.0 0.3 0 100.0 100.0 0.8
1.00 100.0 0.6 73.7 56.6 100.0 0.6 73.2 44.3 100.0 1.1
2.00 100.0 2.0 82.5 41.4 100.0 3.1 88.4 27.0 100.0 1.7
3.00 100.0 4.8 89.1 30.0 100.0 5.9 93.5 17.7 100.0 2.3
4.00 100.0 5.4 94.2 19.4 100.0 8.5 99.3 9.5 100.0 3.7
5.00 100.0 6.5 96.4 13.1 100.0 14.2 100.0 6.7 100.0 4.9
6.00 99.3 16.1 97.1 10.0 99.3 20.1 3.9 100.0 7.1
7.00 99.3 19.8 97.8 7.7 99.3 26.1 2.5 100.0 8.8
8.00 98.6 24.6 99.3 6.9 99.3 31.4 2.2 100.0 9.6
9.00 96.4 39.4 99.3 5.7 99.3 42.2 1.4 100.0 11.3
10.00 95.7 43.1 99.3 4.3 97.1 51.3 1.1 100.0 13.3
11.00 92.0 49.3 99.3 4.0 94.2 62.6 0.3 100.0 16.1
12.00 73.9 70.5 99.3 2.9 81.2 78.5 0.3 100.0 21.0
13.00 70.3 73.1 99.3 2.6 74.6 81.9 0.3 100.0 23.2
14.00 65.9 76.2 100.0 2.3 47.8 88.1 0.3 98.6 26.1
15.00 0 100.0 1.7 0 100.0 0.3 98.6 29.5
16.00 0.3 0.3 98.6 33.7
17.00 0.3 0.3 98.6 36.5
18.00 0 0.3 97.8 44.5
19.00 0.3 97.8 47.6
20.00 0.3 97.1 51.0
21.00 0.3 93.5 57.2
22.00 0.3 92.8 60.6
23.00 0.3 89.1 66.9
24.00 0.3 84.8 73.4
25.00 0.3 80.4 76.2
26.00 0.3 73.9 79.3
27.00 0.3 58.0 85.6
28.00 0 50.0 87.8
29.00 35.5 92.4
30.00 0 100.0
Note: Spec. = Specificity; Sens. = Sensitivity; Recall Qualit. = Recall Qualitative Error Score; Recog. Correct = Recognition Correct; Recog. FP =
Recognition False Positive Errors; Combo Score = Combination Score.
Bolded/italicized numbers identify specificity/sensitivity values associated with optimal cut-scores (≥90% specificity).

The Rey 15-item combination cut-off was then applied to the largest noncredible subgroups (mild traumatic brain injury,
n = 141; psychosis, n = 30 [one patient with claimed psychosis did not have combination score data]; severe traumatic brain
injury, n = 22; depression, n = 23) to determine cut-off sensitivity in the context of these claimed diagnoses. The combination
score cut-off achieved 90.0% sensitivity in noncredible psychosis, 78.3% sensitivity in noncredible depression, and 63.6% in
noncredible severe traumatic brain injury, but only 45.4% sensitivity in noncredible mild traumatic brain injury.
Qualitative errors for both recall and recognition trials were tabulated. As can be seen in Table 6, some recall errors
occurred equally frequently in both groups (row sequence, Roman numeral, embellishment), but every other type of error was
over-represented in the noncredible group, and several occurred three to four times more often in noncredible subjects (e.g., gestalt errors, no ABC, continue sequence).

Table 5. Positive predictive values (PPV) and negative predictive values (NPV) for the Rey 15-Item combination score at 15%, 30%, and 50% base rates of
noncredible performance
Cut-off Scores 15% Base Rate 30% Base Rate 50% Base Rate
PPV (%) NPV (%) PPV (%) NPV (%) PPV (%) NPV (%)

≤15 78.8 88.8 90.0 76.5 95.5 58.3
≤18 78.1 90.9 89.7 80.4 95.2 63.8
≤19 79.2 91.4 90.3 81.3 95.6 65.1
≤20 75.6 91.8 88.3 82.2 94.6 66.5
≤21 60.8 92.5 79.0 83.6 89.8 68.6
≤22 59.8 93.0 78.3 84.6 89.4 70.2
≤23 52.0 93.9 72.5 86.3 86.0 72.9
≤24 46.0 94.8 67.4 88.2 82.8 76.1
≤25 40.7 95.0 62.5 88.7 79.5 77.2
≤26 34.9 95.3 56.6 89.3 75.2 78.1
Recommended cut-off (≤22) in bold.

Table 6. Percentage within each group committing each type of recall error
Recall error Credible Noncredible
(n = 138) (n = 353)

Between Row 2.9 7.7
ABC not first* 7.2 18.5
Repetition Error (individual item) 6.5 12.9
Repetition Error (row) 2.9 6.0
Gestalt* 4.3 22.8
Wrong Item 9.1 23.4
No Numbers* 2.9 8.3
No ABC* 1.5 8.0
Break Sequence 8.7 21.4
Continue Sequence 1.4 6.0
Row Sequence 31.2 34.5
Roman Numeral 6.5 6.6
Embellishment <1.0 <1.0
Note: * Denotes items in which error was counted only once (i.e., total possible = 1).
(1) Between Row: Items belonging to different rows intermixed within the same row
(2) ABC not first*: ABC not present in first row drawn (all three letters must be present for credit)
(3) Repetition error (individual item): An individual item repeated
(4) Repetition error (row): An entire row repeated
(5) Gestalt error*: Failure to reproduce a 3×5 configuration
(6) Wrong item: An item drawn that was not contained within the 15 target items
(7) No numbers*: 1–2–3 not contained in drawing (entire row must be present for credit)
(8) No ABC*: A–B–C not contained in drawing (entire row must be present for credit)
(9) Break sequence: Overlearned sequence not reproduced (e.g., ABC as ACB, 123 as 234, etc.; reproduction of 1 or 2 items from overlearned sequence,
e.g., ABC as AB or A, etc.; ABC as Abc, etc.)
(10) Continue sequence: Sequence extended (e.g., ABCD, 1234, abcd, l ll lll llll, etc.; 123 on one row followed by 456 on the next row)
(11) Row Sequencing: Rows reproduced in wrong order
(12) Roman Numerals: Items from the final row reproduced as Roman numerals
(13) Embellishment: Correct reproduction of item “ruined” by additions
Recall Total Qualitative Errors: sum of number of errors for #1 through #10 (#11, #12, and #13 excluded).

A recall qualitative error score was calculated by adding the number of qualitative
errors committed (excluding the three types of errors which occurred equally in both groups; as shown in Table 6, some error
types could be counted more than once). As reproduced in Table 3, the noncredible group obtained a significantly higher
recall qualitative error score; as shown in Table 4, a cut-off score of ≥4 achieved adequate specificity (≥90%), with sensitivity
of 19.4%. The following revised combination score (which included the recall qualitative error score) was calculated:

(recall minus recall qualitative error score) + (recognition minus false positive errors)

Application of a cut-off of ≤21 resulted in 90.5% specificity, and achieved sensitivity of 62.0%, which was generally compa-
rable to the sensitivity level of the original combination score.
In Table 7 are reproduced the rates at which each type of recognition error was committed in each group separately.
Sensitivity rates were low (<22%), although some errors occurred very rarely in credible patients (<2%: E, F, f, 5, 6, penta-
gon, parallelogram), indicating that circling of these items was almost pathognomonic for noncredible performance. Further,
circling of the #4, diamond, and 1-, 2-, and 3-hyphen foils occurred 2–3 times more often in noncredible than credible sub-
jects. In contrast, circling of “d” occurred at relatively the same frequency in both groups. Failure to recognize ≥3 items cor-
rectly drawn on recall occurred in only 7% of credible patients, but in a third of noncredible patients.
Various individual Rey 15-item plus recognition scores were then examined in combination. Only 5 (3.6%) credible pa-
tients failed both recall correct and recognition correct cut-offs of ≤11, in contrast to 44.2% of noncredible patients.
Approximately 10% (n = 14) of credible subjects failed either recall correct or recognition correct using these cut-offs,
whereas two-thirds of noncredible patients (67.7%) showed this pattern. Thirteen percent (n = 18) of credible patients failed
either recall correct ≤11 or recognition correct ≤11 or recall qualitative error score ≥4, whereas 69.2% of noncredible patients
failed at least one of the three scores.
In Table 8 are depicted the characteristics of the 10% of credible subjects (n = 10) who fell below cut-offs for the original
combination score (cut-off ≤22). Sixty percent of the subgroup were male, all were Caucasian with the exception of two
Asian patients, and 20% spoke English as a second language, which closely matches the percentage in the credible group as a
whole (17.4%). The average educational level was 16.0, which was higher than that of the credible sample as a whole
(13.29). The mean FSIQ was 101.00, which was comparable to the mean of 99.68 for the entire credible sample. However,
the average age of this subgroup was 50.7, which was higher than the average age of 42.00 for the credible sample as a
whole. Of the 27 credible patients age 50–59, the false positive rate for the combination score cut-off of ≤22 was 14.8%; low-
ering the cut-off to ≤20 was adequately protective for this age group (specificity = 92.8%). When individual score cut-offs
were lowered to <11 for recall and <11 for recognition, failure on either score was associated with 88.9% specificity in this
age subgroup. Of the 15 credible patients age ≥60, specificity for the combination score cut-off of ≤22 was 80%; the

Table 7. Percentage within each group committing each type of recognition false positive error

Recognition error Credible Noncredible
(n = 138) (n = 353)
D 3.6 6.8
E* 0.0 4.5
F* 0.7 2.3
4 2.2 4.8
5* 1.4 3.7
6* 0.7 2.5
d 12.3 13.3
e 2.9 5.1
f* 0.7 4.0
Pentagon* 0.7 5.7
Parallelogram* 0.7 7.1
Diamond 4.3 9.9
– 3.8 9.9
=* 6.5 16.1
≡* 5.8 18.7
Drawn but not Recognized (≥3)* 7.3 33.0
*Denotes items either associated with false positive rate <2% or ≥10% discrepancy across groups.
Drawn but not recognized: Drawn in recall trial but not circled on recognition trial; this score was not included in the tabulation of the recognition false pos-
itive error score.

Table 8. Demographic, IQ, and diagnostic characteristics of credible subjects falling below combination score cut-off of ≤22
Age Gender Education Language Ethnicity Diagnosis FSIQ

64 Male 16 ESL Asian Bipolar 97
63 Male 20 English White Anxiety 110
62 Female 18 ESL White Depression 118
56 Male 12 English White Bipolar 102
24 Female 14 English White Learning Disorder 87
46 Female 15 English White Depression 89
38 Female 14 English White Anoxia 91
43 Male 18 English Asian Depression 91
53 Male 16 English White R/O Anoxia 103
58 Male 17 English White Substance Abuse 122
ESL = English as a Second Language.

Discussion

In the current study, credible patients (n = 138) scored significantly better than noncredible patients (n = 353) on Rey 15
recall correct, recognition correct, recognition false positive errors, and the combination score. When cut-offs were selected to
maintain at least 89.9% specificity, the highest sensitivity was found for recognition correct (cut-off ≤11; 62.6% sensitivity)
and the combination score (cut-off ≤22; 60.6% sensitivity), followed by recall correct (cut-off ≤11; 49.3% sensitivity), and
recognition false positive errors (≥3; 17.9% sensitivity).
In the original validation of the Rey-15 plus recognition (Boone et al., 2002), the recall correct cut-off had to be maintained
at the traditional cut-off of <9 to achieve adequate specificity (≥90%) in the sample of 36 credible clinic patients, whereas in
the current study, involving a much larger credible sample, the cut-off could be raised to ≤11 while still limiting false positive
rates to <10%. Similarly, all other score cut-offs could be made more stringent in the current study as compared to the 2002
investigation (i.e., the original recognition correct cut-off associated with ≥90% specificity was <10, the original combination
score cut-off was <20, and the original recognition false positive error cut-off was ≥4). The likely explanation for the higher
specificity rates in the current study is that credible patients with WAIS-III FSIQ <80 were excluded. Individuals with low
average to very superior IQs fail less than 10% of PVTs administered, whereas individuals with IQs of 70–79 fail 17%, and
individuals with IQs 60–69 fail 44% (Dean et al., 2008). Thus, it is appropriate to limit credible patient samples to individuals
with IQs ≥80 when determining generic PVT cut-offs, but customized cut-offs are required for individuals with borderline
and lower IQ levels (see Smith et al., 2014). When individuals of low intellectual scores are included in credible groups for
PVT validation studies, the resulting cut-offs will have lowered sensitivity because cut-offs need to be adequately protective
of low IQ patients. This renders PVTs less sensitive in those populations with low average and higher IQs. In the current
study, 13 patients met criteria for the credible group with the exception that FSIQ was 70–79; to maintain false positive rates
of <10% in this subgroup, the recall correct cut-off had to be lowered to <10, the recognition correct cut-off had to be low-
ered to <9, and the combination score cut-off had to be lowered to <20, confirming the significant impact of lowered intelli-
gence on Rey 15-item cut-offs.
We suspect that the lowered specificity rates found by Bailey and colleagues (2018) for recall correct and combination scores were at least partially due to the older age of the credible sample (mean = 53.34) and the fact that patients with major
neurocognitive disorders were included (dementia, amnestic disorder, as well as likely lowered IQ). Examination of current
credible subjects who fell below cut-offs for the combination equation showed that they tended to be older than the larger
credible group. Rey 15-item plus recognition cut-offs had to be adjusted for older individuals (age > 50) to maintain specific-
ity of at least 90% (e.g., combination equation <20).
Further, in the Bailey and colleagues (2018) study, group assignment was based on performance on a single PVT (Word
Memory Test; WMT) which has a not inconsequential error rate (sensitivity of 67%–78% and specificity of 70%–81% in trau-
matic brain injury; Greve, Ord, Curtis, Bianchini, & Brennan, 2008), and which likely led to errors in group membership. The
WMT is a verbal forced choice task, while the Rey 15-item is a visual memory free recall and nonforced choice recognition
measure, and it is probable that the two PVTs tap somewhat differing approaches to noncredible test performance and do not
provide redundant information; in fact, Bailey and colleagues (2018) documented that the correlation between the WMT and
the Rey 15-item scores ranged from .393 to .599, reflecting at most 36% shared score variance.
Morse and colleagues (2013) found more similar specificity rates to those documented in the current study, and it can be
reasonably assumed that if patients with IQ 70–79, and not just patients with IQ < 70, had been excluded from the Morse and
colleagues (2013) credible sample, the recommended cut-offs would have been even more similar to those found to be opti-
mal in the current study.
Test cut-offs were somewhat less sensitive in the current noncredible sample as compared to noncredible subjects included
in the original 2002 validation study. For example, a recall correct cut-off of <9 was associated with 47% sensitivity in the
2002 study as compared to only 24.6% in the current study. Likewise, the combination score of <20 detected 71% of non-
credible patients in the 2002 study, and only 48.7% in the current study. However, because test cut-offs could be raised from
those recommended in the 2002 study, overall sensitivity rates are similar across the two studies. For example, while a combi-
nation cut-off of <20 identified 71% of noncredible subjects in the 2002 study, in the current study, a score of ≤11 on either
recall correct or recognition correct achieved 68% sensitivity.
It is of note that a sizable minority (40%) of patients in the noncredible group were claiming residuals from mild traumatic
brain injury (mTBI), whereas only one credible patient presented with this diagnosis (<1%). In the outpatient hospital-based
evaluation setting from which the credible patients were drawn, virtually no patients who were not compensation-seeking
were referred for neuropsychological assessment of mTBI because such patients in fact recovered and were not viewed as
needing evaluation by treaters, as is the expected outcome with mTBI (see Boone, 2013, for review; see also Rohling, Binder,
Demakis, Larrabee, Ploetz, & Langhinrichsen-Rohling, 2011).
Despite the fact that a large minority of the noncredible group was presenting in the context of a condition with no long
term cognitive residuals, the question arose as to whether other members of the noncredible group in fact had conditions
which could have lowered cognitive ability and thereby accounted for group differences. Specifically, in terms of “objec-
tively” documented conditions, only 2.2% of the credible group were diagnosed with stroke (based on brain imaging), as
compared to 5.7% of the noncredible group. Similarly, 0%, 1.4%, and 2.8%, respectively, of the credible group met criteria
for mild complicated, moderate, and severe TBI (based on length of loss of consciousness, Glasgow Coma Scale, length of
anterograde amnesia, and brain imaging abnormalities), while 1.1%, 3.7%, and 6.2%, respectively, of the noncredible group
met criteria for these conditions. As discussed above, to prevent patients who failed PVTs due to major cognitive dysfunction
from being mis-assigned to the noncredible group, patients with evidence of true, significant functional disability were
excluded from the noncredible group. In addition, to examine any remaining effect of objectively documented injury on Rey
15-item performance in the noncredible group, patients were randomly deleted from the noncredible group until the percen-
tages of stroke, and mild complicated, moderate, and severe TBI were comparable in each group. When the Rey 15 combina-
tion score cut-off of ≤22 was applied to the reconstituted noncredible group (n = 317), sensitivity was nearly identical to the
sensitivity rate for the noncredible group as a whole (59.0% vs. 60.6%), arguing that the inclusion of patients with documen-
ted injury did not artificially raise sensitivity rates.
Effectiveness of the Rey 15-item Test in detecting subtypes of noncredible presentations was examined. The combination score
cut-off of ≤22 achieved 90.0% sensitivity in claimed psychosis, 78.3% sensitivity in claimed depression, and 63.6% in noncredible
severe traumatic brain injury presentations, but only 45.4% sensitivity in noncredible mild traumatic brain injury. Thus, Rey 15-
item plus recognition appears to be more sensitive to feigned cognitive symptoms associated with psychiatric and severe TBI pre-
sentations than to noncredible cognitive performance in mTBI. If the Rey 15-item Test is used in evaluating performance validity
in mTBI patients, it should be recognized that a substantial segment (55%) of this population will not be detected.
Comprehensive tabulation of individual qualitative recall and recognition errors showed that they were associated with low
sensitivity rates; however, some scores had very low false positive rates (<2%; recognition errors: E, F, f, 5, 6, pentagon, par-
allelogram), indicating that their presence was virtually pathognomonic for noncredible performance. Drawing of a single
wrong item on recall occurred in less than 10% of credible patients (9.1%) but was found in nearly a quarter of noncredible
subjects (23.4%), and some recall errors occurred rarely in credible patients (<3%; no ABC, continue sequence, between row
error, row repetition error, no numbers), rendering them also virtually pathognomonic for noncredible performance.
Additionally, failure to recognize three or more items that were correctly drawn on recall occurred in <10% of credible pa-
tients, but in a third of noncredible patients, and ABC not first, gestalt errors, and break sequence errors were at least two
times more common in noncredible patients. A recall qualitative error score involving tabulation of the 10 recall qualitative
error types that were most discrepant between groups (cut-off ≥4) achieved 19% sensitivity (>90% specificity).
In the Love and colleagues (2014) tabulation of recall errors in an inpatient sample of patients with developmental intellectual disability, similarly to the current study, only 9.5% committed a wrong item intrusion error, indicating that low intelligence
would not likely account for such qualitative errors. Likewise, in the original Griffin and colleagues (1996) study, only 11.3%
of disabled individuals in residential care drew a wrong item, but this error occurred in 19.8% of possible malingerers and
28.9% of simulators, findings highly similar to current findings. Because of low sensitivity, absence of individual qualitative
recall errors cannot be used to rule out noncredible performance, but due to low false positive rates, presence of these errors
can reasonably be used to rule in noncredible performance. A revised combination score, incorporating the recall qualitative
error score, achieved sensitivity (62%) that was essentially comparable to that of the original combination score (60%). As
such, use of the revised score does not appear justified given the extra time required to calculate it.
In line with emerging literature (Smith et al., 2014; Whiteside, Wald, & Busse, 2011), examination of individual scores in
combination showed increased sensitivity at no to minimal sacrifice to specificity. Approximately 10% of credible subjects
failed either recall correct or recognition correct cut-offs of ≤11, whereas over two-thirds (68%) of noncredible subjects
showed this pattern. Thirteen percent of credible patients failed either recall correct ≤11 or recognition correct ≤11 or recall
qualitative error score ≥4, whereas nearly 70% of noncredible patients failed at least one of the three cut-offs. This argues
that future studies should continue to examine use of various scores from individual PVTs in combination.
When assigning patients to credible and noncredible groups, we adopted the procedure of only counting a PVT as failed
once, regardless of the number of scores failed from that test. The rationale for this decision is that multiple scores from the
same test are usually highly correlated, meaning that the individual scores often provide redundant information. Further, if
individual scores from PVTs are counted as separate failures, this serves to give more “weight” to PVTs with multiple scores
as compared to PVTs that utilize a single score. As discussed earlier, when nine separate PVTs are employed, only a single
failure is expected in credible samples (Davis & Millis, 2014); however, this research was based on using a single score for
each PVT. Thus, it is unclear whether the same failure rate across separate PVTs is found when multiple scores are examined
from various PVTs. Research from our lab has shown that when multiple scores from a single PVT are utilized, specificity
rates may fall slightly. For example, for the Warrington Recognition Memory Test – Words, time and total correct cut-offs
were identified which individually resulted in ≥90% specificity. However, when they were used together (i.e., failure on either
was counted as the test failure), specificity dropped to 87%. The same phenomenon was also found in the current manuscript
(i.e., individual Rey 15 cut-offs were identified that individually resulted in ≥90% specificity, but when three cut-offs were
used together, the false positive rate rose to 13%). Thus, available data suggest that specificity declines when multiple scores
from a single PVT are employed, but the drop is generally small (e.g., on the order of 4 percentage points), and can be cor-
rected with minor adjustment to cut-offs (see McCaul et al., 2018).
In summary and conclusion, data on a much larger mixed diagnosis neuropsychological sample than that available in the 2002
validation study show that Rey 15-item plus recognition cut-off scores can be made more stringent, while still maintaining low
false positive rates, and thereby detect up to 70% of noncredible test takers. Thus, Rey-15 plus recognition appears to have con-
tinuing relevance in identification of noncredible neurocognitive test performance, although care should be used when employ-
ing this PVT in older individuals (i.e., adjusted cut-offs should be employed). Further, because credible individuals with IQ < 80
were excluded, cut-offs recommended in this manuscript cannot be used in individuals who likely have IQ less than low average;
clinicians are referred to adjusted cut-offs for individuals with IQ 70–79 described in this manuscript, as well as cut-offs contained
in Smith and colleagues (2014) and Love and colleagues (2014). Similarly, patients with dementia were excluded from the credible
comparison sample, and cut-offs reported in the current manuscript should not be used in the differential of actual versus feigned
dementia (the reader is referred to Boone, 2017, and Dean et al., 2008, for descriptions of methods available in the differential of
actual versus feigned dementia). Finally, while in the current study ESL status was not linked to increased risk of falling below
cut-offs, monolingual Spanish-speakers of low educational level do exhibit high false positive rates on Rey 15-item plus recogni-
tion (Robles et al., 2015), and cut-offs derived from the current study should not be used for this population.

Conflict of Interest

None declared.

Disclosure

Drs. Boone, Ermshar, Miora, Victor, and Ziegler are forensic consultants.

Disclaimer

The views expressed herein do not necessarily represent the views of the Tennessee Valley Healthcare System or the
United States.

References

Arnold, G., Boone, K. B., Lu, P., Dean, A., Wen, J., Nitch, S., et al. (2005). Sensitivity and specificity of finger tapping test scores for the detection of suspect
effort. Clinical Neuropsychologist, 19, 105–120.
Babikian, T., Boone, K. B., Lu, P., & Arnold, G. (2006). Sensitivity and specificity of various digit span scores in the detection of suspect effort. The Clinical
Neuropsychologist, 20, 145–159.
Bailey, K. C., Soble, J. R., & O’Rourke, J. J. (2018). Clinical utility of the Rey 15-Item Test, recognition trial, and error scores for detecting noncredible
neuropsychological performance in a mixed clinical sample of veterans. The Clinical Neuropsychologist, 32, 119–131.
Bianchini, K. J., Mathias, C. W., Greve, K. W., Houston, R. J., & Crouch, J. A. (2001). Classification accuracy of the Portland Digit Recognition Test in trau-
matic brain injury. The Clinical Neuropsychologist, 15, 461–470.
Boone, K. B. (2013). Clinical practice of forensic neuropsychology: An evidence-based approach. New York: Taylor & Francis.
Boone, K. B. (2017). Assessment of neurocognitive performance validity. In J. Ricker & J. Morgan (Eds.), Textbook of clinical neuropsychology (2nd ed.).
Abingdon, OX, New York: Taylor & Francis.
Boone, K. B., Lu, P., & Wen, J. (2005). Comparison of various RAVLT scores in the detection of noncredible memory performance. Archives of Clinical
Neuropsychology, 20, 301–319.
Boone, K., Salazar, X., Lu, P., Warner-Chacon, K., & Razani, J. (2002). The Rey 15-item recognition trial: A technique to enhance sensitivity of the Rey 15-
item memorization test. Journal of Clinical and Experimental Neuropsychology, 24, 561–573.
Davis, J. J., & Millis, S. R. (2014). Reply to commentary by Bilder, Sugar, and Helleman (2014 this issue) on minimizing false positive error with multiple
performance validity tests. The Clinical Neuropsychologist, 28, 1224–1229.
Dean, A. C., Victor, T. L., Boone, K. B., & Arnold, G. (2008). The relationship of IQ to effort test performance. The Clinical Neuropsychologist, 22,
705–722.
Dean, A. C., Victor, T. L., Boone, K. B., Philpott, L. M., & Hess, R. A. (2009). Dementia and effort test performance. The Clinical Neuropsychologist, 23,
133–152.
Greve, K. W., Ord, J., Curtis, K. L., Bianchini, K. J., & Brennan, A. (2008). Detecting malingering in traumatic brain injury and chronic pain: A comparison
of three forced-choice symptom validity tests. The Clinical Neuropsychologist, 22, 896–918.
Griffin, G. A. E., Normington, J., & Glassmire, D. (1996). Qualitative dimensions in scoring the Rey visual memory test of malingering. Psychological
Assessment, 8, 383–387.
Keary, T. A., Frazier, T. W., Belzile, C. J., Chapin, J. S., Naugle, R. I., Najm, I. M., et al. (2013). Working memory and intelligence are associated with
Victoria Symptom Validity Test hard item performance in patients with intractable epilepsy. Journal of the International Neuropsychological Society, 19,
314–323.
Kim, N., Boone, K. B., Victor, T., Lu, P., Keatinge, C., & Mitchell, C. (2010). Sensitivity and specificity of a Digit Symbol Recognition Trial in the identifi-
cation of response bias. Archives of Clinical Neuropsychology, 25, 420–428.
Kim, M. S., Boone, K. B., Victor, T., Marion, S. D., Amano, S., Cottingham, M. E., et al. (2010). The Warrington Recognition Memory Test for Words as a
Measure of Response Bias: Total score and response time cutoffs developed on “Real World” credible and noncredible subjects. Archives of Clinical
Neuropsychology, 25, 60–70.
LaDuke, C., Barr, W., Brodale, D. L., & Rabin, L. A. (2018). Toward generally accepted forensic assessment practices among clinical neuropsychologists: A
survey of professional practice and common test use. The Clinical Neuropsychologist, 32, 145–164.
Lezak, M. D. (1995). Neuropsychological assessment (3rd ed.). New York: Oxford University Press.
Love, C. M., Glassmire, D. M., Zanolini, S. J., & Wolf, A. (2014). Specificity and false positive rates of the Test of Memory Malingering, Rey 15-Item Test,
and Rey Word Recognition Test among forensic inpatients with intellectual disabilities. Assessment, 21, 618–627.
Marshall, P., & Happe, M. (2007). The performance of individuals with mental retardation on cognitive tests assessing effort and motivation. The Clinical
Neuropsychologist, 21, 826–840.
Martin, P. K., Schroeder, R. W., & Odland, A. P. (2015). Neuropsychologists’ validity testing beliefs and practices: A survey of North American profes-
sionals. The Clinical Neuropsychologist, 29, 741–776.
McCaul, C., Boone, K. B., Ermshar, A., Cottingham, M., Victor, T. L., Ziegler, E., et al. (2018). Cross-validation of the Dot Counting Test in a large sample
of credible and non-credible patients referred for neuropsychological testing. The Clinical Neuropsychologist, 32, 1054–1067.
Morse, C. L., Douglas-Newman, K., Mandel, S., & Swirsky-Sacchetti, T. (2013). Utility of the Rey-15 recognition trial to detect invalid performance in a
forensic neuropsychological sample. The Clinical Neuropsychologist, 27, 1395–1407.
Reedy, S. D., Boone, K. B., Cottingham, M. E., Glaser, D. F., Lu, P. H., Victor, T. L., et al. (2013). Cross validation of the Lu and colleagues (2003) Rey-
Osterrieth Complex Figure Test effort equation in a large known-group sample. Archives of Clinical Neuropsychology, 28, 30–37.
Roberson, C. J., Boone, K. B., Goldberg, H., Miora, D., Cottingham, M., Victor, T., et al. (2013). Cross validation of the b Test in a large known groups sam-
ple. The Clinical Neuropsychologist, 27, 495–508.
Robles, L., López, E., Salazar, X., Boone, K. B., & Glaser, D. F. (2015). Specificity data for the b Test, Dot Counting Test, Rey-15 Item Plus Recognition,
and Rey Word Recognition Test in monolingual Spanish-speakers. Journal of Clinical and Experimental Neuropsychology, 37, 614–621.
Rohling, M. L., Binder, L. M., Demakis, G. J., Larrabee, G. J., Ploetz, D. M., & Langhinrichsen-Rohling, J. (2011). A meta-analysis of neuropsychological
outcome after mild traumatic brain injury: Re-analyses and reconsiderations of Binder et al., Frencham et al., and Pertab et al. The Clinical
Neuropsychologist, 25, 608–623.
Schroeder, R. W., Martin, P. K., & Odland, A. P. (2016). Expert beliefs and practices regarding neuropsychological validity testing. The Clinical
Neuropsychologist, 30, 515–535.
Sherman, D. S., Boone, K. B., Lu, P., & Razani, J. (2002). Re-examination of a Rey Auditory Verbal Learning Test/Rey Complex Figure discriminant func-
tion to detect suspect effort. The Clinical Neuropsychologist, 16, 242–250.
Smith, K., Boone, K., Victor, T., Miora, D., Cottingham, M., Ziegler, E., et al. (2014). Comparison of credible patients of very low intelligence and non-
credible patients on neurocognitive performance validity indicators. The Clinical Neuropsychologist, 28, 1048–1070.

Solomon, R. E., Boone, K. B., Miora, D., Skidmore, S., Cottingham, M., Victor, T., et al. (2010). Use of the WAIS-III Picture Completion Subtest as an
embedded measure of response bias. The Clinical Neuropsychologist, 24, 1243–1256.
Victor, T. L., Boone, K. B., Serpa, J. G., Buehler, J., & Ziegler, E. A. (2009). Interpreting the meaning of multiple symptom validity test failure. The Clinical
Neuropsychologist, 23, 297–313.
Whiteside, D., Wald, D., & Busse, M. (2011). Classification accuracy of multiple visual spatial measures in the detection of suspect effort. The Clinical
Neuropsychologist, 25, 287–301.
Young, J. C., Roper, B. L., & Arentsen, T. J. (2016). Validity testing and neuropsychology practice in the VA healthcare system: Results from recent practi-
tioner survey. The Clinical Neuropsychologist, 30, 497–514.
