Abstract
Objective: Evaluate the effectiveness of Rey 15-item plus recognition data in a large neuropsychological sample.
Method: Rey 15-item plus recognition scores were compared in credible (n = 138) and noncredible (n = 353) neuropsychology referrals.
Results: Noncredible patients scored significantly worse than credible patients on all Rey 15-item plus recognition scores. When cut-offs
were selected to maintain at least 89.9% specificity, they could be made more stringent than previously published values, with the highest sensitivity found for recognition
correct (cut-off ≤11; 62.6% sensitivity) and the combination score (recall + recognition − false positives; cut-off ≤22; 60.6% sensitivity),
followed by recall correct (cut-off ≤11; 49.3% sensitivity), and recognition false positive errors (≥3; 17.9% sensitivity). A cut-off of ≥4
applied to a summed qualitative error score for the recall trial resulted in 19.4% sensitivity. Approximately 10% of credible subjects failed
either recall correct or recognition correct, whereas two-thirds of noncredible patients (67.7%) showed this pattern. Thirteen percent of cred-
ible patients failed either recall correct, recognition correct, or the recall qualitative error score, whereas nearly 70% of noncredible patients
failed at least one of the three. Some individual qualitative recognition errors had low false positive rates (<2%) indicating that their pres-
ence was virtually pathognomonic for noncredible performance. Older age (>50) and IQ < 80 were associated with increased false positive
rates in credible patients.
Conclusions: Data on a larger sample than that available in the 2002 validation study show that Rey 15-item plus recognition cut-offs can
be made more stringent, and thereby detect up to 70% of noncredible test takers, but the test should be used cautiously in older individuals
and in individuals with lowered IQ.
Keywords: Rey 15-item plus recognition; Performance validity; Malingering
Introduction
The Rey 15-item Test (Lezak, 1995), originally developed more than 50 years ago and often viewed as outdated, in fact
continues to be commonly used as a free-standing performance validity test (PVT). Survey data from Martin, Schroeder, and
Odland (2015) and LaDuke, Barr, Brodale, and Rabin (2018) showed that among practicing neuropsychologists, nearly a
quarter (23%–24.1%) reported using the Rey 15-item Test, and in a survey of experts in performance validity assessment
(Schroeder, Martin, & Odland, 2016), a third reported using the test. Similarly, the Rey 15-item Test has been reported to be
employed by 43.8% of neuropsychologists practicing in Veterans Administration settings (Young, Roper, & Arentsen, 2016).
The original version of the Rey 15-Item Test involves recall of 15 inter-related items to which test takers are briefly
exposed. Boone, Salazar, Lu, Warner-Chacon, and Razani (2002) found that using the recommended cut-off of <9 items
reproduced, specificity in a credible neuropsychology clinic sample was high (97%–100%), but sensitivity was only 47%.
They attempted to increase Rey 15-item sensitivity by adding a recognition trial, administered after the recall trial, that consists of a
page containing the original 15 items along with 15 foils similar to the target items. The validation sample included 49
compensation-seeking noncredible patients, 36 credible non-compensation-seeking neuropsychology clinic patients, 33 learn-
ing disabled college students, and 60 non-clinical controls. A cut-off of <20 applied to an equation involving recall and recog-
nition data (recall plus recognition minus recognition false positives) resulted in 92% specificity and sensitivity of 71%.
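Written out with the score ranges implied by the test's design (15 recall items, 15 recognition targets, 15 foils; notation ours), the combination equation is:

\[
\text{combination} \;=\; \underbrace{\text{recall correct}}_{0\text{--}15} \;+\; \underbrace{\text{recognition correct}}_{0\text{--}15} \;-\; \underbrace{\text{recognition false positives}}_{0\text{--}15} \;\in\; [-15,\, 30],
\]

with performance flagged as noncredible when the score falls below the cut-off (here, <20).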
Subsequently, Morse, Douglas-Newman, Mandel, and Swirsky-Sacchetti (2013), in examining the equation in 29 noncred-
ible patients, 63 litigating patients with valid neurocognitive performance, 36 learning disabled individuals, and 54 non-
litigating neuropsychological patients, found the equation cut-off could be raised to <21 and still maintain an acceptable false
positive rate (<8%) while achieving 70% sensitivity. In contrast, Bailey, Soble, and O’Rourke (2018), in an investigation of
the Rey 15-item plus recognition equation in a mixed clinical sample of veterans (44 who passed the Word Memory Test
[WMT] and 18 who failed), documented unacceptable specificity using the cut-off of <20 (75%) as well as lowered sensitivity.
Qualitative errors on the recall trial have also been studied. Griffin and colleagues (1996) compared recall error rates in credible disabled patients, possible malingerers, and instructed malingerers, finding that the credible disabled group rarely committed between row, gestalt, or Roman
numeral errors (<10%), whereas repetition errors (15.1%), row sequence errors (11.3%), within row errors (32.1%), and
wrong item errors (11.3%) were more common. The errors that were more prominent in the possible malingering group as
compared to the credible disabled group included between row errors (11.0% vs. 3.8%), gestalt errors (14.3% vs. 1.9%),
Roman numeral errors (13.2% vs. 1.9%), row sequence errors (26.4% vs. 11.3%), and wrong item errors (19.8% vs. 11.3%).
The instructed malingerers committed a relatively high percentage of between row errors (26.7%), embellishment errors
(13.3%), repetition errors (28.9%), Roman numeral errors (13.3%), row sequence errors (42.2%), within row errors (42.2%),
and wrong item errors (28.9%).
This study is intriguing but problematic methodologically. First, test instructions were modified to enhance the number of
potential scorable errors (i.e., subjects were instructed to reproduce items “just as they appear on the card”). Second, the credi-
ble patient group was substantially disabled (required residential care) and included individuals with intellectual disability. As
discussed above, Dean and colleagues (2008) found that individuals with IQ in the 60–69 range failed a large percentage of
PVTs administered, despite performing to true ability. Studies that include such low functioning credible subjects produce cut-offs with lowered sensitivity in higher functioning populations. The purpose of the current study was to evaluate the effectiveness of the Rey 15-item plus recognition trial, including quantitative and qualitative scores, in a large sample of credible and noncredible neuropsychology referrals.
Methods
Participants
Archival data were accessed from outpatients seen for neuropsychological assessment at the Neuropsychological Service at
Harbor-UCLA Medical Center and patients tested in the private forensic practice of the second author. Patients evaluated in
the former setting were referred by treating psychiatrists or neurologists for diagnostic clarification, case management, and/or
determination of appropriateness for disability compensation. Patients tested in the latter setting were either evaluated in the
context of litigation or at the request of private disability carriers. All patients were fluent in English. Use of the archival data
was approved by the IRB at Alliant International University. We did not exclude patients from the Boone and colleagues
(2002) study because qualitative errors were not examined in that publication; however, at least 303 new noncredible patients
and at least 102 new credible patients were examined in the current study.
Credible patients. The 138 patients assigned to the credible group met the following criteria: (a) no motive to feign symptoms
(not in litigation or attempting to secure disability compensation), (b) failure on one or fewer PVTs out of a total of up to nine
administered (listed with cut-offs in Table 1; note: scores from the same test were counted as a single failure; due to the clinical
nature of the data, not all PVTs were available for all patients), and (c) no FSIQ <80 or dementia diagnoses. The nine PVTs
were selected to sample a variety of neurocognitive domains, including processing speed, attention, visual perception, verbal
memory, visual memory, and motor dexterity, to ensure that differing types of neurocognitive symptom feigning were as-
sessed. Patients who failed a single PVT were retained in the credible sample given evidence that failure on a single PVT is
not unusual in credible populations (Victor, Boone, Serpa, Buehler, & Ziegler, 2009), and that the expected number of PVT
failures when nine are administered is one (Davis & Millis, 2014). It was judged that exclusion of patients with no motive to
feign and who failed a single PVT might be problematic because they likely have more actual cognitive dysfunction (leading
to the PVT failure), and removal of these individuals could result in a spurious raising of Rey 15-item plus recognition speci-
ficity rates. To more thoroughly evaluate the impact of retention of credible patients with a single PVT failure on study results, those failing versus not failing a single PVT were compared on Rey 15-item plus recognition scores (these analyses are discussed in the Results section below).
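The Davis and Millis (2014) expectation cited above can be sketched with a back-of-envelope calculation, assuming (for illustration only) that each of the nine PVT cut-offs carries roughly a 10% false positive rate in credible patients and that failures are independent:

\[
E[\text{failures}] \;=\; \sum_{i=1}^{9} \mathrm{FPR}_i \;\approx\; 9 \times 0.10 \;=\; 0.9 \;\approx\; 1.
\]

On this view, retaining credible patients with a single PVT failure simply matches the failure count expected by chance.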
Table 1. PVTs used for group assignment, and numbers of patients completing each PVT

Processing Speed:
b Test (Roberson and colleagues, 2013): E-score ≥82 (88 credible, 322 noncredible)
Dot Counting (Boone and colleagues, 2002): E-score ≥17 (130 credible, 330 noncredible)
Motor Dexterity:
Finger Tapping, dominant hand (Arnold and colleagues, 2005): men ≤35 (68 credible, 180 noncredible); women ≤28 (61 credible, 127 noncredible)
Visual Memory:
Rey-Osterrieth (Reedy and colleagues, 2013): Effort Equation ≤50 (120 credible, 302 noncredible)
Digit Symbol (Kim and colleagues, 2010)
Participants with a WAIS-III FSIQ of less than 80 or diagnoses of dementia were excluded due to evidence that these
groups fail PVTs at a high rate despite performing to true ability (Dean et al., 2008; Dean, Victor, Boone, Philpott, & Hess,
2009; Smith et al., 2014); retention of these participants results in cut-off scores that have lowered sensitivity. A listing of the
frequencies of final diagnoses is shown in Table 2, and demographic data are reproduced in Table 3.
Noncredible patients. The 353 subjects assigned to the noncredible group met the following criteria: (a) motive to feign
symptoms (in litigation or attempting to secure disability compensation), (b) failure on at least two independent PVTs (listed
with cut-offs in Table 1; due to the clinical nature of the data, not all PVT data were available for all patients), and (c) evi-
dence that low cognitive scores were inconsistent with normal function in activities of daily living. Unfortunately, the same
exclusion criteria (i.e., FSIQ <80 and presenting diagnoses of dementia) used for the credible group could not be employed
because these data are not accurate in an unknown percentage of compensation-seeking participants. Studies have shown that
noncredible compensation-seekers obtain much lower IQ scores than do credible patient groups without motive to feign (e.g.,
Bianchini, Mathias, Greve, Houston, & Crouch, 2001) because the former are not performing to true ability on the IQ mea-
sures. Likewise, noncredible patients can be incorrectly assigned diagnoses of dementia when they do not perform to true abil-
ity on memory testing.
The approach we used to confirm appropriateness of assignment to the noncredible group was to check for a mismatch
between low cognitive scores and evidence of normal function in ADLs (e.g., dementia-level memory scores but able to live
independently, work, drive, handle his or her own finances, etc.). If such a mismatch was present, the participant was retained
in the noncredible group. However, if individuals had verifiable evidence of low cognitive function and adaptive failures out-
side of the evaluation context that could account for their PVT failure (e.g., not able to live independently, had never held
employment or been able to drive, had a guardian or conservator, etc.), they were excluded from the noncredible group.
Table 2. Frequencies of diagnoses within the credible (n = 138) and noncredible (n = 353) groups

Diagnosis Credible Noncredible
Anoxia 4 11
R/O Anoxia 1 —
Anxiety/Panic Disorder 5 4
Asperger's Syndrome (R/O) 1 —
Attention Deficit Disorder 5 —
R/O Attention Deficit Disorder 1 1
Bipolar Disorder 7 1
Brain Tumor/Abscess 2 2
Chronic Fatigue — 2
Chronic Pain — 7
Cognitive Disorder NOS 2 7
Table 2 shows the distribution of presenting/claimed diagnoses; cognitive complaints attributed to these conditions were
ultimately determined to have been feigned or exaggerated. Demographic data are provided in Table 3. Tabulation of the fre-
quency of PVT failures (out of a total of 9) revealed that 8.8% of the noncredible sample failed two PVTs (n = 31), 12.7%
failed three PVTs (n = 45), 18.4% failed four PVTs (n = 65), 17.3% failed five PVTs (n = 61), 20.1% failed six PVTs (n =
71), 14.2% failed seven PVTs (n = 50), 6.5% failed eight PVTs (n = 23), and 2.0% failed all nine PVTs (n = 7).
Patients were only assigned to groups if they met all criteria for group assignment; patients failing ≤1 PVT but with
motive to feign, and patients failing two or more PVTs but with no motive to feign, were not included in the study in an
attempt to enhance accuracy of group assignment. For example, we judged that in the subgroup of patients who failed one or
fewer PVTs but had motive to feign, it was not appropriate to assign them to the credible group since they might still have
been feigning (given incentive to do so), but had not been detected (given imperfect PVT sensitivity). Likewise, we judged
that individuals failing two or more PVTs in the absence of motive to feign should not be assigned to the noncredible group
since feigning is rare when there is no incentive to do so, and that this subgroup was most likely populated by patients who
failed PVTs due to true, substantial cognitive dysfunction.
All subjects were administered the Rey 15-item plus recognition according to standard procedures (see Boone et al., 2002)
as part of a neuropsychological assessment. Scores used for analysis included: (1) Recall correct, (2) Recognition correct, (3)
Recognition false positive errors, (4) Combination equation (recall + recognition − false positives), (5) frequencies of all 15
recognition errors, and (6) frequencies of various types of qualitative errors committed during recall (listed in Table 6), including
several employed by Griffin and colleagues (1996).
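The score computations are simple enough to express compactly; the following is a minimal sketch (field names and example values are ours, not the authors' scoring software), including the revised combination score introduced in the Results:

from dataclasses import dataclass

@dataclass
class Rey15Record:
    recall_correct: int        # 0-15 items reproduced on the recall trial
    recognition_correct: int   # 0-15 targets endorsed on the recognition trial
    recognition_fp: int        # 0-15 foils incorrectly endorsed
    recall_qual_errors: int    # summed qualitative error score for recall

    @property
    def combination(self) -> int:
        # Boone and colleagues (2002): recall + recognition - false positives
        return self.recall_correct + self.recognition_correct - self.recognition_fp

    @property
    def revised_combination(self) -> int:
        # Revised score incorporating recall qualitative errors (see Results)
        return ((self.recall_correct - self.recall_qual_errors)
                + (self.recognition_correct - self.recognition_fp))

rec = Rey15Record(recall_correct=10, recognition_correct=11,
                  recognition_fp=2, recall_qual_errors=1)
print(rec.combination)          # 19, at or below the <=22 cut-off
print(rec.revised_combination)  # 18, at or below the <=21 revised cut-off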
Results
As shown in Table 3, groups did not differ in education, but did differ significantly in age, although the difference was
less than 4 years. Correlations computed in each group separately showed that age was significantly related to recall correct (r
= −.186, p = .029), recognition false positive errors (r = .181, p = .033), and the combination equation (r = −.232, p = .006)
in the credible group; and in the noncredible group was significantly correlated with recall correct (r = −.166, p = .002) and
the combination equation (r = −.135, p = .011). However, age accounted for ≤5% of test score variance and was not further
considered in analyses.
As reproduced in Table 3, significant differences were found, using independent t-tests (and confirmed with Mann–
Whitney U analyses given the generally non-normal distribution of test scores) between groups in recall correct, recognition
correct, recognition false positive errors, and the combination equation, with the noncredible group performing worse. When
the previously published combination equation cut-off of <20 was employed, specificity was excellent (97.8%), with sensitiv-
ity of 47.6%. However, as shown in Table 4, when cut-offs were selected to maintain at least 89.9% specificity, the highest
sensitivity was found for recognition correct (cut-off ≤11; 62.6% sensitivity) and the combination score (cut-off ≤22; 60.6%
sensitivity), followed by recall correct (cut-off ≤11; 49.3% sensitivity), and recognition false positive errors (≥3; 17.9% sensi-
tivity). In Table 5 are reproduced the positive and negative predictive power values for the combination score at various base
rates of noncredible performance.
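The predictive values in Table 5 follow directly from Bayes' theorem given a cut-off's sensitivity, specificity, and an assumed base rate (BR) of noncredible performance:

\[
\mathrm{PPV} \;=\; \frac{\mathrm{sens}\cdot\mathrm{BR}}{\mathrm{sens}\cdot\mathrm{BR} + (1-\mathrm{spec})(1-\mathrm{BR})},
\qquad
\mathrm{NPV} \;=\; \frac{\mathrm{spec}\,(1-\mathrm{BR})}{\mathrm{spec}\,(1-\mathrm{BR}) + (1-\mathrm{sens})\,\mathrm{BR}}.
\]

For example, for the combination score cut-off of ≤22 (sensitivity = .606, specificity ≈ .899) at a 30% base rate, PPV ≈ .72 and NPV ≈ .84.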
Receiver Operating Characteristic (ROC) curves allow examination of the global utility of each test score regardless of any
particular cut-off point. The area under the ROC curve (AUC) is a measure of how well a parameter can distinguish between
groups. AUCs for the four Rey 15-item scores were .770 (95% CI = .735–.819) for recall correct, .850 (95% CI = .816–.884)
for recognition correct, .843 (95% CI = .808–.878) for the combination score, and .601 (95% CI = .549–.654) for recognition
false positives, confirming superior classification accuracy for recognition correct and the combination score.
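To illustrate how such AUCs are obtained (simulated data for illustration only; this is not the authors' analysis code), the score can be negated so that higher values indicate the positive, noncredible class:

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
credible = rng.normal(26, 3, 138).clip(-15, 30)      # hypothetical combination scores
noncredible = rng.normal(19, 5, 353).clip(-15, 30)

y_true = np.r_[np.zeros(138), np.ones(353)]          # 1 = noncredible
scores = np.r_[credible, noncredible]
auc = roc_auc_score(y_true, -scores)                 # negate: lower score = noncredible
print(f"AUC = {auc:.3f}")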
Credible patients who failed one (n = 86) versus zero (n = 52) PVTs were compared on the Rey 15-item plus recognition scores.

Table 4. Specificity and sensitivity rates for Rey 15-item plus recognition cut-offs

Cut-off Recall Correct Recall Qualitative Recognition Correct Recognition FP Combination Score
Spec. Sens. Spec. Sens. Spec. Sens. Spec. Sens. Spec. Sens.
The Rey 15-item combination cut-off was then applied to the largest noncredible subgroups (mild traumatic brain injury,
n = 141; psychosis, n = 30 [one patient with claimed psychosis did not have combination score data]; severe traumatic brain
injury, n = 22; depression, n = 23) to determine cut-off sensitivity in the context of these claimed diagnoses. The combination
score cut-off achieved 90.0% sensitivity in noncredible psychosis, 78.3% sensitivity in noncredible depression, and 63.6% in
noncredible severe traumatic brain injury, but only 45.4% sensitivity in noncredible mild traumatic brain injury.
Table 5. Positive predictive values (PPV) and negative predictive values (NPV) for the Rey 15-item combination score at 15%, 30%, and 50% base rates of noncredible performance

Cut-off Scores 15% Base Rate 30% Base Rate 50% Base Rate

Table 6. Percentage within each group committing each type of recall error

Recall error Credible (n = 138) Noncredible (n = 353)

Qualitative errors for both recall and recognition trials were tabulated. As can be seen in Table 6, some recall errors
occurred equally frequently in both groups (row sequence, Roman numeral, embellishment), but every other type of error was
over-represented in the noncredible group, and several occurred three to four times more often in noncredible subjects (e.g.,
gestalt errors, no ABC, continue sequence). A recall qualitative error score was calculated by adding the number of qualitative
errors committed (excluding the three types of errors which occurred equally in both groups; as shown in Table 6, some error
types could be counted more than once). As reproduced in Table 3, the noncredible group obtained a significantly higher
recall qualitative error score; as shown in Table 4, a cut-off score of ≥4 achieved adequate specificity (≥90%), with sensitivity
of 19.4%. The following revised combination score (which included the recall qualitative error score) was calculated:
(recall correct − recall qualitative error score) + (recognition correct − recognition false positive errors)
Application of a cut-off of ≤21 resulted in 90.5% specificity, and achieved sensitivity of 62.0%, which was generally compa-
rable to the sensitivity level of the original combination score.
In Table 7 are reproduced the rates at which each type of recognition error was committed in each group separately.
Table 7. Percentage within each group committing each type of recognition false positive error
Table 8. Demographic, IQ, and diagnostic characteristics of credible subjects falling below combination score cut-off of ≤22
Age Gender Education Language Ethnicity Diagnosis FSIQ
Discussion
In the current study, credible patients (n = 138) scored significantly better than noncredible patients (n = 353) on Rey 15-item
recall correct, recognition correct, recognition false positive errors, and the combination score. When cut-offs were selected to
maintain at least 89.9% specificity, the highest sensitivity was found for recognition correct (cut-off ≤11; 62.6% sensitivity)
and the combination score (cut-off ≤22; 60.6% sensitivity), followed by recall correct (cut-off ≤11; 49.3% sensitivity), and
recognition false positive errors (≥3; 17.9% sensitivity).
In the original validation of the Rey-15 plus recognition (Boone et al., 2002), the recall correct cut-off had to be maintained
at the traditional cut-off of <9 to achieve adequate specificity (≥90%) in the sample of 36 credible clinic patients, whereas in
the current study, involving a much larger credible sample, the cut-off could be raised to ≤11 while still limiting false positive
rates to <10%. Similarly, all other score cut-offs could be made more stringent in the current study as compared to the 2002
investigation (i.e., the original recognition correct cut-off associated with ≥90% specificity was <10, the original combination
score cut-off was <20, and the original recognition false positive error cut-off was ≥4). The likely explanation for the higher
specificity rates in the current study is that credible patients with WAIS-III FSIQ <80 were excluded. Individuals with low
average to very superior IQs fail less than 10% of PVTs administered, whereas individuals with IQs of 70–79 fail 17%, and
individuals with IQs 60–69 fail 44% (Dean et al., 2008). Thus, it is appropriate to limit credible patient samples to individuals
with IQs ≥80 when determining generic PVT cut-offs, but customized cut-offs are required for individuals with borderline
and lower IQ levels (see Smith et al., 2014). When individuals of low intellectual scores are included in credible groups for
PVT validation studies, the resulting cut-offs will have lowered sensitivity because cut-offs need to be adequately protective
of low IQ patients. This renders PVTs less sensitive in those populations with low average and higher IQs. In the current
study, 13 patients met criteria for the credible group with the exception that FSIQ was 70–79; to maintain false positive rates
of <10% in this subgroup, the recall correct cut-off had to be lowered to <10, the recognition correct cut-off had to be low-
ered to <9, and the combination score cut-off had to be lowered to <20, confirming the significant impact of lowered intelli-
gence on Rey 15-item cut-offs.
We suspect that the lowered specificity rates found by Bailey and colleagues (2018) for recall correct and combination
scores were at least partially due to the older age of the credible sample (mean = 53.34) and the fact that patients with major
neurocognitive disorders were included (dementia, amnestic disorder, as well as likely lowered IQ). Examination of current
credible subjects who fell below cut-offs for the combination equation showed that they tended to be older than the larger
credible group. Rey 15-item plus recognition cut-offs had to be adjusted for older individuals (age > 50) to maintain specific-
ity of at least 90% (e.g., combination equation <20).
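A hypothetical helper (ours) makes these subgroup adjustments concrete: the combination score threshold drops to the original <20 rule when FSIQ is 70–79 or age exceeds 50, and otherwise the more stringent ≤22 cut-off applies:

def fails_combination(score: int, fsiq: float, age: float) -> bool:
    """Apply the combination score cut-off with the subgroup adjustments
    described in the Discussion: <20 for FSIQ 70-79 or age > 50 (to hold
    specificity at ~90%), otherwise <=22. Cut-offs for FSIQ < 70 were not
    examined here (such patients were excluded from the credible group)."""
    if 70 <= fsiq < 80 or age > 50:
        return score < 20
    return score <= 22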
Further, in the Bailey and colleagues (2018) study, group assignment was based on performance on a single PVT (Word
Memory Test; WMT) which has a not inconsequential error rate (sensitivity of 67%–78% and specificity of 70%–81% in trau-
matic brain injury; Greve, Ord, Curtis, Bianchini, & Brennan, 2008), and which likely led to errors in group membership. The
WMT is a verbal forced choice task, while the Rey 15-item is a visual memory free recall and nonforced choice recognition
measure, and it is probable that the two PVTs tap somewhat differing approaches to noncredible test performance and do not
provide redundant information; in fact, Bailey and colleagues (2018) documented that the correlation between the WMT and
the Rey 15-item scores ranged from .393 to .599, reflecting at most 36% shared score variance.
Morse and colleagues (2013) found more similar specificity rates to those documented in the current study, and it can be
reasonably assumed that if patients with IQ 70–79, and not just patients with IQ < 70, had been excluded from the Morse and
colleagues (2013) credible sample, the recommended cut-offs would have been even more similar to those found to be opti-
mal in the current study.
Test cut-offs were somewhat less sensitive in the current noncredible sample as compared to noncredible subjects included
in the original 2002 validation study. For example, a recall correct cut-off of <9 was associated with 47% sensitivity in the
2002 study as compared to only 24.6% in the current study. Likewise, the combination score of <20 detected 71% of non-
credible patients in the 2002 study, and only 48.7% in the current study. However, because test cut-offs could be raised from
those recommended in the 2002 study, overall sensitivity rates are similar across the two studies. For example, while a combi-
nation cut-off of <20 identified 71% of noncredible subjects in the 2002 study, in the current study, a score of ≤11 on either recall correct or recognition correct identified two-thirds (67.7%) of noncredible patients.
In the Griffin and colleagues (1996) simulation study, repetition and wrong item errors were each committed by 28.9% of simulators, findings highly similar to current findings. Because of low sensitivity, absence of individual qualitative
recall errors cannot be used to rule out noncredible performance, but due to low false positive rates, presence of these errors
can reasonably be used to rule in noncredible performance. A revised combination score, incorporating the recall qualitative
error score, achieved sensitivity (62%) that was essentially comparable to that of the original combination score (60.6%). As
such, use of the revised score does not appear justified given the extra time required to calculate it.
In line with emerging literature (Smith et al., 2014; Whiteside, Wald, & Busse, 2011), examination of individual scores in
combination showed increased sensitivity at no to minimal sacrifice to specificity. Approximately 10% of credible subjects
failed either recall correct or recognition correct cut-offs of ≤11, whereas over two-thirds (68%) of noncredible subjects
showed this pattern. Thirteen percent of credible patients failed either recall correct ≤11 or recognition correct ≤11 or recall
qualitative error score ≥4, whereas nearly 70% of noncredible patients failed at least one of the three cut-offs. This argues
that future studies should continue to examine use of various scores from individual PVTs in combination.
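The combined rule described in this paragraph reduces to a simple disjunction (function name and interface are ours):

def fails_any_rey15_cutoff(recall_correct: int,
                           recognition_correct: int,
                           recall_qual_errors: int) -> bool:
    """Flag if any of the three cut-offs is met: recall correct <= 11,
    recognition correct <= 11, or recall qualitative error score >= 4.
    Per the text, ~13% of credible and nearly 70% of noncredible patients
    fail at least one of the three."""
    return (recall_correct <= 11
            or recognition_correct <= 11
            or recall_qual_errors >= 4)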
When assigning patients to credible and noncredible groups, we adopted the procedure of only counting a PVT as failed once, even when multiple scores from the same measure fell beyond cut-offs.
Conflict of Interest
None declared.
Disclosure
Drs. Boone, Ermshar, Miora, Victor, and Ziegler are forensic consultants.
Disclaimer
The views expressed herein do not necessarily represent the views of the Tennessee Valley Healthcare System or the
United States.
References
Arnold, G., Boone, K. B., Lu, P., Dean, A., Wen, J., Nitch, S., et al. (2005). Sensitivity and specificity of finger tapping test scores for the detection of suspect
effort. The Clinical Neuropsychologist, 19, 105–120.
Babikian, T., Boone, K. B., Lu, P., & Arnold, G. (2006). Sensitivity and specificity of various digit span scores in the detection of suspect effort. The Clinical
Neuropsychologist, 20, 145–159.
Bailey, K. C., Soble, J. R., & O’Rourke, J. J. (2018). Clinical utility of the Rey 15-Item Test, recognition trial, and error scores for detecting noncredible
neuropsychological performance in a mixed clinical sample of veterans. The Clinical Neuropsychologist, 32, 119–131.
Bianchini, K. J., Mathias, C. W., Greve, K. W., Houston, R. J., & Crouch, J. A. (2001). Classification accuracy of the Portland Digit Recognition Test in trau-
matic brain injury. The Clinical Neuropsychologist, 15, 461–470.
Boone, K. B. (2013). Clinical practice of forensic neuropsychology: An evidence-based approach. New York: Taylor & Francis.
Boone, K. B. (2017). Assessment of neurocognitive performance validity. In Ricker J., & Morgan J. (Eds.), Textbook of clinical neuropsychology (2nd ed).
Abingdon, OX, New York: Taylor and Francis.
Boone, K. B., Lu, P., & Wen, J. (2005). Comparison of various RAVLT scores in the detection of noncredible memory performance. Archives of Clinical Neuropsychology.
Solomon, R. E., Boone, K. B., Miora, D., Skidmore, S., Cottingham, M., Victor, T., et al. (2010). Use of the WAIS-III Picture Completion Subtest as an
embedded measure of response bias. The Clinical Neuropsychologist, 24, 1243–1256.
Victor, T. L., Boone, K. B., Serpa, J. G., Buehler, J., & Ziegler, E. A. (2009). Interpreting the meaning of multiple symptom validity test failure. The Clinical
Neuropsychologist, 23, 297–313.
Whiteside, D., Wald, D., & Busse, M. (2011). Classification accuracy of multiple visual spatial measures in the detection of suspect effort. The Clinical
Neuropsychologist, 25, 287–301.
Young, J. C., Roper, B. L., & Arentsen, T. J. (2016). Validity testing and neuropsychology practice in the VA healthcare system: Results from recent practi-
tioner survey. The Clinical Neuropsychologist, 30, 497–514.