The National Academies Press: Psychological Testing in The Service of Disability Determination (2015)
Committee on Psychological Testing, Including Validity Testing, for Social
Security Administration Disability Determinations; Board on the Health of
Select Populations; Institute of Medicine
Unless otherwise indicated, all materials in this PDF are copyrighted by the National Academy of Sciences.
THE NATIONAL ACADEMIES PRESS 500 Fifth Street, NW Washington, DC 20001
NOTICE: The project that is the subject of this report was approved by the
Governing Board of the National Research Council, whose members are drawn
from the councils of the National Academy of Sciences, the National Academy of
Engineering, and the Institute of Medicine. The members of the committee respon-
sible for the report were chosen for their special competences and with regard for
appropriate balance.
Additional copies of this report are available for sale from the National Academies
Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or
(202) 334-3313; http://www.nap.edu.
For more information about the Institute of Medicine, visit the IOM home page
at: www.iom.edu.
The serpent has been a symbol of long life, healing, and knowledge among almost
all cultures and religions since the beginning of recorded history. The serpent ad-
opted as a logotype by the Institute of Medicine is a relief carving from ancient
Greece, now held by the Staatliche Museen in Berlin.
The National Academy of Engineering was established in 1964, under the charter
of the National Academy of Sciences, as a parallel organization of outstanding en-
gineers. It is autonomous in its administration and in the selection of its members,
sharing with the National Academy of Sciences the responsibility for advising the
federal government. The National Academy of Engineering also sponsors engineer-
ing programs aimed at meeting national needs, encourages education and research,
and recognizes the superior achievements of engineers. Dr. C. D. Mote, Jr., is presi-
dent of the National Academy of Engineering.
The National Research Council was organized by the National Academy of Sciences
in 1916 to associate the broad community of science and technology with the
Academy’s purposes of furthering knowledge and advising the federal government.
Functioning in accordance with general policies determined by the Academy, the
Council has become the principal operating agency of both the National Academy
of Sciences and the National Academy of Engineering in providing services to
the government, the public, and the scientific and engineering communities. The
Council is administered jointly by both Academies and the Institute of Medicine.
Dr. Ralph J. Cicerone and Dr. C. D. Mote, Jr., are chair and vice chair, respectively,
of the National Research Council.
www.national-academies.org
Reviewers
or recommendations nor did they see the final draft of the report before its
release. The review of this report was overseen by Nancy Adler, University
of California, San Francisco, and Randy Gallistel, Rutgers University.
Appointed by the National Research Council and the Institute of Medicine,
they were responsible for making certain that an independent examination
of this report was carried out in accordance with institutional procedures
and that all review comments were carefully considered. Responsibility for
the final content of this report rests entirely with the authoring committee
and the institution.
Preface
Contents
SUMMARY
1 INTRODUCTION
Committee’s Approach to Its Charge
Report Organization
References
APPENDIXES
BOXES
S-1 Statement of Task
FIGURES
S-1 Components of psychological assessment
TABLES
1-1 Characteristics of SSDI and SSI Beneficiaries, 2012
1-2 SSDI and SSI Beneficiaries by Diagnostic Category, 2012
1-3 Definitions of Psychological Terms
3-1 Listings for Mental Disorders and Types of Psychological Tests
Summary1
BACKGROUND
In 2012, the U.S. Social Security Administration (SSA) provided bene-
fits to nearly 15 million disabled adults and children through two disabil-
ity programs. The majority of beneficiaries, 8.8 million, received benefits
through the Social Security Disability Insurance (SSDI) program for dis-
abled individuals, and their dependent family members, who have worked
and contributed to the Social Security trust funds. The remaining beneficia-
ries (4.9 million adults and 1.3 million children) received benefits through
the Supplemental Security Income (SSI) program, which is a means-tested
program based on income and financial assets for adults aged 65 years or
older and disabled adults and children.
SSA disability determinations are based on the medical evidence and all
evidence considered relevant by the examiners in an applicant’s case record.
Physical or mental impairments must be established by objective medical
evidence consisting of medical signs and laboratory findings, which may
include psychological tests and other standardized test results. SSA estab-
lishes the presence of a medically determinable impairment in individuals
with mental disorders other than intellectual disability through the use of
standard diagnostic criteria, which include symptoms and signs. Evidence
for these mental impairment claims, as well as for many other categories
of claims, such as those for certain musculoskeletal and connective tissue
conditions, relies less on standard laboratory tests than for some other
categories of impairment.

1 This summary does not include references. Citations to support text, conclusions, and
recommendations made herein are provided in the body of the report.
SSA maintains a list of criteria for specific conditions that an appli-
cant with one or more of those conditions must meet in order to receive
disability benefits based solely on medical criteria. SSA currently requires
psychological test results, specifically intelligence test results, in the listing
criteria for intellectual disability in children and adults and in the criteria
for cerebral palsy, convulsive epilepsy, and meningomyelocele and related
disorders. SSA questions the value of purchasing psychological testing in
cases involving mental disorders, other than for intellectual disability, and it
does not require testing either to establish or to assess the severity of other
mental disorders.
As noted, SSA indicates that objective medical evidence may include
the results of standardized psychological tests. Psychological tests vary
widely, however, and some are more objective than others. Whether a
psychological test is appropriately considered objective has much to do
with the process of scoring. For example, unstructured measures that call
for open-ended responding rely on professional judgment and interpreta-
tion in scoring; thus, such measures are considered less than objective.
In contrast, standardized psychological tests and measures, such as those
discussed in the report, are structured and objectively scored. In the case of
non-cognitive self-report measures, the respondent generally answers ques-
tions regarding typical behavior by choosing from a set of predetermined
answers. With cognitive tests, the respondent answers questions or solves
problems, which usually have correct answers, as well as he or she possibly
can. Such measures generally provide a set of normative data (i.e., norms),
or scores derived from groups of people for whom the measure is designed
(i.e., the designated population), to which an individual’s responses or per-
formance can be compared. Therefore, standardized psychological tests and
measures rely less on clinical judgment and are considered to be more ob-
jective than those that depend on subjective scoring. Unlike measurements
such as weight or blood pressure, standardized psychological tests require
the individual’s cooperation with respect to self-report or performance on
a task. The inclusion of validity testing in the test or test battery allows for
greater confidence in the test results. Standardized psychological tests that
are appropriately administered and interpreted can be considered objective
evidence.
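The comparison of an individual's performance to normative data can be made concrete with a small sketch. The deviation-score scaling below (mean 100, SD 15) is a common convention for standardized tests; the raw score and the norm-group mean and standard deviation are hypothetical values chosen only for illustration.

```python
# Convert a raw test score into a norm-referenced standard score and
# percentile. Norm mean/SD values are hypothetical, for illustration only.
from statistics import NormalDist

def standard_score(raw, norm_mean, norm_sd):
    """Deviation score scaled to mean 100, SD 15 (a common convention)."""
    z = (raw - norm_mean) / norm_sd
    return 100 + 15 * z

def percentile(raw, norm_mean, norm_sd):
    """Percent of the normative group expected to score at or below `raw`."""
    z = (raw - norm_mean) / norm_sd
    return 100 * NormalDist().cdf(z)

# A raw score of 42 against a hypothetical norm group (mean 50, SD 10)
score = standard_score(42, norm_mean=50, norm_sd=10)   # -> 88.0
pct = percentile(42, norm_mean=50, norm_sd=10)         # about the 21st percentile
```

The point of the sketch is that the meaning of a score comes entirely from the designated normative population: the same raw score yields a different standard score and percentile under different norms.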
As illustrated in Figure S-1, standardized psychological testing is one
component of a full psychological assessment. Standardized psychological
tests can be divided into measures of typical behavior and tests of maximal
performance. Measures of typical behavior, such as personality, interests,
values, and attitudes, may be referred to as non-cognitive measures. Tests of
maximal performance ask people to answer questions and solve problems
FIGURE S-1 Components of psychological assessment: the clinical interview, observations, record review, and standardized psychological tests, with standardized tests divided into non-cognitive measures and cognitive tests.
BOX S-1
Statement of Task
COMMITTEE’S RECOMMENDATIONS
The committee identified three elements of SSA’s disability determination
process in which psychological testing could play a role: (1) identification of
a “medically determinable impairment,” (2) evaluation of functional capac-
ity for work, and (3) assessment of the validity of applicants’ psychological
test results or the consistency of applicants’ statements about self-reported
symptoms. Although this report addresses all three elements, the committee
focuses on the second and the third, for which questions about the use of
psychological tests are more complex. As indicated in the following section,
the committee found that the results of standardized psychological testing
do provide information of value to each of the three elements.
Economic Considerations
Systematic use of standardized psychological testing in SSA disability
evaluations for a broader set of physical and mental impairments than is
current practice will have financial implications. The average cost of testing
services varies by the type of testing (e.g., psychological, neuropsychologi-
cal), by the type of provider (e.g., psychologist or physician, technician),
and by geographic area. The variation in pricing implies that the expected
costs to SSA of requiring psychological testing will depend on exactly which
tests are required, the qualifications mandated for testing providers, and
the geographical location of the providers most in demand. Estimating the
exact cost of broad use of psychological testing by SSA will require more
detailed data on the exact implementation strategy.
At present, there do not appear to be any independently conducted
studies regarding the accuracy of the disability determination process as
implemented by DDS offices. Some published estimates of billions of dollars
in potential cost savings to SSA associated with the use of symptom valid-
ity testing and performance validity testing are based on assumptions that
if violated would substantially lower the estimated cost savings. Potential
cost savings associated with testing vary considerably based on the assump-
tions about who it is applied to and how many individuals it detects and
thus rejects for disability benefits. A full financial cost-benefit analysis of
psychological testing will require SSA to collect additional data both before
and after the implementation of the recommendations of this report.
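The committee's point about assumption-sensitivity can be illustrated with a back-of-envelope sketch. None of the figures below come from the report: the number tested, cost per test, detection rate, and average benefit avoided are hypothetical placeholders, chosen only to show how strongly the estimated net savings swing with the detection-rate assumption.

```python
# Back-of-envelope sensitivity check on estimated savings from validity
# testing. All inputs are hypothetical; the point is the sensitivity of the
# estimate to assumptions, not the particular values.

def net_savings(n_tested, cost_per_test, detection_rate, avg_benefit_avoided):
    """Estimated net savings: benefits avoided for detected cases minus testing costs."""
    testing_cost = n_tested * cost_per_test
    avoided = n_tested * detection_rate * avg_benefit_avoided
    return avoided - testing_cost

# The same testing program looks very different under different assumptions
# about how many noncredible claims it detects.
for rate in (0.001, 0.01, 0.05):
    savings = net_savings(n_tested=100_000, cost_per_test=500,
                          detection_rate=rate, avg_benefit_avoided=100_000)
    print(f"detection rate {rate:.1%}: net savings ${savings:,.0f}")
```

Under these made-up inputs the estimate moves from a large net loss at a 0.1 percent detection rate to a large net gain at 5 percent, which is the committee's caution in miniature: published savings estimates stand or fall with their detection assumptions.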
Conclusions
• Accurate assessments of the net financial impact of psychological
testing as recommended by the committee will require information
on the current accuracy of DDS decisions and how the accuracy is
affected by the increased use of standardized psychological testing.
• The absence of data on the rates of false positives and false nega-
tives in current SSA disability determinations precludes any assess-
ment of their accuracy and consistency.
• There currently is great variability across states in allowance rates
for both SSI and SSDI, variability that is not fully accounted for by
differences in the populations of applicants. There also is great variability
in the disability determination appeal rulings among administrative
law judges within and across states. Although it is not possible to
know definitively whether the large share of unexplained variation
in state filing, award, and allowance rates is driven by variability in
the federal disability determination process, there is some evidence
that states differ in how they manage claims.
• In light of this unexplained variability, systematic use of standard-
ized psychological testing as recommended by the committee is
expected to improve the accuracy and consistency of disability
determinations.
Over the course of the project, the committee identified two areas in
particular in which it expects that the results of further research would help
to inform disability determination processes as indicated in the following
conclusions and recommendation.
Conclusions
• Additional research is needed on the use of SVTs and PVTs in
populations representative of the pool of disability applicants,
including in terms of gender, ethnicity, race, primary language,
educational level, medical condition, and the like. In particular,
additional research on the development of appropriate criterion
or cutoff scores for PVTs and SVTs in these populations for the
purposes of disability evaluation would be beneficial.
• The committee’s task was to evaluate the usefulness of psychologi-
cal testing in the disability determination process, as reflected in the
foregoing recommendations. However, the committee recognizes
that just as systematic use of standardized psychological testing
is expected to improve the accuracy and consistency of disability
determinations for applicants who allege cognitive impairment or
whose allegation of functional impairment is based solely on self-
report, the use of other standardized assessment tools also may be
expected to improve the accuracy of disability determinations. The
value of standardized assessment tools, including psychological
tests, to assessments of individuals’ work-related functional capac-
ity is an area that would benefit from further research.
Introduction
TABLE 1-1 Characteristics of SSDI and SSI Beneficiaries, 2012
NOTE: FRA = full retirement age; SSDI = Social Security Disability Insurance; SSI = Supplemental Security Income.
SOURCES: SSA, 2013a, Tables 19 and 20; 2013b, Table 19.
1 SSA guidelines for consultative examination reports are available (SSA, 2015).
TABLE 1-2 SSDI and SSI Beneficiaries by Diagnostic Category, 2012
NOTE: SSDI = Social Security Disability Insurance; SSI = Supplemental Security Income.
SOURCES: SSA, 2013a, Table 21; 2013b, Tables 20, 35, 36.
3 See Social Security Ruling (SSR) on the Evaluation of Symptoms in Disability Claims:
Assessing the Credibility of an Individual’s Statements (SSA, 1996).
4 In the project background material, the sponsor asked the committee to consider topics
such as the cost of administering these tests, whether the cost varies by location, and the cost
effectiveness (including cost per claim) of requiring a single test or a combination of tests in
the disability evaluation process for physical and mental impairments (Revised project back-
ground, submitted by Joanna Firmin, Social Security Administration, May 23, 2014).
BOX 1-1
Statement of Task
Concept of Disability
SSA defines disability in adults as
The inability to engage in any substantial gainful activity … by reason of
any medically determinable physical or mental impairment(s) which can
be expected to result in death or which has lasted or can be expected to
last for a continuous period of not less than 12 months. (SSA, n.d., see
also 2012b)
Substantial gainful activity is work that “involves doing significant and
productive physical or mental duties” and “is done (or intended) for pay
BOX 1-2
Major Concepts in the International Classification of
Functioning, Disability and Health
SOURCE: WHO, 2001, pp. 10, 211–214. Reprinted from IOM, 2007b, p. 38.
FIGURE 1-1 Interactions among the components of the International Classification of Functioning, Disability and Health: a health condition (disorder or disease), functioning and disability at the individual and societal levels, and environmental and personal factors.
prosthetic leg. Similarly, whether an individual is disabled as a result of his
or her functional or activity limitations depends on the accommodations
available to the individual that permit the person to engage in activities he
or she otherwise would be unable to perform (IOM, 1997).
For this reason, disability is not tightly correlated with the presence of
impairment. Both need to be evaluated, but the measures are fundamentally
different, including objective measures (performance and anatomical) and
self-report measures that help determine how usual roles are disrupted. The
linkages among an individual’s anatomy, diagnosis, and impairment are not
sufficient to determine the presence of work disability. As the 2007 IOM
report Improving the Social Security Disability Decision Process states with
respect to work disability:
Work disability … results from the interaction of individuals’ impairments,
functional limitations resulting from the impairments, assistive technolo-
gies to which they may have access, and attitudinal and other personal
characteristics (such as age, education, skills, and work history) with the
physical and mental requirements of potential jobs, accessibility of trans-
portation, attitudes of family members and coworkers, and willingness of
an employer to make accommodations. (IOM, 2007c, p. 26)
Given the complex interaction among the variety of factors that under-
lie a disability, it is clear that disability determinations are multidimensional
and always involve some element of judgment (IOM, 1987). Although
objective medical evidence can indicate the presence of physical or mental
Psychological Terms
Psychological assessment refers to
the comprehensive integration of information from a variety of sources—
including formal psychological tests, informal tests and surveys, structured
clinical interviews, interviews with others, school and/or medical records,
and observational data—to make inferences regarding the mental or be-
havioral characteristics of an individual or to predict behavior. (Furr and
Bacharach, 2013; Hubley and Zumbo, 2013)
Psychological testing refers to “the use of formal, standardized proce-
dures for sampling behavior that ensure objective evaluation of the test-
taker regardless of who administers the test” (Furr and Bacharach, 2013;
Hubley and Zumbo, 2013).
Major categories of psychological tests include (1) intelligence tests,
(2) neuropsychological tests, (3) personality tests, (4) disorder-specific tests
(e.g., depression, anxiety), (5) achievement tests, (6) aptitude tests, and (7)
occupational or interests tests. The first four categories capture the tests that
are most relevant to disability determinations. Standardized psychological
tests can be divided into measures of typical behavior and tests of maximal
performance. Measures of typical behavior, such as personality, interests,
values, and attitudes, may be referred to as non-cognitive measures. Tests of
maximal performance ask people to answer questions and solve problems
as well as they possibly can. Because tests of maximal performance typi-
cally involve cognitive performance, they are often referred to as cognitive
tests. It is through these two lenses—non-cognitive measures and cognitive
tests—that the committee examined psychological testing for the purpose
of disability evaluation in this report. Intelligence tests and neuropsycho-
logical tests are examples of cognitive-based measures, while depression,
anxiety, or personality inventories are examples of non-cognitive measures.
Psychological tests may also be categorized as performance based and self-
report. Cognitive tests tend to be performance based, and non-cognitive
measures tend to be based on self-report.
A variety of validity tests have been developed to assist examiners in
interpreting the results of different psychological tests. The committee dis-
tinguishes in this report between performance validity tests (PVTs), which
provide information about an individual’s effort on tests of maximal per-
formance, such as cognitive tests, and symptom validity tests (SVTs), which
provide information about the consistency and accuracy of an individual’s
self-report of symptoms he or she is experiencing. PVTs are stand-alone or
Credibility
In situations involving the potential for secondary gain—such as mon-
etary gain from an SSA disability payment—there may be motivation for
individuals intentionally to feign or exaggerate symptoms or to exert
suboptimal effort on performance measures in order to present a stronger need
for support or disability benefits. Malingering is the intentional presentation
of false or exaggerated symptoms, intentionally poor performance, or a com-
bination of the two, motivated by external incentives (American Psychiatric
Association, 2013; Bush et al., 2005; Heilbronner et al., 2009). Two key
elements of malingering are intention to deceive or mislead and motivation
to do so for the purpose of achieving some type of secondary gain.
It is important to distinguish between malingering and the credibility
or noncredibility of an individual’s performance or symptom report, even
in situations of potential secondary gain. Individuals might over- or under-
report symptoms or not give their best effort on cognitive-based measures
for any number of reasons. SVTs and PVTs do not in themselves provide
information about the motivations of an examinee5 or the reasons why
his or her performance or symptom report may appear to be noncredible.
Throughout the report, the committee has avoided use of the term malin-
gering when discussing the results of PVTs and SVTs, opting instead to refer
to the credibility or accuracy of an individual’s performance or symptom
report. The committee intends such terms to be value-neutral with respect
to the examinee, referring only to whether the examinee exerted sufficient
effort for the test results to be considered valid and to the consistency and
accuracy of the individual’s statements about the experience of symptoms.
5 Although below chance scores on a PVT can speak to an examinee’s intention—the indi-
vidual knew the answer and deliberately chose the wrong one—they cannot speak directly to
the individual’s motivation (reason) for intentionally choosing the wrong answer.
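The footnote's logic about below-chance scores rests on simple binomial reasoning: on a two-alternative forced-choice PVT, pure guessing yields about 50 percent correct, so a score far below that is very unlikely to arise by chance. The sketch below illustrates the calculation; the test length and score are hypothetical, not drawn from any particular instrument.

```python
# Probability of scoring at or below k correct out of n two-choice items by
# guessing alone (binomial with p = 0.5). A very small probability is why a
# significantly-below-chance score is read as deliberately chosen wrong answers.
from math import comb

def prob_at_most(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Hypothetical example: 15 of 50 correct on a 50-item, two-choice task.
p_guess = prob_at_most(15, 50)   # well under 1 in 100
```

A probability this small makes random guessing an implausible explanation for the score, which is why below-chance performance can speak to intention even though, as the footnote notes, it says nothing about the examinee's motivation.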
Study Focus
Although the report focuses primarily on the use of psychological tests in
disability determinations in adults, the use of such tests in children is also ad-
dressed. There are three areas in SSA’s disability determination process where
psychological testing could be of value: (1) identification of a “medically de-
terminable impairment,” (2) evaluation of functional capacity for work, and
(3) assessment of the validity of claimants’ psychological test results or the
accuracy of statements about self-reported symptoms. Although the report
addresses all three areas, the committee focuses on the second and the third,
where questions about the use of psychological tests are more complex.
In considering its task, the committee observed that the vast number (in
the hundreds) of cognitive and non-cognitive psychological tests available for
use precludes a detailed analysis of each specific test and recommendations
about the use of specific tests. In addition, decisions about which specific
tests are most appropriate for particular individuals in a particular set of
circumstances properly fall in the realm of clinical decision making. Instead,
the committee reviewed categories of psychological tests, including validity
tests, and this report provides general guidance on the use of such tests in SSA
disability determinations for claims involving physical and mental disorders.
It is important to note that SSA specifically requested that the com-
mittee not address the use of intelligence tests in making determinations
about intellectual disability since that topic was previously examined in a
2002 National Research Council (NRC) report titled Mental Retardation:
Determining Eligibility for Social Security Benefits (NRC, 2002).
Consideration of intelligence tests with respect to embedded validity mea-
sures, however, was deemed to be within the committee’s purview.
Information-Gathering Process
The committee conducted an extensive review of the literature pertain-
ing to the use of psychological tests, including PVTs and SVTs, in disability
determinations. The committee began with an English-language literature
search of online databases, including PubMed, Embase, Medline, Web of
Science, Scopus, PsycINFO, Government Accountability Office (GAO),
Congressional Research Service, Google, Google Scholar, and Legistorm
(GAO reports, congressional memorandums). Additional literature and other
resources were identified by committee members and project staff using
traditional academic research methods and online searches. Attention was
given to consensus and position statements issued by relevant experts and
professional organizations.
The committee used a variety of sources to supplement its review of the
literature. It met in person five times and held two public workshops to hear
from invited experts in areas pertinent to the topic (see Appendix A for the
open session agendas and speaker lists). Speakers included neuropsycholo-
gists with expertise in performance and symptom validity testing in adults
and children, the use of psychological and validity tests in culturally diverse
populations, and the use of such tests in non-SSA disability determination
contexts (e.g., private disability insurance programs, Canadian auto insur-
ance, U.S. military disability or return-to-duty decisions, veterans’ disability
compensation). The committee also heard from SSA and DDS representa-
tives about the SSA disability determination process and its current policies
surrounding the use of psychological and validity testing.
In addition, the committee commissioned two papers to provide addi-
tional critical analysis in areas relevant to the committee’s work. One paper
addresses issues of diversity (e.g., in terms of culture, language, gender and
gender identity, educational or socioeconomic status) and multiculturalism
in the use of psychological tests (self-report measures and performance-
based cognitive tests as well as corresponding validity tests) in making
disability determinations. The authors were asked to discuss the use of
psychological tests in diverse populations in terms of their validity, fairness,
and other characteristics. They also were asked to address whether, when,
and/or how to use such measures, despite any limitations, in disability de-
terminations for diverse populations in the United States.
Based on its review of the literature, the presentations from invited
experts on PVT and SVT research at its open sessions, and the expertise
of several of its members, the committee understood the arguments and
evidence supporting the inclusion of validity tests in psychological and
neuropsychological tests and test batteries. Because the committee found
very little published literature critiquing the use of SVTs and PVTs, the
committee felt it was important to seek more information about potential
concerns or questions pertaining to their use. To this end, it commissioned a second
paper and asked the author to address a number of questions designed to
probe any challenges or cautions about the use of validity tests for disability
determinations in different populations. The questions posed by the com-
mittee included the following:
• For whom are PVTs and SVTs useful for informing disability deter-
minations? In what way?
• How or in what way do the results of PVTs or SVTs correlate with
assessing functional limitations (such as limitations in a person’s
ability to do basic work activities, activities of daily living, social
functioning, and concentration, persistence, or pace) due to an
impairment?
• Given the historical context in which PVTs and SVTs were devel-
oped for forensic use in litigation settings, can they be adapted for
REPORT ORGANIZATION
Chapter 2 describes the current SSA disability determination process,
focusing on areas relevant to the use of psychological tests. It also dis-
cusses the use of psychological tests in disability evaluations in non-SSA
contexts. Chapter 3 provides an overview of psychological tests, including
the different types of tests and their use, psychometrics and norms, and the
administration of tests. Chapter 4 reviews the use of standardized psycho-
logical self-report measures and SVTs in the context of SSA disability de-
terminations. Chapter 5 addresses standardized cognitive tests and the use
of PVTs. Chapter 6 explores economic considerations related to the use of
psychological testing in SSA disability determinations. Chapter 7 contains
the committee’s conclusions and recommendations.
REFERENCES
American Psychiatric Association. 2013. Diagnostic and statistical manual of mental
disorders, fifth edition (DSM-5). Arlington, VA: American Psychiatric Association.
APA (American Psychological Association). 2013. Specialty guidelines for forensic psychology.
American Psychologist 68(1):7-19.
British Psychological Society. 2009. Assessment of effort in clinical testing of cognitive func-
tioning for adults. Leicester, UK: British Psychological Society.
Bush, S. S., R. M. Ruff, A. I. Tröster, J. T. Barth, S. P. Koffler, N. H. Pliskin, C. R. Reynolds,
and C. H. Silver. 2005. Symptom validity assessment: Practice issues and medical
necessity. NAN Policy & Planning Committee. Archives of Clinical Neuropsychology
20(4):419-426.
Bush, S. S., R. L. Heilbronner, and R. M. Ruff. 2014. Psychological assessment of symp-
tom and performance validity, response bias, and malingering: Official position of the
Association for Scientific Advancement in Psychological Injury and Law. Psychological
Injury and Law 7(3):197-205.
Furr, R. M., and V. R. Bacharach. 2013. Psychometrics: An introduction. Thousand Oaks,
CA: Sage Publications, Inc.
Heilbronner, R. L., J. J. Sweet, J. E. Morgan, G. J. Larrabee, S. R. Millis, and Conference
Participants. 2009. American Academy of Clinical Neuropsychology consensus confer-
ence statement on the neuropsychological assessment of effort, response bias, and ma-
lingering. The Clinical Neuropsychologist 23(7):1093-1129.
Hubley, A. M., and B. D. Zumbo. 2013. Psychometric characteristics of assessment procedures:
An overview. In APA handbook of testing and assessment in psychology, Volume 1—Test
theory and testing and assessment in industrial and organizational psychology, edited by
K. F. Geisinger, N. R. Kuncel, S. P. Reise, M. C. Rodriguez. Washington, DC: American
Psychological Association.
IOM (Institute of Medicine). 1987. Pain and disability: Clinical, behavioral, and public policy
perspectives. Washington, DC: National Academy Press.
IOM. 1991. Disability in America: Toward a national agenda for prevention. Washington,
DC: National Academy Press.
IOM. 1997. Enabling America: Assessing the role of rehabilitation science and engineering.
Washington, DC: National Academy Press.
IOM. 2007a. A 21st century system for evaluating veterans for disability benefits. Washington,
DC: The National Academies Press.
IOM. 2007b. The future of disability in America. Washington, DC: The National Academies
Press.
IOM. 2007c. Improving the social security disability decision process. Washington, DC: The
National Academies Press.
IOM and NRC (National Research Council). 2007. PTSD compensation and military service.
Washington, DC: The National Academies Press.
IOPC (Inter Organizational Practice Committee). 2013. Use of symptom validity indicators in
SSA psychological and neuropsychological evaluations. Letter to Senator Tom Coburn.
https://www.nanonline.org/docs/PAIC/PDFs/SSA%20and%20Symptom%20Validity%20
Tests%20-%20IOPC%20letter%20to%20Sen%20Coburn%20-%202-11-13.pdf (ac-
cessed February 8, 2015).
Larrabee, G. J. 2012. Performance validity and symptom validity in neuropsychological assess-
ment. Journal of the International Neuropsychological Society 18(4):625-630.
32 Psychological Testing
Larrabee, G. J. 2014. Performance and symptom validity. Presentation to the IOM Committee on Psychological Testing, Including Validity Testing, for Social Security Administration Disability Determinations, June 25, 2014, Washington, DC.
Nagi, S. Z. 1965. Some conceptual issues in disability and rehabilitation. In Sociology and rehabilitation, edited by M. B. Sussman. Washington, DC: American Sociological Association. Pp. 100-113.
Nagi, S. Z. 1976. An epidemiology of disability among adults in the United States. Milbank Memorial Fund Quarterly Health and Society 54(4):439-467.
NRC (National Research Council). 2000. Survey measurement of work disability: Summary of a workshop. Washington, DC: National Academy Press.
NRC. 2002. Mental retardation: Determining eligibility for social security benefits. Washington, DC: The National Academies Press.
Office of the Inspector General, SSA (Social Security Administration). 2013. The Social Security Administration’s policy on symptom validity tests in determining disability claims. Washington, DC: SSA. http://oig.ssa.gov/sites/default/files/audit/full/pdf/A-08-13-23094.pdf (accessed March 27, 2015).
SSA (Social Security Administration). 1996. SSR 96-7p: Policy interpretation ruling Titles II and XVI: Evaluation of symptoms in disability claims: Assessing the credibility of an individual’s statements. http://www.socialsecurity.gov/OP_Home/rulings/di/01/SSR96-07-di-01.html (accessed October 3, 2014).
SSA. 2012a. DI 00115.001 Social Security Administration’s (SSA) disability programs. Program Operations Manual System (POMS). https://secure.ssa.gov/poms.nsf/lnx/0400115001 (accessed October 2, 2014).
SSA. 2012b. DI 00115.015 Definitions of disability. Program Operations Manual System (POMS). https://secure.ssa.gov/poms.nsf/lnx/0400115015 (accessed October 3, 2014).
SSA. 2013a. Annual statistical report on the Social Security Disability Insurance program, 2012. http://www.socialsecurity.gov/policy/docs/statcomps/di_asr/2012/index.html (accessed September 26, 2014).
SSA. 2013b. SSI annual statistical report, 2012. http://www.socialsecurity.gov/policy/docs/statcomps/ssi_asr/2012/index.html (accessed September 26, 2014).
SSA. 2015. DI 22510.000 Development of consultative examinations (CE). Program Operations Manual System (POMS). https://secure.ssa.gov/apps10/poms.nsf/lnx/0422510000 (accessed January 27, 2015).
SSA. n.d. Disability evaluation under social security; Part I—General information. http://www.ssa.gov/disability/professionals/bluebook/general-info.htm (accessed November 14, 2014).
WHO (World Health Organization). 1992. International statistical classification of diseases and related health problems, 10th revision (ICD-10). Geneva: WHO.
WHO. 2001. International classification of functioning, disability and health (ICF). Geneva: WHO.
[Figure 2-1: Disability Determination Services — evaluation of disability (Steps 2-5). Traditional site: a disability determination team (disability examiner with medical and psychological consultants). Single-decision-maker site: a disability examiner, with medical and psychological consultants available. Determinations are based on information from the claimant's medical sources.]

[Figure 2-2a: Disability determination numbers at each stage of the process — concurrent Title II/Title XVI in 2013. Percentages shown at Step 4 ("Capacity for past work?") and Step 5 ("Capacity for any work?"): 11.7%, 33.8%, 12.5%. NOTE: Other 13% (procedural denials 9.2%); 23.8% allowed at the initial determination level.]

[Figure 2-2b: Disability determination numbers at each stage of the process — Title XVI adults (SSI adults) in 2013. Percentages shown at Steps 4 and 5: 14.1%, 24.3%, 25.5%.]

[Figure 2-2c: Disability determination numbers at each stage of the process. Percentages shown at Steps 4 and 5: 5.9%, 40.1%, 13.9%. NOTE: Other 19.0% (procedural denials 6.2%); 28.1% allowed at the initial determination level.]
Copyright National Academy of Sciences. All rights reserved.
Psychological Testing in the Service of Disability Determination
1 For SSI child applicants, the income test relates to the resources of the household.
agency, where a disability examiner develops and reviews the medical and other evidence2 for the claim and makes an initial determination about disability. In 2013, state DDS offices evaluated approximately 2.8 million applications for disability benefits distributed as follows: 915,679 SSDI; 887,506 concurrent SSDI/SSI adult; 653,699 SSI adult; and 428,208 SSI child (SSA, 2014h). Before beginning the disability evaluation, DDS examiners recheck that applicants meet the financial and other nonmedical criteria for the disability programs. As shown in Figure 2-2, almost no cases that reach the DDSs are rejected at this step, because the SSA field offices have already screened the applicants on these criteria. If the financial criteria are met, the DDS agencies begin to develop the case.
DDS agencies follow either a traditional or a single-decision-maker (SDM) model (see Figure 2-1), depending on the state. In the traditional model, the disability examiner makes the determination in conjunction with a DDS psychological consultant or a medical consultant (20 CFR § 404.1615). In the SDM model (20 CFR § 404.906), disability examiners have the authority to make the initial disability determination. In most cases, the disability examiners prepare the assessments and have the authority to approve or deny claims without obtaining the signature of a medical or psychological consultant. The exception is denials for mental impairments, which must be reviewed by a psychological consultant. Medical and psychological consultants are always available to assist disability examiners in their review of claims.
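The sign-off rules that distinguish the two models amount to a simple decision rule, sketched below (an illustrative sketch; the function name and inputs are ours, not SSA terminology):

```python
def consultant_signoff_required(model: str, decision: str,
                                mental_impairment: bool) -> bool:
    """Return True if a medical or psychological consultant must sign the
    initial determination, per the traditional vs. SDM models described
    above (20 CFR 404.1615; 20 CFR 404.906)."""
    if model == "traditional":
        # The examiner always decides in conjunction with a consultant.
        return True
    # SDM model: the examiner may approve or deny alone, except that
    # denials for mental impairments must be reviewed by a
    # psychological consultant.
    return decision == "deny" and mental_impairment
```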
2 Types of evidence may include (1) objective medical evidence—i.e., medical signs and
laboratory findings, (2) medical history and treatment records, (3) medical source opinions and
statements, (4) statements from claimant or others, and (5) information from other sources—
e.g., educational personnel, social welfare agency personnel (SSA, 2012b).
applicants, and 7.0 percent of SSI adult applicants were denied at this step
(see Figure 2-2) (SSA, 2014h). If the applicant is found to have a severe
impairment, the disability evaluation moves to the next step.
3 For mental disorders, functional limitations are used to assess the severity of the impairment. Paragraph B and C criteria in the Listing of Impairments for mental disorders describe the areas of function that are considered necessary for work (SSA, 2009).
and allowed applicants across all stages. Applications for SSDI and SSI adult benefits may be
initially denied at any point along the five-step determination process. Applications may be
allowed only at Steps 3 and 5.
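The sequential logic described here — possible denial at every step, allowance only at Steps 3 and 5 — can be sketched as follows (an illustrative outline of the flow shown in Figure 2-2, not SSA's actual adjudication logic; the step and field names are ours):

```python
def evaluate_claim(answers: dict) -> str:
    """Walk the five-step sequential disability evaluation.
    `answers` maps step names to booleans supplied by the caller."""
    if answers["engaged_in_sga"]:          # Step 1: substantial gainful activity
        return "denied"
    if not answers["severe_impairment"]:   # Step 2: severe impairment?
        return "denied"
    if answers["meets_listing"]:           # Step 3: meets/equals a listing
        return "allowed"
    if answers["capacity_for_past_work"]:  # Step 4: capacity for past work?
        return "denied"
    if answers["capacity_for_any_work"]:   # Step 5: capacity for any work?
        return "denied"
    return "allowed"                       # allowed at Step 5
```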
denied benefits during this initial evaluation process may be eligible for
appeal. As such, the allowance rates from this initial evaluation stage are
lower than the final allowance rates for all applicants.
served, apart from [self-reported symptoms]. Signs must be shown by medically acceptable
clinical diagnostic techniques. Psychiatric signs are medically demonstrable phenomena that
indicate specific psychological abnormalities, e.g., abnormalities of behavior, mood, thought,
memory, orientation, development, or perception. They must also be shown by observable
facts that can be medically described and evaluated” (20 CFR § 404.1528).
8 “Laboratory findings are anatomical, physiological, or psychological phenomena which can be shown by the use of medically acceptable laboratory diagnostic techniques. Some of these diagnostic techniques include chemical tests, electrophysiological studies (electrocardiogram, electroencephalogram, etc.), roentgenological studies (X-rays), and psychological tests” (20 CFR § 404.1528).
Appeals Process

If the DDS denies an application, the applicant can appeal the decision in turn to (1) the DDS (reconsideration), (2) an administrative law judge (ALJ), (3) the Appeals Council, and (4) a federal court.9 Data on the number of applicants who appeal their decision at each stage are available from SSA. Because it takes time for denied applicants to move through the various stages of the appeal process, data are available through 2010. The data show that approximately 55 percent of those who applied for SSDI or concurrent worker benefits in 2010 and were denied during the initial evaluation appealed the decision (calculation based on data from the 2013 Annual Statistical Report on the SSDI program, Tables 61 and 62 [SSA, 2014b]).10 The rates of appeal were slightly lower for denied SSI applicants. Approximately 45 percent of 2010 SSI adult applicants and 30 percent of 2010 SSI child applicants who were rejected in the initial determination process appealed their decisions (calculations based on data from the 2013 Annual Statistical Report on the SSI program, Tables 70 and 71 [SSA, 2014k]).

9 A 10-state pilot program begun in 1999 permits a claimant to bypass reconsideration by DDS and submit the appeal directly to an ALJ.
The first level of appeal, which takes place within the DDS, is a reconsideration of the original claim or, for SSI, a review of an initial determination. Reconsideration involves a complete review of the initial claim by an examiner and, where applicable, a medical consultant who did not participate in the original evaluation. DDSs are reported to approve about 5 percent of reconsideration claims (Morton, 2014).
If the reconsideration is denied, the next level of appeal is a hearing before an ALJ. ALJs are employed by SSA and, on appeal, review the evidence in an applicant’s file, including any new evidence submitted by the applicant. The ALJ also may interview the applicant and any witnesses brought by the applicant, as well as relevant medical or psychological consultants, other health care providers, or vocational experts. The applicant or a representative also may question any of the other witnesses. After considering all of the evidence and testimony, the ALJ issues a written decision (SSA, n.d.-i). If the ALJ finds that additional evidence is needed, he or she may order a CE or otherwise seek further development of the case file (SSA, 2012f). Reportedly about 67 percent of the claims reviewed by ALJs overall are approved, although the approval rate varies among ALJs and can be much higher (Morton, 2014; SSA, 2015).
Claims that are denied at the ALJ level may be brought to the Appeals Council, which serves as the final level of appeal within SSA. The Appeals Council considers each case brought to it and either denies the request for review, if it agrees with the ALJ’s decision; sends it for review by another ALJ, if it finds a technical or procedural error with the ALJ’s decision; or decides the case itself and grants benefits to the applicant (Laurence, 2015; SSA, n.d.-h). About 16 percent of requests for review are returned for re-review by an ALJ. In fiscal year 2014, the Appeals Council received more than 155,000 new requests for review. The council processed more than 162,280 requests that year. The processing time averaged 374 days.11
12 In 2010, there were still applications pending final approval. Allowance rates for earlier
years with smaller numbers of pending decisions were slightly higher than those referenced
here for 2010.
[Figure 2-5a: Child allowance rate by state, 2013 — percent of determinations resulting in an allowance (U.S. state map). Quantiles: First 36.35-53.49; Second 32.64-36.35; Third 29.70-32.64; Fourth 23.66-29.70.]
[Figure 2-5b (U.S. state map). Quantiles: First 51.58-77.41; Second 43.20-51.58; Third 36.47-43.20; Fourth 26.22-36.47.]
13 A long literature has documented the relationship between local labor market conditions,
generally measured by the unemployment rate, and applications and awards for disability
benefits. In general the results show that poor economic conditions/higher unemployment rates
are associated with increased applications and awards for benefits (Autor and Duggan, 2003;
Black et al., 2002; Burkhauser et al., 2002; Duggan and Imberman, 2008; Kreider, 1999; Rupp
and Stapleton, 1995). Research on allowance rates and economic conditions (Rupp, 2012;
Rupp and Stapleton, 1995; Strand, 2002) generally finds a negative relationship suggesting
that SSA is able to screen out some marginally qualified candidates who might apply for the
program in response to poor economic conditions.
NOTES: A total of 12 regressions were estimated: 3 models for each of the 4 program groups. For each program group, independent variables were included in a sequential manner. The first model included only state fixed effects. The second model added year fixed effects. The third model added the time-varying variables. The results in this table reflect state-level OLS regression models. Totals may not sum to 100 because of rounding.
a The first row contains the R² from the first model for each program group. The subsequent two rows reflect the marginal increase in the R² arising from adding the given group of independent variables to the model. The total of the first three rows represents the R² for the third model that included all three groups of variables.
b The unexplained variation was calculated by subtracting the R² for the third model that
[FIGURE 2-6 Composition of new beneficiaries in 2013 for SSDI and SSI adults and children. Panel (a): Musculoskeletal 36%, Other 28%, Other Mental 16%, Circulatory System 11%, Intellectual Disability 1%. Panel (b): Other 36%, Other Mental 27%, Musculoskeletal 26%, Nervous Systems and Sense Organs 7%. Panel (c): Other Mental 58%, Other 24%, Intellectual Disability 7%. SOURCES: SSA, 2014b,k.]
14 Under a notice of proposed rulemaking, SSA has proposed revised Paragraph B criteria to capture “the mental abilities an adult uses to function in a work setting” (SSA, 2010, p. 51340). The revised B criteria are the abilities to “understand, remember, and apply information”; “interact with others”; “concentrate, persist, and maintain pace”; and “manage oneself.”
15 Under the same notice of proposed rulemaking (SSA, 2010), SSA has proposed revised listing categories.
16 Somatoform disorders are discussed separately in the following section.
17 The structure of the listings for intellectual disability and for substance addiction disorders differs from that of the other mental disorder listings. There are four sets of criteria (Paragraphs A through D) for the intellectual disability listing, and the listing for substance addiction disorders refers to which of the other listings should be used to evaluate the various physical or behavioral changes related to the disorder.
including any impairments [the child has] that are not ‘severe’ (see §
416.924(c))” (20 CFR § 416.926a). When assessing a child’s functional
limitations, it considers “how appropriately, effectively, and independently
[the child] performs … activities compared to the performance of other chil-
dren [the same] age who do not have impairments” (20 CFR § 416.926a).
Documentation

As previously described, the DDS uses all relevant evidence in an applicant’s file in making a disability determination. The medical evidence in an applicant’s file must be sufficiently complete and detailed to allow the DDS to make a determination. Medical evidence includes a history of the individual’s mental impairment, the results of any mental status examinations and psychological tests, and the records of any treatments and hospitalizations provided by an “acceptable medical source” (SSA, 2014f, n.d.-e).

Although a full mental status exam, performed during a clinical interview, can be tailored to target the specific areas most relevant to the alleged impairment, a comprehensive exam generally would include “a narrative description of [the individual’s] appearance, behavior, and speech; thought process (e.g., loosening of associations); thought content (e.g., delusions); perceptual abnormalities (e.g., hallucinations); mood and affect; sensorium and cognition (orientation, recall, concentration, intelligence); and judgment and insight” (SSA, n.d.-e, section D4).
Psychological Testing

SSA understands “standardized psychological tests” to be psychological test measures that have “appropriate validity, reliability, and norms” representative of relevant populations (SSA, n.d.-e, section D5). SSA characterizes a “good test” as one that is valid (“measures what it is supposed to measure”) and reliable (use of the same test in the same individual yields consistent results over time) and has “appropriate normative data” and a “wide scope of measurement” (measures a broad range of elements of the domain being assessed) (SSA, n.d.-e, section D5).
SSA specifies that the tests would be administered, scored, and interpreted by a “qualified” specialist—meaning someone “currently licensed or certified in the state to administer, score, and interpret psychological tests” with the “training and experience to perform the test” (SSA, n.d.-e, section D5). The types of specialists who are qualified to administer, score, and interpret standardized psychological tests are discussed in Chapters 3, 4, and 5. Observations by the test administrator—such as the applicant’s ability to concentrate, interact appropriately with the test administrator, and perform independently—would supplement the report of test results. The report would also address
SSA policy states that applicants may not be found disabled solely on the
basis of self-reported statements about pain or other symptoms (Social
Security Act § 223(d)(5)(A), § 1614(a)(3)(D); 20 CFR 404.1508, 404.1529,
416.908, 416.929; SSA, 1996b, 2014g).
In cases where an individual’s self-reported symptoms, including pain, suggest a greater degree of impairment than expected based on the objective medical evidence alone, other corroborative information from treating and nontreating medical sources and other sources is considered. Such information may include information about the individual’s

daily activities; the location, duration, frequency, and intensity of [the] pain or other symptoms; precipitating and aggravating factors; the type, dosage, effectiveness, and side effects of any medication … taken to alleviate [the] pain or other symptoms; treatment, other than medication …; any measures … used to relieve [the] pain or other symptoms …; and other factors concerning [the individual’s] functional limitations and restrictions due to pain or other symptoms. (20 CFR 404, Subpart P, § 404.1529; 20 CFR 416, Subpart I, § 416.929)
SSA has issued guidance on its policy for evaluating claims involving chronic fatigue syndrome (CFS) (SSA, 2014g). This guidance explains how SSA determines the presence of a medically determinable impairment in an individual with CFS, including some of the possible medical signs and laboratory findings that would help to support such a finding. SSA then assesses whether the medically determinable impairment could reasonably be expected to produce the reported symptoms. In cases where objective medical evidence does not substantiate the person’s statements, SSA considers the same types of evidence described for pain and other symptoms. SSA will also make a finding about the credibility of the person’s statements as described in the following section.
SSA requires the examiner to articulate specific reasons for the credibility finding based on the medical and other evidence in the case record. It is important to note both that a credibility finding need not reflect complete acceptance or rejection of the individual’s statements (i.e., the statements may be found to be partially credible) and that credibility concerns alone do not rule out the presence of disability (SSA, 1996c).
19 Such tests include the following: Rey-15 Item Memory Test (Rey-II), Miller Forensic
On the other hand, SSA acknowledges that validity test results can “provide evidence suggestive of poor effort or intentional symptom manipulation” and states that it will consider validity test results that are already in an applicant’s file, along with all other relevant evidence. In fact, the statement that no one test “conclusively determines the presence of inaccurate patient self-report” seems to run counter to SSA’s dedication to obtaining as much evidence as possible and taking account of all the information when making a disability determination. It is important to divorce the concept of “malingering” from that of validity testing. As introduced in the following section, and made clear later in this chapter and elsewhere in the report and appendixes, validity test results can speak to performance (on performance-based tasks) and to the consistency and accuracy of responses on self-report measures. However, they provide limited information about intentionality and none about motive. It is important, therefore, not to discount the potential usefulness of validity test results on the grounds
21 Respondents were asked the extent to which each of the following supported such an assessment in their cases: “below empirical cut-off on forced-choice tests”; “below chance on forced-choice tests”; “below empirical cut-off on other malingering tests”; “pattern of cognitive test performance does not make neuropsychological sense (inconsistent with condition)”; “severity of cognitive impairment inconsistent with condition”; “implausible changes in test scores across repeated examinations”; “above validity scale cut-offs on objective personality tests”; “discrepancies among records, self-report, and observed behavior”; and “implausible self-reported symptoms in interview” (Mittenberg et al., 2002, p. 1102).
22 The adjusted value is corrected to remove significant variation due to referral source.
23 The information and data in this sentence have been revised from that provided in the prepublication version of the report.
24 To the committee’s knowledge, the “DDS Malingering Rating Scale” has never been used
Malingering Rating Scale, and 20.5 to 30.4 percent of adults and 15.4 to 32.5 percent of children scored below chance (Chafetz et al., 2007, p. 10).

In a subsequent paper that draws on the research reported in Chafetz and colleagues (2007), Chafetz reports 67.8 percent of adults who were administered both the TOMM and the DDS Malingering Rating Scale failed at least one, 45.8 percent failed both, and 36.5 percent scored at or below chance. For adults who were administered both the MSVT and the rating scale, 68.4 percent failed at least one, 59.7 percent failed both, and 47.4 percent scored at or below chance on at least one of the SVT subtests. Sixty percent of children who were administered the TOMM and the rating scale failed at least one and 26.3 percent scored at or below chance. Of children who were administered the MSVT and the rating scale, 48 percent failed at least one, and 20 percent scored at or below chance on at least one of the SVT subtests (Chafetz, 2008).
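The recurring phrase "at or below chance" has a precise statistical meaning on forced-choice tests: a respondent with no knowledge of the correct answers would average about 50 percent on a two-alternative test, so substantially lower scores are improbable without some avoidance of correct answers. A sketch of the underlying cumulative binomial calculation (the 50-item example is hypothetical, not the actual TOMM or MSVT item count):

```python
from math import comb

def p_at_or_below(score: int, n_items: int, p_chance: float = 0.5) -> float:
    """Probability of scoring at or below `score` on an n_items-item
    forced-choice test if the respondent were guessing at random
    (cumulative binomial distribution)."""
    return sum(comb(n_items, k) * p_chance**k * (1 - p_chance)**(n_items - k)
               for k in range(score + 1))
```

For example, `p_at_or_below(18, 50)` shows that 18 or fewer correct out of 50 two-choice items would arise from pure guessing only a few percent of the time, which is why markedly below-chance scores are treated as meaningful.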
In the context of SSA disability evaluations, it is important to note that even if an applicant performs below his or her capability on cognitive tests or inconsistently reports symptoms, neither scenario means the individual is not disabled. However, both scenarios suggest the need for additional assessment of the alleged impairment with the goal of making an accurate determination of disability. Doing so first requires identification of the individuals for whom additional assessment may improve the accuracy of the disability determination. As described in the section on assessing credibility, when a disability claim is based primarily on an applicant’s self-report of symptoms and statements about their intensity, persistence, and limiting effects, SSA relies on an assessment of the consistency of the self-report with all of the evidence in the claimant’s medical evidence record. As discussed, SSA policy currently precludes the purchase of validity tests by SSA (e.g., as part of a psychological CE). One question is whether the results of this type of standardized test could contribute to the evidence available for assessment. The following section discusses the potential value of adding standardized data collection and interpretation to clinical data collection and evaluation.
Defining Terms
Data collection Medical professionals often evaluate patients using a combination of what Wedding and Faust call clinical and mechanical data (Wedding and Faust, 1989). Clinical data collection includes all testing and examining that is variable depending on how the clinician performs the exam and/or on which aspects of the exam the clinician chooses to perform. For example, clinicians may interview patients to elicit their description of the symptoms of their illness; alternatively, clinicians may perform a physical exam. By contrast, mechanical data collection involves the use of standardized testing where the data collection is structured and the method typically does not vary from patient to patient. For example, if clinicians order a serum sodium level or MMPI tests on their patients, they are collecting mechanical data.
It should be noted that mechanical data collection is not completely divorced from clinical expertise. For example, clinicians may need to determine which mechanical data are relevant to collect in a given patient, making a judgment about whose diagnosis will be aided by a serum sodium level or an MMPI. In addition, the administration of mechanical tests can be affected by clinical skill. For example, a clinician who draws a patient’s blood above an IV site will get a false sodium level. Similarly, a clinician who administers an MMPI test after the patient has been exhausted by previous examinations may also be collecting the data in a way that will reduce the value and accuracy of the test results.
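In code terms, the distinction is between a fixed procedure applied identically to every patient and a procedure whose content the clinician chooses case by case. A schematic sketch (the battery contents and field names are invented purely for illustration):

```python
# Mechanical: a fixed battery, identical for every patient.
MECHANICAL_BATTERY = ["serum_sodium", "mmpi_profile"]

def collect_mechanical(patient: dict) -> dict:
    """Structured collection: every patient gets the same items,
    gathered and scored the same way."""
    return {test: patient.get(test) for test in MECHANICAL_BATTERY}

def collect_clinical(patient: dict, chosen_items: list) -> dict:
    """Variable collection: content depends on what this particular
    clinician elects to ask or examine."""
    return {item: patient.get(item) for item in chosen_items}
```

The point of the sketch is that `collect_mechanical` has no per-clinician argument at all, whereas `collect_clinical` cannot be called without one.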
Military25

There are significant differences between the policies and procedures followed by SSA and the military. In contrast to disability evaluations for SSA and the Veterans Benefits Administration (VBA), discussed in the following section, military assessments for mental and behavioral health are performed to assess combat or duty readiness. Assessing whether an individual is capable of performing his or her duty may be an issue of safety not only for the individual but also for others.

Fitness for duty and return-to-duty determinations are made by medical evaluation boards and physical evaluation boards. Mental health providers serve as consultants to the boards, providing them with reports of diagnostic impressions, assessment of degree of impairment and impact on military duty performance, prognosis, and recommendations. In contrast to SSA and the VBA, evaluations in the military are often performed by therapists and care professionals who are not “interrogators” but are considered

25 Much of the information in this section is drawn from the presentation to the committee by Robert Seegmiller (2014).
26 Much of the information in this section is drawn from the presentation to the committee
by Stacey Pollack (2014).
27 The information in this section is drawn from the presentation to the committee by Thomas McLaren (2014).
28 This is consistent with the findings of the SSA Office of the Inspector General, which reports on the practices of three private disability insurance providers, all of which allow the purchase and use of the results of validity tests in their disability claims processes. All three companies also indicated that validity test results are just one piece of data they consider when evaluating claims (Office of the Inspector General, 2013). The names of the companies are not released in the report.
International Community

Canada

The Canada Pension Plan (CPP) provides disability benefits to eligible individuals using much the same criteria in its disability determination process as SSA does (Government of Canada, 2014). As in the United States, there are a number of different settings in which disability determinations are made. Settings in addition to the CPP include the Worker Safety Insurance Board, Veterans Affairs Canada, and the auto insurance industry. Psychologists and neuropsychologists do not work under the Canadian national health care system. As a result, they work in a number of other settings, such as auto insurance.

Brian Levitt (2014) presented to the committee on the use of psychological testing under private auto insurance in the province of Ontario as well as tort law in Ontario. In this setting as well, the decision of whether to administer psychological tests and, if so, which particular test to use is determined by the individual psychologists according to the practice standards in that area of inquiry. The Canadian Academy of Psychologists in Disability Assessment standards related to psychological testing include the following:
These standards are consistent with the message that the use of validity
tests is important, but their results constitute only one piece of data, which
must be interpreted in the context of all the other available information.
Europe
Merten and colleagues (2013) have reported that large-scale research
on and use of SVTs and PVTs in Europe followed that in the United States
by about a decade, beginning in earnest in the early 2000s. As in the
United States, the setting or context (forensic, clinical, etc.) seems to matter
(Dandachi-FitzGerald et al., 2013; McCarter et al., 2009; Merten et al.,
2013). It is important to note that in the study by Dandachi-FitzGerald
and colleagues (2013) the definition of SVT was left to the respondent.
Everything from discrepancies between records and observed behavior to
more “objective” scales on personality and effort tests was included, making
it very difficult to interpret the findings regarding the percentage of
medical professionals using SVTs when contracted to assess work capacity
due to claims of psychological disability. There also appear to be differences
in SVT and PVT use across European countries, with practitioners in the
Netherlands and Norway reporting the greatest use of such tests (Merten
et al., 2013).
Closing Comments
SSA, the U.S. military, the VBA, private disability insurance providers,
and forensic evaluators in civil and criminal judicial contexts have different
goals, needs, and approaches to the evaluation and determination of dis-
ability (see Table 2-3). All share common elements, including identification
of the presence of impairment and evaluation of its effect on the individual’s
ability to function.
Although the use of psychological testing must be understood in the
context of each system’s goals, each of the systems encourages a compre-
hensive evaluation, as determined by the evaluator, in an effort to answer
these questions and each permits a broad range of evaluations. Whether
to order psychological tests and the selection of which tests to administer
are left to the discretion of the professional performing the evaluation
or examination. With the exception of SSA, all of the systems permit, or
in some cases require, the use of validity testing to provide information
about the validity of the results of other psychological tests being admin-
istered. Nevertheless, all agree that although validity tests yield important
information, the results of such tests are only one piece of data that needs
to be assessed and interpreted in the context of all the other information
available.
TABLE 2-3

Setting: SSA
Who performs the assessments: DDS disability examiners; consultative examiner psychologists
What assessments are employed: Medical record review; clinical interview; behavioral observations
Psychological or neuropsychological tests: Primarily intelligence tests; other standardized tests as determined by the consultative examiner and paid for by state DDS agencies
Policy on psychological tests: Intelligence tests for intellectual disability claims; other tests at the discretion of the DDS and the consultative examiner; disallows purchase of SVTs/PVTs

Setting: VA
Who performs the assessments: Psychiatrists; psychologists; under supervision, residents
What assessments are employed: Clinical files; IDES; lab studies/tests; functional evaluations
Psychological or neuropsychological tests: Any relevant, scientifically valid tests (as determined by …)
Policy on psychological tests: None specifically required or prohibited; SVTs/PVTs are neither …
Concerns/conflicts: Diagnostic listings are limited; inconsistency in the use of tests; not all VA medical centers use the same …

Setting: Private
Who performs the assessments: Disability evaluators: neuropsychologists, psychologists, psychiatrists, social workers
What assessments are employed: Clinical files or records^a
Psychological or neuropsychological tests: Any relevant, scientifically valid tests
Policy on psychological tests: Evaluator determines necessary testing; PVTs/SVTs required
Concerns/conflicts: Industry has additional resources; each company makes its own policy

Setting: Forensic, civil and criminal
Who performs the assessments: Mental health professionals hired by defense or prosecution: psychologists, psychiatrists, social workers
Concerns/conflicts: Hired by defense or prosecution to support a position favorable to that side
NOTE: DDS = Disability Determination Services; IDES = Independent Disability Examination System; NP = nurse practitioner; PA = physician as-
sistant; PTSD = posttraumatic stress disorder; PVT = performance validity test; SVT = symptom validity test; TBI = traumatic brain injury.
a Some require standard tests, such as the AMA Guide (see, for example, Rondinelli, 2008).
FINDINGS
• There currently is great variability in allowance rates for both SSI
and SSDI among states that is not fully accounted for by differences
in the populations of applicants. There also is great variability in
the disability determination appeal rulings among ALJs within and
across states.
• Each state DDS agency, within the confines of SSA policy, issues
its own rules regarding the tests that may be purchased as part of
a CE. For this reason, states vary in when and which standardized
psychological tests can be purchased, with the exception of PVTs
and SVTs, whose purchase SSA precludes.
• There currently are no data on the rates of false positives and false
negatives in SSA disability determinations.
• Identification and documentation of the presence and severity of
medically determinable mental impairments at Step 2 of SSA’s
disability determination process could be informed by results of
standardized psychological tests.
• Identification and assessment of the severity of work-related func-
tional impairment relevant to disability evaluations at the listing
level (Step 3) and to mental residual functional capacity (Steps 4
and 5) are other points in SSA’s disability determination process
that could be informed by results of standardized psychological
tests.
• Consultative examinations may be ordered by DDS examin-
ers or ALJs to supplement evidence in a claimant’s case record.
Psychological tests could be administered as part of a CE.
• In some cases, SSA disability examiners must evaluate the credibil-
ity of statements by individuals about the intensity and persistence
of their symptoms and the effect on the individual’s ability to func-
tion and perform work-related activities.
• Current data on the prevalence of inconsistent reporting of symp-
toms or performing below one’s capability on cognitive tests among
SSDI and SSI applicant populations are limited.
• Current SSA policy precludes the purchase of (validity) tests—
e.g., MMPI-2 and TOMM—to help inform determinations about
the credibility of an individual’s statements or about possible
malingering.
• There is inconsistency among SSA’s statements on validity testing:
o Results can “provide evidence suggestive of poor effort or inten-
tional symptom manipulation.”
REFERENCES
Ægisdóttir, S., M. J. White, P. M. Spengler, A. S. Maugherman, L. A. Anderson, R. S. Cook,
C. N. Nichols, G. K. Lampropoulos, B. S. Walker, G. Cohen, and J. D. Rush. 2006. The
meta-analysis of clinical judgment project: Fifty-six years of accumulated research on
clinical versus statistical prediction. The Counseling Psychologist 34(3):341-382.
APA (American Psychological Association). 2015. Guidelines and principles for accreditation of
programs in professional psychology: Quick reference guide to doctoral programs. http://
www.apa.org/ed/accreditation/about/policies/doctoral.aspx (accessed January 20, 2015).
Autor, D. H., and M. G. Duggan. 2003. The rise in the disability rolls and the decline in
unemployment. Quarterly Journal of Economics 118(1):157-205.
Black, D., K. Daniel, and S. Sanders. 2002. The impact of economic conditions on participa-
tion in disability programs: Evidence from the coal boom and bust. American Economic
Review 92(1):27-50.
Burkhauser, R., J. S. Butler, and R. Weathers II. 2002. How policy variables influence the
timing of applications for Social Security Disability Insurance. Social Security Bulletin
64(1):52-83.
Bush, S. S., R. M. Ruff, A. I. Tröster, J. T. Barth, S. P. Koffler, N. H. Pliskin, C. R. Reynolds,
and C. H. Silver. 2005. Symptom validity assessment: Practice issues and medical ne-
cessity. NAN Policy & Planning Committee. Archives of Clinical Neuropsychology
20(4):419-426.
Bush, S. S., R. L. Heilbronner, and R. M. Ruff. 2014. Psychological assessment of symp-
tom and performance validity, response bias, and malingering: Official position of the
Association for Scientific Advancement in Psychological Injury and Law. Psychological
Injury and Law 7(3):197-205.
Price, J. H. 2014. Disability Determination Services panel discussion with the commit-
tee. Presentation to the IOM Committee on Psychological Testing, Including Validity
Testing, for Social Security Administration Disability Determinations, August 11, 2014,
Washington, DC.
Rondinelli, R. D., ed. 2008. AMA guides to the evaluation of permanent impairment, sixth
edition. Chicago, IL: American Medical Association.
Rupp, K. 2012. Factors affecting initial disability allowance rates for the Disability Insurance and
Supplemental Security Income programs: The role of the demographic and diagnostic
composition of applicants and local labor market conditions. Social Security Bulletin
72(4):11-35. http://ssrn.com/abstract=2172488 (accessed February 4, 2015).
Rupp, K., and D. Stapleton. 1995. Determinants of the growth in the Social Security
Administration’s disability programs—An overview. Social Security Bulletin 58(4):43-70.
Salzinger, K. 2005. Clinical, statistical, and broken-leg predictions. Behavior and Philosophy
33:91-99.
Samuel, R. Z., and W. Mittenberg. 2005. Determination of malingering in disability evalua-
tions. Primary Psychiatry 12(12):60-68.
Scheinkman, J. A., and W. Xiong. 2003. Overconfidence and speculative bubbles. Journal of
Political Economy 111(6):1183-1220.
Seegmiller, R. 2014. Use of psychological tests, including PVTs and SVTs, in select popula-
tions: The U.S. military. Presentation to the IOM Committee on Psychological Testing,
Including Validity Testing, for Social Security Administration Disability Determinations,
June 25, 2014, Washington, DC.
Soss, J., and L. R. Keiser. 2006. The political roots of disability claims: How state environ-
ments and policies shape citizen demands. Political Research Quarterly 59(1):133-148.
SSA (Social Security Administration). 1996a. SSR 96-3p: Policy interpretation ruling. Titles II
and XVI: Considering allegations of pain and other symptoms in determining whether a
medically determinable impairment is severe. http://www.socialsecurity.gov/OP_Home/
rulings/di/01/SSR96-03-di-01.html (accessed August 20, 2014).
SSA. 1996b. SSR 96-4p: Policy interpretation ruling. Titles II and XVI: Symptoms, medi-
cally determinable physical and mental impairments, and exertional and nonexertional
limitations. http://www.socialsecurity.gov/OP_Home/rulings/di/01/SSR96-04-di-01.html
(accessed October 3, 2014).
SSA. 1996c. SSR 96-7p: Policy interpretation ruling Titles II and XVI: Evaluation of symp-
toms in disability claims: Assessing the credibility of an individual’s statements. http://
www.socialsecurity.gov/OP_Home/rulings/di/01/SSR96-07-di-01.html (accessed October
3, 2014).
SSA. 2008. National Q&A, 08-003 rev 2, do tests of malingering have any value for SSA
evaluations? Washington, DC: SSA.
SSA. 2009. DI 22511.005 Documenting the impact of a medically determinable mental impair-
ment on an individual’s ability to work. Program Operations Manual System (POMS).
https://secure.ssa.gov/apps10/poms.nsf/lnx/0422511005 (accessed January 30, 2015).
SSA. 2010. Revised medical criteria for evaluating mental disorders. Federal Register
75(160):51336-51368.
SSA. 2012a. DI 00115.001 Social Security Administration’s (SSA’s) disability programs.
Program Operations Manual System (POMS). https://secure.ssa.gov/poms.nsf/
lnx/0400115001 (accessed August 20, 2014).
SSA. 2012b. DI 22501.001 Disability case development for medical and other evidence.
Program Operations Manual System (POMS). https://secure.ssa.gov/poms.nsf/
lnx/0422501001 (accessed October 3, 2014).
SSA. 2012c. DI 22510.048 Pediatric consultative examination (CE) report content guide-
lines—Mental disorders. Program Operations Manual System (POMS). https://secure.
ssa.gov/poms.nsf/lnx/0422510048 (accessed October 3, 2014).
SSA. 2012d. DI 22511.007 Sources of evidence. Program Operations Manual System (POMS).
https://secure.ssa.gov/apps10/poms.nsf/lnx/0422511007 (accessed December 30, 2014).
SSA. 2012e. Disability Determination Services administrative letter no. 866: Consultative
examinations malingering & credibility tests—Information. Washington, DC: SSA.
SSA. 2012f. Social Security testimony before Congress. Statement of Michael I. Astrue,
Commissioner, Social Security Administration before the Committee on Ways and
Means Subcommittee on Social Security, June 27, 2012. http://www.ssa.gov/legislation/
testimony_062712.html (accessed October 20, 2014).
SSA. 2013. DI 22510.006 When not to purchase a consultative examination (CE). Program
Operations Manual System (POMS). https://secure.ssa.gov/poms.nsf/lnx/0422510006
(accessed October 3, 2014).
SSA. 2014a. Annual report of the Supplemental Security Income program. Baltimore, MD: SSA.
SSA. 2014b. Annual statistical report on the Social Security Disability Insurance program,
2013. Washington, DC: SSA. http://www.ssa.gov/policy/docs/statcomps/di_asr (accessed
February 24, 2015).
SSA. 2014c. DDS performance management report. Disability claims data. Consultative
examination rates, fiscal year 2013. Data prepared by ORDP, ODP, and ODPMI. Data
submitted to the IOM Committee on Psychological Testing, Including Validity Testing,
for Social Security Administration Disability Determinations by Joanna Firmin, Social
Security Administration, on October 8, 2014.
SSA. 2014d. Disability claims data (initial, reconsideration, continuing disability review) by
adjudicative level and body system. SSDI, SSI, Concurrent, and Total Claims. Data pre-
pared by ORDP, ODP, and ODPMI. Submitted to the IOM Committee on Psychological
Testing, Including Validity Testing, for Social Security Administration Disability
Determinations by Joanna Firmin, Social Security Administration, on October 8, 2014.
SSA. 2014e. DI 22510.021 Consultative examination (CE) report content guidelines: Mental
disorders. Program Operations Manual System (POMS). https://secure.ssa.gov/poms.nsf/
lnx/0422510021 (accessed October 3, 2014).
SSA. 2014f. DI 24515.008 Titles II and XVI: Considering opinions and other evidence from
sources who are not “acceptable medical sources” in disability claims; considering
decisions on disability by other governmental and nongovernmental agencies (SSR 06-
03p). Program Operations Manual System (POMS). https://secure.ssa.gov/poms.nsf/
lnx/0424515008 (accessed February 24, 2015).
SSA. 2014g. DI 24515.075 Evaluating claims involving Chronic Fatigue Syndrome (CFS).
Program Operations Manual System (POMS). https://secure.ssa.gov/poms.nsf/lnx/
0424515075 (accessed December 16, 2014).
SSA. 2014h. National data: Title II—SSDI, Title XVI—SSI, & concurrent Title II/XVI initial
disability determinations. By regulation basis code for adults and children (reason for de-
cision), fiscal year 2013. Data submitted to the IOM Committee on Psychological Testing,
Including Validity Testing, for Social Security Administration Disability Determinations
by Joanna Firmin, Social Security Administration, on October 23, 2014.
SSA. 2014i. Open government initiative. Data on combined Title II disability and Title XVI
blind/disabled average processing time (in days) (excludes technical denials). http://www
.ssa.gov/open/data/Combined-Disability-Processing-Time.html (accessed December 16,
2014).
SSA. 2014j. SSDI awards by diagnostic group and age of awardee under the age of 65, 2013
(preliminary data). Data submitted to the IOM Committee on Psychological Testing,
Including Validity Testing, for Social Security Administration Disability Determinations
by Joanna Firmin, Social Security Administration, on October 21, 2014.
SSA. 2014k. SSI annual statistical report, 2013. Washington, DC: SSA. http://www.ssa.gov/
policy/docs/statcomps/ssi_asr (accessed February 24, 2015).
SSA. 2014l. SSI awards by diagnostic group and age of awardee under the age of 65, 2013.
Data prepared by ORDP, ODP, and ODPMI. Data submitted to the IOM Committee
on Psychological Testing, Including Validity Testing, for Social Security Administration
Disability Determinations by Joanna Firmin, Social Security Administration, on
October 21, 2014.
SSA. 2014m. Substantial gainful activity. http://www.socialsecurity.gov/oact/cola/sga.html
(accessed December 15, 2014).
SSA. 2015. Hearings and appeals. ALJ disposition data, fiscal year 2015 (for reporting pur-
poses: 09/2/2014 through 01/20/2015). http://www.ssa.gov/appeals/DataSets/03_ALJ_
Disposition_Data.html (accessed February 27, 2015).
SSA. n.d.-a. Disability evaluation under Social Security—Part II: Evidentiary require-
ments. http://www.ssa.gov/disability/professionals/bluebook/evidentiary.htm (accessed
September 4, 2014).
SSA. n.d.-b. Disability evaluation under Social Security—Part III: Listing of impairments.
http://www.ssa.gov/disability/professionals/bluebook/listing-impairments.htm (accessed
October 3, 2014).
SSA. n.d.-c. Disability evaluation under Social Security—Part III: Listing of impairments—
Adult listings (Part A). http://www.ssa.gov/disability/professionals/bluebook/12.00-Men-
talDisorders-Adult.htm (accessed October 3, 2014).
SSA. n.d.-d. Disability evaluation under Social Security—Part III Listing of impairments—
Childhood listings (Part B). http://www.ssa.gov/disability/professionals/bluebook/
ChildhoodListings.htm (accessed October 7, 2014).
SSA. n.d.-e. Disability evaluation under Social Security—Part III: Listing of impairments—
Adult listings (Part A)—section 12.00 mental disorders. http://www.ssa.gov/disability/
professionals/bluebook/12.00-MentalDisorders-Adult.htm (accessed November 14,
2014).
SSA. n.d.-f. Disability evaluation under Social Security—Part III: Listing of impairments—
Childhood listings (Part B)—section 112.00 mental disorders. http://www.ssa.gov/disabil-
ity/professionals/bluebook/112.00-MentalDisorders-Childhood.htm (accessed October
3, 2014).
SSA. n.d.-g. Hearings and appeals. Federal court review process. http://www.socialsecurity.
gov/appeals/court_process.html#a0=1 (accessed October 7, 2014).
SSA. n.d.-h. Hearings and appeals. Information about requesting review of an administra-
tive law judge’s hearing decision. http://www.socialsecurity.gov/appeals/appeals_process.
html#a0=2 (accessed October 7, 2014).
SSA. n.d.-i. Hearings and appeals. What you need to know to request a hearing before
an administrative law judge. http://www.socialsecurity.gov/appeals/hearing_process.
html#a0=4&sb=3 (accessed October 7, 2014).
SSA. n.d.-j. How we decide if you are disabled. Information we need about your work and
education. http://www.ssa.gov/disability/step4and5.htm (accessed October 7, 2014).
SSA. n.d.-k. Medical/professional relations. Consultative examinations: A guide for health
professionals. http://www.ssa.gov/disability/professionals/greenbook (accessed October
16, 2014).
SSA. n.d.-l. Occupational Information System project. http://www.ssa.gov/disabilityresearch/
occupational_info_systems.html (accessed December 30, 2014).
SSA. n.d.-m. Selected data from Social Security’s disability program. http://www.ssa.gov/oact/
STATS/dibStat.html (accessed January 27, 2015).
SSDRC (Social Security Disability and SSI [Social Security Insurance] Resource Center). n.d.
Applying for disability: How long does it take to get Social Security Disability or SSI ben-
efits? http://www.ssdrc.com/disabilityquestions1-46.html (accessed December 15, 2014).
Strand, A. 2002. Social Security disability programs: Assessing the variation in allowance rates.
ORES working paper series, no. 98. Washington, DC: Social Security Administration,
Division of Policy Evaluation, Office of Research, Evaluation, and Statistics. http://
socialsecurity.gov/policy/docs/workingpapers/wp98.pdf (accessed February 4, 2015).
Ward, T. A. 2014. Disability Determination Services panel discussion. Presentation to the
IOM Committee on Psychological Testing, Including Validity Testing, for Social Security
Administration Disability Determinations, August 11, 2014, Washington, DC.
Wedding, D., and D. Faust. 1989. Clinical judgement and decision making in neuropsychology.
Archives of Clinical Neuropsychology 4(3):233-256.
[Figure: Components of a psychological assessment: clinical interview, observations, record review, and standardized psychological tests. Standardized tests comprise cognitive tests (e.g., attention and vigilance, memory, and performance validity tests) and non-cognitive measures (e.g., personality).]
stimuli an individual will project his or her underlying and unconscious mo-
tivations and attitudes. The scoring of these latter measures is often more
complex than it is for structured measures.
There is great variety in cognitive tests and what they measure, thus
requiring a lengthier explanation. Cognitive tests are often separated into
tests of ability and tests of achievement; however, this distinction is not as
clear-cut as some would portray it. Both types of tests involve learning.
Both kinds of tests involve what the test-taker has learned and can do.
However, achievement tests typically involve learning from very special-
ized education and training experiences; whereas, most ability tests assess
learning that has occurred in one’s environment. Some aspects of learning
are clearly both; for example, vocabulary is learned at home, in one’s social
environment, and in school. Notably, the best predictor of intelligence test
performance is one’s vocabulary, which is why it is often given as the first
test during intelligence testing or in some cases represents the body of the
intelligence test (e.g., the Peabody Picture Vocabulary Test). Conversely,
one can also have a vocabulary test based on words one learns only in
an academic setting. Intelligence tests are so prevalent in many clinical
psychology and neuropsychology situations that we also consider them as
neuropsychological measures. Some abilities are measured using subtests
from intelligence tests; certain working memory subtests, for example, are
commonly administered on their own as well.
There are also standalone tests of many kinds of specialized abilities.
Some ability tests are divided into verbal and performance tests. Verbal
tests use language to pose questions and elicit answers. Performance tests,
on the other hand, minimize the use of language;
they can involve solving problems that do not involve language. They may
involve manipulating objects, tracing mazes, placing pictures in the proper
order, and finishing patterns, for example. This distinction is most com-
monly used in the case of intelligence tests, but can be used in other ability
tests as well. Performance tests are also sometimes used when the test-taker
lacks competence in the language of the testing. Many of these tests assess
visual spatial tasks. Historically, nonverbal measures were given as intel-
ligence tests for non-English speaking soldiers in the United States as early
as World War I. These tests continue to be used in educational and clinical
settings given their reduced language component.
Cognitive tests are also distinguished as speeded tests versus power
tests. A purely speeded test is one on which everyone could answer every
question correctly given enough time. Some tests of clerical skills are exactly
like this; they may have two lists of paired numbers, for example, where
some pairings contain two identical numbers and other pairings differ. The
test-taker simply circles the identical pairings. A pure power test, by contrast,
is one in which all test-takers have enough time to do their best; the only
factor influencing performance is how much the test-taker knows or can do.
Few tests are purely speeded or purely power tests.
Most have some combination of both. For example, a testing company
may use a rule of thumb that 90 percent of test-takers should complete 90
percent of the questions; however, it should also be clear that the purpose
of the testing affects rules of thumb such as this. Few teachers would wish
to have many students unable to complete the tests that they take in classes,
for example. When test-takers have disabilities that affect their ability to
respond to questions quickly, some measures provide extra time, depend-
ing upon their purpose and the nature of the characteristics being assessed.
Questions on both achievement and ability tests can involve either rec-
ognition or free-response in answering. In educational and intelligence tests,
recognition tests typically include multiple-choice questions where one can
look for the correct answer among the options, recognize it as correct, and
select it as the correct answer. A free-response question is analogous to a
“fill-in-the-blank” or essay question: one must recall the answer or solve
the problem without choosing from among alternative responses. This
distinction also holds for some non-cognitive tests, although there it concerns
selection among preferences rather than recognition of a correct answer. For example,
a recognition question on a non-cognitive test might ask someone whether
they would rather go ice skating or to a movie; a free recall question would
ask the respondent what they like to do for enjoyment.
Cognitive tests of various types can be considered as process or product
tests. Take, for example, mathematics tests in school. In some instances,
only getting the correct answer leads to a correct response. In other cases,
teachers may give partial credit when a student performs the proper op-
erations but does not get the correct answer. Similarly, psychologists and
clinical neuropsychologists often observe not only whether a person solves
problems correctly (i.e., product), but how the client goes about attempting
to solve the problem (i.e., process).
Test Administration
One of the most important distinctions relates to whether tests are
group administered or are individually administered by a psychologist,
physician, or technician. Tests that traditionally were group administered
were paper-and-pencil measures. Often for these measures, the test-taker
received both a test booklet and an answer sheet and was required, unless
he or she had certain disabilities, to mark his or her responses on the an-
swer sheet. In recent decades, some tests are administered using technology
(i.e., computers and other electronic media). There may be some adaptive
qualities to tests administered by computer, although not all computer-
administered tests are adaptive (technology-administered tests are further
discussed below). An individually administered measure is typically provided
to the test-taker by a psychologist, physician, or technician. More
confidence is often placed in individually administered measures, because
the trained professional administering the test can make judgments during
the testing that affect the administration, scoring, and other observations
related to the test.
Tests can be administered in an adaptive or linear fashion, whether by
computer or individual administrator. A linear test is one in which questions
are administered one after another in a pre-arranged order. An adaptive
test is one in which the test-taker's performance on earlier items affects the
selection of the items presented later.
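The adaptive logic can be sketched in a few lines. This is a deliberately simplified illustration (operational computerized adaptive tests typically select items using item response theory); the 1-to-10 difficulty scale and the one-step-up, one-step-down rule are assumptions made for the example:

```python
# Toy sketch of adaptive item selection: move to a harder item after a
# correct answer and an easier item after an incorrect one.
def next_difficulty(current, correct, step=1, lo=1, hi=10):
    """Shift difficulty up one step after a correct answer,
    down one step after an incorrect answer, within bounds."""
    if correct:
        return min(hi, current + step)
    return max(lo, current - step)

def run_adaptive(responses, start=5):
    """Administer items in sequence; each response shifts the difficulty
    of the next item. Returns the sequence of difficulties used."""
    path = [start]
    for correct in responses:
        path.append(next_difficulty(path[-1], correct))
    return path
```

For instance, run_adaptive([True, True, False]) starts at difficulty 5 and yields the path [5, 6, 7, 6]; a linear test would instead present its items in a fixed order regardless of the responses.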
Scoring Differences
Tests are categorized as objectively scored, subjectively scored, or in
some instances both. An objectively scored instrument is one in which the
correct answers are counted and either constitute the final score or are
converted to it. Such tests may be scored manually, by optical scanning
machines, by computer software or other electronic
media, or even templates (keys) that are placed over answer sheets where
a person counts the number of correct answers. Examiner ratings and self-
report interpretations are determined by the professional using a rubric
or scoring system to convert the examinee’s responses to a score, whether
numerical or not. Sometimes subjective scores may include both quantita-
tive and qualitative summaries or narrative descriptions of the performance
of an individual.
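As a minimal sketch, key-based objective scoring amounts to counting matches against an answer key; the four-item multiple-choice key and response set below are hypothetical:

```python
# Minimal sketch of objective scoring: count the responses that match
# an answer key, as a scoring template or optical scanner would.
def raw_score(responses, key):
    """Return the number of responses identical to the keyed answers."""
    return sum(r == k for r, k in zip(responses, key))

# Hypothetical four-item key and one examinee's answers.
key = ["b", "d", "a", "c"]
answers = ["b", "d", "c", "c"]
print(raw_score(answers, key))  # 3 of the 4 items match the key
```

The raw count would then be converted to a final score (e.g., a scaled or percentile score), as the passage above describes.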
Scores on tests are often considered to be norm-referenced (or norma-
tive) or criterion-referenced. Norm-referenced cognitive measures (such as
college and graduate school admissions measures) inform the test-takers
where they stand relative to others in the distribution. For example, an
applicant to a college may learn that she is at the 60th percentile, meaning
that she has scored better than 60 percent of those taking the test and less
well than 40 percent of the same norm group. Likewise, most if not all intel-
ligence tests are norm-referenced, and most other ability tests are as well.
In recent years there has been more of a call for criterion-referenced tests,
especially in education (Hambleton and Pitoniak, 2006). For criterion-
referenced tests, one’s score is not compared to the other members of the
test-taking population but rather to a fixed standard. High school gradu-
ation tests, licensure tests, and other tests that decide whether test-takers
have met minimal competency requirements are examples of criterion-
referenced measures. When one takes a driving test to earn one’s driver’s
license, for example, one does not find out where one’s driving falls in the
distribution of national or statewide drivers, one only passes or fails.
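The two interpretations can be contrasted in a short sketch; the norm-group scores and the passing cutoff below are invented for illustration:

```python
# Norm-referenced vs. criterion-referenced score interpretation.
def percentile_rank(score, norm_scores):
    """Norm-referenced: percentage of the norm group scoring below."""
    below = sum(s < score for s in norm_scores)
    return 100.0 * below / len(norm_scores)

def meets_criterion(score, cutoff):
    """Criterion-referenced: pass or fail against a fixed standard."""
    return score >= cutoff

# Hypothetical norm group of ten test-takers and a fixed cutoff of 70.
norm_group = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
print(percentile_rank(66, norm_group))  # 60.0: better than 6 of the 10
print(meets_criterion(66, cutoff=70))   # False: below the fixed standard
```

The same obtained score of 66 thus reads as "60th percentile" under a norm-referenced interpretation but simply "fail" under the criterion-referenced one.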
Test Content
As noted previously, the most important distinction among most psy-
chological tests is whether they are assessing cognitive versus non-cognitive
qualities. In clinical psychological and neuropsychological settings such
as are the concern of this volume, the most common cognitive tests are
intelligence tests, other clinical neuropsychological measures, and perfor-
mance validity measures. Many tests used by clinical neuropsychologists,
psychiatrists, technicians, or others assess specific types of functioning,
such as memory or problem solving. Performance validity measures are
typically short assessments and are sometimes interspersed among compo-
nents of other assessments that help the psychologist determine whether
the examinee is exerting sufficient effort to perform well and responding
to the best of his or her ability. The most common non-cognitive measures
in clinical psychology and neuropsychology settings are personality mea-
sures and symptom validity measures. Some personality tests, such as the
Minnesota Multiphasic Personality Inventory (MMPI), assess the degree to
which someone expresses behaviors that are seen as atypical in relation to
the norming sample.1 Other personality tests are more normative and try
to provide information about the client to the therapist. Symptom valid-
ity measures are scales, like performance validity measures, that may be
interspersed throughout a longer assessment to examine whether a person
is portraying him- or herself in an honest and truthful manner. Somewhere
between these two types of tests—cognitive and non-cognitive—are vari-
ous measures of adaptive functioning that often include both cognitive and
non-cognitive components.
Reliability
Reliability refers to the degree to which scores from a test are stable
and results are consistent. When constructs are not reliably measured, the
obtained scores will not approximate a true value in relation to the psycho-
logical variable being measured. It is important to understand that observed
or obtained test scores are considered to be composed of true and error
elements. A standard error of measurement is often presented to describe,
within a level of confidence (e.g., 95 percent), that a given range of test
scores contains a person’s true score, which acknowledges the presence of
some degree of error in test scores and that obtained test scores are only
estimates of true scores (Geisinger, 2013).
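The logic of the standard error of measurement can be made concrete with the classical formula SEM = SD x sqrt(1 - reliability); the scale parameters below (mean 100, SD 15, reliability .91) are hypothetical values used only for illustration.

```python
import math

def standard_error_of_measurement(sd, reliability):
    # Classical formula: SEM = SD * sqrt(1 - reliability coefficient).
    return sd * math.sqrt(1.0 - reliability)

def true_score_interval(observed, sd, reliability, z=1.96):
    # Range expected to contain the true score with ~95 percent confidence.
    margin = z * standard_error_of_measurement(sd, reliability)
    return observed - margin, observed + margin

print(standard_error_of_measurement(15, 0.91))  # ~4.5
print(true_score_interval(100, 15, 0.91))       # ~(91.2, 108.8)
```

An observed score of 100 on such a scale is thus best reported not as a point value but as a band of roughly 91 to 109 within which the true score is likely to fall.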
Reliability is generally assessed in four ways: test-retest reliability (the
stability of scores across repeated administrations), alternate-forms reli-
ability (the consistency of scores across parallel versions of a test), internal
consistency (the coherence of responses across the items of a single admin-
istration), and inter-rater reliability (the agreement among different scorers
or examiners).
Validity
While the scores resulting from a test may be deemed reliable, this
finding does not necessarily mean that scores from the test have validity.
Validity is defined as “the degree to which evidence and theory support the
interpretations of test scores for proposed uses of tests” (AERA et al., 2014,
p. 11). In discussing validity, it is important to highlight that validity refers
not to the measure itself (i.e., a psychological test is not valid or invalid) or
to the scores derived from the measure, but rather to the interpretation and
use of the measure’s scores. To be considered valid, the interpretation of test
scores must be grounded in psychological theory and empirical evidence
that demonstrates a relationship between the test and what it purports to
measure (Furr and Bacharach, 2013; Sireci and Sukin, 2013). Historically,
the fields of psychology and education described three primary types of
validity evidence (content, criterion-related, and construct validity); the
current Standards identify five sources of validity evidence (Sattler, 2014;
Sireci and Sukin, 2013):
1. Test content: Does the test content reflect the important facets
of the construct being measured? Are the test items relevant and
appropriate for measuring the construct and congruent with the
purpose of testing?
2. Relation to other variables: Is there a relationship between test
scores and other criteria or constructs that are expected to be
related?
3. Internal structure: Does the actual structure of the test match the
theoretically based structure of the construct?
4. Response processes: Are respondents applying the theoretical con-
structs or processes the test is designed to measure?
5. Consequences of testing: What are the intended and unintended
consequences of testing?
2 The brief overview presented here draws on the works of De Ayala (2009) and DeMars
(2010), to which the reader is directed for additional information.
one cannot achieve a high score by guessing or using other means to answer
correctly. The three-parameter IRT model contains a third parameter, a
factor that accounts for chance-level correct responding. This parameter is
sometimes called the pseudo-guessing parameter, and the model is generally
used in large-scale multiple-choice testing programs.
These models, because of their lessened reliance on the particular sample
of test-takers, are very useful in the equating of tests, that is, in setting
scores to be equivalent regardless of which form of the test one takes. In some
high-stakes admissions tests such as the GRE, MCAT, and GMAT, for ex-
ample, forms are scored and equated by virtue of IRT methods, which can
perform such operations more efficiently and accurately than can be done
with classical statistics.
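A minimal sketch of the three-parameter logistic model described above follows; the parameter values are illustrative only (for a four-option multiple-choice item, the pseudo-guessing parameter is often near .25).

```python
import math

def p_correct_3pl(theta, a, b, c):
    # Three-parameter logistic (3PL) IRT model.
    # theta: test-taker ability; a: discrimination; b: difficulty;
    # c: pseudo-guessing parameter (the chance-level lower asymptote).
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Even a very low-ability examinee answers correctly about c of the time:
print(p_correct_3pl(theta=-4.0, a=1.2, b=0.0, c=0.25))  # just above 0.25
# An examinee whose ability equals the item difficulty:
print(p_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.25))   # 0.625
```

The lower asymptote c is what distinguishes the three-parameter model from the one- and two-parameter models: without it, very low-ability examinees would be predicted to score near zero on multiple-choice items, which guessing makes unrealistic.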
[Fragment of a table continued from a previous page; the recoverable entries
describe “recurrent obsessions or compulsions that are a source of marked
distress” and “unrealistic interpretation of physical signs or sensations
associated with the preoccupation or belief that one has a serious disease
or injury.”]
BOX 3-1
Descriptions of Tests by Four Areas of Core
Mental Residual Functional Capacity*
REFERENCES
AACN (American Academy of Clinical Neuropsychology). 2007. AACN practice guide-
lines for neuropsychological assessment and consultation. The Clinical Neuropsychologist
21(2):209-231.
AERA (American Educational Research Association), APA (American Psychological
Association), and NCME (National Council on Measurement in Education). 2014.
Standards for educational and psychological testing. Washington, DC: AERA.
APA. 2010. Ethical principles of psychologists and code of conduct. http://www.apa.org/ethics/
code (accessed March 9, 2015).
Brandt, J., and W. van Gorp. 1999. American Academy of Clinical Neuropsychology policy
on the use of non-doctoral-level personnel in conducting clinical neuropsychological
evaluations. The Clinical Neuropsychologist 13(4):385.
Buros Center for Testing. 2015. Test reviews and information. http://buros.org/test-reviews-
information (accessed March 19, 2015).
Chaytor, N., and M. Schmitter-Edgecombe. 2003. The ecological validity of neuropsychologi-
cal tests: A review of the literature on everyday cognitive skills. Neuropsychology Review
13(4):181-197.
Cronbach, L. J. 1949. Essentials of psychological testing. New York: Harper.
Cronbach, L. J. 1960. Essentials of psychological testing. 2nd ed. Oxford, England: Harper.
De Ayala, R. J. 2009. Theory and practice of item response theory. New York: Guilford
Publications.
DeMars, C. 2010. Item response theory. New York: Oxford University Press.
Furr, R. M., and V. R. Bacharach. 2013. Psychometrics: An introduction. Thousand Oaks,
CA: Sage Publications, Inc.
Geisinger, K. F. 2013. Reliability. In APA handbook of testing and assessment in psychology.
Vol. 1, edited by K. F. Geisinger (editor) and B. A. Bracken, J. F. Carlson, J. C. Hansen,
N. R. Kuncel, S. P. Reise, and M. C. Rodriguez (associate editors). Washington, DC: APA.
Groth-Marnat, G. 2009. Handbook of psychological assessment. Hoboken, NJ: John Wiley
& Sons.
Groth-Marnat, G., and M. Teal. 2000. Block design as a measure of everyday spatial ability:
A study of ecological validity. Perceptual and Motor Skills 90(2):522-526.
Hambleton, R. K., and M. J. Pitoniak. 2006. Setting performance standards. Educational
Measurement 4:433-470.
ITC (International Test Commission). 2005. ITC guidelines for translating and adapting
tests. Geneva, Switzerland: ITC.
Lezak, M., D. Howieson, E. Bigler, and D. Tranel. 2012. Neuropsychological assessment. 5th
ed. New York: Oxford University Press.
NAN (National Academy of Neuropsychology). 2001. NAN definition of a clinical neuro-
psychologist: Official position of the National Academy of Neuropsychology. https://www.
nanonline.org/docs/PAIC/PDFs/NANPositionDefNeuro.pdf (accessed November 25, 2014).
PAR (Psychological Assessment Resources). 2015. Qualifications levels. http://www4.parinc.
com/Supp/Qualifications.aspx (accessed January 5, 2015).
Pearson Education. 2015. Qualifications policy. http://www.pearsonclinical.com/psychology/
qualifications.html (accessed January 5, 2015).
Sattler, J. M. 2014. Foundations of behavioral, social, and clinical assessment of children. 6th
ed. La Mesa, CA: Jerome M. Sattler, Publisher, Inc.
Sireci, S. G., and T. Sukin. 2013. Test validity. In APA handbook of testing and assessment in
psychology. Vol. 1, edited by K. F. Geisinger (editor) and B. A. Bracken, J. F. Carlson,
J. C. Hansen, N. R. Kuncel, S. P. Reise, and M. C. Rodriguez (associate editors).
Washington, DC: APA.
SSA (Social Security Administration). n.d. Disability evaluation under social security—Part
III: Listing of impairments—Adult listings (Part A)—section 12.00 mental disorders.
http://www.ssa.gov/disability/professionals/bluebook/12.00-MentalDisorders-Adult.htm
(accessed November 14, 2014).
Suzuki, L. A., S. Naqvi, and J. S. Hill. 2014. Assessing intelligence in a cultural context. In
APA handbook of multicultural psychology. Vol. 1, edited by F. T. L. Leong, L. Comas-
Diaz, G. C. Nagayama Hall, V. C. McLoyd, and J. E. Trimble. Washington, DC: APA.
Trimble, J. E. 2010. Cultural measurement equivalence. In Encyclopedia of cross-cultural
school psychology. New York: Springer. Pp. 316-318.
Turner, S. M., S. T. DeMers, H. R. Fox, and G. Reed. 2001. APA’s guidelines for test user
qualifications: An executive summary. American Psychologist 56(12):1099.
Weiner, I. B. 2003. The assessment process. In Handbook of psychology, edited by I. B. Weiner.
Hoboken, NJ: John Wiley & Sons.
BOX 4-1
SSA Definitions of Symptoms, Signs,
and Laboratory Findings
Mental Disorders
Within its mental health listings, SSA (n.d.-a) identifies nine diagnostic
categories (see Chapter 3, Table 1). Of these nine, the committee identi-
fied five categories for which non-cognitive measures may provide useful
information: (1) schizophrenic, paranoid, and other psychotic disorders; (2)
affective disorders; (3) anxiety-related disorders; (4) personality disorders;
and (5) somatoform disorders.2 Box 4-2 contains the SSA descriptions of
each of the first four mental disorders categories.
These categories of mental disorders are well-established psychiatric
diagnoses with distinct diagnostic criteria. In clinical settings, diagnosis
in these categories often relies on self-report of symptoms, which are then
weighed against criteria in the Diagnostic and Statistical Manual of the
American Psychiatric Association (DSM-5). However, the method for as-
sessing symptom report may vary, from a simple, unstructured clinical
interview to more systematic approaches, such as the use of standardized
psychiatric diagnostic schedules and interviews or formal psychological
self-report measures. The use of such systematic approaches may help cor-
roborate and validate a patient’s symptom report.
There are also 11 mental disorder diagnostic categories listed by SSA
specifically for children. The structure and organization of these categories
parallel the mental disorder listings for adults. The categories covering
conditions typically first diagnosed in childhood include intellectual
disability, autistic disorder and other pervasive developmental disorders,
and attention deficit hyperactivity disorder. In addition, conduct disorder
and oppositional defiant disorder are included in the SSA listing for per-
sonality disorders.
2 Although somatoform disorders are included in the SSA mental health listings, the com-
mittee focuses on these in the next section on disproportionate somatic symptoms, alongside
multisystem illnesses and chronic idiopathic pain conditions.
BOX 4-2
SSA Definitions of Relevant Mental Disorders
BOX 4-3
Definitions of Relevant Disorders with
Disproportionate Somatic Symptoms
SSA’s current mental disorders listings assess limitations in four areas of
functioning: (1) activities of daily living (ADLs); (2) social functioning;
(3) concentration, persistence, or pace; and (4) episodes of decompensation.
However,
SSA (2010) published a Notice of Proposed Rulemaking (NPRM)4 for its
mental disorders listings, which, among other changes, would alter the func-
tional categories on which disability determinations would be based, in-
creasing focus on the relation of functioning to the work setting. Proposed
functional domains in the NPRM are the abilities to (1) understand, re-
member, and apply information; (2) interact with others; (3) concentrate,
persist, and maintain pace; and (4) manage oneself.5 Definitions of each of
these domains are presented in Box 4-4. With SSA’s move in this direction
and the greater focus on functional abilities as they relate to work, the com-
mittee will examine the relevance of psychological self-report measures to
the proposed functional domains.
Although non-cognitive assessments do not provide direct evidence of
functional capacity, information obtained from these measures allows for
the corroboration of symptoms as presented, which can lead to greater
diagnostic accuracy. For example, self-report instruments allow for a stan-
dardized method of obtaining information that is normed against other
clinical and nonclinical groups, adding to the ability of a clinician to offer
accurate diagnoses. In addition, some of these instruments have validity
scales, which measure test-taking strategies, as discussed in detail below.
Understanding these presentation approaches (i.e., over- or underreporting
of symptoms) is helpful in identifying conditions accurately. An accurate
diagnosis, in turn, supports more accurate prognostic indicators and thereby
a greater ability to discern the chronicity of the conditions presented.
4 Public comments are still under review and a final rule has yet to be published as of the publication of this report.
BOX 4-4
SSA Proposed Functional Domains
6 These are commonly referred to as level C tests. Some tests have less stringent qualification requirements.
Types of SVTs
Many SVTs are scales within larger personality or multiscale invento-
ries assessing test-taker response styles used in completing the battery. These
scales may be designed as such and embedded or later derived from existing
items and scales based on typical response patterns, including those of spe-
cific populations. For example, each of the personality measures discussed
earlier in this chapter (i.e., MMPI-2-RF, MCMI-III, and PAI) contains valid-
ity scales that examine consistency of response, negative self-presentation,
and positive self-presentation to varying degrees. Box 4-5 lists the negative
self-presentation SVTs included in each of these measures.
Though fewer in number, stand-alone SVTs also exist to assess po-
tential exaggeration or feigning of psychological and neuropsychological
symptoms. These include a number of structured interviews, such as the
Structured Interview of Reported Symptoms (Rogers et al., 1992), the
Structured Inventory of Malingered Symptomatology (Widows and Smith,
2005), and the Miller Forensic Assessment of Symptom Test (Miller, 2001).
Like the embedded/derived measures, these SVTs examine accuracy of
symptom report in a variety of ways. As this is their sole purpose, they are
often used in conjunction with other measures that do not contain tests
of validity. Box 4-6 lists the scales related to negative self-presentation in
stand-alone SVTs.
BOX 4-5
Embedded/Derived SVTs for Negative Self-Presentation
MMPI-2-RF
• Infrequent Responses (F-r): Overreporting across psychological, cognitive, and somatic dimensions (as compared with general population)
• Infrequent Psychopathology Responses (Fp-r): Overreporting of emotional distress and psychiatric illness (as compared with psychiatric populations)
• Infrequent Somatic Responses (Fs): Overreporting of somatic complaints (as compared with medical patient populations)
• Symptom Validity (FBS-r): Overreporting of somatic and cognitive complaints
• Response Bias (RBS): Overreporting of memory complaints
• Henry-Heilbronner Index: Physical symptom exaggeration (empirically derived from existing scales; for use with personal injury litigants and disability claimants)
• Malingered Mood Disorder Scale: Exaggeration of emotional disturbance (empirically derived from existing scales; for use with personal injury litigants and disability claimants)

MCMI-III
• Validity (V): Improbable symptoms; may measure confusion, difficulties reading and understanding items, or responding in a random fashion
• Disclosure (X): Acknowledgment of difficulties and willingness to present with symptoms
• Debasement (Z): Tendency to present symptoms in an accentuated fashion

PAI
• Infrequency (INF): Statistically unlikely response patterns in items that have low rates of endorsement and high rates of endorsement
• Negative Impression (NIM): Rare symptoms and those that are not reported by many respondents
• Malingering Index (MAL): Unlikely patterns; features that are more likely to be found in persons simulating mental disorders than in clinical patients
• Rogers Discriminant Function (RDF): A statistically determined method that distinguishes simulators from those who were responding honestly
BOX 4-6
Stand-Alone SVTs for Negative Self-Presentation
• Psychosis (P)
• Neurologic Impairment (NI)
• Amnestic Disorders (AM)
• Low Intelligence (LI)
• Affective Disorders (AF)
REFERENCES
American Psychiatric Association. 2013. The diagnostic and statistical manual of mental
disorders: DSM-5. Washington, DC: American Psychiatric Association.
APA (American Psychological Association). 2010. Ethical principles of psychologists and code
of conduct. http://www.apa.org/ethics/code (accessed March 9, 2015).
Barsky, A. J., and J. F. Borus. 1999. Functional somatic syndromes. Annals of Internal
Medicine 130(11):910-921.
Beck, A., and R. Steer. 1993. Beck Anxiety Inventory manual. San Antonio, TX: Harcourt
Brace & Company.
Beck, A. T., R. Steer, and G. Brown. 1996. Beck Depression Inventory. 2nd ed. San Antonio,
TX: The Psychological Corporation.
Ben-Porath, Y. S., A. Tellegen, and N. Pearson. 2008. MMPI-2-RF: Manual for administration,
scoring, and interpretation. Minneapolis, MN: University of Minnesota Press.
Bigler, E. D. 2012. Symptom validity testing, effort, and neuropsychological assessment.
Journal of the International Neuropsychological Society 18(4):632-642.
Bush, S. S., R. M. Ruff, A. I. Tröster, J. T. Barth, S. P. Koffler, N. H. Pliskin, C. R. Reynolds,
and C. H. Silver. 2005. Symptom validity assessment: Practice issues and medical
necessity. NAN policy and planning committee. Archives of Clinical Neuropsychology
20(4):419-426.
Bush, S. S., R. L. Heilbronner, and R. M. Ruff. 2014. Psychological assessment of symp-
tom and performance validity, response bias, and malingering: Official position of the
Association for Scientific Advancement in Psychological Injury and Law. Psychological
Injury and Law 7(3):197-205.
Butcher, J. N., W. Dahlstrom, J. Graham, A. Tellegen, and B. Kaemmer. 1989. MMPI-2:
Manual for administration and scoring. Minneapolis, MN: University of Minnesota
Press.
Derogatis, L. 1994. SCL-90-R: Symptom Checklist-90-R. Minneapolis, MN: Pearson.
Derogatis, L. R., and P. Spencer. 1993. Brief Symptom Inventory: BSI. Minneapolis, MN:
Pearson.
First, M. B., R. L. Spitzer, M. Gibbon, and J. B. Williams. 2012. Structured Clinical Interview
for DSM-IV axis I disorders (SCID-I), clinician version, administration booklet.
Arlington, VA: American Psychiatric Publishing.
Gibbon, M., R. L. Spitzer, and M. B. First. 1997. User’s guide for the Structured Clinical
Interview for DSM-IV axis II personality disorders: SCID-II. Arlington, VA: American
Psychiatric Publishing.
Hamilton, M. 1980. Rating depressive patients. Journal of Clinical Psychiatry 41(12):21-24.
Hathaway, S. R., and J. C. McKinley. 1940. A multiphasic personality schedule (Minnesota):
I. Construction of the schedule. Journal of Psychology 10:249-254.
Hathaway, S. R., and J. C. McKinley. 1943. Manual for the Minnesota Multiphasic Personality
Inventory. New York: The Psychological Corporation.
Heilbronner, R. L., J. J. Sweet, J. E. Morgan, G. J. Larrabee, S. R. Millis, and Conference par-
ticipants. 2009. American Academy of Clinical Neuropsychology consensus conference
statement on the neuropsychological assessment of effort, response bias, and malingering.
The Clinical Neuropsychologist 23(7):1093-1129.
Henningsen, P., S. Zipfel, and W. Herzog. 2007. Management of functional somatic syn-
dromes. Lancet 369(9565):946-955.
Henry, G. K., R. L. Heilbronner, W. Mittenberg, C. Enders, and D. M. Roberts. 2008.
Empirical derivation of a new MMPI-2 scale for identifying probable malingering in per-
sonal injury litigants and disability claimants: The 15-item Malingered Mood Disorder
Scale (MMDS). The Clinical Neuropsychologist 22(1):158-168.
Henry, G. K., R. L. Heilbronner, J. Algina, and Y. Kaya. 2013. Derivation of the MMPI-2-RF
Henry-Heilbronner Index-r (HHI-r) scale. The Clinical Neuropsychologist 27(3):509-515.
Larrabee, G. J. 2012. Performance validity and symptom validity in neuropsychological assess-
ment. Journal of the International Neuropsychological Society 18(4):625-630.
Larrabee, G. J. 2014. Performance and Symptom Validity. Presentation to IOM Committee
on Psychological Testing, Including Validity Testing, for Social Security Administration,
June 25, 2014, Washington, DC.
Miller, H. A. 2001. M-FAST: Miller forensic assessment of symptoms test professional manual.
Odessa, FL: Psychological Assessment Resources.
Millon, T., C. Millon, R. D. Davis, and S. Grossman. 2009. Millon Clinical Multiaxial
Inventory-III (MCMI-III) manual. San Antonio, TX: Pearson/PsychCorp.
Morey, L. C. 2007. Personality Assessment Inventory. Odessa, FL: Psychological Assessment
Resources.
Rogers, R., R. M. Bagby, and S. E. Dickens. 1992. Structured Interview of Reported Symptoms:
Professional manual. Odessa, FL: Psychological Assessment Resources.
Sheehan, D., Y. Lecrubier, K. Sheehan, P. Amorim, J. Janavs, E. Weiller, T. Hergueta, R. Baker,
and G. Dunbar. 1998. The Mini-International Neuropsychiatric Interview (MINI): The
development and validation of a structured diagnostic psychiatric interview for DSM-IV
and ICD-10. Journal of Clinical Psychiatry 59(Suppl. 20):22-33.
Spitzer, R. L., K. Kroenke, J. B. Williams, and the Patient Health Questionnaire Primary
Care Study Group. 1999. Validation and utility of a self-report version of PRIME-MD:
The PHQ primary care study. JAMA 282(18):1737-1744.
SSA (Social Security Administration). 2010. Revised medical criteria for evaluating mental
disorders. Federal Register 75(160):34.
SSA. n.d.-a. Disability evaluation under social security—Part III: Listing of impairments—
Adult listings (Part A)—section 12.00 mental disorders. http://www.ssa.gov/disability/
professionals/bluebook/12.00-MentalDisorders-Adult.htm (accessed November 14, 2014).
SSA. n.d.-b. Disability evaluation under Social Security: Part I—general information. http://
www.ssa.gov/disability/professionals/bluebook/general-info.htm (accessed November 14,
2014).
Tollison, D., and J. Langley. 1995. Pain Patient Profile manual. Minneapolis, MN: National
Computer Systems.
Van Dyke, S. A., S. R. Millis, B. N. Axelrod, and R. A. Hanks. 2013. Assessing effort:
Differentiating performance and symptom validity. The Clinical Neuropsychologist 27(8):
1234-1246.
Vranceanu, A., A. Barsky, and D. Ring. 2009. Psychosocial aspects of disabling musculoskeletal
pain. Journal of Bone and Joint Surgery 91(8):2014-2018.
Weathers, F., B. Litz, D. Herman, J. Huska, and T. Keane. 1994. The PTSD checklist-civilian
version (PCL-C). Boston, MA: National Center for PTSD.
WHO (World Health Organization). 1993. Composite International Diagnostic Interview
(CIDI): Interviewer’s manual. Geneva, Switzerland: WHO.
Widows, M. R., and G. P. Smith. 2005. Structured Inventory of Malingered Symptomatology:
Professional manual. Lutz, FL: Psychological Assessment Resources.
Wing, J. K., T. Babor, T. Brugha, J. Burke, J. Cooper, R. Giel, A. Jablenski, D. Regier, and N.
Sartorius. 1990. SCAN: Schedules for Clinical Assessment in Neuropsychiatry. Archives
of General Psychiatry 47(6):589-593.
1 As documented in Chapters 1 and 2, 57 percent of claims fall under mental disorders other
than intellectual disability and/or connective tissue disorders.
2 Public comments are currently under review and a final rule has yet to be published as of
the publication of this report.
tasks should be administered are determined and clearly spelled out. All
examiners use such methods and procedures during the process of collect-
ing the normative data, and such procedures normally should be used in
any other administration. Typical standardized administration procedures
or expectations include (1) a quiet, relatively distraction-free environment;
(2) precise reading of scripted instructions; and (3) provision of necessary
tools or stimuli. Use of standardized administration procedures enables ap-
plication of normative data to the individual being evaluated (Lezak et al.,
2012). Without standardized administration, the individual’s performance
may not accurately reflect his or her ability. An individual’s abilities may
be overestimated if the examiner provides information or guidance beyond
what is outlined in the test administration manual. Conversely,
a claimant’s abilities may be underestimated if appropriate instructions,
examples, or prompts are not presented.
may also result from other mental or physical disorders, such as bipolar
disorder, depression, schizophrenia, psychosis, or multiple sclerosis (Etkin
et al., 2013; Rao, 1986).
Processing Speed
Processing speed refers to the amount of time it takes to respond to
questions and process information, and “has been found to account for
variability in how well people perform many everyday activities, includ-
ing untimed tasks” (OIDAP, 2009, p. C-23). This domain reflects mental
efficiency and is central to many cognitive functions (NIH, n.d.). Tests for
deficits in processing speed include the WAIS-IV processing speed index and
the Trail Making Test Part A (Reitan, 1992).
Executive Functioning
Executive functioning is generally used as an overarching term encom-
passing many complex cognitive processes such as planning, prioritizing,
organizing, decision making, task switching, responding to feedback and
error correction, overriding habits and inhibition, and mental flexibility
(American Psychiatric Association, 2013; Elliott, 2003; OIDAP, 2009). It
has been described as “a product of the coordinated operation of various
processes to accomplish a particular goal in a flexible manner” (Funahashi,
2001, p. 147). Impairments in executive functioning can lead to disjointed
Interindividual Differences
The most basic level of interpretation is simply to compare an indi-
vidual’s testing results with the normative data collected in the develop-
ment of the measures administered. This level of interpretation allows the
examiner to determine how typical or atypical an individual’s performance
is in comparison to same-aged individuals within the general population.
Normative data may or may not be further specialized on the basis of race/
ethnicity, gender, and educational status. There is some variability, across
schools of thought, in how an individual’s score is interpreted based on its
deviation from the normative mean; not all of these approaches can be
described in this text. One example of an interpretative approach would
be that a performance within one standard deviation of the mean would be
considered broadly average. Performances one to two standard deviations
below the mean are considered mildly impaired, and those two or more
standard deviations below the mean typically are interpreted as being at
least moderately impaired.
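The interpretive bands described above can be sketched as a simple mapping from deviation scores to descriptors; the band boundaries follow the example convention in the text, and the scale values (mean 100, SD 15) are illustrative, not tied to any particular instrument.

```python
def z_score(observed, mean, sd):
    # Deviation from the normative mean in standard-deviation units.
    return (observed - mean) / sd

def interpret(z):
    # One example convention; interpretive schools of thought vary.
    if z >= -1.0:
        return "broadly average or better"
    if z >= -2.0:
        return "mildly impaired"
    return "at least moderately impaired"

print(interpret(z_score(90, 100, 15)))   # broadly average or better
print(interpret(z_score(78, 100, 15)))   # mildly impaired
print(interpret(z_score(60, 100, 15)))   # at least moderately impaired
```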
Intraindividual Differences
In addition to comparing an individual’s performances to that of the
normative group, it also is important to compare an individual’s pat-
tern of performances across measures. This type of comparison allows for
identification of a pattern of strengths and weaknesses. For example, an
individual’s level of intellectual functioning can be considered a benchmark
to which functioning within some other domains can be compared. If all
performances fall within the mildly to moderately impaired range, an in-
terpretation of some degree of intellectual disability may be appropriate,
depending on an individual’s level of adaptive functioning. It is important
to note that any interpretation of an individual’s performance on a battery
of tests must take into account that variability in performance across tasks
is a normal occurrence (Binder et al., 2009), especially as the number of tests
administered increases (Schretlen et al., 2008). However, if there is signifi-
cant variability in performances across domains, then a specific pattern of
impairment may be indicated.
Profile Analysis
When significant variability in performances across functional domains
is observed, it is necessary to consider whether or not the pattern of func-
tioning is consistent with a known cognitive profile. That is, does the indi-
vidual demonstrate a pattern of impairment that makes sense or can be
reliably explained by a known neurobehavioral syndrome or neurological
disorder? For example, an adult who has sustained isolated injury to the
temporal lobe of the left hemisphere would be expected to demonstrate
some degree of impairment on some measures of language and verbal
memory, but to demonstrate relatively intact performances on measures of
visual-spatial skills. This pattern of performance reflects a cognitive profile
consistent with a known neurological injury. Conversely, a claimant who
demonstrates impairment on all measures after sustaining a brief concus-
sion would be demonstrating a profile of impairment that is inconsistent
with research data indicating full cognitive recovery within days in most
individuals who have sustained a concussion (McCrea et al., 2002, 2003).
that was completed. The reason for the evaluation, or more specifically,
the type of claim of impairment, may suggest a need for a specific type of
qualification of the individual performing and especially interpreting the
evaluation.
As stated in existing SSA (n.d.-a) documentation, individuals who ad-
minister more specific cognitive or neuropsychological evaluations “must
be properly trained in this area of neuroscience.” Clinical neuropsycholo-
gists, as defined above, are individuals who have been specifically trained
to interpret testing results within the framework of brain-behavior relation-
ships and who have achieved certain educational and training benchmarks
as delineated by national professional organizations (AACN, 2007; NAN,
2001). More specifically, clinical neuropsychologists have been trained to
interpret more complex and comprehensive cognitive or neuropsychologi-
cal batteries that could include assessment of specific cognitive functions,
such as attention, processing speed, executive functioning, language, visual-
spatial skills, or memory. As stated above, interpretation of data involves
examining patterns of individual cognitive strengths and weaknesses within
the context of the individual’s history including specific neurological injury
or disease (i.e., claims on the basis of TBI).
For these reasons, analysis of the entire cognitive profile for consistency
is generally recommended. Specific patterns that increase confidence in the
validity of a test battery and overall assessment include
Specific tests have also been designed especially to aid in the examina-
tion of performance validity. The development of and research on these
PVTs has increased rapidly during the past two decades. There have been
attempts to formally quantify performance validity during testing since the
mid-1900s (Rey, 1964), with much of the initial focus on examining the
consistency of an individual’s responses across a battery of testing, with
the suggestion that inconsistency may indicate variable effort. However, a
significant push for specific formal measures came in response to the in-
creased use of neuropsychological and cognitive testing in forensic contexts,
including personal injury litigation, workers compensation, and criminal
proceedings in the 1980s and 1990s (Bianchini et al., 2001; Larrabee,
2012a). Given the nature of these evaluations, there was often a clear
incentive for an individual to exaggerate his or her impairment or to put
forth less than optimal effort during testing, and neuropsychologists were
being called upon to provide statements related to the validity of test results
(Slick et al., 1999). Several studies documented that use of clinical judgment
and interpretation of performance inconsistencies alone was an inadequate
methodology for detection of poor effort or intentionally poor performance
(Faust et al., 1988; Heaton et al., 1978; van Gorp et al., 1999). As such, the
need for formal standardized measures of effort and means for interpreta-
tion of these measures emerged.
Types of PVTs
PVTs may be embedded by design within other cognitive tests, derived after the fact from standard cognitive tests, or constructed as stand-alone
measures. Examples of each type of measure are discussed below.
Stand-Alone Measures
A stand-alone PVT is a measure that was developed specifically to as-
sess a test-taker’s effort or consistency of responses. That is, although the
measure may appear to assess some other cognitive function (e.g., memory),
it was actually developed to be so simple that even an individual with severe
impairments in that function would be able to perform adequately. Such
measures may be forced choice or non-forced choice (Boone and Lu, 2007;
Grote and Hook, 2007).
The Test of Memory Malingering (TOMM) (Tombaugh and Tombaugh,
1996), the Word Memory Test (WMT) (Green et al., 1996), and the Rey
Memory for Fifteen Items Test (RMFIT) (Rey, 1941) are examples of stand-
alone measures of performance validity. As with many stand-alone mea-
sures, the TOMM, WMT, and RMFIT are memory tests that appear more
difficult than they really are. The TOMM and WMT use a forced-choice
method to identify noncredible performance in which the test-taker is asked
to identify which of two stimuli was previously presented. Accuracy scores
are compared to chance level performance (i.e., 50 percent correct), as
well as performance by normative groups of head-injured and cognitively
impaired individuals, with cut-offs set to minimize false-positive errors.
Alternatively, the RMFIT uses a non-forced-choice method in which the
test-taker is presented with a group of items and then asked to reproduce
as many of the items as possible.
Forced-Choice PVTs
As noted above, some PVTs are forced-choice measures on which
performance significantly below chance has been suggested to be evidence
of intentionally poor performance based on application of the binomial
theorem (Larrabee, 2012a). For example, if there are two choices, it would
be expected that purely random guessing would result in 50 percent of
items correct. Scores that deviate significantly from 50 percent in either direction indicate
nonchance-level performance. The most probable explanation for sub-
stantially below-chance PVT scores is that the test-taker knew the correct
answer but purposely selected the wrong answer. The Slick and colleagues

3 At the committee’s second meeting, Drs. Bianchini, Boone, and Larrabee all expressed great concern about the susceptibility of PVTs to coaching and stressed the importance of ensuring test security, as disclosure of test materials adversely affects the reliability and validity of psychological test results.
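The binomial logic described above can be sketched numerically. The function and the 50-item, 15-correct example below are illustrative assumptions for a two-alternative forced-choice test, not part of any published PVT’s scoring rules.

```python
from math import comb

def below_chance_probability(n_items, n_correct, p_chance=0.5):
    """One-tailed binomial probability of obtaining n_correct or fewer
    correct answers on a forced-choice test by random guessing alone."""
    return sum(comb(n_items, k) * p_chance**k * (1 - p_chance)**(n_items - k)
               for k in range(n_correct + 1))

# Hypothetical case: 15 of 50 items correct on a two-alternative PVT.
# Guessing alone would average 25 of 50, so a score this far below
# chance is very unlikely to arise without deliberate wrong answers.
p = below_chance_probability(50, 15)
```

Under this logic, a probability well below a conventional threshold (e.g., .05) supports the inference that the test-taker recognized the correct answers and purposely chose the wrong ones.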
Boone (2009, 2014), Larrabee (2012, 2014a,b), and others assert that multiple PVT failures are generally required,4 and that as the number of PVT failures increases, the chance of a false positive approaches zero. Yet, it is
possible that PVT failures (i.e., below cut-off score performance) in certain
populations reflect legitimate cognitive impairment. For this reason, it has also been recommended that close attention be paid to the pattern of PVT performance in these at-risk populations, both to inform interpretation and reduce the chance of false positives (Larrabee, 2014a,b) and to inform future PVT research (Boone, 2007; Larrabee, 2007).
For these reasons, it is necessary to evaluate PVTs in the context of
the individual disability applicant, including interpretation of the degree of
PVT failure (e.g., below-chance performance versus performance slightly
below cut-off score performance) and the consistency of failure across
PVTs. Furthermore, careful interpretation of grey area PVT performance
(significantly above chance but below standard cut-offs) is necessary, given
that a significant proportion of individuals with bona fide mental or cogni-
tive disorders may score in this “grey area.” Adding to the complexity of
interpreting these scores, population-based norms, and certainly norms for
specific patient groups, are not available for most PVTs. Rather, owing to
the process of development of these tasks, normative data exist only for
select populations, typically litigants or those seeking compensation for in-
jury. Thus, there are no norms for specific demographic groups (e.g., racial/
ethnic minority groups). It has been suggested that examiners can compen-
sate for these normative issues by using their clinical judgment to identify
an alternate cut-off score for increased specificity (which will come at a cost
of lower sensitivity) (Boone, 2014). For example, if an examiner identifies
cultural, ethnic, and/or language factors known to affect PVT scores, the
examiner should adjust his or her thresholds for identifying noncredible
performance (Salazar et al., 2007).
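The specificity–sensitivity trade-off described above can be made concrete with a small sketch. The function, score values, and cut-offs below are hypothetical illustrations, not data or thresholds from any actual PVT.

```python
def cutoff_stats(credible_scores, noncredible_scores, cutoff):
    """For a PVT where scores below `cutoff` count as failures:
    specificity = fraction of credible test-takers who pass;
    sensitivity = fraction of noncredible test-takers who fail."""
    specificity = sum(s >= cutoff for s in credible_scores) / len(credible_scores)
    sensitivity = sum(s < cutoff for s in noncredible_scores) / len(noncredible_scores)
    return sensitivity, specificity

# Invented scores out of 50. Lowering the cut-off raises specificity
# (fewer credible test-takers flagged) at the cost of sensitivity.
credible = [48, 49, 50, 47, 44, 43, 46, 50]
noncredible = [30, 35, 42, 46, 28, 44, 39, 33]
```

Comparing, say, `cutoff_stats(credible, noncredible, 45)` with `cutoff_stats(credible, noncredible, 40)` shows the trade an examiner makes when adjusting a threshold for a population prone to false positives.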
Despite the practice standard of using multiple PVTs, there may be an
increased likelihood of abnormal performances as the number of measures
administered increases, a pattern that occurs in the context of standard cog-
nitive measures (Schretlen et al., 2008). This type of analysis is beginning to be applied specifically to PVTs, with inconsistent findings to date. Several
studies examining PVT performance patterns in groups of clinical patients
have indicated that it is very unlikely that an individual putting forth good
effort on testing will fail two or more PVTs regardless of type of PVT (i.e.,
embedded or free-standing) (Iverson and Franzen, 1996; Larrabee, 2003).
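Under an admittedly simplifying independence assumption, the chance of multiple false-positive failures across a battery of PVTs can be sketched as follows; the per-test false-positive rate used is illustrative only, and real PVT scores are correlated, so this is a rough approximation rather than a model of actual practice.

```python
from math import comb

def p_at_least_m_failures(m, k, fp_rate):
    """Probability of at least m false-positive 'failures' across k PVTs,
    assuming each test independently misfires at rate fp_rate."""
    return sum(comb(k, j) * fp_rate**j * (1 - fp_rate)**(k - j)
               for j in range(m, k + 1))

# With 5 PVTs and a hypothetical 10% per-test false-positive rate,
# a single isolated failure is far more likely than two or more.
p_one = p_at_least_m_failures(1, 5, 0.10)
p_two = p_at_least_m_failures(2, 5, 0.10)
```

This is the arithmetic behind requiring multiple PVT failures: each additional required failure sharply reduces the probability that a credible test-taker is misclassified by chance alone.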
In fact, Victor and colleagues (2009) found a significant difference in the
Intellectual Disability
SSA has clear and appropriate standards for documentation for indi-
viduals applying for disability on the basis of intellectual disability (SSA,
n.d.-a). As stated by SSA, “standardized intelligence test results are essential
to the adjudication of all cases of intellectual disability” if the claimant does
not clearly meet or equal the medical listing without such results. There are individual
cases, of course, in which the claimant’s level of impairment is so signifi-
cant that it precludes formalized testing. For these individuals, their level of functioning and social history provide a longitudinally consistent record and documentation of impairment. For those who can complete intellectual testing but whose social history is inconsistent, inclusion of some
documentation or assessment of effort may be warranted and would help
to validate the results of intellectual and adaptive functioning assessment.
Use of PVTs is common among practitioners assessing for intellectual
disability, with the TOMM being the most commonly used measure (Victor
and Boone, 2007). However, caution is warranted in interpreting PVT re-
sults in individuals with intellectual disability, as IQ has consistently been
correlated with PVT performance (Dean et al., 2008; Graue et al., 2007;
Hurley and Deal, 2006; Shandera et al., 2010). More importantly, individu-
als with intellectual disability fail PVTs at a higher rate than those without
(Dean et al., 2008; Salekin and Doane, 2009). In fact, Dean and colleagues
(2008) found in their sample that all individuals with an IQ of less than 70
failed at least one PVT. Thus, cut-off scores for individuals with suspected
intellectual disability may need to be adjusted due to a higher rate of false-
positive results in this population. For example, lowering the TOMM Trial
2 and Retention Trial cut-off scores from 45 to 30 resulted in very low
false-positive rates (0–4 percent) (Graue et al., 2007; Shandera et al., 2010).
Neurocognitive Impairments
There are individuals who apply for disability with primary allegations
of cognitive dysfunction in one or more of the functional domains outlined
above (e.g., “fuzzy” thinking, slowed thinking, poor memory, concentration
difficulties). Standardized cognitive test results, as required for
individuals claiming intellectual disability, are essential to the adjudication
of such cases. These individuals may present with cognitive impairment due
to a variety of reasons including, but not limited to, brain injury or disease
(e.g., TBI or stroke) or neurodevelopmental disorders (e.g., learning disabil-
ities, attention deficit hyperactivity disorder). Similarly, disability applicants
may claim cognitive impairment secondary to a psychiatric disorder. For
all of these claimants, documentation of impairment in functional cognitive
domains with standardized cognitive tests is critically important. Within the
CONCLUSION
The results of standardized cognitive tests that are appropriately ad-
ministered, interpreted, and validated can provide objective evidence to
help identify and document the presence and severity of medically determin-
able mental impairments at Step 2 of SSA’s disability determination process.
In addition, such tests can provide objective evidence to help identify and
assess the severity of work-related cognitive functional impairment relevant
to disability evaluations at the listing level (Step 3) and to mental residual
functional capacity (Steps 4 and 5). Therefore, standardized cognitive test
results are essential to the determination of all cases in which an applicant’s
allegation of cognitive impairment is not accompanied by objective medical
evidence.
The results of cognitive tests are affected by the effort put forth by
the test-taker. If an individual has not given his or her best effort in tak-
ing the test, the results will not provide an accurate picture of the person’s
neuropsychological or cognitive functioning. Performance validity indica-
tors, which include PVTs, analysis of internal data consistency, and other
corroborative evidence, help the evaluator to interpret the validity of an
individual’s neuropsychological or cognitive test results. For this reason, it
is important to include an assessment of performance validity at the time of testing.
REFERENCES
AACN (American Academy of Clinical Neuropsychology). 2007. AACN practice guide-
lines for neuropsychological assessment and consultation. Clinical Neuropsychology
21(2):209-231.
Allen, L. M., III, R. L. Conder, P. Green, and D. R. Cox. 1997. CARB ‘97: Manual for the
computerized assessment of response bias. Durham, NC: Cognisyst.
American Psychiatric Association. 2013. The diagnostic and statistical manual of mental
disorders: DSM-5. Washington, DC: American Psychiatric Association.
APA (American Psychological Association). 2015. Guidelines and principles for accreditation of
programs in professional psychology: Quick reference guide to doctoral programs. http://
www.apa.org/ed/accreditation/about/policies/doctoral.aspx (accessed January 20, 2015).
Barrash, J., A. Stillman, S. W. Anderson, Y. Uc, J. D. Dawson, and M. Rizzo. 2010. Prediction of driving ability with neuropsychological tests: Demographic adjustments diminish accuracy. Journal of the International Neuropsychological Society 16(4):679-686.
Benedict, R. H. 1997. Brief visuospatial memory test—revised: Professional manual. Lutz, FL:
Psychological Assessment Resources.
Benedict, R. H., D. Schretlen, L. Groninger, and J. Brandt. 1998. Hopkins Verbal Learning
Test–Revised: Normative data and analysis of inter-form and test-retest reliability. The
Clinical Neuropsychologist 12(1):43-55.
Benton, A. L., K. S. de Hamsher, N. R. Varney, and O. Spreen. 1983. Contributions to neuro-
psychological assessment: A clinical manual. New York: Oxford University Press.
Benton, L., K. de Hamsher, and A. Sivan. 1994a. Controlled oral word association test.
Multilingual Aphasia Examination 3.
Benton, A. L., K. S. de Hamsher, N. R. Varney, and O. Spreen. 1994b. Contributions to
neuropsychological assessment: A clinical manual—second edition. New York: Oxford
University Press.
Berthelson, L., S. S. Mulchan, A. P. Odland, L. J. Miller, and W. Mittenberg. 2013. False
positive diagnosis of malingering due to the use of multiple effort tests. Brain Injury
27(7-8):909-916.
Bianchini, K. J., C. W. Mathias, and K. W. Greve. 2001. Symptom validity testing: A critical
review. The Clinical Neuropsychologist 15(1):19-45.
Bigler, E. D. 2012. Symptom validity testing, effort, and neuropsychological assessment.
Journal of the International Neuropsychological Society 18(4):632-642.
Bigler, E. D. 2014. Limitations with symptom validity, performance validity, and effort tests.
Presentation to IOM Committee on Psychological Testing, Including Validity Testing, for
Social Security Administration, June 25, 2014, Washington, DC.
Bigler, E. D. 2015. Use of symptom validity tests and performance validity tests in disability
determinations. Paper commissioned by the IOM Committee on Psychological Testing,
Including Validity Testing, for Social Security Administration Disability Determinations.
http://www.iom.edu/psychtestingpaperEB (accessed April 9, 2015).
Bilder, R. M., C. A. Sugar, and G. S. Hellemann. 2014. Cumulative false positive rates
given multiple performance validity tests: Commentary on Davis and Millis (2014) and
Larrabee (2014). The Clinical Neuropsychologist 28(8):1212-1223.
Binder, L. M. 1993. Portland Digit Recognition Test manual—second edition. Portland, OR:
Private Publication.
Binder, L. M., and S. C. Willis. 1991. Assessment of motivation after financially compensable
minor head trauma. Psychological Assessment 3(2):175-181.
Binder, L. M., M. R. Villanueva, D. Howieson, and R. T. Moore. 1993. The Rey AVLT recog-
nition memory task measures motivational impairment after mild head trauma. Archives
of Clinical Neuropsychology 8:137-147.
Binder, L. M., G. L. Iverson, and B. L. Brooks. 2009. To err is human: “Abnormal” neuropsychological scores and variability are common in healthy adults. Archives of Clinical Neuropsychology 24:31-46.
Boone, K. B. 2007. Assessment of feigned cognitive impairment: A neuropsychological per-
spective. New York: Guilford Press.
Boone, K. B. 2009. The need for continuous and comprehensive sampling of effort/response
bias during neuropsychological examinations. The Clinical Neuropsychologist
23(4):729-741.
Boone, K. B. 2014. Selection and use of multiple performance validity tests (PVTs). Presentation
to IOM Committee on Psychological Testing, Including Validity Testing, for Social
Security Administration, June 25, 2014, Washington, DC.
Boone, K. B., and P. Lu. 2007. Non-forced-choice effort measures. In Assessment of malingered
neurocognitive deficits, edited by G. J. Larrabee. New York: Oxford University Press.
Pp. 27-43.
Boone, K. B., P. Lu, C. Back, C. King, A. Lee, L. Philpott, E. Shamieh, and K. Warner-Chacon.
2002a. Sensitivity and specificity of the Rey Dot Counting Test in patients with suspect
effort and various clinical samples. Archives of Clinical Neuropsychology 17(7):625-642.
Boone, K. B., P. H. Lu, and D. Herzberg. 2002b. The B Test manual. Los Angeles: Western
Psychological Services.
Boone, K. B., P. Lu, and J. Wen. 2005. Comparison of various RAVLT scores in the detection
of non-credible memory performance. Archives of Clinical Neuropsychology 20:301-319.
Brandt, J., and R. H. Benedict. 2001. Hopkins Verbal Learning Test, Revised: Professional
manual. Lutz, FL: Psychological Assessment Resources.
Brandt, J., and W. van Gorp. 1999. American Academy of Clinical Neuropsychology policy
on the use of non-doctoral-level personnel in conducting clinical neuropsychological
evaluations. The Clinical Neuropsychologist 13(4):385.
Busch, R. M., G. J. Chelune, and Y. Suchy. 2006. Using norms in neuropsychological assess-
ment of the elderly. In Geriatric neuropsychology: Assessment and intervention, edited
by D. K. Attix and K. A. Welsh-Bohmer. New York: Guilford Press.
Bush, S. S., R. M. Ruff, A. I. Tröster, J. T. Barth, S. P. Koffler, N. H. Pliskin, C. R. Reynolds,
and C. H. Silver. 2005. Symptom validity assessment: Practice issues and medical
necessity. NAN policy & planning committee. Archives of Clinical Neuropsychology
20(4):419-426.
Carone, D. A. 2008. Children with moderate/severe brain damage/dysfunction outperform
adults with mild-to-no brain damage on the Medical Symptom Validity Test. Brain Injury
22(12):960-971.
Carrow-Woolfolk, E. 1999. CASL: Comprehensive Assessment of Spoken Language. Circle
Pines, MN: American Guidance Services.
Gervais, R. O., M. L. Rohling, P. Green, and W. Ford. 2004. A comparison of WMT, CARB,
and TOMM failure rates in non-head injury disability claimants. Archives of Clinical
Neuropsychology 19(4):475-487.
Goodglass, H., and E. Kaplan. 1983. Boston diagnostic aphasia examination. Philadelphia:
Lea & Febiger.
Graue, L. O., D. T. Berry, J. A. Clark, M. J. Sollman, M. Cardi, J. Hopkins, and D. Werline.
2007. Identification of feigned mental retardation using the new generation of malingering
detection instruments: Preliminary findings. The Clinical Neuropsychologist 21(6):929-942.
Green, P. 2004. Green’s Memory Complaints Inventory (MCI). Edmonton, Alberta, Canada:
Green’s.
Green, P. 2005. Green’s Word Memory Test for Windows: User’s manual. Edmonton, Alberta,
Canada: Green’s.
Green, P. 2008. Manual for Nonverbal Medical Symptom Validity Test. Edmonton, Alberta,
Canada: Green’s.
Green, P., and L. Flaro. 2003. Word Memory Test performance in children. Child
Neuropsychology 9(3):189-207.
Green, P., L. Allen, and K. Astner. 1996. The Word Memory Test: A user’s guide to the oral
and computer-administered forms, U.S. version 1.1. Durham, NC: CogniSyst.
Greiffenstein, M. F., W. J. Baker, and T. Gola. 1994. Validation of malingered amnesia mea-
sures with a large clinical sample. Psychological Assessment 6(3):218-224.
Greiffenstein, M., R. Gervais, W. J. Baker, L. Artiola, and H. Smith. 2013. Symptom validity
testing in medically unexplained pain: A chronic regional pain syndrome type 1 case
series. The Clinical Neuropsychologist 27(1):138-147.
Greve, K. W., and K. J. Bianchini. 2004. Setting empirical cutoffs on psychometric indica-
tors of negative response bias: A methodological commentary with recommendations.
Archives of Clinical Neuropsychology 19(4):533-541.
Griffin, G. A., J. Normington, R. May, and D. Glassmire. 1996. Assessing dissimulation among
Social Security disability income claimants. Journal of Consulting Clinical Psychology
64(6):1425-1430.
Gronwall, D. 1977. Paced auditory serial-addition task: A measure of recovery from concus-
sion. Perceptual and Motor Skills 44(2):367-373.
Grote, L. G., and J. N. Hook. 2007. Forced-choice recognition tests of malingering. In
Assessment of malingered neurocognitive deficits, edited by G. J. Larrabee. New York:
Oxford University Press. Pp. 27-43.
Groth-Marnat, G. 2009. Handbook of psychological assessment. Hoboken, NJ: John Wiley
& Sons.
Hammill, D. D., and S. C. Larsen. 2009. Test of written language: Examiner’s manual. 4th
ed. Austin, TX: Pro-Ed.
Hampson, N. E., S. Kemp, A. K. Coughlan, C. J. Moulin, and B. B. Bhakta. 2013. Effort test
performance in clinical acute brain injury, community brain injury, and epilepsy popula-
tions. Applied Neuropsychology: Adult (ahead-of-print):1-12.
Heaton, R. K. 1993. Wisconsin Card Sorting Test: Computer version 2. Odessa, FL:
Psychological Assessment Resources.
Heaton, R. K., H. H. Smith, R. A. Lehman, and A. T. Vogt. 1978. Prospects for faking
believable deficits on neuropsychological testing. Journal of Consulting and Clinical
Psychology 46(5):892.
Heaton, R. K., I. Grant, and C. G. Matthews. 1991. Comprehensive norms for an expanded
Halstead-Reitan Battery: Demographic corrections, research findings, and clinical ap-
plications. Odessa, FL: Psychological Assessment Resources.
Heaton, R. K., M. Taylor, and J. Manly. 2001. Demographic effects and demographically cor-
rected norms with the WAIS-III and WMS-III. In Clinical interpretation of the WAIS-III
and WMS-III, edited by D. Tulsky, R. K. Heaton, G. J. Chelune, I. Ivnik, R. A. Bornstein,
A. Prifitera, and M. Ledbetter. San Diego, CA: Academic Press. Pp. 181-210.
Heilbronner, R. L., J. J. Sweet, J. E. Morgan, G. J. Larrabee, S. R. Millis, and Conference
Participants. 2009. American Academy of Clinical Neuropsychology consensus confer-
ence statement on the neuropsychological assessment of effort, response bias, and ma-
lingering. The Clinical Neuropsychologist 23(7):1093-1129.
Higginson, C. I., K. Lanni, K. A. Sigvardt, and E. A. Disbrow. 2013. The contribution of
trail making to the prediction of performance-based instrumental activities of daily
living in Parkinson’s disease without dementia. Journal of Clinical and Experimental
Neuropsychology 35(5):530-539.
Hiscock, M., and C. K. Hiscock. 1989. Refining the forced-choice method for the de-
tection of malingering. Journal of Clinical and Experimental Neuropsychology
11(6):967-974.
HNS (Houston Neuropsychological Society). 2003. The Houston Conference on Specialty
Education and Training in Clinical Neuropsychology policy statement. http://www.
uh.edu/hns/hc.html (accessed November 25, 2014).
Holdnack, J. A., and L. W. Drozdick. 2009. Advanced clinical solutions for WAIS-IV and
WMS-IV: Clinical and interpretive manual. San Antonio, TX: Pearson.
Hurley, K. E., and W. P. Deal. 2006. Assessment instruments measuring malingering used
with individuals who have mental retardation: Potential problems and issues. Mental
Retardation 44(2):112-119.
Iverson, G. L., and M. D. Franzen. 1996. Using multiple objective memory procedures to
detect simulated malingering. Journal of Clinical and Experimental Neuropsychology
18(1):38-51.
Jelicic, M., H. Merckelbach, I. Candel, and E. Geraets. 2007. Detection of feigned cognitive
dysfunction using special malinger tests: A simulation study in naïve and coached malin-
gerers. The International Journal of Neuroscience 117(8):1185-1192.
Johnson-Greene, D., L. Brooks, and T. Ference. 2013. Relationship between performance
validity testing, disability status, and somatic complaints in patients with fibromyalgia.
The Clinical Neuropsychologist 27(1):148-158.
Kaplan, E., H. Goodglass, and S. Weintraub. 2001. Boston Naming Test. Austin, TX: Pro-Ed.
Killgore, W. D., and L. DellaPietra. 2000. Using the WMS-III to detect malingering: Empirical
validation of the rarely missed index (RMI). Journal of Clinical and Experimental
Neuropsychology 22:761-771.
Kirkwood, M. 2014. Validity testing in pediatric populations. Presentation to IOM Committee
on Psychological Testing, Including Validity Testing, for Social Security Administration,
June 25, 2014, Washington, DC.
Kirkwood, M. W., K. O. Yeates, C. Randolph, and J. W. Kirk. 2012. The implications of
symptom validity test failure for ability-based test performance in a pediatric sample.
Psychological Assessment 24(1):36-45.
Larrabee, G. J. 2003. Detection of malingering using atypical performance patterns on stan-
dard neuropsychological tests. The Clinical Neuropsychologist 17(3):410-425.
Larrabee, G. J. 2007. Introduction: Malingering, research designs, and base rates. In
Assessment of malingered neuropsychological deficits, edited by G. J. Larrabee. New
York: Oxford University Press.
Larrabee, G. J. 2012a. Assessment of malingering. In Forensic neuropsychology: A scientific
approach, edited by G. J. Larrabee. New York: Oxford University Press.
Spreen, O., and E. Strauss. 1991. Controlled oral word association (word fluency). In A com-
pendium of neuropsychological tests, edited by O. Spreen and E. Strauss. Oxford, UK:
Oxford University Press. Pp. 219-227.
SSA (Social Security Administration). n.d.-a. Disability evaluation under social security—Part
III: Listing of impairments—Adult listings (Part A)—section 12.00 mental disorders.
http://www.ssa.gov/disability/professionals/bluebook/12.00-MentalDisorders-Adult.htm
(accessed November 14, 2014).
SSA. n.d.-b. Disability evaluation under Social Security: Part I—general information. http://
www.ssa.gov/disability/professionals/bluebook/general-info.htm (accessed November 14,
2014).
Stevens, A., K. Schneider, B. Liske, L. Hermle, H. Huber, and G. Hetzel. 2014. Is subnormal
cognitive performance in schizophrenia due to lack of effort or to cognitive impairment?
German Journal of Psychiatry 17(1):9.
Strauss, E., E. M. Sherman, and O. Spreen. 2006. A compendium of neuropsychological tests:
Administration, norms, and commentary. Oxford, UK: Oxford University Press.
Suchy, Y., G. Chelune, E. I. Franchow, and S. R. Thorgusen. 2012. Confronting patients
about insufficient effort: The impact on subsequent symptom validity and memory per-
formance. The Clinical Neuropsychologist 26(8):1296-1311.
Suhr, J. A., and D. Boyer. 1999. Use of the Wisconsin Card Sorting Test in the detection of ma-
lingering in student simulator and patient samples. Journal of Clinical and Experimental
Neuropsychology 21:701-708.
Sweet, J. J., D. G. Meyer, N. W. Nelson, and P. J. Moberg. 2011. The TCN/AACN 2010 “sal-
ary survey”: Professional practices, beliefs, and incomes of U.S. neuropsychologists. The
Clinical Neuropsychologist 25(1):12-61.
Tombaugh, T. N., and P. W. Tombaugh. 1996. Test of Memory Malingering: TOMM. North
Tonawanda, NY: Multi-Health Systems.
Trahan, D. E., and G. J. Larrabee. 1988. Continuous Visual Memory Test. Odessa, FL:
Psychological Assessment Resources.
van Gorp, W. G., L. A. Humphrey, A. Kalechstein, V. L. Brumm, W. J. McMullen, M.
Stoddard, and N. A. Pachana. 1999. How well do standard clinical neuropsychological
tests identify malingering?: A preliminary analysis. Journal of Clinical and Experimental
Neuropsychology 21(2):245-250.
Victor, T. L., and K. B. Boone. 2007. Identification of feigned mental retardation. In Assessment
of feigned cognitive impairment, edited by K. Boone. New York: Guilford Press. Pp.
310-345.
Victor, T. L., K. Boone, J. G. Serpa, J. Buehler, and E. Ziegler. 2009. Interpreting the meaning
of multiple symptom validity test failure. The Clinical Neuropsychologist 23(2):297-313.
Warrington, E. 1984. Recognition Memory Test manual. Windsor, UK: NFER-Nelson.
Wechsler, D. 1997a. Wechsler Adult Intelligence Scale (WAIS-III): Administration and scoring
manual—3rd edition. San Antonio, TX: The Psychological Corporation.
Wechsler, D. 1997b. WMS-III: Wechsler Memory Scale administration and scoring manual.
San Antonio, TX: The Psychological Corporation.
Wechsler, D. 2003. Wechsler Intelligence Scale for Children—fourth edition (WISC-IV). San
Antonio, TX: The Psychological Corporation.
Wechsler, D. 2008. Wechsler Adult Intelligence Scale—fourth edition (WAIS-IV). San Antonio,
TX: NCS Pearson.
Wechsler, D. 2009. WMS-IV: Wechsler Memory Scale—Administration and scoring manual.
San Antonio, TX: The Psychological Corporation.
WHO (World Health Organization). 2001. International classification of functioning, dis-
ability, and health (ICF). Geneva, Switzerland: WHO.
Young, G. 2014. Resource material for ethical psychological assessment of symptom and
performance validity, including malingering. Psychological Injury and Law 7(3):206-235.
Economic Considerations
seeking testing in advance of filing an application. One way SSA could estimate this is by
examining the share of applicants with intellectual disabilities who file for benefits with all
required testing in the application.
2 In some cases, tests could be administered online using computer-administered tests. These
Because most of the applicants for disability benefits live in the community rather than in an
institution, the present discussion focuses on non-facility prices.
ute rather than hourly increments. Hence the data were transformed to hourly rates for the
purpose of comparability to other codes.
SOURCE: CMS, 2015, and committee calculations.
4 The codes listed reflect a sample of codes that may be used by providers.
5 The length of an evaluation will vary depending on the purpose of the evaluation, and
more specifically, the type of psychological and/or cognitive impairments being assessed. Most
psychological and neuropsychological evaluations include (1) a clinical interview, (2) admin-
istration of standardized cognitive or non-cognitive psychological tests, and (3) professional
time for interpretation and integration of data. The relevant CPT codes for each of these pro-
cesses are generally billed in 1 hour per unit of service (the exception is 96150, which is a 15
minute/unit code). That is, an evaluation may include billing for 1 hour for clinical interview
(96116), 1 hour for administration of tests (96119), and 1 hour for interpretation and integra-
tion (96118) for a total of 3 hours of clinical service. However, a more complex case likely
will require additional hours of test administration and interpretation/integration in order to
fully answer the clinical question. In fact, the results of a national professional survey indicate
that billing for a typical neuropsychological evaluation is roughly 6 hours, with a range from
0.5 to 25 hours (Sweet et al., 2011).
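The billing arithmetic in the footnote above can be sketched as follows. The dollar rates are entirely hypothetical placeholders, not actual Medicare reimbursement amounts; only the CPT code numbers and the one-hour-per-unit structure come from the text.

```python
# Hypothetical hourly rates per CPT code (placeholders, not real prices).
hourly_rate = {"96116": 95.00, "96119": 60.00, "96118": 85.00}

# A minimal three-hour evaluation: one hour each of clinical interview
# (96116), test administration (96119), and interpretation/integration (96118).
hours_billed = {"96116": 1, "96119": 1, "96118": 1}

total = sum(hourly_rate[code] * h for code, h in hours_billed.items())
# A more complex case would add administration and interpretation hours;
# the survey cited above reports roughly 6 billed hours as typical.
```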
6 The table includes both weighted and unweighted averages. Weighted averages are ap-
propriate for considering total costs to SSA since they are weighted to reflect population dif-
ferences across counties in which the reimbursement rate holds. Unweighted averages provide
information relevant to considering cost dispersion across states. Average prices referenced in
the text reflect weighted averages.
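The distinction between the two averages can be made concrete; the county prices and population weights below are invented for illustration only.

```python
def weighted_average(prices, weights):
    """Population-weighted mean price across counties (total-cost view)."""
    return sum(p * w for p, w in zip(prices, weights)) / sum(weights)

def unweighted_average(prices):
    """Simple mean, treating every county equally (dispersion view)."""
    return sum(prices) / len(prices)

# Invented example: the populous county's price dominates the weighted
# average, while the unweighted average reflects price dispersion.
prices = [100.0, 120.0, 140.0]
weights = [1_000_000, 50_000, 50_000]
```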
working.
TABLE 6-2 Continued
SSDI Claimants         259,977   $34,831.72   $21,048    $17,229    $24,680    $25,798    $21,141    $5,586.91
Concurrent Claimants   176,617   $23,663.15   $14,299    $11,704    $16,766    $17,526    $14,362    $3,795.50
SSI Adult Claimants    106,257   $14,236.31   $8,602     $7,042     $10,087    $10,544    $8,641     $2,283.46
SSI Child Claimants        297       $39.79   $24        $20        $28        $29        $24        $6.38
Total Cost                 N/A   $72,771      $43,973    $35,994    $51,561    $53,897    $44,169    $11,672
Total Cost                 N/A   $212,140     $128,190   $104,930   $150,309   $157,118   $128,760   $34,027
NOTE: Based on 2013 application data and 2014 Medicare pricing information, geographically weighted. Values in Table 6-2 may not exactly reflect
multiplication of weighted pricing data from Table 6-1 and number of persons in column one of Table 6-2 due to rounding error.
SOURCES: CMS, 2015; SSA, 2014c,d,e; and committee calculations.
be $212 million. This cost would drop to $51 million if such testing were
only provided to applicants with mental disorders (excluding intellectual
disabilities). Similarly, costs would be lower if other forms of psychological
testing were required or if other types of service providers were used.
Importantly, the cost estimates in Table 6-2 assume that SSA will be
responsible for all the costs of psychological testing. However, as noted
previously, some applicants may acquire and include required tests as part
of the medical records presented at application. In this case, the cost to
SSA would be minimal, providing that the disability determination offices
already have sufficient personnel to adequately evaluate the test findings.
Another assumption implicit in this simple cost calculation is that the
psychological testing would be added to current DDS case development
costs. To the extent that psychological testing replaces rather than augments
existing case development modalities, the costs to SSA would be lower than
the simple estimates in the table. There are good reasons to believe that this
might be the case. Consultative exams are already a common component
of disability determinations.9 Some of these exams include psychological testing, and it might be possible to add further tests at limited additional cost.
Of course, the estimates in Table 6-2 could also understate the costs,
especially since the calculations rely on a mapping of the recommendations
to publicly available data that may insufficiently capture the true number
of individuals who could require testing. Accurately assessing the costs of
mandatory psychological testing by SSA will require more detailed informa-
tion on the parameters of implementation as well as experience in the field
once testing has begun.
9 On average, 47 percent of disability evaluations include a consultative
examination, although there is considerable variation across states (SSA,
2014a,b).
10 Improved accuracy could also decrease the number of individuals falsely denied benefits.
However, the focus of the literature has been on reducing those falsely allowed onto the
program.
NOTES: The 40 percent rate is bolded as the probable rate of malingering given in Larrabee,
Millis, and Meyers (2009). For the SSDI total, the number of disabled workers is used, remov-
ing spouse and child beneficiaries. Costs were estimated by multiplying the average disability
figure for each mental condition by the December 2011 number of individuals with that
condition, summing over all conditions, and then multiplying by 12 for the yearly estimated
amount. B = billion.
SOURCE: Chafetz and Underhill, 2013. Reproduced with permission.
TABLE 6-4 Calculation of 2011 SSI (Adult) Costs for Each Level of
Malingering of Mental Disorders
Level (%)   No. of Adults less than age 65 (total = 2,797,743)   2011 Total Cost (total = $32,067,993,684)
10 279,774 $1.799 B
20 559,549 $3.597 B
30 839,323 $5.396 B
40 1,119,097 $7.195 B
50 1,398,872 $8.994 B
60 1,678,646 $10.792 B
70 1,958,420 $12.591 B
80 2,238,194 $14.390 B
90 2,517,969 $16.189 B
NOTES: The 40 percent rate is bolded as the probable rate of malingering given in Larrabee,
Millis, and Meyers (2009). The SSI figures include the number of adults (less than age 65)
minus the children as of December 2011. Costs were estimated by multiplying the average
disability figure for each mental condition by the December 2011 number of individuals with
that condition, summing over all conditions, and then multiplying by 12 for the yearly esti-
mated amount. B = billion.
SOURCE: Chafetz and Underhill, 2013. Reproduced with permission.
and SSI beneficiaries were falsely awarded and would have been denied ben-
efits if given an SVT or PVT as part of the disability determination process.
This assumption is synonymous with the view that DDS offices currently
detect no one who exaggerates or fabricates their condition, symptoms, or
functional limitations. In other words, the Chafetz and Underhill compu-
tation assumes that under current practice 40 percent of all awardees are
given benefits even though they are not truly eligible. The extremeness of the
Chafetz and Underhill assumption suggests that the cost savings associated
with psychological testing is likely to be lower than they estimate.
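The arithmetic described in the notes to Tables 6-3 and 6-4 can be sketched as follows, simplified here to a single average monthly benefit rather than a sum over diagnostic conditions; the $536 benefit figure is a hypothetical placeholder, not the report's weighted average.

```python
# Sketch of the yearly-cost arithmetic in the Table 6-4 notes: the assumed
# malingering level times the number of adult SSI beneficiaries, times an
# average monthly benefit, times 12 months. The $536 figure below is a
# hypothetical placeholder; the report sums condition-specific averages.

def yearly_cost(total_beneficiaries: int, level: float,
                avg_monthly_benefit: float) -> float:
    """Estimated yearly payments attributable to the assumed level."""
    return total_beneficiaries * level * avg_monthly_benefit * 12

# 2011 adult SSI population from Table 6-4 at the 40 percent level.
print(f"${yearly_cost(2_797_743, 0.40, 536.0) / 1e9:.3f} B")
```

With these placeholder inputs the result lands near the $7.2 billion order of magnitude shown in the 40 percent row of Table 6-4, illustrating how the table's figures scale linearly with the assumed level.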
The other important assumption embedded in the Chafetz and Underhill
projected cost savings is that SVTs and PVTs would be retroactively ap-
plied to the population of existing beneficiaries, regardless of time on the
program.11 Should SSA choose to implement mandatory SVT and PVT
testing, it would likely do so for new applicants to the disability programs,
making the potential cost savings lower than that computed by Chafetz
and Underhill.
Finally, the Chafetz and Underhill calculation is static. The more ap-
propriate method of computing cost savings is to consider the present
discounted value of an estimated stream of potential benefit savings, which
would generate a much larger estimate.
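The present-discounted-value approach the text describes can be sketched as follows; the benefit level, assumed years on the program, and discount rate are all hypothetical placeholders, not figures from the report.

```python
# Sketch of the present-discounted-value calculation: instead of one year's
# static payment, sum the discounted stream of expected yearly benefits
# over an assumed duration on the program. All inputs below are
# hypothetical placeholders.

def pdv_savings(yearly_benefit: float, years: int,
                discount_rate: float) -> float:
    """Present discounted value of an averted stream of benefit payments."""
    return sum(yearly_benefit / (1 + discount_rate) ** t
               for t in range(years))

# Hypothetical example: $12,000 per year for 15 years at a 3 percent rate.
print(f"${pdv_savings(12_000, 15, 0.03):,.0f}")
```

Because the discounted stream accumulates over many years, the per-person figure it produces is several times the single-year benefit, which is why the static and discounted approaches diverge so much.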
The importance of altering the assumptions about improved accuracy
of disability determinations and the size of the population exposed to test-
ing can be seen in Table 6-5. Reflecting the mapping of the committee’s
recommendations for testing used in Table 6-2, cost savings are estimated
for new awardees with mental impairments other than intellectual disabili-
ties and for those with arthritis and back disorders. For completeness, the
estimates are also provided for all new beneficiaries regardless of condition,
both for all awardees and for awardees determined eligible in Step 4 or
5 of the disability determination process. The alternative estimates also
show the sensitivity of the estimated cost savings to the assumption about
the potential for mandatory SVT and PVT use to improve the accuracy of
SSA disability determinations. The 40 percent test failure rate preferred by
Chafetz and Underhill (2013) applies if the current SSA process detects zero
percent of those who exaggerate or fabricate; the 10 percent test failure rate
applies if SSA is relatively accurate, but makes some false-positive errors
that would be identified through the use of SVTs and PVTs.
Several important points emerge from the computations in the table.
First, the potential annual cost savings associated with mandatory SVT and
PVT testing is substantially reduced when it is applied to new awardees
11 Chafetz and Underhill (2013) limit the group to those with mental disorders, but even
so this assumption greatly increases the cost savings associated with greater use of testing,
because it essentially applies the 40 percent base malingering rate to all existing beneficiaries.
Average Benefit,a Diagnostic Distribution,b and Disability Applications Datac (in thousands of dollars)
Back Disorders
Concurrent 46,459 42,098 $173,628 $157,330 $43,407 $39,332
SSI Adults 32,649 29,677 $81,172 $73,783 $20,293 $18,466
SSI Children 622 244 $1,546 $607 $387 $152
All Diagnostic SSDI 399,722 233,522 $2,069,914 $1,209,267 $517,479 $302,317
Groups
Concurrent 210,812 111,331 $787,853 $416,070 $196,963 $104,017
SSI Adults 183,930 90,792 $498,182 $245,914 $124,546 $61,479
SSI Children 171,574 90,479 $464,716 $245,066 $116,179 $61,267
average benefit payments by diagnosis, so the average benefit level for all persons was used for all concurrent enrollment calculations. For SSDI
and SSI, the average benefit amount for mental disabilities (excluding intellectual disability) was calculated as a weighted average of the average
monthly benefits awarded for mental disability diagnoses (excluding intellectual disability) using diagnostic distribution data. For musculoskeletal
conditions, there are no data available specifically for back disorders and arthritis, so the average benefit for musculoskeletal disorders was used to
calculate estimated savings. SSA did not have information concerning average SSI benefits by diagnosis available separately for children and adults,
so a single weighted average was used for both groups using diagnostic and benefit distributions for all recipients under age 65.
b SSDI diagnostic distribution data are from 2012. SSI and concurrent enrolled diagnostic distribution data are from 2013.
c All disability application data are from 2013.
d Test failure rates are synonymous with what some literature refers to as malingering rates.
rather than all beneficiaries on the programs. Considering only new award-
ees with mental impairments other than intellectual disabilities, the cost
savings assuming the 40 percent malingering rate is $236 million for SSDI
and $153 million for SSI, about one-fifth of the savings reported by Chafetz
and Underhill (2013). Second, cost savings are also reduced when the
assumption about the accuracy improvements associated with symptom and
validity testing is relaxed. If SSA misses 10, rather than 40, percent of
those with exaggerated or fabricated claims, the cost savings from manda-
tory testing on new awardees with mental impairments other than intellec-
tual disabilities falls from $236 to $59 million for SSDI and from $153 to
$38 million for SSI adults. Finally, cost savings decline if testing is required
only for applicants who reach Steps 4 or 5 of the disability determination
process. Although these estimates are far from exact, they suggest that cau-
tion is warranted when projecting potential cost savings from mandatory
psychological testing.
As noted earlier, the static calculations in Table 6-5, although useful
for comparing to Chafetz and Underhill, are not appropriate for computing
the expected savings associated with implementing SVTs and PVTs in SSA’s
disability determination process. The expected program savings is more
accurately calculated as the present discounted value of the averted pay-
ment flows associated with the denied applicants captured by psychologi-
cal testing. Using the same diagnostic categories as in Table 6-5, Table 6-6
shows the present discounted value of expected savings from disallowing
an unqualified applicant from each of the three disability programs. The
table also shows the estimated program savings to SSA under the assump-
tion that psychological testing as recommended would result in the denial
of benefits to 10 percent of applicants who would otherwise receive them.
Two points emerge from the table. First, the expected cost savings as-
sociated with denying an applicant improperly allowed on the program can
be sizeable, depending on the diagnosis and program. The estimated savings
are largest for individuals with mental impairments; this reflects the earlier
age of benefit receipt and longer average time on the program. Estimated
savings are smallest for SSI recipients with arthritis and back pain, again
largely reflecting the age at which recipients enter the program. Second, the
amount of program savings that comes from implementing psychological
testing depends mostly on how many additional individuals would be iden-
tified as unqualified for benefits relative to current practice. It is important
to keep in mind that psychological testing as recommended may also result
in the awarding of benefits to some portion of applicants who otherwise
would be denied. Assuming that implementation of psychological testing
reduces the number of newly awarded beneficiaries by 10 percent, the sav-
ings per cohort, while significant, still would be less than the annual savings
estimated by Chafetz and Underhill.
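The per-cohort savings calculation implied above can be sketched as the number of newly awarded beneficiaries, times the assumed share denied under testing, times the present discounted value of the averted benefit stream per person; all inputs below are hypothetical placeholders, not Table 6-6 values.

```python
# Sketch of the per-cohort savings calculation: new awardees, times the
# assumed share denied under mandatory testing, times the averted
# present-discounted-value per denied awardee. All inputs are hypothetical
# placeholders, not the report's Table 6-6 figures.

def cohort_savings(new_awardees: int, denial_share: float,
                   pdv_per_person: float) -> float:
    """Expected savings for one cohort of applicants."""
    return new_awardees * denial_share * pdv_per_person

# Hypothetical example: 400,000 new awardees, 10 percent denied under
# testing, $100,000 averted PDV per denied awardee.
print(f"${cohort_savings(400_000, 0.10, 100_000) / 1e9:.1f} B")  # $4.0 B
```

The sensitivity of the result to the denial share is the point of the surrounding discussion: halving that share halves the savings, while awards to previously denied applicants would offset part of it.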
FINDINGS
Understanding the financial costs and benefits of using psychological
testing in the SSA disability determination process is an important, but un-
finished, task. The data necessary to make accurate calculations are limited,
and estimates based on available data are subject to considerable error. That
said, the framework for a proper computation is well understood and can
be used to guide data collection and evaluation when testing is and is not
employed.
Accurate assessments of the net financial impact of mandatory psycho-
logical testing will require information on the current accuracy of DDS deci-
sions and how the accuracy is improved, or unaffected, by the use of more
REFERENCES
Chafetz, M., and J. Underhill. 2013. Estimated costs of malingered disability. Archives of
Clinical Neuropsychology 28(7):633-639.
CMS (Centers for Medicare & Medicaid Services). 2015. Physician fee schedule search tool.
http://www.cms.gov/apps/physician-fee-schedule/search/search-criteria.aspx (accessed
January 20, 2015).
IOPC (Inter Organizational Practice Committee). 2013. Use of symptom validity indicators in
SSA psychological and neuropsychological evaluations. Letter to Senator Tom Coburn.
https://www.nanonline.org/docs/PAIC/PDFs/SSA%20and%20Symptom%20Validity%20
Tests%20-%20IOPC%20letter%20to%20Sen%20Coburn%20-%202-11-13.pdf (ac-
cessed February 8, 2015).
Larrabee, G. J., S. R. Millis, and J. E. Meyers. 2009. 40 plus or minus 10, a new magical
number: Reply to Russell. The Clinical Neuropsychologist 23(5):841-849.
Riley, G. F., and K. Rupp. 2014. Cumulative expenditures under the DI, SSI, Medicare, and
Medicaid programs for a cohort of disabled working-age adults. Health Services Research
50(2):514-536. doi: 10.1111/1475-6773.12219.
SSA (Social Security Administration). 2014a. DDS performance management report. Disability
claims data. Consultative examination rates, fiscal year 2013. Data prepared by ORDP,
ODP, and ODPMI. Submitted to the IOM Committee on Psychological Testing, Including
Validity Testing, for Social Security Administration Disability Determinations by Joanna
Firmin, Social Security Administration, on August 25, 2014.
SSA. 2014b. Disability claims data (initial, reconsideration, continuing, disability review)
by adjudicative level and body system. SSDI, SSI, concurrent, and total claims allow-
ance rates for claims with consultative examinations by U.S. states, fiscal year 2013.
Data prepared by ORDP, ODP, and ODPMI. Submitted to the IOM Committee on
Psychological Testing, Including Validity Testing, for Social Security Administration
Disability Determinations by Joanna Firmin, Social Security Administration, on
August 25, 2014.
SSA. 2014c. National data Title II-SSDI, Title XVI-SSI, & Concurrent Title II/XVI ini-
tial disability determinations by regulation basis code (reason for decision), fiscal year
2013. All cases except mental disorders (other than intellectual disability) and arthritis
and back disorders. Data prepared by SSA, ORDP, ODP, and ODPMI. Submitted to
the IOM Committee on Psychological Testing, Including Validity Testing, for Social
Security Administration Disability Determinations by Joanna Firmin, Social Security
Administration, on October 23, 2014.
SSA. 2014d. National data Title II-SSDI, Title XVI-SSI, & Concurrent Title II/XVI initial
disability determinations by regulation basis code (reason for decision), fiscal year 2013.
Arthritis and back disorders only. Data prepared by SSA, ORDP, ODP, and ODPMI.
Submitted to the IOM Committee on Psychological Testing, Including Validity Testing,
for Social Security Administration Disability Determinations by Joanna Firmin, Social
Security Administration, on October 23, 2014.
SSA. 2014e. National data Title II-SSDI, Title XVI-SSI, & Concurrent Title II/XVI initial
disability determinations by regulation basis code (reason for decision), fiscal year 2013.
Mental disorders only (excluding intellectual disability). Data prepared by SSA, ORDP,
ODP, and ODPMI. Submitted to the IOM Committee on Psychological Testing, Including
Validity Testing, for Social Security Administration Disability Determinations by Joanna
Firmin, Social Security Administration, on October 23, 2014.
SSAB (Social Security Advisory Board). 2012. Aspects of disability decision making: Data and
materials. Washington, DC: SSAB.
Sweet, J. J., D. G. Meyer, N. W. Nelson, and P. J. Moberg. 2011. The TCN/AACN 2010
“salary survey”: Professional practices, beliefs, and incomes of U.S. neuropsychologists.
The Clinical Neuropsychologist 25(1):12-61.
ECONOMIC CONSIDERATIONS
The committee concluded the following with respect to the complex
economic considerations raised by increased systematic use of standardized
psychological testing by SSA as recommended:
Over the course of the project, the committee identified two areas in
particular in which it expects that the results of further research would help
to inform disability determination processes as indicated in the following
conclusions and recommendation.
AGENDA
8:30 a.m. Opening remarks
Herbert Pardes, M.D., Committee Chair
DISCUSSION
12:45 p.m. Disability Determination Services panel discussion with the committee
Moderator—Mary C. Daly, Ph.D., Committee Member
Biographical Sketches of
Committee Members
Herbert Pardes, M.D. (Chair) is Executive Vice Chair of the Board of Trustees
of New York-Presbyterian Hospital. He formerly served as President and
Chief Executive Officer of New York-Presbyterian Hospital and the New
York-Presbyterian Healthcare System. His origins are in the field of psy-
chiatry, and he has an extensive background in health care and academic
medicine. He is nationally recognized for his broad expertise in education,
research, clinical care, and health policy, and as an ardent advocate of sup-
port for academic medicine. Dr. Pardes served as Director of the National
Institute of Mental Health (NIMH) and U.S. Assistant Surgeon General
during the Carter and Reagan administrations (1978–1984). Dr. Pardes
left NIMH in 1984 to become Chair of the Department of Psychiatry at
Columbia University’s College of Physicians and Surgeons and in 1989 was
also appointed Vice President for Health Sciences for Columbia University
and Dean of the Faculty of Medicine at the College of Physicians and
Surgeons. He served as President of the American Psychiatric Association
(1989), as Chair of the Association of American Medical Colleges (AAMC)
(1995–1996), and as Chair of the AAMC’s Council of Deans (1994–1995).
In addition, he served two terms as Chair of the New York Association
of Medical Schools. Dr. Pardes chaired the Intramural Research Program
Planning Committee of the National Institutes of Health (NIH) from 1996
to 1997, served on the Presidential Advisory Commission on Consumer
Protection and Quality in the Healthcare Industry, and is President of the
Scientific Council of the National Alliance for Research on Schizophrenia
and Depression. He serves on numerous editorial boards, has written more
than 155 articles and chapters on mental health and academic medicine
topics, and has negotiated and conducted international collaborations
with a variety of countries including India, China, and the former Soviet
Union. Dr. Pardes has earned numerous honors and awards, including the
U.S. Army Commendation Medal (1964), the Sarnat International Prize in
Mental Health (1997), election to the Institute of Medicine of the National
Academy of Sciences (1997), and election to the American Academy of Arts
and Sciences (2002). Dr. Pardes received his medical degree from the State
University of New York-Downstate Medical Center (Brooklyn) in 1960.
He received his bachelor of science degree summa cum laude from Rutgers
University in 1956. He completed his internship and residency training
in psychiatry at Kings County Hospital in Brooklyn and also did psychoanalytic
training at the New York Psychoanalytic Institute.
APPENDIX B 217
research spans public finance, labor, and welfare economics, and she has
published widely on topics related to labor market fluctuations, public
policy, income inequality, and the economic well-being of less advantaged
groups. She previously served as a visiting scholar with the Congressional
Budget Office, as a member of the Social Security Advisory Board’s Technical
Panel, and on the National Academy of Social Insurance Committee on the
Privatization of the Social Security Retirement Program. She has published on
the economics of the Social Security system. She currently serves on the edi-
torial board of the journal Industrial Relations. Dr. Daly joined the Federal
Reserve as an Economist in 1996 after completing a National Institute on
Aging postdoctoral fellowship at Northwestern University. Dr. Daly earned
a Ph.D. in Economics from Syracuse University. She joined the Institute for
the Study of Labor (IZA) as a Research Fellow in February 2014.
Naomi Lynn Gerber, M.D., is University Professor and Director of the Center
for the Study of Chronic Illness and Disability in the College of Health and
Human Services at George Mason University. She works in the areas of
measurement and treatment of impairments and disability in patients with
musculoskeletal deficits (including children with osteogenesis imperfecta;
persons with rheumatoid arthritis and cancer). Her research investigates
causes of functional loss and disability in chronic illness. Specifically, she
studies human movement and the mechanisms and treatment of fatigue.
Dr. Gerber is/has been a recipient of National Science Foundation, PNC
Foundation, National Institute on Disability and Rehabilitation Research
(NIDRR), National Institutes of Health (NIH), and Department of Defense
funding administered by the Henry Jackson Foundation. She was the Chief
of the Rehabilitation Medicine Department at the Clinical Center of NIH
in Bethesda, Maryland, from 1975 to 2005. She has been the recipient of
the Distinguished Service Award of the American Academy of Physical
Medicine and Rehabilitation (AAPMR) and the Oncology Section of
American Physical Therapy Association, the Distinguished Academician
Award of the Association of Academic Physiatrists, the WISE/Geico award,
NIH Directors Award, Surgeon General Award for Exemplary Service, and
the Smith College Medal. Dr. Gerber has served on many national com-
mittees and advisory boards including Osteogenesis Imperfecta Foundation
(1995–present), Kessler Medical Rehabilitation Research (2001–present),
National Center for Medical Rehabilitation Research (2007–2011), Blue
Ribbon Panel Assessing Rehabilitation/Research, NIH (2011–2012). She is/
has been a grant reviewer for NIDRR, NIH, National Science Foundation,
and the Department of Veterans Affairs. She served on the Board of Governors of the
AAPMR (2005–2008). Dr. Gerber is a member of the Institute of Medicine
of the National Academy of Sciences. In 2013 she delivered the Zeiter
Lecture at the AAPMR 75th anniversary. Dr. Gerber is a graduate of Tufts
University School of Medicine, diplomate of the American Board of Internal
Medicine, Rheumatology sub-specialty, and the American Board of Physical
Medicine and Rehabilitation.
Glossary
APPENDIX C 225
Reliability: the degree to which a test produces stable and consistent results
(Geisinger, 2013)
Substantial gainful activity: “work that involves doing significant and pro-
ductive physical or mental duties and is done (or intended) for pay or
profit” (20 CFR § 416.910)
Validity: the degree to which evidence and theory support the use and in-
terpretation of test scores (AERA et al., 2014)
REFERENCES
AERA (American Educational Research Association), APA (American Psychological Association),
and NCME (National Council on Measurement in Education). 2014. Standards for edu-
cational and psychological testing. Washington, DC: AERA.
American Psychiatric Association. 2013. The diagnostic and statistical manual of mental
disorders: DSM-5. Washington, DC: American Psychiatric Association.
APA (American Psychological Association). 2010. Public description of clinical neuropsychol-
ogy. http://www.apa.org/ed/graduate/specialize/neuro.aspx (accessed June 24, 2014).
APA. 2014. Public description of clinical psychology. http://www.apa.org/ed/graduate/
specialize/clinical.aspx (accessed June 24, 2014).
Bush, S. S., R. M. Ruff, A. I. Tröster, J. T. Barth, S. P. Koffler, N. H. Pliskin, C. R. Reynolds,
and C. H. Silver. 2005. Symptom validity assessment: Practice issues and medical necessity.
NAN policy & planning committee. Archives of Clinical Neuropsychology 20(4):419-426.
Furr, R. M., and V. R. Bacharach. 2013. Psychometrics: An introduction. Thousand Oaks,
CA: Sage Publications, Inc.
Geisinger, K. F. 2013. Reliability. In APA handbook of testing and assessment in psychology.
Vol. 1, edited by K. F. Geisinger (editor) and B. A. Bracken, J. F. Carlson, J. C. Hansen,
N. R. Kuncel, S. P. Reise, and M. C. Rodriguez (associate editors). Washington, DC: APA.
Heilbronner, R. L., J. J. Sweet, J. E. Morgan, G. J. Larrabee, S. R. Millis, and Conference par-
ticipants. 2009. American Academy of Clinical Neuropsychology consensus conference
statement on the neuropsychological assessment of effort, response bias, and malingering.
The Clinical Neuropsychologist 23(7):1093-1129.
Hubley, A. M., and B. D. Zumbo. 2013. Psychometric characteristics of assessment proce-
dures: An overview. In APA handbook of testing and assessment in psychology. 3 vols.
Vol. 1, edited by K. F. Geisinger. Washington, DC: American Psychological Association.
IOM (Institute of Medicine). 2007. The future of disability in America. Washington, DC: The
National Academies Press.