Iom 2015 SVT Report Full
ISBN 978-0-309-37090-5
Committee on Psychological Testing, Including Validity Testing, for Social Security Administration Disability Determinations; Board on the Health of Select Populations; Institute of Medicine
240 pages, 6 x 9, PAPERBACK (2015)
INSTITUTE OF MEDICINE
Distribution, posting, or copying of this PDF is strictly prohibited without written permission of the National Academies Press.
Unless otherwise indicated, all materials in this PDF are copyrighted by the National Academy of Sciences.
NOTICE: The project that is the subject of this report was approved by the Governing Board of the
National Research Council, whose members are drawn from the councils of the National Academy
of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of
the committee responsible for the report were chosen for their special competences and with regard
for appropriate balance.
This study was supported by Contract/Grant No. SS00-13-60048/0003 between the National
Academy of Sciences and the Social Security Administration. Any opinions, findings, conclusions,
or recommendations expressed in this publication are those of the author(s) and do not necessarily
reflect the views of the organizations or agencies that provided support for the project.
Additional copies of this report are available for sale from the National Academies Press, 500 Fifth
Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313;
http://www.nap.edu.
For more information about the Institute of Medicine, visit the IOM home page at:
www.iom.edu.
The serpent has been a symbol of long life, healing, and knowledge among almost all cultures and
religions since the beginning of recorded history. The serpent adopted as a logotype by the Institute
of Medicine is a relief carving from ancient Greece, now held by the Staatliche Museen in Berlin.
Suggested citation: IOM (Institute of Medicine). 2015. Psychological testing in the service of
disability determination. Washington, DC: The National Academies Press.
The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in
scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general
welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to
advise the federal government on scientific and technical matters. Dr. Ralph J. Cicerone is president of the National Academy of
Sciences.
The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a
parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing
with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of
Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and
recognizes the superior achievements of engineers. Dr. C. D. Mote, Jr., is president of the National Academy of Engineering.
The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent
members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts
under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal
government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Victor J. Dzau is
president of the Institute of Medicine.
The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community
of science and technology with the Academy’s purposes of furthering knowledge and advising the federal government.
Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating
agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the
government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies
and the Institute of Medicine. Dr. Ralph J. Cicerone and Dr. C. D. Mote, Jr., are chair and vice chair, respectively, of the National
Research Council.
www.national-academies.org
HERBERT PARDES (Chair), Executive Vice Chairman of the Board, New York-Presbyterian Hospital,
The University Hospital of Columbia and Cornell, New York, New York
ARTHUR J. BARSKY III, Professor of Psychiatry, Harvard Medical School, and Vice Chair for
Psychiatric Research, Brigham & Women’s Hospital, Boston, Massachusetts
MARY C. DALY, Senior Vice President and Associate Director of Economic Research, Federal Reserve
Bank of San Francisco, California
KURT F. GEISINGER, W.C. Meierhenry Distinguished University Professor of Educational Psychology
and Director, Buros Center for Testing, University of Nebraska-Lincoln
NAOMI LYNN GERBER, University Professor, Center for the Study of Chronic Illness and Disability,
George Mason University, Fairfax, Virginia
ALAN M. JETTE, Professor of Health Policy & Management, Boston University School of Public Health
JENNIFER I. KOOP, Associate Professor, Department of Neurology, Medical College of Wisconsin,
Milwaukee
LISA A. SUZUKI, Associate Professor of Applied Psychology, New York University Steinhardt School of
Culture, Education, and Human Development, New York, New York
ELIZABETH W. TWAMLEY, Associate Professor of Psychiatry, University of California, San Diego
PETER A. UBEL, Madge and Dennis T. McLawhorn University Professor of Business, Fuqua School of
Business, and Professor of Public Policy, Sanford School of Public Policy, Duke University, Durham,
North Carolina
JACQUELINE REMONDET WALL, Professor, School of Psychological Sciences, University of
Indianapolis, Indiana, and Director, Office of Program Consultation and Accreditation, American
Psychological Association, Washington, DC
Liaison to IOM Standing Committee of Medical Experts to Assist Social Security on Disability Issues
HOWARD H. GOLDMAN, Professor of Psychiatry, University of Maryland School of Medicine, Baltimore
Project Staff
CAROL MASON SPICER, Study Director
FRANK R. VALLIERE, Associate Program Officer
ALEJANDRA MARTIN, Research Associate (since January 2015)
NICOLE GORMLEY, Senior Program Assistant (since December 2014)
JONATHAN PHILLIPS, Senior Program Assistant (April to November 2014)
JON SANDERS, Program Coordinator (through January 2015)
PAMELA RAMEY-MCCRAY, Administrative Assistant
FREDERICK ERDTMANN, Director, Board on the Health of Select Populations
Reviewers
This report has been reviewed in draft form by individuals chosen for their diverse perspectives and
technical expertise, in accordance with procedures approved by the National Research Council’s Report
Review Committee. The purpose of this independent review is to provide candid and critical comments
that will assist the institution in making its published report as sound as possible and to ensure that the
report meets institutional standards for objectivity, evidence, and responsiveness to the study charge. The
review comments and draft manuscript remain confidential to protect the integrity of the deliberative
process. We wish to thank the following individuals for their review of this report:
Although the reviewers listed above have provided many constructive comments and suggestions, they
were not asked to endorse the conclusions or recommendations nor did they see the final draft of the report
before its release. The review of this report was overseen by Nancy Adler, University of California, San
Francisco, and Randy Gallistel, Rutgers University. Appointed by the National Research Council and the
Institute of Medicine, they were responsible for making certain that an independent examination of this
report was carried out in accordance with institutional procedures and that all review comments were
carefully considered. Responsibility for the final content of this report rests entirely with the authoring
committee and the institution.
Preface
The U.S. Social Security Administration (SSA) disability programs provide important, sometimes
vital, benefits to millions of adults and children annually in the United States. The programs are an
expression of the nation’s principle of caring for individuals who need support from the larger community.
Within the confines of SSA policy, the state Disability Determination Services (DDS) agencies, which
implement the policy, have the latitude to do so in whatever way they deem fit. It is not surprising that in a
country as diverse as the United States we would find geographic variations in the style and methods with
which that process is undertaken.
One element of such variation is the use or not of standardized psychological tests during the
disability determination process, other than the use of intelligence tests in determinations of intellectual
disability in children and adults. In this context, SSA asked the Institute of Medicine (IOM) to review
selected psychological tests and to evaluate the value of and provide guidance on the use of psychological
testing in SSA disability determinations.
SSA and the DDS agencies have the critical task of determining which applicants qualify for
disability benefits, a task complicated by the lack of direct correlation between the presence of an
impairment and disability, which SSA defines as the inability to work. DDS examiners undertake the very
complex task of reviewing and developing applicants’ files to determine which requests for disability
benefits are justified. As described in the report, the committee felt that it was worth considering whether
increased systematic use of standardized psychological testing in specific circumstances would strengthen
the current process for disability determination.
The committee thanks colleagues, organizations, and agencies that were willing to share their
expertise, time, and information during the committee’s information-gathering meetings. The names of the
speakers are included in the meeting agendas provided in Appendix A. The committee is grateful to the
authors of the two commissioned papers, Erin Bigler, David Freedman, and Jennifer Manly, for the in-
depth analyses they provided. The study sponsor, SSA, gladly provided information and data and
responded to questions. We also thank Howard Goldman, chair of the IOM Standing Committee of
Medical Experts to Assist Social Security on Disability Issues, who served as a consultant to the
committee and provided valuable insight. The contributions from all of these sources informed the
committee deliberations and enhanced the quality of this report.
I want also to pay tribute to and thank the expert members of our committee. A diversity of views,
at times a difference of views, all contributed to generating a consensus about issues important to SSA and
to the country. Throughout the project, they put in an enormous amount of time and effort; contributed
their experience, knowledge, and perspective; listened to contending arguments; and ultimately generated
the recommendations in this report. It is heartening to me and the other committee members to experience
the excellence and the commitment of so many good colleagues. I trust this report will be helpful to and
well received by SSA.
Finally, the committee thanks the IOM staff members who contributed to the production of this
report, including Frederick “Rick” Erdtmann (board director), Carol Mason Spicer (study director), Frank
Valliere (associate program officer), Alejandra Martín (research associate), Nicole Gormley (senior
program assistant), Jonathan Phillips (senior program assistant), Jon Sanders (program coordinator), Julie
Wiltshire (financial associate), and other staff of the Board on the Health of Select Populations and the
IOM, who provided support. Research assistance was provided by Daniel Bearss, Rebecca Morgan, and
Catherine van der List.
Contents
SUMMARY
1 INTRODUCTION
Committee’s Approach to Its Charge
Report Organization
References
APPENDIXES
BOXES
S-1 Statement of Task
3-1 Descriptions of Tests by Four Areas of Core Mental Residual Functional Capacity
FIGURES
S-1 Components of psychological assessment
TABLES
2-1 Components of Total Variation in Allowance Rates from Level Fixed Effects OLS Regressions Models, by SSA Program Group (in percent), 1993-2008
2-2 Summary of Reported Base Rates of Malingering
2-3 Psychological Testing in Different Settings
3-1 Listings for Mental Disorders and Types of Psychological Tests
Summary
BACKGROUND
In 2012, the U.S. Social Security Administration (SSA) provided benefits to nearly 15
million disabled adults and children through two disability programs. The majority of
beneficiaries, 8.8 million, received benefits through the Social Security Disability Insurance
(SSDI) program for disabled individuals, and their dependent family members, who have worked
and contributed to the Social Security trust funds. The remaining beneficiaries (4.9 million adults
and 1.3 million children) received benefits through the Supplemental Security Income (SSI)
program, which is a means-tested program based on income and financial assets for adults aged
65 years or older and disabled adults and children.
SSA disability determinations are based on the medical evidence and all evidence
considered relevant by the examiners in an applicant’s case record. Physical or mental
impairments must be established by objective medical evidence consisting of medical signs and
laboratory findings, which may include psychological tests and other standardized test results.
SSA establishes the presence of a medically determinable impairment in individuals with mental
disorders other than intellectual disability through the use of standard diagnostic criteria, which
include symptoms and signs. Evidence for these mental impairment claims, as well as for many
other categories of claims, such as those for certain musculoskeletal and connective tissue
conditions, relies less on standard laboratory tests than does evidence for some other categories of impairment.
SSA maintains a list of criteria for specific conditions that an applicant with one or more
of those conditions must meet in order to receive disability benefits based solely on medical
criteria. SSA currently requires psychological test results, specifically intelligence test results, in
the listing criteria for intellectual disability in children and adults and in the criteria for cerebral
palsy, convulsive epilepsy, and meningomyelocele and related disorders. SSA questions the
value of purchasing psychological testing in cases involving mental disorders, other than for
intellectual disability, and it does not require testing either to establish or to assess the severity of
other mental disorders.
As noted, SSA indicates that objective medical evidence may include the results of
standardized psychological tests. Given the great variety of psychological tests, some are more
objective than others. Whether a psychological test is appropriately considered objective has
much to do with the process of scoring. For example, unstructured measures that call for open-
ended responding rely on professional judgment and interpretation in scoring; thus, such
PREPUBLICATION COPY: UNCORRECTED PROOFS
measures are considered less than objective. In contrast, standardized psychological tests and
measures, such as those discussed in the report, are structured and objectively scored. In the case
of non-cognitive self-report measures, the respondent generally answers questions regarding
typical behavior by choosing from a set of predetermined answers. With cognitive tests, the
respondent answers questions or solves problems, which usually have correct answers, as well as
he or she possibly can. Such measures generally provide a set of normative data (i.e., norms), or
scores derived from groups of people for whom the measure is designed (i.e., the designated
population) to which an individual’s responses or performance can be compared. Therefore,
standardized psychological tests and measures rely less on clinical judgment and are considered
to be more objective than those that depend on subjective scoring. Unlike measurements such as
weight or blood pressure, standardized psychological tests require the individual’s cooperation
with respect to self-report or performance on a task. The inclusion of validity testing in the test or
test battery allows for greater confidence in the test results. Standardized psychological tests that
are appropriately administered and interpreted can be considered objective evidence.
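The comparison of an individual's performance with normative data can be illustrated with a short calculation. The sketch below is not drawn from the report. It assumes a hypothetical measure whose normative sample has a mean of 100 and a standard deviation of 15, and it assumes scores in the designated population are approximately normally distributed. The function name and the example score are likewise illustrative only.

```python
from statistics import NormalDist


def compare_to_norms(score, norm_mean, norm_sd):
    """Return (z, percentile) for a score relative to normative data.

    Assumes scores in the normative group are roughly normal; the
    percentile is the share of that group expected to score at or
    below the given score.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (score - norm_mean) / norm_sd
    percentile = NormalDist().cdf(z) * 100
    return z, percentile


# Hypothetical example: a score of 85 against assumed norms (mean 100, SD 15).
z, percentile = compare_to_norms(85, norm_mean=100, norm_sd=15)
print(f"z = {z:.2f}, percentile = {percentile:.1f}")
```

A score one standard deviation below the normative mean falls at roughly the 16th percentile under these assumptions. Any interpretation of such a score still depends on whether the normative group is appropriate for the individual being tested.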
As illustrated in Figure S-1, standardized psychological testing is one component of a full
psychological assessment. Standardized psychological tests can be divided into measures of
typical behavior and tests of maximal performance. Measures of typical behavior, such as
personality, interests, values, and attitudes, may be referred to as non-cognitive measures. Tests
of maximal performance ask people to answer questions and solve problems as well as they
possibly can. Because tests of maximal performance typically involve cognitive performance,
they are often referred to as cognitive tests. It is through these two lenses—non-cognitive
measures and cognitive tests—that the committee examined psychological testing for the
purpose of disability evaluation in this report. Intelligence tests and neuropsychological tests are
examples of cognitive tests, while depression, anxiety, or personality inventories are examples of
non-cognitive measures. Cognitive tests tend to be performance-based, and non-cognitive
measures tend to be based on self-report. Validity testing is an area of psychological testing.
Performance validity tests (PVTs) provide information about an individual’s effort on tests of
maximal performance, such as cognitive tests. Symptom validity tests (SVTs) provide
information about the consistency and accuracy of an individual’s self-report of symptoms he or
she is experiencing.
There are differences of opinion on the use of validity tests and their value for work
disability evaluations. Current SSA policy precludes the purchase of validity tests as part of a
consultative examination to supplement an applicant’s medical evidence record, although
applicants and their representatives sometimes submit validity test results in support of their
claims. Professional organizations of neuropsychologists and psychologists have issued position
statements and guidance advocating for the use of validity tests in clinical and medicolegal
contexts, and several have challenged SSA’s institutional prohibition on ordering such tests. A
September 2013 report from SSA’s Office of the Inspector General concluded that although SSA
does not allow the purchase of validity tests, “medical literature, national neuropsychological
organizations, other Federal agencies, and private disability insurance providers support the use
of [validity tests] in determining disability claims.”
It is within this context that SSA asked the Institute of Medicine (IOM) to convene a
committee of relevant experts (e.g., adult and pediatric neuropsychology, psychology,
psychiatry, disability medicine, behavioral economics, and economics) to review selected
psychological tests, including SVTs and PVTs, and to evaluate the value of and provide guidance
on the use of such testing in the adjudication of claims submitted to the SSA Disability Programs
(see Box S-1 for the statement of task). In carrying out this task, the Committee on Psychological
Testing, Including Validity Testing, for Social Security Administration Disability
Determinations was asked to address several specific topics, including testing norms, the
administration of relevant tests and the qualifications for administering them, the interpretation
and reporting of test results, and economic considerations.
BOX S-1
Statement of Task
In considering its charge “to evaluate the value of psychological testing in the
adjudication of disability claims,” the committee interpreted value in terms of improved accuracy
with respect to rates of false negatives and false positives in SSA’s disability determinations and
consistency with respect to different adjudicators reaching the same determinations when
presented with the same evidence for comparable cases. As part of its information-gathering
process, the committee conducted an extensive review of the literature pertaining to the use of
psychological tests, including PVTs and SVTs, in disability determinations. The committee
supplemented its review of the literature with two public workshops to hear from
neuropsychologists with expertise in performance validity and symptom validity testing in adults
and children, the use of psychological and validity tests in culturally diverse populations, and the
use of such tests in non-SSA disability determination contexts (e.g., private disability insurance
programs, Canadian auto insurance, U.S. military disability or return-to-duty decisions, veterans’
disability compensation). The committee also heard from SSA and Disability Determination
Services representatives about the SSA disability determination process and its current policies
surrounding the use of psychological and validity testing. The committee commissioned two
papers to provide additional critical analysis in areas relevant to the committee’s work. The
committee’s work was further informed by previous IOM and National Research Council reports
focused on different aspects of the SSA disability determination process.
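The two senses of value described above, accuracy and consistency, can be made concrete with a small calculation. The sketch below is illustrative only. It assumes that the true disability status of each applicant is known, which is not the case for actual SSA determinations, and all of the data and function names are hypothetical. Accuracy is expressed as false positive and false negative rates, and consistency as the simple proportion of cases on which two adjudicators agree.

```python
def error_rates(decisions, truth):
    """Return (false_positive_rate, false_negative_rate).

    decisions[i] is True if the claim was allowed; truth[i] is True if
    the applicant is in fact disabled (assumed known for illustration).
    """
    if len(decisions) != len(truth):
        raise ValueError("inputs must have the same length")
    not_disabled = [d for d, t in zip(decisions, truth) if not t]
    disabled = [d for d, t in zip(decisions, truth) if t]
    if not not_disabled or not disabled:
        raise ValueError("need at least one case of each true status")
    # False positive: allowed although not disabled.
    fpr = sum(not_disabled) / len(not_disabled)
    # False negative: denied although disabled.
    fnr = sum(not d for d in disabled) / len(disabled)
    return fpr, fnr


def agreement(decisions_a, decisions_b):
    """Proportion of cases on which two adjudicators reach the same decision."""
    if len(decisions_a) != len(decisions_b) or not decisions_a:
        raise ValueError("inputs must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(decisions_a, decisions_b))
    return same / len(decisions_a)


# Hypothetical data for eight comparable cases.
truth = [True, True, True, False, False, False, True, False]
adjudicator_a = [True, False, True, False, True, False, True, False]
adjudicator_b = [True, True, True, False, False, False, False, False]

print(error_rates(adjudicator_a, truth))
print(agreement(adjudicator_a, adjudicator_b))
```

Raw agreement does not correct for agreement expected by chance. A chance-corrected statistic could be substituted without changing the overall framing.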
COMMITTEE’S RECOMMENDATIONS
There currently is great variability in allowance rates for both SSI and SSDI among states
that is not fully accounted for by differences in the populations of applicants. In addition, there
is great variability in the disability determination appeal rulings among administrative law judges
within and across states. Each state Disability Determination Services agency, within the
confines of SSA policy, issues its own rules regarding the tests that may be purchased as part of
a consultative examination. Aside from the use of intelligence tests as described in the listings
for intellectual disability and certain neurological impairments, SSA does not require or specify
the purchase of any type of (or individual) psychological test. SSA provides general guidance
that good psychological tests are valid and reliable and have appropriate normative data. For this
reason, there is variation among states about when and which standardized psychological tests
can be purchased, with the exception of SVTs and PVTs, which are precluded from purchase by
SSA except in rare cases such as a court order.
Although there currently are no data on the rates of false positives and false negatives in
SSA disability determinations, systematic use of standardized psychological testing for a broader
set of physical and mental impairments than is current practice is expected to improve the
accuracy and consistency of disability determinations for applicants who allege cognitive
impairment or whose allegation of functional impairment is based solely on self-report. The
results of standardized cognitive and non-cognitive psychological tests that are appropriately
administered, interpreted, and validated can provide objective evidence to help identify and
document the presence and severity of medically determinable mental impairments at Step 2 of
SSA’s disability determination process. In addition, standardized cognitive test results can
provide objective evidence to help identify and assess the severity of work-related cognitive
functional impairment relevant to disability evaluations at the listing level (Step 3) and to mental
residual functional capacity (Steps 4 and 5).
Current data on the prevalence of inconsistent reporting of symptoms or performing
below one’s capability on cognitive tests are very imprecise. In the context of SSA disability
applicants, neither scenario rules out disability, but both suggest the need for additional
assessment of the alleged impairment with the goal of making an accurate determination of
disability. When a disability claim is based primarily on an applicant’s self-report of symptoms
and self-reported statements about their intensity, persistence, and limiting effects, SSA relies on
an assessment of the consistency of the self-report with all of the evidence in the applicant’s
medical evidence record.
Although SSA’s current policy precludes the purchase of SVTs and PVTs, these tests
provide information about the validity of standardized non-cognitive and cognitive test results
when administered as part of the test or test battery and therefore are an important addition to the
medical evidence record in such cases. It is important that SVTs and PVTs only be administered
in the context of a larger test battery and only be used to interpret information from that battery.
Validity tests do not provide information about whether or not the individual is, in fact, disabled.
Standardized cognitive test results are essential to the determination of all cases in which
an applicant’s allegation of cognitive impairment is not accompanied by objective medical
evidence. The results of cognitive tests are affected by the effort put forth by the test-taker. If an
individual has not given his or her best effort in taking the test, the results will not provide an
accurate picture of the person’s neuropsychological or cognitive functioning. Performance
validity indicators, which include PVTs, analysis of internal data consistency, and other
corroborative evidence, help the evaluator to interpret the validity of an individual’s
neuropsychological or cognitive test results. For this reason, it is important to include an
assessment of performance validity when cognitive testing is administered. It also is important
that validity be assessed throughout the cognitive evaluation.
A PVT only provides information about the validity of an individual’s cognitive test
results that are obtained during the same evaluation. Evidence of invalid performance based on
PVT results pertains only to the cognitive test results obtained and does not provide information
about whether or not the individual is, in fact, disabled. A lack of validity on performance
validity testing alone is insufficient grounds for denying a disability claim. In such cases,
additional information is required to assess the applicant’s allegation of disability.
Economic Considerations
Based on its examination of the literature and dialogues with experts in a variety of areas,
including psychological and neuropsychological testing, performance validity testing and
symptom validity testing, and the disability evaluation process both within SSA and in other
arenas, the committee recognizes many questions remain with regard to the use of standardized
psychological testing in the disability determination process.
As part of its assessment of the use of standardized psychological tests for the disability
evaluation process, the committee was asked to discuss the costs and cost-effectiveness of
requiring a single test or a combination of tests. This report provides an initial framework for
evaluating the economic costs and highlights the types of data that will be needed to accurately
determine the financial impact of implementing the committee’s first two recommendations. The
following conclusions and recommendation relate to this enterprise.
Conclusions
Disability Determination Services decisions and how the accuracy is affected by the
increased use of standardized psychological testing.
• The absence of data on the rates of false positives and false negatives in current SSA
disability determinations precludes any assessment of their accuracy and consistency.
• There currently is great variability in allowance rates for both SSI and SSDI among
states that is not fully accounted for by differences in the populations of applicants.
There also is great variability in the disability determination appeal rulings among
administrative law judges within and across states. Although it is not possible to
know definitively whether the large share of unexplained variation in state filing,
award, and allowance rates is driven by variability in the federal disability
determination process, there is some evidence that states differ in how they manage
claims.
• In light of this unexplained variability, systematic use of standardized psychological
testing as recommended by the committee is expected to improve the accuracy and
consistency of disability determinations.
Over the course of the project, the committee identified two areas in particular in which it
expects that the results of further research would help to inform disability determination
processes as indicated in the following conclusions and recommendation.
Conclusions
Introduction
The U.S. Social Security Administration (SSA) administers two disability programs:
Social Security Disability Insurance (SSDI), for disabled individuals and their dependent family
members, who have worked and contributed to the Social Security trust funds, and Supplemental
Security Income (SSI), which is a means-tested program based on income and financial assets
for adults aged 65 years or older and disabled adults and children (SSA, 2012a). Both programs
require that claimants have a disability and meet specific medical criteria in order to qualify for
benefits.
In 2012, SSA provided benefits to nearly 15 million disabled adults and children (see
Table 1-1). The majority of beneficiaries, 8.8 million, received benefits through the SSDI
program (SSA, 2013a, Table 20). The remaining beneficiaries received benefits through the SSI
program; SSI paid benefits to 4.9 million adults and 1.3 million children (SSA, 2013b, Table 19).
Disability determinations are based on the medical evidence and all other evidence
considered relevant by the examiners in a claimant’s case record. Physical or mental impairments
must be established by objective medical evidence consisting of medical signs and laboratory
findings, which according to SSA may include psychological and other standardized test results
(20 CFR § 404.1528). The presence of an impairment requires objective findings and cannot be
based solely on a claimant’s statement of symptoms and functional limitations, although such
statements are treated as part of the overall evidence. SSA also considers the extent to which
such self-reported claims of impairment and functional limitation are consistent with the
observations by medical treating sources, and collateral observers, such as former employers,
teachers, family, or acquaintances. After reviewing all of the evidence relevant to the claim,
including medical evidence, the examiner makes a determination about what the evidence shows.
In some situations, the examiner is unable to make a determination because the evidence in the
case record is insufficient or inconsistent. In such cases, the examiner may ask the claimant to
attend a consultative examination, which SSA purchases.1
1 SSA guidelines for consultative examination reports are available (SSA, 2015).
SSA maintains a list of criteria for specific conditions that an applicant with one or more
of those conditions must meet in order to receive disability benefits based solely on medical
criteria. SSA currently requires psychological test results, specifically intelligence test results, in
the listing criteria for intellectual disability in children and adults and in the criteria for cerebral
palsy, convulsive epilepsy, and meningomyelocele and related disorders. SSA questions the
value of purchasing psychological testing in cases involving mental disorders, other than for
intellectual disability, and it does not require testing either to establish or to assess the severity of
other mental disorders.
Disability examiners are experts at assessing the consistency of all evidence and making
a determination of its validity. As described more fully later in the chapter, there are two types of
validity tests that might assist in this process. Performance validity tests (PVTs) provide
information about an individual’s effort on cognitive and other performance-based tests.
Symptom validity tests (SVTs) provide information about the accuracy of an individual’s self-
report of symptoms he or she is experiencing. Both types of validity testing have generated
controversy with respect to SSA policy.
There are differences of opinion on the use of validity tests and their value for work
disability evaluations. SSA’s current position is not to purchase validity tests to address issues of
credibility or malingering as part of a consultative examination. Although SSA does not purchase
validity tests, claimants and their representatives sometimes submit them in support of their
claims. Professional organizations of neuropsychologists and psychologists, such as the
American Academy of Clinical Neuropsychology (AACN), the National Academy of
Neuropsychology (NAN), the American Psychological Association (APA), the Association for
Scientific Advancement in Psychological Injury and Law, and the British Psychological Society,
have issued position statements and guidance advocating for the use of validity tests in clinical
and medicolegal contexts (APA, 2013; British Psychological Society, 2009; Bush et al., 2005,
2014; Heilbronner et al., 2009). Two of these organizations, the AACN and the NAN, along with
Division 40 (Neuropsychology) of the APA and the American Board of Professional
Neuropsychology, have challenged SSA’s institutional prohibition on ordering validity tests
(IOPC, 2013). In addition, a September 2013 report from SSA’s Office of the Inspector General
concluded that although SSA does not allow the purchase of validity tests, “medical literature,
² See Social Security Ruling (SSR) on the Evaluation of Symptoms in Disability Claims: Assessing the Credibility
of an Individual’s Statements (SSA, 1996).
BOX 1-1
Statement of Task
³ In the project background material, the sponsor asked the committee to consider topics such as the cost of
administering these tests, whether the cost varies by location, and the cost effectiveness (including cost per claim) of
requiring a single test or a combination of tests in the disability evaluation process for physical and mental
impairments (Revised project background, submitted by Joanna Firmin, Social Security Administration, May 23,
2014).
In considering its charge “to evaluate the value of psychological testing in the
adjudication of disability claims,” the committee interpreted value in terms of improved accuracy
with respect to rates of false negatives and false positives in SSA’s disability determinations and
consistency with respect to different adjudicators reaching the same determinations when
presented with the same evidence for comparable cases. Additional terminology that is
fundamental to the committee’s report, including the concept of disability, a variety of
psychological terms, and the concept of credibility, is described in the following sections.
Appendix C of the report contains a glossary of definitions for a number of terms that are
particularly relevant to the committee’s work.
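The two senses of value named above, accuracy (false positives and false negatives) and consistency (different adjudicators reaching the same determination on the same evidence), can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Case` record, the idea that a true disability status is known, and the six made-up cases are not taken from the report or from SSA data. Here a false positive is taken to mean allowing a claim for someone who is not disabled, and a false negative to mean denying a claim for someone who is.

```python
from dataclasses import dataclass


@dataclass
class Case:
    truly_disabled: bool  # hypothetical ground truth, rarely knowable in practice
    decision_a: bool      # True if adjudicator A allows the claim
    decision_b: bool      # True if adjudicator B allows the claim


def error_rates(cases):
    """Return (false_positive_rate, false_negative_rate) for adjudicator A."""
    not_disabled = [c for c in cases if not c.truly_disabled]
    disabled = [c for c in cases if c.truly_disabled]
    # False positive: claim allowed although the claimant is not disabled.
    fp = sum(c.decision_a for c in not_disabled) / len(not_disabled)
    # False negative: claim denied although the claimant is disabled.
    fn = sum(not c.decision_a for c in disabled) / len(disabled)
    return fp, fn


def agreement(cases):
    """Share of cases on which both adjudicators reach the same decision."""
    return sum(c.decision_a == c.decision_b for c in cases) / len(cases)


# Made-up cases for illustration only; not data from the report.
cases = [
    Case(True, True, True),
    Case(True, False, True),
    Case(False, False, False),
    Case(False, True, False),
    Case(True, True, True),
    Case(False, False, False),
]
fp_rate, fn_rate = error_rates(cases)
consistency = agreement(cases)
```

Simple percent agreement is used here only because it is the most direct reading of "reaching the same determinations"; a chance-corrected statistic could be substituted without changing the idea.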
Concept of Disability
BOX 1-2
Major Concepts in the International Classification of
Functioning, Disability, and Health
Consistent with previous disability frameworks, including those from prior IOM reports
(IOM, 1991, 1997, 2007a) and Nagi (1965, 1976), “the ICF attempts to provide a comprehensive
view of health-related states from a biological, personal, and social perspective” (IOM, 2007b, p.
37). Human functioning and disability are portrayed “as the product of a dynamic interaction
between various health conditions and environmental and personal contextual factors” (IOM,
2007b, p. 37). The ICF framework differs from previous frameworks in that its components are
described using both positive and negative terms (see Box 1-2) (IOM, 2007b, p. 37). Thus, it
refers to health and functioning as well as disability.
As in the 1991 and 1997 IOM frameworks,
the ICF identifies multiple levels of human functioning and disability: at the level
of body or body parts, at the level of the whole person, and at the level of the
whole person who is functioning in his or her environment. These levels, in turn,
involve three aspects of human functioning that the ICF terms body functions and
structures, activities, and participation. (IOM, 2007b, pp. 37–38)
Within the ICF, the term disability is used to denote decrements in all three aspects of human
functioning, which are labeled impairments, activity limitations, and participation restrictions
(IOM, 2007b, p. 38). For the purposes of SSA, disability in adults refers to the inability to work
at any job for a continuous period of 12 or more months. On this definition, disability refers to a
participation restriction, namely, an inability to participate in work-related activity. Disability in
children refers to “marked and severe functional limitations” relative to typically functioning
peers of the same age.
Noteworthy is the dynamic interaction between the different components of the ICF
model and various environmental (social and physical) and personal contextual (biological and
behavioral) factors (Figure 1-1) (IOM, 1991; WHO, 2001, p. 19). Movement between the
components is mediated by these factors and may occur in either direction—disabling or
enabling (IOM, 1991, 1997; WHO, 2001). Someone who lost a leg to disease or injury, for
example, would then have a limitation with respect to walking, but that limitation might be
reversed by the provision of a prosthetic leg. Similarly, whether an individual is disabled as a
result of his or her functional or activity limitations depends on the accommodations available to
the individual that permit the person to engage in activities he or she otherwise would be unable
to perform (IOM, 1997).
For this reason, disability is not tightly correlated with the presence of impairment. Both
need to be evaluated, but the measures are fundamentally different, including objective measures
(performance and anatomical) and self-report measures that help determine how usual roles are
disrupted. The linkages between an individual’s anatomy, diagnosis, and impairment are not
sufficient to determine the presence of work disability. As the 2007 IOM report Improving the
Social Security Disability Decision Process states with respect to work disability:
Work disability … results from the interaction of individuals’ impairments,
functional limitations resulting from the impairments, assistive technologies to
which they may have access, and attitudinal and other personal characteristics
(such as age, education, skills, and work history) with the physical and mental
requirements of potential jobs, accessibility of transportation, attitudes of family
members and coworkers, and willingness of an employer to make
accommodations. (IOM, 2007c, p. 26)
Given the complex interaction among the variety of factors that underlie a disability, it is
clear that disability determinations are multidimensional and always involve some element of
judgment (IOM, 1987). Although objective medical evidence can indicate the presence of
physical or mental impairments, the decision about whether those impairments result in a
disability is an administrative or legal one (IOM, 1987; IOM and NRC, 2007).
Psychological Terms
Psychological assessment refers to
the comprehensive integration of information from a variety of sources—
including formal psychological tests, informal tests and surveys, structured
clinical interviews, interviews with others, school and/or medical records, and
observational data—to make inferences regarding the mental or behavioral
characteristics of an individual or to predict behavior. (Furr and Bacharach, 2013;
Hubley and Zumbo, 2013)
Psychological testing refers to “the use of formal, standardized procedures for sampling behavior
that ensure objective evaluation of the test-taker regardless of who administers the test” (Furr and
Bacharach, 2013; Hubley and Zumbo, 2013).
Major categories of psychological tests include (1) intelligence tests, (2)
neuropsychological tests, (3) personality tests, (4) disorder-specific tests (e.g., depression,
anxiety), (5) achievement tests, (6) aptitude tests, and (7) occupational or interests tests. The first
four categories capture the tests that are most relevant to disability determinations. Standardized
psychological tests can be divided into measures of typical behavior and tests of maximal
performance. Measures of typical behavior, such as personality, interests, values, and attitudes,
may be referred to as non-cognitive measures. Tests of maximal performance ask people to
answer questions and solve problems as well as they possibly can. Because tests of maximal
performance typically involve cognitive performance, they are often referred to as cognitive
tests. It is through these two lenses—non-cognitive measures and cognitive tests—that the
committee examined psychological testing for the purpose of disability evaluation in this report.
Intelligence tests and neuropsychological tests are examples of cognitive-based measures, while
depression, anxiety, or personality inventories are examples of non-cognitive measures.
Psychological tests may also be categorized as performance based and self-report. Cognitive
tests tend to be performance based, and non-cognitive measures tend to be based on self-report.
A variety of validity tests have been developed to assist examiners in interpreting the
results of different psychological tests. The committee distinguishes in this report between
performance validity tests (PVTs), which provide information about an individual’s effort on
tests of maximal performance, such as cognitive tests, and symptom validity tests (SVTs), which
provide information about the consistency and accuracy of an individual’s self-report of
symptoms he or she is experiencing. PVTs are stand-alone or embedded or derived measures that
are used to assess whether an examinee is performing at a level consistent with his or her actual
abilities (Larrabee, 2014). Measures of performance validity, often referred to as “effort” in the
literature, generally are associated with neuropsychological or cognitive testing. As discussed in
Chapter 5, PVTs help the examiner to interpret the validity of an individual’s neuropsychological
or cognitive test results. If an individual has not given his or her best effort in taking the test, the
results may not provide an accurate picture of the person’s neuropsychological or cognitive
functioning. SVTs are measures embedded in non-cognitive psychological measures (e.g.,
personality, mood scales) that are used to assess whether an examinee is providing an accurate
report of their actual symptom experience (Larrabee, 2014).
The distinction between performance validity and symptom validity was first introduced
in the literature in 2012 (Larrabee, 2012). Prior to that time, the term symptom validity often
encompassed the concept of performance validity as well as the accuracy of symptom self-report.
The committee has made every effort to maintain the distinction between performance validity
and symptom validity and to use the terms consistently throughout the report. In some cases,
doing so required interpreting published literature, particularly older literature, in light of the
revised terminology. For this reason, the report, when appropriate, may refer to performance
validity when discussing a particular publication, despite the original source using the term
symptom validity.
Table 1-3 provides a summary of the psychological terms discussed in this section, and
Figure 1-2 shows the relationships among the different terms.
Credibility
n situations involving
i thee potential for
fo secondaryy gain—suchh as monetarry gain from ma
SSA disaability payment—there may m be motiv vation for inndividuals inntentionally tto feign or
exaggeraate symptoms or to exert suboptimal effort on peerformance m measures in oorder to pressent a
stronger need for sup pport or disab bility benefiits. Malingerring is the inntentional preesentation off
false or exaggerated
e symptoms, intentionally
i y poor perforrmance, or a combinatioon of the twoo,
motivated by externaal incentives (American Psychiatric
P A
Association,, 2013; Bushh et al., 20055;
Heilbronnner et al., 20
009). Two keey elements of malingeri ring are intenntion to deceeive or misleead
and motivation to do so for the purpose of acchieving som me type of seecondary gaiin.
It is important to distinguish between malingering and the credibility or noncredibility of
an individual’s performance or symptom report, even in situations of potential secondary gain.
Individuals might over- or underreport symptoms or not give their best effort on cognitive-based
measures for any number of reasons. SVTs and PVTs do not in themselves provide information
about the motivations of an examinee⁴ or the reasons why his or her performance or symptom
report may appear to be noncredible. Throughout the report, the committee has avoided use of
the term malingering when discussing the results of PVTs and SVTs, opting instead to refer to
the credibility or accuracy of an individual’s performance or symptom report. The committee
intends such terms to be value-neutral with respect to the examinee, referring only to whether the
examinee exerted sufficient effort for the test results to be considered valid and to the accuracy
of the individual’s statements about the experience of symptoms.
⁴ Although below chance scores on a PVT can speak to an examinee’s intention (the individual knew the answer
and deliberately chose the wrong one), they cannot speak directly to the individual’s motivation (reason) for
intentionally choosing the wrong answer.
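The notion of a below chance score mentioned in footnote 4 can be made concrete with a short sketch. It assumes a forced-choice PVT in which every item offers two alternatives, so that pure guessing yields a correct answer with probability 0.5. The item count, the score, the 0.05 cutoff, and the function names are hypothetical choices for illustration, not values or procedures taken from the report.

```python
from math import comb


def prob_at_or_below(correct, items, p_guess=0.5):
    """P(X <= correct) if every answer were a pure guess (binomial model)."""
    return sum(
        comb(items, k) * p_guess**k * (1 - p_guess) ** (items - k)
        for k in range(correct + 1)
    )


def is_below_chance(correct, items, alpha=0.05, p_guess=0.5):
    """True if a score this low would be unlikely under guessing alone.

    alpha is an illustrative cutoff assumed here, not one given in the report.
    """
    return prob_at_or_below(correct, items, p_guess) < alpha


# Hypothetical example: 18 correct out of 50 two-choice items.
p_low = prob_at_or_below(18, 50)
flag_low = is_below_chance(18, 50)
# A score at the guessing average (25 of 50) is not below chance.
flag_mid = is_below_chance(25, 50)
```

A score that falls below this threshold suggests the examinee recognized the correct answers and avoided them. As the footnote stresses, it still says nothing about why.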
Study Focus
Although the report focuses primarily on the use of psychological tests in disability
determinations in adults, the use of such tests in children is also addressed. There are three areas
in SSA’s disability determination process where psychological testing could be of value: (1)
identification of a “medically determinable impairment”; (2) evaluation of functional capacity
for work; and (3) assessment of the validity of claimants’ psychological test results or the
accuracy of statements about self-reported symptoms. Although the report addresses all three
areas, the committee focuses on the second and third, where questions about the use of
psychological tests are more complex.
In considering its task, the committee observed that the vast number (in the hundreds) of
cognitive and non-cognitive psychological tests available for use precludes a detailed analysis of
each specific test and recommendations about the use of specific tests. In addition, decisions
about which specific tests are most appropriate for particular individuals in a particular set of
circumstances properly fall in the realm of clinical decision making. Instead, the committee
reviewed categories of psychological tests, including validity tests, and this report provides
general guidance on the use of such tests in SSA disability determinations for claims involving
physical and mental disorders.
It is important to note that SSA specifically requested that the committee not address the
use of intelligence tests in making determinations about intellectual disability since that topic
was previously examined in a 2002 National Research Council (NRC) report titled Mental
Retardation: Determining Eligibility for Social Security Benefits (NRC, 2002). Consideration of
intelligence tests with respect to embedded validity measures, however, was deemed to be within
the committee’s purview.
Information-Gathering Process
The committee conducted an extensive review of the literature pertaining to the use of
psychological tests, including PVTs and SVTs, in disability determinations. The committee
began with an English-language literature search of online databases, including PubMed,
Embase, Medline, Web of Science, Scopus, PsycINFO, Government Accountability Office,
Congressional Research Service, Google, Google Scholar, and Legistorm (GAO reports,
congressional memorandums). Additional literature and other resources were identified by
committee members and project staff using traditional academic research methods and online
searches. Attention was given to consensus and position statements issued by relevant experts
and professional organizations.
The committee used a variety of sources to supplement its review of the literature. It met
in person five times and held two public workshops to hear from invited experts in areas
pertinent to the topic (see Appendix A for open session agendas and speaker lists). Speakers
included neuropsychologists with expertise in performance and symptom validity testing in
adults and children, the use of psychological and validity tests in culturally diverse populations,
and the use of such tests in non-SSA disability determination contexts (e.g., private disability
insurance programs, Canadian auto insurance, U.S. military disability or return-to-duty
decisions, veterans’ disability compensation). The committee also heard from SSA and DDS
representatives about the SSA disability determination process and its current policies
surrounding the use of psychological and validity testing.
In addition, the committee commissioned two papers to provide additional critical
analysis in areas relevant to the committee’s work. One paper addresses issues of diversity (e.g.,
in terms of culture, language, gender and gender identity, educational or socioeconomic status)
and multiculturalism in the use of psychological tests (self-report measures and performance-
based cognitive tests as well as corresponding validity tests) in making disability determinations.
The authors were asked to discuss the use of psychological tests in diverse populations in terms
of their validity, fairness, and other characteristics. They also were asked to address whether,
when, and/or how to use such measures, despite any limitations, in disability determinations for
diverse populations in the United States.
Based on its review of the literature, the presentations from invited experts on PVT and
SVT research at its open sessions, and the expertise of several of its members, the committee
understood the arguments and evidence supporting the inclusion of validity tests in psychological
and neuropsychological tests and test batteries. Because the committee found very little
published literature critiquing the use of SVTs and PVTs, it considered it important to seek more
information about potential concerns or questions pertaining to their use. To this end, it
commissioned a second paper and asked the author to address a number of questions designed to
probe any challenges or cautions about the use of validity tests for disability determinations in
different populations. The questions posed by the committee included the following:
• In whom are PVTs and SVTs useful for informing disability determinations? In what
way?
• How or in what way do the results of PVTs or SVTs correlate with assessing
functional limitations (such as limitations in a person’s ability to do basic work
activities, activities of daily living, social functioning, and concentration, persistence,
or pace) due to an impairment?
• Given the historical context in which PVTs and SVTs were developed for forensic
use in litigation settings, can they be adapted for use in disability determinations?
Discuss the transferability of PVTs and SVTs given the differences in evidence use
and decision-making between fields (legal versus mediated or negotiated).
• How should one interpret validity test scores or results in the “grey area” between
clear failures (e.g., below chance scores) and clear passes on SVTs or PVTs? How
many people fail completely versus at the margins?
• When interpreting PVT or SVT failures, particularly in the “grey zone,” are there
factors aside from malingering or intentionally poor performance that may explain the
results (e.g., stems from symptoms, fatigue, apathy)?
• How does the current norming of SVTs and PVTs affect their usefulness in a variety
of different populations (e.g., a diversity of race, ethnicity, culture, and educational or
socioeconomic status)? Are there ways to resolve or mitigate the challenges posed by
lack of norming for particular populations?
The committee’s work was further informed by previous IOM and NRC reports,
including Pain and Disability: Clinical, Behavioral, and Public Policy Perspectives (IOM,
1987); Disability in America: Toward a National Agenda for Prevention (IOM, 1991); Enabling
America: Assessing the Role of Rehabilitation Science and Engineering (IOM, 1997); PTSD
Compensation and Military Service (IOM and NRC, 2007); The Future of Disability in America
(IOM, 2007b); Improving the Social Security Disability Decision Process (IOM, 2007c); A 21st
Century System for Evaluating Veterans for Disability Benefits (IOM, 2007a); Mental
Retardation: Determining Eligibility for Social Security Benefits (NRC, 2002); and Survey
Measurement of Work Disability: Summary of a Workshop (NRC, 2000).
REPORT ORGANIZATION
Chapter 2 describes the current SSA disability determination process, focusing on areas
relevant to the use of psychological tests. It also discusses the use of psychological tests in
disability evaluations in non-SSA contexts. Chapter 3 provides an overview of psychological
tests, including the different types of tests and their use, psychometrics and norms, and the
administration of tests. Chapter 4 reviews the use of standardized psychological self-report
measures and SVTs in the context of SSA disability determinations. Chapter 5 addresses
standardized cognitive tests and the use of PVTs. Chapter 6 explores economic considerations
related to the use of psychological testing in SSA disability determinations. Chapter 7 contains
the committee’s conclusions and recommendations.
REFERENCES
American Psychiatric Association. 2013. American Psychiatric Association: Diagnostic and statistical
manual of mental disorders, fifth edition (DSM-5). Arlington, VA: American Psychiatric
Association.
APA (American Psychological Association). 2013. Specialty guidelines for forensic psychology.
American Psychologist 68(1):7-19.
British Psychological Society. 2009. Assessment of effort in clinical testing of cognitive functioning for
adults. Leicester, UK: British Psychological Society.
Bush, S. S., R. M. Ruff, A. I. Tröster, J. T. Barth, S. P. Koffler, N. H. Pliskin, C. R. Reynolds, and C. H.
Silver. 2005. Symptom validity assessment: Practice issues and medical necessity. NAN Policy &
Planning Committee. Archives of Clinical Neuropsychology 20(4):419-426.
Bush, S. S., R. L. Heilbronner, and R. M. Ruff. 2014. Psychological assessment of symptom and
performance validity, response bias, and malingering: Official position of the Association for
Scientific Advancement in Psychological Injury and Law. Psychological Injury and Law
7(3):197-205.
Furr, R. M., and V. R. Bacharach. 2013. Psychometrics: An introduction. Thousand Oaks, CA: Sage
Publications, Inc.
Heilbronner, R. L., J. J. Sweet, J. E. Morgan, G. J. Larrabee, S. R. Millis, and Conference Participants.
2009. American Academy of Clinical Neuropsychology consensus conference statement on the
neuropsychological assessment of effort, response bias, and malingering. Clinical
Neuropsychologist 23(7):1093-1129.
Hubley, A. M., and B. D. Zumbo. 2013. Psychometric characteristics of assessment procedures: An
overview. In APA handbook of testing and assessment in psychology, Volume 1— Test theory and
In 2013, the Social Security Administration (SSA) received approximately 2.6 million
applications for Social Security Disability Insurance (SSDI) disabled worker benefits (SSA,
n.d.-m), 1.6 million applications for the Supplemental Security Income (SSI) adult program (SSA,
2014a, p. 92, Table V.C.1), and 442,000 applications for the SSI child program (SSA, 2014a, p.
24, Table V.C.2). This chapter describes SSA’s process for evaluating applications and
determining the disability status of the applicants, including the use of psychological testing in
SSA disability evaluations. It also provides an overview of base rates of “malingering” and a
discussion of the benefits of formal, standardized data collection and actuarial data interpretation.
The chapter concludes with an overview of the use of psychological tests in disability
evaluations in non-SSA systems, including the U.S. military and Veterans Affairs, private
disability insurance, forensic assessments, and some international programs.
The overall disability determination process (see Figure 2-1) is the same for both SSDI
and SSI, although the specific steps of the process vary for adults (20 CFR § 416.920; see Figure
2-2) and children (20 CFR § 416.924; see Figure 2-3). For the average applicant the initial
determination process takes between 90 and 120 days from the date of filing. Decisions for
applicants who have certain medical conditions or incomplete medical records, or who appeal the
initial decision, can take far longer, in some cases stretching across several years (SSA, 2014i;
SSDRC, n.d.).
Step 1: Nonmedical Eligibility?
Applications for disability benefits are made at a local SSA field office. During the first
step of the disability determination process, officials in the SSA field offices verify applicants’
financial and other nonmedical (e.g., age, work credits) eligibility requirements (SSA, 2012a).
For SSDI and SSI applicants, the examiners first check to see if applicants are currently working
and earning more than the substantial gainful activity (SGA) amount, $1,040 per month in 2013
for non-blind applicants (SSA, 2014m). For SSI applicants, examiners also verify that applicants
meet the income and resource limits necessary to qualify for these means-tested benefits.¹ For
concurrent SSDI/SSI adult applicants, financial eligibility is checked for both programs. If
applicants fail on any of these financial criteria, the application is denied.
If an applicant meets the nonmedical eligibility requirements, the application is
forwarded to the state Disability Determination Services (DDS) agency, where a disability
examiner develops and reviews the medical and other evidence² for the claim and makes an
initial determination about disability. In 2013, state DDS offices evaluated approximately 2.8
million applications for disability benefits distributed as follows: 915,679 SSDI; 887,506
concurrent SSDI/SSI adult; 653,699 SSI adult; and 428,208 SSI child (SSA, 2014h). Before
beginning the disability evaluation, DDS examiners recheck that applicants meet the financial
and other nonmedical criteria for the disability programs. As shown in Figure 2-2, almost no
cases that reach the DDSs are rejected at this step, because the SSA field offices have already
screened the applicants on these criteria. If the financial criteria are met, the DDS agencies begin
to develop the case.
¹ For SSI child applicants the income test relates to the resources of the household.
² Types of evidence may include (1) objective medical evidence (i.e., medical signs and laboratory findings), (2)
medical history and treatment records, (3) medical source opinions and statements, (4) statements from claimant or
others, and (5) information from other sources (e.g., educational personnel, social welfare agency personnel) (SSA,
2012b).
DDS agencies follow either a traditional or a single decision-maker (SDM) model (see
Figure 2-1), depending on the state. In the traditional model, the disability examiner makes the
determination in conjunction with a DDS psychological consultant or a medical consultant (20
CFR § 404.1615). In the SDM model (20 CFR § 404.906), disability examiners have the
authority to make the initial disability determination. In most cases, the disability examiners
prepare the assessments and have the authority to approve or deny claims without obtaining the
signature of a medical or psychological consultant. The exception is denials for mental
impairments, which must be reviewed by a psychological consultant. Medical and psychological
consultants are always available to assist disability examiners in their review of claims.
The second step of the process is designed to screen out claimants whose medically
determinable impairments are not considered to be “severe”—i.e., those who are clearly able to
work at some sort of substantial gainful activity or whose impairment is expected to resolve
within 12 months. A medically determinable physical or mental impairment or combination of
impairments is considered severe “if it significantly limits an individual’s physical or mental
abilities to do basic work activities” (SSA, 1996a). The impairment also must either be expected
to result in death or have lasted (or be expected to last) for 12 continuous months. An applicant is
denied at this step if the medically determinable impairment or combination of impairments “has
no more than a minimal effect on the ability to do basic work activities” (SSA, 1996a) or does
not meet the duration criterion. In 2013, 9.5 percent of SSDI applicants, 17.8 percent of
SSDI/SSI concurrent applicants, and 7.0 percent of SSI adult applicants were denied at this step
(see Figure 2-2) (SSA, 2014h). If the applicant is found to have a severe impairment, the
disability evaluation moves to the next step.
Step 3: Meets or Equals Medical Listings?
At Step 3, applicants’ impairments are evaluated to determine whether they meet or equal
the medical criteria codified in SSA’s Listing of Impairments for adults (SSA, n.d.-c). The
Listing of Impairments is organized by major body system and contains criteria to evaluate the
severity of a listed impairment. These criteria may include assessments of work-related
functioning3 and are designed to identify individuals with impairments that are sufficiently
severe to prohibit them from engaging in any kind of “gainful activity” (SSA, n.d.-b). In some
cases, an individual has multiple impairments, none of which is, by itself, sufficiently severe to
meet the listing criteria, or an impairment that is not included in the Listing. In such cases, the
examiner considers whether the impairment or combination of impairments is medically equal to
a listed impairment. If a claimant’s impairment(s) meets or equals the listing criteria, the claim is
allowed. In 2013, 17.8 percent of SSDI applicants, 11.2 percent of SSDI/SSI concurrent
applications, and 14.1 percent of SSI adult applicants were allowed at this step of the disability
screening process (see Figure 2-2) (SSA, 2014h). All remaining claims move to the fourth step in
the evaluation process.
3
For mental disorders, functional limitations are used to assess the severity of the impairment. Paragraph B and C
criteria in the Listing of Impairments for mental disorders describe the areas of function that are considered
necessary for work (SSA, 2009).
At Step 4, applicants are assessed with respect to their mental or physical “residual
functional capacity” and the extent to which they can still perform activities related to jobs they
have held in the past 15 years. Applicants who are found to meet the demands of “past relevant
work” are denied. In 2013, 14.1 percent of SSDI applicants, 11.7 percent of SSDI/SSI concurrent
applicants, and 5.9 percent of adult SSI applicants were denied at this step of the process (see
Figure 2-2) (SSA, 2014h). Applicants who no longer are able to perform work they have done in
the past are then assessed for their ability to perform any work in the national economy (Step 5).
At Step 5, applicants’ residual functional capacity is evaluated along with the
vocational factors of age, education, and previous work experience to determine whether they
would be able to adjust to other work that exists in the national economy. Disability examiners
consider increasing age, generally beginning at age 50; years of education or specialized job or
vocational training; and transferability of skills from previous employment, along with an
individual’s residual physical and mental abilities, when determining whether the applicant could
adjust to doing some sort of work (SSA, n.d.-j). For example, a 50-year-old applicant with less
than a high school education, no skilled work experience, and a maximum sustained work
capacity limited to sedentary work could be considered disabled, while the same 50-year-old
applicant who has experience as a skilled worker could be denied. If an applicant is found unable
to perform any work in the national economy, the claim is allowed; otherwise, the claim is
denied. In 2013, 24.3 percent of SSDI applicants were denied benefits at this stage, and 25.5
percent were determined to be eligible for benefits (see Figure 2-2) (SSA, 2014h). Among
SSDI/SSI concurrent applicants, 33.8 percent were denied at Step 5, and 12.5 percent were
allowed (see Figure 2-2) (SSA, 2014h). Among SSI adult applicants, 40.1 percent were denied at
Step 5, and 13.9 percent were allowed (see Figure 2-2) (SSA, 2014h). Notably, more than 50
percent of the initial determinations made at the DDS level in 2013 were made in this final step
of the disability determination process, when medical-vocational factors are a primary
component of the determination decision.4
SSA is in the process of updating its system for making medical-vocational decisions
(SSA, n.d.-l). The medical-vocational decisions require up-to-date information about the
occupations that exist in the national economy. Through an interagency agreement with the
Bureau of Labor Statistics (BLS), SSA is working to develop an Occupational Information
System (OIS). The OIS would include data elements of interest to SSA, including data elements
that describe the mental and cognitive demands of work, on the full range of occupations
available in the national economy.
At the end of the five-step determination process, 43.7 percent of SSDI applicants, 23.8
percent of SSDI/SSI adult concurrent applicants, and 28.1 percent of SSI adult applicants in 2013
were awarded benefits during the initial determination process (SSA, 2014h).5 As described
below, applicants denied benefits during this initial evaluation process may be eligible for
appeal. As such, the allowance rates from this initial evaluation stage are lower than the final
allowance rates for all applicants.
4
The large number of cases determined on medical-vocational criteria is not unusual or unique to 2013.
5
These figures are obtained by summing the percentages shown in Figure 2-2 for denied and allowed applicants
across all stages. Applications for SSDI and SSI adult benefits may be initially denied at any point along the five-
step determination process. Applications may be allowed only at Steps 3 and 5.
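The adult sequence described above can be read as a short decision procedure. The following Python sketch is illustrative only and is not an SSA implementation. The class and field names are hypothetical, each boolean stands in for an examiner's finding, and Step 1 is assumed here to be a check for current substantial gainful activity, since that step is not described in this passage.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AdultClaim:
    # Hypothetical fields; each one summarizes a finding made by the examiner.
    working_at_sga: bool            # Step 1 (assumed): doing substantial gainful activity now
    severe_impairment: bool         # Step 2: more than a minimal effect on basic work activities
    meets_duration: bool            # Step 2: expected to result in death or last 12 continuous months
    meets_or_equals_listing: bool   # Step 3: meets or medically equals a listed impairment
    can_do_past_work: bool          # Step 4: residual functional capacity vs. past relevant work
    can_adjust_to_other_work: bool  # Step 5: capacity plus age, education, and work experience


def evaluate_adult_claim(claim: AdultClaim) -> Tuple[str, int]:
    """Return (decision, step at which the decision is reached)."""
    if claim.working_at_sga:
        return ("denied", 1)
    if not (claim.severe_impairment and claim.meets_duration):
        return ("denied", 2)
    if claim.meets_or_equals_listing:
        return ("allowed", 3)
    if claim.can_do_past_work:
        return ("denied", 4)
    if claim.can_adjust_to_other_work:
        return ("denied", 5)
    return ("allowed", 5)
```

The structure mirrors the point made in footnote 5: a claim can be denied at any step, but it can be allowed only at Step 3 or Step 5.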
The first two steps of the disability determination process are similar for children under
18 years of age and adults. As with SSDI and SSI adult applications, almost no applications are
rejected at Step 1 due to prescreening of the nonmedical eligibility requirements by the SSA field
offices. Step 2 for children involves a determination of whether the child has a medically
determinable impairment or combination of impairments that causes more than “minimal
functional limitations” rather than whether it precludes substantial gainful activity as in the adult
cases (20 CFR § 416.924). In 2013, 6.1 percent of SSI child applications were denied at Step 2
(see Figure 2-3) (SSA, 2014h). As with adults, Step 3 involves a determination of whether a
child’s medically determinable physical or mental impairment(s) meets or medically equals the
clinical criteria in SSA’s Listing of Impairments for children (SSA, n.d.-d). If so, the claim is
allowed. In 2013, 19 percent of SSI child applications were allowed at this stage (see Figure 2-3)
(SSA, 2014h).
The primary difference between disability evaluations for children and adults is in an
additional component of the evaluation at Step 3 for children whose impairments do not meet or
medically equal the listings. In these cases, the examiner considers whether the impairment
results in limitations that functionally equal the medical listings (20 CFR § 416.926a). To be
functionally equal to the listings, the impairment must result in “marked” limitations in two of
six domains of functioning or an “extreme” limitation in one of the domains.6 The six domains
considered are “(1) acquiring and using information, (2) attending and completing tasks, (3)
interacting and relating with others, (4) moving about and manipulating objects, (5) caring for
oneself, and (6) health and physical well-being” (20 CFR § 416.926a). In making the assessment,
the examiner considers all of the information in the record about the interactive and cumulative
effects of the impairments, including any that are not “severe,” on the child’s functioning during
all activities at home, at school, and in the community. The assessment is based on how
“appropriately, effectively, and independently” the child performs these activities compared to
children of the same age who do not have impairments (20 CFR § 416.926a). If the child’s
impairment functionally equals the severity of the medical listings, the application is approved.
In 2013, 21.1 percent of applications were allowed and 48.6 percent were denied at this final step
(see Figure 2-3) (SSA, 2014h).
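The functional-equivalence rule for children (a "marked" limitation in two of the six domains, or an "extreme" limitation in one) can be expressed as a small check. This is a minimal sketch under stated assumptions: the domain keys and the rating labels other than "marked" and "extreme" are hypothetical, and the ratings themselves would come from the examiner's assessment rather than from code.

```python
# The six domains named in 20 CFR § 416.926a, using hypothetical short keys.
DOMAINS = (
    "acquiring_and_using_information",
    "attending_and_completing_tasks",
    "interacting_and_relating_with_others",
    "moving_about_and_manipulating_objects",
    "caring_for_oneself",
    "health_and_physical_well_being",
)


def functionally_equals_listings(ratings: dict) -> bool:
    """ratings maps a domain key to a label such as 'none', 'marked', or 'extreme'.

    Domains missing from the mapping are treated as having no marked or
    extreme limitation (an assumption of this sketch).
    """
    marked = sum(1 for d in DOMAINS if ratings.get(d) == "marked")
    extreme = sum(1 for d in DOMAINS if ratings.get(d) == "extreme")
    return extreme >= 1 or marked >= 2
```

For example, a child rated "marked" in attending and completing tasks and in caring for oneself would functionally equal the listings under this rule, while a child rated "marked" in only one domain would not.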
The remaining steps of the disability determination process for adults, Steps 4 and 5, do
not pertain to children. Summing the allowances at Steps 2 and 3 (see Figure 2-3) brings total
allowances in the initial determination stage to 40.1 percent (SSA, 2014h). The remaining cases
were denied during the initial determination process. As with adults, denied applicants are
allowed to appeal their decision, potentially increasing the final allowance rate for the program.
6
A limitation is “marked” if it seriously interferes with the child’s ability to independently initiate, sustain, or
complete activities and is “extreme” if it very seriously interferes with the child’s ability to independently initiate,
sustain, or complete age-appropriate activities (20 CFR § 416.926a).
Medical and Other Evidence and Consultative Exams
The DDS uses the medical and other evidence in the applicants’ files in making disability
determinations. SSA recognizes different categories of evidence, including (1) objective medical
evidence; (2) narrative medical records, opinions, and statements from treating and nontreating
medical sources; (3) statements by the applicant for the file or made to medical sources or SSA
field office or DDS representatives; and (4) information from other nonmedical sources (e.g.,
educational personnel, social welfare agency personnel). More generally the categories can be
grouped as “objective medical evidence,” applicant self-reports, and third-party reports (medical
and nonmedical). According to SSA regulations, objective medical evidence refers to medical
signs7 and laboratory findings.8 Laboratory findings must be demonstrated through “medically
acceptable laboratory diagnostic techniques,” among which SSA includes psychological tests (20
CFR § 404.1528).
7
“Signs are anatomical, physiological, or psychological abnormalities which can be observed, apart from [self-
reported symptoms]. Signs must be shown by medically acceptable clinical diagnostic techniques. Psychiatric signs
are medically demonstrable phenomena that indicate specific psychological abnormalities, e.g., abnormalities of
behavior, mood, thought, memory, orientation, development, or perception. They must also be shown by observable
facts that can be medically described and evaluated” (20 CFR § 404.1528).
8
“Laboratory findings are anatomical, physiological, or psychological phenomena which can be shown by the use of
medically acceptable laboratory diagnostic techniques. Some of these diagnostic techniques include chemical tests,
electrophysiological studies (electrocardiogram, electroencephalogram, etc.), roentgenological studies (X-rays), and
psychological tests” (20 CFR § 404.1528).
SSA’s use of the term objective medical evidence to refer to observable medical signs and
laboratory or test results implies that the other types of evidence are “subjective” and therefore,
perhaps, less reliable, which creates a tension among the different types of evidence that SSA
considers. This may arise particularly for categories of claims in which impairments are
established and assessed primarily on reports of signs and symptoms of impairment and
functional limitation (e.g., mental impairments other than intellectual disability, certain
musculoskeletal conditions). It is important to note, as discussed in Chapter 4, that self-report
measures can be valid assessment tools. In addition, SSA considers the consistency of all the
evidence in a record to establish confidence in the validity of the claim of impairment and
functional limitation.
If the information is insufficient to make a determination, the examiner generally tries to
obtain additional information from the applicant’s medical sources and, in some cases, other
sources. Medical reports should include the applicant’s medical history, clinical and laboratory
findings, diagnosis, and prescribed treatment, including the applicant’s response and prognosis.
In addition, the report should include a statement about what the applicant can still do, including,
for adults, the physical and/or cognitive ability to perform work-related activities. For children,
the statement should discuss the child’s functional limitations relative to other children of the
same age (SSA, n.d.-a).
If the information requested from the applicant’s treating and other sources is unavailable
or remains insufficient (e.g., lacking in necessary detail or conflicting, inconsistent, or
ambiguous) to make a determination, the DDS may arrange for a consultative examination (CE)
to obtain additional information needed to evaluate the claim (20 CFR § 404.1519a). In 2013,
45.1 percent of disability applicants received a CE as part of the initial disability determination
process (SSA, 2014d). CEs were more commonly acquired for SSI and concurrent SSDI/SSI
adult applicants than for SSDI applicants (SSA, 2014d). The minimum requirements for CE
reports for mental disorders in adults and children can be found in the SSA’s consultative
examination guide for health professionals (SSA, n.d.-k). (See also SSA [2014e] for adults and
SSA [2012c] for children.)
Appeals Process
If the DDS denies an application, the applicant can appeal the decision in turn to (1) the
DDS (reconsideration), (2) an administrative law judge (ALJ), (3) the Appeals Council, and (4) a
federal court.9 Data on the number of applicants who appeal their decision at each stage are
available from SSA. Because it takes time for denied applicants to move through the various
stages of the appeal process, data are available only through 2010. The data show that
approximately 55 percent of those who applied for SSDI or concurrent worker benefits in 2010
and were denied during the initial evaluation appealed the decision (calculation based on data
from the 2013 Annual Statistical Report on the SSDI program, Tables 61 and 62 [SSA, 2014b]).10
The rates of appeal were slightly lower for denied SSI applicants. Approximately 45 percent of
2010 SSI adult applicants and 30 percent of 2010 SSI child applicants who were rejected in the
initial determination process appealed their decisions (calculations based on data from the 2013
Annual Statistical Report on the SSI program, Tables 70 and 71 [SSA, 2014k]).
9
A 10-state pilot program begun in 1999 permits a claimant to bypass reconsideration by the DDS and submit the
appeal directly to an ALJ.
10
This figure includes concurrent SSDI/SSI applicants.
The first level of appeal, which takes place within the DDS, is a reconsideration of the
original claim or, for SSI, a review of an initial determination. Reconsideration involves a
complete review of the initial claim by an examiner and, where applicable, a medical consultant
who did not participate in the original evaluation. DDSs are reported to approve about 5 percent
of reconsideration claims (Morton, 2014).
If the reconsideration is denied, the next level of appeal is a hearing before an ALJ. ALJs
are employed by SSA and, on appeal, review the evidence in an applicant’s file, including any
new evidence submitted by the applicant. The ALJ also may interview the applicant and any
witnesses brought by the applicant, as well as relevant medical or psychological consultants,
other health care providers, or vocational experts. The applicant or a representative also may
question any of the other witnesses. After considering all of the evidence and testimony, the ALJ
issues a written decision (SSA, n.d.-i). If the ALJ finds that additional evidence is needed, he or
she may order a CE or otherwise seek further development of the case file (SSA, 2012f).
Reportedly about 67 percent of the claims reviewed by ALJs overall are approved, although the
approval rate varies among ALJs and can be much higher (Morton, 2014; SSA, 2015).
Claims that are denied at the ALJ level may be brought to the Appeals Council, which
serves as the final level of appeal within SSA. The Appeals Council’s role is to determine whether
the ALJ made the correct decision. The Appeals Council considers each case and either
dismisses the request for review, if it agrees with the ALJ’s decision; sends it for review by
another ALJ, if it finds a technical or procedural error with the ALJ’s decision; or decides the
case itself and grants benefits to the applicant (Laurence, 2015; SSA, 2014h, n.d.-h). About 22
percent of requests for review are returned for re-review by an ALJ, and in 2 to 3 percent of
requests the Appeals Council overturns the ALJ’s decision, resulting in a favorable decision
(Disability Benefits Center, 2014; Laurence, 2015). In fiscal year 2013, the Appeals Council received more
than 172,000 new requests for review. The council processed more than 176,000 requests that
year but still finished the year with a backlog of more than 157,000 pending. The processing time
averaged 364 days (SSA, n.d.-h).
If the Appeals Council dismisses or does not reverse an unfavorable decision by the ALJ,
the applicant may contest SSA’s final decision by filing a civil suit in U.S. district court (SSA,
n.d.-g). In fiscal year 2013, more than 18,700 new cases were filed (SSA, n.d.-g). The federal
judge agrees with or overturns the decision of the ALJ and the Appeals Council, thereby denying
or awarding benefits, or sends the case back for re-review by the ALJ.
Returning to data for 2010, by the end of all stages of the appeal process, 53 percent of
SSDI or concurrent worker applicants who appealed their initial denial ultimately received an
award (calculation based on data from the 2013 Annual Statistical Report on the SSDI program,
Tables 62 and 63 [SSA, 2014b]). The rates are lower for SSI applicants: 40 percent of SSI adult
applicants and 27 percent of child applicants in 2010 were ultimately awarded benefits after
appeal (calculations based on data from the 2013 Annual Statistical Report on the SSI program,
Tables 71 and 72 [SSA, 2014k]).
The final award rate, which includes initial and appealed decisions, varies across
disability programs but is always higher than the initial award rates given in Figures 2-2 and 2-3.
Based on data for applicants who filed for benefits in 2010, final award rates for disability
benefit applicants are around 55 percent for SSDI workers, including concurrent applicants; 40
percent for SSI adult applicants; and 45 percent for SSI child applicants (SSA, 2014b, Tables 61,
62, 63; 2014k, Tables 70, 71, 72).11
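The appeal figures quoted above combine in a simple way: multiplying the share of initially denied applicants who appeal by the share of appellants who ultimately receive an award gives the share of initial denials that end in an award. The sketch below works through that arithmetic with the 2010 figures from the text. The function names are hypothetical, and `final_award_rate` takes the initial allowance rate as an input because this passage does not report that rate for the 2010 cohort.

```python
# (appeal rate among initially denied applicants,
#  award rate among those who appealed), 2010 cohort, as quoted in the text.
APPEALS_2010 = {
    "SSDI or concurrent": (0.55, 0.53),
    "SSI adult": (0.45, 0.40),
    "SSI child": (0.30, 0.27),
}


def awarded_on_appeal_share(appeal_rate: float, award_rate_if_appealed: float) -> float:
    """Share of initially denied applicants who ultimately receive an award."""
    return appeal_rate * award_rate_if_appealed


def final_award_rate(initial_rate: float, appeal_rate: float,
                     award_rate_if_appealed: float) -> float:
    """Initial allowances plus initially denied applicants who win on appeal."""
    denied = 1.0 - initial_rate
    return initial_rate + denied * awarded_on_appeal_share(appeal_rate, award_rate_if_appealed)


if __name__ == "__main__":
    for group, (appeal, award) in APPEALS_2010.items():
        share = awarded_on_appeal_share(appeal, award)
        print(f"{group}: {share:.3f} of initial denials end in an award")
```

For SSDI and concurrent applicants the product is 0.55 × 0.53, or roughly 29 percent of initial denials. This is one reason the final award rate must exceed the initial rate whenever any appeals succeed.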
Although state DDS offices and SSA follow the same disability determination and
appeals process, award rates vary significantly by state, reflecting variation in both filing rates
(applications per eligible population) (see Figure 2-4) and allowance rates (allowances per DDS
determinations) (see Figure 2-5). Variation in these rates stems, in part, from factors outside of
the direct control of DDS offices or SSA. Such factors include state-level differences in
population characteristics, such as age, education, and impairment type, as well as differences in
local labor market conditions, such as the unemployment rate or mix of jobs available for
workers with different skills.12
Several studies have attempted to quantify the degree to which state variation in
application, allowance, and award rates is explained by these factors. In general the results
suggest that observable state and individual characteristics account for half or more of the total
variation. For example, Strand (2002) finds that controlling for state-level observables and year
effects reduced variation in state-level allowance rates (1997–1999) by half. Soss and Keiser
(2006) find similar reductions in variation for SSDI and SSI application rates.
Rupp (2012) decomposes overall cross-state variation in allowance rates for the 1993–
2008 period and attributes it to one of four sources: (1) time-varying independent variables
(unemployment rate and demographic and diagnostic criteria); (2) year fixed effects that capture
national changes in economic conditions or policies affecting disability programs; (3) state fixed
effects that capture unobservable, long-term differences across states that may or may not be
related to DDS management; and (4) residual unexplained that captures the remaining variation
not associated with any of the model variables (Table 2-1).
11
In 2010, there were still applications pending final approval. Allowance rates for earlier years with smaller
numbers of pending decisions were slightly higher than those referenced here for 2010.
12
A long literature has documented the relationship between local labor market conditions, generally measured by
the unemployment rate, and applications and awards for disability benefits. In general the results show that poor
economic conditions/higher unemployment rates are associated with increased applications and awards for benefits
(Autor and Duggan, 2003; Black et al., 2002; Burkhauser et al., 2002; Duggan and Imberman, 2008; Kreider, 1999;
Rupp and Stapleton, 1995). Research on allowance rates and economic conditions (Rupp, 2012; Rupp and Stapleton,
1995; Strand, 2002) generally finds a negative relationship suggesting that SSA is able to screen out some
marginally qualified candidates who might apply for the program in response to poor economic conditions.
TABLE 2-1 Components of Total Variation in Allowance Rates from Level Fixed-Effects OLS
Regression Models, by SSA Program Group (in percent), 1993–2008

                                                     Adult Program Group
Component of Variation a                     SSDI Only   SSI Only   Concurrent   SSI Child
State fixed effects                              52         41          46           50
Year fixed effects                               14         16           9           29
Time-varying independent variables               10         17          18            6
  (unemployment rate and demographic and
  diagnostic characteristics of applicants)
Unexplained b                                    24         25          27           16
Total                                           100        100         100          100

NOTES: A total of 12 regressions were estimated: three models for each of the four program
groups. For each program group, independent variables were included in a sequential manner. The
first model included only state fixed effects. The second model added year fixed effects. The third
model added the time-varying variables. The results in this table reflect state-level OLS regression
models. Totals may not sum to 100 because of rounding.
a
The first row contains the R² from the first model for each program group. The subsequent two
rows reflect the marginal increase in the R² arising from adding the given group of independent
variables to the model. The total of the first three rows represents the R² for the third model that
included all three groups of variables.
b
The unexplained variation was calculated by subtracting the R² for the third model that included all
of the predictors from 100 percent.
SOURCES: Data are based on 1,736,554 initial disability determinations in the 50 states and the
District of Columbia for the 1993–2008 period, taken from SSA’s National Disability Determination
Services System File. State unemployment rate data are taken from the Current Population Survey.
Reprinted with permission from Rupp, 2012, Table 9.
The results show that time-varying independent variables explain a relatively small share
of the state variation in allowance rates: about 10 percent for SSDI allowance rates and about 20
percent of variation in adult SSI and concurrent SSDI/SSI claims. Only 6 percent of the total
variation in SSI child allowance rates is accounted for by the time-varying independent variables
included in Rupp’s model. Year fixed effects account for an additional small share of the variation in
adult allowance rates (SSDI and SSI) but nearly 30 percent of the variation in SSI child
allowances. Notably, between 40 and 50 percent of the overall variation in allowance rates across
states is explained by long-term, unobservable state-specific differences. Combining these
numbers with the amount unexplained by the model, the total variation in state allowance rates
that cannot be traced back to observable variables outside of the DDS control is approximately
75 percent.
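The arithmetic behind this reading of Table 2-1 can be made explicit. The sketch below copies the percentages from the table and computes, for each program group, the R² of the full model (the sum of the first three rows, per table note a) and the share not traced to observable variables (state fixed effects plus unexplained). The dictionary keys and function names are hypothetical.

```python
# Percentages copied from Table 2-1 (Rupp, 2012, Table 9).
TABLE_2_1 = {
    "SSDI only":  {"state_fe": 52, "year_fe": 14, "time_varying": 10, "unexplained": 24},
    "SSI only":   {"state_fe": 41, "year_fe": 16, "time_varying": 17, "unexplained": 25},
    "Concurrent": {"state_fe": 46, "year_fe": 9,  "time_varying": 18, "unexplained": 27},
    "SSI child":  {"state_fe": 50, "year_fe": 29, "time_varying": 6,  "unexplained": 16},
}


def full_model_r2(row: dict) -> int:
    """R-squared (in percent) of the third model: sum of the three explained components."""
    return row["state_fe"] + row["year_fe"] + row["time_varying"]


def not_traced_to_observables(row: dict) -> int:
    """State fixed effects plus unexplained variation, in percent."""
    return row["state_fe"] + row["unexplained"]


if __name__ == "__main__":
    for group, row in TABLE_2_1.items():
        print(group, full_model_r2(row), not_traced_to_observables(row))
```

With the table's rounded values, the second quantity is 76 for SSDI only and 73 for concurrent claims, in line with the roughly 75 percent cited above, and 66 for the SSI-only adult and SSI child groups.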
Although it is not possible to know definitively whether the large share of unexplained
variation in state filing, award, and allowance rates is driven by variability in the federal
disability determination process, there is some evidence that states differ in how they manage
claims. For example, there are significant differences across states in the percentage of cases
requiring a consultative exam as part of the initial determination. Recall that nationally about 45
percent of initial determinations request a consultative exam. By contrast, in low-CE states such
as Hawaii, Missouri, and Virginia about one-quarter of cases receive a CE (SSA, 2014c). In
high-CE states such as Indiana, Kentucky, and Tennessee about two-thirds of initial
determinations request a CE (SSA, 2014c). That said, since the committee could locate no study
of the variability of CE rates, this evidence is only suggestive of differences in case management
across states.
Although there are no data on the composition of impairments affecting applicants, the
data on allowed claims provide insight into the types of individuals seen at the state DDS offices.
Figure 2-6 shows the composition of new beneficiaries in 2013 for SSDI and SSI adults and
children. By far the largest two impairment categories for all three disability programs are mental
disorders (excluding intellectual disabilities) and musculoskeletal and connective tissue
disorders. In 2013, these two categories accounted for 52 percent of new SSDI awards, 53
percent of new SSI adult awards, and 58 percent of new SSI child awards. Within these two
categories, a significant fraction of the claimants have conditions, including affective mood
disorders and disorders of the back, for which the presence and severity of impairment and
associated functional limitations are based largely on applicant self-report (SSA, 2014j, l).
The large share of these two categories in the flow of new beneficiaries indicates that
DDS offices are evaluating a large number of cases that require more subjective judgment about
the functional limitations the client faces. This is supported by the large number of adult cases
that are determined on medical-vocational criteria at Steps 4 and 5 of the determination process:
more than 50 percent of the initial DDS decisions and more than 80 percent of decisions at the
hearing level (SSA, n.d.-l).
Adults who file for SSA disability on the basis of mental disorders and meet the
nonmedical eligibility criteria are evaluated at Step 2 for the presence of a medically
determinable mental impairment, the severity of the functional limitation it imposes on the
individual’s ability to work, and a determination that the impairment has lasted or will last for 12
or more continuous months (SSA, 2012d, n.d.-e). The DDS assesses the presence of a medically
determinable mental impairment on the basis of the medical evidence, including relevant signs,
symptoms, and laboratory or psychological test findings (SSA, 2012d).
The DDS assesses the severity of a medically determinable mental impairment on the
basis of the functional limitations it imposes on the claimant’s ability to engage in work-related
activities. Functional limitations are assessed in four areas that are considered essential for work: (1)
activities of daily living (ADLs); (2) social functioning; (3) concentration, persistence, or pace;
and (4) episodes of decompensation in a worklike setting—or “the ability to tolerate increased
mental demands associated with competitive work” (SSA, 2009, section B). These areas
correspond to the Paragraph B criteria,13 which are part of the listings of impairments for mental
disorders assessed at Step 3. A functional limitation is considered “marked” if it is “more than
moderate but less than extreme”; in other words, the degree of limitation “interfere[s] seriously
with [the claimant’s] ability to function independently, appropriately, effectively, and on a
sustained basis” (SSA, n.d.-e, section C).
ADLs and social functioning are evaluated within the contexts of (1) appropriateness, (2)
independence, (3) sustainability, (4) quality, and (5) effectiveness (SSA, 2009). Information
about the claimant’s ADLs and social functioning is acquired through interview, self-report,
observation, and other report. Concentration, persistence, or pace “refers to the ability to sustain
focused attention sufficiently long to permit the timely completion of tasks commonly found in
work settings” (SSA, 2009, section D). These functions may be assessed with a mental status
exam or psychological tests, but such tests represent a point in time and do not necessarily reflect
the ongoing stresses of a work environment. Clinical and test data should be supplemented by
other evidence, such as observations of performance in a work or worklike setting.
The inability to tolerate the increased demands associated with work (deterioration or
decompensation) is demonstrated by an increase in the signs or symptoms and the need for new
or additional treatment or removal from the stressful environment. Generally, to meet the criteria,
the claimant would have had at least three episodes, each lasting 2 weeks or longer, in the most
recent year.
Step 2 is the first point at which the results of cognitive and non-cognitive tests can help
inform SSA’s disability determination process. The results of such tests can help support the
identification and documentation of the presence and severity of medically determinable mental
impairments. It is important to note that an individual’s level of functioning can fluctuate over
13
Under a notice of proposed rulemaking, SSA has proposed revised Paragraph B criteria to capture “the mental
abilities an adult uses to function in a work setting” (SSA, 2010, p. 51340). The revised B criteria are the abilities to
“understand, remember, and apply information,” “interact with others”; “concentrate, persist, and maintain pace”;
and “manage oneself.”
For most of the diagnostic categories,16 adult claimants will meet a listing if the
impairment satisfies the following: (1) the diagnostic description of the mental disorder; (2)
specified medical findings—e.g., symptoms (self-report), signs (medically demonstrable),
laboratory findings (including psychological test findings)—(Paragraph A criteria); and (3)
specified “impairment-related functional limitations that are incompatible with the ability to do
any gainful activity” (Paragraph B or Paragraph C criteria) (SSA, n.d.-e). Paragraph A criteria,
in conjunction with the diagnostic description, substantiate the presence of the specific mental
disorder based on the medical evidence. Paragraph B and Paragraph C criteria list the functional
limitations resulting from the mental impairment that preclude the ability to engage in gainful
activity. Cognitive and non-cognitive test results can inform disability determinations at Step 3,
particularly with respect to Paragraph A and B criteria.
If a claimant’s impairment does not meet the diagnostic definition or the Paragraph A
criteria of a listing but does result in the functional limitations specified in the Paragraph B or C
criteria, the impairment is considered to equal the listing. Claimants whose impairments are
severe but do not meet or equal any of the listings are not approved at Step 3. They move on to
an evaluation of their residual functional capacity at Steps 4 and 5 of the determination process.
Residual functional capacity refers to the work-related capacities a claimant still possesses
despite the impairment. Assessment of residual functional capacity is another area of the
determination process that the results of psychological testing could inform.
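The Step 3 logic for most adult mental disorder listings, as described in the two preceding paragraphs, can be summarized in a few lines. This is a simplified sketch, not SSA's procedure: the function and parameter names are hypothetical, each boolean stands in for an examiner's finding, and the listings with a different structure (intellectual disability and substance addiction disorders, per footnote 16) are not covered.

```python
def step3_outcome(meets_diagnostic_description: bool,
                  meets_paragraph_a: bool,
                  meets_paragraph_b: bool,
                  meets_paragraph_c: bool) -> str:
    """Classify a severe mental impairment at Step 3 for a single listing."""
    # Paragraph B or Paragraph C supplies the required functional limitations.
    has_functional_limitations = meets_paragraph_b or meets_paragraph_c

    if meets_diagnostic_description and meets_paragraph_a and has_functional_limitations:
        return "meets listing"
    if has_functional_limitations:
        # Diagnostic description or Paragraph A not satisfied, but the
        # functional limitations of Paragraph B or C are present.
        return "equals listing"
    return "not approved at Step 3; assess residual functional capacity"
```

A claim that "meets" or "equals" a listing is allowed at Step 3. Any other severe impairment moves on to the residual functional capacity assessment at Steps 4 and 5.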
The determination process differs somewhat for children at Step 3. In addition to asking
whether the child’s impairment(s) meet or medically equal one of the listings, a second question
is posed if it does not: Does the impairment functionally equal the listings? By “functionally
equal the listings,” SSA means that “the impairment(s) must be of listing-level severity; i.e., it
must result in ‘marked’ limitations in two domains of functioning or an ‘extreme’ limitation in
one domain” (20 CFR § 416.926a). The functional limitations caused by the child’s
impairment(s) are assessed. In determining functional equivalence, SSA considers “the
interactive and cumulative effects of all of the impairments for which [it has] evidence, including
any impairments [the child has] that are not ‘severe’ (see § 416.924(c))” (20 CFR § 416.926a).
When assessing a child’s functional limitations, SSA considers “how appropriately, effectively, and
independently [the child] performs … activities compared to the performance of other children
[the same] age who do not have impairments” (20 CFR § 416.926a).
14
Under the same notice of proposed rulemaking (SSA, 2010), SSA has proposed revised listing categories.
15
Somatoform disorders are discussed separately in the following section.
16
The structure of the listing for intellectual disability and for substance addiction disorders differs from that of the
other mental disorder listings. There are four sets of criteria (Paragraphs A through D) for the intellectual disability
listing, and the listing for substance addiction disorders refers to which of the other listings should be used to
evaluate the various physical or behavioral changes related to the disorder.
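The functional equivalence rule quoted above (“marked” limitations in two domains or an “extreme” limitation in one) can be read as a simple decision rule. The following is a minimal sketch only. The rating levels below “marked,” the domain names, and the function name are illustrative assumptions and do not come from SSA materials.

```python
from enum import IntEnum


class Limitation(IntEnum):
    """Hypothetical ordered rating scale; only MARKED and EXTREME appear in the quoted rule."""
    NONE = 0
    LESS_THAN_MARKED = 1
    MARKED = 2
    EXTREME = 3


def functionally_equals_listing(domain_ratings):
    """Return True when the ratings show an 'extreme' limitation in at least
    one domain or 'marked' limitations in at least two domains."""
    ratings = list(domain_ratings.values())
    extreme_count = sum(1 for r in ratings if r == Limitation.EXTREME)
    marked_count = sum(1 for r in ratings if r == Limitation.MARKED)
    return extreme_count >= 1 or marked_count >= 2


# Usage with placeholder domain names (assumptions, not SSA's domain labels).
example = {
    "domain_a": Limitation.MARKED,
    "domain_b": Limitation.MARKED,
    "domain_c": Limitation.NONE,
}
result = functionally_equals_listing(example)
```

The sketch captures only the counting rule. It does not represent how SSA weighs “the interactive and cumulative effects” of multiple impairments when assigning each rating.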
Documentation
As previously described, the DDS uses all relevant evidence in a claimant’s file in
making a disability determination. The medical evidence in a claimant’s file must be sufficiently
complete and detailed to allow the DDS to make a determination. Medical evidence includes a
history of the individual’s mental impairment, the results of any mental status examinations and
psychological tests, and the records of any treatments and hospitalizations provided by an
“acceptable medical source” (SSA, 2014f, n.d.-e).
Although a full mental status exam, performed during a clinical interview, can be tailored
to target the specific areas most relevant to the alleged impairment, a comprehensive exam
generally would include “a narrative description of [the individual’s] appearance, behavior, and
speech; thought process (e.g., loosening of associations); thought content (e.g., delusions);
perceptual abnormalities (e.g., hallucinations); mood and affect; sensorium and cognition
(orientation, recall, concentration, intelligence); and judgment and insight” (SSA, n.d.-e, section
D4).
Psychological Testing
Psychological testing can help to confirm the presence of intellectual disability and organic mental disorders as well as
the severity of cognitive impairment. SSA states that standardized personality measures (e.g.,
Minnesota Multiphasic Personality Inventory-2) or projective testing techniques (e.g.,
Rorschach) may provide useful data for the evaluation of disability “when corroborated by other
evidence, including results from other psychological tests and information obtained in the course
of the clinical evaluation” (SSA, n.d.-e, section D7). SSA also states that “comprehensive
neuropsychological examinations may be used to establish the existence and extent of brain
function, particularly in cases involving organic mental disorders”
(SSA, n.d.-e, sections D6, D7, D8).
SSA specifies the minimum content requirements for CE reports for adults with mental
disorders (SSA, n.d.-k, Part IV, Mental Disorders). These requirements include the following:
applicants’ longitudinal, current, and past medical history; current medications; social and family
history; physical examination; and mental status evaluation.17 In addition, the report is to include
interpretation of any psychological and/or clinical test results in relation to the history and
examination findings as well as identification of the individual providing the interpretation if
different from the provider signing the CE report (SSA, n.d.-k, Part IV, Mental Disorders,
section H). The report also is to specify “a full multiaxial classification as set forth in the current
Diagnostic and Statistical Manual of Mental Disorders” and prognosis and recommendations for
treatment, if indicated (SSA, n.d.-k, Part IV, Mental Disorders, section I).
For applicants with intellectual impairments, current documentation of intelligence
quotient (IQ) is required along with interpretation of the results, including an assessment of their
validity, and consistency of the results “with the claimant’s educational, vocational, and social
background” (SSA, n.d.-k, Part IV, Mental Disorders, section I). Also required is “a
comprehensive and detailed description of adaptive behavior in the areas of personal, social,
academic, and occupational functioning during the developmental period” (SSA, n.d.-k, Part IV,
Mental Disorders, section I).
Additionally, SSA specifies that CE reports for mental disorders should include
statements from the medical source regarding “the nature and extent of the mental disorder” and
“an assessment of the claimant’s abilities and limitations based on medical history, observations
during examination, and results of relevant laboratory tests” as well as an opinion regarding the
applicant’s ability to carry out certain functions (SSA, n.d.-k, Part IV, Mental Disorders, section
J). The report should discuss “any apparent discrepancies in medical history or in examination
findings and how the discrepancies were resolved”; include “a statement regarding malingering,
if applicable”; and “a statement regarding the [applicant’s] capability to manage funds” (SSA,
n.d.-k, Part IV, Mental Disorders, section J).
In practice, CEs for mental disorders generally consist of nonstandardized diagnostic
interviews and mental status exams, with little to no standardized psychological testing other
than intelligence testing (Chafetz, 2008; Chafetz et al., 2007; Griffin et al., 1996; Heiser, 2014;
McLaren, 2014; Price, 2014; Ward, 2014).
17
Elements include “(1) manner and approach to evaluation; (2) dress, grooming, hygiene and presentation; (3)
mood and affect; (4) eye contact; (5) expressive/receptive language; (6) recall/memory, including working, recent
and remote; (7) orientation in all four spheres; (8) concentration and attention; (9) thought processes and content;
(10) perceptual abnormalities; (11) suicidal/homicidal ideation; (12) judgment/insight; and (13) estimated level of
intelligence” (SSA, n.d.-k, Part IV, Mental Disorders, section G).
Aside from the use of intelligence tests as described in the listings for intellectual
disability and certain neurological impairments, SSA does not require or specify the purchase of
any particular type of psychological test or any individual test. The primary guidance provided by SSA is that good
psychological tests are valid, reliable, and appropriately normed, and have a wide scope of
measurement, as previously described. In addition, as discussed later under Use of Validity Tests,
current SSA policy precludes the purchase of validity tests except in rare cases, such as a court
order.
There are three distinct groups of claimants seeking disability compensation for somatic
symptoms unaccompanied by demonstrable anatomical, biochemical, or physiological
abnormalities: somatoform disorders (recently termed somatic symptom disorders in the fifth
edition of the Diagnostic and Statistical Manual of Mental Disorders [DSM-5]); “multisystem
illnesses”; and chronic idiopathic pain conditions.
In all three of these types of conditions—somatoform disorder, multisystem illness, and
chronic pain—the credibility, reliability, validity, or accuracy of the reported symptoms and/or
impairment may be called into question. This is due to the absence of objective evidence or
biomarkers that could explain or substantiate the claimant’s report of subjective distress and
disability. With respect to reliance on self-report of symptoms and impairment, SSA policy states that
claimants may not be found disabled solely on the basis of self-reported statements about pain or
other symptoms (Social Security Act § 223(d)(5)(A), § 1614(a)(3)(D); 20 CFR 404.1508,
404.1529, 416.908, 416.929; SSA, 1996b, 2014g).
In cases where an individual’s self-reported symptoms, including pain, suggest a greater
degree of impairment than expected based on the objective medical evidence alone, other
corroborative information from treating and nontreating medical sources and other sources is
considered. Such information may include information about the individual’s
daily activities; the location, duration, frequency, and intensity of [the] pain or
other symptoms; precipitating and aggravating factors; the type, dosage,
effectiveness, and side effects of any medication … taken to alleviate [the] pain or
other symptoms; treatment, other than medication …; any measures … used to
relieve [the] pain or other symptoms …; and other factors concerning [the
individual’s] functional limitations and restrictions due to pain or other
symptoms. (20 CFR 404, Subpart P, § 404.1529; 20 CFR 416, Subpart I, §
416.929)
SSA has issued guidance on its policy for evaluating claims involving chronic fatigue
syndrome (CFS) (SSA, 2014g). This guidance explains how SSA determines the presence of a
medically determinable impairment in an individual with CFS, including some of the possible
medical signs and laboratory findings that would help to support such a finding. SSA then
assesses whether the medically determinable impairment could reasonably be expected to
produce the reported symptoms. In cases where objective medical evidence does not substantiate
the person’s statements, SSA considers the same types of evidence described for pain and other
symptoms. SSA will also make a finding about the credibility of the person’s statements as
described in the following section.
• Consistency, both internally (i.e., with other statements by claimant) and with
other information in record (e.g., objective medical evidence, third-party
reports and observations);
• The extent to which objective medical evidence may inform conclusions about
the intensity and persistence of reported symptoms, even if the latter are not
objectively measurable; and
• The individual’s longitudinal medical record (history) of persistence and
severity of reported symptoms.
SSA requires the examiner to articulate specific reasons for the credibility finding based on the
medical and other evidence in the case record. It is important to note both that a credibility
finding need not reflect complete acceptance or rejection of the individual’s statements (i.e., the
statements may be found to be partially credible) and that credibility concerns alone do not rule
out the presence of disability (SSA, 1996c).
With rare exceptions, such as a court order, current SSA policy precludes the purchase of
(validity) tests18 to help inform determinations about the credibility of an individual’s statements
or about possible malingering (SSA, 2012e, 2013). It is SSA’s position that “Tests cannot prove
whether a claimant is credible or malingering because there is no test that, when passed or failed,
conclusively determines the presence of inaccurate self-reporting” (SSA, 2013, section D),
although SSA acknowledges that the results of such tests “can provide evidence suggestive of
poor effort or of intentional symptom manipulation” (SSA, 2008). Nevertheless, SSA will
consider, along with all other relevant evidence, the results of symptom validity tests (SVTs) that
are already in the claimant’s file (SSA, 2013). According to a 2013 report from the Office of the
Inspector General, SSA:
On the other hand, SSA acknowledges that validity test results can “provide evidence
suggestive of poor effort or intentional symptom manipulation” and states that it will consider
validity test results that are already in an applicant’s file, along with all other relevant evidence.
In fact, the statement that no one test “conclusively determines the presence of inaccurate patient
self-report” seems to run counter to SSA’s dedication to obtaining as much evidence as possible
and taking account of all the information when making a disability determination. It is important
to divorce the concept of “malingering” from that of validity testing. As introduced in the
following section, and made clear later in this chapter and elsewhere in the report and
appendixes, validity test results can speak to performance (on performance-based tasks) and to
the consistency and accuracy of responses on self-report measures. However, they provide
limited information about intentionality and none about motive. It is important, therefore, to not
discount the potential usefulness of validity test results on the grounds that malingering cannot
be proven with tests or that a high likelihood of malingering and the presence of severe
limitations resulting from a genuine medically determinable impairment cannot coexist.
18
Such tests include the following: Rey 15 Item Memory Test (Rey-II), Miller Forensic Assessment of Symptoms
Test (M-FAST), Millon Clinical Multiaxial Inventory, Minnesota Multiphasic Personality Inventory (MMPI),
Minnesota Multiphasic Personality Inventory-2 (MMPI-2), Malingering Probability Scale, Structured Interview of
Reported Symptoms, Test of Memory Malingering, and Validity Indicator Profile (SSA, 2008, 2013).
19
Quotations are taken from SSA (2008).
20
Respondents were asked the extent to which each of the following supported such an assessment in their cases:
“below empirical cut-off on forced choice tests”; “below chance on forced choice tests”; “below empirical cut-off on
other malingering tests”; “pattern of cognitive test performance does not make neuropsychological sense
(inconsistent with condition)”; “severity of cognitive impairment inconsistent with condition”; “implausible changes
in test scores across repeated examinations”; “above validity scale cut-offs on objective personality tests”;
“discrepancies among records, self-report, and observed behavior”; and “implausible self-reported symptoms in
interview” (Mittenberg et al., 2002, p. 1102).
Depression Inventory. Nineteen percent (n = 32) of the 167 applicants assessed scored at a level
identified as “malingering.” The CDMI scores for this group more closely resembled those of a
group of disability examiners who were instructed to malinger than those of the comparison
group of psychologically disabled individuals with no incentive to malinger. The subgroup
identified as “malingering” differed from the rest of the disability applicant group only in the
presence of a self-reported history of substance abuse.
In their 2002 survey, Mittenberg and colleagues (2002) found a base rate of “probable
malingering or symptom exaggeration,” as described in note 20, of approximately 30 percent
(reported) to 33 percent (adjusted)21 for disability or workers’ compensation cases. The rate
varied relative to the referral source, with patients referred by defense attorneys or insurers
having a higher rate of “probable malingering or symptom exaggeration.” Their estimates were
based on a total of 33,532 cases reported in surveys returned by 131 of 375 possible respondents
among the 388 members of the American Board of Clinical Neuropsychology. Eleven percent of
the cases involved disability or workers’ compensation (n = 3,688), 19 percent (n = 6,371)
involved personal injury litigation, 4 percent (n = 1,341) involved criminal litigation, and 66
percent (n = 22,131) were medical or psychiatric cases not involving litigation or compensation.
The reported base rate of “probable malingering or symptom exaggeration” in the last group was
only 8 percent (Mittenberg et al., 2002, pp. 1095–1096).
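The percentages in the preceding paragraph can be checked directly against the reported case counts. The sketch below uses only figures stated in the text. The variable and function names are illustrative. Note that the four category counts sum to 33,531, one fewer than the reported total of 33,532, and the text does not explain the difference.

```python
# Case counts by referral context, as reported in the text (Mittenberg et al., 2002).
CASES = {
    "disability or workers' compensation": 3688,
    "personal injury litigation": 6371,
    "criminal litigation": 1341,
    "medical or psychiatric, no litigation": 22131,
}
TOTAL_REPORTED = 33532        # total cases stated in the text
SURVEYS_RETURNED = 131        # surveys returned
POSSIBLE_RESPONDENTS = 375    # possible respondents


def percent(part, whole, digits=0):
    """Percentage of `whole` represented by `part`, rounded to `digits` places."""
    return round(100 * part / whole, digits)


# Share of all reported cases in each category, rounded to whole percentages.
shares = {label: percent(n, TOTAL_REPORTED) for label, n in CASES.items()}

# Survey response rate, to one decimal place.
response_rate = percent(SURVEYS_RETURNED, POSSIBLE_RESPONDENTS, 1)

for label, share in shares.items():
    print(f"{label}: {share:.0f} percent")
print(f"response rate: {response_rate} percent")
```

The rounded shares reproduce the 11, 19, 4, and 66 percent figures given above.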
In a sample of adult SSA disability applicants, Chafetz and Abrahams found that 13.8
percent scored below chance performance and 58.6 percent failed two or more validity indicators
(Chafetz and Abrahams, 2005, reported in Larrabee, 2007b). Miller and colleagues reported that
54 percent of Social Security disability applicants failed “conservative criteria for poor effort” on
either the Computerized Assessment of Response Bias or the Word Memory Test (Miller et al.,
2006, reported in Chafetz, 2008 and Larrabee, 2007a).
Chafetz and colleagues administered the Test of Memory Malingering (TOMM) or the
Medical Symptom Validity Test (MSVT) to adult and child disability applicants, most with low
cognitive functioning, who were referred for a psychological consultative examination by the
DDS (Chafetz et al., 2007). Subjects’ test performance was
scored as “below chance,” “chance or below,” or “failing.” In this study, 55.8 percent of adults
(n = 136) and 28.3 percent of children (n = 96) failed the TOMM, and 12.4 percent of adults and
8.7 percent of children scored below chance on the test. On the MSVT, 61.4 percent of adults (n
= 58) and 37.0 percent of children (n = 27) failed, and 12.3 percent of adults and 7.4 percent of
children scored below chance.
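The notion of scoring “below chance” on a forced-choice test can be illustrated with a binomial calculation: if a test-taker were purely guessing, how unlikely would a score this low be? The sketch below is a general illustration, not the scoring procedure used in the studies cited. The 50-item length, the two-alternative format, and the 0.05 threshold are all assumptions chosen for the example.

```python
from math import comb


def prob_at_or_below(k, n, p=0.5):
    """Probability of k or fewer correct answers out of n items when every
    answer is a guess with success probability p (binomial lower tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def below_chance_cutoff(n, p=0.5, alpha=0.05):
    """Largest score whose lower-tail probability under pure guessing is
    below alpha, or None if no score is that unlikely."""
    cutoff = None
    for k in range(n + 1):
        if prob_at_or_below(k, n, p) < alpha:
            cutoff = k
        else:
            break  # the tail probability only grows as k increases
    return cutoff


# Hypothetical 50-item, two-alternative forced-choice test.
cutoff_50 = below_chance_cutoff(50)
tail_at_cutoff = prob_at_or_below(cutoff_50, 50)
```

A score at or below such a cutoff is worse than guessing would plausibly produce, which is why below-chance performance is treated differently from simply failing an empirical cutoff.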
The same study was designed to validate a tool, the “DDS Malingering Rating Scale,”
developed by the authors to help psychologists assess and inform DDSs about the validity of
their findings (Chafetz et al., 2007). The rating scale was validated against the TOMM and the
MSVT and was found to correlate well with “formal tests and indicators of effort in adults and
children” (Chafetz et al., 2007, p. 11). Between 51.6 and 58.9 percent of adults and
34.6 to 43.8 percent of children failed the DDS Malingering Rating Scale, and 20.5 to 30.4
percent of adults and 15.4 to 32.5 percent of children scored below chance (Chafetz et al., 2007,
p. 10).
21
The adjusted value is corrected to remove significant variation due to referral source.
In a subsequent paper that draws on the research reported in Chafetz and colleagues
(2007), Chafetz reports that 67.8 percent of adults who were administered both the TOMM and the
DDS Malingering Rating Scale failed at least one, 45.8 percent failed both, and 36.5 percent
scored at or below chance. For adults who were administered both the MSVT and the rating
scale, 68.4 percent failed at least one, 59.7 percent failed both, and 47.4 percent scored at or
below chance on at least one of the SVT subtests. Sixty percent of children who were
administered the TOMM and the rating scale failed at least one and 26.3 percent scored at or
below chance. Of children who were administered the MSVT and the rating scale, 48 percent
failed at least one, and 20 percent scored at or below chance on at least one of the SVT subtests (Chafetz,
2008).
In the context of SSA disability evaluations, it is important to note that even if an
applicant performs below his or her capability on cognitive tests or inconsistently reports
symptoms, neither scenario means the individual is not disabled. However, both scenarios
suggest the need for additional assessment of the alleged impairment with the goal of making an
accurate determination of disability. Doing so first requires identification of the individuals for
whom additional assessment may improve the accuracy of the disability determination. As
described in the section on assessing credibility, when a disability claim is based primarily on an
applicant’s self-report of symptoms and statements about their intensity, persistence, and limiting
effects, SSA relies on an assessment of the consistency of the self-report with all of the evidence
in the claimant’s medical evidence record. As discussed, SSA policy currently precludes the
purchase of validity tests by SSA (e.g., as part of a psychological consultative examination). One
question is whether the results of this type of standardized test could contribute to the evidence
available for assessment. The following section discusses the potential value of adding
standardized data collection and interpretation to clinical data collection and evaluation.
Defining Terms
Data collection Medical professionals often evaluate patients using a combination of what
Wedding and Faust call clinical and mechanical data (Wedding and Faust, 1989). Clinical data
collection includes all testing and examining that varies depending on how the clinician
performs the exam and/or on which aspects of the exam the clinician chooses to perform. For
example, clinicians may interview patients to elicit their description of the symptoms of their
illness; alternatively, clinicians may perform a physical exam. By contrast, mechanical data
collection involves the use of standardized testing where the data collection is structured and the
method typically does not vary from patient to patient. For example, if clinicians order a serum
sodium level or MMPI tests on their patients, they are collecting mechanical data.
It should be noted that mechanical data collection is not completely divorced from
clinical expertise. For example, clinicians may need to determine which mechanical data are
relevant to collect in a given patient, making a judgment about whose diagnosis will be aided by
a serum sodium level or an MMPI. In addition, the administration of mechanical tests can be
affected by clinical skill. For example, a clinician who draws a patient’s blood above an IV site
will get a false sodium level. Similarly, a clinician who administers an MMPI test after the
patient has been exhausted by previous examinations may also be collecting the data in a way
that will reduce the value and accuracy of the test results.
Data interpretation Once data have been collected—whether clinical data, mechanical data, or
some combination of both—they must be interpreted to determine whether the patient has a
specific health condition and to estimate how severe that condition is. Data interpretation
generally takes one of two approaches: clinical or actuarial. In clinical data interpretation, a
clinician looks at all the data and makes a judgment. (“Based on your age, family history, chest
pain, and ECG [electrocardiogram], I think you are having a heart attack.”) In actuarial data
interpretation, data are entered into a diagnostic program and weighed according to a statistical
procedure. (“The presence of chest pain, given your age, family history, and ECG changes,
yields a risk score of x, which estimates the probability of a heart attack to be y.”)
It is difficult for many clinicians to believe that an inflexible rule (“3 points for chest
pain, 2 points for family history and heart disease, 2 points for change in the ST segment of the
ECG leads to…”) would perform better than an experienced clinician who could take advantage
of information not included in the actuarial formula. Indeed, some clinicians recoil at actuarial
methods for being too impersonal; for treating patients like numbers and not like unique
individuals. Others criticize actuarial methods for ignoring useful information available to
clinicians. A famous criticism of actuarial methods is known as the “broken leg problem.” In one
version of this, Professor A goes to the movies almost every Tuesday night. Knowing that today
is Tuesday, an actuarial table might predict that the probability of Professor A going to the movie
tonight is 0.9. However, you might know that Professor A just broke his leg and cannot get out
of the house. You will have a much more accurate estimate of tonight’s chance of him going to
the movie than the actuarial approach (Salzinger, 2005).
The psychological power of this counterexample is that it makes it seem obvious that a
clinician, given actuarial information, can always improve on actuarial judgment by using
additional information not available to the actuarial formula. In practice, however, few cases are
as clear-cut as the broken leg example. Most additional information will not dramatically change
likelihood estimates derived from validated actuarial methods. In addition, even when additional
relevant data are available, clinicians may not make proper use of the data. They may give the
data too much or too little weight (Dawes, 1979).
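To make the contrast concrete, the point-based rule used as an example above can be written as a fixed scoring function. This is a minimal sketch: the point values come from the illustrative rule in the text, while the finding names and the function name are assumptions. The text does not give a mapping from score to probability, so none is included here.

```python
# Point values from the illustrative rule in the text; the keys are hypothetical names.
POINTS = {
    "chest_pain": 3,
    "family_history": 2,
    "st_segment_change": 2,
}


def risk_score(findings):
    """Sum the points for each finding that is present.

    The rule is 'mechanical': identical findings always yield an identical
    score, and information outside POINTS is ignored."""
    return sum(pts for name, pts in POINTS.items() if findings.get(name, False))


# Usage: a patient with chest pain and ST-segment changes but no family history.
patient = {"chest_pain": True, "family_history": False, "st_segment_change": True}
score = risk_score(patient)
```

The function ignores anything not in its table, which is exactly the property behind the “broken leg” criticism. The same property makes its output reproducible from one evaluator to the next.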
In summary, clinicians are trained to collect clinical data from patients and to make
decisions about which mechanical data will aid in diagnoses as well as to interpret these clinical
and mechanical data. However, clinicians are generally not as good at interpreting those data as
are established actuarial methods (Grove and Meehl, 1996; Grove et al., 2000; Meehl, 1954).
There is evidence that the use of clinical judgment alone to assess whether an individual is
exerting sufficient effort on performance-based tests or is providing an accurate self-report of
symptoms is unreliable (Faust et al., 1988a,b; Heaton et al., 1978; Oldershaw and Bagby, 1997),
making it important for the evaluator to collect and consider relevant mechanical data along with
other objective data in making such assessments.
Mental and behavioral health conditions have become more prevalent and consume a
larger portion of the military and VA budget than they did 5 years ago. Within the past 10 years,
the VA has reached consensus about the compensability of behavioral health conditions (e.g.,
posttraumatic stress disorder [PTSD]).
Significant progress has been made in defining mental and behavioral diagnoses. Both
the military and the VA have measures of mental and behavioral health, and both evaluation
systems address function as a key determinant for disability, although for somewhat different
purposes, as described in the following sections.
Military22
There are significant differences between policies and procedures followed by SSA and
the military. In contrast to disability evaluations for SSA and the Veterans Benefits
Administration (VBA), discussed in the following section, military assessments for mental and
behavioral health are performed to assess combat or duty readiness. Assessing whether an
individual is capable of performing his or her duty may be an issue of safety not only for the
individual but also for others.
22
Much of the information in this section is drawn from the presentation to the committee by Robert Seegmiller
(2014).
Fitness for duty and return-to-duty determinations are made by medical evaluation boards
and physical evaluation boards. Mental health providers serve as consultants to the boards,
providing them with reports of diagnostic impressions, assessment of degree of impairment and
impact on military duty performance, prognosis, and recommendations. In contrast to SSA and
VBA, evaluations in the military are often performed by therapists and care professionals who
are not “interrogators” but are considered advocates and treating professionals, which may
present a conflict with respect to treatment goals versus determinations of fitness for duty. It
also should be noted that Army behavioral health professionals “diagnose and treat and should
not be in an adversarial role with patients in terms of disability processes” and “must approach
with a soldier-centered focus that provides soldiers the benefit of the doubt.” Providers “on the
whole do support the patient/soldier on face value and advocate in every way for them; however,
[they] lose credibility with both medical personnel and line units when [they] fail to properly
investigate and obtain collateral information” (USAMEDCOM Behavioral Health Training Day,
June 12, 2012, reported in Seegmiller, 2014).
Evaluations typically include review of medical records, consideration of premorbid
functioning (ASVAB), clinical interview and behavioral observations, and information from
collateral sources. Psychological or neuropsychological testing is required in cases involving
reported traumatic brain injury (TBI), but not always in cases involving PTSD. The selection of
specific tests is left to the discretion of the clinician performing the evaluation, as is the use of
PVTs and SVTs, although most providers, particularly psychologists and neuropsychologists,
recognize the importance of their use.
A previous Office of the Surgeon General/Army Medical Command (OTSG/MEDCOM)
policy memo on the optimal use of psychological and neuropsychological assessment notes (1)
“Psychological and neuropsychological assessments are valuable tools in quantifying patient
deficits, clarifying diagnoses, informing treatment, and in making decisions regarding a soldier’s
continued fitness for military service” and (2) “Certain clinical tests in use by neuropsychology
are designed to evaluate level of effort on the part of the test-taker. Poor effort on cognitive
symptom validity measures means only that the data is not valid to be fully interpreted, and
invalid data can be due to a range of causes other than malingering” (Policy Memo 11-076:
Optimal Use of Psychological/Neuropsychological Assessment [21 Sept 2011-2013], reported in
Seegmiller, 2014). “Poor effort on psychological/neuropsychological tests does not equate
malingering, which requires proof of intent, per OTSG/MEDCOM Policy 11-076. In addition,
this diagnosis requires the signatures of two credentialed care providers, including a supervisor,
Department Chief, or Deputy Commander for Clinical Services” (OTSG/MEDCOM Policy
Memo 12-035: Policy Guidance on the Assessment and Treatment of Post-Traumatic Stress
Disorder [10 Apr 12 thru 10 Apr 14], reported in Seegmiller, 2014).
In his discussion with the committee, Dr. Robert Seegmiller (2014) asserted that SVTs
and PVTs are critical tools that provide valuable information about the validity of an individual’s
test results. When making decisions and recommendations about whether soldiers are fit for duty
or whether they need disability, Seegmiller noted the importance of ensuring that one has good
information in order to make the decision and recommendation that is the fairest for them and
best for the system in terms of returning to work or not. However, such tests are only one type of
tool: clinicians performing the evaluation also review the individual’s medical records, conduct
a clinical interview, make behavioral observations, gather collateral information, and the like,
and consider the consistency of all of the information with what the patient is reporting.
Veterans Administration23
The VBA is responsible for administering and delivering an array of federally authorized
benefits and services to eligible veterans and their dependents and survivors. In fiscal year 2012,
3,536,802 veterans received compensation benefits. PTSD was the third most prevalent service-
connected disability among veterans receiving compensation at the end of fiscal year 2012, and
TBI has been widely reported as the hallmark injury of the wars in Afghanistan and Iraq. To be
eligible for disability compensation, a veteran must have served under conditions other than
dishonorable, and the disability must not be the result of misconduct by the veteran. In contrast
to the military setting, in which service members are assessed in terms of fitness for duty,
veterans’ assessments are performed with the recognition that there is responsibility to care for
individuals who served in the military.
Disability compensation is paid monthly and varies according to the degree of disability
and the number of dependents. The rate of compensation is graduated from 10 percent to 100
percent disabling, in increments of 10 percent, according to the combined degree of the veteran’s
disabilities. This differs from SSA, which determines an individual to be either disabled (100
percent) or not. Also unlike SSA, recipients of veterans’ disability benefits may work with no
limit on their earnings.
Disability examinations are conducted by full-time employees of the Veterans Health
Administration (VHA), fee-basis staff, and contracted staff. Initial evaluations can be conducted
by “(1) board-certified psychiatrists; (2) psychiatrists who have successfully completed an
accredited psychiatry residency and who are appropriately credentialed and privileged; (3)
licensed doctoral-level psychologist[s]; (4) nonlicensed doctoral-level psychologists working
toward licensure under close supervision by a board-certified, or board-eligible, psychiatrist or a
licensed doctoral-level psychologist; (5) psychiatry residents under close supervision by a board-
certified, or board-eligible, psychiatrist or a licensed doctoral-level psychologist; and (6)
psychology interns or residents under close supervision by a board-certified, or board-eligible,
psychiatrist or a licensed doctoral-level psychologist” (VHA Directive 2012-021, August 27,
2012). Under the close supervision of a board-certified or board-eligible psychiatrist or licensed
doctoral-level psychologist, reviews and increase evaluations can be conducted by licensed
clinical social workers, nurse practitioners or clinical nurse specialists, and physician assistants
(VHA Directive 2012-021, August 27, 2012).
VHA requires all examiners to complete general online training regarding compensation
and pension (C&P). Some specialty examiners are required to take additional training related to
specific disabilities (e.g., PTSD).
The objective of a C&P mental disorder examination is to obtain competent,
critical, objective, and unbiased evaluations. To ensure that examination providers
are competent to provide findings and opinions that are valid and sufficient for
rating purposes, individuals who conduct C&P mental disorder examinations have
specific qualifications and must have completed the required training. (VHA
Directive 2012-021, August 27, 2012)
23 Much of the information in this section is drawn from the presentation to the committee by Stacey Pollack (2014).
Examiners conducting C&P examinations for mental disorders are instructed to:
• Diagnose mental disorders, including personality disorders, using the
nomenclature in the most current edition of the Diagnostic and Statistical
Manual of Mental Disorders; …
• Determine when clinician-administered psychometric testing is necessary and
integrate the results of such testing into the examination reports; …
• When necessary, comment on the significance of the veteran’s prior mental
health assessments (as reported) with respect to symptoms, occupational
history, social history, and global assessment of functioning. (VHA Directive
2012-021, August 27, 2012)
For all initial PTSD disability evaluations, the examiner is instructed to review the
veteran’s claims file (C-file) or any other available medical records prior to conducting the
examination. For an Integrated Disability Evaluation System (IDES) examination, the
examiner is required to review the service member’s medical records. Examiners are instructed to
obtain results from all pertinent studies, evaluations, and tests, and order or perform any further
studies, evaluations, or tests needed to diagnose a mental disorder before completing their report.
In addition, examiners must assess the individual for functional impairment. The examination
report is used along with all other evidence to determine what level of compensation may be
awarded to the veteran or service member.
VHA policy requires mental health examiners to review all records provided by VBA as
part of a comprehensive evaluation. These records typically include the claimant’s medical
record. If there are psychological tests in the claimant’s medical record, these should be reviewed
as part of the evidence used in a comprehensive examination. The option to order additional
psychological tests, including validity tests, is left to the discretion of the examiner. VA policy
neither requires nor prohibits the ordering or use of any specific tests or categories of tests to
evaluate any mental health conditions.
Unum is the largest commercial disability insurer in the United States for both short-term
and long-term disability. The committee looked to its processes to gain an understanding of how
private disability insurers approach the use of psychological testing in adjudicating claims.24 In
evaluating a claim, examiners, who are clinicians, are required to consider all of the information
in the claimant’s file, including the results of previously administered psychological and
neuropsychological tests. Examiners will attempt to acquire the raw test materials—the actual
reports, the actual scores, the actual tests with the questions and answers—to analyze those data
independently and determine whether they match the conclusions of the clinician who
administered the tests. The examiners also are mandated to speak to the claimant’s attending
physicians.
24 The information in this section is drawn from the presentation to the committee by Thomas McLaren (2014).
At its most basic, the role of the legal system is to adjudicate disputes based on factual
evidence. To achieve this goal, the courts rely on the collection of facts from a multitude of
sources that are directly relevant to a specific legal question. One such source of information is
the testimony of witnesses, who may provide the court with factual evidence based on personal
knowledge of the matter but are prohibited from testifying based on their own opinions or
analysis (Federal Rules of Evidence, Rule 602). However, under certain circumstances, the law
does allow for the provision of opinions by an expert based on facts or data in the case (Federal
Rules of Evidence, Rule 703). According to Rule 702 of the Federal Rules of Evidence:
A witness who is qualified as an expert by knowledge, skill, experience, training,
or education may testify in the form of an opinion or otherwise if:
(1) the expert’s scientific, technical, or other specialized knowledge will help the
trier of fact to understand the evidence or to determine a fact in issue;
(2) the testimony is based on sufficient facts or data;
(3) the testimony is the product of reliable principles and methods; and
(4) the expert has reliably applied the principles and methods to the facts of the
case.
With the requirement that the expert witness be able to provide information that is
directly relevant to the question at hand, such witnesses can come from a variety of fields,
including mental health. Once established as an expert witness, a mental health professional (i.e.,
psychologist, psychiatrist, or social worker) may provide expert opinion to assist in answering
the legal question at hand.
Psychological assessments may be used in a variety of contexts and at all stages of the
judicial process. For example, one of the primary uses for psychological assessments is to assess
competency. During pretrial information gathering, this includes competencies such as whether a
defendant was competent to consent to search and seizure or to confess, or to answer questions
regarding mental state at the time of the offense. Similarly, psychological assessments may be
used during the trial phase to answer questions related to competence to plead guilty, waive the
right to counsel, testify, or refuse an insanity defense. Following a guilty verdict, psychological
International Community
Canada
The Canada Pension Plan (CPP) provides disability benefits to eligible individuals using
much the same criteria in its disability determination process as SSA does (Government of
Canada, 2014). As in the United States, there are a number of different settings in which
disability determinations are made. Settings in addition to the CPP include the Workplace Safety
and Insurance Board, Veterans Affairs Canada, and the auto insurance industry. Psychologists and
neuropsychologists do not work under the Canadian national healthcare systems. As a result,
they work in a number of other settings, such as auto insurance.
25 The Association for Scientific Advancement in Psychological Injury and Law has identified a third type of
validity important to forensic psychological assessment, termed response validity, as “the accuracy of the
examinee’s responses to autobiographical questions (e.g., educational history, vocational history, legal history) and
questions pertaining to the legal matter in question (e.g., the nature of, and events surrounding, an injury, crime, or
traumatic event)” (Bush et al., 2014, p. 199).
Brian Levitt presented to the committee on the use of psychological testing under private
auto insurance and tort law in the province of Ontario. In this setting as well, the decision
of whether to administer psychological tests and, if so, which particular tests to use is made
by the individual psychologist according to the practice standards in that area of inquiry. The
Canadian Academy of Psychologists in Disability Assessment standards related to
psychological testing include the following:
• A psychologist shall employ standardized psychometric tests whenever possible,
• Psychologists whenever possible shall employ psychometric procedures which
measure response bias and symptom validity, and
• Psychologists shall address any apparent discrepancies between the results of
psychometric tests and other information.
These standards are consistent with the message that the use of validity tests is important,
but they constitute only one piece of data, which must be interpreted in the context of all the
other information.
Europe
Merten and colleagues (2013) have reported that large-scale research on and use of SVTs
and PVTs in Europe followed that in the United States by about a decade, beginning in earnest in
the early 2000s. As in the United States, the setting or context (forensic, clinical, etc.) seems to
matter (Dandachi-FitzGerald et al., 2013; McCarter et al., 2009; Merten et al., 2013). It is
important to note, however, that in the study by Dandachi-FitzGerald and colleagues (2013) the
definition of SVT was left to the respondent. Everything from discrepancies between records and
observed behavior to more “objective” scales on personality and effort tests was included,
making it very difficult to interpret the findings regarding the percentage of medical professionals
using SVTs when contracted to assess work capacity due to claims of psychological disability.
There also appear to be differences in SVT and PVT use across European countries, with
practitioners in the Netherlands and Norway reporting the greatest use of such tests (Merten et
al., 2013).
Closing Comments
SSA, the military, the VBA, private disability insurance providers, and forensic
assessment in civil and criminal judicial contexts have different goals, needs, and approaches to
the evaluation and determination of disability (see Table 2-3). All share common elements,
including identification of the presence of impairment and evaluation of its effect on the
individual’s ability to function.
Although the use of psychological testing must be understood in the context of each
system’s goals, each of the systems encourages a comprehensive evaluation, as determined by
the evaluator, in an effort to answer these questions and each permits a broad range of
evaluations. Whether to order psychological tests and the selection of which tests to administer
are left to the discretion of the professional performing the evaluation or examination. With the
exception of SSA, all of the systems permit, or in some cases require, the use of validity testing
to provide information about the validity of the results of other psychological tests being
administered. Nevertheless, all agree that although validity tests yield important information, the
results of such tests are only one piece of data that needs to be assessed and interpreted in the
context of all the other information available.
Setting: Private
  Who performs the assessments: Disability evaluators (neuropsychologists, psychologists,
    psychiatrists, social workers)
  What are the assessments: Clinical files or records^a
  Psychological tests employed: Any relevant, scientifically valid tests
  Policy on psychological or neuropsychological tests: Evaluator determines necessary testing;
    PVTs/SVTs required
  Concerns/conflicts: Industry has additional resources; each company makes its own policy
Forensic: Mental health Hired by defense or
Psychological Testing in the Service of Disability Determination
FINDINGS
• There currently is great variability in allowance rates for both SSI and SSDI
among states that is not fully accounted for by differences in the populations of
applicants. There also is great variability in the disability determination appeal
rulings among administrative law judges within and across states.
• Each state DDS agency, within the confines of SSA policy, issues its own rules
regarding the tests that may be purchased as part of a consultative examination.
For this reason, there is variation among states about when and which
standardized psychological tests can be purchased, with the exception of PVTs
and SVTs, which are precluded from purchase by SSA.
• There currently are no data on the rates of false positives and false negatives in
SSA disability determinations.
• Identification and documentation of the presence and severity of medically
determinable mental impairments at Step 2 of SSA’s disability determination
process could be informed by results of standardized psychological tests.
• Identification and assessment of the severity of work-related functional
impairment relevant to disability evaluations at the listing level (Step 3) and to
mental residual functional capacity (Steps 4 and 5) are other points in SSA’s
disability determination process that could be informed by results of
standardized psychological tests.
• Consultative examinations may be ordered by DDS examiners or ALJs to
supplement evidence in a claimant’s case record. Psychological tests could be
administered as part of a CE.
• In some cases, SSA disability examiners must evaluate the credibility of
statements by individuals about the intensity and persistence of their symptoms
and the effect on the individual’s ability to function and perform work-related
activities.
• Current data on the prevalence of inconsistent reporting of symptoms or
performing below one’s capability on cognitive tests among SSDI and SSI
applicant populations are limited.
• Current SSA policy precludes the purchase of (validity) tests—e.g., Minnesota
Multiphasic Personality Inventory-2 and Test of Memory Malingering—to
help inform determinations about the credibility of an individual’s statements
or about possible malingering.
• There is inconsistency among SSA’s statements on validity testing:
o Results can “provide evidence suggestive of poor effort or intentional
symptom manipulation.”
o “Malingering cannot be proven with tests”; “malingering is one aspect of
the larger sphere of inaccurate self-reporting.”
o “No test … conclusively determines the presence of inaccurate patient
self-report.”
o “Even a high likelihood of malingering does not preclude severe
limitations resulting from a genuine medically determinable impairment.”
REFERENCES
Larrabee, G. J. 2007b. Introduction: Malingering, research designs, and base rates. In Assessment
of malingered neuropsychological deficits, edited by G. J. Larrabee. New York: Oxford
University Press.
Laurence, B. 2015. Third level of appeal for disability: Appeals council & remands.
http://www.disabilitysecrets.com/appeals-council.html (accessed October 20, 2014).
McCarter, R. J., N. H. Walton, D. N. Brooks, and G. E. Powell. 2009. Effort testing in
contemporary UK neuropsychological practice. Clinical Neuropsychologist 23(6):1050-1066.
McLaren, T. 2014. Use of performance and symptom validity assessment within the independent
disability insurer context. Presentation given to the IOM Committee on Psychological
Testing, Including Validity Testing, for Social Security Administration Disability
Determinations: Meeting 1, June 25, 2014. Washington, DC: Institute of Medicine.
Meehl, P. E. 1954. Clinical versus statistical prediction: A theoretical analysis and a review of
the evidence. Minneapolis, MN: University of Minnesota Press.
Merten, T., B. Dandachi-FitzGerald, V. Hall, B. A. Schmand, P. Santamaría, and H. González-
Ordi. 2013. Symptom validity assessment in European countries: Development and state
of the art. Clínica y Salud 24(3):129-138.
Mittenberg, W., C. Patton, E. M. Canyock, and D. C. Condit. 2002. Base rates of malingering and
symptom exaggeration. Journal of Clinical and Experimental Neuropsychology
24(8):1094-1102.
Moore, D., and P. Healy. 2008. The trouble with overconfidence. Psychological Review
115(2):502-517.
Morton, D. 2014. Social security disability: Four levels of appeal.
http://www.nolo.com/legal-encyclopedia/social-security-disability-appeal-levels-32398.html
(accessed March 27, 2015).
Office of the Inspector General, SSA (Social Security Administration). 2013. The Social Security
Administration’s policy on symptom validity tests in determining disability claims.
Washington, DC: SSA. http://oig.ssa.gov/sites/default/files/audit/full/pdf/A-08-13-23094.pdf
(accessed March 27, 2015).
Oldershaw, L., and M. Bagby. 1997. Children and deception. New York: Guilford.
Oskamp, S. 1965. Overconfidence in case-study judgments. Journal of Consulting Psychology
29(3):261-265.
Pollack, S. 2014. VA policies and/or practices surrounding the use of psychological tests and
symptom validity tests in the disability determination process. Presentation given to the
IOM Committee on Psychological Testing, Including Validity Testing, for Social
Security Administration Disability Determinations: Meeting 2, June 25, 2014.
Washington, DC: Institute of Medicine.
Price, J. H. 2014. Disability Determination Services panel discussion with the committee.
Presentation given to the IOM Committee on Psychological Testing, Including Validity
Testing, for Social Security Administration Disability Determinations: Meeting 2, August
11, 2014. Washington, DC: Institute of Medicine.
Rondinelli, R. D., ed. 2008. AMA guides to the evaluation of permanent impairment, 6th edition.
Chicago, IL: American Medical Association.
Rupp, K. 2012. Factors affecting initial disability allowance rates for the Disability Insurance and
Supplemental Security Income programs: The role of the demographic and diagnostic
composition of applicants and local labor market conditions. Social Security Bulletin
72(4):11-35. http://ssrn.com/abstract=2172488 (accessed February 4, 2015).
Rupp, K., and D. Stapleton. 1995. Determinants of the growth in the Social Security
Administration's disability programs—an overview. Social Security Bulletin 58(4):43-70.
Salzinger, K. 2005. Clinical, statistical, and broken-leg predictions. Behavior and Philosophy
33:91-99.
Samuel, R. Z., and W. Mittenberg. 2005. Determination of malingering in disability evaluations.
Primary Psychiatry 12(12):60-68.
Scheinkman, J. A., and W. Xiong. 2003. Overconfidence and speculative bubbles. Journal of
Political Economy 111(6):1183-1220.
Seegmiller, R. 2014. Use of psychological tests, including PVTs and SVTs, in select populations:
The U.S. military. Presentation given to the IOM Committee on Psychological Testing,
Including Validity Testing, for Social Security Administration Disability Determinations:
Meeting 2, June 25, 2014. Washington, DC: Institute of Medicine.
Soss, J., and L. R. Keiser. 2006. The political roots of disability claims: How state environments
and policies shape citizen demands. Political Research Quarterly 59(1):133-148.
SSA (Social Security Administration). 1996a. SSR 96-3p: Policy interpretation ruling. Titles II
and XVI: Considering allegations of pain and other symptoms in determining whether a
medically determinable impairment is severe.
http://www.socialsecurity.gov/OP_Home/rulings/di/01/SSR96-03-di-01.html (accessed
August 20, 2014).
SSA. 1996b. SSR 96-4p: Policy interpretation ruling. Titles II and XVI: Symptoms, medically
determinable physical and mental impairments, and exertional and nonexertional
limitations. http://www.socialsecurity.gov/OP_Home/rulings/di/01/SSR96-04-di-01.html
(accessed October 3, 2014).
SSA. 1996c. SSR 96-7p: Policy interpretation ruling. Titles II and XVI: Evaluation of symptoms
in disability claims: Assessing the credibility of an individual's statements.
http://www.socialsecurity.gov/OP_Home/rulings/di/01/SSR96-07-di-01.html (accessed
October 3, 2014).
SSA. 2008. National Q&A, 08-003 rev 2, do tests of malingering have any value for SSA
evaluations? Washington, DC: SSA.
SSA. 2009. DI 22511.005 Documenting the impact of a medically determinable mental
impairment on an individual's ability to work. Program Operations Manual System
(POMS). https://secure.ssa.gov/apps10/poms.nsf/lnx/0422511005 (accessed January 30,
2015).
SSA. 2010. Revised medical criteria for evaluating mental disorders. Federal Register
75(160):51336-51368.
SSA. 2012a. DI 00115.001 Social Security Administration’s (SSA) disability programs. Program
Operations Manual System (POMS). https://secure.ssa.gov/poms.nsf/lnx/0400115001
(accessed August 20, 2014).
SSA. 2012b. DI 22501.001 Disability case development for medical and other evidence. Program
Operations Manual System (POMS). https://secure.ssa.gov/poms.nsf/lnx/0422501001
(accessed October 3, 2014).
SSA. 2012c. DI 22510.048 Pediatric consultative examination (CE) report content guidelines—
mental disorders. Program Operations Manual System (POMS).
https://secure.ssa.gov/poms.nsf/lnx/0422510048 (accessed October 3, 2014).
SSA. 2012d. DI 22511.007 Sources of evidence. Program Operations Manual System (POMS).
https://secure.ssa.gov/apps10/poms.nsf/lnx/0422511007 (accessed December 30, 2014).
SSA. 2012e. Disability Determination Services administrative letter no. 866: Consultative
examinations malingering & credibility tests—Information. Washington, DC: SSA.
SSA. 2012f. Social Security testimony before Congress. Statement of Michael J. Astrue,
Commissioner, Social Security Administration before the Committee on Ways and Means
Subcommittee on Social Security, June 27, 2012.
http://www.ssa.gov/legislation/testimony_062712.html (accessed October 20, 2014).
SSA. 2013. DI 22510.006 When not to purchase a consultative examination (CE). Program
Operations Manual System (POMS). https://secure.ssa.gov/poms.nsf/lnx/0422510006
(accessed October 3, 2014).
SSA. 2014a. Annual report of the Supplemental Security Income Program. Baltimore, MD: SSA.
SSA. 2014b. Annual statistical report on the Social Security Disability Insurance Program, 2013.
Washington, DC: SSA. http://www.ssa.gov/policy/docs/statcomps/di_asr/ (accessed
February 24, 2015).
SSA. 2014c. DDS performance management report disability claims data consultative
examination rates, fiscal year 2013. Data prepared by ORDP, ODP, and ODPMI. Data
submitted to IOM Committee on Psychological Testing, Including Validity Testing, for
Social Security Administration Disability Determinations by Joanna Firmin, Social
Security Administration, on October 8, 2014.
SSA. 2014d. Disability claims data (initial, reconsideration, continuing disability review) by
adjudicative level and body system. SSDI, SSI, Concurrent, and Total Claims. Data
prepared by ORDP, ODP, ODPMI. Submitted to the Committee on Psychological
Testing, Including Validity Testing, for Social Security Administration Disability
Determinations by Joanna Firmin, Social Security Administration, on October 8, 2014.
SSA. 2014e. DI 22510.021 Consultative examination (CE) report content guidelines: Mental
disorders. Program Operations Manual System (POMS).
https://secure.ssa.gov/poms.nsf/lnx/0422510021 (accessed October 3, 2014).
SSA. 2014f. DI 24515.008 Titles II and XVI: Considering opinions and other evidence from
sources who are not “acceptable medical sources” in disability claims; considering
decisions on disability by other governmental and nongovernmental agencies (SSR 06-03p).
Program Operations Manual System (POMS).
https://secure.ssa.gov/poms.nsf/lnx/0424515008 (accessed February 24, 2015).
SSA. 2014g. DI 24515.075 Evaluating claims involving Chronic Fatigue Syndrome (CFS).
Program Operations Manual System (POMS).
https://secure.ssa.gov/poms.nsf/lnx/0424515075 (accessed December, 2014).
SSA. 2014h. National data: Title II—SSDI, Title XVI—SSI, & concurrent Title II/XVI initial
disability determinations. By regulation basis code for adults and children (reason for
decision), fiscal year 2013. Data submitted to IOM Committee on Psychological Testing,
Including Validity Testing, for Social Security Administration Disability Determinations
by Joanna Firmin, Social Security Administration, on October 23, 2014.
SSA. 2014i. Open government initiative. Data on combined Title II disability and Title XVI
blind/disabled average processing time (in days) (excludes technical denials).
http://www.ssa.gov/open/data/Combined-Disability-Processing-Time.html (accessed
December 16, 2014).
SSA. 2014j. SSDI awards by diagnostic group and age of awardee under the age of 65, 2013
(preliminary data). Data submitted to IOM Committee on Psychological Testing,
Including Validity Testing, for Social Security Administration Disability Determinations
by Joanna Firmin, Social Security Administration, on October 21, 2014.
SSA. 2014k. SSI annual statistical report, 2013. Washington, DC: SSA.
http://www.ssa.gov/policy/docs/statcomps/ssi_asr/ (accessed February 24, 2015).
SSA. 2014l. SSI awards by diagnostic group and age of awardee under the age of 65, 2013. Data
prepared by ORDP, ODP, and ODPMI. Data submitted to IOM Committee on
Psychological Testing, Including Validity Testing, for Social Security Administration
Disability Determinations by Joanna Firmin, Social Security Administration, on October
21, 2014.
PREPUBLICATION COPY: UNCORRECTED PROOFS
records recognized that may aid the clinician in reaching a diagnosis. Conceptually, clinical
interviewing explores the presenting complaint(s) (i.e., referral question), informs the
understanding of the case history, aids in the development of hypotheses to be examined in the
assessment process, and assists in determination of methods to address the hypotheses through
formal testing.
An important piece of the assessment process and the focus of this report, psychological
testing consists of the administration of one or more standardized procedures under particular
environmental conditions (e.g., quiet, good lighting) in order to obtain a representative sample of
behavior. Such formal psychological testing may involve the administration of standardized
interviews, questionnaires, surveys, and/or tests, selected with regard to the specific examinee
and his or her circumstances, that offer information to respond to an assessment question.
Assessments, then, serve to respond to questions through the use of tests and other procedures. It
is important to note that the selection of appropriate tests requires an understanding of the
specific circumstances of the individual being assessed, falling under the purview of clinical
judgment. For this reason, the committee refrains from recommending the use of any specific test
in this report. Any reference to a specific test is to provide an illustrative example, and should
not be interpreted as an endorsement by the committee for use in any specific situation; such a
determination is best left to a qualified assessor familiar with the specific circumstances
surrounding the assessment.
To respond to questions regarding the use of psychological tests for the assessment of the
presence and severity of disability due to mental disorders, this chapter provides an introductory
review of psychological testing. The chapter is divided into three sections: (1) types of
psychological tests, (2) psychometric properties of tests, and (3) test user qualifications and
administration of tests. Where possible an effort has been made to address the context of
disability determination; however, the chapter is primarily an introduction to psychological
testing.
There are many facets to the categorization of psychological tests, and even more if one
includes educationally oriented tests; indeed, it is often difficult to differentiate many kinds of
tests as purely psychological tests as opposed to educational tests. The ensuing discussion lays
out some of the distinctions among such tests; however, it is important to note that there is no
one correct cataloging of the types of tests because the different categorizations often overlap.
Psychological tests can be categorized by the very nature of the behavior they assess (what they
measure), their administration, their scoring, and how they are used. Figure 3-1 illustrates the
types of psychological measures as described in this report.
OVERVIEW OF PSYCHOLOGICAL TESTING
One of the most common distinctions made among tests relates to whether they are
measures of typical behavior (often non-cognitive measures) versus tests of maximal
performance (often cognitive tests) (Cronbach, 1949, 1960). A measure of typical behavior asks
those completing the instrument to describe what they would commonly do in a given situation.
Measures of typical behavior, such as personality, interests, values, and attitudes, may be
referred to as non-cognitive measures. A test of maximal performance, obviously enough, asks
people to answer questions and solve problems as well as they possibly can. Because tests of
maximal performance typically involve cognitive performance, they are often referred to as
cognitive tests. Most intelligence and other ability tests would be considered cognitive tests; they
can also be known as ability tests, but this would be a more limited category. Non-cognitive
measures rarely have correct answers per se, although in some cases (e.g., employment tests)
there may be preferred responses; cognitive tests almost always have items that have correct
answers. It is through these two lenses—non-cognitive measures and cognitive tests—that the
committee examines psychological testing for the purpose of disability evaluation in this report.
One distinction among non-cognitive measures is whether the stimuli composing the
measure are structured or unstructured. A structured personality measure, for example, may ask
people true-or-false questions about whether they engage in various activities or not. Those are
highly structured questions. On the other hand, in administering some commonly used
personality measures, the examiner provides an unstructured projective stimulus such as an
inkblot or a picture. The test-taker is asked to describe what he or she sees in, or imagines to
be depicted by, the inkblot or picture. The premise of these projective measures is that when presented with
ambiguous stimuli an individual will project his or her underlying and unconscious motivations
and attitudes. The scoring of these latter measures is often more complex than it is for structured
measures.
There is great variety in cognitive tests and what they measure, thus requiring a lengthier
explanation. Cognitive tests are often separated into tests of ability and tests of achievement;
however, this distinction is not as clear-cut as some would portray it. Both types of tests involve
learning. Both kinds of tests involve what the test-taker has learned and can do. However,
achievement tests typically involve learning from very specialized education and training
experiences, whereas most ability tests assess learning that has occurred in one’s environment.
Some aspects of learning are clearly both; for example, vocabulary is learned at home, in one’s
social environment, and in school. Notably, the best predictor of intelligence test performance is
one’s vocabulary, which is why it is often given as the first test during intelligence testing or in
some cases represents the body of the intelligence test (e.g., the Peabody Picture Vocabulary
Test). Conversely, one can also have a vocabulary test based on words one learns only in an
academic setting. Intelligence tests are so prevalent in many clinical psychology and
neuropsychology situations that we also consider them as neuropsychological measures. Some
abilities are measured using subtests from intelligence tests; certain working memory tests,
for example, are intelligence subtests that are commonly used singly as well.
There are also standalone tests of many kinds of specialized abilities.
Some ability tests are broken into verbal and performance tests. Verbal tests, obviously
enough, use language to ask questions and demonstrate answers. Performance tests on the other
hand minimize the use of language; they can involve solving problems that do not involve
language. They may involve manipulating objects, tracing mazes, placing pictures in the proper
order, and finishing patterns, for example. This distinction is most commonly used in the case of
intelligence tests, but can be used in other ability tests as well. Performance tests are also
sometimes used when the test-taker lacks competence in the language of the testing. Many of
these tests assess visual spatial tasks. Historically, nonverbal measures were given as intelligence
tests for non-English speaking soldiers in the United States as early as World War I. These tests
continue to be used in educational and clinical settings given their reduced language component.
Cognitive tests are also classified as speeded tests or power tests. A
truly speeded test is one on which everyone could answer every question correctly given enough time.
Some tests of clerical skills are exactly like this; they may have two lists of paired numbers, for
example, where some pairings contain two identical numbers and other pairings are different.
The test-taker simply circles the pairings that are identical. Pure power tests are measures in
which the only factor influencing performance is how much the test-taker knows or can do. A
true power test is one where all test-takers have enough time to do their best; the only question is
what they can do. Obviously, few tests are either purely speeded or purely power tests. Most
have some combination of both. For example, a testing company may use a rule of thumb that 90
percent of test-takers should complete 90 percent of the questions; however, it should also be
clear that the purpose of the testing affects rules of thumb such as this. Few teachers would wish
to have many students unable to complete the tests that they take in classes, for example. When
test-takers have disabilities that affect their ability to respond to questions quickly, some
measures provide extra time, depending upon their purpose and the nature of the characteristics
being assessed.
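The completion rule of thumb mentioned above can be checked with a few lines of code. The following is only a minimal sketch: the function names, the data, and the idea of recording the last item each test-taker reached are illustrative assumptions, not a procedure described in this report.

```python
def completion_rate(items_reached, n_items, item_percent=90):
    """Share of test-takers who reached at least item_percent of the items."""
    # Integer arithmetic avoids floating-point surprises at the cutoff.
    finished = sum(1 for reached in items_reached
                   if reached * 100 >= item_percent * n_items)
    return finished / len(items_reached)


def meets_rule_of_thumb(items_reached, n_items, taker_percent=90, item_percent=90):
    """True if at least taker_percent of test-takers reached item_percent of items."""
    finished = sum(1 for reached in items_reached
                   if reached * 100 >= item_percent * n_items)
    return finished * 100 >= taker_percent * len(items_reached)


# Hypothetical data: last item reached by each of 10 test-takers on a 50-item test.
items_reached = [50, 50, 48, 47, 45, 50, 46, 49, 30, 50]
rate = completion_rate(items_reached, n_items=50)
ok = meets_rule_of_thumb(items_reached, n_items=50)
```

A test that fails such a check is behaving more like a speeded test for the group in question; whether that is acceptable depends, as noted, on the purpose of the testing.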
Questions on both achievement and ability tests can involve either recognition or free-
response in answering. In educational and intelligence tests, recognition tests typically include
multiple-choice questions where one can look for the correct answer among the options,
recognize it as correct, and select it as the correct answer. A free-response is analogous to a “fill-
in-the-blanks” or an essay question. One must recall or solve the question without choosing from
among alternative responses. This distinction also holds for some non-cognitive tests, but the
latter distinction is discussed later in this section because it focuses not on recognition but
on selection. For example, a recognition question on a non-cognitive test might ask someone
whether they would rather go ice skating or to a movie; a free recall question would ask the
respondent what they like to do for enjoyment.
Cognitive tests of various types can be considered as process or product tests. Take, for
example, mathematics tests in school. In some instances, only getting the correct answer leads to
a correct response. In other cases, teachers may give partial credit when a student performs the
proper operations but does not get the correct answer. Similarly, psychologists and clinical
neuropsychologists often observe not only whether a person solves problems correctly (i.e.,
product), but how the client goes about attempting to solve the problem (i.e., process).
Test Administration
One of the most important distinctions relates to whether tests are group administered or
are individually administered by a psychologist, physician, or technician. Tests that traditionally
were group administered were paper-and-pencil measures. Often for these measures, the test-
taker received both a test booklet and an answer sheet and was required, unless he or she had
certain disabilities, to mark his or her responses on the answer sheet. In recent decades, some
tests are administered using technology (i.e., computers and other electronic media). There may
be some adaptive qualities to tests administered by computer, although not all computer-
administered tests are adaptive (technology-administered tests are further discussed below). An
individually administered measure is typically provided to the test-taker by a psychologist,
physician, or technician. More confidence is often placed in the individually administered measure,
because the trained professional administering the test can make judgments during the testing
that affect the administration, scoring, and other observations related to the test.
Tests can be administered in an adaptive or linear fashion, whether by computer or
individual administrator. A linear test is one in which questions are administered one after
another in a pre-arranged order. An adaptive test is one in which the test-taker’s performance on
earlier items affects the questions he or she receives subsequently. Typically, if the test-taker
answers the first questions correctly or in accordance with preset or expected response
algorithms, the next questions are progressively more difficult until the level that best matches
the examinee’s performance is reached or the test is completed. If one does not answer the
first questions correctly or as typically expected in the case of a non-cognitive measure, then
easier questions would generally be presented to the test-taker.
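The branching logic of an adaptive test can be illustrated with a deliberately simplified "staircase" rule: move up one difficulty level after a correct response and down one level after an incorrect one. This is a sketch under stated assumptions; the function name, the nine difficulty levels, and the fixed test length are hypothetical, and operational adaptive tests use more refined selection rules.

```python
def run_adaptive_test(answers_correctly, n_levels=9, n_questions=5, start_level=None):
    """Administer n_questions, adjusting difficulty after each response.

    answers_correctly: hypothetical callable that takes a difficulty level
    (0 = easiest) and returns True for a correct response.
    Returns the (level, correct) pairs in order of administration.
    """
    level = n_levels // 2 if start_level is None else start_level
    history = []
    for _ in range(n_questions):
        correct = answers_correctly(level)
        history.append((level, correct))
        step = 1 if correct else -1
        # Stay within the range of available difficulty levels.
        level = min(n_levels - 1, max(0, level + step))
    return history


# Hypothetical test-taker who answers every item up to level 6 correctly.
history = run_adaptive_test(lambda level: level <= 6)
```

In this example the administered levels climb from 4 to 7 and then oscillate around the boundary of what the test-taker can do, which is the behavior the paragraph above describes.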
Tests can be administered in written (keyboard or paper and pencil) fashion, orally, using
an assistive device (most typically for individuals with motor disabilities), or in performance
format, as previously noted. It is generally difficult to administer oral or performance tests in a
group situation; however, some electronic media are making it possible to administer such tests
without human examiners.
Another distinction among measures relates to who the respondent is. In most cases, the
test-taker him- or herself is the respondent to any questions posed by the psychologist or
physician. In the case of a young child, many individuals with autism, or an individual, for
example, who has lost language ability, the examiner may need to ask others who know the
individual (parents, teachers, spouses, family members) how they behave and to describe their
personality, typical behaviors, and so on.
Scoring Differences
Test Content
As noted previously, the most important distinction among most psychological tests is
whether they are assessing cognitive versus non-cognitive qualities. In clinical psychological and
neuropsychological settings such as are the concern of this volume, the most common cognitive
tests are intelligence tests, other clinical neuropsychological measures, and performance validity
measures. Many tests used by clinical neuropsychologists, psychiatrists, technicians, or others
assess specific types of functioning, such as memory or problem solving. Performance validity
measures are typically short assessments and are sometimes interspersed among components of
other assessments that help the psychologist determine whether the examinee is exerting
sufficient effort to perform well and responding to the best of his or her ability. Most common
non-cognitive measures in clinical psychology and neuropsychology settings are personality
measures and symptom validity measures. Some personality tests, such as the Minnesota
Multiphasic Personality Inventory (MMPI), assess the degree to which someone expresses
behaviors that are seen as atypical in relation to the norming sample.1 Other personality tests are
more normative and try to provide information about the client to the therapist. Symptom
validity measures are scales, like performance validity measures, that may be interspersed
throughout a longer assessment to examine whether a person is portraying him- or herself in an
honest and truthful manner. Somewhere between these two types of tests—cognitive and non-
cognitive—are various measures of adaptive functioning that often include both cognitive and
non-cognitive components.
Reliability
Reliability refers to the degree to which scores from a test are stable and results are
consistent. When constructs are not reliably measured the obtained scores will not approximate a
true value in relation to the psychological variable being measured. It is important to understand
that observed or obtained test scores are considered to be composed of true and error elements. A
standard error of measurement is often presented to describe, within a level of confidence (e.g.,
95 percent), that a given range of test scores contains a person’s true score, which acknowledges
the presence of some degree of error in test scores and that obtained test scores are only
estimates of true scores (Geisinger, 2013).
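As a worked illustration, the standard error of measurement is commonly computed in classical test theory as the score standard deviation multiplied by the square root of one minus the reliability coefficient. That formula and the values below are not taken from this report; they are a sketch assuming a score scale with a standard deviation of 15, a reliability of .91, and normally distributed error.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: sd * sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)


def score_band(observed, sd, reliability, confidence=0.95):
    """Range around an observed score at the given confidence level."""
    sem = standard_error_of_measurement(sd, reliability)
    # Two-sided critical value, assuming normally distributed error.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return observed - z * sem, observed + z * sem


# Hypothetical values: observed score 100, SD 15, reliability .91.
sem = standard_error_of_measurement(15, 0.91)
low, high = score_band(100, 15, 0.91)
```

With these assumed values the SEM is about 4.5 points, so the 95 percent band spans roughly 91 to 109; a less reliable test would yield a wider band around the same observed score.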
Reliability is generally assessed in four ways:
1. Test-retest: Consistency of test scores over time (stability, temporal consistency);
2. Inter-rater: Consistency of test scores between independent judges;
3. Parallel or alternate forms: Consistency of scores across different forms of the test
(stability and equivalence); and
1 This may be in comparison to a nationally representative norming sample, or with certain tests or measures, such
as the MMPI, particular clinically diagnostic samples.
A number of factors can affect the reliability of a test’s scores. These include the time
between two test administrations, which affects test-retest and alternate-forms reliability, and
similarity of content and expectations of subjects regarding different elements of the test in
alternate forms, split-half, and internal consistency approaches. In addition, changes in subjects
over time introduced by physical ailments, emotional problems, or the subject’s
environment, or test-based factors such as poor test instructions, subjective scoring, and guessing
will also affect test reliability. It is important to note that a test can generate reliable scores in one
context and not in another, and that inferences that can be made from different estimates of
reliability are not interchangeable (Geisinger, 2013).
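Two of the estimates discussed above can be sketched in code: a test-retest coefficient, computed as the Pearson correlation between scores from two administrations, and an internal consistency coefficient, here Cronbach's alpha. The choice of alpha, the function names, and all data are illustrative assumptions rather than methods prescribed in this report.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5


def cronbach_alpha(rows):
    """Internal consistency; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical scores for five people tested on two occasions.
time1 = [10, 12, 14, 16, 18]
time2 = [11, 14, 13, 17, 19]
test_retest = pearson_r(time1, time2)

# Hypothetical item-level responses (four respondents, three items).
responses = [[3, 4, 3], [2, 2, 3], [5, 4, 5], [1, 2, 1]]
alpha = cronbach_alpha(responses)
```

The two coefficients answer different questions (stability over time versus homogeneity of items), which is one reason estimates of reliability are not interchangeable.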
Validity
While the scores resulting from a test may be deemed reliable, this finding does not
necessarily mean that scores from the test have validity. Validity is defined as “the degree to
which evidence and theory support the interpretations of test scores for proposed uses of tests”
(AERA et al., 2014, p. 11). In discussing validity, it is important to highlight that validity refers
not to the measure itself (i.e., a psychological test is not valid or invalid) or the scores derived
from the measure, but rather the interpretation and use of the measure’s scores. To be considered
valid, the interpretation of test scores must be grounded in psychological theory and empirical
evidence that demonstrates a relationship between the test and what it purports to measure (Furr
and Bacharach, 2013; Sireci and Sukin, 2013). Historically, the fields of psychology and
education have described three primary types of evidence related to validity (Sattler, 2014; Sireci
and Sukin, 2013):
1. Construct evidence of validity: The degree to which an individual’s test scores
correlate with the theoretical concept the test is designed to measure (i.e., evidence
that scores on a test correlate relatively highly with scores on theoretically similar
measures and relatively poorly with scores on theoretically dissimilar measures);
2. Content evidence of validity: The degree to which the test content represents the
targeted subject matter and supports a test’s use for its intended purposes; and
3. Criterion-related evidence of validity: The degree to which the test’s score correlates
with other measurable, reliable, and relevant variables (i.e., criterion) thought to
measure the same construct.
Other kinds of validity with relevance to the SSA have been advanced in the literature, but are
not completely accepted in professional standards as types of validity per se. These include
• Diagnostic validity: The degree to which psychological tests are truly aiding in the
formulation of an appropriate diagnosis.
• Ecological validity: The degree to which test scores represent everyday levels of
functioning (e.g., impact of disability on an individual’s ability to function
independently).
• Cultural validity: The degree to which test content and procedures accurately reflect
the sociocultural context of the subjects being tested.
Each of these forms of validity poses complex questions regarding the use of particular
psychological measures with the SSA population. For example, ecological validity is especially
critical in the use of psychological tests with SSA given that the focus of the assessment is on
examining everyday levels of functioning. Measures like intelligence tests have been sometimes
criticized for lacking ecological validity (Groth-Marnat, 2009; Groth-Marnat and Teal, 2000).
Alternatively, “research suggests that many neuropsychological tests have a moderate level of
ecological validity when predicting everyday cognitive functioning” (Chaytor and Schmitter-
Edgecombe, 2003, p. 181).
More recent discussions on validity have shifted toward an argument-based approach to
validity, using a variety of evidence to build a case for validity of test score interpretation (Furr
and Bacharach, 2013). In this approach, construct validity is viewed as an overarching paradigm
under which evidence is gathered from multiple sources to build a case for validity of test score
interpretation. Five key sources of validity evidence that affect the degree to which a test fulfills
its purpose are generally considered (AERA et al., 2014; Furr and Bacharach, 2013; Sireci and
Sukin, 2013):
1. Test content: Does the test content reflect the important facets of the construct being
measured? Are the test items relevant and appropriate for measuring the construct and
congruent with the purpose of testing?
2. Relation to other variables: Is there a relationship between test scores and other
criteria or constructs that are expected to be related?
3. Internal structure: Does the actual structure of the test match the theoretically based
structure of the construct?
4. Response processes: Are respondents applying the theoretical constructs or processes
the test is designed to measure?
5. Consequences of testing: What are the intended and unintended consequences of
testing?
As part of the development of any psychometrically sound measure, explicit methods and
procedures by which tasks should be administered are determined and clearly spelled out. This is
what is commonly known as standardization. Typical standardized administration procedures or
expectations include (1) a quiet, relatively distraction free environment; (2) precise reading of
scripted instructions; and (3) provision of necessary tools or stimuli. All examiners use such
methods and procedures during the process of collecting the normative data, and such procedures
normally should be used in any other administration, which enables application of normative
data to the individual being evaluated (Lezak et al., 2012).
Standardized tests provide a set of normative data (i.e., norms), or scores derived from
groups of people for whom the measure is designed (i.e., the designated population) to which an
individual’s performance can be compared. Norms consist of transformed scores such as
percentiles, cumulative percentiles, and standard scores (e.g., T-scores, Z-scores, stanines, IQs),
allowing for comparison of an individual’s test results with the designated population. Without
standardized administration, the individual’s performance may not accurately reflect his/her
ability. For example, an individual’s abilities may be overestimated if the examiner provides
more information or guidance than what is outlined in the test administration manual.
Conversely, a claimant’s abilities may be underestimated if appropriate instructions, examples,
or prompts are not presented. When nonstandardized administration techniques must be used,
norms should be used with caution due to the systematic error that may be introduced into the
testing process; this topic is discussed in detail later in the chapter.
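The transformed scores named above are related by simple arithmetic. The sketch below converts a raw score to a z-score, a T-score (conventionally mean 50, standard deviation 10), an IQ-metric standard score (conventionally mean 100, standard deviation 15), and a percentile. The norm-group mean and standard deviation are hypothetical, and the percentile step assumes that scores in the norm group are normally distributed.

```python
from statistics import NormalDist


def transform_score(raw, norm_mean, norm_sd):
    """Express a raw score relative to a norm group's mean and SD."""
    z = (raw - norm_mean) / norm_sd
    return {
        "z": z,
        "T": 50 + 10 * z,            # T-score convention: mean 50, SD 10
        "standard": 100 + 15 * z,    # IQ-metric convention: mean 100, SD 15
        # Percentile rank, assuming a normal distribution in the norm group.
        "percentile": 100 * NormalDist().cdf(z),
    }


# Hypothetical norm group: mean raw score 50, SD 10; examinee raw score 65.
result = transform_score(65, norm_mean=50, norm_sd=10)
```

Because every transformed score depends on the norm-group mean and standard deviation, any departure from standardized administration or any mismatch between the examinee and the norm group carries through to all of them.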
It is important to clearly understand the population for which a particular test is intended.
The standardization sample is another name for the norm group. Norms enable one to make
meaningful interpretations of obtained test scores, such as making predictions based on evidence.
Developing appropriate norms depends on size and representativeness of the sample. In general,
the more people in the norm group the closer the approximation to a population distribution so
long as they represent the group who will be taking the test.
Norms should be based upon representative samples of individuals from the intended test
population, as each person should have an equal chance of being in the standardization sample.
Stratified samples enable the test developer to identify particular demographic characteristics
represented in the population and more closely approximate these features in proportion to the
population. For example, intelligence test scores are often established based upon census-based
norming with proportional representation of demographic features including race and ethnic
group membership, parental education, socioeconomic status, and geographic region of the
country.
When tests are applied to individuals for whom the test was not intended and, hence,
were not included as part of the norm group, inaccurate scores and subsequent misinterpretations
may result. Tests administered to persons with disabilities often raise complex issues. Test users
sometimes use psychological tests that were not developed or normed for individuals with
disabilities. It is critical that tests used with such persons (including SSA disability claimants)
include attention to representative norming samples; when such norming samples are not
available, it is important for the assessor to note that the test or tests used are not based on
representative norming samples and the potential implications for interpretation (Turner et al.,
2001).
Performance on psychological tests often has significant implications (high stakes) in our
society. Tests are in part the gatekeepers for educational and occupational opportunities and play
a role in SSA determinations. As such, results of psychological testing may have positive or
negative consequences for an individual. Often such consequences are intended; however, there
is the possibility for unintended negative consequences. It is imperative that issues of test
fairness be addressed so no individual or group is disadvantaged in the testing process based
upon factors unrelated to the areas measured by the test. Biases simply cannot be present in these
kinds of professional determinations. Moreover, it is imperative that research demonstrates that
measures can be fairly and equivalently used with members of the various subgroups in our
population. It is important to note that there are people from many language and cultural groups
for whom there are no available tests with norms that are appropriately representative for them.
As noted above, in such cases it is important for assessors to include a statement about this
situation whenever it applies and potential implications on scores and resultant interpretation.
While all tests reflect what is valued within a particular cultural context (i.e., cultural
loading), bias refers to the presence of systematic error in the measurement of a psychological
construct. Bias leads to inaccurate test results given that scores reflect either overestimations or
underestimations of what is being measured. When bias occurs based upon culturally related
variables (e.g., race, ethnicity, social class, gender, educational level) then there is evidence of
cultural test bias (Suzuki et al., 2014).
Relevant considerations pertain to issues of equivalence in psychological testing as
characterized by the following (Suzuki et al., 2014, p. 260):
• Functional: Whether the construct being measured occurs with equal frequency across
groups;
• Conceptual: Whether the item information is familiar across groups and means the
same thing in various cultures;
• Scalar: Whether average score differences reflect the same degree, intensity, or
magnitude for different cultural groups;
• Linguistic: Whether the language used has similar meaning across groups; and
• Metric: Whether the scale measures the same behavioral qualities or characteristics
and the measure has similar psychometric properties in different cultures.
2 The brief overview presented here draws on the works of De Ayala (2009) and DeMars (2010), to which the reader
is directed for additional information.
a person’s actual score were there no error present in the assessment (and unfortunately, there is
always some error, both random and systematic). The model further assumes that all error is
random and that any correlation between error and some other variable, such as true scores, is
effectively zero (Geisinger, 2013). The approach leans heavily on reliability theory, which is
largely derived from the premises mentioned above.
Since the 1950s, and largely since the 1970s, a newer, mathematically sophisticated model has
developed, called item response theory (IRT). The premise of these IRT models is most easily
understood in the context of cognitive tests, where there is a correct answer to questions. The
simplest IRT model is based upon the notion that the answering of a question is generally based
on only two factors: the difficulty of the question and the ability level of the test-taker. Computer
adaptive testing estimates scores of the test-taker after each response to a question and adjusts
the administration of the next question accordingly. For example, if a test-taker answers a
question correctly, he or she is likely to receive a more difficult question next. If one, on the
other hand, answers incorrectly, he or she is more likely to receive an easier question, with the
“running score” held by the computer adjusted accordingly. It has been found that such
computer-adaptive tests can be very efficient.
Item response theory models have made the equating of test forms far easier. Equating
tests permits one to use different forms of the same examination with different test items to yield
fully comparable scores despite slightly different item difficulties across forms. To convert the
values of item difficulty to determine the test-taker’s ability scores one needs to have some
common items across various tests; these common items are known as anchor items. Using such
items, one can essentially establish a fixed reference group and base judgments from other
groups on these values.
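One simple way to picture the role of anchor items is a mean-shift linking under the simplest IRT model described above, in which two separately calibrated forms are assumed to differ only by a constant offset in their difficulty scales. This is a sketch of that one approach, not the procedure used by any particular testing program; the function names and difficulty values are hypothetical.

```python
from statistics import mean


def linking_constant(anchor_b_reference, anchor_b_new):
    """Shift that places the new form's scale on the reference form's scale.

    Both arguments hold difficulty estimates for the same anchor items,
    one set from each form's calibration.
    """
    return mean(anchor_b_reference) - mean(anchor_b_new)


def rescale(values_new, shift):
    """Apply the shift to difficulties (or abilities) from the new form."""
    return [v + shift for v in values_new]


# Hypothetical difficulty estimates for three anchor items on each form.
anchors_reference = [-1.0, 0.0, 1.0]
anchors_new = [-0.5, 0.5, 1.5]
shift = linking_constant(anchors_reference, anchors_new)

# A non-anchor item from the new form, expressed on the reference scale.
linked = rescale([2.0], shift)
```

Here the anchor items look half a unit harder on the new form's scale, so every value from the new form is shifted down by 0.5 to make scores from the two forms comparable.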
As noted above, there are a number of common IRT models. Among the most common
are the one-, two-, and three-parameter models. The one-parameter model is the one already
described; the only item parameter is item difficulty. A two-parameter model adds a second
parameter to the first, related to item discrimination. Item discrimination is the ability of the item
to differentiate those who lack the ability being measured from those who hold it in high degree.
Such two-parameter models are often used for tests like essay tests where one cannot achieve a high score by
guessing or using other means to answer correctly. The three-parameter IRT model contains a
third parameter, that factor related to chance level correct scoring. This parameter is sometimes
called the pseudo-guessing parameter and this model is generally used for large-scale multiple-
choice testing programs.
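The three models can be written as one function in their usual logistic form, with the simpler models obtained by fixing parameters. The symbols are conventional rather than taken from this report: theta is the test-taker's ability, b the item difficulty, a the item discrimination, and c the pseudo-guessing parameter. The numeric values are hypothetical.

```python
import math


def p_correct(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under a logistic IRT model.

    a=1, c=0 gives the one-parameter model; c=0 the two-parameter model;
    all three free gives the three-parameter model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# One-parameter: ability equal to difficulty gives an even chance.
p_1pl = p_correct(theta=0.0, b=0.0)

# Two-parameter: a more discriminating item separates abilities more sharply.
p_2pl = p_correct(theta=1.0, b=0.0, a=2.0)

# Three-parameter: a five-option multiple-choice item with c = 0.2.
p_3pl = p_correct(theta=0.0, b=0.0, c=0.2)
```

With a pseudo-guessing value of 0.2, even a test-taker of very low ability has about a one-in-five chance of answering correctly, which is why the third parameter suits multiple-choice programs and is unnecessary for formats such as essays.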
These models, because of their lessened reliance on the sampling of test-takers, are very
useful in the equating of tests, that is, the setting of scores to be equivalent regardless of the form
of the test one takes. In some high-stakes admissions tests such as the GRE, MCAT, and GMAT,
for example, forms are scored and equated by virtue of IRT methods, which can perform such
operations more efficiently and accurately than with classical statistics.
The test user is generally considered the person responsible for appropriate use of
psychological tests, including selection, administration, interpretation, and use of results (AERA
et al., 2014). Test user qualifications include attention to the purchase of psychological measures
that specify levels of training, educational degree, areas of knowledge within domain of
assessment (e.g., ethical administration, scoring, and interpretation of clinical assessment),
In accordance with the Standards for Educational and Psychological Testing (AERA et
al., 2014) and the APA’s Guidelines for Test User Qualifications (Turner et al., 2001), many
publishers of psychological tests employ a tiered system of qualification levels (generally A, B,
C) required for the purchase, administration, and interpretation of such tests (e.g., PAR, n.d.;
Pearson Education, 2015). Many instruments, such as those discussed throughout this report,
would be considered qualification level C assessment methods, generally requiring an advanced
degree, specialized psychometric and measurement knowledge, and formal training in
administration, scoring, and interpretation. However, some may have less stringent requirements,
for example, a bachelor’s or master’s degree in a related field and specialized training in
psychometric assessment (often classified level B and/or S), or no special requirements (often
classified level A) for purchase and use. While such categories serve as a general guide for
necessary qualifications, individual test manuals provide additional detail and specific
qualifications necessary for administration, scoring, and interpretation of the test or measure.
Given the need for the use of standardized procedures, any person administering
cognitive or neuropsychological measures must be well trained in standardized administration
protocols. He or she should possess the interpersonal skills necessary to build rapport with the
individual being tested in order to foster cooperation and maximal effort during testing.
Additionally, individuals administering tests should understand important psychometric
properties, including validity and reliability, as well as factors that could emerge during testing to
place either at risk. Many doctoral-level psychologists are well-trained in test administration; in
general, psychologists from clinical, counseling, school, or educational graduate psychology
programs receive training in psychological test administration. For cases in which cognitive
deficits are being evaluated, a neuropsychologist may be needed to most accurately evaluate
cognitive functioning (see Chapter 5 for a more detailed discussion on administration and
interpretation of cognitive tests). The use of non-doctoral level psychometrists or technicians in
psychological and neuropsychological test administration and scoring is also a widely accepted
standard of practice (APA, 2010; Brandt and van Gorp, 1999; Pearson, 2015). Psychometrists are
often bachelor’s- or master’s-level individuals who have received additional specialized training in
standardized test administration and scoring. They do not practice independently or interpret test
scores, but rather work under the close supervision and direction of doctoral-level clinical
psychologists or neuropsychologists.
Interpretation of testing results requires a higher degree of clinical training than
administration alone. Threats to the validity of any psychological measure of a self-report nature
oblige the test interpreter to understand the test and principles of test construction. In fact,
interpreting test results without such knowledge would violate the ethics code established for
the profession of psychology (APA, 2010). SSA requires psychological testing be “individually
administered by a qualified specialist… currently licensed or certified in the State to administer,
score, and interpret psychological tests and have the training and experience to perform the test”
(SSA, n.d.). Most doctoral-level clinical psychologists who have been trained in psychometric
test administration are also trained in test interpretation. SSA (n.d.-a) also requires individuals
who administer more specific cognitive or neuropsychological evaluations “be properly trained
in this area of neuroscience.” As such, clinical neuropsychologists—individuals who have been
specifically trained to interpret testing results within the framework of brain-behavior
relationships and who have achieved certain educational and training benchmarks as delineated
by national professional organizations—may be required to interpret tests of a cognitive nature
(AACN, 2007; NAN, 2001).
As noted in Chapter 2, SSA indicates that objective medical evidence may include the
results of standardized psychological tests. Given the great variety of psychological tests, some
are more objective than others. Whether a psychological test is appropriately considered
objective has much to do with the process of scoring. For example, unstructured measures that
call for open-ended responding rely on professional judgment and interpretation in scoring; thus,
such measures are considered less than objective. In contrast, standardized psychological tests
and measures, such as those discussed in the ensuing chapters, are structured and objectively
scored. In the case of non-cognitive self-report measures, the respondent generally answers
questions regarding typical behavior by choosing from a set of predetermined answers. With
cognitive tests, the respondent answers questions or solves problems, which usually have correct
answers, as well as he or she possibly can. Such measures generally provide a set of normative
data (i.e., norms), or scores derived from groups of people for whom the measure is designed
(i.e., the designated population) to which an individual’s responses or performance can be
compared. Therefore, standardized psychological tests and measures rely less on clinical
judgment and are considered to be more objective than those that depend on subjective scoring.
Unlike measurements such as weight or blood pressure, standardized psychological tests require
the individual’s cooperation with respect to self-report or performance on a task. The inclusion
of validity testing, which will be discussed further in Chapters 4 and 5, in the test or test battery
allows for greater confidence in the test results. Standardized psychological tests that are
appropriately administered and interpreted can be considered objective evidence.
The use of psychological tests in disability determinations has critical implications for
clients. As noted earlier, issues surrounding ecological validity (i.e., whether test performance
accurately reflects real-world behavior) are of primary importance in SSA determinations. Two
approaches have been identified in relation to the ecological validity of neuropsychological
assessment. The first focuses on “how well the test captures the essence of everyday cognitive
skills” in order to “identify people who have difficulty performing real-world tasks, regardless of
the etiology of the problem” (i.e., verisimilitude), and the second “relates performance on
traditional neuropsychological tests to measures of real-world functioning, such as employment
status, questionnaires, or clinician ratings” (i.e., veridicality) (Chaytor and Schmitter-
Edgecombe, 2003, pp. 182–183). Establishing ecological validity is a complicated endeavor
given the potential effect of non-cognitive factors (e.g., emotional, physical, and environmental)
on test and everyday performance. Specific concerns regarding test performance include the
following: (1) the test environment is often not representative (i.e., it is artificial); (2) testing
yields only samples of behavior that may fluctuate depending upon context; and (3) clients may
possess compensatory strategies that cannot be employed in the testing situation, so that
obtained scores underestimate the test-taker’s abilities.
Activities of daily living (ADLs) and the client’s likelihood of returning to work are
important considerations in disability determinations. Occupational status, however, is complex
and often multidetermined, requiring that psychological test data be complemented with other
sources of information in the evaluation process (e.g., observation, informant ratings,
environmental assessments) (Chaytor and Schmitter-Edgecombe, 2003). Table 3-1 highlights
major mental disorders, relevant types of psychological measures, and domains of functioning.
TABLE 3-1 Listings for Mental Disorders and Types of Psychological Tests
Mental Disorder | Psychological Assessment Measures and Methods | Relevant Cognitive Domains of Functioning | Psychiatric Symptoms (per SSA [n.d.] Listings)
Recurrent obsessions or
compulsions which are a source
of marked distress
Unrealistic interpretation of
physical signs or sensations
associated with the
preoccupation or belief that one
has a serious disease or injury
The SSA uses a standard assessment that examines functioning in four domains:
understanding and memory, sustained concentration and persistence, social interaction, and
adaptation. Psychological testing may play a key role in understanding a client’s functioning in
each of these areas. Box 3-1 describes ways in which these four areas of core mental residual
functional capacity are assessed ecologically. Psychological assessments often address these
areas in a more structured manner through interviews, standardized measures, checklists,
observations and other assessment procedures.
BOX 3-1
Descriptions of Tests by Four Areas of Core Mental Residual Functional Capacity*
This chapter has identified some of the basic foundations underlying the use of
psychological tests including basic psychometric principles and issues regarding test fairness.
Applications of tests can inform disability determinations. The next two chapters build on this
overview, examining the types of psychological tests that may be useful in this process,
including a review of selected individual tests that have been developed for measuring validity of
presentation. Chapter 4 focuses on non-cognitive, self-report measures and symptom validity
tests. Chapter 5 then focuses on cognitive tests and associated performance validity tests.
Strengths and limitations of various instruments are offered in order to explore the
relevance of different types of tests to different claims, by category of disorder, with a focus
on establishing the validity of the client’s claim.
REFERENCES
Groth-Marnat, G., and M. Teal. 2000. Block design as a measure of everyday spatial ability: A
study of ecological validity. Perceptual and Motor Skills 90(2):522-526.
Hambleton, R. K., and M. J. Pitoniak. 2006. Setting performance standards. Educational
Measurement 4:433-470.
ITC (International Test Commission). 2005. ITC guidelines for translating and adapting tests.
Geneva, Switzerland: ITC.
Kline, P. 2000. The handbook of psychological testing. 2nd ed. New York: Routledge.
Puente, A. E., and A. V. Agranovich. 2004. The cultural in cross-cultural neuropsychology. In
Comprehensive handbook of psychological assessment. Vol. 1 of Intellectual and
neuropsychological assessment, edited by M. Hersen (editor), and G. Goldstein and S. R.
Beers (volume editors). Hoboken, NJ: John Wiley & Sons. Pp. 321-332.
Sattler, J. M. 2014. Foundations of behavioral, social, and clinical assessment of children. 6th
ed. La Mesa, CA: Jerome M. Sattler, Publisher, Inc.
Sharland, M. J., and J. D. Gfeller. 2007. A survey of neuropsychologists’ beliefs and practices
with respect to the assessment of effort. Archives of Clinical Neuropsychology 22(2):213-
223.
Sireci, S. G., and T. Sukin. 2013. Test validity. In APA handbook of testing and assessment in
psychology. Vol. 1, edited by K. F. Geisinger (editor), and B. A. Bracken, J. F. Carlson, J.
C. Hansen, N. R. Kuncel, S. P. Reise, and M. C. Rodriguez (associate editors).
Washington, DC: APA.
SSA (Social Security Administration). n.d. Disability evaluation under Social Security: 12.00
mental disorders - adult. http://www.ssa.gov/disability/professionals/bluebook/12.00-MentalDisorders-Adult.htm
(accessed November 14, 2014).
Suzuki, L. A., S. Naqvi, and J. S. Hill. 2014. Assessing intelligence in a cultural context. In APA
handbook of multicultural psychology. Vol. 1, edited by F. T. L. Leong, L. Comas-Diaz,
G. C. Nagayama Hall, V. C. McLoyd, and J. E. Trimble. Washington, DC: APA.
Trimble, J. E. 2010. Cultural measurement equivalence. In Encyclopedia of cross-cultural school
psychology. New York: Springer. Pp. 316-318.
Turner, S. M., S. T. DeMers, H. R. Fox, and G. Reed. 2001. APA’s guidelines for test user
qualifications: An executive summary. American Psychologist 56(12):1099.
Weiner, I. B. 2003. The assessment process. In Handbook of psychology, edited by I. B. Weiner.
Hoboken, NJ: John Wiley & Sons.
Allegations of disability are sometimes made on the basis of self-report, with few, if any,
medical signs or laboratory findings to substantiate such claims. Often in these cases a medical
source or consultative examiner may corroborate a claimant’s history and allegations, finding
them consistent with a medically determinable impairment that causes a particular level of
functional limitation; however, the claim is still based primarily on self-report. Currently, such
evidence may be deemed sufficient to grant disability benefits, albeit via a somewhat
inconsistent process that varies from one state to another. A more systematic approach to
assessing and verifying such claims would improve the consistency and reliability of the
determination process in these cases.
To receive benefits, applicants must prove the existence of a medically determinable
physical or mental impairment and associated functional limitations that result in an inability to
engage in any substantial gainful activity. The Social Security Administration (SSA) (n.d.-b) defines
a medically determinable impairment (MDI) as
an impairment that results from anatomical, physiological, or psychological
abnormalities which can be shown by medically acceptable clinical and laboratory
diagnostic techniques …[and] must be established by medical evidence consisting
of signs, symptoms, and laboratory findings—not only by the individual's
statement of symptoms.
PREPUBLICATION COPY: UNCORRECTED PROOFS
BOX 4-1
SSA DEFINITIONS OF SYMPTOMS, SIGNS, AND LABORATORY FINDINGS
For claims based entirely on self-report, it is important to use a systematic method for
identifying and documenting a medically determinable impairment and assessing the severity of
associated functional limitations. A variety of standardized self-report measures exist that could
further systematize SSA’s disability determination process. Before delving into such measures, it
is important to briefly address the distinction between self-report of symptoms and self-report
measures. As noted above, SSA defines symptoms as “the claimant’s own description of [his or
her] physical or mental impairment, [which] alone are not enough to establish that there is a
physical or mental impairment” (20 CFR § 404.1528). In some cases, such as with children,
symptoms may be reported by a third party, for example, a parent or a teacher. The committee
refers to this as self-report of symptoms. Alternatively, there exist standardized instruments that
rely on self-report (for example, of symptoms, behaviors, personality characteristics and/or traits,
interests, values, and attitudes) with population-based normative data that allow the examiner to
compare an individual’s reported behaviors or symptoms with an appropriate comparison group
(e.g., those of the same age group, sex, education level, and/or race/ethnicity). According to SSA
regulations, such instruments may be considered medically acceptable laboratory diagnostic
techniques, and thus provide signs and laboratory findings that corroborate the claimant’s self-
report of symptoms. The committee refers to these instruments as self-report measures.
SELF-REPORT MEASURES AND SYMPTOM VALIDITY TESTS
Among these self-report measures are those that traditionally have been referred to as
psychological tests, such as personality, multiscale, or single syndrome inventories and
standardized psychiatric diagnostic interviews. These measures generally assess non-cognitive
psychological complaints, and are therefore referred to as non-cognitive measures.1 However, it
is also important to note that some standardized self-report measures that might be useful to SSA
in such cases are not considered psychological tests or measures. Examples may include
standardized measures of pain, fatigue, sleep, or adaptive living. Some of these may contain
internal validity measures, and indeed may be useful to SSA in the disability determination
process; however, these measures are considered outside the scope of the committee and this
report. Figure 4-1 delineates between psychological (or non-cognitive) self-report measures and
nonpsychological self-report measures.
1
Note that when the committee refers to non-cognitive measures, it is referring to standardized psychological self-
report measures.
In the realm of disability evaluation, the committee identified two primary areas of
impairment in which psychological self-report measures may prove beneficial to SSA disability
determinations: mental disorders and somatic symptoms disproportionate to demonstrable
medical morbidity. Each of these is discussed in turn, followed by a discussion of the ability of
psychological self-report measures to provide useful information in confirming a medically
determinable impairment and assessing functional capacity in these areas. A variety of non-
cognitive measures, such as multiscale personality measures, disorder-specific inventories, and
standardized diagnostic interviews, are provided as illustrative examples, and not an
endorsement of any specific test.
Mental Disorders
Within its mental health listings, SSA (n.d.-a) identifies nine diagnostic categories (see
Chapter 3, Table 1). Of these nine, the committee identified five categories for which non-
cognitive measures may provide useful information: (1) schizophrenic, paranoid, and other
psychotic disorders; (2) affective disorders; (3) anxiety-related disorders; (4) personality
disorders; and (5) somatoform disorders.2 Box 4-2 contains the SSA descriptions of each of the
first four mental disorders categories.
2
Though somatoform disorders are included in the SSA mental health listings, the committee focuses on this in the
next section on disproportionate somatic symptoms, alongside multisystem illnesses and chronic idiopathic pain
conditions.
BOX 4-2
SSA Definitions of Relevant Mental Disorders
BOX 4-3
Definitions of Relevant Disorders with Disproportionate Somatic Symptoms
Somatoform disorders (a): Physical symptoms for which there are no demonstrable organic findings or known physiological mechanisms.
Multisystem illnesses (b): Characterized by multiple, widespread, nonspecific, often diffuse symptoms that involve several different organ systems and anatomical locations, for which no consistent biochemical, anatomical, or physiological abnormality can be demonstrated. Hence the medical and psychiatric status of these conditions remains unclear.
Chronic idiopathic pain conditions (c): The only or predominant symptom is bodily pain, most commonly musculoskeletal pain, that is disproportionate to (incompletely explained by) tissue injury or disease.
(a) American Psychiatric Association, 2013
(b) Barsky and Borus, 1999; Henningsen et al., 2007
(c) Vranceanu et al., 2009
assessment, the MMPI-2-RF, comprises 338 items that are part of 51 different scales, and was
normed on a U.S. population (n = 2227) of men and women ages 18–80. Other widely used
multiscale inventories include the Millon Clinical Multiaxial Inventory (MCMI-III) (Millon et
al., 2009) and the Personality Assessment Inventory (PAI) (Morey, 2007). The MCMI-III is a
175-item test normed largely on individuals seeking psychiatric services. The PAI contains 344
items and was developed on a U.S. normative sample of 1,000 adults matched to the census;
additionally, 1,265 patients and 1,051 college students completed the test in the standardization
process.
Standardized psychiatric diagnostic schedules, interviews, and inventories may also
provide scientific medical findings across a broad range of psychiatric symptoms and diagnoses.
The Symptom Checklist-90-Revised (SCL-90-R) (Derogatis, 1994), a broad-based measure
designed for individuals 13 years and older, contains a list of symptoms commonly associated
with psychological difficulties and psychiatric disorders. Written at a sixth-grade level, the test
measures nine primary symptom dimensions (i.e., somatization, obsessive-compulsive disorder,
interpersonal sensitivity, depression, anxiety, hostility, phobic anxiety, paranoid ideation, and
psychoticism), assessing symptom presence, frequency, and severity over a 1-week period.
There is also a 53-item version of the scale, the Brief Symptom Inventory (BSI)
(Derogatis and Spencer, 1993). Designed specifically to measure subjective symptom report, the
SCL-90-R has separate norms for nonpatient adults, adult psychiatric outpatients, adult
psychiatric inpatients, and nonpatient adolescents. Some reviewers suggest that this instrument is
best used to screen for global psychological distress, as the individual symptom dimensions have
not always been identified in studies examining the psychometric properties of the scale. Another
broad symptom inventory, the Patient Health Questionnaire (PHQ) (Spitzer et al., 1999), was
developed for use in primary care settings and normed against this population. From the original
test, scales to measure symptoms of depression (PHQ-9), anxiety (GAD-7), and somatic
symptom severity (PHQ-15) have been constructed, along with a derivative scale, the PHQ-SADS,
which measures the convergence of psychiatric symptoms often seen in primary care patients:
depression, anxiety, and somatic complaints.
Many disorder-specific scales, such as the Beck Depression Inventory, second edition
(Beck et al., 1996), Hamilton Depression Rating Scale (Hamilton, 1980), Beck Anxiety
Inventory (Beck and Steer, 1993), and Posttraumatic Stress Disorder (PTSD) Checklist (Weathers
et al., 1994), may also provide medical evidence to corroborate patients’ identification and report
of symptoms.
Confirming the diagnosis of disproportionate somatic symptoms may be more difficult,
as the first step involves ruling out the presence of demonstrable anatomical, biochemical, or
physiological abnormalities as the sole cause for symptom presentation and severity. Note that
this step does not rule out the existence of such abnormalities; it establishes only that reported
symptom severity is disproportionate to the diagnosis. Additionally, the lack of a medical explanation does not
automatically equal a psychiatric diagnosis (American Psychiatric Association, 2013). There are
a variety of self-report questionnaires to assess somatization and somatoform disorders, which
examine the number, nature, intensity, persistence, and severity of physical symptoms. These
instruments include the PHQ-15, the somatization subscale of the SCL-90-R, the Somatic
Symptom Inventory (SSI), and the MMPI-2-RF. There are also several structured diagnostic
interviews containing modules for diagnosing somatoform disorders, including the Composite
International Diagnostic Interview (CIDI) (WHO, 1993), Structured Clinical Interview for DSM
(SCID) (First et al., 2012; Gibbon et al., 1997), the Mini International Neuropsychiatric
Interview (MINI) (Sheehan et al., 1998), and the Schedule for Clinical Assessment in
Neuropsychiatry (SCAN) (Wing et al., 1990).
There are a great many self-report inventories for assessing the severity, character,
location, and chronicity of pain; the nonpsychological nature of such measures places them
outside of the committee’s scope. However, there are non-cognitive measures that are used to
identify and assess psychological factors related to pain, such as the Pain Patient Profile (P-3)
(Tollison and Langley, 1995), which comprises three clinical scales measuring depression,
anxiety, and somatization.
The second criterion in disability determinations is the impact of the medically
determinable impairment on the applicant’s ability to function in a work setting, what SSA refers
to as the Paragraph B criteria. In the realm of mental disorders, SSA currently assesses
functioning in four categories: (1) activities of daily living (ADLs); (2) social functioning; (3)
concentration, persistence, or pace; and (4) episodes of decompensation. However, SSA (2010)
published a Notice of Proposed Rulemaking (NPRM)4 for its mental disorders listings, which,
among other changes, would alter the functional categories on which disability determinations
would be based, increasing focus on the relation of functioning to the work setting. Proposed
functional domains in the NPRM are the abilities to: (1) understand, remember, and apply
information; (2) interact with others; (3) concentrate, persist, and maintain pace; and (4) manage
oneself.5 Definitions of each of these domains are presented in Box 4-4. With SSA’s move in this
direction and the greater focus on functional abilities as they relate to work, the committee will
examine the relevance of psychological self-report measures to the proposed functional domains.
Although non-cognitive assessments do not provide direct evidence of functional
capacity, information obtained from these measures allows for the corroboration of symptoms as
presented, which can lead to greater diagnostic accuracy. For example, self-report instruments
allow for a standardized method of obtaining information that is normed against other clinical
and nonclinical groups, adding to the ability of a clinician to offer accurate diagnoses. In
addition, some of these instruments have validity scales, which measure test-taking strategies, as
discussed in detail below. Understanding these presentation approaches (i.e., over- or
underreporting of symptoms) is helpful in identifying conditions accurately. An accurate
diagnosis, in turn, improves the ability to generate accurate prognostic indicators and
thereby to discern the chronicity of the conditions presented.
4
Public comments are still under review and a final rule has yet to be published as of the publication of this report.
5
These proposed domains align closely with the recommendations of the Mental Cognitive Subcommittee of the
Occupational Information Development Advisory Panel (OIDAP), which conceptualized psychological abilities
essential to work in four categories: (1) neurocognitive functioning; (2) initiative and persistence; (3) interpersonal
functioning; and (4) self-management. Note that with this first category, neurocognitive functioning, the Mental
Cognitive Subcommittee’s recommendation goes into greater detail; this will be discussed further in the following
chapter, which focuses on cognitive testing. The Mental Cognitive Subcommittee was assembled to advise the
OIDAP about what psychological abilities of disability applicants should be included in the Content Model and
Classification Recommendations made to SSA.
BOX 4-4
SSA Proposed Functional Domains
Understand, remember, and apply information: The ability to acquire, retain, integrate, access, and use information to perform work activities. You use this mental ability when, for example, you follow instructions, provide explanations, and identify and solve problems.
Interact with others: The ability to relate to and work with supervisors, co-workers, and the public. You use this mental ability when, for example, you cooperate, handle conflicts, and respond to requests, suggestions, and criticism.
Concentrate, persist, and maintain pace: The ability to focus attention on work activities and to stay on task at a sustained rate. You use this mental ability when, for example, you concentrate, avoid distractions, initiate and complete activities, perform tasks at an appropriate and consistent speed, and sustain an ordinary routine.
Manage oneself: The ability to regulate your emotions, control your behavior, and maintain your well-being in a work setting. You use this mental ability when, for example, you cope with your frustration and stress, respond to demands and changes in your environment, protect yourself from harm and exploitation by others, inhibit inappropriate actions, take your medications, and maintain your physical health, hygiene, and grooming.
non-cognitive measures require that the individual be able to complete a self-report inventory, a
task that requires reading and responding to a list of dichotomous (True/False) or Likert scale
items. To complete a task like this, one must have the ability to attend, read, comprehend, and
respond to a series of items. For example, the MMPI-2-RF was developed with a fifth-grade
reading level, while the MCMI-III and the PAI both require an eighth-grade reading level.
Although some tests have alternative methods of administration (e.g., standardized audio tape
administration, computerized administration), ensuring that the examinee is able to understand
information at a content level equivalent to the items on the test and has the capacity to attend to
and respond to items is generally recommended. In addition, the individual’s capacity to
work on the task under conditions similar to those used in developing the normative data must be
considered. It is also generally recommended that the examiner consider the examinee’s language
and administer a test that has been translated and normed in that language.
SSA requires that psychological testing be “individually administered by a qualified
specialist,” defining qualified as “currently licensed or certified in the state to administer, score,
and interpret psychological tests and have the training and experience to perform the test” (SSA,
n.d.-a). It is important to note here, as discussed in Chapter 3, the different qualification levels
that may be necessary for administration and interpretation. It is common practice for
psychometrists or technicians with specialized training to administer and score psychological
tests under the close supervision and direction of doctoral-level clinical psychologists.
Interpretation of testing results requires a higher degree of clinical training than administration
alone. Most psychological tests require interpretation by doctoral-level psychologists with a high
level of expertise in psychometric test administration and interpretation.6 Threats to the validity
of any psychological measure of a self-report nature oblige the test interpreter to understand the
test and principles of test construction. In fact, interpreting test results without such knowledge
would violate the ethics code established for the profession of psychology (APA, 2010). Finally,
it is important for the person interpreting the test results to address in the assessment report the
reliability and validity of test scores and test norms relative to the individual being assessed.
6
These are commonly referred to as level C tests. Some tests have less stringent qualifications (level B) or no
special qualifications (level A) necessary for purchase, administration, and interpretation. See Chapter 3 for
additional information on different qualification levels.
based upon normative data on the scale. Norms may be based upon nationally representative
samples or subpopulations of relevance to the particular patient concern. For example, the
MMPI-2-RF contains a validity scale that compares reports of emotional distress and psychiatric
illness with psychiatric populations (i.e., Infrequent Psychopathology Responses [Fp-r]) and
another that compares reporting of somatic complaints with medical patient populations (i.e.,
Infrequent Somatic Responses [Fs]). Norms may also include specific diagnostic groups that
illuminate particular profiles on the test that may be indicative of a particular diagnosis. Cutoff
scores are established to identify the presence of a response set that is either incongruent with
known diagnoses or suggestive of responding employing an alternative response set (e.g.,
overendorsement of symptoms). Such response sets are commonly seen as invalid and dependent
on the test. The examiner interprets the scale(s) using clinical judgment, taking into
consideration the referral questions, history of the examinee, and context of the evaluation.
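The role of cutoff scores described above can be sketched in a few lines. The example below flags validity scales whose T-scores meet or exceed a cutoff. The function name and all numeric values are hypothetical: the cutoffs shown are placeholders for illustration only and are not the published cutoffs for the MMPI-2-RF or any other instrument, which are given in the relevant test manual. Consistent with the text, a flag is an input to the examiner's clinical judgment and not a determination in itself.

```python
def flag_validity_scales(t_scores, cutoffs):
    """Return the validity scales whose T-score meets or exceeds its cutoff.

    t_scores: mapping of scale name to the examinee's T-score.
    cutoffs:  mapping of scale name to the cutoff for that scale.
    Scales without a listed cutoff are never flagged.
    """
    flagged = []
    for scale, score in t_scores.items():
        cutoff = cutoffs.get(scale)
        if cutoff is not None and score >= cutoff:
            flagged.append(scale)
    return flagged


# Placeholder cutoffs for illustration only; NOT values from any test manual.
HYPOTHETICAL_CUTOFFS = {"F-r": 100, "Fp-r": 90, "Fs": 90}

# Hypothetical examinee profile.
example_profile = {"F-r": 85, "Fp-r": 95, "Fs": 70}

flagged = flag_validity_scales(example_profile, HYPOTHETICAL_CUTOFFS)
print("Scales at or above cutoff:", flagged)
```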
Types of SVTs
Many SVTs are scales within larger personality or multiscale inventories assessing test-taker
response styles used in completing the battery. These scales may be designed as validity scales
and embedded in the inventory, or derived later from existing items and scales based on typical
response patterns, including those of specific populations. For example, each of the personality measures discussed
earlier in this chapter (i.e., MMPI-2-RF, MCMI-III, and PAI) contains validity scales that
examine consistency of response, negative self-presentation, and positive self-presentation to
varying degrees. Box 4-5 lists the negative self-presentation SVTs included in each of these
measures.
BOX 4-5
Embedded/Derived SVTs for Negative Self-Presentation
MMPI-2-RF (a)
Infrequent Responses (F-r): Overreporting across psychological, cognitive, and somatic dimensions (as compared with general population)
Infrequent Psychopathology Responses (Fp-r): Overreporting of emotional distress, psychiatric illness (as compared with psychiatric populations)
Infrequent Somatic Responses (Fs): Overreporting of somatic complaints (as compared with medical patient populations)
Symptom Validity (FBS-r): Overreporting of somatic and cognitive complaints
Response Bias (RBS): Overreporting of memory complaints
Henry-Heilbronner Index (b): Physical symptom exaggeration (empirically derived from existing scales; for use with personal injury litigants and disability claimants)
Malingered Mood Disorder Scale (c): Exaggeration of emotional disturbance (empirically derived from existing scales; for use with personal injury litigants and disability claimants)
MCMI-III (d)
Validity (V): Improbable symptoms; may measure confusion, difficulties reading and understanding items or responding in a random fashion
Disclosure (X): Acknowledgement of difficulties and willingness to present with symptoms
Debasement (Z): Tendency to present symptoms in an accentuated fashion
PAI (e)
Infrequency (INF): Statistically unlikely response patterns in items that have low rates of endorsement and high rates of endorsement
Negative Impression (NIM): Rare symptoms and those that are not reported by many respondents
Malingering Index (MAL): Unlikely patterns; features that are more likely to be found in persons simulating mental disorders than in clinical patients
Rogers Discriminant Function (RDF): A statistically determined method that distinguishes simulators from those who were responding honestly
(a) Ben-Porath et al., 2008.
(b) Henry et al., 2013.
(c) Henry et al., 2008.
(d) Millon et al., 2009.
(e) Morey, 2007.
Though fewer in number, stand-alone SVTs also exist to assess potential exaggeration or
feigning of psychological and neuropsychological symptoms. These include a number of
structured interviews, such as the Structured Interview of Reported Symptoms (Rogers et al.,
1992), the Structured Inventory of Malingered Symptomatology (Widows and Smith, 2005), and
the Miller Forensic Assessment of Symptom Test (Miller, 2001). Like the embedded/derived
measures, these SVTs examine accuracy of symptom report in a variety of ways. As this is their
sole purpose, they are often used in conjunction with other measures that do not contain tests of
validity. Box 4-6 lists the scales related to negative self-presentation in stand-alone SVTs.
BOX 4-6
Stand-Alone SVTs for Negative Self-Presentation
As suggested above, there are a number of allegations that may warrant the
administration of non-cognitive tests. Such allegations generally fall in two broad categories:
mental disorders and disorders with somatic complaints that are disproportionate to demonstrable
medical morbidity. Mental disorders include schizophrenic, paranoid, and other psychotic
disorders; affective disorders; anxiety-related disorders; and personality disorders. It is important
to note that some of these conditions may also include cognitive complaints, in which case
cognitive testing (discussed in Chapter 5) may be more appropriate. Disorders with somatic
complaints that are disproportionate to demonstrable medical morbidity include somatoform
disorders, multisystem illnesses (e.g., chronic fatigue syndrome, repetitive strain injury, chronic
Lyme disease), and chronic idiopathic pain conditions (e.g., fibromyalgia, carpal tunnel
syndrome).
The committee concludes that the use of standardized non-cognitive psychological
measures is essential to the determination of all cases in which an applicant’s allegation of non-
cognitive functional impairment meets three requirements:
• The applicant alleges a mental disorder (i.e., schizophrenic, paranoid, and other
psychotic disorders; affective disorders; anxiety-related disorders; and personality
disorders) unaccompanied by cognitive complaints or a disorder with somatic
symptoms that are disproportionate to demonstrable medical morbidity (i.e.,
somatoform disorders, multisystem illnesses, and chronic idiopathic pain conditions).
• The presence and severity of impairment and associated functional limitations are
based largely on applicant self-report.
• Objective medical evidence or longitudinal medical records sufficient to make a
disability determination do not accompany the claim.
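The three requirements above operate jointly: all three must hold for the committee's conclusion to apply. As a minimal sketch of that logic, the example below encodes the requirements as a simple predicate. The class and field names are hypothetical and introduced only for illustration; they do not correspond to any SSA data system, and judging whether each condition is met remains a matter for the adjudicator.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    # Alleges a mental disorder unaccompanied by cognitive complaints, or a
    # disorder with somatic symptoms disproportionate to medical morbidity.
    alleges_qualifying_disorder: bool
    # Presence and severity of impairment rest largely on applicant self-report.
    based_largely_on_self_report: bool
    # Objective medical evidence or longitudinal records sufficient for a
    # determination accompany the claim.
    has_sufficient_objective_evidence: bool


def noncognitive_testing_indicated(claim):
    """Return True only when all three requirements are met."""
    return (
        claim.alleges_qualifying_disorder
        and claim.based_largely_on_self_report
        and not claim.has_sufficient_objective_evidence
    )


# Hypothetical claims for illustration.
claim_a = Claim(True, True, False)   # all three requirements met
claim_b = Claim(True, True, True)    # sufficient objective evidence exists

print(noncognitive_testing_indicated(claim_a))
print(noncognitive_testing_indicated(claim_b))
```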
REFERENCES
American Psychiatric Association. 2013. The diagnostic and statistical manual of mental disorders:
DSM-5. Washington, DC: American Psychiatric Association.
APA (American Psychological Association). 2010. Ethical principles of psychologists and code of
conduct. http://www.apa.org/ethics/code/ (accessed March 9, 2015).
Barsky, A. J., and J. F. Borus. 1999. Functional somatic syndromes. Annals of Internal Medicine
130(11):12.
Beck, A., and R. Steer. 1993. Beck anxiety inventory manual. San Antonio, TX: Harcourt Brace &
Company.
Beck, A. T., R. Steer, and G. Brown. 1996. Beck depression inventory. 2nd ed. San Antonio, TX: The
Psychological Corporation.
Ben-Porath, Y. S., A. Tellegen, and N. Pearson. 2008. MMPI-2-RF: Manual for administration, scoring
and interpretation. Minneapolis, MN: University of Minnesota Press.
Bigler, E. D. 2012. Symptom validity testing, effort, and neuropsychological assessment. Journal of the
International Neuropsychological Society 18(4):632-642.
Bigler, E. D. 2014. Use of symptom validity tests and performance validity tests in disability
determinations. Paper commissioned by the Committee on Psychological Testing, Including
Validity Testing, for Social Security Administration Disability Determinations.
http://www.iom.edu/psychtestingpaperEB (accessed April 9, 2015).
Bush, S. S., R. M. Ruff, A. I. Troster, J. T. Barth, S. P. Koffler, N. H. Pliskin, C. R. Reynolds, and C. H.
Silver. 2005. Symptom validity assessment: Practice issues and medical necessity NAN policy
and planning committee. Archives of Clinical Neuropsychology 20(4):419-426.
Bush, S. S., R. L. Heilbronner, and R. M. Ruff. 2014. Psychological assessment of symptom and
performance validity, response bias, and malingering: Official position of the Association for
Scientific Advancement in Psychological Injury and Law. Psychological Injury and Law
7(3):197-205.
Butcher, J. N., W. Dahlstrom, J. Graham, A. Tellegen, and B. Kaemmer. 1989. MMPI-2: Manual for
administration and scoring. Minneapolis, MN: University of Minnesota Press.
Derogatis, L. 1994. SCL-90-R: Symptom checklist-90-R. Minneapolis, MN: Pearson.
Derogatis, L. R., and P. Spencer. 1993. Brief symptom inventory: BSI. Minneapolis, MN: Pearson.
First, M. B., R. L. Spitzer, M. Gibbon, and J. B. Williams. 2012. Structured clinical interview for DSM-IV
axis I disorders (SCID-I), clinician version, administration booklet. Arlington, VA: American
Psychiatric Publishing.
Gibbon, M., R. L. Spitzer, and M. B. First. 1997. User’s guide for the structured clinical interview for
DSM-IV axis II personality disorders: SCID-II. Arlington, VA: American Psychiatric Publishing.
Hamilton, M. 1980. Rating depressive patients. Journal of Clinical Psychiatry 41(12):21-24.
Hathaway, S. R., and J. C. McKinley. 1940. A multiphasic personality schedule (Minnesota): I.
Construction of the schedule. Journal of Psychology 10:249-254.
Hathaway, S. R., and J. C. McKinley. 1943. Manual for the Minnesota Multiphasic Personality Inventory.
New York: The Psychological Corporation.
Heilbronner, R. L., J. J. Sweet, J. E. Morgan, G. J. Larrabee, S. R. Millis, and Conference participants.
2009. American Academy of Clinical Neuropsychology consensus conference statement on the
neuropsychological assessment of effort, response bias, and malingering. Clinical
Neuropsychologist 23(7):1093-1129.
Henningsen, P., S. Zipfel, and W. Herzog. 2007. Management of functional somatic syndromes. Lancet
369(9565):946-955.
Henry, G. K., R. L. Heilbronner, W. Mittenberg, C. Enders, and D. M. Roberts. 2008. Empirical
derivation of a new MMPI-2 scale for identifying probable malingering in personal injury
litigants and disability claimants: The 15-item malingered mood disorder scale (MMDS). Clinical
Neuropsychologist 22(1):158-168.
Henry, G. K., R. L. Heilbronner, J. Algina, and Y. Kaya. 2013. Derivation of the MMPI-2-RF Henry-
Heilbronner Index-r (HHI-r) scale. Clinical Neuropsychologist 27(3):509-515.
Larrabee, G. J. 2012. Performance validity and symptom validity in neuropsychological assessment.
Journal of the International Neuropsychological Society 18(4):625-630.
Larrabee, G. J. 2014. Performance and Symptom Validity. Presentation to IOM Committee on
Psychological Testing, Including Symptom Validity Assessment, for Social Security
Administration: Meeting 2, June 25, 2014, Washington, DC.
Miller, H. A. 2001. M-FAST: Miller forensic assessment of symptoms test professional manual. Odessa,
FL: Psychological Assessment Resources.
Millon, T., C. Millon, R. D. Davis, and S. Grossman. 2009. Millon clinical multiaxial inventory-III
(MCMI-III) manual. San Antonio, TX: Pearson/PsychCorp.
Morey, L. C. 2007. Personality assessment inventory. Odessa, FL: Psychological Assessment Resources.
Pearson Education. 2015. Qualifications policy.
http://www.pearsonclinical.com/psychology/qualifications.html (accessed January 5, 2015).
Rogers, R., R. M. Bagby, and S. E. Dickens. 1992. Structured interview of reported symptoms:
Professional manual. Odessa, FL: Psychological Assessment Resources.
Sheehan, D., Y. Lecrubier, K. Sheehan, P. Amorim, J. Janavs, E. Weiller, T. Hergueta, R. Baker, and G.
Dunbar. 1998. The Mini-International Neuropsychiatric Interview (MINI): The development and
validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. Journal of
Clinical Psychiatry 59(20):22-33.
Spitzer, R. L., K. Kroenke, J. B. Williams, and the Patient Health Questionnaire Primary Care Study
Group. 1999. Validation and utility of a self-report version of PRIME-MD: The PHQ primary care
study. JAMA 282(18):1737-1744.
SSA (Social Security Administration). 2010. Revised medical criteria for evaluating mental disorders.
Federal Register 75(160):34.
SSA. n.d.-a. Disability evaluation under Social Security: 12.00 mental disorders—adult.
http://www.ssa.gov/disability/professionals/bluebook/12.00-MentalDisorders-Adult.htm
(accessed November 14, 2014).
SSA. n.d.-b. Disability evaluation under Social Security: Part I—general information.
http://www.ssa.gov/disability/professionals/bluebook/general-info.htm (accessed November 14,
2014).
Tollison, D., and J. Langley. 1995. Pain patient profile manual. Minneapolis, MN: National Computer
Systems.
Van Dyke, S. A., S. R. Millis, B. N. Axelrod, and R. A. Hanks. 2013. Assessing effort: Differentiating
performance and symptom validity. The Clinical Neuropsychologist 27(8):1234-1246.
Vranceanu, A., A. Barsky, and D. Ring. 2009. Psychosocial aspects of disabling musculoskeletal pain.
Journal of Bone and Joint Surgery 91(8):2014-2018.
Weathers, F., B. Litz, D. Herman, J. Huska, and T. Keane. 1994. The PTSD checklist-civilian version
(PCL-C). Boston, MA: National Center for PTSD.
WHO (World Health Organization). 1993. Composite International Diagnostic Interview (CIDI):
Interviewer’s manual. Geneva, Switzerland: World Health Organization.
Widows, M. R., and G. P. Smith. 2005. Structured inventory of malingered symptomatology: Professional
manual. Lutz, FL: Psychological Assessment Resources.
Wing, J. K., T. Babor, T. Brugha, J. Burke, J. Cooper, R. Giel, A. Jablenski, D. Regier, and N. Sartorius.
1990. SCAN: Schedules for clinical assessment in neuropsychiatry. Archives of General
Psychiatry 47(6):589-593.
1
As documented in Chapters 1–2, 57 percent of claims fall under the other mental disorders and/or connective tissue
disorders.
2
Public comments are currently under review and a final rule has yet to be published as of the publication of this
report.
interact with others; concentrate, persist, and maintain pace; and manage oneself—increase focus
on the relation of functioning to the work setting; because of SSA’s move in this direction, the
committee examines the relevance of psychological testing in terms of these proposed functional
domains. As will be discussed below, cognitive testing may prove beneficial to the assessment of
each of these requirements.
duration of an evaluation varies, including reason for referral, the type or degree of psychological
and/or cognitive impairments, or factors specific to the individual.
The most important aspect of administration of cognitive and neuropsychological tests is
selection of the appropriate tests to be administered. That is, selection of measures is dependent
upon examination of the normative data collected with each measure and consideration of the
population on which the test was normed. Normative data are typically gathered on generally
healthy individuals who are free from significant cognitive impairments, developmental
disorders, or neurological illnesses that could compromise cognitive skills. Data are generally
gathered on samples that reflect the broad demographic characteristics of the United States
including factors such as age, gender, and educational status. There are some measures that also
provide specific comparison data on the basis of race and ethnicity.
As discussed in detail in Chapter 3, as part of the development of any psychometrically
sound measure, explicit methods and procedures by which tasks should be administered are
determined and clearly spelled out. All examiners use such methods and procedures during the
process of collecting the normative data, and such procedures normally should be used in any
other administration. Typical standardized administration procedures or expectations include (1)
a quiet, relatively distraction-free environment; (2) precise reading of scripted instructions; and
(3) provision of necessary tools or stimuli. Use of standardized administration procedures
enables application of normative data to the individual being evaluated (Lezak et al., 2012).
Without standardized administration, the individual’s performance may not accurately reflect his
or her ability. An individual’s abilities may be overestimated if the examiner provides more
information or guidance than what is outlined in the test administration manual. Conversely, a
claimant’s abilities may be underestimated if appropriate instructions, examples, or prompts are
not presented.
To qualify at Step 3 in the process (as discussed in Chapter 2), there must be medical
evidence that substantiates the existence of an impairment and associated functional limitations
that meet or equal the medical criteria codified in SSA’s Listings of Impairments. If an adult
applicant’s impairments do not meet or equal the medical listing, residual functional capacity—
the most a claimant can still do despite his or her limitations—is assessed; this includes whether
the applicant has the capacity for past work (Step 4) or any work in the national economy (Step
5). For child applicants, once there has been identification of a medical impairment,
documentation of a “marked and severe functional limitation relative to typically developing
peers” is required. Cognitive testing is valuable in both child and adult assessments in
determining the existence of a medically determinable impairment and evaluating associated
functional impairments and residual functional capacity.
are referred to the comprehensive textbooks, Neuropsychological Assessment (Lezak et al., 2012)
or A Compendium of Neuropsychological Tests (Strauss et al., 2006).
This domain refers to abilities to register and store new information (e.g., words,
instructions, procedures) and retrieve information as needed (OIDAP, 2009; WHO, 2001).
Functions of memory include “short-term and long-term memory, immediate, recent and remote
memory; memory span; retrieval of memory; remembering; [and] functions used in recalling and
learning” (WHO, 2001, p. 53). However, it is important to note that semantic, autobiographical,
and implicit memory are generally preserved in all but the most severe forms of neurocognitive
dysfunction (American Psychiatric Association, 2013; OIDAP, 2009). Impaired memory
functioning can arise from a variety of internal or external factors, such as depression, stress,
stroke, dementia, or traumatic brain injury, and may affect an individual’s ability to sustain work,
due to a lessened ability to learn and remember instructions or work-relevant material. Examples
of tests for learning and memory deficits include the Wechsler Memory Scale (Wechsler, 2009),
Wide Range Assessment of Memory and Learning (Sheslow and Adams, 2003), California
Verbal Learning Test (Delis, 1994; Delis and Kramer, 2000), Hopkins Verbal Learning Test-
Revised (Benedict et al., 1998; Brandt and Benedict, 2001), Brief Visuospatial Memory Test-
Revised (Benedict, 1997), and the Rey-Osterrieth Complex Figure Test (Rey, 1941).
Processing Speed
Processing speed refers to the amount of time it takes to respond to questions and process
information, and “has been found to account for variability in how well people perform many
everyday activities, including untimed tasks” (OIDAP, 2009, p. C-23). This domain reflects
mental efficiency and is central to many cognitive functions (NIH, n.d.). Tests for deficits in
processing speed include the WAIS-IV processing speed index and Trail Making Test Part A
(Reitan, 1992).
Executive Functioning
Once a test has been administered according to standardized protocol, the test-taker’s
performance can be scored. In most instances, an individual’s raw score (that is, the number
of items answered correctly) is
translated into a standard score based on the normative data for the specific measure. In this
manner, an individual’s performance can be characterized by its position on the distribution
curve of normal performances.
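The conversion from a raw score to a standard score can be illustrated with a minimal sketch. The normative mean and standard deviation used here are hypothetical values chosen for illustration, not figures from any published test manual, and the standard-score scale (mean 100, standard deviation 15) is assumed as one common convention. The percentile calculation assumes that normative performances are normally distributed.

```python
from statistics import NormalDist


def standardize(raw_score, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Locate a raw score on the normative distribution.

    Returns the z-score, the equivalent standard score on the chosen
    scale, and the percentile rank under a normal-distribution assumption.
    """
    z = (raw_score - norm_mean) / norm_sd
    standard = scale_mean + scale_sd * z
    percentile = NormalDist().cdf(z) * 100.0
    return z, standard, percentile


# Hypothetical norms: mean raw score of 50 with a standard deviation of 8.
z, standard, percentile = standardize(raw_score=38, norm_mean=50, norm_sd=8)
print(f"z = {z:.2f}, standard score = {standard:.1f}, percentile = {percentile:.1f}")
```

In this illustration a raw score of 38 falls 1.5 standard deviations below the hypothetical normative mean. Actual conversions rely on the norm tables supplied with each measure.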
The majority of cognitive tests have normative data from groups of people that mirror the
broad demographic characteristics of the population of the United States based on census data.
As a result, the normative data for most measures reflect the racial, ethnic, socioeconomic, and
educational attainment of the population majorities. Unfortunately, that means that there are
some individuals for whom these normative data are not clearly and specifically applicable. This
does not mean that testing should not be done with these individuals, but rather that careful
consideration of normative limitations should be made in interpretation of results.
Selection of appropriate measures and assessment of applicability of normative data vary
depending on the purpose of the evaluation. Cognitive tests can be used to identify acquired or
developmental cognitive impairment, to determine the level of functioning of an individual
relative to typically functioning same-aged peers, or to assess an individual’s functional capacity
for everyday tasks (Freedman and Manly, 2015). Clearly, each of these purposes could be
relevant for SSA disability determinations. However, each of these instances requires different
interpretation and application of normative data.
When attempting to identify a change in functioning secondary to neurological injury or
illness, it is most appropriate to compare an individual’s postinjury performance to his or her
premorbid level of functioning. Unfortunately, it is rare that an individual has a formal
assessment of his or her premorbid cognitive functioning. Thus, comparison of the postinjury
performance to demographically matched normative data provides the best comparison to assess
a change in functioning (Freedman and Manly, 2015; Heaton et al., 2001; Manly and
Echemendia, 2007). For example, assessment of a change in language functioning in a Spanish-
speaking individual from Mexico who has sustained a stroke will be more accurate if the
individual’s performance is compared to norms collected from other Spanish-speaking
individuals from Mexico rather than English speakers from the United States or even Spanish-
speaking individuals from Puerto Rico. In many instances, this type of data is provided in
alternative normative data sets rather than the published population-based norms provided by the
test publisher.
In contrast, the population-based norms are more appropriate when the purpose of the
evaluation is to describe an individual’s level of functioning relative to same-aged peers (Busch,
2006; Freedman and Manly, 2015). A typical example of this would be in instances when the
purpose of the evaluation is to determine an individual’s overall level of intellectual (i.e., IQ) or
even academic functioning. In this situation, it is more relevant to compare that individual’s
performance to that of the broader population in which he or she is expected to function in order
to quantify his or her functional capabilities. Thus, for determination of functional disability,
demographically or ethnically corrected normative data are inappropriate and may actually
underestimate an individual’s degree of disability (Freedman and Manly, 2015). In this situation,
use of otherwise appropriate standardized and psychometrically sound performance-based or
cognitive tests is appropriate.
Determination of an individual’s everyday functioning or vocational capacity is perhaps
the evaluation goal most relevant to the SSA disability determination process. To make this
determination, the most appropriate comparison group for any individual would be other
individuals who are currently completing the expected vocational tasks without limitations or
disability (Freedman and Manly, 2015). Unfortunately, there are few standardized measures of
skills necessary to complete specific vocational tasks and, therefore, no vocation-specific
normative data at this time. This type of functional capacity is best measured by evaluation
techniques that recreate specific vocational settings and monitor an individual’s completion of
related tasks.
Until such specific vocational functioning measures exist and are readily available for
use in disability determinations, objective assessment of cognitive skills that are presumed to
underlie specific functions will be necessary to quantify an individual’s functional limitations.
Despite limitations in normative data as outlined in Freedman and Manly (2015), formal
psychometric assessment can be completed with individuals of various ethnic, racial, gender,
educational, and functional backgrounds. However, the authors note that “limited research
suggests that demographic adjustments reduce the power of cognitive test scores to predict
every-day abilities” (e.g., Barrash et al., 2010; Higginson et al., 2013; Silverberg and Millis,
2009). In fact, they go on to state “The normative standard for daily functioning should not
include adjustments for age, education, sex, ethnicity, or other demographic variables” (p. 9).
Use of appropriate standardized measures by appropriately qualified evaluators as outlined in the
following sections further mitigates the impact of normative limitations.
Interpretation of results is more than simply reporting the raw scores an individual
achieves. Interpretation requires assigning some meaning to the standardized score within the
individual context of the specific test-taker. There are several methods or levels of interpretation
that can be used and a combination of all is necessary to fully consider and understand the results
of any evaluation (Lezak et al., 2012). This section is meant to provide a brief overview;
although a full discussion of all approaches and nuances of interpretation is beyond the scope of
this report, interested readers are referred to various textbooks (e.g., Lezak et al., 2012;
Groth-Marnat, 2009).
Interindividual Differences
The most basic level of interpretation is simply to compare an individual’s testing results
with the normative data collected in the development of the measures administered. This level of
interpretation allows the examiner to determine how typical or atypical an individual’s
performance is in comparison to same-aged individuals within the general population. Normative
data may or may not be further specialized on the basis of race/ethnicity, gender, and educational
status. There is some variability in how an individual’s score may be interpreted based
on its deviation from the normative mean, owing to differing schools of thought, not all of which
can be described in this text. One example of an interpretative approach would be that a performance
within one standard deviation of the mean would be considered broadly average. Performances
one to two standard deviations below the mean are considered mildly impaired, and those two or
more standard deviations below the mean typically are interpreted as being at least moderately
impaired.
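The example interpretive approach described above can be sketched as a simple classification of a z-score (the number of standard deviations from the normative mean). The handling of scores falling exactly on a boundary, and the label for scores more than one standard deviation above the mean, are assumptions made for this illustration; the text does not specify them, and other interpretive schemes use different bands.

```python
def classify_performance(z):
    """Map a z-score to the descriptive bands in the example approach.

    Boundary handling is an assumption: a score exactly one standard
    deviation below the mean is treated as broadly average, and a score
    exactly two standard deviations below is treated as at least
    moderately impaired.
    """
    if z <= -2.0:
        return "at least moderately impaired"
    if z < -1.0:
        return "mildly impaired"
    if z <= 1.0:
        return "broadly average"
    # Not described in the text; labeled here only for completeness.
    return "above the broadly average range"


for z in (0.3, -1.5, -2.4):
    print(z, classify_performance(z))
```

Such a rule only describes where a score sits relative to the norms. It does not, by itself, establish impairment or functional disability.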
Intraindividual Differences
Profile Analysis
Regardless of the level of interpretation, it is important for any evaluator to keep in mind
that poor performance on a set of cognitive or neuropsychological measures does not always
mean that an individual is truly impaired in that area of functioning. Additionally, poor
performance on a set of cognitive or neuropsychological measures does not directly equate to
functional disability.
In instances of inconsistent or unexpected profiles of performance, a thorough
interpretation of the psychometric data requires use of additional information. The evaluator
must consider the validity and reliability of the data acquired, including whether errors in
administration rendered the data invalid, whether emotional or psychiatric factors affected the
individual’s performance, and whether the individual put forth sufficient effort on all
measures.
To answer the latter question, administration of performance validity tests (PVTs) as part
of the cognitive or neuropsychological evaluation battery can be helpful. Interpretation of PVT
data must be undertaken carefully. Any PVT result can only be interpreted in an individual’s
personal context, including psychological/emotional history, level of intellectual functioning, and
other factors that may affect performance. Particular attention must be paid to the limitations of
the normative data available for each PVT to date. As such, a simple interindividual
interpretation of PVT testing results is not acceptable or valid. Rather, consideration of
intraindividual patterns of performance on various cognitive measures is an essential component
of PVT interpretation. PVTs will be discussed in greater detail later in this chapter.
Given the need for the use of standardized procedures, any person administering
cognitive or neuropsychological measures must be well trained in standardized administration
protocols. He or she should possess the interpersonal skills necessary to build rapport with the
individual being tested in order to foster cooperation and maximal effort during testing.
Additionally, individuals administering testing should understand important psychometric
properties, including validity and reliability, as well as factors that could emerge during testing to
place either at risk (as described in Chapter 3).
Many doctoral-level psychologists are well-trained in test administration. In general,
psychologists from clinical, counseling, school, or educational graduate psychology programs
receive training in psychological test administration. However, the functional domains of
emphasis in most of these programs include intellectual functioning, academic achievement,
aptitude, emotional functioning, and behavioral functioning (APA, 2015). Thus, if the request for
disability is based on a claim of intellectual disability or significant emotional/behavioral
dysfunction, a psychologist with solid psychometric training from any of these types of graduate-
level training programs would typically be capable of completing the necessary evaluation.
For cases in which the claim is based on specific cognitive deficits, particularly those
attributed to neurological disease or injury, a neuropsychologist may be needed to most
accurately evaluate the claimant’s functioning. Neuropsychologists are clinical psychologists
relates to normal and abnormal functioning of the central nervous system. (HNS,
2003)
Specific tests have also been designed especially to aid in the examination of
performance validity. The development of and research on these PVTs have increased rapidly over
the past 2 decades. There have been attempts to formally quantify performance validity during
testing since the mid-1900s (Rey, 1964), with much of the initial focus on examining the
consistency of an individual’s responses across a battery of testing, with the suggestion that
inconsistency may indicate variable effort. However, a significant push for specific formal
measures came in response to the increased use of neuropsychological and cognitive testing in
forensic contexts, including personal injury litigation, workers’ compensation, and criminal
proceedings in the 1980s and 1990s (Bianchini et al., 2001; Larrabee, 2012). Given the nature of
these evaluations, there was often a clear incentive for an individual to exaggerate his or her
impairment or to put forth less than optimal effort during testing, and neuropsychologists were
being called upon to provide statements related to the validity of test results (Slick et al., 1999).
Several studies documented that use of clinical judgment and interpretation of performance
inconsistencies alone was an inadequate methodology for detection of poor effort or intentionally
poor performance (Faust et al., 1988; Heaton et al., 1978; van Gorp et al., 1999). As such, the
need for formal standardized measures of effort and means for interpretation of these measures
emerged.
PVTs are measures that assess the extent to which an individual is providing valid
responses during cognitive or neuropsychological testing. PVTs are typically simple tasks that
are easier than they appear to be and on which an almost perfect performance is expected based
on the fact that even individuals with severe brain injury have been found capable of good
performance (Larrabee, 2012). On the basis of that expectation, each measure has a performance
cutoff defined by an acceptable number of errors designed to keep the false positive rate low.
Performances below these cutoff points are interpreted as demonstrating invalid test
performance.
Types of PVTs
PVTs may be designed as such and embedded within other cognitive tests, later derived
from standard cognitive tests, or designed as stand-alone measures. Examples of each type of
measure are discussed below.
Embedded and derived PVTs are similar in that a specific score or assessment of
response bias is determined from an individual’s performance on an aspect of a preexisting
standard cognitive measure. The primary difference is that embedded measures consist of indices
specifically created to assess validity of performance in a cognitive test, whereas derived
measures typically use novel calculations of performance discrepancies rather than simply
examining the pattern of performance on already established indices. The rationale for this type
of PVT is that it does not require administration of any additional tasks and therefore does not
result in any added time or cost. Additionally, development of these types of PVTs can allow for
retrospective consideration or examination of effort in batteries in which specific stand-alone
measures of effort were not administered (Solomon et al., 2010).
The Forced Choice condition of the California Verbal Learning Test-II (Delis and
Kramer, 2000) is an example of an embedded PVT. Following learning, recall, and recognition
trials involving a 16-item word list, the test-taker is presented with pairs of words and asked to
identify which one was on the list. More than 92 percent of the normative population, including
individuals in their eighties, scored 100 percent on this test. Scores below the published cutoff
are unusually low and indicative of potential noncredible performance. Scores below chance are
considered to reflect purposeful noncredible performance, in that the test-taker knew the correct
answer but purposely chose the wrong answer.
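The logic behind a "below chance" interpretation can be illustrated with a short calculation. On a two-alternative forced-choice task, a test-taker who guesses blindly is expected to answer about half of the items correctly, so the probability of a score at or below a given value can be computed from the binomial distribution. The 16-item length and the 0.05 significance threshold in this sketch are assumptions chosen for illustration, not published scoring rules for any specific measure.

```python
from math import comb


def prob_at_or_below(correct, n_items, p_guess=0.5):
    """Probability of scoring `correct` or fewer by guessing alone."""
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(correct + 1)
    )


def is_significantly_below_chance(correct, n_items, alpha=0.05):
    """True when guessing alone would rarely produce so low a score."""
    return prob_at_or_below(correct, n_items) < alpha


# Illustrative 16-item, two-alternative forced-choice task.
for score in (8, 5, 4):
    p = prob_at_or_below(score, 16)
    print(score, round(p, 4), is_significantly_below_chance(score, 16))
```

Under these assumptions, 4 of 16 correct would occur by guessing less than 4 percent of the time, whereas 5 of 16 would occur roughly 10 percent of the time. Scores that are merely low therefore do not necessarily fall significantly below chance.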
Reliable Digit Span, based on the Digit Span subtest of the Wechsler Adult Intelligence
Scale, is an example of a measure that was derived based on research following test publication.
The Digit Span subtest requires test-takers to repeat strings of digits in forward order (forward
digit span), as well as in reverse order (backward digit span). To calculate Reliable Digit Span,
the maximum forward and backward span are summed, and scores below the cutoff point are
associated with noncredible performance (Greiffenstein et al., 1994). A full list of embedded and
derived PVTs is provided in Table 5-1.
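The Reliable Digit Span calculation described above can be sketched as follows. The trial format, the function names, and the cutoff value are hypothetical and are used only to show the arithmetic; published scoring rules may impose additional requirements on which trials count toward the span, and the applicable cutoff should be taken from the research literature rather than from this sketch.

```python
def longest_span(trials):
    """Longest digit-string length repeated correctly, or 0 if none.

    `trials` is a list of (length, repeated_correctly) pairs.
    """
    return max((length for length, ok in trials if ok), default=0)


def reliable_digit_span(forward_trials, backward_trials):
    """Sum of the maximum forward and maximum backward spans."""
    return longest_span(forward_trials) + longest_span(backward_trials)


def falls_below_cutoff(rds, cutoff):
    """True when the score is below the caller-supplied cutoff."""
    return rds < cutoff


# Hypothetical record: forward span of 5, backward span of 3.
forward = [(3, True), (4, True), (5, True), (6, False)]
backward = [(2, True), (3, True), (4, False)]
rds = reliable_digit_span(forward, backward)
print(rds, falls_below_cutoff(rds, cutoff=9))  # cutoff of 9 is illustrative only
```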
WCST-FMS      Wisconsin Card Sorting Test, Failure-To-Maintain Set Score                     Suhr and Boyer (1999)
WCT           Word Choice Test, in the WMS-IV                                                Wechsler (2009)
WMI           Working Memory Index                                                           Wechsler (1997a)
WMS-III-VPA   Wechsler Memory Scale, Third Edition, Verbal Paired Associates-2 Scale Score   Wechsler (1997a)
SOURCE: Young, 2014. Reproduced with permission.
Stand-Alone Measures
It is within that historical medicolegal context that clinical practice guidelines for
neuropsychology emerged to emphasize the use of psychometric indicators of response validity
(as opposed to clinician judgment alone) in determining the interpretability of a battery of
cognitive tests (Bianchini et al., 2001; Heilbronner et al., 2009). Moreover, it has become
standard clinical practice to use multiple PVTs throughout an evaluation (Boone, 2009;
Heilbronner et al., 2009). In general, multiple PVTs should be administered over the course of
the evaluation, because performance validity may wax and wane with increasing and decreasing
fatigue, pain, motivation, or other factors that can influence effortful performance (Boone, 2009;
Boone, 2014; Heilbronner et al., 2009). Some of the PVT development studies have attempted to
examine these factors (i.e., effect of experimentally induced pain) and found no effect on PVT
performance (Etherton et al., 2005a,b).
In clinical evaluations, most individuals will pass PVTs, and a small proportion will fail
at the below-chance level. Clear passes can support the examiner’s interpretation of the
evaluation data as valid. Clear failures, that is, below-chance performances, certainly place the
validity of any other data obtained in the evaluation in question.
The risk of falsely identifying failure on one PVT as indicative of noncredible
performance has resulted in the common practice of requiring failure on at least two PVTs to
make any assumptions related to effort (Boone, 2009; Boone, 2014; Larrabee, 2014a). According
to practice guidelines of NAN, performance slightly below the cutoff point on only one PVT
cannot be construed to represent noncredible performance or biased responding; converging
evidence from other indicators is needed to make a conclusion regarding performance bias (Bush
et al., 2005). Similarly, AACN suggests the use of multiple validity assessments, both embedded
and stand-alone, when possible, noting that effort may vary during an evaluation (Heilbronner et
al., 2009). However, it should be noted that in cases where a test-taker scores significantly below
chance on a single forced-choice PVT, intent to deceive may be assumed and test scores deemed
invalid. It is also important to note that some situations may preclude the use of multiple validity
indicators. For example, when evaluating an early school-aged child, at present, the TOMM is
the only empirically established PVT (Kirkwood, 2014). In such situations, “it is the clinician’s
responsibility to document the reasons and explicitly note the interpretive implications” of
reliance on a single PVT (Heilbronner et al., 2009).
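The rationale for requiring failure on at least two PVTs can be illustrated with a simple probability calculation. The sketch below assumes an illustrative per-test false positive rate of 0.10 and, importantly, assumes that failures on different PVTs are statistically independent. That independence assumption is a simplification adopted only for this example; results on different validity measures may be correlated in practice, so the figures should not be read as actual error rates.

```python
from math import comb


def prob_at_least_k_failures(n_tests, false_positive_rate, k=2):
    """Chance that a credible test-taker fails k or more of n PVTs.

    Assumes each PVT has the same false positive rate and that
    failures are independent (a simplifying assumption).
    """
    p = false_positive_rate
    return sum(
        comb(n_tests, j) * p**j * (1 - p) ** (n_tests - j)
        for j in range(k, n_tests + 1)
    )


# Five PVTs, each with an illustrative false positive rate of 0.10.
one_or_more = prob_at_least_k_failures(5, 0.10, k=1)
two_or_more = prob_at_least_k_failures(5, 0.10, k=2)
print(round(one_or_more, 4), round(two_or_more, 4))
```

Under these assumptions, a credible test-taker given five PVTs would fail at least one about 41 percent of the time but would fail two or more only about 8 percent of the time, which is consistent with the caution against drawing conclusions from a single marginal failure.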
The number of noncredible performances and the pattern of PVT failure are both
considered in making a determination about whether the remainder of the neuropsychological
battery can be interpreted. This consideration is particularly important in evaluations in which
the test-taker’s performance on cognitive measures falls below an expected level, suggesting
potential cognitive impairment. That is, an individual’s poor performance on cognitive measures
may reflect insufficient effort to perform well, as suggested by PVT performance, rather than a
true impairment. However, even in the context of PVT failure, performances that are in the
average range can be interpreted as reflecting ability that is in the average range or above, though
such performances may represent an underestimate of actual level of ability. Certainly, PVT
“failure” does not equate to malingering or lack of disability. However, clear PVT failures make
the validity of the remainder of the cognitive battery questionable; therefore, no definitive
conclusions can be drawn regarding cognitive ability (aside from interpreting normal
performances as reflecting normal cognitive ability). An individual who fails PVTs may still
have other evidence of disability that can be considered in making a determination; in these
cases, further information would be needed to establish the case for disability.
The AACN and NAN endorse the use of PVT measures in the context of any
neuropsychological examination (Bush et al., 2005; Heilbronner et al., 2009). The practice
standards require clinical neuropsychologists performing evaluations of cognitive functioning for
diagnostic purposes to include PVTs and comment on the validity of test findings in their reports.
There is no gold standard PVT, and use of multiple PVTs is recommended. A specified set of
PVTs, or other cognitive measures for that matter, is not recommended due to concerns
regarding test security and test-taker coaching.3
Given the primary use of cutoff scores, even within the context of forced-choice tasks,
the interpretation of PVT performance is inherently different than interpretation of performance
on other standardized measures of cognitive functioning owing to the nature of the scores
obtained. Unlike general cognitive measures that typically use a norm-referenced scoring
paradigm assuming a normal distribution of scores, PVTs typically use a criterion-referenced
scoring paradigm because of a known skewed distribution of scores (Larrabee, 2014a). That is,
an individual’s performance is compared to a cutoff score set to keep false positive rates below
10 percent for determining whether or not the individual passed or failed the task.
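The criterion-referenced logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the derivation used for any actual PVT: the scores are invented, "failure" is taken to mean a score strictly below the cutoff, and the function name `highest_cutoff` is hypothetical.

```python
def highest_cutoff(valid_scores, max_false_positive_rate=0.10):
    """Return the highest observed score usable as a cutoff such that the
    proportion of presumed-valid performers scoring below it stays under
    `max_false_positive_rate`.
    """
    n = len(valid_scores)
    best = None
    for cutoff in sorted(set(valid_scores)):
        false_positive_rate = sum(s < cutoff for s in valid_scores) / n
        if false_positive_rate < max_false_positive_rate:
            best = cutoff
    return best


# Invented scores from 20 performers presumed to be giving valid effort.
# The distribution is skewed toward the ceiling (50), as is typical for PVTs.
credible_scores = [50, 50, 49, 50, 48, 50, 47, 50, 49, 50,
                   50, 46, 50, 49, 50, 50, 48, 50, 44, 50]

print(highest_cutoff(credible_scores))
```

Because nearly all scores cluster at the ceiling, a percentile-style norm adds little information here; the only practical question is where the pass/fail line falls.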
3 At the committee’s second meeting, Drs. Bianchini, Boone, and Larrabee all expressed great concern about the
susceptibility of PVTs to coaching and stressed the importance of ensuring test security, as disclosure of test
materials adversely affects the reliability and validity of psychological test results.
A resulting primary critique of PVTs is that the development of the criterion or cut-off
scores has not been as rigorous or systematic as is typically expected in the collection of
normative data during development of a new standardized measure of cognitive functioning. In
general, what counts as an acceptable or passing performance, and the associated cutoff
scores, have been established in a somewhat post hoc or retrospective fashion. However, there
are some embedded PVTs that have been co-normed with their “parent” tests, such as the forced
choice condition of the CVLT-II, which was normed along with the CVLT-II and thus has norms
from the general population.
For most PVTs, however, rather than administering the measures to a large number of
“typical” individuals of various ages, ethnicities, and even clinical diagnoses, researchers have
examined the pattern of performance retrospectively in clinical samples that may have had some
incentive to underperform (i.e., secondary gain), such as litigants (Roberson et al., 2013) or
individuals presenting for consultative evaluations for Social Security disability determination
(Chafetz, 2011; Chafetz and Underhill, 2013). An alternative methodology is to use
simulation/nonsimulation samples in which one group of participants is told to perform poorly as
if they had some type of impairment and the other is told to perform typically. Performances in
these types of groups have then been used to establish cutoff scores via (1) identification of a
fixed but arbitrary cut-off score of performance, or (2) identification of an “empirical floor”
based on the lowest level of performance of a chosen clinical sample (the “known groups”
approach, i.e., severely brain-injured patients) (Bianchini et al., 2001). One concern with this
methodology is that data from simulators, especially data used to determine the sensitivity or
specificity of a PVT, may not be applicable to real-world clinical samples (Boone et al., 2002,
2005). In fact, few PVTs (other than some embedded PVTs such as CVLT-II Forced-Choice
Recognition) have been normed on population-based samples or samples that are not biased in
some way due to the method of recruitment (Freedman and Manly, 2015). Thus, the applicability
or generalizability of cutoff scores to a broader (i.e., nonforensic) population is questionable.
As a result of this methodology, there are no true “traditional” normative data for many
of these measures. However, the need for this type of normative data is minimal given the fact
that the simple nature of tasks allows most patients with even severe brain injury, let alone
“typical” individuals, to perform at near perfect levels (Larrabee, 2014a). Because of these
skewed performance patterns, expectations for sensitivity and specificity for detection of poor
performance have been developed rather than traditional norms (Greve and Bianchini, 2004).
Sensitivity in this context is defined as the degree to which a performance score on the
measure will correctly identify an individual who is putting forth less than optimal effort.
Specificity is the degree to which a performance score will correctly identify a person who is
putting forth sufficient or optimal effort. Thus, to be most useful, ideally a PVT has high
sensitivity and specificity. In general, however, most PVT cutoff scores are determined to have
sensitivity within the 50–60 percent range and specificity within the 90–95 percent range.4 A
meta-analysis of 47 studies by Sollman and Berry (2011) examined the sensitivity and specificity
of five stand-alone forced-choice PVTs, finding a mean sensitivity of 69 percent and mean
specificity of 90 percent. However, the individual sensitivities and specificities of the measures
varied (e.g., WMT sensitivity ranged from 49 percent to 100 percent and specificity ranged from
25 percent to 96 percent; TOMM sensitivity ranged from 34 percent to 100 percent and
specificity ranged from 69 percent to 100 percent). There is general agreement among
neuropsychologists that PVT specificity must be at least 90 percent for a PVT to be acceptable,
in order to avoid falsely labeling valid performances as noncredible (Boone, 2007).
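The two definitions above reduce to simple proportions. The sketch below computes them from classification counts; the counts are invented for illustration and are not drawn from any study cited in this chapter.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Compute sensitivity and specificity of a PVT cutoff.

    true_pos:  noncredible performers who fall below the cutoff (correctly flagged)
    false_neg: noncredible performers who pass the cutoff (missed)
    true_neg:  credible performers who pass the cutoff (correctly cleared)
    false_pos: credible performers who fall below the cutoff (wrongly flagged)
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Invented counts: 100 noncredible and 100 credible performers.
sens, spec = sensitivity_specificity(true_pos=55, false_neg=45,
                                     true_neg=92, false_pos=8)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these invented counts the cutoff would satisfy the 90 percent specificity convention while missing nearly half of the noncredible performances, which mirrors the trade-off described in the text.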
Sensitivity and specificity levels have been “verified” in experimental studies that
employ comparison between groups that were expected to or told to perform well and those that
were expected to or told to perform poorly. That is, researchers compared the performance on
PVTs of groups of people “known” or expected to be performing poorly (i.e., those with clear
secondary gain, those instructed to feign poor performance, or those who meet Slick and
colleagues [1999] criteria for malingering) to those who perform well on PVTs or without clear
secondary gain. Otherwise, studies have simply examined the pass/fail rates in clinical samples
and the correlations of PVT performance with performance on the broader neuropsychological
battery. There has been some comparison between the overall performance of subgroups who
failed PVTs with the performance of the subgroup that did not, with the suggestion that those
who fail PVTs tend to perform more poorly on testing overall. Although this methodology may
appear to be more appropriate to the clinical situation, it still does not provide any indication of
why an individual failed a PVT, which could be due to lack of effort or a variety of other factors,
including true cognitive impairment (Freedman and Manly, 2015).
Although many would argue that PVT failure caused by true cognitive impairment is
rare, the fact that failure could occur for valid reasons means that interpretation of PVT
performances is exceptionally critical and must be done very cautiously. There are insufficient
data related to the base-rate of below-chance performances on PVTs in different populations
(Freedman and Manly, 2015). As Bigler (2012, 2014, 2015) points out, there are many
individuals whose performances fall within a grey area, meaning they perform below the
identified cut-off level but above chance. For example, individuals with multiple sclerosis,
schizophrenia, traumatic brain injury, or epilepsy have PVT failure rates of 11–30 percent in
terms of falling below standard cutoff scores, even in the absence of known secondary gain
(Hampson et al., 2013; Stevens et al., 2014; Suchy et al., 2012). Davis and Millis (2014)
identified increased rates of PVT failure in individuals with lower educational status and lower
functional status (i.e., independence in activities of daily living). Alternatively, others contend
that concerns about grey area performance are unfounded, as the risk for false positives can be
minimized. For example, Larrabee (2012, 2014a,b), Boone (2009, 2014), and others assert that
multiple PVT failures are generally required,5 and as the number of PVT failures increases, the
chance for a false positive approaches zero. Yet, it is possible that PVT failures (i.e., below
cutoff score performance) in certain populations reflect legitimate cognitive impairments. For
this reason, it has also been recommended that close attention be paid to the pattern of PVT
performance and the potential for false positives in these at-risk populations in order to inform
interpretation and reduce the chances for false positives (Larrabee 2014a,b) and inform future
PVT research (Boone, 2007; Larrabee, 2007).
For these reasons, it is necessary to evaluate PVTs in the context of the individual
claimant, including interpretation of the degree of PVT failure (e.g., below chance performance
vs. performance slightly below cutoff score performance) and the consistency of failure across
PVTs. Furthermore, careful interpretation of grey area PVT performance (significantly above
chance but below standard cutoffs) is necessary, given that a significant proportion of individuals
with bona fide mental or cognitive disorders may score in this “grey area.” Adding to the
complexity of interpreting these scores, population-based norms, and certainly norms for specific
patient groups, are not available for most PVTs. Rather, owing to the process of development of
these tasks, normative data exist only for select populations, typically litigants or those seeking
compensation for injury. Thus, there are no norms for specific demographic groups (e.g.,
racial/ethnic minority groups). It has been suggested that examiners can compensate for these
normative issues by using their clinical judgment to identify an alternate cutoff score for
increased specificity (which will come at a cost of lower sensitivity) (Boone, 2014). For
example, if an examiner identifies cultural, ethnic, and/or language factors known to affect PVT
scores, the examiner should adjust their thresholds for identifying noncredible performance
(Salazar et al., 2007).
5 The exception is that a single below-chance failure on a forced-choice PVT is sufficient to render scores invalid.
Despite the practice standard of using multiple PVTs, there may be an increased
likelihood of abnormal performances as the number of measures administered increases, a
pattern that occurs in the context of standard cognitive measures (Schretlen et al., 2008). This
type of analysis is beginning to be applied to PVTs specifically with inconsistent findings to
date. Several studies examining PVT performance patterns in groups of clinical patients have
indicated that it is very unlikely that an individual putting forth good effort on testing will fail
two or more PVTs regardless of type of PVT (i.e., embedded or free-standing) (Iverson and
Franzen, 1996; Larrabee, 2003). In fact, Victor and colleagues (2009) found a significant
difference in the rate of failure on two or more embedded PVTs between those determined to be
credible responders (5 percent failure) and noncredible responders (37 percent failure) in a
clinical referral sample. Davis and Millis (2014) also found no predictive relation between the
number of PVTs administered and the rate of PVT failure in a retrospective review of 158
consecutive referrals for evaluation. In contrast, others have utilized statistical modeling
techniques to argue that there is an increased rate of false-positive PVT failures with an increased
number of PVTs administered (Berthelson et al., 2013; Bilder et al., 2014). Thus, ongoing careful
interpretation of failure patterns is warranted.
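The statistical concern about administering many PVTs can be made concrete with a simple binomial model. The sketch below is not a reproduction of any cited author's analysis. It assumes, purely for illustration, that every PVT has the same 10 percent false-positive rate and that failures on different PVTs are statistically independent; real PVTs are correlated to some degree, so actual rates may differ.

```python
from math import comb


def prob_at_least_k_failures(n_tests: int, k: int, per_test_fp_rate: float) -> float:
    """Probability that a credible performer fails at least `k` of `n_tests`
    PVTs, assuming independent tests with a common false-positive rate.
    """
    p = per_test_fp_rate
    return sum(
        comb(n_tests, j) * p**j * (1 - p) ** (n_tests - j)
        for j in range(k, n_tests + 1)
    )


# Chance of two or more false-positive failures as the battery grows.
for n in (2, 5, 10):
    prob = prob_at_least_k_failures(n_tests=n, k=2, per_test_fp_rate=0.10)
    print(f"{n:2d} PVTs: P(>= 2 failures) = {prob:.3f}")
```

Under these assumptions the chance of two or more failures rises with the number of tests administered, while for a fixed number of tests the chance of reaching a higher failure count falls sharply. Both sides of the debate summarized above can be read as disagreements about how well such assumptions hold in clinical samples.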
Clinical use of and research on PVTs in pediatric samples are, to date, significantly limited
compared with adult samples. As such, specific pediatric criteria to determine pass/fail
performances on PVTs do not exist. However, in general, the conclusion has been that children,
even down to age 5 years, typically are able to pass most stand-alone measures of effort even
when compared to the adult-based cutoff scores (DeRight and Carone, 2015). Despite these
greater limitations in normative data, use of PVTs is becoming common practice even in
pediatric patient samples. As in adults, children’s performance on PVTs has been correlated with
intellectual abilities (Gast and Hart, 2010; MacAllister et al., 2009), although even those with
mildly impaired cognitive abilities have been able to pass stand-alone measures (Green and
Flaro, 2003). Additionally, in samples of consecutive clinical referrals, failure on PVTs has not
been associated with demographic characteristics, developmental disorders, or neurological status (Kirkwood et
al., 2012). Even children with documented moderate to severe brain injury/dysfunction have
been found to pass PVTs at the expected adult level (Carone, 2008). There are currently no
studies examining PVT use with children younger than age five; however, research has shown
that deception strategies at this age generally cannot be sustained and are fairly basic and
obvious. As such, behavioral observations are important to assessing validity of cognitive testing
with preschool-aged children (DeRight and Carone, 2015; Kirkwood, 2014).
As suggested above, there are many claimants for whom administration of cognitive or
neuropsychological testing would be beneficial to improve the standardization and credibility of
determinations based on allegations of disability on the basis of cognitive impairment. The
discussion below should not be considered all-inclusive, but rather as an attempt to highlight
categories of disability applicants in which cognitive or performance-based testing would be
appropriate.
Intellectual Disability
The SSA has clear and appropriate standards for documentation for individuals applying
for disability on the basis of intellectual disability (SSA, n.d.-a). As stated by SSA,
“Standardized intelligence test results are essential to the adjudication of all cases of intellectual
disability” if the claimant does not clearly meet or equal the medical listing without such results. There are
individual cases, of course, in which the claimant’s level of impairment is so significant that it
precludes formalized testing. For these individuals, their level of functioning and social history
provide a consistent longitudinal record and documentation of impairment. For those who can
complete intellectual testing and for whom their social history is inconsistent, inclusion of some
documentation or assessment of effort may be warranted and would help to validate the results of
intellectual and adaptive functioning assessment.
Use of PVTs is common among practitioners assessing for intellectual disability, with the
TOMM being the most commonly used measure (Victor and Boone, 2007). However, caution is
warranted in interpreting PVT results in individuals with intellectual disability, as IQ has
consistently been correlated with PVT performance (Dean et al., 2008; Graue et al., 2007; Hurley
and Deal, 2006; Shandera et al., 2010). More importantly, individuals with intellectual disability
fail PVTs at a higher rate than those without (Dean et al., 2008; Salekin and Doane, 2009). In
fact, Dean and colleagues (2008) found in their sample that all individuals with an IQ of less than
70 failed at least one PVT. Thus, cutoff scores for individuals with suspected intellectual
disability may need to be adjusted due to a higher rate of false-positive results in this population.
For example, lowering the TOMM Trial 2 and Retention Trial cutoff scores from 45 to 30
resulted in very low false-positive rates (0–4 percent) (Graue et al., 2007; Shandera et al., 2010).
Neurocognitive Impairments
There are individuals who apply for disability with primary allegations of cognitive
dysfunction in one or more of the functional domains outlined above (e.g., “fuzzy” thinking,
slowed thinking, poor memory, concentration difficulties). Standardized cognitive test results, as
has been required for individuals claiming intellectual disability, are essential to the adjudication
of such cases. These individuals may present with cognitive impairment due to a variety of
reasons including, but not limited to, brain injury or disease (e.g., traumatic brain injury or
stroke) or neurodevelopmental disorders (e.g., learning disabilities, ADHD). Similarly, disability
applicants may claim cognitive impairment secondary to a psychiatric disorder. For all of these
claimants, documentation of impairment in functional cognitive domains with standardized
cognitive tests should be required. Within the process of collection of test result evidence of
CONCLUSION
REFERENCES
AACN (American Academy of Clinical Neuropsychology). 2007. AACN practice guidelines for
neuropsychological assessment and consultation. The Clinical Neuropsychologist 21(2):209-231.
Allen, L. M., III, R. L. Conder, P. Green, and D. R. Cox. 1997. CARB ‘97: Manual for the computerized
assessment of response bias. Durham, NC: CogniSyst.
American Psychiatric Association. 2013. The diagnostic and statistical manual of mental disorders:
DSM-5. Washington, DC: American Psychiatric Association.
APA (American Psychological Association). 2015. Guidelines and principles for accreditation of
programs in professional psychology: Quick reference guide to doctoral programs.
http://www.apa.org/ed/accreditation/about/policies/doctoral.aspx (accessed January 20, 2015).
Benedict, R. H. 1997. Brief visuospatial memory test—revised: Professional manual. Lutz, FL:
Psychological Assessment Resources.
Benedict, R. H., D. Schretlen, L. Groninger, and J. Brandt. 1998. Hopkins verbal learning test–revised:
Normative data and analysis of inter-form and test-retest reliability. The Clinical
Neuropsychologist 12(1):43-55.
Benton, L., K. Hamsher, and A. Sivan. 1994. Controlled oral word association test. Multilingual Aphasia
Examination 3.
Benton, A. L., K. S. de Hamsher, N. R. Varney, and O. Spreen. 1983. Contributions to
neuropsychological assessment: A clinical manual. New York: Oxford University Press.
Bianchini, K. J., C. W. Mathias, and K. W. Greve. 2001. Symptom validity testing: A critical review. The
Clinical Neuropsychologist 15(1):19-45.
Bigler, E. D. 2014. Use of symptom validity tests and performance validity tests in disability
determinations. Paper commissioned by the Committee on Psychological Testing, Including
Symptom Validity Testing, for Social Security Administration Disability Determinations.
http://www.iom.edu/psychtestingpaperEB (accessed April 9, 2015).
Binder, L. M. 1993. Portland digit recognition test manual—second edition. Portland, OR: Private
Publication.
Binder, L. M., G. L. Iverson, and B. L. Brooks. 2009. To err is human: "Abnormal" neuropsychological
scores and variability are common in healthy adults. Archives of Clinical Neuropsychology 24:31-46.
Binder, L. M., M. R. Villanueva, D. Howieson, and R. T. Moore. 1993. The Rey AVLT recognition
memory task measures motivational impairment after mild head trauma. Archives of Clinical
Neuropsychology 8:137–147.
Binder, L. M., and S. C. Willis. 1991. Assessment of motivation after financially compensable minor
head trauma. Psychological Assessment, 3(2):175–181.
Boone, K. B. 2007. Assessment of feigned cognitive impairment: A neuropsychological perspective. New
York: Guilford Press.
Boone, K. B. 2009. The need for continuous and comprehensive sampling of effort/response bias during
neuropsychological examinations. The Clinical Neuropsychologist 23(4):729-741.
Boone, K. B. 2014. Selection and use of multiple performance validity tests (PVTs). Presentation to IOM
Committee on Psychological Testing, Including Validity Assessment, for Social Security
Administration: Meeting 2, June 25, 2014, Washington, DC.
Boone, K. B., and P. Lu. 2007. Non-forced-choice effort measures. In Assessment of malingered
neurocognitive deficits, edited by G. J. Larrabee. New York: Oxford University Press. Pp. 27-43.
Boone, K. B., P. Lu, C. Back, C. King, A. Lee, L. Philpott, E. Shamieh, and K. Warner-Chacon. 2002.
Sensitivity and specificity of the Rey dot counting test in patients with suspect effort and various
clinical samples. Archives of Clinical Neuropsychology 17(7):625-642.
Boone, K. B., P. H. Lu, and D. Herzberg. 2002. The b test manual. Los Angeles: Western Psychological
Services.
Boone, K. B., P. Lu, and J. Wen. 2005. Comparison of various RAVLT scores in the detection of non-
credible memory performance. Archives of Clinical Neuropsychology 20:301-319.
Brandt, J., and R. H. Benedict. 2001. Hopkins verbal learning test, revised: Professional manual. Lutz,
FL: Psychological Assessment Resources.
Brandt, J., and W. van Gorp. 1999. American Academy of Clinical Neuropsychology policy on the use of
non-doctoral-level personnel in conducting clinical neuropsychological evaluations. Clinical
Neuropsychologist 13(4):385.
Busch, R. M., G. J. Chelune, and Y. Suchy. 2006. Using norms in neuropsychological assessment of the
elderly. In Geriatric neuropsychology: Assessment and intervention, edited by D. K. Attix and K.
A. Welsh-Bohmer. New York: Guilford Press.
Bush, S. S., R. M. Ruff, A. I. Troster, J. T. Barth, S. P. Koffler, N. H. Pliskin, C. R. Reynolds, and C. H.
Silver. 2005. Symptom validity assessment: Practice issues and medical necessity NAN policy &
planning committee. Archives of Clinical Neuropsychology 20(4):419-426.
Carone, D. A. 2008. Children with moderate/severe brain damage/dysfunction outperform adults with
mild-to-no brain damage on the medical symptom validity test. Brain Injury 22(12):960-971.
Carrow-Woolfolk, E. 1999. CASL: Comprehensive assessment of spoken language. Circle Pines, MN:
American Guidance Services.
Chafetz, M. D. 2008. Malingering on the Social Security disability consultative exam: Predictors and base
rates. The Clinical Neuropsychologist 22(3):529-546.
Chafetz, M. D. 2011. The psychological consultative examination for Social Security disability.
Psychological Injury and Law 4(3-4):235-244.
Chafetz, M. D., and J. Underhill. 2013. Estimated costs of malingered disability. Archives of Clinical
Neuropsychology 28(7):633-639.
Chafetz, M. D., J. P. Abrahams, and J. Kohlmaier. 2007. Malingering on the Social Security disability
consultative exam: A new rating scale. Archives of Clinical Neuropsychology 22(1):1-14.
Conder, R., L. Allen, and D. Cox. 1992. Computerized assessment of response bias test manual.
Durham, NC: CogniSyst.
Dean, A. C., T. L. Victor, K. B. Boone, and G. Arnold. 2008. The relationship of IQ to effort test
performance. Clinical Neuropsychologist 22(4):705-722.
Delis, D. C. 1994. CVLT-C, California verbal learning test: Children’s version: Manual. San Antonio,
TX: The Psychological Corporation.
Delis, D. C., J. H. Kramer, and E. Kaplan. 2000. California verbal learning test: CVLT-II; adult version;
manual. San Antonio, TX: The Psychological Corporation.
Delis, D., E. Kaplan, and J. Kramer. 2001. Delis-Kaplan executive function system. San Antonio, TX: The
Psychological Corporation.
DeRight, J., and D. A. Carone. 2015. Assessment of effort in children: A systematic review. Child
Neuropsychology 21(1):1-24.
Edmonds, E. C., L. Delano-Wood, D. R. Galasko, D. P. Salmon, and M. W. Bondi. 2014. Subjective
cognitive complaints contribute to misdiagnosis of mild cognitive impairment. Journal of the
International Neuropsychological Society 20(08):836-847.
Elliott, R. 2003. Executive functions and their disorders. British Medical Bulletin 65:49-59.
Etkin, A., A. Gyurak, and R. O’Hara. 2013. A neurobiological approach to the cognitive deficits of
psychiatric disorders. Dialogues in Clinical Neuroscience 15(4):419.
Etherton, J. L., K. J. Bianchini, M. A. Ciota, and K. W. Greve. 2005a. Reliable digit span is unaffected by
laboratory-induced pain: Implications for clinical use. Assessment 12(1): 101-106.
Etherton, J. L., K. J. Bianchini, K. W. Greve, and M. A. Ciota. 2005b. Test of Memory Malingering
performance is unaffected by laboratory-induced pain: Implications for clinical use. Archives of
Clinical Neuropsychology 20(3): 375-384.
Farias, S. T., D. Mungas, and W. Jagust. 2005. Degree of discrepancy between self and other‐reported
everyday functioning by cognitive status: Dementia, mild cognitive impairment, and healthy
elders. International Journal of Geriatric Psychiatry 20(9):827-834.
Faust, D., K. Hart, T. Guilmette, and H. Arkes. 1988. Neuropsychologists’ capacity to detect adolescent
malingerers. Professional Psychology: Research and Practice 19:508-515.
Frederick, R. I. 1997. Validity indicator profile manual. Minnetonka, MN: NCS Assessments.
Frederick, R. I., and H. G. Foster. 1991. Multiple measures of malingering on a forced-choice test of
cognitive ability. Psychological Assessment 3(4):596–602.
Freedman, D., and J. Manly. 2015. Use of normative data and measures of performance validity and
symptom validity in assessment of cognitive function. Paper commissioned by the Committee on
Psychological Testing, Including Validity Testing, for Social Security Administration Disability
Determinations. http://www.iom.edu/psychtestingpapersDFJM (accessed April 9, 2015).
Funahashi, S. 2001. Neuronal mechanisms of executive control by the prefrontal cortex. Neuroscience
Research 39:147-165.
Gast, J., and K. J. Hart. 2010. The performance of juvenile offenders on the test of memory malingering.
Journal of Forensic Psychology Practice 10(1):53-68.
Gervais, R. O., M. L. Rohling, P. Green, and W. Ford. 2004. A comparison of WMT, CARB, and TOMM
failure rates in non-head injury disability claimants. Archives of Clinical Neuropsychology
19(4):475-487.
Goodglass, H., and E. Kaplan. 1983. Boston diagnostic aphasia examination. Philadelphia: Lea &
Febiger.
Graue, L. O., D. T. Berry, J. A. Clark, M. J. Sollman, M. Cardi, J. Hopkins, and D. Werline. 2007.
Identification of feigned mental retardation using the new generation of malingering detection
instruments: Preliminary findings. Clinical Neuropsychologist 21(6):929-942.
Green, P., and L. Flaro. 2003. Word memory test performance in children. Child Neuropsychology
9(3):189-207.
Green, P., L. Allen, and K. Astner. 1996. The word memory test: A user’s guide to the oral and computer-
administered forms, US version 1.1. Durham, NC: CogniSyst.
Greiffenstein, M. F., W. J. Baker, and T. Gola. 1994. Validation of malingered amnesia measures with a
large clinical sample. Psychological Assessment 6(3):218-224.
Greiffenstein, M., R. Gervais, W. J. Baker, L. Artiola, and H. Smith. 2013. Symptom validity testing in
medically unexplained pain: A chronic regional pain syndrome type 1 case series. Clinical
Neuropsychologist 27(1):138-147.
Greve, K. W., and K. J. Bianchini. 2004. Setting empirical cutoffs on psychometric indicators of negative
response bias: A methodological commentary with recommendations. Archives of Clinical
Neuropsychology 19(4):533-541.
Griffin, G. A., J. Normington, R. May, and D. Glassmire. 1996. Assessing dissimulation among Social
Security disability income claimants. Journal of Consulting and Clinical Psychology 64(6):1425-1430.
Gronwall, D. 1977. Paced auditory serial-addition task: A measure of recovery from concussion.
Perceptual and Motor Skills 44(2):367-373.
Grote, L. G., and J. N. Hook. 2007. Forced-choice recognition tests of malingering. In Assessment of
malingered neurocognitive deficits, edited by G. J. Larrabee. New York: Oxford University Press.
Pp. 27-43.
Hammill, D. D., and S. C. Larsen. 2009. Test of written language: Examiner’s manual. 4th ed. Austin,
TX: Pro-Ed.
Hampson, N. E., S. Kemp, A. K. Coughlan, C. J. Moulin, and B. B. Bhakta. 2013. Effort test performance
in clinical acute brain injury, community brain injury, and epilepsy populations. Applied
Neuropsychology: Adult (ahead-of-print):1-12.
Heaton, R. K. 1993. Wisconsin card sorting test: Computer version 2. Odessa, FL: Psychological
Assessment Resources.
Heaton, R. K., I. Grant, and C. G. Matthews. 1991. Comprehensive norms for an expanded Halstead-Reitan
Battery: Demographic corrections, research findings, and clinical applications. Odessa, FL:
Psychological Assessment Resources.
Heaton, R. K., H. H. Smith, R. A. Lehman, and A. T. Vogt. 1978. Prospects for faking believable deficits
on neuropsychological testing. Journal of Consulting and Clinical Psychology 46(5):892.
Heaton, R. K., M. Taylor, and J. Manly. 2001. Demographic effects and demographically corrected norms
with the WAIS-III and WMS-III. In Clinical interpretations of the WAIS-III and WMS-III, edited
by D. Tulsky, R. K. Heaton, G. J. Chelune, I. Ivnik, R. A. Bornstein, A. Prifitera, and M.
Ledbetter. San Diego, CA: Academic Press. Pp. 181-210.
Heilbronner, R. L., J. J. Sweet, J. E. Morgan, G. J. Larrabee, S. R. Millis, and Conference Participants.
2009. American Academy of Clinical Neuropsychology consensus conference statement on the
neuropsychological assessment of effort, response bias, and malingering. The Clinical
Neuropsychologist 23(7):1093-1129.
Holdnack, J. A., and L. W. Drozdick. 2009. Advanced clinical solutions for WAIS-IV and WMS-IV:
Clinical and interpretive manual. San Antonio: Pearson.
HNS (Houston Neuropsychological Society). 2003. The Houston Conference on Specialty Education and
Training in Clinical Neuropsychology policy statement. http://www.uh.edu/hns/hc.html (accessed
November 25, 2014).
Green, P. 2004. Green’s Memory Complaints Inventory (MCI). Edmonton: Green’s.
Green, P. 2005. Green’s word memory test for Windows: User’s manual. Edmonton: Green’s.
Green, P. 2008. Manual for nonverbal medical symptom validity test. Edmonton: Green’s.
Hiscock, M., and C. K. Hiscock. 1989. Refining the forced-choice method for the detection of
malingering. Journal of Clinical and Experimental Neuropsychology 11(6):967–974.
Iverson, G. L., and M. D. Franzen. 1996. Using multiple objective memory procedures to detect simulated
malingering. Journal of Clinical and Experimental Neuropsychology 18(1):38-51.
Jelicic, M., H. Merckelbach, I. Candel, and E. Geraets. 2007. Detection of feigned cognitive dysfunction
using special malinger tests: A simulation study in naïve and coached malingerers. The
International Journal of Neuroscience 117(8):1185–1192.
Johnson-Greene, D., L. Brooks, and T. Ference. 2013. Relationship between performance validity testing,
disability status, and somatic complaints in patients with fibromyalgia. The Clinical
Neuropsychologist 27(1):148-158.
Kaplan, E., H. Goodglass, and S. Weintraub. 2001. Boston Naming Test. Austin, TX: Pro-Ed.
Killgore, W. D., and L. DellaPietra. 2000. Using the WMS-III to detect malingering: Empirical validation
of the rarely missed index (RMI). Journal of Clinical and Experimental Neuropsychology
22:761–771.
Kirkwood, M. W., K. O. Yeates, C. Randolph, and J. W. Kirk. 2012. The implications of symptom
validity test failure for ability-based test performance in a pediatric sample. Psychological
Assessment 24(1):36-45.
Larrabee, G. J. 2003. Detection of malingering using atypical performance patterns on standard
neuropsychological tests. The Clinical Neuropsychologist 17(3):410-425.
Larrabee, G. J. 2012. Assessment of malingering. In Forensic neuropsychology: A scientific approach,
edited by G. J. Larrabee. New York: Oxford University Press.
Larrabee, G. J. 2014a. False-positive rates associated with the use of multiple performance and symptom
validity tests. Archives of Clinical Neuropsychology 29(4): 364-373.
Larrabee, G. J. 2014b. Performance and Symptom Validity. Presentation to IOM Committee on
Psychological Testing, Including Symptom Validity Assessment, for Social Security
Administration: Meeting 2, June 25, 2014, Washington, DC.
Lewis, R. F. 1990. Digit vigilance test. Lutz, FL: Psychological Assessment Resources.
Lezak, M., D. Howieson, E. Bigler, and D. Tranel. 2012. Neuropsychological assessment. 5th ed. New
York: Oxford University Press.
Lu, P. H., K. B. Boone, L. Cozolino, and C. Mitchell. 2003. Effectiveness of the Rey Osterrieth complex
figure test and the Meyers and Meyers recognition trial in the detection of suspect effort. The
Clinical Neuropsychologist 17:426–440.
MacAllister, W. S., L. Nakhutina, H. A. Bender, S. Karantzoulis, and C. Carlson. 2009. Assessing effort
during neuropsychological evaluation with the TOMM in children and adolescents with epilepsy.
Child Neuropsychology 15(6):521-531.
McCrea, M., J. P. Kelly, C. Randolph, R. Cisler, and L. Berger. 2002. Immediate neurocognitive effects
of concussion. Neurosurgery 50(5):1032-1042.
McCrea, M., K. M. Guskiewicz, S. W. Marshall, W. Barr, C. Randolph, R. C. Cantu, J. A. Onate, J.
Yang, and J. P. Kelly. 2003. Acute effects and recovery time following concussion in collegiate
football players: The NCAA concussion study. JAMA 290(19):2556-2563.
Meyers, J. E., and M. Volbrecht. 1999. Detection of malingerers using the Rey Complex Figure and
Recognition Trial. Applied Neuropsychology 6:201–207.
Mittenberg, W., C. Patton, E. M. Canyock, and D. C. Condit. 2002. Base rates of malingering and
symptom exaggeration. Journal of Clinical and Experimental Neuropsychology 24(8):1094-1102.
Mittenberg, W., C. Patton, and W. Legler. 2003. Identification of malingered head injury on the
Wechsler Memory Scale-Third Edition. Paper presented at the annual conference of the National
Academy of Neuropsychology, Dallas, TX.
Moritz, S., S. Ferahli, and D. Naber. 2004. Memory and attention performance in psychiatric patients:
Lack of correspondence between clinician-rated and patient-rated functioning with
neuropsychological test results. Journal of the International Neuropsychological Society
10(04):623-633.
NAN (National Academy of Neuropsychology). 2001. NAN definition of a clinical neuropsychologist:
Official position of the National Academy of Neuropsychology.
https://www.nanonline.org/docs/PAIC/PDFs/NANPositionDefNeuro.pdf (accessed November 25,
2014).
NIH (National Institutes of Health). n.d. NIH Toolbox: Processing speed.
http://www.nihtoolbox.org/WhatAndWhy/Cognition/ProcessingSpeed/Pages/default.aspx
(accessed October 15, 2014).
Niccolls, R., and J. F. Bolter. 1991. Multi-digit memory test. San Luis Obispo, CA: Wang
Neuropsychological Laboratories.
OIDAP (Occupational Information Development Advisory Panel). 2009. Mental cognitive subcommittee:
Content model and classification recommendations.
http://www.ssa.gov/oidap/Documents/AppendixC.pdf (accessed October 6, 2014).
Paulhus, D. L. 1998. Paulhus Deception Scales (PDS). Toronto: Multi-Health Systems.
Randolph, C. 1998. Repeatable Battery for the Assessment of Neuropsychological Status (RBANS). San
Antonio, TX: Psychological Corporation.
Rao, S. M. 1986. Neuropsychology of multiple sclerosis: A critical review. Journal of Clinical and
Experimental Neuropsychology 8(5):503-542.
Reitan, R. M. 1992. Trail making test: Manual for administration and scoring. Mesa, AZ: Reitan
Neuropsychology Laboratory.
Reitan, R. M., and D. Wolfson. 1993. The Halstead-Reitan neuropsychological test battery: Theory and
clinical interpretation—second edition. Tucson: Neuropsychology Press.
Rey, A. 1941. L’examen psychologique dans les cas d’encéphalopathie traumatique (les problèmes).
Archives de Psychologie 28:286-240.
Rey, A. 1964. The clinical examination in psychology. Paris, France: Presses Universitaires de France.
Suhr, J. A., and D. Boyer. 1999. Use of the Wisconsin card sorting test in the detection of malingering in
student simulator and patient samples. Journal of Clinical and Experimental Neuropsychology
21:701–708.
Sweet, J. J., D. G. Meyer, N. W. Nelson, and P. J. Moberg. 2011. The TCN/AACN 2010 “salary survey”:
Professional practices, beliefs, and incomes of US neuropsychologists. The Clinical
Neuropsychologist 25(1):12-61.
Tombaugh, T. N., and P. W. Tombaugh. 1996. Test of memory malingering: TOMM. North Tonawanda,
NY: Multi-Health Systems.
Trahan, D. E., and G. J. Larrabee. 1988. Continuous visual memory test. Odessa, FL: Psychological
Assessment Resources.
van Gorp, W. G., L. A. Humphrey, A. Kalechstein, V. L. Brumm, W. J. McMullen, M. Stoddard, and N.
A. Pachana. 1999. How well do standard clinical neuropsychological tests identify malingering?:
A preliminary analysis. Journal of Clinical and Experimental Neuropsychology 21(2):245-250.
Victor, T. L., and K. B. Boone. 2007. Identification of feigned mental retardation. In Assessment of
feigned cognitive impairment, edited by K. Boone. New York: Guilford Press. Pp. 310-345.
Victor, T. L., K. Boone, J. G. Serpa, J. Buehler, and E. Ziegler. 2009. Interpreting the meaning of multiple
symptom validity test failure. The Clinical Neuropsychologist 23(2):297-313.
Warrington, E. 1984. Recognition memory test manual. Windsor: Nfer-Nelson.
Wechsler, D. 1997. Wechsler adult intelligence scale (WAIS-III): Administration and scoring manual—
3rd edition. San Antonio: Psychological Corporation.
Wechsler, D. 2003. Wechsler intelligence scale for children—fourth edition (WISC-IV). San Antonio,
TX: The Psychological Corporation.
Wechsler, D. 2008. Wechsler adult intelligence scale—fourth edition (WAIS-IV). San Antonio, TX: NCS
Pearson.
Wechsler, D. 2009. WMS-IV: Wechsler memory scale-administration and scoring manual. San Antonio,
TX: The Psychological Corporation.
WHO (World Health Organization). 2001. International classification of functioning, disability and
health (ICF). Geneva, Switzerland: WHO.
Young, G. 2014. Resource material for ethical psychological assessment of symptom and performance
validity, including malingering. Psychological Injury and Law 7(3):206-235.
Economic Considerations
This chapter discusses the possible financial impact of the committee’s recommendations
that the Social Security Administration (SSA) require systematic use of standardized
psychological testing for a broader set of physical and mental impairments than is current
practice for applicants who allege cognitive impairment or whose allegation of functional
impairment is based solely on self-report. Although the committee’s recommendations are based
on its assessment of the scientific evidence underlying standardized psychological testing and of
the contributions such testing could make to determinations regarding the extent of impairment
and degree of functional capacity in those populations, it recognizes that financial considerations
also are relevant to decisions regarding implementation of psychological testing. In this context,
the chapter provides an initial framework for evaluating the economic costs of implementation
and highlights the types of data that will be needed to accurately determine the financial impact
of mandatory psychological testing as recommended by the committee for disability
determinations. A more thorough assessment of the financial implications is beyond the
committee’s ability or charge.
The chapter begins with a discussion of the potential cost outlays associated with
required psychological testing and describes how these costs vary by test type, provider, and
geographical location. As a benchmark, simple cost estimates are provided, along with
sensitivity analysis that illustrates the relationship between financial outlays and the size of the
applicant population requiring testing. The chapter then focuses on the potential financial
benefits of testing, primarily any cost savings from expanding the use of psychological testing as
recommended by the committee. In this context, the chapter discusses research arguing that
requiring psychological testing, specifically symptom validity tests (SVTs) and performance
validity tests (PVTs), will generate significant savings for the Social Security Disability
Insurance (SSDI) and Supplemental Security Income (SSI) programs by greatly reducing the
number of “false” favorable determinations (false positives). The chapter concludes with a
summary of the types of data that SSA and state Disability Determination Services (DDS) offices
would need to collect in order to accurately assess the net financial impact of implementation.
PREPUBLICATION COPY: UNCORRECTED PROOFS
1 It is difficult to project how many applicants would respond to testing requirements by seeking testing in advance
of filing an application. One way SSA could estimate this is by examining the share of applicants with intellectual
disabilities who file for benefits with all required testing in the application.
2 In some cases tests could be computer-administered online. These tests still require a
licensed provider to interpret the results.
3 In some cases, costs of services are significantly lower when provided inside a facility. Since most of the applicants
for disability benefits live in the community rather than in an institution, the present discussion focuses on
non-facility prices.
4 The codes listed reflect a sample of codes that may be used by providers.
5 The length of an evaluation will vary depending on the purpose of the evaluation and, more specifically, the type
of psychological and/or cognitive impairments being assessed. Most psychological and neuropsychological evaluations
include (1) a clinical interview, (2) administration of standardized cognitive or non-cognitive psychological tests,
and (3) professional time for interpretation and integration of data. The relevant CPT codes for each of these
processes are generally billed in units of 1 hour of service (the exception is 96150, which is billed in 15-minute units).
That is, an evaluation may include billing for 1 hour for clinical interview (96116), 1 hour for administration of tests
(96119), and 1 hour for interpretation and integration (96118) for a total of 3 hours of clinical service. However, a
more complex case likely will require additional hours of test administration and interpretation/integration in order
to fully answer the clinical question. In fact, the results of a national professional survey indicate that billing for a
typical neuropsychological evaluation is roughly 6 hours, with a range from 0.5 to 25 hours (Sweet et al., 2011).
The average cost of testing services varies by the type of testing (psychological versus
neuropsychological) and by the type of provider (a psychologist or physician versus a
technician).6 For an equivalent unit of service, a psychiatric diagnostic interview is the most
expensive and was reimbursed by Medicare at an average rate of $134 in 2014. Psychological
testing by a technician is the least expensive, with an average reimbursement rate of $66 in 2014.
As the minimum and maximum values in the table highlight, the cost of purchasing
qualified psychological testing services of any type varies considerably across states and
localities (SSAB, 2012, p. 52, Figure 47). For example, in the most expensive area, 1 hour of
psychiatric evaluation costs $188 compared to $124 in the least expensive area. There is also
substantial variation in service costs for general psychological testing, with the variation greater
among technician-provided services than services provided by psychologists or physicians. The
variation in pricing is similarly large for neuropsychological testing. For physicians or
6 The table includes both weighted and unweighted averages. Weighted averages are appropriate for considering
total costs to SSA since they are weighted to reflect population differences across counties in which the
reimbursement rate holds. Unweighted averages provide information relevant to considering cost dispersion across
states. Average prices referenced in the text reflect weighted averages.
The cost of requiring psychological testing depends on the price of the tests and on the
number of individuals who must be tested. There is no straightforward way to map the
committee’s recommendations regarding who should receive psychological testing onto SSA’s
publicly available data to derive an accurate measure of the size of the tested population.7
However, the data do permit the calculation of cost estimates associated with testing groups of
applicants the committee judges to be most likely to fall under the recommendations in this
report. The results of this exercise are provided in Table 6-2. The table shows cost computations
for testing applicants who reach Step 4 or 5 of the disability determination process described in
Chapter 2. These are individuals who did not qualify for benefits by meeting or equaling the
medical listings but were sent along for further evaluation, rather than being denied. By
definition, these are individuals for whom a determination regarding benefits requires further
case development, including assessment of their ability to perform substantial gainful activity at
some job in the national economy.8 In addition to calculations for all applicants reaching this
stage, the table shows cost estimates should psychological testing be required only for the subset of
applicants with mental impairments other than intellectual disabilities or with arthritis and back
disorders.
The results from this exercise demonstrate the variation in projected costs associated with
factors related to implementation, including which tests will be required, the qualifications
mandated for testing providers, and the number of individuals who will need to be tested. For
example, if SSA provided psychiatric diagnostic interviews at the average Medicare
reimbursement rate for all applicants reaching Step 4 or 5, the cost would be $212 million. This
cost would drop to $51 million if such testing were only provided to applicants with mental
disorders (excluding intellectual disabilities). Similarly, costs would be lower if other forms of
psychological testing were required or if other types of service providers were used.
Importantly, the cost estimates in Table 6-2 assume that SSA will be responsible for all
the costs of psychological testing. However, as noted previously, some applicants may acquire
and include required tests as part of the medical records presented at application. In this case, the
cost to SSA would be minimal, provided that the disability determination offices already have
sufficient personnel to adequately evaluate the test findings.
7 SSA collects a variety of data that it does not provide publicly and may be able to do a more accurate initial
assessment of the costs associated with the recommendations. However, to fully measure the potential costs it is
likely that SSA would need to pilot the use of testing and the costs associated with it.
8 For children applying for SSI, the evaluation is based on attending school rather than working.
                      Number     Psychiatric    Psychological   Psychological  Neurobehavioral  Neuropsychological  Neuropsychological  Health and
                      of         Diagnostic     Testing by      Testing by     Status Exam      Testing by          Testing by          Behavioral
                      Persons    Interview      Psychologist/   Technician     (96116)          Psychologist/       Technician          Assessment
                                 (90791)        Physician       (96102)                         Physician           (96119)             (96150)
                                                (96101)                                         (96118)
Total Cost            N/A        $51,909        $31,367         $25,676        $36,780          $38,446             $31,507             $8,326
SSI Child Claimants   297        $39.79         $24             $20            $28              $29                 $24                 $6.38
Total Cost            N/A        $72,771        $43,973         $35,994        $51,561          $53,897             $44,169             $11,672
All Diagnostic Groups
SSDI Claimants        584,669    $78,333.95     $47,335         $38,746        $55,503          $58,017             $47,545             $12,564.54
Concurrent Claimants  515,157    $69,020.73     $41,708         $34,139        $48,904          $51,119             $41,893             $11,070.72
SSI Adult
Total Cost            N/A        $212,140       $128,190        $104,930       $150,309         $157,118            $128,760            $34,027
NOTE: Based on 2013 application data and 2014 Medicare pricing information, geographically weighted. Values in Table 6-2 may not exactly reflect
multiplication of weighted pricing data from Table 6-1 and number of persons in column one of Table 6-2 due to rounding error.
SOURCES: Centers for Medicare & Medicaid Services, 2015; SSA, 2014c, d, e; and committee calculations.
Another assumption implicit in this simple cost calculation is that the psychological
testing would be added to current DDS case development costs. To the extent that psychological
testing replaces rather than augments existing case development modalities, the costs to SSA
would be lower than the simple estimates in the table. There are good reasons to believe that this
might be the case. Consultative exams are already a common component of disability
determinations.9 Some of these exams include psychological testing and it might be possible to
add additional tests with limited additional costs.
Of course, the estimates in Table 6-2 could also understate the costs, especially since the
calculations rely on a mapping of the recommendations to publicly available data that may
insufficiently capture the true number of individuals who could require testing. Accurately
assessing the costs of mandatory psychological testing by SSA will require more detailed
information on the parameters of implementation as well as experience in the field once testing
has begun.
Recent calls for greater use of psychological testing in SSA’s disability determination
process assume that the current process is making significant mistakes and allowing unqualified
applicants onto the disability programs (Chafetz and Underhill, 2013; IOPC, 2013). However,
the committee has been unable to uncover any evidence on either side of this claim. At present,
there do not appear to be any independently conducted studies regarding the accuracy of the
disability determination process as implemented by DDS offices. As such, it is difficult to assess
whether greater use of psychological testing will increase, decrease, or leave unchanged the
number of individuals awarded benefits. The outcome depends on how accurate DDS offices
currently are in making disability determinations.
Even if the DDS offices are making relatively accurate determinations in the absence of
psychological testing, greater standardization could produce other benefits. A more standardized
process could potentially reduce the number of applicants who appeal their decisions. For
applicants who do appeal, the inclusion of psychological testing in the medical records could
help reduce the burden on administrative law judges to make subjective determinations on the
adequacy of the claim. Standardization might also make the process more transparent and
efficient, improving public understanding and reducing the time it takes to process claims.
However, none of these potential benefits can be quantified without additional research on the
accuracy and efficiency of current practice. Such an assessment is an important first step in
developing an implementation strategy for the committee’s recommendations.
One of the main purported benefits of mandatory psychological testing is its potential to
generate significant savings for the SSDI and SSI programs. The proponents of this view argue
that requiring psychological testing (SVTs and PVTs) for SSDI and SSI applicants would result
in a significant reduction of the number of individuals allowed onto the benefit rolls. For
example, Chafetz and Underhill (2013) estimate that requiring SVTs and PVTs in the DDS
9 On average 47 percent of disability evaluations include a consultative examination, although there is considerable
variation across states (SSA, 2014a, b).
process would save approximately $12.8 billion for the SSDI system and $7.2 billion for the SSI
system, or about 40 percent of total program costs (Table 6-3 and Table 6-4 reproduced from
Chafetz and Underhill [2013]). The estimated savings result from the assumed reduction in the
number of falsely awarded individuals coming onto the disability programs.10
TABLE 6-3 Calculation of 2011 SSDI Costs for Each Level of Malingering of Mental Disorders
Level (%)    No. Disabled Workers = 2,768,928    2011 Total Cost $32,067,993,684
10           276,893                             $3.207 B
20           553,786                             $6.414 B
30           830,678                             $9.620 B
40           1,107,571                           $12.827 B
50           1,384,464                           $16.034 B
60           1,661,357                           $19.241 B
70           1,938,250                           $22.448 B
80           2,215,142                           $25.654 B
90           2,492,035                           $28.861 B
NOTE: The 40 percent rate is bolded as the probable rate of malingering given in Larrabee, Millis, and
Meyers (2009). For the SSDI Total, the number of disabled workers is used, removing spouse and child
beneficiaries. Costs were estimated by multiplying the average disability figure for each mental condition
by the December 2011 number of individuals with that condition, summing over all conditions, and then
multiplying by 12 for the yearly estimated amount.
SOURCE: Chafetz and Underhill, 2013. Reproduced with permission.
10 Improved accuracy could also decrease the number of individuals falsely denied benefits. However, the focus of
the literature has been on reducing those falsely allowed onto the program.
TABLE 6-4 Calculation of 2011 SSI (Adult) Costs for Each Level of Malingering of Mental Disorders
Level (%)    No. of Adults less than age 65 = 2,797,743    2011 Total Cost $32,067,993,684
10           279,774                                       $1.799 B
20           559,549                                       $3.597 B
30           839,323                                       $5.396 B
40           1,119,097                                     $7.195 B
50           1,398,872                                     $8.994 B
60           1,678,646                                     $10.792 B
70           1,958,420                                     $12.591 B
80           2,238,194                                     $14.390 B
90           2,517,969                                     $16.189 B
NOTE: The 40 percent rate is bolded as the probable rate of malingering given in Larrabee, Millis, and
Meyers (2009). The SSI figures include the number of adults (less than age 65) minus the children as of
December 2011. Costs were estimated by multiplying the average disability figure for each mental
condition by the December 2011 number of individuals with that condition, summing over all conditions,
and then multiplying by 12 for the yearly estimated amount. B = billion
SOURCE: Chafetz and Underhill, 2013. Reproduced with permission.
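The savings figures in Tables 6-3 and 6-4 are a proportional scaling of total program cost by the assumed malingering rate. A minimal sketch of that arithmetic for Table 6-3 follows; the function and variable names are illustrative, and the calculation embeds the same assumption as the published estimate, namely that every beneficiary counted at a given level would be removed from the rolls.

```python
def static_savings(total_annual_cost: float, malingering_rate: float) -> float:
    """Annual savings if the given share of program cost were eliminated."""
    return total_annual_cost * malingering_rate


# Header values from Table 6-3 (2011 SSDI, mental disorders).
SSDI_TOTAL_2011 = 32_067_993_684
SSDI_WORKERS = 2_768_928

for pct in range(10, 100, 10):
    rate = pct / 100
    people = round(SSDI_WORKERS * rate)          # beneficiaries at this level
    savings_b = static_savings(SSDI_TOTAL_2011, rate) / 1e9
    print(f"{pct:>3}%  {people:>9,}  ${savings_b:.3f} B")
```

At the 40 percent level this yields 1,107,571 workers and $12.827 billion, matching the corresponding row of Table 6-3.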
The committee performed a critical evaluation of this estimate and concluded that it is
based on several assumptions that, if violated, would substantially lower the projected cost
savings. Most important is the assumption that the current disability determination process, as
implemented by DDS offices, is unable to detect any applicants who exaggerate or fabricate their
impairments and related functional limitations. Although not stated directly in the analysis, this
assumption is implicit in the authors’ use of base rates of malingering from populations of
applicants and claimants prior to any disability screening. For example, the $12.8 and $7.2
billion savings computed by Chafetz and Underhill (2013) assume that 40 percent of current
SSDI and SSI beneficiaries were falsely awarded benefits and would have been denied if given an
SVT or PVT as part of the disability determination process. This assumption is equivalent to
the view that DDS offices currently detect no one who exaggerates or fabricates their condition,
symptoms, or functional limitations. In other words, the Chafetz and Underhill computation
assumes that under current practice 40 percent of all awardees are given benefits even though
they are not truly eligible. Because this assumption is so extreme, the cost savings associated
with psychological testing are likely to be lower than the authors suggest.
The other important assumption embedded in the Chafetz and Underhill projected cost
savings is that SVTs and PVTs would be retroactively applied to the population of existing
beneficiaries, regardless of time on the program.11 Should SSA choose to implement mandatory
SVT and PVT testing, it would likely do so for new applicants to the disability programs, making
the potential cost savings lower than that computed by Chafetz and Underhill.
Finally, the Chafetz and Underhill calculation is static. The more appropriate method of
computing cost savings is to consider the present discounted value of an estimated stream of
potential benefit savings, which would generate a much larger estimate.
The importance of altering the assumptions about improved accuracy of disability
determinations and the size of the population exposed to testing can be seen in Table 6-5.
Reflecting the mapping of the committee’s recommendations for testing used in Table 6-2, cost
savings are estimated for new awardees with mental impairments other than intellectual
disabilities and for those with arthritis and back disorders. For completeness, the estimates are
also provided for all new beneficiaries regardless of condition, and each set of estimates is shown
both for all awardees and for awardees determined eligible at Step 4 or 5 of the disability determination process. The
alternative estimates also show the sensitivity of the estimated cost savings to the assumption
about the potential for mandatory SVT and PVT use to improve the accuracy of SSA disability
determinations. The 40 percent test failure rate preferred by Chafetz and Underhill (2013) applies
if the current SSA process detects zero percent of those who exaggerate or fabricate; the 10
percent test failure rate applies if SSA is relatively accurate, but makes some false positive errors
that would be identified through the use of SVTs and PVTs.
Several important points emerge from the computations in the table. First, the potential
annual cost savings associated with mandatory SVT and PVT testing are substantially reduced
when testing is applied to new awardees rather than all beneficiaries on the programs. Considering
only new awardees with mental impairments other than intellectual disabilities, the cost savings
assuming the 40 percent malingering rate are $236 million for SSDI and $153 million for SSI,
roughly 2 percent of the savings reported by Chafetz and Underhill (2013). Second, cost savings are
also reduced when the assumption about the accuracy improvements associated with symptom
and performance validity testing is relaxed. If SSA misses 10, rather than 40, percent of those with
exaggerated or fabricated claims, the cost savings from mandatory testing on new awardees with
mental impairments other than intellectual disabilities fall from $236 million to $59 million for SSDI
and from $153 million to $38 million for SSI adults. Finally, cost savings decline if testing is required
only for applicants who reach Step 4 or 5 of the disability determination process. Although
these estimates are far from exact, they suggest that caution is warranted when projecting
potential cost savings from mandatory psychological testing.
As noted earlier, the static calculations in Table 6-5, although useful for comparing to
Chafetz and Underhill, are not appropriate for computing the expected savings associated with
implementing SVTs and PVTs in SSA’s disability determination process. The expected program
savings is more accurately calculated as the present discounted value of the averted payment
flows associated with the denied applicants captured by psychological testing. Using the same
diagnostic categories as in Table 6-5, Table 6-6 shows the present discounted value of expected
savings from disallowing an unqualified applicant from each of the three disability programs.
The table also shows the estimated program savings to SSA under the assumption that
psychological testing as recommended would result in the denial of benefits to 10 percent of
applicants who would otherwise receive them.
11 Chafetz and Underhill (2013) limit the group to those with mental disorders, but even so this assumption greatly
increases the cost savings associated with greater use of testing, since it essentially applies the 40 percent base
malingering rate to all existing beneficiaries.
disability
                              SSI Children  72,203    41,636    $202,081     $116,531     $50,520    $29,133
Arthritis and Back Disorders  SSDI          117,512   109,295   $671,336     $624,393     $167,834   $156,098
                              Concurrent    46,459    42,098    $173,628     $157,330     $43,407    $39,332
                              SSI Adults    32,649    29,677    $81,172      $73,783      $20,293    $18,466
                              SSI Children  622       244       $1,546       $607         $387       $152
All Diagnostic Groups         SSDI          399,722   233,522   $2,069,914   $1,209,267   $517,479   $302,317
                              Concurrent    210,812   111,331   $787,853     $416,070     $196,963   $104,017
                              SSI Adults    183,930   90,792    $498,182     $245,914     $124,546   $61,479
                              SSI Children  171,574   90,479    $464,716     $245,066     $116,179   $61,267
a SSDI benefit data are from 2012, and SSI and concurrent benefit data are from 2013. For concurrent enrollees there are no data available on
Two points emerge from the table. First, the expected cost savings associated with
denying an applicant improperly allowed on the program can be sizeable, depending on the
diagnosis and program. The estimated savings are largest for individuals with mental
impairments; this reflects the earlier age of benefit receipt and longer average time on the
program. Estimated savings are smallest for SSI recipients with arthritis and back pain, again
largely reflecting the age at which recipients enter the program. Second, the amount of program
savings that comes from implementing psychological testing depends mostly on how many
additional individuals would be identified as unqualified for benefits relative to current practice.
It is important to keep in mind that psychological testing as recommended may also result in the
awarding of benefits to some portion of applicants who otherwise would be denied. Assuming
that implementation of psychological testing reduces the number of newly awarded beneficiaries
by 10 percent, the savings per cohort, while significant, still would be less than the annual
savings estimated by Chafetz and Underhill.
TABLE 6-6 Estimated Lifetime Spending on an Individual Disability Awardee, 2% Annual Discounting
                                                                Cohort Lifetime  Cohort Lifetime  Cohort Lifetime
                              Individual       Individual       Savings: 10%     Savings: 10%     Savings: 10%
                              Lifetime         Lifetime         Test Failure     Test Failure     Test Failure
                              Savings: SSDI    Savings: SSI     Rate of New      Rate of New SSI  Rate of New SSI
                              Average Benefit  Average Benefit  SSDI Awardees    Adult Awardees   Child Awardees
Mental Disorder (excluding    $202,121         $119,101         $1,004,542,011   $650,756,461     $859,945,621
Intellectual Disability)
Arthritis and Back Disorders  $171,561         $74,662          $2,016,047,512   $243,763,319     $4,643,964
All Diagnostic Groups         $161,434         $84,438          $6,452,880,242   $1,553,065,482   $1,448,734,067
NOTE: SSDI benefit data from 2012, SSI from 2013. The average benefit amount for mental disabilities
(excluding intellectual disability) was calculated as a weighted average of the average monthly benefits
awarded for mental disability diagnoses (excluding intellectual disabilities) using diagnostic distribution
data. For musculoskeletal conditions, there are no data available specifically for back disorders or
arthritis, so the average benefit for musculoskeletal disorders was used to calculate estimated savings.
Overall average benefit by program was used to calculate “all diagnostic groups” savings. SSA did not
have information concerning average SSI benefits by diagnosis available separately for children and
adults, so a single weighted average was used for both groups using diagnostic and benefit distributions
for all recipients under age 65. Average time spent on disability benefits by diagnosis comes from Riley
and Rupp (2014, Table 3). As Riley and Rupp do not differentiate between programs, the same value was
used for all programs within a diagnosis.
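The present discounted value approach described in this chapter can be sketched as follows. The sketch assumes a constant annual benefit, payments at the start of each year, and the 2 percent discount rate named in the title of Table 6-6; the committee's exact timing convention is not stated, so these details, like the function names and the $12,000 and 15-year inputs, are hypothetical. The cohort calculation multiplies an individual lifetime figure by a count of new awardees and the assumed 10 percent test failure rate; the count of 117,512 is taken from the SSDI row for arthritis and back disorders in the preceding table and is assumed here to be the number of new awardees.

```python
def present_discounted_value(annual_benefit: float, years: int,
                             rate: float = 0.02) -> float:
    """Discounted sum of a constant annual benefit paid for `years` years.

    The first payment is treated as undiscounted (start-of-year timing),
    which is an assumption of this sketch.
    """
    return sum(annual_benefit / (1 + rate) ** t for t in range(years))


def cohort_savings(individual_lifetime_savings: float, new_awardees: int,
                   failure_rate: float = 0.10) -> float:
    """Lifetime savings for one award cohort at the given test failure rate."""
    return individual_lifetime_savings * new_awardees * failure_rate


# Hypothetical individual: $12,000 per year for 15 years on the program.
pdv = present_discounted_value(12_000, 15)
print(f"Individual PDV: ${pdv:,.0f}")

# Table 6-6 individual SSDI value for arthritis and back disorders,
# applied to 117,512 awardees (assumed to be the new-awardee count).
cohort = cohort_savings(171_561, 117_512)
print(f"Cohort savings: ${cohort / 1e9:.3f} billion")
```

The cohort figure comes to roughly $2.016 billion, in line with the arthritis and back disorders entry for SSDI in Table 6-6; small differences are expected because the individual lifetime value is rounded.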
FINDINGS
Understanding the financial costs and benefits of using psychological testing in the SSA
disability determination process is an important, but unfinished, task. The data necessary to make
accurate calculations are limited, and estimates based on available data are subject to
considerable error. That said, the framework for a proper computation is well understood and can
be used to guide data collection and evaluation when testing is and is not employed.
Accurate assessments of the net financial impact of mandatory psychological testing will
require information on the current accuracy of DDS decisions and how the accuracy is improved,
or unaffected, by the use of more standardized testing. It will also be important to determine
which types of tests should be given and to which groups in the applicant population. This
information can then be used to consider the impact on the demand for testing services across the
country and whether or not that demand affects service pricing. All of these components could be
gathered in pilot programs that allow for experimentation and assessment prior to wider
implementation. In addition, the committee found the following:
• The average cost of testing services varies by the type of testing (e.g., psychological,
neuropsychological), by the type of provider (e.g., psychologist or physician,
technician), and by geographic area. The variation in pricing implies that the expected
costs to SSA of requiring psychological testing will depend on exactly which tests are
required, the qualifications mandated for testing providers, and the geographical
location of the providers most in demand.
• Estimating the exact cost of broad use of psychological testing by SSA will require
more detailed data on the exact implementation strategy. To fully measure the
potential costs, it is likely that SSA will need to pilot the use of testing and the costs
associated with it.
• Some published estimates of the potential cost savings to SSA associated with the use
of symptom validity testing and performance validity testing are based on
assumptions that, if violated, would substantially lower the estimated cost savings.
Potential cost savings associated with testing vary considerably based on the
assumptions about who it is applied to and how many individuals it detects and thus
rejects for disability benefits.
• At present, there do not appear to be any independently conducted studies regarding
the accuracy of the disability determination process as implemented by DDS offices.
• A full financial cost benefit analysis of psychological testing will require SSA
to collect additional data both before and after the implementation of the
recommendations of this report.
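The dependence of estimated savings on the underlying assumptions can be illustrated with a minimal Python sketch. It assumes a deliberately simple accounting identity (benefits not paid to applicants denied after testing, less the cost of testing). The function names, parameters, and dollar figures are hypothetical and are not drawn from SSA data or from the committee's calculations. The sketch also ignores the accuracy of the resulting denials, appeals, and any effect of demand on test pricing, all of which the findings above identify as unknown.

```python
def net_financial_impact(n_tested, cost_per_test, share_denied, avg_benefit_avoided):
    """Benefits not paid to applicants denied after testing, less testing cost."""
    testing_cost = n_tested * cost_per_test
    gross_savings = n_tested * share_denied * avg_benefit_avoided
    return gross_savings - testing_cost


def break_even_share(cost_per_test, avg_benefit_avoided):
    """Share of tested applicants that must be denied for testing to pay for itself."""
    return cost_per_test / avg_benefit_avoided


# Hypothetical inputs for illustration only (not SSA figures).
N_TESTED = 1_000
COST_PER_TEST = 500.0
AVG_BENEFIT_AVOIDED = 100_000.0

# Vary the assumed share of tested applicants who are denied benefits.
for share in (0.0, 0.05, 0.25):
    impact = net_financial_impact(N_TESTED, COST_PER_TEST, share, AVG_BENEFIT_AVOIDED)
    print(f"share denied = {share:.2f}: net impact = ${impact:,.0f}")
```

Varying the assumed share in the loop shows how strongly the net figure depends on that single assumption, which is why pilot data would be needed before any such estimate could be relied on.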
REFERENCES
Chafetz, M., and J. Underhill. 2013. Estimated costs of malingered disability. Archives of Clinical
Neuropsychology 28(7):633-639.
CMS (Centers for Medicare & Medicaid Services). 2015. Physician fee schedule search tool.
http://www.cms.gov/apps/physician-fee-schedule/search/search-criteria.aspx (accessed January
20, 2015).
IOPC (Inter Organizational Practice Committee). 2013. Use of symptom validity indicators in SSA
psychological and neuropsychological evaluations. Letter to Senator Tom Coburn.
https://www.nanonline.org/docs/PAIC/PDFs/SSA%20and%20Symptom%20Validity%20Tests%20-%20IOPC%20letter%20to%20Sen%20Coburn%20-%202-11-13.pdf
(accessed February 8, 2015).
Larrabee, G. J., S. R. Millis, and J. E. Meyers. 2009. 40 plus or minus 10, a new magical number: Reply
to Russell. Clinical Neuropsychologist 23(5):841-849.
Riley, G. F., and K. Rupp. 2014. Cumulative expenditures under the DI, SSI, Medicare, and Medicaid
programs for a cohort of disabled working-age adults. Health Services Research 50(2):514-536.
doi: 10.1111/1475-6773.12219.
SSA (Social Security Administration). 2014a. DDS performance management report. Disability claims
data. Consultative examination rates, fiscal year 2013. Data prepared by ORDP, ODP, and
ODPMI. Submitted to the IOM Committee on Psychological Testing, Including Validity Testing,
for Social Security Administration Disability Determinations by Joanna Firmin, Social Security
Administration on August 25, 2014.
SSA. 2014b. Disability claims data (initial, reconsideration, continuing, disability review) by
adjudicative level and body system. SSDI, SSI, Concurrent and Total claims allowance rates for
claims with consultative examinations by U.S. States, fiscal year 2013. Data prepared by ORDP,
ODP, ODPMI. Submitted to the IOM Committee on Psychological Testing, Including Validity
Testing, for Social Security Administration Disability Determinations by Joanna Firmin, Social
Security Administration on August 25, 2014.
SSA. 2014c. National data Title II-SSDI, Title XVI-SSI, & Concurrent Title II/XVI initial disability
determinations by regulation basis code (reason for decision), fiscal year 2013. All cases except
mental disorders (other than intellectual disability) and arthritis and back disorders. Data
prepared by SSA, ORDP, ODP, ODPMI. Submitted to the IOM Committee on Psychological
Testing, Including Validity Testing, for Social Security Administration Disability Determinations
by Joanna Firmin, Social Security Administration on October 23, 2014.
SSA. 2014d. National data Title II-SSDI, Title XVI-SSI, & Concurrent Title II/XVI initial disability
determinations by regulation basis code (reason for decision), fiscal year 2013. Arthritis and
back disorders only. Data prepared by SSA, ORDP, ODP, and ODPMI. Submitted to the IOM
Committee on Psychological Testing, Including Validity Testing, for Social Security
Administration Disability Determinations by Joanna Firmin, Social Security Administration on
October 23, 2014.
SSA. 2014e. National data Title II-SSDI, Title XVI-SSI, & Concurrent Title II/XVI initial disability
determinations by regulation basis code (reason for decision), fiscal year 2013. Mental disorders
only (excluding intellectual disability). Data prepared by SSA, ORDP, ODP, ODPMI. Submitted
to the IOM Committee on Psychological Testing, Including Validity Testing, for Social Security
Administration Disability Determinations by Joanna Firmin, Social Security Administration on
October 23, 2014.
SSAB (Social Security Advisory Board). 2012. Aspects of disability decision making: Data and
materials. Washington, DC: SSAB.
Sweet, J. J., D. G. Meyer, N. W. Nelson, and P. J. Moberg. 2011. The TCN/AACN 2010 “salary survey”:
Professional practices, beliefs, and incomes of U.S. neuropsychologists. Clinical Neuropsychologist
25(1):12-61.
• The two largest impairment categories for Supplemental Security Income (SSI) (adults
and children) and Social Security Disability Insurance (SSDI) are mental disorders
(excluding intellectual disabilities) and musculoskeletal and connective tissue
disorders. Within these two categories, a significant fraction of the claimants have
conditions, including affective mood disorders and disorders of the back, for which the
presence and severity of impairment and associated functional limitations are based
largely on applicant self-report.
• SSA disability determinations are based on the medical and all relevant evidence in an
applicant’s case record. Physical or mental impairments must be established by
objective medical evidence consisting of medical signs and laboratory findings, which
may include psychological and other standardized test results. SSA establishes the
presence of a medically determinable impairment in individuals with mental disorders
other than intellectual disability through the use of standard diagnostic criteria, which
include symptoms and signs. Evidence for these mental impairment claims, as well as
for claims for conditions in which the somatic symptoms are disproportionate to
physical findings (e.g., somatoform disorder, multisystem illness, and chronic pain),
relies less on standard laboratory tests than for some other categories of impairment.
The validity of the self-reported symptoms and/or impairment severity may be called
into question due to the absence of objective medical evidence or biomarkers that
could explain or substantiate the applicant’s self-report of distress and disability.
• In some cases, SSA disability examiners must evaluate the credibility of statements by
individuals about the intensity and persistence of their symptoms and the effect on the
individual’s ability to function and perform work-related activities. When a disability
claim is based primarily on an applicant’s self-report of symptoms and self-reported
statements about their intensity, persistence, and limiting effects, SSA relies on an
assessment of the consistency of the self-report with all of the evidence in the
claimant’s medical evidence of record.
• There currently is great variability in allowance rates for both SSI and SSDI among
states that is not fully accounted for by differences in the populations of applicants. In
addition, there is great variability in the appeal rulings among administrative law
judges within and across states.
• Psychological consultative examinations often consist of nonstandardized diagnostic
interviews and a mental status exam, with little or no standardized psychological
testing. Because clinicians generally are not as good at interpreting clinical and
standardized test data as are established actuarial methods, reliance on established
actuarial methods (when available) to interpret the data will improve the accuracy of
diagnostic evaluations.
• Each Disability Determination Services agency, within the confines of SSA policy,
issues its own rules regarding the tests that may be purchased as part of a consultative
examination. Aside from the use of intelligence tests as described in the listings for
intellectual disability and certain neurological impairments, SSA does not require or
specify the purchase of any type of (or individual) psychological test. SSA provides
general guidance that good psychological tests are valid and reliable and have
appropriate normative data. For this reason, there is variation among states about when
and which standardized psychological tests can be purchased, with the exception of
performance validity tests (PVTs) and symptom validity tests (SVTs), which are
precluded from purchase by SSA except in rare cases such as a court order.
• The results of standardized cognitive tests and non-cognitive psychological tests that
are appropriately administered, interpreted, and validated can provide objective
evidence to help identify and document the presence and severity of medically
determinable mental impairments at Step 2 of SSA’s disability determination process.
In addition, standardized cognitive test results can provide objective evidence to help
identify and assess the severity of work-related cognitive functional impairment
relevant to disability evaluations at the listing level (Step 3) and to mental residual
functional capacity (Steps 4 and 5).
• Current data on the prevalence of inconsistent reporting of symptoms or performing
below one’s capability on cognitive tests are very imprecise. In the context of SSA
disability applicants, neither scenario rules out disability, but both suggest the need for
additional assessment of the alleged impairment with the goal of making an accurate
determination of disability.
• When a disability claim is based primarily on an applicant’s self-report of symptoms
and self-reported statements about their intensity, persistence, and limiting effects,
SSA relies on an assessment of the consistency of the self-report with all of the
evidence in the applicant’s medical evidence of record.
• SVTs and PVTs provide information about the validity of standardized non-cognitive
and cognitive test results when administered as part of the test or test battery and are
an important addition to the medical evidence of record for specific groups of
applicants. Validity tests do not provide information about whether or not the
individual is, in fact, disabled.
• Because SVTs and PVTs are used to help assess the validity of an individual’s
standardized non-cognitive and/or cognitive psychological test results respectively, it
is important that SVTs and PVTs only be administered in the context of a larger test
battery and only be used to interpret information from that battery.
• Current SSA policy precludes the purchase of SVTs and PVTs to help inform
determinations about the credibility of an individual’s statements or about possible
malingering. Specific tests outlined as examples in this policy include not only stand-
alone PVTs and SVTs (e.g., Test of Memory Malingering, Validity Indicator Profile,
Structured Interview of Reported Symptoms), but also psychological self-report
measures that contain symptom validity scales (e.g., Minnesota Multiphasic
Personality Inventory-2, Millon Clinical Multiaxial Inventory) among other scales of
psychological functioning. This policy is inconsistent with the practice of other
disability benefit programs, such as the Veterans Benefits Administration, private
disability insurers, and some international disability programs.
• Although there currently are no data on the rates of false positives and false negatives
in SSA disability determinations, systematic use of standardized psychological testing
for a broader set of physical and mental impairments than is current practice is
expected to improve the accuracy and consistency of disability determinations for
applicants who allege cognitive impairment or whose allegation of functional
impairment is based solely on self-report.
• Assessment of symptom validity, including the use of SVTs, analysis of internal data
consistency, and other corroborative evidence, helps the evaluator to interpret the
accuracy of an individual’s self-report of behavior, experiences, or symptoms and
responses on standardized non-cognitive psychological measures. For this reason, it is
important to include an assessment of symptom validity when non-cognitive
psychological measures are administered.
• Evidence of inconsistent self-report based on an assessment of symptom validity is
cause for concern with regard to self-reported symptoms but does not provide
information about whether or not the individual is, in fact, disabled. A lack of validity
on symptom validity testing alone is insufficient grounds for denying a disability
claim, although additional information would be required to assess the claimant’s
allegation of disability.
documented history, and the like. It is important to note that a finding of inconsistency between
the test results and the areas specified is more informative than a finding of consistency would
be.
The committee’s recommendation here and in the following recommendation that SSA
“pursue additional evidence of the applicant’s allegation” for cases in which validation is not
achieved means that the test results in those cases are an insufficient basis to make a
determination regarding disability status.
The committee reached the following conclusions and recommendation about the
qualifications for the administration and interpretation of standardized psychological tests:
• Use of standardized procedures for the administration of standardized non-cognitive
and cognitive psychological tests enables application of normative data to the
individual being evaluated. Without standardized administration, the test takers’
performance may not accurately reflect their ability. It is important that any person
administering cognitive or neuropsychological tests be well trained in the
administration protocols for those particular tests, possess the interpersonal skills
necessary to build rapport with the test taker, and understand important psychometric
properties, including validity and reliability, as well as factors that could emerge
during testing to place either at risk.
• Interpretation of standardized psychological test results is more than a report of the
standardized test scores; it requires assigning meaning to the scores within the
individual context of the specific examinee. As such, interpretation of test results
requires a higher level of clinical training than does the administration alone of some
psychological tests.
• Licensed psychologists and neuropsychologists are the specialists qualified to interpret
the results of most standardized psychological and neuropsychological tests. Under
close supervision and direction of licensed psychologists and neuropsychologists, it is
standard practice for psychometrists or technicians with specialized training to
administer and score tests. Test manuals specify the qualifications necessary for
administration, scoring, and interpretation of the test or measure.
• It is important as well that the individual responsible for making the disability
determination (disability examiner or administrative law judge) have the training and
experience to understand and evaluate the report provided by the psychologist or
neuropsychologist.
ECONOMIC CONSIDERATIONS
The committee concluded the following with respect to the complex economic
considerations raised by increased systematic use of standardized psychological testing by SSA
as recommended:
• The average cost of testing services varies by the type of testing (e.g., psychological,
neuropsychological), by the type of provider (e.g., psychologist or physician,
technician), and by geographic area. The variation in pricing implies that the expected
costs to SSA of requiring psychological testing will depend on exactly which tests are
required, the qualifications mandated for testing providers, and the geographical
location of the providers most in demand.
• Estimating the exact cost of broad use of psychological testing by SSA will require
more detailed data on the exact implementation strategy. To fully measure the
potential costs, it is likely that SSA will need to pilot the use of testing and the costs
associated with it.
• At present, there do not appear to be any independently conducted studies regarding
the accuracy of the disability determination process as implemented by DDS offices.
Some published estimates of billions of dollars in potential cost savings to SSA
associated with the use of symptom validity testing and performance validity testing
are based on assumptions that if violated would substantially lower the estimated cost
savings. Potential cost savings associated with testing vary considerably based on the
assumptions about who it is applied to and how many individuals it detects and thus
rejects for disability benefits.
• A full financial cost benefit analysis of psychological testing will require SSA to
collect additional data both before and after the implementation of the
recommendations of this report.
Based on its examination of the literature and dialogues with experts in a variety of areas,
including psychological and neuropsychological testing, performance validity testing and
symptom validity testing, and the disability evaluation process both within SSA and in other
arenas, the committee recognizes many questions remain with regard to the use of standardized
psychological testing in the disability determination process.
As part of its assessment of the use of standardized psychological tests for the disability
evaluation process, the committee was asked to discuss the costs and cost-effectiveness of
requiring a single test or a combination of tests. This report provides an initial framework for
evaluating the economic costs and highlights the types of data that will be needed to accurately
determine the financial impact of implementing the committee’s first two recommendations. The
following conclusions and recommendation relate to this enterprise.
• Accurate assessments of the net financial impact of psychological testing as
recommended by the committee will require information on the current accuracy of
DDS decisions and how the accuracy is affected by the increased use of standardized
psychological testing.
• The absence of data on the rates of false positives and false negatives in current SSA
disability determinations precludes any assessment of their accuracy and consistency.
• There currently is great variability in allowance rates for both SSI and SSDI among
states that is not fully accounted for by differences in the populations of applicants.
There also is great variability in the disability determination appeal rulings among
administrative law judges within and across states. Although it is not possible to know
definitively whether the large share of unexplained variation in state filing, award, and
allowance rates is driven by variability in the federal disability determination process,
there is some evidence that states differ in how they manage claims.
• In light of this unexplained variability, systematic use of standardized psychological
testing as recommended by the committee is expected to improve the accuracy and
consistency of disability determinations.
Over the course of the project, the committee identified two areas in particular in which it
expects that the results of further research would help to inform disability determination
processes as indicated in the following conclusions and recommendation.
• Additional research is needed on the use of SVTs and PVTs in populations
representative of the pool of disability applicants, including in terms of gender,
ethnicity, race, primary language, educational level, medical condition, and the like. In
particular, additional research on the development of appropriate criterion or cut-off
scores for PVTs and SVTs in these populations for the purposes of disability
evaluation would be beneficial.
• The committee’s task was to evaluate the value of psychological testing in the
disability determination process, as reflected in the foregoing recommendations.
However, the committee recognizes that just as systematic use of standardized
psychological testing is expected to improve the accuracy and consistency of disability
determinations for applicants who allege cognitive impairment or whose allegation of
functional impairment is based solely on self-report, the use of other standardized
assessment tools also may be expected to improve the accuracy of disability
determinations. The value of standardized assessment tools, including psychological
tests, to assessments of individuals’ work-related functional capacity is an area that
would benefit from further research.
Appendix A
Room 106
Keck Center of the National Academies
500 Fifth St, NW
Washington, DC
AGENDA
DISCUSSION
12:00 p.m. Break for lunch
1:00 p.m. Use of psychological tests, including SVTs, in select populations
Moderator—Lisa A. Suzuki, Ph.D., Committee Member
Validity testing in pediatric populations
Michael Kirkwood, Ph.D., Associate Clinical Professor, Physical Medicine and
Rehabilitation, University of Colorado School of Medicine and Children’s
Hospital Colorado, Aurora, Colorado
DISCUSSION
3:00 p.m. Break
3:15 p.m. Use of psychological tests in disability determinations in other systems
Moderator—Alan M. Jette, M.P.H., Ph.D., Committee Member
Veterans Affairs policies and/or practices surrounding the use of
psychological tests and symptom validity tests in the disability determination
process
Stacey Pollack, Ph.D., Director of Program Policy Implementation, Mental Health
Services, Veterans Affairs Central Office, Washington, DC
DISCUSSION
5:10 p.m. Closing remarks
Herbert Pardes, M.D., Committee Chair
5:15 p.m. Adjourn
Room 100
Keck Center of the National Academies
500 Fifth St, NW
Washington, DC
AGENDA
8:40 a.m. Discussion with the committee on the use of psychological, symptom validity,
and performance validity testing in disability evaluations
Moderator—Peter A. Ubel, M.D., Committee Member
10:35 a.m. Discussion with the committee on the use of psychological, symptom validity,
and performance validity testing in disability evaluations (continued)
12:45 p.m. Disability Determination Services panel discussion with the committee
Moderator—Mary C. Daly, Ph.D., Committee Member
Jennifer Nottingham, President, National Association of Disability Examiners;
Supervisor, Ohio Disability Determination Service
Charles A. Jones, Director, Michigan Disability Determination Service
Tom A. Ward, Past President, National Association of Disability Examiners;
Supervisor, Michigan Disability Determination Service
Jeffrey H. Price, President Elect, National Association of Disability Examiners;
Disability Determination Specialist III, Health and Human Services Department,
North Carolina
Nancy Heiser, Ph.D., Psychological Consultant, Washington, DC, Department of
Disability Services
2:15 p.m. Disability Determination Services panel discussion with the committee
(continued)
Appendix B
Biographical Sketches of
Committee Members
Herbert Pardes, M.D. (Chair), is Executive Vice Chair of the Board of Trustees of New York-
Presbyterian Hospital. He formerly served as President and Chief Executive Officer of New
York-Presbyterian Hospital and the New York-Presbyterian Healthcare System. His origins are
in the field of psychiatry, and he has an extensive background in healthcare and academic
medicine. He is nationally recognized for his broad expertise in education, research, clinical care,
and health policy, and as an ardent advocate of support for academic medicine. Dr. Pardes served
as Director of the National Institute of Mental Health (NIMH) and U.S. Assistant Surgeon
General during the Carter and Reagan Administrations (1978-1984). Dr. Pardes left NIMH in
1984 to become Chair of the Department of Psychiatry at Columbia University’s College of
Physicians and Surgeons and in 1989 was also appointed Vice President for Health Sciences for
Columbia University and Dean of the Faculty of Medicine at the College of Physicians and
Surgeons. He served as President of the American Psychiatric Association (1989), as Chair of the
Association of American Medical Colleges (AAMC) (1995-1996), and as Chair of the AAMC’s
Council of Deans (1994-1995). In addition, he served two terms as Chair of the New York
Association of Medical Schools. Dr. Pardes chaired the Intramural Research Program Planning
Committee of the NIH from 1996 to 1997, served on the Presidential Advisory Commission on
Consumer Protection and Quality in the Healthcare Industry, and is President of the Scientific
Council of the National Alliance for Research on Schizophrenia and Depression. He serves on
numerous editorial boards, has written more than 155 articles and chapters on mental health and
academic medicine topics, and has negotiated and conducted international collaborations with a
variety of countries including India, China, and the former Soviet Union. Dr. Pardes has earned
numerous honors and awards, including the U.S. Army Commendation Medal (1964), the Sarnat
International Prize in Mental Health (1997), election to the Institute of Medicine of the National
Academy of Sciences (1997), and election to the American Academy of Arts and Sciences
(2002). Dr. Pardes received his medical degree from the State University of New York-
Downstate Medical Center (Brooklyn) in 1960. He received his bachelor of science degree
summa cum laude from Rutgers University in 1956. He completed his internship and residency
training in psychiatry at Kings County Hospital in Brooklyn and also did psychoanalytic training
at the New York Psychoanalytic Institute.
Arthur J. Barsky III, M.D., is Professor of Psychiatry at Harvard Medical School and Vice
Chair for Research in the Department of Psychiatry at the Brigham and Women’s Hospital in
Boston, Massachusetts. His major interests are hypochondriasis and somatization, the
psychological factors that affect symptom reporting in the medically ill, and the cognitive and
behavioral treatment of somatic symptoms. Dr. Barsky has been the principal investigator of nine
National Institute of Mental Health (NIMH) and National Institutes of Health (NIH) research
grants in these areas. He has authored 140 articles, 23 book chapters, and the books Worried
Sick: Our Troubled Quest for Wellness and Feeling Better. Dr. Barsky received the President’s
Research Award from the American Psychosomatic Society. He has been a Faculty Fellow of the
Mind/Brain/Behavior Interfaculty Initiative of Harvard University, and was a member of the
work group to revise The Diagnostic and Statistical Manual of Mental Disorders (DSM 5). He
has been a visiting professor at the Georgetown University School of Medicine, the University of
Wisconsin Medical School, the University of Illinois College of Medicine, Dartmouth Medical
School, and the Allegheny University of the Health Sciences. He is a Distinguished Life Fellow
of the American Psychiatric Association, a Fellow of the American College of Psychiatrists, and
served on the Council of the American Psychosomatic Society. Dr. Barsky graduated from
Williams College and the Columbia University College of Physicians and Surgeons. He interned
at the Beth Israel Medical Center in New York City and completed a residency in psychiatry at
the Massachusetts General Hospital in Boston, where he remained on the full-time faculty until
1993 when he moved to the Brigham and Women’s Hospital.
Mary C. Daly, Ph.D., is Senior Vice President and Associate Director of Economic Research at
the Federal Reserve Bank of San Francisco. Dr. Daly’s research spans public finance, labor, and
welfare economics, and she has published widely on topics related to labor market fluctuations,
public policy, income inequality, and the economic well-being of less advantaged groups. She
previously served as a visiting scholar with the Congressional Budget Office, as a member of the
Social Security Advisory Board’s Technical Panel, and the National Academy of Social
Insurance Committee on the Privatization of the Social Security Retirement Program. She has
published on the economics of the Social Security system. She currently serves on the editorial
board of the journal Industrial Relations. Dr. Daly joined the Federal Reserve as Economist in
1996 after completing a National Institute on Aging postdoctoral fellowship at Northwestern
University. Dr. Daly earned a Ph.D. in Economics from Syracuse University. She joined the
Institute for the Study of Labor (IZA) as a Research Fellow in February 2014.
Kurt F. Geisinger, Ph.D., is Director of the Buros Center on Testing and WC Meierhenry
Distinguished University Professor at the University of Nebraska. He previously was Professor
and Chair of the Department of Psychology at Fordham University, Professor of Psychology and
Dean of Arts and Sciences at the State University of New York at Oswego (SUNY-Oswego),
Professor of Psychology and Academic Vice President at LeMoyne College, and Professor of
Psychology and Vice President for Academic Affairs at the University of St. Thomas, in
Houston, Texas. He has served the maximum two terms as council representative for the
Division of Measurement, Evaluation, and Statistics in the American Psychological Association,
which he also represented on the International Organization for Standardization’s (ISO)
International Test Standards committee. He was elected President of the Coalition for Academic,
Scientific, and Applied Psychology for the 2009 year, to the board of the International Test
Commission, and to the American Psychological Association’s Board of Directors. He currently
serves as Treasurer for the International Test Commission. His primary interests lie in validity
theory, admissions testing, proper test use, test use with individuals with disabilities, the testing
of language minorities, and the translation or adaptation of tests from one language and culture to
another. Previously Dr. Geisinger was an American Psychological Association (APA) delegate
and chair of the Joint Committee on Testing Practices (1992-1996), a member of APA’s
Committee on Psychological Testing and Assessment, Chair of the Graduate Record
Examination Board, Chair of the Technical Advisory Committee for the Graduate Record
Examination, a member of the SAT Advisory Committee, a member of National Council on
Measurement in Education’s (NCME) Ad Hoc Committee to Develop a Code of Ethical
Standards Committee, and has served on numerous other ad hoc task forces and panels. He
chaired the College Board’s Research and Development Committee and is currently chair of the
Council for the Accreditation of Educator Preparation’s Research Committee, having served on
its Commission on Standards and Performance Reporting. He is editor of Applied
Measurement in Education and serves or has served on the editorial committees for eight
other journals. He has edited or co-edited Psychological Testing of Hispanics and Test
Interpretation and Diversity, both with APA Books, as well as the 17th, 18th, and 19th Mental
Measurements Yearbooks. He served as editor-in-chief for the Handbook of Testing and
Assessment in Psychology, published by APA Books in 2013, and his vastly revised volume,
Psychological Testing of Hispanics: Clinical and Intellectual Issues, is in press, also with APA
Books.
Naomi Lynn Gerber, M.D., is University Professor and Director of the Center for the Study of
Chronic Illness and Disability in the College of Health and Human Services at George Mason
University. She works in the areas of measurement and treatment of impairments and disability
in patients with musculoskeletal deficits (including children with osteogenesis imperfecta and
persons with rheumatoid arthritis and cancer). Her research investigates causes of functional loss
and disability in chronic illness. Specifically, she studies human movement and the mechanisms
and treatment of fatigue. Dr. Gerber is or has been a recipient of National Science Foundation, PNC
Foundation, National Institute on Disability and Rehabilitation Research (NIDRR), National
Institutes of Health (NIH), and Department of Defense funding administered by the Henry
Jackson Foundation. She was the Chief of the Rehabilitation Medicine Department at the
Clinical Center of the National Institutes of Health in Bethesda, Maryland, from 1975 to 2005.
She has been the recipient of the Distinguished Service Award of the American Academy of
Physical Medicine and Rehabilitation (AAPMR) and of the Oncology Section of the American
Physical Therapy Association, the Distinguished Academician Award of the Association of
Academic Physiatrists, the WISE/Geico award, the NIH Director’s Award, the Surgeon General Award
for Exemplary Service, and the Smith College Medal. Dr. Gerber has served on many national
committees and advisory boards, including the Osteogenesis Imperfecta Foundation (1995-present),
Kessler Medical Rehabilitation Research (2001-present), the National Center for Medical
Rehabilitation Research (2007-2011), and the Blue Ribbon Panel Assessing Rehabilitation/Research,
NIH (2011-2012). She is or has been a grant reviewer for NIDRR, NIH, the National Science
Foundation, and the Department of Veterans Affairs. She served on the Board of Governors of the AAPMR
from 2005 to 2008. Dr. Gerber is a member of the Institute of Medicine of the National Academy of
Sciences. In 2013 she delivered the Zeiter Lecture at the AAPMR 75th anniversary. Dr. Gerber is
a graduate of Tufts University School of Medicine and a diplomate of the American Board of Internal
Medicine (rheumatology subspecialty) and the American Board of Physical Medicine and
Rehabilitation.
Alan M. Jette, P.T., M.P.H., Ph.D., is Professor of Health Policy and Management at the
Boston University School of Public Health. Dr. Jette is an international expert in the
measurement and evaluation of functioning and health outcomes and in the measurement,
epidemiology, and prevention of disability. His work has addressed the need to bring conceptual
clarity to the measurement of patient-centered outcomes in a range of challenging clinical areas
such as work disability, spinal cord injury, and neurologic, orthopedic, and geriatric conditions.
He chaired the Institute of Medicine (IOM) panel that authored the 2007 IOM
report, The Future of Disability in America, and currently co-chairs the IOM
Forum on Aging, Disability, and Independence. Dr. Jette received a B.S. in Physical Therapy
from the State University of New York at Buffalo in 1973 and his M.P.H. (1975) and Ph.D.
(1979) in Public Health from the University of Michigan.
Lisa A. Suzuki, Ph.D., is Associate Professor in the Department of Applied Psychology at the
Steinhardt School of Culture, Education, and Human Development of New York University.
Prior to this, she served as a faculty member in counseling psychology at Fordham University
and the University of Oregon. Dr. Suzuki received the Distinguished Contribution Award from
the Asian American Psychological Association in 2006 and the Visionary Leadership Award from
the National Multicultural Conference and Summit in 2007. She has written extensively in the
area of multicultural issues in psychological assessment, and her work appears in chapters of the
Handbook of Multicultural Counseling, American Psychological Association (APA) Handbook
of Testing and Psychology, APA Handbook of Counseling Psychology, Handbook of
Psychology, APA Handbook of Multicultural Psychology, and the Cambridge Handbook of
Intelligence. She is senior editor of the Handbook of Multicultural Assessment and a co-editor of
the Handbook of Multicultural Counseling. She is co-author of Intelligence Testing and Minority
Students (Valencia and Suzuki, 2001). Dr. Suzuki obtained her Ph.D. from the University of
Nebraska-Lincoln in 1992.
APPENDIX B B-5
other cognitive impairments reach their highest potential social and occupational functioning.
She supervises psychology interns and practicum students at UCSD Outpatient Psychiatric
Services and the Veterans Affairs San Diego Healthcare System. She also conducts a
neuropsychological assessment clinic at the St. Vincent de Paul Medical Clinic. Dr. Twamley’s
research focuses on bridging neuropsychology and interventions for individuals with severe
mental illness or traumatic brain injury. Current intervention studies focus on supported
employment and compensatory cognitive training. Other research interests include the
neuropsychology of everyday functioning, genetic markers of cognition in schizophrenia, and
cognitive impairment in posttraumatic stress disorder (PTSD). Dr. Twamley earned a B.A. in
Social Ecology at the University of California, Irvine, and a Ph.D. in Clinical Psychology from
Arizona State University. She completed her clinical psychology internship and postdoctoral
fellowship at UCSD and joined the faculty of the Department of Psychiatry in 2003.
Peter A. Ubel, M.D., is the Madge and Dennis T. McLawhorn University Professor of Business
at the Fuqua School of Business and Professor of Public Policy at the Sanford School of Public
Policy at Duke University. He is a physician and behavioral scientist specializing in health policy
and economics, whose research and writing explore the mixture of rational and irrational forces
that affect health, happiness and the way society functions. His research explores controversial
issues about the role of values and preferences in health care decision making, from decisions at
the bedside to policy decisions. He uses the tools of decision psychology and behavioral
economics to explore topics like informed consent, shared decision making and health care cost
containment. His books include Pricing Life: Why It’s Time for Healthcare Rationing (MIT Press,
2000) and Free Market Madness: How Economics Is at Odds with Human Nature-and Why It
Matters (Harvard Business Press, 2009). His newest book, Critical Decisions (HarperCollins,
2012), explores the challenges of shared decision making between doctors and patients. Dr. Ubel
previously was Professor of Medicine and Psychology at the University of Michigan, where he
taught from 2000 to 2010, and later went on to direct the Center for Behavioral and Decision
Sciences in Medicine. Dr. Ubel received his B.A. from Carleton College and his M.D. from the
University of Minnesota.
Jacqueline Remondet Wall, Ph.D., is Professor in the School of Psychological Sciences at the
University of Indianapolis and Director of the Office of Program Consultation and Accreditation
at the American Psychological Association in Washington, DC, where she is an Associate
Executive Director in the Education Directorate. Her professional and research interests include
assessment, selection, training, and evaluation. Dr. Wall received her Ph.D. from the University
of Tulsa with a specialization in industrial and organizational psychology and a post-doctoral
respecialization in clinical rehabilitation and neuropsychology at the Illinois Institute of
Technology, the Medical School of the University of Mississippi, and the Rehabilitation Institute
of Michigan.
Glossary
Activity limitations: difficulties an individual may have in executing activities (IOM, 2007;
WHO, 2001)
Cognitive Test: standardized measure of task performance used to assess cognitive functioning
(e.g., intellectual capacity, attention and concentration, processing speed, language and
communication, visual-spatial abilities, memory)
Disability: decrements in all three aspects of human functioning (body functions and structures,
activities, and participation), which are labeled impairments, activity limitations, and
participation restrictions (IOM, 2007; WHO, 2001); the limitation on an individual’s abilities to
perform certain activities of daily life (e.g., school- or work-related, personal care, social
interactions)
impairment(s) which can be expected to result in death or which has lasted or can be expected to
last for a continuous period of not less than 12 months” (SSA, 2012); in children, a medically
determinable physical or mental impairment or combination of impairments that causes marked
and severe functional limitations, and that can be expected to cause death or that has lasted or
can be expected to last for a continuous period of not less than 12 months (SSA, 2014).
Effort: the extent to which the examinee performed to actual capacity on a test (Bush et al.,
2005).
Performance validity: the validity of actual ability task performance; often referred to as effort
in the literature (Larrabee, 2012, 2014)
Psychological testing: the use of formal, standardized procedures for sampling behavior that
ensure objective evaluation of the test-taker regardless of who administers the test (Furr and
Bacharach, 2013; Hubley and Zumbo, 2013). Major categories of psychological tests include (1)
intelligence tests, (2) neuropsychological tests, (3) personality tests, (4) clinical or diagnostic
tests (e.g., depression, anxiety), (5) achievement tests, (6) aptitude tests, and (7) occupational or
interest tests.
Psychometrics: the scientific study, including the development, interpretation, and evaluation,
of psychological tests and measures used to assess variability in behavior and link such
variability to psychological phenomena (Furr and Bacharach, 2013; Hubley and Zumbo, 2013)
Reliability: the degree to which a test produces stable and consistent results (Geisinger, 2013).
Self-report of symptoms: the claimant’s own description of his or her physical or mental
impairment; in some cases, symptoms may be reported by a third party (e.g., children’s
symptoms may be reported by parent or teacher) (20 CFR § 404.1528)
Substantial gainful activity: “work that involves doing significant and productive physical or
mental duties and is done (or intended) for pay or profit” (20 CFR § 416.910)
Symptom validity test: embedded or stand-alone measures used to assess whether an examinee
is providing an accurate report of their actual symptom experience on non-cognitive
psychological measures (e.g., emotional, behavioral, and personality measures) (Larrabee, 2012,
2014)
Validity: the degree to which evidence and theory support the use and interpretation of test
scores (AERA et al., 2014)
REFERENCES
AERA (American Educational Research Association), APA (American Psychological Association), and
NCME (National Council on Measurement in Education). 2014. Standards for educational and
psychological testing. Washington, DC: AERA.
APA (American Psychological Association). 2010. Public description of clinical neuropsychology.
http://www.apa.org/ed/graduate/specialize/neuro.aspx (accessed June 24, 2014).