Screening For Depression in Clinical Practice An Evidence-Based Guide

The document discusses the importance of screening for depression in clinical practice, highlighting the challenges in detection and the evolving nature of treatment. It emphasizes the need for accurate diagnostic tools and the integration of screening into enhanced care models to improve patient outcomes. The material serves as an evidence-based guide for clinicians, detailing various aspects of depression diagnosis and management across different medical settings.

Screening for Depression in Clinical Practice

This material is not intended to be, and should not be considered, a
substitute for medical or other professional advice. Treatment for the
conditions described in this material is highly dependent on the individual
circumstances. While this material is designed to offer accurate information
with respect to the subject matter covered and to be current as of the time it
was written, research and knowledge about medical and health issues is
constantly evolving, and dose schedules for medications are being revised
continually, with new side effects recognized and accounted for regularly.
Readers must therefore always check the product information and clinical
procedures with the most up-to-date published product information and data
sheets provided by the manufacturers and the most recent codes of conduct
and safety regulation. Oxford University Press and the authors make no
representations or warranties to readers, express or implied, as to the accuracy
or completeness of this material, including without limitation that they
make no representations or warranties as to the accuracy or efficacy of the
drug dosages mentioned in the material. The authors and the publishers do not
accept, and expressly disclaim, any responsibility for any liability, loss, or
risk that may be claimed or incurred as a consequence of the use and/or
application of any of the contents of this material.
SCREENING FOR
DEPRESSION IN
CLINICAL
PRACTICE
An Evidence-Based Guide

ALEX J. MITCHELL, MRCPsych


Consultant and Honorary Senior Lecturer, Department of Liaison
Psychiatry, Leicester General Hospital and University of Leicester, UK

JAMES C. COYNE, PhD


Professor of Psychology, Department of Psychiatry,
University of Pennsylvania Health System

2010
Oxford University Press, Inc., publishes works that further
Oxford University’s objective of excellence
in research, scholarship, and education.

Oxford New York


Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto

With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam

Copyright © 2010 by Oxford University Press, Inc.

Published by Oxford University Press, Inc.


198 Madison Avenue, New York, New York 10016

www.oup.com

Oxford is a registered trademark of Oxford University Press.

All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording, or otherwise,
without the prior permission of Oxford University Press.

Mitchell, Alex J.
Screening for depression in clinical practice: an evidence-based guide / by Alex J. Mitchell,
James C. Coyne.
p. ; cm.
Includes bibliographical references and index.
ISBN 978-0-19-538019-4
1. Depression, Mental—Diagnosis. 2. Primary care (Medicine)
I. Coyne, James C., 1947– II. Title.
[DNLM: 1. Depressive Disorder—diagnosis. 2. Primary Health Care. WM 171 C881s 2009]
RC537.M5625 2009
616.85′27075—dc22
2009007863

9 8 7 6 5 4 3 2 1

Printed in the United States of America
on acid-free paper
Contents

List of Contributors, xi
Preface, xv
Wayne Katon

1. Is the Syndrome of Depression a Valid Concept?, 3
Alex J. Mitchell and Mark Zimmerman
What is Meant by Depression?, 3
Value and Validity of the Syndrome Concept, 7
Diagnostic Checklists (including DSM and ICD), 10
Unstructured (Unassisted) Clinician Diagnosis, 15
Structured and Semi-Structured Assisted Diagnostic
Interviews, 19
Conclusion, 22
References, 24

2. Overview of Depression Scales and Tools, 29
Alex J. Mitchell
Background, 29
The Classic Severity Scales (1960–1980), 36
The New Severity Scales (1981–2008), 39
The Future of Screening Scales, 44
References, 51

3. Why Do Clinicians Have Difficulty Detecting Depression?, 57
Alex J. Mitchell
Introduction to the Problem of Over- and Under-Detection, 57
Predictors of Detection, 62
Patient and Clinician Influences on Detection, 66
Illness-Related Influences on Detection, 71
Conclusions, 74
References, 75

4. How Can Existing Mood Scales Be Improved? How to Test,
Refine, and Improve Existing Scales, 83
Adam B. Smith
Introduction, 83
The Rasch Model and Other Item Response Models, 86
Conclusion, 95
References, 96

5. How Do We Know When a Screening Test is Clinically Useful?, 99
Alex J. Mitchell
How Do Clinicians Make a Diagnosis?, 99
Scientific Aspects of Diagnostic Accuracy, 103
Clinical Aspects of Diagnostic Accuracy, 105
Testing Screening via Implementation Studies, 109
Conclusions, 111
References, 111

6. Clinical Judgment and the Influence of Screening on
Decision Making, 113
Howard N. Garb
Introduction, 113
Research on Clinical Judgment, 114
The Limits of Screening, 119
References, 120

7. Implementing Screening as Part of Enhanced Care:
Screening Alone is Not Enough, 123
Simon Gilbody and Dan Beck
The Case for Screening, 123
Screening and Enhanced Care for Depression, 128
New and Additional Evidence Relating to Enhanced Care, 128
Is Screening a Necessary Intervention to Improve the Quality and
Outcome of Care?, 129
To Screen or Not to Screen?, 136
References, 137

8. Technological Approaches to Screening and Case
Finding for Depression, 143
William H. Rogers, Debra Lerner, and David A. Adler
Technological Methods of Screening for Depression, 144
Ten Issues When Developing Computerized Screening for
Depression, 147
Examples of Implementation of Computerized
Screening for Depression, 150
Discussion, 153
Conclusion, 154
References, 154

9. Screening for Depression in Primary Care: Can It
Become More Efficient?, 161
Kathryn M. Magruder and Derik E. Yeager
Introduction, 161
Epidemiology of Depression in Primary Care, 162
Is Screening for Depression in Primary Care Worthwhile?, 165
Which Screening Tool Should Be Used?, 169
Implementing Screening in Primary Care, 178
What Developments Are on the Horizon?, 183
Conclusions, 185
References, 185

10. Screening for Depression in Medical Settings: Are Specific
Scales Useful?, 191
Gordon Parker and Matthew Hyett
An Introductory Logic, 191
Depression in the Medically Ill, 192
“False-Positive” Depression Reflecting Confounding by Physical
Symptoms Associated with Medical Illness, 193
Screening Measures Used to Assess Depression in the
Medically Ill, 194
Discussion, 198
References, 199

11. Screening for Depression in Medical Settings:
The Case Against Specific Scales, 203
Fariba Babaei and Alex J. Mitchell
Overview of Depression in Physical Disease, 203
Defining Somatic Symptoms, 205
Diagnostic Accuracy of Somatic Symptoms in Depression, 209
Evidence For and Against Somatic Symptoms when Diagnosing
Comorbid Depression, 211
Implications for Screening, 217
References, 236

12. Screening for Depression in Neurologic Disorders, 241
Andres M. Kanner
Depression in Stroke, 242
Depression in Multiple Sclerosis, 246
Depression in Epilepsy, 249
Depression in Parkinson’s Disease, 255
Conclusions, 258
References, 258

13. Screening for Depression in Cancer Care, 265
Linda E. Carlson, Sheena K. Clifford, Shannon L. Groff,
Olga Maciejewski, and Barry D. Bultz
Prevalence of Depression in Cancer Care, 265
Screening Methods for Depression, 266
Screening for Depression in Oncology, 267
Implementing Screening Programs in Oncology Settings, 276
Special Issues in Screening Cancer Patients, 292
Summary, Integration, Future Directions, 293
Acknowledgments, 294
References, 295

14. Screening for Depression in Perinatal Settings, 299
Jodi Barton and Philip Boyce
Introduction: Perinatal Screening in Context, 299
Why Screen, and What Are We Screening For?, 301
Screening Practices in Perinatal Settings, 303
Screening Guidelines and Recommendations, 304
Evidence-Based Comparison of Screening Methods, 305
Implementation in Practice: Does Screening Make any
Real-World Difference?, 310
Service Delivery and Treatment Implications, 311
Summary and Key Recommendations, 313
References, 314

15. Screening in Cardiovascular Care, 317
Brett D. Thombs and Roy C. Ziegelstein
Depression in Cardiovascular Disease, 318
The Prevalence of Depression in Cardiovascular Disease, 319
Screening Instruments for Depression in Cardiovascular Care, 320
Recommendations for Evaluation and Treatment of Patients
in Cardiovascular Care, 326
Conclusions, 328
References, 329

16. Screening in Diabetes Care: Detecting and Managing
Depression in Diabetes, 335
Norbert Hermanns and Bernhard Kulzer
Depression in Diabetes is a Major Health Problem, 337
Screening Tests, 340
Treatment Options, 343
Screening Program, 344
Conclusions for Clinical Practice, 345
References, 346

17. Commentary and Integration: Is it Time to Routinely
Screen for Depression in Clinical Practice?, 349
James C. Coyne
Integration: Deflating the Puffer Phenomenon and Making
the Case Against Screening, 364
References, 366

Appendix, 371
Index, 385
List of Contributors

David Adler, Professor of Psychiatry and Medicine, Tufts University School
of Medicine, and Senior Psychiatrist, Department of Psychiatry and ICRHPS,
Tufts Medical Center

Fariba Babaei, Specialist Trainee in Psychiatry, Lincolnshire Partnership
Trust, Grantham, UK

Jodi Barton, Research Co-ordinator, Westmead Perinatal Psychiatry &
Clinical Research Unit, Westmead Hospital

Dan Beck, Research Fellow, Department of Health Sciences, University of
York, UK
Philip Boyce, Professor of Psychiatry, Department of Psychological Medicine,
University of Sydney, Westmead Hospital
Barry D. Bultz, Director, Department of Psychosocial Resources, Tom Baker
Cancer Centre, and Head and Adjunct Professor, Division of Psychosocial
Oncology, Department of Oncology, Faculty of Medicine, University of Calgary,
Calgary, Alberta, Canada

Linda E. Carlson, Enbridge Research Chair in Psychosocial Oncology,
Associate Professor, Division of Psychosocial Oncology, Department of
Oncology, Faculty of Medicine, University of Calgary, and Clinical
Psychologist, Tom Baker Cancer Centre, Calgary, Alberta, Canada
Sheena K. Clifford, Department of Psychosocial Resources, Tom Baker
Cancer Centre, Alberta Cancer Board/Alberta Health Services, Calgary,
Alberta, Canada

James C. Coyne, Director, Behavioral Oncology Program, Abramson Cancer
Center, and Professor of Psychology, Department of Psychiatry, University of
Pennsylvania School of Medicine

Howard N. Garb, Lackland Air Force Base

Simon Gilbody, Professor of Psychological Medicine and Health Services
Research, Department of Health Sciences, University of York, UK

Shannon L. Groff, Department of Psychosocial Resources, Tom Baker
Cancer Centre, Alberta Cancer Board
Norbert Hermanns, Head of the Research Institute of the Diabetes Academy
Mergentheim

Matthew Hyett, Research Assistant, Black Dog Institute, Sydney, Australia


Andres M. Kanner, Department of Neurological Sciences, Rush
Medical College, Rush Epilepsy Center, Rush University Medical Center,
Chicago, IL

Wayne Katon, Professor and Vice Chair of Psychiatry and Behavioral
Sciences, Director of Division of Health Services and Epidemiology,
University of Washington Medical School, Seattle, WA
Bernhard Kulzer, Head of the Psychosocial Department of the Diabetes
Centre Mergentheim
Debra Lerner, Associate Professor of Medicine and Psychiatry, Tufts
University School of Medicine (TUSM), and Senior Researcher, ICRHPS,
Tufts Medical Center.

Olga Maciejewski, Department of Psychosocial Resources, Tom Baker
Cancer Centre, Alberta Cancer Board/Alberta Health Services, Calgary,
Alberta, Canada
Kathryn M. Magruder, Veterans Administration Medical Center,
Charleston, SC, and Department of Psychiatry and Behavioral Sciences,
Medical University of South Carolina, Charleston, SC
Alex J. Mitchell, Consultant in Liaison Psychiatry, Leicester General Hospital,
Leicester, and Honorary Senior Lecturer in Liaison Psychiatry, Department of
Cancer & Molecular Medicine, Leicester Royal Infirmary, UK
Gordon Parker, Scientia Professor, School of Psychiatry, University of
New South Wales, Sydney, Australia, Executive Director, Black Dog
Institute

William Rogers, Senior Statistician, Institute of Clinical Research and Health
Policy Studies (ICRHPS), Tufts Medical Center
Adam B. Smith, Lecturer in Quantitative Methods, Centre for Health and
Social Care, Leeds Institute of Health Sciences, University of Leeds, UK.
Brett D. Thombs, Department of Psychiatry, McGill University and Jewish
General Hospital, Montreal, Quebec

Derik E. Yeager, Department of Biometry, Biostatistics, and Epidemiology,
Medical University of South Carolina, Charleston, SC
Roy C. Ziegelstein, Department of Medicine, Johns Hopkins University
School of Medicine, Baltimore, MD

Mark Zimmerman, Department of Psychiatry and Human Behavior, Brown
University School of Medicine, Rhode Island Hospital, Providence, RI
Preface

Researchers became interested in screening patients for depression in primary
care in the early 1980s because of evidence of poor recognition of depression
by primary care physicians and gaps in adequacy of treatment.1 Because of
extensive epidemiologic research as well as the development of antidepressant
medications that have fewer side effects and evidence-based brief therapies,
recognition rates of depression by primary care physicians have improved over
the past two decades, with recent studies suggesting that as many as 50% to
65% of patients are accurately diagnosed.2 Most studies also show that greater
severity of depression and increased functional impairment are associated with
higher rates of recognition.3 A study by Rost and colleagues that examined
recognition rates over a 6-month period rather than for just one visit also found
higher rates of accurate diagnosis by primary care physicians.4 This latter study
is important because primary care physicians often make diagnoses over time
as they work up patients over several visits.

Studies have also shown that a much higher percentage of patients in primary
care are exposed to antidepressant medications compared to two decades ago.5
However, there are many remaining gaps in the quality of care for depression in
primary care: only 20% of patients receive the Health Employer Data and
Information Set (HEDIS)-recommended three or more visits in the first 90
days after starting an antidepressant and only 40% to 50% remain on medication
at 6 months.6 Over the past 20 years (from the tricyclic era to the selective
serotonin reuptake inhibitor era), studies consistently report that only 40% of
patients started on antidepressants for major depression recover (a greater than
50% decrease in symptoms) by 4 to 6 months.7 Less than 10% of patients with
major depression in primary care receive evidence-based psychotherapy.5 There
is clearly room for improvement of quality of care in patients with major
depression from screening to improved detection, to healthcare models that
provide enhanced exposure to evidence-based treatments.

One unexpected consequence of primary care physicians' increased interest in
the detection and treatment of patients with depression is that
approximately half of patients started on medication for depression actually
meet DSM-IV criteria for minor depression.8 This is important because
antidepressant-versus-placebo trials have generally shown high rates of placebo
response in patients with minor depression and a lack of active
drug-versus-placebo differences.9 Screening for depression may actually increase the
number of patients with minor depression who are potentially treated because
many patients cluster around the DSM-IV diagnostic threshold and, depending
on the stressful life events of the past few days, may or may not meet criteria for
major depressive disorder. Patients with minor depression or adjustment reactions
to stressful life events must be distinguished from those with a history of
major depression who have significant residual symptoms necessitating active
treatment. For patients who have mild major depression, brief counseling,
watchful waiting, and rescreening for depression 2 to 4 weeks later
may allow better recognition of whether the patient needs treatment with
medication or psychotherapy.

If screening for depression is to be integrated into primary care, healthcare
organizations are faced with the decision about which screening tool is
optimal. Primary care organizations, the American Psychiatric Association,
and many research foundations have recommended the use of the Patient
Health Questionnaire (PHQ-9) as the optimal depression screening tool in
primary care. The PHQ-9 has the advantage of being able to help measure
the severity of depression (0 to 27 is the severity range of this tool) and, at a
score of 10 or above, has high sensitivity and specificity compared to structured
psychiatric interviews for the diagnosis of major depression.10
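The scoring rule just described lends itself to a short sketch. The example below assumes the standard PHQ-9 layout of nine items, each scored 0 to 3, which yields the 0 to 27 range mentioned above, and it treats a total of 10 or more as a positive screen. The function names and the exact form of the threshold comparison are illustrative assumptions rather than part of the instrument's published wording.

```python
from typing import Sequence

# Assumed layout: nine items, each scored 0-3, so totals range from 0 to 27.
PHQ9_ITEM_COUNT = 9
ITEM_MIN, ITEM_MAX = 0, 3
SCREEN_CUTOFF = 10  # assumed threshold: a total of 10 or more screens positive


def phq9_total(responses: Sequence[int]) -> int:
    """Sum the nine item scores after checking that each one is valid."""
    if len(responses) != PHQ9_ITEM_COUNT:
        raise ValueError(f"expected {PHQ9_ITEM_COUNT} item scores, got {len(responses)}")
    for score in responses:
        if not ITEM_MIN <= score <= ITEM_MAX:
            raise ValueError(f"item score {score} is outside {ITEM_MIN}-{ITEM_MAX}")
    return sum(responses)


def screens_positive(responses: Sequence[int], cutoff: int = SCREEN_CUTOFF) -> bool:
    """Return True when the total score reaches the screening cut-off."""
    return phq9_total(responses) >= cutoff


# Hypothetical questionnaire: the total is 11, which reaches the cut-off.
example = [2, 2, 1, 1, 1, 1, 1, 1, 1]
print(phq9_total(example), screens_positive(example))
```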
The U.S. Preventive Services Task Force recommended routine depression
screening in primary care in systems that have been reorganized to provide
effective treatment for depression.11 This reflects the fact that studies that
tested depression screening alone showed mild to modest improvement in
the quality of depression treatment provided, but generally no effect on depression
outcomes.12 What do we know about methods to organize care to improve
outcomes of depression?

Although screening for depression alone has not been shown to improve
outcomes, when screening is paired with an organized system of depression
care, multiple studies have shown that depression outcomes can be
improved.13 The chapter by Gilbody reviews the recent meta-analysis of
an intervention called “collaborative care.” A total of 37 randomized trials
that compared collaborative versus usual primary care found that collaborative
care was associated with a twofold increase in adherence to antidepressant
medication and improvements in depression that lasted 2 to 5 years.13
The key elements of the most successful collaborative care interventions
included two core components. The first component incorporates a depression
care manager who improves patient education and, with frequent telephone and/or
in-person contacts, tracks depressive symptoms, side effects,
and adherence to treatment.14 The care manager facilitates return appointments
with the primary care doctor or, in some instances, a mental health
specialty referral for patients with persistent symptoms, problematic side
effects, or poor adherence.14 The second crucial component is supervision
of the care manager by a psychiatrist who recommends changes in medication
based on clinical response and side effects. Many recent collaborative
care trials also have used psychologists’ skills to teach care managers
motivational interviewing techniques and brief, evidence-based psychotherapies
such as problem-solving therapy.15

In summary, this excellent book reviews two decades of research on
depression screening and quality-improvement efforts in primary care. We
now have state-of-the-art depression screening tools, and research studies have
shown that pairing depression screening with evidence-based models that
enhance exposure to antidepressant medication and evidence-based psychotherapies
can markedly improve depression outcomes for patients with
major depression.

Wayne Katon

References
1. Zung WW, Magill M, Moore JT, et al. Recognition and treatment of depression in a
family medicine practice. J Clin Psychiatry. 1983;44:3–6.
2. Katon WJ, Simon G, Russo J, et al. Quality of depression care in a population-based
sample of patients with diabetes and major depression. Med Care. 2004;42:1222–1229.
3. Coyne JC, Schwenk TL, Fechner-Bates S. Nondetection of depression by primary care
physicians reconsidered. Gen Hosp Psychiatry. 1995;17:3–12.
4. Rost K, Zhang ML, et al. Persistently poor outcomes of undetected major depression in
primary care. Gen Hosp Psychiatry. 1998;20(1):12–20.
5. Olfson M, Marcus SC, Druss B, et al. National trends in the outpatient treatment of
depression. JAMA. 2002;287:203–209.
6. Druss BG, Miller CL, Rosenheck RA, et al. Mental health care quality under managed
care in the United States: a view from the Health Employer Data and Information Set
(HEDIS). Am J Psychiatry. 2002;159:860–862.
7. Simon GE. Evidence review: efficacy and effectiveness of antidepressant treatment in
primary care. Gen Hosp Psychiatry. 2002;24:213–224.
8. Katon W, Von Korff M, Lin E, et al. Collaborative management to achieve treatment
guidelines. Impact on depression in primary care. JAMA. 1995;273:1026–1031.
9. Barrett JE, Williams JW, Jr., Oxman TE, et al. Treatment of dysthymia and minor
depression in primary care: a randomized trial in patients aged 18 to 59 years. J Fam
Pract. 2001;50:405–412.
10. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity
measure. J Gen Intern Med. 2001;16:606–613.
11. Pignone MP, Gaynes BN, Rushton JL, et al. Screening for depression in adults: a
summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern
Med. 2002;136:765–776.
12. Katon W, Gonzales J. A review of randomized trials of psychiatric consultation-liaison
studies in primary care. Psychosomatics. 1994;35:268–278.
13. Gilbody S, Bower P, Fletcher J, et al. Collaborative care for depression: a systematic
review and cumulative meta-analysis. Arch Intern Med. 2006;166:2314–2321.
14. Katon W, Unützer J. Collaborative care models for depression: time to move from
evidence to practice. Arch Intern Med. 2006;166:2304–2306.
15. Unützer J, Katon W, Callahan CM, et al. Collaborative care management of late-life
depression in the primary care setting: a randomized controlled trial. JAMA.
2002;288:2836–2845.
Screening for Depression in Clinical Practice
1
IS THE SYNDROME OF DEPRESSION
A VALID CONCEPT?

Alex J. Mitchell and Mark Zimmerman

1. What is Meant by Depression?
2. Value and Validity of the Syndrome Concept
3. Diagnostic Checklists (including DSM and ICD)
4. Unstructured (Unassisted) Clinician Diagnosis
5. Structured and Semi-Structured Assisted Diagnostic Interviews
6. Conclusion

Context
Depression is an everyday term, but if clinical management is to be empirically
based, there needs to be a valid and reliable definition of the disorder that is
distinct from normal sadness. The validity of the concept and all studies of
screening for depression are hampered by the absence of a gold standard.
Nevertheless, various thorough methods of assessment may help to improve the
clinical utility of our concept of depression.

1. What is Meant by Depression?


Textbox 1.1. Levels of Diagnostic Certainty in Psychiatry
Highest: Externally validated by “perfect” biological test
High: Consensus expert panel performing longitudinal evaluation using all possible data
Medium to High: Structured or semi-structured interview performed by a trained interviewer or clinician
Low to High: Severity questionnaires rated by the patient or clinician
Low to Medium: Unstructured, unassisted interview performed by an interested clinician
Low: Unstructured, unassisted interview performed by an inexperienced (or uninterested) clinician

This book is built around the premise that major depressive disorder (MDD)
exists in a way that is recognizable time and again by clinicians around the
world. Considerable effort has been expended in developing and refining
methods to measure depression. This chapter takes a step back and asks
whether this effort is built upon a solid foundation. This begins with an
important question: What is the purpose of making a meaningful diagnosis
in any field of medicine? We suggest it is primarily to gain consensus and
knowledge that may help individuals and populations who have health-related
“meetable unmet needs.” A medical diagnosis (spurious or not)
has several other benefits (Textbox 1.1). It facilitates agreement with
colleagues, it lends confidence to patients, it adds legitimacy to treatments, and
it may allow the development of targeted interventions. Because many
conditions can be successfully treated without knowing the true etiology
or the precise diagnosis, the lack of a gold standard should not be a cause of
therapeutic nihilism. Consider neurologists attempting to treat a midlife
inherited chorea in 1862. Meticulous clinical method could bring some
success despite the absence of a name and a description for another 10
years and the absence of a known etiology for another 110 years. Although
many early treatments were based largely on placebo effects or environmental
manipulation, once a definitive cause is found and the pathophysiologic
mechanism is revealed, the potential for treatment becomes vast,
whereas once it was small.

[Figure 1.1: overlapping score distributions for two groups labeled “Normal Stress” and “Depressed”. Vertical axis: number of individuals. Horizontal axis: score on hypothetical diagnostic test. Marked features: point of partial rarity, optimum cut-off value, and the true-negative, false-negative, false-positive, and true-positive regions.]
Figure 1.1. Hypothetical distribution of test scores in two related conditions. Two distinct
conditions should be separated by a point of rarity on at least one fundamental measure
(see also Fig. AP.4).

[Figure 1.2: histograms titled “Distribution of HADS Scores in Cancer Outpatients (n=3071)”, with HADS score categories from zero upward on the horizontal axis.]
Figure 1.2. Distribution of HADS scores in cancer outpatients (n = 3,071). This
continuous distribution of HADS scores in primary care and secondary (cancer care)
illustrates a skewed normal distribution. Data from Thompson et al. Br J Psychiatry.
2001;179:317–323 and Sharpe et al. Br J Cancer. 2004;90:314–320.

[Figure 1.3: bar chart titled “Distribution of DSM-IV Symptoms of Depression in Zurich Study”. Horizontal axis: number of symptoms, zero to nine. Vertical axis scaled 0 to 100.]
Figure 1.3. Distribution of DSM-IV symptoms from Zurich study. The sample comprised
591 individuals originally selected in 1978 from the total population of 18- and 19-year-olds
in Zurich, Switzerland, based on their scores of the Symptom Checklist-90 (SCL-90-R)
(Derogatis, 1977). Two thirds of the sample was randomly selected from members of the
total population who scored above the 85th percentile on the SCL-90-R, and one third was
randomly selected from the remainder of the total population. Reprinted from Journal of
Affective Disorders 62, Angst J, Merikangas KR, Multi-dimensional criteria for the
diagnosis of depression, 7–15, Copyright (2001).

Yet there is an even more fundamental issue. Kraepelin believed the
major psychiatric disorders were “natural disease entities” simply
awaiting the discovery of a specific medical cause. After intensive effort,
the search for fundamental causes was abandoned, and nosology came to be
underwritten by the internal cohesion of symptoms and signs.1 What if depression
has no single pathophysiologic explanation and is a complex manifestation of
severe external stress?2 Would our concept be invalid and would existing
treatment be rendered obsolete overnight? Similarly, if severe stress and
mild depression were closely related, then attempting to find a test that
separated them would be difficult to the point of impossibility (Fig. 1.1).
After many decades of debate, it is not at all clear that depression is a
discrete entity that justifies a categorical classification, as opposed to a
continuum merging with normal healthy but unhappy people.3 In the
continuum argument, the distribution of symptoms of depression would
theoretically approximate a skewed normal or half-normal distribution
with no point of rarity (Fig. 1.2).4 Cloninger stated that there is no
empirical evidence for natural boundaries between major syndromes and
that “no one has ever found a set of symptoms, signs, or tests that separate
mental disorders fully into non-overlapping categories.”5 Yet all current
diagnostic systems that include MDD appear to assume there is a distinct
syndrome (depressive disorder as distinct from depressive symptoms) and
try to suggest an optimal method to identify it (Fig. 1.3). Even if this
approach were correct and the current nosology of DSM-IV entirely perfect,
there would be a significant danger of overrelying on the concept of
MDD to the exclusion of other under-researched forms. In other words, given
recent evidence, psychiatrists would be well advised to pay as much
attention to minor (mild and syndromal) disorders as diabetologists are
now paying to impaired glucose tolerance.6
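The trade-off illustrated in Figure 1.1 can be sketched numerically. The minimal example below assumes two normally distributed score populations, labeled here as normal stress and depressed, and scans candidate cut-offs for the one that maximizes Youden's index (sensitivity plus specificity minus 1). The means, the standard deviation, the 0 to 27 scan range, and the use of Youden's index as the definition of "optimum" are all assumptions made for illustration; none of them is taken from the figure.

```python
from statistics import NormalDist
from typing import Tuple

# Hypothetical score distributions (assumed values, for illustration only).
normal_stress = NormalDist(mu=8.0, sigma=3.0)
depressed = NormalDist(mu=14.0, sigma=3.0)


def accuracy_at(cutoff: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores at or above cutoff are positive."""
    sensitivity = 1.0 - depressed.cdf(cutoff)  # depressed individuals above the cut-off
    specificity = normal_stress.cdf(cutoff)    # non-depressed individuals below it
    return sensitivity, specificity


def youden_index(cutoff: float) -> float:
    """Youden's J: one common, but not the only, summary of cut-off quality."""
    sensitivity, specificity = accuracy_at(cutoff)
    return sensitivity + specificity - 1.0


# Scan cut-offs from 0 to 27 in steps of 0.1 and keep the best one.
candidates = [i / 10 for i in range(0, 271)]
best_cutoff = max(candidates, key=youden_index)
best_sensitivity, best_specificity = accuracy_at(best_cutoff)

print(f"cut-off={best_cutoff:.1f}  "
      f"sensitivity={best_sensitivity:.3f}  specificity={best_specificity:.3f}")
```

Because the two assumed groups have equal spread, the best cut-off falls midway between their means. Even at that point about 16% of each group is misclassified, which is the overlap that a point of only partial rarity implies.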

2. Value and Validity of the Syndrome Concept


The concept of a syndrome is fundamental to diagnostic classification and may
be valuable even if imperfect.7 Without the concept of a syndrome, a disorder
would be defined by a single symptom or simple symptom count. A syndrome
is a special collection of symptoms that cluster in a peculiar way determined by
the underlying pathophysiology, even if that mechanism is unknown. Careful
identification of many psychiatric syndromes and their relationships has
formed a detailed family of mental disorders not dissimilar to the Linnaean
taxonomy proposed by Carl Linnaeus (1707–1778).
In defining clinical syndromes, we rely on certain essential or core
symptoms occurring commonly in those with the disorder but rarely in
those without (Textbox 1.2). By the same token, we often ignore other
symptoms that occur without much discrimination. Hence, some symptoms
8 SCREENING FOR DEPRESSION IN CLINICAL PRACTICE

Textbox 1.2. Types of Validity Testing

Content validity (Strength: Weak)
  The degree of measurement of all fields of interest
Criterion validity (Strength: Strong)
  Agreement against a criterion that is external to the measuring instrument itself
Construct validity (Strength: Moderate)
  Agreement with other measures consistent with theoretically derived hypotheses
Procedural validity (Strength: Weak)
  Agreement with an existing procedure

are more important diagnostically than others, but without large samples and
rigorous examination, it isn’t obvious which ones these are. Further, life is
rarely simple and rarely is any symptom both entirely unique to a psychiatric
disorder and at the same time always manifest. If it were, then when this
particular symptom was absent, we would know the disorder itself was impossible,
and when it was present, we would know the disorder was certain. We would
therefore have a single-question diagnostic test with perfect sensitivity and
specificity (see Chapter 5). In MDD, DSM-IV suggests that the core features
involve dysphoria (low mood) and anhedonia (loss of interest), and ICD-10
suggests that fatigue should also be an essential feature.8 In addition to these
symptoms, aspects such as clinical significance, duration, disability, and dis-
tress have been added as a requirement in many diagnostic categories. We
suggest it is no longer sufficient for an expert panel to mandate such features,
no matter how logical it seems, because their predictive values will be uncer-
tain until tested. In fact, all aspects of a definition (the symptoms, signs,
associated features, and rules binding them together) should be amenable to
clarification and empiric testing. If a syndrome is adopted too easily, the
concept can become a pitfall, as Kendell and Jablensky explained: “Once a
diagnostic concept such as syndrome has come into general use, it tends to
become reified.”9 In other words, its validity is assumed rather than tested.
How, then, can a syndrome be tested and better tests developed? This is
discussed in detail in Chapters 4 and 5, but in brief, accuracy is usually
determined by validity and reliability. Reliability refers to the extent to
which an observation yields the same results on repeated independent assess-
ments. Essentially, this is a measure of consensus between assessors. Validity,
derived from the Latin validus, meaning strong, refers to how well
the instrument measures what it purports to measure (see Textbox 1.2). In
essence this is a measure of truth: how much agreement is there with the
actual disorder, assuming it could be defined by some criterion reference (or
gold standard). In MDD there is no accepted gold standard,10 and therefore
reliability and validity testing must be reduced to measures of agreement,
where the critical question becomes: How good is the comparison? In
medical specialties, aspects of the history such as nature of the chest pain
have been subjected to diagnostic validity testing in a similar way to
established investigations such as the electrocardiogram.11,12 In psychiatry
(outside of organic brain disease), such objective tests are rarely if ever
available. Many influences favor the adoption of a medical model in which
an etiologic agent, a pathologic process, and symptoms and signs are
assumed to be present even if unknown. This is often highly acceptable to
patients, clinicians, and other interested parties (eg, the pharmaceutical
industry), not least because stigma may be reduced and help-seeking and
adherence encouraged. The flip side is that patient responsibility may be
diminished and biologic treatments may be overprescribed. If the medical
model of depression is correct, then eventually a definitive core disease
process underlying depression will be found and a diagnostic test developed
that (regardless of convenience) will enable current clinical diagnostic
methods to be fully evaluated. If the medical model of depression is incor-
rect, then a definitive biologic test will never be developed, and we will
continue to develop proxies of illness that may nevertheless correspond to
important correlates of disorder and suffering, such as treatment response,
course, and quality of life.
The astute reader will probably conclude that measures of reliability
and validity in psychiatry (and by implication diagnosis itself) are essen-
tially all tests of agreement, albeit against different standards. Reliability
is agreement with peers, and validity is agreement with an accepted
method. As no group has yet found a robust biologic test for depression,
most work has focused on attempts to improve the reliability of assess-
ments conducted by researchers and clinicians. Often this involves refine-
ment of the clinical interview using methods that assist the clinician.
Semi-structured interviews provide questions that might best elicit symp-
toms but the clinician retains flexibility to deviate from this if necessary.
Structured interviews provide questions that must be asked as described,
purposely removing flexibility, with the useful benefit that clinical training
is not a prerequisite and large population surveys using lay interviewers
become possible. One level of assistance to clinicians that does not
interfere with the clinical interview is provision of symptom checklists,
together with the rules for their combination (Textbox 1.3). This essen-
tially forms the basis of ICD-10 and DSM-IV.
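
Because reliability is framed here as agreement between assessors, a worked calculation may help. The following is a minimal sketch, not taken from the source studies, of Cohen's kappa for two raters making a yes/no diagnosis. The function names and the counts are hypothetical and chosen only for illustration.

```python
def observed_agreement(both_yes, r1_only, r2_only, both_no):
    """Proportion of patients on whom the two raters agree."""
    n = both_yes + r1_only + r2_only + both_no
    return (both_yes + both_no) / n


def cohens_kappa(both_yes, r1_only, r2_only, both_no):
    """Cohen's kappa: agreement corrected for agreement expected by chance.

    both_yes: both raters diagnose depression
    r1_only:  only rater 1 diagnoses depression
    r2_only:  only rater 2 diagnoses depression
    both_no:  neither rater diagnoses depression
    """
    n = both_yes + r1_only + r2_only + both_no
    p_observed = observed_agreement(both_yes, r1_only, r2_only, both_no)
    # Chance agreement: product of each rater's marginal rates, summed over yes and no.
    p_yes = ((both_yes + r1_only) / n) * ((both_yes + r2_only) / n)
    p_no = ((r2_only + both_no) / n) * ((r1_only + both_no) / n)
    p_expected = p_yes + p_no
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts for 100 patients rated by two clinicians.
agreement = observed_agreement(40, 10, 10, 40)  # 0.80
kappa = cohens_kappa(40, 10, 10, 40)            # about 0.60
```

In this illustration the raters agree on 80% of patients, yet half of that agreement would be expected by chance alone, so kappa is noticeably lower than the raw percentage.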

Textbox 1.3. Development of Diagnostic Checklists

1972  Feighner, Diagnostic Criteria (FDC): Primary Depression
1978  Spitzer, Research Diagnostic Criteria (RDC): Major Depressive Disorder
1980  Diagnostic and Statistical Manual III: Major Depressive Episode
1987  Diagnostic and Statistical Manual III-R: Major Depressive Episode
1990  ICD-10 International Classification of Diseases: Mild, Moderate, or Severe Depression
2000  Diagnostic and Statistical Manual IV: Major Depressive Episode
2012  ICD-11 International Classification of Diseases; Diagnostic and Statistical Manual V

3. Diagnostic Checklists (including DSM and ICD)


A diagnostic checklist is a list of features, together with the rules for making
a particular diagnosis. If the criteria are monothetic, then all the items must be
present; if polythetic, then only a proportion are required. If features are
necessary, then specific features must be present; if sufficient, then only
certain criteria but no others are needed. Several checklists that generate
one or more systems of psychiatric diagnosis have been proposed (Textbox
1.4).13–15 Checklists leave the clinician to conduct the clinical interview in
any way he or she feels appropriate. Advanced systems may use diagnostic
algorithms that prioritize certain items and use more complex rules, such as
“if x, then y.” DSM and ICD-10 use diagnostic checklists but also include
some suggestions for the interview itself. That said, a diagnostic interview
defined only by DSM-IV/ICD-10 lacks clearly defined probe questions,
requiring clinicians to formulate their own approach. Although this adds to
the acceptability, equally it contributes to interrater variability.16 Some con-
sider DSM and ICD distinct from other checklist methods because of the
claim that DSM and ICD are operationalized—that is, each and every step is
described and subject to unambiguous instructions as well as reliability or

Textbox 1.4. Checklists for Aiding Psychiatric Diagnosis

Lists of Integrated Criteria for the Evaluation of Taxonomy (LICET)
  LICET-D for depressive disorders assembles all criteria from 9 diagnostic systems.
Operational Criteria Checklist (OPCRIT)
  OPCRIT generates diagnoses of 13 diagnostic systems and has been proposed
  to generate diagnoses direct from medical notes.
ICD-10 Symptom Checklist
  Developed by Janca; takes about 15 minutes.
International Diagnostic Checklists (IDCL)
  Two 30-item lists, one for ICD-10 and one for DSM-IV.

validity testing. This is probably not the case. Efforts to measure the relia-
bility of DSM-IV have been published.17
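
The distinction between monothetic and polythetic criteria, and the role of necessary features, can be sketched in a few lines. This is an illustrative sketch only: the function names and the placeholder item labels are hypothetical and do not come from any published checklist.

```python
def meets_monothetic(present, criteria):
    """Monothetic rule: every listed item must be present."""
    return all(item in present for item in criteria)


def meets_polythetic(present, criteria, threshold, necessary=()):
    """Polythetic rule: at least `threshold` of the listed items must be present.

    Items named in `necessary` must all be present regardless of the count.
    """
    if not all(item in present for item in necessary):
        return False
    count = sum(1 for item in criteria if item in present)
    return count >= threshold


# Placeholder items standing in for checklist features.
checklist = ["item_a", "item_b", "item_c", "item_d"]
patient = {"item_a", "item_c", "item_d"}

all_required = meets_monothetic(patient, checklist)                # False: item_b is absent
three_of_four = meets_polythetic(patient, checklist, threshold=3)  # True: three items present
needs_b = meets_polythetic(patient, checklist, 3, necessary=["item_b"])  # False
```

The same patient can therefore pass or fail depending solely on the combination rule, which is one reason different checklists yield different numbers of cases.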

ICD and DSM


The World Health Organization (WHO) introduced mental disorders in the
sixth revision of the International Classification of Diseases (ICD-6) in 1948.18
The American Psychiatric Association Committee on Nomenclature and
Statistics published the first edition of the Diagnostic and Statistical
Manual: Mental Disorders (DSM-I) in 1952 (see Textbox 1.3).19 Current
diagnostic classification manuals (DSM-IV and ICD-10) deliberately do not
contain mutually exclusive diagnostic categories; rather, they contain over-
lapping areas. Indeed, if carefully applied, each diagnostic system yields a
different number of cases, as illustrated by Erkinjuntti and colleagues (1997)
for dementia20 and Furukawa and associates21 for depression. Of note, agree-
ment between diagnostic systems examined in the same sample is often modest
(Table 1.1). It was in the eighth revision of ICD (ICD-8) in 1967 and in the third
edition of DSM (DSM-III) in 1980 that a systematic effort to improve the
diagnosis and classification of mental disorders was made. Until then, textbooks
containing descriptions of individual conditions were the main source of
information, but naturally this led to numerous disputes. DSM and ICD go
beyond textbook descriptions by providing a checklist of useful criteria and,
importantly, suggesting a diagnostic threshold determined by specific symp-
toms, which usually have to fulfill both frequency (symptom count) and
duration criteria. The key difference between a severity questionnaire and an
operational method is that certain criteria are required in the latter, whereas

Table 1.1. Clinician Agreement (Kappa) Using Different Diagnostic Systems for Depression

          DSR     DSM     RDC     ICD-10
DSM       0.95
RDC       0.71    0.71
ICD-10    0.71    0.70    0.74
FDC       0.59    0.60    0.77    0.63

Adapted from Philipp M, Delmo CD, Buller R, et al. Differentiation between
major and minor depression. Psychopharmacology. 1992;106:S75–S78.

severity questionnaires usually rely on symptom counts alone, without
weighting of symptoms (see Appendix Table 1). That said, questionnaires
can be constructed to follow the DSM diagnostic algorithm.22 This is not
surprising, because most mood questionnaires were proposed by experts
based on clinical experience alone, whereas careful field testing is needed to
rank important items (see Chapter 4). Given this, it is remarkable that severity
questionnaires may perform quite well against structured interviews.

Validation of the DSM-IV/ICD-10 Criteria for Depression


The criteria for major depression, minor depression, and dysthymia are shown
in Table 1.2. Subsyndromal depression is not currently included in DSM-IV
but can be considered present if there are at least two DSM-IV symptoms but
the overall criteria for major or minor depression are not met.23 MDD is
defined by depressed mood or loss of interest in nearly all activities for at
least 2 weeks accompanied by at least three or four (for a total of 5) symptoms.
The criteria for minor depression are identical but require two to four

Table 1.2. Diagnostic Categories for Depressive Disorders

Major depression
  DSM-IV criteria: 5 depressive symptoms, including depressed mood or
  anhedonia, causing significant impairment in social, occupational, or other
  important areas of functioning
  Symptom duration: 2 weeks
Minor depression (research criteria diagnosis)
  DSM-IV criteria: 2–4 depressive symptoms, including depressed mood or
  anhedonia, causing significant impairment in social, occupational, or other
  important areas of functioning
  Symptom duration: 2 weeks
Dysthymia
  DSM-IV criteria: 3 or 4 dysthymic symptoms, including depressed mood,
  poor appetite or overeating, insomnia or hypersomnia, low energy, low
  self-esteem, poor concentration or indecisiveness, and hopelessness
  Symptom duration: 2 years

symptoms and require exclusion of previous major depression in an attempt to
avoid confusion over residual symptomatology. Dysthymia is characterized by
fewer symptoms than major depression (three or four) and a chronic course
lasting at least 2 years.
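
The symptom-count rules for major, minor, and subsyndromal depression described above can be written out as a small decision function. This is a hedged sketch for illustration rather than a validated diagnostic algorithm: the function name, the symptom labels, and the ordering of the checks are assumptions, and dysthymia is left out because it rests on a separate 2-year criterion.

```python
# Core symptoms named in the text; the string labels themselves are hypothetical.
CORE_SYMPTOMS = {"depressed_mood", "anhedonia"}


def classify_depression(symptoms, weeks, impairment, previous_major=False):
    """Classify by symptom count, core symptom, duration, and impairment.

    symptoms:       set of DSM-IV symptom labels currently present
    weeks:          duration of symptoms in weeks
    impairment:     True if functioning is significantly impaired
    previous_major: True if there is a history of major depression
    """
    symptoms = set(symptoms)
    count = len(symptoms)
    has_core = bool(CORE_SYMPTOMS & symptoms)
    meets_base = has_core and weeks >= 2 and impairment

    if meets_base and count >= 5:
        return "major depression"
    if meets_base and 2 <= count <= 4 and not previous_major:
        return "minor depression"
    # At least two symptoms, but criteria for major or minor depression not met.
    if count >= 2:
        return "subsyndromal depression"
    return "no depressive category"


example = classify_depression(
    {"depressed_mood", "insomnia", "fatigue", "guilt", "poor_concentration"},
    weeks=3,
    impairment=True,
)  # "major depression"
```

Writing the rules this way makes the thresholds explicit, which is useful when asking whether a cutoff such as five symptoms or 2 weeks has empirical support.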
In ICD-10 the core symptoms of depression include decreased energy or
increased fatigability in addition to low mood and loss of interest. Further, only
four symptoms are required for a mild episode, and six (five in early versions)
symptoms qualify as a moderate depressive episode. Thus, DSM-IV major
depression is broadly analogous to the ICD-10 concept of moderate or severe
depression. Both ICD and DSM suggest a minimum number of typical and
associated symptoms and a minimum duration of symptoms of 2 weeks. In
DSM-IV, but not in ICD-10, a third feature is added: that the disorder causes
significant impairment in social, occupational, or other important areas of
functioning. As a result, there is discordance in diagnosis based on ICD-10
versus DSM-IV.24–26
Over the past 10 years there have been accumulating challenges to the
diagnostic criteria in DSM-IV, including but not limited to MDD. Philipp and
colleagues (1992) were one of the first groups to show that the major
depression concept may be too narrow.27 In a primary care study using
DSM-III-R, MDD occurred in 17.4%, but the majority of depressed patients
fell into the group of depression “not otherwise specified” (NOS). Adding
the minor depression concept resulted in the reclassification of 38.3% of the
NOS patients to minor depression. Data from the National Comorbidity
Survey have shown that across the minor, major, and severe categories of
depression (depending on the number of symptoms) there is a “monotonic”
increase for a number of fundamental indices such as average number of
episodes, impairment, comorbidity, and parental psychopathology,28 sug-
gesting a continuum within depression rather than categorical groupings.
Kendler and Gardner’s 1998 longitudinal analysis of the Virginia Twin
Registry demonstrated that the presence of five or more symptoms of depres-
sion was not a more accurate definition of depression at 1-year follow-up than
the presence of three or four symptoms.29 Additionally, there is little
empirical support for the DSM-IV requirement for 2-week duration or,
indeed, “clinically significant impairment.”30,31
In the Rhode Island MIDAS project, Zimmerman and colleagues (2006)32
conducted an in-depth analysis of symptoms for MDD by having trained raters
administer a semi-structured interview to 1,523 psychiatric outpatients, of whom
54.4% had current MDD. They analyzed a 17-item bank of possible
symptoms of depression, including the standard 9 DSM items but separating
the compound criteria that encompass more than one symptom (eg, increased
sleep or insomnia), along with non-DSM diagnostic items such as hopeless-
ness, helplessness, and unreactive mood. The authors found that some items

Textbox 1.5. Inter-Rater Reliability Eliciting Individual Symptoms of Depression

Symptoms                     Kappa
Suicidality                  0.94
Depressed mood               0.92
Insomnia                     0.91
Anhedonia                    0.90
Decreased appetite           0.89
Loss of energy               0.88
Indecisiveness               0.88
Thoughts of death            0.86
Psychomotor agitation        0.83
Feelings of worthlessness    0.80
Increased weight             0.79
Decreased concentration      0.78
Excessive guilt              0.76
Decreased weight             0.69
Increased appetite           0.63
Psychomotor retardation      0.63
Hypersomnia                  0.54

were rated more reliably than others—for example, suicidal ideas, plan, or
attempt (suicidality) achieved almost perfect agreement, whereas raters often
disagreed about what constituted psychomotor retardation (Textbox 1.5). The
authors found that the ranked order of diagnostic weight (by individual item)
for DSM-IV membership on logistic regression was depressed mood > anhe-
donia > sleep disturbance > concentration/indecision > worthlessness/exces-
sive guilt > loss of energy > appetite/weight disturbance > psychomotor
change > death/suicidal thoughts. Some items seemed redundant in making a
diagnosis. Zimmerman’s group also looked at the validity of the so-called core
criteria.33 Only 1.5% of the 1,800 patients reported five or more criteria in
the absence of low mood or loss of interest or pleasure. Twenty-five of these 27
patients reported depressed mood at a subthreshold level, often in partial
remission. Thus, only a small handful of cases would be false positives if no
core criteria existed. In another paper in the series, they found that few patients
who met the symptom criteria for MDD were ruled out of the diagnosis by the
other components of the diagnostic algorithm, thereby explaining why self-
administered depression symptom questionnaires perform well as diagnostic
proxies.34 Finally, they addressed the longstanding issue of applying some of
the criteria in patients with comorbid medical illnesses because of symptom
nonspecificity. Based on a series of psychometric analyses that were cross-
validated, they developed an alternative set of diagnostic criteria for MDD that
did not include somatic symptoms but would nonetheless demonstrate a high
level of concordance with the current DSM-IV definition.

4. Unstructured (Unassisted) Clinician Diagnosis


Clinician-based assessment has been poorly investigated compared with
assisted methods of diagnosis. In fields of medicine where a robust external
validation such as postmortem is available, routine diagnostic accuracy has
often proven to be remarkably poor.35,36 It should be no surprise, then, if in the
absence of a gold standard, health professionals have considerable difficulty
making accurate and reliable diagnoses (see Table 1.2).37,38 Regarding missed
diagnoses, one study suggested that only 26% were complete mistakes; 25%
were underestimates of severity and 38% misidentifications. Conversely,
regarding false-positive diagnoses, 35% were overestimates of severity, 24%
misdiagnoses, and 41% complete errors. To compound this problem, 90% of
psychiatrists do not routinely use case identification and severity measurement
for depression (and more than half never do so).39 Most clinicians rely on their
own abilities based on training received earlier in their career. On the other
hand, clinician-based assessment is purported to be a gold standard in psy-
chiatry if the clinician is given adequate time and resources. This was best
conceptualized by Spitzer, who proposed the LEAD standard.40 LEAD is an
acronym that stands for the Longitudinal evaluation performed by Expert
clinicians who utilize All available Data. The LEAD standard is an important
way of obtaining the most likely diagnosis by requiring clinicians to use a
collateral history, hospital records, psychological evaluations, and laboratory
results. However, uncertainty about who is “expert” and which data are
mandatory, as well as availability, limits both the actuarial and practical
value of this standard.41 A related clinical standard is the best estimates
procedure (BEP), which is simpler than the LEAD.42 In the BEP, all available
information is evaluated by experienced clinicians who assign a consensus
“best-estimate diagnosis.” As with the LEAD standard, the number of
clinicians and source of information should always be stated.

Accuracy of Psychiatrists’ Routine Diagnoses


The accuracy of psychiatrists’ diagnostic skills can be compared against BEP
diagnoses and/or structured interviews. The value of BEP was investigated by
Kosten and Rounsaville (1992),43 who interviewed 475 subjects using the
Schedule for Affective Disorders and Schizophrenia-Lifetime (SADS-L).
Two psychologists independently evaluated and diagnosed the same subjects,
applying the BEP. Higher rates of diagnoses of major and minor depressive
disorder, antisocial personality, alcoholism, and drug abuse were revealed
when the BEP was applied than with routine interview alone and with a
minimal rate of false positives. More recently, Taiminen and colleagues
(2001)44 compared routine discharge diagnoses based on DSM-IV and BEP
diagnoses in 116 first-admission patients with psychosis and severe affective
disorder (Table 1.3). However, in this case the BEP included data from a
Schedules for Clinical Assessment in Neuropsychiatry (SCAN) interview,
enforcing an even higher gold standard. Diagnostic agreement was moderate
(kappa 0.51), suggesting frequent errors in the routine diagnoses even when
using DSM-IV criteria. Of note, clinicians tended to miss depressive symptoms
in psychotic patients, to overdiagnose psychotic symptoms in depressive
patients, and to overlook earlier hypomanic or depressive episodes in depres-
sive patients. Spitzer and colleagues (1999)45 evaluated the unassisted accu-
racy of mental health professionals (1 psychologist and 3 mental health social
workers) in comparison with 62 primary care physicians (PCPs) using the
depression scale of the Patient Health Questionnaire (PHQ-9). Accuracy was
calculated in 585 cases who had both assessments within a 48-hour period.
PCPs recognized 61% of cases thought to have major depression by mental
health professionals and excluded 98% of cases thought not to have major
depression. Accuracy in the other direction was not reported. Recently
Carballeira and colleagues from Switzerland (2007)46 studied 212 patients
admitted to the internal medicine units of the University Hospitals of Geneva
(Table 1.4). Each patient completed the PHQ-9 and underwent a blind DSM-IV
diagnostic assessment by a psychiatrist. Compared to the PHQ-9, psychiatrists
recognized 50% of cases with major depression but only 22% of those with

Table 1.3. Diagnostic Accuracy of Primary Care Physicians Against CIDI

                          Gold Standard           Gold Standard
                          Depressed (CIDI)        Not Depressed (CIDI)
Depressed                 70                      76 (false positives)     PPV 48%
(Unassisted Diagnosis)
Not Depressed             104 (false negatives)   459                      NPV 81.5%
(Unassisted Diagnosis)
Total                     Se 40.2%                Sp 85.8%

Reprinted from General Hospital Psychiatry 21(2), Tiemens BG, VonKorff M, Lin EH, Diagnosis of
depression by primary care physicians versus a structured diagnostic interview. Understanding
discordance, 87–96, Copyright (1999).

Table 1.4. Diagnostic Accuracy of Psychiatrists vs. PHQ-9 (Patient-Rated)

Psychiatrist             PHQ-9 Mj     PHQ-9 Mn     No PHQ-9 Mj   No PHQ-9 Mn
                         Depressed    Depressed    Depressed     Depressed
Depressed                12           5            26 (false     30 (false     PPV (Mj) 32%
(Unassisted Diagnosis)                             positives)    positives)    PPV (Mn) 14%
Not Depressed            12 (false    18 (false    162           159           NPV (Mj) 93%
(Unassisted Diagnosis)   negatives)   negatives)                               NPV (Mn) 90%
Total                    Se 50%       Se 22%       Sp 86%        Sp 84%

Mj, major (DSM-IV); Mn, minor (DSM-IV).
Reproduced from Carballeira et al. Criterion validity of the French version of Patient Health
Questionnaire (PHQ) in a hospital department of internal medicine. Psychology and Psychotherapy:
Theory, Research and Practice (2007), 80, 69–77.

milder forms. Rule-out accuracy was high but rule-in accuracy was poor,
with a high rate of false positives. The authors also compared the diagnoses of
psychiatrists with those of internists, finding a kappa agreement of only 0.20.
This study is valuable because patient-rated symptoms have particular
importance.47
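
The rule-in and rule-out figures quoted in these tables all derive from the four cells of a 2 × 2 table. As a minimal sketch, the following recomputes sensitivity, specificity, PPV, and NPV from the cell counts printed in Table 1.3; the function and variable names are hypothetical.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy measures from the four cells of a 2 x 2 table.

    tp: clinician and gold standard both say depressed
    fp: clinician says depressed, gold standard says not depressed
    fn: clinician says not depressed, gold standard says depressed
    tn: clinician and gold standard both say not depressed
    """
    return {
        "sensitivity": tp / (tp + fn),  # proportion of true cases detected
        "specificity": tn / (tn + fp),  # proportion of non-cases correctly excluded
        "ppv": tp / (tp + fp),          # probability a positive diagnosis is correct
        "npv": tn / (tn + fn),          # probability a negative diagnosis is correct
    }


# Cell counts as printed in Table 1.3 (primary care physicians against the CIDI).
pcp = diagnostic_accuracy(tp=70, fp=76, fn=104, tn=459)
for name, value in pcp.items():
    print(f"{name}: {value:.1%}")
```

With these counts the sketch gives a sensitivity of 40.2%, a specificity of 85.8%, a PPV of about 48%, and an NPV of 81.5%, matching the values shown in the table.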
Several groups have explored the accuracy of routine diagnoses against
the Structured Clinical Interview for DSM Disorders (SCID), although few
have used other methods such as the Composite International Diagnostic
Interview (CIDI).48 Helzer and colleagues (1985)49 examined the level of
agreement between a lay-rated Diagnostic Interview Schedule (DIS) in the
Epidemiologic Catchment Area project and routine clinical diagnoses made
by psychiatrists. Overall agreement between the DIS and the psychiatrists
ranged from 79% to 96%, but specificities were all 90% or better. Anthony
and associates (1985)50 studied DSM-III diagnoses made by the DIS in
comparison to a “standardized” DSM-III diagnosis by psychiatrists in the
two-stage Baltimore Epidemiologic Catchment Area mental morbidity
survey. There were considerable disagreements; the only category of
modest agreement was alcohol use disorder. Steiner and colleagues
(1995)51 studied the relationship between diagnoses generated by the
SCID and unstructured psychiatric interviews. Diagnoses generated by
researchers using the SCID and routinely by psychiatrists were compared
for 100 patients. Overall agreement between the SCID diagnosis and the
clinical diagnosis was low (kappa of 0.30). Shear and coworkers (2000)52
examined 164 nonpsychotic patients at two community treatment facilities
using the SCID and compared results to diagnoses obtained from clinician
records. The majority (59%) of patients met the SCID criteria for a primary
depressive disorder. Diagnoses agreed in only a small minority of cases
(kappa 0.24 overall and 0.33 for mood disorder). Overall, use of the SCID
resulted in more diagnoses than the standard clinical procedures, particu-
larly where comorbidity was present. Anxiety disorders, in particular, were
much more likely to be overlooked by a clinical rater. One exception was
“adjustment disorder,” which was more frequently diagnosed by a clinician
than by the SCID rater. In an important but small-scale study, Miller and
colleagues (2001)53 compared three methods of diagnosis for 56 psychiatric
inpatients against the LEAD criterion standard. These were unassisted
clinical assessment, SCID, and a structured Computer-Assisted Diagnostic
Interview (CADI). Psychiatrists’ unassisted assessment had 54% agreement
against LEAD (kappa 0.43), whereas SCID and CADI had agreements
above 85% (kappa 0.81). Compared with similarly trained colleagues,
there was an interrater agreement of only 45.5% (kappa 0.24) for unassisted
clinicians, meaning independent clinicians disagreed most of the time.54 In
one of the largest studies of diagnostic accuracy, Kashner and coworkers
(2003)55 looked at 294 newly enrolled adult psychiatric patients based on
clinical records. Within 2 weeks of their primary evaluation, patients were
randomly assigned to receive a nurse-administered SCID with feedback to
the attending psychiatrist or usual care. The kappa agreement between the
SCID and chart diagnoses of MDD was 0.56 at baseline (unassisted), rising
to 0.90 at the end of the study after feedback of results to clinicians. Against
the SCID, clinicians underdiagnosed all psychiatric disorders (for example,
missing over 60% of substance abuse disorders and anxiety disorders).
However, unassisted clinicians also made several false-positive diagnoses,
most commonly for schizophrenia, bipolar disorders, and MDD. Basco and
associates (2000)56 interviewed 200 psychiatric outpatients and attempted to
establish gold standard diagnoses based on SCID, all medical records, and a
follow-up interview with a psychiatrist or a psychologist trained in diag-
nostic procedures (in effect, the LEAD procedure). The percentage of
agreements with this gold standard was 53% for routine diagnoses, 68%
for the SCID, and 79% for the SCID plus chart review. Concordance was
better for depression. Looking at the subset of patients examined by a
psychiatrist, 70% of those thought by psychiatrists to have MDD actually
did on the SCID (43 of 61 participants), but half of the SCID cases of MDD
were not previously recognized as such, typically assigned adjustment dis-
order or no clinical diagnosis, anxiety disorder, substance abuse, or bipolar
disorder. The accuracy of unassisted clinical ability was examined for both
rule-in and rule-out accuracy (Table 1.5). Psychiatrists were good at
excluding depression but missed 50% of cases when attempting to rule in
a diagnosis. In all groups, when discrepancies occurred, most were judged to

Table 1.5. Diagnostic Accuracy of Psychiatrists vs. SCID Plus

                          SCID+ Standard          SCID+ Standard
                          Depressed               Not Depressed
Depressed                 17                      7 (false positives)      PPV 76%
(Unassisted Diagnosis)
Not Depressed             17 (false negatives)    155                      NPV 89%
(Unassisted Diagnosis)
Total                     Se 50%                  Sp 96%

SCID+ refers to SCID, plus all medical records and a follow-up interview with a trained
psychiatrist or a psychologist; see text.
Basco et al. Methods to improve diagnostic accuracy in a community mental health setting. Am J
Psychiatry. 2000;157(10):1599–1605.

be of substantial clinical importance. Performance is remarkably similar to
that of PCPs (see Table 1.3 for comparison). The kappa coefficients
showed that administration of the SCID without the benefit of a
medical record review improved accuracy beyond routine diagnosis alone,
while adding information derived from the chart review resulted in an
additional 25% improvement over and above the SCID alone. These find-
ings are consistent with reports from other studies showing the advantage of
diagnostic interviews over unstructured clinical interviews (see below).57,58
This is one study in which the importance of the competing diagnoses was
investigated. Psychiatrists found separation of MDD versus obsessive-com-
pulsive disorder and MDD versus dysthymia to be relatively straightforward
but struggled with MDD versus adjustment disorder and MDD versus
organic disorder, among others. Reasons for suboptimal accuracy are dis-
cussed in Chapter 3.

5. Structured and Semi-Structured Assisted Diagnostic Interviews

Semi-structured diagnostic interviews were introduced in the 1970s as a method
that would allow lay interviewers to obtain psychiatric diagnoses close to those a
psychiatrist would obtain.59,60 Rogers suggested that one third of clinical varia-
bility was due to idiosyncratic questioning and two thirds to interpretation of the
information gleaned.60 The premise is that standardization forces an assessor to
cover all the areas of psychopathology and provides consistency in the way
questions are asked. Three main components of the structured interview are (1)
to use the standardized language of clinical method, (2) to sequence the order of
inquiry, and (3) to quantify the responses.

However, assisted interviews have several significant limitations. First, they
are time-consuming: the average time to administer the SCID is approximately
1 hour and 44 minutes, compared to about 40 minutes for a standard interview
(Textbox 1.6). Second, they have modest acceptability to patients and staff,
who often find these interviews restrictive (for staff) and repetitive (for

Textbox 1.6. Summary of Assisted Interviews

Partially Structured

The PSE (Present State Examination)/SCAN
  Type: Semi-structured interview
  Recommended Use by: Clinicians
  Generates: ICD-10 and DSM-IV criteria
  Duration: 45 minutes
SCID-I (Structured Clinical Interview for DSM-III-R)
  Type: Semi-structured interview
  Recommended Use by: Trained interviewer and/or clinicians
  Generates: DSM-IV
  Duration: 1 hour and 44 minutes
Schedule for Affective Disorders and Schizophrenia (SADS)
  Type: Semi-structured interview
  Recommended Use by: Trained interviewer and/or clinicians
  Generates: RDC
  Duration: 90 minutes

Fully Structured

CIDI (Composite International Diagnostic Interview)
  Type: Structured
  Recommended Use by: Trained interviewer (clinician optional)
  Generates: ICD-10 and DSM-III-R criteria
  Duration: 75 minutes
M.I.N.I. (Mini-International Neuropsychiatric Interview)
  Type: Structured
  Recommended Use by: Trained interviewer (clinician optional)
  Generates: ICD-10 and DSM-IV criteria
  Duration: 20 minutes
Diagnostic Interview Schedule (DIS)
  Type: Structured
  Recommended Use by: Trained interviewer (clinician optional)
  Generates: DSM-IV
  Duration: 120 minutes

patients).61 Third, and perhaps unexpectedly, diagnostic interviews can
produce far from uniform results even in the same population. For example,
12-month prevalence rates of major depression in the United States using two
instruments were 4.2%62 and 10.1%.63 Further, no before-and-after study or
randomized trial has shown how much these methods can improve routine
care. These cautions call into question the value of these instruments for
clinical care, at least until further data are available.64
The most common instruments are illustrated in Textbox 1.6. The SCID was
developed alongside DSM-III-R.65 As with most instruments, raters must first be
trained. Compared with the CIDI, the clinician makes more judgments as to
whether each criterion is met and whether all criteria taken together validate the
clinical diagnosis. Numerous studies have evaluated interrater reliability for
major depression using the SCID. One of the largest, from Williams and
colleagues (1992),66 evaluated the ability of psychiatrists (n = 14), psychologists
(n = 6) and master’s degree students (n = 4) to diagnose depression. There was a
modest kappa agreement of 0.64. There are also several studies comparing the
SCID and CIDI. In a sample of 325 patients from the National Comorbidity
Survey, the sensitivity of CIDI was 55% and specificity was 93.7% for lifetime
major depression compared with the SCID (kappa 0.54).67 In the study by Basco
and associates (2003) mentioned previously, the added value of SCID plus chart
diagnoses suggests that the SCID can be improved using very experienced
clinical raters—hence the need for a clinician-led assisted interview.
Interestingly, feedback of SCID results to psychiatrists can lead to more positive
outcomes.68 Philipp and colleagues (1986) proposed a refinement to the SCID
called the Polydiagnostic Interview (PODI).69 The advantage of this approach is
that the PODI generates diagnoses according to several competing diagnostic
checklists, including DSM-III-R, ICD-10, Research Diagnostic Criteria (RDC),
and Feighner Diagnostic Criteria. The Present State Examination (PSE) is a
semi-structured interview designed for use only by clinicians. The current 10th
edition can generate both ICD-10 and DSM-IV diagnoses. A computer program
derived from PSE (CATEGO-5) has also been developed, as has a short version
of PSE. SCAN is a semi-structured interview based on PSE and is also the
product of a collaborative study between the World Health Organization (WHO)
and the U.S. Alcohol, Drug Abuse, and Mental Health Administration
(ADAMHA).70 Again, the PSE requires a thorough training course, making it
expensive and time-consuming for many.
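
The agreement statistics quoted in this section (sensitivity, specificity, and kappa) all derive from the same two-by-two comparison of a test interview against a reference interview. The sketch below shows one way to compute them. The class name, field names, and counts are illustrative assumptions; the counts are invented to give results of a similar magnitude to the figures quoted above and are not data from any cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    """2x2 counts for a test interview judged against a reference interview."""
    both_positive: int
    test_only_positive: int       # test positive, reference negative
    reference_only_positive: int  # test negative, reference positive
    both_negative: int

    @property
    def total(self) -> int:
        return (self.both_positive + self.test_only_positive
                + self.reference_only_positive + self.both_negative)

    def sensitivity(self) -> float:
        # Share of reference-positive cases that the test also calls positive.
        return self.both_positive / (self.both_positive + self.reference_only_positive)

    def specificity(self) -> float:
        # Share of reference-negative cases that the test also calls negative.
        return self.both_negative / (self.both_negative + self.test_only_positive)

    def kappa(self) -> float:
        # Cohen's kappa: observed agreement corrected for chance agreement.
        n = self.total
        observed = (self.both_positive + self.both_negative) / n
        test_pos = (self.both_positive + self.test_only_positive) / n
        ref_pos = (self.both_positive + self.reference_only_positive) / n
        expected = test_pos * ref_pos + (1 - test_pos) * (1 - ref_pos)
        return (observed - expected) / (1 - expected)


# Invented counts for illustration only (not taken from any study).
example = Agreement(both_positive=55, test_only_positive=14,
                    reference_only_positive=45, both_negative=211)
print(f"sensitivity={example.sensitivity():.2f}, "
      f"specificity={example.specificity():.2f}, kappa={example.kappa():.2f}")
```

With these invented counts, raw agreement is above 80% yet kappa is only moderate, because much of the agreement on negative cases would be expected by chance alone.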

Fully Structured Assisted Interviews


The DIS was developed by the National Institute of Mental Health (NIMH) and
was released in its first version in 1978. It was an adaptation of the Renard
Diagnostic Instruments designed to assess Feighner’s diagnostic criteria.
DIS-4 focuses on DSM-IV and is similar to the CIDI. It has been validated,
but one study found low sensitivity of the DIS versus the SCID.71 The CIDI
was produced jointly by WHO and ADAMHA and is designed to enable a
trained interviewer to arrive at either an ICD-10 or a DSM diagnosis in
about 75 minutes. The CIDI is an amalgamation of two pre-existing
instruments, the DIS and the PSE. It contains 276 symptom questions, many of
which are probes to evaluate symptom severity, as well as questions for
assessing help-seeking and psychosocial impairments. A computerized
version, CIDI 2.1, is available. The first field trial showed high interrater
reliability but poor test–retest reliability for depressive disorders.72
Subsequent reliability studies (using slightly different versions of the CIDI)
demonstrated a high interrater reliability.73,74 One validity study used a
clinician-scored DSM-III-R symptom checklist as the gold standard.75 Compared
with this gold standard checklist, the CIDI had a sensitivity of 85% and a
specificity of 98% (kappa 0.84). A second study compared the CIDI against the
SCID-assisted LEAD procedure.76 There was modest positive predictive value and
a high negative predictive value (kappa 0.46). The Mini-International
Neuropsychiatric Interview (M.I.N.I.) is an abbreviated structured psychiatric
interview that takes only 15 to 20 minutes to administer.77 It uses
decision-tree logic to elicit all the symptoms listed in the symptom criteria
for DSM-IV and ICD-10 for 15 major Axis I diagnostic categories, for one
Axis II disorder, and for suicidality. Several specific tools are available:
M.I.N.I.-Screen, M.I.N.I.-Plus, and the M.I.N.I.-Kid. Validation of the
M.I.N.I. in relation to the SCID Patient Version, the CIDI, and expert
professional opinion has been conducted.77

6. Conclusion
Some will find surprising the conclusion that the diagnosis of mental disorders
is not based on a robust gold standard.78 Current evidence has repeatedly shown
that unassisted psychiatric diagnoses are neither particularly reliable (when
judged by repeat assessments) nor particularly valid (when judged by
consensus methods or assisted interviews), especially when comorbidity is
present.79 Miller and colleagues (2001)53 found that when unassisted,
clinicians evaluated an average of only 53% of key criteria present in diagnostic
algorithms (32% in the case of depression). Psychiatrists asked about low
mood in 86% of cases but asked about loss of pleasure in only 8%.80 As
awareness of these limitations increases, there will be an increased call for
clinicians to use diagnostic aids as a routine in clinical practice. If this occurs
with proper diagnostic scrutiny (comparing accuracy with and without
assistance head to head), psychiatric diagnosis will slowly move from being a
nonscientific art based on the overall clinical impression to a science where the
accuracy of each method—indeed each question—is known. As Kendell and
Jablensky9 observed: “Psychiatry is in the position—that most of medicine
was in 200 years ago—of still having to define most of its disorders by their
syndromes. Because of the consequent need to distinguish one disorder from
another by differences between syndromes, the validity of diagnostic concepts
remains an important issue in psychiatry. In this situation, to search for
boundaries between syndromes and to use zones of rarity as criteria of validity
is, we contend, the best strategy available to us.”
Here Kendell and Jablensky highlight a fundamental problem in the search
for accuracy: the notion that many of our current diagnoses are labels of
convenience, no more distinct from each other than short stature and
normal height. Like many conditions based largely on phenotype alone,
height has a Gaussian (normal) distribution, and normal height overlaps with
the many diseases and disorders that cause growth retardation. The result may
be two distributions with significant overlap and no clear point of rarity
(see Fig. 1.1).
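
The idea of a point of rarity can be made concrete with a small numerical sketch. It assumes two groups whose measurements are normally distributed with the same standard deviation, and asks whether the combined distribution dips anywhere between the two group means. The function names and the height figures are illustrative assumptions, not values from the text.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def mixture_pdf(x: float, mean_a: float, mean_b: float, sd: float,
                weight_a: float = 0.5) -> float:
    """Density of the combined population (two groups, common SD)."""
    return (weight_a * normal_pdf(x, mean_a, sd)
            + (1 - weight_a) * normal_pdf(x, mean_b, sd))


def has_point_of_rarity(mean_a: float, mean_b: float, sd: float,
                        weight_a: float = 0.5, steps: int = 2000) -> bool:
    """True if the combined density dips somewhere between the two means."""
    low, high = sorted((mean_a, mean_b))
    xs = [low + (high - low) * i / steps for i in range(steps + 1)]
    ys = [mixture_pdf(x, mean_a, mean_b, sd, weight_a) for x in xs]
    # A dip exists if some interior point is lower than both end points.
    return min(ys[1:-1]) < min(ys[0], ys[-1])


# Hypothetical heights in cm; the means and SD are made up for illustration.
print(has_point_of_rarity(170, 160, sd=7))  # groups about 1.4 SD apart
print(has_point_of_rarity(170, 140, sd=7))  # groups about 4.3 SD apart
```

For two equally sized groups with a common standard deviation, the combined curve only develops a dip once the means are more than two standard deviations apart, so closely overlapping syndromes would not be expected to show a zone of rarity on a single measure.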

[Figure 1.4: chart with two series, kappa and time required, plotted for three
categories: Routine Diagnoses, Diagnoses Based on SCID, and Diagnoses Based on
SCID Plus Medical Records. Axes: agreement with gold standard on specific
diagnoses (kappa, 0.00 to 1.00) and time required (minutes, 0 to 160).]

Figure 1.4. Time required to produce accurate diagnoses. Time requirement and reliability
of routine diagnoses, SCID-based diagnoses, and diagnoses based on the SCID plus medical
records for 200 outpatients with severe mental illness. Reprinted from Basco RM, Bostic JQ,
Davies D, Rush AJ, Witte B, Hendrickse W, Barnett V. Methods to improve diagnostic
accuracy in a community mental health setting. Am J Psychiatry. 2000 Oct;157(10):1599–605
with permission.

DSM-III and ICD-8 were landmark publications that allowed us to
scrutinize the mysterious process of psychiatric diagnosis. Each new release
brings an incremental improvement. Although neither DSM nor ICD has
been universally accepted (in one study, clinicians used DSM criteria in
23% of visits in which a psychosocial problem was recognized),81 they have
had a beneficial influence.82 As these checklist-based diagnostic systems
with rule-based criteria are field-tested, it becomes apparent that many of
the suggested symptoms, combinations, and associated features are not
particularly useful diagnostically. However, this could be seen as an
advantage, as previously no attempt was made at all to change mainstream
psychiatric diagnoses. Finding out what doesn’t work may be as valuable
as finding out what does. Beyond the checklist approach lie assisted
interviews, which have a good evidence base for reliability, validity, or both.
What is missing are formal implementation trials in which one group of
clinicians is randomized to assisted interviews and one group to diagnosis
as usual, to discover whether clinical outcome actually improves. Unfortunately,
most of the assisted methods so far developed are too long for routine
clinical use. Indeed, a rule of thumb in this field is that the more accurate
the diagnostic method, the longer the time required, and this effect
may not be linear (Fig. 1.4). A key challenge for the future, therefore, is to
develop reliable diagnostic methods of sufficient brevity that they become
routinely accepted in busy clinical practice, including primary and
secondary care.

References
1. Jablensky A. Categories, dimensions and prototypes: critical issues for psychiatric
classification. Psychopathology. 2005;38:201–205.
2. van Praag HM. Can stress cause depression? Prog Neuropsychopharmacol Biol Psych.
2004;28(5):891–907.
3. Parker G. Classifying depression: should paradigms lost be regained? Am J Psychiatry.
2000;157:1195–1203.
4. Sneath PHA. Some thoughts on bacterial classification. J Gen Microbiol.
1957;17:184–200.
5. Cloninger CR. A new conceptual paradigm from genetics and psychobiology for the
science of mental health. Aust N Z J Psychiatry. 1999;33:174–186.
6. Lyness JM, Kim JH, Tang W, et al. The clinical significance of subsyndromal
depression in older primary care patients. Am J Geriatr Psychiatry. 2007;15:214–223.
7. Angst J, Merikangas KR. Multi-dimensional criteria for the diagnosis of depression.
J Affect Disord. 2001;62:7–15.
8. The ICD-10 classification of mental and behavioral disorders: diagnostic criteria for
research, 10th edition. Geneva: World Health Organization, 1993.
9. Kendell R, Jablensky A. Distinguishing between the validity and utility of psychiatric
diagnoses. Am J Psychiatry. 2003;160:4–12.
10. Aboraya A, Compton III W. Biological markers and external validators in psychiatry:
progress report on the validity of psychiatric diagnoses. eCommunity Int J Mental
Health Addiction. Nov. 7, 2004 [online].
11. Tierney W, Fitzgerald J, McHenry R, et al. Physicians’ estimates of the probability of
myocardial-infarction in emergency room patients with chest pain. Medical Decision
Making. 1986;6(1):12–17.
12. Chun AA, McGee SR. Bedside diagnosis of coronary artery disease: A systematic
review. Am J Med. 2004;117(5):334–343.
13. Pull CB, Pull MC, Pichot P. Integrated lists of taxonomic evaluation criteria: LICET-S
and LICET-D. Acta Psychiatr Belg. 1984;84(4):297–309.
14. Mihalopoulos C, McGorry P, Roberts S, et al. The procedural validity of retrospective
case note diagnosis. Aust N Z J Psychiatry. 2000;34(1):154–159.
15. Janca A, Hiller W. ICD-10 checklists—A tool for clinicians’ use of the ICD-10
classification of mental and behavioral disorders. Comprehensive Psychiatry.
1996;37(3):180–187.
16. Hamilton JD. Do we underutilise actuarial judgement and decision analysis? Evidence-
Based Mental Health. 2001;4:102–103.
17. Holzer III CE, Nguyen HT, Hirschfeld RMA. Reliability of the diagnosis in mood
disorders. Psychiatric Clin North Am. 1996;19(1):73–84.
18. Manual of the international classification of diseases, injuries and causes of death, 6th
ed. Geneva: World Health Organization, 1948.
19. Diagnostic and statistical manual of mental disorders. Washington, DC: American
Psychiatric Publishing, 1952.
20. Erkinjuntti T, Ostbye T, Steenhuis R, et al. The effect of different diagnostic criteria on
the prevalence of dementia. N Engl J Med. 1997;337(23):1667–1674.
21. Furukawa TA, Anraku K, Hiroe T, et al. A polydiagnostic study of depressive disorders
according to DSM-IV and 23 classical diagnostic systems. Psychiatry Clin Neurosci.
1999;53(3):387.
22. Zimmerman M, Chelminski I, McGlinchey JB, et al. Diagnosing major depressive
disorder VI: Performance of an objective test as a diagnostic criterion. J Nerv Ment Dis.
2006;194:565–569.
23. Diagnostic and statistical manual of mental disorders, 4th ed. Washington, DC:
American Psychiatric Publishing, 1994.
24. Philipp M, Maier W, Delmo CD. The concept of major depression. I. Descriptive
comparison of six competing operational definitions including ICD-10 and DSM-
III-R. Eur Arch Psychiatry Clin Neurosci. 1991;240(4–5):258–265.
25. Andrews G, Slade T, Peters L, et al. Classification in psychiatry: ICD-10 versus
DSM-IV. Br J Psychiatry. 1999;174(1):3–5.
26. Ravelli A, Bijl RV, Van Brink WD. Consequences of the use of different classification
systems: A comparison of the DSM-III-R and the ICD10 for depression. Int J Methods
Psychiatric Res. 1999;8(4):192–203.
27. Philipp M, Delmo CD, Buller R, et al. Differentiation between major and minor
depression. Psychopharmacology. 1992;106:S75–S78.
28. Kessler RC, Zhao S, Blazer DG, et al. Prevalence, correlates, and course of minor
depression and major depression in the National Comorbidity Survey. J Affect Disord.
1997;45:19–30.
29. Kendler KS, Gardner CO Jr. Boundaries of major depression: an evaluation of DSM-IV
criteria. Am J Psychiatry. 1998;155:172–177.
30. Spitzer RL, Wakefield JC. DSM-IV diagnostic criterion for clinical significance: does it
help solve the false positives problem? Am J Psychiatry. 1999;156:1856–1864.
31. Beals J, Novins DK, Spicer P, et al., the AI-SUPERPFP Team. Challenges in
operationalizing the DSM-IV clinical significance criterion. Arch Gen Psychiatry.
2004;61(12):1197–1207.
32. Zimmerman M, McGlinchey JB, Young D, et al. Diagnosing major depressive disorder,
I. A psychometric evaluation of the DSM-IV symptom criteria. J Nerv Ment Dis.
2006;194:158–163.
33. Zimmerman M, McGlinchey JB, Young D, et al. Diagnosing major depressive disorder,
IV. Relationship between number of symptoms and the diagnosis of disorder. J Nerv
Ment Dis. 2006;194:450–453.
34. Zimmerman M, Chelminski I, McGlinchey JB, et al. Diagnosing major depressive
disorder, VI. Performance of an objective test as a diagnostic criterion. J Nerv Ment Dis.
2006;194:565–569.
35. Lundberg GD. Low-tech autopsies in the era of high-tech medicine: continued value for
quality assurance and patient safety. JAMA. 1998;280:1273–1274.
36. Mayeux R, Saunders AM, Shea S, et al. Utility of the apolipoprotein E genotype in the
diagnosis of Alzheimer’s disease. Alzheimer’s Disease Centers Consortium on
Apolipoprotein E and Alzheimer’s Disease. N Engl J Med. 1998;338(8):506–511.
37. Matarazzo JD. The reliability of psychiatric and psychological diagnosis. Clin Psychol
Rev. 1983;3:103–145.
38. Tiemens BG, VonKorff M, Lin EH. Diagnosis of depression by primary care physicians
versus a structured diagnostic interview. Understanding discordance. Gen Hosp
Psychiatry. 1999;21(2):87–96.
39. Gilbody SM, House AO, Sheldon TA. Psychiatrists in the UK do not use outcomes
measures: National survey. Br J Psychiatry. 2002;80:101–103.
40. Spitzer RL. Psychiatric diagnosis: Are clinicians still necessary? Comprehensive
Psychiatry. 1983;24:399–411.
41. Antony MM, Barlow DH. Structured and semistructured diagnostic interviews. In
Barlow DH, ed. Handbook of assessment and treatment planning for psychological
disorders. New York: Guilford, 2002:3–37.
42. Leckman JF, Sholomskas D, Thompson WD, et al. Best estimate of lifetime psychiatric
diagnoses. Arch Gen Psychiatry. 1982;39:879–883.
43. Kosten TA, Rounsaville BJ. Sensitivity of psychiatric diagnosis based on the best
estimate procedure. Am J Psychiatry. 1992;149:1225–1227.
44. Taiminen T, Ranta K, Karlsson H, et al. Comparison of clinical and best-estimate
research DSM-IV diagnoses in a Finnish sample of first-admission psychosis and
severe affective disorder. Nord J Psychiatry. 2001;55(2):107–111.
45. Spitzer RL, Kroenke K, Williams JBW, et al. Validation and utility of a self-report
version of PRIME-MD: the PHQ primary care study. JAMA. 1999;282:1737–1744.
46. Carballeira Y, Dumont P, Borgacci S, et al. Criterion validity of the French version of
Patient Health Questionnaire (PHQ) in a hospital department of internal medicine.
Psychol Psychotherapy Theory Res Pract. 2007;80:69–77.
47. Moller HJ. Rating depressed patients: observer- vs self-assessment. Eur Psychiatry.
2000;15(3):160–172.
48. Becker J, Kocalevent RD, Rose M, et al. Standardized diagnosing: Computer-assisted
(CIDI) diagnoses compared to clinically-judged diagnoses in a psychosomatic setting.
Psychotherapie Psychosomatik Medizinische Psychologie. 2006;56(1):5–14.
49. Helzer JE, Robins LN, McEvoy LT, et al. A comparison of clinical and diagnostic
interview schedule diagnoses. Physician reexamination of lay-interviewed cases in the
general population. Arch Gen Psychiatry. 1985;42:657–666.
50. Anthony JC, Folstein M, Romanoski AJ, et al. Comparison of the Lay Diagnostic
Interview Schedule and a standardized psychiatric diagnosis. Experience in eastern
Baltimore. Arch Gen Psychiatry. 1985;42(7):667–675.
51. Steiner J, Tebes J, Sledge W, et al. A comparison of the structured clinical interview for
DSM-III-R and clinical diagnoses. J Nerv Ment Dis. 1995;183(6):365–369.
52. Shear MK, Greeno C, Kang J, et al. Diagnosis of nonpsychotic patients in community
clinics. Am J Psychiatry. 2000;157:581–587.
53. Miller PR, Dasher R, Collins R, et al. Inpatient diagnostic assessments: 1. Accuracy of
structured versus unstructured interviews. Psychiatry Res. 2001;105:265–272.
54. Miller PR. Inpatient diagnostic assessments: 2. Interrater reliability and outcomes of
structured vs. unstructured interviews. Psychiatry Res. 2001;105:265–271.
55. Kashner TM, Rush AJ, Suris A, et al. Impact of structured clinical interviews on
physicians’ practices in community mental health settings. Psychiatr Serv.
2003;54:712–718.
56. Basco RM, Bostic JQ, Davies D, et al. Methods to improve diagnostic accuracy in a
community mental health setting. Am J Psychiatry. 2000;157(10):1599–1605.
57. Riskind JH, Beck AT, Berchick RJ, et al. Reliability of DSM-III diagnoses for major
depression and generalized anxiety disorder using the Structured Clinical Interview for
DSM-III. Arch Gen Psychiatry. 1987;44:817–820.
58. Williams JBW, Gibbon M, First MB, et al. The Structured Clinical Interview for
DSM-III-R (SCID), II: multisite test–retest reliability. Arch Gen Psychiatry.
1992;49:630–636.
59. Robins L. National Institute of Mental Health diagnostic interview schedule—its
history, characteristics, and validity. Arch General Psychiatry. 1981;38:381.
60. Rogers R. Handbook of diagnostic and structured interviewing. New York: Guilford
Publications, 2001.
61. Gibson C. Semi-structured and unstructured interviewing: a comparison of
methodologies in research with patients following discharge from an acute
psychiatric hospital. J Psychiatric Mental Health Nursing. 1998;5(6):469–477.
62. Robins LN. Psychiat Disorders A: 1991.
63. Kessler RC, McGonagle KA, Zhao S, et al. Lifetime and 12-month prevalence of DSM-
III-R psychiatric disorders in the United States—results from the National Comorbidity
Survey. Arch Gen Psychiatry. 1994;51:8.
64. Brugha TS, Bebbington PE, Jenkins R. A difference that matters: comparisons of
structured and semi-structured psychiatric diagnostic interviews in the general
population. Psychol Med. 1999;29:1013–1020.
65. Spitzer RL, Williams JB, Gibbon M, et al. The Structured Clinical Interview for
DSM-III-R (SCID). I: History, rationale, and description. Arch Gen Psychiatry.
1992;49(8):624–629.
66. Williams JB, Gibbon M, First MB, et al. The Structured Clinical Interview for
DSM-III-R (SCID), II: multisite test–retest reliability. Arch Gen Psychiatry.
1992;49:630–636.
67. Haro JM, Arbabzadeh-Bouchez S, Brugha TS, et al. Concordance of the Composite
International Diagnostic Interview Version 3.0 (CIDI 3.0) with standardized clinical
assessments in the WHO World Mental Health Surveys. Int J Methods Psychiatric Res.
2006;15(4):167–180.
68. Kashner TM, Rush AJ, Suris A, et al. Impact of structural clinical interviews on physicians’
practices in community mental health settings. Psychiatric Services. 2003;54(5):712–718.
69. Philipp M, Maier W. The polydiagnostic interview: a structured interview for
polydiagnostic classification of psychiatric patients. Psychopathology. 1986;19:175–185.
70. Wing JK, Babor T, Brugha T, et al. SCAN. Schedules for Clinical Assessment in
Neuropsychiatry. Arch Gen Psychiatry. 1990;47(6):589–593.
71. Murphy JM, Monson RR, Laird NM, et al. A comparison of diagnostic interviews for
depression in the Stirling County Study Challenges for Psychiatric Epidemiology. Arch
Gen Psychiatry. 2000;57:230–236.
72. Wittchen HU, Robins LN, Cottler LB, et al. Cross-cultural feasibility, reliability and
sources of variance of the Composite International Diagnostic Interview (CIDI). The
multicentre WHO/ADAMHA field trials. Br J Psychiatry. 1991;159:645–658.
73. Wittchen HU. Reliability and validity studies of the WHO-Composite International
Diagnostic Interview (CIDI): a critical review. J Psychiatr Res. 1994;28:57–84.
74. Andrews G, Peters L. The psychometric properties of the Composite International
Diagnostic Interview. Soc Psychiatry Psychiatr Epidemiol. 1998;33:80–88.
75. Janca A, Robins LN, Bucholz KK, et al. Comparison of Composite International
Diagnostic Interview and clinical DSM-III-R criteria checklist diagnoses. Acta
Psychiatr Scand. 1992;85:440–443.
76. Booth BM, Kirchner JE, Hamilton G, et al. Diagnosing depression in the medically ill:
validity of a lay-administered structured diagnostic interview. J Psychiatric Res.
1998;32(6):353–360.
77. Sheehan DV, Lecrubier Y, Sheehan KH, et al. The Mini-International Neuropsychiatric
Interview (M.I.N.I.): the development and validation of a structured diagnostic
psychiatric interview for DSM-IV and ICD-10. J Clin Psychiatry. 1998;59(Suppl
20):22–57.
78. Kendell RE. Clinical validity. Psychol Med. 1989;19:45–55.
79. Zimmerman M, Mattia JI. Psychiatric diagnosis in clinical practice: is comorbidity
being missed? Comprehensive Psychiatry. 1999;40:182–191.
80. Miller PR. Inpatient diagnostic assessments: 3. Causes and effects of diagnostic
imprecision. Psychiatry Res. 2002;111:191–197.
81. Gardner W, Kelleher KJ, Pajer KA, et al. Primary care clinicians’ use of standardized
psychiatric diagnoses. Child Care Health Development. 2004;30(5):401–412.
82. Toshiyuki S, Makoto T. Is DSM widely accepted by Japanese clinicians? Psychiatry
Clin Neurosci. 2001;55:437–450.
2
OVERVIEW OF DEPRESSION SCALES AND TOOLS

Alex J. Mitchell

1. Background
2. The Classic Severity Scales (1960–1980)
3. The New Severity Scales (1981–2008)
4. The Future of Screening Scales

Context
There have been a large number of depression tools published for the purposes
of detecting depression or rating its severity. Choosing between them is
difficult without adequate information on their validity, reliability, and
acceptability. Recently, ever-shorter-version mood measures have been
released. Is a shorter scale a better scale? It is important to study each
method against our best standard and ideally compare scales head to head
to judge the optimal scale for each situation.

1. Background
Clinicians and researchers have developed a bewildering number of tools for the
assessment of depression. These are most often questionnaires designed to help
elicit symptoms of depression for the purpose of screening, diagnosis, and
monitoring progress (Textbox 2.1). Although we often use the terms screening,
diagnosis, and case-finding interchangeably, in an epidemiologic sense screening
refers to the attempted detection of disorder in those who have not sought testing or
do not suspect they have a particular condition. A screening test is not
usually intended to be diagnostic, in that those with suspicious findings may be
referred for more definitive examination. The latter is perhaps better known as
case-finding. This means a screening tool can favor negative predictive value
(NPV) over positive predictive value (PPV) (see Chapter 5). In both screening and
case-finding the test may be applied “routinely” to all cases, or selectively to
those thought to be at high risk. A screening test applied to many individuals
should be as simple as possible to retain high uptake, and positive results must be
paired with an acceptable next step.1 A case-finding measure may be more
involved but should still consider acceptability. Adoption of a test in clinical
practice probably depends more on acceptability than accuracy.2

Textbox 2.1. Definitions of Screening and Related Procedures

Screening
“The systematic application of a test or inquiry, to identify individuals at
sufficient risk of a specific disorder to warrant further actions among those
who have not sought medical help for that disorder”
Case-Finding
“The selected application of a test or inquiry, to identify those individuals
with a suspected disorder and exclude those without a disorder, usually in a
population who have sought medical help”
Targeted (High-Risk) Case-Finding
“The highly selected application of a test or inquiry, to identify individuals at
high risk of a specific disorder by virtue of known risk factors”
Severity Assessment
“The application of a test or inquiry, to quantify the severity of a specific
disorder”
Adapted from Department of Health. Annual report of the National Screening Committee.
London: DoH, 1997.

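
The trade-off between NPV and PPV mentioned in the Background can be illustrated with a short calculation. It shows how the same test yields very different predictive values as the prevalence of depression changes. The function name and the sensitivity, specificity, and prevalence figures are assumptions chosen for illustration; they do not describe any particular scale.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # P(disorder | positive result)
    npv = true_neg / (true_neg + false_neg)  # P(no disorder | negative result)
    return ppv, npv


# Hypothetical test with 80% sensitivity and 80% specificity.
for prevalence in (0.05, 0.10, 0.30):
    ppv, npv = predictive_values(0.80, 0.80, prevalence)
    print(f"prevalence={prevalence:.0%}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

At low prevalence, as in unselected screening, a negative result from this hypothetical test is highly reassuring while most positive results are false alarms, which is why positive screens need an acceptable next step.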

Historical Aspects
During the past five decades there has been a considerable effort to improve the
methods used to detect and quantify depression (Textbox 2.2).3–6 Some scales,
such as the Cronholm-Ottosson Depression Scale, have fallen into obscurity, while
others, such as the Hamilton Depression Rating Scale and the Beck Depression
Inventory, have each been cited over 10,000 times. Given that there are so many
similar depression scales, it is not surprising that clinicians have trouble choosing
between them. The American College of Psychology Consultants lists 213
psychologically oriented scales with variable validation and reliability data,7
simplified here to 50 depression scales (Textbox 2.3). Fortunately, this may be distilled
further to ten key depression instruments, five created before 1980 and five more
modern inventions (Tables 2.1 and 2.2). The classic scales are the Hamilton Depression
Rating Scale (HAM-D), the Montgomery-Åsberg Depression Rating Scale
(MADRS), the Beck Depression Inventory (BDI), the Zung Self-Rating
Depression Scale (SDS), and the Center for Epidemiologic Studies Depression
Scale (CES-D). The five key scales developed since 1980 are the Hospital Anxiety
and Depression Scale (HADS), the Geriatric Depression Scale (GDS), the Edinburgh
Postnatal Depression Scale (EPDS), the MOS 8-Item Depression Screener
(Burnam Screen), and the Patient Health Questionnaire (PHQ-9). In addition, I
have included the less-well-known Major Depression Inventory (MDI) as it has a
special role, facilitating a diagnosis based on both DSM-IV and ICD-10 criteria.
Tools examining more general psychopathology are purposely omitted from this
chapter even if they include a rating of depression. These include some seminal
scales such as the General Health Questionnaire (GHQ) and the Hopkins
Symptom Checklist (SCL) family (SCL-90, SCL-25, and SCL-8).8–10 To keep
this chapter manageable I will also not discuss reliability and validity data in detail,
but further information can be found in relevant chapters by setting. A comparison
of these key scales is shown in Appendix 1.

Textbox 2.2. Development of Major Depression Scales

1952 DSM-I published
1960 Hamilton Depression Scale (HAM-D)
1961 Beck Depression Inventory (BDI)
1965 Zung Self-Rating Depression Scale (SDS)
1968 DSM-II published
1977 Center for Epidemiologic Studies Depression Scale (CES-D)
1977 ICD-9 published
1979 Montgomery-Åsberg Depression Rating Scale (MADRS)
1980 DSM-III published
1980 The Bech–Rafaelsen Melancholia Scale (MES)
1982 Geriatric Depression Scale (GDS-30)
1983 Hospital Anxiety and Depression Scale (HADS)
1986 Abbreviated version of Geriatric Depression Scale (GDS-15)
1987 DSM-III-R published
1987 Edinburgh Postnatal Depression Scale (EPDS)
1987 Inventory to Diagnose Depression (IDD)
1988 MOS-8 Burnam Screen
1992 ICD-10 published
1994 DSM-IV published
1996 Revision of BDI to BDI-II
2001 Patient Health Questionnaire (PHQ)
2001 Major Depression Inventory (MDI)
DSM = Diagnostic and Statistical Manual of Mental Disorders;
ICD = International Classification of Diseases

Textbox 2.3. Listing of Depression Scales


Generic Scales
Beck Depression Inventory™-Second Edition (BDI-II)™
Brief Psychiatric Rating Scale (BPRS)
Brief Symptom Inventory (BSI)
Burns Depression Checklist (BDC)
Carroll Depression Scales-Revised (CDS-R)
Center for Epidemiological Studies Depression Scale (CES-D)
Depression Anxiety Stress Scales (DASS)
Depression Questionnaire (DQ)
Depression 30 Scale (D-30)
Diagnostic Interview Schedule (DIS-IV)
Diagnostic Inventory for Depression (DID)
Hamilton Depression Inventory (HDI)
Hamilton Rating Scale for Depression (HRSD)
Hopelessness Depression Symptom Questionnaire (HDSQ)
Hospital Anxiety and Depression Scale (HADS)
Inventory to Diagnose Depression (IDD)
Inventory of Depressive Symptomatology (IDS)
IPAT Depression Scale
Manual for the Diagnosis of Major Depression (MDMD)
Minnesota Multiphasic Personality Inventory 2 (MMPI-2) Depression Scale
Montgomery–Åsberg Depression Rating Scale (MADRS)
MOS 8-Item Depression Screener
Multiple Affect Adjective Checklist-Revised (MAACL-R)
Multiscore Depression Inventory for Adolescents and Adults (MDI)
Newcastle Scales
Positive and Negative Affect Scales (PANAS)
Primary Care Evaluation of Mental Disorders (PRIME-MD)
Profile of Mood States (POMS)
Raskin Three-Area Severity of Depression Scale
Revised Hamilton-Rating Scale for Depression (RHRSD)
Reynolds Depression Screening Inventory (RDSI)
Rimon’s Brief Depression Scale (RBDS)
State Trait-Depression Adjective Check List (ST-DACL)
Symptom Checklist-90-Revised (SCL-90-R)
Zung Self-Rating Depression Scale (Zung SDS)

Special Population Scales
Aphasic Depression Rating Scale (ADRS)
Calgary Depression Scale for Schizophrenia (CDSS)
Children’s Depression Inventory (CDI)
The Children’s Depression Index (CDI)
Children’s Depression Rating Scale-Revised (CDRS-R)
Cornell Scale for Depression in Dementia (Cornell Scale)
Depression and Anxiety in Youth Scale (DAYS)
Depression Intensity Scale Circles (DISCs)
Depression Rating Scale (DRS)
Geriatric Depression Scale (GDS)
Kiddie-Schedule for Affective Disorders and Schizophrenia for School-Age
Children-Present and Lifetime Version (K-SADS-PL)
Medical-Based Emotional Distress Scale (MEDS)
Multiscore Depression Inventory for Children (MDI-C)
Postpartum Depression Interview Schedule (PDIS)
Psychopathology Inventory for Mentally Retarded Adults (PIMRA)
Reynolds Adolescent Depression Scale (RADS)
Reynolds Child Depression Scale (RCDS)
Signs of Depression Scale (SDSS)
Stroke Aphasic Depression Questionnaire (SADQ)
Visual Analog Mood Scales (VAMS)
Youth Depression Adjective Checklist (Y-DACL)
Adapted from Nezu AM, Ronan GF, Meadows EA, eds. Practitioner’s guide to empirically-based
measures of depression. Springer, 2007.
2 OVERVIEW OF DEPRESSION SCALES AND TOOLS 33

The Limitations of Severity Scales


Most mood scales have only an approximate relationship to the criteria of ICD
and DSM (see Textbox 2.2). None adhere strictly to these algorithmic criteria
(including duration and function), and as such they do not produce operational
diagnoses. Several early scales were developed to measure severity (see
Sensitivity to Change below) during treatment.11 Yet the value of a scale
does not necessarily correspond to its original or intended use—for example,
the EPDS may not be the optimal choice in perinatal settings and yet may be
valuable elsewhere. The evaluation and refinement of existing scales is dis-
cussed in Chapter 4. It remains a significant limitation that only a small number
of well-powered studies have compared the value of multiple scales head to
head.12,13 From these comparative studies, most suggest that severity scales
provide somewhat distinct estimates of depression diagnosis and severity (this
has been confirmed by Rasch analysis).14–16 For example, although all mea-
sure low mood, not all measure anhedonia, somatic symptoms, anxiety, sui-
cidal ideation, and well-being.
Depression scales are predominantly symptom counts over a narrowly
defined period. They do not tend to measure chronicity or effect on daily
function. Thus, they should not be considered a precise measure of burden of
depression. Neither do they measure met or unmet needs or the desire for help.
One fundamental issue is that it is not clear which of many possible symptoms
of depression are most important for diagnosis (see Chapter 1). For example,
some symptoms appear more likely to be associated with greater severity and
pervasiveness of depression.17 If some symptoms are more important than
others, should the scale weight items differently? This has been tried, but
without good validation and at a cost of significant scale complexity.
A second unresolved issue is whether depression differs significantly by
setting and by comorbid disease. If one presupposes that there is one syndrome
of depression manifest in all situations (eg, primary care, specialist care) and
all medical conditions, then the role of any scale is simply to best identify and
quantify these core symptoms. Although the ‘‘one size fits all’’ approach
sounds unlikely, it is essentially the approach taken by DSM-IV and ICD-10.
These do not attempt to define a syndrome of, say, ‘‘post-stroke depression’’ as
opposed to uncomplicated depression in primary care. A number of very
specific depression scales have been proposed to elicit special types of mood
disorders. Examples are listed in Textbox 2.3 and include the Depression Scale
in Schizophrenia (DEPS) scale,18 the Cornell Scale for the Assessment of
Depression in Dementia (CSDD),19 the post-stroke depression scale,20 the
Stroke Aphasic Depression Questionnaire (SADQ),21 and the Aphasic
Depression Rating Scale.22 The scientific basis for and against having special
scales for medical settings is discussed in Chapters 10 and 11. This usually
revolves around the issue of whether to keep or omit somatic items (see
Appendix 2). A final limitation is the temptation to overrely on scales to
improve quality of care. Numerous studies have explored this issue, which
is discussed by Gilbody in Chapter 7.

Patient-Rated Versus Clinician-Rated Scales


In the case of a mental illness where there is no foolproof gold standard, it is by
no means clear whether patient-rated or clinician-rated measures are more
useful.23 A list of such scales is shown in Textbox 2.4. Neither patient (self)-rated

Textbox 2.4. Major Clinician vs. Self-Report Scales

Clinician-Rated Protocols
Hamilton Rating Scale for Depression
Inventory of Depressive Symptomatology (IDS-C)
Manual for the Diagnosis of Major Depression
Montgomery–Åsberg Depression Rating Scale
Newcastle Scales
Raskin Three-Area Scale
Rimon’s Brief Depression Scale

Self-Report Inventories
Beck Depression Inventory-Second Edition
Carroll Depression Scales-Revised
Center for Epidemiological Studies Depression Scale
Diagnostic Inventory for Depression
Hamilton Depression Inventory
Hopelessness Depression Symptom Questionnaire
Inventory to Diagnose Depression
Inventory of Depressive Symptomatology (IDS-SR)
IPAT Depression Scale
Minnesota Multiphasic Personality Inventory 2 Depression Scale
MOS 8-Item Depression Screener
Multiscore Depression Inventory for Adolescents and Adults
Positive and Negative Affect Scales
Revised Hamilton Rating Scale for Depression: Self-Report
Reynolds Depression Screening Inventory
State Trait-Depression Adjective Check Lists
Zung Self-Rating Depression Scale

Adapted from Nezu AM, Ronan GF, Meadows EA, eds. Practitioner’s guide to empirically-
based measures of depression. Springer, 2007.
scales nor clinician-rated scales are inherently more sensitive to change nor
more accurate.24,25 A self-rated scale has certain benefits over interviewer-
rated scales and clinical interviews in large population studies. A self-rated
scale takes less time and does not require trained personnel. The adminis-
tration and scoring process is probably more standardized for self-rated
scales.26 Clinician-rated scales can directly augment a clinical interview. If
training is a requirement, then the skills of the clinician may also improve.
The major advantage of interviewer-rated scales is that the experience of the
interviewer comes into play. Faravelli and coworkers (1986)27 compared
the distributions of three doctor-rated scales and three self-rated scales in a
series of 100 depressed patients and noted that doctor-rated scales tend to be
asymmetric toward the left, while self-rated scales tend to be asymmetric
toward the right. This may result from the tendency of patients to judge their
own condition as more severe than average, while doctors tend to rate
severity as less than average. On the other hand, patients can underreport
symptoms in some situations.28 Our advice is to choose the type of scale
most suited to the purpose at hand.

Sensitivity to Change
In psychiatry the concept of sensitivity to change of mood was first used in
psychometric research during the 1970s.29,30 Yet sensitivity to change is a
phrase that has been variably defined in the literature and is poorly understood.
Most consider sensitivity to change to be the ability of a severity scale to detect
small changes in outcomes over time with repeated assessment. A more
accurate description of sensitivity to change is the proportion of those who
actually changed according to a gold standard (eg, responders) that were
correctly identified by the instrument under study (Fig. 2.1). One should also
consider specificity to change as a useful concept. This is the proportion of
those who actually did not change (eg, nonresponders) who are correctly
identified as such by the instrument. That said, no group has yet documented
specificity to change.
The HAM-D has been the main comparator in most sensitivity to change
papers.31 The HAM-D, MADRS, BDI, and HADS have all been compared
head to head, but results do not demonstrate any consistent superiority of
one scale over another. Vermeersch and associates (2004)32 describe five
factors that may influence the sensitivity of a scale: inclusion of irrelevant
items, categorical items, items not conducive to detect change, items asses-
sing traits, and items susceptible to floor and ceiling effects.
Fundamentally, scales with many items are more likely to be sensitive to
subtle changes.
                          Gold Standard     Gold Standard
                          Change            No Change
Instrument Change         A                 B                 PPV = A/(A + B)
Instrument No Change      C                 D                 NPV = D/(C + D)
Total                     Se to Change      Sp to Change
                          = A/(A + C)       = D/(B + D)

Figure 2.1. Accuracy of change in 2 × 2 format.
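
The four ratios in Figure 2.1 can be sketched in a few lines of Python. This is an illustrative sketch only: the class name `ChangeTable` and the cell counts are hypothetical and are not taken from any study cited here; the cell letters A to D follow the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeTable:
    """Cells of the 2 x 2 table in Figure 2.1 (patient counts)."""
    a: int  # instrument says change,    gold standard says change
    b: int  # instrument says change,    gold standard says no change
    c: int  # instrument says no change, gold standard says change
    d: int  # instrument says no change, gold standard says no change

    def sensitivity_to_change(self) -> float:
        # Proportion of true changers (eg, responders) the instrument detects.
        return self.a / (self.a + self.c)

    def specificity_to_change(self) -> float:
        # Proportion of true non-changers the instrument identifies as such.
        return self.d / (self.b + self.d)

    def ppv(self) -> float:
        return self.a / (self.a + self.b)

    def npv(self) -> float:
        return self.d / (self.c + self.d)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
table = ChangeTable(a=40, b=10, c=20, d=30)
print(f"Se to change: {table.sensitivity_to_change():.2f}")
print(f"Sp to change: {table.specificity_to_change():.2f}")
print(f"PPV: {table.ppv():.2f}  NPV: {table.npv():.2f}")
```

With these made-up counts, 40 of the 60 true changers are detected and 30 of the 40 true non-changers are correctly left unflagged.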

2. The Classic Severity Scales (1960–1980)


Hamilton Rating Scale for Depression (HAM-D)33
In 1953 Max Hamilton moved to Leeds, where he developed one of the best-
known scales in psychiatry.34 The original HAM-D was developed to quantify
severity after an interview had established a diagnosis of depression. Despite
its age the HAM-D remains the most commonly used scale in treatment
studies, helped by the fact that it is in the public domain.34 Indeed, it may
have been a victim of its own success, as independent groups have produced as
many as 20 conflicting variations.35
The HAM-D is rather unusual in that it is designed to be administered by a
trained clinician on the basis of the clinical condition at the time of the inter-
view. It requires a rather long semi-structured interview, taking 15 to 20
minutes. As such, it is probably not a good choice for screening in busy clinical
settings. It was developed before DSM criteria were established for depression
and differs significantly from the DSM approach, assessing four of the nine
DSM-IV criteria. It may favor somatic presentations, as eight items are
related to six somatic symptoms: insomnia, psychomotor retardation, loss of
appetite, loss of energy, loss of weight, and loss of libido. There have
been other criticisms, such as lack of a single unifying structure, differential
item weighting, and limited interrater reliability (although this can be
improved).36,37 In the past 5 years several shortened versions of the
HAM-D have appeared, including a seven-item version and a six-item
version.38–40 Using Rasch analysis, Bech and coworkers (1981)41,42 confirmed
that six items associated with unidimensionality could be combined. These
were depressed mood, guilt, work/interests, psychomotor retardation, anxiety
psychic, and general somatic symptoms. Several versions provide standardized
explicit scoring conventions and/or structured interview guidance.43

Montgomery-Åsberg Depression Rating Scale (MADRS)44


Montgomery and Åsberg45 published this 10-item scale in 1979 following
earlier development of the Comprehensive Psychopathological Rating
Scale (CPRS).46 Ratings of patients on the 65-item CPRS were used to
identify the 17 most common symptoms in depression, which were field-
tested in four antidepressant trials and hence refined to 10 items suggested
to show the largest changes with treatment. However, it is a mistake to
assume the MADRS is necessarily most sensitive to change (see above);
indeed, a meta-analysis showed that the HAM-D has superior sensitivity to
change.47
Like the HAM-D, this is a clinician-rated scale designed for a trained
interviewer, although a self-rating form was later developed. It covers the
clinical condition at the time of the interview and does not specify a time-
frame during which the patient should be rated. The 10-item checklist
actually consists of 1 observational item and 9 question items that require
about 15 minutes of additional interview time. The items covered are
apparent sadness, reported sadness, inner tension, reduced sleep, reduced
appetite, concentration difficulties, lassitude, inability to feel, pessimistic
thoughts, and suicidal thoughts. These items also cover all the DSM-IV
criteria for major depression, with the exception of psychomotor retardation
or agitation.

Beck Depression Inventory (BDI)48


The original version of this scale was developed by Aaron Beck and colleagues
at the University of Pennsylvania and first published in 1961.49 It can be
administered by a trained professional or self-administered and covers an
explicit 2 weeks before the evaluation (1 week in the original version). The
21-item version requires 5 to 10 minutes. Each item is scored on a consistent
scale of 0 to 3, with options presented in a multiple-choice format. A reading
age of about 10 years is required for a person who is self-administering the test.
In the original publication no timeframe is mentioned, but in the BDI-IA
revision, this was changed to 1 week and in the BDI-II the time frame was
extended to 2 weeks to more closely follow the DSM criteria for MDD.
Version II (1996) also replaced body image change, weight loss, somatic
preoccupation, and work difficulty with agitation, worthlessness, concentra-
tion difficulty, and loss of energy. The scale is considered to emphasize
psychological items. In fact, there are eight ‘‘cognitive items’’ (pessimism,
past failures, guilty feelings, punishment feelings, self-dislike, self-critical-
ness, suicidal thoughts or wishes, and worthlessness) and nine ‘‘somatic items’’
(crying, agitation, indecisiveness, loss of energy, change in sleep patterns,
change in appetite, concentration difficulties, tiredness and/or fatigue, and
loss of interest in sex). Other items are sadness, loss of pleasure, loss of interest,
and irritability. The cognitive and somatic items, when considered as sub-
scales, are typically moderately correlated.50
Recently Beck and associates developed the Beck Depression Inventory
Fast Screen (BDI-FS) to address possible somatic contamination.51 It con-
tains 7 of the original 21 BDI-II items to assess cognitive and affective
aspects of depression, conforming with DSM-IV diagnostic criteria. It was
developed to permit more rapid detection of depression in primary care and
hospital settings.
Original validation data were derived from two samples: a group of 500 patients
from four psychiatric outpatient facilities and a group of 120 college students.
Rasch analysis of the BDI has been reported.52 The BDI was administered to 660
adult patients with unipolar depression and examined using factor analysis.
The BDI was internally consistent yet distinct in severity rating from the
MADRS.53

The Zung Self-Rating Depression Scale (SDS)54


The Zung SDS is a 20-item scale in its original form that takes about 5 to 8
minutes to administer.55 It is the prototypical self-report depression scale. Of
the 20 items, half are worded positively (‘‘I feel hopeful about the future’’) and
half negatively (‘‘I feel downhearted and blue’’). Each item is consistently
rated with a 4-point Likert scale (a little of the time = 1; some of the time = 2; a
good part of the time = 3; or most of the time = 4). A meta-analysis summarized
validity studies up to 1986.56 A large factor analysis in over 1,000 cancer
patients showed a four-factor solution: a cognitive symptom factor, a depressed
mood factor, and two somatic factors (eating-related and non–eating-related),
accounting for 20%, 13%, 7%, and 8% of the variance on the Zung, respec-
tively.57 Rasch analysis of the Zung SDS has been performed.58 Several short
forms have been developed, including a 12-item,59 an 11-item,60 and a 10-item
version.61
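
The scoring of a scale with mixed positive and negative wording, as described above, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `score_likert` is hypothetical, and because the text does not say which 10 items are positively worded, the set of reverse-scored positions used in the example (the even-numbered items) is a placeholder, not the published key.

```python
def score_likert(responses, reverse_items, low=1, high=4):
    """Sum item ratings, reversing the items whose 1-based position
    appears in reverse_items (a rating r becomes low + high - r)."""
    total = 0
    for position, rating in enumerate(responses, start=1):
        if not low <= rating <= high:
            raise ValueError(f"item {position}: rating {rating} is out of range")
        if position in reverse_items:
            total += low + high - rating
        else:
            total += rating
    return total


# Placeholder key: assume the even-numbered items are the positively worded ones.
ASSUMED_REVERSE_ITEMS = set(range(2, 21, 2))

# A respondent who answers "a little of the time" (1) to every item scores
# 1 on each negatively worded item and 4 on each reversed item.
all_ones = [1] * 20
print(score_likert(all_ones, ASSUMED_REVERSE_ITEMS))
```

Reversing the positively worded items before summing keeps a higher total aligned with more depressive symptoms, within the 20 to 80 range implied by 20 items rated 1 to 4.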

The Center for Epidemiologic Studies Depression Scale (CES-D)62


This 20-item scale was originally developed as a screening instrument for
community-based studies from existing scales such as the BDI and Zung
SDS.63 It was designed at the U.S. National Institute of Mental Health
(NIMH) with government rather than university funding. It bridged both
epidemiologic and clinical needs, was first used in an epidemiologic
study of Kansas City,64 and became the most used depression scale in the
1990s. It includes items concerning low mood and loss of interest but not
suicidal ideation. Original psychometric properties were based on three
community samples and two psychiatric patient samples consisting of
about 5,000 healthy individuals but only 70 adult psychiatric patients. Four
of the 20 items are positively worded and reverse scored (negatively keyed).
CES-D is designed for self-completion, telephone administration, or web-
based administration. The approach is mostly psychological, with some
somatic items. The CES-D has four separate factors: low mood, somatic
symptoms, positive affect, and interpersonal relations. A revised version
has been published, the CESD-R, which is more in line with DSM. There
are a variety of short forms, most notably several 10-item versions and a
5-item version.65 Recently Rasch-modeled short forms have been reported in
a general population.66 A second model has been applied to the depressed
population.67

3. The New Severity Scales (1981–2008)


Hospital Anxiety Depression Scale (HADS)68
The HADS can be considered the first in a new generation of scales that were
shorter, easier to score, and no less accurate than the first generation. It is a
relatively brief self-administered rating scale of symptoms and functioning.
Anxiety and depression are assessed as separate components, each with seven
items that are rated from 0 (no problem) to 3. A cut-off of 7/8 (ie, a score of 8
or more) in each subscale is usually recommended, although others have been used.69
Although the scores for the two components have often been added together
to give a composite anxiety–depression score (or emotional distress), this is
not recommended by the authors. It is a fairly simple scale that does not
include somatic and cognitive signs of depression. Limitations are that seven
of nine DSM criteria are not covered in the HADS and the reverse rating of
some items, together with the random sorting of depression and anxiety
questions, can cause confusion. It excludes reduced appetite, weight loss,
sleeping disturbances, fatigue, and concentration difficulties and also
excludes guilt, worthlessness, and suicidality. Notably, it does not include a
question on low mood per se. These choices may or may not be advantageous
in general hospital and primary care settings (see Chapters 10 and 11 for
discussion). Despite these limitations, the HADS has found an important
place and has been used in impressive studies involving thousands of patients
(Fig. 2.2).70–72 Good data are also available on values in nonclinical
populations.73

[Figure 2.2 plot not reproduced: y-axis ticks from 500 to 3000; x-axis categories are HADS-D scores labeled in words (Zero, One, Two, and so on).]
Figure 2.2. Distribution of HADS-D scores in 18,414 primary care attendees. Adapted
from Thompson C, Ostler K, Peveler RC, et al. Dimensional perspective on the recognition
of depressive symptoms in primary care: The Hampshire Depression Project 3.
Br J Psychiatry. 2001;179:317–323.

Geriatric Depression Scale (GDS)74


In its original form the GDS consists of a simple list of 30 questions, all
of which require a ‘‘yes’’ or ‘‘no’’ answer.75 However, a 15-item version
is very commonly used. Ten of the items on the GDS-30 and five of the
items on the GDS-15 are negatively keyed (ie, a ‘‘no’’ response is an
endorsement of a depressive symptom). The GDS is a self-report instru-
ment, and a telephone version has demonstrated good agreement with the
self-report questionnaire. The GDS focuses on the psychological symp-
toms of depression, particularly changes in mood and thoughts. Few
somatic items are included on the GDS—specifically, sleep, appetite,
gastrointestinal symptoms, autonomic symptoms, and sexual symptoms
are not assessed. GDS-30 covers five of the DSM-IV criteria using
differing terminology (lowered mood, loss of interest, loss of energy,
impaired concentration, and restlessness), and GDS-15 covers three (low-
ered mood, loss of interest, and loss of energy). Questions about suicidal
ideation were intentionally not included, and the scoring of items makes
the GDS a poor choice for rating the burden or severity of depression.
Rasch analysis of GDS has been reported.76 In one study of 526 people
over 65 in home care, the optimal cutoff on the GDS-15 was 5, which
yielded a sensitivity of 71.8% and a specificity of 78.2%.77 A systematic
review of the GDS found 42 studies with a mean sensitivity of 0.753 and
specificity of 0.770 for the GDS-30 and a sensitivity of 0.805 and a
specificity of 0.750 for the GDS-15.71 GDS versions showed significantly
better validity indices than the ‘‘Yale-1-question’’ screen but were similar
to the CES-D. Briefer 10-item, 5-item, and 4-item versions and even a
1-item version have been developed, but their value is currently uncertain.
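
Sensitivity and specificity alone do not say how often a positive GDS result is correct; that depends on how common depression is in the setting. The sketch below applies Bayes' theorem to the GDS-15 figures quoted above (sensitivity 71.8%, specificity 78.2%). The prevalence of 15% is an assumption chosen purely for illustration and is not reported in the text, and the function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# GDS-15 at a cutoff of 5, using the sensitivity and specificity quoted above.
ASSUMED_PREVALENCE = 0.15  # illustrative assumption, not a figure from the study
ppv, npv = predictive_values(0.718, 0.782, ASSUMED_PREVALENCE)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under this assumed prevalence, most positive screens would be false positives while a negative screen would be right far more often than not, which is the usual pattern for screening tools applied where the condition is uncommon.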

The Edinburgh Postnatal Depression Scale (EPDS)78


Cox and colleagues developed this scale after noting that some women
endorse somatic items on existing scales because of the physiologic
changes of childbearing and because of normal postnatal sleep distur-
bance.79,80 The authors used clinical intuition to identify possible items
from questionnaires such as the SAD and HAD scales and the BDI. Thirty
items were initially tested, and 13 items that were thought likely to detect
mothers with clinical depression were tested on a sample of 60 postnatal
women against the Clinical Interview Schedule. After factor analysis this
was shortened to the final 10-item scale. Interestingly, the EPDS contains
no specific item about mother–baby interaction or about irritability, which
allowed its use to be expanded beyond perinatal settings. Its appeal is
enhanced by its simple Likert scoring—0 for no presence of the symptom
through 3 for marked presence/change in usual state. It incorporates anxiety
but not suicidality.
Studies suggest that the EPDS includes three factors expressing euthymic
mood, anxiety, and depression. Anxiety (items 3, 4, 5, 6, and 7), depression
(items 8, 9, and 10), and anhedonia (items 1 and 2) are the main components of
the questionnaire, accounting for 63% of the variance.81 A short five-item
version of the EPDS was developed after stepwise multiple regression analysis
was used to find the combination of items that explains the maximum propor-
tion of the variance of the full-scale sum score in 2,730 women. The selected
EPDS items were thereafter correlated with the Hopkins Symptom Check List
(HSCL-25)82 for external validation. The five items were ‘‘I have felt sad or
miserable,’’ ‘‘I have been anxious or worried for no good reason,’’ ‘‘I have been
so unhappy that I have had difficulty sleeping,’’ ‘‘I have blamed myself
unnecessarily when things went wrong,’’ and ‘‘I have looked forward with
enjoyment to things.’’ Rasch analysis of the EPDS suggested that a revised
eight-item version (EPDS-8) might provide a more psychometrically robust
scale.83 Recent mandated screening programs in Australia and the United
States have recommended routine administration of the EPDS, although
National Institute for Health and Clinical Excellence (NICE) guidance in the
United Kingdom does not.

MOS 8-Item Depression Screener (Burnam Screen)84


This short tool was developed for use in the National Study of Medical Care
Outcomes (MOS).85 It was essentially an adaptation of the CES-D, although
two items related to duration of symptoms (required for DSM diagnosis/
caseness) were drawn from the DIS. The tool has only eight items, although
#7 and #8 are rather unwieldy single questions:
1. I felt depressed
2. My sleep was restless
3. I enjoyed life
4. I had crying spells
5. I felt sad
6. I felt that people disliked me
7. In the past year, have you had 2 weeks or more that you felt sad, blue, depressed, or lost pleasure in things that you usually cared about or enjoyed?
8. Have you had 2 years or more in your life when you felt depressed or sad most days, even if you felt okay sometimes? (If yes:) Have you felt depressed or sad much of the time in the past year?
Validation data were provided by two samples: 3,132 adults in the Los
Angeles sample of the Epidemiological Catchment Area (ECA) study, and 525
adults from the Psychiatric Screening Questionnaire for Primary Care Patients
(PSP) study. However, a limitation is that a complex scoring algorithm has
been suggested. Additionally, in comparison with the NIMH’s Structured
Clinical Interview for DSM-IV, the screen had low positive predictive value
(Tuunainen et al., 2001).86

The Patient Health Questionnaire (PHQ)87


The PHQ is the self-administered version of the Primary Care Evaluation of
Mental Disorders (PRIME-MD) instrument, which was designed to diagnose
specific disorders in primary care settings using DSM criteria.88 The whole
PRIME-MD has two components: a 1-page patient questionnaire (PQ) and a
12-page clinician evaluation guide (CEG). The PQ, which is completed by the
patient before seeing the primary care physician (PCP), consists of 26 yes/no
questions inquiring about symptoms that were present during the past month.
The focus is on a depressive episode (the SCID focuses on depressive
disorder).
The depression module comprises nine questions (PHQ-9). The first two
questions (known as the PHQ-2), which refer to the ‘‘cardinal’’ symptoms of
anhedonia and depressed mood, can be administered separately as a
screening tool. This scale rates the proportion of time from ‘‘0’’ (not at all)
to ‘‘3’’ (nearly every day). Rated linearly, a cutoff of 10 is suggested to
represent mild depression. However, individual items can be combined
according to a DSM-IV algorithm to generate a diagnosis of major or minor
depression. The DSM-IV exclusion criteria for a depressive disorder are not
included in the PHQ-9; therefore, the PHQ-9 diagnosis closely approximates
but is not identical to a DSM-IV diagnosis. Validation of the PHQ-9 took
place in 6,000 patients in eight primary care clinics and seven obstetrics-
gynecology clinics.89
The short version of the PHQ is almost as well known as the long
version. The PHQ-2 is a two-item screen which uses the first two items
from the PHQ that inquire about the frequency of depressed mood (ques-
tion 2) and loss of interest (question 1) over the past 2 weeks, scoring
each as 0 (‘‘not at all’’) to 3 (‘‘nearly every day’’). A score of three points
or more on this version of the PHQ-2 is sometimes recommended.81
However, an even simpler version calls for simple ‘‘yes’’ or ‘‘no’’
responses, with a ‘‘yes’’ response to either question constituting a positive
screen. The questions are as follows: Over the past month, have you often
had little interest or pleasure in doing things? (Yes/No) Over the past
month, have you often been bothered by feeling down, depressed, or
hopeless? (Yes/No). A two-stage screening with the PHQ-2 and then
the PHQ-9 has been investigated and is probably more efficient than
either test alone. However, when given by pen and paper, the time
taken to check if there is a positive PHQ-2 may limit the efficiency
saving.
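
The two-stage approach can be sketched as below. This is a hedged illustration, not a validated scoring program: the cutoffs are the ones mentioned above (3 or more on the PHQ-2, 10 or more on the PHQ-9), while the function names and the callback `ask_remaining`, which stands in for administering the remaining seven items, are hypothetical.

```python
PHQ2_CUTOFF = 3   # "three points or more" on the two cardinal items
PHQ9_CUTOFF = 10  # linear cutoff quoted for the nine-item total


def _check_items(items, expected_count):
    """Every item must be rated 0 (not at all) to 3 (nearly every day)."""
    if len(items) != expected_count or any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError(f"expected {expected_count} ratings between 0 and 3")


def two_stage_screen(first_two, ask_remaining):
    """Score the PHQ-2 first; only call ask_remaining() when it is positive.

    Returns a (result, score) pair, where score is the PHQ-2 total if the
    screen stopped at stage one and the PHQ-9 total otherwise.
    """
    _check_items(first_two, 2)
    stage_one = sum(first_two)
    if stage_one < PHQ2_CUTOFF:
        return ("negative", stage_one)
    remaining = ask_remaining()
    _check_items(remaining, 7)
    total = stage_one + sum(remaining)
    return ("positive" if total >= PHQ9_CUTOFF else "negative", total)


# A patient scoring 2 + 2 on the first two items goes on to the full PHQ-9.
print(two_stage_screen([2, 2], lambda: [1, 1, 1, 1, 1, 1, 1]))
```

The efficiency gain comes from the early return: patients who score below the PHQ-2 cutoff are never asked the remaining seven questions.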

Major Depression Inventory (MDI)90


This self-rated questionnaire aims to help make a diagnosis of major depres-
sion, according to either the DSM-IV criteria or the ICD-10 criteria.91 It covers
the previous 2 weeks and requires 5 to 10 minutes. An answer of ‘‘more than
half of the time’’ to at least 5 of the 10 questions is indicative of major
depression. It has 10 questions, although items 8 and 10 each have two
subitems, a and b—therefore, it can be considered 12 items. Ratings are
consistent from 0 (at no time) to 5 (all of the time), giving a total score from
0 to 50. A score of 4 or more on an item (ie, most of the time) qualifies for the
algorithm of ICD-10 or DSM-IV. The ICD-10 algorithm requires a score of 4
or 5 on two of the three top items and on at least four of the remaining items.
The DSM-IV algorithm requires a score of 4 or 5 on five of the nine items (item
4 being excluded), but at least one of these five items must be either depressed
mood or loss of interest.
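
The two MDI algorithms described above can be sketched as follows, strictly as a reading of the rules in this section rather than the published scoring manual. Several points are assumptions: that the "three top items" are items 1 to 3, that depressed mood and loss of interest are items 1 and 2, and that items 8 and 10 are each collapsed to the higher of their two subitems. The function names are hypothetical.

```python
ITEM_THRESHOLD = 4  # a score of 4 or 5 counts toward either algorithm


def collapse_subitems(responses):
    """responses holds 10 entries scored 0 to 5; items 8 and 10 may be given
    as (a, b) tuples, which are collapsed to the higher subitem (assumption)."""
    if len(responses) != 10:
        raise ValueError("expected 10 items")
    scores = [max(r) if isinstance(r, tuple) else r for r in responses]
    if any(not 0 <= s <= 5 for s in scores):
        raise ValueError("item scores must lie between 0 and 5")
    return scores


def meets_icd10(responses):
    hits = [s >= ITEM_THRESHOLD for s in collapse_subitems(responses)]
    # Two of the three top items, plus at least four of the remaining items.
    return sum(hits[:3]) >= 2 and sum(hits[3:]) >= 4


def meets_dsm4(responses):
    hits = [s >= ITEM_THRESHOLD for s in collapse_subitems(responses)]
    nine_items = hits[:3] + hits[4:]   # item 4 is excluded
    core_symptom = hits[0] or hits[1]  # assumed: item 1 mood, item 2 interest
    return sum(nine_items) >= 5 and core_symptom


example = [5, 4, 0, 5, 4, 4, 4, (0, 0), 0, (0, 0)]
print(meets_icd10(example), meets_dsm4(example))
```

Keeping the threshold and the item positions in named constants and comments makes it easy to substitute the official key if it differs from the assumptions made here.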
Few validation studies or translations of the MDI exist.92 A comparative
study of the SDS and MDI in 89 patients with Parkinson’s disease suggested
that the MDI is superior to the SDS.93 The largest study evaluated the MDI in
1,093 persons who were also interviewed by psychiatrists using SCAN. The specificity of
the MDI was 0.22, the sensitivity 0.67, and kappa 0.25 when major depression
according to SCAN was considered as the index of validity, and with all
depressive disorders the specificity was 0.44, the sensitivity 0.51, and kappa
0.33. More highly educated persons and those with reported disability were
less likely to be false negatives.94

4. The Future of Screening Scales


The ideal scale is one that is very brief, highly acceptable, and very accurate
when tested against an accepted reference standard. It may also be an
advantage if it obeys current conventional diagnostic rules from ICD or
DSM and is freely available but long enough to gauge severity and measure
change. It is unclear whether one scale can fulfill all these purposes, but
there is a trend to develop ever-shorter scales that attempt to retain high
accuracy. All scales must consider the tension between acceptability and
accuracy.

Improving Acceptability
Following on from the originals, ever-shorter versions of every major
scale have been released, usually comprising 10 items or less
(Textbox 2.5). A good example is the 8-item Even Briefer Assessment
Scale for Depression (EBAS DEP) derived from the 21-item Brief
Assessment Scale.95 Of course, eight items might not be short enough for
many settings, and in the extreme case single-item methods (applied by pen
and paper, verbally, or in visual analog form) have been evaluated. The first
‘‘ultra-short’’ scales began to appear in the 1970s with early visual analog
methods of rating mood.96
Just how good are these short and ultra-short scales?97 Whooley and
colleagues (1997)98 compared CES-D (20- and 10-item versions), BDI
(20- and 13-item versions), Symptom-Driven Diagnostic System for
Primary Care (SDDS-PC), and MOS-8 against the Quick Diagnostic
Interview Schedule for major depression. Using summary statistics
Table 2.1. Conventional Cutoff Scores for Different Severities of Depression

Scale | Abbreviation | No Depression (asymptomatic and subsyndromal) | Mild Depression | Moderate | Severe
Hamilton Depression Scale | HAM-D | 0 to 7 | 8 to 13 | 14 to 18 | 19 to 63
Beck Depression Inventory | BDI | 0 to 9 | 10 to 16 | 17 to 29 | 30 to 63
Beck Depression Inventory II | BDI-II | 0 to 13 | 14 to 19 | 20 to 28 | 29 to 63
Geriatric Depression Scale (original) | GDS-30 | 0 to 9 | 10 to 19 | 20 to 30 | 20 to 30
Zung Self-Rating Depression Scale | SDS | 0 to 49 | 50 to 59 | 60 to 69 | 70 to 80
Hospital Anxiety and Depression Scale | HADS-D | 0 to 7 | 8 to 10 | 11 to 14 | 15 to 21
Montgomery-Åsberg Depression Rating Scale | MADRS | 0 to 6 | 7 to 19 | 20 to 34 | 35 to 60
Center for Epidemiologic Studies Depression Scale | CESD | 0 to 15 | 16 to 20 | 21 to 26 | 27 to 60
Edinburgh Postnatal Depression Scale | EPDS | 0 to 9 | 9 to 12 | 13 to 30 | 13 to 30
Patient Health Questionnaire | PHQ-9 | 0 to 5 | 6 to 9 | 10 to 19 | 20 to 27
Patient Health Questionnaire (remapped to DSM-IV) | PHQ-9 | 0 to 9 | 10 to 16 | 17 to 22 | 23 to 27
Major Depression Inventory | MDI | 0 to 13 | 14 to 19 | 20 to 26 | 27 to 50
Table 2.2. Summary of Scale Properties

Year | Scale | Abbreviation | Original Items | Max Score | Rater | Copyright | Duration | Time Frame | Cites Per Year | Suicidality Included? | Somatic Bias (most to least)
1960 | Hamilton Depression Scale | HAM-D | 21 | 63 | Clinician | Public domain | 15 min | Past week | 237 | Yes | #1
1961 | Beck Depression Inventory | BDI | 21 | 63 | Patient | Harcourt Assessment | 10 min | Past few days (BDI); last 2 weeks (in BDI II) | 225 | Yes | #6
1965 | Zung Self-Rating Depression Scale | SDS | 20 | 80 | Patient | Public domain | 5–8 min | Past several days | 84 | Yes | #5
1977 | Center for Epidemiologic Studies Depression Scale | CESD | 20 | 60 | Patient | Public domain | 4–5 min | Past week | 256 | No | #7
1979 | Montgomery-Åsberg Depression Rating Scale | MADRS | 10 | 60 | Observer | Copyright | 10 min | Current | 107 | Yes | #4
1982 | Geriatric Depression Scale (original) | GDS-30 | 30 | 30 | Patient | Public domain | 10 min | Past week | 94 | No | #10
1983 | Hospital Anxiety and Depression Scale | HADS | 14 | 42 | Patient | NFER-Nelson | 5 min | Past week | 195 | No | #6
1986 | Geriatric Depression Scale (modified) | GDS-15 | 15 | 15 | Patient | Public domain | 5 min | Past week | 31 | No | #8
1987 | Edinburgh Postnatal Depression Scale | EPDS | 10 | 30 | Patient | Copyright | 1–2 min | Past week | 50 | No | #11
1988 | MOS-8 Burnam Screen | MOS-8 | 8 | 20 | Patient | RAND Corporation | 2–5 min | 2 weeks and 2 years | 12 | No | #9
2001 | Patient Health Questionnaire | PHQ | 9 | 27 | Patient | Public domain | 2–4 min | 2 weeks | 53 | Yes | #2
2001 | Major Depression Inventory | MDI | 10 | 60 | Patient | Elsevier | 3–5 min | 2 weeks | 7 | Yes | #3
(Table 2.3), the optimal tests appear to be MOS-8 > CES-D20 > CES-D10 >
BDI-21 > BDI-13 > SDDS-PC, with the least accurate method being the
PHQ-2. Even the PHQ-2, however, was good at excluding nondepressed
cases, with a high negative predictive value. This finding does not
allow for test efficiency, that is, correcting for the length of the
scale. Such weighting requires an economic evaluation, and such studies
are in progress. This finding has since been extended, showing that even
single-item mood scales can be valuable, albeit as a form of rule out
(reassurance) for those who answer negatively.
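
The two summary columns in Table 2.3 can be recomputed from the others. The sketch below assumes the standard definitions, which this section does not spell out: the Youden index as sensitivity + specificity − 1, and the predictive summary index (PSI) as PPV + NPV − 1. The input values are transcribed from Table 2.3; the function names are illustrative only.

```python
# Sensitivity, specificity, PPV and NPV transcribed from Table 2.3.
TABLE_2_3 = {
    "PHQ2":    (0.96, 0.57, 0.33, 0.98),
    "SDDS-PC": (0.96, 0.51, 0.30, 0.98),
    "MOS-8":   (0.93, 0.72, 0.42, 0.98),
    "CESD20":  (0.93, 0.69, 0.40, 0.98),
    "CESD10":  (0.90, 0.72, 0.41, 0.97),
    "BDI21":   (0.89, 0.64, 0.35, 0.96),
    "BDI13":   (0.92, 0.61, 0.34, 0.97),
}


def youden_index(sensitivity, specificity):
    """Assumed standard definition: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def predictive_summary_index(ppv, npv):
    """Assumed standard definition: PPV + NPV - 1."""
    return ppv + npv - 1.0


def summarize(table):
    """Return {scale: (Youden, PSI)}, each rounded to two decimals."""
    summary = {}
    for name, (sens, spec, ppv, npv) in table.items():
        summary[name] = (
            round(youden_index(sens, spec), 2),
            round(predictive_summary_index(ppv, npv), 2),
        )
    return summary


if __name__ == "__main__":
    for name, (youden, psi) in summarize(TABLE_2_3).items():
        print(f"{name:8s} Youden={youden:.2f} PSI={psi:.2f}")
```

Under these definitions the recomputed values agree with the printed Youden and PSI columns to two decimal places.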

Textbox 2.5. Short Versions of Rating Scales (10 items or less)

Ten Items: EPDS-10 (original), SDS-10, CES-D 10, DEPS-10, MADRS-10 (original)
Nine Items: PHQ9, HDI-Short Form
Eight Items: MOS-8, EPDS-8, PHQ-8, EBAS-Dep
Seven Items: HADS-Depression, HADS-Anxiety, HAM-D-7, BDI-7, DADS-7, EPDS-7 (depression items)
Six Items: EPDS-6, HAM-D-6, CES-D-6
Five Items: EPDS-5, WHO-5, GDS-5, Emotion Thermometers
Four Items: GDS-4
Three Items: PHQ2 + help question, EPDS-3 (anxiety items)
Two Items: PHQ2, Whooley / NICE 2 Questions, BDI-2, EPDS-2
One Item: PHQ Q1, PHQ Q2, GDS-1, Distress Thermometer

Table 2.3. Accuracy of Various Depression Scales Head to Head

Questionnaire | Sensitivity | Specificity | PPV | NPV | PSI | Youden | FC | AUC
PHQ2 | 0.96 | 0.57 | 0.33 | 0.98 | 0.31 | 0.53 | 63.99 | 0.82
SDDS-PC | 0.96 | 0.51 | 0.30 | 0.98 | 0.28 | 0.47 | 59.14 | 0.86
MOS-8 | 0.93 | 0.72 | 0.42 | 0.98 | 0.40 | 0.65 | 75.75 | 0.89
CESD20 | 0.93 | 0.69 | 0.40 | 0.98 | 0.38 | 0.62 | 73.32 | 0.89
CESD10 | 0.90 | 0.72 | 0.41 | 0.97 | 0.38 | 0.62 | 75.19 | 0.87
BDI21 | 0.89 | 0.64 | 0.35 | 0.96 | 0.31 | 0.53 | 68.47 | 0.87
BDI13 | 0.92 | 0.61 | 0.34 | 0.97 | 0.31 | 0.53 | 66.42 | 0.86

PSI, predictive summary index; PPV, positive predictive value; NPV, negative predictive value; FC,
fraction correct; AUC, area under the curve.
Data from Whooley MA, Avins AL, Miranda J, et al. Case-finding instruments for depression. J Gen
Intern Med. 1997;12(7):439.

Short methods improve acceptability, but there may be other techniques
to improve uptake. A tool can be administered in the waiting room or by
mail. Increasingly, questionnaires are becoming computerized and can be
given using a Palm Pilot or Tablet or over the Internet (this is discussed
further in Chapter 8). The format of a questionnaire can be influential. For
example, a single-item visual analog item takes no more time than a verbal
item but can quantify a symptom. The seven-item version of the emotion
thermometers tested in cancer and cardiovascular settings is shown in the
Appendix Figure 5.

Improving Accuracy
Algorithm Approaches
In clinical practice, prevalence is typically low (between 10% and 30%), and
therefore a high negative predictive value is relatively easy to achieve but a high
positive predictive value is difficult. For example, if one applied a screening test
with 80% sensitivity and specificity to a sample of 1,000 individuals with a 20%
rate of depression, the positive predictive value would be 0.50 and the negative
predictive value 0.94 (overall accuracy = 0.80 by fraction correct) (see Appendix
Table Single 3). Given that only 50% of those with a positive result would actually
have depression, what would happen if you applied a second test to those who
scored positive but relied on the results from the first screen for those who scored
negative? This is illustrated in Appendix Figure 3. From Appendix Table
MultiStep 3, provided the second instrument's sensitivity and specificity of
80% held for the filtered population, the positive predictive value rises to 0.67
at a cost of a small fall in the negative predictive value to 0.85 (overall accuracy
= 0.83). In short, applying a second step to those who screen positive in step 1
favors specificity at a cost of sensitivity but with a gain in overall accuracy. This
example of the application of two tests with 80% sensitivity and specificity might
be unrealistic in clinical practice. Often different test performances are achievable
in each step. A difficult question to answer is: which is best, to apply an
instrument with high sensitivity or one with high specificity in step 1 or step 2?
The answer from Table AP.4 is that it is best to apply the most accurate instrument
first, where clinically possible (although often in screening the reverse occurs). If
both instruments have the same combined value but different sensitivity and
specificity values, the optimal yield can be calculated. The rule of thumb for a
two-step approach in a low-prevalence setting is to avoid putting two instruments
that favor sensitivity together, particularly if one has high sensitivity in the second
step, as this may produce low overall yields. Practical applications of two-step
approaches have recently been described.99,100
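
The single-test arithmetic in the example above (1,000 people, 20% prevalence, 80% sensitivity and specificity) can be laid out as a 2×2 table. The following is a minimal sketch of that first step only; the function name and dictionary keys are illustrative, and the two-step figures are left to the appendix tables cited in the text.

```python
def screen_once(n, prevalence, sensitivity, specificity):
    """Expected 2x2 counts and summary measures for one screening test."""
    cases = n * prevalence
    non_cases = n - cases
    true_pos = cases * sensitivity          # depressed, screen positive
    false_neg = cases - true_pos            # depressed, screen negative
    true_neg = non_cases * specificity      # not depressed, screen negative
    false_pos = non_cases - true_neg        # not depressed, screen positive
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "false_pos": false_pos,
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
        "fraction_correct": (true_pos + true_neg) / n,
    }


if __name__ == "__main__":
    result = screen_once(n=1000, prevalence=0.20,
                         sensitivity=0.80, specificity=0.80)
    for key, value in result.items():
        print(f"{key}: {value:.2f}")
```

With these inputs, 160 of the 320 positive screens are true cases, which is why the positive predictive value is only 0.50 even though the test is 80% accurate.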

Weighting Specific Items


In the future there will be re-examination of the weighting of specific
symptoms of depression in relation to depression in each setting. The current
concept of depression is that there are certain essential core symptoms that
define the disorder and others that contribute to severity.101–104 This may or
may not hold true. A scientific understanding of optimal depression items has
appeared only in the past 3 years. Zimmerman and colleagues have re-examined
the traditional symptoms of depression to discover whether all the conventional
symptoms listed in DSM-IV or ICD-10 contribute to a diagnosis of depression.
The difficulty with this method is that there is no accepted gold standard (see
Chapter 1). One way around this problem is simply to examine how many
patients fulfill full DSM-IV (or ICD-10) criteria if only certain symptoms are
counted. Zimmerman and colleagues proposed combining two core and three
psychological symptoms: depressed mood, lack of interest, worthlessness,
poor concentration, and thoughts of death. Against full DSM-IV, this
abbreviated checklist had a sensitivity of 93.7%, specificity of 94.8%, positive
predictive value of 95.5%, and negative predictive value of 91.6%. Andrews and
associates (2007)105 replicated this finding using data from the 10,641
respondents to the Australian National Survey of Mental Health and Well-Being,
using the 12-month version of the Composite International Diagnostic
Interview. In this study sensitivity was 92.9%, specificity 99%, positive
predictive value 94%, and negative predictive value 99.7%. Another method is to
start with short versions and add only items that prove useful. Brody and
colleagues (1998)106 found that adding four follow-up questions on sleep
disturbance, appetite, anhedonia, and self-esteem to the two-question
PRIME-MD markedly improved the specificity while maintaining the
sensitivity.
Future developments will also take into account aspects of depression not
measured by symptom counts alone, for example, tools that measure
duration, impact, function, and desire for professional help.

References
1. Wittkampf KA, van Zwieten M, Smits FT, et al. Patients’ view on screening for
depression in general practice. Fam Pract. 2008;25:438–444.
2. Jepson R, Clegg A, Forbes C, et al. The determinants of screening uptake and
interventions for increasing uptake: a systematic review. Health Technol Assess.
2000;4:14.
3. Grinker RR Sr, Miller J, Sabshin M, et al. The phenomena of depressions. New York:
Hoeber, 1961.
4. Nezu AM, Ronan GF, Meadows EA, et al. Practitioners’ guide to empirically based
measures of depression. Kluwer Academic/Plenum Publishers 2000.
5. Williams JW, Pignone M, Ramirez G, et al. Identifying depression in primary care:
a literature synthesis of case-finding instruments. Gen Hosp Psychiatry. 2002;24(4):
225–237.
6. Gilbody S, Sheldon T, House A. Screening and case-finding instruments for depression:
a meta-analysis. Can Med Assoc J. 2008;178:997–1003.
7. http://www.mentaltests.com/cms/mentaltests_list.
8. Parloff MB, Kelman HC, Frank JD. Comfort, effectiveness, and self-awareness as
criteria of improvement in psychotherapy. Am J Psychiatry. 1954;111:343–351.
9. Derogatis LR, Lipman RS, Covi L. SCL-90: An outpatient psychiatric rating scale,
preliminary report. Psychopharmacol Bull. 1973;9:13–28.
10. Fink P, Ornbol E, Hansen MS, et al. Detecting mental disorders in general hospitals by
the SCL-8 scale. J Psychosom Res. 2004;56(3):371–375.
11. Demyttenaere K, De Fruyt J. Getting what you ask for: On the selectivity of depression
rating scales. Psychotherapy Psychosomatics. 2003;72(2):61–70.
12. Ruhe HG, Dekker JJ, Peen J, et al. Clinical use of the Hamilton Depression
Rating Scale: is increased efficiency possible? A post hoc comparison of
Hamilton Depression Rating Scale, Maier and Bech subscales, Clinical Global
Impression, and Symptom Checklist-90 scores. Comprehensive Psychiatry.
2005;46(6):417–427.
13. Leentjens AF, Lousberg R, Verhey FRJ. The psychometric properties of the Hospital
Anxiety and Depression Scale in patients with Parkinson’s disease. Acta
Neuropsychiatr. 2001;13:83–85.
14. Richter P, Werner J, Heerlein A, et al. On the validity of the Beck Depression
Inventory. A review. Psychopathology. 1998;31(3):160–168.
15. Shafer AB. Meta-analysis of the factor structures of four depression
questionnaires: Beck, CES-D, Hamilton, and Zung. J Clin Psychol.
2005;62(1):123–146.
16. Uher R, Farmer A, Maier W, et al. Measuring depression: comparison and integration
of three scales in the GENDEP study. Psychol Med. 2008;38(2):289–300.
17. Faravelli C, Servi P, Arends JA, et al. Number of symptoms, quantification, and
qualification of depression. Comprehensive Psychiatry. 1996;37(5):307–315.
18. Huttunen J, Taiminen T, Kähkönen J, et al. Depression Scale (DEPS) in schizophrenia.
Acta Psychiatr Scand. 1999;99(3):220–222.
19. Alexopoulos GS, Abrams RC, Young RC, et al. Cornell Scale for Depression in
Dementia. Biol Psychiatry. 1988;23(3):271–284.
20. Gainotti G, Azzoni A, Razzano C, et al. The Post-Stroke Depression Rating Scale: a
test specifically devised to investigate affective disorders of stroke patients. J Clin Exp
Neuropsychol. 1997;19(3):340–356.
21. Leeds L, Meara RJ, Hobson JP. The utility of the Stroke Aphasia Depression
Questionnaire (SADQ) in a stroke rehabilitation unit. Clin Rehab.
2004;18(2):228–231.
22. Benaim C, Cailly B, Perennou D, et al. Validation of the Aphasic Depression Rating
Scale. Stroke. 2004;35:1692.
23. Clements KM, Murphy JM, Eisen SV, et al. Comparison of self-report and clinician-
rated measures of psychiatric symptoms and functioning in predicting 1-year hospital
readmission. Administration And Policy In Mental Health And Mental Health Services
Research. 2006;33(5):568–577.
24. Moller HJ. Rating depressed patients: observer- vs self-assessment. Eur Psychiatry.
2000;15(3):160–172.
25. Rush AJ, Carmody TJ, Ibrahim HM, et al. Comparison of self-report and clinician
ratings on two inventories of depressive symptomatology. Psychiatr Serv.
2006;57(6):829–837.
26. Biggs JT, Wylie LT, Ziegler VE. Validity of the Zung Self-Rating Depression Scale.
Br J Psychiatry. 1978;132:381–385.
27. Faravelli C, Albanesi G, Poli E. Assessment of depression: a comparison of rating
scales. J Affect Disord. 1986;11:245–253.
28. Hunt M, Auriemma J, Cashaw ACA. Self-report bias and underreporting of depression
on the BDI-II. J Personality Assess. 2003;80(1):26–30.
29. Vaughan M, Krawiecka M. Sensitivity to change in symptoms of new scales for rating
chronic psychotic patients. Int Pharmacopsychiatry. 1979;14(3):121–126.
30. Maier W, Philipp M, Demuth W, et al. Reliability, validity, transferability and
sensitivity to change of 3 rival observer rating-scales for the severity of depression
(HAM-D, MADRS, BRMS). Int J Neurosci. 1986;31(1–4):288.
31. Bagby RM, Ryder AG, Schuller DR, et al. The Hamilton Depression Rating Scale; has
the gold standard become a lead weight? Am J Psychiatry. 2004;161:2163–2177.
32. Vermeersch DA, Whipple JL, Lambert MJ, et al. Outcome questionnaire: Is it
sensitive to changes in counselling center clients? J Counsel Psychol.
2004;51(1):38–49.
33. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatry.
1960;23:56–62.
34. http://healthnet.umassmed.edu/mhealth/HamD.pdf.
35. Zitman FG, Mennen MF, Griez E, et al. The different versions of the Hamilton
Depression Rating Scale. Psychopharmacology. 1990;9:28–34.
36. Bagby RM, Ryder AG, Schuller DR, et al. The Hamilton Depression Rating Scale: has
the gold standard become a lead weight? Am J Psychiatry. 2004;161:2163–2177.
37. Williams JB. A structured interview guide for the Hamilton Depression Rating Scale.
Arch Gen Psychiatry. 1988;45:742–747.
38. Khullar A, McIntyre RS. An approach to managing depression. Defining and
measuring outcomes. Can Fam Physician. 2004;50:1374–1380.
39. McIntyre RS, Konarski JZ, Mancini DA, et al. Measuring the severity of depression
and remission in primary care: validation of the HAMD-7 scale. Can Med Assoc J.
2005;173:1327–1334.
40. Bobes J, Bulbena A, Luque A, et al. The sufficiency of the HAM-D6 as an outcome
instrument in the acute therapy of antidepressants in the outpatient setting. Int J
Psychiatry Clin Practice. 2007;11(2):146–150.
41. Bech P, Gram LF, Dein E, et al. Quantitative rating of depressive states. Acta
Psychiatr Scand. 1975;51:161–170.
42. Bech P, Allerup P, Gram LF, et al. The Hamilton Depression Scale: evaluation of
objectivity using logistic models. Acta Psychiatr Scand. 1981;63:290–299.
43. Kalali A, Williams JBW, Kobak KA, et al. The new GRID HAM-D: pilot testing and
international field trials. Int J Neuropsychopharmacol. 2002;5:S147–S148.
44. Montgomery SA, Åsberg M. A new depression scale designed to be sensitive to
change. Br J Psychiatry. 1979;134:382–389.
45. http://www.neurotransmitter.net/depressionscales.html.
46. Asberg M, Montgomery SA, Perris C, et al. A comprehensive psychopathological
rating scale. Acta Psychiatr Scand Suppl. 1978;271:5–27.
47. Carroll BJ, Wilson WH. HAM-D and MADRS as depression change measures. In:
New Clinical Drug Evaluation Unit (NCDEU) Program Abstracts, 40th Annual
Meeting, 2000. Rockville, MD: National Institute of Mental Health, poster number 9.
48. Beck AT, Ward CH, Mock J, et al. An inventory for measuring depression. Arch Gen
Psychiatry. 1961;4:561–571.
49. http://harcourtassessment.com/haiweb/cultures/en-us/productdetail.htm?pid=015–8018–370.
50. Storch EA, Roberti JW, Roth DA. Factor structure, concurrent validity, and internal
consistency of the Beck Depression Inventory-Second Edition in a sample of college
students. Depression Anxiety. 2001;19(3):187–189.
51. Beck AT, Steer RA, Brown GK. BDI-II fast screen for medical patients manual.
London: The Psychological Corporation, 2000.
52. Bouman TK, Kok AR. Homogeneity of Beck’s Depression Inventory (BDI):
Applying Rasch analysis in conceptual exploration. Acta Psychiatr Scand.
1987;76(5):568–573.
53. Uher R, Farmer A, Maier W, et al. Measuring depression: comparison and integration
of three scales in the GENDEP study. Psychol Med. 2008;38:289–300.
54. Zung WW. A self-rating depression scale. Arch Gen Psychiatry. 1965;12:63–70.
55. http://healthnet.umassmed.edu/mhealth/ZungSelfRatedDepressionScale.pdf.
56. Lambert MJ, Hatch DR, Kingston MD, et al. Zung, Beck, and Hamilton Rating Scales
as measures of treatment outcome: a meta-analytic comparison. J Consulting Clin
Psychol. 1986;54(1):54–59.
57. Passik SD, Lundberg JC, Rosenfeld B, et al. Factor analysis of the Zung Self-Rating
Depression Scale in a large ambulatory oncology sample. Psychosomatics.
2000;41:121–127.
58. Hong S, Min SY. Mixed Rasch modeling of the Self-Rating Depression Scale
incorporating latent class and Rasch rating scale models. Educational and
Psychological Measurement. 2007;67(2):280–299.
59. Hulstijn EM, Deelman BG, de Graaf A, et al. The Zung-12: a questionnaire
for depression in the elderly. Tijdschr Gerontol Geriatr (Netherlands).
1992;23:85–93.
60. Dugan W, McDonald MV, Passik SD, et al. Use of the Zung Self-Rating Depression
Scale in cancer patients: feasibility as a screening tool. Psychooncology.
1998;7(6):483–493.
61. Tucker MA, Ogle SJ, Davison JG, et al. Validation of a brief screening test for
depression in the elderly. Age Ageing. 1987;16(3):139–144.
62. Radloff LS. The CES-D scale: a self-report depression scale for research in the general
population. Appl Psychol Meas. 1977;1:385–401.
63. http://www.mdlogix.com/cesdr.htm.
64. Markush RE, Favero RV. Epidemiologic assessment of stressful life events, depressed
mood, and psychophysiological symptoms: A preliminary report. In Dohrenwend BS,
Dohrenwend BP, eds. Stressful life events: their nature and effects. New York: Wiley,
1974:171–190.
65. Furukawa T, Anraku K, Hiroe T, et al. Screening for depression among first-visit
psychiatric patients: Comparison of different scoring methods for the Center for
Epidemiologic Studies Depression Scale using receiver operating characteristic
analyses. Psychiatry Clin Neurosci. 1997;51:71–78.
66. Cole JC, Rabin AS, Smith TL, et al. Development and validation of a Rasch-derived
CES-D short form. Psychol Assess. 2004;16(4):360–372.
67. Chan KS, Orlando M, Ghosh-Dastidar B, et al. The interview mode effect on the
Center for Epidemiological Studies Depression (CES-D) scale: an item response
theory analysis. Med Care. 2004;42:281–289.
68. Zigmond AS, Snaith RP. The Hospital Anxiety and Depression Scale. Acta Psychiatr
Scand. 1983;67:361–370.
69. Bjelland I, Dahl AA, Tangen Haug T, et al. The validity of the Hospital Anxiety
and Depression Scale. An updated literature review. J Psychosom Res.
2002;52:69–77.
70. Sharpe M, Strong V, Allen K, et al. Major depression in outpatients attending a
regional cancer centre: screening and unmet treatment needs. Br J Cancer.
2004;90:314–320.
71. Martin CR, Thompson DR, Barth J. Factor structure of the Hospital Anxiety and
Depression Scale in coronary heart disease patients in three countries. J Eval Clin
Pract. 2008;14(2):281–287.
72. Thompson C, Ostler K, Peveler RC, et al. Dimensional perspective on the recognition of
depressive symptoms in primary care: The Hampshire Depression Project 3. Br J
Psychiatry. 2001;179:317–323.
73. Crawford JR, Henry JD, Crombie C, et al. Normative data for the HADS from a large
non-clinical sample. Br J Clin Psychol. 2001;40:429–434.
74. Yesavage JA, Brink TL, Rose TL, et al. Development and validation of a
geriatric depression screening scale: a preliminary report. J Psychiatr Res.
1983;17:37–49.
75. www.stanford.edu/~yesavage/GDS.html.
76. Tang WK, Wong E, Chiu HFK. The Geriatric Depression Scale should be shortened:
results of Rasch analysis. Int J Geriatr Psychiatry. 2005;20(8):783–789.
77. Marc LG, Raue PJ, Bruce ML. Screening performance of the 15-item Geriatric
Depression Scale in a diverse elderly home care population. Am J Geriatr
Psychiatry. 2008;16(11):914–921.
78. Cox JL, Holden JM, Sagovsky R. Detection of postnatal depression: development
of the 10-item Edinburgh Postnatal Depression Scale. Br J Psychiatry.
1987;150:782–786.
79. Wancata J, Alexandrowicz R, Marquart B, et al. The criterion validity of the Geriatric
Depression Scale: a systematic review. Acta Psychiatr Scand. 2006;114(6):398–410.
80. www.aap.org/practicingsafety/Toolkit_Resources/Module2/EPDS.pdf.
81. Cox J, Holden J. Perinatal mental health—A guide to the EPDS. RCPsych
Publications, 2003.
82. Chabrol H, Teissedre F. Relation between the Edinburgh Postnatal Depression Scale
scores at 2–3 days and 4–6 weeks postpartum. J Reprod Infant Psychol. 2004;22:33–39.
83. Hesbacher PT, Rickels K, Morris RJ, et al. Psychiatric illness in family practice. J Clin
Psychiatry. 1980;41:6–10.
84. Burnam MA, Wells KB, Leake B, et al. Development of a brief screening instrument
for detecting depressive disorders. Med Care. 1988;26:775–789.
85. Pallant JF, Miller RL, Tennant A. Evaluation of the Edinburgh Post Natal Depression
Scale using Rasch analysis. BMC Psychiatry. 2006;6:28.
86. www.patient.co.uk/showdoc/40025272/.
87. Spitzer RL, Williams JB, Kroenke K, et al. Utility of a new procedure for diagnosing
mental disorders in primary care. The PRIME-MD 1000 study. JAMA.
1994;272:1749–1756.
88. Tuunainen A, Langer RD, Klauber MR, Kripke DF. Short version of the CES-D
Burnam screen for depression in reference to the structured psychiatric interview.
Psychiatry Res. 2001;103:261–270.
89. Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: validity of a brief depression
severity measure. J Gen Intern Med. 2001;16(9):606–613.
90. Kroenke K, Spitzer RL, Williams JB. The Patient Health Questionnaire-2: validity of a
two-item depression screener. Med Care. 2003;41:1284–1292.
91. http://www.gp-training.net/protocol/psychiatry/who/mdi.doc.
92. Fountoulakis KN, Iacovides A, Kleanthous S, et al. Reliability, validity and
psychometric properties of the Greek translation of the Major Depression Inventory.
BMC Psychiatry 2003;3:2.
93. Bech P, Wermuth L. Applicability and validity of the MDI in patients with Parkinson’s
Disease. Nord J Psychiatry. 1998;52:305–309.
94. Forsell Y. The Major Depression Inventory versus schedules for clinical assessment in
neuropsychiatry in a population sample. Soc Psychiatry Psychiatric Epi.
2005;40(3):209–213.
95. Weyerer S, Killmann U, Ames D, et al. The Even Briefer Assessment Scale for
Depression (EBAS DEP): its suitability for the elderly in geriatric care in
English- and German-speaking countries. Int J Geriatr Psychiatry. 1999;14(6):
473–480.
96. Folstein M. Reliability, validity, and clinical application of visual analog mood scale.
Psychol Med. 1973;3:479.
97. Blank K, Gruman C, Robison JT. Case-finding for depression in elderly people:
balancing ease of administration. J Gerontol A Biol Sci Med Sci. 2004;59:M378–M384.
98. Whooley MA, Avins AL, Miranda J, et al. Case-finding instruments for depression.
J Gen Intern Med. 1997;12(7):439.
99. Thombs BD, Ziegelstein RC, Whooley MA. Optimizing detection of major
depression among patients with coronary artery disease using the Patient Health
Questionnaire: Data from the Heart and Soul Study. J Gen Intern Med. 23(12):
2014–2017.
100. Bech P, Rasmussen N, Olsen R, et al. The sensitivity and specificity of the MDI using
the Present State Examination as the index of diagnostic validity. J Affect Disord.
2001;66:159–164.
101. Mitchell AJ, Baker-Glenn EA, Park B, et al. Can the distress thermometer be improved
by additional mood domains? Part II: What is the Optimal Combination of
Thermometers? Psychooncology. 2009 [e-pub March 18].
102. Evans KR, Sills T, DeBrota DJ, et al. An item response analysis of the Hamilton
Depression Rating Scale using shared data from two pharmaceutical companies.
J Psychiat Res. 2004;38:275–284.
103. Maier W, Philipp M. Improving the assessment of severity of depressive states: a
reduction of the Hamilton Depression Scale. Pharmacopsychiatry. 1985;18:114–115.
104. Gibbons RD, Clark D, Kupfer DJ. Exactly what does the Hamilton Depression Rating
Scale measure? J Psychiat Res. 1993;27:259–273.
105. Andrews G, Slade T, Sunderland M, et al. Issues for DSM-V: Simplifying DSM-IV to
enhance utility: the case of major depressive disorder. Am J Psychiatry. 2007;164:
1784–1785.
106. Brody DS, Hahn SR, Spitzer RL, et al. Identifying patients with depression in the
primary care setting: a more efficient method. Arch Intern Med. 1998;158:2469–2475.
3
WHY DO CLINICIANS HAVE DIFFICULTY
DETECTING DEPRESSION?

Alex J. Mitchell

1. Introduction to the Problem of Over- and Under-Detection
2. Predictors of Detection
3. Patient and Clinician Influences on Detection
4. Illness-Related Influences on Detection
5. Conclusions

Context
Hundreds of studies reveal that most cases of depression remain undetected
and untreated. Yet there is growing concern that efforts to increase detection of
depression entail unacceptable numbers of persons who are not depressed
nonetheless being given a diagnosis and receiving medication. What factors
underlie false-positive and false-negative errors? How might clinicians and
services address these detection errors?

1. Introduction to the Problem of Over- and Under-Detection


Only about half of primary care practitioners (PCPs) feel confident in
diagnosing depression or assessing suicide risk.1–6 Yet the issue of underdetection
is by no means confined to PCPs7–13 or to depression.14,15 Convincing data
show that clinicians in all medical specialties have difficulty recognizing
mental disorders, including depression, anxiety, delirium, and
dementia.16,17 Less discussed in the literature but increasingly recognized as
important is the issue of overdetection. In this chapter I will review the
predictors of diagnostic errors (false positives and false negatives) with
reference to depression in primary care. I will focus on two essential barriers to
correct identification: communication and illness complexity.
To meaningfully discuss errors in recognition, it is important to first
establish baseline rates of depression. Prevalence exerts a powerful influence
upon detection accuracy, not least because clinicians usually have a higher
index of suspicion for high-risk patients. The World Health Organization
(WHO) study on Psychological Problems in General Health Care (PPGHC),
conducted across 14 countries, found that 26% of individuals visiting their
PCP had at least one psychiatric disorder as defined by ICD-10 criteria.18
Fourteen percent had major depression. Almost identical rates were reported
from the European Study of the Epidemiology of Mental Disorders
(ESEMeD).19,20 If one examines depression in older people, the point pre-
valence of major depression is lower in rural than urban primary care
practices (8.3% versus 14.8%).21 Further, if one combines a 14% rate of
major depression with 10% who have minor depression, then the combined
rate approaches 25%.22

How Many Cases of Depression Are Detected in Routine Care?


Approximately 100 studies concerning the unassisted recognition rate of
depression in primary care have been published, but only a third have used a
robust semi-structured interview as a gold standard.23 Of these at least 10 have
had samples of more than 1,000 and 17 studies examined both the ability of
clinicians to rule in and rule out a diagnosis (see Table 3.1). From these studies
PCPs’ pooled sensitivity is 48% and specificity 70%. At a prevalence of 16%,
the positive predictive value (PPV) is 21.4% and the negative predictive value
(NPV) is 87.4%. In a low-risk sample where the prevalence is 10%, the PPV
becomes 14% and NPV 92%. This is best illustrated in a Bayesian plot of
conditional probabilities (Fig. 3.1).
Looked at descriptively at a prevalence of 16%, an average PCP would
correctly identify 8 out of 16 depressed cases, missing 8 true cases. He or
she would correctly reassure 57 out of 84 non-cases but falsely diagnose 27
people as depressed (Fig. 3.2). Thus, the number of correctly identified people
per 100 screened would be 64 (the number needed to screen would be 3.5 to
correctly identify one true case or non-case). Out of every five cases thought to
be depressed, only one would be a true case (PPV = 21.4%). Out of every 10
cases thought to be well, approximately 9 would be correctly reassured
(NPV = 87.4%).
In a low-risk sample (such as a rural practice) where the prevalence is 10%,
an average PCP would correctly identify 5 out of 10 cases, missing 5 true
cases, and would correctly reassure 60 out of 90 non-cases, falsely
diagnosing 30 people as depressed. In a high-risk sample (such as patients with
known physical disease), at a prevalence of 25%, Bayesian analysis suggests
that an average PCP would correctly identify 12 out of 25 cases, missing 13
true cases, and would correctly reassure 50 out of 75 non-cases, falsely
diagnosing 25 people as depressed.
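
The way post-test probabilities shift with prevalence can be sketched with Bayes' theorem. This is an illustration under stated assumptions, not the author's calculation: sensitivity is taken as the pooled 48%, and specificity is set to 0.672, the value implied by the per-100 counts in Figure 3.2 (for example 56.7 correctly reassured against 27.6 false positives), which the text rounds to 70%. The function and constant names are hypothetical.

```python
def post_test_probabilities(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a given pre-test probability (prevalence)."""
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1.0 - sensitivity)
    true_neg = (1.0 - prevalence) * specificity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


SENSITIVITY = 0.48   # pooled unassisted sensitivity quoted in the text
SPECIFICITY = 0.672  # assumption: inferred from the Figure 3.2 counts

if __name__ == "__main__":
    for prevalence in (0.10, 0.16, 0.25):
        ppv, npv = post_test_probabilities(prevalence, SENSITIVITY, SPECIFICITY)
        print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumptions the results land close to the values quoted in the text (PPV about 14% and NPV about 92% at 10% prevalence), with small differences attributable to rounding in the published counts. Plotting the same function over prevalences from 0 to 1 would give curves of the kind shown in Figure 3.1.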

[Plot: post-test probability (y-axis, 0 to 1) against pre-test probability (x-axis, 0 to 1).
Curves: unassisted attempt to rule in depression; unassisted attempt to rule out depression;
baseline probability.]

Figure 3.1. Bayesian plot of conditional pre-test/post-test probabilities.

Prevalence | False Negatives (%) | Correctly Diagnosed (%) | Correctly Reassured (%) | False Positives (%)
25% | 13.0 | 12.0 | 50.4 | 24.6
10% | 5.2 | 4.8 | 60.5 | 29.5
16% | 8.1 | 7.5 | 56.7 | 27.6

Figure 3.2. Rates of correct and incorrect identification per 100 selected cases in
primary care.

Clinicians do less well with minor and mild depression, a
problem that is shared by those using screening tools.24
Underrecognition converts into undertreatment, as recognized patients are
more likely to be offered mental health interventions.25 Data from ESEMeD
show that only 15.1% of those with an identified mood disorder and 23.2%
with an anxiety disorder received either drug or psychological treatment.26
Maginn and colleagues (2004)27 found that PCPs recorded active management
of a psychological problem in 37% of patients whom they rated as cases. Of
these, 24% were prescribed psychoactive drug treatment, 5% were referred to
psychiatric or psychological services, and 3% were offered both drug and
psychological treatments. Surprisingly, only 5% were offered a follow-up
appointment with their PCP. Wittchen and colleagues28 found somewhat
more favorable rates of conversion to treatment in a large study of 20,421
primary care patients in Germany. After correctly identifying depression
(according to the ICD-10 definition), doctors prescribed drug treatments in
60.8%, prescribed non-drug treatments in 24.9%, and referred the patient to a
mental health specialist in 10%. The take-home message is that the typical
proportion of recognized patients offered treatment from the large ESEMeD,
PPGHC, and INSERM studies is approximately 20%.

Textbox 3.1. Case History: An Example of a Difficult Case?

A previously well 58-year-old man comes to see his GP for the first time soon
after discharge from hospital with a dominant hemisphere stroke from which
he has difficulty walking and word finding. His main complaints are physical,
notably discomfort on walking, fatigue, loss of appetite, and insomnia. His
GP is not sure if he is depressed but asks about low mood and loss of interest.
Mood is indeed low since the stroke and motivation is poor, but interest,
weight, and concentration are preserved. There is no hopelessness, guilt, or
suicidal thoughts.

Understanding Detection Errors


To go beyond raw rates of detection accuracy, detailed studies examining the
types of diagnostic error are needed. Tiemens and colleagues (1999)12 found
that only 26% of missed cases (false negatives) were complete omissions,
while 25% were underestimates of severity (eg, diagnosing subthreshold instead
of mild) and 38% were misidentifications. Conversely, of false-positive
diagnoses, 35% were overestimates of severity, 24% were misdiagnoses, and
41% were complete errors. Diagnostic errors are illustrated in Figure 3.3,
using data from Wittchen and colleagues (2002).16 It can be seen that when
deliberating both true cases and true non-cases, there is about a 25% rate of
uncertainty, which is an area for improvement. It also helps explain the
considerable variance between recognition studies, as these possible cases
are sometimes included in those detected and sometimes in those missed. In
the MAGPIE study, Bushnell and associates (2004)29 found that 38% of
depression cases were not recognized. Reasons for this were not categorizing
the patient’s psychological issues as clinically significant (23.4%), recognizing
clinical significance but not ascribing a particular diagnosis (7.1%), or the PCP
making an explicit diagnosis of something other than depression (7.7%).
What, then, distinguishes one clinician from another? Rogers (2001)30
suggested several types of common clinical error when attempting to make a
psychiatric diagnosis: idiosyncratic language in clinical questioning, idiosyn-
cratic coverage in clinical questioning, idiosyncratic sequence of clinical
questioning, idiosyncratic recording of responses and idiosyncratic rating of
severity.

(a) [Bar chart. Vertical axis: 0.0 to 60.0. Horizontal axis: Not currently ill, Borderline case, Mild case, Moderate case, Severe case, Very severe case.]

Figure 3.3a. and 3.3b. Severity estimates by general practitioners of nondepressed and
depressed patients. Adapted from Wittchen HU, Kessler RC, Beesdo K, et al. Generalized
anxiety and depression in primary care: prevalence, recognition, and management. J Clin
Psychiatry. 2002;63(suppl 8):24–34.

(b) [Bar chart. Vertical axis: 0.0 to 60.0. Horizontal axis: Not currently ill, Borderline case, Mild case, Moderate case, Severe case, Very severe case.]

2. Predictors of Detection
There have been some impressive studies examining what factors influence
correct detection, although few concerning the influences upon willingness to
look for symptoms of depression. Borowsky and colleagues (2000)31 con-
ducted an impressive study involving 19,309 patients from 349 PCPs in
Boston, Chicago, and Los Angeles. All underwent the MOS eight-item
Burnam screen for depression, and 1,610 underwent a Diagnostic Interview
Schedule (DIS) for DSM-III. Of the patients, 661 were depressed, although
only 70 had current major depression. Physicians were less likely to detect
depression in African Americans, men, and those younger than 35 years and
more likely to detect depression when comorbid hypertension or diabetes was
present. Hickie and colleagues (2001)32 looked at a large sample of 46,515
patients attending 386 PCPs; 56% of cases were not recognized. This is
probably the most comprehensive study of predictors of recognition available.
Patients were more likely to be assessed psychologically if they were middle-
aged, female, Australian-born, unemployed, single, or presenting with mainly
psychological symptoms or for psychological reasons. Doctor characteristics
associated with willingness to assess were being over 35 years old, having an
interest in mental health, having had previous mental health training, being in
part-time practice, seeing fewer than 100 patients per week, and working in
regional centers. Thompson and colleagues (2001)33 examined recognition
among 156 PCPs in the United Kingdom, involving 18,414 individuals. The
prevalence of depression was 20% based on a 7/8 cutoff on the HADS
depression subscale. The mean recognition sensitivity was 36% and recogni-
tion specificity was 91.5% (Fig. 3.4). Women and unemployed people were
more likely to be detected, while the elderly and retired were more likely to be
missed. However, these relationships were confounded by severity of depres-
sion or anxiety: increased anxiety improved recognition of depression.
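
To see what these recognition figures imply for an individual diagnosis, the reported prevalence, sensitivity, and specificity can be combined into predictive values. The short Python sketch below is illustrative only: it assumes that the mean sensitivity (36%) and mean specificity (91.5%) can be applied to the whole sample at the stated 20% prevalence, a simplification not made in the original study, and the function name predictive_values is hypothetical.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) from prevalence, sensitivity, and specificity.

    All inputs are proportions between 0 and 1.
    """
    true_pos = prevalence * sensitivity                # depressed and recognized
    false_neg = prevalence * (1 - sensitivity)         # depressed but missed
    false_pos = (1 - prevalence) * (1 - specificity)   # not depressed but labeled
    true_neg = (1 - prevalence) * specificity          # not depressed, not labeled
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Figures quoted in the text: 20% prevalence (HADS-D),
# mean sensitivity 36%, mean specificity 91.5%.
ppv, npv = predictive_values(0.20, 0.36, 0.915)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 51.4%, NPV = 85.1%
```

Under these assumptions, roughly half of the patients a PCP labels as depressed would score above the HADS-D cutoff, and about 85% of those judged not depressed would score below it.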
Wittchen and colleagues (2002)16 conducted a large study of PCP recogni-
tion in Germany. This impressive nationwide study recruited a total of 20,421
patients, attending 633 PCPs. Taking the doctors’ decision of definite or
probable depression, 75% of all DSM and 59% of all ICD-10 diagnoses were
recognized by the treating physician, albeit with an 11.7% false-positive rate.

[Bar chart. Vertical axis: 0 to 0.3. Horizontal axis: HADS-D score (Eight to Twenty-one). Series: Proportion Missed, Proportion Recognized.]

Figure 3.4. Burden and detection of depression by Hampshire (U.K.) general
practitioners. 36% of depression (blue) was detected and 64% was missed (red). 72.6% of
all omissions occurred at a HADS-D score of between 8 and 10. Adapted from Thompson C,
Ostler K, Peveler RC, et al. Dimensional perspective on the recognition of depressive
symptoms in primary care: The Hampshire Depression Project 3. Br J Psychiatry.
2001;179:317–323.

Multiple logistic regression revealed that recognition was associated with prior
treatment episodes, increasing number of depression symptoms, patient’s
higher age, practice experience of greater than 5 years, and the presence of psychomotor
retardation. In the MAGPIE study from New Zealand, 63.7% of patients
with a CIDI-diagnosed disorder were recognized as having psychological
problems, although only 40% were recognized as having a clinically significant
psychological problem and only 33.8% were given an explicit diagnosis.29
In those seen five or more times during the previous year, these recognition
figures increased to 80.2% compared with 28.8% among patients not seen in
the previous year. Maginn and associates (2004)27 examined PCP recognition
of distress in South London. Overall, PCPs identified 65% of cases, but Black
African patients were less likely to be detected or treated than Black Caribbean
and White English patients. Willingness to talk to the doctor about psycholo-
gical problems was the main predictor of detection. Ethnicity did not indepen-
dently predict detection, but Black African individuals were less likely to talk
to their PCP about psychological problems. Worryingly, half as many Black
African individuals with detected distress were offered treatment compared
with English cases (22% versus 41%). Pfaff and Almeida (2004)34 found that
39.9% of patients (87/218) were correctly classified as depressed by their PCP.
Older patients were more likely to be incorrectly classified as ‘‘not depressed’’
by their PCP when they were born outside of Australia or New Zealand, did not
smoke or use sleeping tablets, acknowledged milder levels of depression, and
presented with primarily somatic complaints.
Aragones and colleagues (2004)35 screened 209 Zung-positive patients and
97 negative patients with the SCID. Detection was associated with educational
level, severity of the depression, level of impairment, and the complaint of
explicit psychological symptoms. Antidepressant treatment was associated
with marital status, severity of and impairment from the depression, frequency
of visits to the family physician, and the patient’s complaint of psychological
symptoms. Aragones and colleagues went on to study predictors of false-positive
diagnoses (2006)36 and found that PCPs had a nearly 50% rate of false-positive
diagnosis. Factors associated independently with overdiagnosis were
a higher SDS symptom score, a lower Global Assessment of
Functioning score, a previous history of depression, and the absence of generalized
anxiety. Nuyen and colleagues (2005)37 found that among 191 depressed
primary-care patients diagnosed using the CIDI, 28.8% were recognized and
recorded by PCPs over the same period. Patients without chronic somatic
comorbidity, with a lower educational level, with less severe depression, and
with fewer PCP contacts were all significantly more likely not to be diagnosed
as depressed. Verhaak and coworkers (2006)38 conducted a survey of primary
care contacts of patients with a DSM-IV diagnosis of affective disorder,
anxiety disorder, or alcohol abuse. Forty percent visited their PCP but received
only a somatic diagnosis and 50% were given a psychological or social
diagnosis at least once during 1 year. The chances of a psychological PCP
diagnosis increased with the number of PCP contacts. Patients who were given
a psychological or social diagnosis by their PCP had a higher GHQ score,
lower mental functioning scores on the SF-36, and far more visits to their PCP
than those not diagnosed as psychologically ill. Finally, patients given a
diagnosis tended to express slightly more confidence in their PCP.
McCall and colleagues (2007)39 looked at predictors of recognition of
distress in Austrian primary care practice. Twenty-eight PCPs completed a
clinical audit on 868 of their patients who completed the GHQ-28. PCPs
correctly identified 43% of GHQ-positive cases as having distress. For indivi-
dual PCPs the rate of correct recognition varied considerably, from 4% to
100%. Correct recognition was associated with years of experience as a PCP,
older age of patient, and greater severity of distress.
Clearly, there is a wide variation in the ability of GPs to diagnose mental
health problems, due in part to differences in knowledge, skills, and attitudes
(Textbox 3.2).40,41 Most clinicians have difficulty recalling the current criteria
for major depression.42 Further, only one third claim to make diagnoses based
on validated criteria.43 Self-confident, outgoing physicians with high academic
ability appear to make more accurate diagnoses44—yet this same formula would
apply to psychiatrists’ ability to detect physical illness. One apparently simple
solution is to increase the length of the consultation. There is reasonably good
evidence that short appointments impair detection in difficult cases.45 However,
paradoxically, lengthening the consultation may not improve recognition.46
Verhaak and colleagues (2007)47 found that in general, healthcare system
characteristics do affect PCPs’ performance in psychosocial care. PCPs’ work-
load was not related to their awareness of psychological problems and hardly
related to their communication, except for the finding that a PCP with a
subjective experience of a lack of time is less patient-centered (Textbox 3.3).48

Textbox 3.2. Possible Barriers to Recognition (Diagnostic Barriers)

Patient Related
Younger patient
Male gender
Reluctance to seek help
Reluctance to disclose symptoms
Disclosure of only somatic symptoms
Low awareness of emotional symptoms
Fear of stigma/label of mental illness

Clinician Related
Low clinician confidence and skills
Low therapeutic alliance
Low consultation time
Single appointment only
Low index of suspicion
Rare inquiry about depressive symptoms
Caution re: stigma of mental illness

Textbox 3.3. Basic Patient-Centered Interviewing Method

Step 1. Welcoming
Welcome the patient
Introduce self and identify specific role
Ensure patient comfort and privacy
Step 2. Set agenda
Indicate time available and objective
Summarize what is already known and others involved
Indicate own needs
Clarify what patient wants to discuss
Step 3. Non-focused interviewing
Open-ended beginning question: ‘‘How have things been recently?’’
Attentive (active) listening (with prompts): ‘‘That sounds difficult’’
Observe nonverbal cues
Step 4. Focused interviewing
Obtain description of main problem and secondary problems
Clarify the development and context of the problems
Ask about emotional and functional impact of the problems
Step 5. Transition to agreed action
Give brief summary and check accuracy

3. Patient and Clinician Influences on Detection


Do Patients Volunteer Symptoms of Depression?
It should be no surprise that recognition of distress and depression is linked
with the number of symptoms reported during a consultation.49 Recognition is
facilitated when patients report psychological symptoms of anxiety or depression
early in the consultation.50 Patients who normalize or minimize their
symptoms are less likely to be identified.51 It has been reported that detection
rates may be 100% in those who spontaneously complain of emotional pro-
blems.52 However, patients do not usually complain of ‘‘depression,’’ and
patients’ views about their depressive symptoms are significantly different
from conventional medical views.53,54 Many groups have noted that patients
with depression often present with physical symptoms rather than psycholo-
gical complaints, and the depression is less likely to be recognized as a
consequence.56–62 Perhaps 60% to 70% of patients with depression and anxiety
have predominantly somatic presentations.63,64 Such patients tend to be older
and have less severe depression but not necessarily more comorbid physical
illness. Many authors have shown that patients are often reluctant to discuss
emotional issues with health professionals.65–67 Patients have their own readi-
ness to disclose.68 Indeed, willingness to discuss emotional issues may be one
of the strongest predictors of detection.69 Some ethnic groups (whites and
Hispanics) appear more likely to communicate with a clinician about depres-
sion than others (African Americans).70 However, most patients will discuss
psychological symptoms if asked.71,72 Reassuringly, Davenport and associates
(1987)73 found that there is some association between severity of distress and
spontaneous verbal cues, but this is by no means a perfect correlation,
and those cues are easily overlooked. O’Connor and colleagues (2001)74
examined 1,021 older patients in Melbourne, Australia. Symptom disclosure
was associated with higher depressive scores, previous contact with a psychia-
trist, and female gender; even so, 48% of persons with ICD-10 moderate or
severe depressive episode had not reported any current complaints to their
doctor at the time of the interview. In the MAGPIE study 30% of all primary
care patients (and 37% of patients with current psychological
symptoms) did not disclose their psychological problems spontaneously;
younger patients, those consulting more frequently, and those with greater
psychiatric disability were more likely to report non-disclosure.75 However, in
this study, reported nondisclosure did not influence detection rates. Verhaak
and colleagues76 collected comprehensive data on detection rates from con-
sultations across 10 European countries and found low rates of spontaneous
emotional complaints.
What, then, are the reasons for not discussing emotional difficulties? The
most frequently given reason in the MAGPIE study was the belief that the PCP
is not the ‘‘right’’ person to talk to (33.8%) or that mental health problems
should not be discussed at all (27.6%). In a survey of primary care attendees
who were high scorers on the GHQ, more than 75% had not mentioned any
emotional problems during a consultation.77 Thirty-six percent felt they were
able to cope without emotional help, but 45% gave reasons including
psychological embarrassment and hesitation to trouble the doctor, and a further
19% were deterred by the doctors’ interview behaviors (see below). Thirty-nine
percent felt there was little the doctor could do to help with their emotional
problems. In a study by Del Piccolo and associates (1998),78 about two thirds
of patients with stressful life events and social problems had mentioned them to
their PCP. A positive attitude about confiding and emotional distress were the
best predictors of confiding. In women, past confiding and a longstanding
relationship with the PCP were also important. Pollock79 summarized the
difficulty, stating that medical consultations are difficult encounters for most
patients, who often strive to protect their privacy and personal integrity by
‘‘maintaining face,’’ but this in turn may impede the diagnostic process.

Do Clinicians Ask About Depression?


Communication behaviors of clinicians have been much discussed. Individual
clinicians differ in their communicative style, with some more patient-centered
and others less so, but most adjust their style according to the situation, such as
illness severity.79–81 In a large study recording responses of PCPs to standardized
patients, biomedical inquiry/explanations, nonspecific acknowledgment, and
reassurance were common, whereas empathy, expressions of uncertainty, and
exploration of psychosocial factors and emotions were uncommon.82 Yet in
consultations about psychosocial issues, doctors show more emotional behavior,
ask more questions, and give less information than in other consultations.83,84
Feldman and colleagues (2007)85 found that history taking about depression was
directly associated with the likelihood of a chart diagnosis of depression and the
provision of minimally acceptable initial depression care. When PCP decisions
for late-life depression were monitored, a recorded treatment decision occurred
in about 5% of visits, a deferred or monitor-only decision occurred in about a
third of visits, and no decision was made in about half of visits.86 Saltini and
coworkers (2004)87 found that although occupational, financial, and housing
problems and life events of loss were the most important predictors of the GHQ-
12 case definition, PCPs gave significantly more importance to psychiatric
treatment, psychopharmacological drug use, and chronic illness.
A number of authors have commented on suboptimal communication stra-
tegies from clinicians.88 Inadequate interview and diagnostic skills influence
detection.89,90 For example, clinicians appear to miss most cues and concerns
and adopt behaviors that discourage disclosure.91,92 More sophisticated ana-
lysis with video recording of consultations is revealing. In one of the best
examples, Deveugele and colleagues (2004)93 analyzed 2,095 consultations
from 168 PCPs using the Roter Interactional Analysis System. Clinicians
differed markedly in their psychosocial and emotional communication. Some
studies attempt to go further and uncover an explicit link with detection. In a
seminal study from Marks and associates (1979),94 a research psychiatrist
made detailed observations on 2,098 interviews carried out by 55 PCPs. The
authors found that PCPs who had a better conceptual understanding of mental
illness produced a more accurate diagnosis of the patient’s condition. They also
noted that PCPs with an interest in psychological medicine, those with higher
levels of empathy, and those who asked about social and family problems more
accurately diagnosed psychiatric illness. Badger and colleagues (1994)95 found
two communication behaviors that predicted successful recognition of depres-
sion: the proportion of the interview devoted to emotional issues and the use of
broad, open-ended psychosocial questions. Carney and coworkers (1999)96
found that PCPs who recognized depression asked twice as many questions
about feelings and affect compared with those who did not. In a series of
interviews, Rost and colleagues (2000)97 found that physicians and patients
discussed depression in 47.9% of untreated patients. Chronic physical comor-
bidity decreased the odds that physicians and untreated patients discussed
depression as a possible diagnosis. Interestingly, PCPs who have a preference
for psychotherapy rather than antidepressant treatment also appear more accu-
rate in diagnosing depression.98
There are a number of important barriers to detection, including clinician
attitude (Textbox 3.4). Saltini and associates (2004)99 found that although
occupational, financial, and housing problems and life events of loss were
the most important predictors of the GHQ-12 case definition, PCPs gave
significantly more importance to psychiatric treatment, psychopharmacological
drug use, and chronic illness. Travado and colleagues6 found that low
psychosocial orientation and burnout symptoms were associated with lower
confidence in communication skills and higher expectations of a negative
outcome after physician–patient communication.

Textbox 3.4. Top 10 GP Perceived Barriers to Dealing with Depression

1. Lack of access to mental health specialists (51.4%)
2. Lack of time (50.6%)
3. Poor reimbursement for depression treatment (50.4%)
4. Distracted by other presenting problems (39.4%)
5. Patient reluctant to be referred to a specialist (37.3%)
6. Workload prevents adequate attention to depression (32.3%)
7. Patient/family reluctance to accept diagnosis of depression (21.7%)
8. Patient inability/unwillingness to discuss depressive symptoms (16.2%)
9. Lack of accessible assessment tools for depression (15.9%)
10. Patient reluctant to begin antidepressant medications (8.6%)
Adapted from Richards JC, Ryan P, McCabe MP, et al. Barriers to the effective management of
depression in general practice. Aust N Z J Psychiatry. 2004;38:795–803.

In a study of 50 PCPs and
473 patients in Portland, Oregon, routine office visits were audiotaped and
analyzed for communication behaviors and emotional tone using the Roter
Interactional Analysis System.100 Physicians with more positive attitudes to
psychosocial aspects of patient care had more psychosocial discussions in
visits. A large-scale practice audit in Australia found that PCPs with a declared
interest in mental health and those who had obtained mental health training
were more likely to see more patients with depression and more likely to
provide appropriate mental health assessment and treatments. In some studies
insufficient undergraduate and postgraduate training is influential,101 as well
as insufficient time devoted to adequate diagnostic assessment, and a lack of
acquisition of new knowledge relevant to provision of treatments.
Several recent observational studies have examined physician habits in
relation to late-life depression. In a study based in nine primary care clinics
involving 1,023 individuals, Fischer and colleagues (2003)102 found that
physicians were only 6% as likely to ask older depressed patients about
suicide risk and about one-fifth as likely to ask if they felt depressed
compared with younger depressed patients. Tai-Seale and colleagues
(2005)103 observed 389 elderly patients and 33 physicians using video of
their clinical interactions. Physicians assessed depression in only 14% of
the visits and used validated tools only three times. Depression assessment
was more likely in visits that covered multiple topics, contrary to the
‘‘crowding-out’’ hypothesis. Tai-Seale et al (2007)104 observed 35 PCPs
interviewing 366 of their elderly patients. Discussion of mental health
topics occurred in only 22% of visits despite a high prevalence of depres-
sion. A typical mental health discussion lasted approximately 2 minutes.104
Adelman and colleagues (2008)105 audiotaped 482 follow-up visits at three
sites. Depression was discussed in 7.3% of medical visits. Physicians raised
the topic of depression in 41% of visits, patients raised the topic in 48% of
visits, and accompanying persons raised it in 10% of visits. The topic of
depression was raised almost exclusively in the first 2.5 years of the patient–
physician relationship. Physicians with some geriatric training were more
likely to discuss depression.
However, it is important to remember that patient and clinician communication
are reciprocally related. Patient perceptions of how the PCP related to
him or her in the consultation correlate with reduction in symptom severity 3
months later.106 Goldberg and colleagues (1993)107 found that patient cues
were influenced by the PCP’s behavior, increasing with patient-centered
behaviors such as empathic statements or directive questioning about psycho-
logical issues, and decreasing with medical questions and other doctor-led
behaviors. Similarly, others found that the patient’s willingness to disclose
information is related to physician facilitation, and patient emotional
expression is associated with a warm and empathetic attitude of the
physician.108 Physicians may signal to patients, wittingly or unwittingly,
how emotional problems will be addressed, influencing how patients per-
ceive their interactions with physicians regarding emotional problems. Del
Piccolo and coworkers (2000)109 also found that the proportion of cues
given by patients was related to the PCP’s verbal behavior, increasing
with closed psychosocial questions and decreasing with the use of active
interview techniques. In fact, patients with detected distress gave more
cues, often with psychological content, whereas patients with undetected
distress gave mainly cues related to their lifestyle and life episodes.
Recently, an international study by Verhaak and colleagues (2007)76
found that eye contact and empathy and asking questions about psycholo-
gical or social topics were associated with more awareness of patients’
psychological problems.
One other important predictor of diagnostic sensitivity (recognition)
is the amount of contact with the patient.110,111 In the MAGPIE study
from New Zealand, 80.2% of cases seen five or more times during the previous
year were correctly identified, compared with 28.8% of those patients not seen
in the previous year. Recognition also accumulates over time: only 30% of cases
remain undetected at 1 year and 14% at the end of 3 years.112,113 Using patient self-report regarding
the adequacy of diagnosis/treatment, Jackson and colleagues114 found that the
cumulative recognition rate was a modest 56% for major depression and 20%
for minor depression, even after 5 years.

4. Illness-Related Influences on Detection


There is some evidence that clinicians find mental illness difficult to deal with
and awkward to diagnose. For example, PCPs in the United States appear
reluctant to code patients as depressed.115 Somatic complaints thought to have
a psychological basis are also perceived as difficult.116,117 In a study of 500
primary care visits, 15% were perceived as difficult by clinicians, and these
were more likely to involve a mental disorder, more than five somatic symp-
toms, more severe symptoms, poorer functional status, more unmet expecta-
tions, less satisfaction with care, and higher use of health services.118
Interestingly, clinicians with poorer psychosocial attitudes perceived three
times as many encounters as being difficult. In the same study, the authors
showed that a 2-hour physician workshop followed by information
provided before each visit reduced the physician-perceived difficulty of the
encounter.119

Table 3.1. Large-Scale International Studies on Mood Disorders Recognition and Treatment

Study: Institut National de la Santé et de la Recherche Médicale (INSERM) study
Setting: Paris, France, 1996–97
Sample: 2,419 patients (aged 18–70 years); 238 were found to be depressed and were followed up for 6 months.
Instrument: MINI
Prevalence of Mood Disorders in Primary Care: Major depression (14.0%), minor depression (3.1%), and dysthymia (2.1%)
Recognition: Major depression (26%); Any mental disorder (58%)
% Offered Antidepressants: Major depression (21%)

Study: European Study of the Epidemiology of Mental Disorders (ESEMeD)
Setting: Community study in Belgium, France, Germany, Italy, the Netherlands, and Spain
Sample: 21,425 non-institutionalized adults ≥18 years old (including those 65 years and older)
Instrument: WMH-CIDI
Prevalence of Mood Disorders in Primary Care: Lifetime prevalence rates of 13.4% for major depression and 4.4% for dysthymia were reported.
Recognition: Not examined
% Offered Antidepressants: Major depression (21.2%)

Study: World Health Organization study on Psychological Problems in General Health Care (PPGHC)
Setting: 14 countries worldwide
Sample: 26,422 consecutive patients (aged 15–65 years)
Instrument: General Health Questionnaire (GHQ-12)
Prevalence of Mood Disorders in Primary Care: Mental disorders (24%); Major depression (13.7%); Minor depression (3.6%); Dysthymia (3.6%)
Recognition: Major depression (15%); Any mental disorder (54%)
% Offered Antidepressants: Major depression (15%)

[Bar chart. Vertical axis: 0 to 0.9. Horizontal axis: HADS score (Eight to Twenty-one).]

Figure 3.5. Detection sensitivity (%) by severity of depression according to the HADS
scale. Adapted from Thompson, C., Ostler, K., Peveler, R. C., et al (2001) Dimensional
perspective on the recognition of depressive symptoms in primary care. The Hampshire
Depression Project 3. British Journal of Psychiatry, 179, 317–323.

Most depressions in primary care are mild to moderate in severity (90%
have a score of 8 to 13 on the HADS), and the detection of mild disorders is a
challenge because symptoms do not differ greatly from those of healthy but
stressed individuals.120,121 Thompson and colleagues (2001)33 examined the
relationship between severity of depression on the HADS-D and proportion of
cases detected (Fig. 3.5). Generally, higher severity of depression is associated
with greater recognition, but because of the great burden of mild depression,
50% of all correct recognition occurs at a HADS-D score of between 8 and 10.
Further, many cases feature physical or mental comorbidities such as anxiety.
Comorbidity may decrease recognition.122 In primary care only about 10% of
all depressions do not feature comorbidity (5% of those with major depres-
sion). About 50% have physical comorbidity and an overlapping 70% to 80%
psychiatric comorbidity (of which 40% to 50% is anxiety). Patients with
anxiety or chronic mixed anxiety and depression were less likely to be offered
active treatment than those considered to have depression.123 One hypothesis is
that somatic complaints, particularly in late-life depression, might cause the
clinician to focus on physical rather than mental symptoms. Many clinicians
have been taught to take an exclusive approach and ignore such complaints, but
accumulating evidence suggests this is probably incorrect and that somatic
symptoms should be ‘‘counted’’ toward depression even when another physical
illness like stroke or Parkinson’s disease is present. This is discussed further in
Chapters 10 and 11.
However, this ‘‘crowding-out’’ hypothesis has been refuted. For example,
Ani and coworkers (2008)124 found that comorbidity had no effect on recognition
accuracy. Pfaff and Almeida (2005)125 found that predictors of detection
included concomitant polypharmacy (implying higher comorbidity) as well as
higher CESD scores, presenting with psychological complaints, and higher risk
of suicide. O’Connor and associates (2001)126 found that comorbid pain
positively influenced detection of late-life depression. Similarly, Borowsky
and associates (2000)31 found superior detection of depression if comorbid
diabetes or hypertension were present. Other factors were previous psychiatric
consultation, number of years as a patient, severity of depression, and disclo-
sure of depression to the physician. Indeed, the co-occurrence of MDD and
anxiety might actually facilitate recognition of depression127 or psychiatric
caseness.128–130 When faced with ambiguity and diagnostic difficulties, some
evidence suggests that only a minority of clinicians choose to explore the issues
in more detail.131

5. Conclusions
Depression is often a complex comorbid presentation associated with frequent
primary care attendance.132 Recognition of depression in primary care and
hospital settings is poor, yet it is worth remembering that depression is a
relatively uncommon reason for presentation in primary care, with at least six
out of seven unselected patients not having depression. In primary care, time and
resources are limited, and hence psychological or even structured self-help
programs are often not available. The most plausible factor explaining under-
treatment is underrecognition. Antidepressants are typically the treatment of
choice for clinicians but not for patients, and hence managing depression can
be seen as difficult.133 Against this background, only about a half of true cases
are diagnosed and perhaps a quarter treated. Conversely, about 70% of non-
cases are correctly reassured.
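
The approximate figures in this paragraph can be combined into a worked example. The sketch below is illustrative only. It assumes a prevalence of exactly one in seven, a sensitivity of 50% ("about a half of true cases are diagnosed"), and a specificity of 70% ("about 70% of non-cases are correctly reassured"), applied to a hypothetical cohort of 700 primary care attendees. The function and variable names are invented for this example and do not come from the cited studies.

```python
from fractions import Fraction


def unassisted_recognition(n_patients, prevalence, sensitivity, specificity):
    """Expected 2x2 counts for clinician recognition without a screening tool."""
    depressed = n_patients * prevalence
    not_depressed = n_patients - depressed
    true_pos = depressed * sensitivity        # depressed and recognized
    false_neg = depressed - true_pos          # depressed but missed
    true_neg = not_depressed * specificity    # non-cases correctly reassured
    false_pos = not_depressed - true_neg      # non-cases labeled as depressed
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "false_pos": false_pos,
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


# Assumed round numbers taken from the text; the cohort size is hypothetical.
result = unassisted_recognition(
    n_patients=700,
    prevalence=Fraction(1, 7),
    sensitivity=Fraction(1, 2),
    specificity=Fraction(7, 10),
)

for name, value in result.items():
    print(f"{name}: {float(value):.3f}")
```

Under these assumptions, 50 of 100 depressed attendees are recognized while 180 of 600 non-depressed attendees are misidentified. The positive predictive value is therefore 50/230 (about 22%) and the negative predictive value is 420/470 (about 89%), which shows how a low base rate lets false positives outnumber true positives even when most non-cases are correctly reassured.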
Two major factors appear to influence detection: how the person with depres-
sion describes his or her symptoms and how the clinician interviews the patient.
The nature of the therapeutic relationship is important. Even in the face of a high
frequency of contact, a therapeutic relationship that is noted by the clinician (or
patient) to be unhelpful is likely to decrease the recognition rate. Discussion of
emotional distress in primary care is also linked with high patient satisfaction.134
Additional factors such as the skill of the clinician and the use of tools may also
play a role (see Chapter 7). There are certainly many potential barriers to
successful diagnosis and treatment.135 Mental health skills training has been
effective in improving recognition and management of somatizing and depressed
patients by PCPs, but it remains uncertain whether this translates into improved
clinical outcomes.136–138 Interventions are likely to be most successful where
problems are most serious. For example, Shapiro and colleagues (1987)139 conducted
a randomized clinical trial involving 1,242 patients attending inner-city
PCPs, in which GHQ scores were fed back to the clinicians. Results showed marked increases in
detection but only among the elderly, African Americans, and men.
Clinicians should have a high index of suspicion in frequent attendees, those
with serious or chronic illness, and those who have persistent but unexplained
pain. High vigilance is warranted in patients with somatic symptoms, in
men, and in younger patients.140,141 Ultimately, it is useful to reflect on
patients’ opinions on the importance of primary care for depression.142 The
top four most important needs are the clinician’s interpersonal skills, ability to
recognize depression, the effectiveness of treatment, and problems associated
with treatment.

References

1. Callahan CM, Nienaber NA, Hendrie HC, et al. Depression of elderly outpatients:
Primary care physicians’ attitudes and practice patterns. J Gen Intern Med. 1992;7(1):
26–31.
2. Kaplan MS, Adamek ME, Martin JL. Confidence of primary care physicians in
assessing the suicidality of geriatric patients. Int J Geriatr Psychiatry.
2001;16(7):728–734.
3. Gallo JJ, Ryan SD, Ford DE. Attitudes, knowledge, and behavior of family physicians
regarding depression in late life. Arch Fam Med. 1999;8:249–256.
4. Shao W, Williams J, Lee S, et al. Knowledge and attitudes about depression among
non-generalists and generalists. J Fam Pract. 1997;44:161–168.
5. Feldman MD, Franks P, Duberstein PR, et al. Let’s not talk about it: Suicide inquiry in
primary care. Ann Fam Med. 2007;5(5):412–418.
6. Travado L, Grassi L, Gil F, et al., and the Southern European Psycho-Oncology Study
(SEPOS) Group. Physician-patient communication among Southern European cancer
physicians: The influence of psychosocial orientation and burnout. Psychooncology.
2005;14(8):661–670.
7. Plummer SE, Gournay K, Goldberg D, et al. Detection of psychological distress by
practice nurses in general practice. Psychol Med. 2000;30(5):1233–1237.
8. Cape J, Morris E, Adams N, et al. Identification of psychological morbidity in
older people in primary care by practice nurses. Aging Mental Health.
2003;7(6):446–451.
9. Ryan H, Schofield P, Cockburn J, et al. How to recognize and manage psychological
distress in cancer patients. Eur J Cancer Care. 2005;14(1):7–15.
10. Liu SI, Mann A, Cheng A, et al. Identification of common mental disorders by general
medical doctors in Taiwan. Gen Hosp Psychiatry. 2004;26(4):282–288.
11. Matarazzo JD. The reliability of psychiatric and psychological diagnosis. Clin Psychol
Rev. 1983;3:103–145.
12. Tiemens BG, VonKorff M, Lin EH. Diagnosis of depression by primary care
physicians versus a structured diagnostic interview. Understanding discordance. Gen
Hosp Psychiatry. 1999;21(2):87–96.
13. Smith MV, Rosenheck RA, Cavaleri MA, et al. Screening for and detection of
depression, panic disorder, and PTSD in public-sector obstetric clinics. Psychiatr
Serv. 2004;55:407–414.
14. Ormel J, Koeter MWJ, van den Brink W, et al. Recognition, management and
course of anxiety and depression in general practice. Arch Gen Psychiatry.
1991;48:700–706.
15. Norton J, De Roquefeuil G, Boulenger JP, et al. Use of the PRIME-MD Patient
Health Questionnaire for estimating the prevalence of psychiatric disorders in
French primary care: comparison with family practitioner estimates and
relationship to psychotropic medication use. Gen Hosp Psychiatry.
2007;29(4):285–293.
16. Wittchen HU, Kessler RC, Beesdo K, et al. Generalized anxiety and depression in
primary care: prevalence, recognition, and management. J Clin Psychiatry.
2002;63(suppl 8):24–34.
17. Jackson JL, Passamonti M, Kroenke K. Outcome and impact of mental disorders in
primary care at 5 years. Psychosom Med. 2007;69(3):270–276.
18. Ustun TB, Von Korff M. Primary mental health services. In: Ustun TB, Sartorius N,
eds. Mental illness in general health care: an international study. Chichester, UK:
John Wiley & Sons; 1995:347–360.
19. Alonso J, Angermeyer MC, Bernert S, et al. Prevalence of mental disorders in Europe:
results from the European Study of the Epidemiology of Mental Disorders (ESEMeD)
project. Acta Psychiatr Scand Suppl. 2004;420:21–27.
20. Alonso J, Lépine J-P. Overview of key data from the European Study of the
Epidemiology of Mental Disorders (ESEMeD). J Clin Psychiatry. 2007;68(suppl
2):3–9.
21. Friedman B, Conwell Y, Delavan RL. Correlates of late-life major depression:
A comparison of urban and rural primary care patients. Am J Geriatr Psychiatry.
2007;15(1):28–41.
22. Licht-Strunk E, van der Kooij KG, van Schaik DJF. Prevalence of depression in older
patients consulting their general practitioner in The Netherlands. Int J Geriatr
Psychiatry. 2005;20(11):1013–1019.
23. Mitchell AJ, Vaze A, Rao S. Meta-Analysis of Unassisted Recognition of Depression
in Primary Care: Importance of False Positives and False Negatives. The Lancet 2009
(in press).
24. Lyness JM, Noel TK, Cox C, et al. Screening for depression in elderly primary care
patients. A comparison of the Center for Epidemiologic Studies-Depression Scale and
the Geriatric Depression Scale. Arch Intern Med. 1997;157(4):449–454.
25. Greer J, Halgin R, Harvey E. Global versus specific symptom attributions: predicting
the recognition and treatment of psychological distress in primary care. J Psychosom
Res. 2004;57:521–527.
26. Alonso J, Angermeyer MC, Bernert S, et al. Prevalence of mental disorders in Europe:
results from the European Study of the Epidemiology of Mental Disorders (ESEMeD)
project. Acta Psychiatr Scand Suppl. 2004;420:21–27.
27. Maginn S, Boardman AP, Craig TKJ, et al. The detection of psychological problems
by general practitioners. Influence of ethnicity and other demographic variables. Soc
Psychiatry Psychiatr Epidemiol. 2004;39:464–471.
28. Wittchen HU, Hofler M, Meister W. Prevalence and recognition of depressive
syndromes in German primary care settings: poorly recognized and treated? Int Clin
Psychopharmacol. 2001;16(3):121–135.
29. Bushnell J. Frequency of consultations and general practitioner recognition of
psychological symptoms. Br J Gen Pract. 2004;54(508):838–842.
30. Rogers R. Handbook of diagnostic and structured interviewing. New York: Guilford
Publications; 2001.
31. Borowsky SJ, Rubenstein LV, Meredith LS, et al. Who is at risk of nondetection of
mental health problems in primary care? J Gen Intern Med. 2000;15(6):381–388.
32. Hickie IB, Davenport TA, Scott EM, et al. Unmet need for recognition of
common mental disorders in Australian general practice. Med J Australia.
2001;175:S18–S24.
33. Thompson C, Ostler K, Peveler RC, et al. Dimensional perspective on the recognition
of depressive symptoms in primary care. The Hampshire Depression Project 3. Br J
Psychiatry. 2001;179:317–323.
34. Pfaff JJ, Almeida OP. A cross-sectional analysis of factors that influence the detection
of depression in older primary care patients. Australian N Z J Psychiatry.
2005;39(4):262–265.
35. Aragones E, Pinol JL, Labad A, et al. Detection and management of depressive
disorders in primary care in Spain. Int J Psychiatry Med. 2004;34(4):331–343.
36. Aragones E, Pinol JL, Labad A. The overdiagnosis of depression in non-depressed
patients in primary care. Fam Pract. 2006;23(3):363–368.
37. Nuyen J, Volkers AC, Verhaak PFM, et al. Accuracy of diagnosing depression in
primary care: the impact of chronic somatic and psychiatric co-morbidity. Psychol
Med. 2005;35(8):1185–1195.
38. Verhaak PFM, Schellevis FG, Nuijen J, et al. Patients with a psychiatric disorder in
general practice: determinants of general practitioners’ psychological diagnosis. Gen
Hosp Psychiatry. 2006;28:125–132.
39. McCall L, Clarke D, Trauer T, et al. Predictors of accuracy of recognition of
emotional distress in general practice. Primary Care Community Psychiatry.
2007;12(1):1–5.
40. Millar T, Goldberg DP. Link between the ability to detect and manage emotional
disorders: a study of general practitioner trainees. Br J Gen Pract. 1991;41:357–359.
41. Davenport TA, Hickie IB, Naismith SL, et al. Variability and predictors of mental
disorder rates and medical practitioner responses across Australian general practices.
Med J Australia. 2001;175:S37–S41.
42. Rapp S, Davis K. Geriatric depression: physicians’ knowledge, perceptions and
diagnostic practices. Gerontologist. 1989;29:252–257.
43. Williams Jr JW, Rost K, Dietrich AJ, et al. Primary care physicians’ approach to
depressive disorders: effects of physician specialty and practice structure. Arch Fam
Med. 1999;8(1):58–67.
44. Goldberg D, Steele J, Johnson A, et al. Ability of primary care physicians to
make accurate ratings of psychiatric symptoms. Arch Gen Psychiatry.
1982;39:829–833.
45. Hutton C, Gunn J. Do longer consultations improve the management of psychological
problems in general practice? A systematic literature review. BMC Health Services
Research. 2007;7:71.
46. Howie JG, Porter AM, Heaney DJ, et al. Long to short consultation ratio: a proxy
measure of quality of care for general practice. Br J Gen Pract. 1991;41:48–54.
47. Verhaak PFM, Van Den Brink-Muinen A, Bensing JM, et al. Demand and
supply for psychological help in general practice in different European
countries—Access to primary mental health care in six European countries.
Eur J Public Health. 2004;14(2):134–140.
48. Zantinge EM, Verhaak PFM, de Bakker DH, et al. The workload of general
practitioners does not affect their awareness of patients’ psychological problems.
Patient Education Counseling. 2007;67(1–2):93–99.
49. Kruse J, Schmitz N, Woller W, et al. Why does the general practitioner overlook
psychological disorders in his patient? Determinants of physicians’ identification with
psychological disorders. Psychotherapie Psychosomatik Medizinische Psychologie.
2004;54(2):45–51.
50. Tylee A, Freeling P, Kerry S, et al. How does the content of consultations affect the
recognition by general practitioners of major depression in women? Br J Gen Pract.
1995;45:575–578.
51. Kessler D, Lloyd K, Lewis G, et al. Cross sectional study of symptom attribution
and recognition of depression and anxiety in primary care. BMJ.
1999;318:436–439.
52. Weich S, Lewis G, Mann AH, et al. The somatic presentation of psychiatric morbidity
in general practice. Br J Gen Pract. 1995;45:143–147.
53. Yeung A, Chang D, Gresham RL, et al. Illness beliefs of depressed Chinese American
patients in primary care. J Nerv Mental Dis. 2004;192(4):324–327.
54. Cornford CS, Hill A, Reilly J. How patients with depressive symptoms view their
condition: a qualitative study. Fam Pract. 2007;24(4):358–364.
55. Bridges KW, Goldberg DP. Somatic presentation of DSM-III psychiatric disorders in
primary care. J Psychosom Res. 1985;29:563–569.
56. Susman JL, Crabtree BF, Essink G. Depression in rural family practice: easy to
recognize, difficult to diagnose. Arch Fam Med. 1995;4:427–431.
57. Sartorius N, Ustun TB, Lecrubier Y, et al. Depression comorbid with anxiety: results
from the WHO study on psychological disorders in primary health care. Br J
Psychiatry. 1996;168(Suppl. 30):38–43.
58. Freeling P, Rao BM, Paykel ES, et al. Unrecognised depression in general practice.
BMJ. 1985;290:1880–1883.
59. Tylee AT, Freeling P, Kerry S. Why do general practitioners recognize major depression
in one woman patient yet miss it in another? Br J Gen Pract. 1993;43:327–330.
60. Tylee A, Freeling P, Kerry S, et al. How does the content of consultations affect the
recognition by general practitioners of major depression in women? Br J Gen Pract.
1995;45:575–578.
61. Coulehan JL, Schulberg HC, Block MR, et al. Medical comorbidity of major depressive
disorder in a primary medical practice. Arch Intern Med. 1990;150:2363–2367.
62. Freeling P, Rao BM, Paykel ES, et al. Unrecognized depression in general practice.
BMJ. 1985;290:1880–1883.
63. Keeley RD, Smith JL, Nutting PA, et al. Does a depression intervention result in
improved outcomes for patients presenting with physical symptoms? J Gen Intern
Med. 2004;19:615–623.
64. Vuorilehto M, Melartin T, Isometsa E. Depressive disorders in primary care:
recurrent, chronic, and co-morbid. Psychol Med. 2005;35(5):673–682.
65. Priest RG, Vize C, Roberts A, et al. Lay people’s attitudes to treatment of depression:
Results of opinion poll for defeat depression campaign just before its launch. BMJ.
1996;313:858–859.
66. Prior L, Wood F, Lewis G, et al. Stigma revisited, disclosure of emotional problems in
primary care consultations in Wales. Social Sci Med. 2003;56(10):2191–2200.
67. Cape J, McCulloch Y. Patients’ reasons for not presenting emotional problems in
general practice consultations. Br J Gen Pract. 1999;49(448):875–879.
68. Leaf PJ, Livingston MM, Tischler GL, et al. Contact with health professionals
for the treatment of psychiatric and emotional problems. Med Care.
1985;23:1322–1337.
69. Maginn S, Boardman AP, Craig TKJ, et al. The detection of psychological problems
by general practitioners—Influence of ethnicity and other demographic variables.
Social Psychiatry Psychiatric Epidemiol. 2004;39(6):464–471.
70. Probst JC, Laditka SB, Moore CG, et al. Race and ethnicity differences in reporting of
depressive symptoms. Administration and Policy in Mental Health and Mental
Health Services Research. 2007;34(6):519–529.
71. Williams JWJ, Mulrow CD, Kroenke K, et al. Case-finding for depression in primary
care: a randomized trial. Am J Med. 1999;106:36–43.
72. Simon GE, Von Korff M, Piccinelli M, et al. An international study of the relation
between somatic symptoms and depression. N Engl J Med. 1999;341:1329–1335.
73. Davenport S, Goldberg D, Millar T. How psychiatric disorders are missed during
medical consultations. Lancet. 1987;330(8556):439–441.
74. O’Connor DW, Rosewarne R, Bruce A. Depression in primary care 1: Elderly
patients’ disclosure of depressive symptoms to their doctors. Int Psychogeriatr.
2001;13(3):359–365.
75. Bushnell J, McLeod D, Dowell A, et al. Do patients want to disclose psychological
problems to GPs? Fam Pract. 2005;22(6):631–637.
76. Verhaak PFM, Bensing JM, Van den Brink-Muinen A. GP mental health care in 10
European countries: patients’ demands and GPs’ responses. Eur J Psychiatry.
2007;21(1):7–16.
77. Cape J, McCulloch Y. Patients’ reasons for not presenting emotional problems in
general practice consultations. Br J Gen Pract. 1999;49(448):875–879.
78. Del Piccolo L, Saltini A, Zimmermann C. Which patients talk about stressful life
events and social problems to the general practitioner? Psychol Med.
1998;28(6):1289–1299.
79. Pollock K. Maintaining face in the presentation of depression: constraining the
therapeutic potential of the consultation. Health (London). 2007;11(2):163–180.
80. Zandbelt LC, Smets EMA, Oort FJ, et al. Determinants of physicians’ patient-
centred behaviour in the medical specialist encounter. Social Sci Med.
2006;63(4):899–910.
81. Del Piccolo L, Mazzi M, Saltini A, et al. Inter- and intra-individual variations in
physicians’ verbal behaviour during primary care consultations. Social Sci Med.
2002;55(10):1871–1885.
82. Epstein RM, Hadee T, Carroll J, et al. ‘‘Could this be something serious?’’—
Reassurance, uncertainty, and empathy in response to patients’ expressions of
worry. J Gen Intern Med. 2007;22(12):1731–1739.
83. Deveugele M, Derese A, De Bacquer D, et al. Is the communicative behavior of GPs
during the consultation related to the diagnosis? A cross-sectional study in six
European countries. Patient Education Counseling. 2004;54(3):283–289.
84. Deveugele M, Derese A, De Maeseneer J. Is GP-patient communication related to
their perceptions of illness severity, coping and social support? Social Sci Med.
2002;55(7):1245–1253.
85. Feldman MD, Franks P, Epstein RM, et al. Do patient requests for antidepressants
enhance or hinder physicians’ evaluation of depression? A randomized controlled
trial. Med Care. 2006;44(12):1107–1113.
86. Watts SC, Bhutani GE, Stout IH, et al. Mental health in older adult recipients of
primary care services: is depression the key issue? Identification, treatment and the
general practitioner. Int J Geriatr Psychiatry. 2002;17:427–437.
87. Saltini A, Mazzi MA, Del Piccolo L, et al. Decisional strategies for the attribution of
emotional distress in primary care. Psychol Med. 2004;34(4):729–739.
88. Maguire P. Improving the recognition of concerns and affective disorders in cancer
patients. Recent Advances in Clinical Psychiatry. 1992;7:15–30.
89. Goldberg DP, Jenkins L, Millar T, et al. The ability of trainee general
practitioners to identify psychological distress among their patients. Psychol
Med. 1993;23:185–193.
90. Tobin M, Hickie I, Urbanc A. Increasing general practitioner skills with patients with
serious mental illness. Aust Health Rev. 1997;20:55–67.
91. Zimmermann C, Del Piccolo L, Finset A. Cues and concerns by patients in medical
consultations: A literature review. Psychol Bull. 2007;133(3):438–463.
92. Deveugele M, Derese A, De Maeseneer J. Is GP-patient communication related to
their perceptions of illness severity, coping and social support? Social Sci Med.
2002;55(7):1245–1253.
93. Deveugele M, Derese A, De Bacquer D, et al. Is the communicative behavior of GPs
during the consultation related to the diagnosis? A cross-sectional study in six
European countries. Patient Education and Counseling. 2004;54(3):283–289.
94. Marks JN, Goldberg DP, Hillier VF. Determinants of the ability of general
practitioners to detect psychiatric illness. Psychol Med. 1979;9(2):337–353.
95. Badger LLW, deGruy FV, Hartman MA, et al. Psychosocial interest, medical
interviews, and the recognition of depression. Arch Fam Med. 1994;3:899–907.
96. Carney PA, Eliassen MS, Wolford GL, et al. How physician communication
influences recognition of depression in primary care. J Fam Pract.
1999;48(12):958–964.
97. Rost K, Nutting P, Smith J, et al. The role of competing demands in the treatment
provided primary care patients with major depression. Arch Fam Med.
2000;9:150–154.
98. Dowrick C, Gask L, Perry R, et al. Do general practitioners’ attitudes towards
depression predict their clinical behaviour? Psychol Med. 2000;30:413–419.
99. Saltini A, Mazzi MA, Del Piccolo L, et al. Decisional strategies for the attribution of
emotional distress in primary care. Psychol Med. 2004;34(4):729–739.
100. Levinson W, Roter D. Physicians’ psychosocial beliefs correlate with their patient
communication-skills. J Gen Intern Med. 1995;10(7):375–379.
101. A report of the Joint Consultative Committee. Primary care psychiatry—the last
frontier. Canberra: Royal Australian College of General Practitioners and Royal
Australian and New Zealand College of Psychiatrists, 1997.
102. Fischer LR, Wei F, Solberg LI, et al. Treatment of elderly and other adult patients for
depression in primary care. J Am Geriatr Soc. 2003;51(11):1554–1562.
103. Tai-Seale M, Bramson R, Drukker D, et al. Understanding primary care physicians’
propensity to assess elderly patients for depression using interaction and survey data.
Med Care. 2005;43(12):1217–1224.
104. Tai-Seale M, McGuire T, Colenda C, et al. Two-minute mental health care for elderly
patients: Inside primary care visits. J Am Geriatr Soc. 2007;55(12):1903–1911.
105. Adelman RD, Greene MG, Friedmann E, et al. Discussion of depression in follow-up
medical visits with older patients. J Am Geriatr Soc. 2008;56(1):16–22.
106. Cape J. Patient-rated therapeutic relationship and outcome in general practitioner
treatment of psychological problems. Br J Clin Psychol. 2000;39(4):383–395.
107. Goldberg D, Jenkins L, Millar T, et al. The ability of trainee general practitioners to
identify psychological distress among their patients. Psychol Med. 1993;23:185–193.
108. Ishikawa H, Takayama T, Yamazaki Y, et al. The interaction between physician and
patient communication behaviors in Japanese cancer consultations and the influence
of personal and consultation characteristics. Patient Education Counseling.
2002;46(4):277–285.
109. Del Piccolo L, Saltini A, Zimmermann C, et al. Differences in verbal behaviours of
patients with and without emotional distress during primary care consultations.
Psychol Med. 2000;30(3):629–643.
110. Nuyen J, Volkers AC, Verhaak PFM, et al. Accuracy of diagnosing depression in
primary care: the impact of chronic somatic and psychiatric co-morbidity. Psychol
Med. 2005;35:1185–1195.
111. Verhaak PFM, Schellevis FG, Nuijen J, et al. Patients with a psychiatric disorder in
general practice: determinants of general practitioners’ psychological diagnosis. Gen
Hosp Psychiatry. 2006;28:125–132.
112. Rost K, Zhang M, Fortney J, et al. Persistently poor outcomes of undetected major
depression in primary care. Gen Hosp Psychiatry. 1998;20:12–20.
113. Kessler D, Bennewith O, Lewis G, et al. Detection of depression and anxiety in
primary care: follow-up study. BMJ. 2002;325:1016–1017.
114. Jackson JL, Passamonti M, Kroenke K. Outcome and impact of mental disorders in
primary care at 5 years. Psychosom Med. 2007;69(3):270–276.
115. Rost K, Smith R, Matthews DB, et al. The deliberate misdiagnosis of major depression
in primary care. Arch Fam Med. 1994;3(4):333–337.
116. Hahn SR. Physical symptoms and physician-experienced difficulty in the physician-
patient relationship. Ann Intern Med. 2001;134(9):897–904.
117. Carson AJ, Stone J, Warlow C, et al. Patients whom neurologists find difficult to help.
J Neurol Neurosurg Psychiatry. 2004;75(12):1776–1778.
118. Jackson JL, Kroenke K. Difficult patient encounters in the ambulatory clinic: clinical
predictors and outcomes. Arch Intern Med. 1999;159:1069–1075.
119. Jackson JL, Kroenke K, Chamberlin J. Effects of physician awareness of symptom-
related expectations and mental disorders—A controlled trial. Arch Fam Med.
1999;8(2):135–142.
120. Olfson M, Gilbert T, Weissman M, et al. Recognition of emotional distress in
physically healthy primary care patients who perceive poor physical health. Gen
Hosp Psychiatry. 1995;17:173–180.
121. Perez Stable E, Miranda J, Munoz RF. Depression in medical outpatients:
underrecognition and misdiagnosis. Arch Intern Med. 1990;150:1083–1088.
122. Schwenk TL, Coyne JC, Fechner-Bates S. Differences between detected and
undetected patients in primary care and depressed psychiatric patients. Gen Hosp
Psychiatry. 1996;18:407–415.
123. Hyde J, Evans J, Sharp D, et al. Deciding who gets treatment for depression and
anxiety: a study of consecutive GP attenders. Br J Gen Pract. 2005;55(520):846–853.
124. Ani C, Bazargan M, Hindman D, et al. Depression symptomatology and diagnosis:
discordance between patients and physicians in primary care settings. BMC Family
Practice. 2008;9:1.
125. Pfaff JJ, Almeida OP. A cross-sectional analysis of factors that influence the detection
of depression in older primary care patients. Australian N Z J Psychiatry.
2005;39(4):262–265.
126. O’Connor DW, Rosewarne R, Bruce A. Depression in primary care 2: General
practitioners’ recognition of major depression in elderly patients. Int Psychogeriatr.
2001;13(3):367–374.
127. Coyne JC, Schwenk TL, Fechner-Bates S. Nondetection of depression by primary care
physicians reconsidered. Gen Hosp Psychiatry. 1995;17:3–12.
128. Ormel J, Van den Brink W, Koeter MW, et al. Recognition, management and outcome
of psychological disorders in primary care: a naturalistic follow-up study. Psychol
Med. 1990;20:909–923.
129. Pini S, Berardi D, Rucci P, et al. Identification of psychiatric distress by primary care
physicians. Gen Hosp Psychiatry. 1997;19:411–418.
130. Pini S, Perkonigg A, Tansella M, et al. Prevalence and 12-month outcome of threshold
and sub-threshold mental disorders in primary care. J Affective Disorders.
1999;56:37–48.
131. Seaburn DB, Morse D, McDaniel SH, et al. Physician responses to ambiguous patient
symptoms. J Gen Intern Med. 2005;20(6):525–530.
132. Menchetti M, Cevenini N, De Ronchi D, et al. Depression and frequent attendance in
elderly primary care patients. Gen Hosp Psychiatry. 2006;28(2):119–124.
133. van Schaik DJF, Klijn AFJ, van Hout HPJ, et al. Patients’ preferences in the treatment
of depressive disorder in primary care. Gen Hosp Psychiatry. 2004;26(3):184–189.
134. Gross R, Brammli-Greenberg S, Tabenkin H, et al. Primary care physicians’
discussion of emotional distress and patient satisfaction. Int J Psychiatry Med.
2007;37(3):331–345.
135. Simon GE. Evidence review: efficacy and effectiveness of antidepressant treatment in
primary care. Gen Hosp Psychiatry. 2002;24:213–224.
136. Gask L, McGrath G, Goldberg D, et al. Improving the psychiatric skills of established
general practitioners: evaluation of group teaching. Med Educ. 1987;21:362–368.
137. Gask L, Usherwood T, Thompson H, et al. Evaluation of a training package in the
assessment and management of depression in primary care. Med Educ.
1998;32:190–198.
138. Kaaya S, Goldberg D, Gask L. Management of somatic presentations of psychiatric
illness in general medical settings: evaluation of a new training course for general
practitioners. Med Educ. 1992;26:138–144.
139. Shapiro S, German PS, Skinner EA, et al. An experiment to change detection and
management of mental morbidity in primary care. Med Care. 1987;25:327–339.
140. Gallo JJ, Rabins PV. Depression without sadness: alternative presentations of
depression in late life. Am Fam Physician. 1999;60:820–826.
141. Gallo JJ, Rabins PV, Anthony JC. Sadness in older persons: 13-year follow-up of a
community sample in Baltimore, Maryland. Psychol Med. 1999;29:341–350.
142. Cooper LA, Brown C, Vu HT, et al. Primary care patients’ opinions regarding the
importance of various aspects of care for depression. Gen Hosp Psychiatry.
2000;22(3):163–173.
4
HOW CAN EXISTING MOOD SCALES BE
IMPROVED? HOW TO TEST, REFINE, AND
IMPROVE EXISTING SCALES

Adam B. Smith

1. Introduction
2. The Rasch Model and Other Item Response Models
3. Conclusion

Context
Many scales and tools have been developed by expert opinion. Several methods
are available by which tools can be field tested in order to more accurately
gauge their diagnostic potential. Promising new methods including item banks
and computer-adaptive tests are under development to maximize the efficiency
of screening tools for depression.

1. Introduction
Various methods are available to diagnose psychiatric disorders (see
Chapter 2), but in the absence of a formal semi-structured psychiatric assessment,
which remains impractical, the most commonly used method for assessing
and screening levels of emotional distress remains the self-completed
questionnaire.1 There have been many hundreds of validation attempts, comparing
severity scales against clinical judgment, semi-structured interviews,
DSM and ICD criteria, and of course each other. Almost universally in
primary care, community, and specialist settings, their accuracy is imperfect
and further refinement is required. When tested according to their ability to
enhance the detection and quality of care for depression, the efficacy of these
instruments remains modest.2 A recent review from Gilbody and colleagues3
found that screening and case-finding instruments were associated with a
modest increase in the recognition of depression by clinicians (relative risk
[RR] 1.27, 95% confidence interval [CI] 1.02 to 1.59) and only a borderline
significant effect on the overall management of depression (RR 1.30, 95% CI
0.97 to 1.76). Seven studies provided data on the impact of screening on
depression outcomes, but there was no evidence of an effect (standardized
mean difference –0.02, 95% CI –0.25 to 0.20). No doubt some of the problem
lies with the organizational elements that may (or may not) accompany
screening and some lies with clinicians’ willingness to treat a probable case.
However, some blame also lies with the instruments themselves, as most were
developed by expert opinion rather than by a scientific process.
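
The way these pooled estimates are read can be sketched in code. The example below is an illustration, not the method used in the cited review. It checks whether each reported 95% confidence interval excludes the null value (1 for a relative risk, 0 for a standardized mean difference), then back-calculates an approximate z statistic and two-sided p value. It assumes the intervals are symmetric normal-theory intervals (on the log scale for ratios). The function name and dictionary keys are hypothetical, and because the inputs are rounded published figures the p values are only approximate.

```python
import math
from statistics import NormalDist


def summarize(estimate, ci_low, ci_high, ratio_scale):
    """Approximate z and two-sided p value recovered from a reported 95% CI."""
    null_value = 1.0 if ratio_scale else 0.0
    excludes_null = not (ci_low <= null_value <= ci_high)
    if ratio_scale:
        # Ratio measures such as relative risk are analyzed on the log scale.
        estimate, ci_low, ci_high = (math.log(v) for v in (estimate, ci_low, ci_high))
    normal = NormalDist()
    z_crit = normal.inv_cdf(0.975)               # about 1.96 for a 95% interval
    std_err = (ci_high - ci_low) / (2 * z_crit)  # half-width divided by z_crit
    z = estimate / std_err
    p_value = 2 * (1 - normal.cdf(abs(z)))
    return {"excludes_null": excludes_null, "z": z, "p_value": p_value}


# Figures as quoted in the text: (estimate, lower limit, upper limit, ratio scale?)
reported = {
    "recognition": (1.27, 1.02, 1.59, True),
    "management": (1.30, 0.97, 1.76, True),
    "outcomes": (-0.02, -0.25, 0.20, False),
}

results = {label: summarize(*values) for label, values in reported.items()}

for label, summary in results.items():
    print(label, summary["excludes_null"], round(summary["p_value"], 3))
```

Read this way, the interval for recognition excludes 1, the interval for management narrowly includes 1 (hence "borderline"), and the interval for depression outcomes is centered close to 0.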

Tool Development
The quantitative methods that enable evaluation of the diagnostic accuracy of
severity scales are discussed in Chapter 5. However, the evaluation of scales
should be viewed in a wider context of tool development (Table 4.1). In the
preclinical phase a tool is developed, often in the case of depression borrowing
from existing scales and usually by consensus rather than by scientific testing.
In phases I and II preliminary testing occurs, ideally in a clinically representative
sample with several competing comparison groups. These diagnostic
validity studies do not prove that the tool is useful, rather that it is potentially
accurate. Given a sufficient sample, a tool may be refined by field testing. This
is the basis of the remainder of this chapter. Ultimately the value of a tool must
be proven in the clinical environment by comparison against either an established
tool or clinical skills alone. The acceptability and availability of the tool
will ultimately influence its uptake as much as its efficacy.

Table 4.1. Stages in the Evaluation of the Screening Tool

| Stage | Purpose | Description |
| --- | --- | --- |
| Preclinical | Tool development | Here the aim is to develop a screening method that is likely to help in the detection of the underlying disorder, either in a specific setting or in all settings. Issues of acceptability of the tool to both patients and staff must be considered for implementation to be successful. |
| Phase I screen | Early diagnostic validity testing in a selected sample and refinement of tool | The aim is to evaluate the early design of the screening method against a known (ideally accurate) standard known as the criterion reference. In early testing the tool may be refined, selecting the most useful aspects and deleting redundant aspects to make the tool as efficient (brief) as possible while retaining its value. |
| Phase II screen | Diagnostic validity in a representative sample | The aim is to assess the refined tool against a criterion (gold standard) in a real-world sample where the comparator subjects may represent several competing conditions that may otherwise cause difficulty regarding differential diagnosis. |
| Phase III screen | Screening randomized controlled trial; clinicians using vs. not using a screening tool | This is an important step in which the tool is evaluated clinically in one group with access to the new method compared to a second group (ideally selected in a randomized fashion) who make assessments without the tool. This is akin to randomized controlled trials for drugs, and the outcome of interest is the number of additional cases correctly diagnosed or ruled out compared with assessment as usual. |
| Phase IV screen | Screening implementation studies using real-world outcomes | In this last step the screening tool/method is introduced clinically but monitored to discover the effect on important patient outcomes such as new identifications, new cases treated, and new cases entering remission. In short, the question here is how much the tool influences patient outcomes and how well the tool is accepted by clinicians (uptake). |

After Mitchell AJ Psycho-Oncol 17: S141, 2008.

Given that there are a large number of imperfect but widely used instruments,
it follows that many could be refined by adding or removing items,
changing the weighting of scoring, or possibly changing the diagnostic algorithm.
There have been recent attempts to improve efficacy of screening instruments
using modern psychometrics, most notably using Rasch models. These models
are part of a family of measurement models developed for educational psy-
chology and increasingly employed in test development and refinement in
medicine. Very frequently it is found that conventional instruments may be
shortened in length without significantly decreasing screening efficacy.
Occasionally this shortening is dramatic, reducing an instrument by half or
by a quarter. Yet it should be acknowledged that the ability of these adapted
instruments to identify levels of a key outcome variable, such as ‘‘distress
warranting intervention,’’ remains less than perfect. Combining items drawn
from a number of emotional distress instruments into an item bank may
improve screening efficacy while at the same time minimizing the number of
questions patients are required to answer and consequently reducing patient
burden. Item banks such as these and computer-adaptive tests, which tailor the
questions presented to patients’ responses, have already been successfully
developed for assessing emotional distress in a psychiatric population.4,5
This chapter describes the Rasch model and its application to mental health
research in more detail.

2. The Rasch Model and Other Item Response Models


In classical test theory, item difficulty (eg, the probability of subjects
responding ‘‘yes’’ or ‘‘no’’ to items or selecting a category from a number of
response options) is calculated from the number of responses or proportion of
responses in the sample.6 The major drawback of this approach is that estima-
tion of item difficulty is sample dependent: the ‘‘endorsability’’ of any given
item will be larger if drawn from a more able population (eg, a healthier
population) than if drawn from a less able population. The same limitation
applies to estimates of ‘‘person ability’’ (eg, quality of life, physical
health): any given estimate of an individual’s ability on a latent (ie, not directly
observable) trait will depend on the range of difficulties of the items
presented.
Rasch models7 overcome this problem of sample dependency by estimating
person ability and item difficulty independently.8 The raw scores are
the sufficient statistics for estimating these parameters: a person’s total
score across the items contains all the information needed to estimate that
person’s ability, and an item’s total score across the sample contains all the
information needed to estimate that item’s difficulty.8
To achieve the separation of item and person parameter estimations, the
Rasch models rely on two assumptions: unidimensionality and local
independence.
Rasch models assume that a single latent trait or construct underlies the
data being investigated (eg, mathematical knowledge, physical health). This
assumption is then tested using fit statistics and/or principal components
analysis of residuals. Local independence is related to unidimensionality and
refers to the assumption that the single latent trait (ie, the unidimensionality)
accounts for all the variance in the data; that is, the association between the
variables in a dataset should disappear once the Rasch model has been controlled
for.9 It is possible to have unidimensionality but not local independence;
however, if local independence is proven, then there must also be unidimensionality
in the data set. If the assumptions have been met, then the log-odds
of a person endorsing an item can be expressed as the difference
between the individual’s ability and the item difficulty. Unlike in classical test
design, the person ability and item difficulty parameters are estimated jointly to
produce estimates (expressed in ‘‘logits,’’ or log-odds units), which are independent
of both the items and sample employed.
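The relationship just described can be sketched in a few lines of Python. This is a minimal illustration of the dichotomous Rasch model only; the names `rasch_probability`, `log_odds`, `theta` (person ability), and `b` (item difficulty) are chosen here for illustration and are not taken from any particular Rasch software.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Probability of endorsing a dichotomous item under the Rasch model.

    theta: person ability in logits; b: item difficulty in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_odds(p: float) -> float:
    """Convert a probability to log-odds (logits)."""
    return math.log(p / (1.0 - p))

# The log-odds of endorsement equal ability minus difficulty.
p = rasch_probability(theta=1.0, b=-0.5)
print(round(log_odds(p), 6))  # 1.5
```

When ability and difficulty coincide the probability of endorsement is 0.5, which is why person measures and item locations can be placed on one common scale.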

Assessing the Rasch Model


A fundamental criterion underlying these models is unidimensionality—that
is, a single latent trait should explain the variance in the data. In the absence of
unidimensionality, constituent parts of an instrument cannot be summed to
create a summary index. Unidimensionality can be assessed through principal
components analysis, where the first factor extracted corresponds to the Rasch
‘‘factor,’’ or latent trait.10 Any additional factors extracted can be investigated
to confirm whether these form true factors or random noise. In addition to this,
unidimensionality can be assessed using fit statistics. Both item fit and person
fit to the Rasch model can be evaluated. Fit statistics have an expected value of
1.0 and can range from 0 to infinity. Deviations in excess of the expected value
can be interpreted as noise or lack of fit between the items and the model,
whereas values significantly lower than the expected value can be interpreted
as item redundancy or overlap.
Identifying misfitting items allows those items adding noise to the analysis
to be removed from a scale. The suggested limits for fit statistics are between
0.7 and 1.3, with those items with fit statistics greater than 1.3 being identified
as misfitting.11,12
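As a rough sketch of how such fit statistics can be computed for one dichotomous item, the following Python uses the usual mean-square definitions: outfit as the unweighted mean of squared standardized residuals, and infit as the information-weighted version. The function names, the label "overfit" for low values, and the assumption that person and item estimates are already available are all illustrative choices, not a description of any specific program.

```python
import math

def rasch_probability(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one dichotomous item.

    responses: list of 0/1 answers, one per person
    thetas: person ability estimates in logits, same order
    b: item difficulty estimate in logits
    """
    squared_residuals = []
    variances = []
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        variances.append(p * (1.0 - p))         # model variance of the response
        squared_residuals.append((x - p) ** 2)  # squared raw residual
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    # Infit: squared residuals weighted by their model variance.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit

def classify_fit(mean_square, lower=0.7, upper=1.3):
    """Apply the suggested 0.7 to 1.3 limits to a mean-square fit statistic."""
    if mean_square > upper:
        return "misfit"      # noise, candidate for removal
    if mean_square < lower:
        return "overfit"     # possible redundancy or overlap
    return "acceptable"
```

Both statistics have an expected value of 1.0 when the data fit the model, so `classify_fit` simply reports on which side of the suggested limits an item falls.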
A similar analysis may also be applied to the response categories and
thresholds (ie, the points on the latent trait at which two adjacent response
categories are equally probable). Within the Rasch model the average level of the latent
trait (‘‘ability’’) should increase monotonically across categories. Disordering
of categories, where the average level of the latent trait does not increase in this
manner, may interfere with measurement precision. Therefore, disordered
response categories may be collapsed or items removed to improve fit to the
Rasch model.9
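One simple way to look for disordered categories is to compare the average person measure observed in each response category. The sketch below assumes person measures (in logits) have already been estimated; the data and function names are hypothetical and serve only to illustrate the check.

```python
from collections import defaultdict

def average_measure_by_category(categories, measures):
    """Mean person measure (logits) for each observed response category."""
    grouped = defaultdict(list)
    for category, measure in zip(categories, measures):
        grouped[category].append(measure)
    return {c: sum(v) / len(v) for c, v in sorted(grouped.items())}

def categories_are_ordered(categories, measures):
    """True if the average measure rises with each higher category."""
    averages = list(average_measure_by_category(categories, measures).values())
    return all(low < high for low, high in zip(averages, averages[1:]))

# Hypothetical responses to one four-category item (scored 0 to 3)
# and the corresponding person measures.
cats = [0, 0, 1, 1, 2, 2, 3, 3]
meas = [-2.1, -1.7, -0.9, -0.4, -0.6, -0.8, 0.9, 1.4]
print(average_measure_by_category(cats, meas))
print(categories_are_ordered(cats, meas))
```

In this made-up example the average measure for category 2 is slightly lower than for category 1, so the check fails and categories 1 and 2 would be candidates for collapsing.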
Finally, an additional requirement for Rasch models is item invariance;
that is, item parameter estimates should be independent of the sample used.
Item invariance, or its absence (differential item functioning [DIF]), may be
evaluated by comparing item estimates across defined subgroups (eg, gender, diagnosis).
When items fit the model, an interval scale is produced where differences
between adjacent scores on a scale are equally spaced. This has important
implications for measurement, since this allows meaningful comparisons to be
made of changes in scores of equal intervals along the latent trait.13 Recent
work has suggested that changes of around 0.5 logits may suggest a clinically
meaningful difference.14

Features of the Rasch Model


The Rasch model is more accurately referred to as belonging to a family of
models. Rasch’s original dichotomous model7 has been extended to
incorporate polytomous data, that is, data from questionnaires incorporating multiple
(more than two) response options. Popular models within health research
are the Rating Scale15 and the Partial Credit Model.16
In the Rasch model the estimates of person ability (or person measure) and
item location or difficulty are located along the same continuum (eg,
Depression). For instance, Figure 4.1 shows a ‘‘person-item map’’ from an
item bank developed for assessing emotional distress in cancer patients.17 The
left side of the map represents the distribution of person measures along the
continuum and the right side describes the location of the items.
As discussed above, the Rasch model describes a probabilistic relationship
between a person’s measure and the item location. For instance, from
Figure 4.1, the Rasch model allows us to state that a patient with a level of
distress around –1 logits will be more likely to endorse items at a corresponding
level, such as item 1 of the General Health Questionnaire (GHQ-1,
‘‘concentration’’) and MHI-1 (‘‘nervousness’’), as well as items below this point, but
would be less likely to endorse items further along the latent trait, such as
item 9 of the Patient Health Questionnaire (PHQ-9, ‘‘suicidal ideation’’). This analysis can
be extended to the thresholds between each response category (Fig. 4.2).
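The example can be worked through numerically. The sketch below treats the items as dichotomous and uses item locations read approximately off Figure 4.1 (GHQ-1 and MHI-1 near –1 logits, PHQ-9 near 2 logits); these rounded values are assumptions made for illustration, not published estimates.

```python
import math

def rasch_probability(theta, b):
    """Probability of endorsement for person measure theta and item location b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

person = -1.0  # patient with distress around -1 logits
approx_locations = {"ghq1": -1.0, "mhi1": -1.0, "phq9": 2.0}

for item, location in approx_locations.items():
    print(item, round(rasch_probability(person, location), 3))
# ghq1 0.5
# mhi1 0.5
# phq9 0.047
```

Under these assumptions the patient has an even chance of endorsing the items located at his or her own level, but a probability of under 5% of endorsing the item located 3 logits higher.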
An additional important feature of Rasch models is that the models can
equate different questionnaires completed by different subgroups of patients,
assuming that a common subset of items exists that all patients have completed.
This process then enables a range of items measuring the same latent trait to be
collated to form an item bank. The development of an item bank may help
improve static questionnaires by including fewer and more relevant questions,
which could cover a broader and more representative spectrum of the latent
trait (for assessment) or may be more focused on discrete areas of clinical
interest, such as clinical thresholds (for screening). It also paves the way for the
development of computer-adaptive testing,18 creating programs that tailor
questions to individual patients based on their previous responses, allowing
an accurate assessment of the patient (eg, level of psychological distress) with
fewer questions.
Taken together, Rasch models offer a number of advantages, including
improving existing measures, reducing the number of items in question-
naires, and allowing the development of item banks and computer-adap-
tive tests.

Application of the Rasch Model to Mental Health Measures


In traditional test theory, questionnaires are often designed and validated using
techniques such as factor analysis. In addition to the sample dependence of
these approaches as described above, rating scales produce ordinal data that do
not meet the assumptions behind factor analyses, potentially leading to
Person Measures           Item Location

    <more>           | <rare>
  4                  +
    –                |
  3                  +
    –                |
  2 –                +  phq9
    –                |
    –                |T
  1 –                +  d2
    –#               |  ef4 ghq8 phq6 phq8
    –#              T|S a7 bdi6 bdi8 d6 ef3 ghq12
  0 – ##             +  a5 d1 ef1 phq1 phq2 stai13
    – ####           |M a1 a2 a3 a4 bdi1 ef2
    – ######        S|  bdi2 bdi9 ewb4 ghq3 mhi2 stai1
 –1 – #########      +  bdi4 ewb1 ewb5 ghq1 mhi1 phq3
    – ##########     |S d4 mhi4 phq4
    – ############   |  ewb6
 –2 – ############  M+
    – #########      |T ghq7
    – ########       |
 –3 – #########      +  bdi11
    – ######        S|
    – #####          |  bdi12
 –4 –                +
    – ####           |
    –               T|
 –5 – ###            +

Figure 4.1. Item-Person Map for Item Bank.


PATSS MAP OF QUESS – 50% Cumulative probabilities (Rasch–Thurstone thresholds)

< more > |


4 + bdi6 .4
| bdi1 .4
. | ghq4 .4
3 + bdi6 .3 bdi2 .4
| bdi1 .3 ghq3 .4
. | phq9 .4
2 . + bdi2 .3 d2 .4
. | phq9 .3 ef4 .4
. |T d6 .4
1 . + a5 .4
.# | d2 .3 a2 .4 ewb4 .5
. # T | S bdi6 .2 ef4 .3 a1 .4 ewb1 .5
0 . ## + a5 .3 bdi12 .4 mhi4 .5
. #### | M bdi1 .2 bdi11 .3 d4 .4 ewb6 .5
. ###### S | bdi2 .2 a1 .3 ewb1 .4
–1 . ######### + bdi4 .2 bdi12 .3 phq4 .4
. ########## | S d2 .2 ewb1 .3 ewb6 .4
. ############ | phq1 .2 d4 .3
–2 . ############ M + a5 .2 ewb6 .3
. ######### | T d1 .2 ghq7 .3
. ######## | a1 .2
–3 . ######### + bdi11 .2
. ###### S |
. ##### | bdi12 .2
–4 . +
. #### |
. T|
–5 . ### +
|
. |
–6 . +
|
|
–7 +
|
| ghq8 .2
–8 + ghq12 .2
|
| ghq3 .2
–9 +
| ghq1 .2
|
–10 +
| ghq7 .2
|
–11 . ### +
<less> | <frequ>

Figure 4.2. Rasch-Thurstone Thresholds for Item Bank.


misinterpretation of results.19 Furthermore, these ordinal scales are often
summed to produce total scores that are assumed to meet the criteria of interval
scales; frequently these assumptions are not tested.13
A number of studies have recently described the application of Rasch
models to mental health instruments to overcome the shortcomings of tradi-
tional test theory and design.

Unidimensionality, Item Reduction, and Differential Item Functioning


The Rasch model has been applied to a number of mental health instruments,
including the Beck Depression Inventory (BDI),20 the Zung Self-Rating
Depression Scale,21 the Geriatric Depression Scale (GDS),22 and the
Symptom Checklist (SCL-90 and SCL-90R) (see Table 4.2).23 The application
of the model to four of the most commonly used mental health instruments,
namely the Center for Epidemiologic Studies Depression Scale (CES-D),24 the
Hospital Anxiety and Depression Scale (HADS),25 the Hamilton Depression
Scale (HAM-D),26 and the Edinburgh Postnatal Depression Scale (EPDS),27 is
discussed in this section.
These four instruments have been well validated using traditional test theory
involving reliability and validity studies and factor analyses, yet despite this

Table 4.2. Examples of Rasch-Refined Mood Scales

| Scale | Original Length | Rasch-Derived Length | Unidimensionality Shown | Reference |
|---|---|---|---|---|
| CES-D | 20 items | 13 items | Yes | Covic et al. (2007)29 |
| HADS | 14 items | 11 items | Yes | Smith et al. (2006)31 |
| EPDS | 10 items | 8 items | Yes | Pallant et al. (2006)33 |
| Hamilton | 17 items | 6 items | No | Licht et al. (2005)35 |
| Beck | 21 items | Not changed | No | Bouman & Kok (1987)20 |
| Zung SDS | 20 items | Not changed | Yes | Hong & Min (2007)21 |
| GDS | 15 items | 11 items | Yes | Tang et al. (2005)22 |
| SCL90 | 92 items | 63 items | Yes (for non-psychotic items) | Olsen et al. (2004)23 |
| SCL25 | 25 items | 8 items | Yes | Fink et al. (1995)47 |

there has been little previous evidence to support the assumption that these
questionnaires are unidimensional.
Stansbury and colleagues28 applied the Rasch model to the full CES-D
completed by a large community sample of elderly participants. Four of the
positively worded items were identified as misfitting and removed. The
remaining 16 items formed a unidimensional structure that was verified
using confirmatory factor analysis. Additionally, the removal of the misfitting
items also reduced the floor effects that had been observed in this sample.
Covic and colleagues29 demonstrated, using a sample of patients with rheu-
matoid arthritis, that three additional items (appetite, restlessness, sadness)
misfitted the Rasch model. The resulting 13-item CES-D demonstrated good
internal validity. In contrast to these two studies, Pickard and colleagues30
found no misfit for the CES-D in primary care patients, although in stroke
patients misfit was reported for three items that were not positively worded.
Additionally, four items from this instrument demonstrated differential item
functioning when comparing the two patient samples.
Rasch studies of the HADS with cancer patients31 and patients attending an
outpatient musculoskeletal rehabilitation program32 showed that the full
instrument is broadly unidimensional, although the individual subscales con-
tained items that misfitted. Similarly, an analysis of the Edinburgh Postnatal
Depression Scale has recommended that the original 10-item form be reduced
to eight items to produce a unidimensional instrument.33
In addition to identifying misfit, Rasch models have also been used to
develop short forms of these standard instruments. For instance, a 10-item
version of the CES-D has been validated using both Rasch and traditional test
methods,34 as has the 6-item version of the HAM-D.35 Licht and colleagues35
compared the unidimensionality of the Bech-Rafaelsen Melancholia
Scale (MES) and the 17-item HAM-D in 1,629 patients with a major depressive
episode using Mokken and Rasch analysis. Unidimensionality of the
HAM-D-17 could not be confirmed; however, the HAM-D-6 and the MES
did fulfill criteria for unidimensionality.
There have also been recent attempts to apply Rasch models to the standar-
dized psychiatric interview schedule for major depression.36 A modified SCID
interview was used on a large sample of twins from the Virginia Twin Registry
(n = 2,163). Participants were asked to report whether they had experienced
any of the 14 disaggregated DSM-III-R criteria for major depression. The
Rasch model was used to derive liability thresholds (the point at which there
is a 50% probability of a given diagnostic category being endorsed) for the
10 symptom criteria for major depression. The results demonstrated an uneven
spacing between liability thresholds where ‘‘depressed mood’’ was easiest to
endorse (–1.8 logits) and ‘‘suicidal ideation’’ at the other end of the latent trait
(2.5 logits) was hardest to endorse, suggesting a tentative link between the
latent trait as measured by the Rasch model and that derived from a formal
psychiatric interview.
Other more general distress and psychopathology tools have also been
tested using Rasch models. For example, the 90-item SCL and the 25-item
SCL-25 have been improved.23

Clinical Testing and Clinical Impact


Ultimately any tool (original or adapted) should be field tested, even if the
refinement is minor. In a robust test of a newly developed tool (let’s use the
hypothetical example of CES-D-Revised), the new scale, the original scale, and
unassisted clinical diagnosis would each be compared against a robust
gold standard such as the SCID for DSM-IV major depression. Any additional
detection beyond the unassisted clinician would suggest that the scale is
clinically useful; any additional detection beyond that achieved by the original
scale would suggest that the new scale is an improvement. If the new version is
shorter, both accuracy and efficiency may be enhanced, and hence accept-
ability increased. If the new version is longer, accuracy may be improved at the
expense of efficiency, and then a clinical judgment is required to explore which
is most useful. Sadly, very few well-designed validation studies exist.
A few studies have employed Rasch models to assess the impact of misfit
and the subsequent removal of misfitting items on the diagnostic accuracy of mental
health measures. Smith and colleagues31 applied the Rasch model to both the
full 14-item HADS25 and the 7-item anxiety and depression subscales.
In addition to completing the HADS, a subset of cancer patients had also
received a psychiatric assessment in the form of either the Present State
Examination (PSE)37 or the Schedules for Clinical Assessment in
Neuropsychiatry (SCAN; World Health Organization).38 Three items from
the full HADS were identified as misfitting the Rasch model, in addition to
one misfitting item from the subscales. Removal of the items had little or no
impact on the specificity and sensitivity of the scales (including the area under
the curve [AUC]).
Similarly, Tang and colleagues22 identified four items from the GDS that
did not fit the Rasch model. The GDS data were derived from a community
sample of patients with pneumoconiosis who had also received a structured
psychiatric interview with the aim of diagnosing depressive disorders. Once
again, the results demonstrated that removing the misfitting items did not affect
the AUC or sensitivity and specificity.
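The comparison made in these studies, diagnostic accuracy of the full scale versus the shortened scale against an interview-based diagnosis, can be sketched as follows. The scores, diagnoses, and function names are hypothetical; the AUC is computed in its rank form, as the probability that a randomly chosen case scores higher than a randomly chosen non-case.

```python
def sensitivity_specificity(scores, has_depression, cutoff):
    """Sensitivity and specificity when score >= cutoff counts as screen-positive."""
    pairs = list(zip(scores, has_depression))
    tp = sum(1 for s, d in pairs if d and s >= cutoff)
    fn = sum(1 for s, d in pairs if d and s < cutoff)
    tn = sum(1 for s, d in pairs if not d and s < cutoff)
    fp = sum(1 for s, d in pairs if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, has_depression):
    """Area under the ROC curve; tied case/non-case pairs count as 0.5."""
    cases = [s for s, d in zip(scores, has_depression) if d]
    controls = [s for s, d in zip(scores, has_depression) if not d]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical data: interview diagnosis plus scores on the full scale
# and on a version with misfitting items removed.
diagnosis = [True, True, True, False, False, False, False]
full_scale = [12, 9, 7, 8, 4, 3, 2]
short_scale = [9, 7, 5, 6, 3, 2, 1]

print(sensitivity_specificity(full_scale, diagnosis, cutoff=8))
print(auc(full_scale, diagnosis), auc(short_scale, diagnosis))
```

In this made-up example the two versions rank patients identically, so their AUCs are equal, which is the pattern the cited studies reported after removing misfitting items.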

Item Banking and Computer-Adaptive Testing


The ability of the Rasch model to derive item locations for different instru-
ments and to allow evaluations of whether these items form a unidimensional
construct creates the opportunity to generate item banks. Various methods
exist for item banking39; however, a frequently employed approach is common
item equating,10 where patients complete a core set of questionnaires.
Additional items or instruments may be added by anchoring the locations for
the core set of items. Typically in this scenario patients will have completed the
core set of items along with further items. The benefit of item banking is that
patients do not have to complete all the questionnaires, which therefore reduces
not only patient burden but also the costs of developing the item bank.
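The idea of linking through common items can be illustrated with a deliberately simplified sketch. Instead of fixing (anchoring) the core item locations during estimation, as described above, the code below shifts a separately calibrated set of items onto the reference scale by the mean difference on the common items. This mean-shift linking is a swapped-in simplification, and all item names and values are hypothetical.

```python
def equate_to_reference(new_calibration, reference_calibration, anchor_items):
    """Shift a new calibration onto the reference scale using common items.

    Both calibrations map item name -> difficulty in logits. The shift is
    the mean difference (reference minus new) across the anchor items.
    """
    shift = sum(reference_calibration[i] - new_calibration[i]
                for i in anchor_items) / len(anchor_items)
    return {item: b + shift for item, b in new_calibration.items()}

# Hypothetical reference calibration of two core items ...
reference = {"core1": -0.5, "core2": 0.5}
# ... and a second sample that answered the core items plus one new item.
new_sample = {"core1": -1.0, "core2": 0.0, "extra1": 1.0}

linked = equate_to_reference(new_sample, reference, ["core1", "core2"])
print(linked["extra1"])
```

Here the second sample's calibration sits 0.5 logits below the reference scale on the common items, so the new item is moved up by the same amount before being added to the bank.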
After item banks are developed, two further steps can be taken: (1) the
development of multiple fixed short forms derived from the item bank (see
Ware and associates40 for an example of the development of a short form of the
headache impact scale) and (2) the development of computer-adaptive tests.
Computer-adaptive tests (eg, Wainer18) tailor the items presented to the patient
on the basis of his or her previous responses. They generally present an initial
item aimed at the average level of the latent trait in the target population
(eg, average level of depression); subsequent questions presented are either
easier or harder to endorse. At each step the patient’s level of latent trait
(eg, depression) is estimated until a predetermined number of questions has
been presented or the standard error of the estimate falls below a given
predetermined level.
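The loop described above can be sketched for dichotomous Rasch items as follows. Several details are assumptions made for this illustration rather than features of any published system: ability is estimated by an expected a posteriori (EAP) method on a grid with a standard normal prior, the posterior standard deviation stands in for the standard error, and the next item is simply the unused item whose difficulty lies closest to the current estimate.

```python
import math

def rasch_probability(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

GRID = [g / 10.0 for g in range(-60, 61)]  # ability grid from -6 to 6 logits

def eap_estimate(administered):
    """Posterior mean and SD of ability under a standard normal prior.

    administered: list of (item_difficulty, response) pairs, response 0 or 1.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for b, x in administered:
            p = rasch_probability(theta, b)
            w *= p if x == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(item_bank, answer, max_items=10, se_target=0.5):
    """item_bank: dict name -> difficulty; answer: callable(name) -> 0 or 1."""
    remaining = dict(item_bank)
    administered, asked = [], []
    theta, se = 0.0, float("inf")  # start at the assumed population mean
    while remaining and len(asked) < max_items and se > se_target:
        # Rasch item information p(1 - p) is largest when the item
        # difficulty is closest to the current ability estimate.
        name = min(remaining, key=lambda n: abs(remaining[n] - theta))
        b = remaining.pop(name)
        administered.append((b, answer(name)))
        asked.append(name)
        theta, se = eap_estimate(administered)
    return theta, se, asked

# Hypothetical six-item bank and a simulated respondent who endorses
# every item with a difficulty below 1.0 logits.
bank = {"i1": -2.0, "i2": -1.0, "i3": 0.0, "i4": 1.0, "i5": 2.0, "i6": 3.0}
theta, se, asked = run_cat(bank, lambda name: 1 if bank[name] < 1.0 else 0, max_items=5)
print(asked, round(theta, 2), round(se, 2))
```

The two stopping rules in `run_cat` correspond to the two criteria named in the text: a fixed maximum number of questions, or a standard error below a preset level.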
Computer-adaptive test systems provide a greater level of precision in
estimating the latent trait and may be designed to allow a broad assessment
of, for instance, depression, or specifically designed to present more questions
around diagnostic categories. Another benefit of these systems is that fewer
questions need to be completed by the patient (for the same or greater level of
accuracy).
The development of item banks and computer-adaptive tests has been
progressing apace in fields such as physical health,41 although in mental
health this area is still in its infancy. However, recently an item bank has
been developed for assessing psychological distress in cancer patients.17
A large sample of cancer patients completed the HADS25 in addition to a
variety of other instruments, including the GHQ-12,42 BDI,43 PHQ-9,44 and
Spielberger State-Trait Anxiety Inventory (STAI).45 Common item equating
using the HADS as the anchor was used to create the item bank. The initial
83 items were reduced to a unidimensional item bank with good internal
reliability (Cronbach’s alpha = 0.84) consisting of 63 items once misfitting
items had been removed. An analysis of the item-person map (see Fig. 4.1)
demonstrated good face validity: questions concerning suicidal ideation
were hardest to endorse, whereas questions concerning fatigue and energy
were easiest to endorse. Further analysis of the item-person map revealed
that items tended to be targeted at moderate to high levels of distress,
indicating a floor effect for low levels of distress, potentially requiring
additional items.
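For readers unfamiliar with the internal reliability coefficient quoted above, Cronbach's alpha can be computed from item-level scores as in the sketch below. The function name and the tiny data set are hypothetical; the formula is the standard one based on the number of items, the sum of the item variances, and the variance of the total score.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-person score lists (one score per item)."""
    k = len(item_scores[0])  # number of items
    item_variances = [variance([person[i] for person in item_scores])
                      for i in range(k)]
    total_variance = variance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses of five patients to a three-item scale.
scores = [
    [0, 1, 0],
    [1, 1, 1],
    [2, 2, 1],
    [2, 3, 3],
    [3, 3, 2],
]
print(round(cronbach_alpha(scores), 2))
```

Alpha approaches 1 as the items covary more strongly, and it also tends to rise with the number of items, so a high value for a long item bank is not by itself evidence of unidimensionality.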
Computer-adaptive tests have already been developed for use with psy-
chiatric populations to identify emotional distress.4,5 Fliege and associates4
have developed a system for measuring depression (‘‘D-CAT’’) in a psy-
chosomatic patient sample. Patients completed 11 mental health question-
naires that were subsequently rated as indicative of depressive
symptomatology by expert reviewers. A total of 320 items from the original
questionnaires produced an item bank of 64 items. A simulation study using
patients’ actual responses to the questions demonstrated that levels of
depression could be estimated reliably from six items. Scores generated
from the D-CAT system fell within 2 standard deviations of the sample
mean and correlated well with the overall item bank and two standard
mental health measures (BDI and CES-D).
Finally, Gibbons and colleagues46 recently developed a computer-adaptive
test derived from the 626-item Mood and Anxiety Spectrum Scales (MASS).
This system was designed to identify anxiety and mood disorders in patients
attending outpatient clinics. The study demonstrated that the number of items
presented to patients could be reduced to 24 to 30 items without a loss of
information, representing a significant reduction in both administration time
and patient burden.

3. Conclusion
Despite the intuitive appeal and ease of use of brief self-report instruments to
screen for depressive disorders, there remains a great deal of variability in the
efficacy of a number of commonly employed instruments. Many instruments
have been comprehensively validated by traditional test methods, but issues
still remain about unidimensionality, floor and ceiling effects, and instrument
performance across different groups of patients. Rasch models7 have the
potential to address and overcome these issues, generating instruments that
are independent across samples and providing the basis for item banks and
computer-adaptive tests.
Although item banking is a relatively new area of development in health
measures, the U.S. National Institutes of Health has recently provided major
funding for the Patient-Reported Outcomes Measurement Information System
(PROMIS) initiative, with one of the goals to produce computer-adaptive tests
for the clinical research community (http://nihroadmap.nih.gov/clinicalresearch/
promis.asp). The next step in the development of the item bank will be to develop
computer-adaptive testing systems. An important corollary to this will be to
continue to map the item bank, in particular levels of emotional distress, to both
psychiatric diagnoses of clinical anxiety and major depression, as well as clinical
guidelines. This will not only provide a potentially more sensitive instrument for
assessing and screening for distress, but will also assist in tailoring the manage-
ment of distress and associated interventions to individual patients.

References
1. Wright AF. Should general practitioners be testing for depression? Br J Gen Pract.
1994;44(380):132–135.
2. Gilbody S, House AO, Sheldon TA. Screening and case finding instruments for
depression. Cochrane Database of Systematic Reviews. 2005, Issue 4.
3. Gilbody S, Sheldon T, House A. Screening and case-finding instruments for depression:
a meta-analysis. CMAJ. 2008;178:997–1003.
4. Fliege H, Becker J, Walter OB, et al. Development of a computer-adaptive test for
depression (D-CAT). Qual Life Res. 2005;14:2277–2291.
5. Walter OB, Becker J, Bjorner JB, et al. Development and evaluation of a computer
adaptive test for ‘Anxiety’ (Anxiety-CAT). Qual Life Res. 2007;16:S143–S155.
6. Suen HK. Principles of test theories. Hillsdale, NJ: Lawrence Erlbaum Associates,
1990.
7. Rasch G. Probabilistic models for some intelligence and attainment tests. Chicago: The
University of Chicago Press, 1960/1980.
8. Wright BD, Masters G. Rating scale analysis. Chicago: MESA Press, 1982.
9. Bond TG, Fox CM. Applying the Rasch model: fundamental measurement in the human
sciences. Mahwah, NJ: Lawrence Erlbaum Associates, 2001.
10. Linacre JM. A user’s guide to WINSTEPS/MINISTEPS Rasch-model computer
programs. 2007.
11. Lai JS, Cella D, Chang CH, et al. Item banking to improve, shorten and computerize
self-reported fatigue: an illustration of steps to create a core item bank from the
FACIT-Fatigue Scale. Qual Life Res. 2003;12(5):485–501.
12. Wright BD, Linacre JM, Gustafson J-E, et al. Reasonable mean-square fit values. Rasch
Measurement Transactions. 1994;8:370.
13. Stucki G, Daltroy L, Katz JN, et al. Interpretation of change scores in ordinal clinical
scales and health status measures: the whole may not equal the sum of the parts. J Clin
Epidemiol. 1996;49:711–717.
14. Lai JS, Eton DT. Clinically meaningful gaps. Rasch Measurement Transactions.
2002;15:850.
15. Andrich D. A rating formulation for ordered response categories. Psychometrika.
1978;43:561–573.
16. Masters GN. A Rasch model for partial credit scoring. Psychometrika.
1982;47:149–174.
17. Smith AB, Rush R, Velikova G, et al. The initial development of an item bank to assess
and screen for psychological distress in cancer patients. Psychooncology.
2007;16:724–732.
18. Wainer H. Computerized adaptive testing: a primer. Hillsdale, NJ: Lawrence Erlbaum
Associates, 1990.
19. Schumacker RE, Linacre JM. Factor analysis and Rasch. Rasch Measurement
Transactions. 1996;9:470.
20. Bouman TK, Kok AR. Homogeneity of Beck’s Depression Inventory (BDI):
applying Rasch analysis in conceptual exploration. Acta Psychiatr Scand.
1987;76(5):568–573.
21. Hong S, Min SY. Mixed Rasch modeling of the Self-Rating Depression Scale
incorporating latent class and Rasch rating scale models. Educ Psych Measure.
2007;67(2):280–299.
22. Tang WK, Wong E, Chiu HF, et al. The Geriatric Depression Scale should be shortened:
results of Rasch analysis. Int J Geriatr Psychiatry. 2005;20:783–789.
23. Olsen LR, Mortensen EL, Bech P. The SCL-90 and SCL-90R versions validated by
item response models in a Danish community sample. Acta Psychiatr Scand.
2004;110(3):225–229.
24. Radloff LS. The CES-D scale: A self-report depression scale for research in the general
population. Applied Psych Measure. 1977;384–401.
25. Zigmond AS, Snaith RP. The hospital anxiety and depression scale. Acta Psychiatr
Scand. 1983;67:361–370.
26. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatry.
1960;23:56–62.
27. Cox JL, Holden JM, Sagovsky R. Detection of postnatal depression. Development of
the 10-item Edinburgh Postnatal Depression Scale. Br J Psychiatry. 1987;150:782–786.
28. Stansbury JP, Ried LD, Velozo CA. Unidimensionality and bandwidth in the Center for
Epidemiologic Studies Depression (CES-D) Scale. J Pers Assess. 2006;86:10–22.
29. Covic T, Pallant JF, Conaghan PG, et al. A longitudinal evaluation of the Center for
Epidemiologic Studies-Depression scale (CES-D) in a rheumatoid arthritis population
using Rasch analysis. Health Qual Life Outcomes. 2007;5:41.
30. Pickard AS, Dalal MR, Bushnell DM. A comparison of depressive symptoms in stroke
and primary care: applying Rasch models to evaluate the Center for Epidemiologic
Studies-Depression scale. Value Health. 2006;9:59–64.
31. Smith AB, Wright EP, Rush R, et al. Rasch analysis of the dimensional structure of the
Hospital Anxiety and Depression Scale. Psychooncology. 2006;15:817–827.
32. Pallant JF, Tennant A. An introduction to the Rasch measurement model: an example
using the Hospital Anxiety and Depression Scale (HADS). Br J Clin Psychol.
2007;46:1–18.
33. Pallant JF, Miller RL, Tennant A. Evaluation of the Edinburgh Postnatal Depression
Scale using Rasch analysis. BMC Psychiatry. 2006;6:28.
34. Cole JC, Rabin AS, Smith TL, et al. Development and validation of a Rasch-derived
CES-D short form. Psychol Assess. 2004;16:360–372.
35. Licht RW, Qvitzau S, Allerup P, et al. Validation of the Bech-Rafaelsen Melancholia
Scale and the Hamilton Depression Scale in patients with major depression; is the total
score a valid measure of illness severity? Acta Psychiatr Scand. 2005;111:144–149.
36. Aggen SH, Neale MC, Kendler KS. DSM criteria for major depression: evaluating
symptom patterns using latent-trait item response models. Psychol Med.
2005;35:475–487.
37. Wing J, Cooper JE, Sartorius N. The description of psychiatric symptoms: an
introduction manual for the PSE and CATEGO System. Cambridge: Cambridge
University Press, 1974.
38. World Health Organization. Mental health: new understanding, new hope. Geneva,
Switzerland: WHO, 1993.
39. Wolfe EW. Equating and item banking with the Rasch model. J Applied Measure.
2000;1(4):409–434.
40. Ware JE Jr, Kosinski M, Bjorner JB, et al. Applications of computerized adaptive
testing (CAT) to the assessment of headache impact. Qual Life Res.
2003;12(8):935–952.
41. Rose M, Bjorner JB, Becker J, et al. Evaluation of a preliminary physical function item
bank supported the expected advantages of the Patient-Reported Outcomes
Measurement Information System (PROMIS). J Clin Epidemiol. 2008;61:17–33.
42. Goldberg DP, Hillier VF. A scaled version of the General Health Questionnaire.
Psychol Med. 1979;9:139–145.
43. Beck AT, Ward CH, Mendelson M, et al. An inventory for measuring depression. Arch
Gen Psychiatry. 1961;4:561–571.
44. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression
severity measure. J Gen Intern Med. 2001;16:606–613.
45. Spielberger CD. Manual for the State-Trait Anxiety Inventory (STAI). Palo Alto, CA:
Consulting Psychologists Press, 1983.
46. Gibbons RD, Weiss DJ, Kupfer DJ, et al. Using computerized adaptive testing to reduce
the burden of mental health assessment. Psychiatr Serv. 2008;59(4):361–368.
5
HOW DO WE KNOW WHEN A SCREENING
TEST IS CLINICALLY USEFUL?

Alex J. Mitchell

1. How Do Clinicians Make a Diagnosis?


2. Scientific Aspects of Diagnostic Accuracy
3. Clinical Aspects of Diagnostic Accuracy
4. Testing Screening via Implementation Studies
5. Conclusions

Context
There is no shortage of suggested methods to screen for depression, including
clinical interviews. Assuming these are applied to a group containing patients with
depression and patients without depression, how do we decide which are the
optimal methods? In addition, how can tests be compared and how can tests
be combined? This chapter discusses the methods used to compare scales and
tools.

1. How Do Clinicians Make a Diagnosis?


The terms diagnosis and screening both refer to the application of an agreed
method to confirm those with a condition and to exclude those without the
condition (for discussion see Chapter 2). When attempting to separate
depressed versus non-depressed individuals there is always an overlap of
symptoms (or biological markers) (see Chapter 1, Fig. 1); therefore, a perfect
test based on current methods is unobtainable. Testing may be focused on those at
high risk of the condition (such as screening for depression after myocardial
infarction) or applied to a wider population (screening for depression in all
primary care patients). The former is a high-prevalence setting, which favors the
ability to confirm a condition, whereas the latter is a low-prevalence setting,
which favors the ability to refute a condition. It is often forgotten that the clinical
process of making a diagnosis is a form of screening itself. Here the tool is the
clinician’s clinical skill and the sample is all patients seen by the clinician. If a
clinician is attuned to the concept of depression, has a high index of suspicion,
and asks the right questions, then it is likely he or she will have high personal
diagnostic accuracy. If the clinician is unconfident, inexperienced, and
untrained, it is less likely that he or she will be able to make a correct diagnosis
(see Table 5.1 and Chapter 3). Some literature suggests that the added value of
screening tools for depression is apparent only in the latter situation.
A diagnostic test for depression is designed to help the clinician elicit and
weigh symptoms and signs to make a diagnosis. How, then, is this achieved,
and how does a screening test work in scientific terms?

Case Example
Consider the case illustrated in Textbox 5.1. A man who suffered a stroke
2 months previously now complains of five troubling symptoms. Assuming
these symptoms are elicited correctly, is he clinically depressed? Could the
somatic symptoms be features of stroke and not depression (see Chapters 10
and 11)? Five symptoms may immediately sound sufficient for a diagnosis,
but not all symptoms qualify under DSM-IV or ICD-10. For example, loss of
drive is not a qualifying feature and therefore, under these guidelines, must be
ignored. This leaves four qualifying symptoms and only one core symptom,
which is insufficient for a DSM-IV-based diagnosis of major depression.
However, using ICD-10, he does have two core features and two associated
features listed, but only at a level designated as a mild depressive episode.
Thus, clinicians who use a strict operational checklist approach may or may
not diagnose depression in this case. In fact, research suggests that fewer than
one in five psychiatrists would take this strict operational approach, and
fewer still use validated questionnaires such as the Patient Health
Questionnaire (PHQ)-9. Most trained psychiatrists rely on their own clinical
skills.
Similarly, in primary care, in a survey of 2,500 Australian primary care
practitioners (PCPs), Krupinski and Tiller (2001)1 found that 28% asked about
at least five of the nine standard DSM-IV symptoms. The two symptoms that
were most frequently asked about were sleep disturbance (cited by 86.8%) and
loss of appetite (cited by 55.6%). Only 0.2% of this sample said they would
make a diagnosis using a rating scale.

Table 5.1. Levels of Diagnostic Confidence

                                          Prior Experience &        No Prior Experience &
                                          Training                  Training
Use a checklist or screening tool         i. Trained, Assisted      ii. Untrained, Assisted
Do not use a checklist or screening tool  iii. Trained, Unassisted  iv. Untrained, Unassisted

Textbox 5.1. Case History: Post-Stroke Depression?

A previously well 58-year-old man who suffered a dominant hemisphere
stroke 2 months previously is referred to an outpatient psychiatry clinic. He
reports that he has had five symptoms (low mood, loss of drive, low energy,
poor appetite, and insomnia) for the past 3 weeks. He has no other symptoms
on detailed questioning.

Symptom                               ICD-10      DSM-IV
Persistent sadness or low mood        Yes (core)  Yes (core)
Loss of interests or pleasure         Yes (core)  Yes (core)
Fatigue or low energy                 Yes (core)  Yes
Disturbed sleep                       Yes         Yes
Poor concentration or indecisiveness  Yes         Yes
Low self-confidence                   Yes         No
Poor or increased appetite            Yes         No
Suicidal thoughts or acts             Yes         Yes
Agitation or slowing of movements     Yes         Yes
Guilt or self-blame                   Yes         Yes
Significant change in weight          No          Yes

Toward Evidence-Based Diagnosis


Is ICD or DSM right to place more weight on some symptoms than others? If
so, there must be evidence that specific symptoms have more diagnostic
importance than others. This means that these methods have been subject to
comparative diagnostic validity testing. Most clinicians (psychiatrists and
non-psychiatrists alike) use their own clinical acumen to make a diagnosis
without using any specific tool, but they may have personal experience of the
diagnostic importance of specific symptoms. Even those using DSM-IV still
have to use clinical judgment because there are no recommended structured
questions in DSM.2 Conventional clinical method relies on experience and
pattern recognition, whereas actuarial judgment uses decision theory
informed by empirically established tests.3 In both cases, reaching a
diagnosis means narrowing down a long list of possibilities in light of
accumulating clinical evidence. However, in the former case it is difficult to
check for inaccuracy, whereas in the latter case there is an attempt to diagnose
on the basis of calculated probabilities. The standard model for this task is
Bayes’ theorem, which calculates post-test probability in relation to the
baseline probability (Fig. 5.1). The baseline (pre-test) probability of the
condition is the local prevalence of the disease, and the post-test probability
is the probability of disease given new information such as a positive test
result.4
Before assuming that assisted methods (eg, screening) are helpful, it is
worth checking on the evidence base for unassisted detection (see Chapter 3).
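
The Bayesian calculation described above can be sketched in a few lines of
Python. This is a minimal illustration rather than a clinical tool: the function
names are hypothetical, and the prevalence, sensitivity, and specificity values
are assumed for the example, not taken from any study. The four "leaves"
correspond to the outcomes of the Screen arm of a decision tree such as Fig. 5.1.

```python
def screen_branch(prevalence, sensitivity, specificity):
    """Joint probabilities of the four outcomes when everyone is screened."""
    return {
        "true_positive": prevalence * sensitivity,
        "false_negative": prevalence * (1 - sensitivity),
        "false_positive": (1 - prevalence) * (1 - specificity),
        "true_negative": (1 - prevalence) * specificity,
    }


def post_test_probability(prevalence, sensitivity, specificity, positive=True):
    """Bayes' theorem: probability of the condition given the test result."""
    leaves = screen_branch(prevalence, sensitivity, specificity)
    if positive:
        return leaves["true_positive"] / (
            leaves["true_positive"] + leaves["false_positive"]
        )
    return leaves["false_negative"] / (
        leaves["false_negative"] + leaves["true_negative"]
    )


# Assumed, illustrative values: 20% prevalence, 90% sensitivity, 80% specificity.
p_pos = post_test_probability(0.20, 0.90, 0.80, positive=True)
p_neg = post_test_probability(0.20, 0.90, 0.80, positive=False)
print(f"probability of the condition after a positive result: {p_pos:.2f}")
print(f"probability of the condition after a negative result: {p_neg:.2f}")
```

With these assumed inputs, a positive result raises the probability of the
condition above the 20% baseline and a negative result lowers it; how far it
moves depends on all three inputs, not on sensitivity and specificity alone.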

Textbox 5.2. Definitions of Measures of Diagnostic Accuracy

Sensitivity (Se)
A measure of accuracy defined as the proportion of patients with disease in
whom the test result is positive: a/(a + c), where a, b, c, and d are the cells
of the 2 × 2 table (true positives, false positives, false negatives, and true
negatives, respectively)
Specificity (Sp)
A measure of accuracy defined as the proportion of patients without disease
in whom the test result is negative: d/(b + d)
Positive Predictive Value
A measure of rule-in accuracy defined as the proportion of true positives in
those with a positive screening result: a/(a + b)
Negative Predictive Value
A measure of rule-out accuracy defined as the proportion of true negatives in
those with a negative screening result: d/(c + d)
Youden’s J
A composite of overall accuracy using sensitivity and specificity that is
unaffected by prevalence: sensitivity + specificity – 1
Predictive Summary Index
A composite of overall accuracy using all positive and negative screens that
reflects the prevalence: PPV + NPV – 1
Kappa
An index that compares the agreement against that which might be expected
by chance. Kappa can be thought of as the chance-corrected proportional
agreement: (Observed agreement – Chance agreement)/(1 – Chance
agreement)
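
The definitions in Textbox 5.2 translate directly into code. The sketch below
is illustrative only: the function name is hypothetical and the cell counts are
invented, with a, b, c, and d taken to be true positives, false positives,
false negatives, and true negatives.

```python
def accuracy_measures(a, b, c, d):
    """Measures from Textbox 5.2 for a 2 x 2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    n = a + b + c + d
    se = a / (a + c)
    sp = d / (b + d)
    ppv = a / (a + b)
    npv = d / (c + d)
    observed = (a + d) / n  # proportion on which test and gold standard agree
    # Agreement expected by chance, from the row and column totals.
    chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return {
        "sensitivity": se,
        "specificity": sp,
        "ppv": ppv,
        "npv": npv,
        "youden_j": se + sp - 1,
        "psi": ppv + npv - 1,
        "kappa": (observed - chance) / (1 - chance),
    }


# Invented counts for illustration: 100 cases and 200 non-cases.
m = accuracy_measures(a=90, b=40, c=10, d=160)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```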
Figure 5.1. Decision Theory. [Decision tree with two arms.
Screen:
  Condition (prevalence)
    Test positive (sensitivity): treated condition, prevalence × sensitivity
    Test negative (1 – sensitivity): untreated condition, prevalence × (1 – sensitivity)
  No condition (1 – prevalence)
    Test positive (1 – specificity): false positive, (1 – prevalence) × (1 – specificity)
    Test negative (specificity): healthy child, (1 – prevalence) × specificity
Don’t screen:
  Condition (prevalence): untreated condition, prevalence
  No condition (1 – prevalence): healthy child, 1 – prevalence]

2. Scientific Aspects of Diagnostic Accuracy


Attempts to distinguish patients with a condition from those without on the
basis of a test or clinical method are most simply represented by a 2 × 2 table
that generates sensitivity, specificity, positive predictive value (PPV), and
negative predictive value (NPV) (Textbox 5.2).5 It is critical to understand
the difference between looking vertically down the columns and looking
horizontally across the rows (Figure 5.2). Vertically, the denominator is the
number of cases with or without the condition, a number that is unknown to the
clinician but is known in a research setting with a gold standard. Horizontally,
the denominator is the number of positive or negative screens, a number that is
known to clinicians and hence the reason why PPV and NPV reveal proportions of
interest in the real world. There is a complex relationship between these
variables. In real life the performance of a test varies with the baseline
prevalence of the condition. Put simply, it is easy to spot cases when nothing
but cases exist (prevalence = 100%); conversely, it is hard when the prevalence
is low.6 Rule-in and rule-out accuracy are essentially independent variables,
although a test may perform well in both directions. Rule-in accuracy is best
measured by the PPV, but a high specificity also implies there are few false
positives, and hence any positive result will suggest a true case.7 Rule-out
accuracy is best measured by the NPV, where the denominator is all who test
negative, but again if the sensitivity is high, there will be few false-negative
results, and hence any negative result implies a true non-case.

            Gold Standard:    Gold Standard:
            Disorder          No Disorder
Test +ve    A                 B                 PPV = A/(A + B)
Test –ve    C                 D                 NPV = D/(C + D)
Total       Se = A/(A + C)    Sp = D/(B + D)

Figure 5.2. Generic 2 × 2 Table.

Optimal accuracy is often achieved by choosing one test for rule in
(case-finding) and another for rule out, but not uncommonly only a single test
can be applied and it must perform as well as possible in both directions. In
this situation summary accuracy statistics are useful. The simplest are
Youden’s J and the predictive summary index, which are essentially the sums of
sensitivity and specificity, and of PPV and NPV, respectively (minus 1 in each
case).8 The fraction correct (true cases plus true non-cases, divided by all
cases and non-cases) is also useful, as it can easily be used to compare
different methods. All such methods work well when the optimal cutoff is known
or in binary (yes/no) tests. However, where performance varies by cutoff
threshold, plotting sensitivity against 1 – specificity for each cutoff
generates a receiver operating characteristic (ROC) curve, and the area under
the curve gives a measure of the overall performance. Where multiple tests need
to be compared, each with different optimal sensitivity and specificity values,
results can be combined in a summary receiver operating characteristic curve
(sROC).9 Additionally, when the relative importance of false positives or false
negatives is significant, a cutoff may be chosen that favors rule-in or
rule-out accuracy.
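
As a rough sketch of how cutoff-dependent performance can be summarized, the
Python below computes sensitivity and specificity at every cutoff of a scale,
picks the cutoff with the highest Youden's J, and estimates the area under the
ROC curve by the trapezoidal rule. The function names are hypothetical and the
ten scores are invented purely for illustration.

```python
def roc_points(scores, labels):
    """Sensitivity and specificity at every observed cutoff.

    A score at or above the cutoff counts as a positive screen;
    labels are 1 for a gold-standard case and 0 for a non-case.
    """
    n_cases = sum(labels)
    n_noncases = len(labels) - n_cases
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        points.append((cutoff, tp / n_cases, tn / n_noncases))
    return points


def area_under_curve(points):
    """Trapezoidal area under sensitivity plotted against 1 - specificity."""
    curve = {(1 - sp, se) for _, se, sp in points} | {(0.0, 0.0), (1.0, 1.0)}
    xy = sorted(curve)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(xy, xy[1:]))


# Invented questionnaire scores for ten patients (1 = depressed on gold standard).
scores = [2, 4, 5, 7, 8, 9, 11, 12, 14, 16]
labels = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]

points = roc_points(scores, labels)
best_cutoff, best_se, best_sp = max(points, key=lambda p: p[1] + p[2] - 1)
auc = area_under_curve(points)
print(f"AUC = {auc:.2f}; best cutoff by Youden's J = {best_cutoff}")
```

Choosing the cutoff by Youden's J weights false positives and false negatives
equally; where one kind of error matters more, a different cutoff along the
same curve may be preferable.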

Likelihood Ratios
Likelihood ratios can be clinically useful because they do not vary with
prevalence and because they can be calculated for several levels of test
result. The positive likelihood ratio is the probability of a positive test
result in a patient with the disorder divided by the probability of a positive
result in a patient without it (sensitivity/[1 – specificity]). The negative
likelihood ratio is the probability of a negative result in a patient with the
disorder divided by the probability of a negative result in a patient without
it ([1 – sensitivity]/specificity).
A nomogram (Fig. 5.3) has been developed for use with likelihood ratios to
determine the post-test probability of disease if the pre-test probability and the
likelihood ratio for the specific test are known. A likelihood ratio greater than 1
produces a post-test probability that is higher than the pre-test probability,
and a likelihood ratio less than 1 produces one that is lower.
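
The nomogram performs graphically what can also be done arithmetically by
converting the pre-test probability to odds, multiplying by the likelihood
ratio, and converting back. A minimal sketch follows, with hypothetical
function names and assumed input values (90% sensitivity, 80% specificity, 20%
pre-test probability).

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_from_lr(pre_test_probability, likelihood_ratio):
    """Post-test probability via odds: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Assumed, illustrative values.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.80)
after_positive = post_test_from_lr(0.20, lr_pos)
after_negative = post_test_from_lr(0.20, lr_neg)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
print(f"post-test probability: {after_positive:.2f} (positive), "
      f"{after_negative:.2f} (negative)")
```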

Figure 5.3. Likelihood Ratio Nomogram. [Three parallel scales: pre-test
probability (%), from 0.1 to 99; likelihood ratio, from 2000 to 0.0005; and
post-test probability (%), from 99 to 0.1.]

3. Clinical Aspects of Diagnostic Accuracy


The best way to understand the clinical applicability of a screening test is to
consider the example listed in Textbox 5.1. The patient complains of five
symptoms and has data from a single Hospital Anxiety and Depression
Scale-Depression (HADS-D) rating. Are these symptoms likely to be symptoms of
depression or do they occur in people with stroke who are not depressed? The
diagnostic impact of each piece of information can be evaluated scientifically,
provided its rate of occurrence is known in both groups (Textbox 5.3 lists these
rates). The occurrence rate in the depressed sample is in fact the sensitivity
of each specific item. Thus, the symptom with optimal sensitivity is
“persistent low mood.” Specificity is derived from the non-occurrence in the
non-depressed subject, and in this case the optimal specificity is a HADS
score of 9 or above, closely followed by poor appetite. However, does this
mean these are the best “tests” for this condition?

Textbox 5.3. Post-Stroke Depression: Symptom Counts

A previously well 58-year-old man who suffered a dominant hemisphere
stroke 2 months previously is referred to an outpatient psychiatry clinic. He
reports that he has had five symptoms (low mood, loss of drive, low energy,
poor appetite, and insomnia) for the last 3 weeks. His score on the HADS
depression scale is 9 out of 21. Out of the last 100 patients seen in this clinic,
54 were depressed.

Patient’s Symptoms     % of Depressed Stroke Patients  % of Nondepressed Stroke Patients
                       from Previous Studies           from Previous Studies
Persistent low mood    93%                             18%
Loss of drive          88%                             30%
Low energy             87%                             32%
Disturbed sleep        83%                             32%
Poor appetite          45%                             11%
HADS score 9 or above  60%                             9%

Pre-Test–Post-Test Change
As previously noted, raw sensitivity and specificity figures are of only
moderate use by themselves. More useful are the PPV and NPV, which can be
calculated from the above data. The data from Textbox 5.3 are reproduced in
detail in Table 5.2. From this study of 1,000 people following stroke, we see the
complexity of deciding upon the optimal test. Persistent low mood is the
symptom with highest sensitivity and NPV. Thus, if low mood is not present,
there is a 98% chance that the patient is not depressed, on this symptom alone.
This improves upon the pre-test probability of non-depression of 0.80 by 0.18
(pre–post gain) (Fig. 5.4). Similarly, if all five symptoms listed are present,
there is an 88% chance of major depression, a large pre–post gain over the
pre-test probability of 0.20. This is different from calculating the value of
any one of the five symptoms, which is an “or” rather than an “and”
combination.
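
The pre–post gain can be explored numerically. The sketch below is
illustrative: the function names are hypothetical, and the inputs are the
rounded rates for persistent low mood from Textbox 5.3 (93% in depressed and
18% in non-depressed patients, so specificity is taken as 0.82). It computes
the rule-in and rule-out gains at a prevalence of 0.20 and then scans a grid of
prevalences to find where the rule-in gain is largest.

```python
def ppv(prevalence, sensitivity, specificity):
    """Probability of the condition after a positive result."""
    tp = prevalence * sensitivity
    fp = (1 - prevalence) * (1 - specificity)
    return tp / (tp + fp)


def npv(prevalence, sensitivity, specificity):
    """Probability of no condition after a negative result."""
    tn = (1 - prevalence) * specificity
    fn = prevalence * (1 - sensitivity)
    return tn / (tn + fn)


def rule_in_gain(prevalence, sensitivity, specificity):
    """PPV minus the pre-test probability of the condition."""
    return ppv(prevalence, sensitivity, specificity) - prevalence


def rule_out_gain(prevalence, sensitivity, specificity):
    """NPV minus the pre-test probability of no condition."""
    return npv(prevalence, sensitivity, specificity) - (1 - prevalence)


SE, SP = 0.93, 0.82  # persistent low mood, rounded from Textbox 5.3

gain_in = rule_in_gain(0.20, SE, SP)
gain_out = rule_out_gain(0.20, SE, SP)
print(f"at prevalence 0.20: rule-in gain {gain_in:.2f}, rule-out gain {gain_out:.2f}")

# Scan prevalences from 1% to 99% for the largest rule-in gain.
grid = [i / 100 for i in range(1, 100)]
best_prevalence = max(grid, key=lambda p: rule_in_gain(p, SE, SP))
print(f"rule-in gain is largest near a prevalence of {best_prevalence:.2f}")
```

The gain shrinks toward zero at both extremes of prevalence, which is why a
curve such as the one in Fig. 5.4 has a single point of maximum gain.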
Table 5.2. Summary of Diagnostic Accuracy Results from a Hypothetical Study of Post-Stroke Depression

Patient’s Symptoms                       Dep. n  TP   Se    Non-dep. n  TN   Sp    PPV   NPV   J     PSI   FC    UI+   UI–
Single Symptoms
Persistent low mood                      200     186  0.93  800         656  0.82  0.56  0.98  0.75  0.54  0.84  0.52  0.80
Loss of drive                            200     176  0.88  800         560  0.70  0.42  0.96  0.58  0.38  0.74  0.37  0.67
Low energy                               200     174  0.87  800         544  0.68  0.40  0.95  0.55  0.36  0.72  0.35  0.65
Disturbed sleep                          200     166  0.83  800         544  0.68  0.39  0.94  0.51  0.33  0.71  0.33  0.64
Poor appetite                            200     90   0.45  800         712  0.89  0.51  0.87  0.34  0.37  0.80  0.23  0.77
Composite Measures
All five symptoms                        200     56   0.28  800         792  0.99  0.88  0.85  0.27  0.72  0.85  0.25  0.84
PHQ2 (Q1 or Q2 positive)                 200     160  0.80  800         560  0.70  0.40  0.93  0.50  0.33  0.72  0.32  0.65
HADS: score 9 or above                   200     130  0.60  800         728  0.91  0.64  0.91  0.51  0.56  0.86  0.39  0.83
Algorithm: PHQ2 then HADS (if positive)  200     96   0.48  800         778  0.97  0.81  0.88  0.45  0.70  0.87  0.39  0.86

Sample size = 1,000; prevalence = 0.20
Dep. n, number depressed after stroke; Non-dep. n, number non-depressed after stroke; TP, true positives; TN, true negatives; Se, sensitivity; Sp, specificity; J, Youden’s J; PSI, predictive summary index; FC, fraction correct; UI, utility index.

Figure 5.4. Conditional probabilities graph of pre-test post-test gain from a hypothetical
diagnostic test. [Plot of post-test probability (y-axis, 0 to 0.8) against
prevalence or prior probability (x-axis, 0 to 1), with the point of maximum
gain marked.]

Surely, then, the five-symptom method is the best method to identify
post-stroke depression? In the real world, the situation is more complex than
it first appears because all five symptoms are positive in only 28% of true
cases.

Clinical Utility of a Discriminating Test


Even when a test has a high PPV or NPV, a correction is needed for how often
that result occurs in the relevant population. Thus, in this example, if a
combination of five symptoms occurs, then it is 88% likely that major depression
is present; however, this combination is actually uncommon (28% of true cases)
in clinical practice. For the clinician, any test with a high PPV will be
devalued if it occurs rarely in true cases. Clinically relevant rule-in accuracy
(also known as the positive utility index) is the product of the PPV and
sensitivity. Thus, the positive utility index for all five symptoms is
0.88 × 0.28 = 0.25. A similar calculation applies for ruling out a diagnosis.
For example, the symptom “loss of drive” has a high NPV but is negative in only
70% of non-depressed stroke patients. Thus, its corrected rule-out value can be
calculated by the negative utility index (the product of the NPV and
specificity), 0.96 × 0.70 = 0.67. Utility index scores can be converted into
qualitative grades as follows: excellent ≥ 0.81, good ≥ 0.64, satisfactory
≥ 0.49, and poor < 0.49.
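
The utility index and its qualitative grades can be expressed as a short
Python sketch. The function names are hypothetical; the inputs are the rounded
figures for two rows of Table 5.2, so results may differ in the second decimal
place from values computed on unrounded counts.

```python
def utility_indices(sensitivity, specificity, ppv, npv):
    """Return (UI+, UI-): PPV x sensitivity and NPV x specificity."""
    return ppv * sensitivity, npv * specificity


def grade(utility_index):
    """Qualitative grade using the thresholds quoted in the text."""
    if utility_index >= 0.81:
        return "excellent"
    if utility_index >= 0.64:
        return "good"
    if utility_index >= 0.49:
        return "satisfactory"
    return "poor"


# Rounded values for two rows of Table 5.2.
rows = {
    "All five symptoms": dict(sensitivity=0.28, specificity=0.99, ppv=0.88, npv=0.85),
    "Loss of drive": dict(sensitivity=0.88, specificity=0.70, ppv=0.42, npv=0.96),
}
for name, values in rows.items():
    ui_pos, ui_neg = utility_indices(**values)
    print(f"{name}: UI+ = {ui_pos:.2f} ({grade(ui_pos)}), "
          f"UI- = {ui_neg:.2f} ({grade(ui_neg)})")
```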
In this example, the most useful population-based rule-in test is low mood,
although it is only a ‘‘satisfactory’’ test. The most useful rule-out test is the
algorithm approach, which can be graded as an ‘‘excellent’’ rule-out test.
Algorithm approaches are worth examining in a bit more detail.

Algorithm Approaches
In this example, three questionnaire approaches are shown. The PHQ-2
achieves modest sensitivity and specificity and identifies 80% of all true
cases (Table 5.2). The HADS-D has excellent specificity and NPV and thus could be used
as a rule-out test. Indeed, it could be combined with a high cutoff (eg, 15v16)
as a good rule-in test, leaving a cohort scoring 9 to 15 as diagnostically
uncertain and requiring a second-stage test. The HADS can also be combined
with another questionnaire, in this case the PHQ-2 (see Appendix Fig. 2).
This is a basic algorithm approach where a second test is applied only in those
positive in the first step. This two-step strategy has the effect of reducing the
false positives, improving the PPV and specificity but at the expense of
sensitivity and NPV. In low-prevalence conditions, the overall gain in
accuracy may be worth the effort of the extra step. Thus, the two-step strategy
improves on the 0.40 PPV from the PHQ-2 alone to 0.81 but reduces the NPV
from 0.93 to 0.88. However, there is an overall gain in accuracy from 72% to
87% correctly identified (fraction correct, Table 5.2).
Clinicians may use their own clinical method as an algorithm, for
example by offering a follow-up interview to those who are suspected of
having a disorder on initial examination. The algorithm often offers a
potential economic and efficiency advantage over a conventional approach.
Here the majority of patients receive a simple, inexpensive screening test
and a minority receive a more lengthy case-finding test. However, the
algorithm approach is efficient only where the prevalence of a condition
is very low (or very high, in which case the second step is applied to those
who screen negative to reduce the false negatives). As the prevalence
approaches 0.50, the yield of the two-step approach converges on the yield of
the one-step approach. The gain is also at its greatest when the accuracy of
the single-step approach is least (see Appendix Tables 3 and 4 for more
details). A practical example of an algorithm approach to the detection of
depression can be found elsewhere.10
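
A two-step algorithm of this kind can be sketched as follows. The important
assumption, which is not stated in the text, is that the two tests are
conditionally independent given the patient's true status; real questionnaires
that share items will usually violate this to some degree. The function names
are hypothetical, and the inputs are the rounded PHQ-2 and HADS figures from
Table 5.2.

```python
def serial_two_step(se1, sp1, se2, sp2):
    """Second test given only to first-step positives.

    A patient is called positive only if both tests are positive.
    Assumes the tests are conditionally independent given true status.
    """
    se = se1 * se2
    sp = 1 - (1 - sp1) * (1 - sp2)
    return se, sp


def predictive_values(prevalence, se, sp):
    """Return (PPV, NPV) at a given prevalence."""
    tp = prevalence * se
    fn = prevalence * (1 - se)
    fp = (1 - prevalence) * (1 - sp)
    tn = (1 - prevalence) * sp
    return tp / (tp + fp), tn / (tn + fn)


def second_step_fraction(prevalence, se1, sp1):
    """Proportion of all patients who go on to the second test."""
    return prevalence * se1 + (1 - prevalence) * (1 - sp1)


# Rounded figures from Table 5.2: PHQ-2 first, then HADS (cutoff 9 or above).
se, sp = serial_two_step(se1=0.80, sp1=0.70, se2=0.60, sp2=0.91)
ppv, npv = predictive_values(0.20, se, sp)
retested = second_step_fraction(0.20, se1=0.80, sp1=0.70)
print(f"combined sensitivity {se:.2f}, specificity {sp:.2f}")
print(f"PPV {ppv:.2f}, NPV {npv:.2f}; {retested:.0%} of patients need the second test")
```

Under the independence assumption, the combined sensitivity and specificity
come out close to the algorithm row of Table 5.2, and fewer than half of the
patients would need the second questionnaire.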

4. Testing Screening via Implementation Studies


Even a test of high predictive value and high utility index cannot be assumed to
be beneficial. Guidelines from the U.K. National Screening Committee are
helpful here (Textbox 5.4). Ultimately, the case for a screening test has to be
proven in an implementation study. This has two important parts: the
feasibility of the tool in a clinical setting and the added value of the tool
beyond what could be achieved without it.

Textbox 5.4. U.K. National Screening Committee Guidelines

The condition should:
• Be an important health issue
• Have a well-understood history, with a detectable risk factor or disease
  marker
• Have cost-effective primary preventions implemented
The screening tool should:
• Be a valid tool with known cutoff
• Be acceptable to the public
• Have agreed diagnostic procedures
The treatment should:
• Be effective, with evidence of benefits of early intervention
• Have adequate resources
• Have appropriate policies as to who should be treated
The screening program should:
• Show evidence that benefits of screening outweigh risks
• Be acceptable to public and professionals
• Be cost-effective (and have ongoing evaluation)
• Have quality-assurance strategies in place
Adapted from UK National Screening Committee. Criteria for appraising the
viability, effectiveness and appropriateness of a screening programme.
Available at: http://www.nsc.nhs.uk/pdfs/criteria.pdf

Feasibility of Depression Screening


Feasibility asks whether a tool is practical enough, in both application and
scoring, to gain acceptance by healthcare professionals and patients. This has
rarely been studied in relation to depression severity scales. Bermejo and
associates (2005)11 looked at attitudes to the PHQ-9 in general practice in
Germany. This study enrolled 1,034 patients from 17 PCPs; both patients
and healthcare professionals were asked about acceptability. Patients found
the instrument highly acceptable, but 62.5% of the PCPs thought it was too
long and 37.5% thought it was too time-consuming, even though it typically
took 1 to 2 minutes. Half of the PCPs rated the PHQ as an impediment to daily
practice and 75% thought it was impractical, compared with only 25% of
patients. One proxy for feasibility is the willingness of clinicians to use the
test: any screening roll-out will be compromised if front-line staff find the
tool too difficult to administer or score.

Added Value
Demonstrating the possible benefit of a screening tool is akin to demonstrating
benefit from a new medicine. Ideally, this takes the form of a randomized
controlled trial using representative clinicians and patients, in which one
group (arm 1) use their clinical skills uninfluenced by the study taking place
(avoiding the Hawthorne effect) and the other group (arm 2) use their clinical
skills plus the screening tool or method. The advantage of this design is that
the results reveal the unassisted detection rate (arm 1) as well as the added
value beyond usual care (the difference between arm 2 and arm 1).
Possible stages of tool development are discussed in Chapter 4.
Ideally, implementation should not stop with demonstration of superior
detection; rather, it should attempt to demonstrate further patient benefits,
such as better quality of care and greater resolution of depression. This is
discussed further in Chapter 7.

5. Conclusions
Although depression is one of the world’s most prevalent disorders and
antidepressants are the most commonly prescribed class of drug, the science of
diagnosing depression has been hampered by the paucity of simple studies
documenting the rate of symptoms and signs in depressed and non-depressed
subjects. Once these data become available, calculating the diagnostic value of
specific symptoms (both individually and in combination) becomes
straightforward. Better data exist for depression severity scales and other
assisted methods. Beyond this, further implementation studies are required in
which the true benefit to patients of all proposed diagnostic methods is
compared with conventional unassisted approaches.

References
1. Krupinski J, Tiller J. The identification and treatment of depression by general
practitioners. Aust N Z J Psychiatry. 2001;35:827–832.
2. Steiner JL, Tebes JK, Sledge WH, et al. A comparison of the structured clinical
interview for DSM-III-R and clinical diagnoses. J Nerv Ment Dis. 1995;183:365–369.
3. Steadman HJ, Silver E, Monahan J, et al. A classification tree approach to the
development of actuarial violence risk assessment tools. Law and Human Behavior.
2000;24:83–100.
4. Elstein AS, Schwarz A. Clinical problem solving and diagnostic decision making:
selective review of the cognitive literature. BMJ. 2002;324:729–732.
5. Yerushalmy J. Statistical problems in assessing methods of medical diagnosis, with
special reference to X-ray techniques. Pub Health Rep. 1947;62:1432–1449.
6. Whiting P, Rutjes AWS, Dinnes J, et al. Development and validation of methods for
assessing the quality of diagnostic accuracy studies. Health Technology Assessment.
2004;8(25):1–234.
7. Sackett DL, Haynes RB. The architecture of diagnostic research. BMJ. 2002;324:539–541.
8. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3:32–35.
9. Macaskill P. Empirical Bayes estimates generated in a hierarchical summary ROC
analysis agreed closely with those of a full Bayesian analysis. J Clin Epidemiol.
2004;57:925–932.
10. Thombs BD, Ziegelstein RC, Whooley MA. Optimizing detection of major depression
among patients with coronary artery disease using the Patient Health Questionnaire:
Data from the Heart and Soul Study. J Gen Intern Med. 2008;23(12):2014–2017.
11. Bermejo I, Niebling W, Mathias B, et al. Patients’ and physicians’ evaluation of the
PHQ-D for depression screening. Primary Care & Community Psychiatry.
2005;10(4):125–131.
6
CLINICAL JUDGMENT AND THE INFLUENCE
OF SCREENING ON DECISION MAKING

Howard N. Garb

1. Introduction
2. Research on Clinical Judgment
3. The Limits of Screening

Context
How do clinicians arrive at diagnostic decisions? In most cases the
decision is not made following formal criteria, but by intuition. In addition,
routine interviews are often narrow and the feedback gleaned from
patients is inadequate. Yet it is not clear if screening helps or hinders
clinical judgment. It might be that only clinicians who have low confidence
and limited interviewing and diagnostic skills are open to the use of, and
actually helped by, diagnostic tools.

* The views expressed in this article are those of the author and are not the
official policy of the Department of Defense or the United States Air Force.

1. Introduction
To provide a theoretical framework for understanding why it is difficult for
physicians to detect depression in primary care settings, a broad array of
research in the mental health fields can be described. For example, more than
1,000 studies have been conducted on clinical judgment in the area of mental
health practice,1,2 and the results from these studies can be used to illuminate
the challenges physicians face in judging whether a patient is clinically
depressed and can benefit from treatment. In this chapter, results on clinical
judgment will be described.
A second topic will also be briefly discussed. Results from research on
clinical judgment would seem to indicate that screening should be of value.
Yet, as noted in Chapter 7, stand-alone screening programs have added little
or nothing to outcomes. Reasons for this unexpected result will be explored.

2. Research on Clinical Judgment


Three topics will be discussed: (1) narrowness of interviews, (2) nature of
patient feedback, and (3) the cognitive processes of clinicians.

Narrowness of Interviews
Depression goes undetected because in many cases physicians do not ask
patients if they have symptoms of a depressive mood disorder.3 To place this
in context, it can be noted that mental health professionals also often do not ask
patients about important symptoms and behaviors. Failure to inquire about
depression in primary care settings can be viewed in the broader context of
failure to inquire about important symptoms and events in mental health
settings.
Research on clinical judgment has demonstrated that lack of comprehensiveness
is often a problem for interviews conducted in clinical practice. For
example, in one study,4 mental health professionals saw patients in routine
clinical practice, and afterwards research investigators conducted
semi-structured interviews with the patients. Remarkably, the mental health
professionals had evaluated only about 50% of the symptoms that were recorded
using the semi-structured interviews.
Similarly, a number of studies have found that mental health professionals
often do not ask about important events when formulating a case history. For
example, in a study by Malone and associates (1995),5 clinicians at a psychia-
tric hospital failed to document a history of suicidal behavior for 12 of 50
patients who had a history of suicidal behavior. This is important because past
suicidal behavior is one of the best predictors of suicide. In another study,6 26
of 69 psychiatric inpatients reported on a research questionnaire that they were
victims of severe physical abuse by family members or partners during the past
year. The abuse had been documented in medical charts for only nine of the
patients. To give one more example, in another study a computer interview was
used to collect a psychiatric history.7 Important history information was
obtained using the computer interview that had not been obtained by mental
health professionals in the course of their routine work. This was especially
true for obtaining information about criminal history (26% of patients),
amnesic blackouts after drinking heavily (23%), repeatedly being fired from
jobs (17%), recent drug abuse (10%), and debts (10%).
Another type of error that occurs when evaluations of psychopathology are
not comprehensive is called diagnostic overshadowing. Diagnostic overshadowing
is said to occur when clinicians make one or two diagnoses but overlook
other disorders.8,9 For example, when diagnoses are made by mental
health professionals, mental disorders tend to be missed among clients with
mental retardation,10,11 alcohol and drug abuse is often underdiagnosed among
clients presenting with psychiatric problems,12 and diagnoses of personality
disorder are often missed among clients with an Axis I disorder (eg, among
clients with obsessive-compulsive disorder).13
If mental health professionals fail to ask about important emotional and
behavioral problems and overlook mental disorders, it is not surprising that
physicians who are not trained in psychiatry do the same. Since patients in
primary care settings almost always present with physical complaints, we
should not be surprised when diagnostic overshadowing occurs and physicians
do not explore other possible problems.

Nature of Patient Feedback


Another reason physicians may have difficulty detecting depression in
primary care settings is that they are unlikely to receive accurate feedback.
If a patient with clinically significant depression presents with a medical
problem and the physician misses the diagnosis, it is unlikely that the physician
will later learn that the diagnosis of depression was missed.
One of the most surprising findings on clinical judgment is that it can be
very difficult to learn from clinical experience. Training is often positively
related to validity, but experience is not.14,15 Thus, once physicians and mental
health professionals complete residency or graduate-school levels of training,
the amount of experience they gain is weakly related, or even negatively
related, to the accuracy of judgments and treatment outcomes.
In a review of the literature on the relationship between clinical experience and
quality of healthcare,16 physicians who had been in practice longer were found to
be at risk for providing lower-quality care. A decreasing level of performance (or
treatment) was associated with increasing years in practice for all outcomes
assessed in 32 of 62 studies. In the other studies, decreasing level of performance
was associated with increasing experience for some outcomes but not for others
(13 of 62 studies), no association was observed for 13 of 62 studies, mixed results
were obtained for 3 of 62 studies, and an increasing level of performance with
increasing years in practice for all outcomes was obtained in 1 of 62 studies.
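The study counts in the preceding paragraph are easier to compare as shares of the 62 studies. The short tally below is only an illustrative sketch: the counts are taken from the paragraph, while the category labels and variable names are paraphrases introduced here rather than terms from the review itself.

```python
# Study counts as reported in the paragraph above (review of 62 studies).
# The category labels are paraphrases introduced for this sketch.
counts = {
    "decline for all outcomes": 32,
    "decline for some outcomes": 13,
    "no association": 13,
    "mixed results": 3,
    "improvement for all outcomes": 1,
}

total = sum(counts.values())


def share(n, denominator):
    """Return n as a percentage of denominator, rounded to one decimal."""
    return round(100 * n / denominator, 1)


shares = {label: share(n, total) for label, n in counts.items()}

# Studies reporting a decline for at least some of the outcomes assessed.
any_decline = counts["decline for all outcomes"] + counts["decline for some outcomes"]
any_decline_share = share(any_decline, total)

for label, pct in shares.items():
    print(f"{label}: {counts[label]} of {total} ({pct}%)")
print(f"decline for at least some outcomes: {any_decline} of {total} ({any_decline_share}%)")
```

Read this way, the categories account for all 62 studies, and those reporting a decline for at least some outcomes make up a clear majority.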
Similarly, in routine clinical practice in the mental health fields, professionals
with extensive clinical experience are typically no more accurate than
other clinicians. For example, in one study,17 different participants (eg, marital
therapists, undergraduates) viewed videotaped conversations of 10 married
couples and predicted which couples were likely to divorce in the future.
Attitudes about marriage, but not amount of clinical experience, were related
to the validity of predictions.

The Cognitive Processes of Clinicians


It is likely that depression often goes undetected in primary care settings not
only because interviews are narrow and feedback is inadequate, but also
because the cognitive processes of clinicians are fallible. The primacy effect,
confirmatory hypothesis testing, cognitive heuristics, and causal reasoning are
described in this section.
One reason physicians may miss diagnoses of depression is that they make
judgments too quickly. The tendency to make judgments
quickly, sometimes after collecting relatively few data, is called the primacy
effect. It is characteristic of social judgments made in everyday situations as
well as of clinical judgments made in mental health settings.1,18 For example,
Gauron and Dickinson reported that psychiatrists who observed a videotaped
interview routinely formed diagnostic impressions in 30 to 60 seconds.19
Similarly, Kendell found that psychiatrists are often ready to make a diagnosis
for a patient within a few minutes.20 One can wonder if physicians in primary
care settings also tend to reach conclusions surprisingly quickly, and if this is a
reason for their missing diagnoses of depression.
Another reason depression may go undetected is that physicians may
rely on confirmatory hypothesis testing. Confirmatory hypothesis testing refers
to a tendency to seek, use, and remember information that is likely to confirm,
but not refute, a hypothesis. Research on clinical judgment indicates that
mental health professionals tend to seek and remember information that will
support a hypothesis and this leads them to not consider alternative hypotheses.
For example, in an especially well-designed study,21 psychology graduate
students watched a videotape of an initial psychotherapy session. They listed
questions they would like to ask the client portrayed in the videotape, and they
described their reasons for wanting to ask the client these questions. An
independent panel of psychologists coded each question as being likely to
elicit information that could confirm or disconfirm their hypothesis. The style
of hypothesis testing was confirmatory 64% of the time, neutral 21% of the
time, and disconfirmatory 15% of the time. These results, along with results
from other studies, provide insight into why clinicians do not routinely consider
alternative hypotheses.
Cognitive heuristics are simple rules that describe how judgments are made.
Made famous by Daniel Kahneman and Amos Tversky, cognitive heuristics
describe cognitive processes that allow us to efficiently process vast amounts
of information.22 However, these same cognitive processes also cause us to
sometimes make characteristic types of mistakes. Cognitive heuristics include
the affect, representativeness, and availability heuristics.
The affect heuristic refers to the fact that people often make judgments and
decisions based, in part, on their feelings. “Snap judgments” and judgments
based on “gut instinct” or intuition are often described by the affect heuristic.
Kahneman believes that the formulation of the affect heuristic is “probably the
most important development in the study of judgment heuristics in the past few
decades.”23, p. 703 But how does the affect heuristic relate to the detection of
depressive disorders in primary care settings? For whatever reasons, in many
cases, physicians’ reliance on affect and intuition does not allow them to detect
depression in these settings.
The representativeness heuristic is said to be descriptive of a clinician’s
cognitive processes when a judgment is made by deciding if a patient is
representative of a category.24 For example, when a screening instrument
indicates that a patient may be depressed and physicians must decide if
treatment for depression is required, the physicians may compare the patient
to (a) patients they have worked with who have been clinically depressed,
(b) their concept of the “typical” person with clinically significant depression,
or (c) a theoretical standard that serves to define clinically significant depression.
The representativeness heuristic is often descriptive of how judgments are
made in everyday life,25 and it is even descriptive of how many mental health
professionals make diagnoses.26 Since the representativeness heuristic is often
descriptive of how people make judgments, it is likely to also be descriptive of
physicians in primary care settings. If they are not comparing patients to
appropriate exemplars, stereotypes, or prototypes, then this may explain why
they are having difficulty with this task.
The third heuristic, the availability heuristic, is descriptive of memory when
clinicians are influenced by the ease with which events or different patients can
be remembered. For example, the ease with which information is remembered
can be related to its recency or its vividness. The point to be understood here is
that memory is fallible. We are unable to remember all of the patients we have
seen. Because memory is selective, cognitive efficiency is enormously
enhanced, but learning from experience becomes difficult.
One more feature of the cognitive processes of clinicians will be described.
A major finding on clinical judgment in recent years is that causal reasoning
underlies the manner in which mental health professionals make many different
types of judgments, including treatment decisions, predictions, and
diagnoses.27,28
With regard to treatment decisions, Witteman and Koele addressed the
following questions: “What explains which treatment is proposed to a
(depressed) patient? Is it the patient characteristics, such as her or his specific
symptoms, social context, and seriousness of the disorder, or is it the theoretical
background of the proposing psychotherapist?”29, p. 100 For a group of 56
therapists, treatment plans were highly variable, and Witteman and Koele
concluded, “The best explanations of the treatment proposals seemed to be the
therapist’s theory-inspired interpretations of the patient complaints.”29, p. 100
Causal reasoning also underlies how mental health professionals make
predictions. In one study, clinicians predicted whether patients would
become violent in the next 6 months.30 Ratings were made by mental health
professionals working in a psychiatric emergency room. Mulvey and Lidz
observed:

Clinicians did not appear to be making simple “yes” or “no” judgments of
dangerousness. Rather, they seemed to be making contextualized judgments
regarding future violence. Instead of stating whether they thought someone
was highly likely or unlikely to be involved in violence, the clinicians instead
gave what we called “conditional judgments” regarding future violence. . . . In
other words, they saw the violence as dependent upon certain conditions in the
person’s life.30, p. S108

Thus, clinicians will frequently make predictions by formulating case
conceptualizations.
Finally, when clinicians make diagnoses, they are influenced not only by
diagnostic criteria but also by their implicit causal theories.27,31 Although
clinicians using DSM are supposed to weigh each criterion equally, they
weigh criteria more heavily when the criteria describe symptoms
and behaviors that are part of their implicit causal model for a disorder.27
Similarly, mental health professionals’ implicit theories influenced
their memories of their clients’ mental status. Causally central symptoms were
recalled more often than causally peripheral symptoms and isolated symptoms.
In addition, false memories of a patient having symptoms the patient did not
really have were most likely to occur for symptoms that were causally central
to clinicians’ theories of different disorders.
The finding that causal reasoning underlies different types of clinical judgments
is important for helping us understand the actions of physicians in
primary care settings. To understand the etiology and course of a patient’s
physical complaint, physicians should understand the effect of depression. In
other words, for some patients, vague physical complaints and complaints of
fatigue and aches and pains are highly correlated with depression and anxiety.
To the extent that this is recognized by physicians, they will become more
adept at detecting depression. Thus, to some degree, to bring about change in
primary care settings, we must be concerned with the implicit causal theories of
physicians.

3. The Limits of Screening


The use of screening questionnaires can help physicians overcome some
problems but not others. Screening questionnaires can compensate for
interviews that are not comprehensive, and they can help physicians overcome
some cognitive processes that are counterproductive, such as diagnostic
overshadowing and confirmatory hypothesis testing. In particular,
screening questionnaires will prompt physicians to consider alternative
hypotheses; that is, results from a screening questionnaire can lead a
physician to consider whether a patient is depressed. Otherwise, the physician
may not even consider the hypothesis that a particular patient has a
mood disorder.
Given everything we know about clinical judgment, it is somewhat surprising
that the use of screening questionnaires has not been related to
improved clinical outcomes. A number of reasons can be given for why this
is the case. Two reasons will be described here.
First, some patients overreport symptoms while other patients underreport
them. This can occur if a patient misunderstands an item or if the patient wants
to create an impression of being healthy or of being impaired. To the extent that
symptoms are overreported or underreported on screening instruments, we
should not expect better clinical outcomes.
Second, even with the use of screening questionnaires, physicians must still
rely on clinical judgment. Thus, if a patient tests positive for depression on a
screening instrument, physicians must rely on their clinical judgment to determine
whether the patient’s responses should be viewed as indicating a need for
treatment or as a false positive. If someone is clinically depressed, physicians
will need to determine if he or she may have a bipolar disorder (and should not
be treated with an antidepressant). They must also determine if the patient is at
serious risk for suicide. If physicians are not making the right judgments when
a patient tests positive (eg, making a referral to a mental health professional,
providing treatment for depression, making a differential diagnosis of bipolar
disorder), then the use of screening questionnaires will not lead to improved
clinical outcomes. This is a challenging task for physicians, in part because
they will not receive feedback on the validity of their judgments or the utility of
their decision making and in part because they are unlikely to have specialized
training in mental health diagnosis and treatment. It is also a challenging task
because when patients complete questionnaires inquiring about mental health
symptoms, false positives are common, usually because patients (and everyone
else) will sometimes interpret items in an idiosyncratic manner.32
In conclusion, we are faced with a dilemma. Clinical judgment is fallible,
and the use of screening questionnaires has not been related to improved
clinical outcomes. However, the use of screening tools should help to improve
clinical judgment, and, much of the time, an optimal strategy will be to conduct
screening and then rely on clinical judgment. Although a large body of research
describes errors and mistakes in clinical judgment, such judgment can still be of
considerable value, if only to review responses on a screening questionnaire with a
patient so as to better understand how the patient interpreted the items. In
addition, it may be that screening assists diagnosis by underconfident
clinicians but is unhelpful for those already skilled in making the
diagnosis in question.

References
1. Garb HN. Studying the clinician: judgment research and psychological assessment.
Washington, DC: American Psychological Association, 1998.
2. Garb HN. Clinical judgment and decision making. Ann Rev Clin Psychol.
2005;1:67–89.
3. Nichols GA, Brown JB. Following depression in primary care—Do family practice
physicians ask about depression at different rates than internal medicine physicians?
Arch Fam Med. 2000;9:478–482.
4. Miller PR, Dasher R, Collins R, et al. Inpatient diagnostic assessments: 1. Accuracy of
structured vs. unstructured interviews. Psychiatry Res. 2001;105:255–264.
5. Malone KM, Szanto K, Corbitt EM, et al. Clinical assessment versus research methods
in the assessment of suicidal behavior. Am J Psychiatry. 1995;152:1601–1607.
6. Cascardi M, Mueser KT, DeGiralomo J, et al. Physical aggression against psychiatric
inpatients by family members and partners. Psychiatr Serv. 1996;47:531–533.
7. Carr AC, Ghosh A, Ancill RJ. Can a computer take a psychiatric history? Psychol Med.
1983;13:151–158.
8. Jopp DA, Keys CB. Diagnostic overshadowing reviewed and reconsidered. Am J Ment
Retard. 2001;106:416–433.
9. Reiss S, Szyszko J. Diagnostic overshadowing and professional experience with
mentally retarded persons. Am J Mental Defic. 1983;87:396–402.
10. Mason J, Scior K. Diagnostic overshadowing amongst clinicians working with people
with intellectual disabilities in the UK. J Appl Res Int Dis. 2004;17:85–90.
11. Spengler PM, Strohmer DC, Prout HT. Testing the robustness of the overshadowing
bias. Am J Mental Retard. 1990;95:204–214.
12. Drake RE, Osher FC, Noordsy DL, et al. Diagnosis of alcohol use disorders in
schizophrenia. Schizophr Bull. 1990;16:57–67.
13. Tenney NH, Schotte CKW, Denys DAJP, et al. Assessment of DSM-IV personality
disorders in obsessive-compulsive disorder: Comparison of clinical diagnosis, self-report
questionnaire, and semi-structured interview. J Personal Disord. 2003;17:550–561.
14. Garb HN. Clinical judgment, clinical training, and professional experience. Psychol
Bull. 1989;105:387–396.
15. Garb HN, Schramke CJ. Judgment research and neuropsychological assessment: a
narrative review and meta-analyses. Psychol Bull. 1996;120:140–153.
16. Choudhry NK, Fletcher RH, Soumerai SB. Systematic review: The relationship
between clinical experience and quality of health care. Ann Intern Med.
2005;142:260–273.
17. Ebling R, Levenson RW. Who are the marital experts? J Marriage Fam.
2003;65:130–142.
18. Ambady N, Rosenthal R. Thin slices of expressive behavior as predictors of
interpersonal consequences: A meta-analysis. Psychol Bull. 1992;111:256–274.
19. Gauron EF, Dickinson JK. Diagnostic decision making in psychiatry. Arch Gen
Psychiatry. 1966;14:225–232.
20. Kendell RE. Psychiatric diagnoses: A study of how they are made. Br J Psychiatry.
1973;122:437–445.
21. Haverkamp BE. Confirmatory bias in hypothesis testing for client-identified and
counselor self-generated hypotheses. J Couns Psychol. 1993;40:303–315.
22. Tversky A, Kahneman D. Judgments under uncertainty: heuristics and biases. Science.
1974;185:1124–1131.
23. Kahneman D. A perspective on judgment and choice: Mapping bounded rationality. Am
Psychol. 2003;58:697–720.
24. Kahneman D, Slovic P, Tversky A, eds. Judgment under uncertainty: Heuristics and
biases. New York: Cambridge University Press, 1982.
25. Gilovich T, Griffin D, Kahneman D, eds. Heuristics and biases. New York: Cambridge
University Press, 2002.
26. Garb HN. The representativeness and past-behavior heuristics in clinical judgment.
Prof Psychol Res Pr. 1996;27:272–277.
27. Kim NS, Ahn W. Clinical psychologists’ theory-based representations of mental
disorders predict their diagnostic reasoning and memory. J Exp Psychol Gen.
2002;131:451–476.
28. Wakefield JC, Kirk SA, Pottick KJ, et al. Disorder attribution and clinical judgment in
the assessment of adolescent antisocial behavior. Soc Work Res. 1999;23:227–238.
29. Witteman C, Koele P. Explaining treatment decisions. Psychother Res. 1999;9:100–114.
30. Mulvey EP, Lidz CW. Clinical prediction of violence as a conditional judgment. Soc
Psychiatry Psychiatr Epidemiol. 1998;33:S107–S113.
31. Pottick KJ, Kirk SA, Hsieh DK, et al. Judging mental disorder in youths: Effects of
client, clinician, and contextual differences. J Consult Clin Psychol. 2007;75:1–8.
32. Nease DE, Klinkman MS, Aikens JE. Depression case findings in primary care:
A method for the mandates. Int J Psychiatry Med. 2006;36:141–151.
7
IMPLEMENTING SCREENING AS PART OF
ENHANCED CARE: SCREENING ALONE IS NOT
ENOUGH

Simon Gilbody and Dan Beck

1. The Case for Screening
2. Screening and Enhanced Care for Depression
3. New and Additional Evidence Relating to Enhanced Care
4. Is Screening a Necessary Intervention to Improve the Quality and Outcome
of Care?
5. To Screen or Not to Screen?

Context
There are conflicting conclusions and policy recommendations relating to the
effects of screening on the outcome of depression, but what does the latest
evidence suggest? Based on the best available information to date, it emerges
that screening alone is not a sufficient intervention to improve the quality and
outcomes of care for depression. What is less clear is whether screening is a
necessary condition for enhanced and improved quality of care and, given
additional components, to what extent screening programs can potentially
improve quality of routine care.

1. The Case for Screening


Depression is the most common mental health problem and is associated with
decrements in functioning and quality of life comparable to other chronic
physical diseases.1 The prevalence, chronicity, and burden of suffering are
such that the World Bank has predicted that depression will become the second
leading cause of global disability by 2020.2 The economic consequences of
depression are also profound, with the healthcare costs, welfare costs, and
losses to productivity amounting to £9 billion ($20 billion) in the United
Kingdom3 and $53 billion in the United States.4
Depression is most commonly encountered in primary care and in hospital
settings, yet it often goes unrecognized by healthcare professionals.5–7 This has
led to calls to implement screening programs to aid in the detection and
management of this problem.8,9 The rationale and evidence base to support
screening for depression is the focus of the present book and is discussed
extensively in other chapters (see Chapters 2, 4, and 9). In the United States,
screening has shifted from being an intervention that was not initially supported
in national policy recommendations10 to being one that is regarded as
being of proven effectiveness.11 An evolution in thinking has occurred that
places screening at the center of mental health policy and practice, and is based
upon the general assumption that screening will logically lead to improvements
in the quality and outcome of care. Some have termed this the screening–
detection–treatment–improvement paradigm.12,13 Recently screening for
common mental health problems in the United States has become the cornerstone
of the president’s agenda to improve the mental health of the U.S.
population.14

Arguments For and Against Screening


Screening has a long and honorable tradition in helping to improve the health
and well-being of populations and individuals.15 However, screening is a
“special case” in the armory of healthcare interventions, since testing and
treatment may be offered to those who do not necessarily know they have a
condition or do not specifically ask for help for that problem.16 Screening
programs have also been implemented in the past without due consideration of
their effectiveness, their ethical and clinical implications, and their impact on
finite healthcare resources.17 Consequently, clear criteria have evolved that
must be satisfied before screening programs are adopted (see Chapter 5).18 In
the case of depression, screening is just one of a range of possible interventions
that might be offered to improve care for depression at a population level,19 and
the implementation of screening programs should be supported by sound
clinical and economic evidence.20 The relative merits of screening for depression
more generally have been reviewed by Gilbody and colleagues20 and by
Palmer and Coyne.13 Gilbody and colleagues used a set of analytic principles
laid down by the World Health Organization18 and adopted by the U.K.
National Screening Committee.21 In their analysis, they argued that the relative
merits of screening programs are sometimes overstated, and that convincing
evidence that screening substantially influences the outcomes of depression is
difficult to find. The principal concerns that have been highlighted are that
screening for depression uncovers a substantial body of undetected psychological
need that is not currently well met within existing healthcare systems.
Much of this represents short-term and self-limiting distress, the natural history
of which is not readily influenced by active intervention.22 In addition, the
common belief that unrecognized depression is as responsive to the evidence-
supported interventions (antidepressants and brief psychotherapy) currently
used for already recognized depression is not necessarily true: unrecognized
depression may be more difficult to treat because it tends to be mild or atypical.
Most importantly of all, they highlighted the relative lack of evidence in the
form of randomized controlled trials to show that the introduction of screening
programs for depression makes any substantial difference to the outcomes of
depression itself.23 There is also a dearth of economic data to inform this
population-level policy intervention. It is this area of supportive epidemiologic
and economic evidence that has produced the greatest amount of debate and
controversy, which we will review in more detail within this chapter.
Two strategies have been scrutinized and variously rejected10,24,25 or
advocated.11,26,27 The first is the use of screening as a “stand-alone” quality
improvement strategy. The second is the use of screening within a more
general enhancement of the care for depression in non-specialist settings. Let
us examine each of these strategies in turn to establish whether screening is a
sufficient or necessary condition in improving the quality and outcome of care
for depression.

Is Screening a Sufficient Intervention to Improve the Quality and Outcome of Care?
The effectiveness of screening for depression was first addressed with reference
to the research literature in the 1990s. The first evidence synthesis was
conducted by the U.S. Agency for Health Care Policy and Research, which
looked at the evidence to support various aspects of the management of
depression in primary care settings, including screening.28 This review examined
the totality of research and came down firmly against screening. On the
basis of a review of the literature published in May 1993, the U.S. Preventive
Services Task Force (USPSTF) concluded that there was “sufficient evidence
to exclude screening for depression in the primary care setting” (a “grade D”
recommendation). This research highlighted that screening instruments did not
generally improve the detection rate or management of depression. The evidence
they reviewed was primarily related to the use of screening programs as
a “stand-alone” measure.
A similar conclusion was found in a 2001 evidence review24 also published
under the auspices of the Cochrane Collaboration (first in 2005 [reference 23]
and updated again in 2008 [references 29 and 30]). The most recent version of
this review of “stand-alone” depression screening, which now includes 16
primary trials of the effectiveness of screening strategies (more than 5,000
patients), concluded, “There is substantial
evidence that routinely administered case finding/screening questionnaires
for depression have minimal impact on the detection, management or outcome
of depression by clinicians.”
The most important finding from the Cochrane reviews29,30 has been the
consistent demonstration that screening had minimal impact on the actual
outcomes of depression when screened populations were followed up over
time. This review concurs with the first USPSTF review,11 and an overall
summary diagram of the lack of effect of simple screening strategies based on
the Cochrane review is shown in Figure 7.1.
A review conducted at around the same time as the first Cochrane review,
to provide updated guidance to the USPSTF,11 examined a similar body of
research and found a similar lack of effect in relation to the impact of stand-alone
screening strategies. However, this review was altogether more positive
about screening (Textbox 7.1). The reasons for this shift in recommendation
by the USPSTF deserve examination in some detail, and relate to the
additional consideration of screening alongside “additional enhancements
of care.”

Figure 7.1. Summary of random effects meta-analysis of the effect of simple
screening/case-finding instruments on the outcome of depression at follow-up
(adapted from references 23, 29, and 30). Values are standardized mean
differences (SMD) in depression outcomes with 95% confidence intervals;
values below 0 favor screening and values above 0 favor control.

Study               SMD (95% CI)
Bergus 2005         -0.29 (-1.40, 0.82)
Callaghan 1994      -0.05 (-0.97, 0.86)
Johnstone 1976      -0.77 (-1.54, 0.00)
Lewis GHQ 1996       0.10 (-0.09, 0.29)
Lewis PRQ 1996      -0.06 (-0.25, 0.13)
Whooley 2000        -0.16 (-0.72, 0.39)
Williams 1999       -0.22 (-0.81, 0.37)
Overall             -0.03 (-0.16, 0.10)
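To make the pooled estimate in Figure 7.1 more concrete, the sketch below re-pools the seven study-level SMDs with a random-effects model. It is an illustration under stated assumptions, not the method used in the Cochrane reviews: the figure says only "random effects," so the DerSimonian-Laird estimator is an assumption, and each standard error is backed out of the rounded 95% confidence limits, which is approximate. The function and variable names are hypothetical.

```python
import math

# (study, SMD, lower 95% limit, upper 95% limit), transcribed from Figure 7.1.
STUDIES = [
    ("Bergus 2005", -0.29, -1.40, 0.82),
    ("Callaghan 1994", -0.05, -0.97, 0.86),
    ("Johnstone 1976", -0.77, -1.54, 0.00),
    ("Lewis GHQ 1996", 0.10, -0.09, 0.29),
    ("Lewis PRQ 1996", -0.06, -0.25, 0.13),
    ("Whooley 2000", -0.16, -0.72, 0.39),
    ("Williams 1999", -0.22, -0.81, 0.37),
]

Z95 = 1.96  # normal quantile for a two-sided 95% interval


def pool_random_effects(studies):
    """Pool SMDs with the DerSimonian-Laird estimator (an assumed choice)."""
    effects = [smd for _, smd, _, _ in studies]
    # Approximate each variance from the width of the reported 95% CI.
    variances = [((hi - lo) / (2 * Z95)) ** 2 for _, _, lo, hi in studies]

    # Fixed-effect (inverse-variance) estimate, needed for the Q statistic.
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Between-study variance tau^2, truncated at zero.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)

    # Random-effects weights add tau^2 to every within-study variance.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, pooled - Z95 * se, pooled + Z95 * se


pooled, ci_low, ci_high = pool_random_effects(STUDIES)
print(f"Pooled SMD {pooled:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The two Lewis comparisons have by far the narrowest intervals and so dominate the weighting, which helps explain why the overall estimate sits so close to zero even though several smaller studies point toward a benefit of screening.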
Textbox 7.1. Current Policy Recommendations on Screening for Depression

U.K. National Institute of Clinical Excellence31
“Screening should be undertaken in primary care and general hospital
settings for depression in high-risk groups, for example, those with a past
history of depression, significant physical illnesses causing disability, or
other mental health problems such as dementia.”

Review of reviews to inform practice and policy in Australia and New Zealand32
“Brief self-report instruments have acceptable psychometric properties and are
practical for use in general practice settings. Screening increases the recognition
and diagnosis of depression and, when integrated with a commitment to provide
coordinated and prompt follow up of diagnosis and treatment, clinical outcomes
are improved. Although controversial, the evidence is now in favour of the
appropriate use of screening tools in primary care.”

U.S. Preventive Services Task Force11
“The USPSTF found good evidence that screening improves the accurate
identification of depressed patients in primary care settings and that treatment
of depressed adults identified in primary care settings decreases clinical
morbidity. Trials that have directly evaluated the effect of screening on
clinical outcomes have shown mixed results. Small benefits have been
observed in studies that simply feed back screening results to clinicians.
Larger benefits have been observed in studies in which the communication
of screening results is coordinated with effective follow-up and treatment.
The USPSTF concluded the benefits of screening are likely to outweigh any
potential harms.”
Strength of recommendation: B (“there is at least fair evidence that the
intervention improves important health outcomes and that the benefits
outweigh the harms”)

Canadian Task Force on Preventive Health Care27
“The CTFPHC concludes that there is fair evidence to recommend screening
adults for depression in primary care settings since screening improves health
outcomes when linked to effective follow-up and treatment.”
Strength of recommendation: B (“there is fair evidence to recommend the
clinical preventive action”)
“The CTFPHC concludes that there is insufficient evidence to recommend
for or against screening adults for depression in primary care settings where
effective follow-up and treatment are not available.”
Strength of recommendation: I (“insufficient evidence [in quantity and/or
quality] to make a recommendation, however other factors may influence
decision-making”)

2. Screening and Enhanced Care for Depression


The major shift between recommendations produced in 1996 and 2003 turns
upon the change in the scope of the evidence review and the inclusion criteria
that were set.25 In contrast to earlier reviews, the USPSTF in their updated
report reviewed both stand-alone screening programs and those embedded
within enhancements of care. An example of such an enhanced care study was
that conducted by Wells and colleagues (the Partners in Care study),33 which
provided practice-level enhancements in the quality of care for depression,
including structured psychotherapy or medication management, clinician edu-
cation and consultation/liaison, treatment guidelines, and structured follow-
up. Recruitment to this trial was by screening and, as such, was considered by
the USPSTF as evidence to support the effectiveness of screening in practice.
This study showed strongly positive results on the outcomes of depression and
was included in a summary meta-analysis (accounting for 33% of the overall
weight of evidence). On the basis of this evidence, the USPSTF concluded,
‘‘benefits have been observed in studies in which the communication of
screening results is coordinated with effective follow-up and treatment.’’
A subsequent 2005 review published by the Canadian Task Force on
Preventive Health Care (CTFPHC)27 made a nearly identical recommendation,
highlighting the ineffectiveness of stand-alone screening and the effectiveness
of screening plus enhanced care. A similar recommendation was made in
guidance from the U.K. National Institute for Clinical Excellence (NICE) (see
Textbox 7.1).31

3. New and Additional Evidence Relating to Enhanced Care


The specific recommendations made by the USPSTF, CTFPHC and NICE
relating to screening plus enhanced care fit into a much wider body of research
relating to organizational enhancements to the process of care for depression.34
The enhancement of primary care for depression is an active area of research,
and a substantial body of research evidence now exists to show that this is an
effective intervention.35 The most recent review of this topic has included
pooled data from over 30 randomized trials, based on over 12,000 patients
with depression, and has shown that enhanced care is effective in the short and
medium term.35 The finding that enhanced or collaborative care is effective is
now a consistent one that has been supported in several independently con-
ducted meta-analyses (see Bower and Gilbody36 for an overview of reviews in
this area). In the aforementioned Partners in Care study, the benefits of an
enhanced care intervention have persisted for up to 5 years.37 However, while
the effectiveness of enhanced care is now beyond reasonable doubt, the
USPSTF review included only four38–41 of the 36 trials of enhanced care that
were summarized in the most comprehensive review to date. From
these four studies, the U.S. and Canadian reports drew quite specific conclusions
about the effectiveness of screening (the topic of their review) rather than about
the effectiveness of enhanced and collaborative care in general.25 Many studies
of enhanced care do not use screening as an entry criterion or component of
quality improvement, but these were not reviewed by the USPSTF. This is not
just of academic interest, since it is clear that many healthcare systems have
taken the positive endorsement of screening within enhancements of care as an
endorsement of screening per se. In the United Kingdom, for example, financial
inducements have been introduced to encourage primary care physicians to
screen for depression, without any requirement that further enhancements in
the quality of care are introduced.20 Clearly, the specific question about the
relative contribution of screening to the effectiveness of quality improvement
strategies is important from a policy and practice perspective. To what extent is
screening the critical component in determining the quality of depression care?

4. Is Screening a Necessary Intervention to Improve the Quality and Outcome of Care?

What remains unclear from the preceding discussion and the work of the
USPSTF is whether screening is a necessary component or condition for
effective enhanced care, and whether enhancements of care without screening
are in themselves ineffective. Recent research has emerged to answer this
question, which was not effectively addressed by the USPSTF11 or by the
subsequent CTFPHC review.27
The overall effectiveness of enhanced care for depression has most recently
been reviewed by Gilbody and colleagues,35 who found that collaborative care
strategies were effective far beyond conventional levels of significance in
improving depression outcomes in the short and medium term. This dataset
provides a more comprehensive body of research within which to begin to
examine whether screening is a necessary ingredient of effective enhanced care
for depression.
Among enhanced care studies as a whole, the authors found a moderate
pooled standardized effect size of 0.25 for enhanced care compared to usual
care (95% confidence interval 0.18 to 0.32). They also found that there was
significant between-study variation in the magnitude of effect size (that is,
heterogeneity). When conducting a meta-analysis, the most rigorous approach
to heterogeneity is to seek to explain or explore the causes of this heteroge-
neity.42 This technique can provide useful insights into mechanisms of effect
and variations in treatment response according to the population under study or
the intervention under evaluation. This information is often of interest to
clinicians and policymakers charged with implementing or interpreting
research evidence. One technique that can be used is regression modeling,
whereby the relationship between study-level design variables and a dependent
variable (study effect size) is examined (this is termed meta-regression42,43).
This technique was applied to the dataset of enhanced care for depression by
Bower and colleagues to identify some of the ‘‘active ingredients’’ in enhanced
care for depression.44
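In symbols, the two ideas in this paragraph (heterogeneity and meta-regression) can be sketched as follows. The notation (y_i for a study's effect size, v_i for its within-study variance, x_i for a study-level covariate, tau^2 for the between-study variance) is introduced here for illustration and is not taken from the cited papers. It is the usual random-effects formulation, not necessarily the exact model fitted by Bower and colleagues.

```latex
% Heterogeneity among k study effect sizes y_i with within-study variances v_i
Q = \sum_{i=1}^{k} w_i \,(y_i - \bar{y})^2, \qquad
w_i = \frac{1}{v_i}, \qquad
\bar{y} = \frac{\sum_{i=1}^{k} w_i y_i}{\sum_{i=1}^{k} w_i}, \qquad
I^2 = \max\!\left(0,\ \frac{Q - (k-1)}{Q}\right)

% Meta-regression on one study-level covariate x_i
% (for example, x_i = 1 if patients were recruited by screening, 0 otherwise)
y_i = \beta_0 + \beta_1 x_i + u_i + \varepsilon_i, \qquad
u_i \sim N(0, \tau^2), \qquad
\varepsilon_i \sim N(0, v_i)
```

Read this way, beta_1 is the difference in mean effect size between studies with and without the design feature, and tau^2 is the between-study variance that the covariate leaves unexplained.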
Among 34 studies, there was substantial variation in the content and inten-
sity of collaborative care. Some studies, such as the previously discussed
Partners in Care study,39 provided relatively intensive packages of enhanced
care, including face-to-face clinician education, computerized decision sup-
port, individualized treatment algorithms, the active support of a nurse case
manager, and regular consultation/liaison with a specialist mental health
clinician (psychologist or psychiatrist). This study39 accounted for 30% to 47% of
the weighted information in the meta-analyses produced on behalf of the
USPSTF.11 In contrast, less intensive packages of care were also included in
the collaborative care review by Gilbody and colleagues and involved simple
telephone follow-up by practice nurses.45 Bower and colleagues44 used meta-
regression to examine the relative contributions of various aspects of the
content of enhanced care interventions in improving depression outcome
within the dataset of collaborative care studies. They specified and were able
to find sufficient study-level information on eight aspects of care and study
design, including the method of recruitment—whether by screening or by
clinician referral of already recognized depression. Stratification according
to this variable showed that the majority of studies used screening, but that 12
collaborative care studies did not.45–55 A stratified meta-analysis according to
this variable is shown in Figure 7.2, and the methods of patient recruitment (by
screening or by other means) are detailed in Table 7.1.
From this stratified analysis, it is evident that the majority of studies were
positive, and that screening studies showed the larger pooled effect
size (standardized mean difference for screening studies = 0.30, 95% confidence
interval 0.21 to 0.38), while non-screening studies were still significantly
positive, although the magnitude of effect was less pronounced (standardized
mean difference for non-screening studies = 0.15, 95% confidence interval 0.03
to 0.26). When the difference between these two effect sizes was tested using
logistic meta-regression,56 this trend was positive but nonsignificant (difference
in standardized mean differences = 0.15, 95% confidence interval –0.03 to 0.29,
p = 0.09).
Of particular interest from the point of view of the present chapter was the fact
that several additional study-level variables were also related to the magnitude of
effect size in collaborative care, and that three of these predictive covariates were
either strongly significant (p < 0.05) or more significant than screening
(p < 0.1). These were better antidepressant concordance, having a trained case
manager, and regular and planned supervision of case managers.
Study    Standardized depression outcome (95% CI)
Referred by clinician
Wilkinson 1993 –0.29 (–0.79, 0.22)
Mann 1998 –0.08 (–0.29, 0.13)
Peveler 1999 0.21 (–0.11, 0.54)
Akerblad 2003 0.26 ( 0.07, 0.45)
Brook 2003 0.00 (–0.34, 0.34)
Katon 1995 0.19 (–0.12, 0.49)
Katon 1996 0.49 ( 0.13, 0.86)
Finley 2000 –0.30 (–0.83, 0.24)
Hunkeler 2000 0.28 ( 0.03, 0.53)
Datto 2003 0.42 (–0.14, 0.98)
Dietrich 2004 0.16 (–0.08, 0.39)
Capoccia 2004 0.17 (–0.38, 0.72)
Subtotal 0.15 ( 0.03, 0.26)

identified by screening
Blanchard 1995 0.43 (–0.01, 0.87)
Araya 2003 1.13 ( 0.79, 1.47)
Bosmans 2006 0.07 (–0.28, 0.42)
Callahan 1994 0.05 (–0.48, 0.58)
Katon 1999 0.31 ( 0.01, 0.61)
Coleman 1999 –0.14 (–0.53, 0.25)
Wells-medication 2000 0.22 (–0.02, 0.46)
Simon 2000 0.30 ( 0.07, 0.52)
Katzelnick 2000 0.43 ( 0.22, 0.63)
Wells-therapy 2000 0.22 (–0.01, 0.45)
Unutzer 2001 0.40 ( 0.31, 0.50)
Katon 2001 0.11 (–0.09, 0.32)
Rost 2001b 0.29 (–0.05, 0.62)
Rost 2001a 0.20 (–0.10, 0.50)
Oslin 2003 0.61 ( 0.08, 1.13)
Swindle 2003 0.18 (–0.30, 0.66)
Rickles 2004 0.25 (–0.37, 0.87)
Adler 2004 0.19 (–0.01, 0.39)
Bruce 2004 0.30 ( 0.07, 0.52)
Simon 2004b 0.33 ( 0.05, 0.62)
Katon 2004 0.24 (–0.03, 0.51)
Jarjoura 2004 0.41 ( 0.00, 0.82)
Simon 2004a 0.18 (–0.11, 0.46)
Wang 2007 0.82 (–0.06, 1.70)
Subtotal 0.30 ( 0.21, 0.38)

Overall 0.25 ( 0.18, 0.32)

Figure 7.2. Enhanced care for depression: a random effects meta-analysis of 36 studies, comparing depression outcomes at 6 months in studies that
use screening to recruit patients, versus those where clinicians recruit patients with recognized depression. (Re-analysis of data from Bower P, Gilbody
SM, Richards D, et al. Collaborative care for depression: making sense of complex interventions through systematic review and meta-regression. Br J
Psychiatry. 2006;189:484–493.)
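The stratum subtotals in Figure 7.2 can be approximately re-derived from the study-level estimates. The sketch below is illustrative only: it assumes that each standard error can be back-calculated from the published 95% limits using a normal quantile of 1.96, and it uses the DerSimonian-Laird estimator of between-study variance. The figure legend says only that a random effects model was used, so the estimator, the function name `dl_pool`, and the variable names are assumptions made here. Because the published limits are rounded, the output should land near the reported subtotals but need not match them exactly.

```python
import math

# (study, SMD, lower 95% limit, upper 95% limit), transcribed from Figure 7.2.
# Strata follow the figure's own grouping.
REFERRED = [
    ("Wilkinson 1993", -0.29, -0.79, 0.22),
    ("Mann 1998", -0.08, -0.29, 0.13),
    ("Peveler 1999", 0.21, -0.11, 0.54),
    ("Akerblad 2003", 0.26, 0.07, 0.45),
    ("Brook 2003", 0.00, -0.34, 0.34),
    ("Katon 1995", 0.19, -0.12, 0.49),
    ("Katon 1996", 0.49, 0.13, 0.86),
    ("Finley 2000", -0.30, -0.83, 0.24),
    ("Hunkeler 2000", 0.28, 0.03, 0.53),
    ("Datto 2003", 0.42, -0.14, 0.98),
    ("Dietrich 2004", 0.16, -0.08, 0.39),
    ("Capoccia 2004", 0.17, -0.38, 0.72),
]
SCREENED = [
    ("Blanchard 1995", 0.43, -0.01, 0.87),
    ("Araya 2003", 1.13, 0.79, 1.47),
    ("Bosmans 2006", 0.07, -0.28, 0.42),
    ("Callahan 1994", 0.05, -0.48, 0.58),
    ("Katon 1999", 0.31, 0.01, 0.61),
    ("Coleman 1999", -0.14, -0.53, 0.25),
    ("Wells-medication 2000", 0.22, -0.02, 0.46),
    ("Simon 2000", 0.30, 0.07, 0.52),
    ("Katzelnick 2000", 0.43, 0.22, 0.63),
    ("Wells-therapy 2000", 0.22, -0.01, 0.45),
    ("Unutzer 2001", 0.40, 0.31, 0.50),
    ("Katon 2001", 0.11, -0.09, 0.32),
    ("Rost 2001b", 0.29, -0.05, 0.62),
    ("Rost 2001a", 0.20, -0.10, 0.50),
    ("Oslin 2003", 0.61, 0.08, 1.13),
    ("Swindle 2003", 0.18, -0.30, 0.66),
    ("Rickles 2004", 0.25, -0.37, 0.87),
    ("Adler 2004", 0.19, -0.01, 0.39),
    ("Bruce 2004", 0.30, 0.07, 0.52),
    ("Simon 2004b", 0.33, 0.05, 0.62),
    ("Katon 2004", 0.24, -0.03, 0.51),
    ("Jarjoura 2004", 0.41, 0.00, 0.82),
    ("Simon 2004a", 0.18, -0.11, 0.46),
    ("Wang 2007", 0.82, -0.06, 1.70),
]

Z95 = 1.96  # assumed normal quantile behind the published 95% intervals


def dl_pool(studies):
    """Return (pooled SMD, lower, upper, tau^2) using DerSimonian-Laird."""
    y = [s[1] for s in studies]
    # Back-calculate each within-study variance from the interval width.
    v = [((s[3] - s[2]) / (2 * Z95)) ** 2 for s in studies]
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Cochran's Q and the moment estimate of between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c)
    # Random-effects weights add tau^2 to every within-study variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    est = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return est, est - Z95 * se, est + Z95 * se, tau2


for label, group in (("referred by clinician", REFERRED),
                     ("identified by screening", SCREENED)):
    est, lo, hi, tau2 = dl_pool(group)
    print(f"{label}: SMD {est:.2f} (95% CI {lo:.2f} to {hi:.2f}), tau^2 = {tau2:.3f}")
```

Pooling each stratum separately mirrors the layout of the figure. It does not by itself test whether the two strata differ, which is the question addressed by the meta-regression described in the text.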
Table 7.1. Study Details and Method of Patient Recruitment from Studies of Collaborative or Enhanced Care for Depression

Study Name | References | Setting | Sample Size | Patient Population | Recruitment Method
Adler 2004 | 62 | US | 533 | Adults with major depression or dysthymia (DSM-IV) | Screening of primary care attenders using the Primary Care Screener for Affective Disorders (PC-SAD)
Akerblad 2003 | 46 | Sweden | 1,031 | Adults with major depression and an indication for antidepressants | Physician referral, no screening
Araya 2003 | 63 | Chile | 240 | Women with major depression | Screening of primary care attenders using GHQ-12 (score 5 or more on two occasions)
Blanchard 1995 | 64 | UK | 96 | Elderly with depression warranting clinical intervention | Elderly nursing home residents screening positive with diagnostic depression scale (DPDS)
Brook 2003 | 47 | Netherlands | 147 | Adults with depressive complaints, prescribed new antidepressant | Physician referral, no screening
Bruce 2004 | 65 | US | 598 | Elderly with major depression, dysthymia, and minor depression | Elderly patients screening positive using the CES-D (score > 20) or responding positively to previous history of depression
Callahan 1994 | 66 | US | 175 | Elderly with newly diagnosed depression | Elderly patients screening positive using the CES-D (score > 20)
Capoccia 2004 | 48 | US | 74 | Adults with depression, prescribed a new antidepressant | Physician referral of new episode of depression, no screening
Coleman 1999 | 67 | US | 169 | Depressed frail elderly | Frail older adults who screened positive for a predictive index of hospitalization. Use of CES-D as a screening instrument integrated into chronic care clinics.
Datto 2003 | 49 | US | 61 | Adults with depressive symptoms | Physician referral of patients with depression, no screening
Dietrich 2004 | 68 | US | 405 | Adults with major depression and dysthymia (DSM-IV), starting/changing treatment | Physician referral of patients with depression, no screening as method of recruitment, but had to score SCL-20 > 0.5 at enrollment
Finley 1999 | 51 | US | 125 | Adults with current major depression, prescribed a new antidepressant | Physician referral of patients already prescribed antidepressants
Hunkeler 2000 | 52 | US | 302 | Adults with major depression or dysthymia, prescribed a new antidepressant | Physician referral of patients with a new diagnosis of depression, and prescribed antidepressant
Jarjoura 2004 | 69 | US | 121 | Adults with major depression not currently in treatment | Screening for inclusion using the PRIME-MD
Katon 1995 | 53 | US | 217 | Adults with depression, prescribed a new antidepressant | Physician referral of patients with definite or probable depression
Katon 1996 | 53 | US | 153 | Adults with depression, prescribed a new antidepressant | Physician referral of patients with definite or probable depression
Katon 1999 | 70 | US | 228 | Adults at high risk of persistent depression, recurrent depression, or dysthymia | Telephone screening using the SCID
Katon 2001 | 71 | US | 386 | Adults, prescribed a new antidepressant, at high risk of relapse | Telephone screening using the SCID
Katon 2004 | 72 | US | 329 | Adults with diabetes with depressive symptoms | Telephone screening using the PHQ-9 (score ≥ 10)
Katzelnick 2000 | 38 | US | 407 | Adults, high utilizers of services, with depressive symptoms | Two-stage telephone screening procedure with the SCID and Hamilton Depression Rating Scale
Mann 1998 | 54 | UK | 419 | Adults with depression | Primary care physician referral; patients currently with a diagnosis and in receipt of care for depression
Oslin 2003 | 73 | US | 97 | Adults with depression or dysthymia, at-risk drinking | Primary care screening with CES-D (score > 15)
Peveler 1999 | 45 | UK | 160 | Diagnosis of depression, prescribed a new antidepressant | Physician referral; patients with a new diagnosis of depression commencing antidepressant medication
Rickles 2005 | 74 | US | 63 | Prescribed a new antidepressant | Patients with a newly initiated prescription of antidepressant medication
Rost 2001 | 41 | US | 243 | Adults with major depression, prescribed a new antidepressant, recently treated | Two-stage screening procedure using WHO-CIDI administered by practice nurses
Rost 2002b | 41 | US | 189 | Adults with major depression, prescribed a new antidepressant, beginning new episode | Two-stage screening procedure using WHO-CIDI administered by practice nurses
Simon 2000 | 75 | US | 392 | Adults with depression, prescribed a new antidepressant | Patients identified from computerized records with a new diagnosis of depression and commencing antidepressant medication
Simon 2004a | 76 | US | 402 | Adults with depression, prescribed a new antidepressant | Patients identified from computerized records with a new diagnosis of depression and commencing antidepressant medication. No screening.
Simon 2004b | 76 | US | 393 | Adults with depression, prescribed a new antidepressant | Patients identified from computerized records with a new diagnosis of depression and commencing antidepressant medication. No screening.
Swindle 2003 | 77 | US | 268 | Adults with major depression, dysthymia, or partially remitted major depression | Primary care patients screening positive with the PRIME-MD
Unutzer 2001 | 78 | US | 1,801 | Elderly with major depression, dysthymia, or both | Patients screened face to face or by phone from primary care lists or attendance using CIDI
Wells 2000a | 39 | US | 867 | Adults with major depression or dysthymia | Consecutive primary care attenders screened using the CIDI
Wells 2000b | 39 | US | 932 | Adults with major depression or dysthymia | Consecutive primary care attenders screened using the CIDI
Whooley 2000 | 40 | US | 331 | Elderly with depressive symptoms | Consecutive elderly primary care attenders screened using the GDS (score ≥ 6)
Wilkinson 1993 | 55 | UK | 61 | Adults with depression, prescribed a new antidepressant | Physician referral of patients with already diagnosed depression

Adapted from Gilbody SM, House AO, Sheldon TA. Screening and case-finding instruments for depression: a Cochrane systematic review and exploration of heterogeneity.
CMAJ. 2008;178:1023–1024; and Gilbody S, Bower P, Fletcher J, et al. Collaborative care for depression: a cumulative meta-analysis and review of longer-term outcomes.
Arch Intern Med. 2006;166:2314–2321.

The review by Bower and colleagues44 provides a richer and more complete
dataset than the USPSTF review within which to examine the relative con-
tribution of screening to the effectiveness of enhanced care. However, there are
several limitations to their approach. The most important limitation is the fact
that, despite using randomized studies, the exploratory comparison within a
meta-regression is an observational one and is therefore susceptible to con-
founding (alternative explanations for observed effects and relationships).56 In
this case, the use of screening could be confounded by other design-level
variables (such as increased intensity of care). Bower and colleagues44
sought to address this limitation by conducting a multivariate analysis of
these data to adjust for other potentially confounding covariates. They found
in their multivariate analysis that several of the positive associations found in
univariate meta-regression (such as those highlighted above) ceased to be
significant. The only study-level variable that remained significant
after adjusting for other potentially confounding variables was the mental
health background of the case manager (p = 0.03). Screening, in contrast,
became less significant (p = 0.19) when other variables were accounted for.
The most likely conclusion that can be drawn from this analysis is that the
effect of screening is weak and is potentially confounded by other study-level
variables. Screening as a recruitment strategy is not therefore likely to be an
independently significant predictor of the effectiveness of enhanced care
strategies. One might go further and suggest that good-quality collaborative
care is likely to be effective, whether or not screening is used.

5. To Screen or Not to Screen?


Despite the apparently differing conclusions and policy recommendations
relating to screening for depression, an evidence-based consensus seems to
emerge that screening used on its own is an ineffective strategy. This conclusion
should not be surprising, since the quality of care for depression is often
poor57,58 and the addition of screening is likely only to identify an unmet need
without offering anything positive to improve the management and outcome of
this condition. It has been discussed elsewhere that screening identifies a
qualitatively different population of people with depression from those who
are already identified and managed in primary care (what Goldberg calls
‘‘conspicuous psychiatric morbidity’’59). The people identified by screening
programs tend to have less severe psychopathology and a better outcome, to be
generally reluctant to take antidepressants, and to be less likely to benefit from
medical or psychosocial interventions (see Palmer and Coyne13 for review).
Low expectations and poor outcome of screening strategies have led to a
more fundamental rethinking of the organization of delivery of care for
depression.58 A direct result of the failure of the screening–detection–treatment–improvement
paradigm12 has been the emergence of organizational
enhancements of care, such as collaborative care.60,61
The conclusion that should be drawn from the re-analysis of existing studies
of collaborative care in the present chapter is that this strategy is generally
effective, but the assumption that screening is a key element of effective
enhancement might not be true. This is not a small and insignificant epide-
miologic issue of causal inference and confounding, but one that is of impor-
tance to practitioners and policymakers. The concerns relating to the relative
importance of screening in quality enhancement are important for two main
reasons. Firstly, policymakers have readily picked up on the positive endorse-
ment of screening from bodies such as the USPSTF and NICE without reading
the small print. Quality enhancement strategies have sometimes begun and
ended with screening, without the implementation of wider enhancements of
care. Screening is a quick and easy policy to implement, measure, and reward.
The experience in the United Kingdom is that screening and case-finding are
financially rewarded without any explicit requirement that the process of care
be improved any further.20 Secondly, for those who do choose to follow the
evidence and implement collaborative care, there are many decisions that need
to be made in the design of effective care systems. The use of screening as a
point of entry to enhanced care raises a number of ethical and logistical
issues.13 Screening usually identifies an unmet need and creates an increased
demand for care. If this demand is not met, screening itself might do more harm
than good. Services will have to be planned accordingly to meet this need (and
expectation of care) from within finite healthcare resources.
Ultimately, the most thorough way in which the effectiveness of screening
as a necessary or active component of enhanced care could be established
would be through the conduct of a randomized controlled trial of enhanced care
with screening, versus identical enhanced care without screening. To date (and
to our knowledge) there are no such trials, and it is debatable whether any such
trial will ever be conducted. In the interim, it is clear that screening is not a
sufficient intervention to improve the quality and outcomes of care for depres-
sion. What is less clear is whether screening is a necessary condition for
enhanced and improved quality of care for this important condition.

References
1. Wells KB, Stewart A, Hays RD, et al. The functioning and well-being of depressed
patients. Results from the Medical Outcomes Study. JAMA. 1989;262(7):914–919.
2. Murray CJ, Lopez AD. The global burden of disease: a comprehensive assessment of
mortality and disability from disease, injuries and risk factors in 1990. Boston: Harvard
School of Public Health on behalf of the World Bank, 1996.
3. Thomas C, Morris S. Cost of depression among adults in England in 2000. Br J
Psychiatry. 2003;183:514–519.
4. Greenberg PE, Kessler RC, Birnbaum HG, et al. The economic burden of depression in
the United States: how did it change between 1990 and 2000? J Clin Psychiatry.
2003;64:1465–1475.
5. Cepoiu M, McCusker J, Cole MG, et al. Recognition of depression by non-psychiatric
physicians—a systematic literature review and meta-analysis. J Gen Intern Med.
2008;23:25–36.
6. Simon G, Von Korff M. Recognition and management of depression in primary care.
Arch Fam Med. 1995;4:99–105.
7. Katon W, Ciechanowski P. Impact of major depression on chronic medical illness. J
Psychosom Res. 2002;53:859–863.
8. Wright A. Should general practitioners be testing for depression? Br J Gen Pract.
1994;44:132–135.
9. Sharp LK, Lipsky MS. Screening for depression across the lifespan: a review of
measures for use in primary care settings. Am Fam Physician. 2002;66:1001–1008.
10. U.S. Preventive Services Task Force. Guide to clinical preventive services, 2nd ed.
Alexandria, VA: International Medical Publishing, 1996.
11. Pignone MP, Gaynes BN, Rushton JL, et al. Screening for depression in adults: a
summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern
Med. 2002;136:765–776.
12. Klinkman MS, Coyne JC, Gallo S, et al. False positives, false negatives and the validity
of the diagnosis of major depression in primary care. Arch Fam Med.
1998;7:451–461.
13. Palmer SC, Coyne JC. Screening for depression in medical care: pitfalls, alternatives,
and revised priorities. J Psychosom Res. 2003;54(4):279–287.
14. New Freedom Commission on Mental Health. Achieving the promise: transforming
mental health care in America—final report. Rockville, MD: DHHS Pub. No. SMA-
03–3832, 2003.
15. Cochrane AL, Holland WW. Validation of screening procedures. Br Med Bull.
1971;27:3–8.
16. Mant D, Fowler G. Mass screening: theory and ethics. BMJ. 1990;300:916–918.
17. Stewart-Brown S, Farmer A. Screening could seriously damage your health. BMJ.
1997;314:533–534.
18. Wilson JMG, Jungner G. Principles and practice of screening for disease: World Health
Organization Public Health Paper 34. Geneva: World Health Organization, 1968.
19. Gilbody S, Whitty P, Grimshaw JG, et al. Improving the recognition and management
of depression in primary care. Effective Health Care Bulletin, University of York.
2002;7(Number 5).
20. Gilbody S, Sheldon T, Wessely S. Should we screen for depression? BMJ.
2006;332(7548):1027–1030.
21. National Screening Committee. The UK National Screening Committee’s Criteria for
appraising the viability, effectiveness and appropriateness of a screening programme
(available at http://www.nsc.nhs.uk/pdfs/criteria.pdf). London: HMSO, 2003.
22. Oxman TE, Sengupta A. Treatment of minor depression. Am J Geriatr Psychiatry.
2002;10:256–264.
23. Gilbody SM, House AO, Sheldon TA. Screening and case finding for depression. The
Cochrane Library (Issue 4). Chichester: Wiley Publishing, 2005.
24. Gilbody SM, House AO, Sheldon TA. Routinely administered questionnaires for
depression and anxiety: a systematic review. BMJ. 2001;322:406–409.
25. Coyne JC, Palmer SC, Sullivan PA. Screening for depression in adults. Ann Intern Med.
2003;138(9):767–768.
26. AHCPR Depression Guideline Panel. Depression in primary care: detection, diagnosis,
and treatment. Technical report. Number 5. Rockville, MD: US Department of Health
and Human Services, Public Health Service, 2000.
27. MacMillan HL, Patterson CJS, Wathen CN, and The Canadian Task Force on
Preventive Health Care. Screening for depression in primary care: recommendation
statement from the Canadian Task Force on Preventive Health Care. CMAJ.
2005;172(1):33–35.
28. Agency for Health Care Policy Research. Depression in primary care. Washington DC:
US Department of Health and Human Services, 1993.
29. Gilbody SM, House AO, Sheldon TA. Screening and case-finding instruments for
depression: a Cochrane systematic review and exploration of heterogeneity. CMAJ.
2008;178:1023–1024.
30. Beck D, Gilbody SM. Screening and case finding for depression. The Cochrane Library
(Issue 4). Chichester: Wiley Publishing, 2008.
31. National Institute for Clinical Excellence. Depression: core interventions in the
management of depression in primary and secondary care. London: HMSO, 2004.
32. Hickie IB, Davenport TA, Ricci CS. Screening for depression in general practice and
related medical settings. Med J Austr. 2002;177(7 Suppl):S111–S116.
33. Wells KB. The design of Partners in Care: evaluating the cost effectiveness of improving
care for depression in primary care. Soc Psychiatry Psychiatr Epidemiol. 1999;34:20–29.
34. Gilbody S, Whitty P, Grimshaw J, et al. Educational and organizational interventions to
improve the management of depression in primary care: a systematic review. JAMA.
2003;289:3145–3151.
35. Gilbody S, Bower P, Fletcher J, et al. Collaborative care for depression: a cumulative
meta-analysis and review of longer-term outcomes. Arch Intern Med. 2006;166:2314–
2321.
36. Bower P, Gilbody S. Managing common mental health disorders in primary care:
conceptual models and evidence base. BMJ. 2005;330:839–842.
37. Wells K, Sherbourne C, Schoenbaum M, et al. Five-year impact of quality improvement
for depression: results of a group-level randomized controlled trial. Arch Gen
Psychiatry. 2004;61:378–386.
38. Katzelnick DJ, Simon GE, Pearson SD, et al. Randomized trial of a depression
management program in high utilizers of medical care. Arch Fam Med. 2000;9:345–
351.
39. Wells KB, Sherbourne C, Schoenbaum M, et al. Impact of disseminating quality
improvement programs for depression in managed primary care: a randomized
controlled trial. JAMA. 2000;283:212–220.
40. Whooley MA, Stone B, Soghikian K. Randomized trial of case-finding for depression
in elderly primary care patients. J Gen Intern Med. 2000;15:293–300.
41. Rost K, Nutting PA, Smith J, et al. Improving depression outcomes in community
primary care practice: a randomised trial of the QuEST intervention. J Gen Intern Med.
2001;16:143–149.
42. Thompson S. Why sources of heterogeneity in meta-analysis should be investigated. In:
Chalmers I, Altman DG, eds. Systematic reviews. London: BMJ, 1995.
43. Thompson SG, Higgins JP. How should meta-regression analyses be undertaken and
interpreted? Stat Med. 2002;21:1559–1573.
44. Bower P, Gilbody SM, Richards D, et al. Collaborative care for depression: making
sense of complex interventions through systematic review and meta-regression. Br J
Psychiatry. 2006;189:484–493.
45. Peveler R, George C, Kinmonth AL, et al. Effect of antidepressant drug counselling and
information leaflets on adherence to drug treatment in primary care: randomised
controlled trial. BMJ. 1999;319:612–615.
46. Akerblad AC, Bengtsson F, Ekselius L, et al. Effects of an educational compliance
enhancement programme and therapeutic drug monitoring on treatment adherence in
depressed patients managed by general practitioners. Int Clin Psychopharmacol.
2003;18:347–354.
47. Brook O, van Hout H, Nieuwenhuyse H, et al. Impact of coaching by community
pharmacists on drug attitude of depressive primary care patients and acceptability to
patients; a randomized controlled trial. Eur Neuropsychopharmacol. 2003;13:1–9.
48. Capoccia K, Boudreau D, Blough D, et al. Randomized trial of pharmacist interventions
to improve depression care and outcomes in primary care. Am J Health Syst
Pharm. 2004;61:364–372.
49. Datto CJ, Thompson R, Horowitz D, et al. The pilot study of a telephone disease
management program for depression. Gen Hosp Psychiatry. 2003;25:169–177.
50. Dietrich AJ, Oxman TE, Williams JW Jr, et al. Going to scale: re-engineering systems
for primary care treatment of depression. Ann Fam Med. 2004;2(4):301–304.
51. Finley P, Rens H, Gess S, et al. Case management of depression by clinical pharmacists
in a primary care setting. Formulary. 1999;34:864–870.
52. Hunkeler EM, Meresman JF, Hargreaves WA, et al. Efficacy of nurse telehealth care
and peer support in augmenting treatment of depression in primary care. Arch Fam
Med. 2000;9:700–708.
53. Katon W, Robinson P, Von Korff M, et al. A multifaceted intervention to improve
treatment of depression in primary care. Arch Gen Psychiatry. 1996;53(10):924–932.
54. Mann A, Blizard R, Murray J. An evaluation of practice nurses working with general
practitioners to treat people with depression. Br J Gen Pract. 1998;48:875–879.
55. Wilkinson G, Allen P, Marshall E. The role of the practice nurse in the management of
depression in general practice: treatment adherence to antidepressant medication.
Psychol Med. 1993;23:229–237.
56. Higgins JPT, Thompson SG. Controlling the risk of spurious findings from meta-
regression. Statistics in Medicine. 2004;23:1663–1682.
57. Katon W, von Korff M, Lin E, et al. Adequacy and duration of antidepressant treatment
in primary care. Med Care. 1992;30:67–76.
58. Katon W, Von Korff M, Lin E, et al. Population-based care of depression: effective
disease management strategies to decrease prevalence. Gen Hosp Psychiatry.
1997;19:169–178.
59. Goldberg D. The detection of psychiatric illness by questionnaire. Oxford: Oxford
University Press, 1972.
60. Simon G. Collaborative care for depression. BMJ. 2006;332:249–250.
61. Unutzer J, Schoenbaum M, Druss BG, et al. Transforming mental health care at the
interface with general medicine: report for the President’s Commission. Psychiatr Serv.
2006;57:37–47.
7 IMPLEMENTING SCREENING AS PART OF ENHANCED CARE 141

62. Adler DA, Bungay KM, Wilson IB, et al. The impact of a pharmacist intervention on
6-month outcomes in depressed primary care patients. Gen Hosp Psychiatry.
2004;26(3):199–209.
63. Araya R, Rojas G, Fritsch R, et al. Treating depression in primary care in low-income
women in Santiago, Chile: a randomised controlled trial. Lancet. 2003;361:995–1000.
64. Blanchard MR, Waterreus A, Mann AH. The effect of primary care nurse
intervention upon older people screened as depressed. Int J Geriatr Psychiatry.
1995;10:289–298.
65. Bruce M, Ten Have T, Reynolds C, et al. Reducing suicidal ideation and depressive
symptoms in depressed older primary care patients. JAMA. 2004;291(9):1081–1091.
66. Callahan C, Hendrie H, Dittus R, et al. Improving treatment of late life depression in
primary care: a randomized clinical trial. J Am Geriatr Soc. 1994;42:839–846.
67. Coleman EA, Grothaus LC, Sandhu N, et al. Chronic care clinics: a randomized
controlled trial of a new model of primary care for frail older adults. J Am Geriatr
Soc. 1999;47:775–783.
68. Dietrich AJ, Oxman TE, Williams JW, et al. Re-engineering systems for the treatment
of depression in primary care: cluster randomised controlled trial. BMJ.
2004;329:602–609.
69. Jarjoura D, Polen A, Baum E, et al. Effectiveness of screening and treatment for
depression in ambulatory indigent patients. J Gen Intern Med. 2004;19(1):78–84.
70. Katon W, Von Korff M, Lin E, et al. Stepped collaborative care for primary care
patients with persistent symptoms of depression: a randomized trial. Arch Gen
Psychiatry. 1999;56:1109–1115.
71. Katon W, Rutter C, Ludman EJ, et al. A randomized trial of relapse prevention of
depression in primary care. Arch Gen Psychiatry. 2001;58:241–247.
72. Katon WJ, Von Korff M, Lin EHB, et al. The Pathways Study: a randomized trial of
collaborative care in patients with diabetes and depression. Arch Gen Psychiatry.
2004;61:1042–1049.
73. Oslin D, Sayers S, Ross J, et al. Disease management for depression and at risk drinking
via telephone in an older population of veterans. Psychosom Med. 2003;65:931–937.
74. Rickles N, Svarstad BL, Statz-Paynter JL, et al. Pharmacist telemonitoring of
antidepressant use: effects on pharmacist–patient collaboration. J Am Pharm Assoc.
2005;45:344–353.
75. Simon G, Von Korff M, Rutter C, et al. Randomised trial of monitoring, feedback and
management of care by telephone to improve treatment of depression in primary care.
BMJ. 2000;320:550–554.
76. Simon GE, Ludman EJ, Tutty S, et al. Telephone psychotherapy and telephone care
management for primary care patients starting antidepressant treatment: a randomized
controlled trial. JAMA. 2004;292(8):935–942.
77. Swindle R, Rao J, Helmy A, et al. Integrating clinical nurse specialists into the
treatment of primary care patients with depression. Int J Psychiatry Med.
2003;33(1):17–37.
78. Unutzer J, Katon W, Williams J, et al. Improving primary care for depression in late
life: the design of a multicenter randomized trial. Med Care. 2001;39(8):785–799.
8 TECHNOLOGICAL APPROACHES TO SCREENING AND CASE FINDING FOR DEPRESSION

William H. Rogers, Debra Lerner, and David A. Adler

1. Technological Methods of Screening for Depression
2. Ten Issues When Developing Computerized Screening for Depression
3. Examples of Implementation of Computerized Screening for Depression
4. Discussion
5. Conclusion

Context
What are the strengths and weaknesses of computer-based and other automated
methods of detecting depression? Two promising technologies make use
of the Internet and speech recognition. Whatever technology is used, each
method needs to be assessed rigorously using the same high standards that
have been applied to pencil-and-paper tests.

We are in the midst of a technological revolution that inevitably will transform
psychiatric clinical practice. A consensus for routine depression screening is
building,1,2 and at the same time methods by which it could be accomplished are
emerging. The hope is that the right technology can provide an easy, inexpensive,
valid, and reliable public health approach to depression screening.
Computerized assessment is well accepted in diverse fields, and the use of
Internet-based survey technology has grown exponentially.3–7 Issues regarding
the strengths and limitations of computerized assessments are addressed reg-
ularly in the literature.3–11 For example, such assessments have been shown to
improve data quality while at the same time reducing cost as well as the time to
score, analyze, and report results. Increasingly, as depressive disorders have
been recognized as highly prevalent with significant morbidity, multiple
screeners using an array of technological advances have been developed2,12–33
(Table 8.1 lists selected studies).34–49
This chapter will review the technologies that are currently available for
automated depression screening and will discuss them in terms of criteria that
should dictate their adoption.

1. Technological Methods of Screening for Depression


The growing list of technologies can be classified on several dimensions. Perhaps
the most important of these is adaptive vs. non-adaptive. In an adaptive technology
pioneered by the Educational Testing Service,50 a computer, using a prepro-
grammed algorithm, decides which question to ask next given the responses so
far.3,9,48,49,51–55 Paper-and-pencil is the classical non-adaptive technology—
everyone gets the same paper with the same questions in the same order.
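
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item identifiers, the 0 to 3 scoring, and the gate threshold are hypothetical placeholders rather than a validated instrument, and a simple branching rule stands in for the item-response-theory selection that real computer-adaptive tests use.

```python
from typing import Callable, Dict

# Hypothetical item bank: ids are placeholders, not a validated instrument.
GATE_ITEMS = ["low_mood", "loss_of_interest"]
FOLLOW_UP_ITEMS = ["sleep", "energy", "appetite", "concentration"]
GATE_THRESHOLD = 2  # assumed cut-off on the summed gate items (each scored 0-3)

Answerer = Callable[[str], int]  # maps an item id to a 0-3 response


def administer_fixed(answer: Answerer) -> Dict[str, int]:
    """Non-adaptive: every respondent gets every item in the same order."""
    return {item: answer(item) for item in GATE_ITEMS + FOLLOW_UP_ITEMS}


def administer_adaptive(answer: Answerer) -> Dict[str, int]:
    """Adaptive: the responses so far decide which items are asked next."""
    responses = {item: answer(item) for item in GATE_ITEMS}
    if sum(responses.values()) >= GATE_THRESHOLD:
        for item in FOLLOW_UP_ITEMS:
            responses[item] = answer(item)
    return responses


def scripted(answers: Dict[str, int]) -> Answerer:
    """Build a stand-in respondent that answers 0 unless told otherwise."""
    return lambda item: answers.get(item, 0)
```

A respondent who scores below the assumed threshold on the gate items answers two questions in the adaptive version and six in the fixed version, which is the kind of saving in respondent burden that adaptive administration aims for.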
Technological modality is a second dimension. Currently available tech-
nologies include the phone, the Internet, and hand-held electronic devices.5
The phone can be split into several groups: agent-administered computer-assisted
telephone interview (CATI), speech recognition, and touch-tone. Phone can
also be classified as inbound (the patient initiates the call to a toll-free number)
or outbound (the system initiates the call). Hand-held devices could include
tablets such as personal digital assistants, game consoles, modern cell phones,
or “electronic paper.” Internet-based screeners (eg, Patient Health
Questionnaire-9 [PHQ-9], Zung Self-Rating Depression Scale)13,20 can be
implemented through standard web browsers, at public kiosks, or through
connected hand-held devices. In this chapter, all of these methods are classified
together under the term “Internet” because they follow a common approach of
visually presenting the screener or monitoring instrument and taking responses
by interaction with that visual image. There is always a computer involved in
presenting the data and recording the responses.
One can even envision the day when more futuristic technologies such as
eye-tracking equipment, brain scans, blood tests, or electrical system monitors
for depression will be available.
Two basic premises underlie our discussion:
1. There is no fail-proof methodology. There is no single technology that
guarantees success, but some technologies have inherent failures.
2. Implementation and circumstances matter. A technology that performs
well in one setting (eg, Internet screening at home) may be unacceptable
in another (automated screening on a desktop computer in a physician’s
waiting room). In the current marketplace, there are no full-service
automated systems that are embedded in an electronic medical record.
Table 8.1. Technological Methods of Depression Screening: Summary of Studies

Mental Health-Based Studies

Gonzalez (2007), Hisp J Behav Sci
  Technological method: Computer voice recognition: VIDAS
  Sample/setting: English- and Spanish-speaking patients, n = 217, visual
  Accuracy of computerized method: CES-D 20, alpha = 0.87–0.91 computer/written; CES-D vs. BDI-2: r = 0.74–0.86; ROC CES-D (cut point of 16) vs. CIDI-SF: Se: 0.88–1.0; Sp: 0.42–0.20; PPV: 0.61–0.28; NPV: 0.77–1.0
  Comment: Computerized CES-D speech recognition vs. written acceptable in both English and Spanish speakers; visual somewhat better than aural

Kobak (1997), Psychiatr Serv
  Technological method: Computer vs. IVR telephone
  Sample/setting: CMHC, n = 51
  Accuracy of computerized method: PRIME-MD IVR/Desktop and SCID-IV for MDD: Kappa 0.49/0.27; Se: 0.77/0.77; Sp: 0.75/0.50; PPV: 0.87/0.77; NPV: 0.77/0.69; similar prevalence rates
  Comment: IVR vs. Desktop of PRIME-MD compared to phone SCID-IV, Ham D-17, and chart Dx; both acceptable

Munoz (1999), J Consult Clin Psychol
  Technological method: Computer voice recognition
  Sample/setting: Women’s health clinic, n = 104 English- and Spanish-speaking women
  Accuracy of computerized method: CES-D 20 and MDD (18-item DIS Mood questions) Screener: K = 0.82/0.89 for current and lifetime MDD for computer vs. interview; K = 0.81/0.75 computer vs. interview of MDD vs. PRIME-MD; Se: 0.89/0.91 current/lifetime MDD; Sp: 0.93/0.91 current/lifetime MDD
  Comment: Voice recognition of CES-D and MDD screener to clinician interview of both plus PRIME-MD yielded comparable results

Population-Based Studies

Baer (1995), JAMA
  Technological method: IVR using telephone keypad
  Sample/setting: Midwest Univ. and NE high-tech firm; n = 1,812; 1,597/1,812 Zung completers
  Accuracy of computerized method: Zung (SDS)-20 found acceptable by subjects
  Comment: No direct comparison with other forms of screening

Lin (2007), BMC Psychiatry
  Technological method: Computer touchscreen
  Sample/setting: Taiwanese volunteers, n = 579
  Accuracy of computerized method: ISP-D for MDD vs. MINI (N = 55): Kappa 0.80; Se: 0.82; Sp: 0.73; PPV: 0.67; NPV: 0.86
  Comment: Internet-based Self-assessment Program for Depression (ISP-D) is reliable and valid online tool for assessing depression with excellent retest reliability

Patton (1999), Soc Psychiatry Psychiatr Epidemiol
  Technological method: Computer
  Sample/setting: Australian HS students; n = 2,032; 65 of 1,729 completers with MDD
  Accuracy of computerized method: Computerized CIS-R to live CIDI 2–9 weeks later; CIS-R/CIDI Se: 0.97; Sp: 0.18; PPV: 0.49; NPV: 0.91
  Comment: Students favorable to computer

Medical-Based Studies

Allenby (2002), Eur J Cancer Care
  Technological method: Computer touchscreen
  Sample/setting: Australian amb. oncology center; n = 450, median age 61
  Accuracy of computerized method: BDI-2, Cancer Needs Questionnaire; EORTC QLQ-C30
  Comment: No direct comparison with other forms of screening. Acceptable to patients

Bliven (2001), Quality of Life Research
  Technological method: Computer touchscreen
  Sample/setting: Cardiac OPD, n = 55
  Accuracy of computerized method: SF-36, 8 subscales/Seattle Angina Quest.; SF-36 computer/written r = 0.54–0.76; SF-MH mean scale computer/written: 66.19/65.77; r = 0.54
  Comment: Compared computer to written, 82% preferred computer

Cull (2001), Br J Cancer
  Technological method: Computer touchscreen
  Sample/setting: Outpatient chemotherapy patients, n = 172
  Accuracy of computerized method: MHI-5 >10, Hospital Anxiety and Depression Scale (HADS) >8, computer vs. PSE diagnosis of MDD: Se: 0.85; Sp: 0.71; PPV: 0.47; NPV: 0.26
  Comment: Two (HADS and MHI-5) screeners 2–4 weeks apart compared to an in-person interview using Present State Exam (PSE) within a week

Kurt (2004), Computer Methods in Biomedicine
  Technological method: Computer touchscreen
  Sample/setting: Pts. >65, PCP office; n = 240; 68/240 participated
  Accuracy of computerized method: CESD-20 (or 35) and GDS (Geriatric Depression Scale) computer/written: BL reliability: 0.74/0.72; computer/written: F/Up reliability: 0.61/0.83
  Comment: Patients favorable to computer

Sharpe (2004), Br J Cancer
  Technological method: Computer touchscreen
  Sample/setting: Cancer center, n = 5,613; 891/3,938 HADS completers, score >14; 196/570 interviewed had MDD
  Accuracy of computerized method: Comparison of Hospital Anxiety and Depression Scale (HADS) with DSM-IV SCID clinician telephone interview
  Comment: No direct comparison with other forms of screening

2. Ten Issues When Developing Computerized Screening for Depression
With this in mind, we now consider the issues that arise regarding the use of
automated screeners in general and depression-monitoring instruments
specifically.

Quality Control and Accuracy


The first question posited in any discussion of automation is its accuracy.
Technology-based methods are more consistently applied, which implies
more comparable and interpretable data.3,6,17,20,47,56–66 No human bias is
introduced. Clinician interviews and agent-administered phone CATI depend
on a human being. A clinician or an agent speaks and listens differently every
time. Paper-and-pencil screeners, as well as automated electronic surveys,
eliminate this source of variation. If this advantage is pursued, agreement
with known standards can be improved beyond what is possible with a clinician
or agent. While the technology already exists, ensuring accuracy rests on the
craftsmanship of the instrument (eg, inaccurate or poorly designed program-
ming will result in poor-quality data).

Error Control
Evidence to date is that different data collection methods do not change the
probability that the answer is recorded as intended.7 In paper-and-pencil
screeners, respondents can make stray marks that scanners cannot easily
interpret. These can be reduced to acceptable levels by providing clear instruc-
tions with examples on how to make marks. In speech recognition systems,
respondents can speak responses outside of the answer set, but asking questions
in a way that prompts a response in range and challenging responses that do not
seem to be within range can reduce this.36 For both of these systems, human
post-response review of questionable responses is desirable. For example,
scanners can detect stray marks and voice recognizers can identify problematic
voice input. With these measures, very high accuracy (eg, over 99.5% of responses
recorded correctly) is possible. Without them, accuracy is still high but more errors
occur (eg, over 98% correct). Numerous data companies report error control checks
within these ranges and better.
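
The two safeguards described above, challenging out-of-range responses and routing questionable ones to human review, might look like the following sketch. The answer set, the number of attempts allowed, and all names here are assumptions made for illustration; they are not taken from any of the systems cited.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Optional

VALID_RESPONSES = {"0", "1", "2", "3"}  # assumed answer set for one item
MAX_ATTEMPTS = 2  # assumed number of prompts before the item is abandoned


@dataclass
class ItemResult:
    value: Optional[int]  # None when no in-range answer was obtained
    needs_review: bool    # True when a person should check this item


def collect_item(raw_attempts: Iterable[str]) -> ItemResult:
    """Accept the first in-range answer; challenge anything out of range."""
    challenged = False
    for raw in islice(raw_attempts, MAX_ATTEMPTS):
        cleaned = raw.strip()
        if cleaned in VALID_RESPONSES:
            # An answer given only after a challenge is kept but flagged.
            return ItemResult(int(cleaned), needs_review=challenged)
        challenged = True
    # No usable answer within the allowed attempts: leave it to a reviewer.
    return ItemResult(None, needs_review=True)
```

In this design a clean first answer passes straight through, while anything that needed a second prompt, or never resolved, is marked for the post-response review the text recommends.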
Nominal error rates for touch-tone and for the Internet and related technol-
ogies such as kiosks and hand-held devices are low because these systems
enforce a single answer. However, this does not mean that such devices are free
of error. The error rates on the Internet are low if the respondent can see all the
responses and no default choices are premarked. Several studies have found
that Internet surveys and mail are equivalent.67 If the respondent has to click a
mouse to see all the responses, then the results will be biased. For touch-tone
interactive voice response (IVR), elderly respondents and those whose
touch-tone buttons are in the receiver are likely to have high error rates, but
no further identification of errors is possible without a very laborious review of
each response—a practice suitable for banking but not for screening question-
naires. Touch-tone also invites cognitive errors because the verbal responses
must be converted to numerical form before they can be entered. Most studies
have concluded that touch-tone is not equivalent to mail.67

Honesty
Research has shown repeatedly that respondents, even those with depression, are
more honest with computers or mail than they are with live interviewers, translating
into better accuracy.59,60,64,68–70

Physical Clues
Conversely, human interpreters, and especially clinician interviewers, are best
at dealing with clues such as crying, gaps in speech, or slurred, sped-up, or
retarded speech that might have important implications in the screening pro-
cess.4 Voice recognition systems could also be trained to find these, but this has
not happened yet to our knowledge, and it would never be as good as trained
clinicians meeting with depressed individuals.

Performance
Case-specific performance data are key to successful use of an automated
system, given the potential time savings.7,20 Physicians can use the results
most efficiently if patient-specific reports of positive predictive value (PPV)
and negative predictive value (NPV) are included. In one of the few studies
addressing depression, Kobak and colleagues,20 using the PRIME-MD, reported a
PPV of 0.87 and a sensitivity for touch-tone and IVR of 0.84 to 0.88. The cost
of untreated depression is high, particularly among employed patients,71–74 so
automated screening will normally be cost-effective compared with the hap-
hazard approach characteristic of population screening. If the screener cannot
find cases (poor sensitivity or low NPV), then other case-finding tools may
need to be used anyway.
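
Because PPV and NPV depend on how common depression is in the setting, a report can only state them for an assumed prevalence. A minimal sketch of the standard Bayes' rule calculation follows; the function name is our own, and the prevalence figures in the usage note are illustrative assumptions, not values from the studies cited.

```python
import math
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a screener applied at a given prevalence."""
    for name, p in (("sensitivity", sensitivity),
                    ("specificity", specificity),
                    ("prevalence", prevalence)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    flagged = true_pos + false_pos
    cleared = true_neg + false_neg
    # NaN marks the degenerate case where nobody is flagged (or cleared).
    ppv = true_pos / flagged if flagged else math.nan
    npv = true_neg / cleared if cleared else math.nan
    return ppv, npv
```

For example, calling `predictive_values(0.82, 0.73, 0.05)` and `predictive_values(0.82, 0.73, 0.30)` with the sensitivity and specificity listed for Lin and coworkers in Table 8.1 shows how the same instrument yields a much lower PPV in a low-prevalence population than in a high-prevalence one.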

Workload Considerations
A highly effective automated system that is used to screen all individuals
routinely has the potential to generate many possible or probable cases very
quickly. For example, as found in studies by Sharpe30 and Cull40 and their
colleagues, if every attendee at a regional cancer center is assessed, it is
possible that 20% might be flagged as high scorers on a depression scale.
Even with a second filter such as request for help, a large number of people
may need to be seen. The potential benefit of a high yield of true cases might
come at the expense of a large number (in absolute terms) of false positives,
each of whom has higher expectations on the basis of the first-stage alert and
needs to have follow-up. Alternatively, fear of workload may defeat the
screening process itself. When the PPV falls much below 70%, physicians
may choose to ignore screening results on the grounds that following up 30% or
more who are false positives is too much work.7,75 Although PPV and sensi-
tivity are affected by response errors, they are more influenced by the screening
instrument itself. The balance between them is implementation-specific. In
general, demanding criteria for diagnosing depression will result in good PPV
but poor sensitivity.27
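
The workload implied by routine screening can be estimated before a system goes live. The sketch below is a back-of-envelope calculation under assumed inputs (clinic volume, prevalence, sensitivity, and specificity are all supplied by the user, and the names are hypothetical); it returns expected counts, not observed ones.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    flagged: float          # expected number of positive screens
    true_positives: float   # expected flagged people who have depression
    false_positives: float  # expected flagged people who do not

    @property
    def ppv(self) -> float:
        """Share of flagged people who are true cases."""
        return self.true_positives / self.flagged if self.flagged else float("nan")


def expected_workload(n_screened: int, prevalence: float,
                      sensitivity: float, specificity: float) -> Workload:
    """Expected follow-up load when everyone attending is screened."""
    cases = n_screened * prevalence
    non_cases = n_screened - cases
    true_pos = cases * sensitivity
    false_pos = non_cases * (1.0 - specificity)
    return Workload(true_pos + false_pos, true_pos, false_pos)
```

Running this for a planned setting makes the trade-off in the text concrete: the `false_positives` figure is the absolute number of people who will need follow-up without having depression, and `ppv` can be compared against whatever level clinicians in that setting consider workable.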

Acceptability
A system is useful only if subjects are willing to use it; acceptability is a
necessity for implementation of any automated screening system. Most of the
evidence to date suggests that patients accept automated screening as a general
idea compared with visits to mental health specialists.3,6,20,24,30,40 A number of
national studies have had excellent response rates with no particular item non-
response on depression screening questions.38,47,76–78
With respect to the technologies, the survey response literature has some
lessons to teach. The technological challenge to the respondent of touch-tone
IVR is higher than speech recognition; touch-tone response rates are lower.
The Internet (and associated device-related technologies) is generally regarded
as usable, but not every home has a computer, and in many businesses personal
computer use is restricted or frowned upon.4 In addition, many people have
privacy worries about the Internet, and in some businesses these are justified.79
Some degree of computer skill and literacy is necessary.38 The impact of age
cohort, gender, and cultural issues requires further study. This suggests that
alternatives to the Internet will remain useful. Combination approaches invol-
ving Internet, phone, and either outbound calling or mail achieve the best
coverage.67

Prices
As a general rule, prices are highly implementation-dependent, and a bid is
necessary to know what the price will be. However, some general principles
apply. Paper-and-pencil surveys depend on a combination of mailing costs and
processing costs.80,81 Very efficient high-end scanners are available, but they
must still be fed. Even a “free” screener that is entered by fax machine in a
doctor’s office costs more than $3 when the cost of handing out the survey,
collecting the response, and feeding it into a fax machine is counted. If mail is
involved, back-end duties can be handled by clerks, but this cost reduction is
more than offset by the price of mailing.6,67,76 The traditional methods of
screening such as paper surveys and scanning are only suited to large-scale
data-collection systems with central mail processing facilities and are difficult
to manage in smaller settings. For Internet screeners and voice recognition or
touch-tone, the marginal cost of the screener ranges from nothing to a dollar,
but there are fixed costs associated with developing and fielding the
system.82–84 Such costs are typically between $10,000 and $25,000.12
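
The figures above allow a rough break-even estimate: the number of screens at which the fixed cost of an electronic system is recovered through its lower per-screen cost. The sketch below is simple arithmetic under the assumption that the per-screen costs are constant; the function name is ours, and an actual bid may be structured differently.

```python
def break_even_volume(fixed_cost: float, paper_unit_cost: float,
                      electronic_unit_cost: float) -> float:
    """Screens needed before an electronic system costs less than paper."""
    saving_per_screen = paper_unit_cost - electronic_unit_cost
    if saving_per_screen <= 0:
        raise ValueError("electronic screening must be cheaper per screen")
    return fixed_cost / saving_per_screen


# Bounds taken from the ranges quoted in the text: $3 per paper screen,
# $0 to $1 per electronic screen, $10,000 to $25,000 in fixed costs.
lowest = break_even_volume(10_000, 3.0, 0.0)
highest = break_even_volume(25_000, 3.0, 1.0)
```

Since the text gives $3 as a floor for the paper cost, these volumes are if anything overestimates; a practice that expects to screen fewer people than `lowest` over the life of the system would still have reason to question the investment on cost grounds alone.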

Availability
All of the methods except for scanned paper-and-pencil surveys can be pro-
cessed immediately, with real-time feedback to respondents about what to do.
Patients often have time to consider the possibilities at times of the day when
physicians are not available (eg, the middle of the night). Results are immedi-
ately available without transcription error.47

Embedding in a System
To be useful, a screening system needs to be embedded in a healthcare system
that can deal with the information.3,7,20,85,86 Unless the results are available and
retrievable, they are useless. This very important issue is mostly beyond the
scope of this paper. Technology has some impact. A mailed and scanned
questionnaire cannot be acted on in a timely way. All of the electronic methods
can be followed up with questions about context (Did someone important to
you die recently? Are you thinking of taking your life soon?). In principle, the
results can be transmitted to electronic medical records (EMR) or physician
e-mail, if the setting allows for one. Contextual data such as medications could
also be drawn from an EMR. In the current environment, embedding screeners
is still a custom operation—EMR is not at this point sold with a depression
screener or monitor website included.
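
To make "available and retrievable" concrete, the sketch below packages a screening result, together with answers to contextual follow-up questions, as a JSON document that a downstream system could store. Everything here is hypothetical: the field names, the cut-off, and the follow-up keys are assumptions for illustration, and no real EMR interface or messaging standard is being modeled.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class ScreenResult:
    patient_id: str   # hypothetical identifier supplied by the host system
    instrument: str   # name of the screener that was administered
    score: int
    cutoff: int       # assumed threshold for a positive screen
    follow_up: Dict[str, bool] = field(default_factory=dict)

    @property
    def positive(self) -> bool:
        return self.score >= self.cutoff


def build_report(result: ScreenResult) -> str:
    """Serialize a result so that it can be stored and retrieved later."""
    payload = asdict(result)
    payload["positive"] = result.positive
    # An endorsed safety question is surfaced at the top level so that it
    # is not buried among the other follow-up answers.
    payload["urgent"] = result.follow_up.get("thinking_of_taking_life_soon", False)
    return json.dumps(payload, sort_keys=True)
```

How such a document would actually reach a clinician (EMR field, secure message, or printed report) is the custom, setting-specific work the text describes.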

3. Examples of Implementation of Computerized Screening for Depression
Whether a system is actually acceptable in practice depends on both the
technology and the context. All of the technologies have been shown to be
acceptable in some context (see Table 8.1 for selected studies discussed
below). For example, in our prior work, most patients in primary care offices
were willing to fill out a two-page depression screener that was immediately
scanned.26 We are now using web-based touchscreen methodology to screen
employed individuals for depression in workplace settings (Fig. 8.1). The study
by Baer and associates13 using IVR with telephone keypad response was one of
the first to demonstrate the use and acceptability of fully automated technology
for confidential mass depression screening. Two recent studies (Gonzalez and
associates,36 using computer voice recognition, and Lin and coworkers,35
using computer touchscreen) found good psychometric properties for
well-accepted depression screeners compared to standardized diagnostic in-person
interviews. Kobak and associates,19,20,47,61 in a series of studies, demonstrated
the acceptability and equivalence of all forms of depression screening
(clinician interview by telephone, phone IVR, and computer touchscreen). Kurt and
colleagues22 found similar results for a computer-assisted assessment of
depression in geriatric primary care patients. Even in a minority population,
Munoz and associates24 met no resistance to depression screening with
computerized voice-recognition technology.

Figure 8.1.a Work and Health Initiative depression pre-screener.
Figure 8.1.b Work and Health Initiative depression pre-screener.
Figure 8.1.c Sample electronic WHI Patient Depression Report.

In non-mental health outpatient settings Allenby and colleagues12 in
oncology and Bliven and associates80 in cardiology found high degrees of
acceptability for computer-assisted technology in screening for psychosocial
distress. Sharpe and colleagues30 applied touchscreen technology and found no
resistance to screening for depression and anxiety in a regional ambulatory
cancer center. Cull and colleagues40 used touchscreen technology to admin-
ister the Mental Health Index and Hospital Anxiety and Depression Scale to
develop a depression screening algorithm with adequate psychometric proper-
ties among outpatient cancer patients.

4. Discussion
Automated methods for both general health and depression-specific screening
are here to stay. They produce more accurate answers, are more suited to
evidence-based medicine, and are less expensive than paper-and-pencil
person-dependent methods or mail. Electronic methods are also superior to
paper and pencil because they produce timely answers and can also explore
some of the follow-up issues, such as more detail about suicidal ideation or
how the patient fits into the care process. While mental health clinicians’ face-
to-face observations of patients can identify verbal and nonverbal depressive
cues and lead to more immediate response, most individuals with depression
are not seen in the mental health specialty sector. However, both gaps in
evidence and barriers to effective widespread use remain.
Once a screening context is established, then some methods that are accep-
table in principle become unacceptable in practice. For example, most patients
would feel uncomfortable conducting a phone interview while sitting in a
crowded waiting room, or taking an Internet-based screener on a home com-
puter known to be infected with a virus. On the other hand the same patients
might feel comfortable taking a phone interview at home or completing an
Internet-based screener on a computer in a private room off the waiting area at
the doctor’s office. A number of groups have studied the issues of implementa-
tion in a number of settings focusing on acceptability and accuracy
(see Table 8.1). In general, these pilot projects find that depressed patients
are able to accurately complete both computer (desktop and web) and tele-
phone screener methodologies and find them acceptable alternatives to both
paper-and-pencil and clinician interviews. Just as with conventional methods,
there is no one-size-fits-all answer: multiple modalities are needed to meet
varied patient and provider needs. Solution modality by itself (eg, Internet,
phone, or tablet) is not the answer—much of the value lies in the craft with
which it is executed. Good-quality solutions are available in all three modal-
ities, but so are poor solutions. Choice is dependent upon purpose. If tech-
nology such as computer-adaptive testing is to be applied to population
screening, a multi-tiered approach can improve the accuracy. For example, a
general mental health prescreening can efficiently reduce the number of
individuals who might then be followed with a diagnosis-specific pre-screener,
reserving full screening for at-risk populations and for following patients
known to have a depressive disorder.
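
The multi-tiered approach can be pictured as a funnel in which each tier passes only its positives to the next. The sketch below assumes three tiers with made-up score fields and cut-offs; for simplicity every person already carries a score for every tier, whereas in practice later instruments would be administered only to those who pass the earlier ones.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Person = Dict[str, int]                      # hypothetical per-tier scores
Tier = Tuple[str, Callable[[Person], bool]]  # (tier name, positive-screen rule)


def run_tiers(people: Sequence[Person],
              tiers: Sequence[Tier]) -> Tuple[List[Person], List[Tuple[str, int]]]:
    """Apply tiers in order and report how many people remain after each."""
    remaining = list(people)
    passed_counts: List[Tuple[str, int]] = []
    for name, screens_positive in tiers:
        remaining = [person for person in remaining if screens_positive(person)]
        passed_counts.append((name, len(remaining)))
    return remaining, passed_counts


# Assumed tiers and cut-offs, for illustration only.
TIERS: List[Tier] = [
    ("general mental health prescreen", lambda p: p["general"] >= 1),
    ("depression-specific pre-screener", lambda p: p["depression_pre"] >= 2),
    ("full depression screener", lambda p: p["depression_full"] >= 10),
]
```

The per-tier counts returned by `run_tiers` show where the funnel narrows, which is useful when deciding how much respondent burden and clinician follow-up each tier is expected to generate.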
With respect to acceptability, the evidence to date suggests that automated
depression screening via web, computer, telephone, or soon tablet does not
incur reluctance by those screened. With respect to follow-up, however, the
story may differ. In most health risk-appraisal systems, patients and providers
can ignore a positive depression screener. On the other hand, a positive
screener can lead to overreaction. Work needs to be done on the back end of a
positive screener to identify cases that are appropriate for follow-up. Careful
thought needs to be given to how results will be handled with providers, what
follow-up would be cost-effective, and who will need to deliver follow-up
services. Nonetheless, without an electronic system, there is no mechanism to
help the system address these issues.
The marketplace will continue to define and redefine solutions that are
available and affordable. We have raised a set of questions that should be
asked of such systems and put them into two categories: concerns that are
frequently raised but usually do not turn out to be important issues (eg,
accuracy and acceptability) and concerns that have often led to existing
systems working less well than they could and that need to be addressed in
every implementation (eg, privacy, follow-up, and the interface of automated
results to the physician–patient relationship).

5. Conclusion
Thirty years of research has led to the conclusion that the benefits of
automated methods outweigh their limitations in general,3,6,7 for mental
health issues,3,20,58,61,62,64,68,87 and specifically for depression
screening.13,15,16,20,24,35,36,47,88,89 In the absence of information about a
particular implementation and the setting it is in, one cannot say that it is
automatically worthwhile or unacceptable. However, one can say that
pencil-and-paper screeners will be effective only under a limited set of conditions that
avoid the costs and delays commonly associated with mail. The two most
promising technologies seem to be the Internet (using web browsers and/or
hand-held devices) and speech recognition. Whatever technology is used, there
needs to be a good fit between the technology and the system within which it is
deployed.86 Acceptability depends on context; accuracy depends on craft. The
system needs to connect the patient to a physician and support that physician
with the correct information.

References
1. Agency for Health Care Policy and Research. Depression in primary care: detection
and diagnosis. Rockville, MD, 1993.
2. U.S. Preventive Services Task Force. Guide to clinical preventive services, 2nd ed.
Baltimore: Williams & Wilkins, 1996.
3. Berger M. Computer-assisted clinical assessment. Child Adolesc Mental Health.
2006;11(2):64–75.
4. Butcher JN, Perry J, Hahn J. Computers in clinical assessment: historical developments,
present status, and future challenges. J Clin Psychol. 2004;60(3):331–345.
5. Dillman DA. Mail and Internet surveys: the tailored design method, 2nd ed. Hoboken,
NJ: John Wiley & Sons, 2007:352–412.
6. Epstein J, Klinkenberg WD. From Eliza to Internet: a brief history of computerized
assessment. Computers in Human Behavior. 2001;17:295–314.
7. Garb HN. Computer-administered interviews and rating scales. Psychol Assess.
2007;19(1):4–13.
8. Buchanan T, Smith JL. Using the Internet for psychological research: personality
testing on the World Wide Web. Br J Psychol. 1999;90(Pt 1):125–144.
9. Revicki DA, Cella DF. Health status assessment for the twenty-first century: item
response theory, item banking and computer adaptive testing. Qual Life Res.
1997;6(6):595–600.
10. Truell AD, Bartlett JE, Alexander MW. Response rate, speed, and completeness: a
comparison of Internet-based and mail surveys. Behav Res Methods Instrum Comput.
2002;34(1):46–49.
11. Schleyer TK, Forrest JL. Methods for the design and administration of web-based
surveys. J Am Med Inform Assoc. 2000;7(4):416–425.
12. Allenby A, Matthews J, Beresford J, et al. The application of computer touch-screen
technology in screening for psychosocial distress in an ambulatory oncology setting.
Eur J Cancer Care (Engl). 2002;11(4):245–253.
13. Baer L, Jacobs DG, Cukor P, et al. Automated telephone screening survey for
depression. JAMA. 1995;273(24):1943–1944.
14. Beck AT, Steer RA, Garbin MG. Psychometric properties of the Beck Depression
Inventory: twenty-five years of evaluation. Clin Psychol Rev. 1988;8:77–100.
15. Gonzalez GM, Spiteri CB, Knowlton JP. An exploratory study using computerized
speech recognition for screening depressive symptoms. Computers in Human Behavior.
1995;11(1):85–93.
16. Carr AC, Ancill RJ, Ghosh A, et al. Direct assessment of depression by microcomputer.
A feasibility study. Acta Psychiatr Scand. 1981;64(5):415–422.
17. Carr AC, Ghosh A, Ancill RJ. Can a computer take a psychiatric history? Psychol Med.
1983;13(1):151–158.
18. Klinkman MS, Coyne JC, Gallo S, et al. Case finding instruments to be used to
improve physician detection of depression in primary care. Arch Fam Med.
1997;6:567–573.
19. Kobak KA, Reynolds WM, Rosenfeld R, et al. Development and validation of a
computer-administered version of the Hamilton Depression Rating Scale. Psychol
Assess. 1990;2:56–63.
20. Kobak KA, Taylor LVH, Dottl SL, et al. Computerized screening for psychiatric
disorders in an outpatient community mental health clinic. Psychiatr Serv.
1997;48(8):1048–1057.
21. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity
measure. J Gen Intern Med. 2001;16(9):606–613.
22. Kurt R, Bogner HR, Straton JB, et al. Computer-assisted assessment of depression and
function in older primary care patients. Comput Methods Programs Biomed.
2004;73(2):165–171.
23. Mulrow CD, Williams JW Jr, Gerety MB, et al. Case-finding instruments for depression
in primary care settings. Ann Intern Med. 1995;122(12):913–921.
24. Munoz RF, McQuaid JR, Gonzalez GM, et al. Depression screening in a women’s
clinic: using automated Spanish- and English-language voice recognition. J Consult
Clin Psychol. 1999;67(4):502–510.
25. Patton GC, Coffey C, Posterino M, et al. A computerised screening instrument for
adolescent depression: population-based validation and application to a two-phase
case-control study. Soc Psychiatry Psychiatr Epidemiol. 1999;34(3):166–172.
26. Rogers WH, Wilson IB, Bungay KM, et al. Assessing the performance of a new
depression screener for primary care (PC-SAD(c)). J Clin Epidemiol.
2002;55(2):164–175.
27. Rogers WH, Adler DA, Bungay KM, et al. Depression screening instruments make
good severity measures in a cross-sectional analysis. J Clin Epidemiol.
2005;58:370–377.
28. Schade CP, Jones ER Jr, Wittlin BJ. A ten-year review of the validity and clinical utility
of depression screening. Psychiatr Serv. 1998;49(1):55–61.
29. Schwenk TL. Screening for depression in primary care: a disease in search of a test.
J Gen Intern Med. 1996;11:437–439.
30. Sharpe M, Strong V, Allen K, et al. Major depression in outpatients attending a regional
cancer centre: screening and unmet treatment needs. Br J Cancer. 2004;90(2):314–320.
31. Spitzer RL, Kroenke K, Williams JB. Validation and utility of a self-report version of
PRIME-MD: the PHQ primary care study. Primary Care Evaluation of Mental
Disorders. Patient Health Questionnaire. JAMA. 1999;282(18):1737–1744.
32. Valenstein M, Vijan S, Zeber JE, et al. The cost-utility of screening for depression in
primary care. Ann Intern Med. 2001;134(5):345–360.
33. Whooley MA, Avins AL, Miranda J, et al. Case-finding instruments for depression.
Two questions are as good as many. J Gen Intern Med. 1997;12(7):439–445.
34. Kim H, Bracha Y, Tipnis A. Automated depression screening in disadvantaged
pregnant women in an urban obstetric clinic. Arch Womens Ment Health.
2007;10(4):163–169.
35. Lin CC, Bai YM, Liu CY, et al. Web-based tools can be used reliably to detect patients
with major depressive disorder and subsyndromal depressive symptoms. BMC
Psychiatry. 2007;7:12.
36. Gonzalez GM, Carter C, Blanes E. Bilingual computerized speech recognition
screening for depression symptoms: comparing aural and visual methods. Hispanic
Journal of Behavioral Sciences. 2007;29(2):156–180.
37. Fann J, Berry DL, Wolpin SE, et al. Feasibility of depression screening using the PHQ-9
administered on a touchscreen computer. Psychooncology. 2006;15(1):S18–S18.
38. Ekman A, Dickman PW, Klint A, et al. Feasibility of using web-based questionnaires in
large population-based epidemiological studies. Eur J Epidemiol. 2006;21(2):103–111.
39. Hyler SE, Gangure DP, Batchelder ST. Can telepsychiatry replace in-person psychiatric
assessments? A review and meta-analysis of comparison studies. CNS Spectr.
2005;10(5):403–413.
40. Cull A, Gould A, House A, et al. Validating automated screening for psychological
distress by means of computer touchscreens for use in routine oncology practice.
Br J Cancer. 2001;85(12):1842–1849.
41. Houston TK, Cooper LA, Vu HT, et al. Screening the public for depression through the
internet. Psychiatr Serv. 2001;52(3):362–367.
42. Leon AC, Kelsey JE, Pleil A, et al. An evaluation of a computer-assisted telephone
interview for screening for mental disorders among primary care patients. J Nerv Ment
Dis. 1999;187(5):308–311.
43. Brodey BB, Rosen CS, Brodey IS, et al. Reliability and acceptability of automated
telephone surveys among Spanish- and English-speaking mental health services
recipients. Ment Health Serv Res. 2005;7(3):181–184.
44. Mitchell AM, Mittelstaedt ME, Schott-Baer D. Postpartum depression: the
reliability of telephone screening. MCN Am J Matern Child Nurs.
2006;31(6):382–387.
45. Ogles BM, France CR, Lunnen KM, et al. Computerized depression screening and
awareness. Community Ment Health J. 1998;34(1):27–38.
46. Fliege H, Becker J, Walter OB, et al. Development of a computer-adaptive test for
depression (D-CAT). Qual Life Res. 2005;14(10):2277–2291.
47. Kobak KA, Mundt JC, Greist JH, et al. Computer assessment of depression: automating
the Hamilton Depression Rating Scale. Drug Inf J. 2000;34:145–156.
48. Gibbons RD, Weiss DJ, Kupfer DJ, et al. Using computerized adaptive testing to reduce
the burden of mental health assessment. Psychiatr Serv. 2008;59(4):361–368.
49. Gardner W, Shear K, Kelleher KJ, et al. Computerized adaptive measurement of
depression: a simulation study. BMC Psychiatry. 2004;4:13.
50. Educational Testing Services. Educational testing services. [Web document], 2000.
Accessed 7-30-2007.
51. Green B, Bock R, Humphreys L, et al. Technical guidelines for assessing computerized
adaptive tests. J Educ Measure. 1984;21:347–360.
52. Sands WA, Waters BK, McBride JR. Computerized adaptive testing: from inquiry to
operation. Washington, DC: APA Books, 1997.
53. Wainer H, Dorans NL. Computerized adaptive testing: a primer. Hillsdale, NJ:
Erlbaum Associates, 2000.
54. Ware JE Jr, Bjorner JB, Kosinski M. Practical implications of item response theory and
computerized adaptive testing: a brief summary of ongoing studies of widely used
headache impact scales. Med Care. 2000;38(9 Suppl):II73–II82.
55. Weiss DJ. Adaptive testing by computer. J Consult Clin Psychol. 1985;53(6):774–789.
56. Baer L, Brown-Beasley MW, Sorce J, et al. Computer-assisted telephone
administration of a structured interview for obsessive-compulsive disorder.
Am J Psychiatry. 1993;150(11):1737–1738.
57. Buchanan T. Online assessment: desirable or dangerous. Professional Psychology:
Research and Practice. 2002;33:148–154.
58. Carr AC, Ghosh A. Accuracy of behavioural assessment by computer. Br J Psychiatry.
1983;142:66–70.
59. Erdman HP, Klein MH, Greist JH. Direct patient computer interviewing. J Consult Clin
Psychol. 1985;53(6):760–773.
60. Erdman HP, Greist JH, Gustafson DH, et al. Suicide risk prediction by computer
interview: a prospective study. J Clin Psychiatry. 1987;48(12):464–467.
61. Kobak KA, Greist JH, Jefferson JW, et al. Computer-administered clinical rating scales.
A review. Psychopharmacology (Berl). 1996;127:291–301.
62. Peters L, Andrews G. Procedural validity of the computerized version of the Composite
International Diagnostic Interview (CIDI-Auto) in the anxiety disorders. Psychol Med.
1995;25(6):1269–1280.
63. Robins L, Helzer J, Cottler L, et al. NIMH Diagnostic Interview Schedule, Version III
Revised (DIS-III-R). St. Louis, MO: Washington University, 1989.
64. Rosenfeld R, Dar R, Anderson D, et al. A computer-administered version of the Yale-
Brown Obsessive-Compulsive Scale. Psychol Assess. 1992;4:329–332.
65. Shaffer D, Fisher P, Lucas CP, et al. NIMH Diagnostic Interview Schedule for Children
Version IV (NIMH DISC-IV): description, differences from previous versions, and
reliability of some common diagnoses. J Am Acad Child Adolesc Psychiatry.
2000;39(1):28–38.
66. Wilson FR, Genco KT, Yager GG. Assessing the equivalence of paper-and-pencil
versus computerized tests: Demonstration of a promising technology. Computers in
Human Behavior. 1985;1:265–275.
67. Rodriguez HP, von Glahn T, Rogers WH, et al. Evaluating patients’ experiences with
individual physicians: a randomized trial of mail, internet, and interactive voice
response telephone administration of surveys. Med Care. 2006;44(2):167–174.
68. Davis LJ Jr, Hoffmann NG, Morse RM, et al. Substance Use Disorder Diagnostic
Schedule (SUDDS): the equivalence and validity of a computer-administered and an
interviewer-administered format. Alcohol Clin Exp Res. 1992;16(2):250–254.
69. Millstein S. Acceptability and reliability of sensitive information collected via
computer interview. Educational and Psychological Measurement. 1987;47:523–533.
70. Rosenman SJ, Levings CT, Korten AE. Clinical utility and patient acceptance of the
computerized composite international diagnostic interview. Psychiatr Serv.
1997;48(6):815–820.
71. Adler DA, McLaughlin TJ, Rogers WH, et al. Job performance deficits due to
depression. Am J Psychiatry. 2006;163(9):1569–1576.
72. Greenberg PE, Kessler RC, Birnbaum HG, et al. The economic burden of depression in
the United States: how did it change between 1990 and 2000? J Clin Psychiatry.
2003;64(12):1465–1475.
73. Kessler RC, Berglund P, Demler O, et al. The epidemiology of major depressive
disorder: results from the National Comorbidity Survey Replication (NCS-R). JAMA.
2003;289(23):3095–3105.
74. Wang PS, Patrick A, Avorn J, et al. The costs and benefits of enhanced depression care
to employers. Arch Gen Psychiatry. 2006;63(12):1345–1353.
75. Grove WM, Zald DH, Lebow BS, et al. Clinical versus mechanical prediction: a meta-
analysis. Psychol Assess. 2000;12(1):19–30.
76. Selim AJ, Berlowitz DR, Fincke G, et al. The health status of elderly veteran enrollees
in the Veterans Health Administration. J Am Geriatr Soc. 2004;52(8):1271–1276.
77. Tarlov AR, Ware JE Jr, Greenfield S, et al. The Medical Outcomes Study. An
application of methods for monitoring the results of medical care. JAMA.
1989;262(7):925–930.
78. Wells KB, Burnam MA, Camp P. Severity of depression in prepaid and fee-for-
service general medical and mental health specialty practices. Med Care.
1995;33(4):350–364.
79. Kilbourne AM, McGinnis GF, Belnap BH, et al. The role of clinical information
technology in depression care management. Adm Policy Ment Health.
2006;33(1):59–69.
80. Bliven BD, Kaufman SE, Spertus JA. Electronic collection of health-related quality of
life data: validity, time benefits, and patient preference. Qual Life Res.
2001;10(1):15–22.
81. Radosevich DM, Werni TL. A practical guide for implementing, analyzing, and
reporting outcomes measurements. Health Outcomes Institute, 1998.
82. Rind DM, Kohane IS, Szolovits P, et al. Maintaining the confidentiality of medical
records shared over the Internet and the World Wide Web. Ann Intern Med.
1997;127(2):138–141.
83. Soetikno R, Young HS, Keefe EB. Role of emerging technology in the era of cost
containment. Am J Gastroenterol. 1997;92:1038–1040.
84. Subramanian AK, McAfee AT, Getzinger JP. Use of the World Wide Web for multisite
data collection. Acad Emerg Med. 1997;4(8):811–817.
85. Barak A. Psychological applications on the Internet: a discipline on the threshold of a
new millennium. Applied and Preventive Psychology. 1999;8(4):231–245.
86. Blumenthal D, Glaser JP. Information technology comes to medicine. N Engl J Med.
2007;356(24):2527–2534.
87. Skinner HA, Allen BA. Does the computer make a difference? Computerized versus
face-to-face versus self-report assessment of alcohol, drug, and tobacco use. J Consult
Clin Psychol. 1983;51(2):267–275.
88. Greist JH, Gustafson DH, Stauss FF, et al. A computer interview for suicide-risk
prediction. Am J Psychiatry. 1973;130(12):1327–1332.
89. Kobak KA, Reynolds WM, Greist JH. Computerized and clinician assessment of
depression and anxiety: respondent evaluation and satisfaction. J Pers Assess.
1994;63(1):173–180.
9
SCREENING FOR DEPRESSION IN PRIMARY
CARE: CAN IT BECOME MORE EFFICIENT?

Kathryn M. Magruder and Derik E. Yeager

1. Introduction
2. Epidemiology of Depression in Primary Care
3. Is Screening for Depression in Primary Care Worthwhile?
4. Which Screening Tool Should Be Used?
5. Implementing Screening in Primary Care
6. What Developments Are on the Horizon?
7. Conclusions

Context
Screening for depression has been so widely advocated that the burden of proof
has shifted to skeptics who argue against it. Yet only recently has sufficient
evidence accrued to judge dispassionately the advantages and disadvantages
of screening. Here we discuss the evidence for specific tools and specific
strategies in improving the outcome of depression screening in primary care.

1. Introduction
In 1978, the Institute of Medicine defined primary care as “care that is
accessible, comprehensive, coordinated, continuous, and accountable.”1
While the definition has evolved over time,2 these fundamental characteristics
are still valid today. The primary care mission includes serving as the first
line for detection and either treatment or referral of common mental disorders,
including depression. The inclusion of first-line mental health services as a
component of primary care distinguishes primary care (including outpatient
clinics in managed care organizations, community hospitals, Veterans
Administration hospitals, teaching institutions, and other medical centers)
from care in more specialized clinical settings. The comprehensiveness of
primary care and the obligation of its providers for first-line care make it a
logical and appropriate venue for mental health screening.3
Complicating the issue, however, are the time constraints on primary care
providers. Although the amount of time spent per patient visit is about 20
minutes in the United States,4 the recommended services that should be
provided in that short period of time are daunting. It is therefore imperative
that these recommended services—in particular preventive health services—
be provided in the most efficient manner possible. Services that cannot be
provided efficiently, or that do not fit within the busy, fast-paced world of
primary care, are at risk of being omitted. This is especially true for preventive mental health
services. Screening for depression is such a service; therefore, it is critical that
primary care providers make use of the best and most efficient depression
screening approaches possible.
In this chapter, we will address issues related to screening for depression in
the primary care context. We will start by briefly reviewing the epidemiology
of depression as related to primary care. Next, we will critically examine how
the World Health Organization’s screening criteria apply to depression. Then we
will review published screening tools and their attributes for use in primary
care settings. Last, we will discuss future directions, including additional ways
that screening for depression in primary care can be made more efficient and
thus more effective and more widely implemented.

2. Epidemiology of Depression in Primary Care


Population Prevalence of Depression
The National Comorbidity Survey Replication (NCS-R), conducted on
adults over 18 years old, found a 12-month prevalence of 9.5% for any
DSM-IV mood disorder, with 6.7% for major depression and 1.5% for
dysthymia.5 From this survey, 19.5% of major depression cases in the
community are classified as mild, with 50.1% and 30.4% classified as
moderate and serious, respectively.5 Thus, about 80% of those with major
depressive disorder have symptoms that are moderate to serious, and it is
likely that those who seek health services fall toward the more severe end of
the spectrum. In a European epidemiologic study of mental disorders involving
six countries, major depression was the single most common disorder
assessed, with a 12-month prevalence of 3.9%.6 Wittchen and Jacobi7
conducted a meta-analysis of 27 studies with data on the prevalence of
mental disorders in European countries. The 12-month prevalence of major
depression ranged between 3.1% and 10.1%, with a median prevalence of
6.9%. Depression is thus among the most prevalent of mental disorders and
constitutes a worldwide problem affecting approximately 5% to 10% of
adults in a given year.
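
The severity split quoted from the NCS-R can be checked with simple arithmetic. The sketch below is illustrative only: the variable names are our own, and the percentages are those reported in the text.

```python
# Severity distribution of 12-month major depression cases in the NCS-R,
# as quoted in the text (percent of community cases).
severity_pct = {"mild": 19.5, "moderate": 50.1, "serious": 30.4}

# The three categories should account for all cases.
total = round(sum(severity_pct.values()), 1)

# Share of cases that are moderate or serious ("about 80%" in the text).
moderate_or_serious = round(
    severity_pct["moderate"] + severity_pct["serious"], 1
)

print(f"total = {total}%, moderate or serious = {moderate_or_serious}%")
```

The three categories sum to 100%, and the moderate and serious categories together account for just over 80% of cases, which is the basis for the "about 80%" statement.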

Primary Care Prevalence of Depression


An early compendium of studies showed that pre-DSM-III-R depression prevalence
in primary care ranged from 4.8% to 8.6%.8 More recently, one of the
most comprehensive assessments of mental disorders in primary care was
conducted by the World Health Organization and involved 15 cities in 14
countries.9 Using the Composite International Diagnostic Interview (CIDI) as
the diagnostic assessment tool for DSM-III-R and ICD-10 conditions, this
study showed that the prevalence of current psychiatric disorders is 24% but
varies substantially by country.9 In particular, prevalence estimates for major
depression ranged from 2.6% in Nagasaki, Japan, to an exceptionally high
29.5% in Santiago de Chile (more than 12 percentage points above the next
highest, 16.9% in Manchester, England). The total prevalence of ICD-10 major depression was
10.4%. Although it is acknowledged that there is considerable variability
within a city or country based on the characteristics of a primary care clinic
(eg, inner-city clinics that serve disadvantaged patients may have higher
depression prevalence), and thus the findings of this study do not generalize
as national primary care prevalences, this important international study has
helped to solidify the importance of depression in primary care settings
throughout the world.
A number of studies have found significant prevalence and morbidity of
subthreshold disorders. For example, in a study of 619 primary care patients,
Backenstrass and associates10 found a prevalence of 4.6% for major depression,
6.2% for minor depression, and 9.1% for nonspecific depression symptoms.
Levels of disability followed a similar pattern, with highest levels for
major depression and lowest levels for nonspecific depression symptoms.10
Thus, these “sub-major” forms of depression are not without associated
morbidity.

Primary Care Is the “De Facto” Mental Health Treatment System
Primary care has been termed the de facto mental health treatment system since
as many people with mental disorders receive treatment in general medical
settings as in mental health specialty settings.11,12 From Epidemiologic
Catchment Area (ECA) data, it has been estimated that only 45% of those with
unipolar major depression used any health service in the 12 months prior;
27.8% sought care in the specialty mental health sector, while 25.3% sought
care in the general medical sector.11 Paralleling ECA findings, NCS-R data
have shown that 51.6% of those who met the criteria for major depression
received some health services for depression in the past 12 months, with 27.2%
in the general medical sector.13 This paper also examined symptom severity
with respect to treatment and found that only 12.8% of those in treatment in the
general medical sector were classified as mild cases—all others were moderate
and above.
It has been estimated that 50% to 80% of depression management occurs in
primary care. Harman and colleagues14 found that for older adults 64% of
depression visits occurred in primary care, representing only 3% of all elder
primary care visits, contrasted with 26% of depression visits occurring in
psychiatric care, representing 58% of all psychiatric elder visits. Thus, the
index of suspicion is likely to be low in primary care settings where the
prevalence is also low.
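
The contrast in the figures from Harman and colleagues can be made concrete with a short calculation. The sketch below is a rough illustration, not part of the original analysis: it assumes that both sets of percentages refer to the same pool of elder depression visits over the same period, and the function and variable names are our own.

```python
# Figures quoted from Harman and colleagues (older adults).
# Share of all depression visits that take place in each setting.
share_of_depression_visits = {"primary_care": 0.64, "psychiatry": 0.26}
# Share of each setting's own visits that are depression visits.
depression_share_within_setting = {"primary_care": 0.03, "psychiatry": 0.58}


def visits_per_depression_visit(setting):
    """Total visits in a setting per single depression visit in any setting."""
    return (
        share_of_depression_visits[setting]
        / depression_share_within_setting[setting]
    )


primary_care_volume = visits_per_depression_visit("primary_care")
psychiatry_volume = visits_per_depression_visit("psychiatry")

# Implied ratio of elder primary care visits to elder psychiatric visits.
volume_ratio = primary_care_volume / psychiatry_volume

print(f"primary care : psychiatry visit volume is about {volume_ratio:.0f} : 1")
```

Under these assumptions, elder primary care visits outnumber elder psychiatric visits by roughly 48 to 1. Primary care therefore handles most depression visits in absolute terms, yet depression accounts for only a small fraction of what any one primary care clinician sees.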
An analysis of National Ambulatory Medical Care Survey data showed that
for the average primary care doctor, 10.33 visits per week were considered
antidepressant medication visits, compared with 11.04 such visits for the
average psychiatrist.15 While antidepressant medication visits are slightly
higher for psychiatrists than for primary care physicians, it is likely that
primary care physicians initiate more antidepressant prescriptions but fewer
monitoring visits, while psychiatrists have fewer antidepressant-initiating
visits but more monitoring visits.

Unassisted Recognition of Depression in Primary Care


Ironically, while general medical settings are a primary venue for treating
mental disorders, a very large percentage of such disorders go unrecognized by
primary care providers and therefore untreated. Some reports suggest that
fewer than 50% of those with depression are so diagnosed in primary care
settings.16–18
The WHO primary care study found that overall, 54.2% of those who met
criteria for depression (ICD F32/33) were recognized as having a psychological
illness by their treating physician. This ranged from a low of 19.3% in
Nagasaki to a high of 74.0% in Santiago de Chile.19
Thus, studies show that depression is relatively common in primary care
settings, but many with depression go unrecognized. It is no wonder that a
number of screening tools have been developed to assist providers in recognizing
and diagnosing depression. Yet there are other issues to consider before
initiating screening programs.

3. Is Screening for Depression in Primary Care Worthwhile?


Screening is an important aspect of prevention and early intervention for many
diseases and conditions, and this includes depression. WHO describes 10
criteria for initiating a screening program. Below, we discuss each criterion
along with issues that should be considered for clinically effective depression
screening. Because our focus is on primary care, we consider these criteria in
that context.

The Condition Should Be an Important Health Problem


With a depression prevalence of approximately 5% to 10% worldwide and 5%
to 20% in primary care settings, depression is considered an important health
problem. In addition to personal suffering, those with depression have significantly
worse functioning. Based on the landmark publication on worldwide
disability,20 Ustun and associates21 have updated earlier data and estimate that
depression was the fourth leading cause of global disease burden in the year
2000. The burden of depression on the healthcare system is equally significant.
The average medical costs (6-month period) for primary care patients in the
United States diagnosed with depression or anxiety were approximately twice
the average costs for patients with subthreshold depression or anxiety or no
disorder ($2,390 vs. $1,248),22 resulting in national annual medical costs of
approximately $26 billion (1990 dollars).23 For the most part, this burden is on
primary care in terms of recognition and treatment,24 including antidepressant
prescribing.25,26 On another level, the societal burden of depression is great,
and patients need not receive a clinical diagnosis of depression to experience
impaired functioning,27 missed workdays (at an annual national cost of $17
billion),23 and disability days,28 with impairment equal to or greater than that
found with other chronic conditions such as diabetes, arthritis, gastrointestinal
disturbances, lung disturbances, bronchitis, emphysema, and back problems.29
Thus, there is no doubt that at all levels depression is an important health and
public health problem.

There Should Be a Treatment for the Condition


A number of effective treatments exist for depression, including cognitive-
behavioral therapy and medications. In fact, the robust research basis for these
treatments has prompted a proliferation of treatment guidelines that provide
practical approaches for implementing these evidence-based practices for
primary care providers (see, for example, the Agency for Healthcare
Research and Quality website with depression guidelines).30

Facilities for Diagnosis and Treatment Should Be Available


Although this tends to be setting-specific, more and more primary care practitioners
are recognizing their roles as first-line responders for depression
diagnosis and treatment. Additionally, many primary care practices incorporate
mental health care specialists in their practice (eg, psychiatric nurse
specialist), are aligned with mental health specialists (ie, have a ready referral
source), or are part of larger healthcare organizations that incorporate mental
health services (eg, HMOs, U.S. Veterans Health Administration). Thus, when
there is a positive screen and a diagnosis of depression is made, treatment is
typically available within the practice or within a referral network.

There Should Be a Latent Stage of the Disease


Although the diagnosis of depression depends on the presence of symptoms,
the disorder can be considered to have a latent stage in the following sense.
Depression is often not detected clinically, patients do not spontaneously
report symptoms to providers, and patients themselves may not be aware that
their symptoms constitute depression. From NCS-R data, it has been estimated
that there is a delay of approximately 8 years between the onset of depression
and first receipt of professional help.31 Additionally, longstanding depression
is associated with disability as well as psychiatric and medical comorbidities,
which early detection and intervention may prevent.

There Should Be a Test or Examination for the Condition


As is detailed in the next section, a number of adequate depression screening tools
exist, including standard screeners (eg, the Zung Self-Rating Depression Scale
[SDS]),32 short screens (eg, Medical Outcomes Study Depression Screen [MOS-
D]),33 and some ultra-brief screens (eg, Patient Health Questionnaire [PHQ]-2).34
In addition, there are diagnostic interviews suitable for use in primary care, such
as the depression module of the Mini International Neuropsychiatric Interview
(M.I.N.I.),35 the Primary Care Evaluation of Mental Disorders (PRIME-MD),36
and the Symptom-Driven Diagnostic System for Primary Care (SDDS-PC).37

The Test Should Be Acceptable to the Population


Screens for depression are generally acceptable to both participants and the
staff who administer them.38,39 Diagnostic tools are lengthier and may be more
difficult for some patients; however, they are considered acceptable in terms of
risk and time. Certainly, relative to other recommended primary care screenings
(eg, colonoscopy), screening for depression is noninvasive, brief, and well
tolerated by patients, and results are relatively easy to interpret. In contrast to
some screenings such as colonoscopy and mammography, which require only a
referral from the primary care provider, depression screening typically requires
more clinician (nurse or physician) time to administer, interpret, and assess,
and (if positive) to treat or refer. Thus, the screening burden to clinicians is
significantly greater than to patients, and may well influence acceptability in
clinical practice (Fig. 9.1).

[Figure 9.1. Screening burden by task. Flow diagram pairing each screening task with those who bear its burden: Screen (patient, PC staff); Score (PC staff); Review results (PCP); after a negative or positive result, second-stage screen or diagnostic work-up (patient, PC staff, PCP); then psychoeducation, watchful waiting, referral, or treatment (patient, PC staff, PCP).]
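
The distribution of burden in Figure 9.1 can be tallied in a few lines. The sketch below is illustrative only: the task list is our own reading of the figure, the grouping of later steps into single tasks is an assumption, and all names are hypothetical.

```python
from collections import Counter

# Each screening task paired with the parties who bear its burden, as read
# from Figure 9.1 (PC staff = primary care staff, PCP = primary care provider).
# Assumption: the later branches of the figure are grouped into single tasks.
TASK_BURDEN = [
    ("screen", ["patient", "pc_staff"]),
    ("score", ["pc_staff"]),
    ("review results", ["pcp"]),
    ("second-stage screen or diagnostic work-up",
     ["patient", "pc_staff", "pcp"]),
    ("education, watchful waiting, referral, or treatment",
     ["patient", "pc_staff", "pcp"]),
]

CLINICAL_ROLES = {"pc_staff", "pcp"}


def burden_counts(tasks):
    """Count how many tasks involve each party."""
    counts = Counter()
    for _task, parties in tasks:
        counts.update(parties)
    return counts


counts = burden_counts(TASK_BURDEN)

# Tasks that involve at least one member of the clinical team.
clinician_tasks = sum(
    1 for _task, parties in TASK_BURDEN if CLINICAL_ROLES & set(parties)
)
patient_tasks = counts["patient"]

print(f"tasks involving clinicians: {clinician_tasks}, patients: {patient_tasks}")
```

On this reading, every task involves the clinical team while the patient takes part in only some of them, which is consistent with the point that the screening burden falls more heavily on clinicians than on patients.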

The Natural History of the Disease Should Be Adequately Understood
Depression is known as a disorder with exacerbations and remissions.
Persistent depression is a risk factor for disability,40 both medical and psychiatric
comorbidities,5 and suicide.41 There is evidence that early recognition and
effective treatment of depression can alter the trajectory by reducing disability
and premature mortality,42 promoting remission, and preventing relapse.43
There is also evidence suggesting that early recognition and effective treatment
of depression can improve patient outcomes such as social functioning and
productivity44 and can reduce absenteeism.45
“Sub-major” depression is often considered to be an integral part of the
natural course of major depression and is sometimes referred to as the prodromal
phase.46 Research has demonstrated that both subthreshold and subsyndromal
depression are associated with increased functional disability47 and
have a negative impact on quality of life.48 Data from a randomized trial
of older adults (PROSPECT) show that patients initially presenting with
sub-major depression were five times more likely to have major depression
after 1 year.47 Thus, identification of these patients may help broaden the focus
of depression treatment to include a more preventive approach,49 allowing
patients to benefit from improved functional and quality-of-life outcomes and
receive more aggressive assessment and symptom monitoring to hasten recognition
of major depressive disorder.
Patients presenting with sub-major depression may, in fact, benefit from
treatment. Seligman and colleagues50 followed “at-risk” university students
and found that those randomized to receive weekly cognitive-behavioral therapy
workshop meetings had significantly fewer depressive symptoms after 8 weeks.

There Should Be an Agreed Policy on Whom to Treat


This may vary from site to site, with some advocating treatment for minor
depression and adjustment disorders with depressed mood. All clinical practice
guidelines advocate treating patients who meet the criteria for a diagnosis of
major depression. Several groups have shown that patients whose depression is
not recognized have milder forms of the disorder with less disability.51 To
some extent, treating those with “sub-major” depression may be a resource
issue. Some have advocated low-cost, low-intensity, nontraditional treatments
(eg, bibliotherapy, web-based self-help) where therapeutic intensity and cost
are aligned with symptom severity.52 While there may be benefit to treating
these sub-major conditions, those policy decisions should not compromise
system capacity to provide treatment for other important conditions.

The Total Cost of Finding a Case Should Be Economically Balanced in Relation to Medical Expenditure as a Whole
Given the relatively short and inexpensive screening instruments, the availability
of structured diagnostic assessments for depression that can be administered in-
house for diagnostic follow-up, and the relatively moderate cost of treatment,
contrasted with the medical and psychiatric comorbid problems that are apt to
develop from lack of treatment, economics favor screening for depression. In a
cost-utility study, Valenstein and coworkers53 concluded that one-time
screening for depression is cost-effective, and more frequent screening is
likely to become more cost-effective with improvements in treatments.

Case-Finding Should Be a Continuous Process


Several studies have shown that depression can occur throughout the life-
span.5,54 Furthermore, it may have been present but not detected until many
years later. Thus, it makes sense to have in place a system that will screen
periodically throughout the lifespan.
9 SCREENING FOR DEPRESSION IN PRIMARY CARE 169

4. Which Screening Tool Should Be Used?


Primary care providers have a great deal to consider when selecting a screening
instrument, and there are many tools from which to choose, each with its own
set of attributes. Time is of obvious importance in primary care, and typically
the provider time to administer a screening tool and score it (rather than patient
time) is a key consideration. In the quest for brevity, screening tools have
evolved from standard screeners to short screeners to ultra-brief screeners.
Below, we consider a number of published screening tools organized by
administration time. In addition to time, we also consider scope of use,
administration/scoring, and performance.

Standard Screeners
In a recent article, Mitchell and Coyne55 defined a ‘‘standard’’ screening tool as
one that contains 15 or more items and takes, on average, more than 5 minutes
to complete. Many of these screeners can also be described as traditional: the
Beck Depression Inventory (BDI),56 Zung SDS,32 and Center for Epidemiologic
Studies Depression Scale (CES-D)57 date from the 1960s and 1970s. They have
been translated into dozens of languages and have been used in virtually every
health setting, including primary care and specialty clinics, and for research.
Table 9.4 provides details about the administration, scoring, and psychometric
performance of five ‘‘standard’’ depression screeners: the BDI,56 CES-D,57
Geriatric Depression Scale (GDS),58 Inventory for Depression (ID),59 and the
Zung SDS.32 The BDI,56 CES-D,57 and GDS58 are available in multiple,
typically shorter, versions. Some of these screeners offer situational advan-
tages over the others; for example, scoring results for the BDI and the Zung
SDS provide an estimate of symptom severity. The GDS was designed speci-
fically for use with geriatric patients. One must take these characteristics (and
others, such as self-administration and time frame of symptoms) into account
when selecting a screening tool. In general, all five of these screeners are well
suited for use in primary care settings; they are easy to administer, they are easy
to score, and they offer decent accuracy. Despite this, standard-length
screeners may seem cumbersome to some busy primary care providers who
prefer shorter alternatives.
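
The likelihood ratios reported in Table 9.4 follow directly from sensitivity and specificity. The sketch below is illustrative only: it assumes the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity, and the function name is hypothetical. It uses the CES-D sensitivity (93%) and specificity (69%) attributed to reference 64 in the table.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for sensitivity and specificity given as proportions."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# CES-D values attributed to reference 64 in Table 9.4.
lr_pos, lr_neg = likelihood_ratios(0.93, 0.69)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")  # LR+ = 3.0, LR- = 0.10
```

The result agrees with the LR+ of 3.0 and LR- of 0.10 listed for the CES-D in the table.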

Short Screeners
Short screeners, defined as consisting of 5 to 14 items and taking between 2 and
5 minutes to complete,55 include the Hospital Anxiety and Depression Scale
Table 9.4. Standard Depression Screening Instruments Commonly Used in Primary Care

BDI [60–64]
  Scope of Use:   Depression only*; severity of symptoms today; can be
                  self-administered
  Administration: 7, 13, or 21 items*; 2–5 min to complete; Literacy: Easy;
                  Scoring: Simple
  Scoring:        Score range: 0–63
                  Usual cut point: 10–19 (mild), 20–29 (moderate), ≥30 (severe)
  Performance:    Sensitivity: 97% [63]; 89% (81–95) [64]
                  Specificity: 99% [63]; 64% (59–68) [64]
                  Efficiency: 0.99 [63]
                  False positive: 0.01 [63]
                  False negative: 0.00 [63]
                  LR+: 4.2 (1.2; 13.6) [61]; 2.5 [64]
                  LR−: 0.17 (0.1; 0.3) [61]; 0.17 [64]
                  PPV: 84.0% [63]; 29.6% (10.7; 57.6) [62]
                  AUC (95% CI): 0.87 (0.82–0.91) [64]
  Reference:      Original citation: Beck AT, Ward CH, Mock J, et al. An
                  inventory for measuring depression. Arch Gen Psychiatry.
                  1961;4:561–571.
                  www.psychcorpcenter.com/content/bdi-ll.htm

CES-D [60–64]
  Scope of Use:   Depression only; frequency of symptoms in the past week;
                  can be self-administered
  Administration: 10 or 20 items; 2–5 min to complete; Literacy: Easy;
                  Scoring: Simple
  Scoring:        Score range: 0–60
                  Usual cut point: 16
  Performance:    Sensitivity: 81% [63]; 93% (85–97) [64]
                  Specificity: 72% [63]; 69% (65–74) [64]
                  Efficiency: 0.72 [63]
                  False positive: 0.27 [63]
                  False negative: 0.01 [63]
                  LR+: 3.3 (2.5; 4.4) [61]; 3.0 [64]
                  LR−: 0.24 (0.2; 0.3) [61]; 0.10 [64]
                  PPV: 13.0% [63]; 24.8% (20; 30.6) [62]
                  AUC (95% CI): 0.89 (0.85–0.92) [64]
  Reference:      Original citation: Radloff L. The CES-D scale: A self-report
                  depression scale for research in the general population.
                  Appl Psychol Meas. 1977;1:385–401.
                  www.mhhe.com/hper/health/personalhealth/labs/stress/activ2-2.html

GDS [60,62]
  Scope of Use:   Depression only; endorsement of symptoms (y/n) in the past
                  week
  Administration: 15 or 30 items; 2–5 min to complete; Literacy: Easy;
                  Scoring: Simple
  Scoring:        Score range: 0–30
                  Usual cut point: 11
  Performance:    LR+: 3.3 (2.4; 4.7) [62]
                  LR−: 0.16 (0.1; 0.3) [62]
                  PPV: 24.8% (19.4; 32) [62]
  Reference:      Original citation: Yesavage JA, Brink TL, Rose TL, et al.
                  Development and validation of a geriatric depression
                  screening scale: a preliminary report. J Psychiatr Res.
                  1982–83;17(1):37–49.
                  www.stanford.edu/~yesavage/GDS.html

ID [60,61]
  Scope of Use:   Depression only; recently
  Administration: 15 items; 2–5 min to complete; Literacy: Easy
  Scoring:        Score range: 0–15
                  Usual cut point: 10
  Performance:    (none reported)
  Reference:      Original citation: Popoff LM. A simple method for diagnosis
                  of depression by the family physician. Clinical Medicine.
                  1969 March:24–29.

SDS [60–63]
  Scope of Use:   Depression only; frequency of symptoms recently; can be
                  self-administered
  Administration: 20 items; 2–5 min to complete; Literacy: Easy;
                  Scoring: Simple
  Scoring:        Score range: 25–100
                  Usual cut point: 50–59 (mild), 60–69 (moderate), ≥70 (severe)
  Performance:    Sensitivity: 100% [63]
                  Specificity: 71% [63]
                  Efficiency: 72% [63]
                  False positive: 0.28 [63]
                  False negative: 0.00 [63]
                  LR+: 3.3 (1.3; 8.1) [62]
                  LR−: 0.35 (0.2; 0.8) [62]
                  PPV: 15.0% [63]; 24.8% (11.5; 44.8) [62]
  Reference:      Original citation: Zung WW. A self-rating depression scale.
                  Arch Gen Psychiatry. 1965;12:63–70.
                  fpinfo.medicine.uiowa.edu/calculat.htm

AUC, area under the curve; CI, confidence interval; LR, likelihood ratio; PPV, positive predictive value.
Adapted from General Hospital Psychiatry, 24/4, Williams JW, Pignone M, Ramirez G, Perez Stellato C, Identifying depression in primary care: a literature synthesis of
case-finding instruments, 225–237, Copyright (2002), with permission from Elsevier.
172 SCREENING FOR DEPRESSION IN CLINICAL PRACTICE

(HADS),65 MOS-D,33 and PHQ34 (Table 9.5). Many authors consider the
diagnostic performance of these intermediate-length screeners to range from
modest to good.55,64,66 Despite the advantage of both diagnostic performance
and brevity, a national U.K. survey demonstrated that they continue to be
underused in primary and secondary care settings.67 This lack of use may
have led to the development of even shorter screeners.

Ultra-short/Ultra-brief Screeners
What is the minimum number of items required to effectively screen for
depression? With the quest to reduce screening time, several new screening
instruments with four or fewer questions have been published. Mitchell and
Coyne have defined ultra-short/ultra-brief screeners as consisting of four or
fewer items and taking less than 2 minutes to complete (Table 9.6).55 Whooley
and colleagues64 reported data supporting a two-item screener, and the U.S.
Veterans Administration has adopted a four-item screener to satisfy a 1998
universal depression screening mandate. A meta-analysis on 22 studies that
assessed the accuracy of ultra-short screeners for depression in primary care
found that diagnostic rule-in accuracy increases with the number of items, with
two- and three-item screeners offering the greatest accuracy (80%) and one-
item screeners providing very poor accuracy (30%).55 No four-item screeners
met inclusion criteria for this analysis. The authors concluded that while two-
and three-item screeners can help providers identify 8 out of 10 depression
cases, it is most often at the expense of a high false-positive rate. They there-
fore argue for a two-stage screening approach when an ultra-brief screener is
employed.

Two-Stage Approaches
Another approach that may offer advantages in some situations or practices is
the use of a two-stage process. Screening followed by a standardized diagnostic
assessment has often been used in research projects for efficient identification
of potential subjects who meet criteria for major depression. The approach
enables investigators to avoid conducting diagnostic assessments on all sub-
jects, yet has the advantage of having screening information available on all
subjects, with diagnostic data on those above a certain screening threshold.
While in theory any screener could be combined with any acceptable
diagnostic assessment, two instruments that ‘‘package’’ both screening and
diagnosis, the SDDS-PC and PRIME-MD, were developed in the late 1990s
specifically for use in primary care settings.36, 37, 68, 69 These instruments were
intended for both clinical and research purposes. They were both designed to
Table 9.5. Short Depression Screening Instruments Commonly Used in Primary Care

HADS [60,62]
  Scope of Use:   Anxiety and depression; severity of symptoms in the past
                  week
  Administration: 14 items; ≤2 min to complete; Literacy: Difficult;
                  Scoring: Simple
  Scoring:        Score range: 0–21
                  Usual cut point: 11
  Performance:    LR+: 7.0 (2.9; 11.2) [62]
                  LR−: 0.3 (0.3; 0.4) [62]
                  PPV: 41.3% (22.6; 52.8) [62]
  Reference:      Original citation: Zigmond AS, Snaith RP. The Hospital
                  Anxiety and Depression Scale. Acta Psychiatr Scand.
                  1983;67:361–370.
                  www.clinical-supervision.com/hads.htm

MOS-D [60,61,64]
  Scope of Use:   Depression only; frequency of symptoms in the past week;
                  can be self-administered
  Administration: 8 items; <2 min to complete; Literacy: Average
  Scoring:        Score range: 0–1 (logistic regression)
                  Usual cut point: 0.06
  Performance:    Sensitivity: 93% (86–97) [64]
                  Specificity: 72% (68–76) [64]
                  LR+: 3.3 [64]
                  LR−: 0.10 [64]
                  AUC (95% CI): 0.89 (0.85–0.91) [64]
  Reference:      Original citation: Burnam MA, Wells KB, Leake B, et al.
                  Development of a brief screening instrument for detecting
                  depressive disorders. Medical Care. 1988;26:775–789.

PHQ [60,62]
  Scope of Use:   Depression only; frequency of symptoms in the past 2 weeks;
                  can be self-administered
  Administration: 9 items; <2 min to complete; Literacy: Average;
                  Scoring: Simple
  Scoring:        Diagnosis: Score range: 0–9; usual cut point: 5 symptoms
                  Severity: Score range: 0–27; usual cut point: 0–4 (none),
                  5–9 (mild), 10–14 (moderate), 15–19 (major), ≥20 (severe)
  Performance:    LR+: 12.2 (8.4; 18) [62]
                  LR−: 0.28 (0.2; 0.5) [62]
                  PPV: 55% (45.7; 64.3) [62]
  Reference:      Original citation: Kroenke K, Spitzer RL, Williams JBW. The
                  PHQ-9: Validity of a brief depression severity measure.
                  J Gen Intern Med. 2001;16:606–613.
                  www.depression
                  primarycare.org/ap1.html

LR, likelihood ratio; PPV, positive predictive value.
Adapted from General Hospital Psychiatry, 24/4, Williams JW, Pignone M, Ramirez G, Perez Stellato C, Identifying depression in primary care: a literature synthesis of
case-finding instruments, 225–237, Copyright (2002), with permission from Elsevier.
Table 9.6. Ultra-Short Depression Screening Instruments Commonly Used in Primary Care

PRIME-MD (PHQ-2) [60–63]
  Scope of Use:   Multiple components with depression category; presence of
                  symptoms in the past month; can be self-administered
  Administration: 2 items; 1–2 min to complete; Literacy: Average;
                  Scoring: Complex
  Scoring:        Score range: 0–2 [60]
                  Usual cut point: 1 [60]
  Performance:    Sensitivity: 96% [63]
                  Specificity: 57%
                  Efficiency: 0.59 [63]
                  FP: 0.41 [63]
                  FN: 0.00 [63]
                  LR+: 2.7 (2.0; 3.7) [62]
                  LR−: 0.14 (0.1; 0.3) [62]
                  PPV: 21.3% (16.7–27) [62]
  Reference:      Original citation: Spitzer RL, Williams JB, Kroenke K, et al.
                  Utility of a new procedure for diagnosing mental disorders
                  in primary care. The PRIME-MD 1000 study. JAMA.
                  1994;272:1749–1756.
                  Kroenke K, Spitzer RL, Williams JBW. The Patient Health
                  Questionnaire-2: Validity of a two-item depression screener.
                  Medical Care. 2003;41:1284–1292.

SDDS-PC [60–62,64]
  Scope of Use:   Multiple components with depression category; presence of
                  symptoms in the past month
  Administration: 5 items; 1–2 min to complete; Literacy: Easy;
                  Scoring: Complex
  Scoring:        Score range: 0–5 [60]
                  Usual cut point: 2 [60]
  Performance:    Sensitivity: 96% [64]
                  Specificity: 51% [64]
                  LR+: 3.5 (2.4; 5.1) [62]
                  LR−: 0.2 (0.1; 0.4) [62]
                  PPV: 25.9% (19.4; 33.8) [62]
                  AUC (95% CI): 0.86 (0.82–0.89) [64]
  Reference:      Original citation: Broadhead WE, Leon AC, et al. Development
                  and validation of the SDDS-PC screen for multiple mental
                  disorders in primary care. Arch Fam Med. 1995;4:211–219.

AUC, area under the curve; CI, confidence interval; FN, false negative; FP, false positive; LR, likelihood ratio; PPV, positive predictive value.
Adapted from General Hospital Psychiatry, 24/4, Williams JW, Pignone M, Ramirez G, Perez Stellato C, Identifying depression in primary care: a literature synthesis of
case-finding instruments, 225–237, Copyright (2002), with permission from Elsevier.

take minimum clinician time and still provide multiple psychiatric diagnoses in
primary care. Both instruments have a quick screen (sometimes referred to as
stem questions) for multiple psychiatric disorders, followed by specific dis-
order modules when so indicated by the quick screen. Both instruments include
major depression. Time burden is placed mainly on patients for the quick
screen and clinicians for the disorder modules (but only for the subset of
patients with a high likelihood of disorder). For practices interested in only a
single disorder, the screen questions and module for that disorder can be
selected for use. Notably, the developers of the PRIME-MD developed the
PHQ (with slightly improved sensitivity and specificity for major depression)
because the PRIME-MD was still considered too long to be clinically useful.36

Screening for General Emotional Distress


One fundamental issue is whether screening should be aimed at identifying
distress rather than depression alone. There are several popular tools that
screen for nonspecific psychiatric distress, including the General Health
Questionnaire (GHQ),70 the Hopkins Symptom Checklist (HSCL),71 the
World Health Organization Well-Being Scale (WHO-5),72 and the Emotional
State Questionnaire-2 (EST-Q2)73 (Table 9.7). A prospective cohort study
found that the WHO-5, a well-being screener, performed better in a primary
care setting than the GHQ-12, PHQ-9, or an unaided physician diagnosis when
compared to the CIDI as the gold standard for detection of depression.74
Despite the broadness of this approach, brevity can be achieved by taking
advantage of shared symptomatology and diagnostic comorbidity. Thus, the
specificity of the screener for a disorder may not matter so much, and it will be
up to the provider to sort out, for example, major depression from post-
traumatic stress or other anxiety disorders. Because first-line primary care
treatments for many disorders are similar (eg, pharmacotherapy with selective
serotonin reuptake inhibitors), this approach could work reasonably well in
primary care.

Screening for Multiple Disorders


For many providers, it may be worthwhile to implement a screener that covers
many disorders, including only one or two items for each disorder. Means-
Christensen and coworkers75 tested such an approach with the Anxiety and
Depression Detector (ADD) and found that screening for panic disorder, post-
traumatic stress disorder, social phobia, generalized anxiety disorder, and
major depression simultaneously offered advantages in time efficiency while
maintaining screener performance. The SDDS-PC and PRIME-MD
Table 9.7. General Psychiatric Screening Instruments Commonly Used in Primary Care

WHO-5
  Scope of Use:   Measures degree of well-being
  Administration: 5 items
  Scoring:        (not reported)
  Performance:    Sensitivity: 94%
                  Specificity: 65%
                  False negative: 0.06
                  PPV: 0.37
                  NPV: 0.98
                  LR+: 2.69
                  LR−: 0.09
  Reference:      Original citations: Bech P, Gudex C, Johansen KS. The
                  WHO(Ten) Well-Being Index: validation in diabetes.
                  Psychother Psychosom. 1996;65:183–190.
                  Bech P, Olsen LR, Kjoller M, et al. Measuring well-being
                  rather than the absence of distress symptoms: a comparison
                  of the SF-36 Mental Health subscale and the WHO-Five
                  Well-Being Scale. Int J Methods Psychiatr Res.
                  2003;12:85–91.

GHQ [60,61,63]
  Scope of Use:   General psychiatric distress; frequency of symptoms in the
                  past week; can be self-administered
  Administration: 12, 28, or 30 items; 2–10 min to complete; Literacy: Easy
  Scoring:        Score range: 0–28
                  Usual cut point: 4
  Performance:    Sensitivity: 76% [63]
                  Specificity: 74% [63]
                  Efficiency: 0.74 [63]
                  False positive: 0.25 [63]
                  False negative: 0.01 [63]
                  PPV: 13.0% [63]
  Reference:      Original citation: Goldberg DP. The detection of psychiatric
                  illness by questionnaire. London, Oxford University Press,
                  1972.

HSCL [60,61]
  Scope of Use:   General distress; frequency of symptoms in the past week
  Administration: 13 or 25 items; 2–5 min to complete; Literacy: Average
  Scoring:        Score range: 25–100
                  Usual cut point: 43
  Performance:    (none reported)
  Reference:      Original citation: Derogatis LR, Lipman RS, Rickels K,
                  Uhlenhuth EH, Covi L. The Hopkins Symptom Checklist (HSCL):
                  a self-report symptom inventory. Behav Sci.
                  1974 Jan;19(1):1–15.

EST-Q2 [76]
  Scope of Use:   Detection of symptoms characteristic of depressive and
                  anxiety disorders during the past four weeks
  Administration: 28 items; depression subscale: 8 items; time to complete:
                  unknown; Literacy: unknown
  Scoring:        Score range: 0–112
                  Depression subscale: Score range: 0–32; usual cut point: >11
  Performance:    Sensitivity: 81% [76]
                  Specificity: 81% [76]
                  False positive: 0.19 [76]
                  False negative: 0.19 [76]
                  PPV: 0.44 [76]
                  NPV: 0.96 [76]
                  LR+: 4.3 [76]
                  LR−: 0.23 [76]
  Reference:      Aluoja A, Shlik J, Vasar V, Luuk K, Leinsalu M. Development
                  and psychometric properties of the Emotional State
                  Questionnaire, a self-report questionnaire for depression
                  and anxiety. Nord J Psychiatry. 1999;53:443–449.

LR, likelihood ratio; NPV, negative predictive value; PPV, positive predictive value.
Adapted from General Hospital Psychiatry, 24/4, Williams JW, Pignone M, Ramirez G, Perez Stellato C, Identifying depression in primary care: a literature synthesis of
case-finding instruments, 225–237, Copyright (2002), with permission from Elsevier.

(mentioned above) cover multiple disorders (including major depression) that
are prevalent and often undetected in primary care settings. They also cover
suicidality, an important consideration regardless of diagnosis.

Severity Rating
Depression screening instruments are important beyond case-finding.
Additional uses for certain instruments include monitoring symptom
levels (eg, frequency, severity) for ‘‘at-risk’’ patients or evaluating treat-
ment response/effectiveness. The types of screening instruments that
would be most valuable in these situations are those that provide severity
levels (eg, Zung SDS). The practice of ‘‘watchful waiting’’ (see Fig. 9.1)
involves following patients who present with symptomatology that may
be subthreshold or otherwise not sufficient for a clinical diagnosis of
depression, yet suggestive of an increased risk of developing depression
in the future. In this scenario, depression screeners can be administered
repeatedly over time to monitor symptom levels and determine symptom
changes and patterns (in much the same way that prostate-specific antigen
levels are monitored over time). Patients who have been clinically diag-
nosed with depression and are receiving treatment can be routinely admi-
nistered screeners both to assess treatment effectiveness and to determine
if additional interventions are required (the U.S. Preventive Services Task
Force recommends the PHQ-9 for this purpose).
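
As an illustration of using a severity-rated screener for monitoring, the sketch below maps PHQ-9 totals to the severity bands listed for the PHQ in Table 9.5. The function name and the sequence of visit scores are hypothetical, and the band labels simply follow the table; this is not a clinical decision rule.

```python
# Severity bands as listed for the PHQ in Table 9.5 (labels follow the table).
PHQ9_BANDS = (
    (0, 4, "none"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "major"),
    (20, 27, "severe"),
)


def phq9_severity(score):
    """Return the Table 9.5 severity label for a PHQ-9 total score (0-27)."""
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 severity scores range from 0 to 27")
    for low, high, label in PHQ9_BANDS:
        if low <= score <= high:
            return label


# Hypothetical scores from three repeated administrations for one patient.
visits = [17, 12, 8]
print([phq9_severity(s) for s in visits])  # ['major', 'moderate', 'mild']
```

Tracking the band across visits, rather than a single score, is one simple way to summarize treatment response over time.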

5. Implementing Screening in Primary Care


Implementation of a screening strategy must be undertaken with both the
screening instrument performance characteristics and clinical context in
mind. In addition to considering overall staffing patterns and underlying
nonpsychiatric case mix, a key contextual issue is the estimated underlying
prevalence of depression in the clinic population. This, along with the
screening instrument's sensitivity and specificity, allows one
to estimate resource use for various implementation strategies. Such exercises
can aid in determining the most parsimonious approach under various
scenarios.
Table 9.1a/b illustrates a one-stage screening approach using an instrument
with sensitivity and specificity both 80%. We present the results of using this
instrument under different prevalence scenarios: 5% and 10% (see Appendix
Tables 3 and 4 for additional scenarios). Assuming 5% prevalence, if 1,000
patients were screened for major depression, 230 would screen as positive, but
only 40 would actually have major depression (positive predictive value

Table 9.1a/b. Sample Performance Yields for Single-Stage Screening in Primary Care Setting

9.1a Prevalence: Low (5% or 50 MDD cases)

                              Gold Standard
            MDD +                  MDD −                  Total
Screen +    40 (True Positive)     190 (False Positive)   230 (Screen Positive)
Screen −    10 (False Negative)    760 (True Negative)    770 (Screen Negative)
Total       50 (MDD Positive)      950 (MDD Negative)     1000 (Total Sample)

PPV: 40/230 = 17.4%. For every 100 subjects who screen positive, only
approximately 17 would be depressed.
Excess diagnostic burden: 190/1000 = 19%. Diagnostic assessment would be
performed on 190 patients who were not depressed.

9.1b Prevalence: Average (10% or 100 MDD cases)

                              Gold Standard
            MDD +                  MDD −                  Total
Screen +    80 (True Positive)     180 (False Positive)   260 (Screen Positive)
Screen −    20 (False Negative)    720 (True Negative)    740 (Screen Negative)
Total       100 (MDD Positive)     900 (MDD Negative)     1000 (Total Sample)

PPV: 80/260 = 30.8%. For every 100 subjects who screen positive, only
approximately 31 would be depressed.
Excess diagnostic burden: 180/1000 = 18%. Diagnostic assessment would be
performed on 180 patients who were not depressed.

n = 1,000; screener sensitivity 80%, specificity 80%

[PPV] 17.4%). That means that 190 false-positive patients would undergo
diagnostic assessment for major depression—an excess diagnostic burden of
19% (190/1,000). From this chart, it can be seen that as prevalence increases,
PPV also increases and excess diagnostic burden declines.
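
The single-stage figures above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not a validated tool: the function and key names are hypothetical, percentages are passed as whole numbers, and counts are rounded half up with integer arithmetic so that they match the whole-patient counts in Table 9.1a/b.

```python
def screen_yield(n, prevalence_pct, sensitivity_pct, specificity_pct):
    """Expected 2x2 counts for one screening pass (whole-number percentages)."""
    def pct_of(count, pct):
        # Integer arithmetic with round-half-up avoids floating-point surprises.
        return (count * pct + 50) // 100

    cases = pct_of(n, prevalence_pct)
    non_cases = n - cases
    tp = pct_of(cases, sensitivity_pct)
    tn = pct_of(non_cases, specificity_pct)
    fn = cases - tp
    fp = non_cases - tn
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "ppv": tp / (tp + fp),        # share of screen positives who are cases
        "excess_burden": fp / n,      # false positives sent on for assessment
    }


for prevalence in (5, 10):
    y = screen_yield(1000, prevalence, 80, 80)
    print(prevalence, y["tp"], y["fp"], f"{y['ppv']:.1%}", f"{y['excess_burden']:.1%}")
# 5 40 190 17.4% 19.0%
# 10 80 180 30.8% 18.0%
```

Changing the prevalence argument shows the pattern described above: PPV rises and the excess diagnostic burden falls as prevalence increases.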
Table 9.2a/b illustrates a two-stage approach using an initial screener with
sensitivity of 95% and specificity of 60% and a follow-up screener of
sensitivity 80% and specificity 80%. Assuming prevalence of 5%, of the 1,000
patients screened in the first stage, 428 will be positive, of whom 48 are true
positives (PPV 11.2%). In stage two, these 428 are screened again, yielding 38
true positives of 114 screen positives for a more favorable PPV of 33.3%. The
cumulative yield from stages I and II combined would be 38 true positives (76%
sensitivity) and 874 true negatives (92% specificity), with a PPV of 33% and a
negative predictive value (NPV) of 99%.
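
The cumulative two-stage yield can be sketched the same way. This example is a hedged illustration under stated assumptions: the function names are hypothetical, percentages are whole numbers, counts are rounded half up, and the second screener is assumed to perform at its stated sensitivity and specificity among stage I positives (the same simplification the worked figures use).

```python
def pct_of(count, pct):
    """Whole-number percentage of a count, rounded half up (integer math)."""
    return (count * pct + 50) // 100


def two_stage_yield(n, prevalence_pct, stage1, stage2):
    """Cumulative counts after two sequential screens.

    stage1 and stage2 are (sensitivity_pct, specificity_pct) pairs.
    """
    cases = pct_of(n, prevalence_pct)
    non_cases = n - cases
    # Stage I is given to everyone.
    tp1 = pct_of(cases, stage1[0])
    fp1 = non_cases - pct_of(non_cases, stage1[1])
    # Stage II is given only to stage I positives.
    tp2 = pct_of(tp1, stage2[0])
    fp2 = fp1 - pct_of(fp1, stage2[1])
    fn = cases - tp2        # cases missed at either stage
    tn = non_cases - fp2    # non-cases cleared at either stage
    return {
        "tp": tp2, "fp": fp2, "fn": fn, "tn": tn,
        "sensitivity": tp2 / cases,
        "specificity": tn / non_cases,
        "ppv": tp2 / (tp2 + fp2),
        "npv": tn / (tn + fn),
    }


result = two_stage_yield(1000, 5, stage1=(95, 60), stage2=(80, 80))
print(result["tp"], result["fp"], result["fn"], result["tn"])  # 38 76 12 874
print({k: f"{result[k]:.1%}" for k in ("sensitivity", "specificity", "ppv", "npv")})
```

The 12 false negatives combine the 2 cases missed at stage I with the 10 missed at stage II, which is why cumulative sensitivity (76%) is lower than that of either screener alone.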

Table 9.2a/b. Sample Performance Yields for Two-Stage Screening in Primary Care
Setting

9.2a Depression Prevalence: Low (5% or 50 MDD cases)

Stage I
                              Gold Standard
            MDD +                  MDD −                  Total
Screen +    48 (True Positive)     380 (False Positive)   428 (Screen Positive)
Screen −    2 (False Negative)     570 (True Negative)    572 (Screen Negative)
Total       50 (MDD Positive)      950 (MDD Negative)     1000 (Total Sample)

PPV: 48/428 = 11.2%. For every 100 subjects who screen positive,
approximately 11 would be depressed.

Stage II
                              Gold Standard
            MDD +                  MDD −                  Total
Screen +    38 (True Positive)     76 (False Positive)    114 (Screen Positive)
Screen −    10 (False Negative)    304 (True Negative)    314 (Screen Negative)
Total       48 (MDD Positive)      380 (MDD Negative)     428 (Total Sample)

PPV: 38/114 = 33.3%. For every 100 subjects who screen positive,
approximately 33 would be depressed.
Overall excess diagnostic burden: 76/1,000 = 7.6%. Diagnostic assessment would
be performed on 76 patients who were not depressed.

9.2b Depression Prevalence: Average (10% or 100 MDD cases)

Stage I
                              Gold Standard
            MDD +                  MDD −                  Total
Screen +    95 (True Positive)     360 (False Positive)   455 (Screen Positive)
Screen −    5 (False Negative)     540 (True Negative)    545 (Screen Negative)
Total       100 (MDD Positive)     900 (MDD Negative)     1000 (Total Sample)

PPV: 95/455 = 20.9%. For every 100 screen positives, approximately 21 would be
depressed.

Stage II
                              Gold Standard
            MDD +                  MDD −                  Total
Screen +    76 (True Positive)     72 (False Positive)    148 (Screen Positive)
Screen −    19 (False Negative)    288 (True Negative)    307 (Screen Negative)
Total       95 (MDD Positive)      360 (MDD Negative)     455 (Total Sample)

PPV: 76/148 = 51.4%. For every 100 screen positives, approximately 51 would be
depressed.
Overall excess diagnostic burden: 72/1,000 = 7.2%. Diagnostic assessment would
be performed on 72 patients who were not depressed.

n = 1,000; stage I screener sensitivity 95%, specificity 60%; stage II screener
sensitivity 80%, specificity 80%

Table 9.3 assigns time costs to the various screening tasks, as well as
diagnostic assessment for screen-positive patients. In this table, we estimate
patient, staff, and clinician time under the various screening scenarios
(prevalence of 5%, 10%, 20%; one- and two-stage screening approaches).
We assume the same sensitivity and specificity of the screening instruments
as in Tables 9.1 and 9.2. In a sample of 1,000 patients where the prevalence of
depression is 5%, we estimate the burden in patient time for a single-stage
screener to be 6,600 minutes. This is based on an estimate of 2 minutes per
patient for the initial screen, with 20 additional minutes for each screen-
positive patient. We estimate 2,000 minutes of non-physician staff time
(based on 2 minutes per patient) and 4,600 minutes of clinician time
(based on 20 minutes per screen-positive patient). In the single-screener


Table 9.3. Screening and Diagnosis Time Burden for Patients, Staff, and Providers

Time Burden (min)                            MDD Prevalence
                              5%                  10%                 20%

A. Single-Stage Screening Approach
Sensitivity 80%, specificity 80%
Screening (patient)           2,000 (1000*2)      2,000 (1000*2)      2,000 (1000*2)
Scoring (staff)               2,000 (1000*2)      2,000 (1000*2)      2,000 (1000*2)
Screening yield               23.0% (230/1000)    26.0% (260/1000)    32.0% (320/1000)
Diagnostic interview
  Patient                     4,600 (230*20)      5,200 (260*20)      6,400 (320*20)
  Provider                    4,600 (230*20)      5,200 (260*20)      6,400 (320*20)
Positive predictive value     17.4% (40/230)      30.8% (80/260)      50% (160/320)
Total time
  Patient                     6,600 min           7,200 min           8,400 min
  Staff                       2,000 min           2,000 min           2,000 min
  Provider                    4,600 min           5,200 min           6,400 min

B1. Two-Stage Screening Approach: Stage I
Sensitivity 95%, specificity 60%
Screening (patient)           1,000 (1000*1)      1,000 (1000*1)      1,000 (1000*1)
Scoring (staff)               1,000 (1000*1)      1,000 (1000*1)      1,000 (1000*1)
Screening yield               42.8% (428/1000)    45.5% (455/1000)    51.0% (510/1000)

B2. Two-Stage Screening Approach: Stage II
Sensitivity 80%, specificity 80%
Screening (patient)           856 (428*2)         910 (455*2)         1,020 (510*2)
Scoring (staff)               856 (428*2)         910 (455*2)         1,020 (510*2)
Screening yield               26.6% (114/428)     32.5% (148/455)     42.4% (216/510)
Diagnostic interview
  Patient                     2,280 (114*20)      2,960 (148*20)      4,320 (216*20)
  Provider                    2,280 (114*20)      2,960 (148*20)      4,320 (216*20)
Positive predictive value     33.3% (38/114)      51.4% (76/148)      70.4% (152/216)
Total time
  Patient                     4,136 min           4,870 min           6,340 min
  Staff                       1,856 min           1,910 min           2,020 min
  Provider                    2,280 min           2,960 min           4,320 min

example, patient burden and provider burden increase with increasing
prevalence.
We provide similar time estimates for the two-stage screening approach.
Here, patient burden decreases (because the initial screener is half the time of
the more comprehensive second-stage screener), staff burden decreases
slightly for prevalences of 5% and 10% but increases slightly for 20% pre-
valence, and provider time decreases significantly (because there are fewer
false positives to evaluate). In the above examples, we have emphasized
tangible costs and have not estimated costs of non-detection (false negative)
or costs of treatment.
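
The time totals in Table 9.3 can be reproduced from the screen-positive counts and the per-task minutes. The sketch below is illustrative: the function name and argument layout are hypothetical, and it assumes, as the table does, that staff scoring takes the same time as patient completion at each stage and that the diagnostic interview takes 20 minutes of both patient and provider time.

```python
def time_burden(stage_counts, stage_minutes, interview_count, interview_minutes=20):
    """Total minutes for patients, staff, and providers.

    stage_counts[i] patients complete a screen taking stage_minutes[i] minutes;
    staff scoring is assumed to take the same time as completion.
    """
    screening = sum(c * m for c, m in zip(stage_counts, stage_minutes))
    interview = interview_count * interview_minutes
    return {
        "patient": screening + interview,
        "staff": screening,
        "provider": interview,
    }


# 5% prevalence, counts taken from Tables 9.1a and 9.2a.
single = time_burden([1000], [2], interview_count=230)
two_stage = time_burden([1000, 428], [1, 2], interview_count=114)
print(single)     # {'patient': 6600, 'staff': 2000, 'provider': 4600}
print(two_stage)  # {'patient': 4136, 'staff': 1856, 'provider': 2280}
```

Substituting the 20% prevalence counts (510 stage I positives, 216 stage II positives) gives 2,020 staff minutes, which is the slight increase over the single-stage approach noted above.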

6. What Developments Are on the Horizon?


Opinions concerning the appropriateness of screening for depression in
primary care have shifted over the past two decades. As more effective
treatments have become available to primary care providers, and as provi-
ders have become more knowledgeable about the importance of recognizing
and treating depression, there has been a shift towards advocating routine
screening in primary care settings. Many patients, however, are still
unwilling to accept a diagnosis of depression or treatment for depression.
Clinicians need to explain screening and diagnostic results in a way that is
non-stigmatizing. Providers must offer educational information and moti-
vate patients to accept treatment. Building depression treatment capabilities
may increase patient acceptance of both the diagnosis and treatment, as
treatment in primary care is seen as less stigmatizing, more timely, and more
integrated into overall healthcare.
Over the past two decades, remarkable progress has been made in screening
for depression in primary care. This can be seen in the change in U.S.
Preventive Services Task Force guidance,30 which recommends screening
adults ‘‘in clinical practices with systems in place to assure accurate diagnosis,
effective treatment, and follow-up.’’ It can also be seen in the myriad of
guidelines for detecting and treating depression in clinical practice, and the
tools that have been developed to assist in this. Clear advances have come in
reducing the burden of screening tools, so that some instruments with excellent
performance characteristics are as short as two questions. With increasing
acceptance of depression as a treatable illness by both patients and providers,
parallel gains need to be made in terms of implementation of screening and
early detection practices.
Further efficiencies in the screening benefit–cost ratio will need to be made
by improving treatment outcomes or by reducing screening time.
Psychometricians will be hard-pressed to develop briefer screening tools
than the current group of ultra-brief instruments described in Table 9.6, but
new methods might be able to focus on those who most need help. For
example, New Zealand researchers have found that following the two
PRIME-MD screening items with the question, ‘‘Is this something with
which you would like help?’’ improves specificity from 78% to 89% (sensi-
tivity remained the same at 96%) (CIDI was the gold standard).77 This would
theoretically improve efficiencies by selecting patients who are likely to accept
treatment for depression.
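
To see what the reported gain in specificity could mean in practice, the sketch below applies Bayes' rule to compute PPV with and without the added help question. The 10% prevalence is an assumption borrowed from the "average" scenario in Table 9.1b, not a figure from the New Zealand study, and the function name is hypothetical.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value from prevalence, sensitivity, and specificity."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


ASSUMED_PREVALENCE = 0.10  # assumption for illustration only

for label, specificity in (("two items", 0.78), ("two items + help question", 0.89)):
    print(f"{label}: PPV = {ppv(ASSUMED_PREVALENCE, 0.96, specificity):.1%}")
```

Under this assumed prevalence, PPV rises from roughly one in three screen positives to roughly one in two, with sensitivity unchanged.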
Another possibility is to reduce clinician and staff time by modifying the
screening modality. For example, waiting rooms could contain carrels with
computers where patients could update their histories and answer screening
questions. Notable results could be flagged and printed for clinicians to address
with the patient in the exam room. Similarly, patients could undertake similar
updates and self-screens on their home computers, again with results going
automatically and confidentially to providers. Automated computer reminders
for clinicians to perform depression (and other) screens could also improve
efficiency, as would making use of trained nurses to administer the screens and
flag positive results for the provider (see Chapter 8 for further discussion).
Depending on the practice, a two-stage screening process is another possibility,
using extremely brief first-level screens followed by more intense second-level
diagnostic assessments when indicated. Such second-level assessments could
even include self-administered instruments that are considered diagnostic in
nature, such as the PRIME-MD or the SDDS.
Considering depression screening in the context of other psychiatric illness
may broaden our notion of screening effectiveness. For example, depressive
symptoms frequently co-occur with generalized anxiety, post-traumatic stress,
and substance use disorders. False-positive screening results for depression
may be less worrisome in that such patients may be positive for any of these
three. Thus, a positive screen—though false positive for depression—may in
essence correctly identify patients in need of mental health treatment, even if
not for depression.
Timing is yet another way to improve efficiency. Screening less often (eg,
every 2 to 5 years instead of every year) would minimize the cost of the
screening itself but at the expense of a lower detection rate. Approaches could
be developed that take into account patient profiles to target screening to
those at highest risk. Similarly, prior screening results (eg, subthreshold
scores, positive screens for other mental health conditions, or answers to
highly predictive questions) could be used to generate a screening frequency
algorithm. In an age with electronic medical records and computer-generated
clinical reminders, the ability to develop and implement such frequency
algorithms based on individual profiles may not be as far away as it once
seemed.
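
A screening frequency algorithm of this kind might look like the following sketch. The rules are entirely hypothetical: the three flags mirror the examples given above, and the intervals reuse the 1-year and 2-to-5-year range from the text, but how flags map to intervals is our assumption and has not been validated.

```python
from dataclasses import dataclass


@dataclass
class ScreeningHistory:
    """Hypothetical per-patient inputs to a frequency rule."""
    subthreshold_score: bool             # prior screen just below the cutoff
    other_mental_health_positive: bool   # positive screen for another condition
    high_risk_profile: bool              # eg, risk factors recorded in the chart


def months_until_next_screen(history):
    """Return an illustrative rescreening interval in months."""
    risk_flags = sum([
        history.subthreshold_score,
        history.other_mental_health_positive,
        history.high_risk_profile,
    ])
    if risk_flags >= 2:
        return 12   # screen yearly
    if risk_flags == 1:
        return 24   # screen every 2 years
    return 60       # screen every 5 years


low_risk = ScreeningHistory(False, False, False)
higher_risk = ScreeningHistory(True, True, False)
print(months_until_next_screen(low_risk), months_until_next_screen(higher_risk))
```

In practice such a rule would be embedded in the clinical reminder system of an electronic medical record and tuned against local detection rates.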

7. Conclusions
Screening for depression in primary care has changed radically in the past 20
years. With improvements in depression treatment, reduced stigmatization,
better acceptance of depression as a treatable illness, and more efficient
screening tools, primary care providers have embraced the notion that they
are responsible for recognizing and treating this condition. Fortunately,
providers have many excellent screening tools from which to choose. For
additional efficiencies to be realized, advances in technology (eg, compu-
terized screening and scoring), along with improved treatment outcomes,
will need to take place to change the benefit–cost ratio for depression
screening even more favorably.

References
1. Institute of Medicine (IOM). A manpower policy for primary health care. Washington,
DC: National Academy of Sciences, 1978.
2. Starfield B. Primary care: concept, evaluation, and policy. New York: Oxford
University Press, 1992.
3. Culpepper L. The active management of depression. J Fam Pract. 2002;51:769–776.
4. Mechanic D, McAlpine DD, Rosenthal M. Are patients’ office visits with physicians
getting shorter? N Engl J Med. 2001;344(3):198–204.
5. Kessler RC, Chiu WT, Demler O, et al. Prevalence, severity, and comorbidity of
12-month DSM-IV disorders in the National Comorbidity Survey Replication. Arch
Gen Psychiatry. 2005;62(6):617–627.
6. Alonso J, Angermeyer MC, Bernert S, et al. 12-Month comorbidity patterns and
associated factors in Europe: results from the European Study of the Epidemiology of
Mental Disorders (ESEMeD) project. Acta Psychiatr Scand. 2004;109(s420):28–37.
7. Wittchen H-U, Jacobi F. Size and burden of mental disorders in Europe—a critical
review and appraisal of 27 studies. Eur Neuropsychopharmacol. 2005;15(4):357–376.
8. Depression Guideline Panel. Depression in primary care, vol 1. Detection and
diagnosis. Clinical Practice Guideline, No. 5. Rockville, MD: DHHS Pub Hlth Serv.
AHCPR Publication No. 93–0550, 1993.
9. Goldberg D, Lecrubier Y. Chapter 4.1. Form and Frequency of Mental Disorders across
Centres. In Mental illness in general health care: an international study. Chichester:
John Wiley and Sons, 1995.
10. Backenstrass M, Frank A, Joest K, et al. A comparative study of nonspecific depressive
symptoms and minor depression regarding functional impairment and associated
characteristics in primary care. Compr Psychiatry. 2006;47(1):35–41.
11. Regier D, Goldberg I, Taube C. The de facto US mental and addictive disorders service
system. Epidemiologic Catchment Area prospective 1-year prevalence rates of
disorders and services. Arch Gen Psychiatry. 1993;50:85–94.
12. Regier D, Narrow W, Rae D, et al. The de facto US mental health services system: a
public health perspective. Arch Gen Psychiatry. 1978;35(6):685–693.
13. Kessler RC, Berglund P, Demler O, et al. The epidemiology of major depressive
disorder: results from the National Comorbidity Survey Replication (NCS-R). JAMA.
2003;289(23):3095–3105.
14. Harman JS, Veazie PJ, Lyness JM. Primary care physician office visits for depression
by older Americans. J Gen Intern Med. 2006;21(9):926–930.
15. Pincus HA, Tanielian TL, Marcus SC, et al. Prescribing trends in psychotropic
medications: primary care, psychiatry, and other medical specialties. JAMA.
1998;279(7):526–531.
16. Bridges K, Goldberg D. Somatic presentation of DSM III psychiatric disorders in
primary care. J Psychosom Res. 1985;29:563–569.
17. Magruder-Habib K, Zung W, Feussner J. Improving physicians’ recognition and
treatment of depression in general medical care: results of randomized clinical trial.
Med Care. 1990;28(3):239–250.
18. Wilson D, Widmer R, Cadoret R, et al. Somatic symptoms: a major feature of
depression in a family practice. J Affective Disorders. 1983;5:299–307.
19. Ustun T, Von Korff M. Chapter 4.3. Primary mental health services: access and
provision of care. In Mental illness in general health care: an international study.
Chichester: John Wiley and Sons, 1995.
20. Murray C, Lopez A, eds. The global burden of disease: a comprehensive assessment of
mortality and disability from diseases, injuries and risk factors in 1990 and projected to
2020. Cambridge, MA: Harvard University Press on behalf of the World Health
Organization and the World Bank, 1996.
21. Ustun TB, Ayuso-Mateos JL, Chatterji S, et al. Global burden of depressive disorders in
the year 2000. Br J Psychiatry. 2004;184:386–392.
22. Simon G, Ormel J, VonKorff M, et al. Health care costs associated with depressive and
anxiety disorders in primary care. Am J Psychiatry. 1995;152(3):352–357.
23. Greenberg P, Stiglin L, Finkelstein S, et al. Depression: a neglected major illness. J Clin
Psychiatry. 1993;54(11):419–424.
24. Manderscheid RW, Rae DS, Narrow WE, et al. Congruence of service utilization
estimates from the Epidemiologic Catchment Area Project and other sources. Arch
Gen Psychiatry. 1993;50(2):108–114.
25. Beardsley RS, Gardocki GJ, Larson DB, et al. Prescribing of psychotropic medication by
primary care physicians and psychiatrists. Arch Gen Psychiatry. 1988;45(12):1117–1119.
26. Simon GE, VonKorff M, Wagner EH, et al. Patterns of antidepressant use in community
practice. Gen Hosp Psychiatry. 1993;15(6):399–408.
27. Wells KB, Stewart A, Hays RD, et al. The functioning and well-being of depressed
patients. Results from the Medical Outcomes Study. JAMA. 1989;262(7):914–919.
28. Broadhead WE, Blazer DG, George LK, et al. Depression, disability days, and days lost
from work in a prospective epidemiologic survey. JAMA. 1990;264(19):2524–2528.
29. Wells KB, Golding J, Burnam MA. Psychiatric disorder and limitations in physical
functioning in a general population. Am J Psychiatry. 1988;145:712–717.
30. Guide to clinical preventive services. AHRQ Publication No. 06–0588, Agency for
Healthcare Research and Quality, Rockville, MD. Available at:
http://www.ahrq.gov/clinic/pocketgd.htm. 2006.
31. Wang PS, Berglund P, Olfson M, et al. Failure and delay in initial treatment contact
after first onset of mental disorders in the National Comorbidity Survey Replication.
Arch Gen Psychiatry. 2005;62(6):603–613.
32. Zung WW. A self-rating depression scale. Arch Gen Psychiatry. 1965;12:63–70.
33. Burnam MA, Wells KB, Leake B, Landsverk J. Development of a brief
screening instrument for detecting depressive disorders. Med Care. 1988;26:775–789.
34. Kroenke K, Spitzer RL, Williams JBW. The Patient Health Questionnaire-2: Validity of
a two-item depression screener. Medical Care. 2003;41:1284–1292.
35. Lecrubier Y, Sheehan DV, Weiller E, Amorim P, Bonora I, Harnett Sheehan K, Janavs J,
Dunbar GC. The Mini International Neuropsychiatric Interview (MINI). A
short diagnostic structured interview: reliability and validity according to the CIDI. Eur
Psychiatry. 1997;12:224–231.
36. Spitzer RL, Williams JB, Kroenke K, et al. Utility of a new procedure for diagnosing
mental disorders in primary care. The PRIME-MD 1000 study. JAMA.
1994;272:1749–1756.
37. Broadhead WE, Leon AC, et al. Development and validation of the SDDS-PC screen
for multiple mental disorders in primary care. Arch Fam Med. 1995;4:211–219.
38. Bermejo I, Niebling W, Berger M, et al. Patients’ and physicians’ evaluation of the
PHQ-D for depression screening. Primary Care and Community Psychiatry.
2005;10(4):125–131.
39. Loerch B, Szegedi A, Kohnen R, et al. The primary care evaluation of mental disorders
(PRIME-MD), German version: a comparison with the CIDI. J Psychiatr Res.
2000;34(3):211–220.
40. Ormel J, Von Korff M, Oldehinkel A, et al. Onset of disability in depressed and non-
depressed primary care patients. Psychol Med. 1999;29:847–853.
41. American Psychiatric Association. Diagnostic and statistical manual of mental
disorders. Vol. 4th ed., text rev. Washington, DC, 2000.
42. Sherman L. Depression and medical illness. Audio Digest Psychiatry. 2004;33(16):1–6.
43. Halfin A. Depression: the benefits of early and appropriate treatment. Am J Manag
Care. 2007;13:S92–S97.
44. Coulehan J, Schulberg H, Block M, et al. Treating depressed primary care patients
improves their physical, mental, and social functioning. Arch Intern Med.
1997;157:1113–1120.
45. Rost K, Smith J, Dickinson M. The effect of improving primary care depression
management on employee absenteeism and productivity. A randomized trial. Med
Care. 2004;42:1202–1210.
46. Eaton WW, Badawi M, Melton B. Prodromes and precursors: epidemiologic data
for primary prevention of disorders with slow onset. Am J Psychiatry.
1995;152(7):967–972.
47. Lyness JM, Heo M, Datto CJ, et al. Outcomes of minor and subsyndromal depression
among elderly patients in primary care settings. Ann Intern Med. 2006;144(7):496–504.
48. Wells KB, Burnam MA, Rogers W, et al. The course of depression in adult outpatients.
Arch Gen Psychiatry. 1992;49:788–794.
49. Cuijpers P, Smit F. Subthreshold depression as a risk indicator for major depressive
disorder: a systematic review of prospective studies. Acta Psychiatr Scand.
2004;109(5):325–331.
50. Seligman MEP, Schulman P, DeRubeis RJ, et al. The prevention of depression and
anxiety. Prevention & Treatment. 1999;2(1).
51. Simon G, Goldberg D, Tiemens B, et al. Outcomes of recognized and unrecognized
depression in an international primary care study. Gen Hosp Psychiatry.
1999;21(2):97–105.
52. Magruder KM, Calderone GE. Public health consequences of different
thresholds for the diagnosis of mental disorders. Compr Psychiatry. 2000;41(2,
Supplement 1):14–18.
53. Valenstein M, Vijan S, Zeber JE, et al. The cost-utility of screening for depression in
primary care. Ann Intern Med. 2001;134(5):345–360.
54. Pálsson S, Östling S, Skoog I. The incidence of first-onset depression in a population
followed from the age of 70 to 85. Psychol Med. 2001;31:1159–1168.
55. Mitchell AJ, Coyne JC. Do ultra-short screening instruments accurately detect
depression in primary care? A pooled analysis and meta-analysis of 22 studies. Br J
Gen Practice. 2007;57:144–151.
56. Beck AT, Ward CH, Mock J, Erbaugh J. An inventory for measuring depression.
Archives of General Psychiatry. 1961;4:561–571.
57. Radloff, L. The CES-D scale: A self-report depression scale for research in the general
population. Appl Psychol Meas 1:385–401, 1977.
58. Yesavage JA, Brink TL, Rose TL, Lum O, Huang V, Adey M, Leirer VO. Development
and validation of a geriatric depression screening scale: a preliminary report. J Psychiatr
Res. 1982–83;17(1):37–49.
59. Popoff LM. A simple method for diagnosis of depression by the family physician.
Clinical Medicine. 1969 March:24–29.
60. Williams JW, Pignone M, Ramirez G, Perez Stellato C. Identifying depression in
primary care: a literature synthesis of case-finding instruments. General Hospital
Psychiatry 2002;24(4):225–237.
61. Mulrow CD, Williams JW, Gerety MB, Ramirez G, Montiel OM, Kerber C. Case-
Finding Instruments for Depression in Primary Care Settings. Ann Intern Med
1995;122(12):913–921.
62. Nease DE, Jr., Malouin JM. Depression screening: a practical strategy. (Applied
evidence: research findings that are changing clinical practice). Journal of Family
Practice 2003;52(2):118(8).
63. McAlpine DD, Wilson AR. Screening for depression in primary care: what do we still
need to know? Depression & Anxiety (1091–4269) 2004;19(3):137–145.
64. Whooley M, Avins A, Miranda J, et al. Case-finding instruments for depression: two
questions are as good as many. J Gen Intern Med. 1997;12(7):439–445.
65. Zigmond AS, Snaith RP. The hospital anxiety and depression scale, Acta Psychiatr
Scand 1983;67:361–70.
66. Schade CP, Jones ER Jr, Wittlin BJ. A ten-year review of the validity and clinical utility
of depression screening. Psychiatr Serv. 1998;49(1):55–61.
67. Gilbody S, Whitty P, Grimshaw J, et al. Improving the recognition and management of
depression in primary care. Effective Health Care Bull. 2002;7(5).
68. Weissman MM, Broadhead WE, Olfson M, et al. A diagnostic aid for detecting
(DSM-IV) mental disorders in primary care. Gen Hosp Psychiatry.
1998;20(1):1–11.
69. Weissman M, Olfson M, Leon AC, et al. Brief diagnostic interviews (SDDS-PC) for
multiple mental disorders in primary care: a pilot study. Arch Fam Med.
1995;4(3):220–227.
70. Goldberg DP. The detection of psychiatric illness by questionnaire. London, Oxford
University Press, 1972.
71. Derogatis LR, Lipman RS, Rickels K, Uhlenhuth EH, Covi L. The Hopkins Symptom
Checklist (HSCL): a self-report symptom inventory Behav Sci. 1974 Jan;19(1):1–15.
72. Bech P, Olsen LR, Kjoller M, Rasmussen NK: Measuring well-being rather than the
absence of distress symptoms: a comparison of the SF-36 Mental Health subscale and
the WHO-Five Well-Being Scale. Int J Methods Psychiatr Res 12:85–91, 2003.
73. Aluoja A, Shlik J, Vasar V, Luuk K, Leinsalu M. Development and psychometric
properties of the Emotional State Questionnaire, a self-report questionnaire for
depression and anxiety. Nord J Psychiatry 1999;53:443–449.
74. Henkel V, Mergl R, Kohnen R, et al. Identifying depression in primary care: a
comparison of different methods in a prospective cohort study. BMJ.
2003;326(7382):200–201.
75. Means-Christensen AJ, Sherbourne CD, Roy-Byrne PP, Craske MG, and Stein MB.
Using five questions to screen for five common mental disorders in primary care:
diagnostic accuracy of the Anxiety and Depression Detector. General Hospital
Psychiatry 2006; 28(2): 108–111.
76. Ööpik P, Aluoja A, Kalda R, et al. Screening for depression in primary care. Fam Pract.
2006;23(6):693–698.
77. Arroll B, Goodyear-Smith F, Kerse N, et al. Effect of the addition of a ‘‘help’’ question
to two screening questions on specificity for diagnosis of depression in general practice:
diagnostic validity study. BMJ. 2005;331:884–887.
10
SCREENING FOR DEPRESSION IN MEDICAL
SETTINGS: ARE SPECIFIC SCALES USEFUL?

Gordon Parker and Matthew Hyett

1. An Introductory Logic
2. Depression in the Medically Ill
3. ‘‘False-Positive’’ Depression Reflecting Confounding by
Physical Symptoms Associated with Medical Illness
4. Screening Measures Used to Assess Depression in the Medically Ill
5. Discussion

Context
There are two broad strategies for screening and quantifying depression in
medical settings. The first approach relies upon measures developed in
psychiatric samples, and the second approach is to concede that symptoms are
substantially different and to develop customized scales. Here we discuss the
merits of several specific scales for measuring depression in physical settings
and make the case for scales tailored to specific populations. A subsequent
chapter (Babaei and Mitchell) will present a contrasting position.

1. An Introductory Logic
There are two broad strategies for screening and quantifying depression in
medical settings. The first approach involves using measures developed in
psychiatric samples and assuming that their relevance holds. The second
approach is to concede that there are intrinsic limitations to extrapolating
those ‘‘general’’ measures to medically ill populations. In the former case the
hypothesis is that symptoms of depression are essentially the same when
depression occurs with and without physical illness. In the latter case the
hypothesis is that the symptoms are substantially different. Pursuing the latter,
there are two key concerns.
Firstly, such an approach assumes some constancy to the nature of depres-
sion across differing psychiatric and medical settings. Depression, however, is
difficult enough to define in psychiatric patient samples. Even ignoring the
debate as to whether depression is viewed as comprising a set of subtypes or is
best modeled along a continuum, quantifying clinical depression remains
problematic, as detailed elsewhere in this book. Over the past few decades,
clinical depression has most commonly been viewed as synonymous with
major depression, but, as numerous studies have shown, comparable sympto-
matic distress and disability associated with major depression and minor
depression—and even with subsyndromal depression1,2—begs an obvious
question: Can imposing a cutoff score on a dimensional measure of depression
accurately distinguish true cases and true non-cases in a psychiatric sample?
Further, assuming that a cutoff is derived with an acceptable classification rate,
can we extrapolate decision rules derived from psychiatric samples to screen
and quantify depression caseness in the medically ill? As measures that have
been widely used for decades (such as the Zung and the Beck Depression
Inventory) generate widely differing cutoff scores across psychiatric, general
practice, and medical settings, there would appear to be quantitative and
possibly qualitative differences in the nature of depression in medical contexts,
making general measure extrapolation problematic.
The second issue of concern is a methodologic one. Many measures used to
assess depression in psychiatric samples weight features such as fatigue,
anergia, anhedonia, and loss of interest, as well as appetite and sleep changes.
However, it is quite possible for nondepressed patients with a medical illness to
rate positively on such items purely as a consequence of their physical problem
or of the drugs being used to treat the medical condition, or even of being
hospitalized. Such confounding clearly risks false-positive scores, which then
will inflate case identification and severity estimates. This issue also requires
some consideration.

2. Depression in the Medically Ill


The 12-month prevalence and odds of major depression are high in individuals
with chronic medical conditions, and major depression is associated
with significant increases in utilization, lost productivity, and functional dis-
ability.3 Those with a medical illness may have a co-occurring depressive
illness (melancholic or nonmelancholic) that is similar in all regards to
those depressive conditions observed in a psychiatric context. However,
many with a medical illness are more likely to have a grief-like reaction to the
medical illness per se. Here, instead of experiencing the primary defining
feature of depression (a loss in self-esteem or self-worth), as might be
expected for an individual with clinical depression, they may rather be grieving
the loss of their previous healthy role and have no impairment of self-esteem.
In addition, medical illness itself can cause psychological features approaching
the phenomenology of depression. Cassell4 has emphasized (i) disconnection
from the usual world, (ii) a loss of the sense of indestructibility or omnipotence,
(iii) a loss of competence and completeness of reason, and (iv) a loss of control
of the sufferer’s world. He notes that, as illness deepens, medically ill people
become more and more withdrawn from their usual world, their previous
interests, friends, and families, reflecting that, ‘‘We exist to the extent that
we are connected.’’ When medically ill patients experience such feelings, they
will frequently develop irritability, anxiety, fear, and even depression. The
disconnection can occur rapidly after events such as a myocardial infarction or
severe trauma, or be gradual following the development of a chronic disease or
long-term illness.4 The loss of the sense of omnipotence is commonly handled
by denial and/or disavowal as the individual seeks to preserve his or her
intactness. The loss of control—where the patient perceives himself or herself
as helpless—can be one of the most distressing of human experiences.
According to Cassell,4 such features are illness. While they sometimes approx-
imate to depressive phenomenology, they can be distinguished by careful
clinical inquiry—but not always by simple screening measures. In essence,
there is a distinction between the experiential components of illness and
depression. Thus, in screening for depression in the medically ill, there is a
need to ensure that items are not confounded by questions that risk false-
positive responses emerging from those with a nondepressive illness.

3. ‘‘False-Positive’’ Depression Reflecting Confounding by
Physical Symptoms Associated with Medical Illness
As noted earlier, individuals with many medical conditions might be expected
to report features such as loss of interest, anergia, and sleep and appetite
disturbances, which, if secondary to the medical illness and not a reflection
of depression, will tend to inflate depression estimates.
A number of options have been proposed to redress such confounding
influences. Several authors5 have argued for an inclusive approach. Here,
every relevant depressive symptom is counted even if secondary to the
illness or its management, with or without subsequent adjustment to
threshold scores to calibrate caseness estimates. A contrasting exclusive
approach6 ignores features common to those with medical illness. A third
substitutive approach7 involves substituting psychological symptoms (eg,
tearfulness and social withdrawal) for vegetative symptoms (eg, weight loss,
appetite and sleep disturbance, fatigue, and concentration difficulties).
Fourthly, both DSM-III-R and DSM-IV decision rules allow an etiologic
approach, whereby symptoms are counted only if they are judged as not
being caused by a general medical condition. The last approach requires the
rater to make interpretative (and thus subjective) judgments.
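
The practical difference between the four approaches is easiest to see by applying them to one patient. The sketch below is illustrative only: the symptom list, the flags, and the patient are hypothetical, and real instruments define their items and substitution rules in more detail than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symptom:
    name: str
    present: bool
    somatic: bool = False          # overlaps with medical illness (eg, fatigue)
    due_to_medical: bool = False   # rater judges it caused by the medical condition
    substitute: bool = False       # psychological item used only when substituting


def count_symptoms(symptoms, approach):
    """Count depressive symptoms under one of the four approaches."""
    if approach == "inclusive":
        # Count every standard symptom, whatever its cause.
        kept = [s for s in symptoms if not s.substitute]
    elif approach == "exclusive":
        # Ignore standard symptoms that are common in medical illness.
        kept = [s for s in symptoms if not s.substitute and not s.somatic]
    elif approach == "substitutive":
        # Replace somatic items with psychological substitutes.
        kept = [s for s in symptoms if not s.somatic]
    elif approach == "etiologic":
        # Count standard symptoms unless judged to be caused by the illness.
        kept = [s for s in symptoms if not s.substitute and not s.due_to_medical]
    else:
        raise ValueError(f"unknown approach: {approach}")
    return sum(s.present for s in kept)


# Hypothetical medically ill patient.
patient = [
    Symptom("depressed mood", present=True),
    Symptom("fatigue", present=True, somatic=True, due_to_medical=True),
    Symptom("sleep disturbance", present=True, somatic=True),
    Symptom("tearfulness", present=True, substitute=True),
    Symptom("social withdrawal", present=False, substitute=True),
]

counts = {
    approach: count_symptoms(patient, approach)
    for approach in ("inclusive", "exclusive", "substitutive", "etiologic")
}
print(counts)
```

For this invented patient the inclusive approach counts three symptoms, the exclusive approach one, and the substitutive and etiologic approaches two each, so the same presentation could fall on either side of a symptom-count threshold depending on the rule chosen.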
Common sense would suggest that there would be advantages to having
measures of depression in medical settings that assess items defining depres-
sion per se and that are unlikely to be confounded by aspects of the medical
illness or of its treatment. Such an approach therefore rejects the use of general
depression measures, and argues for consideration only of measures that have
been designed to preempt confounding influences. We now review measures
that have been widely advanced and/or specifically designed for measuring
depression in medical settings.

4. Screening Measures Used to Assess Depression in the
Medically Ill
The Hospital Anxiety and Depression Scale
This seven-item subscale (HADS8) is one of the most commonly used research
measures of depression in the medically ill. As the authors judged anhedonia to
be a central feature of depression and a predictor of antidepressant drug
response, five of its seven items assess anhedonia (eg, ‘‘I still enjoy the
things I used to enjoy’’; ‘‘I feel cheerful’’; ‘‘I have lost interest in my appear-
ance’’; ‘‘I look forward with enjoyment to things’’; and ‘‘I can enjoy a good
book or program’’), suggesting some redundancy. For this dimensional mea-
sure, the authors suggested that a cutoff score of 11 or more indicates a definite
case of depression, while noncases score less than 8 and doubtful cases score in
the 8 to 10 range. While this scale is widely used, Hermann9 noted that there is
‘‘still no comprehensive documentation of its psychometric properties,’’ while
its actual validity has been challenged in both medically unwell and psychiatric
patients.10 The positive predictive value (PPV) of the HADS in the latter study
showed poor discrimination, with only 17% of medically ill patients accurately
diagnosed at a cutoff of 8, rising to just 25% at a cutoff of 11. Moreover, a
recent review of the validity of the HADS11 identified differing optimal cutoffs
across differing primary care populations, suggesting that its case-finding
ability is dependent on sample characteristics. For instance, its use in general
practice settings revealed areas under the curve in the range of 0.84 to 0.96,
though its translation to more specific medical settings (eg, stroke clinics)
reveals uncharacteristically low case-finding cutoffs (ie, 4). Thus, the
validity of the HADS as a measure of depression in divergent medical settings
lacks support.
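
The banding suggested by the scale's authors can be written as a small function. This sketch assumes the usual scoring of each of the seven items from 0 to 3, giving a subscale range of 0 to 21, which the text above does not state; the function name is ours.

```python
def classify_hads_depression(score):
    """Map a HADS depression subscale total to the authors' suggested bands."""
    # Assumed range: seven items, each scored 0 to 3.
    if not 0 <= score <= 21:
        raise ValueError("HADS depression subscale score must be between 0 and 21")
    if score >= 11:
        return "definite case"
    if score >= 8:
        return "doubtful case"
    return "noncase"


for total in (5, 8, 10, 11):
    print(total, classify_hads_depression(total))
```

As the predictive values quoted above suggest, a label produced this way describes a score band and should not be read as a diagnosis.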

The Beck Depression Inventory for Primary Care (BDI-PC)


This seven-item measure12 was developed for primary care (and therefore
medical settings) by removing somatic items from the well-established Beck
Depression Inventory.13 Sadness and loss of pleasure or anhedonia were
included on an a priori basis, as at least one of these symptoms is necessary
for a DSM-IV diagnosis of major depression. Suicidal ideation was also chosen
on an a priori basis, being judged as an important clinical indicator of suicidal
risk. The remaining four items—pessimism, past failure, self-dislike, and self-
criticalness—were derived empirically from data obtained from a study of 500
psychiatric patients. A cutoff score of 4 or more is used to define depression
caseness, with sensitivity and specificity being quantified at 82% to 99%
across medical inpatient and outpatient samples.14–16 In a head-to-head com-
parison of the BDI-PC and HADS depression measures, the former was shown
to be superior in distinguishing depressed from nondepressed patients referred to a
consultation-liaison service.12

The Depression in the Medically Ill (DMI) Scales


These scales were developed by our research team17 with the objective of
developing a valid measure of depression in the medically ill by focusing on
cognitive symptoms. In comparison to Beck’s strategy of stripping somatic
items from an accepted measure of depression, we adopted a ‘‘bottom-up’’
approach of specifically studying those with medical illness to generate
possible salient constructs. In essence, we selected 81 items assessing the
impact of a medical illness on the individual4 as well as ones capturing
cognitive aspects of depression (eg, anhedonia, self-reproach, nonreactive
mood).
Items were scored by subjects on a three-point scale (‘‘not true at all,’’ ‘‘true
to some degree,’’ ‘‘very true’’) for the previous 2 to 3 days. The initial study
population comprised inpatients and outpatients of a large Sydney teaching
hospital being treated for a primary medical condition. A research psychiatrist
subsequently (i) made a dimensional estimate of any depression and (ii) judged
whether there was any current depression of clinical significance (ie, major
depression or an adjustment disorder with depressed mood). A number of the
subjects also completed the HADS and BDI-PC measures so that the compara-
tive properties of the measures could be examined.
We refined the initial 81-item measure by removing items affirmed by both
depressed and nondepressed subjects. We also deleted items that, while
weighted to depression, had a low prevalence (eg, suicidal ideation), resulting
in a final set of 16 items. Of interest, the measure did not appear limited to
assessing depressed mood, including a brooding item and two items having
anxiety connotations (fearfulness and insecurity).
The internal consistency of the derived DMI-16 was high (alpha = 0.95), and
total score measures correlated highly with the BDI-PC (0.80) and HADS
(0.72) measures. When correlated against depression severity as estimated by
the psychiatrist, the DMI-16 returned a high coefficient (0.74), slightly in
excess of the BDI-PC (0.68) and superior to the HADS (0.54). A receiver
operating characteristic curve (ROC) analysis derived a cutoff score of 18 or
more with both high sensitivity (100%) and specificity (96%). Of the 29
subjects who received a psychiatrist-rated judgment of a clinically significant
depression, the DMI-16 cutoff discriminated highly (kappa = 0.91) and was
superior to the BDI-PC (0.68) and the HADS (0.57).
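
For readers unfamiliar with these statistics, the sketch below shows how a cutoff can be chosen from an ROC-style sweep and how kappa is then computed against the clinician's judgment. It is not a reconstruction of our analysis: the ten scores and case labels are invented, and the Youden index (sensitivity plus specificity minus 1) is assumed here as the selection criterion.

```python
def confusion_counts(scores, is_case, cutoff):
    """Return (tp, fp, fn, tn) when a score at or above the cutoff is screen-positive."""
    tp = sum(s >= cutoff and c for s, c in zip(scores, is_case))
    fp = sum(s >= cutoff and not c for s, c in zip(scores, is_case))
    fn = sum(s < cutoff and c for s, c in zip(scores, is_case))
    tn = sum(s < cutoff and not c for s, c in zip(scores, is_case))
    return tp, fp, fn, tn


def best_cutoff(scores, is_case):
    """Sweep observed scores and keep the cutoff with the highest Youden index."""
    best = None
    for cutoff in sorted(set(scores)):
        tp, fp, fn, tn = confusion_counts(scores, is_case, cutoff)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        youden = sensitivity + specificity - 1.0
        if best is None or youden > best[1]:
            best = (cutoff, youden, sensitivity, specificity)
    return best


def cohen_kappa(tp, fp, fn, tn):
    """Chance-corrected agreement between screen result and clinical judgment."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1.0 - expected)


# Invented data: questionnaire totals and psychiatrist-rated caseness.
scores = [5, 9, 12, 14, 17, 19, 20, 22, 25, 27]
is_case = [False, False, False, False, False, True, False, True, True, True]

cutoff, youden, sensitivity, specificity = best_cutoff(scores, is_case)
kappa = cohen_kappa(*confusion_counts(scores, is_case, cutoff))
print(cutoff, round(sensitivity, 2), round(specificity, 2), round(kappa, 2))
```

On these invented data the sweep selects a cutoff of 19, with one nondepressed subject misclassified and a kappa of 0.80. With samples as small as this, the chosen cutoff can shift substantially from one sample to the next.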
In a second study18 involving a larger sample of hospitalized medically ill
patients, we derived a briefer version (the DMI-10) and further examined its
properties (along with the DMI-16) in comparison to the BDI-PC and HADS
measures. While anhedonia is included in the DMI-16, as it was affirmed by a
significant number of nondepressed medically ill subjects, it was excluded
from the DMI-10 measure. Analysis against clinically judged caseness
established similar overall classification rates for the DMI-10 and DMI-16
measures, comparable to that derived for the BDI-PC but superior to the HADS
measure. In this study, the formally recommended HADS cutoff of 8 or more
for a probable case was also the optimal cutoff suggested by our ROC analysis
using clinical judgment as a criterion. The recommended HADS cutoff of 11 or
more for a definite case, however, showed low sensitivity. Our ROC analysis of
the BDI-PC established a cutoff score of 5 or more, close to its recommended
cutoff score of 4 or more.
In a third development study report,19 the capacity of the DMI-10 to screen
for a depression in a general practice setting—where it might be assumed that
the majority of the subjects would have a primary medical illness—was again
supported. The DMI-10 measure is shown in Table 10.1.

Parsimonious Screening
Chochinov and colleagues20 compared four screening measures in a sample of
inpatients with advanced cancer who were receiving palliative care. A single
item (‘‘Are you depressed?’’) was reported to have perfect sensitivity and
specificity, with the authors concluding that this question provides a ‘‘reliable
and remarkably accurate screen.’’ However, as responses to the outcome
measure (Research Diagnostic Criteria status) and the single predictor question
could both have been derived by subjective response bias (ie, affirming or
denying depression), this study risks a tautologic bias. Subsequent meta-
analysis showed more modest results.21 The need for economical accurate
measures encouraged development of a four-item screener (the Brief Case
Finder for Depression [BCD]22), assessing whether depressed mood or
‘‘restless and disturbed nights’’ were present, together with items assessing inability
to overcome difficulties and/or dissatisfaction with life. This measure also
tends to be overly inclusive because its broad questioning generates many false
positives; however, the sensitivity of the measure and negative predictive
power appear adequate for ruling out those who are not depressed.23

Table 10.1. Ten-Item Depression in the Medically Ill Screening Measure

DMI-10
Depression Self-Report Questionnaire
Please consider the following questions and rate how true each one is in relation to how you
have been feeling lately (ie, in the last 2 to 3 days) compared to how you usually or
normally feel.

Please tick (✓) the most relevant option: Not True / Slightly / Moderately / Very True

1. Are you stewing over things?
2. Do you feel more vulnerable than usual?
3. Are you being self-critical and hard on yourself?
4. Are you feeling guilty about things in your life?
5. Do you find that nothing seems to be able to cheer you up?
6. Do you feel as if you have lost your core and essence?
7. Are you feeling depressed?
8. Do you feel less worthwhile?
9. Do you feel hopeless or helpless?
10. Do you feel more distant from other people?

Adapted from www.blackdoginstitute.org.au/docs/DMI-10.pdf
198 SCREENING FOR DEPRESSION IN CLINICAL PRACTICE

The Primary Care Evaluation of Mental Disorders


(PRIME-MD)
The PRIME-MD24 assesses four domains of mental disorders commonly
observed in general population settings: mood, anxiety, somatoform, and
alcohol disorders. Its two-tier assessment structure means that patients who
score positively on the patient questionnaire (PQ) then receive a
physician-administered structured interview (Clinician Evaluation Guide
[CEG]) built from modularized DSM-III-R criteria. A patient who scores
positively on either of two depressive symptoms (demoralization and/or
anhedonia) on the PQ is subsequently assessed against more specific criteria.
Because of the length of administration time, subsequent self-report measures
(the PRIME-MD Patient Health Questionnaire [PHQ]) have been designed,25
including the PHQ-9 measure of depressive status. Standard DSM-IV major
depressive disorder criteria apply, and recognition of symptomatology is
comparable to, if not slightly more sensitive than, that of the original
PRIME-MD.25
In the initial PHQ primary care study,25 measured sensitivity and specificity
were 73% and 98%, respectively, for the self-report version, compared with 57%
and 94% for the original clinician-administered version. More specifically, at
a cutoff score of 10 or more, the PHQ-9 achieved a sensitivity of 88% and a
specificity of 88% for major depression.26 However, diagnostic concordance of
the PHQ-9 with DSM-IV criteria, while higher than that of both the HADS and
the WHO Wellbeing Index (WBI-5),27 is still relatively low (kappa = 0.56).
Compared with physicians’ diagnoses, however, the PHQ-9 is superior
(sensitivities of 98% and 40%, respectively).28 Thus, the PHQ-9 is suggested
to be somewhat more accurate29 than the HADS and physicians’ diagnoses,
though comparable to more general measures of well-being in primary care
populations.
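
The cutoff rule described above can be illustrated with a short sketch. It assumes the standard PHQ-9 format of nine items, each rated 0 to 3 (total 0 to 27), and uses the cutoff of 10 cited in the text as a default. The function names are hypothetical, and the result is a screening flag only, not a diagnosis.

```python
from typing import Sequence

PHQ9_ITEM_COUNT = 9      # the PHQ-9 has nine items
PHQ9_MAX_ITEM_SCORE = 3  # each item is rated from 0 to 3


def phq9_total(responses: Sequence[int]) -> int:
    """Sum the nine item ratings after checking that they are well formed."""
    if len(responses) != PHQ9_ITEM_COUNT:
        raise ValueError("PHQ-9 requires exactly nine item ratings")
    for rating in responses:
        if not 0 <= rating <= PHQ9_MAX_ITEM_SCORE:
            raise ValueError("each PHQ-9 item must be rated 0, 1, 2, or 3")
    return sum(responses)


def phq9_screen_positive(responses: Sequence[int], cutoff: int = 10) -> bool:
    """Return True when the total score reaches the chosen cutoff."""
    return phq9_total(responses) >= cutoff
```

A patient rating every item 1 scores 9 and screens negative at this cutoff, whereas a patient rating five items 2 and the rest 0 scores 10 and screens positive. Lowering the cutoff would be expected to raise sensitivity at the cost of specificity.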

5. Discussion
The capacity for medical illness and/or hospitalization to distort the assessment
of depression in the medically ill argues against use of any general depression
measure, and we have therefore not reviewed studies using such measures
other than the PRIME-MD. The last does risk confounding by medical illness
nuances but has the advantage of delivering DSM case status decisions,
although the risk is that the intrinsic limitations to such diagnoses in medically
ill groups may fail to be recognized. We therefore take as a given that any valid
depression measure excludes items that can be confounded by illness or
hospitalization, and have focused on relevant measures. Two, the HADS and
the BDI-PC, have adopted the exclusive approach by effectively removing
potentially confounding items from established depression measures. In devel-
oping the DMI measures, we adopted a differing ‘‘bottom-up’’ approach of
examining the properties of items capturing the world of medically ill patients
(both depressed and nondepressed).
While the HADS measure has long been in use, it has been criticized for
the lack of studies examining its psychometric properties and even for its
validity. Its focus on anhedonia respects that construct’s utility in psychiatric
subjects but, given the high rate of affirmation of anhedonia that we found in
nondepressed medically ill subjects,18 that construct may not be as central to
depression as imagined. The low sensitivity that we quantified18 for the HADS
in diagnosing definite depression is of concern if the aim of the screening
measure is to prioritize detection of those with probable or definite
depression. In our initial DMI study,17 we established that the DMI-16 had high
internal consistency and was distinctly superior to the HADS and somewhat
superior to the BDI-PC when compared against a psychiatrist’s independent
clinical judgment of depression severity and case status. These findings were
essentially confirmed in our second study,18 where we again compared the
three relevant measures.
Any measure of depression in the medically ill needs to be acceptable, brief,
and minimally intrusive. The last issue is worthy of consideration. We deleted
a provisional item assessing suicidal ideation as it proved intrusive to a number
of our medically ill subjects. However, we demonstrated that its omission (in
the final DMI measures) was not of concern, as all those admitting to suicidal
ideation scored above the cutoff on the DMI-16.
Our studies of the three principal candidate screening measures (HADS,
BDI-PC, and the DMI) suggest that the BDI-PC and DMI measures are roughly
comparable, and both superior to the HADS, in their capacity to separate
depressed and nondepressed individuals in medical settings. We would
recommend use of both the BDI-PC and the DMI-10.

References
1. Kessler R. Prevalence, correlates, and course of minor depression and major depression
in the National Comorbidity Survey. J Affect Disord. 1997;45:14–30.
2. Cuijpers P, Smit F. Subthreshold depression as a risk indicator for major depressive
disorder: a systematic review of prospective studies. Acta Psychiatr Scand.
2004;109(5):325–331.
3. Egede LE. Major depression in individuals with chronic medical disorders: prevalence,
correlates and association with health resource utilization, lost productivity and
functional disability. Gen Hosp Psychiatry. 2007;29(5):409–416.
4. Cassell EJ. Reactions to physical illness and hospitalisation. In Usdin G, Lewis JM, eds.
Psychiatry in general medical practice. New York: McGraw Hill, 1979.
5. Cohen-Cole SA, Brown FN, McDaniel JS. Diagnostic assessment of depression in the
medically ill. In Stoudermire A, Fogel B, eds. Psychiatric care of the medical patient.
New York: Oxford University Press, 1993:53–70.
6. Plumb MM, Holland J. Comparative studies of psychological function in patients
with advanced cancer-I. Self-reported depressive symptoms. Psychosom Med.
1977;39(4):264–276.
7. Endicott J. Measurement of depression in patients with cancer. Cancer. 1984;53(10
Suppl):2243–2249.
8. Zigmond AS, Snaith RP. The Hospital Anxiety and Depression Scale. Acta Psychiatr
Scand. 1983;67(6):361–370.
9. Herrmann C. International experiences with the Hospital Anxiety and Depression
Scale: a review of validation data and clinical results. J Psychosom Res.
1997;42(1):17–41.
10. Silverstone PH. Poor efficacy of the Hospital Anxiety and Depression Scale in the
diagnosis of major depressive disorder in both medical and psychiatric patients. J
Psychosom Res. 1994;38(5):441–450.
11. Bjelland I, Dahl AA, Haug TT, et al. The validity of the Hospital Anxiety and
Depression Scale: An updated literature review. J Psychosom Res. 2002;52(2):69–77.
12. Beck AT, Guth D, Steer RA, et al. Screening for major depression disorders in medical
inpatients with the Beck Depression Inventory for Primary Care. Behav Res Ther.
1997;35(8):785–791.
13. Beck AT, Beck RW. Screening depressed patients in family practice—rapid technique.
Postgrad Med. 1972;52(6):81–85.
14. Beck AT, Steer RA, Ball R, et al. Use of Beck anxiety and depression inventories for
primary care with medical outpatients. Assessment. 1997;4(Suppl 3):211–219.
15. Steer RA, Cavalieri DO, Leonard DM, et al. Use of the Beck Depression Inventory for
Primary Care to screen for major depressive disorders. Gen Hosp Psychiatry.
1999;21(2):106–111.
16. Winter LB, Steer RA, Jones-Hicks L, et al. Screening for major depression disorders in
adolescent medical outpatients with the Beck Depression Inventory for Primary Care. J
Adolesc Health. 1999;24(6):389–394.
17. Parker G, Hilton T, Hadzi-Pavlovic D, et al. Screening for depression in the medically
ill: the suggested utility of a cognitive-based approach. Aust N Z J Psychiatry.
2001;35(4):474–480.
18. Parker G, Hilton T, Bains J, et al. Cognitive-based measures screening for depression in
the medically ill: the DMI-10 and the DMI-18. Acta Psychiatr Scand.
2002;105(6):419–426.
19. Parker G, Hilton T, Hadzi-Pavlovic D, et al. Clinical and personality correlates of a new
measure of depression: a general practice study. Aust N Z J Psychiatry. 2003;37(1):
104–109.
20. Chochinov HM, Wilson KG, Enns M, et al. ‘‘Are you depressed?’’ Screening for
depression in the terminally ill. Am J Psychiatry. 1997;154(5):674–676.
21. Mitchell AJ. Are one or two simple questions sufficient to detect depression in
cancer and palliative care? A Bayesian meta-analysis. Br J Cancer. 2008;98(12):
1934–1943.
22. Clarke DM, McKenzie DP, Marshall RJ, et al. The construction of a brief case-finding
instrument for depression in the physically ill. Integr Psychiatry. 1994;10:117–123.
23. Jefford M, Mileshkin L, Richards K, et al. Rapid screening for depression—validation
of the Brief Case-Finder for Depression (BCD) in medical oncology and palliative care
patients. Br J Cancer. 2004;91(5):900–906.
24. Spitzer RL, Williams JBW, Kroenke K, et al. Utility of a new procedure for diagnosing
mental disorders in primary care: The PRIME-MD 1000 study. JAMA.
1994;272(22):1749–1756.
25. Spitzer RL, Kroenke K, Williams JBW. Validation and utility of a self-report
version of PRIME-MD: The PHQ primary care study. JAMA. 1999;282(18):
1737–1744.
26. Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: Validity of a brief depression
severity measure. J Gen Intern Med. 2001;16(9):606–613.
27. World Health Organization (WHO). Wellbeing measures in primary health care: The
DepCare Project. WHO, Regional Office for Europe, Copenhagen: 1998.
28. Löwe B, Spitzer RL, Gräfe K, et al. Comparative validity of three screening
questionnaires for DSM-IV depressive disorders and physicians’ diagnoses. J Affect
Disord. 2004;78(2):131–140.
29. Wittkampf KA, Naeije L, Schene AH, et al. Diagnostic accuracy of the mood module of
the Patient Health Questionnaire: a systematic review. Gen Hosp Psychiatry.
2007;29(5):388–395.
11
SCREENING FOR DEPRESSION IN
MEDICAL SETTINGS: THE CASE
AGAINST SPECIFIC SCALES

Fariba Babaei and Alex J. Mitchell

1. Overview of Depression in Physical Disease
2. Defining Somatic Symptoms
3. Diagnostic Accuracy of Somatic Symptoms in Depression
4. Evidence For and Against Somatic Symptoms when Diagnosing Comorbid
Depression
5. Implications for Screening

Context
The prevailing view for detecting mood disorders in the presence of physical
disease is to exclude somatic symptoms that might contaminate a diagnosis
(see Parker and Hyatt, Chapter 10, for a presentation of this point of view). This
chapter will examine whether this approach is beneficial, with a view to
deciding whether new depression scales for each physical disorder (each
excluding somatic symptoms) are required.

1. Overview of Depression in Physical Disease


There is a bidirectional relationship between depression and physical illness.
New evidence suggests that among depressed individuals presenting in pri-
mary care, most have at least one comorbid psychiatric condition and at least
one physical condition.1,2 At least 75% of elderly depressed patients in primary
care also have a known physical illness, and in 30–50% this is of high
severity.3–6 In one study only 10% of elderly depressed patients in primary
care had pure depression with no comorbidity.7 Thus, comorbid depression
should be considered ‘‘normal’’ in primary care. Some evidence suggests that
those with comorbidity are less likely to have depression treatment initiated by
their primary care practitioner.8 They are also less likely to recover from
depression.9 Specific conditions such as speech disorders, arthritis, and
dermatologic problems have been linked with worse outcomes of
depression.10,11
The exact relationship of depression and comorbidities is complex. In one of
the largest studies, Egede (2007)12 examined data from 30,801 adults captured
in the 1999 Household National Health Interview Survey. The community
prevalence of major depression was 4.7% in those without chronic medical
illness but 7.7%, 9.8%, and 12% in those with one, two, or three or more
chronic disorders, respectively (Fig. 11.1). Major depression was associated
with significant increases in utilization, lost productivity, and functional dis-
ability. Patients with chronic medical illness and comorbid depression (and
anxiety) also have significantly higher numbers of medical symptoms, even
controlling for severity of disease.13 Around one in four people in the general
population have functional disability, but in those with depression and medical
comorbidity, at least three out of four have functional limitations.14
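
The gradient in the prevalence figures quoted above can be made explicit by expressing each as a ratio to the prevalence in those without chronic medical illness. The following is a minimal sketch using only the percentages cited in the text; the dictionary keys and function name are hypothetical, and the results are crude ratios of the quoted percentages, not adjusted estimates from the original study.

```python
from typing import Dict

# 12-month prevalence of major depression (%) by number of chronic
# disorders, as quoted in the text from Egede (2007).
PREVALENCE_BY_DISORDER_COUNT: Dict[str, float] = {
    "0": 4.7,
    "1": 7.7,
    "2": 9.8,
    "3+": 12.0,
}


def prevalence_ratios(prevalence: Dict[str, float], reference: str) -> Dict[str, float]:
    """Divide every prevalence by the prevalence in the reference group."""
    baseline = prevalence[reference]
    if baseline <= 0:
        raise ValueError("reference prevalence must be positive")
    return {group: value / baseline for group, value in prevalence.items()}


ratios = prevalence_ratios(PREVALENCE_BY_DISORDER_COUNT, reference="0")
for group, ratio in ratios.items():
    print(f"{group} chronic disorders: {ratio:.2f} times the reference prevalence")
```

On these figures, prevalence is roughly 1.6, 2.1, and 2.6 times the reference level in those with one, two, and three or more chronic disorders, respectively.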

[Figure 11.1: bar chart; vertical axis 0 to 18; bars for no disorder, hypertension, congestive heart failure, coronary artery disease, diabetes, CVA, COPD, and end-stage renal failure.]

Figure 11.1. 12-month prevalence of major depression in community population. Data
from Egede LE. Major depression in individuals with chronic medical disorders:
prevalence, correlates and association with health resource utilization, lost productivity and
functional disability. Gen Hosp Psychiatry. 2007;29(5):409–416.

A population survey in New Zealand found that a quarter of people with
chronic physical conditions suffered from a comorbid mental disorder, com-
pared with 15% of the population without chronic conditions.15 Further, those
with a mental disorder had higher rates of chronic pain, cardiovascular disease,
high blood pressure, and respiratory conditions as well as the risk factors
smoking, overweight/obesity, and hazardous alcohol use. In a primary care
survey of 6,641 patients with multiple physical disorders, Nuyen and collea-
gues (2006)16 used morbidity data recorded by Dutch general practitioners to
examine both psychiatric and physical comorbidities. The top three conditions
linked with lifetime depression were schizophrenia, anxiety disorders, and
substance abuse. The top three medical disorders were Parkinson’s disease,
male genital problems, and stroke.
Physical disease is also strongly linked with suicide. Juurlink and associates
(2004)17 examined 1,354 provincial coroners’ records of Ontario residents 66
years or older who committed suicide between 1992 and 2000. Their prescrip-
tion records during the preceding 6 months were compared with those of living
matched controls (1:4) to determine the presence or absence of 17 illnesses
potentially associated with suicide. Conditions associated with suicide are
shown in Figure 11.2. Compared with patients with no identified illness, for
example, patients with three illnesses had about a threefold increase in the
estimated relative risk of suicide, and patients with five illnesses had about a
fivefold increase in risk.

2. Defining Somatic Symptoms


Defining and Eliciting ‘‘Somatic Symptoms’’
What exactly is meant by ‘‘somatic symptoms’’? At face value the answer
seems obvious: somatic symptoms are physical complaints relating to bodily
sensations. These would include aches, low energy, fatigability, muscle weak-
ness, leaden paralysis, and gastrointestinal symptoms (low appetite and weight
loss). Pain, sexual dysfunction, and sleep disturbance are certainly core
somatic symptoms, but are these strictly bodily sensations? For example,
pain may be defined as ‘‘an unpleasant sensory and emotional experience
associated with actual or potential tissue damage or is described in terms of
such damage.’’18 Thus pain (and sleep) may represent physical and psycholo-
gical aspects. Even more difficult to classify but still conventionally regarded
as somatic are concentration problems, agitation/retardation, and changes in
arousal. This short list is not exhaustive. Less common somatic symptoms of
depression might include shortness of breath, dry mouth, constipation or
diarrhea, urinary frequency or hesitancy, menstrual disturbances, dizziness,
changes in libido, palpitations, increased sweating, flushing, blurred vision,
tremor, pins and needles, restless legs, and rash. Indeed, any bodily sensation
might be included, although some symptoms may be due to medication rather
than the underlying depression.

[Figure 11.2: bar chart; vertical axis 0 to 10; bars for ischemic heart disease, rheumatoid arthritis, diabetes mellitus, hyperacidity syndromes, prostate cancer, breast cancer, Parkinson’s disease, chronic lung disease, congestive heart failure, moderate pain, urinary incontinence, seizure disorder, anxiety and sleep disorders, psychoses and agitation, depression, severe pain, and bipolar disorder.]

Figure 11.2. Suicide risk in medical and psychiatric disorders. Reprinted with permission from Juurlink DN, Herrmann N, Szalai JP. Medical illness
and the risk of suicide in the elderly. Arch Intern Med. 2004;164:1179–1184.

One study examined how reliably clinicians elicit somatic compared to
nonsomatic symptoms. In the Rhode Island MIDAS project, Zimmerman and
colleagues (2006)19 conducted an in-depth analysis of symptoms for major
depressive disorder by trained raters administering a semi-structured interview
to 1,523 psychiatric outpatients. They analyzed a 17-item bank of possible
symptoms of depression, including the standard 9 DSM items but separating
the compound criteria that encompass more than one symptom (eg, increased
sleep OR insomnia), along with non-DSM diagnostic items such as hope-
lessness, helplessness, and unreactive mood. The authors found that some
items were rated more reliably than others—for example, suicidal ideas/plan/
attempt (suicidality) achieved almost perfect agreement, whereas raters often
disagreed about what constituted psychomotor retardation (Textbox 11.1).
There was no overall pattern indicating that somatic symptoms were rated
more or less reliably than nonsomatic symptoms.

Textbox 11.1. Inter-Rater Reliability Eliciting Individual Symptoms
of Depression

Symptoms Kappa
Suicidality 0.94
Depressed mood 0.92
Insomnia 0.91
Anhedonia 0.90
Decreased appetite 0.89
Loss of energy 0.88
Indecisiveness 0.88
Thoughts of death 0.86
Psychomotor agitation 0.83
Feelings of worthlessness 0.80
Increased weight 0.79
Decreased concentration 0.78
Excessive guilt 0.76
Decreased weight 0.69
Increased appetite 0.63
Psychomotor retardation 0.63
Hypersomnia 0.54

Bold text indicates somatic symptoms.
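
To show how a kappa value of the kind listed in Textbox 11.1 is obtained, the sketch below computes chance-corrected agreement between two raters. It assumes the statistic is Cohen's kappa for two raters (the textbox does not specify the variant), and the function name is hypothetical; it is not a reconstruction of the MIDAS analysis.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters on the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    n = len(rater_a)
    # Observed proportion of subjects on whom the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is complete")
    return (observed - expected) / (1 - expected)
```

For example, if two raters each judge a symptom present in 5 of 10 patients and agree on 8 of the 10, observed agreement is 0.8, chance agreement is 0.5, and kappa is 0.6.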


Somatic Symptoms in Current Diagnostic Systems and Scales


Somatic items are included in both ICD-10 and DSM-IV. In fact, ICD-10
includes fatigue as a core feature. Neither ICD-10 nor DSM-IV gives clear
guidance about how to judge these specific symptoms in the case of depression
and physical disease (Table 11.1). As many as 70% of patients with depression
(and to a lesser extent anxiety) present with somatic symptoms as their first
complaint. Emotional symptoms are less likely to be mentioned if they are not
specifically asked about by the interviewer.20 That said, physical complaints
are seldom attributed to psychological causes, and the focus for clinical
examination is usually physical disorders with somatic symptoms.21 Thus,
somatic symptoms may indicate major depression or an underlying physical
disorder. Particular difficulty arises in the case of major depression occurring
in the context of a comorbid physical disorder. In this situation it is unclear how
to judge the significance of somatic symptoms.22,23 In an attempt to improve
upon the discriminatory value of the Beck Depression Inventory (BDI) and the
Zung Self-Rating Depression Scale, questionnaires such as the Hospital
Anxiety and Depression Scale (HADS) and the General Health
Questionnaire (GHQ-12) omit most somatic symptoms of depression in favor
of cognitive aspects.24 Most commonly fatigue and appetite and weight
changes are omitted.25 In this approach somatic symptoms are assumed to
contaminate a diagnosis of comorbid depression. The concern is that somatic
symptoms may lead to an overdiagnosis of depression because of the lack of

discrimination regarding the cause of the symptoms.26 One way to investigate
this is to compare the ability of somatic items to distinguish between healthy
controls and those with major depression. A second method is to compare the
ability of somatic items to distinguish between those with uncomplicated major
depression and those with comorbid major depression and physical illness. A
third method is to compare those with comorbid depression and those with
physical illness alone. We consider each of these in turn below.

Table 11.1. Somatic Symptoms of Depression in ICD and DSM

Somatic or Nonsomatic   Core Symptom                           ICD-10       DSM-IV
Nonsomatic              Persistent sadness or low mood         Yes (core)   Yes (core)
Nonsomatic              Loss of interests or pleasure          Yes (core)   Yes (core)
Somatic                 Fatigue or low energy                  Yes (core)   Yes
Somatic                 Disturbed sleep                        Yes          Yes
Somatic                 Poor concentration or indecisiveness   Yes          Yes
Nonsomatic              Low self-confidence                    Yes          No
Somatic                 Poor or increased appetite             Yes          No
Nonsomatic              Suicidal thoughts or acts              Yes          Yes
Somatic                 Agitation or slowing of movements      Yes          Yes
Nonsomatic              Guilt or self-blame                    Yes          Yes
Somatic                 Significant change in weight           No           Yes

3. Diagnostic Accuracy of Somatic Symptoms in Depression


Given the almost endless list of possible somatic symptoms, it is important to
first establish which, if any, are of diagnostic significance in primary depres-
sion and then in the diagnosis of comorbid depression and physical illness. For
example, Chochinov and associates (1994)27 compared results from semi-
structured diagnostic interviews in 130 patients receiving palliative care.
Diagnoses according to the Research Diagnostic Criteria (RDC) were com-
pared with diagnoses made according to Endicott’s revised criteria (which
replace the somatic symptoms change in weight or in appetite, sleep distur-
bance, loss of energy, and reduced concentration with the nonsomatic alter-
natives depressed appearance, social withdrawal, brooding, self-pity or
pessimism, and lack of reactivity). The authors found that including somatic
symptoms in the diagnostic criteria increased the rates of diagnosis, but only
when these symptoms were used in conjunction with a low-threshold approach.
Similarly, Dugan and coworkers (1998)28 analyzed the Zung Self-Rating
Depression Scale both with and without somatic items and reported 5% more
false positives when measuring depression in cancer with somatic items.
However, to confirm or refute this effect, a diagnostic validity study is
needed in which somatic symptoms are added or removed from the model to
examine the effect on accuracy of ruling in or ruling out the condition
according to the gold standard. Once this information is gathered, then a
decision can be made whether to include or exclude the somatic symptoms.
A slightly more sophisticated approach uses somatic symptoms only if they are
caused by depression (Textbox 11.2). In reality this etiologic approach is
challenging, because causation of specific symptoms is usually impossible to
establish except in the crudest terms.
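
The diagnostic validity study proposed above can be sketched as follows: the same patients are scored with and without somatic items, and each version is compared against the gold standard. This is a minimal illustration under stated assumptions; the `Patient` fields, the function name, the cutoffs, and the toy data are all hypothetical and do not come from any study cited here.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Patient:
    somatic_score: int      # sum of somatic items on the scale
    nonsomatic_score: int   # sum of nonsomatic items on the scale
    depressed: bool         # gold-standard diagnosis


def sensitivity_specificity(
    patients: Sequence[Patient], cutoff: int, include_somatic: bool
) -> Tuple[float, float]:
    """Accuracy of the scale against the gold standard at a given cutoff."""
    tp = fp = tn = fn = 0
    for p in patients:
        score = p.nonsomatic_score + (p.somatic_score if include_somatic else 0)
        positive = score >= cutoff
        if positive and p.depressed:
            tp += 1
        elif positive and not p.depressed:
            fp += 1
        elif not positive and p.depressed:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("sample needs both depressed and nondepressed patients")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy sample: two depressed and two nondepressed patients.
toy_sample = [
    Patient(somatic_score=3, nonsomatic_score=8, depressed=True),
    Patient(somatic_score=4, nonsomatic_score=5, depressed=True),
    Patient(somatic_score=5, nonsomatic_score=2, depressed=False),
    Patient(somatic_score=1, nonsomatic_score=2, depressed=False),
]
print(sensitivity_specificity(toy_sample, cutoff=7, include_somatic=True))
print(sensitivity_specificity(toy_sample, cutoff=5, include_somatic=False))
```

In this contrived sample the physically ill but nondepressed patient with a high somatic score becomes a false positive only when somatic items are included. Whether real data behave this way is exactly the empirical question at issue, and the cutoff would normally be re-derived for each version of the scale.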
One reason for uncertainty is that the rate of somatic complaints is not clear
in each subgroup. For example, although somatic symptoms are certainly
common in depressed patients, they also appear to be common in the general
population: more than 75% of respondents in one community study reported at
least one somatic complaint during the previous 30 days.29 The most common
symptoms were tiredness (50%), headache (42%), and lower back pain (35%).

Textbox 11.2. Approaches to Somatic Symptoms of Depression

Inclusive
The inclusive approach uses all of the symptoms of depression, regardless of
whether they may or may not be secondary to a physical illness. This
approach is used in the Schedule for Affective Disorders and Schizophrenia
(SADS) and the Research Diagnostic Criteria.
Etiologic
The etiologic approach attempts to assess the origin of each symptom and
counts a symptom of depression only if it is clearly not the result of the
physical illness. This is proposed by the Structured Clinical Interview for
DSM and Diagnostic Interview Schedule (DIS), as well as the DSM-III-R/IV.
Substitutive
The substitutive approach assumes somatic symptoms are a contaminant and
replaces these with additional cognitive symptoms. However, it is not clear
what specific symptoms should be substituted.
Exclusive
The exclusive approach eliminates somatic symptoms but without
substitution. There is concern that this might lower sensitivity, with an
increased likelihood of missed cases (false negatives).
Adapted from Trask PC. Assessment of depression in cancer patients. J Natl Cancer Inst Monogr.
2004;32:80–92.
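
The four approaches in Textbox 11.2 differ only in which symptoms are allowed to count toward a diagnosis, which the following sketch makes concrete. The symptom labels are illustrative assumptions rather than official criterion wording; the substitute items follow the Endicott alternatives named earlier in this section, and the function name is hypothetical.

```python
from typing import AbstractSet

# Illustrative labels only; assumptions made for this sketch.
STANDARD_CRITERIA = frozenset({
    "depressed mood", "anhedonia", "guilt",
    "change in weight or appetite", "sleep disturbance",
    "loss of energy", "reduced concentration",
})
SOMATIC_CRITERIA = frozenset({
    "change in weight or appetite", "sleep disturbance",
    "loss of energy", "reduced concentration",
})
SUBSTITUTE_ITEMS = frozenset({
    "depressed appearance", "social withdrawal",
    "brooding, self-pity or pessimism", "lack of reactivity",
})


def count_symptoms(
    present: AbstractSet[str],
    approach: str,
    attributed_to_illness: AbstractSet[str] = frozenset(),
) -> int:
    """Count the symptoms that qualify under one of the four approaches."""
    standard_present = present & STANDARD_CRITERIA
    if approach == "inclusive":
        # Every standard symptom counts, whatever its likely cause.
        return len(standard_present)
    if approach == "etiologic":
        # Symptoms judged to result from the physical illness are not counted.
        return len(standard_present - attributed_to_illness)
    if approach == "substitutive":
        # Somatic criteria are replaced by the substitute items.
        allowed = (STANDARD_CRITERIA - SOMATIC_CRITERIA) | SUBSTITUTE_ITEMS
        return len(present & allowed)
    if approach == "exclusive":
        # Somatic criteria are dropped without replacement.
        return len(standard_present - SOMATIC_CRITERIA)
    raise ValueError(f"unknown approach: {approach}")
```

For a patient with depressed mood, loss of energy, sleep disturbance, and brooding, where the loss of energy is attributed to the physical illness, the counts are 3 (inclusive), 2 (etiologic), 2 (substitutive), and 1 (exclusive). The same patient could therefore fall above or below a diagnostic threshold depending on the approach chosen.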

However, only about one third of patients with somatic symptoms seek med-
ical help. From the reverse perspective, mood disorders are a common finding
in those with somatic symptoms, accounting for approximately 30% of patients
presenting with physical complaints.30 In the Epidemiological Catchment Area
Study (ECA), the presence of physical symptoms was associated with at least a
twofold increase in anxiety or depressive disorders.31,32 In the HUNT-II study,
which surveyed all inhabitants from the Nord-Trøndelag County of Norway,
women had a mean of 3.8 somatic symptoms and men 2.9 symptoms.33 There
was a linear association between the number of somatic symptoms and the total
HADS score. Gerber and associates (1992)34 showed that sleep disturbance,
fatigue, more than three complaints, nonspecific musculoskeletal complaints,
back pain, shortness of breath, amplified complaints, and vaguely stated
complaints distinguished between depressed and nondepressed patients in a
general medical primary care practice. Better evidence was recently reported in
the Rhode Island MIDAS project. Zimmerman and colleagues (2006)35 found
that the ranked order of diagnostic weight (by individual item) for DSM-IV
membership on logistic regression was depressed mood > anhedonia > sleep
disturbance > concentration/indecision > worthlessness/excessive guilt > loss
of energy > appetite/weight disturbance > psychomotor change > death/suicidal
thoughts. In the 8.9% who fulfilled the minimum DSM-IV criteria for
major depressive disorder (five features only), increased weight, decreased
weight, and indecisiveness rarely influenced diagnostic classification; across
the whole sample, they influenced the diagnosis in only about 1% of cases.
More detailed analysis of the MIDAS project was recently reported by
Mitchell and colleagues (2008).36 We found that somatic symptoms had
value in ruling in and ruling out primary depression (Fig. 11.3). When ruling
in depression (case-finding), the most successful single symptoms were psy-
chomotor retardation, diminished interest/pleasure, indecisiveness, depressed
mood, and worthlessness. When ruling out depression (reassurance), the most
successful symptoms were depressed mood, diminished drive, loss of energy,
diminished interest/pleasure, and diminished concentration. Therefore, it may
be concluded that psychomotor retardation, loss of energy, and diminished
concentration do indeed help clinicians diagnose uncomplicated depression.
What is the evidence that somatic symptoms assist in a diagnosis of comorbid
depression?
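
The rule-in and rule-out "added value" of a single symptom, as plotted in Figure 11.3, can be sketched from its sensitivity, its specificity, and the prevalence of depression. Rule-in added value is taken as PPV minus prevalence, following the figure legend. For rule-out added value we assume the legend's "NPV-Prev" means NPV minus the prevalence of non-depression (1 minus prevalence); that reading, like the function names, is an assumption of this sketch.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Positive and negative predictive values via Bayes' theorem."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive values are undefined for these inputs")
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def added_value(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Gain over the pre-test probability when ruling in and ruling out."""
    ppv, npv = predictive_values(sensitivity, specificity, prevalence)
    rule_in = ppv - prevalence
    # Assumption: the rule-out baseline is the prevalence of non-depression.
    rule_out = npv - (1 - prevalence)
    return rule_in, rule_out
```

Under this definition, a symptom with sensitivity and specificity of 50% each adds nothing in either direction, because its predictive values equal the pre-test probabilities. A symptom that is common in both depressed and nondepressed patients would likewise show little added value, however frequently it occurs.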

4. Evidence For and Against Somatic Symptoms when
Diagnosing Comorbid Depression
Evidence from Comparative Studies of Primary Depression
versus Secondary Depression
Lipsey and colleagues (1986)37 studied 43 post-stroke depressed patients
against 43 patients with functional major depression to compare their depres-
sive symptoms. They concluded that the depressive syndrome profiles in the
two patient groups were similar, and only two symptoms were significantly
different: slowness was more common and lack of interest/concentration was
less common in post-stroke patients. Simon and associates (2005)38 examined
the validity of the DSM-IV depression criteria in 235 individuals with medical
comorbidities, including diabetes, ischemic heart disease, or chronic obstruc-
tive lung disease, versus 204 depressed subjects without those conditions. At
the midpoint of the depression severity scale, patients with medical comor-
bidity had a 54% probability of reporting fatigue compared to 45% in those
without comorbidity. All four somatic symptoms showed robust improvement
with treatment, and this improvement did not differ significantly between
patients with and without medical comorbidity. They could find only limited
evidence that fatigue, changes in weight or appetite, psychomotor agitation/
retardation, and sleep disturbance are less valid indicators of depression in
patients with chronic medical illness. Pickard and associates (2006)39 used
Rasch methods to compare symptoms of depression in 32 subjects with post-
stroke depression versus 366 depressed primary-care patients. They found that
four items demonstrated statistically significant differential item functioning:
“my sleep was restless,” “I felt that people disliked me,” “I did not feel like
eating,” and “I had crying spells.” Each of these items identified with statis-
tically significant Differential Item Functioning (DIF) demonstrated a logit
difference of approximately 0.5 or more across the two groups. Overall,
however, the authors found few differences between groups.

[Figure 11.3: bar chart; vertical axis -0.10 to 0.50; two series, Rule-In Added Value (PPV-Prev) and Rule-Out Added Value (NPV-Prev), plotted for each symptom: anger, anxiety, decreased appetite, decreased weight, depressed mood, diminished concentration, diminished drive, diminished interest/pleasure, excessive guilt, helplessness, hopelessness, hypersomnia, increased appetite, increased weight, indecisiveness, insomnia, lack of reactive mood, loss of energy, psychic anxiety, psychomotor agitation, psychomotor change, psychomotor retardation, sleep disturbance, somatic anxiety, thoughts of death, and worthlessness.]

Figure 11.3. Added value in diagnosing primary depression. Adapted from Mitchell AJ, McGlinchey JB, Young D, et al. Accuracy of specific
symptoms in the diagnosis of major depressive disorder in psychiatric out-patients: data from the MIDAS project. Psychol Med. Nov. 12, 2008:1–10.

Van Wilgen and associates (2006)40 analyzed the influence of somatic
symptoms on the Center for Epidemiologic Studies Depression Scale
(CES-D) in 509 patients with oropharyngeal, gynecologic, colorectal,
and breast cancer after treatment versus a control group of 223 depressed
patients without cancer. They concluded that the incidences of somatic
morbidity within cancer types differ, but somatic items do not interfere
with the outcome of depression as measured with the CES-D.
Interestingly, some cancer groups showed less somatic morbidity
(colorectal cancer), while others showed more (oral/oropharyngeal,
breast), than the comparison group. In the analyses of the CES-D with
and without the somatic domain, the prevalence of depression symptoms
with the somatic domain was lower for the cancer groups.
Ehrt and colleagues (2007)41 compared the individual depressive symptoms
of 145 depressed patients with Parkinson’s disease and 100 depressed patients
without Parkinson’s disease by comparing item scores on the Montgomery-
Åsberg Depression Rating Scale. Depressed patients with Parkinson’s disease
showed significantly less reported sadness, less anhedonia, fewer feelings of
guilt, and slightly less loss of energy but more concentration problems than
depressed control subjects. Thus, some but not all somatic symptoms were
increased in comorbid groups. The results of this study support the hypothesis
that depression profile in Parkinson’s disease differs to a certain extent from
that in non-Parkinson’s disease patients with major depression.
Yates and colleagues for STAR*D (2007)42 analyzed the effect of
specific somatic symptoms in separating primary depression from depres-
sion with comorbid physical disease. Clearly, if somatic symptoms were
overrepresented in the comorbid group, then the classic view that somatic
symptoms may contaminate a diagnosis of depression in physical disease
would be supported. Two somatic symptoms occurred in 80% or more of
those with noncomplicated depression and four occurred in 80% or more
of those with comorbid depression. The two most common were impaired
concentration (91%) and fatigue (87%). Although somatic symptoms were
common in patients with both depression and physical ill health, somatic
symptoms were also common in patients without comorbidity. In
particular, impaired concentration and fatigue occurred in approximately
90% of both groups. Other studies have examined this issue in relation to
comorbid depression versus healthy controls.

Evidence from Comparative Studies of Comorbid Depression versus Healthy Controls
Aikens and associates (1999)43 evaluated the depressive symptoms in 105
multiple sclerosis patients and compared the results with 80 healthy controls
as well as three other comparison groups: diabetes (n = 71), chronic pain
(n = 80), and psychiatric patients with depressive disorder (n = 37). They eval-
uated the appropriateness of omitting somatic items from the original BDI when
assessing depressive symptoms in multiple sclerosis patients. They suggested
that somatic items appear to function quite normally for this group, with
psychometric indices comparable to those observed in psychiatric and nonpsy-
chiatric samples, and recommended against dropping items from the original
BDI for routine depression assessment in multiple sclerosis samples.
Guo and colleagues (2006)44 looked at a small sample of 33 cancer patients,
13 patients with major depression without cancer, and 12 normal comparison
subjects. The authors examined which HAM-D items would optimize the
diagnosis of depression among cancer patients. Their final model contained
six HAM-D items, combining somatic and nonsomatic items (late insomnia,
agitation, psychic anxiety, diurnal mood variation, depressed mood, and gen-
ital symptoms). At a cutoff of 6 the sensitivity was 81.3% and specificity
87.5%. However, in this study, certain somatic items, including middle
insomnia, retardation, somatic symptoms (gastrointestinal and general), and
loss of weight, were not discriminatory.
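The sensitivity and specificity quoted at a given cutoff are simple proportions from a two-by-two table of screening result against reference diagnosis. The sketch below shows the arithmetic. The counts are hypothetical and are not taken from Guo and colleagues, whose cell counts are not reported here; the function name is likewise an assumption.

```python
def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return sensitivity, specificity, and positive predictive value.

    tp/fn: depressed patients scoring at/above vs. below the cutoff.
    tn/fp: nondepressed patients scoring below vs. at/above the cutoff.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Hypothetical counts for illustration only (not data from any cited study).
metrics = screening_metrics(tp=40, fn=10, tn=90, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Unlike sensitivity and specificity, the positive predictive value depends on how common depression is in the sample, so it will differ between small case-control samples and routine clinical populations.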
Holzapfel and associates (2008)45 examined depressed patients with
(n = 113) and without (n = 137) chronic heart failure in relation to individual
DSM-IV depressive symptoms, as measured with the Patient Health
Questionnaire (PHQ)-9. Among the patients meeting the criteria for major
depressive disorder, patients with heart failure reported significantly lower
levels of depressed mood (p = 0.006) and worthlessness/guilt (p = 0.019) than
patients without. No significant differences were found for sleep disturbance,
loss of energy, change in appetite, poor concentration, psychomotor agitation/
retardation, and suicidal thoughts (Fig. 11.4).

Figure 11.4. Differences in severity of individual depression symptoms in patients with
major depressive disorder with and without chronic heart failure (CHF). Symptoms
plotted: loss of interest, depressed mood, sleep disturbance, loss of energy, change in
appetite, worthlessness/feelings of guilt, weak concentration, psychomotor
agitation/retardation, and suicidal thoughts. Horizontal axis: difference in symptom
severity between CHF and non-CHF patients, from –1.0 to +1.0. Data from Holzapfel N,
Müller-Tasch T, Wild B, et al. Depression profile in patients with and without chronic
heart failure. J Affect Disord. 2008;1:53–62.

Evidence from Comparative Studies of Comorbid Depression versus Physical Illness Alone
Symptom profiles of depressed and nondepressed patients with cancer were
examined by Chen and Chang (2004),46 who recruited 121 hospitalized
patients with breast, esophageal, and head and neck cancer. Using a HADS-D
cutoff score of 11, 30 patients were classified as depressed and 91 as non-
depressed. Depressed patients showed a significantly higher occurrence rate
than nondepressed patients on insomnia (83% versus 62%), pain (83% versus
55%), anorexia (63% versus 42%), fatigue (67% versus 32%), and wound/
pressure sore (30% versus 13%). A significant chi-squared statistic with Yates
correction (χ² = 10.74, p = 0.001) indicated an association between multiple
symptoms and depression in this sample. Patients simultaneously experiencing
multiple symptoms (insomnia, pain, anorexia, and fatigue) had a significantly
higher risk of being depressed. Both groups showed similar rankings of
symptom occurrence rates.
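For readers unfamiliar with the Yates correction, the sketch below computes a continuity-corrected chi-squared statistic for a single two-by-two table. It is an illustration, not a reproduction of the authors' test: the counts are back-calculated from the reported fatigue rates (67% of 30 depressed and 32% of 91 nondepressed patients), which is an assumption, and the result is not expected to match the statistic reported for multiple symptoms. The function name is hypothetical.

```python
import math


def yates_chi_squared(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Chi-squared test with Yates continuity correction for a 2x2 table.

    Table layout: [[a, b], [c, d]]. Returns (statistic, p_value) for df = 1.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # The correction subtracts n/2 from |ad - bc|, floored at zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    statistic = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Assumed counts: fatigue present/absent in depressed (20/10)
# and nondepressed (29/62) patients, rounded from the reported percentages.
statistic, p_value = yates_chi_squared(a=20, b=10, c=29, d=62)
print(f"chi-squared = {statistic:.2f}, p = {p_value:.4f}")
```

The correction shrinks the statistic slightly relative to the uncorrected test, which makes it more conservative when one group is small, as with the 30 depressed patients here.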

Evidence from Noncomparative Studies (eg, Rasch Analysis)


Stein and coworkers (1996)47 found that somatic items of depression were less
sensitive than nonsomatic items in the diagnosis of post-stroke depression. In
this study 189 persons with unilateral ischemic or embolic cerebrovascular
accident were interviewed by a psychologist, 4 weeks or more after stroke,
using the BDI and the HAM-D. Findings suggested that the most discrimi-
nating individual symptoms of post-stroke depression were nonsomatic.
Somatic items from both scales were significantly less specific when diag-
nosing post-stroke depression than were the nonsomatic items. Somatic symp-
toms were neither specific to post-stroke depression nor added incremental
validity over nonsomatic symptoms for diagnosing post-stroke depression.
Kathol and colleagues (1990)48 investigated the relation of scores on the
HAM-D and BDI to the presence or absence of criteria-based diagnoses of
depression in cancer. The diagnoses of major depression in 152 cancer patients
differed by as much as 13% depending on the diagnostic system used. The BDI
and the HAM-D were useful tools for screening patients with depressive
symptoms but frequently misclassified those who had no major depression
according to one or more of the criteria-based diagnostic systems.
Kalichman and colleagues (2000)49 worked on overlapping somatic symp-
toms of depression and HIV disease in 357 people living with HIV/AIDS. They
directly compared the diagnostic use of the BDI and the CES-D in this single
sample. Results of a factor analysis entering the six depression factor scores
from the BDI and CES-D showed that HIV symptoms were most strongly
associated with the somatic depression symptom factors of the BDI and
CES-D. In other words, the findings suggested that depression scales that
include somatic symptoms will inflate depression scores in people living with
HIV infection, and available methods for distinguishing overlapping symptoms
should be employed when assessing people living with HIV infection.
Leentjens and coworkers (2001)50 assessed the sensitivity of individual
depressive symptoms and their relative contribution to the diagnosis of depres-
sive disorder using the Structured Clinical Interview for DSM Disorders
(SCID) in 149 patients with Parkinson’s disease. Applying the HAM-D and
the Montgomery-Åsberg Depression Rating Scale, they showed that only two
somatic symptoms, early morning awakening and reduced appetite, had good
discriminative properties. Therefore, they concluded that the core symptoms
were most important in distinguishing depressed and nondepressed
Parkinson’s disease patients.
Akechi and associates (2003)51 used data from 220 cancer patients with
major depression to examine the intercorrelations among the DSM-IV somatic
and nonsomatic symptom criteria as well as whether the presence of an
individual somatic symptom could discriminate the severity of major
depression. Appetite changes and a diminished ability to think, but not sleep
disturbance and fatigue, were significantly associated with nonsomatic
symptoms. These associations were consistent after adjusting for physical
functioning and pain. Only patients with appetite changes showed a higher severity
of depression.
De Coster and colleagues (2005)52 studied 206 patients with first-ever
stroke with the SCID for DSM-IV and the HAM-D. In a discriminant analysis
HAM-D item scores correctly classified 88.3% of patients as depressed or
nondepressed. Depressed mood discriminated best between depressed and
nondepressed stroke patients, but many psychological symptoms, such as
hypochondriasis, lack of insight, and feelings of guilt, were not very sensitive.
In contrast, somatic symptoms, such as reduced appetite, psychomotor retarda-
tion, and fatigue, had high discriminative properties.

5. Implications for Screening


Somatic symptoms have a role in the diagnosis of uncomplicated depression,
but their role in comorbid depression has been subject to considerable confu-
sion. Two early studies suggested that including somatic symptoms in scales
might result in an overdiagnosis of comorbid depression in cancer (low
specificity and low positive predictive value). Since that time, our search
revealed six studies comparing primary depression and secondary depression,
three studies comparing comorbid depression and healthy controls, but only
one study comparing comorbid depression versus physical illness alone. From
the first group, somatic symptoms were certainly common in patients with
comorbid depression, but they were also common in those with uncomplicated
depression, less common in patients with physical illness alone, and least
common in healthy controls. Taking the example of cancer, individuals with
cancer undergoing active treatment clearly have numerous somatic symptoms.
Indeed, compared with healthy controls, individuals with cancer have a higher
level of all somatic symptoms rated by items 14 to 21 on the BDI, with the
exception of loss of libido.53 However, such differences are easy to over-
estimate. Individuals with comorbid and uncomplicated depressions have an
even higher rate of somatic symptoms. Overall, somatic symptoms did not
emerge as insignificant in primary or secondary depressions. Indeed, of the
possible list of symptoms potentially discriminating depressed patients with
and without comorbid physical illness, several nonsomatic items such as guilt
appear to be better discriminators than somatic symptoms (see Fig. 11.4). Thus,
the formulation of custom secondary depression scales by indiscriminately
omitting somatic items does not appear to be justified. That said, it is possible
that certain medical disorders might be atypical and feature somatic symptoms
that have special significance. For example, van Wilgen and colleagues
Table 11.2. Systematic Review of Comparative Studies Examining Value of Somatic Symptoms in Comorbid Depression

Year: 1997
Reference: Suh T, Gallo JJ. Symptom profiles of depression among general medical service users compared with speciality mental health service users. Psychol Med. 1997;27(5):1051–1063.
Method: ECA (Epidemiologic Catchment Area) program: series of epidemiologic surveys conducted by collaborators (1980–1984) at 5 sites in US. ECA data include both community and institutional populations interviewed in person. Measurement strategy: used standardized and generally pre-coded questions as part of highly structured interview administered by an agency lay interviewer with DIS (Diagnostic Interview Schedule) training. Logistic regression models were used to implement item response theory in the framework of the symptom criteria of major depression in DSM-III.
Setting: 4,931 and 363 household respondents from 3 ECA sites (Baltimore, Durham, and Los Angeles) who used general medical sector or speciality mental health, respectively, within 6 months of interview.
Results (Description): (1) Except for gender, there were significant differences between the two groups according to the sociodemographic factors (p < 0.001). (2) Speciality mental health service users were more likely to report all the depression symptoms. (3) General medical users were less likely to report dysphoria (OR = 0.49; 95% CI = 0.33–0.72) and worthless/sinful/guilty (OR = 0.55; 95% CI = 0.35–0.86) after holding constant the level of depression but were more likely to report fatigue (OR = 1.82; 95% CI = 1.17–1.83).
Supports Unique Scales? (Yes, no, uncertain): Uncertain

Year: 2001
Reference: Leentjens AFG, Marinus J, Van Hilten JJ, et al. The contribution of somatic symptoms to the diagnosis of depressive disorder in Parkinson’s disease. J Neuropsychiatry Clin Neurosci. 2003;15:74–77.
Method: DSM-IV diagnosis of depressive disorder was considered the gold standard. All patients completed the Hamilton Rating Scale for Depression (HAM-D) and 111 patients completed the MADRS, which were highly significant and used as symptom checklists. The contribution of the individual items of these scales to the diagnosis of “depressive disorder” was calculated by discriminant analysis. Then, a correlation coefficient with this discriminant function was obtained for each of the individual items on these scales to reflect the relative strength of association of each symptom with the discriminant function. Wilks’ lambda was calculated as a test of the discriminant function. Physical disability and cognitive status were rated according to the Hoehn and Yahr staging system (I–V) and Mini-Mental State Examination, respectively.
Setting: 169 patients with primary PD, as defined by the United Kingdom Parkinson’s Disease Society Brain Bank (UK-PDS-BB), were referred from the neurologic outpatient department for a protocolized mental status examination. 20 (11.8%) were excluded because of dementia.
Results (Description): Using the HAM-D, suicidality was the best discriminator between depressed and nondepressed patients, followed, in descending order, by feelings of guilt, psychic anxiety, reduced appetite, depressed mood, and reduction of work and interest. Most somatic items had low discriminative properties, but reduced appetite and early-morning wakening (or late insomnia) had relatively high discriminative properties. On the MADRS, the two “core” symptoms of depression, depressed mood and anhedonia, had the highest correlation coefficients. Somatic items as well as the item “concentration difficulties” had low correlation coefficients. However, reduced appetite was a relatively important indicator of depression. Following a post hoc analysis, it was discovered that after excluding the somatic items of the HAM-D (items 4, 5, 6, 8, 11–14, and 16), 86.6% of patients were correctly classified as depressed or nondepressed. After excluding the somatic items of the MADRS (items 4–7), 88.3% of the patients were classified correctly.
Supports Unique Scales? No

Year: 1990
Reference: Kathol RG, Mutgi A, Williams J, et al. Diagnosis of major depression in cancer patients according to four sets of criteria. Am J Psychiatry. 1990;147:1021–1024.
Method: DSM-III and RDC (Research Diagnostic Criteria): all symptoms were recorded regardless of etiology. DSM-III-R: only symptoms that had no definite relationship with physical condition. Endicott criteria: to identify depression retrospectively. The t-test and chi-square (χ²) test were used to assess differences in parametric and nonparametric scores, respectively.
Setting: In an investigation of the treatment of depression in patients with terminal solid tumors, 152 of 808 patients (age 16–88, 59% female) reported symptoms of depression during clinical evaluation or screening with the Hamilton scale and/or Beck inventory. All of them had potentially fatal solid tumors at different stages.
Results (Description): One third of patients had major depression according to one or more of diagnostic systems. BDI total score of <11 predicted 7% major depression according to DSM-III, RDC, and the Endicott criteria but not DSM-III-R. At BDI and Hamilton scale scores of 11–25 and 15–19 respectively, the percentage of major depression dropped substantially. The correlation between psychological items (1–14) and somatic symptoms (15–21) was 0.55. Somatic symptoms were less discriminating. Hamilton scale was also comparable to BDI, and in fact it allowed a greater number to be assessed with the positive predictive value.
Supports Unique Scales? No

Year: 2006
Reference: Pickard AS, Dalal MR, Bushnell DB. Comparison of depressive symptoms in stroke and primary care: Applying Rasch models to evaluate the Center for Epidemiologic Studies-Depression Scale. Value in Health. 2006;9(1):59–64.
Method: Center for Epidemiologic Studies-Depression scale (CES-D) (a 20-item scale) as a measure of depression. Depression = CES-D score 16 or higher. After informed consent, participants completed a screening questionnaire that included the CES-D.
Setting: Two data sources were analyzed: (1) 32 depressed patients who were 3 months poststroke, from two large teaching hospitals in Edmonton, Canada, age 18 or more, and (2) 366 depressed primary-care patients for which data on USA-based primary-care patients with depression were obtained from the Longitudinal Investigation of Depression Outcomes (LIDO) study, age 18–75.
Results (Description): Misfitting items (that is, MNSQ higher than 1.40) in poststroke depression included “my sleep was restless,” “I had crying spells,” “people were unfriendly,” and “I felt just as good as other people.” No items misfit the scale in the primary care-based depression group. Four items demonstrated statistically significant DIF: “my sleep was restless,” “I felt that people disliked me,” “I did not feel like eating,” and “I had crying spells.” Each of these items identified with statistically significant DIF demonstrated a logit difference of approximately 0.5 or more across the two groups.
Supports Unique Scales? No

Year: 2004
Reference: Chen ML, Chang H-K. Physical symptom profiles of depressed and nondepressed patients with cancer. Palliative Med. 2004;18(8):712–718.
Method: Depression was measured using the Hospital Anxiety and Depression Scale. Occurrence of symptoms was evaluated with the Patient Disease Symptom/Sign Assessment Scale.
Setting: 121 hospitalized patients with breast, esophageal, and head and neck cancer.
Results (Description): Using the HADS-D cutoff score of 11, 30 patients (25%) were classified as depressed and 91 (75%) as nondepressed. The Mann-Whitney test indicated that depressed patients had a significantly higher number of symptoms (p = 0.001). Depressed patients showed a significantly higher occurrence rate (p < 0.05) than that of nondepressed patients on the following symptoms: insomnia (83% vs. 62%), pain (83% vs. 55%), anorexia (63% vs. 42%), fatigue (67% vs. 32%), and wound/pressure sore (30% vs. 13%).
Supports Unique Scales? No

Year: 2000
Reference: Kalichman SC, Rompa D, Cage M. Distinguishing between overlapping somatic symptoms of depression and HIV disease in people living with HIV-AIDS. J Nerv Ment Dis. 2000;188(10):662–670.
Method: Beck Depression Inventory (BDI). The BDI consists of 21 items that reflect cognitive, affective, behavioral, and somatic symptoms of depression over the previous 7 days. Center for Epidemiological Studies Depression Scale (CES-D) is a 20-item scale that assesses symptoms of depression over the previous 7 days. Anxiety: To assess anxiety in the current study, we used the 20-item Trait-Anxiety scale from the State-Trait Anxiety Inventory (Spielberger et al., 1983). Future Pessimism: We developed a six-item scale to assess future pessimism. To assess symptoms of obsessive-compulsiveness, we used six items from the Obsessive-Compulsive scale of the Schedule for Nonadaptive Personality (SNAP; Clark, 1993).
Setting: Participants were 242 (68%) men, 110 (31%) women, and 5 (1%) transgender persons living with HIV-AIDS. The majority of the sample was African-American (76%), with 19% white participants, 2% Hispanic, and the remaining 3% of other ethnic backgrounds. They were recruited from AIDS service organizations, healthcare providers, social service agencies, community residences for people living with HIV-AIDS, and infectious disease clinics.
Results (Description): Factor scores were computed for BDI Self-Defeating Thoughts, BDI Affective Symptoms, and BDI Somatic Symptoms. We found that the strongest degree of association with HIV symptoms occurred for BDI items involving ability to work (r = 0.42), sleep (r = 0.37), fatigue (r = 0.41), appetite (r = 0.34), and worry about health (r = 0.31). The BDI items with the strongest associations with HIV symptoms were therefore those items reflecting somatic complaints. For the CES-D, HIV symptoms were significantly correlated with depression items indicating fatigue (r = 0.43), sleep (r = 0.40), appetite (r = 0.30), not being able to shake the blues (r = 0.34), feeling bothered (r = 0.33), feeling depressed (r = 0.31), and lack of concentration (r = 0.32). CES-D items that did not reflect somatic complaints were also closely associated with HIV symptoms. Results of a factor analysis entering the six depression factor scores from the BDI and CES-D showed that HIV symptoms were most strongly associated with the somatic depression symptom factors of the BDI and CES-D.
Supports Unique Scales? Uncertain

Year: 1986
Reference: Lipsey JR, Spencer WC, Rabins PV, et al. Phenomenological comparison of poststroke depression and functional depression. Am J Psychiatry. 1986;143:527–529.
Method: Structured clinical interviews. Only patients fulfilling DSM-III criteria for major depression were included in this study. The major instrument used during examination of all patients was modified PSE (Present State Examination) with 59 items specifically related to anxiety and depression.
Setting: The 43 poststroke patients were 23 acutely ill inpatients, 14 patients admitted for rehabilitation following acute stroke, and 6 referred to outpatient clinic for poststroke depression. The 43 functionally depressed patients were from inpatient admissions for major depression to the hospital.
Results (Description): Poststroke patients had similar Hamilton depression scores to those of functional depression. PSE syndrome profiles were remarkably similar between groups. Of the 17 depression symptoms, only 2 were significantly different: slowness was more frequent and lack of interest and concentration was less frequent in poststroke patients.
Supports Unique Scales? No

Year: 2008
Reference: Holzapfel N, Müller-Tasch T, Wild B, et al. Depression profile in patients with and without chronic heart failure. J Affect Disorders. 2008;1:53–62.
Method: Depressed patients with chronic heart failure (CHF; n = 113) and without CHF (n = 137) were compared with respect to severity of individual DSM-IV depressive symptoms, as measured with the PHQ-9. Of all patients, only those who met DSM-IV diagnostic criteria for major depressive disorder or other depressive disorders and were able to complete the study questionnaire were included in the study. Statistical method: ANCOVAs with sociodemographic characteristics as covariates were performed separately for patients with major depressive disorder and other depressive disorders.
Setting: Of a total of 921 patients from a CHF and a psychosomatic outpatient clinic of the Medical Hospital at the University of Heidelberg, 137 met DSM-IV diagnostic criteria for major depressive disorder and 113 for other depressive disorders. Depressed patients with CHF (n = 113) and without CHF (n = 137).
Results (Description): The 677 patients from the CHF outpatient clinic ranged in age from 16 to 90 years. 42 patients (6.2%) met the diagnostic criteria for major depressive disorder, and 71 patients (10.5%) met the diagnostic criteria for other depressive disorders according to the PHQ-9 diagnostic algorithm. 244 patients had evidence of CHF in their medical history and record. The age range was 16 to 79 years. 248 patients from the psychosomatic outpatient clinic participated in this study. 95 patients (38.9%) met the diagnostic criteria for major depressive disorder, and 42 patients (17.2%) met the diagnostic criteria for other depressive disorders according to the PHQ-9 diagnostic algorithm.
Supports Unique Scales? No

Year: 2006
Reference: Ehrt U, Brønnick K, Leentjens AFG, et al. Depressive symptom profile in Parkinson’s disease: a comparison with depression in elderly patients without Parkinson’s disease. Int J Geriatr Psychiatry. 2006;21(3):252–258.
Method: We compared the individual depressive symptoms of 145 non-demented depressed patients with Parkinson’s disease (PD) and 100 depressed patients without PD by comparing item scores on the Montgomery-Åsberg Depression Rating Scale by way of MANCOVA. Dementia was diagnosed according to DSM-IIIR (Stavanger) or DSM-IV (Maastricht) after a clinical examination that included an interview with a caregiver in addition to cognitive testing.
Setting: PD patients included in this study came from two different cross-sectional studies: a community study in Norway and an outpatient study in the Netherlands. PD patients from both populations were included in the study if they did not suffer from dementia and had at least mild depressive symptoms, which was operationalized as a MADRS sum score ≥ 7 (Snaith et al., 1986). The control group consisted of 100 consecutive patients referred to the old age psychiatry outpatient clinic at Stavanger University Hospital, Norway, suffering from at least mild depressive symptoms with a MADRS score ≥ 7. In both populations, cognition was assessed with the Mini-Mental State Examination (MMSE).
Results (Description): Patients with PD had less reported sadness, slightly less loss of energy, more concentration problems, fewer feelings of guilt, and lower anhedonia.
Supports Unique Scales? Uncertain

Year: 1994
Reference: Chochinov HM, Wilson KG, Enns M, et al. Prevalence of depression in the terminally ill: effects of diagnostic criteria and symptom threshold judgments. Am J Psychiatry. 1994;151:537–540.
Method: Semi-structured diagnostic interviews were conducted with 130 patients receiving palliative care. Diagnoses according to the RDC (Research Diagnostic Criteria) were compared with diagnoses according to Endicott’s revised criteria (which involve replacing somatic symptoms [change in weight or in appetite, sleep disturbance, loss of energy, and reduced concentration] with nonsomatic alternatives [depressed appearance, social withdrawal, brooding, self-pity or pessimism, and lack of reactivity]) when either a low-severity or a high-severity threshold for classifying RDC criterion A symptoms was used.
Setting: 130 inpatients from two hospital-based palliative care services with solid tumors were interviewed. Mean age 71.5 (SD = 11.0).
Results (Description): A low-threshold (less stringent) diagnostic approach greatly increased the overall prevalence of major and minor depressive episodes with both the RDC and the Endicott criteria. With high thresholds, the RDC and the Endicott criteria were equivalent, whereas with low thresholds the Endicott substitutions identified fewer cases of major (but not minor) depression.
Supports Unique Scales? Uncertain

Year: 2006
Reference: Vilalta-Franch J, Garre-Olmo J, López-Pousa S, et al. Comparison of different clinical diagnostic criteria for depression in Alzheimer’s disease. Am J Geriatr Psychiatry. 2006;14(7):589–597.
Method: This was a cross-sectional, observational study of 491 patients with probable Alzheimer’s disease. Depression was diagnosed using five classification systems (ICD-10, DSM-IV, Cambridge Examination for Mental Disorder of the Elderly [CAMDEX], Provisional Diagnostic Criteria for depression in AD [PDC-dAD], Neuropsychiatric Inventory [NPI]). Cognitive function was assessed by MMSE and CAMCOG.
Setting: The patients who completed the baseline visit of the EDAC study from 1998–2003 where CAMDEX and NPI were used.
Results (Description): The prevalence of depression was 4.9% (95% CI: 3.2–7.1) according to ICD-10 criteria; 9.8% (95% CI: 7.3–12.6) according to CAMDEX; 13.4% (95% CI: 10.6–16.6) according to DSM-IV; 27.4% (95% CI: 23.6–31.5) according to PDC-dAD criteria; and 43.7% (95% CI: 39.4–48.2) when using the screening questions from the NPI depression subscale. The level of agreement between the classification systems was low to moderate (<0.52). The characteristics associated with the most diagnostic disagreement were loss of confidence or self-esteem and irritability.
Supports Unique Scales? Uncertain

Year: 1996
Reference: Stein PN, Sliwinski MJ, Gordon WA. Discriminative properties of somatic and nonsomatic symptoms for post-stroke depression. Clin Neuropsychologist. 1996;10:141–148.
Method: Mood evaluation comprised the Beck Depression Inventory (BDI) and the Hamilton Rating Scale for Depression (HRSD).
Setting: Average age = 67. Patients were from three hospitals in New York City. At least 4 weeks after a unilateral cerebrovascular accident (CVA) of ischemic and/or embolic origin without past history of psychiatric illness, neurologic disease, and substance abuse.
Results (Description): Somatic items from both scales were significantly less specific when diagnosing poststroke depression than were the nonsomatic (intrapsychic) items. Somatic symptoms were neither specific to poststroke depression nor added incremental validity over nonsomatic symptoms for diagnosing poststroke depression.
Supports Unique Scales? Yes

Year: 2005
Reference: Simon GE, Von Korff M. Medical comorbidity and validity of DSM-IV depression criteria. Psychol Med. 2006;36:27–36.
Method: Telephone assessments at baseline, 2 months, and 6 months included the Structured Clinical Interview for DSM-IV and other measures of depression severity and functional status. Item Response Theory analyses compared patterns of depressive symptoms across groups and specifically evaluated somatic symptoms (fatigue, change in weight or appetite, psychomotor agitation/retardation, and sleep disturbance) as indicators of depression.
Setting: At staff-model clinics of Group Health Cooperative (GHC), a prepaid health plan serving 450,000 members in western Washington state. Computerized records were used to identify all adult health-plan members filling new (no more than last 150 days) prescriptions for antidepressants from primary care physicians, those with visit diagnoses of depression, to exclude patients with diagnoses of bipolar disorder or psychotic disorder and identify patients with specific comorbid medical conditions.
Results (Description): Overall item response analysis indicated differential item functioning between groups (χ² = 33.7, df = 18, p = 0.017). Two of eight item-level comparisons were statistically significant; one in the predicted direction (patients with comorbidity reported more fatigue at low levels of depression: χ² = 17.9, df = 1, p < 0.001) and one in the opposite direction from predicted (patients with comorbidity reported less psychomotor agitation/retardation at low levels of depression: χ² = 8.0, df = 1, p = 0.005). Observed differences were modest: at the midpoint of the depression severity scale, patients with medical comorbidity had a 54% probability of reporting fatigue compared to 45% in those without comorbidity.
Supports Unique Scales? No

Year: 2005
Reference: de Coster E, Leentjens AFG, Lodder J, et al. The sensitivity of somatic symptoms in post-stroke depression: a discriminant analytic approach. Int J Geriatr Psychiatry. 2005;20:358–362.
Method: Structured Clinical Interview for DSM-IV within 1 month after the stroke to confirm or reject the diagnosis of major depressive disorder. Severity of depression was measured with the Hamilton Depression Rating Scale (HAM-D). At the follow-up interviews at 3, 6, 9, and 12 months, depression was diagnosed using a two-step procedure. First, three psychiatric rating scales for depression (BDI, HADS, SCL-90) were administered. Then patients who exceeded the previously defined cutoff value on at least one of these scales were called in and reinterviewed using the SCID and HAM-D.
Setting: From 1/9/1997 to 1/9/1999, all eligible patients with an acute first-ever clinical presentation of cerebral infarction who were seen in the Accident and Emergency Department and the Department of Neurology of Maastricht University Hospital were entered in a prospective stroke registry. The only inclusion criterion was an ischemic stroke.
Results (Description): Wilks’ lambda, as a test of discriminant function, was highly significant (p < 0.001). In total 88.3% of the patients were correctly classified as depressed or nondepressed. In this discriminant model, as expected, depressed mood was the best discriminator between depressed and nondepressed patients, followed by reduced appetite, thoughts of suicide, psychomotor retardation, psychic anxiety, and fatigue.
Supports Unique Scales?: No


Year: 2003
Reference: Akechi T, Nakano T, Akizuki N, et al. Somatic symptoms for diagnosing major depression in cancer patients. Psychosomatics. 2003;44:244–248.
Method: A computerized database was used to identify patients with major depression. The database included demographic factors, medical factors such as performance status and pain, and psychiatric diagnoses based on a structured clinical interview based on the DSM-IV criteria.
Setting: 220 of a total of 1,721 cancer patients referred to the Psychiatry Divisions at National Cancer Center Hospital and National Cancer Center Hospital East in Japan between 1996 and 1999 were reviewed.
Results (Description): The results of the logistic regression analyses demonstrated that weight loss or appetite change and a diminished ability to think or concentrate were positively associated with a diminished interest or pleasure after adjusting for possible physical confounders. Patients with weight loss or appetite change showed a significantly higher severity of major depression than those without this symptom (p < 0.003), while patients with the other three somatic symptoms did not (sleep disturbance, fatigue, diminished ability to think).
Supports Unique Scales?: Uncertain

Year: 2006
Reference: Guo Y, Musselman D, Manatunga A. The diagnosis of major depression in patients with cancer: a comparative approach. Psychosomatics. 2006;47:376–384.
Method: SCID-32 and the dimensional 21-item HAM-D were administered to all study participants by a single rater who was either a master’s-level nurse or a fourth-year psychiatry resident. Final psychiatric diagnoses were provided by consensus of the research team, comprising the aforementioned individuals and two board-certified psychiatrists. One-way analysis of variance (ANOVA) was used to compare the continuous variables.
Setting: Study subjects with cancer were recruited from outpatients and inpatients at Emory University. Healthy comparison subjects were recruited from Emory and the surrounding community by advertisements or word of mouth.
Results (Description): The HAM-D item on weight loss did not differ between the two groups. In the cancer patients, 6 of the 21 HAM-D items were significantly associated with an increased probability of major depression: depressed mood (p < 0.004), late insomnia (p < 0.022), agitation (p < 0.008), psychic anxiety (p < 0.001), genital symptoms (p < 0.017), and diurnal variation (p < 0.046).
Supports Unique Scales?: No

Year: 2006
Reference: van Wilgen CP, Dijkstra PU, Stewart RE, et al. Measuring somatic symptoms with the CES-D to assess depression in cancer patients after treatment: comparison among patients with oral/oropharyngeal, gynecological, colorectal, and breast cancer. Psychosomatics. 2006;47:465–470.
Method: The CES-D, which contains 20 items divided in four domains, was administered to patients at least 1 year after the first cancer treatment (and to a control group). Patients with tumor recurrences were excluded. The CES-D in cancer patients has a good internal consistency (α = 0.89) and the test–retest reliability was 0.51 (p < 0.001).
Setting: The data of comparison subjects and patients with cancer were obtained from the hospital or health center database. The comparison group was matched for gender and age with the cancer group and lived in the same area as the patients with cancer.
Results (Description): With ANOVA, the cancer group scored significantly higher than the control group on the domain of Somatic Retarded Activity and Depressed Affect. The four cancer groups and the comparison group differed significantly on Total score, Somatic Retarded Activity, and Depressed Affect scores. The correlations between the domains of Somatic Retarded Activity and Depressed Affect were significant for the control group (0.54; p < 0.01) and the cancer group (0.66; p < 0.01). The cancer groups, except the oral/oropharyngeal patients, and comparison group showed significantly higher incidences of depression symptoms without the Somatic items as compared with the CES-D with Somatic items.
Supports Unique Scales?: No

Year: 1999
Reference: Aikens JE, Reinecke MA, Pliskin NH, et al. Assessing depressive symptoms in multiple sclerosis: is it necessary to omit items from the original Beck Depression Inventory? J Behav Med. 1999;22(2):127–142.
Method: Poser criteria (Poser et al., 1983) for definite or probable multiple sclerosis (MS). Mean duration since MS diagnosis was 11.0 years with moderate disability due to MS. Beck Depression Inventory was used to diagnose depression and also computed with Mohr et al.’s proposed 18-item BDI modification (BDI-18), as well as the cognitive/affective (items 1–13) and somatic (items 14–21) BDI subscales suggested by Cavanaugh et al. (1983). MS severity (MS sample only) was assessed with the widely used Expanded Disability Status Scale (EDSS; Kurtzke, 1983), administered by a neurologist.
Setting: MS sample was recruited from the Neurology Department at the University of Chicago. Healthy control (HC) subjects comprised 49 students from the University of Chicago and 39 subjects from the community.
Results (Description): Relative scores for the eight somatic BDI items were analyzed by multivariate analysis of variance with demographic variables and BDI total as covariates. The only significant difference was MS > HC (item 15). On raw scores, MS patients exceeded HCs on items 15 and 21 (sexual disinterest), but this was attributable to the low HC item endorsement. There were no other differences on somatic items or item–total correlations.
Supports Unique Scales?: No

(2006)40 found that some specific cancers showed less somatic morbidity,
while others featured more. Similarly, Ehrt and associates (2006)41 found
that depressed patients with Parkinson’s disease had less loss of energy but
more concentration problems than depressed control subjects. Ultimately, any
new scales should be tested head to head with existing tools (see Chapter 4 for
further discussion). To date, custom scales based on the exclusion of somatic
items have not been shown to be superior to existing scales in well-designed
implementation studies. We therefore conclude that, based on the evidence to
date, including somatic symptoms does not lead to many false-positive diagnoses
when attempting to diagnose depression in the context of physical disease
(Table 11.2). Indeed, the systematic exclusion of somatic symptoms might cause
under-recognition of major depression. Given the limited evidence in specific
areas, we suggest that further studies are required in relation to minor
depression and subsyndromal forms.

References
1. Niles BL, Mori DL, Lambert JF, et al. Depression in primary care: Comorbid disorders
and related problems. J Clin Psychol Med Settings. 2005;12(1):71–77.
2. Dwight-Johnson M, Sherbourne CD, Liao D, et al. Treatment preferences among
depressed primary care patients. J Gen Intern Med. 2000;15(8):527–534.
3. Berardi D, Menchetti M, De Ronchi D, et al. Late-life depression in primary care:
A nationwide Italian epidemiological survey. J Am Geriatr Soc. 2002;50(1):77–83.
4. Wells KB, Rogers W, Burnam A, et al. How the medical comorbidity of depressed
patients differs across health care settings: results from the Medical Outcomes Study.
Am J Psychiatry. 1991;148:1688–1696.
5. Yates WR, Mitchell J, Rush AJ, et al. Clinical features of depressed outpatients with and
without co-occurring general medical conditions in STAR*D. Gen Hosp Psychiatry.
2004;26(6):421–429.
6. Aragones E, Pinol JL, Labad A. Depression and physical comorbidity in primary care. J
Psychosom Res. 2007;63(2):107–111.
7. Vuorilehto M, Melartin T, Isometsa E. Depressive disorders in primary care: recurrent,
chronic, and co-morbid. Psychol Med. 2005;35(5):673–682.
8. Nuyen J, Spreeuwenberg PM, Van Dijk L, et al. The influence of specific chronic
somatic conditions on the care for co-morbid depression in general practice. Psychol
Med. 2008;38(2):265–277.
9. Cole MG, Bellavance F. Depression in elderly medical inpatients: a meta-analysis of
outcomes. Can Med Assoc J. 1997;157:1055–1060.
10. Oslin DW, Datton CJ, Kallan MJ, et al. Association between medical comorbidity and
treatment outcomes in late-life depression. J Am Geriatr Soc. 2002;50:823–828.
11. Bogner HR, Cary MS, Bruce ML, et al. The role of medical comorbidity in outcome of
major depression in primary care—The PROSPECT study. Am J Geriatr Psychiatry.
2005;13(10):861–868.
12. Egede LE. Major depression in individuals with chronic medical disorders: prevalence,
correlates and association with health resource utilization, lost productivity and
functional disability. Gen Hosp Psychiatry. 2007;29(5):409–416.
13. Katon W, Lin EHB, Kroenke K. The association of depression and anxiety with medical
symptom burden in patients with chronic medical illness. Gen Hosp Psychiatry.
2007;29(2):147–155.
14. Egede LE. Diabetes, major depression, and functional disability among US adults.
Diabetes Care. 2004;27(2):421–428.
15. Scott KM, Browne MAO, McGee MA, et al. Mental-physical comorbidity in Te Rau
Hinengaro: The New Zealand Mental Health Survey. Aust N Z J Psychiatry.
2006;40(10):882–888.
16. Nuyen J, Schellevis FG, Satariano WA, et al. Comorbidity was associated with
neurologic and psychiatric diseases: A general practice-based controlled study. J Clin
Epidemiol. 2006;59(12):1274–1284.
17. Juurlink DN, Herrmann N, Szalai JP. Medical illness and the risk of suicide in the
elderly. Arch Intern Med. 2004;164:1179–1184.
18. International Association for the Study of Pain, Subcommittee of Taxonomy. Pain
terms: a current list with definitions and notes on usage. Part II. Pain.
1979;6:249–252.
19. Zimmerman M, McGlinchey JB, Young D, et al. Diagnosing major depressive disorder,
I. A psychometric evaluation of the DSM-IV symptom criteria. J Nerv Ment Dis.
2006;194:158–163.
20. Simon GE, Von Korff M, Piccinelli M, et al. An international study of the
relation between somatic symptoms and depression. N Engl J Med.
1999;341:1329–1335.
21. Marple RL, Kroenke K, Lucey CR, et al. Concerns and expectations on patients
presenting with physical complaints: frequency, physician perceptions and actions,
and two-week outcome. Arch Intern Med. 1997;157:1482–1488.
22. Cavanaugh SV. Depression in the medically ill: critical issues in diagnostic assessment.
Psychosomatics. 1995;36:48–59.
23. Koenig HG, George LK, Peterson BL, et al. Depression in medically ill hospitalized
older adults: prevalence, characteristics, and course of symptoms according to six
diagnostic schemes. Am J Psychiatry. 1997;154:1376–1383.
24. Cavanaugh S, Clark D, Gibbons R. Diagnosing depression in the hospitalized medically
ill. Psychosomatics. 1983;24:809–815.
25. Plumb M, Holland J. Comparative studies of psychological function in patients with
advanced cancer: 2. Interviewer-rated current and past psychological symptoms.
Psychosom Med. 1981;43:243–254.
26. Kathol RG, Mutgi A, Williams J, et al. Diagnosis of major depression in cancer patients
according to four sets of criteria. Am J Psychiatry. 1990;147:1021–1024.
27. Chochinov HM, Wilson KG, Enns M, et al. Prevalence of depression in the terminally
ill: effects of diagnostic criteria and symptom threshold judgments. Am J Psychiatry.
1994;151:537–540.
28. Dugan W, McDonald MV, Passik SD, et al. Use of the Zung Self-Rating Depression
Scale in cancer patients: feasibility as a screening tool. Psychooncology.
1998;7:483–493.
29. Eriksen HR, Svendsrød R, Ursin G, et al. Prevalence of subjective health complaints in
the Nordic European countries in 1993. Eur J Public Health. 1998;8:294–298.
30. Kroenke K, Jackson J, Chamberlin J. Depressive and anxiety disorders in patients
presenting with physical complaints: clinical predictors and outcome. Am J Med.
1997;103:339–347.
31. Kroenke K, Price R. Symptoms in the community. Prevalence, classification and
psychiatric comorbidity. Arch Intern Med. 1993;153:2474–2480.
32. Simon GE, Von Korff M. Somatization and psychiatric disorder in the NIMH
Epidemiologic Catchment Area study. Am J Psychiatry. 1991;148(11):1494–1500.
33. Haug TT, Mykletun A, Dahl AA. The association between anxiety, depression, and
somatic symptoms in a large population: The HUNT-II Study. Psychosom Med.
2004;66:845–851.
34. Gerber PD, Barrett JE, Barrett JA, et al. The relationship of presenting physical
complaints to depressive symptoms in primary care patients. J Gen Intern Med.
1992;7:170–173.
35. Zimmerman M, McGlinchey JB, Young D, et al. Diagnosing major depressive disorder,
III. Can some symptoms be eliminated from the diagnostic criteria? J Nerv Ment Dis.
2006;194:313–317.
36. Mitchell AJ, McGlinchey JB, Young D, et al. Accuracy of specific symptoms in the
diagnosis of major depressive disorder in psychiatric out-patients: data from the
MIDAS project. Psychol Med. Nov. 12, 2008:1–10.
37. Lipsey JR, Spencer WC, Rabins PV, et al. Phenomenological comparison of poststroke
depression and functional depression. Am J Psychiatry. 1986;143:527–529.
38. Simon G, Von Korff M. Medical co-morbidity and validity of DSM-IV depression
criteria. Psychol Med. 2006;36:27–36.
39. Pickard AS, Dalal MR, Bushnell DB. Comparison of depressive symptoms in stroke
and primary care: applying Rasch models to evaluate the Center for Epidemiologic
Studies-Depression Scale. Value in Health. 2006;9(1):59–64.
40. van Wilgen CP, Dijkstra PU, Stewart RE, et al. Measuring somatic symptoms with the
CES-D to assess depression in cancer patients after treatment: comparison among
patients with oral/oropharyngeal, gynecological, colorectal, and breast cancer.
Psychosomatics. 2006;47:465–470.
41. Ehrt U, Brønnick K, Leentjens AFG, et al. Depressive symptom profile in Parkinson’s
disease: a comparison with depression in elderly patients without Parkinson’s disease.
Int J Geriatr Psychiatry. 2006;21(3):252–258.
42. Yates WR, Mitchell J, Rush AJ, et al. Clinical features of depression in outpatients with
and without co-occurring general medical conditions in STAR*D. J Clin Psychiatry.
2007;9:7–15.
43. Aikens JE, Reinecke MA, Pliskin NH, et al. Assessing depressive symptoms in multiple
sclerosis: is it necessary to omit items from the original Beck Depression Inventory?
J Behav Med. 1999;22(2):127–142.
44. Guo Y, Musselman D, Manatunga A. The diagnosis of major depression in patients with
cancer: a comparative approach. Psychosomatics. 2006;47:376–384.
45. Holzapfel N, Müller-Tasch T, Wild B, et al. Depression profile in patients with and
without chronic heart failure. J Affect Disorders. 2008;1:53–62.
46. Chen ML, Chang H-K. Physical symptom profiles of depressed and nondepressed
patients with cancer. Palliative Med. 2004;18(8):712–718.
47. Stein PN, Sliwinski MJ, Gordon WA. Discriminative properties of somatic and
nonsomatic symptoms for post-stroke depression. Clin Neuropsychologist.
1996;10:141–148.
48. Kathol RG, Mutgi A, Williams J, et al. Diagnosis of major depression in cancer patients
according to four sets of criteria. Am J Psychiatry. 1990;147:1021–1024.
49. Kalichman SC, Rompa D, Cage M. Distinguishing between overlapping somatic
symptoms of depression and HIV disease in people living with HIV-AIDS. J Nerv
Ment Dis. 2000;188(10):662–670.
50. Leentjens AFG, Marinus J, Van Hilten JJ, et al. The contribution of somatic symptoms
to the diagnosis of depressive disorder in Parkinson’s disease. A discriminant analytic
approach. J Neuropsychiatry Clin Neurosci. 2003;15:74–77.
51. Akechi T, Nakano T, Akizuki N, et al. Somatic symptoms for diagnosing major
depression in cancer patients. Psychosomatics. 2003;44:244–248.
52. de Coster E, Leentjens AFG, Lodder J, et al. The sensitivity of somatic symptoms in
post-stroke depression: a discriminant analytic approach. Int J Geriatr Psychiatry.
2005;20:358–362.
53. Wedding U, Koch A, Rohrig B, et al. Requestioning depression in patients with cancer:
Contribution of somatic and affective symptoms to Beck’s Depression Inventory. Ann
Oncol. 2007;18(11):1875–1881.
12
SCREENING FOR DEPRESSION IN
NEUROLOGIC DISORDERS

Andres M. Kanner

1. Depression in Stroke
2. Depression in Multiple Sclerosis
3. Depression in Epilepsy
4. Depression in Parkinson’s Disease
5. Conclusions

Context
Depression appears to be particularly common in several neurologic disorders,
including epilepsy, stroke, dementias, Parkinson’s disease, Huntington’s
disease, and multiple sclerosis. There is some evidence that the “depression”
associated with each neurologic disorder is distinct in symptoms and course.
This suggests it may be useful to have depression scales validated for each
neurologic disorder, yet most instruments appear to yield comparable acceptable
sensitivities and specificities. However, head-to-head comparisons of
scales and implementation studies are needed to resolve this issue.
Depressive disorders are a common psychiatric comorbidity of neurologic
disorders, including epilepsy, stroke, dementias, Parkinson’s disease (PD),
essential tremor, Huntington’s disease, migraines, and multiple sclerosis (MS),
to name the principal ones.1 It is typically assumed that depressive disorders
are a complication of these neurologic disorders. However, data published in the
past 15 years have suggested a bidirectional relation between depression and
stroke,2–4 epilepsy,5–7 dementia,8–10 and PD.11,12 In other words, not only are
patients with these neurologic conditions at greater risk of developing
depression, but patients with depression are at greater risk of developing one
of these disorders.

Early identification of comorbid depressive disorders is of the essence given
their negative impact on quality of life and the course and response to treatment
of most of these neurologic disorders. Unfortunately, depression often goes
unrecognized and hence untreated. Clearly the use of screening instruments by
neurologists may help remedy this problem. Several caveats need to be
considered, however. First, the clinical presentation of comorbid depressive
disorders may differ in several ways from that of primary depression, such as in
cases of depression in epilepsy.13 Second, several somatic and cognitive
symptoms are common to both primary depression and most neurologic disorders
(ie, fatigue, poor concentration, and slow thinking). Thus, a higher score may
reflect such symptoms and not a depressive episode per se. Third,
most of the available screening instruments for depression were developed for
primary mood disorders and hence may yield false-positive or false-negative
findings. Fourth, the presence of cognitive deficits related to the underlying
neurologic disorder may limit the patient’s ability to complete self-report
screening instruments reliably on his or her own. In such cases, the
use of examiner-administered instruments may yield more accurate data. The
aim of this chapter is to provide a practical review of the prevalence and
clinical manifestations of depression in four major neurologic disorders—stroke,
epilepsy, PD, and MS—as well as of its impact on the course of these
diseases and quality of life. In addition, this chapter provides a review of the
literature on the screening instruments frequently used to identify depression in
these neurologic disorders.

1. Depression in Stroke
Epidemiologic Considerations
The prevalence rates of post-stroke depression (PSD) have ranged from 30% to
50% in several cross-sectional studies.14–17 In a review of the literature,
Robinson14 calculated the pooled prevalence of all types of PSD in various
populations to be 31.8% (range 30% to 44%) from four community-based
studies. Prevalence rates ranged from 25% to 47% from studies carried out
in acute hospitals and from 35% to 72% in studies done in rehabilitation
hospitals.
As stated above, a bidirectional relation has been identified between
depressive disorders and stroke. Five studies have investigated the impact of
depression on the risk of stroke in large cohorts ranging from 1,703 to 6,675
subjects.2–4,18,19 Four found that depression increased the risk of developing
stroke, after controlling for other risk factors associated with this neurologic
condition.2,3,18,19 For example, Larson and colleagues3 followed 1,703
subjects for 13 years; patients with a depressive disorder or depressive symptoms
had a relative risk of 2.67 (confidence interval [CI] 1.08–6.63) for developing a
stroke, after controlling for vascular risk factors (ie, hypertension, diabetes,
hyperlipidemia, heart disease, and use of tobacco). May and colleagues2
followed 2,201 men aged 45 to 59 years for 14 years. Patients with significant
symptoms of depression had a relative risk of 3.36 (CI 1.29–8.71) for developing a
fatal stroke. The increased risk of stroke in patients with depression may be
mediated through a direct impact on coagulation and central nervous system
vascular parameters, and indirectly by increasing risks of cardiovascular
disease, hypertension, cardiac arrhythmias, and diabetes.14

Clinical Manifestations
PSD can present as major depressive episodes and minor depression. Various
investigators have proposed the existence of another type of PSD, referred to as
vascular depression. This is a late-onset (after the age of 65) depressive
disorder identified in patients who may have had overt or silent strokes or
subcortical bilateral white matter ischemic disease.
The occurrence of PSD peaks between the third and sixth month after the
stroke. For example, in a study of 100 stroke patients followed for 18 months,
symptoms of PSD were identified in 46% of patients during the first 2 months,
while only 12% of patients experienced their first symptoms 12 months after
their stroke.20 Among the patients with early-onset PSD, symptoms of
depression persisted 12 and 18 months later. In addition, the course of PSD can be
rather lengthy. For example, symptoms of major depression identified in 27%
of patients with a stroke persisted for approximately 1 year, while
symptoms of minor depression in 20% of stroke patients lasted for more than
2 years.21,22
Duration of PSD symptoms appears to depend on the vascular territory of
the stroke, with longer durations being identified in patients with a stroke in the
middle cerebral artery than in the posterior circulation. In one study, 82% of
patients with middle cerebral artery stroke continued to be symptomatic at the
6-month follow-up visit, versus 20% of those with posterior circulation
strokes.23 At 12 and 24 months none of the patients with posterior circulation
strokes exhibited symptoms, but 62% of those with middle cerebral artery
stroke did.
In general, the clinical manifestations of PSD are similar to those of primary
late-onset depression, with the caveat that psychomotor retardation may be
more frequently identified among patients with PSD. In fact, Lipsey and
associates24 found that slowness/psychomotor retardation was one of the
symptoms that differentiated PSD patients from patients with idiopathic
depression, who in turn reported more anhedonia and more difficulty
with concentration. Likewise, neurovegetative symptoms (eg, changes in
sleep, appetite, and sexual drive) and fatigue are common symptoms of
depressive and neurologic disorders, and PSD can worsen their severity. In
fact, in a study by Federoff and associates,16 disturbances of sleep, libido, and
level of energy were significantly more frequent among depressed than
nondepressed stroke patients at initial evaluation and at 3, 6, 12, and 24
months.
The severity of PSD has been found to correlate with the degree of
impairment of activities of daily living (ADLs), during both its acute and chronic
phases. Furthermore, the presence of cognitive disturbances such as aphasia
and even dementing processes associated with the underlying stroke may delay
the recognition of a depressive disorder. Gainotti and colleagues25 have also
suggested that patients with PSD are more likely to present with catastrophic
reactions, emotionalism, and diurnal mood variation than patients with
idiopathic depression, though these findings have not been confirmed by other
investigators.

Screening Instruments
Most of the screening instruments used in stroke patients have not been
validated for PSD.26 Therefore, there is concern that neurovegetative symptoms
could lead to false-positive diagnoses of depression. However, this does not
appear to be the case. For example, in a
study of 142 consecutive patients with stroke who were followed for 2 years,
Paradiso and coworkers27 identified 26 who met DSM-IV criteria for major
depression during their hospitalization. Excluding the vegetative symptoms
did not modify the sensitivity of the diagnosis, while the specificity decreased
only to 98%.27 Thus, the diagnosis of depression can be based solely on the
“psychological symptoms.”
On the other hand, neurovegetative symptoms can also help to
distinguish depressed from nondepressed stroke patients. For example, in a
study of 206 patients with a first stroke, de Coster and colleagues28
administered the Structured Clinical Interview for DSM-IV (SCID) and the Hamilton
Depression Scale (HAM-D, which includes seven items that identify
neurovegetative symptoms); 32% of patients met the criteria for PSD. The discriminant
model based on HAM-D item scores was highly significant and classified
88.3% of patients correctly as depressed or nondepressed.28 As expected,
“depressed mood” discriminated best between depressed and nondepressed
stroke patients. However, some somatic symptoms, such as reduced appetite,
psychomotor retardation, and fatigue, also had good discriminative properties.
Among the screening instruments developed for the identification of idiopathic
depression, the self-rating scales most frequently used in studies of PSD have
included the Beck Depression Inventory (BDI), the Hospital Anxiety and
Depression Scale (HADS), the Zung Scale (ZS), the Geriatric Depression
Scale (GDS), and the Center for Epidemiologic Studies-Depression
scale (CES-D). The HAM-D is one of the examiner-rating scales most frequently
used.
In a study of 202 consecutive patients, Aben and colleagues29 found that the
sensitivity of the BDI and the HADS ranged between 80% and 90%, while their
specificity was 60%; the HAM-D yielded a sensitivity of 78.1% and specificity
of 74.6%. In a study of 40 elderly stroke patients, 17 of whom were found to be
depressed, the GDS and the ZS had the highest sensitivity and the ZS had the
highest positive predictive value (93%).30 The GDS was also
found to be useful in the Perth Community Stroke Study, but that was not the
case with the HADS.30 By the same token, O’Rourke and associates31 did not
find the HADS to be adequate in identifying anxiety or PSD 6 months after a
stroke in a study of 105 consecutive patients.
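
Sensitivity and specificity alone do not show how often a positive screen is correct; that also depends on how common PSD is in the setting where the scale is used. The following is a minimal sketch of that arithmetic, not an analysis from any of the cited studies. It applies Bayes' rule to the HAM-D sensitivity (78.1%) and specificity (74.6%) quoted above and assumes, purely for illustration, prevalences of 30% and 50%, the bounds of the cross-sectional range given earlier in this section. The function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening test via Bayes' rule.

    All three arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# HAM-D figures quoted in the text (Aben and colleagues).
HAMD_SENSITIVITY = 0.781
HAMD_SPECIFICITY = 0.746

# Assumed prevalences: illustrative bounds only, not study-specific values.
for prevalence in (0.30, 0.50):
    ppv, npv = predictive_values(HAMD_SENSITIVITY, HAMD_SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

Under these assumptions the same scale gives a noticeably lower positive predictive value where PSD is less common, which is one reason predictive values from a small sample should not be carried over to other settings without checking the prevalence.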
Among other scales, Healey and colleagues32 tested the sensitivity and
specificity of the Brief Assessment Schedule Depression Cards (BASDEC)
and the Beck Depression Inventory-Fast Screen (BDI-FS) to identify PSD in 49
elderly patients with stroke with a mean age of 78.8 ± 6.8 years. Using cutoff
scores of 7 or more, the BASDEC yielded a sensitivity of 100% and specificity
of 95% for detecting major depression, whereas the BDI-FS (cutoff scores of 4
or more) had a sensitivity of 71% and specificity of 74%. When patients with
minor depression were included in analyses, the sensitivity for the BASDEC
decreased to 69% while the specificity remained high (97%), and the
sensitivity of the BDI-FS decreased to 62% while its specificity remained almost
unchanged at 78%.
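
One way to compare the two scales on a single footing is to convert each sensitivity and specificity pair into likelihood ratios. The sketch below is illustrative only: it uses the figures for major depression quoted above, the function name `likelihood_ratios` is hypothetical, and the ratios it prints are derived here rather than reported by Healey and colleagues. With only 49 patients, such ratios would carry wide uncertainty.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test, given proportions between 0 and 1.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if specificity >= 1.0:
        lr_positive = math.inf  # no false positives: LR+ is unbounded
    else:
        lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity and specificity for major depression as quoted in the text.
SCALES = {
    "BASDEC (cutoff 7 or more)": (1.00, 0.95),
    "BDI-FS (cutoff 4 or more)": (0.71, 0.74),
}

for name, (sensitivity, specificity) in SCALES.items():
    lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
    print(f"{name}: LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

A higher LR+ means a positive result shifts the probability of depression upward more strongly, and an LR- closer to zero means a negative result is more reassuring.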
The Post-Stroke Depression Scale (PSDS) is one of the rare scales
developed specifically for the identification of PSD.33 It has 10 items: depressed
mood, guilt feelings, thoughts of death or suicide, neurovegetative symptoms,
apathy and loss of interest, anxiety, catastrophic reactions, hyperemotionalism,
anhedonia, and diurnal mood variations. These investigators administered the
PSDS and the HAM-D to 124 stroke patients, 45 of whom met DSM-III-R
criteria for major depression and 47 for minor depression. Scores were
compared to those obtained on the same scales by 17 psychiatric patients also
diagnosed with major depression on the basis of DSM-III-R diagnostic criteria.
These investigators suggested that the PSDS demonstrated a continuum
between major and minor forms of PSD. In contrast to other authors, they
concluded that in stroke patients, a DSM-III-based diagnosis of major PSD
could be in part inflated by symptoms (such as apathy and neurovegetative
symptoms) that are typical of major depression in a patient free from brain
damage but that could be due to the brain lesion in a stroke patient. Very few
studies have published data with respect to the use of the PSDS.
The presence of aphasia may make it very difficult for clinicians to identify
PSD, and more so when using screening instruments. Sutcliffe and Lincoln34
developed the Stroke Aphasic Depression Questionnaire (SADQ) to detect
depressed mood in aphasic patients in the community. Using the SADQ, the
HADS, and the Wakefield Depression Inventory, they studied 70 stroke
patients who had been discharged from the hospital. The SADQ was also
administered to 17 aphasic patients on two occasions at a 4-week interval. The
scores on the SADQ were significantly related to other measures of depression
(r = 0.22–0.52, p < 0.05). A shortened 10-item version showed higher validity
(r = 0.32–0.67, p < 0.01). Test–retest analysis also indicated that the SADQ is
reliable over a 4-week interval (r = 0.72, p < 0.001). Sackley and colleagues35
found that the SADQ yielded a sensitivity of 77% and a specificity of 78% in a
study of 88 stroke patients, while others have replicated the utility of this
scale.36

Impact of Post-stroke Depression on the Course of the Stroke


PSD has been found to have a negative impact on recovery of cognitive
function, recovery of ability to perform ADLs, and mortality risks. With
respect to the impact of PSD on cognitive functions, Starkstein and
coworkers37 demonstrated that patients with major PSD had significantly
more cognitive deficits than nondepressed patients who experienced a similar
location and size of left-hemisphere stroke. However, this was not the case for
strokes in the right hemisphere. In a follow-up study of 140 patients, Robinson
and colleagues38 also found that the presence of major PSD was associated
with greater cognitive impairment 2 years after a stroke.
Regarding the impact of PSD on the recovery of ADLs, Parikh and collea-
gues39 found that in-hospital PSD was the most important variable predicting
poor recovery in ADLs over a 2-year period. In fact, the score of in-hospital
ADLs was not associated with the 2-year recovery. Likewise, the negative
impact of PSD on the course of strokes is reflected in the associated higher
mortality risk. Indeed, in a study of 976 stroke patients followed for 1 year,
those with PSD had 50% higher mortality than those without.40 An example of
a screening implementation study and yield for PSD is discussed in Chapter 5
and illustrated in the Appendix Fig Ap2.

2. Depression in Multiple Sclerosis


Epidemiologic Aspects
Review of various studies has indicated the presence of depressive symptoms
in approximately 80% of all patients with MS; in 20% of these patients,
psychopharmacotherapy is indicated. Several studies have estimated lifetime
prevalence rates of major depressive disorders to range from 10% to 60%.41–43
In a population-based study, Patten and associates44 found a lifetime prevalence
of major depression of 22.8% in MS patients compared to 16% in the
general population.45 Being a woman younger than 35 years old, having a
family history of major depression, and carrying high levels of stress were
found to be risk factors. In a subsequent population-based survey of 115,071
individuals, MS was identified in 322 people and major depression in 9,010
people.46 The annual prevalence of major depression in those with MS was
15.7%, compared to 7.4% in the non-MS population. As in the previous study,
being younger and female were each associated with a higher rate of depres-
sion, with a greater prevalence found in MS patients aged 18 to 45 (25.7%)
compared to those who were over 45 (8.4%). This study also demonstrated a
higher prevalence of depression among MS patients than those with other
chronic medical illnesses. Furthermore, significantly higher point prevalence
rates have been identified in MS patients than in the general population,
ranging between 27% and 54%. Likewise, the prevalence of bipolar disease
is significantly greater among MS patients than in the general population,47–49
with one study finding a 13% prevalence rate among 100 patients.49

Clinical Manifestations
Depression in MS may be indistinguishable from primary mood disorders.
However, as in the case of PSD, symptoms relating to the underlying neuro-
logic process (eg, cognitive, neurovegetative, and somatic symptoms) can also
be confused with symptoms of a depressive disorder. Such is the case of
symptoms like fatigue, which has been identified in up to 90% of MS patients
and is not necessarily associated with a mood disorder.50 On the other hand, the
impact of depressive disorders on fatigue was illustrated in a study in which
global fatigue severity was significantly reduced with an improvement of a
comorbid depression following treatment with cognitive–behavioral therapy,
sertraline, or supportive group therapy.51
As with other neurologic disorders, suicidal ideation is a serious problem in
MS and has been identified in up to 22% of patients.52 The rate of completed
suicide in patients with MS has been reported to be 7.5 times higher than would
be expected in the general population, although reviews of the literature have
found conflicting results.52–54 The risk appears to be greatest in the first 5 years
after diagnosis and in patients ages 40 to 49 years.54
Just as in idiopathic depression, comorbid anxiety in association with
depression increases the incidence of suicidality.55 For example, in a study
of 152 MS patients, anxiety symptoms were seen in 25%, and depression was
seen in 14% of the population. In the group with combined anxiety and
depression, 64% had suicidal thoughts, compared with 33% of the group that
had anxiety alone and 43% of the group with depression alone.56

Screening Instruments
Investigators and clinicians have used screening instruments developed for
primary depression. An ongoing debate, however, has centered on the potential
confounding effect that several somatic symptoms common to MS and depres-
sion may have in yielding a false-positive diagnosis of depression. Some
authors have questioned the need to exclude certain somatic symptoms when
these screening instruments are used in MS. For example, in the study by Patten
and associates cited above,46 excluding cognitive symptoms and confounding
symptoms of fatigue resulted in a drop of the overall prevalence rates of
depression in both the MS (from 15.7% to 6.8%) and non-MS populations
(from 7.4% to 3.2%). On the other hand, in a study of 42 patients with MS and
depression, Moran and Mohr57 found that the successful treatment of depres-
sion resulted in an improvement in the score of all 21 items of the BDI-II, but
only of 12 of the 17 items of the HAM-D, including the items dealing with the
somatic symptoms. These authors endorsed the use of the BDI-II in its original
form. In fact, population-based studies have found the BDI to be a good
screening instrument for depression in MS.58,59 Furthermore, in a study of 46
newly diagnosed MS patients, Sullivan and associates60 found that a BDI
cutoff score of 13 yielded a sensitivity of 71% and specificity of 79% for
major depression.
The CES-D scale has also been found to be a valuable screening instrument
in population-based studies.61,62 For example, Chwastiak and coworkers61
found clinically significant depressive symptoms (CES-D score 16 or above)
in 41.8% of 739 patients with MS; 29.1% of the subjects had moderate to
severe depression (score 21 or above). Of note, patients with advanced MS
were much more likely to experience clinically significant depressive symp-
toms than subjects with minimal disease.
The impact of cognitive impairments on the recognition of symptoms of
depression has also been a source of concern. To investigate this potential
problem, Gold and associates63 administered the HADS to 80 MS patients with
cognitive dysfunction, established with the Symbol Digit Modalities Test
(SDMT), and 107 unimpaired patients. The HADS exhibited good internal
consistency and retest reliability. The pattern and magnitude of correlations
with other health status measures supported its validity. Of note, cognitively
impaired patients had significantly higher scores in the depression and anxiety
subscales.
Abbreviated screening instruments have also been found to be effective.
Benedict and colleagues64 investigated the validity of the BDI-FS in 54 con-
secutive MS patients; 48 caregiver/informants were interviewed using the
Neuropsychiatric Inventory (NPI). The BDI-FS correlated significantly with
other self-report measures of depression and with the informant-reported
dysphoria. Furthermore, the BDI-FS scores discriminated MS patients undergoing
treatment for depressive disorder from untreated MS patients. In a study
of 260 MS patients, 26% of whom met the criteria for major depression with
the major depression module of the SCID, Mohr and associates65 investigated
the sensitivity of two questions, one referring to a depressed mood and the
second to an inability to experience pleasure. The two questions identified 99%
(95% CI 91–100%) of cases.

Impact of Depression on Quality of Life in Multiple Sclerosis


Depression has been found to significantly and independently affect quality of
life. In fact, depression has consistently been a stronger predictor of poor
quality of life than the severity of MS, as measured by the Expanded
Disability Status Scale (EDSS), which has been found to correlate modestly
with quality of life.66–70 Depression is associated not only with lower overall
quality of life, but also with sexual dysfunction and health distress beyond that
accounted for by disability status.66,67 For example, in a study of 136 MS
patients, 22.8% had a history of major depression and had significantly lower
quality-of-life scores in the areas of energy, mental health, sexual and cogni-
tive functioning, and general quality of life than did MS patients who had never
had major depressive disorder.67

3. Depression in Epilepsy
Epidemiologic Aspects
The prevalence of depression in epilepsy is higher than in a matched popula-
tion of healthy controls and ranges from 3% to 9% in patients with controlled
epilepsy to 20% to 55% in patients with recurrent seizures.71–75 For example,
in a study of 155 patients with epilepsy identified from two large primary care
practices in the United Kingdom, 33% of patients with recurrent seizures and
6% of those in remission had depression.72 A recent large population-based
study used the Canadian Community Health Survey (CCHS 1.2) to compare the
prevalence of psychiatric comorbidity in community-dwelling persons with and
without epilepsy, and it demonstrated a relatively high lifetime prevalence of
mood disorders in those with epilepsy.74 The CCHS included the administration
of the World Mental Health Composite International Diagnostic Interview to a
sample of 36,984 subjects. Epilepsy was identified in 0.6% of this cohort. A
17.4% lifetime
prevalence of major depressive disorders was found in patients with epilepsy
(95% CI 10.0–24.9) versus 10.7% (95% CI 10.2–11.2) in the general popula-
tion. Furthermore, patients with epilepsy had a 24.4% (95% CI 16.0–32.8)
lifetime prevalence for any type of mood disorder versus 13.2% (95% CI
12.7–13.7) among the general population. The lifetime prevalence of suicidal
ideation was twice as high in patients with epilepsy (25%; 95% CI 17.4–32.5)
compared to that of the general population (13.3%; 95% CI 12.8–13.8).
As in the case of stroke, a bidirectional relation has been identified between
depressive disorders and epilepsy. Indeed, three population-based studies have
demonstrated that depressive disorders can precede the onset of epilepsy.5–7
The first study was a Swedish population-based case-control study in which
depression was found to be seven times more frequent among patients with
new-onset epilepsy, preceding the seizure disorder, than among age- and sex-
matched controls.5 When analyses were restricted to cases with partial epi-
lepsy, depression was found to be 17 times more common among cases than
among controls. The second population-based study included all adults aged 55
years and older at the time of the onset of their epilepsy living in Olmsted
County, MN.6 In this study, the investigators found that a diagnosis of depres-
sion preceding the time of their first seizure was 3.7 times more frequent
among cases than among controls after adjusting for medical therapies for
depression. As in the Swedish study,5 this increased risk was greater among
cases with partial-onset seizures.
The third study, carried out in Iceland, investigated the role of specific
symptoms of depression in predicting the development of unprovoked seizures
or epilepsy in a population-based study of 324 children and adults aged 10
years and older with a first unprovoked seizure or newly diagnosed epilepsy
and 647 controls.7 Major depression was associated with a 1.7-fold increased
risk for developing epilepsy, while a history of attempted suicide was 5.1-fold
more common among cases than among controls.

Clinical Manifestations
Depression in epilepsy can mimic any of the mood disorders included in the
DSM-IV classification. However, in a significant percentage of patients, the
depressive episodes have an atypical clinical presentation that fails to meet any
of the DSM (be it III, III-R, or IV) Axis I categories.71
Symptoms or episodes of depression can be classified according to their
temporal relation with seizure occurrence. They can be identified prior to the
onset of seizures (preictal period), as an expression of the actual seizures (ictal
symptoms), following seizures (during the postictal period, which may extend
up to 120 hours after a seizure), or, more commonly, independently of seizures
(interictal symptoms). Peri-ictal symptoms are often unrecognized by clini-
cians, which accounts for the scarcity of data regarding their prevalence and
response to treatment.

Preictal Symptoms
Preictal symptoms generally manifest as a cluster of dysphoric symptoms
lasting hours, or even 1 to 3 days, prior to the onset of a seizure. In one
study, Blanchet and Frommer76 examined mood changes over 56 days in 27
patients who were asked to rate their mood on a daily basis. Changes in mood
were noted by 22 patients during the 72 hours preceding the seizure, consisting
primarily of symptoms of dysphoria, anxiety, and irritability.

Ictal Symptoms
Ictal symptoms of depression are those expressed during a simple partial
seizure.77–79 One study estimated that psychiatric symptoms occur in 25% of
auras, with 15% of these involving affect or mood changes—depression
symptoms ranked second after anxiety/fear as the most common type of ictal
affect in this study.77 Ictal symptoms of depression are usually brief and
stereotypical, develop out of context, and are associated with other ictal
phenomena. Feelings of anhedonia (inability to experience pleasure in anything),
guilt, and suicidal ideation represent the most prevalent symptoms. Ictal
symptoms of depression are often followed by an alteration of consciousness
as the ictus evolves from a simple to a complex partial seizure.

Postictal Symptoms
Postictal symptoms of depression existing in patients with epilepsy have long
been identified yet have been systematically investigated in only one study at
the Rush Epilepsy Center80 in 100 consecutive patients with poorly controlled
partial seizure disorders. The postictal period was defined as the 72 hours that
followed recovery of consciousness from a seizure or cluster of seizures, and
symptoms were identified with a 42-item questionnaire. The questions on
depressive symptoms were intended to target anhedonia, irritability, poor
frustration tolerance, feelings of hopelessness and helplessness, suicidal idea-
tion, feelings of guilt, self-deprecation, and crying bouts. Five neurovegetative
symptoms that are common postictally (including postictal fatigue, and
changes in patterns of sleep, appetite, and sexual drive) were investigated but
not classified as symptoms of depression so as not to falsely increase their
prevalence. Only those symptoms consistently identified by patients during the
postictal periods of more than 50% of their seizures were made subject to
analysis; this ensured that only the postictal symptoms of habitual occurrence
were targeted. The typical duration of each symptom was estimated. The
symptoms that also occurred during the interictal period were also identified,
and their severity during interictal versus postictal periods was compared.
Forty-three patients (43%) experienced a median of five postictal symptoms
of depression habitually (range two to nine). Thirty-five patients reported at
least two postictal symptoms with a minimum duration of 24 hours, and 13 of
these patients experienced at least seven symptoms clustered to mimic symptoms
of major depression spanning 24 hours or longer. Two thirds of symptoms
had a median duration of 24 hours or longer.
Postictal suicidal ideation was identified in 13 patients—8 experienced both
passive and active suicidal thoughts, while 5 reported only passive suicidal
ideation. Ten of these 13 patients (77%) had a past history of either major
depression or bipolar disorder, and this association was highly significant.
Furthermore, the presence of postictal suicidal ideation was also significantly
associated with a history of psychiatric hospitalization. Postictal symptoms of
depression often occurred with other psychiatric symptoms. In 23 patients
concurrent postictal symptoms of anxiety were identified, and in 7 patients a
combination of postictal symptoms of depression, psychosis, and anxiety
was seen.

Episodes of Depression During Interictal Periods


Interictal depression is the most commonly recognized manifestation of mood
disorders among patients with epilepsy. As previously stated, depressive
episodes may be identical to any of the mood disorders described in the
DSM-IV classification (eg, major depression, dysthymic disorder, bipolar
disorder). However, many cases of interictal depression fail to meet the criteria
of any of the DSM mood disorders. For example, a study by Mendez and
colleagues81 found that 50% of depressive episodes had to be classified as
atypical depression according to DSM-III-R criteria. There is a consensus
among various investigators that interictal depression in people with epilepsy
most frequently manifests as a pleomorphic cluster of symptoms of depression,
irritability, anxiety, as well as neurovegetative symptoms.82–84 It has a chronic
course that is interrupted by recurrent symptom-free periods that last hours to
several days. This mode of presentation bears the closest resemblance to a
dysthymic disorder; therefore, the term ‘‘dysthymic-like disorder of epilepsy’’
(DLDE) has been suggested.82 In a study of 97 consecutive patients with a
depressive episode severe enough to warrant pharmacotherapy, DLDE was
identified in 69 (70%) patients.82 Overall, the severity of these depressive
disorders was milder than that of a major depressive episode; however, they
caused sizeable disruptions in patients’ daily activities, social relations, and
quality of life.
In 1923, Kraepelin84 published a description of interictal depressive epi-
sodes in patients with epilepsy suggesting the pleomorphic characteristics of
their symptoms. Six decades later, Blumer expanded on Kraepelin’s observa-
tions and coined the term ‘‘interictal dysphoric disorder’’ (IDD) to refer to this
type of depressive disorder.83 Blumer suggested that IDD consists of the
following eight intermittent affective-somatoform symptoms: irritability,
depressive moods, anergia, insomnia, pains, anxiety, phobic fears, and
euphoric moods. In his opinion, the presence of three affective-somatoform
symptoms was sufficient to be associated with significant disability. Nearly
one third to one half of patients with epilepsy seeking medical care present a
clinical picture compatible with an IDD of sufficient degree to warrant phar-
macologic treatment. IDD tends to develop 2 years or more after the onset of
epilepsy.
The suicide rate in depressed patients with epilepsy is five times higher than
predicted in the overall population (this figure rises to 25 times higher in
patients with partial seizures of temporal lobe origin).85 In a review of the
literature, Gilliam and Kanner86 concluded that suicide has one of the highest
standardized mortality ratios (SMRs) of all causes of death in persons with
epilepsy. Furthermore, Robertson87 reviewed 17 studies pertaining to mortality
in epilepsy and established that suicide occurred 10 times more frequently than
in the general population. In a population-based incidence cohort study from
Iceland, Rafnsson and associates88 found that suicide had the highest SMR
(5.8) of all causes of death.
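
For readers less familiar with the measure, a standardized mortality ratio (SMR) is the number of deaths observed in a cohort divided by the number expected if the cohort had died at the reference population's rates. The sketch below illustrates the arithmetic only. The age strata, person-years, reference rates, and observed count are all hypothetical and are not taken from any of the studies cited above.

```python
def expected_deaths(person_years, reference_rates):
    """Deaths expected if the cohort experienced the reference population's rates.

    Both arguments are dicts keyed by stratum (for example, an age band).
    Rates are deaths per person-year.
    """
    return sum(py * reference_rates[stratum] for stratum, py in person_years.items())


def standardized_mortality_ratio(observed, expected):
    """SMR = observed deaths / expected deaths."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


# HYPOTHETICAL cohort follow-up and reference suicide rates (illustration only).
person_years = {"18-44": 4000, "45-64": 3000, "65+": 1000}
reference_rates = {"18-44": 0.0001, "45-64": 0.0002, "65+": 0.0003}
observed_suicides = 6  # hypothetical count

expected = expected_deaths(person_years, reference_rates)
smr = standardized_mortality_ratio(observed_suicides, expected)
print(f"expected = {expected:.1f}, SMR = {smr:.1f}")
```

An SMR above 1 indicates more deaths than expected, so a reported SMR of 5.8 corresponds to nearly six times the expected number of suicides.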

Screening Instruments for Depressive Disorders in Epilepsy


A six-item screening instrument, the Neurological Disorders Depression
Inventory for Epilepsy (NDDI-E), was recently validated to identify major
depressive episodes specifically in patients with epilepsy.89 This instrument
was constructed to minimize the potential for confounding by adverse events
related to antiepileptic drugs or cognitive problems associated with epilepsy
that plague other instruments. Completion of this instrument takes less than 3
minutes. A score of 14 and higher is suggestive of a major depressive disorder,
with a sensitivity of 81% and specificity of 90%.
Other self-rating screening instruments developed to identify symptoms of
depression in the general population, the BDI-II and the CES-D, have been
recently found to be valid in patients with epilepsy.90 To date, they have been
the most frequently used instruments in research studies. Jones and collea-
gues90 found a mean sensitivity of 93% and specificity of 80% for both of these
instruments and a very high negative predictive value (0.98) but lower positive
predictive value (0.47). Ettinger and associates91 investigated the presence of
symptoms of depression with the CES-D among 775 people with epilepsy, 395
people with asthma, and 362 healthy controls identified from a cohort of
85,358 adults aged 18 years and older. Patients with epilepsy experienced
symptoms of depression with a significantly greater frequency (36.5%) and
severity than people with asthma (27.8%) and healthy controls (11.8%). Of
note, 38.5% of people with epilepsy whose score on the CES-D suggested the
presence of a depressive disorder and 43.7% of people with asthma and
depression had never been previously evaluated for depression. The same
group of investigators compared the lifetime prevalence rates of bipolar symptoms
and past diagnoses of bipolar I and II disorder with the Mood Disorder
Questionnaire (MDQ) among subjects who identified themselves as having
epilepsy and those with migraine, asthma, diabetes mellitus, or a healthy
comparison group.91 Bipolar symptoms, evident in 12.2% of epilepsy patients,
were 1.6 to 2.2 times more common in subjects with epilepsy than in those with
migraine, asthma, or diabetes mellitus and 6.6 times more likely to occur than
in the healthy comparison group. A total of 49.7% of patients with epilepsy
who screened positive for bipolar symptoms were diagnosed with bipolar
disorder by a physician, nearly twice the rate seen in other disorders.
However, 26.3% of MDQ-positive epilepsy subjects carried a diagnosis of
unipolar depression, and 25.8% had neither a unipolar nor a bipolar depression
diagnosis. These data question the reliability of this instrument in identifying
bipolar disorders in patients with epilepsy.
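
The contrast between the high negative predictive value (0.98) and the lower positive predictive value (0.47) that Jones and colleagues report for the BDI-II and CES-D follows from Bayes' rule once the prevalence of the disorder is taken into account. The minimal sketch below uses the sensitivity and specificity quoted above. The 16% prevalence is an assumption, back-calculated here for illustration because it reproduces the reported predictive values; it is not a figure stated in the text.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) of a screening test from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity 93% and specificity 80% as quoted in the text.
# ASSUMPTION: 16% prevalence of major depression (illustrative, not from the source).
ppv, npv = predictive_values(0.93, 0.80, 0.16)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.47, NPV = 0.98
```

With a prevalence in this range, a negative screen is highly reassuring, whereas roughly half of positive screens would be false positives and need confirmation by clinical interview.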
The HADS is an attractive screening instrument because it helps
identify symptoms of depression and anxiety at once.92 Self-rating instru-
ments with too many items that identify somatic symptoms, like the PHQ-
9, may be problematic and yield a false-positive diagnosis of depres-
sion.93 Among the examiner-administered instruments, the HAM-D may
not be as useful in patients with epilepsy as the high score may result
from the sedative adverse events of antiepileptic drugs and not be the
expression of an underlying mood disorder.94

Impact of Depressive Disorder on Treatment of the Seizure Disorder and Quality of Life
Several studies have demonstrated the negative impact of depressive disorders
on the quality of life of patients with epilepsy.95–97 For example, in a study of
56 patients with epilepsy carried out in Germany by Lehrner and associates,95
depression was the single strongest predictor for each domain of health-related
quality of life (HRQOL). The significant association of depression with
HRQOL persisted after controlling for seizure frequency, seizure severity,
and other psychosocial variables. In another study of 257 patients with epilepsy
by Perrine and coworkers,96 the ‘‘mood factor’’ had the highest correlations
with scales of the QOLIE-89 and was the strongest predictor of poor quality of
life in regression analyses. Gilliam and associates97 investigated the variables
responsible for poor quality of life measured with the QOLIE-89 in 194 adult
patients with refractory partial epilepsy. Patients averaged 9.7 seizures per
month (range 0.3 to 51), but there was no correlation between the type or the
frequency of seizures and the QOLIE-89 scores. The presence of symptoms of
depression and neurotoxicity from antiepileptic drugs were the only
independent variables significantly associated with poor quality-of-life scores
on the QOLIE-89 summary score.
A negative impact of depressive disorders on the response to pharmacologic
and surgical treatments has also been identified.98–100 In a study of 780 patients
with new-onset epilepsy, Hitiris and colleagues98 found that individuals with a
history of psychiatric disorders, and particularly depression, were almost three
times less likely to be seizure-free with antiepileptic medications (median
follow-up period was 79 months) than patients without a history of psychiatric
disorders. Similarly, among 121 patients who underwent a temporal lobectomy,
Anhoury and associates99 reported a worse postsurgical seizure outcome for
patients with a psychiatric history compared with those without a psychiatric
history. In a study of 100 patients who had a temporal lobectomy and were
followed for a mean of 8.8 ± 3.3 years, Kanner and colleagues100 investigated
the role of a lifetime history of depression as a predictor of postsurgical seizure
outcome. Using a multivariate logistic regression model, the investigators
evaluated the covariates of a lifetime history of depression, cause of temporal
lobe epilepsy (ie, mesial temporal sclerosis, lesional, or idiopathic), duration of
seizure disorder, occurrence of generalized tonic–clonic seizures, and extent of
resection of mesial temporal structures. A lifetime history of depression and a
smaller resection of mesial temporal structures were the only independent
predictors of persistent auras in the absence of disabling seizures in multivariate
analyses. A lifetime history of depression was also an independent predictor of
failure to reach freedom from disabling seizures in univariate but not multi-
variate analyses. The data in these three studies raise the question of whether a
history of depression may be a marker of a more severe form of epilepsy.

4. Depression in Parkinson’s Disease


A review of the literature reveals that depression is relatively common in PD
patients: major depression has been identified in 5% to 25% of patients and
minor depression in 25% to 50%.101–104 As with other neurologic disorders, the
prevalence of depression in PD has varied according to the type of patient
population, with data derived from community-based studies being lower than
those of hospital-based studies. Thus, in one population-based study of 245 PD
patients, 45.5% reported a mild form of depression, while in another commu-
nity population sample, 19.6% met criteria for moderate to severe depres-
sion.101 Other independent reviews found the prevalence of some forms of
depression in PD patients to be around 45%.103,104 In contrast, comorbid
bipolar disease seems to be rare.105
Recent studies have demonstrated a variety of symptoms other than motor
symptoms preceding the typical manifestations of PD, including constipation,
loss of smell, sleep disturbances such as rapid-eye-movement sleep behavior
disorder, and depression. Three population-based studies suggest a bidirectional
relation between PD and depression.11,12,106 In the first study all subjects
diagnosed with depression between 1975 and 1990 were included and matched
with subjects with the same birth year who were never diagnosed with depres-
sion.11 Among the 1,358 depressed subjects, 19 developed PD, and among the
67,570 nondepressed subjects, 259 developed PD. Thus, people with depres-
sion were three times more likely to develop PD than nondepressed people
(hazard ratio of 3.13 [95% CI 1.95–5.01]) in multivariable analysis. In the
second study, investigators compared the incidence of depression in patients
preceding the onset of PD with that of a matched control population.12 To that
end, data from an ongoing general practice-based register study that included a
population of 105,416 people were used. Among patients who went on to
develop PD, 9.2% had a history of depression, compared with 4.0% of the
control population, yielding an odds ratio for a history of depression for these
patients of 2.4 (95% CI 2.1–2.7). A third population-based study compared the
risk of developing PD between patients with affective disorders and two groups
of medically ill patients, one with osteoarthritis and the second with diabetes,
using linkage of public hospital registers from 1977 to 1993.106 A total of
164,385 patients entered the study base. The risk of being given a diagnosis of
PD was significantly increased for patients with affective disorder when
compared to patients with osteoarthritis (odds ratio 2.2 [95% CI 1.7–2.8]) or
diabetes (odds ratio 2.2 [95% CI 1.7–2.9]).
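
The unadjusted associations in the first two studies can be recomputed from the counts and percentages quoted above. The following sketch is a back-of-the-envelope check, not a reproduction of the published analyses. The function names are illustrative, and the crude figures ignore the matching and covariate adjustment used by the investigators.

```python
def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Crude risk ratio from event counts and group sizes."""
    return (events_exposed / n_exposed) / (events_unexposed / n_unexposed)


def odds_ratio_from_proportions(p_cases, p_controls):
    """Crude odds ratio from the proportion exposed among cases and controls."""
    return (p_cases / (1.0 - p_cases)) / (p_controls / (1.0 - p_controls))


# First study: 19 of 1,358 depressed vs 259 of 67,570 nondepressed subjects developed PD.
rr = risk_ratio(19, 1358, 259, 67570)

# Second study: history of depression in 9.2% of future PD patients vs 4.0% of controls.
or_depression = odds_ratio_from_proportions(0.092, 0.040)

print(f"crude risk ratio = {rr:.2f}")             # crude risk ratio = 3.65
print(f"crude odds ratio = {or_depression:.2f}")  # crude odds ratio = 2.43
```

The crude risk ratio of about 3.65 is somewhat larger than the hazard ratio of 3.13 reported from the multivariable analysis, which is the expected direction after adjustment, while the crude odds ratio of about 2.43 agrees with the reported value of 2.4.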

Clinical Manifestations
Depression in PD often begins late in life, in contrast to primary major
depression, which is more likely to appear before the age of 40. It may present
as a major or minor depressive disorder, but often with certain differences. For
example, dysphoric symptoms are more frequent in PD patients and include
irritability, sadness, and pessimism. However, feelings of failure, guilt, and
self-blame are less frequent in PD. Though suicidal ideation appears to be more
frequent in PD patients, they are less likely to actually commit suicide.107
Anxiety symptoms have been identified in two thirds of PD patients with
depressive symptoms; conversely, 97% of PD patients with an anxiety disorder
have been found to exhibit depressive symptoms as well.

Screening Instruments
The screening instruments used for depression in PD patients have included the
BDI, HADS, GDS, and Montgomery-Åsberg Depression Rating Scale
(MADRS).108–111 As with other neurologic disorders, the presence of somatic
symptoms resulting from PD may be a potential problem resulting in a
false-positive diagnosis of depression.
Leentjens and colleagues108 found that lower cutoff scores (11/12) of the
HAMD-17 and MADRS (14/15) yielded maximal sensitivity, but at the expense
of a low specificity, while maximum discrimination between depressed and
nondepressed PD patients was reached at cutoff scores of 13/14 and 14/15,
respectively. Likewise, the same group of investigators found a high sensitivity
and low specificity at cutoff scores of 8/9 with the BDI, while scores of 16/17 or
higher yielded a low sensitivity and high specificity.109 Similar findings were
reported by Silberman and colleagues.110 Mondolo and associates111 investigated
the validity of the HADS and GDS in PD and showed that maximum
discrimination between depressed and nondepressed PD patients was reached at
a cutoff score of 10/11 for both the HADS and the GDS. High specificity and
positive predictive value were reached at a cutoff score of 12/13 for the GDS and
at a cutoff score of 11/12 for the HADS.
Tumas and associates112 evaluated 50 consecutive patients with PD using
the Unified Parkinson’s Disease Rating Scale (UPDRS), the 15-item GDS, and the BDI
against DSM-IV criteria. The GDS-15 (cutoff 8/9) was better than the BDI (cutoff
17/18) and the UPDRS for screening for depression in PD, and depression was not
related to the degree of parkinsonian symptoms.
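
The trade-off between sensitivity and specificity at different cutoff scores can be made concrete with a short Python sketch. It assumes the usual reading of a cutoff written as 12/13, namely that scores of 12 or below screen negative and scores of 13 or above screen positive. It also assumes that "maximum discrimination" is operationalized as the largest sum of sensitivity and specificity (Youden's index), which the studies cited here may or may not have used. The scores are hypothetical and the helper names are illustrative.

```python
def sens_spec(scores, depressed, positive_from):
    """Sensitivity and specificity when scores >= positive_from screen positive."""
    pairs = list(zip(scores, depressed))
    tp = sum(1 for s, d in pairs if d and s >= positive_from)
    fn = sum(1 for s, d in pairs if d and s < positive_from)
    tn = sum(1 for s, d in pairs if not d and s < positive_from)
    fp = sum(1 for s, d in pairs if not d and s >= positive_from)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical questionnaire scores, grouped by criterion diagnosis.
nondepressed_scores = [3, 5, 6, 8, 9, 10, 12, 15]
depressed_scores = [9, 13, 14, 16, 17, 19, 21, 24]
scores = nondepressed_scores + depressed_scores
depressed = [False] * len(nondepressed_scores) + [True] * len(depressed_scores)


def youden(positive_from):
    """Youden's index J = sensitivity + specificity - 1 (assumed criterion)."""
    sens, spec = sens_spec(scores, depressed, positive_from)
    return sens + spec - 1


# Try every threshold that leaves at least one score on each side.
candidates = range(min(scores) + 1, max(scores) + 1)
best = max(candidates, key=youden)

for threshold in (9, best, 17):
    sens, spec = sens_spec(scores, depressed, threshold)
    print(f"cutoff {threshold - 1}/{threshold}: "
          f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy sample the low cutoff captures every depressed patient but misclassifies half of the nondepressed patients, the high cutoff does the reverse, and the intermediate cutoff gives the best balance, which is the pattern described in the studies above.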

Impact of Depression on the Course and Quality of Life in Parkinson’s Disease
The presence of depression in PD patients has been associated with a more
rapid deterioration of motor and cognitive functions, especially executive
function, and a greater likelihood of displaying psychotic symptoms and
physical disability.113–119 Such impact is appreciated even when the depressive
disorder occurs in the early stage of the disease. For example, in a study by
Ravina and associates,119 114 (27.6%) of 413 patients were identified as having
a depressive disorder during an average of 14.6 months of follow-up. Forty
percent of these subjects were neither treated with
antidepressants nor referred for further psychiatric evaluation. Depression was
a significant predictor of more impairment in ADLs and increased need for
symptomatic therapy of PD (hazard ratio = 1.86; 95% CI 1.29–2.68).
The cognitive disturbances of depressed PD patients include poor insight
and judgment, and problems with planning, but memory impairment is seen
less frequently. Neuropsychological tests that evaluate executive functioning
have demonstrated significant deficits in PD patients, indicating frontal subcortical
impairment, an impairment that may also be operant in the development
of mood disorders. In a study that compared performance in
neuropsychological testing between PD patients with major depression,
nondepressed PD patients, patients with primary major depression, and healthy
controls, depressed PD patients exhibited impairments in set shifting and
concept formation; these abnormalities were unique to this group of
patients.115 However, cognitive deterioration in depressed PD patients can be
mitigated with treatment of the depressive disorder. In a longitudinal study that
compared cognitive performance in depressed and nondepressed PD patients
who were followed for a 3- to 4-year period, cognitive functions deteriorated
more quickly in the depressed PD group.118 However, in the depressed PD
patients who were treated, there was an attenuated decrement in cognitive
scores (11%), compared with untreated patients.
Depression also has been found to have a negative impact on the quality of
life in PD patients, just as in stroke and epilepsy. For example, in a multicenter
study conducted by the Global Parkinson’s Disease Survey Steering
Committee and involving six countries, data were obtained from 2,020 PD
patients and 687 caregivers, and depression was found to be the most significant
predictor variable in poor health-related quality of life.120 Patients often
failed to recognize their depressive disorder: only 1% of the patients reported
feeling depressed, while 50% were considered depressed by study criteria of a
score of more than 10 on the BDI. Furthermore, in a community-based study of
228 people with PD, depression was the factor most closely related to a poor
quality of life, while the stage of PD, duration, and cognitive impairment had a
lesser impact.121 Others have confirmed these findings.122

5. Conclusions
Depressive disorders are a common comorbidity in neurologic disorders, with
significant negative impacts on their course and response to treatment. Despite
their relatively high prevalence, they often go unrecognized and untreated.
This problem can be mitigated with the use of screening instruments. While it
would be ideal to have screening instruments for depression validated for each
neurologic disorder, the available instruments appear to yield acceptable
sensitivities and specificities for the most part and should be used not only in
research studies but also in clinical practice. It is important to keep in mind,
however, that these are screening instruments, and thus the diagnosis must be
confirmed with a formal psychiatric evaluation.

References
1. Kanner AM. Depression and the risk of neurological disorders. Lancet.
2005;366(9492):1147–1148.
2. May M, McCarron P, Stansfeld S, et al. Does psychological distress predict the risk of
ischemic stroke and transient ischemic attack? The Caerphilly Study. Stroke.
2002;33:7–12.
3. Larson SL, Owens PL, Ford D, et al. Depressive disorder, dysthymia, and risk of
stroke. Thirteen-year follow-up from the Baltimore Epidemiological Catchment Area
Study. Stroke. 2001;32:1979–1983.
4. Jonas BS, Mussolino ME. Symptoms of depression as a prospective risk factor for
stroke. Psychosom Med. 2000;62:463–471.
5. Forsgren L, Nystrom L. An incident case referent study of epileptic seizures in adults.
Epilepsy Res. 1990;6:66–81.
6. Hesdorffer DC, Hauser WA, Annegers JF, et al. Major depression is a risk factor for
seizures in older adults. Ann Neurol. 2000;47:246–249.
7. Hesdorffer DC, Hauser WA, Olafsson E, et al. Depression and suicidal attempt as risk
factor for incidental unprovoked seizures. Ann Neurol. 2006;59(1):35–41.
8. Modrego PJ, Ferrandez J. Depression in patients with mild cognitive impairment
increases the risk of developing dementia of Alzheimer type: a prospective cohort
study. Arch Neurol. 2004;61:1290–1293.
9. Kessing LV, Andersen PK. Does the risk of developing dementia increase with the
number of episodes in patients with depressive disorder and in patients with bipolar
disorder? J Neurol Neurosurg Psychiatry. 2004;75:1662–1666.
10. Dal Forno G, Palermo MT, Donohue JE, et al. Depressive symptoms, sex, and risk for
Alzheimer’s disease. Ann Neurol. 2005;57(3):381–387.
11. Leentjens AFG, Van Der Akker M, Metsemakers JFM, et al. Higher incidence of
depression preceding the onset of Parkinson’s disease: a register study. Movement
Disorders. 2003;18:414–418.
12. Nilsson FM, Kessing LV, Bolwig TG. Increased risk of developing Parkinson’s
disease for patients with major affective disorder: a register study. Acta Psychiatr
Scand. 2001;104:380–386.
13. Kanner AM. Depression in epilepsy: prevalence, clinical semiology, pathogenic
mechanisms and treatment. Biol Psychiatry. 2003;54:388–398.
14. Robinson RG. Poststroke depression: Prevalence, diagnosis, treatment and disease
progression. Biol Psychiatry. 2003;54:376–387.
15. Eastwood MR, Rifat SL, Nobbs H, et al. Mood disorder following cerebrovascular
accident. Br J Psychiatry. 1989;154:195–200.
16. Fedoroff JP, Starkstein SE, Parikh RM, et al. Are depressive symptoms non-specific in
patients with acute stroke? Am J Psychiatry. 1991;148:1172–1176.
17. Burvill PW, Johnson GA, Jamrozik KD, et al. Prevalence of depression after stroke:
The Perth Community Stroke Study. Br J Psychiatry. 1995;166:320–327.
18. Colantonio A, Kasl SV, Ostfeld AM. Depressive symptoms and other
psychosocial factors as predictors of stroke in the elderly. Am J Epidemiol.
1992;136:884–894.
19. Everson SA, Roberts RE, Goldberg DE, et al. Depressive symptoms and increased risk
of stroke mortality over a 29-year period. Arch Intern Med. 1998;158:1133–1138.
20. Berg A, Palomaki H, Lehtihalmes M, et al. Post stroke depression: An 18-month
follow-up. Stroke. 2003;34:138–143.
21. Morris PLP, Robinson RG, Raphael B. Prevalence and course of depressive disorders
in hospitalized stroke patients. Intl J Psychiatr Med. 1990;20:349–364.
22. Robinson RG, Price TR. Post-stroke depressive disorders: a follow-up study of 103
outpatients. Stroke. 1982;13:635–641.
23. Starkstein SE, Robinson RG, Berthier ML, et al. Depressive disorders following
posterior circulation as compared with middle cerebral artery infarcts. Brain.
1988;11:375–387.
24. Lipsey JR, Robinson RG, Pearlson GD, et al. Nortriptyline treatment of post-stroke
depression. A double-blind study. Lancet. 1984;1(8372):297–300.
25. Gainotti G, Azzoni A, Marra C. Frequency, phenomenology and anatomical-clinical
correlates of major post-stroke depression. Br J Psychiatry. 1999;175:163–167.
26. Salter K, Bhogal SK, Foley N, et al. The assessment of poststroke depression. Top
Stroke Rehabil. 2007;14(3):1–24.
27. Paradiso S, Ohkubo T, Robinson RG. Vegetative and psychological symptoms
associated with depressed mood over the first two years after stroke. Int J
Psychiatry Med. 1997;27(2):137–157.
28. de Coster L, Leentjens AF, Lodder J, et al. The sensitivity of somatic symptoms in
post-stroke depression: a discriminant analytic approach. Int J Geriatr Psychiatry.
2005;20(11):1103–1104.
29. Aben I, Verhey F, Lousberg R, et al. Validity of the Beck Depression Inventory,
Hospital Anxiety and Depression Scale, SCL-90, and Hamilton Depression Rating
Scale as screening instruments for depression in stroke patients. Psychosomatics.
2002;43(5):386–393.
30. Johnson G, Burvill PW, Anderson CS, et al. Screening instruments for depression and
anxiety following stroke: experience in the Perth community stroke study. Acta
Psychiatr Scand. 1995;91(4):252–257.
31. O’Rourke S, MacHale S, Signorini D, et al. Detecting psychiatric morbidity after
stroke: comparison of the GHQ and the HAD Scale. Stroke. 1998;29(5):980–985.
32. Healey AK, Kneebone II, Carroll M, et al. A preliminary investigation of the reliability
and validity of the Brief Assessment Schedule Depression Cards and the Beck
Depression Inventory-Fast Screen to screen for depression in older stroke survivors.
Int J Geriatr Psychiatry. 2008;23(5):531–536.
33. Gainotti G, Azzoni A, Razzano C, et al. The Post-Stroke Depression Rating Scale: a
test specifically devised to investigate affective disorders of stroke patients. J Clin Exp
Neuropsychol. 1997;19(3):340–356.
34. Sutcliffe LM, Lincoln NB. The assessment of depression in aphasic stroke patients:
the development of the Stroke Aphasic Depression Questionnaire. Clin Rehabil.
1998;12(6):506–513.
35. Sackley CM, Hoppitt TJ, Cardoso K. An investigation into the utility of the Stroke
Aphasic Depression Questionnaire (SADQ) in care home settings. Clin Rehabil.
2006;20(7):598–602.
36. Bennett HE, Thomas SA, Austen R, et al. Validation of screening measures for
assessing mood in stroke patients. Br J Clin Psychol. 2006;45(Pt 3):367–376.
37. Starkstein SE, Robinson RG, Price TR. Comparison of patients with and without post-
stroke major depression matched for age and location of lesion. Arch Gen Psychiatry.
1988;45:247–252.
38. Robinson RG, Starr LB, Kubos KL, et al. A two year longitudinal study of post-stroke
mood disorders: findings during the initial evaluation. Stroke. 1983;14:736–744.
39. Parikh RM, Robinson RG, Lipsey JR, et al. The impact of post-stroke depression on
recovery in activities of daily living over two year follow-up. Arch Neurol.
1990;47:785–789.
40. Wade DT, Legh-Smith J, Hewer RA. Depressed mood after stroke, a community study
of its frequency. Br J Psychiatry. 1987;151:200–205.
41. Kanner AM. Depression in neurologic disorders. Cambridge: Cambridge Medical
Communications, 2005.
42. Feinstein A. Multiple sclerosis and depression. In: The clinical neuropsychiatry of
multiple sclerosis. Cambridge: Cambridge University Press, 1999:26–50.
43. Minden SL, Orav JO, Reich P. Depression in multiple sclerosis. Gen Hosp Psychiatry.
1987;9:426–434.
44. Patten SB, Metz LM, Reimer MA. Biopsychosocial correlates of lifetime major
depression in a multiple sclerosis population. Mult Scler. 2000;6(2):115–120.
45. Kessler RC, McGonagle KA, Zhao S, et al. Lifetime and 12-month prevalence of
DSM-III-R psychiatric disorders in the United States: results from the National
Comorbidity Study. Arch Gen Psychiatry. 1994;51:8–19.
46. Patten SB, Beck CA, Williams JV, et al. Major depression in multiple sclerosis: a
population-based perspective. Neurology. 2003;61(11):1524–1527.
47. Minden SL, Schiffer RB. Affective disorders in multiple sclerosis, review and
recommendations for clinical research. Arch Neurol. 1990;47:98–104.
48. Schiffer RB, Wineman M, Weitkamp LR. Association between bipolar affective
disorder and multiple sclerosis. Am J Psychiatry. 1986;143:94–95.
49. Joffe RT, Lippert GP, Gray TA, et al. Mood disorder and multiple sclerosis. Arch
Neurol. 1987;44:376–378.
50. Krupp LB, Alvarez LA, LaRocca NG, et al. Fatigue in multiple sclerosis. Arch Neurol.
1988;45:435–437.
51. Mohr DC, Hart SL, Goldberg A. Effects of treatment for depression on fatigue in
multiple sclerosis. Psychosom Med. 2003;65(4):542–547.
52. Sadovnick AD, Eisen K, Ebers GC, et al. Cause of death in patients attending multiple
sclerosis clinics. Neurology. 1991;41:1193–1196.
53. Stenager EN, Stenager E. Suicide and patients with neurological diseases—
methodologic problems. Arch Neurol. 1992;49:1296–1303.
54. Stenager EN, Stenager E, Koch-Henriksen N, et al. Suicide and multiple sclerosis: an
epidemiological investigation. J Neurol Neurosurg Psychiatry. 1992;55:542–545.
55. Jacobs DG, Jamison KR, Baldessarini RJ, et al. Suicide: clinical/risk management
issues for psychiatrists. CNS Spectrums. 2000;5:32–48.
56. Feinstein A, O’Connor P, Gray T, Feinstein K. The effects of anxiety on psychiatric
morbidity in patients with multiple sclerosis. Mult Scler. 1999;5:323–326.
57. Moran PJ, Mohr DC. The validity of Beck Depression Inventory and Hamilton Rating
Scale for Depression items in the assessment of depression among patients with
multiple sclerosis. J Behav Med. 2005;28(1):35–41.
58. McGuigan C, Hutchinson M. Unrecognised symptoms of depression in a community-
based population with multiple sclerosis. J Neurol. 2006;253(2):219–223.
59. Gottberg K, Einarsson U, Fredrikson S, et al. Population-based study of depressive
symptoms in multiple sclerosis in Stockholm County: association with functioning
and sense of coherence. J Neurol Neurosurg Psychiatry. 2007;78(1):60–65.
60. Sullivan MJ, Weinshenker B, Mikail S, et al. Screening for major depression in the
early stages of multiple sclerosis. Can J Neurol Sci. 1995;22(3):228–231.
61. Chwastiak L, Ehde DM, Gibbons LE, et al. Depressive symptoms and severity of
illness in multiple sclerosis: epidemiologic study of a large community sample. Am J
Psychiatry. 2004;161(8):1504.
62. Patten SB, Lavorato DH, Metz LM. Clinical correlates of CES-D depressive symptom
ratings in an MS population. Gen Hosp Psychiatry. 2005;27(6):439–445.
63. Gold SM, Schulz H, Mönch A, et al. Cognitive impairment in multiple sclerosis does
not affect reliability and validity of self-report health measures. Mult Scler.
2003;9(4):404–410.
64. Benedict RH, Fishman I, McClellan MM, et al. Validity of the Beck Depression
Inventory-Fast Screen in multiple sclerosis. Mult Scler. 2003;9(4):393–396.
65. Mohr DC, Hart SL, Julian L, et al. Screening for depression among patients with
multiple sclerosis: two questions may be enough. Mult Scler.
2007;13(2):215–219.
66. Amato MP, Ponziani G, Rossi F, et al. Quality of life in multiple sclerosis: the impact
of depression, fatigue and disability. Mult Scler. 2001;7:340–344.
67. Wang JL, Reimer MA, Metz LM, et al. Major depression and quality of life in
individuals with multiple sclerosis. Int J Psychiatry Med. 2000;30:309–317.
68. Fruehwald S, Loeffler-Stastka H, Eher R, et al. Depression and quality of life in
multiple sclerosis. Acta Neurol Scand. 2001;104:257–261.
69. Lobentanz IS, Asenbaum S, Vass K, et al. Factors influencing quality of life in
multiple sclerosis patients: disability, depressive mood, fatigue and sleep quality.
Acta Neurol Scand. 2004;110(1):6.
70. Benito-León J, Morales JM, Rivera-Navarro J. Health-related quality of life and its
relationship to cognitive and emotional functioning in multiple sclerosis patients. Eur
J Neurol. 2002;9(5):497–502.
71. Kanner AM, Balabanov A. Depression in epilepsy: How closely related are these two
disorders? Neurology. 2002;58(suppl 5):S27–39.
72. Jacoby A, Baker GA, Steen N, et al. The clinical course of epilepsy and its
psychosocial correlates: findings from a UK community study. Epilepsia.
1996;37(2):148–161.
73. O’Donoghue MF, Goodridge DM, Redhead K, et al. Assessing the psychosocial
consequences of epilepsy: a community-based study. Br J Gen Pract.
1999;49(440):211–214.
74. Tellez-Zenteno JSF, Patten SB, Wiebe S. Psychiatric comorbidity in epilepsy: A
population-based analysis. Epilepsia. 2007;48(12):2336–2344.
75. Ettinger A, Reed M, Cramer J; Epilepsy Impact Project Group. Depression and
comorbidity in community-based patients with epilepsy or asthma. Neurology.
2004;63(6):1008–1014.
76. Blanchet P, Frommer GP. Mood change preceding epileptic seizures. J Nerv Ment Dis.
1986;174:471–476.
77. Williams D. The structure of emotions reflected in epileptic experiences. Brain.
1956;79:29–67.
78. Weil A. Depressive reactions associated with temporal lobe uncinate seizures. J Nerv
Ment Dis. 1955;121:505–510.
79. Daly D. Ictal affect. Am J Psych. 1958;115:97–108.
80. Kanner AM, Soto A, Gross-Kanner H. Prevalence and clinical characteristics of
postictal psychiatric symptoms in partial epilepsy. Neurology. 2004;62:708–713.
81. Mendez MF, Cummings J, Benson D, et al. Depression in epilepsy. Significance and
phenomenology. Arch Neurol. 1986;43:766–770.
82. Kanner AM, Kozak AM, Frey M. The use of sertraline in patients with epilepsy: is it
safe? Epilepsy Behav. 2000;1(2):100–105.
83. Blumer D, Altshuler LL. Affective disorders. In: Engel J, Pedley TA, eds.
Epilepsy: a comprehensive textbook, vol. II. Philadelphia: Lippincott-Raven,
1998:2083–2099.
84. Kraepelin E. Psychiatrie, vol 3. Leipzig: Johann Ambrosius Barth, 1923.
85. Robertson M. Carbamazepine and depression. Int Clin Psychopharmacol.
1987;2:23–35.
86. Gilliam F, Kanner AM. The treatment of depression in epilepsy. Epilepsy & Behavior.
2002;3:6.
87. Robertson MM. Suicide, parasuicide, and epilepsy. In: Engel J, Pedley TA, eds.
Epilepsy: a comprehensive textbook. Philadelphia: Lippincott-Raven, 1997.
88. Rafnsson V, Olafsson E, Hauser WA, et al. Cause-specific mortality in adults with
unprovoked seizures. A population-based incidence cohort study.
Neuroepidemiology. 2001;20(4):232–236.
89. Gilliam FG, Barry JJ, Meador KJ, et al. Rapid detection of major depression in
epilepsy: a multicenter study. Lancet Neurology. 2006;5(5):399–405.
90. Jones JE, Hermann BP, Woodard JL, et al. Screening for major depression in epilepsy
with common self-report depression inventories. Epilepsia. 2005;46(5):731–735.
91. Ettinger AB, Reed ML, Goldberg JF, et al. Prevalence of bipolar symptoms in epilepsy
vs other chronic health disorders. Neurology. 2005;65(4):535–540.
92. Snaith RP, Zigmond AS. Hospital Anxiety and Depression Scale. Acta Psychiatr
Scand. 1983;67:361–370.
93. Gilbody S, Richards D, Brealey S, et al. Screening for depression in medical settings
with the Patient Health Questionnaire: a diagnostic meta-analysis. J Gen Intern Med.
2007;11:1596–1602.
94. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatr.
1960;23:56–62.
95. Lehrner J, Kalchmayr R, Serles W, et al. Health-related quality of life (HRQOL),
activity of daily living (ADL) and depressive mood disorder in temporal lobe epilepsy
patients. Seizure. 1999;8(2):88–92.
96. Perrine K, Hermann BP, Meador KJ, et al. The relationship of
neuropsychological functioning to quality of life in epilepsy. Arch Neurol.
1995;52(10):997–1003.
97. Gilliam F, Kuzniecky R, Faught E, et al. Patient-validated content of epilepsy-specific
quality-of-life measurement. Epilepsia. 1997;38(2):233–236.
98. Hitiris N, Mohanraj R, Norrie J, et al. Predictors of pharmacoresistant epilepsy.
Epilepsy Res. 2007;75(2–3):192–196.
99. Anhoury S, Brown RJ, Krishnamoorthy ES, et al. Psychiatric outcome after temporal
lobectomy: a predictive study. Epilepsia. 2000;41:1608–1615.
100. Kanner AM, Byrne R, Smith MC, et al. Does a lifetime history of depression predict a
worse postsurgical seizure outcome following a temporal lobectomy? [abstract] Ann
Neurol. 2006;60:(S:10):19.
101. Tandberg E, Larsen JP, Aarsland D, et al. The occurrence of depression in Parkinson’s
disease—a community-based study. Arch Neurol. 1996;53:175–179.
102. Schrag A, Jahanshahi M, Quinn NP. What contributes to depression in Parkinson’s
disease? Psychol Med. 2001;31:65–73.
103. Gotham AM, Brown RG, Marsden CD. Depression in Parkinson’s disease: a
quantitative and qualitative analysis. J Neurol Neurosurg Psychiatry.
1986;49:381–389.
104. Cummings JL. Depression and Parkinson’s disease: a review. Am J Psychiatry.
1992;149:443–454.
105. Cannas A, Spissu A, Floris GL, et al. Bipolar affective disorder and Parkinson’s
disease: rare, insidious and often unrecognized association. Neurol Sci.
2002;23:S67–68.
106. Reijnders JS, Ehrt U, Weber WE, et al. A systematic review of prevalence studies of
depression in Parkinson’s disease. Mov Disord. 2008;23(2):183–189.
107. Burn DJ. Beyond the iron mask: towards a better recognition and treatment of
depression associated with Parkinson’s disease. Mov Disord. 2002;17:445–454.
108. Leentjens AF, Verhey FR, Lousberg R, et al. The validity of the Hamilton and
Montgomery-Asberg depression rating scales as screening and diagnostic tools for
depression in Parkinson’s disease. Int J Geriatr Psychiatry. 2000;15:644–649.
109. Leentjens AF, Verhey FR, Luijckx GJ, et al. The validity of the Beck Depression
Inventory as a screening and diagnostic instrument for depression in patients with
Parkinson’s disease. Mov Disord. 2000;15(6):1221–1224.
110. Silberman CD, Laks J, Capitão CF, et al. Recognizing depression in patients with
Parkinson’s disease: accuracy and specificity of two depression rating scale. Arq
Neuropsiquiatr. 2006;64(2B):407–411.
111. Mondolo F, Jahanshahi M, Granà A, et al. Evaluation of anxiety in Parkinson’s disease
with some commonly used rating scales. Neurol Sci. 2007;28(5):270–275.
112. Tumas V, Rodrigues GGR, Farias TLA, et al. The accuracy of diagnosis of major
depression in patients with Parkinson’s disease: a comparative study among the
UPDRS, the Geriatric Depression Scale and the Beck Depression Inventory.
Arquivos De Neuro-Psiquiatria. 2008;66(2):152–156.
113. Starkstein SE, Petracca G, Chemerinski E, et al. Depression in classic versus akinetic-
rigid Parkinson’s disease. Mov Disord. 1998;13:29–33.
114. Starkstein SE, Bolduc PL, Preziosi TJ, et al. Cognitive impairments in different stages
of Parkinson’s disease. J Neuropsychiatry. 1989;1:243–248.
115. Kuzis G, Sabe L, Tiberti C, et al. Cognitive function in major depression and
Parkinson’s disease. Arch Neurol. 1997;54:982–986.
116. Starkstein SE, Bolduc PL, Mayberg HS, et al. Cognitive impairments and depression
in Parkinson’s disease: a follow-up study. J Neurol Neurosurg Psychiatry.
1990;53:597–602.
117. Weintraub D, Moberg PJ, Duda JE, et al. Effect of psychiatric and other
nonmotor symptoms on disability in Parkinson’s disease. J Am Geriatr Soc.
2004;52:784–788.
118. Starkstein SE, Mayberg HS, Leiguarda R, et al. A prospective longitudinal study of
depression, cognitive decline, and physical impairments in patients with Parkinson’s
disease. J Neurol Neurosurg Psychiatry. 1992;55:377–382.
119. Ravina B, Camicioli R, Como PG, et al. The impact of depressive symptoms in early
Parkinson disease. Neurology. 2007;69(4):E2–3.
120. The Global Parkinson’s Disease Survey Steering Committee. Factors impacting on
quality of life in Parkinson’s disease: results from an international survey. Mov
Disord. 2002;17:60–67.
121. Kuopio AM, Marttila RJ, Helenius H, et al. The quality of life in Parkinson’s disease.
Mov Disord. 2000;15:216–223.
122. Schrag A, Jahanshahi M, Quinn N. What contributes to quality of life in patients with
Parkinson’s disease? J Neurol Neurosurg Psychiatry. 2000;69:308–312.
13
SCREENING FOR DEPRESSION IN
CANCER CARE

Linda E. Carlson, Sheena K. Clifford, Shannon L. Groff,
Olga Maciejewski, and Barry D. Bultz

1. Prevalence of Depression in Cancer Care
2. Screening Methods for Depression
3. Screening for Depression in Oncology
4. Implementing Screening Programs in Oncology Settings
5. Special Issues in Screening Cancer Patients
6. Summary, Integration, Future Directions
7. Acknowledgments

Context
There is an increasing awareness of the importance of screening for depression
and distress in oncology settings. Researchers have devised quick and simple
methods for assessing symptoms in a wide range of patients that are acceptable
to both patients and providers, and introduced computerized systems that make
it possible to screen a large number of patients quickly and efficiently. A large body
of data concerning implementation of screening in cancer care seems to
suggest that screening can serve to stimulate discussions of psychosocial and
mental health issues between patients and oncology staff, but whether
screening affects patient outcomes is still unclear.

1. Prevalence of Depression in Cancer Care
As with many other medical populations, people suffering with cancer are
susceptible to clinical depression. More so than many other illnesses, cancer is
associated with a poor prognosis and is in many ways synonymous with fear.
References to the ‘‘Big C’’ and hushed tones prevailed in the past when a
diagnosis of cancer was discussed,1 and remnants of these attitudes are still
prevalent in many communities and countries worldwide. Hence, beyond the
burden of the tumor and the associated treatment, the psychological toll of
cancer is significant. Two thorough reviews of the prevalence of depression in
cancer have been published in the past few years,2,3 and in addition to several
other general reviews,4–6 summaries are available with reference to specific
types of cancer (prostate,7 pancreatic,8 advanced disease9) and specific patient
groups (children10 and the elderly11). Taking methodologic issues into consideration,
the point prevalence of major depressive disorder and depression
symptoms comorbid with cancer is most commonly cited as between 10% and
25%.2 This rate varies considerably depending on how depression is measured
(standard clinical interview, questionnaire), how it is conceptualized, the
criteria used to define depression, the types of patients assessed (cancer type,
demographics, inpatients versus outpatients), and the point during the cancer
treatment trajectory when assessment occurs.3 Massie3 summarized 88 papers
investigating depression prevalence in cancer patients: the highest rates of
depression were found in head and neck, pancreatic, breast, and lung cancer
patients (up to 50% of all patients), with lower rates generally reported in colon
cancer, gynecologic cancer, and lymphomas (rates from 8% to 25%). The
cancers with higher rates of depression generally have less positive prognoses
(pancreatic, lung) or involve disfiguring treatments (head and neck, breast),
perhaps explaining these discrepancies.

2. Screening Methods for Depression
Methods used for assessment and screening of depression in cancer care are
varied, with the gold standard for assessment still considered to be clinical
interviews based on DSM-IV or ICD-10 criteria for depression, and ideally the
Structured Clinical Interview for DSM-IV (SCID). However, long
structured or semi-structured interviews are not practical in most clinical
settings. The usual caveats for measuring depression in somatic illness, as
summarized in Chapter 11, also apply to cancer patients, since many of the
symptoms of depression are also common symptoms of cancer or results of
treatments such as surgery, chemotherapy, radiation therapy, and hormone
therapy.12 For example, sleep is often impaired due to chemotherapy drugs
or steroids, and fatigue is a common consequence of many cancer treatments.
In addition, weight loss is common due to nausea and vomiting, and immunotherapies
such as interferon-alpha are known to cause depressive affect and
sad mood. Given this extensive array of somatic symptoms, various diagnostic
approaches have been developed to deal with them, commonly
referred to as inclusive, etiologic, substitutive, or exclusive.12,13 The inclusive
approach counts all the symptoms of depression, whether or not they may be
secondary to the cancer, while the etiologic approach includes only those that
are thought not to result from a physical illness (this is the approach used in the
DSM and SCID). The substitutive approach replaces symptoms that may be
related to the disease (eg, fatigue) with additional cognitive symptoms such as
hopelessness or pessimism. Finally, the exclusive approach simply eliminates
the most common physical symptoms, fatigue and appetite/weight changes,
from the diagnostic criteria. There are pros and cons to each approach.
The etiologic approach is preferred by some,14 but its drawback is its
reliance on inference of causality, which will vary in accuracy depending
on the assessor. Future studies should assess which approach leads to the most
accurate case-finding method in patients living with cancer.
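
The differences among the four approaches can be illustrated with a small Python sketch that returns which of a patient's symptoms would count toward a diagnosis under each one. This is a teaching aid under stated assumptions, not a diagnostic algorithm: the nine symptom labels are shorthand loosely modeled on the DSM-IV criteria, the choice of which cognitive symptom replaces which somatic symptom is hypothetical, and the function name `counted_symptoms` is illustrative.

```python
# Shorthand labels loosely modeled on the nine DSM-IV symptoms (assumption).
STANDARD = [
    "depressed_mood", "anhedonia", "sleep_change", "fatigue",
    "appetite_weight_change", "guilt", "poor_concentration",
    "psychomotor_change", "suicidal_ideation",
]

# Hypothetical substitutions of cognitive for somatic symptoms.
SUBSTITUTES = {
    "fatigue": "hopelessness",
    "appetite_weight_change": "pessimism",
}


def counted_symptoms(present, approach, due_to_illness=()):
    """Return the symptoms that count toward a diagnosis under one approach."""
    present = set(present)
    excluded = set(due_to_illness)
    if approach == "inclusive":
        # Count every criterion symptom, whatever its presumed cause.
        criteria = STANDARD
    elif approach == "etiologic":
        # Drop symptoms the assessor attributes to the physical illness.
        criteria = [s for s in STANDARD if s not in excluded]
    elif approach == "substitutive":
        # Swap the somatic items for cognitive ones.
        criteria = [SUBSTITUTES.get(s, s) for s in STANDARD]
    elif approach == "exclusive":
        # Simply remove the most common physical symptoms.
        criteria = [s for s in STANDARD if s not in SUBSTITUTES]
    else:
        raise ValueError(f"unknown approach: {approach}")
    return [s for s in criteria if s in present]


# Hypothetical patient; the assessor attributes fatigue and sleep change to treatment.
patient = {"depressed_mood", "sleep_change", "fatigue",
           "appetite_weight_change", "hopelessness"}
for name in ("inclusive", "etiologic", "substitutive", "exclusive"):
    counted = counted_symptoms(patient, name,
                               due_to_illness={"fatigue", "sleep_change"})
    print(f"{name}: {len(counted)} symptoms counted")
```

The same patient yields a different symptom count under each approach, and only the etiologic count depends on the assessor's judgment about causality, which is the drawback noted above.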
In addition to interview methods, self-report tools are often used to screen
for depression. Indeed, self-report methods have several advantages in low-
resource environments. Some evidence also suggests that self-report methods
allow the early detection of symptoms that would not be detected even by
trained clinicians.12 The most common such instruments are the Hospital
Anxiety and Depression Scale (HADS), the Beck Depression Inventory
(BDI), and the Center for Epidemiologic Studies Depression Scale (CES-D),
all of which have been discussed in detail in Chapter 4 and elsewhere. Research
in general practice has evaluated the utility of these tools against the gold-standard
structured clinical interview with some success.15 Recently, shorter
one- to four-item instruments (such as the Patient Health Questionnaire two-item
version [PHQ-2], the PRIME-MD, and the World Health Organization
[WHO] two-item scale) have been evaluated.16 However, in cancer patients,
prevalence rates are typically higher, so these results cannot simply be
extrapolated and require further study (see below).

3. Screening for Depression in Oncology

Conventional Mood Severity Scales
Few studies have been conducted in oncology populations that compare
the sensitivity and specificity of short questionnaires for depression
against a gold-standard clinical interview. A summary of those investigating
the HADS (the most commonly used tool in oncology) that
reported both sensitivity and specificity is presented in Table 13.1.
Some investigators found the HADS to be a useful instrument in this
context. For example, Razavi and colleagues17 compared the HADS to a
psychiatric interview in 210 inpatients with cancer using receiver operating
characteristic (ROC) analyses. They determined appropriate cutoff
scores on the HADS that would maximize sensitivity and specificity (see
Chapter 5 for a discussion of ROC curves). The area under the plotted
curve (AUC) provides an estimate of the degree to which the scale, across
its cutoff scores, discriminates cases relative to the criterion measure,
ranging from 0.5 (no value) to 1 (perfect). Razavi and colleagues17 found that in
relation to an interview-based diagnosis of major depressive disorders, a
cutoff score of 19 on the HADS total score gave 70% sensitivity and 75%
specificity. Where the outcome was adjustment disorders and major
depressive disorders together, a cutoff score of 12/13 on the HADS
gave 75% sensitivity and 75% specificity. Other studies have found
similar results (see Table 13.1).
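
The cutoff selection described above can be illustrated with a minimal sketch. The scores and diagnoses below are invented for illustration only and are not data from any study cited here. The use of Youden's J (sensitivity + specificity - 1) as the criterion for the "best" cutoff is an assumption, since the text does not state how each study balanced the two indices.

```python
from typing import List, Tuple

def sens_spec(scores: List[int], is_case: List[bool], cutoff: int) -> Tuple[float, float]:
    """Sensitivity and specificity when 'score >= cutoff' counts as a positive screen."""
    tp = sum(1 for s, c in zip(scores, is_case) if c and s >= cutoff)
    fn = sum(1 for s, c in zip(scores, is_case) if c and s < cutoff)
    tn = sum(1 for s, c in zip(scores, is_case) if not c and s < cutoff)
    fp = sum(1 for s, c in zip(scores, is_case) if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores: List[int], is_case: List[bool]) -> float:
    """AUC as the probability that a random case outscores a random noncase (ties count 0.5)."""
    cases = [s for s, c in zip(scores, is_case) if c]
    noncases = [s for s, c in zip(scores, is_case) if not c]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in cases for b in noncases)
    return wins / (len(cases) * len(noncases))

def best_cutoff(scores: List[int], is_case: List[bool]) -> int:
    """Cutoff that maximizes Youden's J = sensitivity + specificity - 1 (assumed criterion)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda k: sum(sens_spec(scores, is_case, k)) - 1)

# Hypothetical questionnaire totals and interview diagnoses for ten patients.
scores = [5, 8, 9, 11, 12, 14, 16, 19, 21, 23]
is_case = [False, False, False, False, False, True, False, True, True, True]

cutoff = best_cutoff(scores, is_case)
print(cutoff, sens_spec(scores, is_case, cutoff), round(auc(scores, is_case), 3))
```

Raising the cutoff in this toy sample lowers sensitivity and raises specificity, which is the trade-off the ROC curve traces.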
Three more recent cross-cultural studies, one in Southern Europe,18 one in
Japan,19 and the third in Australia,20 looked at the HADS compared to either an
ICD-10 diagnostic interview or psychiatrist diagnosis based on an interview.
All found the HADS to have relatively good predictive value. In the Japanese
study, a cutoff on the HADS of 8/9 resulted in sensitivity of 0.92 and
specificity of 0.57 against the diagnosis of either adjustment disorders or
major depression. The AUC for the HADS in the Italian study was better at
0.89, with the best cutoff identified as 9/10, resulting in sensitivity of 0.86 and
specificity of 0.82, against the criterion of diagnosis with any ICD-10 anxiety,
adjustment, or major depressive disorder. High values were maintained when
the criteria were changed to just look at the HADS in relation to anxiety and
adjustment disorders, with an AUC of 0.86, sensitivity of 0.83, and specificity
of 0.82 for a HADS cutoff of 10 or more. For mood disorders alone, a HADS
total score cutoff of 15 or more was associated with even higher AUC of 0.96,
sensitivity of 0.85, and high specificity of 0.96. Hence, higher cutoffs can be
used to maximize positive predictive value but at the expense of negative
predictive value.
Not all studies found both high specificity and sensitivity of the HADS. The
Australian study with only breast cancer patients found the recommended
cutoff of 10/11 had good specificity at 0.97 but low sensitivity (0.16) to
detect both major and minor depression.20 The best cutoff to maximize both
indices in that study was 7/8 on the depression subscale, which resulted in
sensitivity of 0.46 and specificity of 0.94, still not as good as in the Italian
sample. Similarly, Hall and associates21 found that in women with early-stage
breast cancer, neither the anxiety nor the depression subscale of the HADS
provided adequate sensitivity, although specificity was high (see
Table 13.1).
The BDI was also used against a psychiatric interview in the
Australian study with breast cancer patients.20 They found the BDI to
be a better instrument than the HADS in that a cutoff of 5 resulted in a
sensitivity of 0.73 and specificity of 0.74, with three quarters of the
patients correctly classified with major and minor depression. Berard
and coworkers22 also used the BDI but found that a much higher cutoff of
14 best identified cases of major depression.

Table 13.1. Cutoff Points of the HADS and BDI to Maximize Sensitivity and Specificity in Cancer Patients

| Reference | Population | Measure | Criterion | Cutoff | SE (%) | SP (%) |
|---|---|---|---|---|---|---|
| HADS Full-Scale Studies | | | | | | |
| Grassi et al., 2007 (ref. 18) | 109 patients with mixed diagnoses | HADS | ICD-10 psychiatric interview – anxiety, adjustment, or major depressive disorders | 10 full scale | 86 | 82 |
| Grassi et al., 2007 (ref. 18) | 109 patients with mixed diagnoses | HADS | Adjustment or major depression | 16 full scale | 85 | 96 |
| Razavi et al., 1990 (ref. 17) | 210 cancer inpatients with mixed diagnoses | HADS | Endicott criteria – major depression | 19 full scale | 70 | 75 |
| Razavi et al., 1990 (ref. 17) | 210 cancer inpatients with mixed diagnoses | HADS | Adjustment and major depression | 13 full scale | 75 | 75 |
| Hopwood et al., 1991 (ref. 56) | 81 patients with metastatic breast cancer | HADS | Clinical interview schedule for DSM-III – affective disorders | 11 full scale | 75 | 75 |
| Ibbotson et al., 1994 (ref. 57) | 513 outpatients with mixed cancer diagnoses | HADS | Psychiatric Assessment Schedule Interview for DSM-III – GAD plus major depression (no AD) | 14 full scale | 80 | 76 |
| HADS Subscale Studies | | | | | | |
| Berard et al., 1998 (ref. 22) | 100 patients with breast, head and neck cancer, and lymphoma | HADS | Structured psychiatric interview for DSM-IV – major depression only | 8 depression | 71 | 95 |
| Hall et al., 1999 (ref. 21) | 266 women with early-stage breast cancer | HADS | Present State Examination for DSM-III – major depression and GAD separately | 11 depression | 14 | 98 |
| | | | | 11 anxiety | 24 | 97 |
| | | | | 8 depression | 33 | 93 |
| | | | | 8 anxiety | 64 | 84 |
| Lloyd-Williams et al., 2001 (ref. 58) | 100 patients with metastatic cancer – mixed diagnoses | HADS | Present State Examination for ICD-10 – major depression | 19 full scale | 68 | 67 |
| | | | | 11 depression | 54 | 74 |
| | | | | 10 anxiety | 59 | 68 |
| Akizuki et al., 2003 (ref. 19) | 275 breast, lung, lymphoma and leukemia patients | HADS | Psychiatric interview based on DSM-IV – adjustment disorders and major depression | 9 depression | 92 | 57 |
| Love et al., 2004 (ref. 20) | 227 women with metastatic breast cancer | HADS | Monash Interview for Liaison Psychiatry based on DSM-IV – adjustment disorder and major depression | 8 depression | 46 | 94 |
| BDI Studies | | | | | | |
| Berard et al., 1998 (ref. 22) | 100 patients with breast, head and neck cancer and lymphoma | BDI | Major depression only | 16 | 86 | 95 |
| Love et al., 2004 (ref. 20) | 227 women with metastatic breast cancer | BDI | Adjustment disorder and major depression | 5 | 73 | 74 |

AD, Affective Disorders; BDI, Beck Depression Inventory; GAD, Generalized Anxiety Disorder; HADS, Hospital Anxiety and Depression Scale; SE, sensitivity; SP, specificity.

In summary, mood severity scales such as the HADS or BDI can be used
with some success in classifying patients as depressed against an interview
criterion, but the specific cutoff scores used varied considerably
across different patient populations, making it difficult to know which values
to apply.

The Distress Paradigm


In recent years, with increased attention to the “patient experience,” the field
of psychosocial oncology has grown significantly. Once considered an add-on,
psychosocial oncology is increasingly being seen as a clinical necessity within
any cancer care delivery system. An emerging movement within the context of
cancer care has arisen to recognize a concept of broadly defined emotional
disturbance associated with cancer, which has been given the general term
“distress.”23 Distress has recently been discussed as the “sixth vital sign in
cancer care,”24–26 with the intention to raise the level of awareness and
normalization of distress such that it mandates routine assessment. National
Comprehensive Cancer Network (NCCN) guidelines recommend screening at
the first appointment and regularly thereafter as needed throughout the course
of treatment.
Distress is a somewhat difficult term to understand clinically as training in
mental health does not recognize this term as a diagnostic category, and it is
difficult to argue for specific symptoms associated with a diagnosis of “distress.”
However, it has great face validity, little stigma, and appeals to common
sense, as most individuals have a good idea of what feeling distressed entails.
The philosophy behind “distress” as a concept was to destigmatize emotional
reactions to cancer by providing a commonsense term without the negative
linguistic baggage associated with clinical terms such as depression. Emotional
distress refers primarily to a composite of anxiety, depression, and adjustment
disorders related to the cancer experience. The NCCN Distress Management
Panel has defined distress as:

. . . a multi-determined unpleasant emotional experience of a psychological
(cognitive, behavioral, emotional), social, and/or spiritual nature that may
interfere with the ability to cope effectively with cancer, its physical symptoms
and its treatment. Distress extends along a continuum, ranging from common
normal feelings of vulnerability, sadness and fears to problems that can become
disabling, such as depression, anxiety, panic, social isolation, and spiritual
crisis.27
Brief Symptom Inventory (BSI)-18


Distress in cancer populations has been assessed using a number of measures,
most notably the Brief Symptom Inventory 18-item version (BSI-18)28,29 and
the Distress Thermometer (DT).27 The BSI-18 is derived from the family of
instruments developed by Derogatis,28 shortened from the Symptom
Checklist-9030 and Brief Symptom Inventory-53,31 both of which have under-
gone extensive psychometric validation. The BSI-18 consists of 18 items that
load on three subscales: depression, anxiety, and somatization (with potentially
a fourth factor with only one item assessing suicide). To determine the utility of
the BSI-18 for identifying cases of distress in cancer patients, Zabora and
coworkers29 conducted a study of sensitivity and specificity of the instrument
compared to the longer version of the BSI. Cutoff scores were estimated based
on the distribution of standardized t-scores, with the 25th percentile used as the
cutoff point for positive case identification. For men, the 25th percentile on the
BSI-18 fell at a score of 10 on the Global Severity Index (GSI), and for women
it fell at 13. The sensitivity of these cutoffs compared to those on the original
53-item BSI was 91.2%, while specificity was 92.6%. Hence, these values
were recommended as cutoffs for distress in cancer populations. However,
ROC analyses were not performed in this study, which raises the question
of whether the optimal balance between sensitivity and specificity is actually
achieved at those cutoff values.
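
As a minimal sketch of how the sex-specific cutoffs reported above might be applied, the function below flags a patient as a distress case when the BSI-18 GSI score is at or above 10 for men or 13 for women. The function and dictionary names are hypothetical, and reading each cutoff as "at or above" is an assumption based on the "10 or more" and "13 or more" wording used for these cutoffs later in the chapter.

```python
# Sex-specific GSI cutoffs as reported in the text (Zabora and coworkers).
BSI18_GSI_CUTOFFS = {"male": 10, "female": 13}

def is_distress_case(gsi_score: int, sex: str) -> bool:
    """Return True when the GSI score meets or exceeds the cutoff for the given sex."""
    if sex not in BSI18_GSI_CUTOFFS:
        raise ValueError("sex must be 'male' or 'female' for these cutoffs")
    return gsi_score >= BSI18_GSI_CUTOFFS[sex]

# A GSI score of 11 is a case for a man but not for a woman under these cutoffs.
print(is_distress_case(11, "male"), is_distress_case(11, "female"))
```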
Two large-scale studies of the BSI-18 in cancer patients using the cutoff
scores detailed above29 documented clinically significant levels of distress in
approximately 35% to 38% of all patients, in studies with over 4,000 and 3,000
patients, respectively.32,33 The highest rates of distress in both studies were
found in lung, pancreatic, and head and neck cancer patients, with lower rates
in prostate and gynecologic cancer patients.32,33 This conforms to the same
patterns seen with respect to rates of depression diagnosed in different types of
cancer.3

Distress Thermometer
Recently there has been a great deal of interest in the use of the DT in cancer
screening, fuelled primarily by the recommendation for its use by the NCCN in
its distress screening guidelines. Several studies have now evaluated the
performance of the DT using ROC analyses to determine appropriate cutoff
scores on the 0-to-10 scale that would maximize sensitivity and specificity
(Table 13.2—adapted from Mitchell34). A recent comprehensive review by
Mitchell34 assessed the accuracy of the DT as investigated in 19 different
studies, as well as other short screening methods (fewer than five questions)
in oncology. Overall, accuracy of the DT in diagnosing depression across six
studies of 2,816 patients was reported at 81% sensitivity and 60% specificity.

Table 13.2. Cutoff Values of the Distress Thermometer (DT) to Maximize Sensitivity and Specificity in Cancer Patients

| Reference | Population | Criterion | DT Cutoff | SE (%) | SP (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|---|
| Akizuki et al., 2003 (ref. 19) | 275 breast, lung, lymphoma, and leukemia patients | Psychiatric interview based on DSM-IV – adjustment disorders and major depression | 4 | 84 | 61 | 77 | 71 |
| Patrick-Miller et al., 2004 (ref. 59) | 1,272 outpatients with mixed cancers | HADS (cutoff not stated) | Not stated | 79 | 62 | 26 | 95 |
| Hoffman et al., 2004 (ref. 60) | 68 outpatients with mixed cancers | BSI caseness (t-score = 63) | 5 | 59 | 71 | 57 | 73 |
| Akizuki et al., 2005 (ref. 61) | 295 mixed cancer and patients preparing for stem cell transplants | Clinical interview for DSM-IV – major depression and adjustment disorders | 4 | 81 | 82 | 80 | 83 |
| Jacobsen et al., 2005 (ref. 35) | 380 patients with mixed diagnoses | HADS >14; BSI-18 >=10 for males, >=13 for females | 4 | 77 | 68 | 44 | 90 |
| Gil et al., 2005 (ref. 36) | 312 patients with mixed diagnoses | HADS total >=14 | 4 | 66 | 79 | 56 | 85 |
| Ransom et al., 2006 (ref. 62) | 491 patients awaiting bone marrow transplant | CES-D >=16 | 5 | 80 | 70 | 46 | 91 |
| Mehnert et al., 2006 (ref. 63) | 475 outpatients with mixed cancers | HADS anxiety >=8 | 5 | 78 | 45 | 69 | 56 |
| | | HADS depression >=8 | | 83 | 37 | 42 | 80 |
| Adams et al., 2006 (ref. 64) | 340 outpatients with mixed cancers | HADS anxiety >=8 | 4 | 91 | 63 | 37 | 97 |
| | | HADS depression >=8 | | 89 | 57 | 19 | 98 |
| Andritsch et al., 2006 (ref. 65) | 128 outpatients receiving chemotherapy | HADS anxiety >=8 | 4 | 78 | 65 | 38 | 92 |
| | | HADS depression >=8 | | 80 | 64 | 35 | 93 |
| Ohno et al., 2006 (ref. 66) | 160 outpatients with mixed cancers | HADS total >14 | Not specified | 93 | 31 | 41 | 89 |
| Kumar et al., 2006 (ref. 67) | 145 palliative care patients | ICD-10 adjustment disorders, affective disorders, and anxiety | 5 | 73 | 52 | 46 | 77 |
| Ozalp et al., 2007 (ref. 68) | 182 outpatients with mixed cancers | HADS total >14 | 4 | 74 | 50 | 47 | 76 |
| Gessler et al., 2006 (ref. 69) | 152 outpatients with mixed cancers | HADS total >15 | 4 | 83 | 76 | 57 | 92 |
| Grassi et al., 2007 (ref. 18) | 109 outpatients with mixed diagnoses | ICD-10 psychiatric interview – anxiety, adjustment, or major depressive disorders | 4 | 80 | 75 | 69 | 84 |

BSI, Brief Symptom Inventory; CES-D, Center for Epidemiologic Studies–Depression; DSM-IV, Diagnostic and Statistical Manual for Mental Disorders Version IV; HADS, Hospital Anxiety and Depression Scale; ICD-10, International Classification of Diseases, Version 10; NPV, negative predictive value; PPV, positive predictive value; SE, sensitivity; SP, specificity.

For broadly defined distress, the corresponding values for the DT in nine
studies of 1,447 patients were 77% and 66%. Four studies of the DT used
HADS anxiety as the criterion measure in 2,215 patients and found sensitivity
of 77% and specificity of 57%. In detecting depression, distress, and anxiety,
the positive predictive value of the DT was much lower than negative pre-
dictive value—that is, it was good at ruling out noncases but not as accurate at
identifying true cases of distress. Because of this, Mitchell concluded that
ultra-short measures cannot be used alone to diagnose anxiety or depression in
cancer patients, but can serve well as a first-line screening to rule out cases of
depression.
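
The asymmetry between positive and negative predictive value follows from Bayes' rule once prevalence is taken into account. The sketch below is illustrative only: it uses the pooled sensitivity (81%) and specificity (60%) reported above for depression, but the 20% prevalence is an assumed figure chosen for the example, not a value taken from the review.

```python
from typing import Tuple

def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied in a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Pooled DT accuracy for depression from the text; prevalence of 0.20 is hypothetical.
ppv, npv = predictive_values(sensitivity=0.81, specificity=0.60, prevalence=0.20)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under these assumptions roughly one in three positive screens is a true case, while more than nine in ten negative screens are true noncases, which is consistent with using an ultra-short measure to rule out rather than to diagnose.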
More specifically, one of the larger studies conducted on the DT35 validated
the DT against the HADS and BSI-18 in five American comprehensive cancer
centers by asking 380 patients to complete the DT, problem checklist, HADS,
and BSI-18. They conducted ROC analysis on the DT against both criteria and
found the AUC for a cutoff score of 4 or more on the DT was 0.80 (against the
HADS cutoff score of 15 or more as the criterion) and 0.78 (using the BSI-18
cutoff scores of 10 or more for males and 13 or more for females), which are in
the range characterizing good overall test accuracy. Patients with DT scores of
4 or more were more likely to be women, to have a poorer performance status,
and to report practical, family, emotional, and physical problems, demon-
strating the concurrent validity of the instrument.
Cross-cultural validation of the DT has also been undertaken. For example,
in Japan researchers assessed the validity of the DT and the HADS against
psychiatrist diagnoses of DSM-IV major depression and adjustment disorders
in a sample of 275 patients.19 They forward- and back-translated the term
“distress” in an attempt to find the appropriate Japanese analogue for the term.
Using ROC analysis they determined the best cutoff on the DT that maximized
sensitivity and specificity of the detection of adjustment disorders and major
depression was 4 or more, with rates of 84% and 61%, respectively. They
justified the lower specificity by reasoning that in the case of detecting
depression, it is more important to overidentify potential cases rather than
miss troubled individuals.
A multicenter study in Europe assessed the value of both the DT and a
similar scale termed the Mood Thermometer (MT) designed to assess
depressed mood in cancer patients using a population from Italy, Portugal,
Spain, and Switzerland.36 A convenience sample of 312 cancer outpatients
completed the DT, MT, and HADS. The DT was more highly associated with
HADS anxiety scores than depression scores, while the MT was related to both
HADS anxiety and depression scores and was more highly correlated to HADS
scores than was the DT. ROC analyses found that a cutoff point of 4 or more on
the DT maximized sensitivity (66%) and specificity (79%) for general psy-
chosocial morbidity (HADS cutoff of 14 or more), while a cutoff of 5 or more
identified more severe cases (HADS cutoffs of 19 or more: sensitivity 70%,
specificity 73%). On the MT, sensitivity and specificity for general psychoso-
cial morbidity were 85% and 72% using the cutoff score of 3 or more. A score
of 4 or more on the MT was associated with a sensitivity of 78% and a
specificity of 77% in detecting more severe cases.
Finally, another Italian study used the ICD-10 diagnostic interview as the
gold standard. Grassi and associates18 administered the DT and the HADS to
109 participants, and once again conducted ROC analyses compared to the
formal psychiatric diagnoses. The most efficient cutoff score for the DT to
optimize sensitivity and specificity was again 4 or more. Other studies
published since 2006 also found similar results in terms of high sensitivity
and specificity against instruments such as the HADS, but with lower positive
predictive values (see Table 13.2). Hence, there is general consensus in North
America, Europe, and Asia that scores of 4 or 5 and above on the DT are
indicative of levels of distress/depression that are generally accepted to be
troubling and require some form of intervention. The DT can serve as a useful
tool for accurately ruling out individuals who are not likely to require
intervention, but is less accurate in ruling in true-positive cases of distress.
It may best be implemented followed by a more comprehensive assessment of
those who score over the cutoff value to further determine appropriate
referrals.
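
The two-stage approach described above can be sketched as a simple decision rule. This is a hypothetical illustration rather than a validated protocol: the function name and return strings are invented, and the default cutoff of 4 reflects the value most often identified in the studies reviewed (some studies used 5).

```python
def dt_screen(dt_score: int, cutoff: int = 4) -> str:
    """First-stage screen: flag patients at or above the DT cutoff for fuller assessment."""
    if not 0 <= dt_score <= 10:
        raise ValueError("DT scores range from 0 to 10")
    if dt_score >= cutoff:
        # A positive screen is not a diagnosis; it triggers a more comprehensive assessment.
        return "refer for comprehensive assessment"
    return "no further assessment indicated at this visit"

print(dt_screen(6))            # above the default cutoff
print(dt_screen(4, cutoff=5))  # below the stricter cutoff of 5
```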

4. Implementing Screening Programs in Oncology Settings


In recent years, there has been considerable interest in computerizing the
administration and scoring of short screening questionnaires in oncology37 to
improve efficiencies of time and human resource requirements (see Chapter 4).
This began primarily with longer assessments of quality of life, a construct that
assesses much more than distress or depression, including physical, social,
role, and emotional functioning, and common health-related symptoms. This
literature is relevant, however, as the technology has since been applied to
screening with shorter instruments. In such studies, the selected questionnaire
is typically completed on a computerized interface and immediately scored,
and a report is produced and presented to treatment staff to inform subsequent
clinical decisions.
For example, in a crossover, randomized study of touchscreen versus paper
completion of two quality-of-life questionnaires, touchscreen was preferred by
participants in a ratio of 2:1, within all demographic subgroups. The benefits of
the touchscreen for providers were identified as automatic and immediate
collection and sharing of data, automatic scoring, information available
online, cost and time savings, and printouts available for immediate placement
in patients’ charts.38 In another study of the feasibility of collecting
standardized self-reported quality-of-life and psychosocial needs data via a
touchscreen computer, 99% of patients reported the touchscreen as easy to
use.39 In the Netherlands, Detmar and colleagues40,41 reported that physicians
found quality-of-life summary information to provide a useful overall impres-
sion of their patients’ functional health and symptom experience while
improving the efficiency of the clinical encounter. Patients were also largely
satisfied with the computerized intervention. A recent study administered the
HADS online to 3,071 patients attending a cancer facility for follow-up care in
a variety of clinics; 85% of all patients were able to complete the question-
naires.42 Patients who were female, were younger than 65 years old, and had
more severe illness were most distressed.
In a series of studies on the computerized assessment of quality of life by
patients immediately prior to appointments, coupled with the immediate pro-
vision of quality-of-life summary information to oncologists, our group estab-
lished excellent acceptance of computerized quality-of-life data by both
physicians and patients, in breast cancer and pain and palliative care.43,44
Our current work with distress screening has followed from this and taken a
phased approach. Phase I was a baseline cross-sectional assessment of the
current level of psychosocial distress in patients, and an assessment of their
awareness and use of psychosocial resources.33 Results in a sample of almost
3,000 patients highly representative of the overall patient population con-
firmed the findings of other studies, with 38% scoring above the BSI-18
cutoff for distress, that is, “caseness” as identified by Zabora and colleagues.29
Cases were more likely to be on active treatment, have a diagnosis other
than prostate cancer, belong to an ethnic minority, or be from a low-income
family.
In Phase II of the program we updated the screening battery to include the
DT and replaced the BSI-18 with a new tool called the Psychological Screen
for Cancer (PSSCAN).45 The PSSCAN was developed for screening for
depression and anxiety in clinical practice and as a research tool, and Part C
is a reasonable proxy for the BSI-18, which we chose to replace given copy-
right-associated cost issues. The entire battery consists of the DT, modified
problem checklist,27 PSSCAN, 10-point scales for fatigue and pain, and nutri-
tion questions. Phase II, which has recently been completed, included a three-
armed randomized controlled trial of the effect of three different levels of
screening in lung and breast cancer outpatients. Outcomes were distress,
common problems, anxiety and depression, and awareness and use of psycho-
social resources 3 months after initial screening, which occurred during the
first oncology appointment. The three conditions evaluated were minimal
screening (DT only), full screening, and full screening plus personalized
triage. In the triage condition, if the patient chose to be contacted, a staff
member phoned within a specified time period to discuss and arrange referral
options. A total of 1,141 patients enrolled in this study (89% accrual), and 90%
of them provided data at the 3-month follow-up. Preliminary results confirm
similarly high levels of distress and common problems as identified in Phase
I33 and suggest those with high distress who accepted referrals to psychosocial
services showed significantly greater decreases in anxiety and depression over
time than those who did not accept referrals. The program was also successful
in increasing overall awareness of the services available to patients, as well as
uptake of services for those who received the triage intervention, compared to
baseline data from 3 years previously.

Evaluating Efficacy of Screening Programs in Clinical Oncology Practice
Despite great enthusiasm for developing questionnaires to detect emotional
complications of cancer, few groups have been able to implement a successful
screening program for mood disorders, and even fewer have carried out
systematic evaluation of the efficacy of such programs. Table 13.3 summarizes
the studies to date that have longitudinally evaluated the impact of psychoso-
cial screening on patient outcomes.
Most recent studies have used computerized screening techniques, but one
earlier study implemented distress screening over the telephone and subse-
quently evaluated its impact on quality of life. Maunsell and colleagues46
randomized women newly diagnosed with nonmetastatic breast cancer to
receive either usual care, or monthly telephone distress screening followed
by triage. For all participants, distress levels decreased over the following year
regardless of group assignment. The authors concluded that the minimal
psychosocial intervention all participants received as part of their initial
cancer care may have been effective in reducing distress in and of itself,
without further gain from additional screening.
Our early work used a computerized version of the EORTC QLQ-C30 to
screen for quality of life in lung cancer patients. Using a sequential cohort
design, patients were assigned to either a usual-care control group, who
completed the EORTC QLQ-C30 paper version after the clinic appointment,
or an experimental group, who completed the questionnaire prior to their first
clinic appointment with feedback to staff. Patients reported being equally
satisfied with the treatment in both groups, but timely provision of quality-of-
life information in the experimental group resulted in greater discussion of
quality-of-life issues and more actions taken by oncologists regarding these
issues.47 Velikova and associates48 randomly assigned 28 oncologists
treating 286 cancer patients to an intervention group who received feedback
of results, an attention-control group who completed questionnaires without
Table 13.3. Summary of Efficacy Studies of Psychosocial Screening in Cancer Populations

Reference: Maunsell et al., 1996 (ref. 46). Randomized trial of a psychological distress screening program after breast cancer: Effects on QL
Study Design: Randomized controlled trial to usual care (control) or telephone distress screening intervention. Women in both groups received brief psychosocial intervention from a social worker at initial treatment.
Sample: 251 women newly diagnosed with nonmetastatic breast cancer (89% of total population seen at a regional cancer center)
Methods: The experimental group had monthly telephone distress screening using the 20-item GHQ for 12 months with additional psychosocial intervention offered to those with high distress. Outcomes assessed 3 and 12 months later.
Measures: Baseline: Social Support: SSQ; Marital Satisfaction: LWMAT; Stressful life events: LES. Primary Outcome: Psychiatric Symptom Index (PSI). Other outcomes: Overall Health Perception (one question); Health Worry (one question); Role performance (leisure; home; social; physical: CHALS); Visits to social workers and other healthcare professionals.
Results: Distress levels decreased over the study period across groups. No between-group differences were observed with regard to distress, physical health, functional status, social and leisure activities, return to work, or marital satisfaction. Use of outside co-interventions was similar between groups.
Conclusions/Comments: This distress-screening program did not improve QL among women who received minimal psychosocial intervention as part of their initial cancer care. This alone may be effective in reducing distress, making it difficult to obtain additional benefit from a screening program.

Reference: Taenzer et al., 2000 (ref. 47). Impact of computerized QL screening on physician behavior and patient satisfaction in lung cancer outpatients
Study Design: Sequential cohort study. Patients were sequentially recruited first into a control group, then to the experimental group. The first 26 were assigned to the control group, the next 27 to the experimental group.
Sample: 57 patients with dx of any-stage lung cancer, out of 170 seen in the lung clinic (33.5%). Groups not different on demographic variables.
Methods: Control group: After the standard clinic appointment, completed PDIS and paper-and-pencil EORTC QLQ-C30, and exit interview. Experimental group: Completed computerized EORTC QLQ-C30 and provided a printed report of results to their nurse and physician during the clinic appointment. After the clinic appointment, completed PDIS and exit interview.
Measures: Paper-and-pencil EORTC QLQ-C30. Computerized EORTC QLQ-C30. PDIS. Exit interview: Structured interview to document patients’ perception of whether QL concerns indicated on EORTC QLQ-C30 were addressed during the clinic appointment. Medical Record Audit: Total number of QLQ-C30 categories charted and total number of actions taken were recorded by a research assistant blinded to the study condition.
Results: EORTC QLQ-C30: Groups did not differ in the number of items endorsed. PDIS: Both groups were equally satisfied with their clinic visit. Satisfaction scores were very high. Exit Interview: Experimental group indicated that significantly more quality-of-life items were discussed during their clinic appointment than the control group (48.9% vs. 23.6%; t = 3.95, p < 0.01). Medical Record Audit: Actions regarding a greater number of QL categories were indicated in charts of patients in the experimental than in the control group.
Conclusions/Comments: The tool was effective in detecting increased number of QL concerns during the clinic appointment. A trend was also noted of more QL concerns being charted and more actions being taken to address them (differences were not significant). Limitations: Generalizability is limited due to small sample of patients and nonrandomized design.

Reference: McLachlan et al., 2001 (ref. 49). Randomized trial of coordinated psychosocial interventions based on patient self-assessments vs. standard care to improve the psychosocial functioning of patients with cancer
Study Design: Randomized controlled trial. Patients were stratified by clinic of origin (eg, lung, breast). Two thirds were assigned to the intervention arm and one third to the control arm within each clinic.
Sample: 450 cancer patients. Inclusion criteria: diagnosis of cancer; attending medical oncology clinic; not attending for very first consultation; fluent in English; ECOG status ≤ 2; age ≥ 18; adequate follow-up scheduled in the institution; completion of 90% on prestudy items.
Methods: Completed questionnaires on touchscreen computer prior to appointment. Randomly assigned to intervention or control group in 2:1 ratio. Intervention group: Printed summary of results presented at the clinic appointment. Coordination nurse present at clinic visit. After visit nurse formulated a personalized care plan based on results of summary report. Control group: Summary report was not made available during the clinic visit. Follow-up at 2 and 6 months. Satisfaction with care received assessed at 6 months.
Measures: CNQ short form. EORTC QLQ-C30. BDI Short Form. Patient satisfaction at 6 months: satisfaction with medical staff, information provision, overall satisfaction (1–4 Likert scale). Primary outcome: Difference between 2 arms with respect to changes from baseline in psychological needs and information needs measured by CNQ at 2 months. Secondary outcomes: Differences in other domains of CNQ, EORTC QLQ-C30, and depression.
Results: 86% response rate at 2 months and 71% at 6 months. 63% of offered services were not accepted by patients across groups. Greater benefit in intervention over control group in respect to psychological and health information needs at 2 months but no differences at 6 months. No significant differences in secondary outcomes at 2 months. No significant differences in levels of satisfaction with care.
Conclusions/Comments: There were no meaningful changes from baseline in QOL between the intervention and usual care groups at 2 and 6 months. The feasibility of using touchscreen technology was endorsed by both groups. Standardized QOL assessments prior to clinic appointments facilitate patient–healthcare team communication about QOL issues.

Reference: Detmar et al., 2002 (ref. 50). HRQL assessments and patient–physician communication; randomized controlled trial
Study Design: Prospective, longitudinal, randomized crossover trial. Physicians were initially randomized into intervention vs. control group. 10 consecutive patients were recruited for each physician. First study visit was a baseline. Intervention was introduced at second visit and continued until fourth visit. At midpoint, physicians were crossed over: those in the control group were in the intervention group and vice versa.
Sample: 10 physicians, 273 cancer patients. Inclusion criteria: after receiving two cycles of chemotherapy
Methods: Intervention group: Patients filled out EORTC QLQ-C30 in the waiting room before each visit. Responses were optically scanned into a computer and a graphic summary profile was printed out and given to patients; a copy was also placed in the medical record. Physicians were trained how to interpret the results of the questionnaire.
Measures: Patient–physician communication: All visits were audiotaped and content analyzed. A score (0–12) of all health-related QL topics discussed was the primary study outcome. Physicians’ awareness of patient HRQL: At first and fourth visit both patients and physicians completed COOP and the WONCA. Patient management: Medical records and audiotapes were used to score how many HRQL actions were taken by a physician per patient. Patients’ self-reported HRQL: At first and fourth visit the SF-36 was administered to all patients. Patient and physician evaluation of the intervention: After fourth visit patients in the intervention group completed a satisfaction survey and brief phone interview; physicians underwent a semi-structured interview.
Results: Patient–physician communication: Higher in the intervention group than control; 12 HRQL issues were discussed more frequently. Physicians’ awareness of patient HRQL: there were no significant differences between groups in physician–patient agreement in ratings on COOP/WONCA charts. Patient management: Significantly more patients from intervention group (23%) received counseling from the physician on how to manage their health problems than in the control group (16%). Patient and physician satisfaction: Both patients and physicians reported high satisfaction. Patients’ HRQL: No group differences in SF-36 scales at the fourth visit; intervention group reported significantly higher improvement over time in mental health and role functioning than control group. Consultation duration + evaluation of intervention: No significant differences in visit duration were found; patients reported positive feedback about the summary report and so did the physicians.
Conclusions/Comments: Significant increase in discussion of HRQL topics. Intervention had only modest effect on patient management activities. Most patients and all physicians reported that HRQL summary of results report helped with patient–physician communication and they recommended continued use of the intervention as standard care in outpatient clinics. Limitations: Large number of tests performed, physician sample was limited; crossover design facilitated carryover and contamination effects. Physicians initially assigned to the intervention group tended to discuss HRQL issues more frequently even when in control group.

Reference: Velikova et al., 2004 (ref. 70). Measuring QL in routine oncology practice improves communication and patient well-being: randomized controlled trial
Study Design: Prospective randomized controlled trial with repeated measures. Groups: Intervention: Completion of touchscreen QL questionnaire + feedback of results to physicians. Attention-control: Completion of QL questionnaire on touchscreen computer, no feedback to physicians. Control: no touchscreen measurement of HRQL before clinic encounters. Randomizations 2:1:1 in favor of intervention group, stratified by site of cancer.
Sample: 28 oncologists; 286 cancer patients. Inclusion criteria: commencing treatment, expected to attend clinic at least three times, fluent in English, not taking part in other HRQL studies, not exhibiting overt psychopathology
Methods: Patients were randomly assigned and their clinic encounter was tape-recorded. Those in intervention and attention-control groups completed touchscreen HRQL questionnaires before each of their clinic encounters. Outcome questionnaires were provided to all patients on paper to complete at home and return by mail. Outcomes were assessed: after the baseline encounter, after three study encounters (2–3 months), after 4 months, and at the end of the study (approx. 6 months).
Measures: Intervention questionnaires: EORTC QLQ-C30; HADS. Outcome measures: FACT-G (v4) primary outcome. Process of care measures: Audiotaped encounters were analyzed for content of any quality-of-life issues included in EORTC QLQ-C30. The content was presented as a list of binary variables (topics discussed or not) and combined score of EORTC symptoms (0–7) and functional issues (0–5) discussed. The combined scores were used as study primary outcome.
Results: EORTC QLQ C-30: A significant overall effect of well-being between groups. FACT-G: Scores improved in the intervention vs. control group, but not vs. attention-control group. Attention-control group significantly better than control. Process of Care: The number of EORTC symptoms mentioned was higher in intervention vs. control group. Chronic nonspecific symptoms (sleep, changes in appetite, fatigue) were discussed more often without prolonging the encounters. Physicians used the HRQL data 64% of the time.
Conclusions/Comments: Chronic symptoms were discussed more often due to the intervention. Intervention had a positive impact on patients’ well-being. Routine repeated measurements of HRQL may lead to improvements in emotional well-being in some patients.

Reference: Boyes et al., 2006 (ref. 51). Does routine assessment and real-time feedback improve cancer patients’ psychosocial well-being?
Study Design: Two-group study with alternate consenting patients assigned to treatment and control groups. Assessed at first visit and three following consecutive visits.
Sample: 95 cancer patients. Eligibility criteria: 18 or older, attending first consultation, received active treatment after the first visit, considered by oncologist to be emotionally and physically able to participate
Methods: Patients were alternately assigned by computer into intervention (n = 42) or control group (n = 38). Both groups completed a 15- to 20-min survey on a touchscreen computer. Results of the intervention group made available to physicians; results of the control group were not.
Measures: Demographics and cancer characteristics (13 items). Physical symptoms: 12 symptoms associated with chemotherapy and to what extent they interfere with patients’ daily routine (1–3 scale). HADS. SCNS measured patients’ level of need for help in 4 domains: psychological (8 items), health systems and information (13), patient care and support (7), physical and daily living (3). Acceptability survey was administered to both patients and oncologists.
Results: Intervention group reported fewer debilitating physical symptoms than control group. HADS: Anxiety scores decreased in both intervention and control groups from baseline to final follow-up, but the change was not significantly different between groups. Depression scores decreased in the intervention group from baseline to final follow-up and increased in the control group, but the change was not significantly different between groups. SCNS: Both groups reported moderate to high need for help, which decreased from baseline to follow-up; differences between groups were not significant. Only 3 patients reported that their doctor discussed the report with them, but 50% of physicians reported providing feedback to patients based on the report.
Conclusions/Comments: Overall the patients were well functioning at baseline, which presented a limited opportunity to detect changes. Both patients and clinicians provided positive feedback. Even though clinicians were involved in the development of the report and provided positive feedback, they reported that it rarely contributed to their decision making, which may be an important implication for future training.

Reference: Rosenbloom et al., 2007 (ref. 52). Assessment is not enough: a randomized controlled trial of the effects of HRQL assessment on QL and satisfaction in oncology clinical practice
Study Design: Randomized clinical trial, stratified by primary cancer. Control group: Data not shared with treatment nurse. Assessment control: Baseline, 1- and 2-month FACT-G scores were shared with the treatment nurse. Structured interview and discussion condition: Structured interview about responses to FACT-G at baseline and 1 and 2 months.
Sample: 213 patients. Eligibility: advanced breast, lung, or colorectal cancer, receiving chemotherapy, at least 6 months of life expectancy. Exclusion: brain metastases
Methods: Control group: FLIC at baseline, 3, 6 months. FACT at 6 months. Data not shared with treatment nurse. Assessment control: FACT-G and FLIC at baseline, and 1, 2, 3, and 6 months. Baseline and 1- and 2-month FACT-G scores were shared with the treatment nurse. Structured interview and discussion condition: FACT-G and FLIC at baseline and 1, 2, 3, and 6 months + structured interview about responses to FACT-G at baseline and 1 and 2 months.
Measures: FACT-G: 5 subscales plus additional rating of each symptom: better than, worse than, as expected. “Worse than” triggered the structured interview to focus on the indicated symptom. FLIC: to measure HRQL outcomes. Brief POMS-17: to measure distress outcomes. PSQ-III: 2 subscales for general satisfaction, and satisfaction with communication. Clinical treatment changes: Items completed by treatment nurse at baseline, 3 and 6 months included: supportive medication changes, supportive care changes, referral to supportive services, other clinical changes, changes in dose of chemotherapy as a result of reported side effects or treatment toxicity.
Results: Negative mood and age were the two significant differences between groups at baseline, used as covariates. No significant differences in satisfaction or HRQL over time across all groups. Satisfaction and HRQL did not change over the study period. No significant group differences in clinical treatment changes between 3 groups.
Conclusions/Comments: No impact of the intervention, even among the most distressed patients. Providing HRQL assessment and structured feedback of results to nursing staff prior to clinic visit did not produce improvement in patient outcomes, clinical management, or satisfaction.

BDI, Beck Depression Inventory; COOP, Dartmouth Primary Care Cooperative Information Health Assessment; CHALS, Canada Health and Activity Limitation Survey;
CNQ, Cancer Needs Questionnaire; ECOG, Eastern Cooperative Oncology Group performance status; EORTC QLQ C-30, European Organization of Research and
Treatment of Cancer Quality of Life Questionnaire C30; FACT-G, Functional Assessment of Cancer Therapy-General; FLIC, Functional Living Index-Cancer; GHQ,
General Health Questionnaire; HADS, Hospital Anxiety and Depression Scale; HRQL, health-related quality of life; LES, Life Experiences Scale; LWMAT, Locke-
Wallace Marital Adjustment Test; PDIS, Patient-Doctor Interaction Scale; POMS, Profile of Mood States; PSQ III, Medical Outcomes Study Patient Satisfaction
Questionnaire-III; QL, quality of life; SCNS, Supportive Care Needs Survey; SF-36, Medical Outcomes Study 36-Item Short Form Health Survey; SSQ, Social Support
Questionnaire; WONCA, World Organization Project of National Colleges and Academics.
feedback, and a control group with no questionnaires. Patients completed the
EORTC QLQ C-30 and the HADS online before their appointment.
Oncologists who received the quality-of-life reports asked more about emo-
tional problems, work-related issues, and daily activities, and on
average more issues were discussed without extending the time of the
consultation.
Another group further investigated the utility of providing summary reports
of quality of life and depression to the oncology team for a randomly chosen
two thirds of patients, with referral to appropriate psychosocial resources.49
Additionally, in the intervention arm a nurse was also present during the
consultation and formulated an individualized management plan based on the
issues raised and prespecified expert psychosocial algorithms. Six months after
randomization there were no significant differences between the two arms in
any domain or regarding satisfaction with care. However, the most striking
finding was that for patients who were moderately or severely depressed at
baseline on the BDI, appropriate triage did result in decreased depression 6
months later compared to the group whose results were not shared with the
healthcare team.49
Similarly, Detmar and colleagues50 randomly assigned patients in palliative
care to complete computerized quality-of-life assessments and either did or did
not provide the graphical presentation of results to physicians. For patients
whose physicians did receive the results, more health-related quality-of-life
issues were discussed, and more quality-of-life issues were identified by
physicians.
Boyes and associates, in Australia,51 had patients complete a computerized
version of the HADS while waiting to see their oncologist during each visit.
Responses were immediately scored and summary reports placed in each
patient’s file before the appointment. There were no effects on subsequent
anxiety, depression, and perceived needs among those who received the inter-
vention. However, it is possible that the oncologists were not using the report,
as only three intervention patients reported that their oncologist discussed the
feedback report with them.
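As a rough illustration of what immediate scoring of a computerized HADS could involve, the following Python sketch sums two seven-item subscales and attaches a descriptive band to each. It is not the software used in the study described above. The function and class names are hypothetical, the item scores are assumed to be already keyed from 0 to 3 with higher values indicating greater severity, and the bands (0–7, 8–10, 11–21) are the commonly cited HADS subscale thresholds, taken here as an assumption rather than from this chapter.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HadsSummary:
    """Hypothetical summary record that could be placed in a patient's file."""
    anxiety: int
    depression: int
    anxiety_band: str
    depression_band: str


def band(score: int) -> str:
    # Assumed thresholds: commonly cited HADS subscale bands, not from this chapter.
    if score >= 11:
        return "probable case"
    if score >= 8:
        return "borderline"
    return "normal range"


def score_hads(anxiety_items: Sequence[int],
               depression_items: Sequence[int]) -> HadsSummary:
    # Each subscale is assumed to have 7 items, each already scored 0-3.
    for items in (anxiety_items, depression_items):
        if len(items) != 7 or any(not 0 <= v <= 3 for v in items):
            raise ValueError("each subscale needs 7 item scores between 0 and 3")
    anxiety = sum(anxiety_items)
    depression = sum(depression_items)
    return HadsSummary(anxiety, depression, band(anxiety), band(depression))


# Made-up responses for illustration only.
summary = score_hads([1, 2, 1, 0, 2, 1, 1], [3, 2, 2, 1, 2, 1, 1])
print(summary)
```

A report of this kind only helps if it is read and acted upon, which is the limitation the study itself points to.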
Most recently, Rosenbloom and colleagues52 randomly assigned 213
patients with metastatic breast, lung, or colorectal cancer to usual care,
quality-of-life assessment, or assessment followed by a structured interview
(with presentation of symptoms to the treating nurse). There were no improve-
ments seen in patient outcomes, clinical management, or patient satisfaction
between the three conditions.
In summary, the data on implementation of distress screening followed by
evaluation of efficacy on subsequent patient outcomes has shown that such
interventions can result in more discussion of psychosocial issues between
patients and oncology staff, but there is limited evidence that this results in
better patient outcomes in the longer term. It appears that screening alone is not
enough to improve outcomes for patients; ideally, screening should be
accompanied by triage and referral to services with proven efficacy in
treating psychosocial distress, and by training for oncology staff in how to
make these types of referrals.

5. Special Issues in Screening Cancer Patients


In the context of healthcare, not only asking but also acting upon the patient’s
most intimate and complex concerns requires a change of practice and a change
in assumptions, incorporating the full biopsychosocial model. Psychosocial
screening is an opportunity for patients and their support persons to understand
the relevance and importance of emotional well-being. One of the primary
goals of screening in cancer care is to provide programs that contribute to the
normalization and treatment of distress, as implied in the adoption of the
concept of “distress” as the “sixth vital sign in cancer care.”25,26
Implications of adopting this model are that distress is assessed, at a minimum,
upon entry into the system and monitored at regular intervals throughout the
treatment program. Physicians and other members of the healthcare team also
require training in how to access and act upon the information provided
through psychosocial screening. Currently patients experience service delivery
in many different and often inconsistent ways; the hope is to provide a more
streamlined, consistent, meaningful, and proactive experience through appli-
cation of routine screening.
Successful integration into the complex cancer care system is an ongoing
process and demands collaboration, integration of theory into practice, flex-
ibility, and communication. In this environment, the need for connection,
understanding, and transparency with representatives from all levels is essen-
tial. This includes nurses, oncologists, booking clerks, receptionists, patient
records staff, managers, administrators, information technology services, and
program planners. From a clinical perspective, patient presentation is diverse.
Patients who are less distressed at their initial screening can sometimes show
significant distress in various areas, including resource needs, depression,
anxiety, and coping upon follow-up assessment; these needs have to be
acted upon in an appropriate manner no matter when they arise. On the other
hand, patients who are extremely distressed at their first screening often report
feeling less distressed at later intervals. It is essential within the framework of a
combined clinical and research setting that clinical staff are available when the
patient identifies a need, so that research findings are acted upon ethically
when these needs are identified.
Defining, understanding, accommodating, and advocating for the needs of
people living with cancer and their support persons is the foundation that drives
service delivery. In terms of emotional needs, patients have provided feedback
that having psychosocial support as a core component of their medical appoint-
ment is important and that it helps them feel cared about. Patients have shared
that they appreciate confidentiality and private space provided where they can
answer sensitive questions in a discreet way—hence, the physical setting of
screening should accommodate these privacy needs. Most, if not all, cancer
patients seen for first consultations in cancer clinics bring at least one support
person with them. The importance of providing an environment that includes
those most important to patients, both in terms of physical space and inclusion
in the screening process, is an essential part of providing complete care. In
preliminary planning, the choice of screening space and technology must be
inclusive in the areas of physical needs and mobility—including access to
those in wheelchairs and on stretchers, if necessary. The technology chosen to
present the application has to be both psychologically and physically acces-
sible for people with disabilities. At the same time, it has to be efficient and
relevant to clinical staff, providing useful feedback in real-time that can be
used in the clinical encounter. Research to date has pointed to simple touchsc-
reen computer programs as the best way to balance these needs of patients and
families with the needs of the healthcare system.
In addition, in recognition of the predominantly older population served in
cancer centers (average patient age is in the mid-60s), supports have to be put
in place to accommodate patients who are not comfortable with computer use.
Finally, taking into consideration diversity in the patient population, screening
programs should offer versions of the program in multiple languages or have
translators present to assist patients who don’t speak the dominant language,
thus ensuring that they can benefit equally from the opportunities screening
provides.

6. Summary, Integration, Future Directions


There is an increasing awareness of the value and importance of screening for
depression and distress in oncology settings, based on research that has con-
sistently documented substantial rates of psychological morbidity in a range of
patients, using both conventional measures of depression and anxiety and more
recently introduced short screening tools for psychosocial distress. Researchers
have devised quick and simple methods for assessing symptoms in a wide range
of patients that are acceptable to both patients and providers, and introduced
computerized systems that make it possible to quickly screen a large number of
patients and provide immediate feedback regarding depression, distress, and
quality of life. Despite these advances, little evaluation of the actual downstream
impact of these programs on patients has been conducted, and most of the work
done to date has not resulted in clearly demonstrable benefits. As a result,
screening has yet to be implemented into routine clinical practice.
A 2005 survey of all NCCN member institutions in the United States
treating adults found that of 15 responding centers, 8 (53%) conducted routine
distress screening for at least some patient groups, and 4 (27%) were pilot-
testing screening strategies.53 However, only 20% of surveyed member insti-
tutions screened all patients as the NCCN guidelines recommend. In addition,
37.5% of institutions that conducted screening relied only on interviews to
identify distressed patients rather than using validated screening tools. In
addition, the fiscal costs of implementing screening have not been compared
to potential benefits. Some areas of potential cost savings resulting from
distress screening may be less use of inappropriate and expensive resources
such as visits to the emergency room or unnecessary chemotherapy, which may
be used inappropriately to treat anxiety (see Carlson and Bultz54,55 for reviews
of medical cost offset). Some form of economic analysis of psychosocial
screening may be required by policymakers before large-scale implementation
becomes common.
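The survey percentages quoted above can be checked against the reported counts. The short Python sketch below is illustrative only. The counts of 15 responding centers, 8 conducting routine screening, and 4 pilot-testing come from the text; the counts behind the 20% and 37.5% figures are not stated and are back-calculated here as an assumption.

```python
# Counts reported in the text for the 2005 NCCN member-institution survey.
responding = 15
routine_screening = 8
pilot_testing = 4


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


print(pct(routine_screening, responding))  # reported in the text as 53%
print(pct(pilot_testing, responding))      # reported in the text as 27%

# Assumption: counts implied by the reported percentages, not stated in the text.
screen_all = round(0.20 * responding)                # 20% of responding centers
interview_only = round(0.375 * routine_screening)    # 37.5% of screening centers
print(screen_all, interview_only)
```

Under these assumptions, both the 20% and the 37.5% figures correspond to three institutions, which shows how small the underlying sample is.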
The high levels of distress documented in many cancer patients may serve as
a call to action and spur future research and program development. Ethically, it
can be argued that the documented prevalence of distress and depression in
these patients can no longer be ignored. Recognition of distress as the sixth
vital sign in cancer care requires service providers to assess and treat this
problem—respecting it with the same importance as treatment of physical
illness. Given the high prevalence of distress, cancer must be considered a
biopsychosocial illness with emotional sequelae that often include accompa-
nying symptoms of depression and anxiety that can be treated.
It is the imperative of the treatment and research team to determine how to
most reliably and efficiently identify and treat those in need of such care. The
several efficacy studies to date that have directly assessed potential benefit to
patients of screening with feedback to the medical team have provided incon-
sistent results. It appears that screening alone is not sufficient to alleviate patient
problems; some form of training must be provided to the care team to stimulate
appropriate action to treat identified problems, and ideally the required psycho-
social services must be available for needy patients. Further research to deter-
mine the specifics of how to best act upon information provided from patient
screening to optimize patient outcomes is critically needed.

7. Acknowledgments
Dr. Linda Carlson is supported by the Enbridge Endowed Research Chair in
Psychosocial Oncology, funded by Enbridge Inc., the Canadian Cancer Society
Alberta/NWT Division, and the Alberta Cancer Foundation. This program of
research has been funded by the Public Health Agency of Canada, and the
Alberta Cancer Board Bridge and Pilot Funding and Research Initiatives
Programs.

References
1. Sontag S. Illness as metaphor and AIDS and its metaphors. New York: Picador USA,
2001.
2. Pirl WF. Evidence report on the occurrence, assessment, and treatment of depression in
cancer patients. J Natl Cancer Inst Monogr. 2004;32:32–39.
3. Massie MJ. Prevalence of depression in patients with cancer. J Natl Cancer Inst
Monogr. 2004;32:57–71.
4. Massie MJ, Popkin MK. Depressive disorders. In: Holland J, ed. Psycho-Oncology.
New York: Oxford University Press, 1998:518–540.
5. Sellick SM, Crooks DL. Depression and cancer: An appraisal of the literature for
prevalence, detection, and practice guideline development for psychological
interventions. Psychooncology. 1999;8:315–333.
6. Bottomley A. Depression in cancer patients: A literature review. Eur J Cancer Care
(Engl). 1998;7:181–191.
7. Bennett G, Badger TA. Depression in men with prostate cancer. Oncol Nurs Forum.
2005;32:545–556.
8. Boyd AD, Riba M. Depression and pancreatic cancer. J Natl Compr Canc Netw.
2007;5:113–116.
9. Potash M, Breitbart W. Affective disorders in advanced cancer. Hematol Oncol Clin
North Am. 2002;16:671–700.
10. Dejong M, Fombonne E. Depression in paediatric cancer: An overview.
Psychooncology. 2006;15:553–566.
11. Kua J. The prevalence of psychological and psychiatric sequelae of cancer in the
elderly—how much do we know? Ann Acad Med Singapore. 2005;34:250–256.
12. Trask PC. Assessment of depression in cancer patients. J Natl Cancer Inst Monogr.
2004;32:80–92.
13. Newport DJ, Nemeroff CB. Assessment and treatment of depression in the cancer
patient. J Psychosom Res. 1998;45:215–237.
14. Rodin G, Craven J, Littlefield C. Depression in the medically ill: an integrated
approach. New York: Brunner/Mazel, 1991.
15. Klinkman MS, Coyne JC, Gallo S, et al. Can case-finding instruments be used to improve
physician detection of depression in primary care? Arch Fam Med. 1997;6:567–573.
16. Mitchell AJ, Coyne JC. Do ultra-short screening instruments accurately detect
depression in primary care? A pooled analysis and meta-analysis of 22 studies. Br J
Gen Pract. 2007;57:144–151.
17. Razavi D, Delvaux N, Farvacques C, et al. Screening for adjustment disorders and
major depressive disorders in cancer in-patients. Br J Psychiatry. 1990;156:79–83.
18. Grassi L, Sabato S, Rossi E, Marmai L, Biancosino B. J Affect Disord. 2009
Apr;114(1-3):193–199.
19. Akizuki N, Akechi T, Nakanishi T, et al. Development of a brief screening interview for
adjustment disorders and major depression in patients with cancer. Cancer.
2003;97:2605–2613.
20. Love AW, Grabsch B, Clarke DM, et al. Screening for depression in women with
metastatic breast cancer: A comparison of the Beck Depression Inventory Short Form
and the Hospital Anxiety and Depression Scale. Aust N Z J Psychiatry. 2004;38:526–
531.
21. Hall A, A’Hern R, Fallowfield L. Are we using appropriate self-report questionnaires
for detecting anxiety and depression in women with early breast cancer? Eur J Cancer.
1999;35:79–85.
22. Berard RM, Boermeester F, Viljoen G. Depressive disorders in an out-patient oncology
setting: Prevalence, assessment, and management. Psychooncology. 1998;7:112–120.
23. Holland JC. How’s your distress? A simple intervention addressing the emotional
impact of cancer can help put the “care” back in caregiving. Oncology (Williston
Park). 2007;21:530.
24. Holland JC, Bultz BD, National Comprehensive Cancer Network (NCCN). The NCCN
guideline for distress management: A case for making distress the sixth vital sign. J Natl
Compr Canc Netw. 2007;5:3–7.
25. Bultz BD, Carlson LE. Emotional distress: The sixth vital sign—future directions in
cancer care. Psychooncology. 2006;15:93–95.
26. Bultz BD, Carlson LE. Emotional distress: The sixth vital sign in cancer care. J Clin
Oncol. 2005;23:6440–6441.
27. National Comprehensive Cancer Network, Inc. Practice guidelines in oncology—
v.1.2002: Distress management. National Comprehensive Cancer Network, Inc;
2002;version 1.
28. Derogatis LR. Brief Symptom Inventory 18: administration, scoring and procedures
manual. Minneapolis, MN: NCS Pearson Inc, 2001.
29. Zabora J, Brintzenhofe-Szoc K, Jacobsen P, et al. A new psychosocial screening
instrument for use with cancer patients. Psychosomatics. 2001;42:241–246.
30. Derogatis LR. SCL-90-R: administration, scoring and procedures manual-II, 2nd ed.
Baltimore, MD: Clinical Psychometric Research, 1983.
31. Derogatis LR. Brief Symptom Inventory: administration, scoring, and procedures
manual. National Computer Systems, Inc, 1993.
32. Zabora J, Brintzenhofe-Szoc K, Curbow B, et al. The prevalence of psychological
distress by cancer site. Psychooncology. 2001;10:19–28.
33. Carlson LE, Angen M, Cullum J, et al. High levels of untreated distress and fatigue in
cancer patients. Br J Cancer. 2004;90:2297–2304.
34. Mitchell AJ. Pooled results from 38 analyses of the accuracy of Distress Thermometer
and other ultra-short methods of detecting cancer-related mood disorders. J Clin Oncol.
2007;25:4670–4681.
35. Jacobsen PB, Donovan KA, Trask PC, et al. Screening for psychologic distress in
ambulatory cancer patients. Cancer. 2005;103:1494–1502.
36. Gil F, Grassi L, Travado L, et al, Southern European Psycho-Oncology Study Group.
Use of distress and depression thermometers to measure psychosocial morbidity among
Southern European cancer patients. Support Care Cancer. 2005;13:600–606.
37. Wright EP, Selby PJ, Crawford M, et al. Feasibility and compliance of automated
measurement of quality of life in oncology practice. J Clin Oncol. 2003;21:374–382.
38. Velikova G, Wright EP, Smith AB, et al. Automated collection of quality-of-life data: A
comparison of paper and computer touch-screen questionnaires. J Clin Oncol.
1999;17:998–1007.
39. Allenby A, Matthews J, Beresford J, et al. The application of computer touch-screen
technology in screening for psychosocial distress in an ambulatory oncology setting.
Eur J Cancer Care (Engl). 2002;11:245–253.
40. Detmar SB, Muller MJ, Wever LD. The patient-physician relationship. Patient-
physician communication during outpatient palliative treatment visits: An
observational study. JAMA. 2001;285:1351–1357.
41. Detmar SB, Aaronson NK. Quality of life assessment in daily clinical oncology
practice: A feasibility study. Eur J Cancer. 1998;34:1181–1186.
42. Strong V, Waters R, Hibberd C, et al. Emotional distress in cancer patients: The
Edinburgh cancer centre symptom study. Br J Cancer. 2007;96:868–874.
43. Taenzer PA, Speca M, Atkinson MJ, et al. Computerized quality of life screening in an
oncology clinic. Cancer Pract. 1997;5:168–175.
44. Carlson LE, Speca M, Hagen N, et al. Computerized quality-of-life screening in a
cancer pain clinic. J Palliat Care. 2001;17:46–52.
45. Linden W, Yi D, Barroetavena MC, et al. Development and validation of a psychosocial
screening instrument for cancer. Health Qual Life Outcomes. 2005;3:54.
46. Maunsell E, Brisson J, Deschenes L, et al. Randomized trial of a psychologic distress
screening program after breast cancer: Effects on quality of life. J Clin Oncol.
1996;14:2747–2755.
47. Taenzer P, Bultz BD, Carlson LE, et al. Impact of computerized quality of life screening
on physician behaviour and patient satisfaction in lung cancer outpatients.
Psychooncology. 2000;9:203–213.
48. Velikova G, Brown JM, Smith AB, et al. Computer-based quality of life questionnaires
may contribute to doctor-patient interactions in oncology. Br J Cancer. 2002;86:51–59.
49. McLachlan SA, Allenby A, Matthews J, et al. Randomized trial of coordinated
psychosocial interventions based on patient self-assessments versus standard care to
improve the psychosocial functioning of patients with cancer. J Clin Oncol.
2001;19:4117–4125.
50. Detmar SB, Muller MJ, Schornagel JH, et al. Health-related quality-of-life assessments
and patient-physician communication: A randomized controlled trial. JAMA.
2002;288:3027–3034.
51. Boyes A, Newell S, Girgis A, et al. Does routine assessment and real-time feedback
improve cancer patients’ psychosocial well-being? Eur J Cancer Care (Engl).
2006;15:163–171.
52. Rosenbloom SK, Victorson DE, Hahn EA, et al. Assessment is not enough: A
randomized controlled trial of the effects of HRQL assessment on quality of life and
satisfaction in oncology clinical practice. Psychooncology. 2007;16:1069–1079.
53. Jacobsen PB, Ransom S. Implementation of NCCN distress management guidelines by
member institutions. J Natl Compr Canc Netw. 2007;5:99–103.
54. Carlson LE, Bultz BD. Efficacy and medical cost offset of psychosocial interventions in
cancer care: Making the case for economic analyses. Psychooncology. 2004;13:837–
849.
55. Carlson LE, Bultz BD. Benefits of psychosocial oncology care: Improved quality of life
and medical cost offset. Health Qual Life Outcomes. 2003;1:8.
56. Hopwood P, Howell A, Maguire P. Screening for psychiatric morbidity in patients with
advanced breast cancer: Validation of two self-report questionnaires. Br J Cancer.
1991;64:353–356.
57. Ibbotson T, Maguire P, Selby P, et al. Screening for anxiety and depression in cancer
patients: The effects of disease and treatment. Eur J Cancer. 1994;30A:37–40.
58. Lloyd-Williams M, Friedman T, Rudd N. An analysis of the validity of the hospital
anxiety and depression scale as a screening tool in patients with advanced metastatic
cancer. J Pain Symptom Manage. 2001;22:990–996.
59. Patrick-Miller LJ, Broccoli TL, Much JK. Validation of the Distress Thermometer: A
single item screen to detect clinically significant psychological distress in ambulatory
oncology patients. J Clin Oncol. 2004;24:Abstr 6024.
60. Hoffman BM, Zevon MA, D’Arrigo MC, et al. Screening for distress in cancer patients:
The NCCN rapid-screening measure. Psychooncology. 2004;13:792–799.
61. Akizuki N, Yamawaki S, Akechi T, et al. Development of an impact thermometer for
use in combination with the Distress Thermometer as a brief screening tool for
adjustment disorders and/or major depression in cancer patients. J Pain Symptom
Manage. 2005;29:91–99.
62. Ransom S, Jacobsen PB, Booth-Jones M. Validation of the Distress Thermometer with
bone marrow transplant patients. Psychooncology. 2006;15:604–612.
63. Mehnert A, Muller D, Lehmann C. Die deutsche version des NCCN distress-
thermometers: Empirische Prufung eines screening-instruments zur erfassung
psychosozialer belastung bei krebspatienten. [in German with English translation by
author]. Zeitschrift fur Psychiatrie Psychologie und Psychotherapie. 2006;54:213–223.
64. Adams CA, Carter GL, Clover KA. Concurrent validity of the Distress Thermometer
with other validated measures of psychological distress. Psychooncology.
2006;15:s105.
65. Andritsch E, Ladinek V, Zlokikovits S. Identifying symptom burden and distress of
cancer patients with chemotherapy: A pilot study for an Austrian sample.
Psychooncology. 2006;15:s158.
66. Ohno T, Noguchi W, Nakayama Y, et al. How do we interpret the answer ‘‘neither’’
when physicians ask patients with cancer ‘‘are you depressed or not?’’ J Palliat Med.
2006;9:861–865.
67. Kumar TM, Venkateswaran C, Bostock N. Screening for psychosocial distress:
Crosscultural issues. Psychooncology. 2006;15:S692.
68. Ozalp E, Cankurtaran ES, Soygur H, et al. Screening for psychological distress in
Turkish cancer patients. Psychooncology. 2007;16:304–311.
69. Gessler SF, Lowe J, Daniells E. UK validation of the Distress Thermometer.
Psychooncology. 2006;15:s107.
70. Velikova G, Booth L, Smith AB, et al. Measuring quality of life in routine oncology
practice improves communication and patient well-being: A randomized controlled
trial. J Clin Oncol. 2004;22:714–724.
14
SCREENING FOR DEPRESSION IN PERINATAL SETTINGS

Jodi Barton and Philip Boyce

1. Introduction: Perinatal Screening in Context
2. Why Screen, and What Are We Screening For?
3. Screening Practices in Perinatal Settings
4. Screening Guidelines and Recommendations
5. Evidence-Based Comparison of Screening Methods
6. Implementation in Practice: Does Screening Make any Real-World Difference?
7. Service Delivery and Treatment Implications
8. Summary and Key Recommendations

Context
Implementing screening in perinatal settings poses a potentially complex set of
issues, but screening is nonetheless increasingly being recommended and even
mandated. When should screening occur—during pregnancy, postpartum, or
both? What instrument should be used? How acceptable is screening to
mothers? What difference does screening make to the management of post-
partum depression? This chapter presents an evidence-based approach to all
aspects of perinatal screening.

1. Introduction: Perinatal Screening in Context


Over the past 20 years there has been considerable interest in psychiatric
disorders arising during the course of pregnancy and following childbirth.
Most of the attention has been focused on depressive disorders arising within
the first 3 months to 1 year after childbirth, commonly referred to as postnatal
or postpartum depression. Pregnancy was once thought to be protective
against depressive symptoms; however, women are just as likely to experience
depressive symptoms while pregnant as they are during the postpartum
period.1,2 The mean prevalence of antenatal depression is between 10.7%3
and 12%,4 with increasing prevalence and severity2 through the second and
third trimesters. This is comparable with the 10% to 15% of women who
develop postpartum depression.5 Although DSM-IV formally recognizes
postpartum depression only through a postpartum specifier for those
episodes of major depression that have an onset within 4 weeks after
delivery, increasing knowledge of depression during the antenatal period
has made its early recognition and treatment equally important.
Whatever the specifier of postpartum depression in the DSM-IV, depres-
sion at this time has been granted considerable importance because of its
potential adverse impacts upon child development and maternal morbidity
and mortality;6 and because of the treatment challenges inherent in pregnant
and breastfeeding women.7 Even though the consequences of postpartum
depression have been recognized, the illness itself is frequently not identi-
fied; it has been estimated that between 50%8 and 75%9 of the women
suffering from postpartum depression will have it identified and potentially
treated. More recent work has focused attention on depression during the
course of pregnancy, so-called antenatal depression. However, the validity of
measuring depression during pregnancy and in the postpartum period is not
clear, especially the boundary between depressive symptoms and clinically
significant depressive disorder. The timing of onset of the disorder is also
important; it may more accurately represent a continuation of a depressive
episode that had commenced prior to conception.
The perinatal period is technically defined as the period between 154 days
(22 weeks) of gestation and 28 days postpartum.10 While the DSM-IV
definition of postpartum depression is an onset within 4 weeks of parturition,
symptoms of depression often develop much later within the first year of the
infant’s life. In practice, healthcare providers manage the entire antenatal period
and up to 1 year after delivery under a broader perinatal umbrella.
It is a time when women will have more contact with healthcare professionals
than any other time in adulthood. This is why it is considered an opportune
time to identify those at risk for developing depression (so that prevention
can take place) and to detect depression so that early intervention can be
instituted. This has encouraged the development of a variety of screening
strategies to identify risk and detect disorder that will be discussed in this
chapter.
2. Why Screen, and What Are We Screening For?


Screening for a disorder (or a marker for disorder, such as HbA1c for diabetes)
enables health practitioners to provide early intervention to reduce or eliminate
negative outcomes. Screening for depression during the perinatal period permits
both obstetric and mental health clinicians to identify women who are experiencing
depression or anxiety associated with childbearing, or to attempt to identify
women at risk of developing depression. Early intervention strategies can be
targeted directly at women who may be most in need of additional support,
thereby potentially ameliorating the negative effects that maternal depression
can have upon the development of the infant and the mother–infant relationship.
By targeting those who may need intervention, we can more effectively use the
physical and staffing resources available to clinical care providers.
There are two predominant approaches to screening in the perinatal period,
one aimed at detection of an occult disorder and the other to identify risk
factors for the disorder. Further predictive methods are under development to
determine who is at risk of future episodes of depression. We also need to
consider the timing of screening, which complicates the strategies chosen. The
variable approaches that have been taken and reported are as follows.

Screening for Depressive Symptoms in the Postpartum Period


Screening is usually conducted at routine postpartum checkups or at a ‘‘well-
baby’’ 6-week-postpartum health checkup using the Edinburgh Postnatal
Depression Scale (EPDS).11 Screening usually occurs at general practices,
pediatricians, or maternal and child healthcare centers. A cutoff score of 12
on the EPDS in the postpartum period is suggested to indicate that major
depression12 is likely to be present and is typically used to trigger further
assessment, referral, and treatment.13

Screening for Depressive Symptoms in the Antenatal Period


Screening occurs during pregnancy using questionnaires such as the EPDS, the
Beck Depression Inventory II (BDI-II), the Postpartum Depression Screening
Scale (PDSS), the Center for Epidemiological Studies Depression Scale (CES-
D),14 and the Prime MD-PHQ.15 However, none of these questionnaires have
been suitably validated for this purpose in this population, and cutoff scores
that accurately predict major or minor depression have not been adequately
established. Further, treatment options are limited, especially for major depres-
sion. Screening is usually conducted in conjunction with antenatal care visits at
clinics and in general practice.

Screening for Depressive Symptoms/Psychological Distress
During the Antenatal Period for Those at Risk of Developing
Postnatal Depression
There is little evidence to show that the robust prediction of depression in
the postpartum period can be based upon psychological distress during
pregnancy. Screening is usually conducted using the instruments and
methods listed above. Further, effective interventions to prevent the development
of postnatal depression have yet to be clearly identified, and
whether individual versus general risk aversion should be implemented
remains unclear.

Screening for Psychosocial Risk Factors During the Antenatal
Period for Risk of Developing Postnatal Depression
The Holy Grail of screening in perinatal psychiatry has been to develop
instruments that identify significant risk factors that will reliably predict
subsequent postpartum depression. A series of predictive tools have been
developed and significant risk factors for depression in the postpartum
period have been identified.16–18 While it seems reasonable to generalize
these findings to the antenatal period, appropriate investigation is still
needed. Some obstetric care models already screen for the presence of risk
factors such as domestic violence in routine care. This screening strategy
is similarly encumbered by less-than-adequate resources to routinely
follow up women considered to be at risk. The evaluation and validation
of instruments to screen for antenatal risk factors for postnatal depression
is an ongoing endeavor. The merit of screening for risk factors remains
questionable given that ‘‘most risk factors have poor discriminatory
power, or poor positive predictive value’’19 (p. 176). Even if there is a
strong association between a risk factor and potential disease outcome, it
does not automatically ‘‘follow that the risk factor provides a basis for an
effective prediction rule for individual patients’’20 (p. 2616). We need to
differentiate between the statistical risk of a population-determined risk
factor and the clinical risk that is pertinent to the current status of the
individual patient, as risk factors for depression are dynamic rather than
static.
It is not yet clear whether we are screening for current psychological
distress, major or minor depression, or the presence of risk factors that may
predict future depressive episodes. The objectives of screening in this clinical
population require clarification and strategic development before routine
screening, using instruments such as the EPDS, for depression becomes an
integral part of obstetric care.
3. Screening Practices in Perinatal Settings


Since the advent of routine perinatal depression screening, mixed evidence has
emerged about its utility: even where screening was implemented and treatment
offered to women detected as at risk, they often refused
treatment.9,21 Reasons for treatment refusal are discussed below. Attempts at
prediction of later risk have shown average sensitivity and specificity and do not
always capture the women who are most at risk, as they often do not participate
in the screening process and/or refuse subsequent intervention. The majority of
studies that have attempted to identify risk factors for perinatal depression have
been conducted in postnatal women.22 Many studies also do not take into
account racial and cultural variations that are likely to entail different levels of
risk and different risk factors. The generalizability of these findings to the
antenatal population and in varying cultures needs to be ensured by thorough
investigation. Cost-effectiveness not only of screening, but also of outcomes
needs to be ensured before wide area screening methods are implemented.23
Screening for depression in perinatal care often becomes the responsibility of
obstetric care providers such as midwives, maternal and child health nurses,24,25
general practitioners,26 and obstetricians. Such screening is in addition to the
other important health issues managed at busy antenatal clinics and needs to be
backed up with adequate training to improve clinicians’ skill base, confidence,
and subsequent willingness to implement routine screening. Ideally, mental
health services for childbearing women would be co-located with obstetric
services; however, this is rarely the case, particularly in busy public hospital
settings where time and space are premium assets. Mental health services are not
predominantly located in primary care facilities that are used by many women
for their obstetric care, and thus the onus remains with the primary care provider.
Practically, it is optimal for screening to be conducted by primary antenatal care
providers, with adequate mental health training and awareness, due to their
proximity to perinatal women during this time.
While this may be a practical approach to screening, introducing depres-
sion screening into already busy and demanding obstetric practices can be
problematic, especially in the antenatal setting. Routine screening can be used
where all women are screened at all their visits to their practitioner.
Alternatively, strategic screening can be implemented at specific visits (eg,
6 weeks postpartum). The optimal time to screen antenatally has not been
established; however, most investigators have screened late in the second
trimester or early in the third trimester. The EPDS, the BDI-II, and the PDSS
have shown greatest utility and predictive validity to date.8
An alternative strategy, recommended by the National Institute for Health and
Clinical Excellence guidelines,27 is the use of two or three simple targeted
interview questions aimed at identifying key DSM-IV diagnostic criteria,
namely, whether the woman has been bothered by feeling down, depressed, or
hopeless; whether she has been bothered by having little pleasure or interest in
normal activities; and whether she would like to receive further help. Because
individuals may endorse the first two questions without in fact being
subjectively bothered by the symptoms, the third question, whether the woman
would like further help with the way she has been feeling, has been suggested
(the ‘‘help question’’). The Patient Health Questionnaire (PHQ2) has been developed for this
purpose.28 By taking an approach such as this, not only is it simple, sensitive,29
and fast, but the clinician can conserve resources by not referring women who do
not, at that time, want or need further assistance from mental health services. The
PHQ2 screening strategy has sensitivity and specificity equivalent to the use of
the EPDS, both in the antenatal and postnatal period, and has high accuracy to
rule out women who are not at risk of being depressed; the negative predictive
value is between 97% and 99%.30 It is also appropriate to use with women who
have low levels of education, as it is not limited by literacy levels. Pregnant or
postnatal patients may also feel more attended to by having the clinician ask
about their well-being, rather than by having them complete a pen-and-paper
questionnaire.
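
The targeted-interview approach described above can be sketched as a small decision rule. This is an illustrative sketch only, not a validated protocol: the class name, field names, return strings, and the rule that endorsing either of the first two questions counts as a positive screen are all assumptions introduced here, since the text does not state a scoring rule.

```python
from dataclasses import dataclass


@dataclass
class TargetedInterview:
    """Answers to the three targeted questions (hypothetical structure)."""
    feeling_down: bool     # bothered by feeling down, depressed, or hopeless
    little_interest: bool  # bothered by little pleasure or interest in normal activities
    wants_help: bool       # the "help question"


def triage(answers: TargetedInterview) -> str:
    """Return a hypothetical next step for one interview.

    Assumption: endorsing either of the first two questions is treated as a
    positive screen; the help question then decides whether to refer now.
    """
    screen_positive = answers.feeling_down or answers.little_interest
    if not screen_positive:
        return "no further action"
    if answers.wants_help:
        return "offer further assessment"
    # Positive screen, but the woman does not want help at this time.
    return "re-ask at a later visit"


print(triage(TargetedInterview(True, False, True)))
```

Separating the help question from the symptom questions mirrors the point made in the text: referral resources are reserved for women who screen positive and also want assistance at that time.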
Studies to date show that while obstetric care providers recognize the
importance and impact of mental health problems, they also feel they lack
adequate knowledge about how to recognize and manage perinatal depression
and about where to refer women to for specialized psychiatric help. They often
feel screening is difficult to carry out in everyday practice and question
whether it leads to better outcomes.31 Practitioner education is a critical
element in the implementation of any screening program, as it will ensure
more accurate detection, confident independent practice, and potentially the
capacity to streamline referrals to psychiatric services.32

4. Screening Guidelines and Recommendations


The National Screening Committee (NSC) criteria appraise ‘‘the viability,
effectiveness and appropriateness of a screening programme.’’ Screening for
depression in the perinatal period has been evaluated against these existing
guidelines and significant deficits have been found33 (Textbox 14.1 and
Table 14.1). The current screening initiatives used do not meet the majority
of the criteria to warrant routine screening in national health services. Gaynes
and associates23 found similar deficits in the U.S. context and highlighted the
need for thorough research in this population. The existing evidence is just too
sparse to adequately inform clinicians and clinical policy decision makers
about the most appropriate screening methods to be used, whether screening
is cost-effective, and whether screening leads to better outcomes for perinatal
women and their families.

Textbox 14.1. Comparison of Perinatal Screening against National Screening Committee Criteria

✓ Clear evidence and criteria met
✗ Either no clear evidence or criteria not met

The Condition
✓ Important health problem
✓ Adequately understood and detected
✗ Cost-effective primary prevention available

The Test
✓ Validated screening instrument
✓ Known and agreed cutoff score
✓ Acceptability
✗ Agreed policy of diagnostic investigation and treatment options for positive screens

Treatment
✓ Evidence of effective early intervention
✗ Agreed policy on availability of effective treatment
✗ Optimal condition management prior to the implementation of screening

Screening program
✗ RCT evidence of reduction of morbidity/mortality
✓ Clinically, socially & ethically acceptable to health professionals and consumers
✗ Benefits outweigh risk of harm
✗ Cost-effectiveness & value of screening
✗ Quality assurance & monitoring
✗ Adequate staff & facilities
✗ Cost-effectiveness in comparison to existing management options
✗ Informed decision making for consumer
✗ Justifiable screening criteria & cutoffs for treatment eligibility

5. Evidence-Based Comparison of Screening Methods


The NSC criteria for screening state that a ‘‘screening test should be safe,
simple, precise and validated; a suitable cut-off value should be defined and
agreed’’ before any screening program is routinely implemented. Defining and
diagnosing a psychiatric disorder is not a simple process and not one aided by
definite measurable biomarkers. There is increased opportunity for subjective
bias and variable interpretation; this is the case whether questionnaires or
interviews are used to screen for depression (see Chapter 2). Further, the
treatment of depression during pregnancy and for breastfeeding mothers is not
simple; thus, better care may not necessarily follow better identification.

Table 14.1. Screening Guidelines and Recommendations for Best Practice by Country of Origin

National guideline: Evaluation of screening for postnatal depression against the NSC handbook criteria33
Date of release: August 2001
Country of origin: UK
Intention: To evaluate screening initiatives against current national guidelines.
Selected recommendations: Many national criteria not met, particularly with regard to cost effectiveness and outcomes from screening. Insufficient evidence to draw substantial conclusions, though concerns raised about national screening initiatives already implemented.

National guideline: Antenatal care: routine care for the healthy pregnant woman34
Date of release: October 2003
Country of origin: UK
Intention: To provide a national clinical framework for best practice in routine antenatal care. Covers all aspects of antenatal care, psychiatric assessment considered as singular element.
Selected recommendations: Women should be assessed and interviewed for a history of psychiatric disorder. Women should not be screened routinely with the Edinburgh Postnatal Depression Scale (EPDS) to predict risk of developing postnatal depression. Women should not be offered antenatal education interventions to reduce perinatal or postnatal depression.

National guideline: Postnatal depression and puerperal psychosis. A national clinical guideline35
Date of release: June 2002
Country of origin: Edinburgh, UK
Intention: To provide evidence for clinicians and health consumers about the screening for and prevention and management of postnatal depression.
Selected recommendations: There is no evidence to support routine screening in the antenatal period to predict the development of postnatal depression. The EPDS should be offered as part of a screening program for postnatal depression at 3 weeks and 6 months postpartum. The EPDS is not a diagnostic tool, diagnosis requiring clinical evaluation.

National guideline: U.S. Preventive Services Task Force36
Date of release: May 2002
Country of origin: US
Intention: To provide evidence for routine screening for depression (not specifically postnatal) in primary practice.
Selected recommendations: Recommends screening adults for depression in clinical practices that have systems in place to assure accurate diagnosis, effective treatment, and follow-up. Some evidence of cost-effectiveness provided.

National guideline: Senate Select Committee on Mental Health37
Date of release: April 2006
Country of origin: Australia
Intention: Report of wide-ranging inquiry into the national mental health strategy and objective achievement. Intended as recommendations for strategic reform.
Selected recommendations: That a national strategy for perinatal health services be developed, including early identification, intervention, prevention, and education and support of all new parents. Recommendations developed subsequent to submission of findings of ‘‘beyondblue’’ postnatal depression program.

Short depression screening questionnaires (with 10 items or fewer) have
become a popular method of screening for depression in the perinatal period.
A range of questionnaires have been tested. The most commonly used is the
EPDS, which was initially designed by Cox and Holden11 as a detection tool to
assist health visitors in assessing the mental health of new mothers during
home visits. Since then, screening for postpartum depression has gained sub-
stantial momentum and validation. The EPDS is short, easy to administer, and
easy to score, has reasonable predictive validity in the postnatal context, and
has good face validity with the consumer. The EPDS is not the only screening
instrument used, but it is the most widely used and the most widely
tested, providing the strongest data on its utility. Screening for depression
during pregnancy is a more recent initiative making convenient use of routine
obstetric care.
Any screening instrument used must have not only construct validity but
also face validity—that is, it must be acceptable to the population in which it is
to be used. This applies also to the use of interview questions to screen for
depressive symptoms or distress, as is advocated by the National Institute for
Clinical Excellence (NICE) guidelines in preference to screening questionnaires.
There is currently not enough evidence about the comparative validity
of interviewing versus questionnaire approaches to suggest the superiority of
one over the other—nor if, in fact, routine screening should even be conducted.
Reviews of screening instruments for postpartum depression found that the
EPDS, the BDI-II, and the PDSS38 have greater sensitivity and specificity in
the perinatal population than other measures that have been tested.8,23 The
benefit of the EPDS over the other two measures is its brevity: it has only 10
items, compared with 21 and 35 items on the BDI-II and PDSS respectively.
This makes completion and scoring easier. A methodologic problem in the
validation of many questionnaires is that they are not validated in the intended
population, nor against a gold-standard clinical interview. A summary of the
review studies and their findings is given in Table 14.2. Higher cutoff points
were usually used to detect major depression only, whereas lower scores were
used to detect possible major or minor depression. Lowering the cutoff
increases the number of false positives and reduces the specificity, or the
ability of the instrument to detect those who truly do not have depressive
symptoms. Clinicians would need to clarify their screening objectives to
decide whether a higher or lower cutoff best meets their needs; cutoff score
ranges are given in Table 14.2. The instrument of choice is best dictated by the
clinical population, and it would be ideal to choose an instrument that has been
adequately validated in that population, with particular regard given to the
appropriate cutoff score to be used in each unique culture.
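
The trade-off between higher and lower cutoffs can be made concrete with a short sketch. Everything in it is an assumption for illustration: the function name is hypothetical, the scores and diagnoses are invented and do not come from any study, and a screen is treated as positive when the score is at or above the cutoff.

```python
def sensitivity_specificity(scores, depressed, cutoff):
    """Sensitivity and specificity of 'score >= cutoff' against a reference diagnosis."""
    pairs = list(zip(scores, depressed))
    tp = sum(1 for s, d in pairs if s >= cutoff and d)      # depressed, screen positive
    fn = sum(1 for s, d in pairs if s < cutoff and d)       # depressed, screen negative
    fp = sum(1 for s, d in pairs if s >= cutoff and not d)  # not depressed, screen positive
    tn = sum(1 for s, d in pairs if s < cutoff and not d)   # not depressed, screen negative
    return tp / (tp + fn), tn / (tn + fp)


# Invented questionnaire totals and interview diagnoses, for illustration only.
scores = [4, 7, 9, 10, 11, 12, 13, 14, 16, 19]
depressed = [False, False, False, False, True, False, True, True, True, True]

for cutoff in (13, 10):
    sens, spec = sensitivity_specificity(scores, depressed, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# cutoff 13: sensitivity 0.80, specificity 1.00
# cutoff 10: sensitivity 1.00, specificity 0.60
```

In this toy sample, lowering the cutoff from 13 to 10 catches the one missed case but adds two false positives, which is the pattern described in the paragraph above.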
There have been recent initiatives to incorporate screening for psychosocial
risk factors for depression in the perinatal setting, in addition to screening for
depressive symptoms.39 Many health services already routinely screen for
known risk factors such as family violence and financial difficulties as part
of routine antenatal clinic intake interviews. While it is important to know what
risk factors are pertinent for a depressed woman and likely to be contributing to
her symptoms, we suggest caution in this additive approach as a means of
detecting women who may be depressed. Studies that have evaluated the utility
of psychosocial risk screening instruments have so far shown poor sensitivity17
or do not provide any evidence of their predictive value.40 Dichotomizing risk
factors for depression into a categorical yes or no, as can happen with the use of
risk factor screening strategies, may oversimplify the impact of risk factors on
psychological well-being. Risk factors are dimensional in nature and are
perhaps best considered on a continuum, such as number of significant life
events or adequacy of social support. There is also no evidence to indicate at
what point risk for depression becomes clinically significant: How many risk
factors need to be present? How severe do they need to be?

Table 14.2. Summary of Perinatal Depression Screening Instrument Sensitivity, Specificity,
Cutoff, and Positive Predictive Value (PPV) Ranges

Instrument  Time of screening              Depression    Cutoff  PPV range  Sensitivity  Specificity
                                           screened for  range   (%)*       range        range
EPDS        Antenatal, 28–34/40 weeks      Major         12–15   8–35       1.0          0.79–0.96
                                           Major/minor   11–14              0.57–0.71    0.72–0.95
            Postnatal, 4 days to 12 weeks  Major         10–13   19–92      0.75–1.0     0.7–0.99
                                           Major/minor   10–13              0.44–0.81    0.77–0.92
                                                         8–14               0.23–0.79    0.43–0.96
BDI         Postnatal                      Major         11–21   34–53      0.32–0.68    0.88–0.99
                                           Major/minor   10                 0.48         0.86
BDI-II      Postnatal                      Major         21      74–100     0.56         1.0
                                           Major/minor   15                 0.57         0.97
PDSS        Postnatal                      Major         81      33–88      0.94         0.98
CES-D       Postnatal                      Major/minor   16–21   53         0.6–0.43     0.92–0.97

*At 13% prevalence rate estimate.

CES-D, Center for Epidemiological Studies Depression Scale; EPDS, Edinburgh Postnatal Depression
Scale; BDI, Beck Depression Inventory; BDI-II, Beck Depression Inventory II; PDSS, Postpartum
Depression Screening Scale.
Data from references 8, 12, 22, 23, 41, 42.
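
Predictive values depend on prevalence as well as on sensitivity and specificity, which is why the table footnote fixes prevalence at 13%. The sketch below applies the standard Bayes relationship; the function name is hypothetical, and the example inputs are the PDSS sensitivity (0.94) and specificity (0.98) from Table 14.2. It is not claimed to be the method the cited reviews used to derive their PPV ranges.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a screening test at a given prevalence."""
    tp = sensitivity * prevalence              # true positives per woman screened
    fn = (1 - sensitivity) * prevalence        # false negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


ppv, npv = predictive_values(sensitivity=0.94, specificity=0.98, prevalence=0.13)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
# PPV = 88%, NPV = 99%
```

With these inputs the PPV comes out at about 88%, matching the upper end of the PPV range listed for the PDSS. At the same prevalence, a lower specificity reduces the PPV sharply because women without depression far outnumber those with it.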

6. Implementation in Practice: Does Screening Make any Real-World Difference?

Let us consider the case of a woman who is 28 weeks pregnant and scores 21 on
the EPDS, clearly indicating psychological distress, maybe even a depressive
episode. What then? Is there somewhere we can refer the patient? Are appro-
priate treatments available? Will the patient have expedient access to support
and treatment? Are treatment facilities adequately resourced and staffed by
appropriately trained personnel?
She is now referred for further assessment and perhaps treatment, but she
declines the services offered. Many women identified using the EPDS as
a screening tool for probable depression in the beyondblue depression
screening study declined follow-up care. This is not an uncommon finding
in both research and clinical care and indicates many women’s tendency to
mask their distress with stoicism in their endeavor to ‘‘stay strong’’ for
themselves and their baby/family, or to dismiss their distress and cope as
best they can. Some women also decline psychiatric care for fear of man-
datory reporting to social service agencies (where such protocols exist),
which is of particular concern for women with severe mental illness
(Textbox 14.2).

Textbox 14.2. Reasons for Refusal of Treatment for Perinatal Depressive Symptoms

• Lack of knowledge of condition and resources
• Cultural factors
• Somatized distress and help seeking for treatment of physical condition
• Denial
• Accepted as a normal part of being a mother
• Don’t want to be a burden
• Fear of loss of child through social services
• Lack time or willingness to attend appointments
• Health professionals normalizing/dismissing depressive symptoms
• Time constraints in primary care
Dennis CL, Chung-Lee L. Postpartum depression help-seeking barriers and maternal treatment
preferences: a qualitative systematic review. Birth. 2006;33:323–331.

Table 14.3 outlines the potential outcomes from screening against true
diagnosis of depression. The inherent inaccuracy of depression screening
leads to high numbers of false positives, which in turn leads to inefficient use
of available resources in both psychiatric and obstetric settings. The World
Mental Health Survey noted that ‘‘a meaningful number of services are going
to those without apparent needs. Such potential diversion of limited treatment
resources to individuals without apparent needs would be of concern in view
of the magnitude of unmet needs for patients with clearly defined and serious
disorders.’’43 Not only does this reflect the limitations of depression
screening strategies but it also calls into question the merit of screening,
especially when effective treatment options for pregnant or postpartum women
are not clearly defined.

Table 14.3. Possible Reasons for False Positives and Negatives in Screening

Positive screening outcome, depressed: TRUE POSITIVE
• Full diagnostic assessment
• Referral for treatment
• Potential for appropriate perinatal psychiatric care – if pathways to care are
  appropriately established and resourced.

Positive screening outcome, non-depressed: FALSE POSITIVE
• False referral
• Ineffective use of clinical resources
• Inappropriate labeling

Negative screening outcome, depressed: FALSE NEGATIVE
• Not offered follow-up assessment or treatment
• Fall through the gaps of clinical care
• Increased risk for mother and infant on social, emotional, and cognitive level
• May not be seeking help for psychological distress or masking symptoms (stoicism)

Negative screening outcome, non-depressed: TRUE NEGATIVE
• Not offered follow-up
• Clinical resources not required

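
The four outcomes in Table 14.3 can be made concrete with a short tally. The sketch below is illustrative only: the function name `screening_outcome` and the six example records are hypothetical and are not drawn from any study cited in this chapter.

```python
from collections import Counter


def screening_outcome(screen_positive: bool, depressed: bool) -> str:
    """Name the Table 14.3 cell for one woman.

    `depressed` is the result of the diagnostic assessment, which is
    treated here as the true diagnosis.
    """
    if screen_positive:
        return "true positive" if depressed else "false positive"
    return "false negative" if depressed else "true negative"


# Hypothetical (screen result, diagnosis) pairs, for illustration only.
records = [
    (True, True),
    (True, False),
    (True, False),
    (False, False),
    (False, True),
    (False, False),
]

counts = Counter(screening_outcome(s, d) for s, d in records)
for cell in ("true positive", "false positive", "false negative", "true negative"):
    print(f"{cell}: {counts[cell]}")
```

In this made-up sample two of the three positive screens are false positives, which is the pattern of resource use the paragraph above describes.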
Targeted multilevel screening is recommended to make the most efficient
use of the health resources available (Fig. 14.1). A multilevel strategy also
permits the detection of women with different health risk profiles44 and may
assist in the assessment of their unique clinical risk and management needs.

7. Service Delivery and Treatment Implications


Beyond the issue of screening and accurate detection of depressive symptoms
in the perinatal setting, according to the NSC screening program criteria there
must be evidence of effective early intervention, agreed-on policies on the
availability of effective treatment, and optimal condition management in place
before the implementation of screening.
The overall focus on prevention and early intervention has put great
emphasis on the perinatal period as a seemingly ideal time to provide
interventions to prevent postnatal depression. This is due to the high level of
contact that women have with their healthcare providers during the perinatal
period. Effective preventive strategies offered to women at high risk should
theoretically prevent the emergence and consequences of depression upon the
social, emotional, and cognitive development of the infant. Such interventions
to date have included psychoeducation about risk factors and symptoms,45
psychotherapy,21 interpersonal therapy,46 both individually and in group settings,
interventions such as increased community care, and interventions designed to
affect directly the attachment relationship between mother and infant.
Meta-analyses conclude that psychosocial interventions designed to prevent
postpartum depression do not reduce the number of women who go on to develop
depression,47 and although intensive, professional postpartum support is
effective in treating postpartum depression, there is no substantive evidence of the
cost-effectiveness of any of these interventions.48

Figure 14.1. Perinatal depression screening model.

ANTENATAL

Targeted Interview
• Have you noticed any change in your mood or the way you feel about things
  since you became pregnant? (Asked at each visit) [NO / YES]
If YES:
• Have you been depressed or anxious before?
• Is the way you are feeling causing you distress?
• Would you like further help with the way you are feeling? [NO / YES]
If YES:
Assessment
• Risk factor assessment
• Symptom screening using EPDS OR BDI-II OR PDSS
• EPDS < 10 (or other appropriate antenatal cut off score if alternate
  instrument used); negative for risk factors
• EPDS ≥ 10 and < 15 (or other appropriate antenatal cut off score if alternate
  instrument used); positive for risk factors
• EPDS ≥ 15 (or other appropriate antenatal cut off score if alternate
  instrument used); positive for risk factors
Monitoring
• Ongoing monitoring for change. Antenatal staff to repeat screening at
  subsequent antenatal visits.
• Educate about mental health maintenance strategies for new mothers
• Provide information on available resources
Referral
• Refer for diagnostic assessment, follow up and treatment, if appropriate,
  by psychiatry and or perinatal mental health personnel - as determined by
  available local resources

6 WEEKS POSTNATAL

Assessment
• Symptom screening using EPDS
EPDS < 12
• No intervention
• Educate about mental health maintenance strategies for new mothers
• Provide information on available resources
EPDS ≥ 12: Referral
• Refer for diagnostic assessment, follow up and treatment, if appropriate,
  by psychiatry and or perinatal mental health personnel - as determined by
  available resources

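
The EPDS score bands in the screening model of Figure 14.1 can be sketched as two small functions. This is a minimal illustration, not a clinical tool: the function names are hypothetical, the thresholds apply to the EPDS only (the figure allows other cutoffs when an alternate instrument is used), and the risk factor assessment and clinical judgment that accompany the antenatal bands are deliberately left out.

```python
EPDS_MAX = 30  # the EPDS has 10 items, each scored 0 to 3


def _validate(epds_score: int) -> None:
    """Reject totals that cannot occur on the EPDS."""
    if not 0 <= epds_score <= EPDS_MAX:
        raise ValueError(f"EPDS total must be between 0 and {EPDS_MAX}")


def antenatal_band(epds_score: int) -> str:
    """Return the antenatal score band from Figure 14.1 for an EPDS total."""
    _validate(epds_score)
    if epds_score >= 15:
        return "EPDS >= 15"
    if epds_score >= 10:
        return "EPDS >= 10 and < 15"
    return "EPDS < 10"


def postnatal_step(epds_score: int) -> str:
    """Return the 6-weeks-postnatal step from Figure 14.1 for an EPDS total."""
    _validate(epds_score)
    return "referral" if epds_score >= 12 else "education and information"


print(antenatal_band(12))  # EPDS >= 10 and < 15
print(postnatal_step(12))  # referral
```

The same total of 12 falls in the middle antenatal band but meets the postnatal referral threshold, which shows why the model keeps separate cutoffs for the two periods.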
While there is a significant need to identify and eliminate barriers to
treatment, we must also focus on providing effective and consumer-friendly
treatment that is readily available to those who need and choose to participate
in it. Perinatal psychiatry services need to provide evidence-based treatments
that are safe for both mother and baby. A combination of inpatient mother–
baby, outpatient, and outreach services that run in parallel with obstetric care
would be the optimal service model. There are few specialist perinatal psy-
chiatry facilities in public health settings, and pathways to such facilities are
not always clear. The facilities that are available vary depending upon service
models and available resources; thus, it is important for the obstetric care
provider to know what resources are available and how to expediently obtain
access to resources to help women who are depressed and ensure optimal
condition management.

8. Summary and Key Recommendations


Repeated antenatal screening that starts, following the NICE approach, with
two or three critical interview questions first helps the clinician
to detect whether there is a problem. Asking targeted interview questions at
each antenatal visit promotes communication and rapport between the mother
and her healthcare provider and permits monitoring over time. It also estab-
lishes whether in fact the woman even desires further assistance, at that time,
with any emotional distress she may be experiencing, thus conserving health
resources, time, and effort.
Secondary screening with a severity scale then permits the clinician to
gauge the severity of any symptoms the woman may be experiencing and her
unique risk factors for depression. Whether there is an optimal cutoff score is
yet to be resolved.23 Referral to appropriate services to diagnose and treat
mental illness will depend on the resources available; these vary due to
differing service models and available trained personnel and facilities.
However, a full diagnostic interview should then be conducted prior to the
formulation and implementation of management plans.
Postpartum screening is more straightforward. Screening at a well-baby
checkup, between 2 weeks and 6 months postpartum,8 is recommended with
the use of a questionnaire such as the EPDS. This should be followed up by a
clinical interview to confirm or refute the diagnosis of major depression.
Scores of 12 and over on the EPDS are predictive of symptoms of postpartum
depression severe enough to necessitate referral for diagnosis and/or treat-
ment.12,13,49,50 Psychotherapeutic and pharmacologic treatments are both
effective in the treatment of postpartum depression. As discussed earlier,
symptoms of postpartum depression often develop much later in the infant’s
first year of life than the DSM-IV-defined 4-week-postpartum period.
Clinicians need to be mindful of this and ask their patients about their emo-
tional or mental health since the birth of their baby each time they see them.
Staying vigilant and sensitive to women’s mental health status provides a
maximal opportunity for depression detection and treatment.

References
1. Dennis CL, Chung-Lee L. Postpartum depression help-seeking barriers and maternal
treatment preferences: a qualitative systematic review. Birth. 2006;33:323–331.
2. Evans J, Heron J, Francomb H, et al. Cohort study of depressed mood during pregnancy
and after childbirth. Br Med J. 2001;323:257–260.
3. Dennis CL, Ross LE, Grigoriadis S. Psychosocial and psychological interventions for
treating antenatal depression. Cochrane Database Syst Rev. 2007:CD006309.
4. Bennett H, Einarson A, Taddio A, et al. Prevalence of depression during pregnancy:
systematic review. Obstet Gynecol. 2004;103:698–709.
5. O’Hara MW, Swain AM. Rates and risk of postpartum depression—a meta-analysis. Int
Rev Psychiatry. 1996;8:37–54.
6. Oates M. Perinatal psychiatric disorders: a leading cause of maternal morbidity and
mortality. Br Med Bull. 2003;67:219–229.
7. Riecher-Rossler A, Hofecker FM. Postpartum depression: do we still need this
diagnostic term? Acta Psychiatr Scand Suppl. 2003;418:51–56.
8. Boyd RC, Le HN, Somberg R. Review of screening instruments for postpartum
depression. Arch Womens Ment Health. 2005;8:141–153.
9. Thio IM, Oakley Browne MA, Coverdale JH, et al. Postnatal depressive symptoms go
largely untreated: a probability study in urban New Zealand. Soc Psychiatry Psychiatr
Epidemiol. 2006;41:814–818.
10. Australian Institute of Health & Welfare. Perinatal Period. NPDD Committee, 2005.
11. Cox J, Holden J, Sagovsky R. Detection of postnatal depression: Development of the
10-item Edinburgh Postnatal Depression Scale. Br J Psychiatry. 1987;150:782–786.
12. Eberhard-Gran M, Eskild A, Tambs K, et al. Review of validation studies of the
Edinburgh Postnatal Depression Scale. Acta Psychiatr Scand. 2001;104:243–249.
13. Leverton TJ, Elliott SA. Is the EPDS a magic wand? 1. A comparison of the Edinburgh
Postnatal Depression Scale and health visitor report as predictors of diagnosis on the
Present State Examination. J Reprod Infant Psychol. 2000;18:279–296.
14. Radloff LS. The CES-D Scale: a self-report depression scale for research in the general
population. Appl Psychol Measurement. 1977;1.
15. Spitzer RL, Williams JB, Kroenke K, et al. Validity and utility of the PRIME-MD
patient health questionnaire in assessment of 3000 obstetric-gynecologic patients: the
PRIME-MD Patient Health Questionnaire Obstetric-Gynecology Study. Am J Obstet
Gynecol. 2000;183:759–769.
16. Appleby L, Gregoire A, Platz C, et al. Screening women for high risk of postnatal
depression. J Psychosom Res. 1994;38:539–545.
17. Austin MP, Hadzi-Pavlovic D, Saint K, et al. Antenatal screening for the prediction of
postnatal depression: validation of a psychosocial pregnancy risk questionnaire. Acta
Psychiatr Scand. 2005;112:310–317.
18. Cooper PJ, Murray L, Hooper R, et al. The development and validation of a predictive
index for postpartum depression. Psychol Med. 1996;26(3):627–634.
19. Rockhill B, Kawachi I, Colditz G. Individual risk prediction and population-wide
disease prevention. Epidemiol Rev. 2000;22:176–180.
20. Ware JH. Statistics and medicine: The limitations of risk factors as prognostic tools. N
Engl J Med. 2006;355:2615–2618.
21. Carter FA, Carter JD, Luty SE, et al. Screening and treatment for depression during
pregnancy: a cautionary note. Aust N Z J Psychiatry. 2005;39(4):255–261.
22. Austin MP, Lumley J. Antenatal screening for postnatal depression: a systematic
review. Acta Psychiatr Scand. 2003;107(1):10–17.
23. Gaynes BN, Gavin N, Meltzer-Brody S, et al. Perinatal depression: prevalence,
screening accuracy, and screening outcomes. Evidence Report: Technology
Assessment (Summary). 2005;119:1–8.
24. Buist A, Condon J, Brooks J, et al. Acceptability of routine screening for perinatal
depression. J Affect Disord. 2006;93:233–237.
25. Massoudi P, Wickberg B, Hwang P. Screening for postnatal depression in Swedish
child health care. Acta Paediatr. 2007;96:897–901.
26. Seehusen DA, Baldwin LM, Runkle GP, et al. Are family physicians appropriately
screening for postpartum depression? J Am Board Fam Pract. 2005;18:104–112.
27. National Institute for Health and Clinical Excellence. Antenatal and postnatal mental
health: Clinical management and service guidance. In: NICE Clinical Guideline.
London, 2007.
28. Kroenke K, Spitzer RL, Williams JB. The Patient Health Questionnaire-2: validity of a
two-item depression screener. Medical Care. 2003;41:1284–1292.
29. Whooley MA, Avins AL, Miranda J, et al. Case-finding instruments for depression.
Two questions are as good as many. J Gen Intern Med. 1997;12:439–445.
30. Bennett IM, Coco A, Coyne JC, et al. Can the burden of screening for depression in
pregnancy and postpartum be reduced? Efficiency of a two-question pre-screen: An
IMPLICIT network study. J Am Board Fam Med. 2008;21(4):317–325.
31. LaRocco-Cockburn A, Melville J, Bell M, et al. Depression screening attitudes and
practices among obstetrician-gynecologists. Obstet Gynecol. 2003;101:892–898.
32. Coleman VH, Morgan MA, Zinberg S, et al. Clinical approach to mental health issues
among obstetrician-gynecologists: A review. Obstet Gynecol Surv. 2006;61:51–58.
33. Shakespeare J. Evaluation of screening for postnatal depression against the NSC
handbook criteria. United Kingdom, 2001:1–21.
34. National Collaborating Centre for Women’s and Children’s Health. Antenatal care:
Routine care for the healthy pregnant woman. London, 2003:1–304.
35. Scottish Intercollegiate Guidelines Network. Postnatal depression and puerperal psychosis.
A national clinical guideline. Edinburgh, 2002.
36. Pignone MP, Gaynes BN, Rushton JL, et al. Screening for depression in adults: a
summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern
Med. 2002;136:765–776.
37. Senate Select Committee on Mental Health. A national approach to mental health—
from crisis to community. Canberra, Australia, 2006:1–33.
38. Beck CT, Gable RK. Postpartum Depression Screening Scale: Development and
psychometric testing. Nursing Res. 2000;49:272–282.
39. Matthey S, Phillips J, White T, et al. Routine psychosocial assessment of women in the
antenatal period: frequency of risk factors and implications for clinical services. Arch
Womens Ment Health. 2004;7:223–229.
40. Blackmore ER, Carroll J, Reid A, et al. The use of the Antenatal Psychosocial Health
Assessment (ALPHA) tool in the detection of psychosocial risk factors for postpartum
depression: a randomized controlled trial. J Obstet Gynaecol Can. 2006;28:873–878.
41. Adouard F, Glangeaud-Freudenthal NM, Golse B. Validation of the Edinburgh
Postnatal Depression Scale (EPDS) in a sample of women with high-risk pregnancies
in France. Arch Womens Ment Health. 2005;8:89–95.
42. Adewuya AO, Ola BA, Dada AO, et al. Validation of the Edinburgh Postnatal
Depression Scale as a screening tool for depression in late pregnancy among Nigerian
women. J Psychosom Obstet Gynaecol. 2006;27:267–272.
43. Wang PS, Aguilar-Gaxiola S, Alonso J, et al. Use of mental health services for anxiety,
mood, and substance disorders in 17 countries in the WHO world mental health surveys.
Lancet. 2007;370:841–850.
44. Harrington AR, Greene-Harrington CC. Healthy Start screens for depression among
urban pregnant, postpartum and interconceptional women. J Natl Med Assoc.
2007;99:226–231.
45. Lumley J, Austin MP. What interventions may reduce postpartum depression. Curr
Opin Obstet Gynecol. 2001;13:605–611.
46. Spinelli MG. Interpersonal psychotherapy for depressed antepartum women: a pilot
study. Am J Psychiatry. 1997;154:1028–1030.
47. Dennis CL. Psychosocial and psychological interventions for prevention of postnatal
depression: systematic review. BMJ. 2005;331:15.
48. Brugha TS, Wheatley S, Taub NA, et al. Pragmatic randomized trial of antenatal
intervention to prevent post-natal depression by reducing psychosocial risk factors.
Psychol Med. 2000;30:1273–1281.
49. Leverton TJ, Elliott SA. Is the EPDS a magic wand? 2. ‘Myths’ and the evidence base. J
Reprod Infant Psychol. 2000;18:297–307.
50. McQueen K, Montgomery P, Lappan-Gracon S, et al. Evidence-based
recommendations for depressive symptoms in postpartum women. J Obstet Gynecol
Neonatal Nurs. 2008;37:127–136.
15
SCREENING IN CARDIOVASCULAR CARE

Brett D. Thombs and Roy C. Ziegelstein

1. Depression in Cardiovascular Disease
2. The Prevalence of Depression in Cardiovascular Disease
3. Screening Instruments for Depression in Cardiovascular Care
4. Recommendations for Evaluation and Treatment of Patients in
Cardiovascular Care
5. Conclusions

Context
There is great interest in screening in cardiovascular settings but little evidence
that implementation of screening will affect depression or cardiac outcomes
despite the epidemiologic evidence that depression predicts cardiac events and
mortality. Since this chapter was accepted, in October 2008 the American Heart
Association (AHA) Working Group published a Scientific Advisory recom-
mending that all patients with cardiovascular disease be screened for depres-
sion, although this recommendation was not based on a systematic review of the
evidence. Several weeks after release of the Scientific Advisory, a systematic
review of depression screening in cardiovascular care was published but did not
find evidence that patients with cardiovascular disease would benefit from
screening for depression. The authors of the review noted that no published
trials have assessed whether screening for depression improves depressive
symptoms or cardiac outcomes in patients with cardiovascular disease, sug-
gesting that the recommendations of the AHA Scientific Advisory were
premature.

1. Depression in Cardiovascular Disease


High rates of depression were first documented among patients with cardio-
vascular disease (CVD) in the late 1960s. Early research on depression in CVD
focused on patients with acute myocardial infarction (AMI) and conceptua-
lized depression as an acute reaction to a catastrophic medical event.1–4 In the
1990s, groundbreaking work by Frasure-Smith and colleagues5,6 demonstrated
a connection between major depression during hospitalization for AMI and
subsequent mortality. Since then, many other studies have identified major
depression or depressive symptoms as risk factors for mortality and recurrent
cardiac events among patients with AMI or unstable angina pectoris (together
known as acute coronary syndromes [ACS]) even after controlling for other
known risk factors, although not all studies have reported a significant associa-
tion.7–10 Other studies have reported that depression among patients with ACS
is related to decreased quality of life11,12 and poor adherence to secondary
prevention behaviors, including smoking cessation, taking prescribed medica-
tions, exercising, and attending cardiac rehabilitation.13 Less research on the
relationship between depression and mortality has been done in other CVD
patient groups, although similar links have been reported in studies of patients
with congestive heart failure (CHF), for instance.14–17 Authors of systematic
reviews and meta-analyses have not all agreed that the evidence is sufficiently
robust to determine that depression is a risk factor for mortality in CVD above
and beyond other risk factors and cardiac disease severity, however, and some
have raised the issue of possible methodologic limitations in study designs,
including inadequate control for other risk factors and cardiac disease
severity.7–10 In addition, anxiety and self-reported quality of life, which
overlap substantially with depression, have also been shown to be important
predictors of outcomes among patients with CVD.18,19 Only one trial, the
ENhancing Recovery in Coronary Heart Disease (ENRICHD) trial, which
enrolled over 2,000 patients, has been designed to test whether treatment of
depression among post-AMI patients would reduce mortality risk. It did not
find that patients randomized into the cognitive–behavioral therapy (CBT)
treatment group fared better than patients in the usual-care control group in
terms of mortality,20 although secondary analyses indicated that patients who
received CBT and whose depression improved or patients who were treated
with sertraline due to severe depression or an initially poor response to CBT
exhibited lower mortality.21,22
The decision to screen for depression among patients in cardiovascular care,
however, should not depend on whether or not treatment of depression improves
cardiac outcomes or overall mortality. Depression is a chronic, disabling condi-
tion that has been shown to have a major impact on quality of life in CVD,23 even
after controlling for standard somatic measures, such as the degree of heart
failure or the severity of an index myocardial infarction.24,25 Indeed, for many
patients with CVD, quality of life is as important as survival.23
Screening is indicated if a disease or condition is an important health problem; if
its presence would not be readily detected without screening; if it is prevalent in the
population; if cost-efficient screening mechanisms with good performance char-
acteristics (eg, sensitivity and specificity) exist and are available; if effective
treatments are available; and if failure to identify and treat would have important
negative consequences. Ideally, screening methods should carry a minimal risk of
false-positive results that might lead to unnecessary diagnostic testing, adverse
effects and costs of inappropriate treatment, and the sequelae of being incorrectly
labeled.26–28 The American College of Cardiology/American Heart Association
(ACC/AHA) Guidelines for the Management of Patients with ST-Elevation
Myocardial Infarction (2004)29 designate as class I (ie, procedure or treatment is
useful/effective) the recommendation that ‘‘the psychosocial status of the patient
should be evaluated, including inquires regarding symptoms of depression, anxiety,
or sleep disorders and the social support environment’’ (p. e153), and the ACC/
AHA 2007 Guidelines for the Management of Patients with Unstable Angina/Non-
ST-Elevation Myocardial Infarction30 designate as class IIa (ie, recommendation in
favor of treatment or procedure being useful) the recommendation that ‘‘it is
reasonable to consider screening UA/NSTEMI patients for depression and refer/
treat when indicated’’ (p. e96). Neither recommendation, however, describes
procedures for assessing depression, and no guidelines recommend for or against
depression screening for patients with other cardiovascular disease diagnoses.
In most centers, screening for depression is not yet part of standard cardiac
care,31 and the merit of routinely screening every patient is still debated. The
objective of this chapter is to provide an overview of key issues related to the
implementation of depression screening as standard care. The chapter reviews
the prevalence of depression in cardiovascular care and available depression
screening tools and makes recommendations on how screening, treatment, and
follow-up programs may be best integrated into cardiovascular care.

2. The Prevalence of Depression in Cardiovascular Disease


Several questions related to the prevalence of depression in cardiovascular care
have a direct impact on the likelihood that screening can be implemented in a
cost-effective and efficient manner that produces beneficial results. For
instance, is depression sufficiently prevalent among patients with CVD to
warrant the time and cost involved in implementing a screening program?
Among which patients, and at what point in the disease process? Is depression
mostly a phenomenon related to a life-threatening event like an AMI? Will it
resolve on its own even without specific treatment?
Comorbid major depression is present in approximately 1 of 5 patients with
cardiovascular disease,1,32 which is a substantially higher rate than the estimated
5% prevalence in the general population33 or the 5% to 10% among patients in
primary care.34 A recent systematic review reported that rates of major depression
among patients hospitalized with AMI ranged from 16% to 27%.32 A similar
prevalence of depression (14% to 27%) was reported across a wider spectrum of
CVD, including hospitalized patients with AMI or unstable angina, outpatients
and inpatients with coronary artery disease, and patients after coronary artery
bypass graft surgery.1 Studies of inpatients and outpatients with CHF and of
patients with cardiomyopathy have reported similar depression prevalence rates
of 14% to 21%.14,15,35,36 These rates include only major depression, but minor
depression and subsyndromal symptoms of depression are also highly prevalent
among patients with CVD and have been associated with risk for future cardiac
events and mortality among post-AMI patients.37–39 The Beck Depression
Inventory (BDI)40 is the most commonly used assessment tool in studies of
depression in CVD, and based on a standard cutoff of 10 or greater, between
20% and 37% of hospitalized post-AMI patients have at least mild to moderate
symptoms of depression,32 consistent with rates reported among patients with
implantable cardioverter defibrillators (ICDs).41–43 Patients with CHF may have
even higher rates of depressive symptoms based on a BDI score of 10 or more
(30% to 51%), although their rates of major depression are similar.14,35,44,45
Minor depressive symptoms may occasionally be seen as a reaction to the acute
event, although the majority of patients who are depressed in the hospital continue
to be depressed months after discharge.32 Recent research has shown that the
trajectory of depressive symptoms over the course of time, rather than symptom
levels in the hospital following an AMI alone, may play a role in long-term health.
Patients who have high levels of depressive symptoms during hospitalization
following AMI, but whose symptoms resolve fairly rapidly, are not at greater risk
for negative health outcomes. Patients whose symptoms are persistent or increase
following discharge, on the other hand, tend to have worse outcomes.12,46,47
Thus, the evidence suggests that high rates of depression and/or subsyn-
dromal symptoms of depression are present among most CVD patient groups.
Levels of depressive symptoms change over time for individual patients, but,
overall, depression is not a transient phenomenon related to acute events.
Instead, depression and subsyndromal depressive symptoms tend to be
persistent.

3. Screening Instruments for Depression in Cardiovascular Care


Many potential screening instruments have been developed and tested in various
patient populations. A reasonable question is whether health professionals who
work in cardiovascular care need to select a screening tool that has been
validated specifically for cardiovascular care or whether one screening instru-
ment is as good as any other for use with CVD patients.
Indeed, many different depression screening instruments have been vali-
dated and tested against diagnostic criteria in primary care settings. A few of
the better-recognized assessment instruments include the BDI40 or its revised
version, the BDI-II,48 the Patient Health Questionnaire-9 (PHQ-9),49 the
Patient Health Questionnaire-2 (PHQ-2),50 the Center for Epidemiologic
Study Depression Scale (CES-D),51 and the General Health Questionnaire
(GHQ).52 Fewer depression screening tools have been specifically validated
against a ‘‘gold standard’’ structured diagnostic interview in cardiovascular
care.53 In primary care, however, there is little evidence to suggest that any
particular instrument performs better than other instruments. A systematic
review found that although there was inconsistency across studies that used
the same instrument, there were not systematic differences between instru-
ments, and that brief two- or three-item screening tools appeared to perform
as well as longer screening instruments for screening purposes.54 Median
sensitivity and specificity across 38 studies of 16 different case-finding
instruments with primary care patients were 85% and 74%, respectively,
which was only slightly better than similar values reported in a meta-analysis
of brief two- or three-item screeners (overall pooled sensitivity = 74%,
specificity = 75%),55 although this comparison is based on sets of studies
using different samples rather than head-to-head comparisons in the same
settings. Both brief and longer screening tools, however, tend to have rela-
tively high false-positive rates—approximately 50% when the prevalence of
depression is 20% and 60% to 70% when the prevalence is 10%.54,55 Thus,
positive screens must be confirmed by a diagnostic interview.56,57
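
The dependence of the false-positive share on prevalence can be written out explicitly. The worked equations below are an illustrative sketch: the symbols Se (sensitivity), Sp (specificity), and p (prevalence) are introduced here, and plugging in the median primary care values of 85% and 74% quoted above is an assumption made for the example, not the calculation used in the cited reviews.

```latex
\[
\frac{\mathrm{FP}}{\mathrm{TP}+\mathrm{FP}}
  = \frac{(1-\mathrm{Sp})(1-p)}{\mathrm{Se}\,p + (1-\mathrm{Sp})(1-p)}
\]
\[
p = 0.20:\quad
\frac{0.26 \times 0.80}{0.85 \times 0.20 + 0.26 \times 0.80}
  = \frac{0.208}{0.378} \approx 0.55
\]
\[
p = 0.10:\quad
\frac{0.26 \times 0.90}{0.85 \times 0.10 + 0.26 \times 0.90}
  = \frac{0.234}{0.319} \approx 0.73
\]
```

These values are broadly in line with the approximate figures quoted above and show why the proportion of positive screens that are false rises as prevalence falls.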
Table 15.1 shows instruments that have published data on diagnostic accu-
racy compared to a structured interview, such as the Structured Clinical
Interview for DSM,58,59 the Diagnostic Interview Schedule,60 or the
Composite International Diagnostic Interview,61 for major depression among
patients with CVD. Sensitivity refers to the proportion of patients with major
depression who had a positive screen, and specificity is the proportion without
major depression with negative screens. The positive predictive value (PPV) is
the proportion of patients with positive screens who were also diagnosed with
major depression based on a structured clinical interview, and the negative
predictive value (NPV) is the proportion of patients with negative screens who
did not receive a major depression diagnosis based on a structured clinical
interview (see Chapter 5 for further discussion). In Table 15.1, where PPV,
NPV, and/or 95% confidence intervals were not provided in the original
studies, they were estimated from available prevalence, sensitivity, and speci-
ficity data.
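
Estimating predictive values from prevalence, sensitivity, and specificity is an application of Bayes' theorem. The sketch below shows one way such an estimate could be computed; the function name `predictive_values` is hypothetical, confidence intervals are not reproduced, and this is not necessarily the exact procedure used to build Table 15.1.

```python
def predictive_values(prevalence: float, sensitivity: float, specificity: float):
    """Return (PPV, NPV) as proportions, given three proportions in [0, 1]."""
    for name, value in (
        ("prevalence", prevalence),
        ("sensitivity", sensitivity),
        ("specificity", specificity),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a proportion between 0 and 1")

    # Expected share of all screened patients in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    positives = true_pos + false_pos
    negatives = true_neg + false_neg
    # A predictive value is undefined when no screens fall in that row.
    ppv = true_pos / positives if positives else float("nan")
    npv = true_neg / negatives if negatives else float("nan")
    return ppv, npv


# Frasure-Smith 1995 row of Table 15.1:
# 15% depressed, sensitivity 82%, specificity 78%.
ppv, npv = predictive_values(0.15, 0.82, 0.78)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")  # PPV = 40%, NPV = 96%
```

The result matches the 40% and 96% reported for that row, and the same call with the Freedland 2003 values (20%, 88%, 58%) gives 34% and 95%.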
Table 15.1. Summary of Studies of Performance Characteristics of Depression Screening Tools in Cardiovascular Disease

| Study (Author, Year) | Ref. | Patient Group | Study Site | n | Mean Age (Years) | Males (%) | % Depressed | Instrument/Cutoff | Derivation of Cutoff | Sensitivity (%) (95% CI) | Specificity (%) (95% CI) | Positive Predictive Value (%) (95% CI) | Negative Predictive Value (%) (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Frasure-Smith, 1995 | 6, 63 | Post-AMI | Canada | 218 | 60 | 78 | 15% | BDI ≥ 10 | Standard | 82 (68–94) | 78 (71–83) | 40 (27–51) | 96 (93–99) |
| Gutierrez, 1999 | 81 | Outpatient CHF | Canada | 40 | 70 | 50 | 15% | BDI ≥ 13 | Standard | 83 (53–100) | 94 (86–100) | 71 (37–100) | 97 (89–100) |
| Strik, 2001 | 64 | Post-AMI | Netherlands | 206 | 60 | 76 | 11% | BDI ≥ 10 | ROC | 82 (66–98) | 79 (73–85) | 37 (21–45) | 98 (96–100) |
| | | | | | | | | HADS ≥ 13 | ROC | 90 (77–100) | 84 (79–90) | 45 (31–59) | 99 (96–100) |
| | | | | | | | | HADS-D ≥ 4 | ROC | 85 (70–100) | 75 (69–81) | 32 (21–43) | 98 (96–100) |
| | | | | | | | | SCL-90-D ≥ 25 | ROC | 96 (87–100) | 74 (68–80) | 37 (26–48) | 96 (93–99) |
| Freedland, 2003 | 35 | Hospitalized CHF | US | 613 | 66 | 49 | 20% | BDI ≥ 10 | Standard | 88 (81–93) | 58 (54–62) | 34 (28–38) | 95 (93–97) |
| Dickens, 2004 | 70 | Post-AMI | UK | 314 | 58 | 63 | 21% | HADS ≥ 17 | ROC | 88 (80–96) | 85 (80–89) | 60 (50–70) | 96 (94–99) |
| McManus, 2005 | 66 | CHD | US | 1,024 | 67 | 82 | 22% | CES-D-10 ≥ 10 | Standard | 76 (70–81) | 79 (76–82) | 50 (45–56) | 92 (90–94) |
| | | | | | | | | PHQ-9 ≥ 10 | Standard | 54 (47–61) | 90 (88–92) | 59 (53–67) | 87 (85–90) |
| | | | | | | | | PHQ-2 ≥ 3 | Standard | 39 (33–46) | 92 (90–94) | 58 (50–65) | 84 (82–87) |
| | | | | | | | | 2-item screen | Standard | 90 (86–94) | 69 (66–73) | 45 (40–49) | 96 (95–98) |
| Denollet, 2006 | 82 | Post-AMI | Netherlands | 176 | 60 | 76 | 11% | SAD4 ≥ 3 | Upper tertile | 95 (85–100) | 68 (60–74) | 28 (17–37) | 99 (97–100) |
| Huffman, 2006 | 69 | Post-AMI | US | 131 | 62 | 80 | 13% | 2 items from BDI | ROC | 94 (83–100) | 76 (68–84) | 37 (23–52) | 99 (97–100) |
| Low, 2007 | 65 | ACS | Canada | 119 | 63 | 75 | 6% | BDI-II ≥ 14 | Standard | 86 (59–100) | 89 (82–94) | 33 (11–55) | 99 (95–100) |
| | | | | | | | | GDS ≥ 11 | Standard | 100 | 85 (77–91) | 29 (11–47) | 100 |
| Stafford, 2007 | 78 | CAD | Australia | 193 | 64 | 81 | 18% | HADS-D ≥ 6 | ROC | 80 (69–91) | 82 (76–86) | 49 (38–60) | 95 (91–97) |
| | | | | | | | | PHQ-9 ≥ 6 | ROC | 83 (71–93) | 79 (73–83) | 46 (36–56) | 95 (92–98) |
| Frasure-Smith, 2008 | 18 | ACS | Canada | 804 | 60 | 81 | 7% | BDI-II ≥ 14 | Standard | 91 (84–98) | 78 (74–80) | 24 (17–29) | 99 (98–100) |
| | | | | | | | | HADS-A ≥ 8 | Standard | 84 (74–94) | 62 (58–65) | 14 (10–18) | 98 (97–99) |

ACS, acute coronary syndrome; AMI, acute myocardial infarction; BDI, Beck Depression Inventory; BDI-II, Beck Depression Inventory-II; CAD, coronary artery disease;
CES-D-10, 10-item version of the Center for Epidemiological Studies Depression Scale; CHD, coronary heart disease; CHF, congestive heart failure; DMI-10, Depression in
the Medically Ill 10-item measure; DMI-18, Depression in the Medically Ill 18-item measure; GDS, Geriatric Depression Scale; HADS, Hospital Anxiety and Depression
Scale, total score; HADS-A, Anxiety Subscale of the Hospital Anxiety and Depression Scale; HADS-D, Depression Subscale of the Hospital Anxiety and Depression Scale;
PHQ-2, Patient Health Questionnaire-2; PHQ-9, Patient Health Questionnaire-9; ROC, receiver operator curve analysis; SAD4, Symptoms of Anxiety-Depression index;
SCL-90-D, Depression Subscale of the Symptom Checklist 90.

As shown in Table 15.1, some studies used receiver operating characteristic
(ROC) curve analysis62 to derive cutoff scores in an exploratory fashion and
other studies used established cutoff scores based on published results from
studies with other patient groups or guidelines from screening tool developers.
Overall, consistent with reviews of screening in primary care,54,55 there were
few major differences in sensitivity or specificity, and the rate of false positives
was high across studies. With established cutoff scores, the
BDI,6,35,63,64 BDI-II,18,65 and Geriatric Depression
Scale65 generally performed reasonably well. Use of the standard cutoff
score of 10 or above on the BDI produced good sensitivity and specificity to
diagnose major depression post-AMI.6,63,64 However, this cutoff resulted in
poor specificity in a sample of 613 hospitalized heart failure patients.35 The use
of cutoff thresholds developed for primary care patients also resulted in poor
sensitivity with the PHQ-2 (3 or more) and PHQ-9 (10 or more) in a study by
McManus and colleagues.66 Results from that study, however, were consistent
with findings reported by Stafford and associates78 that the PHQ-9 was more
accurate when a lower cutoff level of 6 or greater was used.
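The link between sensitivity, specificity, prevalence, and the predictive values reported in Table 15.1 can be sketched in a few lines. This is a minimal illustration, not the method used by the cited studies: the function name `predictive_values` is hypothetical, and the inputs are the rounded figures from two BDI and BDI-II rows of the table, so small rounding differences from published values are possible.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) as proportions; all inputs are proportions in [0, 1]."""
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1 - sensitivity)
    true_neg = (1 - prevalence) * specificity
    false_pos = (1 - prevalence) * (1 - specificity)
    ppv = true_pos / (true_pos + false_pos)   # share of screen-positives who are depressed
    npv = true_neg / (true_neg + false_neg)   # share of screen-negatives who are not
    return ppv, npv


# Rounded inputs taken from Table 15.1 (assumed to be adequate approximations).
ppv_1995, npv_1995 = predictive_values(0.82, 0.78, 0.15)  # BDI >= 10, 15% depressed
ppv_2008, npv_2008 = predictive_values(0.91, 0.78, 0.07)  # BDI-II >= 14, 7% depressed

print(f"1995 sample: PPV {ppv_1995:.0%}, NPV {npv_1995:.0%}")
print(f"2008 sample: PPV {ppv_2008:.0%}, NPV {npv_2008:.0%}")
```

With similar specificity, the lower prevalence in the second sample pulls the positive predictive value down, which is one reason false positives are common when depression is relatively infrequent in the screened group.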
In studies that used ROC curve analysis, the same patient data were used to
set cutoff levels and to test the accuracy of those very same cutoff levels. This
is important because ROC curve analysis involves the generation of a list or
menu of all sensitivity and specificity combinations across the range of pos-
sible cutoff scores, from which researchers identify the combination that, in
their judgment, maximizes diagnostic utility. Like any exploratory data ana-
lysis technique, however, ROC curve analysis capitalizes on chance and often
overemphasizes idiosyncratic characteristics of a given set of patients or
particularities of the diagnostic process in a given study. Thus, cutoffs derived
from ROC curve analysis may not generalize well to other samples, and cross-
validation is necessary before cutoffs can be accepted as useful for prac-
tice.67,68 This is particularly the case with small samples, and of the studies
in Table 15.1 that used ROC curve analysis, diagnostic characteristics are
based on between 17 (reference 69) and 65 (reference 70) patients with major depression.
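The concern about deriving and testing a cutoff on the same patients can be illustrated with a small sketch. Everything here is an assumption made for illustration: the scores are invented rather than taken from any study in Table 15.1, the function names are hypothetical, and the selection rule (maximizing sensitivity plus specificity minus 1, often called Youden's J) is only one of several rules a researcher might apply when judging diagnostic utility.

```python
def roc_menu(scores, depressed, cutoffs):
    """List (cutoff, sensitivity, specificity) for the rule 'score >= cutoff'."""
    n_pos = sum(depressed)
    n_neg = len(depressed) - n_pos
    menu = []
    for cutoff in cutoffs:
        true_pos = sum(1 for s, d in zip(scores, depressed) if d and s >= cutoff)
        true_neg = sum(1 for s, d in zip(scores, depressed) if not d and s < cutoff)
        menu.append((cutoff, true_pos / n_pos, true_neg / n_neg))
    return menu


def pick_cutoff(menu):
    """Choose the cutoff with the largest sensitivity + specificity - 1 (assumed rule)."""
    return max(menu, key=lambda row: row[1] + row[2] - 1)[0]


# Hypothetical questionnaire scores; 1 = major depression on interview, 0 = none.
derivation_scores = [3, 5, 8, 9, 12, 14, 15, 18]
derivation_depressed = [0, 0, 0, 0, 1, 0, 1, 1]
validation_scores = [4, 7, 11, 12, 9, 16, 13, 6]
validation_depressed = [0, 0, 0, 0, 1, 1, 1, 0]

cutoffs = [6, 10, 13, 15]
chosen = pick_cutoff(roc_menu(derivation_scores, derivation_depressed, cutoffs))

# Evaluate the chosen cutoff on the sample that produced it and on a fresh sample.
_, sens_dev, spec_dev = roc_menu(derivation_scores, derivation_depressed, [chosen])[0]
_, sens_val, spec_val = roc_menu(validation_scores, validation_depressed, [chosen])[0]

print(f"chosen cutoff: {chosen}")
print(f"derivation sample: sensitivity {sens_dev:.2f}, specificity {spec_dev:.2f}")
print(f"validation sample: sensitivity {sens_val:.2f}, specificity {spec_val:.2f}")
```

In this toy example the cutoff looks better in the derivation sample than in the second sample, which is the pattern that cross-validation is meant to expose.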
The need for cross-validation of derived cutoff scores is illustrated by the
large discrepancy in cutoffs for the total score of the Hospital Anxiety and
Depression Scale (HADS) in studies by Dickens70 and Strik64 and their
coworkers. The two studies obtained sensitivity and specificity values that
were approximately equal, but Dickens and colleagues found a HADS score of
17 or above to be the most accurate, whereas Strik and associates used a score
of 13 or greater. The HADS depression subscale (HADS-D) has been used
more frequently than the total HADS in studies of post-AMI depression. A
concern with the HADS-D, however, is that based on the weighted prevalence
of identified possible or probable cases across studies, it identifies a much
lower rate than the actual rate of major depression found in CVD patients
(HADS-D of 8 or more, 15.5%; HADS-D of 11 or more, 7.3%), whereas a BDI
score of 10 or above, for instance, identifies a greater proportion of patients
when used as a screening tool (31.1%).32 Use of instruments like the HADS
that inquire only about nonsomatic symptoms has been justified based on
claims that other screening tools that inquire about a full range of symptoms
(eg, BDI, PHQ-9) are likely to be biased in CVD patients due to the overlap
between somatic symptoms of depression and those of CVD itself. These
alternative approaches, however, have been based on face validity rather than on
empirical evidence that existing methods are biased or that alternative
approaches increase accuracy.71 Furthermore, across cultures, the majority of
primary care patients with depression present primarily with somatic symp-
toms,72 and depression treatment affects somatic and nonsomatic symptoms
similarly in patients with and without chronic medical illness.73 We recently
examined responses on the BDI from a sample of hospitalized post-AMI
patients compared to a matched sample of psychiatric outpatients using rig-
orous techniques for detecting potential bias74 due to possible somatic
symptom over-endorsement, and did not find that total BDI scores from the
post-AMI patients were affected by somatic symptom endorsement any more
than the total scores from non-medical psychiatric outpatients (submitted for
review). One possible explanation for this finding may relate to the overt, as
opposed to covert, nature of assessment of depressive symptoms, which has
been shown to influence responses to self-report questionnaires.75 Hospitalized
post-AMI patients who are tired or not eating well, for instance, may not
endorse these symptoms because they are aware that they are being asked
about depression and may attribute these symptoms to the cardiac event or the
hospitalization itself, although this has not been demonstrated.
Summarizing the information in Table 15.1, many screening tools have
been used in cardiovascular care, although few have been shown to achieve
good sensitivity and specificity (using a standard of 80%, for example) in more
than one sample of CVD patients using the same cutoff threshold. Cutoff scores
of 10 or above on the BDI and 14 or above on the BDI-II are reasonably
sensitive and specific, although these cutoffs on the BDI are not as specific
among patients with CHF. None of the tools and cutoffs tested are convin-
cingly superior to any others, and more research is needed with larger samples
from multiple centers before we can be comfortable that published cutoffs for
other available instruments will work efficiently in cardiovascular care.
Given the lack of evidence of consistently good performance by any single
instrument across multiple samples or a clear performance advantage of one
instrument over others, other considerations, such as an instrument’s brevity,
readability, and comprehensibility, should be considered. Healthcare workers
in cardiovascular care settings have limited time with each patient to focus on
his or her emotional health. In addition, CVD patients, particularly those in
acute care, may have difficulty with some instrument formats. Some screening
tools, such as the BDI, are long and include response options that vary across
items, increasing complexity for patients and staff. Instruments that require
simple yes-or-no responses or estimates of symptom frequency based on
numeric ratings or visual-analogue scales may be easier to administer to
patients or for patients to complete independently.54 The PHQ-9 (reference 49) is a
nine-item patient-completed measure of depression symptoms that replicates the
symptoms included in the DSM-IV; a score of 10 or above has been shown to
be highly sensitive (88%) and specific (88%) for detecting DSM-IV-defined
depression among primary care patients. The PHQ-2 (reference 50) is an even briefer,
two-item measure that is also sensitive (83%) and specific (92%) for major
depression in primary care. Research concerning the accuracy of the PHQ-9 in
identifying ICD-10-based depression is limited, although it performed better
than two other measures in a study of medical outpatients.76
Recently, a National Heart, Lung, and Blood Institute (NHLBI) working
group report made recommendations for research purposes on the assessment
and treatment of depression in patients with CVD. The report recommended a
screening algorithm that included administering the PHQ-2 followed by the
PHQ-9 if one or both of the items on the PHQ-2 are positive for depression.77
Although a cutoff threshold of 10 or greater is used in primary care, this cutoff
was not sensitive among patients in cardiovascular care in one study,66 and a
lower threshold of 6 worked well in another study.78 Thus, until accurate
cutoffs are verified for patients with CVD, a potential strategy would be to
follow the NHLBI recommendations using the lower threshold (6 or more) on
the PHQ-9 (Fig. 15.1).
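The two-step sequence described above (PHQ-2 first, then the PHQ-9 if one or both PHQ-2 items are positive, with a PHQ-9 threshold of 6 or more) can be sketched as follows. This is an illustration under stated assumptions rather than an official algorithm: the function name `two_step_screen` and the returned labels are hypothetical, and each PHQ-2 item is represented simply as positive or negative, since how an item is judged positive is not specified here.

```python
def two_step_screen(phq2_items_positive, phq9_total=None, phq9_cutoff=6):
    """Return the outcome of a PHQ-2 then PHQ-9 screening sequence.

    phq2_items_positive: pair of booleans, one per PHQ-2 item (assumed coding).
    phq9_total: PHQ-9 total score (0 to 27), or None if not yet administered.
    phq9_cutoff: threshold for a positive PHQ-9; 6 is the lower threshold
                 discussed in the text, 10 is the primary care threshold.
    """
    if len(phq2_items_positive) != 2:
        raise ValueError("the PHQ-2 has exactly two items")
    if not any(phq2_items_positive):
        return "PHQ-2 negative"
    if phq9_total is None:
        return "PHQ-2 positive: administer PHQ-9"
    if not 0 <= phq9_total <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    if phq9_total >= phq9_cutoff:
        return "PHQ-9 positive"
    return "PHQ-9 negative"


print(two_step_screen((False, False)))
print(two_step_screen((True, False)))
print(two_step_screen((True, False), phq9_total=7))
print(two_step_screen((True, True), phq9_total=7, phq9_cutoff=10))
```

Keeping the threshold as a parameter makes it easy to compare the lower cutoff with the primary care cutoff of 10 until cutoffs are verified for patients with CVD.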

4. Recommendations for Evaluation and Treatment of Patients in Cardiovascular Care
Practical Recommendations
Screening for depression in primary care is recommended by the U.S.
Preventive Services Task Force (USPSTF) when systems are in place to
ensure accurate diagnosis, effective treatment, and follow-up.56 Many patients
with depression can be successfully managed by their primary care provider.
Most primary care providers have, or should have, experience treating patients
with many forms of depression, but the degree to which cardiologists are
comfortable with, and experienced in, the care of patients with mood disorders
is generally more variable. The triage and treatment of patients with cardiac
disease and comorbid depression therefore must be individualized in every
instance. Psychiatric or psychological consultation (or advice) should be
considered when (1) depression is suspected or diagnosed, (2) none of the
patient’s "front-line" care providers are able to manage the condition, and (3)
the patient wishes to receive this form of specialist help. In addition,
psychiatric consultation should be considered when diagnostic uncertainty, a history of
mania or psychosis, substance abuse, or suicide risk is present.79,80

[Figure 15.1 is a flowchart; its steps are listed here as text.]
1. PHQ-2 positive? If yes, go to step 2.
2. PHQ-9 positive? If yes, go to step 3.
3. Clinical interview positive? If yes, go to step 4.
   A "No" at steps 1 to 3 is shown leading to ongoing assessment/care.
4. Severe or complex symptoms (severe symptoms, manic symptoms, psychosis,
   suicide risk, substance abuse)? If yes, refer for psychiatric evaluation and
   treatment. If no, go to step 5.
5. Informed patient preference and management in cardiovascular care clinic:
   • Cognitive-Behavioral Therapy (CBT)
   • Psychopharmacology
   • Combined CBT and Psychopharmacology
   • Watchful Waiting/Self-Help
   If the chosen strategy includes CBT, refer to a CBT provider.
6. Ongoing follow-up in cardiovascular care clinic:
   • Symptom Monitoring
   • Assessment of Effectiveness of Management Strategy
   • Re-evaluation of Management Strategy

Figure 15.1. Recommended decision process for screening for depression in
cardiovascular care. Recommended screening, treatment, and follow-up decisions and
strategies are presented. In addition to strategies presented in the figure, health-promoting
practices of benefit to most cardiovascular care patients, such as maximizing social support
and healthy lifestyle choices, such as regular exercise, should be emphasized. These
recommendations may be of particular benefit to patients with minor depression who may
be able to make lifestyle changes that improve mood. Patients with severe depression, on the
other hand, are unlikely to be able to make lifestyle changes without depression treatment,
which should be prioritized.

Barriers to Implementation
Consistent with the USPSTF guidelines for primary care, we recommend that
screening only be considered in cardiovascular care settings when personnel and
resources are available to ensure appropriate diagnosis, treatment, and follow-
up.56 We recognize that personnel who are adequately trained and experienced in
diagnosing and treating depression may not be available in many cardiovascular
care settings, and that specific mental health resources may not be readily avail-
able either. We also recognize that even if cardiovascular care providers are
adequately trained and experienced in diagnosing and treating depression, in a
busy cardiology practice attention is typically focused on issues that are consid-
ered more central to cardiovascular care. Given these realities, some may argue
that it is reasonable to administer a PHQ-9 and to base treatment on the results,
since a score of 10 or more on the PHQ-9 has approximately 90% specificity for
major depression.80 Based on the sensitivity (54%) and specificity (90%) figures
reported by McManus and associates66 for CVD patients and assuming a rate of
major depression of 20%, however, more than 40% of patients (about 43%) treated for
depression if this protocol is followed would be treated inappropriately.
Evidence does not suggest that treating patients with subsyndromal depressive
symptoms with selective serotonin reuptake inhibitors (SSRIs) is helpful, so this
strategy could expose many patients to potential harm without established benefit.
Although the harms of treatment in non-cases are not well documented, potential
negative ramifications include the cost of treatment, side effects of drugs, drug–
drug interactions, and the potentially adverse effects of being incorrectly labeled.56
Given that time constraints are likely to be a formidable barrier to screening for
many cardiologists, alternative strategies, such as using trained nursing or social
work personnel to assist with assessment, may be considered. When insufficient
resources are available to provide accurate diagnostic, treatment, and follow-up
services, either in the cardiovascular care setting or through referrals, however,
screening is not likely to benefit patients and may actually have negative effects.
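The estimate of inappropriate treatment discussed in this section, based on a PHQ-9 score of 10 or more, can be worked through as expected counts per 1,000 screened patients. This is a sketch under the inputs stated in the text (sensitivity 54%, specificity 90%, and an assumed 20% rate of major depression); the function name `screen_positive_breakdown` is hypothetical, and exact percentages depend on the inputs assumed.

```python
def screen_positive_breakdown(n_screened, prevalence, sensitivity, specificity):
    """Return (true positives, false positives, false-positive share of positives)."""
    depressed = n_screened * prevalence
    not_depressed = n_screened - depressed
    true_pos = depressed * sensitivity
    false_pos = not_depressed * (1 - specificity)
    false_share = false_pos / (true_pos + false_pos)
    return true_pos, false_pos, false_share


true_pos, false_pos, false_share = screen_positive_breakdown(
    n_screened=1000, prevalence=0.20, sensitivity=0.54, specificity=0.90
)

print(f"true positives per 1,000 screened:  {true_pos:.0f}")
print(f"false positives per 1,000 screened: {false_pos:.0f}")
print(f"share of screen-positive patients without major depression: {false_share:.0%}")
```

Under these inputs, roughly 108 of 1,000 screened patients would be true positives and about 80 would be false positives, so around 43% of those treated on the basis of the score alone would not have major depression.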

5. Conclusions
In summary, there is no evidence from research with primary care or CVD
patients that any single screening tool works consistently better than any other
screening tool. Without evidence of superiority for any instrument,
considerations such as brevity, user-friendliness, and match to current DSM-IV
criteria suggest that the PHQ instruments are a reasonable choice for clinical
screening. Future research with large patient samples from multiple centers
should be done to verify the best cutoffs for cardiovascular care. Consistent
with USPSTF guidelines for primary care, screening for depression may be
considered in cardiovascular care settings where resources are available to
provide accurate diagnosis, treatment, and follow-up services.

References
1. Rudisch B, Nemeroff CB. Epidemiology of comorbid coronary artery disease and
depression. Biol Psychiatry. 2003;54:227–240.
2. Hackett TP, Cassem NH, Wishnie HA. The coronary-care unit. An appraisal of its
psychologic hazards. N Engl J Med. 1968;279:1365–1370.
3. Cassem NH, Hackett TP. Psychiatric consultation in a coronary care unit. Ann Intern
Med. 1971;75:9–14.
4. Dreyfuss F, Dasberg H, Assael MI. The relationship of myocardial infarction to
depressive illness. Psychother Psychosom. 1969;17:73–81.
5. Frasure-Smith N, Lesperance F, Talajic M. Depression following myocardial
infarction. Impact on 6-month survival. JAMA. 1993;270:1819–1825.
6. Frasure-Smith N, Lesperance F, Talajic M. Depression and 18-month prognosis after
myocardial infarction. Circulation. 1995;91:999–1005.
7. van Melle JP, de Jonge P, Spijkerman TA, et al. Prognostic association of depression
following myocardial infarction with mortality and cardiovascular events: A meta-
analysis. Psychosom Med. 2004;66:814–822.
8. Barth J, Schumacher M, Herrmann-Lingen C. Depression as a risk factor for mortality in
patients with coronary heart disease: A meta-analysis. Psychosom Med. 2004;66:802–813.
9. Sorensen C, Friis-Hasche E, Haghfelt T, et al. Postmyocardial infarction mortality in
relation to depression: A systematic critical review. Psychother Psychosom.
2005;74:69–80.
10. Nicholson A, Kuper H, Hemingway H. Depression as an aetiologic and prognostic
factor in coronary heart disease: A meta-analysis of 6362 events among 146,538
participants in 54 observational studies. Eur Heart J. 2006;27:2763–2774.
11. Parashar S, Rumsfeld JS, Spertus JA, et al. Time course of depression and outcome of
myocardial infarction. Arch Intern Med. 2006;166:2035–2043.
12. Thombs BD, Ziegelstein RC, Stewart DE, et al. Usefulness of persistent symptoms of
depression to predict physical health status 12 months after an acute coronary
syndrome. Am J Cardiol. 2008;101:15–19.
13. Kronish IM, Rieckmann N, Halm EA, et al. Persistent depression affects adherence to
secondary prevention behaviors after acute coronary syndromes. J Gen Intern Med.
2006;21:1178–1183.
14. Jiang W, Alexander J, Christopher E, et al. Relationship of depression to increased risk
of mortality and rehospitalization in patients with congestive heart failure. Arch Intern
Med. 2001;161:1849–1856.
15. Faris R, Purcell H, Henein MY, et al. Clinical depression is common and significantly
associated with reduced survival in patients with non-ischaemic heart failure. Eur J
Heart Fail. 2002;4:541–551.
16. Friedmann E, Thomas SA, Liu F, et al. Relationship of depression, anxiety, and social
isolation to chronic heart failure outpatient mortality. Am Heart J. 2006;152:940.e1–940.e8.
17. Jiang W, Kuchibhatla M, Cuffe MS, et al. Prognostic value of anxiety and depression in
patients with chronic heart failure. Circulation. 2004;110:3452–3456.
18. Frasure-Smith N, Lesperance F. Depression and anxiety as predictors of 2-year cardiac
events in patients with stable coronary artery disease. Arch Gen Psychiatry.
2008;65:62–71.
19. Faller H, Stork S, Schowalter M, et al. Is health-related quality of life an independent
predictor of survival in patients with chronic heart failure? J Psychosom Res.
2007;63:533–538.
20. Berkman LF, Blumenthal J, Burg M, et al. Effects of treating depression and low
perceived social support on clinical events after myocardial infarction: The
Enhancing Recovery in Coronary Heart Disease Patients (ENRICHD) randomized
trial. JAMA. 2003;289:3106–3116.
21. Taylor CB, Youngblood ME, Catellier D, et al. Effects of antidepressant medication on
morbidity and mortality in depressed patients after myocardial infarction. Arch Gen
Psychiatry. 2005;62:792–798.
22. Carney RM, Blumenthal JA, Freedland KE, et al. Depression and late mortality after
myocardial infarction in the Enhancing Recovery in Coronary Heart Disease
(ENRICHD) study. Psychosom Med. 2004;66:466–474.
23. Rumsfeld JS, Ho PM. Depression and cardiovascular disease: A call for recognition.
Circulation. 2005;111:250–253.
24. Muller-Tasch T, Peters-Klimm F, Schellberg D, et al. Depression is a major
determinant of quality of life in patients with chronic systolic heart failure in general
practice. J Card Fail. 2007;13:818–824.
25. Dickens CM, McGowan L, Percival C, et al. Contribution of depression and anxiety to
impaired health-related quality of life following first myocardial infarction. Br J Psychiatry.
2006;189:367–372.
26. Wilson JM, Jungner G. Principles and practices of screening for disease. Geneva:
World Health Organization, 1968.
27. Magruder KM, Norquist GS, Feil MB, et al. Who comes to a voluntary depression
screening program? Am J Psychiatry. 1995;152:1615–1622.
28. Greenfield SF, Reizes JM, Magruder KM, et al. Effectiveness of community-based
screening for depression. Am J Psychiatry. 1997;154:1391–1397.
29. Antman EM, Anbe DT, Armstrong PW, et al. ACC/AHA guidelines for the
management of patients with ST-elevation myocardial infarction: A report of the
American College of Cardiology/American Heart Association Task Force on Practice
Guidelines (committee to revise the 1999 guidelines for the management of patients
with acute myocardial infarction). J Am Coll Cardiol. 2004;44:E1-E211.
30. Anderson JL, Adams CD, Antman EM, et al. ACC/AHA 2007 guidelines for the
management of patients with unstable angina/non-ST-elevation myocardial
infarction: A report of the American College of Cardiology/American Heart
Association Task Force on Practice Guidelines (writing committee to revise the 2002
guidelines for the management of patients with unstable Angina/Non-ST-elevation
myocardial infarction) developed in collaboration with the American College of
Emergency Physicians, the Society for Cardiovascular Angiography and
Interventions, and the Society of Thoracic Surgeons endorsed by the American
Association of Cardiovascular and Pulmonary Rehabilitation and the Society for
Academic Emergency Medicine. J Am Coll Cardiol. 2007;50:e1–e157.
31. Ziegelstein RC, Kim SY, Kao D, et al. Can doctors and nurses recognize depression in
patients hospitalized with an acute myocardial infarction in the absence of formal
screening? Psychosom Med. 2005;67:393–397.
32. Thombs BD, Bass EB, Ford DE, et al. Prevalence of depression in survivors of acute
myocardial infarction. J Gen Intern Med. 2006;21:30–38.
33. Blazer DG, Kessler RC, McGonagle KA, et al. The prevalence and distribution of major
depression in a national community sample: The National Comorbidity Survey. Am J
Psychiatry. 1994;151:979–986.
34. Pignone MP, Gaynes BN, Rushton JL, et al. Screening for depression in adults:
A summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern
Med. 2002;136:765–776.
35. Freedland KE, Rich MW, Skala JA, et al. Prevalence of depression in hospitalized
patients with congestive heart failure. Psychosom Med. 2003;65:119–128.
36. Poole NA, Morgan JF. Validity and reliability of the Hospital Anxiety and Depression
Scale in a hypertrophic cardiomyopathy clinic: The HADS in a cardiomyopathy
population. Gen Hosp Psychiatry. 2006;28:55–58.
37. Bush DE, Ziegelstein RC, Tayback M, et al. Even minimal symptoms of depression
increase mortality risk after acute myocardial infarction. Am J Cardiol. 2001;88:337–341.
38. Lesperance F, Frasure-Smith N, Juneau M, et al. Depression and 1-year prognosis in
unstable angina. Arch Intern Med. 2000;160:1354–1360.
39. Frasure-Smith N, Lesperance F, Juneau M, et al. Gender, depression, and one-year
prognosis after myocardial infarction. Psychosom Med. 1999;61:26–37.
40. Beck AT, Steer RA. Manual for the revised Beck Depression Inventory. San Antonio,
TX: Psychological Corporation, 1987.
41. Luyster FS, Hughes JW, Waechter D, et al. Resource loss predicts depression and
anxiety among patients treated with an implantable cardioverter defibrillator.
Psychosom Med. 2006;68:794–800.
42. Friedmann E, Thomas SA, Inguito P, et al. Quality of life and psychological status of
patients with implantable cardioverter defibrillators. J Interv Card Electrophysiol.
2006;17:65–72.
43. Simson U, Perings C, Plaskuda A, et al. Impact of attachment style, social support and
the number of implantable cardioverter defibrillator (ICD) discharges on psychological
strain of ICD patients. Psychother Psychosom Med Psychol. 2006;56:493–499.
44. Gottlieb SS, Khatta M, Friedmann E, et al. The influence of age, gender, and race on the
prevalence of depression in heart failure patients. J Am Coll Cardiol. 2004;43:1542–1549.
45. Jiang W, Kuchibhatla M, Clary GL, et al. Relationship between depressive symptoms
and long-term mortality in patients with heart failure. Am Heart J. 2007;154:102–108.
46. de Jonge P, van den Brink RH, Spijkerman TA, et al. Only incident depressive episodes
after myocardial infarction are associated with new cardiovascular events. J Am Coll
Cardiol. 2006;48:2204–2208.
47. Kaptein KI, de Jonge P, van den Brink RH, et al. Course of depressive symptoms after
myocardial infarction and cardiac prognosis: A latent class analysis. Psychosom Med.
2006;68:662–668.
48. Beck AT, Steer RA, Brown GK. Manual for the Beck Depression Inventory-II. San
Antonio, TX: Psychological Corporation, 1996.
49. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: Validity of a brief depression
severity measure. J Gen Intern Med. 2001;16:606–613.
50. Kroenke K, Spitzer RL, Williams JB. The Patient Health Questionnaire-2: Validity of a
two-item depression screener. Med Care. 2003;41:1284–1292.
51. Radloff LS. The CES-D scale: A self-report depression scale for research in the general
population. Applied Psychological Measurement. 1977;1:385–401.
52. Goldberg DP, Gater R, Sartorius N, et al. The validity of two versions of the GHQ in the
WHO study of mental illness in general health care. Psychol Med. 1997;27:191–197.
53. Thombs BD, Magyar-Russell G, Bass EB, et al. Performance characteristics of
depression screening instruments in survivors of acute myocardial infarction: Review
of the evidence. Psychosomatics. 2007;48:185–194.
54. Williams JW Jr, Pignone M, Ramirez G, et al. Identifying depression in primary care: A
literature synthesis of case-finding instruments. Gen Hosp Psychiatry. 2002;24:225–237.
55. Mitchell AJ, Coyne JC. Do ultra-short screening instruments accurately detect
depression in primary care? A pooled analysis and meta-analysis of 22 studies. Br J
Gen Pract. 2007;57:144–151.
56. U.S. Preventive Services Task Force. Screening for depression: Recommendations and
rationale. Ann Intern Med. 2002;136:760–764.
57. MacMillan HL, Patterson CJ, Wathen CN, et al. Screening for depression in primary
care: Recommendation statement from the Canadian Task Force on Preventive Health
Care. CMAJ. 2005;172:33–35.
58. Spitzer R, Williams J, Gibbons M. Structured clinical interview for DSM-III-R-patient
version. New York: Biometrics Research Department, New York State Psychiatric
Institute, 1988.
59. First MB, Spitzer RL, Gibbon M, et al. Structured clinical interview for DSM-IV Axis I
disorders. New York: Biometrics Research Unit, New York Psychiatric Institute, 1995.
60. Robins LN, Helzer JE, Croughan J, et al. National Institute of Mental Health Diagnostic
Interview Schedule. Its history, characteristics, and validity. Arch Gen Psychiatry.
1981;38:381–389.
61. Wittchen HU. Reliability and validity studies of the WHO—Composite International
Diagnostic Interview (CIDI): A critical review. J Psychiatr Res. 1994;28:57–84.
62. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating
characteristic (ROC) curve. Radiology. 1982;143:29–36.
63. Frasure-Smith N, Lesperance F, Talajic M. Depression after myocardial infarction:
Response. Circulation. 1998;97:707–708.
64. Strik JJ, Honig A, Lousberg R, et al. Sensitivity and specificity of observer and self-
report questionnaires in major and minor depression following myocardial infarction.
Psychosomatics. 2001;42:423–428.
65. Low GD, Hubley AM. Screening for depression after cardiac events using the Beck
Depression Inventory-II and the Geriatric Depression Scale. Soc Indic Res.
2007;82:527–543.
66. McManus D, Pipkin SS, Whooley MA. Screening for depression in patients with
coronary heart disease (data from the Heart and Soul Study). Am J Cardiol.
2005;96:1076–1081.
67. Charlson ME, Ales KL, Simon R, et al. Why predictive indexes perform less well in
validation studies. Is it magic or methods? Arch Intern Med. 1987;147:2155–2161.
68. Dawes RM, Faust D, Meehl PE. Clinical versus actuarial judgment. Science.
1989;243:1668–1674.
69. Huffman JC, Smith FA, Blais MA, et al. Rapid screening for major depression in post-
myocardial infarction patients: An investigation using Beck Depression Inventory II
items. Heart. 2006;92:1656–1660.
70. Dickens CM, Percival C, McGowan L, et al. The risk factors for depression in first
myocardial infarction patients. Psychol Med. 2004;34:1083–1092.
71. Simon GE, Von Korff M. Medical co-morbidity and validity of DSM-IV depression
criteria. Psychol Med. 2006;36:27–36.
72. Simon GE, VonKorff M, Piccinelli M, et al. An international study of the relation
between somatic symptoms and depression. N Engl J Med. 1999;341:1329–1335.
73. Simon GE, Von Korff M, Lin E. Clinical and functional outcomes of depression
treatment in patients with and without chronic medical illness. Psychol Med.
2005;35:271–279.
74. Jones RN. Identification of measurement differences between English and Spanish
language versions of the mini-mental state examination. Detecting differential item
functioning using MIMIC modeling. Med Care. 2006;44:S124–133.
75. Hunt M, Auriemma J, Cashaw AC. Self-report bias and underreporting of depression on
the BDI-II. J Pers Assess. 2003;80:26–30.
76. Lowe B, Grafe K, Zipfel S, et al. Diagnosing ICD-10 depressive episodes: Superior
criterion validity of the Patient Health Questionnaire. Psychother Psychosom.
2004;73:386–390.
77. Davidson KW, Kupfer DJ, Bigger JT, et al. Assessment and treatment of depression in
patients with cardiovascular disease: National Heart, Lung, and Blood Institute working
group report. Psychosom Med. 2006;68:645–650.
78. Stafford L, Berk M, Jackson HJ. Validity of the Hospital Anxiety and Depression Scale
and Patient Health Questionnaire-9 to screen for depression in patients with coronary
artery disease. Gen Hosp Psychiatry. 2007;29:417–424.
79. Fancher T, Kravitz R. In the clinic. Depression. Ann Intern Med. 2007;146:ITC5–1-ITC5–16.
80. Whooley MA. Depression and cardiovascular disease: Healing the broken-hearted.
JAMA. 2006;295:2874–2881.
81. Gutierrez RC. Assessing depression in patients with congestive heart failure. Can
J Cardiovasc Nurs. 1999;10:29–36.
82. Denollet J, Strik JJ, Lousberg R, et al. Recognizing increased risk of depressive
comorbidity after myocardial infarction: Looking for 4 symptoms of anxiety-
depression. Psychother Psychosom. 2006;75:346–352.
16
SCREENING IN DIABETES CARE: DETECTING
AND MANAGING DEPRESSION IN DIABETES

Norbert Hermanns and Bernhard Kulzer

1. Depression in Diabetes is a Major Health Problem
2. Screening Tests
3. Treatment Options
4. Screening Program
5. Conclusions for Clinical Practice

Context
The analysis of depression screening in diabetes according to the four criteria
of the United Kingdom’s National Screening Committee shows that both
screening tests and treatment options are available. However, results of the
Cochrane meta-analysis about depression screening in primary care settings
indicate that the implementation of depression screening needs a structured
approach to link these two components. A stepped-care approach comprising
verification of positive screening results, treatment options, assessment of
response to treatment, and adaptation may carry favorable results with
regard to reduction of depression as well as cost-effectiveness.

The association between diabetes and distress has long been recognized. In 1685
Thomas Willis, a British physician, suggested that diabetes might be a consequence
of prolonged sorrow.1 In the middle of the 20th century Alexander2 regarded
diabetes as one of the seven major psychosomatic diseases. In more recent years
these historical observations have been supported by growing empirical evidence of
a special relationship between emotional distress and diabetes. A meta-analysis
regarding depression and diabetes onset showed that the presence of depressive
symptoms increased the risk of developing diabetes by 37%.3 However, the effect is
bidirectional.4,5 Meta-analytic findings suggest that the comorbidity of depression
and diabetes is frequent: approximately one third of diabetic patients report symp-
toms of depression, and a smaller group of 10% of diabetic patients meet the criteria
of a clinical depression.6 In diabetes care settings the recognition rate of depression
in diabetic patients is disappointingly low, ranging between 20% and 50%.7 Even in
more specialized diabetes care settings approximately 50% of depressed diabetic
patients remain undetected.8–10 Thus, there are strong and compelling arguments in
favor of depression screening in diabetes, and this is also recommended by several
guidelines for diabetes care (Fig. 16.1).
However, there are also arguments against depression screening. Studies
analyzing the effectiveness of depression screening in primary care settings do
not all support large-scale implementation of depression screening.11 Increasingly,
there is a need to justify depression screening in different medical conditions with
regard to its effectiveness and ethical and clinical implications and to specify
whether screening as a routine or more selective case-finding is warranted.12
Screening for depression potentially exposes both false positives and true positives
(but otherwise unrecognized cases) to stigmatization and potential discrimination
by health insurance companies or employers. Thus, the potential benefits of
screening for a specific condition have to be balanced against its disadvantages.
The U.K. National Screening Committee specified criteria for screening
that should help to ensure that any screening program does more good than
harm.13 It established criteria pertaining to the condition (it should be a major
health problem), the screening tests (they should have sufficient screening
performance), the treatment options (they should be available for those
detected), and the screening program (it should be of proven benefit). This
chapter will analyze the rationale for depression screening in diabetic patients
according to these broad criteria.

Figure 16.1. Detection rates of depression in diabetic patients. [Stacked bar chart; y-axis: detection rate from 0% to 100%, split into detection and no detection; bars for Rubin (7), Pouwer (9), Hermanns (8), and Katon (10), with two further bars for subthreshold depression. Detection ranges from about 22% to 56% across the bars.]

1. Depression in Diabetes is a Major Health Problem


The relevance of depression in diabetes can be demonstrated with regard to the
frequency of depression in diabetes and its impact on the prognosis, quality of
life, and healthcare costs of diabetic patients.

Prevalence of Depression in Diabetes


Depression is a frequent comorbid condition in diabetes. A meta-analysis
demonstrated that 31.0% of diabetic patients described themselves as having elevated
depressive symptoms, compared with 14% of nondiabetic subjects. Depression
based on the diagnosis of mental health specialists occurred in 11.4%, compared
with 5.0% of nondiabetics. Minor and subsyndromal depressions are about twice
as common as major depression in diabetes.6
Out of 100 unselected diabetic patients, approximately 10 to 12 meet the
diagnostic criteria for clinical depression and a further 20 suffer from mild or
subthreshold depression.

Prognostic Relevance of Depression


The comorbidity of depression and diabetes must be taken seriously because of
the implications for the prognosis and quality of life of affected diabetic
patients.14–17 There is evidence that depression might impair effective
diabetes self-care. Diabetic patients with a higher depression score showed
higher rates of nonadherence to oral antidiabetic medication, less exercise,
more unhealthy diet, and less glucose monitoring.18,19
A meta-analysis found a significant association between depression and
glycemic control; subanalysis showed this relationship was even stronger if
the only patients who were analyzed were those who fully met the diagnostic
criteria for depression.17 Depression in people with diabetes is also a risk
factor for the occurrence of late complications and functional disability.
A prospective study with 7 years of follow-up demonstrated that the hazard
ratio for macrovascular complication is more than three times higher if
depressive symptoms were reported at baseline.20 The hazard ratios for
microvascular complications and functional disability were 8.6 and 6.9 if
minor depression was present. There was only a small difference between
mild and more severe depression with regard to the risk of late
complications.21,22
An epidemiologic analysis of the NHANES II study also revealed that
depression is a risk factor for enhanced mortality in diabetic patients: depressed
diabetic patients had a mortality rate 54% higher than nondepressed diabetic
patients.23 Katon and colleagues24 found a relative risk for mortality of 1.67 in
diabetic patients with minor depression and a hazard ratio of 2.67 in diabetic
patients with major depression.
In summary, there seems to be no safe threshold for depression, as even mild
depressive symptoms seem to have a negative impact on the prognosis.

Depression and Quality of Life


Diabetes care guidelines define an optimal quality of life as one of the primary
objectives of diabetes therapy. Depression in diabetes not only has adverse
somatic consequences but also impairs quality of life in diabetic patients
(Table 16.1). According to an Australian survey, depression in diabetic patients
was associated with poorer quality of life in all eight quality-of-life dimensions
(physical functioning, role limitations due to physical health, bodily pain,
general health, vitality, social functioning, role limitations due to emotional
health, and mental health).25

Table 16.1. Quality of Life and Depression in Diabetes.

Limitation                                                     No major      Major         Diabetes      Diabetes      P value
                                                               depression    depression    only (%)      and major
                                                               and no        (%)                         depression
                                                               diabetes (%)                              (%)
Difficulty walking 12 city blocks                              10.9          26.7          39.0          60.2          <0.0001
Difficulty climbing 10 steps                                   8.1           20.1          30.7          51.7          <0.0001
Difficulty standing on feet for 2 h                            13.2          30.1          40.0          61.6          <0.0001
Difficulty sitting for 2 h                                     7.3           23.9          17.1          35.9          <0.0001
Difficulty stooping, bending, or kneeling                      15.9          35.3          44.0          59.7          <0.0001
Difficulty reaching over head                                  5.5           18.0          17.0          32.0          <0.0001
Difficulty grasping small objects                              5.4           14.4          18.9          30.9          <0.0001
Difficulty lifting 10 pounds                                   7.1           19.5          25.4          49.5          <0.0001
Difficulty pushing or pulling heavy objects                    10.2          26.2          32.2          55.5          <0.0001
Difficulty shopping                                            5.1           17.3          20.5          39.8          <0.0001
Difficulty visiting friends                                    4.0           16.8          15.6          34.7          <0.0001
Difficulty watching television or listening to music to relax  2.1           12.2          7.6           22.3          <0.0001
Overall functional disability                                  24.5          51.3          58.1          77.8          <0.0001

Egede LE. Effects of depression on work loss and disability bed days in individuals with diabetes.
Diabetes Care. 2004;27:1751–1753.

Clinical depression and depressive symptoms (including subsyndromal
depression) are also associated with higher diabetes-related distress. In a
clinical survey, only 14.7% of patients with low or no depression reported a
high amount of diabetes-related distress, but 56.3% of patients with
subthreshold depression and 73.6% with clinical depression suffered from
diabetes-related distress.8 Fisher and colleagues26 found a strong association between the
presence of subthreshold depression, a high amount of diabetes-related
distress, and metabolic as well as behavioral risk factors. Although the causal
relationship between depression, diabetes-related distress, and impaired
quality of life is not fully understood, it seems that there is a syndrome of
depressed mood, reduced well-being, and diabetes-related distress that is a
major barrier for attaining an optimal quality of life as the ultimate treatment
goal of diabetes.

Socioeconomic Aspects of Depression in Diabetes


Depression in diabetes also has socioeconomic implications. Despite the
poorer outcome of depressed diabetic patients, costs associated with treatment
of depressed diabetic patients are significantly higher than for nondepressed
diabetic patients.27,28 There is evidence that glycemic control is more
impaired, diabetes self-care is more reduced, and the likelihood of adverse
outcomes such as functional disability, comorbidities, higher healthcare costs,
or even mortality is significantly enhanced in depressed diabetic patients.
This adverse outcome of depressed mood in diabetic patients is evident in
patients with minor or subthreshold depression as well as in patients with
clinical depression.

2. Screening Tests
Screening tests for depression in diabetes should have sufficient screening
performance, but they should also be simple to administer and acceptable to
both healthcare professionals and patients.13

Screening Performance
There are many validated questionnaires available to screen for depression or
to assess depressive symptoms. All depression scales used for depression
screening in the general population could be used in diabetic patients.
Additional evidence is available for the Beck Depression Inventory (BDI)8,29
and the Center for Epidemiological Studies Depression Scale (CES-D).8 The
9-item Patient Health Questionnaire (PHQ-9) has been used in diabetic
patients,19 but its screening performance has not yet been assessed in the
diabetic population. For depression screening in diabetic patients, the WHO-5
questionnaire30 and the Problem Areas in Diabetes Questionnaire (PAID),8
assessing diabetes-related distress, have also been used. The latter two
questionnaires measure a broader aspect of negative emotional status in diabetic
patients (psychological well-being, diabetes-related distress) than the more
specific depression questionnaires.
The screening performance of questionnaires is evaluated according to their
sensitivity and specificity. These depend on the selection of a cutoff score
defining a positive screening result. For clinical practice, the positive (PPV)
and negative (NPV) predictive values are also of considerable interest, since
the PPV tells the healthcare professional what proportion of the patients who
screen positive are truly depressed. A rather low PPV is
associated with a high rate of false positives.
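The four measures named above can be written out from a 2 x 2 table of screening results against a reference diagnosis. The following is a minimal sketch in Python; the class name, the function name, and the patient counts are hypothetical and chosen only for illustration, not taken from any of the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """2 x 2 table of screening result versus reference diagnosis."""
    true_pos: int   # screen positive, depressed
    false_pos: int  # screen positive, not depressed
    false_neg: int  # screen negative, depressed
    true_neg: int   # screen negative, not depressed


def screening_metrics(c: ScreeningCounts) -> dict:
    """Return sensitivity, specificity, PPV, and NPV as proportions."""
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        "npv": c.true_neg / (c.true_neg + c.false_neg),
    }


# Hypothetical sample: 100 screened patients, 12 of them truly depressed.
counts = ScreeningCounts(true_pos=10, false_pos=16, false_neg=2, true_neg=72)
metrics = screening_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

In this made-up sample the test has good sensitivity and specificity, yet fewer than half of the positive screens are truly depressed, which is the situation the PPV is meant to expose.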
Table 16.2 summarizes the screening performance for case-finding of
clinical depression of the above-mentioned screening instruments.

Table 16.2. Screening Performance of Depression Screening Tools Used in Diabetic
Patients.

Questionnaire   Cutoff   Sensitivity   Specificity   Positive           Negative
                                                     Predictive Value   Predictive Value
BDI (41)        12       90.0%         84%           59%                97%
BDI (8)         11       87%           81%           66%                83%
CES-D (8)       23       79.2%         89%           54%                96%
WHO-5 (30)      ≤12      100%          78%           45%                100%
PAID (8)        38       81.1%         74%           34%                96%

Depression questionnaires like the BDI and CES-D showed high sensitivity
and specificity. PPVs were higher than 50% and NPVs were higher than 80%.
The questionnaires that are less depression-specific, like WHO-5 and PAID,
had a sensitivity comparable to that of the depression questionnaires but a lower
specificity. These questionnaires measure a broader range of emotional aspects
(psychological well-being and diabetes-related distress), which may result in
a lower specificity and rather low PPVs (less than 50%).
Figure 16.2 summarizes the screening performance of the different screening
tools. The screening performance is expressed as the positive likelihood ratio. As
expected, depression-specific questionnaires had the highest positive likelihood
ratio, followed by the less-depression-specific questionnaires (well-being and
diabetes-related distress). The advantages of all questionnaires are that they are
easy to administer and to evaluate. Furthermore, all questionnaires are able not
only to screen for clinical depression but also to quantify subthreshold emotional
problems. Well-being and diabetes-related distress questionnaires may be more
in line with the expectations of diabetic patients seeking medical treatment than
depression questionnaires (patients may expect to be asked about
diabetes-related problems or well-being instead of depressed feelings and suicidal
intentions), but this advantage is balanced by the somewhat lower screening
performance. The low screening performance of verbally asked questions31 may be
explained by a reduced readiness on the part of diabetic patients to speak about
emotional problems if they are directly asked about depressed feelings.
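The positive likelihood ratio is sensitivity divided by (1 − specificity). As a rough check, the sketch below applies that formula to the sensitivity and specificity values listed in Table 16.2; the function and variable names are illustrative only. The results come out close to the values plotted in Figure 16.2, with small differences that presumably reflect rounding of the published percentages.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = P(test positive | depressed) / P(test positive | not depressed)."""
    return sensitivity / (1.0 - specificity)


# (sensitivity, specificity) as listed in Table 16.2.
TABLE_16_2 = {
    "BDI (41)": (0.90, 0.84),
    "BDI (8)": (0.87, 0.81),
    "CES-D (8)": (0.792, 0.89),
    "WHO-5 (30)": (1.00, 0.78),
    "PAID (8)": (0.811, 0.74),
}

lr_plus = {
    name: positive_likelihood_ratio(sens, spec)
    for name, (sens, spec) in TABLE_16_2.items()
}

# Print from highest to lowest likelihood ratio.
for name, value in sorted(lr_plus.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.1f}")
```

The ordering reproduces the pattern described in the text: the depression-specific questionnaires rank above the well-being and distress questionnaires.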

Figure 16.2. Positive likelihood ratio of different screening tests. [Bar chart; y-axis: positive likelihood ratio. CES-D (8): 7.2; BDI (29): 5.6; BDI (8): 4.7; WHO-5 (30): 4.5; PAID (8): 3.1; verbally asked questions (31): 2.7.]

Acceptability of Screening
The acceptability of screening for patients is determined by the time
needed to complete the questionnaire and the complexity of questions.
For health professionals the time needed to score and interpret the result
is also important. The time to complete the above-mentioned questionnaires
ranges from 1 to 5 minutes, and so they could be completed in the
waiting room.
Questionnaires that measure quantity and intensity of negative emotions
might be seen as difficult to complete by some people. Ideally, the purpose of
the questionnaire and the findings would be discussed individually with each
patient. The pros and cons of screening questionnaires are summarized in
Textbox 16.1.
From a clinician’s perspective, the discriminatory value, in particular
the PPV, of a screening tool plays a decisive role.32 If the PPV is low,
the healthcare professional has to deal with numerous false positives.
Where the prevalence of a condition is low, most tests will yield a low
PPV (Fig. 16.3).
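The dependence of the PPV on prevalence follows from Bayes' theorem. The sketch below is illustrative only: it assumes a test with 87% sensitivity and 81% specificity (the BDI (8) row of Table 16.2, chosen here as an assumption; the chapter does not state which test underlies Figure 16.3), and the function name is hypothetical.

```python
def positive_predictive_value(
    prevalence: float, sensitivity: float, specificity: float
) -> float:
    """Share of positive screens that are truly depressed (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed test characteristics (BDI (8) row of Table 16.2).
SENSITIVITY = 0.87
SPECIFICITY = 0.81

ppv_by_prevalence = {
    prevalence: positive_predictive_value(prevalence, SENSITIVITY, SPECIFICITY)
    for prevalence in (0.30, 0.25, 0.20, 0.15, 0.10, 0.05)
}

for prevalence, ppv in ppv_by_prevalence.items():
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}")
```

Under these assumptions the PPV falls from roughly two thirds at 30% prevalence to roughly one fifth at 5% prevalence, even though the test itself is unchanged.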
There are several possible solutions to the low PPV problem. One is to
add a second screen for those who initially screen positive (Appendix
Table 4). A second option is to choose a higher cutoff, and a third option
is to screen only the high-risk cases who have by definition a high
prevalence of depression. The likelihood for depression is not equally
distributed among diabetic patients: risk factors such as female gender,
lack of social support, younger age, and low socioeconomic status are
associated with a higher risk of depression.32 In diabetic patients there are
additional risk factors showing a substantial association with clinical or
subthreshold depression: occurrence of late complications, especially
neuropathy and erectile dysfunction in men, the need for insulin therapy in
type 2 diabetic patients, poor glycemic control, and hypoglycemia
problems.15–17,33
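Two of these options can be compared numerically with a short sketch. It rests on assumptions that are not in the chapter: the second screen is treated as conditionally independent of the first, both screens are given the same illustrative sensitivity (87%) and specificity (81%), and the prevalences of 10% (unselected) and 30% (high-risk) are chosen for illustration.

```python
def post_test_probability(
    pre_test_probability: float, sensitivity: float, specificity: float
) -> float:
    """Probability of depression after a positive result (Bayes' theorem)."""
    true_pos = sensitivity * pre_test_probability
    false_pos = (1.0 - specificity) * (1.0 - pre_test_probability)
    return true_pos / (true_pos + false_pos)


# Assumed, illustrative test characteristics for both screening stages.
SENS, SPEC = 0.87, 0.81

# Option 1: second screen for those who screen positive (unselected patients).
ppv_one_stage = post_test_probability(0.10, SENS, SPEC)
ppv_two_stage = post_test_probability(ppv_one_stage, SENS, SPEC)
# A case is only confirmed if both screens are positive, so sensitivity drops.
combined_sensitivity = SENS * SENS

# Option 3: a single screen restricted to a high-risk group.
ppv_high_risk = post_test_probability(0.30, SENS, SPEC)

print(f"one screen, 10% prevalence:  PPV {ppv_one_stage:.0%}")
print(f"two screens, 10% prevalence: PPV {ppv_two_stage:.0%}")
print(f"combined sensitivity:        {combined_sensitivity:.0%}")
print(f"one screen, 30% prevalence:  PPV {ppv_high_risk:.0%}")
```

Under these assumptions a second screen roughly doubles the PPV (from about 34% to about 70%) at the price of a lower overall sensitivity (about 76%), while screening only a high-risk group reaches a similar PPV with a single test.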

TextBox 16.1. Pros and Cons of Screening vs. Routine Detection

Pros
• Easy to administer
• Easy to evaluate
• Time saving if done during waiting time
• Measures of subclinical depression, well-being, or diabetes-related distress

Cons
• Requires literacy of the patient
• Scoring sometimes is complex (need of templates)
• Feedback and discussion of test results needs communicative skills
• Cutoff scores sometimes have to be adapted with regard to the setting

Figure 16.3. Positive predictive value and population prevalence. [Stacked bar chart; x-axis: population prevalence of a condition; y-axis: positive predictive value, split into truly depressed and false positive. Truly depressed share: 66% at 30% prevalence, 60% at 25%, 53% at 20%, 45% at 15%, 34% at 10%, and 19% at 5%.]

3. Treatment Options
The third criterion of the U.K. National Screening Committee for the evaluation
of depression screening refers to the treatment options for the screened
condition.13 For the ethical consideration of depression screening in diabetic patients, it
is important that screening not merely lead to an additional diagnosis of
depression, but that effective treatment options are available. An additional depression
diagnosis without an effective treatment option could stigmatize the patient
and create a risk of discrimination by insurance companies or employers.12
Fortunately, there are treatment options available, including nonspecific
interventions like diabetes education and counseling on diabetes-related
problems and more specific antidepressive treatment strategies.

Nonspecific Interventions
Diabetes education has proven to be effective in treating subthreshold as well as
clinical depression. In two different studies the rate of subthreshold depression
dropped from 38% to 13% 6 months after diabetes education34 and from 28% to
18% 1 year after diabetes education.35 In randomized controlled trials that
evaluated more specific treatments like nortriptyline, fluoxetine, or cognitive–
behavioral therapy in diabetic patients with major or clinical depression, diabetes
education was also used frequently as a “placebo treatment.” The remission rate
of major depression after diabetes education was 37% in the trials that compared it
with fluoxetine36 or cognitive–behavioral therapy37 and 41% in the trial that
compared it with nortriptyline.38
In summary, diabetes education that provides diabetic patients with skills
and knowledge to better cope with diabetes-related challenges can halve the
rate of subthreshold depression and even reduce the rate of major depression by
more than one third.

Specific Antidepressive Interventions


Specific antidepressive treatments consist of antidepressant medication and
psychotherapy. In diabetic patients the antidepressant drugs nortriptyline38 and
fluoxetine36 reduced major depression rates and severity to less than 50%.
Cognitive–behavioral therapy focuses on replacing the diabetic patient’s
dysfunctional attitudes and negative cognitions with more appropriate
perspectives and cognitions. In one study, cognitive–behavioral therapy led to a
remission rate of 70% for major depression.37 Thus, all specific antidepressive treatments
were effective in reducing depression in diabetic patients. However, only
cognitive–behavioral therapy had an additional effect on glycemic control;37
fluoxetine did not have any additional beneficial effect on glycemic control,36 and
nortriptyline led to a slight deterioration of glycemic control.38

4. Screening Program
Implementation of a large-scale screening program for depression in diabetes
should be justified by high-quality randomized trials demonstrating that such a
program in diabetic patients reduces the morbidity of depression.39
Furthermore, data showing the cost-effectiveness of screening would
strengthen arguments for implementation of depression screening, since all
screening programs will have an impact on finite healthcare resources.

Effect of Depression Screening on Morbidity


There are no meta-analytic findings based on randomized controlled trials
about the effectiveness of depression screening in diabetes. Therefore, we
have to rely on a Cochrane review about the efficacy of depression screening
in primary care settings,11 which is where most diabetic patients are treated.
Depression screening is able to identify unrecognized cases if there is
selective feedback of elevated depression scores. But if there is no strategy
for dealing with positively screened or identified cases, the effect on
depression management is not substantial.
In the field of diabetes, the only randomized controlled trial is the Pathways
study,19 which shows a beneficial effect of a structured screening and treatment
program for depression in diabetic patients. In a two-stage process, positive
screening results on the PHQ-9 were confirmed by a second diagnostic procedure
using the Hopkins Symptom Checklist (SCL-90). Depressed diabetic
patients in the intervention group were offered a choice of antidepressant
medication or problem-solving therapy. If depression did not improve within
10 to 12 weeks, the initial treatment was either intensified or changed. If patients
did not respond to the intensification or treatment switch, they were referred to a
specialized mental health service. This stepped-care approach was compared to
the control condition, which involved simply informing the patients about their
depression and asking them to speak with their primary care physician about
depression treatment. The intervention group had a significantly greater
reduction of depression (–40%) than the control group (–12%).
In summary, there is evidence that depression management in diabetic
patients is effective. The stepped-care approach described, containing
screening, an offer of treatment options, and assessment of treatment response,
proved to have the potential to reduce the morbidity of depression in diabetes.
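The escalation logic of the stepped-care approach described above can be sketched as a small state machine. This is an illustration only, not the Pathways protocol itself: the step names, the `MAINTAIN` state, and the boolean `improved` flag (standing for the reassessment after 10 to 12 weeks) are assumptions introduced for the example.

```python
from enum import Enum, auto


class Step(Enum):
    INITIAL_TREATMENT = auto()    # antidepressant or problem-solving therapy
    INTENSIFY_OR_SWITCH = auto()  # initial treatment intensified or changed
    SPECIALIST_REFERRAL = auto()  # specialized mental health service
    MAINTAIN = auto()             # hypothetical state: response was sufficient


def next_step(current: Step, improved: bool) -> Step:
    """Return the next step after a reassessment of treatment response."""
    if improved:
        return Step.MAINTAIN
    if current is Step.INITIAL_TREATMENT:
        return Step.INTENSIFY_OR_SWITCH
    if current is Step.INTENSIFY_OR_SWITCH:
        return Step.SPECIALIST_REFERRAL
    # Further escalation is not described in the text, so stay where we are.
    return current


def care_path(responses: list) -> list:
    """Follow one patient through successive reassessments."""
    step = Step.INITIAL_TREATMENT
    path = [step]
    for improved in responses:
        step = next_step(step, improved)
        path.append(step)
        if step in (Step.MAINTAIN, Step.SPECIALIST_REFERRAL):
            break
    return path


# A patient who does not respond at either reassessment ends up referred.
print([s.name for s in care_path([False, False])])
```

The point of the sketch is that each treatment decision is tied to an explicit assessment of response, which is the element missing from stand-alone screening.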

Cost-Effectiveness of Depression Screening and Intervention


Healthcare resources are finite, so the cost-effectiveness of depression
screening in diabetes is a matter of debate. In the Pathways study, a
cost-effectiveness analysis40 showed that within 2 years, there was an increase of
61 days without depression per patient in the intervention group. The
cost analysis showed that this stepped-care approach led to an annual net cost
reduction of $314 in total healthcare costs, even taking into account the costs
for depression screening ($27) and antidepressant treatment ($545).
In summary, there are promising results that implementation of depression
screening using a stepped-care approach is effective in reducing depression
and also in reducing healthcare costs, but further research is clearly required.

5. Conclusions for Clinical Practice


The analysis of depression screening in diabetes according to the four criteria of
the U.K. National Screening Committee showed that depression in diabetes is a
major healthcare problem warranting attention. Screening tests are available,
although their accuracy might be improved, and treatment options are available.
However, results of the Cochrane meta-analysis about depression screening in
primary care settings indicate that the implementation of depression screening
needs a structured approach with defined treatment options and control of
treatment response; otherwise, depression screening alone will not reduce
morbidity. A stepped-care approach comprising verification of positive screening
results, treatment options, assessment of response to treatment, and adaptation
showed favorable results with regard to reducing depression as well as to cost-
effectiveness and even mortality.
For the clinical management of depression in diabetic patients, the flowchart
in Figure 16.4 could provide some guidance.

Figure 16.4. Flow diagram of depression management in diabetes care. [Flowchart: a clinical impression or risk factors for depression lead to depression screening. Screening negative: treatment as usual and monitoring. Screening positive: depression diagnosis, with differential diagnosis to exclude, eg, mental comorbidities such as anxiety or dementia. Diagnosis negative: diabetes management and reduction of diabetes-related problems. Diagnosis positive: specific antidepressive treatment (eg, medication, CBT), followed by a check of whether the response is sufficient; if it is not, the dosage is increased.]

Depression screening for diabetic patients seems to be appropriate if there is
a clinical impression of low mood or if risk factors are present (eg, history of
depression, late complications). It is not yet clear if routine screening for an
unselected population with diabetes is worthwhile. Treatment as usual seems
indicated if depression screening is negative and NPV rates are high. For those
with positive screening results, follow-up assessment and a clinical interview
for clinical depression should be performed. If the depression screening is
positive but the criteria for clinical depression are not met, it seems appropriate
to offer help for any related concerns. If diabetes-related problems are
involved, the patient may profit from improved diabetes management,
reducing diabetes-related distress. The patient’s psychological well-being should
be monitored to find out if depressive symptoms will improve, deteriorate, or
remain stable.
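The decision sequence described in this paragraph can be summarized in a few lines of code. The sketch below is a hypothetical illustration, not a clinical tool: the function name, its parameters, and the returned labels are invented for the example and simply paraphrase the text.

```python
from typing import Optional


def suggested_management(
    screen_positive: bool, meets_criteria: Optional[bool] = None
) -> str:
    """Map screening and interview results to the management suggested above.

    meets_criteria is None while the clinical interview is still outstanding.
    """
    if not screen_positive:
        return "treatment as usual"
    if meets_criteria is None:
        return "follow-up assessment and clinical interview"
    if meets_criteria:
        return "specific antidepressive treatment"
    # Screen positive, but criteria for clinical depression not met.
    return "help with diabetes-related problems and monitoring of well-being"


print(suggested_management(screen_positive=True, meets_criteria=False))
```

Writing the sequence this way makes explicit that a positive screen is never an endpoint: it always triggers either a diagnostic interview or a defined form of help.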
Depression and poor quality of life are common in diabetic patients. Since
subthreshold disorders, clinical depression, and distress all have a negative
impact on the quality of life as well as on the course of diabetes, depressive
symptoms deserve attention in clinical care. The timely identification of
patients with subthreshold or clinical depression enables effective management
of this comorbidity to diabetes. This is important so that these patients
can achieve the ultimate goal of diabetes therapy, an optimal quality of life, and
the fewest possible diabetes complications.

References
1. Willis T. Pharmaceutice rationalis sive diatriba de medicamentorum operationibus
in humano corpore. Oxford, 1675.
2. Alexander F. Psychosomatic medicine. New York, Norton, 1950.
3. Knol MJ, Twisk JW, Beekman AT, et al. Depression as a risk factor for the onset of type
2 diabetes mellitus. A meta-analysis. Diabetologia. 2006;49:837–845.
4. Hermanns N, Kubiak T, Kulzer B, et al. Emotional changes during experimentally
induced hypoglycaemia in type 1 diabetes. Biol Psychol. 2003;63:15–44.
5. Hermanns N, Scheff C, Kulzer B, et al. Association of glucose levels and glucose
variability with mood in type 1 diabetic patients. Diabetologia. 2007;50:930–933.
6. Anderson RJ, Freedland KE, Clouse RE, et al. The prevalence of comorbid depression
in adults with diabetes: A meta-analysis. Diabetes Care. 2001;24:1069–1078.
7. Rubin RR, Ciechanowski P, Egede LE, et al. Recognizing and treating depression in
patients with diabetes. Current Diabetes Reports. 2004;4:119–125.
8. Hermanns N, Kulzer B, Krichbaum M, et al. How to screen for depression and
emotional problems in patients with diabetes: comparison of screening characteristics
of depression questionnaires, measurement of diabetes-specific emotional problems
and standard clinical assessment. Diabetologia. 2006;49:469–477.
9. Pouwer F, Beekman AT, Lubach C, et al. Nurses’ recognition and registration of
depression, anxiety and diabetes-specific emotional problems in outpatients with
diabetes mellitus. Patient Educ Couns. 2006;60:235–240.
10. Katon WJ, Simon G, Russo J, et al. Quality of depression care in a population-based
sample of patients with diabetes and major depression. Med Care. 2004;42:1222–1229.
11. Gilbody S, House AO, Sheldon TA. Screening and case finding instruments for
depression. Cochrane Database of Systematic Reviews CD002792, 2005.
12. Gilbody S, Sheldon T, Wessely S. Should we screen for depression? BMJ.
2006;332:1027–1030.
13. The UK’s National Screening Committee’s criteria for appraising the viability,
effectiveness and appropriateness of a screening programme. Available at: http://
www.nsc.nhs.uk/pdfs/criteria.pdf. 2003.
14. Peyrot M. Depression: A quiet killer by any name. Diabetes Care. 2003;26:2952–2953.
15. Peyrot M, Rubin RR. Levels and risks of depression and anxiety symptomatology
among diabetic adults. Diabetes Care. 1997;20:585–590.
16. de Groot M, Anderson RJ, Freedland KE, et al. Association of depression and diabetes
complications: A meta-analysis. Psychosom Med. 2001;63:619–630.
17. Lustman PJ, de Groot M, Anderson RJ, et al. Depression and poor glycemic control.
Diabetes Care. 2000;23:934–942.
18. Ciechanowski PS, Katon WJ, Russo JE. Depression and diabetes: impact of depressive
symptoms on adherence, function, and costs. Arch Intern Med. 2000;160:3278–3285.
19. Katon WJ, Von Korff M, Lin EH, et al. The Pathways Study: a randomized trial of
collaborative care in patients with diabetes and depression. Arch Gen Psychiatry.
2004;61:1042–1049.
20. Black SA, Markides KS, Ray LA. Depression predicts increased incidence of adverse
health outcomes in older Mexican Americans with type 2 diabetes. Diabetes Care.
2003;26:2822–2828.
21. Egede LE. Diabetes, major depression, and functional disability among U.S. adults.
Diabetes Care. 2004;27:421–428.
22. Pouwer F, Beekman ATF, Nijpels G, et al. Rates and risks for co-morbid depression in
patients with type 2 diabetes mellitus: results of a community based study.
Diabetologia. 2003;46:892–898.
23. Zhang X, Norris SL, Gregg EW, et al. Depressive symptoms and mortality among
persons with and without diabetes. Am J Epidemiol. 2005;161:652–660.
24. Katon W, Cantrell CR, Sokol MC, et al. Impact of antidepressant drug adherence on
comorbid medication use and resource utilization. Arch Intern Med. 2005;165:2497–2503.
25. Goldney RD, Phillips PJ, Fisher LJ, et al. Diabetes, depression, and quality of life: a
population study. Diabetes Care. 2004;27:1066–1070.
26. Fisher L, Skaff MM, Mullan JT, et al. Clinical depression versus distress among
patients with type 2 diabetes: Not just a question of semantics. Diabetes Care.
2007;30:542–548.
27. Egede LE. Effects of depression on work loss and disability bed days in individuals with
diabetes. Diabetes Care. 2004;27:1751–1753.
28. Egede LE, Zheng D, Simpson K. Comorbid depression is associated with increased
health care use and expenditures in individuals with diabetes. Diabetes Care.
2002;25:464–470.
29. Lustman PJ, Clouse RE, Griffith LS, et al. Screening for depression in diabetes using
the Beck Depression Inventory. Psychosom Med. 1997;59:24–31.
30. Awata S, Bech P, Yoshida S, et al. Reliability and validity of the Japanese version of the
World Health Organization-Five Well-Being Index in the context of detecting
depression in diabetic patients. Psychiatry Clin Neurosci. 2007;61:112–119.
31. Arroll B, Khin N, Kerse N. Screening for depression in primary care with two verbally
asked questions: cross-sectional study. BMJ. 2003;327:1144–1146.
32. Carter RM, Wittchen HU, Pfister H, et al. One year prevalence of subthreshold and
threshold DSM-IV generalized anxiety disorder in a nationally representative sample.
Depression Anxiety. 2001;13:78–88.
33. Hermanns N, Kulzer B, Krichbaum M, et al. Affective and anxiety disorders in a
German sample of diabetic patients: prevalence, comorbidity and risk factors. Diabet
Med. 2005;22:293–300.
34. Peyrot M, Rubin RR. Persistence of depressive symptoms in diabetic adults. Diabetes
Care. 1999;22:448–452.
35. Hermanns N, Kulzer B, Kubiak T, et al. Course of depression in type 2 diabetes
[abstract]. Diabetes. 2004;53:A16.
36. Lustman PJ, Freedland KE, Griffith LS, et al. Fluoxetine for depression in diabetes: a
randomized double-blind placebo-controlled trial. Diabetes Care. 2000;23:618–623.
37. Lustman PJ, Griffith LS, Freedland KE, et al. Cognitive-behavior therapy for
depression in type 2 diabetes mellitus: a randomized, controlled trial. Ann Intern
Med. 1998;129:613–621.
38. Lustman PJ, Griffith LS, Clouse RE, et al. Effects of nortriptyline on depression and
glycemic control in diabetes: results of a double-blind, placebo-controlled trial.
Psychosom Med. 1997;59:241–250.
39. Jones LE, Doebbeling CC. Depression screening disparities among veterans with
diabetes compared with the general veteran population. Diabetes Care.
2007;30:2216–2221.
40. Simon GE, Katon WJ, Lin EH, et al. Cost-effectiveness of systematic depression
treatment among people with diabetes mellitus. Arch Gen Psychiatry. 2007;64:65–72.
41. Lustman PJ, Clouse RE, Griffith LS, et al. Screening for depression in diabetes using
the Beck Depression Inventory. Psychosom Med. 1997;59:24–31.
17
COMMENTARY AND INTEGRATION: IS IT TIME
TO ROUTINELY SCREEN FOR DEPRESSION IN
CLINICAL PRACTICE?

James C. Coyne

We were pleased we were able to convince such talented authors to contribute
chapters to this volume. We hope that their contributions will serve to redefine
key issues in the implementation of screening programs for depression in
clinical settings. The chapters are quite varied but are notable for their
balanced, evidence-based recommendations and skepticism about introducing
screening into routine care unless there is a substantial infusion of resources.
Taken together, the chapters provide a foundation for critiquing screening
programs as they are currently being implemented.
Screening has become the most commonly adopted enhancement of care
for depression, even if questions can be raised about the fidelity with which it is
being implemented. Yet, the enthusiasm for screening is not based on
the accumulation of compelling new evidence, but rather on a reframing of
the question of its efficacy and of the evidence mustered to answer it. The
crucial question has shifted from ‘‘Does routine screening improve patient
outcomes?’’ to ‘‘Can screening be used to improve outcomes when there is
a substantial effort made to ensure adequate treatment and follow up?’’1
This seemingly important difference has been downplayed in endorsements
of screening. And yet, stand-alone screening programs are simply not
effective in improving the management of depression in primary care
(see Chapter 7). Moreover, including screening as a component in more
comprehensive enhancements of care may not be necessary to improve
outcomes.

One can readily find basis in this volume for questioning the wisdom of
stand-alone screening initiatives and for raising doubts whether routine
screening is acceptable and sustainable in non-mental-health medical settings.
I will highlight these points in the context of providing a more general
commentary on the preceding chapters. One goal is to alert readers tempted
by enthusiasm about screening to some frustrations and disappointments that
await them if they proceed with a screening program without additional
resources. I acknowledge that I am going beyond the conclusions of many
chapters. However, almost all limit endorsement of screening to settings where
supports are in place for absorbing the effects of screening and ensuring it has
its intended effect. Unfortunately, such settings are far less common than
presumed. So, the question becomes, ‘‘What are the implications of routine
screening being implemented without such support?’’
In the preface, Katon notes the inadequacy of routine care for depression.
Most primary care patients discontinue treatment with antidepressants shortly
after it is initiated.2 Only 20% to 30% of depressed persons being treated
exclusively in primary care settings receive adequate care and follow-up.3
Berndt and colleagues4 estimate that 40% of depressed patients are adminis-
tered treatment with little or no benefit over what would be obtained by
remaining on a wait list. This represents about 20% of the total cost of treating
depression. Katon also notes the underappreciated problem of overtreatment of
depression—that is, the prescribing of treatment to patients who are not
depressed or likely to show benefit (see also Chapters 3 and 5). Over the past
20 years, rates of treatment of depression have doubled to quadrupled in most
Westernized countries, largely due to increases in the prescription of antide-
pressants to persons who are mildly depressed or not at all depressed.5–7
Increasingly, rates of antidepressant prescriptions equal or exceed the esti-
mated prevalence of depression,8 even if the most depressed persons in the
community still go untreated.9
Zimmerman and Mitchell, in Chapter 1, question the validity of the
diagnosis of major depression. There is no gold standard for diagnosis,
and arbitrary decisions are involved in the presumptive gold standards such
as diagnosis on the basis of semi-structured interview using formal diag-
nostic criteria. They note often-overlooked differences between DSM-IV
and ICD-10 criteria. In DSM-IV, major depression requires five symptoms.
The more nuanced ICD-10 criteria distinguish between mild depression,
requiring only four symptoms, and moderate depression, requiring six
symptoms. Thus, U.S. practice guidelines10 do not distinguish degree of
severity in recommending that a diagnosis of major depression indicates a
need for treatment, in contrast to U.K. recommendations, which encourage
watchful waiting and nonpharmacologic intervention for mild and moderate
depression.
Zimmerman and Mitchell propose that the justification for a diagnosis lies in its
identifying ‘‘meetable unmet needs.’’ Does major depression satisfy such a
criterion? Patten11 recently raised the question of whether the diagnosis of
major depression is overinclusive as an indicator of addressable clinical need,
singling out community-based studies that used lay interviews and that pro-
duced high estimates of the prevalence of depression and low rates of its
treatment (see also Brugha and colleagues12).
Establishing that the criteria for depression are somewhat arbitrary sets the
stage for psychiatrists not relying on them in any systematic fashion for
interviewing patients and making diagnoses. Psychiatrists may be prone to
making invidious comparisons between their own and primary care physicians’
diagnostic skills, but Zimmerman and Mitchell show that psychiatrists typi-
cally inquire only about depressed mood and not anhedonia, and that 90% of
psychiatrists do not use formal criteria for case identification or assessment of
severity.
Mitchell, in Chapter 2, provides an historical overview of existing mood
scales. His exhaustive list is long, but only a small handful are in very wide use.
He notes that scales can be applied in the separate tasks of screening, diag-
nosing, and monitoring clinical improvement. While it is tempting to expect
that a single scale will perform all of these tasks well, it is unrealistic. One issue
that arises in the evaluation of screening scales is whether it is of any advantage
that scale items conform closely to diagnostic criteria. Presumably a scale such
as the nine-item Patient Health Questionnaire (PHQ-9) that is directly modeled
on such criteria should be more efficient, but that has not generally proven to be
the case. Yet, such scales, originally designed as screening instruments, are
increasingly being promoted and accepted as both diagnostic instruments
suitable for making treatment decisions13 and the gold standard to which
physician detection of depression is compared.14 However, such scales do
not consider exclusion criteria for major depression. When administered in a
self-report format, they do not provide for answering patients’ questions about
what is meant by particular items, probing their responses, or asking clarifying
questions.
Screening instruments need to be acceptable to clinicians and patients, and
this criterion in turn needs to be balanced against the validity of the
instrument. Seemingly minor changes in the burden on patients completing
screening instruments or on clinicians in scoring them can make large differ-
ences in their acceptability. Bermejo and coworkers15 found that after partici-
pating in a screening study, 62.5% of primary care physicians found the PHQ-9
too long and 37.5% found it too time-consuming, even though it typically took
less than 2 minutes. Half of the physicians rated the PHQ-9 as an impediment to
daily practice and 75% thought it was impractical. Kessler and Wang16 report a
physician’s objection to screening: ‘‘You are proposing we use half or more of
our appointment to ask patients a set of questions about things that they usually
are not here to discuss and usually will not generate a positive finding?’’
Shorter scales should be more acceptable than longer ones, and a suggestion
repeatedly appears in the literature that ultrashort, one- or two-item scales can
be sufficient as screens, in terms of validity. Mitchell and Coyne17 conducted a
systematic review and meta-analysis of such ultrashort screening scales and
concluded that they were better at ruling out the need for further assessment
than ruling in further assessments—that is, their negative predictive value is
substantially higher than the positive predictive value. Even so, despite the
high negative predictive value of a two-item screen, over 25% of patients who
are depressed will be missed. Bennett and colleagues18 examined ultrashort
screening scales as a basis for deciding to administer a longer screening scale.
While a small advantage was found, this strategy carries the risk that patients
will decline a second questionnaire or that harried clinicians will accept the
simple screen as confirmation of a diagnosis.
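The asymmetry between ruling out and ruling in follows from the arithmetic of predictive values at low prevalence. As a sketch only, write Se for sensitivity, Sp for specificity, and p for prevalence. The numerical values below (Se = 0.74, Sp = 0.75, p = 0.10) are assumptions chosen for illustration, not pooled estimates from the meta-analysis.

```latex
\[
\mathrm{PPV} = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)},
\qquad
\mathrm{NPV} = \frac{Sp\,(1 - p)}{Sp\,(1 - p) + (1 - Se)\,p}
\]
% Illustrative (assumed) values: Se = 0.74, Sp = 0.75, p = 0.10
\[
\mathrm{PPV} = \frac{0.074}{0.074 + 0.225} \approx 0.25,
\qquad
\mathrm{NPV} = \frac{0.675}{0.675 + 0.026} \approx 0.96
\]
```

Under these assumed values, a negative screen is right about 96% of the time, yet about a quarter of depressed patients are still missed, and only about one positive screen in four is a true case.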
There are inherent limitations to the performance of screening scales; there
is always some trade-off in maximizing specificity versus sensitivity or vice
versa, and between validity and acceptability. Furthermore, any existing and
perhaps any conceivable self-report scale is going to require a clinical inter-
view if underidentification or overidentification of depression is to be mini-
mized. With properly cross-validated cut points, many scales appear to
perform about the same.19 Consumers of the screening literature should be
suspicious of claims to the contrary. It is quite common to find claims
amounting to home-court advantage—in other words, adjustments to the cut
points for a particular instrument favored by some investigator produce better
performance in the specific sample under study than fixed, well-validated cut
points on an established scale. These findings capitalize on chance, including
sampling error. A meta-analysis found that studies tailoring cut points to
particular samples produce spurious estimates of the performance of
instruments.20
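The way tailored cut points capitalize on chance can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the score distributions, sample sizes, candidate cut points, and the "fixed" cut point of 11 are all hypothetical, and Youden's index (sensitivity plus specificity minus 1) merely stands in for whatever performance criterion an investigator might optimize.

```python
import random

def youden(scores, labels, cut):
    """Sensitivity + specificity - 1, treating score >= cut as a positive screen."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cut)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < cut)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cut)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cut)
    return tp / (tp + fn) + tn / (tn + fp) - 1

def simulate_sample(rng, n_depressed=30, n_well=120):
    """Hypothetical scale scores; depressed patients score higher on average."""
    labels = [True] * n_depressed + [False] * n_well
    scores = [round(rng.gauss(14 if y else 8, 4)) for y in labels]
    return scores, labels

FIXED_CUT = 11             # hypothetical pre-specified cut point
CANDIDATES = range(0, 31)  # cut points an investigator might try

rng = random.Random(0)
gain_over_fixed, optimism = [], []
for _ in range(100):
    scores, labels = simulate_sample(rng)
    # "Home-court" step: pick whichever cut point looks best in this sample
    tailored = max(CANDIDATES, key=lambda c: youden(scores, labels, c))
    in_sample = youden(scores, labels, tailored)
    # Apply the tailored cut point to a fresh sample from the same population
    new_scores, new_labels = simulate_sample(rng)
    out_of_sample = youden(new_scores, new_labels, tailored)
    gain_over_fixed.append(in_sample - youden(scores, labels, FIXED_CUT))
    optimism.append(in_sample - out_of_sample)

print("mean in-sample gain over fixed cut:", sum(gain_over_fixed) / 100)
print("mean shrinkage on a fresh sample:", sum(optimism) / 100)
```

By construction, the tailored cut point can never look worse than the fixed one in the sample used to choose it. The second figure shows how much of that apparent advantage is given back when the same cut point is applied to new patients.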
Zimmerman and Mitchell’s doubts about the interviewing practices of
psychiatrists do not take primary care physicians off the hook. Mitchell, in
Chapter 3, documents a persistent failure of primary care physicians to detect
depression, despite campaigns and educational efforts to promote detection.
Mitchell provides the pooled estimates of a sensitivity of 48% and a specificity
of 70% for detecting depression. Assume that the prevalence of depression in
primary care is 10%, consistent with a large body of research. Figure 3.3 in
Mitchell’s chapter shows that at that prevalence, these pooled sensitivity and
specificity figures translate into physicians correctly identifying 4.8% of
patients as depressed, falsely reassuring 5.2% who are actually depressed,
correctly reassuring 60.5% that they are not depressed, and falsely identifying
29.5% of patients as depressed when they are not. The remainder of Mitchell’s chapter reviews factors
associated with whether depression is detected in a primary care visit. Most
primary care physicians cannot recall the formal criteria for depression; few
use formal interviews or screening instruments. Yet, screening programs
typically require primary care clinicians to conduct a follow-up interview of
patients who screen positive to confirm a diagnosis. There are reasons to doubt
they would be prodded to do this or assisted in doing so: simple efforts at
guideline implementation or educational interventions to improve physician
recognition of depression are notoriously ineffective.21,22
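The detection figures quoted earlier in this paragraph rest on simple arithmetic that readers can check for themselves. The following is a minimal sketch, assuming only the pooled sensitivity (48%), the pooled specificity (70%), and the 10% prevalence given above. The function name and category labels are illustrative and are not taken from Mitchell's chapter.

```python
def detection_table(sensitivity, specificity, prevalence):
    """Share of all patients falling in each cell of the 2x2 detection table."""
    depressed = prevalence
    not_depressed = 1 - prevalence
    return {
        "depressed, identified": sensitivity * depressed,
        "depressed, missed": (1 - sensitivity) * depressed,
        "not depressed, reassured": specificity * not_depressed,
        "not depressed, labeled depressed": (1 - specificity) * not_depressed,
    }

table = detection_table(sensitivity=0.48, specificity=0.70, prevalence=0.10)
for label, share in table.items():
    print(f"{label}: {share:.1%}")
```

A direct calculation of this kind gives roughly 4.8% and 5.2% for the depressed patients and 63% and 27% for the nondepressed. The split quoted from Figure 3.3 differs somewhat for the nondepressed group, so the sketch is best read as an illustration of the method rather than a reproduction of that figure.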
Valenstein and associates23 investigated the effects of providing interview
guides and other aids for physicians participating in an implementation of the
PRIME-MD,24 which consists of a coordinated patient screening questionnaire
and Clinician Evaluation Guide (CEG). Despite provision of support staff, only
21% of positive screens were followed up. Use of the PRIME-MD fell off
sharply after withdrawal of added support; there was soon no use of either the
questionnaire or the CEG.
These authors provided more resources and support to the clinical setting
than would likely be available in settings implementing routine screening, but
their results are quite consistent with other studies of stand-alone screening that
uniformly yield little or no effect on interventions being offered and no
significant effect on patient depression outcomes.25–27 Yet, the authors provide
a potential insight into why there is little or no benefit to screening: namely,
clinicians may see little need to follow up on positive screens if patients do not
appear distressed, and they do not want to use any formal algorithm to
determine diagnosis when depression is suspected. Moreover, clinicians noti-
fied of patients who screen positive often preferred watchful waiting to active
intervention with the mildly depressed patients who are identified.27 Thus, the
likelihood that a positive screen will become an adequately treated case is
lower than expected. This is illustrated in Figures AP.1-3 in the Appendix.
Asked to nominate barriers to detection of depression, primary care physicians
emphasize structural and organizational issues: half of them endorsed
lack of time, lack of reimbursement for depression treatment, and lack of
access to specialist care. Turning to patient factors, Mitchell notes that most
primary care patients who spontaneously complain of depressive symptoms are
detected, and most will volunteer symptoms when asked directly. However,
prominent among the reasons for nondisclosure are patients’ beliefs that they
can handle the symptoms on their own, and if they need professional assis-
tance, that primary care is not where to obtain it.
Who are the currently untreated depressed primary care patients who would
be identified with screening? Ethnic minorities and low-education patients are
particularly likely to go undetected,28 but severity of depression is a crucial
determinant of detection.29,30 Coyne and associates31 found that over half of
undetected patients had one or no symptoms beyond the five required for a
diagnosis and were less likely to have a past history of treatment. Results
suggested that if primary care physicians were to improve their detection, they
would have to increase their willingness to make a diagnosis on the basis of
fewer symptoms and pay more attention to mild symptoms in highly func-
tioning patients. Von Korff32 notes, ‘‘we need to be circumspect about con-
cluding that unrecognized, undiagnosed, and untreated primary care patients
with mental disorders necessarily indicate poor quality of care’’ (p. 295).
Smith, in Chapter 4, provides an excellent review of innovations in psycho-
metrics that can be used to improve the efficiency and the validity of existing
mood scales. With the development of the Rasch model as a basis for con-
structing and refining instrumentation, no longer does the dictum of classical
test theory that ‘‘longer is better’’ hold. Smith notes that many of the barriers to
routine screening for depression realizing its potential are organizational or
related matters of clinicians being able and willing to change their existing
practices to ensure quality of care. However, to the extent to which inadequa-
cies of existing scales burden patients and clinicians, there is room for
increasing the sustainability and effectiveness of routine screening by refining
the scales. As he notes, very often existing scales can be pared down to a
quarter or a third of their current length with no loss in validity, and even
improvement. Furthermore, the creation of large databanks of past patients’
responses to individual items can be used to create algorithms for computer-
ized, tailored adaptive testing of new individual patients, with the selection of
the next item to be presented to them determined by their accumulating
responses. Demonstrating the incredible power of adaptive testing, Gibbons
and colleagues33 administered 616 items from Mood and Anxiety Spectrum
Scales (MASS) to 800 outpatients from a mood and anxiety treatment program
as the basis for developing a computerized adaptive test (CAT) using post
hoc simulation. On average, the 616 items were reduced by 95% to 24, and the
CAT version was still correlated 0.95 with the original MASS.
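The logic of adaptive item selection can be sketched in a few lines. What follows is a generic illustration built on the Rasch model with a simple grid-based severity estimate. It is not the procedure used by Gibbons and colleagues, and the item bank, stopping rule, and simulated patient are all hypothetical.

```python
import math

GRID = [g / 10 for g in range(-40, 41)]  # candidate severity values, -4.0 to 4.0

def p_endorse(theta, difficulty):
    """Rasch probability that a patient at severity theta endorses an item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def estimate(responses):
    """Posterior mean and SD of severity over GRID, with a standard normal prior.

    responses is a list of (item difficulty, endorsed) pairs.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)
        for difficulty, endorsed in responses:
            p = p_endorse(theta, difficulty)
            w *= p if endorsed else (1 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def adaptive_test(item_bank, answer, max_items=12, target_sd=0.6):
    """item_bank maps item name -> difficulty; answer(name) returns True or False."""
    responses, asked = [], []
    theta, sd = estimate(responses)
    while len(asked) < min(max_items, len(item_bank)) and sd > target_sd:
        remaining = [name for name in item_bank if name not in asked]
        # Under the Rasch model an item is most informative when its
        # difficulty is closest to the current severity estimate.
        name = min(remaining, key=lambda n: abs(item_bank[n] - theta))
        asked.append(name)
        responses.append((item_bank[name], answer(name)))
        theta, sd = estimate(responses)
    return theta, sd, asked

# Hypothetical bank of 40 items with difficulties spread from -3 to 3, and a
# simulated patient who endorses every item at or below a severity of 1.0.
bank = {f"item{i:02d}": -3 + 6 * i / 39 for i in range(40)}
theta, sd, asked = adaptive_test(bank, lambda name: bank[name] <= 1.0)
print(f"asked {len(asked)} of {len(bank)} items; estimate {theta:.2f} (SD {sd:.2f})")
```

Each new item is chosen in light of the answers already given, so only a fraction of the bank is ever presented to a given patient. This is the mechanism behind the large reductions in test length reported for adaptive testing.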
Despite the promise of new psychometric approaches for the refinement of
screening instruments, we should not be under the illusion that they can entirely
overcome their limitations. Santor and Coyne34 used such methods with the
CES-D in a sample of 528 primary care patients, split into a study and a cross-
validation group. A reduction of the scale from 20 items to 9 was possible, and
the positive predictive value was raised 30%. However, even with these
improvements, a proportion of patients screening positive would not be found
to be depressed, and a proportion of depressed patients would be missed.
Gilbody and Beck, in Chapter 7, declare the conclusion of their systematic
review in their title ‘‘Implementing Screening as Part of Enhanced Care:
Screening Alone is Not Enough.’’ The 2002 U.S. Preventive Services Task
Force (USPSTF) report, which was pivotal in revising the recommendations concerning
screening for depression, was based on expanding the inclusion criteria
for relevant studies to include studies of screening in the context of more
general quality improvement in depression care. The single decisive study
was Wells and colleagues’ Partners in Care,35 an ambitious effort involving
resources such as personnel to administer and score screening instruments,
training materials and academic detailing, depression management specialists,
initiatives to ensure scheduling of follow-up appointments, consultations and
training with mental health professionals, and ready access to antidepressants
and psychotherapy, as well as resources in kind provided by participating
practices.
Gilbody and Beck note that prior to the USPSTF report, stand-alone
screening programs were not recommended because of a lack of evidence for
their effectiveness. These authors take up the issue of whether programmatic
enhancements of care for depression are effective, and conclude that they are
indeed modestly effective. However, they next identified studies relevant to
answering the critical question not adequately addressed in the USPSTF report:
whether screening is a necessary component of enhanced care, in terms of the
added value for effectiveness. A stratified analysis of studies with and without
screening revealed a slight advantage for programs that included screening, but
the difference was modest, with a standardized mean difference of 0.15,
accounting for less than 1% of the variance in outcomes. Yet, none
of these studies involved randomized comparisons of screening versus the
same program without screening, and reliance on screening to select patients
covaried with other factors. Meta-regression analyses of sources of heteroge-
neity in outcomes found that among key program elements, whether the care
manager was trained, whether this care manager received regular supervision
from mental health professionals, and whether the enhanced care intervention
targeted increased guideline-concordant treatment with antidepressants all had
greater impact than whether patients were screened or referred. Additional
analyses reported elsewhere36 differentiated among outcomes of screening and
found that including screening in an enhancement of depression care had
nonsignificant effects on physician recognition or provision of any treatment
for depression, and no effect on use of antidepressants or changes in patient
depression scores.
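The statement that a standardized mean difference of 0.15 accounts for less than 1% of the variance in outcomes can be checked with the standard conversion from d to a correlation. The sketch below assumes two groups of equal size, an assumption made here for illustration and not stated in Gilbody and Beck's analysis.

```python
import math

def variance_explained(d):
    """Convert a standardized mean difference d to r squared (equal group sizes assumed)."""
    r = d / math.sqrt(d * d + 4)
    return r * r

share = variance_explained(0.15)
print(f"variance explained: {share:.2%}")
```

Under that assumption the share comes to a little over half of one percent, consistent with the figure cited above.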
Rogers, Lerner, and Adler, in Chapter 8, discuss the use of technology to
reduce the burden of screening on patients and clinicians and to improve its
efficiency. These technologies range from telephone screening by a human
interviewer and automated telephone response systems in which the voice of
the interviewer is pre-recorded and patient responses are automated using
voice recognition technology or the telephone touchpad to personal digital
assistants (PDAs), touchscreens, and the Internet. Rogers and colleagues
also distinguish between nonadaptive applications that basically replace the
use of conventional pencil-and-paper screens with telephone or electronic
presentations of the same items, and adaptive applications that use a large
bank of other patients’ responses to items and the power of computers to
tailor the selection of items presented to individual patients based on their
previous item responses. Essentially, adaptive testing streamlines and indivi-
dualizes screening in a way that cannot be readily accomplished with pencil-
and-paper screening.
There are high hopes that such technology will extend the reach and
sustainability of routine screening for depression by improving its accept-
ability, efficiency, and accuracy, and where resources are available, clinical
settings are in a rush to obtain touchscreens and PDAs. Yet, Rogers and
colleagues note that evaluations of nonadaptive technologies have been largely
limited to their acceptability to patients and clinicians and their comparability
in results to what is obtained with conventional pencil-and-paper screening.
Evidence consistently indicates that such technologies are convenient and
acceptable to patients and comparable in their results to conventional
screening. They may even have advantages for some patients who find a
more impersonal assessment more acceptable and more conducive to honesty
than completing a screen in front of a clinic staff member. Adaptive testing,
however, requires a large database to evaluate the performance of individual
items and develop algorithms for selecting the items to be administered in an
individual screening, and the requisite item banks are just being assembled,
with validated algorithms not readily available for clinical applications.
Perhaps the efficiency and automated screening results afforded by technolo-
gical aids can free up clinical time and other resources for assessment and
treatment, and the same technologies used in screening can be efficiently used
in monitoring the progress of individual depressed patients and tailoring
adjustments in their treatment to maximize their improvement, including
prompting clinicians of the need for follow-up. This is an ambitious but
potentially realizable goal with emerging technologies.37
Chapter 9, ‘‘Screening for Depression in Primary Care: Can It Become
More Efficient?’’ by Magruder and Yeager is this volume’s most upbeat
chapter about the prospects of screening for depression, but still qualifies
its optimism with the assumption that screening is implemented in clinical
contexts where resources are available to resolve positive screens and ade-
quate treatment and follow-up are ensured. The chapter articulates general
standards for evaluating whether screening is worthwhile, drawing on the
World Health Organization’s criteria for the implementation of screening.
Magruder and Yeager’s optimism is also predicated on progress in technol-
ogies for screening and monitoring clinical change and automating recontact
of patients for follow-up.
Chapters by Parker and Hyett (Chapter 10) and Babaei and Mitchell
(Chapter 11) take opposing sides in the debate over whether screening for
depression should accommodate physical comorbidity. Parker and Hyett
dispute whether there is any consistency in depression seen in psychiatric
versus general medical contexts, and so screening instruments for general
medical settings need to accommodate overlap between the symptoms of the
physical condition and depression. The authors suggest that screening instru-
ments originally developed in specialty mental health contexts may be
inadequate for general medical settings without substantial modification.
Importantly, Parker and Hyett argue that an excess of false positives will
accumulate in screening if the influence of confounding medical conditions is
not taken into account. In contrast to an ‘‘inclusive’’ approach to diagnosis
and assessment, in which symptoms that might be attributable to physical
health conditions are nonetheless counted, two alternatives exist. The first, exclusionist approach is to
eliminate consideration of possibly confounded symptoms, and the second,
substitutive approach involves substituting other symptoms for those that are
suspected to be confounded. Parker and Hyett identify three screening instruments
that adopt an exclusionist or substitutive approach to test construction: the seven-item
Beck Depression Inventory for Primary Care (BDI-PC), the Hospital
Anxiety and Depression Scale (HADS), and a new scale developed by Parker
and colleagues, the DMI. The BDI-PC was constructed by excluding more
somatically oriented items from the existing scale, which already had a
heavily cognitive emphasis. The HADS, like the Edinburgh Postnatal
Depression Scale (EPDS), was constructed with an effort to avoid not only
somatic items but also formal psychiatric symptoms, with item construction
using deliberately colloquial language, with the intention of destigmatizing
and therefore being more acceptable to nonpsychiatric patients. The logic is
appealing, and as a result, both the HADS and EPDS have been widely
recommended and implemented. Yet, both are as highly correlated with
conventional scales as their respective reliabilities allow, and in head-to-
head comparisons, neither has been shown to be consistently superior to
conventional scales. The use of colloquial language may be a source of
problems in that such items lack specificity and may resist precise transla-
tion, as in the case of the ‘‘butterflies in the stomach’’ item of the HADS or
the ‘‘Things have been getting on top of me’’ item of the EPDS. Different cut
points that are obtained in apparently similar populations have been a source
of complaints about both scales,38 and it is not clear whether this is entirely
due to ‘‘home-court advantage.’’ Parker and Hyett’s well-reasoned argu-
ments for construction of scales with an exclusionist strategy have not
translated into better performance. Parker’s own DMI adopted a ‘‘bottoms
up’’ approach to scale construction using a medically ill population and
emphasizing cognitive symptoms. It remains to be seen whether apparent
advantages persist when cutoffs are fixed and tested in head-to-head compar-
isons in new populations.
Babaei and Mitchell acknowledge that substantial physical comorbidity
occurs in general medical care, and, further, that the association between
physical health problems and depression is likely to be bidirectional.
Physical conditions may be associated with a prolonging of depressive episodes,
and untreated depression may slow functional recovery from the physical
conditions. They articulate three strategies for investigating the role of
somatic symptoms not only in making a diagnosis of depression but also in the selection
of item content for screening instruments: comparing somatic items’ discrimi-
nation between healthy controls and those with major depression; comparing
the somatic items’ ability to distinguish between patients with uncomplicated
major depression and those with comorbid major depression and physical
health problems; and comparing patients with comorbid depression and those
with physical illness alone. With regard to the question of somatic symptoms in
making a diagnosis of depression among populations that were not physically
ill, Babaei and Mitchell found that reports of single somatic symptoms are
common among nondepressed persons, but that these symptoms nonetheless
have a contribution to make to both ruling in and ruling out depression.
Concerning whether somatic items perform differently in physically ill popu-
lations, they cite evidence that somatic items function similarly in both popu-
lations (see also Thombs and colleagues20). Finally, they report that while
individual somatic symptoms may be common among depressed and nonde-
pressed patients with physical illness, taken together, such symptoms still are
more elevated among physically ill patients with comorbid depression.
Overall, Babaei and Mitchell conclude that somatic symptoms have a role in
detecting and diagnosing depression among both otherwise healthy and phy-
sically ill patients. Scales constructed to exclude such symptoms may have no
advantage and may actually be at a disadvantage.
The remaining five chapters (Chapters 12 through 16) discuss screening for
depression in specialty medical settings or among patients with specific phy-
sical illnesses. Calls for introducing routine screening in these contexts are
most often based on generalizations from the evidence from primary care
populations. Additionally, calls for screening are bolstered by the belief that
depression and other mental disorders have a heightened prevalence in parti-
cular medical settings or populations. Many of these claims have been deflated
by methodologically superior studies with representative samples and diag-
noses based on semi-structured diagnostic interviews. Thus, claims by a former
president of the American Psychiatric Association39 that half of all cancer
patients have a psychiatric disorder stretch the definition of psychiatric
disorder and are seemingly contradicted by findings that cancer patients are
not more likely than other medically ill populations to be depressed.40,41
Similarly, major depression during pregnancy and postpartum has important
implications for mothers, their infants, and the family, and so should be of
particular concern, but it appears that major depression is no more common
among pregnant and postpartum women than among age-matched control
women.42 The inevitable deflation of such claims can lead to a backlash and
a withdrawal of necessary resources for dealing with the depression that is
found in these settings.
Calls for introducing screening for depression into specialty settings are often based on
associations between depression and adverse health outcomes, such as reinfarc-
tions and mortality among postmyocardial infarction patients,43 and the pre-
sumption that improving depression outcomes will yield other benefits in terms
of improvements in these physical health outcomes. Yet, however well intended,
such claims have typically yet to be supported by treatment of depression
producing demonstrable changes in physical health.20 Perhaps a failure to
obtain expected gains is due to the quality of care for depression not being
adequate to produce sufficient change in depression. Regardless, here too defla-
tion of unrealistic claims may result in a diminished interest in treating depres-
sion and a withdrawal of resources needed for adequate care for depression.
Enhancements in care for depression such as collaborative care that have
proven necessary for screening to have its intended effect in primary care may
face formidable challenges in any effort to import and sustain them in tertiary
specialty medical settings. Importantly, specialty physicians may not be pre-
pared to provide the investment in time and resources and collaboration that is
expected of primary care physicians. They may see diagnosing and treating
depression as beyond their competence or priorities, especially when they are
better trained to deal with life-threatening specialty conditions. They may be
discouraged with their limited success in making referrals for depression care
and the small proportion that actually get completed. Perhaps effective
enhancements of care for depression in many specialty settings will have to
involve not just collaborative care, but also integration of mental health
professionals into the setting who take primary responsibility for diagnosing
and treating depression.
Andres Kanner (Chapter 12) discusses screening in neurologic and rehabi-
litation settings, focusing on four major neurologic disorders: stroke, epilepsy,
Parkinson’s disease, and multiple sclerosis. Some consistent themes emerge.
First, depression is not only common but is likely to have a bidirectional
relationship with these disorders, sometimes being prodromal and emerging prior
to the neurologic disorder becoming evident. Second, with the bulk of depres-
sion remaining undetected in neurology settings, Kanner advocates screening,
but with some important cautions. For each of these disorders, idiosyncratic
depressive symptoms associated with the particular disorder can be used to make
the case for specific depressive symptoms reflecting underlying neurologic
deficits and allowing for the existence of depressive syndromes to varying
degrees not captured in ICD-10 or DSM-IV nosology.
In a recent paper, Kanner44 asks whether neurologists should be trained to
recognize depression, which seems to imply that the answer to this question is
not universally agreed upon among neurologists. He explains the question
needs to be asked because in the past 5 to 10 years, the residency training of
neurologists has excluded any psychiatric training. He also gives research
examples that demonstrate that undetected depression among neurology
patients often persists, and that patient referrals for depression are often not
completed. Furthermore, he cites unpublished survey data indicating that most
neurologists do not screen for depression, but they would if they were con-
fronted with convincing data that treating depression would improve their
patients’ medical adherence and quality of life.
Carlson and colleagues (Chapter 13) identify unique aspects of screening for
depression in the cancer care setting. Screening for depression there is
understood within a distress paradigm, rather than in terms of depression as a
psychiatric disorder. Psychological distress has become the ‘‘sixth vital sign’’45 requiring
routine assessment, preferably at every visit, according to some advocates. The
bulk of the empirical work on screening for distress or depression in cancer
care consists of studies of the calibration or validation of screening instruments
or their acceptability. Carlson and her colleagues in Calgary and Sharpe and his
colleagues in Edinburgh have systematic research programs underway evalu-
ating the implementation of screening, but these programs are the exception in
moving the field forward in the obvious next step of evaluating the benefits of
screening, beyond scale performance and acceptability.
There are signs that the advocacy of routine screening for distress in cancer
care is not being heeded. Jacobsen and Ransom46 found that only 3 of the 15
cancer centers of the National Comprehensive Cancer Network (NCCN), which
has promulgated the recommendations for routine screening, had actually
implemented these procedures. Mitchell and associates47 surveyed workers
in cancer care in the United Kingdom and found that fewer than 10% relied on a
standardized questionnaire; most preferred to rely on clinical skills or recall of
the two questions of the PHQ-2. Yet, Mitchell’s48 Bayesian analyses found
that while two questions were adequate in ruling out depression, more extensive
interviewing was necessary to confirm a diagnosis.
Implementation of a screening program in one major cancer center resulted
in a shift in the population served, from one largely consisting of breast cancer
patients seeking psychosocial services on their own or being referred by cancer
care professionals to one with an increasing proportion of head and neck cancer
patients,49 before the program was discontinued. A positive screen warrants a discussion
of a patient’s sources of distress, but many of these sources are not cancer-
related and not best addressed by cancer care professionals. Thus, a positive
screen on a measure of distress that does not represent a psychiatric disorder for
which there are empirically based guidelines will often nonetheless require a
time-consuming discussion that does not lead to treatment. A false-positive
screen often cannot be dispensed with by a brief reassurance, but may require an
extended discussion of nonmedical issues. Mitchell48 found that in contrast to
most nurses, most oncologists were not prepared to give patients sufficient
time to discuss their distress. While such issues do not rule out screening
programs, they do suggest the need for considerable planning, communication
with affected staff, and piloting before implementation. Garssen and de Kok50
propose that simply asking cancer patients what services they want is prefer-
able to formal screening.
Despite these challenges, there have been two promising projects
demonstrating that effective care for depression can be provided to
cancer patients. Dwight-Johnson and associates51 showed that a collabora-
tive care model could be used to improve the treatment and outcome of
depression among low-income Latina breast and ovarian cancer patients.
The collaborative care team involved psychiatrists and bilingual master’s-
level social workers who had to address numerous barriers to the women
becoming engaged in effective care. Strong and colleagues52 showed that
trained cancer care nurses could have similar effects in a different
healthcare environment in Scotland. The nurses provided psychoeducation,
problem-solving therapy, and consultation with patients’ oncologists and
primary care physicians.
Screening for depression may be most enthusiastically promoted in onco-
logic settings, but it is in perinatal settings where the most ambitious efforts
have been made to implement it (Boyce and Barton, Chapter 14). Still, there is
a lack of systematic data from controlled trials that screening improves depres-
sion outcomes for pregnant and postpartum women. Accumulating data sug-
gest that implementation of screening programs meets with resistance from
clinicians and women alike and does not yield a substantial increase in the
uptake of treatment, much less improvements in outcomes.53,54 Many maternal
care providers are uncomfortable treating pregnant women for depression
without consultation,55 preferring to refer them to mental health professionals
rather than overseeing treatment themselves.56
The literature advocating screening of women during pregnancy and the
postpartum seems to downplay the immense barriers to women obtaining
uninterrupted quality care for depression during these periods. Many women
abruptly terminate existing antidepressant treatment when planning to get
pregnant or learning that they are pregnant,55 often without medical consulta-
tion, leaving them at a risk for relapse estimated to be 43%.57 Pregnant women
may be reluctant to take medication of any kind but are particularly apprehen-
sive about antidepressants.58 There is evidence for long-term effects of in utero
exposure to antidepressant medications, but the absolute risk is considered
quite low.59
Boyce and Barton review the literature, ranging from available instruments
and screening practices to the paucity of evidence that screening makes any
difference in clinical outcomes. Most instruments lack adequate validation in
perinatal settings. Some instruments, such as the EPDS and the Postnatal
Depression Screening Scale (PDSS), have appealing names suggesting parti-
cular appropriateness for perinatal settings, but that is not substantiated in head-
to-head comparisons with more generic screening instruments. Boyce notes that
some initially appealing notions have arisen in the perinatal screening for
depression literature that warrant critical scrutiny. The first, endorsed by the
U.K. National Institute for Clinical Excellence (NICE),60 is that screening can be
efficiently accomplished with three items, two concerning the core symptoms of
depression and the third inquiring whether help is sought. The rationale is that
the third question addresses the problem that many pregnant and postpartum
women do not want help, so that a negative response to this question allows
attention to be directed away from them. As noted above, a systematic review17
recommends against such ultra-short screening instruments except as a rule-
out of further examination. The performance of the ‘‘help question’’ has not been
formally evaluated. Although it might be attractive in other clinical contexts, it
poses an obvious problem with pregnant or nursing women. Such women are
often averse to taking antidepressants. However, a personalized risk assessment with
their maternal care provider might result in women deciding to initiate or resume
treatment, particularly after delivery. A negative response to the ‘‘help question’’
effectively rules out such discussions.
The second novel idea considered by Boyce and Barton is that with prevention of
depression as a goal, the focus of screening should not be just for current
depression, but for risk factors such as low social support. The authors dismiss
this because ‘‘most risk factors have poor discriminatory power, or poor
positive predictive value.’’ To these objections could be added that preventive
interventions require treating many ‘‘at-risk’’ persons who will not develop the
disorder anyway.
Hermanns and Kulzer’s coverage of the detection and management of
depression in diabetes care (Chapter 16) reiterates some points raised in the
chapters on neurologic, rehabilitation, and cardiac settings but also introduces
some new considerations. While good arguments can be made that the pre-
valence of major depression is likely high among persons with diabetes,
prevalence estimates obtained with research diagnostic interviews in represen-
tative populations suggest that the comorbidity of diabetes and major depres-
sion is well within the range of other chronic medical conditions.41 There may
nonetheless be a bidirectional association between major depression and dia-
betes,61 and particularly between major depression and diabetes control and
related complications. Hermanns and Kulzer suggest that the possibility that
major depression may be more common in persons with diabetes with poor
glycemic control or serious diabetic complications poses a rationale for
depression screening that specifically targets these higher-risk patient populations.
In the Pathways study, Katon and colleagues62 demonstrated the effectiveness
of collaborative care for improving the outcome of major depression among
patients with diabetes. However, some caution should be exercised in generalizing
from this study to care for diabetes in specialty settings. The study was conducted
in primary care, where patients are more likely to be older and not insulin
dependent. More importantly, primary care physicians are more likely than diabe-
tologists to assume responsibility for care for depression and actively collaborate
with depression care managers in the manner required by such interventions.
Nonetheless, results of the Pathways study point to the possibility of ‘‘bundling’’
improvements in the care for both diabetes and major depression, with nurse
specialists providing monitoring and follow-up care for both conditions.
Thombs and Ziegelstein (Chapter 15) provide an overview of screening for
depression in cardiovascular care. Provocative findings that depression fol-
lowing a myocardial infarction is an independent risk factor for reinfarction
and death63 have captured the attention of behavioral medicine and mental health
professionals. The authors note that a number of organizations have endorsed
screening for depression, and most recently the American Heart Association
(AHA) has updated recommendations and provided specific instructions on how
screening should be conducted.64 They also review evidence concerning the
prevalence of major depression in cardiovascular disease, the performance of
screening instruments in cardiovascular care, and recommendations for evalua-
tion and treatment of depressed patients in cardiovascular care. They note that in
addition to studies of performance in primary care, a variety of instruments have
been tested specifically with cardiac patients. Few instruments have been vali-
dated in more than one sample, and there is no convincing evidence of the
superiority of any instrument. The authors provide strong documentation of the
home-court advantage effect hinted at in studies with other populations. They
ultimately fall back on the criteria of ease of administration and scoring and,
consistent with a National Heart, Lung, and Blood Institute Working Group
recommendation, endorse the PHQ-9 until contradicted by new data. They
similarly endorse USPSTF guidelines for screening, but reiterate the necessity
of resources being available for treatment and follow-up.
A systematic review of screening for depression in cardiovascular settings65
coincidentally appeared at the same time as the statement by the AHA64 was
released. The contrast between the systematic review and the AHA recom-
mendations was striking. The authors of the systematic review were unable to
find sufficient evidence for or against routine screening for depression to make
a recommendation. Yet, they expressed concern that ‘‘the adoption of depression
screening in cardiovascular care settings would likely be unduly resource-inten-
sive and would not be likely to benefit patients in the absence of significant
changes in current models of care.’’ In contrast, the AHA statement declared that the
opportunity ‘‘should not be missed’’ to screen for depression in patients with
cardiovascular disease across the settings in which they are treated. Furthermore,
qualified professionals should follow up with patients who screen positive and
monitor their treatment. The statement concludes by noting that ‘‘Coordination
of care between healthcare providers is essential in patients with combined
medical and mental health diagnoses.’’ Yet, what evidence is there that intro-
duction of screening will improve cardiac outcomes, or for that matter, even
depression outcomes as hoped? What evidence is there that cardiologists are
willing or able to diagnose or initiate treatment of depression, and if they are not
up to these tasks, where are the mental health professionals in cardiac care to fill
in? Taken seriously, the AHA statement seems to call implicitly for an integrated
system of care for depression that shows no sign of being developed within
cardiac care settings, yet there is no recognition expressed in the statement that
such a fundamental reorganization of care is needed or possible.

Integration: Deflating the Puffer Phenomenon and Making the Case Against Screening
Goldner66 has described the puffer phenomenon in psychiatric epidemiology, a
spurious initial inflation of prevalence or incidence rates that is corrected in
later studies that are more methodologically sophisticated and draw on more
representative samples. Puffer phenomena in screening studies are not limited
to exaggerated estimates of prevalence, however, but extend to estimates of
unmet clinical need, the accuracy and efficiency of screening instruments, the
acceptability of screening to clinicians and patients, the resulting uptake of
clinical services, the effectiveness with which those services will be delivered,
and, ultimately, the yield in terms of improved patient outcomes. The various
chapters in this book and the material introduced in the present chapter provide
ample demonstration of the pervasiveness of puffer phenomena in making the
case for routine screening of depression.
Exaggerated estimates of the potential benefits of screening begin with
estimates of rates of clinically significant depression produced by lay clinical
interviews conducted in the community and clinical settings. That much of the
depression identified in this way is also found to be undetected and untreated at
least in part reflects on the validity of these estimates, not just well-documented
inadequacies of routine care for depression.
Advocates of screening further inflate estimates of the presumed prevalence
of depression by expanding the range of depressive phenomena to include
elevated scores on depression scales, various ill-defined subclinical states, and
minor depression. The question needs to be refocused not on whether such
conditions are associated with impairment, but on whether they are effectively
and best addressed in general medical settings, and with the most likely
intervention offered there, prescription of an antidepressant.
Repeatedly seen across chapters were claims of the superiority of particular
screening instruments, usually based on findings in a single sample that are not
replicated in subsequent samples. In contrast, the message of this volume might
be that cut points that are not cross-validated should be disallowed. Overall,
across general medical populations there is little evidence of the superiority of
any particular instrument and little support for the intuitively appealing notion
that an instrument with somatically oriented items removed will perform better
than a conventional instrument that includes them.
Taking an overview of the large literature on performance of screening instru-
ments, one gets a sense of the difficulty of inferring from estimates of sensitivity
and specificity how instruments will perform with the prevalence of depression
found in particular populations. Assuming the prevalence of clinically significant
depression is 20% or more rather than the more realistic 9% to 12% can yield
markedly distorted estimates of false positives and false negatives.
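The dependence on prevalence described above can be made concrete with a short calculation. The following is a minimal sketch, not an analysis from any of the studies cited: the sensitivity and specificity values are hypothetical and chosen only for illustration, and the function name is likewise an assumption. It applies Bayes' theorem to obtain the positive and negative predictive values of an instrument at the prevalence figures mentioned in the text.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening instrument via Bayes' theorem.

    All three arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)  # share of positive screens that are true cases
    npv = true_neg / (true_neg + false_neg)  # share of negative screens that are true non-cases
    return ppv, npv


# Hypothetical instrument characteristics, assumed for illustration only.
SENSITIVITY = 0.85
SPECIFICITY = 0.80

if __name__ == "__main__":
    # Prevalence values taken from the paragraph above: 20% versus 9% to 12%.
    for prevalence in (0.20, 0.12, 0.09):
        ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
        print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

Under these assumed values, about half of positive screens are true cases at a prevalence of 20%, but fewer than a third are at 9%, while the negative predictive value remains high throughout. The same instrument therefore generates proportionally far more false positives in the lower-prevalence populations that are more realistic.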
The bulk of the empirical literature concerning screening does not stop with
evidence-based estimates of the performance of screening instruments or com-
parisons between the instruments and rates of unassisted detection of depression
by primary care physicians, but rather proceeds to project how introduction of
screening will improve detection and promote treatment of depression. Missing
from these inflated estimates, however, is any consideration of whether physi-
cians would actually offer treatment of depression to otherwise undetected cases
of depression and whether these physicians are registering in their ‘‘nondetec-
tion’’ that they do not consider such depression as appropriate for treatment or
that patients would accept treatment. The large gaps that are reported between
rates of detection under naturalistic conditions and actual treatment rates should
give pause to anyone assuming that rates of undetected depression necessarily
represent missed opportunities for effective intervention.
The final evidence missing from most enthusiastic claims for
screening is data suggesting that detected cases of depression do better than
undetected cases, or even that the treatment offered to detected patients will be
adequate and appropriate and lead to improved outcomes. Available assess-
ments of the quality of routine care for depression in general medical settings
are a cause for great pessimism.
It is unfortunate that the strong sentiment in favor of routine screening for
depression in general medical settings is such that any expression of skepticism
is held to a higher burden of proof than unsubstantiated claims for its benefits.67
Skepticism is quickly countered by appeals to practice guidelines and
their advocates and to ‘‘everybody knows’’ clinical wisdom. Nonetheless, the basic
data seem obvious in their implication. Introduction of screening without
substantial resources to enhance routine care does not appreciably improve
outcomes, and analyses of the literature concerning enhancement of care for
depression in primary care do not indicate an independent contribution of any
screening component. Furthermore, introduction of well-resourced collaborative
care interventions, with or without screening, still produces only modest
improvements in the quality of care that often are not sustained beyond the
implementation phase.68 The presumed contribution of screening needs to be
evaluated in the context of rapidly evolving conditions in the larger clinical and
community environment. Most importantly, one must consider the escalating
rates of prescription of antidepressants, often exceeding reasonable estimates of
the prevalence of depression, and with much of this medication being prescribed
to persons who are not depressed. Finally, there is persistent evidence that the
intensity of treatment of depression in the community is too low to yield
clinically significant improvements in depression outcomes.
Are there risks to implementing routine screening for depression? First, there
is the risk that screening will consume scarce resources and aggravate existing
problems. More patients will inappropriately be prescribed antidepressants, as
the already nonspecific diagnostic and prescribing practices of physicians are
extended to patients whom screening instruments identify as depressed but whom
neither physicians nor the patients themselves would so identify. The resources
consumed by these ineffective efforts might be at the expense of already poor
monitoring and follow-up of patients who are already known to be depressed.
All of these problems would be compounded in specialty settings where physi-
cians are less inclined and less prepared to diagnose and treat depression and
where pressing, well-defined medical issues compete with depression care.
Should we do nothing if we do not introduce routine screening for depression?
The most urgent task is to correct an existing problem, namely that depression is
so inadequately treated in the primary care setting. Patten69 has produced
mathematical models that strongly indicate that improving the outcome
of depression for patients who are already being treated is more cost-effective
than introducing more patients into this inadequate treatment. Misplaced con-
fidence in the power of screening alone to affect depression outcomes will delay
recognition of this problem and divert existing resources from its solution.

References
1. Pignone MP, Gaynes BN, Rushton JL, et al. Screening for depression in adults:
a summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern
Med. 2002;136:765–776.
2. Mojtabai R, Olfson M. National patterns in antidepressant treatment by psychiatrists and
general medical providers: Results from the National Comorbidity Survey Replication.
J Clin Psychiatry. 2008;69:1064–1074.
3. Fernandez A, Haro JM, Martinez-Alonso M, et al. Treatment adequacy for anxiety and
depressive disorders in six European countries. Br J Psychiatry. 2007;190:172–173.
4. Berndt ER, Bir A, Busch SH, et al. The medical treatment of depression, 1991–1996:
productive inefficiency, expected outcome variations, and price indexes. J Health
Economics. 2002;21:373–396.
5. Berardi D, Menchetti M, Cevenini N, et al. Increased recognition of depression in
primary care—Comparison between primary-care physician and ICD-10 diagnosis of
depression. Psychotherapy and Psychosomatics. 2005;74:225–230.
6. Esposito E, Wang JL, Adair CE, et al. Frequency and adequacy of depression treatment
in a Canadian population sample. Can J Psychiatry. 2007;52:780–789.
7. Mojtabai R. Increase in antidepressant medication in the US adult population between
1990 and 2003. Psychotherapy and Psychosomatics. 2008;77:83–92.
8. Beck CA, Patten SB, Williams JVA, et al. Antidepressant utilization in Canada. Social
Psychiatry Psychiatr Epidemiol. 2005;40:799–807.
9. Kessler RC, Merikangas KR, Wang PS. Prevalence, comorbidity, and service
utilization for mood disorders in the United States at the beginning of the twenty-first
century. Ann Rev Clin Psychol. 2007;3:137–158.
10. Depression Guideline Panel (1993). Depression in primary care: Vol. 2. Treatment of
major depression (Clinical Practice Guideline No. 5, AHCPR Publication No. 93–0551).
Rockville, MD: Department of Health and Human Services, Public Health Service,
Agency for Health Care Policy and Research.
11. Patten SB. Major depression prevalence is very high, but the syndrome is a poor proxy for
community populations’ clinical treatment needs. Can J Psychiatry. 2008;53:411–418.
12. Brugha TS, Jenkins R, Taub N, et al. A general population comparison of the Composite
International Diagnostic Interview (CIDI) and the Schedules for Clinical Assessment in
Neuropsychiatry (SCAN). Psychol Med. 2001;31:1001–1013.
13. Spitzer RL, Kroenke K, Williams JBW. Validation and utility of a self-report version of
PRIME-MD—The PHQ primary care study. JAMA. 1999;282:1737–1744.
14. Norton J, De Roquefeuil G, Boulenger JP, et al. Use of the PRIME-MD Patient Health
Questionnaire for estimating the prevalence of psychiatric disorders in French primary
care: comparison with family practitioner estimates and relationship to psychotropic
medication use. Gen Hosp Psychiatry. 2007;29:285–293.
15. Bermejo I, Frey C, Kriston L, et al. Stability of the effects of guideline training in
primary care on the identification of depressive disorders. Primary Care & Community
Psychiatry. 2007;12:99–107.
16. Kessler RC, Wang PS. The descriptive epidemiology of commonly occurring mental
disorders in the United States. Ann Rev Public Health. 2008;29:115–129.
17. Mitchell AJ, Coyne JC. Do ultra-short screening instruments accurately detect depression
in primary care? A pooled analysis and meta-analysis of 22 studies. Br J Gen Practice.
2007;57:144–151.
18. Bennett IM, Coco A, Coyne JC, et al. Efficiency of a two-item pre-screen to reduce the
burden of depression screening in pregnancy and postpartum: An IMPLICIT network
study. J Am Board Family Med. 2008;21:317–325.
19. Williams JW, Pignone M, Ramirez G, et al. Identifying depression in primary
care: a literature synthesis of case-finding instruments. Gen Hosp Psychiatry.
2002;24:225–237.
20. Thombs BD, Fuss S, Hudson M, et al. High rates of depressive symptoms among
patients with systemic sclerosis are not explained by differential reporting of somatic
symptoms. Arthritis Rheumatism. 2008;59:431–437.
21. Gilbody S, Whitty P, Grimshaw J, et al. Educational and organizational interventions to
improve the management of depression in primary care—A systematic review. JAMA.
2003;289:3145–3151.
22. Hodges B, Inch C, Silver I. Improving the psychiatric knowledge, skills, and attitudes of
primary care physicians, 1950–2000: A review. Am J Psychiatry. 2001;158:1579–1586.
23. Valenstein M, Dalack G, Blow F, et al. Screening for psychiatric illness with a
combined screening and diagnostic instrument. J Gen Intern Med. 1997;12:679–685.
24. Spitzer RL, Williams JBW, Kroenke K, et al. Utility of new procedure for
diagnosing mental disorders in primary care—the PRIME-MD-1000 study. JAMA.
1994;272:1749–1756.
25. Lewis G, Sharp D, Bartholomew J, et al. Computerized assessment of common mental
disorders in primary care: Effect on clinical outcome. Family Practice. 1996;13:120–126.
26. Magruder-Habib K, Zung WWK. Improving physicians’ recognition and treatment of
depression in general medical care—Results from a randomized clinical trial. Medical
Care. 1990;28:239–250.
27. Swindle RW, Rao JK, Helmy A, et al. Integrating clinical nurse specialists into the
treatment of primary care patients with depression. Int J Psychiatry Med. 2003;33:17–37.
28. Borowsky SJ, Rubenstein LV, Meredith LS, et al. Who is at risk of nondetection of
mental health problems in primary care? J Gen Intern Med. 2000;15:381–388.
29. Demyttenaere K, Bruffaerts R, Posada-Villa J, et al. Prevalence, severity, and unmet
need for treatment of mental disorders in the World Health Organization World Mental
Health Surveys. JAMA. 2004;291:2581–2590.
30. Schwenk TL, Coyne JC, Fechner-Bates S. Differences between detected and
undetected patients in primary care and depressed psychiatric patients. Gen Hosp
Psychiatry. 1996;18:407–415.
31. Coyne JC, Schwenk TL, Fechner-Bates S. Nondetection of depression by primary care
physicians reconsidered. Gen Hosp Psychiatry. 1995;17:3–12.
32. Von Korff M. Case definitions in primary care—the need for clinical epidemiology.
Gen Hosp Psychiatry. 1992;14:293–295.
33. Gibbons RD, Weiss DJ, Kupfer DJ, et al. Using computerized adaptive testing to reduce
the burden of mental health assessment. Psychiatr Serv. 2008;59:361–368.
34. Santor DA, Coyne JC. Shortening the CES-D to improve its ability to detect cases of
depression. Psychol Assess. 1997;9:233–243.
35. Wells KB, Sherbourne C, Schoenbaum M, et al. Impact of disseminating quality
improvement programs for depression in managed primary care—A randomized
controlled trial. JAMA. 2000;283:212–220.
36. Gilbody S, Sheldon T, House A. Screening and case-finding instruments for depression:
a meta-analysis. Can Med Assoc J. 2008;178:997–1003.
37. Unick GJ, Shumway M, Hargreaves W. Are we ready for computerized adaptive
testing? Psychiatr Serv. 2008;59:369–369.
38. Matthey S, Henshaw C, Elliott S, et al. Variability in use of cut-off scores and formats
on the Edinburgh Postnatal Depression Scale—implications for clinical and research
practice. Arch Womens Mental Health. 2006;9:309–315.
39. Riba M. Identifying depression, distress, & anxiety in cancer patients [Online]. Physicians
Weekly [serial online] 1997 [cited 2005 Jul 05]; 22(28) [4 screens] Available from: http://
www.physweekly.com/article.asp?IssueID=260&ArticleID=2426.
40. Coyne JC, Palmer SC, Shapiro PJ, et al. Distress, psychiatric morbidity, and
prescriptions for psychotropic medication in a breast cancer waiting room sample.
Gen Hosp Psychiatry. 2004;26:121–128.
41. Evans DL, Charney DS, Lewis L, et al. Mood disorders in the medically ill: Scientific
review and recommendations. Biol Psychiatry. 2005;58:175–189.
42. Vesga-Lopez O, Blanco C, Keyes K, et al. Psychiatric disorders in pregnant and
postpartum women in the United States. Arch Gen Psychiatry. 2008;65:805–815.
43. Frasure-Smith N, Lesperance F. Reflections on depression as a cardiac risk factor.
Psychosom Med. 2005;67:S19–S25.
44. Kanner AM. Should neurologists be trained to recognize and treat comorbid depression
of neurologic disorders? Yes. Epilepsy & Behavior. 2005;6:303–311.
45. Bultz BD, Carlson LE. Emotional distress: The sixth vital sign in cancer care. J Clin
Oncol. 2005;23:6440–6441.
46. Jacobsen PB, Ransom S. Implementation of NCCN distress management guidelines by
member institutions. Journal of the National Comprehensive Cancer Network.
2007;5:99–103.
47. Mitchell A, Kaar S, Coggan C, et al. Acceptability of common screening methods used
to detect distress and related mood disorders—preferences of cancer specialists and
non-specialists. Psychooncology. 2008;17:226–236.
48. Mitchell AJ. Are one or two simple questions sufficient to detect depression in cancer
and palliative care? A Bayesian meta-analysis. Br J Cancer. 2008;98:1934–1943.
49. Zabora JR, Diaz L, Loscalzo MJ, et al. Psychosocial screening goes mainstream: a
prospective problem-solving system as an essential element of comprehensive cancer
care: background and rationale. Psychooncology. 2003;12(Suppl. 4):S71.
50. Garssen B, de Kok E. How useful is a screening instrument? Psychooncology.
2008;17:726–728.
51. Dwight-Johnson M, Ell K, Lee PJ. Can collaborative care address the needs of low-
income Latinas with comorbid depression and cancer? Results from a randomized pilot
study. Psychosomatics. 2005;46:224–232.
52. Strong V, Waters R, Hibberd C, et al. Management of depression for people with cancer
(SMaRT oncology 1): a randomised trial. Lancet. 2008;372:40–48.
53. Mitchell AJ, Coyne JC. Screening for postnatal depression: Barriers to success. Br J
Obstet Gynaecol. 2009;116:11–14.
54. Von Ballestrem CL, Strauss M, Kachele H. Contribution to the epidemiology of
postnatal depression in Germany—implications for the utilization of treatment. Arch
Womens Mental Health. 2005;8:29–35.
55. Wisner KL, Zarin DA, Holmboe ES, et al. Risk-benefit decision making for treatment
of depression during pregnancy. Am J Psychiatry. 2000;157:1933–1940.
56. Dietrich AJ, Williams JW, Ciotti MC, et al. Depression care attitudes and practices of
newer obstetrician-gynecologists: A national survey. Am J Obstet Gynecol.
2003;189:267–273.
57. Cohen LS, Altshuler LL, Harlow BL, et al. Relapse of major depression during
pregnancy in women who maintain or discontinue antidepressant treatment. JAMA.
2006;295:499–507.
58. Sleath B, West S, Tudor G, et al. Ethnicity and depression treatment preferences of
pregnant women. J Psychosom Obstetr Gynecol. 2005;26:135–140.
59. Alwan S, Reefhuis J, Rasmussen SA, et al. Use of selective serotonin-reuptake inhibitors
in pregnancy and the risk of birth defects. N Engl J Med. 2007;356:2684–2692.
60. National Institute for Clinical Excellence. Depression: core interventions in the
management of depression in primary and secondary care. London: HMSO, 2004.
61. Rubin RR, Peyrot M. Was Willis right? Thoughts on the interaction of depression and
diabetes. Diabetes Metab Res Rev. 2002;18:173–175.

62. Katon WJ, Von Korff M, Lin EHB, et al. The Pathways study—A randomized trial of
collaborative care in patients with diabetes and depression. Arch Gen Psychiatry.
2004;61:1042–1049.
63. Frasure-Smith N, Lesperance F, Talajic M. Depression following myocardial
infarction—impact on 6-month survival. JAMA. 1993;270:1819–1825.
64. Lichtman JH, Bigger JT, Blumenthal JA, et al. Depression and coronary heart disease:
Recommendations for screening, referral, and treatment. Circulation. 2008;118:1768–1775.
65. Thombs BD, de Jonge P, Coyne JC, et al. Depression screening and patient outcomes in
cardiovascular care: a systematic review. JAMA. 2008;300:2161–2171.
66. Goldner EM. Is it time to revise our understanding and management of depression? Can
J Psychiatry. 2008;53:409–410.
67. Palmer SC, Coyne JC. Screening for depression in medical care—Pitfalls, alternatives,
and revised priorities. J Psychosom Res. 2003;54:279–287.
68. Gilbody S, Bower P, Fletcher J, et al. Collaborative care for depression—A cumulative meta-
analysis and review of longer-term outcomes. Arch Intern Med. 2006;166:2314–2321.
69. Patten SB. A framework for describing the impact of antidepressant medications on
population health status. Pharmacoepidemiology and Drug Safety. 2002;11:549–559.
Appendix

Table AP.1. Symptoms of Depression from 11 Popular Scales (Ordered by Frequency of Symptom)

Column groups: Reference, Classic Scales, New Scales.

| Symptom (problem with . . .) | ICD-10 (MDI) | HAM-D-21 | BDI-II | Zung | CES-D | MADRS | GDS-15 | HADS | EPDS | MOS-8 | DSM-IV (PHQ9) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Low mood (sadness, blue) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes |
| Sleep disturbance | Yes | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes |
| Interest/pleasure | Yes | No | Yes | Yes | No | Yes | No | Yes | Yes | Yes | Yes |
| Energy | Yes | Yes | Yes | Yes | Yes | No | Yes | No | No | No | Yes |
| Thoughts of death or self-harm | Yes | Yes | Yes | Yes | No | Yes | No | No | Yes | No | Yes |
| Agitation (tension) | Yes | Yes | Yes | Yes | No | Yes | No | Yes | No | No | Yes |
| Confidence/self-esteem (worthless) | Yes | No | Yes | Yes | Yes | No | Yes | No | No | No | Yes |
| Guilt (blame) | Yes | Yes | Yes | No | No | Yes | No | No | Yes | No | Yes |
| Concentration/indecisiveness | Yes | No | Yes | Yes | Yes | Yes | No | No | No | No | Yes |
| Retardation | Yes | Yes | Yes | No | No | No | No | Yes | No | No | Yes |
| Crying | No | No | Yes | Yes | Yes | No | No | No | Yes | Yes | No |
| Anxiety/fearful | No | Yes | No | No | Yes | No | Yes | Yes | Yes | No | No |
| Appetite | Yes | No | Yes | No | Yes | No | No | No | No | No | Yes |
| Hopelessness | No | No | Yes | Yes* | Yes* | No | No | Yes* | No | No | No |
| Irritability | No | No | Yes | Yes | Yes (bothered) | No | No | No | No | No | No |
| Loss libido | No | Yes | Yes | Yes | No | No | No | No | No | No | No |
| Lassitude | No | No | No | Yes | Yes | Yes | No | No | No | No | No |
| Weight change | No | No | No | Yes | No | No | No | No | No | No | Yes |
| Sense humor/laughter | No | No | No | No | No | No | No | Yes | Yes | No | No |
| Activities, work | No | Yes | No | No | No | No | Yes | No | No | No | No |
| Diurnal mood variation | No | Yes | No | Yes | No | No | No | No | No | No | No |
| Satisfaction/quality of life | No | No | No | Yes | No | No | Yes* | No | No | No | No |
| Helplessness | No | No | No | No | No | No | Yes | No | No | No | No |
| Punishment feelings | No | No | Yes | No | No | No | No | No | No | No | No |
| Loneliness | No | No | No | No | Yes | No | No | No | No | No | No |
| Difficulty coping | No | No | No | No | No | No | No | No | Yes | No | No |
| Constipation | No | No | No | Yes | No | No | No | No | No | No | No |

*Reverse keyed or alternate wording.


Table AP.2. Somatic and Nonsomatic Symptoms of Depression from 11 Popular Mood Scales

Column groups: Reference, Classic Scales, New Scales.

| Symptom (problem with . . .) | ICD-10 (MDI) | HAM-D-21 | BDI-II | Zung | CES-D | MADRS | GDS-15 | HADS | EPDS | MOS-8 | DSM-IV (PHQ9) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sleep disturbance | Yes | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes |
| Energy | Yes | Yes | Yes | Yes | Yes | No | Yes | No | No | No | Yes |
| Agitation (tension) | Yes | Yes | Yes | Yes | No | Yes | No | Yes | No | No | Yes |
| Concentration/indecisiveness | Yes | No | Yes | Yes | Yes | Yes | No | No | No | No | Yes |
| Retardation | Yes | Yes | Yes | No | No | No | No | Yes | No | No | Yes |
| Appetite | Yes | No | Yes | No | Yes | No | No | No | No | No | Yes |
| Loss libido | No | Yes | Yes | Yes | No | No | No | No | No | No | No |
| Lassitude | No | No | No | Yes | Yes | Yes | No | No | No | No | No |
| Weight change | No | No | No | Yes | No | No | No | No | No | No | Yes |
| Activities, work | No | Yes | No | No | No | No | Yes | No | No | No | No |
| Diurnal mood variation | No | Yes | No | Yes | No | No | No | No | No | No | No |
| Constipation | No | No | No | Yes | No | No | No | No | No | No | No |
| Low mood (sadness, blue) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes |
| Interest/pleasure | Yes | No | Yes | Yes | No | Yes | No | Yes | Yes | Yes | Yes |
| Thoughts of death or self-harm | Yes | Yes | Yes | Yes | No | Yes | No | No | Yes | No | Yes |
| Confidence/self-esteem (worthless) | Yes | No | Yes | Yes | Yes | No | Yes | No | No | No | Yes |
| Guilt (blame) | Yes | Yes | Yes | No | No | Yes | No | No | Yes | No | Yes |
| Crying | No | No | Yes | Yes | Yes | No | No | No | Yes | Yes | No |
| Anxiety/fearful | No | Yes | No | No | Yes | No | Yes | Yes | Yes | No | No |
| Hopelessness | No | No | Yes | Yes* | Yes* | No | No | Yes* | No | No | No |
| Irritability | No | No | Yes | Yes | Yes (bothered) | No | No | No | No | No | No |
| Sense humor/laughter | No | No | No | No | No | No | No | Yes | Yes | No | No |
| Satisfaction/quality of life | No | No | No | Yes | No | No | Yes* | No | No | No | No |
| Helplessness | No | No | No | No | No | No | Yes | No | No | No | No |
| Punishment feelings | No | No | Yes | No | No | No | No | No | No | No | No |
| Lonely | No | No | No | No | Yes | No | No | No | No | No | No |
| Difficulty coping | No | No | No | No | No | No | No | No | Yes | No | No |
| Somatic/Psychological Ratio | 1.2 | 1.75 | 0.77 | 1.125 | 0.71 | 1 | 0.4 | 0.5 | 0.125 | 0.33 | |
| Most Somatic Scale | #3 | #1 | Neutral | #4 | Neutral | Neutral | | | | | |
| Most Psychological Scale | | | Neutral | | Neutral | Neutral | #3 | #4 | #1 | #2 | |
Table AP.3. Statistical Summary of Accuracy from Hypothetical Single-Step Diagnostic Tests (n = 1,000)

Test (Sensitivity:Specificity) Depressed (n) TP FN Nondepressed (n) TN FP PPV NPV PSI Youden UI+ UI− FC

Prevalence 0.10 (Sensitivity: Specificity)


Single Step 90:90 100 90 10 900 810 90 0.50 0.99 0.49 0.80 0.45 0.89 0.90
Single Step 80:80 100 80 20 900 720 180 0.31 0.97 0.28 0.60 0.25 0.78 0.80
Single Step 70:70 100 70 30 900 630 270 0.21 0.95 0.16 0.40 0.14 0.67 0.70
Single Step 60:60 100 60 40 900 540 360 0.14 0.93 0.07 0.20 0.09 0.56 0.60
Single Step 50:50 100 50 50 900 450 450 0.10 0.90 0 0 0.05 0.45 0.50
Single Step 90:80 100 90 10 900 720 180 0.33 0.99 0.32 0.70 0.30 0.79 0.81
Single Step 90:70 100 90 10 900 630 270 0.25 0.98 0.23 0.60 0.23 0.69 0.72
Single Step 90:60 100 90 10 900 540 360 0.20 0.98 0.18 0.50 0.18 0.59 0.63
Single Step 80:90 100 80 20 900 810 90 0.47 0.98 0.45 0.70 0.38 0.88 0.89
Single Step 80:70 100 80 20 900 630 270 0.23 0.97 0.20 0.50 0.18 0.68 0.71
Single Step 80:60 100 80 20 900 540 360 0.18 0.96 0.15 0.40 0.15 0.58 0.62
Single Step 60:80 100 60 40 900 720 180 0.25 0.95 0.20 0.40 0.15 0.76 0.78
Prevalence 0.20 (Sensitivity: Specificity)
Single Step 90:90 200 180 20 800 720 80 0.69 0.97 0.67 0.80 0.62 0.88 0.90
Single Step 80:80 200 160 40 800 640 160 0.50 0.94 0.44 0.60 0.40 0.75 0.80
Single Step 70:70 200 140 60 800 560 240 0.37 0.90 0.27 0.40 0.26 0.63 0.70
Single Step 60:60 200 120 80 800 480 320 0.27 0.86 0.13 0.20 0.16 0.51 0.60
Single Step 50:50 200 100 100 800 400 400 0.20 0.80 0 0 0.10 0.40 0.50
Single Step 90:80 200 180 20 800 640 160 0.53 0.97 0.50 0.70 0.48 0.78 0.82
Single Step 90:70 200 180 20 800 560 240 0.43 0.97 0.39 0.60 0.39 0.68 0.74
Single Step 90:60 200 180 20 800 480 320 0.36 0.96 0.32 0.50 0.32 0.58 0.66
Single Step 80:90 200 160 40 800 720 80 0.67 0.95 0.61 0.70 0.53 0.85 0.88
Single Step 80:70 200 160 40 800 560 240 0.40 0.93 0.33 0.50 0.32 0.65 0.72

Single Step 80:60 200 160 40 800 480 320 0.33 0.92 0.26 0.40 0.27 0.55 0.64
Single Step 60:80 200 120 80 800 640 160 0.43 0.89 0.32 0.40 0.26 0.71 0.76
Prevalence 0.50 (Sensitivity: Specificity)
Single Step 90:90 500 450 50 500 450 50 0.90 0.90 0.80 0.80 0.81 0.81 0.90
Single Step 80:80 500 400 100 500 400 100 0.80 0.80 0.60 0.60 0.64 0.64 0.80
Single Step 70:70 500 350 150 500 350 150 0.70 0.70 0.40 0.40 0.49 0.49 0.70
Single Step 60:60 500 300 200 500 300 200 0.60 0.60 0.20 0.20 0.36 0.36 0.60
Single Step 50:50 500 250 250 500 250 250 0.50 0.50 0 0 0.25 0.25 0.50
Single Step 90:80 500 450 50 500 400 100 0.82 0.89 0.71 0.70 0.74 0.71 0.85
Single Step 90:70 500 450 50 500 350 150 0.75 0.88 0.63 0.60 0.68 0.61 0.80
Single Step 90:60 500 450 50 500 300 200 0.69 0.86 0.55 0.50 0.62 0.51 0.75
Single Step 80:90 500 400 100 500 450 50 0.89 0.82 0.71 0.70 0.71 0.74 0.85
Single Step 80:70 500 400 100 500 350 150 0.73 0.78 0.51 0.50 0.58 0.54 0.75
Single Step 80:60 500 400 100 500 300 200 0.67 0.75 0.42 0.40 0.53 0.45 0.70
Single Step 60:80 500 300 200 500 400 100 0.75 0.67 0.42 0.40 0.45 0.53 0.70

TP, true positive; FN, false negative; TN, true negative; FP, false positive; PPV, positive predictive value; NPV, negative predictive value; PSI, predictive
summary index; UI, utility index; FC, fraction correct.
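
Every column in Table AP.3 can be derived from sensitivity, specificity, and prevalence alone. The following is a minimal sketch, not the authors' own calculation: the function name `single_step_metrics` and its output keys are hypothetical, and it assumes PSI = PPV + NPV − 1, Youden = Se + Sp − 1, UI+ = Se × PPV, and UI− = Sp × NPV, definitions that reproduce the tabulated values.

```python
def single_step_metrics(se, sp, prevalence, n=1000):
    """Derive 2 x 2 counts and summary indices for one screening test.

    se, sp and prevalence are proportions (0-1); n is the sample size.
    The index definitions are assumptions chosen to match Table AP.3.
    """
    depressed = n * prevalence
    nondepressed = n - depressed
    tp = se * depressed          # true positives
    fn = depressed - tp          # false negatives
    tn = sp * nondepressed       # true negatives
    fp = nondepressed - tn       # false positives
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "TP": tp, "FN": fn, "TN": tn, "FP": fp,
        "PPV": ppv, "NPV": npv,
        "PSI": ppv + npv - 1,    # predictive summary index
        "Youden": se + sp - 1,
        "UI+": se * ppv,         # positive utility index (assumed definition)
        "UI-": sp * npv,         # negative utility index (assumed definition)
        "FC": (tp + tn) / n,     # fraction correct
    }


if __name__ == "__main__":
    # First row of the table: Se 90%, Sp 90%, prevalence 0.10
    row = single_step_metrics(se=0.90, sp=0.90, prevalence=0.10)
    print({name: round(value, 2) for name, value in row.items()})
```

Changing only `prevalence` shows why the same test yields a PPV of 0.50 at 10% prevalence but 0.90 at 50% prevalence.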
Table AP.4. Statistical Summary of Accuracy from Hypothetical Two-Step (Algorithm) Diagnostic Tests (n = 1,000)

Test (Step i, Step ii) Depressed (n) TP FN Nondepressed (n) TN FP PPV NPV PSI (NPV+PPV−1) Youden UI+ UI− FC

Prevalence 0.10 – Step i. Sensitivity: Specificity; Step ii. Sensitivity: Specificity
Combined i.90:90 ii.90:90 100 81 19 900 891 9 0.90 0.98 0.88 0.80 0.73 0.97 0.97
Combined i.80:80 ii.90:90 100 72 28 900 882 18 0.80 0.97 0.77 0.70 0.58 0.95 0.95
Combined i.80:80 ii.80:80 100 64 36 900 864 36 0.64 0.96 0.60 0.60 0.41 0.92 0.93
Combined i.70:70 ii.70:70 100 49 51 900 819 81 0.38 0.94 0.32 0.40 0.18 0.86 0.87
Combined i.60:60 ii.60:60 100 36 64 900 756 144 0.20 0.92 0.12 0.20 0.07 0.77 0.79
Combined i.80:90 ii.80:90 100 64 36 900 891 9 0.88 0.96 0.84 0.63 0.56 0.95 0.96
Combined i.80:70 ii.90:60 100 72 28 900 792 108 0.40 0.97 0.37 0.60 0.29 0.85 0.86
Combined i.90:60 ii.80:70 100 72 28 900 792 108 0.40 0.97 0.37 0.60 0.29 0.85 0.86
Combined i.80:70 ii.60:90 100 48 52 900 873 27 0.64 0.94 0.58 0.45 0.31 0.92 0.92
Combined i.60:90 ii.80:70 100 48 52 900 873 27 0.64 0.94 0.58 0.45 0.31 0.92 0.92
Prevalence 0.20 – Step i. Sensitivity: Specificity; Step ii. Sensitivity: Specificity
Combined i.90:90 ii.90:90 200 81 119 800 792 8 0.91 0.87 0.78 0.40 0.37 0.86 0.87
Combined i.80:80 ii.90:90 200 72 128 800 784 16 0.82 0.86 0.68 0.34 0.29 0.84 0.86
Combined i.80:80 ii.80:80 200 64 136 800 768 32 0.67 0.85 0.52 0.28 0.21 0.82 0.83
Combined i.70:70 ii.70:70 200 49 151 800 728 72 0.40 0.83 0.23 0.16 0.10 0.75 0.78
Combined i.60:60 ii.60:60 200 36 164 800 672 128 0.22 0.80 0.02 0.02 0.04 0.68 0.71
Combined i.80:90 ii.80:90 200 64 136 800 792 8 0.89 0.85 0.74 0.31 0.28 0.84 0.86
Combined i.80:70 ii.90:60 200 72 128 800 704 96 0.43 0.85 0.27 0.24 0.15 0.74 0.78
Combined i.90:60 ii.80:70 200 72 128 800 704 96 0.43 0.85 0.27 0.24 0.15 0.74 0.78
Combined i.80:70 ii.60:90 200 48 152 800 776 24 0.67 0.84 0.50 0.21 0.16 0.81 0.82
Combined i.60:90 ii.80:70 200 48 152 800 776 24 0.67 0.84 0.50 0.21 0.16 0.81 0.82

Prevalence 0.50 – Step i. Sensitivity: Specificity; Step ii. Sensitivity: Specificity
Combined i.90:90 ii.90:90 500 405 95 500 495 5 0.99 0.84 0.83 0.80 0.80 0.83 0.90
Combined i.80:80 ii.90:90 500 360 140 500 490 10 0.97 0.78 0.75 0.70 0.70 0.76 0.85
Combined i.80:80 ii.80:80 500 320 180 500 480 20 0.94 0.73 0.67 0.60 0.60 0.70 0.80
Combined i.70:70 ii.70:70 500 245 255 500 455 45 0.84 0.64 0.49 0.40 0.41 0.58 0.70
Combined i.60:60 ii.60:60 500 180 320 500 492 8 0.96 0.61 0.56 0.34 0.34 0.60 0.67
Combined i.80:90 ii.80:90 500 320 180 500 495 5 0.98 0.73 0.72 0.63 0.63 0.73 0.82
Combined i.80:70 ii.90:60 500 360 140 500 440 60 0.86 0.76 0.62 0.60 0.62 0.67 0.80
Combined i.90:60 ii.80:70 500 360 140 500 440 60 0.86 0.76 0.62 0.60 0.62 0.67 0.80
Combined i.80:70 ii.60:90 500 240 260 500 485 15 0.94 0.65 0.59 0.45 0.45 0.63 0.73
Combined i.60:90 ii.80:70 500 240 260 500 485 15 0.94 0.65 0.59 0.45 0.45 0.63 0.73
TP, true positive; FN, false negative; TN, true negative; FP, false positive; PPV, positive predictive value; NPV, negative predictive value; PSI, predictive summary index;
UI, utility index; FC, fraction correct.
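
The two-step rows can be sketched in the same way, assuming a serial algorithm in which only those who screen positive at step i go on to step ii, so that a case counts as detected only if both steps are positive. The function name `two_step_metrics` is hypothetical, the index definitions are the same assumptions as for the single-step case, and the sketch has been checked only against the prevalence 0.10 rows.

```python
def two_step_metrics(se1, sp1, se2, sp2, prevalence, n=1000):
    """Serial two-step screening: step ii is applied to step i positives only."""
    depressed = n * prevalence
    nondepressed = n - depressed
    # A depressed person is detected only if both steps are positive.
    tp = depressed * se1 * se2
    # A nondepressed person is a false positive only if both steps misfire.
    fp = nondepressed * (1 - sp1) * (1 - sp2)
    fn = depressed - tp
    tn = nondepressed - fp
    se = tp / depressed          # combined sensitivity
    sp = tn / nondepressed       # combined specificity
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "TP": tp, "FN": fn, "TN": tn, "FP": fp,
        "PPV": ppv, "NPV": npv,
        "PSI": ppv + npv - 1,
        "Youden": se + sp - 1,
        "UI+": se * ppv,         # assumed definition
        "UI-": sp * npv,         # assumed definition
        "FC": (tp + tn) / n,
    }


if __name__ == "__main__":
    # Step i 90:90 followed by step ii 90:90 at prevalence 0.10
    row = two_step_metrics(0.90, 0.90, 0.90, 0.90, prevalence=0.10)
    print({name: round(value, 2) for name, value in row.items()})
```

The trade-off is visible in the output: combined specificity and PPV rise, while combined sensitivity falls to the product of the two sensitivities.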
[Flowchart: unselected population, N = 1,000 (n = 100 with depression, n = 900 without). Screening method #1 (Se 90%, Sp 90%): screen-positive, possible cases, TP = 90, FP = 90 (PPV 50%); screen-negative, possible non-cases, TN = 810, FN = 10 (NPV 98.8%). Treatment offered and accepted by 50%: adequate treatment, TP = 45; inappropriate treatment, FP = 45; unmet needs, FN = 10 and TP = 45; no unmet needs, TN = 810 and FP = 45. Cumulative yield (recognition): Se 90%, Sp 90%, PPV 50%, NPV 99%. Cumulative yield (treatment): Se 45%, PPV 50%, NPV 94%.]

Figure AP.1. Low-risk single screen yield. In this scenario 50% of those who screen
positive are actually depressed, although 99% of those who screen negative are not
depressed. 9% of the sample are possible false alarms or an “excess diagnostic burden.”
Assuming 50% of those identified accept treatment, then 45% of depressed patients receive
adequate treatment, but an equivalent raw number of nondepressed subjects receive
inappropriate treatment. 55% of depressed individuals have unmet needs and 855 have no
unmet needs.
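
The arithmetic behind this caption can be sketched as a short calculation. This is an illustration only: the function name `treatment_yield` and the `uptake` parameter are hypothetical, and the sketch assumes that depressed and nondepressed screen-positives accept treatment at the same rate.

```python
def treatment_yield(tp, fp, fn, tn, uptake):
    """Split screening results by treatment uptake among screen-positives.

    uptake is the proportion of screen-positive patients who are offered
    and accept treatment (assumed equal for true and false positives).
    """
    return {
        "adequate_treatment": tp * uptake,        # depressed and treated
        "inappropriate_treatment": fp * uptake,   # not depressed but treated
        "unmet_needs": fn + tp * (1 - uptake),    # depressed and untreated
        "no_unmet_needs": tn + fp * (1 - uptake), # not depressed and untreated
    }


if __name__ == "__main__":
    # Figure AP.1 scenario: TP = 90, FP = 90, FN = 10, TN = 810, 50% uptake
    result = treatment_yield(tp=90, fp=90, fn=10, tn=810, uptake=0.5)
    for label, count in result.items():
        print(f"{label}: {count:.0f}")
```

With these inputs, 45 of the 100 depressed patients are treated and 55 are left with unmet needs, matching the figures quoted in the caption.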

[Flowchart: all stroke patients, N = 1,000 (n = 100 with depression, n = 900 without). Step 1: PHQ2 (Q1 or Q2 positive), Se 80%, Sp 70%. Step 2: HADS-D (9v10), Se 60%, Sp 91%, applied to step 1 screen-positives: probable depression, TP = 48, FP = 24; possible non-case, TN = 246, FN = 32. Treatment offered and accepted: adequate treatment, TP = 12; inappropriate treatment, FP = 6. Cumulative yield (recognition): TP = 48, FP = 24, TN = 876; Se 48%, Sp 97%, PPV 66%, NPV 94%. Cumulative yield (treatment): TP = 12; Se 12%, Sp 97%, PPV 33%, NPV 72%.]

Figure AP.2. Low-risk two-step screen yield. In this scenario the first step screen yields a
positive predictive value (PPV) of only 23%, but adding the second step improves this to
66%. However, if only one in four of those identified are offered and accept treatment, then
the treatment yield is weakened: effectively, the PPV becomes 33% and the negative
predictive value 72%.

[Flowchart: selected population, N = 1,000 (n = 200 with depression, n = 800 without). Screening method #1 (Se 80%, Sp 90%): screen-positive, possible cases, TP = 160, FP = 80 (PPV 67%); screen-negative, possible non-cases, TN = 720, FN = 40 (NPV 95%), of whom n = 114 want help and n = 646 reject help. Screening method #2 applied to screen-positives: probable depression, TP = 144, FP = 8 (PPV 95%), of whom n = 73 want help and n = 79 reject help; probable non-case, TN = 72, FN = 16 (NPV 81%), of whom n = 17 want help and n = 71 reject help. Cumulative yield: TP = 144, FN = 56, TN = 792, FP = 8; PPV 95%, NPV 93%; helped = 203, not helped = 797.]

Figure AP.3. Two-step screen yield including desire for help in medium-prevalence
sample. In this scenario the first step screen yields a positive predictive value (PPV)
of 67% (given a baseline prevalence of 20%), but the addition of the second step
improves this to 95%. However, only about half of those identified as depressed
actually want and accept professional help and about 15% of those without depression
also want help. Thus, desire for help for psychosocial problems does not map exactly
with the presence of distress.

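[Diagram: overlapping test-score distributions for healthy and depressed individuals. An optimum cut-off value on the test-score axis separates test-negative from test-positive results, giving true negatives and false positives among healthy individuals and false negatives and true positives among depressed individuals.]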

Figure AP.4. Conceptual overlap of test scores in healthy and depressed individuals.

Revised Emotion Thermometers Scale (7 items)

Instructions: In the first four columns, please mark the number (0–10) that best describes how much emotional upset you have been experiencing in the past two weeks, including today. In the next three columns, please indicate how much impact this has had on you.

Emotional Upset (each thermometer scored 0–10)
1. Distress: 0 = None; 10 = Extreme
2. Anxiety: 0 = None; 10 = Extreme
3. Depression: 0 = None; 10 = Extreme
4. Anger: 0 = None; 10 = Extreme

Emotional Impact (each thermometer scored 0–10)
5. Duration: 0 = Just today; 10 = 10+ months
6. Burden: 0 = No effect on me; 10 = Cannot function at all
7. Need Help: 0 = Can manage myself; 10 = Desperately

Figure AP.5. Emotion Thermometers. Source: Adapted from the NCCN Distress Thermometers, Alex Mitchell.
Edinburgh Postnatal Depression Scale (EPDS)

Name:
Address:
Your Date of Birth:
Phone:

Please check the answer that comes closest to how you have felt in the past 7 days, not just how you feel today. Here is an example, already completed.

I have felt happy:
  Yes, all the time
  Yes, most of the time
  No, not very often
  No, not at all

This would mean: “I have felt happy most of the time” during the past week. Please complete the other questions in the same way.

In the past 7 days:

1. I have been able to laugh and see the funny side of things
  As much as I always could
  Not quite so much now
  Definitely not so much now
  Not at all

2. I have looked forward with enjoyment to things
  As much as I ever did
  Rather less than I used to
  Definitely less than I used to
  Hardly at all

*3. I have blamed myself unnecessarily when things went wrong
  Yes, most of the time
  Yes, some of the time
  Not very often
  No, never

*4. I have been anxious or worried for no good reason
  No, not at all
  Hardly ever
  Yes, sometimes
  Yes, very often

*5. I have felt scared or panicky for no very good reason
  Yes, quite a lot
  Yes, sometimes
  No, not much
  No, not at all

*6. Things have been getting on top of me
  Yes, most of the time I haven’t been able to cope at all
  Yes, sometimes I haven’t been coping as well as usual
  No, most of the time I have coped quite well
  No, I have been coping as well as ever

*7. I have been so unhappy that I have had difficulty sleeping
  Yes, most of the time
  Yes, sometimes
  Not very often
  No, not at all

*8. I have felt sad or miserable
  Yes, most of the time
  Yes, quite often
  Not very often
  No, not at all

*9. I have been so unhappy that I have been crying
  Yes, most of the time
  Yes, quite often
  Only occasionally
  No, never

*10. The thought of harming myself has occurred to me
  Yes, quite often
  Sometimes
  Hardly ever
  Never

Source: Reprinted, with permission, from Cox JL, Holden JM, Sagovsky R. 1987. Detection of postnatal depression:
Development of the 10-item Edinburgh Postnatal Depression Scale. British Journal of Psychiatry 150:782–786.

Figure AP.6. Edinburgh Postnatal Depression Scale (EPDS). © 1987 The Royal College of
Psychiatrists. The Edinburgh Postnatal Depression Scale may be photocopied by individual
researchers or clinicians for their own use without seeking permission from the publishers. The
scale must be copied in full and all copies must acknowledge the following source: Cox, J.L.,
Holden, J.M., & Sagovsky, R. (1987). Detection of postnatal depression. Development of the
10-item Edinburgh Postnatal Depression Scale. British Journal of Psychiatry, 150, 782-786.
Written permission must be obtained from the Royal College of Psychiatrists for copying and
distribution to others or for republication (in print, online or by any other medium).
Translations of the scale, and guidance as to its use, may be found in Cox, J.L. & Holden, J.
(2003) Perinatal Mental Health: A Guide to the Edinburgh Postnatal Depression Scale.
London: Gaskell.
Index

Note: Page numbers followed by f denote figures, t denote tables, and b denote boxes.

Accuracy. See Diagnostic accuracy
Activities of daily living (ADLs), 244, 246, 257
Acute myocardial infarction (AMI). See Cardiovascular care
Adaptive technology. See Technological approaches
Adjustment disorders, 18–19, 168, 195, 268, 271, 275
Affect heuristic, 117
African Americans
  detection and, 62, 75
  readiness to disclose, 67
Agency for Health Care Policy and Research (U.S.), 125
Agency for Healthcare Research and Quality, 165
Alcohol, Drug Abuse, and Mental Health Administration, U.S. (ADAMHA), 21–22
Alcohol and drug abuse
  underdiagnosis of, 115
Algorithms
  in adaptive testing, 144, 354–356
  clinician avoidance of, 353
  for CVD patients, 326
  in diagnostic accuracy, 22, 49–50, 109
  in diagnostic checklists, 10
  in DSM-IV, 43–44
  in MDI, 43–44
  in mood scales, 33
  psychosocial, 291
  in questionnaires, 12, 14–15
  screening frequency, 184
  for touchscreen technology use, 152
American College of Cardiology, 319
American Heart Association (AHA), 317, 319, 363–364
Anxiety and Depression Detector (ADD), 175
Aphasic Depression Rating Scale, 33
Appendices, 371–384
  Edinburgh Postnatal Depression Scale, 384f
  emotion thermometers, 383f
  low-risk single screen yield, 379f
  low-risk two-step screen yield, 380f
  overlap of test scores, 382f
  somatic/nonsomatic symptoms and scales, 373–374t
  statistical accuracy, 375–378t
  symptoms and scales, 371–372t
  two-step screen yield in medium-prevalence sample, 381f
Area under the curve (AUC), 93, 268, 275
Availability heuristic, 117
Axis I disorders
  epilepsy diagnosis and, 250
  screening instruments, 22
  underdiagnosis of, 115
Axis II disorders
  screening instruments, 22

Barriers, diagnostic, 65–66b, 69b
Bayes’ theorem, 59, 59f, 102, 360
Bech-Rafaelsen Melancholia Scale (MES), 92
Beck Depression Inventory (BDI)
  BDI-FS (Fast Screen), 38, 245, 248–249
  BDI-PC (Primary Care), 195–196, 198–199, 357
  comparisons to other scales, 39, 41, 45t
  future developments, 44, 48
  history of, 30, 31, 31b, 37–38
  sensitivity to change, 35
Best estimates procedure (BEP), 15–16
Bipolar disorders
  among MS patients, 247
  antidepressants and, 119
  epilepsy and, 254
  false-positive diagnosis and, 18
  Parkinson’s disease and, 255
  suicidal ideation in, 252
Black African patients (U.K.)
  detection and, 64
Brief Assessment Schedule Depression Cards (BASDEC), 245
Brief Finder for Depression (BCD), 196–197
Brief Symptom Inventory (BSI)-18, 272–277

CADI (Computer-Assisted Diagnostic Interview), 18
Canadian Community Health Survey (CCHS 1.2), 249
Canadian Task Force on Preventive Health Care (CTFPHC), 127b, 128–129
Cancer care, 265–294
  future developments, 293–294
  HADS and, 6f
  implementation of screening programs in, 276–292
    evaluating efficacy in, 278–292, 279–290t
  PHQ and, 94, 360
  prevalence of depression in, 265–266
  Rasch Models and, 92, 93
  SCID and, 266–267
  screening in oncology settings, 267–276
    BSI-18, 272
    conventional mood severity scales, 267–271, 269–270t
    distress paradigm, 271
    distress thermometers, 272–276, 273–274t
  screening methods, 266–267
  special issues in, 292–293
  technological approaches and, 152
Cardiovascular care, 317–329
  evaluation and treatment recommendations, 326–328
    barriers to implementation, 328
    decision process, 327f
  HAM-D and, 217
  PHQ and, 214, 324–329, 363
  prevalence of depression in, 319–320
  SCID and, 217, 244–245
  screening instruments in, 320–326, 322–323t
    performance characteristics, 322–323
Case-finding, definition of, 29–30
CAT. See Computer-adaptive testing
CATEGO-5 software, 21
CATI (computer-assisted telephone interview), 144, 147
CBT. See Cognitive-behavioral therapy
CEG (Clinician Evaluation Guide, PRIME-MD), 42, 198, 353
Centre for Epidemiologic Studies Depression Scale (CES-D). See also specific research and studies
  comparisons to other scales, 41–42, 45b
  future developments, 44, 48
  history of, 31, 31b, 39
Checklists. See Diagnostic checklists
Chronic heart failure, 214, 215f. See also Cardiovascular care
CIDI. See Composite International Diagnostic Interview
Clinical judgment, 113–120
  research on, 114–119
    cognitive processes of clinicians, 116–119
    narrowness of interviews, 114–115
    patient feedback, 115–116
  screening limitations, 119–120
Clinician Evaluation Guide (CEG) (PRIME-MD), 42, 198, 353
Clinician rated scales, 34–35, 34b
Cochrane Collaboration evidence review, 125–126, 335, 344, 345
Cognitive-behavioral therapy (CBT)
  in cardiovascular care, 318, 327f
  in diabetes care, 343–344, 346f
  global fatigue severity and, 247
  in primary care, 168
Cognitive heuristics, 117
Collaborative care. See Enhanced care
Comorbid depression and somatic symptoms, 211–217
  comparative studies on, 218–235t
  healthy controls versus, 214
  noncomparative studies on, 216–217
  physical illness alone versus, 214–215, 215f
  primary versus secondary depression, 211–214, 212f
Composite International Diagnostic Interview (CIDI)
  CCHS and, 249–250
  explanation of, 20–22
  in MAGPIE study, 64
  in primary care, 163, 175, 183
  in studies on diagnostic accuracy, 16t, 17, 21, 50, 321, 322–323t
Comprehensive Psychopathological Rating Scale (CPRS), 37
Computer-adaptive testing (CAT)
  algorithms in, 144, 354–356
  based on Rasch Model, 93–95
  elderly and, 293
  in oncology settings, 276–278
  studies on, 355–356
Computer-Assisted Diagnostic Interview (CADI), 18
Computerization issues, 147–150. See also Technological approaches
  acceptability, 149
  availability, 150
  embedding in systems, 150
  error control, 147–148
  examples of, 150–152
  honesty, 148
  performance, 148
  physical clues, 148
  price, 149–150
  quality control and accuracy, 147
  workload considerations, 148–149
Confirmatory hypothesis testing, 116
Conspicuous psychiatric morbidity, 136
Continuum argument, 5
Cornell Scale for the Assessment of Depression in Dementia (CSDD), 33
Coronary heart disease. See Cardiovascular care
CPRS (Comprehensive Psychopathological Rating Scale), 37
Cronholm-Ottosson Depression Scale, 30
Crowding-out hypothesis, 73–74

Decision process
  in cardiovascular care, 327f
  clinical diagnosis, 103f
Depression, 3–24
  definition of, 3–7
    DSM-IV symptoms in Zurich study, 7f
    HADS scores in cancer outpatients, 6f
    psychiatric diagnostic certainty levels, 4b
    test score distribution, 5f
  diagnostic checklists, 10–15
    DSM, 11–15
    DSM-IV, 9–11, 11b
    history of, 10b
    ICD-10, 9–15, 11b, 21
    for psychiatry, 11b
  structured/semi-structured diagnosis
    fully structured assisted interviews, 20b, 21–22
    interviews, 19–22
    partially structured assisted interviews, 20b
  unstructured clinician diagnosis, 15–19
    diagnostic accuracy in primary care, 16t
    diagnostic accuracy in routine diagnoses, 15–19
    diagnostic accuracy of psychiatrists, 17t, 19t
  validity of syndrome concept, 7–10
    validity testing, 8b
Depression in the Medical Ill (DMI) Scales, 195–199, 197t, 357
Depression Scale in Schizophrenia (DEPS), 33
Detection, 57–75
  barriers to, 69b
  influences on, 66–71
    clinician communication, 68–71, 69b
    illness related, 71–74, 73f
    patient self-report, 66–68
  over/under diagnosis, 57–62
    detection errors, 60–62, 61–62f
    in primary care, 58–60, 59f
  predictors of, 62–66
    recognition barriers, 65–66b
    recognition sensitivity, 63f
Diabetes care, 335–346
  depression as major health problem in
    prevalence of, 336f, 337
    prognostic relevance of, 337–338
    quality of life issues, 338–339, 336–337t
    socioeconomic aspects of, 339, 342
  screening programs, 344–345
    clinical management flow diagram, 346f
    cost-effectiveness of, 345
    effect on morbidity, 344–345
  screening tests, 340–343, 341t
    acceptability of, 341, 343f
    performance of, 340–341
    routine detection versus, 342b
  treatment options, 343–344
    nonspecific interventions, 343
    specific antidepressive interventions, 344
Diagnosis, definition of, 29–30
Diagnostic accuracy. See also Diagnostic methods
  algorithms in, 22, 49–50, 109
  comparison of screening tools, 49t
  computerization and, 147
  improving accuracy, 49–50
  measures of, 102b
  in primary care, 16t
  of psychiatrists, 17t, 19t
  in routine diagnosis, 15–19
  somatic symptoms, 209–211
    approaches to, 210b, 212f
  studies on, 16–17, 16t, 17–20, 17t, 19t, 23f
Diagnostic and Statistical Manual (DSM). See also specific research and studies
  comparisons to other scales, 20b, 21–22, 36–44, 45t
  DSM-III-R
    checklists, 22
    history of, 24
  DSM-IV
    algorithms in, 43–44
    checklists, 9–15, 11b
    decision-tree logic and, 22
    history of, 21, 24, 31, 31b
    limitations of, 33
    MDD characteristics, 5, 7–8
    somatic symptoms, 208t
    validation of criteria, 11–15, 12t
    in Zurich study, 7f
  future developments, 44, 50
  history of, 11–12
  in studies on diagnostic accuracy, 16–17
Diagnostic barriers, 65–66b
Diagnostic certainty, levels of, 4b
Diagnostic checklists
  algorithms in, 10
  DSM-IV, 9–15, 11b
  history of, 10b
  ICD-10, 9–15, 11b, 21
  for psychiatry, 11b
Diagnostic Interview Schedule (DIS), 17, 20–22, 42, 62, 210
Diagnostic methods, 99–111
  clinical diagnosis, 99–103
    accuracy measures, 102b
    case examples, 100–101, 100t, 101b, 106b, 107t
    decision theory, 103f
    evidence-based, 101–103
  diagnostic accuracy, clinical aspects, 105–109
    algorithm approaches, 109
    pre/post testing, 106–108, 107t, 108f
    rule-in accuracy, 108–109
  diagnostic accuracy, scientific aspects, 103–105
    likelihood ratios, 104–105, 105f
    2 × 2 table, 104f
  implementation studies, 109–111
    added value, 111
    feasibility, 110–111
    UK guidelines, 110b
Diagnostic overshadowing, 115
Differential item functioning (DIF), 36, 87, 91–93, 213
DIS. See Diagnostic Interview Schedule
Distress as “vital sign,” 292
Distress thermometers, 272–276, 273–274t
DMI Scale. See Depression in the Medical Ill
DSM. See Diagnostic and Statistical Manual
Dysthymia
  criteria for, 12t, 13
  MDD versus, 19
  prevalence in primary care, 162
Dysthymic-like disorder of epilepsy (DLDE), 252

ECA study (Epidemiologic Catchment Area), 17, 42, 210
Edinburgh Postnatal Depression Scale (EPDS)
  acceptability of, 357
  examples of, 384
  history of, 31, 41–42
  in perinatal care, 33, 301–304, 308–310, 313, 362
  Rasch analysis on, 42, 91
EDSS (Extended Disability Status Scale), 249
Education Testing Service, 144
Elderly
  cancer care and, 265
  CES-D and, 92
  comorbid depression and, 203–204
  computer-adaptive tests and, 293
  detection and, 63, 64, 68, 70, 73, 75
  IVR methods and, 148
  as stroke patients, 245
Electronic medical records (EMR), 150, 184
Emotional State Questionnaire-2 (EST-Q2), 175
Emotion thermometers, 49, 275–276, 383f
Enhanced care, 123–137
  screening and
    arguments for and against, 124–125
    comparison of studies, 131f
    evidence for, 128–129
    outcome improvement, 125–127, 126f, 129, 136
    recommendations for, 128
    studies and patient population, 132–135t
Enhancing Recovery in Coronary Heart Disease (ENRICHD) trial, 318
EORTC QLQ C-30, 278, 291
Epidemiologic Catchment Area (ECA) study, 17, 42, 210
Epilepsy
  depression in, 249–255
    clinical manifestations, 250–253
    epidemiologic aspects, 249–250
    quality of life issues, 254–255
    screening instruments, 253–254
  HAM-D and, 254
Erectile dysfunction, 342
Ethnic groups. See specific groups
Etiologic approach
  characteristics of, 194
  definition of, 210b
  somatic symptoms and, 209–210, 266–267
European Study of the Epidemiology of Mental Disorders (ESEMeD), 58, 60, 72t
Even Briefer Assessment Scale for Depression (EBAS DEP), 44

Exclusive approach Household National Health Interview Survey, 204


characteristics of, 193 HSCL-25. See Hopkins Symptom Checklist (SCL)
definition of, 210b Human bias, 147
in screening tools, 198–199 HUNT-II study (Norway), 210
somatic symptoms and, 73–74, 266–267, 357 Hyett, Matthew, 356–357
Extended Disability Status Scale (EDSS), 249
Ictal symptoms, 250–251
Factor analyses, 88, 91 Inclusive approach
False-positive depression, 193–194 characteristics of, 193
Feighner Diagnostic Criteria (FDC), 10, 21 definition of, 210b
Fluoxetine, 343–344 somatic symptoms and, 266–267, 357
Institute of Medicine, 161–162
General Health Questionnaire (GHQ), 31, 65, Institut National de la Santé et de la Recherche
67–69, 75, 88, 94, 175, 208, 321 Médicale (INSERM) study, 60, 72t
Geriatric Depression Scale (GDS) Interactive voice recognition (IVR) methods, 148,
cutoff scores and, 324 149, 152
history of, 31, 40–41 Interictal dysphoric disorder (IDD), 252–253
in neurologic disorders, 244–245 Interictal symptoms, 250–253
PD and, 256–257 International Classification of Diseases (ICD).
in primary care, 169 See also specific research and studies
Rasch analysis on, 41, 91, 93 comparisons to other scales, 20b, 21–22, 43–44
Global Parkinson’s Disease Survey Steering future developments, 10b, 44, 50
Committee, 258 history of, 11–12, 24, 31, 31b
ICD-10
HADS. See Hospital Anxiety Depression Scale assessment tools and, 22
Hamilton, Max, 36 diagnostic checklists, 9–15, 11b, 21
Hamilton Depression Rating Scale (HAM-D) MDD characteristics, 8
in cancer care, 214 somatic symptoms, 208t
in cardiovascular care, 217 validation of criteria, 13
epilepsy and, 254 limitations of, 33
HAMD-17, 257 International Diagnostic Checklists (IDCL), 11b
history of, 30–31, 36–37 Internet based screening. See Technological
in neurologic disorders, 244–245, 247, 257 approaches
Rasch analysis on, 91–92, 216 Interviews, diagnostic, 19–22
sensitivity to change, 35 fully structured assisted interviews, 20b, 21–22
Hawthorne effect, 111 narrowness of, 114–115
Heart disease. See Cardiovascular care partially structured assisted interviews, 20b
Hermanns, Norbert, 362–363 patient-centered, 66b
Hispanics research on, 114–115
collaborative care and, 223 Intuition, 41, 113, 117
readiness to disclose, 67 Item banking, 85–86, 88, 89f, 93–95
History taking IVR (interactive voice recognition) methods, 148,
diagnosis and, 68 149, 152
importance of, 114–115
HIV/AIDS
depression and, 216
Hopkins Symptom Checklist (SCL), 31, 41–42, 91, 93, 175, 344–345
Hospital Anxiety Depression Scale (HADS). See also specific research and studies
in cancer care, 6f
history of, 31, 39–40
in primary care, 40f
sensitivity to change, 35, 37
LEAD (Longitudinal evaluation performed by Expert clinicians who utilize All available Data) standard, 15, 18, 22
Likert scoring, 41
Lists of Integrated Criteria for the Evaluation of Taxonomy (LICET), 11b
MAGPIE study
on amount of patient contact, 71
on detection, 61, 64
patient disclosure, 67–68

Magruder-Habib, K., 356 Mood and Anxiety Spectrum Scales (MASS),


Major Depression Inventory (MDI), 31, 43–44 95, 354
Major depressive disorder (MDD). See also Mood Disorder Questionnaire (MDQ), 254
specific aspects and research Mood scales, 83–96
characteristics of, 5, 7–8 algorithms in, 33
criteria for, 12–15, 12t, 14b Rasch Model, 86–95
as disorder, 3–5, 7 assessment of, 87
syndrome concept and, 7–9 clinical testing, 93
prevalence rates of, 163, 204f computer-adaptive tests, 93–95
unstructured clinician diagnosis in, 15–19 features of, 87–88
MASS (Mood and Anxiety Spectrum Scales), instruments based on, 91–93, 91t
95, 354 item banking, 93–95
MDQ (Mood Disorder Questionnaire), 254 mental health measures and, 88–91, 89–90f
Medical settings, 191–199 tool development, 84–85t, 84–86
DMI and, 195–199, 197t Mood Thermometer (MT), 275–276
false-positive depression, 193–194 MOS 8-Item Depression Screener (Burnam
medically ill patients, 192–193 Screen)
depression in, 192–193 acceptability of, 44–45, 48
HADS and, 194, 195–196 detection and, 62–63
parsimonious screening, 196–197 history of, 31, 42
PRIME-MD and, 198 MOS-D, 169, 172
Men Multiple sclerosis (MS)
detection and, 62, 75 depression in, 246–249
diabetes and, 342 clinical manifestations, 247
erectile dysfunction, 342 epidemiologic aspects, 246–247
stroke and, 243 quality of life issues, 249
Mental disorders screening instruments, 248–249
detection of, 57 SCID and, 249
ESEMeD studies on, 58, 60, 72t
Rasch Model on, 88–91, 89–90f National Ambulatory Medical Care Survey, 164
underdiagnosis of retardation, 115 National Comorbidity Survey, 13, 21, 162,
WHO classification of, 11 164, 166
WHO studies on, 163, 164, 356 National Comprehensive Cancer Network
Mental Health Index and Hospital Anxiety and (NCCN), 271, 272, 294, 360
Depression Scale, 152 National Heart, Lung, and Blood Institute
Mental retardation (NHLBI), 326, 363
underdiagnosis of, 115 National Institute for Health and Clinical
MES (Bech-Rafaelsen Melancholia Scale), 92 Excellence (NICE) (U.K.), 42, 127b, 128,
Meta-regression, 130, 136, 355 137, 303–304, 308, 313, 362
MIDAS project (Rhode Island), 13–15, 14b, 207, National Institute of Mental Health (NIMH), 21, 39
210–211 National Institutes of Health (U.S.), 95
Mild depression, 5, 43, 60, 73, 338, 350 National Screening Committee (NSC) (U.K.)
M.I.N.I. (Mini-International Neuropsychiatric on depression and diabetes, 345
Interview), 20, 22, 166 on perinatal screening, 304–308, 305b,
Minor depression. See also specific diseases 306–307t, 311
criteria for, 12–13, 12t screening definitions, 30b
detection of, 43, 60, 71, 192 screening guidelines, 109–110, 110b,
diagnostic rates of, 16, 58, 163 124–125, 336
treatment for, 168 on treatment options, 343
Monothetic diagnostic checklists, 10 National Study of Medical Care Outcomes
Montgomery, S. A., 37 (MOS), 42. See also MOS 8-item
Montgomery-Åsberg Depression Rating Scale Depression Screener
(MADRS), 30–31, 35, 37–38, 213, 216, National Survey of Mental Health and Well-
256–257 Being (Australia), 50

Natural disease entities, 4 in medical settings, 198


Negative predictive value (NPV), 29–30 PHQ-2
Negative Predictive Value, definition of, 102 acceptability of, 48
Neurological Disorders Depression Inventory for in perinatal settings, 304
Epilepsy (NDDI-E), 253 as short screener, 170–175, 267
Neurologic disorders, 241–258 in studies on diagnostic accuracy, 109
epilepsy, 249–255 PHQ-9
clinical manifestations, 250–253 as computerized tool, 144, 148
epidemiologic aspects, 249–250 for general emotional distress, 175
quality of life issues, 254–255 history of, 31, 42–43
screening instruments, 253–254 as self-rating instrument, 254
GDS and, 244–245 in studies on diagnostic accuracy, 16–17,
HAM-D and, 244–245, 248, 257 17t, 100–101
multiple sclerosis, 246–249 in studies on feasibility, 110–111
clinical manifestations, 247 treatment effectiveness and, 177
epidemiologic aspects, 246–247 in primary care, 198, 321, 351–352
quality of life issues, 249 Rasch analysis on, 88
screening instruments, 248–249 Patient-Reported Outcomes Measurement
Parkinson’s disease, 255–258 Information System (PROMIS), 95
clinical manifestations, 205, 256 PDSS (Postpartum Depression Screening Scale),
comparative studies on, 213, 216, 236 301, 303, 308, 362
quality of life issues, 257–258 Peri-ictal symptoms, 250
screening instruments, 44, 256–257, 361 Perinatal settings, 299–314
stroke, 242–246 comparison of methods, 305, 308–309, 309t
clinical manifestations, 243–244 EPDS and, 33, 301–304, 308–310, 314, 362
epidemiologic aspects, 242–243 guidelines and recommendations, 304–305,
PSD and, 246 305b, 306–307t
screening instruments, 106b, 244–246 implementation in practice, 310–311
Neuropsychiatric Inventory (NPI), 248 false screening outcomes, 311t
NHANES II study, 337 perinatal screening model, 312f
Nortriptyline, 343–344 refusal of treatment, 303, 310b
Not otherwise specified (NOS), 13
in antenatal period, 301–302
Oncological settings. See Cancer care in postpartum period, 301
Operational Criteria Checklist (OPCRIT), 11b recommendations for, 313–314
screening practices in, 303–304
Parkinson’s disease (PD) service delivery and treatment implications,
depression in, 255–258 311, 313
clinical manifestations, 205, 256 Person-item map, 89f
comparative studies on, 213, 216, 236 Perth Community Stroke Study, 245
quality of life issues, 257–258 PHQ. See Patient Health Questionnaire
screening instruments, 44, 256–257, 361 Polydiagnostic Interview (PODI), 21
MADRS and, 213, 216, 256–257 Polythetic diagnostic checklists, 10
SCID and, 216 Positive predictive value (PPV), 30
SDS and, 44 Positive Predictive Value, definition of, 102
studies on, 205, 213, 216, 236 Positive utility index, 108
Partial Credit Model, 88 Postictal symptoms, 251–252
Partners in Care study, 128–130, 355 Postpartum Depression Screening Scale (PDSS),
Pathways study, 344–345, 363 301, 303, 308, 362
Patient Health Questionnaire (PHQ) Post-stroke depression (PSD). See also
in cancer care, 360 Cardiovascular care
item banking and, 94 clinical manifestations, 243–244
in cardiovascular care, 214, 324–329, 363 epidemiologic considerations, 242–243
in diabetes care, 340, 344–345 recovery and risks, 246

Post-stroke depression (PSD) (continued) PROSPECT trial, 167–168


screening instruments, 33, 106b, 244–246 PSE (Present State Examination)/SCAN,
studies on, 106–109, 211–213, 216 20–22, 93
Post-Stroke Depression Scale (PSDS), 245 Psychiatric Screening Questionnaire for Primary
Predictive Summary Index, definition of, 102 Care Patients (PSP) study, 42
Pregnancy. See Perinatal settings Psychological Problems in General Health Care
Preictal symptoms, 251 study (PPGHC)
Primacy effect, 116 on detection, 58, 60, 72t
Primary care, 161–184. See also Clinical Psychological Screen for Cancer (PSSCAN),
judgment 277–278
CIDI and, 163, 175, 183
detection in, 58–60, 59f QOLIE-89, 254–255
epidemiology of depression in, 162–164 Quality of life issues
population prevalence of, 162–163 in diabetes care, 338–339, 338–339t
primary care as de facto treatment system, in epilepsy, 254–255
163–164 in MS, 249
primary care prevalence of, 163 in PD, 257–258
unassisted recognition, 164
future developments, 182–184 Rasch analysis
GDS and, 169 on BDI, 38
HADS and, 40f on EPDS, 42, 91
importance of screening in, 165–168 on GDS, 41, 91, 93
acceptable diagnostic tools, 166–167, 167f on HAM-D, 37
cost-effectiveness of, 168 on SDS, 38
facility availability, 166 Rasch Model, mood scales, 86–95
as health problem, 165 assessment of, 87
history of disease, 167–168 clinical testing, 93
in latent stage, 166 computer-adaptive tests, 93–95
lifespan screening, 168 features of, 87–88
screening tools, 166 instruments based on, 91–93, 91t
treatable conditions, 165 item banking, 93–95
treatment policies, 168 mental health measures and, 88–91, 89–90f
M.I.N.I. and, 166 Rating Scale, 88
PHQ and, 198, 321, 351–352 Receiver operating characteristic (ROC)
screening strategy implementation, 177–182 analyses, 196, 267–268, 272, 275–276,
one-stage approach, 177–182, 178t 324
time burden, 181–182, 181t Refusal of treatment, 303, 310b
two-stage approach, 178–182, 179–180t Renard Diagnostic Instruments, 21
screening tools, 169–177, 170–171t, Representativeness heuristic, 117
173–174t, 176t Research Diagnostic Criteria (RDC), 10,
for general emotional distress, 175 20–21, 209
for multiple disorders, 175, 177 Roter Interactional Analysis System, 68–69, 70
one-stage approach, 172, 175 Routine screening in clinical practice, 349–366
severity ratings, 177 Babaei on, 356, 358
short screeners, 169, 172 Barton on, 361, 362
standard screeners, 169 Beck on, 354
ultra-short/brief screeners, 172 Bermejo on, 351
unstructured clinician diagnosis in, 16t Boyce on, 361, 362
Primary Care Evaluation of Mental Disorders Carlson on, 360
(PRIME-MD), 42, 183, 197–198, 353. See de Kok on, 361
also Patient Health Questionnaire Dwight-Johnson on, 361
Primary care physician (PCP). See Primary care Garssen on, 361
Problem Areas in Diabetes Questionnaire Gibbons on, 354
(PAID), 340, 341 Gilbody on, 354

Hermanns on, 362–363 screening procedures


Hyett on, 356–357 definition of, 29–30
Jacobsen on, 360 sensitivity to change of mood and, 35–36, 36f
Kanner on, 359–360 severity scales, 33–34
Katon on, 350, 363 short versions of rating scales, 48b
Kessler on, 351–352 special population scales, 32b
Kulzer on, 362–363 Schedule for Affective Disorders and
Magruder on, 356 Schizophrenia (SADS), 20, 208
Mitchell on, 350–351, 352–353, 356, 358, Schedule for Affective Disorders and
360–361 Schizophrenia-Lifetime (SADS-L), 16
Parker on, 356–357 Schedules for Clinical Assessment in
Patten on, 351 Neuropsychiatry (SCAN), 16, 44
puffer phenomenon and, 364–366 SCID. See Structured Clinical Interview
Ransom on, 360 for DSM Disorders
Rogers on, 355–356 SCL. See Hopkins Symptom Checklist
Smith on, 354 SCL-90-R (Symptom Checklist-90), 272
Strong on, 361 Screening, definition of, 29–30
Thombs on, 363 Screening algorithms. See Algorithms
USPSTF report and, 354–355 Screening-detection-treatment-improvement
Valenstein on, 353 paradigm, 124, 137
Von Korff on, 354 Screening procedures, definition of, 29–30
Wang on, 351–352 SDDS-PC. See Symptom-Driven Diagnostic
Wells on, 355 System for Primary Care
Yeager on, 356 SDMT (Symbol Digit Modalities Test), 248
Ziegelstein on, 363 SDS. See Zung Self-Rating Depression Scale
Zimmerman on, 350–351 Self-reporting
efficacy of, 95, 267, 352
SADQ (Stroke Aphasic Depression influences on, 66–68, 325
Questionnaire), 33, 245–246 by medically ill patients, 197t
Scales and tools, 29–50. See also specific scales patient-rated scales, 34–35, 34b, 38, 40–41,
and tools 242, 267
classic Severity Scales recognition rate and, 71
BDI, 37–38 recommendations on, 127b
CES-D, 39 studies on computerization of, 276–277
HAM-D, 36–37 studies on screening instruments, 198, 248–249
MADRS, 37 Sensitivity (Se), definition of, 102
SDS, 38 Sensitivity to change, 35–36
cutoff scores for varying severity, 45t Severity assessment, definition of, 30
future developments Severity Scales (1960-1980). See also specific
accuracy comparison, 49t scales and tools
improving acceptability, 44, 48–49 BDI, 37–38
improving accuracy, 49–50 CES-D, 39
generic scales, 32b HAM-D, 36–37
history of, 30–32 MADRS, 37
new Severity Scales SDS, 38
EPDS, 41–42 Severity Scales (1981-2008). See also specific
GDS, 40–41 scales and tools
HADS, 39–40 EPDS, 41–42
MDI, 43–44 GDS, 40–41
MOS, 42 HADS, 39–40
PHQ, 42–43 MDI, 43–44
patient-rated versus clinician-rated scales, MOS, 42
34–35 PHQ, 42–43
scale properties, 46–47t Severity scales, limitations of, 33–34

Software, 21, 22, 151–152f Suicide


Somatic symptoms, 203–236 bipolar disorders and, 252
comorbid depression and, 211–217 clinical judgment and, 114–115
comparative studies on, 218–235t epilepsy and, 250, 252, 253
healthy controls versus, 214 MIDAS project and, 13–14, 14b
noncomparative studies on, 216–217 MS and, 247
physical illness alone versus, 214–215, Parkinson’s disease and, 256
215f physical disease and, 205, 206f, 207
primary versus secondary depression, under/over detection of risk, 57, 70, 74
211–214, 212f Summary receiver operator characteristic curve
definition of, 205–209 (sROC), 104
diagnostic systems and scales in, 208–209 Symbol Digit Modalities Test (SDMT), 248
depression in physical disease, 203–205 Symptom Checklist-90 (SCL-90-R), 272
inter-rater reliability and, 207b Symptom-Driven Diagnostic System for Primary
suicide risk and, 205, 206f, 207 Care (SDDS-PC), 44, 48, 166, 172,
diagnostic accuracy, 209–211 175–176, 183
approaches to, 210b, 212f
presentation rates of, 67 Targeted (high-risk) case-finding, 30
screening implications, 217, 236 Technological approaches, 143–154
Special population scales, 32b computerization issues, 147–150
Specificity (Sp), definition of, 102 acceptability, 149
Specificity to change, 35 availability, 150
Spielberger State-Trait Anxiety Inventory embedding in systems, 150
(STAI), 94 error control, 147–148
Standardized mortality rate (SMR), 253 honesty, 148
STAR*D, 213–214 performance, 148
Stroke. See also Post-stroke depression (PSD) physical clues, 148
depression and, 205, 217, 242–246 price, 149–150
clinical manifestations, 243–244 quality control and accuracy, 147
epidemiologic aspects, 242–244 workload considerations, 148–149
screening instruments, 33, 92, 217, 244–246 implementation of computerized screening,
studies on, 217, 359 150–152
Stroke Aphasic Depression Questionnaire methods of, 144–146, 145–146t
(SADQ), 33, 245–246 Telephone technology in assessment, 39, 40, 130,
Strong, V., 361 144, 147, 149, 152, 153, 278, 355–356
Structured Clinical Interview for DSM Disorders Thermometers
(SCID) distress assessment, 272–276, 273–274t
in cancer care, 266–267 emotional assessment, 49, 275–276, 383f
in cardiovascular care, 217, 244–245 Thombs, B. D., 358, 363
in detection, 64 Treatment refusal, 303, 310b
explanation of, 43 Tversky, Amos, 117
MS and, 249
PD and, 216 Unassisted diagnosis. See Unstructured clinician
Rasch analysis on, 92–93 diagnosis
in studies on diagnostic accuracy, 17–22, Unemployed people, detection and, 62–63
19t, 23f Unified Parkinson’s Disease Rating Scale
Sub-major depression, 163, 167–168 (UPDRS), 257
Substitutive approach United States
characteristics of, 193–194
definition of, 210b mandated screening programs in, 42
somatic symptoms and, 266–267, 357 NCCN member survey in, 294
Subsyndromal depression, 167, 192, 236, 320, PCPS and detection in, 71
328, 337, 339 prevalence rates in, 21
Sufficient diagnostic checklists, 10 time spent per patient visit in, 162

Unstructured clinician diagnosis Distress Thermometers and, 272, 275


diagnostic accuracy EPDS and, 41–42
in primary care, 16t during pregnancy, 358–359, 361–362
of psychiatrists, 17t, 19t Work and Health Initiative depression software,
in routine diagnoses, 15–19 151–152f
U.S. Preventive Services Task Force (USPSTF) World Bank, 123
guidelines for cardiovascular care, 328–329, World Health Organization (WHO). See also
363–364 Composite International Diagnostic
guidelines for enhanced care, 128–130, 136, 137 Interview (CIDI)
quality improvement in care, 354–355 analytic principles of, 124–125
on screening in primary care settings, 125–126, criteria for screening programs, 165–168
127b, 177, 182–183, 326 on mental disorders, 11
Psychological Problems in General Health
Validus (validity), 8–9 Care study, 58
Virginia Twin Registry, 13, 92–93 SCAN instrument, 21, 93
study on mental disorders in primary care, 163,
Wakefield Depression Inventory, 246 164, 354
Well-Being Scale (WHO-5), 175, 198, 340, 341 two-item scales, 267
Whites (U.K.) Well-Being Scale (WHO-5), 175, 198,
detection and, 64 340, 341
Whites (U.S.) World Mental Health Survey, 310–311
readiness to disclose, 67
Willis, Thomas, 335 Youden’s J
Women. See also Perinatal settings in accuracy testing, 102, 104
breast cancer and screening, 268, 278
collaborative care and, 361 Zung Self-Rating Depression Scale (SDS), 31,
confiding in PCPs, 68 38–39, 44, 64, 91, 144, 169, 177,
detection and, 63 208–209, 244–245
