Screening for Depression in Clinical Practice

This material is not intended to be, and should not be considered, a
substitute for medical or other professional advice. Treatment for the
conditions described in this material is highly dependent on the individual
circumstances. While this material is designed to offer accurate information
with respect to the subject matter covered and to be current as of the time it
was written, research and knowledge about medical and health issues is
constantly evolving, and dose schedules for medications are being revised
continually, with new side effects recognized and accounted for regularly.
Readers must therefore always check the product information and clinical
procedures with the most up-to-date published product information and data
sheets provided by the manufacturers and the most recent codes of conduct
and safety regulation. Oxford University Press and the authors make no
representations or warranties to readers, express or implied, as to the
accuracy or completeness of this material, including without limitation that they
make no representations or warranties as to the accuracy or efficacy of the
drug dosages mentioned in the material. The authors and the publishers do not
accept, and expressly disclaim, any responsibility for any liability, loss, or
risk that may be claimed or incurred as a consequence of the use and/or
application of any of the contents of this material.
SCREENING FOR
DEPRESSION IN
CLINICAL
PRACTICE
An Evidence-Based Guide

ALEX J. MITCHELL, MRCPsych

Consultant and Honorary Senior Lecturer, Department of Liaison
Psychiatry, Leicester General Hospital and University of Leicester, UK

JAMES C. COYNE, PhD

Professor of Psychology, Department of Psychiatry,
University of Pennsylvania Health System

2010
Oxford University Press, Inc., publishes works that further
Oxford University’s objective of excellence
in research, scholarship, and education.

Oxford New York

Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto

With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam

Copyright 2010 by Oxford University Press, Inc.

Published by Oxford University Press, Inc.

198 Madison Avenue, New York, New York 10016

www.oup.com

Oxford is a registered trademark of Oxford University Press.

All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
electronic, mechanical, photocopying, recording, or otherwise,
without the prior permission of Oxford University Press.

Mitchell, Alex J.
Screening for depression in clinical practice: an evidence-based guide / by Alex J. Mitchell,
James C. Coyne.
p. ; cm.
Includes bibliographical references and index.
ISBN 978-0-19-538019-4
1. Depression, Mental—Diagnosis. 2. Primary care (Medicine)
I. Coyne, James C., 1947– II. Title.
[DNLM: 1. Depressive Disorder—diagnosis. 2. Primary Health Care. WM 171 C881s 2009]
RC537.M5625 2009
616.85′27075—dc22
2009007863

9 8 7 6 5 4 3 2 1

Printed in the United States of America

on acid-free paper
Contents

List of Contributors, xi
Preface, xv
Wayne Katon

1. Is the Syndrome of Depression a Valid Concept?, 3

Alex J. Mitchell and Mark Zimmerman
What is Meant by Depression?, 3
Value and Validity of the Syndrome Concept, 7
Diagnostic Checklists (including DSM and ICD), 10
Unstructured (Unassisted) Clinician Diagnosis, 15
Structured and Semi-Structured Assisted Diagnostic
Interviews, 19
Conclusion, 22
References, 24

2. Overview of Depression Scales and Tools, 29

Alex J. Mitchell
Background, 29
The Classic Severity Scales (1960–1980), 36
The New Severity Scales (1981–2008), 39
The Future of Screening Scales, 44
References, 51

3. Why Do Clinicians Have Difficulty Detecting Depression?, 57

Alex J. Mitchell
Introduction to the Problem of Over- and Under-Detection, 57
Predictors of Detection, 62
Patient and Clinician Influences on Detection, 66
Illness-Related Influences on Detection, 71
Conclusions, 74
References, 75

4. How Can Existing Mood Scales Be Improved? How to Test, Refine, and Improve Existing Scales, 83

Adam B. Smith
Introduction, 83
The Rasch Model and Other Item Response Models, 86
Conclusion, 95
References, 96

5. How Do We Know When a Screening Test is Clinically Useful?, 99

Alex J. Mitchell
How Do Clinicians Make a Diagnosis?, 99
Scientific Aspects of Diagnostic Accuracy, 103
Clinical Aspects of Diagnostic Accuracy, 105
Testing Screening via Implementation Studies, 109
Conclusions, 111
References, 111

6. Clinical Judgment and the Influence of Screening on Decision Making, 113

Howard N. Garb
Introduction, 113
Research on Clinical Judgment, 114
The Limits of Screening, 119
References, 120

7. Implementing Screening as Part of Enhanced Care: Screening Alone is Not Enough, 123

Simon Gilbody and Dan Beck
The Case for Screening, 123
Screening and Enhanced Care for Depression, 128
New and Additional Evidence Relating to Enhanced Care, 128
Is Screening a Necessary Intervention to Improve the Quality and
Outcome of Care?, 129
To Screen or Not to Screen?, 136
References, 137

8. Technological Approaches to Screening and Case Finding for Depression, 143

William H. Rogers, Debra Lerner, and David A. Adler
Technological Methods of Screening for Depression, 144
Ten Issues When Developing Computerized Screening for
Depression, 147
Examples of Implementation of Computerized
Screening for Depression, 150
Discussion, 153
Conclusion, 154
References, 154

9. Screening for Depression in Primary Care: Can It Become More Efficient?, 161

Kathryn M. Magruder and Derik E. Yeager
Introduction, 161
Epidemiology of Depression in Primary Care, 162
Is Screening for Depression in Primary Care Worthwhile?, 165
Which Screening Tool Should Be Used?, 169
Implementing Screening in Primary Care, 178
What Developments Are on the Horizon?, 183
Conclusions, 185
References, 185

10. Screening for Depression in Medical Settings: Are Specific Scales Useful?, 191

Gordon Parker and Matthew Hyett
An Introductory Logic, 191
Depression in the Medically Ill, 192
“False-Positive” Depression Reflecting Confounding by Physical
Symptoms Associated with Medical Illness, 193
Screening Measures Used to Assess Depression in the
Medically Ill, 194
Discussion, 198
References, 199

11. Screening for Depression in Medical Settings: The Case Against Specific Scales, 203

Fariba Babaei and Alex J. Mitchell
Overview of Depression in Physical Disease, 203
Defining Somatic Symptoms, 205
Diagnostic Accuracy of Somatic Symptoms in Depression, 209
Evidence For and Against Somatic Symptoms when Diagnosing
Comorbid Depression, 211
Implications for Screening, 217
References, 236

12. Screening for Depression in Neurologic Disorders, 241

Andres M. Kanner
Depression in Stroke, 242
Depression in Multiple Sclerosis, 246
Depression in Epilepsy, 249
Depression in Parkinson’s Disease, 255
Conclusions, 258
References, 258

13. Screening for Depression in Cancer Care, 265

Linda E. Carlson, Sheena K. Clifford, Shannon L. Groff,
Olga Maciejewski, and Barry D. Bultz
Prevalence of Depression in Cancer Care, 265
Screening Methods for Depression, 266
Screening for Depression in Oncology, 267
Implementing Screening Programs in Oncology Settings, 276
Special Issues in Screening Cancer Patients, 292
Summary, Integration, Future Directions, 293
Acknowledgments, 294
References, 295

14. Screening for Depression in Perinatal Settings, 299

Jodi Barton and Philip Boyce
Introduction: Perinatal Screening in Context, 299
Why Screen, and What Are We Screening For?, 301
Screening Practices in Perinatal Settings, 303
Screening Guidelines and Recommendations, 304
Evidence-Based Comparison of Screening Methods, 305
Implementation in Practice: Does Screening Make any
Real-World Difference?, 310
Service Delivery and Treatment Implications, 311
Summary and Key Recommendations, 313
References, 314

15. Screening in Cardiovascular Care, 317

Brett D. Thombs and Roy C. Ziegelstein
Depression in Cardiovascular Disease, 318
The Prevalence of Depression in Cardiovascular Disease, 319
Screening Instruments for Depression in Cardiovascular Care, 320
Recommendations for Evaluation and Treatment of Patients
in Cardiovascular Care, 326
Conclusions, 328
References, 329

16. Screening in Diabetes Care: Detecting and Managing Depression in Diabetes, 335

Norbert Hermanns and Bernhard Kulzer
Depression in Diabetes is a Major Health Problem, 337
Screening Tests, 340
Treatment Options, 343
Screening Program, 344
Conclusions for Clinical Practice, 345
References, 346

17. Commentary and Integration: Is it Time to Routinely Screen for Depression in Clinical Practice?, 349

James C. Coyne
Integration: Deflating the Puffer Phenomenon and Making
the Case Against Screening, 364
References, 366

Appendix, 371
Index, 385
List of Contributors

David Adler, Professor of Psychiatry and Medicine, Tufts University School
of Medicine, and Senior Psychiatrist, Department of Psychiatry and ICRHPS,
Tufts Medical Center

Fariba Babaei, Specialist Trainee in Psychiatry, Lincolnshire Partnership
Trust, Grantham, UK

Jodi Barton, Research Co-ordinator, Westmead Perinatal Psychiatry &
Clinical Research Unit, Westmead Hospital

Dan Beck, Research Fellow, Department of Health Sciences, University of
York, UK
Philip Boyce, Professor of Psychiatry, Department of Psychological Medicine,
University of Sydney, Westmead Hospital
Barry D. Bultz, Director, Department of Psychosocial Resources, Tom Baker
Cancer Centre, and Head and Adjunct Professor, Division of Psychosocial
Oncology, Department of Oncology, Faculty of Medicine, University of Calgary,
Calgary, Alberta, Canada

Linda E. Carlson, Enbridge Research Chair in Psychosocial Oncology,
Associate Professor, Division of Psychosocial Oncology, Department of
Oncology, Faculty of Medicine, University of Calgary, and Clinical
Psychologist, Tom Baker Cancer Centre, Calgary, Alberta, Canada
Sheena K. Clifford, Department of Psychosocial Resources, Tom Baker
Cancer Centre, Alberta Cancer Board/Alberta Health Services, Calgary,
Alberta, Canada

James C. Coyne, Director, Behavioral Oncology Program, Abramson Cancer
Center, and Professor of Psychology, Department of Psychiatry, University of
Pennsylvania School of Medicine

Howard N. Garb, Lackland Air Force Base

Simon Gilbody, Professor of Psychological Medicine and Health Services
Research, Department of Health Sciences, University of York, UK

Shannon L. Groff, Department of Psychosocial Resources, Tom Baker
Cancer Centre, Alberta Cancer Board
Norbert Hermanns, Head of the Research Institute of the Diabetes Academy
Mergentheim

Matthew Hyett, Research Assistant, Black Dog Institute, Sydney, Australia

Andres M. Kanner, Department of Neurological Sciences, Rush
Medical College, Rush Epilepsy Center, Rush University Medical Center,
Chicago, IL

Wayne Katon, Professor and Vice Chair of Psychiatry and Behavioral
Sciences, Director of Division of Health Services and Epidemiology,
University of Washington Medical School, Seattle, WA
Bernhard Kulzer, Head of the Psychosocial Department of the Diabetes
Centre Mergentheim
Debra Lerner, Associate Professor of Medicine and Psychiatry, Tufts
University School of Medicine (TUSM), and Senior Researcher, ICRHPS,
Tufts Medical Center

Olga Maciejewski, Department of Psychosocial Resources, Tom Baker
Cancer Centre, Alberta Cancer Board/Alberta Health Services, Calgary,
Alberta, Canada
Kathryn M. Magruder, Veterans Administration Medical Center,
Charleston, SC, and Department of Psychiatry and Behavioral Sciences,
Medical University of South Carolina, Charleston, SC
Alex J. Mitchell, Consultant in Liaison Psychiatry, Leicester General Hospital,
Leicester, and Honorary Senior Lecturer in Liaison Psychiatry, Department of
Cancer & Molecular Medicine, Leicester Royal Infirmary, UK
Gordon Parker, Scientia Professor, School of Psychiatry, University of
New South Wales, Sydney, Australia, Executive Director, Black Dog
Institute
William Rogers, Senior Statistician, Institute of Clinical Research and Health
Policy Studies (ICRHPS), Tufts Medical Center
Adam B. Smith, Lecturer in Quantitative Methods, Centre for Health and
Social Care, Leeds Institute of Health Sciences, University of Leeds, UK
Brett D. Thombs, Department of Psychiatry, McGill University and Jewish
General Hospital, Montreal, Quebec

Derik E. Yeager, Department of Biometry, Biostatistics, and Epidemiology,
Medical University of South Carolina, Charleston, SC
Roy C. Ziegelstein, Department of Medicine, Johns Hopkins University
School of Medicine, Baltimore, MD

Mark Zimmerman, Department of Psychiatry and Human Behavior, Brown
University School of Medicine, Rhode Island Hospital, Providence, RI
Preface

Researchers became interested in screening patients for depression in primary
care in the early 1980s because of evidence of poor recognition of depression
by primary care physicians and gaps in the adequacy of treatment.1 Because of
extensive epidemiologic research as well as the development of antidepressant
medications that have fewer side effects and evidence-based brief therapies,
recognition rates of depression by primary care physicians have improved over
the past two decades, with recent studies suggesting that as many as 50% to
65% of patients are accurately diagnosed.2 Most studies also show that greater
severity of depression and increased functional impairment are associated with
higher rates of recognition.3 A study by Rost and colleagues that examined
recognition rates over a 6-month period rather than for just one visit also found
higher rates of accurate diagnosis by primary care physicians.4 This latter study
is important because primary care physicians often make diagnoses over time
as they work up patients over several visits.
Studies have also shown that a much higher percentage of patients in primary
care are exposed to antidepressant medications compared to two decades ago.5
However, there are many remaining gaps in the quality of care for depression in
primary care: only 20% of patients receive the Health Employer Data and
Information Set (HEDIS)-recommended three or more visits in the first 90
days after starting an antidepressant and only 40% to 50% remain on medication
at 6 months.6 Over the past 20 years (from the tricyclic era to the selective
serotonin reuptake inhibitor era), studies consistently report that only 40% of
patients started on antidepressants for major depression recover (a greater than
50% decrease in symptoms) by 4 to 6 months.7 Fewer than 10% of patients with
major depression in primary care receive evidence-based psychotherapy.5 There
is clearly room to improve the quality of care for patients with major
depression, from screening and improved detection to healthcare models that
provide enhanced exposure to evidence-based treatments.
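The two quality thresholds quoted above can be made concrete. The sketch below is an illustration only, assuming symptom scores on any severity scale where higher means worse: it encodes response as a greater than 50% decrease from baseline, as the text defines it, and a simplified reading of the "three or more visits in the first 90 days" target. It is not the official HEDIS specification, and the function names are hypothetical.

```python
from datetime import date, timedelta


def is_response(baseline_score: float, followup_score: float) -> bool:
    """Response as defined in the text: a greater than 50% decrease in
    symptom score from baseline (higher scores = more severe)."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    relative_decrease = (baseline_score - followup_score) / baseline_score
    return relative_decrease > 0.5


def visits_in_first_90_days(start: date, visits: list[date]) -> int:
    """Count visits after the antidepressant start date, up to and including
    day 90 (assumption: the start-date visit itself is not counted)."""
    window_end = start + timedelta(days=90)
    return sum(1 for v in visits if start < v <= window_end)


def meets_visit_target(start: date, visits: list[date], minimum: int = 3) -> bool:
    """Simplified three-or-more-visits check, not the official HEDIS measure."""
    return visits_in_first_90_days(start, visits) >= minimum


if __name__ == "__main__":
    print(is_response(20, 9))   # True: a 55% decrease
    print(is_response(20, 10))  # False: exactly 50% is not "greater than 50%"
    start = date(2024, 1, 1)
    follow_ups = [date(2024, 1, 15), date(2024, 2, 1), date(2024, 3, 20)]
    print(meets_visit_target(start, follow_ups))  # True: three visits by day 90
```

Using a strict "greater than" keeps the boundary case (exactly half the baseline score) classified as non-response, matching the wording of the text.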

One unexpected consequence of primary care physicians’ increased interest in
the detection and treatment of depression is that approximately half of
patients started on medication for depression actually meet DSM-IV criteria
for minor depression.8 This is important because antidepressant-versus-placebo
trials have generally shown high rates of placebo response in patients with
minor depression and a lack of active drug-versus-placebo differences.9
Screening for depression may actually increase the number of patients with
minor depression who are potentially treated, because many patients cluster
around the DSM-IV diagnostic threshold and, depending on the stressful life
events of the past few days, may or may not meet criteria for major depressive
disorder. Patients with minor depression or adjustment reactions to stressful
life events must be distinguished from those with a history of major
depression who have significant residual symptoms necessitating active
treatment. For patients who have mild major depression, brief counseling,
watchful waiting, and rescreening 2 to 4 weeks later may allow better
recognition of whether the patient needs treatment with medication or
psychotherapy.
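Why patients near the threshold can move between categories is easy to see with a symptom count. The sketch below is a simplification, assuming only the symptom-count part of the DSM-IV rules: major depression requires five or more of the nine criterion symptoms, and the DSM-IV research criteria for minor depression require two to four; in both, at least one symptom must be depressed mood or loss of interest. Duration, impairment, and exclusion criteria are deliberately omitted, and the symptom labels are hypothetical shorthand.

```python
# Hypothetical shorthand labels for the nine DSM-IV criterion symptoms.
CORE = {"depressed_mood", "loss_of_interest"}
OTHER = {
    "appetite_or_weight_change",
    "sleep_disturbance",
    "psychomotor_change",
    "fatigue",
    "worthlessness_or_guilt",
    "poor_concentration",
    "thoughts_of_death",
}
ALL_SYMPTOMS = CORE | OTHER


def classify(symptoms: set[str]) -> str:
    """Classify by symptom count only (duration, impairment, exclusions omitted)."""
    unknown = symptoms - ALL_SYMPTOMS
    if unknown:
        raise ValueError(f"unrecognized symptoms: {sorted(unknown)}")
    if not symptoms & CORE:
        return "below threshold"  # neither core symptom present
    count = len(symptoms)
    if count >= 5:
        return "major"
    if count >= 2:
        return "minor"
    return "below threshold"


if __name__ == "__main__":
    week1 = {"depressed_mood", "fatigue", "sleep_disturbance", "poor_concentration"}
    week2 = week1 | {"worthlessness_or_guilt"}  # one additional symptom
    print(classify(week1))  # minor
    print(classify(week2))  # major
```

In the usage example a single extra symptom moves a patient from minor to major depression, which is the instability around the threshold that the paragraph describes.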
If screening for depression is to be integrated into primary care, healthcare
organizations must decide which screening tool is optimal. Primary care
organizations, the American Psychiatric Association, and many research
foundations have recommended the Patient Health Questionnaire (PHQ-9) as the
optimal depression screening tool in primary care. The PHQ-9 has the advantage
of also helping to measure the severity of depression (scores range from 0 to
27) and, at a score of 10 or above, has high sensitivity and specificity for
the diagnosis of major depression compared with structured psychiatric
interviews.10
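As an illustration of how the PHQ-9 total is formed and read, the sketch below assumes the standard format of nine items, each scored 0 to 3, which gives the 0 to 27 range. The severity bands (5, 10, 15, and 20 marking mild, moderate, moderately severe, and severe) follow the PHQ-9 validation paper cited above (reference 10); the screen-positive cut-off is a parameter defaulting to 10. The function names are hypothetical, and a positive screen is not a diagnosis.

```python
def phq9_total(items: list[int]) -> int:
    """Sum nine PHQ-9 item scores, each 0 (not at all) to 3 (nearly every day)."""
    if len(items) != 9:
        raise ValueError("PHQ-9 has exactly nine items")
    if any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(items)


def phq9_severity(total: int) -> str:
    """Map a 0-27 total onto severity bands; below 5 is treated as minimal."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must lie between 0 and 27")
    if total >= 20:
        return "severe"
    if total >= 15:
        return "moderately severe"
    if total >= 10:
        return "moderate"
    if total >= 5:
        return "mild"
    return "minimal"


def phq9_screen_positive(total: int, cutoff: int = 10) -> bool:
    """Screen-positive at or above the cut-off; still requires clinical assessment."""
    return total >= cutoff


if __name__ == "__main__":
    total = phq9_total([2, 2, 2, 1, 1, 1, 1, 0, 0])
    print(total, phq9_severity(total), phq9_screen_positive(total))  # 10 moderate True
```

Keeping the cut-off as a parameter makes it easy to examine how sensitivity and specificity trade off at other thresholds rather than hard-coding a single value.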
The U.S. Preventive Services Task Force recommended routine depression
screening in primary care in systems that have been reorganized to provide
effective treatment for depression.11 This reflects the fact that studies that
tested depression screening alone showed mild to modest improvement in
the quality of depression treatment provided, but generally no effect on
depression outcomes.12 What do we know about methods to organize care to improve
outcomes of depression?
Although screening for depression alone has not been shown to improve
outcomes, when screening is paired with an organized system of depression
care, multiple studies have shown that depression outcomes can be
improved.13 The chapter by Gilbody reviews a recent meta-analysis of
an intervention called “collaborative care.” A total of 37 randomized trials
that compared collaborative care with usual primary care found that
collaborative care was associated with a twofold increase in adherence to
antidepressant medication and improvements in depression that lasted 2 to
5 years.13
The key elements of the most successful collaborative care interventions
included two core components. The first is a depression care manager who
improves patient education and, with frequent telephone and/or in-person
contacts, tracks depressive symptoms, side effects, and adherence to
treatment.14 The care manager facilitates return appointments with the primary
care doctor or, in some instances, a mental health specialty referral for
patients with persistent symptoms, problematic side effects, or poor
adherence.14 The second crucial component is supervision of the care manager
by a psychiatrist who recommends changes in medication based on clinical
response and side effects. Many recent collaborative care trials have also
used psychologists’ skills to teach care managers motivational interviewing
techniques and brief, evidence-based psychotherapies such as problem-solving
therapy.15
In summary, this excellent book summarizes two decades of research on
depression screening and quality-improvement efforts in primary care. We
now have state-of-the-art depression screening tools, and research studies have
shown that pairing depression screening with evidence-based models that
enhance exposure to antidepressant medication and evidence-based
psychotherapies can markedly improve depression outcomes for patients with
major depression.

Wayne Katon

References
1. Zung WW, Magill M, Moore JT, et al. Recognition and treatment of depression in a
family medicine practice. J Clin Psychiatry. 1983;44:3–6.
2. Katon WJ, Simon G, Russo J, et al. Quality of depression care in a population-based
sample of patients with diabetes and major depression. Med Care. 2004;42:1222–1229.
3. Coyne JC, Schwenk TL, Fechner-Bates S. Nondetection of depression by primary care
physicians reconsidered. Gen Hosp Psychiatry. 1995;17:3–12.
4. Rost K, Zhang ML, et al. Persistently poor outcomes of undetected major depression in
primary care. Gen Hosp Psychiatry. 1998;20:12–20.
5. Olfson M, Marcus SC, Druss B, et al. National trends in the outpatient treatment of
depression. JAMA. 2002;287:203–209.
6. Druss BG, Miller CL, Rosenheck RA, et al. Mental health care quality under managed
care in the United States: a view from the Health Employer Data and Information Set
(HEDIS). Am J Psychiatry. 2002;159:860–862.
7. Simon GE. Evidence review: efficacy and effectiveness of antidepressant treatment in
primary care. Gen Hosp Psychiatry. 2002;24:213–224.
8. Katon W, Von Korff M, Lin E, et al. Collaborative management to achieve treatment
guidelines. Impact on depression in primary care. JAMA. 1995;273:1026–1031.
9. Barrett JE, Williams JW, Jr., Oxman TE, et al. Treatment of dysthymia and minor
depression in primary care: a randomized trial in patients aged 18 to 59 years. J Fam
Pract. 2001;50:405–412.
10. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity
measure. J Gen Intern Med. 2001;16:606–613.
11. Pignone MP, Gaynes BN, Rushton JL, et al. Screening for depression in adults: a
summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern
Med. 2002;136:765–776.
12. Katon W, Gonzales J. A review of randomized trials of psychiatric consultation-liaison
studies in primary care. Psychosomatics. 1994;35:268–278.
13. Gilbody S, Bower P, Fletcher J, et al. Collaborative care for depression: a systematic
review and cumulative meta-analysis. Arch Intern Med. 2006;166:2314–2321.
14. Katon W, Unützer J. Collaborative care models for depression: time to move from
evidence to practice. Arch Intern Med. 2006;166:2304–2306.
15. Unützer J, Katon W, Callahan CM, et al. Collaborative care management of late-life
depression in the primary care setting: a randomized controlled trial. JAMA.
2002;288:2836–2845.
1
IS THE SYNDROME OF DEPRESSION
A VALID CONCEPT?

Alex J. Mitchell and Mark Zimmerman

1. What is Meant by Depression?
2. Value and Validity of the Syndrome Concept
3. Diagnostic Checklists (including DSM and ICD)
4. Unstructured (Unassisted) Clinician Diagnosis
5. Structured and Semi-Structured Assisted Diagnostic Interviews
6. Conclusion

Context
Depression is an everyday term, but if clinical management is to be empirically
based, there needs to be a valid and reliable definition of the disorder that is
distinct from normal sadness. The validity of the concept and all studies of
screening for depression are hampered by the absence of a gold standard.
Nevertheless, various thorough methods of assessment may help to improve the
clinical utility of our concept of depression.

1. What is Meant by Depression?

This book is built around the premise that major depressive disorder (MDD)
exists in a way that is recognizable time and again by clinicians around the
world. Considerable effort has been expended in developing and refining
methods to measure depression. This chapter takes a step back and asks
whether this effort is built upon a solid foundation. This begins with an
important question: What is the purpose of making a meaningful diagnosis
in any field of medicine? We suggest it is primarily to gain consensus and
knowledge that may help individuals and populations who have health-related
“meetable unmet needs.” A medical diagnosis (spurious or not) has several other
benefits (Textbox 1.1). It facilitates agreement with colleagues, lends
confidence to patients, adds legitimacy to treatments, and may allow the
development of targeted interventions. Because many conditions can be
successfully treated without knowing the true etiology or the precise
diagnosis, the lack of a gold standard should not be a cause of therapeutic
nihilism. Consider neurologists attempting to treat a midlife inherited chorea
in 1862. Meticulous clinical method could bring some success despite the
absence of a name and a description for another 10 years and the absence of a
known etiology for another 110 years. Although many early treatments were
based largely on placebo effects or environmental manipulation, once a
definitive cause is found and the pathophysiologic mechanism is revealed, the
potential for treatment, once small, becomes vast.

Textbox 1.1. Levels of Diagnostic Certainty in Psychiatry

Highest: Externally validated by a “perfect” biological test
High: Consensus expert panel performing longitudinal evaluation using all possible data
Medium to High: Structured or semi-structured interview performed by a trained interviewer or clinician
Low to High: Severity questionnaires rated by the patient or clinician
Low to Medium: Unstructured, unassisted interview performed by an interested clinician
Low: Unstructured, unassisted interview performed by an inexperienced (or uninterested) clinician
Yet there is an even more fundamental issue. Kraepelin believed the
major psychiatric disorders were “natural disease entities” simply
awaiting the discovery of a specific medical cause. After intensive effort,
the search for fundamental causes was abandoned, and nosology came to be
underwritten by the internal cohesion of symptoms and signs.1 What if
depression has no single pathophysiologic explanation and is a complex
manifestation of severe external stress?2 Would our concept be invalid, and
would existing treatment be rendered obsolete overnight? Similarly, if severe
stress and mild depression were closely related, then finding a test that
separated them would be difficult to the point of impossibility (Fig. 1.1).
After many decades of debate, it is not at all clear that depression is a
discrete entity that justifies a categorical classification, as opposed to a
continuum merging with normal, healthy but unhappy people.3 In the
continuum argument, the distribution of depressive symptoms would
theoretically approximate a skewed normal or half-normal distribution
with no point of rarity (Fig. 1.2).4 Cloninger stated that there is no
empirical evidence for natural boundaries between major syndromes and
that “no one has ever found a set of symptoms, signs, or tests that separate
mental disorders fully into non-overlapping categories.”5 Yet all current
diagnostic systems that include MDD appear to assume there is a distinct
syndrome (depressive disorder as distinct from depressive symptoms) and
try to suggest an optimal method to identify it (Fig. 1.3). Even if this
approach were correct and the current DSM-IV nosology entirely perfect, there
would be a significant danger of overrelying on the concept of MDD to the
exclusion of other, under-researched forms. In other words, given recent
evidence, psychiatrists would be well advised to pay as much attention to
minor (mild and syndromal) disorders as diabetologists are now paying to
impaired glucose tolerance.6

[Figure 1.1: overlapping score distributions for the Normal Stress and Depressed groups; x-axis: Score on Hypothetical Diagnostic Test; y-axis: Number of Individuals; labels: Point of Partial Rarity, Optimum Cut-off value, True –ve, True +ve, False –ve, False +ve]
Figure 1.1. Hypothetical distribution of test scores in two related conditions. Two distinct
conditions should be separated by a point of rarity on at least one fundamental measure
(see also Fig. AP.4).

[Figure 1.2: bar chart titled “Distribution of HADS Scores in Cancer Outpatients (n=3071)”; x-axis: HADS score; y-axis: number of patients, 0 to 3000]
Figure 1.2. Distribution of HADS scores in cancer outpatients (n = 3,071). This
continuous distribution of HADS scores in primary care and secondary (cancer) care
illustrates a skewed normal distribution. Data from Thompson et al. Br J Psychiatry.
2001;179:317–323 and Sharpe et al. Br J Cancer. 2004;90:314–320.

[Figure 1.3: bar chart titled “Distribution of DSM-IV Symptoms of Depression in Zurich Study”; x-axis: number of DSM-IV symptoms, zero to nine; y-axis: 0 to 100]
Figure 1.3. Distribution of DSM-IV symptoms from the Zurich study. The sample comprised
591 individuals originally selected in 1978 from the total population of 18- and 19-year-olds
in Zurich, Switzerland, based on their scores on the Symptom Checklist-90 (SCL-90-R)
(Derogatis, 1977). Two thirds of the sample was randomly selected from members of the
total population who scored above the 85th percentile on the SCL-90-R, and one third was
randomly selected from the remainder of the total population. Reprinted from Journal of
Affective Disorders 62, Angst J, Merikangas KR, Multi-dimensional criteria for the
diagnosis of depression, 7–15, Copyright (2001).
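Figure 1.1 can be read as a problem of choosing a cut-off on overlapping distributions. The sketch below is illustrative: given scores and a reference classification (assumed here, since the chapter stresses that no gold standard exists for MDD), it counts true and false positives and negatives at each candidate cut-off and picks the one maximizing Youden's J (sensitivity + specificity − 1). Youden's J is one common definition of an “optimum” cut-off, swapped in for illustration; the figure does not state which criterion it uses. The data and function names are hypothetical.

```python
def confusion_at_cutoff(scores, is_case, cutoff):
    """Treat score >= cutoff as test-positive; return (tp, fp, tn, fn)."""
    tp = fp = tn = fn = 0
    for score, case in zip(scores, is_case):
        positive = score >= cutoff
        if positive and case:
            tp += 1
        elif positive:
            fp += 1
        elif case:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def best_cutoff_youden(scores, is_case):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.
    On ties, the lowest cut-off is kept."""
    n_cases = sum(is_case)
    if n_cases == 0 or n_cases == len(is_case):
        raise ValueError("need both cases and non-cases")
    best = None
    for cutoff in sorted(set(scores)):
        tp, fp, tn, fn = confusion_at_cutoff(scores, is_case, cutoff)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


if __name__ == "__main__":
    # Hypothetical scores: the groups overlap (a non-case scores 7, a case scores 6).
    scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    is_case = [False] * 5 + [True, False, True, True, True]
    print(best_cutoff_youden(scores, is_case))
```

Because the groups overlap, no cut-off avoids both kinds of error: a lower threshold buys sensitivity at the cost of false positives, which is the trade-off the figure depicts.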

2. Value and Validity of the Syndrome Concept

The concept of a syndrome is fundamental to diagnostic classification and may
be valuable even if imperfect.7 Without the concept of a syndrome, a disorder
would be defined by a single symptom or a simple symptom count. A syndrome
is a characteristic collection of symptoms that cluster in a way determined by
the underlying pathophysiology, even if that mechanism is unknown. Careful
identification of many psychiatric syndromes and their relationships has
formed a detailed family of mental disorders not dissimilar to the taxonomy
proposed by Carl Linnaeus (1707–1778).
In defining clinical syndromes, we rely on certain essential or core
symptoms occurring commonly in those with the disorder but rarely in
those without (Textbox 1.2). By the same token, we often ignore other
symptoms that occur without much discrimination. Hence, some symptoms
are more important diagnostically than others, but without large samples and
rigorous examination, it is not obvious which ones these are. Further, life is
rarely simple, and rarely is any symptom both entirely unique to a psychiatric
disorder and at the same time always manifest. If it were, then when this
particular symptom was absent, we would know the disorder itself was
impossible, and when it was present, we would know the disorder was present.
We would therefore have a single-question diagnostic test with perfect
sensitivity and specificity (see Chapter 5). In MDD, DSM-IV suggests that the
core features involve dysphoria (low mood) and anhedonia (loss of interest),
and ICD-10 suggests that fatigue should also be an essential feature.8 In
addition to these symptoms, aspects such as clinical significance, duration,
disability, and distress have been added as requirements in many diagnostic
categories. We suggest it is no longer sufficient for an expert panel to
mandate such features, no matter how logical they seem, because their
predictive values will remain uncertain until tested. In fact, all aspects of a
definition (the symptoms, signs, associated features, and rules binding them
together) should be amenable to clarification and empirical testing. If a
syndrome is adopted too easily, the concept can become a pitfall, as Kendell
and Jablensky explained: “Once a diagnostic concept such as syndrome has come
into general use, it tends to become reified.”9 In other words, its validity is
assumed rather than tested.

Textbox 1.2. Types of Validity Testing

Content validity (Strength: Weak): The degree to which all fields of interest are measured
Criterion validity (Strength: Strong): Agreement against a criterion that is external to the measuring instrument itself
Construct validity (Strength: Moderate): Agreement with other measures consistent with theoretically derived hypotheses
Procedural validity (Strength: Weak): Agreement with an existing procedure
How, then, can a syndrome be tested and better tests developed? This is
discussed in detail in Chapters 4 and 5, but in brief, accuracy is usually
determined by validity and reliability. Reliability refers to the extent to
which an observation yields the same results on repeated independent assess-
ments. Essentially, this is a measure of consensus between assessors. Validity,
derived from the Latin validus, meaning strong, refers to how well
the instrument measures what it purports to measure (see Textbox 1.2). In
essence, this is a measure of truth: how closely the measurement agrees with the
actual disorder, assuming the disorder could be defined by some criterion reference (or
gold standard). In MDD there is no accepted gold standard,10 and therefore
reliability and validity testing must be reduced to measures of agreement,
where the critical question becomes: How good is the comparison? In other
medical specialties, aspects of the history such as the nature of chest pain
have been subjected to diagnostic validity testing in a similar way to
established investigations such as the electrocardiogram.11,12 In psychiatry
(outside of organic brain disease), such objective tests are rarely if ever
available. Many influences favor the adoption of a medical model in which
an etiologic agent, a pathologic process, and symptoms and signs are
assumed to be present even if unknown. This is often highly acceptable to
patients, clinicians, and other interested parties (eg, the pharmaceutical
industry), not least because stigma may be reduced and help-seeking and
adherence encouraged. The flip side is that patient responsibility may be
diminished and biologic treatments may be overprescribed. If the medical
model of depression is correct, then eventually a definitive core disease
process underlying depression will be found and a diagnostic test developed
that (regardless of convenience) will enable current clinical diagnostic
methods to be fully evaluated. If the medical model of depression is
incorrect, then a definitive biologic test will never be developed, and we will
continue to develop proxies of illness that may nevertheless correspond to
important correlates of disorder and suffering, such as treatment response,
course, and quality of life.
The astute reader will probably conclude that measures of reliability
and validity in psychiatry (and by implication diagnosis itself) are
essentially all tests of agreement, albeit against different standards. Reliability
is agreement with peers, and validity is agreement with an accepted
method. As no group has yet found a robust biologic test for depression,
most work has focused on attempts to improve the reliability of
assessments conducted by researchers and clinicians. Often this involves refine-
ment of the clinical interview using methods that assist the clinician.
Semi-structured interviews provide questions that might best elicit
symptoms, but the clinician retains flexibility to deviate from them if necessary.
Structured interviews provide questions that must be asked as written,
purposely removing flexibility, with the useful benefit that clinical training
is not a prerequisite and large population surveys using lay interviewers
become possible. One level of assistance to clinicians that does not
interfere with the clinical interview is provision of symptom checklists,
together with the rules for their combination (Textbox 1.3). This
essentially forms the basis of ICD-10 and DSM-IV.
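Because reliability in this sense comes down to agreement between raters, it is usually summarized with a kappa statistic, which corrects raw percentage agreement for the agreement expected by chance alone. The sketch below computes Cohen's kappa for two raters making a yes/no diagnosis of depression; the counts are hypothetical and chosen only for illustration, and it is an assumption that every kappa quoted in this chapter was computed in exactly this two-rater form.

```python
def cohens_kappa(both_yes: int, r1_yes_r2_no: int, r1_no_r2_yes: int, both_no: int) -> float:
    """Cohen's kappa for two raters making a binary (depressed / not depressed) call."""
    n = both_yes + r1_yes_r2_no + r1_no_r2_yes + both_no
    # Observed agreement: proportion of cases where both raters give the same call.
    p_observed = (both_yes + both_no) / n
    # Proportion of "depressed" calls made by each rater (the table margins).
    p1_yes = (both_yes + r1_yes_r2_no) / n
    p2_yes = (both_yes + r1_no_r2_yes) / n
    # Agreement expected by chance if the two raters judged independently.
    p_expected = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts for 100 patients assessed independently by two clinicians.
kappa = cohens_kappa(both_yes=20, r1_yes_r2_no=10, r1_no_r2_yes=5, both_no=65)
print(f"kappa = {kappa:.3f}")
```

Here the raters agree on 85% of patients, but because 60% agreement would be expected by chance given how often each rater calls a case, kappa works out to 0.625. This is why a high percentage agreement can coexist with only a moderate kappa.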

Textbox 1.3. Development of Diagnostic Checklists

1972  Feighner, Diagnostic Criteria (FDC): Primary Depression
1978  Spitzer, Research Diagnostic Criteria (RDC): Major Depressive Disorder
1980  Diagnostic and Statistical Manual III: Major Depressive Episode
1987  Diagnostic and Statistical Manual III-R: Major Depressive Episode
1990  ICD-10 International Classification of Diseases: Mild, Moderate, or Severe Depression
2000  Diagnostic and Statistical Manual IV: Major Depressive Episode
2012  ICD-11 International Classification of Diseases; Diagnostic and Statistical Manual V

3. Diagnostic Checklists (including DSM and ICD)

Diagnostic checklists are lists of features, together with the rules for making
a particular diagnosis. If the criteria are monothetic, then all the items must be
present; if polythetic, then only a proportion are required. If features are
necessary, then those specific features must be present; if sufficient, then the
presence of certain criteria alone is enough, and no others are needed. Several
checklists that generate diagnoses according to one or more diagnostic systems
have been proposed (Textbox 1.4).13–15 Checklists leave the clinician to conduct
the clinical interview in any way he or she feels appropriate. Advanced systems may use diagnostic
algorithms that prioritize certain items and use more complex rules, such as
‘‘if x, then y.’’ DSM and ICD-10 use diagnostic checklists but also include
some suggestions for the interview itself. That said, a diagnostic interview
defined only by DSM-IV/ICD-10 lacks clearly defined probe questions,
requiring clinicians to formulate their own approach. Although this adds to
acceptability, it equally contributes to interrater variability.16 Some
consider DSM and ICD distinct from other checklist methods because of the
claim that DSM and ICD are operationalized, that is, each and every step is
described and subject to unambiguous instructions as well as reliability or
validity testing. This is probably not the case. Efforts to measure the
reliability of DSM-IV have been published.17

Textbox 1.4. Checklists for Aiding Psychiatric Diagnosis

Lists of Integrated Criteria for the Evaluation of Taxonomy (LICET)
LICET-D for depressive disorders assembles all criteria from 9 diagnostic systems.
Operational Criteria Checklist (OPCRIT)
OPCRIT generates diagnoses for 13 diagnostic systems and has been proposed
to generate diagnoses directly from medical notes.
ICD-10 Symptom Checklist
Developed by Janca; takes about 15 minutes.
International Diagnostic Checklists (IDCL)
Two 30-item lists, one for ICD-10 and one for DSM-IV.
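The difference between monothetic and polythetic rules, and the role of a necessary feature, can be sketched in a few lines. The five-item checklist labeled "A" to "E" and the choice of "A" as the necessary feature are hypothetical; the point is only how the same set of present features can pass one rule and fail another.

```python
from typing import Iterable, Set


def monothetic(present: Set[str], criteria: Iterable[str]) -> bool:
    """Monothetic rule: every listed criterion must be present."""
    return all(item in present for item in criteria)


def polythetic(present: Set[str], criteria: Iterable[str], minimum: int,
               necessary: Iterable[str] = ()) -> bool:
    """Polythetic rule: at least `minimum` listed criteria, plus every necessary one."""
    count = sum(1 for item in criteria if item in present)
    return count >= minimum and all(item in present for item in necessary)


# Hypothetical five-item checklist in which feature "A" is necessary.
checklist = ["A", "B", "C", "D", "E"]
print(monothetic({"A", "C", "D"}, checklist))                      # False: B and E missing
print(polythetic({"A", "C", "D"}, checklist, 3, necessary=["A"]))  # True
print(polythetic({"B", "C", "D"}, checklist, 3, necessary=["A"]))  # False: A absent
```

DSM-IV major depression is polythetic in this sense, with the added twist that its necessary element is a choice between two core features rather than a single fixed item.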

ICD and DSM

The World Health Organization (WHO) introduced mental disorders in the
sixth revision of the International Classification of Diseases (ICD-6) in 1948.18
The American Psychiatric Association Committee on Nomenclature and
Statistics published the first edition of the Diagnostic and Statistical
Manual: Mental Disorders (DSM-I) in 1952 (see Textbox 1.3).19 Current
diagnostic classification manuals (DSM-IV and ICD-10) deliberately do not
contain mutually exclusive diagnostic categories; rather, they contain
overlapping areas. Indeed, if carefully applied, each diagnostic system yields a
different number of cases, as illustrated by Erkinjuntti and colleagues (1997)
for dementia20 and Furukawa and associates21 for depression. Of note,
agreement between diagnostic systems examined in the same sample is often modest
(Table 1.1). It was in the eighth revision of ICD (ICD-8) in 1967 and in the third
edition of DSM (DSM-III) in 1980 that a systematic effort to improve the
diagnosis and classification of mental disorders was made. Until then,
textbooks containing descriptions of individual conditions were the main source of
information, which naturally led to numerous disputes. DSM and ICD go
beyond textbook descriptions by providing a checklist of useful criteria and,
importantly, suggesting a diagnostic threshold determined by specific
symptoms, which usually have to fulfill both frequency (symptom count) and
duration criteria. The key difference between a severity questionnaire and an
operational method is that certain criteria are required in the latter, whereas
severity questionnaires usually rely on symptom counts alone, without
weighting of symptoms (see Appendix Table 1). This is not surprising, because
most mood questionnaires were proposed by experts based on clinical
experience alone, whereas careful field testing is needed to rank important
items (see Chapter 4). Given this, it is remarkable that severity
questionnaires may perform quite well against structured interviews. That said,
questionnaires can also be constructed to follow the DSM diagnostic algorithm.22

Table 1.1. Clinician Agreement (Kappa) Using Different Diagnostic Systems for Depression

          DSR     DSM     RDC     ICD-10
DSM       0.95
RDC       0.71    0.71
ICD-10    0.71    0.70    0.74
FDC       0.59    0.60    0.77    0.63

Adapted from Philipp M, Delmo CD, Buller R, et al. Differentiation between
major and minor depression. Psychopharmacology. 1992;106:S75–S78.
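To make the contrast between a summed severity score and an operational algorithm concrete, the sketch below scores the same nine-item response set both ways. The item order and rules follow the PHQ-9 as it is commonly described (items scored 0 to 3; a major depressive syndrome if at least five items are scored 2 or more, one of them item 1 or 2, with item 9 counting if scored 1 or more; a summed cutoff of 10). Treat these details as assumptions to check against the published instrument; the two response sets are invented.

```python
from typing import Sequence

# Assumed PHQ-9 item order: 1 anhedonia, 2 depressed mood, 3 sleep, 4 energy,
# 5 appetite, 6 worthlessness/guilt, 7 concentration, 8 psychomotor, 9 thoughts of death.


def sum_score_positive(items: Sequence[int], cutoff: int = 10) -> bool:
    """Severity approach: add up the 0-3 item scores and compare with a cutoff."""
    return sum(items) >= cutoff


def algorithm_positive(items: Sequence[int]) -> bool:
    """Operational approach modelled on the DSM-style PHQ-9 algorithm (assumed rules)."""
    if len(items) != 9:
        raise ValueError("expected nine item scores")
    # Items 1-8 count if scored 2+ ("more than half the days"); item 9 counts if 1+.
    endorsed = [score >= 2 for score in items[:8]] + [items[8] >= 1]
    core_present = endorsed[0] or endorsed[1]  # anhedonia or depressed mood
    return core_present and sum(endorsed) >= 5


# Invented responses: many mild complaints with no core symptom...
many_mild = [1, 1, 2, 2, 2, 2, 2, 1, 0]
# ...versus fewer symptoms that include both core items.
few_core = [2, 2, 2, 2, 0, 0, 0, 0, 1]
for label, items in [("many_mild", many_mild), ("few_core", few_core)]:
    print(label, sum(items), sum_score_positive(items), algorithm_positive(items))
```

The first profile scores 13 but fails the algorithm because no core symptom is endorsed; the second scores only 9 yet meets the algorithm. Each method can therefore classify a case that the other misses.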

Validation of the DSM-IV/ICD-10 Criteria for Depression

The criteria for major depression, minor depression, and dysthymia are shown
in Table 1.2. Subsyndromal depression is not currently included in DSM-IV
but can be considered present if there are at least two DSM-IV symptoms but
the overall criteria for major or minor depression are not met.23 MDD is
defined by depressed mood or loss of interest in nearly all activities for at
least 2 weeks, accompanied by at least three or four further symptoms (for a total of at least five).
The criteria for minor depression are identical but require only two to four
symptoms in total and require exclusion of previous major depression in an attempt to
avoid confusion over residual symptomatology. Dysthymia is characterized by
fewer symptoms than major depression (three or four) and a chronic course
lasting at least 2 years.

Table 1.2. Diagnostic Categories for Depressive Disorders

Major depression
DSM-IV criteria: 5 depressive symptoms, including depressed mood or anhedonia,
causing significant impairment in social, occupational, or other important areas of functioning
Symptom duration: 2 weeks
Minor depression (research criteria diagnosis)
DSM-IV criteria: 2–4 depressive symptoms, including depressed mood or anhedonia,
causing significant impairment in social, occupational, or other important areas of functioning
Symptom duration: 2 weeks
Dysthymia
DSM-IV criteria: 3 or 4 dysthymic symptoms, including depressed mood, poor appetite
or overeating, insomnia or hypersomnia, low energy, low self-esteem, poor concentration
or indecisiveness, and hopelessness
Symptom duration: 2 years

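The rules summarized in Table 1.2 and the text can be written as a small decision function. This is a simplified sketch, assuming each criterion is simply present or absent over the whole period; the criterion labels are hypothetical identifiers, dysthymia is left out because it uses its own symptom list and a 2-year duration, and DSM-IV exclusions (for example, substance effects or medical causes) are not modeled.

```python
from typing import Set

CORE = {"depressed_mood", "anhedonia"}
# Hypothetical labels for the nine DSM-IV criteria; a compound criterion such as
# "insomnia or hypersomnia" is one label and counts once.
DSM_IV_CRITERIA = CORE | {"appetite_weight", "sleep", "psychomotor", "fatigue",
                          "worthlessness_guilt", "concentration", "death_suicide"}


def classify(symptoms: Set[str], weeks: float, impaired: bool, prior_major: bool) -> str:
    """Simplified DSM-IV-style grading following Table 1.2 (dysthymia not covered)."""
    unknown = symptoms - DSM_IV_CRITERIA
    if unknown:
        raise ValueError(f"unrecognised criteria: {sorted(unknown)}")
    count = len(symptoms)
    has_core = bool(symptoms & CORE)
    # Major and minor depression both need a core symptom, 2 weeks, and impairment.
    qualifies = has_core and weeks >= 2 and impaired
    if qualifies and count >= 5:
        return "major depression"
    if qualifies and 2 <= count <= 4 and not prior_major:
        return "minor depression"
    # Subsyndromal: at least two symptoms without meeting major or minor criteria.
    if count >= 2:
        return "subsyndromal depression"
    return "no depressive disorder"


print(classify({"anhedonia", "fatigue", "sleep"}, weeks=3, impaired=True, prior_major=False))
print(classify({"anhedonia", "fatigue", "sleep"}, weeks=3, impaired=True, prior_major=True))
```

The two calls differ only in history: the same three current symptoms count as minor depression in a first episode but fall back to subsyndromal depression when previous major depression must be excluded.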
In ICD-10 the core symptoms of depression include decreased energy or
increased fatigability in addition to low mood and loss of interest. Further, only
four symptoms are required for a mild episode, and six (five in early versions)
symptoms qualify as a moderate depressive episode. Thus, DSM-IV major
depression is broadly analogous to the ICD-10 concept of moderate or severe
depression. Both ICD and DSM suggest a minimum number of typical and
associated symptoms and a minimum duration of symptoms of 2 weeks. In
DSM-IV, but not in ICD-10, a third feature is added: that the disorder causes
significant impairment in social, occupational, or other important areas of
functioning. As a result, there is discordance in diagnosis based on ICD-10
versus DSM-IV.24–26
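The source of this discordance can be illustrated by applying both rule sets to the same symptom profiles. The sketch is deliberately simplified: it uses the counts given above (four symptoms for mild, six for moderate), assumes at least two of the three ICD-10 core symptoms are required (as in the ICD-10 research criteria), does not separate severe from moderate episodes, reuses the DSM-style symptom labels for ICD-10 rather than its own list, and assumes the 2-week duration is met. The profiles are invented.

```python
from typing import Set

DSM_CORE = {"depressed_mood", "anhedonia"}
ICD_CORE = {"depressed_mood", "anhedonia", "fatigue"}  # ICD-10 adds low energy/fatigability


def dsm_iv_major(symptoms: Set[str], impaired: bool) -> bool:
    """DSM-IV: 5+ symptoms including a DSM core symptom, plus significant impairment."""
    return len(symptoms) >= 5 and bool(symptoms & DSM_CORE) and impaired


def icd10_episode(symptoms: Set[str]) -> str:
    """Simplified ICD-10 grading; assumes at least 2 of the 3 core symptoms are required."""
    if len(symptoms & ICD_CORE) < 2:
        return "none"
    if len(symptoms) >= 6:
        return "moderate (or more)"
    if len(symptoms) >= 4:
        return "mild"
    return "none"


# Invented profiles: (symptoms present, significant impairment?)
profiles = {
    "A": ({"depressed_mood", "fatigue", "sleep", "concentration"}, False),
    "B": ({"depressed_mood", "fatigue", "sleep", "concentration", "appetite_weight"}, True),
    "C": ({"depressed_mood", "anhedonia", "fatigue", "sleep", "concentration",
           "worthlessness_guilt"}, True),
}
for name, (symptoms, impaired) in profiles.items():
    print(name, "DSM-IV major:", dsm_iv_major(symptoms, impaired),
          "| ICD-10:", icd10_episode(symptoms))
```

Profile A is a mild ICD-10 episode with no DSM-IV diagnosis, profile B is DSM-IV major depression but only mild in ICD-10 terms, and only profile C lines up as DSM-IV major and ICD-10 moderate.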
Over the past 10 years there have been accumulating challenges to the
diagnostic criteria in DSM-IV, including but not limited to MDD. Philipp and
colleagues (1992) were among the first groups to show that the major
depression concept may be too narrow.27 In a primary care study using
DSM-III-R, MDD occurred in 17.4%, but the majority of depressed patients
fell into the group of depression ‘‘not otherwise specified’’ (NOS). Adding
the minor depression concept resulted in the reclassification of 38.3% of the
NOS patients to minor depression. Data from the National Comorbidity
Survey have shown that across the minor, major, and severe categories of
depression (depending on the number of symptoms) there is a ‘‘monotonic’’
increase in a number of fundamental indices such as average number of
episodes, impairment, comorbidity, and parental psychopathology,28
suggesting a continuum within depression rather than categorical groupings.
Kendler and Gardner’s 1998 longitudinal analysis of the Virginia Twin
Registry demonstrated that the presence of five or more symptoms of depres-
sion was not a more accurate definition of depression at 1-year follow-up than
the presence of three or four symptoms.29 Additionally, there is little
empirical support for the DSM-IV requirement for 2-week duration or,
indeed, ‘‘clinically significant impairment.’’30,31
In the Rhode Island MIDAS project, Zimmerman and colleagues (2006)32
conducted an in-depth analysis of symptoms for MDD by having trained raters
administer a semi-structured interview to 1,523 psychiatric outpatients, 54.4%
of whom had current MDD. They analyzed a 17-item bank of possible
symptoms of depression, including the standard 9 DSM items but separating
the compound criteria that encompass more than one symptom (eg, increased
sleep or insomnia), along with non-DSM diagnostic items such as
hopelessness, helplessness, and unreactive mood. The authors found that some items
were rated more reliably than others; for example, suicidal ideas, plan, or
attempt (suicidality) achieved almost perfect agreement, whereas raters often
disagreed about what constituted psychomotor retardation (Textbox 1.5). On
logistic regression, the ranked order of diagnostic weight (by individual item)
for DSM-IV membership was depressed mood > anhedonia > sleep
disturbance > concentration/indecision > worthlessness/excessive
guilt > loss of energy > appetite/weight disturbance > psychomotor
change > death/suicidal thoughts. Some items seemed redundant in making a
diagnosis. Zimmerman’s group also looked at the validity of so-called core
criteria.33 Only 1.5% of the 1,800 patients reported five or more criteria in
the absence of low mood or loss of interest or pleasure. Twenty-five of these 27
patients reported depressed mood at a subthreshold level, often in partial
remission. Thus, only a small handful of cases would be false positives if no
core criteria existed. In another paper in the series, they found that few patients
who met the symptom criteria for MDD were ruled out of the diagnosis by the
other components of the diagnostic algorithm, thereby explaining why
self-administered depression symptom questionnaires perform well as diagnostic
proxies.34 Finally, they addressed the longstanding issue of applying some of
the criteria in patients with comorbid medical illnesses, in whom several symptoms
are nonspecific. Based on a series of cross-validated psychometric analyses,
they developed an alternative set of diagnostic criteria for MDD that
did not include somatic symptoms but nonetheless showed a high
level of concordance with the current DSM-IV definition.

Textbox 1.5. Interrater Reliability in Eliciting Individual Symptoms of Depression

Symptom                      Kappa
Suicidality                  0.94
Depressed mood               0.92
Insomnia                     0.91
Anhedonia                    0.90
Decreased appetite           0.89
Loss of energy               0.88
Indecisiveness               0.88
Thoughts of death            0.86
Psychomotor agitation        0.83
Feelings of worthlessness    0.80
Increased weight             0.79
Decreased concentration      0.78
Excessive guilt              0.76
Decreased weight             0.69
Increased appetite           0.63
Psychomotor retardation      0.63
Hypersomnia                  0.54

4. Unstructured (Unassisted) Clinician Diagnosis

Clinician-based assessment has been poorly investigated compared with
assisted methods of diagnosis. In fields of medicine where robust external
validation such as postmortem examination is available, routine diagnostic accuracy has
often proven to be remarkably poor.35,36 It should be no surprise, then, if in the
absence of a gold standard, health professionals have considerable difficulty
making accurate and reliable diagnoses (see Table 1.2).37,38 Regarding missed
diagnoses, one study suggested that only 26% were complete mistakes; 25%
were underestimates of severity and 38% misidentifications. Conversely,
regarding false-positive diagnoses, 35% were overestimates of severity, 24%
misdiagnoses, and 41% complete errors. To compound this problem, 90% of
psychiatrists do not routinely use case identification and severity measurement
for depression (and more than half never do so).39 Most clinicians rely on their
own abilities based on training received earlier in their career. On the other
hand, clinician-based assessment is purported to be a gold standard in
psychiatry if the clinician is given adequate time and resources. This was best
conceptualized by Spitzer, who proposed the LEAD standard.40 LEAD is an
acronym that stands for Longitudinal evaluation performed by Expert
clinicians who utilize All available Data. The LEAD standard is an important
way of obtaining the most likely diagnosis by requiring clinicians to use a
collateral history, hospital records, psychological evaluations, and laboratory
results. However, uncertainty about who is ‘‘expert’’ and which data are
mandatory, as well as the availability of those data, limits both the actuarial and practical
value of this standard.41 A related clinical standard is the best estimates
procedure (BEP), which is simpler than the LEAD.42 In the BEP, all available
information is evaluated by experienced clinicians who assign a consensus
‘‘best-estimate diagnosis.’’ As with the LEAD standard, the number of
clinicians and source of information should always be stated.

Accuracy of Psychiatrists’ Routine Diagnoses

The accuracy of psychiatrists’ diagnostic skills can be compared against BEP
diagnoses and/or structured interviews. The value of BEP was investigated by
Kosten and Rounsaville (1992),43 who interviewed 475 subjects using the
Schedule for Affective Disorders and Schizophrenia-Lifetime (SADS-L).
Two psychologists independently evaluated and diagnosed the same subjects,
applying the BEP. The BEP revealed higher rates of major and minor depressive
disorder, antisocial personality, alcoholism, and drug abuse than routine
interview alone, with a minimal rate of false positives. More recently, Taiminen and colleagues
(2001)44 compared routine discharge diagnoses based on DSM-IV with BEP
diagnoses in 116 first-admission patients with psychosis and severe affective
disorder (Table 1.3). However, in this case the BEP included data from a
Schedules for Clinical Assessment in Neuropsychiatry (SCAN) interview,
setting an even more stringent gold standard. Diagnostic agreement was moderate
(kappa 0.51), suggesting frequent errors in the routine diagnoses even when
using DSM-IV criteria. Of note, clinicians tended to miss depressive symptoms
in psychotic patients, to overdiagnose psychotic symptoms in depressive
patients, and to overlook earlier hypomanic or depressive episodes in
depressive patients. Spitzer and colleagues (1999)45 evaluated the unassisted
accuracy of mental health professionals (1 psychologist and 3 mental health social
workers) in comparison with 62 primary care physicians (PCPs) using the
depression scale of the Patient Health Questionnaire (PHQ-9). Accuracy was
calculated in 585 cases who had both assessments within a 48-hour period.
PCPs recognized 61% of cases thought to have major depression by mental
health professionals and excluded 98% of cases thought not to have major
depression. Accuracy in the other direction was not reported. Recently
Carballeira and colleagues from Switzerland (2007)46 studied 212 patients
admitted to the internal medicine units of the University Hospitals of Geneva
(Table 1.4). Each patient completed the PHQ-9 and underwent a blind DSM-IV
diagnostic assessment by a psychiatrist. Compared to the PHQ-9, psychiatrists
recognized 50% of cases with major depression but only 22% of those with
milder forms. Rule-out accuracy was high but rule-in accuracy was poor,
with a high rate of false positives. The authors also compared the psychiatrists’
diagnoses with those of internists, finding a kappa agreement of only 0.20.
This study is valuable because patient-rated symptoms have particular
importance.47

Table 1.3. Diagnostic Accuracy of Primary Care Physicians Against CIDI

Unassisted              Gold Standard:      Gold Standard:
Diagnosis               Depressed (CIDI)    Not Depressed (CIDI)
Depressed               70                  76 (FP)             PPV 48%
Not Depressed           104 (FN)            459                 NPV 81.5%
Total                   Se 40.2%            Sp 85.8%
FP, false positives; FN, false negatives.
Reprinted from General Hospital Psychiatry 21(2), Tiemens BG, VonKorff M, Lin EH, Diagnosis of
depression by primary care physicians versus a structured diagnostic interview. Understanding
discordance, 87–96, Copyright (1999).

Table 1.4. Diagnostic Accuracy of Psychiatrists vs. PHQ-9 (Patient-Rated)

Psychiatrist            PHQ-9 Mj      PHQ-9 Mn      No PHQ-9 Mj   No PHQ-9 Mn
(Unassisted Diagnosis)  Depressed     Depressed     Depressed     Depressed
Depressed               12            5             26 (FP)       30 (FP)       PPV (Mj) 32%; PPV (Mn) 14%
Not Depressed           12 (FN)       18 (FN)       162           159           NPV (Mj) 93%; NPV (Mn) 90%
Total                   Se 50%        Se 22%        Sp 86%        Sp 84%
Mj, major (DSM-IV); Mn, minor (DSM-IV); FP, false positives; FN, false negatives.
Reproduced from Carballeira et al. Criterion validity of the French version of Patient Health
Questionnaire (PHQ) in a hospital department of internal medicine. Psychology and Psychotherapy:
Theory, Research and Practice (2007), 80, 69–77.

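The summary statistics in Tables 1.3 and 1.4 all follow from the four cells of a 2 × 2 table. The sketch below recomputes them from the published counts (for Table 1.4, the major depression columns only); it is an illustration, and the recomputed figures match the printed ones only to within rounding.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # clinician says depressed, reference standard agrees
    fp: int  # clinician says depressed, reference standard does not
    fn: int  # clinician says not depressed, reference standard says depressed
    tn: int  # both say not depressed

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


tables = {
    "Table 1.3 (PCPs vs CIDI)": TwoByTwo(tp=70, fp=76, fn=104, tn=459),
    "Table 1.4 (psychiatrists vs PHQ-9, major)": TwoByTwo(tp=12, fp=26, fn=12, tn=162),
}
for name, t in tables.items():
    print(f"{name}: Se {t.sensitivity():.1%}, Sp {t.specificity():.1%}, "
          f"PPV {t.ppv():.1%}, NPV {t.npv():.1%}")
```

For Table 1.3, for example, the PPV comes out as 47.9% (printed as 48%). The pattern the text describes (high rule-out accuracy, poor rule-in accuracy) shows up directly as a high NPV alongside a low PPV.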
Several groups have explored the accuracy of routine diagnoses against
the Structured Clinical Interview for DSM Disorders (SCID), although few
have used other methods such as the Composite International Diagnostic
Interview (CIDI).48 Helzer and colleagues (1985)49 examined the level of
agreement between a lay-rated Diagnostic Interview Schedule (DIS) in the
Epidemiologic Catchment Area project and routine clinical diagnoses made
by psychiatrists. Overall agreement between the DIS and the psychiatrists
ranged from 79% to 96%, but specificities were all 90% or better. Anthony
and associates (1985)50 studied DSM-III diagnoses made by the DIS in
comparison to a ‘‘standardized’’ DSM-III diagnosis by psychiatrists in the
two-stage Baltimore Epidemiologic Catchment Area mental morbidity
survey. There were considerable disagreements; the only category of
modest agreement was alcohol use disorder. Steiner and colleagues
(1995)51 studied the relationship between diagnoses generated by the
SCID and unstructured psychiatric interviews. Diagnoses generated by
researchers using the SCID and those made routinely by psychiatrists were compared
for 100 patients. Overall agreement between the SCID diagnosis and the
clinical diagnosis was low (kappa of 0.30). Shear and coworkers (2000)52
examined 164 nonpsychotic patients at two community treatment facilities
using the SCID and compared results to diagnoses obtained from clinician
records. The majority (59%) of patients met the SCID criteria for a primary
depressive disorder. Diagnostic agreement was poor
(kappa 0.24 overall and 0.33 for mood disorder). Overall, use of the SCID
resulted in more diagnoses than the standard clinical procedures,
particularly where comorbidity was present. Anxiety disorders, in particular, were
much more likely to be overlooked by a clinical rater. One exception was
‘‘adjustment disorder,’’ which was more frequently diagnosed by a clinician
than by the SCID rater. In an important but small-scale study, Miller and
colleagues (2001)53 compared three methods of diagnosis for 56 psychiatric
inpatients against the LEAD criterion standard: unassisted
clinical assessment, the SCID, and a structured Computer-Assisted Diagnostic
Interview (CADI). Psychiatrists’ unassisted assessment had 54% agreement
with LEAD (kappa 0.43), whereas the SCID and CADI had agreement
above 85% (kappa 0.81). Compared with similarly trained colleagues,
unassisted clinicians had an interrater agreement of only 45.5% (kappa 0.24),
meaning independent clinicians disagreed more often than they agreed.54 In
one of the largest studies of diagnostic accuracy, Kashner and coworkers
(2003)55 looked at 294 newly enrolled adult psychiatric patients based on
clinical records. Within 2 weeks of their primary evaluation, patients were
randomly assigned to receive a nurse-administered SCID with feedback to
the attending psychiatrist or usual care. The kappa agreement between the
SCID and chart diagnoses of MDD was 0.56 at baseline (unassisted), rising
to 0.90 at the end of the study after feedback of results to clinicians. Against
the SCID, clinicians underdiagnosed all psychiatric disorders (for example,
missing over 60% of substance abuse disorders and anxiety disorders).
However, unassisted clinicians also made several false-positive diagnoses,
most commonly for schizophrenia, bipolar disorders, and MDD. Basco and
associates (2003)56 interviewed 200 psychiatric outpatients and attempted to
establish gold standard diagnoses based on the SCID, all medical records, and a
follow-up interview with a psychiatrist or a psychologist trained in
diagnostic procedures (in effect, the LEAD procedure). The percentage of
agreements with this gold standard was 53% for routine diagnoses, 68%
for the SCID, and 79% for the SCID plus chart review. Concordance was
better for depression. Looking at the subset of patients examined by a
psychiatrist, 70% of those thought by psychiatrists to have MDD actually
did on the SCID (43 of 61 participants), but half of the SCID cases of MDD
had not previously been recognized as such; they were typically assigned adjustment
disorder, anxiety disorder, substance abuse, or bipolar disorder, or no clinical
diagnosis at all. Unassisted clinical diagnosis was examined for both
rule-in and rule-out accuracy (Table 1.5). Psychiatrists were good at
excluding depression but missed 50% of cases when attempting to rule in
a diagnosis. In all groups, when discrepancies occurred, most were judged to
be of substantial clinical importance. Performance was remarkably
similar to that of PCPs (see Table 1.4 for comparison). The kappa
coefficients showed that administration of the SCID without the benefit of a
medical record review improved accuracy beyond routine diagnosis alone,
while adding information derived from the chart review resulted in an
additional 25% improvement over and above the SCID alone. These
findings are consistent with reports from other studies showing the advantage of
diagnostic interviews over unstructured clinical interviews (see below).57,58
This study also examined which competing diagnoses were hardest to separate
from MDD. Psychiatrists found the separation of MDD versus obsessive-compulsive
disorder and MDD versus dysthymia to be relatively straightforward
but struggled with MDD versus adjustment disorder and MDD versus
organic disorder, among others. Reasons for suboptimal accuracy are
discussed in Chapter 3.

Table 1.5. Diagnostic Accuracy of Psychiatrists vs. SCID Plus

Unassisted              SCID+ Standard:     SCID+ Standard:
Diagnosis               Depressed           Not Depressed
Depressed               17                  7 (FP)              PPV 76%
Not Depressed           17 (FN)             155                 NPV 89%
Total                   Se 50%              Sp 96%
FP, false positives; FN, false negatives.
SCID+ refers to SCID, plus all medical records and a follow-up interview with a trained
psychiatrist or a psychologist; see text.
Basco et al. Methods to improve diagnostic accuracy in a community mental health setting. Am J
Psychiatry. 2000;157(10):1599–1605.

5. Structured and Semi-Structured Assisted Diagnostic Interviews

Semi-structured diagnostic interviews were introduced in the 1970s as a method
that would allow lay interviewers to obtain psychiatric diagnoses close to those a
psychiatrist would obtain.59,60 Rogers suggested that one third of clinical
variability was due to idiosyncratic questioning and two thirds to interpretation of the
information gleaned.60 The premise is that standardization forces an assessor to
cover all the areas of psychopathology and provides consistency in the way
questions are asked. The three main components of the structured interview are (1)
use of the standardized language of the clinical method, (2) a fixed sequence of
inquiry, and (3) quantification of the responses.
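The three components listed above (fixed wording, fixed sequence, and quantified responses) can be sketched as a short scripted interview module. The question wording, item keys, and the skip rule that ends the section when neither stem question is endorsed are hypothetical illustrations in the general style of fully structured interviews; they are not taken from any specific instrument.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical wording; a real instrument's items must be read exactly as printed.
STEM_ITEMS: List[Tuple[str, str]] = [
    ("depressed_mood", "Over the past two weeks, have you felt down or depressed most of the day?"),
    ("anhedonia", "Over the past two weeks, have you lost interest or pleasure in most things?"),
]
FOLLOW_UP_ITEMS: List[Tuple[str, str]] = [
    ("sleep", "Has your sleep changed nearly every night?"),
    ("fatigue", "Have you felt tired or low in energy nearly every day?"),
    ("concentration", "Have you had trouble concentrating nearly every day?"),
]

Ask = Callable[[str, str], bool]  # receives (item key, exact wording), returns yes/no


def administer(ask: Ask) -> Dict[str, int]:
    """Ask fixed questions in a fixed order and code each answer as 0 or 1."""
    responses: Dict[str, int] = {}
    for key, wording in STEM_ITEMS:
        responses[key] = int(ask(key, wording))
    # Assumed skip rule: if no stem question is endorsed, the section ends here.
    if not any(responses.values()):
        return responses
    for key, wording in FOLLOW_UP_ITEMS:
        responses[key] = int(ask(key, wording))
    return responses


# Scripted answers stand in for a respondent; only the listed items are answered "yes".
yes_items = {"anhedonia", "sleep"}
print(administer(lambda key, wording: key in yes_items))
print(administer(lambda key, wording: False))
```

Because every respondent hears the same questions in the same order, and every answer is coded the same way, two interviewers can differ only in how they record the replies, which is the source of variability that Rogers attributed to interpretation.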

However, assisted interviews have several significant limitations. First, they
are time-consuming: the average time to administer the SCID is approximately
1 hour and 44 minutes, compared to about 40 minutes for a standard interview
(Textbox 1.6). Second, they have modest acceptability to patients and staff,
who often find these interviews restrictive (staff) and repetitive
(patients).61 Third, and perhaps unexpectedly, diagnostic interviews can
produce far from uniform results even in the same population. For example,
12-month prevalence rates of major depression in the United States using two
instruments were 4.2%62 and 10.1%.63 Further, no before-and-after study or
randomized trial has shown how much these methods can improve routine
care. These cautions call into question the value of these instruments for
clinical care, at least until further data are available.64

Textbox 1.6. Summary of Assisted Interviews

Partially Structured
PSE (Present State Examination)/SCAN
Type: Semi-structured interview
Recommended Use by: Clinicians
Generates: ICD-10 and DSM-IV criteria
Duration: 45 minutes
SCID-I (Structured Clinical Interview for DSM-III-R)
Type: Semi-structured interview
Recommended Use by: Trained interviewer and/or clinicians
Generates: DSM-IV
Duration: 1 hour and 44 minutes
Schedule for Affective Disorders and Schizophrenia (SADS)
Type: Semi-structured interview
Recommended Use by: Trained interviewer and/or clinicians
Generates: RDC
Duration: 90 minutes

Fully Structured
CIDI (Composite International Diagnostic Interview)
Type: Structured
Recommended Use by: Trained interviewer (clinician optional)
Generates: ICD-10 and DSM-III-R criteria
Duration: 75 minutes
M.I.N.I. (Mini-International Neuropsychiatric Interview)
Type: Structured
Recommended Use by: Trained interviewer (clinician optional)
Generates: ICD-10 and DSM-IV criteria
Duration: 20 minutes
Diagnostic Interview Schedule (DIS)
Type: Structured
Recommended Use by: Trained interviewer (clinician optional)
Generates: DSM-IV
Duration: 120 minutes

The most common instruments are illustrated in Textbox 1.6. The SCID was
developed alongside DSM-III-R.65 As with most instruments, raters must first be
trained. Compared with the CIDI, the clinician makes more judgments as to
whether each criterion is met and whether all criteria taken together validate the
clinical diagnosis. Numerous studies have evaluated interrater reliability for
major depression using the SCID. One of the largest, from Williams and
colleagues (1992),66 evaluated the ability of psychiatrists (n = 14), psychologists
(n = 6), and master’s degree students (n = 4) to diagnose depression. Agreement was
modest (kappa 0.64). There are also several studies comparing the
SCID and CIDI. In a sample of 325 patients from the National Comorbidity
Survey, the sensitivity of CIDI was 55% and specificity was 93.7% for lifetime
major depression compared with the SCID (kappa 0.54).67 In the study by Basco
and associates (2000) mentioned previously, the added value of combining the SCID with chart
diagnoses suggests that SCID-based diagnosis can be improved by very experienced
clinical raters; hence the need for a clinician-led assisted interview.
Interestingly, feedback of SCID results to psychiatrists can lead to more positive
outcomes.68 Philipp and colleagues (1986) proposed a refinement to the SCID
called the Polydiagnostic Interview (PODI).69 The advantage of this approach is
that the PODI generates diagnoses according to several competing diagnostic
checklists, including DSM-III-R, ICD-10, Research Diagnostic Criteria (RDC),
and Feighner Diagnostic Criteria. The Present State Examination (PSE) is a
semi-structured interview designed for use only by clinicians. The current 10th
edition can generate both ICD-10 and DSM-IV diagnoses. A computer program
derived from PSE (CATEGO-5) has also been developed, as has a short version
of PSE. SCAN is a semi-structured interview based on PSE and is also the
product of a collaborative study between the World Health Organization (WHO)
and the U.S. Alcohol, Drug Abuse, and Mental Health Administration
(ADAMHA).70 Again, the PSE requires a thorough training course, making it
expensive and time-consuming for many.
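The agreement figures quoted above come from cross-tabulating two sets of diagnoses. These include kappa 0.64, and kappa 0.54 reported alongside 55% sensitivity and 93.7% specificity. The following is a minimal Python sketch that assumes a simple 2×2 table of one interview against a reference interview. The class name `TwoByTwo` and the counts are hypothetical and are not taken from any cited study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical cross-tabulation of an index interview against a reference.

    a: both positive; b: index positive, reference negative;
    c: index negative, reference positive; d: both negative.
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def sensitivity(self) -> float:
        # Share of reference-positive cases that the index interview detects.
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        # Share of reference-negative cases that the index interview clears.
        return self.d / (self.b + self.d)

    def cohen_kappa(self) -> float:
        # Observed agreement corrected for the agreement expected by chance.
        p_obs = (self.a + self.d) / self.n
        p_index_pos = (self.a + self.b) / self.n
        p_ref_pos = (self.a + self.c) / self.n
        p_chance = p_index_pos * p_ref_pos + (1 - p_index_pos) * (1 - p_ref_pos)
        return (p_obs - p_chance) / (1 - p_chance)


# Made-up counts for 200 patients, for illustration only.
table = TwoByTwo(a=30, b=10, c=20, d=140)
print(f"sensitivity = {table.sensitivity():.2f}")
print(f"specificity = {table.specificity():.3f}")
print(f"kappa       = {table.cohen_kappa():.2f}")
```

With these made-up counts, specificity is high (about 93%), yet kappa is only about 0.57. Most patients are negative on both instruments, so much of the raw agreement is what chance alone would produce, and kappa discounts it.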

Fully Structured Assisted Interviews

The DIS was developed by the National Institute of Mental Health (NIMH), and
its first version was released in 1978. It was an adaptation of the Renard
Diagnostic Instruments, designed to assess Feighner’s diagnostic criteria.
DIS-4 focuses on DSM-IV and is similar to the CIDI. It has been validated,
but one study found low sensitivity of the DIS versus the SCID.71 The CIDI
was produced jointly by WHO and ADAMHA and is designed to enable a
trained interviewer to arrive at either an ICD-10 or a DSM diagnosis in
about 75 minutes. The CIDI is an amalgamation of two pre-existing instruments,
the DIS and the PSE. It contains 276 symptom questions, many of
which are probes to evaluate symptom severity, as well as questions for
assessing help-seeking and psychosocial impairments. A computerized version,
CIDI 2.1, is available. The first field trial showed high interrater reliability
but poor test–retest reliability for depressive disorders.72 Subsequent reliability
studies (using slightly different versions of the CIDI) demonstrated
high interrater reliability.73,74 One validity study used a clinician-scored
DSM-III-R symptom checklist as the gold standard.75 Compared with this
checklist, the CIDI had a sensitivity of 85% and a specificity
of 98% (kappa 0.84). A second study compared the CIDI against the SCID-assisted
LEAD procedure.76 The CIDI had a modest positive predictive value and
a high negative predictive value (kappa 0.46). The Mini-International
Neuropsychiatric Interview (M.I.N.I.) is an abbreviated structured psychiatric
interview that takes only 15 to 20 minutes to administer.77 It uses
decision-tree logic to elicit all the symptoms listed in the symptom criteria
for DSM-IV and ICD-10 for 15 major Axis I diagnostic categories, for one
Axis II disorder, and for suicidality. Several specific versions are available:
M.I.N.I.-Screen, M.I.N.I.-Plus, and M.I.N.I.-Kid. The M.I.N.I. has been
validated against the SCID Patient Version, the CIDI, and expert
professional opinion.77
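Unlike sensitivity and specificity, predictive values depend on how common the disorder is in the sample. This is one reason the same instrument can show a modest PPV alongside a high NPV. The short sketch below applies Bayes' theorem. It reuses the 85% sensitivity and 98% specificity reported above purely as example inputs, and the prevalence values are assumptions.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a binary test via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity/specificity borrowed from the CIDI comparison as example inputs;
# the prevalence values are illustrative assumptions.
for prev in (0.05, 0.10, 0.30):
    ppv, npv = predictive_values(0.85, 0.98, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

As prevalence rises, PPV increases and NPV falls, even though the instrument itself is unchanged.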

6. Conclusion
Some will be surprised by the conclusion that the diagnosis of mental disorders is
not based on a robust gold standard.78 Current evidence has repeatedly shown
that unassisted psychiatric diagnoses are neither particularly reliable (when
judged by repeat assessments) nor particularly valid (when judged by consensus
methods or assisted interviews), especially when comorbidity is
present.79 Miller and colleagues (2001)53 found that, when unassisted, clinicians
evaluated an average of only 53% of key criteria present in diagnostic
algorithms (32% in the case of depression). Psychiatrists asked about low
mood in 86% of cases but asked about loss of pleasure in only 8%.80 As
awareness of these limitations increases, there will be growing calls for
clinicians to use diagnostic aids routinely in clinical practice. If this occurs
with proper diagnostic scrutiny (comparing accuracy with and without assistance
head to head), psychiatric diagnosis will slowly move from being a
nonscientific art based on the overall clinical impression to a science where the
accuracy of each method—indeed each question—is known. As Kendell and
Jablensky9 observed: ‘‘Psychiatry is in the position—that most of medicine
was in 200 years ago—of still having to define most of its disorders by their
syndromes. Because of the consequent need to distinguish one disorder from
another by differences between syndromes, the validity of diagnostic concepts
remains an important issue in psychiatry. In this situation, to search for
boundaries between syndromes and to use zones of rarity as criteria of validity
is, we contend, the best strategy available to us.’’
Here Kendell and Jablensky highlight a fundamental problem in the search
for accuracy: the possibility that many of our current diagnoses are labels of
convenience, no more distinct from each other than short stature and
normal height. Like many conditions defined largely by phenotype alone,
normal height has a Gaussian (normal) distribution that overlaps with those of many
diseases and disorders that cause growth retardation. The result may be two
distributions with substantial overlap and no clear zone of rarity (see Fig. 1.1).
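A toy model makes the "zone of rarity" concrete. Take an equal mixture of two normal curves with the same standard deviation. A dip between the two groups appears only when their means are more than two standard deviations apart. Below that separation, the combined curve has a single peak and no zone of rarity. The sketch below uses only Python's standard library and checks this numerically on a grid. The normal shapes, equal weights and grid step are assumptions for illustration, not a model of height or depression data.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def count_modes(mean_a: float, mean_b: float, sd: float = 1.0, step: float = 0.01) -> int:
    """Count local maxima of an equal-weight mixture of two normal curves.

    One mode: the groups blend with no dip between them (no zone of rarity).
    Two modes: a trough separates the groups.
    """
    lo = min(mean_a, mean_b) - 5 * sd
    hi = max(mean_a, mean_b) + 5 * sd
    n = int(round((hi - lo) / step)) + 1
    xs = [lo + i * step for i in range(n)]
    ys = [0.5 * normal_pdf(x, mean_a, sd) + 0.5 * normal_pdf(x, mean_b, sd) for x in xs]
    # A grid point is a peak if it is strictly higher than both neighbours.
    return sum(1 for i in range(1, n - 1) if ys[i] > ys[i - 1] and ys[i] > ys[i + 1])


# Hypothetical separations between group means, in units of SD.
for separation in (1.0, 1.5, 3.0):
    print(f"separation {separation} SD: {count_modes(0.0, separation)} mode(s)")
```

Separations of 1 and 1.5 SD should give a single peak, and a separation of 3 SD should give two.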

[Figure 1.4: bar chart of agreement with the gold standard on specific diagnoses (kappa, 0.00–1.00) and time required (minutes, 0–160) for routine diagnoses, diagnoses based on the SCID, and diagnoses based on the SCID plus medical records.]

Figure 1.4. Time required to produce accurate diagnoses. Time requirement and reliability
of routine diagnoses, SCID-based diagnoses, and diagnoses based on the SCID plus medical
records for 200 outpatients with severe mental illness. Reprinted from Basco RM, Bostic JQ,
Davies D, Rush AJ, Witte B, Hendrickse W, Barnett V. Methods to improve diagnostic
accuracy in a community mental health setting. Am J Psychiatry. 2000 Oct;157(10):
1599–605 with permission.

DSM-III and ICD-8 were landmark publications that allowed us to
scrutinize the mysterious process of psychiatric diagnosis. Each new release
brings an incremental improvement. Although neither DSM nor ICD has
been universally accepted (in one study, clinicians used DSM criteria in
23% of visits in which a psychosocial problem was recognized),81 they have
had a beneficial influence.82 As these checklist-based diagnostic systems
with rule-based criteria are field-tested, it becomes apparent that many of
the suggested symptoms, combinations, and associated features are not
particularly useful diagnostically. However, this could be seen as an
advantage, as previously no attempt at all was made to change mainstream
psychiatric diagnoses. Finding out what doesn’t work may be as valuable
as finding out what does. Beyond the checklist approach lie assisted inter-
views, which have a good evidence base for reliability, validity, or both.
What is missing are formal implementation trials in which one group of
clinicians is randomized to assisted interviews and another to diagnosis
as usual, to discover whether clinical outcomes actually improve. Unfortunately,
most of the assisted methods developed so far are too long for routine
clinical use. Indeed, a rule of thumb in this field is that the more accurate
the diagnostic method, the longer the time required, and this relationship
may not be linear (Fig. 1.4). A key challenge for the future, therefore, is to
develop reliable diagnostic methods brief enough to become
routinely accepted in busy clinical practice, including primary and secondary care.

References
1. Jablensky A. Categories, dimensions and prototypes: critical issues for psychiatric
classification. Psychopathology. 2005;38:201–205.
2. van Praag HM. Can stress cause depression? Prog Neuropsychopharmacol Biol Psych.
2004;28(5):891–907.
3. Parker G. Classifying depression: should paradigms lost be regained? Am J Psychiatry.
2000;157:1195–1203.
4. Sneath PHA. Some thoughts on bacterial classification. J Gen Microbiol.
1957;17:184–200.
5. Cloninger CR. A new conceptual paradigm from genetics and psychobiology for the
science of mental health. Aust N Z J Psychiatry. 1999;33:174–186.
6. Lyness JM, Kim JH, Tang W, et al. The clinical significance of subsyndromal
depression in older primary care patients. Am J Geriatr Psychiatry. 2007;15:214–223.
7. Angst J, Merikangas KR. Multi-dimensional criteria for the diagnosis of depression.
J Affect Disord. 2001;62:7–15.
8. The ICD-10 classification of mental and behavioral disorders: diagnostic criteria for
research, 10th edition. Geneva: World Health Organization, 1993.
9. Kendell R, Jablensky A. Distinguishing between the validity and utility of psychiatric
diagnoses. Am J Psychiatry. 2003;160:4–12.
10. Aboraya A, Compton III W. Biological markers and external validators in psychiatry:
progress report on the validity of psychiatric diagnoses. eCommunity Int J Mental
Health Addiction. Nov. 7, 2004 [online].
11. Tierney W, Fitzgerald J, McHenry R, et al. Physicians’ estimates of the probability of
myocardial-infarction in emergency room patients with chest pain. Medical Decision
Making. 1986;6(1):12–17.
12. Chun AA, McGee SR. Bedside diagnosis of coronary artery disease: A systematic
review. Am J Med. 2004;117(5):334–343.
13. Pull CB, Pull MC, Pichot P. Integrated lists of taxonomic evaluation criteria: LICET-S
and LICET-D. Acta Psychiatr Belg. 1984;84(4):297–309.
14. Mihalopoulos C, McGorry P, Roberts S, et al. The procedural validity of retrospective
case note diagnosis. Aust N Z J Psychiatry. 2000;34(1):154–159.
15. Janca A, Hiller W. ICD-10 checklists: a tool for clinicians’ use of the ICD-10
classification of mental and behavioral disorders. Comprehensive Psychiatry.
1996;37(3):180–187.
16. Hamilton JD. Do we underutilise actuarial judgement and decision analysis? Evidence-
Based Mental Health. 2001;4:102–103.
17. Holzer III CE, Nguyen HT, Hirschfeld RMA. Reliability of the diagnosis in mood
disorders. Psychiatric Clin North Am. 1996;19(1):73–84.
18. Manual of the international classification of diseases, injuries and causes of death, 6th
ed. Geneva: World Health Organization, 1948.
19. Diagnostic and statistical manual of mental disorders. Washington, DC: American
Psychiatric Publishing, 1952.
20. Erkinjuntti T, Ostbye T, Steenhuis R, et al. The effect of different diagnostic criteria on
the prevalence of dementia. N Engl J Med. 1997;337(23):1667–1674.
21. Furukawa TA, Anraku K, Hiroe T, et al. A polydiagnostic study of depressive disorders
according to DSM-IV and 23 classical diagnostic systems. Psychiatry Clin Neurosci.
1999;53(3):387.
22. Zimmerman M, Chelminski I, McGlinchey JB, et al. Diagnosing major depressive
disorder VI: Performance of an objective test as a diagnostic criterion. J Nerv Ment Dis.
2006;194:565–569.
23. Diagnostic and statistical manual of mental disorders, 4th ed. Washington, DC:
American Psychiatric Publishing, 1994.
24. Philipp M, Maier W, Delmo CD. The concept of major depression. I. Descriptive
comparison of six competing operational definitions including ICD-10 and DSM-
III-R. Eur Arch Psychiatry Clin Neurosci. 1991;240(4–5):258–265.
25. Andrews G, Slade T, Peters L, et al. Classification in psychiatry: ICD-10 versus
DSM-IV. Br J Psychiatry. 1999;174(1):3–5.
26. Ravelli A, Bijl RV, Van Brink WD. Consequences of the use of different classification
systems: A comparison of the DSM-III-R and the ICD10 for depression. Int J Methods
Psychiatric Res. 1999;8(4):192–203.
27. Philipp M, Delmo CD, Buller R, et al. Differentiation between major and minor
depression. Psychopharmacology. 1992;106:S75–S78.
28. Kessler RC, Zhao S, Blazer DG, et al. Prevalence, correlates, and course of minor
depression and major depression in the National Comorbidity Survey. J Affect Disord.
1997;45:19–30.
29. Kendler KS, Gardner CO Jr. Boundaries of major depression: an evaluation of DSM-IV
criteria. Am J Psychiatry. 1998;155:172–177.
30. Spitzer RL, Wakefield JC. DSM-IV diagnostic criterion for clinical significance: does it
help solve the false positives problem? Am J Psychiatry. 1999;156:1856–1864.
31. Beals J, Novins DK, Spicer P, et al., the AI-SUPERPFP Team. Challenges in
operationalizing the DSM-IV clinical significance criterion. Arch Gen Psychiatry.
2004;61(12):1197–1207.
32. Zimmerman M, McGlinchey JB, Young D, et al. Diagnosing major depressive disorder,
I. A psychometric evaluation of the DSM-IV symptom criteria. J Nerv Ment Dis.
2006;194:158–163.
33. Zimmerman M, McGlinchey JB, Young D, et al. Diagnosing major depressive disorder,
IV. Relationship between number of symptoms and the diagnosis of disorder. J Nerv
Ment Dis. 2006;194:450–453.
34. Zimmerman M, Chelminski I, McGlinchey JB, et al. Diagnosing major depressive
disorder, VI. Performance of an objective test as a diagnostic criterion. J Nerv Ment Dis.
2006;194:565–569.
35. Lundberg GD. Low-tech autopsies in the era of high-tech medicine: continued value for
quality assurance and patient safety. JAMA. 1998;280:1273–1274.
36. Mayeux R, Saunders AM, Shea S, et al. Utility of the apolipoprotein E genotype in the
diagnosis of Alzheimer’s disease. Alzheimer’s Disease Centers Consortium on
Apolipoprotein E and Alzheimer’s Disease. N Engl J Med. 1998;338(8):506–511.
37. Matarazzo JD. The reliability of psychiatric and psychological diagnosis. Clin Psychol
Rev. 1983;3:103–145.
38. Tiemens BG, VonKorff M, Lin EH. Diagnosis of depression by primary care physicians
versus a structured diagnostic interview. Understanding discordance. Gen Hosp
Psychiatry. 1999;21(2):87–96.
39. Gilbody SM, House AO, Sheldon TA. Psychiatrists in the UK do not use outcomes
measures: national survey. Br J Psychiatry. 2002;180:101–103.
40. Spitzer RL. Psychiatric diagnosis: Are clinicians still necessary? Comprehensive
Psychiatry. 1983;24:399–411.
41. Antony MM, Barlow DH. Structured and semistructured diagnostic interviews. In
Barlow DH, ed. Handbook of assessment and treatment planning for psychological
disorders. New York: Guilford, 2002:3–37.
42. Leckman JF, Sholomskas D, Thompson WD, et al. Best estimate of lifetime psychiatric
diagnoses. Arch Gen Psychiatry. 1982;39:879–883.
43. Kosten TA, Rounsaville BJ. Sensitivity of psychiatric diagnosis based on the best
estimate procedure. Am J Psychiatry. 1992;149:1225–1227.
44. Taiminen T, Ranta K, Karlsson H, et al. Comparison of clinical and best-estimate
research DSM-IV diagnoses in a Finnish sample of first-admission psychosis and
severe affective disorder. Nord J Psychiatry. 2001;55(2):107–111.
45. Spitzer RL, Kroenke K, Williams JBW, et al. Validation and utility of a self-report
version of PRIME-MD: the PHQ primary care study. JAMA. 1999;282:1737–1744.
46. Carballeira Y, Dumont P, Borgacci S, et al. Criterion validity of the French version of
Patient Health Questionnaire (PHQ) in a hospital department of internal medicine.
Psychol Psychotherapy Theory Res Pract. 2007;80:69–77.
47. Moller HJ. Rating depressed patients: observer- vs self-assessment. Eur Psychiatry.
2000;15(3):160–172.
48. Becker J, Kocalevent RD, Rose M, et al. Standardized diagnosing: Computer-assisted
(CIDI) diagnoses compared to clinically-judged diagnoses in a psychosomatic setting.
Psychotherapie Psychosomatik Medizinische Psychologie. 2006;56(1):5–14.
49. Helzer JE, Robins LN, McEvoy LT, et al. A comparison of clinical and diagnostic
interview schedule diagnoses. Physician reexamination of lay-interviewed cases in the
general population. Arch Gen Psychiatry. 1985;42:657–666.
50. Anthony JC, Folstein M, Romanoski AJ, et al. Comparison of the Lay Diagnostic
Interview Schedule and a standardized psychiatric diagnosis. Experience in eastern
Baltimore. Arch Gen Psychiatry. 1985;42(7):667–675.
51. Steiner J, Tebes J, Sledge W, et al. A comparison of the structured clinical interview for
DSM-III-R and clinical diagnoses. J Nerv Ment Dis. 1995;183(6):365–369.
52. Shear MK, Greeno C, Kang J, et al. Diagnosis of nonpsychotic patients in community
clinics. Am J Psychiatry. 2000;157:581–587.
53. Miller PR, Dasher R, Collins R, et al. Inpatient diagnostic assessments: 1. Accuracy of
structured versus unstructured interviews. Psychiatry Res. 2001;105:265–272.
54. Miller PR. Inpatient diagnostic assessments: 2. Interrater reliability and outcomes of
structured vs. unstructured interviews. Psychiatry Res. 2001;105:265–271.
55. Kashner TM, Rush AJ, Suris A, et al. Impact of structured clinical interviews on
physicians’ practices in community mental health settings. Psychiatr Serv.
2003;54:712–718.
56. Basco RM, Bostic JQ, Davies D, et al. Methods to improve diagnostic accuracy in a
community mental health setting. Am J Psychiatry. 2000;157(10):1599–1605.
57. Riskind JH, Beck AT, Berchick RJ, et al. Reliability of DSM-III diagnoses for major
depression and generalized anxiety disorder using the Structured Clinical Interview for
DSM-III. Arch Gen Psychiatry. 1987;44:817–820.
58. Williams JBW, Gibbon M, First MB, et al. The Structured Clinical Interview for
DSM-III-R (SCID), II: multisite test–retest reliability. Arch Gen Psychiatry.
1992;49:630–636.
59. Robins L. National Institute of Mental Health diagnostic interview schedule—its
history, characteristics, and validity. Arch General Psychiatry. 1981;38:381.
60. Rogers R. Handbook of diagnostic and structured interviewing. New York: Guilford
Publications, 2001.
61. Gibson C. Semi-structured and unstructured interviewing: a comparison of
methodologies in research with patients following discharge from an acute
psychiatric hospital. J Psychiatric Mental Health Nursing. 1998;5(6):469–477.
62. Robins LN. Psychiatric disorders in America. 1991.
63. Kessler RC, McGonagle KA, Zhao S, et al. Lifetime and 12-month prevalence of DSM-
III-R psychiatric disorders in the United States—results from the National Comorbidity
Survey. Arch Gen Psychiatry. 1994;51:8.
64. Brugha TS, Bebbington PE, Jenkins R. A difference that matters: comparisons of
structured and semi-structured psychiatric diagnostic interviews in the general
population. Psychol Med. 1999;29:1013–1020.
65. Spitzer RL, Williams JB, Gibbon M, et al. The Structured Clinical Interview for
DSM-III-R (SCID). I: History, rationale, and description. Arch Gen Psychiatry.
1992;49(8):624–629.
66. Williams JB, Gibbon M, First MB, et al. The Structured Clinical Interview for
DSM-III-R (SCID), II: multisite test–retest reliability. Arch Gen Psychiatry.
1992;49:630–636.
67. Haro JM, Arbabzadeh-Bouchez S, Brugha TS, et al. Concordance of the Composite
International Diagnostic Interview Version 3.0 (CIDI 3.0) with standardized clinical
assessments in the WHO World Mental Health Surveys. Int J Methods Psychiatric Res.
2006;15(4):167–180.
68. Kashner TM, Rush AJ, Suris A, et al. Impact of structured clinical interviews on physicians’
practices in community mental health settings. Psychiatric Services. 2003;54(5):712–718.
69. Philipp M, Maier W. The polydiagnostic interview: a structured interview for
polydiagnostic classification of psychiatric patients. Psychopathology. 1986;19:175–185.
70. Wing JK, Babor T, Brugha T, et al. SCAN. Schedules for Clinical Assessment in
Neuropsychiatry. Arch Gen Psychiatry. 1990;47(6):589–593.
71. Murphy JM, Monson RR, Laird NM, et al. A comparison of diagnostic interviews for
depression in the Stirling County Study: challenges for psychiatric epidemiology. Arch
Gen Psychiatry. 2000;57:230–236.
72. Wittchen HU, Robins LN, Cottler LB, et al. Cross-cultural feasibility, reliability and
sources of variance of the Composite International Diagnostic Interview (CIDI). The
multicentre WHO/ADAMHA field trials. Br J Psychiatry. 1991;159:645–658.
73. Wittchen HU. Reliability and validity studies of the WHO-Composite International
Diagnostic Interview (CIDI): a critical review. J Psychiatr Res. 1994;28:57–84.
74. Andrews G, Peters L. The psychometric properties of the Composite International
Diagnostic Interview. Soc Psychiatry Psychiatr Epidemiol. 1998;33:80–88.
75. Janca A, Robins LN, Bucholz KK, et al. Comparison of Composite International
Diagnostic Interview and clinical DSM-III-R criteria checklist diagnoses. Acta
Psychiatr Scand. 1992;85:440–443.
76. Booth BM, Kirchner JE, Hamilton G, et al. Diagnosing depression in the medically ill:
validity of a lay-administered structured diagnostic interview. J Psychiatric Res.
1998;32(6):353–360.
77. Sheehan DV, Lecrubier Y, Sheehan KH, et al. The Mini-International Neuropsychiatric
Interview (M.I.N.I.): the development and validation of a structured diagnostic
psychiatric interview for DSM-IV and ICD-10. J Clin Psychiatry. 1998;59(Suppl
20):22–57.
78. Kendell RE. Clinical validity. Psychol Med. 1989;19:45–55.
79. Zimmerman M, Mattia JI. Psychiatric diagnosis in clinical practice: is comorbidity
being missed? Comprehensive Psychiatry. 1999;40:182–191.
80. Miller PR. Inpatient diagnostic assessments: 3. Causes and effects of diagnostic
imprecision. Psychiatry Res. 2002;111:191–197.
81. Gardner W, Kelleher KJ, Pajer KA, et al. Primary care clinicians’ use of standardized
psychiatric diagnoses. Child Care Health Development. 2004;30(5):401–412.
82. Toshiyuki S, Makoto T. Is DSM widely accepted by Japanese clinicians? Psychiatry
Clin Neurosci. 2001;55:437–450.
2
OVERVIEW OF DEPRESSION SCALES
AND TOOLS

Alex J. Mitchell

1. Background
2. The Classic Severity Scales (1960–1980)
3. The New Severity Scales (1981–2008)
4. The Future of Screening Scales

Context
There have been a large number of depression tools published for the purposes
of detecting depression or rating its severity. Choosing between them is
difficult without adequate information on their validity, reliability, and
acceptability. Recently, ever shorter versions of mood measures have been
released. Is a shorter scale a better scale? It is important to study each
method against the best available standard and, ideally, to compare scales head
to head to judge the optimal scale for each situation.

1. Background
Clinicians and researchers have developed a bewildering number of tools for the
assessment of depression. These are most often questionnaires designed to help
elicit symptoms of depression for the purpose of screening, diagnosis, and
monitoring progress (Textbox 2.1). Although we often use the terms screening,
diagnosis, and case-finding interchangeably, in an epidemiologic sense screening
refers to the attempted detection of disorder in those who have not sought testing or
do not suspect they have a particular condition. A screening test is usually not
intended to be diagnostic, in that those with suspicious findings may be
referred for more definitive examination. The latter is perhaps better known as
case-finding. Because those who screen positive are examined further, a screening
tool can favor negative predictive value (NPV) over positive predictive value (PPV)
(see Chapter 5).

Textbox 2.1. Definitions of Screening and Related Procedures

Screening
‘‘The systematic application of a test or inquiry, to identify individuals at
sufficient risk of a specific disorder to warrant further actions among those
who have not sought medical help for that disorder’’
Case-Finding
‘‘The selected application of a test or inquiry, to identify those individuals
with a suspected disorder and exclude those without a disorder, usually in a
population who have sought medical help’’
Targeted (High-Risk) Case-Finding
‘‘The highly selected application of a test or inquiry, to identify individuals at
high risk of a specific disorder by virtue of known risk factors’’
Severity Assessment
‘‘The application of a test or inquiry, to quantify the severity of a specific
disorder’’
Adapted from Department of Health. Annual report of the National Screening Committee.
London: DoH, 1997.

In both screening and
case-finding the test may be applied ‘‘routinely’’ to all cases, or selectively to
those thought to be at high risk. A screening test applied to many individuals
should be as simple as possible to retain high uptake, and positive results must be
paired with an acceptable next step.1 A case-finding measure may be more
involved but should still consider acceptability. Adoption of a test in clinical
practice probably depends more on acceptability than accuracy.2
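Likelihood ratios show why a screen followed by a more definitive examination can tolerate a low PPV. The sketch below is a minimal illustration. It assumes the two tests are conditionally independent given true status, which is often not the case in practice. The accuracy figures and prevalence are made up, and `post_test_probability` is a hypothetical helper.

```python
def post_test_probability(pre_test: float, sensitivity: float, specificity: float,
                          positive: bool = True) -> float:
    """Update a probability of disorder after one test result using likelihood ratios."""
    if positive:
        lr = sensitivity / (1 - specificity)          # positive likelihood ratio
    else:
        lr = (1 - sensitivity) / specificity          # negative likelihood ratio
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Assumed prevalence among those screened and assumed test accuracies.
prevalence = 0.10
after_screen = post_test_probability(prevalence, sensitivity=0.90, specificity=0.70)
after_case_finding = post_test_probability(after_screen, sensitivity=0.85, specificity=0.95)
residual_if_negative = post_test_probability(prevalence, 0.90, 0.70, positive=False)

print(f"after positive screen:          {after_screen:.1%}")
print(f"after positive second stage:    {after_case_finding:.1%}")
print(f"after negative screen (1 - NPV): {residual_if_negative:.1%}")
```

With these assumed inputs, a positive screen lifts the probability of depression from 10% to about 25%. A positive second-stage assessment then lifts it to about 85%. A negative screen leaves a residual probability of about 1.6%, which corresponds to an NPV near 98%.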

Historical Aspects
During the past five decades there has been a considerable effort to improve the
methods used to detect and quantify depression (Textbox 2.2).3–6 Some scales,
such as the Cronholm-Ottosson Depression Scale, have fallen into obscurity, while
others, such as the Hamilton Depression Rating Scale and the Beck Depression
Inventory, have each been cited over 10,000 times. Given that there are so many
similar depression scales, it is not surprising that clinicians have trouble choosing
between them. The American College of Psychology Consultants lists 213 psychologically
oriented scales with variable validation and reliability data,7 simplified
here to 50 depression scales (Textbox 2.3). Fortunately, this may be distilled
further to ten key depression instruments, five created before 1980 and five more
modern inventions (Tables 2.1 and 2.2). The classic scales are the Hamilton Depression
Rating Scale (HAM-D), the Montgomery-Åsberg Depression Rating Scale
(MADRS), the Beck Depression Inventory (BDI), the Zung Self-Rating
Depression Scale (SDS), and the Center for Epidemiologic Studies Depression
Scale (CES-D). The five key scales developed since 1980 are the Hospital Anxiety
and Depression Scale (HADS), the Geriatric Depression Scale (GDS), the Edinburgh
Postnatal Depression Scale (EPDS), the MOS 8-Item Depression Screener
(Burnam Screen), and the Patient Health Questionnaire (PHQ-9). In addition, I
have included the less-well-known Major Depression Inventory (MDI) as it has a
special role, facilitating a diagnosis based on both DSM-IV and ICD-10 criteria.
Tools examining more general psychopathology are purposely omitted from this
chapter even if they include a rating of depression. This includes some seminal
scales such as the General Health Questionnaire (GHQ) and the Hopkins
Symptom Checklist (SCL) family (SCL-90, SCL-25, and SCL-8).8–10 To keep
this chapter manageable I will also not discuss reliability and validity data in detail,
but further information can be found in relevant chapters by setting. A comparison
of these key scales is shown in Appendix 1.

Textbox 2.2. Development of Major Depression Scales

1952 DSM-I published

1960 Hamilton Depression Scale (HAM-D)
1961 Beck Depression Inventory (BDI)
1965 Zung Self-Rating Depression Scale (SDS)
1968 DSM-II published
1977 Center for Epidemiologic Studies Depression Scale (CES-D)
1977 ICD-9 published
1979 Montgomery-Åsberg Depression Rating Scale (MADRS)
1980 DSM-III published
1980 The Bech–Rafaelsen Melancholia Scale (MES)
1982 Geriatric Depression Scale (GDS-30)
1983 Hospital Anxiety and Depression Scale (HADS)
1986 Abbreviated version of Geriatric Depression Scale (GDS-15)
1987 DSM-III-R published
1987 Edinburgh Postnatal Depression Scale (EPDS)
1987 Inventory to Diagnose Depression (IDD)
1988 MOS-8 Burnam Screen
1992 ICD-10 published
1994 DSM-IV published
1996 Revision of BDI to BDI-II
2001 Patient Health Questionnaire (PHQ)
2001 Major Depression Inventory (MDI)
DSM, Diagnostic and Statistical Manual of Mental Disorders;
ICD, International Classification of Diseases.

Textbox 2.3. Listing of Depression Scales

Generic Scales
Beck Depression Inventory™-Second Edition (BDI-II)™
Brief Psychiatric Rating Scale (BPRS)
Brief Symptom Inventory (BSI)
Burns Depression Checklist (BDC)
Carroll Depression Scales-Revised (CDS-R)
Center for Epidemiological Studies Depression Scale (CES-D)
Depression Anxiety Stress Scales (DASS)
Depression Questionnaire (DQ)
Depression 30 Scale (D-30)
Diagnostic Interview Schedule (DIS-IV)
Diagnostic Inventory for Depression (DID)
Hamilton Depression Inventory (HDI)
Hamilton Rating Scale for Depression (HRSD)
Hopelessness Depression Symptom Questionnaire (HDSQ)
Hospital Anxiety and Depression Scale (HADS)
Inventory to Diagnose Depression (IDD)
Inventory of Depressive Symptomatology (IDS)
IPAT Depression Scale
Manual for the Diagnosis of Major Depression (MDMD)
Minnesota Multiphasic Personality Inventory 2 (MMPI-2) Depression Scale
Montgomery–Åsberg Depression Rating Scale (MADRS)
MOS 8-Item Depression Screener
Multiple Affect Adjective Checklist-Revised (MAACL-R)
Multiscore Depression Inventory for Adolescents and Adults (MDI)
Newcastle Scales
Positive and Negative Affect Scales (PANAS)
Primary Care Evaluation of Mental Disorders (PRIME-MD)
Profile of Mood States (POMS)
Raskin Three-Area Severity of Depression Scale
Revised Hamilton Rating Scale for Depression (RHRSD)
Reynolds Depression Screening Inventory (RDSI)
Rimon’s Brief Depression Scale (RBDS)
State Trait-Depression Adjective Check List (ST-DACL)
Symptom Checklist-90-Revised (SCL-90-R)
Zung Self-Rating Depression Scale (Zung SDS)

Special Population Scales
Aphasic Depression Rating Scale (ADRS)
Calgary Depression Scale for Schizophrenia (CDSS)
Children’s Depression Inventory (CDI)
The Children’s Depression Index (CDI)
Children’s Depression Rating Scale-Revised (CDRS-R)
Cornell Scale for Depression in Dementia (Cornell Scale)
Depression and Anxiety in Youth Scale (DAYS)
Depression Intensity Scale Circles (DISCs)
Depression Rating Scale (DRS)
Geriatric Depression Scale (GDS)
Kiddie-Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime Version (K-SADS-PL)
Medical-Based Emotional Distress Scale (MEDS)
Multiscore Depression Inventory for Children (MDI-C)
Postpartum Depression Interview Schedule (PDIS)
Psychopathology Inventory for Mentally Retarded Adults (PIMRA)
Reynolds Adolescent Depression Scale (RADS)
Reynolds Child Depression Scale (RCDS)
Signs of Depression Scale (SDSS)
Stroke Aphasic Depression Questionnaire (SADQ)
Visual Analog Mood Scales (VAMS)
Youth Depression Adjective Checklist (Y-DACL)
Adapted from Nezu AM, Ronan GF, Meadows EA, eds. Practitioner’s guide to empirically-based
measures of depression. Springer, 2007.

The Limitations of Severity Scales

Most mood scales have only an approximate relationship to the criteria of ICD
and DSM (see Textbox 2.2). None adhere strictly to these algorithmic criteria
(including duration and function), and as such they do not produce operational
diagnoses. Several early scales were developed to measure severity during
treatment (see Sensitivity to Change below).11 Yet the value of a scale
does not necessarily correspond to its original or intended use; for example,
the EPDS may not be the optimal choice in perinatal settings and yet may be
valuable elsewhere. The evaluation and refinement of existing scales is discussed
in Chapter 4. It remains a significant limitation that only a small number
of well-powered studies have compared the value of multiple scales head to
head.12,13 Most of these comparative studies suggest that severity scales
provide somewhat distinct estimates of depression diagnosis and severity (this
has been confirmed by Rasch analysis).14–16 For example, although all
measure low mood, not all measure anhedonia, somatic symptoms, anxiety,
suicidal ideation, and well-being.
Depression scales are predominantly symptom counts over a narrowly
defined period. They do not tend to measure chronicity or effect on daily
function. Thus, they should not be considered a precise measure of burden of
depression. Neither do they measure met or unmet needs or the desire for help.
One fundamental issue is that it is not clear which of many possible symptoms
of depression are most important for diagnosis (see Chapter 1). For example,
some symptoms appear more likely to be associated with greater severity and
pervasiveness of depression.17 If some symptoms are more important than
others, should the scale weight items differently? This has been tried, but
without good validation and at a cost of significant scale complexity.
A second unresolved issue is whether depression differs significantly by setting and by comorbid disease. If one presupposes that there is one syndrome of depression manifest in all situations (eg, primary care, specialist care) and all medical conditions, then the role of any scale is simply to identify and quantify these core symptoms as well as possible. Although the “one size fits all” approach sounds unlikely, it is essentially the approach taken by DSM-IV and ICD-10, which do not attempt to define a syndrome of, say, “post-stroke depression” as opposed to uncomplicated depression in primary care. A number of very specific depression scales have been proposed to elicit special types of mood disorders. Examples are listed in Textbox 2.3 and include the Depression Scale in Schizophrenia (DEPS),18 the Cornell Scale for the Assessment of Depression in Dementia (CSDD),19 the post-stroke depression scale,20 the Stroke Aphasic Depression Questionnaire (SADQ),21 and the Aphasic Depression Rating Scale.22 The scientific basis for and against having special scales for medical settings is discussed in Chapters 10 and 11; the debate usually revolves around whether to keep or omit somatic items (see Appendix 2). A final limitation is the temptation to overrely on scales to improve quality of care. Numerous studies have explored this issue, which Gilbody discusses in Chapter 7.

34 SCREENING FOR DEPRESSION IN CLINICAL PRACTICE

Patient-Rated Versus Clinician-Rated Scales

In the case of a mental illness where there is no foolproof gold standard, it is by no means clear whether patient-rated or clinician-rated measures are more useful.23 A list of such scales is shown in Textbox 2.4. Neither patient (self)-rated scales nor clinician-rated scales are inherently more sensitive to change or more accurate.24,25

Textbox 2.4. Major Clinician vs. Self-Report Scales

Clinician-Rated Protocols
Hamilton Rating Scale for Depression
Inventory of Depressive Symptomatology (IDS-C)
Manual for the Diagnosis of Major Depression
Montgomery–Åsberg Depression Rating Scale
Newcastle Scales
Raskin Three-Area Scale
Rimon’s Brief Depression Scale

Self-Report Inventories
Beck Depression Inventory-Second Edition
Carroll Depression Scales-Revised
Center for Epidemiologic Studies Depression Scale
Diagnostic Inventory for Depression
Hamilton Depression Inventory
Hopelessness Depression Symptom Questionnaire
Inventory to Diagnose Depression
Inventory of Depressive Symptomatology (IDS-SR)
IPAT Depression Scale
Minnesota Multiphasic Personality Inventory 2 Depression Scale
MOS 8-Item Depression Screener
Multiscore Depression Inventory for Adolescents and Adults
Positive and Negative Affect Scales
Revised Hamilton Rating Scale for Depression: Self-Report
Reynolds Depression Screening Inventory
State Trait-Depression Adjective Check Lists
Zung Self-Rating Depression Scale

Adapted from Nezu AM, Ronan GF, Meadows EA, eds. Practitioner’s guide to empirically-based measures of depression. Springer, 2007.

A self-rated scale has certain benefits over interviewer-rated scales and clinical interviews in large population studies: it takes less time, does not require trained personnel, and its administration and scoring are probably more standardized.26 Clinician-rated scales, by contrast, can directly augment a clinical interview, and if training is required, the clinician’s skills may also improve. The major advantage of interviewer-rated scales is that the experience of the interviewer comes into play. Faravelli and coworkers (1986)27 compared the distributions of three doctor-rated scales and three self-rated scales in a series of 100 depressed patients and noted that doctor-rated scores tended to cluster toward the lower (left) end of each scale, while self-rated scores tended to cluster toward the higher (right) end. This may result from the tendency of patients to judge their own condition as more severe than average, while doctors tend to rate severity as less than average. On the other hand, patients can underreport symptoms in some situations.28 Our advice is to choose the type of scale most suited to the purpose at hand.

Sensitivity to Change
In psychiatry the concept of sensitivity to change in mood was first used in psychometric research during the 1970s.29,30 Yet sensitivity to change is a phrase that has been variably defined in the literature and is poorly understood. Most consider sensitivity to change to be the ability of a severity scale to detect small changes in outcomes over time with repeated assessment. A more accurate description of sensitivity to change is the proportion of those who actually changed according to a gold standard (eg, responders) who were correctly identified as having changed by the instrument under study (Fig. 2.1). Specificity to change is also a useful concept: the proportion of those who actually did not change (eg, nonresponders) who are correctly identified as such by the instrument. That said, no group has yet documented specificity to change.
The HAM-D has been the main comparator in most sensitivity to change papers.31 The HAM-D, MADRS, BDI, and HADS have all been compared head to head, but the results do not demonstrate any consistent superiority of one scale over another. Vermeersch and associates (2004)32 describe five factors that may influence the sensitivity of a scale: inclusion of irrelevant items, categorical items, items not conducive to detecting change, items assessing traits, and items susceptible to floor and ceiling effects.
Fundamentally, scales with many items are more likely to be sensitive to subtle changes.

                      | Gold standard: Change    | Gold standard: No change |
Instrument: Change    | A                        | B                        | PPV = A/(A + B)
Instrument: No change | C                        | D                        | NPV = D/(C + D)
Total                 | Se to change = A/(A + C) | Sp to change = D/(B + D) |

Figure 2.1. Accuracy of change in 2 × 2 format.
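To make the 2 × 2 definitions in Figure 2.1 concrete, the following is a minimal Python sketch that computes sensitivity and specificity to change, plus the two predictive values, from the four cell counts A to D. The class and method names are illustrative only, and the counts in the usage line are hypothetical rather than taken from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeTable:
    """Cell counts laid out as in Figure 2.1 (instrument rows x gold-standard columns)."""
    a: int  # instrument says change,    gold standard says change
    b: int  # instrument says change,    gold standard says no change
    c: int  # instrument says no change, gold standard says change
    d: int  # instrument says no change, gold standard says no change

    def sensitivity_to_change(self) -> float:
        # Share of true changers (eg, responders) that the instrument detects.
        return self.a / (self.a + self.c)

    def specificity_to_change(self) -> float:
        # Share of true non-changers (eg, nonresponders) the instrument calls unchanged.
        return self.d / (self.b + self.d)

    def ppv(self) -> float:
        # Of those the instrument says changed, the share who truly changed.
        return self.a / (self.a + self.b)

    def npv(self) -> float:
        # Of those the instrument says did not change, the share who truly did not.
        return self.d / (self.c + self.d)


if __name__ == "__main__":
    table = ChangeTable(a=40, b=10, c=20, d=30)  # hypothetical counts
    print(round(table.sensitivity_to_change(), 2), table.specificity_to_change())
```

Keeping the four counts together in one object makes it hard to mix up the row-wise indices (PPV, NPV) with the column-wise ones (sensitivity and specificity to change).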

2. The Classic Severity Scales (1960–1980)

Hamilton Rating Scale for Depression (HAM-D)33
In 1953 Max Hamilton moved to Leeds, where he developed one of the best-known scales in psychiatry.34 The original HAM-D was developed to quantify severity after an interview had established a diagnosis of depression. Despite its age, the HAM-D remains the most commonly used scale in treatment studies, helped by the fact that it is in the public domain.34 Indeed, it may have been a victim of its own success, as independent groups have produced as many as 20 conflicting variations.35
The HAM-D is rather unusual in that it is designed to be administered by a trained clinician on the basis of the patient’s clinical condition at the time of the interview. It requires a rather long semi-structured interview, taking 15 to 20 minutes, so it is probably not a good choice for screening in busy clinical settings. It was developed before DSM criteria for depression were established and differs significantly from the DSM approach, assessing only four of the nine DSM-IV criteria. It may favor somatic presentations, as eight items relate to six somatic symptoms: insomnia, psychomotor retardation, loss of appetite, loss of energy, loss of weight, and loss of libido. There have been other criticisms, such as the lack of a single unifying structure, differential item weighting, and limited interrater reliability (although this can be improved).36,37 In the past 5 years several shortened versions of the HAM-D have appeared, including a seven-item version and a six-item version.38–40 Using Rasch analysis, Bech and coworkers (1981)41,42 confirmed that six items associated with unidimensionality could be combined: depressed mood, guilt, work/interests, psychomotor retardation, psychic anxiety, and general somatic symptoms. Several versions provide standardized explicit scoring conventions and/or structured interview guidance.43

Montgomery-Åsberg Depression Rating Scale (MADRS)44

Montgomery and Åsberg45 published this 10-item scale in 1979 following earlier development of the Comprehensive Psychopathological Rating Scale (CPRS).46 Ratings of patients on the 65-item CPRS were used to identify the 17 most common symptoms in depression, which were field-tested in four antidepressant trials and then reduced to the 10 items that appeared to show the largest changes with treatment. However, it is a mistake to assume that the MADRS is necessarily the most sensitive to change (see above); indeed, a meta-analysis showed that the HAM-D has superior sensitivity to change.47
Like the HAM-D, this is a clinician-rated scale designed for a trained interviewer, although a self-rating form was later developed. It covers the clinical condition at the time of the interview and does not specify a timeframe during which the patient should be rated. The 10-item checklist actually consists of 1 observational item and 9 question items that require about 15 minutes of additional interview time. The items covered are apparent sadness, reported sadness, inner tension, reduced sleep, reduced appetite, concentration difficulties, lassitude, inability to feel, pessimistic thoughts, and suicidal thoughts. These items cover all the DSM-IV criteria for major depression, with the exception of psychomotor retardation or agitation.

Beck Depression Inventory (BDI)48

The original version of this scale was developed by Aaron Beck and colleagues at the University of Pennsylvania and first published in 1961.49 It can be administered by a trained professional or self-administered. The 21-item version requires 5 to 10 minutes. Each item is scored on a consistent scale of 0 to 3, with options presented in a multiple-choice format. A reading age of about 10 years is required for a person who is self-administering the test. In the original publication no timeframe is mentioned; the BDI-IA revision set the timeframe at 1 week, and the BDI-II extended it to the 2 weeks before the evaluation to more closely follow the DSM criteria for MDD. Version II (1996) also replaced body image change, weight loss, somatic preoccupation, and work difficulty with agitation, worthlessness, concentration difficulty, and loss of energy. The scale is often considered to emphasize psychological items, yet it contains eight “cognitive items” (pessimism, past failures, guilty feelings, punishment feelings, self-dislike, self-criticalness, suicidal thoughts or wishes, and worthlessness) and nine “somatic items” (crying, agitation, indecisiveness, loss of energy, change in sleep patterns, change in appetite, concentration difficulties, tiredness and/or fatigue, and loss of interest in sex). The remaining items are sadness, loss of pleasure, loss of interest, and irritability. The cognitive and somatic items, when considered as subscales, are typically moderately correlated.50
Recently Beck and associates developed the Beck Depression Inventory Fast Screen (BDI-FS) to address possible somatic contamination.51 It contains 7 of the original 21 BDI-II items, assessing cognitive and affective aspects of depression consistent with DSM-IV diagnostic criteria. It was developed to permit more rapid detection of depression in primary care and hospital settings.
Original validation data were derived from two samples: a group of 500 patients from four psychiatric outpatient facilities and a group of 120 college students. Rasch analysis of the BDI has been reported.52 In one study the BDI was administered to 660 adult patients with unipolar depression and examined using factor analysis; it was internally consistent but yielded severity ratings distinct from those of the MADRS.53

The Zung Self-Rating Depression Scale (SDS)54

The Zung SDS is a 20-item scale in its original form that takes about 5 to 8 minutes to administer.55 It is the prototypical self-report depression scale. Of the 20 items, half are worded positively (“I feel hopeful about the future”) and half negatively (“I feel downhearted and blue”). Each item is rated on the same 4-point Likert scale (a little of the time = 1; some of the time = 2; a good part of the time = 3; most of the time = 4). A meta-analysis summarized validity studies up to 1986.56 A large factor analysis in over 1,000 cancer patients showed a four-factor solution: a cognitive symptom factor, a depressed mood factor, and two somatic factors (eating-related and non–eating-related), accounting for 20%, 13%, 7%, and 8% of the variance on the Zung, respectively.57 Rasch analysis of the Zung SDS has been performed.58 Several short forms have been developed, including a 12-item,59 an 11-item,60 and a 10-item version.61
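Because half of the Zung items are worded positively, those items have to be reverse-scored before summing, so that a higher total always means more depressive symptoms. The following minimal Python sketch shows that step. The set of positively worded item positions is an assumption (the commonly reproduced item order) and should be checked against the form actually used.

```python
# ASSUMPTION: 1-based positions of the ten positively worded items in the
# commonly reproduced 20-item Zung SDS; check against the form you use.
POSITIVE_ITEMS = frozenset({2, 5, 6, 11, 12, 14, 16, 17, 18, 20})


def zung_raw_score(responses, positive_items=POSITIVE_ITEMS):
    """Raw Zung SDS score from 20 responses coded 1-4.

    Positively worded items are reverse-scored (1<->4, 2<->3) so that a
    higher total always means more depressive symptoms.
    """
    if len(responses) != 20:
        raise ValueError("expected 20 item responses")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        if r not in (1, 2, 3, 4):
            raise ValueError(f"item {item_number}: response must be 1-4, got {r}")
        # "I feel hopeful about the future" endorsed "most of the time" (4)
        # indicates less depression, so it contributes 5 - 4 = 1.
        total += (5 - r) if item_number in positive_items else r
    return total  # range 20-80
```

Answering “a little of the time” to every item therefore gives a mid-range total rather than the minimum, which is exactly the response-set bias the balanced wording is meant to expose.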

The Center for Epidemiologic Studies Depression Scale (CES-D)62

This 20-item scale was developed from existing scales such as the BDI and Zung SDS as a screening instrument for community-based studies.63 It was designed at the U.S. National Institute of Mental Health (NIMH) with government rather than university funding. It bridged epidemiologic and clinical needs, was first used in an epidemiologic study in Kansas City,64 and became the most used depression scale in the 1990s. It includes items concerning low mood and loss of interest but not suicidal ideation. Original psychometric properties were based on three community samples and two psychiatric patient samples, comprising about 5,000 healthy individuals but only 70 adult psychiatric patients. Four of the 20 items are positively worded and reverse scored (negatively keyed). The CES-D is designed for self-completion, telephone administration, or web-based administration. The approach is mostly psychological, with some somatic items. The CES-D has four separate factors: low mood, somatic symptoms, positive affect, and interpersonal relations. A revised version that is more in line with DSM, the CESD-R, has been published. There are a variety of short forms, most notably several 10-item versions and a 5-item version.65 Recently Rasch-modeled short forms have been reported in a general population,66 and a second such model has been applied to a depressed population.67

3. The New Severity Scales (1981–2008)

Hospital Anxiety and Depression Scale (HADS)68
The HADS can be considered the first in a new generation of scales that were shorter, easier to score, and no less accurate than the first generation. It is a relatively brief self-administered rating scale of symptoms and functioning. Anxiety and depression are assessed as separate components, each with seven items that are rated from 0 (no problem) to 3. A cutoff of 7 versus 8 on each subscale (ie, a score of 8 or more counted as positive) is usually recommended, although other cutoffs have been used.69 Although the scores for the two components have often been added together to give a composite anxiety–depression score (or emotional distress), the authors do not recommend this. It is a fairly simple scale that does not include somatic and cognitive signs of depression. One limitation is that seven of the nine DSM criteria are not covered: the HADS excludes reduced appetite, weight loss, sleeping disturbances, fatigue, and concentration difficulties, and it also excludes guilt, worthlessness, and suicidality. Notably, it does not include a question on low mood per se. Another limitation is that the reverse rating of some items, together with the intermixing of depression and anxiety questions, can cause confusion. These choices may or may not be advantageous in general hospital and primary care settings (see Chapters 10 and 11 for discussion). Despite these limitations, the HADS has found an important place and has been used in impressive studies involving thousands of patients (Fig. 2.2).70–72 Good data are also available on values in nonclinical populations.73

[Figure 2.2: bar chart of HADS-D score frequencies; x-axis, HADS-D score (zero to twenty-one); y-axis, frequency (up to 3,000).]
Figure 2.2. Distribution of HADS-D scores in 18,414 primary care attendees. Adapted from Thompson C, Ostler K, Peveler RC, et al. Dimensional perspective on the recognition of depressive symptoms in primary care: The Hampshire Depression Project 3. Br J Psychiatry. 2001;179:317–323.
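The HADS scoring rules described above (two separate seven-item subscales, each compared against the 7 versus 8 cutoff, never summed) can be sketched in a few lines of Python. This is a minimal illustration, assuming each item has already been converted to its 0 to 3 score from the printed form (that is, the reverse rating has already been applied). The severity bands are the HADS-D ranges listed in Table 2.1 of this chapter, and the function names are hypothetical.

```python
def hads_subscale(items):
    """Total of one HADS subscale: seven items, each already scored 0-3."""
    if len(items) != 7 or any(s not in (0, 1, 2, 3) for s in items):
        raise ValueError("a HADS subscale needs seven item scores in 0-3")
    return sum(items)  # range 0-21


def hads_d_band(score):
    """Conventional HADS-D severity band, using the cutoffs listed in Table 2.1."""
    if not 0 <= score <= 21:
        raise ValueError("HADS-D score must be 0-21")
    if score <= 7:
        return "none"
    if score <= 10:
        return "mild"
    if score <= 14:
        return "moderate"
    return "severe"


def hads_screen(anxiety_items, depression_items, cutoff=8):
    """Score the two subscales separately; a score >= cutoff (the 7 v 8 split) is positive.

    The subscales are deliberately not added into a composite distress score.
    """
    anxiety = hads_subscale(anxiety_items)
    depression = hads_subscale(depression_items)
    return {
        "anxiety": anxiety,
        "anxiety_positive": anxiety >= cutoff,
        "depression": depression,
        "depression_positive": depression >= cutoff,
        "depression_band": hads_d_band(depression),
    }
```

For example, a depression subscale total of 8 is screen-positive and falls in the “mild” band, while an anxiety total of 7 is just below the cutoff.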

Geriatric Depression Scale (GDS)74

In its original form the GDS consists of a simple list of 30 questions, all of which require a “yes” or “no” answer.75 However, a 15-item version is very commonly used. Ten of the items on the GDS-30 and five of the items on the GDS-15 are negatively keyed (ie, a “no” response is an endorsement of a depressive symptom). The GDS is a self-report instrument, and a telephone version has demonstrated good agreement with the self-report questionnaire. The GDS focuses on the psychological symptoms of depression, particularly changes in mood and thoughts. Few somatic items are included; specifically, sleep, appetite, gastrointestinal symptoms, autonomic symptoms, and sexual symptoms are not assessed. The GDS-30 covers five of the DSM-IV criteria using differing terminology (lowered mood, loss of interest, loss of energy, impaired concentration, and restlessness), and the GDS-15 covers three (lowered mood, loss of interest, and loss of energy). Questions about suicidal ideation were intentionally not included, and the yes/no scoring of items makes the GDS a poor choice for rating the burden or severity of depression. Rasch analysis of the GDS has been reported.76 In one study of 526 people over 65 in home care, the optimal cutoff on the GDS-15 was 5, which yielded a sensitivity of 71.8% and a specificity of 78.2%.77 A systematic review of the GDS found 42 studies with a mean sensitivity of 0.753 and specificity of 0.770 for the GDS-30 and a sensitivity of 0.805 and a specificity of 0.750 for the GDS-15.71 GDS versions showed significantly better validity indices than the “Yale-1-question” screen but were similar to the CES-D. Briefer 10-item, 5-item, and 4-item versions and even a 1-item version have been developed, but their value is currently uncertain.
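The negative keying described above is easy to get wrong when the GDS-15 is scored by hand. The following minimal Python sketch counts endorsed symptoms from yes/no answers. Two things in it are assumptions rather than statements from the text: the positions of the five negatively keyed items (the commonly used item order), and the reading of the “cutoff of 5” as “a score of 5 or more is positive”.

```python
# ASSUMPTION: 1-based positions of the five negatively keyed items in the
# commonly used GDS-15 item order; check against the version you administer.
NEGATIVELY_KEYED = frozenset({1, 5, 7, 11, 13})


def gds15_score(answers, negatively_keyed=NEGATIVELY_KEYED):
    """Number of depressive symptoms endorsed in 15 yes/no answers (True = "yes")."""
    if len(answers) != 15:
        raise ValueError("expected 15 yes/no answers")
    score = 0
    for item_number, said_yes in enumerate(answers, start=1):
        # Negatively keyed items: "no" endorses the symptom. Others: "yes" does.
        endorsed = (not said_yes) if item_number in negatively_keyed else said_yes
        score += int(endorsed)
    return score  # range 0-15


def gds15_positive(answers, cutoff=5):
    # ASSUMPTION: the cutoff of 5 is read as "a score of 5 or more is positive".
    return gds15_score(answers) >= cutoff
```

Someone who answers “no” to every question still scores 5 under this keying, which illustrates why acquiescent or nay-saying response styles matter for yes/no scales.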

The Edinburgh Postnatal Depression Scale (EPDS)78

Cox and colleagues developed this scale after noting that some women endorse somatic items on existing scales because of the physiologic changes of childbearing and because of normal postnatal sleep disturbance.79,80 The authors used clinical intuition to identify possible items from questionnaires such as the SAD and HAD scales and the BDI. Thirty items were initially tested, and the 13 items thought most likely to detect mothers with clinical depression were then tested on a sample of 60 postnatal women against the Clinical Interview Schedule. After factor analysis this was shortened to the final 10-item scale. Interestingly, the EPDS contains no specific item about mother–baby interaction or about irritability, which has allowed its use to be extended beyond perinatal settings. Its appeal is enhanced by its simple Likert scoring, from 0 for no presence of the symptom through 3 for marked presence/change in usual state. It incorporates anxiety but not suicidality.
Studies suggest that the EPDS contains three factors: anxiety (items 3, 4, 5, 6, and 7), depression (items 8, 9, and 10), and anhedonia, or lack of euthymic mood (items 1 and 2). Together these account for 63% of the variance.81 A short five-item version of the EPDS was developed by using stepwise multiple regression to find the combination of items that explains the maximum proportion of the variance of the full-scale sum score in 2,730 women. The selected EPDS items were then correlated with the Hopkins Symptom Check List (HSCL-25)82 for external validation. The five items were “I have felt sad or miserable,” “I have been anxious or worried for no good reason,” “I have been so unhappy that I have had difficulty sleeping,” “I have blamed myself unnecessarily when things went wrong,” and “I have looked forward with enjoyment to things.” Rasch analysis of the EPDS suggested that a revised eight-item version (EPDS-8) might provide a more psychometrically robust scale.83 Recent mandated screening programs in Australia and the United States have recommended routine administration of the EPDS, although the National Institute for Health and Clinical Excellence (NICE) guidance in the United Kingdom does not.
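The factor structure reported above can be turned into sub-scores alongside the usual 0 to 30 total. The following minimal Python sketch uses exactly the item groupings given in the text. It assumes each item has already been scored 0 to 3 according to the printed form. These sub-scores are descriptive only; no validated subscale cutoffs are implied.

```python
# Item groupings (1-based item numbers) as reported in the factor analysis above.
EPDS_FACTORS = {
    "anhedonia": (1, 2),
    "anxiety": (3, 4, 5, 6, 7),
    "depression": (8, 9, 10),
}


def epds_scores(items):
    """EPDS total and factor sub-scores from ten item scores already keyed 0-3."""
    if len(items) != 10 or any(s not in (0, 1, 2, 3) for s in items):
        raise ValueError("expected ten EPDS item scores in 0-3")
    scores = {"total": sum(items)}  # range 0-30
    for factor, item_numbers in EPDS_FACTORS.items():
        # Convert 1-based item numbers to 0-based list indices.
        scores[factor] = sum(items[n - 1] for n in item_numbers)
    return scores
```

Because the three groupings cover all ten items exactly once, the sub-scores always add up to the total, which is a useful sanity check when adapting the mapping.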

MOS 8-Item Depression Screener (Burnam Screen)84

This short tool was developed for use in the National Study of Medical Care Outcomes (MOS).85 It was essentially an adaptation of the CES-D, although two items related to the duration of symptoms (required for DSM diagnosis/caseness) were drawn from the DIS. The tool has only eight items, although items 7 and 8 are rather unwieldy single questions:
1. I felt depressed.
2. My sleep was restless.
3. I enjoyed life.
4. I had crying spells.
5. I felt sad.
6. I felt that people disliked me.
7. In the past year, have you had 2 weeks or more that you felt sad, blue, depressed, or lost pleasure in things that you usually cared about or enjoyed?
8. Have you had 2 years or more in your life when you felt depressed or sad most days, even if you felt okay sometimes? (If yes:) Have you felt depressed or sad much of the time in the past year?
Validation data were provided by two samples: 3,132 adults in the Los Angeles sample of the Epidemiological Catchment Area (ECA) study and 525 adults from the Psychiatric Screening Questionnaire for Primary Care Patients (PSP) study. One limitation is that the suggested scoring algorithm is complex. Additionally, in comparison with the NIMH’s Structured Clinical Interview for DSM-IV, the screen had low positive predictive value (Tuunainen et al., 2001).86

The Patient Health Questionnaire (PHQ)87

The PHQ is the self-administered version of the Primary Care Evaluation of Mental Disorders (PRIME-MD) instrument, which was designed to diagnose specific disorders in primary care settings using DSM criteria.88 The full PRIME-MD has two components: a 1-page patient questionnaire (PQ) and a 12-page clinician evaluation guide (CEG). The PQ, which is completed by the patient before seeing the primary care physician (PCP), consists of 26 yes/no questions about symptoms present during the past month. The focus is on a depressive episode (the SCID focuses on depressive disorder).
The depression module comprises nine questions (PHQ-9). The first two questions (known as the PHQ-2), which refer to the “cardinal” symptoms of anhedonia and depressed mood, can be administered separately as a screening tool. Each item rates the proportion of time a symptom has been present, from “0” (not at all) to “3” (nearly every day). When the items are summed linearly (total 0 to 27), a cutoff of 10 is commonly suggested (see Table 2.1 for conventional severity bands). Alternatively, individual items can be combined according to a DSM-IV algorithm to generate a diagnosis of major or minor depression. The DSM-IV exclusion criteria for a depressive disorder are not included in the PHQ-9; therefore, the PHQ-9 diagnosis closely approximates but is not identical to a DSM-IV diagnosis. The PHQ-9 was validated in 6,000 patients in eight primary care clinics and seven obstetrics-gynecology clinics.89
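The two ways of using the PHQ-9 described above, a linear sum against a cutoff and a DSM-IV-style symptom-count algorithm, can be contrasted in a short Python sketch. The algorithm details below are an assumption based on how the PHQ algorithm is commonly described, not a quotation from this chapter. A symptom counts if rated 2 or more (“more than half the days”), except item 9, which counts if rated 1 or more. Item 1 (interest) or item 2 (mood) must be among the counted symptoms. The official PHQ scoring instructions should be treated as authoritative.

```python
def phq9_total(items):
    """Linear PHQ-9 score: nine items, each 0 (not at all) to 3 (nearly every day)."""
    if len(items) != 9 or any(s not in (0, 1, 2, 3) for s in items):
        raise ValueError("expected nine PHQ-9 item scores in 0-3")
    return sum(items)  # range 0-27


def phq9_screen_positive(items, cutoff=10):
    """Linear scoring against the cutoff of 10 mentioned in the text."""
    return phq9_total(items) >= cutoff


def phq9_algorithm(items):
    """Approximate DSM-IV classification from the nine PHQ-9 items.

    ASSUMPTION: mirrors the algorithm as commonly described for the PHQ.
    A symptom counts if rated >= 2 ("more than half the days"), except
    item 9 (thoughts of self-harm), which counts if rated >= 1. Item 1
    (interest) or item 2 (mood) must be among the counted symptoms.
    DSM-IV exclusion criteria are not applied.
    """
    phq9_total(items)  # validates the input
    counted = [s >= 2 for s in items[:8]] + [items[8] >= 1]
    has_cardinal = counted[0] or counted[1]
    n = sum(counted)
    if has_cardinal and n >= 5:
        return "major depressive syndrome"
    if has_cardinal and n >= 2:
        return "other depressive syndrome"  # roughly, minor depression
    return "no depressive syndrome"
```

The two approaches can disagree: a patient scoring 3 on six somatic items but 0 on both cardinal items has a linear total of 18 yet no depressive syndrome by the algorithm.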
The short version of the PHQ is almost as well known as the long version. The PHQ-2 is a two-item screen that uses the first two PHQ items, which ask about the frequency of loss of interest (question 1) and depressed mood (question 2) over the past 2 weeks, scoring each from 0 (“not at all”) to 3 (“nearly every day”). A score of three points or more on this version of the PHQ-2 is sometimes recommended as the cutoff.81 An even simpler version calls for “yes” or “no” responses, with a “yes” response to either question constituting a positive screen. The questions are as follows: Over the past month, have you often had little interest or pleasure in doing things? (Yes/No) Over the past month, have you often been bothered by feeling down, depressed, or hopeless? (Yes/No). Two-stage screening with the PHQ-2 followed by the PHQ-9 has been investigated and is probably more efficient than either test alone. However, when the tests are given by pen and paper, the time taken to check whether the PHQ-2 is positive may limit the efficiency saving.
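The two-stage approach described above saves effort because the remaining seven PHQ-9 items are asked only when the PHQ-2 is positive. The following minimal Python sketch makes that explicit. It uses the PHQ-2 cutoff of 3 and the PHQ-9 cutoff of 10 mentioned in the text. The `ask_remaining_items` callback is a hypothetical stand-in for however the second stage is actually administered (for example, a follow-up screen in a computerized questionnaire).

```python
from typing import Callable, Sequence


def phq2_score(interest: int, mood: int) -> int:
    """PHQ-2: item 1 (interest) plus item 2 (mood), each 0-3, so range 0-6."""
    if interest not in (0, 1, 2, 3) or mood not in (0, 1, 2, 3):
        raise ValueError("PHQ-2 items are scored 0-3")
    return interest + mood


def two_stage_screen(
    interest: int,
    mood: int,
    ask_remaining_items: Callable[[], Sequence[int]],
    phq2_cutoff: int = 3,
    phq9_cutoff: int = 10,
) -> dict:
    """Stage 1 is the PHQ-2; stage 2 (the other seven PHQ-9 items) runs only if stage 1 is positive."""
    stage1 = phq2_score(interest, mood)
    if stage1 < phq2_cutoff:
        # Negative first stage: the remaining items are never asked.
        return {"phq2": stage1, "phq9": None, "positive": False}
    remaining = list(ask_remaining_items())  # items 3-9, asked only now
    if len(remaining) != 7 or any(s not in (0, 1, 2, 3) for s in remaining):
        raise ValueError("expected seven further PHQ-9 item scores in 0-3")
    total = stage1 + sum(remaining)
    return {"phq2": stage1, "phq9": total, "positive": total >= phq9_cutoff}
```

Passing the second stage in as a callable, rather than as a list, models the point made in the text: the saving comes only if the PHQ-2 result is known before the remaining items are administered.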

Major Depression Inventory (MDI)90

This self-rated questionnaire aims to help make a diagnosis of major depression according to either the DSM-IV or the ICD-10 criteria.91 It covers the previous 2 weeks and requires 5 to 10 minutes. An answer of “more than half of the time” to at least 5 of the 10 questions is indicative of major depression. It has 10 questions, although items 8 and 10 each have two subitems, a and b; it can therefore be considered a 12-item scale. Every item is rated on the same scale from 0 (at no time) to 5 (all of the time), giving a total score from 0 to 50. A score of 4 or more on an item (ie, most of the time) qualifies that item for the ICD-10 or DSM-IV algorithm. The ICD-10 algorithm requires a score of 4 or 5 on two of the three top items and on at least four of the remaining items. The DSM-IV algorithm requires a score of 4 or 5 on five of the nine items (item 4 being excluded), but at least one of these five items must be either depressed mood or loss of interest.
Few validation studies or translations of the MDI exist.92 A comparative study of the SDS and MDI in 89 patients with Parkinson’s disease suggested that the MDI is superior to the SDS.93 The largest study administered the MDI to 1,093 persons who were also interviewed by psychiatrists using SCAN. With major depression according to SCAN as the index of validity, the specificity of the MDI was 0.22, the sensitivity 0.67, and kappa 0.25; with all depressive disorders as the index, the specificity was 0.44, the sensitivity 0.51, and kappa 0.33. More highly educated persons and those with reported disability were less likely to be false negatives.94
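The MDI algorithms described above can be written out directly, which also makes their differences easy to see. The following minimal Python sketch follows the rules as stated in the text, with two labeled assumptions. First, items 8 and 10 take the higher of their a/b subitems, which keeps the total within 0 to 50. Second, items 1 and 2 are taken to be depressed mood and loss of interest. Function names are illustrative.

```python
def mdi_item_scores(responses):
    """Collapse 12 responses (items 1-7, 8a, 8b, 9, 10a, 10b), each 0-5, to ten item scores.

    ASSUMPTION: items 8 and 10 take the higher of their a/b subitems,
    which keeps the total within the 0-50 range described above.
    """
    if len(responses) != 12 or any(not 0 <= r <= 5 for r in responses):
        raise ValueError("expected 12 responses scored 0-5")
    r = list(responses)
    return r[:7] + [max(r[7], r[8]), r[9], max(r[10], r[11])]


def mdi_total(items):
    return sum(items)  # range 0-50


def mdi_dsm_iv(items, threshold=4):
    """At least five of the nine items other than item 4 scored 4 or 5,
    including item 1 or item 2 (ASSUMED to be depressed mood and loss of interest)."""
    hits = {n for n, s in enumerate(items, start=1) if s >= threshold and n != 4}
    return len(hits) >= 5 and bool(hits & {1, 2})


def mdi_icd10(items, threshold=4):
    """Two of the three top items (1-3) and at least four of items 4-10 scored 4 or 5."""
    top = sum(s >= threshold for s in items[:3])
    rest = sum(s >= threshold for s in items[3:])
    return top >= 2 and rest >= 4
```

A respondent can meet the DSM-IV rule without meeting the ICD-10 rule, and vice versa, because the ICD-10 rule counts item 4 and the DSM-IV rule does not.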

4. The Future of Screening Scales

The ideal scale is one that is very brief, highly acceptable, and very accurate
when tested against an accepted reference standard. It may also be an
advantage if it obeys current conventional diagnostic rules from ICD or
DSM and is freely available but long enough to gauge severity and measure
change. It is unclear whether one scale can fulfill all these purposes, but
there is a trend to develop ever-shorter scales that attempt to retain high
accuracy. Every scale involves a trade-off between acceptability and
accuracy.

Improving Acceptability
Following on from the originals, ever-shorter versions of every major scale have been released, usually comprising 10 items or fewer (Textbox 2.5). A good example is the 8-item Even Briefer Assessment Scale for Depression (EBAS DEP), derived from the 21-item Brief Assessment Scale.95 Of course, eight items might not be short enough for many settings, and in the extreme case single-item methods (applied by pen and paper, verbally, or in visual analog form) have been evaluated. The first “ultra-short” scales began to appear in the 1970s with early visual analog methods of rating mood.96
Table 2.1. Conventional Cutoff Scores for Different Severities of Depression

Scale | Abbreviation | No depression (asymptomatic and subsyndromal) | Mild | Moderate | Severe depression
Hamilton Depression Scale | HAM-D | 0 to 7 | 8 to 13 | 14 to 18 | 19 to 63
Beck Depression Inventory | BDI | 0 to 9 | 10 to 16 | 17 to 29 | 30 to 63
Beck Depression Inventory II | BDI-II | 0 to 13 | 14 to 19 | 20 to 28 | 29 to 63
Geriatric Depression Scale (original) | GDS-30 | 0 to 9 | 10 to 19 | 20 to 30 | 20 to 30
Zung Self-Rating Depression Scale | SDS | 0 to 49 | 50 to 59 | 60 to 69 | 70 to 80
Hospital Anxiety and Depression Scale | HADS-D | 0 to 7 | 8 to 10 | 11 to 14 | 15 to 21
Montgomery–Åsberg Depression Rating Scale | MADRS | 0 to 6 | 7 to 19 | 20 to 34 | 35 to 60
Center for Epidemiologic Studies Depression Scale | CES-D | 0 to 15 | 16 to 20 | 21 to 26 | 27 to 60
Edinburgh Postnatal Depression Scale | EPDS | 0 to 9 | 9 to 12 | 13 to 30 | 13 to 30
Patient Health Questionnaire | PHQ-9 | 0 to 5 | 6 to 9 | 10 to 19 | 20 to 27
Patient Health Questionnaire (remapped to DSM-IV) | PHQ-9 | 0 to 9 | 10 to 16 | 17 to 22 | 23 to 27
Major Depression Inventory | MDI | 0 to 13 | 14 to 19 | 20 to 26 | 27 to 50

Table 2.2. Summary of Scale Properties

Year | Scale | Abbreviation | Original items | Max score | Rater | Copyright | Duration | Time frame | Cites per year | Suicidality included? | Somatic bias (rank, most to least)
1960 | Hamilton Depression Scale | HAM-D | 21 | 63 | Clinician | Public domain | 15 min | Past week | 237 | Yes | #1
1961 | Beck Depression Inventory | BDI | 21 | 63 | Patient | Harcourt Assessment | 10 min | Past few days (BDI); last 2 weeks (BDI-II) | 225 | Yes | #6
1965 | Zung Self-Rating Depression Scale | SDS | 20 | 80 | Patient | Public domain | 5–8 min | Past several days | 84 | Yes | #5
1977 | Center for Epidemiologic Studies Depression Scale | CES-D | 20 | 60 | Patient | Public domain | 4–5 min | Past week | 256 | No | #7
1979 | Montgomery–Åsberg Depression Rating Scale | MADRS | 10 | 60 | Observer | Copyright | 10 min | Current | 107 | Yes | #4
1982 | Geriatric Depression Scale (original) | GDS-30 | 30 | 30 | Patient | Public domain | 10 min | Past week | 94 | No | #10
1983 | Hospital Anxiety and Depression Scale | HADS | 14 | 42 | Patient | NFER-Nelson | 5 min | Past week | 195 | No | #6
1986 | Geriatric Depression Scale (modified) | GDS-15 | 15 | 15 | Patient | Public domain | 5 min | Past week | 31 | No | #8
1987 | Edinburgh Postnatal Depression Scale | EPDS | 10 | 30 | Patient | Copyright | 1–2 min | Past week | 50 | No | #11
1988 | MOS-8 Burnam Screen | MOS-8 | 8 | 20 | Patient | RAND Corporation | 2–5 min | 2 weeks and 2 years | 12 | No | #9
2001 | Patient Health Questionnaire | PHQ | 9 | 27 | Patient | Public domain | 2–4 min | 2 weeks | 53 | Yes | #2
2001 | Major Depression Inventory | MDI | 10 | 60 | Patient | Elsevier | 3–5 min | 2 weeks | 7 | Yes | #3

Just how good are these short and ultra-short scales?97 Whooley and colleagues (1997)98 compared the CES-D (20- and 10-item versions), BDI (21- and 13-item versions), Symptom-Driven Diagnostic System for Primary Care (SDDS-PC), MOS-8, and PHQ-2 against the Quick Diagnostic Interview Schedule for major depression. Using summary statistics (Table 2.3), the optimal tests appear to be MOS-8 > CES-D20 > CES-D10 > BDI-21 > BDI-13 > SDDS-PC, with the least accurate method being the PHQ-2. Even the PHQ-2, however, was good at ruling out depression, with a high negative predictive value. These comparisons do not allow for test efficiency, that is, correcting for the length of the scale. Such weighting requires an economic evaluation, and such studies are in progress. This finding has since been extended, showing that even single-item mood scales can be valuable, albeit as a form of rule out (reassurance) for those who answer negatively.

Textbox 2.5. Short Versions of Rating Scales (10 items or fewer)

Ten Items: EPDS-10 (original), SDS-10, CES-D 10, DEPS-10, MADRS-10 (original)
Nine Items: PHQ9, HDI-Short Form
Eight Items: MOS-8, EPDS-8, PHQ-8, EBAS-Dep
Seven Items: HADS-Depression, HADS-Anxiety, HAM-D-7, BDI-7, DADS-7, EPDS-7 (depression items)
Six Items: EPDS-6, HAM-D-6, CES-D-6
Five Items: EPDS-5, WHO-5, GDS-5, Emotion Thermometers
Four Items: GDS-4
Three Items: PHQ2 + help question, EPDS-3 (anxiety items)
Two Items: PHQ2, Whooley / NICE 2 Questions, BDI-2, EPDS-2
One Item: PHQ Q1, PHQ Q2, GDS-1, Distress Thermometer

Short methods improve acceptability, but other techniques may also improve uptake. A tool can be administered in the waiting room or by mail. Increasingly, questionnaires are computerized and can be given using a Palm Pilot or tablet or over the Internet (this is discussed further in Chapter 8). The format of a questionnaire can also be influential. For example, a single visual analog item takes no more time than a verbal item but can quantify a symptom. The seven-item version of the emotion thermometers tested in cancer and cardiovascular settings is shown in Appendix Figure 5.

Table 2.3. Accuracy of Various Depression Scales Head to Head

Questionnaire | Sensitivity | Specificity | PPV | NPV | PSI | Youden | FC (%) | AUC
PHQ2 | 0.96 | 0.57 | 0.33 | 0.98 | 0.31 | 0.53 | 63.99 | 0.82
SDDS-PC | 0.96 | 0.51 | 0.30 | 0.98 | 0.28 | 0.47 | 59.14 | 0.86
MOS-8 | 0.93 | 0.72 | 0.42 | 0.98 | 0.40 | 0.65 | 75.75 | 0.89
CESD20 | 0.93 | 0.69 | 0.40 | 0.98 | 0.38 | 0.62 | 73.32 | 0.89
CESD10 | 0.90 | 0.72 | 0.41 | 0.97 | 0.38 | 0.62 | 75.19 | 0.87
BDI21 | 0.89 | 0.64 | 0.35 | 0.96 | 0.31 | 0.53 | 68.47 | 0.87
BDI13 | 0.92 | 0.61 | 0.34 | 0.97 | 0.31 | 0.53 | 66.42 | 0.86

PSI, predictive summary index; PPV, positive predictive value; NPV, negative predictive value; Youden, Youden index; FC, fraction correct; AUC, area under the curve.
Data from Whooley MA, Avins AL, Miranda J, et al. Case-finding instruments for depression. J Gen Intern Med. 1997;12(7):439.
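The PSI and Youden columns of Table 2.3 are simple functions of the other columns: the Youden index is sensitivity + specificity - 1, and the PSI is PPV + NPV - 1. The short Python sketch below recomputes both from the reported values as a consistency check. The dictionary and function names are illustrative only and are not part of any published tool.

```python
"""Recompute the Youden index and predictive summary index (PSI) in Table 2.3."""

# (sensitivity, specificity, PPV, NPV) as reported in Table 2.3.
TABLE_2_3 = {
    "PHQ2": (0.96, 0.57, 0.33, 0.98),
    "SDDS-PC": (0.96, 0.51, 0.30, 0.98),
    "MOS-8": (0.93, 0.72, 0.42, 0.98),
    "CESD20": (0.93, 0.69, 0.40, 0.98),
    "CESD10": (0.90, 0.72, 0.41, 0.97),
    "BDI21": (0.89, 0.64, 0.35, 0.96),
    "BDI13": (0.92, 0.61, 0.34, 0.97),
}


def youden_index(sensitivity: float, specificity: float) -> float:
    """Sensitivity + specificity - 1 (0 = no better than chance, 1 = perfect)."""
    return sensitivity + specificity - 1.0


def predictive_summary_index(ppv: float, npv: float) -> float:
    """PPV + NPV - 1: the same idea applied to the predictive values."""
    return ppv + npv - 1.0


def summarize(table: dict) -> dict:
    """Map each scale to (Youden, PSI), rounded to two decimals as in the table."""
    return {
        name: (round(youden_index(se, sp), 2),
               round(predictive_summary_index(ppv, npv), 2))
        for name, (se, sp, ppv, npv) in table.items()
    }


if __name__ == "__main__":
    results = summarize(TABLE_2_3)
    # List scales from highest to lowest Youden index.
    for name, (j, psi) in sorted(results.items(), key=lambda item: item[1][0], reverse=True):
        print(f"{name:<8} Youden={j:.2f} PSI={psi:.2f}")
```

Run as a script, it reproduces every published Youden and PSI value and places MOS-8 first by Youden index (0.65). Fraction correct is not recomputed here because it also depends on the prevalence in the study sample.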

Improving Accuracy
Algorithmic Approaches
In clinical practice, prevalence is typically low (between 10% and 30%); a high negative predictive value is therefore relatively easy to achieve, but a high positive predictive value is difficult. For example, if a screening test with 80% sensitivity and 80% specificity were applied to 1,000 individuals with a 20% rate of depression, the positive predictive value would be 0.50 and the negative predictive value 0.94 (overall accuracy = 0.80 by fraction correct) (see Appendix Table Single 3). Given that only 50% of those with a positive result would actually have depression, what would happen if a second test were applied to those who scored positive, while the results of the first screen were accepted for those who scored negative? This is illustrated in Appendix Figure 3. From Appendix Table MultiStep 3, provided the second instrument's sensitivity and specificity of 80% hold in the filtered population, the positive predictive value rises to 0.67 at the cost of a small fall in the negative predictive value to 0.85 (overall accuracy = 0.83). In short, applying a second step to those who screen positive in step 1 favors specificity at the cost of sensitivity, with a gain in overall accuracy. This example of two tests with 80% sensitivity and specificity might be unrealistic in clinical practice; often different test performances are achievable in each step. A difficult question is whether it is better to apply the instrument with high sensitivity or the one with high specificity in step 1 or in step 2. The answer from Table AP.4 is that it is best to apply the most accurate instrument first where clinically possible (although in screening the reverse often occurs). If both instruments have the same combined value but different sensitivity and specificity values, the optimal yield can be calculated. The rule of thumb for a two-step approach in a low-prevalence setting is to avoid pairing two instruments that favor sensitivity, particularly if the second-step instrument has high sensitivity, as this may produce low overall yields. Practical applications of two-step approaches have recently been described.99,100
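The arithmetic behind the one-step and two-step examples can be checked with a short Python sketch. It assumes that the second instrument keeps its 80% sensitivity and specificity among step-1 positives, in other words that the two tests err independently once true depression status is known; the class and function names are illustrative. Under these assumptions the two-step approach gives a PPV of 0.80, an NPV of about 0.91, and overall accuracy of about 0.90. These differ from the Appendix figures quoted above (0.67, 0.85, and 0.83), which may rest on different assumptions about how the second test performs in the screen-positive group, but the direction of change is the same: PPV and overall accuracy rise, while NPV and sensitivity fall.

```python
"""Sketch: predictive values for one screening step versus a two-step (serial) screen.

Assumption (not from the text): the second test keeps its stated sensitivity
and specificity among step-1 positives, i.e. the two tests err independently
once true depression status is known.
"""
from dataclasses import dataclass


@dataclass
class Counts:
    tp: float  # depressed, screen positive
    fn: float  # depressed, screen negative
    fp: float  # not depressed, screen positive
    tn: float  # not depressed, screen negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def fraction_correct(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)


def single_step(n: float, prevalence: float, sens: float, spec: float) -> Counts:
    """Expected 2x2 counts when one test is applied to n people."""
    cases = n * prevalence
    non_cases = n - cases
    return Counts(tp=cases * sens, fn=cases * (1 - sens),
                  fp=non_cases * (1 - spec), tn=non_cases * spec)


def two_step(n: float, prevalence: float, sens1: float, spec1: float,
             sens2: float, spec2: float) -> Counts:
    """Re-test only step-1 positives; step-1 negatives keep their negative result."""
    first = single_step(n, prevalence, sens1, spec1)
    return Counts(
        tp=first.tp * sens2,                   # positive on both tests
        fn=first.fn + first.tp * (1 - sens2),  # missed at step 1 or at step 2
        fp=first.fp * (1 - spec2),             # false positive on both tests
        tn=first.tn + first.fp * spec2,        # cleared at step 1 or at step 2
    )


if __name__ == "__main__":
    one = single_step(1000, 0.20, 0.80, 0.80)
    two = two_step(1000, 0.20, 0.80, 0.80, 0.80, 0.80)
    for label, c in (("one step", one), ("two steps", two)):
        print(f"{label}: sens={c.sensitivity:.2f} spec={c.specificity:.2f} "
              f"PPV={c.ppv:.2f} NPV={c.npv:.2f} accuracy={c.fraction_correct:.2f}")
```

For the example in the text this prints sens=0.80 spec=0.80 PPV=0.50 NPV=0.94 accuracy=0.80 for one step and sens=0.64 spec=0.96 PPV=0.80 NPV=0.91 accuracy=0.90 for two steps. Note that under the independence assumption the serial rule's combined sensitivity (sens1 x sens2) and specificity (1 - (1 - spec1)(1 - spec2)) do not depend on which test comes first, so any advantage of ordering must come from other factors, such as cost or how each test performs in the filtered group.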

Weighting Specific Items

Future work is likely to re-examine the weighting of specific symptoms of depression in each setting. The current concept of depression is that certain essential core symptoms define the disorder and others contribute to severity.101–104 This may or may not hold true. A scientific understanding of optimal depression items has emerged only in the past 3 years. Zimmerman and colleagues have re-examined the traditional symptoms of depression to discover whether all the conventional symptoms listed in DSM-IV or ICD-10 contribute to a diagnosis of depression. The difficulty with this method is that there is no accepted gold standard (see Chapter 1). One way around this problem is simply to examine how well a count restricted to certain symptoms agrees with full DSM-IV (or ICD-10) criteria. Zimmerman and colleagues proposed combining two core and three psychological symptoms: depressed mood, lack of interest, worthlessness, poor concentration, and thoughts of death. Against full DSM-IV criteria, this abbreviated checklist had a sensitivity of 93.7%, specificity of 94.8%, positive predictive value of 95.5%, and negative predictive value of 91.6%. Andrews and associates (2007)105 replicated this finding using data from the 10,641 respondents to the Australian National Survey of Mental Health and Well-Being, assessed with the 12-month version of the Composite International Diagnostic Interview. In this study sensitivity was 92.9%, specificity 99%, positive predictive value 94%, and negative predictive value 99.7%. Another method is to start with short versions and add only items that prove useful. Brody and colleagues (1998)106 found that adding four follow-up questions on sleep disturbance, appetite, anhedonia, and self-esteem to the two-question PRIME-MD screen markedly improved specificity while maintaining sensitivity.
Future developments will also take into account aspects of depression not measured by symptom counts alone, for example, tools that measure duration, impact, function, and desire for professional help.

References
1. Wittkampf KA, van Zwieten M, Smits FT, et al. Patients’ view on screening for
depression in general practice. Fam Pract. 2008;25:438–444.
2. Jepson R, Clegg A, Forbes C, et al. The determinants of screening uptake and
interventions for increasing uptake: a systematic review. Health Technol Assess.
2000;4:14.
3. Grinker RR Sr, Miller J, Sabshin M, et al. The phenomena of depressions. New York:
Hoeber, 1961.
4. Nezu AM, Ronan GF, Meadows EA, et al. Practitioners’ guide to empirically based
measures of depression. Kluwer Academic/Plenum Publishers 2000.
5. Williams JW, Pignone M, Ramirez G, et al. Identifying depression in primary care:
a literature synthesis of case-finding instruments. Gen Hosp Psychiatry. 2002;24(4):
225–237.
6. Gilbody S, Sheldon T, House A. Screening and case-finding instruments for depression:
a meta-analysis. Can Med Assoc J. 2008;178:997–1003.
7. http://www.mentaltests.com/cms/mentaltests_list.
8. Parloff MB, Kelman HC, Frank JD. Comfort, effectiveness, and self-awareness as
criteria of improvement in psychotherapy. Am J Psychiatry. 1954;111:343–351.
9. Derogatis LR, Lipman RS, Covi L. SCL-90: An outpatient psychiatric rating scale,
preliminary report. Psychopharmacol Bull. 1973;9:13–28.
10. Fink P, Ornbol E, Hansen MS, et al. Detecting mental disorders in general hospitals by
the SCL-8 scale. J Psychosom Res. 2004;56(3):371–375.
11. Demyttenaere K, De Fruyt J. Getting what you ask for: On the selectivity of depression
rating scales. Psychother Psychosom. 2003;72(2):61–70.
12. Ruhe HG, Dekker JJ, Peen J, et al. Clinical use of the Hamilton Depression
Rating Scale: is increased efficiency possible? A post hoc comparison of
Hamilton Depression Rating Scale, Maier and Bech subscales, Clinical Global
Impression, and Symptom Checklist-90 scores. Comprehensive Psychiatry.
2005;46(6):417–427.
13. Leentjens AF, Lousberg R, Verhey FRJ. The psychometric properties of the Hospital
Anxiety and Depression Scale in patients with Parkinson’s disease. Acta
Neuropsychiatr. 2001;13:83–85.
14. Richter P, Werner J, Heerlein A, et al. On the validity of the Beck Depression
Inventory. A review. Psychopathology. 1998;31(3):160–168.
15. Shafer AB. Meta-analysis of the factor structures of four depression
questionnaires: Beck, CES-D, Hamilton, and Zung. J Clin Psychol.
2005;62(1):123–146.
16. Uher R, Farmer A, Maier W, et al. Measuring depression: comparison and integration
of three scales in the GENDEP study. Psychol Med. 2008;38(2):289–300.
17. Faravelli C, Servi P, Arends JA, et al. Number of symptoms, quantification, and
qualification of depression. Comprehensive Psychiatry. 1996;37(5):307–315.
18. Huttunen J, Taiminen T, Kähkönen J, et al. Depression Scale (DEPS) in schizophrenia.
Acta Psychiatr Scand. 1999;99(3):220–222.
19. Alexopoulos GS, Abrams RC, Young RC, et al. Cornell Scale for Depression in
Dementia. Biol Psychiatry. 1988;23(3):271–284.
20. Gainotti G, Azzoni A, Razzano C, et al. The Post-Stroke Depression Rating Scale: a
test specifically devised to investigate affective disorders of stroke patients. J Clin Exp
Neuropsychol. 1997;19(3):340–356.

21. Leeds L, Meara RJ, Hobson JP. The utility of the Stroke Aphasia Depression
Questionnaire (SADQ) in a stroke rehabilitation unit. Clin Rehab.
2004;18(2):228–231.
22. Benaim C, Cailly B, Perennou D, et al. Validation of the Aphasic Depression Rating
Scale. Stroke. 2004;35:1692.
23. Clements KM, Murphy JM, Eisen SV, et al. Comparison of self-report and clinician-
rated measures of psychiatric symptoms and functioning in predicting 1-year hospital
readmission. Administration And Policy In Mental Health And Mental Health Services
Research. 2006;33(5):568–577.
24. Moller HJ. Rating depressed patients: observer- vs self-assessment. Eur Psychiatry.
2000;15(3):160–172.
25. Rush AJ, Carmody TJ, Ibrahim HM, et al. Comparison of self-report and clinician
ratings on two inventories of depressive symptomatology. Psychiatr Serv.
2006;57(6):829–837.
26. Biggs JT, Wylie LT, Ziegler VE. Validity of the Zung Self-Rating Depression Scale.
Br J Psychiatry. 1978;132:381–385.
27. Faravelli C, Albanesi G, Poli E. Assessment of depression: a comparison of rating
scales. J Affect Disord. 1986;11:245–253.
28. Hunt M, Auriemma J, Cashaw ACA. Self-report bias and underreporting of depression
on the BDI-II. J Personality Assess. 2003;80(1):26–30.
29. Vaughan M, Krawiecka M. Sensitivity to change in symptoms of new scales for rating
chronic psychotic patients. Int Pharmacopsychiatry. 1979;14(3):121–126.
30. Maier W, Philipp M, Demuth W, et al. Reliability, validity, transferability and
sensitivity to change of 3 rival observer rating-scales for the severity of depression
(HAM-D, MADRS, BRMS). Int J Neurosci. 1986;31(1–4):288.
31. Bagby RM, Ryder AG, Schuller DR, et al. The Hamilton Depression Rating Scale: has
the gold standard become a lead weight? Am J Psychiatry. 2004;161:2163–2177.
32. Vermeersch DA, Whipple JL, Lambert MJ, et al. Outcome questionnaire: Is it
sensitive to changes in counselling center clients? J Counsel Psychol.
2004;51(1):38–49.
33. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatry.
1960;23:56–62.
34. http://healthnet.umassmed.edu/mhealth/HamD.pdf.
35. Zitman FG, Mennen MF, Griez E, et al. The different versions of the Hamilton
Depression Rating Scale. Psychopharmacology. 1990;9:28–34.
36. Bagby RM, Ryder AG, Schuller DR, et al. The Hamilton Depression Rating Scale: has
the gold standard become a lead weight? Am J Psychiatry. 2004;161:2163–2177.
37. Williams JB. A structured interview guide for the Hamilton Depression Rating Scale.
Arch Gen Psychiatry. 1988;45:742–747.
38. Khullar A, McIntyre RS. An approach to managing depression. Defining and
measuring outcomes. Can Fam Physician. 2004;50:1374–1380.
39. McIntyre RS, Konarski JZ, Mancini DA, et al. Measuring the severity of depression
and remission in primary care: validation of the HAMD-7 scale. Can Med Assoc J.
2005;173:1327–1334.
40. Bobes J, Bulbena A, Luque A, et al. The sufficiency of the HAM-D6 as an outcome
instrument in the acute therapy of antidepressants in the outpatient setting. Int J
Psychiatry Clin Practice. 2007;11(2):146–150.
41. Bech P, Gram LF, Dein E, et al. Quantitative rating of depressive states. Acta
Psychiatr Scand. 1975;51:161–170.

42. Bech P, Allerup P, Gram LF, et al. The Hamilton Depression Scale: evaluation of
objectivity using logistic models. Acta Psychiatr Scand. 1981;63:290–299.
43. Kalali A, Williams JBW, Kobak KA, et al. The new GRID HAM-D: pilot testing and
international field trials. Int J Neuropsychopharmacol. 2002;5:S147–S148.
44. Montgomery SA, Åsberg M. A new depression scale designed to be sensitive to
change. Br J Psychiatry. 1979;134:382–389.
45. http://www.neurotransmitter.net/depressionscales.html.
46. Asberg M, Montgomery SA, Perris C, et al. A comprehensive psychopathological
rating scale. Acta Psychiatr Scand Suppl. 1978;271:5–27.
47. Carroll BJ, Wilson WH. HAM-D and MADRS as depression change measures. In:
New Clinical Drug Evaluation Unit (NCDEU) Program Abstracts, 40th Annual
Meeting, 2000. Rockville, MD: National Institute of Mental Health, poster number 9.
48. Beck AT, Ward CH, Mock J, et al. An inventory for measuring depression. Arch Gen
Psychiatry. 1961;4:561–571.
49. http://harcourtassessment.com/haiweb/cultures/en-us/productdetail.htm?pid=015-8018-370.
50. Storch EA, Roberti JW, Roth DA. Factor structure, concurrent validity, and internal
consistency of the Beck Depression Inventory-Second Edition in a sample of college
students. Depression Anxiety. 2001;19(3):187–189.
51. Beck AT, Steer RA, Brown GK. BDI-II fast screen for medical patients manual.
London: The Psychological Corporation, 2000.
52. Bouman TK, Kok AR. Homogeneity of Beck’s Depression Inventory (BDI):
Applying Rasch analysis in conceptual exploration. Acta Psychiatr Scand.
1987;76(5):568–573.
53. Uher R, Farmer A, Maier W, et al. Measuring depression: comparison and integration
of three scales in the GENDEP study. Psychol Med. 2008;38:289–300.
54. Zung WW. A self-rating depression scale. Arch Gen Psychiatry. 1965;12:63–70.
55. http://healthnet.umassmed.edu/mhealth/ZungSelfRatedDepressionScale.pdf.
56. Lambert MJ, Hatch DR, Kingston MD, et al. Zung, Beck, and Hamilton Rating Scales
as measures of treatment outcome: a meta-analytic comparison. J Consulting Clin
Psychol. 1986;54(1):54–59.
57. Passik SD, Lundberg JC, Rosenfeld B, et al. Factor analysis of the Zung Self-Rating
Depression Scale in a large ambulatory oncology sample. Psychosomatics.
2000;41:121–127.
58. Hong S, Min SY. Mixed Rasch modeling of the Self-Rating Depression Scale
incorporating latent class and Rasch rating scale models. Educational and
Psychological Measurement. 2007;67(2):280–299.
59. Hulstijn EM, Deelman BG, de Graaf A, et al. The Zung-12: a questionnaire
for depression in the elderly. Tijdschr Gerontol Geriatr (Netherlands).
1992;23:85–93.
60. Dugan W, McDonald MV, Passik SD, et al. Use of the Zung Self-Rating Depression
Scale in cancer patients: feasibility as a screening tool. Psychooncology.
1998;7(6):483–493.
61. Tucker MA, Ogle SJ, Davison JG, et al. Validation of a brief screening test for
depression in the elderly. Age Ageing. 1987;16(3):139–144.
62. Radloff LS. The CES-D scale: a self-report depression scale for research in the general
population. Appl Psychol Meas. 1977;1:385–401.
63. http://www.mdlogix.com/cesdr.htm.

64. Markush RE, Favero RV. Epidemiologic assessment of stressful life events, depressed
mood, and psychophysiological symptoms: A preliminary report. In Dohrenwend BS,
Dohrenwend BP, eds. Stressful life events: their nature and effects. New York: Wiley,
1974:171–190.
65. Furukawa T, Anraku K, Hiroe T, et al. Screening for depression among first-visit
psychiatric patients: Comparison of different scoring methods for the Center for
Epidemiologic Studies Depression Scale using receiver operating characteristic
analyses. Psychiatry Clin Neurosci. 1997;51:71–78.
66. Cole JC, Rabin AS, Smith TL, et al. Development and validation of a Rasch-derived
CES-D short form. Psychol Assess. 2004;16(4):360–372.
67. Chan KS, Orlando M, Ghosh-Dastidar B, et al. The interview mode effect on the
Center for Epidemiological Studies Depression (CES-D) scale: an item response
theory analysis. Med Care. 2004;42:281–289.
68. Zigmond AS, Snaith RP. The Hospital Anxiety and Depression Scale. Acta Psychiatr
Scand. 1983;67:361–370.
69. Bjelland I, Dahl AA, Tangen Haug T, et al. The validity of the Hospital Anxiety
and Depression Scale. An updated literature review. J Psychosom Res. 2002;52:69–77.
70. Sharpe M, Strong V, Allen K, et al. Major depression in outpatients attending a
regional cancer centre: screening and unmet treatment needs. Br J Cancer.
2004;90:314–320.
71. Martin CR, Thompson DR, Barth J. Factor structure of the Hospital Anxiety and
Depression Scale in coronary heart disease patients in three countries. J Eval Clin
Pract. 2008;14(2):281–287.
72. Thompson C, Ostler K, Peveler RC, et al. Dimensional perspective on the recognition of
depressive symptoms in primary care: The Hampshire Depression Project 3. Br J
Psychiatry. 2001;179:317–323.
73. Crawford JR, Henry JD, Crombie C, et al. Normative data for the HADS from a large
non-clinical sample. Br J Clin Psychol. 2001;40:429–434.
74. Yesavage JA, Brink TL, Rose TL, et al. Development and validation of a
geriatric depression screening scale: a preliminary report. J Psychiatr Res.
1983;17:37–49.
75. www.stanford.edu/~yesavage/GDS.html.
76. Tang WK, Wong E, Chiu HFK. The Geriatric Depression Scale should be shortened:
results of Rasch analysis. Int J Geriatr Psychiatry. 2005;20(8):783–789.
77. Marc LG, Raue PJ, Bruce ML. Screening performance of the 15-item Geriatric
Depression Scale in a diverse elderly home care population. Am J Geriatr
Psychiatry. 2008;16(11):914–921.
78. Cox JL, Holden JM, Sagovsky R. Detection of postnatal depression: development
of the 10-item Edinburgh Postnatal Depression Scale. Br J Psychiatry.
1987;150:782–786.
79. Wancata J, Alexandrowicz R, Marquart B, et al. The criterion validity of the Geriatric
Depression Scale: a systematic review. Acta Psychiatr Scand. 2006;114(6):398–410.
80. www.aap.org/practicingsafety/Toolkit_Resources/Module2/EPDS.pdf.
81. Cox J, Holden J. Perinatal mental health—A guide to the EPDS. RCPsych
Publications, 2003.
82. Chabrol H, Teissedre F. Relation between the Edinburgh Postnatal Depression Scale
scores at 2–3 days and 4–6 weeks postpartum. J Reprod Infant Psychol. 2004;22:33–39.

83. Hesbacher PT, Rickels K, Morris RJ, et al. Psychiatric illness in family practice. J Clin
Psychiatry. 1980;41:6–10.
84. Burnam MA, Wells KB, Leake B, et al. Development of a brief screening instrument
for detecting depressive disorders. Med Care. 1988;26:775–789.
85. Pallant JF, Miller RL, Tennant A. Evaluation of the Edinburgh Post Natal Depression
Scale using Rasch analysis. BMC Psychiatry. 2006;6:28.
86. www.patient.co.uk/showdoc/40025272/.
87. Spitzer RL, Williams JB, Kroenke K, et al. Utility of a new procedure for diagnosing
mental disorders in primary care. The PRIME-MD 1000 study. JAMA.
1994;272:1749–1756.
88. Tuunainen A, Langer RD, Klauber MR, Kripke DF. Short version of the CES-D
Burnam screen for depression in reference to the structured psychiatric interview.
Psychiatry Res. 2001;103:261–270.
89. Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: validity of a brief depression
severity measure. J Gen Intern Med. 2001;16(9):606–613.
90. Kroenke K, Spitzer RL, Williams JB. The Patient Health Questionnaire-2: validity of a
two-item depression screener. Med Care. 2003;41:1284–1292.
91. http://www.gp-training.net/protocol/psychiatry/who/mdi.doc.
92. Fountoulakis KN, Iacovides A, Kleanthous S, et al. Reliability, validity and
psychometric properties of the Greek translation of the Major Depression Inventory.
BMC Psychiatry 2003;3:2.
93. Bech P, Wermuth L. Applicability and validity of the MDI in patients with Parkinson’s
Disease. Nord J Psychiatry. 1998;52:305–309.
94. Forsell Y. The Major Depression Inventory versus schedules for clinical assessment in
neuropsychiatry in a population sample. Soc Psychiatry Psychiatric Epi.
2005;40(3):209–213.
95. Weyerer S, Killmann U, Ames D, et al. The Even Briefer Assessment Scale for
Depression (EBAS DEP): its suitability for the elderly in geriatric care in
English- and German-speaking countries. Int J Geriatr Psychiatry. 1999;14(6):
473–480.
96. Folstein M. Reliability, validity, and clinical application of visual analog mood scale.
Psychol Med. 1973;3:479.
97. Blank K, Gruman C, Robison JT. Case-finding for depression in elderly people:
balancing ease of administration. J Gerontol A Biol Sci Med Sci. 2004;59:M378–M384.
98. Whooley MA, Avins AL, Miranda J, et al. Case-finding instruments for depression.
J Gen Intern Med. 1997;12(7):439.
99. Thombs BD, Ziegelstein RC, Whooley MA. Optimizing detection of major
depression among patients with coronary artery disease using the Patient Health
Questionnaire: Data from the Heart and Soul Study. J Gen Intern Med. 23(12):
2014–2017.
100. Bech P, Rasmussen N, Olsen R, et al. The sensitivity and specificity of the MDI using
the Present State Examination as the index of diagnostic validity. J Affect Disord.
2001;66:159–164.
101. Mitchell AJ, Baker-Glenn EA, Park B, et al. Can the distress thermometer be improved
by additional mood domains? Part II: What is the Optimal Combination of
Thermometers? Psychooncology. 2009 [e-pub March 18].
102. Evans KR, Sills T, DeBrota DJ, et al. An item response analysis of the Hamilton
Depression Rating Scale using shared data from two pharmaceutical companies.
J Psychiat Res. 2004;38:275–284.

103. Maier W, Philipp M. Improving the assessment of severity of depressive states: a
reduction of the Hamilton Depression Scale. Pharmacopsychiatry. 1985;18:114–115.
104. Gibbons RD, Clark D, Kupfer DJ. Exactly what does the Hamilton Depression Rating
Scale measure? J Psychiat Res. 1993;27:259–273.
105. Andrews G, Slade T, Sunderland M, et al. Issues for DSM-V: Simplifying DSM-IV to
enhance utility: the case of major depressive disorder. Am J Psychiatry. 2007;164:
1784–1785.
106. Brody DS, Hahn SR, Spitzer RL, et al. Identifying patients with depression in the
primary care setting: a more efficient method. Arch Intern Med. 1998;158:2469–2475.
3
WHY DO CLINICIANS HAVE DIFFICULTY DETECTING DEPRESSION?

Alex J. Mitchell

1. Introduction to the Problem of Over- and Under-Detection
2. Predictors of Detection
3. Patient and Clinician Influences on Detection
4. Illness-Related Influences on Detection
5. Conclusions

Context
Hundreds of studies reveal that most cases of depression remain undetected and untreated. Yet there is growing concern that efforts to increase detection of depression lead to unacceptable numbers of people who are not depressed nonetheless being given a diagnosis and receiving medication. What factors underlie false-positive and false-negative errors? How might clinicians and services address these detection errors?

1. Introduction to the Problem of Over- and Under-Detection

Only about half of primary care practitioners (PCPs) feel confident in diagnosing depression or assessing suicide risk.1–6 Yet the issue of underdetection is by no means confined to PCPs7–13 or to depression.14,15 Convincing data show that clinicians in all medical specialties have difficulty recognizing mental disorders, including depression, anxiety, delirium, and dementia.16,17 Less discussed in the literature, but increasingly recognized as important, is the issue of overdetection. In this chapter I will review the predictors of diagnostic errors (false positives and false negatives) with reference to depression in primary care. I will focus on two essential barriers to correct identification: communication and illness complexity.
To meaningfully discuss errors in recognition, it is important to first
establish baseline rates of depression. Prevalence exerts a powerful influence
upon detection accuracy, not least because clinicians usually have a higher
index of suspicion for high-risk patients. The World Health Organization
(WHO) study on Psychological Problems in General Health Care (PPGHC),
conducted across 14 countries, found that 26% of individuals visiting their
PCP had at least one psychiatric disorder as defined by ICD-10 criteria.18
Fourteen percent had major depression. Almost identical rates were reported
from the European Study of the Epidemiology of Mental Disorders
(ESEMeD).19,20 If one examines depression in older people, the point pre-
valence of major depression is lower in rural than urban primary care
practices (8.3% versus 14.8%).21 Further, if one combines a 14% rate of
major depression with 10% who have minor depression, then the combined
rate approaches 25%.22

How Many Cases of Depression Are Detected in Routine Care?

Approximately 100 studies of the unassisted recognition rate of depression in primary care have been published, but only a third have used a robust semi-structured interview as a gold standard.23 Of these, at least 10 had samples of more than 1,000, and 17 examined clinicians' ability both to rule in and to rule out a diagnosis (see Table 3.1). Across these studies, PCPs' pooled sensitivity is 48% and specificity 70%. At a prevalence of 16%, the positive predictive value (PPV) is 21.4% and the negative predictive value (NPV) is 87.4%. In a low-risk sample where the prevalence is 10%, the PPV becomes 14% and the NPV 92%. This is best illustrated in a Bayesian plot of conditional probabilities (Fig. 3.1).
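A Bayesian plot such as Figure 3.1 shows, for each pre-test probability (prevalence), the probability of depression after the clinician judges a patient depressed (the rule-in curve, equal to the PPV) or not depressed (the rule-out curve, equal to 1 - NPV); the baseline is simply the line where post-test equals pre-test probability. The minimal Python sketch below applies Bayes' theorem with the pooled estimates as defaults; the function and parameter names are illustrative. Taken at face value, 48% sensitivity and 70% specificity give a PPV of about 23% and an NPV of about 88% at 16% prevalence, slightly different from the 21.4% and 87.4% quoted above; the counts shown in Figure 3.2 imply a specificity of about 67% rather than 70% (for example, 56.7 of about 84 non-depressed patients correctly reassured).

```python
"""Post-test probabilities behind a Bayesian plot such as Figure 3.1."""


def post_test_probabilities(pretest: float, sensitivity: float = 0.48,
                            specificity: float = 0.70) -> tuple[float, float]:
    """Return (P(depressed | judged depressed), P(depressed | judged not depressed)).

    The first value is the PPV (rule-in curve); the second is 1 - NPV (rule-out curve).
    """
    true_pos = pretest * sensitivity
    false_pos = (1 - pretest) * (1 - specificity)
    false_neg = pretest * (1 - sensitivity)
    true_neg = (1 - pretest) * specificity
    return true_pos / (true_pos + false_pos), false_neg / (false_neg + true_neg)


if __name__ == "__main__":
    for prevalence in (0.10, 0.16, 0.25):
        rule_in, rule_out = post_test_probabilities(prevalence)
        print(f"pre-test {prevalence:.2f}: PPV={rule_in:.3f} NPV={1 - rule_out:.3f}")
```

Evaluating the function over a grid of pre-test probabilities from 0 to 1 traces both curves; passing a different `specificity` shows how sensitive the PPV is to small changes in that estimate when prevalence is low.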
Expressed as numbers of patients, at a prevalence of 16% an average PCP would correctly identify 8 of 16 depressed cases, missing the other 8. He or she would correctly reassure 57 of 84 non-cases but falsely diagnose 27 people as depressed (Fig. 3.2). Thus, the number of correctly identified people per 100 screened would be 64 (the number needed to screen would be 3.5 to correctly identify one true case or non-case). Of every five people thought to be depressed, only one would be a true case (PPV = 21.4%). Of every 10 people thought to be well, approximately 9 would be correctly reassured (NPV = 87.4%).
In a low-risk sample (such as a rural practice) where the prevalence is 10%, an average PCP would correctly identify 5 of 10 cases, missing the other 5, and would correctly reassure 60 of 90 non-cases, falsely diagnosing 30 people as depressed. In a high-risk sample (such as patients with known physical disease), at a prevalence of 25%, Bayesian analysis suggests that an average PCP would correctly identify 12 of 25 cases, missing 13, and would correctly reassure 50 of 75 non-cases, falsely diagnosing 25 people as depressed.
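The per-100-patient counts in Figure 3.2 can be turned back into the summary figures quoted in the text. The Python sketch below does this for each prevalence. The text does not define the number needed to screen; the sketch assumes it is the reciprocal of the difference between the proportions of patients correctly and incorrectly identified, an assumption that reproduces the quoted value of 3.5 at 16% prevalence. The names used are illustrative.

```python
"""Summarize the per-100-patient counts shown in Figure 3.2."""

# prevalence -> (correctly diagnosed, false negatives, correctly reassured, false positives)
FIGURE_3_2 = {
    0.10: (4.8, 5.2, 60.5, 29.5),
    0.16: (7.5, 8.1, 56.7, 27.6),
    0.25: (12.0, 13.0, 50.4, 24.6),
}


def identification_summary(tp: float, fn: float, tn: float, fp: float) -> dict:
    """PPV, NPV, fraction correct and (assumed) number needed to screen."""
    total = tp + fn + tn + fp
    correct = (tp + tn) / total
    incorrect = (fn + fp) / total
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "fraction_correct": correct,
        # Assumed definition: 1 / (proportion correct - proportion incorrect).
        "number_needed_to_screen": 1 / (correct - incorrect),
    }


if __name__ == "__main__":
    for prevalence, counts in FIGURE_3_2.items():
        s = identification_summary(*counts)
        print(f"prevalence {prevalence:.0%}: PPV={s['ppv']:.3f} NPV={s['npv']:.3f} "
              f"correct={s['fraction_correct']:.3f} "
              f"NNS={s['number_needed_to_screen']:.1f}")
```

At 16% prevalence this gives a PPV of about 0.214, an NPV of 0.875, and about 64% of patients correctly identified, with a number needed to screen of 3.5; the same calculation shows the number needed to screen rising to about 4.0 at 25% prevalence.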

[Figure: post-test probability (y-axis, 0 to 1) plotted against pre-test probability (x-axis, 0 to 1) for an unassisted attempt to rule in depression and an unassisted attempt to rule out depression, with the baseline probability shown for comparison.]

Figure 3.1. Bayesian plot of conditional pre-test/post-test probabilities.

Prevalence | False negatives (%) | Correctly diagnosed (%) | Correctly reassured (%) | False positives (%)
10% | 5.2 | 4.8 | 60.5 | 29.5
16% | 8.1 | 7.5 | 56.7 | 27.6
25% | 13.0 | 12.0 | 50.4 | 24.6
(False negatives and correctly diagnosed refer to depressed patients; correctly reassured and false positives refer to non-depressed patients.)

Figure 3.2. Rates of correct and incorrect identification per 100 selected cases in primary care.
Clinicians do less well with minor and mild depression, a problem shared by those using screening tools.24
Underrecognition translates into undertreatment, as recognized patients are more likely to be offered mental health interventions.25 Data from ESEMeD show that only 15.1% of those with an identified mood disorder and 23.2% of those with an anxiety disorder received either drug or psychological treatment.26 Maginn and colleagues (2004)27 found that PCPs recorded active management of a psychological problem in 37% of patients whom they rated as cases. Of these, 24% were prescribed psychoactive drug treatment, 5% were referred to psychiatric or psychological services, and 3% were offered both drug and psychological treatments. Surprisingly, only 5% were offered a follow-up appointment with their PCP. Wittchen and colleagues28 found somewhat more favorable rates of conversion to treatment in a large study of 20,421 primary care patients in Germany. After correctly identifying depression (according to the ICD-10 definition), doctors prescribed drug treatments in 60.8%, prescribed non-drug treatments in 24.9%, and referred the patient to a mental health specialist in 10%. The take-home message is that, in the large ESEMeD, PPGHC, and INSERM studies, the typical proportion of recognized patients offered treatment is approximately 20%.

Textbox 3.1. Case History: An Example of a Difficult Case?

A previously well 58-year-old man comes to see his GP for the first time soon
after discharge from hospital with a dominant hemisphere stroke from which
he has difficulty walking and word finding. His main complaints are physical,
notably discomfort on walking, fatigue, loss of appetite, and insomnia. His
GP is not sure if he is depressed but asks about low mood and loss of interest. Mood has indeed been low since the stroke and motivation is poor, but interest, weight, and concentration are preserved. There is no hopelessness, guilt, or suicidal thinking.

Understanding Detection Errors

To go beyond raw rates of detection accuracy, detailed studies examining the
types of diagnostic error are needed. Tiemens and colleagues (1999)12 found
that that only 26% of missed cases (false negatives) were complete omissions,
while 25% were underestimates of severity (eg, diagnosing subthreshold instead
of mild) and 38% were misidentifications. Conversely, of false-positive diag-
noses, 35% were overestimates of severity, 24% were misdiagnoses, and
41% were complete errors. Diagnostic errors are illustrated in Figure 3,
3 WHY DO CLINICIANS HAVE DIFFICULTY DETECTING DEPRESSION? 61

using data from Wittchen and colleagues (2002).16 It can be seen that when
deliberating both true cases and true non-cases, there is about a 25% rate of
uncertainty, which is an area for improvement. It also helps explain the
considerable variance between recognition studies, as these possible cases
are sometimes included in those detected and sometimes in those missed. In
the MAGPIE study, Bushnell and associates (2004)29 found that 38% of
depression cases were not recognized. Reasons for this were not categorizing
the patient’s psychological issues as clinically significant (23.4%), recognizing
clinical significance but not ascribing a particular diagnosis (7.1%), or the PCP
making an explicit diagnosis of something other than depression (7.7%).
What, then, distinguishes one clinician from another? Rogers (2001)30
suggested several types of common clinical error when attempting to make a
psychiatric diagnosis: idiosyncratic language in clinical questioning, idiosyn-
cratic coverage in clinical questioning, idiosyncratic sequence of clinical
questioning, idiosyncratic recording of responses, and idiosyncratic rating of
severity.

[Figures 3.3a and 3.3b: two bar charts, panels (a) and (b), each plotting values on a 0.0 to 60.0 scale (y-axis) against GP severity rating (x-axis): not currently ill, borderline case, mild case, moderate case, severe case, very severe case.]

Figures 3.3a and 3.3b. Severity estimates by general practitioners of nondepressed and
depressed patients. Adapted from Wittchen HU, Kessler RC, Beesdo K, et al. Generalized
anxiety and depression in primary care: prevalence, recognition, and management. J Clin
Psychiatry. 2002;63(suppl 8):24–34.

2. Predictors of Detection
There have been some impressive studies examining what factors influence
correct detection, although few have examined what influences clinicians’
willingness to look for symptoms of depression. Borowsky and colleagues
(2000)31 conducted a large study involving 19,309 patients from 349 PCPs in
Boston, Chicago, and Los Angeles. All completed the MOS eight-item
Burnam screen for depression, and 1,610 underwent a Diagnostic Interview
Schedule (DIS) for DSM-III. Of the patients, 661 were depressed, although
only 70 had current major depression. Physicians were less likely to detect
depression in African Americans, men, and those younger than 35 years and
more likely to detect depression when comorbid hypertension or diabetes was
present. Hickie and colleagues (2001)32 looked at a large sample of 46,515
patients attending 386 PCPs; 56% of cases were not recognized. This is
probably the most comprehensive study of predictors of recognition available.
Patients were more likely to be assessed psychologically if they were middle-
aged, female, Australian-born, unemployed, single, or presenting with mainly
psychological symptoms or for psychological reasons. Doctor characteristics
associated with willingness to assess were being over 35 years old, having an
interest in mental health, having had previous mental health training, being in
part-time practice, seeing fewer than 100 patients per week, and working in
regional centers. Thompson and colleagues (2001)33 examined recognition
among 156 PCPs in the United Kingdom, involving 18,414 individuals. The
prevalence of depression was 20%, based on a cutoff of 7/8 on the HADS
depression subscale (HADS-D). The mean recognition sensitivity was 36% and
recognition specificity was 91.5% (Fig. 3.4). Women and unemployed people were
more likely to be detected, while the elderly and retired were more likely to be
missed. However, these relationships were confounded by severity of depres-
sion or anxiety: increased anxiety improved recognition of depression.
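The Hampshire figures can be converted into the predictive values a GP would actually experience in the consulting room. The following is a minimal sketch that applies Bayes’ theorem to the pooled numbers quoted above (prevalence 20%, recognition sensitivity 36%, specificity 91.5%), treating the HADS-D cutoff as the reference standard. It assumes these pooled values apply uniformly across practices, which is only an approximation given the variation between GPs. The function name `predictive_values` is hypothetical, introduced only for illustration.

```python
def predictive_values(prevalence: float, sensitivity: float,
                      specificity: float) -> tuple[float, float]:
    """Return (PPV, NPV) of a clinical judgment via Bayes' theorem.

    All inputs are proportions between 0 and 1 (illustrative use only).
    """
    true_pos = sensitivity * prevalence                # cases the GP recognizes
    false_neg = (1 - sensitivity) * prevalence         # cases the GP misses
    true_neg = specificity * (1 - prevalence)          # non-cases correctly judged well
    false_pos = (1 - specificity) * (1 - prevalence)   # non-cases labeled depressed
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled Hampshire Depression Project figures quoted in the text
ppv, npv = predictive_values(prevalence=0.20, sensitivity=0.36, specificity=0.915)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Under these assumptions, roughly half of the patients a GP labels as depressed would meet the HADS-D criterion, and about 85% of those not labeled would be true non-cases; the large pool of missed cases (64% of all cases) is what keeps the NPV from being higher.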
[Figure 3.4: bar chart of proportions (y-axis, 0 to 0.3) at each HADS-D score from eight upward (x-axis), with each bar divided into proportion recognized and proportion missed.]

Figure 3.4. Burden and detection of depression by Hampshire (U.K.) general
practitioners. 36% of depression (blue) was detected and 64% was missed (red). 72.6% of
all omissions occurred at a HADS-D score of between 8 and 10. Adapted from Thompson C,
Ostler K, Peveler RC, et al. Dimensional perspective on the recognition of depressive
symptoms in primary care: The Hampshire Depression Project 3. Br J Psychiatry.
2001;179:317–323.

Wittchen and colleagues (2002)16 conducted a large study of PCP recogni-
tion in Germany. This impressive nationwide study recruited a total of 20,421
patients, attending 633 PCPs. Taking the doctors’ decision of definite or
probable depression, 75% of all DSM and 59% of all ICD-10 diagnoses were
recognized by the treating physician, albeit with an 11.7% false-positive rate.
Multiple logistic regression revealed that recognition was associated with prior
treatment episodes, increasing number of depression symptoms, higher patient
age, more than 5 years of practice experience, and the presence of psychomotor
retardation. In the MAGPIE study from New Zealand, 63.7% of patients
with a CIDI-diagnosed disorder were recognized as having psychological
problems, although only 40% were recognized as having a clinically signifi-
cant psychological problem and only 33.8% were given an explicit diagnosis.28
In those seen five or more times during the previous year, these recognition
figures increased to 80.2% compared with 28.8% among patients not seen in
the previous year. Maginn and associates (2004)26 examined PCP recognition
of distress in South London. Overall, PCPs identified 65% of cases, but Black
African patients were less likely to be detected or treated than Black Caribbean
and White English patients. Willingness to talk to the doctor about psycholo-
gical problems was the main predictor of detection. Ethnicity did not indepen-
dently predict detection, but Black African individuals were less likely to talk
to their PCP about psychological problems. Worryingly, half as many Black
African individuals with detected distress were offered treatment compared
with English cases (22% versus 41%). Pfaff and Almeida (2004)34 found that
39.9% of patients (87/218) were correctly classified as depressed by their PCP.
Older patients were more likely to be incorrectly classified as “not depressed”
by their PCP when they were born outside of Australia or New Zealand, did not
smoke or use sleeping tablets, acknowledged milder levels of depression, and
presented with primarily somatic complaints.
Aragones and colleagues (2004)35 screened 209 Zung-positive patients and
97 negative patients with the SCID. Detection was associated with educational
level, severity of the depression, level of impairment, and the complaint of
explicit psychological symptoms. Antidepressant treatment was associated
with marital status, severity of and impairment from the depression, frequency
of visits to the family physician, and the patient’s complaint of psychological
symptoms. Aragones and colleagues went on to study predictors of
false-positive diagnoses (2006)36 and found that PCPs had a nearly 50% rate of
false-positive diagnosis. Factors associated independently with overdiagnosis
were higher symptom levels on the SDS, lower Global Assessment of
Functioning, a previous history of depression, and the absence of generalized
anxiety. Nuyen and colleagues (2005)37 found that among 191 depressed
primary-care patients diagnosed using the CIDI, 28.8% were recognized and
recorded by PCPs over the same period. Patients without chronic somatic
comorbidity, with a lower educational level, with less severe depression, and
with fewer PCP contacts were all significantly more likely not to be diagnosed
as depressed. Verhaak and coworkers (2006)38 conducted a survey of primary
care contacts of patients with a DSM-IV diagnosis of affective disorder,
anxiety disorder, or alcohol abuse. Forty percent visited their PCP but received
only a somatic diagnosis, and 50% were given a psychological or social
diagnosis at least once during 1 year. The chances of a psychological PCP
diagnosis increased with the number of PCP contacts. Patients who were given
a psychological or social diagnosis by their PCP had a higher GHQ score,
lower mental functioning scores on the SF-36, and far more visits to their PCP
than those not diagnosed as psychologically ill. Finally, patients given a
diagnosis tended to express slightly more confidence in their PCP.
McCall and colleagues (2007)39 looked at predictors of recognition of
distress in Austrian primary care practice. Twenty-eight PCPs completed a
clinical audit on 868 of their patients who completed the GHQ-28. PCPs
correctly identified 43% of GHQ-positive cases as having distress. For indivi-
dual PCPs the rate of correct recognition varied considerably, from 4% to
100%. Correct recognition was associated with years of experience as a PCP,
older age of patient, and greater severity of distress.
Clearly, there is wide variation in the ability of GPs to diagnose mental
health problems, due in part to differences in knowledge, skills, and attitudes
(Textbox 3.2).40,41 Most clinicians have difficulty recalling the current criteria
for major depression.42 Further, only one third claim to make diagnoses based
on validated criteria.43 Self-confident, outgoing physicians with high academic
ability appear to make more accurate diagnoses;44 yet this same formula would
apply to psychiatrists’ ability to detect physical illness. One apparently simple
solution is to increase the length of the consultation. There is reasonably good
evidence that short appointments impair detection in difficult cases.45 However,
paradoxically, lengthening the consultation may not improve recognition.46
Verhaak and colleagues (2007)47 found that in general, healthcare system
characteristics do affect PCPs’ performance in psychosocial care. PCPs’ work-
load was not related to their awareness of psychological problems and hardly
related to their communication, except for the finding that a PCP with a
subjective experience of a lack of time is less patient-centered (Textbox 3.3).48

Textbox 3.2. Possible Barriers to Recognition (Diagnostic Barriers)

Patient Related
Younger patient
Male gender
Reluctance to seek help
Reluctance to disclose symptoms
Disclosure of only somatic symptoms
Low awareness of emotional symptoms
Fear of stigma/label of mental illness
Clinician Related
Low clinician confidence and skills
Low therapeutic alliance
Low consultation time
Single appointment only
Low index of suspicion
Rare inquiry about depressive symptoms
Caution re: stigma of mental illness

Textbox 3.3. Basic Patient-Centered Interviewing Method

Step 1. Welcoming
Welcome the patient
Introduce self and identify specific role
Ensure patient comfort and privacy
Step 2. Set agenda
Indicate time available and objective
Summarize what is already known and others involved
Indicate own needs
Clarify what patient wants to discuss
Step 3. Non-focused interviewing
Open-ended beginning question: “How have things been recently?”
Attentive (active) listening (with prompts): “That sounds difficult”
Observe nonverbal cues
Step 4. Focused interviewing
Obtain description of main problem and secondary problems
Clarify the development and context of the problems
Ask about emotional and functional impact of the problems
Step 5. Transition to agreed action
Give brief summary and check accuracy

3. Patient and Clinician Influences on Detection

Do Patients Volunteer Symptoms of Depression?
It should be no surprise that recognition of distress and depression is linked
with the number of symptoms reported during a consultation.49 Recognition is
facilitated when patients report psychological symptoms of anxiety or depres-
sion early in the consultation.50 Patients who normalize or minimize their
symptoms are less likely to be identified.51 It has been reported that detection
rates may be 100% in those who spontaneously complain of emotional
problems.52 However, patients do not usually complain of “depression,” and
patients’ views about their depressive symptoms are significantly different
from conventional medical views.53,54 Many groups have noted that patients
with depression often present with physical symptoms rather than psycholo-
gical complaints, and the depression is less likely to be recognized as a
consequence.56–62 Perhaps 60% to 70% of patients with depression and anxiety
have predominantly somatic presentations.63,64 Such patients tend to be older
and have less severe depression but not necessarily more comorbid physical
illness. Many authors have shown that patients are often reluctant to discuss
emotional issues with health professionals.65–67 Patients have their own readi-
ness to disclose.68 Indeed, willingness to discuss emotional issues may be one
of the strongest predictors of detection.69 Some ethnic groups (whites and
Hispanics) appear more likely to communicate with a clinician about depres-
sion than others (African Americans).70 However, most patients will discuss
psychological symptoms if asked.71,72 Reassuringly, Davenport and associates
(1987)73 found that there is some association between severity of distress and
spontaneous verbal cues, but this is by no means a perfect correlation,
and such cues are easily overlooked. O’Connor and colleagues (2001)74
examined 1,021 older patients in Melbourne, Australia. Symptom disclosure
was associated with higher depressive scores, previous contact with a psychia-
trist, and female gender; even so, 48% of persons with ICD-10 moderate or
severe depressive episode had not reported any current complaints to their
doctor at the time of the interview. In the MAGPIE study, 30% of all primary
care patients (and 37% of patients with current psychological
symptoms) did not disclose their psychological problems spontaneously;
younger patients, those consulting more frequently, and those with greater
psychiatric disability were more likely to report non-disclosure.75 However, in
this study, reported nondisclosure did not influence detection rates. Verhaak
and colleagues76 collected comprehensive data on detection rates from con-
sultations across 10 European countries and found low rates of spontaneous
emotional complaints.
What, then, are the reasons for not discussing emotional difficulties? The
most frequently given reason in the MAGPIE study was the belief that the PCP
is not the “right” person to talk to (33.8%) or that mental health problems
should not be discussed at all (27.6%). In a survey of primary care attendees
who were high scorers on the GHQ, more than 75% had not mentioned any
emotional problems during a consultation.77 Thirty-six percent felt they were
able to cope without emotional help, but 45% gave reasons including
psychological embarrassment and hesitation to trouble the doctor, and a further
19% were deterred by the doctors’ interview behaviors (see below).
Thirty-nine percent felt there was little the doctor could do to help with their emotional
problems. In a study by Del Piccolo and associates (1998),78 about two thirds
of patients with stressful life events and social problems had mentioned them to
their PCP. A positive attitude about confiding and emotional distress were the
best predictors of confiding. In women, past confiding and a longstanding
relationship with the PCP were also important. Pollock79 summarized the
difficulty, stating that medical consultations are difficult encounters for most
patients, who often strive to protect their privacy and personal integrity by
“maintaining face,” but this in turn may impede the diagnostic process.

Do Clinicians Ask About Depression?

Communication behaviors of clinicians have been much discussed. Individual
clinicians differ in their communicative style, with some more patient-centered
and others less so, but most adjust their style according to the situation, such as
illness severity.79–81 In a large study recording responses of PCPs to standardized
patients, biomedical inquiry/explanations, nonspecific acknowledgment, and
reassurance were common, whereas empathy, expressions of uncertainty, and
exploration of psychosocial factors and emotions were uncommon.82 Yet in
consultations about psychosocial issues, doctors show more emotional behavior,
ask more questions, and give less information than in other consultations.83,84
Feldman and colleagues (2007)85 found that history taking about depression was
directly associated with the likelihood of a chart diagnosis of depression and the
provision of minimally acceptable initial depression care. When PCP decisions
for late-life depression were monitored, a recorded treatment decision occurred
in about 5% of visits, a deferred or monitor-only decision occurred in about a
third of visits, and no decision was made in about half of visits.86 Saltini and
coworkers (2004)87 found that although occupational, financial, and housing
problems and life events of loss were the most important predictors of the GHQ-
12 case definition, PCPs gave significantly more importance to psychiatric
treatment, psychopharmacological drug use, and chronic illness.
A number of authors have commented on suboptimal communication stra-
tegies from clinicians.88 Inadequate interview and diagnostic skills influence
detection.89,90 For example, clinicians appear to miss most cues and concerns
and adopt behaviors that discourage disclosure.91,92 More sophisticated ana-
lysis with video recording of consultations is revealing. In one of the best
examples, Deveugele and colleagues (2004)93 analyzed 2,095 consultations
from 168 PCPs using the Roter Interactional Analysis System. Clinicians
differed markedly in their psychosocial and emotional communication. Some
studies attempt to go further and uncover an explicit link with detection. In a
seminal study from Marks and associates (1979),94 a research psychiatrist
made detailed observations on 2,098 interviews carried out by 55 PCPs. The
authors found that PCPs who had a better conceptual understanding of mental
illness produced a more accurate diagnosis of the patient’s condition. They also
noted that PCPs with an interest in psychological medicine, those with higher
levels of empathy, and those who asked about social and family problems more
accurately diagnosed psychiatric illness. Badger and colleagues (1994)95 found
two communication behaviors that predicted successful recognition of depres-
sion: the proportion of the interview devoted to emotional issues and the use of
broad, open-ended psychosocial questions. Carney and coworkers (1999)96
found that PCPs who recognized depression asked twice as many questions
about feelings and affect compared with those who did not. In a series of
interviews, Rost and colleagues (2000)97 found that physicians and patients
discussed depression in 47.9% of untreated patients. Chronic physical comor-
bidity decreased the odds that physicians and untreated patients discussed
depression as a possible diagnosis. Interestingly, PCPs who have a preference
for psychotherapy rather than antidepressant treatment also appear more accu-
rate in diagnosing depression.98
Textbox 3.4. Top 10 GP-Perceived Barriers to Dealing with Depression

1. Lack of access to mental health specialists (51.4%)
2. Lack of time (50.6%)
3. Poor reimbursement for depression treatment (50.4%)
4. Distracted by other presenting problems (39.4%)
5. Patient reluctant to be referred to a specialist (37.3%)
6. Workload prevents adequate attention to depression (32.3%)
7. Patient/family reluctance to accept diagnosis of depression (21.7%)
8. Patient inability/unwillingness to discuss depressive symptoms (16.2%)
9. Lack of accessible assessment tools for depression (15.9%)
10. Patient reluctant to begin antidepressant medications (8.6%)
Adapted from Richards JC, Ryan P, McCabe MP, et al. Barriers to the effective management of
depression in general practice. Aust N Z J Psychiatry. 2004;38:795–803.

There are a number of important barriers to detection, including clinician
attitude (Textbox 3.4). Travado and colleagues6 found that low
psychosocial orientation and burnout symptoms were associated with lower
confidence in communication skills and higher expectations of a negative
outcome after physician–patient communication. In a study of 50 PCPs and
473 patients in Portland, Oregon, routine office visits were audiotaped and
analyzed for communication behaviors and emotional tone using the Roter
Interactional Analysis System.100 Physicians with more positive attitudes to
psychosocial aspects of patient care had more psychosocial discussions in
visits. A large-scale practice audit in Australia found that PCPs with a declared
interest in mental health and those who had obtained mental health training
were more likely to see more patients with depression and more likely to
provide appropriate mental health assessment and treatments. Some studies
have also implicated insufficient undergraduate and postgraduate training,101
insufficient time devoted to diagnostic assessment, and a failure to acquire
new knowledge relevant to the provision of treatment.
Several recent observational studies have examined physician habits in
relation to late-life depression. In a study based in nine primary care clinics
involving 1,023 individuals, Fischer and colleagues (2003)102 found that
physicians were only 6% as likely to ask older depressed patients about
suicide risk and about one-fifth as likely to ask if they felt depressed
compared with younger depressed patients. Tai-Seale and colleagues
(2005)103 observed 389 elderly patients and 33 physicians using video of
their clinical interactions. Physicians assessed depression in only 14% of
the visits and used validated tools only three times. Depression assessment
was more likely in visits that covered multiple topics, contrary to the
“crowding-out” hypothesis. Tai-Seale et al (2007)104 observed 35 PCPs
interviewing 366 of their elderly patients. Discussion of mental health
topics occurred in only 22% of visits despite a high prevalence of depres-
sion. A typical mental health discussion lasted approximately 2 minutes.104
Adelman and colleagues (2008)105 audiotaped 482 follow-up visits at three
sites. Depression was discussed in 7.3% of medical visits. Of these
discussions, physicians raised the topic of depression in 41%, patients in
48%, and accompanying persons in 10%. The topic of
depression was raised almost exclusively in the first 2.5 years of the patient–
physician relationship. Physicians with some geriatric training were more
likely to discuss depression.
However, it is important to remember that patient and clinician commu-
nication are reciprocally related. Patient perceptions of how the PCP related to
him or her in the consultation correlate with reduction in symptom severity 3
months later.106 Goldberg and colleagues (1993)107 found that patient cues
were influenced by the PCP’s behavior, increasing with patient-centered
behaviors such as empathic statements or directive questioning about psycho-
logical issues, and decreasing with medical questions and other doctor-led
behaviors. Similarly, others found that the patient’s willingness to disclose
information is related to physician facilitation, and patient emotional
expression is associated with a warm and empathetic attitude of the
physician.108 Physicians may signal to patients, wittingly or unwittingly,
how emotional problems will be addressed, influencing how patients per-
ceive their interactions with physicians regarding emotional problems. Del
Piccolo and coworkers (2000)109 also found that the proportion of cues
given by patients was related to the PCP’s verbal behavior, increasing
with closed psychosocial questions and decreasing with the use of active
interview techniques. In fact, patients with detected distress gave more
cues, often with psychological content, whereas patients with undetected
distress gave mainly cues related to their lifestyle and life episodes.
Recently, an international study by Verhaak and colleagues (2007)76
found that eye contact and empathy and asking questions about psycholo-
gical or social topics were associated with more awareness of patients’
psychological problems.
One other important predictor of diagnostic sensitivity (recognition) is
the amount of contact with the patient.110,111 In the MAGPIE study
from New Zealand, 80.2% of cases seen five or more times during the previous
year were correctly identified, compared with 28.8% of those patients not seen
in the previous year. Recognition also accumulates over time: only 30% of
cases remain undetected at 1 year and 14% at the end of 3 years.112,113 Using patient self-report regarding
the adequacy of diagnosis/treatment, Jackson and colleagues114 found that the
cumulative recognition rate was a modest 56% for major depression and 20%
for minor depression, even after 5 years.

4. Illness-Related Influences on Detection

There is some evidence that clinicians find mental illness difficult to deal with
and awkward to diagnose. For example, PCPs in the United States appear
reluctant to code patients as depressed.115 Somatic complaints thought to have
a psychological basis are also perceived as difficult.116,117 In a study of 500
primary care visits, 15% were perceived as difficult by clinicians, and these
were more likely to involve a mental disorder, more than five somatic symp-
toms, more severe symptoms, poorer functional status, more unmet expecta-
tions, less satisfaction with care, and higher use of health services.118
Interestingly, clinicians with poorer psychosocial attitudes perceived three
times as many encounters as being difficult. In the same study, the authors
showed that a 2-hour physician workshop followed by information
provided before each visit reduced physician-perceived difficulty of the
encounter.119
Table 3.1. Large-Scale International Studies on Mood Disorders Recognition and Treatment

Institut National de la Santé et de la Recherche Médicale (INSERM) study
Setting: Paris, France, 1996–97
Sample: 2,419 patients (aged 18–70 years); 238 were found to be depressed and were followed up for 6 months
Instrument: MINI
Prevalence of mood disorders in primary care: major depression (14.0%), minor depression (3.1%), and dysthymia (2.1%)
Recognition: major depression (26%); any mental disorder (58%)
% offered antidepressants: major depression (21%)

European Study of the Epidemiology of Mental Disorders (ESEMeD)
Setting: community study in Belgium, France, Germany, Italy, the Netherlands, and Spain
Sample: 21,425 non-institutionalized adults aged 18 years and older (including those 65 years and older)
Instrument: WMH-CIDI
Prevalence of mood disorders: lifetime prevalence rates of 13.4% for major depression and 4.4% for dysthymia were reported
Recognition: not examined
% offered antidepressants: major depression (21.2%)

World Health Organization study on Psychological Problems in General Health Care (PPGHC)
Setting: 14 countries worldwide
Sample: 26,422 consecutive patients (aged 15–65 years)
Instrument: General Health Questionnaire (GHQ-12)
Prevalence of mood disorders in primary care: mental disorders (24%), major depression (13.7%), minor depression (3.6%), and dysthymia (3.6%)
Recognition: major depression (15%); any mental disorder (54%)
% offered antidepressants: major depression (15%)
[Figure 3.5: detection sensitivity (y-axis, scale 0 to 0.9) plotted against HADS-D score (x-axis).]

Figure 3.5. Detection sensitivity (%) by severity of depression according to the HADS
scale. Adapted from Thompson C, Ostler K, Peveler RC, et al. Dimensional perspective on
the recognition of depressive symptoms in primary care: The Hampshire Depression
Project 3. Br J Psychiatry. 2001;179:317–323.

Most depressions in primary care are mild to moderate in severity (90%
have a score of 8 to 13 on the HADS), and the detection of mild disorders is a
challenge because symptoms do not differ greatly from those of healthy but
stressed individuals.120,121 Thompson and colleagues (2001)33 examined the
relationship between severity of depression on the HADS-D and proportion of
cases detected (Fig. 3.5). Generally, higher severity of depression is associated
with greater recognition, but because of the great burden of mild depression,
50% of all correct recognition occurs at a HADS-D score of between 8 and 10.
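Why mild cases dominate both detections and omissions, even though detection sensitivity is lowest at mild severity, follows from simple arithmetic: the number of detections in a severity band is the number of cases in that band multiplied by the band’s detection sensitivity. The following is a minimal sketch using invented inputs. The band labels, case counts, sensitivities, and the function name `share_by_band` are all hypothetical and are not taken from the Hampshire data.

```python
def share_by_band(counts: dict[str, int],
                  sensitivity: dict[str, float]) -> dict[str, tuple[float, float]]:
    """For each severity band, return (share of all detections, share of all misses)."""
    detected = {band: counts[band] * sensitivity[band] for band in counts}
    missed = {band: counts[band] - detected[band] for band in counts}
    total_detected = sum(detected.values())
    total_missed = sum(missed.values())
    return {band: (detected[band] / total_detected, missed[band] / total_missed)
            for band in counts}


# Hypothetical inputs, invented for illustration only (NOT study data):
# many mild cases with low detection, few severe cases with high detection.
counts = {"HADS-D 8-10": 600, "HADS-D 11-13": 300, "HADS-D 14+": 100}
sensitivity = {"HADS-D 8-10": 0.25, "HADS-D 11-13": 0.45, "HADS-D 14+": 0.70}

for band, (det_share, miss_share) in share_by_band(counts, sensitivity).items():
    print(f"{band}: {det_share:.0%} of detections, {miss_share:.0%} of misses")
```

With these made-up inputs, the lowest band supplies roughly 42% of all detections and about 70% of all misses despite having the lowest sensitivity, the same qualitative pattern described for Figures 3.4 and 3.5.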
Further, many cases feature physical or mental comorbidities such as anxiety.
Comorbidity may decrease recognition.122 In primary care only about 10% of
all depressions do not feature comorbidity (5% of those with major depres-
sion). About 50% have physical comorbidity and an overlapping 70% to 80%
psychiatric comorbidity (of which 40% to 50% is anxiety). Patients with
anxiety or chronic mixed anxiety and depression were less likely to be offered
active treatment than those considered to have depression.123 One hypothesis is
that somatic complaints, particularly in late-life depression, might cause the
clinician to focus on physical rather than mental symptoms. Many clinicians
have been taught to take an exclusive approach and ignore such complaints, but
accumulating evidence suggests this is probably incorrect and that somatic
symptoms should be “counted” toward depression even when another physical
illness like stroke or Parkinson’s disease is present. This is discussed further in
Chapters 10 and 11.
However, this “crowding-out” hypothesis has been refuted. For example,
Ani and coworkers (2008)124 found that comorbidity had no effect on
recognition accuracy. Pfaff and Almeida (2005)125 found that predictors of detection
included concomitant polypharmacy (implying higher comorbidity) as well as
higher CESD scores, presenting with psychological complaints, and higher risk
of suicide. O’Connor and associates (2001)126 found that comorbid pain
positively influenced detection of late-life depression. Similarly, Borowsky
and associates (2000)31 found superior detection of depression if comorbid
diabetes or hypertension were present. Other factors were previous psychiatric
consultation, number of years as a patient, severity of depression, and disclo-
sure of depression to the physician. Indeed, the co-occurrence of MDD and
anxiety might actually facilitate recognition of depression127 or psychiatric
caseness.128–130 When faced with ambiguity and diagnostic difficulties, some
evidence suggests that only a minority of clinicians choose to explore the issues
in more detail.131

5. Conclusions
Depression is often a complex comorbid presentation associated with frequent
primary care attendance.132 Recognition of depression in primary care and
hospital settings is poor, but it is worth remembering that depression is a
relatively uncommon reason for presentation in primary care: at least six
of every seven unselected attendees do not have depression. In primary care, time and
resources are limited, and hence psychological or even structured self-help
programs are often not available. The most plausible factor explaining under-
treatment is underrecognition. Antidepressants are typically the treatment of
choice for clinicians but not for patients, and hence managing depression can
be seen as difficult.133 Against this background, only about half of true cases
are diagnosed and perhaps a quarter treated. Conversely, about 70% of non-
cases are correctly reassured.
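The detection figures above can be laid out as a two-by-two table to show why false positives matter even when specificity looks respectable. The sketch below is a minimal illustration, assuming a prevalence of 1/7 (the upper bound implied by "at least six out of seven" attendees not being depressed), a sensitivity of 0.50 ("about half of true cases") and a specificity of 0.70 ("about 70% of non-cases"). The function names are hypothetical helpers written for this example, not part of any screening package.

```python
# Illustrative only: prevalence, sensitivity and specificity are assumptions
# taken loosely from the summary figures in the text, not from a single study.

def two_by_two(n, prevalence, sensitivity, specificity):
    """Return expected counts (TP, FN, FP, TN) for n unselected attendees."""
    cases = n * prevalence
    non_cases = n - cases
    tp = cases * sensitivity      # depressed patients the clinician diagnoses
    fn = cases - tp               # depressed patients who are missed
    tn = non_cases * specificity  # non-depressed patients correctly reassured
    fp = non_cases - tn           # non-depressed patients labelled as depressed
    return tp, fn, fp, tn


def predictive_values(tp, fn, fp, tn):
    """Positive and negative predictive value from expected counts."""
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


tp, fn, fp, tn = two_by_two(n=700, prevalence=1 / 7, sensitivity=0.50, specificity=0.70)
ppv, npv = predictive_values(tp, fn, fp, tn)
print(f"TP={tp:.0f}  FN={fn:.0f}  FP={fp:.0f}  TN={tn:.0f}")
print(f"PPV={ppv:.2f}  NPV={npv:.2f}  false positives per true positive={fp / tp:.1f}")
```

Under these assumptions, 700 attendees yield about 50 true positives against about 180 false positives, so fewer than one in four clinician diagnoses of depression would be correct (PPV about 0.22), while a negative judgment is right about 89% of the time. Because 1/7 is an upper bound on prevalence, the real-world PPV would be lower still.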
Two major factors appear to influence detection: how the person with depression
describes his or her symptoms and how the clinician interviews the patient.
The nature of the therapeutic relationship is also important. Even when contact
is frequent, a therapeutic relationship that the clinician (or patient) regards
as unhelpful is likely to reduce the recognition rate. Discussion of emotional
distress in primary care is also linked with high patient satisfaction.134
Additional factors, such as the skill of the clinician and the use of tools, may
also play a role (see Chapter 7). There are certainly many potential barriers to
successful diagnosis and treatment.135 Mental health skills training has been
3 WHY DO CLINICIANS HAVE DIFFICULTY DETECTING DEPRESSION? 75

effective in improving recognition and management of somatizing and depressed
patients by PCPs, but it remains uncertain whether this translates into improved
clinical outcomes.136–138 Interventions are likely to be most successful where
problems are most serious. For example, Shapiro and colleagues (1987)139
conducted a randomized clinical trial involving 1,242 patients attending
inner-city PCPs, in which clinicians received feedback of GHQ scores. Detection
increased markedly, but only among elderly, African American, and male patients.
Clinicians should have a high index of suspicion in frequent attendees, those
with serious or chronic illness, and those who have persistent but unexplained
pain. High vigilance is warranted in patients with such somatic symptoms, in
men, and in younger patients.140,141 Ultimately, it is useful to reflect on
patients’ opinions about which aspects of primary care for depression matter
most.142 The four needs rated most important are the clinician’s interpersonal
skills, the clinician’s ability to recognize depression, the effectiveness of
treatment, and problems associated with treatment.

References

1. Callahan CM, Nienaber NA, Hendrie HC, et al. Depression of elderly outpatients:
Primary care physicians’ attitudes and practice patterns. J Gen Intern Med. 1992;7(1):
26–31.
2. Kaplan MS, Adamek ME, Martin JL. Confidence of primary care physicians in
assessing the suicidality of geriatric patients. Int J Geriatr Psychiatry.
2001;16(7):728–734.
3. Gallo JJ, Ryan SD, Ford DE. Attitudes, knowledge, and behavior of family physicians
regarding depression in late life. Arch Fam Med. 1999;8:249–256.
4. Shao W, Williams J, Lee S, et al. Knowledge and attitudes about depression among
non-generalists and generalists. J Fam Pract. 1997;44:161–168.
5. Feldman MD, Franks P, Duberstein PR, et al. Let’s not talk about it: Suicide inquiry in
primary care. Ann Fam Med. 2007;5(5):412–418.
6. Travado L, Grassi L, Gil F, et al., and the Southern European Psycho-Oncology Study
(SEPOS) Group. Physician-patient communication among Southern European cancer
physicians: The influence of psychosocial orientation and burnout. Psychooncology.
2005;14(8):661–670.
7. Plummer SE, Gournay K, Goldberg D, et al. Detection of psychological distress by
practice nurses in general practice. Psychol Med. 2000;30(5):1233–1237.
8. Cape J, Morris E, Adams N, et al. Identification of psychological morbidity in
older people in primary care by practice nurses. Aging Ment Health.
2003;7(6):446–451.
9. Ryan H, Schofield P, Cockburn J, et al. How to recognize and manage psychological
distress in cancer patients. Eur J Cancer Care. 2005;14(1):7–15.
10. Liu SI, Mann A, Cheng A, et al. Identification of common mental disorders by general
medical doctors in Taiwan. Gen Hosp Psychiatry. 2004;26(4):282–288.
11. Matarazzo JD. The reliability of psychiatric and psychological diagnosis. Clin Psychol
Rev. 1983;3:103–145.
12. Tiemens BG, Von Korff M, Lin EH. Diagnosis of depression by primary care
physicians versus a structured diagnostic interview. Understanding discordance. Gen
Hosp Psychiatry. 1999;21(2):87–96.
13. Smith MV, Rosenheck RA, Cavaleri MA, et al. Screening for and detection of
depression, panic disorder, and PTSD in public-sector obstetric clinics. Psychiatr
Serv. 2004;55:407–414.
14. Ormel J, Koeter MWJ, van den Brink W, et al. Recognition, management and
course of anxiety and depression in general practice. Arch Gen Psychiatry.
1991;48:700–706.
15. Norton J, De Roquefeuil G, Boulenger JP, et al. Use of the PRIME-MD Patient
Health Questionnaire for estimating the prevalence of psychiatric disorders in
French primary care: comparison with family practitioner estimates and
relationship to psychotropic medication use. Gen Hosp Psychiatry.
2007;29(4):285–293.
16. Wittchen HU, Kessler RC, Beesdo K, et al. Generalized anxiety and depression in
primary care: prevalence, recognition, and management. J Clin Psychiatry.
2002;63(suppl 8):24–34.
17. Jackson JL, Passamonti M, Kroenke K. Outcome and impact of mental disorders in
primary care at 5 years. Psychosom Med. 2007;69(3):270–276.
18. Ustun TB, Von Korff M. Primary mental health services. In: Ustun TB, Sartorius N,
eds. Mental illness in general health care: an international study. Chichester, UK:
John Wiley & Sons; 1995:347–360.
19. Alonso J, Angermeyer MC, Bernert S, et al. Prevalence of mental disorders in Europe:
results from the European Study of the Epidemiology of Mental Disorders (ESEMeD)
project. Acta Psychiatr Scand Suppl. 2004;420:21–27.
20. Alonso J, Lépine J-P. Overview of key data from the European Study of the
Epidemiology of Mental Disorders (ESEMeD). J Clin Psychiatry. 2007;68(suppl
2):3–9.
21. Friedman B, Conwell Y, Delavan RL. Correlates of late-life major depression:
A comparison of urban and rural primary care patients. Am J Geriatr Psychiatry.
2007;15(1):28–41.
22. Licht-Strunk E, van der Kooij KG, van Schaik DJF. Prevalence of depression in older
patients consulting their general practitioner in The Netherlands. Int J Geriatr
Psychiatry. 2005;20(11):1013–1019.
23. Mitchell AJ, Vaze A, Rao S. Meta-analysis of unassisted recognition of depression
in primary care: importance of false positives and false negatives. Lancet. 2009
(in press).
24. Lyness JM, Noel TK, Cox C, et al. Screening for depression in elderly primary care
patients. A comparison of the Center for Epidemiologic Studies-Depression Scale and
the Geriatric Depression Scale. Arch Intern Med. 1997;157(4):449–454.
25. Greer J, Halgin R, Harvey E. Global versus specific symptom attributions: predicting
the recognition and treatment of psychological distress in primary care. J Psychosom
Res. 2004;57:521–527.
26. Alonso J, Angermeyer MC, Bernert S, et al. Prevalence of mental disorders in Europe:
results from the European Study of the Epidemiology of Mental Disorders (ESEMeD)
project. Acta Psychiatr Scand Suppl. 2004;420:21–27.
27. Maginn S, Boardman AP, Craig TKJ, et al. The detection of psychological problems
by general practitioners. Influence of ethnicity and other demographic variables. Soc
Psychiatry Psychiatr Epidemiol. 2004;39:464–471.
28. Wittchen HU, Hofler M, Meister W. Prevalence and recognition of depressive
syndromes in German primary care settings: poorly recognized and treated? Int Clin
Psychopharmacol. 2001;16(3):121–135.
29. Bushnell J. Frequency of consultations and general practitioner recognition of
psychological symptoms. Br J Gen Pract. 2004;54(508):838–842.
30. Rogers R. Handbook of diagnostic and structured interviewing. New York: Guilford
Publications; 2001.
31. Borowsky SJ, Rubenstein LV, Meredith LS, et al. Who is at risk of nondetection of
mental health problems in primary care? J Gen Intern Med. 2000;15(6):381–388.
32. Hickie IB, Davenport TA, Scott EM, et al. Unmet need for recognition of
common mental disorders in Australian general practice. Med J Aust.
2001;175:S18–S24.
33. Thompson C, Ostler K, Peveler RC, et al. Dimensional perspective on the recognition
of depressive symptoms in primary care. The Hampshire Depression Project 3. Br J
Psychiatry. 2001;179:317–323.
34. Pfaff JJ, Almeida OP. A cross-sectional analysis of factors that influence the detection
of depression in older primary care patients. Aust N Z J Psychiatry.
2005;39(4):262–265.
35. Aragones E, Pinol JL, Labad A, et al. Detection and management of depressive
disorders in primary care in Spain. Int J Psychiatry Med. 2004;34(4):331–343.
36. Aragones E, Pinol JL, Labad A. The overdiagnosis of depression in non-depressed
patients in primary care. Fam Pract. 2006;23(3):363–368.
37. Nuyen J, Volkers AC, Verhaak PFM, et al. Accuracy of diagnosing depression in
primary care: the impact of chronic somatic and psychiatric co-morbidity. Psychol
Med. 2005;35(8):1185–1195.
38. Verhaak PFM, Schellevis FG, Nuijen J, et al. Patients with a psychiatric disorder in
general practice: determinants of general practitioners’ psychological diagnosis. Gen
Hosp Psychiatry. 2006;28:125–132.
39. McCall L, Clarke D, Trauer T, et al. Predictors of accuracy of recognition of
emotional distress in general practice. Prim Care Community Psychiatry.
2007;12(1):1–5.
40. Millar T, Goldberg DP. Link between the ability to detect and manage emotional
disorders: a study of general practitioner trainees. Br J Gen Pract. 1991;41:357–359.
41. Davenport TA, Hickie IB, Naismith SL, et al. Variability and predictors of mental
disorder rates and medical practitioner responses across Australian general practices.
Med J Aust. 2001;175:S37–S41.
42. Rapp S, Davis K. Geriatric depression: physicians’ knowledge, perceptions and
diagnostic practices. Gerontologist. 1989;29:252–257.
43. Williams JW Jr, Rost K, Dietrich AJ, et al. Primary care physicians’ approach to
depressive disorders: effects of physician specialty and practice structure. Arch Fam
Med. 1999;8(1):58–67.
44. Goldberg D, Steele J, Johnson A, et al. Ability of primary care physicians to
make accurate ratings of psychiatric symptoms. Arch Gen Psychiatry.
1982;39:829–833.
45. Hutton C, Gunn J. Do longer consultations improve the management of psychological
problems in general practice? A systematic literature review. BMC Health Serv Res.
2007;7:71.
46. Howie JG, Porter AM, Heaney DJ, et al. Long to short consultation ratio: a proxy
measure of quality of care for general practice. Br J Gen Pract. 1991;41:48–54.
47. Verhaak PFM, Van Den Brink-Muinen A, Bensing JM, et al. Demand and
supply for psychological help in general practice in different European
countries: access to primary mental health care in six European countries.
Eur J Public Health. 2004;14(2):134–140.
48. Zantinge EM, Verhaak PFM, de Bakker DH, et al. The workload of general
practitioners does not affect their awareness of patients’ psychological problems.
Patient Educ Couns. 2007;67(1–2):93–99.
49. Kruse J, Schmitz N, Woller W, et al. Why does the general practitioner overlook
psychological disorders in his patient? Determinants of physicians’ identification with
psychological disorders. Psychother Psychosom Med Psychol.
2004;54(2):45–51.
50. Tylee A, Freeling P, Kerry S, et al. How does the content of consultations affect the
recognition by general practitioners of major depression in women? Br J Gen Pract.
1995;45:575–578.
51. Kessler D, Lloyd K, Lewis G, et al. Cross sectional study of symptom attribution
and recognition of depression and anxiety in primary care. BMJ.
1999;318:436–439.
52. Weich S, Lewis G, Mann AH, et al. The somatic presentation of psychiatric morbidity
in general practice. Br J Gen Pract. 1995;45:143–147.
53. Yeung A, Chang D, Gresham RL, et al. Illness beliefs of depressed Chinese American
patients in primary care. J Nerv Ment Dis. 2004;192(4):324–327.
54. Cornford CS, Hill A, Reilly J. How patients with depressive symptoms view their
condition: a qualitative study. Fam Pract. 2007;24(4):358–364.
55. Bridges KW, Goldberg DP. Somatic presentation of DSM-III psychiatric disorders in
primary care. J Psychosom Res. 1985;29:563–569.
56. Susman JL, Crabtree BF, Essink G. Depression in rural family practice: easy to
recognize, difficult to diagnose. Arch Fam Med. 1995;4:427–431.
57. Sartorius N, Ustun TB, Lecrubier Y, et al. Depression comorbid with anxiety: results
from the WHO study on psychological disorders in primary health care. Br J
Psychiatry. 1996;168(suppl 30):38–43.
58. Freeling P, Rao BM, Paykel ES, et al. Unrecognised depression in general practice.
BMJ. 1985;290:1880–1883.
59. Tylee AT, Freeling P, Kerry S. Why do general practitioners recognize major depression
in one woman patient yet miss it in another? Br J Gen Pract. 1993;43:327–330.
60. Tylee A, Freeling P, Kerry S, et al. How does the content of consultations affect the
recognition by general practitioners of major depression in women? Br J Gen Pract.
1995;45:575–578.
61. Coulehan JL, Schulberg HC, Block MR, et al. Medical comorbidity of major depressive
disorder in a primary medical practice. Arch Intern Med. 1990;150:2363–2367.
62. Freeling P, Rao BM, Paykel ES, et al. Unrecognized depression in general practice.
BMJ. 1985;290:1880–1883.
63. Keeley RD, Smith JL, Nutting PA, et al. Does a depression intervention result in
improved outcomes for patients presenting with physical symptoms? J Gen Intern
Med. 2004;19:615–623.
64. Vuorilehto M, Melartin T, Isometsa E. Depressive disorders in primary care:
recurrent, chronic, and co-morbid. Psychol Med. 2005;35(5):673–682.
65. Priest RG, Vize C, Roberts A, et al. Lay people’s attitudes to treatment of depression:
Results of opinion poll for defeat depression campaign just before its launch. BMJ.
1996;313:858–859.
66. Prior L, Wood F, Lewis G, et al. Stigma revisited, disclosure of emotional problems in
primary care consultations in Wales. Soc Sci Med. 2003;56(10):2191–2200.
67. Cape J, McCulloch Y. Patients’ reasons for not presenting emotional problems in
general practice consultations. Br J Gen Pract. 1999;49(448):875–879.
68. Leaf PJ, Livingston MM, Tischler GL, et al. Contact with health professionals
for the treatment of psychiatric and emotional problems. Med Care.
1985;23:1322–1337.
69. Maginn S, Boardman AP, Craig TKJ, et al. The detection of psychological problems
by general practitioners: influence of ethnicity and other demographic variables.
Soc Psychiatry Psychiatr Epidemiol. 2004;39(6):464–471.
70. Probst JC, Laditka SB, Moore CG, et al. Race and ethnicity differences in reporting of
depressive symptoms. Adm Policy Ment Health. 2007;34(6):519–529.
71. Williams JW Jr, Mulrow CD, Kroenke K, et al. Case-finding for depression in primary
care: a randomized trial. Am J Med. 1999;106:36–43.
72. Simon GE, Von Korff M, Piccinelli M, et al. An international study of the relation
between somatic symptoms and depression. N Engl J Med. 1999;341:1329–1335.
73. Davenport S, Goldberg D, Millar T. How psychiatric disorders are missed during
medical consultations. Lancet. 1987;330(8556):439–441.
74. O’Connor DW, Rosewarne R, Bruce A. Depression in primary care. 1: Elderly
patients’ disclosure of depressive symptoms to their doctors. Int Psychogeriatr.
2001;13(3):359–365.
75. Bushnell J, McLeod D, Dowell A, et al. Do patients want to disclose psychological
problems to GPs? Fam Pract. 2005;22(6):631–637.
76. Verhaak PFM, Bensing JM, Van den Brink-Muinen A. GP mental health care in 10
European countries: patients’ demands and GPs’ responses. Eur J Psychiatry.
2007;21(1):7–16.
77. Cape J, McCulloch Y. Patients’ reasons for not presenting emotional problems in
general practice consultations. Br J Gen Pract. 1999;49(448):875–879.
78. Del Piccolo L, Saltini A, Zimmermann C. Which patients talk about stressful life
events and social problems to the general practitioner? Psychol Med.
1998;28(6):1289–1299.
79. Pollock K. Maintaining face in the presentation of depression: constraining the
therapeutic potential of the consultation. Health (London). 2007;11(2):163–180.
80. Zandbelt LC, Smets EMA, Oort FJ, et al. Determinants of physicians’ patient-
centred behaviour in the medical specialist encounter. Soc Sci Med.
2006;63(4):899–910.
81. Del Piccolo L, Mazzi M, Saltini A, et al. Inter- and intra-individual variations in
physicians’ verbal behaviour during primary care consultations. Soc Sci Med.
2002;55(10):1871–1885.
82. Epstein RM, Hadee T, Carroll J, et al. “Could this be something serious?”
Reassurance, uncertainty, and empathy in response to patients’ expressions of
worry. J Gen Intern Med. 2007;22(12):1731–1739.
83. Deveugele M, Derese A, De Bacquer D, et al. Is the communicative behavior of GPs
during the consultation related to the diagnosis? A cross-sectional study in six
European countries. Patient Educ Couns. 2004;54(3):283–289.
84. Deveugele M, Derese A, De Maeseneer J. Is GP-patient communication related to
their perceptions of illness severity, coping and social support? Soc Sci Med.
2002;55(7):1245–1253.
85. Feldman MD, Franks P, Epstein RM, et al. Do patient requests for antidepressants
enhance or hinder physicians’ evaluation of depression? A randomized controlled
trial. Med Care. 2006;44(12):1107–1113.
86. Watts SC, Bhutani GE, Stout IH, et al. Mental health in older adult recipients of
primary care services: is depression the key issue? Identification, treatment and the
general practitioner. Int J Geriatr Psychiatry. 2002;17:427–437.
87. Saltini A, Mazzi MA, Del Piccolo L, et al. Decisional strategies for the attribution of
emotional distress in primary care. Psychol Med. 2004;34(4):729–739.
88. Maguire P. Improving the recognition of concerns and affective disorders in cancer
patients. Recent Advances in Clinical Psychiatry. 1992;7:15–30.
89. Goldberg DP, Jenkins L, Millar T, et al. The ability of trainee general
practitioners to identify psychological distress among their patients. Psychol
Med. 1993;23:185–193.
90. Tobin M, Hickie I, Urbanc A. Increasing general practitioner skills with patients with
serious mental illness. Aust Health Rev. 1997;20:55–67.
91. Zimmermann C, Del Piccolo L, Finset A. Cues and concerns by patients in medical
consultations: A literature review. Psychol Bull. 2007;133(3):438–463.
92. Deveugele M, Derese A, De Maeseneer J. Is GP-patient communication related to
their perceptions of illness severity, coping and social support? Soc Sci Med.
2002;55(7):1245–1253.
93. Deveugele M, Derese A, De Bacquer D, et al. Is the communicative behavior of GPs
during the consultation related to the diagnosis? A cross-sectional study in six
European countries. Patient Educ Couns. 2004;54(3):283–289.
94. Marks JN, Goldberg DP, Hillier VF. Determinants of the ability of general
practitioners to detect psychiatric illness. Psychol Med. 1979;9(2):337–353.
95. Badger LW, deGruy FV, Hartman MA, et al. Psychosocial interest, medical
interviews, and the recognition of depression. Arch Fam Med. 1994;3:899–907.
96. Carney PA, Eliassen MS, Wolford GL, et al. How physician communication
influences recognition of depression in primary care. J Fam Pract.
1999;48(12):958–964.
97. Rost K, Nutting P, Smith J, et al. The role of competing demands in the treatment
provided primary care patients with major depression. Arch Fam Med.
2000;9:150–154.
98. Dowrick C, Gask L, Perry R, et al. Do general practitioners’ attitudes towards
depression predict their clinical behaviour? Psychol Med. 2000;30:413–419.
99. Saltini A, Mazzi MA, Del Piccolo L, et al. Decisional strategies for the attribution of
emotional distress in primary care. Psychol Med. 2004;34(4):729–739.
100. Levinson W, Roter D. Physicians’ psychosocial beliefs correlate with their patient
communication skills. J Gen Intern Med. 1995;10(7):375–379.
101. A report of the Joint Consultative Committee. Primary care psychiatry—the last
frontier. Canberra: Royal Australian College of General Practitioners and Royal
Australian and New Zealand College of Psychiatrists, 1997.
102. Fischer LR, Wei F, Solberg LI, et al. Treatment of elderly and other adult patients for
depression in primary care. J Am Geriatr Soc. 2003;51(11):1554–1562.
103. Tai-Seale M, Bramson R, Drukker D, et al. Understanding primary care physicians’
propensity to assess elderly patients for depression using interaction and survey data.
Med Care. 2005;43(12):1217–1224.
104. Tai-Seale M, McGuire T, Colenda C, et al. Two-minute mental health care for elderly
patients: Inside primary care visits. J Am Geriatr Soc. 2007;55(12):1903–1911.
105. Adelman RD, Greene MG, Friedmann E, et al. Discussion of depression in follow-up
medical visits with older patients. J Am Geriatr Soc. 2008;56(1):16–22.
106. Cape J. Patient-rated therapeutic relationship and outcome in general practitioner
treatment of psychological problems. Br J Clin Psychol. 2000;39(4):383–395.
107. Goldberg D, Jenkins L, Millar T, et al. The ability of trainee general practitioners to
identify psychological distress among their patients. Psychol Med. 1993;23:185–193.
108. Ishikawa H, Takayama T, Yamazaki Y, et al. The interaction between physician and
patient communication behaviors in Japanese cancer consultations and the influence
of personal and consultation characteristics. Patient Educ Couns.
2002;46(4):277–285.
109. Del Piccolo L, Saltini A, Zimmermann C, et al. Differences in verbal behaviours of
patients with and without emotional distress during primary care consultations.
Psychol Med. 2000;30(3):629–643.
110. Nuyen J, Volkers AC, Verhaak PFM, et al. Accuracy of diagnosing depression in
primary care: the impact of chronic somatic and psychiatric co-morbidity. Psychol
Med. 2005;35:1185–1195.
111. Verhaak PFM, Schellevis FG, Nuijen J, et al. Patients with a psychiatric disorder in
general practice: determinants of general practitioners’ psychological diagnosis. Gen
Hosp Psychiatry. 2006;28:125–132.
112. Rost K, Zhang M, Fortney J, et al. Persistently poor outcomes of undetected major
depression in primary care. Gen Hosp Psychiatry. 1998;20:12–20.
113. Kessler D, Bennewith O, Lewis G, et al. Detection of depression and anxiety in
primary care: follow-up study. BMJ. 2002;325:1016–1017.
114. Jackson JL, Passamonti M, Kroenke K. Outcome and impact of mental disorders in
primary care at 5 years. Psychosom Med. 2007;69(3):270–276.
115. Rost K, Smith R, Matthews DB, et al. The deliberate misdiagnosis of major depression
in primary care. Arch Fam Med. 1994;3(4):333–337.
116. Hahn SR. Physical symptoms and physician-experienced difficulty in the physician-
patient relationship. Ann Intern Med. 2001;134(9):897–904.
117. Carson AJ, Stone J, Warlow C, et al. Patients whom neurologists find difficult to help.
J Neurol Neurosurg Psychiatry. 2004;75(12):1776–1778.
118. Jackson JL, Kroenke K. Difficult patient encounters in the ambulatory clinic: clinical
predictors and outcomes. Arch Intern Med. 1999;159:1069–1075.
119. Jackson JL, Kroenke K, Chamberlin J. Effects of physician awareness of symptom-
related expectations and mental disorders—A controlled trial. Arch Fam Med.
1999;8(2):135–142.
120. Olfson M, Gilbert T, Weissman M, et al. Recognition of emotional distress in
physically healthy primary care patients who perceive poor physical health. Gen
Hosp Psychiatry. 1995;17:173–180.
121. Perez Stable E, Miranda J, Munoz RF. Depression in medical outpatients:
underrecognition and misdiagnosis. Arch Intern Med. 1990;150:1083–1088.
122. Schwenk TL, Coyne JC, Fechner-Bates S. Differences between detected and
undetected patients in primary care and depressed psychiatric patients. Gen Hosp
Psychiatry. 1996;18:407–415.
123. Hyde J, Evans J, Sharp D, et al. Deciding who gets treatment for depression and
anxiety: a study of consecutive GP attenders. Br J Gen Pract. 2005;55(520):846–853.
124. Ani C, Bazargan M, Hindman D, et al. Depression symptomatology and diagnosis:
discordance between patients and physicians in primary care settings. BMC Fam Pract.
2008;9:1.
125. Pfaff JJ, Almeida OP. A cross-sectional analysis of factors that influence the detection
of depression in older primary care patients. Aust N Z J Psychiatry.
2005;39(4):262–265.
126. O’Connor DW, Rosewarne R, Bruce A. Depression in primary care 2: General
practitioners’ recognition of major depression in elderly patients. Int Psychogeriatr.
2001;13(3):367–374.
127. Coyne JC, Schwenk TL, Fechner-Bates S. Nondetection of depression by primary care
physicians reconsidered. Gen Hosp Psychiatry. 1995;17:3–12.
128. Ormel J, Van den Brink W, Koeter MW, et al. Recognition, management and outcome
of psychological disorders in primary care: a naturalistic follow-up study. Psychol
Med. 1990;20:909–923.
129. Pini S, Berardi D, Rucci P, et al. Identification of psychiatric distress by primary care
physicians. Gen Hosp Psychiatry. 1997;19:411–418.
130. Pini S, Perkonigg A, Tansella M, et al. Prevalence and 12-month outcome of threshold
and sub-threshold mental disorders in primary care. J Affect Disord.
1999;56:37–48.
131. Seaburn DB, Morse D, McDaniel SH, et al. Physician responses to ambiguous patient
symptoms. J Gen Intern Med. 2005;20(6):525–530.
132. Menchetti M, Cevenini N, De Ronchi D, et al. Depression and frequent attendance in
elderly primary care patients. Gen Hosp Psychiatry. 2006;28(2):119–124.
133. van Schaik DJF, Klijn AFJ, van Hout HPJ, et al. Patients’ preferences in the treatment
of depressive disorder in primary care. Gen Hosp Psychiatry. 2004;26(3):184–189.
134. Gross R, Brammli-Greenberg S, Tabenkin H, et al. Primary care physicians’
discussion of emotional distress and patient satisfaction. Int J Psychiatry Med.
2007;37(3):331–345.
135. Simon GE. Evidence review: efficacy and effectiveness of antidepressant treatment in
primary care. Gen Hosp Psychiatry. 2002;24:213–224.
136. Gask L, McGrath G, Goldberg D, et al. Improving the psychiatric skills of established
general practitioners: evaluation of group teaching. Med Educ. 1987;21:362–368.
137. Gask L, Usherwood T, Thompson H, et al. Evaluation of a training package in the
assessment and management of depression in primary care. Med Educ.
1998;32:190–198.
138. Kaaya S, Goldberg D, Gask L. Management of somatic presentations of psychiatric
illness in general medical settings: evaluation of a new training course for general
practitioners. Med Educ. 1992;26:138–144.
139. Shapiro S, German PS, Skinner EA, et al. An experiment to change detection and
management of mental morbidity in primary care. Med Care. 1987;25:327–339.
140. Gallo JJ, Rabins PV. Depression without sadness: alternative presentations of
depression in late life. Am Fam Physician. 1999;60:820–826.
141. Gallo JJ, Rabins PV, Anthony JC. Sadness in older persons: 13-year follow-up of a
community sample in Baltimore, Maryland. Psychol Med. 1999;29:341–350.
142. Cooper LA, Brown C, Vu HT, et al. Primary care patients’ opinions regarding the
importance of various aspects of care for depression. Gen Hosp Psychiatry.
2000;22(3):163–173.
4
HOW CAN EXISTING MOOD SCALES BE IMPROVED? HOW TO TEST, REFINE, AND IMPROVE EXISTING SCALES

Adam B. Smith

1. Introduction
2. The Rasch Model and Other Item Response Models
3. Conclusion

Context
Many scales and tools have been developed on the basis of expert opinion.
Several methods are available for field testing tools to gauge their diagnostic
potential more accurately. Promising new methods, including item banks and
computer-adaptive tests, are under development to maximize the efficiency of
screening tools for depression.

1. Introduction
Various methods are available to diagnose psychiatric disorders (see
Chapter 2), but in the absence of a formal semi-structured psychiatric
assessment, which remains impractical, the most commonly used method for
assessing and screening levels of emotional distress is still the self-completed
questionnaire.1 There have been many hundreds of validation attempts, comparing
these severity scales against clinical judgment, semi-structured interviews,
DSM and ICD criteria, and, of course, each other. In primary care, community,
and specialist settings alike, their accuracy is almost universally imperfect,
and further refinement is required. When tested according to their ability to
enhance the detection and quality of care for depression, the efficacy of these
instruments remains modest.2 A recent review from Gilbody and colleagues3
found that screening and case-finding instruments were associated with a
modest increase in the recognition of depression by clinicians (relative risk
[RR] 1.27, 95% confidence interval [CI] 1.02 to 1.59) and only a borderline,
statistically nonsignificant effect on the overall management of depression
(RR 1.30, 95% CI 0.97 to 1.76). Seven studies provided data on the impact of
screening on depression outcomes, but there was no evidence of an effect
(standardized mean difference –0.02, 95% CI –0.25 to 0.20). No doubt part of
the problem lies with the organizational elements that may (or may not)
accompany screening, and part with clinicians’ willingness to treat a probable
case. However, some blame also lies with the instruments themselves, as most
were developed by expert opinion rather than by a scientific process.
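Because the review figures are quoted as point estimates with 95% confidence intervals, approximate two-sided p-values can be back-calculated to show what "borderline" means here. The sketch below assumes normal-theory intervals that are symmetric on the log scale for the relative risks and on the natural scale for the standardized mean difference; the resulting p-values are approximations derived from the quoted figures, not values reported by Gilbody and colleagues, and the helper names are illustrative.

```python
import math
from statistics import NormalDist

# 97.5th percentile of the standard normal, approximately 1.96
Z95 = NormalDist().inv_cdf(0.975)


def p_from_ratio_ci(estimate, lower, upper):
    """Approximate two-sided p-value for a ratio (e.g. RR) from its 95% CI.

    Assumes the interval was built symmetrically on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    z = math.log(estimate) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def p_from_difference_ci(estimate, lower, upper):
    """Approximate two-sided p-value for a difference (e.g. SMD) from its 95% CI."""
    se = (upper - lower) / (2 * Z95)
    z = estimate / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


results = {
    "recognition (RR)": p_from_ratio_ci(1.27, 1.02, 1.59),
    "management (RR)": p_from_ratio_ci(1.30, 0.97, 1.76),
    "outcome (SMD)": p_from_difference_ci(-0.02, -0.25, 0.20),
}
for label, p in results.items():
    print(f"{label}: p approx {p:.2f}")
```

Under these assumptions the recognition effect corresponds to p of about 0.03, the management effect to about 0.08 (its interval includes 1), and the outcome effect to about 0.86, consistent with "modest", "borderline", and "no evidence of an effect" respectively.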

Tool Development
The quantitative methods that enable evaluation of the diagnostic accuracy of
severity scales are discussed in Chapter 5. However, the evaluation of scales
should be viewed in a wider context of tool development (Table 4.1). In the
preclinical phase a tool is developed, often in the case of depression borrowing
from existing scales and usually by consensus rather than by scientific testing.
In phases I and II preliminary testing occurs, ideally in the clinically repre-
sentative sample with several competing comparison groups. These diagnostic
validity studies do not prove that the tool is useful, rather that it is potentially

Table 4.1. Stages in the Evaluation of the Screening Tool

Stage Purpose Description

Preclinical Tool Here the aim is to develop a screening method that
development is likely to help in the detection of the underlying
disorder, either in a specific setting or in all
settings. Issues of acceptability of the tool to both
patients and staff must be considered for
implementation to be successful.
Phase I screen Early diagnostic The aim is to evaluate the early design of the
validity testing screening method against a known (ideally
in a selected accurate) standard known as the criterion
sample and reference. In early testing the tool may be refined,
refinement of selecting the most useful aspects and deleting
tool redundant aspects to make the tool as efficient
(brief) as possible while retaining its value.
Phase II screen Diagnostic The aim is to assess the refined tool against a
validity in a criterion (gold standard) in a real-world sample
representative where the comparator subjects may represent
sample several competing conditions that may otherwise
cause difficulty regarding differential diagnosis.
4 HOW TO TEST, REFINE, AND IMPROVE EXISTING SCALES 85

Table 4.1. (Continued)

Stage | Purpose | Description
Phase III screen | Screening randomized controlled trial; clinicians using vs. not using a screening tool | This is an important step in which the tool is evaluated clinically in one group with access to the new method compared to a second group (ideally selected in a randomized fashion) who make assessments without the tool. This is akin to randomized controlled trials for drugs, and the outcome of interest is the number of additional cases correctly diagnosed or ruled out compared with assessment as usual.
Phase IV screen | Screening implementation studies using real-world outcomes | In this last step the screening tool/method is introduced clinically but monitored to discover the effect on important patient outcomes such as new identifications, new cases treated, and new cases entering remission. In short, the question here is how much the tool influences patient outcomes and how well the tool is accepted by clinicians (uptake).

After Mitchell AJ Psycho-Oncol 17: S141, 2008.

accurate. Given a sufficient sample, a tool may be refined by field testing. This
is the basis of the remainder of this chapter. Ultimately the value of a tool must
be proven in the clinical environment by comparison against either an estab-
lished tool or clinical skills alone. The acceptability and availability of the tool
will influence its uptake as much as its efficacy.
Given that a large number of imperfect but widely used instruments exist,
it follows that many could be refined by adding or removing items, changing
the weighting of scoring, or possibly changing the diagnostic algorithm.
There have been recent attempts to improve the efficacy of screening instruments
using modern psychometrics, most notably Rasch models. These models
are part of a family of measurement models developed for educational psy-
chology and increasingly employed in test development and refinement in
medicine. Very frequently it is found that conventional instruments may be
shortened in length without significantly decreasing screening efficacy.
Occasionally this shortening is dramatic, reducing an instrument by half or
by a quarter. Yet it should be acknowledged that the ability of these adapted
instruments to identify levels of a key outcome variable, such as "distress
warranting intervention," remains less than perfect. Combining items drawn
from a number of emotional distress instruments into an item bank may
improve screening efficacy while at the same time minimizing the number of
questions patients are required to answer and consequently reducing patient
burden. Item banks such as these and computer-adaptive tests, which tailor the
86 SCREENING FOR DEPRESSION IN CLINICAL PRACTICE

questions presented to each patient's previous responses, have already been
successfully developed for assessing emotional distress in psychiatric populations.4,5
This chapter describes the Rasch model and its application to mental health
research in more detail.

2. The Rasch Model and Other Item Response Models

In classical test theory, item difficulty (eg, the probability of subjects
responding "yes" or "no" to items or selecting a category from a number of
response options) is calculated from the number or proportion of responses
in the sample.6 The major drawback of this approach is that the estimate of
item difficulty is sample dependent: any given item will appear more
"endorsable" in a more able population (eg, a healthier population) than in
a less able one. The same limitation applies to estimating "person ability"
(eg, quality of life, physical health): any given estimate of an individual's
ability on a latent (ie, not directly observable) trait will depend on the
range of difficulties of the items presented.
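The sample dependence of classical item difficulty can be illustrated with a minimal sketch. It assumes, purely for illustration, that responses follow a dichotomous Rasch model, and it uses two hypothetical samples that differ only in their trait levels. The same item, with one fixed location on the trait, then yields different classical "difficulties" (proportions endorsing) in the two samples.

```python
import math


def endorse_probability(theta: float, b: float) -> float:
    """Probability of endorsing a dichotomous item (dichotomous Rasch model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_proportion_endorsing(thetas: list[float], b: float) -> float:
    """Classical item 'difficulty': expected proportion of the sample endorsing the item."""
    return sum(endorse_probability(t, b) for t in thetas) / len(thetas)


# Two hypothetical samples (trait levels in logits); higher = more of the trait.
lower_trait_sample = [-2.0, -1.5, -1.0, -0.5, 0.0]
higher_trait_sample = [0.0, 0.5, 1.0, 1.5, 2.0]

item_difficulty = 0.0  # one item with a single, fixed location on the trait
p_low = expected_proportion_endorsing(lower_trait_sample, item_difficulty)
p_high = expected_proportion_endorsing(higher_trait_sample, item_difficulty)
print(f"proportion endorsing, lower-trait sample:  {p_low:.2f}")   # 0.29
print(f"proportion endorsing, higher-trait sample: {p_high:.2f}")  # 0.71
```

The item has not changed; only the sample has. Provided the data fit the model, a Rasch calibration would locate this item at the same point (0 logits) in both samples.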
Rasch models7 overcome this problem of sample dependency by separating the
estimation of person ability from the estimation of item difficulty.8 The raw
scores are sufficient statistics for these parameters: a person's total score
across the items contains all the information needed to estimate that person's
ability, and an item's total score across the sample contains all the
information needed to estimate that item's difficulty.8 To achieve this
separation of item and person parameters, Rasch models rely on two
assumptions: unidimensionality and local independence.
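Sufficiency can be seen directly in a minimal sketch of maximum-likelihood estimation of one person's ability, assuming for simplicity that the item difficulties are already known. The likelihood equation sets the person's raw score equal to the sum of the model probabilities, so two different response patterns with the same raw score receive exactly the same ability estimate. The item difficulties and response patterns are hypothetical.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Dichotomous Rasch probability of endorsing an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def ability_mle(responses: list[int], difficulties: list[float],
                tol: float = 1e-10, max_iter: int = 100) -> float:
    """Maximum-likelihood ability (logits) for one person, item difficulties known.

    The likelihood equation is sum_i P_i(theta) = r, where r is the raw score,
    so the estimate depends on the responses only through r.
    """
    r = sum(responses)
    if r == 0 or r == len(responses):
        raise ValueError("extreme scores have no finite maximum-likelihood estimate")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_p(theta, b) for b in difficulties]
        information = sum(p * (1.0 - p) for p in probs)
        step = (r - sum(probs)) / information   # Newton-Raphson step
        theta += step
        if abs(step) < tol:
            break
    return theta


difficulties = [-1.5, -0.5, 0.5, 1.5]   # hypothetical item difficulties (logits)
pattern_a = [1, 1, 0, 0]                 # endorses the two easiest items
pattern_b = [0, 0, 1, 1]                 # endorses the two hardest items; same raw score
theta_a = ability_mle(pattern_a, difficulties)
theta_b = ability_mle(pattern_b, difficulties)
print(theta_a == theta_b)                # True: only the raw score matters
```

In practice the item difficulties are estimated from the same data (eg, by joint or conditional maximum likelihood); by the same argument, it is an item's total score across persons that enters the estimate of its difficulty. Extreme scores (all items endorsed or none) have no finite maximum-likelihood estimate, which is why the function rejects them.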
Rasch models assume that a single latent trait or construct underlies the
data being investigated (eg, mathematical knowledge, physical health). This
assumption is then tested using fit statistics and/or principal components
analysis of residuals. Local independence is related to unidimensionality and
refers to the assumption that the single latent trait accounts for all the
association among the items: the association between the variables in a
dataset should disappear once the latent trait estimated by the Rasch model
has been controlled for.9 It is possible to have unidimensionality but not
local independence; however, if local independence is demonstrated, then the
data set must also be unidimensional. If the assumptions have been met, then
the log-odds of a person endorsing an item can be expressed as the difference
between the individual's ability and the item's difficulty. Unlike in classical
test theory, person ability and item difficulty are expressed on a common
scale in log-odds units ("logits"), and, when the data fit the model, these
estimates are independent of both the particular items and the sample employed.
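In symbols (notation introduced here for illustration: θ_n is the ability of person n, b_i the difficulty of item i, X_ni the 0/1 response, and L the number of items), the log-odds statement and the local independence assumption for a dichotomous item can be written as:

```latex
\ln\frac{P_{ni}}{1 - P_{ni}} = \theta_n - b_i,
\qquad
P_{ni} = \Pr(X_{ni} = 1 \mid \theta_n, b_i)
       = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}

\Pr(X_{n1} = x_{n1}, \ldots, X_{nL} = x_{nL} \mid \theta_n)
  = \prod_{i=1}^{L} P_{ni}^{\,x_{ni}} \, (1 - P_{ni})^{1 - x_{ni}}
```

When θ_n = b_i the probability of endorsement is 0.5, and each additional logit of ability relative to the item multiplies the odds of endorsement by e (about 2.72).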

Assessing the Rasch Model

A fundamental criterion underlying these models is unidimensionality, that
is, a single latent trait should explain the variance in the data. In the absence of
unidimensionality, the constituent parts of an instrument cannot be summed to
create a summary index. Unidimensionality can be assessed through principal
components analysis of the residuals: once the Rasch dimension (the latent
trait) has been extracted, any additional components found in the residuals
can be investigated to determine whether they form true factors or merely
random noise.10 In addition, unidimensionality can be assessed using fit
statistics, and both item fit and person fit to the Rasch model can be
evaluated. Mean-square fit statistics have an expected value of 1.0 and can
range from 0 to infinity. Values well above the expected value can be
interpreted as noise or lack of fit between the items and the model, whereas
values well below it can be interpreted as item redundancy or overlap.
Identifying misfitting items allows items that add noise to the analysis
to be removed from a scale. The suggested limits for mean-square fit statistics
are 0.7 to 1.3, with items whose fit statistics exceed 1.3 identified
as misfitting.11,12
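The mean-square statistics referred to above can be computed from model residuals. The minimal sketch below assumes that person measures and item difficulties (in logits) have already been estimated, and uses the usual unweighted (outfit) and information-weighted (infit) mean squares for dichotomous items, with the 0.7 to 1.3 range suggested in the text. All data are hypothetical.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, difficulties):
    """Outfit and infit mean squares for each dichotomous item.

    responses[n][i] is the 0/1 response of person n to item i; thetas and
    difficulties are previously estimated Rasch measures in logits.
    """
    fits = []
    for i, b in enumerate(difficulties):
        z_squared = []          # squared standardized residuals (outfit)
        resid_squared = 0.0     # sum of squared raw residuals (infit numerator)
        variance = 0.0          # sum of model variances (infit denominator)
        for n, theta in enumerate(thetas):
            p = rasch_p(theta, b)
            w = p * (1.0 - p)
            resid = responses[n][i] - p
            z_squared.append(resid * resid / w)
            resid_squared += resid * resid
            variance += w
        outfit = sum(z_squared) / len(z_squared)
        infit = resid_squared / variance
        fits.append((outfit, infit))
    return fits


def classify(fits, lower=0.7, upper=1.3):
    """Label items using the 0.7-1.3 mean-square range."""
    labels = []
    for outfit, infit in fits:
        if max(outfit, infit) > upper:
            labels.append("misfit (noise)")
        elif min(outfit, infit) < lower:
            labels.append("overfit (possible redundancy)")
        else:
            labels.append("acceptable")
    return labels


# Hypothetical example: three persons, two items of equal difficulty.
thetas = [-1.0, 0.0, 1.0]
difficulties = [0.0, 0.0]
responses = [[0, 1],   # person at -1 logits
             [1, 0],   # person at  0 logits
             [1, 0]]   # person at +1 logits
fits = item_fit(responses, thetas, difficulties)
for (outfit, infit), label in zip(fits, classify(fits)):
    print(f"outfit={outfit:.2f} infit={infit:.2f} -> {label}")
```

The first item follows the expected pattern almost too perfectly (values below 0.7), while the second runs against the trait (values well above 1.3).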
A similar analysis may also be applied to the response categories and their
thresholds (ie, the points at which responses in adjacent categories are
equally probable). Within the Rasch model the average level of the latent
trait ("ability") should increase monotonically across categories. Disordering
of categories, where the average level of the latent trait does not increase in this
manner, may interfere with measurement precision. Therefore, disordered
response categories may be collapsed or items removed to improve fit to the
Rasch model.9
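Threshold disordering can be explored with a minimal sketch based on the Partial Credit Model,16 one of the polytomous Rasch models. It assumes the threshold values are already estimated, computes category probabilities at a given trait level, and checks whether the thresholds are ordered. The threshold values are hypothetical. The text defines disordering in terms of the average measure per category; the threshold-order check used here is a closely related, but not identical, diagnostic.

```python
import math


def pcm_probabilities(theta: float, thresholds: list[float]) -> list[float]:
    """Category probabilities (0..m) for one item under the Partial Credit Model.

    thresholds[k - 1] is the point where categories k - 1 and k are equally
    probable, so that P(k) / P(k - 1) = exp(theta - thresholds[k - 1]).
    """
    log_numerators = [0.0]
    for delta in thresholds:
        log_numerators.append(log_numerators[-1] + (theta - delta))
    top = max(log_numerators)   # subtract the maximum to avoid overflow
    numerators = [math.exp(v - top) for v in log_numerators]
    total = sum(numerators)
    return [v / total for v in numerators]


def thresholds_ordered(thresholds: list[float]) -> bool:
    """True if each threshold lies above the one before it."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


ordered = [-1.5, 0.0, 1.5]       # hypothetical four-category item
disordered = [-0.5, -1.0, 1.5]   # second threshold below the first

for name, taus in [("ordered", ordered), ("disordered", disordered)]:
    probs = pcm_probabilities(0.0, taus)
    print(name, thresholds_ordered(taus), [round(p, 2) for p in probs])
```

With the disordered thresholds, category 1 is never the most probable response at any trait level (it would need a trait level above the first threshold but below the second), which is the usual signal that adjacent categories might be collapsed.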
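Finally, an additional requirement for Rasch models is item invariance, that
is, item parameter estimates should be independent of the sample used.
Item invariance may be evaluated by testing for differential item functioning
(DIF) across defined subgroups (eg, gender, diagnosis); an item shows DIF when
respondents from different subgroups with the same level of the latent trait
have different probabilities of endorsing it.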
When items fit the model, an interval scale is produced on which differences
between adjacent scores are equally spaced. This has important
implications for measurement, since it allows meaningful comparisons of
equal-sized changes in scores anywhere along the latent trait.13 Recent
work has suggested that a change of around 0.5 logits may represent a clinically
meaningful difference.14

Features of the Rasch Model

The Rasch model is more accurately described as a family of models.
Rasch's original dichotomous model7 has been extended to incorporate
polytomous data, that is, data from questionnaires with multiple (more than
two) response options. Popular models within health research are the Rating
Scale Model15 and the Partial Credit Model.16
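For reference, the Partial Credit Model can be written as follows (notation introduced here for illustration: item i has response categories 0 to m_i and thresholds δ_i1 to δ_im_i, and the empty sum for k = 0 is taken as zero). The Rating Scale Model is the special case in which all items share the same set of thresholds τ_j, shifted by each item's location b_i.

```latex
\Pr(X_{ni} = k \mid \theta_n) =
  \frac{\exp \sum_{j=1}^{k} (\theta_n - \delta_{ij})}
       {\sum_{h=0}^{m_i} \exp \sum_{j=1}^{h} (\theta_n - \delta_{ij})},
  \qquad k = 0, 1, \ldots, m_i

\text{Rating Scale Model:}\quad \delta_{ij} = b_i + \tau_j
```

With m_i = 1 both models reduce to the dichotomous Rasch model.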
In the Rasch model the estimates of person ability (or person measure) and
item location or difficulty lie along the same continuum (eg, depression). For
instance, Figure 4.1 shows a "person-item map" from an
item bank developed for assessing emotional distress in cancer patients.17 The
left side of the map represents the distribution of person measures along the
continuum and the right side describes the location of the items.
As discussed above, the Rasch model describes a probabilistic relationship
between a person's measure and the item location. For instance, from
Figure 4.1, the Rasch model allows us to state that a patient with a level of
distress around –1 logits will be more likely to endorse items at a
corresponding level, such as General Health Questionnaire (GHQ) item 1
("concentration") and MHI item 1 ("nervousness"), as well as items below this
point, but would be less likely to endorse items further along the latent
trait, such as Patient Health Questionnaire (PHQ) item 9 ("suicidal ideation").
This analysis can be extended to the thresholds between adjacent response
categories (Fig. 4.2).
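The probabilistic statement above can be made concrete with a minimal sketch. It evaluates the dichotomous Rasch probability for a person at about –1 logits against hypothetical item locations; these locations are illustrative and are not the calibrated values shown in Figure 4.1. Polytomous items would instead be handled through their category thresholds, as in Figure 4.2.

```python
import math


def endorsement_probability(person_measure: float, item_location: float) -> float:
    """Dichotomous Rasch probability that a person endorses an item (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(person_measure - item_location)))


person = -1.0   # a patient with a distress level of about -1 logits
# Hypothetical item locations for illustration only (not the Figure 4.1 values).
items = {
    "item located at -2 logits": -2.0,
    "item located at -1 logits": -1.0,
    "item located at +2 logits": 2.0,
}
for label, location in items.items():
    print(f"{label}: P(endorse) = {endorsement_probability(person, location):.2f}")
```

The item at the person's own level is endorsed with probability 0.5, the easier item more often, and the item three logits higher only rarely. The same function shows what a 0.5-logit change means: for an item located at the person's level, the endorsement probability rises from 0.50 to about 0.62.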
An additional important feature of Rasch models is that they can
equate different questionnaires completed by different subgroups of patients,
provided that all patients have completed a common subset of items.
This process then enables a range of items measuring the same latent trait to be
collated to form an item bank. The development of an item bank may help
improve static questionnaires by including fewer and more relevant questions,
which could cover a broader and more representative spectrum of the latent
trait (for assessment) or may be more focused on discrete areas of clinical
interest, such as clinical thresholds (for screening). It also paves the way for the
development of computer-adaptive testing,18 creating programs that tailor
questions to individual patients based on their previous responses, allowing
an accurate assessment of the patient (eg, level of psychological distress) with
fewer questions.
Taken together, Rasch models offer a number of advantages, including
improving existing measures, reducing the number of items in question-
naires, and allowing the development of item banks and computer-adap-
tive tests.

Application of the Rasch Model to Mental Health Measures

In traditional test theory, questionnaires are often designed and validated using
techniques such as factor analysis. In addition to the sample dependence of
these approaches as described above, rating scales produce ordinal data that do
not meet the assumptions behind factor analyses, potentially leading to
misinterpretation of results.19 Furthermore, these ordinal scales are often
summed to produce total scores that are assumed to meet the criteria of interval
scales; frequently these assumptions are not tested.13

Figure 4.1. Item-Person Map for Item Bank. (Person measures on the left and item locations on the right, plotted on a common logit scale running from about +4 to –5.)

Figure 4.2. Rasch-Thurstone Thresholds for Item Bank. (Persons and item response-category thresholds, shown as 50% cumulative probabilities, on a common logit scale ordered from "more" at the top to "less" at the bottom.)
A number of studies have recently described the application of Rasch
models to mental health instruments to overcome the shortcomings of tradi-
tional test theory and design.
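The problem with treating summed ordinal scores as interval measures can be illustrated with a minimal sketch. Assuming ten hypothetical dichotomous items whose Rasch difficulties are evenly spread, it converts each raw score to the logit measure whose expected raw score equals it, then prints the distance between successive raw scores. The one-point steps are not equal on the latent trait: they are widest near the extremes of the scale and narrowest in the middle.

```python
import math


def expected_raw_score(theta: float, difficulties: list[float]) -> float:
    """Test characteristic curve: expected raw score at a given logit measure."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)


def measure_for_raw_score(r: int, difficulties: list[float],
                          lo: float = -10.0, hi: float = 10.0, iters: int = 100) -> float:
    """Logit measure whose expected raw score equals r, found by bisection.

    Bisection works because the expected raw score increases monotonically
    with theta. Extreme scores (0 or the maximum) have no finite solution.
    """
    if not 0 < r < len(difficulties):
        raise ValueError("raw score must lie strictly between 0 and the maximum")
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if expected_raw_score(mid, difficulties) < r:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Ten hypothetical dichotomous items spread evenly from -2.25 to +2.25 logits.
difficulties = [-2.25 + 0.5 * i for i in range(10)]
measures = [measure_for_raw_score(r, difficulties) for r in range(1, 10)]
for r, (m0, m1) in enumerate(zip(measures, measures[1:]), start=1):
    print(f"raw {r} -> {r + 1}: {m1 - m0:.2f} logits")
```

A one-point gain near the floor or ceiling of the scale therefore represents more change on the latent trait than a one-point gain in the middle, which summed scores conceal.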

Unidimensionality, Item Reduction, and Differential Item Functioning

The Rasch model has been applied to a number of mental health instruments,
including the Beck Depression Inventory (BDI),20 the Zung Self-Rating
Depression Scale,21 the Geriatric Depression Scale (GDS),22 and the
Symptom Checklist (SCL-90 and SCL-90R)23 (see Table 4.2). The application
of the model to four of the most commonly used mental health instruments,
namely the Center for Epidemiologic Studies Depression Scale (CES-D),24 the
Hospital Anxiety and Depression Scale (HADS),25 the Hamilton Depression
Scale (HAM-D),26 and the Edinburgh Postnatal Depression Scale (EPDS),27 is
discussed in this section.
These four instruments have been well validated using traditional test theory
involving reliability and validity studies and factor analyses, yet despite this

Table 4.2. Examples of Rasch-Refined Mood Scales

Scale | Original Length | Rasch-Derived Length | Unidimensionality Shown | Reference
CES-D | 20 items | 13 items | Yes | Covic et al. (2007)29
HADS | 14 items | 11 items | Yes | Smith et al. (2006)31
EPDS | 10 items | 8 items | Yes | Pallant et al. (2006)33
Hamilton | 17 items | 6 items | No | Licht et al. (2005)35
Beck | 21 items | Not changed | No | Bouman & Kok (1987)20
Zung SDS | 20 items | Not changed | Yes | Hong & Min (2007)21
GDS | 15 items | 11 items | Yes | Tang et al. (2005)22
SCL-90 | 92 items | 63 items | Yes (for non-psychotic items) | Olsen et al. (2004)23
SCL-25 | 25 items | 8 items | Yes | Fink et al. (1995)47

there has been little previous evidence to support the assumption that these
questionnaires are unidimensional.
Stansbury and colleagues28 applied the Rasch model to the full CES-D
completed by a large community sample of elderly participants. Four of the
positively worded items were identified as misfitting and removed. The
remaining 16 items formed a unidimensional structure that was verified
using confirmatory factor analysis. Additionally, the removal of the misfitting
items also reduced the floor effects that had been observed in this sample.
Covic and colleagues29 demonstrated, using a sample of patients with rheu-
matoid arthritis, that three additional items (appetite, restlessness, sadness)
misfitted the Rasch model. The resulting 13-item CES-D demonstrated good
internal validity. In contrast to these two studies, Pickard and colleagues30
found no misfit for the CES-D in primary care patients, although in stroke
patients misfit was reported for three items that were not positively worded.
Additionally, four items from this instrument demonstrated differential item
functioning when comparing the two patient samples.
Rasch studies of the HADS with cancer patients31 and patients attending an
outpatient musculoskeletal rehabilitation program32 showed that the full
instrument is broadly unidimensional, although the individual subscales con-
tained items that misfitted. Similarly, an analysis of the Edinburgh Postnatal
Depression Scale has recommended that the original 10-item form be reduced
to eight items to produce a unidimensional instrument.33
In addition to identifying misfit, Rasch models have also been used to
develop short forms of these standard instruments. For instance, a 10-item
version of the CES-D has been validated using both Rasch and traditional test
methods,34 as has the 6-item version of the HAM-D.35 Licht and
colleagues35 compared the unidimensionality of the Bech-Rafaelsen Melancholia
Scale (MES) and the 17-item HAM-D in 1,629 patients with a major depressive
episode using Mokken and Rasch analysis. Unidimensionality of the
HAM-D-17 could not be confirmed; however, the HAM-D-6 and the MES
did fulfill criteria for unidimensionality.
There have also been recent attempts to apply Rasch models to a standardized
psychiatric interview schedule for major depression.36 A modified SCID
interview was used on a large sample of twins from the Virginia Twin Registry
(n = 2,163). Participants were asked to report whether they had experienced
any of the 14 disaggregated DSM-III-R criteria for major depression. The
Rasch model was used to derive liability thresholds (the point at which there
is a 50% probability of a given diagnostic category being endorsed) for the
10 symptom criteria for major depression. The results demonstrated uneven
spacing between liability thresholds: "depressed mood" was easiest to
endorse (–1.8 logits), and "suicidal ideation," at the other end of the latent trait
(2.5 logits), was hardest to endorse. This suggests a tentative link between the
latent trait as measured by the Rasch model and that derived from a formal
psychiatric interview.
Other, more general distress and psychopathology tools have also been
tested using Rasch models. For example, the 90-item SCL and the 25-item
SCL-25 have been shortened to 63 and 8 items, respectively (see Table 4.2).23

Clinical Testing and Clinical Impact

Ultimately any tool (original or adapted) should be field tested, even if the
refinement is minor. In a robust test of a newly developed tool (let's use the
hypothetical example of a CES-D-Revised), the new scale, the original scale,
and unassisted clinical diagnosis would each be compared against a robust
gold standard such as the SCID for DSM-IV major depression. Any additional
detection beyond that achieved by the unassisted clinician would suggest that
the scale is clinically useful; any additional detection beyond that achieved by
the original scale would suggest that the new scale is an improvement. If the
new version is shorter, both accuracy and efficiency may be enhanced, and
hence acceptability increased. If the new version is longer, accuracy may be
improved at the expense of efficiency, and clinical judgment is then required
to decide which version is more useful. Sadly, very few well-designed
validation studies of this kind exist.
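The comparison just described reduces to sensitivity, specificity, and additional cases detected against the gold standard. A minimal sketch with entirely hypothetical data for ten patients (1 = depressed according to the gold standard, or flagged by the method) shows the bookkeeping for the unassisted clinician, the original scale, and a revised scale; a real validation study would need far larger samples and confidence intervals.

```python
def sensitivity_specificity(calls: list[int], gold: list[int]) -> tuple[float, float]:
    """Sensitivity and specificity of binary calls against a gold standard."""
    tp = sum(1 for c, g in zip(calls, gold) if c and g)
    fn = sum(1 for c, g in zip(calls, gold) if not c and g)
    tn = sum(1 for c, g in zip(calls, gold) if not c and not g)
    fp = sum(1 for c, g in zip(calls, gold) if c and not g)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data for ten patients (1 = depressed / flagged, 0 = not).
gold      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # eg, structured interview
clinician = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]   # unassisted clinical diagnosis
original  = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]   # original scale at its cut-off
revised   = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]   # revised scale at its cut-off

for name, calls in [("clinician", clinician), ("original", original), ("revised", revised)]:
    sens, spec = sensitivity_specificity(calls, gold)
    # True cases found by this method that the unassisted clinician missed.
    extra = sum(1 for c, k, g in zip(calls, clinician, gold) if g and c and not k)
    print(f"{name:9s} sensitivity={sens:.2f} specificity={spec:.2f} extra cases={extra}")
```

In this toy example both scales detect cases the clinician missed, and the revised scale detects one more than the original at the same specificity, which is the pattern that would suggest an improvement.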
A few studies have employed Rasch models to assess the impact of misfit,
and of the subsequent removal of misfitting items, on the diagnostic accuracy
of mental health measures. Smith and colleagues31 applied the Rasch model to
both the full 14-item HADS25 and the 7-item anxiety and depression subscales.
In addition to completing the HADS, a subset of cancer patients had also
received a psychiatric assessment in the form of either the Present State
Examination (PSE)37 or the Schedules for Clinical Assessment in
Neuropsychiatry (SCAN; World Health Organization).38 Three items from
the full HADS were identified as misfitting the Rasch model, in addition to
one misfitting item from the subscales. Removal of these items had little or no
impact on the sensitivity and specificity of the scales or on the area under the
receiver operating characteristic curve (AUC).
Similarly, Tang and colleagues22 identified four items from the GDS that
did not fit the Rasch model. The GDS data were derived from a community
sample of patients with pneumoconiosis who had also received a structured
psychiatric interview with the aim of diagnosing depressive disorders. Once
again, the results demonstrated that removing the misfitting items did not affect
the AUC or sensitivity and specificity.
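The AUC used in these studies can be computed without plotting a curve, as the probability that a randomly chosen case scores higher than a randomly chosen non-case (ties counting one half), which is equivalent to the Mann-Whitney statistic. The minimal sketch below compares hypothetical total scores from a full scale and a shortened version for the same ten patients.

```python
def auc(scores: list[float], gold: list[int]) -> float:
    """AUC as P(case score > non-case score), with ties counted as one half."""
    cases = [s for s, g in zip(scores, gold) if g]
    non_cases = [s for s, g in zip(scores, gold) if not g]
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Hypothetical total scores for the same ten patients (the first four are cases).
gold        = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
full_scale  = [18, 15, 12, 9, 10, 7, 6, 5, 3, 2]
short_scale = [14, 12, 9, 7, 8, 5, 5, 4, 2, 1]
print(f"AUC full scale:  {auc(full_scale, gold):.2f}")
print(f"AUC short scale: {auc(short_scale, gold):.2f}")
```

Here both versions rank the patients almost identically, so shortening the scale leaves the AUC unchanged, the kind of result reported in the studies above.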

Item Banking and Computer-Adaptive Testing

The ability of the Rasch model to derive item locations for different instru-
ments and to allow evaluations of whether these items form a unidimensional
construct creates the opportunity to generate item banks. Various methods
exist for item banking39; however, a frequently employed approach is common
item equating,10 in which patients complete a core set of questionnaires.
Additional items or instruments may be added by anchoring the locations of
the core set of items. Typically in this scenario, each patient completes the
core set of items along with some of the additional items. The benefit of item
banking is that patients do not have to complete all the questionnaires, which
reduces not only patient burden but also the cost of developing the item bank.
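A minimal sketch of the linking step, assuming that item difficulties in logits are available both from the existing bank and from a new calibration that shares some core (anchor) items. It uses simple mean-shift linking, in which the new calibration is moved by the average difference on the anchor items; this is one common approach, whereas anchored estimation, as described above, instead fixes the core items at their bank values during estimation. All item names and values are hypothetical.

```python
def link_to_bank(bank_anchor, new_anchor, new_items):
    """Mean-shift linking of a new calibration onto an item bank's logit scale.

    bank_anchor: bank difficulties of the core (anchor) items.
    new_anchor: difficulties of the same core items, in the same order, from
        the new calibration.
    new_items: dict of additional item name -> difficulty from the new calibration.
    """
    shift = (sum(bank_anchor) - sum(new_anchor)) / len(bank_anchor)
    linked = {name: b + shift for name, b in new_items.items()}
    # Residual displacement of each anchor item after linking; large values
    # would suggest an unstable anchor item.
    displacement = [round(nb + shift - bb, 3) for bb, nb in zip(bank_anchor, new_anchor)]
    return shift, linked, displacement


bank_anchor = [-1.0, 0.0, 1.2]    # hypothetical core items as located in the bank
new_anchor = [-0.6, 0.4, 1.6]     # the same items in a new calibration
new_items = {"new_item_a": 0.9, "new_item_b": -0.2}
shift, linked, displacement = link_to_bank(bank_anchor, new_anchor, new_items)
print(round(shift, 2), {k: round(v, 2) for k, v in linked.items()}, displacement)
```

Once linked, the additional items sit on the same logit scale as the existing bank and can be used interchangeably with it.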
After item banks are developed, two further steps can be taken: (1) the
development of multiple fixed short forms derived from the item bank (see
Ware and associates40 for an example of the development of a short form of the
headache impact scale) and (2) the development of computer-adaptive tests.
Computer-adaptive tests (eg, Wainer18) tailor the items presented to the patient
on the basis of his or her previous responses. They generally present an initial
item aimed at the average level of the latent trait in the target population
(eg, average level of depression); each subsequent item is easier or harder to
endorse, depending on the responses given so far. At each step the patient's
level on the latent trait (eg, depression) is re-estimated, and testing stops when
a predetermined number of questions has been presented or the standard error
of the estimate falls below a predetermined level.
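The adaptive loop just described can be sketched in a few dozen lines. The sketch assumes a bank of dichotomous Rasch items with known difficulties, estimates the trait by expected a posteriori (EAP) estimation with a standard-normal prior on a grid, selects the unused item with maximum information at the current estimate (for Rasch items, the one whose difficulty is closest to it), and stops when the posterior standard deviation, used here as the standard error, falls below a threshold or a maximum number of items is reached. The bank, the stopping values, and the simulated respondent are all hypothetical; operational systems such as those cited in this chapter differ in many details.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def run_cat(bank, answer, se_stop=0.5, max_items=20):
    """Minimal computer-adaptive test over dichotomous Rasch items.

    bank: dict of item name -> difficulty in logits (assumed already calibrated).
    answer: callable taking an item name and returning 0 or 1 (the respondent).
    Returns (EAP estimate, posterior SD used as the standard error, items given).
    """
    grid = [-4.0 + 0.05 * k for k in range(161)]         # trait grid, -4 to +4 logits
    posterior = [math.exp(-0.5 * t * t) for t in grid]   # standard-normal prior
    theta, se = 0.0, float("inf")                        # start at the population mean
    administered = []
    while len(administered) < min(max_items, len(bank)) and se > se_stop:
        # Maximum-information item: for Rasch items, the difficulty closest to theta.
        item = min((name for name in bank if name not in administered),
                   key=lambda name: abs(bank[name] - theta))
        x = answer(item)
        administered.append(item)
        b = bank[item]
        # Multiply the posterior by the likelihood of the observed response.
        posterior = [w * (rasch_p(t, b) if x else 1.0 - rasch_p(t, b))
                     for w, t in zip(posterior, grid)]
        total = sum(posterior)
        theta = sum(w * t for w, t in zip(posterior, grid)) / total
        se = math.sqrt(sum(w * (t - theta) ** 2 for w, t in zip(posterior, grid)) / total)
    return theta, se, administered


# Hypothetical bank of 25 items from -3 to +3 logits and a simulated respondent
# who endorses every item easier than +1 logit.
bank = {f"item{k:02d}": -3.0 + 0.25 * k for k in range(25)}
theta, se, given = run_cat(bank, lambda name: 1 if bank[name] < 1.0 else 0, se_stop=0.6)
print(f"estimate={theta:.2f} SE={se:.2f} after {len(given)} items: {given}")
```

Because each item is chosen near the current estimate, the questions quickly converge on the region around the respondent's level, which is why a CAT typically needs far fewer items than the full bank.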
Computer-adaptive test systems can estimate the latent trait with greater
precision for a given number of items and may be designed to allow a broad
assessment of, for instance, depression, or specifically designed to present
more questions around diagnostic categories. Another benefit of these systems
is that patients need to answer fewer questions for the same or a greater level
of accuracy.
The development of item banks and computer-adaptive tests has been
progressing apace in fields such as physical health,41 although in mental
health this area is still in its infancy. However, recently an item bank has
been developed for assessing psychological distress in cancer patients.17
A large sample of cancer patients completed the HADS25 in addition to a
variety of other instruments, including the GHQ-12,42 BDI,43 PHQ-9,44 and
Spielberger State-Trait Anxiety Inventory (STAI).45 Common item equating
using the HADS as the anchor was used to create the item bank. Once misfitting
items had been removed, the initial 83 items were reduced to a unidimensional
item bank of 63 items with good internal reliability (Cronbach's alpha = 0.84).
An analysis of the item-person map (see Fig. 4.1) demonstrated good face
validity: questions concerning suicidal ideation were hardest to endorse,
whereas questions concerning fatigue and energy were easiest to endorse.
Further analysis of the item-person map revealed that items tended to be
targeted at moderate to high levels of distress, indicating a floor effect at low
levels of distress that additional items may be needed to address.
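Cronbach's alpha, reported above as a measure of internal reliability, compares the sum of the item variances with the variance of the total score. A minimal sketch follows, using hypothetical 0 to 3 item scores.

```python
def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Cronbach's alpha; item_scores[n][i] is person n's score on item i."""
    k = len(item_scores[0])

    def variance(values):
        # Sample variance (n - 1 denominator); alpha is unchanged if n is used
        # instead, as long as the same denominator is used throughout.
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    item_variances = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical 0-3 responses from five patients to four items.
scores = [[0, 1, 0, 1],
          [1, 1, 2, 1],
          [2, 2, 2, 3],
          [3, 2, 3, 3],
          [1, 0, 1, 1]]
print(f"alpha = {cronbach_alpha(scores):.2f}")
```

Alpha approaches 1 when items vary together (the total-score variance is much larger than the sum of the item variances) and falls toward 0 when they are unrelated. Note that a high alpha alone does not establish unidimensionality, which is why the Rasch fit analyses above are reported alongside it.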

Computer-adaptive tests have already been developed for use with psychiatric
populations to identify emotional distress.4,5 Fliege and associates4
have developed a system for measuring depression ("D-CAT") in a psychosomatic
patient sample. Patients completed 11 mental health questionnaires
that were subsequently rated by expert reviewers as indicative of depressive
symptomatology. A total of 320 items from the original questionnaires were
reduced to an item bank of 64 items. A simulation study using
patients’ actual responses to the questions demonstrated that levels of
depression could be estimated reliably from six items. Scores generated
from the D-CAT system fell within 2 standard deviations of the sample
mean and correlated well with the overall item bank and two standard
mental health measures (BDI and CES-D).
Finally, Gibbons and colleagues46 recently developed a computer-adaptive
test derived from the 626-item Mood and Anxiety Spectrum Scales (MASS).
This system was designed to identify anxiety and mood disorders in patients
attending outpatient clinics. The study demonstrated that the number of items
presented to patients could be reduced to 24 to 30 items without a loss of
information, representing a substantial reduction in both administration time
and patient burden.

3. Conclusion
Despite the intuitive appeal and ease of use of brief self-report instruments to
screen for depressive disorders, there remains a great deal of variability in the
efficacy of a number of commonly employed instruments. Many instruments
have been comprehensively validated by traditional test methods, but issues
still remain about unidimensionality, floor and ceiling effects, and instrument
performance across different groups of patients. Rasch models7 have the
potential to address these issues, generating instruments whose item
calibrations are independent of the sample and providing the basis for item
banks and computer-adaptive tests.
Although item banking is a relatively new area of development in health
measures, the U.S. National Institutes of Health has recently provided major
funding for the Patient-Reported Outcomes Measurement Information System
(PROMIS) initiative, one of whose goals is to produce computer-adaptive tests
for the clinical research community (http://nihroadmap.nih.gov/clinicalresearch/promis.asp).
The next step in the development of the item bank will be to develop
computer-adaptive testing systems. An important corollary to this will be to
continue mapping the item bank, in particular levels of emotional distress, both
to psychiatric diagnoses of clinical anxiety and major depression and to clinical
guidelines. This will not only provide a potentially more sensitive instrument for
assessing and screening for distress, but will also assist in tailoring the
management of distress and associated interventions to individual patients.

References
1. Wright AF. Should general practitioners be testing for depression? Br J Gen Pract.
1994;44(380):132–135.
2. Gilbody S, House AO, Sheldon TA. Screening and case finding instruments for
depression. Cochrane Database of Systematic Reviews. 2005, Issue 4.
3. Gilbody S, Sheldon T, House A. Screening and case-finding instruments for depression:
a meta-analysis. CMAJ. 2008;178:997–1003.
4. Fliege H, Becker J, Walter OB, et al. Development of a computer-adaptive test for
depression (D-CAT). Qual Life Res. 2005;14:2277–2291.
5. Walter OB, Becker J, Bjorner JB, et al. Development and evaluation of a computer
adaptive test for ‘Anxiety’ (Anxiety-CAT). Qual Life Res. 2007;16:S143–S155.
6. Suen HK. Principles of test theories. Hillsdale, NJ: Lawrence Erlbaum Associates,
1990.
7. Rasch G. Probabilistic models for some intelligence and attainment tests. Chicago: The
University of Chicago Press, 1960/1980.
8. Wright BD, Masters G. Rating scale analysis. Chicago: MESA Press, 1982.
9. Bond TG, Fox CM. Applying the Rasch model: fundamental measurement in the human
sciences. Mahwah, NJ: Lawrence Erlbaum Associates, 2001.
10. Linacre JM. A user’s guide to WINSTEPS/MINISTEPS Rasch-model computer
programs. 2007.
11. Lai JS, Cella D, Chang CH, et al. Item banking to improve, shorten and computerize
self-reported fatigue: an illustration of steps to create a core item bank from the
FACIT-Fatigue Scale. Qual Life Res. 2003;12(5):485–501.
12. Wright BD, Linacre JM, Gustafson J-E, et al. Reasonable mean-square fit values. Rasch
Measurement Transactions. 1994;8:370.
13. Stucki G, Daltroy L, Katz JN, et al. Interpretation of change scores in ordinal clinical
scales and health status measures: the whole may not equal the sum of the parts. J Clin
Epidemiol. 1996;49:711–717.
14. Lai JS, Eton DT. Clinically meaningful gaps. Rasch Measurement Transactions.
2002;15:850.
15. Andrich D. A rating formulation for ordered response categories. Psychometrika.
1978;43:561–573.
16. Masters GN. A Rasch model for partial credit scoring. Psychometrika.
1982;47:149–174.
17. Smith AB, Rush R, Velikova G, et al. The initial development of an item bank to assess
and screen for psychological distress in cancer patients. Psychooncology.
2007;16:724–732.
18. Wainer H. Computerized adaptive testing: a primer. Hillsdale, NJ: Lawrence Erlbaum
Associates, 1990.
19. Schumacker RE, Linacre JM. Factor analysis and Rasch. Rasch Measurement
Transactions. 1996;9:470.
20. Bouman TK, Kok AR. Homogeneity of Beck’s Depression Inventory (BDI):
applying Rasch analysis in conceptual exploration. Acta Psychiatr Scand.
1987;76(5):568–573.

21. Hong S, Min SY. Mixed Rasch modeling of the Self-Rating Depression Scale
incorporating latent class and Rasch rating scale models. Educ Psych Measure.
2007;67(2):280–299.
22. Tang WK, Wong E, Chiu HF, et al. The Geriatric Depression Scale should be shortened:
results of Rasch analysis. Int J Geriatr Psychiatry. 2005;20:783–789.
23. Olsen LR, Mortensen EL, Bech P. The SCL-90 and SCL-90R versions validated by
item response models in a Danish community sample. Acta Psychiatr Scand.
2004;110(3):225–229.
24. Radloff LS. The CES-D scale: A self-report depression scale for research in the general
population. Applied Psych Measure. 1977;384–401.
25. Zigmond AS, Snaith RP. The hospital anxiety and depression scale. Acta Psychiatr
Scand. 1983;67:361–370.
26. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatry.
1960;23:56–62.
27. Cox JL, Holden JM, Sagovsky R. Detection of postnatal depression. Development of
the 10-item Edinburgh Postnatal Depression Scale. Br J Psychiatry. 1987;150:782–786.
28. Stansbury JP, Ried LD, Velozo CA. Unidimensionality and bandwidth in the Center for
Epidemiologic Studies Depression (CES-D) Scale. J Pers Assess. 2006;86:10–22.
29. Covic T, Pallant JF, Conaghan PG, et al. A longitudinal evaluation of the Center for
Epidemiologic Studies-Depression scale (CES-D) in a rheumatoid arthritis population
using Rasch analysis. Health Qual Life Outcomes. 2007;5:41.
30. Pickard AS, Dalal MR, Bushnell DM. A comparison of depressive symptoms in stroke
and primary care: applying Rasch models to evaluate the Center for Epidemiologic
Studies-Depression scale. Value Health. 2006;9:59–64.
31. Smith AB, Wright EP, Rush R, et al. Rasch analysis of the dimensional structure of the
Hospital Anxiety and Depression Scale. Psychooncology. 2006;15:817–827.
32. Pallant JF, Tennant A. An introduction to the Rasch measurement model: an example
using the Hospital Anxiety and Depression Scale (HADS). Br J Clin Psychol.
2007;46:1–18.
33. Pallant JF, Miller RL, Tennant A. Evaluation of the Edinburgh Postnatal Depression
Scale using Rasch analysis. BMC Psychiatry. 2006;6:28.
34. Cole JC, Rabin AS, Smith TL, et al. Development and validation of a Rasch-derived
CES-D short form. Psychol Assess. 2004;16:360–372.
35. Licht RW, Qvitzau S, Allerup P, et al. Validation of the Bech-Rafaelsen Melancholia
Scale and the Hamilton Depression Scale in patients with major depression; is the total
score a valid measure of illness severity? Acta Psychiatr Scand. 2005;111:144–149.
36. Aggen SH, Neale MC, Kendler KS. DSM criteria for major depression: evaluating
symptom patterns using latent-trait item response models. Psychol Med.
2005;35:475–487.
37. Wing J, Cooper JE, Sartorius N. The description of psychiatric symptoms: an
introduction manual for the PSE and CATEGO System. Cambridge: Cambridge
University Press, 1974.
38. World Health Organization. Mental health: new understanding, new hope. Geneva,
Switzerland: WHO, 1993.
39. Wolfe EW. Equating and item banking with the Rasch model. J Applied Measure.
2000;1(4):409–434.
40. Ware JE Jr, Kosinski M, Bjorner JB, et al. Applications of computerized adaptive
testing (CAT) to the assessment of headache impact. Qual Life Res.
2003;12(8):935–952.

41. Rose M, Bjorner JB, Becker J, et al. Evaluation of a preliminary physical function item
bank supported the expected advantages of the Patient-Reported Outcomes
Measurement Information System (PROMIS). J Clin Epidemiol. 2008;61:17–33.
42. Goldberg DP, Hillier VF. A scaled version of the General Health Questionnaire.
Psychol Med. 1979;9:139–145.
43. Beck AT, Ward CH, Mendelson M, et al. An inventory for measuring depression. Arch
Gen Psychiatry. 1961;4:561–571.
44. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression
severity measure. J Gen Intern Med. 2001;16:606–613.
45. Spielberger CD. Manual for the State-Trait Anxiety Inventory (STAI). Palo Alto, CA:
Consulting Psychologists Press, 1983.
46. Gibbons RD, Weiss DJ, Kupfer DJ, et al. Using computerized adaptive testing to reduce
the burden of mental health assessment. Psychiatr Serv. 2008;59(4):361–368.
5
HOW DO WE KNOW WHEN A SCREENING
TEST IS CLINICALLY USEFUL?

Alex J. Mitchell

1. How Do Clinicians Make a Diagnosis?

2. Scientific Aspects of Diagnostic Accuracy
3. Clinical Aspects of Diagnostic Accuracy
4. Testing Screening via Implementation Studies
5. Conclusions

Context
There is no shortage of suggested methods to screen for depression, including
clinical interviews. Assuming these are applied to a group containing patients with
depression and patients without depression, how do we decide which are the
optimal methods? In addition, how can tests be compared and how can tests
be combined? This chapter discusses the methods used to compare and combine
scales and tools.

1. How Do Clinicians Make a Diagnosis?

The terms diagnosis and screening both refer to the application of an agreed
method to confirm those with a condition and to exclude those without the
condition (for discussion see Chapter 2). When attempting to separate
depressed versus non-depressed individuals there is always an overlap of
symptoms (or biological markers) (see Chapter 1, Fig. 1); therefore, no
currently available test can separate the two groups perfectly. Testing may be
focused on those at high risk of the condition (such as screening for depression
after myocardial infarction) or applied to a wider population (screening for
depression in all primary care patients). The former is a high-prevalence
setting, which favors the ability to confirm a condition, whereas the latter is a
low-prevalence setting, which favors the ability to refute a condition. It is often
forgotten that the clinical process of making a diagnosis is itself a form of
screening. Here the tool is the clinician’s clinical skill and the sample is all
patients seen by the clinician. If a clinician is attuned to the concept of
depression, has a high index of suspicion, and asks the right questions, then he
or she is likely to have high personal diagnostic accuracy. If the clinician lacks
confidence, experience, and training, he or she is less likely to make a correct
diagnosis (see Table 5.1 and Chapter 3). Some literature suggests that screening
tools for depression add value only in the latter situation.
A diagnostic test for depression is designed to help the clinician elicit and
weigh symptoms and signs to make a diagnosis. How, then, is this achieved,
and how does a screening test work in scientific terms?

Case Example
Consider the case illustrated in Textbox 5.1. A man who suffered a stroke
2 months previously now complains of five troubling symptoms. Assuming
these symptoms are elicited correctly, is he clinically depressed? Could the
somatic symptoms be features of stroke rather than depression (see Chapters 10
and 11)? Five symptoms may immediately sound sufficient for a diagnosis,
but not all symptoms qualify under DSM-IV or ICD-10. For example, loss of
drive is not a qualifying feature and therefore, under these guidelines, must be
ignored. This leaves four qualifying symptoms, only one of which is a DSM-IV
core symptom; four symptoms fall short of the five that DSM-IV requires for a
diagnosis of major depression. Under ICD-10, however, he does have two core
features (low mood and low energy) and two associated features (poor appetite
and disturbed sleep), which is sufficient only for a mild depressive episode.
Thus, clinicians who use a strict operational checklist approach may or may
not diagnose depression in this case, depending on the system they use. In fact,
research suggests that fewer than one in five psychiatrists would take this strict
operational approach, and fewer still use validated questionnaires such as the
Patient Health Questionnaire (PHQ)-9. Most trained psychiatrists rely on their
own clinical skills.

Table 5.1. Levels of Diagnostic Confidence

                                           Prior Experience & Training   No Prior Experience & Training
Use a checklist or screening tool          i. Trained, Assisted          ii. Untrained, Assisted
Do not use a checklist or screening tool   iii. Trained, Unassisted      iv. Untrained, Unassisted

Textbox 5.1. Case History: Post-Stroke Depression?

A previously well 58-year-old man who suffered a dominant hemisphere
stroke 2 months previously is referred to an outpatient psychiatry clinic. He
reports that he has had five symptoms (low mood, loss of drive, low energy,
poor appetite, and insomnia) for the past 3 weeks. He has no other symptoms
on detailed questioning.

Symptom                                ICD-10       DSM-IV
Persistent sadness or low mood         Yes (core)   Yes (core)
Loss of interests or pleasure          Yes (core)   Yes (core)
Fatigue or low energy                  Yes (core)   Yes
Disturbed sleep                        Yes          Yes
Poor concentration or indecisiveness   Yes          Yes
Low self-confidence                    Yes          No
Poor or increased appetite             Yes          Yes (same criterion as weight change)
Suicidal thoughts or acts              Yes          Yes
Agitation or slowing of movements      Yes          Yes
Guilt or self-blame                    Yes          Yes
Significant change in weight           No           Yes (same criterion as appetite change)

Similarly, in primary care, a survey of 2,500 Australian primary care
practitioners (PCPs) by Krupinski and Tiller (2001)1 found that 28% asked about
at least five of the nine standard DSM-IV symptoms. The two symptoms most
frequently asked about were sleep disturbance (cited by 86.8%) and loss of
appetite (cited by 55.6%). Only 0.2% of this sample said they would make a
diagnosis using a rating scale.
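The criterion counting in the case example can be made explicit. The Python sketch below is a simplified, hypothetical illustration, not a diagnostic tool: the symptom keys and function names are invented labels, the mapping follows the table in Textbox 5.1, and only the symptom-count rules are applied (DSM-IV: at least five criterion symptoms, at least one of them core; ICD-10: at least two core plus at least two other symptoms, the minimum for a mild episode). Duration, impairment, and exclusion criteria are ignored, and ICD-10 severity grading above "mild" is deliberately left out.

```python
# Simplified symptom-count rules only: duration, impairment, and exclusion
# criteria are ignored. Symptom keys are hypothetical labels; the mapping
# follows the Textbox 5.1 table.

DSM_IV_CRITERIA = {
    "low_mood": "depressed mood",
    "loss_of_interest": "loss of interest",
    "low_energy": "fatigue",
    "insomnia": "sleep disturbance",
    # DSM-IV treats appetite change and weight change as one criterion
    "poor_appetite": "appetite or weight change",
    "weight_change": "appetite or weight change",
    "agitation_or_slowing": "psychomotor change",
    "guilt": "worthlessness or guilt",
    "poor_concentration": "poor concentration",
    "suicidal_thoughts": "suicidality",
}
DSM_IV_CORE = {"depressed mood", "loss of interest"}

ICD_10_CRITERIA = {
    "low_mood": "depressed mood",
    "loss_of_interest": "loss of interest",
    "low_energy": "reduced energy",
    "low_self_confidence": "low self-confidence",
    "guilt": "guilt or self-blame",
    "suicidal_thoughts": "suicidality",
    "poor_concentration": "poor concentration",
    "agitation_or_slowing": "psychomotor change",
    "insomnia": "sleep disturbance",
    "poor_appetite": "appetite change",
}
ICD_10_CORE = {"depressed mood", "loss of interest", "reduced energy"}


def count_criteria(symptoms, criteria, core):
    """Return (distinct qualifying criteria met, core criteria met).

    Symptoms absent from `criteria` (eg, "loss_of_drive") are ignored.
    """
    met = {criteria[s] for s in symptoms if s in criteria}
    return len(met), len(met & core)


def meets_dsm_iv_symptom_count(symptoms):
    """At least five criteria, at least one of them core."""
    total, n_core = count_criteria(symptoms, DSM_IV_CRITERIA, DSM_IV_CORE)
    return total >= 5 and n_core >= 1


def meets_icd_10_mild_threshold(symptoms):
    """At least two core plus at least two other symptoms."""
    total, n_core = count_criteria(symptoms, ICD_10_CRITERIA, ICD_10_CORE)
    return n_core >= 2 and total - n_core >= 2


case = {"low_mood", "loss_of_drive", "low_energy", "poor_appetite", "insomnia"}
print(count_criteria(case, DSM_IV_CRITERIA, DSM_IV_CORE))  # (4, 1)
print(meets_dsm_iv_symptom_count(case))                    # False
print(meets_icd_10_mild_threshold(case))                   # True
```

Representing each system as a lookup table makes the disagreement between DSM-IV and ICD-10 for this patient a matter of data rather than code.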

Toward Evidence-Based Diagnosis

Is ICD or DSM right to place more weight on some symptoms than others? If
so, there should be evidence that specific symptoms have more diagnostic
importance than others, which requires that the criteria have been subject to
comparative diagnostic validity testing. Most clinicians (psychiatrists and
non-psychiatrists alike) use their own clinical acumen to make a diagnosis
without using any specific tool, although they may have personal experience of
the diagnostic importance of specific symptoms. Even those using DSM-IV still
have to use clinical judgment because DSM provides no recommended structured
questions.2 Conventional clinical method relies on experience and
pattern recognition, whereas actuarial judgment uses decision theory
informed by empirically established tests.3 In both cases, reaching a diagnosis
means narrowing down a long list of possibilities in light of accumulating
clinical evidence. However, in the former case it is difficult to check
for inaccuracy, whereas in the latter case there is an attempt to diagnose on
the basis of calculated probabilities. The standard model for this task is
Bayes’ theorem, which calculates the post-test probability from the
baseline probability (Fig. 5.1). The baseline (pre-test) probability of the
condition is the local prevalence of the disease, and the post-test probability
is the probability of disease given new information such as a positive test
result.4
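As a minimal sketch of the Bayesian update described above, the Python function below converts a pre-test probability p into a post-test probability; for a positive result this is p·Se / [p·Se + (1 – p)(1 – Sp)]. The function name is our own, and the inputs are borrowed from the ‘‘persistent low mood’’ row of Table 5.2 (prevalence 0.20, sensitivity 0.93, specificity 0.82) purely for illustration.

```python
def post_test_probability(pre_test, sensitivity, specificity, positive=True):
    """Probability of the condition after a test result, by Bayes' theorem.

    pre_test is the baseline probability (eg, the local prevalence).
    """
    if positive:
        true_pos = pre_test * sensitivity               # P(condition and test +)
        false_pos = (1 - pre_test) * (1 - specificity)  # P(no condition and test +)
        return true_pos / (true_pos + false_pos)
    false_neg = pre_test * (1 - sensitivity)            # P(condition and test -)
    true_neg = (1 - pre_test) * specificity             # P(no condition and test -)
    return false_neg / (false_neg + true_neg)


# Persistent low mood, Table 5.2: prevalence 0.20, Se 0.93, Sp 0.82
p_pos = post_test_probability(0.20, 0.93, 0.82, positive=True)
p_neg = post_test_probability(0.20, 0.93, 0.82, positive=False)
print(f"after a positive result: {p_pos:.2f}")  # 0.56, the PPV
print(f"after a negative result: {p_neg:.2f}")  # 0.02, ie, 1 - NPV
```

The post-test probability after a positive result is simply the PPV at that prevalence, which is why the same test can look very different in high- and low-prevalence settings.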
Before assuming that assisted methods (eg, screening) are helpful, it is
worth checking the evidence base for unassisted detection (see Chapter 3).

Textbox 5.2. Definitions of Measures of Diagnostic Accuracy

Cells a, b, c, and d refer to the 2 × 2 table in Figure 5.2.
Sensitivity (Se)
A measure of accuracy defined as the proportion of patients with disease in
whom the test result is positive: a/(a + c)
Specificity (Sp)
A measure of accuracy defined as the proportion of patients without disease
in whom the test result is negative: d/(b + d)
Positive Predictive Value (PPV)
A measure of rule-in accuracy defined as the proportion of true positives in
those with a positive screening result: a/(a + b)
Negative Predictive Value (NPV)
A measure of rule-out accuracy defined as the proportion of true negatives in
those with a negative screening result: d/(c + d)
Youden’s J
A composite of overall accuracy using sensitivity and specificity that is
unaffected by prevalence: sensitivity + specificity – 1
Predictive Summary Index (PSI)
A composite of overall accuracy using all positive and negative screens that
reflects the prevalence: PPV + NPV – 1
Kappa
An index that compares the observed agreement with that expected by chance.
Kappa can be thought of as the chance-corrected proportional agreement:
(Observed agreement – Chance agreement)/(1 – Chance agreement)
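The definitions in Textbox 5.2 translate directly into code. This Python sketch (function name and dictionary keys are our own) computes every measure from the four cell counts, with chance agreement for kappa taken from the row and column marginals. The counts in the usage line are the ‘‘persistent low mood’’ row of Table 5.2, with false positives and false negatives derived as 800 – TN and 200 – TP.

```python
def accuracy_measures(a, b, c, d):
    """Textbox 5.2 measures from a 2 x 2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives (layout of Fig. 5.2).
    """
    n = a + b + c + d
    se = a / (a + c)
    sp = d / (b + d)
    ppv = a / (a + b)
    npv = d / (c + d)
    observed = (a + d) / n                      # fraction correct
    p_test_pos = (a + b) / n                    # row marginal
    p_disorder = (a + c) / n                    # column marginal
    chance = p_test_pos * p_disorder + (1 - p_test_pos) * (1 - p_disorder)
    return {
        "sensitivity": se,
        "specificity": sp,
        "PPV": ppv,
        "NPV": npv,
        "Youden J": se + sp - 1,
        "PSI": ppv + npv - 1,
        "fraction correct": observed,
        "kappa": (observed - chance) / (1 - chance),
    }


# Persistent low mood (Table 5.2): 186 TP, 144 FP, 14 FN, 656 TN
for name, value in accuracy_measures(a=186, b=144, c=14, d=656).items():
    print(f"{name:>16}: {value:.2f}")
```

To two decimals the printed values reproduce the corresponding row of Table 5.2; kappa (about 0.60) is not reported in the table and is shown only as an extra.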

Figure 5.1. Decision Theory. [Decision tree. "Screen" arm: patients with the condition (probability = prevalence) test positive with probability = sensitivity (treated condition; prevalence × sensitivity) or negative with probability 1 – sensitivity (untreated condition; prevalence × [1 – sensitivity]); patients without the condition (probability = 1 – prevalence) test positive with probability 1 – specificity (false positive; [1 – prevalence] × [1 – specificity]) or negative with probability = specificity (healthy; [1 – prevalence] × specificity). "Don’t screen" arm: patients with the condition (prevalence) remain untreated, and those without it (1 – prevalence) remain healthy.]
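The branch probabilities in Figure 5.1 can be computed directly. This is a minimal sketch with our own function name; it assumes, as the figure’s labels imply, that every true positive is treated and nobody else is. The sensitivity and specificity in the usage line are the PHQ-2 values from Table 5.2 and are used only as an example.

```python
def decision_tree(prevalence, sensitivity, specificity):
    """Expected proportion of patients at each leaf of Figure 5.1.

    Assumes every true positive is treated and that no one who screens
    negative (or is not screened) is treated.
    """
    screen = {
        "treated condition": prevalence * sensitivity,
        "untreated condition": prevalence * (1 - sensitivity),
        "false positive": (1 - prevalence) * (1 - specificity),
        "healthy": (1 - prevalence) * specificity,
    }
    no_screen = {
        "untreated condition": prevalence,
        "healthy": 1 - prevalence,
    }
    return screen, no_screen


screen, no_screen = decision_tree(prevalence=0.20, sensitivity=0.80, specificity=0.70)
for leaf, share in screen.items():
    print(f"screen    | {leaf:<19} {share:.3f}")
for leaf, share in no_screen.items():
    print(f"no screen | {leaf:<19} {share:.3f}")
```

Screening moves a fraction (prevalence × sensitivity) of patients from "untreated" to "treated" at the cost of (1 – prevalence) × (1 – specificity) false positives; attaching costs or utilities to each leaf would turn this into a full decision analysis.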

2. Scientific Aspects of Diagnostic Accuracy

Attempts to distinguish patients with a condition from those without on the
basis of a test or clinical method are most simply represented by a 2 × 2 table
that generates sensitivity, specificity, positive predictive value (PPV), and
negative predictive value (NPV) (Textbox 5.2).5 It is critical to understand
the difference between reading vertically down the columns and reading
horizontally across the rows (Figure 5.2). Vertically, the denominator is the
number of cases with or without the condition, a number that is unknown to the
clinician but is known in a research setting with a gold standard. Horizontally,
the denominator is the number of positive or negative screens, a number that is
known to clinicians; this is why PPV and NPV reveal the proportions of interest
in the real world. The relationship between these variables is complex. In
practice, the predictive values of a test vary with the baseline prevalence of the
condition. Put simply, it is easy to spot cases when nothing but cases exist
(prevalence = 100%); conversely, it is hard when the prevalence is low.6 Rule-in
and rule-out accuracy are essentially independent, although a test may perform
well in both directions. Rule-in accuracy is best measured by the PPV, but a
high specificity also implies there are few false positives, and hence any
positive result will suggest a true case.7 Rule-out accuracy is best measured
by the NPV, where the denominator is all who test negative, but again if the
sensitivity is high, there will be few false-negative results, and hence any
negative result implies a true non-case.

                 Gold Standard:      Gold Standard:
                 Disorder            No Disorder
Test +ve         A                   B                   PPV = A/(A + B)
Test –ve         C                   D                   NPV = D/(C + D)
Total            Se = A/(A + C)      Sp = D/(B + D)

Figure 5.2. Generic 2 × 2 Table.
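To see how the predictive values move with prevalence while sensitivity and specificity stay fixed, the short Python sketch below (function name our own) applies one test at several prevalences. The sensitivity and specificity are the PHQ-2 values from Table 5.2 (0.80 and 0.70), used only for illustration.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = prevalence * sensitivity
    fp = (1 - prevalence) * (1 - specificity)
    fn = prevalence * (1 - sensitivity)
    tn = (1 - prevalence) * specificity
    return tp / (tp + fp), tn / (tn + fn)


# Same test (Se 0.80, Sp 0.70) applied in settings of different prevalence
for prevalence in (0.05, 0.20, 0.50, 0.80):
    ppv, npv = predictive_values(prevalence, 0.80, 0.70)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.2f}  NPV {npv:.2f}")
```

As prevalence rises the PPV climbs and the NPV falls, which is why a low-prevalence setting favors rule-out and a high-prevalence setting favors rule-in.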
Optimal accuracy is often achieved by choosing one test for rule-in (case-finding)
and another for rule-out, but not uncommonly only a single test can be applied
and it must perform as well as possible in both directions. In this situation
summary accuracy statistics are useful. The simplest are Youden’s J and the
predictive summary index, which combine sensitivity with specificity and PPV
with NPV, respectively (in each case, the sum minus 1).8 The fraction correct
(true positives plus true negatives, divided by all cases and non-cases) is also
useful, as it can easily be used to compare different methods. All such methods
work well for binary (yes/no) tests or when the optimal cutoff is known. However,
where performance varies by cutoff threshold, plotting sensitivity against
1 – specificity for each cutoff generates a receiver operating characteristic
(ROC) curve, and the area under the curve gives a measure of overall
performance. Where multiple tests need to be compared, each with different
optimal sensitivity and specificity values, results can be combined in a summary
receiver operating characteristic (sROC) curve.9 Additionally, when false
positives and false negatives carry different costs, a cutoff may be chosen that
favors rule-in or rule-out accuracy.
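The ROC construction just described can be sketched in a few lines of Python. The function names are our own, higher scores are assumed to indicate depression (a score at or above the cutoff counts as screen positive), and the area is taken with the trapezoidal rule. The toy scores in the usage lines are invented for illustration only and are not study data.

```python
def roc_points(case_scores, noncase_scores):
    """ROC points (1 - specificity, sensitivity), one per cutoff.

    A score >= cutoff counts as screen positive (higher = more depressed).
    """
    cutoffs = sorted(set(case_scores) | set(noncase_scores), reverse=True)
    points = [(0.0, 0.0)]  # a cutoff above every score: nobody screens positive
    for cutoff in cutoffs:
        sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
        false_pos_rate = sum(s >= cutoff for s in noncase_scores) / len(noncase_scores)
        points.append((false_pos_rate, sensitivity))
    return points  # the lowest cutoff gives (1.0, 1.0)


def area_under_curve(points):
    """Trapezoidal area under a piecewise-linear ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy scores invented for illustration only (not study data)
cases = [9, 12, 15, 7]
noncases = [3, 5, 9, 6]
print(area_under_curve(roc_points(cases, noncases)))  # 0.90625
```

The trapezoidal area equals the proportion of case/non-case pairs in which the case scores higher (ties counting one half), which gives a useful sanity check on any implementation.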

Likelihood Ratios
Likelihood ratios can be clinically useful because they do not vary with
prevalence and because they can be calculated for several levels of test
result. The positive likelihood ratio is the probability of a positive result in a
patient with the disorder divided by the probability of a positive result in a
patient without it (sensitivity/[1 – specificity]). The negative likelihood ratio is
the corresponding ratio for a negative result ([1 – sensitivity]/specificity).
A nomogram (Fig. 5.3) has been developed for use with likelihood ratios to
determine the post-test probability of disease if the pre-test probability and the
likelihood ratio for the specific test are known. A likelihood ratio greater than 1
produces a post-test probability that is higher than the pre-test probability, and
a likelihood ratio less than 1 produces one that is lower.

Figure 5.3. Likelihood Ratio Nomogram. [Three vertical scales: pre-test probability (%, 0.1 to 99), likelihood ratio (0.0005 to 2000), and post-test probability (%, 99 to 0.1). A straight line drawn from the pre-test probability through the likelihood ratio meets the post-test probability scale.]
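The nomogram performs graphically what the odds form of Bayes’ theorem does arithmetically: post-test odds = pre-test odds × likelihood ratio. A minimal Python sketch follows (function names are our own); the inputs are again the ‘‘persistent low mood’’ row of Table 5.2, and the result for a present symptom agrees with the PPV of 0.56 reported there.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-); undefined when specificity is 0 or 1."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_from_lr(pre_test, lr):
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


lr_pos, lr_neg = likelihood_ratios(0.93, 0.82)   # persistent low mood, Table 5.2
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"post-test probability if present: {post_test_from_lr(0.20, lr_pos):.2f}")
print(f"post-test probability if absent:  {post_test_from_lr(0.20, lr_neg):.2f}")
```

Because the likelihood ratio is fixed, the same function can be reused for any local prevalence simply by changing the pre-test probability.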

3. Clinical Aspects of Diagnostic Accuracy

The best way to understand the clinical applicability of a screening test is to
consider the example in Textbox 5.1, extended in Textbox 5.3. The patient
complains of five symptoms and has a single Hospital Anxiety and Depression
Scale-Depression (HADS-D) rating. Are these symptoms likely to be symptoms of
depression, or do they also occur in people with stroke who are not depressed?
The diagnostic impact of each piece of information can be evaluated
scientifically, provided its rate of occurrence is known in both groups
(Textbox 5.3 lists these rates). The occurrence rate in the depressed sample is in
fact the sensitivity of each specific item. Thus, the symptom with the highest
sensitivity is ‘‘persistent low mood.’’ Specificity is derived from the
non-occurrence in the non-depressed subjects, and in this case the highest
specificity belongs to a HADS score of 9 or above, closely followed by poor
appetite. However, does this mean these are the best ‘‘tests’’ for this condition?

Textbox 5.3. Post-Stroke Depression: Symptom Counts

A previously well 58-year-old man who suffered a dominant hemisphere
stroke 2 months previously is referred to an outpatient psychiatry clinic. He
reports that he has had five symptoms (low mood, loss of drive, low energy,
poor appetite, and insomnia) for the last 3 weeks. His score on the HADS
depression scale is 9 out of 21. Out of the last 100 patients seen in this clinic,
54 were depressed.

Patient’s Symptoms       % of Depressed Stroke          % of Non-depressed Stroke
                         Patients (Previous Studies)    Patients (Previous Studies)
Persistent low mood      93%                            18%
Loss of drive            88%                            30%
Low energy               87%                            32%
Disturbed sleep          83%                            32%
Poor appetite            45%                            11%
HADS score 9 or above    60%                            9%

Pre-Test–Post-Test Change
As previously noted, raw sensitivity and specificity figures are of only moderate
use by themselves. More useful are the PPV and NPV, which can be
calculated from the above data. The data from Textbox 5.3 are reproduced in
detail in Table 5.2. From this hypothetical study of 1,000 people following
stroke, we see the complexity of deciding upon the optimal test. Persistent low
mood is the symptom with the highest sensitivity and NPV. Thus, if low mood is
absent, there is a 98% chance that the patient is not depressed, on this symptom
alone. This improves upon the pre-test probability of not being depressed (0.80)
by 0.18 (the pre–post gain) (Fig. 5.4). Similarly, if all five symptoms listed are
present, there is an 88% chance of major depression, a large pre–post gain over
the pre-test probability of 0.20. Note that this is an ‘‘and’’ rule (all five
symptoms present), which differs from an ‘‘or’’ rule in which any one of the
five symptoms counts as positive.
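The pre–post gains quoted above can be checked from the raw counts. In this Python sketch (function name and keys are our own) the rule-in gain is PPV minus the prevalence and the rule-out gain is NPV minus the pre-test probability of not having the condition. The counts come from Table 5.2, with false positives and false negatives derived as 800 – TN and 200 – TP.

```python
def pre_post_gains(tp, fp, fn, tn):
    """Gain of each predictive value over the matching pre-test probability."""
    n = tp + fp + fn + tn
    prevalence = (tp + fn) / n
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "rule_in": ppv - prevalence,          # PPV vs P(depressed)
        "rule_out": npv - (1 - prevalence),   # NPV vs P(not depressed)
    }


# Counts from Table 5.2; FP = 800 - TN and FN = 200 - TP
rows = {
    "persistent low mood": dict(tp=186, fp=144, fn=14, tn=656),
    "all five symptoms": dict(tp=56, fp=8, fn=144, tn=792),
}
for name, counts in rows.items():
    gains = pre_post_gains(**counts)
    print(f"{name:<20} rule-in {gains['rule_in']:+.3f}  rule-out {gains['rule_out']:+.3f}")
```

Low mood gives the larger rule-out gain (about 0.18), whereas the five-symptom ‘‘and’’ rule gives the larger rule-in gain (about 0.68).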
Table 5.2. Summary of Diagnostic Accuracy Results from a Hypothetical Study of Post-Stroke Depression

Patient’s Symptoms          Dep  TP   Se    Non-dep  TN   Sp    PPV   NPV   Youden  PSI   FC    UI+   UI–
Single Symptoms
Persistent low mood         200  186  0.93  800      656  0.82  0.56  0.98  0.75    0.54  0.84  0.52  0.80
Loss of drive               200  176  0.88  800      560  0.70  0.42  0.96  0.58    0.38  0.74  0.37  0.67
Low energy                  200  174  0.87  800      544  0.68  0.40  0.95  0.55    0.36  0.72  0.35  0.65
Disturbed sleep             200  166  0.83  800      544  0.68  0.39  0.94  0.51    0.33  0.71  0.33  0.64
Poor appetite               200  90   0.45  800      712  0.89  0.51  0.87  0.34    0.37  0.80  0.23  0.77
Composite Measures
All five symptoms           200  56   0.28  800      792  0.99  0.88  0.85  0.27    0.72  0.85  0.25  0.84
PHQ-2 (Q1 or Q2 positive)   200  160  0.80  800      560  0.70  0.40  0.93  0.50    0.33  0.72  0.32  0.65
HADS: score 9 or above      200  130  0.60  800      728  0.91  0.64  0.91  0.51    0.56  0.86  0.39  0.83
Algorithm: PHQ-2 then HADS  200  96   0.48  800      778  0.97  0.81  0.88  0.45    0.70  0.87  0.39  0.86
  (HADS if PHQ-2 positive)

Sample size = 1,000; prevalence = 0.20.
Dep, depressed after stroke; Non-dep, non-depressed after stroke; TP, true positives; TN, true negatives; Se, sensitivity; Sp, specificity; PSI, predictive summary index; FC, fraction correct; UI+, positive utility index; UI–, negative utility index.

Figure 5.4. Conditional probabilities graph of pre-test–post-test gain from a hypothetical diagnostic test. [Post-test probability (y-axis, 0 to 0.8) plotted against prevalence or prior probability (x-axis, 0 to 1), with the maximum gain marked.]

Surely, then, the five-symptom method is the best method to identify
post-stroke depression? In the real world, the situation is more complex than
it first appears, because all five symptoms are present in only 28% of true
cases.

Clinical Utility of a Discriminating Test

Even when a test has a high PPV or NPV, its value must be corrected for how
often the relevant result actually occurs in each population. Thus, in this
example, if the combination of five symptoms occurs, then it is 88% likely that
major depression is present; however, this combination is uncommon, occurring
in only 28% of true cases. For the clinician, any test with a high PPV will be
devalued if it is positive in few true cases. Clinically relevant rule-in accuracy
(also known as the positive utility index) is the product of the PPV and
sensitivity. Thus, the positive utility index for all five symptoms is
0.88 × 0.28 = 0.25. A similar calculation applies for ruling out a diagnosis. For
example, the symptom ‘‘loss of drive’’ has a high NPV but is negative in only 70%
of non-depressed stroke patients. Thus, its corrected rule-out value, the negative
utility index (the product of the NPV and specificity), is 0.96 × 0.70 = 0.67.
Utility index scores can be converted into qualitative grades as follows:
excellent ≥ 0.81, good ≥ 0.64, satisfactory ≥ 0.49, and poor < 0.49.
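The utility indices and their grades can be computed mechanically. This Python sketch uses our own function names and the grade cut-points given above; the counts are taken from Table 5.2 (false positives = 800 – TN, false negatives = 200 – TP), and ‘‘PHQ-2 then HADS’’ is the algorithm row.

```python
def utility_indices(tp, fp, fn, tn):
    """Positive (PPV x sensitivity) and negative (NPV x specificity) utility indices."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv * sensitivity, npv * specificity


def grade(ui):
    """Qualitative grade using the cut-points given in the text."""
    if ui >= 0.81:
        return "excellent"
    if ui >= 0.64:
        return "good"
    if ui >= 0.49:
        return "satisfactory"
    return "poor"


# Counts (TP, FP, FN, TN) from Table 5.2
rows = {
    "persistent low mood": (186, 144, 14, 656),
    "loss of drive": (176, 240, 24, 560),
    "all five symptoms": (56, 8, 144, 792),
    "PHQ-2 then HADS": (96, 22, 104, 778),
}
for name, counts in rows.items():
    ui_pos, ui_neg = utility_indices(*counts)
    print(f"{name:<20} UI+ {ui_pos:.2f} ({grade(ui_pos)})  "
          f"UI- {ui_neg:.2f} ({grade(ui_neg)})")
```

The output ranks low mood as the best rule-in test (‘‘satisfactory’’) and the two-step algorithm as the best rule-out test (‘‘excellent’’), matching the summary that follows.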

In this example, the most useful population-based rule-in test is low mood,
although it is only a ‘‘satisfactory’’ test. The most useful rule-out test is the
algorithm approach, which can be graded as an ‘‘excellent’’ rule-out test.
Algorithm approaches are worth examining in a bit more detail.

Algorithm Approaches
In this example, three questionnaire approaches are shown. The PHQ-2
achieves modest sensitivity and specificity and identifies 80% of all true
cases. The HADS-D has excellent specificity and NPV and thus could be used
as a rule-out test. Indeed, a second, higher cutoff (eg, 15v16) could make it a
good rule-in test, leaving a cohort scoring 9 to 15 as diagnostically
uncertain and requiring a second-stage test. The HADS can also be combined
with another questionnaire, in this case the PHQ-2 (see Appendix Fig. 2).
This is a basic algorithm approach in which a second test is applied only to
those positive at the first step. This two-step strategy reduces the false
positives, improving the PPV and specificity at the expense of sensitivity and
NPV. In low-prevalence conditions, the overall gain in accuracy may be worth
the effort of the extra step. Here, the two-step strategy improves the PPV from
0.40 with the PHQ-2 alone to 0.81 but reduces the NPV from 0.93 to 0.88.
However, there is an overall gain in accuracy, from 72% to 87% correctly
identified (the fraction correct in Table 5.2).
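The serial (‘‘positive only if both tests are positive’’) strategy can be sketched under one explicit assumption: that the two questionnaires err independently of each other given the patient’s true status (conditional independence), which real, overlapping questionnaires may violate. Under that assumption, combined sensitivity is Se1 × Se2 and combined specificity is 1 – (1 – Sp1)(1 – Sp2). The function names below are our own; with the PHQ-2 and HADS values from Table 5.2 the sketch lands close to the table’s algorithm row.

```python
def serial_two_step(se1, sp1, se2, sp2):
    """Overall Se and Sp when test 2 is given only to test-1 positives and
    a patient counts as positive only if both tests are positive.

    Assumes conditional independence of the two tests' errors.
    """
    sensitivity = se1 * se2
    specificity = 1 - (1 - sp1) * (1 - sp2)
    return sensitivity, specificity


def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) at a given prevalence."""
    tp = prevalence * sensitivity
    fp = (1 - prevalence) * (1 - specificity)
    fn = prevalence * (1 - sensitivity)
    tn = (1 - prevalence) * specificity
    return tp / (tp + fp), tn / (tn + fn)


# PHQ-2 (Se 0.80, Sp 0.70) then HADS >= 9 (Se 0.60, Sp 0.91); prevalence 0.20
se, sp = serial_two_step(0.80, 0.70, 0.60, 0.91)
ppv, npv = predictive_values(0.20, se, sp)
print(f"Se {se:.2f}  Sp {sp:.3f}  PPV {ppv:.2f}  NPV {npv:.2f}")
```

The computed sensitivity (0.48) and specificity (0.973) match the algorithm row of Table 5.2; the PPV and NPV differ from the table only in the second decimal, reflecting rounding of the tabulated counts.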
Clinicians may use their own clinical method as an algorithm, for
example, offering a follow-up interview to those who are suspected of
having a disorder on initial examination. The algorithm often offers a
potential economic and efficiency advantage over a conventional approach.
Here the majority of patients receive a simple, inexpensive screening test
and a minority receive a lengthier case-finding test. However, the
algorithm approach is efficient only where the prevalence of a condition
is very low (or very high, in which case the second step is applied to those
who screen negative to reduce the false negatives). As the prevalence
approaches 0.50, the yield of a two-step approach converges on the yield of a
single step. The gain is also greatest when the accuracy of the single-step
approach is lowest (see Appendix Tables 3 and 4 for more details). A practical
example of an algorithm approach to the detection of depression, using the
Patient Health Questionnaire in patients with coronary artery disease, is
described by Thombs and colleagues.10
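The efficiency argument can be illustrated by computing what share of patients screen positive at the first step and therefore go on to the lengthier second-stage test. This is a minimal sketch with a hypothetical function name; it covers only the rule-in version of the algorithm and uses the PHQ-2 values from Table 5.2 as the first step.

```python
def share_needing_second_step(prevalence, se1, sp1):
    """Fraction of all patients who screen positive at step 1 and so go on
    to the longer second-stage test (rule-in version of the algorithm)."""
    return prevalence * se1 + (1 - prevalence) * (1 - sp1)


# First step: PHQ-2 values from Table 5.2 (Se 0.80, Sp 0.70)
for prevalence in (0.05, 0.20, 0.50):
    share = share_needing_second_step(prevalence, 0.80, 0.70)
    print(f"prevalence {prevalence:.2f}: {share:.0%} of patients need the second step")
```

With these values the second-stage workload rises from roughly a third of patients at 5% prevalence to more than half at 50%, which illustrates why the saving from a two-step approach shrinks as prevalence approaches 0.50.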

4. Testing Screening via Implementation Studies

Even a test of high predictive value and high utility index cannot be assumed to
be beneficial. Guidelines from the U.K. National Screening Committee are
helpful here (Textbox 5.4). Ultimately, the case for a screening test has to be
proven in an implementation study. This has two important parts: the feasibility
of the tool in a clinical setting and the added value of the tool beyond
what could be achieved without it.

Textbox 5.4. U.K. National Screening Committee Guidelines

The condition should:
Be an important health issue
Have a well-understood natural history, with a detectable risk factor or disease
marker
Have cost-effective primary preventions implemented
The screening tool should:
Be a valid tool with a known cutoff
Be acceptable to the public
Have agreed diagnostic procedures
The treatment should:
Be effective, with evidence of benefits of early intervention
Have adequate resources
Have appropriate policies as to who should be treated
The screening program should:
Show evidence that the benefits of screening outweigh the risks
Be acceptable to the public and professionals
Be cost-effective (and have ongoing evaluation)
Have quality-assurance strategies in place
Adapted from UK National Screening Committee Criteria for appraising the
viability, effectiveness and appropriateness of a screening programme.
Available at: http://www.nsc.nhs.uk/pdfs/criteria.pdf

Feasibility of Depression Screening

Feasibility asks whether a tool is practical enough, both to apply and to score,
to gain acceptance by healthcare professionals and patients. This has
rarely been studied in relation to depression severity scales. Bermejo and
associates (2005)11 looked at attitudes to the PHQ-9 in general practice in
Germany. This study enrolled 1,034 patients from 17 PCPs; both patients
and healthcare professionals were asked about acceptability. Patients found
the instrument highly acceptable, but 62.5% of the PCPs thought it was too
long and 37.5% thought it was too time-consuming, even though it typically
took 1 to 2 minutes. Half of the PCPs rated the PHQ as an impediment to
daily practice and 75% thought it was impractical, compared with
only 25% of patients. One proxy for feasibility is the willingness of clinicians
to use the test: any screening roll-out will be compromised if front-line
staff find the tool too difficult to administer or score.

Added Value
Demonstrating the possible benefit of a screening tool is akin to demonstrating
benefit from a new medicine. Ideally, this is done in a randomized controlled
trial using representative clinicians and patients, in which one group (arm 1)
uses their clinical skills uninfluenced by the study taking place (ie, without a
Hawthorne effect) and the other group (arm 2) uses their clinical skills plus the
screening tool or method. The advantage of this design is that the results reveal
the unassisted detection rate (arm 1) as well as the added value beyond usual
care (the difference between arm 2 and arm 1).
Possible stages of tool development are discussed in Chapter 4.
Ideally, implementation should not stop with demonstration of superior
detection; rather, it should attempt to demonstrate further patient benefits,
such as better quality of care and greater resolution of depression. This is
discussed further in Chapter 7.

5. Conclusions
Although depression is one of the world’s most prevalent disorders and
antidepressants are the most commonly prescribed class of drug, the science of
diagnosing depression has been hampered by the paucity of simple studies
documenting the rate of symptoms and signs in depressed and non-depressed
subjects. Once these data become available, calculating the diagnostic value of
specific symptoms (both individually and in combination) becomes
straightforward. Better data exist for depression severity scales and other
assisted methods. Beyond this, further implementation studies are required in
which the true benefit to patients of all proposed diagnostic methods is compared
with that of conventional unassisted approaches.

References
1. Krupinski J, Tiller J. The identification and treatment of depression by general
practitioners. Aust N Z J Psychiatry. 2001;35:827–832.
2. Steiner JL, Tebes JK, Sledge WH, et al. A comparison of the structured clinical
interview for DSM-III-R and clinical diagnoses. J Nerv Ment Dis. 1995;183:365–369.
3. Steadman HJ, Silver E, Monahan J, et al. A classification tree approach to the
development of actuarial violence risk assessment tools. Law Hum Behav.
2000;24:83–100.
4. Elstein AS, Schwarz A. Clinical problem solving and diagnostic decision making:
selective review of the cognitive literature. BMJ. 2002;324:729–732.
5. Yerushalmy J. Statistical problems in assessing methods of medical diagnosis, with
special reference to X-ray techniques. Public Health Rep. 1947;62:1432–1449.
6. Whiting P, Rutjes AWS, Dinnes J, et al. Development and validation of methods for
assessing the quality of diagnostic accuracy studies. Health Technol Assess.
2004;8(25):1–234.
7. Sackett DL, Haynes RB. The architecture of diagnostic research. BMJ. 2002;324:539–541.
8. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3:32–35.
9. Macaskill P. Empirical Bayes estimates generated in a hierarchical summary ROC
analysis agreed closely with those of a full Bayesian analysis. J Clin Epidemiol.
2004;57:925–932.
10. Thombs BD, Ziegelstein RC, Whooley MA. Optimizing detection of major depression
among patients with coronary artery disease using the Patient Health Questionnaire:
data from the Heart and Soul Study. J Gen Intern Med. 2008;23(12):2014–2017.
11. Bermejo I, Niebling W, Mathias B, et al. Patients’ and physicians’ evaluation of the
PHQ-D for depression screening. Prim Care Community Psychiatry.
2005;10(4):125–131.
6
CLINICAL JUDGMENT AND THE INFLUENCE
OF SCREENING ON DECISION MAKING

Howard N. Garb

1. Introduction
2. Research on Clinical Judgment
3. The Limits of Screening

Context
How do clinicians arrive at diagnostic decisions? In most cases the
decision is made not by following formal criteria but by intuition. In
addition, routine interviews are often narrow and the feedback gleaned from
patients is inadequate. Yet it is not clear whether screening helps or hinders
clinical judgment. It might be that only clinicians who have low confidence
in their interviewing and diagnostic skills are open to using, and are actually
helped by, diagnostic tools.

1. Introduction
To provide a theoretical framework for understanding why it is difficult for
physicians to detect depression in primary care settings, a broad array of
research in the mental health fields can be described. For example, more than
1,000 studies have been conducted on clinical judgment in the area of mental
health practice,1,2 and the results from these studies can be used to illuminate
the challenges physicians face in judging whether a patient is clinically
depressed and can benefit from treatment. In this chapter, results on clinical
judgment will be described.

* The views expressed in this article are those of the author and are not the official policy of the
Department of Defense or the United States Air Force.

A second topic will also be briefly discussed. Results from research on
clinical judgment would seem to indicate that screening should be of value.
Yet, as noted in Chapter 7, stand-alone screening programs have added little
or nothing to outcomes. Reasons for this unexpected result will be explored.

2. Research on Clinical Judgment

Three topics will be discussed: (1) narrowness of interviews, (2) nature of
patient feedback, and (3) the cognitive processes of clinicians.

Narrowness of Interviews
Depression goes undetected because in many cases physicians do not ask
patients if they have symptoms of a depressive mood disorder.3 To place this
in context, it can be noted that mental health professionals also often do not ask
patients about important symptoms and behaviors. Failure to inquire about
depression in primary care settings can be viewed in the broader context of
failure to inquire about important symptoms and events in mental health
settings.
Research on clinical judgment has demonstrated that lack of comprehensiveness
is often a problem for interviews conducted in clinical practice. For
example, in one study,4 mental health professionals saw patients in routine
clinical practice, and afterwards research investigators conducted
semi-structured interviews with the patients. Remarkably, the mental health
professionals had evaluated only about 50% of the symptoms that were recorded
using the semi-structured interviews.
Similarly, a number of studies have found that mental health professionals
often do not ask about important events when formulating a case history. For
example, in a study by Malone and associates (1995),5 clinicians at a psychiatric
hospital failed to document a history of suicidal behavior for 12 of 50
patients who had such a history. This is important because past
suicidal behavior is one of the best predictors of suicide. In another study,6 26
of 69 psychiatric inpatients reported on a research questionnaire that they had
been victims of severe physical abuse by family members or partners during the
past year. The abuse had been documented in medical charts for only nine of
these patients. To give one more example, in another study a computer interview
was used to collect a psychiatric history.7 The computer interview obtained
important history information that mental health professionals had not obtained
in the course of their routine work. This was especially true for information
about criminal history (26% of patients), amnesic blackouts after drinking
heavily (23%), repeatedly being fired from jobs (17%), recent drug abuse (10%),
and debts (10%).
Another type of error that occurs when evaluations of psychopathology are
not comprehensive is called diagnostic overshadowing. Diagnostic overshadowing
is said to occur when clinicians make one or two diagnoses but overlook
other disorders.8,9 For example, when diagnoses are made by mental
health professionals, mental disorders tend to be missed among clients with
mental retardation,10,11 alcohol and drug abuse is often underdiagnosed among
clients presenting with psychiatric problems,12 and diagnoses of personality
disorder are often missed among clients with an Axis I disorder (eg, among
clients with obsessive-compulsive disorder).13
If mental health professionals fail to ask about important emotional and
behavioral problems and overlook mental disorders, it is not surprising that
physicians without psychiatric training do the same. Because patients in
primary care settings almost always present with physical complaints,
diagnostic overshadowing is to be expected: physicians may address the
presenting complaint and not explore other possible problems.

Nature of Patient Feedback

Another reason physicians may have difficulty detecting depression in
primary care settings is that they are unlikely to receive accurate feedback.
If a patient with clinically significant depression presents with a medical
problem and the physician misses the diagnosis, it is unlikely that the physician
will later learn that the diagnosis of depression was missed.
One of the most surprising findings on clinical judgment is that it can be
very difficult to learn from clinical experience. Training is often positively
related to the validity of judgments, but experience is not.14,15 Thus, once
physicians and mental health professionals complete residency or graduate
training, the amount of experience they gain is weakly, or even negatively,
related to the accuracy of their judgments and to treatment outcomes.
In a review of the literature on the relationship between clinical experience and
quality of healthcare,16 physicians who had been in practice longer were found to
be at risk of providing lower-quality care. Performance (or quality of treatment)
declined with increasing years in practice for all outcomes assessed in 32 of 62
studies. In the remaining studies, performance declined with increasing experience
for some outcomes but not for others (13 of 62 studies), no association was
observed (13 of 62), results were mixed (3 of 62), and performance improved with
increasing years in practice for all outcomes (1 of 62).
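The five categories reported for this review partition its 62 studies. A quick Python tally (a sketch; the category labels are paraphrases, not the review's own wording) confirms that the counts sum to 62 and shows that 45 of the 62 studies (about 73%) found lower performance with more years in practice for at least some outcomes.

```python
# Study counts from the Choudhry et al. review, as reported in the text.
# Category labels are paraphrases, not the review's own wording.
findings = {
    "declined, all outcomes": 32,
    "declined, some outcomes only": 13,
    "no association": 13,
    "mixed results": 3,
    "improved, all outcomes": 1,
}

total = sum(findings.values())
if total != 62:
    raise ValueError(f"expected 62 studies, got {total}")

for label, n in findings.items():
    print(f"{label:<30} {n:>2}/{total} ({n / total:.0%})")

# Studies finding lower performance with experience for at least some outcomes.
any_decline = findings["declined, all outcomes"] + findings["declined, some outcomes only"]
print(f"any decline: {any_decline}/{total} ({any_decline / total:.0%})")
```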
Similarly, in routine clinical practice in the mental health fields,
professionals with extensive clinical experience are typically no more accurate
than other clinicians. For example, in one study,17 participants from different
groups (eg, marital therapists, undergraduates) viewed videotaped conversations
of 10 married couples and predicted which couples were likely to divorce.
Participants’ attitudes about marriage, but not their amount of clinical
experience, were related to the validity of their predictions.

The Cognitive Processes of Clinicians

It is likely that depression often goes undetected in primary care settings not
only because interviews are narrow and feedback is inadequate, but also
because the cognitive processes of clinicians are fallible. The primacy effect,
confirmatory hypothesis testing, cognitive heuristics, and causal reasoning are
described in this section.
One can wonder whether physicians sometimes miss depression because they
make judgments too quickly. The tendency to make judgments quickly, sometimes
after collecting relatively few data, is called the primacy effect. It is
characteristic of social judgments made in everyday situations as well as of
clinical judgments made in mental health settings.1,18 For example, Gauron and
Dickinson reported that psychiatrists who observed a videotaped interview
routinely formed diagnostic impressions within 30 to 60 seconds.19 Similarly,
Kendell found that psychiatrists are often ready to make a diagnosis within a
few minutes.20 It remains an open question whether physicians in primary care
settings also reach conclusions this quickly and, if so, whether this
contributes to missed diagnoses of depression.
Another reason depression may go undetected is that physicians may rely
on confirmatory hypothesis testing: a tendency to seek, use, and remember
information that is likely to confirm, but not refute, a hypothesis. Research
on clinical judgment indicates that mental health professionals tend to seek
and remember information that supports a hypothesis, and that this leads them
to neglect alternative hypotheses. For example, in an especially well-designed
study,21 psychology graduate students watched a videotape of an initial
psychotherapy session. They listed questions they would like to ask the client
portrayed in the videotape and described their reasons for wanting to ask these
questions. An independent panel of psychologists coded each question according
to whether it was likely to elicit information that could confirm or disconfirm
the student’s hypothesis. The style of hypothesis testing was confirmatory 64%
of the time, neutral 21% of the time, and disconfirmatory 15% of the time. These
results, along with results from other studies, help explain why clinicians do
not routinely consider alternative hypotheses.
Cognitive heuristics are simple rules that describe how judgments are made.
Made famous by Daniel Kahneman and Amos Tversky, they describe cognitive
processes that allow us to process vast amounts of information efficiently.22
However, these same processes also lead us to make characteristic types of
mistakes. Three cognitive heuristics are discussed here: the affect,
representativeness, and availability heuristics.
The affect heuristic refers to the fact that people often make judgments and
decisions based, in part, on their feelings. ‘‘Snap judgments’’ and judgments
based on ‘‘gut instinct’’ or intuition are often described by the affect heuristic.
Kahneman believes that the formulation of the affect heuristic is ‘‘probably the
most important development in the study of judgment heuristics in the past few
decades.’’23, p. 703 But how does the affect heuristic relate to the detection of
depressive disorders in primary care settings? Whatever the reasons, in many
cases physicians’ reliance on affect and intuition does not enable them to
detect depression in these settings.
The representativeness heuristic is said to be descriptive of a clinician’s
cognitive processes when a judgment is made by deciding if a patient is
representative of a category.24 For example, when a screening instrument
indicates that a patient may be depressed and physicians must decide if
treatment for depression is required, the physicians may compare the patient
to (a) patients they have worked with who have been clinically depressed,
(b) their concept of the ‘‘typical’’ person with clinically significant depression,
or (c) a theoretical standard that serves to define clinically significant
depression. The representativeness heuristic is often descriptive of how
judgments are made in everyday life,25 and it even describes how many mental
health professionals make diagnoses.26 Because the representativeness heuristic
so often describes how people make judgments, it is likely to describe physicians
in primary care settings as well. If they are not comparing patients to
appropriate exemplars, stereotypes, or prototypes, this may explain why they
have difficulty recognizing depression.
The third heuristic, the availability heuristic, describes judgments that are
influenced by the ease with which events or particular patients can be
remembered. For example, the ease with which information is remembered can be
related to its recency or its vividness. The point to be understood here is
that memory is fallible: we are unable to remember all of the patients we have
seen. Because memory is selective, cognitive efficiency is enormously
enhanced, but learning from experience becomes difficult.
One more feature of the cognitive processes of clinicians will be described.
A major finding on clinical judgment in recent years is that causal reasoning
underlies the manner in which mental health professionals make many dif-
ferent types of judgments, including treatment decisions, predictions, and
diagnoses.27,28
With regard to treatment decisions, Witteman and Koele addressed the
following questions: ‘‘What explains which treatment is proposed to a
(depressed) patient? Is it the patient characteristics, such as her or his specific
symptoms, social context, and seriousness of the disorder, or is it the theoretical
background of the proposing psychotherapist?’’29, p. 100 For a group of 56
therapists, treatment plans were highly variable, and Witteman and Koele con-
cluded, ‘‘The best explanations of the treatment proposals seemed to be the
therapist’s theory-inspired interpretations of the patient complaints.’’29, p. 100
Causal reasoning also underlies how mental health professionals make
predictions. In one study, clinicians predicted whether patients would
become violent in the next 6 months.30 Ratings were made by mental health
professionals working in a psychiatric emergency room. Mulvey and Lidz
observed:

Clinicians did not appear to be making simple ‘‘yes’’ or ‘‘no’’ judgments of
dangerousness. Rather, they seemed to be making contextualized judgments
regarding future violence. Instead of stating whether they thought someone
was highly likely or unlikely to be involved in violence, the clinicians instead
gave what we called ‘‘conditional judgments’’ regarding future violence. . . . In
other words, they saw the violence as dependent upon certain conditions in the
person’s life.30, p. S108

Thus, clinicians will frequently make predictions by formulating case
conceptualizations.
Finally, when clinicians make diagnoses, they are influenced not only by
diagnostic criteria but also by their implicit causal theories.27,31 Clinicians
weigh diagnostic criteria more heavily when the criteria describe symptoms
and behaviors that are part of their implicit causal model of a disorder,27
even though, when using DSM, clinicians are supposed to weigh each criterion
equally. Mental health professionals’ implicit theories also influenced their
memories of their clients’ mental status: causally central symptoms were
recalled more often than causally peripheral symptoms and isolated symptoms.
In addition, false memories of a patient having symptoms the patient did not
really have were most likely to occur for symptoms that were causally central
to clinicians’ theories of different disorders.
The finding that causal reasoning underlies different types of clinical
judgments helps us understand the actions of physicians in primary care
settings. To understand the etiology and course of a patient’s physical
complaints, physicians need to understand the role that depression can play.
For some patients, vague physical complaints and complaints of fatigue and
aches and pains are highly correlated with depression and anxiety. To the
extent that physicians recognize this, they will become more adept at
detecting depression. Thus, to bring about change in primary care settings,
we must, to some degree, be concerned with the implicit causal theories of
physicians.

3. The Limits of Screening

The use of screening questionnaires can help physicians overcome some
problems but not others. Screening questionnaires can compensate for
interviews that are not comprehensive, and they can help physicians overcome
some counterproductive cognitive processes, such as diagnostic overshadowing
and confirmatory hypothesis testing. In particular, screening questionnaires
prompt physicians to consider alternative hypotheses: a result from a
screening questionnaire can lead a physician to consider whether a patient is
depressed, a hypothesis the physician might otherwise never entertain for that
patient.
Given everything we know about clinical judgment, it is somewhat sur-
prising that the use of screening questionnaires has not been related to
improved clinical outcomes. Several explanations are possible; two are
described here.
First, some patients overreport symptoms while other patients underreport
them. This can occur if a patient misunderstands an item or if the patient wants
to create an impression of being healthy or of being impaired. To the extent that
symptoms are overreported or underreported on screening instruments, we
should not expect better clinical outcomes.
Second, even with the use of screening questionnaires, physicians must still
rely on clinical judgment. Thus, if a patient tests positive for depression on a
screening instrument, physicians must rely on their clinical judgment to deter-
mine whether the patient’s responses should be viewed as indicating a need for
treatment or as a false positive. If someone is clinically depressed, physicians
will need to determine if he or she may have a bipolar disorder (and should not
be treated with an antidepressant). They must also determine if the patient is at
serious risk for suicide. If physicians are not making the right judgments when
a patient tests positive (eg, making a referral to a mental health professional,
providing treatment for depression, making a differential diagnosis of bipolar
disorder), then the use of screening questionnaires will not lead to improved
clinical outcomes. This is a challenging task for physicians, in part because
they will not receive feedback on the validity of their judgments or the utility of
their decision making and in part because they are unlikely to have specialized
training in mental health diagnosis and treatment. It is also a challenging task
because when patients complete questionnaires inquiring about mental health
symptoms, false positives are common, usually because patients (and everyone
else) will sometimes interpret items in an idiosyncratic manner.32
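Base rates are a further, purely arithmetic reason that many positive screens turn out to be false positives. The Python sketch below applies Bayes' theorem to compute the positive predictive value (PPV) of a screen, that is, the probability that a patient who screens positive is truly depressed. The sensitivity, specificity, and prevalence values are hypothetical, chosen only for illustration, and are not taken from any study cited in this chapter.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """P(depressed | positive screen), computed with Bayes' theorem."""
    true_pos = sensitivity * prevalence               # depressed and screen positive
    false_pos = (1 - specificity) * (1 - prevalence)  # not depressed but screen positive
    return true_pos / (true_pos + false_pos)


# Hypothetical screen: 85% sensitive, 80% specific (illustrative values only).
for prevalence in (0.05, 0.10, 0.20):
    ppv = positive_predictive_value(0.85, 0.80, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}")
```

With these assumed values, roughly one in three positive screens would be a true case when prevalence is 10%, which is one reason a positive result calls for clinical judgment rather than automatic treatment.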
In conclusion, we are faced with a dilemma. Clinical judgment is fallible,
and the use of screening questionnaires has not been related to improved
clinical outcomes. However, the use of screening tools should help to improve
clinical judgment, and, much of the time, an optimal strategy will be to conduct
screening and then rely on clinical judgment. Although a large body of research
describes errors in clinical judgment, such judgment can still be of considerable
value, if only to review a patient’s responses on a screening questionnaire with
the patient to better understand how he or she interpreted the items. In
addition, screening may assist underconfident clinicians in making the diagnosis
but could be unhelpful for those already skilled in making the diagnosis in
question.

References
1. Garb HN. Studying the clinician: judgment research and psychological assessment.
Washington, DC: American Psychological Association, 1998.
2. Garb HN. Clinical judgment and decision making. Annu Rev Clin Psychol.
2005;1:67–89.
3. Nichols GA, Brown JB. Following depression in primary care—Do family practice
physicians ask about depression at different rates than internal medicine physicians?
Arch Fam Med. 2000;9:478–482.
4. Miller PR, Dasher R, Collins R, et al. Inpatient diagnostic assessments: 1. Accuracy of
structured vs. unstructured interviews. Psychiatry Res. 2001;105:255–264.
5. Malone KM, Szanto K, Corbitt EM, et al. Clinical assessment versus research methods
in the assessment of suicidal behavior. Am J Psychiatry. 1995;152:1601–1607.
6. Cascardi M, Mueser KT, DeGiralomo J, et al. Physical aggression against psychiatric
inpatients by family members and partners. Psychiatr Serv. 1996;47:531–533.
7. Carr AC, Ghosh A, Ancill RJ. Can a computer take a psychiatric history? Psychol Med.
1983;13:151–158.
8. Jopp DA, Keys CB. Diagnostic overshadowing reviewed and reconsidered. Am J Ment
Retard. 2001;106:416–433.
9. Reiss S, Szyszko J. Diagnostic overshadowing and professional experience with
mentally retarded persons. Am J Ment Defic. 1983;87:396–402.
10. Mason J, Scior K. Diagnostic overshadowing amongst clinicians working with people
with intellectual disabilities in the UK. J Appl Res Intellect Disabil. 2004;17:85–90.
11. Spengler PM, Strohmer DC, Prout HT. Testing the robustness of the overshadowing
bias. Am J Ment Retard. 1990;95:204–214.
12. Drake RE, Osher FC, Noordsy DL, et al. Diagnosis of alcohol use disorders in
schizophrenia. Schizophr Bull. 1990;16:57–67.
13. Tenney NH, Schotte CKW, Denys DAJP, et al. Assessment of DSM-IV personality
disorders in obsessive-compulsive disorder: Comparison of clinical diagnosis, self-report
questionnaire, and semi-structured interview. J Personal Disord. 2003;17:550–561.
14. Garb HN. Clinical judgment, clinical training, and professional experience. Psychol
Bull. 1989;105:387–396.
15. Garb HN, Schramke CJ. Judgment research and neuropsychological assessment: a
narrative review and meta-analyses. Psychol Bull. 1996;120:140–153.
16. Choudhry NK, Fletcher RH, Soumerai SB. Systematic review: The relationship
between clinical experience and quality of health care. Ann Intern Med.
2005;142:260–273.
17. Ebling R, Levenson RW. Who are the marital experts? J Marriage Fam.
2003;65:130–142.
18. Ambady N, Rosenthal R. Thin slices of expressive behavior as predictors of
interpersonal consequences: A meta-analysis. Psychol Bull. 1992;111:256–274.
19. Gauron EF, Dickinson JK. Diagnostic decision making in psychiatry. Arch Gen
Psychiatry. 1966;14:225–232.
20. Kendell RE. Psychiatric diagnoses: A study of how they are made. Br J Psychiatry.
1973;122:437–445.
21. Haverkamp BE. Confirmatory bias in hypothesis testing for client-identified and
counselor self-generated hypotheses. J Couns Psychol. 1993;40:303–315.
22. Tversky A, Kahneman D. Judgment under uncertainty: heuristics and biases. Science.
1974;185:1124–1131.
23. Kahneman D. A perspective on judgment and choice: Mapping bounded rationality. Am
Psychol. 2003;58:697–720.
24. Kahneman D, Slovic P, Tversky A, eds. Judgment under uncertainty: Heuristics and
biases. New York: Cambridge University Press, 1982.
25. Gilovich T, Griffin D, Kahneman D, eds. Heuristics and biases. New York: Cambridge
University Press, 2002.
26. Garb HN. The representativeness and past-behavior heuristics in clinical judgment.
Prof Psychol Res Pr. 1996;27:272–277.
27. Kim NS, Ahn W. Clinical psychologists’ theory-based representations of mental
disorders predict their diagnostic reasoning and memory. J Exp Psychol Gen.
2002;131:451–476.
28. Wakefield JC, Kirk SA, Pottick KJ, et al. Disorder attribution and clinical judgment in
the assessment of adolescent antisocial behavior. Soc Work Res. 1999;23:227–238.
29. Witteman C, Koele P. Explaining treatment decisions. Psychother Res. 1999;9:100–114.
30. Mulvey EP, Lidz CW. Clinical prediction of violence as a conditional judgment. Soc
Psychiatry Psychiatr Epidemiol. 1998;33:S107–S113.
31. Pottick KJ, Kirk SA, Hsieh DK, et al. Judging mental disorder in youths: Effects of
client, clinician, and contextual differences. J Consult Clin Psychol. 2007;75:1–8.
32. Nease DE, Klinkman MS, Aikens JE. Depression case findings in primary care:
A method for the mandates. Int J Psychiatry Med. 2006;36:141–151.
7
IMPLEMENTING SCREENING AS PART OF ENHANCED CARE: SCREENING ALONE IS NOT ENOUGH

Simon Gilbody and Dan Beck

1. The Case for Screening
2. Screening and Enhanced Care for Depression
3. New and Additional Evidence Relating to Enhanced Care
4. Is Screening a Necessary Intervention to Improve the Quality and Outcome of Care?
5. To Screen or Not to Screen?

Context
There are conflicting conclusions and policy recommendations relating to the
effects of screening on the outcome of depression, but what does the latest
evidence suggest? The best available evidence to date indicates that screening
alone is not a sufficient intervention to improve the quality and outcomes of
care for depression. What is less clear is whether screening is a necessary
condition for improved quality of care and, when it is combined with additional
components, to what extent screening programs can improve the quality of
routine care.

1. The Case for Screening

Depression is the most common mental health problem and is associated with
decrements in functioning and quality of life comparable to those seen in chronic
physical diseases.1 The prevalence, chronicity, and burden of suffering are
such that the World Bank has predicted that depression will become the second
leading cause of global disability by 2020.2 The economic consequences of
depression are also profound: healthcare costs, welfare costs, and losses to
productivity amount to £9 billion ($20 billion) in the United Kingdom3 and
$53 billion in the United States.4
Depression is most commonly encountered in primary care and in hospital
settings, yet it often goes unrecognized by healthcare professionals.5–7 This has
led to calls to implement screening programs to aid in the detection and
management of this problem.8,9 The rationale and evidence base to support
screening for depression are the focus of the present book and are discussed
extensively in other chapters (see Chapters 2, 4, and 9). In the United States,
screening has shifted from an intervention that was not initially supported in
national policy recommendations10 to one that is regarded as being of proven
effectiveness.11 This evolution in thinking places screening at the center of
mental health policy and practice and rests on the general assumption that
screening will logically lead to improvements in the quality and outcome of
care. Some have termed this the screening–detection–treatment–improvement
paradigm.12,13 Recently, screening for common mental health problems in the
United States has become the cornerstone of the president’s agenda to improve
the mental health of the U.S. population.14

Arguments For and Against Screening

Screening has a long and honorable tradition in helping to improve the health
and well-being of populations and individuals.15 However, screening is a
‘‘special case’’ in the armory of healthcare interventions, since testing and
treatment may be offered to those who do not necessarily know they have a
condition or do not specifically ask for help for that problem.16 Screening
programs have also been implemented in the past without due consideration of
their effectiveness, their ethical and clinical implications, and their impact on
finite healthcare resources.17 Consequently, clear criteria have evolved that
must be satisfied before screening programs are adopted (see Chapter 5).18 In
the case of depression, screening is just one of a range of possible interventions
that might be offered to improve care for depression at a population level,19 and
the implementation of screening programs should be supported by sound
clinical and economic evidence.20 The relative merits of screening for
depression more generally have been reviewed by Gilbody and colleagues20 and by
Palmer and Coyne.13 Gilbody and colleagues used a set of analytic principles
laid down by the World Health Organization18 and adopted by the U.K.
National Screening Committee.21 In their analysis, they argued that the relative
merits of screening programs are sometimes overstated, and that convincing
evidence that screening substantially influences the outcomes of depression is
difficult to find. Among the principal concerns highlighted is that screening
for depression uncovers a substantial body of undetected psychological need
that is not currently well met within existing healthcare systems. Much of this
need represents short-term and self-limiting distress, the natural history of
which is not readily influenced by active intervention.22 In addition, the
common belief that unrecognized depression is as responsive to the
evidence-supported interventions (antidepressants and brief psychotherapy)
currently used for already recognized depression is not necessarily true:
unrecognized depression may be more difficult to treat because it tends to be
mild or atypical. Most importantly, they highlighted the relative lack of
evidence from randomized controlled trials showing that the introduction of
screening programs for depression makes any substantial difference to the
outcomes of depression itself.23 There is also a dearth of economic data to
inform this population-level policy intervention. It is this body of
epidemiologic and economic evidence that has produced the most debate and
controversy, and we review it in more detail in this chapter.
Two strategies have been scrutinized and variously rejected10,24,25 or advo-
cated.11,26,27 The first is the use of screening as a ‘‘stand-alone’’ quality
improvement strategy. The second is the use of screening within a more
general enhancement of the care for depression in non-specialist settings. Let
us examine each of these strategies in turn to establish whether screening is a
sufficient or a necessary condition for improving the quality and outcome of
care for depression.

Is Screening a Sufficient Intervention to Improve the Quality and Outcome of Care?
The effectiveness of screening for depression was first addressed with reference
to the research literature in the 1990s. The first evidence synthesis was
conducted by the U.S. Agency for Health Care Policy and Research, which
looked at the evidence to support various aspects of the management of
depression in primary care settings, including screening.28 This review exam-
ined the totality of research and came down firmly against screening. On the
basis of a review of the literature published in May 1993, the U.S. Preventive
Services Task Force (USPSTF) concluded that there was ‘‘sufficient evidence
to exclude screening for depression in the primary care setting’’ (a ‘‘grade D’’
recommendation). This research highlighted that screening instruments did not
generally improve the detection rate or management of depression. The evi-
dence they reviewed was primarily related to the use of screening programs as
a ‘‘stand-alone’’ measure.
A similar conclusion was reached in a 2001 evidence review,24 subsequently
published under the auspices of the Cochrane Collaboration (first in 200523 and
updated again in 200829,30). The most recent version of this review of
‘‘stand-alone’’ depression screening, which now includes 16 primary trials of
the effectiveness of screening strategies (more than 5,000 patients),
concluded, ‘‘There is substantial evidence that routinely administered case
finding/screening questionnaires for depression have minimal impact on the
detection, management or outcome of depression by clinicians.’’
The most important finding from the Cochrane reviews29,30 has been the
consistent demonstration that screening had minimal impact on the actual
outcomes of depression when screened populations were followed up over
time. This review concurs with the first USPSTF review,11 and Figure 7.1
summarizes the lack of effect of simple screening strategies, based on the
Cochrane review.
A review conducted at around the same time as the first Cochrane review,
to provide updated guidance to the USPSTF,11 examined a similar body of
research and found a similar lack of effect of stand-alone screening
strategies. However, this review was altogether more positive about
screening (Textbox 7.1). The reasons for this shift in recommendation by the
USPSTF deserve detailed examination; they relate to the additional
consideration of screening alongside ‘‘additional enhancements of care.’’

[Forest plot: depression outcomes as standardized mean difference (SMD) with 95% CI; negative values favor screening, positive values favor control.]

Study              SMD (95% CI)
Bergus 2005        -0.29 (-1.40, 0.82)
Callaghan 1994     -0.05 (-0.97, 0.86)
Johnstone 1976     -0.77 (-1.54, 0.00)
Lewis GHQ 1996      0.10 (-0.09, 0.29)
Lewis PRQ 1996     -0.06 (-0.25, 0.13)
Whooley 2000       -0.16 (-0.72, 0.39)
Williams 1999      -0.22 (-0.81, 0.37)
Overall            -0.03 (-0.16, 0.10)

Figure 7.1. Summary of random-effects meta-analysis of the effect of simple screening/
case-finding instruments on the outcome of depression at follow-up (adapted from
references 23, 29, and 30).
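Figure 7.1 pools the trials with a random-effects model. As a rough check, the Python sketch below applies the DerSimonian-Laird random-effects method (an assumption on our part; the figure does not name its estimator) to the rounded SMDs and 95% CIs shown in the figure, recovering each standard error from its CI width. With these inputs it gives approximately -0.03 (-0.16, 0.10), in line with the overall estimate shown. It is an illustration, not a reproduction of the Cochrane analysis, which worked from the trial data themselves.

```python
import math

# Per-study SMD and 95% CI (estimate, lower, upper), as read from Figure 7.1.
studies = {
    "Bergus 2005": (-0.29, -1.40, 0.82),
    "Callaghan 1994": (-0.05, -0.97, 0.86),
    "Johnstone 1976": (-0.77, -1.54, 0.00),
    "Lewis GHQ 1996": (0.10, -0.09, 0.29),
    "Lewis PRQ 1996": (-0.06, -0.25, 0.13),
    "Whooley 2000": (-0.16, -0.72, 0.39),
    "Williams 1999": (-0.22, -0.81, 0.37),
}

Z = 1.959964  # two-sided 95% normal quantile


def dersimonian_laird(data):
    """Random-effects pooled estimate (DerSimonian-Laird) from effects and 95% CIs."""
    effects = [est for est, _, _ in data]
    # Recover each variance from its CI width, assuming symmetric normal CIs.
    variances = [((high - low) / (2 * Z)) ** 2 for _, low, high in data]
    w = [1 / v for v in variances]                                # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                                 # between-study variance
    w_re = [1 / (v + tau2) for v in variances]                    # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, pooled - Z * se, pooled + Z * se, tau2


pooled, lo, hi, tau2 = dersimonian_laird(list(studies.values()))
print(f"Pooled SMD {pooled:.2f} ({lo:.2f}, {hi:.2f}); tau^2 = {tau2:.4f}")
```

The two Lewis comparisons, with by far the narrowest CIs, dominate the weights. This is why the pooled estimate sits so close to zero even though all five smaller trials point toward screening.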
Textbox 7.1. Current Policy Recommendations on Screening for Depression
U.K. National Institute for Clinical Excellence31
‘‘Screening should be undertaken in primary care and general hospital
settings for depression in high-risk groups—for example, those with a past
history of depression, significant physical illnesses causing disability, or
other mental health problems such as dementia.’’
Review of reviews to inform practice and policy in Australia and New Zealand32
‘‘Brief self-report instruments have acceptable psychometric properties and are
practical for use in general practice settings. Screening increases the recognition
and diagnosis of depression and, when integrated with a commitment to provide
coordinated and prompt follow up of diagnosis and treatment, clinical outcomes
are improved. Although controversial, the evidence is now in favour of the
appropriate use of screening tools in primary care.’’
U.S. Preventive Services Task Force11
‘‘The USPSTF found good evidence that screening improves the accurate
identification of depressed patients in primary care settings and that treatment
of depressed adults identified in primary care settings decreases clinical
morbidity. Trials that have directly evaluated the effect of screening on
clinical outcomes have shown mixed results. Small benefits have been
observed in studies that simply feed back screening results to clinicians.
Larger benefits have been observed in studies in which the communication
of screening results is coordinated with effective follow-up and treatment.
The USPSTF concluded the benefits of screening are likely to outweigh any
potential harms.’’
Strength of recommendation: B (‘‘there is at least fair evidence that the
intervention improves important health outcomes and that the benefits
outweigh the harms’’)
Canadian Task Force on Preventive Health Care27
‘‘The CTFPHC concludes that there is fair evidence to recommend screening
adults for depression in primary care settings since screening improves health
outcomes when linked to effective follow-up and treatment.’’
Strength of recommendation: B (‘‘there is fair evidence to recommend the
clinical preventive action’’)
‘‘The CTFPHC concludes that there is insufficient evidence to recommend
for or against screening adults for depression in primary care settings where
effective follow-up and treatment are not available.’’
Strength of recommendation: I (‘‘insufficient evidence [in quantity and/or
quality] to make a recommendation, however other factors may influence
decision-making’’)
2. Screening and Enhanced Care for Depression

The major shift between the recommendations produced in 1996 and 2003 turns
upon the change in the scope of the evidence review and the inclusion criteria
that were set.25 In contrast to earlier reviews, the USPSTF in its updated
report reviewed both stand-alone screening programs and those embedded
within enhancements of care. An example of such an enhanced care study was
that conducted by Wells and colleagues (the Partners in Care study),33 which
provided practice-level enhancements in the quality of care for depression,
including structured psychotherapy or medication management, clinician
education and consultation/liaison, treatment guidelines, and structured
follow-up. Recruitment to this trial was by screening, and the trial was
therefore considered by the USPSTF as evidence supporting the effectiveness of
screening in practice. This study showed strongly positive results on
depression outcomes and was included in a summary meta-analysis (accounting
for 33% of the overall weight of evidence). On the basis of this evidence, the
USPSTF concluded that ‘‘benefits have been observed in studies in which the
communication of screening results is coordinated with effective follow-up
and treatment.’’
A subsequent 2005 review published by the Canadian Task Force on
Preventive Health Care (CTFPHC)27 made a nearly identical recommendation,
highlighting the ineffectiveness of stand-alone screening and the effectiveness
of screening plus enhanced care. A similar recommendation was made in the
United Kingdom in guidance offered by the U.K. National Institute for Clinical
Excellence (NICE) (see Textbox 7.1).31

3. New and Additional Evidence Relating to Enhanced Care

The specific recommendations made by the USPSTF, CTFPHC and NICE
relating to screening plus enhanced care fit into a much wider body of research
relating to organizational enhancements to the process of care for depression.34
The enhancement of primary care for depression is an active area of research,
and a substantial body of research evidence now exists to show that this is an
effective intervention.35 The most recent review of this topic has included
pooled data from over 30 randomized trials, based on over 12,000 patients
with depression, and has shown that enhanced care is effective in the short and
medium term.35 The finding that enhanced or collaborative care is effective is
now a consistent one that has been supported in several independently
conducted meta-analyses (see Bower and Gilbody36 for an overview of reviews in
this area). In the aforementioned Partners in Care study, the benefits of an
enhanced care intervention persisted for up to 5 years.37 However, while
the effectiveness of enhanced care is now beyond reasonable doubt, the
USPSTF review included only four trials38–41 out of the 36 trials of enhanced
care that were summarized in the largest and most comprehensive review to
date. From these four studies, the U.S. and Canadian reports drew quite
specific conclusions about the effectiveness of screening (the topic of their
review) rather than about
the effectiveness of enhanced and collaborative care in general.25 Many studies
of enhanced care do not use screening as an entry criterion or component of
quality improvement, but these were not reviewed by the USPSTF. This is not
just of academic interest, since it is clear that many healthcare systems have
taken the positive endorsement of screening within enhancements of care as an
endorsement of screening per se. In the United Kingdom, for example, financial
inducements have been introduced to encourage primary care physicians to
screen for depression, without any requirement that further enhancements in
the quality of care are introduced.20 Clearly, the specific question about the
relative contribution of screening to the effectiveness of quality improvement
strategies is important from a policy and practice perspective. To what extent is
screening the critical component in determining the quality of depression care?

4. Is Screening a Necessary Intervention to Improve the Quality and Outcome of Care?
What remains unclear from the preceding discussion and the work of the
USPSTF is whether screening is a necessary component or condition for
effective enhanced care, and whether enhancements of care without screening
are in themselves ineffective. Recent research has begun to address this
question, which was not adequately addressed by the USPSTF11 or by the
subsequent review by the CTFPHC.27
The overall effectiveness of enhanced care for depression has most recently
been reviewed by Gilbody and colleagues,35 who found that collaborative care
strategies improved depression outcomes in the short and medium term, with
effects significant well beyond conventional thresholds. This dataset
provides a more comprehensive body of research within which to begin to
examine whether screening is a necessary ingredient of effective enhanced care
for depression.
Among enhanced care studies as a whole, the authors found a moderate
pooled standardized effect size of 0.25 for enhanced care compared to usual
care (95% confidence interval 0.18 to 0.32). They also found significant
between-study variation in the magnitude of effect size (that is,
heterogeneity). When conducting a meta-analysis, the most rigorous approach
to heterogeneity is to explore, and where possible explain, its causes.42
This approach can provide useful insights into mechanisms of effect and into
variations in treatment response according to the population under study or
the intervention under evaluation, information that is often of interest to
clinicians and policymakers charged with implementing or interpreting
research evidence. One technique that can be used is regression modeling,
in which the relationship between study-level design variables and a dependent
variable (study effect size) is examined (this is termed meta-regression42,43).
This technique was applied to the dataset of enhanced care for depression by
Bower and colleagues to identify some of the ‘‘active ingredients’’ in enhanced
care for depression.44
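To make the terms used above concrete (pooled standardized effect size, 95% confidence interval, heterogeneity), the following is a minimal Python sketch that pools the 12 clinician-referral comparisons listed in Figure 7.2 with the DerSimonian–Laird random-effects method and reports Cochran's Q and I². This is one common random-effects estimator, chosen here for illustration; the chapter does not say which estimator Bower and colleagues used. Standard errors are back-calculated from the rounded confidence intervals in the figure, assuming symmetric normal-theory intervals, so results are approximate. The names `REFERRED`, `se_from_ci` and `random_effects` are hypothetical.

```python
import math

# (study, SMD, lower 95% CI, upper 95% CI) for the 12 clinician-referral
# comparisons, transcribed from Figure 7.2.
REFERRED = [
    ("Wilkinson 1993", -0.29, -0.79, 0.22),
    ("Mann 1998", -0.08, -0.29, 0.13),
    ("Peveler 1999", 0.21, -0.11, 0.54),
    ("Akerblad 2003", 0.26, 0.07, 0.45),
    ("Brook 2003", 0.00, -0.34, 0.34),
    ("Katon 1995", 0.19, -0.12, 0.49),
    ("Katon 1996", 0.49, 0.13, 0.86),
    ("Finley 2000", -0.30, -0.83, 0.24),
    ("Hunkeler 2000", 0.28, 0.03, 0.53),
    ("Datto 2003", 0.42, -0.14, 0.98),
    ("Dietrich 2004", 0.16, -0.08, 0.39),
    ("Capoccia 2004", 0.17, -0.38, 0.72),
]

Z95 = 1.959964  # two-sided 95% standard normal quantile


def se_from_ci(lower, upper):
    """Back-calculate a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * Z95)


def random_effects(effects, ses):
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled, ci_low, ci_high, tau2, q, i2)."""
    w = [1 / s ** 2 for s in ses]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    w_re = [1 / (s ** 2 + tau2) for s in ses]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    # I^2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled - Z95 * se_pooled, pooled + Z95 * se_pooled, tau2, q, i2


effects = [row[1] for row in REFERRED]
ses = [se_from_ci(row[2], row[3]) for row in REFERRED]
pooled, lo, hi, tau2, q, i2 = random_effects(effects, ses)
print(f"pooled SMD {pooled:.2f} (95% CI {lo:.2f} to {hi:.2f}); "
      f"tau^2 = {tau2:.3f}, Q = {q:.1f} on {len(effects) - 1} df, I^2 = {100 * i2:.0f}%")
```

With these inputs the sketch gives a pooled SMD of about 0.15 (95% CI about 0.03 to 0.26), in line with the clinician-referral subtotal in Figure 7.2, and an I² in the region of 40%. A Q statistic well above its degrees of freedom is the kind of between-study variation that meta-regression then tries to explain.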
Among 34 studies, there was substantial variation in the content and
intensity of collaborative care. Some studies, such as the previously discussed
Partners in Care study,39 provided relatively intensive packages of enhanced
care, including face-to-face clinician education, computerized decision
support, individualized treatment algorithms, the active support of a nurse case
manager, and regular consultation/liaison with a specialist mental health
clinician (psychologist or psychiatrist). This study39 accounted for 30% to 47% of
the weighted information in the meta-analyses produced on behalf of the
USPSTF.11 In contrast, the collaborative care review by Gilbody and colleagues
also included less intensive packages of care that involved simple
telephone follow-up by practice nurses.45 Bower and colleagues44 used
meta-regression to examine the relative contributions of various aspects of the
content of enhanced care interventions to improving depression outcomes
within the dataset of collaborative care studies. They specified, and were able
to find sufficient study-level information on, eight aspects of care and study
design, including the method of recruitment: whether by screening or by
clinician referral of already recognized depression. Stratification according
to this variable showed that the majority of studies used screening, but that 12
collaborative care studies did not.45–55 A stratified meta-analysis according to
this variable is shown in Figure 7.2, and the methods of patient recruitment (by
screening or by other means) are detailed in Table 7.1.
From this stratified analysis, it is evident that the majority of studies were
positive, and that screening studies showed the most strongly positive effect
size (standardized mean difference [SMD] for screening studies = 0.30, 95%
confidence interval 0.21 to 0.38), while non-screening studies were still
significantly positive, but the magnitude of effect was less pronounced (SMD
for non-screening studies = 0.15, 95% confidence interval 0.03 to 0.26). When
the difference between these two effect sizes was tested using logistic
meta-regression,56 this trend was positive but nonsignificant (difference in
standardized mean differences = 0.15, 95% confidence interval –0.03 to 0.29, p = 0.09).
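The screening-versus-referral comparison can be sketched as a random-effects meta-regression on a single binary covariate (1 = recruited by screening, 0 = clinician referral), using the 36 comparisons as printed in Figure 7.2. This is a minimal Python sketch under loudly stated assumptions, not a reproduction of the published analysis. Standard errors are back-calculated from rounded confidence intervals. The residual between-study variance τ² is a method-of-moments estimate. The p-value uses a normal approximation without the permutation adjustment discussed by Higgins and Thompson.56 The chapter reports a logistic meta-regression, whereas this sketch fits an inverse-variance weighted linear regression of the SMDs, a common choice for continuous effect sizes. Its output therefore need not match the reported difference of 0.15 (p = 0.09). The names `STUDIES`, `wls` and `meta_regression` are hypothetical.

```python
import math

# (SMD, lower 95% CI, upper 95% CI, recruited by screening?) for the 36
# comparisons in Figure 7.2, transcribed in the order listed there.
STUDIES = [
    # Referred by clinician
    (-0.29, -0.79, 0.22, False), (-0.08, -0.29, 0.13, False),
    (0.21, -0.11, 0.54, False), (0.26, 0.07, 0.45, False),
    (0.00, -0.34, 0.34, False), (0.19, -0.12, 0.49, False),
    (0.49, 0.13, 0.86, False), (-0.30, -0.83, 0.24, False),
    (0.28, 0.03, 0.53, False), (0.42, -0.14, 0.98, False),
    (0.16, -0.08, 0.39, False), (0.17, -0.38, 0.72, False),
    # Identified by screening
    (0.43, -0.01, 0.87, True), (1.13, 0.79, 1.47, True),
    (0.07, -0.28, 0.42, True), (0.05, -0.48, 0.58, True),
    (0.31, 0.01, 0.61, True), (-0.14, -0.53, 0.25, True),
    (0.22, -0.02, 0.46, True), (0.30, 0.07, 0.52, True),
    (0.43, 0.22, 0.63, True), (0.22, -0.01, 0.45, True),
    (0.40, 0.31, 0.50, True), (0.11, -0.09, 0.32, True),
    (0.29, -0.05, 0.62, True), (0.20, -0.10, 0.50, True),
    (0.61, 0.08, 1.13, True), (0.18, -0.30, 0.66, True),
    (0.25, -0.37, 0.87, True), (0.19, -0.01, 0.39, True),
    (0.30, 0.07, 0.52, True), (0.33, 0.05, 0.62, True),
    (0.24, -0.03, 0.51, True), (0.41, 0.00, 0.82, True),
    (0.18, -0.11, 0.46, True), (0.82, -0.06, 1.70, True),
]

Z95 = 1.959964  # two-sided 95% standard normal quantile


def wls(y, x, w):
    """Weighted least squares of y on x with an intercept.

    Returns (intercept, slope, variance of slope), treating weights as known."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope, 1 / sxx


def meta_regression(y, se, x):
    """Random-effects meta-regression on one covariate.

    Returns (slope, slope_se, two_sided_p, tau2)."""
    v = [s * s for s in se]
    w = [1 / vi for vi in v]
    # Fixed-effect fit gives the residual heterogeneity statistic Q_E.
    a, b, _ = wls(y, x, w)
    q_e = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    # Method of moments: E[Q_E] = (k - 2) + tau^2 * trace(P), solved for tau^2.
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi ** 2 for wi in w)
    t1 = sum(wi ** 2 * xi for wi, xi in zip(w, x))
    t2 = sum(wi ** 2 * xi * xi for wi, xi in zip(w, x))
    trace_p = s0 - (s2 * t0 - 2 * s1 * t1 + s0 * t2) / (s0 * s2 - s1 ** 2)
    tau2 = max(0.0, (q_e - (len(y) - 2)) / trace_p)
    # Refit with random-effects weights 1 / (v_i + tau^2).
    _, slope, var_slope = wls(y, x, [1 / (vi + tau2) for vi in v])
    slope_se = math.sqrt(var_slope)
    p = math.erfc(abs(slope / slope_se) / math.sqrt(2))  # normal approximation
    return slope, slope_se, p, tau2


y = [s[0] for s in STUDIES]
se = [(s[2] - s[1]) / (2 * Z95) for s in STUDIES]
x = [1.0 if s[3] else 0.0 for s in STUDIES]
slope, slope_se, p, tau2 = meta_regression(y, se, x)
print(f"screening vs referral: difference in SMD {slope:.2f} "
      f"(95% CI {slope - Z95 * slope_se:.2f} to {slope + Z95 * slope_se:.2f}), "
      f"p = {p:.2f}, residual tau^2 = {tau2:.3f}")
```

Because the covariate is binary, the fitted slope is simply the difference between the weighted mean SMDs of the screening and referral groups under a shared τ². Meta-regression generalizes this to continuous or multiple study-level covariates, such as the case-manager variables discussed below.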
Of particular interest from the point of view of the present chapter was the fact
that several additional study-level variables were also related to the magnitude of
effect size in collaborative care, and that three of these predictive covariates were
either statistically significant (p < 0.05) or more significant than screening
(p < 0.1). These were better antidepressant concordance, having a trained case
manager, and regular and planned supervision of case managers.
[Forest plot of standardized depression outcomes with 95% CIs; x-axis from –1.5 to 1.5]
Study | Standardized depression outcome (95% CI)
Referred by clinician
Wilkinson 1993 | –0.29 (–0.79, 0.22)
Mann 1998 | –0.08 (–0.29, 0.13)
Peveler 1999 | 0.21 (–0.11, 0.54)
Akerblad 2003 | 0.26 (0.07, 0.45)
Brook 2003 | 0.00 (–0.34, 0.34)
Katon 1995 | 0.19 (–0.12, 0.49)
Katon 1996 | 0.49 (0.13, 0.86)
Finley 2000 | –0.30 (–0.83, 0.24)
Hunkeler 2000 | 0.28 (0.03, 0.53)
Datto 2003 | 0.42 (–0.14, 0.98)
Dietrich 2004 | 0.16 (–0.08, 0.39)
Capoccia 2004 | 0.17 (–0.38, 0.72)
Subtotal | 0.15 (0.03, 0.26)
Identified by screening
Blanchard 1995 | 0.43 (–0.01, 0.87)
Araya 2003 | 1.13 (0.79, 1.47)
Bosmans 2006 | 0.07 (–0.28, 0.42)
Callahan 1994 | 0.05 (–0.48, 0.58)
Katon 1999 | 0.31 (0.01, 0.61)
Coleman 1999 | –0.14 (–0.53, 0.25)
Wells-medication 2000 | 0.22 (–0.02, 0.46)
Simon 2000 | 0.30 (0.07, 0.52)
Katzelnick 2000 | 0.43 (0.22, 0.63)
Wells-therapy 2000 | 0.22 (–0.01, 0.45)
Unutzer 2001 | 0.40 (0.31, 0.50)
Katon 2001 | 0.11 (–0.09, 0.32)
Rost 2001b | 0.29 (–0.05, 0.62)
Rost 2001a | 0.20 (–0.10, 0.50)
Oslin 2003 | 0.61 (0.08, 1.13)
Swindle 2003 | 0.18 (–0.30, 0.66)
Rickles 2004 | 0.25 (–0.37, 0.87)
Adler 2004 | 0.19 (–0.01, 0.39)
Bruce 2004 | 0.30 (0.07, 0.52)
Simon 2004b | 0.33 (0.05, 0.62)
Katon 2004 | 0.24 (–0.03, 0.51)
Jarjoura 2004 | 0.41 (0.00, 0.82)
Simon 2004a | 0.18 (–0.11, 0.46)
Wang 2007 | 0.82 (–0.06, 1.70)
Subtotal | 0.30 (0.21, 0.38)
Overall | 0.25 (0.18, 0.32)

Figure 7.2. Enhanced care for depression: a random-effects meta-analysis of 36 studies, comparing depression outcomes at 6 months in studies that
use screening to recruit patients versus those in which clinicians recruit patients with recognized depression. (Re-analysis of data from Bower P, Gilbody
SM, Richards D, et al. Collaborative care for depression: making sense of complex interventions through systematic review and meta-regression. Br J
Psychiatry. 2006;189:484–493.)
Table 7.1. Study Details and Method of Patient Recruitment from Studies of Collaborative or Enhanced Care for Depression

Study name | Reference | Setting | Sample size | Patient population | Recruitment method
Adler 2004 | 62 | US | 533 | Adults with major depression or dysthymia (DSM-IV) | Screening of primary care attenders using the Primary Care Screener for Affective Disorders (PC-SAD)
Akerblad 2003 | 46 | Sweden | 1,031 | Adults with major depression and an indication for antidepressants | Physician referral, no screening
Araya 2003 | 63 | Chile | 240 | Women with major depression | Screening of primary care attenders using GHQ-12 (score 5 or more on two occasions)
Blanchard 1995 | 64 | UK | 96 | Elderly with depression warranting clinical intervention | Elderly nursing home residents screening positive with diagnostic depression scale (DPDS)
Brook 2003 | 47 | Netherlands | 147 | Adults with depressive complaints, prescribed new antidepressant | Physician referral, no screening
Bruce 2004 | 65 | US | 598 | Elderly with major depression, dysthymia, and minor depression | Elderly patients screening positive using the CES-D (score > 20) or responding positively to previous history of depression
Callahan 1994 | 66 | US | 175 | Elderly with newly diagnosed depression | Elderly patients screening positive using the CES-D (score > 20)
Capoccia 2004 | 48 | US | 74 | Adults with depression, prescribed a new antidepressant | Physician referral of new episode of depression, no screening
Coleman 1999 | 67 | US | 169 | Depressed frail elderly | Frail older adults who screened positive for a predictive index of hospitalization. Use of CES-D as a screening instrument integrated into chronic care clinics.
Datto 2003 | 49 | US | 61 | Adults with depressive symptoms | Physician referral of patients with depression, no screening
Dietrich 2004 | 68 | US | 405 | Adults with major depression and dysthymia (DSM-IV), starting/changing treatment | Physician referral of patients with depression, no screening as method of recruitment, but had to score SCL-20 > 0.5 at enrollment
Finley 1999 | 51 | US | 125 | Adults with current major depression, prescribed a new antidepressant | Physician referral of patients already prescribed antidepressants
Hunkeler 2000 | 52 | US | 302 | Adults with major depression or dysthymia, prescribed a new antidepressant | Physician referral of patients with a new diagnosis of depression, and prescribed antidepressant
Jarjoura 2004 | 69 | US | 121 | Adults with major depression not currently in treatment | Screening for inclusion using the PRIME-MD
Katon 1995 | 53 | US | 217 | Adults with depression, prescribed a new antidepressant | Physician referral of patients with definite or probable depression
Katon 1996 | 53 | US | 153 | Adults with depression, prescribed a new antidepressant | Physician referral of patients with definite or probable depression
Katon 1999 | 70 | US | 228 | Adults at high risk of persistent depression, recurrent depression, or dysthymia | Telephone screening using the SCID
Katon 2001 | 71 | US | 386 | Adults, prescribed a new antidepressant, at high risk of relapse | Telephone screening using the SCID
Katon 2004 | 72 | US | 329 | Adults with diabetes with depressive symptoms | Telephone screening using the PHQ-9 (score ≥ 10)
Katzelnick 2000 | 38 | US | 407 | Adults, high utilizers of services, with depressive symptoms | Two-stage telephone screening procedure with the SCID and Hamilton Depression Rating Scale
Mann 1998 | 54 | UK | 419 | Adults with depression | Primary care physician referral; patients currently with a diagnosis and in receipt of care for depression
Oslin 2003 | 73 | US | 97 | Adults with depression or dysthymia, at-risk drinking | Primary care screening with CES-D (score > 15)
Peveler 1999 | 45 | UK | 160 | Diagnosis of depression, prescribed a new antidepressant | Physician referral; patients with a new diagnosis of depression commencing antidepressant medication
Rickles 2005 | 74 | US | 63 | Prescribed a new antidepressant | Patients with a newly initiated prescription of antidepressant medication
Rost 2001 | 41 | US | 243 | Adults with major depression, prescribed a new antidepressant, recently treated | Two-stage screening procedure using WHO-CIDI administered by practice nurses
Rost 2002b | 41 | US | 189 | Adults with major depression, prescribed a new antidepressant, beginning new episode | Two-stage screening procedure using WHO-CIDI administered by practice nurses
Simon 2000 | 75 | US | 392 | Adults with depression, prescribed a new antidepressant | Patients identified from computerized records with a new diagnosis of depression and commencing antidepressant medication
Simon 2004a | 76 | US | 402 | Adults with depression, prescribed a new antidepressant | Patients identified from computerized records with a new diagnosis of depression and commencing antidepressant medication. No screening.
Simon 2004b | 76 | US | 393 | Adults with depression, prescribed a new antidepressant | Patients identified from computerized records with a new diagnosis of depression and commencing antidepressant medication. No screening.
Swindle 2003 | 77 | US | 268 | Adults with major depression, dysthymia, or partially remitted major depression | Primary care patients screening positive with the PRIME-MD
Unutzer 2001 | 78 | US | 1,801 | Elderly with major depression, dysthymia, or both | Patients screened face to face or by phone from primary care lists or attendance using CIDI
Wells 2000a | 39 | US | 867 | Adults with major depression or dysthymia | Consecutive primary care attenders screened using the CIDI
Wells 2000b | 39 | US | 932 | Adults with major depression or dysthymia | Consecutive primary care attenders screened using the CIDI
Whooley 2000 | 40 | US | 331 | Elderly with depressive symptoms | Consecutive elderly primary care attenders screened using the GDS (score ≥ 6)
Wilkinson 1993 | 55 | UK | 61 | Adults with depression, prescribed a new antidepressant | Physician referral of patients with already diagnosed depression

Adapted from Gilbody SM, House AO, Sheldon TA. Screening and case-finding instruments for depression: a Cochrane systematic review and exploration of heterogeneity.
CMAJ. 2008;178:1023–1024; and Gilbody S, Bower P, Fletcher J, et al. Collaborative care for depression: a cumulative meta-analysis and review of longer-term outcomes.
Arch Intern Med. 2006;166:2314–2321.
The review by Bower and colleagues44 provides a richer and more complete
dataset than the USPSTF review within which to examine the relative
contribution of screening to the effectiveness of enhanced care. However, their
approach has several limitations. The most important is that, despite the use
of randomized studies, the exploratory comparison within a meta-regression is
an observational one and is therefore susceptible to confounding (alternative
explanations for observed effects and relationships).56 In this case, the use
of screening could be confounded by other design-level variables (such as
increased intensity of care). Bower and colleagues44 sought to address this
limitation by conducting a multivariate analysis of these data to adjust for
other potentially confounding covariates. They found that several of the
positive associations found in univariate meta-regression (such as those
highlighted above) ceased to be significant in the multivariate analysis. The
only study-level variable that remained significant after adjusting for other
potentially confounding variables was the mental health background of the
case manager (p = 0.03). Screening, in contrast, became less significant
(p = 0.19) when other variables were accounted for.
The most likely conclusion that can be drawn from this analysis is that the
effect of screening is weak and is potentially confounded by other study-level
variables. Screening as a recruitment strategy is not therefore likely to be an
independently significant predictor of the effectiveness of enhanced care
strategies. One might go further and suggest that good-quality collaborative
care is likely to be effective, whether or not screening is used.

5. To Screen or Not to Screen?

Despite the apparently differing conclusions and policy recommendations
relating to screening for depression, an evidence-based consensus seems to
emerge that screening on its own is an ineffective strategy. This conclusion
should not be surprising, since the quality of care for depression is often
poor57,58 and the addition of screening is likely only to identify an unmet need
without offering anything to improve the management and outcome of this
condition. It has been argued elsewhere that screening identifies a
qualitatively different population of people with depression from those who
are already identified and managed in primary care (what Goldberg calls
‘‘conspicuous psychiatric morbidity’’59). The people identified by screening
programs tend to have less severe psychopathology and a better outcome; they
are generally reluctant to take antidepressants and may be less likely to
benefit from medical or psychosocial interventions (see Palmer and Coyne13 for
review).
Low expectations and poor outcomes of screening strategies have led to a
more fundamental rethinking of how care for depression is organized and
delivered.58 A direct result of the failure of the
screening–detection–treatment–improvement paradigm12 has been the emergence
of organizational enhancements of care, such as collaborative care.60,61
The conclusion that should be drawn from the re-analysis of existing studies
of collaborative care in the present chapter is that this strategy is generally
effective, but the assumption that screening is a key element of effective
enhancement might not be true. This is not a minor epidemiologic issue of
causal inference and confounding, but one that matters to practitioners and
policymakers. Questions about the relative importance of screening in quality
enhancement matter for two main reasons. Firstly, policymakers have readily
picked up on the positive endorsement of screening from bodies such as the
USPSTF and NICE without reading
the small print. Quality enhancement strategies have sometimes begun and
ended with screening, without the implementation of wider enhancements of
care. Screening is a quick and easy policy to implement, measure, and reward.
The experience in the United Kingdom is that screening and case-finding are
financially rewarded without any explicit requirement that the process of care
be improved any further.20 Secondly, for those who do choose to follow the
evidence and implement collaborative care, there are many decisions that need
to be made in the design of effective care systems. The use of screening as a
point of entry to enhanced care raises a number of ethical and logistical
issues.13 Screening usually identifies an unmet need and creates an increased
demand for care. If this demand is not met, screening itself might do more harm
than good. Services will have to be planned accordingly to meet this need (and
expectation of care) from within finite healthcare resources.
Ultimately, the most thorough way in which the effectiveness of screening
as a necessary or active component of enhanced care could be established
would be through the conduct of a randomized controlled trial of enhanced care
with screening, versus identical enhanced care without screening. To date (and
to our knowledge) there are no such trials, and it is debatable whether any such
trial will ever be conducted. In the interim, it is clear that screening is not a
sufficient intervention to improve the quality and outcomes of care for
depression. What is less clear is whether screening is a necessary condition for
enhanced and improved quality of care for this important condition.

References
1. Wells KB, Stewart A, Hays RD, et al. The functioning and well-being of depressed
patients. Results from the Medical Outcomes Study. JAMA. 1989;262(7):914–919.
2. Murray CJ, Lopez AD. The global burden of disease: a comprehensive assessment of
mortality and disability from disease, injuries and risk factors in 1990. Boston: Harvard
School of Public Health on behalf of the World Bank, 1996.
3. Thomas C, Morris S. Cost of depression among adults in England in 2000. Br J
Psychiatry. 2003;183:514–519.
4. Greenberg PE, Kessler RC, Birnbaum HG, et al. The economic burden of depression in
the United States: how did it change between 1990 and 2000? J Clin Psychiatry.
2003;64:1465–1475.
5. Cepoiu M, McCusker J, Cole MG, et al. Recognition of depression by non-psychiatric
physicians—a systematic literature review and meta-analysis. J Gen Intern Med.
2008;23:25–36.
6. Simon G, Von Korff M. Recognition and management of depression in primary care.
Arch Fam Med. 1995;4:99–105.
7. Katon W, Ciechanowski P. Impact of major depression on chronic medical illness. J
Psychosom Res. 2002;53:859–863.
8. Wright A. Should general practitioners be testing for depression? Br J Gen Pract.
1994;44:132–135.
9. Sharp LK, Lipsky MS. Screening for depression across the lifespan: a review of
measures for use in primary care settings. Am Fam Physician. 2002;66:1001–1008.
10. U.S. Preventive Services Task Force. Guide to clinical preventive services, 2nd ed.
Alexandria, VA: International Medical Publishing, 1996.
11. Pignone MP, Gaynes BN, Rushton JL, et al. Screening for depression in adults: a
summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern
Med. 2002;136:765–776.
12. Klinkman MS, Coyne JC, Gallo S, et al. False positives, false negatives and the validity
of the diagnosis of major depression in primary care. Arch Fam Med.
1998;7:451–461.
13. Palmer SC, Coyne JC. Screening for depression in medical care: pitfalls, alternatives,
and revised priorities. J Psychosom Res. 2003;54(4):279–287.
14. New Freedom Commission on Mental Health. Achieving the promise: transforming
mental health care in America—final report. Rockville, MD: DHHS Pub. No. SMA-
03–3832, 2003.
15. Cochrane AL, Holland WW. Validation of screening procedures. Br Med Bull.
1971;27:3–8.
16. Mant D, Fowler G. Mass screening: theory and ethics. BMJ. 1990;300:916–918.
17. Stewart-Brown S, Farmer A. Screening could seriously damage your health. BMJ.
1997;314:533–534.
18. Wilson JMG, Jungner G. Principles and practice of screening for disease: World Health
Organization Public Health Paper 34. Geneva: World Health Organization, 1968.
19. Gilbody S, Whitty P, Grimshaw JG, et al. Improving the recognition and management
of depression in primary care. Effective Health Care Bulletin, University of York.
2002;7(5).
20. Gilbody S, Sheldon T, Wessely S. Should we screen for depression? BMJ.
2006;332(7548):1027–1030.
21. National Screening Committee. The UK National Screening Committee’s Criteria for
appraising the viability, effectiveness and appropriateness of a screening programme
(available at http://www.nsc.nhs.uk/pdfs/criteria.pdf). London: HMSO, 2003.
22. Oxman TE, Sengupta A. Treatment of minor depression. Am J Geriatr Psychiatry.
2002;10:256–264.
23. Gilbody SM, House AO, Sheldon TA. Screening and case finding for depression. The
Cochrane Library (Issue 4). Chichester: Wiley Publishing, 2005.
24. Gilbody SM, House AO, Sheldon TA. Routinely administered questionnaires for
depression and anxiety: a systematic review. BMJ. 2001;322:406–409.
25. Coyne JC, Palmer SC, Sullivan PA. Screening for depression in adults. Ann Intern Med.
2003;138(9):767–768.
26. AHCPR Depression Guideline Panel. Depression in primary care: detection, diagnosis,
and treatment. Technical report. Number 5. Rockville, MD: US Department of Health
and Human Services, Public Health Service, 2000.
27. MacMillan HL, Patterson CJS, Wathen CN, and The Canadian Task Force on
Preventive Health Care. Screening for depression in primary care: recommendation
statement from the Canadian Task Force on Preventive Health Care. CMAJ.
2005;172(1):33–35.
28. Agency for Health Care Policy Research. Depression in primary care. Washington DC:
US Department of Health and Human Services, 1993.
29. Gilbody SM, House AO, Sheldon TA. Screening and case-finding instruments for
depression: a Cochrane systematic review and exploration of heterogeneity. CMAJ.
2008;178:1023–1024.
30. Beck D, Gilbody SM. Screening and case finding for depression. The Cochrane Library
(Issue 4). Chichester: Wiley Publishing, 2008.
31. National Institute for Clinical Excellence. Depression: core interventions in the
management of depression in primary and secondary care. London: HMSO, 2004.
32. Hickie IB, Davenport TA, Ricci CS. Screening for depression in general practice and
related medical settings. Med J Aust. 2002;177(7 Suppl):S111–S116.
33. Wells KB. The design of Partners in Care: evaluating the cost effectiveness of improving
care for depression in primary care. Soc Psychiatry Psychiatr Epidemiol. 1999;34:20–29.
34. Gilbody S, Whitty P, Grimshaw J, et al. Educational and organizational interventions to
improve the management of depression in primary care: a systematic review. JAMA.
2003;289:3145–3151.
35. Gilbody S, Bower P, Fletcher J, et al. Collaborative care for depression: a cumulative
meta-analysis and review of longer-term outcomes. Arch Intern Med. 2006;166:2314–
2321.
36. Bower P, Gilbody S. Managing common mental health disorders in primary care:
conceptual models and evidence base. BMJ. 2005;330:839–842.
37. Wells K, Sherbourne C, Schoenbaum M, et al. Five-year impact of quality improvement
for depression: results of a group-level randomized controlled trial. Arch Gen
Psychiatry. 2004;61:378–386.
38. Katzelnick DJ, Simon GE, Pearson SD, et al. Randomized trial of a depression
management program in high utilizers of medical care. Arch Fam Med. 2000;9:345–
351.
39. Wells KB, Sherbourne C, Schoenbaum M, et al. Impact of disseminating quality
improvement programs for depression in managed primary care: a randomized
controlled trial. JAMA. 2000;283:212–220.
40. Whooley MA, Stone B, Soghikian K. Randomized trial of case-finding for depression
in elderly primary care patients. J Gen Intern Med. 2000;15:293–300.
41. Rost K, Nutting PA, Smith J, et al. Improving depression outcomes in community
primary care practice: a randomised trial of the QuEST intervention. J Gen Intern Med.
2001;16:143–149.
42. Thompson S. Why sources of heterogeneity in meta-analysis should be investigated. In:
Chalmers I, Altman DG, eds. Systematic reviews. London: BMJ, 1995.
43. Thompson SG, Higgins JP. How should meta-regression analyses be undertaken and
interpreted? Stat Med. 2002;21:1559–1573.
44. Bower P, Gilbody SM, Richards D, et al. Collaborative care for depression: making
sense of complex interventions through systematic review and meta-regression. Br J
Psychiatry. 2006;189:484–493.
45. Peveler R, George C, Kinmonth AL, et al. Effect of antidepressant drug counselling and
information leaflets on adherence to drug treatment in primary care: randomised
controlled trial. BMJ. 1999;319:612–615.
46. Akerblad AC, Bengtsson F, Ekselius L, et al. Effects of an educational compliance
enhancement programme and therapeutic drug monitoring on treatment adherence in
depressed patients managed by general practitioners. Int Clin Psychopharmacol.
2003;18:347–354.
47. Brook O, van Hout H, Nieuwenhuyse H, et al. Impact of coaching by community
pharmacists on drug attitude of depressive primary care patients and acceptability to
patients; a randomized controlled trial. Eur Neuropsychopharmacol. 2003;13:1–9.
48. Capoccia K, Boudreau D, Blough D, et al. Randomized trial of pharmacist interventions
to improve depression care and outcomes in primary care. Am J Health Syst
Pharm. 2004;61:364–372.
49. Datto CJ, Thompson R, Horowitz D, et al. The pilot study of a telephone disease
management program for depression. Gen Hosp Psychiatry. 2003;25:169–177.
50. Dietrich AJ, Oxman TE, Williams JW Jr, et al. Going to scale: re-engineering systems
for primary care treatment of depression. Ann Fam Med. 2004;2(4):301–304.
51. Finley P, Rens H, Gess S, et al. Case management of depression by clinical pharmacists
in a primary care setting. Formulary. 1999;34:864–870.
52. Hunkeler EM, Meresman JF, Hargreaves WA, et al. Efficacy of nurse telehealth care
and peer support in augmenting treatment of depression in primary care. Arch Fam
Med. 2000;9:700–708.
53. Katon W, Robinson P, Von Korff M, et al. A multifaceted intervention to improve
treatment of depression in primary care. Arch Gen Psychiatry. 1996;53(10):924–932.
54. Mann A, Blizard R, Murray J. An evaluation of practice nurses working with general
practitioners to treat people with depression. Br J Gen Pract. 1998;48:875–879.
55. Wilkinson G, Allen P, Marshall E. The role of the practice nurse in the management of
depression in general practice: treatment adherence to antidepressant medication.
Psychol Med. 1993;23:229–237.
56. Higgins JPT, Thompson SG. Controlling the risk of spurious findings from meta-
regression. Stat Med. 2004;23:1663–1682.
57. Katon W, von Korff M, Lin E, et al. Adequacy and duration of antidepressant treatment
in primary care. Med Care. 1992;30:67–76.
58. Katon W, Von Korff M, Lin E, et al. Population-based care of depression: effective
disease management strategies to decrease prevalence. Gen Hosp Psychiatry.
1997;19:169–178.
59. Goldberg D. The detection of psychiatric illness by questionnaire. Oxford: Oxford
University Press, 1972.
60. Simon G. Collaborative care for depression. BMJ. 2006;332:249–250.
61. Unutzer J, Schoenbaum M, Druss BG, et al. Transforming mental health care at the
interface with general medicine: report for the President’s Commission. Psychiatr Serv.
2006;57:37–47.
62. Adler DA, Bungay KM, Wilson IB, et al. The impact of a pharmacist intervention on
6-month outcomes in depressed primary care patients. Gen Hosp Psychiatry.
2004;26(3):199–209.
63. Araya R, Rojas G, Fritsch R, et al. Treating depression in primary care in low-income
women in Santiago, Chile: a randomised controlled trial. Lancet. 2003;361:995–1000.
64. Blanchard MR, Waterreus A, Mann AH. The effect of primary care nurse
intervention upon older people screened as depressed. Int J Geriatr Psychiatry.
1995;10:289–298.
65. Bruce M, Ten Have T, Reynolds C, et al. Reducing suicidal ideation and depressive
symptoms in depressed older primary care patients. JAMA. 2004;291(9):1081–1091.
66. Callahan C, Hendrie H, Dittus R, et al. Improving treatment of late life depression in
primary care: a randomized clinical trial. J Am Geriatr Soc. 1994;42:839–846.
67. Coleman EA, Grothaus LC, Sandhu N, et al. Chronic care clinics: a randomized
controlled trial of a new model of primary care for frail older adults. J Am Geriatr
Soc. 1999;47:775–783.
68. Dietrich AJ, Oxman TE, Williams JW, et al. Re-engineering systems for the treatment
of depression in primary care: cluster randomised controlled trial. BMJ.
2004;329:602–609.
69. Jarjoura D, Polen A, Baum E, et al. Effectiveness of screening and treatment for
depression in ambulatory indigent patients. J Gen Intern Med. 2004;19(1):78–84.
70. Katon W, Von Korff M, Lin E, et al. Stepped collaborative care for primary care
patients with persistent symptoms of depression: a randomized trial. Arch Gen
Psychiatry. 1999;56:1109–1115.
71. Katon W, Rutter C, Ludman EJ, et al. A randomized trial of relapse prevention of
depression in primary care. Arch Gen Psychiatry. 2001;58:241–247.
72. Katon WJ, Von Korff M, Lin EHB, et al. The Pathways Study: a randomized trial of
collaborative care in patients with diabetes and depression. Arch Gen Psychiatry.
2004;61:1042–1049.
73. Oslin D, Sayers S, Ross J, et al. Disease management for depression and at risk drinking
via telephone in an older population of veterans. Psychosom Med. 2003;65:931–937.
74. Rickles N, Svarstad BL, Statz-Paynter JL, et al. Pharmacist telemonitoring of
antidepressant use: effects on pharmacist–patient collaboration. J Am Pharm Assoc.
2005;45:344–353.
75. Simon G, Von Korff M, Rutter C, et al. Randomised trial of monitoring, feedback and
management of care by telephone to improve treatment of depression in primary care.
BMJ. 2000;320:550–554.
76. Simon GE, Ludman EJ, Tutty S, et al. Telephone psychotherapy and telephone care
management for primary care patients starting antidepressant treatment: a randomized
controlled trial. JAMA. 2004;292(8):935–942.
77. Swindle R, Rao J, Helmy A, et al. Integrating clinical nurse specialists into the
treatment of primary care patients with depression. Int J Psychiatry Med.
2003;33(1):17–37.
78. Unutzer J, Katon W, Williams J, et al. Improving primary care for depression in late
life: the design of a multicenter randomized trial. Med Care. 2001;39(8):785–799.
8
TECHNOLOGICAL APPROACHES TO SCREENING AND CASE FINDING FOR DEPRESSION

William H. Rogers, Debra Lerner, and David A. Adler

1. Technological Methods of Screening for Depression
2. Ten Issues When Developing Computerized Screening for Depression
3. Examples of Implementation of Computerized Screening for Depression
4. Discussion
5. Conclusion

Context
What are the strengths and weaknesses of computer-based and other automated methods of detecting depression? Two promising technologies make use of the Internet and speech recognition. Whatever technology is used, each method needs to be assessed rigorously using the same high standards that have been applied to pencil-and-paper tests.

We are in the midst of a technological revolution that inevitably will transform psychiatric clinical practice. A consensus for routine depression screening is building,1,2 and at the same time methods by which it could be accomplished are emerging. The hope is that the right technology can provide an easy, inexpensive, valid, and reliable public health approach to depression screening.
Computerized assessment is well accepted in diverse fields, and the use of Internet-based survey technology has grown exponentially.3–7 The strengths and limitations of computerized assessments are addressed regularly in the literature.3–11 For example, such assessments have been shown to improve data quality while reducing both cost and the time needed to score, analyze, and report results. Increasingly, as depressive disorders have been recognized as highly prevalent with significant morbidity, multiple screeners using an array of technological advances have been developed2,12–33 (Table 8.1 lists selected studies).34–49
This chapter will review the technologies that are currently available for
automated depression screening and will discuss them in terms of criteria that
should dictate their adoption.

1. Technological Methods of Screening for Depression

The growing list of technologies can be classified on several dimensions. Perhaps the most important of these is adaptive vs. non-adaptive. In an adaptive technology, pioneered by the Educational Testing Service,50 a computer uses a preprogrammed algorithm to decide which question to ask next, given the responses so far.3,9,48,49,51–55 Paper-and-pencil is the classical non-adaptive technology: everyone gets the same paper with the same questions in the same order.
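To make the adaptive/non-adaptive distinction concrete, here is a minimal Python sketch of item-level branching. It assumes a hypothetical item bank identified only by labels (`mood`, `interest`, and so on), a 0–3 response scale, and a hypothetical gate threshold of 3 on the first two items; none of these come from a specific instrument. Full computer-adaptive testing of the kind developed for educational testing usually selects items using item response theory, which this sketch does not attempt.

```python
from typing import Callable, Dict, List

# Hypothetical item labels; a real screener would use the wording of a validated instrument.
GATE_ITEMS: List[str] = ["mood", "interest"]
FOLLOW_UP_ITEMS: List[str] = [
    "sleep", "energy", "appetite", "self_worth",
    "concentration", "psychomotor", "self_harm",
]
GATE_THRESHOLD = 3  # hypothetical cut point on the summed gate items (each scored 0-3)


def administer_adaptive(ask: Callable[[str], int]) -> Dict[str, int]:
    """Ask the gate items, then decide from those answers whether to continue."""
    answers: Dict[str, int] = {item: ask(item) for item in GATE_ITEMS}
    # Branching rule: a low gate score ends the session early.
    if sum(answers.values()) < GATE_THRESHOLD:
        return answers
    for item in FOLLOW_UP_ITEMS:
        answers[item] = ask(item)
    return answers


def administer_fixed(ask: Callable[[str], int]) -> Dict[str, int]:
    """Non-adaptive (paper-and-pencil style): every respondent gets every item in order."""
    return {item: ask(item) for item in GATE_ITEMS + FOLLOW_UP_ITEMS}


if __name__ == "__main__":
    low_scorer = {"mood": 0, "interest": 1}
    print(len(administer_adaptive(lambda item: low_scorer.get(item, 0))))  # 2
    print(len(administer_adaptive(lambda item: 2)))  # 9
    print(len(administer_fixed(lambda item: 0)))  # 9
```

The `ask` callback stands in for whatever modality presents the question (web page, IVR prompt, speech recognizer), so the same branching logic can sit behind any of them.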
Technological modality is a second dimension. Currently available technologies include the phone, the Internet, and hand-held electronic devices.5 Phone methods can be split into several groups, including agent-administered computer-assisted telephone interviewing (CATI), speech recognition, and touch-tone. Phone screening can also be classified as inbound (the patient initiates the call to a toll-free number) or outbound (the system initiates the call). Hand-held devices could include tablets such as personal digital assistants, game consoles, modern cell phones, or "electronic paper." Internet-based screeners (eg, Patient Health Questionnaire-9 [PHQ-9], Zung Self-Rating Depression Scale)13,20 can be implemented through standard web browsers, at public kiosks, or through connected hand-held devices. In this chapter, all of these methods are classified together under the term "Internet" because they follow a common approach of visually presenting the screener or monitoring instrument and taking responses by interaction with that visual image. There is always a computer involved in presenting the questions and recording the responses.
One can even envision the day when more futuristic technologies such as
eye-tracking equipment, brain scans, blood tests, or electrical system monitors
for depression will be available.
Two basic premises underlie our discussion:
1. There is no foolproof methodology. No single technology guarantees success, but some technologies have inherent weaknesses.
2. Implementation and circumstances matter. A technology that performs
well in one setting (eg, Internet screening at home) may be unacceptable
in another (automated screening on a desktop computer in a physician’s
waiting room). In the current marketplace, there are no full-service
automated systems that are embedded in an electronic medical record.
Table 8.1. Technological Methods of Depression Screening: Summary of Studies

Mental Health-Based Studies

  Technological method: Computer voice recognition: VIDAS
  Author/Publication: Gonzalez (2007), Hisp J Behav Sci
  Sample/Setting: English- and Spanish-speaking patients, n = 217
  Accuracy of computerized method: CES-D 20, alpha = 0.87–0.91 (visual computer/written); CES-D vs. BDI-2: r = 0.74–0.86; ROC of CES-D (cut point of 16) vs. CIDI-SF: Se: 0.88–1.0; Sp: 0.42–0.20; PPV: 0.61–0.28; NPV: 0.77–1.0
  Comment: Computerized CES-D by speech recognition vs. written version acceptable in both English and Spanish speakers; visual somewhat better than aural

  Technological method: Computer vs. IVR telephone
  Author/Publication: Kobak (1997), Psychiatr Serv
  Sample/Setting: CMHC, n = 51
  Accuracy of computerized method: PRIME-MD IVR/desktop vs. SCID-IV for MDD: kappa 0.49/0.27; Se: 0.77/0.77; Sp: 0.75/0.50; PPV: 0.87/0.77; NPV: 0.77/0.69; similar prevalence rates
  Comment: IVR and desktop versions of PRIME-MD compared with phone SCID-IV, Ham-D-17, and chart diagnosis; both acceptable

  Technological method: Computer voice recognition
  Author/Publication: Munoz (1999), J Consult Clin Psychol
  Sample/Setting: Women’s health clinic, n = 104 English- and Spanish-speaking women
  Accuracy of computerized method: CES-D 20 and MDD screener (18 DIS mood questions): kappa = 0.82/0.89 for current/lifetime MDD, computer vs. interview; kappa = 0.81/0.75, computer vs. interview of MDD vs. PRIME-MD; Se: 0.89/0.91 current/lifetime MDD; Sp: 0.93/0.91 current/lifetime MDD
  Comment: Voice-recognition administration of the CES-D and MDD screener, compared with clinician interview of both plus PRIME-MD, yielded comparable results

Population-Based Studies

  Technological method: IVR using telephone keypad
  Author/Publication: Baer (1995), JAMA
  Sample/Setting: Midwest university and NE high-tech firm; n = 1,812; 1,597/1,812 Zung completers
  Accuracy of computerized method: Zung SDS-20 found acceptable by subjects
  Comment: No direct comparison with other forms of screening

  Technological method: Computer touchscreen
  Author/Publication: Lin (2007), BMC Psychiatry
  Sample/Setting: Taiwanese volunteers, n = 579
  Accuracy of computerized method: ISP-D for MDD vs. MINI (n = 55): kappa 0.80; Se: 0.82; Sp: 0.73; PPV: 0.67; NPV: 0.86
  Comment: Internet-based Self-assessment Program for Depression (ISP-D) is a reliable and valid online tool for assessing depression, with excellent retest reliability

  Technological method: Computer
  Author/Publication: Patton (1999), Soc Psychiatry Psychiatr Epidemiol
  Sample/Setting: Australian HS students, n = 2,032; 65 of 1,729 completers with MDD
  Accuracy of computerized method: Computerized CIS-R vs. live CIDI 2–9 weeks later: Se: 0.97; Sp: 0.18; PPV: 0.49; NPV: 0.91
  Comment: Students favorable to computer

Medical-Based Studies

  Technological method: Computer touchscreen
  Author/Publication: Allenby (2002), Eur J Cancer Care
  Sample/Setting: Australian ambulatory oncology center; n = 450, median age 61
  Accuracy of computerized method: BDI-2, Cancer Needs Questionnaire, EORTC QLQ-C30
  Comment: No direct comparison with other forms of screening; acceptable to patients

  Technological method: Computer touchscreen
  Author/Publication: Bliven (2001), Quality of Life Research
  Sample/Setting: Cardiac OPD, n = 55
  Accuracy of computerized method: SF-36 (8 subscales) and Seattle Angina Questionnaire; SF-36 computer/written r = 0.54–0.76; SF-MH mean scale score computer/written: 66.19/65.77, r = 0.54
  Comment: Computer compared with written; 82% preferred computer

  Technological method: Computer touchscreen
  Author/Publication: Cull (2001), Br J Cancer
  Sample/Setting: Outpatient chemotherapy patients, n = 172
  Accuracy of computerized method: MHI-5 > 10, Hospital Anxiety and Depression Scale (HADS) > 8, computer vs. PSE diagnosis of MDD: Se: 0.85; Sp: 0.71; PPV: 0.47; NPV: 0.26
  Comment: Two screeners (HADS and MHI-5) administered 2–4 weeks apart, compared with an in-person Present State Exam (PSE) interview within a week

  Technological method: Computer touchscreen
  Author/Publication: Kurt (2004), Comput Methods Programs Biomed
  Sample/Setting: Patients > 65, PCP office; n = 240; 68/240 participated
  Accuracy of computerized method: CES-D 20 (or 35) and Geriatric Depression Scale (GDS), computer/written: baseline reliability 0.74/0.72; follow-up reliability 0.61/0.83
  Comment: Patients favorable to computer

  Technological method: Computer touchscreen
  Author/Publication: Sharpe (2004), Br J Cancer
  Sample/Setting: Cancer center, n = 5,613; 891/3,938 HADS completers scored > 14; 196/570 interviewed had MDD
  Accuracy of computerized method: Comparison of Hospital Anxiety and Depression Scale (HADS) with DSM-IV SCID clinician telephone interview
  Comment: No direct comparison with other forms of screening
2. Ten Issues When Developing Computerized Screening for Depression
With this in mind, we now consider the issues that arise regarding the use of
automated screeners in general and depression-monitoring instruments
specifically.

Quality Control and Accuracy

The first question raised in any discussion of automation is its accuracy. Technology-based methods are applied more consistently, which yields more comparable and interpretable data.3,6,17,20,47,56–66 No interviewer bias is introduced. Clinician interviews and agent-administered phone CATI depend on a human being, and a clinician or an agent speaks and listens differently every time. Paper-and-pencil screeners, as well as automated electronic surveys, eliminate this source of variation. If this advantage is pursued, agreement with known standards can be improved beyond what is possible with a clinician or agent. The technology already exists, but accuracy rests on the craftsmanship of the instrument (eg, inaccurate or poorly designed programming will result in poor-quality data).

Error Control
Evidence to date is that different data collection methods do not change the probability that an answer is recorded as intended.7 In paper-and-pencil screeners, respondents can make stray marks that scanners cannot easily interpret; these can be reduced to acceptable levels by providing clear instructions, with examples, on how to make marks. In speech recognition systems, respondents can speak responses outside the answer set, but this can be reduced by phrasing questions so that they prompt an in-range response and by challenging responses that do not seem to be within range.36 For both of these systems, human review of questionable responses after they are recorded is desirable: scanners can flag stray marks, and voice recognizers can flag problematic voice input. With these measures, very low error rates are possible (eg, over 99.5% of responses recorded correctly). Without them, error rates are still low, but errors do occur (eg, over 98% correct). Numerous data-processing companies report accuracy within these ranges or better.
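The range checks and human review described above can be sketched in Python as follows, together with the arithmetic linking per-item accuracy to whole-form accuracy. The 0–3 answer set, the item labels, and the nine-item form length are assumptions for illustration, and treating errors on different items as independent is a simplifying assumption.

```python
from typing import Dict, List, Optional, Tuple

VALID_RESPONSES = {0, 1, 2, 3}  # hypothetical answer set for each item


def triage_responses(
    raw: Dict[str, Optional[int]],
) -> Tuple[Dict[str, int], List[str]]:
    """Keep in-range answers; flag missing or out-of-range ones for human review."""
    accepted: Dict[str, int] = {}
    flagged: List[str] = []
    for item, value in raw.items():
        if value in VALID_RESPONSES:
            accepted[item] = value
        else:
            flagged.append(item)  # e.g., an unreadable mark or an unrecognized utterance
    return accepted, flagged


def prob_form_error_free(per_item_accuracy: float, n_items: int) -> float:
    """Probability that every item on a form is recorded correctly,
    assuming errors on different items occur independently."""
    return per_item_accuracy ** n_items


if __name__ == "__main__":
    accepted, flagged = triage_responses({"mood": 2, "sleep": None, "energy": 7})
    print(accepted, flagged)  # {'mood': 2} ['sleep', 'energy']
    print(round(prob_form_error_free(0.995, 9), 3))  # 0.956
    print(round(prob_form_error_free(0.98, 9), 3))  # 0.834
```

Under these assumptions, a nine-item form is recorded entirely correctly about 96% of the time at 99.5% per-item accuracy but only about 83% of the time at 98%, which is one way to see why the review step matters.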
Nominal error rates for touch-tone and for the Internet and related technologies, such as kiosks and hand-held devices, are low because these systems enforce a single answer. However, this does not mean that such devices are free of error. Error rates on the Internet are low if the respondent can see all the responses and no default choices are premarked; under these conditions, several studies have found that Internet surveys and mail are equivalent.67 If the respondent has to click a mouse to see all the responses, the results will be biased. For touch-tone interactive voice response (IVR), elderly respondents and those whose touch-tone buttons are in the handset are likely to have high error rates, but these errors cannot be identified without a very laborious review of each response, a practice suitable for banking but not for screening questionnaires. Touch-tone also invites cognitive errors, because verbal responses must be converted to numbers before they can be entered. Most studies have concluded that touch-tone is not equivalent to mail.67

Honesty
Research has repeatedly shown that respondents, including those with depression, are more honest with computers or mail than with live interviewers, which translates into better accuracy.59,60,64,68–70

Physical Clues
Conversely, human interpreters, and especially clinician interviewers, are best at recognizing clues such as crying, gaps in speech, or slurred, sped-up, or slowed speech that might have important implications for the screening process.4 Voice recognition systems could also be trained to detect these, but to our knowledge this has not yet been done, and such a system is unlikely ever to be as good as trained clinicians meeting with depressed individuals.

Performance
Case-specific performance data are key to successful use of an automated
system, given the potential time savings.7,20 Physicians can use the results
most efficiently if patient-specific reports of positive predictive value (PPV)
and negative predictive value (NPV) are included. In one of the few studies
addressing depression, Kobak and colleagues,20 using a computerized PRIME-MD, reported a
PPV of 0.87 and a sensitivity for touch-tone and IVR of 0.84 to 0.88. The cost
of untreated depression is high, particularly among employed patients,71–74 so
automated screening will normally be cost-effective compared with the haphazard approach characteristic of population screening. If the screener cannot
find cases (poor sensitivity or low NPV), then other case-finding tools may
need to be used anyway.
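Sensitivity, specificity, PPV, and NPV all come from the same 2×2 cross-tabulation of screener results against a reference diagnosis. The following Python sketch shows the standard definitions; the counts in the usage example are made up for illustration and are not from any study in Table 8.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> ScreeningMetrics:
    """Accuracy indices from screener result (positive/negative) vs. reference diagnosis."""
    return ScreeningMetrics(
        sensitivity=tp / (tp + fn),  # share of true cases the screener flags
        specificity=tn / (tn + fp),  # share of non-cases the screener clears
        ppv=tp / (tp + fp),          # share of positive screens that are true cases
        npv=tn / (tn + fn),          # share of negative screens that are truly non-cases
    )


if __name__ == "__main__":
    # Hypothetical counts: 45 reference-standard cases and 55 non-cases.
    m = metrics_from_counts(tp=40, fp=10, fn=5, tn=45)
    print(round(m.sensitivity, 2), round(m.specificity, 2), m.ppv, m.npv)  # 0.89 0.82 0.8 0.9
```

A per-patient report would typically quote the PPV (for a positive screen) or the NPV (for a negative one), since those answer the physician’s question of how likely this particular result is to be correct.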

Workload Considerations
A highly effective automated system that is used to screen all individuals routinely has the potential to generate many possible or probable cases very quickly. For example, as found in studies by Sharpe30 and Cull40 and their colleagues, if every attendee at a regional cancer center is assessed, it is possible that 20% might be flagged as high scorers on a depression scale. Even with a second filter, such as a request for help, a large number of people may need to be seen. The potential benefit of a high yield of true cases might come at the expense of a large number (in absolute terms) of false positives, each of whom has raised expectations on the basis of the first-stage alert and needs follow-up. Alternatively, fear of the workload may defeat the screening process itself. When the PPV falls much below 70%, physicians may choose to ignore screening results on the grounds that following up the 30% or more who are false positives is too much work.7,75 Although PPV and sensitivity are affected by response errors, they are more influenced by the screening instrument itself, and PPV also depends strongly on the prevalence of depression in the population screened. The balance between PPV and sensitivity is implementation-specific. In general, demanding criteria for diagnosing depression will result in good PPV but poor sensitivity.27
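The workload problem follows from simple arithmetic: the number flagged depends on how many people are screened, the prevalence of depression among them, and the screener’s sensitivity and specificity, and PPV falls as prevalence falls. The Python sketch below makes this explicit; the sensitivity, specificity, and prevalence values in the example are illustrative assumptions, not figures from the Sharpe or Cull studies.

```python
from typing import Dict


def expected_workload(
    n_screened: int, prevalence: float, sensitivity: float, specificity: float
) -> Dict[str, float]:
    """Expected number of positives to follow up, and the PPV implied by the inputs."""
    cases = n_screened * prevalence
    non_cases = n_screened - cases
    true_positives = cases * sensitivity
    false_positives = non_cases * (1.0 - specificity)
    flagged = true_positives + false_positives
    return {
        "flagged": flagged,
        "true_positives": true_positives,
        "false_positives": false_positives,
        "ppv": true_positives / flagged,
    }


if __name__ == "__main__":
    for prev in (0.10, 0.30):
        w = expected_workload(1000, prev, sensitivity=0.85, specificity=0.80)
        print(prev, round(w["flagged"]), round(w["false_positives"]), round(w["ppv"], 2))
```

With these assumed values, screening 1,000 people at 10% prevalence flags about 265, of whom about 180 are false positives (PPV about 0.32), well below the 70% level mentioned above; the same instrument at 30% prevalence gives a PPV of about 0.65.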

Acceptability
A system is useful only if subjects are willing to use it; acceptability is a necessity for implementation of any automated screening system. Most of the evidence to date suggests that patients accept the general idea of automated screening, compared with visits to mental health specialists.3,6,20,24,30,40 A number of national studies have had excellent response rates, with no notable item nonresponse on depression screening questions.38,47,76–78
With respect to the technologies, the survey response literature has some lessons to teach. Touch-tone IVR is more technically demanding for respondents than speech recognition, and touch-tone response rates are lower. The Internet (and associated device-related technologies) is generally regarded as usable, but not every home has a computer, and in many businesses personal computer use is restricted or frowned upon.4 In addition, many people have privacy worries about the Internet, and in some businesses these are justified.79 Some degree of computer skill and literacy is also necessary.38 These limitations suggest that alternatives to the Internet will remain useful; the impact of age cohort, gender, and cultural issues requires further study. Combination approaches involving Internet, phone, and either outbound calling or mail achieve the best coverage.67

Prices
As a general rule, prices are highly implementation-dependent, and a bid is necessary to know what the price will be. However, some general principles apply. Paper-and-pencil surveys depend on a combination of mailing costs and processing costs.80,81 Very efficient high-end scanners are available, but they must still be fed. Even a "free" screener that is entered by fax machine in a doctor’s office costs more than $3 when the cost of handing out the survey, collecting the response, and feeding it into a fax machine is counted. If mail is involved, back-end duties can be handled by clerks, but this cost reduction is more than offset by the price of mailing.6,67,76 Traditional screening methods such as paper surveys and scanning are suited only to large-scale data-collection systems with central mail-processing facilities and are difficult to manage in smaller settings. For Internet screeners and voice recognition or touch-tone, the marginal cost per screener ranges from nothing to a dollar, but there are fixed costs associated with developing and fielding the system.82–84 Such costs are typically between $10,000 and $25,000.12
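The cost figures above imply a rough break-even volume: an electronic system recovers its fixed cost once the per-screen savings over paper add up to that cost. This Python sketch is a simplification that ignores ongoing maintenance, hosting, and staff time for follow-up; the inputs in the example are taken from the ranges quoted in this section (more than $3 per paper screen, $0 to $1 per electronic screen, $10,000 to $25,000 fixed).

```python
def break_even_volume(
    fixed_cost: float, paper_cost_per_screen: float, electronic_cost_per_screen: float
) -> float:
    """Number of screens at which an electronic system's fixed cost is recovered."""
    saving_per_screen = paper_cost_per_screen - electronic_cost_per_screen
    if saving_per_screen <= 0:
        raise ValueError("electronic screens must cost less than paper screens")
    return fixed_cost / saving_per_screen


if __name__ == "__main__":
    # Least favorable combination within the quoted ranges.
    print(break_even_volume(25_000, 3.0, 1.0))  # 12500.0
    # Most favorable combination within the quoted ranges (about 3,333 screens).
    print(break_even_volume(10_000, 3.0, 0.0))
```

Because $3 is a lower bound for the cost of a paper screen, the actual break-even volume would be lower wherever paper handling costs more.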

Availability
All of the methods except scanned paper-and-pencil surveys can be processed immediately, with real-time feedback to respondents about what to do next. Patients often have time to consider their options at times of the day when physicians are not available (eg, the middle of the night). Results are immediately available without transcription error.47

Embedding in a System
To be useful, a screening system needs to be embedded in a healthcare system that can deal with the information.3,7,20,85,86 Unless the results are available and retrievable, they are useless. This very important issue is mostly beyond the scope of this chapter, but technology does have some bearing on it. A mailed and scanned questionnaire cannot be acted on in a timely way. All of the electronic methods can be followed up with questions about context (Did someone important to you die recently? Are you thinking of taking your life soon?). In principle, the results can be transmitted to an electronic medical record (EMR) or to a physician by e-mail, if the setting allows it. Contextual data such as medications could also be drawn from an EMR. In the current environment, embedding screeners is still a custom operation: EMR systems are not at this point sold with a depression screener or monitoring website included.
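As an illustration of how an electronic screener might package its results, together with answers to contextual follow-up questions, for transmission to an EMR or a physician, here is a minimal Python sketch using only the standard library. The message fields, the cut point, and the rule that an endorsed thought of suicide marks the result as urgent are hypothetical; a real integration would follow the EMR vendor’s interface or a healthcare messaging standard rather than this ad hoc JSON layout.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Optional


def build_result_message(
    patient_id: str,
    item_scores: Dict[str, int],
    cut_point: int,
    recent_loss: Optional[bool],
    thoughts_of_suicide: Optional[bool],
) -> str:
    """Serialize a screening result plus context answers as JSON (hypothetical schema)."""
    total = sum(item_scores.values())
    message = {
        "patient_id": patient_id,
        "completed_at": datetime.now(timezone.utc).isoformat(),
        "item_scores": item_scores,
        "total_score": total,
        "screen_positive": total >= cut_point,
        # Contextual follow-up answers; None means the question was not asked.
        "context": {
            "recent_loss": recent_loss,
            "thoughts_of_suicide": thoughts_of_suicide,
        },
        # Hypothetical escalation rule for same-day clinician review.
        "urgent": thoughts_of_suicide is True,
    }
    return json.dumps(message)


if __name__ == "__main__":
    raw = build_result_message("demo-001", {"mood": 2, "interest": 3}, 3, False, True)
    msg = json.loads(raw)
    print(msg["total_score"], msg["screen_positive"], msg["urgent"])  # 5 True True
```

Keeping the context answers alongside the score lets whoever receives the message see why a result was flagged as urgent without reopening the screener.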

3. Examples of Implementation of Computerized Screening for Depression
Whether a system is actually acceptable in practice depends on both the technology and the context. All of the technologies have been shown to be acceptable in some context (see Table 8.1 for selected studies discussed below). For example, in our prior work, most patients in primary care offices were willing to fill out a two-page depression screener that was immediately scanned.26 We are now using web-based touchscreen methodology to screen employed individuals for depression in workplace settings (Fig. 8.1). The study by Baer and associates13 using IVR with telephone keypad response was one of the first to demonstrate the use and acceptability of fully automated technology for confidential mass depression screening. Two recent studies, by Gonzalez and associates36 using computer voice recognition and by Lin and coworkers35 using computer touchscreen, found good psychometric properties for well-accepted depression screeners compared with standardized diagnostic in-person interviews. Kobak and associates,19,20,47,61 in a series of studies, demonstrated the acceptability and equivalence of the three forms of depression screening they examined (clinician interview by telephone, phone IVR, and computer touchscreen). Kurt and colleagues22 found similar results for a computer-assisted assessment of depression in geriatric primary care patients. Even in a minority population, Munoz and associates24 met no resistance to depression screening with computerized voice-recognition technology.

Figure 8.1.a Work and Health Initiative depression pre-screener.

Figure 8.1.b Work and Health Initiative depression pre-screener.

Figure 8.1.c Sample electronic WHI Patient Depression Report.
In non-mental health outpatient settings, Allenby and colleagues12 in oncology and Bliven and associates80 in cardiology found high degrees of acceptability for computer-assisted technology in screening for psychosocial distress. Sharpe and colleagues30 applied touchscreen technology and found no resistance to screening for depression and anxiety in a regional ambulatory cancer center. Cull and colleagues40 used touchscreen technology to administer the Mental Health Index and the Hospital Anxiety and Depression Scale and developed a depression screening algorithm with adequate psychometric properties among outpatient cancer patients.
4. Discussion
Automated methods for both general health and depression-specific screening are here to stay. They produce more accurate answers, are better suited to evidence-based medicine, and are less expensive than person-dependent paper-and-pencil methods or mail. Electronic methods are also superior to paper and pencil because they produce timely answers and can explore some of the follow-up issues, such as more detail about suicidal ideation or how the patient fits into the care process. While mental health clinicians’ face-to-face observations of patients can identify verbal and nonverbal depressive cues and lead to a more immediate response, most individuals with depression are not seen in the mental health specialty sector. However, gaps in the evidence and barriers to effective widespread use remain.
Once a screening context is established, some methods that are acceptable in principle become unacceptable in practice. For example, most patients would feel uncomfortable conducting a phone interview while sitting in a crowded waiting room, or taking an Internet-based screener on a home computer known to be infected with a virus. On the other hand, the same patients might feel comfortable taking a phone interview at home or completing an Internet-based screener on a computer in a private room off the waiting area at the doctor’s office. A number of groups have studied implementation in a range of settings, focusing on acceptability and accuracy (see Table 8.1). In general, these pilot projects find that depressed patients are able to complete both computer (desktop and web) and telephone screeners accurately and find them acceptable alternatives to both paper-and-pencil and clinician interviews. Just as with conventional methods, there is no one-size-fits-all answer: multiple modalities are needed to meet varied patient and provider needs. Modality by itself (eg, Internet, phone, or tablet) is not the answer; much of the value lies in the craft with which a solution is executed. Good-quality solutions are available in all three modalities, but so are poor ones. The choice depends on the purpose. If technology such as computer-adaptive testing is to be applied to population screening, a multi-tiered approach can improve accuracy. For example, a general mental health prescreener can efficiently reduce the number of individuals who are then followed with a diagnosis-specific prescreener, reserving full screening for at-risk populations and for following patients known to have a depressive disorder.
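The multi-tiered approach described above can be quantified with a short calculation. In the Python sketch below, everyone takes a brief general prescreener, only those who screen positive go on to a diagnosis-specific screener, and a person counts as positive only if positive at both stages. The accuracy and prevalence figures are illustrative assumptions, and the sketch also assumes that errors at the two stages are independent, which real instruments measuring related symptoms may not satisfy.

```python
from typing import Dict


def two_stage_screening(
    n: int, prevalence: float,
    se1: float, sp1: float,  # stage 1: general mental health prescreener
    se2: float, sp2: float,  # stage 2: diagnosis-specific screener
) -> Dict[str, float]:
    """Expected yield of serial (positive-on-both) screening under conditional independence."""
    cases = n * prevalence
    non_cases = n - cases
    tp1 = cases * se1
    fp1 = non_cases * (1.0 - sp1)
    tp2 = tp1 * se2          # true cases positive at both stages
    fp2 = fp1 * (1.0 - sp2)  # non-cases positive at both stages
    return {
        "stage2_load": tp1 + fp1,
        "final_positives": tp2 + fp2,
        "combined_sensitivity": tp2 / cases,
        "combined_specificity": 1.0 - fp2 / non_cases,
        "ppv_stage1_only": tp1 / (tp1 + fp1),
        "ppv_two_stage": tp2 / (tp2 + fp2),
    }


if __name__ == "__main__":
    r = two_stage_screening(1000, 0.10, se1=0.90, sp1=0.70, se2=0.80, sp2=0.90)
    for key, value in r.items():
        print(f"{key}: {value:.2f}")
```

With these assumed values, only 360 of 1,000 people need the second instrument, and PPV rises from 0.25 after the prescreener alone to about 0.73 after both stages, at the cost of combined sensitivity falling from 0.90 to 0.72. That trade-off is why the balance between stages is a design decision for each implementation.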
With respect to acceptability, the evidence to date suggests that automated depression screening via web, computer, telephone, or (soon) tablet does not meet with reluctance from those screened. With respect to follow-up, however, the story may differ. In most health risk-appraisal systems, patients and providers can ignore a positive depression screener. On the other hand, a positive screener can lead to overreaction. Work needs to be done on the back end of a positive screener to identify cases that are appropriate for follow-up. Careful thought needs to be given to how results will be handled with providers, what follow-up would be cost-effective, and who will deliver follow-up services. Nonetheless, without an electronic system, there is no mechanism to help the healthcare system address these issues.
The marketplace will continue to define and redefine solutions that are
available and affordable. We have raised a set of questions that should be
asked of such systems and put them into two categories: concerns that are
frequently raised but usually do not turn out to be important issues (eg,
accuracy and acceptability) and concerns that have often led to existing
systems working less well than they could and that need to be addressed in
every implementation (eg, privacy, follow-up, and the interface between automated results and the physician–patient relationship).

5. Conclusion
Thirty years of research have led to the conclusion that the benefits of automated methods outweigh their limitations in general,3,6,7 for mental health issues,3,20,58,61,62,64,68,87 and specifically for depression screening.13,15,16,20,24,35,36,47,88,89 In the absence of information about a particular implementation and the setting it is in, one cannot say that it is automatically worthwhile or unacceptable. However, one can say that pencil-and-paper screeners will be effective only under a limited set of conditions that avoid the costs and delays commonly associated with mail. The two most
promising technologies seem to be the Internet (using web browsers and/or
hand-held devices) and speech recognition. Whatever technology is used, there
needs to be a good fit between the technology and the system within which it is
deployed.86 Acceptability depends on context; accuracy depends on craft. The
system needs to connect the patient to a physician and support that physician
with the correct information.

References
1. Agency for Health Care Policy and Research. Depression in primary care: detection
and diagnosis. Rockville, MD, 1993.
2. U.S. Preventive Services Task Force. Guide to clinical preventive services, 2nd ed.
Baltimore: Williams & Wilkins, 1996.
3. Berger M. Computer-assisted clinical assessment. Child Adolesc Mental Health.
2006;11(2):64–75.
4. Butcher JN, Perry J, Hahn J. Computers in clinical assessment: historical developments,
present status, and future challenges. J Clin Psychol. 2004;60(3):331–345.
5. Dillman DA. Mail and Internet surveys: the tailored design method, 2nd ed. Hoboken,
NJ: John Wiley & Sons, 2007:352–412.
6. Epstein J, Klinkenberg WD. From Eliza to Internet: a brief history of computerized
assessment. Computers in Human Behavior. 2001;17:295–314.
7. Garb HN. Computer-administered interviews and rating scales. Psychol Assess.
2007;19(1):4–13.
8. Buchanan T, Smith JL. Using the Internet for psychological research: personality
testing on the World Wide Web. Br J Psychol. 1999;90(Pt 1):125–144.
9. Revicki DA, Cella DF. Health status assessment for the twenty-first century: item
response theory, item banking and computer adaptive testing. Qual Life Res.
1997;6(6):595–600.
10. Truell AD, Bartlett JE, Alexander MW. Response rate, speed, and completeness: a
comparison of Internet-based and mail surveys. Behav Res Methods Instrum Comput.
2002;34(1):46–49.
11. Schleyer TK, Forrest JL. Methods for the design and administration of web-based
surveys. J Am Med Inform Assoc. 2000;7(4):416–425.
12. Allenby A, Matthews J, Beresford J, et al. The application of computer touch-screen
technology in screening for psychosocial distress in an ambulatory oncology setting.
Eur J Cancer Care (Engl). 2002;11(4):245–253.
13. Baer L, Jacobs DG, Cukor P, et al. Automated telephone screening survey for
depression. JAMA. 1995;273(24):1943–1944.
14. Beck AT, Steer RA, Garbin MG. Psychometric properties of the Beck Depression
Inventory: twenty-five years of evaluation. Clin Psychol Rev. 1988;8:77–100.
15. Gonzalez GM, Spiteri CB, Knowlton JP. An exploratory study using computerized
speech recognition for screening depressive symptoms. Computers in Human Behavior.
1995;11(1):85–93.
16. Carr AC, Ancill RJ, Ghosh A, et al. Direct assessment of depression by microcomputer.
A feasibility study. Acta Psychiatr Scand. 1981;64(5):415–422.
17. Carr AC, Ghosh A, Ancill RJ. Can a computer take a psychiatric history? Psychol Med.
1983;13(1):151–158.
18. Klinkman MS, Coyne JC, Gallo S, et al. Case finding instruments to be used to
improve physician detection of depression in primary care. Arch Fam Med.
1997;6:567–573.
19. Kobak KA, Reynolds WM, Rosenfeld R, et al. Development and validation of a
computer-administered version of the Hamilton Depression Rating Scale. Psychol
Assess. 1990;2:56–63.
20. Kobak KA, Taylor LVH, Dottl SL, et al. Computerized screening for psychiatric
disorders in an outpatient community mental health clinic. Psychiatr Serv.
1997;48(8):1048–1057.
21. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity
measure. J Gen Intern Med. 2001;16(9):606–613.
22. Kurt R, Bogner HR, Straton JB, et al. Computer-assisted assessment of depression and
function in older primary care patients. Comput Methods Programs Biomed.
2004;73(2):165–171.
23. Mulrow CD, Williams JW Jr, Gerety MB, et al. Case-finding instruments for depression
in primary care settings. Ann Intern Med. 1995;122(12):913–921.
24. Munoz RF, McQuaid JR, Gonzalez GM, et al. Depression screening in a women’s
clinic: using automated Spanish- and English-language voice recognition. J Consult
Clin Psychol. 1999;67(4):502–510.
25. Patton GC, Coffey C, Posterino M, et al. A computerised screening instrument for
adolescent depression: population-based validation and application to a two-phase
case-control study. Soc Psychiatry Psychiatr Epidemiol. 1999;34(3):166–172.
26. Rogers WH, Wilson IB, Bungay KM, et al. Assessing the performance of a new
depression screener for primary care (PC-SAD(c)). J Clin Epidemiol.
2002;55(2):164–175.
27. Rogers WH, Adler DA, Bungay KM, et al. Depression screening instruments make
good severity measures in a cross-sectional analysis. J Clin Epidemiol.
2005;58:370–377.
28. Schade CP, Jones ER Jr, Wittlin BJ. A ten-year review of the validity and clinical utility
of depression screening. Psychiatr Serv. 1998;49(1):55–61.
29. Schwenk TL. Screening for depression in primary care: a disease in search of a test.
J Gen Intern Med. 1996;11:437–439.
30. Sharpe M, Strong V, Allen K, et al. Major depression in outpatients attending a regional
cancer centre: screening and unmet treatment needs. Br J Cancer. 2004;90(2):314–320.
31. Spitzer RL, Kroenke K, Williams JB. Validation and utility of a self-report version of
PRIME-MD: the PHQ primary care study. Primary Care Evaluation of Mental
Disorders. Patient Health Questionnaire. JAMA. 1999;282(18):1737–1744.
32. Valenstein M, Vijan S, Zeber JE, et al. The cost-utility of screening for depression in
primary care. Ann Intern Med. 2001;134(5):345–360.
33. Whooley MA, Avins AL, Miranda J, et al. Case-finding instruments for depression.
Two questions are as good as many. J Gen Intern Med. 1997;12(7):439–445.
34. Kim H, Bracha Y, Tipnis A. Automated depression screening in disadvantaged
pregnant women in an urban obstetric clinic. Arch Womens Ment Health.
2007;10(4):163–169.
35. Lin CC, Bai YM, Liu CY, et al. Web-based tools can be used reliably to detect patients
with major depressive disorder and subsyndromal depressive symptoms. BMC
Psychiatry. 2007;7:12.
36. Gonzalez GM, Carter C, Blanes E. Bilingual computerized speech recognition
screening for depression symptoms: comparing aural and visual methods. Hispanic
Journal of Behavioral Sciences. 2007;29(2):156–180.
37. Fann J, Berry DL, Wolpin SE, et al. Feasibility of depression screening using the PHQ-9
administered on a touchscreen computer. Psychooncology. 2006;15(1):S18–S18.
38. Ekman A, Dickman PW, Klint A, et al. Feasibility of using web-based questionnaires in
large population-based epidemiological studies. Eur J Epidemiol. 2006;21(2):103–111.
39. Hyler SE, Gangure DP, Batchelder ST. Can telepsychiatry replace in-person psychiatric
assessments? A review and meta-analysis of comparison studies. CNS Spectr.
2005;10(5):403–413.
40. Cull A, Gould A, House A, et al. Validating automated screening for psychological
distress by means of computer touchscreens for use in routine oncology practice.
Br J Cancer. 2001;85(12):1842–1849.
41. Houston TK, Cooper LA, Vu HT, et al. Screening the public for depression through the
internet. Psychiatr Serv. 2001;52(3):362–367.
42. Leon AC, Kelsey JE, Pleil A, et al. An evaluation of a computer-assisted telephone
interview for screening for mental disorders among primary care patients. J Nerv Ment
Dis. 1999;187(5):308–311.
43. Brodey BB, Rosen CS, Brodey IS, et al. Reliability and acceptability of automated
telephone surveys among Spanish- and English-speaking mental health services
recipients. Ment Health Serv Res. 2005;7(3):181–184.
44. Mitchell AM, Mittelstaedt ME, Schott-Baer D. Postpartum depression: the
reliability of telephone screening. MCN Am J Matern Child Nurs.
2006;31(6):382–387.
45. Ogles BM, France CR, Lunnen KM, et al. Computerized depression screening and
awareness. Community Ment Health J. 1998;34(1):27–38.
46. Fliege H, Becker J, Walter OB, et al. Development of a computer-adaptive test for
depression (D-CAT). Qual Life Res. 2005;14(10):2277–2291.
47. Kobak KA, Mundt JC, Greist JH, et al. Computer assessment of depression: automating
the Hamilton Depression Rating Scale. Drug Inf J. 2000;34:145–156.
48. Gibbons RD, Weiss DJ, Kupfer DJ, et al. Using computerized adaptive testing to reduce
the burden of mental health assessment. Psychiatr Serv. 2008;59(4):361–368.
49. Gardner W, Shear K, Kelleher KJ, et al. Computerized adaptive measurement of
depression: a simulation study. BMC Psychiatry. 2004;4:13.
50. Educational Testing Services. Educational testing services. [Web document], 2000.
Accessed 7-30-2007.
51. Green B, Bock R, Humphreys L, et al. Technical guidelines for assessing computerized
adaptive tests. J Educ Measure. 1984;21:347–360.
52. Sands WA, Waters BK, McBride JR. Computerized adaptive testing: from inquiry to
operation. Washington, DC: APA Books, 1997.
53. Wainer H, Dorans NJ. Computerized adaptive testing: a primer. Hillsdale, NJ:
Erlbaum Associates, 2000.
54. Ware JE Jr, Bjorner JB, Kosinski M. Practical implications of item response theory and
computerized adaptive testing: a brief summary of ongoing studies of widely used
headache impact scales. Med Care. 2000;38(9 Suppl):II73–II82.
55. Weiss DJ. Adaptive testing by computer. J Consult Clin Psychol. 1985;53(6):774–789.
56. Baer L, Brown-Beasley MW, Sorce J, et al. Computer-assisted telephone
administration of a structured interview for obsessive-compulsive disorder.
Am J Psychiatry. 1993;150(11):1737–1738.
57. Buchanan T. Online assessment: desirable or dangerous? Professional Psychology:
Research and Practice. 2002;33:148–154.
58. Carr AC, Ghosh A. Accuracy of behavioural assessment by computer. Br J Psychiatry.
1983;142:66–70.
59. Erdman HP, Klein MH, Greist JH. Direct patient computer interviewing. J Consult Clin
Psychol. 1985;53(6):760–773.
60. Erdman HP, Greist JH, Gustafson DH, et al. Suicide risk prediction by computer
interview: a prospective study. J Clin Psychiatry. 1987;48(12):464–467.
61. Kobak KA, Greist JH, Jefferson JW, et al. Computer-administered clinical rating scales.
A review. Psychopharmacology (Berl). 1996;127:291–301.
62. Peters L, Andrews G. Procedural validity of the computerized version of the Composite
International Diagnostic Interview (CIDI-Auto) in the anxiety disorders. Psychol Med.
1995;25(6):1269–1280.
63. Robins L, Helzer J, Cottler L, et al. NIMH Diagnostic Interview Schedule, Version III
Revised (DIS-III-R). St. Louis, MO: Washington University, 1989.
64. Rosenfeld R, Dar R, Anderson D, et al. A computer-administered version of the Yale-
Brown Obsessive-Compulsive Scale. Psychol Assess. 1992;4:329–332.
65. Shaffer D, Fisher P, Lucas CP, et al. NIMH Diagnostic Interview Schedule for Children
Version IV (NIMH DISC-IV): description, differences from previous versions, and
reliability of some common diagnoses. J Am Acad Child Adolesc Psychiatry.
2000;39(1):28–38.
66. Wilson FR, Genco KT, Yager GG. Assessing the equivalence of paper-and-pencil
versus computerized tests: Demonstration of a promising technology. Computers in
Human Behavior. 1985;1:265–275.
67. Rodriguez HP, von Glahn T, Rogers WH, et al. Evaluating patients’ experiences with
individual physicians: a randomized trial of mail, internet, and interactive voice
response telephone administration of surveys. Med Care. 2006;44(2):167–174.
68. Davis LJ Jr, Hoffmann NG, Morse RM, et al. Substance Use Disorder Diagnostic
Schedule (SUDDS): the equivalence and validity of a computer-administered and an
interviewer-administered format. Alcohol Clin Exp Res. 1992;16(2):250–254.
69. Millstein S. Acceptability and reliability of sensitive information collected via
computer interview. Educational and Psychological Measurement. 1987;47:523–533.
70. Rosenman SJ, Levings CT, Korten AE. Clinical utility and patient acceptance of the
computerized composite international diagnostic interview. Psychiatr Serv.
1997;48(6):815–820.
71. Adler DA, McLaughlin TJ, Rogers WH, et al. Job performance deficits due to
depression. Am J Psychiatry. 2006;163(9):1569–1576.
72. Greenberg PE, Kessler RC, Birnbaum HG, et al. The economic burden of depression in
the United States: how did it change between 1990 and 2000? J Clin Psychiatry.
2003;64(12):1465–1475.
73. Kessler RC, Berglund P, Demler O, et al. The epidemiology of major depressive
disorder: results from the National Comorbidity Survey Replication (NCS-R). JAMA.
2003;289(23):3095–3105.
74. Wang PS, Patrick A, Avorn J, et al. The costs and benefits of enhanced depression care
to employers. Arch Gen Psychiatry. 2006;63(12):1345–1353.
75. Grove WM, Zald DH, Lebow BS, et al. Clinical versus mechanical prediction: a meta-
analysis. Psychol Assess. 2000;12(1):19–30.
76. Selim AJ, Berlowitz DR, Fincke G, et al. The health status of elderly veteran enrollees
in the Veterans Health Administration. J Am Geriatr Soc. 2004;52(8):1271–1276.
77. Tarlov AR, Ware JE Jr, Greenfield S, et al. The Medical Outcomes Study. An
application of methods for monitoring the results of medical care. JAMA.
1989;262(7):925–930.
78. Wells KB, Burnam MA, Camp P. Severity of depression in prepaid and fee-for-
service general medical and mental health specialty practices. Med Care.
1995;33(4):350–364.
79. Kilbourne AM, McGinnis GF, Belnap BH, et al. The role of clinical information
technology in depression care management. Adm Policy Ment Health.
2006;33(1):59–69.
80. Bliven BD, Kaufman SE, Spertus JA. Electronic collection of health-related quality of
life data: validity, time benefits, and patient preference. Qual Life Res.
2001;10(1):15–22.
81. Radosevich DM, Werni TL. A practical guide for implementing, analyzing, and
reporting outcomes measurements. Health Outcomes Institute, 1998.
82. Rind DM, Kohane IS, Szolovits P, et al. Maintaining the confidentiality of medical
records shared over the Internet and the World Wide Web. Ann Intern Med.
1997;127(2):138–141.
83. Soetikno R, Young HS, Keefe EB. Role of emerging technology in the era of cost
containment. Am J Gastroenterol. 1997;92:1038–1040.
84. Subramanian AK, McAfee AT, Getzinger JP. Use of the World Wide Web for multisite
data collection. Acad Emerg Med. 1997;4(8):811–817.
85. Barak A. Psychological applications on the Internet: a discipline on the threshold of a
new millennium. Applied and Preventive Psychology. 1999;8(4):231–245.
86. Blumenthal D, Glaser JP. Information technology comes to medicine. N Engl J Med.
2007;356(24):2527–2534.
87. Skinner HA, Allen BA. Does the computer make a difference? Computerized versus
face-to-face versus self-report assessment of alcohol, drug, and tobacco use. J Consult
Clin Psychol. 1983;51(2):267–275.
88. Greist JH, Gustafson DH, Stauss FF, et al. A computer interview for suicide-risk
prediction. Am J Psychiatry. 1973;130(12):1327–1332.
89. Kobak KA, Reynolds WM, Greist JH. Computerized and clinician assessment of
depression and anxiety: respondent evaluation and satisfaction. J Pers Assess.
1994;63(1):173–180.
9
SCREENING FOR DEPRESSION IN PRIMARY
CARE: CAN IT BECOME MORE EFFICIENT?

Kathryn M. Magruder and Derik E. Yeager

1. Introduction
2. Epidemiology of Depression in Primary Care
3. Is Screening for Depression in Primary Care Worthwhile?
4. Which Screening Tool Should Be Used?
5. Implementing Screening in Primary Care
6. What Developments Are on the Horizon?
7. Conclusions

Context
Screening for depression has been so widely advocated that the burden of proof
has shifted to skeptics who argue against it. Yet only recently has sufficient
evidence accrued to judge dispassionately the advantages and disadvantages
of screening. Here we discuss the evidence on specific tools and strategies
for improving the outcomes of depression screening in primary care.

1. Introduction
In 1978, the Institute of Medicine defined primary care as ‘‘care that is
accessible, comprehensive, coordinated, continuous, and accountable.’’1
While the definition has evolved over time,2 these fundamental characteristics
are still valid today. The primary care mission includes serving as the first
line for detecting common mental disorders, including depression, and for
treating them or referring patients with them. The inclusion of first-line
mental health services as a component of primary care distinguishes primary
care (including outpatient clinics in managed care organizations, community
hospitals, Veterans Administration hospitals, teaching institutions, and other
medical centers) from care in more specialized clinical settings. The
comprehensiveness of primary care and the obligation of its providers for
first-line care make it a logical and appropriate venue for mental health
screening.3
Complicating the issue, however, are the time constraints on primary care
providers. The average patient visit in the United States lasts only about 20
minutes,4 yet the list of recommended services to be provided in that short
period is daunting. It is therefore imperative that these services, in
particular preventive health services, be provided as efficiently as possible.
Services that cannot be delivered efficiently within the busy, fast-paced
world of primary care are at risk of being omitted, and this is especially
true for preventive mental health services. Screening for depression is such a
service; therefore, it is critical that primary care providers use the best and
most efficient depression screening approaches available.
In this chapter, we address issues related to screening for depression in
the primary care context. We start by briefly reviewing the epidemiology of
depression as it relates to primary care. Next, we critically examine how the
World Health Organization’s screening criteria apply to depression. Then we
review published screening tools and their attributes for use in primary care
settings. Last, we discuss future directions, including additional ways to make
screening for depression in primary care more efficient and thus more
effective and more widely implemented.

2. Epidemiology of Depression in Primary Care

Population Prevalence of Depression
The National Comorbidity Survey Replication (NCS-R), conducted among adults
aged 18 years and older, found a 12-month prevalence of 9.5% for any DSM-IV
mood disorder, with 6.7% for major depression and 1.5% for dysthymia.5 In
this survey, 19.5% of major depression cases in the community were classified
as mild, 50.1% as moderate, and 30.4% as serious.5 Thus, about 80% of those
with major depressive disorder have moderate to serious symptoms, and those
who seek health services are likely to be at the more severe end of the
spectrum. In a European epidemiologic study of mental disorders involving six
countries, major depression was the single most common disorder assessed, with
a 12-month prevalence of 3.9%.6 Wittchen and Jacobi7 conducted a meta-analysis
of 27 studies with data on the prevalence of mental disorders in European
countries. The 12-month prevalence of major depression ranged between 3.1% and
10.1%, with a median prevalence of 6.9%. Depression may thus be the most
prevalent of mental disorders; it constitutes a worldwide problem affecting
approximately 5% to 10% of adults in a given year.
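The ‘‘about 80%’’ figure is simple arithmetic on the NCS-R severity split. The
minimal Python sketch below reproduces it and, for illustration only,
multiplies by the 6.7% 12-month prevalence to approximate the share of all
adults with moderate or serious major depression. That second step assumes the
severity split applies unchanged to all 12-month cases; the variable names are
illustrative.

```python
# NCS-R figures quoted in the text (12-month, adults).
PREV_MAJOR_DEPRESSION = 0.067
SEVERITY_SHARE = {"mild": 0.195, "moderate": 0.501, "serious": 0.304}


def moderate_or_serious(shares: dict) -> float:
    """Fraction of major depression cases rated moderate or serious."""
    return shares["moderate"] + shares["serious"]


share = moderate_or_serious(SEVERITY_SHARE)
# Assumption: the severity split applies to all 12-month cases.
adult_rate = PREV_MAJOR_DEPRESSION * share

print(f"{share:.1%} of cases are moderate or serious")
print(f"about {adult_rate:.1%} of all adults in a given year")
```

Because 50.1% + 30.4% = 80.5%, ‘‘about 80%’’ is a fair rounding; the
adult-level figure of roughly 5.4% is an order-of-magnitude estimate, not a
reported survey result.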

Primary Care Prevalence of Depression

An early compendium of studies showed that pre-DSM-III-R depression
prevalence in primary care ranged from 4.8% to 8.6%.8 More recently, one of the
most comprehensive assessments of mental disorders in primary care was
conducted by the World Health Organization and involved 15 cities in 14
countries.9 Using the Composite International Diagnostic Interview (CIDI) as
the diagnostic assessment tool for DSM-III-R and ICD-10 conditions, this
study found that the prevalence of current psychiatric disorders was 24% but
varied substantially by country.9 In particular, prevalence estimates for major
depression ranged from 2.6% in Nagasaki, Japan, to an exceptionally high
29.5% in Santiago de Chile (more than 12 percentage points above the next
highest, 16.9% in Manchester, England). The total prevalence of ICD-10 major
depression was 10.4%. Prevalence also varies considerably within a city or
country according to the characteristics of a primary care clinic (eg,
inner-city clinics that serve disadvantaged patients may have higher
depression prevalence), so the findings of this study cannot be generalized
as national primary care prevalences. Nevertheless, this important
international study has helped to establish the importance of depression in
primary care settings throughout the world.
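Differences between prevalence rates are easy to misstate: 29.5% versus 16.9%
is a gap of 12.6 percentage points, which corresponds to a relative increase of
roughly 75%. A minimal Python sketch of the two calculations, using the figures
quoted above (the function name is illustrative):

```python
def compare_rates(higher_pct: float, lower_pct: float) -> tuple:
    """Return (absolute gap in percentage points, relative increase as a fraction)."""
    gap = higher_pct - lower_pct
    return gap, gap / lower_pct


# WHO primary care study: Santiago de Chile vs. Manchester (major depression, %).
gap_pp, relative = compare_rates(29.5, 16.9)
print(f"{gap_pp:.1f} percentage points higher; {relative:.0%} higher in relative terms")
```

Reporting the absolute gap in percentage points avoids the ambiguity of saying
one rate is ‘‘12% greater’’ than another.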
A number of studies have found substantial prevalence of, and morbidity from,
subthreshold disorders. For example, in a study of 619 primary care patients,
Backenstrass and associates10 found a prevalence of 4.6% for major depression,
6.2% for minor depression, and 9.1% for nonspecific depression symptoms.
Disability was graded by severity, highest for major depression and lowest for
nonspecific depression symptoms.10 Thus, these ‘‘sub-major’’ forms of
depression are not without associated morbidity.
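The Backenstrass figures can be turned into approximate patient counts to show
how much more common the sub-major forms were than major depression in that
sample. This is a minimal Python sketch; the counts are back-calculated from the
rounded percentages, so they may differ slightly from the study’s actual counts.

```python
SAMPLE_SIZE = 619  # primary care patients (Backenstrass and associates)
PREVALENCE = {
    "major depression": 0.046,
    "minor depression": 0.062,
    "nonspecific symptoms": 0.091,
}

# Approximate number of patients in each category (rounded to whole patients).
counts = {name: round(SAMPLE_SIZE * p) for name, p in PREVALENCE.items()}

# Combined prevalence of the two sub-major categories, and its ratio to major depression.
sub_major = PREVALENCE["minor depression"] + PREVALENCE["nonspecific symptoms"]
ratio = sub_major / PREVALENCE["major depression"]

print(counts)
print(f"sub-major: {sub_major:.1%}, about {ratio:.1f} times the rate of major depression")
```

Together, the two sub-major categories (15.3%) were roughly three times as
common as major depression in this sample.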

Primary Care Is the ‘‘De Facto’’ Mental Health Treatment System
Primary care has been termed the de facto mental health treatment system
because as many people with mental disorders receive treatment in general
medical settings as in mental health specialty settings.11,12 From Epidemiologic
Catchment Area (ECA) data, it has been estimated that only 45% of those with
unipolar major depression used any health service in the previous 12 months;
27.8% sought care in the specialty mental health sector, while 25.3% sought
care in the general medical sector.11 Paralleling ECA findings, NCS-R data
have shown that 51.6% of those who met the criteria for major depression
received some health services for depression in the past 12 months, with 27.2%
in the general medical sector.13 That study also examined symptom severity in
relation to treatment and found that only 12.8% of those in treatment in the
general medical sector were classified as mild cases; all others were moderate
or more severe.
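The ECA percentages for the two sectors sum to 53.1%, more than the 45% who
used any health service, so some people must have used both. By
inclusion-exclusion, at least 8.1% used both sectors. This minimal Python sketch
assumes that all three percentages share the same denominator (people with
unipolar major depression in the ECA) and that both sectors fall within ‘‘any
health service’’; the function name is illustrative.

```python
def min_overlap_pct(pct_a: float, pct_b: float, pct_union_max: float) -> float:
    """Lower bound on the percentage in both groups A and B.

    Inclusion-exclusion gives |A and B| = |A| + |B| - |A or B|, and
    |A or B| can be no larger than pct_union_max.
    """
    return max(0.0, pct_a + pct_b - pct_union_max)


specialty_mh, general_medical = 27.8, 25.3  # ECA, % using each sector
any_service = 45.0                          # % using any health service
both = min_overlap_pct(specialty_mh, general_medical, any_service)
print(f"at least {both:.1f}% used both sectors")
```

If ‘‘any health service’’ also includes other sectors, the true overlap can only
be larger than this bound.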
It has been estimated that 50% to 80% of depression management occurs in
primary care. Harman and colleagues14 found that, among older adults, 64% of
depression visits occurred in primary care, yet these represented only 3% of
all primary care visits by older adults; by contrast, 26% of depression visits
occurred in psychiatric care, where they represented 58% of all psychiatric
visits by older adults. Thus, because depression accounts for such a small
share of primary care visits, the index of suspicion is likely to be low in
primary care settings.
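One way to see why low prevalence matters for screening is through positive
predictive value (PPV): with the same test, a positive result is far less
likely to be a true case where the condition is rare. The minimal Python sketch
below applies Bayes’ theorem to a hypothetical screener with 85% sensitivity and
85% specificity, and treats the visit shares reported by Harman and colleagues
(3% and 58%) as rough stand-ins for prevalence. Both the test characteristics
and that substitution are assumptions for illustration, not study results.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability that a positive screen is a true case (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical screener characteristics (not from any cited study).
SENSITIVITY, SPECIFICITY = 0.85, 0.85

# Visit shares from Harman and colleagues, used here only as a proxy for prevalence.
for setting, prevalence in [("primary care", 0.03), ("psychiatry", 0.58)]:
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"{setting}: PPV = {ppv:.0%}")
```

Under these assumptions, only about 1 in 7 positive screens in the primary care
scenario would be a true case, compared with nearly 9 in 10 in the psychiatric
scenario.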
An analysis of National Ambulatory Medical Care Survey data showed that
for the average primary care doctor, 10.33 visits per week were considered
antidepressant medication visits, compared with 11.04 such visits for the
average psychiatrist.15 Although psychiatrists have slightly more
antidepressant medication visits, it is likely that primary care physicians
initiate more antidepressant prescriptions but conduct fewer monitoring
visits, whereas psychiatrists have fewer antidepressant-initiating visits but
more monitoring visits.

Unassisted Recognition of Depression in Primary Care

Ironically, while general medical settings are a primary venue for treating
mental disorders, a very large percentage of such disorders go unrecognized by
primary care providers and therefore untreated. Some reports suggest that
fewer than 50% of those with depression are so diagnosed in primary care
settings.16–18
The WHO primary care study found that overall, 54.2% of those who met
criteria for depression (ICD F32/33) were recognized as having a psychological
illness by their treating physician. This ranged from a low of 19.3% in
Nagasaki to a high of 74.0% in Santiago de Chile.19
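Combining prevalence with recognition rates gives a rough sense of how many
cases go unrecognized. The minimal Python sketch below estimates unrecognized
cases per 1,000 primary care attendees from the WHO figures quoted in this
section. It assumes the prevalence figures (major depression) and the
recognition rates (ICD F32/33 depression recognized as any psychological
illness) can be combined directly, which is only an approximation because the
case definitions differ slightly; the function name is illustrative.

```python
def unrecognized_per_1000(prevalence: float, recognition_rate: float) -> float:
    """Expected unrecognized depression cases per 1,000 attendees."""
    return 1000 * prevalence * (1 - recognition_rate)


# (prevalence, recognition rate) from the WHO primary care study figures quoted above.
# Assumption: the two figures can be multiplied despite differing case definitions.
SITES = {
    "Nagasaki": (0.026, 0.193),
    "Santiago de Chile": (0.295, 0.740),
    "all centers": (0.104, 0.542),
}

for site, (prev, recog) in SITES.items():
    print(f"{site}: about {unrecognized_per_1000(prev, recog):.0f} per 1,000")
```

The higher recognition rate in Santiago de Chile does not offset its much higher
prevalence: under these assumptions it still has more unrecognized cases per
1,000 attendees than Nagasaki.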
Thus, studies show that depression is relatively common in primary care
settings, but many with depression go unrecognized. It is no wonder that a
number of screening tools have been developed to assist providers in
recognizing and diagnosing depression. Yet there are other issues to consider
before initiating screening programs.
3. Is Screening for Depression in Primary Care Worthwhile?

Screening is an important aspect of prevention and early intervention for many
diseases and conditions, and this includes depression. WHO describes 10
criteria for initiating a screening program. Below, we discuss each criterion
along with issues that should be considered for clinically effective depression
screening. Because our focus is on primary care, we consider these criteria in
that context.

The Condition Should Be an Important Health Problem

With a depression prevalence of approximately 5% to 10% worldwide and 5%
to 20% in primary care settings, depression is an important health problem.
In addition to personal suffering, those with depression have significantly
worse functioning. Building on the landmark publication on worldwide
disability,20 Ustun and associates21 updated earlier data and estimated that
depression was the fourth leading cause of global disease burden in the year
2000. The burden of depression on the healthcare system is equally significant.
Average 6-month medical costs for primary care patients in the United States
diagnosed with depression or anxiety were approximately twice those for
patients with subthreshold depression or anxiety or no disorder ($2,390 vs.
$1,248),22 resulting in national annual medical costs of approximately $26
billion (1990 dollars).23 Much of this burden falls on primary care, where most
recognition and treatment occur,24 including antidepressant prescribing.25,26
The societal burden of depression is also great, and patients need not receive
a clinical diagnosis of depression to experience impaired functioning,27 missed
workdays (at an annual national cost of $17 billion),23 and disability days,28
with impairment equal to or greater than that found with other chronic
conditions such as diabetes, arthritis, gastrointestinal disturbances, lung
disturbances, bronchitis, emphysema, and back problems.29 Thus, at every level,
depression is an important health and public health problem.

There Should Be a Treatment for the Condition

A number of effective treatments exist for depression, including cognitive-
behavioral therapy and medications. In fact, the robust research basis for these
treatments has prompted a proliferation of treatment guidelines that provide
practical approaches for implementing these evidence-based practices for
primary care providers (see, for example, the Agency for Healthcare
Research and Quality website with depression guidelines).30
Facilities for Diagnosis and Treatment Should Be Available

Although availability varies by setting, more and more primary care
practitioners are recognizing their roles as first-line responders for
depression diagnosis and treatment. Additionally, many primary care practices
include mental health care specialists (eg, a psychiatric nurse specialist),
are aligned with mental health specialists (ie, have a ready referral source),
or are part of larger healthcare organizations that incorporate mental health
services (eg, HMOs, U.S. Veterans Health Administration). Thus, when there is
a positive screen and a diagnosis of depression is made, treatment is
typically available within the practice or within a referral network.

There Should Be a Latent Stage of the Disease

Although the diagnosis of depression depends on the presence of symptoms,
the disorder can be considered to have a latent stage in the following sense.
Depression is often not detected clinically, patients do not spontaneously
report symptoms to providers, and patients themselves may not be aware that
their symptoms constitute depression. From NCS-R data, it has been estimated
that there is a delay of approximately 8 years between the onset of depression
and first receipt of professional help.31 Additionally, longstanding depression
is associated with disability as well as psychiatric and medical comorbidities,
which early detection and intervention may prevent.

There Should Be a Test or Examination for the Condition

As is detailed in the next section, a number of adequate depression screening tools
exist, including standard screeners (eg, the Zung Self-Rating Depression Scale
[SDS]),32 short screens (eg, Medical Outcomes Study Depression Screen [MOS-
D]),33 and some ultra-brief screens (eg, Patient Health Questionnaire [PHQ]-2).34
In addition, there are diagnostic interviews suitable for use in primary care, such
as the depression module of the Mini International Neuropsychiatric Interview
(M.I.N.I.),35 the Primary Care Evaluation of Mental Disorders (PRIME-MD),36
and the Symptom-Driven Diagnostic System for Primary Care (SDDS-PC).37

The Test Should Be Acceptable to the Population

Screens for depression are generally acceptable to both participants and the
staff who administer them.38,39 Diagnostic tools are lengthier and may be more
difficult for some patients; however, they are considered acceptable in terms of
risk and time. Certainly, relative to other recommended primary care screenings
(eg, colonoscopy), screening for depression is noninvasive, brief, and well
tolerated by patients, and results are relatively easy to interpret. In
contrast to some screenings such as colonoscopy and mammography, which require
only a referral from the primary care provider, depression screening typically
requires more clinician (nurse or physician) time to administer, interpret,
and assess, and (if positive) to treat or refer. Thus, the screening burden
falls considerably more on clinicians than on patients, which may well affect
acceptability in clinical practice (Fig. 9.1).

Figure 9.1. Screening burden by task. [Flowchart: screen (patient, PC staff);
score (PC staff); review results (PCP); negative or positive results lead to a
second-stage screen or diagnostic work-up, then to psychoeducation, watchful
waiting, referral, or treatment (patient, PC staff, PCP). PC, primary care;
PCP, primary care provider.]

The Natural History of the Disease Should Be Adequately Understood

Depression is known as a disorder with exacerbations and remissions.
Persistent depression is a risk factor for disability,40 both medical and
psychiatric comorbidities,5 and suicide.41 There is evidence that early
recognition and effective treatment of depression can alter its trajectory by
reducing disability and premature mortality,42 promoting remission, and
preventing relapse.43 Other evidence suggests that early recognition and
effective treatment can also improve social functioning and productivity44 and
reduce absenteeism.45
‘‘Sub-major’’ depression is often considered an integral part of the natural
course of major depression and is sometimes referred to as the prodromal
phase.46 Research has demonstrated that both subthreshold and subsyndromal
depression are associated with increased functional disability47 and have a
negative impact on quality of life.48 Data from a randomized trial of older
adults (PROSPECT) show that patients initially presenting with sub-major
depression were five times more likely to have major depression after 1
year.47 Thus, identifying these patients may help broaden the focus of
depression treatment to include a more preventive approach,49 allowing patients
to benefit from improved functional and quality-of-life outcomes and to receive
more intensive assessment and symptom monitoring that hastens recognition of
major depressive disorder.
Patients presenting with sub-major depression may, in fact, benefit from
treatment. Seligman and colleagues50 followed ‘‘at-risk’’ university students
and found that those randomized to receive weekly cognitive-behavioral therapy
workshop meetings had significantly fewer depressive symptoms after 8 weeks.

There Should Be an Agreed Policy on Whom to Treat

Treatment policy may vary from site to site, with some advocating treatment
for minor depression and adjustment disorders with depressed mood. All clinical
practice guidelines advocate treating patients who meet the criteria for a
diagnosis of major depression. Several groups have shown that patients whose
depression is not recognized have milder forms of the disorder with less
disability.51 To some extent, treating those with ‘‘sub-major’’ depression may
be a resource issue. Some have advocated low-cost, low-intensity,
nontraditional treatments (eg, bibliotherapy, web-based self-help) whose
therapeutic intensity and cost are aligned with symptom severity.52 While there
may be benefit to treating these sub-major conditions, such policies should
not compromise system capacity to provide treatment for other important
conditions.

The Total Cost of Finding a Case Should Be Economically Balanced in Relation
to Medical Expenditure as a Whole

Screening instruments are relatively short and inexpensive, structured
diagnostic assessments for follow-up can be administered in-house, and
treatment is moderately priced. Weighed against the medical and psychiatric
comorbidities that are apt to develop when depression goes untreated, these
factors mean that economics favor screening for depression. In a cost-utility
study, Valenstein and coworkers53 concluded that one-time screening for
depression is cost-effective and that more frequent screening is likely to
become more cost-effective as treatments improve.
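Cost-utility analyses such as the one cited usually summarize results as an
incremental cost-effectiveness ratio (ICER): the extra cost of a strategy
divided by the extra quality-adjusted life years (QALYs) it yields, compared
against a willingness-to-pay threshold. The minimal Python sketch below shows
the calculation with entirely hypothetical per-patient values and a
hypothetical $50,000-per-QALY threshold; none of the numbers come from
Valenstein and coworkers.

```python
def icer(cost_new: float, cost_old: float, qaly_new: float, qaly_old: float) -> float:
    """Incremental cost per QALY gained by the new strategy over the old one."""
    qaly_gain = qaly_new - qaly_old
    if qaly_gain <= 0:
        raise ValueError("ICER is only meaningful here when the new strategy adds QALYs")
    return (cost_new - cost_old) / qaly_gain


# Hypothetical per-patient values: screening plus treatment vs. usual care.
ratio = icer(cost_new=1300.0, cost_old=1000.0, qaly_new=10.010, qaly_old=10.000)
THRESHOLD = 50_000.0  # hypothetical willingness-to-pay per QALY

print(f"${ratio:,.0f} per QALY; cost-effective at threshold: {ratio <= THRESHOLD}")
```

A negative ratio (lower cost and more QALYs) would mean the new strategy
dominates; strategies that add no QALYs are rejected by the guard rather than
given a ratio.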

Case-Finding Should Be a Continuous Process

Several studies have shown that depression can occur throughout the
lifespan.5,54 Furthermore, depression may be present for many years before it
is detected. Thus, it makes sense to have in place a system that screens
periodically throughout the lifespan.
4. Which Screening Tool Should Be Used?

Primary care providers have a great deal to consider when selecting a screening
instrument, and there are many tools from which to choose, each with its own
set of attributes. Time is of obvious importance in primary care, and typically
the provider time to administer a screening tool and score it (rather than patient
time) is a key consideration. In the quest for brevity, screening tools have
evolved from standard screeners to short screeners to ultra-brief screeners.
Below, we consider a number of published screening tools organized by
administration time. In addition to time, we also consider scope of use,
administration/scoring, and performance.

Standard Screeners
In a recent article, Mitchell and Coyne55 defined a ‘‘standard’’ screening tool as one that contains 15 or more items and takes, on average, more than 5 minutes to complete. Many of these screeners could also be called traditional: the Zung SDS,32 Beck Depression Inventory (BDI),56 and Center for Epidemiologic Studies Depression Scale (CES-D)57 were first published in 1965, 1961, and 1977, respectively, and remain in wide use. They have been translated into dozens of languages and used in virtually every health setting, including primary care and specialty clinics, as well as in research.
Table 9.4 provides details about the administration, scoring, and psychometric performance of five ‘‘standard’’ depression screeners: the BDI,56 CES-D,57 Geriatric Depression Scale (GDS),58 Inventory for Depression (ID),59 and the Zung SDS.32 The BDI,56 CES-D,57 and GDS58 are available in multiple, typically shorter, versions. Some of these screeners offer situational advantages over the others; for example, scores on the BDI and the Zung SDS provide an estimate of symptom severity, and the GDS was designed specifically for use with geriatric patients. One must take these characteristics (and others, such as self-administration and the time frame of symptoms) into account when selecting a screening tool. In general, all five of these screeners are well suited for use in primary care settings: they are easy to administer and score, and they offer reasonable accuracy. Even so, standard-length screeners may seem cumbersome to busy primary care providers who prefer shorter alternatives.

Table 9.4. Standard Depression Screening Instruments Commonly Used in Primary Care

BDI [60–64]
  Scope of use: depression only*; severity of symptoms today
  Administration: 7, 13, or 21 items*; 2–5 min to complete; literacy: easy; scoring: simple; can be self-administered
  Scoring: score range 0–63; usual cut points 10–19 (mild), 20–29 (moderate), ≥30 (severe)
  Performance: sensitivity 97% [63], 89% (81–95) [64]; specificity 99% [63], 64% (59–68) [64]; efficiency 0.99 [63]; false positive 0.01 [63]; false negative 0.00 [63]; LR+ 4.2 (1.2–13.6) [61], 2.5 [64]; LR− 0.17 (0.1–0.3) [61], 0.17 [64]; PPV 84.0% [63], 29.6% (10.7–57.6) [62]; AUC (95% CI) 0.87 (0.82–0.91) [64]
  Reference: Beck AT, Ward CH, Mock J, et al. An inventory for measuring depression. Arch Gen Psychiatry. 1961;4:561–571. www.psychcorpcenter.com/content/bdi-ll.htm

CES-D [60–64]
  Scope of use: depression only; frequency of symptoms in the past week
  Administration: 10 or 20 items; 2–5 min to complete; literacy: easy; scoring: simple; can be self-administered
  Scoring: score range 0–60; usual cut point 16
  Performance: sensitivity 81% [63], 93% (85–97) [64]; specificity 72% [63], 69% (65–74) [64]; efficiency 0.72 [63]; false positive 0.27 [63]; false negative 0.01 [63]; LR+ 3.3 (2.5–4.4) [61], 3.0 [64]; LR− 0.24 (0.2–0.3) [61], 0.10 [64]; PPV 13.0% [63], 24.8% (20–30.6) [62]; AUC (95% CI) 0.89 (0.85–0.92) [64]
  Reference: Radloff L. The CES-D scale: A self-report depression scale for research in the general population. Appl Psychol Meas. 1977;1:385–401. www.mhhe.com/hper/health/personalhealth/labs/stress/activ2-2.html

GDS [60, 62]
  Scope of use: depression only; endorsement of symptoms (yes/no) in the past week
  Administration: 15 or 30 items; 2–5 min to complete; literacy: easy; scoring: simple
  Scoring: score range 0–30; usual cut point 11
  Performance: LR+ 3.3 (2.4–4.7) [62]; LR− 0.16 (0.1–0.3) [62]; PPV 24.8% (19.4–32) [62]
  Reference: Yesavage JA, Brink TL, Rose TL, et al. Development and validation of a geriatric depression screening scale: a preliminary report. J Psychiatr Res. 1982–83;17(1):37–49. www.stanford.edu/~yesavage/GDS.html

ID [60, 61]
  Scope of use: depression only; symptoms recently
  Administration: 15 items; 2–5 min to complete; literacy: easy
  Scoring: score range 0–15; usual cut point 10
  Reference: Popoff LM. A simple method for diagnosis of depression by the family physician. Clinical Medicine. 1969 March:24–29.

SDS [60–63]
  Scope of use: depression only; frequency of symptoms recently
  Administration: 20 items; 2–5 min to complete; literacy: easy; scoring: simple; can be self-administered
  Scoring: score range 25–100; usual cut points 50–59 (mild), 60–69 (moderate), ≥70 (severe)
  Performance: sensitivity 100% [63]; specificity 71% [63]; efficiency 72% [63]; false positive 0.28 [63]; false negative 0.00 [63]; LR+ 3.3 (1.3–8.1) [62]; LR− 0.35 (0.2–0.8) [62]; PPV 15.0% [63], 24.8% (11.5–44.8) [62]
  Reference: Zung WW. A self-rating depression scale. Arch Gen Psychiatry. 1965;12:63–70. fpinfo.medicine.uiowa.edu/calculat.htm

AUC, area under the curve; CI, confidence interval; LR, likelihood ratio; PPV, positive predictive value.
Adapted from General Hospital Psychiatry, 24/4, Williams JW, Pignone M, Ramirez G, Perez Stellato C, Identifying depression in primary care: a literature synthesis of case-finding instruments, 225–237, Copyright (2002), with permission from Elsevier.

Short Screeners
Short screeners, defined as consisting of 5 to 14 items and taking between 2 and 5 minutes to complete,55 include the Hospital Anxiety and Depression Scale (HADS),65 MOS-D,33 and PHQ34 (Table 9.5). Many authors consider the diagnostic performance of these intermediate-length screeners to range from modest to good.55,64,66 Despite their combination of reasonable diagnostic performance and brevity, a national U.K. survey found that they continue to be underused in primary and secondary care settings.67 This lack of use may have spurred the development of even shorter screeners.

Ultra-short/Ultra-brief Screeners
What is the minimum number of items required to effectively screen for
depression? With the quest to reduce screening time, several new screening
instruments with four or fewer questions have been published. Mitchell and
Coyne have defined ultra-short/ultra-brief screeners as consisting of four or
fewer items and taking less than 2 minutes to complete (Table 9.6).55 Whooley
and colleagues64 reported data supporting a two-item screener, and the U.S.
Veterans Administration has adopted a four-item screener to satisfy a 1998
universal depression screening mandate. A meta-analysis of 22 studies that assessed the accuracy of ultra-short screeners for depression in primary care found that diagnostic rule-in accuracy increases with the number of items, with two- and three-item screeners offering the greatest accuracy (80%) and one-item screeners providing very poor accuracy (30%).55 No four-item screeners met inclusion criteria for this analysis. The authors concluded that while two- and three-item screeners can help providers identify 8 out of 10 depression cases, this usually comes at the expense of a high false-positive rate. They therefore argue for a two-stage screening approach when an ultra-brief screener is employed.
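The false-positive problem follows from likelihood ratios, which the tables in this chapter report alongside sensitivity and specificity. As a minimal Python sketch (the function names are our own, not from any library): LR+ equals sensitivity / (1 − specificity), LR− equals (1 − sensitivity) / specificity, and multiplying the pretest odds (here, the prevalence) by the appropriate LR gives the post-test odds. The LRs listed in the tables come from individual studies and need not equal this formula applied to the sensitivity and specificity listed beside them.

```python
from __future__ import annotations


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a screener with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pretest odds * LR."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


if __name__ == "__main__":
    # Hypothetical screener with 80% sensitivity and 80% specificity (as in Table 9.1a/b).
    lr_pos, lr_neg = likelihood_ratios(0.80, 0.80)
    for prevalence in (0.05, 0.10, 0.20):
        p_if_pos = post_test_probability(prevalence, lr_pos)
        p_if_neg = post_test_probability(prevalence, lr_neg)
        print(
            f"prevalence {prevalence:.0%}: "
            f"P(MDD | screen +) = {p_if_pos:.1%}, P(MDD | screen -) = {p_if_neg:.1%}"
        )
```

With 80% sensitivity and specificity (LR+ = 4), a 5% pretest probability becomes a post-test probability of 4/23, about 17.4%, which is the PPV shown in Table 9.1a. At similar sensitivity, a lower-specificity ultra-brief screener has a smaller LR+ and therefore a lower post-test probability, which is why a second stage is recommended.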

Two-Stage Approaches
Another approach that may offer advantages in some situations or practices is
the use of a two-stage process. Screening followed by a standardized diagnostic
assessment has often been used in research projects for efficient identification
of potential subjects who meet criteria for major depression. The approach
enables investigators to avoid conducting diagnostic assessments on all subjects, yet has the advantage of having screening information available on all subjects, with diagnostic data on those above a certain screening threshold. While in theory any screener could be combined with any acceptable diagnostic assessment, two instruments that ‘‘package’’ both screening and diagnosis, the SDDS-PC and PRIME-MD, were published in the mid-1990s specifically for use in primary care settings.36, 37, 68, 69 These instruments were
intended for both clinical and research purposes. They were both designed to take minimal clinician time while still providing multiple psychiatric diagnoses in primary care. Both instruments have a quick screen (sometimes referred to as stem questions) for multiple psychiatric disorders, followed by specific disorder modules when the quick screen indicates them. Both instruments include major depression. The time burden falls mainly on patients for the quick screen and on clinicians for the disorder modules (but only for the subset of patients with a high likelihood of disorder). Practices interested in only a single disorder can select the screening questions and module for that disorder. Notably, the developers of the PRIME-MD went on to develop the PHQ (with slightly improved sensitivity and specificity for major depression) because the PRIME-MD was still considered too long to be clinically useful.36

Table 9.5. Short Depression Screening Instruments Commonly Used in Primary Care

HADS [60, 62]
  Scope of use: anxiety and depression; severity of symptoms in the past week
  Administration: 14 items; ≤2 min to complete; literacy: difficult; scoring: simple
  Scoring: score range 0–21; usual cut point 11
  Performance: LR+ 7.0 (2.9–11.2) [62]; LR− 0.3 (0.3–0.4) [62]; PPV 41.3% (22.6–52.8) [62]
  Reference: Zigmond AS, Snaith RP. The Hospital Anxiety and Depression Scale. Acta Psychiatr Scand. 1983;67:361–370. www.clinical-supervision.com/hads.htm

MOS-D [60, 61, 64]
  Scope of use: depression only; frequency of symptoms in the past week
  Administration: 8 items; <2 min to complete; literacy: average; can be self-administered
  Scoring: score range 0–1 (logistic regression); usual cut point 0.06
  Performance: sensitivity 93% (86–97) [64]; specificity 72% (68–76) [64]; LR+ 3.3 [64]; LR− 0.10 [64]; AUC (95% CI) 0.89 (0.85–0.91) [64]
  Reference: Burnam MA, Wells KB, Leake B, et al. Development of a brief screening instrument for detecting depressive disorders. Medical Care. 1988;26:775–789.

PHQ [60, 62]
  Scope of use: depression only; frequency of symptoms in the past 2 weeks
  Administration: 9 items; <2 min to complete; literacy: average; scoring: simple; can be self-administered
  Scoring: diagnosis: score range 0–9 symptoms, usual cut point 5 symptoms; severity: score range 0–27, usual cut points 0–4 (none), 5–9 (mild), 10–14 (moderate), 15–19 (moderately severe), ≥20 (severe)
  Performance: LR+ 12.2 (8.4–18) [62]; LR− 0.28 (0.2–0.5) [62]; PPV 55% (45.7–64.3) [62]
  Reference: Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: Validity of a brief depression severity measure. J Gen Intern Med. 2001;16:606–613. www.depressionprimarycare.org/ap1.html

LR, likelihood ratio; PPV, positive predictive value.
Adapted from General Hospital Psychiatry, 24/4, Williams JW, Pignone M, Ramirez G, Perez Stellato C, Identifying depression in primary care: a literature synthesis of case-finding instruments, 225–237, Copyright (2002), with permission from Elsevier.

Table 9.6. Ultra-Short Depression Screening Instruments Commonly Used in Primary Care

PRIME-MD (PHQ-2) [60–63]
  Scope of use: multiple components with depression category; presence of symptoms in the past month
  Administration: 2 items; 1–2 min to complete; literacy: average; scoring: complex; can be self-administered
  Scoring: score range 0–2 [60]; usual cut point 1 [60]
  Performance: sensitivity 96% [63]; specificity 57%; efficiency 0.59 [63]; FP 0.41 [63]; FN 0.00 [63]; LR+ 2.7 (2.0–3.7) [62]; LR− 0.14 (0.1–0.3) [62]; PPV 21.3% (16.7–27) [62]
  References: Spitzer RL, Williams JB, Kroenke K, et al. Utility of a new procedure for diagnosing mental disorders in primary care. The PRIME-MD 1000 study. JAMA. 1994;272:1749–1756. Kroenke K, Spitzer RL, Williams JBW. The Patient Health Questionnaire-2: Validity of a two-item depression screener. Medical Care. 2003;41:1284–1292.

SDDS-PC [60–62, 64]
  Scope of use: multiple components with depression category; presence of symptoms in the past month
  Administration: 5 items; 1–2 min to complete; literacy: easy; scoring: complex
  Scoring: score range 0–5 [60]; usual cut point 2 [60]
  Performance: sensitivity 96% [64]; specificity 51% [64]; LR+ 3.5 (2.4–5.1) [62]; LR− 0.2 (0.1–0.4) [62]; PPV 25.9% (19.4–33.8) [62]; AUC (95% CI) 0.86 (0.82–0.89) [64]
  Reference: Broadhead WE, Leon AC, et al. Development and validation of the SDDS-PC screen for multiple mental disorders in primary care. Arch Fam Med. 1995;4:211–219.

AUC, area under the curve; CI, confidence interval; FN, false negative; FP, false positive; LR, likelihood ratio; PPV, positive predictive value.
Adapted from General Hospital Psychiatry, 24/4, Williams JW, Pignone M, Ramirez G, Perez Stellato C, Identifying depression in primary care: a literature synthesis of case-finding instruments, 225–237, Copyright (2002), with permission from Elsevier.

Screening for General Emotional Distress

One fundamental issue is whether screening should be aimed at identifying
distress rather than depression alone. There are several popular tools that
screen for nonspecific psychiatric distress, including the General Health
Questionnaire (GHQ),70 the Hopkins Symptom Checklist (HSCL),71 the
World Health Organization Well-Being Scale (WHO-5),72 and the Emotional
State Questionnaire-2 (EST-Q2)73 (Table 9.7). A prospective cohort study
found that the WHO-5, a well-being screener, performed better in a primary
care setting than the GHQ-12, PHQ-9, or an unaided physician diagnosis when
compared to the CIDI as the gold standard for detection of depression.74
Despite the breadth of this approach, brevity can be achieved by taking advantage of shared symptomatology and diagnostic comorbidity. The screener's specificity for any single disorder therefore matters less, and it is up to the provider to distinguish, for example, major depression from post-traumatic stress disorder or other anxiety disorders. Because first-line primary care treatments for many of these disorders are similar (eg, pharmacotherapy with selective serotonin reuptake inhibitors), this approach could work reasonably well in primary care.

Screening for Multiple Disorders

For many providers, it may be worthwhile to implement a screener that covers
many disorders, including only one or two items for each disorder. Means-
Christensen and coworkers75 tested such an approach with the Anxiety and
Depression Detector (ADD) and found that screening for panic disorder, post-
traumatic stress disorder, social phobia, generalized anxiety disorder, and
major depression simultaneously offered advantages in time efficiency while
maintaining screener performance. The SDDS-PC and PRIME-MD (mentioned above) cover multiple disorders (including major depression) that are prevalent and often undetected in primary care settings. They also cover suicidality, an important consideration regardless of diagnosis.

Table 9.7. General Psychiatric Screening Instruments Commonly Used in Primary Care

WHO-5
  Scope of use: measures degree of well-being
  Administration: 5 items
  Performance: sensitivity 94%; specificity 65%; false negative 0.06; PPV 0.37; NPV 0.98; LR+ 2.69; LR− 0.09
  References: Bech P, Gudex C, Johansen KS. The WHO (Ten) Well-Being Index: validation in diabetes. Psychother Psychosom. 1996;65:183–190. Bech P, Olsen LR, Kjoller M, et al. Measuring well-being rather than the absence of distress symptoms: a comparison of the SF-36 Mental Health subscale and the WHO-Five Well-Being Scale. Int J Methods Psychiatr Res. 2003;12:85–91.

GHQ [60, 61, 63]
  Scope of use: general psychiatric distress; frequency of symptoms in the past week
  Administration: 12, 28, or 30 items; 2–10 min to complete; literacy: easy; can be self-administered
  Scoring: score range 0–28; usual cut point 4
  Performance: sensitivity 76% [63]; specificity 74% [63]; efficiency 0.74 [63]; false positive 0.25 [63]; false negative 0.01 [63]; PPV 13.0% [63]
  Reference: Goldberg DP. The detection of psychiatric illness by questionnaire. London: Oxford University Press, 1972.

HSCL [60, 61]
  Scope of use: general distress; frequency of symptoms in the past week
  Administration: 13 or 25 items; 2–5 min to complete; literacy: average
  Scoring: score range 25–100; usual cut point 43
  Reference: Derogatis LR, Lipman RS, Rickels K, Uhlenhuth EH, Covi L. The Hopkins Symptom Checklist (HSCL): a self-report symptom inventory. Behav Sci. 1974 Jan;19(1):1–15.

EST-Q2 [76]
  Scope of use: detection of symptoms characteristic of depressive and anxiety disorders during the past four weeks
  Administration: 28 items (depression subscale: 8 items); time to complete: unknown; literacy: unknown
  Scoring: score range 0–112; depression subscale score range 0–32; usual cut point >11
  Performance: sensitivity 81% [76]; specificity 81% [76]; false positive 0.19 [76]; false negative 0.19 [76]; PPV 0.44 [76]; NPV 0.96 [76]; LR+ 4.3 [76]; LR− 0.23 [76]
  Reference: Aluoja A, Shlik J, Vasar V, Luuk K, Leinsalu M. Development and psychometric properties of the Emotional State Questionnaire, a self-report questionnaire for depression and anxiety. Nord J Psychiatry. 1999;53:443–449.

LR, likelihood ratio; NPV, negative predictive value; PPV, positive predictive value.
Adapted from General Hospital Psychiatry, 24/4, Williams JW, Pignone M, Ramirez G, Perez Stellato C, Identifying depression in primary care: a literature synthesis of case-finding instruments, 225–237, Copyright (2002), with permission from Elsevier.

Severity Rating
Depression screening instruments are important beyond case-finding. Certain instruments can also be used to monitor symptom levels (eg, frequency, severity) in ‘‘at-risk’’ patients or to evaluate treatment response and effectiveness. The most valuable instruments in these situations are those that provide severity levels (eg, Zung SDS). The practice of ‘‘watchful waiting’’ (see Fig. 9.1) involves following patients whose symptoms may be subthreshold or otherwise insufficient for a clinical diagnosis of depression, yet suggest an increased risk of developing depression in the future. In this scenario, depression screeners can be administered repeatedly over time to monitor symptom levels and track changes and patterns (in much the same way that prostate-specific antigen levels are monitored over time). Patients who have been clinically diagnosed with depression and are receiving treatment can complete screeners routinely, both to assess treatment effectiveness and to determine whether additional interventions are required (the U.S. Preventive Services Task Force recommends the PHQ-9 for this purpose).
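To illustrate repeated administration, the Python sketch below maps PHQ-9 totals to the severity cut points published with the PHQ-9 (0–4 none, 5–9 mild, 10–14 moderate, 15–19 moderately severe, 20–27 severe) and reports the change between visits. The function names and the example scores are hypothetical; the sketch only tabulates scores and makes no judgment about what change is clinically meaningful.

```python
from __future__ import annotations

# PHQ-9 severity bands for total scores 0-27 (Kroenke et al., 2001).
PHQ9_BANDS: list[tuple[int, int, str]] = [
    (0, 4, "none"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
]


def phq9_severity(total: int) -> str:
    """Map a PHQ-9 total score to its severity band."""
    if not 0 <= total <= 27:
        raise ValueError(f"PHQ-9 total must be 0-27, got {total}")
    for low, high, label in PHQ9_BANDS:
        if low <= total <= high:
            return label
    raise AssertionError("bands should cover 0-27")  # unreachable


def summarize_trajectory(scores: list[int]) -> list[tuple[int, str, int | None]]:
    """For repeated administrations, return (score, band, change since previous visit)."""
    summary: list[tuple[int, str, int | None]] = []
    previous: int | None = None
    for score in scores:
        change = None if previous is None else score - previous
        summary.append((score, phq9_severity(score), change))
        previous = score
    return summary


if __name__ == "__main__":
    # Hypothetical scores from three visits during treatment.
    for score, band, change in summarize_trajectory([18, 12, 6]):
        print(score, band, change)
```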

5. Implementing Screening in Primary Care

Implementation of a screening strategy must be undertaken with both the
screening instrument performance characteristics and clinical context in
mind. In addition to considering overall staffing patterns and underlying
nonpsychiatric case mix, a key contextual issue is the estimated underlying
prevalence of depression in the clinic population. Together with the screening instrument's sensitivity and specificity, this prevalence allows one to estimate resource use for various implementation strategies. Such exercises can help identify the most parsimonious approach under various scenarios.
Table 9.1a/b illustrates a one-stage screening approach using an instrument with sensitivity and specificity both 80%. We present the results of using this instrument under different prevalence scenarios: 5% and 10% (see Appendix Tables 3 and 4 for additional scenarios). Assuming 5% prevalence, if 1,000 patients were screened for major depression, 230 would screen positive, but only 40 would actually have major depression (positive predictive value [PPV] 17.4%). That means that 190 false-positive patients would undergo diagnostic assessment for major depression, an excess diagnostic burden of 19% (190/1,000). As the table shows, PPV increases and excess diagnostic burden declines as prevalence increases.

Table 9.1a/b. Sample Performance Yields for Single-Stage Screening in Primary Care Setting

9.1a Prevalence: Low (5% or 50 MDD cases)

                 Gold standard MDD +     Gold standard MDD −     Total
  Screen +       40 true positive        190 false positive      230 screen positive
  Screen −       10 false negative       760 true negative       770 screen negative
  Total          50 MDD positive         950 MDD negative        1,000 total sample

  PPV: 40/230 = 17.4%. For every 100 subjects who screen positive, only approximately 17 would be depressed.
  Excess diagnostic burden: 190/1,000 = 19%. Diagnostic assessment would be performed on 190 patients who were not depressed.

9.1b Prevalence: Average (10% or 100 MDD cases)

                 Gold standard MDD +     Gold standard MDD −     Total
  Screen +       80 true positive        180 false positive      260 screen positive
  Screen −       20 false negative       720 true negative       740 screen negative
  Total          100 MDD positive        900 MDD negative        1,000 total sample

  PPV: 80/260 = 30.8%. For every 100 subjects who screen positive, only approximately 31 would be depressed.
  Excess diagnostic burden: 180/1,000 = 18%. Diagnostic assessment would be performed on 180 patients who were not depressed.

n = 1,000; screener sensitivity 80%, specificity 80%
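The counts in Table 9.1a/b follow directly from prevalence, sensitivity, and specificity. The Python sketch below reproduces them under the same assumptions (1,000 patients; expected counts rounded to whole patients); the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningYield:
    """2 x 2 cross-tabulation of screen result against the gold standard."""
    true_pos: int
    false_pos: int
    false_neg: int
    true_neg: int

    @property
    def screen_positive(self) -> int:
        return self.true_pos + self.false_pos

    @property
    def ppv(self) -> float:
        return self.true_pos / self.screen_positive

    @property
    def npv(self) -> float:
        return self.true_neg / (self.true_neg + self.false_neg)

    def excess_diagnostic_burden(self) -> float:
        """Share of the whole sample sent for diagnostic assessment without having MDD."""
        total = self.true_pos + self.false_pos + self.false_neg + self.true_neg
        return self.false_pos / total


def single_stage(n: int, prevalence: float, sensitivity: float, specificity: float) -> ScreeningYield:
    """Expected counts when n patients are screened once, rounded to whole patients."""
    cases = round(n * prevalence)
    non_cases = n - cases
    true_pos = round(cases * sensitivity)
    true_neg = round(non_cases * specificity)
    return ScreeningYield(true_pos, non_cases - true_neg, cases - true_pos, true_neg)


if __name__ == "__main__":
    for prevalence in (0.05, 0.10, 0.20):
        y = single_stage(1000, prevalence, sensitivity=0.80, specificity=0.80)
        print(
            f"{prevalence:.0%}: {y.screen_positive} screen +, {y.true_pos} with MDD, "
            f"PPV {y.ppv:.1%}, excess burden {y.excess_diagnostic_burden():.0%}"
        )
```

At 5% prevalence this gives 230 screen positives, 40 true positives, a PPV of 17.4%, and an excess diagnostic burden of 19%, matching Table 9.1a.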
Table 9.2a/b illustrates a two-stage approach using an initial screener with sensitivity of 95% and specificity of 60% and a follow-up screener with sensitivity of 80% and specificity of 80%. Assuming a prevalence of 5%, of the 1,000 patients screened in the first stage, 428 will screen positive, of whom 48 are true positives (PPV 11.2%). In stage two, these 428 are screened again, yielding 38 true positives among 114 screen positives, for a more favorable PPV of 33.3%. The cumulative yield from stages 1 and 2 combined would be 38 true positives (76% sensitivity) and 874 true negatives (92% specificity), with a PPV of 33% and a negative predictive value (NPV) of 99%.

Table 9.2a/b. Sample Performance Yields for Two-Stage Screening in Primary Care Setting

9.2a Depression Prevalence: Low (5% or 50 MDD cases)

Stage I
                 Gold standard MDD +     Gold standard MDD −     Total
  Screen +       48 true positive        380 false positive      428 screen positive
  Screen −       2 false negative        570 true negative       572 screen negative
  Total          50 MDD positive         950 MDD negative        1,000 total sample

  PPV: 48/428 = 11.2%. For every 100 subjects who screen positive, approximately 11 would be depressed.

Stage II
                 Gold standard MDD +     Gold standard MDD −     Total
  Screen +       38 true positive        76 false positive       114 screen positive
  Screen −       10 false negative       304 true negative       314 screen negative
  Total          48 MDD positive         380 MDD negative        428 total (stage I positives)

  PPV: 38/114 = 33.3%. For every 100 subjects who screen positive, approximately 33 would be depressed.
  Overall excess diagnostic burden: 76/1,000 = 7.6%. Diagnostic assessment would be performed on 76 patients who were not depressed.

9.2b Depression Prevalence: Average (10% or 100 MDD cases)

Stage I
                 Gold standard MDD +     Gold standard MDD −     Total
  Screen +       95 true positive        360 false positive      455 screen positive
  Screen −       5 false negative        540 true negative       545 screen negative
  Total          100 MDD positive        900 MDD negative        1,000 total sample

  PPV: 95/455 = 20.9%. For every 100 screen positives, approximately 21 would be depressed.

Stage II
                 Gold standard MDD +     Gold standard MDD −     Total
  Screen +       76 true positive        72 false positive       148 screen positive
  Screen −       19 false negative       288 true negative       307 screen negative
  Total          95 MDD positive         360 MDD negative        455 total (stage I positives)

  PPV: 76/148 = 51.4%. For every 100 screen positives, approximately 51 would be depressed.
  Overall excess diagnostic burden: 72/1,000 = 7.2%. Diagnostic assessment would be performed on 72 patients who were not depressed.

n = 1,000; stage I screener sensitivity 95%, specificity 60%; stage II screener sensitivity 80%, specificity 80%
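The two-stage counts can be reproduced the same way by applying the stage II screener only to stage I positives. This is a minimal Python sketch, assuming (as the tables do) that expected counts are rounded to whole patients and that the stage II screener performs on stage I positives with its stated sensitivity and specificity, independently of stage I; the function names are illustrative.

```python
from __future__ import annotations


def apply_screen(cases: int, non_cases: int, sensitivity: float, specificity: float) -> tuple[int, int, int, int]:
    """Return (true_pos, false_pos, false_neg, true_neg) for one screening stage."""
    true_pos = round(cases * sensitivity)  # Python rounds halves to even: 47.5 -> 48
    true_neg = round(non_cases * specificity)
    return true_pos, non_cases - true_neg, cases - true_pos, true_neg


def two_stage(n: int, prevalence: float, stage1: tuple[float, float], stage2: tuple[float, float]) -> dict[str, float]:
    """Serial screening: only stage I positives receive the stage II screener."""
    cases = round(n * prevalence)
    non_cases = n - cases
    tp1, fp1, fn1, tn1 = apply_screen(cases, non_cases, *stage1)
    # Stage II sees the tp1 true positives and fp1 false positives from stage I.
    tp2, fp2, fn2, tn2 = apply_screen(tp1, fp1, *stage2)
    true_neg = tn1 + tn2
    false_neg = fn1 + fn2
    return {
        "stage1_positive": tp1 + fp1,
        "stage2_positive": tp2 + fp2,
        "true_pos": tp2,
        "false_pos": fp2,
        "true_neg": true_neg,
        "false_neg": false_neg,
        "sensitivity": tp2 / cases,
        "specificity": true_neg / non_cases,
        "ppv": tp2 / (tp2 + fp2),
        "npv": true_neg / (true_neg + false_neg),
        "excess_burden": fp2 / n,
    }


if __name__ == "__main__":
    result = two_stage(1000, 0.05, stage1=(0.95, 0.60), stage2=(0.80, 0.80))
    for key, value in result.items():
        print(key, round(value, 3))
```

At 5% prevalence this yields 428 stage I positives, 114 stage II positives, 38 true positives, and 874 true negatives, with overall sensitivity 76% and specificity 92%, as in the text. Note that 50 × 0.95 = 47.5 is rounded to 48, matching Table 9.2a.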

Table 9.3 assigns time costs to the various screening tasks, as well as to diagnostic assessment for screen-positive patients. In this table, we estimate patient, staff, and clinician time under the various screening scenarios (prevalence of 5%, 10%, and 20%; one- and two-stage screening approaches). We assume the same sensitivity and specificity for the screening instruments as in Tables 9.1 and 9.2. In a sample of 1,000 patients where the prevalence of depression is 5%, we estimate the patient time burden for a single-stage screener to be 6,600 minutes. This is based on an estimate of 2 minutes per patient for the initial screen, with 20 additional minutes for each screen-positive patient. We estimate 2,000 minutes of non-physician staff time (based on 2 minutes per patient) and 4,600 minutes of clinician time (based on 20 minutes per screen-positive patient). In the single-screener example, patient burden and provider burden increase with increasing prevalence.

Table 9.3. Screening and Diagnosis Time Burden for Patients, Staff, and Providers

Time burden (min)                    MDD prevalence 5%      MDD prevalence 10%     MDD prevalence 20%

A. Single-Stage Screening Approach (sensitivity 80%, specificity 80%)
  Screening (patient)                2,000 (1,000 × 2)      2,000 (1,000 × 2)      2,000 (1,000 × 2)
  Scoring (staff)                    2,000 (1,000 × 2)      2,000 (1,000 × 2)      2,000 (1,000 × 2)
  Screening yield                    23.0% (230/1,000)      26.0% (260/1,000)      32.0% (320/1,000)
  Diagnostic interview, patient      4,600 (230 × 20)       5,200 (260 × 20)       6,400 (320 × 20)
  Diagnostic interview, provider     4,600 (230 × 20)       5,200 (260 × 20)       6,400 (320 × 20)
  Positive predictive value          17.4% (40/230)         30.8% (80/260)         50% (160/320)
  Total time, patient                6,600                  7,200                  8,400
  Total time, staff                  2,000                  2,000                  2,000
  Total time, provider               4,600                  5,200                  6,400

B1. Two-Stage Screening Approach: Stage I (sensitivity 95%, specificity 60%)
  Screening (patient)                1,000 (1,000 × 1)      1,000 (1,000 × 1)      1,000 (1,000 × 1)
  Scoring (staff)                    1,000 (1,000 × 1)      1,000 (1,000 × 1)      1,000 (1,000 × 1)
  Screening yield                    42.8% (428/1,000)      45.5% (455/1,000)      51.0% (510/1,000)

B2. Two-Stage Screening Approach: Stage II (sensitivity 80%, specificity 80%)
  Screening (patient)                856 (428 × 2)          910 (455 × 2)          1,020 (510 × 2)
  Scoring (staff)                    856 (428 × 2)          910 (455 × 2)          1,020 (510 × 2)
  Screening yield                    26.6% (114/428)        32.5% (148/455)        42.4% (216/510)
  Diagnostic interview, patient      2,280 (114 × 20)       2,960 (148 × 20)       4,320 (216 × 20)
  Diagnostic interview, provider     2,280 (114 × 20)       2,960 (148 × 20)       4,320 (216 × 20)
  Positive predictive value          33.3% (38/114)         51.4% (76/148)         70.4% (152/216)
  Total time, patient                4,136                  4,870                  6,340
  Total time, staff                  1,856                  1,910                  2,020
  Total time, provider               2,280                  2,960                  4,320
We provide similar time estimates for the two-stage screening approach. Here, patient burden decreases (because the initial screener takes half the time of the more comprehensive second-stage screener), staff burden decreases slightly for prevalences of 5% and 10% but increases slightly for 20% prevalence, and provider time decreases substantially (because there are fewer false positives to evaluate). In these examples, we have emphasized tangible costs and have not estimated the costs of non-detection (false negatives) or of treatment.
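The totals in Table 9.3 are products of head counts and per-task minutes. The Python sketch below recomputes them. The minute values (2 minutes for the single-stage screen, 1 and 2 minutes for the stage I and stage II screens, 20 minutes per diagnostic interview, and staff scoring time equal to screening time) are those stated in the text and table; the constant and function names are our own. Like the table, it ignores the costs of missed cases and of treatment.

```python
from __future__ import annotations

# Per-task minutes used in Table 9.3.
SINGLE_SCREEN_MIN = 2  # single-stage screener; staff scoring takes the same time
STAGE1_MIN = 1         # brief stage I screener; staff scoring takes the same time
STAGE2_MIN = 2         # stage II screener; staff scoring takes the same time
INTERVIEW_MIN = 20     # diagnostic interview per screen-positive patient


def screen(cases: int, non_cases: int, sensitivity: float, specificity: float) -> tuple[int, int]:
    """Return (true positives, false positives) for one screening stage."""
    return round(cases * sensitivity), non_cases - round(non_cases * specificity)


def single_stage_minutes(n: int, prevalence: float) -> dict[str, int]:
    cases = round(n * prevalence)
    tp, fp = screen(cases, n - cases, 0.80, 0.80)
    interviews = tp + fp
    return {
        "patient": n * SINGLE_SCREEN_MIN + interviews * INTERVIEW_MIN,
        "staff": n * SINGLE_SCREEN_MIN,
        "provider": interviews * INTERVIEW_MIN,
    }


def two_stage_minutes(n: int, prevalence: float) -> dict[str, int]:
    cases = round(n * prevalence)
    tp1, fp1 = screen(cases, n - cases, 0.95, 0.60)
    stage2_n = tp1 + fp1                   # stage I positives go on to stage II
    tp2, fp2 = screen(tp1, fp1, 0.80, 0.80)
    interviews = tp2 + fp2                 # stage II positives receive the interview
    return {
        "patient": n * STAGE1_MIN + stage2_n * STAGE2_MIN + interviews * INTERVIEW_MIN,
        "staff": n * STAGE1_MIN + stage2_n * STAGE2_MIN,
        "provider": interviews * INTERVIEW_MIN,
    }


if __name__ == "__main__":
    for prevalence in (0.05, 0.10, 0.20):
        print(f"{prevalence:.0%}", single_stage_minutes(1000, prevalence), two_stage_minutes(1000, prevalence))
```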

6. What Developments Are on the Horizon?

Opinions concerning the appropriateness of screening for depression in primary care have shifted over the past two decades. As more effective treatments have become available to primary care providers, and as providers have become more knowledgeable about the importance of recognizing and treating depression, there has been a shift towards advocating routine screening in primary care settings. Many patients, however, are still unwilling to accept a diagnosis of depression or treatment for depression. Clinicians need to explain screening and diagnostic results in a non-stigmatizing way. Providers must offer educational information and motivate patients to accept treatment. Building depression treatment capabilities in primary care may increase patient acceptance of both the diagnosis and treatment, as treatment in primary care is seen as less stigmatizing, more timely, and better integrated into overall healthcare.
Over the past two decades, remarkable progress has been made in screening for depression in primary care. This can be seen in the change in U.S. Preventive Services Task Force guidance,30 which recommends screening adults ‘‘in clinical practices with systems in place to assure accurate diagnosis, effective treatment, and follow-up.’’ It can also be seen in the many guidelines for detecting and treating depression in clinical practice, and in the tools that have been developed to support them. Clear advances have been made in reducing the burden of screening tools, so that some instruments with excellent performance characteristics are as short as two questions. With increasing acceptance of depression as a treatable illness by both patients and providers, parallel gains need to be made in implementing screening and early detection practices.
Further improvements in the benefit–cost ratio of screening will need to come from improving treatment outcomes or reducing screening time. Psychometricians will be hard-pressed to develop briefer screening tools than the current group of ultra-brief instruments described in Table 9.6, but new methods might be able to focus on those who most need help. For example, New Zealand researchers have found that following the two PRIME-MD screening items with the question, ‘‘Is this something with which you would like help?’’ improves specificity from 78% to 89%, with sensitivity unchanged at 96% (using the CIDI as the gold standard).77 This would theoretically improve efficiency by selecting patients who are likely to accept treatment for depression.
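A minimal Python sketch of how the added question changes the screening decision and the resulting PPV. It assumes that the help question is asked only after a positive two-item screen (score ≥1 on the 0–2 scale in Table 9.6) and that a positive overall result requires a ‘‘yes’’ to it; the study's exact scoring rule and its prevalence are not given here, so the 10% prevalence below is a hypothetical value chosen only to show the effect of raising specificity from 78% to 89% at a constant 96% sensitivity. Function names are illustrative.

```python
def prime_md_two_item_positive(item1_yes: bool, item2_yes: bool, cut_point: int = 1) -> bool:
    """Two yes/no stem questions scored 0-2; positive at or above the cut point."""
    return int(item1_yes) + int(item2_yes) >= cut_point


def positive_with_help_question(item1_yes: bool, item2_yes: bool, wants_help: bool) -> bool:
    """Assumed rule: a positive two-item screen counts only if the patient also wants help."""
    return prime_md_two_item_positive(item1_yes, item2_yes) and wants_help


def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value from prevalence, sensitivity, and specificity."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical 10% prevalence; sensitivity 96% with specificity 78% vs 89%.
    print(f"two items only:     PPV {ppv(0.10, 0.96, 0.78):.1%}")
    print(f"plus help question: PPV {ppv(0.10, 0.96, 0.89):.1%}")
```

At that assumed prevalence, the PPV rises from roughly 33% to roughly 49%, so fewer patients without depression would proceed to diagnostic assessment.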
Another possibility is to reduce clinician and staff time by modifying the
screening modality. For example, waiting rooms could contain carrels with
computers where patients could update their histories and answer screening
questions. Notable results could be flagged and printed for clinicians to address
with the patient in the exam room. Patients could also complete these updates and self-screens on their home computers, with results sent automatically and confidentially to providers. Automated computer reminders prompting clinicians to perform depression (and other) screens could also improve efficiency, as would using trained nurses to administer the screens and flag positive results for the provider (see Chapter 8 for further discussion). Depending on the practice, a two-stage screening process is another possibility, using extremely brief first-level screens followed by more intensive second-level diagnostic assessments when indicated. Such second-level assessments could even include self-administered instruments that are considered diagnostic in nature, such as the PRIME-MD or the SDDS-PC.
Considering depression screening in the context of other psychiatric illness
may broaden our notion of screening effectiveness. For example, depressive
symptoms frequently co-occur with generalized anxiety, post-traumatic stress,
and substance use disorders. False-positive screening results for depression may therefore be less worrisome, because such patients may instead have one of these three conditions. Thus, a positive screen, though false positive for depression, may in essence correctly identify patients in need of mental health treatment, even if not for depression.
Timing is yet another way to improve efficiency. Screening less often (eg,
every 2 to 5 years instead of every year) would minimize the cost of the
screening itself but at the expense of a lower detection rate. Approaches could
be developed that take into account patient profiles to target screening to
those at highest risk. Similarly, prior screening results (eg, subthreshold
scores, positive screens for other mental health conditions, or answers to
highly predictive questions) could be used to generate a screening frequency
algorithm. In an age with electronic medical records and computer-generated
clinical reminders, the ability to develop and implement such frequency
algorithms based on individual profiles may not be as far away as it once
seemed.

7. Conclusions
Screening for depression in primary care has changed radically in the past 20
years. With improvements in depression treatment, reduced stigmatization,
better acceptance of depression as a treatable illness, and more efficient
screening tools, primary care providers have embraced the notion that they
are responsible for recognizing and treating this condition. Fortunately,
providers have many excellent screening tools from which to choose. For
additional efficiencies to be realized, advances in technology (eg, computerized
screening and scoring), along with improved treatment outcomes, will be
needed to shift the benefit–cost ratio of depression screening further in its favor.

References
1. Institute of Medicine (IOM). A manpower policy for primary health care. Washington,
DC: National Academy of Sciences, 1978.
2. Starfield B. Primary care: concept, evaluation, and policy. New York: Oxford
University Press, 1992.
3. Culpepper L. The active management of depression. J Fam Pract. 2002;51:769–776.
4. Mechanic D, McAlpine DD, Rosenthal M. Are patients’ office visits with physicians
getting shorter? N Engl J Med. 2001;344(3):198–204.
5. Kessler RC, Chiu WT, Demler O, et al. Prevalence, severity, and comorbidity of
12-month DSM-IV disorders in the National Comorbidity Survey Replication. Arch
Gen Psychiatry. 2005;62(6):617–627.
6. Alonso J, Angermeyer MC, Bernert S, et al. 12-Month comorbidity patterns and
associated factors in Europe: results from the European Study of the Epidemiology of
Mental Disorders (ESEMeD) project. Acta Psychiatr Scand. 2004;109(s420):28–37.
7. Wittchen H-U, Jacobi F. Size and burden of mental disorders in Europe—a critical
review and appraisal of 27 studies. Eur Neuropsychopharmacol. 2005;15(4):357–376.
8. Depression Guideline Panel. Depression in primary care, vol 1. Detection and
diagnosis. Clinical Practice Guideline, No. 5. Rockville, MD: DHHS Pub Hlth Serv.
AHCPR Publication No. 93–0550, 1993.
9. Goldberg D, Lecrubier Y. Chapter 4.1. Form and Frequency of Mental Disorders across
Centres. In Mental illness in general health care: an international study. Chichester:
John Wiley and Sons, 1995.
10. Backenstrass M, Frank A, Joest K, et al. A comparative study of nonspecific depressive
symptoms and minor depression regarding functional impairment and associated
characteristics in primary care. Compr Psychiatry. 2006;47(1):35–41.
11. Regier DA, Narrow WE, Rae DS, et al. The de facto US mental and addictive disorders service
system. Epidemiologic Catchment Area prospective 1-year prevalence rates of
disorders and services. Arch Gen Psychiatry. 1993;50:85–94.
12. Regier DA, Goldberg ID, Taube CA. The de facto US mental health services system: a
public health perspective. Arch Gen Psychiatry. 1978;35(6):685–693.
13. Kessler RC, Berglund P, Demler O, et al. The epidemiology of major depressive
disorder: results from the National Comorbidity Survey Replication (NCS-R). JAMA.
2003;289(23):3095–3105.
186 SCREENING FOR DEPRESSION IN CLINICAL PRACTICE

14. Harman JS, Veazie PJ, Lyness JM. Primary care physician office visits for depression
by older Americans. J Gen Intern Med. 2006;21(9):926–930.
15. Pincus HA, Tanielian TL, Marcus SC, et al. Prescribing trends in psychotropic
medications: primary care, psychiatry, and other medical specialties. JAMA.
1998;279(7):526–531.
16. Bridges K, Goldberg D. Somatic presentation of DSM III psychiatric disorders in
primary care. J Psychosom Res. 1985;29:563–569.
17. Magruder-Habib K, Zung W, Feussner J. Improving physicians’ recognition and
treatment of depression in general medical care: results of randomized clinical trial.
Med Care. 1990;28(3):239–250.
18. Wilson D, Widmer R, Cadoret R, et al. Somatic symptoms: a major feature of
depression in a family practice. J Affect Disord. 1983;5:299–307.
19. Ustun T, Von Korff M. Chapter 4.3. Primary mental health services: access and
provision of care. In Mental illness in general health care: an international study.
Chichester: John Wiley and Sons, 1995.
20. Murray C, Lopez A, eds. The global burden of disease: a comprehensive assessment of
mortality and disability from diseases, injuries and risk factors in 1990 and projected to
2020. Cambridge, MA: Harvard University Press on behalf of the World Health
Organization and the World Bank, 1996.
21. Ustun TB, Ayuso-Mateos JL, Chatterji S, et al. Global burden of depressive disorders in
the year 2000. Br J Psychiatry. 2004;184:386–392.
22. Simon G, Ormel J, VonKorff M, et al. Health care costs associated with depressive and
anxiety disorders in primary care. Am J Psychiatry. 1995;152(3):352–357.
23. Greenberg P, Stiglin L, Finkelstein S, et al. Depression: a neglected major illness. J Clin
Psychiatry. 1993;54(11):419–424.
24. Manderscheid RW, Rae DS, Narrow WE, et al. Congruence of service utilization
estimates from the Epidemiologic Catchment Area Project and other sources. Arch
Gen Psychiatry. 1993;50(2):108–114.
25. Beardsley RS, Gardocki GJ, Larson DB, et al. Prescribing of psychotropic medication by
primary care physicians and psychiatrists. Arch Gen Psychiatry. 1988;45(12):1117–1119.
26. Simon GE, VonKorff M, Wagner EH, et al. Patterns of antidepressant use in community
practice. Gen Hosp Psychiatry. 1993;15(6):399–408.
27. Wells KB, Stewart A, Hays RD, et al. The functioning and well-being of depressed
patients. Results from the Medical Outcomes Study. JAMA. 1989;262(7):914–919.
28. Broadhead WE, Blazer DG, George LK, et al. Depression, disability days, and days lost
from work in a prospective epidemiologic survey. JAMA. 1990;264(19):2524–2528.
29. Wells KB, Golding J, Burnam MA. Psychiatric disorder and limitations in physical
functioning in a general population. Am J Psychiatry. 1988;145:712–717.
30. Agency for Healthcare Research and Quality. Guide to clinical preventive services.
AHRQ Publication No. 06–0588. Rockville, MD: AHRQ; 2006. Available at:
http://www.ahrq.gov/clinic/pocketgd.htm.
31. Wang PS, Berglund P, Olfson M, et al. Failure and delay in initial treatment contact
after first onset of mental disorders in the National Comorbidity Survey Replication.
Arch Gen Psychiatry. 2005;62(6):603–613.
32. Zung WW. A self-rating depression scale. Arch Gen Psychiatry. 1965;12:63–70.
33. Burnam MA, Wells KB, Leake B, Landsverk J. Development of a brief
screening instrument for detecting depressive disorders. Med Care. 1988;26:775–789.
34. Kroenke K, Spitzer RL, Williams JBW. The Patient Health Questionnaire-2: validity of
a two-item depression screener. Med Care. 2003;41:1284–1292.

35. Lecrubier Y, Sheehan DV, Weiller E, Amorim P, Bonora I, Harnett Sheehan K, Janavs J,
Dunbar GC. The Mini International Neuropsychiatric Interview (MINI). A short diagnostic
structured interview: reliability and validity according to the CIDI. Eur Psychiatry.
1997;12:224–231.
36. Spitzer RL, Williams JB, Kroenke K, et al. Utility of a new procedure for diagnosing
mental disorders in primary care. The PRIME-MD 1000 study. JAMA.
1994;272:1749–1756.
37. Broadhead WE, Leon AC, et al. Development and validation of the SDDS-PC screen
for multiple mental disorders in primary care. Arch Fam Med. 1995;4:211–219.
38. Bermejo I, Niebling W, Berger M, et al. Patients’ and physicians’ evaluation of the
PHQ-D for depression screening. Primary Care and Community Psychiatry.
2005;10(4):125–131.
39. Loerch B, Szegedi A, Kohnen R, et al. The primary care evaluation of mental disorders
(PRIME-MD), German version: a comparison with the CIDI. J Psychiatr Res.
2000;34(3):211–220.
40. Ormel J, Von Korff M, Oldehinkel A, et al. Onset of disability in depressed and non-
depressed primary care patients. Psychol Med. 1999;29:847–853.
41. American Psychiatric Association. Diagnostic and statistical manual of mental
disorders. 4th ed., text rev. Washington, DC: American Psychiatric Association; 2000.
42. Sherman L. Depression and medical illness. Audio Digest Psychiatry. 2004;33(16):1–6.
43. Halfin A. Depression: the benefits of early and appropriate treatment. Am J Manag
Care. 2007;13:S92–S97.
44. Coulehan J, Schulberg H, Block M, et al. Treating depressed primary care patients
improves their physical, mental, and social functioning. Arch Intern Med.
1997;157:1113–1120.
45. Rost K, Smith J, Dickinson M. The effect of improving primary care depression
management on employee absenteeism and productivity. A randomized trial. Med
Care. 2004;42:1202–1210.
46. Eaton WW, Badawi M, Melton B. Prodromes and precursors: epidemiologic data
for primary prevention of disorders with slow onset. Am J Psychiatry.
1995;152(7):967–972.
47. Lyness JM, Heo M, Datto CJ, et al. Outcomes of minor and subsyndromal depression
among elderly patients in primary care settings. Ann Intern Med. 2006;144(7):496–504.
48. Wells KB, Burnam MA, Rogers W, et al. The course of depression in adult outpatients.
Arch Gen Psychiatry. 1992;49:788–794.
49. Cuijpers P, Smit F. Subthreshold depression as a risk indicator for major depressive
disorder: a systematic review of prospective studies. Acta Psychiatr Scand.
2004;109(5):325–331.
50. Seligman MEP, Schulman P, DeRubeis RJ, et al. The prevention of depression and
anxiety. Prevention & Treatment. 1999;2(1).
51. Simon G, Goldberg D, Tiemens B, et al. Outcomes of recognized and unrecognized
depression in an international primary care study. Gen Hosp Psychiatry.
1999;21(2):97–105.
52. Magruder KM, Calderone GE. Public health consequences of different
thresholds for the diagnosis of mental disorders. Compr Psychiatry. 2000;41(2,
Supplement 1):14–18.
53. Valenstein M, Vijan S, Zeber JE, et al. The cost-utility of screening for depression in
primary care. Ann Intern Med. 2001;134(5):345–360.

54. Pálsson S, Östling S, Skoog I. The incidence of first-onset depression in a population
followed from the age of 70 to 85. Psychol Med. 2001;31:1159–1168.
55. Mitchell AJ, Coyne JC. Do ultra-short screening instruments accurately detect
depression in primary care? A pooled analysis and meta-analysis of 22 studies. Br J
Gen Pract. 2007;57:144–151.
56. Beck AT, Ward CH, Mendelson M, Mock J, Erbaugh J. An inventory for measuring depression.
Arch Gen Psychiatry. 1961;4:561–571.
57. Radloff LS. The CES-D scale: a self-report depression scale for research in the general
population. Appl Psychol Meas. 1977;1:385–401.
58. Yesavage JA, Brink TL, Rose TL, Lum O, Huang V, Adey M, Leirer VO. Development
and validation of a geriatric depression screening scale: a preliminary report. J Psychiatr
Res. 1982–83;17(1):37–49.
59. Popoff LM. A simple method for diagnosis of depression by the family physician.
Clinical Medicine. 1969 March:24–29.
60. Williams JW, Pignone M, Ramirez G, Perez Stellato C. Identifying depression in
primary care: a literature synthesis of case-finding instruments. Gen Hosp Psychiatry.
2002;24(4):225–237.
61. Mulrow CD, Williams JW, Gerety MB, Ramirez G, Montiel OM, Kerber C. Case-finding
instruments for depression in primary care settings. Ann Intern Med.
1995;122(12):913–921.
62. Nease DE Jr, Malouin JM. Depression screening: a practical strategy. J Fam Pract.
2003;52(2):118.
63. McAlpine DD, Wilson AR. Screening for depression in primary care: what do we still
need to know? Depress Anxiety. 2004;19(3):137–145.
64. Whooley M, Avins A, Miranda J, et al. Case-finding instruments for depression: two
questions are as good as many. J Gen Intern Med. 1997;12(7):439–445.
65. Zigmond AS, Snaith RP. The Hospital Anxiety and Depression Scale. Acta Psychiatr
Scand. 1983;67:361–370.
66. Schade CP, Jones ER Jr, Wittlin BJ. A ten-year review of the validity and clinical utility
of depression screening. Psychiatr Serv. 1998;49(1):55–61.
67. Gilbody S, Whitty P, Grimshaw J, et al. Improving the recognition and management of
depression in primary care. Effective Health Care Bull. 2002;7(5).
68. Weissman MM, Broadhead WE, Olfson M, et al. A diagnostic aid for detecting
(DSM-IV) mental disorders in primary care. Gen Hosp Psychiatry.
1998;20(1):1–11.
69. Weissman M, Olfson M, Leon AC, et al. Brief diagnostic interviews (SDDS-PC) for
multiple mental disorders in primary care: a pilot study. Arch Fam Med.
1995;4(3):220–227.
70. Goldberg DP. The detection of psychiatric illness by questionnaire. London, Oxford
University Press, 1972.
71. Derogatis LR, Lipman RS, Rickels K, Uhlenhuth EH, Covi L. The Hopkins Symptom
Checklist (HSCL): a self-report symptom inventory. Behav Sci. 1974;19(1):1–15.
72. Bech P, Olsen LR, Kjoller M, Rasmussen NK. Measuring well-being rather than the
absence of distress symptoms: a comparison of the SF-36 Mental Health subscale and
the WHO-Five Well-Being Scale. Int J Methods Psychiatr Res. 2003;12:85–91.
73. Aluoja A, Shlik J, Vasar V, Luuk K, Leinsalu M. Development and psychometric
properties of the Emotional State Questionnaire, a self-report questionnaire for
depression and anxiety. Nord J Psychiatry. 1999;53:443–449.

74. Henkel V, Mergl R, Kohnen R, et al. Identifying depression in primary care: a
comparison of different methods in a prospective cohort study. BMJ.
2003;326(7382):200–201.
75. Means-Christensen AJ, Sherbourne CD, Roy-Byrne PP, Craske MG, Stein MB.
Using five questions to screen for five common mental disorders in primary care:
diagnostic accuracy of the Anxiety and Depression Detector. Gen Hosp Psychiatry.
2006;28(2):108–111.
76. Ööpik P, Aluoja A, Kalda R, et al. Screening for depression in primary care. Fam Pract.
2006;23(6):693–698.
77. Arroll B, Goodyear-Smith F, Kerse N, et al. Effect of the addition of a “help” question
to two screening questions on specificity for diagnosis of depression in general practice:
diagnostic validity study. BMJ. 2005;331:884–887.
10
SCREENING FOR DEPRESSION IN MEDICAL SETTINGS: ARE SPECIFIC SCALES USEFUL?

Gordon Parker and Matthew Hyett

1. An Introductory Logic
2. Depression in the Medically Ill
3. “False-Positive” Depression Reflecting Confounding by Physical Symptoms Associated with Medical Illness
4. Screening Measures Used to Assess Depression in the Medically Ill
5. Discussion

Context
There are two broad strategies for screening and quantifying depression in
medical settings. The first approach relies upon measures developed in
psychiatric samples, and the second concedes that symptoms are
substantially different and develops customized scales. Here we discuss the
merits of several specific scales for measuring depression in medical settings
and make the case for scales tailored to specific populations. A subsequent
chapter (Babaei and Mitchell) will present a contrasting position.

1. An Introductory Logic
There are two broad strategies for screening and quantifying depression in
medical settings. The first approach involves using measures developed in
psychiatric samples and assuming that their relevance holds. The second
approach is to concede that there are intrinsic limitations to extrapolating
those “general” measures to medically ill populations. In the former case the
hypothesis is that symptoms of depression are essentially the same when
depression occurs with and without physical illness. In the latter case the
hypothesis is that the symptoms are substantially different. Two key concerns
argue for the latter approach.
Firstly, the former approach assumes some constancy to the nature of depression
across differing psychiatric and medical settings. Depression, however, is
difficult enough to define in psychiatric patient samples. Even ignoring the
debate as to whether depression is best viewed as comprising a set of subtypes or
best modeled along a continuum, quantifying clinical depression remains
problematic, as detailed elsewhere in this book. Over the past few decades,
clinical depression has most commonly been viewed as synonymous with
major depression, but numerous studies have shown comparable symptomatic
distress and disability in major depression, minor depression, and even
subsyndromal depression.1,2 This raises an obvious question: Can imposing a
cutoff score on a dimensional measure of depression accurately distinguish true
cases and true non-cases in a psychiatric sample?
Further, assuming that a cutoff is derived with an acceptable classification rate,
can we extrapolate decision rules derived from psychiatric samples to screen
and quantify depression caseness in the medically ill? As measures that have
been widely used for decades (such as the Zung and the Beck Depression
Inventory) generate widely differing cutoff scores across psychiatric, general
practice, and medical settings, there would appear to be quantitative and
possibly qualitative differences in the nature of depression in medical contexts,
making extrapolation of general measures problematic.
The second issue of concern is a methodologic one. Many measures used to
assess depression in psychiatric samples weight features such as fatigue,
anergia, anhedonia, and loss of interest, as well as appetite and sleep changes.
However, it is quite possible for nondepressed patients with a medical illness to
rate positively on such items purely as a consequence of their physical problem
or of the drugs being used to treat the medical condition, or even of being
hospitalized. Such confounding clearly risks false-positive scores, which then
will inflate case identification and severity estimates. This issue also requires
some consideration.

2. Depression in the Medically Ill

The 12-month prevalence and odds of major depression are high in individuals
with chronic medical conditions, and major depression is associated
with significant increases in health care utilization, lost productivity, and
functional disability.3 Those with a medical illness may have a co-occurring depressive
illness (melancholic or nonmelancholic) that is similar in all regards to
those depressive conditions observed in a psychiatric context. However,
many with a medical illness will instead have a grief-like reaction to the
medical illness per se. Here, rather than experiencing the primary defining
feature of depression (a loss in self-esteem or self-worth), as might be
expected for an individual with clinical depression, they may be grieving
the loss of their previous healthy role and have no impairment of self-esteem.
In addition, medical illness itself can cause psychological features approaching
the phenomenology of depression. Cassell4 has emphasized (i) disconnection
from the usual world, (ii) a loss of the sense of indestructibility or omnipotence,
(iii) a loss of competence and completeness of reason, and (iv) a loss of control
of the sufferer’s world. He notes that, as illness deepens, medically ill people
become more and more withdrawn from their usual world, their previous
interests, friends, and families, reflecting that, “We exist to the extent that
we are connected.” When medically ill patients experience such feelings, they
will frequently develop irritability, anxiety, fear, and even depression. The
disconnection can occur rapidly after events such as a myocardial infarction or
severe trauma, or be gradual following the development of a chronic disease or
long-term illness.4 The loss of the sense of omnipotence is commonly handled
by denial and/or disavowal as the individual seeks to preserve his or her
intactness. The loss of control—where the patient perceives himself or herself
as helpless—can be one of the most distressing of human experiences.
According to Cassell,4 such features are part of the experience of illness itself.
While they sometimes approximate depressive phenomenology, they can be
distinguished by careful clinical inquiry, but not always by simple screening
measures. In essence, there is a distinction between the experiential components
of illness and depression. Thus, in screening for depression in the medically ill,
items need to be chosen so that they do not elicit false-positive responses from
those whose distress reflects their medical illness rather than depression.

3. “False-Positive” Depression Reflecting Confounding by Physical Symptoms Associated with Medical Illness
As noted earlier, individuals with many medical conditions might be expected
to report features such as loss of interest, anergia, and sleep and appetite
disturbances, which, if secondary to the medical illness and not a reflection
of depression, will tend to inflate depression estimates.
A number of options have been proposed to redress such confounding
influences. Several authors5 have argued for an inclusive approach. Here,
every relevant depressive symptom is counted even if secondary to the
illness or its management, with or without subsequent adjustment to
threshold scores to calibrate caseness estimates. A contrasting exclusive
approach6 ignores features common to those with medical illness. A third
substitutive approach7 involves substituting psychological symptoms (eg,
tearfulness and social withdrawal) for vegetative symptoms (eg, weight loss,
appetite and sleep disturbance, fatigue, and concentration difficulties).
Fourthly, both DSM-III-R and DSM-IV decision rules allow an etiologic
approach, whereby symptoms are counted only if they are judged as not
being caused by a general medical condition. This last approach requires the
rater to make interpretative (and thus subjective) judgments.
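The following simplified sketch shows how the four approaches can yield different symptom counts for the same patient. The symptom list, the flags, and the example patient are hypothetical. The inclusive approach's optional threshold adjustment is not modeled, and real DSM decision rules involve more than a raw count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symptom:
    name: str
    present: bool
    vegetative: bool = False             # somatic/vegetative item (eg, fatigue, sleep)
    attributed_to_illness: bool = False  # rater judges it caused by the medical condition


def count_symptoms(symptoms, approach, substitutes=()):
    """Count depressive symptoms under one of the four approaches in the text.

    substitutes: psychological symptoms (eg, tearfulness) counted in place of
    vegetative ones; used only by the substitutive approach.
    """
    if approach == "inclusive":
        # Count everything, even if secondary to the illness or its treatment.
        return sum(s.present for s in symptoms)
    if approach == "exclusive":
        # Ignore vegetative features common in medical illness.
        return sum(s.present for s in symptoms if not s.vegetative)
    if approach == "substitutive":
        # Drop vegetative features and count psychological substitutes instead.
        kept = sum(s.present for s in symptoms if not s.vegetative)
        return kept + sum(s.present for s in substitutes)
    if approach == "etiologic":
        # Count a symptom only if it is judged not due to the medical condition.
        return sum(s.present and not s.attributed_to_illness for s in symptoms)
    raise ValueError(f"unknown approach: {approach!r}")


if __name__ == "__main__":
    patient = [
        Symptom("depressed mood", True),
        Symptom("anhedonia", True),
        Symptom("fatigue", True, vegetative=True, attributed_to_illness=True),
        Symptom("insomnia", True, vegetative=True, attributed_to_illness=True),
        Symptom("appetite loss", True, vegetative=True),
    ]
    subs = [Symptom("tearfulness", True), Symptom("social withdrawal", False)]
    for approach in ("inclusive", "exclusive", "substitutive", "etiologic"):
        print(approach, count_symptoms(patient, approach, subs))
```

For this hypothetical patient, the counts are 5, 2, 3, and 3, respectively. That spread shows how the choice of approach alone can move a patient across a diagnostic threshold.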
Common sense would suggest that there would be advantages to having
measures of depression in medical settings that assess items defining depres-
sion per se and that are unlikely to be confounded by aspects of the medical
illness or of its treatment. Such an approach therefore rejects the use of general
depression measures, and argues for consideration only of measures that have
been designed to preempt confounding influences. We now review measures
that have been widely advanced and/or specifically designed for measuring
depression in medical settings.

4. Screening Measures Used to Assess Depression in the Medically Ill
The Hospital Anxiety and Depression Scale (HADS)
This seven-item depression subscale of the HADS8 is one of the most commonly used
research measures of depression in the medically ill. As the authors judged anhedonia to
be a central feature of depression and a predictor of antidepressant drug
response, five of its seven items assess anhedonia (eg, “I still enjoy the
things I used to enjoy”; “I feel cheerful”; “I have lost interest in my appearance”;
“I look forward with enjoyment to things”; and “I can enjoy a good
book or program”), suggesting some redundancy. For this dimensional measure,
the authors suggested that a score of 11 or more indicates a definite
case of depression, while noncases score less than 8 and doubtful cases score in
the 8 to 10 range. While this scale is widely used, Hermann9 noted that there is
“still no comprehensive documentation of its psychometric properties,” and
its validity has been challenged in both medically unwell and psychiatric
patients.10 In the latter study, the HADS showed poor positive predictive value (PPV):
only 17% of medically ill patients scoring at or above a cutoff of 8 were true cases,
rising to just 25% at a cutoff of 11. Moreover, a
recent review of the validity of the HADS11 identified differing optimal cutoffs
across differing primary care populations, suggesting that its case-finding
ability depends on sample characteristics. For instance, its use in general
practice settings yielded areas under the ROC curve in the range of 0.84 to 0.96,
whereas its application to more specific medical settings (eg, stroke clinics)
yielded unusually low optimal case-finding cutoffs (ie, 4). Thus, the
validity of the HADS as a measure of depression in divergent medical settings
lacks support.
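The sketch below encodes the HADS depression-subscale bands described above and applies Bayes' theorem to show why the PPV can be low. It assumes the standard HADS format, in which each of the seven items is scored 0 to 3, giving totals from 0 to 21. The accuracy figures in the example are hypothetical and are not the values from the studies cited.

```python
def hads_depression_band(total: int) -> str:
    """Classify a HADS depression-subscale total (seven items, each 0-3)."""
    if not 0 <= total <= 21:
        raise ValueError("HADS depression total must be between 0 and 21")
    if total >= 11:
        return "definite case"
    if total >= 8:
        return "doubtful case"
    return "noncase"


def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Share of screen positives who are true cases (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    print(hads_depression_band(9))  # doubtful case
    # Hypothetical accuracy figures, not those of the studies cited in the text.
    for prevalence in (0.10, 0.30):
        ppv = positive_predictive_value(0.80, 0.80, prevalence)
        print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}")
```

With sensitivity and specificity held fixed, the PPV roughly doubles as prevalence rises from 10% to 30%. This is one reason the same cutoff can perform quite differently across medical populations.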

The Beck Depression Inventory for Primary Care (BDI-PC)

This seven-item measure12 was developed for primary care (and therefore
medical settings) by removing somatic items from the well-established Beck
Depression Inventory.13 Sadness and loss of pleasure or anhedonia were
included on an a priori basis, as at least one of these symptoms is necessary
for a DSM-IV diagnosis of major depression. Suicidal ideation was also chosen
on an a priori basis, being judged as an important clinical indicator of suicidal
risk. The remaining four items—pessimism, past failure, self-dislike, and self-
criticalness—were derived empirically from data obtained from a study of 500
psychiatric patients. A cutoff score of 4 or more is used to define depression
caseness, with sensitivity and specificity being quantified at 82% to 99%
across medical inpatient and outpatient samples.14–16 In a head-to-head comparison
of the BDI-PC and HADS depression measures, the former was shown
to be superior in distinguishing depressed from nondepressed patients referred to a
consultation-liaison service.12
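The following sketch implements the BDI-PC scoring rule. It assumes that each of the seven items is rated 0 to 3, as in other Beck inventories, so totals range from 0 to 21. The item identifiers are hypothetical labels for the symptoms named above, not the instrument's wording. The cutoff of 4 or more is the one given in the text.

```python
# Hypothetical identifiers for the seven BDI-PC symptoms named in the text.
BDI_PC_ITEMS = (
    "sadness", "loss_of_pleasure", "suicidal_thoughts",
    "pessimism", "past_failure", "self_dislike", "self_criticalness",
)


def score_bdi_pc(responses, cutoff=4):
    """Return (total, screens_positive) for a dict of item -> rating (0-3 assumed)."""
    missing = [item for item in BDI_PC_ITEMS if item not in responses]
    if missing:
        raise ValueError(f"missing items: {missing}")
    for item in BDI_PC_ITEMS:
        if responses[item] not in (0, 1, 2, 3):
            raise ValueError(f"{item} must be rated 0-3")
    total = sum(responses[item] for item in BDI_PC_ITEMS)
    return total, total >= cutoff


if __name__ == "__main__":
    ratings = dict.fromkeys(BDI_PC_ITEMS, 0)
    ratings.update(sadness=1, loss_of_pleasure=1, pessimism=1, self_criticalness=1)
    print(score_bdi_pc(ratings))  # (4, True)
```

Because the somatic items were removed, every point in the total comes from a sadness, anhedonia, suicidal-ideation, or self-evaluative item.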

The Depression in the Medically Ill (DMI) Scales

These scales were developed by our research team17 with the objective of
creating a valid measure of depression in the medically ill by focusing on
cognitive symptoms. In contrast to Beck’s strategy of stripping somatic
items from an accepted measure of depression, we adopted a “bottom-up”
approach of specifically studying those with medical illness to generate
possible salient constructs. In essence, we selected 81 items assessing the
impact of a medical illness on the individual4 as well as ones capturing
cognitive aspects of depression (eg, anhedonia, self-reproach, nonreactive
mood).
Items were scored by subjects on a three-point scale (“not true at all,” “true
to some degree,” “very true”) for the previous 2 to 3 days. The initial study
population comprised inpatients and outpatients of a large Sydney teaching
hospital being treated for a primary medical condition. A research psychiatrist
subsequently (i) made a dimensional estimate of any depression and (ii) judged
whether there was any current depression of clinical significance (ie, major
depression or an adjustment disorder with depressed mood). A number of the
subjects also completed the HADS and BDI-PC measures so that the compara-
tive properties of the measures could be examined.
We refined the initial 81-item measure by removing items affirmed by both
depressed and nondepressed subjects. We also deleted items that, while
weighted to depression, had a low prevalence (eg, suicidal ideation), resulting
in a final set of 16 items. Of interest, the measure did not appear to be limited
to assessing depressed mood, as it included a brooding item and two items with
anxiety connotations (fearfulness and insecurity).
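The two refinement rules described above can be expressed as a simple item filter. This is a hypothetical sketch. The chapter does not report the numerical criteria used, so the thresholds (30% endorsement among nondepressed subjects, 20% among depressed subjects) and the example endorsement rates are illustrative only.

```python
def refine_items(endorsement, max_nondepressed_rate=0.30, min_depressed_rate=0.20):
    """Split candidate items into kept and dropped (with a reason for each drop).

    endorsement maps item name -> (rate among depressed, rate among nondepressed),
    each the proportion of subjects affirming the item. Thresholds are hypothetical.
    """
    kept, dropped = [], {}
    for item, (depressed_rate, nondepressed_rate) in endorsement.items():
        if nondepressed_rate >= max_nondepressed_rate:
            dropped[item] = "affirmed by many nondepressed subjects"
        elif depressed_rate < min_depressed_rate:
            dropped[item] = "weighted to depression but low prevalence"
        else:
            kept.append(item)
    return kept, dropped


if __name__ == "__main__":
    rates = {  # hypothetical endorsement rates
        "stewing over things": (0.70, 0.10),
        "fatigue": (0.80, 0.60),
        "suicidal ideation": (0.10, 0.01),
    }
    kept, dropped = refine_items(rates)
    print(kept)     # ['stewing over things']
    print(dropped)
```

An item such as fatigue is common in medically ill patients whether or not they are depressed, so the first rule removes it. The second rule removes items that discriminate well but are too rare to contribute much to a screen.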
The internal consistency of the derived DMI-16 was high (alpha = 0.95), and
total score measures correlated highly with the BDI-PC (0.80) and HADS
(0.72) measures. When correlated against depression severity as estimated by
the psychiatrist, the DMI-16 returned a high coefficient (0.74), slightly in
excess of the BDI-PC (0.68) and superior to the HADS (0.54). A receiver
operating characteristic (ROC) curve analysis identified a cutoff score of 18 or
more with both high sensitivity (100%) and high specificity (96%). Against the
psychiatrist’s judgment of clinically significant depression (present in 29
subjects), the DMI-16 cutoff showed high agreement (kappa = 0.91), superior to
that of the BDI-PC (0.68) and the HADS (0.57).
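The sketch below computes the two statistics reported here using their standard formulas. Cronbach's alpha is k/(k-1) × (1 - sum of item variances / variance of total scores) for k items. Cohen's kappa is (observed agreement - chance agreement) / (1 - chance agreement). The example data are hypothetical and are not the DMI study data.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Internal consistency; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two items and equal-length rows")
    item_variances = sum(pvariance(column) for column in zip(*rows))
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two sets of categorical judgments."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    categories = set(ratings_a) | set(ratings_b)
    expected = sum(ratings_a.count(c) / n * ratings_b.count(c) / n
                   for c in categories)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical scale decisions vs psychiatrist judgments for four patients.
    scale_case = [True, True, False, False]
    psychiatrist_case = [True, False, False, False]
    print(cohen_kappa(scale_case, psychiatrist_case))  # 0.5
```

Kappa discounts the agreement expected by chance alone. This is why it is a stricter index of concordance with clinical judgment than the raw percentage of matching decisions (75% in the example).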
In a second study18 involving a larger sample of hospitalized medically ill
patients, we derived a briefer version (the DMI-10) and further examined its
properties (along with the DMI-16) in comparison to the BDI-PC and HADS
measures. Although anhedonia is included in the DMI-16, it was excluded
from the DMI-10 measure because it was affirmed by a significant number of
nondepressed medically ill subjects. Analysis against clinically judged caseness
established similar overall classification rates for the DMI-10 and DMI-16
measures, comparable to that derived for the BDI-PC but superior to the HADS
measure. In this study, the formally recommended HADS cutoff of 8 or more
for a probable case was also the optimal cutoff suggested by our ROC analysis
using clinical judgment as a criterion. The recommended HADS cutoff of 11 or
more for a definite case, however, showed low sensitivity. Our ROC analysis of
the BDI-PC established a cutoff score of 5 or more, close to its recommended
cutoff score of 4 or more.
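The ROC-based cutoff selection reported in these studies can be sketched as follows. The chapter does not say which optimality criterion was used. This sketch assumes Youden's J (sensitivity + specificity - 1), one common choice, and treats a score at or above the cutoff as positive. The scores and criterion labels in the test are hypothetical.

```python
def sens_spec_at(scores, is_case, cutoff):
    """Sensitivity and specificity when a score >= cutoff counts as positive."""
    tp = sum(1 for s, c in zip(scores, is_case) if c and s >= cutoff)
    fn = sum(1 for s, c in zip(scores, is_case) if c and s < cutoff)
    tn = sum(1 for s, c in zip(scores, is_case) if not c and s < cutoff)
    fp = sum(1 for s, c in zip(scores, is_case) if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_optimal_cutoff(scores, is_case):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J."""
    if len(scores) != len(is_case) or all(is_case) or not any(is_case):
        raise ValueError("need equal-length inputs with both cases and noncases")
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sens_spec_at(scores, is_case, cutoff)
        j = sens + spec - 1
        if best is None or j > best[1]:  # ties keep the lower cutoff
            best = (cutoff, j, sens, spec)
    return best
```

For example, with hypothetical scores `[2, 5, 7, 9, 12, 15, 18, 20]` and clinical caseness `[False, False, False, False, True, False, True, True]`, the function selects a cutoff of 12 (sensitivity 1.0, specificity 0.8). Other criteria, such as fixing a minimum sensitivity, can select a different cutoff from the same ROC curve.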
In a third development study,19 the capacity of the DMI-10 to screen
for depression in a general practice setting (where it might be assumed that
the majority of the subjects would have a primary medical illness) was again
supported. The DMI-10 measure is shown in Table 10.1.

Parsimonious Screening
Chochinov and colleagues20 compared four screening measures in a sample of
inpatients with advanced cancer who were receiving palliative care. A single
item (“Are you depressed?”) was reported to have perfect sensitivity and
specificity, with the authors concluding that this question provides a “reliable
and remarkably accurate screen.” However, as responses to the outcome
measure (Research Diagnostic Criteria status) and to the single predictor question
could both have been driven by the same subjective response bias (ie, affirming or
denying depression), this study risks a tautologic bias. Subsequent meta-analysis
showed more modest results.21 The need for economical, accurate
measures encouraged development of a four-item screener, the Brief Case
Finder for Depression (BCD),22 which assesses whether depressed mood or “restless
and disturbed nights” are present, together with items assessing inability
to overcome difficulties and/or dissatisfaction with life. This measure also
tends to be overly inclusive because its broad questioning generates many false
positives; however, its sensitivity and negative predictive power appear adequate
for ruling out those who are not depressed.23

Table 10.1. Ten-Item Depression in the Medically Ill Screening Measure

DMI-10
Depression Self-Report Questionnaire
Please consider the following questions and rate how true each one is in relation to how you
have been feeling lately (ie, in the last 2 to 3 days) compared to how you usually or
normally feel.

Please tick (✓) the most relevant option for each question.
Response options: Not True, Slightly, Moderately, Very True

1. Are you stewing over things?
2. Do you feel more vulnerable than usual?
3. Are you being self-critical and hard on yourself?
4. Are you feeling guilty about things in your life?
5. Do you find that nothing seems to be able to cheer you up?
6. Do you feel as if you have lost your core and essence?
7. Are you feeling depressed?
8. Do you feel less worthwhile?
9. Do you feel hopeless or helpless?
10. Do you feel more distant from other people?

Adapted from www.blackdoginstitute.org.au/docs/DMI-10.pdf
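The sketch below computes a DMI-10 total score. It assumes that each response option on the form is scored 0 (Not True) through 3 (Very True). That scoring scheme is an illustrative assumption, not taken from this chapter. Because the chapter does not state a DMI-10 cutoff, the screening function requires the caller to supply one from the published scoring instructions.

```python
# Assumed scoring (not stated in this chapter): 0-3 per item.
RESPONSE_SCORES = {"not true": 0, "slightly": 1, "moderately": 2, "very true": 3}
N_ITEMS = 10


def score_dmi10(responses):
    """Sum the ten item responses, given in the order printed on the form."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    try:
        return sum(RESPONSE_SCORES[r.strip().lower()] for r in responses)
    except KeyError as err:
        raise ValueError(f"unrecognized response option: {err.args[0]!r}") from None


def dmi10_positive(total, cutoff):
    """cutoff must come from the published scoring instructions; none is assumed here."""
    return total >= cutoff


if __name__ == "__main__":
    answers = ["Not True"] * 6 + ["Slightly", "Moderately", "Very True", "Slightly"]
    print(score_dmi10(answers))  # 7
```

Making the cutoff a required argument avoids hard-coding a threshold that this text does not provide.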

The Primary Care Evaluation of Mental Disorders (PRIME-MD)
The PRIME-MD24 assesses four domains of mental disorders commonly
observed in general population settings: mood, anxiety, somatoform, and
alcohol disorders. The two-tier assessment structure of the PRIME-MD
allows patients who score positively on the patient questionnaire (PQ) to
then receive a physician-administered structured interview (Clinician
Evaluation Guide [CEG]) involving modularized DSM-III-R criteria. A
patient who endorses either or both of two depressive symptoms (demoralization
and anhedonia) on the PQ is subsequently assessed for more
specific criteria. Because of the length of administration time, a fully self-report
version (the PRIME-MD Patient Health Questionnaire [PHQ]) was subsequently
designed,25 including the PHQ-9 measure of depressive status.
Standard DSM-IV major depressive disorder criteria apply, and its recognition
of symptomatology is comparable to, if not slightly more sensitive than, that of the
original PRIME-MD.25
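The two-tier gating logic can be sketched as a simple triage step. The record type, field names, and functions here are hypothetical and are not taken from the PRIME-MD materials. The clinician-administered CEG interview itself is not modeled; the sketch only decides who proceeds to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PQDepressionItems:
    """Hypothetical record of a patient's answers to the two PQ depression items."""
    patient_id: str
    demoralization: bool
    anhedonia: bool


def needs_ceg_mood_module(pq: PQDepressionItems) -> bool:
    """First tier: endorsing either item (or both) triggers the second tier."""
    return pq.demoralization or pq.anhedonia


def second_tier_queue(questionnaires):
    """IDs of patients to be assessed with the clinician-administered CEG mood module."""
    return [pq.patient_id for pq in questionnaires if needs_ceg_mood_module(pq)]


if __name__ == "__main__":
    clinic_day = [
        PQDepressionItems("A", demoralization=True, anhedonia=False),
        PQDepressionItems("B", demoralization=False, anhedonia=False),
        PQDepressionItems("C", demoralization=True, anhedonia=True),
    ]
    print(second_tier_queue(clinic_day))  # ['A', 'C']
```

The first tier is deliberately permissive (either item suffices), so the clinician's time is spent only on patients who have already endorsed a core symptom.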
In the initial PHQ primary care study,25 sensitivity and specificity were 73% and 98%, respectively, for the self-report version, compared with 57% and 94% for the original clinician-administered version. More specifically, at a cutoff score of 10 or more, the PHQ-9 had a sensitivity and a specificity of 88% each for meeting diagnostic criteria for major depression.26 However, although the diagnostic concordance of the PHQ-9 is higher than that of both the HADS and the WHO Wellbeing Index (WBI-5),27 it is still relatively low in comparison with DSM-IV criteria (kappa = 0.56). The PHQ-9 does, however, compare favorably with physicians’ diagnoses (sensitivities of 98% and 40%, respectively).28 Thus, the PHQ-9 appears somewhat more accurate29 than the HADS and physicians’ diagnoses, though comparable to more general measures of well-being in primary care populations.
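A kappa of 0.56 can look modest next to high sensitivity and specificity, because kappa discounts the agreement expected by chance when most patients are non-cases. Below is a minimal sketch of Cohen's kappa for a binary screen against a reference diagnosis. The counts are hypothetical.

```python
def cohens_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for agreement between a binary screen and a reference standard."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: product of the marginal 'positive' rates
    # plus product of the marginal 'negative' rates
    screen_pos, reference_pos = (tp + fp) / n, (tp + fn) / n
    expected = screen_pos * reference_pos + (1 - screen_pos) * (1 - reference_pos)
    return (observed - expected) / (1 - expected)


# Hypothetical sample of 200 patients, 25 depressed on the reference interview
raw_agreement = (20 + 160) / 200
kappa = cohens_kappa(tp=20, fp=15, fn=5, tn=160)
print(f"raw agreement {raw_agreement:.2f}, kappa {kappa:.2f}")
```

In this made-up example, 90% raw agreement corresponds to a kappa of only about 0.61, because roughly 74% agreement would be expected by chance alone.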

5. Discussion
The capacity for medical illness and/or hospitalization to distort the assessment of depression in the medically ill argues against use of any general depression measure, and we have therefore not reviewed studies using such measures other than the PRIME-MD. The PRIME-MD does risk confounding by features of medical illness, but it has the advantage of delivering DSM case-status decisions, although the intrinsic limitations of such diagnoses in medically ill groups may then go unrecognized. We therefore take as a given that any valid depression measure excludes items that can be confounded by illness or
10 ARE SPECIFIC SCALES USEFUL? 199

hospitalization, and we have focused on such measures. Two of them, the HADS and the BDI-PC, have adopted the exclusive approach by effectively removing potentially confounding items from established depression measures. In developing the DMI measures, we adopted a different, “bottom-up” approach of examining the properties of items capturing the world of medically ill patients (both depressed and nondepressed).
While the HADS has long been in use, it has been criticized for the lack of studies examining its psychometric properties and even for its validity. Its focus on anhedonia respects that construct’s utility in psychiatric subjects, but because we found a high rate of affirmation of anhedonia in nondepressed medically ill subjects,18 the construct may not be as central to depression as assumed. The low sensitivity we quantified for the HADS in diagnosing definite depression18 is of concern if the aim of the screening measure is to prioritize detection of those with probable or definite depression. In our initial DMI study,17 we established that the DMI-16 had high internal consistency and was distinctly superior to the HADS and somewhat superior to the BDI-PC when compared against a psychiatrist’s independent clinical judgment of depression severity and case status. These findings were essentially confirmed in our second study,18 in which we again compared the three relevant measures.
Any measure of depression in the medically ill needs to be acceptable, brief, and minimally intrusive. The last point deserves comment. We deleted a provisional item assessing suicidal ideation because it proved intrusive for a number of our medically ill subjects. However, we demonstrated that its omission from the final DMI measures was not of concern, as all those admitting to suicidal ideation scored above the cutoff on the DMI-16.
Our studies of the three principal candidate screening measures (HADS, BDI-PC, and the DMI) suggest that the BDI-PC and DMI measures are roughly comparable, and superior to the HADS, in their capacity to separate depressed and nondepressed individuals in medical settings. We would recommend the use of both the BDI-PC and the DMI-10.

References
1. Kessler R. Prevalence, correlates, and course of minor depression and major depression
in the National Comorbidity Survey. J Affect Disord. 1997;45:14–30.
2. Cuijpers P, Smit F. Subthreshold depression as a risk indicator for major depressive
disorder: a systematic review of prospective studies. Acta Psychiatr Scand.
2004;109(5):325–331.
3. Egede LE. Major depression in individuals with chronic medical disorders: prevalence,
correlates and association with health resource utilization, lost productivity and
functional disability. Gen Hosp Psychiatry. 2007;29(5):409–416.
4. Cassell EJ. Reactions to physical illness and hospitalisation. In Usdin G, Lewis JM, eds.
Psychiatry in general medical practice. New York: McGraw Hill, 1979.
5. Cohen-Cole SA, Brown FN, McDaniel JS. Diagnostic assessment of depression in the
medically ill. In Stoudemire A, Fogel B, eds. Psychiatric care of the medical patient.
New York: Oxford University Press, 1993:53–70.
6. Plumb MM, Holland J. Comparative studies of psychological function in patients
with advanced cancer-I. Self-reported depressive symptoms. Psychosom Med.
1977;39(4):264–276.
7. Endicott J. Measurement of depression in patients with cancer. Cancer. 1984;53(10
Suppl):2243–2249.
8. Zigmond AS, Snaith RP. The Hospital Anxiety and Depression Scale. Acta Psychiatr
Scand. 1983;67(6):361–370.
9. Herrmann C. International experiences with the Hospital Anxiety and Depression
Scale: a review of validation data and clinical results. J Psychosom Res.
1997;42(1):17–41.
10. Silverstone PH. Poor efficacy of the Hospital Anxiety and Depression Scale in the
diagnosis of major depressive disorder in both medical and psychiatric patients. J
Psychosom Res. 1994;38(5):441–450.
11. Bjelland I, Dahl AA, Haug TT, et al. The validity of the Hospital Anxiety and
Depression Scale: An updated literature review. J Psychosom Res. 2002;52(2):69–77.
12. Beck AT, Guth D, Steer RA, et al. Screening for major depression disorders in medical
inpatients with the Beck Depression Inventory for Primary Care. Behav Res Ther.
1997;35(8):785–791.
13. Beck AT, Beck RW. Screening depressed patients in family practice—rapid technique.
Postgrad Med. 1972;52(6):81–85.
14. Beck AT, Steer RA, Ball R, et al. Use of Beck anxiety and depression inventories for
primary care with medical outpatients. Assessment. 1997;4(Suppl 3):211–219.
15. Steer RA, Cavalieri DO, Leonard DM, et al. Use of the Beck Depression Inventory for
Primary Care to screen for major depressive disorders. Gen Hosp Psychiatry.
1999;21(2):106–111.
16. Winter LB, Steer RA, Jones-Hicks L, et al. Screening for major depression disorders in
adolescent medical outpatients with the Beck Depression Inventory for Primary Care. J
Adolesc Health. 1999;24(6):389–394.
17. Parker G, Hilton T, Hadzi-Pavlovic D, et al. Screening for depression in the medically
ill: the suggested utility of a cognitive-based approach. Aust N Z J Psychiatry.
2001;35(4):474–480.
18. Parker G, Hilton T, Bains J, et al. Cognitive-based measures screening for depression in
the medically ill: the DMI-10 and the DMI-18. Acta Psychiatr Scand.
2002;105(6):419–426.
19. Parker G, Hilton T, Hadzi-Pavlovic D, et al. Clinical and personality correlates of a new
measure of depression: a general practice study. Aust N Z J Psychiatry. 2003;37(1):
104–109.
20. Chochinov HM, Wilson KG, Enns M, et al. “Are you depressed?” Screening for
depression in the terminally ill. Am J Psychiatry. 1997;154(5):674–676.
21. Mitchell AJ. Are one or two simple questions sufficient to detect depression in
cancer and palliative care? A Bayesian meta-analysis. Br J Cancer. 2008;98(12):
1934–1943.
22. Clarke DM, McKenzie DP, Marshall RJ, et al. The construction of a brief case-finding
instrument for depression in the physically ill. Integr Psychiatry. 1994;10:117–123.
23. Jefford M, Mileshkin L, Richards K, et al. Rapid screening for depression–validation
of the Brief Case-Finder for Depression (BCD) in medical oncology and palliative care
patients. Br J Cancer. 2004;91(5):900–906.
24. Spitzer RL, Williams JBW, Kroenke K, et al. Utility of a new procedure for diagnosing
mental disorders in primary care: The PRIME-MD 1000 study. JAMA.
1994;272(22):1749–1756.
25. Spitzer RL, Kroenke K, Williams JBW. Validation and utility of a self-report
version of PRIME-MD: The PHQ primary care study. JAMA. 1999;282(18):
1737–1744.
26. Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: Validity of a brief depression
severity measure. J Gen Intern Med. 2001;16(9):606–613.
27. World Health Organization (WHO). Wellbeing measures in primary health care: The
DepCare Project. WHO, Regional Office for Europe, Copenhagen: 1998.
28. Löwe B, Spitzer RL, Gräfe K, et al. Comparative validity of three screening
questionnaires for DSM-IV depressive disorders and physicians’ diagnoses. J Affect
Disord. 2004;78(2):131–140.
29. Wittkampf KA, Naeije L, Schene AH, et al. Diagnostic accuracy of the mood module of
the Patient Health Questionnaire: a systematic review. Gen Hosp Psychiatry.
2007;29(5):388–395.
11
SCREENING FOR DEPRESSION IN MEDICAL SETTINGS: THE CASE AGAINST SPECIFIC SCALES

Fariba Babaei and Alex J. Mitchell

1. Overview of Depression in Physical Disease
2. Defining Somatic Symptoms
3. Diagnostic Accuracy of Somatic Symptoms in Depression
4. Evidence For and Against Somatic Symptoms when Diagnosing Comorbid Depression
5. Implications for Screening

Context
The prevailing view for detecting mood disorders in the presence of physical
disease is to exclude somatic symptoms that might contaminate a diagnosis
(see Parker and Hyatt, Chapter 10, for a presentation of this point of view). This
chapter will examine whether this approach is beneficial, with a view to
deciding whether new depression scales for each physical disorder (each
excluding somatic symptoms) are required.

1. Overview of Depression in Physical Disease

There is a bidirectional relationship between depression and physical illness.
New evidence suggests that among depressed individuals presenting in pri-
mary care, most have at least one comorbid psychiatric condition and at least
one physical condition.1,2 At least 75% of elderly depressed patients in primary
care also have a known physical illness, and in 30–50% this is of high
severity.3–6 In one study only 10% of elderly depressed patients in primary
care had pure depression with no comorbidity.7 Thus, comorbid depression
should be considered “normal” in primary care. Some evidence suggests that
those with comorbidity are less likely to have depression treatment initiated by
their primary care practitioner.8 They are also less likely to recover from
depression.9 Specific conditions such as speech disorders, arthritis, and
dermatologic problems have been linked with worse outcomes of
depression.10,11
The exact relationship of depression and comorbidities is complex. In one of
the largest studies, Egede (2007)12 examined data from 30,801 adults captured
in the 1999 Household National Health Interview Survey. The community
prevalence of major depression was 4.7% in those without chronic medical
illness but 7.7%, 9.8%, and 12% in those with one, two, or three or more
chronic disorders, respectively (Fig. 11.1). Major depression was associated
with significant increases in health resource utilization, lost productivity, and functional
disability. Patients with chronic medical illness and comorbid depression (and
anxiety) also have significantly higher numbers of medical symptoms, even
controlling for severity of disease.13 Around one in four people in the general
population have functional disability, but in those with depression and medical
comorbidity, at least three out of four have functional limitations.14
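The gradient in Egede’s data can be expressed as prevalence ratios relative to adults without chronic illness. The Python sketch below is simple arithmetic on the four prevalence figures quoted above; no other data are assumed.

```python
# 12-month prevalence of major depression (%), as quoted from Egede (2007)
baseline = 4.7  # no chronic medical illness
by_condition_count = {"one": 7.7, "two": 9.8, "three or more": 12.0}

# Prevalence ratio versus baseline for each number of chronic disorders
ratios = {count: p / baseline for count, p in by_condition_count.items()}
for count, prevalence in by_condition_count.items():
    excess = prevalence - baseline  # absolute excess, percentage points
    print(f"{count}: ratio {ratios[count]:.2f}, excess {excess:.1f} points")
```

On these figures, adults with three or more chronic disorders had roughly 2.5 times the prevalence of major depression seen in those with none.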

Figure 11.1. 12-month prevalence of major depression in community population. Data from Egede LE. Major depression in individuals with chronic medical disorders: prevalence, correlates and association with health resource utilization, lost productivity and functional disability. Gen Hosp Psychiatry. 2007;29(5):409–416.
11 THE CASE AGAINST SPECIFIC SCALES 205

A population survey in New Zealand found that a quarter of people with chronic physical conditions suffered from a comorbid mental disorder, compared with 15% of the population without chronic conditions.15 Further, those with a mental disorder had higher rates of chronic pain, cardiovascular disease, high blood pressure, and respiratory conditions, as well as of the risk factors smoking, overweight/obesity, and hazardous alcohol use. In a primary care survey of 6,641 patients with multiple physical disorders, Nuyen and colleagues (2006)16 used morbidity data recorded by Dutch general practitioners to examine both psychiatric and physical comorbidities. The three psychiatric conditions most strongly linked with lifetime depression were schizophrenia, anxiety disorders, and substance abuse; the three most strongly linked medical disorders were Parkinson’s disease, male genital problems, and stroke.
Physical disease is also strongly linked with suicide. Juurlink and associates
(2004)17 examined 1,354 provincial coroners’ records of Ontario residents 66
years or older who committed suicide between 1992 and 2000. Their prescrip-
tion records during the preceding 6 months were compared with those of living
matched controls (1:4) to determine the presence or absence of 17 illnesses
potentially associated with suicide. Conditions associated with suicide are
shown in Figure 11.2. Compared with patients with no identified illness, for
example, patients with three illnesses had about a threefold increase in the
estimated relative risk of suicide, and patients with five illnesses had about a
fivefold increase in risk.

2. Defining Somatic Symptoms

Defining and Eliciting “Somatic Symptoms”
What exactly is meant by “somatic symptoms”? At face value the answer
seems obvious: somatic symptoms are physical complaints relating to bodily
sensations. These would include aches, low energy, fatigability, muscle weak-
ness, leaden paralysis, and gastrointestinal symptoms (low appetite and weight
loss). Pain, sexual dysfunction, and sleep disturbance are certainly core
somatic symptoms, but are these strictly bodily sensations? For example,
pain may be defined as “an unpleasant sensory and emotional experience associated with actual or potential tissue damage, or described in terms of such damage.”18 Thus pain (and sleep) may have both physical and psychological components. Even more difficult to classify but still conventionally regarded
as somatic are concentration problems, agitation/retardation, and changes in
arousal. This short list is not exhaustive. Less common somatic symptoms of
depression might include shortness of breath, dry mouth, constipation or
diarrhea, urinary frequency or hesitancy, menstrual disturbances, dizziness,
changes in libido, palpitations, increased sweating, flushing, blurred vision,
tremor, pins and needles, restless legs, and rash. Indeed, any bodily sensation might be included, although some symptoms may be due to medication rather than to the underlying depression.

Figure 11.2. Suicide risk in medical and psychiatric disorders. Reprinted with permission from Juurlink DN, Herrmann N, Szalai JP. Medical illness and the risk of suicide in the elderly. Arch Intern Med. 2004;164:1179–1184.
One study examined how reliably clinicians elicit somatic compared with nonsomatic symptoms. In the Rhode Island MIDAS project, Zimmerman and colleagues (2006)19 conducted an in-depth analysis of major depressive disorder symptoms rated by trained raters who administered a semi-structured interview to 1,523 psychiatric outpatients. They analyzed a 17-item bank of possible
symptoms of depression, including the standard 9 DSM items but separating
the compound criteria that encompass more than one symptom (eg, increased
sleep OR insomnia), along with non-DSM diagnostic items such as hope-
lessness, helplessness, and unreactive mood. The authors found that some
items were rated more reliably than others—for example, suicidal ideas/plan/
attempt (suicidality) achieved almost perfect agreement, whereas raters often
disagreed about what constituted psychomotor retardation (Textbox 11.1).
There was no overall pattern indicating that somatic symptoms were rated
more or less reliably than nonsomatic symptoms.

Textbox 11.1. Inter-Rater Reliability in Eliciting Individual Symptoms of Depression

Bold text indicates somatic symptoms.

Somatic Symptoms in Current Diagnostic Systems and Scales

Somatic items are included in both ICD-10 and DSM-IV. In fact, ICD-10
includes fatigue as a core feature. Neither ICD-10 nor DSM-IV gives clear
guidance about how to judge these specific symptoms in the case of depression
and physical disease (Table 11.1). As many as 70% of patients with depression
(and to a lesser extent anxiety) present with somatic symptoms as their first
complaint. Emotional symptoms are less likely to be mentioned if they are not
specifically asked about by the interviewer.20 That said, physical complaints
are seldom attributed to psychological causes, and clinical examination of patients with somatic symptoms usually focuses on physical disorders.21 Thus,
somatic symptoms may indicate major depression or an underlying physical
disorder. Particular difficulty arises in the case of major depression occurring
in the context of a comorbid physical disorder. In this situation it is unclear how
to judge the significance of somatic symptoms.22,23 In an attempt to improve
upon the discriminatory value of the Beck Depression Inventory (BDI) and the
Zung Self-Rating Depression Scale, questionnaires such as the Hospital
Anxiety and Depression Scale (HADS) and the General Health
Questionnaire (GHQ-12) omit most somatic symptoms of depression in favor
of cognitive aspects.24 The items most commonly omitted are fatigue and changes in appetite and weight.25 In this approach somatic symptoms are assumed to contaminate a diagnosis of comorbid depression. The concern is that somatic symptoms may lead to an overdiagnosis of depression because they do not discriminate between the possible causes of the symptoms.26 One way to investigate this is to compare the ability of somatic items to distinguish between healthy controls and those with major depression. A second method is to compare the ability of somatic items to distinguish between those with uncomplicated major depression and those with comorbid major depression and physical illness. A third method is to compare those with comorbid depression and those with physical illness alone. We consider each of these in turn below.

Table 11.1. Somatic Symptoms of Depression in ICD and DSM

Somatic or Nonsomatic   Symptom                                ICD-10       DSM-IV
Nonsomatic              Persistent sadness or low mood         Yes (core)   Yes (core)
Nonsomatic              Loss of interests or pleasure          Yes (core)   Yes (core)
Somatic                 Fatigue or low energy                  Yes (core)   Yes
Somatic                 Disturbed sleep                        Yes          Yes
Somatic                 Poor concentration or indecisiveness   Yes          Yes
Nonsomatic              Low self-confidence                    Yes          No
Somatic                 Poor or increased appetite             Yes          No
Nonsomatic              Suicidal thoughts or acts              Yes          Yes
Somatic                 Agitation or slowing of movements      Yes          Yes
Nonsomatic              Guilt or self-blame                    Yes          Yes
Somatic                 Significant change in weight           No           Yes

3. Diagnostic Accuracy of Somatic Symptoms in Depression

Given the almost endless list of possible somatic symptoms, it is important to
first establish which, if any, are of diagnostic significance in primary depres-
sion and then in the diagnosis of comorbid depression and physical illness. For
example, Chochinov and associates (1994)27 compared results from semi-
structured diagnostic interviews in 130 patients receiving palliative care.
Diagnoses according to the Research Diagnostic Criteria (RDC) were com-
pared with diagnoses made according to Endicott’s revised criteria (which
replace the somatic symptoms change in weight or in appetite, sleep distur-
bance, loss of energy, and reduced concentration with the nonsomatic alter-
natives depressed appearance, social withdrawal, brooding, self-pity or
pessimism, and lack of reactivity). The authors found that including somatic symptoms in the diagnostic criteria increased the rates of diagnosis, but only when these symptoms were used in conjunction with a low-threshold approach. Similarly, Dugan and coworkers (1998)28 analyzed the Zung Self-Rating Depression Scale both with and without somatic items and reported 5% more false positives in patients with cancer when the somatic items were included.
However, to confirm or refute this effect, a diagnostic validity study is needed in which somatic symptoms are added to or removed from the model to examine the effect on the accuracy of ruling the condition in or out against the gold standard. Once this information is gathered, a decision can be made whether to include or exclude the somatic symptoms. A slightly more sophisticated approach uses somatic symptoms only if they are caused by depression (Textbox 11.2). In reality this etiologic approach is challenging, because causation of specific symptoms is usually impossible to establish except in the crudest terms.
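The four approaches in Textbox 11.2 differ only in which symptoms are allowed to count. The sketch below makes that concrete for a single hypothetical patient. It rests on several assumptions. The nine symptom labels are invented shorthand for the DSM-IV criteria, and the somatic set follows the classification in Table 11.1. The substitutive rule pairs each replaced somatic item with an Endicott-style substitute in the order listed for the Chochinov study above. The etiologic judgment (which symptoms the physical illness explains) is supplied by the clinician. No diagnostic threshold is applied.

```python
from typing import AbstractSet

# Hypothetical shorthand for the nine DSM-IV symptom criteria
STANDARD_ITEMS = frozenset({
    "depressed_mood", "anhedonia", "appetite_weight", "sleep", "psychomotor",
    "energy", "worthlessness_guilt", "concentration", "suicidal_ideation",
})
# Items treated as somatic, following the classification in Table 11.1
SOMATIC = frozenset({"appetite_weight", "sleep", "psychomotor", "energy", "concentration"})
# Endicott-style substitutions (replaced somatic item -> nonsomatic substitute)
SUBSTITUTES = {
    "appetite_weight": "depressed_appearance",
    "sleep": "social_withdrawal",
    "energy": "brooding_self_pity_pessimism",
    "concentration": "lack_of_reactivity",
}


def count_symptoms(present: AbstractSet[str], approach: str,
                   due_to_illness: AbstractSet[str] = frozenset()) -> int:
    """Number of symptoms that count toward depression under each approach."""
    standard = present & STANDARD_ITEMS
    if approach == "inclusive":      # count everything, whatever its cause
        return len(standard)
    if approach == "etiologic":      # drop symptoms judged due to the physical illness
        return len(standard - due_to_illness)
    if approach == "exclusive":      # drop somatic items, with no replacement
        return len(standard - SOMATIC)
    if approach == "substitutive":   # swap selected somatic items for substitutes
        kept = standard - set(SUBSTITUTES)
        substitutes = present & set(SUBSTITUTES.values())
        return len(kept) + len(substitutes)
    raise ValueError(f"unknown approach: {approach}")


# Hypothetical medically ill patient; the clinician attributes low energy
# and appetite/weight change to the physical illness.
patient = {"depressed_mood", "sleep", "energy", "appetite_weight",
           "concentration", "social_withdrawal"}
illness_related = {"energy", "appetite_weight"}
for approach in ("inclusive", "etiologic", "substitutive", "exclusive"):
    print(approach, count_symptoms(patient, approach, illness_related))
```

The same patient has five countable symptoms under the inclusive approach but only one under the exclusive approach, which illustrates the concern that exclusion may lower sensitivity.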
One reason for uncertainty is that the rate of somatic complaints is not clear
in each subgroup. For example, although somatic symptoms are certainly
common in depressed patients, they also appear to be common in the general
population: more than 75% of respondents in one community study reported at
least one somatic complaint during the previous 30 days.29 The most common
symptoms were tiredness (50%), headache (42%), and lower back pain (35%).
Textbox 11.2. Approaches to Somatic Symptoms of Depression

Inclusive
The inclusive approach uses all of the symptoms of depression, regardless of
whether they may or may not be secondary to a physical illness. This
approach is used in the Schedule for Affective Disorders and Schizophrenia
(SADS) and the Research Diagnostic Criteria.
Etiologic
The etiologic approach attempts to assess the origin of each symptom and
counts a symptom of depression only if it is clearly not the result of the
physical illness. This approach is proposed by the Structured Clinical Interview for DSM (SCID) and the Diagnostic Interview Schedule (DIS), as well as by DSM-III-R/IV.
Substitutive
The substitutive approach assumes somatic symptoms are a contaminant and
replaces these with additional cognitive symptoms. However, it is not clear
what specific symptoms should be substituted.
Exclusive
The exclusive approach eliminates somatic symptoms but without
substitution. There is concern that this might lower sensitivity, with an
increased likelihood of missed cases (false negatives).
Adapted from Trask PC. Assessment of depression in cancer patients. J Natl Cancer Inst Monogr.
2004;32:80–92.

However, only about one third of patients with somatic symptoms seek med-
ical help. From the reverse perspective, mood disorders are a common finding
in those with somatic symptoms, accounting for approximately 30% of patients
presenting with physical complaints.30 In the Epidemiological Catchment Area
Study (ECA), the presence of physical symptoms was associated with at least a
twofold increase in anxiety or depressive disorders.31,32 In the HUNT-II study,
which surveyed all inhabitants of Nord-Trøndelag County, Norway,
women had a mean of 3.8 somatic symptoms and men 2.9 symptoms.33 There
was a linear association between the number of somatic symptoms and the total
HADS score. Gerber and associates (1992)34 showed that sleep disturbance,
fatigue, more than three complaints, nonspecific musculoskeletal complaints,
back pain, shortness of breath, amplified complaints, and vaguely stated
complaints distinguished between depressed and nondepressed patients in a
general medical primary care practice. Stronger evidence was recently reported from the Rhode Island MIDAS project. Zimmerman and colleagues (2006)35 found that, on logistic regression, the rank order of diagnostic weight of individual items for DSM-IV case membership was depressed mood > anhedonia > sleep disturbance > concentration/indecision > worthlessness/excessive guilt > loss of energy > appetite/weight disturbance > psychomotor change > death/suicidal thoughts. Among the 8.9% who met only the minimum DSM-IV criteria for major depressive disorder (exactly five features), increased weight, decreased weight, and indecisiveness rarely influenced diagnostic classification; in the whole sample, these items influenced the diagnosis in only about 1% of cases.
More detailed analysis of the MIDAS project was recently reported by
Mitchell and colleagues (2008).36 We found that somatic symptoms had
value in ruling in and ruling out primary depression (Fig. 11.3). When ruling
in depression (case-finding), the most successful single symptoms were psy-
chomotor retardation, diminished interest/pleasure, indecisiveness, depressed
mood, and worthlessness. When ruling out depression (reassurance), the most
successful symptoms were depressed mood, diminished drive, loss of energy,
diminished interest/pleasure, and diminished concentration. Therefore, it may
be concluded that psychomotor retardation, loss of energy, and diminished
concentration do indeed help clinicians diagnose uncomplicated depression.
What is the evidence that somatic symptoms assist in a diagnosis of comorbid
depression?
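Figure 11.3 reports each symptom's "added value": how far knowing the symptom moves the probability of a correct call beyond what is known before asking. The sketch below computes both quantities from a symptom-by-diagnosis 2 × 2 table. The rule-in formula (PPV minus prevalence) follows the figure legend. For rule-out, we assume the baseline is the pre-test probability of not being depressed (1 minus prevalence); this is our reading of the legend's shorthand, not something stated in the text. The counts are hypothetical.

```python
def added_value(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (rule-in, rule-out) added value for one symptom.

    tp/fp: symptom present in depressed / nondepressed patients
    fn/tn: symptom absent in depressed / nondepressed patients
    """
    n = tp + fp + fn + tn
    prevalence = (tp + fn) / n         # pre-test probability of depression
    ppv = tp / (tp + fp)               # P(depressed | symptom present)
    npv = tn / (tn + fn)               # P(not depressed | symptom absent)
    rule_in = ppv - prevalence
    rule_out = npv - (1 - prevalence)  # assumed baseline: proportion not depressed
    return rule_in, rule_out


# Hypothetical symptom in 200 outpatients, 60 of whom meet DSM-IV criteria
rule_in, rule_out = added_value(tp=50, fp=30, fn=10, tn=110)
print(f"rule-in added value {rule_in:.3f}, rule-out added value {rule_out:.3f}")
```

A value near zero means the symptom adds little beyond the base rate. Negative values, as seen for some symptoms in the figure, mean the symptom would mislead in that direction.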

4. Evidence For and Against Somatic Symptoms when Diagnosing Comorbid Depression

Evidence from Comparative Studies of Primary Depression versus Secondary Depression
Lipsey and colleagues (1986)37 compared the depressive symptoms of 43 patients with post-stroke depression with those of 43 patients with functional major depression. They concluded that the depressive syndrome profiles in the two groups were similar, and only two symptoms differed significantly: slowness was more common and lack of interest/concentration was less common in post-stroke patients. Simon and associates (2005)38 examined the validity of the DSM-IV depression criteria in 235 individuals with medical comorbidities, including diabetes, ischemic heart disease, or chronic obstructive lung disease, versus 204 depressed subjects without those conditions. At the midpoint of the depression severity scale, patients with medical comorbidity had a 54% probability of reporting fatigue compared with 45% in those without comorbidity. All four somatic symptoms showed robust improvement with treatment, and this improvement did not differ significantly between patients with and without medical comorbidity. They could find only limited
evidence that fatigue, changes in weight or appetite, psychomotor agitation/retardation, and sleep disturbance are less valid indicators of depression in patients with chronic medical illness.

Figure 11.3. Added value in diagnosing primary depression, by individual symptom (series: Rule-In Added Value [PPV-Prev] and Rule-Out Added Value [NPV-Prev]; y-axis from –0.10 to 0.50). Adapted from Mitchell AJ, McGlinchey JB, Young D, et al. Accuracy of specific symptoms in the diagnosis of major depressive disorder in psychiatric out-patients: data from the MIDAS project. Psychol Med. Nov. 12, 2008:1–10.

Pickard and associates (2006)39 used
Rasch methods to compare symptoms of depression in 32 subjects with post-stroke depression versus 366 depressed primary-care patients. They found that four items demonstrated statistically significant differential item functioning (DIF): “my sleep was restless,” “I felt that people disliked me,” “I did not feel like eating,” and “I had crying spells.” Each of these items showed a logit difference of approximately 0.5 or more between the two groups. Overall, however, the authors found few differences between groups.
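To give a sense of scale for the DIF finding, consider a simple dichotomous Rasch model. There, the probability that a person endorses an item depends only on the gap between that person's trait level and the item's difficulty on the logit scale. A difficulty shift of 0.5 logits between groups therefore multiplies the odds of endorsement at any given depression level by exp(0.5), about 1.65. This sketch assumes dichotomous items. The instrument's items may be rated on multi-point scales, and the exact model the authors fitted is not described here.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P(endorse) = 1 / (1 + exp(-(theta - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def odds(p: float) -> float:
    return p / (1.0 - p)


# Same depression level (theta) in both groups; the item is 0.5 logits
# harder to endorse in group B (hypothetical values).
theta = 0.0
p_group_a = rasch_probability(theta, difficulty=0.0)
p_group_b = rasch_probability(theta, difficulty=0.5)
odds_ratio = odds(p_group_a) / odds(p_group_b)
print(f"P(A)={p_group_a:.3f}, P(B)={p_group_b:.3f}, odds ratio={odds_ratio:.2f}")
```

Under this model the odds ratio is the same at every trait level, which is why a DIF size can be reported as a single logit difference.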
Van Wilgen and associates (2006)40 analyzed the influence of somatic
symptoms on the Center for Epidemiologic Studies Depression Scale
(CES-D) in 509 patients with oropharyngeal, gynecologic, colorectal,
and breast cancer after treatment versus a control group of 223 depressed
patients without cancer. They concluded that the incidence of somatic morbidity differs across cancer types, but that somatic items do not interfere with the outcome of depression as measured with the CES-D. Interestingly, some cancer groups showed less somatic morbidity (colorectal cancer) and others more (oral/oropharyngeal, breast) than the comparison group. When the CES-D was analyzed with and without the somatic domain, the prevalence of depressive symptoms with the somatic domain included was lower in the cancer groups.
Ehrt and colleagues (2007)41 compared the individual depressive symptoms
of 145 depressed patients with Parkinson’s disease and 100 depressed patients
without Parkinson’s disease by comparing item scores on the Montgomery-
Åsberg Depression Rating Scale. Depressed patients with Parkinson’s disease
showed significantly less reported sadness, less anhedonia, fewer feelings of guilt, and slightly less loss of energy, but more concentration problems, than depressed control subjects. Thus, some but not all somatic symptoms were increased in the comorbid group. The results of this study support the hypothesis that the depression profile in Parkinson’s disease differs to a certain extent from that of patients with major depression who do not have Parkinson’s disease.
Yates and colleagues for STAR*D (2007)42 analyzed the effect of
specific somatic symptoms in separating primary depression from depres-
sion with comorbid physical disease. Clearly, if somatic symptoms were
overrepresented in the comorbid group, then the classic view that somatic
symptoms may contaminate a diagnosis of depression in physical disease
would be supported. Two somatic symptoms occurred in 80% or more of those with uncomplicated depression, and four occurred in 80% or more of those with comorbid depression. The two most common were impaired concentration (91%) and fatigue (87%). Although somatic symptoms were common in patients with both depression and physical ill health, they were also common in patients without comorbidity. In particular, impaired concentration and fatigue occurred in approximately 90% of both groups. Other studies have examined this issue in relation to comorbid depression versus healthy controls.

Evidence from Comparative Studies of Comorbid Depression versus Healthy Controls
Aikens and associates (1999)43 evaluated depressive symptoms in 105
multiple sclerosis patients and compared the results with those of 80 healthy
controls and three other comparison groups: diabetes (n = 71), chronic pain
(n = 80), and psychiatric patients with depressive disorder (n = 37). They
examined whether it is appropriate to omit somatic items from the original
BDI when assessing depressive symptoms in multiple sclerosis patients. They
concluded that the somatic items appeared to function normally in this group,
with psychometric indices comparable to those observed in psychiatric and
nonpsychiatric samples, and recommended against dropping items from the
original BDI for routine depression assessment in multiple sclerosis samples.
Guo and colleagues (2006)44 looked at a small sample of 33 cancer patients,
13 patients with major depression without cancer, and 12 normal comparison
subjects. The authors examined which HAM-D items would optimize the
diagnosis of depression among cancer patients. Their final model contained
six HAM-D items, combining somatic and nonsomatic items (late insomnia,
agitation, psychic anxiety, diurnal mood variation, depressed mood, and
genital symptoms). At a cutoff of 6, sensitivity was 81.3% and specificity
was 87.5%. However, certain somatic items, including middle insomnia,
retardation, somatic symptoms (gastrointestinal and general), and loss of
weight, were not discriminatory in this study.
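Sensitivity and specificity figures such as those of Guo and colleagues come from a 2 × 2 comparison between a scale result at a chosen cutoff and a reference diagnosis. The Python sketch below shows that arithmetic under stated assumptions: the scores and diagnoses are invented for illustration and are not Guo and colleagues' data, a score at or above the cutoff is assumed to count as a positive screen (the study's exact rule is not stated here), and the helper names (`ScreeningResult`, `screen_at_cutoff`, `ppv_from_rates`) are hypothetical. It also derives positive predictive value from sensitivity, specificity, and prevalence, which illustrates why modest specificity yields a low positive predictive value when depression is uncommon.

```python
from dataclasses import dataclass


@dataclass
class ScreeningResult:
    """2 x 2 counts for one cutoff (hypothetical helper, for illustration)."""
    tp: int  # screen positive, reference diagnosis depressed
    fp: int  # screen positive, reference diagnosis not depressed
    fn: int  # screen negative, reference diagnosis depressed
    tn: int  # screen negative, reference diagnosis not depressed

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def screen_at_cutoff(scores, depressed, cutoff):
    """Treat each score >= cutoff as a positive screen and tally the 2 x 2 table."""
    tp = fp = fn = tn = 0
    for score, is_depressed in zip(scores, depressed):
        positive = score >= cutoff
        if positive and is_depressed:
            tp += 1
        elif positive:
            fp += 1
        elif is_depressed:
            fn += 1
        else:
            tn += 1
    return ScreeningResult(tp, fp, fn, tn)


def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value implied by Bayes' theorem at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Invented data: 5 patients depressed and 5 not depressed by the reference interview.
    scores = [8, 7, 6, 5, 9, 3, 4, 6, 2, 1]
    depressed = [True] * 5 + [False] * 5
    result = screen_at_cutoff(scores, depressed, cutoff=6)
    print(f"sensitivity={result.sensitivity:.2f} specificity={result.specificity:.2f}")
    # Same accuracy, but depression present in only 10% of the screened population.
    print(f"PPV at 10% prevalence={ppv_from_rates(0.8, 0.8, 0.10):.2f}")
```

With these invented numbers, sensitivity and specificity are both 0.80, yet at 10% prevalence the positive predictive value is only about 0.31, so most positive screens would be false positives.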
Holzapfel and associates (2008)45 examined depressed patients with
(n = 113) and without (n = 137) chronic heart failure in relation to individual
DSM-IV depressive symptoms, as measured with the Patient Health
Questionnaire (PHQ)-9. Among the patients meeting the criteria for major
depressive disorder, patients with heart failure reported significantly lower
levels of depressed mood (p = 0.006) and worthlessness/guilt (p = 0.019) than
patients without heart failure. No significant differences were found for sleep
disturbance, loss of energy, change in appetite, poor concentration, psychomotor
agitation/retardation, and suicidal thoughts (Fig. 11.4).

Figure 11.4. Differences in severity of individual depression symptoms in patients with
major depressive disorder with and without chronic heart failure (CHF). Symptoms shown:
loss of interest, depressed mood, sleep disturbance, loss of energy, change in appetite,
worthlessness/feelings of guilt, weak concentration, psychomotor agitation/retardation,
and suicidal thoughts; horizontal axis: difference in symptom severity between CHF and
non-CHF groups, from –1.0 to +1.0. Data from Holzapfel N, Müller-Tasch T, Wild B, et al.
Depression profile in patients with and without chronic heart failure. J Affect Disord.
2008;1:53–62.

Evidence from Comparative Studies of Comorbid Depression versus Physical Illness Alone
Symptom profiles of depressed and nondepressed patients with cancer were
examined by Chen and Chang (2004),46 who recruited 121 hospitalized
patients with breast, esophageal, and head and neck cancer. Using a HADS-D
cutoff score of 11, 30 patients were classified as depressed and 91 as non-
depressed. Depressed patients showed a significantly higher occurrence of
insomnia (83% versus 62%), pain (83% versus 55%), anorexia (63% versus
42%), fatigue (67% versus 32%), and wound/pressure sores (30% versus 13%)
than nondepressed patients. A significant chi-squared statistic with Yates
correction (χ² = 10.74, p = 0.001) indicated an association between multiple
symptoms and depression in this sample: patients simultaneously experiencing
multiple symptoms (insomnia, pain, anorexia, and fatigue) had a significantly
higher risk of being depressed. Both groups showed similar rankings of
symptom occurrence rates.
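Chen and Chang report a chi-squared statistic with Yates continuity correction. As a hedged illustration of what that correction does, the Python sketch below computes the corrected statistic for a generic 2 × 2 table, subtracting 0.5 from each |observed − expected| difference before squaring, and converts it to a p value using the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable. The counts in the example are hypothetical, not Chen and Chang's data, and the function name `yates_chi_squared` is an assumption for illustration.

```python
import math


def yates_chi_squared(a, b, c, d):
    """Yates-corrected chi-squared test for the 2 x 2 table [[a, b], [c, d]].

    Rows might be symptom present/absent and columns depressed/nondepressed
    (hypothetical layout). Returns (statistic, p value) with 1 degree of freedom.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    if 0 in row_totals + col_totals:
        raise ValueError("every row and column total must be positive")
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Continuity correction: shrink each deviation by 0.5, but not below zero.
            deviation = max(0.0, abs(observed[i][j] - expected) - 0.5)
            statistic += deviation ** 2 / expected
    # For 1 df, P(chi2 > x) equals P(|Z| > sqrt(x)), which is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical counts: symptom present in 20 of 50 depressed patients
    # and in 10 of 70 nondepressed patients.
    stat, p = yates_chi_squared(20, 10, 30, 60)
    print(f"chi-squared (Yates) = {stat:.2f}, p = {p:.4f}")
```

For these hypothetical counts the corrected statistic is 8.96 (the uncorrected value would be larger) and p is below 0.01. The correction makes the test more conservative in small samples, which is why it is often reported for 2 × 2 tables of modest size such as this one.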

Evidence from Noncomparative Studies (eg, Rasch Analysis)

Stein and coworkers (1996)47 found that somatic items of depression were less
sensitive than nonsomatic items in the diagnosis of post-stroke depression. In
this study, 189 persons with unilateral ischemic or embolic cerebrovascular
accident were interviewed by a psychologist 4 weeks or more after stroke,
using the BDI and the HAM-D. The findings suggested that the most
discriminating individual symptoms of post-stroke depression were nonsomatic.
Somatic items from both scales were significantly less specific for post-stroke
depression than the nonsomatic items. Somatic symptoms were not specific to
post-stroke depression and added no incremental validity over nonsomatic
symptoms in its diagnosis.
Kathol and colleagues (1990)48 investigated the relation of scores on the
HAM-D and BDI to the presence or absence of criteria-based diagnoses of
depression in cancer. The diagnoses of major depression in 152 cancer patients
differed by as much as 13% depending on the diagnostic system used. The BDI
and the HAM-D were useful tools for screening patients with depressive
symptoms but frequently misclassified those who had no major depression
according to one or more of the criteria-based diagnostic systems.
Kalichman and colleagues (2000)49 examined the overlap between somatic
symptoms of depression and those of HIV disease in 357 people living with
HIV/AIDS. They directly compared the diagnostic use of the BDI and the CES-D
in this single sample. A factor analysis of the six depression factor scores
from the BDI and CES-D showed that HIV symptoms were most strongly
associated with the somatic depression symptom factors of both scales. In other
words, the findings suggested that depression scales that include somatic
symptoms will inflate depression scores in people living with HIV infection,
and that available methods for distinguishing overlapping symptoms should be
used when assessing people living with HIV infection.
Leentjens and coworkers (2001)50 assessed the sensitivity of individual
depressive symptoms and their relative contribution to the diagnosis of
depressive disorder, made with the Structured Clinical Interview for DSM
Disorders (SCID), in 149 patients with Parkinson's disease. Applying the HAM-D
and the Montgomery-Åsberg Depression Rating Scale, they showed that only two
somatic symptoms, early morning awakening and reduced appetite, had good
discriminative properties. They therefore concluded that the core symptoms
were most important in distinguishing depressed from nondepressed
Parkinson's disease patients.
Akechi and associates (2003)51 used data from 220 cancer patients with
major depression to examine the intercorrelations among the DSM-IV somatic
and nonsomatic symptom criteria as well as whether the presence of an
individual somatic symptom could discriminate the severity of major
depression. Appetite changes and a diminished ability to think, but not sleep
disturbance or fatigue, were significantly associated with nonsomatic
symptoms. These associations remained after adjusting for physical
functioning and pain. Only patients with appetite changes showed greater
severity of depression.
De Coster and colleagues (2005)52 assessed 206 patients with first-ever
stroke using the SCID for DSM-IV and the HAM-D. In a discriminant analysis,
HAM-D item scores correctly classified 88.3% of patients as depressed or
nondepressed. Depressed mood discriminated best between depressed and
nondepressed stroke patients, but many psychological symptoms, such as
hypochondriasis, lack of insight, and feelings of guilt, were not very sensitive.
In contrast, somatic symptoms, such as reduced appetite, psychomotor retarda-
tion, and fatigue, had high discriminative properties.

5. Implications for Screening

Somatic symptoms have a role in the diagnosis of uncomplicated depression,
but their role in comorbid depression has been subject to considerable
confusion. Two early studies suggested that including somatic symptoms in
scales might result in overdiagnosis of depression comorbid with cancer (low
specificity and low positive predictive value). Since that time, our search
has revealed six studies comparing primary and secondary depression, three
studies comparing comorbid depression with healthy controls, but only one
study comparing comorbid depression with physical illness alone. Across these
studies, somatic symptoms were certainly common in patients with comorbid
depression, but they were also common in those with uncomplicated depression,
less common in patients with physical illness alone, and least common in
healthy controls. Taking the example of cancer, individuals with cancer
undergoing active treatment clearly have numerous somatic symptoms. Indeed,
compared with healthy controls, individuals with cancer have a higher level
of all somatic symptoms rated by items 14 to 21 on the BDI, with the
exception of loss of libido.53 However, such differences are easy to
overestimate, because individuals with comorbid and uncomplicated depression
have an even higher rate of somatic symptoms. Overall, somatic symptoms
remained relevant in both primary and secondary depression. Indeed, among the
symptoms that might discriminate depressed patients with and without
comorbid physical illness, several nonsomatic items such as guilt appear to be
better discriminators than somatic symptoms (see Fig. 11.4). Thus, constructing
custom scales for secondary depression by indiscriminately omitting somatic
items does not appear to be justified. That said, certain medical disorders
may be atypical and feature somatic symptoms of special significance.
For example, van Wilgen and colleagues
Table 11.2. Systematic Review of Comparative Studies Examining Value of Somatic Symptoms in Comorbid Depression

Year: 1997
Reference: Suh T, Gallo JJ. Symptom profiles of depression among general medical service users compared with speciality mental health service users. Psychol Med. 1997;27(5):1051–1063.
Method: ECA (Epidemiologic Catchment Area) program: series of epidemiologic surveys conducted by collaborators (1980–1984) at 5 sites in the US. ECA data include both community and institutional populations interviewed in person. Measurement strategy: used standardized and pre-coded questions as part of a highly structured interview administered by an agency lay interviewer with DIS (Diagnostic Interview Schedule) training. Logistic regression models were used to implement item response theory in the framework of the symptom criteria of major depression in DSM-III.
Setting: 4,931 and 363 household respondents from 3 ECA sites (Baltimore, Durham, and Los Angeles) who used the general medical sector or speciality mental health, respectively, within 6 months of interview.
Results (Description): (1) Except for gender, there were significant differences between the two groups according to sociodemographic factors (p < 0.001). (2) Speciality mental health service users were more likely to report all the depression symptoms. (3) General medical users were generally less likely to report dysphoria (OR = 0.49; 95% CI = 0.33–0.72) and worthless/sinful/guilty (OR = 0.55; 95% CI = 0.35–0.86) after holding constant the level of depression but were more likely to report fatigue (OR = 1.82; 95% CI = 1.17–1.83).
Supports Unique Scales? (Yes, no, uncertain): Uncertain
Table 11.2. (Continued)

Year Reference Method Setting Results (Description) Supports

Unique
Scales?
(Yes, no,
uncertain)
2001 Leentjens AFG, Marinus DSM-IV diagnosis of 169 patients with primary Using the HAM-D, suicidality No
J, Van Hilten JJ, et al. The depressive disorder was PD, as defined by the was the best discriminator
contribution of somatic considered the gold United Kingdom between depressed and
symptoms to the standard. All patients Parkinson’s Disease nondepressed patients,
diagnosis of depressive completed the Hamilton Society Brain Bank (UK- followed, in descending order,
disorder in Parkinson’s Rating Scale for PDS-BB), were referred by feelings of guilt, psychic
disease. J Depression (HAM-D) and from the neurologic anxiety, reduced appetite,
Neuropsychiatry Clin 111 patients completed the outpatient department for depressed mood, and reduction
Neurosci. 2003;15:74– MADRS, which were a protocolized mental of work and interest. Most
77. highly significant and used status examination. 20 somatic items had low
as symptom checklists. (11.8%) were excluded discriminative properties, but
The contribution of the because of dementia. reduced appetite and early-
individual items of these morning wakening (or late
scales to the diagnosis of insomnia) had relatively high
‘‘depressive disorder’’ was discriminative properties. On
calculated by discriminant the MADRS, the two ‘‘core’’
analysis. Then, a symptoms of depression,
correlation coefficient depressed mood and anhedonia,
with this discriminant had the highest correlation
function was obtained for coefficients. Somatic items as
each of the individual well as the item ‘‘concentration
items on these scales to difficulties’’had low correlation
reflect the relative strength coefficients. However, reduced
of association of each appetite was a relatively
symptom with the important indicator of
discriminant function. depression. Following a post
Wilks’ lambda was hoc analysis,