• Cognitive deficits are present in euthymic bipolar patients, and although some confounds may
explain part of the previously reported effect sizes, they cannot entirely explain the impairments.
• Individual patient data meta-analysis has important advantages over the use of published summary
data for systematic review especially with regard to controlling for confounds.
Bourne et al.
• The relative lack of drug effects on neuropsychological test performance should be treated with
caution as this mega-analysis could not take into account duration or dosage of each drug treatment.
• Similarly, the correlational analysis suggesting that some impairments may track illness progression
should also be treated with caution until longitudinal data support the causality of this relationship.
Euthymic bipolar cognition: IPDMA
advantages over the use of published summary data for systematic review (17). In particular, IPDMA allows the primary study effect sizes to be adjusted for confounding factors (i.e. factors such as age, education and IQ) prior to meta-analysis and for a large data set to be analysed for drug and illness severity effects. The latter have previously been restricted to primary studies of modest sample size or narrative review. The adjustment for confounding factors is especially valuable because, although some of the primary studies were very tightly matched case–control studies focusing on one or two neuropsychological tests, other included studies were more opportunistic samples running large neuropsychological test batteries with more sample variation. In a standard meta-analysis, the results from these two types of study are combined without adjustment.

Aims of the study

The main aim of the study was to synthesize data demonstrating cognitive deficits in euthymic bipolar patients in such a way as to be able to adjust for confounding factors to provide a more definitive estimate for effect sizes than in prior meta-analyses. A secondary aim was to create a large data set to provide a more definitive view of drug and illness severity effects on cognitive impairments than has been possible in relatively small-sample primary studies. We chose to include tests that had appeared consistently in the meta-analyses as showing impairment and for which data were actually available for the majority of individual patients.

Material and methods

Table 1 shows the results from the four existing meta-analyses as the rank of the neuropsychological tests showing the largest effects in each review. Effect sizes appear to be relatively large, but it is striking that sample numbers vary considerably due to the differences in criteria for study inclusion. The relative order of neuropsychological tests when ranked by effect size is variable from analysis to analysis partly due to the variation in study inclusion and probably partly due to noise.

Primary data were sought that tested both euthymic bipolar patients and healthy controls (aged 18–65) on at least one of four key neuropsychological tasks identified in Table 1: i) a verbal learning and memory task, that is, California Verbal Learning Task (CVLT) (18) or Rey Verbal Learning Task (RAVLT) (19); ii) the Trail Making Test (TMT) (20) as a measure of set shifting and processing speed; iii) Digit Span [from WAIS-R Digit Span (21)] as a non-word working memory span task and iv) Wisconsin Card Sorting Task (WCST) (22) as a measure of set shifting and rule discovery. Verbal Learning Task (VLT), TMT and WCST all appear in the International Society for Bipolar Disorders recently recommended battery for neuropsychological assessment (23).

From the four selected neuropsychological tests, we focused on 11 specific outcome measures: VLT total score on trials 1–5 (Total1–5), VLT score on Short Delay (ShortDelay), VLT score on Long Delay (LongDelay), VLT score on Recognition (Recognition), VLT score for Recognition minus score for False Positives (Recog-FP); time to complete Trail Making Test A (TMTA), time to complete Trail Making Test B (TMTB); score on Forward Digit Span (FDS), score on Reverse Digit Span (RDS); number of categories achieved on Wisconsin Card Sorting Task (WCSTCats.) and number of perseverations on Wisconsin Card Sorting Task (WCSTPersev.).

Where possible, demographic and clinical variables were also collected for each primary data set including i) age; ii) IQ; iii) current mood; iv) age at onset; v) number of prior manic and depressed episodes; vi) number of prior manic and depressed hospitalizations and vii) drug treatment history.

Search strategy

Given the existence of five recent prior reviews in this area (each with similar but different search terms and inclusion/exclusion criteria), this study did not conduct an additional full systematic search under PRISMA (24) rules. Rather, in an attempt to include all the primary studies that had been in the prior reviews, all first authors of studies appearing in the five review papers that contained data on at least one of the four required neuropsychological tests were contacted. In addition, PsycINFO and PubMed databases were searched with the key concepts of bipolar disorder, euthymia and cognitive impairment to find any additional primary studies that met our criteria. These searches were restricted to articles published between 1 January 2007 and 30 June 2010 in English language peer-reviewed journals. In total, 45 primary studies were identified from 41 different authors (see Table S1). This number is smaller than may have first appeared from the literature search as some studies incorporated data sets used in other published studies and therefore did not constitute mutually exclusive data sets. Of the 45 eligible published studies, full data were provided by primary
Table 1. Summary of the effect sizes found for neuropsychological performance of bipolar patients relative to healthy controls. Top seven effect sizes in the meta-analysis by (a) Arts et al. (6), (b) Bora et al. (7), (c) Robinson et al. (8) and (d) Torres et al. (9)

Rank | Test | Domain | Sample sizes | Effect size | P

(a) Arts et al. (6)
1 | RDS | Executive | 222, 205 | 1.02 | <0.0001
2 | TMTB | Executive | 309, 306 | 0.99 | <0.0001
3 | WCST (Perseveration) | Resp. Inhib | 268, 288 | 0.88 | <0.0001
4 | Category Fluency | Executive | 178, 178 | 0.87 | <0.0001
5 | Rey/CVLT (Delayed Recall) | Verb. L + M | 269, 282 | 0.85 | <0.0001
6 | Digit Symbol Subtest | Attention | 202, 249 | 0.84 | <0.0001
7 | Rey/CVLT (Total Recall) | Verb. L + M | 369, 382 | 0.82 | <0.0001

(b) Bora et al. (7)
1 | TMTB | Executive | 793, 626 | 0.86 | <0.0001
2 | Rey/CVLT (Learning) | Verb. L + M | 619, 632 | 0.85 | <0.0001
3 | CPT Omission | Attention | 303, 279 | 0.83 | <0.0001
4 | Rey/CVLT (Delayed Recall) | Verb. L + M | 578, 612 | 0.77 | <0.0001
5 | Stroop | Resp. Inhib | 746, 707 | 0.76 | <0.0001
6 | Digit Symbol Subtest | Attention | 381, 479 | 0.75 | <0.0001
7 | RDS | Executive | 375, 487 | 0.75 | <0.0001

(c) Robinson et al. (8)
1 | Category Fluency | Executive | 149, 135 | 1.09 | <0.0001
2 | RDS | Executive | 222, 209 | 0.98 | 0.0031
3 | Rey/CVLT (Total Recall) | Verb. L + M | 344, 347 | 0.90 | <0.0001
4 | TMTB | Executive | 418, 355 | 0.78 | <0.0001
5 | WCST (Perseveration) | Resp. Inhib | 195, 216 | 0.76 | <0.0001
6 | Rey/CVLT (Short Free Recall) | Verb. L + M | 345, 349 | 0.73 | <0.0001
7 | Rey/CVLT (Long Free Recall) | Verb. L + M | 365, 368 | 0.71 | <0.0001

(d) Torres et al. (9)
1 | Rey/CVLT (Total Recall) | Verb. L + M | 381, 439 | 0.81 | <0.0001
2 | Digit Symbol Subtest | Attention | 222, 310 | 0.79 | <0.0001
3 | Rey/CVLT (Short Delay) | Verb. L + M | 315, 307 | 0.74 | <0.0001
4 | CPT Hits | Attention | 188, 208 | 0.74 | <0.0001
5 | Rey/CVLT (Long Delay) | Verb. L + M | 361, 441 | 0.72 | <0.0001
6 | Stroop | Resp. Inhib | 346, 329 | 0.71 | <0.0001
7 | WCST (Perseveration) | Resp. Inhib | 244, 229 | 0.69 | <0.0001

CPT, continuous performance task; CVLT, California Verbal Learning Task; RDS, Reverse Digit Span; Resp. Inhib, Response Inhibition; TMTB, Trail Making Test B; Verb. L + M, Verbal Learning and Memory; WCST, Wisconsin Card Sorting Task.
authors in relation to 25 published papers (4, 25–48), with the data from the remaining 20 eligible studies unavailable and therefore not included in this reanalysis. Additionally, new primary data that met our criteria were also provided in relation to six unpublished data sets (49, 50) (A. Macritche, manuscript in preparation; A. Varma, manuscript in preparation; A. Pfennig, M. Alda, T. Young, G. MacQueen, J. Rybakowski, A. Suwalska, C. Simhandl, B. König, T. Hajek, C. O'Donovan, S. von Quillfeldt, D. Wittekind, J. Ploch, C. Sauer, M. Bauer, manuscript in preparation; M.G. Soeiro-de-Souza & D. Soares-Bio, manuscript in preparation), giving a total of 31 primary data sets for this reanalysis as shown in Table 2.

Where mood scores were available, euthymia was defined by cut-off scores of 8 on the Hamilton Depression Rating Scale (HDRS) (51), 15 on the Montgomery–Asberg Depression Rating Scale (MADRS) (52) or 11 on the Inventory of Depressive Symptomatology (Clinician Rating; IDS-C) (53) for depression, and 8 on the Young Mania Rating Scale (YMRS) (54), 8 on the Clinician Administered Rating Scale for Mania Factor 1 [CARS-M(F1)] (55) or 20 on the Manic State Rating Scale (MSRS) (56) for mania. If no mood ratings were available, then euthymia had been assessed by a qualified psychiatrist only.

The total sample size for the reanalysis was therefore 2876 participants: 1267 euthymic bipolar patients (54.7% female) and 1609 healthy controls (53.5% female). The bipolar patients were 83.5% Bipolar I, 12.3% Bipolar II, 2.7% Bipolar NOS and 1.4% Schizoaffective Disorder.

Statistical analyses

Parametric statistical tests were used to compare a variety of demographic variables between bipolar patients and healthy controls. Where appropriate, homogeneity of variance was checked using Levene's test. All continuous measures (including depression and mania scores) were converted to standardized z-scores within each study sample (patients plus controls) before further analysis.
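The within-study standardization described above can be illustrated with a minimal sketch. It assumes the sample standard deviation (n − 1 denominator), which the text does not specify, and the record layout, field names and scores are hypothetical, invented purely for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_within_study(records, measure):
    """Return copies of `records` with `measure` standardized within each study.

    Patients and controls from the same study share one mean and one SD,
    so the z-score expresses each score relative to that study's pooled sample.
    Assumption: sample SD (n - 1); the source does not state which SD was used.
    """
    by_study = defaultdict(list)
    for rec in records:
        by_study[rec["study"]].append(rec[measure])
    params = {study: (mean(vals), stdev(vals)) for study, vals in by_study.items()}

    standardized = []
    for rec in records:
        centre, spread = params[rec["study"]]
        standardized.append({**rec, measure: (rec[measure] - centre) / spread})
    return standardized


# Hypothetical TMTA completion times (seconds) from two hypothetical studies.
records = [
    {"study": "A", "group": "patient", "tmta": 42.0},
    {"study": "A", "group": "control", "tmta": 30.0},
    {"study": "A", "group": "control", "tmta": 36.0},
    {"study": "B", "group": "patient", "tmta": 55.0},
    {"study": "B", "group": "control", "tmta": 45.0},
    {"study": "B", "group": "patient", "tmta": 65.0},
]

z_records = zscore_within_study(records, "tmta")
for rec in z_records:
    print(rec["study"], rec["group"], round(rec["tmta"], 2))
```

Standardizing within study rather than across the pooled data set puts tests administered with different versions or scoring conventions on a common scale before any between-study comparison.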
Table 2. List of studies in reanalysis data set

No. | Study | N | Nbp | Ncont
1 | Balanza-Martinez et al. (26) | 41 | 15 | 26
2 | Bora et al. (27) | 95 | 65 | 30
3 | Cavanagh et al. (28) | 39 | 19 | 20
4 | Clark et al. (29) | 60 | 30 | 30
5 | Cubukcuoglu & Aydemir (49) | 101 | 51 | 50
6 | Dias et al. (46) | 115 | 65 | 50
7 | Dittmann et al. (30) | 116 | 74 | 42
8 | El-Badri et al. (31) | 57 | 30 | 27
9 | Fleck et al. (32) | 51 | 11 | 40
10 | Fleck et al. (33) | 70 | 22 | 48
11 | Frangou et al. (34) | 86 | 42 | 44
12 | Goswami et al. (35) | 74 | 37 | 37
13 | Hellvin et al. (50)* | 228 | 63 | 165
14 | Kaya et al. (48) | 62 | 43 | 19
15 | Kieseppa et al. (36) | 140 | 26 | 114
16 | A. Macritche (manuscript in preparation) | 56 | 28 | 28
17 | Martinez-Aran et al. (4) | 69 | 39 | 30
18 | Martinez-Aran et al. (37) | 112 | 77 | 35
19 | Mur et al. (38) | 89 | 43 | 46
20 | A. Pfennig, M. Alda, T. Young, et al. (manuscript in preparation) | 54 | 33 | 21
21 | Senturk et al. (39) | 56 | 27 | 29
22 | Simonsen et al. (25) | 146 | 29 | 117
23 | Simonsen et al. (47)† | 204 | 31 | 173
24 | Smith et al. (40) | 54 | 21 | 33
25 | M.G. Soeiro-de-Souza & D. Soares-Bio (manuscript in preparation) | 134 | 38 | 96
26 | Stoddart et al. (41) | 59 | 19 | 40
27 | Szoke et al. (42) | 145 | 97 | 48
28 | Thompson et al. (43) | 126 | 63 | 63
29 | Torrent et al. (44) | 73 | 38 | 35
30 | A. Varma (manuscript in preparation) | 106 | 53 | 53
31 | Zalla et al. (45) | 58 | 38 | 20
Grand total | | 2876 | 1267 | 1609

*Data set reduced from that published to exclude participants already included in Simonsen et al. (25, 47).
†Data set reduced from that published to exclude participants already included in Simonsen et al. (25).

Group effect size of cognitive deficits. To investigate group (patient vs. control) effects on neuropsychological performance, each of the 11 neuropsychological test outcome measures was regressed on group, age, IQ and gender within each of the 31 studies. For the eight studies that did not use an explicit measure of IQ, years of education was used as a proxy (rp = 0.50, P < 0.001). The regression coefficient and standard error for group within each study were then entered for meta-analysis for each outcome variable. Thus, the meta-analysis was effectively performed on study group effect sizes adjusted a priori for the confounds of age, IQ and gender. The meta-analyses were conducted on both fixed and random effects assumptions, but results did not differ materially. This analysis did not use the more standard IPDMA technique of mixed model regression (with fixed and random effects) as the between-study heterogeneity for group effect size was considered too high for at least some of the outcome measures (see Table 4).

Residual mood effects. Residual mood symptoms (both depression and mania) could not be added to the above analysis because they were confounded with group. However, in an attempt to understand how much of the group effect on performance might be attributable to residual confounding by mood, two further analyses were conducted. The first approach used meta-regression, with each of the studies ascribed a factor relating to the relative level of residual mood symptoms in the patient group. The second method considered mood effects within the patient group only, using mixed model regression with data collapsed across studies. Each of the 11 neuropsychological test outcome measures was regressed on depression scores and mania scores along with age, IQ and gender (all fixed effects) and study (random effect).

Drug effects within patient group. To investigate potential drug effects within the patient group, mixed model linear regression was used. Patients were coded for five binary (yes/no) drug status variables: lithium, anticonvulsants, antipsychotics, antidepressants and drug free. Each of the 11 neuropsychological test outcome measures was regressed on each drug status variable (fixed effect) together with age, IQ and gender (fixed effects) and study (random effect).

Relationship between illness variables and cognitive deficits. Mixed model linear regression was also used to investigate potential relationships between illness severity measures and neuropsychological test performance within the patient group. Number of depressed episodes, number of manic episodes, total number of episodes, number of depressed hospitalizations, number of manic hospitalizations, total number of hospitalizations and illness duration were each fitted separately into the regression model with age, IQ and gender as universal confounders (fixed effects) and study (random effect) for each of the 11 neuropsychological test outcome measures.

Statistical analysis was conducted in R 2.12.2 (The R Foundation for Statistical Computing, Vienna, Austria) except for the meta-analysis, which was conducted in STATA IC Version 11 (StataCorp LP., College Station, TX, USA). All statistical tests were two-tailed.
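The second stage of the procedure described under "Group effect size of cognitive deficits" (pooling each study's adjusted group coefficient and standard error) can be sketched as follows. This is an illustration under stated assumptions, not the authors' Stata code: the fixed-effect model uses inverse-variance weights, and the random-effects model uses the DerSimonian–Laird estimator, which the text does not name. The function names and all input values are hypothetical.

```python
import math


def pool_fixed(estimates, ses):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def pool_random_dl(estimates, ses):
    """Random-effects pooling with the DerSimonian-Laird tau^2 (an assumption here)."""
    weights = [1.0 / se ** 2 for se in ses]
    fixed, _ = pool_fixed(estimates, ses)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (e - fixed) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    star = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(w * e for w, e in zip(star, estimates)) / sum(star)
    return pooled, math.sqrt(1.0 / sum(star)), tau2


# Hypothetical per-study group coefficients (already adjusted for age, IQ and
# gender in stage one) and their standard errors.
homogeneous = ([0.40, 0.60, 0.50], [0.20, 0.20, 0.20])
heterogeneous = ([0.0, 1.0], [0.1, 0.1])

fixed_est, fixed_se = pool_fixed(*homogeneous)
rand_est, rand_se, tau2 = pool_random_dl(*homogeneous)
het_est, het_se, het_tau2 = pool_random_dl(*heterogeneous)

print(f"fixed:  {fixed_est:.3f} (SE {fixed_se:.3f})")
print(f"random: {rand_est:.3f} (SE {rand_se:.3f}), tau^2 = {tau2:.3f}")
print(f"heterogeneous random: {het_est:.3f} (SE {het_se:.3f}), tau^2 = {het_tau2:.3f}")
```

When the studies agree closely the estimated between-study variance is zero and the two models coincide; when they disagree, the random-effects standard error widens, which is why comparing both models is a useful sensitivity check.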
Recog-FP, recognition minus false positives; TMT, Trail Making Test; VLT, Verbal Learning Task; WCST, Wisconsin Card Sorting Task.
just three of 11 outcome measures (when accounting for the effect of mania, age, IQ and gender), typically on measures of memory, speed and executive function: VLT Total1–5 effect size = 0.09, t652 = 2.68, P = 0.008, 95% CI, 0.16 to 0.03; VLT Recognition effect size = 0.13, t605 = 3.32, P = 0.001, 95% CI, 0.02 to 0.05; and TMTA effect size = 0.09, t682 = 2.62, P = 0.009, 95% CI, 0.02–0.16. Higher depression scores were related to worse cognitive performance but the effect size was considerably smaller than the relevant effect size for group (see Table 4). There was no overall main effect of mania score on any of the 11 outcome measures (when accounting for the effect of depression, age, IQ and gender).

Drug effects within patient group

Within the patient sample, there was full information on drug treatment for 952 patients (75%) and information on lithium status for 1122 (89%). Thus, for comparative analysis, 652 patients were on lithium with 470 lithium free, 337 were on anticonvulsants with 409 anticonvulsant free, 209 were on antidepressants with 537 antidepressant free, 209 were on antipsychotics with 537 antipsychotic free and 72 were drug free compared to 880 on at least one drug type. The mixed model regression analysis within the patient group suggested that neither lithium (given effects of study, age, IQ and gender) nor antidepressants (given effects of study, age, IQ and gender) affected performance on any of the 11 outcome measures (Ps > 0.1 for all effect sizes of lithium or antidepressant status). Similarly, anticonvulsants showed no effect on performance (given effects of study, age, IQ and gender) on any of the 11 outcome measures (Ps > 0.1 for all effect sizes of anticonvulsants except for WCST Cats. with P = 0.08). Antipsychotics (given effects of study, age, IQ and gender) showed a reduced performance on VLT Total1–5 only (effect size = 0.29, P = 0.006, 95% CI, 0.49 to 0.08) of the 11 outcome measures (Ps > 0.1 for all other effect sizes of antipsychotic status except for VLT ShortDelay and VLT LongDelay both with P = 0.08 and WCSTPersev. with P = 0.09). Being drug free improved performance (given effects of study, age, IQ and gender) relative to any drug on two of the 11 outcome measures: VLT Total1–5 (effect size = 0.39, P = 0.010, 95% CI, 0.69 to 0.09) and VLT LongDelay (effect size = 0.35, P = 0.017, 95% CI, 0.64 to 0.06; Ps > 0.1 for all other effect sizes of drug-free status).

Relationship between illness variables and cognitive deficits

Table 5 shows the illness characteristics of the patient sample. The mixed model regression analysis within the patient group suggested that some of these illness variables correlated at better than chance with some of the 11 outcome variables (eight out of 66) but effects were generally small. Thus, number of manic episodes affected performance on three of the outcome measures (given effects of study, age, IQ and gender): VLT ShortDelay (effect size = 0.07, P = 0.03, 95% CI, 0.14 to 0.01); VLT LongDelay (effect size = 0.09, P = 0.007, 95% CI, 0.16 to 0.03); and TMTA (effect size = 0.09, P = 0.03, 95% CI, 0.01–0.17). Number of total episodes only affected performance on TMTA (effect size = 0.08, P = 0.03, 95% CI, 0.01–0.15). Number of depressive episodes had no main effects. Number of depressive hospitalizations also only affected performance on TMTA (effect size = 0.26, P = 0.003, 95% CI, 0.09–0.42) whilst number of total hospitalizations affected performance on TMTA (effect size = 0.12, P = 0.008, 95% CI, 0.03–0.21), TMTB (effect size = 0.13, P = 0.005, 95% CI, 0.04–0.21) and WCSTCats. (effect size = 0.12, P = 0.01, 95% CI, 0.21 to 0.03). Number of manic hospitalizations had no main effects. Thus, of the four
[Forest plot panels; x-axis from –1.4 to 1.4. Pooled estimates: VLT Total1–5, Overall (I-squared = 61.1%, p = 0.000) 0.51 (0.42, 0.60); VLT ShortDelay, Overall (I-squared = 39.1%, p = 0.038) 0.48 (0.39, 0.57); VLT LongDelay, Overall (I-squared = 41.9%, p = 0.026) 0.55 (0.47, 0.64); VLT Recognition, Overall (I-squared = 0.0%, p = 0.651) 0.46 (0.35, 0.57).]
Fig. 1. Forest plots showing the main effect of group (accounting for effect of age, IQ and gender) for the five outcome variables
associated with Verbal Learning Task (VLT).
[Forest plot panels. Pooled estimates: TMTA, Overall (I-squared = 8.4%, p = 0.355) –0.49 (–0.58, –0.40); TMTB, Overall (I-squared = 68.6%, p = 0.000) –0.63 (–0.72, –0.55).]
Fig. 2. Forest plots showing the main effect of group (accounting for effect of age, IQ and gender) for the two outcome variables
associated with Trail Making Test (TMTA and TMTB).
Study | WCSTCats. effect size (CI) | Weight (%) | WCSTPersev. effect size (CI) | Weight (%)
Balanza-Martinez et al 2005 | 0.71 (0.10, 1.32) | 3.33 | –0.72 (–1.27, –0.17) | 4.04
Bora et al 2007 | 0.54 (0.15, 0.93) | 8.25 | –0.47 (–0.86, –0.08) | 8.02
Cubukcuoglu & Aydemir | 0.40 (0.01, 0.78) | 8.42 | –0.23 (–0.62, 0.17) | 8.02
Fleck et al 2008 | 0.17 (–0.32, 0.67) | 4.96 | –0.18 (–0.69, 0.34) | 4.61
Frangou et al 2005 | 0.64 (0.19, 1.09) | 5.95 | –0.20 (–0.69, 0.29) | 5.22
Kieseppa et al 2005 | 0.12 (–0.33, 0.57) | 6.11 | –0.31 (–0.75, 0.13) | 6.28
Martinez-Aran et al 2004 | 0.16 (–0.29, 0.62) | 5.90 | –0.51 (–0.95, –0.06) | 6.23
Martinez-Aran et al 2007 | 0.07 (–0.30, 0.44) | 8.97 | –0.41 (–0.77, –0.04) | 9.28
Melle et al | 0.17 (–0.30, 0.63) | 5.75 | 0.04 (–0.41, 0.50) | 6.02
Mur et al 2007 | 0.24 (–0.17, 0.66) | 7.19 | –0.27 (–0.69, 0.15) | 6.94
Senturk et al 2007 | 0.36 (–0.13, 0.85) | 5.12 | –0.48 (–0.95, –0.01) | 5.62
Simonsen et al 2011 | –0.16 (–0.86, 0.54) | 2.50 | 0.10 (–0.62, 0.82) | 2.38
Soeiro-de-Souza & Soares-Bio | –0.18 (–0.57, 0.20) | 8.34 | 0.52 (0.12, 0.92) | 7.71
Szoke et al 2006 | 0.37 (0.03, 0.72) | 10.22 | –0.42 (–0.76, –0.08) | 10.48
Torrent et al 2006 | 0.13 (–0.30, 0.56) | 6.68 | –0.48 (–0.90, –0.06) | 7.01
Zalla et al 2004 | 0.51 (–0.22, 1.25) | 2.29 | –0.70 (–1.46, 0.06) | 2.13
Overall | 0.26 (0.15, 0.37); I-squared = 11.7%, p = 0.319 | 100.00 | –0.29 (–0.40, –0.17); I-squared = 44.7%, p = 0.028 | 100.00
Fig. 3. Forest plots showing the main effect of group (accounting for effect of age, IQ and gender) for the two outcome variables
associated with Wisconsin Card Sorting Task (WCSTCats. and WCSTPersev.).
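The pooled estimate and I-squared reported for a forest plot can be approximately re-derived from the per-study intervals. The sketch below does this for the WCSTCats. panel of Fig. 3, under three assumptions that are not confirmed by the text: the plotted intervals are symmetric 95% confidence intervals (so SE = width / 3.92), the plotted weights are fixed-effect inverse-variance weights, and I-squared follows the usual (Q − df) / Q definition. The function names are illustrative. Because the transcribed limits are rounded to two decimals, the output should land close to the published values without matching them exactly.

```python
import math

# (study label, effect size, lower limit, upper limit), transcribed from the
# WCSTCats. panel of Fig. 3.
WCST_CATS = [
    ("Balanza-Martinez et al 2005", 0.71, 0.10, 1.32),
    ("Bora et al 2007", 0.54, 0.15, 0.93),
    ("Cubukcuoglu & Aydemir", 0.40, 0.01, 0.78),
    ("Fleck et al 2008", 0.17, -0.32, 0.67),
    ("Frangou et al 2005", 0.64, 0.19, 1.09),
    ("Kieseppa et al 2005", 0.12, -0.33, 0.57),
    ("Martinez-Aran et al 2004", 0.16, -0.29, 0.62),
    ("Martinez-Aran et al 2007", 0.07, -0.30, 0.44),
    ("Melle et al", 0.17, -0.30, 0.63),
    ("Mur et al 2007", 0.24, -0.17, 0.66),
    ("Senturk et al 2007", 0.36, -0.13, 0.85),
    ("Simonsen et al 2011", -0.16, -0.86, 0.54),
    ("Soeiro-de-Souza & Soares-Bio", -0.18, -0.57, 0.20),
    ("Szoke et al 2006", 0.37, 0.03, 0.72),
    ("Torrent et al 2006", 0.13, -0.30, 0.56),
    ("Zalla et al 2004", 0.51, -0.22, 1.25),
]


def se_from_ci(lower, upper, z=1.96):
    """Standard error implied by a symmetric interval (assumes a normal 95% CI)."""
    return (upper - lower) / (2.0 * z)


def pooled_and_heterogeneity(rows):
    """Fixed-effect pooled estimate, Cochran's Q and I-squared (in percent)."""
    weights = [1.0 / se_from_ci(lo, hi) ** 2 for _, _, lo, hi in rows]
    effects = [es for _, es, _, _ in rows]
    pooled = sum(w * es for w, es in zip(weights, effects)) / sum(weights)
    q = sum(w * (es - pooled) ** 2 for w, es in zip(weights, effects))
    df = len(rows) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


pooled, q, i2 = pooled_and_heterogeneity(WCST_CATS)
print(f"pooled = {pooled:.2f}, Q = {q:.2f}, I-squared = {i2:.1f}%")
```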
illness variables that affected cognitive performance, TMTA was affected by all four. Patients
Table 5. Clinical indices of the patient group
M (SD) Range
yses (6–8, 10) (ds = 0.5–1.0). This reduction in observed effect sizes is in part due to controlling better for the effect of age, IQ and gender. However, we were also able to include unpublished studies which often had the lowest effect sizes [e.g. Hellvin et al. (50) and A. Pfennig, M. Alda, T. Young, et al. (manuscript in preparation) for VLT Total1–5, LongDelay and Recog-FP; Cubukcuoglu & Aydemir (49) and A. Macritche (manuscript in preparation) for TMTA and TMTB; A. Varma (manuscript in preparation) for FDS and RDS; M.G. Soeiro-de-Souza & D. Soares-Bio (manuscript in preparation) for WCSTCats.; and Hellvin et al. (50) and M.G. Soeiro-de-Souza & D. Soares-Bio (manuscript in preparation) for WCSTPersev.]. This suggests the field has had some impact from publication bias, which perhaps is unsurprising.

Specifically, the following effect sizes were found (compared to prior studies) in the following cognitive domains: i) verbal memory – Total Score effect size = 0.51 (prior studies = 0.90–0.81), Short Delay effect size = 0.48 (prior studies = 0.85–0.73), Long Delay effect size = 0.55 (prior studies = 0.85–0.71), Recognition effect size = 0.46 (prior study = 0.43), Recog-FP effect size = 0.38; ii) visual scanning speed – TMTA effect size = 0.49 (prior studies = 0.82–0.60); iii) working memory capacity – FDS effect size = 0.30 (prior studies = 0.47–0.37); iv) executive function – TMTB effect size = 0.63 (prior studies = 0.99–0.55), RDS effect size = 0.60 (prior studies = 1.02–0.54), WCSTCats. effect size = 0.26 (prior studies = 0.69–0.52); v) response inhibition – WCSTPersev. effect size = 0.29 (prior studies = 0.88–0.70).

The high heterogeneity of some tests appears to underlie the differences in the results of prior meta-analyses. The variation in effect sizes between the previously published meta-analyses (Table 1) is likely to have been due to variations in the studies included. In turn, the range of effect sizes produced by including a different subset of studies can be directly explained by the relatively high level of heterogeneity revealed in this sample by our analysis (typically 39–84%; see Table 4), especially for some tests. The test with the most heterogeneity in this analysis was TMTB. TMTB is known to have considerable variability across test sites (61); thus there appears to be a strong case for trying to refine the operationalization of TMTB as well as VLT (encoding and short term recall) and Digit Span (Forward and Reverse). Each test taps domains of function markedly impaired in bipolar patients as shown by the large average effect sizes. One important possibility would be to present them in more standardized computerized formats locally or even online.

Nevertheless, the group effect sizes allow confidence that a substantial average effect is present for the domains of attention/working memory, verbal memory, speed and executive function. It is somewhat easier to say what cannot explain these effects than to say what can. Residual mood symptoms within the patient group were understandably confounded with group. However, our analysis suggests that residual symptom scores in the patient group cannot explain much of the difference found between the groups across the various tests. Cognitive deficits are also not simply explained as side-effects of drug therapy. This has previously been the subject of debate, with some studies suggesting that antipsychotic drugs may cause some cognitive impairment (62, 63) and others suggesting no drug effect on cognitive performance (64). The present analysis suggests that most neuropsychological tests do not exhibit any significant effect attributable to drug treatment. The only possible exception is on measures of verbal memory, with antipsychotics having an impairing effect on VLT Total1–5 and drug-free status being associated with improved performance on VLT Total1–5 and LongDelay (relative to any drug). However, any potential implied drug effects must be treated with caution due to the potential for confounding by indication. For example, a history of psychosis may be related to specific working memory impairments (65–67), and those with a history of psychosis are also likely to be those currently taking antipsychotics (68). We could not analyse the effect of polypharmacy, which is common in clinical samples, but not in these research samples. It is likely that there was a deliberate effort to exclude symptomatic and heavily medicated patients from these studies given the intention was usually to reduce the confounds between the patient and control groups.

If illness course had had a negative impact on cognition, it would potentially be a key finding; it could imply that neuropsychological outcome measures are sensitive to treatment. In partial support of this hypothesis, some of the neuropsychological measures correlated with illness intensity variables; for example, number of manic episodes appears to affect performance on certain VLT measures, whilst TMTA appears to be especially sensitive to potential illness progression effects. However, the magnitude of these associations may be unreliable for various reasons. First, the impact of illness may not be simply cumulative, and the largest effects may occur early in the illness course, as appears likely in schizophrenia (69).
Second, measures of illness severity that depend on counting episodes in mature samples of patients are of uncertain validity. Quantifying depressive episodes when so much of the depressive burden of bipolar disorder is chronic, subsyndromal and poorly recalled is questionable; indeed, we found no associations with number of depressive episodes. Positive findings for more memorable events, like manic episodes and numbers of hospitalizations, appear more likely to be valid and did produce some significant results in this analysis. The hypothesis that much of the apparent cognitive impairment of bipolar disorder is attributable to the accumulated impact of the illness course remains plausible but not proven by the present study. Only adequately powered prospective studies in early stages of illness will establish the effect beyond doubt.

Although the range in effect sizes reported here appears to support previous suggestions that executive function and memory may be especially affected in bipolar disorder (6, 8, 9), it is also notable that all of the effect sizes reported here could be considered to be small to medium (70) in magnitude across all the cognitive domains investigated. Our results could therefore also be interpreted as being consistent with the notion of cognitive impairment in bipolar disorder being a relatively non-specific effect on multiple functional brain networks. This can be related to similarly non-specific imaging findings suggesting lateral ventricle enlargement (effect size = 0.39) and increased rates of deep white matter hyperintensities without grey matter volume decrements (71) in the many imaging studies conducted in bipolar patients. Although these structural abnormalities can be greater in older patients, they are also found in samples of similar mean age to the sample in this study (71). The evolving evidence for widely distributed disturbances in white matter structure from diffusion tensor imaging is also supportive of an underlying functional neuropathology (72). Although its aetiology remains poorly understood, a contribution from intracellular mechanisms regulating oxidative stress is one hypothesis that is assuming increasing importance (73). Given the putative neuroprotective effects of lithium (74, 75), an improved cognitive performance for those patients taking lithium relative to those lithium free might have been expected. However, no such effect was found, either because lithium does not enhance cognitive performance or because any neuroprotective effect is dependent upon factors, such as chronic use, which could not be estimated in this dataset. In support of the former 'ineffective hypothesis', two recent longitudinal cohort studies indicate that deficits are stable despite long-term lithium therapy (76, 77).

As with all analyses of neuropsychological performance, this study's findings and conclusions are limited by the reliability, validity and psychometric properties of the individual neuropsychological tests. The high levels of heterogeneity found in this study and the previous standard meta-analyses (6–9, 12, 13) for some measures highlight the need for standardization in test presentation to try and meet this limitation. Indeed, the high levels of heterogeneity consistently found for some measures raise the question as to whether it is meaningful to combine them in a meta-analysis at all. This study is also limited by the response bias of authors allowing access to their primary data sets. Furthermore, it is acknowledged that this study considered outcome measures from a relatively small number of neuropsychological tests. However, despite being limited to those primary studies that consented to provide data, and partly because the analysis was limited to the most frequently used neuropsychological tests, this study contained sample sizes substantially greater than many of the prior standard meta-analyses and thus represents a major data synthesis. Furthermore, by using IPDMA (rather than standard meta-analysis) this study was able both to i) provide the least confounded estimates of the effect size relating to cognitive impairment in euthymic bipolar patients and ii) provide the first analysis of potential medication and illness severity effects on neuropsychological performance in a statistically valuable sample size.

In summary, this reanalysis provides further evidence that euthymic bipolar patients exhibit significant cognitive impairment on a range of neuropsychological tests. These impairments remain substantial but less than previous work (including previous meta-analyses) has suggested (1–4, 6–10). The advantage of IPDMA in controlling for a greater range of confounding factors and the inclusion of unpublished studies account for this. The impairment effect appears largely independent of drug treatment. Performance on some neuropsychological tests appears to have deteriorated further as illness progressed (i.e. number of episodes increased) but longitudinal data from earlier in the illness course are needed to show that the relationship is causal and clinically important. Finally, this review and reanalysis has highlighted the variability and heterogeneity between individual primary studies. This means the field remains polarized between the certainty that cognitive impairment is a feature of bipolar disorder and uncertainty, for example about its heritability, specificity or the impact of illness
