Checklist Systematic Review - Default
People sometimes confuse the quality of the review with the quality
of the included studies
Just because a review includes poor-quality studies doesn't mean
it is a poor-quality review
You can have a good-quality review of a subject area in which only
poor-quality studies are available
Alternatively, you can have a poor-quality review of a subject area
with good-quality studies
The methodological quality of the included studies may affect
the weight you wish to put on the results. A review should
discuss this issue, and may present a sensitivity analysis (for
example, an analysis including only high-quality studies to see if
this differs from the analysis of all available studies)
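The idea of such a sensitivity analysis is simply to re-run the pooled analysis on a restricted subset of studies and compare. A minimal sketch; the study fields, function name, and quality threshold below are hypothetical:

```python
def high_quality_subset(studies, min_jadad=4):
    """Restrict an analysis to studies meeting a quality threshold,
    so the pooled result can be recalculated and compared."""
    return [s for s in studies if s["jadad"] >= min_jadad]

# Hypothetical trial records
studies = [
    {"name": "Trial A", "jadad": 5},
    {"name": "Trial B", "jadad": 2},
    {"name": "Trial C", "jadad": 4},
]
print([s["name"] for s in high_quality_subset(studies)])  # ['Trial A', 'Trial C']
```

If the pooled estimate from the restricted subset differs materially from the full analysis, the result is sensitive to study quality.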
Consider the inclusion of unpublished data (i.e., data from abstracts, or data
not published at all). Some independent SRs may include unpublished data
and write to authors to obtain more details. Sometimes, industry-sponsored
or industry-authored reviews may include completely unpublished data
with no methodological assessment. Beware of terms such as "data on
file", which indicate data not subject to external scrutiny
Unpublished data are a complex issue, and there are suggestions that trials
that find no effect are less likely to be published (publication bias). Hence,
including some unpublished data may be very informative. Alternatively,
some unpublished data may be misleading. See how the review has
handled these data, and whether what it has done or included seems
reasonable and free from bias
Some poor-quality reviews may report methodological parameters very
sparsely. It can be difficult, say, to be sure that all the trials included in an
analysis are RCTs. Don't take anything for granted. If it doesn't state that
they are RCTs, don't assume that they are. Sometimes imprecise terms
such as "trials" or "clinical trials" are used in the text. If in doubt, look at
the abstracts of studies included in the analysis on PubMed as a first step
STUDY QUALITY
Two reviewers independently assessed study quality
according to the Jadad scale. This records whether a study is
described as randomised and double blind, the methods for
generation of the allocation schedule and double blinding,
and whether there is a description of dropouts during the
trial.
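The Jadad scale awards one point each for a study being described as randomised and as double blind, a further point each if the methods described for sequence generation and blinding are appropriate (with a point deducted if a described method is clearly inappropriate), and one point for a description of dropouts, giving a score of 0 to 5. A minimal sketch of that scoring logic; the function name and three-valued inputs (True, False, or None for "not described") are assumptions:

```python
def jadad_score(randomised, randomisation_appropriate, double_blind,
                blinding_appropriate, dropouts_described):
    """Score a trial report on the 5-point Jadad scale (sketch).

    The *_appropriate arguments are True (appropriate method described),
    False (clearly inappropriate method described), or None (not described).
    """
    score = 0
    if randomised:
        score += 1
        if randomisation_appropriate is True:
            score += 1
        elif randomisation_appropriate is False:
            score -= 1  # deduct for an inappropriate method
    if double_blind:
        score += 1
        if blinding_appropriate is True:
            score += 1
        elif blinding_appropriate is False:
            score -= 1
    if dropouts_described:
        score += 1
    return max(score, 0)

# Randomised and double blind, methods not described, dropouts reported
print(jadad_score(True, None, True, None, True))  # 3
```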
In Data Analysis and Statistical Analysis (slide 20) the review reported that
it had calculated relative risks. This is a commonly reported statistic for
categorical data (e.g., yes / no; present / absent)
In this excerpt from figure 2, it states that the numerators are people with
either global symptoms or abdominal pain unimproved or persistent after
treatment
The relative risk is an appropriate summary statistic to use in this case
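For categorical outcomes like these, the relative risk is simply the event rate in one arm divided by the event rate in the other. A minimal sketch with illustrative numbers (not taken from the figure):

```python
def relative_risk(events_tx, total_tx, events_ctrl, total_ctrl):
    """Relative risk: event rate in the treatment arm divided by
    the event rate in the control arm."""
    return (events_tx / total_tx) / (events_ctrl / total_ctrl)

# Illustrative: 10/50 unimproved on treatment vs 20/50 on placebo
print(relative_risk(10, 50, 20, 50))  # 0.5
```

A relative risk below 1 here favours treatment, because the "event" being counted is remaining unimproved.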
Fig 2 Forest plot of randomised controlled trials of fibre versus placebo or low fibre diet in
irritable bowel syndrome. Events are number of patients with either global symptoms of
irritable bowel syndrome or abdominal pain unimproved or persistent after treatment

| Study | Country | Setting | Diagnostic criteria for irritable bowel syndrome | Criteria to define symptom improvement after therapy | Sample size | Dose of peppermint oil | Duration of therapy | Jadad score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Lech 1988w29 | Denmark | Secondary care | Clinical diagnosis and investigations | Patient reported improvement in global symptoms | 47 | 200 mg three times daily | 4 weeks | 3 |
| Liu 1997w30 | Taiwan | Secondary care | Clinical diagnosis and investigations | Patient reported improvement in abdominal pain | 110 | 187 mg three or four times daily | 1 month | |
| Capanni 2005w32 | Italy | Secondary care | Rome II | Improvement in global symptoms assessed by validated questionnaire | 178 | 2 capsules three times daily | 3 months | |
| Cappello 2007w31 | Italy | Secondary care | Rome II and investigations | 50% improvement from baseline in overall irritable bowel syndrome symptom score using questionnaire data | 57 | 225 mg twice daily | 4 weeks | |
Whenever you combine data from different RCTs there is going to be some
heterogeneity. The question is what degree of heterogeneity is acceptable
We have already mentioned clinical heterogeneity. When results are numerically
combined, a statistical test of heterogeneity should be reported
Statistical heterogeneity is often reported as a chi-squared test (χ²) with a P value
and/or the I² statistic
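The I² statistic can be derived from the chi-squared (Cochran's Q) statistic and its degrees of freedom; it estimates the percentage of variability across trials that is due to heterogeneity rather than chance. A minimal sketch:

```python
def i_squared(q, degrees_freedom):
    """I² = (Q - df) / Q, floored at zero and expressed as a percentage."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - degrees_freedom) / q) * 100.0

print(i_squared(20.0, 5))  # 75.0 -> substantial heterogeneity
print(i_squared(4.0, 5))   # 0.0 -> no more variation than expected by chance
```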
If there is a high degree of heterogeneity among RCTs, this suggests there may be
something different among the RCTs, and that their results should not be combined
If there is statistical heterogeneity among RCTs in an analysis, you should expect
the review to comment on the reasons for this. Often the review will exclude trials
that account for the heterogeneity and recalculate the analysis, and may also
report other sensitivity analyses. If a review does exclude trials, it should have a
good underlying reason for doing so, beyond simply that their results are different
If a Forest plot is presented, you can visually examine the spread of the results
In practice, some reviews report significant heterogeneity tests in the results tables,
but don't mention this in the results text or allude to this further. Watch out for this!
Look in Data Synthesis and Statistical Analysis where the review outlines the
tests of heterogeneity it is going to perform;
The review was going to analyse antispasmodics as a group. This is a
grouping of agents with a treatment effect rather than a grouping based on
drug structure;
We might, therefore, speculate that effects may vary by the individual agents
used. Read this section again where the review outlines a priori that it is going
to do a subgroup analysis by each individual agent. A similar procedure is
specified for the fibre analysis
In the Results section under the Antispasmodics subheading the review reports
a sensitivity analysis of the overall result. It also reports I² results for the
individual antispasmodics analysis. Read where it discusses the relative
strength of the evidence on the individual agents. It further alludes to this issue
in the Discussion section
In fact, the review found statistical heterogeneity in a number of the analyses.
See how it discusses the issue of heterogeneity in the Discussion section. It also
finds some evidence of publication bias
These issues may affect the weight you wish to put on the results.
Hence, interpretation may be complex. However, it is important that such
issues are identified and the limitations of any analyses are discussed
Beware of reviews which don't report on or discuss the limitations of their
analyses
You should expect a review to report on the limitations of its analysis. This is often
reported in the Discussion section, as it is in this review
You should also expect the review to discuss its results in the context of previously
reported evidence or guidelines. That is, how they may differ, agree, or contradict
previous studies or practice. Again, this is usually reported in the Discussion section
as it is in this review
Some test statistics are more difficult to interpret clinically than others. For
example, standardised mean differences (SMDs) and effect sizes have no units
specified. Hence, if these are reported, it is difficult to know what the results
actually mean in clinical terms
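The standardised mean difference expresses the between-arm difference in units of standard deviation, which is exactly why it carries no clinical units. A minimal sketch with illustrative numbers:

```python
def standardised_mean_difference(mean_tx, mean_ctrl, pooled_sd):
    """SMD (Cohen's d style): between-arm difference in SD units."""
    return (mean_tx - mean_ctrl) / pooled_sd

# Illustrative: a 2-point improvement on a scale with pooled SD of 4
print(standardised_mean_difference(12.0, 10.0, 4.0))  # 0.5
```

The result, 0.5 standard deviations, says nothing by itself about whether patients would notice the difference.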
Similarly, if there is an improvement in pain of, say, 8 points on a 100-point
visual analogue scale (VAS), at what point does the change become clinically important?
Although this particular review didn't report on these types of data, if a review does,
it should give some guidance as to what represents an important clinical effect
In interpreting any result, it is important to remember that statistical significance
and clinical importance are not synonymous. For example, a very large study may
find that an intervention significantly reduces systolic blood pressure by 0.7 mmHg.
The question is then one of interpretation. In terms of the individual patient, is this
clinically important?
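To see how sample size alone can drive statistical significance, a normal-approximation sketch of a two-arm comparison of mean blood pressure (the 0.7 mmHg difference and SD of 15 mmHg are illustrative assumptions):

```python
import math

def two_sided_p(diff, sd, n_per_arm):
    """Two-sided P value for a difference in means between two
    equal-sized arms, using a normal approximation."""
    se = sd * math.sqrt(2.0 / n_per_arm)
    z = diff / se
    return math.erfc(abs(z) / math.sqrt(2.0))

# The same 0.7 mmHg difference, SD 15 mmHg, at two sample sizes:
print(two_sided_p(0.7, 15.0, 100))    # not significant in a small trial
print(two_sided_p(0.7, 15.0, 20000))  # highly significant with huge n
```

The effect is identical in both cases; only the precision changes, which is why a significant P value cannot by itself establish clinical importance.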
BMJ Publishing Group Limited (BMJ Group) 2011. All rights reserved.
www.clinicalevidence.bmj.com