Appraising the evidence
CRITICAL APPRAISAL
VALIDITY, IMPORTANCE AND APPLICABILITY
VALIDITY: In Methods section:
design, sample, sample size, eligibility criteria
(inclusion, exclusion), sampling method,
randomization method, measurements, methods
of analysis, etc.
IMPORTANCE: In Results section
characteristics of subjects, drop out, analysis, p
value, confidence intervals, etc
APPLICABILITY: In Discussion section + our patient’s
characteristics
CRITICAL APPRAISAL
1.THERAPY
2.DIAGNOSIS
3.PROGNOSIS
4.HARM
5.SYSTEMATIC REVIEW
DESCRIPTION OF CRITICAL APPRAISAL OF THERAPY
Validity (FRISBE), Questions to ask, Key learning points

F: Patient Follow-Up
Validity: Were all patients who entered the trial properly accounted for and attributed at its conclusion? Was follow-up complete?
Questions to ask:
• Is there outcome data for all patients who entered the trial?
• If so, was the percentage of patients without outcome data similar between groups?
• Were reasons why patients dropped out or were missing outcome data well-described?
Key learning points: How do dropouts threaten validity? Dropouts or those lost to follow-up create missing data that might disrupt the balance in groups created by randomization, especially since those who discontinue a study often have a different prognosis than do those who continue. A large number of dropouts may introduce systematic differences between groups in those lost to follow-up.

R: Randomization
Validity: Was the allocation (assignment) of patients to treatment randomized? Was the allocation concealed?
Questions to ask:
• Were patients selected at random from the target population?
• Was the assignment randomized?
• Was the method to generate randomization appropriate?
• Was evidence of concealment provided?
Key learning points: Why is randomization important? Randomization guarantees that each subject has the same chance of entering any group and aims to balance groups for known and unknown prognostic factors so that group differences can be attributed to the effect of treatment. Allocation concealment assures that those assessing eligibility and assigning patients to groups don't have knowledge of the allocation sequence.

I: Intention to Treat Analysis
Validity: Were patients analyzed in the groups to which they were randomized? Were all randomized patient data analyzed?
Questions to ask:
• Were all patients analyzed in the groups to which they were randomized?
• What percentage of patients was excluded from the analyses?
• How were missing outcomes handled (e.g., were missing data imputed using statistical modeling techniques)?
• If missing data were imputed, was a sensitivity analysis or "worst case scenario" analysis done? If so, what did that analysis show?
Key learning points: Why is intention-to-treat analysis important? ITT preserves the balance of prognostic factors in groups created by the original random group allocation. It provides the truest estimate of the effects of treatment allocation in real-world practice by including data from crossovers, non-adherents, dropouts and those lost to follow-up.

S: Similar Baseline Characteristics of Patients
Validity: Were groups similar at the start of the trial?
Questions to ask:
• Was sufficient information provided about important demographic and clinical characteristics known to affect prognosis?
• If important differences existed between the groups, did the imbalance favor the control or treatment group?
Key learning points: Why should the groups be similar at baseline? It is important to verify that those factors known to influence outcome are equally distributed, and to assess the potential effect on the study outcome of an imbalance that occurs by chance.

B: Blinding
Validity: Were patients, health workers, and study personnel "blind" to treatment?
Questions to ask:
• Potential groups needing blinding: patients, providers, raters or assessors, data analysts, adjudicators.
• While patients and providers are necessarily unblinded in psychotherapy trials, objectivity is enhanced by the use of blinded raters and objective outcome measures.
• If appropriate, was the integrity of the blinding tested and found to have been preserved?
Key learning points: Why is blinding important? Blinding equalizes the effect of patient and therapist expectations on outcome across groups. For raters, blinding minimizes subjectivity in outcome measurement. For providers, blinding eliminates the possibility of either conscious or unconscious differential administration of effective intervention to either group, such as co-interventions (unintended additional care to either group) or contamination (provision of the intervention to the control group).

E: Equal Treatment
Validity: Aside from the experimental intervention, were the groups treated equally?
Questions to ask:
• Were patients in the different groups treated differently in any way (other than the intervention)?
Key learning points: Why should groups be treated equally? Equal treatment helps guarantee that the groups remain prognostically balanced by avoiding systematic differences in the care provided other than the intervention.
Summary of article's validity
Notable strengths:
Weaknesses or concerns: How serious are the threats to validity and in what direction could they bias the study outcomes?
IMPORTANCE
Calculate
1.CER (CONTROL EVENT RATE)
2.EER (EXPERIMENTAL EVENT RATE)
3.ARR (ABSOLUTE RISK REDUCTION)
4.RRR (RELATIVE RISK REDUCTION)
5.ABI (ABSOLUTE BENEFIT INCREASE)
6.RBI (RELATIVE BENEFIT INCREASE)
7.NNT (NUMBER NEEDED TO TREAT)
CRITICAL APPRAISAL THERAPY WORKSHEET
1a. R- Was the assignment of patients to treatments
Randomised?
What is best?
Centralised computer randomisation is ideal and often used in multi-centred trials. Smaller trials may use an independent person (e.g., the hospital pharmacy) to “police” the randomisation.
Where do I find the information?
The Methods should tell you how patients were allocated to groups and whether or not randomisation was concealed.
This paper: Yes • No • Unclear • Comment:
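As an illustration of what computer-generated randomisation can look like, here is a minimal Python sketch of permuted-block randomisation. The slide does not name a specific method, so the block design, the function name `block_randomise`, the block size and the arm labels are all assumptions made for the example.

```python
import random

def block_randomise(n_patients, block_size=4, seed=None):
    """Return a treatment/control sequence built from shuffled, balanced blocks.

    Hypothetical helper: block_size must be even so each block is half
    treatment and half control.
    """
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_patients:
        half = block_size // 2
        block = ["treatment"] * half + ["control"] * half
        rng.shuffle(block)  # random order within each block
        allocations.extend(block)
    return allocations[:n_patients]

sequence = block_randomise(12, block_size=4, seed=2024)
print(sequence)
```

In a real trial the sequence would be generated and held by someone independent of recruitment, which is what keeps the allocation concealed.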
1b. R- Were the groups similar at the start of the trial?
What is best?
If the randomisation process worked (that is, achieved comparable groups) the groups should be similar. The more similar the groups the better it is. There should be some indication of whether differences between groups are statistically significant (i.e., p values).
Where do I find the information?
The Results should have a table of "Baseline Characteristics" comparing the randomised groups on a number of variables that could affect the outcome (e.g., age, risk factors). If not, there may be a description of group similarity in the first paragraphs of the Results section.
This paper: Yes • No • Unclear • Comment:
Were the groups similar
at the start of the trial?
HOMOGENEITY TEST OF COVARIABLES
(SOCIOGRAPHIC VARIABLES & OTHER
INDEPENDENT VARIABLES THAT CAN BE
TESTED) BEFORE THE TRIAL BEGINS
IF HOMOGENEOUS, THE TRIAL MAY
BEGIN. IF NOT HOMOGENEOUS, THE
TRIAL MUST NOT PROCEED; INSTEAD,
RE-RANDOMISE AND TEST AGAIN
2a. A – Aside from the allocated treatment, were groups
treated equally?
What is best?
Apart from the intervention the patients in the different groups should be treated the same, e.g., additional treatments or tests.
Where do I find the information?
Look in the Methods section for the follow-up schedule and permitted additional treatments, etc., and in Results for actual use.
This paper: Yes • No • Unclear • Comment:
2b. A – Were all patients who entered the trial accounted for? –
and were they analysed in the groups to which they were
randomised?
What is best?
Losses to follow-up should be minimal, preferably less than 20%. However, if few patients have the outcome of interest, then even small losses to follow-up can bias the results. Patients should also be analysed in the groups to which they were randomised (‘intention-to-treat analysis’).
Where do I find the information?
The Results section should say how many patients were randomised (e.g., Baseline Characteristics table) and how many patients were actually included in the analysis. You will need to read the Results section to clarify the number and reason for losses to follow-up.
This paper: Yes • No • Unclear • Comment:
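To see how losses to follow-up can bias a result, a "worst case scenario" check can be sketched in Python. All patient counts below are hypothetical (they are not from any study in these slides), and the function name `worst_case_arr` is an assumption for illustration. The worst case assumes every patient lost from the treatment group had the bad outcome and none of those lost from the control group did.

```python
def worst_case_arr(events_t, followed_t, lost_t, events_c, followed_c, lost_c):
    """Return (observed ARR, worst-case ARR) for a bad outcome.

    Observed ARR uses only patients with outcome data.
    Worst-case ARR counts all lost treatment patients as events
    and all lost control patients as event-free.
    """
    observed = events_c / followed_c - events_t / followed_t
    worst_eer = (events_t + lost_t) / (followed_t + lost_t)
    worst_cer = events_c / (followed_c + lost_c)
    return observed, worst_cer - worst_eer

# Hypothetical trial: 100 patients randomised to each arm.
observed_arr, worst_arr = worst_case_arr(
    events_t=10, followed_t=90, lost_t=10,
    events_c=15, followed_c=92, lost_c=8,
)
print(f"Observed ARR:   {observed_arr:.3f}")
print(f"Worst-case ARR: {worst_arr:.3f}")
```

With these made-up numbers the apparent benefit turns into apparent harm under the worst case, which is the situation in which losses to follow-up seriously threaten validity.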
3. M - Were measures objective or were the patients
and clinicians kept “blind” to which treatment was
being received?
What is best?
It is ideal if the study is ‘double-blinded’, that is, both patients and investigators are unaware of treatment allocation. If the outcome is objective (e.g., death) then blinding is less critical. If the outcome is subjective (e.g., symptoms or function) then blinding of the outcome assessor is critical.
Where do I find the information?
First, look in the Methods section to see if there is some mention of masking of treatments, e.g., placebos with the same appearance or sham therapy. Second, the Methods section should describe how the outcome was assessed and whether the assessor/s were aware of the patients' treatment.
This paper: Yes • No • Unclear • Comment:
What were the results?
How large was the treatment effect?
Most often results are presented as dichotomous outcomes
(yes-or-no outcomes that either happen or don't happen) and can
include such outcomes as cancer recurrence, myocardial
infarction and death.
Consider a study in which
15% (0.15) of the control group died
10% (0.10) of the treatment group died
after 2 years of treatment.
What is the measure?
Relative Risk (RR) = risk of the outcome in the treatment group / risk of the outcome in the control group.
What does it mean?
The relative risk tells us how many times more likely it is that an event will occur in the treatment group relative to the control group. An RR of 1 means that there is no difference between the two groups; thus, the treatment had no effect. An RR < 1 means that the treatment decreases the risk of the outcome. An RR > 1 means that the treatment increases the risk of the outcome.
In our example, the RR = 0.10/0.15 = 0.67. Since the RR < 1, the treatment decreases the risk of death.
What is the measure?
Absolute Risk Reduction (ARR) = risk of the outcome in the control group - risk of the outcome in the treatment group. This is also known as the absolute risk difference.
What does it mean?
The absolute risk reduction tells us the absolute difference in the rates of events between the two groups and gives an indication of the baseline risk and treatment effect. An ARR of 0 means that there is no difference between the two groups; thus, the treatment had no effect.
In our example, the ARR = 0.15 - 0.10 = 0.05 or 5%. The absolute benefit of treatment is a 5% reduction in the death rate.
What is the measure?
Relative Risk Reduction (RRR) = absolute risk reduction / risk of the outcome in the control group (ARR/CER). An alternative way to calculate the RRR is to subtract the RR from 1 (i.e., RRR = 1 - RR).
What does it mean?
The relative risk reduction is the complement of the RR and is probably the most commonly reported measure of treatment effects. It tells us the reduction in the rate of the outcome in the treatment group relative to that in the control group.
In our example, the RRR = 0.05/0.15 = 0.33 or 33%, or RRR = 1 - 0.67 = 0.33 or 33%. The treatment reduced the risk of death by 33% relative to that occurring in the control group.
What is the measure?
Number Needed to Treat (NNT) = inverse of the ARR, calculated as 1 / ARR.
What does it mean?
The number needed to treat represents the number of patients we need to treat with the experimental therapy in order to prevent 1 bad outcome and incorporates the duration of treatment. Clinical significance can be determined to some extent by looking at the NNTs, but also by weighing the NNTs against any harms or adverse effects (NNHs) of therapy.
In our example, the NNT = 1/0.05 = 20. We would need to treat 20 people for 2 years in order to prevent 1 death.
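The four measures above can be reproduced with a short Python sketch using the example's event rates (15% in the control group, 10% in the treatment group). The function name `treatment_effect` and the dictionary layout are illustrative assumptions, not part of the worksheet.

```python
def treatment_effect(cer, eer):
    """Effect measures for a bad outcome, given control (CER) and
    experimental (EER) event rates expressed as proportions."""
    rr = eer / cer                 # relative risk
    arr = cer - eer                # absolute risk reduction
    rrr = arr / cer                # relative risk reduction (= 1 - RR)
    # NNT is undefined when there is no difference between groups.
    nnt = 1 / arr if arr != 0 else float("inf")
    return {"RR": rr, "ARR": arr, "RRR": rrr, "NNT": nnt}

effects = treatment_effect(cer=0.15, eer=0.10)
for name, value in effects.items():
    print(f"{name} = {value:.2f}")
```

In practice the NNT is usually rounded up to the next whole patient.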
FOR TREATMENT EFFECTIVENESS AS THE OUTCOME
Absolute Benefit Increase (ABI) is the
arithmetic difference between the rates of
events in the experimental and control groups.
An Absolute Benefit Increase (ABI) refers to
the increase of a good event as a result of the
intervention. [ABI = EER - CER]
An Absolute Risk Reduction (ARR) refers to
the decrease of a bad event as the result of
the intervention. [ARR = CER - EER]
Relative Benefit Increase (RBI)
is the proportional increase
in the rate of good events
in the experimental group
relative to the control
group.
[RBI = (EER - CER) / CER]
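For a good outcome the same arithmetic runs in the other direction. The sketch below uses hypothetical event rates (40% of controls and 50% of treated patients achieving the good outcome); neither the numbers nor the function name `benefit_increase` come from the slides.

```python
def benefit_increase(cer, eer):
    """Return (ABI, RBI) for a good outcome, given control (CER) and
    experimental (EER) event rates expressed as proportions."""
    abi = eer - cer    # absolute benefit increase
    rbi = abi / cer    # relative benefit increase
    return abi, rbi

# Hypothetical rates of a good outcome, e.g. symptom resolution.
abi, rbi = benefit_increase(cer=0.40, eer=0.50)
print(f"ABI = {abi:.2f}, RBI = {rbi:.2f}")
```

The parentheses in the RBI formula matter: dividing only CER by CER would give EER - 1, not a relative increase.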
How precise was the estimate of the treatment effect?
1.The true risk in the population is unknown; we estimate it
from the sample of patients in the trial.
2.This estimate is called the point estimate, and its precision
is judged from the confidence interval (CI) around it.
3.If the confidence interval is fairly narrow then we can be
confident that our point estimate is a precise reflection of
the population value.
4.The significance test
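A rough Python sketch shows how sample size drives the width of the confidence interval. It uses the normal-approximation (Wald) interval for a difference of two proportions, which is one common choice and an assumption here, since the slides do not specify a method. The group sizes are hypothetical; the event rates match the earlier 15% versus 10% example.

```python
import math

def arr_confidence_interval(events_c, n_c, events_e, n_e, z=1.96):
    """Return (ARR, lower, upper) using a Wald 95% interval by default."""
    cer = events_c / n_c
    eer = events_e / n_e
    arr = cer - eer
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(cer * (1 - cer) / n_c + eer * (1 - eer) / n_e)
    return arr, arr - z * se, arr + z * se

# Same event rates, two hypothetical trial sizes.
large = arr_confidence_interval(150, 1000, 100, 1000)
small = arr_confidence_interval(15, 100, 10, 100)
print("1000 per group: ARR %.3f (95%% CI %.3f to %.3f)" % large)
print(" 100 per group: ARR %.3f (95%% CI %.3f to %.3f)" % small)
```

With 1000 patients per group the interval excludes zero. With 100 per group it includes zero, so the same point estimate would not be statistically significant.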
Number Needed to Treat (NNT) is
the number of patients who need to
be treated to prevent one bad
outcome or produce one good
outcome.
In other words, it is the number of
patients that a clinician would have
to treat with the experimental
treatment compared to the control
treatment to achieve one additional
patient with a favorable outcome.
[NNT = 1/ARR]
Will the results help me in caring for my
patient? (External Validity/Applicability)
The questions that you should ask before you decide
to apply the results of the study to your patient are:
1.Is my patient so different to those in the study that
the results cannot apply?
2.Is the treatment feasible in my setting?
3.Will the potential benefits of treatment outweigh the
potential harms of treatment for my patient?
Applying the Evidence Worksheet
Similar Patients
1. Are your patients similar
to those in the study?
2. Are they so different that
the results can’t help you?
3. How much of the study
effect can you expect for
your patients?
Realistic Interventions
4. Is the intervention
realistic in your setting?
5. Does the comparison
intervention reflect your
current practice?
6. What alternatives are
available?
Right Outcomes
7. Have all the right
outcomes been
considered?
8. Are the outcomes
appropriate to your
patient?
9. Does the intervention
meet their values and
preferences?
Critical appraisal for therapy
Were the subjects randomized?
Did all subjects receive similar treatment?
Were all relevant outcomes considered?
Were all randomized subjects included in the analysis?
Calculate CER, EER, RRR, ARR, ABI, RBI and NNT
Were study subjects similar to our patients in terms of
prognostic factors?
Critical appraisal for Diagnosis
Step 1: Are the results of the study valid?
Was the diagnostic test evaluated in a Representative spectrum of
patients (like those in whom it would be used in practice)?
What is best?
It is ideal if the diagnostic test is applied to the full spectrum of patients - those with mild, severe, early and late cases of the target disorder. It is also best if the patients are randomly selected or consecutive admissions so that selection bias is minimized.
Where do I find the information?
The Methods section should tell you how patients were enrolled and whether they were randomly selected or consecutive admissions. It should also tell you where patients came from and whether they are likely to be representative of the patients in whom the test is to be used.
This paper: Yes No Unclear Comment:
Was the reference standard applied regardless of the index test
result?
What is best?
Ideally both the index test and the reference standard should be carried out on all patients in the study. In some situations where the reference standard is invasive or expensive there may be reservations about subjecting patients with a negative index test result (and thus a low probability of disease) to the reference standard. An alternative reference standard is to follow up people for an appropriate period of time (dependent on disease in question) to see if they are truly negative.
Where do I find the information?
The Methods section should indicate whether or not the reference standard was applied to all patients or if an alternative reference standard (e.g., follow-up) was applied to those who tested negative on the index test.
This paper: Yes No Unclear Comment:
Was there an independent, blind comparison between the index
test and an appropriate reference ('gold') standard of diagnosis?
What is best?
There are two issues here. First, the reference standard should be appropriate - as close to the 'truth' as possible. Sometimes there may not be a single reference test that is suitable and a combination of tests may be used to indicate the presence of disease. Second, the reference standard and the index test being assessed should be applied to each patient independently and blindly. Those who interpreted the results of one test should not be aware of the results of the other test.
Where do I find the information?
The Methods section should have a description of the reference standard used and if you are unsure of whether or not this is an appropriate reference standard you may need to do some background searching in the area. The Methods section should also describe who conducted the two tests and whether each was conducted independently and blinded to the results of the other.
This paper: Yes No Unclear Comment:
Step 2: What were the results?
There are two types of results commonly
reported in diagnostic test studies.
1. The accuracy of the test, which is reflected
in the sensitivity and specificity.
2. How the test performs in the population
being tested, which is reflected in
predictive values (also called post-test
probabilities).
To explore the meaning of these terms, consider a
study in which 1000 elderly people with suspected
dementia undergo an index test and a reference
standard. The prevalence of dementia in this group
is 25%. 240 people tested positive on both the
index test and the reference standard and 600
people tested negative on both tests. The first step
is to draw a 2 x 2 table as shown below. We are told
that the prevalence of dementia is 25% therefore we
can fill in the last row of totals - 25% of 1000 people
is 250 - so 250 people will have dementia and 750
will be free of dementia. We also know the number
of people testing positive and negative on both
tests and so we can fill in two more cells of the
table.
                    Gold Standard
                    +ve     -ve     Total
Index test +ve      240
Index test -ve              600
Total               250     750     1000

The remaining cells follow by subtraction:

                    Gold Standard
                    +ve     -ve     Total
Index test +ve      240     150     390
Index test -ve      10      600     610
Total               250     750     1000
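The arithmetic used to complete the 2 x 2 table can be sketched in Python with the numbers from the dementia example. The function name `complete_two_by_two` and the TP/FP/FN/TN keys are illustrative assumptions.

```python
def complete_two_by_two(total, prevalence, true_pos, true_neg):
    """Fill in a 2 x 2 table from the column totals and the two known cells."""
    diseased = round(total * prevalence)   # gold standard +ve column total
    healthy = total - diseased             # gold standard -ve column total
    false_neg = diseased - true_pos        # diseased but index test -ve
    false_pos = healthy - true_neg         # healthy but index test +ve
    return {"TP": true_pos, "FP": false_pos, "FN": false_neg, "TN": true_neg}

table = complete_two_by_two(total=1000, prevalence=0.25, true_pos=240, true_neg=600)
print(table)
print("Index test +ve total:", table["TP"] + table["FP"])
print("Index test -ve total:", table["FN"] + table["TN"])
```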
What is the measure?
Sensitivity (Sn) = the proportion of people with the condition who have a positive test result.
What does it mean?
The sensitivity tells us how well the test identifies people with the condition. A highly sensitive test will not miss many people.
In our example, the Sn = 240/250 = 0.96. Ten people (4%) with dementia were falsely identified as not having it. This means the test is fairly good at identifying people with the condition.
What is the measure?
Specificity (Sp) = the proportion of people without the condition who have a negative test result.
What does it mean?
The specificity tells us how well the test identifies people without the condition. A highly specific test will not falsely identify many people as having the condition.
In our example, the Sp = 600/750 = 0.80. That is, 150 people (20%) without dementia were falsely identified as having it. This means the test is only moderately good at identifying people without the condition.
What is the measure?
Positive Predictive Value (PPV) = the proportion of people with a positive test who have the condition.
What does it mean?
This measure tells us how well the test performs in this population. It is dependent on the accuracy of the test (primarily specificity) and the prevalence of the condition.
In our example, the PPV = 240/390 = 0.62. Of the 390 people who had a positive test result, 62% actually have dementia.
What is the measure?
Negative Predictive Value (NPV) = the proportion of people with a negative test who do not have the condition.
What does it mean?
This measure tells us how well the test performs in this population. It is dependent on the accuracy of the test and the prevalence of the condition.
In our example, the NPV = 600/610 = 0.98. Of the 610 people with a negative test, 98% do not have dementia.
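The four diagnostic measures can be checked with a short Python sketch using the completed 2 x 2 table. The second function illustrates why predictive values depend on prevalence. The 5% prevalence is a hypothetical value chosen for contrast, and both function names are assumptions.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity and predictive values from 2 x 2 cell counts."""
    return {
        "Sn": tp / (tp + fn),    # positives among those with the condition
        "Sp": tn / (tn + fp),    # negatives among those without it
        "PPV": tp / (tp + fp),   # condition present among test positives
        "NPV": tn / (tn + fn),   # condition absent among test negatives
    }

def ppv_at_prevalence(sn, sp, prevalence):
    """PPV for a test with fixed Sn and Sp applied at a given prevalence."""
    true_pos = sn * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

measures = diagnostic_accuracy(tp=240, fp=150, fn=10, tn=600)
for name, value in measures.items():
    print(f"{name} = {value:.2f}")

# Same test, study prevalence (25%) versus a hypothetical 5% prevalence.
ppv_study = ppv_at_prevalence(measures["Sn"], measures["Sp"], 0.25)
ppv_low = ppv_at_prevalence(measures["Sn"], measures["Sp"], 0.05)
print(f"PPV at 25% prevalence = {ppv_study:.2f}")
print(f"PPV at  5% prevalence = {ppv_low:.2f}")
```

The test itself is unchanged between the two calls. Only the prevalence differs, and the PPV falls sharply when the condition is rarer.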
Step 3: Applicability of the results
Were the methods for performing the test described in sufficient
detail to permit replication?
What is best?
The article should have sufficient description of the test to allow its replication and also interpretation of the results.
Where do I find the information?
The Methods section should describe the test in detail.
This paper: Yes No Unclear Comment:
Hierarchy/Level of evidence
I a. Meta-analysis of RCT
b. Large RCT
II a. Controlled trial without randomization
b. Cohort, case control studies
III a. Cross-sectional
b. Case series, case reports
IV Expert opinion
Implementation of EBM practice
How to get started
1. Teaching EBM in medical schools
Easier than changing already existing attitudes
Most important
May be included in formal curricula or integrated in
existing activities: ward rounds, on calls, case
presentations, group discussions, journal clubs, etc
2. Workshop for teaching staff
3. Workshop for practitioners, incl. nurses
Resistance to EBM teaching & learning
Rudimentary skill in critical appraisal /
methodological skill
Limited resources, esp. time factor
Lack of high quality evidence
Scepticism toward evidence-based
practice
‘Happy’ with current practice
Development of EBM practice
Passive diffusion model
Active dissemination model
Coordinated implementation
model:
Patients & community
Health administrators
Public policy makers
Clinical policy makers
Strategies for developing EBM practice
Clinical guidelines
Practice development leaders (! Environment)
Development units
Dissemination of good practice
Networking
Research summaries
Action research