Diagnostic Accuracy Part 1 Basic Concepts: Sensitivity and Specificity, ROC Analysis, STARD Statement
Ana-Maria Simundic
University Department of Chemistry
University Hospital SESTRE MILOSRDNICE
School of Medicine, Faculty of Pharmacy and Biochemistry,
Zagreb University
Vinogradska 29
10 000 Zagreb
CROATIA
The discriminative ability of a diagnostic procedure is called diagnostic accuracy. It can be expressed by a number of quantitative measures, of which sensitivity and specificity are the most widely used in the biomedical literature. Each diagnostic-accuracy measure relates to some specific aspects of a diagnostic procedure. While some measures are used to assess the discriminative property of the test, others are used to assess its predictive ability.

Discriminative measures are mostly used by health-policy decision makers; predictive measures are most useful for predicting the probability of a disease in an individual.

Furthermore, measures of test performance are not fixed indicators of test quality, but are very sensitive to the characteristics of the population in which the test accuracy is being evaluated. Some measures largely depend on the disease prevalence, while others are highly sensitive to the spectrum of the disease in the studied population.

It is therefore of utmost importance to understand the meaning of different measures of diagnostic accuracy and to know how to interpret them and under what conditions they may be used.
Ana-Maria Simundic: Diagnostic accuracy – Part 1 Basic concepts: sensitivity and specificity, ROC ... Article downloaded from acutecaretesting.org
[...] that almost all healthy individuals shall have their values somewhere within the reference limits, whereas those who have a disease shall have significantly higher (less frequently lower) values of a measured parameter.

What we would expect to observe rather rarely are healthy individuals with an elevated marker concentration (the so-called false positives) as well as diseased individuals with values falling within the reference interval (false negatives).

Even though it may seem an easy “mission”, the absolutely ideal marker does not exist, and we therefore always end up with a certain proportion of individuals having a falsely elevated or lowered marker concentration.

The fewer false positives and false negatives observed, the better the marker.

The only question is: how to measure this discriminative potential of some diagnostic procedure (biochemical parameter, panel of parameters, radiologic analysis or clinical exam)? How to know which procedure is better?

Why do we have so many measures of diagnostic accuracy?

Each measure of diagnostic accuracy relates to some specific aspects of a diagnostic procedure. While some measures are used to assess the discriminative property of the test, others are used to assess its predictive ability.

Discriminative measures are mostly used by health-policy decision makers, whereas predictive measures are most useful for predicting the probability of a disease in an individual.

Some measures assess the global performance of a test, whereas others are related to its ability to detect or exclude the disease, or to the clinical significance of a positive or negative test result in a specific patient.

What is also important is the fact that measures of test performance are not fixed indicators of test quality. On the contrary, measures of diagnostic accuracy are very sensitive to the characteristics of the population in which the test accuracy is being evaluated.
Studies suffering from major methodological shortcomings can severely over- or underestimate the indicators of test performance and limit the external validity of the study, i.e. the generalizability of its results.

The easiest and most appealing way to design a diagnostic-accuracy study is a so-called “two-gate” (case-control) study design. In such studies, patients are compared with healthy individuals.

Studies designed this way have been shown to overestimate measures of diagnostic accuracy severalfold, compared with properly designed studies that use a single series of consecutive patients to evaluate the same test. The case-control study design is therefore not recommended.

In a properly designed study, patients are collected as a consecutive series of individuals in whom the target condition is suspected. The biochemical marker under evaluation is measured in all individuals presenting with disease symptoms.

Subsequently, the presence of disease is determined by performing the reference standard method for diagnosis.

In our example with a new marker (S-100B) for acute ischemic stroke, the ideal design would be as follows:

All individuals with acute ischemic stroke symptoms presenting to the Emergency department of our Neurology clinic are consecutively recruited into the study. Blood samples are drawn immediately and sent to the laboratory for S-100B concentration measurement.

All individuals undergo the same diagnostic work-up and a stroke diagnosis is made based on established criteria, equal for all patients.

Subsequently, statistical analysis is performed and measures are estimated in order to assess the power of the S-100B marker to discriminate between individuals with and without acute ischemic stroke.

A collaborative group of researchers has developed the STARD (Standards for Reporting of Diagnostic Accuracy) statement, which aims to improve the quality of reporting of studies of diagnostic accuracy.

The statement consists of a checklist of 25 items and a flow diagram that authors can use to ensure that all relevant information is present.

The aim and history of STARD as well as the STARD checklist, STARD flow diagram and many other related documents can be accessed at the official STARD website: stard-statement.org. The STARD initiative was a very important step toward the improvement of the quality of reporting of studies of diagnostic accuracy.

According to the STARD statement, a simple example of the flow diagram for our study of diagnostic accuracy of S-100B for acute ischemic stroke would be as presented in FIGURE 1.

Calculating and interpreting sensitivity and specificity

A perfect diagnostic marker for acute ischemic stroke would have the potential to completely discriminate individuals with and without stroke. Unfortunately, as was already pointed out, such a perfect diagnostic test does not exist.

Therefore, by using the cut-off for S-100B of 0.5 µg/L, for example, we may classify study participants into four subgroups considering parameter concentrations:

• True positive (TP) – subjects having stroke and S-100B > 0.5 µg/L
• False positive (FP) – subjects without stroke and S-100B > 0.5 µg/L
• True negative (TN) – subjects without stroke and S-100B < 0.5 µg/L
• False negative (FN) – subjects having stroke and S-100B < 0.5 µg/L

The first step in calculating sensitivity and specificity is to make a 2 × 2 table with groups of subjects divided
according to a gold standard or reference method (diagnostic criteria) in columns, and categories according to test (S-100B) in rows (TABLE 1).

                      Individuals     Individuals
                      with stroke     without stroke
S-100B > 0.5 µg/L     TP (N = 90)     FP (N = 40)
S-100B < 0.5 µg/L     FN (N = 10)     TN (N = 60)

TABLE 1: 2 × 2 table for calculating measures of diagnostic accuracy

[FIGURE 1: flow diagram of the study. Recoverable boxes: Eligible stroke patients (N = 200); S-100B assay]

Sensitivity (%) is the proportion of subjects with the disease who have a positive test result (true positives) among all subjects with the disease (TP / (TP + FN)). In other words, sensitivity is defined as the probability of getting a positive test result in subjects with the disease.

Hence, it relates to the potential of a test to identify subjects with the disease.

In our example the sensitivity is 90 % (90 / (90 + 10)) at a cut-off value for serum S-100B protein of 0.5 µg/L.

What does it mean? It means that if we measure the S-100B concentration in every individual presenting with stroke symptoms at the Emergency department of our Neurology clinic, we shall observe S-100B > 0.5 µg/L in nine out of 10 individuals in whom stroke was subsequently diagnosed, according to standard diagnostic criteria for acute ischemic stroke (gold standard).

Moreover, it also means that if we solely rely on the S-100B result, in the absence of other diagnostic options, we would miss one out of every 10 stroke
patients. The question is: are we willing to accept such diagnostic uncertainty?

So, sensitivity is a very useful measure that gives us an idea about the discriminative power of the marker and the proportion of diseased individuals missed by the marker.

However, what would be far more informative for the physician is: if a concentration of S-100B > 0.5 µg/L is measured in an individual presenting with stroke symptoms, how sure can I be that this patient has a stroke?

Unfortunately, sensitivity tells us nothing about it.

Specificity (%) is another measure of diagnostic test accuracy, complementary to sensitivity. It is defined as the proportion of subjects without the disease who have a negative test result among all subjects without the disease (TN / (TN + FP)).

Analogous to sensitivity, specificity represents the probability of a negative test result in a subject without the disease.

Therefore, we can postulate that specificity relates to the aspect of diagnostic accuracy that describes the test's ability to identify subjects without the disease, i.e. to exclude the condition of interest.

Again, let us look back at the example with stroke patients and the S-100B diagnostic marker. The specificity in our study turned out to be 60 % (60 / (60 + 40)) at a cut-off value for serum S-100B protein of 0.5 µg/L. What does it mean?

A specificity of 60 % means that if we measure the S-100B concentration in every individual presenting with stroke symptoms at the Emergency department of our Neurology clinic, in six out of 10 individuals in whom stroke was subsequently ruled out, a concentration of S-100B < 0.5 µg/L shall be observed.

It also means that four out of 10 individuals without stroke shall have a falsely elevated marker concentration.

These individuals would be exposed to further diagnostic work-up and psychological stress related to the (spurious) probability of having a disease.

The question again is: are we willing to accept this diagnostic uncertainty? The answer is not an easy one, nor is there a unique answer to this question.

The decision on the acceptable level of diagnostic uncertainty depends on the disease characteristics, healthcare costs, the psychological impact of a missed diagnosis and many other issues.

If a disease is a serious life-threatening condition, we may not want to miss it, so maximum sensitivity shall be most suitable.

So, the specificity also gives us an idea about the discriminative power of the marker. Again, as with sensitivity, what the physician would like to know is: if a concentration of S-100B < 0.5 µg/L is measured in an individual presenting with stroke symptoms, how sure can I be that this patient does not have a stroke?

Knowledge of the marker's specificity does not provide the exact evidence for such clinical judgments.

ROC curves

The specificity and sensitivity of every diagnostic test depend on the selected cut-off level. Therefore, a pair of diagnostic sensitivity and specificity values exists for every individual cut-off. The ROC (Receiver Operating Characteristic) curve is constructed by plotting these pairs of values on a graph with 1-specificity on the x-axis and sensitivity on the y-axis.

The shape of the ROC curve and the area under the curve (AUC) help us estimate the discriminative power of a test. The closer the curve follows the upper left-hand corner and the larger the area under the curve, the better the test is at discriminating between those with and without the disease.
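
The calculations described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's method: the function names and the six S-100B values are hypothetical and chosen only for the example; only the counts TP = 90, FN = 10, TN = 60 and FP = 40 come from TABLE 1. A result is treated as positive when the value exceeds the cut-off, and the AUC is approximated with the trapezoidal rule, which is a choice made here rather than something the article specifies.

```python
from typing import List, Sequence, Tuple


def sensitivity(tp: int, fn: int) -> float:
    """TP / (TP + FN): probability of a positive test in subjects with the disease."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """TN / (TN + FP): probability of a negative test in subjects without the disease."""
    return tn / (tn + fp)


def confusion_counts(values: Sequence[float], has_disease: Sequence[bool],
                     cutoff: float) -> Tuple[int, int, int, int]:
    """Classify a subject as test-positive when value > cutoff; return (TP, FP, TN, FN)."""
    tp = fp = tn = fn = 0
    for value, diseased in zip(values, has_disease):
        positive = value > cutoff
        if diseased and positive:
            tp += 1
        elif diseased:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def roc_points(values: Sequence[float],
               has_disease: Sequence[bool]) -> List[Tuple[float, float]]:
    """One (1 - specificity, sensitivity) pair per candidate cut-off, sorted along the x-axis.

    Assumes at least one subject with and one without the disease.
    """
    # -inf makes every subject positive (top-right corner); the largest
    # observed value makes every subject negative (bottom-left corner).
    cutoffs = [float("-inf")] + sorted(set(values))
    points = []
    for cutoff in cutoffs:
        tp, fp, tn, fn = confusion_counts(values, has_disease, cutoff)
        points.append((1 - specificity(tn, fp), sensitivity(tp, fn)))
    return sorted(points)


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC points using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Counts taken from TABLE 1 (cut-off 0.5 µg/L).
table1_sensitivity = sensitivity(tp=90, fn=10)
table1_specificity = specificity(tn=60, fp=40)

# Hypothetical S-100B values (µg/L), invented for illustration only.
s100b = [0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
stroke = [False, False, True, False, True, True]

points = roc_points(s100b, stroke)
area = auc(points)
print(table1_sensitivity, table1_specificity)
print(points)
print(round(area, 3))
```

Sweeping the cut-off over every observed value gives one point per cut-off, which mirrors the statement that a sensitivity and specificity pair exists for each cut-off. Library routines are usually preferable to hand-written loops for real data.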
[FIGURE: ROC curves with AUC = 0.9, AUC = 0.7 and AUC = 0.5; x-axis: 1-specificity (0 to 1); y-axis: sensitivity (0 to 1)]

Conclusion

Nonetheless, sensitivity and specificity may vary greatly depending on the spectrum of the disease in the studied group. Sensitivity and specificity are commonly used estimates of diagnostic accuracy.

[...] interpreted in order to serve as valid evidence for health care providers, clinicians and laboratory professionals; to the best for the patient care.