Notes On Medical Statistics
Calvin Chong
Chemical Pathology Laboratory
Princess Margaret Hospital
Population and Sample. The population in pathology practice is often practically
infinite in size. Statistical methods that study the characteristics of such a population
therefore rely on observations or measurements obtained from a representative
subset of the population, called a sample.
Random sampling. Sampling is random when every member of the population has the
same chance of being selected.
Probability distributions. Common population distributions in chemical pathology
include the Gaussian distribution, the log-normal distribution (values which show a
Gaussian distribution after a logarithmic transformation), and non-Gaussian
distributions. Some populations are said to be bimodal, where in fact two
separate distributions of values exist in one population.
Parameters. Parameters are constants that define the characteristics of a population.
An example outside statistics: a circle in a two-dimensional Cartesian space
is defined by the coordinates of its centre (x, y) and its radius r; these three
parameters define the circle. Similarly, a Gaussian distribution
is defined by its mean and standard deviation.
Gaussian distribution. A Gaussian (normal) distribution is fully defined by its mean and
standard deviation. For a population (when measurements have been made on all N
individuals), the mean (μ) and the population standard deviation (σ) are given
below:
$$\mu = \frac{\sum x}{N}, \qquad \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Sample standard deviation. When the entirety of the population cannot be measured,
the population standard deviation can be estimated by the sample standard deviation,
which is:
$$\text{mean} = \frac{\sum x}{N}, \qquad SD = \sqrt{\frac{\sum (x - \text{mean})^2}{N - 1}}$$
The coefficient of variation is the ratio of the standard deviation to the mean,
multiplied by 100%:
$$CV = \frac{SD}{\text{mean}} \times 100\%$$
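The mean, sample standard deviation and CV can be computed in a few lines. The following is a minimal sketch; the function name `mean_sd_cv` and the replicate values are hypothetical and serve only as illustration.

```python
import math

def mean_sd_cv(values):
    """Return (mean, sample SD, CV in %) for a list of measurements."""
    n = len(values)
    mean = sum(values) / n
    # The sample SD divides by N - 1 because the mean is estimated from the same data.
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    cv = sd / mean * 100
    return mean, sd, cv

# Hypothetical replicate results for one control material
results = [4.0, 5.0, 6.0]
mean, sd, cv = mean_sd_cv(results)
print(f"mean={mean:.2f} SD={sd:.2f} CV={cv:.1f}%")
```

Python's standard `statistics.stdev` uses the same N - 1 divisor, whereas `statistics.pstdev` gives the population standard deviation.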
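$$s_r^2 = \frac{\sum_{d}\sum_{r} (x_{dr} - \bar{x}_d)^2}{D(n-1)}$$
$$s_b^2 = \frac{\sum_{d} (\bar{x}_d - \bar{x})^2}{D-1}$$
$$s_T^2 = \frac{n-1}{n}\, s_r^2 + s_b^2$$
Here x_dr is replicate r of run d, x̄_d is the mean of run d, x̄ is the grand mean, D is
the number of runs and n is the number of replicates per run.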
It should be noted that between-run variation is often mixed up with the total
variation.
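The three expressions above can be sketched in code. This is a sketch under stated assumptions: it reads s_r² as the pooled within-run variance, s_b² as the variance of the run means, and the last expression as the combined (total) variance, with D runs of n replicates each. The function name `variance_components` and the data are hypothetical.

```python
def variance_components(runs):
    """runs: list of D runs, each a list of n replicate results (same n in every run)."""
    D = len(runs)
    n = len(runs[0])
    run_means = [sum(run) / n for run in runs]
    grand_mean = sum(run_means) / D
    # Within-run variance: squared deviations from each run's own mean, pooled over runs
    s_r2 = sum((x - m) ** 2 for run, m in zip(runs, run_means) for x in run) / (D * (n - 1))
    # Variance of the run means about the grand mean
    s_b2 = sum((m - grand_mean) ** 2 for m in run_means) / (D - 1)
    # Combined variance: (n - 1)/n * s_r^2 + s_b^2
    s_t2 = (n - 1) / n * s_r2 + s_b2
    return s_r2, s_b2, s_t2

# Hypothetical data: 3 runs, duplicate measurements in each
runs = [[10.0, 12.0], [11.0, 13.0], [9.0, 11.0]]
s_r2, s_b2, s_t2 = variance_components(runs)
print(s_r2, s_b2, s_t2)
```

Keeping the three quantities as separate return values makes it harder to confuse the run-to-run term with the total.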
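Confidence interval for the imprecision. Sample variances follow the χ² distribution.
The 95% confidence limits of the SD (corresponding to ±2 SD) are thus given as follows:
$$SD_{(2.5\%)} = SD \sqrt{\frac{N-1}{\chi^2_{(97.5\%,\ N-1)}}}, \qquad SD_{(97.5\%)} = SD \sqrt{\frac{N-1}{\chi^2_{(2.5\%,\ N-1)}}}$$
where χ²(p, N-1) is the p-th percentile of the χ² distribution with N-1 degrees of
freedom.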
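As a worked illustration, the limits for N = 20 can be computed as below. This is a sketch only: the χ² percentiles for 19 degrees of freedom (8.907 and 32.852) are taken from standard tables and hard-coded, since the Python standard library has no χ² quantile function, and the function name `sd_confidence_limits` is hypothetical.

```python
import math

def sd_confidence_limits(sd, n, chi2_2_5, chi2_97_5):
    """Return (lower, upper) 95% confidence limits of an SD estimated from n results.

    chi2_2_5 and chi2_97_5 are the 2.5th and 97.5th percentiles of the
    chi-square distribution with n - 1 degrees of freedom.
    """
    df = n - 1
    lower = sd * math.sqrt(df / chi2_97_5)  # the larger percentile gives the lower limit
    upper = sd * math.sqrt(df / chi2_2_5)   # the smaller percentile gives the upper limit
    return lower, upper

# Tabulated chi-square percentiles for 19 degrees of freedom (assumed table values)
CHI2_2_5_DF19 = 8.907
CHI2_97_5_DF19 = 32.852

lower, upper = sd_confidence_limits(1.0, 20, CHI2_2_5_DF19, CHI2_97_5_DF19)
print(f"{lower:.2f} to {upper:.2f} times the observed SD")
```

With 20 results the interval is noticeably asymmetric about the observed SD, which is why the limits are derived from the χ² distribution rather than from a symmetric approximation.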
Linearity. Linearity refers to the linear relationship between measured and expected
values over the range of measurement. Often, a dilution series is used to confirm
linearity, with measurements obtained in duplicate. First-, second-, and third-order
polynomial regressions are performed, and a t-test is used to determine whether each
regression coefficient is significant (i.e. different from zero). A significant
second- or third-order coefficient suggests non-linearity.
Limit of Blank. The limit of blank is defined as the highest apparent analyte concentration
expected to be found when replicates of a blank (a sample containing no analyte) are
tested. The limit of blank is defined as follows:
$$LoB = \text{mean}_{blank} + SD_{blank} \times z_{(\text{one-tailed},\ 95\%)}$$
The one-tailed 95% multiplier z is 1.645 for Gaussian-distributed values. The
above formula specifies a limit of blank such that, when a blank is measured, 95% of
the measurements will yield a value below the limit of blank. The remaining 5%
represents the type I error (false positive). The limit of blank is determined by
measuring a known blank at least 20 times, and is calculated from the mean
and sample standard deviation obtained from the experiment.
Limit of Detection. The limit of detection is a statistically derived
concentration which can be reliably distinguished from the limit of blank. The limit of
detection is calculated such that a specimen with a concentration at the limit of
detection, when measured, will yield a concentration above the limit of blank in 95%
of the measurements. The standard deviation used in the equation is determined using
a small but known concentration of the substance of interest.
$$LoD = LoB + SD_{\text{low concentration sample}} \times z_{(\text{one-tailed},\ 95\%)}$$
Note that in a graphical representation, the 5% tail of this one-tailed 95% region
extends from the limit of blank towards zero. The recommended number of runs to
establish these parameters is 60, whereas the number of runs to verify a
manufacturer-specified LoD is 20.
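The two formulas can be sketched as follows, assuming Gaussian-distributed results so that the one-tailed 95% multiplier is 1.645. The function names and the short example lists are hypothetical; a real experiment would use at least 20 blank measurements, as stated above.

```python
import statistics

Z_ONE_TAILED_95 = 1.645  # one-tailed 95% multiplier for Gaussian-distributed values

def limit_of_blank(blank_results):
    """LoB = mean of blank + 1.645 x sample SD of blank."""
    return statistics.mean(blank_results) + Z_ONE_TAILED_95 * statistics.stdev(blank_results)

def limit_of_detection(lob, low_sample_results):
    """LoD = LoB + 1.645 x sample SD of a low-concentration sample."""
    return lob + Z_ONE_TAILED_95 * statistics.stdev(low_sample_results)

# Hypothetical, deliberately short data sets (illustration only)
blank = [1.0, 2.0, 3.0]
low_sample = [4.0, 5.0, 6.0]

lob = limit_of_blank(blank)
lod = limit_of_detection(lob, low_sample)
print(f"LoB={lob:.3f} LoD={lod:.3f}")
```

`statistics.stdev` is the sample standard deviation (N - 1 divisor), matching the definition used for the limit of blank.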
Limit of quantitation. The limit of quantitation is the lowest concentration at and
above which the imprecision and bias of the assay are
acceptable. Thus the limit of quantitation can be at the limit of detection (e.g. if the
bias and imprecision at the LoD are already acceptable for use), or at a higher level.
Functional sensitivity. The concentration at which the coefficient of variation
is 20%. It is usually used for immunoassays. The figure of 20% was originally
determined for use with the thyroid-stimulating hormone assay.
Analytical sensitivity/Sensitivity. This is a term of considerable controversy. An editor's
note from Clin Chem 1996 42(12):2051 is appended below:
Editor's Note: The term sensitivity has come to have two
meanings: the slope of the calibration curve (IUPAC [2]) and the lower
limit of detection (immunoassay workers). This Journal reserves the
term sensitivity for the IUPAC meaning. We call the lower limit of
detection the lower limit of detection.
Relevant discussion was also noted in Clin Chem 1997 43(10):1824-31, as well as
1831-1837 on the same issue. Naturally, the venerable Oxford English Dictionary was
quoted, and no conclusion was reached between the two. The IUPAC definition stems from
the fact that the steeper the calibration curve, the better the ability of the assay to
detect small differences.
[2] The full name of IUPAC was spelt out in the original Editor's note. It is the
International Union of Pure and Applied Chemistry.
Analytical goals. The hierarchy of analytical goals is listed as follows:
Hierarchy   Description
I (Best)    Based on clinical outcomes (in studies)
II          Based on clinical decisions in general (biological variation/clinician opinion)
III         Based on professional recommendations (international expert bodies/local groups)
IV          Performance goals set by regulatory bodies/organizers of EQA schemes
V           Based on state of the art (EQA proficiency schemes/publications of methods)
Analytical goals of bias and imprecision based on biological variation. The
recommendations by Cotlove et al (Clin Chem 1970 16:1028-1032) are as follows:
Bias          Desirable: bias less than 0.25 x total CV. Optimal: 0.125 x total CV.
              Minimum performance: 0.375 x total CV.
Imprecision   Desirable: CV less than half of the biological variation CV, such
              that the total CV does not exceed the biological CV by more than 12% (11.8%).
              There are further recommendations that the minimum performance should
              be less than 75% of the biological CV (total CV <= 1.25 x biological CV), and
              the optimum performance less than 25% of the biological CV (total CV <= 1.03 x
              biological CV).
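The percentages quoted for imprecision follow from adding analytical and biological variation as variances. The sketch below assumes the two sources are independent, so that total CV = sqrt(analytical CV² + biological CV²); the function name `total_cv_ratio` is hypothetical.

```python
import math

def total_cv_ratio(analytical_fraction):
    """Ratio of total CV to biological CV when analytical CV = fraction x biological CV.

    Assumes independent sources of variation that add as variances.
    """
    return math.sqrt(1 + analytical_fraction ** 2)

for label, fraction in [("optimum", 0.25), ("desirable", 0.50), ("minimum", 0.75)]:
    print(f"{label}: analytical CV = {fraction} x biological CV, "
          f"total CV = {total_cv_ratio(fraction):.3f} x biological CV")
```

An analytical CV of half the biological CV thus inflates the total by about 11.8%, which is the figure given in the text.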
It should be noted that the analytical goals for some analytes are set by professional
bodies:
Test   NCEP Bias (%)   CDC Bias (%)   NCEP CV (%)   CDC CV (%)   NCEP Total Error (%)   RCPA ALP
TC     3               3              3             3            8.9                    0.5 or 10%
TG     5               5              5             5            15                     0.2 or 10%
HDL    5               5              4             4            13                     0.2 or 10%
LDL    4               -              4             -            12                     0.2 or 10%
Clinical sensitivity and specificity. This is based on the 2x2 cross table:
                Disease Positive   Disease Negative
Test Positive   A                  B
Test Negative   C                  D
Sensitivity = A/(A+C)
Specificity = D/(B+D)
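A minimal sketch of the two formulas, using the cell labels A (test positive, disease positive), B (test positive, disease negative), C (test negative, disease positive) and D (test negative, disease negative). The function name and the counts are hypothetical.

```python
def sensitivity_specificity(a, b, c, d):
    """Return (sensitivity, specificity) from the four cells of the 2x2 table."""
    sensitivity = a / (a + c)  # diseased subjects correctly testing positive
    specificity = d / (b + d)  # non-diseased subjects correctly testing negative
    return sensitivity, specificity

# Hypothetical counts
sens, spec = sensitivity_specificity(a=90, b=20, c=10, d=80)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```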
When comparing two tests, the disease columns are replaced by test 1, and the test
rows by test 2:
                  Test 1 Positive   Test 1 Negative
Test 2 Positive   A                 B
Test 2 Negative   C                 D
Agreement between two tests, however, can occur by chance. One of the
best-known measures of agreement above chance is kappa. Kappa represents the
ratio of observed excess agreement beyond chance to maximum possible excess
agreement beyond chance.
Calculation of kappa. The expected index of agreement is calculated from the
probability of a positive result in each test. The expected +/+ and -/- patterns by chance
are thus calculated. Together with the observed agreement, these give kappa:
1. Calculate the expected +/+ pattern: Pr[+ in test 1] * Pr[+ in test 2]
   Calculate the expected -/- pattern: Pr[- in test 1] * Pr[- in test 2]
   The expected agreement = Pr[+/+] + Pr[-/-]
2. Observed agreement = Obs[+/+] + Obs[-/-]
3. Calculate with the formula below:
$$\kappa = \frac{I_o - I_e}{1 - I_e}$$
where I_o and I_e are the observed and expected indices of agreement.
Interpretation of kappa
Kappa     Interpretation
>0.7      Excellent agreement beyond chance
0.4-0.7   Fair to good agreement beyond chance
<0.4      Poor agreement beyond chance
<0        Agreement below chance
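The kappa calculation steps can be sketched as below, using the cells A to D of the test 1 versus test 2 table. The function name `cohen_kappa` and the counts are hypothetical; the sketch does not handle the degenerate case where the expected agreement equals 1.

```python
def cohen_kappa(a, b, c, d):
    """Kappa from a 2x2 agreement table.

    a: both tests positive, d: both tests negative,
    b: test 2 positive / test 1 negative, c: test 2 negative / test 1 positive.
    """
    n = a + b + c + d
    p_test1_pos = (a + c) / n
    p_test2_pos = (a + b) / n
    # Step 1: agreement expected by chance (+/+ and -/- patterns)
    expected = p_test1_pos * p_test2_pos + (1 - p_test1_pos) * (1 - p_test2_pos)
    # Step 2: observed agreement
    observed = (a + d) / n
    # Step 3: excess agreement relative to the maximum possible excess
    return (observed - expected) / (1 - expected)

# Hypothetical counts
kappa = cohen_kappa(a=40, b=10, c=5, d=45)
print(f"kappa={kappa:.2f}")
```

In this example the observed agreement is 0.85 and the chance agreement 0.50, so kappa is 0.35/0.50.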
be less than 0.7 x the reporting unit; however, the rounding should be performed at
the last step to avoid information loss during calculation. Another way of calculating
the reporting unit is by dividing the RCV by the possible information loss during
rounding (e.g. Total SD = 1.2; RCV = 3.324; Reporting unit should be >3.324, e.g. 5
units). Badrick et al (ACB, 2004) suggested a less stringent criterion (50% certainty,
calculated with analytical SD only), such that a change in the reported value is more
likely than not a real change. It should be clear that the reporting unit depends on the
SD/CV at different levels, and one uniform reporting unit cannot be used across all
values, especially when an assay spans several decades.
Definitions in uncertainty of measurement. Uncertainty of measurement is established
by first identifying the significant uncertainty components, and then assigning a standard
uncertainty to each component. The components are then combined using error
propagation formulae, and the expanded uncertainty is reported.
Word                     Definition
Analyte                  Substance or constituent of interest which is the subject of
                         measurement.
                         e.g. Concentration of Creatine Kinase-MB in plasma
Measurand                Quantity intended to be measured (as defined by VIM).
                         e.g. The creatine kinase B activity in the plasma
Error                    Difference between the true value and the measured value.
                         Systematic errors: Bias is an estimate of a systematic
                         measurement error. Theoretically, systematic error can be
                         eliminated from the result by an appropriate
                         correction (which has errors in itself). Systematic errors cannot be
                         evaluated by statistical means.
                         Random errors: Precision describes the unpredictable (random)
                         variability of replicate measurements of a measurand. Random
                         errors can be evaluated by statistical procedures.
Evaluation               Type A evaluation: method for evaluation of uncertainty via
                         statistical analysis of a series of observations.
                         Type B evaluation: method for evaluating uncertainty by means
                         other than statistical analysis of a series of observations.
                         Coverage factor (k): numerical value that corresponds to the z score
                         in statistics.
Degree of freedom (v)    Mathematically, the number of observations minus the number of
                         fitted parameters. For type B evaluation, the degree of freedom
                         is often not given. The calculation can be done by the following
                         formula: $v = \frac{1}{2}\left(\frac{\Delta u}{u}\right)^{-2}$,
                         where u is the standard uncertainty and
                         Δu is the estimate of the uncertainty in the standard
                         uncertainty itself. Conversely, this formula can also be used to
                         estimate the uncertainty of the uncertainty itself for type A
                         investigations.
Uncertainty              A parameter associated with the result of a measurement, which
                         characterizes the dispersion of the values that could reasonably
                         be attributed to the measurand. (ISO 15189)
                         A non-negative parameter characterising the dispersion of the
                         quantity values being attributed to a measurand. (VIM,
                         International vocabulary of metrology)
Traceability             Property of the result of a measurement or the value of a
                         standard, whereby it can be related to stated references, usually
                         national or international standards, through an unbroken chain of
Numerical significance
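The degree-of-freedom formula v = 1/2 (Δu/u)^-2 and its inverse can be sketched as follows. The function names are hypothetical, and the inputs are illustrative values rather than data from any particular assay.

```python
import math

def degrees_of_freedom(relative_uncertainty):
    """v = 1/2 * (delta_u / u) ** -2, with relative_uncertainty = delta_u / u."""
    return 0.5 * relative_uncertainty ** -2

def relative_uncertainty_of_u(v):
    """Inverse relation: delta_u / u = 1 / sqrt(2 * v)."""
    return 1 / math.sqrt(2 * v)

# Type B: a standard uncertainty judged reliable to about 25% of its value
print(degrees_of_freedom(0.25))

# Type A: an SD from 20 replicates has 19 degrees of freedom
print(round(relative_uncertainty_of_u(19), 3))
```

The second call illustrates the converse use mentioned in the table: an SD estimated from 20 replicates is itself uncertain by roughly 16% of its value.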
The dotted line that extends from the lower left to the upper right represents a test with no
discrimination and is called the random guess line. The area under the ROC curve is
a relative measure of a test's performance. The Wilcoxon statistic / Mann-Whitney U
test can be used to determine statistically which of two ROC curves has the greater
area under it.
Likelihood ratio. The slope of the ROC curve at a point is the likelihood ratio of disease
for a given test value. The slope of the line drawn from the point to the upper right
corner is the negative likelihood ratio for a decision based on that point
(when the test is negative, the odds of having the disease). Likewise, the slope of the line
drawn from the point to the lower left corner is the positive likelihood ratio (when the
test is positive, the odds of having the disease).
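The two slopes can be written out explicitly. The sketch below assumes the usual ROC layout, with sensitivity on the vertical axis and 1 - specificity on the horizontal axis; the function name and the example values are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) as slopes of lines through an ROC point."""
    x = 1 - specificity  # false-positive rate (horizontal axis)
    y = sensitivity      # true-positive rate (vertical axis)
    lr_positive = y / x              # slope from the lower left corner (0, 0) to (x, y)
    lr_negative = (1 - y) / (1 - x)  # slope from (x, y) to the upper right corner (1, 1)
    return lr_positive, lr_negative

# Hypothetical decision point
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.9, specificity=0.8)
print(f"LR+={lr_pos:.2f} LR-={lr_neg:.3f}")
```

Written this way, LR+ reduces to sensitivity/(1 - specificity) and LR- to (1 - sensitivity)/specificity.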
Odds. The odds of a specific disease are defined as the probability of the disease divided
by the probability of its absence. For example, a 60% probability corresponds to odds
of 3:2 (or 1.5 to 1); an 8% probability corresponds to odds of 1 to 11.5.
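The conversion between probability and odds used in these examples can be sketched as below; the function names are hypothetical.

```python
def probability_to_odds(p):
    """Odds in favour = probability of disease / probability of its absence."""
    return p / (1 - p)

def odds_to_probability(odds):
    """Inverse conversion: probability = odds / (1 + odds)."""
    return odds / (1 + odds)

odds_60 = probability_to_odds(0.60)
odds_8 = probability_to_odds(0.08)
print(f"60% -> {odds_60:.2f} to 1")
print(f"8%  -> 1 to {1 / odds_8:.1f}")
```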
$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B|A)\,P(A) + P(B|\bar{A})\,P(\bar{A})}$$
For example, in a bag with 500 balls, the probability that all balls are black and the
probability that 1/5 of the balls are black are both equal to 0.5. If the first ball drawn is
black, what is the probability that all balls are black?
Let B = a black ball is drawn; A = the bag contains only black balls; Ā = the bag contains 1/5 black balls:
P(B|A) = 1 (when the bag contains only black balls, P(black ball drawn) = 1)
P(A) = 0.5 and P(Ā) = 0.5 (given)
P(B|Ā) = 0.2 (when the bag has only 1/5 black balls, P(black ball drawn) = 0.2)
$$P(A|B) = \frac{1 \times 0.5}{1 \times 0.5 + 0.2 \times 0.5} = \frac{0.5}{0.5 + 0.1} = \frac{5}{6}$$
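The same calculation can be checked with exact fractions. This is a minimal sketch; the function name `posterior` and its parameter names are hypothetical.

```python
from fractions import Fraction

def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) by Bayes' theorem for a hypothesis A and its complement."""
    numerator = p_b_given_a * prior_a
    denominator = numerator + p_b_given_not_a * (1 - prior_a)
    return numerator / denominator

# A: the bag contains only black balls; B: the first ball drawn is black
p_all_black = posterior(
    prior_a=Fraction(1, 2),
    p_b_given_a=Fraction(1),
    p_b_given_not_a=Fraction(1, 5),
)
print(p_all_black)
```

Using `Fraction` avoids floating-point rounding, so the result can be compared directly with the hand calculation.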
A limitation of applying Bayes' theorem is that it depends on the probabilities P(A)
and P(B) being mutually independent, which is usually not the case in
clinical medicine.
Choice of test sequence. Suppose a sensitive test and a specific test are available. When the
goal is to optimize for specificity (i.e. both tests must be positive in order to count as
positive), the economical method is to use the more specific test first, followed by the
less specific test, because more people are screened out (negative) by the first test
and do not need the second. Similarly, when the goal is to have a
more sensitive combination (i.e. one positive test counts as positive), the economical
method is to use the more sensitive test first.
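The economic argument can be illustrated by counting how many subjects still need the second test. The sketch below uses hypothetical values for prevalence, sensitivity and specificity, and the function names are illustrative only.

```python
def fraction_positive(prevalence, sensitivity, specificity):
    """Fraction of all subjects testing positive (true plus false positives)."""
    return prevalence * sensitivity + (1 - prevalence) * (1 - specificity)

def second_tests_needed(prevalence, first_test, rule):
    """Fraction of subjects who must proceed to the second test.

    first_test: (sensitivity, specificity) of the test performed first.
    rule: "both" (both tests must be positive) or "either" (one positive suffices).
    """
    sensitivity, specificity = first_test
    positive = fraction_positive(prevalence, sensitivity, specificity)
    if rule == "both":
        return positive      # only first-test positives need confirmation
    if rule == "either":
        return 1 - positive  # only first-test negatives need retesting
    raise ValueError("rule must be 'both' or 'either'")

# Hypothetical tests and prevalence
PREVALENCE = 0.10
SENSITIVE_TEST = (0.95, 0.70)  # (sensitivity, specificity)
SPECIFIC_TEST = (0.70, 0.95)

for rule in ("both", "either"):
    for name, test in (("specific first", SPECIFIC_TEST), ("sensitive first", SENSITIVE_TEST)):
        print(rule, name, round(second_tests_needed(PREVALENCE, test, rule), 3))
```

With these illustrative numbers, starting with the specific test sends fewer subjects on to the second test under the "both" rule, and starting with the sensitive test does so under the "either" rule.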