Statistical notes in Chemical Pathology

Calvin Chong
Chemical Pathology Laboratory
Princess Margaret Hospital
Population and Sample. The population in pathology practice is often practically
infinite in size. Statistical methods that study the characteristics of such a
population therefore rely on observations or measurements from a representative
group drawn from the population, called a sample.
Random sampling. Sampling is random when every member of the population has the
same chance of being selected.
Probability distributions. Common population distributions in chemical pathology
include the Gaussian distribution, the log-normal distribution (values that show a
Gaussian distribution after logarithmic transformation), and non-Gaussian
distributions. Some populations are said to be bimodal, where in fact two
separate distributions of values exist within one population.
Parameters. Parameters are constants that define the characteristics of a population.
An example from outside statistics: a circle in a two-dimensional Cartesian space
is defined by the coordinates of its centre (x, y) and its radius r; these three
parameters fully define the circle. Similarly, a Gaussian distribution is defined
by its mean and standard deviation.
Gaussian distribution. The Gaussian (normal) distribution is fully defined by two
parameters, the mean and the standard deviation. For a population (when
measurements have been made on all individuals), the mean ($\mu$) and the
population standard deviation ($\sigma$) are given by:

$\mu = \dfrac{\sum x}{N} \qquad \sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
Whereas the distribution of measurands need not follow the Gaussian distribution,
most measurement errors do. This is explained by the central limit theorem: when a
sufficiently large number of independent random variables, each with a defined mean
and variance, contribute to a variable, that variable will approximately follow a
Gaussian distribution; the original distributions of the independent variables do
not matter. Probabilistic calculations for a Gaussian distribution can be made
from z:

$z = \dfrac{x - \mu}{\sigma}$

The variable z follows the standard Gaussian distribution¹.
Sample standard deviation. When the entirety of the population cannot be measured,
the population standard deviation can be estimated by the sample standard
deviation:

$\text{mean} = \dfrac{\sum x}{N} \qquad SD = \sqrt{\dfrac{\sum (x - \bar{x})^2}{N - 1}}$

¹ NORMSDIST(z) in Microsoft Excel, e.g. NORMSDIST(0) = 0.5.

The coefficient of variation is the ratio of the standard deviation to the mean,
multiplied by 100%:
$CV = \dfrac{SD}{\text{mean}} \times 100\%$

Student t distribution. The Student t distribution describes a family of
distributions indexed by the degrees of freedom of the sample standard deviation.
When the sample size (and thus the degrees of freedom) is infinite, the Student t
distribution is identical to the Gaussian distribution. For example, given n = 20
(i.e. df = 19), the probability of a measurement being more than two SD above the
mean is:

$\Pr(t > 2,\ df = 19) = 0.03$

This can be calculated in a spreadsheet or obtained from statistical tables.
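The same figure can be reproduced programmatically; a minimal sketch assuming
Python with scipy:

```python
# Upper-tail probability Pr(t > 2) with df = 19, as in the example above.
from scipy.stats import t

p = t.sf(2, 19)   # survival function = 1 - cdf; ~0.030
print(p)
```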
Non-parametric statistics. Where the data deviate significantly from the Gaussian
distribution, for example with an asymmetric distribution, a non-parametric
approach to the reference interval can be used. In fact, the non-parametric
approach is preferred by the CLSI guideline C28-A3c:
... the working group endorses its previous recommendation that
the best means to establish a reference interval is to collect
samples from a sufficient number of qualified reference individuals
to yield a minimum of 120 samples for analysis, by nonparametric
means, for each partition... (Emphasis by the CLSI)
With 120 subjects, the 3rd and 118th ranked values represent the 2.5th and 97.5th
centiles, and the first and last 7 ranks represent the 90% confidence intervals
for the lower and upper reference limits, respectively.
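A minimal sketch of the rank-based limits (Python with numpy; the data are
simulated for illustration only):

```python
# Nonparametric reference interval from 120 sorted results:
# the 3rd and 118th ranked values estimate the 2.5th and 97.5th centiles.
import numpy as np

values = np.sort(np.random.default_rng(0).normal(5.0, 0.5, 120))
lower_limit = values[2]      # 3rd rank (0-based index 2)
upper_limit = values[117]    # 118th rank
print(lower_limit, upper_limit)
```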
Comparison of means. When two means need to be compared for a significant
difference, the parametric approach is Student's t-test. The t-test can be paired
(two measurements are made for each case) or unpaired (two separate groups are
compared). The non-parametric counterparts are the Wilcoxon test and the
Mann-Whitney test, respectively. The Mann-Whitney test provides a significance
test for the difference between the two medians (not means) of the two groups.
Mann-Whitney U Test. The Mann-Whitney U test calculates a value U for each of the
two groups being compared. U is calculated by considering each element in each
group and counting the number of elements in the other group of lower rank than
the element currently examined. The statistic is then looked up in a table to
determine its significance.
Wilcoxon test. The Wilcoxon (signed-rank) test calculates a value T from the sign
of each paired difference and the rank of its absolute value when the pairs are
ranked by absolute difference. The value is then looked up in tables for its
significance.
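Both tests are available in common statistics libraries; a hedged sketch using
scipy (the data are invented for illustration):

```python
# Mann-Whitney U for two independent groups; Wilcoxon signed-rank for pairs.
from scipy.stats import mannwhitneyu, wilcoxon

group_a = [4.1, 4.5, 3.9, 5.0, 4.7, 4.3]
group_b = [5.2, 5.8, 4.9, 6.1, 5.5, 5.0]
u_stat, p_unpaired = mannwhitneyu(group_a, group_b)

before = [4.1, 4.5, 3.9, 5.0, 4.7, 4.3]
after  = [4.6, 4.9, 4.2, 5.4, 5.1, 4.8]
t_stat, p_paired = wilcoxon(before, after)
print(p_unpaired, p_paired)
```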
Trueness and bias. Trueness is a term introduced by the ISO to represent the
closeness of agreement between the average value of a large series of measurements
and the true value. The difference is expressed numerically as the bias.

Precision. Precision represents the closeness of agreement between independent
results of measurement. Imprecision is typically expressed as the coefficient of
variation or the standard deviation.
Repeatability. Repeatability is the within-run precision. It is defined as the
square root of the sum of squared differences between each reading and its daily
average, divided by the number of days times the degrees of freedom within each
day:

$s_r = \sqrt{\dfrac{\sum (x_{dr} - \bar{x}_d)^2}{D(n - 1)}}$
Between-run imprecision. The between-run standard deviation is the sample standard
deviation of the daily means:

$s_b = \sqrt{\dfrac{\sum (\bar{x}_d - \bar{x})^2}{D - 1}}$

Total imprecision (Within-laboratory imprecision). The within-laboratory
imprecision is given by the formula below, and is expressed as one standard
deviation:

$s_{total} = \sqrt{\dfrac{n - 1}{n}\, s_r^2 + s_b^2}$

It should be noted that the between-run variation is often mixed up with the total
variation.
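The three formulas can be computed together from a days-by-replicates table; a
minimal sketch in Python with numpy (the numbers are illustrative):

```python
# Repeatability (within-run), between-run, and within-laboratory SD
# from D days of n replicates per day, per the formulas above.
import numpy as np

x = np.array([[4.9, 5.1, 5.0],   # D = 4 days, n = 3 replicates per day
              [5.2, 5.0, 5.1],
              [4.8, 4.9, 5.0],
              [5.1, 5.2, 5.0]])
D, n = x.shape
day_means = x.mean(axis=1)
s_r = np.sqrt(((x - day_means[:, None]) ** 2).sum() / (D * (n - 1)))
s_b = np.sqrt(((day_means - day_means.mean()) ** 2).sum() / (D - 1))
s_total = np.sqrt((n - 1) / n * s_r ** 2 + s_b ** 2)
print(s_r, s_b, s_total)
```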
Confidence interval for the imprecision. The distribution of a variance estimate
follows the χ² distribution. The 95% confidence limits (2.5th and 97.5th
percentiles) for an SD are thus given as follows:

$SD(2.5) = SD \sqrt{\dfrac{N - 1}{\chi^2(97.5,\ N - 1)}} \qquad SD(97.5) = SD \sqrt{\dfrac{N - 1}{\chi^2(2.5,\ N - 1)}}$
Where N is the number of samples used to estimate the total imprecision.
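A sketch of these limits using scipy's χ² quantile function (the SD and N below
are placeholders):

```python
# 95% confidence limits for an SD estimated from N samples.
from scipy.stats import chi2

sd, N = 0.15, 20
sd_lower = sd * ((N - 1) / chi2.ppf(0.975, N - 1)) ** 0.5
sd_upper = sd * ((N - 1) / chi2.ppf(0.025, N - 1)) ** 0.5
print(sd_lower, sd_upper)
```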


Precision profile. Typically, at low concentration the more significant source of
error is constant rather than proportional error; in this range the coefficient of
variation varies inversely with the analyte concentration. At higher
concentrations, proportional error becomes increasingly significant, and the
coefficient of variation becomes constant with increasing analyte concentration.
Combining the effects at low and high concentrations yields the characteristic
precision profile. (Figure: precision profile, CV versus analyte concentration.)

Linearity. Linearity refers to the linear relationship between measured and
expected values over the range of measurement. Often a dilution series is used to
confirm linearity, with measurements obtained in duplicate. First-, second-, and
third-order polynomial regressions are performed, and a t-test is used to assess
whether the nonlinear regression coefficients are significant (i.e. different from
zero).
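One possible sketch of the nonlinearity check (Python with numpy/scipy; only the
second-order fit is shown, and the dilution data are invented):

```python
# Fit a quadratic to a dilution series and t-test the quadratic
# coefficient against zero; a significant coefficient suggests nonlinearity.
import numpy as np
from scipy.stats import t

expected = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
measured = np.array([0.1, 1.0, 2.1, 2.9, 4.2, 5.0])
coef, cov = np.polyfit(expected, measured, deg=2, cov=True)
se = np.sqrt(np.diag(cov))
t_stat = coef[0] / se[0]       # quadratic term is coef[0]
df = len(expected) - 3         # observations minus fitted parameters
p = 2 * t.sf(abs(t_stat), df)
print(p)                       # large p: no evidence of nonlinearity
```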
Limit of Blank. The limit of blank is defined as the highest apparent analyte
concentration expected to be found when replicates of a blank (a sample containing
no analyte) are tested. It is defined as follows:

$LoB = \text{mean}_{blank} + SD_{blank} \times \Pr(\text{one-tailed},\ 95)$

The one-tailed 95% multiplier is 1.645 for Gaussian-distributed values. The
formula specifies a limit of blank such that, when a blank is measured, 95% of the
measurements will yield a value below the limit of blank. The remaining 5%
represents the type I error (false positives). The limit of blank is determined by
measuring a known blank at least 20 times, with the mean and sample standard
deviation obtained from the experiment.
Limit of Detection. The limit of detection is determined by statistical
calculation as the concentration which can be reliably distinguished from the
limit of blank. It is calculated such that a specimen with a concentration at the
limit of detection will, when measured, yield a concentration above the limit of
blank in 95% of measurements. The standard deviation used in the equation is
determined using a sample with a small but known concentration of the analyte of
interest.

$LoD = LoB + SD_{low\ concentration\ sample} \times \Pr(\text{one-tailed},\ 95)$

Note that, in graphical representation, the one-tailed 95% leaves the 5% tail
extending from the limit of blank towards zero. The recommended number of runs to
establish these parameters is 60, whereas the number of runs to verify a
manufacturer-specified LoD is 20.
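A minimal sketch of both limits (Python with numpy; the replicate values are
invented):

```python
# LoB from >= 20 blank replicates; LoD adds the low-sample SD; both use
# the one-tailed 95% Gaussian multiplier 1.645.
import numpy as np

blank = np.array([0.02, 0.00, 0.03, 0.01, 0.02, 0.00, 0.01, 0.02, 0.03, 0.01,
                  0.00, 0.02, 0.01, 0.02, 0.03, 0.00, 0.01, 0.02, 0.01, 0.02])
low = np.array([0.09, 0.11, 0.10, 0.12, 0.08, 0.10, 0.11, 0.09, 0.10, 0.12,
                0.11, 0.09, 0.10, 0.11, 0.10, 0.09, 0.12, 0.10, 0.11, 0.10])

lob = blank.mean() + 1.645 * blank.std(ddof=1)
lod = lob + 1.645 * low.std(ddof=1)
print(lob, lod)
```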
Limit of quantitation. The limit of quantitation is the lowest concentration at
which the imprecision and bias of the assay are acceptable. The limit of
quantitation can therefore be at the limit of detection (e.g. if the bias and
imprecision at the LoD are already acceptable for use) or at a higher level.
Functional sensitivity. The concentration at which the coefficient of variation is
20%. It is usually used for immunoassays; the figure of 20% was originally
determined for use with the thyroid-stimulating hormone assay.
Analytical sensitivity/Sensitivity. This is a term of considerable controversy. An
editor's note from Clin Chem 1996;42(12):2051 is appended below:
Editor's Note: The term sensitivity has come to have two
meanings: the slope of the calibration curve (IUPAC²) and the lower
limit of detection (immunoassay workers). This Journal reserves the
term sensitivity for the IUPAC meaning. We call the lower limit of
detection the lower limit of detection.
Relevant discussion was also noted in Clin Chem 1997;43(10):1824-31, as well as
1831-1837 in the same issue. Naturally, the venerable Oxford English Dictionary
was quoted, and no conclusion was reached between the two camps. The IUPAC
definition stems from the fact that the steeper the calibration curve, the better
the ability of the assay to detect small differences.

² The full name of IUPAC, the International Union of Pure and Applied Chemistry,
was spelt out in the original Editor's note.
Analytical goals. The hierarchy of analytical goals is as follows:

Hierarchy   Description
I (Best)    Based on clinical outcomes (in studies)
II          Based on clinical decisions in general (biological variation/clinician opinion)
III         Based on professional recommendations (international expert bodies/local groups)
IV          Performance goals set by regulatory bodies/organizers of EQA schemes
V           Based on state of the art (EQA proficiency schemes/publications of methods)
Analytical goals of bias and imprecision based on biological variation. The
recommendations by Cotlove et al (Clin Chem 1970;16:1028-1032) are as follows:

Bias: desirably less than 0.25 x the total biological CV; optimally 0.125 x the
total CV; minimum performance 0.375 x the total CV.
Imprecision: analytical CV less than half of the biological variation CV; this
ensures the total CV does not exceed the biological CV by more than 12% (11.8%).
There are further recommendations that the minimum performance should be less than
75% of the biological CV (total CV <= 1.25 x biological CV), and the optimum
performance 25% of the biological CV (total CV <= 1.03 x biological CV).
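The 11.8% figure follows directly from adding variances: if the analytical CV is
half the biological CV, then

$CV_{total} = \sqrt{CV_b^2 + (0.5\,CV_b)^2} = \sqrt{1.25}\;CV_b \approx 1.118\,CV_b$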

It should be noted that the analytical goals of some analytes are set by
professional bodies (bias, CV, and total error in %; RCPA ALP denotes the RCPA
allowable limits of performance):

Test   NCEP Bias   CDC Bias   NCEP CV   CDC CV   NCEP Total Error   RCPA ALP
TC     3           3          3         3        8.9                0.5 or 10%
TG     5           5          5         5        15                 0.2 or 10%
HDL    5           5          4         4        13                 0.2 or 10%
LDL    4           -          4         -        12                 0.2 or 10%

Clinical sensitivity and specificity. These are based on the 2x2 cross table:

                Disease Positive   Disease Negative
Test Positive   A                  B
Test Negative   C                  D

Sensitivity = A/(A+C)
Specificity = D/(B+D)
Positive predictive value = A/(A+B)
Negative predictive value = D/(C+D)
Disease prevalence = (A+C)/(A+B+C+D)
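A minimal sketch of these definitions as a helper function (Python; the counts are
placeholders):

```python
# 2x2 table statistics: a = TP, b = FP, c = FN, d = TN, as laid out above.
def diagnostic_metrics(a, b, c, d):
    total = a + b + c + d
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "prevalence": (a + c) / total,
    }

print(diagnostic_metrics(80, 10, 20, 90))
```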

When comparing two tests, the disease columns are replaced by test 1 and the test
rows by test 2:

                  Test 1 Positive   Test 1 Negative
Test 2 Positive   A                 B
Test 2 Negative   C                 D

Agreement between two tests, however, can occur by chance. One of the best-known
measures of agreement above chance is kappa, which represents the ratio of the
observed excess agreement beyond chance to the maximum possible excess agreement
beyond chance.
Calculation of kappa. The expected index of agreement is calculated from the
probabilities of positive and negative results in both tests, which give the
expected +/+ and -/- patterns by chance:
1. Calculate the expected +/+ pattern: Pr[+ in test 1] x Pr[+ in test 2]; and the
expected -/- pattern: Pr[- in test 1] x Pr[- in test 2]. The expected agreement
I_e = Pr[+/+] + Pr[-/-].
2. The observed agreement I_o = Obs[+/+] + Obs[-/-].
3. Calculate with the formula below, where I_o and I_e are the observed and
expected indices of agreement:

$\kappa = \dfrac{I_o - I_e}{1 - I_e}$

Interpretation of kappa:

Kappa     Interpretation
>0.7      Excellent agreement beyond chance
0.4-0.7   Fair to good agreement beyond chance
<0.4      Poor agreement beyond chance
<0        Agreement below chance
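A sketch of the three calculation steps as code (Python; counts a-d follow the
test 1 / test 2 table above and are illustrative):

```python
# Kappa from a 2x2 agreement table: rows are test 2 (+/-), columns test 1 (+/-).
def cohen_kappa(a, b, c, d):
    n = a + b + c + d
    i_o = (a + d) / n                      # observed agreement
    p1 = (a + c) / n                       # Pr[+ in test 1]
    p2 = (a + b) / n                       # Pr[+ in test 2]
    i_e = p1 * p2 + (1 - p1) * (1 - p2)    # expected agreement by chance
    return (i_o - i_e) / (1 - i_e)

print(cohen_kappa(40, 5, 6, 49))
```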

Error in reference method vs routine method. Measurement by the reference method
is, by definition, associated only with random error. In a routine method there is
also bias related to calibration and specificity problems. Non-specificity is a
kind of bias, not imprecision (it is termed sample-related random bias).
Method comparison. In method comparison, a Bland-Altman plot and Deming regression
should be performed. It should be noted that the paired t-test is inadequate for
method comparison, because it is unable to detect bias when the overall bias is
zero (for example, a positive bias at one end of the range cancelling a negative
bias at the other). An example is given in the Tietz textbook.

Deming regression. Deming regression is a type of linear regression that takes
into account the fact that there are measurement errors associated with both
variables. The ordinary least-squares (OLS) approach to linear regression is
suitable when measurement error exists in y but not in x. When OLS is used on data
where measurement errors exist on the x axis, the slope is underestimated.
Deming regression is performed by minimizing the distance between the points and
the regression line along a direction that depends on the uncertainties of the y
and x axes: the higher the uncertainty on the x axis, the more horizontal the line
drawn from each point to the regression line; the higher the uncertainty on the y
axis, the more vertical that line.
Deming regression requires the uncertainties of measurement to be known in order
to perform the regression.
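A hedged sketch of simple Deming regression (Python with numpy; lam is the assumed
ratio of the y to x measurement-error variances, with lam = 1 giving orthogonal
regression; the data are invented):

```python
# Deming regression slope/intercept for a known error-variance ratio lam.
import numpy as np

def deming(x, y, lam=1.0):
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx = np.var(x)                                  # mean squared deviation of x
    syy = np.var(y)
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    slope = ((syy - lam * sxx) +
             np.sqrt((syy - lam * sxx) ** 2 + 4 * lam * sxy ** 2)) / (2 * sxy)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

print(deming([1.0, 2.1, 3.0, 4.2, 5.1], [1.1, 2.0, 3.2, 4.1, 5.3]))
```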
Passing-Bablok regression. Passing-Bablok regression is a non-parametric approach
to linear regression based on the rank principle. It assumes, however, a constant
ratio of variances. There is no requirement to know the uncertainty of measurement
of the two methods.
There is a school of thought that ordinary least-squares regression can still be
used when the Pearson correlation is very high (r >= 0.99 for a small data range,
e.g. within one decade, or r >= 0.975 for a large data range).
Significant figures and reference change value. The reporting interval should be
set such that any change in result is greater than the analytical imprecision. A
reference change value (or critical difference) is defined such that, if a result
changes by one reference change value, there is a 95% chance that the change is
not due to normal biological variation or measurement error alone. The magnitude
of the reference change value is calculated as follows, where $SD_a$ and $SD_i$
are the analytical and within-individual (biological) standard deviations:

$SD(\text{each measurement}) = z\sqrt{SD_a^2 + SD_i^2} = 1.96\, SD_{total}$
$SD(\text{both measurements}) = \sqrt{2} \times 1.96\sqrt{SD_a^2 + SD_i^2} = 2.77\, SD_{total}$
For the number of significant figures, there are several ways of obtaining the
figure. Hawkins and Johnson (Clin Chem, 1990) suggest that the standard deviation
should be less than 0.7 x the reporting unit; however, rounding should be
performed at the last step to avoid information loss during calculation. Another
way is to derive the reporting unit from the RCV, allowing for the possible
information loss during rounding (e.g. total SD = 1.2; RCV = 3.324; the reporting
unit should be > 3.324, e.g. 5 units). Badrick et al (Ann Clin Biochem, 2004)
suggested a less stringent criterion (50% certainty, calculated with the
analytical SD only), such that a change in the reported value is more likely than
not a real change. It should be clear that the reporting unit depends on the SD/CV
at different levels, and one uniform reporting unit cannot be used across all
values, especially when an assay spans several decades.
Definitions in uncertainty of measurement. Uncertainty of measurement is
established by first identifying the significant uncertainty components and then
assigning a standard uncertainty to each component. The components are then
combined using error propagation formulae, and the expanded uncertainty is
reported.
Analyte: Substance or constituent of interest which is the subject of measurement,
e.g. the concentration of creatine kinase-MB in plasma.

Measurand: Quantity intended to be measured (as defined by the VIM), e.g. the
creatine kinase B activity in the plasma.

Error: Difference between the true value and the measured value. Systematic
errors: bias is an estimate of a systematic measurement error; theoretically,
systematic error can be eliminated from the result by an appropriate correction
(which has errors in itself). Systematic errors cannot be evaluated by statistical
means. Random errors: precision describes the unpredictable (random) variability
of replicate measurements of a measurand. Random errors can be evaluated by
statistical procedures.

Evaluation: Type A evaluation: method for evaluating uncertainty via statistical
analysis of a series of observations. Type B evaluation: method for evaluating
uncertainty by means other than statistical analysis of a series of observations.
Coverage factor (k): numerical value that corresponds to the z score in
statistics.

Degrees of freedom (ν): Mathematically, the number of observations minus the
number of fitted parameters. For type B evaluation the degrees of freedom are
often not given; they can be calculated by the formula
$\nu = \frac{1}{2}\left(\frac{u}{\Delta u}\right)^2$, where u is the standard
uncertainty and Δu is the estimate of the uncertainty in the standard uncertainty
itself. Conversely, this formula can also be used to estimate the uncertainty of
the uncertainty itself for type A evaluations.

Uncertainty: A parameter associated with the result of a measurement, which
characterizes the dispersion of the values that could reasonably be attributed to
the measurand (ISO 15189). A non-negative parameter characterising the dispersion
of the quantity values being attributed to a measurand (VIM, International
Vocabulary of Metrology).

Traceability: Property of the result of a measurement or the value of a standard,
whereby it can be related to stated references, usually national or international
standards, through an unbroken chain of comparisons all having stated
uncertainties.

Numerical significance: The significant figures of a number are those that have
some practical meaning; they express its magnitude to a specified degree of
accuracy.
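As a sketch of the combination step described above (Python; the components and
their magnitudes are invented for illustration):

```python
# Combine independent standard uncertainties by root sum of squares,
# then report an expanded uncertainty with coverage factor k = 2.
import math

components = {"calibrator": 0.8, "imprecision": 1.2, "volume": 0.3}
u_combined = math.sqrt(sum(u ** 2 for u in components.values()))
k = 2                            # coverage factor (~95% coverage)
expanded_u = k * u_combined
print(u_combined, expanded_u)
```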

Receiver Operating Characteristic Curve. The ROC curve is generated by plotting
sensitivity (y axis) against 1 - specificity (x axis). A third axis lies along the
ROC curve itself, representing the decision values.
The dotted line extending from the lower left to the upper right represents a test
with no discrimination and is called the random guess line. The area under the ROC
curve is a relative measure of a test's performance. The Wilcoxon statistic /
Mann-Whitney U test can be used to determine statistically which ROC curve has
more area under it.
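The AUC/Mann-Whitney link can be made concrete with a small sketch (Python; the
scores are invented): the AUC equals the probability that a randomly chosen
diseased subject scores higher than a randomly chosen healthy one.

```python
# AUC = U / (n_diseased * n_healthy); ties count as half.
def auc_from_scores(diseased, healthy):
    wins = sum((d > h) + 0.5 * (d == h)
               for d in diseased for h in healthy)
    return wins / (len(diseased) * len(healthy))

print(auc_from_scores([3.1, 4.0, 5.2, 4.8], [2.0, 3.0, 2.5, 3.5]))
```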
Likelihood ratio. The slope of the ROC curve at a point is the likelihood ratio of
disease for that test value. The slope of the line drawn from the point to the
upper right corner is the likelihood ratio for the absence of disease for a
decision made at that point (when the test is negative, the odds of having
disease). Likewise, the slope of the line drawn from the lower left corner to the
point is the positive likelihood ratio (when the test is positive, the odds of
having disease).

Odds. The odds of a specific disease is defined as the probability of the disease
divided by the probability of its absence. For example, a 60% probability
corresponds to odds of 3:2 (or 1.5 to 1); an 8% probability corresponds to odds of
1 to 11.5.

Bayes theorem. Expressed mathematically, where P(X) is pronounced "the probability
that X is true", and P(X|Y) "the probability that X is true given Y is true":

$P(B)\,P(A|B) = P(B|A)\,P(A)$
$P(A|B) = \dfrac{P(B|A)\,P(A)}{P(B)}$

Transforming using the identity $P(A) + P(\bar{A}) = 1$:

$P(A|B) = \dfrac{P(B|A)\,P(A)}{P(B|A)\,P(A) + P(B|\bar{A})\,P(\bar{A})}$

For example, in a bag with 500 balls, the prior probabilities that all the balls
are black and that 1/5 of the balls are black are both equal to 0.5. If the first
ball drawn is black, what is the probability that all the balls are black?
Let B = a black ball is drawn, and A = the bag contains only black balls:
P(B|A) = 1 (when the bag is all black balls, P(black ball drawn) = 1)
P(B|$\bar{A}$) = 0.2 (when the bag has only 1/5 black balls, P(black ball drawn) = 0.2)
P(A) = 0.5 and P($\bar{A}$) = 0.5 (given)

$P(A|B) = \dfrac{1 \times 0.5}{1 \times 0.5 + 0.2 \times 0.5} = \dfrac{0.5}{0.6} = \dfrac{5}{6}$

A limitation of applying Bayes theorem is that it depends on mutual independence
between the probabilities concerned, which is usually not the case in clinical
medicine.
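A one-function sketch of the expanded form used in the ball example (Python):

```python
# Posterior P(A|B) from P(B|A), P(A), and P(B|not A).
def posterior(p_b_given_a, p_a, p_b_given_not_a):
    numerator = p_b_given_a * p_a
    return numerator / (numerator + p_b_given_not_a * (1 - p_a))

print(posterior(1.0, 0.5, 0.2))   # 5/6 ~ 0.833
```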
Choice of test sequence. Given a sensitive test and a specific test, when the goal
is to optimize specificity (i.e. both tests must be positive to count as
positive), the economical approach is to use the more specific test first,
followed by the less specific test: choosing the more specific test first screens
out (as negative) more people on the first test. Similarly, when the goal is
maximal sensitivity (i.e. one positive test counts as positive), the economical
approach is to use the more sensitive test first.
