Cofactor Statistics
(a) To describe the stages in the design of a clinical trial, taking into account the
research question and hypothesis
- Literature review
- Statistical advice
- Ideal study protocol to minimise the risk of bias and to achieve optimum power of the study
- Ethical issues and informed consent
- Data collection and processing
(b) To explain the concepts in statistics such as distribution of data and frequency
distributions, measures of central tendency and dispersion of data, and the
appropriate selection and application of non-parametric and parametric tests in
statistical inference
(c) To explain the principles of errors of statistical inference and describe techniques to
minimise such errors through good study design
(d) Have an understanding of sources of bias and confounding in medical research and
methods available that can reduce such bias
(e) To describe the features of a diagnostic test, including the concepts of sensitivity,
specificity, positive and negative predictive value and how these are affected by the
prevalence of the disease in question
(b) Histogram
Displays a frequency distribution of "class intervals" (along the x-axis), where the height of the vertical rectangle for each "class interval" is proportional to its frequency
Shows distribution of data according to its overall shape (Ie.
symmetric, skewed, multimodal) and location of central tendency
of data
(d) Dot plots – Individual interval data points are presented (cf.
subdividing them into class intervals). Useful with relatively small data sets
(e) Scatterplots (See “Association” below)
- (2) Categorical data
o Data that is discrete and qualitative. Either:
(a) Nominal – Data stratified into groups with no rank order and each
group is arbitrarily labelled (Eg. gender, hair colour)
(b) Ordinal – Data stratified into groups with rank order, where the intervals between groups are not uniform or directly quantifiable (Eg. pain score, where a score of 5 is not necessarily half the pain intensity of a score of 10)
o Presenting categorical data:
(a) Table
“Summary table” of data (Eg. frequencies of the data set)
(d) Line graph – Similar to bar chart except bars replaced by points joined
by a line. Useful when categories depicted on x-axis exist as a continuum
(Eg. sedation score)
- The strength of the association or correlation between two sets of paired interval data (Ie.
closeness to a straight line) can be evaluated by the “Correlation coefficient”:
o If data sets are normally distributed, a “Pearson correlation coefficient (r)” is
calculated:
“r” exists in a range between -1 (all data points exist on a straight line with
a –ve correlation) and +1 (all data points exist on a straight line with a
+ve correlation). An "r" of 0 means no correlation (Eg. random scatterplot pattern)
P-value can be determined to assess the statistical significance of “r” (Ie.
whether a true linear correlation between the two sample data exists) and
whether linear regression analysis can be undertaken to derive an equation
for the straight linear relationship
"r2" can be calculated as a value between 0 and +1 – When this is expressed as a percentage, it is the proportion of variance that the two sample variables share (Eg. r2 of 0.8 between x and y means 80% of the variance in y is explained by variation in x, and vice versa)
o If data set is not normally distributed, a "Spearman or Kendall correlation coefficient" is calculated instead
- Note that a strong association or correlation does NOT imply causality between the two
variables – All other potential reasons for the association or correlation must be
considered and excluded first!
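A minimal Python sketch of these calculations (using scipy.stats; the paired height/weight values are invented for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical paired interval data (Eg. height in cm vs weight in kg)
height = np.array([150, 160, 165, 170, 175, 180, 185, 190], dtype=float)
weight = np.array([55, 60, 66, 68, 74, 80, 83, 90], dtype=float)

# Pearson correlation coefficient (r) and its P-value (for normally distributed data)
r, p = stats.pearsonr(height, weight)
print(f"Pearson r = {r:.3f}, P = {p:.4f}, r^2 = {r**2:.3f}")   # r^2 = shared variance

# Spearman rank correlation (rho) for non-normally distributed data
rho, p_s = stats.spearmanr(height, weight)
print(f"Spearman rho = {rho:.3f}, P = {p_s:.4f}")
```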
Linear regression:
- If a statistically significant correlation exists between two variables, the nature of the
correlation or association between them can be examined by calculating the equation for
the straight linear relationship
Y = a + bX + E (where a = intercept, b = slope or regression coefficient, and E = random error term)
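A brief sketch of fitting this equation with scipy.stats.linregress (the x/y values are invented):

```python
import numpy as np
from scipy import stats

# Hypothetical paired data with a roughly linear relationship
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 4.3, 5.8, 8.2, 9.9, 12.1, 14.2, 15.8])

res = stats.linregress(x, y)                       # least-squares fit of Y = a + bX
print(f"intercept a = {res.intercept:.2f}, slope b = {res.slope:.2f}")
print(f"r = {res.rvalue:.3f}, P = {res.pvalue:.4f}")

# Residuals (E) = observed Y minus the Y predicted by the fitted line
residuals = y - (res.intercept + res.slope * x)
```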
Bland-Altman plot:
- Correlation coefficient analysis is valid only when comparing two independent (or
completely different) variables that may be associated (Eg. patient height and weight) –
BUT when comparing two different methods of measuring the same variable (Eg. PA
catheter vs transoesophageal ECHO for CO monitor), this form of analysis may be
misleading as a statistically significant correlation may still mean clinically unacceptable
differences between the two methods of measurement
- Bland-Altman plot is more relevant in this situation. This involves:
o Plotting the mean of each individual data pair (x-axis; Eg. PA catheter CO
measurement) against the difference between each data pair (y-axis; Eg. difference
in CO measurement between the two methods)
o Correlation coefficient (r) between the two methods can be determined
o “Bias” or mean difference between all the data pairs is shown by a solid line
o "Limits of agreement" or a 95% range for the differences between individual data pairs is derived (equal to the bias +/- approximately twice the standard deviation of the distribution of the differences)
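A minimal numpy sketch of the Bland-Altman calculations (the paired cardiac output readings are invented; plotting is omitted):

```python
import numpy as np

# Hypothetical paired CO measurements (L/min) by two methods in the same patients
pac = np.array([4.5, 5.0, 5.5, 6.0, 4.8, 5.2, 6.3, 5.7])    # PA catheter
echo = np.array([4.8, 4.9, 5.9, 5.7, 5.1, 5.0, 6.6, 5.5])   # transoesophageal ECHO

means = (pac + echo) / 2        # x-axis: mean of each data pair
diffs = pac - echo              # y-axis: difference between each data pair

bias = diffs.mean()                                      # mean difference (solid line)
sd_diff = diffs.std(ddof=1)                              # SD of the differences
loa = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)     # 95% limits of agreement

print(f"bias = {bias:.2f} L/min, limits of agreement = {loa[0]:.2f} to {loa[1]:.2f} L/min")
```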
Distribution of data:
- (1) Normal (or “parametric”) distribution
o Characterised by:
A unimodal, symmetrical, bell-shaped curve whereby all 3 measures of
central tendency (mean, median, mode) are equal and described at the
zenith of the curve
The “mean” positions the curve on the x-axis and is its best measure of
central tendency
The “standard deviation” (SD) determines the shape (or width) of the
curve as it is the measure of data variability – Note that 68% of data lie
within +/- 1 SD of the mean; 95% data lie within 2 SD of the mean;
99.7% data lie within +/- 3 SD of the mean
o The proper distribution to be used for a given sample size is specified by the
“degrees of freedom” (which is equal to n – 1)
- (4) Binomial distribution:
o Describes any situation where there are "n" independent trials, each with two mutually exclusive outcomes (Eg. "n" number of coin tossing events), and the outcome of interest occurs with a probability of "p" on each trial (Eg. "p" of tossing tails and "1-p" of tossing heads)
o It approximates a normal distribution provided "n" is reasonably large and "p" does not take too extreme a value (near 0 or 1), such that:
Mean of binomial distribution = n x p
SD of binomial distribution = √(np(1-p))
- (5) Poisson distribution:
o Discrete distribution where there is no strict upper limit to the possible values of
the variable. The variable is the count of a number of independent events that
occur randomly in a fixed interval of time or space (Eg. radioactive emissions
from a source over time)
o Used as the limiting form of a binomial distribution when “n” is large, and “p” is
small
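A short scipy.stats sketch illustrating these properties (the parameter values chosen are arbitrary examples):

```python
from math import sqrt
from scipy import stats

# Normal distribution: proportion of data within +/- 1, 2 and 3 SD of the mean
for k in (1, 2, 3):
    p = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"within +/- {k} SD: {p:.3%}")        # ~68%, ~95%, ~99.7%

# Binomial distribution: mean = n x p, SD = sqrt(np(1-p))
n, p = 100, 0.5                                 # Eg. 100 coin tosses
print(stats.binom.mean(n, p), n * p)            # both 50.0
print(stats.binom.std(n, p), sqrt(n * p * (1 - p)))

# Poisson as the limiting form of the binomial when n is large and p is small
n, p = 10000, 0.0005
print(stats.binom.pmf(5, n, p), stats.poisson.pmf(5, n * p))   # nearly identical
```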
The choice of a specific statistical testing for inference will depend on:
- (1) Type of data being tested (Eg. interval or categorical)
- (2) Number of groups of data being tested (one group, two groups or multiple groups)
- (3) Distribution of the data (Eg. normal distribution)
o The type of statistical test employed (parametric vs non-parametric) will depend
on the distribution pattern of the (i) population parameter being studied, and (ii)
sample data
o The “normality” of the sample data distribution is assessed by:
(a) Observing the distribution of the histogram or frequency curve –
However, with small sample sizes (n < 20) this may not be obvious
(b) When sample sizes are small (n < 20), normality can be assessed by formal statistical analysis using the "Shapiro-Wilk test" (see the sketch after this list)
o A “parametric test” can be used IF:
(a) Population parameter being studied follows a normal distribution
(regardless of the study’s sample size or distribution of sample data)
(b) Distribution of the population parameter under study is unclear BUT
the sample data obtained appears to follow a normal distribution (given a
large enough sample size)
(c) Non parametric data is converted into one that follows a normal
distribution by logarithmic, square root, or reciprocal transformation
o A “non-parametric test” is used if there is any doubt regarding the distribution of
the population parameter or the sample data (esp when “n” is very small)
o Note that in order to avoid choosing the incorrect type of statistical test, a large number of subjects should be recruited (n > 100) – This allows both types of analysis to produce similar results with similar power (otherwise misleading results occur when the wrong test is applied)
- (4) Pairing of data (Eg. paired vs unpaired)
o “Unpaired statistical analysis” should be used for unpaired (or unmatched or
independent) data – This describes data obtained from studies where treatment
groups (and their respective outcomes) are INDEPENDENT of one another
(Eg. RCT where subjects are allocated randomly into treatment groups such that
each group should be as similar to each other as possible EXCEPT for the
intervention they receive)
o “Paired statistical analysis” should be used for paired (or matched or dependent)
data – This describes data typically obtained from “Crossover studies” where all
subjects recruited receive one treatment, followed by the other after a suitable
“washout” period (Ie. to allow the effects of the first treatment to subside). The
effectiveness of data pairing can be assessed by “correlation coefficient” (and its
corresponding P-value)
o Note that "paired statistical tests" are MORE powerful and require fewer subjects to be recruited to prove a difference exists between groups – BUT more time
is required for the “crossover study” to occur, and there is risk that the “washout
period” is inadequate
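A brief Python sketch of checking normality as described above (the data are randomly generated for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=18)     # hypothetical small sample (n < 20)

# Shapiro-Wilk test: null hypothesis is that the data come from a normal distribution
stat, p = stats.shapiro(sample)
print(f"Shapiro-Wilk W = {stat:.3f}, P = {p:.3f}")
# If P > 0.05, normality is not rejected and a parametric test may be reasonable;
# if P < 0.05 (or there is any doubt), a non-parametric test is the safer choice

# Logarithmic transformation can shift positively skewed data towards normality
skewed = rng.lognormal(mean=0, sigma=0.5, size=18)
print(stats.shapiro(skewed).pvalue, stats.shapiro(np.log(skewed)).pvalue)
```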
SEM = √(s2/n) = SD/√n
- SEM relates (a) inversely with sample size (Ie. SEM decreases with larger sample sizes), and (b) proportionately with SD (Ie. SEM increases with larger SD)
o (b) Determining if the sample data differs from a population parameter
(a) Student’s one sample t-test (for parametric data)
“t-value” is calculated as follows:
t = (x̄ – μ) / SEM
P-value for this test statistic is looked up on the relevant t-
distribution (with degrees of freedom equal to n-1) to assess if the
difference between the sample data and population parameter is
statistically significant
(b) Wilcoxon signed-rank test (for non-parametric data)
Each sample datum is assigned a signed rank according to how far it is from the hypothesised median (Ie. datum values lower than the median receive –ve signs; those higher than the median receive +ve signs)
These signed ranks are then summed to produce a "W-value" – If the null hypothesis is true, W is near zero
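A minimal sketch of both one-sample tests above (the sample values and the reference mean of 120 are invented):

```python
import numpy as np
from scipy import stats

# Hypothetical sample (Eg. systolic BP in mmHg) compared to a population mean of 120
sample = np.array([118, 125, 130, 122, 128, 135, 121, 127, 132, 124], dtype=float)
mu = 120

# Student's one sample t-test: t = (sample mean - mu) / SEM, with df = n - 1
sem = sample.std(ddof=1) / np.sqrt(len(sample))
t_manual = (sample.mean() - mu) / sem
t, p = stats.ttest_1samp(sample, popmean=mu)
print(f"t = {t:.3f} (manual {t_manual:.3f}), P = {p:.4f}")

# Wilcoxon signed-rank test against the hypothesised value (non-parametric analogue)
w, p_w = stats.wilcoxon(sample - mu)
print(f"W = {w:.1f}, P = {p_w:.4f}")
```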
- (2) Comparing two groups of interval data:
o (a) Student’s two sample t-test (for parametric data)
Can be used to compare results of two unpaired (independent) or paired (dependent) groups of parametric data
This is done by calculating either:
(i) A confidence interval (CI) for the difference in means between
the two sets of data
(ii) A P-value for the difference in means between the two sets of
data, which involves determining the t-value:
t = (x̄A – x̄B) / √(s2/nA + s2/nB)
Note that for this t-test to be valid, variance of both groups must be
SIMILAR for pooling of the variance to occur (otherwise Welch’s
correction to the t-test must be applied)
o (b) Non-parametric interval data
(i) Mann-Whitney U test is used to compare results of two unpaired (or
independent) sets of non-parametric data
(ii) Wilcoxon matched pairs test is used to compare results of two paired
(or dependent) sets of non-parametric data
Note that both tests analyse data by comparing the medians (cf. means)
and by considering the data in rank order values (cf. absolute values)
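A minimal sketch of these two-group tests (the group values are invented):

```python
import numpy as np
from scipy import stats

# Hypothetical outcome data (Eg. pain-free time in hours) in two groups of subjects
group_a = np.array([4.1, 5.2, 6.0, 5.5, 4.8, 6.3, 5.1, 5.9])
group_b = np.array([3.2, 4.0, 4.5, 3.8, 4.2, 3.6, 4.4, 3.9])

# Student's two sample t-test (pooled variance; assumes the variances are similar)
t, p = stats.ttest_ind(group_a, group_b)
# Welch's correction when the variances are NOT similar
t_w, p_w = stats.ttest_ind(group_a, group_b, equal_var=False)

# Non-parametric equivalents
u, p_u = stats.mannwhitneyu(group_a, group_b)   # unpaired (independent) data
w, p_wil = stats.wilcoxon(group_a, group_b)     # paired data (shown only for syntax here)

print(p, p_w, p_u, p_wil)
```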
- (3) Comparing three or more groups of interval data:
o (a) For parametric data:
(i) Analysis of variance (ANOVA)
Compares the means of three or more unpaired (or independent) sets of parametric data to determine if they are statistically significantly different from one another
If the analysis is statistically significant, this means at least one of the data sets has a different mean from the others (but this does NOT mean that all of them do!) – Thus, several "post-hoc" tests can be done to determine which differences between certain datasets are significant
(ii) Repeated measures ANOVA test
Similar to ANOVA except used for datasets that are paired (or
dependent)
o (b) For non-parametric data
(i) Kruskal-Wallis ANOVA by ranks test (for unpaired or unmatched
datasets)
(ii) Friedman test (for paired or matched datasets)
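A minimal sketch of these multi-group tests (the three groups of values are invented; a post-hoc test would then be needed to localise any difference):

```python
import numpy as np
from scipy import stats

# Hypothetical outcome data in three groups (Eg. three different analgesic regimens)
g1 = np.array([5.1, 6.2, 5.8, 6.5, 5.9])
g2 = np.array([4.2, 4.8, 5.0, 4.5, 4.9])
g3 = np.array([6.8, 7.1, 6.5, 7.4, 6.9])

# One-way ANOVA (parametric, unpaired): is at least one group mean different?
f, p = stats.f_oneway(g1, g2, g3)
print(f"ANOVA F = {f:.2f}, P = {p:.4f}")

# Kruskal-Wallis ANOVA by ranks (non-parametric, unpaired)
h, p_kw = stats.kruskal(g1, g2, g3)

# Friedman test (non-parametric, paired/repeated measures on the same subjects)
chi2, p_fr = stats.friedmanchisquare(g1, g2, g3)
print(p_kw, p_fr)
```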
2 x 2 contingency table          Data type 1 (Eg. death)   Data type 2 (Eg. survival)   Totals
Category 1 (Eg. treatment A)     A                         B                            A+B
Category 2 (Eg. treatment B)     C                         D                            C+D
Totals                           A+C                       B+D                          N = A+B+C+D
- (a) For two sample proportions of categorical data grouped in a 2x2 contingency table:
o “Fisher exact test” is used when data are unpaired, while the “McNemar’s test” is
used when data are paired
o Unlike the X2 test, these tests do not assume random sampling and can be used with small sample sizes (n < 20) and small "expected" frequencies (E < 5)
o These tests calculate the probability of all tables that would produce the same
observed marginal totals (Ie. A+B, C+D, A+C, B+D)
- (b) For three or more sample categorical data grouped in a contingency table, a “Chi-
squared test” is used:
o This test assumes:
(i) Random sampling of data from the population
(ii) Sufficient sample size (n > 20)
(iii) “Expected” frequency at least > 5 (unless Yates’ correction applied)
(iv) Observations independent of each other (Ie. unpaired)
o Test statistic (X2) is calculated as X2 = Σ [(Oi – Ei)2 / Ei], summed over all n cells, where:
Oi = Observed frequency
Ei = Expected frequency
n = Number of cells in table
Where the “expected” frequency for each cell is the “row total” x “column total” divided by N:
[(A+B)x(A+C)] / N [(A+B)x(B+D)] / N
[(C+D)x(A+C)] / N [(C+D)x(B+D)] / N
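A minimal sketch of these tests on a hypothetical 2 x 2 table (the counts are invented; note scipy applies Yates' correction by default for 2 x 2 tables):

```python
import numpy as np
from scipy import stats

# Hypothetical 2 x 2 contingency table: rows = treatment A/B, columns = death/survival
table = np.array([[10, 40],     # A, B
                  [20, 30]])    # C, D

# Chi-squared test (expected frequencies derived from the row/column totals)
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"X^2 = {chi2:.2f}, df = {dof}, P = {p:.4f}")
print(expected)                 # (row total x column total) / N for each cell

# Fisher exact test for small samples / small expected frequencies (unpaired data)
odds_ratio, p_fisher = stats.fisher_exact(table)
print(f"Fisher exact P = {p_fisher:.4f}")
```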
- "Relative risk" (RR) = [a/(a+b)] / [c/(c+d)] (Ie. incidence of the outcome in the exposed group divided by the incidence in the unexposed group)
o Requires an estimate of incidence – It can be directly estimated from RCTs and
cohort studies
o Can be used to determine:
(i) Relative protection (RP) = (1 / RR)
(ii) Relative risk reduction (RRR) = (1 – RR)
- "Absolute risk reduction" (ARR) and "Number needed to treat" (NNT):
o ARR is the difference in incidence of an outcome between exposed and
unexposed groups
o NNT is the number of patients who need to be treated to prevent one additional outcome (or disease). It is calculated as the reciprocal of the ARR
NNT = (1 / ARR)
- "Odds ratio" (OR) = (a/c) / (b/d) = ad/bc
o A measure of strength of an association in retrospective case-control studies (but
can also be used for RCTs and cohort studies)
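A minimal worked sketch of these measures, assuming the usual lower-case convention where a/b = outcome/no outcome in the exposed (treatment) group and c/d = outcome/no outcome in the unexposed (control) group (the counts are invented):

```python
# Hypothetical 2 x 2 table counts
a, b = 10, 90     # exposed:   10 with the outcome, 90 without
c, d = 20, 80     # unexposed: 20 with the outcome, 80 without

risk_exposed = a / (a + b)            # incidence in the exposed group
risk_unexposed = c / (c + d)          # incidence in the unexposed group

rr = risk_exposed / risk_unexposed    # relative risk
rrr = 1 - rr                          # relative risk reduction
arr = risk_unexposed - risk_exposed   # absolute risk reduction
nnt = 1 / arr                         # number needed to treat
odds_ratio = (a / c) / (b / d)        # odds ratio = ad/bc

print(f"RR = {rr:.2f}, RRR = {rrr:.0%}, ARR = {arr:.2f}, NNT = {nnt:.0f}, OR = {odds_ratio:.2f}")
```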
- "Sensitivity" is the proportion of reference test +ve patients (diseased) who test +ve with the screening test
Sensitivity = a/(a+c) = TP/(TP + FN)
- “Specificity” is the proportion of reference test –ve patients (no disease) who test –ve
with the screening test
Specificity = d/(b+d) = TN/(TN + FP)
- “Positive predictive value” is the chance a subject has the disease after being tested +ve
by the screening test
PPV = a/(a+b) = TP/(TP + FP)
- “Negative predictive value” is the chance a subject is not diseased after being tested –ve
by the screening test
NPV = d/(c+d) = TN/(TN + FN)
Note that interpretation of +ve (and –ve) test results by the PPV (and NPV) is influenced by the prevalence of the disease (or pre-test probability)
- “Likelihood ratio” (LR) is defined as the likelihood that a given test result would be
expected in a patient with the target illness compared to the likelihood that the same
result would be expected in a patient without the target illness
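A minimal sketch of these diagnostic test measures (the 2 x 2 counts are invented; the LR formulas shown are the standard ones, not stated explicitly above):

```python
# Hypothetical diagnostic 2 x 2 table: a = TP, b = FP, c = FN, d = TN
tp, fp, fn, tn = 90, 50, 10, 850

sens = tp / (tp + fn)          # sensitivity = a/(a+c)
spec = tn / (tn + fp)          # specificity = d/(b+d)
ppv = tp / (tp + fp)           # positive predictive value = a/(a+b)
npv = tn / (tn + fn)           # negative predictive value = d/(c+d)
lr_pos = sens / (1 - spec)     # likelihood ratio of a +ve test result
lr_neg = (1 - sens) / spec     # likelihood ratio of a -ve test result
print(f"sens {sens:.2f}, spec {spec:.2f}, PPV {ppv:.2f}, NPV {npv:.2f}, "
      f"LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")

# PPV depends on prevalence (pre-test probability): recompute at 1% prevalence
prev = 0.01
ppv_low = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
print(f"PPV at 1% prevalence = {ppv_low:.2f}")     # falls sharply as prevalence falls
```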
Probability Theory:
- Probability theory is vital to inferential statistical analysis as it helps predict population parameters from sample data on the assumption that the sample data are "typical" of the population data
- It determines the probability of an event mathematically – This probability is expressed as
a relative frequency that the event occurs in an infinite number of trials. This frequency
ranges from 0 (event never occurs) to 1 (event always occurs)
- Scenarios:
o For mutually exclusive events (Ie. the occurrence of one event precludes the other event occurring):
P(A and B) = 0
P(A or B) = P(A) + P(B)
o If the events are exhaustive, the probability of the events will add to 1 (Eg.
binomial probabilities such as coin flipping) – Thus, P(A or B) = 1
o If the two events are independent and not mutually exclusive (Ie. the occurrence of one event does not affect the probability of the other event occurring):
P(A and B) = P(A) x P(B)
P(A or B) = P(A) + P(B) – P(A and B)
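A short enumeration sketch with two fair dice to illustrate these rules (the events chosen are arbitrary examples):

```python
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

def p(event):
    """Probability of an event as its relative frequency over all outcomes."""
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

# Mutually exclusive events: A = "first die is 1", B = "first die is 6"
A = lambda o: o[0] == 1
B = lambda o: o[0] == 6
print(p(lambda o: A(o) and B(o)))                  # 0.0
print(p(lambda o: A(o) or B(o)), p(A) + p(B))      # both 1/3

# Independent events: C = "first die is even", D = "second die is even"
C = lambda o: o[0] % 2 == 0
D = lambda o: o[1] % 2 == 0
print(p(lambda o: C(o) and D(o)), p(C) * p(D))                               # both 0.25
print(p(lambda o: C(o) or D(o)), p(C) + p(D) - p(lambda o: C(o) and D(o)))   # both 0.75
```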
Significance Testing:
- (1) State the null hypothesis (HO) that any difference observed between the groups is due to chance alone
- (2) Calculate the difference between the groups with respect to the outcome being measured ("point estimate")
- (3) Calculate the probability of obtaining the observed data assuming the difference
observed is generated by random chance alone (Ie. HO is true):
o (a) This is accomplished by selecting an appropriate statistical method to produce
a “test statistic” (Eg. t-test with t-value)
o (b) The appropriate probability distribution for this test statistic is looked up
according to sample size and variability of the data (Eg. t-distribution for t-value
with proper degrees of freedom), and the probability value for it is determined
o (c) If the probability of obtaining the observed data under HO is smaller than a set "critical value" (Ie. P < 0.05), then HO can be confidently rejected (Ie. conclude the difference is likely real); if this probability is higher than the "critical value", then HO cannot be rejected (Ie. conclude the difference may be due to chance alone)
Note that it is usually more appropriate to perform a two-tailed test – This is because analysis of data by a one-tailed test may incorrectly accept HO (Ie. conclude differences between groups are due to chance alone) when the opposite result to that expected is revealed
Overview of P-value:
- "P-value" is defined as a probability number between 0 and 1 indicating the likelihood of observing a difference at least as large as that seen between the two groups if it were due to chance alone (Ie. if the HO were true)
- The threshold P-value for statistical significance is equivalent to “Type I error” (α), which
is the probability of inappropriately rejecting the “null hypothesis”:
o α (or the threshold P-value for statistical significance) is usually set at 0.05 by
researchers – This means that there is a 1 in 20 chance for erroneously rejecting
the null hypothesis when in fact the difference seen is really due to chance
o A lower α (or threshold P-value for statistical significance) means less chance of
erroneously rejecting the null hypothesis – However, this decreases the power of
the study (Ie. reduced ability to detect a difference between two groups when one
exists) unless a compensatory rise in sample size is made
o As a result, if the P-value is LESS than α (or < 0.05), then the null hypothesis can
be rejected (as there is a very small chance that it is true), and the difference
between the two groups is likely to be real and unlikely to be due to chance alone
- Issues with using P-values:
o (1) Only indicates whether the difference between the two group parameters being compared is statistically significant
o (2) Lacks clinical applicability – P-values do not indicate clinical significance as there is no indication of the magnitude or direction of the difference between the two group parameters being compared
o (3) Does not provide any information about the precision of the results
o (4) Does not provide an estimation of the population parameter
Thus, a "95% confidence interval" is the range of values derived from a sample of the population of interest that has a 95% probability of encompassing the true population parameter
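A minimal sketch of deriving a 95% CI for a sample mean using the t-distribution (the measurements are invented):

```python
import numpy as np
from scipy import stats

# Hypothetical sample measurements (Eg. fasting glucose in mmol/L)
sample = np.array([5.1, 5.6, 4.9, 5.3, 5.8, 5.2, 5.5, 5.0, 5.4, 5.7])

mean = sample.mean()
sem = stats.sem(sample)                       # SD / sqrt(n)

# 95% CI for the population mean, using the t-distribution with df = n - 1
ci = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI = {ci[0]:.2f} to {ci[1]:.2f}")
```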
Nb. β is set larger than α – This is because it is safer and more ethical to falsely accept HO than to falsely reject it (Ie. more risk of harm from incorrectly changing medical practice!)
- Power
o Defined as the ability of the study to detect a difference between two groups
when a difference actually exists, thus allowing the null hypothesis to be rejected
appropriately
o Power can be calculated as follows:
Power = 1 – β
Thus, power can be decreased by a rise in type II error, which is generally
caused by the following factors:
- (i) Recruiting an inadequate sample size for the study
- (ii) Decreasing type I error (lowering α) – UNLESS a compensatory
rise in sample size is made
- (iii) Small effect size (Ie. little difference exists between the two
groups)
- (iv) Imprecise means of measuring the study and outcome factors
- (v) Employing a two-sided test (rather than a one-sided one)
Note that the sample size (n) for a given study will depend on:
- (1) Power required (MAIN) (Z1-β)
- (2) Size of the minimum actual effect important enough to require detection (Δ)
- (3) Two-sided significance level (Z1-α/2)
- (4) Variance in the population (σ2)
These are related by: n = [(Z1-α/2 + Z1-β)2 x σ2] / Δ2
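A brief sketch of this sample size calculation (the inputs are invented; the factor of 2 shown is the usual addition when comparing the means of two groups rather than one):

```python
from scipy.stats import norm

# Hypothetical inputs: two-sided alpha = 0.05, power = 0.8, SD = 10 units,
# minimum clinically important difference (delta) = 5 units
alpha, power, sd, delta = 0.05, 0.80, 10.0, 5.0

z_alpha = norm.ppf(1 - alpha / 2)     # Z for the two-sided significance level
z_beta = norm.ppf(power)              # Z for the required power (1 - beta)

# n per group for comparing two means
n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
print(f"n per group ~ {n:.0f}")       # ~63 per group
```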
Study Design:
- (1) Randomised controlled trial (RCT):
o In order to minimise systematic bias (esp confounding factors) that can obscure the true relation between the study and outcome factors, a good RCT will employ – (i) Randomisation, (ii) Double blinding, and (iii) Allocation concealment
o Advantages:
(i) Prospective nature of study – Study factor (Eg. intervention) can be
administered in a precise and controlled manner, and the outcome factor
can be measured over time. Also allows the size, funding and data analysis
of the study to be determined prior to commencing the study
(ii) Randomisation of subjects to treatment groups – This minimises
allocation bias and confounder bias (caused by unequal distribution of
confounding factors between treatment groups)
(iii) Double blinding – Experimenters and subjects are blinded to the
designation of study factor to subjects within the study. This decreases
subject-observer bias
(iv) Control group – Allows more meaningful conclusions that can be
made of the studied treatment
(v) Measurements (esp parametric data) can be chosen precisely, thus
making it easier to make observations consistently and for statistical
methods to be used
(vi) RCTs can have a high power – This is ideal for detecting small but
clinically relevant conclusions
(vii) RCTs allow for subgroup analysis – This enhances their usefulness for clinical practice
(viii) Large, multi-centre RCTs have greater applicability and afford higher
level of evidence (level I)
(ix) Even RCTs with inconclusive results can be eminently publishable
o Issues:
(i) Increased expense and time consumption (Ie. very difficult and costly
to organise and supervise study at multiple research sites)
(ii) Results may not mimic real life treatment situation
(iii) Risk of choosing subjects whose consent is invalid or treatments that
are unethical (Ie. denying treatment to one group)
(iv) Recruitment/selection bias can occur (Ie. patients too ill or declined) – This causes patient selection to be too specific, which decreases the variance and reduces the applicability of the study results to the general population
(v) Inaccurate results due to systematic bias with poor RCT design (Ie. lack of blinding, poor randomisation process)
- (2) Cohort study:
o Used to provide evidence of causality – Used to study aetiology, therapy and
prognosis clinical questions
o Less ideal than an RCT – Thus used when an RCT is NOT feasible (Eg. testing the effects of smoking)
o Study group is selected from a population of interest and the participants are
separated according to the exposure or lack of exposure to the study factor (but
not allocated per se). The outcome of interest is measured in a prospective
fashion from the time of exposure to the study factor
o Advantages:
(i) Allows causality as there is temporal sequence between exposure and
outcome
(ii) Allows assessment of multiple outcomes and of other factors that may
influence outcome
(iii) Permits measurement of incidence
(iv) Exposure can be measured without bias
o Issues:
(i) Not always cost/time efficient (can require very large sample sizes)
(ii) May be difficult to accurately define and measure exposure at times
(iii) Bias can occur due to loss to follow-up
- (3) Case-control study:
o Used to explore potential causes of an outcome – Used to study aetiology, therapy and prognosis clinical questions
o Less ideal than an RCT – Thus used when an RCT is NOT feasible (Eg. testing the effects of smoking)
o Studies begin with identification of +ve cases (Ie. those with the outcome of interest), then controls are selected to be compared with these cases retrospectively for their exposure to the study factor
o Issues:
(i) Difficulty in ensuring that the cases and the controls are from the same population (minimised if the study is population-based rather than hospital-based)
(ii) Appropriateness of the control group (need to question whether a control would have been identified as a case had they developed the outcome of interest)
(iii) Potential for bias (esp selection bias of cases/controls, recall, survivor
and misclassification biases)
(iv) Cannot infer causality due to lack of temporal sequence of events
(v) Cannot derive incidence or prevalence data
(vi) Inefficient type of study if study factor (or exposure) is rare
o Advantages:
(i) Capable of detecting an effect with much smaller numbers (cf. cohort
study)
(ii) Can explore the importance of several study or aetiological factors
simultaneously (“Hypothesis generation”)
(iii) Relatively quick and inexpensive type of study
(iv) Good for studying causes of rare outcomes or outcomes with long
latency periods
- (4) Cross-sectional study
o Used to examine the association between study and outcome factors (Eg. aetiology and prevalence questions) – Cannot establish causality as there is no temporal sequence
o Study population is selected and then each participant is examined for the presence of the study factors and outcome factors. The relationship between the study and outcome factors is examined as it exists at one point in time
- (5) Prospective diagnostic study
o Used to study diagnostic clinical question
o Involves a blinded comparison where all subjects receive both the test of interest and a definitive reference test ("gold standard")
Overview of bias:
- "Bias" is defined as a systematic disposition of a trial design that causes an estimated measure of association or frequency to deviate from the truth (Ie. produces results that are consistently better or worse than they actually are)
- It can occur in either direction, and does not necessarily get smaller with an increase in
sample size
- Types of bias:
o (1) Recruitment/Selection bias
Bias in choosing the subjects to participate in the study. This can lead to
certain subjects being included (or excluded) from the study
Includes – Sampling bias, Volunteer bias, Prevalence bias
o (2) Information bias
Bias occurs during taking of measurements or recording of data
Includes – Recall bias, Misclassification bias, Subject-Observer bias,
Ascertainment bias, Measurement bias
o (3) Allocation bias
Situation where recruitment and allocation are performed by the
experimenter. This can lead to removal (or not recruiting) a patient after
knowledge of allocation
o (4) Confounding bias (See below)
o (5) Publication bias – Tendency for only positive studies to be published
o (6) Language bias – Tendency to limit searches to English
- “Confounding bias” is the largest source of bias (esp in RCTs):
o It is defined as a situation in which a measure of the effect of the study factor is
distorted because of the association of the study factor with other factors (aka.
“confounders”) that influence the outcome
o Must fulfil three criteria:
(i) Independent risk factor for the outcome of interest
(ii) Not be an intervening variable on the causal pathway between study
and outcome factor
(iii) Associated with the study factor in the data being analysed
o Confounding can be minimised by:
(i) Omitting the confounding factor during recruitment (Ie. exclude
certain age group or gender)
(ii) Randomisation – This ensures that potential confounders are equally
distributed between treatment groups
(iii) Collate potential confounders in a table (Eg. Table 1) so any
maldistribution can be evaluated – Note that the P-value of the size of
difference of the confounding factor between treatment groups does
NOT indicate whether confounding is likely or not. It really depends on
how powerful the confounding factor is!
(iv) If there is maldistribution of a potential confounder between
treatment groups, it can be corrected for mathematically using statistical
methods (Ie. analysis for the potential confounder can be performed to
see if the treatment outcome under study is similar in subjects with and
without the potential confounder)
- Removing bias from a study:
o In order to eliminate all possible bias, the study should employ:
(i) Allocation concealment – Experimenters are unaware of the randomisation sequence and the assignment of each subject to a particular treatment group. This removes allocation bias
(ii) Randomisation – Subjects are randomly assigned to treatment groups
(Ie. by computer). This removes confounder bias by ensuring that the
groups are similar and the only difference between them is the study
factor
(iii) Double blinding – Experimenter and subjects are not aware of which
treatment the subject has received. This removes “observer bias”
o By doing this, the only confounding that arises within the study is that which
occurs by random chance (or non-systematic bias) – Statistics is then employed to
quantify that degree of random chance!
Note (features of a forest plot in a meta-analysis):
- Squares represent the point estimate of the difference between the groups, with the size of each square being proportional to the weight of the study (which is determined by sample size)
- The horizontal lines from the square represent the 95% CI of the
estimate
- The diamond represents the overall result pooled from all the studies
- The vertical line represents the line of no effect (Eg. RR or OR = 1)