Biostatistics: A Refresher: Kevin M. Sowinski, Pharm.D., FCCP
Biostatistics: A Refresher
Kevin M. Sowinski, Pharm.D., FCCP
Purdue University College of Pharmacy
Indiana University School of Medicine
West Lafayette and Indianapolis, Indiana
ACCP Updates in Therapeutics 2014: Pharmacotherapy Preparatory Review and Recertification Course
Learning Objectives:
1. Describe differences between descriptive and
inferential statistics.
2. Identify different types of data (nominal, ordinal,
continuous [ratio and interval]) to determine an
appropriate type of statistical test (parametric vs.
nonparametric).
3. Describe strengths and limitations of different types
of measures of central tendency (mean, median, and
mode) and data spread (standard deviation, standard
error of the mean, range, and interquartile range).
4. Describe the concepts of normal distribution and the
associated parameters that describe the distribution.
5. State the types of decision errors that can occur
when using statistical tests and the conditions under
which they can occur.
6. Describe hypothesis testing, and state the meaning of and
distinguish between p-values and confidence intervals.
7. Describe areas of misuse or misrepresentation that
are associated with various statistical methods.
8. Select appropriate statistical tests on the basis of the
sample distribution, data type, and study design.
9. Interpret statistical significance for results from
commonly used statistical tests.
10. Describe the similarities and differences between
statistical tests; learn how to apply them appropriately.
11. Identify the use of survival analysis and different
ways to perform and report it.
Self-Assessment Questions
Answers and explanations to these questions can be
found at the end of this chapter.
1. A randomized controlled trial assessed the effects of
the treatment of heart failure on global functioning
in three groups of adults after 6 months of treatment.
Investigators desired to assess global functioning
with the New York Heart Association (NYHA)
functional classification, an ordered scale from I to
IV, and compare the patient classification between
groups after 6 months of treatment. Which statistical test is most appropriate to assess differences in
functional classification between the groups?
A. Kruskal-Wallis.
2. You are evaluating a randomized, double-blind, parallel-group controlled trial that compares four antihypertensive drugs for their effect on blood pressure. The authors conclude that hydrochlorothiazide
is better than atenolol (p<0.05) and that enalapril is
better than hydrochlorothiazide (p<0.01), but no difference is observed between any other drugs. The
investigators used an unpaired (independent samples) t-test to test the hypothesis that each drug was
equal to the other. Which is most appropriate?
A. Investigators used the appropriate statistical
test to analyze their data.
B. Enalapril is the most effective of these drugs.
C. ANOVA would have been a more appropriate test.
D. A paired t-test is a more appropriate test.
3. In the results of a randomized, double-blind, controlled clinical trial, it is reported that the difference
in hospital readmission rates between the intervention group and the control group is 6% (p=0.01), and it
is concluded that there is a statistically significant difference between the groups. Which statement is most
consistent with this finding and these conclusions?
A. The chance of making a type I error is 5 in 100.
B. The trial does not have enough power.
C. There is a high likelihood of having made a
type II error.
D. The chance of making an alpha error is 1 in 100.
4. You are reading a manuscript that evaluates the impact of obesity on enoxaparin pharmacokinetics.
The authors used a t-test to compare the baseline
values of body mass index (BMI) in normal subjects
and obese subjects. You are evaluating the use of a
t-test to compare the BMI between the two groups.
Which represents the most appropriate criteria to
be met to use this parametric test for this particular
evaluation?
A. The sample sizes in the normal and obese subjects must be equal to allow the use of a t-test.
B. A t-test is not appropriate because BMI data
are ordinal.
C. The variance of the BMI data needs to be similar
in each group.
7. Researchers planned a study to evaluate the percentage of subjects who achieved less than a target blood
pressure (less than 140/90 mm Hg) when initiated
on two different doses of amlodipine. In the study of
B. Kaplan-Meier curve.
C. Regression.
D. CIs.
I. INTRODUCTION TO STATISTICS
A. Method for Collecting, Classifying, Summarizing, and Analyzing Data
B. Useful Tools for Quantifying Clinical and Laboratory Data in a Meaningful Way
C. Assists in Determining Whether and by How Much a Treatment or Procedure Affects a Group of Patients
D. Why Pharmacists Need to Know Statistics
E. As Statistics Pertains to Most of You:
Pharmacotherapy Specialty Examination Content Outline
Domain 2: Retrieval, Generation, Interpretation, and Dissemination of Knowledge in Pharmacotherapy (25%)
Interpret biomedical literature with respect to study design and methodology, statistical analysis, and
significance of reported data and conclusions.
Knowledge of biostatistical methods, clinical and statistical significance, research hypothesis
generation, research design and methodology, and protocol and proposal development
F. Several papers have investigated the various types of statistical tests used in the biomedical literature, the
data from one of which are illustrated below.
Table 1. Statistical Content of Original Articles in The New England Journal of Medicine, 2004-2005
Statistical Procedure | % of Articles Containing Methods
No statistics/descriptive statistics | 13
t-tests | 26
Multiway tables | 13
Contingency tables | 53
Power analyses | 39
Nonparametric tests | 27
Cost-benefit analysis | <1
Epidemiologic statistics | 35
Sensitivity analysis |
Pearson correlation |
Repeated-measures analysis | 12
Missing-data methods |
Analysis of variance | 16
Noninferiority trials |
Transformation | 10
Receiver-operating characteristics |
Nonparametric correlation |
Resampling |
Survival methods | 61
Principal component and cluster analyses |
Multiple regression | 51
Other methods |
Multiple comparisons | 23
Table 2. Statistical Content of Original Articles from Six Major Medical Journals from January to March 2005
(n=239 articles). Papers published in American Journal of Medicine, Annals of Internal Medicine, BMJ, JAMA,
Lancet, and The New England Journal of Medicine.
Statistical Test
No. (%)
Statistical Test
No. (%)
219 (91.6)
Others
Simple statistics
120 (50.2)
Intention-to-treat analysis
42 (17.6)
Chi-square analysis
70 (29.3)
Incidence/prevalence
39 (16.3)
t-test
48 (20.1)
29 (12.2)
Kaplan-Meier analysis
48 (20.1)
Sensitivity analysis
21 (8.8)
38 (15.9)
Sensitivity/specificity
15 (6.3)
33 (13.8)
Analysis of variance
21 (8.8)
Correlation
16 (6.7)
Multivariate analysis
164 (68.6)
64 (26.8)
54 (22.6)
7 (2.9)
38 (15.9)
None
5 (2.1)
60, 59, 65, 64, 62, 54, 54, 68, 67, 79, 55, 48, 65, 59, 65, 87, 49, 46, 46
a. Calculate the mean, median, and mode of the above data set.
b. Calculate the range, SD (will not have to do this by hand), and SEM (standard error of the mean) of
the above data set.
c. Evaluate the visual presentation of the data.
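As a worked check on parts a and b, the following Python sketch computes the summary statistics with the standard library. It assumes that the 19 values printed above are the complete data set (the original presentation may have included more) and that SD means the sample SD (n − 1 denominator).

```python
import math
import statistics

# Values as printed in the data set above (assumed to be complete).
data = [60, 59, 65, 64, 62, 54, 54, 68, 67, 79,
        55, 48, 65, 59, 65, 87, 49, 46, 46]

n = len(data)
mean = statistics.mean(data)        # arithmetic mean
median = statistics.median(data)    # middle value of the sorted data
mode = statistics.mode(data)        # most frequently occurring value
data_range = max(data) - min(data)  # largest minus smallest value
sd = statistics.stdev(data)         # sample SD (n - 1 denominator)
sem = sd / math.sqrt(n)             # SEM = SD / sqrt(n)

print(f"n={n} mean={mean:.2f} median={median} mode={mode}")
print(f"range={data_range} SD={sd:.2f} SEM={sem:.2f}")
```

Because the SEM divides the SD by the square root of n, it is always smaller than the SD for any sample of more than one value, which is one reason the two should not be reported interchangeably.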
B. Inferential Statistics
1. Conclusions or generalizations made about a population (large group) from the study of a sample of that
population
2. Choosing and evaluating statistical methods depend, in part, on the type of data used.
3. An educated statement about an unknown population is commonly referred to in statistics as an
inference.
4. Statistical inference can be made by estimation or hypothesis testing.
V. CONFIDENCE INTERVALS
A. Commonly Reported as a Way to Estimate a Population Parameter
1. In the medical literature, 95% CIs are the most commonly reported CIs. In repeated samples, 95% of
all CIs include the true population value (i.e., the likelihood/confidence [or probability] that the population
value is contained within the interval). In some cases, 90% or 99% CIs are reported. Why are 95% CIs
most often reported?
2. Example:
a. Assume a baseline birth weight in a group with a mean ± SD of 1.18 ± 0.4 kg.
b. 95% CI is about equal to the mean ± 1.96 × SEM (or 2 × SEM). In reality, it depends on the
distribution being employed and is a bit more complicated.
c. What is the 95% CI? (1.07, 1.29), meaning there is 95% certainty that the true mean of the entire
population studied will be between 1.07 and 1.29 kg.
d. What is the 90% CI? The 90% CI is calculated to be (1.09, 1.27). Of note, the 95% CI will always
be wider than the 90% CI for any given sample. In general, the more confident we wish to be that the
interval encompasses the true population mean, the wider the CI must be.
3. The differences between the SD, SEM, and CIs should be noted when interpreting the literature because
they are often used interchangeably. Although it is common for CIs to be confused with SDs, the
information each provides is quite different and needs to be assessed correctly.
4. Recall the previous example about HDL-C and green tea. What is the 95% CI of the data set, and what
does that mean?
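The birth weight example in item 2 can be reproduced with a short Python sketch of the "mean ± z × SEM" rule. The example does not state the sample size, so n = 50 below is an assumption, chosen only because it reproduces the printed intervals; the function name is likewise illustrative.

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, n, level=0.95):
    """Normal-approximation CI for a mean: mean ± z × SEM."""
    sem = sd / math.sqrt(n)                    # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 1.645 for 90%
    return mean - z * sem, mean + z * sem

# Mean and SD are from the example; n = 50 is an ASSUMED sample size.
ci95 = mean_ci(1.18, 0.4, 50, level=0.95)
ci90 = mean_ci(1.18, 0.4, 50, level=0.90)

print("95% CI:", tuple(round(v, 2) for v in ci95))
print("90% CI:", tuple(round(v, 2) for v in ci90))
```

Only the multiplier z changes between the two intervals, which is why the 95% CI is wider than the 90% CI for the same sample.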
B. CIs Can Also Be Used for Any Sample Estimate. Estimates derived from categorical data such as risk, risk
differences, and risk ratios are often presented with the CI and will be discussed later.
C. CIs Instead of Hypothesis Testing
1. Hypothesis testing and calculation of p-values tell us (ideally) whether there is, or is not, a statistically
significant difference between groups, but they do not tell us anything about the magnitude of the difference.
2. CIs help us determine the importance of a finding or findings, which we can apply to a situation.
3. CIs give us an idea of the magnitude of the difference between groups as well as the statistical
significance.
4. CIs are a range of values, together with a point estimate of the difference.
5. Wide CIs
a. Many results are possible, either larger or smaller than the point estimate provided by the study.
b. All values contained in the CI are statistically plausible.
6. If the estimate is the difference between two continuous variables: A 95% CI that includes zero (no difference
between two variables) can be interpreted as not statistically significant (a p-value of 0.05 or greater).
There is no need to show both the 95% CI and the p-value.
7. The interpretation of CIs for odds ratios and relative risks is somewhat different. In that case, a value of 1
indicates no difference in risk, and if the CI includes 1, there is no statistical difference. (See the discussion of
case-control/cohort in other sections for how to interpret CIs for odds ratios and relative risks.)
6. The results of the hypothesis testing will indicate whether enough evidence exists for H0 to be
rejected.
a. If H0 is rejected = statistically significant difference between groups (unlikely attributable to
chance)
b. If H0 is not rejected = no statistically significant difference between groups (any apparent
differences may be attributable to chance). Note that we are not concluding that the treatments are
equal.
B. To Determine What Is Sufficient Evidence to Reject H0: Set the a priori significance level (α) and generate
the decision rule.
1. Developed after the research question has been stated in hypothesis form
2. Used to determine the level of acceptable error caused by a false positive (also known as level of
significance)
a. Convention: A priori α is usually 0.05.
b. Critical value is calculated, capturing how extreme the sample data must be to reject H0.
C. Perform the Experiment and Estimate the Test Statistic.
1. A test statistic is calculated from the observed data in the study, which is compared with the critical value.
2. Depending on this test statistic's value, H0 is not rejected (often referred to as fail to reject) or rejected.
3. In general, the test statistic and critical value are not presented in the literature; instead, p-values
are generally reported and compared with a priori α values to assess statistical significance. p-value:
Probability of obtaining a test statistic as extreme as, or more extreme than, the one actually obtained
if H0 is true
4. Because computers are used in these tests, this step is often transparent; the p-value estimated in the
statistical test is compared with the a priori α (usually 0.05), and the decision is made.
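The decision rule described above can be sketched in Python for the simplest case, a two-sided test whose test statistic follows a standard normal distribution (a z-test). This is an illustration only: the function name and the z values are hypothetical, and other tests use other reference distributions.

```python
from statistics import NormalDist

def two_sided_z_test(z, alpha=0.05):
    """Return (critical value, p-value, reject H0?) for a two-sided z-test."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)    # how extreme z must be to reject H0
    p_value = 2 * (1 - std_normal.cdf(abs(z)))      # chance of a z this extreme if H0 is true
    return critical, p_value, p_value < alpha

# Hypothetical test statistics, for illustration only.
for z in (2.5, 1.0):
    critical, p_value, reject = two_sided_z_test(z)
    print(f"z={z}: critical={critical:.2f}, p={p_value:.4f}, reject H0={reject}")
```

Comparing the test statistic with the critical value and comparing the p-value with α are two views of the same decision: the statistic exceeds the critical value exactly when p is less than α.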
C. Parametric Tests
1. Student t-test: Several different types
a. One-sample test: Compares the mean of the study sample with the population mean
b. Two-sample, independent samples, or unpaired test: Compares the means of two independent
samples. This is an independent samples test.
d. Common error: Use of several t-tests with more than two groups
2. ANOVA: A more generalized version of the t-test that can apply to more than two groups
a. One-way ANOVA: Compares the means of three or more groups in a study. Also known as single-factor ANOVA. This is an independent samples test.
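The common error noted in item 1d (several t-tests with more than two groups) can be quantified with a short Python sketch. It assumes, as a simplification, that the pairwise comparisons are independent and that each is tested at α = 0.05; the function name is illustrative.

```python
import math

def familywise_error(n_groups, alpha=0.05):
    """Chance of at least one false positive across all pairwise tests.

    Assumes the pairwise comparisons are independent (a simplification).
    """
    n_comparisons = math.comb(n_groups, 2)   # number of distinct pairs of groups
    return n_comparisons, 1 - (1 - alpha) ** n_comparisons

for groups in (2, 3, 4):
    k, fwer = familywise_error(groups)
    print(f"{groups} groups: {k} pairwise tests, familywise error about {fwer:.3f}")
```

With four groups there are six pairwise comparisons, so under these assumptions the chance of at least one type I error is roughly 26% rather than 5%, which is why ANOVA followed by a multiple comparisons procedure is preferred.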
D. Nonparametric Tests
1. These tests may also be used for continuous data that do not meet the assumptions of the t-test or
ANOVA.
2. Tests for independent samples
a. Wilcoxon rank sum test and Mann-Whitney U test: These compare two independent samples
(related to a t-test).
b. Kruskal-Wallis one-way ANOVA by ranks
i. Compares three or more independent groups (related to one-way ANOVA)
ii. Post hoc testing
3. Tests for related or paired samples
a. Sign test and Wilcoxon signed rank test: These compare two matched or paired samples (related to
a paired t-test).
b. Friedman ANOVA by ranks: Compares three or more matched/paired groups
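To show how a rank-based test works on ordinal data, the following Python sketch computes the Mann-Whitney U statistic for two independent samples by direct pairwise comparison. The scores are hypothetical, and the sketch stops at the statistic; obtaining a p-value requires tables or a statistics package.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, U for sample_b).

    Each pair (a, b) adds 1 to U_a when a > b and 0.5 when a == b (a tie).
    """
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a   # the two U values sum to n_a * n_b
    return u_a, u_b

# Hypothetical ordinal scores on a 1-to-4 scale, for illustration only.
group_1 = [1, 2, 2, 3, 1, 2]
group_2 = [2, 3, 4, 3, 3, 4]

u_1, u_2 = mann_whitney_u(group_1, group_2)
print(f"U for group 1 = {u_1}, U for group 2 = {u_2}")
```

Only the ordering of the values matters here, not their spacing, which is why the approach suits ordinal scales and continuous data that do not meet t-test assumptions.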
E. Nominal Data
1. Chi-square (χ²) test: Compares expected and observed proportions between two or more groups
a. Test of independence
b. Test of goodness of fit
2. Fisher exact test: Specialized version of the chi-square test for small groups (cells) containing fewer than
five expected observations
3. McNemar: Paired samples
4. Mantel-Haenszel: Controls for the influence of confounders
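A minimal Python sketch of the chi-square test of independence for a 2 × 2 table follows. It uses the shortcut formula without a continuity correction and relies on the fact that, with 1 degree of freedom, the chi-square statistic is the square of a standard normal value. The counts are illustrative (480 of 800 vs. 416 of 800 responders, the same proportions as one of the trials tabulated later in this chapter), and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, chi2 is the square of a standard normal z.
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, p_value

# Rows: group 1 and group 2; columns: responders and nonresponders.
chi2, p = chi_square_2x2(480, 320, 416, 384)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

All expected cell counts in this table are far above five, so the chi-square approximation is reasonable here; with small expected counts, the Fisher exact test would be used instead.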
F. Correlation and Regression (see section IX)
G. Choosing the Most Appropriate Statistical Test: Example 1
1. A trial was conducted to determine whether rosuvastatin was better than simvastatin at lowering low-density lipoprotein cholesterol (LDL-C) concentrations. The trial was designed such that the subjects'
baseline characteristics were as comparable as possible with each other. The intended primary end
point for this 3-month trial was the change in LDL-C from baseline. The results of the trial are reported
as follows:
Table 4. Rosuvastatin and Simvastatin Effect on LDL-C
Group | Baseline LDL-C (mg/dL) | p-value, Baseline | Final LDL-C (mg/dL) | p-value, Final
Rosuvastatin (n=25) | 152 ± 5 | > 0.05 | 138 ± 7 | > 0.05
Simvastatin (n=25) | 151 ± 4 | | 135 ± 5 |
4. The authors concluded that rosuvastatin is similar to simvastatin. What else would you like to know in
evaluating this study?
VIII. DECISION ERRORS
Decision | H0 Is True (no difference) | H0 Is False (difference)
Do not reject H0 (no difference) | No error | Type II error (beta error)
Reject H0 (difference) | Type I error (alpha error) | No error
H0 = null hypothesis.
A. Type I Error: The probability of making this error is defined as the significance level α.
1. Convention is to set α to 0.05, effectively meaning that, 1 in 20 times, a type I error will occur when
H0 is actually true. So, 5.0% of the time, a researcher will conclude that there is a statistically significant
difference when one does not actually exist.
2. The p-value is calculated from the observed data and compared with α; it is often described, loosely,
as the calculated chance that a type I error has occurred.
3. The p-value tells us the likelihood of obtaining a given (or a more extreme) test result if the H0 is true.
When the α level is set a priori, H0 is rejected when p is less than α. In other words, a small p-value tells us
that a result this extreme would be unlikely if no true difference existed; strictly, it is not the probability
that a conclusion of a true difference is wrong (false positive).
4. A lower p-value does not mean the result is more important or more meaningful, but only that it is
statistically significant and not likely attributable to chance.
B. Type II Error: The probability of making this error is termed β.
1. Concluding that no difference exists when one truly does (not rejecting H0 when it should be rejected)
2. It has become a convention to set β to between 0.20 and 0.10.
C. Power (1 − β)
1. The probability of making a correct decision when H0 is false; the ability to detect a difference between
groups if one actually exists
2. Dependent on the following factors:
a. Predetermined α: The risk of error you will tolerate when rejecting H0
b. Sample size
c. The size of the difference between the outcomes you wish to detect. Often not known before
conducting the experiment, so to estimate the power of your test, you will have to specify how
large a change is worth detecting
d. The variability of the outcomes that are being measured
e. Items c and d are generally determined from previous data and/or the literature.
3. Power is decreased by (in addition to the above criteria)
a. Poor study design
b. Incorrect statistical tests (use of nonparametric tests when parametric tests are appropriate)
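The way power depends on α, sample size, effect size, and variability can be explored with a short Python sketch. It uses a normal approximation for a two-sided comparison of two independent means with equal group sizes and a common SD; these are simplifying assumptions, and the numbers (a 5-unit difference, SD of 10) are hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) for a two-sided comparison of two means.

    Normal approximation; assumes equal group sizes and a common SD.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # critical value set by alpha
    se = sd * math.sqrt(2 / n_per_group)         # standard error of the difference
    shift = abs(delta) / se                      # true difference in SE units
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical planning values, for illustration only.
print(f"n=64 per group:  {approx_power(5, 10, 64):.2f}")
print(f"n=30 per group:  {approx_power(5, 10, 30):.2f}")
print(f"alpha=0.01:      {approx_power(5, 10, 64, alpha=0.01):.2f}")
print(f"SD doubled:      {approx_power(5, 20, 64):.2f}")
print(f"larger effect:   {approx_power(10, 10, 64):.2f}")
```

In this sketch, power rises with a larger sample or a larger difference to detect and falls with a stricter α or greater variability, matching factors a through d above.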
Response Rate, n/N (%) | Response Rate, n/N (%) | p-value | Point Estimate (%) | 95% CI
480/800 (60) | 416/800 (52) | 0.001 | 8 | 3% to 13%
15/25 (60) | 13/25 (52) | 0.57 | 8 | −19% to 35%
15/25 (60) | 9/25 (36) | 0.09 | 24 | −3% to 51%
240/400 (60) | 144/400 (36) | < 0.0001 | 24 | 17% to 31%
CI = confidence interval.
f. Which study (or studies) observed a statistically significant difference in response rate?
g. If the smallest change in response rate thought to be clinically significant is 20%, which of these
trials may be convincing enough to change practice?
h. What if the smallest clinically important difference were 15%?
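The intervals for the four trials can be approximated with a short Python sketch. One common approach is the normal-approximation (Wald) interval for a difference in two proportions; the chapter does not say which method produced the tabulated CIs, so that choice, and the function name, are assumptions.

```python
import math

def risk_difference_ci(x1, n1, x2, n2, z=1.96):
    """Difference in proportions (group 1 minus group 2) with a Wald 95% CI."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# Responders / subjects in each group, in the order the trials are tabulated.
trials = [(480, 800, 416, 800), (15, 25, 13, 25),
          (15, 25, 9, 25), (240, 400, 144, 400)]
results = [risk_difference_ci(*trial) for trial in trials]

for diff, low, high in results:
    significant = low > 0 or high < 0   # a CI that excludes 0 implies p < 0.05
    print(f"difference {100 * diff:.0f}%, 95% CI {100 * low:.0f}% to "
          f"{100 * high:.0f}%, significant: {significant}")
```

Under this method, the lower limits for the two 25-subject trials come out negative, so those intervals include zero, consistent with their p-values above 0.05. Comparing each lower limit with the smallest clinically important difference is one way to approach questions g and h.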
−1 | Perfect negative linear relationship
0 | No linear relationship
+1 | Perfect positive linear relationship
5. Hypothesis testing is performed to determine whether the correlation coefficient is different from zero.
This test is highly influenced by sample size.
C. Pearls About Correlation
1. The closer the magnitude of r to 1 (either + or −), the more highly correlated the two variables. The
weaker the relationship between the two variables, the closer r is to 0.
2. There is no agreed-on or consistent interpretation of the value of the correlation coefficient. It depends
on the environment of the investigation (laboratory vs. clinical experiment).
3. Pay more attention to the magnitude of the correlation than to the p-value because the p-value is
influenced by sample size.
4. Crucial to the proper use of correlation analysis is the interpretation of the graphic representation of
the two variables. Before using correlation analysis, it is essential to generate a scatterplot of the two
variables to visually examine the relationship.
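The Pearson correlation coefficient can be computed directly from its definition, as in the Python sketch below. The data are hypothetical and were chosen to show the two extremes and an intermediate value; the function name is illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for paired observations x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mean_x) ** 2 for a in x)                       # variation in x
    syy = sum((b - mean_y) ** 2 for b in y)                       # variation in y
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired values, for illustration only.
x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])   # perfect positive linear relationship
r_negative = pearson_r(x, [10, 8, 6, 4, 2])   # perfect negative linear relationship
r_partial = pearson_r(x, [2, 1, 4, 3, 5])     # positive but imperfect relationship

print(r_positive, r_negative, round(r_partial, 2))
```

The coefficient summarizes only the linear part of a relationship, so it does not replace the scatterplot: strongly curved data can yield an r near zero.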
D. Spearman Rank Correlation: Nonparametric test that quantifies the strength of an association between two
variables but does not assume a normal distribution of continuous data. Can be used for ordinal data or
nonnormally distributed continuous data.
E. Regression
1. A statistical technique related to correlation. There are many different types; for simple linear regression:
One continuous outcome (dependent) variable and one continuous independent (causative) variable
2. Two main purposes of regression: (1) development of prediction model and (2) accuracy of prediction
3. Prediction model: Making predictions of the dependent variable from the independent variable; Y =
mX + b (dependent variable = slope × independent variable + intercept)
4. Accuracy of prediction: How well the independent variable predicts the dependent variable. Regression
analysis determines the extent of variability in the dependent variable that can be explained by the
independent variable.
a. The coefficient of determination (r²) is the measure describing this relationship. Values of r² can range from 0 to 1.
b. An r² of 0.80 could be interpreted as saying that 80% of the variability in Y is explained by the
variability in X.
c. This does not provide a mechanistic understanding of the relationship between X and Y, but rather,
a description of how clearly such a model (linear or otherwise) describes the relationship between
the two variables.
d. Like the interpretation of r, the interpretation of r² is dependent on the scientific arena (e.g., clinical
research, basic research, social science research) to which it is applied.
5. For simple linear regression, two statistical tests can be employed.
a. To test the hypothesis that the y-intercept differs from zero
b. To test the hypothesis that the slope of the line is different from zero
6. Regression is useful in constructing predictive models. The literature is full of examples of predictions. The
process involves developing a formula for a regression line that best fits the observed data.
7. Like correlation, there are many different types of regression analysis.
a. Multiple linear regression: One continuous dependent variable and two or more continuous
independent variables
b. Simple logistic regression: One categorical response variable and one continuous or categorical
explanatory variable
c. Multiple logistic regression: One categorical response variable and two or more continuous or
categorical explanatory variables
d. Nonlinear regression: Variables are not linearly related (or cannot be transformed into a linear
relationship). This is where our PK (pharmacokinetic) equations come from.
e. Polynomial regression: Any number of response and continuous variables with a curvilinear
relationship (e.g., cubed, squared)
8. Example of regression
a. The following data are taken from a study evaluating enoxaparin use. The authors were interested
in predicting patient response (measured as antifactor Xa concentrations) from the enoxaparin dose
in the 75 subjects who were studied.
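The mechanics of simple linear regression (slope, intercept, and r²) can be sketched in Python using ordinary least squares. The dose and antifactor Xa values below are hypothetical and are NOT the data from the study described above; the function name is also illustrative.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of Y = slope * X + intercept; also returns r-squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)   # share of variability in Y explained by X
    return slope, intercept, r_squared

# HYPOTHETICAL dose (mg) and antifactor Xa (IU/mL) pairs, for illustration only.
dose = [40, 60, 80, 100, 120]
anti_xa = [0.42, 0.58, 0.83, 0.97, 1.21]

slope, intercept, r_squared = simple_linear_regression(dose, anti_xa)
print(f"anti-Xa = {slope:.5f} x dose + {intercept:.3f}, r-squared = {r_squared:.3f}")
```

The fitted line supports prediction only within the range of doses observed; r² describes how well the line fits the data, not why the relationship exists.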
X. SURVIVAL ANALYSIS
A. Studies the Time Between Entry in a Study and Some Event (e.g., death, myocardial infarction)
1. Censoring makes survival methods unique; considers that some subjects leave the study for reasons
other than the event (e.g., lost to follow-up, end of study period)
2. Considers that not all subjects enter the study at the same time
3. Standard methods of statistical analysis (e.g., t-tests and linear or logistic regression) cannot be applied
to survival data because of censoring.
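To show how censoring enters a survival calculation, the following Python sketch implements the Kaplan-Meier (product-limit) estimate, one widely used survival method. The follow-up times and event indicators are hypothetical, and the function name is illustrative.

```python
def kaplan_meier(times, events):
    """Return [(time, estimated survival)] at each time an event occurs.

    events[i] is 1 if subject i had the event at times[i] and 0 if the
    subject was censored then (e.g., lost to follow-up or study ended).
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for time in times if time >= t)
        # Events (not censorings) occurring exactly at time t.
        n_events = sum(1 for time, e in zip(times, events) if time == t and e == 1)
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
    return curve

# HYPOTHETICAL follow-up times (months) and event indicators (1 = event, 0 = censored).
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: estimated survival {s:.3f}")
```

Censored subjects count in the at-risk denominator until they leave the study and then drop out without being treated as events, which is how the method uses partial follow-up that a t-test or ordinary regression would mishandle.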
Type of Data | 2 Samples (independent) | 2 Samples (related) | > 2 Samples (independent) | > 2 Samples (related)
Nominal | | McNemar test | | Cochran Q
Ordinal | | | Kruskal-Wallis (MCP) | Friedman ANOVA
Continuous, no factors | | Paired t-test | 1-way ANOVA (MCP) | Repeated-measures ANOVA
Continuous, 1 factor | ANCOVA | 2-way repeated-measures ANOVA | 2-way ANOVA (MCP) | 2-way repeated-measures ANOVA
ANCOVA = analysis of covariance; ANOVA = analysis of variance; MCP = multiple comparisons procedures
Acknowledgment: The contributions of the previous author/contributor, Dr. G. Robert DeYoung, to this topic are
acknowledged.