Sample Size Calculation with R
Dr. Mark Williamson, Statistician
Biostatistics, Epidemiology, and Research Design Core
DaCCoTA
Purpose
Key Bits of Sample Size Calculation
Effect size: magnitude of the effect under the
alternative hypothesis
• The larger the effect size, the easier it is to detect an effect and the fewer samples are needed
Power: probability of correctly rejecting the null
hypothesis if it is false
• AKA, probability of detecting a true difference when it exists
• Power = 1-β, where β is the probability of a Type II error (false negative)
• The higher the power, the more likely it is to detect an effect if it is present and
the more samples needed
• Standard setting for power is 0.80
Significance level (α): probability of falsely rejecting the
null hypothesis even though it is true
• AKA, probability of a Type I error (false positive)
• The lower the significance level, the more likely it is to avoid a false positive and
the more samples needed
• Standard setting for α is 0.05
• Given those three bits, and other information based
on the specific design, you can calculate sample size
for most statistical tests
Effect Size in detail
• While Power and Significance level are usually set
irrespective of the data, the effect size is a property
of the sample data
• It is essentially a function of the difference between the means of the null and alternative hypotheses over the variation (standard deviation) in the data:

Effect Size ≈ (MeanA − Mean0) / Std. deviation

How to estimate Effect Size:
A. Use background information in the form of preliminary/trial data
to get means and variation, then calculate effect size directly
B. Use background information in the form of similar studies to get
means and variation, then calculate effect size directly
C. With no prior information, make an educated guess about the expected
magnitude of the effect, then use the standard effect size value that
corresponds to that magnitude
• Broad effect sizes categories are small, medium, and large
• Different statistical tests will have different values of effect size for
each category
Effect Size Calculation within R
• Unlike GPower, which lets you enter details such as means and standard deviations
and then calculates the effect size for you, R does not do this automatically
• Most R functions for sample size only allow you to enter effect size
• If you want to estimate effect size from background information, you’ll need to
calculate it yourself first
• Throughout this Module, I will provide an equation to calculate effect size for each
of the statistical tests
❖Disclaimer: Most of the examples and practice problems are the same as an earlier GPower
Module. However, it was not always clear how effect size was calculated in GPower or in R,
so sometimes the sample size calculated was different between the two. When in doubt, I
would go with the result that gives the higher sample size to avoid undersampling.
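For example, here is a minimal sketch of computing a one-sample Cohen's d by hand from preliminary data before passing it to a pwr function (the pilot values and null mean below are hypothetical, for illustration only):

# Hypothetical pilot data and null-hypothesis mean (illustration only)
pilot <- c(4.1, 5.3, 6.2, 4.8, 5.9, 5.1)   # preliminary/trial measurements
mu0   <- 4.0                               # mean under the null hypothesis

# Effect Size ≈ (MeanA − Mean0) / Std. deviation
d <- (mean(pilot) - mu0) / sd(pilot)
d   # this value is then supplied to a pwr function as the effect size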
Statistical Rules of the Game
Here are a few pieces of terminology to refresh yourself with before embarking on calculating
sample size:
• Null Hypothesis (H0): default or ‘boring’ state; your statistical test is run to either Reject or Fail to Reject the Null
• Alternative Hypothesis (H1): alternative state; usually what your experiment is interested in retaining over the Null
• One-tailed Test: looking for a deviation from the H0 in only one direction (ex: Is variable X larger than 0?)
• Two-tailed Test: looking for a deviation from the H0 in either direction (ex: Is variable Y different from 0?)
• Parametric data: approximately fits a normal distribution; needed for many statistical tests
• Non-parametric data: does not fit a normal distribution; alternative and less powerful tests available
• Paired (dependent) data: categories are related to one another (often result of before/after situations)
• Un-paired (independent) data: categories are not related to one another
• Dependent Variable: Depends on other variables; the variable the experimenter cares about; also known as the Y or response variable
• Independent Variable: Does not depend on other variables; usually set by the experimenter; also known as the X or predictor variable
Using R: Basics
• This module assumes the user is familiar with R
• For an introduction or refresher, please check out the following material
• https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
• http://www.r-tutor.com/r-introduction
• https://www.statmethods.net/
One Mean T-Test
R Code: pwr -> pwr.t.test
• d=effect size
• sig.level=significance level
• power=power of test
• type=type of test

Numeric Var(s): 1 | Cat. Var(s): 0 | Cat. Var Group #: 0 | Cat. Var # of Interest: 0 | Parametric: Yes | Paired: N/A

Results:
n = 33.36713 → Round up to 34
d = 0.5
sig.level = 0.05
power = 0.8
alternative = two.sided
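The output above is consistent with a call like the following (a minimal sketch, assuming the pwr package is installed):

library(pwr)
# Medium effect (d = 0.5), two-sided, one-sample t-test
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80,
           type = "one.sample", alternative = "two.sided")
# n = 33.36713 → round up to 34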
One Mean T-Test: Practice
Calculate the sample size for the following scenarios (with α=0.05, and
power=0.80):
1. You are interested in determining if the average income of college freshman is
less than $20,000. You collect trial data and find that the mean income was
$14,500 (SD=6000).
2. You are interested in determining if the average sleep time change in a year for
college freshman is different from zero. You collect the following data of sleep
change (in hours).
Sleep Change: -0.55 0.16 2.6 0.65 -0.23 0.21 -4.3 2 -1.7 1.9
3. You are interested in determining if the average weight change in a year for
college freshman is greater than zero.
One Mean T-Test: Answers
1. You are interested in determining if the average income of college freshman is less than $20,000. You collect
trial data and find that the mean income was $14,500 (SD=6000).
• Effect size = (MeanH1-MeanH0)/SD= (14,500-20,000)/6000 = -0.917
• One-tailed test
• pwr.t.test(d=-0.917, sig.level=0.05, power=0.80, type="one.sample", alternative="less")
• n = 8.871645 -> 9 samples
2. You are interested in determining if the average sleep time change in a year for college freshman is different
from zero. You collect the following data of sleep change (in hours).
Sleep Change -0.55 0.16 2.6 0.65 -0.23 0.21 -4.3 2 -1.7 1.9
Two Means T-Test
R Code: pwr -> pwr.t.test
• d=effect size
• sig.level=significance level
• power=power of test
• type=type of test

Numeric Var(s): 1 | Cat. Var(s): 1 | Cat. Var Group #: 2 | Cat. Var # of Interest: 1 | Parametric: Yes | Paired: No

Two Means T-Test: Practice
3. You are interested in determining if the average glucose level in blood is lower
in men than women
Two Means T-Test: Answers
1. You are interested in determining if the average daily caloric intake is different between men and women. You
collected trial data and found the average caloric intake for males to be 2350.2 (SD=258), while females had an
intake of 1872.4 (SD=420).
• Effect size = (MeanH1-MeanH0)/SDpooled = (2350.2-1872.4)/√((258² + 420²)/2) = 477.8/348.54 = 1.37
• two-tailed test
• pwr.t.test(d=1.37, sig.level=0.05, power=0.80, type="two.sample", alternative="two.sided")
• n = 9.43 -> 10 samples per group
2. You are interested in determining if the average protein level in blood is different between men and women.
You collected the following trial data on protein level (grams/deciliter).
Male Protein: 1.8 5.8 7.1 4.6 5.5 2.4 8.3 1.2
Female Protein: 9.5 2.6 3.7 4.7 6.4 8.4 3.1 1.4
• Effect size = (MeanH1-MeanH0)/SDpooled = (4.59-4.98)/√((2.58² + 2.88²)/2) = -0.14
• two-tailed test
• pwr.t.test(d=-0.14, sig.level=0.05, power=0.80, type="two.sample", alternative="two.sided")
• n = 801.87 -> 802 samples per group
3. You are interested in determining if the average glucose level in blood is lower in men than women
• Guessed a small effect (0.20), then used a one-tailed test
• pwr.t.test(d=-0.20, sig.level=0.05, power=0.80, type="two.sample", alternative="less")
• n = 309.8 -> 310 samples per group
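As a sketch of how the effect size in answer 2 can be computed directly in R from the trial data above (assuming the pwr package is installed; d is rounded to two decimals to match the answer):

library(pwr)
male   <- c(1.8, 5.8, 7.1, 4.6, 5.5, 2.4, 8.3, 1.2)
female <- c(9.5, 2.6, 3.7, 4.7, 6.4, 8.4, 3.1, 1.4)

sd_pooled <- sqrt((sd(male)^2 + sd(female)^2) / 2)      # ≈ 2.73
d <- round((mean(male) - mean(female)) / sd_pooled, 2)  # ≈ -0.14, as in the answer

pwr.t.test(d = d, sig.level = 0.05, power = 0.80,
           type = "two.sample", alternative = "two.sided")
# n ≈ 802 per group with the rounded d, matching the answer above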
Paired T-test
Description: this tests if a mean from one group is different from the mean of another group, where the groups are dependent (not independent), for a normally distributed variable. Pairing can be leaves on the same branch, siblings, the same individual before and after a trial, etc.
R Code: pwr -> pwr.t.test
pwr.t.test(d = , sig.level = , power = , type = c("two.sample", "one.sample", "paired"))
• d=effect size
• sig.level=significance level
• power=power of test
• type=type of test

Numeric Var(s): 1 | Cat. Var(s): 1 | Cat. Var Group #: 2 | Cat. Var # of Interest: 1 | Parametric: Yes | Paired: Yes
One-Way ANOVA

Numeric Var(s): 1 | Cat. Var(s): 1 | Cat. Var Group #: >2 | Cat. Var # of Interest: 1 | Parametric: Yes | Paired: No

Results:
k = 6
n = 214.7178 → Round up to 215 samples per group
f = 0.1
sig.level = 0.05
power = 0.8
Single Proportion Test
Results:
h = 0.2
n = 196.2215 → Round up to 197
sig.level = 0.05
power = 0.8
alternative = two.sided
Single Proportion: Practice
Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):
1. You are interested in determining if the male incidence rate proportion of cancer in North
Dakota is higher than the US average (prop=0.00490). You find trial data cancer prevalence
of 0.00495.
2. You are interested in determining if the female incidence rate proportion of cancer in North
Dakota is lower than the US average (prop=0.00420).
Single Proportion: Answers
Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):
1. You are interested in determining if the male incidence rate proportion of cancer in North
Dakota is higher than the US average (prop=0.00490). You find trial data cancer prevalence
of 0.00495.
• h= 2*asin(sqrt(0.00495))-2*asin(sqrt(0.00490))=0.0007
• pwr.p.test(h=0.0007, sig.level=0.05, power=0.80, alternative="greater")
• n = 12617464 -> 12,617,464 samples
2. You are interested in determining if the female incidence rate proportion of cancer in North
Dakota is lower than the US average (prop=0.00420).
• Guess a very low effect size (0.001)
• pwr.p.test(h=-0.001, sig.level=0.05, power=0.80, alternative="less")
• n = 6182557 -> 6,182,557 samples
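A sketch of answer 1 using the pwr package's built-in arcsine effect-size helper (assuming pwr is installed; the rounded h is used so the result matches the answer above):

library(pwr)
# Effect size for proportions via the arcsine transformation
h <- ES.h(0.00495, 0.00490)   # ≈ 0.0007
pwr.p.test(h = 0.0007, sig.level = 0.05, power = 0.80, alternative = "greater")
# n ≈ 12,617,464 with the rounded h, as in answer 1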
Two Proportions Test
Description: this tests when you only have two groups and you want to know if the proportions of each group are different from one another.
R Code: pwr -> pwr.2p.test
pwr.2p.test(h = , sig.level = , power = , alternative = c("two.sided", "less", "greater"))
• h=effect size
• sig.level=significance level
• power=power of test
• alternative=type of tail

Numeric Var(s): 0 | Cat. Var(s): 2 | Cat. Var Group #: 2 | Cat. Var # of Interest: 2 | Parametric: N/A | Paired: No

Example: Is the expected proportion of students passing a stats course taught by psychology teachers different from the observed proportion of students passing the same stats class taught by mathematics teachers?
• H0=0, H1≠0
• You don't have background info, so you guess that there is a small effect size
• For h-tests: 0.2=small, 0.5=medium, and 0.8=large effect sizes
• Selected two-sided, because we don't care about directionality

Effect size calculation
• h = 2*asin(sqrt(p1)) - 2*asin(sqrt(p2))
• p1=proportion 1
• p2=proportion 2
Two Proportions Test
Results:
h = 0.2
n = 392.443 → Round up to 393 per group
sig.level = 0.05
power = 0.8
alternative = two.sided
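The results above correspond to a call like the following (a sketch, assuming the pwr package is installed; pwr.2p.test reports n per group):

library(pwr)
# Small effect (h = 0.2), two-sided test of two proportions
pwr.2p.test(h = 0.2, sig.level = 0.05, power = 0.80, alternative = "two.sided")
# n = 392.443 per group → round up to 393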
Two Proportions: Answers
Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):
1. You are interested in determining if the expected proportion (P1) of students passing a stats
course taught by psychology teachers is different than the observed proportion (P2) of
students passing the same stats class taught by biology teachers. You collected the
following data of passed tests.
• P1=7/10=0.70, P2=6/10=0.60
• h= 2*asin(sqrt(0.60))-2*asin(sqrt(0.70))=-0.21
• pwr.2p.test(h=-0.21, sig.level=0.05, power=0.80, alternative="two.sided")
• n = 355.96 -> 356 samples
Psychology Yes Yes Yes No No Yes Yes Yes Yes No
Biology No No Yes Yes Yes No Yes No Yes Yes
2. You are interested in determining if the expected proportion (P1) of female students who
selected YES on a question was higher than the observed proportion (P2) of male students
who selected YES. The observed proportion of males who selected yes was 0.75.
• Guess that the expected proportion (P1) =0.85
• h= 2*asin(sqrt(0.85))-2*asin(sqrt(0.75))=0.25
• pwr.2p.test(h=0.25, sig.level=0.05, power=0.80, alternative="greater")
• n = 197.84 -> 198 samples
Chi-Squared Test
Description: Extension of the proportions test, which asks if a table of observed values is any different from a table of expected ones. Also called the Goodness-of-fit test.
R Code: pwr -> pwr.chisq.test
pwr.chisq.test(w = , df = , sig.level = , power = )
• w=effect size
• df=degrees of freedom
• sig.level=significance level
• power=power of test

Numeric Var(s): 0 | Cat. Var(s): ≥1 | Cat. Var Group #: ≥2 | Cat. Var # of Interest: 1 | Parametric: N/A | Paired: No

Results:
w = 0.3
N = 121.1396 → Round up to 122
df = 3
sig.level = 0.05
power = 0.8
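A sketch of the call behind the results above (assuming the pwr package is installed), with w = 0.3 and df = 3 taken from those results:

library(pwr)
pwr.chisq.test(w = 0.3, df = 3, sig.level = 0.05, power = 0.80)
# N = 121.1396 → round up to 122 (total sample size)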
Simple Linear Regression
Results:
u = 1
v = 22.50313
f2 = 0.35
sig.level = 0.05
power = 0.8
Simple Linear Regression: Answers
Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):
1. You are interested in determining if height (meters) in plants can predict yield (grams of berries). You collect the following trial data.
Height: 14.6 19.6 18.6 25.5 20.4
Yield: 46.8 48.7 48.4 53.7 56.7
• Created variables in R
• yield<-c(46.8, 48.7, 48.4, 53.7, 56.7)
• height<-c(14.6, 19.6, 18.6, 25.5, 20.4)
• Ran linear model to find R-squared
• linearMod <- lm(yield ~ height)
• summary(linearMod) -> adj R2 = 0.2784
• f2=R=√(adj R2) = √(0.2784) = 0.53
• pwr.f2.test(u=1, f2=0.53, sig.level=0.05, power=0.80)
• v=14.96 -> 15+ 2(variables) ->17 samples
2. You are interested in determining if the size of a city (in square miles) can predict the
population of the city (in # of individuals).
• Guessed a large effect size (0.35); for 1 predictor so 1 df
• pwr.f2.test(u=1, f2=0.35, sig.level=0.05, power=0.80)
• v=22.5 -> 23+ 2(variables) ->25 samples
Multiple Linear Regression
Description: The extension of simple linear regression. The first major change is that there are more predictor variables. The second change is that interaction effects can be used. Finally, the results typically can't be plotted.
R Code: pwr -> pwr.f2.test
pwr.f2.test(u = , v = , f2 = , sig.level = , power = )
• u=numerator degrees of freedom
• v=denominator degrees of freedom
• f2=effect size
• sig.level=significance level
• power=power of test

Numeric Var(s): >2 | Cat. Var(s): 0 | Cat. Var Group #: N/A | Cat. Var # of Interest: N/A | Parametric: Yes | Paired: N/A

Example: Can height, age, and time spent at the gym predict weight in adult males?
• H0=0, H1≠0
• You don't have background info, so you guess that there is a medium effect size
• For f2-tests: 0.02=small, 0.15=medium, and 0.35=large effect sizes
• Numerator degrees of freedom is the number of predictor variables (3)
• Output will be denominator degrees of freedom rather than sample size; you will need to round up and add the total number of variables (4)

Effect size calculation
• f2=R=√(R2)
• R=correlation coefficient
• R2=goodness-of-fit
• Use adjusted R2
Multiple Linear Regression
Results:
u=3
v = 72.70583
f2 = 0.15
sig.level = 0.05
power = 0.8
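The results above correspond to a call such as the following (a sketch, assuming the pwr package is installed), with the sample size recovered using the module's rule of rounding v up and adding the number of variables:

library(pwr)
res <- pwr.f2.test(u = 3, f2 = 0.15, sig.level = 0.05, power = 0.80)
res$v                # ≈ 72.71 (denominator degrees of freedom)
ceiling(res$v) + 4   # + 4 variables (3 predictors + 1 response) → 77 samples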
Correlation Test
R Code: pwr -> pwr.r.test
• r=correlation
• sig.level=significance level
• power=power of test

Numeric Var(s): 2 | Cat. Var(s): 0 | Cat. Var Group #: N/A | Cat. Var # of Interest: N/A | Parametric: Yes | Paired: No

Results:
n = 28.24841 → Round up to 29
r = 0.5
sig.level = 0.05
power = 0.8
alternative = two.sided
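A sketch of the call that matches the results above (assuming the pwr package is installed):

library(pwr)
# Large correlation (r = 0.5), two-sided
pwr.r.test(r = 0.5, sig.level = 0.05, power = 0.80)
# n = 28.24841 → round up to 29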
Correlation: Practice
Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):
1. You are interested in determining if there is a correlation between height and weight in men
Males Height: 178 166 172 186 182
Males Weight: 165 139 257 225 196
2. You are interested in determining if, in lab mice, there is a correlation between
longevity (in months) and average protein intake (grams).
Correlation: Answers
Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):
1. You are interested in determining if there is a correlation between height and weight in men
Males Height: 178 166 172 186 182
Males Weight: 165 139 257 225 196
• Created variables in R and ran correlation test
• MH <- c(178,166,172,186,182)
• MW <- c(165,139,257,225,196)
• cor(MH, MW) -> 0.37
• pwr.r.test(r=0.37, sig.level=0.05, power=0.80)
• n = 54.19 -> 55 samples
2. You are interested in determining if, in lab mice, there is a correlation between longevity (in
months) and average protein intake (grams).
• Guessed large (0.5) correlation
• pwr.r.test(r=0.5, sig.level=0.05, power=0.80)
• n = 28.24 -> 29 samples
Non-Parametric T-tests
Description: versions of the t-tests for non-parametric data.
• One Mean Wilcoxon: sample mean against set value
• Mann-Whitney: two sample means (unpaired)
• Paired Wilcoxon: two sample means (paired)

Name: One Mean Wilcoxon | Numeric Var(s): 1 | Cat. Var(s): 0 | Cat. Var Group #: 0 | Cat. Var # of Interest: 0 | Parametric: No | Paired: N/A
Name: Mann-Whitney | Numeric Var(s): 1 | Cat. Var(s): 1 | Cat. Var Group #: 2 | Cat. Var # of Interest: 1 | Parametric: No | Paired: No
Name: Paired Wilcoxon | Numeric Var(s): 1 | Cat. Var(s): 1 | Cat. Var Group #: 2 | Cat. Var # of Interest: 1 | Parametric: No | Paired: Yes

• There aren't any R packages that have useful sample size calculations for non-parametric t-tests
• I suggest using the parametric + 15% approach

Effect size calculation
• Cohen's D: (M2-M1)/SD; (M2-M1)/SDpooled; (Meandiff)/SDdiff

Examples (for t-tests, 0.2=small, 0.5=medium, and 0.8=large effect sizes):
One Mean Wilcoxon: Is the average number of children in Grand Forks families different than 1?
• H0=1 child
• H1>1 child
• You don't have background info, so you guess that there is a medium effect size
• Select one-tailed (greater)
Mann-Whitney: Does the average number of snacks per day for individuals on a diet differ between young and old persons?
• H0=0 difference in snack number
• H1≠0 difference in snack number
• You don't have background info, so you guess that there is a small effect size
• Select two-sided
Paired Wilcoxon: Are genome methylation patterns different between identical twins?
• H0=0% methylation difference
• H1≠0% methylation difference
• You don't have background info, so you guess that there is a large effect size
• Select one-tailed (greater)
Non-parametric Tests
Results:
> #One Mean Wilcoxon
> pwr.t.test(d=0.5, sig.level=0.05, power=0.80, type="one.sample", alternative="greater")
One-sample t test power calculation
n = 26.13753
d = 0.5
sig.level = 0.05
power = 0.8
alternative = greater
> #Non-parametric correction
> round(26.13753*1.15,0)
[1] 30 → Total sample size

> #Mann-Whitney
> pwr.t.test(d=0.2, sig.level=0.05, power=0.80, type="two.sample", alternative="two.sided")
Two-sample t test power calculation
n = 198.1508
d = 0.2
sig.level = 0.05
power = 0.8
alternative = two.sided
> #Non-parametric correction
> round(198.1508*1.15,0)
[1] 228 → Total sample size

> #Paired Wilcoxon
> pwr.t.test(d=0.8, sig.level=0.05, power=0.80, type="paired", alternative="greater")
Paired t test power calculation
n = 11.14424
d = 0.8
sig.level = 0.05
power = 0.8
alternative = greater
NOTE: n is number of *pairs*
> #Non-parametric correction
> round(11.14424*1.15,0)
[1] 13 → Total number of pairs
Non-Parametric T-tests: Practice
Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):
1. You are interested in determining if the average number of pets in Grand Forks families is
greater than 1. You collect the following trial data for pet number.
Pets 1 1 1 3 2 1 0 0 0 4
2. You are interested in determining if the number of meals per day for individuals on a diet is
higher in younger people than older. You collected trial data on meals per day.
Young meals 1 2 2 3 3 3 3 4
Older meals 1 1 1 2 2 2 3 3
3. You are interested in determining if genome methylation patterns are higher in the first
fraternal twin born compared to the second. You collected the following trial data on
methylation level difference (in percentage).
Methy. Diff (%) 5.96 5.63 1.25 1.17 3.59 1.64 1.6 1.4
Non-Parametric T-tests: Answers
Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):
1. You are interested in determining if the average number of pets in Grand Forks families is greater than 1. You
collect the following trial data for pet number.
Pets: 1 1 1 3 2 1 0 0 0 4
• Effect size = (MeanH1-MeanH0)/SD = (1.3-1.0)/1.34 = 0.224
• One-tailed test
• pwr.t.test(d=0.224, sig.level=0.05, power=0.80, type="one.sample", alternative="greater")
• n = 124.58*1.15 (then round) -> 143 samples
2. You are interested in determining if the number of meals per day for individuals on a diet is higher in younger people
than older. You collected trial data on meals per day.
Young meals: 1 2 2 3 3 3 3 4
Older meals: 1 1 1 2 2 2 3 3
• Effect size = (MeanH1-MeanH0)/SDpooled = (2.625-1.875)/√((0.92² + 0.83²)/2) = 0.856
• One-tailed test
• pwr.t.test(d=0.856, sig.level=0.05, power=0.80, type="two.sample", alternative="greater")
• n = 17.59*1.15 (then round) -> 20 samples per group
3. You are interested in determining if genome methylation patterns are different in the first fraternal twin born
compared to the second. You collected the following trial data on methylation level difference (in percentage).
Methy. Diff (%): 5.96 5.63 1.25 1.17 3.59 1.64 1.6 1.4
• Effect size = (Meandiff)/SDdiff = (2.78)/2.01 = 1.38
• Two-tailed test
• pwr.t.test(d=1.38, sig.level=0.05, power=0.80, type="paired", alternative="two.sided")
• n = 6.29*1.15 (then round) -> 7 pairs
Kruskal-Wallis Test
Description: this tests if at least one mean is different among groups, where there are more than two groups, for a non-normally distributed variable (AKA, non-parametric ANOVA). There really isn't a good way of calculating sample size for it in R, but you can use a rule of thumb:
1. Run the parametric test (pwr.anova.test)
2. Add 15% to the total sample size
R Code: pwr -> pwr.anova.test
pwr.anova.test(k = , f = , sig.level = , power = )
• k=number of groups
• f=effect size
• sig.level=significance level
• power=power of test

Numeric Var(s): 1 | Cat. Var(s): 1 | Cat. Var Group #: >2 | Cat. Var # of Interest: 1 | Parametric: No | Paired: No

Results:
k = 3
n = 52.3966
f = 0.25
sig.level = 0.05
power = 0.8
Kruskal-Wallis Test: Answers
Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):
1. You are interested in determining if there is a difference in hours worked across 3 different groups (faculty, staff, and hourly workers). You collect the following trial data of weekly hours:
Faculty: 42 45 46 55 42
Staff: 46 45 37 42 40
Hourly: 29 42 33 50 23
• η2 = SStreat / SStotal = 286.5/(286.5+625.2) = 0.314
• f = √(0.314/(1 − 0.314)) = 0.677
• 3 groups
• pwr.anova.test(k=3, f=0.677, sig.level=0.05, power=0.80)
• n = 8.09*1.15 (then round up) -> 10 samples per group
2. You are interested in determining if there is a difference in assistant professor salaries across 25 different departments.
• Guess small effect size (0.10)
• 25 groups
• pwr.anova.test(k=25, f=0.10, sig.level=0.05, power=0.80)
• n = 90.67*1.15 (then round up) -> 105 samples per group
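A sketch of answer 1 done entirely in R, computing η² from a one-way ANOVA table and then applying the parametric + 15% rule (assuming the pwr package is installed):

library(pwr)
hours <- c(42, 45, 46, 55, 42,   # Faculty
           46, 45, 37, 42, 40,   # Staff
           29, 42, 33, 50, 23)   # Hourly
group <- factor(rep(c("Faculty", "Staff", "Hourly"), each = 5))

ss   <- anova(lm(hours ~ group))[["Sum Sq"]]   # SStreat ≈ 286.5, SSerror ≈ 625.2
eta2 <- ss[1] / sum(ss)                        # ≈ 0.314
f    <- sqrt(eta2 / (1 - eta2))                # ≈ 0.677

pwr.anova.test(k = 3, f = f, sig.level = 0.05, power = 0.80)  # n ≈ 8.09 per group
ceiling(8.09 * 1.15)   # non-parametric (Kruskal-Wallis) correction → 10 per group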
Repeated Measures ANOVA
Description: this tests if at least one mean is different among groups, where the groups are repeated measures (more than two) for a normally distributed variable. Repeated Measures ANOVA is the extension of the Paired T-test to more than two groups.
R Code: WebPower -> wp.rmanova
wp.rmanova(ng = NULL, nm = NULL, f = NULL, nscor = 1, alpha = 0.05, power = NULL, type = 0)
• ng=number of groups
• nm=number of measurements
• f=effect size
• nscor=nonsphericity correction coefficient
• alpha=significance level of test
• power=statistical power
• type=(0,1,2): the value "0" is for between-effect; "1" is for within-effect; and "2" is for interaction effect

Numeric Var(s): 1 | Cat. Var(s): 1 | Cat. Var Group #: >2 | Cat. Var # of Interest: 1 | Parametric: Yes | Paired: Yes

Example: Is there a difference in blood pressure at 1, 2, 3, and 4 months post-treatment?
• H0=0, H1≠0
• 1 group, 4 measurements
• You don't have background info, so you guess that there is a small effect size
• For f-tests: 0.1=small, 0.25=medium, and 0.4=large effect sizes
• For the nonsphericity correction coefficient, 1 means sphericity is met. There are methods to estimate this, but we will go with 1 for this example.
• Type will be 1, as we want the within-effect

Effect size calculation
• f = σm / σ
• σm = √( Σk=1..K (mk − m)² / k )
• mk = group mean
• m = overall mean
• k = number of groups
• σ = overall standard deviation

NOTE:
• Within-effects: variability of a particular value for individuals in a sample
• Between-effects: examines differences between individuals
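A sketch of the call described in the example above, based on the arguments listed (assuming the WebPower package is installed; the resulting n is not reproduced here):

library(WebPower)
# 1 group, 4 repeated measurements, small effect (f = 0.1),
# sphericity assumed (nscor = 1), within-effect (type = 1)
wp.rmanova(ng = 1, nm = 4, f = 0.1, nscor = 1,
           alpha = 0.05, power = 0.80, type = 1)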
Repeated Measures ANOVA
Results:

Repeated Measures ANOVA: Answers
Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):
1. You are interested in determining if there is a difference in blood serum levels at 6, 12, 18, and 24 months post-treatment. You collect the following trial data of blood serum in mg/dL:
6 months: 38 35 21
12 months: 38 48 27
18 months: 46 51 29
24 months: 52 44 36
• f = σm / σ = √( Σk=1..K (mk − m)² / k ) / σ
Multi-Way ANOVA (1 Category of Interest)
Results:
Multi-Way ANOVA: Answers
Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):
1. You are interested in determining if there is a difference in treatment (Drug A, B, and C), while controlling for age (child=c, adult=a, elder=e). You collect the following trial data for treatment:
Drug A: c = -6.4, 7.9; a = 8.7, -1; e = -3.1, -1.5
Drug B: c = 1.3, 3.9; a = -6.0, -1.9; e = 6.8, 1.3
Drug C: c = -2.0, 2.5; a = -4.3, -8.2; e = -1.2, -9.7
• Only care about Drug, so focus on treatment only
• f = σb / σw = √( Σk=1..K (μk − μ)² / K ) / √( Σj=1..J Σk=1..K (μjk − μ)² / (J*K) )
• J = number of treatments, K = number of sections
• f = 1.74/2.65 = 0.657
• Numerator df = 3 (Drug treatments) − 1 = 2
• Number of groups = 3*3 = 9
• wp.kanova(ndf=2, f=0.657, ng=9, alpha=0.05, power=0.80)
• n = 26.6 -> 27 samples total (3 per group)
2. You are interested in determining if there is a difference in treatment (Drug A, B, and C) across age (child, adult, elder) and cancer stage (I, II, III, IV, V). You collect trial data and find that the between-group variance is 27.3, while the total variance is 85.2.
• Care about treatment, age, and cancer stage
• Numerator df = (3-1)*(3-1)*(5-1) = 2*2*4 = 16
• Number of groups is 3*3*5 = 45
• η2 = σm² / σt² = 27.3/85.2 = 0.32
• f = √(η2/(1 − η2)) = √(0.32/(1 − 0.32)) = 0.686
• wp.kanova(ndf=16, f=0.686, ng=45, alpha=0.05, power=0.80)
• n = 67.03 -> 68 samples, need 90 samples to have even groups (2 per group)
Logistic Regression
Description: Tests whether a predictor variable is a significant predictor of a binary outcome, with or without other covariates. It is a type of non-parametric regression: numerical variables are not required to be normally distributed. In logistic regression, the response variable (Y) is binary (0/1).
R Code: WebPower -> wp.logistic
wp.logistic(n = NULL, p0 = NULL, p1 = NULL, alpha = 0.05, power = NULL, alternative = c("two.sided", "less", "greater"), family = c("Bernoulli", "exponential", "lognormal", "normal", "Poisson", "uniform"), parameter = NULL)
• p0 = Prob(Y=1|X=0): the probability of observing 1 for the outcome variable Y when the predictor X equals 0
• p1 = Prob(Y=1|X=1): the probability of observing 1 for the outcome variable Y when the predictor X equals 1
• alpha = significance level
• power = statistical power
• alternative = direction of the alternative hypothesis ("two.sided" or "less" or "greater")
• family = distribution of the predictor ("Bernoulli", "exponential", "lognormal", "normal", "Poisson", "uniform"). The default is "Bernoulli"
• parameter = corresponding parameter for the predictor's distribution. The default is 0.5 for "Bernoulli", 1 for "exponential", (0,1) for "lognormal" or "normal", 1 for "Poisson", and (0,1) for "uniform"

Numeric Var(s): ≥2 | Cat. Var(s): 0 | Cat. Var Group #: N/A | Cat. Var # of Interest: N/A | Parametric: No | Paired: N/A

Example: Does body mass index (BMI) influence mortality (yes 1, no 0)?
• H0=0, H1≠0
• You must have at least some background (or a good guess) on the p0 and p1 probabilities; let's use 0.15 and 0.25
• Will use "two.sided" because we don't care about direction
• BMI seems normally distributed, so will go with "normal" for the family (but you should confirm the distribution of whatever predictor variable you use)
• Can leave the parameter at the default of mean=0, SD=1

Effect size calculation
• N/A, uses probability information instead
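A sketch of the call described in the BMI example above (assuming the WebPower package is installed; the resulting n is not reproduced here):

library(WebPower)
# BMI example: p0 = 0.15, p1 = 0.25, normally distributed predictor
wp.logistic(p0 = 0.15, p1 = 0.25, alpha = 0.05, power = 0.80,
            alternative = "two.sided", family = "normal")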
Logistic Regression
Results:
Logistic/Poisson Regression: Answers
Calculate the sample size for the following scenarios (with α=0.05, and power=0.80):
1. You are interested in determining if body temperature influences sleep disorder prevalence (yes 1, no 0). You collect the following trial data:
Temperature: 98.6 98.5 99.0 97.5 98.8 98.2 98.5 98.4 98.1
Sleep Disorder?: No No Yes No Yes No No Yes No
• Logistic Regression (two.sided)
• Mean temp is 98.4 (SD=0.436) -> range of one SD = (97.964 to 98.836)
• P0=0.33 (as only one had sleep disorder at ranges outside one SD); P1=0.67
• Temperature is normally distributed
• wp.logistic(p0=0.33, p1=0.67, alpha=0.05, power=0.80, alternative="two.sided", family="normal")
• n =40.80-> 41 samples total
2. You are interested in determining if the rate of lung cancer incidence changes with a drug
treatment.
• Poisson Regression (two.sided)
• Expect the base rate (intercept) for male lung cancer is 57.8 (per 100,000), so exp0 = exp(57.8/100000) = 1.0005
• Expect the relative increase of the event rate (slope) to be -1.02, so exp1 = exp(-1.02) = 0.36
• Go with default distribution of Bernoulli
• wp.poisson(exp0=1.0005, exp1=0.36, alpha=0.05, power=0.80, alternative="two.sided", family="Bernoulli")
• n = 56.8 -> 59 samples total
Multilevel Modeling: Cluster Randomized Trials
Description: Multilevel models are used when data are clustered within a hierarchical structure that makes them non-independent. Also known as linear mixed models. Cluster randomized trials (CRT) are a type of multilevel design where the entire cluster is randomly assigned to a control arm or one or more treatment arms.
R Code: WebPower -> wp.crt2arm
wp.crt2arm(n = NULL, f = NULL, J = NULL, icc = NULL, power = NULL, alpha = 0.05, alternative = c("two.sided", "one.sided"))
• n = sample size (number of individuals per cluster)
• f = effect size (either main effect of treatment, or mean difference between treatment clusters and control clusters)
• J = number of clusters/sides. It tells how many clusters are considered in the study design. At least two clusters are required
• icc = intra-class correlation (degree to which two randomly drawn observations within a cluster are correlated)
• alpha = significance level
• power = statistical power
• alternative = direction of the alternative hypothesis ("two.sided" or "one.sided")

Example: Is there a difference in blood glucose levels between a treatment and control?
• H0=0, H1≠0
• You don't have background info, so you guess that there is a medium effect size
• For f-tests: 0.1=small, 0.25=medium, and 0.4=large effect sizes
• Don't know the icc, so will guess 0.1 (0.5 is the default for repeated measures, but we expect this to be lower, since the observations are from different people)
• Alternative is "two.sided", as we only care about a difference
• We can test for two sizes: number per cluster or cluster number
1. Try for 100 clusters
2. Try for 15 individuals per cluster to get cluster number

Effect size calculation
• f = μD / √(σB² + σW²)
• μD = mean difference between treatment and control clusters
• σB² = between-cluster variance
• σW² = within-cluster variance

NOTE: here we show a 2-arm example (treatment, control); to use a 3-arm design (treatment1, treatment2, control), use wp.crt3arm
Multilevel Modeling: Cluster Randomized Trials
Results:
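A sketch of the two calls described in the example above (assuming the WebPower package is installed; outputs are not reproduced here):

library(WebPower)
# 1. Fix the number of clusters at J = 100 and solve for individuals per cluster (n)
wp.crt2arm(f = 0.25, J = 100, icc = 0.1, alpha = 0.05, power = 0.80,
           alternative = "two.sided")

# 2. Fix n = 15 individuals per cluster and solve for the number of clusters (J)
wp.crt2arm(n = 15, f = 0.25, icc = 0.1, alpha = 0.05, power = 0.80,
           alternative = "two.sided")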