Sample Size Calculation & Software
KWENTI E. TEBIT
Why do sample size calculations?
Prospective study design with sample size calculation helps to avoid studies that are:
• Too small: leads to equivocal results. An underpowered study may dismiss a potentially beneficial
treatment, or may fail to detect an important relationship.
• Too large: wastes resources.
Both sample size errors create ethical issues when using humans or animals.
• Too small: you have exposed them to harm with little likelihood of learning anything.
• Too big: you have exposed more of them to harm than was necessary.
Why do sample size calculations?
Secondary benefit: Makes for better studies. Before you can do a sample size calculation, you will
have to:
• Define the scientific issue you are addressing.
• Translate the issue into research questions or hypotheses.
• Determine what data are needed.
• Formulate the questions or hypotheses in terms of parameters describing the distribution of the
data to be collected.
• Map out the statistical analysis plans
Cont…
The process of sample size calculation can substantially improve study design. It requires one to
think through:
• definition of the scientific issue
• how the scientific issue is being formulated as an empirical question
• sampling plan
• variables to be collected
• statistical analysis plan
• expected results
In general, if the details of implementation have been glossed over, this will become obvious
during sample size calculation.
Recall that the p-value from a hypothesis test can be used to
1. decide whether to reject the null hypothesis (reject if p-value less than α)
2. summarize the evidence against the null
For the purposes of designing a study we use the first method.
Typically α = 0.05.
When we run the study we can also interpret the p-value as evidence against the null.
Definitions:
The power of a test is the probability of the correct decision when the null hypothesis is false.
Power = Pr(reject H0 | H0 is false)
That is, the power is the probability of finding an effect when an effect exists.
Power = Pr(reject H0 | H0 is false)
= 1 − Pr(fail to reject H0 | H0 is false) = 1 − β
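The definition Power = Pr(reject H0 | H0 is false) can be illustrated by simulation (a sketch of my own, not from the slides): repeatedly draw two samples from populations that truly differ, test them, and count how often H0 is rejected. The values below assume a true standardized difference d = 0.5, n = 64 per group, and a normal-approximation z-test with known SD = 1.

```python
import math
import random
from statistics import NormalDist

def simulate_power(d=0.5, n=64, alpha=0.05, sims=2000, seed=1):
    """Estimate power by simulation: the fraction of simulated studies that
    reject H0 when the true standardized mean difference is d
    (two-sided z-test, both groups with known SD = 1)."""
    random.seed(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(sims):
        a = [random.gauss(0.0, 1.0) for _ in range(n)]
        b = [random.gauss(d, 1.0) for _ in range(n)]
        se = math.sqrt(2.0 / n)  # SE of the difference in means
        z = (sum(b) / n - sum(a) / n) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / sims

print(simulate_power())  # close to the theoretical power of about 0.80
```

With these settings the estimate lands near 0.80, i.e. β ≈ 0.20: about one in five such studies would miss a real effect of this size.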
Cont…
Overpowered:
If the sample size is too large the study will be able to detect very small differences.
This is a waste of money and time if the difference is so small it is scientifically or clinically
unimportant.
If the intervention is risky you have put too many individuals at risk.
Underpowered:
If the sample size is too small the study will be unable to detect differences that are scientifically or
clinically important.
The risk taken by the individuals in the study was unnecessary because the study was unlikely to
detect clinically important effects.
Also a waste of money and time.
Methods of calculating sample sizes
There are three main methods of estimating sample sizes: online calculators, software, and manual
calculation.
Online
There are websites that can be used for calculating sample sizes hosted by a number of
organisations and are open for the public such as
www.surveysystem.com/sscalc.htm
www.nss.gov.au/nss/home.nsf/pages/Sample+size+calculator
www.raosoft.com/samplesize.html
https://www.surveymonkey.com/mp/sample-size-calculator/
powerandsamplesize.com/
www.calculator.net
https://fluidsurveys.com/survey-sample-size-calculator/
Manual computation
Using formulae and calculators you can estimate the sample size required for your study such
as
n = Z² × P(1 − P) / e²  } Lorentz formula
where Z is the standard normal value for the desired confidence level, P is the expected
proportion, and e is the margin of error.
There are formulae for case control studies, comparison of proportions, comparison of means,
diagnostic accuracy and other descriptive studies. (see Hajian-Tilaki K. Journal of Biomedical
Informatics 2014; 48: 193–204).
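As a small worked sketch of the Lorentz formula (assuming the conventional values Z = 1.96 for 95% confidence, an expected proportion P = 0.5, and a margin of error e = 0.05):

```python
import math

def lorentz_n(z=1.96, p=0.5, e=0.05):
    """Sample size for estimating a proportion: n = Z^2 * P(1 - P) / e^2."""
    n = (z ** 2) * p * (1 - p) / (e ** 2)
    return math.ceil(n)  # always round up to a whole participant

print(lorentz_n())        # 385
print(lorentz_n(e=0.03))  # 1068 (tighter margin of error needs more people)
```

P = 0.5 is the conservative choice: it maximizes P(1 − P), so the resulting n is sufficient whatever the true proportion turns out to be.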
Software
Statistical software can also be used to calculate sample size e.g. Epi Info, SPSS, Stata, SAS,
R, Minitab etc.
Many of these give you little direct control over your type 1 and type 2 errors.
G*Power offers great control over your type 1 and type 2 errors and provides a more
accurate approach to estimating your sample size.
Introduction to G*Power
Many papers published in the scientific literature do not have enough power to support firm
conclusions.
G*Power is an easy-to-use program for performing various types of power analysis.
G*Power version 3.1.9.2 was written by Franz Faul, Universität Kiel, Germany.
It is the most widely used software for power analysis.
Types of power analyses
The two most common types are a priori and post-hoc power analysis.
a priori
An a priori analysis is done before a study takes place.
It is the ideal type of power analysis because it provides users with a method to control both
the type 1 error probability α and the type 2 error probability β
By implication, it also controls the power of the test, that is, the complement of the type-2
error probability (1 - β) (i.e., the probability of correctly rejecting H0 when it is in fact false).
An a priori analysis is used to determine the necessary sample size N of a test given a desired α
level, a desired power level (1 - β), and the size of the effect to be detected
i.e. a measure of the difference between the H0 and the H1
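An a priori calculation for a two-group comparison of means can be sketched with the common normal-approximation formula n per group = 2(z₁₋α/2 + z₁₋β)² / d² (my assumption here; G*Power itself uses the exact noncentral t distribution, so its answers run slightly larger):

```python
import math
from statistics import NormalDist

def a_priori_n(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means
    (normal approximation; G*Power's exact t-based result is slightly larger)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2
    return math.ceil(n)

print(a_priori_n(0.5))  # 63 per group (the exact t-test answer is 64)
```

Note how n scales with 1/d²: halving the effect size you wish to detect quadruples the required sample size.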
post-hoc analysis
A post-hoc analysis is typically performed after a study has been conducted, so the sample
size N is already a matter of fact. Given N, α, and a specified effect size, this type of analysis
returns the power (1 − β) and, equivalently, the type 2 error probability β of the test.
Obviously, post-hoc analyses are less ideal than a priori analyses because only α is controlled, not β.
Both β and its complement (1 - β) are assessed but not controlled in post-hoc analyses.
Thus, post-hoc power analyses can be characterized as instruments providing for a critical
evaluation of the (often surprisingly large) error probability β associated with a false decision in
favor of the H0.
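A post-hoc calculation under the same normal approximation (a sketch of my own, not G*Power's exact t-based algorithm) takes the achieved per-group n, α, and effect size d, and returns the approximate power, from which β follows:

```python
import math
from statistics import NormalDist

def post_hoc_power(n_per_group, d, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means,
    given per-group sample size and standardized effect size d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    ncp = d * math.sqrt(n_per_group / 2)  # noncentrality (normal approx.)
    # probability the test statistic falls in either rejection region
    return z.cdf(ncp - z_alpha) + z.cdf(-ncp - z_alpha)

p = post_hoc_power(30, 0.5)
print("power ~", round(p, 2), "so beta ~", round(1 - p, 2))
```

With 30 per group and d = 0.5, power is only about 0.5: even a genuine medium-sized effect would be missed roughly half the time, which is exactly the "surprisingly large β" the slide warns about.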
Getting to know G*Power
When you open G*Power for the first time, it presents three windows: the input parameters, the
output parameters, and a window presenting the distribution plot.
Above, you have the menu bar with commands such as File, Edit, View, Tests,
Calculator, etc.
Below the distribution plot window, you have three drop down menus; test family, statistical
test, type of power analysis.
Below the input and output parameters, you have commands to plot an X-Y graph for a range
of values.
How to use G*Power
1) Observational studies
• Descriptive studies
• Ecological studies (note the risk of the ecological fallacy)
• Cross-sectional studies
• Case-control studies
• Cohort studies
2) Experimental studies
• Randomised controlled trials
• Field trials
• Community trials
Effect size
An effect size is simply an objective and standardized measure of the magnitude of an observed
effect (Field, 2005).
The fact that the measure is standardized means that we can compare effect sizes across
different studies that have measured different variables, or have used different scales of
measurement.
The most common measures of effect sizes are Cohen’s d, and Pearson’s correlation
coefficient, r.
Others include Hedges’ g, Glass’ Δ, odds ratios and risk ratios.
1) Correlation coefficients (r)
r is constrained to lie between 0 (no effect) and 1 (a perfect effect), and can also be
used to express the difference between two means or groups.
r is related to the t in the t-test: r can be easily obtained from several common test
statistics.
For example, if a t-test has been used, r is a function of the observed t-value and the
degrees of freedom, df, on which it is based:
r = √( t² / (t² + df) )
When ANOVA has been used and an F-ratio is the test statistic, then when there is 1
degree of freedom for the effect, the following conversion can be used:
r = √( F(1, dfR) / (F(1, dfR) + dfR) )
in which F(1, dfR) is simply the F-ratio for the effect (which must have 1 degree of
freedom) and dfR is the degrees of freedom for the error term on which the F-ratio is
based.
r = 0.10 (small effect): in this case, the effect explains 1% of the total variance.
r = 0.30 (medium effect): the effect accounts for 9% of the total variance.
r = 0.50 (large effect): the effect accounts for 25% of the total variance.
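The t-to-r and F-to-r conversions described above, r = √(t²/(t² + df)) and r = √(F/(F + dfR)) for an effect with 1 numerator degree of freedom, can be sketched as small helpers:

```python
import math

def r_from_t(t, df):
    """Convert a t statistic to effect size r: r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

def r_from_f(f, df_error):
    """Convert an F ratio (1 numerator df) to r: r = sqrt(F / (F + dfR))."""
    return math.sqrt(f / (f + df_error))

print(round(r_from_t(2.0, 48), 3))  # 0.277
print(round(r_from_f(4.0, 48), 3))  # 0.277
```

The two calls agree because with 1 numerator degree of freedom F = t², so both conversions recover the same r from the same data.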
2) Cohen’s d
d = (M1 − M2) / pooled SD, where pooled SD = √( (SD1² + SD2²) / 2 )
Among 7th graders in Lowndes County Schools taking the CRCT reading exam (N = 336),
there was a statistically significant difference between the two teaching teams, team 1 (M
= 818.92, SD = 16.11) and team 2 (M = 828.28, SD = 14.09). Compute the effect size.
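A sketch of that computation, using the equal-n pooled SD, √((SD1² + SD2²)/2); this is an approximation here, since the exercise gives only the total N = 336 and not the size of each team:

```python
import math

def cohens_d(m1, sd1, m2, sd2):
    """Cohen's d with the equal-n pooled SD: sqrt((sd1^2 + sd2^2) / 2)."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m2 - m1) / pooled_sd

d = cohens_d(818.92, 16.11, 828.28, 14.09)
print(round(d, 2))  # 0.62
```

By Cohen's conventions (0.2 small, 0.5 medium, 0.8 large), d ≈ 0.62 is a medium-to-large effect: team 2 scored about six tenths of a standard deviation higher than team 1.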