A Review of Sample Size Determination for Common Experimental Designs: Further Simplified Equations

Article in An-Najah University Journal for Research - A (Natural Sciences) · January 2024
DOI: 10.35552/anujr.a.38.1.2120



An-Najah University Journal for Research – A

Natural Sciences
A Review of Sample Size Determination for Common
Experimental Designs: Further Simplified Equations
Received 27th May. 2023, Accepted 27th Aug. 2023, Published 1st Feb, 2024, DOI: 10.35552/anujr.a.38.1.2120

Jihad M. Abdallah1*

Abstract: Determination of the required sample size is an important step in the planning of any research study. A large number of commercial and online resources are readily available for calculation of sample size required for various research designs. However, this abundance of information, the complexity, and the variation in calculation formulations and terminology used make it more confusing to researchers, particularly those with limited statistical knowledge. Therefore, there is a need for more simplified, easy to implement formulas for calculation of sample size. Here, we present a short review of the rules for calculation of sample size for common experimental designs used in research and provide more simplified forms for some of these formulas. Also, data are presented on sample size required for various scenarios with the intent to provide guidelines for researchers. A simple-to-use Excel sheet was developed to perform sample size calculations and is available online as a supplementary file. This Excel sheet was used to generate the data presented in this publication.

Keywords: Effect Size, Multiple Testing, Required Sample Size, Statistical Power, Testing Means, Testing Proportions
Introduction
Sample size calculation is an important early step in any research, not only to attain sufficient statistical precision but also to make better use of available resources. If the sample used is too small, the study will not provide reliable answers to study questions (1). It will lack the statistical power to detect significant differences and effects, the data will not approximate well the underlying statistical distribution (normal or other), and the sample will lack sufficient representation for the results to accurately describe the population (2, 3). Increasing the sample size improves the validity and reliability of results and increases the statistical power to detect significant differences when they truly exist. However, if the sample utilized is too large, it is a poor use of resources and extends the time and effort required to finish the study. Furthermore, a sample larger than the required size may put more individuals at risk in certain interventions (4). It is therefore crucial for researchers to determine the required sample size before conducting research studies to ensure that the sample is large enough to draw meaningful conclusions without wasting available resources.

A large number of commercial and online resources are readily available for calculation of sample size required for various research designs. However, the abundance of information and the variation in the calculation formulations and the terminology used make it more confusing to researchers, particularly those with little statistical knowledge. Furthermore, a review by Meysamie et al., 2014 (5) showed that most of the online sample size calculators are limited to sample size calculation for estimating proportions and assume a fixed proportion of 0.50, and in certain cases, inaccurate calculations were obtained.

Simplification of formulas for sample size calculation allows the researcher to make a quick determination of sample size while avoiding the overwhelming statistical notations and the mathematical derivations (6). The main objectives of this work are to present a review of the formulas for calculation of sample size for common research designs, to provide these rules in the simplest possible forms, and to explore sample size for various scenarios (i.e., different values of power and effect size, etc.). In addition, an easy-to-use Excel sheet was developed by the author to implement these rules and is available as a supplementary file for interested users.

Factors affecting sample size and related statistical terminology
Many factors affect the calculation of the required sample size including the study design, one-sided or two-sided hypothesis testing, the sampling method, the type of population being sampled (homogeneous or heterogeneous), the dropout rate (or mortality rate), the nature of the outcome being measured (binary or continuous), the effect size, the statistical power, the significance level, and the variability in the population. The number of

1 Department of Animal Production & Animal Health, Faculty of Agriculture and Veterinary Medicine, An-Najah National University, Nablus, PO Box 7, Palestine
*Corresponding author: [email protected]

An - Najah Univ. J. Res. (N. Sc.) Vol. 38 (1), 2024 An-Najah National University, Nablus, Palestine
predictors in the regression model, R-square, and effect size are important factors to consider when determining sample size for regression analysis. For a good overview of factors influencing sample size determination, the reader is referred to other reviews in the literature (4, 7-11). Table (1) provides a summary of the main statistical terms related to sample size calculation and their effect on the required sample size.

Table (1): Summary of the statistical terminology and the main factors affecting the required sample size.
| Term or factor | Effect on sample size |
|---|---|
| Significance level, $\alpha$: the probability of Type 1 error, or the false positive rate. It is the probability of falsely rejecting the null hypothesis and is set by the researcher. The typical value used is 0.05. | The sample size increases as $\alpha$ decreases and vice versa. |
| Statistical power: the probability of rejecting a false null hypothesis of no effect or no difference, i.e., the ability of the statistical test to detect a true significant effect. Power = $1 - \beta$, where $\beta$ is the probability of Type 2 error (probability of not rejecting a false null hypothesis). A typical value of 80% is generally used by researchers for statistical power. | The sample size increases as statistical power increases and vice versa. |
| Effect size: the magnitude of the effect or difference to be tested; for example, the difference between the treatment and control groups. | The sample size decreases as the effect size increases and vice versa. |
| One-sided vs. two-sided hypothesis testing: a one-sided hypothesis tests if the parameter is larger or smaller than a hypothesized value while a two-sided hypothesis tests if the parameter is different from a specified value. | The required sample size is smaller for one-sided tests compared to two-sided tests (because $Z_{\alpha} < Z_{\alpha/2}$). |
| Population standard deviation, $\sigma$: quantifies the variability among units in the population. | As $\sigma$ increases, the required sample size increases and vice versa. |
| Population proportion, P: the portion of the population having the investigated characteristic (e.g., prevalence of the disease, proportion of smokers, etc.). | The sample size is maximum at P = 0.50 and decreases as P gets closer to 0 or 1. |
| Margin of error: refers to the level of precision required. It is half the width of the desired confidence interval. Also called the maximum error of the estimate and is defined as the maximum likely difference between the point estimate of the parameter and the true value of the parameter. | The sample size increases as the desired margin of error decreases and vice versa. |
| Nature of the sampled population: refers to how similar the sampling units in the population are (homogeneous or heterogeneous). | Homogeneous populations require smaller samples than less homogeneous populations because of their lower variability. |
| Dropout rate (or mortality rate): the percentage of the subjects or units in the sample who drop out or die during the course of the study. | The required sample size increases as the expected dropout rate increases and vice versa. |
| R-square: the proportion of the total variation in the dependent variable that is explained by the set of predictors in the regression model. | The required sample size increases when a higher R-square is deemed acceptable and vice versa. |
| Number of predictors (in the regression model) | The required sample size increases as the number of predictors increases and vice versa. |
| Type of study | The sample size required for descriptive studies (such as those based on surveys and questionnaires) is larger than that required for analytical studies. Observational studies need larger samples than experimental studies. |
| Qualitative vs. quantitative research | Quantitative research is generally based on larger samples than qualitative research. |
| Binary vs. continuous outcomes: binary outcomes involve outcomes with two categories (for example, yes/no or presence/absence responses). | Binary outcomes require larger sample size than continuous outcomes. |

Sample size calculation
Some of the early approaches to deal with sample size determination in experiments include Cochran and Cox (1957), Harris et al. (1948), Harter (1957), Tang (1938), and Tukey (1953) (12-16). Most approaches are based on detecting differences of a specified size or obtaining confidence intervals not larger than a stated width. The first approach is illustrated herein for determining sample size to test means, and the second approach is illustrated in determining sample size to test proportions. Other available approaches are also discussed.

Sample size calculation for testing means
Based on the first approach, a general formula to determine the minimum sample size required for testing means with a stated effect size is given by Steel et al., 1997 (3) as follows (with slightly modified notation and arrangement):

$$n = \frac{(Z_{\alpha/2} + Z_{\beta})^2}{\Delta^2 / \sigma_D^2} \tag{1}$$

where n is the sample size per group, $\alpha$ is the desired significance level (probability of Type 1 error, that is, the probability of falsely rejecting $H_0$), $\beta$ is the probability of Type 2 error (probability of not rejecting a false $H_0$, with the power of the test defined as $1 - \beta$), $\Delta$ is the true difference or effect size to be tested (e.g., $\mu_1 - \mu_0$ and $\mu_1 - \mu_2$ for a single group and two groups, respectively), and $\sigma_D^2$ depends on the research design and, hence, on the statistical test used. For the pre-test/post-test design (before-after design), $\sigma_D^2$ is the variance of the differences. For other designs, it is defined in terms of the error variance, $\sigma^2$ ($\sigma_D^2 = \sigma^2$ for the single-group design and $\sigma_D^2 = 2\sigma^2$ for designs involving two or more groups, including independent-groups designs and the randomized complete block design). The values $Z_{\alpha/2}$ and $Z_{\beta}$ are critical values obtained from the standard normal distribution such that $P(Z \geq Z_{\alpha/2}) = \alpha/2$ and $P(Z \geq Z_{\beta}) = \beta$. The typical values used by most researchers are 0.05 for $\alpha$ and 0.20 for $\beta$ (i.e., power = 80%). If a one-tailed test is desired instead of a two-tailed test, then $Z_{\alpha/2}$ is replaced by $Z_{\alpha}$ (in this case the required sample size will be smaller).

The main problem with this approach is that $\sigma^2$ is usually not known and an estimate is needed. If $\sigma^2$ is underestimated, n is too small, and if $\sigma^2$ is overestimated, n is too large (3). The problem is solved by defining $\Delta$ in terms of $\sigma$, i.e., using a standardized effect size. Therefore, if we define the standardized effect size $\delta = \Delta/\sigma$ (or $\delta = \Delta/\sigma_D$ for the pre-test/post-test design), then Equation (1) becomes:

$$n = \frac{k\,(Z_{\alpha/2} + Z_{\beta})^2}{\delta^2} \tag{2}$$

where k = 1 for single-group and pre-test/post-test designs, and k = 2 for the independent-groups (e.g., the completely randomized design) and the randomized complete block designs. If we apply the typical values of 0.05 for $\alpha$ ($Z_{\alpha} = 1.64$, $Z_{\alpha/2} = 1.96$) and 0.20 for $\beta$ ($Z_{\beta} = 0.84$), then Equation (2) is further simplified to:

$$n = \frac{7.85\,k}{\delta^2} \quad \text{for two-tailed tests} \tag{3}$$

and

$$n = \frac{6.15\,k}{\delta^2} \quad \text{for one-tailed tests} \tag{4}$$

Therefore, for two-tailed tests, $n = 7.85/\delta^2$ for single samples and before-after designs, and $n = 15.70/\delta^2$ for independent-groups and randomized complete block designs. Allen, 2011 (6) suggested using $n = 16/\delta^2$ as a rule of thumb for calculating sample size for two independent groups (two-tailed t-test).

Because the critical values are obtained from the standard normal distribution, a correction must be made to the sample size to account for the t-distribution as follows (3):

$$n^* = n\,\frac{df + 3}{df + 1} \tag{5}$$

where df is the error degrees of freedom for the specified design. Both n and n* are rounded up to the next integer.

Sample size calculation for testing proportions
The sample size for testing proportions is usually determined based on obtaining confidence intervals not larger than a stated width. For testing a population proportion (e.g., disease prevalence) using a single sample, the following formula is used (17):

$$n = \frac{(Z_{\alpha/2} + Z_{\beta})^2\,[P(1 - P)]}{d^2} \tag{6}$$

where P is the assumed population proportion and d is the margin of error, which is equal to the half-width of the confidence interval with a desired $(1 - \alpha)100\%$ confidence coefficient. The problem here is that an estimate of P is required. If no previous information is available on P, the researcher can use P = 0.50, which results in the maximum sample size ($P(1 - P)$ is maximum when P = 0.50).

Equation (6) assumes that sampling is from an infinite population. The sample size is corrected for finite population size as follows (17):

$$n^* = \frac{nN}{n + (N - 1)} \tag{7}$$

where N is the population size. Note that n* is smaller than n, that is, the correction results in a smaller sample size because sampling from a finite population is more efficient than sampling from an infinite population. The correction can be ignored when the sampling fraction (n/N) is small.

For testing the difference between two proportions (two independent samples), the following formula is used (18-20):

$$n = \frac{(Z_{\alpha/2} + Z_{\beta})^2\,[P_1(1 - P_1) + P_2(1 - P_2)]}{(P_1 - P_2)^2} \tag{8}$$

where n is the sample size per group, and $P_1$ and $P_2$ are the assumed proportions in group 1 and group 2, respectively. A correction for finite population sizes is performed as follows:

$$n^* = \frac{(Z_{\alpha/2} + Z_{\beta})^2\,[f_1 P_1(1 - P_1) + f_2 P_2(1 - P_2)]}{(P_1 - P_2)^2} \tag{9}$$

where $f_1 = \frac{N_1 - n}{N_1 - 1}$ and $f_2 = \frac{N_2 - n}{N_2 - 1}$, with $N_1$ and $N_2$ the sizes of the populations being sampled. As was pointed out for a single proportion, the correction for finite population sizes can be ignored when the ratio of the sample size to the population size is small (e.g., less than 0.02).

Sample size considerations for multiple testing
In many experiments, the researcher is interested in making pairwise comparisons among several treatment groups, so multiple tests end up being performed in a single experiment. This raises the issue of the Type 1 error rate in multiple testing. In the previous sections, the significance level, $\alpha$, was defined as the probability of Type 1 error for a single comparison (single test). If m independent comparisons are performed, then the probability that at least one null hypothesis is falsely rejected is $1 - (1 - \alpha)^m$, which is larger than $\alpha$. This is usually called the family-wise error rate, FWER (3, 21). For example, if the researcher wishes to perform five independent pairwise comparisons, then FWER = 0.226 for $\alpha = 0.05$. Several approaches have been proposed to correct the significance level for multiple testing. The most common are the Sidak correction (also called the Sidak-Dunn correction) and the Bonferroni correction. Based on the Sidak correction (22, 23), if a family-wise error rate of $\alpha'$ is desired (typically $\alpha' = 0.05$), then the required significance level for each individual test is calculated as $\alpha = 1 - (1 - \alpha')^{1/m}$, while the Bonferroni correction (22, 24) uses $\alpha = \alpha'/m$; both give a value of approximately 0.01 for m = 5 and $\alpha' = 0.05$. In this particular example, the researcher will use a significance level of 0.01 instead of 0.05 in calculation of the sample size required to control the family-wise error rate.

The Sidak and the Bonferroni corrections are considered very conservative when the number of tests is large and when the tests are not independent (22). Therefore, alternative approaches have been proposed to make the correction less stringent (e.g., 25-27). The procedure by Benjamini and Hochberg,

1995 (25) is based on controlling the false discovery rate (FDR) instead of the family-wise error rate (22). If the pairwise comparisons are not independent, as in most experiments, one can use the correction suggested by John W. Tukey (28): $\alpha = 1 - (1 - \alpha')^{1/\sqrt{m}}$. This results in a larger $\alpha$ (0.02 compared to 0.01 for the example above), and hence a smaller sample size is required than when independence is assumed (21).

Sample size for other research designs
The focus in this review was on some widely used experimental designs, particularly in agricultural, environmental, social and life sciences. Other designs like case-control and cohort studies are very common in medical studies. Interested readers may consult other reviews (e.g., 29, 30) for further details on sample size determination for such designs. Furthermore, several online tools are available to determine sample size requirements for such designs (e.g., Epitools-Epidemiological Calculators site available at: http://epitools.ausvet.com.au/samplesize). Allen, 2011 (6) extended the formula used for two independent samples to accommodate the repeated measures design. Crossover designs (within-subject designs where the same subject receives different treatments during different periods in random order) are also popular in medical trials. Siyasinghe and Sooriyarachchi, 2011 (31) provided guidelines for calculating sample size in 2x2 crossover trials while Moxely, 2021 (32) provided a review of sample size and efficacy of these designs. Dharmarajan et al., 2019 (33) proposed two new sample size estimation methods that provide a more accurate estimate of the true required sample size for case-crossover studies than the traditionally used Dupont formula, which was originally developed for matched case-control studies (34).

Regression analysis is very important in many research fields, particularly agricultural, environmental and social sciences. Several methods have been proposed to deal with sample size determination for both linear and logistic regression analyses (35-39).

The equations presented herein for testing two proportions assume independent samples. Conor, 1987 (40) presented sample size calculations for testing differences in proportions for the paired-sample design. Divine et al., 2013 (41) reviewed sample size calculations for different Wilcoxon tests (nonparametric tests). In addition, the present review assumed equal group sizes; readers are referred to the review by Whitley and Ball, 2002 (11), where unequal group sizes were also described.

Other statistical methods for calculation of sample size
In the previous sections, the focus was on formula-based techniques for calculating the required sample size (closed-form solutions). These widely used methods for sample size determination are based on the frequentist approach where prior point estimates need to be specified. One problem is that we are generally uncertain about these prior estimates and this uncertainty is not accounted for by the frequentist methods. In contrast, Bayesian methods (e.g., 42-45) can deal with the uncertainty associated with prior information by replacing the prior point estimate by a prior distribution, which is then updated to a posterior distribution using Bayes' rule. Brus et al., 2022 (46) provided an excellent overview of both approaches with application to determining the number of sampling locations required for soil survey.

Another approach for estimating sample size is by simulation. The basis for simulation methods is the general approach for estimating power presented by Feiveson, 2002 (47). Simulation-based methods are iterative techniques which start by generating a data set with an initial size from a given distribution and then calculating the statistical power (or any other criterion like minimum error, R-square, etc.). The sample size is then iteratively modified, repeating the power calculation, until a sample size with a certain desired value of power is reached (48, 49). Simulation-based determination of sample size can be implemented with code using existing standard statistical software (50). However, when complex models are involved, specialized complex algorithms are required. The major disadvantage of Bayesian methods and simulation-based methods is that they are computationally intensive, which has restricted their application.

Software and web applications for determination of sample size
Many software programs and web applications are freely available for calculation of sample size. Researchers should be careful about which resources to use, as some online calculators may give erroneous calculations as outlined by Meysamie et al., 2014 (5). Researchers also need to choose the most relevant resource for the type and design of the study they intend to carry out. Some widely used software and online calculators (commercial and free) are listed in Table (2) with information on calculation methods and where to access them.
Table (2): Examples of available software and web applications for calculation of sample size
| Software or web application | Determination method | Where to access |
|---|---|---|
| Epitools-Epidemiological calculators (Free web application) | Formula-based (51) | http://epitools.ausvet.com.au |
| Select Statistical Services Calculators (Free web application) | Formula-based | https://select-statistics.co.uk/calculators/ |
| G*Power (Free software) | Simulation (52, 53) | https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower |
| Scalex and ScalaR calculators (Free software) | Simulation (54) | https://sites.google.com/view/sr-ln/ssc |
| Psychometroscar (Free software) | Simulation (39) | https://psychometroscar.com/2018/07/31/power-analysis-for-multilevel-logistic-regression/ |
| nQuery (Commercial software) | Multiple (Formula-based, Bayesian, and adaptive design) | www.statsols.com/nquery |
| Power and Precision (Commercial software) | Power analysis software | https://www.power-analysis.com/ |
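The simulation-based approach described in the text above (generate data of a trial size, estimate power, enlarge the sample, repeat) can be sketched in a few lines. This is only an illustration under stated assumptions, not the method of any tool listed in Table (2): two independent groups are drawn from normal distributions with unit variance, the test is a two-sided z-test with known variance rather than a t-test, and the function names, the number of simulated data sets, and the seed are hypothetical choices.

```python
import random
from statistics import NormalDist, fmean


def simulated_power(n, delta, alpha=0.05, n_sims=2000, rng=None):
    """Estimate power as the fraction of simulated data sets in which a
    two-sided z-test rejects H0 (assumes normal data with sigma = 1)."""
    rng = rng or random.Random(0)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(delta, 1.0) for _ in range(n)]
        # Standard error of the difference of two means with sigma = 1.
        z_stat = (fmean(treated) - fmean(control)) / (2.0 / n) ** 0.5
        if abs(z_stat) >= z_crit:
            rejections += 1
    return rejections / n_sims


def simulated_sample_size(delta, target_power=0.80, alpha=0.05,
                          n_start=2, n_max=1000, seed=1):
    """Increase n one unit at a time until the simulated power reaches
    the target; returns the first n per group that does so."""
    rng = random.Random(seed)
    for n in range(n_start, n_max + 1):
        if simulated_power(n, delta, alpha, rng=rng) >= target_power:
            return n
    raise ValueError("target power not reached within n_max")


if __name__ == "__main__":
    print(simulated_sample_size(1.0))
```

Because power is estimated by Monte Carlo, the returned size varies slightly with the seed and the number of simulated data sets. For a standardized effect size of 1 it should fall close to the value given by Equation (2) with k = 2.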

Illustration data
Testing means
Figure (1) shows the required sample size for pre-test/post-test designs and for various scenarios of power and standardized effect size for one-tailed and two-tailed tests. A significance level of 0.05 was used in the calculations. The graph illustrates that the required sample size is larger for two-tailed tests compared to one-tailed tests and increases with increased power and for smaller effect size. However, power becomes less important as the effect size exceeds 1.5 standard deviations.

Figure (1): Sample size calculations for pre-test/post-test designs. Calculations were based on a significance level of 0.05.
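Scenarios of the kind plotted in Figure (1) can be tabulated directly from Equation (2). The sketch below is an illustration, not the author's Excel sheet: the function name and defaults are assumptions, it uses unrounded normal quantiles from Python's standard library, and it omits the t-distribution correction of Equation (5). With unrounded quantiles the two-tailed constant is about 7.85, as in Equation (3), while the one-tailed constant comes out near 6.18, slightly above the 6.15 obtained from the rounded values 1.64 and 0.84.

```python
import math
from statistics import NormalDist


def sample_size_means(delta, k=1, alpha=0.05, power=0.80, two_tailed=True):
    """Equation (2): n = k (Z_alpha/2 + Z_beta)^2 / delta^2, rounded up.

    delta is the standardized effect size; k = 1 for single-group and
    pre-test/post-test designs, k = 2 for independent-groups and
    randomized complete block designs.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_tailed else z(1 - alpha)
    z_beta = z(power)
    return math.ceil(k * (z_alpha + z_beta) ** 2 / delta ** 2)


if __name__ == "__main__":
    # Pre-test/post-test design (k = 1), significance level 0.05.
    for power in (0.80, 0.90, 0.95):
        for delta in (0.25, 0.5, 1.0, 1.5, 2.0):
            one = sample_size_means(delta, power=power, two_tailed=False)
            two = sample_size_means(delta, power=power, two_tailed=True)
            print(f"power={power:.2f} delta={delta:.2f} "
                  f"one-tailed n={one} two-tailed n={two}")
```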
Table (3) shows sample size calculations for independent-groups designs (e.g., the completely randomized design) and the randomized complete block design with different numbers of treatments or groups and effect sizes. Calculations were based on a two-tailed test, a significance level of 0.05 and power of 80% ($\beta = 0.20$). The number of treatments has almost no effect on the sample size requirement. Furthermore, both types of designs require closely similar sample sizes for the same number of treatments and effect size. However, because randomized block designs are more efficient than independent-groups designs (less error variance is expected due to blocking), researchers can assume a larger effect size when determining sample size for randomized block designs. For a two-tailed test and typical power of 0.80, the group size required to detect an effect size of 0.5 to 3 standard deviations varies from 3 to 65 replicates per treatment.
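Table (3): Sample size calculations for the independent-groups design and the randomized complete block design for various scenarios of effect size and number of treatments.
Entries are n* (a) by number of treatments or groups (2 to 7) for the independent-groups design (IG) and the randomized complete block design (RCB).
| Effect size | IG 2 | IG 3 | IG 4 | IG 5 | IG 6 | IG 7 | RCB 2 | RCB 3 | RCB 4 | RCB 5 | RCB 6 | RCB 7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.50 | 64 | 64 | 64 | 64 | 64 | 64 | 65 | 64 | 64 | 64 | 64 | 64 |
| 0.75 | 29 | 29 | 29 | 29 | 29 | 29 | 30 | 29 | 29 | 29 | 29 | 29 |
| 1.00 | 17 | 17 | 17 | 17 | 17 | 16 | 18 | 17 | 17 | 17 | 17 | 17 |
| 1.25 | 12 | 11 | 11 | 11 | 11 | 11 | 12 | 12 | 11 | 11 | 11 | 11 |
| 1.50 | 9 | 8 | 8 | 8 | 8 | 8 | 9 | 9 | 8 | 8 | 8 | 8 |
| 1.75 | 7 | 6 | 6 | 6 | 6 | 6 | 7 | 7 | 6 | 6 | 6 | 6 |
| 2.00 | 6 | 5 | 5 | 5 | 5 | 5 | 6 | 6 | 5 | 5 | 5 | 5 |
| 2.25 | 4 | 4 | 4 | 4 | 4 | 4 | 5 | 4 | 4 | 4 | 4 | 4 |
| 2.50 | 4 | 4 | 4 | 3 | 3 | 3 | 5 | 4 | 4 | 4 | 3 | 3 |
| 2.75 | 3 | 3 | 3 | 3 | 3 | 3 | 4 | 3 | 3 | 3 | 3 | 3 |
| 3.00 | 3 | 3 | 3 | 3 | 3 | 3 | 4 | 3 | 3 | 3 | 3 | 3 |
(a) n* = minimum number of replicates per treatment assuming $\alpha = 0.05$, $\beta = 0.20$ (power = 80%), and two-tailed tests.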
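The entries of Table (3) combine Equation (2) with the t-distribution correction of Equation (5). The following sketch shows one way to carry out that calculation. Several details are assumptions rather than statements from the article: the error degrees of freedom are taken as t(n − 1) for the independent-groups design and (t − 1)(n − 1) for the randomized complete block design (the usual ANOVA values for t treatments and n replicates), df is computed from n rounded up, and the correction is applied to the unrounded n before the final rounding. Under these assumptions the sketch matched the Table (3) entries that were spot-checked, but the article does not state its exact rounding rule.

```python
import math
from statistics import NormalDist


def corrected_replicates(delta, treatments, design="independent",
                         alpha=0.05, power=0.80):
    """Replicates per treatment from Equation (2) with k = 2 (two-tailed),
    followed by the correction of Equation (5): n* = n (df + 3) / (df + 1)."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / delta ** 2  # unrounded n
    n_int = math.ceil(n)  # whole replicates, used only to obtain df
    if design == "independent":
        df = treatments * (n_int - 1)          # assumed error df
    elif design == "rcb":
        df = (treatments - 1) * (n_int - 1)    # assumed error df
    else:
        raise ValueError("design must be 'independent' or 'rcb'")
    return math.ceil(n * (df + 3) / (df + 1))


if __name__ == "__main__":
    for delta in (0.5, 1.0, 2.0, 3.0):
        ig = [corrected_replicates(delta, t) for t in range(2, 8)]
        rcb = [corrected_replicates(delta, t, "rcb") for t in range(2, 8)]
        print(delta, ig, rcb)
```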

Testing proportions
As shown in Figure (2), the required sample size is largely affected by the assumed value of P. It is maximum at P = 0.50 and decreases symmetrically as P gets closer to 0 or 1. Therefore, if no prior information is available on P, then the researcher can safely assume P = 0.50. Power has a large impact on the required sample size when P is intermediate, but its effect diminishes as P approaches 0 or 1.
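The dependence on P described above follows directly from Equation (6), and the finite population correction is Equation (7). A minimal sketch of both is given below. The function names and defaults (d = 0.05, a significance level of 0.05, power of 80%) are illustrative assumptions, and the sketch is not the author's Excel sheet.

```python
import math
from statistics import NormalDist


def single_proportion_n(p, d=0.05, alpha=0.05, power=0.80):
    """Equation (6): n = (Z_alpha/2 + Z_beta)^2 P(1 - P) / d^2, rounded up.
    Assumes an infinite population and a two-tailed test."""
    z = NormalDist().inv_cdf
    return math.ceil((z(1 - alpha / 2) + z(power)) ** 2 * p * (1 - p) / d ** 2)


def finite_population_n(n, population_size):
    """Equation (7): n* = n N / (n + (N - 1)), returned unrounded."""
    return n * population_size / (n + population_size - 1)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"P={p:.1f} n={single_proportion_n(p)}")
    for big_n in (200, 1000, 5000, 100000):
        print(f"N={big_n} n*={finite_population_n(100, big_n):.2f}")
```

The first loop shows the maximum at P = 0.50 and the symmetric decrease toward 0 and 1. The second shows n* approaching n = 100 as the sampling fraction n/N shrinks.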

Figure (2): Sample size required for testing a single population proportion for various values of power and P. Calculations were made using $\alpha = 0.05$ (95% CI), d = 0.05, and assuming a two-tailed test and infinite population size.

Figure (3) shows the corrected sample size for n = 100 and various values of N. Correction for finite population size can be neglected only when the sample size is very small compared to the population size, i.e., when the sampling fraction, n/N, is small (less than 0.02 in the simulated graph).

Figure (3): Corrected sample size (n*) in relation to population size (N). Calculations were made for n = 100 with $n^* = nN/[n + (N - 1)]$.

The sample size required per group depends largely on $P_1 - P_2$ (the required sample size decreases as the difference increases and vice versa), while power has little impact when $P_1 - P_2$ exceeds 0.25 (Figure 4). The sample sizes presented in Figure (4) were calculated assuming infinite population sizes.

Figure (4): Sample size required per group for testing two proportions for various values of power and $P_1 - P_2$. Calculations were made using $\alpha = 0.05$ (95% CI), and assuming a two-tailed test and infinite population sizes.

Multiple testing
Table (4) shows the values of the family-wise error rate when no adjustment is made for multiple testing and the corrected significance level, $\alpha$, that needs to be used for pairwise comparisons when an overall error rate of 5% ($\alpha' = 0.05$) is desired. The data are shown for up to twenty multiple comparisons based on the Sidak and Bonferroni corrections for independent tests and the Tukey correction for dependent tests. The data demonstrate that the required significance level gets smaller as the number of comparisons increases, but the adjustment is less stringent (the required significance level is larger) if we assume dependent tests compared to independent tests. Note also that the corrected significance level values are very similar for both Sidak and Bonferroni corrections for independent tests.

Table (4): Required significance level in calculation of sample size to control for the family-wise error rate in multiple testing.
| Number of comparisons | Family-wise error rate (a) | Corrected $\alpha$ (b): Sidak correction | Corrected $\alpha$ (b): Bonferroni correction | Corrected $\alpha$ (b): Tukey correction |
|---|---|---|---|---|
| 1 | 0.0500 | 0.0500 | 0.0500 | 0.0500 |
| 2 | 0.0975 | 0.0253 | 0.0250 | 0.0356 |
| 3 | 0.1426 | 0.0170 | 0.0167 | 0.0292 |
| 4 | 0.1855 | 0.0127 | 0.0125 | 0.0253 |
| 5 | 0.2262 | 0.0102 | 0.0100 | 0.0227 |
| 6 | 0.2649 | 0.0085 | 0.0083 | 0.0207 |
| 7 | 0.3017 | 0.0073 | 0.0071 | 0.0192 |
| 8 | 0.3366 | 0.0064 | 0.0063 | 0.0180 |
| 9 | 0.3698 | 0.0057 | 0.0056 | 0.0170 |
| 10 | 0.4013 | 0.0051 | 0.0050 | 0.0161 |
| 11 | 0.4312 | 0.0047 | 0.0045 | 0.0153 |
| 12 | 0.4596 | 0.0043 | 0.0042 | 0.0147 |
| 13 | 0.4867 | 0.0039 | 0.0038 | 0.0141 |

Number of     Family-wise     Corrected significance level (𝜶) b
comparisons   Error Rate a    Sidak correction   Bonferroni correction   Tukey correction
14            0.5123          0.0037             0.0036                  0.0136
15            0.5367          0.0034             0.0033                  0.0132
16            0.5599          0.0032             0.0031                  0.0127
17            0.5819          0.0030             0.0029                  0.0124
18            0.6028          0.0028             0.0028                  0.0120
19            0.6226          0.0027             0.0026                  0.0117
20            0.6415          0.0026             0.0025                  0.0114
a Family-wise error rate if no correction is made for multiple tests = 1 − (1 − 𝛼)^𝑚 for m independent pairwise tests.
b Calculations of 𝛼 were made to control the family-wise error rate at the 0.05 level (𝛼′ = 0.05).

Reporting sample size determination by researchers
It is important in this review to highlight the need for transparent reporting of sample size calculations in research articles. A review of published randomised controlled trials (55) concluded that sample size calculation is still inadequately reported and is often erroneous or based on assumptions that are frequently inaccurate. Guidelines and recommendations have been developed for reporting the outcomes of scientific studies, including sample size calculations, such as the Consolidated Standards of Reporting Trials (CONSORT) 2010 statement developed for randomized controlled trials (56, 57) and its 2022 extension (58), and the SPIRIT 2013 statement for clinical trial protocols (59, 60). Researchers need to adhere to these guidelines and ensure accurate and complete reporting of sample size determination and of the resources used in the calculations, which will contribute to improving the quality of scientific publications.

Emerging trends and future directions
Sample size calculation will continue to be an important issue due to its major impact on the results of research. Emerging trends in sample size calculation include adaptive trial designs (which involve re-estimation of sample size based on accumulating interim data to achieve the desired power) and implementation of the “promising zone” design in clinical trials. The promising zone design was first described by Mehta and Pocock, 2011 (61) based on earlier work by Chen et al., 2004 (62). For further details on adaptive designs and the promising zone design, the reader is referred to the reviews by Pallmann et al., 2018 (63) and Edwards et al., 2020 (64), respectively. The extension of these designs to other types of studies should be investigated in the future. Developing easy-to-use software programs for implementing Bayesian and simulation-based methods while reducing their computational burden will facilitate their practical application by researchers. A comprehensive study of available online sample size calculators (their features, advantages and limitations) to filter out unreliable resources would help researchers to avoid inaccurate calculations.

Conclusion
This work presented an overview of the methods for calculating the required sample size for common experimental designs and presented the calculation equations in the simplest possible forms. In addition, the calculations were illustrated with simulated data. This review, along with the developed Excel calculation sheet, will make it easier for researchers to understand, choose, apply, and correctly report the appropriate sample size calculations for their studies.

Declarations
Ethics approval and consent to participate: Not applicable.
Consent for publication: Not applicable.
Author’s contribution: Not applicable.
Availability of data and materials: All data generated during this study are included in this published article. The Excel sheet used to generate the data is available online as a supplementary file.
Funding: No funding has been received for this work.
Conflict of interests: The author declares that there is no conflict of interests regarding the publication of this article.
Acknowledgments: The author would like to acknowledge the logistic support provided by An-Najah National University (www.najah.edu).

References
1) Fitzner C, Heckinger E. Sample size calculation and power analysis: A quick review. The Diabetes Educator. 2010;36(5):701-707. https://doi.org/10.1177/0145721710380791
2) Di Lorio CK. Review of statistical concepts. In: Measurement in health behavior: Methods for research and evaluation. San Francisco: Jossey-Bass; 2005:153-154.
3) Steel RGD, Torrie JH, Dickey DA. Principles and Procedures of Statistics: a biometrical approach. 3rd edition, NY: McGraw Hill Inc.; 1997.
4) Gumpili SP, Das AV. Sample size and its evolution in research. IHOPE J Ophthalmol. 2022;1(1):9-13. https://doi.org/10.25259/IHOPEJO_3_2021
5) Meysamie A, Taee F, Mohammadi-Vajari M-A, Yoosefi-Khanghah S, Emamzadeh-Fard S, Abbassi M. Sample size calculation on web, can we rely on the results? J Med Stat Inform. 2014;2(3):1-8. http://dx.doi.org/10.7243/2053-7662-2-3
6) Allen JC. Sample size calculation for two independent groups: A useful rule of thumb. Proceedings of Singapore Healthcare. 2011;20(2):138-140.
7) Al-Subaihi A. Sample size determination: Influencing factors and calculation strategies for survey research. Saudi Med J. 2003;24(4):323-330.
8) Althubaiti A. Sample size determination: A practical guide for health researchers. J Gen Fam Med. 2023;24:72–78. https://doi.org/10.1002/jgf2.600
9) Chander NG. Sample size estimation. J Indian Prosthodont Soc. 2017;17(3):217-218. https://doi.org/10.4103/jips.jips_169_17
10) Das S, Mitra K, Mandal M. Sample size calculation: Basic principles. Indian J Anaesth. 2016;60:652-656. https://doi.org/10.4103/0019-5049.190621
11) Whitley E, Ball J. Statistics review 4: Sample size calculations. Critical Care. 2002;6:335-341.
12) Cochran WG, Cox GM. Experimental designs. 2nd ed. New York: Wiley; 1957.
13) Harris M, Harvits DG, Mood AM. On the determination of sample sizes in designing experiments. J Amer Statist Ass. 1948;43:391-402.
14) Harter HL. Error rates and sample sizes for range tests in multiple comparisons. Biometrics. 1957;13:511-536.
15) Tang PC. The power function of the analysis of variance tests with tables and illustrations of their use. Statist Res Mem. 1938;2
16) Tukey JW. The problem of multiple comparisons. Mimeograph, Princeton University, NJ; 1953.
17) Daniel WW. Biostatistics: A foundation for analysis in the health sciences. 7th edition. New York: John Wiley & Sons; 1999.
18) Chow SC, Shao J, Wang H. Sample Size Calculations in Clinical Research, Second Edition. Boca Raton: Chapman & Hall/CRC; 2008.
19) Miot HA. Sample size in clinical and experimental trials. J Vasc Bras. 2011;10(4):275-278.
20) Wang H, Chow SC. Sample Size Calculation for Comparing Proportions. In: Wiley Encyclopedia of Clinical Trials, 10. John Wiley & Sons, Inc.; 2007. Available from: https://doi.org/10.1002/9780471462422.eoct005.
21) McConnell B, Vera-Hernandez M. Going Beyond Simple Sample Size Calculations: a Practitioner’s Guide. IFS Working Paper W15/17.
22) Abdi H. The Bonferroni and Sidak corrections for multiple comparisons. In: Salkind NJ (ed) Encyclopedia of measurement and statistics. Thousand Oaks: Sage; 2007.
23) Šidák ZK. Rectangular confidence regions for the means of multivariate normal distributions. J Amer Statist Ass. 1967;62(318):626–633. https://doi.org/10.1080/01621459.1967.10482935
24) Bonferroni CE. Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del Istituto Superiore di Scienze Economiche e Commerciali di Firenze. 1936;8:3–62.
25) Holm S. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics. 1979;6:65–70.
26) Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika. 1988;75:800–803.
27) Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Statist Soc B. 1995;57(1):289–300.
28) Braun HIE. The collected works of John W Tukey Vol. VIII. Multiple comparisons: 1948-1983. New York: Chapman & Hall; 1994.
29) Charan J, Biswas T. How to calculate sample size for different study designs in medical research? Indian J Psychol Med. 2013;35(2):121-126. https://doi.org/10.4103/0253-7176.116232
30) Sharma SK, Mudgal SK, Thakur K, Gaur R. How to calculate sample size for observational and experimental nursing research studies? Natl J Physiol Pharm Pharmacol. 2019;10(1):1-8. https://doi.org/10.5455/njppp.2020.10.0930717102019
31) Siyasinghe NM, Sooriyarachchi MR. Guidelines for calculating sample size in 2x2 crossover trials: a simulation study. J Natn Sci Foundation Sri Lanka. 2011;39(1):77-89.
32) Moxley KC. A review of sample size and design efficacy in crossover design in peer-reviewed psychology research. Wayne State University Dissertations. 2021;3548. https://digitalcommons.wayne.edu/oa_dissertations/3548
33) Dharmarajan S, Lee J-Y, Izem R. Sample size estimation for case-crossover studies. Stat Med. 2019;38(2):956-968. https://doi.org/10.1002/sim.8030
34) Dupont WD. Power calculations for matched case-control studies. Biometrics. 1988;44(4):1157-1168.
35) Dupont WD, Plummer WD. Power and sample size calculations for studies involving linear regression. Control Clin Trials. 1998;19(6):589–601. https://doi.org/10.1016/S0197-2456(98)00037-3
36) Green SB. How many subjects does it take to do a regression analysis. Multivariate Behavioral Research. 1991;26(3):499-510. https://doi.org/10.1207/s15327906mbr2603_7
37) Hsieh FY, Bloch DA, Larsen MD. A simple method of sample size calculation for linear and logistic regression. Stat Med. 1998;17:1623–1634. https://doi.org/10.1002/(SICI)1097-0258(19980730)17:14
38) Maxwell SE. Sample size and multiple regression analysis. Psychol Methods. 2000;5(4):434–458. https://doi.org/10.1037/1082-989X.5.4.434
39) Olvera Astivia OL, Gadermann A, Guhn M. The relationship between statistical power and predictor distribution in multilevel logistic regression: a simulation-based approach. BMC Med Res Methodol. 2019;19:97, 1-20. https://doi.org/10.1186/s12874-019-0742-8
40) Connor RJ. Sample size for testing differences in proportions for the paired-sample design. Biometrics. 1987;43(1):207–211. https://doi.org/10.2307/2531961
41) Divine GH, Norton J, Hunt R, Dienemann J. A review of analysis and sample size calculation considerations for Wilcoxon tests. Anesthesia & Analgesia. 2013;117(3):699-710. https://doi.org/10.1213/ANE.0b013e31827f53d7
42) Adcock CJ. The Bayesian approach to determination of sample sizes: Some comments on the paper by Joseph, Wolfson and du Berger. The Statistician. 1995;44:155–161.
43) Adcock CJ. Sample size determination: a review. J R Statist Soc D (The Statistician). 1997;46(2):261-283.
44) Joseph L, Belisle P. Bayesian sample size determination for normal means and differences between normal means. The Statistician. 1997;46:209–226.
45) Wang F, Gelfand AE. A simulation-based approach to Bayesian sample size determination for performance under a given model and for separating models. Statistical Science. 2002;17(2):193–208. Available from: http://www.jstor.org/stable/3182824
46) Brus DJ, Kempen B, Rossiter D, Balwinder-Singh, McDonald AJ. Bayesian approach for sample size determination, illustrated with Soil Health Card data of Andhra Pradesh (India). Geoderma. 2022;405:115396:1-10. https://doi.org/10.1016/j.geoderma.2021.115396
47) Feiveson AH. Power by simulation. Stata Journal. 2002;2:107–124.
48) Sutton AJ, Donegan S, Takwoingi Y, et al. An encouraging assessment of methods to inform priorities for updating systematic reviews. J Clin Epidemiol. 2009;62:241–251. https://doi.org/10.1016/j.jclinepi.2008.04.005
49) Growther MJ, Hinchliffe SR, Donald A, Sutton AJ. Simulation-based sample-size calculation for designing new clinical trials and diagnostic test accuracy studies to update an existing meta-analysis. The Stata Journal. 2013;13(3):451–473.
50) Zhao W, Li AX. Generalized approach to estimating sample sizes. SAS Global Forum 2012 (Statistical Data Analysis), Paper 336:1-7.
51) Sergeant ESG. Epitools Epidemiological Calculators. Ausvet. Available at: http://epitools.ausvet.com.au
52) Faul F, Erdfelder E, Lang A, Buchner A. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods. 2007;39(2):175-191. https://doi.org/10.3758/bf03193146
53) Faul F, Erdfelder E, Buchner A, Lang A. Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods. 2009;41(4):1149-1160. https://doi.org/10.3758/brm.41.4.1149
54) Naing L, Bin Nordin R, Abdul Rahman H, Naing YT. Sample size calculation for prevalence studies using Scalex and ScalaR calculators. BMC Medical Research Methodology. 2022;22:209. https://doi.org/10.1186/s12874-022-01694-7
55) Charles P, Giraudeau B, Dechartres A, Baron G, Ravaud P. Reporting of sample size calculation in randomised controlled trials: review. BMJ. 2009;338:1-6. https://doi.org/10.1136/bmj.b1732
56) Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c869. https://doi.org/10.1136/bmj.c869
57) Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c332. https://doi.org/10.1136/bmj.c332
58) Butcher NJ, Monsour A, Mew EJ, et al. Guidelines for Reporting Outcomes in Trial Reports: The CONSORT-Outcomes 2022 Extension. JAMA. 2022;328(22):2252–2264. https://doi.org/10.1001/jama.2022.21022
59) Chan AW, Tetzlaff JM, Altman DG, et al. SPIRIT 2013 Statement: Defining standard protocol items for clinical trials. Ann Intern Med. 2013;158:200-207.
60) Chan AW, Tetzlaff JM, Gøtzsche PC, et al. SPIRIT 2013 Explanation and Elaboration: Guidance for protocols of clinical trials. BMJ. 2013;346:e7586.
61) Mehta C, Pocock S. Adaptive increase in sample size when interim results are promising: a practical guide with examples. Stat Med. 2011;30:3267–84.
62) Chen J, DeMets DL, Lan G. Increasing the sample size when the unblinded interim result is promising. Stat Med. 2004;23(7):1023-1038. https://doi.org/10.1002/sim.1688
63) Pallmann P, Bedding AW, Choodari-Oskooei B, et al. Adaptive designs in clinical trials: why use them, and how to run and report them. BMC Med. 2018;16(1):29. https://doi.org/10.1186/s12916-018-1017-7
64) Edwards JM, Walters SJ, Kunz C, Steven A. A systematic review of the “promising zone” design. Trials. 2020;21:1000. https://doi.org/10.1186/s13063-020-04931-w
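
Supplementary sketch for the table of corrected significance levels. The uncorrected family-wise error rate given in footnote a, and the Sidak and Bonferroni columns, can be reproduced from their standard definitions. The function names below are illustrative assumptions and are not taken from the article or its Excel sheet. The Tukey column is not reproduced because it relies on the studentized range distribution and on details of the author's calculation that are not given in this section.

```python
def family_wise_error_rate(alpha: float, m: int) -> float:
    """Family-wise error rate with no correction: 1 - (1 - alpha)^m
    for m independent tests, each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def sidak_alpha(target_fwer: float, m: int) -> float:
    """Per-test significance level under the Sidak correction,
    obtained by solving 1 - (1 - a)^m = target_fwer for a."""
    return 1.0 - (1.0 - target_fwer) ** (1.0 / m)


def bonferroni_alpha(target_fwer: float, m: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return target_fwer / m


if __name__ == "__main__":
    # Print the three quantities for the numbers of comparisons in the table.
    for m in range(8, 21):
        print(
            m,
            f"{family_wise_error_rate(0.05, m):.4f}",
            f"{sidak_alpha(0.05, m):.4f}",
            f"{bonferroni_alpha(0.05, m):.4f}",
        )
```

The Sidak level is always slightly larger than the Bonferroni level for the same number of comparisons, which is why the two columns differ only in the fourth decimal place.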