
Kidney Disease and Population Health

Nephron Clin Pract 2011;118:c319–c323
DOI: 10.1159/000322830
Published online: February 3, 2011

Sample Size Calculations

Marlies Noordzij (a), Friedo W. Dekker (b), Carmine Zoccali (c), Kitty J. Jager (a)

(a) ERA-EDTA Registry, Department of Medical Informatics, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands; (b) Department of Clinical Epidemiology, Leiden University Medical Centre, Leiden, The Netherlands; (c) CNR-IBIM, Clinical Epidemiology and Pathophysiology of Renal Diseases and Hypertension, Renal and Transplantation Unit, Ospedali Riuniti, Reggio Calabria, Italy

Key Words
Sample size · Power · Study design · Epidemiology · Statistics · Nephrology


Abstract

The sample size is the number of patients or other experimental units that need to be included in a study to answer the research question. Pre-study calculation of the sample size is important; if a sample size is too small, one will not be able to detect an effect, while a sample that is too large may be a waste of time and money. Methods to calculate the sample size are explained in statistical textbooks, but because there are many different formulas available, it can be difficult for investigators to decide which method to use. Moreover, these calculations are prone to errors, because small changes in the selected parameters can lead to large differences in the sample size. This paper explains the basic principles of sample size calculations and demonstrates how to perform such a calculation for a simple study design.
Copyright © 2011 S. Karger AG, Basel


Introduction

The sample size is the number of patients or other experimental units that should be included in a study to be able to answer the research question. The main aim of sample size calculations is to determine the number of participants required to detect a clinically relevant treatment effect. Optimizing the sample size is extremely important. If the sample size is too small, one may not be able to detect an important effect, while a sample that is too large may be a waste of time and money. Determining the sample size is one of the first steps in the design of a trial, and methods to calculate the sample size are explained in several conventional statistical textbooks [1, 2]. However, it is difficult for investigators to decide which method to use, because there are many different formulas available, depending on the study design and the type of outcome studied. Furthermore, these calculations are sensitive to errors, because small differences in selected parameters can lead to large differences in sample size. In this paper, we explain the basic principles of sample size calculations based on an example describing a hypothetical randomized controlled trial (RCT) on the effect of erythropoietin (EPO) treatment on anaemia in dialysis patients.


The Basic Principles of Clinical Studies: An Example

Suppose one wishes to study the effect of EPO treatment on haemoglobin levels in anaemic dialysis patients (haemoglobin <13 g/dl in men and <12 g/dl in women) [3]. These patients are randomized to receive either EPO or placebo treatment. The primary outcome of this study is a continuous one, namely haemoglobin level. After the intervention period, haemoglobin levels in the treated and placebo groups are compared. Of course, we hope to find a statistically significant difference in haemoglobin level between the group treated with EPO and the placebo group. Intuitively, we expect that the more patients we include in our study, the more significant our difference will be. To determine how many patients we actually need to include in our RCT to detect a clinically relevant effect of EPO, we need to perform a sample size calculation or estimation.

In the case of a simple study design, such as our RCT on EPO treatment, a graphical method can be used to estimate the sample size required for the study. Figure 1 shows an example of a nomogram for sample size estimation as published by Altman [4]. From this nomogram, we can read that we need a few parameters to estimate the required sample size, i.e. the standardized difference in a study, the power and the significance level.

Fig. 1. Nomogram for the calculation of sample size or power (adapted from Altman [4], with permission). Its scales show the standardized difference, the total number of subjects, the power and the significance level.

To be able to use such a nomogram or another method for sample size calculation, it is helpful to have some understanding of the basic principles of clinical studies. When performing a clinical study, an investigator usually tries to determine whether the outcomes in two groups are different from each other. In most cases, individuals treated with a certain drug or other health intervention are compared with untreated individuals. In general, the 'true effect' of a treatment is the difference in a specific outcome variable, in our example haemoglobin level, between treated and untreated individuals in the population. However, in clinical research, effects are usually studied in a study sample instead of in the whole population and as a result two fundamental errors can occur, which are called type I and type II errors. The values of these type I and type II errors are important components in sample size calculations. In addition, it is necessary to have some idea of the results expected in a study to be able to calculate the sample size. These components of sample size calculations are described below and are summarized in table 1.

Table 1. Overview of the components required for sample size calculations

Component                               Synonyms                                    Definition                                                      Conventional values
Alpha                                   type I error; p value; significance level   the chance of a false-positive result                           0.05 or 0.01
Beta                                    type II error                               the chance of a false-negative result                           0.20 or 0.10
Power                                   1 – beta                                    the chance of finding a statistically significant difference    0.80 or 0.90
                                                                                    between the groups when this difference exists in reality
Minimal clinically relevant difference  MCRD                                        the minimal difference between the groups that a researcher     –
                                                                                    considers clinically relevant and biologically plausible
Variance                                standard deviation*                         the variability of the outcome measure                          –

* In the case of a continuous outcome.
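These concepts can also be made concrete by simulation. The minimal Python sketch below repeatedly simulates the hypothetical EPO trial; the baseline haemoglobin mean of 10.0 g/dl and the random seed are arbitrary illustrative choices, while the true difference of 0.5 g/dl, the SD of 1.9 g/dl and the group size of 227 anticipate the worked example later in this paper. The proportion of 'significant' trials estimates alpha when there is no true effect, and the power when the assumed effect is real.

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)  # arbitrary seed, for reproducibility only

def rejection_rate(true_diff, sd=1.9, n_per_group=227, alpha=0.05, n_trials=5000):
    """Fraction of simulated two-group trials in which the t-test is significant."""
    rejections = 0
    for _ in range(n_trials):
        placebo = rng.normal(10.0, sd, n_per_group)           # arbitrary baseline mean
        epo = rng.normal(10.0 + true_diff, sd, n_per_group)   # treated group
        _, p = stats.ttest_ind(epo, placebo)
        rejections += p < alpha
    return rejections / n_trials

print("estimated type I error:", rejection_rate(true_diff=0.0))  # close to alpha, about 0.05
print("estimated power:", rejection_rate(true_diff=0.5))         # close to 1 - beta, about 0.80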

Components of Sample Size Calculations

Type I Error (Alpha)

The type I error, also called alpha, the significance level or the p value, represents the chance that a researcher concludes that two groups differ when in reality they do not or, in other words, the chance of a false-positive conclusion. Most commonly, alpha is fixed at 0.05, meaning that a researcher desires a less than 5% chance of drawing a false-positive conclusion.

Power

Investigators can also draw a false-negative instead of a false-positive conclusion. They then conclude that there is no difference between two groups when in fact there is. The chance of a false-negative conclusion is called a type II error (beta). Beta is conventionally set at a level of 0.20, which means that a researcher desires a less than 20% chance of a false-negative conclusion.

For the calculation of the sample size, one needs to know the beta or the power of a study. The power is the complement of beta, i.e. 1 – beta. This means that the power is 0.80 or 80% when beta is 0.20. The power represents the chance of avoiding a false-negative conclusion or, in other words, the chance of detecting a specified effect if it really exists.

Minimal Clinically Relevant Difference

The minimal clinically relevant difference is the smallest effect between the studied groups that the investigator wants to be able to detect. It is the difference that the investigator believes to be clinically relevant and biologically plausible. In the case of a continuous outcome variable, the minimal clinically relevant difference is a numerical difference. For instance, if systolic blood pressure were the outcome of a trial, an investigator could choose a difference of 10 mm Hg as the minimal clinically relevant difference. If a trial had a binary outcome, such as the development of catheter-related bacteraemia (yes/no), a relevant difference between the event rates in both treatment groups should be estimated. For example, the investigator could choose a difference of 10% between the percentage of infections in the treatment group and that in the control group as the minimal clinically relevant difference.

Variability

Finally, the sample size calculation is based on the population variance of the outcome variable. In general, the greater the variability of the outcome variable, the larger the sample size required to assess whether an observed effect is a true effect. In the case of a continuous outcome variable, the variability is estimated by means of the standard deviation (SD). The variance is usually unknown, and therefore investigators often use an estimate obtained from a pilot study or a previously performed study.


Estimating Sample Size Using Graphical Methods

Now that we understand the separate components of sample size calculations, we can use the nomogram as published by Altman [4] (fig. 1) to estimate the sample size required for our RCT on EPO treatment in dialysis patients. Suppose we consider a difference in haemoglobin level of 0.50 g/dl between the group treated with EPO and the placebo group as clinically relevant, and we want to detect such an effect with 80% power (0.80) and a significance level alpha of 0.05. The last value we need for the calculation is the population variance. Previously published reports on similar experiments using similar measuring methods in similar patients suggest that our data will be approximately normally distributed, and we estimate that the SD will be around 1.90 g/dl.

To use this nomogram, one needs the standardized difference, which can simply be calculated by dividing the minimal clinically relevant difference (0.50 g/dl) by the SD in the population (1.90 g/dl). For our example, this yields 0.50/1.90 = 0.26. We can now use the nomogram to estimate the sample size by drawing a straight line between the value of 0.26 on the scale for the standardized difference and the value of 0.80 on the scale for power and reading off the value on the line corresponding to alpha = 0.05, which gives a total sample size of 450, i.e. 225 per group. However, although this nomogram seems to work well for our example, one should keep in mind that these graphical methods often make assumptions about the type of data and statistical tests to be used. In many cases, it is therefore more appropriate to apply statistical formulas to calculate the required sample size.
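The nomogram reading can also be checked numerically. The short Python sketch below is an illustration using the textbook normal approximation for comparing two means, power = Phi(d * sqrt(n/2) - z(1 - alpha/2)), a formula that is not spelled out in this paper; with 225 patients per group and a standardized difference of 0.26 it returns a power close to the intended 0.80.

from scipy.stats import norm

mcrd = 0.50        # minimal clinically relevant difference, g/dl
sd = 1.90          # assumed population SD, g/dl
d = mcrd / sd      # standardized difference, 0.50/1.90 = 0.26
alpha = 0.05
n_per_group = 225  # the nomogram gives a total of 450, i.e. 225 per group

z_alpha = norm.ppf(1 - alpha / 2)                        # 1.96
power = norm.cdf(d * (n_per_group / 2) ** 0.5 - z_alpha)
print(round(d, 2), round(power, 2))                      # 0.26 and approximately 0.80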


Estimating Sample Size Using a Formula

Based on our trial example, we will now demonstrate how the sample size can be calculated. We will use the simplest formula for a continuous outcome variable, such as haemoglobin level, and equal sample sizes in the treated (EPO) and control (placebo) groups [5]:

N = 2[(a + b)²σ²] / (μ1 – μ2)²

where N is the sample size in each of the groups, μ1 is the population mean in treatment group 1, μ2 is the population mean in treatment group 2, μ1 – μ2 is the minimal clinically relevant difference, σ² is the population variance (the square of the SD), a is the conventional multiplier for alpha and b is the conventional multiplier for power.

Again, we chose a power of 0.80, an alpha of 0.05 and a minimal clinically relevant difference in haemoglobin level between the two groups of 0.50 g/dl (μ1 – μ2). Because we chose the significance level alpha to be 0.05, we should enter the value 1.96 for a in the formula. Similarly, because we chose beta to be 0.20, the value 0.842 should be filled in for b in the formula. These multipliers for conventional values of alpha and beta can be found in table 2.

Table 2. Multipliers for conventional values of alpha and beta

             Alpha                 Beta
             0.05      0.01        0.20      0.10      0.05      0.01
Multiplier   1.96      2.58        0.842     1.28      1.64      2.33

The final value we need for the calculation is the population SD of 1.90 g/dl. Entering all values in the formula yields:

2 × [(1.96 + 0.842)² × 1.90²] / 0.50² = 226.7.

This means that a sample size of 227 subjects per group is needed to answer the research question. This sample size is in line with the number of 225 subjects per group which we estimated from the nomogram.
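This arithmetic is easy to script. The sketch below is a minimal Python illustration of the same formula (the function name n_per_group is ours; the multipliers a and b are taken from the standard normal distribution via SciPy instead of being looked up in table 2) and reproduces the result of about 227 subjects per group.

import math
from scipy.stats import norm

def n_per_group(mcrd, sd, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two means with equal group sizes."""
    a = norm.ppf(1 - alpha / 2)   # multiplier for alpha: 1.96 when alpha = 0.05
    b = norm.ppf(power)           # multiplier for power: 0.842 when power = 0.80
    n = 2 * (a + b) ** 2 * sd ** 2 / mcrd ** 2
    return n, math.ceil(n)        # raw value and the rounded-up number of subjects

raw, rounded = n_per_group(mcrd=0.50, sd=1.90)
print(raw, rounded)  # about 226.7, i.e. 227 subjects per group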

Different Study Designs and Situations

In our example, the outcome variable is a continuous one. However, in many trials the outcome variable may be, for example, binary (e.g. yes/no) or survival (e.g. time to event). If this is the case, one still needs the four basic components, but different formulas should be used and other assumptions may be required.

Also, for different types of study designs, different methods for sample size calculation should be used. First of all, it is important to realize that sample size calculations are not required in all types of studies. These calculations are especially of interest in the context of hypothesis testing, as in trials aiming to show a difference between groups. If one just wants to know the occurrence of a certain disease (incidence or prevalence), as is the case in registry studies, sample size calculation is probably not necessary or even not possible. Also, for observational studies aimed at the discovery or exploration of effects, sample size is not of major importance.

So, sample size calculations are especially of interest in the design of an RCT. Because a lot of money is invested in this type of study, it is important to be sure that a sufficient number of patients are included in the study to find a relevant effect if it exists. However, sample size calculations are also sometimes needed in studies with other designs, such as case-control or cohort studies, and different formulas for sample size calculation are required in these cases [6, 7]. In the case of a clinical trial testing the equivalence of two treatments rather than the superiority of one over the other, another approach for sample size calculation is necessary. These equivalence or non-inferiority trials usually demand greater sample sizes [8].

Several software programs, such as nQuery Advisor and PASS, can assist in sample size calculations for different types of data and study designs. In addition, there are some websites that allow free sample size calculations, but not all of these programs are reliable. Because many methods are not straightforward, we recommend consulting a statistician in all but the most basic studies.


Difficulties in Sample Size Calculations

Although sample size calculations are useful, especially because they force investigators to think about the planning and likely outcomes of their study, they have some important drawbacks. Firstly, some knowledge of the research area is needed before one can perform a sample size calculation, and lack of this knowledge is often a problem. Secondly, it is necessary to choose a primary outcome in order to calculate the required sample size, while many clinical trials aim to study several outcomes. Researchers often change the planned outcome(s) after their study has begun, making the reported p values invalid and potentially misleading [9]. Furthermore, the required sample size is very sensitive to the values the investigator chooses for the basic components in the calculation. Based on our example, namely an RCT on EPO treatment, we show how the selection of alpha, beta and the minimal clinically relevant difference can influence the results of sample size calculations.

Choosing a higher power leads to a larger sample size. Since beta is the complement of the power, a higher power automatically means a lower beta, indicating a lower chance of drawing a false-negative conclusion. If we were to choose a power of 0.90 instead of 0.80, the conventional multiplier for beta in the formula would be 1.28 instead of 0.842 (table 2), and this would yield a larger sample size:

2 × [(1.96 + 1.28)² × 1.90²] / 0.50² = 303.2.

Similarly, choosing a lower significance level alpha, indicating a lower chance of drawing a false-positive conclusion, leads to a larger sample size. So, if we were to choose a lower alpha of 0.01 instead of 0.05, we would have to use 2.58 as the conventional multiplier for alpha instead of 1.96, resulting in a larger sample size:

2 × [(2.58 + 0.842)² × 1.90²] / 0.50² = 338.2.

These calculations with different values for alpha and beta clearly show that using a sample size that is too small leads to a higher risk of drawing a false-positive or false-negative conclusion. Finally, the choice of the minimal clinically relevant difference has the largest influence. The smaller the difference one wants to be able to detect, the larger the required sample size. If we aimed to detect a difference of 0.3 g/dl instead of 0.5 g/dl, the calculation would yield:

2 × [(1.96 + 0.842)² × 1.90²] / 0.30² = 629.8.

These examples show the most important drawback of sample size calculations: investigators can easily influence the result of their sample size calculations by changing the components in such a way that they need fewer patients, as that is usually what is most convenient to the researchers. For this reason, sample size calculations are sometimes of limited value.

Furthermore, more and more experts are expressing criticism of the current methods used. They suggest introducing new ways to determine sample size, for example estimating the sample size based on the likely width of the confidence interval for a set of outcomes [9]. However, consensus about these alternative methods has not yet been reached.
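The sensitivity described in this section can be verified with the same formula. The sketch below (the helper name is ours, for illustration only) plugs the rounded multipliers from table 2 into N = 2(a + b)² × SD² / MCRD² and reproduces the three alternative calculations shown above.

def n_per_group_from_multipliers(a, b, sd, mcrd):
    """Per-group sample size using the rounded multipliers of table 2."""
    return 2 * (a + b) ** 2 * sd ** 2 / mcrd ** 2

print(n_per_group_from_multipliers(a=1.96, b=0.842, sd=1.90, mcrd=0.50))  # about 226.7, the base case
print(n_per_group_from_multipliers(a=1.96, b=1.28,  sd=1.90, mcrd=0.50))  # about 303.2, power 0.90
print(n_per_group_from_multipliers(a=2.58, b=0.842, sd=1.90, mcrd=0.50))  # about 338.2, alpha 0.01
print(n_per_group_from_multipliers(a=1.96, b=0.842, sd=1.90, mcrd=0.30))  # about 629.8, difference 0.3 g/dl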

Conclusions

Because there are many different methods available to calculate the sample size required to answer a particular research question and because the calculations are sensitive to errors, performing a sample size calculation can be complicated. We therefore recommend taking care when performing these calculations, or asking for statistical advice during the design phase of the study.


Acknowledgements

The research leading to the findings reported herein has received funding from the European Community's Seventh Framework Programme under grant agreement No. HEALTH-F2-2009-241544.


References

1 Altman DG: Practical Statistics for Medical Research. London, Chapman & Hall, 1991.
2 Bland M: An Introduction to Medical Statistics, ed 3. Oxford, Oxford University Press, 2000.
3 World Health Organization: Nutritional Anemia. Report of a WHO Scientific Group. Geneva, World Health Organization, 1968.
4 Altman DG: Statistics and ethics in medical research. III. How large a sample? Br Med J 1980;281:1336–1338.
5 Florey CD: Sample size for beginners. BMJ 1993;306:1181–1184.
6 Machin D, Campbell M, Fayers P, Pinol A: Sample Size Tables for Clinical Studies, ed 2. London, Blackwell Science, 1997.
7 Lemeshow S, Levy PS: Sampling of Populations: Methods and Applications, ed 3. New York, John Wiley & Sons, 1999.
8 Christensen E: Methodology of superiority vs. equivalence trials and non-inferiority trials. J Hepatol 2007;46:947–954.
9 Bland JM: The tyranny of power: is there a better way to calculate sample size? BMJ 2009;339:1133–1135.
