Determination of Sample Size
ON “RMD”
ON DETERMINATION OF SAMPLE SIZE
SUBMITTED BY: SUSHANT SHARMA
SEC: SD2
COURSE: ISBE
Determination of Sample Size
Determining the sample size for a study is a crucial component of study design. The
goal is to include sufficient numbers of subjects so that statistically significant results
can be detected. Using too few subjects results in wasted time, effort, research
dollars, and animal lives, and yields statistically inconclusive results. Statistically
inconclusive findings make it difficult to determine whether a particular treatment or
intervention was effective and to identify directions for future studies. Studies with
too few subjects may also leave potentially important research advances
undetected. In statistical language, these studies are referred to as “under-powered.”
That is, the probability that they will detect an existing treatment effect is
lower than optimal.
Using too many subjects may result in statistically significant conclusions and clear
future study directions. However, if the same answer could have been obtained with
fewer subjects, then time, effort, research dollars, and animal lives also have been
wasted. In statistical language, these studies are referred to as “over-powered.” That
is, the probability that they will detect a treatment effect is higher than optimal.
Using the appropriate number of subjects optimizes the probability that a study will
yield interpretable results and minimizes research waste. From a statistical
perspective, studies with the optimal number of subjects have sufficient -- neither
too much nor too little -- statistical “power” to detect findings.
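The idea of under-powered and over-powered studies can be illustrated numerically. The sketch below is an illustration only: it assumes a two-group comparison of means, a two-sided 5% significance level, and the normal approximation. The function name `approx_power` and the standardized effect size of 0.5 are hypothetical choices, not values from this report.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, effect_size, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Assumption: normal approximation with equal group sizes.
    effect_size is the true difference in means divided by the
    within-group standard deviation (an assumed input).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Power for several hypothetical group sizes at the same effect size.
for n in (5, 20, 64, 200):
    print(n, round(approx_power(n, effect_size=0.5), 3))
```

Under these assumptions, very small groups leave the study under-powered, while beyond a certain point additional subjects add very little power.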
Under federal regulations, one of the responsibilities of the Institutional Animal Care
and Use Committee (IACUC) is to ensure that study sample sizes have been rigorously
determined. One of the roles of the Data Management Services (DMS) Statistical
Consulting group is to assist investigators with these determinations.
The statistician comes to the same conclusion but expresses it in somewhat different
terms. The ultimate purpose of most studies is to use a sample (a subgroup) to make
inferences about a population (the larger group of interest). When data exhibit large
amounts of within-group variability relative to treatment variability, then any
generalizations made to the population must be made with uncertainty. In other
words, the reliability with which the sample can be used to make inferences about
the population is less than when within-group variability is relatively small.
Statistical Analysis and Variability
Many statistical analyses grapple with this problem – given that we know that
subjects will vary in their responses to the same treatment, are the observed
differences between treatment groups consistent enough to state with relative
certainty that the treatment worked?
The statistician conceptualizes the problem in terms of variability. Within a particular
study, there are two major influences on the variability of measured responses:
1) the treatment, and 2) error. The treatment contributes to the variability of
measured responses, if it was effective, by systematically increasing or
decreasing them. Error contributes to the variability of measured responses in
several ways. It is important to note that the term “error” does not indicate that
mistakes were made in the study. “Error” is the term used to refer to all of the
influences, other than those that result from the treatment, that could alter
measured responses. Error includes, therefore, the inconsistency inherent in
measurements obtained with a measurement device or technique that is not
perfect, procedural differences in how the same treatment was administered to
subjects, and inherent differences among subjects that are not related to the
treatment. Error is considered a non-systematic influence on responses because it
can increase or decrease them. The total variability in responses in a particular
study can be divided into these two components: 1) variability that is
associated with or that is the result of the treatment, and 2) variability that is
not the result of the treatment, that is, error variability.
DMS – Statistical Consulting Group (Faraday), January 2006
Many statistical analyses address the same question: is the variability associated with
the treatment large enough relative to the variability associated with error to be
relatively certain that the treatment worked?
Notice that this is the same question that was stated above using different
terminology. The intuitive grasp that the situation in the right side of Figure 3 reflects
more certainty about the treatment effectiveness than the situation on the left side
of the figure illustrates this point.
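The division of total variability into a treatment component and an error component, and the comparison of the two, can be sketched in code. This is a minimal illustration, assuming a one-way layout and sums of squares as the measure of variability; the function name `partition_variability` and the two small data sets are made up for the example.

```python
from statistics import mean


def partition_variability(groups):
    """Split the total sum of squares into treatment and error parts.

    Returns (ss_total, ss_treatment, ss_error, f_ratio), where f_ratio is
    treatment variability per degree of freedom divided by error
    variability per degree of freedom.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Total variability: every response against the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Treatment variability: group means against the grand mean.
    ss_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error variability: each response against its own group mean.
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_treatment = len(groups) - 1
    df_error = len(all_values) - len(groups)
    f_ratio = (ss_treatment / df_treatment) / (ss_error / df_error)
    return ss_total, ss_treatment, ss_error, f_ratio


# Hypothetical responses for a control group and a treated group.
control = [10.0, 12.0, 11.0, 13.0]
treated = [14.0, 16.0, 15.0, 17.0]
print(partition_variability([control, treated]))
```

For these made-up data the treatment and error components (32 and 10) add up to the total (42), and the ratio of treatment to error variability is large because the groups differ far more from each other than the subjects differ within a group.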
Also note that the absolute size of treatment variability and error variability is not
important – only their relative relationship. A useful analogy is a signal-to-noise ratio.
The treatment variability is the signal; the error variability is the noise. Noisy data,
that is, data that exhibit a great deal of within-group variability, require that the
signal (the treatment variability) be strong in order to be detected.
In general, if the variability associated with the treatment is large relative to the
error variability, then relatively few subjects will be required to obtain statistically
significant results. Conversely, if the variability associated with the treatment is small
relative to the error variability, then relatively more subjects will be required to
obtain statistically significant results.
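The relationship between the relative size of the treatment effect and the number of subjects can be made concrete. The sketch below assumes a two-group comparison of means, a two-sided 5% significance level, 80% power, and the normal approximation; the function name `n_per_group` and the three effect sizes are hypothetical. The effect size is the difference in means divided by the within-group standard deviation, so it plays the role of the signal-to-noise ratio.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group (normal approximation).

    effect_size = (difference in means) / (within-group standard deviation).
    The defaults for alpha and power are common conventions, assumed here.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    # Always round up so the requested power is not undershot.
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Strong signal needs few subjects; weak signal needs many.
for d in (1.0, 0.5, 0.2):
    print(d, n_per_group(d))
```

Because the effect size enters the formula squared, halving the signal relative to the noise roughly quadruples the number of subjects required.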
The sample size determination formulas come from the formulas for the maximum
error of the estimate. Each formula is solved for n. Be sure to round the answer
up to the next whole number, not off to the nearest whole number. If you
round off, you will exceed your maximum error of the estimate whenever the
answer is rounded down. By rounding up, you will have a slightly smaller maximum
error of the estimate than allowed, which is better than having a larger one than desired.
Population Mean
The formula for the sample size is obtained by solving the maximum
error of the estimate formula for the population mean for n.
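As a minimal sketch, assume the standard maximum-error expression for a mean with known population standard deviation, E = z·σ/√n, which solves to n = (z·σ/E)². The function name `sample_size_for_mean` and the inputs (σ = 15, E = 4, 95% confidence) are hypothetical values chosen to show why the answer is rounded up rather than off.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_for_mean(sigma, max_error, confidence=0.95):
    """Return (unrounded n, n rounded up) for estimating a mean.

    Assumes the population standard deviation sigma is known and the
    normal critical value applies.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    raw_n = (z * sigma / max_error) ** 2
    return raw_n, ceil(raw_n)


def max_error_for(n, sigma, confidence=0.95):
    """Maximum error of the estimate achieved with n subjects."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)


raw_n, n = sample_size_for_mean(sigma=15, max_error=4)
print(raw_n, n)
# Rounding off would lose a subject and overshoot the allowed error.
print(max_error_for(round(raw_n), sigma=15))
print(max_error_for(n, sigma=15))
```

With these inputs the unrounded answer lies just above a whole number, so rounding off to the nearest whole number gives a maximum error slightly larger than the 4 units allowed, while rounding up stays within it.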
Population Proportion
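A minimal sketch for the proportion case, assuming the standard maximum-error expression E = z·√(p(1−p)/n), which solves to n = p(1−p)(z/E)². The function name `sample_size_for_proportion` and its inputs are hypothetical. When no prior estimate of the proportion is available, p = 0.5 is a common conservative choice because it maximizes p(1−p).

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(max_error, confidence=0.95, p_guess=0.5):
    """Subjects needed to estimate a proportion within max_error.

    p_guess is a prior estimate of the proportion; 0.5 gives the
    largest (most conservative) sample size.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    raw_n = p_guess * (1 - p_guess) * (z / max_error) ** 2
    # Round up, never off, so the allowed error is not exceeded.
    return ceil(raw_n)


print(sample_size_for_proportion(max_error=0.05))
print(sample_size_for_proportion(max_error=0.05, p_guess=0.2))
```

A prior estimate far from 0.5 reduces the required sample size, which is why a pilot estimate of the proportion can save subjects.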