Buderer 1996
ABSTRACT
Careful consideration of statistical issues related to the choice of a sample size is critical for achieving
meaningful results in research studies designed to evaluate diagnostic tests. When assessing the ability of a
diagnostic test to screen for disease, the parameters sensitivity, specificity, and predictive values are of interest.
Study sample size requirements can be calculated based on a clinically acceptable degree of precision, the
hypothesized values of sensitivity and specificity, and the estimated prevalence of disease in the target population.
The simple methods and tables in this paper guide the researcher when deciding how many subjects
to sample in a study designed to estimate both the sensitivity and the specificity of a diagnostic test, given a
specified precision and estimated disease prevalence.
Key words: statistics; sample size; sensitivity; specificity; test performance; disease prevalence; precision.
Acad. Emerg. Med. 1996; 3:895-900.
Careful consideration of statistical issues related to the choice of a sample size is critical for achieving meaningful results in research studies designed to evaluate diagnostic tests, but such issues are frequently overlooked.¹ Instead, sample sizes based on other criteria (including numbers seen in published studies, past experience, convenience, and cost) are commonly chosen. While these factors are helpful, one also must consider how sample size and prevalence of disease influence the degree of precision in the estimates of sensitivity and specificity. A consequence of using too few subjects is that the estimates of sensitivity and specificity may be imprecise, and therefore fail to provide clinically useful information. Furthermore, evaluating the diagnostic test with a sample of subjects whose prevalence of disease is different from that of the population for whom the test is developed may provide misleading information.

We use as an example of this phenomenon the design of a study to determine the sensitivity and specificity of intrasound vibration testing, in comparison with plain radiography, for the diagnosis of acute ankle fractures in the ED. How many subjects are needed to estimate the sensitivity and specificity of this new diagnostic test? The researcher wants to ensure that the data will yield estimates of sensitivity and specificity that have good precision.

Sample size requirements are calculated based on clinically acceptable precision for estimates of sensitivity and specificity, hypothesized values of sensitivity and specificity, and the estimated prevalence of disease in the target population. It is assumed that both the diagnostic test and the criterion standard are dichotomous tests. That is, a subject either tests positive or negative; a subject either has the disease (fracture) or is disease-free (no fracture).

The simple methods and tables in this paper guide the researcher in deciding how many subjects to sample to estimate both the sensitivity and specificity of a diagnostic test.

SAMPLE DATA

In general, a new diagnostic test is introduced when it appears to offer some advantage over the commonly accepted criterion standard (e.g., it is less expensive, it is less invasive, or it allows for a more expedient diagnosis) without sacrificing diagnostic accuracy. To evaluate how well the new intrasound test predicts fractures, the

From St. Vincent Medical Center, Toledo, OH, Research Department (NMFB).
Series editor: Roger J. Lewis, MD, PhD, Department of Emergency Medicine, Harbor-UCLA Medical Center, Torrance, CA.
Received: September 15, 1995; revision received: January 11, 1996; accepted: February 8, 1996; updated: February 22, 1996.
Address for correspondence and reprints: Nancy M. Fenn Buderer, MS, St. Vincent Medical Center, Research Department, 2213 Cherry Street, Toledo, OH 43608. Fax: 419-321-3884.
896 ACADEMIC EMERGENCY MEDICINE SEP 1996 VOL 3/NO 9
ificity. For instance, ±10% might be clinically acceptable. The narrower the desired CI width (or the more precise), the more subjects required; the wider the clinically acceptable CI width, the fewer subjects required. Choosing this width requires careful consideration; it is not a trivial problem.

2. The prevalence of disease in the target population dictates how many of the N subjects will likely have the disease (TP + FN) and will not have the disease (FP + TN). Since TP + FN and FP + TN are the denominators of the SEs for sensitivity and specificity, respectively, the size of those 2 groups affects the width of the CI, and subsequently affects the total sample size. For sensitivity, the higher the prevalence, the fewer subjects required; for specificity, the lower the prevalence, the fewer subjects required.

3. Widths of CIs depend on the values hypothesized for sensitivity and specificity. Holding the sample size constant, a sensitivity or specificity of 50% yields the maximum CI width; larger sensitivities or specificities yield narrower widths. If the researcher cannot make an educated guess of the expected sensitivity or specificity, then a conservative choice of 50% provides a sample size that protects the precision for the maximum width¹; if the observed sensitivity or specificity is >50%, then the observed CI widths will be narrower.

mated sensitivity, N1, is then calculated. Similarly, for specificity, calculate the number of disease-free subjects (FP + TN) and subsequently N2. The final sample size, N, for the experiment is the larger of N1 and N2. In this way, the sample size will be adequate for estimation of both sensitivity and specificity with the desired precision.

If the predictive values are of most importance, then sample size should be based on the expected proportions with positive and negative tests, and estimates of PPV and NPV, in an analogous fashion.

The following 6 steps will lead the researcher from specifying the variables to determining a sample size, N.

Step 1: Specifications

Specify the maximum clinically acceptable width of the 95% CI. Call it W.

Specify an estimate for the prevalence of disease in the target population. Call it P.

Specify a value for the expected sensitivity of the new diagnostic test. Call it SN.

Specify a value for the expected specificity of the new diagnostic test. Call it SP.

(For purposes of calculations, W, P, SN, and SP are expressed as numbers between 0 and 1, rather than as percentages.)

are provided, therefore $\alpha = 0.05$ and $Z_{\alpha/2} = 1.96$ (for 90% CIs, $\alpha = 0.10$ and $Z_{\alpha/2} = 1.645$).
Expected Sensitivity
Prevalence of Disease 50% 55% 60% 65% 70% 75% 80% 85% 90%
Step 3: Calculate the Sample Size Required for Sensitivity, N1

$$N_1 = \frac{TP + FN}{P}$$

Step 4: Calculate the Number without Disease, FP + TN

$$FP + TN = Z_{\alpha/2}^{2} \, \frac{SP(1 - SP)}{W^{2}}$$

Step 5: Calculate the Sample Size Required for Specificity, N2

$$N_2 = \frac{FP + TN}{1 - P}$$

Step 6: Select the Final Sample Size, N

N is the larger of N1 and N2.

N is the number of randomly selected subjects required to estimate the sensitivity and specificity, in a target population with prevalence of disease P, within ±W, assuming the sensitivity and specificity are of sizes SN and SP, respectively.

Table 3 shows the number of subjects required to yield a 10% width of a 2-sided 95% CI, for various values of sensitivity, with differing prevalences of disease; similarly, Table 4 shows Ns for specificity. (Notice that some of the extreme cells in the table have small sample sizes. For extreme cases such as these, more exact methods for sample size should be used.) Sample SAS (SAS Institute Inc., Cary, NC) code and output are provided (Appendix A).

As with all sample size calculations, the specifications and assumptions are made before the data are collected. What is actually observed may be different from these a priori assumptions. Therefore, the observed widths of the CIs that one achieves from the data are not guaranteed to be equal to the prespecified value of W.

Example: Sample Size for Intrasound Study

Step 1

Previous research in wrists showed very high sensitivity and specificity. To be conservative, the researchers
TABLE 4 Sample Size for Specificity, N2, with 95% CI Width of 10%

                        Expected Specificity
Prevalence of Disease   50%   55%   60%   65%   70%   75%   80%   85%   90%
 1%                      98    97    94    89    82    73    63    50    35
 5%                     102   101    98    92    85    76    65    52    37
10%                     107   106   103    98    90    81    69    55    39
20%                     121   119   116   110   101    91    77    62    44
30%                     138   136   132   125   116   103    88    70    50
40%                     161   159   154   146   135   121   103    82    58
50%                     193   191   185   175   162   145   123    98    70
60%                     241   238   231   219   202   181   154   123    87
70%                     321   317   308   292   269   241   205   164   116
80%                     481   476   461   437   404   361   308   245   173
90%                     961   951   922   874   807   721   615   490   346
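The six steps can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's SAS code. The function name `sample_size` is hypothetical. The Step 2 formula is not legible in this text, so it is assumed here to mirror Step 4 with SN in place of SP. Rounding up to the next whole subject follows the rounding logic in the appendix code.

```python
import math


def sample_size(sn, sp, p, w, z=1.96):
    """Return (N1, N2, N) for expected sensitivity sn, expected specificity sp,
    prevalence p, and CI precision w, all expressed between 0 and 1.

    z is the normal quantile: 1.96 for 95% CIs, 1.645 for 90% CIs.
    """
    # Step 2 (assumed by analogy with Step 4): number with disease, TP + FN
    tp_fn = z ** 2 * sn * (1 - sn) / w ** 2
    # Step 3: total sample size required for sensitivity, rounded up
    n1 = math.ceil(tp_fn / p)
    # Step 4: number without disease, FP + TN
    fp_tn = z ** 2 * sp * (1 - sp) / w ** 2
    # Step 5: total sample size required for specificity, rounded up
    n2 = math.ceil(fp_tn / (1 - p))
    # Step 6: the final sample size is the larger of N1 and N2
    return n1, n2, max(n1, n2)


if __name__ == "__main__":
    # Values used in the Step 1 specifications of the appendix code
    print(sample_size(sn=0.90, sp=0.85, p=0.20, w=0.10))
```

With SN = 0.90, SP = 0.85, P = 0.20, and W = 0.10, this sketch gives N1 = 173 and N2 = 62, so N = 173. The N2 value agrees with the Table 4 cell for 20% prevalence and 85% specificity, which suggests the sketch follows the same arithmetic as the table.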
APPENDIX A
*step 1 specifications;
sn = 0.90 ; *substitute your value for sensitivity here;
sp = 0.85 ; *substitute your value for specificity here;
p = 0.20 ; *substitute your value for prevalence here;
w = 0.10 ; *substitute your value for width here;
*step 3 calculate N1;
n1 = a_c/p;
*round up to the next whole integer;
n1_int = int(n1);
if n1 ne n1_int then n1 = n1_int + 1;
SAS Output