
Disease Prevalence and Sample Size, Buderer 895

Statistical Methodology: I. Incorporating the Prevalence of Disease into the Sample Size Calculation for Sensitivity and Specificity

Nancy M. Fenn Buderer, MS

ABSTRACT

Careful consideration of statistical issues related to the choice of a sample size is critical for achieving meaningful results in research studies designed to evaluate diagnostic tests. When assessing the ability of a diagnostic test to screen for disease, the parameters sensitivity, specificity, and predictive values are of interest. Study sample size requirements can be calculated based on a clinically acceptable degree of precision, the hypothesized values of sensitivity and specificity, and the estimated prevalence of disease in the target population. The simple methods and tables in this paper guide the researcher when deciding how many subjects to sample in a study designed to estimate both the sensitivity and the specificity of a diagnostic test, given a specified precision and estimated disease prevalence.

Key words: statistics; sample size; sensitivity; specificity; test performance; disease prevalence; precision.

Acad. Emerg. Med. 1996; 3:895-900.

From St. Vincent Medical Center, Toledo, OH, Research Department (NMFB).
Series editor: Roger J. Lewis, MD, PhD, Department of Emergency Medicine, Harbor-UCLA Medical Center, Torrance, CA.
Received: September 15, 1995; revision received: January 11, 1996; accepted: February 8, 1996; updated: February 22, 1996.
Address for correspondence and reprints: Nancy M. Fenn Buderer, MS, St. Vincent Medical Center, Research Department, 2213 Cherry Street, Toledo, OH 43608. Fax: 419-321-3884.

Careful consideration of statistical issues related to the choice of a sample size is critical for achieving meaningful results in research studies designed to evaluate diagnostic tests, but such issues are frequently overlooked.¹ Instead, sample sizes based on other criteria (including numbers seen in published studies, past experience, convenience, and cost) are commonly chosen. While these factors are helpful, one also must consider how sample size and prevalence of disease influence the degree of precision in the estimates of sensitivity and specificity. A consequence of using too few subjects is that the estimates of sensitivity and specificity may be imprecise, and therefore fail to provide clinically useful information. Furthermore, evaluating the diagnostic test with a sample of subjects whose prevalence of disease is different from that of the population for whom the test is developed may provide misleading information.

We use as an example of this phenomenon the design of a study to determine the sensitivity and specificity of intrasound vibration testing, in comparison with plain radiography, for the diagnosis of acute ankle fractures in the ED. How many subjects are needed to estimate the sensitivity and specificity of this new diagnostic test? The researcher wants to ensure that the data will yield estimates of sensitivity and specificity that have good precision.

Sample size requirements are calculated based on clinically acceptable precision for estimates of sensitivity and specificity, hypothesized values of sensitivity and specificity, and the estimated prevalence of disease in the target population. It is assumed that both the diagnostic test and the criterion standard are dichotomous tests. That is, a subject either tests positive or negative; a subject either has the disease (fracture) or is disease-free (no fracture).

The simple methods and tables in this paper guide the researcher in deciding how many subjects to sample to estimate both the sensitivity and specificity of a diagnostic test.

SAMPLE DATA

In general, a new diagnostic test is introduced when it appears to offer some advantage over the commonly accepted criterion standard (e.g., it is less expensive, it is less invasive, or it allows for a more expedient diagnosis) without sacrificing diagnostic accuracy. To evaluate how well the new intrasound test predicts fractures, the
896 ACADEMIC EMERGENCY MEDICINE SEP 1996 VOL 3/NO 9

researcher wishes to determine the test performance or "screening parameters" of the test [i.e., the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV)].²,³ Tables 1 and 2 provide definitions of these terms.

Under the intrasound protocol, the subject has either a painful response to the vibration, or no pain. Therefore, the test result is dichotomous. Diagnostic tests typically classify a subject as either positive or negative for the disease or condition. However, the results from some diagnostic tests are continuous. A continuous result can be converted to a dichotomous outcome by the judicious choice of a cutoff point.

The target population (the group of subjects for whom the diagnostic test is designed) for the intrasound study is all patients aged ≥15 years, arriving to the ED with a suspected ankle fracture. The researcher estimates that 20% of the ankle x-rays are typically positive for fracture. That is, the prevalence (the proportion of the population that has the disease) of fractures in the target population is 20%.

The researcher plans to consecutively enroll a sample of N subjects. We anticipate that the prevalence of fractures in the sample is similar to that of the target population and that the results may then be generalized from the sample to the target population.

Each subject is given the intrasound test by the physician before any x-rays are taken. A painful response (reflexive withdrawal of the foot) is recorded as a positive result; if the patient shows no sign of pain, the result is negative. After intrasound evaluation, the ankle is then x-rayed and read by the radiologist (the same radiologist for all subjects), who is blinded to the result of the intrasound test.

ANALYSIS

Before explaining the calculation of sample size, a review of some basic concepts may be helpful. Typically research data for evaluation of diagnostic tests are displayed in a 2 × 2 contingency table (Table 1). Subjects are classified into 1 of the following 4 categories: true positive (TP, diseased and test positive); false positive (FP, not diseased but test positive); false negative (FN, diseased but test negative); and true negative (TN, not diseased and test negative). These are used to calculate the screening parameters.

TABLE 1. Display of Data

                            Diseased                Disease-free
Positive diagnostic test    TP (true positives)     FP (false positives)
Negative diagnostic test    FN (false negatives)    TN (true negatives)

Screening parameters are simply proportions. Therefore, the basic formulas for proportions may be used to calculate their standard errors (SEs). Generally, the formula for the SE of a proportion is the following:

$$\mathrm{SE} = \sqrt{\frac{(\text{proportion}) \times (1 - \text{proportion})}{\text{denominator of the proportion}}}$$

Table 2 contains the formulas for calculating estimates of the screening parameters and their SEs.

An approximate 2-sided 95% CI for a proportion is calculated as

$$\text{estimate} \pm 1.96 \times \mathrm{SE}$$

The lower bound of the CI is

$$\text{estimate} - 1.96 \times \mathrm{SE}$$

The upper bound of the CI is

$$\text{estimate} + 1.96 \times \mathrm{SE}$$

A 95% CI can be viewed as meaning that we are 95% sure the true value falls between the lower and upper bounds. Given the screening parameter estimates and SEs from Table 2, the CIs can be computed. It is worth mentioning that this CI formula assumes the sample size is large. For small sample sizes, CIs should be based on more exact methods.

The width of a 95% CI, where the width is given by 1.96 × (SE), gives the precision in the estimates of sensitivity, specificity, PPV, and NPV (the total length of the CI is 2 × width). Wide CIs indicate imprecision in the estimates; narrow CIs indicate better precision.

Incorporating the prevalence of disease into the sample size calculation adds a valuable new dimension to these estimates. In their previous work, Arkin and Wachtel¹ explain how to choose the number of diseased or disease-free subjects (depending on whether sensitivity or specificity is of interest) without considering prevalence. The approach discussed here guides the researcher in deciding how many total subjects to sample to estimate both sensitivity and specificity, while accounting for the prevalence of the disease.

The number of subjects depends on 3 quantities that must be prespecified by the researcher: 1) the width of a CI (clinically acceptable precision); 2) the prevalence of the condition or disease in the population of interest; and 3) hypothesized values of sensitivity and specificity.

1. The researcher chooses a clinically acceptable width of a 95% CI for the estimates of sensitivity and specificity. For instance, ±10% might be clinically acceptable. The narrower the desired CI width (or the more precise), the more subjects required; the wider the clinically acceptable CI width, the fewer subjects required. Choosing this width requires careful consideration; it is not a trivial problem.

2. The prevalence of disease in the target population dictates how many of the N subjects will likely have the disease (TP + FN) and will not have the disease (FP + TN). Since TP + FN and FP + TN are the denominators of the SEs for sensitivity and specificity, respectively, the size of those 2 groups affects the width of the CI, and subsequently affects the total sample size. For sensitivity, the higher the prevalence, the fewer subjects required; for specificity, the lower the prevalence, the fewer subjects required.

3. Widths of CIs depend on the values hypothesized for sensitivity and specificity. Holding the sample size constant, a sensitivity or specificity of 50% yields the maximum CI width; larger sensitivities or specificities yield narrower widths. If the researcher cannot make an educated guess of the expected sensitivity or specificity, then a conservative choice of 50% provides a sample size that protects the precision for the maximum width¹; if the observed sensitivity or specificity is >50%, then the observed CI widths will be narrower.

Since the sample size required to estimate sensitivity will likely differ from that required for specificity, 2 sample sizes are calculated (N1 and N2, respectively). Given the desired width and a best guess for sensitivity, solve for the number of diseased subjects (TP + FN). With an estimate of prevalence and a value for TP + FN, the total number of subjects required for an estimated sensitivity, N1, is then calculated. Similarly, for specificity, calculate the number of disease-free subjects (FP + TN) and subsequently N2. The final sample size, N, for the experiment is the larger of N1 and N2. In this way, the sample size will be adequate for estimation of both sensitivity and specificity with the desired precision.

If the predictive values are of most importance, then sample size should be based on the expected proportions with positive and negative tests, and estimates of PPV and NPV, in an analogous fashion.

The following 6 steps will lead the researcher from specifying the variables to determining a sample size, N.

Step 1: Specifications

Specify the maximum clinically acceptable width of the 95% CI. Call it W.

Specify an estimate for the prevalence of disease in the target population. Call it P.

Specify a value for the expected sensitivity of the new diagnostic test. Call it SN.

Specify a value for the expected specificity of the new diagnostic test. Call it SP.

(For purposes of calculations, W, P, SN, and SP are expressed as numbers between 0 and 1, rather than as percentages.)

Step 2: Calculate the Number with Disease, TP + FN

$$TP + FN = Z_{\alpha/2}^{2}\,\frac{SN(1 - SN)}{W^{2}}$$

The symbol $Z_{\alpha/2}$ is the value from a standard normal table (found in most standard statistical textbooks), with $\alpha$ being the type I error rate. Typically, 2-tailed 95% CIs are provided, therefore $\alpha = 0.05$ and $Z_{\alpha/2} = 1.96$ (for 90% CIs, $\alpha = 0.10$ and $Z_{\alpha/2} = 1.645$).

TABLE 2. Definitions, Estimates, and Standard Errors for Screening Parameters*

Sensitivity (SN): the proportion of those who are diseased, who are labeled positive by the diagnostic test
    Estimate:        $SN = TP/(TP + FN)$
    Standard error:  $\sqrt{SN(1 - SN)/(TP + FN)}$

Specificity (SP): the proportion of those who are disease-free, who are labeled negative by the diagnostic test
    Estimate:        $SP = TN/(TN + FP)$
    Standard error:  $\sqrt{SP(1 - SP)/(TN + FP)}$

Positive predictive value (PPV): the proportion of subjects with positive diagnostic test results, who have the disease
    Estimate:        $PPV = TP/(TP + FP)$
    Standard error:  $\sqrt{PPV(1 - PPV)/(TP + FP)}$

Negative predictive value (NPV): the proportion of subjects with negative diagnostic test results, who do not have the disease
    Estimate:        $NPV = TN/(TN + FN)$
    Standard error:  $\sqrt{NPV(1 - NPV)/(TN + FN)}$

*TN = true negative; FN = false negative; TP = true positive; FP = false positive. Convert to percentages by multiplying estimates, standard errors, and confidence intervals by 100.
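The Table 2 formulas can be applied directly to the four cell counts of Table 1. The following is a minimal SAS sketch in the style of Appendix A. The counts (tp = 45, fp = 30, fn = 5, tn = 120) and the data set and variable names are hypothetical, chosen only for illustration, and the intervals rely on the large-sample approximation, so they would not be appropriate for small cell counts.

```sas
* Hypothetical 2 x 2 counts, for illustration only (not study data);
data screen;
   tp = 45; fp = 30; fn = 5; tn = 120;

   * estimates of the screening parameters (Table 2);
   sn  = tp / (tp + fn);
   sp  = tn / (tn + fp);
   ppv = tp / (tp + fp);
   npv = tn / (tn + fn);

   * standard errors: sqrt(proportion * (1 - proportion) / denominator);
   se_sn  = sqrt(sn  * (1 - sn)  / (tp + fn));
   se_sp  = sqrt(sp  * (1 - sp)  / (tn + fp));
   se_ppv = sqrt(ppv * (1 - ppv) / (tp + fp));
   se_npv = sqrt(npv * (1 - npv) / (tn + fn));

   * approximate 2-sided 95 percent CI bounds for SN and SP;
   sn_lo = sn - 1.96 * se_sn;
   sn_hi = sn + 1.96 * se_sn;
   sp_lo = sp - 1.96 * se_sp;
   sp_hi = sp + 1.96 * se_sp;
run;

proc print data=screen noobs;
   var sn se_sn sn_lo sn_hi sp se_sp sp_lo sp_hi ppv se_ppv npv se_npv;
run;
```

The same pattern extends to CI bounds for PPV and NPV by reusing se_ppv and se_npv.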

TABLE 3. Sample Size for Sensitivity, N1, with 95% CI Width of 10%

Prevalence                         Expected Sensitivity
of Disease     50%     55%     60%     65%     70%     75%     80%     85%     90%
    1%       9,604   9,508   9,220   8,740   8,068   7,203   6,147   4,899   3,458
    5%       1,921   1,902   1,844   1,748   1,614   1,441   1,230     980     692
   10%         961     951     922     874     807     721     615     490     346
   20%         481     476     461     437     404     361     308     245     173
   30%         321     317     308     292     269     241     205     164     116
   40%         241     238     231     219     202     181     154     123      87
   50%         193     191     185     175     162     145     123      98      70
   60%         161     159     154     146     135     121     103      82      58
   70%         138     136     132     125     116     103      88      70      50
   80%         121     119     116     110     101      91      77      62      44
   90%         107     106     103      98      90      81      69      55      39
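The entries of Table 3 can be recomputed from the paper's formulas: the number with disease, TP + FN = 1.96² × SN(1 − SN)/W², divided by the prevalence and rounded up. The SAS sketch below loops over the same prevalences and sensitivities. The data set and variable names are assumptions made for illustration, and the small ROUND step is an added guard against floating-point noise in cells that fall exactly on a whole number; it is not part of the published method.

```sas
* Sketch: recompute the Table 3 entries (N1) for a width of 0.10;
* Data set and variable names are illustrative assumptions;
data table3;
   w = 0.10;
   * integer loop counters avoid accumulating floating-point error;
   do p_pct = 1, 5, 10 to 90 by 10;
      p = p_pct / 100;
      do sn_pct = 50 to 90 by 5;
         sn    = sn_pct / 100;
         * number of diseased subjects needed;
         tp_fn = 1.96**2 * sn * (1 - sn) / w**2;
         * strip tiny floating-point noise, then round up to a whole subject;
         n1    = ceil(round(tp_fn / p, 1e-8));
         output;
      end;
   end;
   keep p_pct sn_pct n1;
run;

proc print data=table3 noobs;
run;
```

For a prevalence of 20% and an expected sensitivity of 90%, this gives 34.5744/0.20 = 172.872, rounded up to 173, which agrees with the table.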

Step 3: Calculate the Sample Size Required for Sensitivity, N1

$$N_1 = \frac{TP + FN}{P}$$

Step 4: Calculate the Number without Disease, FP + TN

$$FP + TN = Z_{\alpha/2}^{2}\,\frac{SP(1 - SP)}{W^{2}}$$

Step 5: Calculate the Sample Size Required for Specificity, N2

$$N_2 = \frac{FP + TN}{1 - P}$$

Step 6: Select the Final Sample Size, N

N is the larger of N1 and N2.

N is the number of randomly selected subjects required to estimate the sensitivity and specificity, in a target population with prevalence of disease P, within ±W, assuming the sensitivity and specificity are of sizes SN and SP, respectively.

Table 3 shows the number of subjects required to yield a 10% width of a 2-sided 95% CI, for various values of sensitivity, with differing prevalences of disease; similarly, Table 4 shows Ns for specificity. (Notice that some of the extreme cells in the table have small sample sizes. For extreme cases such as these, more exact methods for sample size should be used.) Sample SAS (SAS Institute Inc., Cary, NC) code and output are provided (Appendix A).

TABLE 4. Sample Size for Specificity, N2, with 95% CI Width of 10%

Prevalence                         Expected Specificity
of Disease     50%     55%     60%     65%     70%     75%     80%     85%     90%
    1%          98      97      94      89      82      73      63      50      35
    5%         102     101      98      92      85      76      65      52      37
   10%         107     106     103      98      90      81      69      55      39
   20%         121     119     116     110     101      91      77      62      44
   30%         138     136     132     125     116     103      88      70      50
   40%         161     159     154     146     135     121     103      82      58
   50%         193     191     185     175     162     145     123      98      70
   60%         241     238     231     219     202     181     154     123      87
   70%         321     317     308     292     269     241     205     164     116
   80%         481     476     461     437     404     361     308     245     173
   90%         961     951     922     874     807     721     615     490     346

As with all sample size calculations, the specifications and assumptions are made before the data are collected. What is actually observed may be different from these a priori assumptions. Therefore, the observed widths of the CIs that one achieves from the data are not guaranteed to be equal to the prespecified value of W.

Example: Sample Size for Intrasound Study

Step 1

Previous research in wrists showed very high sensitivity and specificity. To be conservative, the researchers hypothesize that the values in ankles will be lower. They select SN = 0.90 and SP = 0.85. They choose a clinically acceptable width of the 95% CIs for sensitivity and specificity to be no larger than 10%; i.e., a SN from 0.80 to 1.00 and a SP from 0.75 to 0.95. Set W = 0.10. In their ED, they usually see 20% of the ankle x-rays positive for fracture; i.e., set P = 0.20.

Step 2

$$TP + FN = 1.96^{2} \times \frac{0.90 \times (1 - 0.90)}{0.10^{2}} = 34.57$$

Step 3

$$N_1 = \frac{34.5744}{0.20} = 172.872$$

Step 4

$$FP + TN = 1.96^{2} \times \frac{0.85 \times (1 - 0.85)}{0.10^{2}} = 48.98$$

Step 5

$$N_2 = \frac{48.9804}{1 - 0.20} = 61.23$$

Step 6

N1 > N2, thus N, rounded to the next higher whole number, is 173 subjects.

CONCLUSION

Calculating sample size requirements for controlling CI widths helps to ensure that study results will have a clinically acceptable degree of precision. The method detailed in this paper incorporates the prevalence of disease into the sample size calculation. By randomly selecting (or consecutively enrolling) N subjects for the study, the prevalence of disease in the sample is likely to reflect the prevalence found in the target population. This method also allows the researcher to control CIs for both sensitivity and specificity. Features of this approach are summarized in Table 5.

TABLE 5. Features of Sample Size Estimation Method Based on Desired Sensitivity and Specificity Precision

Alternative names and related methods: Sample size calculation for determining the width of a CI for a proportion.

Data type: Dichotomous diagnostic test result and criterion standard.

Assumptions: Clinically acceptable width (precision) of 95% CI, W; prevalence of disease in target population, P; hypothesized values of sensitivity, SN, and specificity, SP. Large sample size.

Principal results: Total number of randomly selected subjects required to estimate sensitivity and specificity, in a target population with prevalence of disease P, within ±W, assuming sensitivity and specificity are at least of sizes SN and SP, respectively.

Strengths: Helps ensure estimates of sensitivity and specificity will have clinically acceptable precision, accounting for the prevalence of disease in the target population.

Limitations: Actual observed CI widths are not guaranteed to be equal to prespecified value of W.

The author is grateful to Judith C. White, MEd, for editorial assistance and to Martha Kreimer-Birnbaum, PhD, and Michael C. Plewa, MD, for general support.

REFERENCES

1. Arkin CF, Wachtel MS. How many patients are necessary to assess test performance? [comment]. JAMA. 1990; 263:275-8.
2. Riegelman RK, Hirsch RP. Studying a Study and Testing a Test: How to Read the Medical Literature. 2nd rev. ed. Boston: Little, Brown, 1989, pp 151-63.
3. Gaddis GM, Gaddis ML. Introduction to biostatistics: Part 3. Sensitivity, specificity, predictive value, and hypothesis testing. Ann Emerg Med. 1990; 19:591-7.
4. Diamond GA. Limited assurances. Am J Cardiol. 1989; 63:99-100.
5. Elenbaas RM, Elenbaas JK, Cuddy PG. Evaluating the medical literature: Part 2. Statistical analyses. Ann Emerg Med. 1983; 12:610-20.



APPENDIX A

Sample SAS Code

title1 'Choosing Sample Size for Evaluating a Diagnostic Test';

data main;
   label sn  = 'sensitivity'
         sp  = 'specificity'
         p   = 'prevalence of disease'
         w   = 'width of 95% CI'
         a_c = '#diseased subjects TP+FN'
         n1  = 'sample size for SN'
         n2  = 'sample size for SP'
         n   = 'total #subjects N';

   *step 1 specifications;
   sn = 0.90 ; *substitute your value for sensitivity here;
   sp = 0.85 ; *substitute your value for specificity here;
   p  = 0.20 ; *substitute your value for prevalence here;
   w  = 0.10 ; *substitute your value for width here;

   *step 2 calculate TP+FN;
   a_c = 1.96*1.96*sn*(1-sn)/(w*w);

   *step 3 calculate N1;
   n1 = a_c/p;
   *round up to the next whole integer;
   n1_int = int(n1);
   if n1 ne n1_int then n1 = n1_int + 1;

   *step 4 calculate FP+TN;
   b_d = 1.96*1.96*sp*(1-sp)/(w*w);

   *step 5 calculate N2;
   n2 = b_d/(1-p);
   *round up to the next whole integer;
   n2_int = int(n2);
   if n2 ne n2_int then n2 = n2_int + 1;

   *step 6 get final sample size;
   if n1 gt n2 then n=n1;
   else if n2 gt n1 then n=n2;
   else if n1=n2 then n=n1;
run;

*print the sample size;
proc print label noobs;
   var w sn sp p n1 n2 n;
run;

SAS Output

Choosing Sample Size for Evaluating a Diagnostic Test

width of                                prevalence    sample size    sample size    total #subjects
 95% CI     sensitivity   specificity   of disease      for SN         for SP             N
  0.1           0.9          0.85          0.2            173             62             173
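Because the final N depends on the assumed prevalence, it can be useful to rerun the six steps over several plausible prevalences before settling on a value. The following SAS sketch is a compact variant of the appendix program that uses the CEIL and MAX functions in place of the explicit rounding and comparison logic. The prevalence range of 10% to 50%, the data set name, and the ROUND guard against floating-point noise are illustrative assumptions and are not taken from the paper.

```sas
* Sketch: final sample size N across several assumed prevalences;
* The prevalence range is an illustrative assumption;
data n_by_prev;
   sn = 0.90; sp = 0.85; w = 0.10;
   do p_pct = 10 to 50 by 10;
      p  = p_pct / 100;
      * steps 2 and 3: subjects needed for sensitivity, rounded up;
      n1 = ceil(round(1.96**2 * sn * (1 - sn) / w**2 / p, 1e-8));
      * steps 4 and 5: subjects needed for specificity, rounded up;
      n2 = ceil(round(1.96**2 * sp * (1 - sp) / w**2 / (1 - p), 1e-8));
      * step 6: the larger of the two;
      n  = max(n1, n2);
      output;
   end;
run;

proc print data=n_by_prev noobs;
   var p n1 n2 n;
run;
```

At a prevalence of 0.20 this reproduces the appendix output (N1 = 173, N2 = 62, N = 173); the other rows show how the requirement shifts from sensitivity to specificity as prevalence rises.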
