Agresti 2000

This article discusses confidence intervals for proportions and differences of proportions. It notes that standard confidence intervals used in introductory statistics courses often have poorer than expected coverage rates. However, the intervals can be improved by making simple adjustments based on adding two successes and two failures as pseudo observations to the data. When making these adjustments, the coverage rates of nominal 95% confidence intervals for differences of proportions were actually above 93% in almost all cases for small sample sizes, a big improvement over the standard intervals. The adjusted intervals provide better performance while still being easily taught in non-calculus based statistics courses.

This article was downloaded by: [McGill University Library]

On: 29 March 2013, At: 08:56


Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House,
37-41 Mortimer Street, London W1T 3JH, UK

The American Statistician


Publication details, including instructions for authors and subscription information:
http://amstat.tandfonline.com/loi/utas20

Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures

Alan Agresti (a) & Brian Caffo (a)
(a) Department of Statistics, University of Florida, Gainesville, FL, 32611-8545
Version of record first published: 17 Feb 2012.

To cite this article: Alan Agresti & Brian Caffo (2000): Simple and Effective Confidence Intervals for Proportions and
Differences of Proportions Result from Adding Two Successes and Two Failures, The American Statistician, 54:4, 280-288

To link to this article: http://dx.doi.org/10.1080/00031305.2000.10474560


Full terms and conditions of use: http://amstat.tandfonline.com/page/terms-and-conditions

This article may be used for research, teaching, and private study purposes. Any substantial or systematic
reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to
anyone is expressly forbidden.

The publisher does not give any warranty express or implied or make any representation that the contents
will be complete or accurate or up to date. The accuracy of any instructions, formulae, and drug doses should
be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims,
proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly in
connection with or arising out of the use of this material.
Teacher's Corner

Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures

Alan AGRESTI and Brian CAFFO

Alan Agresti is Professor, and Brian Caffo is a Graduate Student, Department of Statistics, University of Florida, Gainesville, FL 32611-8545 (E-mail: [email protected]). This work was partially supported by grants from the National Institutes of Health and the National Science Foundation. The authors appreciate helpful comments from Brent Coull and Yongyi Min.

The standard confidence intervals for proportions and their differences used in introductory statistics courses have poor performance, the actual coverage probability often being much lower than intended. However, simple adjustments of these intervals based on adding four pseudo observations, half of each type, perform surprisingly well even for small samples. To illustrate, for a broad variety of parameter settings with 10 observations in each sample, a nominal 95% interval for the difference of proportions has actual coverage probability below .93 in 88% of the cases with the standard interval but in only 1% with the adjusted interval; the mean distance between the nominal and actual coverage probabilities is .06 for the standard interval, but .01 for the adjusted one. In teaching with these adjusted intervals, one can bypass awkward sample size guidelines and use the same formulas with small and large samples.

KEY WORDS: Binomial distribution; Score test; Small sample; Wald test.

1. INTRODUCTION

Let $X$ denote a binomial variate for $n$ trials with parameter $p$, denoted bin($n, p$), and let $\hat{p} = X/n$ denote the sample proportion. For two independent samples, let $X_1$ be bin($n_1, p_1$), and let $X_2$ be bin($n_2, p_2$). Let $z_a$ denote the $1 - a$ quantile of the standard normal distribution. Nearly all elementary statistics textbooks present the following confidence intervals for $p$ and $p_1 - p_2$:

An approximate $100(1 - \alpha)\%$ confidence interval for $p$ is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}. \qquad (1)$$

An approximate $100(1 - \alpha)\%$ confidence interval for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}. \qquad (2)$$

These confidence intervals result from inverting large-sample Wald tests, which evaluate standard errors at the maximum likelihood estimates. For instance, the interval for $p$ is the set of $p_0$ values for which $|\hat{p} - p_0| / \sqrt{\hat{p}(1-\hat{p})/n} < z_{\alpha/2}$; that is, the set of $p_0$ having $P$ value exceeding $\alpha$ in testing $H_0: p = p_0$ against $H_a: p \neq p_0$ using the approximately normal test statistic. The intervals are sometimes called Wald intervals. Although these intervals are simple and natural for students who have previously seen analogous large-sample formulas for means, a considerable literature shows that they behave poorly (e.g., Ghosh 1979; Vollset 1993; Newcombe 1998a, 1998b). This can be true even when the sample size is very large (Brown, Cai, and DasGupta 1999). In this article, we describe simple adjustments of these intervals that perform much better but can be easily taught in the typical non-calculus-based statistics course.

These references showed that a much better confidence interval for a single proportion is based on inverting the test with standard error evaluated at the null hypothesis, which is the score test approach. This confidence interval, due to Wilson (1927), is the set of $p_0$ values for which $|\hat{p} - p_0| / \sqrt{p_0(1-p_0)/n} < z_{\alpha/2}$, which is

$$\hat{p}\left(\frac{n}{n + z_{\alpha/2}^2}\right) + \frac{1}{2}\left(\frac{z_{\alpha/2}^2}{n + z_{\alpha/2}^2}\right) \pm z_{\alpha/2}\sqrt{\frac{1}{n + z_{\alpha/2}^2}\left[\hat{p}(1-\hat{p})\left(\frac{n}{n + z_{\alpha/2}^2}\right) + \frac{1}{2}\cdot\frac{1}{2}\left(\frac{z_{\alpha/2}^2}{n + z_{\alpha/2}^2}\right)\right]}.$$

The midpoint is a weighted average of $\hat{p}$ and 1/2, and it equals the sample proportion after adding $z_{\alpha/2}^2$ pseudo observations, half of each type. The square of the coefficient of $z_{\alpha/2}$ in this formula is a weighted average of the variance of a sample proportion when $p = \hat{p}$ and the variance of a sample proportion when $p = 1/2$, using $n + z_{\alpha/2}^2$ in place of the usual sample size $n$. For the 95% case, Agresti and Coull (1998) used this representation to motivate approximating the score interval by the ordinary Wald interval (1)

280 The American Statistician, November 2000, Vol. 54, No. 4 © 2000 American Statistical Association
[Figure 1: six panels plotting coverage probability against p, for n = 5, n = 10, and n = 20 at nominal levels 95% and 99%; legend: Wald (dotted), Adjusted (solid).]

Figure 1. Coverage probabilities for the binomial parameter p with the nominal 95% and 99% Wald confidence interval and the adjusted interval based on adding four pseudo observations, for n = 5, 10, 20.
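The one-sample intervals discussed in this section, and coverage curves of the kind plotted in Figure 1, can be sketched in a few lines. This is a minimal illustration rather than the authors' software: the function names are hypothetical, the score interval is written in its midpoint plus or minus half-width form, and clipping the adjusted interval to [0, 1] is an assumption of the sketch.

```python
import math
from statistics import NormalDist


def z_value(level=0.95):
    """Standard normal quantile z_{alpha/2} for a two-sided interval."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def wald_interval(x, n, level=0.95):
    """Interval (1): p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half = z_value(level) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def score_interval(x, n, level=0.95):
    """Wilson score interval, written as midpoint +/- half-width."""
    z = z_value(level)
    p_hat = x / n
    mid = (x + z * z / 2) / (n + z * z)
    half = z * math.sqrt(n * p_hat * (1 - p_hat) + z * z / 4) / (n + z * z)
    return mid - half, mid + half


def adjusted_interval(x, n, t=4, level=0.95):
    """I_t(n, x): Wald formula after adding t/2 successes and t/2 failures.
    Clipping to [0, 1] is an assumption of this sketch."""
    lo, hi = wald_interval(x + t / 2, n + t, level)
    return max(0.0, lo), min(1.0, hi)


def coverage(interval, n, p, level=0.95):
    """Exact coverage: total binomial probability of the x whose interval contains p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n, level=level)
        if lo <= p <= hi:
            total += math.comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


if __name__ == "__main__":
    # Coverage at p = .1 for the three sample sizes shown in Figure 1.
    for name, fn in [("Wald", wald_interval), ("score", score_interval),
                     ("adjusted", adjusted_interval)]:
        print(name, [round(coverage(fn, n, 0.1), 3) for n in (5, 10, 20)])
```

Evaluating `coverage` over a grid of p values gives one curve per method and sample size; the standard library suffices because the binomial sum has only n + 1 terms.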

after adding $z_{.025}^2 = 1.96^2 \approx 4$ pseudo observations, two of each type. That is, their adjusted “add two successes and two failures” interval has the simple form

$$\tilde{p} \pm z_{.025}\sqrt{\tilde{p}(1-\tilde{p})/\tilde{n}}, \qquad (3)$$

but with $\tilde{n} = (n + 4)$ trials and $\tilde{p} = (X + 2)/(n + 4)$. The midpoint equals that of the 95% score confidence interval (rounding $z_{.025}$ to 2.0 for that interval), but the coefficient of $z_{.025}$ uses the variance $\tilde{p}(1-\tilde{p})/\tilde{n}$ at the weighted average $\tilde{p}$ of $\hat{p}$ and 1/2 rather than the weighted average of the variances; by Jensen's inequality, the adjusted interval is wider than the score interval.

For small samples, the improvement in performance of the adjusted interval compared to the ordinary Wald interval is dramatic. To illustrate, Figure 1 shows the actual coverage probabilities for the nominal 95% Wald and adjusted intervals plotted as a function of $p$, for $n$ = 5, 10, and 20. For all $n$ great improvement occurs for $p$ near 0 or 1. For instance, Brown et al. (1999) stated that when $p = .01$, the size of $n$ required such that the actual coverage probability of a nominal 95% Wald interval is uniformly at least .94 for all $n$ above that value is $n = 7963$, whereas for the adjusted interval this is true for every $n$; when $p = .10$ the values are $n = 646$ for the Wald interval and $n = 11$ for the adjusted interval. The Wald interval behaves especially poorly with small $n$ for $p$ near the boundary, partly because of the nonnegligible probability of having $\hat{p} = 0$ or 1 and thus the degenerate interval [0, 0] or [1, 1]. Agresti and Coull (1998) recommended the adjusted interval for use in elementary statistics courses, since the Wald interval behaves poorly yet the score interval is too complex for most students. Many students in non-calculus-based courses are mystified by quadratic equations (which are needed to solve for the score interval) and would have difficulty using the weighted average formula above. In such courses, it is often easier to show how to adapt a simple method so that it works well rather than to present a more complex method.

Let $I_t(n, x)$ denote the adjustment of the Wald interval that adds $t/2$ successes and $t/2$ failures. With confidence levels $(1 - \alpha)$ other than .95, the Agresti and Coull approximation of the score interval uses $I_t(n, x)$ with $t = z_{\alpha/2}^2$

[Figure 2: boxplots of coverage probability against the number t of pseudo observations, t = 0, 2, 4, 6, 8.]

Figure 2. Boxplots of coverage probabilities for nominal 95% adjusted confidence intervals based on adding t pseudo observations; distributions refer to 10,000 cases, with n1 and n2 each chosen uniformly between 10 and 30 and p1 and p2 chosen uniformly between 0 and 1.
Table 1. Summary of Performance of Nominal 95% Confidence Intervals for p1 - p2 Based on Adding t Pseudo Observations, Averaging with Respect to a Uniform Distribution for (p1, p2).

Columns t = 0 through t = 8 give the number of pseudo observations t.

| Characteristic | n | t = 0 | t = 2 | t = 4 | t = 6 | t = 8 | Hybrid Score | Approximate Bayes |
|---|---|---|---|---|---|---|---|---|
| Coverage | 10 | .891 | .949 | .960 | .958 | .945 | .954 | .952 |
| | 20 | .924 | .949 | .956 | .955 | .948 | .953 | .951 |
| | 30 | .933 | .949 | .954 | .954 | .949 | .950 | .951 |
| | 30, 10 | .895 | .948 | .959 | .959 | .950 | .950 | .952 |
| Distance | 10 | .059 | .014 | .013 | .020 | .035 | .014 | .012 |
| | 20 | .026 | .008 | .008 | .012 | .022 | .009 | .007 |
| | 30 | .017 | .006 | .006 | .008 | .016 | .008 | .006 |
| | 30, 10 | .055 | .018 | .012 | .013 | .023 | .010 | .011 |
| Length | 10 | .647 | .670 | .673 | .668 | .659 | .654 | .647 |
| | 20 | .480 | .487 | .488 | .487 | .485 | .481 | .477 |
| | 30 | .398 | .401 | .401 | .401 | .401 | .398 | .396 |
| | 30, 10 | .537 | .551 | .553 | .551 | .545 | .537 | .536 |
| Cov. Prob. < .93 | 10 | .880 | .090 | .010 | .100 | .235 | .072 | .046 |
| | 20 | .404 | .016 | .002 | .046 | .175 | .020 | .008 |
| | 30 | .180 | .005 | .000 | .023 | .131 | .009 | .002 |
| | 30, 10 | .934 | .112 | .004 | .028 | .173 | .029 | .018 |

NOTE: Table reports mean of coverage probabilities $C_t(n, p_1; n, p_2)$, mean of distances $|C_t(n, p_1; n, p_2) - .95|$ from nominal level, mean of expected interval lengths, and proportion of cases with $C_t(n, p_1; n, p_2) < .93$.
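Summaries of the kind reported in Table 1 can be approximated with a short script. The sketch below is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the two-sample interval adds t/4 pseudo observations of each type to each sample and truncates to [-1, 1], and the average over the unit square is estimated from a small random sample of (p1, p2) pairs rather than computed exactly, so its output will only roughly track the table.

```python
import math
import random
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # z_{.025}, roughly 1.96


def binom_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ bin(n, p)."""
    return [math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def adjusted_diff_interval(x1, n1, x2, n2, t=4, z=Z95):
    """Wald formula (2) after adding t/4 pseudo observations of each type to
    each sample, truncated to [-1, 1]; t = 0 gives the ordinary Wald interval."""
    m1, m2 = n1 + t / 2, n2 + t / 2  # adjusted numbers of trials
    q1, q2 = (x1 + t / 4) / m1, (x2 + t / 4) / m2
    half = z * math.sqrt(q1 * (1 - q1) / m1 + q2 * (1 - q2) / m2)
    return max(-1.0, q1 - q2 - half), min(1.0, q1 - q2 + half)


def coverage(n1, p1, n2, p2, t=4, z=Z95):
    """Exact coverage probability: total probability of the outcomes (x1, x2)
    whose interval contains p1 - p2."""
    f1, f2 = binom_pmf(n1, p1), binom_pmf(n2, p2)
    d = p1 - p2
    total = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            lo, hi = adjusted_diff_interval(x1, n1, x2, n2, t, z)
            if lo <= d <= hi:
                total += f1[x1] * f2[x2]
    return total


def summarize(n1, n2, t, cases=200, seed=0):
    """Monte Carlo estimate of the mean coverage and of the proportion of
    (p1, p2) pairs, uniform on the unit square, with coverage below .93."""
    rng = random.Random(seed)
    cov = [coverage(n1, rng.random(), n2, rng.random(), t) for _ in range(cases)]
    return sum(cov) / cases, sum(c < 0.93 for c in cov) / cases


if __name__ == "__main__":
    for t in (0, 2, 4, 6, 8):
        mean_cov, below = summarize(10, 10, t)
        print(f"t = {t}: mean coverage {mean_cov:.3f}, proportion below .93 {below:.3f}")
```

Using the same seed for every t means each adjustment is evaluated on the same (p1, p2) pairs, which makes the comparison across t less noisy than independent draws would.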

instead of $t = 4$, for instance adding 2.7 pseudo observations for a 90% interval and 6.6 for a 99% interval. Many instructors in elementary courses will find it simpler to tell students to use the same constant for all cases. One will do reasonably well, especially at high nominal confidence levels, by the recipe of always using $t = 4$. The performance of the adjusted interval $I_4(n, x)$ is much better than the Wald interval (1) for the usual confidence levels. To illustrate, Figure 1 also shows coverage probabilities for nominal 99% intervals, when $n$ = 5, 10, 20. Since the .95 confidence level is the most common in practice and since this “add two successes and two failures” adjustment provides strong improvement over the Wald for other levels as well, it is simplest for elementary courses to recommend that adjustment uniformly. Of the elementary texts that recommend adjustment of the Wald interval by adding pseudo observations, some (e.g., McClave and Sincich 2000) direct students to use $I_4(n, x)$ regardless of the confidence coefficient whereas others (e.g., Samuels and Witmer 1999) recommend $t = z_{\alpha/2}^2$.

The purpose of this article is to show that a simple adjustment, adding two successes and two failures (total), also works quite well for two-sample comparisons of proportions. The simple Wald formula (2) improves substantially after adding a pseudo observation of each type to each sample, regarding sample $i$ as $(n_i + 2)$ trials with $\tilde{p}_i = (X_i + 1)/(n_i + 2)$. There is no reason to expect an optimal interval to result from this method, or in particular from adding the same number of pseudo observations to each sample or even the same number of cases of each type, but we restricted attention to this form because of the simplicity of explaining it in a classroom setting.

2. COMPARING PERFORMANCE OF WALD INTERVALS AND ADJUSTED INTERVALS

For the two-sample comparison of proportions, we now study the performance of the Wald confidence formula (2) after adding $t$ pseudo observations, $t/4$ of each type to each sample, truncating when the interval for $p_1 - p_2$ contains values $< -1$ or $> 1$. Denote this interval by $I_t(n_1, x_1; n_2, x_2)$, or $I_t$ for short, so $I_0$ denotes the ordinary Wald interval. Our discussion refers mainly to the .95 confidence coefficient, but our evaluations also studied .90 and .99 coefficients. Let $C_t(n_1, p_1; n_2, p_2)$, or $C_t$ for short, denote the true coverage probability of a nominal 95% confidence interval $I_t$. We investigated whether there is a $t$ value for which $|C_t(n_1, p_1; n_2, p_2) - .95|$ tends to be small for most

[Figure 3: two panels, n1 = n2 = 10 and n1 = 30, n2 = 10, plotting the proportion of cases with coverage below .93 against the number t of pseudo observations, t = 0, 2, 4, 6, 8.]

Figure 3. Proportion of (p1, p2) cases with p1 and p2 chosen uniformly between 0 and 1 for which nominal 95% adjusted confidence intervals based on adding t pseudo observations have actual coverage probabilities below .93, for n1 = n2 = 10 and n1 = 30, n2 = 10.

$(p_1, p_2)$, even with small $n_1$ and $n_2$, with $C_t$ rarely very far (say .02) below .95. To explore the performance for a variety of $t$ with small $n_i$, we randomly sampled 10,000 values of $(n_1, p_1; n_2, p_2)$, taking $p_1$ and $p_2$ independently from a uniform distribution over [0, 1] and taking $n_1$ and $n_2$ independently from a uniform distribution over {10, 11, ..., 30}. For each realization we evaluated $C_t(n_1, p_1; n_2, p_2)$ for $t$ between 0 and 8. Figure 2 illustrates results, showing skeletal box plots of $C_t$ for $t$ = 0, 2, 4, 6, 8 (i.e., adding 0, .5, 1, 1.5, 2 observations of each type to each sample).

The ordinary 95% Wald interval behaves poorly. Its coverage probabilities tend to be too small, and they converge to 0 as each $p_i$ moves toward 1 or 0. The coverage probabilities for $I_t$ improve greatly for the positive values of $t$. The case $I_4$ with four pseudo observations behaves especially well, having relatively few poor coverage probabilities. For instance, the proportion of cases for $t$ = (0, 2, 4, 6, 8) that had $C_t < .93$ were (.572, .026, .002, .046, .171). Similarly, the proportion of nominal 99% intervals that had actual coverage probability below .97 were (.310, .012, .000, .000, .000), and the proportion of nominal 90% intervals that had actual coverage probability below .88 were (.623, .045, .016, .131, .255). The pattern exhibited here is illustrative of a variety of results from analyzing $C_t$ more closely, as we now discuss.

We analyzed the performance of the $I_t$ interval for various fixed $(n_1, n_2)$ combinations. Table 1 summarizes some characteristics, in an average sense based on taking $(p_1, p_2)$ uniform from the unit square, for $(n_1, n_2)$ = (10, 10), (20, 20), (30, 30), (30, 10). Although the adjusted interval $I_4$ tends to be conservative, it compares well to other cases in the mean of the distances $|C_t - .95|$ and especially the proportion of cases for which $C_t < .93$. For $n_i = 10$, for instance, the actual coverage probability is below .93 for 88% of such cases with the Wald interval, but for only 1% of them with $I_4$. Figure 3 shows the proportions of coverage probabilities that are below .93 as a function of $t$, for $(n_1, n_2)$ = (10, 10) and (30, 10). The improvement over the ordinary Wald interval from adding $t = 4$ pseudo observations is substantial. Remaining figures concentrate on this particular adjustment, which fared well in a variety of evaluations we conducted.

Averaging performance over the unit square for $(p_1, p_2)$ can mask poor behavior in certain regions, and in practice certain pairings (e.g., $|p_1 - p_2|$ small) are often more common or more important than others. Thus, besides studying these summary expectations, we plotted $C_t$ as a function of $p_1$ for various fixed values of $p_2$, $p_1 - p_2$, and $p_1/p_2$. To illustrate, Figure 4 plots the Wald coverage $C_0$ and the coverage $C_4$ for the adjusted interval, fixing $p_2$ at .1, .3, and .5, for $n_1 = n_2 = 20$. The poor coverage spikes for the Wald interval disappear with $I_4$, but this adjustment is quite conservative when $p_1$ and $p_2$ are both close to 0 or both close to 1. The adjustment $I_4$ performs reasonably well, and much better than the Wald interval, even with very small or unbalanced sample sizes. Figure 5 illustrates, plotting $C_0$ and $C_4$ as a function of $p_1$ with $p_2$ fixed at .3, for $(n_1, n_2)$ = (10, 10), (20, 10), and (40, 10). Figure 6 shows $C_0$ and $C_4$ as a function of $p_1$ when $p_1 - p_2$ = 0 or .2 and when the relative risk $p_1/p_2$ = 2.0 or 4.0, when $n_1 = n_2 = 10$. In Figures 4-6, only rarely does the adjusted interval have coverage significantly below the nominal level. On the other hand, Figures 4 and 6 show that it can be very conservative when $p_1$ and $p_2$ are both close to 0 or 1, say with $(p_1 + p_2)/2$ below about .2 or above about .8 for the small sample sizes studied here. This is preferred, however, to the very low coverages of the Wald interval in these cases. Figures 7 and 8 illustrate their behavior, showing surface plots of $C_0$ and $C_4$ over the unit square when $n_1 = n_2 = 10$. The spikes at values of $p_1$ in Figures 4 and 5 become ridges at values of $p_1 - p_2$ in these figures.

[Figure 4: three panels, p2 = .1, p2 = .3, p2 = .5, plotting coverage probability against p1 for the Wald and adjusted intervals.]

Figure 4. Coverage probabilities for nominal 95% Wald and adjusted confidence intervals (adding t = 4 pseudo observations) as a function of p1 when p2 = .1, .3, .5, with n1 = n2 = 20.

[Figure 5: three panels, n1 = n2 = 10; n1 = 20, n2 = 10; n1 = 40, n2 = 10, plotting coverage probability against p1 for the Wald (dotted) and adjusted (solid) intervals.]

Figure 5. Coverage probabilities for nominal 95% Wald and adjusted confidence intervals (adding t = 4 pseudo observations) as a function of p1 when p2 = .3 when n1 = n2 = 10, n1 = 20, n2 = 10, and n1 = 40, n2 = 10.

The poor performance of the Wald interval does not occur because it is too short. In fact, for moderate-sized $p_i$ it tends to be too long. For instance, when $n_1 = n_2 = 10$, $I_0$ has greater expected length than $I_4$ for $p_2$ between .11 and .89 when $p_1 = .5$ and for $p_2$ between .18 and .82 when $p_1 = .3$. When $n_1 = n_2 = n$ and when $\hat{p}_1 = \hat{p}_2 = \hat{p}$, $I_0$ has greater length than $I_t$ when $\hat{p}$ falls within $\sqrt{.25 - n(4n + t)/[24n^2 + 12nt + 2t^2]}$ of .5. For all $t > 0$, this interval around .5 shrinks monotonically as $n$ increases to $.50 \pm .50/\sqrt{3}$, or (.21, .79), which applies also to the Agresti and Coull (1998) adjusted interval in the single-sample case. As in the single-proportion case, the Wald interval suffers from having the maximum likelihood estimate exactly in the middle of the interval.

There is nothing unique about $t = 4$ pseudo observations in providing good performance of adjusted intervals in the one- and two-sample problems. For instance, Figure 3 and Table 1 show that other adjustments often work well. A region of $t$ values provide substantial improvement over the Wald interval, with values near $t = 2$ being less conservative than $t = 4$. We emphasized the case $t = 4$ earlier for the two-sample case because it rarely has poor coverage. We believe it is worth permitting some conservativeness to
[Figure 6: four panels, p1 - p2 = 0, p1 - p2 = .2, p1/p2 = 2, and p1/p2 = 4, plotting coverage probability against p1 for the Wald and adjusted intervals.]

Figure 6. Coverage probabilities for nominal 95% Wald and adjusted confidence intervals (adding t = 4 pseudo observations) as a function of p1 when p1 - p2 = 0 or .2 and when p1/p2 = 2 or 4, for n1 = n2 = 10.

ensure that the coverage probability rarely falls much below the nominal level. In the one-sample case the adjusted interval $I_2(n, x)$ is better than $I_4(n, x)$ in approximating the score interval with small confidence levels, such as 90%. An advantage of the interval $I_2(n, x)$ for $p$ is consistency between the single-sample case and our recommended adjustment $I_4(n_1, x_1; n_2, x_2)$ for two samples. For instance, as $n_2 \to \infty$ and the second sample yields a perfect estimate, the resulting “add two successes and two failures” two-sample interval uses the first sample in the same way as does the “add one success and one failure” single-sample interval. However, for the single-sample problem we prefer the $I_4(n, x)$ interval, since .95 is by far the most common confidence level in practice and this interval works somewhat better than $I_2(n, x)$ in that case.

3. COMPARING THE ADJUSTED INTERVAL WITH OTHER GOOD INTERVALS

Many methods have been proposed for improving on the ordinary Wald confidence interval for $p_1 - p_2$. Since this article discusses methods appropriate in elementary statistics courses, it focuses on the simple $I_t$ adjustment rather than methods that may be suggested by statistical principles. To find a good method more generally, one approach is to invert a test of $H_0: p_1 - p_2 = \Delta$ that has good properties, such as using the large-sample score test (Mee 1984) or profile likelihood methods (Newcombe 1998b). The score test of $p_1 - p_2 = 0$ is the familiar Pearson chi-squared test, so this approach has the advantage that the confidence interval is consistent with the most commonly taught test of the same nominal level. The method of obtaining the confidence interval is too complex for elementary courses, however, partly because the test of $p_1 - p_2 = \Delta$ requires finding the maximum likelihood estimates of $(p_1, p_2)$ for the standard error subject to the constraint $\hat{p}_1 - \hat{p}_2 = \Delta$.

Newcombe (1998b) evaluated various confidence interval methods for $p_1 - p_2$. He proposed a method that performs substantially better than the Wald interval and similar to the score interval, while being computationally simpler (although too complex for most elementary statistics courses). His method is a hybrid of results from the single-sample score intervals for $p_1$ and $p_2$. Specifically, let $(l_i, u_i)$ be the roots for $p_i$ in $z_{\alpha/2} = |\hat{p}_i - p_i| / \sqrt{p_i(1 - p_i)/n_i}$. Newcombe’s
hybrid score interval is

$$\left( (\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{l_1(1 - l_1)}{n_1} + \frac{u_2(1 - u_2)}{n_2}},\ \ (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{u_1(1 - u_1)}{n_1} + \frac{l_2(1 - l_2)}{n_2}} \right).$$

Compared to the adjusted interval $I_4$, the hybrid score interval also is conservative when $p_1$ and $p_2$ are both close to 0 or 1; overall, it is less conservative, however, with mean coverage probability closer to the nominal level (see Table 1). Likewise, it tends to be a bit shorter. It has a somewhat higher proportion of cases with coverage probability being too small, mainly for values of $|p_1 - p_2|$ near 1; for the 10,000 randomly selected cases with $n_i$ also random between 10 and 30, the minimum coverage probability was .92 for the 95% adjusted interval and .86 for the 95% hybrid score interval.

The adjusted interval $I_4$ and the hybrid score interval both have a greater tendency for distal non-coverage than mesial non-coverage. For instance, for the 10,000 randomly selected cases, the mean probability for which the lower limit exceeds $p_1 - p_2$ when $p_1 - p_2 > 0$ or the upper limit is less than $p_1 - p_2$ when $p_1 - p_2 < 0$ was .030 for $I_4$ and .033 for the 95% hybrid score interval, whereas the mean probability for which the upper limit is less than $p_1 - p_2$ when $p_1 - p_2 > 0$ or the lower limit exceeds $p_1 - p_2$ when $p_1 - p_2 < 0$ was .013 for $I_4$ and .014 for the 95% hybrid score. As $t$ increases for $I_t$, the ratio of incidence of distal non-coverage to mesial non-coverage increases; for these randomly selected cases, for $t$ = (0, 2, 4, 6, 8) it equals (.7, 1.2, 2.2, 4.3, 8.1). Unlike the adjusted interval and the Wald interval, the hybrid score interval cannot produce overshoot,

[Figures 7 and 8: surface plots of coverage probability over the unit square of (p1, p2).]

Figure 7. Coverage probabilities for 95% nominal Wald confidence interval as a function of p1 and p2, when n1 = n2 = 10.

Figure 8. Coverage probabilities for 95% nominal adjusted confidence interval (adding t = 4 pseudo observations) as a function of p1 and p2, when n1 = n2 = 10.
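For readers who want to try the two competitors to $I_4$ discussed in this section, the following is a minimal sketch of the hybrid score interval and of the approximate Bayes interval built from independent uniform priors. The function names are hypothetical, and clamping the single-sample score limits to [0, 1] is an assumption added here to guard against floating-point round-off when a sample proportion is 0 or 1.

```python
import math
from statistics import NormalDist


def score_limits(x, n, z):
    """Roots (l, u) of z = |p_hat - p| / sqrt(p (1 - p) / n), clamped to [0, 1]."""
    p_hat = x / n
    mid = (x + z * z / 2) / (n + z * z)
    half = z * math.sqrt(n * p_hat * (1 - p_hat) + z * z / 4) / (n + z * z)
    return max(0.0, mid - half), min(1.0, mid + half)


def hybrid_score_interval(x1, n1, x2, n2, level=0.95):
    """Hybrid score interval for p1 - p2 from the two single-sample score intervals."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    l1, u1 = score_limits(x1, n1, z)
    l2, u2 = score_limits(x2, n2, z)
    d = x1 / n1 - x2 / n2
    lower = d - z * math.sqrt(l1 * (1 - l1) / n1 + u2 * (1 - u2) / n2)
    upper = d + z * math.sqrt(u1 * (1 - u1) / n1 + l2 * (1 - l2) / n2)
    return lower, upper


def approx_bayes_interval(x1, n1, x2, n2, level=0.95):
    """Normal approximation to the posterior of p1 - p2 under uniform priors:
    centers (x_i + 1) / (n_i + 2), variances with n_i + 3 in the denominator."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    q1, q2 = (x1 + 1) / (n1 + 2), (x2 + 1) / (n2 + 2)
    half = z * math.sqrt(q1 * (1 - q1) / (n1 + 3) + q2 * (1 - q2) / (n2 + 3))
    return q1 - q2 - half, q1 - q2 + half


if __name__ == "__main__":
    # Illustrative data only: 7 of 10 successes versus 3 of 10.
    print("hybrid score:", hybrid_score_interval(7, 10, 3, 10))
    print("approximate Bayes:", approx_bayes_interval(7, 10, 3, 10))
```

The hybrid interval is centered at the difference of sample proportions but is not symmetric about it, whereas the approximate Bayes interval is symmetric about the difference of the shrunken proportions.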



with the interval for pl - p2 extending below -1 or above Finally, an alternative way to improve the Wald method
+ 1 and thus requiring truncation. Overshoot for It is less is with a continuity correction (Fleiss 1981, p. 29). As with
common as t increases. For instance, for these randomly other continuity corrections, this generally results in con-
selected cases, the mean probability of overshoot for t = servative performance, usually more so than the adjusted
(0, 2, 4, 6, 8) was (.048, .033, .016, .006, .OOO). interval. However, the coverage probabilities, like those of
Since standard intervals for p and p1 -p2 improve greatly the Wald interval, can dip substantially below the nominal
with adjustment corresponding to shrinkage of point esti- level when both p , are near 0 or 1.
mates, one would expect intervals resulting from a Bayesian
approach with comparable shrinkage also to perform well
in a frequentist sense. Carlin and Louis (1996, pp. 117-
123) provided evidence of this type for estimating p , For 4. TEACHING THE ADJUSTED INTERVALS
pl - p 2 , consider independent uniform prior distributions Agresti and Coull(1998) motivated their adjusted interval
for p1 and pa. The posterior distribution of p , is beta with (3) for a single proportion as a simple approximation for the
+ +
mean p”, = ( X , 1)/(n, 2 ) and variance 11,(1-pz)/ (n,+3). score 95% confidence interval. We know of no such sim-
Using a crude normal approximation for the distribution of ple motivation for the adjusted interval for the two-sample
the difference of the posterior beta variates leads to the in- comparison, other than the similarity with the Bayesian in-
terval
terval (4). A problem for future research is to study whether
theoretical support exists for this simple yet effective ad-

&+?d,jz
Downloaded by [McGill University Library] at 08:56 29 March 2013

justment, such as Edgeworth or saddlepoint expansions that


(81 - F a ) f Pl(1 -111) p”2(1-1?2) (4) might provide improved approximations for the tail behav-
ior of Fl - p2.
The motivation needed for teaching in the elementary
statistics course is quite different. How can one motivate
This has the same center as the adjusted interval I4 but
uses n, + 3 instead of n, + 2 in the denominators of the adding pseudo observations? In the single-sample case we
standard error.’ For elementary courses, this interval was remind students that the binomial distribution is highly
suggested by Berry (1996, p. 291). Like Newcombe’s hy- skewed as p approaches 0 and 1, and because of this perhaps
brid score interval, it tends to perform quite well, being should not be the midpoint of the interval. As support for
slightly shorter and less conservative than I4 but suffering this, we have students use the software ExplorStat (available
occasional poorer coverages (see Table 1). For sample size at http://www.stat.ufl.edu/-dwack/). Through simulation
combinations we considered, its minimum coverage probability was only slightly below that for the adjusted interval. If conservativeness is a concern (e.g., if both $p_i$ are likely to be close to 0), the approximate Bayes and hybrid score intervals are slightly preferable to $I_4$.

The adjusted interval $I_4$ (and the similar approximate Bayes interval (4)) is simpler than other methods that improve greatly over the Wald interval. Thus, we believe it is appropriate for elementary statistics courses. We do not claim optimality in any sense or that other methods may not be better for some purposes. Some applications, for instance, may require that the true confidence level be no lower than the nominal level, mandating a method that is necessarily conservative (e.g., Chan and Zhang 1999). Also, we recommend $I_4$ for interval estimation and not for an implicit test of $H_0: p_1 - p_2 = 0$, although such a test would be more reliable than one based on the Wald interval. For a significance test, we would continue to teach the Pearson chi-squared test in elementary courses. The test based on $I_4$ is too conservative when the common value of $p_i$ under the null is close to 0 or close to 1, for most sample sizes more conservative than the Pearson test for such $p_i$. Although the adjusted interval is not guaranteed to be consistent with the result of the Pearson test, it usually does agree. For instance, for common values (.1, .2, .3, .4, .5) of $p_i$, the 95% version of $I_4$ and the Pearson test with nominal significance level of .05 agree with probability (.972, .996, .9996, 1.000, 1.000) when $n_1 = n_2 = 30$ and (1.0, 1.0, 1.0, 1.0, 1.0) when $n_1 = n_2 = 10$.

it shows how operating characteristics of statistical methods change as students vary sample sizes and population distributions. For instance, when $p$ takes values such as .10 or .90, students observe a relatively high proportion of Wald intervals failing to contain $p$ when $n$ is 30, the sample size their text suggests is adequate for large-sample inference for a mean.

Most students, however, seem more convinced by specific examples where the Wald method seems nonsensical, such as when $\hat{p} = 0$ or 1. We often use data from a questionnaire administered to the students at the beginning of term. For instance, one of us (Agresti) taught a class to 24 honors students in fall 1999. In response to the question, “Are you a vegetarian?”, 0 of the 24 students responded “yes,” yet they realized that the Wald interval of [0, 0] was not plausible for a corresponding population proportion. We have also used homework exercises such as estimating the probability of success for a new medical treatment when all 10 subjects in a sample experience success, or estimating the probability of death due to suicide when a sample of 30 death records has no occurrences. (Again, the Wald interval is [0, 0], but the National Center for Health Statistics reports that in the United States the probability of death due to suicide is about .01.) Although one can amend the Wald method to improve its behavior when $\hat{p} = 0$ or 1, such as by replacing the endpoints by ones based on the exact binomial test, making such exceptions from a general recipe distracts students from the main idea of taking the estimate plus and minus a normal-score multiple of a standard error.
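The classroom examples above can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `wald_interval` and `adjusted_interval` are hypothetical, the multiplier 1.96 is the usual normal score for 95% confidence, and truncating the adjusted endpoints to [0, 1] is an assumption made here for presentation.

```python
import math

Z95 = 1.96  # approximate normal score for a 95% interval (assumed default)


def wald_interval(x, n, z=Z95):
    """Ordinary Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - half_width, p_hat + half_width)


def adjusted_interval(x, n, z=Z95):
    """Wald formula applied after adding two successes and two failures."""
    n_adj = n + 4
    p_adj = (x + 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Assumption: clip the endpoints to [0, 1], the valid range of a proportion.
    return (max(0.0, p_adj - half_width), min(1.0, p_adj + half_width))


if __name__ == "__main__":
    # (successes, sample size) for the three examples in the text.
    examples = {
        "vegetarians, 0 of 24": (0, 24),
        "treatment successes, 10 of 10": (10, 10),
        "suicide deaths, 0 of 30": (0, 30),
    }
    for label, (x, n) in examples.items():
        print(label, "Wald:", wald_interval(x, n),
              "adjusted:", adjusted_interval(x, n))
```

For 0 of 24 the Wald function returns (0.0, 0.0), the degenerate interval described above, whereas the adjusted interval has positive width. For 0 of 30 the adjusted interval contains the value .01 quoted from the National Center for Health Statistics.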
The American Statistician, November 2000, Vol. 54, No. 4 287
Why four pseudo observations? In the single-sample case we explain that this approximates the results of a more complex method that does not require estimating the unknown standard error; here, we explain the concept of inverting the test with null standard error, or finding solutions of $(\hat{p} - p) = \pm 2\sqrt{p(1-p)/n}$ that do not require estimating $\sqrt{p(1-p)/n}$. In the two-sample case one could explain that this approximates a statistical analysis that represents prior beliefs about each $p_i$ by a uniform distribution. (Some instructors, of course, will prefer a more fully Bayesian approach, as in Berry 1996.)

The poor performance of the ordinary Wald intervals for $p$ and for $p_1 - p_2$ is unfortunate, since they are the simplest and most obvious ones to present in elementary courses. Also unfortunate for these intervals is the difficulty of providing adequate sample size guidelines. Introductory textbooks provide a variety of recommendations, but these have inadequacies (Leemis and Trivedi 1996; Brown et al. 1999). And, needless to say, most texts do not indicate what to do when the guidelines are violated, other than perhaps to consult a statistician. The results in this article suggest that for the “add two successes and two failures” adjusted confidence intervals, one might simply bypass sample size rules. The adjusted intervals have safe operating characteristics for practical application with almost all sample sizes. In fact, we note in closing (and with tongue in cheek) that the adjusted intervals $I_4(n, x)$ and $I_4(n_1, x_1; n_2, x_2)$ have the advantage that, as with Bayesian methods, one can do an analysis without having any data. In the single-sample case the adjusted sample then has $\tilde{p} = 2/4$, and the 95% confidence interval is $.5 \pm 2\sqrt{(.5)(.5)/4}$, or [0, 1]. In the two-sample case the adjusted samples have $\tilde{p}_1 = 1/2$ and $\tilde{p}_2 = 1/2$, and the 95% confidence interval is $(.5 - .5) \pm 2\sqrt{[(.5)(.5)/2] + [(.5)(.5)/2]}$, or [-1, +1]. Both analyses are uninformative, as one would hope from a frequentist approach with no data. No one will get into too much trouble using them!

[Received September 1999. Revised February 2000.]

REFERENCES

Agresti, A., and Coull, B. A. (1998), “Approximate is Better than ‘Exact’ for Interval Estimation of Binomial Proportions,” The American Statistician, 52, 119-126.
Berry, D. A. (1996), Statistics: A Bayesian Perspective, Belmont, CA: Wadsworth.
Brown, L. D., Cai, T. T., and DasGupta, A. (1999), “Confidence Intervals for a Binomial Proportion and Edgeworth Expansions,” technical report 99-18, Purdue University, Statistics Department.
Carlin, B. P., and Louis, T. A. (1996), Bayes and Empirical Bayes Methods for Data Analysis, London: Chapman and Hall.
Chan, I. S. F., and Zhang, Z. (1999), “Test-Based Exact Confidence Intervals for the Difference of Two Binomial Proportions,” Biometrics, 55, 1202-1209.
Fleiss, J. L. (1981), Statistical Methods for Rates and Proportions (2nd ed.), New York: Wiley.
Ghosh, B. K. (1979), “A Comparison of Some Approximate Confidence Intervals for the Binomial Parameter,” Journal of the American Statistical Association, 74, 894-900.
Leemis, L. M., and Trivedi, K. S. (1996), “A Comparison of Approximate Interval Estimators for the Bernoulli Parameter,” The American Statistician, 50, 63-68.
McClave, J. T., and Sincich, T. (2000), Statistics (8th ed.), Englewood Cliffs, NJ: Prentice Hall.
Mee, R. W. (1984), “Confidence Bounds for the Difference Between Two Probabilities,” Biometrics, 40, 1175-1176.
Newcombe, R. (1998a), “Two-sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods,” Statistics in Medicine, 17, 857-872.
Newcombe, R. (1998b), “Interval Estimation for the Difference Between Independent Proportions: Comparison of Eleven Methods,” Statistics in Medicine, 17, 873-890.
Samuels, M. L., and Witmer, J. W. (1999), Statistics for the Life Sciences (2nd ed.), Englewood Cliffs, NJ: Prentice Hall.
Vollset, S. E. (1993), “Confidence Intervals for a Binomial Proportion,” Statistics in Medicine, 12, 809-824.
Wilson, E. B. (1927), “Probable Inference, the Law of Succession, and Statistical Inference,” Journal of the American Statistical Association, 22, 209-212.

