A Practical Guide For Understanding Confidence Intervals and P Values
INVITED ARTICLE
0194-5998/$36.00 © 2009 American Academy of Otolaryngology–Head and Neck Surgery Foundation. All rights reserved.
doi:10.1016/j.otohns.2009.02.003
Wang et al A practical guide for understanding confidence . . . 795
Figure 5 These illustrations show that the difference between means can be the same, but the spread of the data within each group can
influence the probability that the groups are significantly different. Here, there is a mixture of actual data, the means of groups A and B,
and inferential calculations such as the SEM for each group (SEM-A and SEM-B) and the SED.
In Figure 5, the difference between means is the same in Figure 5A and B; however, the SEMs (SEM-A
and SEM-B) for each group are different, causing the standard error of the difference between means
(SED) to be different. Explanations of these terms follow.

Understanding P Values

The inferential SEM is calculated from the actual standard deviation (s) divided by the square root
of the number of subjects in the group. The SD is an index of dispersion of actual sample data, and
the SEM is an index of dispersion of a hypothetical series of means repetitively taken from the
parent population from which the sample was taken.⁷ The SED is inferential as well and is calculated
by taking the square root of the sum of the squared SD of group A divided by the number of subjects
in group A plus the squared SD of group B divided by the number of subjects in group B.⁷ It is not
so important to know how these are calculated, but it is useful to know where these values come
from (Appendix).

In performing a t test, a “critical ratio” is calculated by dividing the difference between means
by the SED.⁸ This . . .

Figure 6 This figure illustrates the generation of probability scores using a t test.
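The SEM, SED, and critical ratio described above can be sketched in a few lines of Python. This is only an illustration in the spirit of Figure 5: the two pairs of groups are hypothetical values chosen so that the difference between means is the same while the spread differs, and the function names are assumptions made for this example, not part of the article.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))


def sed(group_a, group_b):
    """Standard error of the difference between two group means."""
    var_a = statistics.variance(group_a)  # squared sample SD of group A
    var_b = statistics.variance(group_b)  # squared sample SD of group B
    return math.sqrt(var_a / len(group_a) + var_b / len(group_b))


def critical_ratio(group_a, group_b):
    """Difference between means divided by the SED."""
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / sed(group_a, group_b)


# Hypothetical data: both pairs have means of 10 and 12, but different spread.
tight_a = [9, 10, 10, 10, 11]
tight_b = [11, 12, 12, 12, 13]
wide_a = [4, 7, 10, 13, 16]
wide_b = [6, 9, 12, 15, 18]

for label, a, b in [("tight", tight_a, tight_b), ("wide", wide_a, wide_b)]:
    print(label, "SEM-A:", sem(a), "SEM-B:", sem(b),
          "SED:", sed(a, b), "critical ratio:", critical_ratio(a, b))
```

With these made-up values, the pair with less spread has the smaller SED and therefore the larger critical ratio in absolute value, even though the difference between means is identical.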
the difference between means $\pm Z_\alpha(\mathrm{SED})$ (Fig 7). Note that in this new
distribution of differences between means, the SED is much like the SE and the 95 percent CI looks
like the inner 95th percentile range in descriptive statistics of actual data (Appendix).

Most CIs are equidistant about the mean, having an upper limit and a lower limit often reported as
such, as with the SPSS statistical program (SPSS Inc, Chicago, IL), or as a range separated by a
hyphen or two numbers separated by a comma. When only one number is given for a CI, as is the case
with the SigmaStat program (Systat Software Inc, Richmond, CA), it is assumed that the symbol ±
precedes that CI single number to determine the upper and lower limit values about the mean.

However, occasionally, the upper and lower limits are asymmetrical about the mean. This is the case
with odds ratios (OR), also called the cross-product ratio (ad/bc), because the CI is first
calculated as the natural log ($\log_e$; ln) of the OR (ln(OR)) and the results (lnLower Limit and
lnUpper Limit) are then converted back to a range of ORs by taking the antilogarithm of each using
them as exponents of e (Appendix, Table 1).⁷,⁹

CONCLUSION

The 95 percent CI about the mean demarcates the range of values in which the mean would fall if
many samples from the universal parent population were taken. In other words, if the same
observation, experiment, or trial were done over and over with a different sample of subjects, but
with the same characteristics as the original sample, 95 percent of the means from those repeated
measures would fall within this range. This gives a measure of how confident we are in the original
mean. It not only tells us whether the results are statistically significant because the CI falls
totally on one side or the other of the no difference marker (0 if continuous variables; 1 if
proportions), but it gives us the actual values so that we might determine whether the data seem
clinically important. In contrast, the P value tells us only whether the results are statistically
significant, without translating that information into values relative to the variable that was
measured. Consequently, the CI is a better choice to describe the results of observations,
experiments, or trials.

AUTHOR INFORMATION

From the Department of Otolaryngology–Head and Neck Surgery, Washington University School of
Medicine.
Corresponding author: J. Gail Neely, MD, Department of Otolaryngology–Head and Neck Surgery,
Washington University School of Medicine, 660 S. Euclid Ave, Box 8115, St Louis, MO 63110.
E-mail address: [email protected]; [email protected] (alternative).

AUTHOR CONTRIBUTIONS

Eric W. Wang, primary contributor; Nsangou Ghogomu, primary contributor; Courtney C. J. Voelker,
reader, editor; Jason T. Rich, reader, editor; Randal C. Paniello, reader, editor; Brian
Nussenbaum, reader, editor; Ron J. Karni, reader, editor; J. Gail Neely, primary author.

DISCLOSURE

Competing interests: None.
Sponsorships: None.

REFERENCES

1. Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and
the misuse of power when interpreting results. Ann Intern Med 1994;121:200–6.
2. Shakespeare TP, Gebski VJ, Veness MJ, et al. Improving interpretation of clinical studies by use
of confidence levels, clinical significance curves, and risk-benefit contours. Lancet
2001;357:1349–53.
3. Sim J, Reid N. Statistical inference by confidence intervals; issues of interpretation and
utilization. Phys Ther 1999;79:186–95.
4. Smith SD. Statistical tools in the quest for truth: hypothesis testing, confidence intervals,
and the power of clinical studies. Ophthalmology 2008;115:423–4.
5. Visintainer PF, Tejani N. Understanding and using confidence intervals in clinical research.
J Matern Fetal Med 1998;7:201–6.
6. Zou GY, Donner A. Construction of confidence limits about effect measures: a general approach.
Stat Med 2007;27:1693–702.
7. Feinstein AR. Clinical epidemiology: the architecture of clinical research. Philadelphia: WB
Saunders; 1985. p. 102–3, 113, 145, 161, 124–5, 432.
8. Jekel JF, Katz DL, Elmore JG. Epidemiology, biostatistics, and preventive medicine. 2 ed.
Philadelphia: WB Saunders; 2001. p. 158.
9. Motulsky H. Intuitive biostatistics. New York: Oxford University Press; 1995. p. 78 and
Table A5.4.
APPENDIX
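| Measure | Descriptive (of actual data sample) | Inferential (use of actual data to infer values if sample is repeated many times) |
|---|---|---|
| Continuous variables | | |
| Mean ($\bar{X}$) | $\bar{X} = \left(\sum X_i\right) / n$ | |
| SD ($s$) | $s = \sqrt{\sum (X_i - \bar{X})^2 / (N - 1)}$ | |
| Inner 95th percentile range (ipr95) | $\mathrm{ipr}_{95} = \bar{X} \pm Z_\alpha s = \bar{X} \pm 1.96\,s$ | |
| SEM ($s_{\bar{X}}$) | | $s_{\bar{X}} = s / \sqrt{N}$ |
| 95% CI about a single variable mean, large sample size, two-tailed | | $\mathrm{CI} = \bar{X} \pm Z_\alpha s_{\bar{X}} = \bar{X} \pm 1.96\,s_{\bar{X}}$ |
| 95% CI about a single variable mean, small sample size (<30), two-tailed | | $\mathrm{CI} = \bar{X} \pm t_{\alpha,\nu}\, s_{\bar{X}}$, with $df = n - 1$. Example (df: t, two-tailed $\alpha = 0.05$): 29: 2.045; 20: 2.086; 10: 2.228. (The values 1.699, 1.725, and 1.812 are the corresponding one-tailed $\alpha = 0.05$ critical values.) |
| 95% CI about the difference between means, two-tailed | | $\mathrm{CI}_{95} = \bar{X}_{\mathrm{difference}} \pm Z_\alpha(\mathrm{SED}) = \bar{X}_{\mathrm{difference}} \pm 1.96\,(\mathrm{SED})$ |
| SED | | $\mathrm{SED} = \sqrt{s_A^2 / n_A + s_B^2 / n_B}$ |
| Proportions | | |
| SD of a proportion | $s = \sqrt{pq}$ | |
| SE of a proportion | | $s_{\bar{X}} = \sqrt{pq / N}$ |
| 95% CI of a proportion | | $95\%\ \mathrm{CI} = p \pm Z_\alpha \sqrt{pq / n}$ |
| 95% CI of OR | | $95\%\ \mathrm{CI}\ \ln(\mathrm{OR}) = \ln(\mathrm{OR}) \pm 1.96 \sqrt{\frac{1}{A} + \frac{1}{B} + \frac{1}{C} + \frac{1}{D}}$; lower limit of 95% CI (OR) $= e^{\ln LL}$; upper limit of 95% CI (OR) $= e^{\ln UL}$ |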
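The large-sample rows of the table can be tried out with a short Python sketch. It assumes the Z-based formulas above (not the small-sample t version), works from summary statistics rather than raw data, and uses hypothetical numbers; the function names and the `excludes` helper are illustrative choices, not part of the article.

```python
import math
from statistics import NormalDist

# Z for a two-tailed alpha of 0.05; close to the 1.96 used in the table.
Z_ALPHA = NormalDist().inv_cdf(1 - 0.05 / 2)


def ci_mean(mean, sd, n, z=Z_ALPHA):
    """95% CI about a single mean: mean +/- Z * SEM."""
    sem = sd / math.sqrt(n)
    return mean - z * sem, mean + z * sem


def ci_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=Z_ALPHA):
    """95% CI about the difference between two means: difference +/- Z * SED."""
    sed = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    diff = mean_a - mean_b
    return diff - z * sed, diff + z * sed


def ci_proportion(p, n, z=Z_ALPHA):
    """95% CI of a proportion: p +/- Z * sqrt(pq / n)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


def excludes(interval, no_difference=0.0):
    """True if the interval lies entirely on one side of the no-difference marker."""
    low, high = interval
    return not (low <= no_difference <= high)


# Hypothetical summaries: same difference between means, different spread.
narrow = ci_difference(10.0, 1.0, 50, 12.0, 1.0, 50)
wide = ci_difference(10.0, 12.0, 50, 12.0, 12.0, 50)
print("narrow:", narrow, "excludes 0:", excludes(narrow))
print("wide:", wide, "excludes 0:", excludes(wide))
print("proportion 0.5, n = 100:", ci_proportion(0.5, 100))
```

In this made-up example, the interval for the low-spread groups lies entirely below 0, whereas the interval for the high-spread groups spans 0, so only the first would be read as statistically significant at the 0.05 level.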
Certainly the universal parent population has a mean $\mu$, a standard deviation $\sigma$, and a
proportion $\pi$, if dichotomous. However, the importance of these two columns, described in terms
of the sample nomenclature, is to emphasize those items that predominantly describe the sample and
those that require significant inference from the parent population to allow a degree of confidence
about the sample data.

$X_i$, individual value; $\sum X_i$, sum of all the individual values in a group; n or N, number of
all individual values in the group under consideration; $Z_\alpha$, Z frequency (probability)
distribution of specific alpha level, which is usually 0.05, meaning the level set to demarcate
statistical significance in which 5 percent error is acceptable; $\sqrt{\ }$, square root;
$t_{\alpha,\nu}$, t distribution at $\alpha$, the alpha level (usually 0.05), and $\nu$, degrees of
freedom, which is n minus the number of times a mean is calculated (usually n − 1 per group); p,
proportion of interest; q, 1 − p.

$\sqrt{\frac{1}{A} + \frac{1}{B} + \frac{1}{C} + \frac{1}{D}}$, where A, B, C, D are the values from
the cells (a, b, c, d) in the original data 2 × 2 table generating the OR, also known as the
cross-product ratio (ad/bc). ln = natural log = $\log_e$. e = 2.718281828. lnLL = natural log of the
CI lower limit and lnUL = natural log of CI upper limit.
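The odds ratio calculation described in these notes can be sketched as follows. The 2 × 2 cell counts are hypothetical, the function name is an assumption for illustration, and the sketch assumes that no cell is zero (a zero cell would make 1/A + 1/B + 1/C + 1/D undefined).

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR, lower limit, upper limit) for a 2 x 2 table with cells a, b, c, d."""
    odds_ratio = (a * d) / (b * c)  # cross-product ratio ad/bc
    se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ln_ll = math.log(odds_ratio) - z * se_ln_or  # lnLL
    ln_ul = math.log(odds_ratio) + z * se_ln_or  # lnUL
    # Convert back to the OR scale by using each limit as an exponent of e.
    return odds_ratio, math.exp(ln_ll), math.exp(ln_ul)


# Hypothetical counts, not taken from the article.
odds_ratio, lower, upper = odds_ratio_ci(20, 10, 10, 20)
print("OR:", odds_ratio, "95% CI:", lower, "to", upper)
print("distance below OR:", odds_ratio - lower, "distance above OR:", upper - odds_ratio)
```

Because the limits are symmetric on the log scale and then exponentiated, the upper limit sits farther from the OR than the lower limit does, which is the asymmetry the text describes.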