
A Practical Guide For Understanding Confidence Intervals and P Values

This document discusses confidence intervals and P values, which are statistical methods used to analyze data from medical studies. Confidence intervals indicate a range of values that the true population mean is likely to fall within, based on results from a sample. P values indicate whether results are statistically significant, but do not provide information on the actual values or precision of the results. The document explains how these statistical methods can help readers understand and generalize results from medical studies.


Otolaryngology–Head and Neck Surgery (2009) 140, 794-799

INVITED ARTICLE

A practical guide for understanding confidence intervals and P values
Eric W. Wang, MD, Nsangou Ghogomu, BS, Courtney C. J. Voelker, MD,
D. Phil (Oxon), Jason T. Rich, MD, Randal C. Paniello, MD,
Brian Nussenbaum, MD, Ron J. Karni, MD, and J. Gail Neely, MD,
St Louis, MO
No sponsorships or competing interests have been disclosed for this article.

ABSTRACT

The 95 percent confidence interval about the mean demarcates the range of values in which the mean would fall if many samples from the universal parent population were taken. In other words, if the same observation, experiment, or trial were done over and over with a different sample of subjects, but with the same characteristics as the original sample, 95 percent of the means from those repeated measures would fall within this range. This gives a measure of how confident we are in the original mean. It tells us not only whether the results are statistically significant because the CI falls totally on one side or the other of the no difference marker (0 if continuous variables; 1 if proportions), but also the actual values so that we might determine if the data seem clinically important. In contrast, the P value tells us only whether the results are statistically significant, without translating that information into values relative to the variable that was measured. Consequently, the CI is a better choice to describe the results of observations, experiments, or trials.

© 2009 American Academy of Otolaryngology–Head and Neck Surgery Foundation. All rights reserved.

Received January 19, 2009; accepted February 3, 2009.

0194-5998/$36.00 © 2009 American Academy of Otolaryngology–Head and Neck Surgery Foundation. All rights reserved.
doi:10.1016/j.otohns.2009.02.003

Studies use a sample of patients who have a disease or have undergone a treatment to draw conclusions about the larger population of similar individuals. No matter how carefully the study sample is selected to minimize bias and baseline group differences, information gathered from a sample leads to some level of uncertainty and chance. Traditionally, P value, or probability, has been used to determine whether the results are due to chance. P value is limited in that it provides no information regarding the magnitude and precision of the results. In addition, P value does not address how much the results would vary if the study were performed numerous times. Conversely, a confidence interval (CI) is a range of plausible results that attempts to estimate the precision of the results and quantify the uncertainty inherent to studying a sample of the population [1-6]. In other words, the CI is the range of values about the sample mean within which we can be relatively certain the true mean of the universal population falls.

There is a difference between actual data and our certainty, or inference, about the data. When we speak of a P value, SEM (often shortened to just SE [1]) or CI about the mean, we are discussing inferential statistics in contradistinction to descriptive statistics [7] (see Appendix).

This distinction between actual reported data and inference is important, primarily because we want to be able to generalize the results of a clinical article to our own practice. To do this, it is crucial to remember that the article generally reports one sample taken from the universal, or parent, population of subjects with the same characteristic (Fig 1). For example, suppose authors in New York report on the comparison of treatment A versus treatment B. We immediately think that a sample of subjects so treated is just like our patients here in Missouri, California, Texas, or Toronto, and that if we did the same thing, we would get the same results. In other words, we tend to generalize their sample to our own practice. However, to be a bit more discerning, we need to know just how stable their results are or how confident we might be about their reported data. That is where inferential statistics applies. Certainly, many factors of how they obtained the data may be more important, but this article is concerned only with the stability of the obtained data.

Figure 1 The large circle represents the universal, parent population of similar subjects and the boxes represent samples of subjects that may be taken from this large population. All studies report only data from samples.

Descriptive statistics describes the actual data from a sample group, experiment, or trial. Actual data generates individual data points (Xi), the central tendency of the data, such as the mean (X̄) or median, and the spread of the data, described as standard deviation (SD or s), interquartile range, or inner percentile ranges (ipr) [7] (Fig 2).

Inferential statistics describes what we might expect if the same sampling of a similar group, experiment, or trial was repeated many times to characterize the universe, or parent population. This helps us know how confident we would be that the mean reported for a variable in an article is within 95 percent of the means obtained by repeated sampling (Fig 3), or how confident we would be that the comparison between group outcomes is due to the intervention, rather than to chance.

Figure 3 This illustration represents how stable the reported mean of data from a single sample variable is (SEM) and how confident we might be that the reported mean falls within 95 percent of the mean values (95% CI) if the same experiment were done many times over. The CI is demarcated by an upper limit and a lower limit of the mean values from hypothetical repeated measurements.
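
To make the idea of repeated sampling concrete, the sketch below simulates it. It is an illustration under stated assumptions rather than part of the original analysis: the parent population is assumed to be normal with a hypothetical mean of 50 and SD of 10, each sample has 100 subjects, and all function names are invented for this example. For every simulated sample it computes the large-sample 95% CI (mean ± 1.96 SEM) and counts how often that interval contains the parent mean. Under the standard frequentist reading of a 95% CI, this fraction should come out close to 0.95.

```python
import random
import statistics


def ci95_for_mean(sample):
    """Return (lower, upper) of the large-sample 95% CI about the sample mean."""
    mean = statistics.fmean(sample)
    # SEM = sample SD divided by the square root of the number of subjects
    sem = statistics.stdev(sample) / len(sample) ** 0.5
    return mean - 1.96 * sem, mean + 1.96 * sem


def coverage(true_mean=50.0, true_sd=10.0, n=100, repeats=2000, seed=1):
    """Fraction of repeated-sample CIs that contain the parent mean.

    true_mean, true_sd, n and repeats are hypothetical values chosen
    only for illustration.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Draw one sample of n subjects from the assumed normal parent population
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        lower, upper = ci95_for_mean(sample)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / repeats


if __name__ == "__main__":
    print(f"Fraction of intervals containing the parent mean: {coverage():.3f}")
```

Smaller samples would call for a t multiplier in place of 1.96, as described in the Appendix.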

COMPARISON BETWEEN GROUPS: P VALUES

When two groups are compared, each group has actual data defining the mean of each group and the spread of the data within that group. In comparing the groups, both the means and the spread are important. What catches our eye immediately is the difference between the means (Fig 4). However, the spread of the data within the groups may be more important (Figs 5A, B and 6).

Figure 2 This illustration represents actual data relative to a single variable. The central vertical line with X̄ is the mean, the two lines on either side of the mean represent the SD (s) about the mean, and the outer lines inscribe 95 percent of the data, defined as plus and minus 1.96 times the SD, and known as the inner 95th percentile range (ipr95). The data can be further divided into quartiles in which the lower 25th percentile and the upper 75th percentile can be defined and inscribe the central 50th percentile of the data known as the interquartile range.

Figure 4 In this illustration of the two groups being compared, each group has its own mean and unique spread of data. The difference between the means immediately catches our eye.
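
The descriptive quantities named in Figure 2 can be computed with Python's standard library. This is a minimal sketch with several assumptions: the function name and the data values are invented for illustration, the ±1.96 SD range presumes roughly normal data, and the quartiles depend on the interpolation rule (here the default method of `statistics.quantiles`).

```python
import statistics


def describe(values):
    """Descriptive statistics for one variable: mean, SD, ipr95, median, quartiles."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, divisor n - 1
    # Three cut points that split the data into four equal-probability groups
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": mean,
        "sd": sd,
        "ipr95": (mean - 1.96 * sd, mean + 1.96 * sd),  # inner 95th percentile range
        "median": median,
        "iqr": (q1, q3),  # bounds of the interquartile range
    }


if __name__ == "__main__":
    hearing_thresholds = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
    for name, value in describe(hearing_thresholds).items():
        print(name, value)
```

All of these values describe the sample itself; none of them says how the mean would behave if the sampling were repeated.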
796 Otolaryngology–Head and Neck Surgery, Vol 140, No 6, June 2009

Figure 5 These illustrations show that the difference between means can be the same, but the spread of the data within each group can influence the probability that the groups are significantly different. Here, there is a mixture of actual data, the means of groups A and B, and inferential calculations such as the SEM for each group (SEM-A and SEM-B) and the SED.

In Figure 5, the difference between means is the same in Figure 5A and B; however, the SEMs (SEM-A and SEM-B) for each group are different, causing the standard error of the difference between means (SED) to be different. Explanations of these terms follow.

Understanding P Values

The inferential SEM is calculated from the actual standard deviation (s) divided by the square root of the number of subjects in the group. The SD is an index of dispersion of actual sample data, and the SEM is an index of dispersion of a hypothetical series of means repetitively taken from the parent population from which the sample was taken [7]. The SED is inferential as well and is calculated by taking the square root of the sum of the squared SD of group A divided by the number of subjects in group A plus the squared SD of group B divided by the number of subjects in group B [7]. It is not so important to know how these are calculated, but it is useful to know from where these values come (Appendix).

Figure 6 This figure illustrates the generation of probability scores using a t test.

In performing a t test, a “critical ratio” is calculated by dividing the difference between means by the SED [8].
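
The SEM, SED, and critical ratio described above can be sketched in a few lines. The function names and data are hypothetical. The last function is a stand-in for the table lookup: it treats the critical ratio as a standard normal z, which is only a reasonable approximation for large groups; for small groups a t table (or a statistics package) should be used instead.

```python
import math
import statistics
from statistics import NormalDist


def sem(values):
    """Standard error of the mean: SD divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))


def sed(group_a, group_b):
    """Standard error of the difference between means: sqrt(sA^2/nA + sB^2/nB)."""
    var_a = statistics.variance(group_a)  # squared SD of group A
    var_b = statistics.variance(group_b)  # squared SD of group B
    return math.sqrt(var_a / len(group_a) + var_b / len(group_b))


def critical_ratio(group_a, group_b):
    """Difference between means divided by the SED (the t statistic)."""
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    return diff / sed(group_a, group_b)


def approx_two_tailed_p(ratio):
    """ASSUMPTION: large-sample normal approximation, not a t-table lookup."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(ratio)))


if __name__ == "__main__":
    group_a = [10, 12, 14, 16, 18]  # hypothetical outcomes, treatment A
    group_b = [8, 9, 10, 11, 12]    # hypothetical outcomes, treatment B
    print("SEM-A:", sem(group_a))
    print("SED:", sed(group_a, group_b))
    print("critical ratio:", critical_ratio(group_a, group_b))
```

With the same difference between means, a wider spread or fewer subjects enlarges the SED and shrinks the critical ratio, which is the effect Figure 5 illustrates.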
The critical ratio defines the statistic “t,” which is then looked up in a probability table for t statistics to get the P value [9] (Fig 6). The table shows us that the number of subjects plays a major role in the resulting P value.

The P value assures us of the probability of the results being really from the intervention (P < 0.05) versus simply by chance (P > 0.05). However, because numbers of subjects are so important, it is possible to have statistically significant results, but the difference is so small as to be clinically meaningless. Thus, the P value is qualitative in the sense that it tells us yes or no if chance played a major role; however, it does not tell us quantitatively about the generalizable values associated with the sample means. More discussion of this issue will be presented later.

COMPARISON BETWEEN GROUPS: CONFIDENCE INTERVALS

The CI about the mean is also inferential but serves as an index of the values associated with a sample mean as well as an index of statistical significance if the experiment or trial were done over and over to create a hypothetical series of means. The values give us a sense of just how meaningful the sample data is to generalize to our practice. CIs may be set at any percent desired; however, in most cases 95 percent is usually chosen (95% CI). This means that if samples of the parent population were taken over and over, 95 percent of the values of the resulting means would fall within this CI, demarcated by the upper limit and lower limit of the CI (Figs 3 and 7).

Figure 7 This illustration depicts a frequency distribution curve of the comparison between means of two groups and the generation of the 95 percent CI about the reported difference between the group means. SED, standard error of the difference between the two means (see text for calculation).

The statistical interpretation is easy. If the whole CI is on one side or the other of the no difference marker, 0 in continuous data and 1 in ratios, then the results are statistically significant; however, if the interval crosses the marker, the results are not statistically significant. Note that this is true for data with normal distributions; however, if the data is markedly skewed, other measures such as the use of the median are more appropriate. Discussion of skewed data is outside the scope of this article.

The interpretation of clinical significance, or importance, is an additional value of the CI. The CI values may show us that even statistically significant results really do not seem to mean much clinically. For example, in a study, a large number of subjects are treated with canal wall up mastoidectomy and a similarly large number of subjects are treated with a canal wall down mastoidectomy, and a mean difference in hearing of 3 dB is calculated. The mean difference may be statistically significant in favor of canal wall up procedures because P < 0.05 and the CI about the mean difference of 3 dB ranges from 1 to 5 dB. However, we might not think that this small difference is clinically meaningful. On the other hand, an article that finds, in a smaller number of subjects, that the mean difference is 15 dB, but because the P value is > 0.05 and the CI ranges from –1 to 31 dB, the results would not be statistically significant. However, we might feel that this may be a clinically very important preliminary finding. Thus, it might be worthwhile to test a larger number of subjects to determine if this difference holds up and is both statistically and clinically meaningful. Explanations of terms in calculating CI follow.

Table 1
2 × 2 contingency table

                                    Dependent (outcome) variable
Independent (predictor) variable        +         −
Group A                                 a         b         a+b
Group B                                 c         d         c+d
                                        a+c       b+d

Understanding the Calculation of Confidence Intervals

Again, the SEM is an index of dispersion of a hypothetical series of means repetitively taken from the parent population from which the sample was taken [7]. As seen in Figure 3, a CI also may be obtained for univariate data for a single variable. The CI is calculated as the mean ± Zα(SEM); in larger sample sizes, Zα for a two-tailed description = 1.96 for 95 percent CI [7] (Appendix).

When comparing two groups, a new mean value, the difference between means, is generated. Likewise, the new SEM in this new group is the SED. If repetitively the same experiment were done many times, a new frequency distribution of a series of values representing the differences between means would be constructed, but the SE this time would be the SED. Thus, the CI is calculated, as above, as
the difference between means ± Zα(SED) (Fig 7). Note that in this new distribution of differences between means, the SED is much like the SE and the 95 percent CI looks like the inner 95th percentile range in descriptive statistics of actual data (Appendix).

Most CIs are equidistant about the mean, having an upper limit and a lower limit often reported as such, as with the SPSS statistical program (SPSS Inc, Chicago, IL), or as a range separated by a hyphen or two numbers separated by a comma. When only one number is given for a CI, as is the case with the SigmaStat program (Systat Software Inc, Richmond, CA), it is assumed that the symbol ± precedes that CI single number to determine the upper and lower limit values about the mean.

However, occasionally, the upper and lower limits are asymmetrical about the mean. This is the case with odds ratios (OR), also called the cross-product ratio (ad/bc), because the CI is first calculated as the natural log (Loge; ln) of the OR (ln(OR)) and the results (lnLower Limit and lnUpper Limit) are then converted back to a range of ORs by taking the antilogarithm of each using them as exponents of e [7,9] (Appendix, Table 1).

CONCLUSION

The 95 percent CI about the mean demarcates the range of values in which the mean would fall if many samples from the universal parent population were taken. In other words, if the same observation, experiment, or trial were done over and over with a different sample of subjects, but with the same characteristics as the original sample, 95 percent of the means from those repeated measures would fall within this range. This gives a measure of how confident we are in the original mean. It not only tells us whether the results are statistically significant because the CI falls totally on one side or the other of the no difference marker (0 if continuous variables; 1 if proportions), but it gives us the actual values so that we might determine whether the data seem clinically important. In contrast, the P value tells us only whether the results are statistically significant, without translating that information into values relative to the variable that was measured. Consequently, the CI is a better choice to describe the results of observations, experiments, or trials.

AUTHOR INFORMATION

From the Department of Otolaryngology–Head and Neck Surgery, Washington University School of Medicine.
Corresponding author: J. Gail Neely, MD, Department of Otolaryngology–Head and Neck Surgery, Washington University School of Medicine, 660 S. Euclid Ave, Box 8115, St Louis, MO 63110.
E-mail address: [email protected]; [email protected] (alternative).

AUTHOR CONTRIBUTIONS

Eric W. Wang, primary contributor; Nsangou Ghogomu, primary contributor; Courtney C. J. Voelker, reader, editor; Jason T. Rich, reader, editor; Randal C. Paniello, reader, editor; Brian Nussenbaum, reader, editor; Ron J. Karni, reader, editor; J. Gail Neely, primary author.

DISCLOSURE

Competing interests: None.
Sponsorships: None.

REFERENCES

1. Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med 1994;121:200–6.
2. Shakespeare TP, Gebski VJ, Veness MJ, et al. Improving interpretation of clinical studies by use of confidence levels, clinical significance curves, and risk-benefit contours. Lancet 2001;357:1349–53.
3. Sim J, Reid N. Statistical inference by confidence intervals; issues of interpretation and utilization. Phys Ther 1999;79:186–95.
4. Smith SD. Statistical tools in the quest for truth: hypothesis testing, confidence intervals, and the power of clinical studies. Ophthalmology 2008;115:423–4.
5. Visintainer PF, Tejani N. Understanding and using confidence intervals in clinical research. J Matern Fetal Med 1998;7:201–6.
6. Zou GY, Donner A. Construction of confidence limits about effect measures: a general approach. Stat Med 2007;27:1693–702.
7. Feinstein AR. Clinical epidemiology: the architecture of clinical research. Philadelphia: WB Saunders; 1985. p. 102–3, 113, 145, 161, 124–5, 432.
8. Jekel JF, Katz DL, Elmore JG. Epidemiology, biostatistics, and preventive medicine. 2 ed. Philadelphia: WB Saunders; 2001. p. 158.
9. Motulsky H. Intuitive biostatistics. New York: Oxford University Press; 1995. p. 78 and Table A5.4.

APPENDIX

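Formulas used in descriptive and inferential assessments

Columns: Descriptive (of actual data sample); Inferential (use of actual data to infer values if sample is repeated many times).

Continuous variables

Mean (X̄) (descriptive): $\bar{X} = \sum X_i / n$
SD (s) (descriptive): $s = \sqrt{\sum (X_i - \bar{X})^2 / (N - 1)}$
Inner 95th percentile range (ipr95) (descriptive): $\mathrm{ipr}_{95} = \bar{X} \pm Z_\alpha s = \bar{X} \pm 1.96 s$
SEM ($s_{\bar{X}}$) (inferential): $s_{\bar{X}} = s / \sqrt{N}$
95% CI about a single variable mean, large sample size, two-tailed (inferential): $\mathrm{CI} = \bar{X} \pm Z_\alpha s_{\bar{X}} = \bar{X} \pm 1.96 s_{\bar{X}}$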
95% CI about a single variable mean, small sample size (<30), two-tailed (inferential): $\mathrm{CI} = \bar{X} \pm t_{\alpha,\nu} s_{\bar{X}}$, with $df = n - 1$
Example values of t for a two-tailed 95% CI:
df    t
29    2.045
20    2.086
10    2.228
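95% CI about the difference between means, two-tailed (inferential): $\mathrm{CI}_{95} = \bar{X}_{\mathrm{difference}} \pm Z_\alpha(\mathrm{SED}) = \bar{X}_{\mathrm{difference}} \pm 1.96(\mathrm{SED})$
SED (inferential): $\mathrm{SED} = \sqrt{S_A^2 / n_A + S_B^2 / n_B}$

Proportions

SD of a proportion (descriptive): $s = \sqrt{pq}$
SE of a proportion (inferential): $s_{\bar{X}} = \sqrt{pq / N}$
95% CI of a proportion (inferential): $95\%\ \mathrm{CI} = p \pm Z_\alpha \sqrt{pq / n}$
95% CI of OR (inferential): $95\%\ \mathrm{CI}\ \ln(\mathrm{OR}) = \ln(\mathrm{OR}) \pm 1.96 \sqrt{1/A + 1/B + 1/C + 1/D}$
Lower limit of 95% CI (OR): $e^{\ln LL}$
Upper limit of 95% CI (OR): $e^{\ln UL}$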
Certainly the universal parent population has a mean $\mu$, a standard deviation $\sigma$, and a proportion $\pi$, if dichotomous. However, the importance of these two columns, described in terms of the sample nomenclature, is to emphasize those items that predominantly describe the sample and those that require significant inference from the parent population to allow a degree of confidence about the sample data.
$X_i$, individual value; $\sum X_i$, sum of all the individual values in a group; n or N, number of all individual values in the group under consideration; $Z_\alpha$, Z frequency (probability) distribution of specific alpha level, which is usually 0.05, meaning the level set to demarcate statistical significance in which 5 percent error is acceptable; $\sqrt{\ }$, square root; $t_{\alpha,\nu}$, t distribution at $\alpha$ alpha level (usually 0.05) and $\nu$ degrees of freedom, which is n minus the number of times a mean is calculated (usually n − 1 per group); p, proportion of interest; q, 1 − p.
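
The CI formulas for a mean, a difference between means, and a proportion all share the form estimate ± multiplier × standard error. The sketch below applies them with hypothetical helper names and checks whether an interval excludes the no difference marker. The two SED values in the usage example (1.02 and 8.16 dB) are not reported in the text; they are back-calculated here from the mastoidectomy example's stated intervals purely for illustration.

```python
import math

Z_95 = 1.96  # two-tailed multiplier for a 95% CI in large samples


def ci_mean(mean, sd, n, multiplier=Z_95):
    """CI about a single mean: mean ± multiplier × SEM, with SEM = sd / sqrt(n).

    For small samples, pass the tabulated t value as `multiplier`.
    """
    sem = sd / math.sqrt(n)
    return mean - multiplier * sem, mean + multiplier * sem


def ci_difference(diff, sed, multiplier=Z_95):
    """CI about a difference between means: diff ± multiplier × SED."""
    return diff - multiplier * sed, diff + multiplier * sed


def ci_proportion(p, n, multiplier=Z_95):
    """CI about a proportion: p ± multiplier × sqrt(pq / n), with q = 1 - p."""
    se = math.sqrt(p * (1 - p) / n)
    return p - multiplier * se, p + multiplier * se


def excludes_no_difference(ci, marker=0.0):
    """True if the whole interval lies on one side of the marker (0 or 1)."""
    lower, upper = ci
    return lower > marker or upper < marker


if __name__ == "__main__":
    small_effect = ci_difference(3.0, 1.02)   # assumed SED; roughly 1 to 5 dB
    large_effect = ci_difference(15.0, 8.16)  # assumed SED; roughly -1 to 31 dB
    print(small_effect, excludes_no_difference(small_effect))
    print(large_effect, excludes_no_difference(large_effect))
```

The first interval excludes 0 yet spans only a few decibels, while the second crosses 0 but reaches clinically large values, which is the contrast the text draws between statistical and clinical significance.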

$\sqrt{1/A + 1/B + 1/C + 1/D}$, where A, B, C, D are the values from the cells (a, b, c, d) in the original data 2 × 2 table generating the OR, also known as the cross-product ratio (ad/bc). ln = natural log = Loge. e = 2.718281828. lnLL = natural log of the CI lower limit and lnUL = natural log of CI upper limit.
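
The log-transform steps for the OR interval can be sketched as follows. The function name and cell counts are hypothetical, and the sketch assumes all four cells of the 2 × 2 table are nonzero (otherwise the reciprocals and the logarithm are undefined).

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) for a 2 x 2 table with cells a, b, c, d.

    The interval is built on the natural-log scale and converted back
    by using the log limits as exponents of e.
    """
    odds_ratio = (a * d) / (b * c)  # cross-product ratio ad/bc
    se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ln_or = math.log(odds_ratio)
    ln_lower = ln_or - z * se_ln_or  # lnLL
    ln_upper = ln_or + z * se_ln_or  # lnUL
    return odds_ratio, math.exp(ln_lower), math.exp(ln_upper)


if __name__ == "__main__":
    # Hypothetical counts: Group A (20 +, 10 -), Group B (10 +, 20 -)
    or_value, lower, upper = odds_ratio_ci(20, 10, 10, 20)
    print(f"OR = {or_value:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Because the limits are symmetric only on the log scale, the upper limit sits farther from the OR than the lower limit does, and the no difference marker for this interval is 1 rather than 0.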
