
What Do P-Values and Confidence Intervals Really Tell Us?

This document discusses p-values and confidence intervals in statistical analysis. It explains that p-values and confidence intervals allow researchers to make inferences about populations based on random samples. The document provides examples of how to select the appropriate statistical test based on study design and variables. It also explains how to calculate a p-value and what it represents - the probability of obtaining results at least as extreme as the observed results of a study, assuming the null hypothesis is true. A lower p-value is stronger evidence against the null hypothesis.

Uploaded by

santoshchitra
Copyright
© Attribution Non-Commercial (BY-NC)

What do p-values and confidence intervals really tell us?

Rita Popat, PhD
Clinical Assistant Professor
Division of Epidemiology
Stanford University School of Medicine
August 9, 2007
Why use statistics at all?
The average height of all 25-year-old men in North America is a PARAMETER.

The heights of the members of a sample of 100 such men are measured; the average of those 100 numbers is a STATISTIC.

Using inferential statistics, we make inferences about a population (taken to be unobservable) based on a random sample taken from the population of interest.

Is risk factor X associated with disease Y?
[Diagram: Population → (selection of subjects) → Sample → (inference) → Population]

From the sample, we compute an estimate of the effect of X on Y (e.g., risk ratio if cohort study):
- Is the effect real? Did chance play a role?

Why worry about chance?
[Diagram: Population → Sample 1, Sample 2, …, Sample k]

Sampling variability…
- you only get to pick one sample!

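Sampling variability can be made concrete with a small simulation. The sketch below is only an illustration: the population of heights (mean 177 cm, SD 7 cm), the sample size of 100, and the choice of k = 5 samples are all assumed values, not figures from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# ASSUMPTION: a hypothetical population of 100,000 heights in cm.
population = [random.gauss(177.0, 7.0) for _ in range(100_000)]
parameter = statistics.mean(population)  # normally unobservable

# Draw k = 5 random samples of n = 100 and compute the statistic for each.
sample_means = [
    statistics.mean(random.sample(population, 100)) for _ in range(5)
]

print(f"Population mean (parameter): {parameter:.2f}")
for i, sample_mean in enumerate(sample_means, start=1):
    print(f"Sample {i} mean (statistic): {sample_mean:.2f}")
```

Each sample gives a slightly different statistic even though the parameter never changes, which is why chance has to be considered when only one sample is available.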
Interpreting the results
[Diagram: Population → (selection of subjects) → Sample → (inference) → Population]

Make inferences from data collected using laws of probability and statistics
- tests of significance (p-value)
- confidence intervals

How do we determine if an association is significant?

• “Significance” here means statistical significance
• p-values
• Confidence intervals

Significance testing

• The interest is generally in comparing two groups (e.g., risk of outcome in the treatment and placebo groups)
• The statistical test depends on the format of the data and the study design

Significance testing

[Diagram: the choice of statistical test depends on the predictor/independent variable, the outcome/dependent variable, and the study design & question]

Choice of statistical test when…
Outcome: dichotomous (yes/no; alive/dead)
Independent variable: categorical (e.g., smoking yes vs no)

Is smoking associated with the outcome?

             Outcome +   Outcome -
Smk (yes)        a           b        (pS+)
Smk (no)         c           d        (pS-)

Statistical test…
• Two-sample proportion, or
• Chi-square, or
• Risk ratio

Choice of statistical test when…
Outcome: dichotomous (yes/no; alive/dead)
Independent variable: continuous (e.g., smoking pack-years)

Is smoking associated with the outcome?

                            Outcome +       Outcome -
Smoking amount (pk yrs)     mean x̄(o+)      mean x̄(o-)

Statistical test…
• Two-sample t-test

Significance testing

Subjects with acute MI
  IV nitrate → mortality PN
  No nitrate → mortality PC
Is mortality different between the two groups?

• Suppose we do a clinical trial to answer the above question
• Even if IV nitrate has no effect on mortality, sampling variation makes it very unlikely that the observed proportions will be exactly equal (PN = PC)
• Any observed difference between groups may be due to treatment or to coincidence (chance)

Obtaining P values

Number dead / randomized
Trial      Intravenous nitrate   Control   Risk Ratio   95% C.I.        P value
Chiche     3/50                  8/45      0.33         (0.09, 1.13)    0.08
Bussman    4/31                  12/29     0.24         (0.08, 0.74)    0.01
Flaherty   11/56                 11/48     0.83         (0.33, 2.12)    0.70
Jaffe      4/57                  2/57      2.04         (0.39, 10.71)   0.40
Lis        5/64                  10/76     0.56         (0.19, 1.65)    0.29
Jugdutt    24/154                44/156    0.48         (0.28, 0.82)    0.007

How do we get this p-value?

Table adapted from Whitley and Ball. Critical Care; 6(3):222-225, 2002

Null Hypothesis (H0)
• There is no association between the independent and dependent/outcome variables
• Formal basis for hypothesis testing
• In the example, H0: “The administration of IV nitrate has no effect on mortality in MI patients”, or PN - PC = 0

Example of significance testing
• In the Chiche trial:
  - pN = 3/50 = 0.06; pC = 8/45 = 0.178
• Null hypothesis:
  - H0: pN – pC = 0 or pN = pC
• Statistical test:
  - Two-sample proportion

General form of a test statistic

$$\text{test statistic} = \frac{\text{relevant statistic} - \text{hypothesized parameter}}{\text{standard error of the relevant statistic}}$$

$$\bar{p} = \frac{X_N + X_C}{n_N + n_C}, \qquad p_N = \frac{X_N}{n_N}, \qquad p_C = \frac{X_C}{n_C}$$

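Test statistic for Two Population Proportions
The test statistic for p1 – p2 (here pN – pC) is a Z statistic:

$$Z = \frac{(p_N - p_C) - (P_N - P_C)_0}{\sqrt{\bar{p}\,(1 - \bar{p})\left(\frac{1}{n_N} + \frac{1}{n_C}\right)}}$$

where
- $p_N - p_C$ is the observed difference
- $(P_N - P_C)_0$ is the difference under the null hypothesis
- $n_N$ is the number of subjects in the IV nitrate group
- $n_C$ is the number of subjects in the control group
- $\bar{p} = \frac{X_N + X_C}{n_N + n_C}$, $p_N = \frac{X_N}{n_N}$, $p_C = \frac{X_C}{n_C}$
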
Testing significance at 0.05 level

[Figure: standard normal curve with rejection regions below -1.96 and above +1.96, and the nonrejection region between them]

$Z_{\alpha/2} = 1.96$
Reject H0 if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$

Two Population Proportions (continued)

$$Z = \frac{0.06 - 0.178}{\sqrt{0.116\,(1 - 0.116)\left(\frac{1}{50} + \frac{1}{45}\right)}} = -1.79$$

$$\text{where } \bar{p} = \frac{3 + 8}{50 + 45} = 0.116, \qquad p_N = \frac{3}{50} = 0.06, \qquad p_C = \frac{8}{45} = 0.178$$

Statistical test for p1 – p2
Two Population Proportions, Independent Samples

Two-tail test:
H0: pN – pC = 0
H1: pN – pC ≠ 0

$$Z = \frac{0.06 - 0.178}{\sqrt{0.116\,(1 - 0.116)\left(\frac{1}{50} + \frac{1}{45}\right)}} = -1.79$$

[Figure: standard normal curve with area α/2 in each tail, below $-z_{\alpha/2}$ and above $z_{\alpha/2}$]

$Z_{\alpha/2} = 1.96$
Reject H0 if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$

Since -1.79 is greater than -1.96 (and less than +1.96), Z falls in the nonrejection region and we fail to reject the null hypothesis.

But what is the actual p-value?
P(Z < -1.79) + P(Z > 1.79) = ?

[Figure: standard normal curve with an area of 0.04 in each tail, below -1.79 and above +1.79]
P(Z < -1.79) + P(Z > 1.79) = 0.08
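
The calculation for the Chiche trial can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function names are illustrative, and the normal approximation is the one used on the slides. Each tail area comes out a little under 0.04, so the unrounded two-sided p-value is about 0.07, which the slides report as 0.08 after rounding each tail to 0.04.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def two_proportion_z_test(x_n, n_n, x_c, n_c):
    """Z statistic and two-sided p-value for H0: pN - pC = 0."""
    p_n = x_n / n_n
    p_c = x_c / n_c
    p_pooled = (x_n + x_c) / (n_n + n_c)
    # Standard error under the null hypothesis uses the pooled proportion.
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_n + 1 / n_c))
    z = (p_n - p_c) / se
    p_value = 2 * normal_cdf(-abs(z))  # area in both tails
    return z, p_value

# Chiche trial: 3/50 deaths with IV nitrate, 8/45 deaths in the control group.
z, p_value = two_proportion_z_test(3, 50, 8, 45)
print(f"Z = {z:.2f}, two-sided p = {p_value:.3f}")
```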
What is a P value?
• ‘P’ stands for probability
  - Tail area probability based on the observed effect
  - Calculated as the probability of an effect as large as or larger than the observed effect (more extreme in the tails of the distribution), assuming the null hypothesis is true
• Measures the strength of the evidence against the null hypothesis
  - Smaller P values indicate stronger evidence against the null hypothesis

What is a P value?
• Fisher suggested that the 5% level (p<0.05) could be used as a scientific benchmark for concluding that fairly strong evidence exists against H0
  - Was never intended as an absolute threshold
  - Strength of evidence is on a continuum
  - Simply noting the magnitude of the P-value should suffice
  - Scientific context is critical
• By convention, p-values of <.05 are often accepted as “statistically significant” in the medical literature, but this is an arbitrary cut-off.

What is a P value?
• P<0.05 is an arbitrary cut-point
  - Does it make sense to adopt a therapeutic agent because the P-value obtained in an RCT was 0.049, and at the same time ignore the results for another therapeutic agent because the P-value was 0.051?
• Hence it is important to report the exact p-value and not just ≤0.05 or >0.05

P-values

Number dead / randomized
Trial      Intravenous nitrate   Control   Risk Ratio   95% C.I.        P value
Chiche     3/50                  8/45      0.33         (0.09, 1.13)    0.08
    Some evidence against the null hypothesis
Flaherty   11/56                 11/48     0.83         (0.33, 2.12)    0.70
    Very weak evidence against the null hypothesis…very likely a chance finding
Lis        5/64                  10/76     0.56         (0.19, 1.65)    0.29
Jugdutt    24/154                44/156    0.48         (0.28, 0.82)    0.007
    Very strong evidence against the null hypothesis…very unlikely to be a chance finding

Interpreting P values
If the null hypothesis were true…

Number dead / randomized
Trial      Intravenous nitrate   Control   Risk Ratio   95% C.I.        P value
Chiche     3/50                  8/45      0.33         (0.09, 1.13)    0.08
    …8 out of 100 such trials would show a risk reduction of 66% or a more extreme result just by chance
Flaherty   11/56                 11/48     0.83         (0.33, 2.12)    0.70
    …70 out of 100 such trials would show a risk reduction of 17% or a more extreme result just by chance…very likely a chance finding
Lis        5/64                  10/76     0.56         (0.19, 1.65)    0.29
Jugdutt    24/154                44/156    0.48         (0.28, 0.82)    0.007
    Very unlikely to be a chance finding

Interpreting P values

Trial      Intravenous nitrate   Control   Risk ratio   95% confidence interval   P value
Chiche     3/50                  8/45      0.33         (0.09, 1.13)              0.08
Bussman    4/31                  12/29     0.24         (0.08, 0.74)              0.01
Flaherty   11/56                 11/48     0.83         (0.33, 2.12)              0.70
Jaffe      4/57                  2/57      2.04         (0.39, 10.71)             0.40
Lis        5/64                  10/76     0.56         (0.19, 1.65)              0.29
Jugdutt    24/154                44/156    0.48         (0.28, 0.82)              0.007

• Size of the p-value is related to the sample size
• The Lis and Jugdutt trials are similar in effect (~50% reduction in risk)…but the Jugdutt trial has a larger sample size, and hence a smaller p-value

Interpreting P values

Trial      Intravenous nitrate   Control   Risk ratio   95% confidence interval   P value
Chiche     3/50                  8/45      0.33         (0.09, 1.13)              0.08
Bussman    4/31                  12/29     0.24         (0.08, 0.74)              0.01
Flaherty   11/56                 11/48     0.83         (0.33, 2.12)              0.70
Jaffe      4/57                  2/57      2.04         (0.39, 10.71)             0.40
Lis        5/64                  10/76     0.56         (0.19, 1.65)              0.29
Jugdutt    24/154                44/156    0.48         (0.28, 0.82)              0.007

• Size of the p-value is related to the effect size (the observed association or difference)
• The Chiche and Flaherty trials are approximately the same size, but the observed difference is greater in the Chiche trial

P values
• P values give no indication about the clinical importance of the observed association
• A very large study may result in a very small p-value based on a small difference of effect that may not be important when translated into clinical practice
• Therefore, it is important to look at the effect size and confidence intervals…

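To see how sample size alone drives the p-value, the sketch below applies the two-sample proportion test to the same observed difference (12% vs 10%) at two study sizes. The counts are hypothetical and chosen only for illustration; they do not come from any trial in the slides.

```python
import math

def two_sided_p(x1, n1, x2, n2):
    """Two-sided p-value from the pooled two-sample proportion Z test."""
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)

# ASSUMPTION: hypothetical studies with identical proportions (12% vs 10%).
p_small_study = two_sided_p(12, 100, 10, 100)
p_large_study = two_sided_p(1200, 10_000, 1000, 10_000)

print(f"n = 100 per group:    p = {p_small_study:.3f}")
print(f"n = 10,000 per group: p = {p_large_study:.2g}")
```

The effect size is the same in both runs; only the amount of data differs. Whether a 2-percentage-point difference matters is a clinical question that the p-value cannot answer.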
Confidence intervals
“Statistics means never having to say you’re certain!”

• P values give no indication about the clinical importance of the observed association
• Relying on information from a sample will always lead to some level of uncertainty.
• A confidence interval is a range of values that tries to quantify this uncertainty:
  - For example, a 95% CI means that under repeated sampling 95% of CIs would contain the true population parameter

P-values versus Confidence intervals
• The P-value answers the question...
  - "Is there a statistically significant difference between the two treatments?"
• The point estimate and its confidence interval answer the questions...
  - "What is the size of that treatment difference?" and "How precisely did this trial determine or estimate the treatment difference?"

Computing confidence intervals (CI)
• General formula:
  (Sample statistic) ± [(confidence level) × (measure of how high the sampling variability is)]
• Sample statistic: observed magnitude of effect or association (e.g., odds ratio, risk ratio)
• Confidence level: varies – 90%, 95%, 99%. For example, to construct a 95% CI, $Z_{\alpha/2} = 1.96$
• Sampling variability: the standard error (S.E.) of the estimate is a measure of variability

$$\text{Point estimate} \pm \left(Z_{\alpha/2} \times \text{S.E.}\right)$$

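As an illustration of the general formula, the sketch below computes a 95% CI for the difference in mortality proportions in the Chiche trial. The helper name `wald_ci` and the use of the unpooled standard error for a difference in proportions are assumptions made for this example; the slides themselves report a CI for the risk ratio, not for the difference.

```python
import math

def wald_ci(estimate, standard_error, z=1.96):
    """Point estimate ± (z × S.E.); z = 1.96 gives a 95% interval."""
    margin = z * standard_error
    return estimate - margin, estimate + margin

# Chiche trial counts from the slides: deaths / randomized in each arm.
x_n, n_n = 3, 50   # IV nitrate
x_c, n_c = 8, 45   # control
p_n, p_c = x_n / n_n, x_c / n_c

# ASSUMPTION: unpooled standard error of a difference in proportions.
se_diff = math.sqrt(p_n * (1 - p_n) / n_n + p_c * (1 - p_c) / n_c)

risk_difference = p_n - p_c
lower, upper = wald_ci(risk_difference, se_diff)
print(f"Risk difference = {risk_difference:.3f}, "
      f"95% CI ({lower:.3f}, {upper:.3f})")
```

The interval spans zero, which agrees with the test result for this trial: the data do not rule out "no difference" at the 0.05 level.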
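Confidence intervals

[Figure: 50 realisations of a 95% confidence interval for μ]

• The above picture shows 50 realisations of a confidence interval for μ.
• Once computed, each 95% confidence interval has fixed endpoints; μ either lies between them or it does not.
• Because the endpoints and μ are all fixed, no probability attaches to that event!
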
Confidence intervals

• Suppose α = 0.05. We cannot say: "with probability 0.95 the parameter μ lies in the confidence interval."
• We only know that, by repetition, 95% of the intervals will contain the true population parameter (μ)
• In 5% of the cases, however, the interval does not contain it. Unfortunately, we don't know in which cases this happens.
• That's why we say: "with confidence level 100(1 − α)%, μ lies in the confidence interval."

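The repeated-sampling interpretation can be checked by simulation. This sketch assumes a normal population with a known standard deviation; the values of μ, σ, the sample size, and the number of repetitions are arbitrary choices made for illustration.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# ASSUMPTION: hypothetical population values, chosen only for illustration.
MU, SIGMA, N, REPS = 50.0, 10.0, 25, 2000
Z = 1.96
se = SIGMA / math.sqrt(N)  # sigma is treated as known

covered = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    mean = sum(sample) / N
    lower, upper = mean - Z * se, mean + Z * se
    if lower <= MU <= upper:  # this particular interval either contains mu or not
        covered += 1

coverage = covered / REPS
print(f"{covered} of {REPS} intervals contain mu ({coverage:.1%})")
```

Roughly 95% of the simulated intervals contain μ, but for any single interval we cannot tell whether it is one of the ones that missed.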
Interpretation of Confidence intervals
• Width of the confidence interval (CI)
  - A narrow CI implies high precision
  - A wide CI implies poor precision (usually due to inadequate sample size)
• Does the interval contain a value that implies no change, no effect, or no association?
  - CI for a difference between two means: Does the interval include 0 (zero)?
  - CI for a ratio (e.g., OR, RR): Does the interval include 1?

Interpretation of Confidence intervals

[Figure: confidence intervals plotted against the null value, labelled:
 - No statistically significant change
 - Statistically significant increase
 - Statistically significant decrease]

Connection between P-values and CIs
• If a 95% CI includes the null effect, the P-value is >0.05 (and we would fail to reject the null hypothesis)
• If the 95% CI excludes the null effect, the P-value is <0.05 (and we would reject the null hypothesis)

Interpreting confidence intervals

Number dead / randomized
Trial      Intravenous nitrate   Control   Risk Ratio   95% C.I.        P value
Chiche     3/50                  8/45      0.33         (0.09, 1.13)    0.08
    Wide interval: compatible with anything from a 91% reduction in mortality to a 13% increase
Flaherty   11/56                 11/48     0.83         (0.33, 2.12)    0.70
Jaffe      4/57                  2/57      2.04         (0.39, 10.71)   0.40
Jugdutt    24/154                44/156    0.48         (0.28, 0.82)    0.007
    Reduction in mortality may be as little as 18%, but there is little evidence to suggest that IV nitrate is harmful

Table adapted from Whitley and Ball. Critical Care; 6(3):222-225, 2002

What about clinical importance?
“A difference, to be a difference, must make a difference.” -- Gertrude Stein

• Does the confidence interval lie partly or entirely within a range of clinical indifference?
• Clinical indifference represents values of such a trivial size that you do not want to change your current practice
  - E.g., would you recommend a cholesterol-lowering drug that reduced LDL levels by 2 units in one year?

What about clinical importance?
• Clinical importance is a medical judgment, not a statistical one!
• Clinicians should change practice only if they believe the study has definitively demonstrated a treatment difference and that the treatment difference is large enough to be clinically important.
• Depends on your knowledge of
  - a range of possible treatments
  - their costs
  - their side effects

Interpretation of Confidence intervals

[Figure: four confidence intervals, each plotted against the null value and a range of clinical indifference, labelled:
 - Keep doing things the same way!
 - Sample size too small?
 - Statistically significant but no practical significance
 - Statistically significant and practical significance]

“Multivariate analysis showed that routine use of immediate node dissection had no impact on survival (hazard ratio 0·72, 95% CI 0·5–1·02), whilst the status of regional nodes affected survival significantly (p=0·007).”

Do you agree with the authors' interpretation of the results?

“Multivariate analysis showed that routine use of immediate node dissection had no impact on survival (hazard ratio 0·72, 95% CI 0·5–1·02), whilst the status of regional nodes affected survival significantly (p=0·007).”

-- In isolation, the p value tells us that the result was not (“statistically”) significant.

-- The point estimate of 0·72 and p-value of 0·07 suggest that the result (or a result even more extreme) is consistent with a relative survival benefit of 28%, and that the probability of the result being due to chance is small in comparison.

-- The 95% CI of 0·5–1·02 helps to shed more light…
• there might be a survival benefit of up to 50% for immediate dissection (a hazard ratio for death of 0·5).
• there might be a survival detriment of up to 2% (a hazard ratio of 1·02).

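The p-value of about 0·07 mentioned for the node-dissection result can be approximated from the quoted hazard ratio and its interval. The sketch below rests on two assumptions that the quoted study does not state: that the 95% CI is symmetric on the log scale, and that a normal approximation applies. It is a rough consistency check, not the authors' calculation.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Values as quoted in the excerpt: hazard ratio 0.72, 95% CI 0.5 to 1.02.
hr, ci_lower, ci_upper = 0.72, 0.50, 1.02

# ASSUMPTION: CI built as exp(ln(HR) ± 1.96 × S.E.), so S.E. can be recovered.
se_log_hr = (math.log(ci_upper) - math.log(ci_lower)) / (2 * 1.96)

z = math.log(hr) / se_log_hr          # null hypothesis: ln(HR) = 0
p_value = 2 * normal_cdf(-abs(z))     # two-sided

print(f"S.E.(ln HR) = {se_log_hr:.3f}, Z = {z:.2f}, p = {p_value:.3f}")
```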
“Multivariate analysis showed that routine use of immediate node dissection had no impact on survival (hazard ratio 0·72, 95% CI 0·5–1·02), whilst the status of regional nodes affected survival significantly (p=0·007).”

• The results of the study are therefore inconclusive, but we cannot rule out a clinically relevant survival advantage.
• With a larger study population, a survival benefit would probably be confirmed statistically.

Clinical vs statistical significance
Weinstein et al. JAMA 292:1188-94
Is physical activity associated with the risk of T2DM?

Clinical vs statistical significance
Weinstein et al. JAMA 292:1188-94
Is BMI associated with the risk of T2DM?

Reaction of investigator to results of a statistical significance test

                                          Statistical significance
                                          Not significant   Significant
Practical importance    Not important     Happy             Annoyed
of observed effect      Important         Very sad          Elated

Which statement(s) is/are correct?

The p-value is:
• The probability that my data are wrong.
• The probability of my data under the null hypothesis.
• The probability that I erroneously find an association.
• The probability that I find an association when one exists.

Which of the following odds ratios for the relationship between various risk factors and heart disease are statistically significant at the .05-significance level? Which are likely to be clinically significant?

For each: Statistically significant? Clinically significant?

A. Odds ratio for every 1-year increase in age: 1.10 (95% CI: 1.01–1.19)
B. Odds ratio for regular exercise (yes vs. no): 0.50 (95% CI: 0.30–0.82)
C. Odds ratio for high blood pressure (high vs. normal): 3.0 (95% CI: 0.90–5.30)
D. Odds ratio for every 50-pound increase in weight: 1.05 (95% CI: 1.01–1.20)

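The statistical half of this exercise follows mechanically from the rule that a 95% CI for a ratio is significant at the .05 level when it excludes 1. The sketch below applies that rule to the four intervals; the dictionary layout and function name are illustrative. Clinical significance is a judgment call and is deliberately not computed.

```python
# (odds ratio, lower 95% limit, upper 95% limit) as given in the exercise
odds_ratios = {
    "A": (1.10, 1.01, 1.19),  # per 1-year increase in age
    "B": (0.50, 0.30, 0.82),  # regular exercise, yes vs. no
    "C": (3.00, 0.90, 5.30),  # high vs. normal blood pressure
    "D": (1.05, 1.01, 1.20),  # per 50-pound increase in weight
}

NULL_VALUE = 1.0  # "no association" for a ratio measure

def is_significant(lower, upper, null_value=NULL_VALUE):
    """True when the interval excludes the null value."""
    return not (lower <= null_value <= upper)

results = {
    label: is_significant(lower, upper)
    for label, (_, lower, upper) in odds_ratios.items()
}

for label, significant in results.items():
    print(label, "statistically significant" if significant else "not significant")
```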
Summary of key points
• A P-value is the probability of obtaining an effect as large as or larger than the observed effect, assuming the null hypothesis is true
  - Provides a measure of strength of evidence against H0
  - Does not provide information on the magnitude of the effect
  - Affected by sample size and magnitude of effect: interpret with caution!
  - Cannot be used in isolation to inform clinical judgment

Summary of key points
• A confidence interval quantifies
  - How confident we are about the true value in the source population
  - Better precision with a large sample size
  - Corresponds to hypothesis testing, but is much more informative than the P-value
• Keep in mind clinical importance when interpreting statistical significance!

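Example: 95% CI for an odds ratio

              Cases   Control   Total
Exposure +     20       10       30
Exposure -      6       24       30

OR = (20 × 24) / (6 × 10) = 8.0

$$\ln(OR) \pm \left[1.96 \times S.E.(\ln OR)\right]$$

$$LCL = e^{\ln(8) - 1.96\sqrt{\frac{1}{20} + \frac{1}{6} + \frac{1}{10} + \frac{1}{24}}}, \qquad UCL = e^{\ln(8) + 1.96\sqrt{\frac{1}{20} + \frac{1}{6} + \frac{1}{10} + \frac{1}{24}}}$$

$$95\%\ CI = \left((8.0)\,e^{-1.96\sqrt{\frac{1}{20} + \frac{1}{6} + \frac{1}{10} + \frac{1}{24}}},\ (8.0)\,e^{+1.96\sqrt{\frac{1}{20} + \frac{1}{6} + \frac{1}{10} + \frac{1}{24}}}\right) = (2.47, 25.9)$$
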
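The odds ratio example can be checked numerically. This is a minimal sketch of the log-scale interval shown above, using the cell counts from the 2×2 table; the function name and argument order (a, b, c, d for exposed cases, exposed controls, unexposed cases, unexposed controls) are illustrative choices.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a CI computed on the log scale.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

odds_ratio, lcl, ucl = odds_ratio_ci(20, 10, 6, 24)
print(f"OR = {odds_ratio:.1f}, 95% CI ({lcl:.2f}, {ucl:.2f})")
```

Because the interval is built on the log scale, it is symmetric around the odds ratio only in ratio terms: the upper limit is as many times larger than 8.0 as the lower limit is smaller.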
