BT Wk5 LectureNotes B

Hypothesis testing is a formal procedure used to compare competing theories, typically involving a Null Hypothesis (H0) and an Alternative Hypothesis (H1). The classical approach calculates a p-value to determine whether to reject H0 based on observed data, while the Bayesian approach incorporates prior probabilities to update beliefs about the hypotheses after observing data. Both methods have their strengths and weaknesses, particularly regarding the interpretation of p-values and the assumptions made in the testing process.

5.4 L5B: Hypothesis Testing

Overview

Hypothesis testing is a formal procedure for comparing competing theories about natural phenomena. It
can be viewed as a key component of the scientific method and, in general, a means of advancing knowledge
and understanding.
The simplest scenario has two competing hypotheses, one labelled the Null Hypothesis and denoted H0 and
the other labelled the Alternative Hypothesis and denoted H1 . In our statistical framework these hypotheses
are typically statements about the possible values of a parameter (or parameters):
H0 : θ ∈ Θ0
H1 : θ ∈ Θ1
The sets defined by the hypotheses are mutually exclusive, Θ0 ∩ Θ1 = ∅, and (usually) exhaustive, i.e., their
union includes the entire parameter space, Θ0 ∪ Θ1 = Θ.

Example. The fraction of a specific variety of potatoes infected by a virus is approximately 0.15. A virus
resistant variety of potatoes will be planted this year and the hope is that the fraction infected will be less
than 0.15. Letting θ denote the probability that a plant is infected, the two competing hypotheses are:
H0 : θ ≥ 0.15
H1 : θ < 0.15

Example. In multiple regression one is interested in knowing if one or more covariates have a linear
relationship with a response variable. For example, is this model, E[Y ]=β0 + β1 x1 + β2 x2 , correct? Two
hypotheses might be:
H0 : β1 = 0
H1 : β1 ≠ 0

Comments

• A Null Hypothesis that includes only a single value for θ is called a Point Null Hypothesis (or Simple
Hypothesis). There are often practical problems with such hypotheses; e.g., β1 = 0.001 would be
contrary to H0 .
• Competing hypotheses can (sometimes) be viewed as competing models about phenomena. The above
hypotheses could be written as:
H0 ≡ M0 is the correct model : E[Y ] = β0 + β2 x2
H1 ≡ M1 is the correct model : E[Y ] = β0 + β1 x1 + β2 x2
And one can easily imagine a larger set of hypotheses or alternative models:
M1 : E[Y ] = β0
M2 : E[Y ] = β0 + β1 x1
M3 : E[Y ] = β0 + β2 x2
M4 : E[Y ] = β0 + β1 x1 + β2 x2

5.4.1 Classical Hypothesis Testing

The classical or frequentist approach to testing two hypotheses is:

• Assume that one hypothesis, H0 , is true.


• Calculate a test statistic based on the observed sample data, T (yobs ) (that should be informative about
H0 and H1 ).
• Conditional on H0 being true, calculate the probability of observing sample data that would yield test
statistics as extreme or more extreme than T (yobs ).
That probability is called the p-value and is formally defined:

p-value = Pr(T (y) as extreme as or more extreme than T (yobs ) | θ, H0 )        (5.9)

where “extremeness” is in the direction of the alternative hypothesis.


• If that probability is
– “sufficiently small”, “Reject H0 ” and “Accept H1 ”.
– “relatively large”, “Do not reject H0 .” (But Do Not Say “Accept H0 .”)

Example A. The sampling model is Normal(θ, σ 2 ), where θ is unknown but σ 2 is known and equals 2
(admittedly seldom realistic). The null hypothesis is that θ is less than or equal to 3 while the alternative
hypothesis is that θ is greater than 3:

H0 : θ ≤ 3
H1 : θ > 3

A random sample of n=10 is taken and the sample average is ȳ = 4. The test statistic is:

T (y) = (ȳ − θ0 ) / √(σ² /n)

where θ0 is a value in the set θ ≤ 3. Note that conditional on H0 , T (y) is Normal(0,1)9 . Given that there
are an infinite number of values in the set Θ0 , the convention is to select the value of θ ∈ Θ0 that would
yield the largest p-value, in this case θ0 = 3, and then the p-value is

Pr(T (y) ≥ T (yobs )) = Pr(T (y) ≥ (4 − 3)/√(2/10)) = Pr(T (y) ≥ 2.236) = 1 − Φ(2.236) = 0.013

where Φ(z) is the cumulative distribution function for a standard normal random variable. Note that
extremeness here is in the direction of H1 , namely, towards values of θ > 3. Such a p-value of 0.013 would
be considered by many to be “sufficiently small”, or statistically significant, and H0 would be rejected.
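This p-value is easy to check numerically. The following is a quick base-Python sketch (the notes otherwise use R); Φ is obtained from the error function, a standard identity for the normal cdf.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    # Phi(z) via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

ybar, theta0, sigma2, n = 4.0, 3.0, 2.0, 10

z = (ybar - theta0) / sqrt(sigma2 / n)   # test statistic T(y)
p_value = 1.0 - std_normal_cdf(z)        # Pr(T(y) >= z | theta0 = 3)

print(round(z, 3), round(p_value, 3))    # z = 2.236, p-value = 0.013
```

The same value is given in R by 1-pnorm(2.236).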

Example B. Two linear models for an expected outcome are proposed, where one model is nested inside
the other model:

M 1 : E[Y ] = β0 + β1 x
M 2 : E[Y ] = β0 + β1 x + β2 x2
9 The test statistic T (y) for this setting is sometimes written z and is called the z-statistic.

Equivalently,

H0 : β2 = 0
H1 : β2 ≠ 0

Assuming normality of Y , the common test statistic is the t-statistic, t = (β̂2 − 0)/std.error(β̂2 ). And
extremeness in this case would be values of t that are relatively far from 0, t << 0 or t >> 0.
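To make the t-statistic concrete, here is a small hand computation for the special case of simple linear regression (comparing M1: E[Y ] = β0 against M2: E[Y ] = β0 + β1 x, so the test is H0 : β1 = 0). The data are made up purely for illustration; with any regression software the same t would come from the coefficient table.

```python
from math import sqrt

# Hypothetical data: y is roughly 2x plus small deviations (illustration only)
x = list(range(10))
y = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9, 12.2, 13.8, 16.1, 17.9]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                        # least-squares slope estimate
b0 = ybar - b1 * xbar                 # intercept estimate
rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
se_b1 = sqrt(rss / (n - 2) / sxx)     # standard error of the slope

t = (b1 - 0) / se_b1                  # large |t| is evidence against H0: b1 = 0
print(round(b1, 3), round(t, 1))
```

Here the slope is close to 2 and |t| is very large, so H0 : β1 = 0 would be rejected.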

Problems with classical hypothesis testing.

1. H0 and H1 must be structured such that “extremeness” in the direction of H1 is definable in order
to calculate the p-value. If one is comparing models that are not nested, “extremeness” is not readily
definable. For example, exponential “growth” versus linear “growth” models:

M 1 : E[Y ] = β0 exp(β1 t)
M 2 : E[Y ] = β0 + β1 t

If H0 is that M 1 is true, and H1 is that M 2 is true, then assuming that H0 is true, what is a measure
of extremeness in the direction of H1 ?
2. The evidence is only against H0 as the p-value is calculated assuming that H0 is true.
• A small p-value indicates that the data are not what would be expected if H0 is true.
• A large p-value, however, does not mean that H0 is true, or that the model implied by H0 is true,
as the calculation is made assuming that H0 is true, so there is no weight of evidence for H0 .
• This is the reason that the frequentist conclusion given a large p-value is to say “fail to reject”
H0 , and Not to say “accept” H0 . You can’t accept something that you assumed was true in the
first place.
3. The p-value itself, e.g., 0.01, does not provide “weight of evidence” for H0 . The p-value is a long-
run relative frequency measure: if H0 were true, only 1% of the time would the observed results, or
more extreme results, occur. The p-value is not the probability that H0 is true.
4. Calculation of p-values involves including values that were not even observed. This violates the
Likelihood Principle10 .

Example C. (This example was discussed previously in Lecture Notes 1.) The sampling model for
the data is Poisson(θ) and there are two hypotheses about θ:

H0 : θ = 1 versus H1 : θ = 2

A sample size n=1 is drawn and yields the value y=2. The standard frequentist approach is to calculate
the p-value: the probability of the observed value and any values in a direction away from H0 in the
direction of H1 . In this case the p-value is Pr(Y ≥ 2|H0 ) = 1- Pr(Y = 0 ∪ Y = 1|θ = 1) = 0.26411 .
Thus, one would not reject H0 .
This procedure is violating the Likelihood Principle, however, in that inference is being based on more
than the likelihood of the data: the probability of events that did not occur, such as Y =3 or Y =4, is
being used as the basis for inference.
10 Reminder from Lecture 1 Notes: The Likelihood Principle says that given a sample of data, y, any two sampling models for

y, say p1 (y|θ) and p2 (y|θ), that have proportional likelihoods yield the same inference for θ. The main point is that inference
for θ depends on the observed y alone, not on unobserved values of y.
11 In R: 1-ppois(q=1,lambda=1)=0.2642411.
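The p-value in Example C can be reproduced directly from the Poisson probability mass function; a base-Python check (equivalent to the R call in the footnote):

```python
from math import exp, factorial

def pois_pmf(y, mu):
    # Poisson probability mass function: e^(-mu) mu^y / y!
    return exp(-mu) * mu ** y / factorial(y)

# Pr(Y >= 2 | theta = 1) = 1 - Pr(Y = 0) - Pr(Y = 1)
p_value = 1.0 - pois_pmf(0, 1.0) - pois_pmf(1, 1.0)
print(round(p_value, 7))   # matches 1-ppois(q=1, lambda=1) in R
```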

5.4.2 Bayesian Hypothesis Testing

Suppose that there are two hypotheses about a parameter θ:

H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1

where Θ0 ∪ Θ1 is the entire parameter space and Θ0 ∩ Θ1 = ∅.


The Bayesian approach is to specify prior probabilities for each hypothesis:

p0 = Pr(H0 is true) = Pr(θ ∈ Θ0 )


p1 = Pr(H1 is true) = Pr(θ ∈ Θ1 )

where p0 + p1 = 1.
Then data, y, are collected and the posterior probabilities for each hypothesis are calculated:

Pr(H0 |y) = Pr(θ ∈ Θ0 |y)

And Pr(H1 |y) = 1- Pr(H0 |y).


The complexity of the calculation of the posterior probability is affected by the nature of the hypotheses,
i.e., simple or composite.

Simple

Simple hypotheses have single parameter values:

H0 : θ = θ0 versus H1 : θ = θ 1

Then

Pr(H0 |y) = Pr(θ = θ0 |y) = f (y|θ0 )p0 / m(y) = f (y|θ0 )p0 / [f (y|θ0 )p0 + f (y|θ1 )p1 ]

Pr(H1 |y) = Pr(θ = θ1 |y) = f (y|θ1 )p1 / m(y) = f (y|θ1 )p1 / [f (y|θ0 )p0 + f (y|θ1 )p1 ]
Given that Pr(H0 |y) + Pr(H1 |y) = 1, Pr(H1 |y) is simply 1 − Pr(H0 |y).
Note that to calculate posterior odds the normalizing constant m(y) need not be calculated:

Pr(H0 |y) / Pr(H1 |y) = f (y|θ0 )p0 / [f (y|θ1 )p1 ]

Composite

Composite hypotheses include sets of parameter values:

H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1

Letting π(θ) be the prior probability over the entire parameter space, the prior probability for Hypothesis i
is

pi = ∫_{θ∈Θi} π(θ) dθ

Thus the prior distribution for the parameter θ, π(θ), is inducing the prior for the hypothesis, pi 12 .
Now

Pr(Hi |y) = p(y, Hi )/m(y) = pi p(y|Hi )/m(y) = pi ∫ p(y, θ|Hi ) dθ / m(y) = pi ∫ f (y|θ)π(θ|Hi ) dθ / m(y)

          = pi ∫_{θ∈Θi} f (y|θ) [π(θ)/pi ] dθ / m(y) = ∫_{θ∈Θi} f (y|θ)π(θ) dθ / m(y) = ∫_{θ∈Θi} p(θ|y) dθ = Pr(θ ∈ Θi |y)

The key step in the above is the equality between integrating π(θ|Hi ) over the entire parameter space Θ and
integrating π(θ)/pi over the reduced parameter space Θi (since π(θ|Hi ) equals π(θ)/pi on Θi and 0 elsewhere).
Stated most plainly, however, the posterior probability of Hi is simply the integral of the posterior for θ over
Θi .
The posterior odds of H0 against H1 can be written:

Pr(H0 |y) / Pr(H1 |y) = ∫_{θ∈Θ0} f (y|θ)π(θ)dθ / ∫_{θ∈Θ1} f (y|θ)π(θ)dθ = Pr(θ ∈ Θ0 |y) / Pr(θ ∈ Θ1 |y)
                      = Pr(θ ∈ Θ0 |y) / [1 − Pr(θ ∈ Θ0 |y)]

Remarks

• Multiple Hypotheses. Multiple hypotheses can be handled similarly. The different hypotheses could
correspond to different sets of models: M1 , . . ., MK :

Hi : The correct model is model Mi


One would assign priors to each hypothesis, p(Hi ), where Σ_{i=1}^{K} p(Hi ) = 1. Then the posterior
probability for model i:

Pr(Hi |y) = Pr(Hi , y) / Pr(y) = Pr(Hi , y) / Σ_{j=1}^{K} Pr(Hj , y)

where the form of Pr(Hi , y) would depend upon whether Hi was simple or composite.
• Computational difficulties. For composite hypotheses, the integration needed to calculate Pr(θ ∈
Θi |y) may not be analytically tractable.

5.4.3 Bayes Factors

An alternative to calculating posterior probabilities for the hypotheses is Bayes factors. A Bayes factor is
the ratio of posterior odds to prior odds. The prior odds for H0 against H1 is the ratio p0 /p1 . E.g., if
p0 =0.6 and p1 =0.4, then 0.6/0.4 = 1.5 are the prior odds. The posterior odds for H0 against H1 is the ratio
Pr(H0 |y)/ Pr(H1 |y). The Bayes Factor for H0 against H1 , which is written BF01 , is

BF01 = [Pr(H0 |y)/ Pr(H1 |y)] / (p0 /p1 ) = [Pr(θ ∈ Θ0 |y)/ Pr(θ ∈ Θ1 |y)] / (p0 /p1 )        (5.10)

Rules of thumb for interpreting Bayes Factors are given by Kass and Raftery (Journal of the American
Statistical Association, 90(430), 1995):
12 Note: one can specify a prior for the hypothesis independent of the prior for θ; e.g., simply state that p0 =0.3 regardless of
the π(θ).

BF01 Interpretation
<3 No evidence for H0 over H1
>3 Positive evidence for H0
> 20 Strong evidence for H0
> 150 Very strong evidence for H0

Note: BF10 = 1/BF01 . And

• BF01 < 1/3 ⇒ BF10 > 3 ⇒ positive evidence for H1
• BF01 < 1/20 ⇒ BF10 > 20 ⇒ strong evidence for H1

Simple vs Simple

H0 : θ = θ0 vs H1 : θ = θ1 .

BF01 = [Pr(H0 |y)/ Pr(H1 |y)] / (p0 /p1 ) = [f (y|θ0 )p0 /f (y|θ1 )p1 ] / (p0 /p1 ) = f (y|θ0 )/f (y|θ1 )        (5.11)

Thus the Bayes Factor is simply the ratio of the likelihoods, and the priors for the hypotheses are irrelevant.

Example C (continued). The sampling distribution for the data is Poisson(θ) and H0 : θ = 1 and
H1 : θ = 2. The prior for H0 is p0 =0.8, thus p1 =0.2. A single observation, n = 1, is observed with y = 2.
Then BF01 = (e^{−1} 1^2 /2!)/(e^{−2} 2^2 /2!) = e/4 = 0.6796, and BF10 = 1.4715. Thus there is no
evidence for H0 over H1 , or for H1 over H0 .
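The likelihood ratio above can be confirmed numerically; a base-Python check of Example C's Bayes Factor:

```python
from math import exp, factorial

def pois_pmf(y, mu):
    # Poisson probability mass function
    return exp(-mu) * mu ** y / factorial(y)

y = 2
# simple-vs-simple Bayes Factor: ratio of likelihoods (Eq'n 5.11)
bf01 = pois_pmf(y, 1.0) / pois_pmf(y, 2.0)
bf10 = 1.0 / bf01
print(round(bf01, 4), round(bf10, 4))   # 0.6796, 1.4715
```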

5.4.4 Composite vs Composite

H0 : θ ∈ Θ0 vs H1 : θ ∈ Θ1 ; Θ0 ∪ Θ1 = Θ.

BF01 = [Pr(H0 |y)/ Pr(H1 |y)] / (p0 /p1 )
     = {[∫_{θ∈Θ0} f (y|θ)π(θ)dθ] / [∫_{θ∈Θ1} f (y|θ)π(θ)dθ]} / (p0 /p1 )
     = [Pr(θ ∈ Θ0 |y)/ Pr(θ ∈ Θ1 |y)] / (p0 /p1 )        (5.12)

Simple vs Composite

H0 : θ = θ0 vs H1 : θ ≠ θ0 .

BF01 = [Pr(H0 |y)/ Pr(H1 |y)] / (p0 /p1 )
     = {[f (y|θ0 )p0 ] / [p1 ∫_{−∞}^{∞} f (y|θ)π(θ)dθ]} / (p0 /p1 )
     = f (y|θ0 ) / ∫_{−∞}^{∞} f (y|θ)π(θ)dθ = f (y|θ0 )/m(y)        (5.13)

Notes:

• As in the simple versus simple case, the prior probabilities for the hypotheses cancel out.

5.4.5 Example D: Simple Null and Simple Alternative

The number of hairs per square inch of mohair fabric used by a teddy bear manufacturer is assumed to have
a Poisson(θ) distribution (King and Ross, 2017). The manufacturer wants to test the hypotheses:

H0 : θ = 100; H1 : θ = 110

To test these hypotheses an independent random sample of n pieces of fabric is drawn and the number of
hairs per square inch, y = y1 , . . . , yn , is recorded.

1. The Bayes Factor, BF01 :

BF01 = [Pr(H0 |y)/ Pr(H1 |y)] / [Pr(H0 )/ Pr(H1 )] = [Pr(y|H0 ) Pr(H0 )/ Pr(y|H1 ) Pr(H1 )] / [Pr(H0 )/ Pr(H1 )]

     = Pr(y|H0 )/ Pr(y|H1 ) = [exp(−100 ∗ n)100^{nȳ} ] / [exp(−110 ∗ n)110^{nȳ} ] = exp(10n) (100/110)^{nȳ}

2. Given n=10 and ȳ = 102.7:

BF01 = exp(10 ∗ 10) (100/110)^{10∗102.7} = 8.301576

which is between 3 and 20, thus “positive evidence” for H0 .


3. Assume the priors for H0 and H1 were p0 = p1 = 0.5. The posterior probabilities for each hypothesis can
be calculated directly from the Bayes Factor as follows:

Pr(H0 |y) = BF01 /(1 + BF01 ) = 8.301576/(1 + 8.301576) = 0.8924913

Exercise: show why the above formula works.
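Working on the log scale avoids overflow in exp(100); a base-Python check of steps 1-3:

```python
from math import exp, log

n, ybar = 10, 102.7
p0 = p1 = 0.5

# log Bayes Factor: log BF01 = 10n + n*ybar*log(100/110)
log_bf01 = 10 * n + n * ybar * log(100.0 / 110.0)
bf01 = exp(log_bf01)

# with any priors, posterior Pr(H0|y) = BF01*p0 / (BF01*p0 + p1);
# with equal priors this reduces to BF01 / (1 + BF01)
post_h0 = (bf01 * p0) / (bf01 * p0 + p1)
print(round(bf01, 6), round(post_h0, 7))
```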

Exercise: Given yi , i=1,. . .,10 are independent Normal(µ,1) random variables, with observed data:

3.4, 2.9, 3.0, 3.5, 3.3, 3.7, 2.7, 3.9, 2.7, 2.9

Test the simple hypotheses H0 : µ = 3 vs H1 : µ = 3.5. Show that the Bayes Factor is BF01 = 1.28.

5.4.6 Example E: Composite Null and Composite Alternative

A food manufacturer is considering releasing a new flavour of hummus, but before doing so wants to carry
out an experiment with volunteers to see whether this new flavour is liked better than a competitor’s version
(based on example from Carlin and Louis, 2009). They would like to be “pretty sure” that the new flavour
is preferred by at least 60% of hummus consumers. Letting θ be the probability that the new flavour is
preferred, there are two hypotheses:

H0 : θ ≥ 0.6 vs H1 : θ < 0.6

If there is strong evidence for H0 , they will release the new flavour. The manufacturer would prefer to be
cautious and selects a Beta prior for θ that has an expected value of 0.5 and a coefficient of variation of 0.3
(thus a standard deviation of 0.5 × 0.3 = 0.15). That translates into a Beta(5.056, 5.056) prior.

The induced prior probability for H0 is then13 :

p0 = ∫_{0.6}^{1} [1/Be(5.056, 5.056)] θ^{4.056} (1 − θ)^{4.056} dθ
   = ∫_{0.6}^{1} [Γ(10.112)/(Γ(5.056)Γ(5.056))] θ^{4.056} (1 − θ)^{4.056} dθ = 0.265

Thus p1 = 0.735.
To test these hypotheses, a taste preference study is carried out with n=16 volunteers. How would you
recommend that such a study be carried out?
Assume that the probability of preferring the new flavour is the same for all volunteers and the responses
are independent. Then, letting y be the number preferring the new flavour, y ∼ Binomial(16, θ). After the
study was completed, 13 of the 16 volunteers preferred the new flavour. What are the posterior probabilities
for H0 and H1 ? And what is BF01 ?
To begin, note that Pr(H0 |y) is the same as Pr(θ ≥ 0.6|y). We know that the Beta distribution is conjugate
for the Binomial distribution and the posterior is Beta(α + y, β + n − y), or in this case, Beta(5.056+13,
5.056+16−13) = Beta(18.056, 8.056). Therefore:

Pr(H0 |y = 13) = ∫_{0.6}^{1} [1/Be(18.056, 8.056)] θ^{18.056−1} (1 − θ)^{8.056−1} dθ = 0.8448
Pr(H1 |y = 13) = 1 − Pr(H0 |13) = 0.1552

Note: the R code for Pr(θ ≥ 0.6|y) is 1-pbeta(0.6,18.056,8.056) = 0.8447625. And the Bayes Factor:

BF01 = [Pr(H0 |y = 13)/ Pr(H1 |y = 13)] / [Pr(H0 )/ Pr(H1 )] = (0.8448/0.1552) / (0.265/0.735) = 15.1

which is between 3 and 20, thus “positive evidence” for H0 .
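The R code at the end of these notes does this calculation with pbeta. As a cross-check, the Beta tail probabilities can also be obtained by numerically integrating the density, here sketched in base Python (trapezoid rule; adequate because the integrand is smooth):

```python
from math import exp, lgamma, log

def beta_pdf(x, a, b):
    # Beta(a, b) density, via log-gamma for numerical stability
    if x <= 0.0 or x >= 1.0:
        return 0.0          # endpoint density is 0 here since a, b > 1
    logc = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(logc + (a - 1) * log(x) + (b - 1) * log(1 - x))

def beta_tail(a, b, lo, n=20000):
    # Pr(theta >= lo) for Beta(a, b): composite trapezoid rule on [lo, 1]
    h = (1.0 - lo) / n
    total = 0.5 * (beta_pdf(lo, a, b) + beta_pdf(1.0, a, b))
    total += sum(beta_pdf(lo + i * h, a, b) for i in range(1, n))
    return h * total

p0 = beta_tail(5.056, 5.056, 0.6)       # induced prior Pr(H0)
post = beta_tail(18.056, 8.056, 0.6)    # posterior Pr(H0 | y = 13)
bf01 = (post / (1 - post)) / (p0 / (1 - p0))
print(round(p0, 3), round(post, 4), round(bf01, 2))
```

This reproduces the first row of Table 5.1 (small differences in the last digit reflect rounding in the text).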


To evaluate the sensitivity of the resulting posterior probabilities and the Bayes Factor, three other priors
for θ were considered: Beta(0.5,0.5) or Jeffreys’ prior, Beta(1,1) or a Uniform(0,1), and Beta(2,2). The four
prior densities are shown in Figure 5.8.
The posterior quantiles and means for θ, given y=13 for n=16, as well as Pr(H0 ) and BF01 are shown in
Table 5.1. The resulting posterior distributions are shown in Figure 5.9. As can be seen, the initial prior is
the most skeptical regarding the preference for the new hummus.

Table 5.1: Numerical summaries of posterior quantities for taste preference study.

π(θ)                 p0     Q0.25  Q0.50  Q0.75  Mean   Pr(H0|y)  BF01
Beta(5.056, 5.056)   0.265  0.63   0.70   0.76   0.69   0.84      15.06
Beta(0.5, 0.5)       0.436  0.74   0.81   0.87   0.79   0.96      34.43
Beta(1, 1)           0.400  0.72   0.79   0.85   0.78   0.95      30.81
Beta(2, 2)           0.352  0.69   0.76   0.82   0.75   0.93      24.60

13 In R: 1-pbeta(0.6,5.056,5.056)=0.265.

Figure 5.8: Four prior distributions for θ in the hummus taste preference study. The vertical line at 0.6
marks the division between H0 and H1 .

[Plot: the four prior densities, Beta(5.056, 5.056), Beta(0.5, 0.5), Beta(1, 1), and Beta(2, 2), shown as
overlaid curves on θ ∈ (0, 1).]

Figure 5.9: Four posterior distributions for θ in the hummus taste preference study given y=13 in n=16
trials. The vertical line at 0.6 marks the division between H0 and H1 .

[Plot: the four posterior densities, Beta(18.056, 8.056), Beta(13.5, 3.5), Beta(14, 4), and Beta(15, 5), shown
as overlaid curves on θ ∈ (0, 1).]

5.4.7 Example F: Simple Null and Composite Alternative

The sampling distribution is Poisson(θ). The null hypothesis is H0 : θ = 5 and the alternative is H1 : θ ≠ 5,
where p0 =0.7. A Gamma prior distribution is chosen for θ such that E[θ]=5 with a CV of 0.1, thus a
Gamma(100, 20).
A random sample of n=8 is drawn yielding the following values

3, 3, 3, 3, 5, 7, 7, 4
Note: ȳ = 4.375, and θ|y ∼ Gamma(100 + Σ_{i=1}^{8} yi , 20 + n) = Gamma(135, 28).
To find the posterior probabilities:

Pr(H0 |y) = Pr(H0 , y)/m(y) ∝ p0 ∏_{i=1}^{8} e^{−5} 5^{yi} /yi ! = 0.7 ∗ e^{−40} 5^{35} / ∏_{i=1}^{8} yi ! = 9.12873e−08

Pr(H1 |y) = Pr(H1 , y)/m(y) ∝ p1 ∫_{0}^{∞} [e^{−8θ} θ^{35} / ∏_{i=1}^{8} yi !] [20^{100} /Γ(100)] θ^{100−1} e^{−20θ} dθ

          = 0.3 ∗ [1/∏_{i=1}^{8} yi !] [20^{100} /Γ(100)] [Γ(135)/28^{135} ] = 3.684845e−08

Then

Pr(H0 |y) = 9.12873e−08 / (9.12873e−08 + 3.684845e−08) = 0.7124
Pr(H1 |y) = 3.684845e−08 / (9.12873e−08 + 3.684845e−08) = 0.2876
And the Bayes Factor14 for H0 against H1 :

BF01 = (0.7124/0.2876) / (0.7/0.3) = 1.0617

which implies no evidence for H0 over H1 or vice versa.
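The Gamma integral above has a closed form, so the whole calculation can be checked on the log scale in a few lines of base Python:

```python
from math import exp, factorial, lgamma, log

y = [3, 3, 3, 3, 5, 7, 7, 4]
n, s = len(y), sum(y)                      # n = 8, sum(y) = 35
log_yfact = sum(log(factorial(v)) for v in y)

# f(y | H0): product of Poisson(5) probabilities
log_f0 = -5.0 * n + s * log(5.0) - log_yfact

# m(y): Poisson likelihood integrated against the Gamma(100, 20) prior,
# using the Gamma integral: int theta^(a+s-1) e^(-(b+n)theta) dtheta
#                           = Gamma(a+s) / (b+n)^(a+s)
a, b = 100.0, 20.0
log_m = (-log_yfact + a * log(b) - lgamma(a)
         + lgamma(a + s) - (a + s) * log(b + n))

p0, p1 = 0.7, 0.3
num0, num1 = p0 * exp(log_f0), p1 * exp(log_m)
post_h0 = num0 / (num0 + num1)
bf01 = exp(log_f0 - log_m)                 # right-most term of Eq'n 5.13
print(round(post_h0, 4), round(bf01, 4))
```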

5.4.8 Multiple Hypotheses

As said previously, multiple models can be viewed as multiple hypotheses. From an example by Lavine15 ,
a primary (elementary) school in Fresno, California had two high-voltage transmission lines nearby and
the cancer rate amongst staff was a concern as 8 of the 145 staff had developed invasive cancers. Assume
independence between staff and identical probabilities for cancer. Let y denote the number developing cancer
and θ the probability of cancer. Then y ∼ Binomial(n=145, θ) is the sampling model.
Based on data collected at a national level (for approximately the same age of the staff, mostly women, and
number of years of working), the expected number of cancers for 145 staff was estimated to be 4.2. Translating
that into a probability, one hypothesis was that θ=4.2/145 ≈ 0.03. However, different individuals thought
the rate was higher and three alternative hypotheses were postulated:

H1 : θ = 0.03, H2 : θ = 0.04, H3 : θ = 0.05, H4 : θ = 0.06


14 Note: a simpler calculation, based on the right-most term in Eq’n 5.13, is BF01 = f (y|H0 )/m(y), where
f (y|H0 ) = e^{−40} 5^{35} / ∏_{i=1}^{8} yi ! = 1.304104e−07 and
m(y) = [1/∏_{i=1}^{8} yi !] [20^{100} /Γ(100)] [Γ(135)/28^{135} ] = 1.228282e−07, and BF01 = 1.0617.
15 “What is Bayesian statistics and why everything else is wrong”.

These four hypotheses can be viewed as 4 models. Lavine proposed that a priori, H1 was as likely to be
right as it was to be wrong, thus the prior for H1 was Pr(H1 ) = 1/2. Then he assumed that any of the
remaining hypotheses was equally likely, thus Pr(H2 ) = Pr(H3 ) = Pr(H4 ) = 1/6. The posterior probabilities
for the four hypotheses can be viewed as the relative weight of evidence for the competing theories:

Pr(H1 |y = 8) = Pr(y = 8|H1 ) Pr(H1 ) / Σ_{i=1}^{4} Pr(y = 8|Hi ) Pr(Hi )

= 0.03^8 ∗ 0.97^{137} ∗ (1/2) / [0.03^8 ∗ 0.97^{137} ∗ (1/2) + 0.04^8 ∗ 0.96^{137} ∗ (1/6)
  + 0.05^8 ∗ 0.95^{137} ∗ (1/6) + 0.06^8 ∗ 0.94^{137} ∗ (1/6)] = 0.23

(the binomial coefficient C(145, 8) is common to every term and cancels.)

Repeating for H2 , H3 , and H4 :

Pr(H1 |y = 8) = 0.23, Pr(H2 |y = 8) = 0.21, Pr(H3 |y = 8) = 0.28, Pr(H4 |y = 8) = 0.28

Thus, one could conclude that given the data and the priors, each of the four hypotheses is about equally
likely. Or that the weight of evidence for each model is about the same. The posterior odds that the
cancer rate is higher than the national average, i.e., the posterior odds of H2 or H3 or H4 against H1 , is
(0.21+0.28+0.28)/0.23 = 3.3. Given that the prior odds of H2 or H3 or H4 against H1 are 1, this is also the
Bayes Factor, and by the Kass and Raftery criteria this is just above the “positive evidence” lower bound of
3.
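The four posterior probabilities are straightforward to compute; a base-Python sketch (because C(145, 8) cancels, only θ^8 (1 − θ)^137 is needed; the second decimal place may differ slightly from the rounded values quoted above):

```python
# Posterior probabilities for the four cancer-rate hypotheses
thetas = [0.03, 0.04, 0.05, 0.06]
priors = [1/2, 1/6, 1/6, 1/6]

# unnormalized posterior weights: likelihood (up to C(145, 8)) times prior
weights = [t**8 * (1 - t)**137 * p for t, p in zip(thetas, priors)]
posts = [w / sum(weights) for w in weights]

# posterior odds of H2 or H3 or H4 against H1
odds = sum(posts[1:]) / posts[0]
print([round(p, 2) for p in posts], round(odds, 2))
```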

Contrast with Frequentist Approach. Lavine also carried out the frequentist analysis of H0 : θ = 0.03
against the alternative H1 : θ > 0.03. The p-value is the probability of observing an outcome equal to what
was observed, 8 occurrences of cancer in 145 staff, and anything more extreme in the direction of H1 16 :

Pr(Y ≥ 8|θ = 0.03) = Pr(Y = 8|θ = 0.03) + Pr(Y = 9|θ = 0.03) + . . . + Pr(Y = 145|θ = 0.03)
                   = 1 − Pr(Y < 8|θ = 0.03, n = 145) = 0.0717

This would be considered “significant” evidence against H0 if the cut-off was 0.10. However, as Lavine points
out, this p-value does not account for how well the other hypotheses explain the data, it is based on
information about things that did not happen (e.g., there were Not 9, nor 10, nor 11, and so on, incidences
of cancer), and the Likelihood Principle is not obeyed.

16 This can be calculated in R by 1-pbinom(7,size=145,prob=0.03).
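The binomial tail probability can be checked in base Python (equivalent to the R call in the footnote):

```python
from math import comb

def binom_pmf(k, n, p):
    # binomial probability mass function: C(n, k) p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Pr(Y >= 8 | theta = 0.03, n = 145) = 1 - Pr(Y <= 7)
n, theta = 145, 0.03
p_value = 1.0 - sum(binom_pmf(k, n, theta) for k in range(8))
print(round(p_value, 4))   # matches 1-pbinom(7, size=145, prob=0.03) in R
```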

R code

Taste preference code.


Code to produce the prior pdf plots in Figure 5.8.

# 4 sets of priors for theta


prior.a.set <- c(5.056,0.5,1,2)
prior.b.set <- c(5.056,0.5,1,2)
n <- 16
y <- 13
post.a.set <- prior.a.set + y
post.b.set <- prior.b.set + n-y

#- plot of prior distributions


theta.seq <- seq(0,1,by=0.01)
my.lwd <- 1.5
plot(theta.seq, dbeta(theta.seq, prior.a.set[1], prior.b.set[1]), type="l",
     xlab=expression(theta), ylab="", col=1, lty=1, xlim=c(0,1), lwd=my.lwd)
for(j in 2:4) {
  lines(theta.seq, dbeta(theta.seq, prior.a.set[j], prior.b.set[j]),
        col=j, lty=j, lwd=my.lwd)
}
abline(v=0.6, col="purple")
legend("topleft", legend=paste0("Beta(", prior.a.set, ", ", prior.b.set, ")"),
       lty=1:4, col=1:4, lwd=my.lwd)

Calculation of posterior quantiles, mean, Pr(H0 |y), and BF01 .

#--- Calculation of quantiles, mean, Pr(Ho) and BF(0,1)


out.mat <- matrix(data=NA,nrow=4,ncol=6)
dimnames(out.mat) <- list(paste0("Beta(",prior.a.set,", ",prior.b.set,")"),
c("0.25","0.50","0.75","Mean","Pr(Ho|y)","BF(0,1)"))
for(i in 1:4) {
  out.mat[i,1:3] <- qbeta(c(0.25,0.50,0.75), post.a.set[i], post.b.set[i])
  out.mat[i,"Mean"] <- post.a.set[i]/(post.a.set[i]+post.b.set[i])
  p.Ho.y <- 1-pbeta(0.6, post.a.set[i], post.b.set[i])
  p.Ho <- 1-pbeta(0.6, prior.a.set[i], prior.b.set[i])
  out.mat[i,"Pr(Ho|y)"] <- p.Ho.y
  out.mat[i,"BF(0,1)"] <- (p.Ho.y/(1-p.Ho.y))/(p.Ho/(1-p.Ho))
}
print(round(out.mat,3))

And the plots of the posterior distributions:

plot(theta.seq, dbeta(theta.seq, post.a.set[1], post.b.set[1]), type="l",
     xlab=expression(theta), ylab="", col=1, lty=1, xlim=c(0,1), lwd=my.lwd)
for(j in 2:4) {
  lines(theta.seq, dbeta(theta.seq, post.a.set[j], post.b.set[j]),
        col=j, lty=j, lwd=my.lwd)
}
abline(v=0.6, col="purple")
legend("topleft", legend=paste0("Beta(", post.a.set, ", ", post.b.set, ")"),
       lty=1:4, col=1:4, lwd=my.lwd)
