Problem Sheet 1
Section A questions
1. (a) Consider tossing a drawing pin [see figure at end]. Define the result of a toss to
be “heads” if the point lands downwards, and “tails” otherwise. Write p for the
probability that a toss will land point downwards. Think about p, and choose a, b,
so that a Beta(a, b) prior distribution approximates your subjective prior distribution
for p. [I used a = 2 and b = 3 but you may differ.]
(b) Now collect data. Toss a drawing pin 100 times and keep track of the number of
heads after 10, 50, and 100 tosses. You may find the result depends on the surface
you use. [I got 4, 16 and 26 heads after 10, 50 and 100 tosses.]
(c) Ask someone else what prior they chose. Think of your respective priors as hypotheses
about p. Whose beliefs were better supported by the data? Compute a Bayes factor
comparing your priors. [For me the other person used a = 3 and b = 2.]
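As an illustration (not part of the question), here is a minimal Python/scipy sketch of the Bayes factor calculation via Beta-binomial marginal likelihoods; the counts (4 heads in 10 tosses) and the two priors Beta(2, 3) and Beta(3, 2) are simply the example values quoted above, and the function name is mine.

import numpy as np
from scipy.special import betaln, comb

def log_marginal(x, n, a, b):
    # log P(x heads in n tosses) under a Beta(a, b) prior for p:
    # log C(n, x) + log B(a + x, b + n - x) - log B(a, b)
    return np.log(comb(n, x)) + betaln(a + x, b + n - x) - betaln(a, b)

n, x = 10, 4                   # example counts quoted in the question text
a1, b1 = 2, 3                  # "my" prior
a2, b2 = 3, 2                  # the other person's prior
log_bf = log_marginal(x, n, a1, b1) - log_marginal(x, n, a2, b2)
print("Bayes factor, prior 1 vs prior 2:", np.exp(log_bf))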
(d) Estimate a 95% HPD credible interval for p for each of the two priors you are con-
sidering, for the case when n = 10 trials. Write down the posterior averaged over
models, stating any assumptions you make, and estimate a 95% HPD credible interval
for p from the model averaged posterior.
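A minimal grid-based sketch for the HPD intervals follows, assuming (an assumption to be stated in the answer) equal prior probabilities 1/2 for the two priors when averaging over models; the grid resolution and helper names are arbitrary choices.

import numpy as np
from scipy.stats import beta
from scipy.special import betaln

n, x = 10, 4                                  # example counts quoted above
priors = [(2, 3), (3, 2)]

# posterior model weights, proportional to the Beta-binomial marginal likelihoods
logm = np.array([betaln(a + x, b + n - x) - betaln(a, b) for a, b in priors])
w = np.exp(logm - np.logaddexp(*logm))

grid = np.linspace(0, 1, 20001)

def hpd(dens, level=0.95):
    # keep the highest-density grid points until they cover `level` of the mass
    order = np.argsort(dens)[::-1]
    mass = np.cumsum(dens[order]) / dens.sum()
    keep = order[: np.searchsorted(mass, level) + 1]
    return grid[keep].min(), grid[keep].max()

for a, b in priors:
    print((a, b), hpd(beta.pdf(grid, a + x, b + n - x)))

mix = (w[0] * beta.pdf(grid, priors[0][0] + x, priors[0][1] + n - x)
       + w[1] * beta.pdf(grid, priors[1][0] + x, priors[1][1] + n - x))
print("model averaged", hpd(mix))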
2. (a) Specify a Metropolis-Hastings Markov chain Monte Carlo algorithm targeting p(x|θ),
where x ∈ {0, 1, ..., n} and

p(x|θ) = (n choose x) θ^x (1 − θ)^(n−x).
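One possible answer sketch in Python is given below; the ±1 random-walk proposal and the particular n and θ are illustrative choices, not forced by the question.

import numpy as np
from scipy.stats import binom

def mh_binomial(n=20, theta=0.3, iters=10_000, seed=1):
    rng = np.random.default_rng(seed)
    x = n // 2                            # arbitrary starting state
    out = np.empty(iters, dtype=int)
    for t in range(iters):
        xp = x + rng.choice([-1, 1])      # symmetric +/-1 random-walk proposal
        if 0 <= xp <= n:                  # proposals outside {0,...,n} are rejected
            a = binom.pmf(xp, n, theta) / binom.pmf(x, n, theta)
            if rng.uniform() < a:
                x = xp
        out[t] = x
    return out

samples = mh_binomial()   # the empirical pmf of `samples` should match Binomial(20, 0.3)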
4. Let Γ(x; α, β) be the Gamma density. Consider Poisson observations Y = (Y1 , Y2 , ..., Yn )
with means λ = (λ1 , λ2 , ..., λn ) given by a mixture of Gamma densities: for shape pa-
rameters α1 , α2 and rate parameters β1 , β2 , a known mixture proportion 0 < p < 1 and
i = 1, 2, ..., n, we observe
Yi |λi ∼ Poisson(λi )
(all iid) with
λi ∼ pΓ(λi ; α1 , β1 ) + (1 − p)Γ(λi ; α2 , β2 ).
(a) Denote by π(α1 , β1 , α2 , β2 ) a prior for the unknown shape and rate parameters. Write
down the joint posterior for α1 , β1 , α2 , β2 and λ given Y1 , Y2 , ..., Yn . Give an MCMC
algorithm sampling α1 , β1 , α2 , β2 , λ|Y1 , ..., Yn .
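A rough Metropolis-within-Gibbs sketch for part (a) is given below; the Exp(1) priors on (α1, β1, α2, β2), the log-scale random-walk proposals, the step size and the helper names are all illustrative assumptions, not part of the question.

import numpy as np
from scipy.stats import gamma, poisson

def log_mix_dens(lam, a1, b1, a2, b2, p):
    # log of p*Gamma(lam; a1, b1) + (1 - p)*Gamma(lam; a2, b2), rate parameterisation
    return np.logaddexp(np.log(p) + gamma.logpdf(lam, a1, scale=1.0 / b1),
                        np.log(1.0 - p) + gamma.logpdf(lam, a2, scale=1.0 / b2))

def mwg(y, p, iters=5000, step=0.3, seed=0):
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    n = len(y)
    lam = np.maximum(y, 0.5).astype(float)   # crude initialisation of the latent means
    pars = np.ones(4)                        # (a1, b1, a2, b2), Exp(1) priors assumed
    keep = []
    for t in range(iters):
        # update each lambda_i with a log-scale random-walk Metropolis step
        prop = lam * np.exp(step * rng.standard_normal(n))
        logr = (poisson.logpmf(y, prop) + log_mix_dens(prop, *pars, p) + np.log(prop)
                - poisson.logpmf(y, lam) - log_mix_dens(lam, *pars, p) - np.log(lam))
        acc = np.log(rng.uniform(size=n)) < logr
        lam[acc] = prop[acc]
        # update each of (a1, b1, a2, b2) in turn, same proposal, Exp(1) prior term
        for j in range(4):
            cand = pars.copy()
            cand[j] *= np.exp(step * rng.standard_normal())
            logr = (log_mix_dens(lam, *cand, p).sum() - cand[j] + np.log(cand[j])
                    - log_mix_dens(lam, *pars, p).sum() + pars[j] - np.log(pars[j]))
            if np.log(rng.uniform()) < logr:
                pars = cand
        keep.append((pars.copy(), lam.copy()))
    return keep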
(b) Integrate λ out of the joint posterior to obtain a marginal posterior density for
α1 , β1 , α2 , β2 |Y1 , ..., Yn . Comment briefly on how you would alter your MCMC algo-
rithm for the new target. What considerations would guide your choice of simulation
method (ie, whether to simulate the joint or the marginal posterior density)?
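For the integration in part (b), the following standard Gamma-Poisson identity (stated here in LaTeX as a worked step, rate parameterisation, not as the required final answer) may be useful; each Yi then has a two-component negative-binomial marginal.

% Gamma-Poisson marginal used when integrating out lambda_i
\int_0^\infty \frac{\lambda^{y} e^{-\lambda}}{y!}\,
  \frac{\beta^{\alpha}\lambda^{\alpha-1}e^{-\beta\lambda}}{\Gamma(\alpha)}\, d\lambda
  = \frac{\Gamma(y+\alpha)}{y!\,\Gamma(\alpha)}
    \left(\frac{\beta}{\beta+1}\right)^{\alpha}
    \left(\frac{1}{\beta+1}\right)^{y}.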
(e) Give a Gibbs sampler sampling π(θ|y) (Hint: π(θ, z|y) would be easier).
6. Let π(θ), θ ∈ R be a prior density for a scalar parameter, let p(y|θ), y ∈ Rn be the
observation model density and let π(θ|y) ∝ π(θ)p(y|θ) be the posterior density. Consider
a Markov chain simulated in the following way. Suppose θ(0) ∼ π(·) is a draw from the
prior and for t = 0, 1, 2, ... we generate a Markov chain by simulating data y (t) ∼ p(·|θ(t) )
and then θ(t+1) ∼ π(·|y (t) ).
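As a concrete sanity check of this construction (not required by the question), here is a short Python sketch using a conjugate Beta-Binomial pair, so both simulation steps are exact draws; the Beta(2, 2) prior and n = 20 are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
a, b, n, iters = 2.0, 2.0, 20, 100_000
theta = rng.beta(a, b)                   # theta^(0) ~ prior
draws = np.empty(iters)
for t in range(iters):
    y = rng.binomial(n, theta)           # y^(t) ~ p(.|theta^(t))
    theta = rng.beta(a + y, b + n - y)   # theta^(t+1) ~ pi(.|y^(t))
    draws[t] = theta
print(draws.mean(), draws.var())         # compare with the prior mean 0.5 and variance 0.05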
(a) i. Calculate the joint density, p(θ(0) , θ(1) ) say, for θ(0) , θ(1) and show that p(θ(0) , θ(1) ) =
p(θ(1) , θ(0) ) (ie they are exchangeable).
ii. Show that marginally, θ(t) ∼ π(·) for all t = 0, 1, 2, ...
iii. Give the transition probability density K(θ, θ′) for the chain and show the chain
is reversible with respect to the prior π(θ).
(b) Suppose we are given an MCMC algorithm θ(T) = M(θ(0), T, y), initialised at θ(0),
and targeting the posterior π(θ|y) ∝ π(θ)p(y|θ), so θ(T) converges in distribution to
π(·|y) as T → ∞. Here
M is a function that moves us T steps forward in the MCMC run and this Markov
chain is just some MCMC algorithm for simulating π(θ|y) and so not related to the
Markov chain in the previous part.
Suppose we think we have chosen T sufficiently large that the chain has converged,
and so we believe θ(T ) ∼ π(·|y) is a good approximation.
Consider the following procedure simulating pairs (φi, θi), i = 1, 2, ..., K: (Step 1)
parameter φi ∼ π(·) is an independent draw from the prior; (Step 2) synthetic data
yi′ ∼ p(·|φi) is an independent draw from the observation model; (Step 3) the MCMC
algorithm M is initialised with a draw θi(0) ∼ π(0) from an arbitrary fixed initial
distribution π(0); and (Step 4) we set θi = M(θi(0), T, yi′).
Let φ = (φ1 , ..., φK ) and θ = (θ1 , ..., θK ) be samples generated in this way.
i. Suppose the chain has indeed converged by T steps for all starting states θ(0) .
Let p(φ, θ) be the joint distribution of the random vectors φ and θ. Show that
p(φ, θ) = p(θ, φ).
ii. Give a non-parametric test for MCMC convergence which makes use of the result
in Question 6(b)i. Hint: the null is θ(T ) ∼ π(·|y).
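One possible sketch, consistent with the hint: under the null the θi share the prior marginal of the φi, so a two-sample Kolmogorov-Smirnov test can be applied; treating this as a rough check, the within-pair dependence between φi and θi is ignored here, and the function name is mine.

from scipy.stats import ks_2samp

def convergence_check(phi, theta):
    # phi, theta: length-K arrays produced by Steps 1-4 above
    stat, pval = ks_2samp(phi, theta)
    return pval        # a small p-value is evidence against convergence by T steps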
7. (From Cox and Hinkley Theoretical Statistics) For i = 1, ..., n, let θi ∈ {0, 1} be the
indicator for the event that student i enjoys the course in 2023 and let θ = (θ1 , ..., θn ).
Suppose our prior is that the θi, i = 1, ..., n, are iid with P(θi = 1) = p, where p is our
prior probability that an individual student enjoys the course, and we take a fixed value
of p expressing our prior expectation for the proportion enjoying the course (based
perhaps on past years).
Our prior on the function q(θ) = n^(−1) Σ_i θi has mean p (that's good) and variance
p(1 − p)/n. If n is large this prior expresses near certainty in the proportion of students enjoying
the course.
Criticise this prior elicitation and suggest an improvement. As a hint, something is wrong
with the prior variance of the random variable Q = q(θ) and we should change the prior
to fix this.
Section C questions
8. (MSc 2020 exam - students had a related practical in 2020) A book club with n members
wants to decide what book to read next. They have a shortlist of B books with labels
B = {1, ..., B}. Let PB be the set of all permutations of the labels in B. For i = 1, ..., n
the i’th reader gives a ranked list of the books yi = (yi,1 , ..., yi,B ), yi ∈ PB , ranking them
from most to least interesting. The data are y = (y1 , ..., yn ).
In a Plackett-Luce model each book b = 1, ..., B has interest measure θb > 0. Let θ =
(θ1 , ..., θB ), θ ∈ RB . Let Yi ∈ PB denote the random ranking from the i’th reader. In
the Plackett-Luce model, given Yi,1 = yi,1 , ..., Yi,a−1 = yi,a−1 , the a’th entry (ie, the next
entry) is decided by choosing book b with probability proportional to θb from the books
B \ {yi,1 , ..., yi,a−1 } remaining. The Y1 , ..., Yn are jointly independent given θ.
L(θ; y) = ∏_{i=1}^{n} ∏_{a=1}^{B} θ_{y_i,a} / ( Σ_{b=a}^{B} θ_{y_i,b} ).
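A direct transcription of this likelihood into Python can help check the indexing; the three-book ranking at the end is a made-up example, and books are labelled 0, ..., B−1 in the code.

import numpy as np

def pl_loglik(theta, rankings):
    # theta: length-B array of interest measures; rankings: list of permutations of 0..B-1
    ll = 0.0
    for y in rankings:
        y = np.asarray(y)
        for a in range(len(y)):
            # book y[a] is chosen from the not-yet-ranked books y[a:], with prob. prop. to theta
            ll += np.log(theta[y[a]]) - np.log(theta[y[a:]].sum())
    return ll

print(pl_loglik(np.array([2.0, 1.0, 0.5]), [[0, 2, 1]]))   # B = 3 books, one reader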
ii. The prior is πB(θ) = ∏_{b=1}^{B} π(θb) with π(θb) = Γ(θb; α0, 1), where α0 > 0 is given. Write
down the posterior density π(θ|y) and give an MCMC algorithm targeting π(θ|y).
iii. Explain why the scale β0 in the prior Γ(α0, β0) for θi, i ∈ B may be set equal to one.
Suppose odds of 1000 : 1 for ranking one book above another represent extreme
preference and are a priori unlikely for books on the shortlist. Explain how a
fixed numerical value of α0 might be chosen, noting any assumptions.
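One way to make this concrete (my reading of the question, using the fact that for iid Gamma(α0, 1) variables the pairwise choice probability θb/(θb + θb′) is Beta(α0, α0)); the candidate values of α0 below are arbitrary.

from scipy.stats import beta

def prob_extreme_odds(alpha0, odds=1000.0):
    # prior probability that one book of a pair is preferred at odds above `odds`:1
    # (factor 2 because either book of the pair may be the favoured one)
    return 2 * beta.sf(odds / (1.0 + odds), alpha0, alpha0)

for alpha0 in (0.25, 0.5, 1.0, 2.0):
    print(alpha0, prob_extreme_odds(alpha0))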
(b) Suppose B is large, so each reader i = 1, ..., n only reports the first N entries xi =
(xi,1, ..., xi,N) in their ranking, with N ≪ B. Here xi,j = yi,j for i = 1, ..., n and
j = 1, ..., N . The data are x = (x1 , ..., xn ).
i. Show that the likelihood L(θ; x) for the new data is

L(θ; x) = ∏_{i=1}^{n} ∏_{a=1}^{N} θ_{x_i,a} / ( Σ_{b=a}^{N} θ_{x_i,b} + Σ_{d ∈ B \ x_i} θ_d ).
ii. Let C = ∪_{i=1}^{n} x_i give the books appearing in at least one ranking and D = B \ C
be the books appearing in none. Let θC = (θb)_{b∈C} and V = Σ_{d∈D} θd.
Write down the prior distribution of V and the likelihood L(θC , V ; x), and give
the posterior π(θC , V |x) as a function of θC and V .
iii. Give an MCMC algorithm targeting π(θC , V |x). State briefly why it may be more
efficient, for estimation of θC in the case |C| ≪ B, than MCMC targeting π(θ|x).