
SC7 Bayes Methods

First problem sheet (Sections 1-2 of lecture notes).

Section A questions
1. (a) Consider tossing a drawing pin [see figure at end]. Define the result of a toss to
be “heads” if the point lands downwards, and “tails” otherwise. Write p for the
probability that a toss will land point downwards. Think about p, and choose a, b,
so that a Beta(a, b) prior distribution approximates your subjective prior distribution
for p. [I used a = 2 and b = 3 but you may differ.]
(b) Now collect data. Toss a drawing pin 100 times and keep track of the number of
heads after 10, 50, and 100 tosses. You may find the result depends on the surface
you use. [I got 4, 16 and 26 heads after 10, 50 and 100 tosses.]
(c) Ask someone else what prior they chose. Think of your respective priors as hypotheses
about p. Whose beliefs were better supported by the data? Compute a
Bayes factor comparing your priors. [For me the other person used a = 3 and b = 2.]
(d) Estimate a 95% HPD credible interval for p for each of the two priors you are considering,
for the case when n = 10 trials. Write down the posterior averaged over
models, stating any assumptions you make, and estimate a 95% HPD credible interval
for p from the model averaged posterior.
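
A minimal numerical sketch (one possible approach, not the only one) of the Bayes factor in (c) and an HPD interval in (d), assuming the counts quoted in the square brackets above (n = 10 tosses with x = 4 heads) and the two example priors Beta(2, 3) and Beta(3, 2):

import numpy as np
from scipy.special import betaln, comb
from scipy.stats import beta

n, x = 10, 4                                  # tosses and heads after 10 trials (values quoted above)
priors = {"mine": (2, 3), "other": (3, 2)}    # the two example Beta priors

# Marginal likelihood of x under a Beta(a, b) prior:
# m(x) = C(n, x) B(a + x, b + n - x) / B(a, b)
def log_marginal(a, b):
    return np.log(comb(n, x)) + betaln(a + x, b + n - x) - betaln(a, b)

bf = np.exp(log_marginal(*priors["mine"]) - log_marginal(*priors["other"]))
print("Bayes factor, Beta(2,3) prior vs Beta(3,2) prior:", bf)

# Approximate 95% HPD interval for the Beta(a + x, b + n - x) posterior on a grid
# (assumes a unimodal posterior, which holds here since both shape parameters exceed 1)
def hpd(a, b, mass=0.95, m=100_000):
    p = np.linspace(0.0, 1.0, m)
    dens = beta.pdf(p, a + x, b + n - x)
    dp = p[1] - p[0]
    order = np.argsort(dens)[::-1]            # highest posterior density first
    csum = np.cumsum(dens[order]) * dp        # accumulated posterior mass
    keep = order[: np.searchsorted(csum, mass) + 1]
    return p[keep].min(), p[keep].max()

for name, (a, b) in priors.items():
    print(name, "95% HPD for p:", hpd(a, b))

The model-averaged posterior in (d) is a mixture of the two Beta posteriors with weights proportional to the marginal likelihoods above.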

2. (a) Specify a Metropolis-Hastings Markov chain Monte Carlo algorithm targeting p(x|θ)
where x ∈ {0, 1, ..., n} and
p(x|θ) = \binom{n}{x} θ^x (1 − θ)^{n−x}.

Prove that your chain is irreducible and aperiodic.
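
A minimal sketch of one possible such sampler (a symmetric ±1 random-walk proposal on x, with illustrative values n = 20 and θ = 0.3 that the question leaves general):

import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(1)
n, theta = 20, 0.3                            # illustrative values only

def mh_binomial(iters=10_000, x0=0):
    x, chain = x0, np.empty(iters, dtype=int)
    for t in range(iters):
        x_prop = x + rng.choice([-1, 1])      # symmetric random-walk proposal
        if 0 <= x_prop <= n:                  # proposals outside {0,...,n} have target 0: reject
            a = binom.pmf(x_prop, n, theta) / binom.pmf(x, n, theta)
            if rng.random() < min(1.0, a):
                x = x_prop
        chain[t] = x
    return chain

chain = mh_binomial()
print("empirical mean:", chain.mean(), "  exact mean n*theta:", n * theta)

With this proposal, the ±1 moves connect all states of {0, ..., n} (where the target is positive), and rejections create self-loops; these are the ingredients the proof asks for.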


(b) Suppose now that the unknown true success probability for the Binomial random variable
X in part (a) is a random variable Θ which can take values in {1/2, 1/4, 1/8, ...}
only. The prior is

π(θ) = θ for θ ∈ {1/2, 1/4, 1/8, ...},  and  π(θ) = 0 otherwise.

An observed value X = x of the Binomial variable in part (a) is generated by simulating
Θ ∼ π(·) to get Θ = θ∗ say, and then X ∼ p(x|θ∗) as before. Specify a
Metropolis-Hastings Markov chain Monte Carlo algorithm simulating a Markov chain
targeting the posterior π(θ|x) for Θ|X = x.
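
One candidate sampler (a sketch only, with illustrative data n = 20, x = 3 that the question leaves general) works on the exponent k, where θ = 2^{−k} with k ≥ 1, using a symmetric ±1 proposal on k:

import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(2)
n, x = 20, 3                                  # illustrative observed data

def log_post(k):
    theta = 2.0 ** (-k)
    # target: prior pi(theta) = theta times the binomial likelihood p(x | theta)
    return -k * np.log(2.0) + binom.logpmf(x, n, theta)

def mh_theta(iters=20_000, k0=1):
    k, chain = k0, np.empty(iters)
    for t in range(iters):
        k_prop = k + rng.choice([-1, 1])      # symmetric proposal on the exponent
        if k_prop >= 1:                        # theta must stay in {1/2, 1/4, 1/8, ...}
            if np.log(rng.random()) < log_post(k_prop) - log_post(k):
                k = k_prop
        chain[t] = 2.0 ** (-k)
    return chain

print("approximate posterior mean of theta:", mh_theta().mean())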
Section B questions
3. In the radiocarbon dating example, suppose the dated materials are found in layers (strata)
piled up on one another, with yi,j the radiocarbon date for θi,j , the j’th date in the i’th
layer. Let L < ψ1 < ψ2 < ... < ψM < U be the age parameters for the layer boundaries. If
we have ni dates from the ith layer we know that for i = 1, 2, ..., M − 1, and j = 1, 2, ..., ni ,
ψi < θi,j < ψi+1 (so specimen dates in higher layers are not as old as dates in lower layers).
Let ψ = (ψ1 , ..., ψM ) and θ = (θ1 , ...θM −1 ) with θi = (θi,1 , ..., θi,ni ).
Derive a prior density π(θ, ψ) for the parameters θ, ψ with reference to the prior elicitation
checklist given in lectures. Hint: how are the layer boundary dates ψ2 , ..., ψM −1 generated?
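
As a worked illustration only (one candidate construction following the hint, not necessarily the elicitation intended in lectures): take the outer boundaries uniform on L < ψ1 < ψM < U, let the interior boundaries ψ2, ..., ψM−1 be the order statistics of M − 2 independent U(ψ1, ψM) draws, and take the θi,j given ψ to be independent and uniform on (ψi, ψi+1). This gives

π(θ, ψ) ∝ (ψM − ψ1)^{−(M−2)} ∏_{i=1}^{M−1} (ψ_{i+1} − ψ_i)^{−n_i},

supported on L < ψ1 < ... < ψM < U with ψi < θi,j < ψi+1 for all i, j, and zero otherwise.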

4. Let Γ(x; α, β) be the Gamma density. Consider Poisson observations Y = (Y1 , Y2 , ..., Yn )
with means λ = (λ1, λ2, ..., λn) given by a mixture of Gamma densities: for shape parameters
α1, α2 and rate parameters β1, β2, a known mixture proportion 0 < p < 1 and
i = 1, 2, ..., n, we observe
Yi |λi ∼ Poisson(λi )
(all iid) with
λi ∼ pΓ(λi ; α1 , β1 ) + (1 − p)Γ(λi ; α2 , β2 ).

(a) Denote by π(α1 , β1 , α2 , β2 ) a prior for the unknown shape and rate parameters. Write
down the joint posterior for α1, β1, α2, β2 and λ given Y1, Y2, ..., Yn. Give an MCMC
algorithm sampling α1, β1, α2, β2, λ | Y1, ..., Yn (a sketch of one possibility follows part (b)).
(b) Integrate λ out of the joint posterior to obtain a marginal posterior density for
α1, β1, α2, β2 | Y1, ..., Yn. Comment briefly on how you would alter your MCMC algorithm
for the new target. What considerations would guide your choice of simulation
method (ie, whether to simulate the joint or the marginal posterior density)?
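
A minimal Metropolis-within-Gibbs sketch of one possibility for the sampler in (a), using synthetic data and, purely for illustration, independent Exp(1) priors on α1, β1, α2, β2 (the sheet leaves y, p and π(α1, β1, α2, β2) unspecified):

import numpy as np
from scipy.stats import gamma, poisson

rng = np.random.default_rng(3)

# Synthetic data purely for illustration
p_mix, n = 0.3, 50
y = rng.poisson(rng.choice([2.0, 10.0], size=n, p=[p_mix, 1 - p_mix]))

def log_mix(lam, a1, b1, a2, b2):
    # log of the mixture-of-Gammas density evaluated at each lambda_i (rate parametrisation)
    return np.log(p_mix * gamma.pdf(lam, a1, scale=1 / b1)
                  + (1 - p_mix) * gamma.pdf(lam, a2, scale=1 / b2))

def sweep(lam, pars, step=0.3):
    # (i) the lambda_i are conditionally independent given (a1, b1, a2, b2),
    # so use componentwise multiplicative random-walk Metropolis updates
    prop = lam * np.exp(step * rng.standard_normal(lam.shape))
    log_a = (log_mix(prop, *pars) + poisson.logpmf(y, prop) + np.log(prop)
             - log_mix(lam, *pars) - poisson.logpmf(y, lam) - np.log(lam))
    lam = np.where(np.log(rng.random(lam.shape)) < log_a, prop, lam)
    # (ii) joint multiplicative random-walk update of (a1, b1, a2, b2),
    # assuming (for this sketch only) independent Exp(1) priors on all four
    prop = pars * np.exp(0.1 * rng.standard_normal(4))
    log_a = (np.sum(log_mix(lam, *prop)) - np.sum(prop) + np.sum(np.log(prop))
             - np.sum(log_mix(lam, *pars)) + np.sum(pars) - np.sum(np.log(pars)))
    if np.log(rng.random()) < log_a:
        pars = prop
    return lam, pars

lam, pars = y + 0.5, np.ones(4)
draws = []
for t in range(5000):
    lam, pars = sweep(lam, pars)
    draws.append(pars)
print("posterior means of (a1, b1, a2, b2):", np.mean(draws, axis=0))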

5. Let X be an n × p design matrix with rows xi, i = 1, 2, ..., n and θ = (θ1, θ2, ..., θp)^T a
p-component vector of parameters. Let z = (z1, ..., zn) be jointly independent normal
random variables, z ∼ N (Xθ, In ) with In the n × n identity. In the probit observation
model for y = (y1 , ..., yn ), we observe yi = 1 if zi > 0 and yi = 0 if zi ≤ 0.
Denote by π(θ, z) = π(θ)π(z|θ) the joint density of θ and z with π(θ) = N (θ; 0, Σ) a normal
prior for θ and Σ a p × p covariance matrix.

(a) Show that yi ∼ Bernoulli(Φ(xi θ)).


(b) Write the posterior π(θ, z|y) in terms of the model elements.
(c) Show that
p(θ|z) = N(θ; µ, V)
with µ = V X^T z and V = (Σ^{−1} + X^T X)^{−1}.
(d) Show that

π(zi | yi, θ) ∝ N(zi; xi θ, 1) I{zi ≤ 0}   if yi = 0, and
π(zi | yi, θ) ∝ N(zi; xi θ, 1) I{zi > 0}   if yi = 1.

(e) Give a Gibbs sampler sampling π(θ|y) (Hint: π(θ, z|y) would be easier).
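
A sketch of the Gibbs sampler suggested by parts (c) and (d), alternating θ | z and z | y, θ; the design X, responses y and the choice Σ = I_p below are synthetic assumptions made only so the sketch runs:

import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(4)

# Synthetic design and data, purely for illustration
n, p = 200, 3
X = rng.standard_normal((n, p))
theta_true = np.array([1.0, -0.5, 0.25])
y = (X @ theta_true + rng.standard_normal(n) > 0).astype(int)

Sigma = np.eye(p)                                   # assumed prior covariance
V = np.linalg.inv(np.linalg.inv(Sigma) + X.T @ X)   # posterior covariance from part (c)
L = np.linalg.cholesky(V)

def gibbs(iters=2000):
    theta = np.zeros(p)
    out = np.empty((iters, p))
    for t in range(iters):
        # z_i | y_i, theta: normal truncated to (-inf, 0] if y_i = 0, (0, inf) if y_i = 1 (part (d))
        m = X @ theta
        lower = np.where(y == 1, 0.0, -np.inf)
        upper = np.where(y == 1, np.inf, 0.0)
        z = truncnorm.rvs(lower - m, upper - m, loc=m, scale=1.0, size=n, random_state=rng)
        # theta | z: N(V X^T z, V) (part (c)), sampled via the Cholesky factor of V
        mu = V @ (X.T @ z)
        theta = mu + L @ rng.standard_normal(p)
        out[t] = theta
    return out

print("posterior mean of theta:", gibbs().mean(axis=0))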

6. Let π(θ), θ ∈ R be a prior density for a scalar parameter, let p(y|θ), y ∈ Rn be the
observation model density and let π(θ|y) ∝ π(θ)p(y|θ) be the posterior density. Consider
a Markov chain simulated in the following way. Suppose θ(0) ∼ π(·) is a draw from the
prior and for t = 0, 1, 2, ... we generate a Markov chain by simulating data y (t) ∼ p(·|θ(t) )
and then θ(t+1) ∼ π(·|y (t) ).

(a) i. Calculate the joint density, p(θ(0) , θ(1) ) say, for θ(0) , θ(1) and show that p(θ(0) , θ(1) ) =
p(θ(1) , θ(0) ) (ie they are exchangeable).
ii. Show that marginally, θ(t) ∼ π(·) for all t = 0, 1, 2, ...
iii. Give the transition probability density K(θ, θ′) for the chain and show the chain
is reversible with respect to the prior π(θ).
(b) Suppose we are given an MCMC algorithm θ(T) = M(θ(0), T, y), initialised at θ(0),
and targeting the posterior π(θ|y) ∝ π(θ)p(y|θ), so θ(T) → π(·|y) in distribution as T → ∞. Here
M is a function that moves us T steps forward in the MCMC run and this Markov
chain is just some MCMC algorithm for simulating π(θ|y) and so not related to the
Markov chain in the previous part.
Suppose we think we have chosen T sufficiently large that the chain has converged,
and so we believe θ(T ) ∼ π(·|y) is a good approximation.
Consider the following procedure simulating pairs (φi, θi), i = 1, 2, ..., K: (Step 1)
parameter φi ∼ π(·) is an independent draw from the prior; (Step 2) synthetic data
y′i ∼ p(·|φi) is an independent draw from the observation model; (Step 3) the MCMC
algorithm M is initialised with a draw θi^(0) ∼ π^(0) from an arbitrary fixed initial
distribution π^(0); and (Step 4) we set θi = M(θi^(0), T, y′i).
Let φ = (φ1 , ..., φK ) and θ = (θ1 , ..., θK ) be samples generated in this way.
i. Suppose the chain has indeed converged by T steps for all starting states θ(0) .
Let p(φ, θ) be the joint distribution of the random vectors φ and θ. Show that
p(φ, θ) = p(θ, φ).
ii. Give a non-parametric test for MCMC convergence which makes use of the result
in Question 6(b)i. Hint: the null is θ(T ) ∼ π(·|y).
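
One possible sketch (certainly not the only valid test): by Question 6(b)i the pairs (φi, θi) are exchangeable under the null, so the differences di = φi − θi are symmetric about zero and independent across i, and a sign or Wilcoxon signed-rank test on the di gives a non-parametric check. A toy illustration in which M is an exact posterior sampler for a conjugate normal model (the model, K and n_obs are chosen here purely for illustration, so the null holds by construction):

import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(5)
K, n_obs = 500, 5

# Toy conjugate setup: theta ~ N(0,1), y_j | theta ~ N(theta,1) for j = 1..n_obs,
# and "M" below is an exact posterior sampler, standing in for a converged MCMC run.
def M(y):
    var = 1.0 / (1.0 + n_obs)                 # posterior variance
    return rng.normal(var * y.sum(), np.sqrt(var))

phi = rng.normal(size=K)                      # Step 1: prior draws
theta = np.empty(K)
for i in range(K):
    y_i = rng.normal(phi[i], 1.0, size=n_obs)  # Step 2: synthetic data
    theta[i] = M(y_i)                          # Steps 3-4: algorithm output

# Under the null, phi_i - theta_i is symmetric about 0 (exchangeability),
# so the signed-rank test should not reject.
print(wilcoxon(phi - theta))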
7. (From Cox and Hinkley Theoretical Statistics) For i = 1, ..., n, let θi ∈ {0, 1} be the
indicator for the event that student i enjoys the course in 2023 and let θ = (θ1 , ..., θn ).
Suppose our prior is that the θi, i = 1, ..., n, are iid with P(θi = 1) = p, where p is
our prior probability that an individual student enjoys the course; we take a
fixed value of p expressing our prior expectation for the proportion enjoying the course
(based perhaps on past years).
Our prior on the function q(θ) = n^{−1} Σ_i θi has mean p (that's good) and variance
p(1 − p)/n. If n is large this prior expresses near certainty in the proportion of students
enjoying the course.
Criticise this prior elicitation and suggest an improvement. As a hint, something is wrong
with the prior variance of the random variable Q = q(θ) and we should change the prior
to fix this.
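
One way to make the hint concrete (an assumed improvement, given here only for illustration): if instead p ∼ Beta(a, b) is given a hyperprior and the θi are iid Bernoulli(p) given p, the law of total variance gives

Var(Q) = E[Var(Q | p)] + Var(E[Q | p]) = E[p(1 − p)]/n + Var(p) → Var(p)  as n → ∞,

so near certainty about the proportion no longer follows automatically from large n, whereas the fixed-p prior forces Var(Q) = p(1 − p)/n → 0.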

Section C questions
8. (MSc 2020 exam - students had a related practical in 2020) A book club with n members
wants to decide what book to read next. They have a shortlist of B books with labels
B = {1, ..., B}. Let PB be the set of all permutations of the labels in B. For i = 1, ..., n
the i’th reader gives a ranked list of the books yi = (yi,1 , ..., yi,B ), yi ∈ PB , ranking them
from most to least interesting. The data are y = (y1 , ..., yn ).
In a Plackett-Luce model each book b = 1, ..., B has interest measure θb > 0. Let θ =
(θ1 , ..., θB ), θ ∈ RB . Let Yi ∈ PB denote the random ranking from the i’th reader. In
the Plackett-Luce model, given Yi,1 = yi,1 , ..., Yi,a−1 = yi,a−1 , the a’th entry (ie, the next
entry) is decided by choosing book b with probability proportional to θb from the books
B \ {yi,1 , ..., yi,a−1 } remaining. The Y1 , ..., Yn are jointly independent given θ.

(a) i. Show that the likelihood L(θ; y) is

L(θ; y) = ∏_{i=1}^{n} ∏_{a=1}^{B} θ_{y_{i,a}} / ( Σ_{b=a}^{B} θ_{y_{i,b}} ).

ii. The prior is πB(θ) = ∏_{b=1}^{B} π(θb) with π(θb) = Γ(θb; α′, 1) and α′ > 0 given. Write
down the posterior density π(θ|y) and give an MCMC algorithm targeting π(θ|y) (a sketch
of one possible sampler follows part (a)).
iii. Explain why the scale β′ in the prior Γ(α′, β′) for θb, b ∈ B may be set equal to one.
Suppose odds of 1000 : 1 for ranking one book above another represent extreme
preference and are a priori unlikely for books on the shortlist. Explain how a
fixed numerical value of α′ might be chosen, noting any assumptions.
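
A sketch of one possible sampler for (a)(ii) (componentwise multiplicative random-walk Metropolis on θ, with synthetic rankings, B = 6, n = 30 readers and α′ = 1 assumed purely for illustration):

import numpy as np

rng = np.random.default_rng(6)
B, n_readers, alpha = 6, 30, 1.0              # alpha stands in for alpha' (set to 1 here)

theta_true = rng.gamma(alpha, 1.0, size=B)

def sample_ranking(theta):
    # draw one reader's ranking sequentially, choosing each next book
    # with probability proportional to its interest measure (Plackett-Luce)
    remaining, out = list(range(B)), []
    while remaining:
        w = theta[remaining] / theta[remaining].sum()
        b = int(rng.choice(remaining, p=w))
        out.append(b)
        remaining.remove(b)
    return out

y = np.array([sample_ranking(theta_true) for _ in range(n_readers)])

def log_post(theta):
    if np.any(theta <= 0):
        return -np.inf
    lp = np.sum((alpha - 1.0) * np.log(theta) - theta)        # Gamma(alpha, 1) prior terms
    for r in y:
        t = theta[r]                                          # interests in ranked order
        lp += np.sum(np.log(t) - np.log(np.cumsum(t[::-1])[::-1]))   # tail-sum denominators
    return lp

def mcmc(iters=3000, step=0.3):
    theta = np.ones(B)
    out = np.empty((iters, B))
    for it in range(iters):
        for b in range(B):
            prop = theta.copy()
            prop[b] = theta[b] * np.exp(step * rng.standard_normal())
            # multiplicative random walk: include the Jacobian term log(prop_b / theta_b)
            log_a = log_post(prop) - log_post(theta) + np.log(prop[b]) - np.log(theta[b])
            if np.log(rng.random()) < log_a:
                theta = prop
        out[it] = theta
    return out

print("posterior mean interest measures:", mcmc().mean(axis=0))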
(b) Suppose B is large so each reader i = 1, ..., n only reports the first N entries xi =
(xi,1, ..., xi,N) in their ranking, with N ≪ B. Here xi,j = yi,j for i = 1, ..., n and
j = 1, ..., N . The data are x = (x1 , ..., xn ).
i. Show that the likelihood L(θ; x) for the new data is

L(θ; x) = ∏_{i=1}^{n} ∏_{a=1}^{N} θ_{x_{i,a}} / ( Σ_{b=a}^{N} θ_{x_{i,b}} + Σ_{d ∈ B \ x_i} θ_d ).

ii. Let C = ∪_{i=1}^{n} xi give the books appearing in at least one ranking and D = B \ C
be the books appearing in none. Let θC = (θb)_{b∈C} and V = Σ_{d∈D} θd.
Write down the prior distribution of V and the likelihood L(θC , V ; x), and give
the posterior π(θC , V |x) as a function of θC and V .
iii. Give an MCMC algorithm targeting π(θC , V |x). State briefly why it may be more
efficient, for estimation of θC in the case |C| ≪ B, than MCMC targeting π(θ|x).

Statistics Department, University of Oxford


Geoff Nicholls: [email protected]
