Bayesian Statistics
For the estimation of a binomial probability θ from a single observation x of the random variable X with the prior
distribution of θ being beta with parameters α and β, investigate the form of the posterior distribution of θ and
determine the Bayesian estimate of θ under quadratic loss.
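The derivation requested above can be sketched as follows. This is a sketch under a stated assumption: X is taken to be Binomial(n, θ) with the number of trials n known, which the text does not state explicitly. The prior is Beta(α, β) as given.

```latex
\begin{align*}
f(\theta \mid x)
  &\propto \underbrace{\theta^{x}(1-\theta)^{n-x}}_{\text{likelihood}}
     \times \underbrace{\theta^{\alpha-1}(1-\theta)^{\beta-1}}_{\text{prior}} \\
  &= \theta^{x+\alpha-1}(1-\theta)^{n-x+\beta-1},
\end{align*}
% which is the kernel of a Beta density, so
\[
\theta \mid x \sim \mathrm{Beta}(x+\alpha,\; n-x+\beta).
\]
% Under quadratic loss the Bayesian estimate is the posterior mean:
\[
\hat{\theta} = E[\theta \mid x] = \frac{x+\alpha}{n+\alpha+\beta}.
\]
```

The estimate can also be read as a weighted average of the sample proportion x/n and the prior mean α/(α+β), with weights n/(n+α+β) and (α+β)/(n+α+β).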
x <- rep(0, n)
for (i in 1:n) {
  # ...
}
mean(x)
Q1.
An insurance company designed a new product and wanted to assess its clients’ responses to the product. A survey was
carried out giving an opportunity to each participating client to give a positive or negative response to the product,
independently of other clients.
Let X be the random variable representing the number of positive responses to the new product.
Out of 160 clients who responded independently to the survey, 101 gave a positive response for the new product.
The probability of obtaining a positive response for the product is denoted by θ and a Beta prior distribution with
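parameters (α, β) is assumed for θ. The posterior density of θ satisfies
$f(\theta \mid x) \propto \theta^{x+\alpha-1}(1-\theta)^{n-x+\beta-1}$,
where n is the number of clients surveyed and x is the observed number of positive responses.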
An analyst consulted by the company suggests that, based on previous experience, a Beta prior with parameters (40, 24)
is more appropriate.
(vi) Plot the new prior and posterior distributions of θ on the same graph as in part (v). [3]
(vii) Comment on the plots obtained in parts (v) and (vi). [2]
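One possible way to produce the plots is sketched below in R. It assumes that part (v), which is not shown above, plotted the Uniform(0,1) prior and its posterior, and it uses the conjugate result that a Beta(a, b) prior with x successes in n trials gives a Beta(a + x, b + n - x) posterior. The helper name `posterior_params` is illustrative only.

```r
# Survey data from the question: 101 positive responses out of 160 clients.
n <- 160
x <- 101

# A Beta(a, b) prior combined with x successes in n trials gives a
# Beta(a + x, b + n - x) posterior.
posterior_params <- function(a, b) c(shape1 = a + x, shape2 = b + n - x)

post_unif <- posterior_params(1, 1)    # Uniform(0,1) is Beta(1, 1)
post_beta <- posterior_params(40, 24)  # analyst's suggested prior

theta <- seq(0, 1, length.out = 501)

# Draw the most concentrated curve first so the y-axis range fits all curves.
plot(theta, dbeta(theta, post_beta[["shape1"]], post_beta[["shape2"]]),
     type = "l", lty = 1, xlab = "theta", ylab = "density")
lines(theta, dbeta(theta, post_unif[["shape1"]], post_unif[["shape2"]]), lty = 2)
lines(theta, dbeta(theta, 40, 24), lty = 3)
lines(theta, dbeta(theta, 1, 1), lty = 4)
legend("topleft", lty = 1:4,
       legend = c("posterior, Beta(40, 24) prior", "posterior, uniform prior",
                  "Beta(40, 24) prior", "uniform prior"))
```

Both posteriors centre near 0.63; the one based on the Beta(40, 24) prior is slightly narrower because the prior contributes information equivalent to 64 additional observations.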
The company will put the new product on the market only if there is a high probability that θ is higher than 60%.
(viii) (a) Calculate the probability P(θ > 0.6 | X) in the case of both priors; that is, Uniform(0,1) and Beta with
parameters (40, 24). [4]
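The posterior tail probabilities can be obtained directly from the Beta distribution function. The following is a minimal sketch, assuming the conjugate posteriors Beta(1 + x, 1 + n - x) for the uniform prior and Beta(40 + x, 24 + n - x) for the analyst's prior; the variable names are illustrative.

```r
# Survey data from the question.
n <- 160
x <- 101

# P(theta > 0.6 | X) is the upper tail of the Beta posterior.
p_unif <- pbeta(0.6, 1 + x, 1 + n - x, lower.tail = FALSE)    # Uniform(0,1) prior
p_beta <- pbeta(0.6, 40 + x, 24 + n - x, lower.tail = FALSE)  # Beta(40, 24) prior

c(uniform = p_unif, beta_40_24 = p_beta)
```

Using `lower.tail = FALSE` avoids computing `1 - pbeta(...)` by hand and is numerically safer when the tail probability is small.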
Q2. A study was carried out to estimate the proportion, p, of workers that commute by train to work. A total of n = 200
workers were sampled at random and were asked the question: ‘Do you take the train to work?’ The workers’ answers
were recorded as a binary outcome, $y_i$, for worker i, with 1 for yes and 0 for no. The data are available in the file
BinaryTrain.RData.
Two commuters, Alice and Norman, were interested in the study and proposed different prior distributions for the
proportion p.
Alice assumed a discrete prior distribution g(p) given in the following table:
Norman chose to use a beta prior distribution for p, with parameters 3 and 12.
(i) (a) Calculate the mean and the standard deviation for Alice’s prior distribution. [4]
(b) Generate 10,000 random values from Norman’s prior distribution. [1]
(c) Calculate the mean and standard deviation of the values generated in part (i)(b). [2]
(d) Comment on whether or not Alice and Norman have similar prior beliefs for p. [2]
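For Norman's prior, parts (i)(b) and (i)(c) can be sketched in R as below. The seed is an arbitrary choice for reproducibility and is not part of the question. The exact Beta(3, 12) mean and standard deviation are included as a check on the simulated values. Alice's table is not reproduced above, so her prior is not covered here.

```r
set.seed(1)  # arbitrary seed, assumed here only for reproducibility

# (i)(b): 10,000 draws from Norman's Beta(3, 12) prior.
draws <- rbeta(10000, shape1 = 3, shape2 = 12)

# (i)(c): simulated mean and standard deviation.
sim_mean <- mean(draws)
sim_sd   <- sd(draws)

# Exact values for a Beta(a, b) distribution, for comparison.
a <- 3
b <- 12
exact_mean <- a / (a + b)                               # 0.2
exact_sd   <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))   # 0.1

c(sim_mean = sim_mean, sim_sd = sim_sd,
  exact_mean = exact_mean, exact_sd = exact_sd)
```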
Norman’s beta prior distribution for p is adopted for the remainder of the question.
(ii) Plot the shape of the posterior density of p without identifying it. [4]
(iii) Plot the density of Norman’s prior distribution by setting ylim = c(0,14). [3]
(iv) (a) Plot the posterior density of p by adding it to the plot in part (iii). [3]
(b) Compare the two densities using your answer in part (iv)(a). [1]
(c) Comment on the extent to which the posterior distribution is affected by the prior distribution. [1]
(v) Determine a 90% interval estimate for p based on its posterior distribution. [2]
(vi) Determine the exact posterior probability that p exceeds 0.25. [2]
(c) Compare your answer in part (vii)(b) with your answer in part (vi). [3] [Total 28]
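The posterior calculations in parts (iii) to (vi) can be sketched as follows. Since BinaryTrain.RData is not available here, the vector `y` below is simulated and purely hypothetical; with the real data one would load the file and use the recorded responses instead. The sketch relies on the conjugate result that a Beta(3, 12) prior with s successes in n Bernoulli trials gives a Beta(3 + s, 12 + n - s) posterior.

```r
set.seed(42)
# HYPOTHETICAL stand-in for the survey data (not the contents of BinaryTrain.RData).
y <- rbinom(200, size = 1, prob = 0.3)

n <- length(y)
s <- sum(y)

# Conjugate update of Norman's Beta(3, 12) prior.
a_post <- 3 + s
b_post <- 12 + n - s

# Parts (iii) and (iv)(a): prior density, then posterior density on the same axes.
p <- seq(0, 1, length.out = 501)
plot(p, dbeta(p, 3, 12), type = "l", lty = 2, ylim = c(0, 14),
     xlab = "p", ylab = "density")
lines(p, dbeta(p, a_post, b_post), lty = 1)
legend("topright", lty = c(2, 1), legend = c("prior Beta(3, 12)", "posterior"))

# Part (v): equal-tailed 90% posterior interval.
ci90 <- qbeta(c(0.05, 0.95), a_post, b_post)

# Part (vi): exact posterior probability that p exceeds 0.25.
p_gt_025 <- pbeta(0.25, a_post, b_post, lower.tail = FALSE)
```

An equal-tailed interval is only one choice for part (v); a highest posterior density interval would be an alternative.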
Q3. Consider the n = 30 independent and identically distributed observations $(y_1, y_2, \ldots, y_n)$ given below from a random
variable Y with probability function $f(y, \theta) = \theta^{y} e^{-\theta} / y!$.
y = c(5,5,6,2,4,10,2,5,5,2,5,3,7,4,4,5,4,6,7,2,8,4,6,4,3, 6,6,6,5,7)
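By assuming a prior distribution proportional to $e^{-\alpha\theta}$, we can show that the posterior distribution of θ is:
$f(\theta \mid y) \propto \theta^{\sum y_i} e^{-(n+\alpha)\theta}$.
We can observe that the posterior distribution of θ is Gamma with shape parameter $\sum y_i + 1$ and rate parameter $n + \alpha$.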
(i) (a) Plot the posterior probability density function of θ for values of θ in the interval [3.2, 6.8], assuming
α = 0.01.
(b) Carry out a simulation of N = 5,000 posterior samples for the parameter θ. [8]
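Part (i) can be sketched in R as below. The sketch takes the posterior kernel as the Poisson likelihood times the prior, that is θ^(Σy_i) e^(-(n+α)θ), which corresponds to a Gamma distribution with shape Σy_i + 1 and rate n + α. The seed and the grid size are arbitrary assumptions.

```r
y <- c(5,5,6,2,4,10,2,5,5,2,5,3,7,4,4,5,4,6,7,2,8,4,6,4,3,6,6,6,5,7)
n <- length(y)
alpha <- 0.01

# Gamma posterior: kernel theta^(sum(y)) * exp(-(n + alpha) * theta).
shape <- sum(y) + 1
rate  <- n + alpha

# (i)(a): posterior density on [3.2, 6.8].
theta <- seq(3.2, 6.8, length.out = 361)
plot(theta, dgamma(theta, shape = shape, rate = rate), type = "l",
     xlab = "theta", ylab = "posterior density")

# (i)(b): N = 5,000 posterior samples.
set.seed(2024)  # arbitrary seed, assumed here only for reproducibility
N <- 5000
post_samples <- rgamma(N, shape = shape, rate = rate)

mean(post_samples)  # should be close to shape / rate
```

The simulated draws can then be summarised with `quantile(post_samples, ...)` or `summary(post_samples)` for the later parts of the question.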
Two possible values for the true value of parameter θ are θ = 15 and θ = 5.
(iv) Comment on these two values based on the posterior distribution of θ plotted in part (ii) and summarised in
part (iii). [3] [Total 16]