
Course 52558: Problem Set 1 Solution

1. A Thought Experiment. Let θ be the true proportion of men in Israel over the age of 40 with hypertension.
(a) Though you may have little or no expertise in this area, use your
social knowledge and common sense to give an initial point estimate
(single value) of θ.
Solution:
As an example, let's guess that 20% of the population has hypertension, so that our point "estimate" for θ is 0.2.
(b) Now suppose that in a properly designed survey, of the first 5 randomly selected men, 4 are hypertensive. How does this information affect your initial estimate of θ?
Solution:
Naturally, having observed that 80% of the sample is hypertensive causes us to suspect that our initial estimate may have been low. Yet, due to the small sample, we still believe that it is not the case that most of the population is hypertensive, and we might update our belief to, say, one third.
(c) Finally, suppose that at the survey's completion, 400 of 1000 men have emerged as hypertensive. Now what is your estimate of θ?
Solution:
A properly selected random sample of 1000 is large enough to convince us that the true value of θ is close to 0.4. Although this is higher than our original estimate, it still indicates a minority, and we are willing to let the sample surprise us to that extent.
(d) What guidelines for statistical inference do your answers suggest?
Solution:
The above suggests the following sensible data-analysis steps:
i. Plausible starting estimates
ii. Gradual revision of beliefs
iii. Eventual convergence to the data
2. Relationship between posterior and prior mean and variance.
(a) Show that for any continuous random variables X and Y

E(X) = E(E(X | Y ))

(Note that a similar proof can be used for the discrete case)
Solution:

$$E(X) = \iint x\,p(x, y)\,dx\,dy = \int\!\left[\int x\,p(x \mid y)\,dx\right] p(y)\,dy = \int E(X \mid y)\,p(y)\,dy = E(E(X \mid Y))$$

(b) Show that for any random variables X and Y

var(X) = E(var(X | Y )) + var(E(X | Y ))

Solution:
For variety, I will prove this one for the discrete case

$$\begin{aligned}
E(\operatorname{var}(X \mid Y)) + \operatorname{var}(E(X \mid Y)) &= E\!\left(E(X^2 \mid Y) - (E(X \mid Y))^2\right) + E\!\left((E(X \mid Y))^2\right) - \left(E(E(X \mid Y))\right)^2 \\
&= E(X^2) - E\!\left((E(X \mid Y))^2\right) + E\!\left((E(X \mid Y))^2\right) - (E(X))^2 \\
&= E(X^2) - (E(X))^2 = \operatorname{var}(X)
\end{aligned}$$

(c) Let y denote the observed data. We assume y was generated from p(y | θ), where θ, the parameter governing the sampling of y, is random and distributed according to p(θ). Use the above to describe (i.e., understand the equations and then put them into words) the relationship between the mean and variance of the prior p(θ) and those of the posterior p(θ | y).
Solution:
Plugging θ and y into the first equation gives us

E(θ) = E(E(θ | y))

which means that the prior mean of the parameters is the average, over the distribution of possible data, of all possible posterior means. This runs in the opposite direction from the way we usually think of priors and posteriors, but in fact matches our intuition of what the prior should capture. The second equation

var(θ) = E(var(θ | y)) + var(E(θ | y))

is interesting because it means that the posterior variance (the first term on the right-hand side) is, on average, smaller than the prior variance (the left-hand side) by an amount equal to the variance of the posterior mean. Importantly, the greater the variation of the posterior mean across possible data, the greater our potential for reducing our uncertainty about θ.
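Both identities are easy to sanity-check by Monte Carlo simulation. The R sketch below uses a hypothetical two-group mixture (Y picks a group, X is normal within the group); the specific distributions are illustrative choices, not part of the problem:

## Monte Carlo check of E(X) = E(E(X|Y)) and the variance decomposition,
## using an illustrative mixture: Y ~ Bernoulli(0.3), X | Y ~ Normal.
set.seed(1)
n  <- 1e6
y  <- rbinom(n, size = 1, prob = 0.3)
mu <- ifelse(y == 1, 5, 0)   # E(X | Y)
s  <- ifelse(y == 1, 2, 1)   # sd(X | Y)
x  <- rnorm(n, mean = mu, sd = s)

c(mean(x), mean(mu))             # E(X) vs E(E(X | Y))
c(var(x), mean(s^2) + var(mu))   # var(X) vs E(var(X|Y)) + var(E(X|Y))

Both pairs agree up to simulation noise.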

3. Posterior of a Poisson Distribution. Suppose that X is the number of pregnant women arriving at a particular hospital to deliver babies in
a given month. The discrete count nature of the data plus its natural
interpretation as an arrival rate suggest adopting a Poisson likelihood

$$p(x \mid \theta) = \frac{e^{-\theta}\theta^{x}}{x!}, \qquad x \in \{0, 1, 2, \ldots\},\ \theta > 0$$
To provide support on the positive real line and reasonable flexibility, we suggest a Gamma G(α, β) prior distribution

$$p(\theta) = \frac{\theta^{\alpha - 1} e^{-\theta/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}, \qquad \theta > 0,\ \alpha > 0,\ \beta > 0$$
where Γ(·) is a continuous generalization of the factorial function, satisfying Γ(c) = (c − 1)Γ(c − 1). Here α and β are the parameters of this prior, or the hyperparameters of the model. The Gamma distribution has mean αβ and variance αβ².
Show that the posterior distribution p(θ | x) is also Gamma distributed.
Determine its updated parameters.
Solution:
Although the question was phrased for a univariate x, for generality, in
the solution x will be a vector of n observations $x_i$, each of which follows the Poisson distribution. In this case, we have
$$p(x \mid \theta) \propto \theta^{\sum_i x_i} e^{-n\theta}$$

Together with the prior we can then write


$$p(\theta \mid x) \propto \theta^{\sum_i x_i} e^{-n\theta}\,\theta^{\alpha - 1} e^{-\theta/\beta} = \theta^{\sum_i x_i + \alpha - 1}\, e^{-\theta\left(n + \frac{1}{\beta}\right)}$$

This is an unnormalized Gamma distribution so that


$$\theta \mid x \sim \operatorname{Gamma}\!\left(\sum_i x_i + \alpha,\ \frac{1}{n + \frac{1}{\beta}} = \frac{\beta}{n\beta + 1}\right)$$
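As a quick illustration, this conjugate update is a one-liner in R; gamma_posterior is a hypothetical helper name, and β is treated as a scale parameter (mean αβ) as in the problem:

gamma_posterior <- function(x, alpha, beta) {
  # Poisson counts x with a Gamma(alpha, beta) prior (beta = scale) give a
  # Gamma(sum(x) + alpha, beta / (n*beta + 1)) posterior, as derived above.
  n <- length(x)
  c(shape = sum(x) + alpha, scale = beta / (n * beta + 1))
}
gamma_posterior(x = 42, alpha = 5, beta = 6)   # shape 47, scale 6/7 (see Q4)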

4. Posterior of the Poisson Model.


In this question we will use Matlab/R to explore the Poisson model with
the Gamma prior considered above.
(a) The Matlab (GammaPrior.m) and R (GammaPrior.txt) files in the Code directory of the course web page can be used to plot the Gamma prior. Use this code to investigate different values of α and β. Describe the qualitative behavior of this prior as a function of these parameters and try to explain why they are called 'shape' and 'scale' parameters, respectively.

Solution:
For a fixed α, the larger the value of β the wider the distribution, so β behaves like a stretching or scale parameter. For a fixed β, α determines the form of the distribution. At α = 1 there is a qualitative change: for α ≤ 1 the density is highest near zero and decays monotonically, while for α > 1 the density is unimodal and resembles a normal distribution more closely as α grows.
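The course's GammaPrior.m/GammaPrior.txt files are not reproduced here, but an equivalent R sketch is below; note that dgamma defaults to a rate parameterization, so the scale β must be passed explicitly:

theta <- seq(0.01, 40, length.out = 500)
# alpha <= 1: density is highest near zero and decays monotonically
plot(theta, dgamma(theta, shape = 0.8, scale = 5), type = "l",
     xlab = "theta", ylab = "p(theta)")
# alpha > 1: unimodal, increasingly normal-looking as alpha grows
lines(theta, dgamma(theta, shape = 2, scale = 5), lty = 2)
# larger beta stretches the distribution to the right
lines(theta, dgamma(theta, shape = 2, scale = 10), lty = 3)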
(b) Continuing the previous question involving births, assume that in
December 2008 we observed x = 42 moms arriving at the hospital
to deliver babies, and suppose we adopt a Gamma(5,6), which has
mean 30 and variance 180, reflecting the hospital’s total for the two
preceding years. Use Matlab/R to plot the posterior distribution of
θ next to its prior. What are your conclusions?
Solution:
In this case we have a single observation x = 42 (so n = 1), and our posterior is Gamma(42 + 5, 6/(1 · 6 + 1)) = Gamma(47, 6/7).

[Figure: prior Gamma(5, 6) vs. posterior Gamma(47, 6/7) densities; P(theta) against theta over [0, 100].]

It is clear that the mode of the posterior distribution is pulled toward the maximum-likelihood value of 42, and that the posterior is more concentrated than the prior, since our uncertainty has diminished.
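A minimal R sketch that reproduces the figure, again passing β as a scale parameter:

theta <- seq(0, 100, length.out = 1000)
plot(theta, dgamma(theta, shape = 47, scale = 6/7), type = "l",
     xlab = "theta", ylab = "P(theta)")                     # posterior Gamma(47, 6/7)
lines(theta, dgamma(theta, shape = 5, scale = 6), lty = 2)  # prior Gamma(5, 6)
legend("topright", c("Posterior", "Prior"), lty = c(1, 2))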
(c) Repeat the above for different values of x. What are your conclusions?
Solution:
Different values of x affect only α and not β, so the most significant change is in the location of the posterior mode. The closer the maximum-likelihood mode is to the prior mode, the closer the posterior mode is to both. Since α affects the mode linearly, the posterior mode sits approximately (because of the −1 term) a constant fraction of the way between the prior mode and the maximum-likelihood mode, as the sketch below illustrates. The variance grows linearly with α, so larger values of x also lead to a wider posterior.
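For a concrete feel, this short R sketch (same prior, n = 1) prints the posterior mode (Σxᵢ + α − 1)β′ for several values of x:

alpha <- 5; beta <- 6; n <- 1
beta_post <- beta / (n * beta + 1)   # posterior scale beta', here 6/7
for (x in c(10, 30, 42, 80)) {
  cat("x =", x, " posterior mode =", (x + alpha - 1) * beta_post, "\n")
}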

5. Extinction of Species. Paleobotanists estimate the moment in the re-
mote past when a given species became extinct by taking cylindrical, ver-
tical core samples well below the earth's surface and looking for the last
occurrence of the species in the fossil record, measured in meters above
the point P at which the species was known to have first emerged. Letting
$\{y_1, \ldots, y_n\}$ denote a sample of such distances above P at a random set of locations, the model
$$(y_i \mid \theta) \sim \operatorname{Unif}(0, \theta)$$
emerges from simple and plausible assumptions. In this model the un-
known θ > 0 can be used, through carbon dating, to estimate the species
extinction time. This problem is about Bayesian inference for θ, and it
will be seen that some of our usual intuitions do not quite hold in this
case.
(a) Show that the likelihood may be written as
$$l(\theta; y) = \theta^{-n}\, I(\theta \geq \max(y_1, \ldots, y_n))$$
where I(A) = 1 if A is true and 0 otherwise.
Solution:

$$l(\theta; y) = \prod_i p(y_i \mid \theta) = \prod_i \frac{1}{\theta}\, I(0 < y_i \leq \theta) = \theta^{-n} \prod_i I(y_i \leq \theta) = \theta^{-n}\, I(\theta \geq \max(y_1, \ldots, y_n))$$

where we also used the fact that all $y_i$ are, by the experiment's design, guaranteed to be positive.
(b) The Pareto(α, β) distribution has density
$$p(\theta) = \begin{cases} \alpha \beta^{\alpha} \theta^{-(\alpha + 1)} & \theta \geq \beta \\ 0 & \text{otherwise} \end{cases}$$
where α, β > 0. The Pareto distribution has mean $\frac{\alpha\beta}{\alpha - 1}$ for α > 1 and variance $\frac{\alpha\beta^2}{(\alpha - 1)^2(\alpha - 2)}$ for α > 2.
With the likelihood viewed as a constant multiple of a density for θ, show that the likelihood corresponds to the Pareto(n − 1, m) distribution. Now let the prior for θ be taken to be Pareto(α, β) and derive the posterior distribution p(θ | y). Is the Pareto conjugate to the uniform?
Solution:
We define m = max(y_1, . . . , y_n) (this was supposed to be part of the question but was omitted by mistake). The likelihood can then be written as
$$l(\theta; y) = \theta^{-n}\, I(m \leq \theta) \propto (n - 1)\, m^{n-1}\, \theta^{-[(n-1)+1]}\, I(m \leq \theta)$$

which is, by definition, a Pareto distribution with parameters n − 1
and m. Together with a Pareto(α, β) prior we have

$$p(\theta \mid y) \propto \theta^{-n}\, I(m \leq \theta)\cdot \alpha\beta^{\alpha}\theta^{-(\alpha+1)}\, I(\beta \leq \theta) \propto \theta^{-(n + \alpha + 1)}\, I(\max(\beta, m) \leq \theta)$$

which is a Pareto(n + α, max(β, m)) distribution, so the Pareto is indeed a conjugate prior for the uniform likelihood.
(c) In an experiment in the Antarctic in the 1980s to study a particular
species of fossil ammonite, the following is a linearly rescaled version
of the data obtained: y = (0.4, 1.0, 1.5, 1.7, 2.0, 2.1, 3.1, 3.7, 4.3, 4.9).
Prior information equivalent to a Pareto prior with (α, β) = (2.5, 4)
was available. Plot the prior, likelihood, and posterior distributions
arising from this data set on the same graph, and briefly discuss what
this picture implies about the updating of information from prior to
posterior in this case.
Solution:
In our case n = 10 and m = max(y1 , . . . , y1 0) = 4.9 and max(m, β) =
4.9. Thus the likelihood is a Pareto(9, 4.9) and the posterior is a
Pareto(12.5, 4.9). Using Matlab, a call to the generalized Pareto dis-
tribution gppdf (α, β, β) can be used to plot the Pareto distribution.

[Figure: prior Pareto(2.5, 4), likelihood Pareto(9, 4.9), and posterior Pareto(12.5, 4.9) densities; p(theta) against theta over [0, 10].]
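Since base R has no built-in Pareto density, the R sketch below hand-rolls one (dpareto is a hypothetical helper) instead of using Matlab's gppdf:

dpareto <- function(x, alpha, beta) {
  # Pareto(alpha, beta) density: alpha * beta^alpha * x^-(alpha+1) for x >= beta
  ifelse(x >= beta, alpha * beta^alpha * x^(-(alpha + 1)), 0)
}
theta <- seq(0, 10, length.out = 1000)
plot(theta, dpareto(theta, 12.5, 4.9), type = "l",
     xlab = "theta", ylab = "p(theta)")         # posterior Pareto(12.5, 4.9)
lines(theta, dpareto(theta, 9, 4.9), lty = 2)   # likelihood Pareto(9, 4.9)
lines(theta, dpareto(theta, 2.5, 4), lty = 3)   # prior Pareto(2.5, 4)
legend("topright", c("Posterior", "Likelihood", "Prior"), lty = 1:3)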

(d) Make a table summarizing the mean and standard deviation for the
prior, likelihood and posterior distributions, using the (α, β) choices
and the data in part (c) above. In Bayesian updating the posterior
mean is often a weighted average of the prior mean and the likelihood
mean (with positive weights), and the posterior standard deviation

is typically smaller than either the prior or likelihood standard devi-
ations. Are each of these behaviors true in this case? Explain briefly.
Solution:
For the prior, likelihood and posterior distributions, the shape parameter α of the Pareto distribution is greater than 2, so both the mean and the standard deviation are finite (see the formulas in part (b)). Specifically we have:

        Prior     Likelihood   Posterior
Mean    6.667     5.513        5.326
STD     5.9628    0.6945       0.4649
In this case the posterior standard deviation is, as expected, smaller than both the prior's and the likelihood's. However, atypically, the posterior mean (5.326) lies below both the likelihood mean (5.513) and the prior mean (6.667), so it is not a weighted average of the two with positive weights. This is a consequence of the particular nature of the Pareto distribution and can be seen directly by plugging, for a fixed β, the appropriate shape parameters into the Pareto mean formula.
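The table entries follow directly from the mean and variance formulas in part (b); here is a short R check (pareto_stats is a hypothetical helper):

pareto_stats <- function(alpha, beta) {
  m <- alpha * beta / (alpha - 1)                      # mean, valid for alpha > 1
  v <- alpha * beta^2 / ((alpha - 1)^2 * (alpha - 2))  # variance, valid for alpha > 2
  c(mean = m, sd = sqrt(v))
}
pareto_stats(2.5, 4)     # prior:      mean 6.667, sd 5.963
pareto_stats(9, 4.9)     # likelihood: mean 5.513, sd 0.694
pareto_stats(12.5, 4.9)  # posterior:  mean 5.326, sd 0.465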
