Bayesian Credible Interval
\[
\frac{\bar{X} - \mu}{\sqrt{1/n}} = \frac{\bar{X} - \mu}{1/\sqrt{n}} \sim N(0, 1).
\]
Step One: Since the mean of this distribution is $1/\lambda$, one decent idea
for an estimator of $\lambda$ is $\hat{\lambda} = 1/\bar{X}$.
\[
a < Y < b \quad\Rightarrow\quad a < \lambda \bar{X} < b
\]
and solve for $\lambda$ “in the middle” to get
\[
\frac{a}{\bar{X}} < \lambda < \frac{b}{\bar{X}}.
\]
The 90% confidence interval for $\lambda$ is then given by
\[
\left( \frac{a}{\bar{X}}, \; \frac{b}{\bar{X}} \right),
\]
where, again, a and b will depend on n and 0.90. (And ideally would
be written as chi-squared critical values!)
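Those chi-squared critical values are easy to compute in practice: multiplying $a < \lambda\bar{X} < b$ through by $2n$ uses the standard fact that $2\lambda\sum X_i \sim \chi^2(2n)$ for exponential data. A minimal sketch using SciPy; the simulated data, the seed, and the choice $n = 25$ are just for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
lam_true = 2.0
n = 25
x = rng.exponential(scale=1 / lam_true, size=n)  # simulated sample

# Pivot: 2 * lambda * sum(x) ~ chi-squared with 2n degrees of freedom,
# so a symmetric 90% confidence interval for lambda is
#   ( chi2_{0.05, 2n} / (2 sum(x)),  chi2_{0.95, 2n} / (2 sum(x)) ).
lo = stats.chi2.ppf(0.05, df=2 * n) / (2 * x.sum())
hi = stats.chi2.ppf(0.95, df=2 * n) / (2 * x.sum())
print(f"90% CI for lambda: ({lo:.3f}, {hi:.3f})")
```

Since the $\chi^2(2n)$ quantiles bracket $2n$, the interval always contains the point estimate $\hat{\lambda} = 1/\bar{x}$.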
Interpretation: Once again, we have an interval with random endpoints
that will contain the true value of $\lambda$ with probability 0.90. Once
we collect the sample and compute the numerical value of $\bar{X}$, and hence
the numerical confidence interval, we will have a fixed interval that
either contains $\lambda$ or doesn’t contain $\lambda$. In the long run, with repeated
sampling, the true value of $\lambda$ will be correctly captured by the interval
90% of the time.
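The long-run coverage claim can be checked by simulation. A sketch, again assuming SciPy, with a made-up true rate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam_true = 2.0   # hypothetical true rate
n = 25
reps = 10_000

# The chi-squared critical values depend only on n, not on the data.
q_lo = stats.chi2.ppf(0.05, df=2 * n)
q_hi = stats.chi2.ppf(0.95, df=2 * n)

hits = 0
for _ in range(reps):
    x = rng.exponential(scale=1 / lam_true, size=n)
    s = x.sum()
    # Interval from the pivot 2 * lambda * sum(x) ~ chi2(2n)
    hits += q_lo / (2 * s) < lam_true < q_hi / (2 * s)

print(f"empirical coverage: {hits / reps:.3f}")
```

Because the pivot distribution is exact, the empirical coverage should land very close to 0.90.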
2 Bayesian Credible Intervals
As Bayesians, we are thinking of parameters, such as $\mu$ for the normal distribution, as random
variables. Thus, it now makes sense to write statements like
\[
P(a < \mu < b) = 0.90.
\]
This probability can be computed by integrating a prior pdf for µ but, if we want to let the
data speak, we’d better use the posterior pdf for µ given the data!
Like all Bayesian results, the credible interval will be affected by the choice of the prior
distribution for µ. In the following examples, we will compare results for different priors.
Example: Suppose $X_1, \ldots, X_n$ is a random sample from the $N(\mu, 1)$ distribution. Find a
90% credible interval for $\mu$ in the case that
(a) $\mu$ has the flat (improper) prior $f(\mu) \propto 1$
(b) $\mu$ has a $N(\mu_0, \sigma_0^2)$ prior for known hyperparameters $\mu_0$ and $\sigma_0^2$
Solution to (a):
The likelihood is
\[
f(\vec{x} \mid \mu) = (2\pi)^{-n/2} \, e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2}.
\]
The prior is
\[
f(\mu) \propto 1, \quad -\infty < \mu < \infty.
\]
The posterior is
\begin{align*}
f(\mu \mid \vec{x}) &\propto f(\vec{x} \mid \mu) \cdot f(\mu) \\
&\propto e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2} \cdot 1 \\
&= e^{-\frac{1}{2}\sum x_i^2 + \mu \sum x_i - \frac{n}{2}\mu^2} \\
&\propto e^{\mu \sum x_i - \frac{n}{2}\mu^2} \\
&\;\;\vdots \\
&\propto e^{-\frac{n}{2}(\mu - \bar{x})^2}.
\end{align*}
That is, the posterior distribution for $\mu$ given $\vec{X} = \vec{x}$ is $N(\bar{x}, 1/n)$.
We wish to find critical values $a$ and $b$ such that
\[
P(a < \mu < b \mid \vec{X} = \vec{x}) = 0.90.
\]
In this conditional world, we have that
\[
\frac{\mu - \bar{x}}{1/\sqrt{n}} \sim N(0, 1).
\]
We know that for a $N(0, 1)$ random variable $Z$,
\[
P(-1.645 < Z < 1.645) = 0.90.
\]
(Just as in the frequentist case, there are other non-symmetric values as well!)
So,
\begin{align*}
0.90 &= P(-1.645 < Z < 1.645) \\
&= P\!\left( -1.645 < \frac{\mu - \bar{x}}{1/\sqrt{n}} < 1.645 \,\middle|\, \vec{X} = \vec{x} \right) \\
&\;\;\vdots \\
&= P\!\left( \bar{x} - 1.645 \tfrac{1}{\sqrt{n}} < \mu < \bar{x} + 1.645 \tfrac{1}{\sqrt{n}} \right).
\end{align*}
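As a quick numerical check of this flat-prior interval (the data values below are made up for illustration):

```python
import numpy as np
from scipy import stats

x = np.array([4.1, 5.3, 4.8, 5.9, 4.6, 5.2, 4.9, 5.5])  # hypothetical data
n = len(x)
xbar = x.mean()

# Posterior under the flat prior is N(xbar, 1/n).
z = stats.norm.ppf(0.95)  # approx 1.645
lo, hi = xbar - z / np.sqrt(n), xbar + z / np.sqrt(n)

# The same interval, taken straight from the posterior distribution:
lo2, hi2 = stats.norm.interval(0.90, loc=xbar, scale=1 / np.sqrt(n))
print((lo, hi), (lo2, hi2))
```

Both computations give the same endpoints, since the credible interval is just the central 90% of the posterior.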
Interpretation: This looks very similar to what we had in the frequentist case but the
interpretation is very different. The lowercase notation $\bar{x}$ indicates that the sample mean,
and hence the endpoints of the interval, have been fixed and computed. As a frequentist,
the parameter µ would be fixed and would be either in the interval or not. It would not be
in there “with some probability”.
In the Bayesian case though, we are thinking of µ as random, and we can say, using a flat
prior and observing the data as x1 , x2 , . . . , xn , that µ is in this interval with probability 0.90!
Solution to (b):
The likelihood is still
\[
f(\vec{x} \mid \mu) = (2\pi)^{-n/2} \, e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2}.
\]
The prior is now
\[
f(\mu) = \frac{1}{\sqrt{2\pi\sigma_0^2}} \, e^{-\frac{1}{2\sigma_0^2}(\mu - \mu_0)^2}.
\]
The posterior is then
\begin{align*}
f(\mu \mid \vec{x}) &\propto f(\vec{x} \mid \mu) \cdot f(\mu) \\
&\propto e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2} \cdot e^{-\frac{1}{2\sigma_0^2}(\mu - \mu_0)^2} \\
&\;\;\vdots \\
&\propto e^{-\frac{1}{2(\sigma^2)^*}(\mu - \mu^*)^2}
\end{align*}
where
\[
\mu^* = \frac{\mu_0 + \sigma_0^2 \sum x_i}{1 + n\sigma_0^2}, \qquad
(\sigma^2)^* = \frac{\sigma_0^2}{1 + n\sigma_0^2}.
\]
So,
\begin{align*}
0.90 &= P(-1.645 < Z < 1.645) \\
&= P\!\left( -1.645 < \frac{\mu - \mu^*}{\sqrt{(\sigma^2)^*}} < 1.645 \,\middle|\, \vec{X} = \vec{x} \right) \\
&\;\;\vdots \\
&= P\!\left( \mu^* - 1.645\sqrt{(\sigma^2)^*} < \mu < \mu^* + 1.645\sqrt{(\sigma^2)^*} \right).
\end{align*}
The 90% credible interval for $\mu$, using the conjugate $N(\mu_0, \sigma_0^2)$ prior, is
\[
\left( \mu^* - 1.645\sqrt{(\sigma^2)^*}, \;\; \mu^* + 1.645\sqrt{(\sigma^2)^*} \right).
\]
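A short numerical sketch of the conjugate computation; the data and the hyperparameters below are made up:

```python
import numpy as np
from scipy import stats

x = np.array([4.1, 5.3, 4.8, 5.9, 4.6, 5.2, 4.9, 5.5])  # hypothetical data
n = len(x)
mu0, sigma0_sq = 3.0, 2.0  # hypothetical hyperparameters

# Posterior parameters for the N(mu0, sigma0^2) prior with unit-variance data:
mu_star = (mu0 + sigma0_sq * x.sum()) / (1 + n * sigma0_sq)
var_star = sigma0_sq / (1 + n * sigma0_sq)

z = stats.norm.ppf(0.95)
lo = mu_star - z * np.sqrt(var_star)
hi = mu_star + z * np.sqrt(var_star)
print(f"90% credible interval: ({lo:.3f}, {hi:.3f})")
```

Note the shrinkage: the posterior mean $\mu^*$ always lands between the prior mean $\mu_0$ and the sample mean $\bar{x}$.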
Note that, as $\sigma_0^2 \to \infty$, the $N(\mu_0, \sigma_0^2)$ conjugate prior is an ever-flattening bell curve that
is “squishing down” to a flat line. So, the flat uninformative prior can be thought of as a
limiting case of the conjugate prior. Indeed, in this case and for fixed $n$,
\[
\mu^* = \frac{\mu_0 + \sigma_0^2 \sum x_i}{1 + n\sigma_0^2} \;\to\; \frac{\sum x_i}{n} = \bar{x}
\]
and
\[
(\sigma^2)^* = \frac{\sigma_0^2}{1 + n\sigma_0^2} \;\to\; \frac{1}{n}
\]
as $\sigma_0^2 \to \infty$. These limiting parameters match the parameters in the posterior distribution
for $\mu$ under the assumption of a flat prior.
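This limiting behavior is easy to verify numerically; the data and prior mean below are hypothetical:

```python
import numpy as np

x = np.array([4.1, 5.3, 4.8, 5.9, 4.6, 5.2, 4.9, 5.5])  # hypothetical data
n, xbar = len(x), x.mean()
mu0 = 3.0  # hypothetical prior mean

for s0_sq in [1.0, 10.0, 1000.0, 1e6]:
    mu_star = (mu0 + s0_sq * x.sum()) / (1 + n * s0_sq)
    var_star = s0_sq / (1 + n * s0_sq)
    print(f"sigma0^2 = {s0_sq:>9}: mu* = {mu_star:.5f}, (sigma^2)* = {var_star:.5f}")

# As s0_sq grows, mu* approaches xbar and (sigma^2)* approaches 1/n.
```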
Still, our resulting interval is kind of convoluted. In my personal opinion, too many people
fall back on conjugate priors for computational convenience. I would go with the flat prior
here unless I really knew something a priori about $\mu$. I wouldn’t worry so much about the
fact that a normal distribution is used for the prior as I would about its parameters. Perhaps prior
experimentation/data suggests that the mean is around 3. I would then use a normal prior
with mean 3 and a variance that expresses how confident I am about that prior estimate.
(Small prior variance for high confidence and larger prior variance for less confidence.)
Example: Suppose $X_1, \ldots, X_n$ is a random sample from the exponential distribution with
rate $\lambda$. Find a 90% credible interval for $\lambda$ in the case that
(a) $\lambda$ has the flat (improper) prior $f(\lambda) \propto 1$
(b) $\lambda$ has a gamma prior
Solution to (a):
The likelihood is
\[
f(\vec{x} \mid \lambda) = \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i} \prod_{i=1}^{n} I_{(0,\infty)}(x_i).
\]
The prior is
\[
f(\lambda) \propto 1, \quad \lambda > 0.
\]
The posterior is
\begin{align*}
f(\lambda \mid \vec{x}) &\propto f(\vec{x} \mid \lambda) \cdot f(\lambda) \\
&\propto \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i} \cdot 1 \\
&= \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i}.
\end{align*}
If you’ve had MathStat, the best approach would be to multiply $a < \lambda < b$ through by
appropriate constants in order to move from working with a gamma distribution to a chi-
squared distribution. Then, give your cutoffs in terms of symbolic chi-squared critical values
or numerical ones from a $\chi^2$-table.
Otherwise, you need to numerically solve
\[
0.90 = \int_a^b \frac{\left( \sum_{i=1}^{n} x_i \right)^{n+1}}{\Gamma(n+1)} \, \lambda^n e^{-\lambda \sum x_i} \, d\lambda.
\]
There are many sets of values for a and b that will solve this. For simplicity, you could take
a = 0 and just solve for b. Alternatively, you could take b = ∞ and try to solve for a.
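Both one-sided choices reduce to single quantile look-ups once we recognize the posterior as a $\Gamma(n+1, \sum x_i)$ distribution (rate parameterization). A sketch assuming SciPy, with simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.exponential(scale=1 / 2.0, size=30)  # hypothetical data, true rate 2
n, s = len(x), x.sum()

# Posterior under the flat prior: Gamma(n + 1) with rate sum(x),
# i.e. SciPy scale parameter 1 / sum(x).
post = stats.gamma(a=n + 1, scale=1 / s)

b = post.ppf(0.90)   # choice a = 0: interval (0, b)
a = post.ppf(0.10)   # choice b = infinity: interval (a, infinity)
print(f"(0, {b:.3f})  or  ({a:.3f}, inf)")
```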
In frequentist statistics, taking a = 0 is often done for simplicity but sometimes people will
use Calculus (or numerics) to find the shortest possible confidence interval. After all, we are
giving an interval estimate for λ and we would like to be as precise as possible.
In Bayesian statistics, if you wanted to optimize your credible interval, the goal would be to
find an interval of values for λ that have the highest posterior density.
Definition:
In Bayesian statistics, a $100(1-\alpha)\%$ highest posterior density region for a parameter $\theta$
is a subset $C$ of the parameter space that is defined by
\[
C = \{\theta : f(\theta \mid \vec{x}) \geq k\},
\]
where $k$ is chosen so that $P(\theta \in C \mid \vec{x}) = 1 - \alpha$.
Basically, this means that you want to find the highest horizontal line that, when intersected
with the posterior pdf, defines $\theta$ values that, when integrated between, will give you $1 - \alpha$.
For the exponential example, this will give two values.
These values are the endpoints of the highest posterior density region for λ. Note that, in
the case of multimodal posterior densities, the highest posterior density region may consist
of a collection of disjoint intervals. (This is why it’s called a “region” and not a highest
posterior density “interval”.)
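For a unimodal posterior like this gamma, the HPD region is the single shortest interval containing posterior mass $1-\alpha$, which can be found by scanning over the lower-tail probability. A rough numerical sketch, reusing the flat-prior gamma posterior from part (a) with simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.exponential(scale=1 / 2.0, size=30)  # hypothetical data
n, s = len(x), x.sum()
post = stats.gamma(a=n + 1, scale=1 / s)     # posterior under the flat prior

# Every interval (ppf(p), ppf(p + 0.90)) has posterior mass 0.90;
# the HPD interval is the shortest one over the lower-tail mass p.
ps = np.linspace(0.0, 0.099, 991)
lows = post.ppf(ps)
highs = post.ppf(ps + 0.90)
i = np.argmin(highs - lows)
a_hpd, b_hpd = lows[i], highs[i]

# At the HPD endpoints, the posterior density values (nearly) match:
print(a_hpd, b_hpd, post.pdf(a_hpd), post.pdf(b_hpd))
```

By construction, the HPD interval is no wider than the central (equal-tail) interval.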
Solution to (b): Similar, because the prior is conjugate; we will just get a different
gamma posterior.
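For concreteness, assuming part (b) uses a $\Gamma(\alpha, \beta)$ prior with rate $\beta$ (a standard choice, though the specific hyperparameters here are an assumption), conjugacy gives the posterior $\Gamma(\alpha + n, \beta + \sum x_i)$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.exponential(scale=1 / 2.0, size=30)  # hypothetical data
n, s = len(x), x.sum()

alpha0, beta0 = 2.0, 1.0  # hypothetical Gamma(alpha, beta) prior, rate beta

# Conjugate update: prior lambda^(a-1) e^(-b lambda) times
# likelihood lambda^n e^(-lambda sum(x)) gives Gamma(a + n, b + sum(x)).
post = stats.gamma(a=alpha0 + n, scale=1 / (beta0 + s))

lo, hi = post.ppf(0.05), post.ppf(0.95)
print(f"central 90% credible interval for lambda: ({lo:.3f}, {hi:.3f})")
```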