Lecture 4: More Bayes
Rebecca C. Steorts
Bayesian Methods and Modern Statistics: STA 360/601
Lecture 4
Today’s menu
- Review of notation
- When are Bayesian and frequentist methods the same?
- Example: Normal-Normal
- Posterior predictive inference
- Example
- Credible intervals
- Example
Notation
p(x \mid \theta) \quad \text{likelihood}

\pi(\theta) \quad \text{prior}

p(x) = \int p(x \mid \theta)\, \pi(\theta)\, d\theta \quad \text{marginal likelihood}

p(\theta \mid x) = \frac{p(x \mid \theta)\, \pi(\theta)}{p(x)} \quad \text{posterior probability}

p(x_{\text{new}} \mid x) = \int p(x_{\text{new}} \mid \theta)\, \pi(\theta \mid x)\, d\theta \quad \text{predictive probability}
Another conjugate example
Suppose

X_1, \ldots, X_n \mid \lambda \overset{iid}{\sim} \text{Poisson}(\lambda)

\lambda \sim \text{Gamma}(\alpha, \beta).

Find p(\lambda \mid X).

p(\lambda \mid X) \propto \left[\prod_{i=1}^n \lambda^{x_i} e^{-\lambda} / x_i!\right] \times \frac{\beta^\alpha}{\Gamma(\alpha)}\, \lambda^{\alpha-1} e^{-\beta\lambda} \quad (1)

\propto \lambda^{n\bar{x}} e^{-n\lambda} \times \lambda^{\alpha-1} e^{-\beta\lambda} \quad (2)

\propto \lambda^{n\bar{x}+\alpha-1} e^{-\lambda(n+\beta)} \quad (3)

Thus \lambda \mid X \sim \text{Gamma}(n\bar{x} + \alpha,\; n + \beta).
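As a sanity check on the conjugate update above, the sketch below (in Python rather than the course's R; the data values are made up for illustration) computes the closed-form Gamma posterior and compares its mean against brute-force numerical integration of likelihood times prior on a grid.

```python
import math

def poisson_gamma_posterior(xs, alpha, beta):
    # Conjugate update from the slides: lambda | X ~ Gamma(n*xbar + alpha, n + beta)
    # (rate parameterization, matching the slides' use of beta as a rate)
    return alpha + sum(xs), beta + len(xs)

# Hypothetical data and prior, for illustration only
xs = [2, 4, 1, 3, 5]
alpha, beta = 2.0, 1.0
a_post, b_post = poisson_gamma_posterior(xs, alpha, beta)

def unnorm_post(lam):
    # likelihood times prior kernel, evaluated on the log scale for stability
    loglik = sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)
    logprior = (alpha - 1) * math.log(lam) - beta * lam
    return math.exp(loglik + logprior)

# Midpoint-rule integration over lambda in (0, 20)
grid = [i * 0.001 + 0.0005 for i in range(20000)]
w = [unnorm_post(l) for l in grid]
mean_numeric = sum(l * wi for l, wi in zip(grid, w)) / sum(w)

mean_closed = a_post / b_post  # mean of Gamma(a, b) with rate b
print(a_post, b_post, round(mean_closed, 3), round(mean_numeric, 3))
```

The two means agree, confirming the algebra in (1)-(3) for this data set.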
Normal-Normal
X_1, \ldots, X_n \mid \theta \overset{iid}{\sim} N(\theta, \sigma^2)

\theta \sim N(\mu, \tau^2),

where \sigma^2 is known.
Two Useful Things to Know
Definition
The reciprocal of the variance is referred to as the precision:

\text{Precision} = \frac{1}{\text{Variance}}.
Theorem
Let \delta_n be a sequence of estimators of g(\theta) with mean squared error E[\delta_n - g(\theta)]^2. Let b_n(\theta) be the bias.
(i) If E[\delta_n - g(\theta)]^2 \to 0 for all \theta, then \delta_n is consistent for g(\theta).
(ii) Equivalently (since the MSE decomposes as b_n(\theta)^2 + \mathrm{Var}(\delta_n)), \delta_n is consistent if b_n(\theta) \to 0 and \mathrm{Var}(\delta_n) \to 0 for all \theta.
(iii) In particular (and most useful), \delta_n is consistent if it is unbiased for each n and \mathrm{Var}(\delta_n) \to 0 for all \theta.
We omit the proof, which requires Chebyshev's inequality along with a bit of probability theory. See Problem 1.8.1 in TPE for the exercise of proving this.
Normal-Normal Revisited
V(\theta \mid x) = \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}.
A consistent estimator
Let \hat{\delta}(x) = E[\theta \mid X]. Show that the posterior mean is consistent.
First consider the posterior mean as n \to \infty:

E(\theta \mid x) = \frac{\frac{n}{\sigma^2}\,\bar{x} + \frac{1}{\tau^2}\,\mu}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} = \frac{\frac{1}{n}\cdot\frac{n\bar{x}}{\sigma^2} + \frac{1}{n}\cdot\frac{\mu}{\tau^2}}{\frac{1}{n}\cdot\frac{n}{\sigma^2} + \frac{1}{n}\cdot\frac{1}{\tau^2}} \to \frac{\bar{x}/\sigma^2}{1/\sigma^2} = \bar{x} \quad \text{as } n \to \infty.

Now consider the posterior variance:

V(\theta \mid x) = \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} \to 0 \quad \text{as } n \to \infty.

Hence the bias of \hat{\delta} tends to 0, and \mathrm{Var}(\hat{\delta}) \le \sigma^2/n \to 0, so the posterior mean is consistent by part (ii) of the theorem.
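The shrinking of the posterior toward \bar{x} can be seen numerically. The sketch below (Python, with illustrative values of \mu, \tau^2, \sigma^2, and \bar{x} that are not from the slides) evaluates the closed-form posterior mean and variance for growing n.

```python
# Normal-Normal posterior mean and variance, evaluated for growing n.
# Illustrative numbers only: prior N(mu, tau2), known sampling variance sigma2.
mu, tau2 = 0.0, 4.0
sigma2 = 1.0
xbar = 2.5  # sample mean, held fixed as n grows

def posterior(n):
    prec = n / sigma2 + 1.0 / tau2                       # posterior precision
    mean = (n * xbar / sigma2 + mu / tau2) / prec        # precision-weighted average
    return mean, 1.0 / prec                              # (mean, variance)

for n in [1, 10, 100, 10000]:
    m, v = posterior(n)
    print(n, round(m, 4), round(v, 6))
```

As n grows, the printed mean approaches \bar{x} = 2.5 and the variance approaches 0, matching the consistency argument above.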
Posterior Predictive Distributions
Consider

p(\tilde{y} \mid y) = \frac{p(\tilde{y}, y)}{p(y)} \quad (6)

= \frac{\int_\theta p(\tilde{y}, y, \theta)\, d\theta}{p(y)} \quad (7)

= \frac{\int_\theta p(\tilde{y} \mid y, \theta)\, p(y, \theta)\, d\theta}{p(y)} \quad (8)

= \int_\theta p(\tilde{y} \mid y, \theta)\, p(\theta \mid y)\, d\theta. \quad (9)

In most contexts, if \theta is given, then \tilde{y} \mid \theta is independent of y, i.e., the value of \theta determines the distribution of \tilde{y}, without needing to also know y. When this is the case, we say that \tilde{y} and y are conditionally independent given \theta. Then the above becomes

p(\tilde{y} \mid y) = \int_\theta p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta.
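The integral above has a direct Monte Carlo interpretation: draw \theta from the posterior, then draw \tilde{y} from the likelihood at that \theta. The Python sketch below does this for a hypothetical Poisson-rate posterior (the Gamma(5, 2) posterior is made up for illustration).

```python
import math
import random

random.seed(1)

def sample_poisson(lam):
    # Knuth's multiplication method; adequate for modest lam
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# Hypothetical posterior for a Poisson rate: lambda | y ~ Gamma(shape=5, rate=2)
shape, rate = 5.0, 2.0

# Monte Carlo version of p(ytilde | y) = integral of p(ytilde | theta) p(theta | y):
# draw theta from the posterior, then ytilde from the likelihood at that theta.
draws = [sample_poisson(random.gammavariate(shape, 1.0 / rate))
         for _ in range(20000)]
print(round(sum(draws) / len(draws), 2))  # should be near E[lambda | y] = 2.5
```

Each draw is a sample from the posterior predictive distribution, so the empirical mean of the draws approximates the predictive mean.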
Theorem
If \theta is discrete and \tilde{y} and y are conditionally independent given \theta, then the posterior predictive distribution is

p(\tilde{y} \mid y) = \sum_\theta p(\tilde{y} \mid \theta)\, p(\theta \mid y).
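For the discrete case, the sum can be computed exactly. The toy example below (hypothetical, not from the slides) takes a coin whose bias \theta is either 0.3 or 0.7 with equal prior probability, observes one heads, and predicts the next toss via the theorem's sum.

```python
# Discrete-theta posterior predictive: hypothetical two-value coin example.
priors = {0.3: 0.5, 0.7: 0.5}   # p(theta)
y = 1                           # observe one heads

def lik(y, th):                 # Bernoulli likelihood p(y | theta)
    return th if y == 1 else 1 - th

marg = sum(lik(y, th) * pr for th, pr in priors.items())            # p(y)
post = {th: lik(y, th) * pr / marg for th, pr in priors.items()}    # p(theta | y)

# p(ytilde = 1 | y) = sum over theta of p(ytilde = 1 | theta) p(theta | y)
pred_heads = sum(lik(1, th) * p for th, p in post.items())
print(post, round(pred_heads, 3))  # pred_heads = 0.3*0.3 + 0.7*0.7 = 0.58
```

Note how observing heads shifts the posterior toward \theta = 0.7, which in turn raises the predictive probability of heads above the prior value of 0.5.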
Negative Binomial Distribution
- We reintroduce the Negative Binomial distribution.
- The binomial distribution counts the number of successes in a fixed number of iid Bernoulli trials.
- Recall, a Bernoulli trial has a fixed success probability p.
- Suppose instead that we count the number of Bernoulli trials required to get a fixed number of successes. This formulation leads to the Negative Binomial distribution.
- In a sequence of independent Bernoulli(p) trials, let X denote the trial at which the rth success occurs, where r is a fixed integer.

Then

f(x) = \binom{x-1}{r-1}\, p^r (1-p)^{x-r}, \quad x = r, r+1, \ldots
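As a quick check of this pmf, the Python snippet below (with arbitrary illustrative values r = 3, p = 0.4) verifies that it sums to 1 over its support and that its mean matches the known value r/p for this trial-count parameterization.

```python
from math import comb

def neg_binom_pmf(x, r, p):
    # P(rth success occurs on trial x), x = r, r+1, ...
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

r, p = 3, 0.4  # illustrative values
# Truncate the infinite support at 200; the tail beyond is negligible here
total = sum(neg_binom_pmf(x, r, p) for x in range(r, 200))
mean = sum(x * neg_binom_pmf(x, r, p) for x in range(r, 200))
print(round(total, 6), round(mean, 4))  # total ~ 1, mean = r/p = 7.5
```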
X \mid \lambda \sim \text{Poisson}(\lambda)

\lambda \sim \text{Gamma}(a, b)
Recall
p(\lambda \mid x) \propto p(x \mid \lambda)\, p(\lambda)

\propto e^{-\lambda} \lambda^x\, \lambda^{a-1} e^{-\lambda/b}

= \lambda^{x+a-1} e^{-\lambda(1+1/b)}.

Thus \lambda \mid x \sim \text{Gamma}\left(x + a, \frac{1}{1+1/b}\right), i.e., \lambda \mid x \sim \text{Gamma}\left(x + a, \frac{b}{b+1}\right). Finish the problem for homework.
- Suppose that X is the number of pregnant women arriving at a particular hospital to deliver their babies during a given month.
- The discrete count nature of the data plus its natural interpretation as an arrival rate suggest modeling it with a Poisson likelihood.
- To use a Bayesian analysis, we require a prior distribution for \lambda having support on the positive real line. A convenient choice is given by the Gamma distribution, since it's conjugate for the Poisson likelihood.

The model is given by

X \mid \lambda \sim \text{Poisson}(\lambda)

\lambda \sim \text{Gamma}(a, b).
- We are also told 42 moms are observed arriving at the particular hospital during December 2007. Using prior study information given, we are told a = 5 and b = 6.
- (We found a, b by working backwards from a prior mean of 30 and prior variance of 180.)

We would like to find several things in this example:
1. Plot the likelihood, prior, and posterior distributions as functions of \lambda in R.
2. Plot the posterior predictive distribution where the number of pregnant women arriving falls between [0, 100], integer valued.
3. Find the posterior predictive probability that the number of pregnant women arriving is between 40 and 45 (inclusive). Do this for homework.
4. You are expected to have this done by early this week or next week since you have an exam on Thursday, Feb 11 (in class). (This material will not be turned in but could appear on the exam.)
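The assignment asks for R; as a hedged sketch of the underlying computation only (it does not give the homework answer), here is the predictive pmf in Python. With x = 42, a = 5, b = 6, the posterior is Gamma(47, 6/7) (shape-scale), and mixing a Poisson over this Gamma gives the predictive in closed form; this is a negative binomial in its count-of-events parameterization, which differs from the trial-count version introduced earlier.

```python
import math

x_obs, a, b = 42, 5, 6            # data and prior from the slides
alpha_post = x_obs + a            # posterior shape: 47
scale_post = b / (b + 1)          # posterior scale: 6/7

def predictive_pmf(k):
    # p(k | x) = integral of Poisson(k | lam) * Gamma(lam | alpha_post, scale_post),
    # computed in closed form on the log scale to avoid overflow
    s = scale_post
    logp = (math.lgamma(k + alpha_post) - math.lgamma(alpha_post)
            - math.lgamma(k + 1)
            + k * math.log(s / (s + 1)) + alpha_post * math.log(1 / (s + 1)))
    return math.exp(logp)

probs = [predictive_pmf(k) for k in range(101)]   # predictive over 0..100, item 2
print(round(sum(probs), 6))                       # essentially all the mass is here
print(max(range(101), key=lambda k: probs[k]))    # mode of the predictive
```

Summing `probs` over the range asked for in item 3 is then a one-liner, left for the homework. For the plots in item 1, the same posterior parameters feed directly into R's `dgamma`.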