
More on Bayesian Methods

Rebecca C. Steorts
Bayesian Methods and Modern Statistics: STA 360/601

Lecture 4

Today’s menu

- Review of notation
- When are Bayesian and frequentist methods the same?
- Example: Normal-Normal
- Posterior predictive inference
- Example
- Credible Intervals
- Example
Notation

$p(x \mid \theta)$ — likelihood
$\pi(\theta)$ — prior
$p(x) = \int p(x \mid \theta)\, \pi(\theta)\, d\theta$ — marginal likelihood
$p(\theta \mid x) = \dfrac{p(x \mid \theta)\, \pi(\theta)}{p(x)}$ — posterior probability
$p(x_{\text{new}} \mid x) = \int p(x_{\text{new}} \mid \theta)\, p(\theta \mid x)\, d\theta$ — predictive probability
Another conjugate example
Suppose

$$X_1, \ldots, X_n \mid \lambda \overset{\text{iid}}{\sim} \text{Poisson}(\lambda), \qquad \lambda \sim \text{Gamma}(\alpha, \beta).$$

Find $p(\lambda \mid X)$.

$$
\begin{aligned}
p(\lambda \mid X) &\propto \left[ \prod_{i=1}^{n} \lambda^{x_i} e^{-\lambda} / x_i! \right] \times \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha-1} e^{-\beta\lambda} && (1) \\
&\propto \lambda^{n\bar{x}} e^{-n\lambda} \times \lambda^{\alpha-1} e^{-\beta\lambda} && (2) \\
&\propto \lambda^{n\bar{x}+\alpha-1}\, e^{-\lambda(n+\beta)} && (3)
\end{aligned}
$$

Hence

$$\lambda \mid X \sim \text{Gamma}(n\bar{x} + \alpha,\; n + \beta).$$
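As a quick sanity check of this algebra, here is a minimal R sketch (the data and the hyperparameters $\alpha$, $\beta$ below are made up for illustration) comparing the closed-form Gamma posterior with a brute-force grid approximation:

# Sanity check of the Poisson-Gamma posterior on simulated data.
set.seed(1)
alpha <- 2; beta <- 3                  # Gamma(alpha, rate = beta) prior
x <- rpois(50, lambda = 4)             # illustrative data
n <- length(x)

# Grid approximation of the posterior (log scale to avoid underflow)
lambda <- seq(0.01, 10, by = 0.01)
log_unnorm <- sapply(lambda, function(l)
  sum(dpois(x, l, log = TRUE)) + dgamma(l, alpha, rate = beta, log = TRUE))
unnorm <- exp(log_unnorm - max(log_unnorm))
grid_post <- unnorm / (sum(unnorm) * 0.01)

# Closed form from the slide: Gamma(n*xbar + alpha, rate = n + beta)
closed <- dgamma(lambda, sum(x) + alpha, rate = n + beta)
max(abs(grid_post - closed))           # should be near 0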
Normal-Normal

$$X_1, \ldots, X_n \mid \theta \overset{\text{iid}}{\sim} N(\theta, \sigma^2), \qquad \theta \sim N(\mu, \tau^2),$$

where $\sigma^2$ is known. Calculate the distribution of $\theta \mid x_1, \ldots, x_n$.

Using a ton of math and algebra, you can show that

$$
\theta \mid x_1, \ldots, x_n \sim N\left( \frac{\frac{n\bar{x}}{\sigma^2} + \frac{\mu}{\tau^2}}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},\; \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} \right)
= N\left( \frac{n\bar{x}\tau^2 + \mu\sigma^2}{n\tau^2 + \sigma^2},\; \frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2} \right).
$$
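A small R sketch of these formulas (all numbers invented for illustration); note how the posterior mean moves from the prior mean $\mu$ toward $\bar{x}$ as $n$ grows:

# Posterior parameters for the Normal-Normal model (sigma^2 known)
normal_posterior <- function(xbar, n, sigma2, mu, tau2) {
  prec <- n / sigma2 + 1 / tau2                    # posterior precision
  c(mean = (n * xbar / sigma2 + mu / tau2) / prec,
    var  = 1 / prec)
}

normal_posterior(xbar = 2, n = 5,   sigma2 = 4, mu = 0, tau2 = 1)  # mean between 0 and 2
normal_posterior(xbar = 2, n = 500, sigma2 = 4, mu = 0, tau2 = 1)  # mean near xbar, var near 0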
Two Useful Things to Know

Definition
The reciprocal of the variance is referred to as the precision:

$$\text{Precision} = \frac{1}{\text{Variance}}.$$

Suppose the loss we assume is squared error. Let $\delta(x)$ be an estimator of the true parameter $\theta$. Then

$$
\begin{aligned}
\mathrm{MSE}(\delta(x)) &= \text{Bias}^2 + \text{Variance} && (4) \\
&= \big( \theta - E_\theta[\delta(x)] \big)^2 + E_\theta\!\left[ \big( \delta(x) - E_\theta[\delta(x)] \big)^2 \right] && (5)
\end{aligned}
$$
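Equation (4) is easy to verify by simulation. A minimal R sketch (the estimator $\delta(x) = 0.9\,\bar{x}$ and all constants are made up for illustration):

# Monte Carlo check of MSE = Bias^2 + Variance for a toy biased estimator
set.seed(2)
theta <- 1; n <- 20; reps <- 1e5
delta <- replicate(reps, 0.9 * mean(rnorm(n, theta, 1)))

mse       <- mean((delta - theta)^2)
bias2_var <- (mean(delta) - theta)^2 + var(delta)
c(mse = mse, bias2_plus_var = bias2_var)   # agree up to Monte Carlo error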
Theorem
Let $\delta_n$ be a sequence of estimators of $g(\theta)$ with mean squared error $E(\delta_n - g(\theta))^2$. Let $b_n(\theta)$ be the bias.
(i) If $E[\delta_n - g(\theta)]^2 \to 0$, then $\delta_n$ is consistent for $g(\theta)$.
(ii) Equivalently, $\delta_n$ is consistent if $b_n(\theta) \to 0$ and $\mathrm{Var}(\delta_n) \to 0$ for all $\theta$.
(iii) In particular (and most useful), $\delta_n$ is consistent if it is unbiased for each $n$ and if $\mathrm{Var}(\delta_n) \to 0$ for all $\theta$.

We omit the proof since it requires Chebychev's Inequality along with a bit of probability theory. See Problem 1.8.1 in TPE for the exercise of proving this.
Normal-Normal Revisited

We write the posterior mean and posterior variance out:

$$
E(\theta \mid x) = \frac{\frac{n\bar{x}}{\sigma^2} + \frac{\mu}{\tau^2}}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}
= \frac{\frac{n\bar{x}}{\sigma^2}}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}} + \frac{\frac{\mu}{\tau^2}}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}},
\qquad
V(\theta \mid x) = \frac{1}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}.
$$

Can someone give an explanation of what's happening here? How does this contrast with frequentist inference?
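As a hint, here is a small R sketch (illustrative numbers only) showing that the posterior mean is a precision-weighted compromise between $\bar{x}$ and $\mu$, and that as the prior flattens ($\tau^2 \to \infty$) it recovers the frequentist answer $(\bar{x},\, \sigma^2/n)$:

n <- 10; sigma2 <- 4; xbar <- 2; mu <- 0
for (tau2 in c(0.1, 1, 100, 1e6)) {
  prec <- n / sigma2 + 1 / tau2              # data precision + prior precision
  cat(sprintf("tau2 = %g: mean = %.3f, var = %.3f\n",
              tau2, (n * xbar / sigma2 + mu / tau2) / prec, 1 / prec))
}
# mean -> xbar = 2 and var -> sigma2 / n = 0.4 as tau2 grows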
A consistent estimator

Let $\hat{\delta}(x) = E[\theta \mid X]$. Show that the posterior mean is consistent.

First consider the posterior mean as $n \to \infty$:

$$
E(\theta \mid x) = \frac{\frac{1}{n}\frac{n\bar{x}}{\sigma^2} + \frac{1}{n}\frac{\mu}{\tau^2}}{\frac{1}{n}\frac{n}{\sigma^2} + \frac{1}{n}\frac{1}{\tau^2}} \longrightarrow \frac{\bar{x}/\sigma^2}{1/\sigma^2} = \bar{x} \quad \text{as } n \to \infty.
$$

Now consider

$$E[\bar{x}] = E\big[E[\bar{x} \mid \theta]\big] = \theta. \quad \text{(unbiased)}$$

In the case of the posterior variance, divide the numerator and denominator by $n$. Then

$$
V(\theta \mid x) = \frac{\frac{1}{n}}{\frac{1}{n}\frac{n}{\sigma^2} + \frac{1}{n}\frac{1}{\tau^2}} \approx \frac{\sigma^2}{n} \to 0 \quad \text{as } n \to \infty.
$$

Thus, the posterior mean is consistent by our Theorem, part (iii).
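A quick simulation in R makes the limit visible (the true $\theta$ and the prior below are invented for illustration):

# The posterior mean approaches the true theta as n grows
set.seed(3)
theta_true <- 2; sigma2 <- 1; mu <- 0; tau2 <- 0.5
for (n in c(10, 100, 1000, 10000)) {
  xbar <- mean(rnorm(n, theta_true, sqrt(sigma2)))
  post_mean <- (n * xbar / sigma2 + mu / tau2) / (n / sigma2 + 1 / tau2)
  cat(sprintf("n = %5d: posterior mean = %.4f\n", n, post_mean))
}
# output approaches theta_true = 2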
Posterior Predictive Distributions

- We have just seen how estimation can be done in Bayesian analysis.
- Another goal might be prediction.
- That is, given some data $y$ and a new observation $\tilde{y}$, we may wish to find the conditional distribution of $\tilde{y}$ given $y$.
- This distribution is referred to as the posterior predictive distribution.
- That is, our goal is to find $p(\tilde{y} \mid y)$.
Posterior Predictive Distributions
Consider

$$
\begin{aligned}
p(\tilde{y} \mid y) &= \frac{p(\tilde{y}, y)}{p(y)} && (6) \\
&= \frac{\int_\theta p(\tilde{y}, y, \theta)\, d\theta}{p(y)} && (7) \\
&= \frac{\int_\theta p(\tilde{y} \mid y, \theta)\, p(y, \theta)\, d\theta}{p(y)} && (8) \\
&= \int_\theta p(\tilde{y} \mid y, \theta)\, p(\theta \mid y)\, d\theta. && (9)
\end{aligned}
$$

In most contexts, if $\theta$ is given, then $\tilde{y} \mid \theta$ is independent of $y$; i.e., the value of $\theta$ determines the distribution of $\tilde{y}$ without needing to also know $y$. When this is the case, we say that $\tilde{y}$ and $y$ are conditionally independent given $\theta$. Then the above becomes

$$p(\tilde{y} \mid y) = \int_\theta p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta.$$
Theorem
If $\theta$ is discrete and $\tilde{y}$ and $y$ are conditionally independent given $\theta$, then the posterior predictive distribution is

$$p(\tilde{y} \mid y) = \sum_\theta p(\tilde{y} \mid \theta)\, p(\theta \mid y).$$

If $\theta$ is continuous and $\tilde{y}$ and $y$ are conditionally independent given $\theta$, then the posterior predictive distribution is

$$p(\tilde{y} \mid y) = \int_\theta p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta.$$
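A minimal numerical illustration of the discrete case in R (the two-point prior and posterior weights below are made up): a coin's bias $\theta$ takes one of two values, and the predictive for one new flip mixes the two likelihoods by their posterior weights.

theta    <- c(0.3, 0.7)                  # possible biases (illustrative)
post_wts <- c(0.2, 0.8)                  # p(theta | y), assumed already computed
p_new_1  <- sum(dbinom(1, 1, theta) * post_wts)  # P(y_tilde = 1 | y)
p_new_1                                  # 0.3 * 0.2 + 0.7 * 0.8 = 0.62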
Negative Binomial Distribution
- We reintroduce the Negative Binomial distribution.
- The binomial distribution counts the number of successes in a fixed number of iid Bernoulli trials.
- Recall, a Bernoulli trial has a fixed success probability $p$.
- Suppose instead that we count the number of Bernoulli trials required to get a fixed number of successes. This formulation leads to the Negative Binomial distribution.
- In a sequence of independent Bernoulli($p$) trials, let $X$ denote the trial at which the $r$th success occurs, where $r$ is a fixed integer.

Then

$$f(x) = \binom{x-1}{r-1}\, p^r (1-p)^{x-r}, \quad x = r, r+1, \ldots$$

and we say $X \sim \text{Negative Binom}(r, p)$.
Negative Binomial Distribution

- There is another useful formulation of the Negative Binomial distribution.
- In many cases, it is defined as $Y$ = number of failures before the $r$th success. This formulation is statistically equivalent to the one given above in terms of $X$ = trial at which the $r$th success occurs, since $Y = X - r$. Then

$$f(y) = \binom{r+y-1}{y}\, p^r (1-p)^y, \quad y = 0, 1, 2, \ldots$$

and we say $Y \sim \text{Negative Binom}(r, p)$.

- When we refer to the Negative Binomial distribution in this class, we will refer to the second one defined unless we indicate otherwise.
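Conveniently, R's built-in dnbinom() uses this same failures-before-the-$r$th-success convention, so it can be used directly for the class's Negative Binomial. A quick check against the formula above:

r <- 3; p <- 0.4; y <- 5
dnbinom(y, size = r, prob = p)            # R's pmf
choose(r + y - 1, y) * p^r * (1 - p)^y    # the formula above; same value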
$$X \mid \lambda \sim \text{Poisson}(\lambda), \qquad \lambda \sim \text{Gamma}(a, b).$$

Assume that $\tilde{X} \mid \lambda \sim \text{Poisson}(\lambda)$ is conditionally independent of $X$ given $\lambda$, and that we have a new observation $\tilde{x}$. Find the posterior predictive distribution, $p(\tilde{x} \mid x)$. Assume that $a$ is an integer. First, we must find $p(\lambda \mid x)$.
Recall

$$
\begin{aligned}
p(\lambda \mid x) &\propto p(x \mid \lambda)\, p(\lambda) \\
&\propto e^{-\lambda} \lambda^{x} \cdot \lambda^{a-1} e^{-\lambda/b} \\
&= \lambda^{x+a-1}\, e^{-\lambda(1 + 1/b)}.
\end{aligned}
$$

Thus $\lambda \mid x \sim \text{Gamma}\big(x + a, \frac{1}{1+1/b}\big)$, i.e., $\lambda \mid x \sim \text{Gamma}\big(x + a, \frac{b}{b+1}\big)$, where the second parameter is a scale (the prior density here is $\propto \lambda^{a-1} e^{-\lambda/b}$). Finish the problem for homework.
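Without giving away the closed form, you can sanity-check your homework answer by Monte Carlo: draw $\lambda$ from the posterior, then $\tilde{x}$ from Poisson($\lambda$). The values of $a$, $b$, and $x$ below are illustrative only.

set.seed(4)
a <- 2; b <- 1; x <- 4
lambda_draws <- rgamma(1e5, shape = x + a, scale = b / (b + 1))
x_tilde      <- rpois(1e5, lambda_draws)
prop.table(table(x_tilde))[as.character(0:5)]   # approximate p(x_tilde | x) at 0..5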
- Suppose that $X$ is the number of pregnant women arriving at a particular hospital to deliver their babies during a given month.
- The discrete count nature of the data plus its natural interpretation as an arrival rate suggest modeling it with a Poisson likelihood.
- To use a Bayesian analysis, we require a prior distribution for $\lambda$ having support on the positive real line. A convenient choice is given by the Gamma distribution, since it's conjugate for the Poisson likelihood.

The model is given by

$$X \mid \lambda \sim \text{Poisson}(\lambda), \qquad \lambda \sim \text{Gamma}(a, b).$$
- We are also told that 42 moms are observed arriving at the particular hospital during December 2007. Using prior study information, we are told $a = 5$ and $b = 6$.
- (We found $a, b$ by working backwards from a prior mean of 30 and a prior variance of 180.)

We would like to find several things in this example:
1. Plot the likelihood, prior, and posterior distributions as functions of $\lambda$ in R (a sketch follows this list).
2. Plot the posterior predictive distribution where the number of pregnant women arriving falls in $[0, 100]$, integer valued.
3. Find the posterior predictive probability that the number of pregnant women arriving is between 40 and 45 (inclusive). Do this for homework.
4. You are expected to have this done by early this week or next week since you have an exam on Thursday, Feb 11 (in class). (This material will not be turned in but could appear on the exam.)
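For task 1, a minimal R sketch under the slide's setup ($a = 5$, $b = 6$ as a scale parameter, one observed month with $x = 42$); by the earlier derivation the posterior is Gamma($x + a$, scale $= b/(b+1)$):

a <- 5; b <- 6; x <- 42
lambda <- seq(0, 100, by = 0.1)

prior <- dgamma(lambda, shape = a, scale = b)
post  <- dgamma(lambda, shape = x + a, scale = b / (b + 1))
like  <- dpois(x, lambda)
like  <- like / sum(like * 0.1)          # normalize over the grid for plotting

plot(lambda, post, type = "l", xlab = expression(lambda), ylab = "density")
lines(lambda, prior, lty = 2)
lines(lambda, like, lty = 3)
legend("topright", c("posterior", "prior", "likelihood (scaled)"), lty = 1:3)

For task 2, the Monte Carlo recipe sketched after the Gamma posterior slide (draw $\lambda$ from the posterior, then a count from Poisson($\lambda$)) carries over directly.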
