
STA732

Statistical Inference
Lecture 09: Bayesian estimation

Yuansi Chen
Spring 2023
Duke University

https://www2.stat.duke.edu/courses/Spring23/sta732.01/

1
Recap from Lecture 08

1. Construct the minimum risk equivariant (MRE) estimator via conditioning on maximal invariant statistics
2. Pitman estimator of location
3. MRE for location is unbiased under squared error loss
4. MRE usually admissible

2
Where we are

• We have finished the first approach to arguing for “the best” estimator in point estimation: restricting to a small class of estimators
• Unbiased estimators
• Equivariant estimators
• We begin the second approach: global measure of optimality
• average risk
• minimax risk

3
Goal of Lecture 09

1. Bayes risk, Bayes estimator


2. Examples
3. Bayes estimators are usually biased
4. Bayes estimators are usually admissible

Chap. 7 in Keener or Chap. 4 in Lehmann and Casella

4
Bayes risk, Bayes estimator
Recall the components of a decision problem

• Data 𝑋
• Model family P = {𝑃𝜃 ∶ 𝜃 ∈ Ω}, a collection of probability
distributions on the sample space
• Loss function 𝐿: 𝐿(𝜃, 𝑑) measures the loss incurred by decision 𝑑 when the true parameter is 𝜃
• Risk function 𝑅, 𝑅(𝜃, 𝛿) = 𝔼𝜃 [𝐿(𝜃, 𝛿)]

5
The frequentist motivation of the Bayesian setup

Motivation
It is in general hard to find a uniformly minimum risk estimator; oftentimes, the risk functions of different estimators cross. This difficulty does not arise if performance is measured by a single number.

Def. Bayes risk


The Bayes risk is the average-case risk: the risk integrated w.r.t. some measure Λ on Ω, called the prior.

Remark
For now, assume Λ(Ω) = 1 (Λ is a probability measure). Later we may deal with improper priors.

6
Bayes risk

𝑅Bayes(Λ, 𝛿) = ∫_Ω 𝑅(𝜃, 𝛿) 𝑑Λ(𝜃) = 𝔼[𝑅(Θ, 𝛿)]

where Θ is the random variable with distribution Λ.

𝔼[𝑅(Θ, 𝛿)] = 𝔼[𝔼[𝐿(Θ, 𝛿(𝑋)) ∣ 𝑋]]

Both 𝑋 and Θ are considered random.


The frequentist understanding: average risk makes sense without believing the
parameter is random
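
A minimal numerical illustration of the average-risk idea (not from the slides; the normal model, the prior scale 𝜏, and the estimator 𝛿(𝑥) = 𝑥 are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: Theta ~ N(0, tau^2) prior, X | Theta = theta ~ N(theta, 1),
# squared error loss, and the estimator delta(x) = x.
tau = 2.0
n_draws = 200_000

theta = rng.normal(0.0, tau, size=n_draws)   # draws of Theta from the prior Lambda
x = rng.normal(loc=theta, scale=1.0)         # X | Theta = theta ~ P_theta
delta = x                                    # the estimator under evaluation
bayes_risk = np.mean((delta - theta) ** 2)   # Monte Carlo estimate of E[ R(Theta, delta) ]
print(bayes_risk)                            # about 1, since R(theta, delta) = 1 for every theta here
```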

7
Bayes estimator

An estimator 𝛿 which minimizes the average risk 𝑅Bayes(Λ, ⋅) is a Bayes estimator.

8
Construct Bayes estimator

Thm 7.1 in Keener


Suppose Θ ∼ Λ, 𝑋 ∣ Θ = 𝜃 ∼ 𝑃𝜃 , and 𝐿(𝜃, 𝑑) ≥ 0 for all 𝜃 ∈ Ω and
all 𝑑. If

• 𝔼[𝐿(Θ, 𝛿0 )] < ∞ for some 𝛿0


• for a.e. 𝑥, there exists a 𝛿Λ (𝑥) minimizing

𝔼[𝐿(Θ, 𝑑) ∣ 𝑋 = 𝑥]

with respect to 𝑑

Then 𝛿Λ is a Bayes estimator.

In words: the Bayes estimator can be found by minimizing the posterior expected loss 𝔼[𝐿(Θ, 𝑑) ∣ 𝑋 = 𝑥] over 𝑑, one 𝑥 at a time.

9
proof of Thm 7.1
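
A sketch of the standard argument (the slide left space for it): for any estimator 𝛿, smoothing gives

\[
\mathbb{E}\,L(\Theta, \delta(X)) = \mathbb{E}\big[\mathbb{E}[L(\Theta, \delta(X)) \mid X]\big]
\ge \mathbb{E}\big[\mathbb{E}[L(\Theta, \delta_\Lambda(X)) \mid X]\big]
= \mathbb{E}\,L(\Theta, \delta_\Lambda(X)),
\]

since 𝛿Λ(𝑥) minimizes the inner conditional expectation for a.e. 𝑥. The assumption that some 𝛿0 has finite Bayes risk ensures the Bayes risk of 𝛿Λ is finite as well, so 𝛿Λ attains the minimum.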

10
Posterior

Def. Posterior
The conditional distribution of Θ given 𝑋, written ℒ(Θ ∣ 𝑋), is called the posterior distribution.

Remark
• Λ is usually interpreted as prior belief about Θ before seeing
the data
• ℒ(Θ ∣ 𝑋) is the belief after seeing the data

11
Posterior calculation with densities

Suppose the prior has density 𝜆(𝜃) and the likelihood is 𝑝𝜃(𝑥). Then the posterior density is

𝜆(𝜃 ∣ 𝑥) = 𝜆(𝜃)𝑝𝜃(𝑥) / 𝑞(𝑥)

where 𝑞(𝑥) = ∫_Ω 𝜆(𝜃)𝑝𝜃(𝑥) 𝑑𝜃 is the marginal density of 𝑋.


Then the Bayes estimator has the form

𝛿Λ(𝑥) = arg min_𝑑 ∫_Ω 𝐿(𝜃, 𝑑)𝜆(𝜃 ∣ 𝑥) 𝑑𝜃
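
A small numerical sketch of this recipe (the Binomial/Beta model, the absolute error loss, and the grid below are illustrative assumptions, not from the slides):

```python
import numpy as np
from scipy import stats

# Illustrative model: X | Theta = theta ~ Binomial(n, theta), Theta ~ Beta(2, 2),
# absolute error loss L(theta, d) = |theta - d|; everything is discretized on a grid over (0, 1).
n, x = 10, 7
theta_grid = np.linspace(0.001, 0.999, 999)

prior = stats.beta.pdf(theta_grid, 2, 2)     # lambda(theta)
lik = stats.binom.pmf(x, n, theta_grid)      # p_theta(x)
post = prior * lik
post /= post.sum()                           # discretized posterior lambda(theta | x)

# Bayes estimate at this x: minimize the posterior expected loss over candidate decisions d
d_grid = theta_grid
exp_loss = np.array([np.sum(np.abs(theta_grid - d) * post) for d in d_grid])
delta_x = d_grid[np.argmin(exp_loss)]
print(delta_x)   # close to the posterior median of Beta(2 + x, 2 + n - x)
```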

12
Posterior mean is Bayes estimator for squared error loss

Suppose 𝐿(𝜃, 𝑑) = (𝑔(𝜃) − 𝑑)². Then the Bayes estimator is the posterior mean 𝔼[𝑔(Θ) ∣ 𝑋].
proof:
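
The proof space was left blank on the slide; a sketch of the standard argument, assuming 𝔼[𝑔(Θ)²] < ∞: for fixed 𝑥, the posterior expected loss is a quadratic in 𝑑,

\[
\mathbb{E}\big[(g(\Theta) - d)^2 \mid X = x\big]
= \mathbb{E}[g(\Theta)^2 \mid X = x] - 2d\,\mathbb{E}[g(\Theta) \mid X = x] + d^2,
\]

which is minimized at 𝑑 = 𝔼[𝑔(Θ) ∣ 𝑋 = 𝑥]. By Thm 7.1, minimizing the posterior expected loss one 𝑥 at a time yields a Bayes estimator, so the posterior mean is Bayes.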

13
Examples
Binomial model with Beta prior

Suppose 𝑋 ∣ Θ = 𝜃 ∼ Binomial(𝑛, 𝜃) with density (𝑛 choose 𝑥) 𝜃^𝑥 (1 − 𝜃)^(𝑛−𝑥), and
Θ ∼ Beta(𝛼, 𝛽) with density [Γ(𝛼 + 𝛽) / (Γ(𝛼)Γ(𝛽))] 𝜃^(𝛼−1) (1 − 𝜃)^(𝛽−1). Find the
Bayes estimator under squared error loss.
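
The worked solution is not in the extracted slides; a sketch of the standard conjugacy computation:

\[
\lambda(\theta \mid x) \propto \theta^{x}(1-\theta)^{n-x}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}
= \theta^{x+\alpha-1}(1-\theta)^{n-x+\beta-1},
\]

so Θ ∣ 𝑋 = 𝑥 ∼ Beta(𝑥 + 𝛼, 𝑛 − 𝑥 + 𝛽) and the Bayes estimator under squared error loss is the posterior mean

\[
\delta_\Lambda(x) = \frac{x + \alpha}{n + \alpha + \beta},
\]

a weighted average of the sample proportion 𝑥/𝑛 and the prior mean 𝛼/(𝛼 + 𝛽).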

14
Weighted squared error loss

Suppose 𝐿(𝜃, 𝑑) = 𝑤(𝜃)(𝑔(𝜃) − 𝑑)². Find a Bayes estimator.
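
A sketch of the standard answer (not shown on the extracted slide): for fixed 𝑥, differentiating 𝔼[𝑤(Θ)(𝑔(Θ) − 𝑑)² ∣ 𝑋 = 𝑥] in 𝑑 and setting the derivative to zero gives

\[
\delta_\Lambda(x) = \frac{\mathbb{E}[w(\Theta)\,g(\Theta) \mid X = x]}{\mathbb{E}[w(\Theta) \mid X = x]},
\]

a 𝑤-weighted posterior mean of 𝑔(Θ), assuming the relevant conditional moments are finite and 𝔼[𝑤(Θ) ∣ 𝑋 = 𝑥] > 0.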

15
Normal mean estimation

𝑋 ∣ Θ = 𝜃 ∼ 𝒩(𝜃, 𝜎²),
Θ ∼ 𝒩(𝜇, 𝜏²).
Find the Bayes estimator of the mean under squared error loss.
What if we have 𝑛 i.i.d. data points?
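
A sketch of the standard conjugate-normal answer (the worked solution is not in the extracted slides):

\[
\Theta \mid X = x \sim \mathcal{N}\!\left(\frac{\tau^2 x + \sigma^2 \mu}{\tau^2 + \sigma^2},\; \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}\right),
\]

so the Bayes estimator under squared error loss is the posterior mean, a precision-weighted average of the observation 𝑥 and the prior mean 𝜇. With 𝑛 i.i.d. observations, the sample mean is sufficient and the same formula applies with 𝑥 replaced by the sample mean and 𝜎² by 𝜎²/𝑛.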

16
Binary classification

Suppose the parameter space Ω = {0, 1}.


ℙ(𝑋 = 𝑥 ∣ Θ = 0) = 𝑓0 (𝑥) and ℙ(𝑋 = 𝑥 ∣ Θ = 1) = 𝑓1 (𝑥). The
prior is 𝜋(1) = 𝑝, 𝜋(0) = 1 − 𝑝.

Determine a Bayes estimator under the 0-1 loss 𝐿(𝜃, 𝑑) = 1{𝑑 ≠ 𝜃}, i.e. 𝐿(𝜃, 𝑑) = 0 if 𝑑 = 𝜃 and 𝐿(𝜃, 𝑑) = 1 if 𝑑 ≠ 𝜃.
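
A sketch of the standard answer (not on the extracted slide): under 0-1 loss, the posterior expected loss of deciding 𝑑 is ℙ(Θ ≠ 𝑑 ∣ 𝑋 = 𝑥), so a Bayes rule picks the value with the larger posterior probability:

\[
\delta_\Lambda(x) =
\begin{cases}
1 & \text{if } p\, f_1(x) > (1 - p)\, f_0(x), \\
0 & \text{if } p\, f_1(x) < (1 - p)\, f_0(x),
\end{cases}
\]

with ties broken arbitrarily; equivalently, decide 1 when the likelihood ratio 𝑓1(𝑥)/𝑓0(𝑥) exceeds (1 − 𝑝)/𝑝.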

17
Bayes estimators are usually biased
Unbiased estimator under squared error loss is not Bayes

Thm Lehmann Casella 4.2.3


If 𝛿 is unbiased for 𝑔(𝜃) with 𝑅Bayes(Λ, 𝛿) < ∞, then 𝛿 is not Bayes under squared error loss unless its average risk is zero:

𝔼[(𝛿(𝑋) − 𝑔(Θ))²] = 0

18
proof:
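
The proof space was left blank; a sketch of the standard argument: suppose 𝛿 is unbiased for 𝑔(𝜃) and also Bayes under squared error loss, so 𝛿(𝑋) = 𝔼[𝑔(Θ) ∣ 𝑋] a.s. Computing the cross moment two ways by conditioning,

\[
\mathbb{E}[\delta(X)\, g(\Theta)] = \mathbb{E}\big[g(\Theta)\,\mathbb{E}[\delta(X) \mid \Theta]\big] = \mathbb{E}[g(\Theta)^2]
\quad \text{(unbiasedness)},
\]
\[
\mathbb{E}[\delta(X)\, g(\Theta)] = \mathbb{E}\big[\delta(X)\,\mathbb{E}[g(\Theta) \mid X]\big] = \mathbb{E}[\delta(X)^2]
\quad \text{(Bayes)},
\]

hence 𝔼[(𝛿(𝑋) − 𝑔(Θ))²] = 𝔼[𝛿(𝑋)²] − 2𝔼[𝛿(𝑋)𝑔(Θ)] + 𝔼[𝑔(Θ)²] = 0, i.e. the average risk is zero.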

19
Bayes estimators are usually admissible
Uniqueness of Bayes estimator under strictly convex loss

Thm. Lehmann Casella 4.1.4


Let 𝑄 be the marginal distribution of 𝑋, i.e.,
𝑄(𝐸) = ∫_Ω 𝑃𝜃(𝐸) 𝑑Λ(𝜃). Suppose 𝐿 is strictly convex. If

1. 𝑅Bayes (Λ, 𝛿Λ ) < ∞,


2. 𝑄(𝐸) = 0 implies 𝑃𝜃 (𝐸) = 0, ∀𝜃,

then the Bayes estimator 𝛿Λ is unique (a.e. with respect to 𝑃𝜃 for all
𝜃).

20
proof: Use the following lemma
Lem. Lehmann Casella exercise 1.7.26
Let 𝜙 be a strictly convex function over an interval 𝐼. If there exists a
value 𝑎0 ∈ 𝐼 minimizing 𝜙, then 𝑎0 is unique.
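
A sketch of the lemma's proof (filled in here; not from the extracted slides): if 𝑎0 ≠ 𝑎1 were two minimizers, strict convexity would give

\[
\phi\!\left(\tfrac{a_0 + a_1}{2}\right) < \tfrac{1}{2}\phi(a_0) + \tfrac{1}{2}\phi(a_1) = \min_{a \in I} \phi(a),
\]

a contradiction. Applying the lemma to 𝜙(𝑑) = 𝔼[𝐿(Θ, 𝑑) ∣ 𝑋 = 𝑥], which inherits strict convexity from 𝐿, shows that the minimizer 𝛿Λ(𝑥) is unique for 𝑄-a.e. 𝑥; condition 2 then upgrades uniqueness from a.e. 𝑄 to a.e. 𝑃𝜃 for every 𝜃.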

21
A unique Bayes estimator is admissible

Thm. Lehmann Casella 5.2.4


A unique Bayes estimator (a.s. for all 𝑃𝜃 ) is admissible.

22
proof:
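
The proof space was left blank; a sketch of the standard argument: suppose 𝛿Λ is a unique Bayes estimator but inadmissible, so some 𝛿′ satisfies 𝑅(𝜃, 𝛿′) ≤ 𝑅(𝜃, 𝛿Λ) for all 𝜃, with strict inequality for some 𝜃. Integrating against Λ,

\[
R_{\mathrm{Bayes}}(\Lambda, \delta') = \int_\Omega R(\theta, \delta')\, d\Lambda(\theta)
\le \int_\Omega R(\theta, \delta_\Lambda)\, d\Lambda(\theta) = R_{\mathrm{Bayes}}(\Lambda, \delta_\Lambda),
\]

so 𝛿′ is also Bayes. Uniqueness forces 𝛿′ = 𝛿Λ a.s. under every 𝑃𝜃, hence 𝑅(𝜃, 𝛿′) = 𝑅(𝜃, 𝛿Λ) for all 𝜃, contradicting the strict improvement.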

23
Summary

• The Bayes estimator is defined as the minimizer of the average risk over a prior on 𝜃
• The Bayes estimator can be constructed by minimizing the posterior expected loss for each 𝑥
• Bayes estimators are usually biased under squared error loss
• Bayes estimators are usually admissible under strictly convex loss

24
What is next?

• Where do priors come from?


• Pros and cons of Bayes

25
Thank you

26