
STA732

Statistical Inference
Lecture 05: Rao-Blackwell Theorem

Yuansi Chen
Spring 2023
Duke University

https://www2.stat.duke.edu/courses/Spring23/sta732.01/

Recap from Lecture 04

• 𝑉 is ancillary if its distribution does not depend on 𝜃


• Completeness + sufficiency is the ideal notion of optimal data
compression. To prove completeness, one usually argues from the
definition or by identifying an exponential family.
• Basu’s theorem is useful for proving independence between a
complete sufficient statistic and an ancillary statistic.

Goal of Lecture 05

1. Convex loss
2. Rao-Blackwell Theorem
3. Uniformly minimum variance unbiased estimator (UMVU)

Chap. 3.6, 4.1-4.2 in Keener or Chap. 1.7, 2.1 in Lehmann and Casella

We now take the first approach to arguing for “the best”
estimator in point estimation: restricting attention to a smaller
class of estimators!

Convex loss
Definition. Convex set

A set 𝒞 ⊆ ℝ𝑝 is convex if, given any two points 𝑥, 𝑦 ∈ 𝒞 and any
𝜆 ∈ [0, 1], we have

𝜆𝑥 + (1 − 𝜆)𝑦 ∈ 𝒞.

Definition. Convex function

A real-valued function 𝑓 defined on a convex set 𝒞 ⊆ ℝ𝑝 is a convex
function if for any two points 𝑥, 𝑦 ∈ 𝒞 and any 𝜆 ∈ [0, 1], we have

𝑓(𝜆𝑥 + (1 − 𝜆)𝑦) ≤ 𝜆𝑓(𝑥) + (1 − 𝜆)𝑓(𝑦).

It is called strictly convex if the above inequality holds strictly for
𝑥 ≠ 𝑦 and 𝜆 ∈ (0, 1).

Jensen’s inequality in finite form

Jensen’s inequality in finite form


For a convex function 𝑓, points 𝑥1 , … , 𝑥𝑛 in its domain, and positive
weights 𝛼1 , … , 𝛼𝑛 with 𝛼1 + ⋯ + 𝛼𝑛 = 1, we have

𝑓(𝛼1 𝑥1 + ⋯ + 𝛼𝑛 𝑥𝑛 ) ≤ 𝛼1 𝑓(𝑥1 ) + ⋯ + 𝛼𝑛 𝑓(𝑥𝑛 ).

Proof: by induction, omitted.
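
A sketch of the omitted induction step (not spelled out on the slide): assume
the claim for 𝑛 points and take weights 𝛼1 , … , 𝛼𝑛+1 with 𝛼𝑛+1 < 1 (the case
𝛼𝑛+1 = 1 is trivial). Writing 𝑦 = ∑ 𝛼𝑖 𝑥𝑖 /(1 − 𝛼𝑛+1 ) with the sum over 𝑖 ≤ 𝑛,
convexity gives

𝑓(𝛼𝑛+1 𝑥𝑛+1 + (1 − 𝛼𝑛+1 )𝑦) ≤ 𝛼𝑛+1 𝑓(𝑥𝑛+1 ) + (1 − 𝛼𝑛+1 )𝑓(𝑦),

and applying the inductive hypothesis to 𝑓(𝑦), whose weights 𝛼𝑖 /(1 − 𝛼𝑛+1 )
sum to 1, completes the step.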

Jensen’s inequality in a probabilistic setting

Jensen’s inequality in a probabilistic setting


𝑋 is an integrable real-valued random variable and 𝑓 is convex. Then

𝑓(𝔼[𝑋]) ≤ 𝔼[𝑓(𝑋)].

If 𝑓 is strictly convex, the inequality holds strictly unless 𝑋 is almost
surely constant.

Proof: see Thm 3.25 and Remark 3.26 in Keener, or Wikipedia.

Examples of convex functions

• 𝑥 ↦ 1/𝑥 is strictly convex on (0, ∞). Then for 𝑋 > 0, we have

  1/𝔼[𝑋] ≤ 𝔼[1/𝑋].

• 𝑥 ↦ − log(𝑥) is strictly convex on (0, ∞). Then for 𝑋 > 0, we have

  log(𝔼[𝑋]) ≥ 𝔼[log(𝑋)].
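
A quick numerical check of the first bullet (a sketch, not from the slides;
the Uniform(0.5, 1.5) distribution is an arbitrary choice that keeps both
sides finite):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 1.5, size=1_000_000)  # a positive X with E[X] = 1

# f(x) = 1/x is strictly convex on (0, inf), so f(E[X]) <= E[f(X)]:
print(1.0 / x.mean())    # f(E[X])  ~= 1.0
print((1.0 / x).mean())  # E[f(X)]  ~= log(3) ~= 1.0986, strictly larger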

Convex loss penalizes extra noise to an estimator

Proposition
Suppose the loss 𝐿(𝜃, 𝑑) is convex in 𝑑. Let 𝛿(𝑋) be an estimate of
𝜃. Define 𝛿̃(𝑋) = 𝛿(𝑋) + 𝜖, where 𝜖 is a zero-mean random variable
independent of 𝑋. Then

𝑅(𝜃, 𝛿̃) ≥ 𝑅(𝜃, 𝛿),

where the risk 𝑅(𝜃, 𝛿) = 𝔼𝜃 [𝐿(𝜃, 𝛿(𝑋))].

Proof idea: tower property + Jensen’s inequality.
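
Filling in the one-line argument (a sketch along the stated proof idea):

𝑅(𝜃, 𝛿̃) = 𝔼𝜃 [𝐿(𝜃, 𝛿(𝑋) + 𝜖)] = 𝔼𝜃 [ 𝔼[𝐿(𝜃, 𝛿(𝑋) + 𝜖) ∣ 𝑋] ]
       ≥ 𝔼𝜃 [𝐿(𝜃, 𝔼[𝛿(𝑋) + 𝜖 ∣ 𝑋])] = 𝔼𝜃 [𝐿(𝜃, 𝛿(𝑋))] = 𝑅(𝜃, 𝛿),

where the inequality is conditional Jensen applied to 𝑑 ↦ 𝐿(𝜃, 𝑑), and the
last equality uses 𝔼[𝜖 ∣ 𝑋] = 𝔼[𝜖] = 0 by independence.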

Rao-Blackwell Theorem
Rao-Blackwell Theorem

Thm 3.28 in Keener

Let 𝑇 be a sufficient statistic for P = {𝑃𝜃 ∶ 𝜃 ∈ Ω} and let 𝛿 be an
estimator of 𝑔(𝜃). Define 𝜂(𝑇 ) = 𝔼[𝛿(𝑋) ∣ 𝑇 ] (free of 𝜃 by
sufficiency). If 𝐿(𝜃, ⋅) is convex, then

𝑅(𝜃, 𝜂) ≤ 𝑅(𝜃, 𝛿),

where the risk 𝑅(𝜃, 𝛿) = 𝔼𝜃 [𝐿(𝜃, 𝛿(𝑋))].

Furthermore, if 𝐿(𝜃, ⋅) is strictly convex, the inequality is strict
unless 𝛿(𝑋) = 𝜂(𝑇 ) a.s.

Interpretation

For convex loss functions,

1. If an estimator is not a function of the sufficient statistic 𝑇
alone, we can improve it (weakly; strictly under strictly convex loss).
2. The step of constructing 𝜂(𝑇 ) = 𝔼[𝛿(𝑋) ∣ 𝑇 ] from 𝛿 is called
Rao-Blackwellization (see the simulation sketch below).
3. When discussing optimal estimators, the only estimators of
𝑔(𝜃) worth considering are functions of the sufficient
statistic 𝑇 .
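
A minimal simulation sketch of Rao-Blackwellization (not from the slides; the
Poisson model and the estimand 𝑔(𝜃) = e^{−𝜃} = 𝑃𝜃 (𝑋1 = 0) are choices made
here purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
n, theta, reps = 10, 2.0, 200_000

x = rng.poisson(theta, size=(reps, n))
t = x.sum(axis=1)                      # T = sum of X_i, sufficient (and complete)

delta = (x[:, 0] == 0).astype(float)   # crude unbiased estimator 1{X_1 = 0}
eta = ((n - 1) / n) ** t               # E[delta | T] = ((n-1)/n)^T, since X_1 | T = t ~ Bin(t, 1/n)

print(delta.mean(), eta.mean(), np.exp(-theta))  # all three ~= 0.1353: both estimators unbiased
print(delta.var(), eta.var())                    # eta's variance is an order of magnitude smaller

Both estimators are unbiased for e^{−𝜃}, but the Rao-Blackwellized version has
far smaller variance, as the theorem guarantees.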

Proof of Rao-Blackwell Theorem

See Keener Thm 3.28: apply the conditional version of Jensen’s inequality.
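
A sketch (along the lines of Keener’s proof):

𝑅(𝜃, 𝜂) = 𝔼𝜃 [𝐿(𝜃, 𝔼[𝛿(𝑋) ∣ 𝑇 ])] ≤ 𝔼𝜃 [ 𝔼[𝐿(𝜃, 𝛿(𝑋)) ∣ 𝑇 ] ] = 𝔼𝜃 [𝐿(𝜃, 𝛿(𝑋))] = 𝑅(𝜃, 𝛿),

using conditional Jensen for the inequality and the tower property for the
second equality; sufficiency of 𝑇 ensures 𝜂(𝑇 ) does not depend on 𝜃, so it
is a genuine estimator. The strict-convexity claim follows from the strict
case of conditional Jensen.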

UMVU
Bias

• The bias of an estimate 𝛿(𝑋) of 𝑔(𝜃) is 𝔼𝜃 [𝛿(𝑋)] − 𝑔(𝜃)


• We say an estimator 𝛿 is unbiased for 𝑔(𝜃) if

𝔼𝜃 [𝛿(𝑋)] = 𝑔(𝜃), ∀𝜃 ∈ Ω.

Ex: what is an unbiased estimator of 𝜃 for 𝑋 drawn from a uniform
distribution on (0, 𝜃)?
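
(A worked answer, in case it is wanted: since 𝔼𝜃 [𝑋] = 𝜃/2, the estimator
𝛿(𝑋) = 2𝑋 satisfies 𝔼𝜃 [𝛿(𝑋)] = 𝜃 for all 𝜃 > 0, hence is unbiased.)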

Bias-variance decomposition under squared error loss

Squared error loss:

𝐿(𝜃, 𝑑) = (𝑑 − 𝑔(𝜃))2

Risk decomposition under squared error loss

Risk becomes the mean squared error 𝑅(𝜃, 𝛿) = 𝔼𝜃 (𝛿(𝑋) − 𝑔(𝜃))².

𝔼𝜃 (𝛿(𝑋) − 𝑔(𝜃))²
= 𝔼𝜃 (𝛿(𝑋) − 𝔼𝜃 [𝛿] + 𝔼𝜃 [𝛿] − 𝑔(𝜃))²
= 𝔼𝜃 (𝛿(𝑋) − 𝔼𝜃 [𝛿])² + (𝔼𝜃 [𝛿] − 𝑔(𝜃))² + 2 𝔼𝜃 [(𝛿 − 𝔼𝜃 [𝛿])(𝔼𝜃 [𝛿] − 𝑔(𝜃))]
= Var𝜃 (𝛿) + Bias(𝛿)² + 0,

where the cross term vanishes because 𝔼𝜃 [𝛿 − 𝔼𝜃 [𝛿]] = 0 and
𝔼𝜃 [𝛿] − 𝑔(𝜃) is a constant.

UMVU

Logic: by the bias-variance decomposition under squared error loss,
if we restrict attention to unbiased estimators, then comparing
variance is equivalent to comparing risk.
Def. UMVU
An unbiased estimator 𝛿 is uniformly minimum variance unbiased
(UMVU) if

Var𝜃 (𝛿) ≤ Var𝜃 (𝛿̃), ∀𝜃 ∈ Ω,

for any competing unbiased estimator 𝛿̃.

Does UMVU always exist?

No! Even unbiased estimators might not exist.

Ex: estimate 1/𝜃² for 𝑋 drawn from Uniform(0, 𝜃)
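
Why no unbiased estimator exists here (a short argument, not spelled out on
the slide): unbiasedness would require (1/𝜃) ∫₀^𝜃 𝛿(𝑥) d𝑥 = 1/𝜃², i.e.
∫₀^𝜃 𝛿(𝑥) d𝑥 = 1/𝜃 for all 𝜃 > 0; but the left side tends to 0 as 𝜃 → 0
(by integrability of 𝛿), while the right side blows up.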

Def. U-estimable
We say 𝑔(𝜃) is U-estimable if there exists 𝛿 such that
𝔼𝜃 𝛿 = 𝑔(𝜃), ∀𝜃 ∈ Ω

Does a UMVU estimator exist under the U-estimability assumption?

UMVU under U-estimable and given complete sufficient statistics

Theorem 4.4 in Keener, Lehmann-Scheffé


Suppose 𝑇 (𝑋) is complete sufficient for P = {𝑃𝜃 ∶ 𝜃 ∈ Ω}. For
any U-estimable 𝑔(𝜃), there is a unique (up to a.s. equality) UMVU
estimator, which is based on 𝑇 .

Proof of Thm 4.4

• Existence
• Uniqueness
• UMVU
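
A sketch of the three steps (paraphrasing Keener):

• Existence: take any unbiased 𝛿 (U-estimability) and set 𝜂(𝑇 ) = 𝔼[𝛿(𝑋) ∣ 𝑇 ];
by the tower property 𝜂 is still unbiased, and it is based on 𝑇 .
• Uniqueness: if 𝜂1 (𝑇 ) and 𝜂2 (𝑇 ) are both unbiased, then
𝔼𝜃 [𝜂1 (𝑇 ) − 𝜂2 (𝑇 )] = 0 for all 𝜃, so completeness gives 𝜂1 (𝑇 ) = 𝜂2 (𝑇 ) a.s.
• UMVU: for any competing unbiased 𝛿̃, Rao-Blackwell gives
Var𝜃 (𝔼[𝛿̃ ∣ 𝑇 ]) ≤ Var𝜃 (𝛿̃), and 𝔼[𝛿̃ ∣ 𝑇 ] = 𝜂(𝑇 ) a.s. by uniqueness, so
Var𝜃 (𝜂) ≤ Var𝜃 (𝛿̃) for every 𝜃.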

Extension to convex loss

Extension of Thm 4.4 to convex loss


Suppose 𝑇 (𝑋) is complete sufficient for P = {𝑃𝜃 ∶ 𝜃 ∈ Ω}.
Under a strictly convex loss, among all unbiased estimators, there
is a unique (up to a.s. equality) uniformly minimum risk unbiased
estimator, which is based on 𝑇 .

Strategies for finding UMVU estimators

Two strategies for finding UMVU estimators:


• Directly find an unbiased estimator based on a complete
sufficient statistic 𝑇 .
• Find any unbiased estimator, then Rao-Blackwellize it.

Example 1

𝑋1 , … , 𝑋𝑛 i.i.d. ∼ Poisson(𝜃), 𝜃 > 0.

• Find a UMVU estimator for 𝜃
• Find a UMVU estimator for 𝜃²
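
A worked sketch (the slides leave the derivation for lecture): 𝑇 = ∑ 𝑋𝑖 ∼
Poisson(𝑛𝜃) is complete sufficient. Since 𝔼𝜃 [𝑋̄] = 𝜃, the estimator
𝑋̄ = 𝑇 /𝑛 is unbiased, based on 𝑇 , and hence UMVU for 𝜃. For 𝜃², note
𝔼𝜃 [𝑇 (𝑇 − 1)] = (𝑛𝜃)², so 𝑇 (𝑇 − 1)/𝑛² is UMVU for 𝜃².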

Example 2

𝑋1 , … , 𝑋𝑛 i.i.d. ∼ Unif(0, 𝜃), 𝜃 > 0.

• Find a UMVU estimator for 𝜃 in two ways
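
A worked sketch (following the standard treatment): 𝑇 = 𝑋(𝑛) = max𝑖 𝑋𝑖 is
complete sufficient, with 𝔼𝜃 [𝑋(𝑛) ] = 𝑛𝜃/(𝑛 + 1). Way 1: correct the bias
directly, giving the UMVU estimator (𝑛 + 1)𝑋(𝑛) /𝑛. Way 2: start from the
unbiased 𝛿 = 2𝑋̄ and Rao-Blackwellize; 𝔼[2𝑋̄ ∣ 𝑋(𝑛) ] works out to the same
(𝑛 + 1)𝑋(𝑛) /𝑛, as the uniqueness in Lehmann-Scheffé forces.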

In Example 2, is the UMVU estimator also a “good” (admissible)
estimator in terms of total risk?
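
A hint (a standard fact, not stated on the slide): among estimators of the
form 𝑐𝑋(𝑛) , the squared-error risk is minimized at 𝑐 = (𝑛 + 2)/(𝑛 + 1), not
at the unbiased choice 𝑐 = (𝑛 + 1)/𝑛; the former strictly dominates the
latter for every 𝜃, so the UMVU estimator is inadmissible under squared
error loss.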

Summary

• Jensen’s inequality for convex functions. Convex loss allows us
to rule out estimators carrying extra noise.
• The Rao-Blackwell theorem lets us improve an estimator by
conditioning on a sufficient statistic 𝑇 .
• If an unbiased estimator exists and a complete sufficient
statistic 𝑇 exists, then the UMVU estimator exists and is unique
(up to a.s. equality).

What is next?

• Reflection on unbiasedness


• Information inequality

Thank you
