7 Minimax Estimation

Previously we looked at Bayes estimation, where the overall measure of the estimation error is a weighted average over all values of the parameter space with respect to a positive weight function, a prior. Now we instead use the maximum risk as the relevant measure of the estimation error,

    sup_θ R(θ, δ),

and define the minimax estimator, if it exists, as the estimator that minimizes this risk, i.e.

    δ̂ = arg inf_δ sup_θ R(θ, δ),

where the infimum is taken over all estimators, i.e. all functions of the data.


To relate the minimax estimator to the Bayes estimator, the Bayes risk

    r_Λ = ∫ R(θ, δ_Λ) dΛ(θ),

where δ_Λ is the Bayes estimator with respect to Λ, is of interest.

Definition 16 The prior distribution Λ is called least favorable for estimating g(θ) if r_Λ ≥ r_{Λ'} for every other distribution Λ' on Ω.
When the Bayes risk of a Bayes estimator equals its maximum risk, the estimator is minimax:

Theorem 11 Assume Λ is a prior distribution and assume the Bayes estimator δ_Λ satisfies

    ∫ R(θ, δ_Λ) dΛ(θ) = sup_θ R(θ, δ_Λ).

Then

(i) δ_Λ is minimax.
(ii) If δ_Λ is the unique Bayes estimator, it is the unique minimax estimator.
(iii) Λ is least favorable.

Proof. Let δ ≠ δ_Λ be another estimator. Then

    sup_θ R(θ, δ) ≥ ∫ R(θ, δ) dΛ(θ) ≥ ∫ R(θ, δ_Λ) dΛ(θ) = sup_θ R(θ, δ_Λ),

which proves that δ_Λ is minimax.


To prove (ii), uniqueness of the Bayes estimator means that no other estimator
has the same Bayes risk, so > replaces in the second inequality above, and thus

49
no other estimator has the same maximum risk, which proves the uniqueness of the
minimax estimator.
To show (iii), let Λ' ≠ Λ be a distribution on Ω, and let δ_Λ, δ_{Λ'} be the corresponding Bayes estimators. Then the Bayes risks are related by

    r_{Λ'} = ∫ R(θ, δ_{Λ'}) dΛ'(θ) ≤ ∫ R(θ, δ_Λ) dΛ'(θ) ≤ sup_θ R(θ, δ_Λ) = r_Λ,

where the first inequality follows since δ_{Λ'} is the Bayes estimator corresponding to Λ'. We have proven that Λ is least favorable. □

Note 1 The previous result says minimax estimators are Bayes estimators with re-
spect to the least favorable prior.

Two simple cases when the Bayes estimator’s Bayes risk attains the minimax risk are
given in the following.

Corollary 3 Assume the Bayes estimator δ_Λ has constant risk, i.e. R(θ, δ_Λ) does not depend on θ. Then δ_Λ is minimax.

Proof. If the risk is constant, the supremum over θ equals the average over θ, so the Bayes risk equals the maximum risk and the result follows from the previous theorem. □

Corollary 4 Let δ_Λ be the Bayes estimator for Λ, and define

    Ω_Λ = {θ ∈ Ω : R(θ, δ_Λ) = sup_{θ'} R(θ', δ_Λ)}.

Then if Λ(Ω_Λ) = 1, δ_Λ is minimax.

Proof. The condition Λ(Ω_Λ) = 1 means that the risk of the Bayes estimator equals its supremum Λ-almost surely, so the Bayes risk ∫ R(θ, δ_Λ) dΛ(θ) equals the maximum risk and Theorem 11 applies; since the Bayes estimator is only determined modulo Λ-null sets, this is enough. □

Example 32 Let X ∈ Bin(n, p). Let Λ = B(a, b) be a Beta prior distribution for p. As previously established, via the conditional distribution of p given X (which is again a Beta distribution), the Bayes estimator of p is

    δ_Λ(x) = (a + x) / (a + b + n),

with risk function, for quadratic loss (writing q = 1 − p),

    R(p, δ_Λ) = Var(δ_Λ) + (E(δ_Λ) − p)²
              = [1/(a + b + n)²] npq + [(a + np − p(a + b + n))/(a + b + n)]²
              = [npq + (aq − bp)²] / (a + b + n)²
              = [a² + (n − 2a(a + b))p + ((a + b)² − n)p²] / (a + b + n)².
Since this function is a quadratic in p, it is constant as a function of p if and only if the coefficients of p and p² are zero, i.e. iff

    (a + b)² = n,    2a(a + b) = n,

which has the solution a = b = √n/2.
Thus

    δ_Λ(x) = (x + √n/2) / (n + √n)

is a constant risk Bayes estimator and therefore minimax, with least favorable distribution Λ = B(√n/2, √n/2). The Bayes estimator is unique, and therefore it is the unique minimax estimator of p.
Now let Λ be an arbitrary distribution on [0, 1]. Then the Bayes estimator of p is δ_Λ(x) = E(p | x), i.e.

    δ_Λ(x) = ∫_0^1 p dΛ(p | x) = (1/f(x)) ∫_0^1 p f(x | p) dΛ(p)
           = [∫_0^1 p · p^x (1 − p)^{n−x} dΛ(p)] / [∫_0^1 p^x (1 − p)^{n−x} dΛ(p)].

Using the power expansion

    (1 − p)^{n−x} = 1 + a_1 p + … + a_{n−x} p^{n−x},

this becomes

    δ_Λ(x) = [∫_0^1 (p^{x+1} + a_1 p^{x+2} + … + a_{n−x} p^{n+1}) dΛ(p)] / [∫_0^1 (p^x + a_1 p^{x+1} + … + a_{n−x} p^n) dΛ(p)],

which shows that the Bayes estimator depends on the distribution Λ only through its first n + 1 moments. Therefore the least favorable distribution is not unique for estimating p in a Bin(n, p) distribution: two priors with the same first n + 1 moments give the same Bayes estimator. □
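The constant-risk property above is easy to check numerically. The following is a minimal sketch (not part of the original notes; the value n = 20 is an arbitrary choice) comparing the risk of the minimax estimator with that of the unbiased estimator X/n:

    # Numerical sketch: the Bayes estimator under the B(sqrt(n)/2, sqrt(n)/2) prior
    # has constant quadratic risk in p, while the unbiased estimator X/n does not.
    import numpy as np

    n = 20                      # arbitrary sample size for the illustration
    a = b = np.sqrt(n) / 2      # least favorable Beta parameters

    def risk_bayes(p):
        # R(p, delta_Lambda) = [n p q + (a q - b p)^2] / (a + b + n)^2
        q = 1 - p
        return (n * p * q + (a * q - b * p) ** 2) / (a + b + n) ** 2

    def risk_unbiased(p):
        # R(p, X/n) = p (1 - p) / n
        return p * (1 - p) / n

    ps = np.linspace(0.05, 0.95, 7)
    print(np.round(risk_bayes(ps), 5))     # constant in p
    print(np.round(risk_unbiased(ps), 5))  # varies, maximal at p = 1/2

The first row is constant, equal to n/(4(√n + n)²) = 1/(4(1 + √n)²), which is smaller than the maximum risk 1/(4n) of X/n.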

Recall that when the loss function is convex, for any randomized estimator there is a nonrandomized estimator with risk at least as small, so there is no need to consider randomized estimators.
The relation established between the minimax estimator and the Bayes estimator, with the prior Λ obtained as a least favorable distribution, is valid when Λ is a proper prior. What happens when Λ is not proper? Sometimes the estimation problem at hand makes it natural to consider such an improper prior: one such situation is estimating the mean of a Normal distribution with known variance, when the mean is unrestricted, i.e. any real number. Then one could believe that the least favorable distribution is Lebesgue measure on R.

To model this, assume Λ is a fixed (improper) prior and let {Λ_n} be a sequence of (proper) priors that in some sense approximates Λ:
Definition 17 Let {Λ_n} be a sequence of priors and let δ_n be the Bayes estimator corresponding to Λ_n, with Bayes risk

    r_n = ∫ R(θ, δ_n) dΛ_n(θ).

Define r = lim_{n→∞} r_n (assuming the limit exists). If r_{Λ'} ≤ r for every (proper) prior distribution Λ', then the sequence {Λ_n} is called least favorable.
Now if δ is an estimator whose maximum risk attains this limiting Bayes risk, then δ is minimax and the sequence is least favorable:

Theorem 12 Let {Λ_n} be a sequence of priors and let r = lim_{n→∞} r_n. Assume the estimator δ satisfies

    r = sup_θ R(θ, δ).

Then δ is minimax and the sequence {Λ_n} is least favorable.


Proof. To prove the minimaxity: let δ' be any other estimator. Then

    sup_θ R(θ, δ') ≥ ∫ R(θ, δ') dΛ_n(θ) ≥ r_n,

for every n. Since the right hand side converges to r = sup_θ R(θ, δ), this implies that

    sup_θ R(θ, δ') ≥ sup_θ R(θ, δ),

and thus δ is minimax.


To prove that {Λ_n} is least favorable: let Λ' be any (proper) prior distribution and let δ_{Λ'} be the corresponding Bayes estimator. Then

    r_{Λ'} = ∫ R(θ, δ_{Λ'}) dΛ'(θ) ≤ ∫ R(θ, δ) dΛ'(θ) ≤ sup_θ R(θ, δ) = r,

and thus {Λ_n} is least favorable. □

Note 2 Uniqueness of the Bayes estimators δ_n does not imply uniqueness of the minimax estimator: for a competing estimator δ', the strict inequality r_n = ∫ R(θ, δ_n) dΛ_n(θ) < ∫ R(θ, δ') dΛ_n(θ) only turns into the weak inequality r ≤ sup_θ R(θ, δ') under the limit operation, so δ' may still have the same maximum risk as δ.

To calculate the Bayes risk r_n for δ_n, the following is useful.


Lemma 9 Let δ_Λ be the Bayes estimator of g(θ) corresponding to Λ, and assume quadratic loss. Then the Bayes risk is given by

    r_Λ = ∫ Var(g(Θ) | x) dP(x),

where P is the marginal distribution of the data.

Proof. For quadratic loss the Bayes estimator is δ_Λ(x) = E(g(Θ) | x), and thus the Bayes risk is

    r_Λ = ∫ R(θ, δ_Λ) dΛ(θ) = … (Fubini's theorem) …
        = ∫ E([δ_Λ(x) − g(Θ)]² | x) dP(x)
        = ∫ E([E(g(Θ) | x) − g(Θ)]² | x) dP(x)
        = ∫ Var(g(Θ) | x) dP(x).    □

Note 3 If the conditional variance Var(g(Θ) | x) does not depend on x, the Bayes risk equals this common value of Var(g(Θ) | x).
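As a concrete check of Lemma 9, here is a small numerical sketch (not part of the original notes; the values n = 10, a = 2, b = 3 are arbitrary) for the Beta–Binomial model of Example 32, computing the Bayes risk both as the prior-averaged risk and as the marginal-averaged posterior variance:

    # Sketch: verify r_Lambda = int Var(p | x) dP(x) in the Beta-Binomial model.
    import numpy as np
    from scipy.stats import beta, betabinom
    from scipy.integrate import quad

    n, a, b = 10, 2.0, 3.0   # arbitrary illustration values

    def risk(p):
        # quadratic risk of the Bayes estimator delta(x) = (a + x)/(a + b + n)
        q = 1 - p
        return (n * p * q + (a * q - b * p) ** 2) / (a + b + n) ** 2

    # (i) Bayes risk as the prior-weighted average of the risk function
    r1, _ = quad(lambda p: risk(p) * beta.pdf(p, a, b), 0, 1)

    # (ii) Bayes risk as the expected posterior variance under the marginal of X,
    #      which is BetaBinomial(n, a, b); the posterior is B(a + x, b + n - x)
    xs = np.arange(n + 1)
    post_var = (a + xs) * (b + n - xs) / ((a + b + n) ** 2 * (a + b + n + 1))
    r2 = np.sum(post_var * betabinom.pmf(xs, n, a, b))

    print(r1, r2)   # the two values agree up to numerical error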
Example 33 Let X = (X_1, …, X_n) be an i.i.d. sample with X_1 ∈ N(θ, σ²). We want to estimate g(θ) = θ under quadratic loss, and assume σ² is known. Then δ = X̄ is minimax. To prove this we will find a sequence of Bayes estimators δ_n whose Bayes risks satisfy r_n → σ²/n = sup_θ R(θ, δ); the last equality holds since the risk of X̄ equals σ²/n and does not depend on θ.

Let θ be distributed according to the prior Λ = N(µ, b²). Then the Bayes estimator is

    δ_Λ(x) = (n x̄/σ² + µ/b²) / (n/σ² + 1/b²),

with posterior variance

    Var(Θ | x) = 1 / (n/σ² + 1/b²),

which, since it does not depend on x, equals the Bayes risk (Note 3):

    r_Λ = 1 / (n/σ² + 1/b²).

If b → ∞, then r_Λ = r_{Λ_b} → σ²/n. Thus δ = X̄ is minimax. □
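A minimal numerical sketch of this limit (not from the original notes; the values of n, σ² and the grid of prior variances are arbitrary choices):

    # Sketch: the Bayes risk 1/(n/sigma^2 + 1/b^2) increases to sigma^2/n as b grows,
    # the constant risk of X-bar; no single proper normal prior attains the bound.
    n, sigma2 = 25, 4.0
    for b2 in [1.0, 10.0, 100.0, 10_000.0]:
        r = 1.0 / (n / sigma2 + 1.0 / b2)
        print(b2, round(r, 6))
    print("risk of the sample mean:", sigma2 / n)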

Lemma 10 Let F_1 ⊂ F be sets of distributions. Let g(F) be an estimand (a functional) defined on F. Assume δ_1 is minimax over F_1 and that

    sup_{F ∈ F} R(F, δ_1) = sup_{F ∈ F_1} R(F, δ_1).

Then δ_1 is minimax over F.

Proof. Since δ_1 is minimax over F_1 and, for every δ,

    sup_{F ∈ F_1} R(F, δ) ≤ sup_{F ∈ F} R(F, δ),

we get

    sup_{F ∈ F} R(F, δ_1) = sup_{F ∈ F_1} R(F, δ_1) = inf_δ sup_{F ∈ F_1} R(F, δ) ≤ inf_δ sup_{F ∈ F} R(F, δ).

Thus equality holds in the last inequality, and δ_1 is minimax over F. □

7.1 Minimax estimation in exponential families


Recall that the standard approach to estimation in exponential families is to restrict attention to unbiased estimators and, within this restricted family, find the estimator (if it exists) that has smallest risk (variance, for quadratic loss), either uniformly over the parameter space (the UMVU estimator) or locally at a fixed parameter (the LMVU estimator). Even if the UMVU estimator exists it might be inadmissible, i.e. there might be another estimator, necessarily biased, that dominates it (nowhere larger and somewhere strictly smaller risk).

For Bayes estimators this cannot happen.
Theorem 13 Assume the Bayes estimator δ_Λ is unique, P_θ-a.s. for every θ ∈ Ω. Then δ_Λ is admissible.

Proof. Assume δ_Λ is inadmissible, so that it is dominated by some δ'. Then

    ∫ R(θ, δ') dΛ(θ) ≤ ∫ R(θ, δ_Λ) dΛ(θ),

so δ' is also a Bayes estimator, and uniqueness implies that δ' = δ_Λ P_θ-a.s., a contradiction. Hence δ_Λ is admissible. □

Example 34 Assume X = (X_1, …, X_n) is a random sample from a N(θ, σ²) distribution with σ² known. We have previously shown that δ = X̄ is minimax for estimating θ. To investigate whether it is admissible, consider instead the estimator

    δ_{a,b} = a X̄ + b.

Is this estimator admissible?

Let Λ = N(µ, τ²) be a prior distribution for θ. Then, as previously shown, the unique Bayes estimator of θ is

    δ_Λ(X) = [nτ²/(σ² + nτ²)] X̄ + [σ²/(σ² + nτ²)] µ.

Therefore δ_Λ is admissible. Now, if the factor a satisfies 0 < a < 1, then (for the fixed n and σ² at hand) there is a prior parameter τ such that nτ²/(σ² + nτ²) = a, and choosing µ = b/(1 − a) makes δ_Λ = δ_{a,b}. Thus for 0 < a < 1 the estimator δ_{a,b} is admissible. □
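The correspondence between (a, b) and the prior parameters can be made explicit; the following sketch (not from the original notes, with arbitrary numerical values) verifies that a X̄ + b coincides with the posterior mean under N(µ, τ²) when τ² = aσ²/(n(1 − a)) and µ = b/(1 − a):

    # Sketch: for 0 < a < 1, a*xbar + b equals the posterior mean under the
    # conjugate normal prior with tau^2 = a*sigma^2/(n*(1 - a)), mu = b/(1 - a).
    import numpy as np

    rng = np.random.default_rng(0)
    n, sigma2 = 10, 2.25
    a, b = 0.7, 0.3
    tau2 = a * sigma2 / (n * (1 - a))
    mu = b / (1 - a)

    x = rng.normal(2.0, np.sqrt(sigma2), size=n)   # data with true theta = 2.0
    xbar = x.mean()
    w = n * tau2 / (sigma2 + n * tau2)             # weight on xbar in the posterior mean
    print(a * xbar + b, w * xbar + (1 - w) * mu)   # identical up to rounding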

What happens with the admissibility of a X̄ + b for the other possible values of a, b?

Lemma 11 Assume X ∈ N(θ, σ²). Then the estimator δ(X) = aX + b is inadmissible whenever

(i) a > 1,
(ii) a < 0,
(iii) a = 1 and b ≠ 0.
Proof. The risk of δ is

    R(θ, δ) = E(aX + b − θ)² = a²σ² + ((a − 1)θ + b)² =: ρ(a, b).

Thus

    ρ(a, b) ≥ a²σ² > σ²

when a > 1, so then δ is dominated by X, which has risk σ²; this proves (i).

Furthermore, when a < 0,

    ρ(a, b) ≥ ((a − 1)θ + b)² = (a − 1)² (θ + b/(a − 1))² > (θ + b/(a − 1))² = ρ(0, −b/(a − 1)),

so then δ is dominated by the constant estimator −b/(a − 1), which proves (ii).

Finally, when a = 1 and b ≠ 0 the estimator X + b has risk

    ρ(1, b) = σ² + b² > σ²,

and is therefore dominated by the estimator X, which proves (iii). □
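A quick numerical sketch of these dominance relations (not part of the original notes; σ² = 1 and the values of a, b are arbitrary):

    # Sketch: the risk rho(a, b; theta) = a^2*sigma^2 + ((a - 1)*theta + b)^2 of a*X + b,
    # compared with the dominating estimators from the proof of Lemma 11.
    import numpy as np

    sigma2 = 1.0
    thetas = np.linspace(-3.0, 3.0, 13)

    def rho(a, b, theta):
        return a**2 * sigma2 + ((a - 1.0) * theta + b) ** 2

    # (i) a > 1: dominated by X itself, whose risk is sigma^2
    print(np.all(rho(1.5, 0.2, thetas) > sigma2))
    # (ii) a < 0: dominated by the constant estimator c = -b/(a - 1)
    a, b = -0.5, 0.2
    c = -b / (a - 1.0)
    print(np.all(rho(a, b, thetas) > rho(0.0, c, thetas)))
    # (iii) a = 1, b != 0: X + b is dominated by X
    print(np.all(rho(1.0, 0.3, thetas) > rho(1.0, 0.0, thetas)))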

Example 35 (ctd.) The previous example and Theorem 13 imply that δ = a X̄ + b is admissible for estimating θ in a N(θ, σ²) distribution when 0 < a < 1. Also, at a = 0 the estimator is the constant δ ≡ b, which is the only estimator with zero risk at θ = b, so it is admissible then as well. The estimator is inadmissible when a < 0 or a > 1, and when a = 1, b ≠ 0. □

What happens when a = 1, b = 0, i.e. for the estimator δ(X) = X̄?

Example 36 Assume X = (X_1, …, X_n) is a random sample from a N(θ, σ²) distribution with σ² = 1, and assume squared error loss. Suppose, to get a contradiction, that δ = X̄ is inadmissible, with δ* a dominating estimator. Then, since δ is unbiased with variance 1/n, we have

    R(θ, δ*) ≤ 1/n,

for all θ, with strict inequality for at least one θ = θ_0. Since R(θ, δ*) is a continuous function of θ (it is a weighted mean of quadratic functions), the strict inequality holds in a neighbourhood (θ_1, θ_2) ∋ θ_0, i.e.

    R(θ, δ*) ≤ 1/n − ε,    (5)

on (θ_1, θ_2), for some ε > 0.
Let Λ_τ = N(0, τ²) and define

    r_τ = ∫ R(θ, δ_{Λ_τ}) dΛ_τ(θ)    (the Bayes risk for Λ_τ)
        = 1/(n + 1/τ²)
        = τ²/(1 + nτ²),

    r*_τ = ∫ R(θ, δ*) dΛ_τ(θ).

Then

    (1/n − r*_τ)/(1/n − r_τ) = [∫ (1/n − R(θ, δ*)) (1/(√(2π) τ)) e^{−θ²/(2τ²)} dθ] / [1/(n(1 + nτ²))]
                             ≥ [n(1 + nτ²) ε / (√(2π) τ)] ∫_{θ_1}^{θ_2} e^{−θ²/(2τ²)} dθ.

By monotone convergence the last integral converges to ∫_{θ_1}^{θ_2} dθ = θ_2 − θ_1 as τ → ∞, while the factor in front tends to infinity, which implies that

    (1/n − r*_τ)/(1/n − r_τ) → +∞

as τ → ∞. But this implies that for some (large enough) τ_0 we have r*_{τ_0} < r_{τ_0}, which contradicts the fact that δ_{Λ_{τ_0}} is the Bayes estimator for Λ_{τ_0}. Therefore (5) cannot hold, and thus δ = X̄ is admissible. □
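The rate at which this ratio blows up can be illustrated numerically; the following sketch (not part of the original notes, with arbitrary values of n, ε and (θ_1, θ_2)) evaluates the lower bound from the display above for increasing τ:

    # Sketch: the lower bound on (1/n - r*_tau)/(1/n - r_tau) from Example 36
    # grows without bound in tau, which is what forces the contradiction.
    import numpy as np
    from scipy.integrate import quad

    n, eps, th1, th2 = 10, 1e-3, -0.5, 0.5     # arbitrary illustration values
    for tau in [1.0, 10.0, 100.0, 1000.0]:
        integral, _ = quad(lambda t: np.exp(-t**2 / (2 * tau**2)), th1, th2)
        bound = n * (1 + n * tau**2) * eps / (np.sqrt(2 * np.pi) * tau) * integral
        print(tau, bound)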

Karlin's theorem generalizes the results obtained above to estimation of the mean in exponential families. Thus let X have probability density

    p_θ(x) = β(θ) e^{θ T(x)},    (6)

with θ a real-valued parameter and T a real-valued function. The natural parameter space for this family is an interval Ω = (θ_1, θ_2) in the extended real line. Assume squared error loss, and let δ(X) = a T(X) + b be an estimator.

Then, when a < 0 or a > 1, the estimator is inadmissible for estimating g(θ) = E_θ(T(X)); the proof is analogous to that of Lemma 11. Also, for a = 0 the estimator is constant and then admissible. What happens for 0 < a ≤ 1? Reparametrize the estimator as

    δ_{λ,γ}(x) = T(x)/(1 + λ) + γλ/(1 + λ),

so that λ, γ replace a, b, and 0 < a ≤ 1 translates to 0 ≤ λ < ∞.
Theorem 14 Assume that for some θ_0 ∈ (θ_1, θ_2)

    lim_{t↑θ_2} ∫_{θ_0}^{t} e^{−γλu} β(u)^{−λ} du = ∞,
    lim_{t↓θ_1} ∫_{t}^{θ_0} e^{−γλu} β(u)^{−λ} du = ∞.

Then δ_{λ,γ} is admissible for estimating g(θ).
For a proof, see Lehmann [?].
Corollary 5 If the natural parameter space of (6) is the whole real line, Ω = (−∞, ∞), then T(X) is admissible for estimating g(θ).

Proof. T(X) corresponds to the estimator δ_{0,1}(X), i.e. λ = 0, γ = 1. Then the integrands in Karlin's theorem are constantly equal to 1, so both integrals tend to ∞, and thus T(X) is admissible. □

Example 37 The natural parameter space is R for the Normal distribution with known variance, the Poisson distribution and the Binomial distribution. (Exercise.) □
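As a sketch of one case of the exercise (this worked step is not in the original notes): if X ∈ Poisson(µ), then

    p_µ(x) = e^{−µ} µ^x / x! = (1/x!) e^{−µ} e^{x log µ},

so with θ = log µ, T(x) = x and β(θ) = exp(−e^θ), the density is of the form (6), and θ = log µ runs through all of R as µ runs through (0, ∞). By Corollary 5, T(X) = X is then admissible for estimating E_θ(T(X)) = µ.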

Lemma 12 If δ is admissible and has constant risk, then δ is minimax.

Proof. If δ is not minimax, there is another estimator δ' such that

    sup_θ R(θ, δ') < sup_θ R(θ, δ) = c,

so that for all θ we have R(θ, δ') < c = R(θ, δ), which implies that δ is inadmissible. □

Lemma 13 If δ is the unique minimax estimator, then δ is admissible.

Proof. Assume δ is inadmissible, so that some estimator δ' ≠ δ satisfies R(θ, δ') ≤ R(θ, δ) for all θ. Then sup_θ R(θ, δ') ≤ sup_θ R(θ, δ), so δ' is also minimax, which contradicts the uniqueness of δ. □

Corollary 6 Let

    L(θ, d) = (d − g(θ))² / Var_θ(T(X))

be the loss function for estimating g(θ) = E_θ(T(X)), and assume the natural parameter space of (6) is the real line. Then δ(x) = T(x) is minimax, and it is the unique minimax estimator.

Proof. The estimator T(X) is admissible (by Corollary 5; dividing the quadratic loss by the positive factor Var_θ(T(X)) does not change which estimators dominate which) and has constant risk 1 under the loss L, and is therefore minimax by Lemma 12. It is unique since the loss function is strictly convex. □

Example 38 Let X ∈ Bin(n, p) and consider the loss function L(p, d) = (p − d)²/(pq). The natural parameter space of the Binomial distribution is R, since the density is

    p(x) = (n choose x) p^x (1 − p)^{n−x} = (n choose x) (p/(1 − p))^x (1 − p)^n,

i.e. of the form (6) with θ = log(p/(1 − p)), β(θ) = (1 − p)^n and T(x) = x. Thus T(X) = X is the unique minimax estimator of E(X) = np, and therefore δ(X) = X/n is the unique minimax estimator of p. Since δ is the unique minimax estimator, it is admissible. □
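A quick Monte Carlo sketch of the constant risk 1/n under this standardized loss (not part of the original notes; n and the grid of p-values are arbitrary):

    # Sketch: under L(p, d) = (p - d)^2 / (p(1-p)) the risk of X/n is
    # Var(X/n)/(p(1-p)) = 1/n for every p.
    import numpy as np

    rng = np.random.default_rng(1)
    n, reps = 30, 200_000
    for p in [0.1, 0.3, 0.5, 0.9]:
        x = rng.binomial(n, p, size=reps)
        risk = np.mean((x / n - p) ** 2) / (p * (1 - p))
        print(p, round(risk, 4))   # all close to 1/n = 0.0333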

7.2 Minimax estimation in group families


Assume (P, L, g(θ)) is an invariant estimation problem for the transformation groups G, Ḡ, G*. Then typically there exists an MRE estimator, and it has constant risk. Recall also that a Bayes estimator with constant risk is minimax, and admissible if it is the unique Bayes estimator. Recall further the relation between Bayes estimators and equivariant estimators, which said that a Bayes estimator is almost equivariant. This implies that a unique Bayes estimator in the present setting is admissible.

It is possible to obtain results on minimaxity and admissibility for Bayes estimators under improper priors, which we however do not pursue here.
