Minimax
Previously we have looked at Bayes estimation, where the overall measure of the estimation error is the risk averaged over the parameter space with respect to a positive weight function, the prior. Now we instead use the maximum risk,

    sup_θ R(θ, δ),

as the relevant measure of the estimation risk, and define the minimax estimator, if it exists, as the estimator that minimizes this maximum risk, i.e.

    δ̂ = arg inf_δ sup_θ R(θ, δ).
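As a small numerical illustration (an addition to these notes; the binomial model, the candidate estimators and the grid are chosen purely for the example), the following Python sketch approximates sup_p R(p, δ) on a grid for a few candidate estimators of p in the Bin(n, p) model under squared error loss, and picks the candidate with the smallest maximum risk:

    import numpy as np

    # Toy illustration of the minimax criterion: X ~ Bin(n, p), squared error loss,
    # candidate estimators delta_c(x) = (x + c) / (n + 2c).
    # Exact risk: R(p, delta_c) = [n p (1 - p) + c^2 (1 - 2p)^2] / (n + 2c)^2.
    n = 10
    p_grid = np.linspace(0.0, 1.0, 1001)            # grid over the parameter space
    c_values = [0.0, 0.5, np.sqrt(n) / 2, 2.0]      # candidate shrinkage constants

    def risk(p, c):
        return (n * p * (1 - p) + c ** 2 * (1 - 2 * p) ** 2) / (n + 2 * c) ** 2

    max_risk = {c: risk(p_grid, c).max() for c in c_values}
    for c, m in max_risk.items():
        print(f"c = {c:.3f}: sup_p R(p, delta_c) = {m:.5f}")
    print("smallest maximum risk at c =", min(max_risk, key=max_risk.get))

For this particular family the smallest maximum risk is attained at c = √n/2, which anticipates Example 32 below.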
Definition 16 The prior distribution Λ is called least favorable for estimating g(θ) if r_Λ ≥ r_{Λ′} for every other distribution Λ′ on Ω.
When the Bayes risk of a Bayes estimator equals its maximum risk, the estimator is minimax:

Theorem 11 Assume Λ is a prior distribution and assume the Bayes estimator δ_Λ satisfies

    ∫ R(θ, δ_Λ) dΛ(θ) = sup_θ R(θ, δ_Λ).

Then
(i) δ_Λ is minimax.
(ii) If δ_Λ is the unique Bayes estimator, it is the unique minimax estimator.
(iii) Λ is least favorable.
Proof. To show (i), let δ be any estimator. Then

    sup_θ R(θ, δ) ≥ ∫ R(θ, δ) dΛ(θ) ≥ ∫ R(θ, δ_Λ) dΛ(θ) = sup_θ R(θ, δ_Λ),

so δ_Λ is minimax.
To show (ii), note that if δ_Λ is the unique Bayes estimator, then the second inequality above is strict for every estimator δ ≠ δ_Λ. Hence no other estimator has the same maximum risk, which proves the uniqueness of the minimax estimator.
To show (iii), let Λ′ ≠ Λ be a distribution on Ω, and let δ_Λ, δ_{Λ′} be the corresponding Bayes estimators. Then the Bayes risks are related by

    r_{Λ′} = ∫ R(θ, δ_{Λ′}) dΛ′(θ) ≤ ∫ R(θ, δ_Λ) dΛ′(θ) ≤ sup_θ R(θ, δ_Λ) = r_Λ,

where the first inequality follows since δ_{Λ′} is the Bayes estimator corresponding to Λ′, and the last equality is the assumption of the theorem. We have proven that Λ is least favorable. □
Note 1 The previous result says that minimax estimators can be obtained as Bayes estimators with respect to a least favorable prior.
Two simple cases in which the Bayes risk of the Bayes estimator equals its maximum risk are given in the following.
Corollary 3 Assume the Bayes estimator δ_Λ has constant risk, i.e. R(θ, δ_Λ) does not depend on θ. Then δ_Λ is minimax.
Proof. If the risk is constant, the supremum over θ is equal to the average over θ, so the Bayes risk and the maximum risk coincide, and the result follows from the previous theorem. □
Corollary 4 Let Ω_Λ = {θ : R(θ, δ_Λ) = sup_{θ′} R(θ′, δ_Λ)} be the set where the risk of δ_Λ attains its maximum, and assume Λ(Ω_Λ) = 1. Then δ_Λ is minimax.
Proof. The condition Λ(Ω_Λ) = 1 means that the Bayes estimator has constant (maximal) risk Λ-almost surely, and since the Bayes estimator is only determined modulo Λ-null sets, this is enough. □
Example 32 Let X ∈ Bin(n, p) and let Λ = B(a, b) be the Beta prior distribution for p. As previously established, via the conditional distribution of p given X (which is also a Beta distribution), the Bayes estimator of p is

    δ_Λ(x) = (a + x)/(a + b + n),
with risk function, for quadratic loss,

    R(p, δ_Λ) = [np(1 − p) + (a − (a + b)p)²] / (a + b + n)².

This is constant in p exactly when the coefficients of p and p² in the numerator vanish, i.e. when

    (a + b)² = n,    2a(a + b) = n,

which has the solution a = b = √n/2.
Thus

    δ_Λ(x) = (x + √n/2)/(n + √n)

has constant risk, and is therefore minimax by Corollary 3; the prior Λ = B(√n/2, √n/2) is least favorable.
The least favorable distribution is, however, not unique. For a general prior Λ on [0, 1] the Bayes estimator is

    δ_Λ(x) = E(p | x) = ∫_0^1 p^{x+1}(1 − p)^{n−x} dΛ(p) / ∫_0^1 p^x(1 − p)^{n−x} dΛ(p).

Expanding

    (1 − p)^{n−x} = 1 + a_1 p + . . . + a_{n−x} p^{n−x},

this becomes

    δ_Λ(x) = ∫_0^1 (p^{x+1} + a_1 p^{x+2} + . . . + a_{n−x} p^{n+1}) dΛ(p) / ∫_0^1 (p^x + a_1 p^{x+1} + . . . + a_{n−x} p^n) dΛ(p),

which shows that the Bayes estimator depends on the distribution Λ only via the first n + 1 moments of Λ. Therefore the least favorable distribution is not unique for estimating p in a Bin(n, p) distribution: two priors with the same first n + 1 moments give the same Bayes estimator. □
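As a quick check of the constant-risk claim (an added illustration, not part of the original example; the value of n is arbitrary), the following Python sketch computes the exact risk of δ_Λ(x) = (x + √n/2)/(n + √n) at a few values of p and compares it with the constant 1/(4(√n + 1)²):

    import math

    n = 25
    sqrt_n = math.sqrt(n)
    delta = lambda x: (x + sqrt_n / 2) / (n + sqrt_n)   # the minimax estimator of p

    def risk(p):
        # exact risk E_p[(delta(X) - p)^2] for X ~ Bin(n, p) under squared error loss
        return sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) * (delta(x) - p) ** 2
                   for x in range(n + 1))

    for p in (0.05, 0.3, 0.5, 0.9):
        print(f"p = {p}: R(p, delta) = {risk(p):.6f}")
    print("constant value 1 / (4 (sqrt(n) + 1)^2) =", 1 / (4 * (sqrt_n + 1) ** 2))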
Recall that when the loss function is convex, then for any randomized estimator there is a nonrandomized estimator with at least as small a risk, so there is no need to consider randomized estimators.
The relation established between the minimax estimator and the Bayes estimator, with the prior Λ obtained as a least favorable distribution, is valid when Λ is a proper prior. What happens when Λ is not proper? Sometimes the estimation problem at hand makes it natural to consider such an improper prior: one such situation is the estimation of the mean of a Normal distribution with known variance, when the mean is unrestricted, i.e. an arbitrary real number. Then one could believe that the least favorable distribution is the Lebesgue measure on R.
To model this, assume Λ is a fixed (improper) prior and let {Λ_n} be a sequence of proper priors that in some sense approximates Λ:

Definition 17 Let {Λ_n} be a sequence of priors, let δ_n be the Bayes estimator corresponding to Λ_n, and let

    r_n = ∫ R(θ, δ_n) dΛ_n(θ)

for every n. The sequence {Λ_n} is called least favorable if r_n → r and r ≥ r_{Λ′} for every prior distribution Λ′.

Suppose now that δ is an estimator with sup_θ R(θ, δ) = lim_n r_n = r. Then δ is minimax: for any other estimator δ′ and every n,

    sup_θ R(θ, δ′) ≥ ∫ R(θ, δ′) dΛ_n(θ) ≥ r_n.

Since the right hand side converges to r = sup_θ R(θ, δ), this implies that

    sup_θ R(θ, δ′) ≥ sup_θ R(θ, δ),

i.e. δ is minimax.
Note 2 Uniqueness of the Bayes estimators δ_n does not imply uniqueness of the minimax estimator, since in that case the strict inequality in

    r_n = ∫ R(θ, δ_n) dΛ_n(θ) < ∫ R(θ, δ′) dΛ_n(θ)

is only transformed into the weak inequality r ≤ ∫ R(θ, δ′) dΛ(θ) under the limit operation.
For quadratic loss the Bayes risk can be expressed as the expected posterior variance of g(Θ).
Proof. The Bayes estimator for quadratic loss is δ_Λ(x) = E(g(Θ) | x), and thus the Bayes risk is

    r_Λ = ∫ R(θ, δ_Λ) dΛ(θ) = . . . (Fubini's theorem) . . .
        = ∫ E([δ_Λ(x) − g(Θ)]² | x) dP(x)
        = ∫ E([E(g(Θ) | x) − g(Θ)]² | x) dP(x)
        = ∫ Var(g(Θ) | x) dP(x),

where P is the marginal distribution of X. □
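To make the identity concrete, here is a small simulation sketch (added for illustration; the conjugate normal-normal model and all parameter values are assumptions made only for this check), in which both the Bayes risk and the posterior variance are available in closed form:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, tau, sigma = 1.0, 2.0, 1.5        # prior mean/sd and sampling sd (illustrative values)

    theta = rng.normal(mu, tau, size=200_000)     # Theta ~ Lambda = N(mu, tau^2)
    x = rng.normal(theta, sigma)                  # X | Theta ~ N(Theta, sigma^2)

    post_mean = (tau**2 * x + sigma**2 * mu) / (tau**2 + sigma**2)   # Bayes estimator E(Theta | x)
    post_var = tau**2 * sigma**2 / (tau**2 + sigma**2)               # Var(Theta | x), constant here

    print("Monte Carlo Bayes risk     :", np.mean((post_mean - theta) ** 2))
    print("expected posterior variance:", post_var)

Both printed numbers agree (up to Monte Carlo error) with τ²σ²/(τ² + σ²).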
Lemma 10 Let F_1 ⊂ F be sets of distributions and let g(F) be an estimand (a functional) defined on F. Assume δ_1 is minimax over F_1 and that

    sup_{F∈F} R(F, δ_1) = sup_{F∈F_1} R(F, δ_1).

Then δ_1 is minimax over F.
Proof. We get

    sup_{F∈F} R(F, δ_1) = sup_{F∈F_1} R(F, δ_1) = inf_δ sup_{F∈F_1} R(F, δ) ≤ inf_δ sup_{F∈F} R(F, δ).

Since trivially sup_{F∈F} R(F, δ_1) ≥ inf_δ sup_{F∈F} R(F, δ), we have equality in the last inequality, and thus δ_1 is minimax over F. □
A unique Bayes estimator is admissible: suppose δ′ satisfies R(θ, δ′) ≤ R(θ, δ_Λ) for all θ. Then ∫ R(θ, δ′) dΛ(θ) ≤ ∫ R(θ, δ_Λ) dΛ(θ), so δ′ is also Bayes with respect to Λ, and uniqueness implies that δ′ = δ_Λ P_θ-a.s., i.e. δ_Λ is admissible. □
Example 35 Let X_1, . . . , X_n be i.i.d. N(θ, σ²) with known variance σ², and consider estimators of θ of the form

    δ_{a,b} = aX̄ + b.
Let Λ = N(µ, τ²) be a prior distribution for θ. Then, as previously shown, the unique Bayes estimator of θ is

    δ_Λ(X) = [nτ²/(σ² + nτ²)] X̄ + [σ²/(σ² + nτ²)] µ.

Therefore δ_Λ is admissible.
So if the factor a satisfies 0 < a < 1, then (for the fixed n and σ² at hand) there is a prior parameter τ such that nτ²/(σ² + nτ²) = a, and a choice of µ such that σ²µ/(σ² + nτ²) = b. Thus for 0 < a < 1 the estimator δ_{a,b} is admissible. □
What happens with the admissibility of aX̄ + b for the other possible values of a, b?
Lemma 11 Assume X ∈ N(θ, σ²). Then the estimator δ(X) = aX + b is inadmissible whenever
(i) a > 1,
(ii) a < 0,
(iii) a = 1 and b ≠ 0.
Proof. The risk of δ is

    R(θ, δ) = E(aX + b − θ)² = a²σ² + ((a − 1)θ + b)² =: ρ(a, b).

Thus

    ρ(a, b) ≥ a²σ² > σ²

when a > 1, so δ is then dominated by X, which has risk σ²; this proves (i).
Furthermore, when a < 0,

    ρ(a, b) ≥ ((a − 1)θ + b)² = (a − 1)² (θ + b/(a − 1))² ≥ (θ + b/(a − 1))² = ρ(0, −b/(a − 1)),

with strict inequality for θ ≠ −b/(a − 1) since (a − 1)² > 1, while at θ = −b/(a − 1) we still have ρ(a, b) ≥ a²σ² > 0 = ρ(0, −b/(a − 1)). Thus δ is dominated by the constant estimator −b/(a − 1), which proves (ii).
Finally, when a = 1 and b ≠ 0, the estimator X + b has risk

    ρ(1, b) = σ² + b² > σ²,

and is therefore dominated by the estimator X, which proves (iii). □
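The three dominations in the proof are easy to check numerically; the following sketch (added for illustration, with arbitrary parameter values) evaluates ρ(a, b) = a²σ² + ((a − 1)θ + b)² on a grid of θ-values:

    import numpy as np

    sigma = 1.0
    theta = np.linspace(-10, 10, 2001)

    def rho(a, b):
        # risk of a X + b when X ~ N(theta, sigma^2), squared error loss
        return a**2 * sigma**2 + ((a - 1) * theta + b) ** 2

    # (i) a > 1: dominated by X, whose risk is sigma^2
    print((rho(1.5, 0.3) > sigma**2).all())

    # (ii) a < 0: dominated by the constant estimator -b/(a-1)
    a, b = -0.5, 0.3
    print((rho(a, b) >= rho(0.0, -b / (a - 1))).all())   # here even strictly, since a^2 sigma^2 > 0

    # (iii) a = 1, b != 0: rho(1, b) = sigma^2 + b^2, dominated by X
    print((rho(1.0, 0.7) > sigma**2).all())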
Example 35 (ctd.) The previous example, together with the admissibility of unique Bayes estimators, implies that δ = aX̄ + b is admissible for estimating θ in a N(θ, σ²) distribution when 0 < a < 1. Also, at a = 0 the estimator is δ ≡ b, which is the only estimator with zero risk at θ = b, so it is admissible also in this case.
By Lemma 11 (applied to X̄, which has a N(θ, σ²/n) distribution), the estimator is inadmissible when a < 0 or a > 1, and when a = 1, b ≠ 0. □
It remains to consider the case a = 1, b = 0, i.e. the estimator X̄ itself; we now show that X̄ is admissible (for simplicity take σ² = 1, so that R(θ, X̄) = 1/n). Suppose X̄ is not admissible. Then there is an estimator δ* with

    R(θ, δ*) ≤ 1/n

for all θ, with strict inequality for at least one θ = θ_0. Since R(θ, δ*) is a continuous function of θ (it is a weighted mean of quadratic functions), the strict inequality holds in a neighbourhood (θ_1, θ_2) ∋ θ_0, i.e.

    R(θ, δ*) ≤ 1/n − ε,    (5)

on (θ_1, θ_2), for some ε > 0.
Let Λ_τ = N(0, τ²) and define

    r_τ = ∫ R(θ, δ_{Λ_τ}) dΛ_τ(θ)   (the Bayes risk with respect to Λ_τ)
        = 1/(n + 1/τ²) = τ²/(1 + nτ²),

and

    r_τ* = ∫ R(θ, δ*) dΛ_τ(θ).

Then

    (1/n − r_τ*)/(1/n − r_τ) = [ ∫ (1/n − R(θ, δ*)) (1/(√(2π) τ)) e^{−θ²/(2τ²)} dθ ] / [ 1/(n(1 + nτ²)) ]
        ≥ (n(1 + nτ²) ε)/(√(2π) τ) ∫_{θ_1}^{θ_2} e^{−θ²/(2τ²)} dθ.

By monotone convergence the integral converges to ∫_{θ_1}^{θ_2} dθ = θ_2 − θ_1, while the factor n(1 + nτ²)/(√(2π) τ) tends to infinity, which implies that

    (1/n − r_τ*)/(1/n − r_τ) → +∞
as τ → ∞. But this implies that for some (large enough) τ_0 we have r_{τ_0}* < r_{τ_0}, which contradicts the fact that δ_{Λ_{τ_0}} is the Bayes estimator with respect to Λ_{τ_0}, and hence minimizes the Bayes risk.
Therefore (5) cannot hold, and thus δ = X̄ is admissible. □
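The quantities appearing in the limiting argument can be computed explicitly. The sketch below (added for illustration; n, ε and (θ_1, θ_2) are arbitrary choices) evaluates 1/n − r_τ = 1/(n(1 + nτ²)) and the lower bound on the ratio (1/n − r_τ*)/(1/n − r_τ), showing how the bound grows without limit as τ increases:

    import math

    n = 10
    eps, th1, th2 = 0.01, -0.5, 0.5     # as in (5): R(theta, delta*) <= 1/n - eps on (th1, th2)

    def lower_bound(tau, m=10000):
        # n (1 + n tau^2) eps / (sqrt(2 pi) tau) * integral_{th1}^{th2} exp(-t^2 / (2 tau^2)) dt
        h = (th2 - th1) / m
        integral = h * sum(math.exp(-((th1 + (k + 0.5) * h) ** 2) / (2 * tau**2)) for k in range(m))
        return n * (1 + n * tau**2) * eps / (math.sqrt(2 * math.pi) * tau) * integral

    for tau in (1.0, 10.0, 100.0, 1000.0):
        r_tau = tau**2 / (1 + n * tau**2)           # Bayes risk of the Bayes estimator
        print(f"tau = {tau:8.1f}: 1/n - r_tau = {1/n - r_tau:.3e}, ratio lower bound = {lower_bound(tau):.3e}")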
Consider now a one-parameter exponential family with density

    p_θ(x) = β(θ) e^{θ T(x)} h(x),    (6)

with θ a real-valued parameter and T a real-valued function. The natural parameter space for this family is an interval Ω = (θ_1, θ_2) in the extended real line. Assume we have squared error loss, and let δ(X) = aT(X) + b be an estimator.
Then, when a < 0 or a > 1 the estimator is inadmissible for estimating g(θ) = E_θ(T(X)); the proof is analogous to the proof of Lemma 11. Also, for a = 0 the estimator is constant and is then admissible. What happens for 0 < a ≤ 1?
Reparametrize the estimator as

    δ_{γ,λ}(x) = 1/(1 + λ) · T(x) + γλ/(1 + λ),

so that (γ, λ) replaces (a, b), and 0 < a ≤ 1 is translated to 0 ≤ λ < ∞.
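Assuming the parametrization written above, i.e. a = 1/(1 + λ) and b = γλ/(1 + λ), the correspondence between (a, b) and (γ, λ) is γ = b/(1 − a), λ = (1 − a)/a for 0 < a < 1 (and λ = 0, b = 0 for a = 1). A small conversion sketch (added for illustration):

    def ab_to_gamma_lambda(a, b):
        # invert a = 1 / (1 + lambda), b = gamma * lambda / (1 + lambda), for 0 < a < 1
        lam = (1 - a) / a
        gamma = b / (1 - a)
        return gamma, lam

    def gamma_lambda_to_ab(gamma, lam):
        return 1 / (1 + lam), gamma * lam / (1 + lam)

    print(ab_to_gamma_lambda(0.8, 0.1))    # a = 0.8, b = 0.1  ->  gamma = 0.5, lambda = 0.25
    print(gamma_lambda_to_ab(0.5, 0.25))   # back to (0.8, 0.1)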
Theorem 14 Assume that for some θ_0 ∈ (θ_1, θ_2)

    lim_{t↑θ_2} ∫_{θ_0}^{t} e^{−γλu} [β(u)]^{−λ} du = ∞,
    lim_{t↓θ_1} ∫_{t}^{θ_0} e^{−γλu} [β(u)]^{−λ} du = ∞.

Then δ_{γ,λ} is admissible for estimating g(θ).
For a proof, see Lehmann [?].
Corollary 5 If the natural parameter space of (6) is the whole real line, Ω = (−∞, ∞), then T(X) is admissible for estimating g(θ).
Proof. T(X) corresponds to the estimator δ_{γ,λ} with γ = 0 and λ = 0. Then the integrands in Karlin's theorem are the constant 1, so both integrals tend to ∞, and thus T(X) is admissible. □
Example 37 The natural parameter space is R for the Normal distribution with known variance, the Poisson distribution and the Binomial distribution. (Exercise.) □
An admissible estimator with constant risk is minimax.
Proof. If δ is not minimax, there is another estimator δ′ such that

    sup_θ R(θ, δ′) < sup_θ R(θ, δ) = c,

so that for all θ we have R(θ, δ′) < c = R(θ, δ), which implies that δ is inadmissible. □
Corollary 6 Let

    L(θ, d) = (d − g(θ))² / Var_θ(T(X))

be the loss function for estimating g(θ) = E_θ(T(X)), and assume the natural parameter space of (6) is the real line. Then δ(x) = T(x) is minimax, and it is unique.
Proof. The estimator T(X) is admissible (the positive weight 1/Var_θ(T(X)) does not change the dominance relations, so admissibility under squared error loss carries over) and has constant risk 1 under the loss L, and is therefore minimax. It is unique since the loss function is strictly convex. □
Example 38 Let X ∈ Bin(n, p) and assume the loss function L(p, d) = (p − d)²/(pq), where q = 1 − p. The natural parameter space of the Binomial distribution is R, and the probability mass function is

    p(x) = (n choose x) p^x (1 − p)^{n−x} = (n choose x) (p/(1 − p))^x (1 − p)^n,

i.e. of the form (6) with θ = log(p/(1 − p)), β(θ) = (1 − p)^n and T(x) = x. Thus T(X) = X is the unique minimax estimator of E(X) = np, and therefore δ(X) = X/n is the unique minimax estimator of p. Since δ is the unique minimax estimator, it is admissible. □
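A quick numerical check (added; n is arbitrary) that X/n indeed has constant risk 1/n under the standardized loss (p − d)²/(pq):

    import math

    n = 12

    def risk(p):
        # E_p[(p - X/n)^2 / (p q)] = Var(X/n) / (p q) = 1/n for X ~ Bin(n, p)
        q = 1 - p
        return sum(math.comb(n, x) * p ** x * q ** (n - x) * (p - x / n) ** 2
                   for x in range(n + 1)) / (p * q)

    for p in (0.1, 0.25, 0.5, 0.8):
        print(f"p = {p}: risk = {risk(p):.6f}   (1/n = {1/n:.6f})")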