Statistics 512 Notes 26: Decision Theory Continued
Posterior Analysis
We now develop a method for finding the Bayes rule. The Bayes risk for a prior distribution $\pi(\theta)$ is the expected loss of a decision rule $d(X)$ when $X$ is generated from the following probability model: First, the state of nature $\theta$ is generated according to the prior distribution $\pi(\theta)$. Then, the data $X$ is generated according to the distribution $f(X; \theta)$, which we will denote by $f(X \mid \theta)$.
Under this probability model (call it the Bayes model), the marginal distribution of $X$ is (for the continuous case)
$$f_X(x) = \int f(x \mid \theta)\, \pi(\theta)\, d\theta.$$
(We have used the relations
$$f(x \mid \theta)\, \pi(\theta) = f(x, \theta) = f_X(x)\, h(\theta \mid x),$$
where $h(\theta \mid x)$ denotes the posterior density of $\theta$ given $X = x$.)
Under the Bayes model, the Bayes risk of a rule $d$ is
$$B(d) = \int \left[ \int l(\theta, d(x))\, f(x \mid \theta)\, \pi(\theta)\, d\theta \right] dx = \int f_X(x) \left[ \int l(\theta, d(x))\, h(\theta \mid x)\, d\theta \right] dx.$$
Now the inner integral is the posterior risk, and since $f_X(x)$ is nonnegative, $B(d)$ can be minimized by choosing $d(x) = d^*(x)$, where $d^*(x)$ minimizes the posterior risk separately for each $x$. That is, the action $a^*$ that minimizes the posterior risk
$$\int l(\theta, a)\, h(\theta \mid x)\, d\theta$$
is the Bayes rule action: $d^*(x) = a^*$.
Example: Consider again the engineering example. Suppose that we observe $X = x_2 = 45$. In the notation of that example, the prior distribution is $\pi(\theta_1) = .8$, $\pi(\theta_2) = .2$.
We first calculate the posterior distribution:
$$h(\theta_1 \mid x_2) = \frac{f(x_2 \mid \theta_1)\, \pi(\theta_1)}{\sum_{i=1}^{2} f(x_2 \mid \theta_i)\, \pi(\theta_i)} = \frac{.3 \times .8}{.3 \times .8 + .2 \times .2} \approx .86.$$
Hence, $h(\theta_2 \mid x_2) \approx .14$.
We next calculate the posterior risk (PR) for $a_1$ and $a_2$:
$$PR(a_1) = l(\theta_1, a_1)\, h(\theta_1 \mid x_2) + l(\theta_2, a_1)\, h(\theta_2 \mid x_2) = 0 + 400 \times .14 = 56$$
and
$$PR(a_2) = l(\theta_1, a_2)\, h(\theta_1 \mid x_2) + l(\theta_2, a_2)\, h(\theta_2 \mid x_2) = 100 \times .86 + 0 = 86.$$
Since $PR(a_1) < PR(a_2)$, the Bayes rule action for $X = x_2$ is $a_1$.

Bayes Rules for Squared Error Loss
Suppose that we want to estimate $\theta$ based on a sample $X = (X_1, \ldots, X_n)$ and that the loss is squared error, so that the risk of a rule $d$ is
$$R(\theta, d) = E_\theta\left[ (d(X) - \theta)^2 \right].$$
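As a quick numeric check of the engineering example, here is a minimal sketch in Python (the likelihood values $f(x_2 \mid \theta_1) = .3$ and $f(x_2 \mid \theta_2) = .2$ and the losses are the ones appearing in the calculations above):

```python
import numpy as np

# Prior pi(theta_i), likelihoods f(x2 | theta_i), and losses l(theta_i, a_j),
# taken from the engineering example above.
prior = np.array([0.8, 0.2])
lik = np.array([0.3, 0.2])
loss = np.array([[0.0, 100.0],    # l(theta_1, a_1), l(theta_1, a_2)
                 [400.0, 0.0]])   # l(theta_2, a_1), l(theta_2, a_2)

posterior = prior * lik / np.sum(prior * lik)  # h(theta | x2) = (6/7, 1/7)
post_risk = posterior @ loss                   # PR(a_1), PR(a_2) ~ (57.1, 85.7)
print(posterior, post_risk)
print("Bayes action: a%d" % (np.argmin(post_risk) + 1))  # a1, as found above
```

The exact posterior is $(6/7, 1/7) \approx (.857, .143)$; the values 56 and 86 above come from rounding the posterior to $(.86, .14)$.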
Suppose that a Bayesian approach is taken and the prior distribution on $\theta$ is $\pi(\theta)$. Then, from the above analysis, the Bayes rule for squared error loss can be found by minimizing the posterior risk, which is
$$E\left[ (\theta - d(x))^2 \mid X = x \right].$$
We have
$$E\left[ (\theta - d(x))^2 \mid X = x \right] = Var(\theta \mid X = x) + \left( E[\theta \mid X = x] - d(x) \right)^2.$$
The first term of this last expression does not depend on $d(x)$, and the second term is minimized by $d(x) = E[\theta \mid X = x]$.
Thus, the Bayes rule for squared error loss is the mean of the posterior distribution of $\theta$:
$$d^*(x) = E[\theta \mid X = x] = \int \theta\, h(\theta \mid x)\, d\theta.$$
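As a sanity check on this result, the following sketch discretizes a posterior density on a grid and confirms that the posterior risk $E[(\theta - a)^2 \mid X = x]$ is minimized at the posterior mean (the particular density used here is an arbitrary choice for illustration):

```python
import numpy as np

# Discretize a posterior h(theta | x) on [0, 1]; the shape theta^2 (1 - theta)
# is an arbitrary choice for illustration (a Beta(3, 2) density).
theta = np.linspace(0.0, 1.0, 2001)
dt = theta[1] - theta[0]
h = theta**2 * (1.0 - theta)
h /= h.sum() * dt                     # normalize so it integrates to 1

post_mean = (theta * h).sum() * dt    # E[theta | x] = 3/5 for Beta(3, 2)

# Posterior risk E[(theta - a)^2 | x] for each candidate action a
actions = np.linspace(0.0, 1.0, 2001)
post_risk = [((theta - a)**2 * h).sum() * dt for a in actions]

print(post_mean, actions[np.argmin(post_risk)])  # agree up to grid resolution
```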
Example: A (possibly) biased coin is thrown once, and we want to estimate the probability $\theta$ of the coin landing heads on future tosses based on this single toss. Suppose that we have no idea how biased the coin is; to reflect this state of knowledge, we use a uniform prior distribution on $\theta$:
$$g(\theta) = 1, \quad 0 \le \theta \le 1.$$
Let $X = 1$ if a head appears, and let $X = 0$ if a tail appears. The distribution of $X$ given $\theta$ is
$$f(x \mid \theta) = \begin{cases} \theta, & x = 1 \\ 1 - \theta, & x = 0. \end{cases}$$
In particular, the posterior densities are
$$h(\theta \mid X = 1) = \frac{\theta}{\int_0^1 \theta\, d\theta} = 2\theta, \qquad h(\theta \mid X = 0) = \frac{1 - \theta}{\int_0^1 (1 - \theta)\, d\theta} = 2(1 - \theta),$$
so the Bayes estimates are the posterior means
$$E[\theta \mid X = 1] = \int_0^1 \theta \cdot 2\theta\, d\theta = \frac{2}{3}, \qquad E[\theta \mid X = 0] = \int_0^1 \theta \cdot 2(1 - \theta)\, d\theta = \frac{1}{3}.$$
Note that these estimates differ from the classical maximum likelihood estimates, which are 1 (for $X = 1$) and 0 (for $X = 0$).
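These posterior calculations are easy to verify numerically; the following minimal sketch integrates over a grid of $\theta$ values:

```python
import numpy as np

theta = np.linspace(0.0, 1.0, 100001)
dt = theta[1] - theta[0]

# Under the uniform prior, the posterior is proportional to the likelihood.
for x, lik in [(1, theta), (0, 1.0 - theta)]:
    post = lik / (lik.sum() * dt)           # h(theta | x): 2*theta or 2*(1-theta)
    bayes_est = (theta * post).sum() * dt   # posterior mean
    print(x, bayes_est)                     # 2/3 for x = 1, 1/3 for x = 0
```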
Comparison of the risk functions of the Bayes estimate and the MLE:
The Bayes estimate can be written as $d_{Bayes}(X) = (X + 1)/3$, and its risk function is
$$R(\theta, d_{Bayes}) = E_\theta\left[ (\theta - d_{Bayes}(X))^2 \right] = \left( \theta - \frac{1}{3} \right)^2 P_\theta(X = 0) + \left( \theta - \frac{2}{3} \right)^2 P_\theta(X = 1)$$
$$= \left( \theta - \frac{1}{3} \right)^2 (1 - \theta) + \left( \theta - \frac{2}{3} \right)^2 \theta = \frac{1}{9} - \frac{1}{3}\theta + \frac{1}{3}\theta^2.$$
The risk function of the MLE $d_{MLE}(X) = X$ is
$$R(\theta, d_{MLE}) = E_\theta\left[ (\theta - X)^2 \right] = (0 - \theta)^2 P_\theta(X = 0) + (1 - \theta)^2 P_\theta(X = 1)$$
$$= \theta^2 (1 - \theta) + (1 - \theta)^2 \theta = \theta(1 - \theta).$$
The following graph shows the risk functions of the Bayes estimate (solid line) and the MLE (dashed line). The Bayes estimate has smaller risk than the MLE over most of the range of $\theta \in [0, 1]$, but neither estimator dominates the other.
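The two risk functions derived above are easy to recompute and plot; here is a minimal sketch using matplotlib that also locates the interval on which the Bayes estimate has the smaller risk:

```python
import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(0.0, 1.0, 1001)
risk_bayes = 1/9 - theta/3 + theta**2/3   # risk of the Bayes estimate (X + 1)/3
risk_mle = theta * (1.0 - theta)          # risk of the MLE X

plt.plot(theta, risk_bayes, '-', label='Bayes estimate')
plt.plot(theta, risk_mle, '--', label='MLE')
plt.xlabel('theta')
plt.ylabel('risk')
plt.legend()
plt.show()

better = theta[risk_bayes < risk_mle]
print(better.min(), better.max())  # Bayes risk is smaller for theta in about (.09, .91)
```

Setting the two risk expressions equal gives $12\theta^2 - 12\theta + 1 = 0$, so the curves cross at $\theta = (3 \pm \sqrt{6})/6 \approx .092$ and $.908$.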
Admissibility
Minimax estimators and Bayes estimators are good
estimators in the sense that their risk functions have
certain good properties; minimax estimators minimize the
worst case risk and Bayes estimators minimize a weighted
average of the risk. It is also useful to characterize bad
estimators.
Definition: An estimator $\delta$ is inadmissible if there is an estimator $\delta'$ that dominates $\delta$, meaning that
$$R(\theta, \delta') \le R(\theta, \delta) \text{ for all } \theta \text{ and}$$
$$R(\theta, \delta') < R(\theta, \delta) \text{ for at least one } \theta.$$
If there is no estimator $\delta'$ that dominates $\delta$, then $\delta$ is admissible.
Example: Let $X$ be a sample of size one from a $N(\theta, 1)$ distribution and consider estimating $\theta$ with squared error loss.
Let $\delta(X) = 2X$. Then
$$R(\theta, \delta) = E_\theta\left[ (2X - \theta)^2 \right] = Var(2X) + \left( E_\theta[2X] - \theta \right)^2 = 4 + \theta^2.$$
The risk function of the MLE $\delta_{MLE}(X) = X$ is
$$R(\theta, \delta_{MLE}) = E_\theta\left[ (X - \theta)^2 \right] = Var(X) + \left( E_\theta[X] - \theta \right)^2 = 1.$$
Since $1 < 4 + \theta^2$ for all $\theta$, $\delta_{MLE}(X) = X$ dominates $\delta(X) = 2X$, and hence $\delta(X) = 2X$ is inadmissible.
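A Monte Carlo check of these two risk functions (a sketch; the sample size and the particular $\theta$ values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
for theta in [-2.0, 0.0, 3.0]:
    x = rng.normal(theta, 1.0, size=1_000_000)  # X ~ N(theta, 1)
    risk_2x = np.mean((2.0 * x - theta) ** 2)   # approx 4 + theta^2
    risk_mle = np.mean((x - theta) ** 2)        # approx 1
    print(theta, risk_2x, risk_mle)
```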
Now consider the constant estimator $\delta(X) = 3$, which ignores the data entirely; its risk is $R(\theta, \delta) = (3 - \theta)^2$, so that $R(3, \delta) = 0$. Consider another estimator $\delta'$ with smaller risk. In particular, it must satisfy $R(3, \delta') \le R(3, \delta) = 0$. Hence,
$$0 = R(3, \delta') = \int (\delta'(x) - 3)^2 \frac{1}{\sqrt{2\pi}} \exp\left( -(x - 3)^2 / 2 \right) dx.$$
Thus, $\delta'(x) = 3$ for almost all $x$, so $\delta' = \delta$, and no estimator dominates $\delta$: the estimator $\delta(X) = 3$ is admissible. Even though $\delta(X) = 3$ is admissible, it is clearly a poor estimator, since it has large risk whenever $\theta$ is far from 3.

The following theorem gives conditions under which a Bayes rule is admissible.

Theorem: Suppose that $d^*$ is a Bayes rule with respect to a prior distribution on $\Theta$ and that either
(1) $\Theta$ is discrete and the prior $\pi(\theta)$ is such that $\pi(\theta) > 0$ for all $\theta$, or
(2) $\Theta$ is an interval and $d^*$ is a Bayes rule with respect to a prior density function $g(\theta)$ such that $g(\theta) > 0$ for all $\theta$, and $R(\theta, d)$ is a continuous function of $\theta$ for all $d$.
Then $d^*$ is admissible.
Proof: We will prove the theorem under assumption (2). The proof is by contradiction. Suppose that $d^*$ is inadmissible. There is then another estimate, $d$, such that $R(\theta, d) \le R(\theta, d^*)$ for all $\theta$, with strict inequality for some $\theta$, say $\theta_0$. Since $R(\theta, d^*) - R(\theta, d)$ is a continuous function of $\theta$, there is an $\epsilon > 0$ and an interval $(\theta_0 - h, \theta_0 + h)$ such that
$$R(\theta, d^*) - R(\theta, d) \ge \epsilon \quad \text{for } \theta_0 - h \le \theta \le \theta_0 + h.$$
Then,
$$B(d^*) - B(d) = \int \left[ R(\theta, d^*) - R(\theta, d) \right] g(\theta)\, d\theta \ge \int_{\theta_0 - h}^{\theta_0 + h} \left[ R(\theta, d^*) - R(\theta, d) \right] g(\theta)\, d\theta \ge \epsilon \int_{\theta_0 - h}^{\theta_0 + h} g(\theta)\, d\theta > 0,$$
which contradicts the fact that $d^*$ minimizes the Bayes risk. The proof is complete.
(Note that the conditions of the theorem are sufficient but not necessary for admissibility; for example, they do not cover the admissible estimator $\delta(X) = 3$ for the normal distribution above.)