How are Rao-Blackwell estimators better?
In #7(a) on the fourth problem set we see how a horrible flaw in an estimator is remedied by the Rao-Blackwell process. That is one respect in which that particular Rao-Blackwell estimator is better than the flawed estimator in that problem.

Sometimes the goodness of an estimator of an unobservable quantity $\theta$ is measured by the smallness of its mean squared error $E\big((\text{estimator} - \theta)^2\big)$. Another respect in which Rao-Blackwell estimators are better than the estimators upon which they improve is that they typically have smaller mean squared errors, and they never have bigger ones (in some cases the M.S.E. remains the same rather than getting smaller, however).

To the question "Which is bigger: $E(Y^2)$ or $(E(Y))^2$?" recall that the answer follows from the observation that $E(Y^2) - (E(Y))^2 = \operatorname{var}(Y)$, and $\operatorname{var}(Y)$ necessarily is $\ge 0$; hence $E(Y^2) \ge (E(Y))^2$.

Armed with this observation, let us examine the mean squared error of a Rao-Blackwell estimator. The crude estimator is $\delta(X)$. The Rao-Blackwell estimator is $\delta_0(X) = E(\delta(X) \mid T(X))$. The Rao-Blackwell estimator's mean squared error is
$$
\begin{aligned}
E\big((\delta_0(X) - \theta)^2\big)
&= E\Big(\big(E(\delta(X) \mid T(X)) - \theta\big)^2\Big) \\
&= E\Big(\big(E(\delta(X) - \theta \mid T(X))\big)^2\Big) && \text{(since $\theta$ is constant)} \\
&\le E\Big(E\big((\delta(X) - \theta)^2 \mid T(X)\big)\Big) && \text{(since $(E(Y))^2 \le E(Y^2)$)} \\
&= E\big((\delta(X) - \theta)^2\big) && \text{(since $E(E(U \mid V)) = E(U)$)} \\
&= \text{the crude estimator's mean squared error.}
\end{aligned}
$$
Summary: the R-B estimator's M.S.E. $\le$ the crude estimator's M.S.E.
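That inequality is easy to see numerically. Here is a minimal simulation sketch, assuming (purely for illustration, since the problem-set example is not reproduced here) that $X_1, \ldots, X_n$ are i.i.d. Poisson$(\lambda)$ and the target is $\theta = P(X_1 = 0) = e^{-\lambda}$: the crude unbiased estimator is $\delta(X) = \mathbf{1}\{X_1 = 0\}$, the sufficient statistic is $T(X) = X_1 + \cdots + X_n$, and conditioning gives $\delta_0(X) = E(\delta(X) \mid T(X)) = ((n-1)/n)^{T}$. Both estimators are unbiased for $\theta$, but the Rao-Blackwell version has the smaller M.S.E.

```python
import numpy as np

# Minimal simulation sketch (an assumed Poisson illustration, not the
# problem-set exercise): X_1, ..., X_n i.i.d. Poisson(lam), and the target is
# theta = P(X_1 = 0) = exp(-lam).
#   Crude unbiased estimator:  delta(X)   = 1{X_1 = 0}
#   Sufficient statistic:      T(X)       = X_1 + ... + X_n
#   Rao-Blackwell estimator:   delta_0(X) = E(delta(X) | T) = ((n-1)/n)**T

rng = np.random.default_rng(0)
n, lam, reps = 10, 1.0, 200_000
theta = np.exp(-lam)

X = rng.poisson(lam, size=(reps, n))
crude = (X[:, 0] == 0).astype(float)     # delta(X)
T = X.sum(axis=1)
rb = ((n - 1) / n) ** T                  # delta_0(X)

for name, est in [("crude", crude), ("Rao-Blackwell", rb)]:
    print(f"{name:>13}: mean = {est.mean():.4f}  (theta = {theta:.4f}),"
          f"  M.S.E. = {((est - theta) ** 2).mean():.4f}")
```

Both sample means land near $\theta$ (unbiasedness), while the M.S.E. of the Rao-Blackwell estimator comes out markedly smaller, as the derivation above guarantees.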
This bottom-line summary is the Rao-Blackwell Theorem.

When is a Rao-Blackwell estimator the best estimator? The answer to that involves the concept of completeness.
Completeness
Suppose $X_1, \ldots, X_n$ are i.i.d. $N(\mu, 1^2)$. Let $\bar{X}_n = (X_1 + \cdots + X_n)/n$. Observe that $X_1 - \bar{X}_n$ depends only on the data $X_1, \ldots, X_n$ and not on the unobservable $\mu$, i.e., $X_1 - \bar{X}_n$ is a statistic. Moreover $E(X_1 - \bar{X}_n) = 0$ regardless of the value of $\mu$: changing the value of the unobservable does not change the fact that the expectation of this statistic is zero. In other words, $X_1 - \bar{X}_n$ is an unbiased estimator of zero.

Suppose $W_1, \ldots, W_9$ are i.i.d. Uniform$(\theta, \theta + 1)$. Let $D = \max\{W_1, \ldots, W_9\} - \min\{W_1, \ldots, W_9\}$. It can be shown that $E(D) = 0.8$ regardless of the value of $\theta$. The value of $D$ depends only on the data and not on the unobservable $\theta$, so $D$ is a statistic. This statistic is not an unbiased estimator of zero, but $g(D) = D - 0.8$ is an unbiased estimator of zero. And $h(W_1) = \sin(2\pi W_1)$ is also an unbiased estimator of zero. (Both claims are checked numerically in the simulation sketch at the end of this section.)

A locution about to be defined allows us to encapsulate the information above in these simple statements:

$X_1 - \bar{X}_n$ is not a complete statistic.
$D$ is not a complete statistic.
$W_1$ is not a complete statistic.

Definition: A statistic $U$ is complete iff there is no function $g$ such that $g(U)$ is an unbiased estimator of zero (except, of course, the identically zero function $g \equiv 0$).

Lehmann-Scheffé Theorem: If $T(X)$ is a complete sufficient statistic for $\theta$ and $\delta(X)$ is an unbiased estimator of $\theta$, then $\delta_0(X) = E(\delta(X) \mid T(X))$ is an unbiased estimator of $\theta$ whose mean squared error is no larger than that of any other unbiased estimator of $\theta$.

Proof: Suppose $\eta(X)$ is some other unbiased estimator of $\theta$, and let $\eta_0(X) = E(\eta(X) \mid T(X))$. Then $\delta_0(X) - \eta_0(X)$ is a function of $T(X)$ and is an unbiased estimator of zero. Because of completeness, we must therefore have $\delta_0 = \eta_0$, i.e., they are the same estimator. Since the Rao-Blackwell theorem gives M.S.E. of $\eta_0$ $\le$ M.S.E. of $\eta$, it follows that the M.S.E. of $\delta_0$ is no larger than that of $\eta$. In other words, the Rao-Blackwell process, which by the Rao-Blackwell theorem improves any unbiased estimator, will always yield the same result no matter which unbiased estimator we start with.

Example: Suppose $X_1, \ldots, X_n$ are i.i.d. $N(\mu, 1^2)$. Then there are many functions of $X_1, \ldots, X_n$ that are unbiased estimators of $\mu$ (e.g., . . . . . . .). But $\bar{X}_n$ is complete, sufficient, and unbiased. Therefore $\bar{X}_n$ is the best unbiased estimator of $\mu$.

Q: How do we know $\bar{X}_n$ is complete?
A: Think about two-sided Laplace transforms. . . . . . .
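Returning to the uniform example above, here is the promised quick simulation sketch: it draws $W_1, \ldots, W_9$ i.i.d. Uniform$(\theta, \theta + 1)$ for several values of $\theta$ and checks that $g(D) = D - 0.8$ and $h(W_1) = \sin(2\pi W_1)$ average out to approximately zero regardless of $\theta$, which is exactly why $D$ and $W_1$ fail to be complete.

```python
import numpy as np

# Simulation check of the two unbiased estimators of zero described above,
# with W_1, ..., W_9 i.i.d. Uniform(theta, theta + 1):
#   g(D)   = D - 0.8, where D = max(W_i) - min(W_i)   (E(D) = 0.8 when n = 9)
#   h(W_1) = sin(2*pi*W_1)                             (mean 0 over any unit interval)
# Their averages stay near zero no matter which theta is used, which is why
# D and W_1 are not complete statistics.

rng = np.random.default_rng(0)
reps = 200_000
for theta in (0.0, 0.3, 2.7):
    W = rng.uniform(theta, theta + 1.0, size=(reps, 9))
    D = W.max(axis=1) - W.min(axis=1)
    print(f"theta = {theta:3.1f}:  mean of D - 0.8 = {(D - 0.8).mean():+.4f},"
          f"  mean of sin(2*pi*W_1) = {np.sin(2 * np.pi * W[:, 0]).mean():+.4f}")
```

Each printed average sits near zero for every value of $\theta$, so $g(D)$ and $h(W_1)$ are (up to simulation error) unbiased estimators of zero that are not identically zero, which rules out completeness of $D$ and of $W_1$.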