11.1 Sufficient Statistic
Let us consider an i.i.d. sample $X_1, \dots, X_n$ with distribution $\mathbb{P}_\theta$ from the family $\{\mathbb{P}_\theta : \theta \in \Theta\}$. Imagine that there are two people A and B, and that A observes the entire sample $X_1, \dots, X_n$, while B observes only one number $T = T(X_1, \dots, X_n)$, a function of the sample. We say that $T$ is a sufficient statistic if
$$
\mathbb{P}_\theta\big((X_1, \dots, X_n) \,\big|\, T\big) = \mathbb{P}_0\big((X_1, \dots, X_n) \,\big|\, T\big),
\qquad (11.1)
$$
i.e. the conditional distribution of the vector $(X_1, \dots, X_n)$ given $T$ does not depend on the parameter $\theta$ and is equal to some fixed $\mathbb{P}_0$.
If this happens then we can say that $T$ contains all the information about the parameter $\theta$ of the distribution of the sample, since given $T$ the distribution of the sample is always the same no matter what $\theta$ is. Another way to think about this is: why does the second observer B have as much information about $\theta$ as observer A? Simply, given $T$, the second observer B can generate another sample $X_1', \dots, X_n'$ by drawing it according to the distribution $\mathbb{P}_0(X_1, \dots, X_n \,|\, T)$. B can do this because it does not require the knowledge of $\theta$. But by (11.1) this new sample $X_1', \dots, X_n'$ will have the same distribution as $X_1, \dots, X_n$, so B will have at his/her disposal as much data as the first observer A.
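For example, suppose $X_1, \dots, X_n$ are i.i.d. Bernoulli$(\theta)$ and let $T = X_1 + \dots + X_n$. For any $(x_1, \dots, x_n) \in \{0, 1\}^n$ with $x_1 + \dots + x_n = t$,
$$
\mathbb{P}_\theta(X_1 = x_1, \dots, X_n = x_n \,|\, T = t)
= \frac{\theta^t (1 - \theta)^{n - t}}{\binom{n}{t}\, \theta^t (1 - \theta)^{n - t}}
= \frac{1}{\binom{n}{t}},
$$
which does not depend on $\theta$, so $T$ is sufficient. Here the regeneration step is very concrete: given $T = t$, observer B simply places $t$ ones at $n$ positions chosen uniformly at random.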
The next result tells us how to find sufficient statistics, if possible.
Theorem. (Neyman-Fisher factorization criterion.) $T = T(X_1, \dots, X_n)$ is a sufficient statistic if and only if the joint p.d.f. or p.f. of $(X_1, \dots, X_n)$ can be represented as
$$
f(x_1, \dots, x_n \,|\, \theta) \equiv f(x_1 | \theta) \cdots f(x_n | \theta)
= u(x_1, \dots, x_n)\, v(T(x_1, \dots, x_n), \theta)
\qquad (11.2)
$$
for some functions $u$ and $v$. ($u$ does not depend on the parameter $\theta$ and $v$ depends on the data only through $T$.)
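For example, for an i.i.d. Poisson$(\lambda)$ sample the joint p.f. factors as
$$
f(x_1, \dots, x_n \,|\, \lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}
= \frac{1}{x_1! \cdots x_n!}\, \lambda^{T(x_1, \dots, x_n)}\, e^{-n\lambda},
\qquad T(x_1, \dots, x_n) = x_1 + \dots + x_n,
$$
so taking $u(x_1, \dots, x_n) = 1/(x_1! \cdots x_n!)$ and $v(T, \lambda) = \lambda^T e^{-n\lambda}$ shows that $T = X_1 + \dots + X_n$ is a sufficient statistic. In the Bernoulli example above, the same criterion applies with $u \equiv 1$ and $v(T, \theta) = \theta^T (1 - \theta)^{n - T}$.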
Proof. We will only consider the simpler case in which the distribution of the sample is discrete.
1. First let us assume that $T = T(X_1, \dots, X_n)$ is a sufficient statistic. Since the distribution is discrete, we have
$$
f(x_1, \dots, x_n \,|\, \theta) = \mathbb{P}_\theta(X_1 = x_1, \dots, X_n = x_n),
$$
i.e. the joint p.f. is just the probability that the sample takes values $x_1, \dots, x_n$. If $X_1 = x_1, \dots, X_n = x_n$ then $T = T(x_1, \dots, x_n)$ and, therefore,
$$
\mathbb{P}_\theta(X_1 = x_1, \dots, X_n = x_n)
= \mathbb{P}_\theta(X_1 = x_1, \dots, X_n = x_n,\, T = T(x_1, \dots, x_n))
$$
$$
= \mathbb{P}_\theta(X_1 = x_1, \dots, X_n = x_n \,|\, T = T(x_1, \dots, x_n))\,
\mathbb{P}_\theta(T = T(x_1, \dots, x_n)).
$$
Hence,
$$
f(x_1, \dots, x_n \,|\, \theta)
= \mathbb{P}_\theta(X_1 = x_1, \dots, X_n = x_n \,|\, T = T(x_1, \dots, x_n))\,
\mathbb{P}_\theta(T = T(x_1, \dots, x_n)).
$$
Since $T$ is sufficient, by definition, this means that the first conditional probability
$$
\mathbb{P}_\theta(X_1 = x_1, \dots, X_n = x_n \,|\, T = T(x_1, \dots, x_n)) = u(x_1, \dots, x_n)
$$
for some function $u$ independent of $\theta$, since this conditional probability does not depend on $\theta$. Also, the second factor $\mathbb{P}_\theta(T = T(x_1, \dots, x_n))$ depends on $x_1, \dots, x_n$ only through $T(x_1, \dots, x_n)$, so it can be written as $v(T(x_1, \dots, x_n), \theta)$, and this proves the factorization (11.2).

2. Let us now assume that (11.2) holds and show that $T$ is sufficient. By the definition of conditional probability,
$$
\mathbb{P}_\theta(X_1 = x_1, \dots, X_n = x_n \,|\, T(X_1, \dots, X_n) = t)
= \frac{\mathbb{P}_\theta(X_1 = x_1, \dots, X_n = x_n,\, T(X_1, \dots, X_n) = t)}{\mathbb{P}_\theta(T(X_1, \dots, X_n) = t)}.
\qquad (11.3)
$$
Both probabilities in (11.3) are equal to $0$ unless
$$
t = T(x_1, \dots, x_n),
\qquad (11.4)
$$
so we only need to consider the case when (11.4) holds. Then the numerator in (11.3) is
$$
\mathbb{P}_\theta(X_1 = x_1, \dots, X_n = x_n,\, T(X_1, \dots, X_n) = t)
= \mathbb{P}_\theta(X_1 = x_1, \dots, X_n = x_n),
$$
since we just drop the condition that holds anyway. By (11.2), this can be written as
$$
u(x_1, \dots, x_n)\, v(T(x_1, \dots, x_n), \theta) = u(x_1, \dots, x_n)\, v(t, \theta).
$$
Let us introduce the set $A(t) = \{(x_1, \dots, x_n) : T(x_1, \dots, x_n) = t\}$ of all possible combinations of the $x$'s such that $T(x_1, \dots, x_n) = t$. Then, obviously, the denominator in (11.3) can be written as
$$
\mathbb{P}_\theta(T(X_1, \dots, X_n) = t)
= \mathbb{P}_\theta((X_1, \dots, X_n) \in A(t))
= \sum_{(x_1, \dots, x_n) \in A(t)} \mathbb{P}_\theta(X_1 = x_1, \dots, X_n = x_n)
$$
$$
= \sum_{(x_1, \dots, x_n) \in A(t)} u(x_1, \dots, x_n)\, v(t, \theta),
$$
where in the last step we used (11.2) and (11.4). Therefore, (11.3) can be written as
$$
\frac{u(x_1, \dots, x_n)\, v(t, \theta)}{\sum_{(x_1', \dots, x_n') \in A(t)} u(x_1', \dots, x_n')\, v(t, \theta)}
= \frac{u(x_1, \dots, x_n)}{\sum_{(x_1', \dots, x_n') \in A(t)} u(x_1', \dots, x_n')},
$$
and since this does not depend on $\theta$ anymore, it proves that $T$ is sufficient.
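In the Bernoulli example, this formula recovers the answer computed directly from the definition: $u \equiv 1$, the set $A(t)$ consists of the $\binom{n}{t}$ arrangements of $t$ ones and $n - t$ zeros, and the conditional probability equals $1/\binom{n}{t}$.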