
Lecture 11

11.1 Sufficient statistic.


(Textbook, Section 6.7)
We consider an i.i.d. sample X1 , . . . , Xn with distribution θ from the family 

{ θ : θ ∈ Θ}. Imagine that there are two people A and B, and that


1. A observes the entire sample X1, . . . , Xn,

2. B observes only one number T = T(X1, . . . , Xn), which is a function of the sample.
Clearly, A has more information about the distribution of the data and, in particular, about the unknown parameter θ. However, in some cases, for some choices of the function T (when T is a so-called sufficient statistic), B will have as much information about θ as A does.
Definition. T = T(X1, . . . , Xn) is called a sufficient statistic if

ℙθ(X1, . . . , Xn | T) = ℙ0(X1, . . . , Xn | T),    (11.1)

i.e. the conditional distribution of the vector (X1, . . . , Xn) given T does not depend
on the parameter θ and is equal to ℙ0.

If this happens, we can say that T contains all the information about the parameter θ of the distribution of the sample, since given T the distribution of the sample is always the same no matter what θ is. Another way to think about this is to ask: why does the second observer B have as much information about θ as observer A? Simply, given T, the second observer B can generate another sample X1′, . . . , Xn′ by drawing it according to the distribution ℙ0(X1, . . . , Xn | T). B can do this because it does not require the knowledge of θ. But by (11.1) this new sample X1′, . . . , Xn′ will have the same distribution as X1, . . . , Xn, so B will have at his/her disposal as much data as the first observer A.
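
To make the regeneration step concrete, here is a minimal Python sketch. It assumes the Bernoulli example from the end of this lecture, where T = ΣXi is sufficient and, given T = t, the sample is uniformly distributed over all 0-1 sequences containing t ones; the function name regenerate_sample is ours, purely for illustration.

import random

def regenerate_sample(n, t):
    """Draw (X1', ..., Xn') from P0(X1, ..., Xn | T = t) for Bernoulli data:
    place t ones in positions chosen uniformly at random among n slots."""
    positions = random.sample(range(n), t)  # t distinct positions for the ones
    sample = [0] * n
    for i in positions:
        sample[i] = 1
    return sample

# B only knows T = 7 from a sample of size n = 10; no value of p is needed.
print(regenerate_sample(10, 7))

Note that the unknown parameter p appears nowhere in this procedure, which is exactly why B loses nothing by observing only T.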
The next result tells us how to find sufficient statistics, if possible.
Theorem. (Neyman-Fisher factorization criterion.) T = T(X1, . . . , Xn) is a sufficient statistic if and only if the joint p.d.f. or p.f. of (X1, . . . , Xn) can be represented as

f(x1, . . . , xn|θ) ≡ f(x1|θ) · · · f(xn|θ) = u(x1, . . . , xn) v(T(x1, . . . , xn), θ)    (11.2)

for some functions u and v. (u does not depend on the parameter θ, and v depends on the data only through T.)
Proof. We will only consider the simpler case when the distribution of the sample is discrete.

1. First let us assume that T = T(X1, . . . , Xn) is a sufficient statistic. Since the distribution is discrete, we have

f(x1, . . . , xn|θ) = ℙθ(X1 = x1, . . . , Xn = xn),

i.e. the joint p.f. is just the probability that the sample takes the values x1, . . . , xn. If X1 = x1, . . . , Xn = xn then T = T(x1, . . . , xn) and, therefore,

ℙθ(X1 = x1, . . . , Xn = xn) = ℙθ(X1 = x1, . . . , Xn = xn, T = T(x1, . . . , xn)).

We can write this last probability via a conditional probability:

ℙθ(X1 = x1, . . . , Xn = xn, T = T(x1, . . . , xn))
= ℙθ(X1 = x1, . . . , Xn = xn | T = T(x1, . . . , xn)) ℙθ(T = T(x1, . . . , xn)).

Putting everything together, we get

f(x1, . . . , xn|θ) = ℙθ(X1 = x1, . . . , Xn = xn | T = T(x1, . . . , xn)) ℙθ(T = T(x1, . . . , xn)).

Since T is sufficient, by definition the first factor

ℙθ(X1 = x1, . . . , Xn = xn | T = T(x1, . . . , xn)) = u(x1, . . . , xn)

for some function u independent of θ, since this conditional probability does not depend on θ. Also,

ℙθ(T = T(x1, . . . , xn)) = v(T(x1, . . . , xn), θ)

depends on x1, . . . , xn only through T(x1, . . . , xn). So we proved that if T is sufficient, then (11.2) holds.
2. Let us now show the opposite: if (11.2) holds, then T is sufficient. By the definition of conditional probability, we can write

ℙθ(X1 = x1, . . . , Xn = xn | T(X1, . . . , Xn) = t)
= ℙθ(X1 = x1, . . . , Xn = xn, T(X1, . . . , Xn) = t) / ℙθ(T(X1, . . . , Xn) = t).    (11.3)

First of all, both sides are equal to zero unless

t = T(x1, . . . , xn),    (11.4)

because when X1 = x1, . . . , Xn = xn, T(X1, . . . , Xn) must be equal to T(x1, . . . , xn). For this t, the numerator in (11.3) is

ℙθ(X1 = x1, . . . , Xn = xn, T(X1, . . . , Xn) = t) = ℙθ(X1 = x1, . . . , Xn = xn),

since we just drop a condition that holds anyway. By (11.2), this can be written as

u(x1, . . . , xn) v(T(x1, . . . , xn), θ) = u(x1, . . . , xn) v(t, θ).

As for the denominator in (11.3), let us consider the set

A(t) = {(x1, . . . , xn) : T(x1, . . . , xn) = t}

of all possible combinations of the x's such that T(x1, . . . , xn) = t. Then, obviously, the denominator in (11.3) can be written as

ℙθ(T(X1, . . . , Xn) = t) = ℙθ((X1, . . . , Xn) ∈ A(t))
= Σ_{(x1,...,xn)∈A(t)} ℙθ(X1 = x1, . . . , Xn = xn) = Σ_{(x1,...,xn)∈A(t)} u(x1, . . . , xn) v(t, θ),
where in the last step we used (11.2) and (11.4). Therefore, (11.3) can be written as

u(x1, . . . , xn) v(t, θ) / Σ_{A(t)} u(x1, . . . , xn) v(t, θ) = u(x1, . . . , xn) / Σ_{A(t)} u(x1, . . . , xn),

and since this does not depend on θ anymore, it proves that T is sufficient.
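
To see the criterion at work numerically, here is a small Python sketch (the helper name cond_prob is ours, and the Bernoulli model anticipates the example below). It computes the conditional probability in (11.3) for two different values of the parameter p by brute-force enumeration of A(t), and the two answers coincide:

from itertools import product

def cond_prob(x, p):
    """P_p(X1 = x1, ..., Xn = xn | T = sum(x)) for an i.i.d. Bernoulli(p) sample."""
    n, t = len(x), sum(x)
    joint = p**t * (1 - p)**(n - t)  # P_p(X = x); note T(x) = t automatically
    # P_p(T = t): sum the joint p.f. over the set A(t) of samples with sum t.
    denom = sum(p**sum(y) * (1 - p)**(n - sum(y))
                for y in product([0, 1], repeat=n) if sum(y) == t)
    return joint / denom

x = (1, 0, 1, 1, 0)
print(cond_prob(x, 0.3), cond_prob(x, 0.8))  # both ≈ 0.1 = 1/C(5,3), independent of p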

Example. The Bernoulli distribution B(p) has p.f. f(x|p) = p^x (1 − p)^(1−x) for x ∈ {0, 1}. The joint p.f. is

f(x1, . . . , xn|p) = p^(Σxi) (1 − p)^(n − Σxi) = v(Σxi, p),

i.e. it depends on the x's only through the sum Σxi. Therefore, by the Neyman-Fisher factorization criterion, T = ΣXi is a sufficient statistic. Here we set

v(T, p) = p^T (1 − p)^(n−T) and u(x1, . . . , xn) = 1.
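
As a quick sanity check of this factorization (a minimal sketch; the function names are ours), one can verify numerically that the product of the Bernoulli marginals equals u · v with u ≡ 1:

def joint_pf(x, p):
    """Product of Bernoulli(p) marginals: f(x1|p) ... f(xn|p)."""
    out = 1.0
    for xi in x:
        out *= p**xi * (1 - p)**(1 - xi)
    return out

def v(T, p, n):
    """v(T, p) = p^T (1 - p)^(n - T), the factor carrying all dependence on p."""
    return p**T * (1 - p)**(n - T)

x, p = (1, 0, 1, 1, 0), 0.3
assert abs(joint_pf(x, p) - v(sum(x), p, len(x))) < 1e-12  # u = 1, so f = v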
