3 - Principles of Data Reduction
A sufficient statistic for a parameter θ captures all the information about θ contained in the sample. Any additional information in the sample besides the value of the sufficient statistic does not contain any more information about θ.
Def. A statistic T(X) is a sufficient statistic for θ if the conditional distribution of the sample X given the value of T(X) does not depend on θ.
The sufficiency principle states that if T(X) is a sufficient statistic for θ, then any inference about θ should depend on the sample X only through the value T(X). That is, if x and y are two sample points such that T(x) = T(y), then the inference about θ should be the same whether X = x or X = y is observed.
Theorem. If p(x|θ) is the joint PDF or PMF of X and q(t|θ) is the PMF or PDF of T(X), then T(X) is a sufficient statistic for θ if, for every x in the sample space, the ratio p(x|θ)/q(T(x)|θ) is constant as a function of θ.
Example: Let X1, X2, . . . , Xn be a random sample from Be(θ). Is the sample sum a sufficient statistic for θ?
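A sketch of the verification using the preceding theorem (assuming Be(θ) denotes the Bernoulli distribution, so that T(X) = ∑ Xi has a Binomial(n, θ) distribution):
\[
\frac{p(\mathbf{x} \mid \theta)}{q(t \mid \theta)}
= \frac{\prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i}}{\binom{n}{t}\, \theta^{t} (1-\theta)^{n-t}}
= \frac{\theta^{t} (1-\theta)^{n-t}}{\binom{n}{t}\, \theta^{t} (1-\theta)^{n-t}}
= \frac{1}{\binom{n}{t}}, \qquad t = \sum_{i=1}^{n} x_i,
\]
which is constant as a function of θ. Hence the sample sum is sufficient for θ.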
Theorem. Let f(x|θ) denote the joint PDF or PMF of X. A statistic T(X) is a sufficient statistic for θ if and only if there exist functions g(t|θ) and h(x) such that, for all sample points x and all parameter points θ,
f(x|θ) = g(T(x)|θ)h(x)
Note: When the parameter is a vector (e.g. (θ1, θ2)), then the sufficient statistic is a vector and its components are viewed as jointly sufficient. One-to-one functions of (jointly) sufficient statistics are also (jointly) sufficient.
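For instance (a sketch of the standard argument, not part of the original notes), the Bernoulli example above can also be settled directly by factorization:
\[
f(\mathbf{x} \mid \theta) = \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i}
= \underbrace{\theta^{t} (1-\theta)^{n-t}}_{g(t \mid \theta)} \cdot \underbrace{1}_{h(\mathbf{x})},
\qquad t = \sum_{i=1}^{n} x_i,
\]
so T(X) = ∑ Xi is sufficient, in agreement with the ratio computation.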
Def. A sufficient statistic T(X) is called a minimal sufficient statistic if, for any other sufficient statistic T′(X), T(X) is a function of T′(X).
Remarks:
1. A minimal sufficient statistic achieves the greatest possible data reduction for a sufficient statistic.
2. A minimal sufficient statistic is not unique. Any one-to-one function of a minimal sufficient statistic is also a minimal sufficient statistic.
Theorem. Let f(x|θ) be the joint PDF or PMF of a sample X. Suppose there exists a function T(x) such that, for every two sample points x and y, the ratio f(x|θ)/f(y|θ) is constant as a function of θ if and only if T(x) = T(y). Then T(X) is a minimal sufficient statistic for θ.
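Continuing the Bernoulli example (a sketch, not part of the original notes): for two sample points x and y,
\[
\frac{f(\mathbf{x} \mid \theta)}{f(\mathbf{y} \mid \theta)}
= \frac{\theta^{\sum_i x_i} (1-\theta)^{n - \sum_i x_i}}{\theta^{\sum_i y_i} (1-\theta)^{n - \sum_i y_i}}
= \left( \frac{\theta}{1-\theta} \right)^{\sum_i x_i - \sum_i y_i},
\]
which is constant in θ if and only if ∑ xi = ∑ yi. Hence T(X) = ∑ Xi is minimal sufficient for θ.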
Previously, we considered sufficient statistics, which contain all the information about θ that is available in the sample. Now, we introduce a different sort of statistic, which has a complementary purpose.
Def. A statistic S(X ) whose distribution does not depend on the parameter θ is called an
ancillary statistic.
Alone, an ancillary statistic contains no information about θ. However, when used in
conjunction with other statistics, it can contain valuable information for inferences about θ.
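A standard example (not in the original notes): if X1, X2, . . . , Xn is a random sample from a location family with PDF f(x − θ), then the range R = X(n) − X(1) is ancillary. Writing Xi = Zi + θ with Zi ∼ f,
\[
R = X_{(n)} - X_{(1)} = (Z_{(n)} + \theta) - (Z_{(1)} + \theta) = Z_{(n)} - Z_{(1)},
\]
whose distribution does not involve θ.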
Def. A statistic T(X) is complete if and only if the only unbiased estimator of zero that is a function of T is the statistic that is identically zero with probability 1 (i.e., a statistic that is degenerate at the point 0). Equivalently, if Eθ g(T) = 0 for all θ, then Pθ(g(T) = 0) = 1 for all θ. Also, note that completeness is (strictly speaking) a property of a family of distributions.
Theorem. Given a random sample from an exponential family, T(x) as previously defined also yields a complete sufficient statistic (or a set of jointly complete sufficient statistics). (NOTE: This theorem holds as long as a certain condition is also true. Due to the level of the course, we will only consider cases where that condition holds.)
Example: Let X1, X2, . . . , Xn be a random sample from Be(θ). Is the sample sum a complete statistic?
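A sketch of the standard argument (assuming Be(θ) is Bernoulli, so T = ∑ Xi ∼ Binomial(n, θ)): suppose Eθ g(T) = 0 for every θ ∈ (0, 1). Then
\[
0 = \sum_{t=0}^{n} g(t) \binom{n}{t} \theta^{t} (1-\theta)^{n-t}
= (1-\theta)^{n} \sum_{t=0}^{n} g(t) \binom{n}{t} \left( \frac{\theta}{1-\theta} \right)^{t}.
\]
The last sum is a polynomial in r = θ/(1 − θ) that vanishes for every r > 0, so each coefficient g(t)·C(n, t) must be zero, i.e. g(t) = 0 for t = 0, 1, . . . , n. Hence the sample sum is complete. (Alternatively, this follows from the exponential-family theorem above.)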
Basu's Theorem
Theorem. If T(X) is a complete (and minimal) sufficient statistic, then T(X) is independent of every ancillary statistic.
Remarks:
1. This theorem is useful since it allows us to deduce the independence of two statistics without ever finding their joint distribution.
2. The word "minimal" may be omitted from the statement of the theorem and it will remain true: if a minimal sufficient statistic exists, then any complete sufficient statistic is also a minimal sufficient statistic.
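A classic application (a sketch, not from the original notes): let X1, . . . , Xn be a random sample from N(μ, σ²) with σ² known. Then X̄ is a complete sufficient statistic for μ, while the sample variance S² is ancillary for μ (its distribution depends only on σ²). By Basu's theorem,
\[
\bar{X} \ \perp\ S^{2} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^{2},
\]
a result obtained without computing the joint distribution of X̄ and S².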
Def. Let f(x|θ) denote the joint PDF or PMF of the sample X. Given that X = x is observed, the function of θ defined by L(θ|x) = f(x|θ) is called the likelihood function.
The likelihood principle states that if x and y are two sample points such that L(θ|x) is proportional to L(θ|y), i.e. there exists a constant C(x, y) such that L(θ|x) = C(x, y)L(θ|y) for all θ, then the conclusions drawn from x and y should be identical.
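A standard illustration (not part of the original notes): suppose x records 3 successes in 10 Bernoulli(θ) trials under binomial sampling, while y records that the 3rd success occurred on the 10th trial under negative binomial sampling. Then
\[
L(\theta \mid x) = \binom{10}{3} \theta^{3} (1-\theta)^{7}, \qquad
L(\theta \mid y) = \binom{9}{2} \theta^{3} (1-\theta)^{7},
\]
so L(θ|x) is proportional to L(θ|y) with C(x, y) = C(10, 3)/C(9, 2), and the likelihood principle says the two experiments should lead to identical conclusions about θ.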