
3 - Principles of Data Reduction

1. The document discusses two principles of data reduction: the sufficiency principle and the likelihood principle. The sufficiency principle promotes methods of data reduction that do not discard information about the unknown parameter while summarizing the data. The likelihood principle describes how the likelihood function contains all the available information about the parameter.
2. A sufficient statistic captures all information about the unknown parameter in a sample. The sufficiency principle states that inference should depend on the sample only through the value of a sufficient statistic.
3. The likelihood function is defined as a function of the parameter given an observed sample. The likelihood principle states that samples yielding proportional likelihood functions should result in identical conclusions.


Principles of Data Reduction

Stat 131 | 2nd Sem, AY 2019-2020


Introduction
An experimenter uses the information in a sample X1 , X2 , . . . , Xn to make inferences about
an unknown parameter θ.
If the sample size is large, the observations may be a long list of numbers that is hard to
interpret. Hence, the experimenter might wish to summarize the information in a sample
by determining a few key features of the sample values.
This is usually done by computing statistics, functions of the sample.
Any statistic, T(X1, ..., Xn), defines a form of data reduction or data summary.
In this chapter, we study two principles of data reduction:
1. Sufficiency Principle - promotes a method of data reduction that does not discard
information about θ while achieving some summarization of the data
2. Likelihood Principle - describes a function of the parameter (determined by the observed
sample) that contains all the information about θ that is available from the sample

Principles of Data Reduction 2 / 14


The Sufficiency Principle



The Sufficiency Principle

A sufficient statistic for a parameter θ captures all the information about θ contained in the
sample. Any additional information in the sample, besides the value of the sufficient statistic,
does not contain any more information about θ.
Def. A statistic T(X) is a sufficient statistic for θ if the conditional distribution of the sample
X given the value of T(X) does not depend on θ.

The sufficiency principle states that if T(X) is a sufficient statistic for θ, then any inference
about θ should depend on the sample X only through the value T(X). That is, if x and y are
two sample points such that T(x) = T(y), then the inference about θ should be the same
whether X = x or X = y is observed.



Theorem

Theorem. If p(x|θ) is the joint PDF or PMF of X and q(t|θ) is the PDF or PMF of T(X),
then T(X) is a sufficient statistic for θ if, for every x in the sample space, the ratio
p(x|θ) / q(T(x)|θ) is constant as a function of θ.
Example: Let X1, X2, ..., Xn be a random sample from Be(θ). Is the sample sum a sufficient statistic for θ?
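A quick numerical check of the definition for this example (an illustrative Python sketch, not part of the slides): for an iid Be(θ) sample, the conditional probability of any sample point x given the sample sum t is 1/C(n, t), which is free of θ.

```python
from math import comb

def cond_prob(x, theta):
    """P(X = x | T = sum(x)) for an iid Bernoulli(theta) sample: the joint
    pmf of x divided by the pmf of the sample sum T ~ Binomial(n, theta)."""
    n, t = len(x), sum(x)
    joint = theta**t * (1 - theta)**(n - t)
    pmf_t = comb(n, t) * theta**t * (1 - theta)**(n - t)
    return joint / pmf_t

x = (1, 0, 1, 1)  # any sample point with t = 3
# the conditional probability is 1/C(4,3) = 0.25 no matter the value of theta
for theta in (0.2, 0.5, 0.9):
    assert abs(cond_prob(x, theta) - 1 / comb(4, 3)) < 1e-12
```

Since the ratio p(x|θ)/q(t|θ) = θᵗ(1-θ)ⁿ⁻ᵗ / [C(n,t) θᵗ(1-θ)ⁿ⁻ᵗ] = 1/C(n,t) contains no θ, the sample sum is sufficient.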



Factorization Theorem

Theorem. Let f(x|θ) denote the joint PDF or PMF of X. A statistic T(X) is a sufficient
statistic for θ if and only if there exist functions g(t|θ) and h(x) such that, for all sample
points x and all parameter points θ,

f(x|θ) = g(T(x)|θ) h(x)

Note: When the parameter is a vector (e.g., (θ1, θ2)), the sufficient statistic is a vector, and
its components are viewed as jointly sufficient. One-to-one functions of (jointly) sufficient
statistics are also (jointly) sufficient.
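As an illustration of the factorization (using a N(μ, σ²) sample, an example of my choosing rather than one from the slide), the joint PDF factors as g(T(x)|μ, σ) · h(x) with T(x) = (Σxᵢ, Σxᵢ²) and h(x) = 1, since Σ(xᵢ - μ)² = Σxᵢ² - 2μΣxᵢ + nμ²:

```python
import math

def normal_joint_pdf(xs, mu, sigma):
    """Joint pdf of an iid N(mu, sigma^2) sample, computed directly."""
    return math.prod(
        math.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
        for x in xs
    )

def factored(xs, mu, sigma):
    """Same pdf via the factorization f = g(T(x) | mu, sigma) * h(x), where
    T(x) = (sum x_i, sum x_i^2) and h(x) = 1."""
    n, s1, s2 = len(xs), sum(xs), sum(x * x for x in xs)
    g = (2 * math.pi * sigma**2) ** (-n / 2) * math.exp(
        -(s2 - 2 * mu * s1 + n * mu**2) / (2 * sigma**2)
    )
    return g * 1.0  # h(x) = 1

xs = [0.5, -1.2, 2.0, 0.3]
for mu, sigma in [(0.0, 1.0), (1.5, 2.0)]:
    assert abs(normal_joint_pdf(xs, mu, sigma) - factored(xs, mu, sigma)) < 1e-12
```

Because g depends on the data only through (Σxᵢ, Σxᵢ²), that pair is jointly sufficient for (μ, σ²).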



Factorization Theorem: Exponential Family
Theorem. Let X1, X2, ..., Xn be iid observations from a PDF or PMF f(x|θ) that belongs to
an exponential family given by

f(x|θ) = h(x) c(θ) exp( Σ_{i=1}^k w_i(θ) t_i(x) )

where θ = (θ1, θ2, ..., θd), d ≤ k. Then

T(X) = ( Σ_{j=1}^n t_1(X_j), ..., Σ_{j=1}^n t_k(X_j) )

is a sufficient statistic for θ.


Example: Let X1, X2, ..., Xn be a random sample from Po(λ), λ > 0. Find a sufficient statistic for λ.
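The Poisson PMF is in the exponential family with t(x) = x, so the theorem gives T(X) = ΣXⱼ. A numerical sanity check (an illustrative sketch, not from the slides): two samples with the same sum have likelihoods whose ratio is free of λ.

```python
import math

def poisson_likelihood(lam, xs):
    """L(lambda | x) = e^(-n*lam) * lam^(sum x) / prod(x_i!) for iid Poisson(lambda)."""
    n = len(xs)
    return math.exp(-n * lam) * lam ** sum(xs) / math.prod(math.factorial(x) for x in xs)

x = [2, 0, 3, 1]  # sum = 6
y = [1, 1, 1, 3]  # same sum = 6
ratios = [poisson_likelihood(lam, x) / poisson_likelihood(lam, y) for lam in (0.5, 1.0, 4.0)]
# the ratio is constant in lambda, so the data enter only through sum(x)
assert max(ratios) - min(ratios) < 1e-12
```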
Minimal Sufficient Statistic

Def. A sufficient statistic T(X) is called a minimal sufficient statistic if, for any other
sufficient statistic T′(X), T(X) is a function of T′(X).

Remarks:
1. A minimal sufficient statistic achieves the greatest possible data reduction for a sufficient
statistic.
2. A minimal sufficient statistic is not unique. Any one-to-one function of a minimal
sufficient statistic is also a minimal sufficient statistic.



Theorem

Theorem. Let f(x|θ) be the joint PDF or PMF of a sample X. Suppose there exists a
function T(x) such that, for every two sample points x and y, the ratio f(x|θ) / f(y|θ) is
constant as a function of θ if and only if T(x) = T(y). Then T(X) is a minimal sufficient
statistic for θ.
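The two directions of the ratio criterion can be seen numerically for Bernoulli samples (an illustrative sketch, not from the slides): the ratio of joint PMFs is constant in θ exactly when the two sample points share the same sum.

```python
def bern_joint(x, theta):
    """Joint pmf of an iid Bernoulli(theta) sample point x."""
    t, n = sum(x), len(x)
    return theta**t * (1 - theta)**(n - t)

x, y, z = (1, 0, 1), (0, 1, 1), (0, 0, 1)  # T(x) = T(y) = 2, T(z) = 1
r_same = [bern_joint(x, th) / bern_joint(y, th) for th in (0.2, 0.5, 0.8)]
r_diff = [bern_joint(x, th) / bern_joint(z, th) for th in (0.2, 0.5, 0.8)]
assert max(r_same) - min(r_same) < 1e-12  # constant in theta: same value of T
assert max(r_diff) - min(r_diff) > 1e-6   # varies with theta: different T
```

Here the ratio for different sums is θ/(1-θ), which clearly varies with θ, so by the theorem the sample sum is minimal sufficient.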



Ancillary Statistic

Previously, we considered sufficient statistics, which contain all the information about θ that is
available in the sample. Now, we introduce a different sort of statistic, one with a
complementary purpose.
Def. A statistic S(X) whose distribution does not depend on the parameter θ is called an
ancillary statistic.
Alone, an ancillary statistic contains no information about θ. However, when used in
conjunction with other statistics, it can contain valuable information for inferences about θ.
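A standard example (my illustration, not from the slides): for a N(θ, 1) sample, the range max(X) - min(X) is ancillary, since shifting every observation by θ leaves the range unchanged. The simulation below makes the location invariance explicit by reusing the same underlying random draws under two values of θ.

```python
import random
import statistics

def sample_range(theta, n, rng):
    """Range of an iid N(theta, 1) sample of size n."""
    xs = [rng.gauss(theta, 1.0) for _ in range(n)]
    return max(xs) - min(xs)

# identical seeds give identical standard-normal draws z_i, so each sample under
# theta = 5 is just the theta = 0 sample shifted by 5 -- and the range is unchanged
rng1, rng2 = random.Random(42), random.Random(42)
r0 = [sample_range(0.0, 5, rng1) for _ in range(20000)]
r5 = [sample_range(5.0, 5, rng2) for _ in range(20000)]
assert abs(statistics.mean(r0) - statistics.mean(r5)) < 1e-9
```

The same-seed pairing is a coupling trick: it shows the range is a function of the zᵢ alone, hence its distribution is free of θ.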



Complete Statistic
Def. A statistic T = T(X) is said to be complete if and only if
Eθ[g(T)] = 0, ∀θ ∈ Ωθ implies Pθ[g(T) = 0] = 1, ∀θ ∈ Ωθ

T(X) is complete if and only if the only estimator of zero that is a function of T and has
zero mean is a statistic that is identically zero with probability 1 (i.e., a statistic that is
degenerate at the point 0). Also, note that completeness is (strictly speaking) a property of a
family of distributions.
Theorem. Given a random sample from an exponential family, T(X) as previously defined also
yields a (set of jointly) complete sufficient statistic/s. (NOTE: This theorem holds as long
as a certain condition is also true. Given the level of the course, we will only consider cases
where that condition holds.)
Example: Let X1, X2, ..., Xn be a random sample from Be(θ). Is the sample sum a complete statistic?
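For this example, T = ΣXᵢ ~ Binomial(n, θ), and Eθ[g(T)] is a polynomial in θ; a polynomial that vanishes for all θ must have every coefficient zero, forcing g(t) = 0 for t = 0, ..., n. A small numerical illustration (my sketch, not from the slides) of the contrapositive: any nonzero g has a nonzero expectation at some θ.

```python
from math import comb

def expected_g(g, n, theta):
    """E_theta[g(T)] where T ~ Binomial(n, theta), i.e. T = sum of n Bernoullis."""
    return sum(g(t) * comb(n, t) * theta**t * (1 - theta)**(n - t) for t in range(n + 1))

n = 3
g = lambda t: t - 1  # a nonzero function of T
# E_theta[g(T)] = n*theta - 1, which is not 0 for every theta; since T is
# complete, the only g with E_theta[g(T)] = 0 for all theta is g == 0
vals = [expected_g(g, n, th) for th in (0.1, 0.5, 0.9)]
assert any(abs(v) > 1e-9 for v in vals)
assert abs(expected_g(lambda t: 0, n, 0.3)) < 1e-12
```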
Basu's Theorem

Theorem. If T(X) is a complete (and minimal) sufficient statistic, then T(X) is independent of
every ancillary statistic.
Remarks:
This theorem is useful since it allows us to deduce the independence of two statistics
without ever finding their joint distribution.
The word "minimal" may be omitted from the statement of the theorem and it will remain
true: if a minimal sufficient statistic exists, then any complete sufficient statistic is also a
minimal sufficient statistic.



The Likelihood Principle



The Likelihood Function

Def. Let f(x|θ) denote the joint PDF or PMF of the sample X. Given that X = x is observed,
the function of θ defined by L(θ|x) = f(x|θ) is called the likelihood function.
The likelihood principle states that if x and y are two sample points such that L(θ|x) is
proportional to L(θ|y), i.e., there exists a constant C(x, y), not depending on θ, such that

L(θ|x) = C(x, y) L(θ|y) for all θ,

then the conclusions drawn from x and y should be identical.
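A classic illustration (my sketch, not from the slides): observing 3 successes in 12 Bernoulli trials with n = 12 fixed in advance (binomial sampling) versus running trials until the 3rd success, which happens to land on trial 12 (negative binomial sampling). The two likelihoods are proportional with C(x, y) = C(12,3)/C(11,2) = 4, so the likelihood principle says both experiments should yield the same conclusions about θ.

```python
from math import comb

def binom_lik(theta):
    """Likelihood: 3 successes observed in a fixed n = 12 Bernoulli trials."""
    return comb(12, 3) * theta**3 * (1 - theta)**9

def negbinom_lik(theta):
    """Likelihood: trials run until the 3rd success, which arrives on trial 12."""
    return comb(11, 2) * theta**3 * (1 - theta)**9

ratios = [binom_lik(th) / negbinom_lik(th) for th in (0.1, 0.25, 0.6)]
# the ratio 220/55 = 4 involves no theta, so the likelihoods are proportional
assert all(abs(r - 4.0) < 1e-12 for r in ratios)
```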

