
Lecture Notes 5

1 Statistical Models
A statistical model $\mathcal{P}$ is a collection of probability distributions (or a collection of densities). An example of a nonparametric model is
$$\mathcal{P} = \left\{ p : \int (p''(x))^2 \, dx < \infty \right\}.$$
A parametric model has the form
$$\mathcal{P} = \left\{ p(x; \theta) : \theta \in \Theta \right\}$$
where $\Theta \subset \mathbb{R}^d$. An example is the set of Normal densities $\left\{ p(x; \theta) = (2\pi)^{-1/2} e^{-(x-\theta)^2/2} \right\}$.
For now, we focus on parametric models. Later we consider nonparametric models.
2 Statistics
Let $X_1, \ldots, X_n \sim p(x; \theta)$. Let $X^n \equiv (X_1, \ldots, X_n)$. Any function $T = T(X_1, \ldots, X_n)$ is itself a random variable, which we will call a statistic.
Some examples are:
1. order statistics: $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$,
2. sample mean: $\bar{X} = \frac{1}{n} \sum_i X_i$,
3. sample variance: $S^2 = \frac{1}{n-1} \sum_i (X_i - \bar{X})^2$,
4. sample median: the middle value of the order statistics,
5. sample minimum: $X_{(1)}$,
6. sample maximum: $X_{(n)}$.
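To make these concrete, here is a short Python sketch (assuming numpy; the simulated sample is arbitrary) that computes each statistic above:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=20)   # arbitrary simulated sample

order_stats = np.sort(x)                 # X_(1) <= X_(2) <= ... <= X_(n)
sample_mean = x.mean()                   # (1/n) sum_i X_i
sample_var = x.var(ddof=1)               # (1/(n-1)) sum_i (X_i - Xbar)^2
sample_median = np.median(x)             # middle value of the order statistics
sample_min = order_stats[0]              # X_(1)
sample_max = order_stats[-1]             # X_(n)
```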
Often, we are interested in the distribution of T.
Example 1 If $X_1, \ldots, X_n \sim \Gamma(\alpha, \beta)$, then $\bar{X} \sim \Gamma(n\alpha, \beta/n)$.

Proof. The mgf is
$$M_{\bar{X}}(t) = E[e^{t\bar{X}}] = E[e^{\sum_i X_i t/n}] = \prod_i E[e^{X_i (t/n)}] = [M_X(t/n)]^n = \left[ \left( \frac{1}{1 - \beta t/n} \right)^{\alpha} \right]^n = \left( \frac{1}{1 - \beta t/n} \right)^{n\alpha}.$$
This is the mgf of $\Gamma(n\alpha, \beta/n)$.
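A quick Monte Carlo check of this fact (a minimal sketch; the values $\alpha = 2$, $\beta = 3$, $n = 5$ are arbitrary, and scipy's gamma is parameterized by shape and scale):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, beta, n = 2.0, 3.0, 5             # arbitrary illustrative values

# 100,000 replications of the sample mean of n Gamma(alpha, beta) draws
xbar = rng.gamma(alpha, beta, size=(100_000, n)).mean(axis=1)

# Empirical quantiles of Xbar should match Gamma(n*alpha, beta/n) quantiles
q = np.linspace(0.05, 0.95, 10)
print(np.quantile(xbar, q))
print(stats.gamma.ppf(q, a=n * alpha, scale=beta / n))
```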
Example 2 If $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ then $\bar{X} \sim N(\mu, \sigma^2/n)$.

Example 3 If $X_1, \ldots, X_n$ are iid Cauchy(0, 1),
$$p(x) = \frac{1}{\pi (1 + x^2)}$$
for $x \in \mathbb{R}$, then $\bar{X} \sim$ Cauchy(0, 1).
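These two examples behave very differently in simulation: the Normal sample mean concentrates around $\mu$ as $n$ grows, while the Cauchy sample mean has the same spread no matter how large $n$ is. A minimal sketch (the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10, 1_000, 100_000):
    normal_mean = rng.normal(size=n).mean()           # ~ N(0, 1/n): shrinks
    cauchy_mean = rng.standard_cauchy(size=n).mean()  # ~ Cauchy(0,1): does not
    print(n, normal_mean, cauchy_mean)
```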
Example 4 If $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ then
$$\frac{(n-1) S^2}{\sigma^2} \sim \chi^2_{n-1}.$$
The proof is based on the mgf.
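This too can be checked by simulation (a sketch; $\mu = 1$, $\sigma = 2$, $n = 8$ are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n = 1.0, 2.0, 8               # arbitrary illustrative values

x = rng.normal(mu, sigma, size=(100_000, n))
t = (n - 1) * x.var(axis=1, ddof=1) / sigma**2    # (n-1) S^2 / sigma^2

# Empirical quantiles should match chi-square with n-1 degrees of freedom
q = np.linspace(0.05, 0.95, 10)
print(np.quantile(t, q))
print(stats.chi2.ppf(q, df=n - 1))
```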
Example 5 Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics, which means that the sample $X_1, X_2, \ldots, X_n$ has been ordered from smallest to largest:
$$X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}.$$
Now,
$$\begin{aligned}
F_{X_{(k)}}(x) &= P(X_{(k)} \le x) \\
&= P(\text{at least } k \text{ of the } X_1, \ldots, X_n \le x) \\
&= \sum_{j=k}^{n} P(\text{exactly } j \text{ of the } X_1, \ldots, X_n \le x) \\
&= \sum_{j=k}^{n} \binom{n}{j} [F_X(x)]^j [1 - F_X(x)]^{n-j}.
\end{aligned}$$
Differentiate to find the pdf (see CB p. 229):
$$p_{X_{(k)}}(x) = \frac{n!}{(k-1)! \, (n-k)!} \, [F_X(x)]^{k-1} \, p(x) \, [1 - F_X(x)]^{n-k}.$$
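The cdf formula is just the probability that a Binomial($n$, $F_X(x)$) count is at least $k$, which makes it easy to verify by simulation (a sketch for Uniform(0, 1) samples, where $F_X(x) = x$; the choices $n = 5$, $k = 2$, $x_0 = 0.3$ are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k, x0 = 5, 2, 0.3                     # arbitrary illustrative values

# Empirical P(X_(k) <= x0) for Uniform(0,1) samples
xk = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, k - 1]
print((xk <= x0).mean())

# The formula: sum_{j=k}^{n} C(n,j) F(x0)^j (1 - F(x0))^(n-j), with F(x0) = x0
print(sum(stats.binom.pmf(j, n, x0) for j in range(k, n + 1)))
```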
3 Sufficiency
(Ch. 6 of Casella and Berger.) We continue with parametric inference. In this section we discuss data reduction as a formal concept.
3.1 Sufficient Statistics
Suppose that $X_1, \ldots, X_n \sim p(x; \theta)$. $T$ is sufficient for $\theta$ if the conditional distribution of $X_1, \ldots, X_n \mid T$ does not depend on $\theta$. Thus,
$$p(x_1, \ldots, x_n \mid t; \theta) = p(x_1, \ldots, x_n \mid t).$$
Intuitively, this means that you can replace $X_1, \ldots, X_n$ with $T(X_1, \ldots, X_n)$ without losing information. (This is not quite true, as we'll see later. But for now, you can think of it this way.)
Example 6 $X_1, \ldots, X_n \sim$ Poisson($\theta$). Let $T = \sum_{i=1}^n X_i$. Then,
$$p_{X^n \mid T}(x^n \mid t) = P(X^n = x^n \mid T(X^n) = t) = \frac{P(X^n = x^n \text{ and } T = t)}{P(T = t)}.$$
But
$$P(X^n = x^n \text{ and } T = t) = \begin{cases} 0 & T(x_1, \ldots, x_n) \neq t \\ P(X_1 = x_1, \ldots, X_n = x_n) & T(x_1, \ldots, x_n) = t. \end{cases}$$
Hence,
$$P(X^n = x^n) = \prod_{i=1}^{n} \frac{e^{-\theta} \theta^{x_i}}{x_i!} = \frac{e^{-n\theta} \theta^{\sum_i x_i}}{\prod_i (x_i!)} = \frac{e^{-n\theta} \theta^t}{\prod_i (x_i!)}.$$
Now, $T(x^n) = \sum_i x_i = t$ and so
$$P(T = t) = \frac{e^{-n\theta} (n\theta)^t}{t!}$$
since $T \sim$ Poisson($n\theta$).
Thus,
$$\frac{P(X^n = x^n)}{P(T = t)} = \frac{t!}{\left( \prod_i x_i! \right) n^t}$$
which does not depend on $\theta$. So $T = \sum_i X_i$ is a sufficient statistic for $\theta$. Other sufficient statistics are: $T = 3.7 \sum_i X_i$, $T = (\sum_i X_i, X_4)$, and $T(X_1, \ldots, X_n) = (X_1, \ldots, X_n)$.
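The final ratio is exactly the Multinomial($t$; $1/n, \ldots, 1/n$) pmf, so given $T = t$ the counts are spread evenly across the $n$ coordinates no matter what $\theta$ is. A minimal simulation sketch of this ($n = 3$, $t = 5$, and the two $\theta$ values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 3, 5                              # arbitrary illustrative values

def conditional_draws(theta, reps=200_000):
    """Draw X^n ~ iid Poisson(theta) and keep only draws with sum equal to t."""
    x = rng.poisson(theta, size=(reps, n))
    return x[x.sum(axis=1) == t]

# The conditional distribution of X^n given T = t should be the same for
# both values of theta: Multinomial(t, (1/n, ..., 1/n)).
for theta in (0.5, 4.0):
    kept = conditional_draws(theta)
    print(theta, kept.mean(axis=0))      # each coordinate mean is about t/n
```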
3.2 Sufficient Partitions
It is better to describe sufficiency in terms of partitions of the sample space.
Example 7 Let $X_1, X_2, X_3 \sim$ Bernoulli($\theta$). Let $T = \sum_i X_i$.

x^n          t       p(x|t)
(0, 0, 0)    t = 0   1
(0, 0, 1)    t = 1   1/3
(0, 1, 0)    t = 1   1/3
(1, 0, 0)    t = 1   1/3
(0, 1, 1)    t = 2   1/3
(1, 0, 1)    t = 2   1/3
(1, 1, 0)    t = 2   1/3
(1, 1, 1)    t = 3   1
8 elements   4 elements
1. A partition $B_1, \ldots, B_k$ is sufficient if $f(x \mid X \in B)$ does not depend on $\theta$.
2. A statistic $T$ induces a partition. For each $t$, $\{x : T(x) = t\}$ is one element of the partition. $T$ is sufficient if and only if the partition is sufficient.
3. Two statistics can generate the same partition: example: $\sum_i X_i$ and $3 \sum_i X_i$.
4. If we split any element $B_i$ of a sufficient partition into smaller pieces, we get another sufficient partition.
Example 8 Let $X_1, X_2, X_3 \sim$ Bernoulli($\theta$). Then $T = X_1$ is not sufficient. Look at its partition:

x^n          t       p(x|t)
(0, 0, 0)    t = 0   (1 − θ)²
(0, 0, 1)    t = 0   θ(1 − θ)
(0, 1, 0)    t = 0   θ(1 − θ)
(0, 1, 1)    t = 0   θ²
(1, 0, 0)    t = 1   (1 − θ)²
(1, 0, 1)    t = 1   θ(1 − θ)
(1, 1, 0)    t = 1   θ(1 − θ)
(1, 1, 1)    t = 1   θ²
8 elements   2 elements
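Both tables can be reproduced symbolically (a sketch assuming sympy; it enumerates $\{0,1\}^3$ and computes $p(x \mid t)$ for each statistic):

```python
from itertools import product
import sympy as sp

theta = sp.Symbol('theta')

def conditional_table(T):
    """p(x | T(x) = t) for X_1, X_2, X_3 ~ iid Bernoulli(theta)."""
    prob = {x: theta**sum(x) * (1 - theta)**(3 - sum(x))
            for x in product((0, 1), repeat=3)}
    table = {}
    for x, p in prob.items():
        t = T(x)
        denom = sum(q for y, q in prob.items() if T(y) == t)
        table[x] = sp.simplify(p / denom)
    return table

print(conditional_table(sum))              # Example 7: no theta left, sufficient
print(conditional_table(lambda x: x[0]))   # Example 8: theta remains, not sufficient
```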
3.3 The Factorization Theorem
Theorem 9 $T(X^n)$ is sufficient for $\theta$ if and only if the joint pdf/pmf of $X^n$ can be factored as
$$p(x^n; \theta) = h(x^n) \, g(t; \theta).$$
Example 10 Let $X_1, \ldots, X_n \sim$ Poisson($\theta$). Then
$$p(x^n; \theta) = \frac{e^{-n\theta} \theta^{\sum_i x_i}}{\prod_i (x_i!)} = \frac{1}{\prod_i (x_i!)} \; e^{-n\theta} \theta^{\sum_i x_i},$$
so the factorization holds with $h(x^n) = 1/\prod_i (x_i!)$ and $g(t; \theta) = e^{-n\theta} \theta^t$, and $T = \sum_i X_i$ is sufficient.
Example 11 $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$. Then
$$p(x^n; \mu, \sigma^2) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left\{ -\frac{\sum_i (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2}{2\sigma^2} \right\}.$$
(a) If $\sigma$ is known:
$$p(x^n; \mu) = \underbrace{\left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left\{ -\frac{\sum_i (x_i - \bar{x})^2}{2\sigma^2} \right\}}_{h(x^n)} \; \underbrace{\exp\left\{ -\frac{n(\bar{x} - \mu)^2}{2\sigma^2} \right\}}_{g(T(x^n); \, \mu)}.$$
Thus, $\bar{X}$ is sufficient for $\mu$.
(b) If $(\mu, \sigma^2)$ is unknown then $T = (\bar{X}, S^2)$ is sufficient. So is $T = (\sum_i X_i, \sum_i X_i^2)$.
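The exponent above relies on the decomposition $\sum_i (x_i - \mu)^2 = \sum_i (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$, which is easy to confirm numerically (a sketch; the data and $\mu$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)                  # arbitrary data
mu = 0.7                                 # arbitrary parameter value
xbar = x.mean()

lhs = np.sum((x - mu) ** 2)
rhs = np.sum((x - xbar) ** 2) + len(x) * (xbar - mu) ** 2
print(np.isclose(lhs, rhs))              # True: the cross term vanishes
```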
3.4 Minimal Sufficient Statistics (MSS)
We want the greatest reduction in dimension.
Example 12 $X_1, \ldots, X_n \sim N(0, \sigma^2)$. Some sufficient statistics are:
$$T(X_1, \ldots, X_n) = (X_1, \ldots, X_n)$$
$$T(X_1, \ldots, X_n) = (X_1^2, \ldots, X_n^2)$$
$$T(X_1, \ldots, X_n) = \left( \sum_{i=1}^{m} X_i^2, \; \sum_{i=m+1}^{n} X_i^2 \right)$$
$$T(X_1, \ldots, X_n) = \sum_i X_i^2.$$
$T$ is a Minimal Sufficient Statistic if the following two statements are true:
1. $T$ is sufficient and
2. If $U$ is any other sufficient statistic then $T = g(U)$ for some function $g$.
In other words, $T$ generates the coarsest sufficient partition.
Suppose $U$ is sufficient. Suppose $T = H(U)$ is also sufficient. $T$ provides greater reduction than $U$ unless $H$ is a one-to-one transformation, in which case $T$ and $U$ are equivalent.
Example 13 $X \sim N(0, \sigma^2)$. $X$ is sufficient. $|X|$ is sufficient. $|X|$ is a MSS. So are $X^2$, $X^4$, and $e^{X^2}$.
Example 14 Let $X_1, X_2, X_3 \sim$ Bernoulli($\theta$). Let $T = \sum_i X_i$.

x^n          t       p(x|t)   u         p(x|u)
(0, 0, 0)    t = 0   1        u = 0     1
(0, 0, 1)    t = 1   1/3      u = 1     1/3
(0, 1, 0)    t = 1   1/3      u = 1     1/3
(1, 0, 0)    t = 1   1/3      u = 1     1/3
(0, 1, 1)    t = 2   1/3      u = 73    1/2
(1, 0, 1)    t = 2   1/3      u = 73    1/2
(1, 1, 0)    t = 2   1/3      u = 91    1
(1, 1, 1)    t = 3   1        u = 103   1
Note that $U$ and $T$ are both sufficient but $U$ is not minimal.
3.5 How to find a Minimal Sufficient Statistic
Theorem 15 Define
$$R(x^n, y^n; \theta) = \frac{p(y^n; \theta)}{p(x^n; \theta)}.$$
Suppose that $T$ has the following property: $R(x^n, y^n; \theta)$ does not depend on $\theta$ if and only if $T(y^n) = T(x^n)$. Then $T$ is a MSS.
Example 16 $Y_1, \ldots, Y_n$ iid Poisson($\theta$).
$$p(y^n; \theta) = \frac{e^{-n\theta} \theta^{\sum_i y_i}}{\prod_i y_i!}, \qquad \frac{p(y^n; \theta)}{p(x^n; \theta)} = \theta^{\sum_i y_i - \sum_i x_i} \, \frac{\prod_i x_i!}{\prod_i y_i!}$$
which is independent of $\theta$ if and only if $\sum_i y_i = \sum_i x_i$. This implies that $T(Y^n) = \sum_i Y_i$ is a minimal sufficient statistic for $\theta$.
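A symbolic sanity check of this ratio (a sketch assuming sympy, with two arbitrary samples that share the same sum and two that do not):

```python
import sympy as sp
from math import factorial, prod

theta = sp.Symbol('theta', positive=True)

def poisson_joint(xs):
    """Joint pmf of an iid Poisson(theta) sample at the observed values xs."""
    n = len(xs)
    return sp.exp(-n * theta) * theta**sum(xs) / prod(factorial(x) for x in xs)

# Same sum (T(y) = T(x)): the simplified ratio is a constant, free of theta.
print(sp.simplify(poisson_joint([1, 2, 3]) / poisson_joint([2, 2, 2])))

# Different sums: theta remains in the simplified ratio.
print(sp.simplify(poisson_joint([1, 2, 4]) / poisson_joint([2, 2, 2])))
```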
The minimal sufficient statistic is not unique. But the minimal sufficient partition is unique.
Example 17 Cauchy.
$$p(x; \theta) = \frac{1}{\pi (1 + (x - \theta)^2)}.$$
Then
$$\frac{p(y^n; \theta)}{p(x^n; \theta)} = \frac{\prod_{i=1}^{n} \{ 1 + (x_i - \theta)^2 \}}{\prod_{j=1}^{n} \{ 1 + (y_j - \theta)^2 \}}.$$
The ratio is a constant function of $\theta$ if the two samples have the same order statistics, that is, if $(y_{(1)}, \ldots, y_{(n)}) = (x_{(1)}, \ldots, x_{(n)})$; this suggests $T(Y^n) = (Y_{(1)}, \ldots, Y_{(n)})$. It is technically harder to show that the ratio is constant only if the order statistics agree, but it can be done using theorems about polynomials. Having shown this, one can conclude that the order statistics are minimal sufficient for $\theta$.
4 What Sufficiency Really Means
If $T$ is sufficient, then $T$ contains all the information you need from the data to compute the likelihood function. It does not contain all the information in the data. We will define the likelihood function in the next set of notes.
Note: Ignore the material on completeness and ancillary statistics.