
Stat 1301 Probability & Statistics I Spring 2008-2009

Chapter IV Limit Theorems

The probabilistic behaviour of the sample mean when the sample size $n$ is large (i.e. as $n$ tends to infinity) is described by the limiting distribution of the sample mean. The law of large numbers (LLN) and the central limit theorem (CLT) are two of the most important theorems in statistics concerning this limiting behaviour. Together they establish the "nice" properties of the sample mean and justify its use.

Before proceeding, we need to define what 'convergence' means in the context of random variables.

§ 4.1 Modes of Convergence

Let $X_1, X_2, \ldots$ be a sequence of random variables (not necessarily independent) and let $X$ be another random variable. Let $F_{X_n}(x)$ be the distribution function of $X_n$ and $F_X(x)$ be the distribution function of $X$.

Convergence in Distribution / Convergence in Law / Weak Convergence

$X_n$ is said to converge in distribution to $X$ if

$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$$

for all points $x$ at which $F_X(x)$ is continuous. It is denoted as $X_n \xrightarrow{L} X$.

Example 4.1

Let $U_1, U_2, \ldots \overset{iid}{\sim} U(0,1)$. Define $X_n$ as the maximum of $U_1, U_2, \ldots, U_n$. Then the distribution function of $X_n$ is given by $F_{X_n}(x) = 0$ for $x \le 0$, $F_{X_n}(x) = 1$ for $x \ge 1$, and for $0 < x < 1$,

$$F_{X_n}(x) = P(X_n \le x) = P(U_1 \le x, U_2 \le x, \ldots, U_n \le x) = P(U_1 \le x)\,P(U_2 \le x) \cdots P(U_n \le x) = x^n.$$


Therefore

$$\lim_{n \to \infty} F_{X_n}(x) = \begin{cases} 0 & \text{if } x < 1 \\ 1 & \text{if } x \ge 1 \end{cases}.$$

On the other hand, consider a random variable $X$ which is degenerate at 1, i.e. $P(X = 1) = 1$. The distribution function of $X$ is

$$F_X(x) = P(X \le x) = \begin{cases} 0 & \text{if } x < 1 \\ 1 & \text{if } x \ge 1 \end{cases}.$$

Hence $\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$ and thereby $X_n \xrightarrow{L} X$. We may also write $X_n \xrightarrow{L} 1$ as $X$ is degenerate at 1.

Now suppose we define $Y_n = n(1 - X_n)$. Then the distribution function of $Y_n$ is

$$F_{Y_n}(y) = P(Y_n \le y) = P\big(n(1 - X_n) \le y\big) = P\!\left(X_n \ge 1 - \frac{y}{n}\right) = 1 - F_{X_n}\!\left(1 - \frac{y}{n}\right) = \begin{cases} 0 & \text{if } y \le 0 \\ 1 - \left(1 - \dfrac{y}{n}\right)^n & \text{if } 0 < y < n \\ 1 & \text{if } y \ge n \end{cases}.$$

Therefore

$$\lim_{n \to \infty} F_{Y_n}(y) = \begin{cases} 0 & \text{if } y \le 0 \\ 1 - e^{-y} & \text{if } 0 < y < \infty \end{cases}$$

which is the distribution function of $Exp(1)$. Hence $Y_n = n(1 - X_n)$ converges in distribution to an exponential random variable with parameter $\lambda = 1$, i.e.

$$Y_n = n\big(1 - \max(U_1, U_2, \ldots, U_n)\big) \xrightarrow{L} Exp(1).$$
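This limit is easy to check numerically. The following is a minimal simulation sketch (assuming NumPy is available) that samples $X_n = \max(U_1, \ldots, U_n)$ via the inverse of its CDF $x^n$ and compares the empirical CDF of $Y_n = n(1 - X_n)$ with the $Exp(1)$ CDF:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 200_000

# max(U_1, ..., U_n) has CDF x^n on (0, 1); sample it by inverse transform
x_n = rng.random(reps) ** (1.0 / n)
y = n * (1.0 - x_n)                      # Y_n = n(1 - X_n)

# compare the empirical CDF of Y_n with the Exp(1) CDF 1 - e^{-y}
for pt in (0.5, 1.0, 2.0):
    print(f"y = {pt}: empirical {np.mean(y <= pt):.4f}, Exp(1) {1 - np.exp(-pt):.4f}")
```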


Convergence in Probability

$X_n$ is said to converge in probability to $X$ if for any $\varepsilon > 0$,

$$\lim_{n \to \infty} P(|X_n - X| \ge \varepsilon) = 0.$$

It is denoted as $X_n \xrightarrow{P} X$.

Example 4.2

Consider the $X_n$ defined in Example 4.1. Obviously,

$$P(|X_n - 1| \ge \varepsilon) = 0 \quad \text{if } \varepsilon > 1.$$

For any $0 < \varepsilon \le 1$,

$$P(|X_n - 1| \ge \varepsilon) = P(1 - X_n \ge \varepsilon) = P(X_n \le 1 - \varepsilon) = F_{X_n}(1 - \varepsilon) = (1 - \varepsilon)^n.$$

Therefore for any $\varepsilon > 0$, $\lim_{n \to \infty} P(|X_n - 1| \ge \varepsilon) = 0$ and hence $X_n \xrightarrow{P} 1$.

Convergence Almost Surely / Strong Convergence

$X_n$ is said to converge almost surely to $X$ if

$$P\!\left(\lim_{n \to \infty} X_n = X\right) = 1.$$

It is denoted as $X_n \xrightarrow{a.s.} X$.

Example 4.3

Let $\Omega = [0,1]$ be the sample space and $\omega$ be a point uniformly drawn from $\Omega$. Define $X_n(\omega) = \omega + \omega^n$ and $X(\omega) = \omega$. Then

$$\lim_{n \to \infty} X_n(\omega) = \begin{cases} X(\omega) & \text{if } 0 \le \omega < 1 \\ X(\omega) + 1 & \text{if } \omega = 1 \end{cases}.$$

Since the convergence occurs on the set $[0,1)$ and $P(\omega \in [0,1)) = 1$, $X_n \xrightarrow{a.s.} X$.

Remarks

1. The basic relationships between the above three modes of convergence are as follows:

$$X_n \xrightarrow{a.s.} X \;\Rightarrow\; X_n \xrightarrow{P} X \;\Rightarrow\; X_n \xrightarrow{L} X.$$

Note that the converses may not be true.

2. Although the definitions of convergence in probability and almost sure convergence look similar, they are very different statements. To understand almost sure convergence, recall the basic definition of a random variable: a random variable is a real-valued function defined on the sample space $\Omega$, and $X_n \xrightarrow{a.s.} X$ means

$$\lim_{n \to \infty} X_n(\omega) = X(\omega)$$

for all $\omega \in \Omega$ except possibly those $\omega \in S$ where $S \subseteq \Omega$ and $P(S) = 0$.

Example 4.4

Let $\Omega = [0,1]$ be the sample space and $\omega$ be a point uniformly drawn from $\Omega$. Define

$$X(\omega) = \omega,$$
$$X_1(\omega) = \omega + I_{[0,1]}(\omega),$$
$$X_2(\omega) = \omega + I_{[0,1/2]}(\omega), \quad X_3(\omega) = \omega + I_{[1/2,1]}(\omega),$$
$$X_4(\omega) = \omega + I_{[0,1/3]}(\omega), \quad X_5(\omega) = \omega + I_{[1/3,2/3]}(\omega), \quad X_6(\omega) = \omega + I_{[2/3,1]}(\omega),$$

and so on. Obviously, as $n \to \infty$, $P(|X_n(\omega) - X(\omega)| \ge \varepsilon)$ is equal to the probability of an interval of $\omega$ values whose length tends to 0. Hence $X_n \xrightarrow{P} X$. However, for every $\omega$, the value $X_n(\omega)$ takes each of the values $\omega$ and $\omega + 1$ infinitely often. Thus there is no value of $\omega \in \Omega$ for which $X_n(\omega)$ converges to $X(\omega)$, i.e. $X_n$ does not converge to $X$ almost surely.
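A small sketch in plain Python (enumerating the intervals exactly as constructed above) illustrates the contrast: the length of the moving interval tends to 0, yet any fixed $\omega$ keeps landing inside the interval infinitely often:

```python
# block k consists of the k intervals [(j-1)/k, j/k], j = 1, ..., k,
# each of length 1/k; enumerating blocks in order gives X_1, X_2, ...
intervals = [((j - 1) / k, j / k) for k in range(1, 60) for j in range(1, k + 1)]

omega = 0.3
hits = [n + 1 for n, (lo, hi) in enumerate(intervals) if lo <= omega <= hi]

# P(X_n != X) equals the current interval length, which tends to 0
# (convergence in probability) ...
print("interval length at n = 1000:", intervals[999][1] - intervals[999][0])
# ... yet omega falls in infinitely many intervals, so X_n(omega) = omega + 1
# infinitely often and X_n(omega) does not converge (no a.s. convergence)
print("first few n with X_n(omega) = omega + 1:", hits[:10])
```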


Example 4.5

To check convergence in distribution, nothing needs to be known about the joint distribution of $X_n$ and $X$, whereas this joint distribution must be specified to check convergence in probability. For example, if $X_1, X_2, \ldots \overset{iid}{\sim} N(0,1)$, then $X_n \xrightarrow{L} X_1$ but $X_n$ does not converge in probability to $X_1$.

§ 4.2 Weak Law of Large Numbers (WLLN)

Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables with finite mean $E(X_i) = \mu$ and variance $Var(X_i) = \sigma^2$. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ be the sample mean of the random sample. Then the weak law of large numbers states that

$$\bar{X}_n \xrightarrow{P} \mu,$$

i.e. for any given $\varepsilon > 0$, we have

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \varepsilon) = 0.$$

Proof

$$E(\bar{X}_n) = \mu, \qquad Var(\bar{X}_n) = \frac{\sigma^2}{n}.$$

By Chebyshev's inequality, $P(|\bar{X}_n - \mu| \ge \varepsilon) \le \dfrac{\sigma^2}{n\varepsilon^2}$.

Hence $0 \le \lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \varepsilon) \le \lim_{n \to \infty} \dfrac{\sigma^2}{n\varepsilon^2} = 0$.

Therefore $\lim_{n \to \infty} P(|\bar{X}_n - \mu| \ge \varepsilon) = 0$. Equivalently, $\lim_{n \to \infty} P(|\bar{X}_n - \mu| < \varepsilon) = 1$.

Remark
A more general version of the weak law of large numbers states that if $E|X_i| < \infty$, then $\bar{X}_n \xrightarrow{P} \mu$. Note that it does not require the variance to be finite.


§ 4.3 Strong Law of Large Numbers (SLLN)

Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables with finite mean $E(X_i) = \mu$ and finite fourth moment $E(X_i^4)$. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ be the sample mean of the random sample. Then the strong law of large numbers states that

$$\bar{X}_n \xrightarrow{a.s.} \mu,$$

i.e. $P\!\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$.

Proof

Let $Y_i = X_i - \mu$, $i = 1, 2, \ldots$. Then $E(Y_i) = 0$. Let $K = E(Y_i^4) = E\big((X_i - \mu)^4\big) < \infty$. Define $S_n = \sum_{i=1}^n Y_i$. Consider

$$E(S_n^4) = E\!\left[\left(\sum_{i=1}^n Y_i\right)^4\right] = n\,E(Y_i^4) + 6\binom{n}{2}\big(E(Y_i^2)\big)^2 \le nK + 3n(n-1)K,$$

where the cross terms containing an isolated $Y_i$ vanish since $E(Y_i) = 0$, and $Var(Y_i^2) \ge 0 \Rightarrow \big(E(Y_i^2)\big)^2 \le E(Y_i^4)$. Hence

$$E\!\left[\frac{S_n^4}{n^4}\right] \le \frac{K}{n^3} + \frac{3(n-1)K}{n^3} \le \frac{K}{n^3} + \frac{3K}{n^2}.$$

Therefore

$$E\!\left[\sum_{n=1}^{\infty} \frac{S_n^4}{n^4}\right] = \sum_{n=1}^{\infty} E\!\left[\frac{S_n^4}{n^4}\right] \le K \sum_{n=1}^{\infty} \frac{1}{n^3} + 3K \sum_{n=1}^{\infty} \frac{1}{n^2} < \infty.$$

Hence with probability 1, $\sum_{n=1}^{\infty} \dfrac{S_n^4}{n^4} < \infty$, which implies that $\lim_{n \to \infty} \dfrac{S_n^4}{n^4} = 0$. Thus we have proven that with probability 1,

$$\lim_{n \to \infty} \bar{Y}_n = \lim_{n \to \infty} \frac{S_n}{n} = 0,$$

i.e. $P\!\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$.

Remark
A more general version of the strong law of large numbers states that $\bar{X}_n \xrightarrow{a.s.} \mu$ if and only if $E|X_i| < \infty$ and $E(X_i) = \mu$.
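Almost sure convergence concerns a single sample path: beyond some point, the entire tail of the running-mean sequence stays near $\mu$. A minimal sketch (assuming NumPy) using $Exp(1)$ data, for which $\mu = 1$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.exponential(scale=1.0, size=n)   # iid Exp(1), so mu = 1

# one sample path of running means Xbar_1, Xbar_2, ..., Xbar_n
running = np.cumsum(x) / np.arange(1, n + 1)

# a.s. convergence: the worst deviation over the whole tail shrinks
for m in (100, 1_000, 10_000):
    tail_dev = np.max(np.abs(running[m - 1:] - 1.0))
    print(f"max |Xbar_k - 1| over k >= {m}: {tail_dev:.4f}")
```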


§ 4.4 Central Limit Theorem (CLT)

Since $\bar{X}_n - \mu$ converges to zero, it may be difficult to study the behaviour of $\bar{X}_n$ when $n$ is large. To make the fluctuation of $\bar{X}_n - \mu$ visible, we magnify it by a factor of $\sqrt{n}$ and consider the limiting distribution of $\dfrac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma}$.

Let $\bar{X}_n$ be the sample mean of a sequence of independent and identically distributed random variables $X_1, X_2, \ldots, X_n$ from a distribution with finite mean $E(X_i) = \mu$ and variance $Var(X_i) = \sigma^2$. Then the central limit theorem states that

$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{L} N(0,1) \quad \text{as } n \to \infty,$$

i.e. $\lim_{n \to \infty} P\!\left(\dfrac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \le x\right) = \Phi(x)$ for all $-\infty < x < \infty$.

Proof

Let $Y_i = \dfrac{X_i - \mu}{\sigma}$ for $i = 1, 2, \ldots$. Then $E(Y_i) = 0$ and $Var(Y_i) = 1$. Let $M_Y(t)$ be the moment generating function (assumed to exist) of each $Y_i$. By Taylor's expansion,

$$M_Y(t) = M_Y(0) + M'_Y(0)\,t + \tfrac{1}{2} M''_Y(\xi)\,t^2 = 1 + \tfrac{1}{2} M''_Y(\xi)\,t^2 \quad \text{where } 0 \le \xi \le t,$$

since $M_Y(0) = 1$ and $M'_Y(0) = E(Y_i) = 0$. Let $W_n = \dfrac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} = \dfrac{1}{\sqrt{n}} \sum_{i=1}^n Y_i$. The moment generating function of $W_n$ is

$$M_{W_n}(t) = \left(M_Y\!\left(\frac{t}{\sqrt{n}}\right)\right)^n = \left(1 + \frac{t^2}{2n} M''_Y(\xi)\right)^n \quad \text{where } 0 \le \xi \le \frac{t}{\sqrt{n}}.$$

When $n \to \infty$, $\xi \to 0$ and hence $M''_Y(\xi) \to M''_Y(0) = E(Y^2) = 1$. Therefore

$$\lim_{n \to \infty} M_{W_n}(t) = \lim_{n \to \infty} \left(1 + \frac{t^2}{2n} M''_Y(\xi)\right)^n = e^{t^2/2},$$

which is the moment generating function of $N(0,1)$.
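The theorem is easy to see in simulation. A minimal sketch (assuming NumPy) with uniform data, which is clearly non-normal, shows the standardised sample mean matching $\Phi$ already at $n = 30$:

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(4)
n, reps = 30, 100_000
mu, sigma = 0.5, sqrt(1.0 / 12.0)        # mean and sd of U(0,1)

# standardised sample means W_n = sqrt(n) (Xbar_n - mu) / sigma
w = np.sqrt(n) * (rng.random((reps, n)).mean(axis=1) - mu) / sigma

for pt in (-1.0, 0.0, 1.0, 2.0):
    print(f"x = {pt:+.1f}: empirical {np.mean(w <= pt):.4f}, Phi(x) {Phi(pt):.4f}")
```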



Remarks

1. The key to the above proof is the following lemma, which we state without proof.

Let $Z_1, Z_2, \ldots$ be a sequence of random variables having moment generating functions $M_{Z_n}(t)$, $n = 1, 2, \ldots$; and let $Z$ be a random variable having moment generating function $M_Z(t)$. If $\lim_{n \to \infty} M_{Z_n}(t) = M_Z(t)$ for all $t$, then $Z_n \xrightarrow{L} Z$.

2. A more general proof of the CLT uses the so-called characteristic function (which always exists) and does not need the existence of the moment generating function.

3. The CLT can be extended to the independent but non-identically distributed case by the Lindeberg-Feller theorem. See Chung (1974) for details.

4. The CLT is one of the most startling theorems in statistics. It forms the basis of other important theorems and provides us with useful approximations in large-sample cases.

Example 4.6

Let $\bar{X}_{50}$ be the sample mean of a random sample of size 50 from a distribution with pdf

$$f(x) = \frac{x^3}{4}, \qquad 0 \le x \le 2.$$

Suppose we want to evaluate $P(1.5 \le \bar{X}_{50} \le 1.65)$. It is difficult to derive the exact distribution of $\bar{X}_{50}$. Fortunately we can use the normal approximation:

$$\mu = \int_0^2 \frac{x^4}{4}\,dx = \frac{8}{5}, \qquad \sigma^2 = \int_0^2 \frac{x^5}{4}\,dx - \left(\frac{8}{5}\right)^2 = \frac{8}{75}.$$

Therefore

$$W_{50} = \frac{\sqrt{50}\,(\bar{X}_{50} - 8/5)}{\sqrt{8/75}}$$

is approximately $N(0,1)$. Thus

$$P(1.5 \le \bar{X}_{50} \le 1.65) = P\!\left(\frac{\sqrt{50}\,(1.5 - 8/5)}{\sqrt{8/75}} \le W_{50} \le \frac{\sqrt{50}\,(1.65 - 8/5)}{\sqrt{8/75}}\right) \approx \Phi(1.08) - \Phi(-2.17) = 0.8449.$$
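The approximation can be checked by Monte Carlo. Since $F(x) = x^4/16$ on $[0,2]$, inverse-transform sampling gives $X = 2U^{1/4}$; a minimal sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 50, 100_000

# f(x) = x^3/4 on [0, 2] has CDF F(x) = x^4/16, so X = 2 U^(1/4)
x = 2.0 * rng.random((reps, n)) ** 0.25
xbar = x.mean(axis=1)

print("Monte Carlo estimate:", np.mean((1.5 <= xbar) & (xbar <= 1.65)))
print("CLT approximation  : 0.8449")
```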

Example 4.7

Normal approximation to the binomial

If $Y_i \overset{iid}{\sim} b(1, p)$, then $E(Y_i) = p$, $Var(Y_i) = p(1-p)$.

Let $X = \sum_{i=1}^n Y_i$. Then $X \sim b(n, p)$, $E(X) = np$, $Var(X) = np(1-p)$. The sample mean of the $Y$'s can be written as $\bar{Y}_n = X/n$. By the CLT,

$$\frac{\sqrt{n}\,(\bar{Y}_n - p)}{\sqrt{p(1-p)}} = \frac{X - np}{\sqrt{np(1-p)}} = \frac{X - E(X)}{\sqrt{Var(X)}} \xrightarrow{L} N(0,1) \quad \text{as } n \to \infty.$$

Example 4.8

$X \sim b(15, 0.25)$.

$$P(4 \le X \le 9) = \sum_{x=4}^{9} \binom{15}{x} (0.25)^x (0.75)^{15-x} = 0.5379.$$

Using the normal approximation,

$$np = 15(0.25) = 3.75, \qquad np(1-p) = 15(0.25)(0.75) = 2.8125,$$

$$P(4 \le X \le 9) = P\!\left(\frac{4 - 3.75}{\sqrt{2.8125}} \le \frac{X - np}{\sqrt{np(1-p)}} \le \frac{9 - 3.75}{\sqrt{2.8125}}\right) \approx \Phi(3.13) - \Phi(0.149) = 0.4404.$$

It would be more accurate to use

$$P(4 \le X \le 9) = P\!\left(\frac{3.5 - 3.75}{\sqrt{2.8125}} \le \frac{X - np}{\sqrt{np(1-p)}} \le \frac{9.5 - 3.75}{\sqrt{2.8125}}\right) \approx \Phi(3.43) - \Phi(-0.149) = 0.5589.$$


The 0.5 added to or subtracted from the bounds in the probability statement is called the continuity correction. In general, when a continuous distribution is used to approximate a discrete (integer-valued) distribution, it is better to use

$$P(X \le c + 0.5) \;\text{ instead of }\; P(X \le c),$$
$$P(X < c - 0.5) \;\text{ instead of }\; P(X < c),$$
$$P(X \ge c - 0.5) \;\text{ instead of }\; P(X \ge c),$$
$$P(X > c + 0.5) \;\text{ instead of }\; P(X > c),$$

where $c$ is an integer.
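As a check on Example 4.8, the following minimal sketch (assuming SciPy is available) compares the exact binomial probability with the normal approximation, with and without the continuity correction:

```python
from scipy.stats import binom, norm

n, p = 15, 0.25
mu, sd = n * p, (n * p * (1 - p)) ** 0.5

# P(4 <= X <= 9) exactly, and via the normal approximation
exact = binom.cdf(9, n, p) - binom.cdf(3, n, p)
plain = norm.cdf((9 - mu) / sd) - norm.cdf((4 - mu) / sd)
corrected = norm.cdf((9.5 - mu) / sd) - norm.cdf((3.5 - mu) / sd)
print(f"exact {exact:.4f}, normal {plain:.4f}, with correction {corrected:.4f}")
```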

Example 4.9

Normal approximation to the Poisson

If $Y_i \overset{iid}{\sim} Poisson(\lambda)$, then $E(Y_i) = \lambda$, $Var(Y_i) = \lambda$.

Let $X = \sum_{i=1}^n Y_i$. Then $X \sim Poisson(n\lambda)$, $E(X) = n\lambda$, $Var(X) = n\lambda$. The sample mean of the $Y$'s can be written as $\bar{Y}_n = X/n$. By the CLT,

$$\frac{\sqrt{n}\,(\bar{Y}_n - \lambda)}{\sqrt{\lambda}} = \frac{X - n\lambda}{\sqrt{n\lambda}} = \frac{X - E(X)}{\sqrt{Var(X)}} \xrightarrow{L} N(0,1) \quad \text{as } n \to \infty.$$

Thus for $X \sim Poisson(\theta)$, $\dfrac{X - \theta}{\sqrt{\theta}} \xrightarrow{L} N(0,1)$ as $\theta \to \infty$.

Example 4.10

$X \sim Poisson(10)$.

$$P(11 < X \le 21) = \sum_{x=12}^{21} \frac{e^{-10}\,10^x}{x!} = 0.3025.$$

Using the normal approximation,

$$P(11 < X \le 21) = P\!\left(\frac{11 - 10}{\sqrt{10}} < \frac{X - \theta}{\sqrt{\theta}} \le \frac{21 - 10}{\sqrt{10}}\right) \approx \Phi(3.48) - \Phi(0.32) = 0.3745.$$

If we apply the continuity correction,

$$P(11 < X \le 21) = P\!\left(\frac{11.5 - 10}{\sqrt{10}} \le \frac{X - \theta}{\sqrt{\theta}} \le \frac{21.5 - 10}{\sqrt{10}}\right) \approx \Phi(3.64) - \Phi(0.47) = 0.3175,$$

which is more accurate.
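The same comparison as in the binomial case; a sketch assuming SciPy:

```python
from scipy.stats import poisson, norm

theta = 10.0
sd = theta ** 0.5

# P(11 < X <= 21) exactly, and via the normal approximation
exact = poisson.cdf(21, theta) - poisson.cdf(11, theta)
plain = norm.cdf((21 - theta) / sd) - norm.cdf((11 - theta) / sd)
corrected = norm.cdf((21.5 - theta) / sd) - norm.cdf((11.5 - theta) / sd)
print(f"exact {exact:.4f}, normal {plain:.4f}, with correction {corrected:.4f}")
```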

Example 4.11

Normal approximation to the Gamma / Chi-squared

If $Y_i \overset{iid}{\sim} Exp(\lambda)$, then $E(Y_i) = \dfrac{1}{\lambda}$, $Var(Y_i) = \dfrac{1}{\lambda^2}$.

Let $X = \sum_{i=1}^n Y_i$. Then $X \sim \Gamma(n, \lambda)$, $E(X) = \dfrac{n}{\lambda}$, $Var(X) = \dfrac{n}{\lambda^2}$. The sample mean of the $Y$'s can be written as $\bar{Y}_n = X/n$. By the CLT,

$$\frac{\sqrt{n}\,(\bar{Y}_n - 1/\lambda)}{\sqrt{1/\lambda^2}} = \frac{X - n/\lambda}{\sqrt{n/\lambda^2}} = \frac{X - E(X)}{\sqrt{Var(X)}} \xrightarrow{L} N(0,1) \quad \text{as } n \to \infty.$$

Thus for $X \sim \Gamma(\alpha, \lambda)$, $\dfrac{X - \alpha/\lambda}{\sqrt{\alpha/\lambda^2}} \xrightarrow{L} N(0,1)$ as $\alpha \to \infty$.

In particular, for $X \sim \chi^2_r$, $\dfrac{X - r}{\sqrt{2r}} \xrightarrow{L} N(0,1)$ as $r \to \infty$.

Example 4.12

$X \sim \chi^2_{80}$.

From the chi-squared distribution table, $P(64.28 \le X \le 101.9) = 0.95 - 0.10 = 0.85$.

Using the normal approximation,

$$P(64.28 \le X \le 101.9) = P\!\left(\frac{64.28 - 80}{\sqrt{160}} < \frac{X - r}{\sqrt{2r}} \le \frac{101.9 - 80}{\sqrt{160}}\right) \approx \Phi(1.7313) - \Phi(-1.2428) = 0.9583 - 0.1070 = 0.8513.$$
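A sketch (assuming SciPy) comparing the exact chi-squared probability with the normal approximation:

```python
from scipy.stats import chi2, norm

r = 80
sd = (2 * r) ** 0.5

# P(64.28 <= X <= 101.9) exactly, and via the normal approximation
exact = chi2.cdf(101.9, r) - chi2.cdf(64.28, r)
approx = norm.cdf((101.9 - r) / sd) - norm.cdf((64.28 - r) / sd)
print(f"exact {exact:.4f}, normal approximation {approx:.4f}")
```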

(Diagram not reproduced: the relationships among some common families of distributions, with the limiting distributions derived from the central limit theorem.)
