Chapter 4 Limiting Distributions
The probabilistic behaviour of the sample mean when the sample size $n$ is large (that is, as $n$ tends to infinity) is called the limiting distribution of the sample mean. The law of large numbers (LLN) and the central limit theorem (CLT) are two of the most important theorems in statistics concerning the limiting distribution of the sample mean. These two theorems establish the "nice" properties of the sample mean and justify its use.
Convergence in distribution

A sequence of random variables $X_1, X_2, \ldots$ is said to converge in distribution (in law) to a random variable $X$ if
$$\lim_{n\to\infty} F_{X_n}(x) = F_X(x)$$
at every point $x$ where $F_X$ is continuous. It is denoted as $X_n \xrightarrow{L} X$.
Example 4.1
Let $U_1, U_2, \ldots \overset{iid}{\sim} U(0,1)$. Define $X_n = \max(U_1, U_2, \ldots, U_n)$. Then the distribution function of $X_n$ is given by $F_{X_n}(x) = 0$ for $x \le 0$, $F_{X_n}(x) = 1$ for $x \ge 1$, and for $0 < x < 1$,
$$F_{X_n}(x) = P(X_n \le x) = P(U_1 \le x, U_2 \le x, \ldots, U_n \le x) = P(U_1 \le x)P(U_2 \le x)\cdots P(U_n \le x) = x^n.$$
Therefore
$$\lim_{n\to\infty} F_{X_n}(x) = \begin{cases} 0 & \text{if } x < 1 \\ 1 & \text{if } x \ge 1 \end{cases}$$
which is the distribution function of the random variable $X$ degenerate at 1:
$$F_X(x) = P(X \le x) = \begin{cases} 0 & \text{if } x < 1 \\ 1 & \text{if } x \ge 1. \end{cases}$$
Hence $X_n \xrightarrow{L} 1$, as $X$ is degenerate at 1.
Now define $Y_n = n(1 - X_n)$. Then
$$F_{Y_n}(y) = P(Y_n \le y) = P(n(1 - X_n) \le y) = P\left(X_n \ge 1 - \frac{y}{n}\right) = 1 - F_{X_n}\left(1 - \frac{y}{n}\right)$$
$$= \begin{cases} 0 & \text{if } y \le 0 \\ 1 - \left(1 - \dfrac{y}{n}\right)^n & \text{if } 0 < y < n \\ 1 & \text{if } y \ge n. \end{cases}$$
Therefore
$$\lim_{n\to\infty} F_{Y_n}(y) = \begin{cases} 0 & \text{if } y \le 0 \\ 1 - e^{-y} & \text{if } 0 < y < \infty \end{cases}$$
which is the distribution function of the $Exp(1)$ distribution, so $Y_n \xrightarrow{L} Exp(1)$.
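For the interested reader, both limits can be checked numerically. The following is a minimal Python simulation sketch (assuming numpy is available; the seed, $n$, and replication count are arbitrary choices) comparing the empirical distribution of $Y_n = n(1 - X_n)$ with the limiting cdf $1 - e^{-y}$.

```python
# Simulation check of Example 4.1: X_n = max(U_1,...,U_n) piles up near 1,
# while Y_n = n(1 - X_n) behaves like an Exp(1) random variable.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 50_000

U = rng.uniform(size=(reps, n))
Xn = U.max(axis=1)                 # X_n = max(U_1, ..., U_n)
Yn = n * (1 - Xn)                  # Y_n = n(1 - X_n)

print("mean of X_n:", Xn.mean())   # close to 1
for y in (0.5, 1.0, 2.0):
    empirical = (Yn <= y).mean()
    limit = 1 - np.exp(-y)         # limiting cdf 1 - e^{-y}
    print(f"P(Y_n <= {y}): {empirical:.4f} vs 1 - e^(-y) = {limit:.4f}")
```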
Convergence in probability

A sequence of random variables $X_1, X_2, \ldots$ is said to converge in probability to a random variable $X$ if, for every $\varepsilon > 0$,
$$\lim_{n\to\infty} P(|X_n - X| \ge \varepsilon) = 0.$$
It is denoted as $X_n \xrightarrow{P} X$.
Example 4.2
Continuing Example 4.1, $P(|X_n - 1| \ge \varepsilon) = 0$ if $\varepsilon > 1$. For any $0 < \varepsilon \le 1$,
$$P(|X_n - 1| \ge \varepsilon) = P(1 - X_n \ge \varepsilon) = P(X_n \le 1 - \varepsilon) = F_{X_n}(1 - \varepsilon) = (1 - \varepsilon)^n \to 0 \quad \text{as } n \to \infty.$$
Hence $X_n \xrightarrow{P} 1$.
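A quick numerical check of this formula (again a sketch assuming numpy; seed and sizes arbitrary): the empirical frequency of $\{X_n \le 1 - \varepsilon\}$ matches $(1 - \varepsilon)^n$, and both tend to 0 as $n$ grows.

```python
# Empirical check that P(X_n <= 1 - eps) = (1 - eps)^n -> 0 as n grows.
import numpy as np

rng = np.random.default_rng(1)
eps, reps = 0.05, 50_000
for n in (10, 50, 100):
    Xn = rng.uniform(size=(reps, n)).max(axis=1)
    print(n, (Xn <= 1 - eps).mean(), (1 - eps) ** n)
```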
Convergence almost surely

A sequence of random variables $X_1, X_2, \ldots$ is said to converge almost surely to a random variable $X$ if
$$P\left(\lim_{n\to\infty} X_n = X\right) = 1.$$
It is denoted as $X_n \xrightarrow{a.s.} X$.
Example 4.3
Let $\Omega = [0,1]$ be the sample space and $\omega$ be a point uniformly drawn from $\Omega$. Define $X_n(\omega) = \omega + \omega^n$ and $X(\omega) = \omega$. Then
$$\lim_{n\to\infty} X_n(\omega) = \begin{cases} X(\omega) & \text{if } 0 \le \omega < 1 \\ X(\omega) + 1 & \text{if } \omega = 1. \end{cases}$$
Since the convergence occurs on the set $[0,1)$ and $P(\omega \in [0,1)) = 1$, $X_n \xrightarrow{a.s.} X$.
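To see the almost sure convergence concretely, one may follow a single sample path; a minimal sketch (numpy assumed, seed arbitrary):

```python
# For a randomly drawn omega, the whole path X_n(omega) = omega + omega**n
# settles down to X(omega) = omega, since P(omega = 1) = 0.
import numpy as np

rng = np.random.default_rng(2)
omega = rng.uniform()              # one point of the sample space [0,1]
for n in (1, 5, 10, 50, 100):
    print(n, omega + omega ** n)   # converges to omega
print("X(omega) =", omega)
```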
Remarks
1. The basic relationships between the above three modes of convergence are as follows:
$$X_n \xrightarrow{a.s.} X \;\Rightarrow\; X_n \xrightarrow{P} X \;\Rightarrow\; X_n \xrightarrow{L} X.$$
Example 4.4
Let $\Omega = [0,1]$ be the sample space and $\omega$ be a point uniformly drawn from $\Omega$. Define
$$X(\omega) = \omega,$$
$$X_1(\omega) = \omega + I_{[0,1]}(\omega),$$
$$X_2(\omega) = \omega + I_{[0,1/2]}(\omega), \quad X_3(\omega) = \omega + I_{[1/2,1]}(\omega),$$
$$X_4(\omega) = \omega + I_{[0,1/3]}(\omega), \quad X_5(\omega) = \omega + I_{[1/3,2/3]}(\omega), \quad X_6(\omega) = \omega + I_{[2/3,1]}(\omega),$$
… and so on. Since $P(|X_n - X| \ge \varepsilon)$ equals the length of the $n$-th indicator interval, which shrinks to 0, we have $X_n \xrightarrow{P} X$. However, every fixed $\omega$ falls inside infinitely many of the intervals, so $X_n(\omega)$ jumps to $\omega + 1$ infinitely often and does not converge; hence $X_n$ does not converge to $X$ almost surely, as traced in the sketch below.
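A small sketch of the mechanics (plain Python; the point $\omega = 0.4$ is an arbitrary choice):

```python
# The n-th indicator interval has length 1/k (k = block number), which
# shrinks to 0, giving convergence in probability; yet any fixed omega is
# covered in every block, so X_n(omega) = omega + 1 infinitely often and
# the sample path never converges.
def intervals(n_max):
    """Yield the first n_max indicator intervals of Example 4.4."""
    k = 1
    while True:
        for j in range(k):                      # k intervals of length 1/k
            yield (j / k, (j + 1) / k)
            n_max -= 1
            if n_max == 0:
                return
        k += 1

omega = 0.4
lengths = [round(b - a, 3) for a, b in intervals(10)]
hits = [i + 1 for i, (a, b) in enumerate(intervals(1000)) if a <= omega <= b]
print("interval lengths:", lengths)             # 1, 1/2, 1/2, 1/3, ...
print("omega hit at n =", hits[:10], "...")     # infinitely many hits
```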
Example 4.5
Let $X_1, X_2, \ldots$ be iid random variables with mean $\mu$ and finite variance $\sigma^2$. Then for any $\varepsilon > 0$,
$$\lim_{n\to\infty} P(|\bar{X}_n - \mu| \ge \varepsilon) = 0,$$
i.e. $\bar{X}_n \xrightarrow{P} \mu$. This is the weak law of large numbers (WLLN).
Proof
Since
$$E(\bar{X}_n) = \mu, \qquad Var(\bar{X}_n) = \frac{\sigma^2}{n},$$
Chebyshev's inequality gives
$$P(|\bar{X}_n - \mu| \ge \varepsilon) \le \frac{\sigma^2}{n\varepsilon^2}.$$
Hence $0 \le \lim_{n\to\infty} P(|\bar{X}_n - \mu| \ge \varepsilon) \le \lim_{n\to\infty} \frac{\sigma^2}{n\varepsilon^2} = 0$, and therefore $\lim_{n\to\infty} P(|\bar{X}_n - \mu| \ge \varepsilon) = 0$.
Remark
A more general version of the weak law of large numbers states that if $E(|X_i|) < \infty$, then $\bar{X}_n \xrightarrow{P} \mu$. Note that it does not require the variance to be finite.
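The Chebyshev bound used in the proof can be compared with the actual probability by simulation; a minimal sketch (numpy assumed; $U(0,1)$ data, so $\mu = 1/2$ and $\sigma^2 = 1/12$):

```python
# Empirical P(|Xbar_n - mu| >= eps) stays below sigma^2/(n eps^2) and -> 0.
import numpy as np

rng = np.random.default_rng(3)
mu, var, eps, reps = 0.5, 1 / 12, 0.05, 10_000
for n in (10, 100, 1000):
    means = rng.uniform(size=(reps, n)).mean(axis=1)
    emp = (np.abs(means - mu) >= eps).mean()
    print(n, f"empirical={emp:.4f}", f"bound={var / (n * eps ** 2):.4f}")
```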
The strong law of large numbers (SLLN) states that, under the assumption $E\big((X_i - \mu)^4\big) < \infty$, $\bar{X}_n \xrightarrow{a.s.} \mu$,
i.e.
$$P\left(\lim_{n\to\infty} \bar{X}_n = \mu\right) = 1.$$
Proof
Let $Y_i = X_i - \mu$, $i = 1, 2, \ldots$ Then $E(Y_i) = 0$. Let $K = E(Y_i^4) = E\big((X_i - \mu)^4\big) < \infty$. Define $S_n = \sum_{i=1}^n Y_i$. Consider
$$E(S_n^4) = E\left[\left(\sum_{i=1}^n Y_i\right)^4\right] = nE(Y_i^4) + 6\binom{n}{2}\big(E(Y_i^2)\big)^2$$
(cross terms involving a lone factor $Y_i$ vanish since $E(Y_i) = 0$)
$$\le nK + 3n(n-1)K \qquad \left(\because\; Var(Y_i^2) \ge 0 \;\Rightarrow\; \big(E(Y_i^2)\big)^2 \le E(Y_i^4)\right)$$
$$\Rightarrow\; E\left[\frac{S_n^4}{n^4}\right] \le \frac{K}{n^3} + \frac{3(n-1)K}{n^3} \le \frac{K}{n^3} + \frac{3K}{n^2}.$$
Therefore
$$E\left[\sum_{n=1}^\infty \frac{S_n^4}{n^4}\right] = \sum_{n=1}^\infty E\left[\frac{S_n^4}{n^4}\right] \le K\sum_{n=1}^\infty \frac{1}{n^3} + 3K\sum_{n=1}^\infty \frac{1}{n^2} < \infty.$$
Hence with probability 1, $\sum_{n=1}^\infty \frac{S_n^4}{n^4} < \infty$, which implies that $\lim_{n\to\infty} \frac{S_n^4}{n^4} = 0$. Thus we have proven that with probability 1,
$$\lim_{n\to\infty} \bar{Y}_n = \lim_{n\to\infty} \frac{S_n}{n} = 0,$$
i.e. $P\left(\lim_{n\to\infty} \bar{X}_n = \mu\right) = 1$.
Remark
A more general version of the strong law of large numbers states that $\bar{X}_n \xrightarrow{a.s.} \mu$ if and only if $E(|X_i|) < \infty$ and $E(X_i) = \mu$.
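A single long sample path illustrates the almost sure statement; a sketch (numpy assumed; exponential data with $\mu = 2$ chosen arbitrarily):

```python
# One running-mean path Xbar_n computed from one iid sequence settles at mu.
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=1_000_000)   # iid with mu = 2
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 1000, 100_000, 1_000_000):
    print(f"n={n:>9}: Xbar_n = {running_mean[n - 1]:.4f}")   # -> 2
```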
The central limit theorem (CLT) states that if $X_1, X_2, \ldots$ are iid with mean $\mu$ and finite variance $\sigma^2 > 0$, then
$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{L} N(0,1) \quad \text{as } n \to \infty,$$
i.e. $\lim_{n\to\infty} P\left(\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \le x\right) = \Phi(x)$ for all $-\infty < x < \infty$.
Proof
Let $Y_i = \frac{X_i - \mu}{\sigma}$ for $i = 1, 2, \ldots$ Then $E(Y_i) = 0$ and $Var(Y_i) = E(Y_i^2) = 1$. Let $M_Y(t)$ be the moment generating function (assumed to exist) of each $Y_i$. By Taylor's expansion,
$$M_Y(t) = M_Y(0) + M_Y'(0)\,t + \frac{1}{2}M_Y''(\varepsilon)\,t^2 = 1 + \frac{1}{2}M_Y''(\varepsilon)\,t^2 \quad \text{where } 0 \le \varepsilon \le t.$$
Let $W_n = \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} = \frac{1}{\sqrt{n}}\sum_{i=1}^n Y_i$. The moment generating function of $W_n$ is
$$M_{W_n}(t) = \left(M_Y\left(\frac{t}{\sqrt{n}}\right)\right)^n = \left(1 + \frac{t^2}{2n}M_Y''(\varepsilon)\right)^n \quad \text{where } 0 \le \varepsilon \le \frac{t}{\sqrt{n}}.$$
As $n \to \infty$ we have $\varepsilon \to 0$ and hence $M_Y''(\varepsilon) \to M_Y''(0) = E(Y_i^2) = 1$, so
$$\lim_{n\to\infty} M_{W_n}(t) = \lim_{n\to\infty}\left(1 + \frac{t^2}{2n}M_Y''(\varepsilon)\right)^n = e^{t^2/2},$$
which is the moment generating function of $N(0,1)$.
Remarks
1. The key to the above proof is the following lemma, which we state without proof: if $\lim_{n\to\infty} a_n = a$, then $\lim_{n\to\infty}\left(1 + \frac{a_n}{n}\right)^n = e^a$.
2. A more general proof of the CLT uses the so-called characteristic function (which always exists) and does not need the existence of the moment generating function.
3. The CLT is one of the most startling theorems in statistics. It forms the basis of other important theorems and provides us with useful approximations in large-sample cases.
Example 4.6
Let $\bar{X}_{50}$ be the sample mean of a random sample of size 50 from a distribution with pdf
$$f(x) = \frac{x^3}{4}, \quad 0 \le x \le 2.$$
Then
$$\mu = \int_0^2 \frac{x^4}{4}\,dx = \frac{8}{5}, \qquad \sigma^2 = \int_0^2 \frac{x^5}{4}\,dx - \left(\frac{8}{5}\right)^2 = \frac{8}{75}.$$
Therefore, approximately,
$$W_{50} = \frac{\sqrt{50}\,(\bar{X}_{50} - 8/5)}{\sqrt{8/75}} \sim N(0,1).$$
Thus
$$P(1.5 \le \bar{X}_{50} \le 1.65) = P\left(\frac{\sqrt{50}\,(1.5 - 8/5)}{\sqrt{8/75}} \le W_{50} \le \frac{\sqrt{50}\,(1.65 - 8/5)}{\sqrt{8/75}}\right) \approx \Phi(1.08) - \Phi(-2.17) = 0.8449.$$
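The answer can be verified by Monte Carlo. Since the cdf of $f$ is $F(x) = x^4/16$ on $[0,2]$, the inverse-cdf method gives $X = 2U^{1/4}$ for $U \sim U(0,1)$; a sketch (numpy assumed; seed and replication count arbitrary):

```python
# Monte Carlo check of Example 4.6: estimate P(1.5 <= Xbar_50 <= 1.65).
import numpy as np

rng = np.random.default_rng(5)
reps = 100_000
X = 2 * rng.uniform(size=(reps, 50)) ** 0.25     # inverse-cdf sampling from f
means = X.mean(axis=1)
prob = ((1.5 <= means) & (means <= 1.65)).mean()
print(prob)                                      # close to the CLT value 0.8449
```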
Example 4.7
If $Y_i \overset{iid}{\sim} b(1, p)$, then $E(Y_i) = p$ and $Var(Y_i) = p(1-p)$. Let $X = \sum_{i=1}^n Y_i$; then $X \sim b(n, p)$ with $E(X) = np$ and $Var(X) = np(1-p)$. By the CLT,
$$\frac{\sqrt{n}\,(\bar{Y}_n - p)}{\sqrt{p(1-p)}} = \frac{X - np}{\sqrt{np(1-p)}} = \frac{X - E(X)}{\sqrt{Var(X)}} \xrightarrow{L} N(0,1) \quad \text{as } n \to \infty.$$
Example 4.8
Let $X \sim b(15, 0.25)$, so $np = 3.75$ and $np(1-p) = 2.8125$. The exact probability is
$$P(4 \le X \le 9) = \sum_{x=4}^{9} \binom{15}{x}(0.25)^x(0.75)^{15-x} = 0.5379.$$
Using the normal approximation with the bounds adjusted by 0.5,
$$P(4 \le X \le 9) = P(3.5 \le X \le 9.5) \approx P\left(\frac{3.5 - 3.75}{\sqrt{2.8125}} \le \frac{X - np}{\sqrt{np(1-p)}} \le \frac{9.5 - 3.75}{\sqrt{2.8125}}\right) \approx \Phi(3.43) - \Phi(-0.15) = 0.5593.$$
The 0.5 added or subtracted from the bounds in the probability statement is called
the continuity correction. In general, when a continuous distribution is used to
approximate a discrete distribution, it would be better to use
$P(X \le c + 0.5)$ instead of $P(X \le c)$
$P(X < c - 0.5)$ instead of $P(X < c)$
$P(X \ge c - 0.5)$ instead of $P(X \ge c)$
$P(X > c + 0.5)$ instead of $P(X > c)$
where c is an integer.
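The effect of the continuity correction is easy to quantify for Example 4.8; a sketch assuming scipy is available:

```python
# Exact binomial probability vs. normal approximation, with and without
# the continuity correction, for X ~ b(15, 0.25) and P(4 <= X <= 9).
from scipy.stats import binom, norm

n, p = 15, 0.25
mu, sd = n * p, (n * p * (1 - p)) ** 0.5          # 3.75, sqrt(2.8125)

exact = binom.cdf(9, n, p) - binom.cdf(3, n, p)   # P(4 <= X <= 9)
plain = norm.cdf((9 - mu) / sd) - norm.cdf((4 - mu) / sd)
corrected = norm.cdf((9.5 - mu) / sd) - norm.cdf((3.5 - mu) / sd)

print(f"exact={exact:.4f} plain={plain:.4f} corrected={corrected:.4f}")
# The corrected value is much closer to the exact 0.5379.
```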
Example 4.9
If $Y_i \overset{iid}{\sim} ℘(\lambda)$, then $E(Y_i) = \lambda$ and $Var(Y_i) = \lambda$. Let $X = \sum_{i=1}^n Y_i$; then $X \sim ℘(n\lambda)$ with $E(X) = n\lambda$ and $Var(X) = n\lambda$. By the CLT,
$$\frac{\sqrt{n}\,(\bar{Y}_n - \lambda)}{\sqrt{\lambda}} = \frac{X - n\lambda}{\sqrt{n\lambda}} = \frac{X - E(X)}{\sqrt{Var(X)}} \xrightarrow{L} N(0,1) \quad \text{as } n \to \infty.$$
Thus for $X \sim ℘(\theta)$,
$$\frac{X - \theta}{\sqrt{\theta}} \xrightarrow{L} N(0,1) \quad \text{as } \theta \to \infty.$$
Example 4.10
Let $X \sim ℘(10)$. The exact probability is
$$P(11 < X \le 21) = \sum_{x=12}^{21} \frac{e^{-10}\,10^x}{x!} = 0.3025.$$
With the normal approximation and the continuity correction,
$$P(11 < X \le 21) = P(11.5 \le X \le 21.5) \approx P\left(\frac{11.5 - 10}{\sqrt{10}} \le \frac{X - \theta}{\sqrt{\theta}} \le \frac{21.5 - 10}{\sqrt{10}}\right) \approx \Phi(3.64) - \Phi(0.47) = 0.3175,$$
which is more accurate than the approximation without the continuity correction.
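A corresponding numerical check (scipy assumed):

```python
# Exact Poisson probability vs. continuity-corrected normal approximation
# for X ~ Poisson(10) and P(11 < X <= 21).
from scipy.stats import poisson, norm

theta = 10
exact = poisson.cdf(21, theta) - poisson.cdf(11, theta)
lo = (11.5 - theta) / theta ** 0.5
hi = (21.5 - theta) / theta ** 0.5
approx = norm.cdf(hi) - norm.cdf(lo)
print(f"exact={exact:.4f} approx={approx:.4f}")   # 0.3025 vs about 0.3175
```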
Example 4.11
If $Y_i \overset{iid}{\sim} Exp(\lambda)$, then $E(Y_i) = \frac{1}{\lambda}$ and $Var(Y_i) = \frac{1}{\lambda^2}$. Let $X = \sum_{i=1}^n Y_i$; then $X \sim \Gamma(n, \lambda)$ with $E(X) = \frac{n}{\lambda}$ and $Var(X) = \frac{n}{\lambda^2}$. By the CLT,
$$\frac{\sqrt{n}\,(\bar{Y}_n - 1/\lambda)}{\sqrt{1/\lambda^2}} = \frac{X - n/\lambda}{\sqrt{n/\lambda^2}} = \frac{X - E(X)}{\sqrt{Var(X)}} \xrightarrow{L} N(0,1) \quad \text{as } n \to \infty.$$
Thus for $X \sim \Gamma(\alpha, \lambda)$,
$$\frac{X - \alpha/\lambda}{\sqrt{\alpha/\lambda^2}} \xrightarrow{L} N(0,1) \quad \text{as } \alpha \to \infty.$$
In particular, for $X \sim \chi_r^2$,
$$\frac{X - r}{\sqrt{2r}} \xrightarrow{L} N(0,1) \quad \text{as } r \to \infty.$$
Example 4.12
Let $X \sim \chi_{80}^2$. Then
$$P(64.28 \le X \le 101.9) = P\left(\frac{64.28 - 80}{\sqrt{160}} \le \frac{X - r}{\sqrt{2r}} \le \frac{101.9 - 80}{\sqrt{160}}\right) \approx \Phi(1.7313) - \Phi(-1.2428) = 0.9583 - 0.1070 = 0.8513.$$
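The quality of this approximation can again be checked numerically (scipy assumed):

```python
# Exact chi-square probability vs. normal approximation via (X - r)/sqrt(2r)
# for X ~ chi-square with r = 80 degrees of freedom.
from scipy.stats import chi2, norm

r = 80
exact = chi2.cdf(101.9, r) - chi2.cdf(64.28, r)
sd = (2 * r) ** 0.5
approx = norm.cdf((101.9 - r) / sd) - norm.cdf((64.28 - r) / sd)
print(f"exact={exact:.4f} approx={approx:.4f}")   # approx is about 0.8513
```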
The following diagram shows the relationships among some common families of distributions. The limiting distributions are derived from the central limit theorem.