
School of Mathematics and Statistics

UNSW Sydney

Introduction to Probability and Stochastic Processes

OPEN LEARNING
Chapter 6
Convergence of Random Variables

2 / 83
Outline:
6.1 Introduction

6.2 Convergence in Probability

6.3 Weak Law of Large Numbers

6.4 Convergence in Distribution

6.5 Central Limit Theorem

6.6 Applications of the Central Limit Theorem


① Probability Calculations about a Sample Mean
② How Well Does the Central Limit Theorem Work?
③ Normal Approximation to the Binomial Distribution
④ What Should be the Size of n to Use the Normal Approximation to the
Binomial Distribution?
⑤ Normal Approximation to the Poisson Distribution
6.7 Delta Method

6.8 Supplementary Material

3 / 83
6.1 Introduction

4 / 83
The previous chapter dealt with the problem of finding the density function
(or probability function) of a transformation of one or two random variables.

In practice, we are usually interested in some function of many variables – not just one or two. However, the calculations often become mathematically intractable in this situation.

An example is the problem of finding the exact density function of the sum of
100 independent Uniform(0, 1) random variables.

5 / 83
This is a very tedious and complicated problem.

Because of the difficulties in obtaining exact results, a large portion of mathematical statistics is concerned with approximations to density functions.

These are based on convergence results for sequences of random variables.

6 / 83
In this chapter, we will focus on some key convergence results useful in
statistics.

Some of these (such as the law of large numbers and the central limit
theorem) relate to sums or averages of random variables.

These results are particularly useful because sums and averages are typically
used as summary statistics in quantitative research.

There are many different types of convergence of random variables.

7 / 83
6.2 Convergence in Probability

8 / 83
Definition
The sequence of random variables X1, X2, . . . converges in probability to a constant c if, for any ε > 0,

\[ \lim_{n \to \infty} P\left( |X_n - c| > \varepsilon \right) = 0. \]

This is usually written as

\[ X_n \xrightarrow{P} c. \]

Memorise this!

9 / 83
Example
Let X1, X2, . . . be independent Uniform(0, θ) variables.

Let Yn = max(X1, . . . , Xn) for n = 1, 2, . . . . Then it can be shown that

\[ F_{Y_n}(y) = \begin{cases} \left( \dfrac{y}{\theta} \right)^n, & 0 < y < \theta \\ 1, & y \ge \theta \end{cases} \]

and

\[ f_{Y_n}(y) = \frac{n y^{n-1}}{\theta^n}, \qquad 0 < y < \theta. \]

See the derivation of this result (Supplementary Material).

Show that \( Y_n \xrightarrow{P} \theta \).

10 / 83
Example

Solution:
Note that \( Y_n \xrightarrow{P} \theta \) means \( \lim_{n \to \infty} P\left( |Y_n - \theta| > \varepsilon \right) = 0 \).

Recall that 0 < Yn < θ, so Yn − θ < 0 and hence |Yn − θ| = θ − Yn.

For 0 < ε < θ,

\[ P\left( |Y_n - \theta| > \varepsilon \right) = P(Y_n < \theta - \varepsilon) = F_{Y_n}(\theta - \varepsilon) = \left( \frac{\theta - \varepsilon}{\theta} \right)^n \to 0 \quad \text{as } n \to \infty \quad \left( \text{since } \frac{\theta - \varepsilon}{\theta} < 1 \right). \]

For ε ≥ θ, P(|Yn − θ| > ε) = 0 for all n ≥ 1, since Yn cannot exceed θ and θ − ε ≤ 0. So \( \lim_{n \to \infty} P\left( |Y_n - \theta| > \varepsilon \right) = 0 \).

∴ \( Y_n \xrightarrow{P} \theta \).
11 / 83
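This convergence can also be illustrated by simulation. The following R sketch estimates P(|Yn − θ| > ε) for increasing n; the values θ = 2 and ε = 0.05, the sample sizes and the number of replications are arbitrary illustrative choices.

# Estimate P(|Y_n - theta| > eps) by simulation, where Y_n = max(X_1, ..., X_n)
# and the X_i are independent Uniform(0, theta) variables.
set.seed(1)
theta <- 2
eps   <- 0.05
for (n in c(10, 50, 200, 1000)) {
  y    <- replicate(10000, max(runif(n, 0, theta)))   # 10000 simulated values of Y_n
  prob <- mean(abs(y - theta) > eps)                  # empirical P(|Y_n - theta| > eps)
  cat("n =", n, " estimated P(|Y_n - theta| > eps) =", prob, "\n")
}
# The estimates shrink towards 0 as n grows; exactly,
# P(|Y_n - theta| > eps) = ((theta - eps)/theta)^n for 0 < eps < theta.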
Example
For n = 1, 2, . . . , let Yn ∼ N(µ, σn²) and suppose \( \lim_{n \to \infty} \sigma_n = 0 \).

Show that \( Y_n \xrightarrow{P} \mu \).

12 / 83
Example

Solution:

We need to show that \( \lim_{n \to \infty} P\left( |Y_n - \mu| > \varepsilon \right) = 0 \). For any ε > 0,

\[
\begin{aligned}
P\left( |Y_n - \mu| > \varepsilon \right)
&= P\left( \left| \frac{Y_n - \mu}{\sigma_n} \right| > \frac{\varepsilon}{\sigma_n} \right) \\
&= P\left( \left\{ \frac{Y_n - \mu}{\sigma_n} < -\frac{\varepsilon}{\sigma_n} \right\} \cup \left\{ \frac{Y_n - \mu}{\sigma_n} > \frac{\varepsilon}{\sigma_n} \right\} \right) \\
&= P\left( \frac{Y_n - \mu}{\sigma_n} < -\frac{\varepsilon}{\sigma_n} \right) + P\left( \frac{Y_n - \mu}{\sigma_n} > \frac{\varepsilon}{\sigma_n} \right) \\
&= \int_{-\infty}^{-\varepsilon/\sigma_n} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\, dy + \int_{\varepsilon/\sigma_n}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\, dy \\
&\to 0 \quad \text{as } n \to \infty,
\end{aligned}
\]

since ε/σn → ∞ and (Yn − µ)/σn ∼ N(0, 1). Thus \( Y_n \xrightarrow{P} \mu \).

13 / 83
6.3 Weak Law of Large Numbers

14 / 83
Weak Law of Large Numbers
Suppose X1, X2, . . . are independent, each with mean µ and variance 0 < σ² < ∞. If

\[ \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i, \]

then

\[ \bar{X}_n \xrightarrow{P} \mu. \]

That is,

for all ε > 0, \( \lim_{n \to \infty} P(|\bar{X}_n - \mu| > \varepsilon) = 0 \).

15 / 83
Proof.
By Chebychev's Inequality with \( \varepsilon = k\sqrt{Var(\bar{X}_n)} \), i.e. \( k = \varepsilon / \sqrt{Var(\bar{X}_n)} \), we have

\[
\begin{aligned}
P\left( |\bar{X}_n - E(\bar{X}_n)| > k\sqrt{Var(\bar{X}_n)} \right) &\le \frac{1}{k^2} \\
P\left( |\bar{X}_n - E(\bar{X}_n)| > \varepsilon \right) &\le \frac{Var(\bar{X}_n)}{\varepsilon^2} \\
P\left( |\bar{X}_n - \mu| > \varepsilon \right) &\le \frac{\sigma^2}{n \varepsilon^2} \quad \text{(since } E(\bar{X}_n) = \mu \text{ and } Var(\bar{X}_n) = \sigma^2/n\text{)} \\
&\to 0 \quad \text{as } n \to \infty.
\end{aligned}
\]

∴ for all ε > 0, \( \lim_{n \to \infty} P(|\bar{X}_n - \mu| > \varepsilon) = 0 \).
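The law can also be checked empirically with a short R sketch; the Exponential(1) population (which has µ = 1 and σ² = 1), the tolerance ε and the sample sizes below are arbitrary illustrative choices.

# Empirical check of the WLLN: the proportion of simulated sample means that fall
# further than eps from mu should shrink as n grows.
set.seed(1)
eps <- 0.1
for (n in c(10, 100, 1000)) {
  xbar <- replicate(5000, mean(rexp(n, rate = 1)))   # 5000 simulated values of Xbar_n
  cat("n =", n, " estimated P(|Xbar_n - 1| > eps) =", mean(abs(xbar - 1) > eps), "\n")
}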

16 / 83
6.4 Convergence in Distribution

17 / 83
Convergence in probability captures the concept of a sequence of random variables approaching a fixed constant.

However, it doesn't say anything about the probability structure of the sequence as n gets larger.

In this section, we introduce an alternative type of convergence that addresses this situation and allows approximate probability statements for large n.

18 / 83
Definition
Let X1, X2, . . . be a sequence of random variables. We say that Xn converges in distribution to X if

\[ \lim_{n \to \infty} F_{X_n}(x) = F_X(x) \]

for all x at which FX is continuous.

A common shorthand is

\[ X_n \xrightarrow{D} X. \]

We say that FX is the limiting distribution of Xn.

Note that both X and Xn are random variables.

Memorise this!

19 / 83
Slutzky’s Theorem

The next result, known as Slutzky's Theorem, is useful for establishing convergence in distribution results.

The proof is omitted in these notes but may be found in advanced texts such as:

Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics, New York: John Wiley & Sons.

20 / 83
Slutzky's Theorem
Let X1, X2, . . . be a sequence of random variables that converges in distribution to X, i.e.,

\[ X_n \xrightarrow{D} X. \]

Let Y1, Y2, . . . be another sequence of random variables that converges in probability to a constant c, i.e.,

\[ Y_n \xrightarrow{P} c. \]

Then

1. \( X_n + Y_n \xrightarrow{D} X + c \),
2. \( X_n Y_n \xrightarrow{D} c\,X \).
21 / 83
Example
Suppose that X1, X2, . . . converges in distribution to X ∼ N(0, 1), i.e., \( X_n \xrightarrow{D} N(0, 1) \), and suppose that \( n Y_n \sim \text{Bin}(n, \tfrac{1}{2}) \).

What are the limiting distributions of Xn + Yn and Xn Yn?

22 / 83
Example
Suppose that X1, X2, . . . converges in distribution to X ∼ N(0, 1), i.e., \( X_n \xrightarrow{D} N(0, 1) \), and suppose that \( n Y_n \sim \text{Bin}(n, \tfrac{1}{2}) \).

What are the limiting distributions of Xn + Yn and Xn Yn?

Solution:
From the Weak Law of Large Numbers, \( Y_n \xrightarrow{P} \tfrac{1}{2} \). See why (Supplementary Material).

From the question, we know that

1. \( X_n \xrightarrow{D} N(0, 1) \),
2. \( Y_n \xrightarrow{P} \tfrac{1}{2} \).

Therefore, from Slutzky's Theorem,

\[ X_n + Y_n \xrightarrow{D} N\left( \tfrac{1}{2}, 1 \right) \quad \text{and} \quad X_n Y_n \xrightarrow{D} N\left( 0, \tfrac{1}{4} \right). \]

22 / 83
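A simulation sketch of this example in R; taking Xn to be exactly N(0, 1) and using n = 1000 are assumptions made purely for illustration.

# Simulate X_n + Y_n and X_n * Y_n for a large fixed n, where X_n is taken to be
# exactly N(0, 1) and n * Y_n ~ Bin(n, 1/2).
set.seed(1)
n    <- 1000
reps <- 10000
xn   <- rnorm(reps)                                 # draws standing in for X_n
yn   <- rbinom(reps, size = n, prob = 0.5) / n      # Y_n = Bin(n, 1/2) / n
c(mean(xn + yn), var(xn + yn))    # close to 1/2 and 1, matching N(1/2, 1)
c(mean(xn * yn), var(xn * yn))    # close to 0 and 1/4, matching N(0, 1/4)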
6.5 Central Limit Theorem

23 / 83
For a general random sample X1, X2, . . . , Xn, it is often of interest to make probability statements about the sample mean X̄.

The next theorem provides a pathway to approximating such probabilities when n is large.

24 / 83
Central Limit Theorem
Suppose X1, X2, . . . are independent and identically distributed random variables with common mean µ = E(Xi) and common variance σ² = Var(Xi) < ∞.

For each n ≥ 1, let \( \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \). Then

\[ \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{D} Z, \]

where Z ∼ N(0, 1). It is common to write

\[ \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{D} N(0, 1). \]

Memorise this!

25 / 83
Note that

\[ E(\bar{X}_n) = \mu \quad \text{and} \quad Var(\bar{X}_n) = \frac{\sigma^2}{n}. \]

See why (Supplementary Material).

So the Central Limit Theorem states that the limiting distribution of a standardised average of independent, identically distributed random variables (with finite variance) is the standard Normal, or N(0, 1), distribution.

Note that we made no assumptions about the common distribution of the Xi.

26 / 83
This is an important aspect of the Central Limit Theorem and is particularly
useful in practice.

In fact, the Central Limit Theorem is the single most useful result in
statistics, and it forms the basis of most of the statistical inference tools
used by researchers today.
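The theorem is easy to see empirically. The R sketch below standardises the mean of Uniform(0, 1) samples (for which µ = 1/2 and σ² = 1/12); the choice of population, n = 100 and the number of replications are arbitrary.

# Simulate standardised sample means of n Uniform(0,1) variables and compare
# the result with the N(0, 1) distribution.
set.seed(1)
n     <- 100
mu    <- 1/2
sigma <- sqrt(1/12)
z <- replicate(10000, (mean(runif(n)) - mu) / (sigma / sqrt(n)))
c(mean(z), var(z))     # close to 0 and 1
mean(z <= 1.96)        # close to pnorm(1.96) = 0.975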

27 / 83
Proof.
The method of proof will be to show that the moment generating function of \( \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \) converges, as n → ∞, to the moment generating function of a N(0, 1) random variable.

That is, if

\[ m_n(u) = E\left[ \exp\left( u\, \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \right) \right] \]

is the moment generating function of \( \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \), then

\[ \lim_{n \to \infty} m_n(u) = e^{u^2/2}. \]

(Recall the moment generating function of a N(µ, σ²) random variable is \( e^{\mu u + \frac{1}{2}\sigma^2 u^2} \).)

28 / 83
Proof. - continued
So, we have

\[
m_n(u) = E\left[ \exp\left\{ \frac{u}{\sigma\sqrt{n}} \sum_{i=1}^{n} X_i - \frac{u\mu\sqrt{n}}{\sigma} \right\} \right]
= \exp\left( \frac{-u\mu\sqrt{n}}{\sigma} \right) E\left[ \exp\left\{ \frac{u}{\sigma\sqrt{n}} \sum_{i=1}^{n} X_i \right\} \right].
\]

29 / 83
Proof. - continued
The Xi's all have mean µ and variance σ², and so

\[
m_{X_i}(u) = E(e^{uX_i}) = 1 + \frac{E(X_i)u}{1!} + \frac{E(X_i^2)u^2}{2!} + \cdots
= 1 + \mu u + \frac{1}{2}(\sigma^2 + \mu^2)u^2 + \cdots
\]

(since \( E(X_i^2) = Var(X_i) + (E(X_i))^2 = \sigma^2 + \mu^2 \)).

30 / 83
Proof. - continued

\[
\begin{aligned}
\therefore\; E\left[ \exp\left\{ \frac{u}{\sigma\sqrt{n}} \sum_{i=1}^{n} X_i \right\} \right]
&= E\left[ \prod_{i=1}^{n} \exp\left( \frac{u}{\sigma\sqrt{n}} X_i \right) \right] \\
&= \prod_{i=1}^{n} E\left[ \exp\left( \frac{u}{\sigma\sqrt{n}} X_i \right) \right] \quad \text{(since the } X_i\text{'s are independent)} \\
&= \prod_{i=1}^{n} m_{X_i}\left( \frac{u}{\sigma\sqrt{n}} \right) \\
&= \left( 1 + \mu \cdot \frac{u}{\sigma\sqrt{n}} + \frac{1}{2}(\sigma^2 + \mu^2)\frac{u^2}{\sigma^2 n} + \ldots \right)^n.
\end{aligned}
\]

31 / 83
Proof. - continued
Thus

\[
\begin{aligned}
\ln m_n(u) &= \frac{-u\mu\sqrt{n}}{\sigma} + n \ln\left( 1 + \frac{\mu u}{\sigma\sqrt{n}} + \frac{1}{2}(\sigma^2 + \mu^2)\frac{u^2}{\sigma^2 n} + \ldots \right) \\
&= \frac{-u\mu\sqrt{n}}{\sigma} + n \left\{ \frac{\mu u}{\sigma\sqrt{n}} + \frac{1}{2}(\sigma^2 + \mu^2)\frac{u^2}{\sigma^2 n} + \ldots
- \frac{1}{2}\left( \frac{\mu u}{\sigma\sqrt{n}} + \frac{1}{2}(\sigma^2 + \mu^2)\frac{u^2}{\sigma^2 n} + \ldots \right)^2 + \ldots \right\} \\
&\qquad \left( \text{since } \ln(1 + x) = x - \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 - \ldots \right) \\
&= n \left\{ \frac{1}{2}(\sigma^2 + \mu^2)\frac{u^2}{\sigma^2 n} - \frac{1}{2}\frac{\mu^2 u^2}{\sigma^2 n} + \left( \text{terms involving } \tfrac{1}{n} \text{ to powers larger than } 1 \right) \right\} \\
&= \frac{u^2}{2} + (\text{terms which} \to 0 \text{ as } n \to \infty).
\end{aligned}
\]

\[ \therefore\; \lim_{n \to \infty} m_n(u) = e^{u^2/2}. \]

32 / 83
The Central Limit Theorem stated above provides the limiting distribution of

\[ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}. \]

However, probabilities involving related quantities, such as the sum \( \sum_{i=1}^{n} X_i \), are sometimes required.

Since \( \sum_{i=1}^{n} X_i = n\bar{X} \), the Central Limit Theorem also applies to the sum of a sequence of random variables.

The next result provides alternative forms of the Central Limit Theorem.

33 / 83
Results
Suppose X1, X2, . . . are independent and identically distributed random variables with common mean µ = E(Xi) and common variance σ² = Var(Xi) < ∞.

Then the Central Limit Theorem may also be stated in the following alternative forms:

1. \( \sqrt{n}(\bar{X} - \mu) \xrightarrow{D} N(0, \sigma^2) \),
2. \( \dfrac{\sum_i X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{D} N(0, 1) \),
3. \( \dfrac{\sum_i X_i - n\mu}{\sqrt{n}} \xrightarrow{D} N(0, \sigma^2) \).

34 / 83
6.6 Applications of the Central Limit Theorem

35 / 83
In this section, we provide some applications of the Central Limit Theorem.

The Central Limit Theorem is the most important result in statistics; its applications are many and varied.

In particular, it is shown that a much simpler approximation can replace complicated or long-winded probability calculations.

36 / 83
6.6.1 Probability Calculations about a Sample
Mean

37 / 83
Say we are interested in a random variable X.

We measure this variable on each of n randomly selected subjects, giving us n independent and identically distributed (iid) random variables X1, X2, . . . , Xn.

X̄ is the average of X from this sample.

Thanks to the Central Limit Theorem (CLT), we know that the average of a large sample from any random variable (with finite variance) is approximately normally distributed.

So, if we know µ and σ for this random variable, we can calculate any
probability we like about X̄.

38 / 83
Example
It is known that, in 1995, adult women in Australia had an average weight of about 67 kg, and the variance of this quantity was about 256.

We randomly choose 10 Australian adult women. What is an approximate distribution for the average weight of these people? What is the chance that their average weight exceeds 80 kg?

39 / 83
Example

Solution:
From the Central Limit Theorem,

\[ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{D} N(0, 1), \]

so we can say that

\[ \bar{X} \overset{\text{approx}}{\sim} N\left( \mu, \frac{\sigma^2}{n} \right) = N\left( 67, \frac{256}{10} \right) = N(67, 25.6). \]

So, using Chapter 3 methods to calculate normal probabilities,

\[ P(\bar{X} > 80) = P\left( \frac{\bar{X} - 67}{\sqrt{25.6}} > \frac{80 - 67}{\sqrt{25.6}} \right) \approx P(Z > 2.569351) \approx 0.00509446. \]

In RStudio, you type 1 - pnorm(2.569351).
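The same answer can also be obtained directly from µ, σ² and n using the mean and sd arguments of pnorm(); this is just a sketch of the calculation above with the numbers from the example.

# P(Xbar > 80) where Xbar is approximately N(67, 256/10).
mu <- 67; sigma2 <- 256; n <- 10
1 - pnorm(80, mean = mu, sd = sqrt(sigma2 / n))    # approximately 0.00509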

40 / 83
Example
Cadmium is a naturally occurring heavy metal found in drinking water at low levels. The Australian Drinking Water Guidelines recommend that drinking water contain no more than 0.05 mg/L of cadmium due to health considerations.

Cadmium concentrations in water cannot be measured exactly, and it is known from previous work that the variance of measurements of cadmium concentration in water is about 0.006². Three water samples are taken from a small dam, and cadmium levels are measured. The average of these three values is then used to estimate the cadmium concentration in the dam.

Water in the tested dam is high in cadmium, at 0.06 mg/L. What is the chance that the unsatisfactory cadmium level is detected, i.e., the chance that the average of the three samples is higher than 0.05 mg/L?

41 / 83
Example

Solution:
Let X be the cadmium level for a randomly chosen sample, and we will assume that it is normally distributed.
We are given that µ = 0.06, σ² = 0.006², and n = 3.
That is,

\[ \bar{X}_3 \sim N\left( 0.06, \left( \frac{0.006}{\sqrt{3}} \right)^2 \right). \]

So,

\[ P(\bar{X}_3 > 0.05) = P\left( \frac{\bar{X}_3 - 0.06}{0.006/\sqrt{3}} > \frac{0.05 - 0.06}{0.006/\sqrt{3}} \right) = P(Z > -2.88675) \approx 0.998, \]

where Z ∼ N(0, 1).

In RStudio, type at the prompt: 1 - pnorm(-2.88675)

42 / 83
6.6.2 How Well Does the Central Limit
Theorem Work?

43 / 83
The normal approximation to the sample mean is an asymptotic
approximation; that is, it is an approximation obtained by considering
ever-increasing values of n.

This approximation can be expected to work well when n is large.

However, how large does n need to be for the normal approximation to be reasonable?

It turns out that the answer depends on the distribution of X.

44 / 83
Recall that in our proof of the Central Limit Theorem, we said that

\[ \ln m_n(u) = \frac{u^2}{2} + (\text{terms which} \to 0 \text{ as } n \to \infty). \]

When we say that the distribution of X̄ is approximately normal, we are ignoring all the terms in ln mn(u) that go to zero as n increases.

How well this approximation works in a particular setting depends on how small these additional terms are.

45 / 83
The first (and usually the largest) of the terms we ignored is

\[ n \cdot \frac{u^3}{6\sigma^3 n^{3/2}}\left( E(X^3) - 3\mu\sigma^2 - \mu^3 \right) = \kappa_1 \frac{u^3}{6 n^{1/2}}, \]

where κ1 is defined as the "skewness" of the distribution (a measure of the asymmetry in the density function). See Information on skewness (Supplementary Material).

So this first term gets smaller as n gets larger, and it is small when the skewness of the distribution is small.

The second term we ignored in the Central Limit Theorem proof, which has a coefficient of n⁻¹, is a function of both the skewness and the "kurtosis" of the distribution (a measure of how long-tailed the density of X is).

46 / 83
From a further study of the distribution of the sample mean for different
choices of X, we can work out the following rough rules of thumb:

For most distributions encountered in practice, n > 30 is a large enough value of n such that the normal approximation to the sample mean is reasonable.

47 / 83
It should be noted, however, that more "pathological" distributions exist, for which n > 30 is not sufficient to ensure approximate normality of X̄.

For any n, in theory, one can always produce an X such that X̄ is not close to normal, for example, X ∼ Poisson(1/n).

However, such distributions are rarely encountered in practice.

48 / 83
If it is reasonable to assume that the distribution under consideration has little skewness and is not "long-tailed" (i.e., it does not have high kurtosis), then the Central Limit Theorem will work well for even smaller n.

Say, n > 10 is often sufficient in such cases.
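The effect of skewness can be seen by simulation. The R sketch below uses an Exponential(1) population (right-skewed, with µ = 1); the population and the sample sizes are illustrative choices, not part of the rules of thumb themselves.

# For a skewed population, the distribution of Xbar is itself skewed for small n,
# so P(Xbar <= mu) differs from the 0.5 predicted by the normal approximation.
set.seed(1)
for (n in c(5, 10, 30, 100)) {
  xbar <- replicate(20000, mean(rexp(n, rate = 1)))
  cat("n =", n, " estimated P(Xbar <= 1) =", round(mean(xbar <= 1), 3), "\n")
}
# The estimates move towards 0.5 as n increases, in line with the rules of thumb.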

49 / 83
6.6.3 Normal Approximation to the Binomial
Distribution

50 / 83
The Central Limit Theorem also allows us to approximate some common distributions by the normal. An example is the binomial distribution.

Central Limit Theorem for the Binomial Distribution

Suppose X ∼ Binomial(n, p). Then

\[ \frac{X - np}{\sqrt{np(1 - p)}} \xrightarrow{D} N(0, 1), \]

where E(X) = np and Var(X) = np(1 − p).

51 / 83
Proof.
Let X1, . . . , Xn be a set of independent Bernoulli random variables with parameter p. Then

\[ X = \sum_i X_i \implies X/n = \bar{X}_n, \]

where \( \bar{X}_n = \frac{1}{n}\sum_i X_i \) is the sample mean of the Xi's. By the Central Limit Theorem,

\[ \lim_{n \to \infty} P\left( \frac{X/n - \mu}{\sigma/\sqrt{n}} \le x \right) = P(Z \le x), \]

where Z ∼ N(0, 1), µ = E(Xi) = p, and σ² = Var(Xi) = p(1 − p).

The required result follows immediately.

52 / 83
The practical ramifications are that probabilities involving binomial random
variables with large n can be approximated by normal probabilities.

However, a slight adjustment, known as a continuity correction, is often used to improve the approximation:

53 / 83
Normal Approximation to the Binomial Distribution with Continuity Correction
Suppose X ∼ Bin(n, p). Then

\[ P(X \le x) \simeq P\left( Z \le \frac{x - np + \frac{1}{2}}{\sqrt{np(1 - p)}} \right), \]

where Z ∼ N(0, 1).

The continuity correction is based on the fact that a discrete random variable is being approximated by a continuous random variable.

The continuity correction consists of subtracting 0.5 from any lower bound and adding 0.5 to any upper bound.

54 / 83
Example
Adam tosses 25 pieces of toast off a roof and ten land butter side up.
Is this evidence that toast lands butter side down more often than
butter side up? That is, is P (X ≤ 10) unusually small?

55 / 83
Example
Adam tosses 25 pieces of toast off a roof and ten land butter side up.
Is this evidence that toast lands butter side down more often than
butter side up? That is, is P (X ≤ 10) unusually small?
Solution
Firstly, let X be the number of pieces of toast that land butter side up. Here X ∼ Binomial(25, 0.5).

We assume that each piece of toast is equally likely to land butter side up or butter side down.

We could answer this question by calculating the exact probability, which would be time-consuming.

55 / 83
Example

Solution - continued
Instead, we use the fact that, by the Central Limit Theorem,

\[ Z = \frac{X - np}{\sqrt{np(1 - p)}} \xrightarrow{D} N(0, 1), \]

where np = 25 × ½ = 12.5, np(1 − p) = 25 × ½ × ½ = 6.25, and \( \sqrt{np(1 - p)} = \sqrt{6.25} = 2.5 \). Therefore,

\[ P(X \le 10) = P\left( \frac{X - np}{\sqrt{np(1 - p)}} \le \frac{10 - np}{\sqrt{np(1 - p)}} \right) = P\left( Z \le \frac{10 - 12.5}{2.5} \right) = P(Z \le -1) = \Phi(-1) \approx 0.1586553. \]

Using RStudio, pnorm(-1).


56 / 83
Example

Solution - continued
With the continuity correction,

\[ P(X \le 10) = P\left( Z \le \frac{10 - 12.5 + 0.5}{2.5} \right) = P(Z \le -0.80) = \Phi(-0.80) = 0.2118554, \]

using RStudio, pnorm(-0.80).

Compare this with the exact answer obtained from the binomial distribution:

\[ P(X \le 10) = 0.2121781, \]

using RStudio, pbinom(10,25,0.5)
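The three answers can be compared side by side in R; this is a sketch using the values from this example.

# Exact binomial probability versus the normal approximation, with and without
# the continuity correction, for X ~ Bin(25, 0.5).
n <- 25; p <- 0.5
mu <- n * p; sdev <- sqrt(n * p * (1 - p))
exact      <- pbinom(10, size = n, prob = p)       # 0.2121781
no_correct <- pnorm((10 - mu) / sdev)              # 0.1586553
corrected  <- pnorm((10 - mu + 0.5) / sdev)        # 0.2118554
c(exact = exact, no_correction = no_correct, continuity = corrected)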


57 / 83
6.6.4 What Should be the Size of n to Use the
Normal Approximation to the Binomial
Distribution?

58 / 83
How large does n need to be for the normal approximation to the binomial
distribution to be reasonable?

Recall that how well the central limit theorem works depends on the
skewness of the distribution of X.

In the case of the Bernoulli and binomial distributions, the skewness is a function of p (and the skewness is zero for p = 0.5).

This means that how well the normal approximation to the binomial works is
a function of p, and it is a better approximation as p approaches 0.5.

59 / 83
A useful rule of thumb is that the normal approximation to the
binomial will work well when n is large enough that both np > 5 and
n(1 − p) > 5.

This rule of thumb means that we do not need a very large value of n for this
large sample approximation to work well.

For example, if p = 0.5, we only need n = 10 for the normal approximation to work well.

On the other hand, if p = 0.005, we would need a sample size of at least n = 1000.
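The rule of thumb can be checked numerically. The helper function below is a sketch (its name and the particular (n, p, x) combinations are illustrative choices); it compares the exact binomial probability with the continuity-corrected normal approximation.

# Compare P(X <= x) for X ~ Bin(n, p): exact value versus the normal
# approximation with continuity correction.
approx_vs_exact <- function(n, p, x) {
  exact  <- pbinom(x, n, p)
  normal <- pnorm((x - n * p + 0.5) / sqrt(n * p * (1 - p)))
  c(np = n * p, exact = exact, normal = normal)
}
approx_vs_exact(n = 20, p = 0.5,  x = 8)    # np = 10 > 5: approximation is close
approx_vs_exact(n = 20, p = 0.05, x = 2)    # np = 1 < 5: approximation is poorer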

60 / 83
6.6.5 Normal Approximation to the Poisson
Distribution

61 / 83
Normal Approximation to the Poisson Distribution
Suppose X ∼ Poisson(λ). Then

\[ \lim_{\lambda \to \infty} P\left( \frac{X - \lambda}{\sqrt{\lambda}} \le x \right) = P(Z \le x), \]

where Z ∼ N(0, 1).

Note that E(X) = λ = Var(X).

This approximation works increasingly well as λ gets large, and it provides a reasonable approximation to most Poisson probabilities for λ > 5.

62 / 83
Example
Suppose X ∼ Poisson(100). Then

\[ P(X = x) = e^{-100}\frac{100^x}{x!}, \qquad x = 0, 1, 2, \ldots. \]

To calculate P(80 ≤ X ≤ 120), we would need to evaluate \( \sum_{x=80}^{120} e^{-100}\frac{100^x}{x!} \). This isn't easy!

Use a normal approximation to calculate

P(80 ≤ X ≤ 120).

63 / 83
Example

Solution:
We have X ∼ Poisson(100). So, by the Central Limit Theorem,

\[
P(80 \le X \le 120) = P\left( \frac{80 - \lambda}{\sqrt{\lambda}} \le \frac{X - \lambda}{\sqrt{\lambda}} \le \frac{120 - \lambda}{\sqrt{\lambda}} \right)
= P\left( \frac{80 - 100}{10} \le \frac{X - 100}{10} \le \frac{120 - 100}{10} \right)
\approx P(-2 \le Z \le 2) = \Phi(2) - \Phi(-2) = 0.9544997,
\]

where Z ∼ N(0, 1), using RStudio, pnorm(2) - pnorm(-2).

64 / 83
Example

Solution - continued:
So, by the Central Limit Theorem and the continuity correction,

\[
P(80 \le X \le 120) = P\left( \frac{80 - \lambda - 0.5}{\sqrt{\lambda}} \le Z \le \frac{120 - \lambda + 0.5}{\sqrt{\lambda}} \right)
\approx P(-2.05 \le Z \le 2.05) = \Phi(2.05) - \Phi(-2.05) = 0.9596356,
\]

where Z ∼ N(0, 1), using RStudio, pnorm(2.05) - pnorm(-2.05).

65 / 83
Example

Solution - continued:
The exact solution is

\[ P(80 \le X \le 120) = \sum_{x=80}^{120} e^{-100}\frac{100^x}{x!} = P(X \le 120) - P(X \le 79), \]

which in RStudio is ppois(120,100) - ppois(79,100). (Note that the lower term must be ppois(79,100), not ppois(80,100), so that x = 80 is included; ppois(120,100) - ppois(80,100) gives P(81 ≤ X ≤ 120) = 0.9546815.)
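In R, the three answers can be placed side by side; this is a sketch using the numbers from this example.

# Exact Poisson probability P(80 <= X <= 120) for X ~ Poisson(100), versus the
# normal approximation with and without the continuity correction.
exact      <- ppois(120, 100) - ppois(79, 100)     # lower term 79 so that x = 80 is included
no_correct <- pnorm(2) - pnorm(-2)                 # 0.9544997
corrected  <- pnorm(2.05) - pnorm(-2.05)           # 0.9596356
c(exact = exact, no_correction = no_correct, continuity = corrected)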

66 / 83
6.7 Delta Method

67 / 83
The Central Limit Theorem provides a large sample approximation to the distribution of X̄n.

However, what about other functions of the sequence X1, X2, . . . , such as

1. functions of X̄n such as \( (\bar{X}_n)^3 \) and \( \sin^{-1}(\sqrt{\bar{X}_n}) \), and
2. functions defined through a non-linear equation, such as the solution in α to
\[ \bar{X}_n - \frac{\alpha - 1}{n}\sum_i \ln(X_i) + \Gamma(\alpha) = 0. \]

This second example is particularly important in statistics, as we will see when we study inference in later chapters.

68 / 83
It turns out that, when suitably centred and scaled, these random variable sequences also converge in distribution to a normal random variable.

The general technique for establishing such results has become known as the
delta method.

The reason for this name is a bit mysterious, although it seems to be related
to the notation (e.g. δ) used in Taylor series expressions.

69 / 83
We are interested in the distribution of g(X̄n) for some function g.

The Delta method allows us to compute the asymptotic distribution of g(X̄n).

70 / 83
Delta Method
Let Y1, Y2, . . . be a sequence of random variables such that

\[ \sqrt{n}(Y_n - \theta) \xrightarrow{D} N(0, \sigma^2). \]

Suppose the function g is differentiable at θ and g′(θ) ≠ 0. Then

\[ \sqrt{n}\{ g(Y_n) - g(\theta) \} \xrightarrow{D} N(0, \sigma^2 g'(\theta)^2). \]

71 / 83
Proof.
Sketch of proof: a Taylor series expansion gives

\[ g(Y_n) = g(\theta + Y_n - \theta) = g(\theta) + g'(\theta)(Y_n - \theta) + \ldots. \]

Ignoring the remaining higher-order terms and re-arranging, one obtains the approximation

\[ \sqrt{n}\{ g(Y_n) - g(\theta) \} \simeq g'(\theta)\sqrt{n}(Y_n - \theta). \]

The right-hand side is \( \sqrt{n}(Y_n - \theta) \) multiplied by the constant g′(θ). We are given that

\[ \sqrt{n}(Y_n - \theta) \xrightarrow{D} N(0, \sigma^2), \]

and so it follows via the above approximation that

\[ g'(\theta)\sqrt{n}(Y_n - \theta) \xrightarrow{D} N(0, g'(\theta)^2 \sigma^2). \]

72 / 83
Example
Let X1, X2, . . . be a sequence of independent, identically distributed random variables with mean two and variance seven.

Obtain a large sample approximation for the distribution of (X̄n )3 .

73 / 83
Example

Solution:
To obtain the asymptotic distribution of (X̄n)³, we first need to find the asymptotic distribution of X̄n and then apply the Delta method.
By the Central Limit Theorem, we know that

\[ \sqrt{n}(\bar{X}_n - 2) \xrightarrow{D} N(0, (\sqrt{7})^2). \]

Let g(X̄n) = (X̄n)³. Applying the Delta method with g(x) = x³ gives g′(x) = 3x², so g′(2) = 12 ≠ 0. That is, the Delta method gives

\[ \sqrt{n}\left\{ g(\bar{X}_n) - g(2) \right\} = \sqrt{n}\left\{ (\bar{X}_n)^3 - 2^3 \right\} \xrightarrow{D} N(0, 7 \times (g'(2))^2) = N(0, 7 \times 144) = N(0, 1008). \]

For large n, the approximate distribution of (X̄n)³ is

\[ N\left( 8, \frac{1008}{n} \right). \]
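A simulation sketch of this example in R; taking the population to be N(2, 7) is purely an illustrative assumption (any distribution with mean 2 and variance 7 would do), as is the choice n = 200.

# Check the delta-method approximation (Xbar_n)^3 approx ~ N(8, 1008/n) by simulation.
set.seed(1)
n    <- 200
cube <- replicate(10000, mean(rnorm(n, mean = 2, sd = sqrt(7)))^3)
c(mean(cube), var(cube))    # compare with the approximation: mean 8, variance 1008/n = 5.04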
74 / 83
Multivariate Extension of the Delta Method
There is a multivariate extension of the Delta method that will be used later
in these notes.

A proof can be found in Chapter 3 of:

Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics, New York: John Wiley & Sons.

75 / 83
6.8 Supplementary Material

76 / 83
Supplementary Material - Chebychev's Inequality

Chebychev's Inequality
For any random variable Y and any k > 0,

\[ P\left( |Y - E(Y)| > k\sqrt{Var(Y)} \right) \le \frac{1}{k^2}. \]
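A quick numerical illustration in R; the Exponential(1) population (mean 1, variance 1) and the value k = 2 are arbitrary choices.

# Compare the empirical tail probability P(|Y - E(Y)| > k * sd(Y)) with the
# Chebychev bound 1/k^2 for Y ~ Exponential(1) and k = 2.
set.seed(1)
y <- rexp(100000, rate = 1)
k <- 2
mean(abs(y - 1) > k * 1)    # well below the bound 1/k^2 = 0.25 (exact value is exp(-3))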

77 / 83
Supplementary Material - E(X̄n) and Var(X̄n)

\[
\begin{aligned}
E(\bar{X}_n) &= E\left( \frac{1}{n}\sum_{i=1}^{n} X_i \right) \\
&= \frac{1}{n}\sum_{i=1}^{n} E(X_i) \quad \text{(since expectation is a linear operator)} \\
&= \frac{1}{n} \times n \times \mu = \mu.
\end{aligned}
\]

\[
\begin{aligned}
Var(\bar{X}_n) &= Var\left( \frac{1}{n}\sum_{i=1}^{n} X_i \right) \\
&= \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) \quad \text{(since the } X_i \text{ are independent)} \\
&= \frac{1}{n^2} \times n \times \sigma^2 = \frac{\sigma^2}{n}.
\end{aligned}
\]

return to notes

78 / 83
Supplementary Material - Distribution of the Maximum of Uniform(a, b)

Let Y = max{X1, X2, . . . , Xn}, where X1, X2, . . . , Xn are independent Uniform(a, b) random variables. We know that Y ≤ x if and only if every sample element is less than or equal to x. That is,

\[ P(Y \le x) = P(X_1 \le x, X_2 \le x, \ldots, X_n \le x) = \prod_{i=1}^{n} P(X_i \le x) \;\text{ (by independence)} \; = \left( F_{X_i}(x) \right)^n. \]

The cumulative distribution function of Xi ∼ Uniform(a, b) is

\[ F_{X_i}(x) = \begin{cases} 0 & \text{if } x < a \\ \dfrac{x - a}{b - a} & \text{if } a < x < b \\ 1 & \text{if } x \ge b. \end{cases} \]

Hence, the cumulative distribution function of Y is

\[ F_Y(y) = \begin{cases} 0 & \text{if } y < a \\ \left( \dfrac{y - a}{b - a} \right)^n & \text{if } a < y < b \\ 1 & \text{if } y \ge b, \end{cases} \]

and the probability density function of Y is

\[ f_Y(y) = \begin{cases} \dfrac{n (y - a)^{n-1}}{(b - a)^n} & \text{if } a < y < b \\ 0 & \text{otherwise.} \end{cases} \]

Continued on the next slide.


79 / 83
Supplementary Material - Distribution of the Maximum of Uniform(a, b)

Special cases:

1. If Xi ∼ Uniform(0, 1), then the probability density function of Y = max{X1, X2, . . . , Xn} is

\[ f_Y(y) = \begin{cases} n y^{n-1} & \text{if } 0 < y < 1 \\ 0 & \text{otherwise.} \end{cases} \]

This is a Beta distribution with α = n and β = 1, since its normalising constant is

\[ \frac{\Gamma(n + 1)}{\Gamma(n)\,\Gamma(1)} = \frac{n!}{(n - 1)!} = n. \]

2. If Xi ∼ Uniform(0, θ), then the probability density function of Y = max{X1, X2, . . . , Xn} is

\[ f_Y(y) = \begin{cases} \dfrac{n y^{n-1}}{\theta^n} & \text{if } 0 < y < \theta \\ 0 & \text{otherwise.} \end{cases} \]

return to notes

80 / 83
Supplementary Material
Let Z1, Z2, . . . , Zn be independent and identically distributed Bernoulli(1/2) random variables.

We know that

1. E(Zi) = 1/2 = µ,
2. Var(Zi) = 1/2 × 1/2 = 1/4 = σ².

Then, by the Weak Law of Large Numbers (WLLN),

\[ \bar{Z}_n = \frac{1}{n}\sum_{i=1}^{n} Z_i \xrightarrow{P} \frac{1}{2}. \]

Let

\[ n Y_n = \sum_{i=1}^{n} Z_i = n\bar{Z}_n \sim \text{Binomial}\left( n, \tfrac{1}{2} \right), \]

since it is the sum of n independent and identically distributed Bernoulli(1/2) random variables. Therefore

\[ Y_n = \bar{Z}_n \xrightarrow{P} \frac{1}{2}. \]

return to notes

81 / 83
Supplementary Material - Skewness
Suppose the random variable X has mean µ and variance σ².

The skewness of X is defined by

\[ \text{Skewness}(X) = E\left[ (X - \mu)^3 \right] / \sigma^3. \]

Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive (right-skewed), zero (symmetrical), negative (left-skewed), or undefined.

return to notes

82 / 83
Supplementary Material - Kurtosis

Suppose the random variable X has mean µ and variance σ².

The kurtosis of X, kur(X), is defined by

\[ \text{kur}(X) = E\left[ (X - \mu)^4 \right] / \sigma^4. \]

Kurtosis is a measure of how outlier-prone a distribution is. The kurtosis of the normal distribution is 3.

Distributions that are more outlier-prone than the normal distribution have kurtosis greater than 3; distributions that are less outlier-prone have kurtosis less than 3.
return to notes
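Sample versions of these two quantities are easy to compute in R. The sketch below is illustrative only; the function names are not part of the notes, and the estimators shown are the simple moment-based ones.

# Simple moment-based sample estimates of skewness and kurtosis.
sample_skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
sample_kurtosis <- function(x) mean((x - mean(x))^4) / sd(x)^4
set.seed(1)
x <- rexp(100000, rate = 1)    # Exponential(1) has skewness 2 and kurtosis 9
c(skewness = sample_skewness(x), kurtosis = sample_kurtosis(x))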

83 / 83
