Math5846_chapter6
UNSW Sydney
OPEN LEARNING
Chapter 6
Convergence of Random Variables
Outline:
6.1 Introduction
6.2 Convergence in Probability
6.3 Weak Law of Large Numbers
6.4 Convergence in Distribution
6.5 Central Limit Theorem
6.6 Applications of the Central Limit Theorem
6.7 Delta Method
6.8 Supplementary Material
6.1 Introduction
The previous chapter dealt with the problem of finding the density function (or probability function) of a transformation of one or two random variables.
An example is the problem of finding the exact density function of the sum of
100 independent Uniform(0, 1) random variables.
This is a very tedious and complicated problem.
In this chapter, we will focus on some key convergence results useful in
statistics.
Some of these (such as the law of large numbers and the central limit
theorem) relate to sums or averages of random variables.
These results are particularly useful because sums and averages are typically
used as summary statistics in quantitative research.
6.2 Convergence in Probability
Definition
The sequence of random variables X1, X2, . . . converges in probability to a constant c if, for any ε > 0,
lim_{n→∞} P(|Xn − c| > ε) = 0.
A common shorthand is Xn →P c.
Memorise this!
Example
Let X1, X2, . . . be independent Uniform(0, θ) variables and let Yn = max{X1, . . . , Xn}. The density of Yn is
fYn(y) = n y^{n−1} / θ^n,  0 < y < θ.
Show that Yn →P θ.
Example
Solution:
Note that Yn →P θ means lim_{n→∞} P(|Yn − θ| > ε) = 0.
For ε > θ, P(|Yn − θ| > ε) = 0 for all n ≥ 1, since Yn always lies in (0, θ) and so the event |Yn − θ| > ε cannot occur, or it can occur only with probability zero.
For 0 < ε ≤ θ, integrating the density gives P(Yn ≤ y) = (y/θ)^n for 0 < y < θ, so
P(|Yn − θ| > ε) = P(Yn < θ − ε) = ((θ − ε)/θ)^n → 0 as n → ∞,
since 0 ≤ (θ − ε)/θ < 1.
In either case, lim_{n→∞} P(|Yn − θ| > ε) = 0.
∴ Yn →P θ.
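This convergence can also be seen numerically. Below is a minimal simulation sketch (not part of the original notes; the values of θ, ε and the sample sizes are illustrative choices) estimating P(|Yn − θ| > ε) by Monte Carlo and comparing it with the exact value ((θ − ε)/θ)^n.

```python
# Illustrative sketch: Monte Carlo estimate of P(|Y_n - theta| > eps) for
# Y_n = max(X_1, ..., X_n) with X_i ~ Uniform(0, theta). Parameter values
# are assumptions chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)
theta, eps, reps = 2.0, 0.1, 20_000

for n in [5, 20, 100, 500]:
    y_n = rng.uniform(0, theta, size=(reps, n)).max(axis=1)
    est = np.mean(np.abs(y_n - theta) > eps)
    exact = ((theta - eps) / theta) ** n   # P(Y_n < theta - eps)
    print(f"n={n:4d}  simulated={est:.4f}  exact={exact:.4f}")
```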
Example
For n = 1, 2, . . . , let Yn ∼ N(μ, σn²) and suppose lim_{n→∞} σn = 0.
Show that Yn →P μ.
Example
Solution:
We need to show that lim_{n→∞} P(|Yn − μ| > ε) = 0. For any ε > 0,
P(|Yn − μ| > ε) = P( |Yn − μ|/σn > ε/σn )
 = P( (Yn − μ)/σn < −ε/σn ) + P( (Yn − μ)/σn > ε/σn )
 = ∫_{−∞}^{−ε/σn} (1/√(2π)) e^{−y²/2} dy + ∫_{ε/σn}^{∞} (1/√(2π)) e^{−y²/2} dy
 → 0 as n → ∞,
since ε/σn → ∞ and (Yn − μ)/σn ∼ N(0, 1). Thus Yn →P μ.
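A quick numerical illustration (a sketch added here, not in the original notes): by the symmetry of the normal density, P(|Yn − μ| > ε) = 2Φ(−ε/σn), which can be evaluated directly for a shrinking sequence σn.

```python
# Illustrative sketch: P(|Y_n - mu| > eps) = 2 * Phi(-eps / sigma_n) for
# Y_n ~ N(mu, sigma_n^2); the probability vanishes as sigma_n -> 0.
from scipy.stats import norm

eps = 0.1
for sigma_n in [1.0, 0.5, 0.1, 0.05, 0.01]:
    prob = 2 * norm.cdf(-eps / sigma_n)
    print(f"sigma_n={sigma_n:5.2f}  P(|Y_n - mu| > {eps}) = {prob:.6g}")
```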
6.3 Weak Law of Large Numbers
Weak Law of Large Numbers
Suppose X1, X2, . . . are independent, each with mean μ and variance 0 < σ² < ∞. If
X̄n = (1/n) Σ_{i=1}^n Xi,
then
X̄n →P μ.
That is,
for all ε > 0, lim_{n→∞} P(|X̄n − μ| > ε) = 0.
Proof.
By Chebychev's Inequality with ε = k √(Var(X̄n)), i.e. k = ε/√(Var(X̄n)), we have
P( |X̄n − E(X̄n)| > k √(Var(X̄n)) ) ≤ 1/k²
P( |X̄n − E(X̄n)| > ε ) ≤ Var(X̄n)/ε²
P( |X̄n − μ| > ε ) ≤ σ²/(n ε²)
(since E(X̄n) = μ and Var(X̄n) = σ²/n)
→ 0 as n → ∞.
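The Weak Law can be illustrated by simulation. The sketch below is not part of the original notes; Exponential(1) variables, with μ = 1, are an assumed illustrative choice.

```python
# Illustrative sketch of the Weak Law of Large Numbers: the estimated
# probability that the sample mean of i.i.d. Exponential(1) variables
# (mu = 1) deviates from mu by more than eps shrinks as n grows.
import numpy as np

rng = np.random.default_rng(1)
mu, eps, reps = 1.0, 0.05, 2000

for n in [10, 100, 1000, 10_000]:
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    est = np.mean(np.abs(xbar - mu) > eps)
    print(f"n={n:6d}  estimated P(|Xbar_n - mu| > {eps}) = {est:.3f}")
```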
6.4 Convergence in Distribution
Convergence in probability captures the concept of a sequence of random variables approaching a fixed constant. Convergence in distribution, defined next, instead concerns the limiting behaviour of the distribution functions.
Definition
Let X1, X2, . . . be a sequence of random variables. We say that Xn converges in distribution to X if
lim_{n→∞} FXn(x) = FX(x)
at every point x where FX is continuous.
A common shorthand is
Xn →D X.
The proof of the following theorem is omitted in these notes but may be found in advanced texts.
Slutzky’s Theorem
Let X1, X2, . . . be a sequence of random variables that converges in distribution to X, i.e.,
Xn →D X,
and let Y1, Y2, . . . be a sequence of random variables that converges in probability to a constant c. Then
1. Xn + Yn →D X + c,
2. Xn Yn →D c X.
Example
Suppose that X1, X2, . . . converges in distribution to X ∼ N(0, 1), i.e., Xn →D N(0, 1), and suppose that n Yn ∼ Bin(n, 1/2). What are the limiting distributions of Xn + Yn and Xn Yn?
Solution: As shown in the supplementary material, Yn = Z̄n →P 1/2, where the Zi are independent Bernoulli(1/2) variables. So by Slutzky's Theorem,
Xn + Yn →D X + 1/2 ∼ N(1/2, 1)  and  Xn Yn →D X/2 ∼ N(0, 1/4).
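A simulation sketch of this example (added here, not in the original notes; the finite n and replication count are illustrative choices) takes Xn exactly N(0, 1) and n Yn ∼ Bin(n, 1/2), and checks the moments of Xn + Yn and Xn Yn against their limits.

```python
# Illustrative sketch of Slutzky's Theorem: with X_n ~ N(0, 1) and
# Y_n = Bin(n, 1/2) / n, the sum X_n + Y_n should be close in distribution
# to N(1/2, 1) for large n, and X_n * Y_n close to N(0, 1/4).
import numpy as np

rng = np.random.default_rng(2)
n, reps = 1000, 200_000
x = rng.standard_normal(reps)            # X_n, already N(0, 1)
y = rng.binomial(n, 0.5, size=reps) / n  # Y_n -> 1/2 in probability

print(f"X_n + Y_n: mean={np.mean(x + y):.3f} (limit 0.5), "
      f"var={np.var(x + y):.3f} (limit 1.0)")
print(f"X_n * Y_n: mean={np.mean(x * y):.3f} (limit 0.0), "
      f"var={np.var(x * y):.3f} (limit 0.25)")
```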
6.5 Central Limit Theorem
For a general random sample X1 , X2 , . . . , Xn it is often of interest to make
probability statements about the sample mean X̄.
Central Limit Theorem
Suppose X1, X2, . . . are independent and identically distributed random variables with common mean μ = E(Xi) and common variance σ² = Var(Xi) < ∞.
For each n ≥ 1, let X̄n = (1/n) Σ_{i=1}^n Xi. Then
(X̄n − μ)/(σ/√n) →D Z,
where Z ∼ N(0, 1). It is common to write
(X̄n − μ)/(σ/√n) →D N(0, 1).
Memorise this!
Note that
E(X̄n) = μ and Var(X̄n) = σ²/n.
(See the supplementary material for a derivation.)
So the Central Limit Theorem states that the limiting distribution of any standardised average of independent, identically distributed random variables (with finite variance) is the standard Normal or N(0, 1) distribution.
This is an important aspect of the Central Limit Theorem and is particularly
useful in practice.
In fact, the Central Limit Theorem is the single most useful result in
statistics, and it forms the basis of most of the statistical inference tools
used by researchers today.
Proof.
The method of proof will be to show that the moment generating function of (X̄n − μ)/(σ/√n) converges, as n → ∞, to the moment generating function of a N(0, 1) random variable.
That is, if
mn(u) = E[ exp{ u (X̄n − μ)/(σ/√n) } ]
is the moment generating function of (X̄n − μ)/(σ/√n), then
lim_{n→∞} mn(u) = e^{u²/2}.
Proof. - continued
So, we have
mn(u) = E[ exp{ (u/(σ√n)) Σ_{i=1}^n Xi − uμ√n/σ } ]
 = exp(−uμ√n/σ) E[ exp{ (u/(σ√n)) Σ_{i=1}^n Xi } ].
Proof. - continued
The Xi's all have mean μ and variance σ², and so each has moment generating function
mXi(t) = E(e^{t Xi}) = 1 + t E(Xi) + (t²/2) E(Xi²) + . . . = 1 + μ t + (1/2)(σ² + μ²) t² + . . . .
Proof. - continued
∴ E[ exp{ (u/(σ√n)) Σ_{i=1}^n Xi } ]
 = E[ Π_{i=1}^n exp{ (u/(σ√n)) Xi } ]
 = Π_{i=1}^n E[ exp{ (u/(σ√n)) Xi } ]   (since the Xi's are independent)
 = Π_{i=1}^n mXi( u/(σ√n) )
 = [ 1 + μ u/(σ√n) + (1/2)(σ² + μ²) u²/(σ² n) + . . . ]^n.
Proof. - continued
Thus
ln mn(u) = −uμ√n/σ + n ln[ 1 + μu/(σ√n) + (1/2)(σ² + μ²) u²/(σ² n) + . . . ]
 = −uμ√n/σ + n{ μu/(σ√n) + (1/2)(σ² + μ²) u²/(σ² n) + . . .
   − (1/2)[ μu/(σ√n) + (1/2)(σ² + μ²) u²/(σ² n) + . . . ]² + . . . }
 (since ln(1 + x) = x − x²/2 + x³/3 − . . .)
 = n{ (1/2)(σ² + μ²) u²/(σ² n) − (1/2) μ² u²/(σ² n)
   + (terms involving 1/n to powers larger than 1) }
 = u²/2 + (terms which → 0 as n → ∞).
∴ lim_{n→∞} mn(u) = e^{u²/2}.
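The convergence of mn(u) can be checked numerically for a concrete distribution. The sketch below is not in the original notes; it uses Xi ∼ Exponential(1) as an assumed example, for which μ = σ = 1 and mXi(t) = 1/(1 − t), so that mn(u) = e^{−u√n}(1 − u/√n)^{−n}.

```python
# Illustrative sketch: exact m_n(u) for standardised means of Exponential(1)
# variables, converging to the N(0, 1) MGF exp(u^2 / 2).
import math

u = 1.0
limit = math.exp(u ** 2 / 2)
for n in [10, 100, 1000, 100_000]:
    m_n = math.exp(-u * math.sqrt(n)) * (1 - u / math.sqrt(n)) ** (-n)
    print(f"n={n:6d}  m_n({u}) = {m_n:.6f}   limit = {limit:.6f}")
```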
The Central Limit Theorem stated above provides the limiting distribution of
(X̄ − μ)/(σ/√n).
Since Σ_{i=1}^n Xi = n X̄, the Central Limit Theorem also applies to the sum of a sequence of random variables.
The next result provides alternative forms of the Central Limit Theorem.
Results
Suppose X1, X2, . . . are independent and identically distributed random variables with common mean μ = E(Xi) and common variance σ² = Var(Xi) < ∞.
Then the Central Limit Theorem may also be stated in the following alternative forms:
1. √n (X̄ − μ) →D N(0, σ²),
2. (Σ_i Xi − nμ)/(σ√n) →D N(0, 1),
3. (Σ_i Xi − nμ)/√n →D N(0, σ²).
6.6 Applications of the Central Limit Theorem
In this section, we provide some applications of the Central Limit Theorem.
6.6.1 Probability Calculations about a Sample Mean
Say we are interested in a random variable X.
Thanks to the Central Limit Theorem (CLT), we know that, for large enough n, the average of a sample from any random variable (with finite variance) is approximately normally distributed.
So, if we know μ and σ for this random variable, we can calculate approximately any probability we like about X̄.
Example
It is known that, in 1995, adult women in Australia had an average weight of about 67 kg, and the variance of this quantity is about 256.
Suppose we take a random sample of n = 10 adult women. What is the probability that their average weight X̄ exceeds 80 kg?
Example
Solution:
From the Central Limit Theorem,
(X̄ − μ)/(σ/√n) →D N(0, 1),
so we can say that, approximately,
X̄ ∼ N(μ, σ²/n) = N(67, 256/10) = N(67, 25.6).
So, using Chapter 3 methods to calculate normal probabilities,
P(X̄ > 80) = P( (X̄ − 67)/√25.6 > (80 − 67)/√25.6 )
 ≈ P(Z > 2.569351)
 ≈ 0.00509446.
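The same calculation in code (an illustrative sketch, not part of the original notes):

```python
# Illustrative sketch: P(Xbar > 80) under the CLT approximation
# Xbar ~ N(67, 256 / 10) = N(67, 25.6).
from math import sqrt
from scipy.stats import norm

z = (80 - 67) / sqrt(25.6)
print(f"z = {z:.6f}, P(Xbar > 80) = {norm.sf(z):.8f}")
# prints z = 2.569351 and a probability of about 0.00509446
```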
Example
Cadmium is a naturally occurring heavy metal found in drinking water at low levels. The Australian Drinking Water Guidelines recommend drinking water contain no more than 0.05 mg/L of cadmium due to health considerations.
Water in a tested dam is high in cadmium, with a mean level of 0.06 mg/L; individual measurements have standard deviation 0.006 mg/L, and three samples are taken. What is the chance that the unsatisfactory cadmium levels are detected, i.e., the chance that the average of the three samples is higher than 0.05 mg/L?
Example
Solution:
Let X be the cadmium level for a randomly chosen sample, and we will assume that it is normally distributed.
We are given that μ = 0.06, σ = 0.006, and n = 3.
That is,
X̄3 ∼ N( 0.06, (0.006/√3)² ).
So,
P(X̄3 > 0.05) = P( (X̄ − 0.06)/(0.006/√3) > (0.05 − 0.06)/(0.006/√3) )
 = P(Z > −2.88675)   (where Z ∼ N(0, 1))
 ≈ 0.998.
6.6.2 How Well Does the Central Limit Theorem Work?
The normal approximation to the sample mean is an asymptotic
approximation; that is, it is an approximation obtained by considering
ever-increasing values of n.
Recall that in our proof of the central limit theorem, we said that
ln mn(u) = u²/2 + (terms which → 0 as n → ∞).
When we say that the distribution of X̄ is normal, we are ignoring all the terms in ln mn(u) that go to zero as n increases.
The first (and usually the largest) of the terms we ignored is
( u³/(6 σ³ n^{3/2}) ) [ E(X³) − 3μσ² − μ³ ] = κ1 u³/(6 n^{1/2}),
where κ1 is defined as the "skewness" of the distribution (a measure of the asymmetry in the density function). See the supplementary material on skewness.
So this first term gets smaller as n gets larger, and it is small when the skewness of the distribution is small.
The second term we ignored in the central limit theorem proof, which has a coefficient of n^{−1}, is a function of both the skewness and the "kurtosis" of the distribution (a measure of how long-tailed the density of X is).
From a further study of the distribution of the sample mean for different choices of X, we can work out the following rough rule of thumb: for most distributions met in practice, n > 30 is large enough for X̄ to be approximately normal.
It should be noted however that more "pathological" distributions exist, for which n > 30 is not sufficient to ensure approximate normality of X̄. For any n, in theory, one can always produce an X such that X̄ is not close to normal, for example, X ∼ Poisson(1/n).
If it is reasonable to assume that the distribution under consideration has little skewness and is not "long-tailed" (i.e., it does not have high kurtosis), then the Central Limit Theorem will work well for even smaller n.
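These points can be illustrated by simulation. The sketch below is not in the original notes; the Exponential(1) distribution, which has skewness κ1 = 2, is an assumed illustrative choice showing the skewness of X̄n decaying at the κ1/√n rate of the ignored first term.

```python
# Illustrative sketch: the sample skewness of Xbar_n for skewed
# Exponential(1) data decays like kappa_1 / sqrt(n) with kappa_1 = 2,
# so larger n is needed before the normal approximation is accurate.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)
reps = 100_000
for n in [5, 30, 100]:
    xbar = rng.exponential(size=(reps, n)).mean(axis=1)
    print(f"n={n:4d}  skewness of Xbar_n = {skew(xbar):.3f}  "
          f"(theory 2/sqrt(n) = {2 / np.sqrt(n):.3f})")
```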
6.6.3 Normal Approximation to the Binomial Distribution
The Central Limit Theorem also allows us to approximate some common distributions by the normal. An example is the binomial distribution: if X ∼ Bin(n, p), then
(X − np)/√(np(1 − p)) →D N(0, 1).
Proof.
Let X1, . . . , Xn be a set of independent Bernoulli random variables with parameter p. Then
X = Σ_i Xi  ⟹  X/n = X̄n,
where X̄n = (1/n) Σ_i Xi is the sample mean of the Xi's. By the Central Limit Theorem,
lim_{n→∞} P( (X/n − μ)/(σ/√n) ≤ x ) = P(Z ≤ x),
where Z ∼ N(0, 1), μ = E(Xi) = p, and σ² = Var(Xi) = p(1 − p).
The practical ramifications are that probabilities involving binomial random
variables with large n can be approximated by normal probabilities.
Normal Approximation to Binomial Distribution with Continuity Correction
Suppose X ∼ Bin(n, p). Then
P(X ≤ x) ≈ P( Z ≤ (x − np + 1/2)/√(np(1 − p)) ),
where Z ∼ N(0, 1).
The continuity correction is based on the fact that a discrete random variable is being approximated by a continuous random variable.
The continuity correction consists of subtracting 0.5 from any lower bound and adding 0.5 to any upper bound.
Example
Adam tosses 25 pieces of toast off a roof and ten land butter side up.
Is this evidence that toast lands butter side down more often than
butter side up? That is, is P (X ≤ 10) unusually small?
Solution
Firstly, let X be the number of pieces of toast that land butter side up. We assume the toast is equally likely to land butter side up or butter side down, so
X ∼ Binomial(25, 0.5).
Example
Solution - continued
Computing P(X ≤ 10) exactly from the binomial distribution is tedious by hand. Instead, we use the fact that, by the Central Limit Theorem,
Z = (X − np)/√(np(1 − p)) →D N(0, 1),
where np = 25 × 1/2 = 12.5, np(1 − p) = 25 × 1/2 × 1/2 = 6.25, and √(np(1 − p)) = √6.25 = 2.5.
Therefore,
P(X ≤ 10) = P( (X − np)/√(np(1 − p)) ≤ (10 − np)/√(np(1 − p)) )
 = P( Z ≤ (10 − 12.5)/2.5 )
 = P(Z ≤ −1) = Φ(−1)
 ≈ 0.1586553.
Solution - continued
With the continuity correction,
P(X ≤ 10) ≈ P( Z ≤ (10 − 12.5 + 0.5)/2.5 )
 = P(Z ≤ −0.80) = Φ(−0.80)
 ≈ 0.2118554.
Compare this with the exact answer obtained from the binomial distribution:
P(X ≤ 10) = 0.2121781.
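These three numbers are easy to verify in code (an illustrative sketch, not part of the original notes):

```python
# Illustrative sketch: exact Binomial(25, 0.5) probability versus the normal
# approximation with and without the continuity correction.
from scipy.stats import binom, norm

n, p, x = 25, 0.5, 10
mean, sd = n * p, (n * p * (1 - p)) ** 0.5        # 12.5 and 2.5
print(f"exact:           {binom.cdf(x, n, p):.7f}")               # 0.2121781
print(f"no correction:   {norm.cdf((x - mean) / sd):.7f}")        # 0.1586553
print(f"with correction: {norm.cdf((x - mean + 0.5) / sd):.7f}")  # 0.2118554
```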
How large does n need to be for the normal approximation to the binomial
distribution to be reasonable?
Recall that how well the central limit theorem works depends on the
skewness of the distribution of X.
This means that how well the normal approximation to the binomial works is
a function of p, and it is a better approximation as p approaches 0.5.
A useful rule of thumb is that the normal approximation to the
binomial will work well when n is large enough that both np > 5 and
n(1 − p) > 5.
This rule of thumb means that we do not need a very large value of n for this
large sample approximation to work well.
6.6.5 Normal Approximation to the Poisson Distribution
Normal Approximation to the Poisson Distribution
Suppose X ∼ Poisson(λ). Then
lim_{λ→∞} P( (X − λ)/√λ ≤ x ) = P(Z ≤ x),
where Z ∼ N(0, 1).
Example
Suppose X ∼ Poisson(100). Then
P(X = x) = e^{−100} 100^x / x!,  x = 0, 1, 2, . . . .
To calculate P(80 ≤ X ≤ 120), we would need to evaluate Σ_{x=80}^{120} e^{−100} 100^x / x!.
This isn't easy!
Example
Solution:
We have X ∼ Poisson(100). So by the Central Limit Theorem,
P(80 ≤ X ≤ 120) = P( (80 − λ)/√λ ≤ (X − λ)/√λ ≤ (120 − λ)/√λ )
 = P( (80 − 100)/10 ≤ (X − 100)/10 ≤ (120 − 100)/10 )
 ≈ P(−2 ≤ Z ≤ 2), where Z ∼ N(0, 1)
 = Φ(2) − Φ(−2) = 0.9544997.
Example
Solution - continued:
So by the Central Limit Theorem and the continuity correction,
P(80 ≤ X ≤ 120) ≈ P( (80 − λ − 0.5)/√λ ≤ Z ≤ (120 − λ + 0.5)/√λ )
 = P(−2.05 ≤ Z ≤ 2.05), where Z ∼ N(0, 1)
 = Φ(2.05) − Φ(−2.05) = 0.9596356.
Example
Solution - continued:
The exact solution is
P(80 ≤ X ≤ 120) = Σ_{x=80}^{120} e^{−100} 100^x / x! = 0.9546815.
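Again, the approximation can be checked in code (an illustrative sketch, not part of the original notes):

```python
# Illustrative sketch: exact Poisson(100) probability of {80 <= X <= 120}
# versus the normal approximation with and without continuity correction.
from scipy.stats import norm, poisson

lam = 100
exact = poisson.cdf(120, lam) - poisson.cdf(79, lam)
plain = norm.cdf(2) - norm.cdf(-2)
corrected = norm.cdf(2.05) - norm.cdf(-2.05)
print(f"exact     = {exact:.7f}")      # 0.9546815
print(f"plain CLT = {plain:.7f}")      # 0.9544997
print(f"corrected = {corrected:.7f}")  # 0.9596356
```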
6.7 Delta Method
The Central Limit Theorem provides a large sample approximation to the
distribution of X̄n .
It turns out that sequences of transformed random variables, such as g(X̄n) for a smooth function g, also converge in distribution to a normal random variable.
The general technique for establishing such results has become known as the delta method.
The reason for this name is a bit mysterious, although it seems to be related to the notation (e.g. δ) used in Taylor series expressions.
We are interested in the distribution of g(X̄n) for some function g.
Delta Method
Let Y1, Y2, . . . be a sequence of random variables such that
√n (Yn − θ) →D N(0, σ²).
Suppose the function g is differentiable at θ and g′(θ) ≠ 0. Then
√n { g(Yn) − g(θ) } →D N(0, σ² g′(θ)²).
Proof.
Sketch of proof: a Taylor series expansion about θ gives
g(Yn) ≈ g(θ) + g′(θ)(Yn − θ),
so that
√n { g(Yn) − g(θ) } ≈ g′(θ) √n (Yn − θ) →D N(0, σ² g′(θ)²).
Example
Let X1, X2, . . . be a sequence of independent and identically distributed random variables with mean two and variance seven. Find the asymptotic distribution of (X̄n)³.
Example
Solution:
To obtain the asymptotic distribution of (X̄n)³, we first need to find the asymptotic distribution of X̄n and then apply the Delta method.
By the Central Limit Theorem, we know that
√n (X̄n − 2) →D N(0, (√7)²).
Let g(X̄n) = (X̄n)³. Applying the Delta method with g(x) = x³ leads to g′(x) = 3x², so g′(2) = 12 ≠ 0.
That is, the Delta method gives
√n { (X̄n)³ − 2³ } = √n { g(X̄n) − g(2) } →D N(0, 7 × (g′(2))²) = N(0, 7 × 9 × 16) = N(0, 1008).
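A simulation sketch of this result (not in the original notes; the notes do not specify the distribution of the Xi, so N(2, 7) is used as one concrete choice with mean two and variance seven):

```python
# Illustrative sketch of the Delta method: the variance of
# sqrt(n) * (Xbar_n^3 - 2^3) should be close to 1008 for large n.
import numpy as np

rng = np.random.default_rng(4)
n, reps = 1000, 20_000
x = rng.normal(loc=2.0, scale=np.sqrt(7.0), size=(reps, n))
t = np.sqrt(n) * (x.mean(axis=1) ** 3 - 8.0)
print(f"simulated variance = {t.var():.0f}  (Delta method limit: 1008)")
```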
6.8 Supplementary Material
Supplementary Material - Chebychev’s Inequality
Chebychev’s Inequality
For any random variable Y and any k > 0,
P( |Y − E(Y)| > k √(Var(Y)) ) ≤ 1/k².
Supplementary Material - E(X̄n) and Var(X̄n)
E(X̄n) = E( (1/n) Σ_{i=1}^n Xi )
 = (1/n) Σ_{i=1}^n E(Xi)   (since expectation is a linear operator)
 = (1/n) × n × μ
 = μ.
Var(X̄n) = Var( (1/n) Σ_{i=1}^n Xi )
 = (1/n²) Σ_{i=1}^n Var(Xi)   (since the Xi are independent)
 = (1/n²) × n × σ²
 = σ²/n.
return to notes
Supplementary Material - Distribution of the Maximum of Uniform(a, b)
Let Y = max{X1, X2, . . . , Xn}, where X1, X2, . . . , Xn are distributed as Uniform(a, b). We know that Y ≤ x if and only if every sample element is less than or equal to x. That is,
P(Y ≤ x) = P(X1 ≤ x, X2 ≤ x, . . . , Xn ≤ x)
 = Π_{i=1}^n P(Xi ≤ x)   (by independence)
 = [ FXi(x) ]^n.
Special cases:
For Uniform(0, 1),
fY(y) = n y^{n−1} if 0 < y < 1, and 0 otherwise,
which is the Beta(n, 1) density, with normalising constant
Γ(n + 1)/( Γ(n) Γ(1) ) = n!/(n − 1)! = n.
For Uniform(0, θ),
fY(y) = n y^{n−1}/θ^n if 0 < y < θ, and 0 otherwise.
return to notes
Supplementary Material
Let Z1, Z2, . . . , Zn be independent and identically distributed Bernoulli(1/2) random variables.
We know that
1. E(Zi) = 1/2 = μ,
2. Var(Zi) = 1/2 × 1/2 = 1/4 = σ².
Let
n Yn = Σ_{i=1}^n Zi = n Z̄n ∼ Binomial(n, 1/2),
since it is the sum of independent and identically distributed Bernoulli(1/2) variables. Therefore, by the Weak Law of Large Numbers,
Yn = Z̄n →P 1/2.
return to notes
Supplementary Material - Skewness
Suppose the random variable X has mean μ and variance σ². The skewness of X is defined as
κ1 = E[ (X − μ)³ ]/σ³ = [ E(X³) − 3μσ² − μ³ ]/σ³.
Skewness measures the asymmetry of the density function; a symmetric distribution has skewness zero.
return to notes
Supplementary Material - Kurtosis
kur(X) = E[ (X − μ)⁴ ]/σ⁴.
Kurtosis is a measure of how outlier-prone a distribution is. The kurtosis of the normal
distribution is 3.
Distributions that are more outlier-prone than the normal distribution have kurtosis
greater than 3; distributions that are less outlier-prone have kurtosis less than 3.
return to notes