
Chapter 3: Asymptotic Statistics

Jonathan Roth

Mathematical Econometrics I
Brown University
Fall 2023
Outline

1. Overview

2. LLN, CLT, and CMT

3. Putting Asymptotics into Practice

1
Motivation

We’ve seen how we can test hypotheses about population means using
information from the sample mean µ̂ when it is normally distributed
with a known variance

This situation arises when we know that Yi ∼ N(µ, σ²) with known σ

But this situation is rare... how do we “do inference” more generally?

Fortunately, the assumption of normally distributed sample means
turns out to be a good approximation when samples are large

What we mean by a “good approximation” is formalized by asymptotic
statistics, which considers the distribution of µ̂ in the limit as N → ∞

2
Overview of Important Results

The Law of Large Numbers (LLN) says that when N is large, µ̂ is
close to µ with very high probability

The Central Limit Theorem (CLT) says that when N is large, the
distribution of µ̂ is approximately normally distributed with mean µ
and variance σ²/N

The Continuous Mapping Theorem says that when N is large,
continuous functions of µ̂, say g(µ̂), are also close to g(µ)

3
Outline

1. Overview X

2. LLN, CLT, and CMT

3. Putting Asymptotics into Practice

4
Convergence in Probability

Intuitively, a random variable XN converges in probability to x if the
probability that XN is “close to” x is almost 1 when N is large

Formally, we say XN converges in probability to x, written XN →p x or
plim XN = x, if for all ε > 0,

P(|XN − x| > ε) → 0

If XN →p x for a constant x, we say XN is consistent for x

Typically x is a constant, although we will sometimes also say
XN →p X for X a random variable (using the same definition as above)

5
Convergence in Probability (Cont.)

Useful fact: if E[(XN − x)²] → 0, then XN →p x

Proof (you won’t be responsible for this):

By the law of iterated expectations,

E[(XN − x)²] = P(|XN − x| > ε) E[(XN − x)² | |XN − x| > ε]
             + P(|XN − x| ≤ ε) E[(XN − x)² | |XN − x| ≤ ε]
             ≥ P(|XN − x| > ε) ε² + 0

This implies that

P(|XN − x| > ε) ≤ E[(XN − x)²]/ε²   (Chebyshev’s Inequality)

Hence, E[(XN − x)²] → 0 implies P(|XN − x| > ε) → 0

6
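As a check on the proof above, here is a minimal simulation sketch (assuming Python with numpy; taking XN to be the mean of N uniform draws is an illustrative choice, not from the slides) comparing the empirical tail probability with the Chebyshev bound:

```python
import numpy as np

rng = np.random.default_rng(0)
N, eps, reps = 100, 0.05, 100_000

# X_N is the mean of N Uniform(0,1) draws; its plim is x = 0.5
X_N = rng.uniform(0, 1, size=(reps, N)).mean(axis=1)

lhs = np.mean(np.abs(X_N - 0.5) > eps)          # P(|X_N - x| > eps), estimated
rhs = np.mean((X_N - 0.5) ** 2) / eps ** 2      # E[(X_N - x)^2] / eps^2, estimated

print(f"P(|X_N - x| > eps) = {lhs:.4f} <= Chebyshev bound {rhs:.4f}")
```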
Law of Large Numbers

Law of Large Numbers. Suppose that Y1 , ..., YN are drawn iid from
a distribution with Var(Yi ) = σ² < ∞. Then

µ̂N = (1/N) ∑_{i=1}^N Yi →p µ = E[Yi ]

In words: as the sample gets large, the sample mean will be close to
the population mean with high probability.

Proof: We saw last chapter that E[µ̂N ] = µ and Var(µ̂N ) = σ²/N.
Thus,
Var(µ̂N ) = E[(µ̂N − µ)²] = σ²/N → 0
Hence, µ̂N →p µ by our “useful fact”.

7
Laws of Large Numbers Illustration

Distribution and mean of (1/N) ∑ᵢ Zi when Zi ∼ U(0, 1), N = 1

8
Laws of Large Numbers Illustration

Distribution and mean of (1/N) ∑ᵢ Zi when Zi ∼ U(0, 1), N = 10

9
Laws of Large Numbers Illustration

Distribution and mean of (1/N) ∑ᵢ Zi when Zi ∼ U(0, 1), N = 100

10
Laws of Large Numbers Illustration

Distribution and mean of (1/N) ∑ᵢ Zi when Zi ∼ U(0, 1), N = 1000

11
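The figures above come from a simulation; a minimal sketch of the same exercise (assuming Python with numpy) is:

```python
import numpy as np

rng = np.random.default_rng(0)
reps = 10_000  # number of simulated samples per N

for N in (1, 10, 100, 1000):
    # each row is one sample of size N; take its mean
    means = rng.uniform(0, 1, size=(reps, N)).mean(axis=1)
    # fraction of sample means within 0.05 of the population mean 0.5
    close = np.mean(np.abs(means - 0.5) < 0.05)
    print(f"N={N:5d}: P(|mu_hat - 0.5| < 0.05) ~ {close:.3f}")
```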
Convergence in Distribution

You might have noticed that the distribution of µ̂ in the simulations
looks close to a normal distribution as N gets large

The notion of convergence in distribution formalizes what it means
for one distribution to be close to another distribution

Definition: We say that XN converges in distribution to a continuously
distributed variable X, denoted XN →d X or XN ⇒ X, if the CDF of
XN converges (pointwise) to the CDF of X,

FXN (x) → FX (x) for all x

12
Central Limit Theorem

The Central Limit Theorem (CLT) formalizes the sense in which
sample means are approximately normally distributed in large samples

Theorem: Suppose that Y1 , ..., YN are drawn iid from a distribution
with mean µ = E[Yi ] and variance Var(Yi ) = σ² < ∞. Then the
sample mean µ̂ = (1/N) ∑_{i=1}^N Yi satisfies

√N(µ̂ − µ) →d N(0, σ²)

In words, the theorem says the following:

1. We can start with any distribution Yi , possibly non-normal
2. If we take the average of Y1 , ..., YN in a sufficiently large sample,
   the distribution of µ̂ = (1/N) ∑ᵢ Yi is (approximately) normal!

13
CLT Illustration

Distributions of µ̂ = (1/N) ∑ᵢ Xi vs. N(E[µ̂], Var(µ̂)): Xi ∼ U(0, 1), N = 1

14
CLT Illustration

Distributions of µ̂ = (1/N) ∑ᵢ Xi vs. N(E[µ̂], Var(µ̂)): Xi ∼ U(0, 1), N = 2

15
CLT Illustration

Distributions of µ̂ = (1/N) ∑ᵢ Xi vs. N(E[µ̂], Var(µ̂)): Xi ∼ U(0, 1), N = 5

16
CLT Illustration

Distributions of µ̂ = (1/N) ∑ᵢ Xi vs. N(E[µ̂], Var(µ̂)): Xi ∼ U(0, 1), N = 10

17
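A minimal sketch of the comparison in these figures (assuming Python with numpy and scipy; the evaluation points are illustrative): the empirical CDF of µ̂ gets close to the CDF of N(E[µ̂], Var(µ̂)) as N grows.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
reps = 100_000

for N in (1, 2, 5, 10):
    mu_hat = rng.uniform(0, 1, size=(reps, N)).mean(axis=1)
    # normal approximation N(E[mu_hat], Var(mu_hat)) = N(0.5, (1/12)/N)
    sd = np.sqrt((1 / 12) / N)
    # compare P(mu_hat <= q) with the normal CDF at a few points
    for q in (0.4, 0.5, 0.6):
        emp = np.mean(mu_hat <= q)
        print(f"N={N:2d}, q={q}: empirical {emp:.3f} vs normal {norm.cdf(q, 0.5, sd):.3f}")
```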
CLT Illustration II

https://www.youtube.com/watch?v=EvHiee7gs9Y
18
Multivariate Versions

The results we’ve discussed extend naturally to the multivariate case

For a vector XN ∈ Rᵏ, we say XN →p x if each component of XN
converges in probability to each component of x.

LLN: For µ̂N , the sample mean of iid vectors Y1 , ..., YN with mean µ
and finite variance, µ̂N →p µ

For a vector XN ∈ Rᵏ, we say XN →d X for X continuously distributed
if FXN (x) → FX (x) for all x ∈ Rᵏ.

CLT: For µ̂N , the sample mean of iid vectors Y1 , ..., YN with mean µ
and finite variance Σ, √N(µ̂N − µ) →d N(0, Σ)

20
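A minimal sketch of the multivariate CLT (assuming Python with numpy; the bivariate distribution built from uniforms is an illustrative choice): the scaled sample mean √N(µ̂N − µ) should have covariance close to Σ when N is large.

```python
import numpy as np

rng = np.random.default_rng(0)
N, reps = 2_000, 5_000

# Non-normal iid 2-vectors: Y_i = (U_i, U_i + V_i) with U_i, V_i ~ U(0,1)
mu = np.array([0.5, 1.0])
Sigma = np.array([[1/12, 1/12],
                  [1/12, 2/12]])

scaled = np.empty((reps, 2))
for r in range(reps):
    U, V = rng.uniform(size=N), rng.uniform(size=N)
    Y = np.column_stack([U, U + V])
    scaled[r] = np.sqrt(N) * (Y.mean(axis=0) - mu)

print("Covariance of sqrt(N)(mu_hat - mu):\n", np.cov(scaled, rowvar=False))
print("Sigma:\n", Sigma)
```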
Continuous Mapping Theorem

Sometimes we are interested in functions of sample means (e.g., the
t-statistic is a function of µ̂ and σ̂).

The continuous mapping theorem (CMT) tells us about continuous
functions of random variables that converge in distribution/probability

Theorem: suppose g(·) is a continuous function

If XN →p X, then g(XN ) →p g(X)
If XN →d X, then g(XN ) →d g(X)
Multivariate versions hold here too: If XN →p X, then g(XN ) →p g(X) and
if XN →d X, then g(XN ) →d g(X)

21
Convergence of Sample Variance

One useful application of the CMT is to show convergence in
probability of the sample variance

Let σ̂² = (1/N) ∑_{i=1}^N (Yi − µ̂)² be the sample variance of Yi .

Claim: if Y1 , ..., YN are iid and Var(Yi²) is finite, then
σ̂² →p σ² = Var(Yi ).

Proof:
We can write the sample variance as σ̂² = (1/N) ∑_{i=1}^N Yi² − µ̂².
First term: by the LLN, (1/N) ∑_{i=1}^N Yi² →p E[Yi²].
Second term: by the LLN, µ̂ →p µ = E[Yi ]. Thus, by the CMT, µ̂² →p E[Yi ]².
Thus, by the CMT again, (1/N) ∑_{i=1}^N Yi² − µ̂² →p E[Yi²] − E[Yi ]² = σ².

22
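A minimal numerical check of the claim (assuming Python with numpy; U(0,1) data, so σ² = 1/12, is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

for N in (10, 100, 10_000, 1_000_000):
    Y = rng.uniform(0, 1, size=N)
    sigma2_hat = np.mean(Y**2) - np.mean(Y)**2   # (1/N) sum Y_i^2 - mu_hat^2
    print(f"N={N:>9d}: sigma2_hat = {sigma2_hat:.5f} (sigma2 = {1/12:.5f})")
```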
Slutsky’s Lemma

Slutsky’s lemma (sometimes Slutsky’s theorem) summarizes a few
special cases of the CMT that are very useful.

Suppose that XN →p c for a constant c, and YN →d Y . Then:

XN + YN →d c + Y .

XN YN →d cY .

If c ≠ 0, then YN /XN →d Y /c.

Analogous versions apply for vector-valued random variables.

23
Asymptotic Hypothesis Testing
Recall that when Yi ∼ N(µ, σ²), we showed that the t-statistic
t̂ = (µ̂ − µ0 )/(σ/√N) ∼ N(0, 1) under H0 : µ = µ0 .

Thus, when Yi ∼ N(µ, σ²), we had that Pr(|t̂| > 1.96) = 0.05 under
the null.

Now, suppose that Yi is not normally distributed and we don’t know
its variance.

By the CLT, √N(µ̂ − µ0 ) →d N(0, σ²) under the null.
By the CMT and LLN (as shown above), σ̂ →p σ .

Thus, by Slutsky’s lemma, t̂ = (µ̂ − µ0 )/(σ̂/√N) →d N(0, 1).

Hence, asymptotically Pr(|t̂| > 1.96) → 0.05, even though Yi is not
normal and σ̂ is estimated! We can hypothesis test just like before.
24
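A minimal Monte Carlo sketch of this result (assuming Python with numpy; exponential Yi with mean µ0 = 1 is an illustrative non-normal choice): the rejection rate of the |t̂| > 1.96 test approaches 5% as N grows.

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, reps = 1.0, 20_000

for N in (10, 50, 500):
    Y = rng.exponential(scale=mu0, size=(reps, N))   # true mean equals mu0 (null holds)
    mu_hat = Y.mean(axis=1)
    sigma_hat = Y.std(axis=1)                        # sqrt of (1/N) sum (Y_i - mu_hat)^2
    t = (mu_hat - mu0) / (sigma_hat / np.sqrt(N))
    print(f"N={N:4d}: rejection rate of |t| > 1.96 = {np.mean(np.abs(t) > 1.96):.3f}")
```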
Asymptotic Confidence Intervals

Similarly, when Yi was normal w/ σ known, we showed the confidence
interval µ̂ ± 1.96σ/√N contained the true µ 95% of the time

Analogously, when Yi is non-normal with unknown variance,
µ̂ ± 1.96σ̂/√N contains the true µ with probability approaching 95%
as N grows large.

25
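A companion sketch (same illustrative assumptions as the previous block) checking coverage of µ̂ ± 1.96 σ̂/√N:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, reps = 1.0, 20_000

for N in (10, 50, 500):
    Y = rng.exponential(scale=mu, size=(reps, N))
    mu_hat, sigma_hat = Y.mean(axis=1), Y.std(axis=1)
    half = 1.96 * sigma_hat / np.sqrt(N)
    covered = np.mean((mu_hat - half <= mu) & (mu <= mu_hat + half))
    print(f"N={N:4d}: coverage = {covered:.3f}")   # should approach 0.95
```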
Outline

1. Overview X

2. LLN, CLT, and CMT X

3. Putting Asymptotics into Practice

26
Example – Oregon Health Insurance Experiment

27
Sample Means for Depression Outcome
        Control Group   Treated Group
Mean    0.329           0.306
SD      0.470           0.461
N       10426           13315

Say we want a CI for the population mean in the control group

We have
µ̂ ± 1.96 × σ̂/√N = 0.329 ± 1.96 × 0.470/√10426 = [0.319, 0.338]

What about for the treated group?

µ̂ ± 1.96 × σ̂/√N = 0.306 ± 1.96 × 0.461/√13315 = [0.298, 0.313]

28
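A minimal sketch (assuming Python) reproducing these intervals from the summary statistics in the table; small differences from the slide are due to rounding of the reported mean and SD.

```python
import math

def mean_ci(mean, sd, n, z=1.96):
    """95% CI for a population mean: mean +/- z * sd / sqrt(n)."""
    half = z * sd / math.sqrt(n)
    return mean - half, mean + half

print("Control:", mean_ci(0.329, 0.470, 10426))   # roughly (0.320, 0.338)
print("Treated:", mean_ci(0.306, 0.461, 13315))   # roughly (0.298, 0.314)
```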
CIs for Treatment Effects in Experiments

We showed previously that in an experiment, the average treatment
effect is given by

τ = E[Yi (1) − Yi (0)] = E[Yi |Di = 1] − E[Yi |Di = 0],

i.e. the difference in population means between the treated and
control groups.

How can we form confidence intervals (or test hypotheses) about the
treatment effect?

29
Mean and variance of the difference-in-means
Let Ȳ1 = (1/N1 ) ∑_{i:Di =1} Yi be the sample mean for the treated group.
Let Ȳ0 = (1/N0 ) ∑_{i:Di =0} Yi be the sample mean for the control group.

Since Ȳ1 , Ȳ0 are each sample means, we have that

E[Ȳ1 ] = µ1 , Var(Ȳ1 ) = σ1²/N1
E[Ȳ0 ] = µ0 , Var(Ȳ0 ) = σ0²/N0

where µd = E[Yi | Di = d] and σd² = Var(Yi | Di = d).

Let τ̂ = Ȳ1 − Ȳ0 . It follows that E[τ̂] = µ1 − µ0 = τ and

Var(τ̂) = σ1²/N1 + σ0²/N0 − 2Cov(Ȳ1 , Ȳ0 )
        = σ1²/N1 + σ0²/N0

where the fact that the samples are independent implies that
Cov(Ȳ1 , Ȳ0 ) = 0.
30
We just showed that in an experiment

E[τ̂] = τ and Var(τ̂) = σ1²/N1 + σ0²/N0

where τ̂ is the difference in sample means btwn the treated/control
groups

If we knew that τ̂ was normally distributed (and we knew σ1 , σ0 ), then
we could construct CIs of the form

τ̂ ± 1.96 √(σ1²/N1 + σ0²/N0 )

As with sample means, we do not know that τ̂ is normally distributed,
but we can show that for N large, it is approximately normally
distributed, which allows us to use CIs of the form

τ̂ ± 1.96 √(σ̂1²/N1 + σ̂0²/N0 ),

for σ̂d² the estimated conditional variance.


31
Showing Asymptotic Normality

By the CLT, we have that √N1 (Ȳ1 − µ1 ) →d N(0, σ1²).

Note that N1 /N = (1/N) ∑ᵢ Di →p E[Di ] by the LLN.

Hence, applying the continuous mapping theorem,

√N(Ȳ1 − E[Yi (1)]) = (1/√(N1 /N)) · √N1 (Ȳ1 − E[Yi (1)])
                   →d (1/√E[Di ]) · N(0, Var(Yi (1)))
                   = N(0, (1/E[Di ]) · Var(Yi (1)))

Applying similar steps for Ȳ0 , we obtain that

√N (Ȳ1 − E[Yi (1)], Ȳ0 − E[Yi (0)])′ →d N(0, V),

where V is the diagonal matrix with entries (1/E[Di ]) · Var(Yi (1)) and
(1/(1 − E[Di ])) · Var(Yi (0)).

32
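A minimal simulation sketch of the scaling step above (assuming Python with numpy; the treatment probability and outcome distribution are illustrative): the variance of √N(Ȳ1 − E[Yi (1)]) should be close to Var(Yi (1))/E[Di ].

```python
import numpy as np

rng = np.random.default_rng(0)
N, reps, p = 5_000, 4_000, 0.6          # p = E[D_i], the treatment probability

draws = np.empty(reps)
for r in range(reps):
    D = rng.binomial(1, p, size=N)
    Y1 = rng.exponential(scale=2.0, size=N)   # potential outcome Y_i(1): mean 2, variance 4
    Ybar1 = Y1[D == 1].mean()                 # treated-group sample mean
    draws[r] = np.sqrt(N) * (Ybar1 - 2.0)

print("Var of sqrt(N)(Ybar1 - E[Yi(1)]):", draws.var())
print("Var(Yi(1)) / E[Di]:", 4.0 / p)
```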
Hypothesis Testing for Experiments (continued)
We just showed that

√N (Ȳ1 − E[Yi (1)], Ȳ0 − E[Yi (0)])′ →d N(0, V),

where V is the diagonal matrix with entries (1/E[Di ]) · Var(Yi (1)) and
(1/(1 − E[Di ])) · Var(Yi (0)).

Applying the CMT,

√N(Ȳ1 − Ȳ0 − E[Yi (1) − Yi (0)]) →d N(0, σ²),

where σ² = (1/E[Di ]) · Var(Yi (1)) + (1/(1 − E[Di ])) · Var(Yi (0))

We can thus form a 95% confidence interval for τ = E[Yi (1) − Yi (0)],

Ȳ1 − Ȳ0 ± 1.96 σ̂/√N,

where σ̂² = (N/N1 ) σ̂1² + (N/N0 ) σ̂0² , with σ̂d² the sample variance for
treatment group d ∈ {0, 1}

33
Sample Means for Depression Outcome (Again)
        Control Group   Treated Group
Mean    0.329           0.306
SD      0.470           0.461
N       10426           13315

Our point estimate of the treatment effect is
τ̂ = 0.306 − 0.329 = −0.023.

Our CI for the treatment effect is:

τ̂ ± 1.96 × √(σ̂1²/N1 + σ̂0²/N0 )
  = −0.023 ± 1.96 × √(0.461²/13315 + 0.470²/10426)
  = [−0.035, −0.011]

34
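A minimal sketch (assuming Python) reproducing this point estimate and interval from the table:

```python
import math

def diff_in_means_ci(m1, s1, n1, m0, s0, n0, z=1.96):
    """Point estimate and 95% CI for tau = mu1 - mu0 from group summary stats."""
    tau_hat = m1 - m0
    se = math.sqrt(s1**2 / n1 + s0**2 / n0)
    return tau_hat, (tau_hat - z * se, tau_hat + z * se)

tau_hat, ci = diff_in_means_ci(0.306, 0.461, 13315, 0.329, 0.470, 10426)
print(tau_hat, ci)   # roughly -0.023, (-0.035, -0.011)
```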
Hypothesis Testing under Unconfoundedness
Recall that under unconfoundedness, Di ⊥⊥ (Yi (1), Yi (0)) | Xi , we have

CATE(x) = E[Yi (1) − Yi (0)|Xi = x] = E[Yi |Di = 1, Xi = x] − E[Yi |Di = 0, Xi = x]

That is, within each value of Xi , it’s as if we have an experiment.

By the same logic as for experiments, we have that

√Nx (Ȳ1,x − Ȳ0,x − E[Yi (1) − Yi (0)|Xi = x]) →d N(0, σx²),

where Nx = |{i : Xi = x}| and
σx² = (1/E[Di |Xi = x]) · Var(Yi (1)|Xi = x) + (1/(1 − E[Di |Xi = x])) · Var(Yi (0)|Xi = x).

So we can also do hypothesis testing on CATE(x) when Nx is large.

By averaging CATE(x), we can do hypothesis testing / form CIs for
the ATE.
35
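A minimal sketch of within-cell estimation of CATE(x) with asymptotic 95% CIs (assuming Python with numpy; the simulated dataset and variable names are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data: discrete covariate X, treatment D (probability depends on X),
# outcome Y with a treatment effect that varies with X (true CATE(x) = 0.4 + 0.3x)
X = rng.integers(0, 3, size=n)                 # X takes values 0, 1, 2
D = rng.binomial(1, 0.3 + 0.2 * X / 2)         # unconfounded given X
Y = 1.0 + 0.5 * X + (0.4 + 0.3 * X) * D + rng.normal(size=n)

for x in np.unique(X):
    cell = X == x
    y1, y0 = Y[cell & (D == 1)], Y[cell & (D == 0)]
    cate_hat = y1.mean() - y0.mean()
    se = np.sqrt(y1.var() / len(y1) + y0.var() / len(y0))
    print(f"x={x}: CATE_hat = {cate_hat:.3f}, "
          f"95% CI = [{cate_hat - 1.96*se:.3f}, {cate_hat + 1.96*se:.3f}]")
```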
The Challenge of Continuous x
We’ve shown thus far how we can estimate CATE(x) when the
number of observations with Xi = x is large.

This works great when Xi is binary (e.g. an indicator for college) or
takes on a small number of discrete values (e.g. 50 states).

But what about when Xi is continuous?

For example, if Xi is income, then to estimate CATE(50,351), the
theory we have says we need a large number of treated and control
units both with income $50,351. In most datasets, we won’t have very
many people with exactly this income.

We thus need a different way of estimating conditional means when Xi
is continuously distributed.

The next part of the course will focus on achieving this task using
linear regression as an approximation to the CEF.
36
