Chapter 3: Asymptotic Statistics
Jonathan Roth
Mathematical Econometrics I
Brown University
Fall 2023
Outline
1. Overview
Motivation

We've seen how we can test hypotheses about population means using information from the sample mean µ̂ when it is normally distributed with a known variance.
Overview of Important Results

The Central Limit Theorem (CLT) says that when N is large, the distribution of µ̂ is approximately normally distributed with mean µ and variance σ²/N.
Outline

1. Overview ✓
Convergence in Probability

Definition: a sequence of random variables X_N converges in probability to x, written X_N →p x, if for every ε > 0,

P(|X_N − x| > ε) → 0 as N → ∞.
Convergence in Probability (Cont.)

Useful fact: if E[(X_N − x)²] → 0, then X_N →p x. (This follows from Chebyshev's inequality.)
Law of Large Numbers

Law of Large Numbers. Suppose that Y_1, ..., Y_N are drawn iid from a distribution with Var(Y_i) = σ² < ∞. Then

µ̂_N = (1/N) ∑_{i=1}^N Y_i →p µ = E[Y_i]

In words: as the sample gets large, the sample mean will be close to the population mean with high probability.

Proof: We saw last chapter that E[µ̂_N] = µ and Var(µ̂_N) = σ²/N. Thus,

Var(µ̂_N) = E[(µ̂_N − µ)²] = σ²/N → 0

Hence, µ̂_N →p µ by our "useful fact".
Laws of Large Numbers Illustration

[Figures: distribution and mean of (1/N) ∑_i Z_i when Z_i ∼ U(0, 1), for N = 1, 10, 100, 1000. The distribution concentrates around the population mean 1/2 as N grows.]
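The behavior in these figures can be reproduced with a short simulation (an illustrative sketch, not from the slides; the replication count of 10,000 and the threshold 0.05 are arbitrary choices):

```python
import numpy as np

# LLN sketch: sample means of U(0, 1) draws concentrate around the
# population mean 1/2 as N grows.
rng = np.random.default_rng(0)

tail_prob = {}   # estimated P(|mu_hat - mu| > 0.05) for each N
for n in [1, 10, 100, 1000]:
    # 10,000 replications of the sample mean with sample size n
    means = rng.uniform(0, 1, size=(10_000, n)).mean(axis=1)
    tail_prob[n] = np.mean(np.abs(means - 0.5) > 0.05)
    print(n, tail_prob[n])
```

The printed probabilities shrink toward zero, which is exactly the statement P(|µ̂_N − µ| > ε) → 0.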
Convergence in Distribution

Definition: a sequence of random variables X_N converges in distribution to X, written X_N →d X, if

P(X_N ≤ x) → P(X ≤ x)

at every point x where the CDF of X is continuous.
Central Limit Theorem

Central Limit Theorem. Suppose that Y_1, ..., Y_N are drawn iid from a distribution with mean µ and Var(Y_i) = σ² < ∞. Then

√N (µ̂_N − µ) →d N(0, σ²)

Equivalently, when N is large, µ̂_N is approximately normally distributed with mean µ and variance σ²/N.
CLT Illustration

[Figures: distributions of µ̂ = (1/N) ∑_i X_i vs. N(E[µ̂], Var(µ̂)) when X_i ∼ U(0, 1), for N = 1, 2, 5, 10. The normal approximation improves rapidly as N grows.]
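A quick simulation sketch of the same comparison (illustrative, not from the slides; the replication count of 50,000 is an arbitrary choice):

```python
import numpy as np

# CLT sketch: the standardized sample mean of U(0, 1) draws behaves
# like N(0, 1) even for modest N.
rng = np.random.default_rng(0)
mu, sigma = 0.5, np.sqrt(1 / 12)   # mean and sd of U(0, 1)

coverage = {}   # fraction of standardized means with |z| <= 1.96
for n in [1, 2, 5, 10]:
    means = rng.uniform(0, 1, size=(50_000, n)).mean(axis=1)
    z = np.sqrt(n) * (means - mu) / sigma
    coverage[n] = np.mean(np.abs(z) <= 1.96)
    print(n, coverage[n])
```

By N = 10 the fraction inside ±1.96 is already close to the normal value of 0.95.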
CLT Illustration II

https://www.youtube.com/watch?v=EvHiee7gs9Y
Multivariate Versions

LLN: For µ̂_N, the sample mean of iid vectors Y_1, ..., Y_N with mean µ and finite variance, µ̂_N →p µ.

CLT: √N (µ̂_N − µ) →d N(0, Σ), where Σ = Var(Y_i) is the variance-covariance matrix.
Continuous Mapping Theorem

Continuous Mapping Theorem (CMT). Suppose X_N →p x and g(·) is continuous at x. Then g(X_N) →p g(x).

The analogous result holds for convergence in distribution: if X_N →d X and g(·) is continuous, then g(X_N) →d g(X).
Convergence of Sample Variance

Let σ̂² = (1/N) ∑_{i=1}^N (Y_i − µ̂)² be the sample variance of Y_i. Then σ̂² →p σ².

Proof:

We can write the sample variance as σ̂² = (1/N) ∑_{i=1}^N Y_i² − µ̂².

First term: by the LLN, (1/N) ∑_{i=1}^N Y_i² →p E[Y_i²].

Second term: by the LLN, µ̂ →p µ = E[Y_i]. Thus, by the CMT, µ̂² →p E[Y_i]².

Thus, by the CMT again, (1/N) ∑_{i=1}^N Y_i² − µ̂² →p E[Y_i²] − E[Y_i]² = σ².
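A small simulation sketch of this convergence (illustrative, not from the slides; the sample sizes are arbitrary choices):

```python
import numpy as np

# Sketch: the sample variance (1/N) * sum((Y_i - mu_hat)**2) converges
# in probability to sigma^2 = 1/12 for Y_i ~ U(0, 1).
rng = np.random.default_rng(0)

def sample_var(y):
    # Matches the slides' definition (divides by N, not N - 1)
    return np.mean((y - y.mean()) ** 2)

estimates = {n: sample_var(rng.uniform(0, 1, size=n)) for n in [10, 1000, 100_000]}
for n, v in estimates.items():
    print(n, v)
```

As N grows, the estimates settle near σ² = 1/12 ≈ 0.0833.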
Slutsky's Lemma

Slutsky's Lemma. Suppose X_N →p c for a constant c, and Y_N →d Y. Then:

X_N + Y_N →d c + Y.

X_N Y_N →d cY.
Asymptotic Hypothesis Testing

Recall that when Y_i ∼ N(µ, σ²), we showed that the t-statistic

t̂ = (µ̂ − µ_0) / (σ/√N) ∼ N(0, 1) under H_0 : µ = µ_0.

Thus, when Y_i ∼ N(µ, σ²), we had that Pr(|t̂| > 1.96) = 0.05 under the null.

Without normality, the CLT gives √N (µ̂ − µ_0) →d N(0, σ²) under H_0, and σ̂ →p σ.

Thus, by Slutsky's lemma, t̂ = (µ̂ − µ_0) / (σ̂/√N) →d N(0, 1).

Hence, Pr(|t̂| > 1.96) → 0.05 under the null, so rejecting when |t̂| > 1.96 is an asymptotically valid 5% test.
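The asymptotic size of this test can be checked by simulation (an illustrative sketch, not from the slides; the sample size of 500 and the 10,000 replications are arbitrary choices):

```python
import numpy as np

# Size check sketch for the asymptotic t-test: under H0: mu = mu0, the
# statistic t = (mu_hat - mu0) / (sigma_hat / sqrt(N)) is approximately
# N(0, 1), so |t| > 1.96 should reject about 5% of the time.
rng = np.random.default_rng(0)
mu0, n, reps = 0.5, 500, 10_000

y = rng.uniform(0, 1, size=(reps, n))          # H0 is true: E[Y_i] = 0.5
t = (y.mean(axis=1) - mu0) / (y.std(axis=1) / np.sqrt(n))
rejection_rate = np.mean(np.abs(t) > 1.96)
print(rejection_rate)   # close to 0.05
```

Note that the data are uniform, not normal; the test nonetheless has approximately the right size because its justification is asymptotic.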
Asymptotic Confidence Intervals

Likewise, the interval µ̂ ± 1.96 × σ̂/√N covers µ with probability approaching 95%, and thus is an asymptotically valid 95% confidence interval.
Example – Oregon Health Insurance Experiment
Sample Means for Depression Outcome

          Control Group    Treated Group
Mean      0.329            0.306
SD        0.470            0.461
N         10426            13315

Say we want a CI for the population mean in the control group.

We have

µ̂ ± 1.96 × σ̂/√N = 0.329 ± 1.96 × 0.470/√10426 = [0.319, 0.338]
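This computation can be checked directly (a sketch using only the table's numbers; the slides' [0.319, 0.338] agrees up to rounding):

```python
from math import sqrt

# Control-group CI from the table: mu_hat ± 1.96 * sigma_hat / sqrt(N)
mean, sd, n = 0.329, 0.470, 10426

se = sd / sqrt(n)
lo, hi = mean - 1.96 * se, mean + 1.96 * se
print(f"[{lo:.3f}, {hi:.3f}]")   # [0.320, 0.338]
```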
CIs for Treatment Effects in Experiments

How can we form confidence intervals (or test hypotheses) about the treatment effect?
Mean and variance of the difference-in-means

Let Ȳ_1 = (1/N_1) ∑_{i:D_i=1} Y_i be the sample mean for the treated group.

Let Ȳ_0 = (1/N_0) ∑_{i:D_i=0} Y_i be the sample mean for the control group.

The difference-in-means estimator is τ̂ = Ȳ_1 − Ȳ_0, so E[τ̂] = µ_1 − µ_0 = τ.

Var(τ̂) = Var(Ȳ_1) + Var(Ȳ_0) = σ_1²/N_1 + σ_0²/N_0,

where the fact that the samples are independent implies that Cov(Ȳ_1, Ȳ_0) = 0.
We just showed that in an experiment

E[τ̂] = τ and Var(τ̂) = σ_1²/N_1 + σ_0²/N_0.

We now show that τ̂ is also asymptotically normally distributed.
Showing Asymptotic Normality

By the CLT, we have that √N_1 (Ȳ_1 − µ_1) →d N(0, σ_1²).

Note that N_1/N = (1/N) ∑_i D_i →p E[D_i] by the LLN.

(The analogous results hold for the control group: √N_0 (Ȳ_0 − µ_0) →d N(0, σ_0²) and N_0/N →p 1 − E[D_i].)
Hypothesis Testing for Experiments (continued)

We just showed that

√N ( Ȳ_1 − E[Y_i(1)] , Ȳ_0 − E[Y_i(0)] )′ →d N( 0, diag( Var(Y_i(1))/E[D_i] , Var(Y_i(0))/(1 − E[D_i]) ) )

We can thus form a 95% confidence interval for τ = E[Y_i(1) − Y_i(0)],

Ȳ_1 − Ȳ_0 ± 1.96 σ̂/√N,

where σ̂² = (N/N_1) σ̂_1² + (N/N_0) σ̂_0², and σ̂_d² is the sample variance for treatment group d ∈ {0, 1}.
Sample Means for Depression Outcome (Again)

          Control Group    Treated Group
Mean      0.329            0.306
SD        0.470            0.461
N         10426            13315

Our point estimate of the treatment effect is τ̂ = 0.306 − 0.329 = −0.023.
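A sketch of the corresponding confidence interval using the formula from the previous slide and the table's numbers (the resulting interval is computed here, not taken from the slides; note that σ̂/√N with σ̂² = (N/N_1) σ̂_1² + (N/N_0) σ̂_0² simplifies to the standard error below):

```python
from math import sqrt

# 95% CI for the treatment effect: tau_hat ± 1.96 * sqrt(sd1^2/N1 + sd0^2/N0)
mean0, sd0, n0 = 0.329, 0.470, 10426   # control
mean1, sd1, n1 = 0.306, 0.461, 13315   # treated

tau_hat = mean1 - mean0
se = sqrt(sd1**2 / n1 + sd0**2 / n0)
print(f"tau_hat = {tau_hat:.3f}, 95% CI = [{tau_hat - 1.96 * se:.3f}, {tau_hat + 1.96 * se:.3f}]")
```

The interval excludes zero, so we reject H_0 : τ = 0 at the 5% level.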
Hypothesis Testing under Unconfoundedness

Recall that under unconfoundedness, D_i ⊥⊥ (Y_i(1), Y_i(0)) | X_i, we have

CATE(x) = E[Y_i | D_i = 1, X_i = x] − E[Y_i | D_i = 0, X_i = x].

The same tools therefore apply within each value of the covariates: we can form CIs and test hypotheses about CATE(x) using the difference in means among observations with X_i = x.
The Challenge of Continuous x

We've shown thus far how we can estimate CATE(x) when the number of observations with X_i = x is large.

But when X_i is continuous (or takes many values), there may be few or no observations with exactly X_i = x.

The next part of the course will focus on achieving this task using linear regression as an approximation to the CEF.