
Econ 704

Lecture 6 - Standard Error, Confidence Interval, and Simple Hypothesis Testing

Xiaoxia Shi

University of Wisconsin - Madison

11/12/2018



We continue our discussion of the conditional mean model:

E[Y | X] = X'β.

We have derived the asymptotic distribution of the OLS estimator for β:

√n(β̂ − β) →d N(0, (E[XX'])⁻¹ E[U²XX'] (E[XX'])⁻¹),

where U = Y − X'β.

What do we use the asymptotic distribution for?

for assessing the accuracy of β̂;
for testing hypotheses about β.

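Before putting this result to work, it can help to see it in action. Below is a small simulation sketch (not from the lecture; the design, sample size, and coefficient values are made up for illustration) checking that √n(β̂ − β) is approximately normal with the sandwich variance, even when the errors are heteroskedastic.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 2000
beta = np.array([1.0, 2.0])              # illustrative true coefficients

draws = np.empty((reps, 2))
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    U = rng.normal(size=n) * (1 + 0.5 * np.abs(X[:, 1]))   # heteroskedastic errors, E[U|X] = 0
    Y = X @ beta + U
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    draws[r] = np.sqrt(n) * (beta_hat - beta)

# The sample covariance of sqrt(n)*(beta_hat - beta) across replications should be
# close to the sandwich matrix (E[XX'])^{-1} E[U^2 XX'] (E[XX'])^{-1}.
print(np.cov(draws.T))
```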


Outline

Estimating the Asymptotic Variance: Standard Error

Confidence Interval

Simple Hypothesis Testing

Multiple Hypothesis Testing (next lecture)





Estimating the Asymptotic Variance-Covariance

Recall that

√n(β̂ − β) →d N(0, Vβ),

where Vβ = (E[XX'])⁻¹ E[U²XX'] (E[XX'])⁻¹.

Vβ is unknown. Thus it needs to be estimated.

A sample analogue estimator:

V̂β = (n⁻¹ ∑ᵢ₌₁ⁿ XᵢXᵢ')⁻¹ (n⁻¹ ∑ᵢ₌₁ⁿ Ûᵢ²XᵢXᵢ') (n⁻¹ ∑ᵢ₌₁ⁿ XᵢXᵢ')⁻¹,

where Ûᵢ is called the Regression Residual:

Ûᵢ = Yᵢ − Xᵢ'β̂.



Estimating the Asymptotic Variance-Covariance

V̂β = (n⁻¹ ∑ᵢ₌₁ⁿ XᵢXᵢ')⁻¹ (n⁻¹ ∑ᵢ₌₁ⁿ Ûᵢ²XᵢXᵢ') (n⁻¹ ∑ᵢ₌₁ⁿ XᵢXᵢ')⁻¹.

This estimator of Vβ is called the Eicker-Huber-White estimator.

It is also called the heteroskedasticity-robust variance estimator.

This is because it is based on the asymptotic variance-covariance matrix Vβ, which is derived without making the homoskedasticity assumption.

With minor technical conditions added, one can show: V̂β →p Vβ.

A proof is at the end of the narrative version of Lecture 6. Read it to hone your skills with the LLN and the Slutsky Theorem.

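To make the formula concrete, here is a minimal NumPy sketch of the Eicker-Huber-White estimator. The function name and arguments are illustrative, not part of the lecture.

```python
import numpy as np

def ehw_vcov(X, Y, beta_hat):
    """Sample-analogue (Eicker-Huber-White) estimator of V_beta.

    X : (n, k) regressor matrix, Y : (n,) outcome, beta_hat : (k,) OLS estimate.
    """
    n = X.shape[0]
    U_hat = Y - X @ beta_hat              # regression residuals
    Q_hat = (X.T @ X) / n                 # n^{-1} sum_i X_i X_i'
    meat = (X.T * U_hat**2) @ X / n       # n^{-1} sum_i U_hat_i^2 X_i X_i'
    Q_inv = np.linalg.inv(Q_hat)
    return Q_inv @ meat @ Q_inv           # the sandwich formula
```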


Standard Error


Since Vβ is the asymptotic variance-covariance matrix of √n(β̂ − β), we can use V̂β to approximate the variance-covariance matrix of √n(β̂ − β).

That means that we can use V̂β/n to approximate the variance-covariance matrix of β̂.

In particular, we can use [V̂β]j,j/n to approximate the variance of β̂j, where

βj and β̂j are the jth elements of β and β̂ respectively, and
[V̂β]j,j is the jth diagonal element of V̂β.

√([V̂β]j,j/n) is called the standard error of β̂j.

We use s.e.(β̂j) to denote the standard error of β̂j.

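Continuing the sketch above, the standard errors are just the square roots of the scaled diagonal entries of V̂β. An illustrative helper (not lecture code):

```python
import numpy as np

def standard_errors(V_hat, n):
    """Return s.e.(beta_hat_j) = sqrt([V_hat]_{j,j} / n) for each coordinate j."""
    return np.sqrt(np.diag(V_hat) / n)
```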


Using Standard Error

Combined with the asymptotic normality of β̂, the standard errors can
be used to assess the accuracy of the estimator.

The assessment is based on the approximation

β̂j is approximately distributed N(βj, s.e.(β̂j)²).

The precise sense in which this approximation holds is that

(β̂j − βj)/s.e.(β̂j) →d N(0, 1) as n → ∞.



Example
Suppose Y is the test score of a school district, and X = (1, D)', where

D = 1 if student-teacher ratio < 20, and D = 0 if student-teacher ratio ≥ 20.   (1)

We estimate E(Y | D) = β0 + β1·D, using the estimator β̂ ≡ (β̂0, β̂1)'.

Suppose that s.e.(β̂0) = 1.3 and s.e.(β̂1) = 1.8.

What is the probability that β̂0 is off by 3 points or more?

Pr(|β̂0 − β0| ≥ 3) = Pr(|β̂0 − β0|/s.e.(β̂0) ≥ 3/1.3)
                  ≈ Pr(|N(0, 1)| ≥ 3/1.3)
                  = Φ(−3/1.3) + 1 − Φ(3/1.3)
                  = 2(1 − Φ(3/1.3))
                  ≈ 0.0210 = 2.1%.   (2)

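The last line of the calculation can be checked numerically. A quick sketch using the standard normal CDF (assuming scipy is available):

```python
from scipy.stats import norm

se_0 = 1.3
prob = 2 * (1 - norm.cdf(3 / se_0))   # Pr(|N(0,1)| >= 3/1.3)
print(round(prob, 4))                 # about 0.021, i.e. 2.1%
```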


Example

Suppose, as above, Y is the test score of a school district, and X = (1, D)', where

D = 1 if student-teacher ratio < 20, and D = 0 if student-teacher ratio ≥ 20.   (3)

We estimate E(Y | D) = β0 + β1·D, using the estimator β̂ ≡ (β̂0, β̂1)'.

Suppose that s.e.(β̂0) = 1.3 and s.e.(β̂1) = 1.8.

What is the probability that β̂0 is off by 3 points or more?

Pr(|β̂0 − β0| ≥ 3) ≈ 2.1%.   (4)

Suppose that you (the analyst or policy maker) can tolerate an error as large as 3 points. Then the estimator β̂0 is accurate enough.



Reporting a point estimate β̂j along with its standard error is one way to report the results of statistical analysis. Combined with asymptotic normality, this allows the reader to do calculations like the one above to assess how accurate the point estimate is.

Another way to report the results of statistical analysis is to report an interval estimator of β.

This interval estimator is a random interval that contains the true value β with (approximately) a prespecified probability.

The interval is called a confidence interval, and the prespecified


probability is called the confidence level.



Outline

Estimating the Asymptotic Variance: Standard Error

Confidence Interval

Simple Hypothesis Testing

Multiple Hypothesis Testing (next lecture)



Confidence Interval Construction

Suppose that the scalar parameter of interest is θ, and that we have a point estimator θ̂ for which we can prove that

√n(θ̂ − θ) →d N(0, σ²).

Suppose that there is an estimator σ̂ for σ such that σ̂ →p σ.

Then s.e.(θ̂) = σ̂/√n.

A two-sided confidence interval of confidence level 1 − α can be constructed as

[θ̂ − zα/2 × s.e.(θ̂), θ̂ + zα/2 × s.e.(θ̂)].

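A minimal sketch of this construction, using scipy for the normal quantile; theta_hat, se, and alpha are placeholder values, not from the lecture:

```python
from scipy.stats import norm

theta_hat, se, alpha = 2.4, 0.8, 0.05   # illustrative values
z = norm.ppf(1 - alpha / 2)             # z_{alpha/2}; 1.96 when alpha = 5%
ci = (theta_hat - z * se, theta_hat + z * se)
print(ci)
```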


Confidence Interval Construction

What is zα/2 ?

We can find this out via the following calculation:



Pr(θ ∈ [θ̂ − zα/2 × s.e.(θ̂), θ̂ + zα/2 × s.e.(θ̂)])

= Pr(−zα/2 × s.e.(θ̂) ≤ θ̂ − θ ≤ zα/2 × s.e.(θ̂))

= Pr(|θ̂ − θ|/s.e.(θ̂) ≤ zα/2)

≈ Pr(|N(0, 1)| ≤ zα/2)

= 1 − α.

Thus, zα/2 is the 1 − α/2 quantile of N(0, 1). E.g., z2.5% = 1.96 and z5% = 1.645.

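These two quantiles can be verified directly from the standard normal distribution (a quick scipy check):

```python
from scipy.stats import norm

print(norm.ppf(0.975))   # z_{2.5%} ≈ 1.960
print(norm.ppf(0.95))    # z_{5%}   ≈ 1.645
```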


Outline

Estimating the Asymptotic Variance: Standard Error

Confidence Interval

Simple Hypothesis Testing

Multiple Hypothesis Testing (next lecture)



Hypothesis Testing

The point estimator θ̂ answers the question: what is θ, approximately?

The confidence interval answers the question: in approximately what range does θ lie?

Hypothesis testing answers yes/no questions about θ. For example: are the test scores of school districts with a low student-teacher ratio and those with a high student-teacher ratio the same in expectation?



Single Hypotheses

A single hypothesis about a parameter θ is defined by one equation:

H0 : θ = c,

for some given value c.


In the convention of hypothesis testing, we take a hypothesis like this as the null hypothesis, and let its opposite be the alternative hypothesis. For example,

H1: θ ≠ c.

This particular H1 is a two-sided alternative hypothesis.

It is also possible to consider the one-sided alternative hypothesis:

H1: θ > c,

when we are confident that θ cannot be less than c.


Single Hypothesis Testing

A hypothesis test can be constructed based on a point estimator θ̂ of θ, its standard error s.e.(θ̂), and the asymptotic normality:

(θ̂ − θ)/s.e.(θ̂) →d N(0, 1).

A hypothesis test is a binary decision made based on θ̂ and s.e.(θ̂). For example, for H0: θ = c against the two-sided alternative:

φn = 1{ |θ̂ − c|/s.e.(θ̂) > threshold }.

Here φn is a binary variable. Its value 1 denotes rejection of H0 and 0 denotes acceptance of H0.



Determining the Threshold: Critical Value

φn = 1{ |θ̂ − c|/s.e.(θ̂) > threshold }

The random variable inside the indicator function is called the test statistic, while the threshold is called the critical value.

In order to determine the critical value, we need to understand the nature of the binary decision φn.

It is a random decision, and thus not always correct.

Type I error (false rejection): φn = 1 while θ = c.
Type II error (false acceptance): φn = 0 while θ ≠ c.



Hypothesis Testing Convention

The convention of hypothesis testing is to first control the probability of the type-I error at a prespecified significance level, say α:

Pr(φn = 1 | H0) ≡ Pr( |θ̂ − c|/s.e.(θ̂) > threshold | θ = c ) ≤ α (approximately).

While the probability of the type-I error is controlled, we try to minimize the probability of the type-II error (by choosing the threshold as small as possible).



The Two-sided t-Test
For the hypothesis H0: θ = c vs. H1: θ ≠ c, consider the test

φn = 1{ |θ̂ − c|/s.e.(θ̂) > threshold }.

We use the fact that (θ̂ − c)/s.e.(θ̂) →d N(0, 1) under H0 (i.e., θ = c).

Controlling the type-I error rate requires:

Pr(φn = 1 | H0) ≈ Pr(|N(0, 1)| > threshold) ≤ α,

which yields: threshold ≥ zα/2.

Minimizing the type-II error rate requires making the threshold as small as possible. Thus

threshold = zα/2.

The resulting test is the so-called two-sided t-test.
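
A compact sketch of the resulting decision rule (the function and its default arguments are illustrative, not from the slides):

```python
from scipy.stats import norm

def two_sided_t_test(theta_hat, se, c=0.0, alpha=0.05):
    """Return 1 (reject H0: theta = c) or 0 (accept), at significance level alpha."""
    t_stat = abs(theta_hat - c) / se
    critical_value = norm.ppf(1 - alpha / 2)   # z_{alpha/2}
    return int(t_stat > critical_value)
```
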
One-Sided t-Test
Suppose instead that we consider the one-sided alternative:

H0: θ = c vs. H1: θ > c.

Then a sensible test is

φn = 1{ (θ̂ − c)/s.e.(θ̂) > threshold }.

Controlling the type-I error rate requires:

Pr(φn = 1 | H0) ≈ Pr(N(0, 1) > threshold) ≤ α,

which yields: threshold ≥ zα.

Minimizing the type-II error rate requires making the threshold as small as possible. Thus

threshold = zα.

The resulting test is the so-called one-sided t-test.
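
The one-sided version differs only in the statistic (no absolute value) and the critical value (zα rather than zα/2). An illustrative sketch:

```python
from scipy.stats import norm

def one_sided_t_test(theta_hat, se, c=0.0, alpha=0.05):
    """Test H0: theta = c against H1: theta > c at significance level alpha."""
    t_stat = (theta_hat - c) / se          # signed, not absolute value
    critical_value = norm.ppf(1 - alpha)   # z_alpha, not z_{alpha/2}
    return int(t_stat > critical_value)
```
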
p-value: Definition

When you are not exactly sure which significance level is appropriate, it is convenient to report the p-value.

We define this for the two-sided t-test. The one-sided t-test version is analogous.

Let τ denote the realized value of the Student-t statistic given your data set. The p-value is

p = Pr(|N(0, 1)| ≥ |τ|) = 2(1 − Φ(|τ|)).

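The definition translates directly into a one-line computation. A minimal sketch (the example value of τ is made up):

```python
from scipy.stats import norm

def two_sided_p_value(tau):
    """p = Pr(|N(0,1)| >= |tau|) = 2 * (1 - Phi(|tau|))."""
    return 2 * (1 - norm.cdf(abs(tau)))

print(two_sided_p_value(2.31))   # about 0.021
```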


p-value: Interpretations

The p-value is the probability that a random draw of the test statistic (i.e., t, which is approximately N(0, 1)) is at least as adverse to H0 as the realized draw (i.e., τ), assuming that H0 is true.

If it is unlikely to see evidence against H0 stronger than what we observed (i.e., the p-value is small), then the observed evidence against H0 is strong.

The p-value is the lowest significance level at which the observed value of the statistic (i.e., τ) leads to rejection of H0. That is:

A test with significance level above the p-value rejects H0, and a test with significance level below the p-value does not.



Example: Statistical Significance of A Parameter

Consider a conditional mean model E(Y | X) = X'β, and a coordinate of β, say βj.

Then a two-sided t-test for the following hypothesis

H0: βj = 0 vs. H1: βj ≠ 0   (5)

is called the statistical significance test for βj.

If the two-sided t-test of significance level α rejects this H0, we say that βj is (statistically) significant at significance level α.

The t-statistics and p-values reported in the regression results table in STATA are the t-statistics and p-values for exactly these hypotheses.

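To make the connection to regression output concrete, here is an illustrative sketch of how those t-statistics and p-values are computed from estimates and standard errors (the numbers are made up, not from the lecture):

```python
import numpy as np
from scipy.stats import norm

beta_hat = np.array([5.2, -1.1])   # illustrative coefficient estimates
se = np.array([1.3, 1.8])          # illustrative standard errors

t_stats = beta_hat / se                          # t-statistic for H0: beta_j = 0
p_values = 2 * (1 - norm.cdf(np.abs(t_stats)))   # two-sided p-values
print(t_stats)
print(p_values)
```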


Quiz 6

Suppose that

√n(β̂1 − β1, β̂2 − β2)' →d N(0, V),  where V = [1, ρ; ρ, 1].

Suppose that we have a consistent estimator ρ̂ for the unknown ρ.

Note that

β1 − β2 = (1, −1) × (β1, β2)'.

1. √n((β̂1 − β̂2) − (β1 − β2)) →d ?
2. Construct a test for the single hypothesis:

H0: β1 − β2 = 0.

