
Chapter 5

Linear Regression with One Regressor:


Hypothesis Tests and Confidence Intervals
 
$$\hat{\beta}_1 \xrightarrow{\,d\,} N\!\left(\beta_1,\ \sigma^2_{\hat{\beta}_1}\right), \qquad \sigma^2_{\hat{\beta}_1} = \frac{1}{n}\,\frac{\mathrm{Var}[(X_i - \mu_X)u_i]}{[\mathrm{Var}(X_i)]^2}$$

The OLS estimator β̂1 of the slope coefficient β1 differs from one sample to the
next; that is, β̂1 has a sampling distribution. In this chapter, we show how
knowledge of this sampling distribution can be used to make statements about
β1 that accurately summarize the sampling uncertainty. This chapter provides
the expression for the standard error of the OLS estimator and shows how to
use β̂1 and its standard error to test hypotheses and how to construct confidence
intervals for β1 .

Hypothesis Tests
The t-Statistic
$$\hat{\beta}_1 \xrightarrow{\,d\,} N\!\left(\beta_1, \sigma^2_{\hat{\beta}_1}\right), \qquad \frac{\hat{\beta}_1 - \beta_1}{\sigma_{\hat{\beta}_1}} \xrightarrow{\,d\,} N(0, 1)$$

When computing the t-statistic, β1 would be the value in the null hypothesis.
However, σβ̂1 is unknown and thus we need a consistent estimator of it:

$$\frac{\hat{\sigma}^2_{\hat{\beta}_1}}{\sigma^2_{\hat{\beta}_1}} = \frac{1}{n}\,\frac{\frac{1}{n-2}\sum_{i=1}^{n}(X_i - \bar{X})^2 \hat{u}_i^2}{\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right]^2} \;\Bigg/\; \frac{1}{n}\,\frac{\mathrm{Var}[(X_i - \mu_X)u_i]}{[\mathrm{Var}(X_i)]^2} \;\xrightarrow{\,p\,}\; 1$$

where $u_i = Y_i - \beta_0 - \beta_1 X_i$ and $\hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i$.

Therefore,
$$\frac{\hat{\beta}_1 - \beta_1}{\hat{\sigma}_{\hat{\beta}_1}} \xrightarrow{\,d\,} N(0, 1)$$

Under the null hypothesis, H0 : β1 = β1,0,

$$t = \frac{\text{estimator} - \text{hypothesized value}}{\text{standard error of the estimator}} = \frac{\hat{\beta}_1 - \beta_{1,0}}{\hat{\sigma}_{\hat{\beta}_1}} = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)} \xrightarrow{\,d\,} N(0, 1)$$
Testing hypotheses about the slope β1

The p-value is the probability of observing a value of β̂1 at least as different from β1,0 as the estimate actually computed (β̂1^act), assuming that the null hypothesis is correct.

Equivalently, the p-value is the smallest significance level at which you can
reject the null hypothesis.

The null hypothesis and the two-sided alternative hypothesis are

H0 : β1 = β1,0 vs. H1 : β1 ≠ β1,0

$$\text{p-value} = \Pr_{H_0}\!\left(\left|\frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}\right| > \left|\frac{\hat{\beta}_1^{act} - \beta_{1,0}}{SE(\hat{\beta}_1)}\right|\right) = \Pr_{H_0}\!\left(|Z| > |t^{act}|\right) = 2\Phi(-|t^{act}|)$$

Reject H0 at the 5% significance level if p-value < 0.05 or |t^act| > 1.96.
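
To make the mechanics concrete, here is a minimal Python sketch (simulated data; all variable names are ours, not from the text) that fits the regression, forms the t-statistic for H0 : β1 = 0, and computes the two-sided p-value 2Φ(−|t^act|):

```python
# A minimal sketch (simulated data; all names are illustrative).
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)   # true beta1 = 0.5

X = sm.add_constant(x)                   # adds the intercept column
res = sm.OLS(y, X).fit(cov_type="HC1")   # heteroskedasticity-robust SEs

beta1_hat, se_beta1 = res.params[1], res.bse[1]
beta1_null = 0.0                         # H0: beta1 = 0
t_act = (beta1_hat - beta1_null) / se_beta1
p_value = 2 * norm.cdf(-abs(t_act))      # p-value = 2 * Phi(-|t_act|)

print(t_act, p_value, p_value < 0.05, abs(t_act) > 1.96)
```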

One-Sided Alternatives

H0 : β1 = β1,0 vs. H1 : β1 > β1,0

$$\text{p-value} = \Pr_{H_0}\!\left(\frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)} > \frac{\hat{\beta}_1^{act} - \beta_{1,0}}{SE(\hat{\beta}_1)}\right) = \Pr_{H_0}\!\left(Z > t^{act}\right) = 1 - \Phi(t^{act})$$

Reject H0 at the 5% significance level if p-value < 0.05 or t^act > 1.64.

H0 : β1 = β1,0 vs. H1 : β1 < β1,0

$$\text{p-value} = \Pr_{H_0}\!\left(\frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)} < \frac{\hat{\beta}_1^{act} - \beta_{1,0}}{SE(\hat{\beta}_1)}\right) = \Pr_{H_0}\!\left(Z < t^{act}\right) = \Phi(t^{act})$$

Reject H0 at the 5% significance level if p-value < 0.05 or t^act < −1.64.
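
A small sketch of these two tail probabilities for a hypothetical computed t-statistic (the value 2.1 is invented for illustration):

```python
from scipy.stats import norm

t_act = 2.1                    # hypothetical computed t-statistic
p_right = 1 - norm.cdf(t_act)  # H1: beta1 > beta1,0
p_left = norm.cdf(t_act)       # H1: beta1 < beta1,0
print(p_right, p_left)         # reject at the 5% level if the relevant p < 0.05
```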


In practice, one-sided alternative hypotheses should be used only when there is
a clear reason for doing so. This reason could come from economic theory, prior
empirical evidence, or both. However, even if it initially seems that the relevant
alternative is one-sided, upon reflection this might not necessarily be so. In
practice, such ambiguity often leads econometricians to use two-sided tests.
Confidence Intervals
Hypothesis tests are useful if you have a specific null hypothesis in mind (0 in
many cases). Being able to accept or reject this null hypothesis based on the
statistical evidence provides a powerful tool for coping with the uncertainty
inherent in using a sample to learn about the population. Yet, there are many
times that no single hypothesis about a regression coefficient is dominant, and,
due to sampling uncertainty, we cannot determine the true value of β1 exactly
from a sample of data. Hence, one instead would like to know a range of values
of the coefficient that are consistent with the data. This calls for constructing a
confidence interval.

Confidence Interval for β1

The 95% confidence interval is the set of all values of β1 that cannot be
rejected at the 5% significance level by a two-sided hypothesis test.

Equivalently, the 95% confidence interval is an interval that has a 95% probability of containing the true value of β1; that is, in 95% of all possible samples that might be drawn, it will contain the true value of β1.

95% confidence interval for β1 = [β̂1 − 1.96SE(β̂1 ), β̂1 + 1.96SE(β̂1 )]
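
As a small sketch, the interval is computed directly from this formula (the estimate and standard error below are invented placeholders, not results from the text):

```python
# Hypothetical regression output (illustrative numbers only).
beta1_hat = 0.52
se_beta1 = 0.08

ci = (beta1_hat - 1.96 * se_beta1, beta1_hat + 1.96 * se_beta1)
print(ci)  # 95% confidence interval for beta1
```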

Regression When X Is a Binary Variable
When the regressor is binary, that is, when it takes on only two values (0 and 1),
it is called an indicator variable or sometimes a dummy variable.
For example, X might be a worker’s gender (= 1 if female, = 0 if male), or
whether the district’s class size is small or large (= 1 if small, = 0 if large).

The mechanics of regression with a binary regressor are the same as if it were continuous. The interpretation of β1, however, is different, and it turns out that regression with a binary variable is equivalent to performing a difference-of-means analysis.

Regression When X Is a Binary Variable

The population regression model with Di as the regressor is

Yi = β0 + β1 Di + ui , E(ui |Di ) = 0

Because Di is not continuous, it is not useful to think of β1 as a slope; indeed, because Di can take on only two values, there is no “line,” so it makes no sense to talk about a slope. Thus, we will not refer to β1 as the slope; instead, we will simply refer to β1 as the coefficient on Di.

Interpretation of β1 When X Is a Binary Variable

If β1 is not a slope, what is it? The best way to interpret β0 and β1 in a regression with a binary regressor is to consider, one at a time, the two possible cases, Di = 0 and Di = 1:

Yi = β0 + ui when Di = 0
Yi = β0 + β1 + ui when Di = 1

Because E(ui |Di ) = 0,

E(Yi |Di = 0) (= the population mean of Yi when Di = 0) = β0


E(Yi |Di = 1) (= the population mean of Yi when Di = 1) = β0 + β1

Since β1 = E(Yi |Di = 1) − E(Yi |Di = 0), β1 is the difference in the population
means of Yi in the two groups, one with Di = 1 and the other with Di = 0.

β̂1 is the difference between the sample averages of Yi in the two groups

Because β1 is the difference in the population means, it makes sense that β̂1 is
the difference between the sample averages of Yi in the two groups, and, in fact,
this is the case:
$$\frac{\sum_{i=1}^{n} D_i Y_i}{\sum_{i=1}^{n} D_i} = \text{the sample average of } Y_i \text{ in the group with } D_i = 1$$

$$\frac{\sum_{i=1}^{n} (1 - D_i) Y_i}{\sum_{i=1}^{n} (1 - D_i)} = \text{the sample average of } Y_i \text{ in the group with } D_i = 0$$

We want to confirm that

$$\frac{\sum_{i=1}^{n} D_i Y_i}{\sum_{i=1}^{n} D_i} - \frac{\sum_{i=1}^{n} (1 - D_i) Y_i}{\sum_{i=1}^{n} (1 - D_i)} = \hat{\beta}_1 \left(= \frac{\sum_{i=1}^{n} (D_i - \bar{D})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (D_i - \bar{D})^2}\right)$$
β̂1 is the difference between the sample averages of Yi in the two groups

According to the formula for the OLS estimator, β̂1 equals

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (D_i - \bar{D})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (D_i - \bar{D})^2} = \frac{\sum_{i=1}^{n} D_i Y_i - 2n\bar{D}\bar{Y} + n\bar{D}\bar{Y}}{\sum_{i=1}^{n} D_i^2 - 2n\bar{D}^2 + n\bar{D}^2} = \frac{\sum_{i=1}^{n} D_i Y_i - n\bar{D}\bar{Y}}{\sum_{i=1}^{n} D_i^2 - n\bar{D}^2}$$

where the numerator can be written as

$$\sum_{i=1}^{n} D_i Y_i - n\bar{D}\bar{Y} = \sum_{i=1}^{n} D_i Y_i - \bar{D}\sum_{i=1}^{n} Y_i = (1 - \bar{D})\sum_{i=1}^{n} D_i Y_i - \bar{D}\sum_{i=1}^{n} (1 - D_i) Y_i$$

and the denominator can be written as (using D²ᵢ = Di, since Di is binary)

$$\sum_{i=1}^{n} D_i^2 - n\bar{D}^2 = \sum_{i=1}^{n} D_i - n\bar{D}^2 = n\bar{D} - n\bar{D}^2 = n\bar{D}(1 - \bar{D})$$

which shows that

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} D_i Y_i}{n\bar{D}} - \frac{\sum_{i=1}^{n} (1 - D_i) Y_i}{n - n\bar{D}} = \frac{\sum_{i=1}^{n} D_i Y_i}{\sum_{i=1}^{n} D_i} - \frac{\sum_{i=1}^{n} (1 - D_i) Y_i}{\sum_{i=1}^{n} (1 - D_i)}$$
β̂1 is the difference between the sample averages of Yi in the two groups

We can also see that

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{D} = \frac{\sum_{i=1}^{n} Y_i}{n} - \left(\frac{\sum_{i=1}^{n} D_i Y_i}{\sum_{i=1}^{n} D_i} - \frac{\sum_{i=1}^{n} (1 - D_i) Y_i}{\sum_{i=1}^{n} (1 - D_i)}\right) \frac{\sum_{i=1}^{n} D_i}{n}$$

$$= \frac{\sum_{i=1}^{n} Y_i}{n} - \frac{\sum_{i=1}^{n} D_i Y_i}{n} - \frac{\sum_{i=1}^{n} (1 - D_i) Y_i}{\sum_{i=1}^{n} (1 - D_i)} \underbrace{\left(1 - \frac{\sum_{i=1}^{n} D_i}{n}\right)}_{=\,\frac{\sum_{i=1}^{n} (1 - D_i)}{n}} + \frac{\sum_{i=1}^{n} (1 - D_i) Y_i}{\sum_{i=1}^{n} (1 - D_i)}$$

$$= \frac{\sum_{i=1}^{n} Y_i}{n} - \frac{\sum_{i=1}^{n} D_i Y_i}{n} - \frac{\sum_{i=1}^{n} (1 - D_i) Y_i}{n} + \frac{\sum_{i=1}^{n} (1 - D_i) Y_i}{\sum_{i=1}^{n} (1 - D_i)} = \frac{\sum_{i=1}^{n} (1 - D_i) Y_i}{\sum_{i=1}^{n} (1 - D_i)}$$

That is, β̂0 is the sample average of Yi in the group with Di = 0.
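
The algebra above is easy to verify numerically. The following Python sketch (simulated data; all names are ours) checks that the OLS slope equals the difference in the group sample averages, and that the OLS intercept equals the sample average of the Di = 0 group:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
d = rng.integers(0, 2, size=n).astype(float)  # binary regressor
y = 2.0 + 1.5 * d + rng.normal(size=n)

# OLS slope and intercept from the usual formulas
beta1_hat = np.sum((d - d.mean()) * (y - y.mean())) / np.sum((d - d.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * d.mean()

# Difference in sample averages, and the D=0 group average
diff_means = y[d == 1].mean() - y[d == 0].mean()
mean_d0 = y[d == 0].mean()

print(np.isclose(beta1_hat, diff_means))  # True: beta1_hat = difference in means
print(np.isclose(beta0_hat, mean_d0))     # True: beta0_hat = mean of D=0 group
```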
Hypothesis tests: Are the two population means the same?

Is the difference in the population means in the two groups statistically significantly different from 0 at the 5% level? If the two population means are the same, then β1 is 0. Thus the null hypothesis that the two population means are the same can be tested against the alternative hypothesis that they differ by testing the null hypothesis β1 = 0 against the alternative β1 ≠ 0.
Units of Measurement
The Effects of Changing Units of Measurement on OLS Estimators

Let β̂0 and β̂1 be the OLS estimators in the regression model Yi = β0 + β1 Xi + ui .

Let β̃0 and β̃1 be the OLS estimators in the regression of a1 Yi on a2 Xi, where a1 and a2 are constants with a2 ≠ 0. Observe that

$$a_1 Y_i = a_1 \beta_0 + \frac{a_1}{a_2}\beta_1 (a_2 X_i) + a_1 u_i$$

Hence, β̃0 = a1 β̂0 and β̃1 = (a1/a2) β̂1 with σ̂β̃1 = |a1/a2| σ̂β̂1.

Let β̃0 and β̃1 be the OLS estimators in the regression of Yi + c1 on Xi + c2, where c1 and c2 are constants. Observe that

$$Y_i + c_1 = (\beta_0 + c_1 - c_2 \beta_1) + \beta_1 (X_i + c_2) + u_i$$

Hence, β̃0 = β̂0 + c1 − c2 β̂1 and β̃1 = β̂1 with σ̂β̃1 = σ̂β̂1.


The R2 is invariant to changes in the units of measurement of the independent
or the dependent variable, which can be seen from its formula.
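
A quick numerical check of these facts, using simulated data and arbitrarily chosen constants a1, a2, c1, c2 (a minimal sketch, not a general proof):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

def ols(y, x):
    """Return (intercept, slope, R^2) from a simple OLS regression."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    resid = y - b0 - b1 * x
    r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return b0, b1, r2

b0, b1, r2 = ols(y, x)

a1, a2 = 100.0, 10.0  # rescale Y and X
tb0, tb1, tr2 = ols(a1 * y, a2 * x)
print(np.isclose(tb0, a1 * b0), np.isclose(tb1, (a1 / a2) * b1), np.isclose(tr2, r2))

c1, c2 = 5.0, -3.0    # shift Y and X
sb0, sb1, sr2 = ols(y + c1, x + c2)
print(np.isclose(sb0, b0 + c1 - c2 * b1), np.isclose(sb1, b1), np.isclose(sr2, r2))
```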
Heteroskedasticity and Homoskedasticity
Our only assumption about the distribution of ui conditional on Xi is that it
has a conditional mean of 0 (Assumption A1). If, furthermore, the variance of
this conditional distribution, Var(ui |Xi ), is constant for i = 1, . . . , n and does
not depend on Xi , then the errors are said to be homoskedastic. Otherwise,
the error term is heteroskedastic. In this sense, homoskedasticity is the special
case and heteroskedasticity the general one.

Because we have placed no restrictions on the conditional variance, all of the theory of OLS developed so far remains valid whether the errors are homoskedastic or heteroskedastic: the OLS estimator is unbiased, consistent, and asymptotically normal.

What Does This Mean in Practice?

If the error term is homoskedastic, then the formulas for the variances of the
OLS estimators simplify. Consequently, if the errors are homoskedastic, then
there is a specialized formula that can be used for the standard errors, and it is
called the homoskedasticity-only standard error. Because this alternative
formula is derived for the special case that the errors are homoskedastic, it does
not apply if the errors are heteroskedastic. If the errors are heteroskedastic,
then the homoskedasticity-only standard error is inappropriate. Specifically, if
the errors are heteroskedastic, then the t-statistic computed using the
homoskedasticity-only standard error does not have a standard normal
distribution, even in large samples. In contrast, σ̂β̂1 produces valid statistical
inferences whether the errors are heteroskedastic or homoskedastic; it is
therefore called the heteroskedasticity-robust standard error.

At a general level, there is rarely any reason to believe that the errors are
homoskedastic. Therefore, it is always prudent to assume that the errors might
be heteroskedastic and to always use heteroskedasticity-robust standard errors.
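
As an illustration, the sketch below (simulated heteroskedastic data; names are ours) computes both kinds of standard error in statsmodels, where "HC1" is one common robust covariance option:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=n)
u = rng.normal(size=n) * np.exp(0.5 * x)  # error variance depends on x
y = 1.0 + 0.5 * x + u

X = sm.add_constant(x)
classic = sm.OLS(y, X).fit()               # homoskedasticity-only SEs
robust = sm.OLS(y, X).fit(cov_type="HC1")  # heteroskedasticity-robust SEs

print(classic.bse[1], robust.bse[1])  # here the two differ; robust is the safe choice
```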
Mathematical Implications of Homoskedasticity
Under homoskedasticity, the conditional variance of ui given Xi is a constant,
i.e., Var(ui|Xi) = Var(ui) = σ²u, and the asymptotic variance of β̂1 simplifies.
First, notice that the numerator of σ²β̂1 can be written as

$$\mathrm{Var}[(X_i - \mu_X)u_i] = E\big[\big((X_i - \mu_X)u_i - E[(X_i - \mu_X)u_i]\big)^2\big]$$
$$= E\big[((X_i - \mu_X)u_i)^2\big] = E\big[(X_i - \mu_X)^2 u_i^2\big] = E\big[E[(X_i - \mu_X)^2 u_i^2 \mid X_i]\big]$$
$$= E\big[(X_i - \mu_X)^2 E[u_i^2 \mid X_i]\big] = E\big[(X_i - \mu_X)^2 \mathrm{Var}[u_i \mid X_i]\big]$$
$$= E\big[(X_i - \mu_X)^2 \sigma_u^2\big] = \sigma_u^2 E\big[(X_i - \mu_X)^2\big] = \sigma_u^2 \mathrm{Var}(X_i)$$

where the equalities use E[(Xi − µX)ui] = 0 and E[ui|Xi] = 0. Then,

$$\sigma^2_{\hat{\beta}_1} = \frac{1}{n}\,\frac{\mathrm{Var}[(X_i - \mu_X)u_i]}{[\mathrm{Var}(X_i)]^2} = \frac{1}{n}\,\frac{\sigma_u^2\,\mathrm{Var}(X_i)}{[\mathrm{Var}(X_i)]^2} = \frac{1}{n}\,\frac{\mathrm{Var}(u_i)}{\mathrm{Var}(X_i)}$$
The homoskedasticity-only standard error of β̂1 is σ̃β̂1, where

$$\tilde{\sigma}^2_{\hat{\beta}_1} = \frac{1}{n}\,\frac{\frac{1}{n-2}\sum_{i=1}^{n} \hat{u}_i^2}{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
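
Both variance estimators can be computed directly from the formulas above; this numpy sketch (simulated data, names ours) implements σ̂²β̂1 and σ̃²β̂1 exactly as written:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

# OLS fit and residuals
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - b0 - b1 * x

# Heteroskedasticity-robust variance estimator of beta1_hat
num = np.sum((x - x.mean()) ** 2 * u_hat ** 2) / (n - 2)
den = (np.sum((x - x.mean()) ** 2) / (n - 1)) ** 2
var_robust = num / den / n

# Homoskedasticity-only variance estimator of beta1_hat
num_h = np.sum(u_hat ** 2) / (n - 2)
den_h = np.sum((x - x.mean()) ** 2) / (n - 1)
var_homosk = num_h / den_h / n

print(np.sqrt(var_robust), np.sqrt(var_homosk))  # the two standard errors
```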
Using the t-Statistic in Regression When the Sample Size Is Small

When the sample size is small, the exact distribution of the t-statistic is
complicated and depends on the unknown population distribution of the data.
If, however, in addition to the three least squares assumptions, the regression
errors are homoskedastic and normally distributed, then the OLS estimator is
normally distributed and the homoskedasticity-only t-statistic has a Student t
distribution with n − 2 degrees of freedom.

In econometric applications, however, there is rarely a reason to believe that the errors are homoskedastic and normally distributed.
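
For reference, a short sketch comparing the 5% two-sided critical values of the t distribution with n − 2 degrees of freedom to the standard normal value 1.96, at a few sample sizes:

```python
from scipy.stats import norm, t

for n in (10, 30, 100):
    # two-sided 5% critical values: t with n-2 df vs. standard normal
    print(n, t.ppf(0.975, df=n - 2), norm.ppf(0.975))
```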
