
ECONS303: Applied Quantitative Research Methods

Lecture set 4: Hypothesis Tests and


Confidence Intervals in Multiple
Regression

The elements of the model stand, in the model, for the objects. The model consists in the fact that its
elements are combined with one another in a definite way.
- Choy, Keen Meng (2020) Tractatus Modellus-Philosophicus, mimeo
Outline
1. Hypothesis tests and confidence intervals for one coefficient
2. Joint hypothesis tests on multiple coefficients
3. Other types of hypotheses involving multiple coefficients
4. Confidence sets for multiple coefficients
5. Model specification: how to decide which variables to include
in a regression model
Hypothesis Tests and Confidence Intervals for a
Single Coefficient (SW Section 7.1)
• Hypothesis tests and confidence intervals for a single coefficient
in multiple regression follow the same logic and recipe as for
the slope coefficient in a single-regressor model.

• $\dfrac{\hat\beta_1 - E(\hat\beta_1)}{\sqrt{\operatorname{var}(\hat\beta_1)}}$ is approximately distributed N(0,1) (CLT).

• Thus hypotheses on β1 can be tested using the usual t-statistic,
and confidence intervals are constructed as {β̂1 ± 1.96 × SE(β̂1)}.

• So too for β2,…, βk.


Example: The California class size data
1. TestScore  698.9  2.28  STR
(10.4) (0.52)

2. TestScore  686.0  1.10  STR  0.650 PctEL


(8.7) (0.43) (0.031)

• The coefficient on STR in (2) is the effect on test scores of a unit change in
STR, holding constant the percentage of English Learners in the district
• The coefficient on STR falls by about one-half (from –2.28 to –1.10)
• The 95% confidence interval for coefficient on STR in (2) is {–1.10 ± 1.96 ×
0.43} = (–1.95, – 0.26)
• The t-statistic testing βSTR = 0 is t = –1.10/0.43 = –2.54, so we reject the
hypothesis at the 5% significance level
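
A minimal Stata sketch of how regression (2) and its confidence interval can be reproduced, assuming the California class size data with variables testscr, str, and pctel are loaded:

regress testscr str pctel, robust
* The row for str should show a coefficient of about -1.10, a robust SE of about 0.43,
* and a 95% confidence interval of roughly (-1.95, -0.26), matching the numbers above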
Tests of Joint Hypotheses (SW Section 7.2)
(1 of 2)

Let Expn = expenditures per pupil and consider the population


regression model:

TestScorei = β0 + β1STRi + β2Expni + β3PctELi + ui

The null hypothesis that “school resources don’t matter,” and the
alternative that they do, corresponds to:

H0: β1 = 0 and β2 = 0

vs. H1: either β1 ≠ 0 or β2 ≠ 0 or both


Tests of Joint Hypotheses (SW Section 7.2)
(2 of 2)
H0: β1 = 0 and β2 = 0 vs. H1: either β1 ≠ 0 or β2 ≠ 0, or both   (7.7)

• A joint hypothesis specifies a value for two or more coefficients,


that is, it imposes a restriction on the value of two or more
coefficients.
• As a matter of terminology, we say that equation (7.7) imposes two
restrictions on the multiple regression model: β1 = 0 and β2 = 0
• In general, a joint hypothesis will involve q restrictions. In the
example above, q = 2.
Hypothesis Tests for Individual Coefficient Estimates
What happens when you use the "one-at-a-time" testing procedure?

Reject the joint null hypothesis H0: β1 = 0 and β2 = 0 if |t1| > 1.96 and/or |t2| > 1.96.

A "common sense" idea is to reject if either of the individual t-statistics
exceeds 1.96 in absolute value.
But this “one at a time” test isn’t valid: the resulting test rejects too often
under the null hypothesis (more than 5%)!
Hypothesis Tests for Individual Coefficient Estimates
Case 1
Assume the t-statistics are uncorrelated and thus independent in large samples. Then
we can calculate the rejection probability of this "one-at-a-time" test exactly.
Because the t-statistics are independent, the probability of NOT rejecting is
Pr(|t1| ≤ 1.96 and |t2| ≤ 1.96) = Pr(|t1| ≤ 1.96) × Pr(|t2| ≤ 1.96)
= 0.95 × 0.95 = 0.9025
Hence, the probability of rejecting the null hypothesis when it is true is 1 − 0.9025 = 0.0975,
or 9.75% – nearly twice the desired 5% significance level.
Case 2
If the regressors are correlated, then the size of the “one at a time” testing approach
depends on the value of the correlation between the regressors, so the situation is even
more complicated.

To resolve this, one approach is to use the so-called Bonferroni method
(see SW 4th edition, Appendix 7.1). Another approach to testing joint
hypotheses, which is more powerful and more commonly used, is based on the
F-statistic.
The F-statistic
The F-statistic tests all parts of a joint hypothesis at once.

Formula for the special case of the joint hypothesis β1 = β1,0 and β2 = β2,0 in a
regression with two regressors:

$$F = \frac{1}{2}\left(\frac{t_1^2 + t_2^2 - 2\,\hat\rho_{t_1,t_2}\,t_1 t_2}{1 - \hat\rho_{t_1,t_2}^2}\right) \;\sim\; F_{2,\infty} \text{ distribution}$$

where $\hat\rho_{t_1,t_2}$ estimates the correlation between t1 and t2.

In general, the t-statistics are correlated so the formula adjusts for this
correlation.

This adjustment is made so that under the null hypothesis, the F-statistic has an
F2, ∞ distribution whether or not the t-statistics are correlated.

Note: The formula for more than two β’s is nasty unless you use matrix algebra.
If the F-statistic is computed using the general heteroskedasticity-robust SE
formula, its large-n distribution under the null hypothesis is the F2, ∞
distribution, regardless of whether the errors are homoskedastic or
heteroskedastic.
The p-value of the F-statistic can be computed using the large-sample F2, ∞
approximation to its distribution.
The p-value can be evaluated using a table of the F2, ∞ distribution or a table of the
χ²_2 distribution, because a χ²_q-distributed random variable is q times an
F_q,∞-distributed random variable (here q = 2).
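
A quick Stata sketch of these p-value calculations, using the heteroskedasticity-robust F of 5.43 with q = 2 reported in the output below (the 1e8 denominator degrees of freedom stand in for infinity):

display Ftail(2, 1e8, 5.43)      // large-sample F(2, infinity) approximation
display chi2tail(2, 2*5.43)      // equivalent chi-squared(2) calculation, since chi2 = q x F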

Implementation in STATA
Use the “test” command after the regression
Example: Test the joint hypothesis that the population coefficients on STR and
expenditures per pupil (expn_stu) are both zero, against the alternative that at least
one of the population coefficients is nonzero.
F-test example, California class size data:
reg testscr str expn_stu pctel, r;
Regression with robust standard errors Number of obs = 420
F( 3, 416) = 147.20
Prob > F = 0.0000
R-squared = 0.4366
Root MSE = 14.353
------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -.2863992 .4820728 -0.59 0.553 -1.234001 .661203
expn_stu | .0038679 .0015807 2.45 0.015 .0007607 .0069751
pctel | -.6560227 .0317844 -20.64 0.000 -.7185008 -.5935446
_cons | 649.5779 15.45834 42.02 0.000 619.1917 679.9641
------------------------------------------------------------------------------
NOTE
test str expn_stu;            The test command follows the regression
( 1) str = 0.0                There are q = 2 restrictions being tested
( 2) expn_stu = 0.0
F( 2, 416) = 5.43             The 5% critical value for q = 2 is 3.00
Prob > F = 0.0047             Stata computes the p-value for you
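
The critical value quoted in the note can be verified directly in Stata (a sketch; again 1e8 stands in for an infinite denominator degrees of freedom):

display invFtail(2, 1e8, 0.05)    // about 3.00: the 5% critical value of F(2, infinity)
display invFtail(2, 416, 0.05)    // about 3.02: the finite-sample F(2, 416) critical value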
Model specification: How to decide what
variables to include in a regression (Section
7.5) (1 of 2)
1. Identify the variable of interest
2. Think of the omitted causal effects that could result in omitted
variable bias
3. Include those omitted causal effects if you can or, if you can’t,
include variables correlated with them that serve as control
variables.
– The control variables are effective if the conditional mean
independence assumption plausibly holds, that is, if u is uncorrelated
with STR once the control variables are included.
– This results in a “base” or “benchmark” model.
Model specification: How to decide what
variables to include in a regression (Section
7.5) (2 of 2)
4. Also specify a range of plausible alternative models, which
include additional candidate variables.
5. Estimate your base model and plausible alternative
specifications (“sensitivity checks”).
– Does a candidate variable change the coefficient of interest (β1)?
– Is a candidate variable statistically significant?
– Use judgment, not a mechanical recipe…
– Don’t just try to maximize R2!
Digression about measures of fit…
It is easy to fall into the trap of maximizing R² and R̄², but this loses
sight of our real objective, an unbiased estimator of the class size effect.
• A high R² (or R̄²) means that the regressors explain the variation in Y.

• A high R² (or R̄²) does not mean that you have eliminated omitted
variable bias.

• A high R² (or R̄²) does not mean that you have an unbiased estimator
of a causal effect (β1).

• A high R² (or R̄²) does not mean that the included variables are
statistically significant – this must be determined using hypothesis tests.
Analysis of the Test Score Data Set
(SW Section 7.6) (1 of 3)
1. Identify the variable of interest:
STR
2. Think of the omitted causal effects that could result in omitted
variable bias
Whether the students know English; outside learning
opportunities; parental involvement; teacher quality (if teacher
salary is correlated with district wealth) – there is a long list!
Analysis of the Test Score Data Set
(SW Section 7.6) (2 of 3)
3. Include those omitted causal effects if you can or, if you can’t,
include variables correlated with them that serve as control
variables.
– The control variables are effective if the conditional mean
independence (CMI) assumption plausibly holds (if u is
uncorrelated with STR once the control variables are included).
This results in a "base" or "benchmark" model.
Many of the omitted causal variables are hard to measure, so
we need to find control variables. These include PctEL (both a
control variable and an omitted causal factor) and measures of
district wealth.
Analysis of the Test Score Data Set
(SW Section 7.6) (3 of 3)
4. Also specify a range of plausible alternative models, which
include additional candidate variables.
It isn’t clear which of the income-related variables will best
control for the many omitted causal factors such as outside
learning opportunities, so the alternative specifications include
regressions with different income variables. The alternative
specifications considered here are just a starting point, not the
final word!
5. Estimate your base model and plausible alternative
specifications (“sensitivity checks”).
Presentation of regression results (1 of 2)
• We have a number of regressions and we want to report them.
It is awkward and difficult to read regressions written out in
equation form, so instead it is conventional to report them in a
table.
• A table of regression results should include:
– estimated regression coefficients
– standard errors
– measures of fit
– number of observations
– relevant F-statistics, if any
– any other pertinent information, such as confidence intervals for the
causal effect of interest
• Find this information in the following table!
Presentation of regression results (2 of 2)
Summary: testing joint hypotheses
• The “one at a time” approach of rejecting if either of the t-
statistics exceeds 1.96 rejects more than 5% of the time under the
null (the size exceeds the desired significance level)
• The heteroskedasticity-robust F-statistic is built into STATA
(the "test" command); this tests all q restrictions at once.

Note: The homoskedasticity-only F-statistic is important historically (and thus
in practice) and can help intuition, but it isn't valid when there is
heteroskedasticity (refer to the Appendix in this set of slides to guide your reading
of the SW 4th edition textbook).
Summary: Multiple Regression
• Multiple regression allows you to estimate the effect on Y of a
change in X1, holding other included variables constant.
• If you can measure a variable, you can avoid omitted variable
bias from that variable by including it.
• If you can’t measure the omitted variable, you still might be able
to control for its effect by including a control variable.
• There is no simple recipe for deciding which variables belong in
a regression – you must exercise judgment.
• One approach is to specify a base model – relying on a-priori
reasoning – then explore the sensitivity of the key estimate(s) in
alternative specifications.
APPENDIX
More on F-statistics.
There is a simple formula for the F-statistic that holds only under
homoskedasticity (so it isn’t very useful) but which nevertheless
might help you understand what the F-statistic is doing.

The homoskedasticity-only F-statistic (when errors are


homoskedastic)
• Run two regressions, one under the null hypothesis (the
“restricted” regression) and one under the alternative hypothesis
(the “unrestricted” regression).
• Compare the fits of the regressions – the R2s – if the
“unrestricted” model fits sufficiently better, reject the null
The “restricted” and “unrestricted” regressions

Example: are the coefficients on STR and Expn zero?


Unrestricted population regression (under H1):
TestScorei = β0 + β1STRi + β2Expni + β3PctELi + ui
Restricted population regression (that is, under H0):
TestScorei = β0 + β3PctELi + ui
• The number of restrictions under H0 is q = 2
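
A minimal Stata sketch of this restricted-versus-unrestricted comparison, assuming the California data (testscr, str, expn_stu, pctel) are in memory; it implements the R-squared formula given on the next slide:

quietly regress testscr pctel                   // restricted regression (under H0)
scalar r2_r = e(r2)
quietly regress testscr str expn_stu pctel      // unrestricted regression (under H1)
scalar r2_u = e(r2)
scalar F_homosk = ((r2_u - r2_r)/2) / ((1 - r2_u)/(e(N) - 3 - 1))
display F_homosk                                // should be about 8.01 in these data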
Simple formula for the homoskedasticity-only
F-statistic:
$$F = \frac{(R^2_{\text{unrestricted}} - R^2_{\text{restricted}})/q}{(1 - R^2_{\text{unrestricted}})/(n - k_{\text{unrestricted}} - 1)}$$

where:
R²_restricted   = the R² for the restricted regression
R²_unrestricted = the R² for the unrestricted regression
q               = the number of restrictions under the null
k_unrestricted  = the number of regressors in the unrestricted regression.

• The bigger the difference between the restricted and unrestricted


R2s – the greater the improvement in fit by adding the variables in
question – the larger is the homoskedasticity-only F.
Example:
Restricted regression:
$\widehat{TestScore}$ = 644.7 − 0.671 PctEL,   R²_restricted = 0.4149
                        (1.0)   (0.032)

Unrestricted regression:
$\widehat{TestScore}$ = 649.6 − 0.29 STR + 3.87 Expn − 0.656 PctEL
                        (15.5)  (0.48)    (1.59)      (0.032)
R²_unrestricted = 0.4366,  k_unrestricted = 3,  q = 2

So F = [(R²_unrestricted − R²_restricted)/q] / [(1 − R²_unrestricted)/(n − k_unrestricted − 1)]
     = [(0.4366 − 0.4149)/2] / [(1 − 0.4366)/(420 − 3 − 1)] = 8.01

Note: Heteroskedasticity-robust F = 5.43…
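
The same number can be obtained from Stata's test command after a regression estimated without the robust option (a sketch; with homoskedasticity-only standard errors the Wald F coincides with the R-squared formula above):

quietly regress testscr str expn_stu pctel
test str expn_stu                // should report F(2, 416) of about 8.01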


The homoskedasticity-only F-statistic – summary
$$F = \frac{(R^2_{\text{unrestricted}} - R^2_{\text{restricted}})/q}{(1 - R^2_{\text{unrestricted}})/(n - k_{\text{unrestricted}} - 1)}$$

• The homoskedasticity-only F-statistic rejects when adding the
two variables increased the R² by "enough" – that is, when adding
the two variables improves the fit of the regression by "enough"

• If the errors are homoskedastic, then the homoskedasticity-only
F-statistic has a large-sample distribution that is χ²_q / q.

• But if the errors are heteroskedastic, the large-sample distribution
of the homoskedasticity-only F-statistic is not χ²_q / q.
The F distribution
Your regression printouts might refer to the “F ” distribution.
If the four multiple regression LS assumptions hold and if:
5. ui is homoskedastic, that is, var(u|X1,…, Xk) does not depend
on X ’s
6. u1,…,un are normally distributed, then the homoskedasticity-
only F-statistic has the Fq,n–k–1 distribution, where q = the
number of restrictions and k = the number of regressors under
the alternative (the unrestricted model).
The Fq,n–k–1 distribution (1 of 2)
• The F distribution is tabulated many places
• As n → ∞, the Fq,n–k–1 distribution asymptotes to the χ²_q / q distribution:

• The Fq,∞ and χ²_q / q distributions are the same.

• For q not too big and n ≥ 100, the Fq,n–k–1 distribution and the χ²_q / q
distribution are essentially identical (illustrated in the sketch at the end of this slide).

• Many regression packages (including STATA) compute p-values


of F-statistics using the F distribution

• You will encounter the F distribution in published empirical work.
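
A small Stata sketch illustrating how close the two distributions are (the 1e8 denominator degrees of freedom stand in for infinity):

display invFtail(2, 100 - 3 - 1, 0.05)   // F(2, 96) 5% critical value, about 3.09
display invFtail(2, 1e8, 0.05)           // F(2, infinity) 5% critical value, about 3.00
display invchi2tail(2, 0.05)/2           // chi-squared(2)/2 5% critical value, about 3.00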


Another digression: A little history of
statistics… (1 of 2)
• The theory of the homoskedasticity-only F-statistic and the
Fq,n–k–1 distributions rests on implausibly strong assumptions
(are earnings normally distributed?)
• These statistics date to the early 20th century… the days
when data sets were small and computers were people…
• The F-statistic and Fq,n–k–1 distribution were major
breakthroughs: an easily computed formula; a single set of
tables that could be published once, then applied in many
settings; and a precise, mathematically elegant justification.
Another digression: A little history of
statistics… (2 of 2)
• The strong assumptions were a minor price for this breakthrough.
• But with modern computers and large samples we can use the
heteroskedasticity-robust F-statistic and the Fq,∞ distribution,
which only require the four least squares assumptions (not
assumptions #5 and #6)
• This historical legacy persists in modern software, in which
homoskedasticity-only standard errors (and F-statistics) are the
default, and in which p-values are computed using the Fq,n–k–1
distribution.
Summary: the homoskedasticity-only
F-statistic and the F distribution
• These are justified only under very strong conditions – stronger
than are realistic in practice.
• You should use the heteroskedasticity-robust F-statistic, with χ²_q / q
(that is, Fq,∞) critical values.

• For n ≥ 100, the Fq,n–k–1 distribution essentially is the χ²_q / q distribution.

• For small n, sometimes researchers use the Fq,n–k–1 distribution
because it has larger critical values and in this sense is more
conservative.
Testing Single Restrictions on Multiple
Coefficients (SW Section 7.3) (1 of 2)
Yi = β0 + β1X1i + β2X2i + ui, i = 1,…,n
Consider the null and alternative hypothesis,
H0: β1 = β2 vs. H1: β1 ≠ β2
This null imposes a single restriction (q = 1) on multiple
coefficients – it is not a joint hypothesis with multiple restrictions
(compare with β1 = 0 and β2 = 0).
Testing Single Restrictions on Multiple
Coefficients (SW Section 7.3) (2 of 2)
Here are two methods for testing single restrictions on multiple
coefficients:
1.Perform the test directly
Some software, including STATA, lets you test restrictions using
multiple coefficients directly.
2. Rearrange (“transform”) the regression
Rearrange the regressors so that the restriction becomes a
restriction on a single coefficient in an equivalent regression.
(See SW 4th edition, p. 258, Section 7.3; a sketch of the rearrangement follows below.)
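
A minimal sketch of this rearrangement for H0: β1 = β2, using hypothetical variables y, x1, x2: since β1X1 + β2X2 = (β1 − β2)X1 + β2(X1 + X2), defining W = X1 + X2 turns the restriction into a zero restriction on a single coefficient:

generate w = x1 + x2
regress y x1 w, robust
* The coefficient on x1 estimates (beta1 - beta2); its t-statistic tests H0: beta1 = beta2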
Method 1: Perform the test directly
Yi = β0 + β1X1i + β2X2i + ui
H0: β1 = β2 vs. H1: β1 ≠ β2
Example:
TestScorei = β0 + β1STRi + β2Expni + β3PctELi + ui
In STATA, to test β1 = β2 vs. β1 ≠ β2 (two-sided):
regress testscore str expn pctel, r
test str=expn

The details of implementing this method are software-specific.
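
An equivalent Stata route (a sketch using the same variable names as above) is the lincom command, which reports the estimated difference directly:

regress testscore str expn pctel, r
lincom str - expn
* lincom reports the estimate of (beta_STR - beta_Expn), its SE, t-statistic, and confidence
* interval; testing whether this difference is zero is the test of H0: beta1 = beta2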


Confidence Sets for Multiple Coefficients
(SW Section 7.4) (1 of 2)
Yi = β0 + β1X1i + β2X2i + … + βk Xki + ui, i = 1,…,n
What is a joint confidence set for β1 and β2?
A 95% joint confidence set is:
• A set-valued function of the data that contains the true
coefficient(s) in 95% of hypothetical repeated samples.
• Equivalently, the set of coefficient values that cannot be rejected
at the 5% significance level.
You can find a 95% confidence set as the set of (β1, β2) that cannot
be rejected at the 5% level using an F-test
Confidence Sets for Multiple Coefficients
(SW Section 7.4) (2 of 2)
• Let F(β1,0, β2,0) be the (heteroskedasticity-robust) F-statistic
testing the hypothesis that β1 = β1,0 and β2 = β2,0:
• 95% confidence set = {(β1,0, β2,0): F(β1,0, β2,0) < 3.00}
• 3.00 is the 5% critical value of the F2,∞ distribution
• This set has coverage rate 95% because the test on which it is
based (the test it “inverts”) has size of 5%
– 5% of the time, the test incorrectly rejects the null when the null is true,
so 95% of the time it does not; therefore the confidence set constructed as
the nonrejected values contains the true value 95% of the time (in 95% of
all samples).
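
A Stata sketch of this inversion idea for the class size example: a candidate point (β1,0, β2,0) is in the 95% set exactly when the joint test of those two values is not rejected at the 5% level (the numeric values below are purely illustrative):

regress testscr str expn_stu pctel, r
test (str = -0.5) (expn_stu = 0.004)
* If the reported p-value exceeds 0.05 (equivalently, F < about 3.00), the illustrative point
* (-0.5, 0.004) lies inside the 95% confidence set for (beta_STR, beta_Expn)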
The confidence set based on the F-statistic
is an ellipse:
 
$$\left\{\beta_1, \beta_2 :\; F = \frac{1}{2}\left(\frac{t_1^2 + t_2^2 - 2\,\hat\rho_{t_1,t_2}\,t_1 t_2}{1 - \hat\rho_{t_1,t_2}^2}\right) \le 3.00\right\}$$

Now

$$F = \frac{1}{2\left(1 - \hat\rho_{t_1,t_2}^2\right)}\left(t_1^2 + t_2^2 - 2\,\hat\rho_{t_1,t_2}\,t_1 t_2\right)$$

$$\;\;= \frac{1}{2\left(1 - \hat\rho_{t_1,t_2}^2\right)}\left[\left(\frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)}\right)^2 + \left(\frac{\hat\beta_2 - \beta_{2,0}}{SE(\hat\beta_2)}\right)^2 - 2\,\hat\rho_{t_1,t_2}\left(\frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)}\right)\left(\frac{\hat\beta_2 - \beta_{2,0}}{SE(\hat\beta_2)}\right)\right]$$
This is a quadratic form in β1,0 and β2,0 – thus the boundary of the set F = 3.00 is
an ellipse.
Confidence set based on inverting the
F-statistic
Figure 7.1: 95% Confidence Set for Coefficients on STR and Expn
from Equation (7.6)
