Econometrics I - Lecture 5 (Wooldridge)
ECONOMETRICS I
National Economics University
2019
MA Hai Duong
MULTIPLE REGRESSION: INFERENCE
RECAP.
In the multiple regression model
y = β0 + β1x1 + β2x2 + … + βkxk + u,
where E(u|x1, …, xk) = 0,
under fairly minimal assumptions the OLS estimator β̂j is an unbiased estimator of βj.
Under the assumptions that:
1. the population model is linear in parameters;
2. the conditional expectation of the true errors given all explanatory variables is zero;
3. the sample is randomly selected; and
4. the explanatory variables are not perfectly collinear,
we have E(β̂j) = βj for j = 0, 1, …, k.
If we add homoskedasticity (no conditional heteroskedasticity), Var(u|x1, …, xk) = σ², to the above assumptions, then OLS is the B.L.U.E. for the βj.
RECAP.
Under these assumptions, the variance-covariance matrix of β̂ conditional on X is Var(β̂|X) = σ²(X′X)⁻¹, which we estimate using σ̂²(X′X)⁻¹,
where σ̂² = SSR/(n − k − 1) = û′û/(n − k − 1) is the unbiased estimator of σ².
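As a minimal sketch of these formulas (not from the lecture; data are made up), for a simple bivariate regression the relevant diagonal element of σ̂²(X′X)⁻¹ reduces to σ̂²/SSTx, which can be computed with the standard library alone:

```python
# Sketch: estimating Var(beta_hat | X) for y = b0 + b1*x + u using only
# the standard library. The data below are made up purely for illustration.

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sst_x = sum((xi - xbar) ** 2 for xi in x)   # total sample variation in x

# OLS slope and intercept
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
b0 = ybar - b1 * xbar

# sigma^2 estimate: SSR / (n - k - 1), here k = 1
ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
sigma2_hat = ssr / (n - 2)

# For one regressor, the slope's diagonal element of sigma^2 (X'X)^{-1}
# reduces to sigma^2 / SST_x:
var_b1 = sigma2_hat / sst_x
se_b1 = var_b1 ** 0.5
print(b1, se_b1)
```

With more regressors the same quantity comes from the full matrix σ̂²(X′X)⁻¹; statistical packages report its square roots as the standard errors.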
SAMPLING DISTRIBUTIONS OF THE OLS ESTIMATORS
The assumptions we have made so far gave us
E(β̂j) = βj
and
Var(β̂j) = σ² / [SSTj(1 − Rj²)]
To test hypotheses, however, we need the full sampling distribution of β̂j, not just its mean and variance.
ASSUMPTION MLR.6 OR E.5 (NORMALITY)
The random sampling assumption implies that population errors are independent of each other.
The zero conditional expectation assumption implies that population errors have mean zero.
The homoskedasticity assumption implies that population errors have constant variance σ².
MLR.6 adds normality: u is independent of x1, …, xk and u ~ Normal(0, σ²).
Assumption MLR.6 can therefore be expressed as: conditional on the explanatory variables, population errors are i.i.d. Normal(0, σ²).
Similarly, E.5 can be expressed in matrix notation as u|X ~ Normal(0, σ²Iₙ).
Assumptions MLR.1 to MLR.6 are called the Classical Linear Model (CLM) assumptions.
THE CLASSICAL LINEAR MODEL
MLR.6: u ~ Normal(0, σ²), independent of the explanatory variables.
Equivalently, under the CLM assumptions, y|x ~ Normal(β0 + β1x1 + … + βkxk, σ²).
SAMPLING DISTRIBUTIONS OF THE OLS ESTIMATORS
The beauty of the Normal distribution is that any linear combination of Normal random variables is also Normally distributed.
Recall that β̂j is a linear function of the errors u1, …, un, with weights that depend only on the explanatory variables.
SAMPLING DISTRIBUTIONS OF THE OLS ESTIMATORS
That is, conditional on the explanatory variables, for j = 0, 1, …, k:
β̂j ~ Normal(βj, Var(β̂j)), and therefore (β̂j − βj)/sd(β̂j) ~ Normal(0, 1),
where sd(β̂j) = σ/[SSTj(1 − Rj²)]^(1/2).
TESTING HYPOTHESES ABOUT A SINGLE POPULATION PARAMETER
We cannot directly use the result (β̂j − βj)/sd(β̂j) ~ Normal(0, 1) to test hypotheses about βj, because sd(β̂j) depends on σ, which is unknown.
But we have σ̂ as an estimator of σ. Using this in place of σ gives us the standard error,
se(β̂j) = σ̂/[SSTj(1 − Rj²)]^(1/2).
But replacing σ (an unknown constant) with σ̂ (an estimator that varies across samples) takes us from the standard normal to the t distribution (Student's t distribution).
Under the CLM assumptions,
(β̂j − βj)/se(β̂j) ~ t(n − k − 1).
TESTING HYPOTHESES (CONT.)
The t distribution also has a bell shape, but is more spread out (has fatter tails) than the standard normal. It becomes more similar to the standard normal as its degrees of freedom increase.
TESTING HYPOTHESES (CONT.)
We use the result on the t distribution to test null hypotheses about a single βj.
Most routinely we use it to test whether xj, controlling for all other explanatory variables, has no partial effect on y:
H0: βj = 0, tested with the t statistic t = β̂j/se(β̂j).
TESTING HYPOTHESES (CONT.)
[Figure: the t(n − k − 1) density with the critical region in the tail(s); values of the t statistic outside the critical region lead us to fail to reject the null.]
TESTING HYPOTHESES (CONT.)
If t (the value of the t statistic in our sample) falls in the critical region, we reject the null hypothesis.
When we reject the null, we say that xj is statistically significant at the 100α% level.
When we fail to reject the null, we say that xj is not statistically significant at the 100α% level.
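The decision rule above can be sketched in a few lines. This is a minimal illustration, not lecture material: the estimate, standard error, and critical value are made-up / table-looked-up numbers.

```python
# Minimal sketch of a two-sided t test for H0: beta_j = 0.
# All numbers are hypothetical, for illustration only.

beta_hat = 0.092   # hypothetical OLS estimate of beta_j
se = 0.031         # hypothetical standard error se(beta_hat)
crit = 1.96        # ~5% two-sided critical value for large n - k - 1

t_stat = (beta_hat - 0.0) / se   # t = (estimate - hypothesized value) / se
reject = abs(t_stat) > crit      # two-sided decision rule

print(f"t = {t_stat:.2f}, reject H0 at 5%: {reject}")
```

For a one-sided alternative the same statistic is compared with a one-tailed critical value, without the absolute value.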
TESTING HYPOTHESES (CONT.)
We can also use a t test for the null hypothesis H0: βj = aj for any constant aj, with t = (β̂j − aj)/se(β̂j).
P-VALUE
Alternatively, the probability value (p-value, reported as "Prob." or "Sig.") can be used to make the decision in a hypothesis test.
P-value formulas, with T ~ t(n − k − 1) and t the sample value of the statistic:
Two-sided test: p-value = P(|T| > |t|)
Right-tailed test: p-value = P(T > t)
Left-tailed test: p-value = P(T < t)
Decision rule: p-value < α ⇒ reject the null; p-value ≥ α ⇒ do not reject the null.
These values are automatically calculated by Eviews (only for the two-sided test). In one-sided cases, we can use half of the reported p-value.
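To make P(|T| > |t|) concrete, here is a stdlib-only sketch that computes the two-sided p-value by numerically integrating the t density with Simpson's rule (in practice a package such as Eviews reports this automatically; the t value and df below are made up):

```python
import math

# Sketch: two-sided p-value P(|T| > |t|) for T ~ t(df), by numerical
# integration of the t density (Simpson's rule), standard library only.

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, upper=60.0, n=4000):
    """2 * integral of the t density from |t| to `upper` (n must be even)."""
    a, b = abs(t), upper
    h = (b - a) / n
    s = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 2 * (h / 3) * s

p = two_sided_p(2.97, 120)   # e.g. t = 2.97 with 120 degrees of freedom
print(p)
```

A sanity check on the decision rule: for df = 120 the 5% two-sided critical value is about 1.98, and indeed two_sided_p(1.98, 120) is approximately 0.05.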
P-VALUE CALCULATED BY EVIEWS
[Figure: Eviews regression output; the two-sided p-values appear in the "Prob." column.]
STEPS INVOLVED IN STATISTICAL VERIFICATION OF A QUESTION
1. Formulating H0 and H1 for the question
2. Determining the appropriate test statistic and stating its distribution under H0
3. Determining the rejection region with reference to the null distribution and the desired level of significance of the test (α), using statistical tables or software
4. Calculating the test statistic from the regression results
5. Arriving at a conclusion: rejecting H0 if the value of the test statistic falls inside the rejection region, and not rejecting otherwise
6. Explaining the conclusion in the context of the question
CONFIDENCE INTERVALS
Another way to use classical statistical testing is to construct a confidence interval, using the same critical value as was used for a two-sided test.
A (1 − α)·100% confidence interval for βj is defined as
β̂j ± c · se(β̂j),
where c is the (1 − α/2) percentile of the t(n − k − 1) distribution.
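A minimal sketch of the interval and of its duality with the two-sided test (estimate, standard error, and critical value are made-up / table-looked-up numbers):

```python
# Sketch: a 95% confidence interval beta_hat +/- c * se(beta_hat).
# All numbers are hypothetical, for illustration only.

beta_hat = 0.092   # hypothetical OLS estimate
se = 0.031         # hypothetical standard error
c = 1.98           # ~97.5th percentile of t(120), from a t table

lower = beta_hat - c * se
upper = beta_hat + c * se
print(f"95% CI: [{lower:.4f}, {upper:.4f}]")

# Duality with the two-sided test: H0: beta_j = 0 is rejected at 5%
# exactly when 0 lies outside the interval.
rejects_zero = not (lower <= 0.0 <= upper)
```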
TESTING MULTIPLE LINEAR RESTRICTIONS (CONT.)
Question: The first null involves four restrictions; the second involves ... restriction and the third involves ... restrictions?
The alternative can only be that at least one of these restrictions is not true.
The test statistic involves estimating two equations, one without the restrictions (the unrestricted model) and one with the restrictions imposed (the restricted model), and seeing how much their sums of squared residuals differ.
This is particularly easy for testing exclusion restrictions like the first two nulls on the previous slide.
TESTING MULTIPLE LINEAR RESTRICTIONS (CONT.)
For example, consider H0: β3 = 0, β4 = 0 (note: 2 restrictions).
The alternative is H1: at least one of β3 or β4 is not zero.
The unrestricted model is:
y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + u (UR)
and the restricted model is:
y = β0 + β1x1 + β2x2 + u (R)
TESTING MULTIPLE LINEAR RESTRICTIONS (CONT.)
The test statistic is
F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n − k − 1)] ~ F(q, n − k − 1) under H0,
where q is the number of restrictions, and F(q, n − k − 1) is an F distribution with (q, n − k − 1) degrees of freedom. The value q is called the numerator df, and n − k − 1 is called the denominator df.
Note: the F statistic is always non-negative (recall Statistics for E&B), so if you happen to get a negative number, you should realize that you must have made a mistake.
The 10%, 5% and 1% critical values of the F distribution for various numerator and denominator dfs are given in Tables G.3a, G.3b and G.3c of the textbook.
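The statistic F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n − k − 1)] is simple arithmetic once the two regressions are run. A minimal sketch (the SSR values, sample size, and number of regressors are made up):

```python
# Sketch of the F statistic for q exclusion restrictions.
# All inputs are hypothetical, for illustration only.

def f_stat(ssr_r, ssr_ur, q, n, k):
    """F = ((SSR_r - SSR_ur) / q) / (SSR_ur / (n - k - 1))."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

# Restricted SSR is never smaller than unrestricted SSR, so F >= 0.
F = f_stat(ssr_r=198.3, ssr_ur=183.2, q=2, n=150, k=4)
print(F)
```

The computed F is then compared with the critical value of F(q, n − k − 1) from the tables.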
TESTING MULTIPLE LINEAR RESTRICTIONS (CONT.)
Once we know q and n − k − 1, we can read off the 5% critical value of the F(q, n − k − 1) distribution from Table G.3b.
TEST FOR OVERALL SIGNIFICANCE OF A MODEL
This is the F statistic that is reported in the Eviews output any time we estimate a regression.
It is for the special null hypothesis that all slope parameters are zero:
H0: β1 = β2 = … = βk = 0.
In this case the statistic can be written in terms of the R-squared:
F = [R²/k] / [(1 − R²)/(n − k − 1)].
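The R-squared form of the overall-significance statistic, F = (R²/k) / ((1 − R²)/(n − k − 1)), is easy to reproduce by hand; the R², sample size, and number of regressors below are made up:

```python
# Sketch: the overall-significance F statistic in its R-squared form.
# All inputs are hypothetical, for illustration only.

def overall_f(r2, n, k):
    """F statistic for H0: all k slope coefficients equal zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

F = overall_f(r2=0.35, n=200, k=3)
print(F)
```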
EXAMPLE FOR THE F TEST
Using data from the VHLSS 2010 to determine the effect of the number of highly educated workers on the number of local workers, we will use the estimation results of the unrestricted model (UR) and of the restricted model (R).
EXAMPLE FOR THE F TEST
is greater than both and ⇒ There is strongly enough evidence to
suggest that the effect of the number of high educated labors on
the number of located labors is statistically significant, after
controlling for the effects of total population and the population
density of location to the number of located labors.
The for overall significance of both regressions shows that both
models have at least one significant explanatory variable for
explaining the number of located labors.
You can do these test by yourself, can’t you?
33
TESTING A SINGLE RESTRICTION THAT INVOLVES MORE THAN ONE PARAMETER
In the regression model
y = β0 + β1x1 + β2x2 + … + βkxk + u
we may want to test a single restriction that involves more than one parameter, such as H0: β1 = β2 (Null 1), or another single linear restriction on several parameters (Null 2).
TESTING A SINGLE RESTRICTION THAT INVOLVES MORE THAN ONE PARAMETER
We could use the F test as usual; there would then be no good reason to study this case separately.
However, with a single hypothesis, we can have one-sided alternatives. For example, for Null 1 (H0: β1 = β2) we have
H1: β1 > β2 (Alt 1)
and for Null 2 we have the corresponding one-sided alternative (Alt 2).
The F test cannot be used for one-sided alternatives.
We can re-formulate the problem and use a t test.
TESTING A SINGLE RESTRICTION THAT INVOLVES MORE THAN ONE PARAMETER
Consider Null 1. Define θ1 = β1 − β2.
We can write Null 1 and Alt 1 as
H0: θ1 = 0 vs H1: θ1 > 0,
which we can test with t = θ̂1/se(θ̂1).
REPARAMETERISATION: A COOL USEFUL TRICK
Consider Null 1 and Alt 1,
H0: θ1 = 0 vs H1: θ1 > 0, where θ1 = β1 − β2,
and set β1 = θ1 + β2.
Replace β1 in the model with θ1 + β2 and rearrange:
y = β0 + θ1x1 + β2(x1 + x2) + β3x3 + … + βkxk + u
Regressing y on x1, (x1 + x2), x3, …, xk delivers θ̂1 together with its standard error se(θ̂1).
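The trick can be verified numerically: fitting y on x1 and (x1 + x2) gives a coefficient on x1 equal to β̂1 − β̂2 from the original regression. Below is a stdlib-only sketch with simulated (made-up) data and a tiny OLS solver:

```python
import random

# Sketch: verifying the reparameterisation trick with a tiny pure-Python
# OLS (Gaussian elimination on the normal equations X'X b = X'y).
# The simulated data are purely for illustration.

def ols(X, y):
    """Return OLS coefficients for design matrix X (list of rows) and y."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    for col in range(p):                       # forward elimination
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * p                           # back substitution
    for i in reversed(range(p)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    return coef

random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 0.8 * a + 0.5 * c + e for a, c, e in zip(x1, x2, u)]

# Original model: y on (1, x1, x2)
b0, b1, b2 = ols([[1.0, a, c] for a, c in zip(x1, x2)], y)
# Reparameterised model: y on (1, x1, x1 + x2); the slope on x1 is theta1
g0, theta1, g2 = ols([[1.0, a, a + c] for a, c in zip(x1, x2)], y)

# The trick: theta1_hat equals b1_hat - b2_hat from the original model
print(theta1 - (b1 - b2))
```

The two regressions span the same column space, so the fits are identical and the coefficient on (x1 + x2) matches β̂2 exactly.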
REPARAMETERISATION: EXAMPLE
Test that, controlling for IQ, one additional year of education has the same proportional effect on wage as one additional year of experience, against the alternative that it has a larger effect:
log(wage) = β0 + β1EDUC + β2EXPER + β3IQ + u
H0: β1 = β2 vs H1: β1 > β2. With θ1 = β1 − β2, we estimate
log(wage) = β0 + θ1EDUC + β2(EDUC + EXPER) + β3IQ + u.
REPARAMETERISATION: EXAMPLE
Note that only the estimated parameter of EDUC changes between the original model and the reparameterised model.
θ̂1 is the parameter of EDUC in the reparameterised model.
t = θ̂1/se(θ̂1) = 5.59, which is larger than 1.65, the 5% one-sided critical value of the t distribution.
Therefore, we reject the null.
Conclusion: after controlling for IQ, one additional year of education has a larger proportional effect on wage than one additional year of experience.
SUMMARY
In this lecture we learned that if we add the assumption that population errors are normally distributed, we get that the OLS estimator, conditional on X, is normally distributed.
Using this, we learned how to test hypotheses on any of the slope parameters, or on a single linear combination of several of them, by using a t test.
We learned that we can reparameterise the population model to enable us to test a single linear combination of parameters easily.
We also learned how to form confidence intervals for any of the slope parameters.
We learned that hypothesis tests could also be done with the help of confidence intervals, or alternatively using p-values.
SUMMARY
Further, we can test multiple linear restrictions using an F test.
The test statistic is based on the sums of squared residuals of the unrestricted and the restricted models (SSR_ur and SSR_r):
F = [(SSR_r − SSR_ur)/q] / [SSR_ur/(n − k − 1)] ~ F(q, n − k − 1) under H0.
Using the significance level of the test, we can find the critical value c of the test from the F table.
We reject the null if F > c, and do not reject otherwise.
The F test for overall significance of a regression model is an important case that is provided by all statistical packages.
Finally, we learned how to test general linear restrictions.
TEXTBOOK EXERCISES
Problems 5, 6 (page 142)
Problems 8, 9 (page 143)
Problems 12, 13 (page 145)
COMPUTER EXERCISES
C8 (page 147)
C11, C12 (page 148)
THANK YOU FOR YOUR ATTENDANCE - Q & A