Econ140 Spring2016 Section07 Handout Solutions
Section 7
GSI: Fenella Carpena
March 10, 2016
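$$\widehat{salary} = \underset{(0.29)}{11.10} + \underset{(0.0121)}{0.0689}\,years + \underset{(0.0026)}{0.0126}\,gamesyr + \underset{(0.0010)}{0.00098}\,bavg + \underset{(0.0161)}{0.0144}\,hrunsyr + \underset{(0.0072)}{0.0108}\,rbisyr$$
(standard errors in parentheses)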
Example 2.1. What test statistic would we use to test the hypothesis H0 : β4 = 0, H1 : β4 ≠ 0? Carry
out this test at the 5% level.
We would calculate the t-statistic, as we have learned before in this class. So t-stat = (β̂4 − 0)/SE(β̂4) =
0.0144/0.0161 ≈ 0.894. Since |0.894| < 1.96, we fail to reject the null hypothesis at the 5% level.
Example 2.2. What test statistic would we use to test the hypothesis H0 : β2 = 0.1, H1 : β2 ≠ 0.1? Carry
out this test at the 10% level.
Again, we would calculate the t-statistic. That is, t-stat = (β̂2 − 0.1)/SE(β̂2) = (0.0126 − 0.1)/0.0026 ≈ −33.6.
Since |t-stat| > 1.645 (the two-sided critical value at the 10% level), we reject the null.
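Examples 2.1 and 2.2 use the same recipe: form (estimate − hypothesized value)/SE and compare its absolute value with a two-sided critical value. The sketch below is a minimal illustration of that recipe. It assumes large-sample standard normal critical values, as the handout does with 1.96. The helper name `two_sided_t_test` is hypothetical.

```python
from statistics import NormalDist

def two_sided_t_test(estimate, se, null_value, alpha):
    """Return (t_stat, critical_value, reject) for H0: beta = null_value
    against a two-sided alternative, using the N(0, 1) critical value."""
    t_stat = (estimate - null_value) / se
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return t_stat, critical, abs(t_stat) > critical

# Example 2.1: coefficient on hrunsyr (beta_4), H0: beta_4 = 0, 5% level
t1, c1, r1 = two_sided_t_test(0.0144, 0.0161, 0.0, alpha=0.05)
print(f"Ex 2.1: t = {t1:.3f}, critical = {c1:.3f}, reject = {r1}")

# Example 2.2: coefficient on gamesyr (beta_2), H0: beta_2 = 0.1, 10% level
t2, c2, r2 = two_sided_t_test(0.0126, 0.0026, 0.1, alpha=0.10)
print(f"Ex 2.2: t = {t2:.3f}, critical = {c2:.3f}, reject = {r2}")
```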
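Example 2.3. A sports analyst hypothesizes that once years in the league and games per year have been
controlled for, the variables bavg, hrunsyr, and rbisyr (which we can think of as measures of performance)
have no effect on salary.
The null hypothesis is H0 : β3 = β4 = β5 = 0, and the alternative is H1 : at least one of β3, β4, β5 is nonzero.
This is a joint hypothesis with q = 3 restrictions, so we would use an F-statistic. The unrestricted regression
is the full regression above (k = 5). The restricted regression regresses salary on years and gamesyr only.
Under homoskedasticity,
$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)}$$
or alternatively,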
$$F = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n - k - 1)}$$
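The SSR form and the R² form of the homoskedasticity-only F-statistic give the same number. This holds because both regressions share the same dependent variable and therefore the same total sum of squares. The sketch below checks this on purely hypothetical numbers, not the baseball data, whose SSRs and R²s are not reported here. The function names are illustrative.

```python
def f_stat_ssr(ssr_r, ssr_ur, q, n, k):
    """Homoskedasticity-only F-statistic from restricted/unrestricted SSRs."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

def f_stat_r2(r2_r, r2_ur, q, n, k):
    """The same statistic written in terms of the two R-squareds."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))

# Hypothetical values (illustration only). Same dependent variable,
# so both regressions share one total sum of squares.
sst = 100.0
ssr_ur, ssr_r = 40.0, 50.0            # the restricted SSR is never smaller
r2_ur, r2_r = 1 - ssr_ur / sst, 1 - ssr_r / sst
n, k, q = 100, 5, 3                   # k = regressors in the unrestricted model

f1 = f_stat_ssr(ssr_r, ssr_ur, q, n, k)
f2 = f_stat_r2(r2_r, r2_ur, q, n, k)
print(f"F (SSR form) = {f1:.4f}, F (R^2 form) = {f2:.4f}")
```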
Example 2.4. Consider the joint hypothesis H0 : β3 = β4 = β5 = 0 that we tested in Example 2.3 using an
F-test. Is it possible to carry out this joint hypothesis test using the 3 t-statistics from the following 3
individual tests: (1) H0 : β3 = 0, H1 : β3 ≠ 0; (2) H0 : β4 = 0, H1 : β4 ≠ 0; (3) H0 : β5 = 0, H1 : β5 ≠ 0?
No. Testing each coefficient individually is not appropriate, because the null hypothesis restricts all 3
coefficients simultaneously. What we need is the joint distribution of β̂3, β̂4, β̂5. If we looked at each
of these one at a time using its t-statistic, each test would place no restriction on the other parameters.
Note that if you looked at the t-stats of β̂3, β̂4, β̂5 individually (about 0.98, 0.89, and 1.50), you would see
that each is less than 1.96, which might lead you to conclude that we would fail to reject the joint hypothesis.
But as we saw in Example 2.3, this conclusion turns out to be wrong.
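One more reason the one-at-a-time approach fails concerns its size. Suppose we use the rule "reject the joint null if any of the 3 individual |t| exceeds 1.96". That rule does not reject with 5% probability when the null is true.

The sketch below simulates this rule under a simplifying assumption: the three t-statistics are independent N(0, 1) draws. In the baseball regression they are correlated, so the exact rejection rate would differ there. The function name is hypothetical.

```python
import random

def one_at_a_time_rejection_rate(num_coefs=3, crit=1.96, draws=100_000, seed=0):
    """Monte Carlo rejection rate of the rule 'reject the joint null if ANY
    individual |t| exceeds crit', assuming (for illustration only) that the
    t-statistics are independent N(0, 1) draws, i.e. the joint null is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(draws):
        if any(abs(rng.gauss(0.0, 1.0)) > crit for _ in range(num_coefs)):
            rejections += 1
    return rejections / draws

simulated = one_at_a_time_rejection_rate()
analytic = 1 - 0.95 ** 3   # valid only under the independence assumption
print(f"simulated size = {simulated:.4f}, analytic = {analytic:.4f}")
```

Under independence, the rule rejects a true joint null about 14% of the time rather than 5%. That is one more reason to base the test on the joint distribution, as the F-test does.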
Example 2.5. Looking back at Example 2.3, we found that we rejected the joint hypothesis that bavg,
hrunsyr, and rbisyr have no effect on salary. But if we had looked at each of these variables individually, we
would have failed to reject each null hypothesis separately, because the individual t-stats are less than 1.96.
What might explain the difference in these results?
One possibility is imperfect multicollinearity among these three variables. Imperfect multicollinearity
makes it difficult to estimate each coefficient precisely (why? recall again from last section's material).
This inflates the standard errors and results in low individual t-stats. The F-stat, by contrast, tests whether
the coefficients on bavg, hrunsyr, and rbisyr are jointly different from zero, so the high correlation among the
three variables matters much less. The data can tell us that the three variables together matter even when
they cannot tell us which one of them matters.
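A worked answer to the "why?" uses the textbook homoskedasticity-only variance formula Var(β̂j) = σ²/[SSTj(1 − Rj²)]. Here Rj² is the R² from regressing Xj on the other regressors. Holding σ², n, and the variation in Xj fixed, the standard error therefore scales with sqrt(1/(1 − Rj²)). The sketch below tabulates that factor. The Rj² values are illustrative, not computed from the baseball data.

```python
import math

def se_inflation(r2_j):
    """Factor by which the homoskedasticity-only SE of beta_j grows when X_j
    has R^2 = r2_j in a regression on the other regressors, relative to
    r2_j = 0, holding sigma^2 and the sample variation in X_j fixed."""
    return math.sqrt(1.0 / (1.0 - r2_j))

for r2_j in (0.0, 0.5, 0.9, 0.99):
    print(f"R_j^2 = {r2_j:4.2f} -> SE multiplied by {se_inflation(r2_j):.2f}")
```

With Rj² = 0.9 the standard error is about 3.16 times larger, so the individual t-stat is about 3.16 times smaller than it would otherwise be.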
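Example 3.1. Consider the regression Yi = β0 + β1 X1i + β2 X2i + ui, and suppose we want to test
H0 : β1 = β2 against H1 : β1 ≠ β2.
(a) What test statistics can we use to carry out the above hypothesis test?
We can use either an F-stat or a t-stat.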
(b) Describe how you would calculate the F-statistic (under the assumption of homoskedasticity). What are
the restricted and unrestricted regressions?
We would plug the SSRs of the restricted and unrestricted regressions, together with n, k, and q, into the
F-statistic formula. The restricted regression imposes β1 = β2, giving Yi = β0 + β1 (X1i + X2i) + ui. The
unrestricted regression is Yi = β0 + β1 X1i + β2 X2i + ui. Here, q = 1 and k = 2.
(c) Describe how you can use a t-statistic to test H0 : β1 = β2, H1 : β1 ≠ β2. Specifically, transform the
regression so that you can use a t-statistic to carry out the test.
Note that the regression can be rewritten as Yi = β0 + β1 X1i + β2 X2i + ui + [β2 X1i − β2 X1i] ⟹
Yi = β0 + (β1 − β2)X1i + β2 (X2i + X1i) + ui. So we can regress Y on X1 and W, where W = X2 + X1,
and use a t-stat to test that the coefficient on X1, which equals β1 − β2, is zero.
Example 3.2. In the same regression as in Example 3.1, transform the regression so that you can use
a t-statistic to test β1 + 2β2 = 0.
Note that the regression can be rewritten as Yi = β0 + β1 X1i + β2 X2i + ui + [2β2 X1i − 2β2 X1i] ⟹
Yi = β0 + (β1 + 2β2)X1i + β2 (X2i − 2X1i) + ui. So we can regress Y on X1 and Z, where Z = X2 − 2X1,
and use a t-stat to test that the coefficient on X1, which equals β1 + 2β2, is zero.
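The transformations in Examples 3.1 and 3.2 are exact reparametrizations. The transformed regression spans the same regressors as the original, so OLS on it returns exactly β̂1 − β̂2 (or β̂1 + 2β̂2) as the coefficient on X1.

The sketch below checks this numerically on hypothetical simulated data, with true values β0 = 1, β1 = 2, β2 = 3. It uses a small hand-written normal-equations solver, `ols`, in place of a statistics package. In practice, the t-stat on X1 in the transformed regression output is the one to use for the test.

```python
import random

def ols(y, columns):
    """OLS estimates (intercept first) from the normal equations X'X b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting.
    A teaching sketch for a handful of regressors, not production code."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[a] * row[b] for row in X) for b in range(p)]
         + [sum(row[a] * yi for row, yi in zip(X, y))]
         for a in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a_rj - factor * a_cj for a_rj, a_cj in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical simulated data: Y = 1 + 2*X1 + 3*X2 + u (illustration only)
rng = random.Random(1)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

_, b1, b2 = ols(y, [x1, x2])                  # original regression

# Example 3.1: regress Y on X1 and W = X2 + X1
w = [a + b for a, b in zip(x1, x2)]
_, coef_x1_w, coef_w = ols(y, [x1, w])

# Example 3.2: regress Y on X1 and Z = X2 - 2*X1
z = [b - 2 * a for a, b in zip(x1, x2)]
_, coef_x1_z, coef_z = ols(y, [x1, z])

print(f"b1 - b2  = {b1 - b2:.6f} vs coef on X1 (with W) = {coef_x1_w:.6f}")
print(f"b1 + 2b2 = {b1 + 2 * b2:.6f} vs coef on X1 (with Z) = {coef_x1_z:.6f}")
```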
4 Hypothesis Testing in MRM: Stata
Example 4.1. Suppose we have 1980 census data on the 50 states recording the population size in each
state (pop), the median age (medage), the number of deaths (death), the number of marriages (marriage),
and the number of divorces (divorce). We regress pop on medage, death, marriage, and divorce and obtain:
------------------------------------------------------------------------------
pop | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
medage | -181303.8 43749.97 -4.14 0.000 -269420.8 -93186.87
death | 91.30243 4.137673 22.07 0.000 82.96873 99.63613
marriage | 1.80206 4.303597 0.42 0.677 -6.865829 10.46995
divorce | 39.80303 8.146704 4.89 0.000 23.39473 56.21134
_cons | 5241295 1272002 4.12 0.000 2679350 7803239
------------------------------------------------------------------------------
What Stata commands would you use to test the following hypotheses?
(a) Test the individual hypothesis that the coefficient on medage is zero.
test medage = 0. If you ran this command in Stata, you would get an F-statistic of 17.17. Note that
in the case of a single restriction, the F-stat = (t-stat)². From the regression table above, the
t-stat is −4.14, and you can verify that (−4.14)² ≈ 17.14, which is close to 17.17 (the difference is due to rounding).
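The F = t² relationship in (a) can be checked directly from the medage row of the table. Computing t from the unrounded coefficient and standard error recovers Stata's F of 17.17 exactly to two decimals. Squaring the rounded t of −4.14 gives 17.14 instead.

```python
coef, se = -181303.8, 43749.97   # medage row of the regression table
t_stat = coef / se
f_stat = t_stat ** 2             # single restriction: F = t^2
print(f"t = {t_stat:.2f}, F = t^2 = {f_stat:.2f}")
print(f"squaring the rounded t instead: {(-4.14) ** 2:.2f}")
```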
(b) Test the joint hypothesis that the coefficients on all four regressors are simultaneously zero.
test (medage = 0) (death = 0) (marriage = 0) (divorce = 0). If you ran this command in Stata,
you would get an F-stat of 1299.46. Notice that this is the same F-stat that Stata reports in the upper right of
the regression output header (not reproduced in the coefficient table above), which says F(4, 45) = 1299.46.
This is not a coincidence: the F-stat in the regression header is for the joint hypothesis that the coefficients
on all regressors are zero.
(c) Test the linear hypothesis that the coefficient on death minus the coefficient on marriage is zero.
test death - marriage = 0. As an aside, note that we could also test any linear combination of
coefficients, for example, test 2*death + 4*marriage = 6.
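Test (c) cannot be done from the coefficient table alone. The standard error of β̂death − β̂marriage depends on the covariance between the two estimates, and the table does not report it. Stata's test command uses the full estimated covariance matrix of the coefficients, which can be displayed after the regression with estat vce.

The sketch below takes the estimates and standard errors from the table and shows how the resulting t and F would move with the covariance. The covariance values are purely hypothetical.

```python
import math

def se_of_difference(se_a, se_b, cov_ab):
    """SE of (b_a - b_b) = sqrt(Var(b_a) + Var(b_b) - 2*Cov(b_a, b_b))."""
    return math.sqrt(se_a ** 2 + se_b ** 2 - 2 * cov_ab)

# Estimates and standard errors copied from the regression table above
b_death, se_death = 91.30243, 4.137673
b_marriage, se_marriage = 1.80206, 4.303597

# The covariance between the two estimates is NOT reported in the table;
# these values are purely hypothetical, to show that the answer depends on it.
for cov in (-10.0, 0.0, 10.0):
    se = se_of_difference(se_death, se_marriage, cov)
    t = (b_death - b_marriage) / se
    print(f"cov = {cov:6.1f}: SE(diff) = {se:.3f}, t = {t:.2f}, F = t^2 = {t * t:.1f}")
```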
Table 1: Regressions on NBA Salaries
Variables    Model 1     Model 2     Model 3
points       111.67      87.72       80.67
             [8.18]**    [9.43]**    [11.20]**
rebounds                 86.44       93.36
                         [20.62]**   [21.25]**
assists                              26.08
                                     [21.56]
cons         278.10      137.56      115.02
             [83.62]**   [83.09]     [86.09]
R²           0.43        0.47        0.48
N            269         269         269
Standard errors in brackets.
(a) [6] What is the interpretation of the coefficient on points in all three regressions? Give the meaning
of the OLSE of the points coefficient from the third regression.
In each regression, the coefficient on points is β1 = ∆salary/∆points, the expected change in salary
associated with one additional point per game. In Model 1 nothing else is held constant. In Models 2 and 3
the other included regressors (rebounds, and also assists in Model 3) are held constant. For Model 3, an
additional point per game (on average over all games played) will fetch the NBA player, on average, over
$80,000 in annual salary (the estimate is 80.67, with salary measured in thousands of dollars), holding
rebounds and assists constant.
(b) [8] When the rebounds variable is added, both the R² and the adjusted R² increase, whereas when the
assists variable is additionally added, only the R² increases. Does this indicate that rebounds should
be in the regression and that assists should not? Explain.
R² tells us the fraction of the variation in the dependent variable that is explained by the regression
model. However, this measure should not by itself determine whether to include a variable in a regression,
particularly when we are concerned with estimating causal effects or are trying to control for omitted
variables. R² never decreases when a variable is added, so a small increase in R² conveys little information.
Adjusted R² does dip slightly when assists is added to the regression, so adjusted R² is doing its job of
penalizing the addition of a regressor.

Note that the coefficient on rebounds is significantly different from zero whenever it appears in a
regression, whereas the coefficient on assists is not. Assists could be correlated with other measures of
on-court performance. Apparently, though, it is not correlated enough to make the coefficients on points and
rebounds imprecise, since both remain significant. In the end, the criterion for including a variable should be
based on theory. Here, the theory is that performance on the court translates into wins, which translate into
ticket sales and broadcast rights.
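The significance statements in (b) can be checked by computing t = coefficient/SE for every entry in Table 1. The sketch below assumes, as the table note states, that the bracketed numbers are standard errors. It uses the large-sample 5% two-sided critical value of 1.96.

```python
# (coefficient, standard error) pairs copied from Table 1
table1 = {
    "Model 1": {"points": (111.67, 8.18), "cons": (278.10, 83.62)},
    "Model 2": {"points": (87.72, 9.43), "rebounds": (86.44, 20.62),
                "cons": (137.56, 83.09)},
    "Model 3": {"points": (80.67, 11.20), "rebounds": (93.36, 21.25),
                "assists": (26.08, 21.56), "cons": (115.02, 86.09)},
}

t_stats = {}
for model, rows in table1.items():
    for var, (coef, se) in rows.items():
        t = coef / se
        t_stats[(model, var)] = t
        print(f"{model:8s} {var:9s} t = {t:6.2f}  |t| > 1.96: {abs(t) > 1.96}")
```

Rebounds has t above 4 in both models where it appears, while assists has t of about 1.21.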
(c) [5] We wonder whether a player's position matters, thinking that some positions may be more
valuable to a team than others. To investigate, we regress salary on dummy indicators for each of
the three possible court positions a player can have (center, forward, and guard) and a constant,
with no other explanatory variables. Stata refuses to report OLSEs for all three regressors. Why does
this happen? How would you solve this problem?
Each player is classified as having exactly one of the three positions, so the three dummy variables add up
to one for every player, i.e., they equal the constant "variable". Hence, we have perfect multicollinearity,
an example of the "dummy variable trap". To escape this trap, exclude one of the three positions, e.g., center.
Stata will also drop one of the collinear variables on its own and report which one it omitted. With center
excluded, the intercept gives the average salary of centers. The coefficients on the other two positions give
the difference in average salary relative to centers.
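The sketch below illustrates both halves of the answer to (c), using a handful of hypothetical players (positions and salaries invented for illustration). First, the three dummies sum to the constant for every player. Second, with center omitted, the OLS estimates reduce to group means and differences in means. This sketch computes those means directly rather than running a regression.

```python
from statistics import mean

# Hypothetical players: (position, salary in $1000s). Illustrative only.
players = [("center", 1200), ("center", 900), ("forward", 1000),
           ("forward", 800), ("guard", 700), ("guard", 1100), ("guard", 600)]
positions = ("center", "forward", "guard")

# Build the three position dummies for each player.
dummies = [{pos: int(p == pos) for pos in positions} for p, _ in players]

# Dummy variable trap: the three dummies sum to 1 for every player, i.e.
# together they reproduce the constant regressor (perfect multicollinearity).
assert all(sum(d.values()) == 1 for d in dummies)

# With center omitted, OLS on (constant, forward, guard) reproduces the group
# means: intercept = mean center salary, each slope = group mean - center mean.
group_mean = {pos: mean(s for p, s in players if p == pos) for pos in positions}
intercept = group_mean["center"]
b_forward = group_mean["forward"] - intercept
b_guard = group_mean["guard"] - intercept
print(f"intercept = {intercept}, forward = {b_forward}, guard = {b_guard}")
```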
(d) [3] Based on the results, it seems that the marginal effect of rebounds on salary is higher than
the marginal effect of points. So, immediately after running the regression for Model 3, we issue the
command test rebounds=points. What does this Stata command do?
This command tests the null hypothesis that the coefficients on rebounds and points in Model 3 are equal,
against the two-sided alternative that they differ, and reports the corresponding F-statistic (with q = 1).
(e) [7] Stata generates output from the command in (d) that includes an F-statistic. Suppose that it takes
the value 4.60. Decide whether you reject the null at the 1% level of significance.
The 1% critical value of the F-distribution with 1 and 265 (= 269 − 3 − 1) degrees of freedom is very close
to the F(1, ∞) critical value of 6.6349 from the F-table. Since our F-stat of 4.60 is less than 6.6349, we fail
to reject the null that the 2 coefficients are equal at the 1% significance level.
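The F(1, ∞) value from the table can be reproduced without an F-table. F(1, ∞) is the distribution of Z² with Z standard normal, so its critical value is the square of the two-sided normal critical value.

The sketch below uses this large-sample approximation, as the answer above does, in place of the exact F(1, 265) distribution. It computes the 1% and 5% critical values and an approximate p-value for F = 4.60.

```python
from statistics import NormalDist

def f1_inf_critical(alpha):
    """Upper-alpha critical value of F(1, inf) = square of the two-sided z value."""
    return NormalDist().inv_cdf(1 - alpha / 2) ** 2

def f1_inf_pvalue(f_stat):
    """P(Z^2 > F) = P(|Z| > sqrt(F)) for Z ~ N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(f_stat ** 0.5))

f_obs = 4.60
for alpha in (0.01, 0.05):
    crit = f1_inf_critical(alpha)
    print(f"alpha = {alpha:.2f}: critical = {crit:.4f}, reject = {f_obs > crit}")
print(f"approximate p-value = {f1_inf_pvalue(f_obs):.3f}")
```

The p-value is about 0.03. The null of equal coefficients is therefore not rejected at 1%, but it would be rejected at 5%.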