$$y_{it} = \beta_0 + \beta_1 x_{it,1} + \beta_2 x_{it,2} + \beta_3 x_{it,3} + \beta_4 x_{it,4} + u_{it} \tag{1}$$
where (xit,1,xit,2,xit,3,xit,4) are explanatory variables, (β0,β1,β2,β3,β4) are unknown parameters of interest, and uit is
unobservable error. As usual, i = 1,...,N refers to individuals (id, cross-sectional units) and t = 1,...,T refers to
time periods. Use the data file SIMUDATA.dta to answer the following questions. Unless otherwise specified,
use 5% as the significance level for all the tests below.
(a) (10 points) Declare the data to be a panel via specifying the individual identifier (id) and time identifier
(t). Which regressor(s) are not time-varying? What are N and T? Do you have a balanced panel?
Answer:
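The checks asked for in this part can be sketched in a few lines of Python. This is only an illustration on hypothetical toy records, not on SIMUDATA.dta; the function name `describe_panel` and the column names are assumptions made for the example. In Stata, the declaration itself is done with `xtset id t`.

```python
from collections import defaultdict

# Hypothetical toy records (id, t, x1, x3); NOT taken from SIMUDATA.dta.
rows = [
    (1, 1, 0.5, 1.0), (1, 2, 0.7, 1.0), (1, 3, 0.2, 1.0),
    (2, 1, 1.1, 0.0), (2, 2, 0.9, 0.0), (2, 3, 1.4, 0.0),
]

def describe_panel(rows, names):
    """Return (N, T, balanced, time-invariant regressors) for a long-format panel."""
    periods_by_id = defaultdict(set)
    values_by_id = defaultdict(lambda: defaultdict(set))
    for unit, t, *xs in rows:
        periods_by_id[unit].add(t)
        for name, x in zip(names, xs):
            values_by_id[unit][name].add(x)
    all_periods = set().union(*periods_by_id.values())
    # Balanced: every individual is observed in every period.
    balanced = all(p == all_periods for p in periods_by_id.values())
    # Time-invariant: a regressor takes a single value within every individual.
    invariant = [n for n in names
                 if all(len(values_by_id[u][n]) == 1 for u in periods_by_id)]
    return len(periods_by_id), len(all_periods), balanced, invariant

N, T, balanced, invariant = describe_panel(rows, ["x1", "x3"])
print(N, T, balanced, invariant)
```

Dropping any single (id, t) record from `rows` makes the balance flag turn false while N and T stay the same.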
Answer:
R² = 0.896
(c) (10 points) It is well known that the standard errors (SE) of panel data estimators need to be
adjusted for likely correlation of the error uit over time for a given i (clustering on i), i.e.,
C(uit, uis | xi) ≠ 0 for t ≠ s. Re-estimate model (1) using OLS and calculate cluster-robust SE. Compare
the estimation results with those obtained in Part (b). Comment on your findings. If C(uit, uis | xi) ≠ 0
is true, is OLS still BLUE?
Answer:
R² = 0.896
The coefficient estimates and R-squared are the same in both (b) and (c), because clustering changes only
the estimated variance of the estimator, not the estimator itself. However, the standard errors differ
markedly, which suggests that the error uit is correlated over time for a given i (clustering on i), i.e.,
C(uit, uis | xi) ≠ 0 for t ≠ s. If this is true, the assumption of uncorrelated errors required by the
Gauss-Markov theorem fails. In this case, the pooled OLS estimator is not BLUE: under exogeneity it is still
unbiased, but it is no longer efficient.
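To illustrate why the coefficients stay the same while the SE change, here is a minimal sketch of the cluster-robust "sandwich" variance for the slope of a simple regression. It assumes a single regressor plus an intercept, applies no small-sample correction, and uses hypothetical toy data; the function name `ols_slope_with_se` is made up for this example.

```python
import math
from collections import defaultdict

def ols_slope_with_se(ids, x, y):
    """Regress y on x and an intercept.

    Returns (slope, classical SE, cluster-robust SE), clustering on ids.
    No small-sample correction is applied to the cluster-robust variance.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    xd = [xi - xbar for xi in x]                     # demeaned regressor
    sxx = sum(v * v for v in xd)
    slope = sum(v * (yi - ybar) for v, yi in zip(xd, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical variance s^2 / Sxx: assumes uncorrelated, homoskedastic errors.
    s2 = sum(r * r for r in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)

    # Cluster-robust variance: sum the scores within each cluster, then square.
    score = defaultdict(float)
    for g, v, r in zip(ids, xd, resid):
        score[g] += v * r
    se_cluster = math.sqrt(sum(s * s for s in score.values())) / sxx
    return slope, se_classical, se_cluster

# Hypothetical toy panel: 3 individuals observed in 2 periods each.
ids = [1, 1, 2, 2, 3, 3]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.5, 2.5, 2.0, 3.0, 6.5, 7.5]
slope, se_ols, se_cl = ols_slope_with_se(ids, x, y)
print(round(slope, 3), round(se_ols, 3), round(se_cl, 3))
```

The slope does not depend on `ids` at all; only the last returned value does, which mirrors the comparison between Parts (b) and (c).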
(d) (10 points) One of your friends argues that the OLS estimator may be problematic as xit,1 is probably
endogenous. If this were true, which assumption of linear regression would not be valid, and what
could be wrong with using OLS?
Answer:
The exogeneity assumption of linear regression would be violated. It requires the error to have zero
conditional mean given the regressors, E[uit | xit,1, xit,2, xit,3] = 0, so that the regressors are
uncorrelated with the error.
When E[uit | xit,1, xit,2, xit,3] ≠ 0, the OLS estimator is biased and inconsistent.
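The source of the inconsistency is easiest to see in a simplified case. Assuming, for illustration only, that xit,1 is the sole regressor besides the constant, the probability limit of the OLS slope is:

```latex
\hat{\beta}_1^{\,OLS} \;\xrightarrow{\;p\;}\; \beta_1 + \frac{\mathrm{Cov}(x_{it,1},\, u_{it})}{\mathrm{Var}(x_{it,1})}
```

The second term vanishes only when xit,1 is uncorrelated with uit, so under endogeneity the estimate does not converge to β1 however large the sample.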
(e) (10 points) This friend suggests that you should use TSLS rather than OLS. In particular, he proposes
two instrumental variables (IV), zit,1 and zit,2, for xit,1. What conditions must hold for zit,1 and zit,2 to be
valid IV?
Answer:
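For the instrumental variables zit,1 and zit,2 to be valid IV, the following conditions must hold:
(1) Relevance: the instruments are correlated with the endogenous regressor xit,1 after controlling for the
other (exogenous) regressors.
(2) Exogeneity: the instruments are uncorrelated with the error uit.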
(f) (10 points) Estimate model (1) using TSLS with zit,1 and zit,2 as IV. As in Part (c), you should compute
and report cluster-robust SE. Compare the TSLS estimates with the OLS estimates obtained in Part
(c), and comment on your findings. Assuming both zit,1 and zit,2 are valid IV, do you think xit,1 is an
endogenous regressor? Explain.
Answer:
$$\hat{y}_{it} = \underset{(0.301)}{1.571}\,x_{it,1} - \underset{(0.199)}{1.902}\,x_{it,2} + \underset{(0.547)}{0.573}\,x_{it,3} + \underset{(0.272)}{2.141}, \qquad R^2 = 0.799$$
Comparing the above model with the estimated model from Part (c), the TSLS coefficient estimates differ
markedly from the OLS estimates, which indicates that there is at least one endogenous regressor.
This intuition is confirmed by the (robust) Hausman test, whose null hypothesis is that the regressors are
exogenous. Its p-value is 0.0002 < 0.05, so the null hypothesis is rejected. Assuming both IV are valid,
we conclude that xit,1 is endogenous.
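The logic of comparing OLS with TSLS can be illustrated with a small simulation. This is a sketch under assumed conditions, not a reproduction of the SIMUDATA results: the data-generating process, the true slope of 1, and all names are hypothetical, and there are no exogenous regressors other than the constant.

```python
import random

def cov(a, b):
    """Sample covariance (divisor n) of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def ols_slope(x, y):
    """Slope from a regression of y on x and an intercept."""
    return cov(x, y) / cov(x, x)

def tsls_slope(z1, z2, x, y):
    """TSLS slope with one endogenous regressor x and two instruments."""
    # First stage: solve the 2x2 normal equations of x on (z1, z2).
    s11, s22, s12 = cov(z1, z1), cov(z2, z2), cov(z1, z2)
    s1x, s2x = cov(z1, x), cov(z2, x)
    det = s11 * s22 - s12 * s12
    p1 = (s22 * s1x - s12 * s2x) / det
    p2 = (s11 * s2x - s12 * s1x) / det
    # Fitted values; the intercept is omitted because it does not affect slopes.
    xhat = [p1 * a + p2 * b for a, b in zip(z1, z2)]
    # Second stage: regress y on the first-stage fitted values.
    return ols_slope(xhat, y)

rng = random.Random(0)
n, beta = 20000, 1.0
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]      # structural error
e = [rng.gauss(0, 1) for _ in range(n)]      # first-stage noise
# x depends on u, so it is endogenous; z1 and z2 are independent of u.
x = [0.5 * a + 0.5 * b + c + d for a, b, c, d in zip(z1, z2, u, e)]
y = [beta * xi + ui for xi, ui in zip(x, u)]

b_ols = ols_slope(x, y)
b_tsls = tsls_slope(z1, z2, x, y)
print(round(b_ols, 2), round(b_tsls, 2))
```

In this design Cov(x, u) = 1 and Var(x) = 2.5, so the OLS slope converges to 1.4 rather than 1, while TSLS recovers the true value. Note that SE taken from a manual second-stage regression are not valid; dedicated TSLS routines correct for the generated regressor.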
(g) (15 points) Suppose zit,1 is exogenous. Name a test that can be used to check if zit,2 also satisfies the
exogeneity condition. Assess the strength of (zit,1,zit,2) as IV. Write the expression for the first-stage
regression of the TSLS. Is the coefficient on xit,1 exactly identified? Explain.
Answer:
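An F-test on the instruments in the first-stage regression can be used to assess IV strength. The first-stage
F-statistic is 7.20826 < 10, the usual rule-of-thumb threshold, so zit,1 and zit,2 are weak IV. Given that
zit,1 is exogenous, a test of overidentifying restrictions (e.g., the Sargan or Hansen J test) can be used to
check whether zit,2 is also exogenous.
The first-stage regression of the TSLS has the following estimates. The coefficient on xit,1 is not exactly
identified: there are two IV for one endogenous regressor, so it is overidentified.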
$$\hat{x}_{it,1} = \underset{(0.101)}{0.033}\,x_{it,2} + \underset{(0.243)}{0.933}\,x_{it,3} + \underset{(0.144)}{0.247}\,z_{it,1} + \underset{(0.071)}{0.238}\,z_{it,2} - \underset{(0.122)}{0.206}, \qquad R^2 = 0.0779$$
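In population form, the first-stage regression behind these estimates and the hypothesis tested by the first-stage F-statistic can be written as follows. The symbols π and ε are notation introduced here for illustration and do not appear in the original problem.

```latex
x_{it,1} = \pi_0 + \pi_1 x_{it,2} + \pi_2 x_{it,3} + \pi_3 z_{it,1} + \pi_4 z_{it,2} + \varepsilon_{it},
\qquad H_0:\ \pi_3 = \pi_4 = 0
```

With two excluded instruments and one endogenous regressor there is 2 − 1 = 1 overidentifying restriction, which is what makes a test of overidentifying restrictions feasible.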
(h) (15 points) To capture potential time effects, consider the following model
where ds,t are time dummies (ds,t = 1 if s = t, and 0 otherwise). Note that the sample includes data from t
= 1 to t = T, but model (2) includes only dummies for t = 2 to t = T. Why? Estimate model (2) using TSLS
with zit,1 and zit,2 as IV and test if time effects are significant, i.e., at least one γt is not zero. With time
effects controlled, do you think xit,1 is still an endogenous regressor?
Answer:
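$$\hat{y}_{it} = \underset{(0.309)}{1.546}\,x_{it,1} - \underset{(0.119)}{1.902}\,x_{it,2} + \underset{(0.416)}{0.598}\,x_{it,3} - \underset{(0.267)}{0.069}\,d_2 - \underset{(0.267)}{0.080}\,d_3 - \underset{(0.267)}{1.169}\,d_4 + \underset{(0.267)}{0.896}\,d_5 + \underset{(0.211)}{2.223}, \qquad R^2 = 0.8082$$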
d1 is not included in the above regression in order to avoid perfect multicollinearity with the constant
term. The p-value for the test of H0: γ2 = γ3 = γ4 = γ5 = 0 is essentially 0. Therefore, the null hypothesis
that there are no time effects is rejected. Controlling for time effects does not eliminate the endogeneity
problem, as the p-value of the Hausman test is still very small, i.e., 0.0001 < 0.05.
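The multicollinearity argument can be checked directly. The following sketch builds time dummies for a hypothetical panel with T = 5 and shows that the full set of dummies always sums to the constant column, while the set without d1 does not; the helper name `time_dummies` is an assumption made for this example.

```python
T = 5
# Hypothetical period index for two individuals observed in t = 1..5.
periods = [1, 2, 3, 4, 5] * 2

def time_dummies(periods, first, last):
    """Return one row of dummies d_{s,t}, s = first..last, per observation."""
    return [[1 if s == t else 0 for s in range(first, last + 1)] for t in periods]

full = time_dummies(periods, 1, T)       # includes d1
reduced = time_dummies(periods, 2, T)    # omits d1: period 1 is the base category

# With d1 included, every row sums to 1, i.e. the dummies replicate the constant.
print(all(sum(row) == 1 for row in full))      # True
# Without d1, rows for t = 1 sum to 0, so the exact linear dependence is broken.
print([sum(row) for row in reduced[:5]])       # [0, 1, 1, 1, 1]
```

Each γt then measures the shift in period t relative to the omitted period 1.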
(i) (15 points) Suppose that vit = αi + eit with eit ∼ i.i.d.(0, σe²). Re-write model (2) as
Treat αi as fixed effects (FE). Use an FE estimator to estimate model (3). Justify the fact that the FE estimator
cannot estimate all slope coefficients. Compare the FE estimates with the TSLS estimates obtained in (h).
Comment on your findings.
Answer:
Since xit,3 is time-invariant, the FE (within) transformation removes it together with αi, so the FE
estimator cannot estimate its coefficient β3. The FE and TSLS estimates look similar. If the included
regressors are correlated with αi but uncorrelated with eit, the FE estimator is consistent, whereas the
TSLS estimator, given valid IV, is consistent irrespective of whether the included regressors are correlated
with αi, eit, or both. The similarity between the FE and TSLS estimates suggests that the FE model resolves
the endogeneity problem in this case, i.e., it is appropriate to assume E[eit | xit] = 0 while allowing
E[αi | xit] ≠ 0. Note that the FE estimator has much smaller SE for the coefficients on (xit,1, xit,2). This
is not surprising, as we already know from Part (g) that zit,1 and zit,2 are weak IV, and weak IV can lead to
imprecise TSLS estimates.
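Why the FE estimator loses the coefficient on a time-invariant regressor can be seen from the within transformation itself. The sketch below uses a hypothetical, noise-free toy panel (true slopes 2 and 3, made-up αi) and omits time dummies for brevity; it is not the estimation run on SIMUDATA.

```python
from collections import defaultdict

def within_transform(ids, values):
    """Subtract each individual's time average from its observations."""
    total, count = defaultdict(float), defaultdict(int)
    for g, v in zip(ids, values):
        total[g] += v
        count[g] += 1
    return [v - total[g] / count[g] for g, v in zip(ids, values)]

# Hypothetical toy panel: y = alpha_i + 2*x1 + 3*x3, with x3 time-invariant.
ids = [1, 1, 1, 2, 2, 2]
x1 = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
x3 = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
alpha = {1: 5.0, 2: -1.0}
y = [alpha[g] + 2.0 * a + 3.0 * b for g, a, b in zip(ids, x1, x3)]

x1_w = within_transform(ids, x1)
x3_w = within_transform(ids, x3)
y_w = within_transform(ids, y)

# FE (within) slope on x1: OLS on the demeaned data.
beta1_fe = sum(a * b for a, b in zip(x1_w, y_w)) / sum(a * a for a in x1_w)

print(x3_w)       # all zeros: no within variation left to identify beta3
print(beta1_fe)   # 2.0
```

Demeaning wipes out αi and x3 alike, so the slope on x1 is recovered even though αi differs across individuals, while β3 cannot be separated from αi.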