In-Semester Test - Proposed Solutions
Question 1
1.
[5 Marks] The model could be written in different forms but it’s important to make it clear
whether variables vary over time and across different individuals.
• Most explicit form:
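$$\mathit{Earn}_{it} = \beta_0 + \beta_1 \mathit{Male}_i + \beta_2 \mathit{Age}_{it} + \beta_3 \mathit{Educ}_{it} + \beta_4 \mathit{Occ}_{it} + \beta_5 \mathit{Exper}_{it} + \beta_6 \mathit{Children}_{it} + \beta_7 \mathit{Married}_{it} + (\alpha_i + \varepsilon_{it})$$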
Or simply:
$$\mathit{Earn}_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + (\alpha_i + \varepsilon_{it})$$
Notice that we are assuming a person’s gender is time-invariant and that education is allowed
to change over time (no penalties if students assumed education was time-invariant). All other
variables are assumed to vary over time and across individuals, except $\alpha_i$, which varies across individuals only.
• Per-individual matrix form:
$$\mathbf{Earn}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{u}_i, \quad \forall i = 1, \dots, n$$
2.
[5 Marks] The most important difference is the assumption made about $\alpha_i$. In the FE model,
this term is allowed to correlate with any of the regressors (i.e., $\mathrm{Cov}(\alpha_i, \mathbf{X}) \neq 0$), while in
the RE model it is assumed to be uncorrelated with them ($\mathrm{Cov}(\alpha_i, \mathbf{X}) = 0$).
[5 Marks] The FE and RE transformations follow from this assumption. For the FE model,
the correlation calls for a transformation designed to eliminate $\alpha_i$.
For the RE model, there is no need to eliminate $\alpha_i$, as it doesn’t lead to a
violation of the zero conditional mean assumption. However, to deal with the
serial correlation that $\alpha_i$ induces in the composite error, a Feasible GLS estimator is used.
This makes the RE estimator more efficient than POLS.
A clear indication of the two sub-parts of the question was expected.
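For reference, the two transformations can be written out as follows. This is a sketch in standard textbook notation, which may differ from the notation used in lectures: bars denote individual time averages, and $\lambda$, $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$ are symbols introduced here for illustration (they do not appear in the question).

```latex
% FE (within) transformation: subtracting individual means removes \alpha_i
\mathit{Earn}_{it} - \overline{\mathit{Earn}}_i
  = (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)'\boldsymbol{\beta}
  + (\varepsilon_{it} - \bar{\varepsilon}_i)

% RE (quasi-demeaning) transformation: only a fraction \lambda of the mean is removed
\mathit{Earn}_{it} - \lambda\,\overline{\mathit{Earn}}_i
  = (\mathbf{x}_{it} - \lambda\bar{\mathbf{x}}_i)'\boldsymbol{\beta}
  + \left[(1-\lambda)\alpha_i + \varepsilon_{it} - \lambda\bar{\varepsilon}_i\right],
\qquad
\lambda = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2}}
```

In this notation, $\lambda = 0$ reproduces POLS and $\lambda = 1$ reproduces the FE transformation. In practice $\lambda$ is replaced by an estimate, which is what makes the GLS estimator "feasible".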
3.
Three possible ways would be:
First: create a dummy variable for each individual in the sample (except one) in order to
explicitly model $\alpha_i$. This is called the Least Squares Dummy Variable (LSDV) model. The
advantages of this method are (1) being able to observe the average difference in
earnings for each individual and (2) being able to test whether these individual effects matter
(i.e., we can test whether POLS is the best option or not).
The disadvantages include: (1) the model will normally contain a large set of regressors,
making it impractical to implement (computationally and analytically), particularly when
considering that we are primarily interested in the role of Male on earnings (not in how wages
vary across each individual); (2) the large number of regressors consumes a large number of
degrees of freedom, reducing the power of our tests; and (3) we
won’t be able to observe the effects of time-invariant regressors because of collinearity with
the dummy variables.
Second: apply OLS to the FE transformation (i.e., the FE or Within estimator). This
method has the advantage of being much more practical and preserving more degrees of
freedom than the LSDV option, but we won’t be able to directly observe the estimated
individual-specific effects. Relative to the FD estimator, the FE estimator tends to be
more efficient. The biggest disadvantage, however, is that time-invariant regressors are
eliminated in the process, i.e., we won’t be able to obtain an estimate of the parameter of
interest ($\beta_1$).
Third: if T = 2, the FE estimator is numerically identical to the First Difference (FD)
estimator. The advantages are similar to the ones for the FE estimator. The disadvantage here
is that this equivalence only holds when we have exactly two time periods.
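The T = 2 equivalence can be checked numerically. The following is a minimal sketch, assuming a single regressor, no time dummies and a made-up toy panel. The names `panel`, `within_slope` and `first_difference_slope` are hypothetical and are not taken from the test.

```python
# Hypothetical balanced panel with T = 2 and a single regressor.
# Each individual maps to [(x_1, y_1), (x_2, y_2)]; the numbers are made up.
panel = {
    "a": [(1.0, 2.0), (2.0, 4.5)],
    "b": [(3.0, 1.0), (5.0, 4.0)],
    "c": [(2.0, 7.0), (2.5, 7.5)],
}


def within_slope(panel):
    """OLS slope after subtracting each individual's time averages (FE)."""
    num = den = 0.0
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den


def first_difference_slope(panel):
    """OLS slope (no intercept) on consecutive differences (FD)."""
    num = den = 0.0
    for obs in panel.values():
        for (x_prev, y_prev), (x_next, y_next) in zip(obs, obs[1:]):
            dx, dy = x_next - x_prev, y_next - y_prev
            num += dx * dy
            den += dx ** 2
    return num / den


fe = within_slope(panel)
fd = first_difference_slope(panel)
print(fe, fd)  # the two slopes agree up to floating-point rounding
```

With T = 2 each demeaned observation is plus or minus half of the first difference, so the factor of one half cancels between numerator and denominator. With T > 2 the two functions would generally return different numbers.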
Notice that for all the models above, we are assuming that the truncation of age is unrelated
to the regressors, i.e., no adjustments for this issue would be necessary.
4.
As the variable of interest is time-invariant, we cannot use the FE estimator (or the variants
discussed above). So, the parameter won’t be identified.
This one was more of a hit-or-miss type of question but there were variations.
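The identification problem can be seen directly by applying the within transformation to a time-invariant regressor. This is a small illustration with made-up individuals. `demean`, `male` and `age` are hypothetical names used only here.

```python
def demean(values):
    """Subtract the individual's time average from each observation."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical individuals observed for three periods each.
male = {"person_1": [1, 1, 1], "person_2": [0, 0, 0]}
age = {"person_1": [30, 31, 32], "person_2": [45, 46, 47]}

male_within = {k: demean(v) for k, v in male.items()}
age_within = {k: demean(v) for k, v in age.items()}

print(male_within)  # every entry is 0.0: no variation left to identify the coefficient
print(age_within)   # time-varying regressors keep their within-individual variation
```

After demeaning, the Male column is identically zero, so its coefficient cannot be estimated from the transformed regression.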
5.
[5 Marks] If the assumption that $\mathrm{Cov}(\alpha_i, \mathbf{X}) \neq 0$ is dropped, then we can use either the RE
model (if the $\alpha_i$ are not constant across individuals) or POLS (if $\alpha_i = \alpha, \forall i$). In most cases, we
would favour the RE estimator, as the conditions for POLS to be consistent and unbiased
are even more restrictive than the ones for RE.
With the RE (or POLS), we will be able to identify the effect of $\mathit{Male}_i$ on earnings.
[5 Marks] The main disadvantage of using the RE model is the strength of the assumption that
$\mathrm{Cov}(\alpha_i, \mathbf{X}) = 0$. In our context, if ability (assumed to be time-invariant, unobservable and
individual-specific, i.e., part of $\alpha_i$) is correlated with education (a reasonable
assumption), then the RE estimator will be biased and inconsistent.
Notice that the between estimator (BE) is also possible here, although it is not a common choice.
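The bias that arises when $\alpha_i$ is correlated with a regressor can be illustrated with a noiseless toy panel. This is only a sketch under strong assumptions: the data are made up, there is a single regressor, and pooled OLS is used as a stand-in for the RE estimator because both leave $\alpha_i$ in the error term (the RE estimator itself is not computed). All names are hypothetical.

```python
# Hypothetical noiseless panel: y_it = alpha_i + 1.0 * x_it, where alpha_i is
# larger for individuals with larger x, so Cov(alpha_i, x) > 0 by construction.
TRUE_BETA = 1.0
x = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]}
alpha = {"a": 0.0, "b": 4.0, "c": 8.0}
y = {i: [alpha[i] + TRUE_BETA * v for v in xs] for i, xs in x.items()}


def slope(xs, ys):
    """OLS slope of ys on xs (regression with an intercept)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    den = sum((a - x_bar) ** 2 for a in xs)
    return num / den


def demean(values):
    """Subtract the individual's time average from each observation."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Pooled OLS ignores alpha_i, which therefore ends up in the error term.
pooled_slope = slope([v for i in x for v in x[i]],
                     [v for i in x for v in y[i]])

# The within estimator removes alpha_i before running OLS.
within_slope = slope([v for i in x for v in demean(x[i])],
                     [v for i in x for v in demean(y[i])])

print(pooled_slope, within_slope)
```

Here the pooled slope is well above the true value of 1 because it also picks up the differences in $\alpha_i$ across individuals, while the within slope recovers the true value.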
6.
There are different strategies for addressing this issue. In the two cases considered below
(there might be more), we create a dummy variable for having children or not (let’s call this
$\mathit{Child}_{it}$). The first strategy is likely superior to the second, as it is probably less
restrictive.
Strategy 1:
We could consider only women in our sample and run the following regression:
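$$\mathit{Earn}_{it} = \theta_0 + \theta_1 \mathit{Child}_{it} + \theta_2 \mathit{Age}_{it} + \theta_3 \mathit{Educ}_{it} + \theta_4 \mathit{Occ}_{it} + \theta_5 \mathit{Exper}_{it} + \theta_6 \mathit{Children}_{it} + \theta_7 \mathit{Married}_{it} + (\alpha_i + \epsilon_{it})$$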
Notice I left both $\mathit{Children}_{it}$ and $\mathit{Child}_{it}$ in the regression. I’m also assuming away any
issues with censoring and truncation (particularly of the dependent variable).
In this regression, we could use the FE model without the issues discussed above, and
the motherhood penalty would be estimated by $\hat{\theta}_1$.
Strategy 2:
We use all observations (both men and women). Proposing an interaction term relating both
variables (Male and Child) could potentially capture this effect.
The model could be:
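$$\mathit{Earn}_{it} = \delta_0 + \delta_1 \mathit{Male}_i + \delta_2 \mathit{Age}_{it} + \delta_3 \mathit{Educ}_{it} + \delta_4 \mathit{Occ}_{it} + \delta_5 \mathit{Exper}_{it} + \delta_6 \mathit{Children}_{it} + \delta_7 \mathit{Married}_{it} + \delta_8 \mathit{Male}_i \times \mathit{Child}_{it} + (\alpha_i + v_{it})$$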
Notice I left both $\mathit{Children}_{it}$ and $\mathit{Child}_{it}$ in the regression.
The expected difference in earnings between males and females would therefore be:
$$E[\mathit{Earn}_{it} \mid \mathit{Male}_i = 1, X] - E[\mathit{Earn}_{it} \mid \mathit{Male}_i = 0, X] = \delta_1 + \delta_8 \mathit{Child}_{it}$$
So, the predicted difference between males and females will also depend on whether the
individuals being compared have children or not.
We wouldn’t be able to estimate $\delta_1$ using the FE estimator (or equivalent) but we could either
(1) use an alternative method (RE or POLS) if we believe $\mathrm{Cov}(\alpha_i, \mathbf{X}) = 0$ or (2) not estimate
this effect if we are primarily interested in testing the statistical significance of $\delta_8$. These are
reasons why we would likely favour the first strategy.
Question 2
7.
We can use a Hausman test after estimating both models. The null hypothesis is:
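$$H_0: \operatorname{plim}\left(\hat{\boldsymbol{\beta}}_{RE} - \hat{\boldsymbol{\beta}}_{FE}\right) = \mathbf{0}$$
$$H_1: \operatorname{plim}\left(\hat{\boldsymbol{\beta}}_{RE} - \hat{\boldsymbol{\beta}}_{FE}\right) \neq \mathbf{0}$$
$$H = \left(\hat{\boldsymbol{\beta}}_{FE} - \hat{\boldsymbol{\beta}}_{RE}\right)' \left[\mathrm{Var}(\hat{\boldsymbol{\beta}}_{FE}) - \mathrm{Var}(\hat{\boldsymbol{\beta}}_{RE})\right]^{-1} \left(\hat{\boldsymbol{\beta}}_{FE} - \hat{\boldsymbol{\beta}}_{RE}\right)$$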
Under the null hypothesis, H follows a $\chi^2_q$ distribution, where q is the number of time-varying
variables that are included in the RE and FE regressions¹. If the null hypothesis is rejected at
the $\alpha\%$ significance level, we would favour the FE model. If we don’t reject it, the RE is
preferred.
If the RE is not efficient, we would need to use the robust Hausman test.
No full marks if students didn’t make it clear that only time-varying terms are considered in
the test (definition of q).
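The mechanics of the statistic can be sketched for the special case q = 2. All coefficient and variance values below are made up for illustration and are not estimates from the test's data. `hausman_2x2` is a hypothetical helper, and the p-value line relies on the fact that a chi-square variable with 2 degrees of freedom has survival function exp(-h/2).

```python
import math


def hausman_2x2(b_fe, b_re, v_fe, v_re):
    """Hausman statistic for q = 2 time-varying coefficients.

    Computes H = d' [Var(b_FE) - Var(b_RE)]^{-1} d with d = b_FE - b_RE.
    The 2x2 inverse is written out by hand to avoid third-party packages.
    """
    d = [b_fe[0] - b_re[0], b_fe[1] - b_re[1]]
    a = v_fe[0][0] - v_re[0][0]
    b = v_fe[0][1] - v_re[0][1]
    c = v_fe[1][1] - v_re[1][1]
    det = a * c - b * b
    if a <= 0 or det <= 0:
        raise ValueError("Var(b_FE) - Var(b_RE) is not positive definite")
    # d' V^{-1} d for the symmetric matrix V = [[a, b], [b, c]]
    return (c * d[0] ** 2 - 2 * b * d[0] * d[1] + a * d[1] ** 2) / det


# Made-up estimates for two time-varying regressors (illustration only).
b_fe = [0.50, 1.20]
b_re = [0.42, 1.10]
v_fe = [[0.010, 0.002], [0.002, 0.020]]
v_re = [[0.006, 0.001], [0.001, 0.012]]

H = hausman_2x2(b_fe, b_re, v_fe, v_re)
p_value = math.exp(-H / 2)  # exact chi-square(2) tail probability
print(H, p_value)
```

With these made-up numbers the statistic is below the 5% critical value for two degrees of freedom, so the null would not be rejected and RE would be retained. The explicit positive-definiteness check reflects the case in which the variance difference is not well behaved in finite samples, which is one reason a robust version of the test may be needed.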
¹ Recall that the RE transformation does not eliminate time-invariant regressors, likely resulting in more
estimates than the FE model. So, we must only consider the estimates on time-varying regressors in the test.
8.
The assumptions for the FD model to be unbiased, consistent and efficient are:
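FD.1 – Linearity of parameters: $\mathbf{y}_i = \mathbf{X}_i \boldsymbol{\beta} + \alpha_i \boldsymbol{\iota} + \boldsymbol{\varepsilon}_i, \ \forall i = 1, \dots, n$.
FD.2 – Random sampling from the cross section.
FD.3 – No perfect collinearity ($\mathbf{X}'\mathbf{D}'\mathbf{D}\mathbf{X}$ has full rank).
FD.4 – Exogeneity: $E(\boldsymbol{\varepsilon}_i \mid \mathbf{X}_i) = \mathbf{0}$. (Or equivalent)
FD.5 – Homoskedasticity: $\mathrm{Var}(\varepsilon_{it} \mid \mathbf{X}_i) = \mathrm{Var}(\varepsilon_{it}) = \sigma^2$, for all t = 1, 2, …, T.
FD.6 – No serial correlation: $\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{is} \mid \mathbf{X}_i) = 0$, for all $t \neq s$.
FD.7 – The errors are independent and identically asymptotically normally distributed.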
Assumptions 5, 6 and 7 can be summarised by using:
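$$\sqrt{n}\left(\hat{\boldsymbol{\beta}}_{FD} - \boldsymbol{\beta}\right) \xrightarrow{d} N\left(\mathbf{0}, \ \sigma^2 \left[E(\mathbf{X}_i'\mathbf{D}'\mathbf{D}\mathbf{X}_i)\right]^{-1}\right)$$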
No full marks if students didn’t use the mathematical symbols (within reason) in their
answers.
9.
For the sake of notation, let $D_2$ denote the second-difference operator (e.g., $D_2 y_{it} = y_{it} - y_{i,t-2}$).
The model written in second differences would therefore be:
$$D_2 \mathbf{y}_i = D_2 \mathbf{X}_i \boldsymbol{\beta} + D_2 \boldsymbol{\varepsilon}_i, \quad \text{for } i = 1, 2, \dots, n$$
For each i, the second difference matrices will be:
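$$D_2 = \begin{bmatrix} -1 & 0 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 0 & 1 \end{bmatrix}_{(T-2) \times T}$$
$$D_2 \mathbf{y}_i = \begin{bmatrix} y_{i,3} - y_{i,1} \\ \vdots \\ y_{i,T} - y_{i,T-2} \end{bmatrix}_{(T-2) \times 1}$$
$$D_2 \boldsymbol{\varepsilon}_i = \begin{bmatrix} \varepsilon_{i,3} - \varepsilon_{i,1} \\ \vdots \\ \varepsilon_{i,T} - \varepsilon_{i,T-2} \end{bmatrix}_{(T-2) \times 1}$$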
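The structure of the second-difference matrix can be made concrete for a small T. This is a minimal sketch in plain Python. `second_difference_matrix` and `matvec` are hypothetical helper names, and the vector `y_i` holds made-up values.

```python
def second_difference_matrix(T):
    """Return the (T-2) x T matrix whose row r has -1 in column r and +1 in column r+2."""
    if T < 3:
        raise ValueError("at least three periods are needed for a second difference")
    return [
        [-1 if col == row else 1 if col == row + 2 else 0 for col in range(T)]
        for row in range(T - 2)
    ]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


D2 = second_difference_matrix(5)
y_i = [1, 4, 9, 16, 25]  # made-up outcomes for one individual, t = 1, ..., 5
D2_y = matvec(D2, y_i)

for row in D2:
    print(row)
print(D2_y)  # [8, 12, 16], i.e. y_3 - y_1, y_4 - y_2, y_5 - y_3
```

Each row sums to zero, which is why a constant term such as the individual effect is removed by the transformation.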
10.
There is nothing preventing us from getting to the same expressions derived in lectures. For
instance, the FD-2 estimator (this is the name I’m giving to this estimator) would be:
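$$\hat{\boldsymbol{\beta}}_{FD\text{-}2} = \boldsymbol{\beta} + \left[\frac{1}{n}\sum_{i=1}^{n} (D_2 \mathbf{X}_i)' D_2 \mathbf{X}_i\right]^{-1} \left[\frac{1}{n}\sum_{i=1}^{n} (D_2 \mathbf{X}_i)' D_2 \boldsymbol{\varepsilon}_i\right]$$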
To get to this point, we assumed linearity of the parameters and random sampling (just like
FD.1 and FD.2). For unbiasedness and consistency, we need the inverse in the expression
above to exist (reflecting the need for no perfect collinearity – FD.3) and, at the same time, the
last term in brackets to have zero expectation and to converge in probability to zero. This requires exogeneity (FD.4).
So, the assumptions to guarantee unbiasedness and consistency would be:
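FD-2.1 – Linearity of parameters: $\mathbf{y}_i = \mathbf{X}_i \boldsymbol{\beta} + \alpha_i \boldsymbol{\iota} + \boldsymbol{\varepsilon}_i, \ \forall i = 1, \dots, n$.
FD-2.2 – Random Sample from the cross section.
FD-2.3 – No perfect collinearity ($E[(\mathbf{D}_2 \mathbf{X}_i)' \mathbf{D}_2 \mathbf{X}_i \mid \mathbf{X}_i]$ is invertible).
FD-2.4 – Exogeneity: $E[(\mathbf{D}_2 \mathbf{X}_i)' \mathbf{D}_2 \boldsymbol{\varepsilon}_i \mid \mathbf{X}_i] = \mathbf{0}$ or equivalently $E[\mathbf{D}_2 \boldsymbol{\varepsilon}_i \mid \mathbf{D}_2 \mathbf{X}_i] = \mathbf{0}$. Strict
exogeneity is also possible.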
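A quick numerical check of the FD-2 idea: with a single regressor and no noise, second-differencing removes $\alpha_i$ and OLS on the differenced data recovers the slope. This is a sketch under those simplifying assumptions. The data are made up, and `fd2_slope` is a hypothetical name for the one-regressor version of the estimator.

```python
def fd2_slope(x, y):
    """OLS slope (no intercept) of y_t - y_{t-2} on x_t - x_{t-2}, pooled over individuals."""
    num = den = 0.0
    for i in x:
        for t in range(2, len(x[i])):
            dx = x[i][t] - x[i][t - 2]
            dy = y[i][t] - y[i][t - 2]
            num += dx * dy
            den += dx * dx
    if den == 0:
        raise ValueError("no variation in the second-differenced regressor")
    return num / den


# Made-up noiseless panel with T = 4: y_it = alpha_i + 2 * x_it.
TRUE_BETA = 2.0
x = {"a": [1.0, 2.0, 4.0, 7.0], "b": [0.0, 3.0, 3.0, 5.0]}
alpha = {"a": 10.0, "b": -5.0}
y = {i: [alpha[i] + TRUE_BETA * v for v in xs] for i, xs in x.items()}

beta_fd2 = fd2_slope(x, y)
print(beta_fd2)  # equals TRUE_BETA because alpha_i cancels in every difference
```

The `ValueError` branch mirrors the no-perfect-collinearity requirement (FD-2.3): if the second-differenced regressor has no variation, the term to be inverted is zero and the estimator is undefined.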
The assumptions for efficiency include homoskedasticity, no serial correlation and asymptotic
normality of the error term. They could be summarised using:
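$$\sqrt{n}\left(\hat{\boldsymbol{\beta}}_{FD\text{-}2} - \boldsymbol{\beta}\right) \xrightarrow{d} N\left(\mathbf{0}, \ \sigma_{D_2}^2 \left[E\left((\mathbf{D}_2 \mathbf{X}_i)' \mathbf{D}_2 \mathbf{X}_i\right)\right]^{-1}\right)$$
where $\sigma_{D_2}^2$ denotes the variance of the second-differenced errors.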