Final 2015 PDF
1. State and prove the Frisch-Waugh-Lovell (FWL) theorem as it applies to the OLS
estimates of β2 and the residuals in the linear regression model
y = X1 β1 + X2 β2 + u. (1)
The theorem says that the estimates of β2 and the residuals are the same if one runs (1)
by OLS or the projected regression:
M1 y = M1 X2 β2 + u. (S.01)
To prove this, write the fitted values and residuals from OLS estimation of (1) as
y = X1 β̂1 + X2 β̂2 + û = PX y + MX y. (S.02)
Premultiplying (S.02) by X2⊤M1 annihilates the X1 term and gives
X2⊤M1 y = X2⊤M1 X2 β̂2, (S.03)
since
X2⊤M1 MX y = X2⊤MX y = 0.
Solving (S.03) for β̂2 gives the expression of the estimator from (S.01).
For the residuals, premultiply (S.02) by M1 to get
M1 y − M1 X2 β̂2 = M1 MX y = MX y.
The leftmost expression is the vector of residuals from (S.01), the rightmost that from (1).
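The following small numpy sketch (simulated data, arbitrary dimensions and variable names) illustrates both equalities numerically:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k1, k2 = 50, 2, 3
    X1 = np.column_stack([np.ones(n), rng.standard_normal((n, k1 - 1))])
    X2 = rng.standard_normal((n, k2))
    y = X1 @ np.array([1.0, 2.0]) + X2 @ np.array([0.5, -1.0, 3.0]) + rng.standard_normal(n)

    # OLS on the full regression (1)
    X = np.hstack([X1, X2])
    beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
    resid_full = y - X @ beta_full

    # OLS on the projected regression (S.01)
    M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)   # projects off X1
    beta2_fwl = np.linalg.lstsq(M1 @ X2, M1 @ y, rcond=None)[0]
    resid_fwl = M1 @ y - M1 @ X2 @ beta2_fwl

    print(np.allclose(beta_full[k1:], beta2_fwl))   # True: same estimates of beta2
    print(np.allclose(resid_full, resid_fwl))       # True: same residuals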
Explicitly derive the covariance matrix of β̂2 from (1). Let X ≡ [X1 X2 ], and derive this
covariance matrix in terms of a block of (X⊤X)−1 . Show how the FWL result allows us to
obtain an explicit expression for this block.
We have β̂2 = (X2⊤M1 X2 )−1 X2⊤M1 y, from which we can compute the covariance matrix
of β̂2 as
(X2⊤M1 X2)−1 X2⊤M1 (σ² I) M1 X2 (X2⊤M1 X2)−1 = σ² (X2⊤M1 X2)−1. (S.04)
Since the covariance matrix of the full vector β̂ ≡ [β̂1⊤ β̂2⊤]⊤ is σ²(X⊤X)−1, the covariance matrix of β̂2 is the lower right k2 × k2 block of this matrix. Since there can be only one covariance matrix of β̂2, this block can therefore be expressed explicitly as (S.04).
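A quick numerical check of this block equality, again with simulated data:

    import numpy as np

    rng = np.random.default_rng(1)
    n, k1, k2 = 40, 2, 3
    X1 = rng.standard_normal((n, k1))
    X2 = rng.standard_normal((n, k2))
    X = np.hstack([X1, X2])
    M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
    # Lower right k2 x k2 block of (X'X)^{-1} equals (X2' M1 X2)^{-1}
    print(np.allclose(np.linalg.inv(X.T @ X)[k1:, k1:],
                      np.linalg.inv(X2.T @ M1 @ X2)))        # True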
If β1 = 0, then β2 can also be estimated by running
y = X2 β2 + u,
with covariance matrix σ 2 (X2⊤X2 )−1 , where σ 2 is the variance of the components of the
disturbance vector u. Explain the relation between this covariance matrix and the one you
computed for the estimator from running (1) in terms of the Gauss-Markov theorem.
The covariance matrix of β̃2 from the restricted regression is σ²(X2⊤X2)−1. When β1 = 0, by Gauss-Markov applied to the restricted model, this covariance matrix is no larger than that of any other linear unbiased estimator, such as β̂2 from (1). Intuitively, (1) does not incorporate the information that β1 = 0, which is incorporated in the restricted regression.
If we compare the precision matrices instead of the covariance matrices, we see that, indeed,
X2⊤X2 − X2⊤M1 X2 = X2⊤P1 X2 = (P1 X2)⊤(P1 X2),
which is positive semi-definite, so that the precision of β̃2 is at least as great as that of β̂2, exactly as the Gauss-Markov theorem requires.
2. In asymptotic theory, the law of large numbers is used to prove the consistency of
estimators under certain conditions, and the central limit theorem is used to prove asymp-
totic normality. Give formal definitions of the properties of consistency and asymptotic
normality of the NLS estimator of the parameters of the nonlinear regression model
y = x(β) + u.
Consistency says that, if the true value of β under the DGP is β0, and β̂ denotes the NLS estimator, then
plim(n→∞) β̂ = β0.
Asymptotic normality says that n^(1/2)(β̂ − β0) converges in distribution, as n → ∞, to the normal distribution N(0, V), where V is a deterministic, finite, positive definite matrix. The NLS estimator may well
be consistent for one asymptotic construction, but not for another. But the most usual
failure of consistency is when one or more parameters are not identified. An example is
the model
y = αι + βγx + u,
where β and γ cannot be separately identified.
More generally, if it can be shown that a given estimator is consistent, of what practical
use is the further information that the estimator is also asymptotically normal?
Unless one has an approximation to the actual distribution of the estimator, or, better, the
estimation error, it is not possible to perform statistical inference. It is usually possible
to estimate the covariance matrix V , and this allows us to set up hypothesis tests and
confidence intervals.
3. A variable represented by an n-vector y is said to be centred if the sum of its elements
is zero:
∑_{t=1}^{n} yt = 0.
Consider the regression
y = α + βx + u. (4)
Denote by w the result of centring y, and by z the result of centring x, and then consider
these regressions:
w = βz + u,
w = α + βz + u,
y = α + βz + u,
y = βz + u,
y = βx + u.
For which of these regressions is the estimate of β the same as that given by (4)? Why? In
the two cases in which α is estimated, what is the connection between these estimates of α
and that given by (4)? Explain.
The FWL theorem tells us that the estimator of β from (4), and the residuals, are the
same as those obtained from the FWL regression Mι y = βMι x + Mι u, where the action
of Mι on any vector is to replace it by its centred version. The FWL regression is therefore
just w = βz + u, the first of the regressions to be considered.
For the second, once again use FWL to annihilate the constant. Since Mι is idempotent,
the result is w = βz + u, and so again the estimator of β and the residuals are the same
as those from (4).
For the third regression, use FWL again to get w = βz + u, and so the conclusion is the
same as for the first two regressions.
The estimate β̂ of β from (4) can be written as (z⊤z)−1 z⊤w. The fourth regression gives an
estimate equal to (z⊤z)−1 z⊤y, and, since z = Mι z, it follows that z⊤y = z⊤Mι y = z⊤w.
Thus the estimate of β is the same. The residuals, however, are different, because, for (4),
they are M[ι z] y, whereas here they are Mz y.
Finally, the fifth regression gives an estimate of (x⊤x)−1 x⊤y, which is quite different from
what is given by the FWL regression.
In the second regression, since the regressand and the residuals are the same as for the
first, the estimate of α must be zero.
In the third regression, the regressors ι (for α) and z are orthogonal, and so the estimate
of α is the same as that obtained by regressing y on ι alone. It is therefore just ȳ.
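These claims are easy to verify numerically; the sketch below simulates a small data set and checks that the slope estimates coincide and that the two estimated intercepts are 0 and ȳ respectively:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 30
    x = rng.standard_normal(n)
    y = 1.5 + 0.8 * x + rng.standard_normal(n)
    w, z = y - y.mean(), x - x.mean()          # centred versions
    iota = np.ones(n)

    def ols(X, y):
        return np.linalg.lstsq(X, y, rcond=None)[0]

    a4, b4 = ols(np.column_stack([iota, x]), y)      # regression (4)
    b_first = ols(z[:, None], w)[0]                   # w on z
    a2, b2 = ols(np.column_stack([iota, z]), w)       # w on constant and z
    a3, b3 = ols(np.column_stack([iota, z]), y)       # y on constant and z
    b_fourth = ols(z[:, None], y)[0]                  # y on z

    print(np.allclose([b_first, b2, b3, b_fourth], b4))   # True: same slope estimate
    print(np.isclose(a2, 0.0), np.isclose(a3, y.mean()))  # True True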
After estimating the equation
y = α + β1 x1 + β2 x2 + β3 x3 + u,
by OLS, a student observed that the regressors x1 , x2 , and x3 were linearly dependent
– the package had eliminated x3 on the grounds of collinearity with the other regressors.
Another attempt was made using centred versions of the four variables y, x1 , x2 , and x3 ,
and eliminating the constant from the regression. Was there still a problem of collinearity?
Explain. If there was, what was the linear dependency? If not, would there be a collinearity
problem if the constant was put back into the regression?
4. In the estimation of the nonlinear regression model
y = x(β) + u,
it was suspected that some of the explanatory variables in the vector x(β) of regression
functions might be correlated, even asymptotically, with the disturbances u. For that
reason, a matrix W of instrumental variables was used to perform the initial estimation of
the model. Explain how this estimation could be performed, either by writing down the
equations of which the solution is the nonlinear IV estimator β̂ IV , or by writing down a
criterion function that is minimised by β̂ IV .
The zero functions are, as usual, the residuals y − x(β), and so the method of moments
estimating equations are just
W⊤(y − x(β)) = 0.
This works only if the parameters are just identified. Otherwise, we use estimates of the
instruments that are optimal under homoskedasticity and no serial correlation: These are
the columns of PW X(β0 ). Thus the moment conditions, in which the parameters and
instruments are estimated simultaneously, are
X⊤(β) PW (y − x(β)) = 0.
These equations are also the first-order conditions for minimising the criterion function
(y − x(β))⊤ PW (y − x(β)).
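As a rough illustration, the sketch below minimises this criterion for a toy nonlinear regression function x(β) = β0 + exp(β1 x1) with a made-up instrument matrix, using scipy.optimize for the minimisation; the model and all parameter values are invented for the example:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)
    n = 200
    w1, w2 = rng.standard_normal(n), rng.standard_normal(n)
    W = np.column_stack([np.ones(n), w1, w2])          # instruments
    v = rng.standard_normal(n)
    x1 = 0.7 * w1 + v                                   # endogenous explanatory variable
    u = 0.5 * v + rng.standard_normal(n)                # correlated with x1
    beta_true = np.array([1.0, 0.5])
    y = beta_true[0] + np.exp(beta_true[1] * x1) + u

    PW = W @ np.linalg.solve(W.T @ W, W.T)              # projection on the instruments

    def criterion(beta):
        r = y - (beta[0] + np.exp(beta[1] * x1))
        return r @ PW @ r                               # (y - x(beta))' P_W (y - x(beta))

    beta_iv = minimize(criterion, x0=np.array([0.0, 0.1]), method="Nelder-Mead").x
    print(beta_iv)    # roughly beta_true in large samples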
Subsequently, the model was re-estimated by nonlinear least squares. Detail how a Durbin-
Wu-Hausman (DWH) test can be performed on the basis of these two estimations. Explain
carefully how to bootstrap this test, and explain why a (relatively) simple bootstrap pro-
cedure can be used even when instrumental variables are used.
The DWH test is based on the “vector of contrasts”, β̂IV − β̂NLS. Arguments just like those used for linear models show that this test is equivalent to a test of the null hypothesis that the regression functions are uncorrelated with the disturbances, carried out by adding the columns of PW X(β0) as extra regressors and testing whether their coefficients are zero.
In practice we replace the unknown β0 by the NLS estimate β̂NLS .
The easiest test procedure is a GNR. The GNR that corresponds to the null hypothesis,
evaluated at the NLS estimates, is
y − x̂ = X̂ b + residuals,
where x̂ ≡ x(β̂) and X̂ ≡ X(β̂), the hat referring to the NLS, not IV, estimates. Adding in the extra regressors gives the test regression
y − x̂ = X̂ b + PW X̂ c + residuals.
Usually, some columns of PW X̂ are the same as those of X̂, and so should be suppressed.
The test statistic could be a pseudo F statistic, or even nR2 , since the GNR under the
null is evaluated at the estimates under the null, and so yields zero fitted values.
For the bootstrap, the chosen test statistic is computed as above, either as an F statistic (or t, if there is only one degree of freedom), or as nR². Then the bootstrap DGP is set up so as to satisfy the null hypothesis, using estimates obtained under that hypothesis: these are the NLS estimates β̂ and s², where s² is the SSR from the NLS regression divided by n − k. Thus the bootstrap DGP can be written as
y* = x(β̂) + u*, with the elements of u* drawn independently from N(0, s²)
(or resampled from the suitably rescaled NLS residuals). Each bootstrap sample is used to recompute the test statistic in exactly the same way as for the original data, and the bootstrap P value is the proportion of bootstrap statistics more extreme than the observed one. Because the bootstrap DGP is constructed under the null, in which the regression functions are uncorrelated with the disturbances, there is no need to model the instruments or the joint distribution of the regressors and the disturbances; this is why a relatively simple bootstrap procedure remains valid even though instrumental variables were used for the initial estimation.
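A schematic version of the whole procedure, simplified to a linear regression function (so that the NLS steps reduce to OLS) and to a parametric normal bootstrap, might look as follows; all names and the data-generating set-up are invented for the illustration:

    import numpy as np

    rng = np.random.default_rng(4)
    n, B = 200, 399
    w = rng.standard_normal((n, 2))
    W = np.column_stack([np.ones(n), w])               # instruments
    v = rng.standard_normal(n)
    x_endog = w[:, 0] + v                               # possibly endogenous regressor
    X = np.column_stack([np.ones(n), x_endog])
    u = 0.3 * v + rng.standard_normal(n)                # use 0.0 * v to impose the null
    y = X @ np.array([1.0, 2.0]) + u

    PW = W @ np.linalg.solve(W.T @ W, W.T)

    def dwh_fstat(y, X, PW):
        # 'NLS' (here OLS) under the null, then the augmented GNR
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        resid = y - X @ beta
        extra = PW @ X[:, 1:]                           # drop the constant: PW iota = iota
        Z = np.column_stack([X, extra])
        fit = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
        ssr1 = np.sum((resid - fit) ** 2)
        ssr0 = np.sum(resid ** 2)                       # GNR under the null has zero fit
        q, dof = extra.shape[1], len(y) - Z.shape[1]
        return ((ssr0 - ssr1) / q) / (ssr1 / dof), beta, resid

    F_obs, beta_hat, resid_hat = dwh_fstat(y, X, PW)
    s = np.sqrt(resid_hat @ resid_hat / (n - X.shape[1]))

    # Bootstrap DGP imposes the null: y* = X beta_hat + u*, u* ~ N(0, s^2),
    # with X and W held fixed -- no need to model the instruments.
    count = 0
    for _ in range(B):
        y_star = X @ beta_hat + s * rng.standard_normal(n)
        F_star = dwh_fstat(y_star, X, PW)[0]
        count += F_star >= F_obs
    print("bootstrap P value:", (1 + count) / (1 + B))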
5. State and sketch the proof of the Gauss-Markov theorem concerning the efficiency of
the OLS estimator of the parameters of a linear regression. Be sure to state clearly the
conditions that must be satisfied in order that the conclusion of the theorem should be true.
Explain clearly the sense of the term “efficiency” in the context of the theorem.
The Gauss-Markov theorem deals with the linear regression model with exogenous explana-
tory variables and white-noise disturbances (homoskedastic and serially uncorrelated). It
says that the OLS estimator is BLUE, that is, best within the class of linear, unbiased
estimators of the parameters of the model. Here, “best”, or “efficient”, means that the
difference between the covariance matrix of any other linear unbiased estimator and that
of the OLS estimator is positive semi-definite.
Many proofs can be found. Here is one. Let the regression model be denoted as y = Xβ+u.
Let a linear estimator take the form βA ≡ A⊤y, where A is an n×k matrix (usual notation).
In order for this estimator to be unbiased, we must have for arbitrary true β that
E(A⊤y) = E(A⊤(Xβ + u)) = A⊤Xβ = β.
This implies that A⊤X = I, the k × k identity matrix. The OLS estimator can be written
as β̂OLS = (X⊤X)−1 X⊤y, and so
β̂A − β̂OLS = (A⊤ − (X⊤X)−1 X⊤) y.
Since A⊤X = I, the covariance between β̂OLS and β̂A − β̂OLS is
σ²(X⊤X)−1X⊤(A − X(X⊤X)−1) = σ²((X⊤X)−1 − (X⊤X)−1) = 0,
and consequently
Var(β̂A) − Var(β̂OLS) = Var(β̂A − β̂OLS).
The right-hand side of this equation, being a covariance matrix, is positive semi-definite,
as required.
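To see the argument concretely, one can construct an arbitrary A satisfying A⊤X = I, namely A = X(X⊤X)−1 + MX C for any n × k matrix C, and check numerically that A⊤A − (X⊤X)−1, which is proportional to the variance difference, is positive semi-definite:

    import numpy as np

    rng = np.random.default_rng(5)
    n, k = 30, 4
    X = rng.standard_normal((n, k))
    XtX_inv = np.linalg.inv(X.T @ X)
    MX = np.eye(n) - X @ XtX_inv @ X.T
    C = rng.standard_normal((n, k))
    A = X @ XtX_inv + MX @ C                    # any such A satisfies A'X = I
    print(np.allclose(A.T @ X, np.eye(k)))      # True: unbiasedness condition
    diff = A.T @ A - XtX_inv                    # proportional to Var(beta_A) - Var(beta_OLS)
    print(np.linalg.eigvalsh(diff).min() >= -1e-10)   # True: positive semi-definite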
Consider the following two regressions:
y = X1 β1 + u, and
y = X1 β1 + X2 β2 + u.
It is plain that the second regression reduces to the first if β2 = 0. Suppose that you have
the results of the OLS estimation of both of these regressions. In terms of these results,
how would you compute an F statistic that would let you test the hypothesis that β2 = 0?
Denote by SSR0 and SSR1 the sums of squared residuals from the constrained and unconstrained regressions respectively, let k2 be the number of columns of X2, and let k be the total number of regressors in the unconstrained regression. Then the F statistic is given by
F = ((SSR0 − SSR1)/k2) / (SSR1/(n − k)). (S.06)
Suppose now that the covariance matrix of the disturbances u is a general matrix Ω rather than σ²I. Is this F statistic still valid?
It is not valid. This is because (S.06) implicitly assumes that Ω = σ²I, so that the
covariance matrix of β̂ is σ 2 (X⊤X)−1 . The numerator of F , apart from degrees-of-freedom
factors, can be shown to be
y⊤M1 X2 (X2⊤M1 X2)−1 X2⊤M1 y
(for instance, use the threefold decomposition). This is distributed as chi-squared only if
Var(X2⊤M1 y) is proportional to X2⊤M1 X2 , while, with general Ω, the covariance matrix
is X2⊤M1 ΩM1 X2 .
6. In each of the following regressions, y is the dependent variable, x and z are explanatory
variables, and α, β, and γ are unknown parameters.
y = α + βx + γ/x + u,
y = α + βx + x/γ + u,
y = α + βx + z/γ + u,
y = α + βx + z/β + u,
y = α + βxz + u,
y = α + βγxz + γz + u,
y = α + βγx + γz + u,
y = α + βx + βx2 + u,
y = α + βx + γx2 + u,
y = α + βγx3 + u.
For each of these regressions, say whether it is possible to obtain a least-squares estimator
of the parameters. In other words, say whether each model is identified. If not, explain
why. If so, can the estimator be obtained by ordinary, that is, linear, least squares? If so,
give the explicit form of the linear regression to be used: write down the dependent variable
and the regressors.
The first regression is identified and can be done by OLS: regress y on ι, x, and 1/x.
The second is not identified, since it can be written as
y = α + x(β + 1/γ) + u,
so that only the combination β + 1/γ can be estimated, not β and γ separately.
The third is identified: regress y on ι, x, and z by OLS, and recover γ as the reciprocal of the coefficient on z (provided that coefficient is nonzero).
The fourth is identified, since β appears both as the coefficient of x and in the coefficient 1/β of z, but the model is genuinely nonlinear in β and so cannot be estimated by linear least squares; NLS must be used.
The fifth is straight OLS: regress y on ι and the product xz.
The sixth is identified provided γ ≠ 0: regress y on ι, xz, and z by OLS. The coefficient on z estimates γ, and dividing the coefficient on xz by it gives the estimate of β.
The seventh is exactly like the sixth, except that the second regressor is x as opposed
to xz.
The eighth is once more straight OLS: regress y on ι and x + x2 .
The ninth calls for the regression of y on ι, x, and x2 in order to get the three parameter
estimates.
The tenth is not identified, since β and γ occur only as the product βγ, and cannot be
separately identified.
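For instance, for the seventh regression the estimates can be recovered from a single OLS regression, as the following simulated sketch (with arbitrarily chosen true parameter values) illustrates:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 100
    x, z = rng.standard_normal(n), rng.standard_normal(n)
    alpha, beta, gamma = 1.0, 2.0, -0.5
    y = alpha + beta * gamma * x + gamma * z + rng.standard_normal(n)

    # Regress y on iota, x, z; the z coefficient estimates gamma,
    # and the x coefficient divided by it estimates beta.
    coefs = np.linalg.lstsq(np.column_stack([np.ones(n), x, z]), y, rcond=None)[0]
    gamma_hat = coefs[2]
    beta_hat = coefs[1] / gamma_hat
    print(coefs[0], beta_hat, gamma_hat)    # roughly alpha, beta, gamma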
7. In terms of the standard normal (N(0, 1)) distribution, give definitions of the chi-
squared (χ2 ) distribution with n degrees of freedom, the Student’s t distribution with n
degrees of freedom, and the F distribution with n and d degrees of freedom.
If the xi, i = 1, . . . , n, are mutually independent N(0, 1) variables, the variable ∑_{i=1}^{n} xi² has the χ²(n) distribution. If z is another N(0, 1) variable, independent of the xi, the variable
z / √(n⁻¹ ∑_{i=1}^{n} xi²)
has the t(n) distribution. If the wj, j = 1, . . . , d, are d further mutually independent N(0, 1) variables, independent of the xi, then
(n⁻¹ ∑_{i=1}^{n} xi²) / (d⁻¹ ∑_{j=1}^{d} wj²)
has the F (n, d) distribution.
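A quick Monte Carlo sketch of the three constructions (sample sizes and the number of replications chosen arbitrarily):

    import numpy as np

    rng = np.random.default_rng(7)
    n, d, R = 5, 8, 100_000
    x = rng.standard_normal((R, n))
    w = rng.standard_normal((R, d))
    z = rng.standard_normal(R)
    chi2 = (x ** 2).sum(axis=1)                     # chi-squared(n)
    t = z / np.sqrt(chi2 / n)                        # t(n)
    F = (chi2 / n) / ((w ** 2).sum(axis=1) / d)      # F(n, d)
    # Compare simulated means with E[chi2(n)] = n and E[F(n, d)] = d/(d - 2)
    print(chi2.mean(), n)
    print(F.mean(), d / (d - 2))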
Consider the regression
y = α + β1 x1 + β2 x2 + u. (2)
The following printout gives (some of) the results from running this regression:
(printout omitted)
What is the numerical value of n, the sample size?
The estimate of the residual variance, σ̂ 2 , is SSR/(n − k). We have SSR = 1283.780282,
k = 3, and σ̂ 2 = 22.522461. Thus
n − k = 1283.780282 / 22.522461 = 57.
It follows that n = 60.
Consider the hypothesis that β2 = 0. Calculate numerically two statistics that can be used
to test this hypothesis, a t statistic and an F statistic. Show that the F statistic is the
square of the t statistic. Do you think that the hypothesis can be rejected by a test of
level 5%?
The t statistic can be read off directly from the printout: it is 3.172400. The F statistic is
given by the formula
F = ∥PM1x2 y∥² / σ̂², (S.07)
where M1 projects off the constant and x1 , and σ̂ 2 was given above. The numerator here
is
y⊤M1 x2 (x2⊤M1 x2)−1 x2⊤M1 y. (S.08)
The key to this tricky question is to note that β̂2 = (x2⊤M1 x2 )−1 x2⊤M1 y, by FWL. This
turns (S.08) into
β̂22 (x2⊤M1 x2 ).
Now the estimate of the variance of β̂2 follows by FWL as well: it is σ̂ 2 (x2⊤M1 x2 )−1 .
But this is just the lower right element in the covariance matrix estimate above, 0.399578.
Putting all this into (S.07) gives
F = β̂2² / (σ̂²(x2⊤M1 x2)−1) = (2.005344)² / 0.399578 ≈ 10.0641,
which is indeed the square of the t statistic 3.172400.
Both the t and F statistics lead to the rejection of the hypothesis at the 5% level.
Prove algebraically, by use of the FWL Theorem or otherwise, that the square of the
t statistic and the F statistic will always be the same for the test of β2 = 0 in (2), regardless
of the actual numerical values of the variables y, x1 , and x2 .
The t statistic is
t = β̂2 / √(σ̂²(x2⊤M1 x2)−1),
so that
t² = β̂2² (x2⊤M1 x2) / σ̂².
Using FWL, β̂2 = (x2⊤M1 x2)−1 x2⊤M1 y, and so the numerator of t² is just (S.08); the whole expression for t² is therefore (S.07), the F statistic.
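The identity t² = F is easy to confirm on simulated data:

    import numpy as np

    rng = np.random.default_rng(8)
    n = 60
    x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
    y = 1.0 + 0.5 * x1 + 0.2 * x2 + rng.standard_normal(n)

    X1 = np.column_stack([np.ones(n), x1])
    M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
    X = np.column_stack([X1, x2])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2 = np.sum((y - X @ beta) ** 2) / (n - 3)

    a = x2 @ M1 @ x2
    t = beta[2] / np.sqrt(sigma2 / a)             # t statistic for beta2 = 0
    F = (y @ M1 @ x2) ** 2 / a / sigma2           # ||P_{M1 x2} y||^2 / sigma-hat^2
    print(np.isclose(t ** 2, F))                  # True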
Give the numerical value of the estimate of β1 − β2 , when the hypothesis that β2 = 0 is
not imposed. Also give the value of the estimated variance of β1 − β2 . Use the values you
have calculated to find the value of the t statistic for the hypothesis that β1 − β2 = 0. Do
the same for the hypothesis that β1 − β2 = 1.
We see that β̂1 − β̂2 = 3.411438 − 2.005344 = 1.406094. This value is a linear combination
of the parameter estimates, and its variance is estimated by (see the covariance matrix
estimate in the printout)
[1 −1] [ 0.382308 0.106485 ; 0.106485 0.399578 ] [1 −1]⊤
= 0.382308 − 2 × 0.106485 + 0.399578 = 0.568916.
The t statistic for β1 − β2 = 0 is thus 1.406094/√0.568916 = 1.864191. For β1 − β2 = 1 it
is 0.406094/√0.568916 = 0.538397.
The regression of y on a constant and x1 + x2 imposes the restriction that β1 − β2 = 0
on the unrestricted regression (2). The t statistic for this restriction is 1.864191, and
the F statistic is thus the square of this, 3.475208. Writing SSR0 for the sum of squared
residuals from the restricted regression, and SSR1 for that from the unrestricted regression,
we have that
F = (SSR0 − SSR1)/SSR1 × 57/1.
Since SSR1 = 1283.780282 and F = 3.475208, we get SSR0 − SSR1 = 3.475208 × 1283.780282/57 ≈ 78.270, so that SSR0 ≈ 1362.050.
8. Consider the linear regression model
y = α + β1 x1 + β2 x2 + u, (3)
where
ut = ρ1 ut−1 + ρ2 ut−2 + εt, (4)
with εt a white noise process.
Set up a nonlinear regression model equivalent to the model above, but with white noise
disturbances. Construct the GNR (Gauss-Newton regression) that corresponds to this
nonlinear regression.
Solving (4) for εt gives
εt = ut − ρ1 ut−1 − ρ2 ut−2
= yt − α − β1 xt1 − β2 xt2 − ρ1 (yt−1 − α − β1 xt−1,1 − β2 xt−1,2)
− ρ2 (yt−2 − α − β1 xt−2,1 − β2 xt−2,2),
so that the model with white-noise disturbances can be written as
yt = α + β1 xt1 + β2 xt2 + ρ1 (yt−1 − α − β1 xt−1,1 − β2 xt−1,2)
+ ρ2 (yt−2 − α − β1 xt−2,1 − β2 xt−2,2) + εt. (S.09)
The corresponding GNR regresses yt minus the right-hand side of (S.09) on the derivatives of that right-hand side with respect to α, β1, β2, ρ1, and ρ2:
yt − α − β1 xt1 − β2 xt2 − ρ1 (yt−1 − α − β1 xt−1,1 − β2 xt−1,2) − ρ2 (yt−2 − α − β1 xt−2,1 − β2 xt−2,2)
= (1 − ρ1 − ρ2) bα + (xt1 − ρ1 xt−1,1 − ρ2 xt−2,1) b1 + (xt2 − ρ1 xt−1,2 − ρ2 xt−2,2) b2
+ (yt−1 − α − β1 xt−1,1 − β2 xt−1,2) r1 + (yt−2 − α − β1 xt−2,1 − β2 xt−2,2) r2 + residual. (S.10)
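A schematic construction of the regressand and regressor matrix of the GNR (S.10) at given parameter values might look as follows; the array names and the simulated example are purely illustrative:

    import numpy as np

    def gnr_variables(y, x1, x2, alpha, b1, b2, r1, r2):
        # Drop the first two observations, for which no lags are available.
        yc, x1c, x2c = y[2:], x1[2:], x2[2:]
        yl1, x1l1, x2l1 = y[1:-1], x1[1:-1], x2[1:-1]
        yl2, x1l2, x2l2 = y[:-2], x1[:-2], x2[:-2]
        u1 = yl1 - alpha - b1 * x1l1 - b2 * x2l1      # lagged quasi-residuals
        u2 = yl2 - alpha - b1 * x1l2 - b2 * x2l2
        regressand = yc - alpha - b1 * x1c - b2 * x2c - r1 * u1 - r2 * u2
        Z = np.column_stack([
            np.full(len(yc), 1.0 - r1 - r2),          # derivative w.r.t. alpha
            x1c - r1 * x1l1 - r2 * x1l2,              # w.r.t. beta1
            x2c - r1 * x2l1 - r2 * x2l2,              # w.r.t. beta2
            u1,                                       # w.r.t. rho1
            u2,                                       # w.r.t. rho2
        ])
        return regressand, Z

    # Example with simulated data, evaluated at arbitrary parameter values:
    rng = np.random.default_rng(10)
    T = 50
    x1, x2 = rng.standard_normal(T), rng.standard_normal(T)
    y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.standard_normal(T)
    regressand, Z = gnr_variables(y, x1, x2, 1.0, 0.5, -0.3, 0.0, 0.0)
    step = np.linalg.lstsq(Z, regressand, rcond=None)[0]   # one Gauss-Newton step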
First, (3) was estimated by OLS:
(printout omitted)
and then the residuals of this were regressed on the constant, x1 , x2 , and the residuals
lagged once and twice, with the results:
(printout omitted)
Next, another OLS regression was performed with the second lags of all the variables except
the constant as extra regressors:
(printout omitted)
The artificial variables of the GNR corresponding to a nonlinear model based on (3) and
(4), but with ρ1 = 0, were evaluated at parameters computed from the first four parameter
estimates from the regression immediately above. The GNR was then run, with results:
(printout omitted)
Explain all this. In particular, why is it appropriate to set ρ1 = 0? Is there evidence of
first or second order serial correlation? If so, what? Can you obtain asymptotically efficient
estimates of the nonlinear model based on (3) and (4)? If so, what are they? What about
this model with the restriction that ρ1 = 0? Are there common factor restrictions imposed
in either of these models? If so, give their algebraic expression. Can you test them for
either of the models? If so, do so.
Consider first the results of the second printout, the regression of the OLS residuals from (3)
on the regressors of (3) and the first and second lags of the residuals. This can be inter-
preted as the GNR (S.10) with ρ1 = ρ2 = 0, and α, β1 , and β2 equal to their OLS estimates.
It is therefore the GNR for the model (S.09) evaluated at the parameter estimates of (3),
which can be thought of as (S.09) subject to the restrictions ρ1 = ρ2 = 0. A statistic to
test these restrictions is the F statistic based on the SSRs of the first two printouts,
F = ((SSR0 − SSR1)/k2) / (SSR1/(n − k)),
where SSR0 comes from the first printout and SSR1 from the second.
Here n = 100, k = 5, k2 = 2. One can certainly reject the null that these restrictions are
satisfied, and so there is clear evidence of first- or second-order serial correlation (or both).
Notice next that the t statistic for res1 in the second printout is −0.060027. Within the
context of the model (S.09), this completely fails to reject the hypothesis that ρ1 = 0, and
so it makes sense to impose that as a restriction.
Asymptotically efficient estimates of the model (S.09) can be obtained by taking one Gauss-
Newton step from root-n consistent estimates of these parameters. For the coefficients
in (3), we have the OLS estimates given in the first printout,
and these are root-n consistent even if there is serial correlation. For the autoregressive
parameters, appropriate estimates are given in the second printout as the coefficients of
the lagged residuals:
ρ̂1 = −0.004434, ρ̂2 = 0.724269.
In order to take the Gauss-Newton step, we have to run the GNR (S.10) with its variables
evaluated at these arguments. But no results are given for this, and so we cannot obtain
asymptotically efficient parameter estimates for the model (S.09), equivalent to that defined
by (3) and (4), on the basis of the results provided.
However, consider the model with the restriction ρ1 = 0 imposed. The model itself can be
written as
yt = α(1 − ρ2 ) + β1 xt1 + β2 xt2 + ρ2 yt−2 − ρ2 β1 xt−2,1 − ρ2 β2 xt−2,2 + εt . (S.11)
The corresponding GNR is
yt − α(1 − ρ2 ) − β1 xt1 − β2 xt2 − ρ2 yt−2 + ρ2 β1 xt−2,1 + ρ2 β2 xt−2,2
= (1 − ρ2 )bα + (xt1 − ρ2 xt−2,1 )b1 + (xt2 − ρ2 xt−2,2 )b2
+(yt−2 − α − β1 xt−2,1 − β2 xt−2,2 )r2 . (S.12)
Consider the regression the results of which are given in the third printout:
yt = γ0 + γ1 xt1 + γ2 xt2 + γ3 yt−2 + γ4 xt−2,1 + γ5 xt−2,2 . (S.13)
This is (S.11) reparametrised so that:
γ0 = α(1 − ρ2 ), γ 1 = β1 , γ2 = β2 , γ3 = ρ2 , γ4 = ρ2 β1 , γ5 = ρ2 β2 . (S.14)
But there are only four parameters in (S.11), while there are six in (S.13). Thus (S.11)
imposes two restrictions on the unrestricted (S.13). They can be expressed as
γ4 = γ3 γ1 and γ5 = γ3 γ2 .
These are the common-factor restrictions imposed in the model (S.11).
Notice now that the unrestricted model (S.13) allows us to obtain root-n consistent esti-
mates of the restricted model (S.11). The question suggests that these should be obtained
from the first four estimates from (S.13), and, from (S.14), we find that
β̂1 = γ̂1 = 0.930298, β̂2 = γ̂2 = −0.893355, ρ̂2 = γ̂3 = 0.719486,
α̂ = γ̂0 /(1 − γ̂3 ) = 10.039703/(1 − 0.719486) = 35.790381.
The fourth printout records the results of running the GNR (S.12) with its variables
evaluated at the estimates immediately above, except that a plain constant is used instead
of the correct regressor for α, namely 1 − ρ2 . For the model with the restriction ρ1 = 0
imposed, we can indeed obtain asymptotically efficient estimates by taking one Gauss-
Newton step:
β̃1 = 0.930298 + 0.069652 = 0.99995, β̃2 = −0.893355 + 0.013008 = −0.880347,
ρ̃2 = 0.719486 + 0.004628 = 0.724114,
α̃ = 35.790381 − 14.311977/(1 − 0.719486) = −15.230163.
In order to test the common-factor restrictions in (S.11), we may use the F -statistic based
on the unrestricted model (S.13) and the restricted model (S.11), using the SSRs in print-
outs 3 and 4 respectively. We get
85164.283942 − 84131.807830 98 − 6
F = × = 0.450675.
84131.807830 2
This makes it plain that we fail to reject the common-factor restrictions.
9. In the linear regression model
y = Xβ + u,
the OLS estimator is no longer consistent if the regressors X and the disturbances u are
correlated. Explain why this is so, and give an explicit expression for the asymptotic
bias. Show how to obtain a consistent estimator for the parameter vector β by the use of
instrumental variables. State the conditions that the instruments must satisfy, and show
that the estimator you suggest is indeed consistent under these conditions.
The OLS estimator is defined by the estimating equation X⊤(y − Xβ) = 0. If the DGP is y = Xβ0 + u, the left-hand side of the equation is X⊤(u − X(β − β0)), of which the expectation is E(X⊤u) − E(X⊤X)(β − β0). This expectation is zero for
β = β0 + (E(X⊤X))−1 E(X⊤u).
In the asymptotic limit, therefore, the estimator tends in probability to β0 plus the limit of (E(X⊤X))−1 E(X⊤u). This last limit is the asymptotic bias.
For an estimator using instrumental variables, the estimating equation is W⊤(y − Xβ) = 0, where W is an n × k matrix, the columns of which are the instrumental variables. The left-hand side is W⊤(u − X(β − β0)). In order for the expectation of this expression to be zero when β = β0, it is necessary and sufficient that E(W⊤u) = 0. This condition implies that the instruments are either exogenous or predetermined. Together with the requirement that plim n−1W⊤X exist and be nonsingular (the instruments must be correlated with the regressors), it guarantees that the estimator β̂IV = (W⊤X)−1W⊤y satisfies
plim β̂IV = β0 + (plim n−1W⊤X)−1 plim n−1W⊤u = β0,
provided a law of large numbers applies to n−1W⊤u, so that the estimator is indeed consistent.
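A small simulation illustrates the asymptotic bias of OLS and the consistency of the simple IV estimator (W⊤X)−1W⊤y; the data-generating process is invented for the illustration:

    import numpy as np

    rng = np.random.default_rng(9)
    n = 100_000
    w = rng.standard_normal(n)
    v = rng.standard_normal(n)
    x = w + v                               # regressor, correlated with u through v
    u = v + rng.standard_normal(n)
    y = 2.0 * x + u                         # true beta = 2

    X, W = x[:, None], w[:, None]
    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
    beta_iv = np.linalg.solve(W.T @ X, W.T @ y)
    print(beta_ols, beta_iv)                # OLS biased upwards (about 2.5); IV close to 2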