Panel Data Lecture Notes
Chapter 14:
Advanced panel data methods
Fixed effects estimators
We discussed the first difference (FD) model
as one solution to the problem of unobserved
heterogeneity in the context of panel data. It is
not the only solution; the leading alternative is
the fixed effects model, which will be a better
solution under certain assumptions.
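For a model with a single explanatory variable,
$$y_{it} = \beta_1 x_{it} + a_i + u_{it} \qquad (1)$$
averaging over time for each individual gives
$$\bar{y}_i = \beta_1 \bar{x}_i + a_i + \bar{u}_i \qquad (2)$$
and subtracting (2) from (1) removes $a_i$:
$$y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i) \qquad (3)$$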
defining the demeaned data on [y, x] as the observations of each panel with their mean values
per individual removed. This algebra is known
as the within transformation, and the estimator we derive is known as the within estimator.
Just as OLS in a cross-sectional context only
explains the deviations of y from its mean
$\bar{y}$, the within estimator's explanatory value is
derived from the comovements of y around
its individual-specific mean with x around its
individual-specific mean. Thus, it matters not
if a unit has consistently high or low values of
y and x. All that matters is how the variations
around those mean values are correlated.
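As a minimal sketch of the within transformation, the following Python snippet demeans a tiny made-up panel by unit and computes the slope from the demeaned data. The data, the function name `within_estimate` and the dictionary layout are illustrative assumptions, not part of the notes.

```python
from statistics import mean

def within_estimate(panel):
    """Slope of demeaned y on demeaned x for a dict {unit: [(x, y), ...]}."""
    num = 0.0
    den = 0.0
    for obs in panel.values():
        x_bar = mean(x for x, _ in obs)  # individual-specific mean of x
        y_bar = mean(y for _, y in obs)  # individual-specific mean of y
        for x, y in obs:
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den

# Hypothetical data: y = 2*x + a_i exactly, with a_A = 10 and a_B = -5.
panel = {
    "A": [(1.0, 12.0), (2.0, 14.0), (3.0, 16.0)],
    "B": [(4.0, 3.0), (5.0, 5.0), (7.0, 9.0)],
}
beta_within = within_estimate(panel)

# Shifting one unit's y by a constant changes its mean but not its deviations.
shifted = dict(panel)
shifted["A"] = [(x, y + 100.0) for x, y in panel["A"]]
beta_shifted = within_estimate(shifted)

print(beta_within, beta_shifted)
```

Both estimates recover the slope of 2 used to build the data (up to floating-point error), which illustrates that a unit's consistently high or low level of y plays no role in the within estimator.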
For T > 2, the FE and FD estimators yield different
results, but they are both unbiased estimators of the underlying coefficient vector. Both
are consistent with T fixed as $N \to \infty$. For
large N and small T (a common setup in many
datasets) we might be concerned with relative
efficiency. When the $u_{it}$ are serially uncorrelated (given that they are homoskedastic, this
amounts to saying they are i.i.d.) FE will be
more efficient than FD, and the standard errors reported from FE are valid. We often may
assume serially uncorrelated errors, but there
is no reason why that condition will necessarily
hold in the data. If $u_{it}$ follows a random walk
process, then its differences will be uncorrelated, and first differencing will be the appropriate estimator. But we may often encounter
an error process with some serial correlation,
but not necessarily a random walk process.
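The two polar cases can be worked out in a line each. The innovation $e_{it}$ and the variance $\sigma_u^2$ below are notation assumed for this sketch. If the error follows a random walk with i.i.d. innovations, differencing leaves only the innovation; if the error is itself i.i.d., differencing induces negative serial correlation.

```latex
% Random walk: the differenced error is serially uncorrelated.
u_{it} = u_{i,t-1} + e_{it}
  \;\Longrightarrow\; \Delta u_{it} = e_{it}

% i.i.d. error with variance \sigma_u^2: the differenced error is MA(1).
\mathrm{Var}(\Delta u_{it}) = 2\sigma_u^2, \qquad
\mathrm{Cov}(\Delta u_{it}, \Delta u_{i,t-1}) = -\sigma_u^2, \qquad
\mathrm{Corr}(\Delta u_{it}, \Delta u_{i,t-1}) = -\tfrac{1}{2}
```

Intermediate cases, such as a stationary AR(1) error, fall between these two, which is why neither estimator dominates in general.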
When T is large and N is not very large (for
instance, when we have many time periods
$$\lambda = 1 - \left[ \frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2} \right]^{1/2} \qquad (6)$$
inefficient relative to the RE alternative. Because the FE model is equivalent to the LSDV
formulation, it involves the loss of N degrees
of freedom. Given that the $a_i$ may be considered nuisance parameters, if we do not care
about their values, we might prefer to apply RE
and substantially reduce the degrees of freedom lost in estimation. This is especially important if
T is small.
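To make the role of the RE transformation weight concrete, the sketch below assumes the standard textbook form $\lambda = 1 - [\sigma_u^2 / (\sigma_u^2 + T\sigma_a^2)]^{1/2}$ and applies quasi-demeaning (subtracting $\lambda$ times the unit mean) to one unit's observations. The function names and numeric values are illustrative assumptions.

```python
from math import sqrt

def re_weight(sigma_u2, sigma_a2, T):
    """Quasi-demeaning weight: 1 - sqrt(sigma_u^2 / (sigma_u^2 + T*sigma_a^2))."""
    return 1.0 - sqrt(sigma_u2 / (sigma_u2 + T * sigma_a2))

def quasi_demean(values, lam):
    """Subtract lam times the unit mean from each observation of one unit."""
    unit_mean = sum(values) / len(values)
    return [v - lam * unit_mean for v in values]

lam_no_effect = re_weight(1.0, 0.0, 5)     # no variance in the individual effect
lam_large_effect = re_weight(1.0, 1e6, 5)  # individual effect dominates
lam_mid = re_weight(1.0, 1.0, 3)           # 1 - sqrt(1/4)

y_unit = [2.0, 4.0, 6.0]                   # hypothetical observations for one unit
y_quasi = quasi_demean(y_unit, lam_mid)

print(lam_no_effect, lam_large_effect, lam_mid, y_quasi)
```

With a weight of zero the data are left untouched, as in pooled OLS; as the weight approaches one the transformation approaches full demeaning, as in the within (FE) estimator.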
We do not know the transformation parameter $\lambda$, of course, so we must estimate it consistently. The ability to do so involves the crucial assumption that $\mathrm{cov}(x_{it}, a_i) =
0$: the unobservable individual effects must be
distributed independently of the regressors. If
our estimate of $\lambda$ is close to zero, the RE estimates will be similar to those of a pooled OLS
model. If our estimate of $\lambda$ is close to one,
the RE estimates will be similar to those of a
FE model. The RE estimator may be chosen
in Stata by giving the command xtreg depvar
If we are interested in testing the effect of a time-invariant variable, RE can yield such an estimate, but we should include all available time-invariant variables as controls to try to ensure
that the independence assumption is satisfied.
If we are interested in evaluating the effect of a
time-varying explanatory variable, can we justify the use of RE? Yes, but in realistic terms
probably only in the case where the key variable is set randomly. For instance, if students
are assigned randomly to sections of a course
or home rooms in a K-12 context, RE would
be appropriate given that the assignment variable would not be correlated with unobservables such as aptitude. On the other hand, if
students are grouped by ability or test scores
and assigned to home rooms accordingly, the
assignment variable will not be independent of
the unobservable individual aptitude, and RE
will be inconsistent.
errors are likely to be correlated with one another. The cluster covariance matrix estimator allows for error variances to differ between
clusters (but not within clusters), as well as
allowing for correlations between errors in the
same cluster (but not between clusters). Ignoring these correlations will cause estimated
standard errors to be biased and inconsistent.
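The effect of ignoring within-cluster correlation can be illustrated with a small sketch. It assumes a no-intercept model with one regressor and computes the slope variance two ways: the conventional formula and a cluster-robust sandwich that sums the scores within each cluster first. The data and function names are hypothetical, and no finite-sample correction is applied, so the numbers would not match those reported by statistical software.

```python
from collections import defaultdict

def ols_slope(x, y):
    """No-intercept OLS slope: sum(x*y) / sum(x^2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def slope_variances(x, y, cluster):
    """Return (conventional, cluster-robust) variance estimates of the slope."""
    b = ols_slope(x, y)
    resid = [yi - b * xi for xi, yi in zip(x, y)]
    sxx = sum(a * a for a in x)
    n = len(x)
    # Conventional: one common error variance, no correlation between errors.
    v_iid = (sum(e * e for e in resid) / (n - 1)) / sxx
    # Cluster-robust sandwich: sum the scores x*e within each cluster first.
    score = defaultdict(float)
    for xi, ei, g in zip(x, resid, cluster):
        score[g] += xi * ei
    v_cluster = sum(s * s for s in score.values()) / sxx ** 2
    return v_iid, v_cluster

# Hypothetical data: four clusters whose errors are identical within a cluster.
x = [1.0, 2.0, 1.0] * 4
y = [3.0] * 3 + [-3.0] * 3 + [1.0] * 3 + [-1.0] * 3
cluster = [1, 1, 1, 2, 2, 2, 7, 7, 7, 9, 9, 9]  # ids need not be consecutive

b_hat = ols_slope(x, y)
v_iid, v_cluster = slope_variances(x, y, cluster)
print(b_hat, v_iid, v_cluster)
```

In this constructed example the cluster-robust variance is more than twice the conventional one, because the twelve observations carry roughly the information of four independent draws.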
The cluster-robust estimator may be invoked in regress and many other
commands with the , cluster(id) option, where
id specifies the name of an integer variable
denoting cluster membership. The values of
id need not be consecutive. When estimating cluster-robust standard errors, it is important that
there are more clusters than regressors in the
model. In practical terms, this rules out the
case that a panel identifier is specified as the
cluster id and individual-specific constant terms
are estimated. However, that does not rule out
use of the cluster option in a FE model because
Unlike the fixed effects and random effects estimators, whose large-sample justification is based
on small T, large N datasets as $N \to \infty$, the
SUR estimator is based on the large-sample
properties of large T, small N datasets as
$T \to \infty$. In that context, it may be considered
a multiple time series estimator.
Equation i of the SUR model is:
$$y_i = X_i \beta_i + \epsilon_i, \quad i = 1, \dots, N \qquad (8)$$
where $y_i$ is the ith equation's dependent variable and $X_i$ is the matrix of regressors for the
ith equation, on which we have T observations.
The disturbance process $\epsilon = [\epsilon_1', \epsilon_2', \dots, \epsilon_N']'$ is
assumed to have an expectation of zero and
a covariance matrix of $\Omega$. We will only consider the case where we have T observations
per equation, although it is feasible to estimate
the model with an unbalanced panel. Note also
that although each Xi matrix will have T rows,
and
$$\Omega^{-1} = \Sigma^{-1} \otimes I \qquad (11)$$
When will this estimator provide a gain in efficiency over equation-by-equation OLS? First,
if the $\sigma_{ij}$, $i \neq j$, are actually zero, there is no
gain. Second, if the $X_i$ matrices are identical
across equations (not merely having the same
variable names, but containing the same numerical values), then GLS is identical to equation-by-equation OLS, and there is no gain. Beyond these cases, the gain in efficiency depends on the magnitude of the cross-equation
contemporaneous correlations of the residuals.
The higher those correlations, the greater
the gain. Furthermore, if the columns of the $X_i$
matrices are highly correlated across equations,
the gains will be smaller.
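The identical-regressors case can be verified with a short derivation. It assumes the usual GLS form of the estimator and writes the common $T \times k$ regressor matrix as $X_0$ (notation introduced here for the sketch), so that the stacked regressor matrix is $X = I_N \otimes X_0$.

```latex
\begin{align*}
\hat{\beta}_{GLS}
  &= \left[ X' (\Sigma^{-1} \otimes I) X \right]^{-1} X' (\Sigma^{-1} \otimes I)\, y \\
  &= \left[ (I_N \otimes X_0') (\Sigma^{-1} \otimes I) (I_N \otimes X_0) \right]^{-1}
     (I_N \otimes X_0') (\Sigma^{-1} \otimes I)\, y \\
  &= \left[ \Sigma^{-1} \otimes X_0' X_0 \right]^{-1} (\Sigma^{-1} \otimes X_0')\, y \\
  &= \left[ \Sigma \otimes (X_0' X_0)^{-1} \right] (\Sigma^{-1} \otimes X_0')\, y \\
  &= \left[ I_N \otimes (X_0' X_0)^{-1} X_0' \right] y
\end{align*}
```

The last line applies the OLS formula to each equation separately: $\Sigma$ cancels, so the cross-equation covariances contribute nothing when every equation has the same regressors.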
The feasible SUR estimator requires a consistent estimate of $\Sigma$, the $N \times N$ contemporaneous covariance matrix of the equations' disturbance processes. The representative element $\sigma_{ij}$, the contemporaneous correlation be-
data are set up in the long format more commonly used with panel data, the reshape command may be used to place them in the wide
format. It is an attractive estimator relative to
pooled OLS, or even in comparison with fixed
effects, in that SUR allows each unit to have
its own coefficient vector. Not only does the constant term differ from unit to unit, but each
of the slope parameters differs across
units as well, as does the error variance $\sigma^2$, which is constrained to be
equal across units in pooled OLS, fixed effects
or random effects estimators.
Standard F-tests may be used to compare the
unrestricted SUR results with those that may
be generated in the presence of linear constraints, such as cross-equation restrictions (see
constraint). Cross-equation constraints correspond to the restriction that a particular regressor's effect is the same for each panel unit.
The isure option may be used to iterate the
estimates, as described above.
(14)
to some degree, their coefficients may be seriously biased as well. Note also that this bias
is not caused by an autocorrelated error process. The bias arises even if the error process is i.i.d. If the error process is autocorrelated, the problem is even more severe given
the difficulty of deriving a consistent estimate
of the AR parameters in that context. The
same problem affects the one-way random effects model. The $u_i$ error component enters
every value of $y_{it}$ by assumption, so that the
lagged dependent variable cannot be independent of the composite error process.
A solution to this problem involves taking first
differences of the original model. Consider a
model containing a lagged dependent variable
and a single regressor X:
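$$y_{it} = \beta_1 + \rho y_{i,t-1} + X_{it} \beta_2 + u_i + \epsilon_{it} \qquad (15)$$
The first difference transformation removes both the constant term and the individual effect:
$$\Delta y_{it} = \rho \Delta y_{i,t-1} + \Delta X_{it} \beta_2 + \Delta \epsilon_{it} \qquad (16)$$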
There is still correlation between the differenced lagged dependent variable and the disturbance process (which is now a first-order
moving average process, or MA(1)): the former contains $y_{i,t-1}$ and the latter contains $\epsilon_{i,t-1}$.
But with the individual fixed effects swept out,
a straightforward instrumental variables estimator is available. We may construct instruments for the lagged dependent variable from
the second and third lags of y, either in the
form of differences or lagged levels. If $\epsilon$ is i.i.d.,
those lags of y will be highly correlated with the
lagged dependent variable (and its difference)
but uncorrelated with the composite error process. Even if we had reason to believe that $\epsilon$
might be following an AR(1) process, we could
still follow this strategy, backing off one period and using the third and fourth lags of y
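The bias and the instrumental-variables remedy can be illustrated by simulation. The sketch below assumes a stripped-down version of the model with no X regressors and no constant, i.i.d. standard normal disturbances, and a true autoregressive coefficient of 0.5. It compares the within (FE) estimate with a first-difference IV estimate that uses the second lag of the level as the instrument (an Anderson-Hsiao-style estimator). All function names, the seed and the sample sizes are illustrative choices.

```python
import random

def simulate_panel(n_units, n_periods, rho, seed=12345, burn_in=20):
    """Simulate y_it = rho*y_i,t-1 + u_i + eps_it with i.i.d. N(0,1) eps."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        u_i = rng.uniform(-1.0, 1.0)   # unobserved individual effect
        y = u_i / (1.0 - rho)          # start at the unit's long-run mean
        path = []
        for t in range(burn_in + n_periods):
            y = rho * y + u_i + rng.gauss(0.0, 1.0)
            if t >= burn_in:
                path.append(y)
        panel.append(path)
    return panel

def within_rho(panel):
    """Within (FE) estimate of rho: demean y_t and y_{t-1} by unit."""
    num = den = 0.0
    for path in panel:
        cur, lag = path[1:], path[:-1]
        cur_bar = sum(cur) / len(cur)
        lag_bar = sum(lag) / len(lag)
        for c, l in zip(cur, lag):
            num += (l - lag_bar) * (c - cur_bar)
            den += (l - lag_bar) ** 2
    return num / den

def iv_rho(panel):
    """First-difference IV estimate of rho, instrumenting with y_{i,t-2}."""
    num = den = 0.0
    for path in panel:
        for t in range(2, len(path)):
            z = path[t - 2]                    # lagged level instrument
            d_y = path[t] - path[t - 1]        # differenced dependent variable
            d_lag = path[t - 1] - path[t - 2]  # differenced lagged dependent variable
            num += z * d_y
            den += z * d_lag
    return num / den

panel = simulate_panel(n_units=4000, n_periods=6, rho=0.5)
rho_fe = within_rho(panel)
rho_iv = iv_rho(panel)
print(rho_fe, rho_iv)
```

With only six periods per unit, the within estimate should fall well below 0.5 even though the disturbances are i.i.d., while the IV estimate should land near the true value up to sampling error.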
(17)
where $X_{it}$ includes strictly exogenous regressors, $W_{it}$ are predetermined regressors (which
may include lags of y) and endogenous regressors, all of which may be correlated with $u_i$, the
unobserved individual effect. First-differencing
the equation removes the $u_i$ and its associated
omitted-variable bias. The Arellano-Bond estimator sets up a generalized method of moments (GMM) problem in which the model is
specified as a system of equations, one per
time period, where the instruments applicable
to each equation differ (for instance, in later
time periods, additional lagged values of the
instruments are available). The instruments
include suitable lags of the levels of the endogenous variables (which enter the equation
in differenced form) as well as the strictly exogenous regressors and any others that may be
specified. This estimator can easily generate
an immense number of instruments, since by
period $\tau$ all lags prior to, say, $(\tau - 2)$ might be
individually considered as instruments. If T is
nontrivial, it is often necessary to employ the
option which limits the maximum lag of an instrument to prevent the number of instruments
from becoming too large. This estimator is
available in Stata as xtabond.
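The growth in the instrument count is easy to tabulate. The sketch below counts only the lagged-level instruments for the lagged dependent variable, assuming periods are numbered 1 to T and that the differenced equation for period t can use y from periods 1 through t-2. The function name and the `max_lag` parameter are hypothetical and do not correspond to any particular Stata option.

```python
def lagged_level_instruments(n_periods, max_lag=None):
    """Count lagged-level instruments for the lagged dependent variable.

    Periods are numbered 1..n_periods. The differenced equation for period t
    (t = 3..n_periods) can use y at periods 1..t-2 as instruments. `max_lag`
    is a hypothetical cap on how many of the most recent lags to keep.
    """
    total = 0
    for t in range(3, n_periods + 1):
        available = t - 2
        if max_lag is not None:
            available = min(available, max_lag)
        total += available
    return total

counts = {T: lagged_level_instruments(T) for T in (5, 10, 20)}
capped = lagged_level_instruments(20, max_lag=3)
print(counts, capped)
```

Without a cap the count grows quadratically in T, as (T-1)(T-2)/2; capping the lag depth makes it grow only linearly.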
A potential weakness in the Arellano-Bond DPD
estimator was revealed in later work by Arellano and Bover (1995) and Blundell and Bond
(1998). The lagged levels are often rather
poor instruments for first differenced variables,
especially if the variables are close to a random
walk. Their modification of the estimator includes lagged levels as well as lagged differences. The original estimator is often called
difference GMM, while the expanded estimator
is commonly termed System GMM. The cost
of the System GMM estimator involves a set of
additional restrictions on the initial conditions
of the process generating y.