LN 13
Panel Data
1 Introduction
A major theme in our discussion of causal effects and structural equations modeling is that
causal or structural parameters may not correspond to parameters associated with condi-
tional mean functions or best linear predictors. We looked for situations where the treat-
ment assignment was not “confounded” with the potential outcomes, or where we could find
instruments—exogenous variables which satisfied exclusion restrictions, but which were related
to the included endogenous variables—as sources of identification in such cases. Panel data
provide an alternative route to identification of structural parameters, which complements
these other techniques.
Consider a firm which produces output using a technology described by a Cobb-Douglas pro-
duction function:
Y(K, L) = A K^{γ_1} L^{γ_2}.    (1)
Here K is capital input, L is labor input, and Y is output. If γ1 +γ2 < 1 (so there are decreasing
returns to scale), we might suppose that the firm chooses capital and labor to maximize profits,
taking output price py and input prices pk , pl as given. In this case, factor demands for K
and L will be increasing in A, since firms will hire inputs until their marginal revenue product
equals their price.
We are interested in connecting this economic model to empirical data on firms. We will be
working with data based on many firms, so we need to specify the production functions for
each firm. Suppose we can write, for i = 1, . . . , n,
Y_i(K, L) = A_i K^{γ_1} L^{γ_2}.
This expresses the notion that each firm has a Cobb-Douglas production function with common
coefficients γ1 , γ2 . However, there are differences in how efficient the firms are, which arises
from variation in Ai across firms.
Here, K, L are not “data” but are simply arguments in the function Yi (K, L). (This is much
like the supply-and-demand example we considered in the last lecture note.) We will use Ki and
Li to denote the amounts of capital and labor actually chosen by the firm, and Yi to denote
the actual output of the firm. Then the Cobb-Douglas model implies that
Y_i = A_i K_i^{γ_1} L_i^{γ_2},

or taking logs:

log Y_i = log A_i + γ_1 log K_i + γ_2 log L_i.
To simplify notation, let us define y_i ≡ log Y_i, and similarly a_i ≡ log A_i, k_i ≡ log K_i, and l_i ≡ log L_i. Then we can write

y_i = b + γ_1 k_i + γ_2 l_i + u_i,

where b ≡ E(a_i), and u_i = a_i − b. This looks like a classical regression model for y_i given k_i and l_i. However, in the classical regression model, the disturbance is assumed to satisfy

E(u_i | k_i, l_i) = 0.
So we would be assuming that u_i is mean-independent of k_i and l_i. But recall that under price-taking and profit-maximization, we would expect k_i and l_i to be related quite strongly to efficiency a_i, and hence to u_i. In this case, simple OLS will lead to biased and inconsistent estimates of the structural parameters.
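To see the problem concretely, here is a minimal simulation sketch in Python/numpy. The factor-demand equations and all parameter values below are hypothetical, chosen only so that more efficient firms (higher a_i) choose more capital and labor:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
gamma1, gamma2, b = 0.3, 0.5, 1.0

# Log efficiency a_i; u_i = a_i - E(a_i) is the regression disturbance.
a = b + rng.normal(size=n)

# Hypothetical (reduced-form) factor demands: inputs load positively on a_i,
# so k_i and l_i are correlated with the disturbance u_i.
k = 0.8 * a + rng.normal(size=n)
l = 0.6 * a + rng.normal(size=n)

# Log production function: y_i = b + gamma1*k_i + gamma2*l_i + u_i
y = a + gamma1 * k + gamma2 * l

# OLS of y on (1, k, l): the input coefficients are biased upward
X = np.column_stack([np.ones(n), k, l])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_hat[1:])   # well above the true (0.3, 0.5)
```

With this particular design the OLS probability limits work out to roughly (0.7, 0.8) rather than (0.3, 0.5), illustrating the inconsistency.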
One possible solution emerges if the firms are observed in multiple time periods. Suppose for
each firm, we observe output and measured inputs in each of T years. We will denote these
observations by (y_it, k_it, l_it), for i = 1, . . . , n, t = 1, . . . , T. This is an example of panel data.
In general, the term panel data refers to any data with a natural grouping structure. Another
example of panel data is data on earnings and other variables for each sibling in a family, for
a large number of families. Suppose that our previous model continues to hold, so that

y_it = a_it + γ_1 k_it + γ_2 l_it.

Here a_it is interpreted as a measure of firm i's efficiency at time t. If we write a_it ≡ α_i + u_it, then we can write our model as

y_it = α_i + γ_1 k_it + γ_2 l_it + u_it,    E(u_it | l, k, α_1, . . . , α_n) = 0.
Then the model is in the form of a classical regression model, except that there is a different
intercept term for each firm. The connection is even stronger if we define dummy variables

d_it,j = 1 if i = j, and 0 otherwise.

Write x_it ≡ (k_it, l_it)', d_it ≡ (d_it,1, . . . , d_it,n)', γ ≡ (γ_1, γ_2)', and α ≡ (α_1, . . . , α_n)'. Then

y_it = x'_it γ + d'_it α + u_it.
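As a concrete aside, the dummy vectors d_it can be built directly from the firm index. A minimal numpy sketch (the panel dimensions are hypothetical):

```python
import numpy as np

n, T = 3, 2                        # hypothetical: 3 firms, 2 periods each
firm = np.repeat(np.arange(n), T)  # firm index i for each (i, t) row

# d_{it,j} = 1 if i = j, else 0: one dummy column per firm
D = (firm[:, None] == np.arange(n)[None, :]).astype(float)
print(D.shape)  # (6, 3): nT rows, n dummy columns
```

Each row of D has exactly one 1, in the column of the firm that generated that observation.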
3 Fixed Effects
Our model is

E(y_it | X) = x'_it γ + d'_it α,    (2)

where x_it is a k × 1 vector of regressors (which does not include a constant), d_it is an n × 1 vector of dummy variables as defined above, and X is interpreted to contain all the regressors and the dummy variables. Let β ≡ (γ', α')', and

X = \begin{pmatrix}
x'_{11} & 1 & 0 & \cdots & 0 \\
\vdots  &   &   &        &   \\
x'_{1T} & 1 & 0 & \cdots & 0 \\
x'_{21} & 0 & 1 & \cdots & 0 \\
\vdots  &   &   &        &   \\
x'_{2T} & 0 & 1 & \cdots & 0 \\
\vdots  &   &   &        &   \\
x'_{n1} & 0 & 0 & \cdots & 1 \\
\vdots  &   &   &        &   \\
x'_{nT} & 0 & 0 & \cdots & 1
\end{pmatrix},
\qquad
y = \begin{pmatrix}
y_{11} \\ \vdots \\ y_{1T} \\ y_{21} \\ \vdots \\ y_{2T} \\ \vdots \\ y_{n1} \\ \vdots \\ y_{nT}
\end{pmatrix}.
The least-squares estimate β̂ is often called the “fixed effects” estimate, or FE for short.
Another name is the least-squares dummy-variables (LSDV) estimator. It is typically the case
that n, the cross-sectional dimension, is large relative to T . In this case, X can contain a large
number of columns because there are many dummy variables. This means that X 0 X is a large
matrix, possibly difficult to invert on a computer with limited memory. The following results
can be used to simplify the calculations:
Result 1:

γ̂ = (X'_w X_w)^{-1} X'_w y_w,

and

Var(γ̂) = σ² (X'_w X_w)^{-1},
where

X_w = \begin{pmatrix}
(x_{11} - x̄_1)' \\ \vdots \\ (x_{1T} - x̄_1)' \\ \vdots \\ (x_{n1} - x̄_n)' \\ \vdots \\ (x_{nT} - x̄_n)'
\end{pmatrix},
\qquad
y_w = \begin{pmatrix}
y_{11} - ȳ_1 \\ \vdots \\ y_{1T} - ȳ_1 \\ \vdots \\ y_{n1} - ȳ_n \\ \vdots \\ y_{nT} - ȳ_n
\end{pmatrix},

and

x̄_i = (1/T) Σ_{t=1}^{T} x_it,    ȳ_i = (1/T) Σ_{t=1}^{T} y_it.
Stated in this form, the estimator is often called the within estimator, because it is based
on deviations from within-firm averages. The interpretation of this result is that in order to
get the least-squares estimates for γ, one can perform the shorter regression given above. To
obtain an estimate of the variance, the following is useful:
Result 2:
y_it − x'_it γ̂ − α̂_i = (y_it − ȳ_i) − (x_it − x̄_i)' γ̂.

Thus

s² = (1/(nT − k − n)) (y_w − X_w γ̂)' (y_w − X_w γ̂).
Notice that this is different from the variance estimate that a "canned" least-squares routine applied to the previous short regression would produce: such a routine would divide by nT − k, failing to account for the n estimated firm means.
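Results 1 and 2 can be checked numerically. Below is a minimal sketch in Python/numpy with a hypothetical data-generating process in which the firm effects are correlated with the regressors; the within estimate recovers γ and coincides with the LSDV coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, k = 200, 5, 2
gamma = np.array([0.3, 0.5])

alpha = rng.normal(size=n)                             # firm effects
x = alpha[:, None, None] + rng.normal(size=(n, T, k))  # regressors correlated with alpha
u = rng.normal(size=(n, T))
y = x @ gamma + alpha[:, None] + u

# Within transformation: deviations from firm means
xw = (x - x.mean(axis=1, keepdims=True)).reshape(n * T, k)
yw = (y - y.mean(axis=1, keepdims=True)).reshape(n * T)
gamma_w = np.linalg.solve(xw.T @ xw, xw.T @ yw)

# LSDV: regress y on x and the n firm dummies
D = np.kron(np.eye(n), np.ones((T, 1)))
Z = np.hstack([x.reshape(n * T, k), D])
beta = np.linalg.lstsq(Z, y.reshape(n * T), rcond=None)[0]
print(np.allclose(gamma_w, beta[:k]))  # True: within = FE/LSDV (Result 1)

# Result 2: residual variance with the correct degrees of freedom
resid = yw - xw @ gamma_w
s2 = resid @ resid / (n * T - k - n)
```

The `np.allclose` check holds up to floating point: by the Frisch–Waugh–Lovell theorem, partialling out the firm dummies is the same as demeaning within firms.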
4 Random Effects
There is an alternative approach to working with panel data models which connects nicely to
GLS estimation. For simplicity assume all variables are measured in deviations from (grand)
means. (Otherwise we could let the vector xit include a constant and all of what follows would
go through.) Assume that the α_i are i.i.d. with mean zero and variance σ_α², independent of the u_it and of the regressors. Stacking the observations, we can write

y = X_1 γ + ε,

where y is as defined before and

X_1 = \begin{pmatrix}
x'_{11} \\ \vdots \\ x'_{1T} \\ \vdots \\ x'_{n1} \\ \vdots \\ x'_{nT}
\end{pmatrix},
\qquad
ε = \begin{pmatrix}
α_1 + u_{11} \\ \vdots \\ α_1 + u_{1T} \\ \vdots \\ α_n + u_{n1} \\ \vdots \\ α_n + u_{nT}
\end{pmatrix}
= \begin{pmatrix}
α_1 l \\ \vdots \\ α_n l
\end{pmatrix} + u,

where l denotes a T × 1 vector of ones. Then

E(ε | X) = 0,

and

V(ε | X) = \begin{pmatrix}
σ_α² ll' & 0 & \cdots & 0 \\
0 & σ_α² ll' & & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & σ_α² ll'
\end{pmatrix} + σ² I_{nT} ≡ Ω.
We can think of this as an “error components” model in that there is a composite error term
arising from the αi and the uit ; this gives the variance matrix a particular correlation structure.
In any case, now we have a generalized regression model with variance matrix Ω. It would be natural to apply GLS. This leads to the estimator

γ̂_GLS = (X'_1 Ω^{-1} X_1)^{-1} X'_1 Ω^{-1} y.

The GLS estimator in this model is called the random effects (RE) estimator. It can be shown to have the following form, which leads to some additional insight.
Result 3:

γ̂_GLS = (X'_w X_w + r X'_b X_b)^{-1} (X'_w X_w γ̂_w + r X'_b X_b γ̂_b),

where X_w and y_w are the deviations from means as defined before,

X_b = \begin{pmatrix}
x̄'_1 \\ \vdots \\ x̄'_1 \\ \vdots \\ x̄'_n \\ \vdots \\ x̄'_n
\end{pmatrix},
\qquad
y_b = \begin{pmatrix}
ȳ_1 \\ \vdots \\ ȳ_1 \\ \vdots \\ ȳ_n \\ \vdots \\ ȳ_n
\end{pmatrix}

(each firm's mean repeated T times), γ̂_w and γ̂_b are the within and between estimators (the least-squares coefficients from regressing y_w on X_w and y_b on X_b, respectively), and

r = σ² / (σ² + T σ_α²).
In order to implement feasible GLS in the RE model, we only need estimates of σ 2 and σα2 .
With these we can form an estimate of Ω; alternatively, we can form an estimate of r and use
the between and within estimates as in Result 3.
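Since Result 3 is an algebraic identity, it can be verified on arbitrary data once σ² and σ_α² are fixed. A minimal numpy sketch (the variance values are hypothetical), using r = σ²/(σ² + Tσ_α²):

```python
import numpy as np

rng = np.random.default_rng(3)
n, T, k = 40, 3, 2
sigma_a2, sigma2 = 2.0, 1.0   # hypothetical variance components

x = rng.normal(size=(n * T, k))
y = rng.normal(size=n * T)    # any y will do: Result 3 is an identity

# Direct GLS with the block-diagonal Omega
ll = np.ones((T, T))
Omega = np.kron(np.eye(n), sigma_a2 * ll) + sigma2 * np.eye(n * T)
Oinv = np.linalg.inv(Omega)
g_direct = np.linalg.solve(x.T @ Oinv @ x, x.T @ Oinv @ y)

# Result 3: combine within and between with r = sigma^2 / (sigma^2 + T*sigma_a^2)
xr, yr = x.reshape(n, T, k), y.reshape(n, T)
xw = (xr - xr.mean(1, keepdims=True)).reshape(n * T, k)
yw = (yr - yr.mean(1, keepdims=True)).reshape(n * T)
xb = np.repeat(xr.mean(1), T, axis=0)   # each firm mean repeated T times
yb = np.repeat(yr.mean(1), T)
gw = np.linalg.solve(xw.T @ xw, xw.T @ yw)
gb = np.linalg.solve(xb.T @ xb, xb.T @ yb)
r = sigma2 / (sigma2 + T * sigma_a2)
g_result3 = np.linalg.solve(xw.T @ xw + r * (xb.T @ xb),
                            xw.T @ xw @ gw + r * (xb.T @ xb) @ gb)
print(np.allclose(g_direct, g_result3))  # True
```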
An easy way to estimate σ 2 is to just use the variance estimate σ̂ 2 arising from the FE estimator.
Based on Result 2, this can be obtained using the within estimator.
To estimate σα2 , we can use the between estimates. Notice that the equation being estimated
by the between estimator is
ȳ_i = x̄'_i γ + v_i,
where
v_i ≡ α_i + ū_i.
The variance of the disturbance term is

σ_v² = σ_α² + σ²/T.
Thus, if we use the residuals from the between estimates to form a variance estimate σ̂_v², we can obtain an estimate of σ_α² by

σ̂_α² = σ̂_v² − σ̂²/T.
One might expect that the random effects estimator is superior to the fixed effects estimator.
After all, it is the GLS estimator; moreover, the previous discussion shows that the fixed effects estimator is a limiting case of RE, corresponding to situations where the variation in the individual effects is large (as σ_α² grows, r → 0 and the RE estimator of Result 3 approaches the within estimator). Since the feasible version can actually estimate the variance of the
individual effects, this would seem preferable to assuming it is arbitrarily large. However, there
is a very strong assumption built into the random effects estimator: the assumption that the disturbances, including the α_i, are orthogonal to the explanatory variables. Going back
to our production function example, this was exactly the case we wanted to avoid. So the
RE estimator may not be appropriate for that case; in other applications where the omitted
variables interpretation of αi is less relevant, this may be less of an issue.