
Economics 625, LN 13:

Panel Data

November 12, 2001

1 Introduction

A major theme in our discussion of causal effects and structural equations modeling is that
causal or structural parameters may not correspond to parameters associated with condi-
tional mean functions or best linear predictors. We looked for situations where the treat-
ment assignment was not “confounded” with the potential outcomes, or where we could find
instruments—exogenous variables which satisfied exclusion restrictions, but which were related
to the included endogenous variables—as sources of identification in such cases. Panel data
provide an alternative route to identification of structural parameters, which complements
these other techniques.

2 Cobb-Douglas Production Function

2.1 Cross-sectional Analysis

Consider a firm which produces output using a technology described by a Cobb-Douglas pro-
duction function:
Y(K, L) = A K^γ1 L^γ2.    (1)
Here K is capital input, L is labor input, and Y is output. If γ1 +γ2 < 1 (so there are decreasing
returns to scale), we might suppose that the firm chooses capital and labor to maximize profits,
taking output price py and input prices pk , pl as given. In this case, factor demands for K
and L will be increasing in A, since firms will hire inputs until their marginal revenue product
equals their price.
We are interested in connecting this economic model to empirical data on firms. We will be
working with data based on many firms, so we need to specify the production functions for each firm. Suppose we can write, for i = 1, . . . , n,

Yi(K, L) = Ai K^γ1 L^γ2.

This expresses the notion that each firm has a Cobb-Douglas production function with common coefficients γ1, γ2. However, the firms differ in how efficient they are, which is captured by variation in Ai across firms.
Here, K, L are not "data" but are simply arguments in the function Yi(K, L). (This is much like the supply-demand example we considered in the last lecture note.) We will use Ki and
Li to denote the amounts of capital and labor actually chosen by the firm, and Yi to denote
the actual output of the firm. Then the Cobb-Douglas model implies that

Yi = Ai Ki^γ1 Li^γ2,

or taking logs:
log Yi = log Ai + γ1 log Ki + γ2 log Li .
To simplify notation, let us define yi ≡ log Yi , and similarly for Ai , Ki , and Li . Then we can
write
yi = b + γ1 ki + γ2 li + ui ,
where b ≡ E(ai ), and ui = ai − b. This looks like a classical regression model for yi given ki
and li . However, in the classical regression model, the disturbance is assumed to satisfy

E(ui |k, l) = 0.

So we would be assuming that ui is mean-independent of ki and li . But recall that under price-
taking and profit-maximization, we would expect that ki and li are related quite strongly to
efficiency ai and hence to ui . In this case, doing simple OLS will lead to biased and inconsistent
estimates of the structural parameters.
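To see the endogeneity problem concretely, here is a small simulation sketch (numpy; all parameter values are made up) in which more efficient firms hire more of both inputs, so OLS overstates γ1 and γ2:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
gamma1, gamma2 = 0.3, 0.5            # "true" coefficients (made up)

# a_i = log A_i: more efficient firms hire more of both inputs, so the
# regressors are correlated with the disturbance u_i = a_i - E(a_i)
a = rng.normal(0.0, 1.0, n)
k = 0.8 * a + rng.normal(0.0, 1.0, n)    # capital responds to efficiency
l = 0.6 * a + rng.normal(0.0, 1.0, n)    # labor responds to efficiency
y = a + gamma1 * k + gamma2 * l          # log output

# OLS of y on (1, k, l) ignores the correlation and is inconsistent
X = np.column_stack([np.ones(n), k, l])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

With these illustrative numbers both slope estimates converge to values above the true γ's, since efficiency loads positively on both inputs.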

2.2 The Role of Panel Data

One possible solution emerges if the firms are observed in multiple time periods. Suppose for
each firm, we observe output and measured inputs in each of T years. We will denote these
observations by (yit , kit , lit ), for i = 1, . . . , n, t = 1, . . . , T . This is an example of panel data.
In general, the term panel data refers to any data with a natural grouping structure. Another
example of panel data is data on earnings and other variables for each sibling in a family, for
a large number of families. Suppose that our previous model continues to hold, so that

yit = ait + γ1 kit + γ2 lit .

Here ait is interpreted as a measure of firm i’s efficiency at time t. If we write ait ≡ αi + uit ,
then we can write our model as

yit = αi + γ1 kit + γ2 lit + uit .

We could interpret αi as capturing firm-specific inputs, such as management quality, which do not change over time. We might then assume that

E(uit |l, k, α1 , . . . , αn ) = 0.

Then the model is in the form of a classical regression model, except that there is a different
intercept term for each firm. The connection is even stronger if we define dummy variables
dit,j = 1 if i = j,  and 0 otherwise.

Write xit ≡ (kit, lit)′, dit ≡ (dit,1, . . . , dit,n)′, γ ≡ (γ1, γ2)′ and α ≡ (α1, . . . , αn)′. Then

E(yit | X) = xit′ γ + dit′ α,

which is in the form of the classical regression model.
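With observations stacked firm by firm, the matrix of firm dummies is just a Kronecker product; a minimal numpy sketch with illustrative dimensions:

```python
import numpy as np

n, T = 4, 3   # small panel for illustration
# Dummy matrix D: the row for observation (i, t) has a 1 in column j
# exactly when j == i. Stacked firm by firm, that is I_n "stretched"
# over T rows per firm:
D = np.kron(np.eye(n), np.ones((T, 1)))   # shape (n*T, n)
```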


It is important to note the rather strong assumptions that we have made use of so far. The
strongest is that only the constant part of unmeasured efficiency ait = αi + uit is relevant for
the firm’s choice of kit and lit . This might be reasonable if uit represents factors which cannot
be predicted at the time the firm chooses its factor inputs. For example, if we are looking at
farms’ production of some crop, this could be affected by changes in weather which are not
anticipated at the time the inputs are chosen. But if this assumption is not reasonable, then
it may be difficult to estimate the parameters of the production function reliably.

3 Fixed Effects

Our model is
E(yit | X) = xit′ γ + dit′ α,    (2)

where xit is a k × 1 vector of regressors (which does not include a constant), dit is an n × 1 vector of dummy variables as defined above, and X is interpreted to contain all the regressors and the dummy variables. Let β ≡ (γ′, α′)′, and

      x11′  1  0  ···  0            y11
       ⋮    ⋮  ⋮       ⋮             ⋮
      x1T′  1  0  ···  0            y1T
      x21′  0  1  ···  0            y21
       ⋮    ⋮  ⋮       ⋮             ⋮
X =   x2T′  0  1  ···  0  ,   y =   y2T  .
       ⋮    ⋮  ⋮       ⋮             ⋮
      xn1′  0  0  ···  1            yn1
       ⋮    ⋮  ⋮       ⋮             ⋮
      xnT′  0  0  ···  1            ynT

Then the least squares estimator is

β̂ = (γ̂′, α̂′)′ = (X′X)⁻¹ X′y,

with corresponding variance estimator

s² = (1/(nT − k − n)) (y − Xβ̂)′(y − Xβ̂) = (1/(nT − k − n)) Σ_{i=1}^n Σ_{t=1}^T (yit − xit′γ̂ − α̂i)².

The least-squares estimate β̂ is often called the "fixed effects" estimate, or FE for short. Another name is the least-squares dummy-variable (LSDV) estimator. It is typically the case
that n, the cross-sectional dimension, is large relative to T . In this case, X can contain a large
number of columns because there are many dummy variables. This means that X 0 X is a large
matrix, possibly difficult to invert on a computer with limited memory. The following results
can be used to simplify the calculations:

Result 1:
γ̂ = (Xw′Xw)⁻¹ Xw′yw,

and

Var(γ̂) = σ² (Xw′Xw)⁻¹,

where

       (x11 − x̄1)′             y11 − ȳ1
            ⋮                       ⋮
       (x1T − x̄1)′             y1T − ȳ1
Xw =        ⋮          ,  yw =      ⋮       ,
       (xn1 − x̄n)′             yn1 − ȳn
            ⋮                       ⋮
       (xnT − x̄n)′             ynT − ȳn

x̄i = (1/T) Σ_{t=1}^T xit ,   ȳi = (1/T) Σ_{t=1}^T yit .
Stated in this form, the estimator is often called the within estimator, because it is based
on deviations from within-firm averages. The interpretation of this result is that in order to
get the least-squares estimates for γ, one can perform the shorter regression given above. To
obtain an estimate of the variance, the following is useful:

Result 2:
yit − xit′γ̂ − α̂i = (yit − ȳi) − (xit − x̄i)′γ̂.

Thus

s² = (1/(nT − k − n)) (yw − Xwγ̂)′(yw − Xwγ̂).
Notice that this is different from the variance estimate that a “canned” least-squares routine
applied to the previous short regression would produce.

4 Random Effects

There is an alternative approach to working with panel data models which connects nicely to
GLS estimation. For simplicity assume all variables are measured in deviations from (grand)
means. (Otherwise we could let the vector xit include a constant and all of what follows would
go through.) Assume that the αi are i.i.d. with

E(αi | X) = 0,   V(αi | X) = σα².

Defining εit ≡ αi + uit, we can write the model as

yit = xit′γ + εit,   i = 1, . . . , n;  t = 1, . . . , T.

Stacking the observations:

y = X1 γ + ε,

where y is as defined before and
   
x011 α1 + u11
 .  ..
 .. 
 
 

 . 
 
 x0   α +u  α1 l

 1T   1 1T 
 .   ..   . 
 ..  ,
X1 =   = .  =  ..  + u.
 
 0 
 xn1 

 αn + un1 

αn l
 ..  ..
   
 .  .
 
 
x0nT αn + unT

Here l is a T × 1 vector of ones. We can write

E(ε | X) = 0,

and

             σα² ll′     0      ···      0
               0      σα² ll′   ···      0
V(ε | X) =     ⋮         ⋮       ⋱       ⋮       + σ² InT  ≡  Ω.
               0         0      ···   σα² ll′
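The block-diagonal Ω has a Kronecker structure that is convenient to assemble numerically; a minimal sketch with made-up variance values:

```python
import numpy as np

n, T = 3, 4
sigma2, sigma2_a = 1.0, 0.5    # illustrative values of sigma^2, sigma_alpha^2

# Omega = block-diag(sigma_a^2 * l l') + sigma^2 * I_{nT},
# with l a T-vector of ones
Omega = np.kron(np.eye(n), sigma2_a * np.ones((T, T))) + sigma2 * np.eye(n * T)
```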
We can think of this as an “error components” model in that there is a composite error term
arising from the αi and the uit ; this gives the variance matrix a particular correlation structure.
In any case, now we have a generalized regression model with variance matrix Ω. It would be
natural to apply GLS. This leads to the estimator

γ̂GLS = (X1′Ω⁻¹X1)⁻¹ X1′Ω⁻¹y.

The GLS estimator has variance

V(γ̂GLS | X) = (X1′Ω⁻¹X1)⁻¹.

4.1 Within and Between

The GLS estimator in this model is called the random effects (RE) estimator. It can be shown
to have the following form, which leads to some additional insight.

Result 3:
γ̂GLS = (Xw′Xw + r Xb′Xb)⁻¹ (Xw′Xw γ̂w + r Xb′Xb γ̂b),

where Xw and yw are the deviations from means as defined before, and
   
        x̄1′            ȳ1
         ⋮              ⋮
        x̄1′            ȳ1
Xb =     ⋮     ,  yb =  ⋮   ,
        x̄n′            ȳn
         ⋮              ⋮
        x̄n′            ȳn

(each x̄i′ and ȳi appearing T times),

γ̂w = (Xw′Xw)⁻¹ Xw′yw ,   γ̂b = (Xb′Xb)⁻¹ Xb′yb ,


and

r = (1 + T σα²/σ²)⁻¹.
Notice that γ̂w is just the fixed effects (within) estimator. We will call γ̂b the "between" estimator because it only uses variation between units. The RE estimator is a weighted combination of the within and between estimates, where the weight depends on the relative variances of the individual "effect" αi and the idiosyncratic disturbance uit. As σα² → ∞, r → 0 and we obtain
the FE estimator.
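Result 3 is an exact algebraic identity, so it can be verified numerically on arbitrary data; a sketch (numpy, with made-up dimensions and variances), treating σ² and σα² as known:

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, k = 8, 5, 2
sigma2, sigma2_a = 1.0, 2.0   # treat sigma^2 and sigma_alpha^2 as known

# Any data will do: Result 3 is an algebraic identity, not a statistical claim
x = rng.normal(0.0, 1.0, (n, T, k))
x -= x.mean(axis=(0, 1))              # deviations from grand means
y = rng.normal(0.0, 1.0, (n, T))
y -= y.mean()

X1 = x.reshape(n * T, k)
yv = y.reshape(n * T)

# Direct GLS with the error-components Omega
Omega = np.kron(np.eye(n), sigma2_a * np.ones((T, T))) + sigma2 * np.eye(n * T)
Oi = np.linalg.inv(Omega)
g_gls = np.linalg.solve(X1.T @ Oi @ X1, X1.T @ Oi @ yv)

# Within and between ingredients
Xw = (x - x.mean(axis=1, keepdims=True)).reshape(n * T, k)
yw = (y - y.mean(axis=1, keepdims=True)).reshape(n * T)
Xb = np.repeat(x.mean(axis=1), T, axis=0)   # each row x̄_i repeated T times
yb = np.repeat(y.mean(axis=1), T)

g_w = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
g_b = np.linalg.solve(Xb.T @ Xb, Xb.T @ yb)

# Result 3: GLS as a matrix-weighted combination, r = (1 + T*sigma2_a/sigma2)^-1
r = 1.0 / (1.0 + T * sigma2_a / sigma2)
g_mix = np.linalg.solve(Xw.T @ Xw + r * (Xb.T @ Xb),
                        Xw.T @ Xw @ g_w + r * (Xb.T @ Xb) @ g_b)
```

The two computations agree to floating-point precision.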

4.2 Feasible GLS in the Random Effects Model

In order to implement feasible GLS in the RE model, we only need estimates of σ² and σα².
With these we can form an estimate of Ω; alternatively, we can form an estimate of r and use
the between and within estimates as in Result 3.
An easy way to estimate σ² is to just use the variance estimate σ̂² arising from the FE estimator.
Based on Result 2, this can be obtained using the within estimator.
To estimate σα², we can use the between estimates. Notice that the equation being estimated by the between estimator is

ȳi = x̄i′ γ + vi,

where

vi ≡ αi + ūi.

The variance of the disturbance term is

σv² = σα² + σ²/T.

Thus, if we use the residuals from the between estimates to form a variance estimate σ̂v², we can obtain an estimate of σα² by

σ̂α² = σ̂v² − σ̂²/T.
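The two-step variance estimation can be sketched as follows (numpy; simulated data with made-up parameter values; the n − k degrees-of-freedom correction in the between step is one reasonable choice, not something the notes specify):

```python
import numpy as np

rng = np.random.default_rng(3)
n, T, k = 200, 5, 2
gamma = np.array([0.3, 0.5])
sigma_a, sigma_u = 1.0, 0.5           # true sd of alpha_i and u_it (made up)

alpha = rng.normal(0.0, sigma_a, n)
x = rng.normal(0.0, 1.0, (n, T, k))
y = x @ gamma + alpha[:, None] + rng.normal(0.0, sigma_u, (n, T))

# sigma^2 from the within (FE) residuals, dof nT - k - n
xw = (x - x.mean(axis=1, keepdims=True)).reshape(n * T, k)
yw = (y - y.mean(axis=1, keepdims=True)).reshape(n * T)
g_w = np.linalg.solve(xw.T @ xw, xw.T @ yw)
s2 = (yw - xw @ g_w) @ (yw - xw @ g_w) / (n * T - k - n)

# sigma_v^2 from the between regression  ȳ_i = x̄_i'γ + v_i
xb = x.mean(axis=1)                   # (n, k)
yb = y.mean(axis=1)                   # (n,)
g_b = np.linalg.solve(xb.T @ xb, xb.T @ yb)
s2_v = (yb - xb @ g_b) @ (yb - xb @ g_b) / (n - k)

# sigma_alpha^2 = sigma_v^2 - sigma^2 / T
s2_a = s2_v - s2 / T
```

With these values the estimates recover σ² = 0.25 and σα² = 1 up to sampling error.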

4.3 Comparing the FE and RE Estimators

One might expect that the random effects estimator is superior to the fixed effects estimator.
After all, it is the GLS estimator; moreover, the previous discussion shows that the fixed
effects estimator is a limiting case of RE, corresponding to situations where the variation in
the individual effects is large. Since the feasible version can actually estimate the variance of the
individual effects, this would seem preferable to assuming it is arbitrarily large. However, there
is a very strong assumption built into the random effects estimator: the assumption that the disturbances, including αi, are orthogonal to the explanatory variables. Going back
to our production function example, this was exactly the case we wanted to avoid. So the
RE estimator may not be appropriate for that case; in other applications where the omitted
variables interpretation of αi is less relevant, this may be less of an issue.
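A small simulation (numpy, illustrative values) of the case the notes warn about: when αi is correlated with the regressor, the within (FE) estimator remains consistent, while the between estimator — and hence any RE weighting that puts weight on it — inherits the bias:

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 500, 4
gamma = 0.5                           # made-up true coefficient

# alpha_i correlated with the regressor: the omitted-variables case
alpha = rng.normal(0.0, 1.0, n)
x = alpha[:, None] + rng.normal(0.0, 1.0, (n, T))
y = gamma * x + alpha[:, None] + 0.5 * rng.normal(0.0, 1.0, (n, T))

# Within (FE): uses only deviations from firm means, so alpha_i drops out
xw = (x - x.mean(axis=1, keepdims=True)).ravel()
yw = (y - y.mean(axis=1, keepdims=True)).ravel()
g_fe = (xw @ yw) / (xw @ xw)

# Between: regresses firm means on firm means, so the correlation
# between x̄_i and alpha_i contaminates the estimate
xb = x.mean(axis=1)
yb = y.mean(axis=1)
g_between = (xb @ yb) / (xb @ xb)
```

Here g_fe is close to the true 0.5, while g_between is pushed well above it.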
