Slides On Panel Data Analysis
fi/~bepa/
Panel data, or longitudinal data, consist of
time series for each statistical unit in the
cross section. In other words, we randomly
select our cross section only once, and once
that is done, we follow each statistical unit
within this cross section over time. Thus all
cross sections are equally large and consist of
the same statistical units.
2.2 Independent Pooled Cross Sections
We can also interact a time dummy with key
explanatory variables to see if the effect of
that variable has changed over time.
Example 2:
Changes in the return to education and the
gender wage gap between 1978 and 1985.
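A specification of this kind can be sketched as follows. The variable names (y85 as a dummy for 1985, educ for years of education, female as a gender dummy) are assumptions made for illustration and may differ from the original slide; further controls are omitted.

```latex
\begin{aligned}
\log(wage) ={}& \beta_0 + \delta_0\, y85 + \beta_1\, educ + \delta_1\,(y85 \cdot educ) \\
             & + \beta_2\, female + \delta_2\,(y85 \cdot female) + u
\end{aligned}
```

In this sketch $\delta_1$ measures the change in the return to education between 1978 and 1985, and $\delta_2$ the change in the gender wage gap.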
With panel data we view the unobserved factors
affecting the dependent variable as consisting
of two types: those that are constant and those
that vary over time. Letting $i$ denote the
cross-sectional unit and $t$ time:
(6) $y_{it} = \beta_0 + \delta_0 D_{2,t} + \beta_1 x_{it} + a_i + u_{it}$, $t = 1, 2$.
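Naively, we might go and estimate a fixed
effects model by pooled OLS. That is, we
write (6) in the form
$y_{it} = \beta_0 + \delta_0 D_{2,t} + \beta_1 x_{it} + \nu_{it}$, $t = 1, 2$,
where $\nu_{it} = a_i + u_{it}$ is the composite error.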
The main problem with applying pooled OLS
is that we did very little to solve the omitted
variable bias problem. Only the time-varying
part (assumed to be common for all
cross-sectional units) has been taken out by
introducing the time dummy. The fixed effect
$a_i$, however, is still there; it has just been
hidden in the composite error $\nu_{it}$, and is
therefore not modeled. That is, the parameter
estimates are still biased, unless $a_i$ is
uncorrelated with $x_{it}$.
Example 4 (continued).
Pooled OLS on the crime rate data yields
(9) $\widehat{crmrte} = 93.42 + 7.94\,D87 + 0.427\,unem$.
The main reason for collecting panel data
is to allow for ai to be correlated with the
explanatory variables. This can be achieved
by first writing down (6) explicitly for both
time points:
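In the standard two-period setup this step looks as follows. This is a sketch that assumes $D_{2,t}$ equals 1 in period 2 and 0 in period 1; the equation numbering of the original slides is not reproduced.

```latex
\begin{aligned}
y_{i2} &= (\beta_0 + \delta_0) + \beta_1 x_{i2} + a_i + u_{i2} \qquad (t = 2) \\
y_{i1} &= \beta_0 + \beta_1 x_{i1} + a_i + u_{i1} \qquad (t = 1) \\
\Delta y_i &= \delta_0 + \beta_1 \Delta x_i + \Delta u_i
\end{aligned}
```

Subtracting the second line from the first removes $a_i$, so the differenced equation can be estimated by OLS even when $a_i$ is correlated with $x_{it}$.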
Example 4 (continued).
Estimation of (10) yields
$\widehat{\Delta crmrte} = 15.40 + 2.22\,\Delta unem$,
with t = 3.28 for the intercept and t = 2.52 for the slope,
which now gives a positive, statistically significant
relationship (p = 0.015) between unemployment and
crime rates.
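The contrast between pooled OLS and first differencing can be illustrated with a small simulation. This is a sketch on synthetic data, not the crime data from the example: the parameter values, the sample size, and the helper names `ols_slope` and `demean` are all assumptions chosen for illustration.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def demean(v):
    """Subtract the sample mean from every element."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

random.seed(0)
N, beta0, delta0, beta1 = 2000, 1.0, 0.5, 2.0  # hypothetical values

x1, x2, y1, y2 = [], [], [], []
for _ in range(N):
    a_i = random.gauss(0, 1)            # unobserved effect
    x_i1 = a_i + random.gauss(0, 1)     # regressor, correlated with a_i
    x_i2 = a_i + random.gauss(0, 1)
    x1.append(x_i1)
    x2.append(x_i2)
    y1.append(beta0 + beta1 * x_i1 + a_i + random.gauss(0, 1))
    y2.append(beta0 + delta0 + beta1 * x_i2 + a_i + random.gauss(0, 1))

# Pooled OLS with a period-2 dummy: by the Frisch-Waugh theorem the slope
# equals the slope after demeaning x and y within each period.
pooled = ols_slope(demean(x1) + demean(x2), demean(y1) + demean(y2))

# First differences: a_i cancels in y_i2 - y_i1.
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
fd = ols_slope(dx, dy)

print(f"pooled OLS slope: {pooled:.3f}, first-difference slope: {fd:.3f}")
```

Under these settings the pooled slope should land well above the true value of 2.0, because $a_i$ sits in the error term and is correlated with the regressor, while the first-difference slope should be close to 2.0.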
Policy Analysis with Two-Period Panel Data
2.4 Dummy Variable Regression in Panels
Thus, subtracting (20) from (18) eliminates
$a_i$ and gives
(21) $y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i)$
or
(22) $\dot{y}_{it} = \beta_1 \dot{x}_{it} + \dot{u}_{it}$,
where, e.g., $\dot{y}_{it} = y_{it} - \bar{y}_i$ is the
time-demeaned data on $y$.
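The time-demeaning step can be sketched in a few lines. The toy panel below is made up for illustration (two units, three periods, no idiosyncratic error), and the function names are hypothetical.

```python
def time_demean(values):
    """Subtract the unit's own time average from each observation."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def within_slope(panel):
    """FE (within) estimate of beta1 for a single regressor.

    panel maps each unit to a pair (x_list, y_list) over time.
    """
    sxy = sxx = 0.0
    for x, y in panel.values():
        xd, yd = time_demean(x), time_demean(y)
        sxy += sum(a * b for a, b in zip(xd, yd))
        sxx += sum(a * a for a in xd)
    return sxy / sxx

# y_it = 2 * x_it + a_i with a_A = 10 and a_B = -5 (hypothetical data)
panel = {
    "A": ([1, 2, 3], [12, 14, 16]),
    "B": ([2, 4, 9], [-1, 3, 13]),
}
beta1_hat = within_slope(panel)
print(beta1_hat)
```

Because the unit-specific levels 10 and -5 drop out after demeaning, the slope is recovered without ever estimating the $a_i$.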
Example 6. (continued)
We have N = 22 cross-sectional units and
T = 9 time periods for a total of NT = 198
observations. There is one dummy for the
enterprise zone and eight year dummies for
a total of k = 9 regressors. The correction
factor for the standard errors is therefore
$\sqrt{\frac{NT - k}{N(T-1) - k}} = \sqrt{\frac{22 \cdot 9 - 9}{22 \cdot 8 - 9}} = \sqrt{\frac{189}{167}} \approx 1.063831$.
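The arithmetic can be checked directly. A minimal sketch; the function name is hypothetical.

```python
import math

def fe_se_correction(n_units, n_periods, k):
    """Factor that rescales standard errors from OLS on time-demeaned data.

    OLS on the demeaned data uses NT - k degrees of freedom, whereas
    N(T - 1) - k is appropriate because N unit means were estimated.
    """
    return math.sqrt((n_units * n_periods - k) / (n_units * (n_periods - 1) - k))

factor = fe_se_correction(22, 9, 9)
print(round(factor, 6))
```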
EViews can do the degrees of freedom
adjustment automatically if you tell it that
you have panel data. In order to do that,
choose
Effects Specification
R2 in Fixed Effects Estimation
Limitations
Example 7
Data set wagepan.xls (Wooldridge): n = 545, T = 8.
Is there a wage premium for belonging to a labor union?
$\log(wage_{it}) = \beta_0 + \beta_1 educ_{it} + \beta_3 expr_{it} + \beta_4 expr_{it}^2 + \beta_5 married_{it} + \beta_6 union_{it} + a_i + u_{it}$
Year (d81 to d87) and race dummies (black and hisp)
are also included. Pooled OLS with $\nu_{it} = a_i + u_{it}$ yields
Dependent Variable: LWAGE
Method: Panel Least Squares
Date: 12/11/12 Time: 12:32
Sample: 1980 1987
Periods included: 8
Cross-sections included: 545
Total panel (balanced) observations: 4360
White period standard errors & covariance (d.f. corrected)
Effects Specification
Assumptions:
FE.1: For each i, the model is
If we add the following two assumptions, FE
is the best linear unbiased estimator:
Generally, we call the model in equation (25)
the random effects model if $a_i$ is
uncorrelated with all explanatory variables, i.e.,
If the data set is simply pooled and the error
term is denoted as $v_{it} = a_i + u_{it}$, we have the
regression
If $\sigma_a^2$ and $\sigma_u^2$ were known, optimal estimators
(BLUE) would be obtained by generalized
least squares (GLS), which in this case would
reduce to estimating the regression slope
coefficients from the quasi-demeaned equation
(29) $y_{it} - \lambda \bar{y}_i = \beta_0 (1 - \lambda) + \beta_1 (x_{it} - \lambda \bar{x}_i) + (v_{it} - \lambda \bar{v}_i)$,
where
(30) $\lambda = 1 - \left( \frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2} \right)^{1/2}$.
In practice $\sigma_u^2$ and $\sigma_a^2$ are unknown, but they
can be estimated, for example, as follows:
∗ The ideal random effects assumptions include FE.1,
FE.2, FE.4–FE.6. FE.3 is replaced with
RE.3: There are no perfect linear relationships
among the explanatory variables.
RE.4: In addition to FE.4, $E[a_i | X_i] = 0$.
Note that λ = 0 in (29) corresponds to pooled
regression and λ = 1 to FE, such that for
$\sigma_u^2 \ll \sigma_a^2$ (λ ≈ 1) RE estimates will be
similar to FE estimates, whereas for $\sigma_u^2 \gg \sigma_a^2$
(λ ≈ 0) RE estimates will resemble pooled
OLS estimates.
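The two limiting cases can be seen by applying the quasi-demeaning transformation from (29) to a single unit's series. A minimal sketch with made-up numbers; `quasi_demean` is a hypothetical helper.

```python
def quasi_demean(values, lam):
    """Return values minus lam times their time average."""
    mean = sum(values) / len(values)
    return [v - lam * mean for v in values]

y_i = [1.0, 2.0, 3.0]  # hypothetical observations for one unit

pooled_case = quasi_demean(y_i, 0.0)   # lam = 0: data unchanged (pooled OLS)
fe_case = quasi_demean(y_i, 1.0)       # lam = 1: fully time-demeaned (FE)
re_case = quasi_demean(y_i, 0.5)       # in between: partial demeaning (RE)

print(pooled_case, fe_case, re_case)
```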
Example 7 (continued.)
Note that the time-constant dummies black and
hisp, and the variable exper, which changes by
a constant amount each period, dropped out
with the FE method but can be estimated with RE.
$\hat{\lambda} = 1 - \left( \frac{0.351^2}{0.351^2 + 8 \cdot 0.3246^2} \right)^{1/2} = 0.643$,
such that the RE estimates lie closer to the
FE estimates than to the pooled OLS estimates.
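The value of $\hat{\lambda}$ can be reproduced from (30). A minimal sketch; the function name is hypothetical, and the inputs are the standard deviation estimates that appear in the formula above (0.351 for $u_{it}$, 0.3246 for $a_i$) with T = 8.

```python
import math

def re_lambda(sigma_u, sigma_a, n_periods):
    """Quasi-demeaning weight: 1 - sqrt(s_u^2 / (s_u^2 + T * s_a^2))."""
    var_u, var_a = sigma_u ** 2, sigma_a ** 2
    return 1.0 - math.sqrt(var_u / (var_u + n_periods * var_a))

lam_hat = re_lambda(sigma_u=0.351, sigma_a=0.3246, n_periods=8)
print(round(lam_hat, 3))
```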
Applying RE is probably not appropriate in
this case, because, as discussed earlier, the
unobservable ai is probably correlated with
some of the explanatory variables.
EViews output for RE estimation (table not recovered; its sections are Effects Specification with S.D. and Rho columns, Weighted Statistics, and Unweighted Statistics).
Random effects or fixed effects?
Hausman specification test
Example 7 (continued.)
As expected, the Hausman test strongly
rejects the null hypothesis that $a_i$ is
uncorrelated with all explanatory variables.
Therefore, RE is inappropriate and we must
use FE parameter estimates instead.
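For a single coefficient, the Hausman statistic reduces to a scalar comparison of the FE and RE estimates. The sketch below uses hypothetical numbers, not the estimates from Example 7, and assumes the one-coefficient form of the test, in which the statistic is compared with a chi-square distribution with one degree of freedom. The function name is hypothetical.

```python
import math

def hausman_scalar(b_fe, b_re, se_fe, se_re):
    """One-coefficient Hausman statistic and its chi-square(1) p-value.

    H = (b_fe - b_re)^2 / (se_fe^2 - se_re^2); requires se_fe > se_re,
    which holds in theory because RE is efficient under the null.
    """
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("variance difference must be positive")
    h = (b_fe - b_re) ** 2 / var_diff
    p_value = math.erfc(math.sqrt(h / 2.0))  # survival function of chi-square(1)
    return h, p_value

# Hypothetical estimates chosen for illustration only.
h, p = hausman_scalar(b_fe=0.08, b_re=0.10, se_fe=0.020, se_re=0.018)
print(f"H = {h:.3f}, p = {p:.4f}")
```

A small p-value speaks against the RE assumption that $a_i$ is uncorrelated with the regressors.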
EViews output: Correlated Random Effects - Hausman Test, equation HAUSMAN, test of cross-section random effects (table values not recovered).