Panel Data Model
Ani Katchova
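Example: variable x observed for 3 individuals (i) over 3 periods (t).

| i | t | x_it | Individual mean | Overall mean | Overall deviation | Between deviation | Within deviation | Within deviation (modified) |
|---|---|------|-----------------|--------------|-------------------|-------------------|------------------|-----------------------------|
| 1 | 1 | 9    | 10              | 20           | -11               | -10               | -1               | 19                          |
| 1 | 2 | 10   | 10              | 20           | -10               | -10               | 0                | 20                          |
| 1 | 3 | 11   | 10              | 20           | -9                | -10               | 1                | 21                          |
| 2 | 1 | 20   | 20              | 20           | 0                 | 0                 | 0                | 20                          |
| 2 | 2 | 20   | 20              | 20           | 0                 | 0                 | 0                | 20                          |
| 2 | 3 | 20   | 20              | 20           | 0                 | 0                 | 0                | 20                          |
| 3 | 1 | 25   | 30              | 20           | 5                 | 10                | -5               | 15                          |
| 3 | 2 | 30   | 30              | 20           | 10                | 10                | 0                | 20                          |
| 3 | 3 | 35   | 30              | 20           | 15                | 10                | 5                | 25                          |

Overall deviation = x_it - overall mean; between deviation = individual mean - overall mean; within deviation = x_it - individual mean; within deviation (modified) = x_it - individual mean + overall mean.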
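Individual mean: $\bar{x}_i = \frac{1}{T} \sum_t x_{it}$
Overall mean: $\bar{x} = \frac{1}{NT} \sum_i \sum_t x_{it}$
Overall variance: $s_O^2 = \frac{1}{NT-1} \sum_i \sum_t (x_{it} - \bar{x})^2$
Between variance: $s_B^2 = \frac{1}{N-1} \sum_i (\bar{x}_i - \bar{x})^2$
Within variance: $s_W^2 = \frac{1}{NT-1} \sum_i \sum_t (x_{it} - \bar{x}_i)^2$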
The overall variation can be decomposed into between variation and within variation.
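The decomposition can be checked on the example values of x used in these notes (9, 10, 11 for the first individual; 20, 20, 20 for the second; 25, 30, 35 for the third). The sketch below works with sums of squared deviations, for which the identity overall = between + within holds exactly in a balanced panel. The variable names are illustrative choices, not part of the original notes.

```python
# Example panel: 3 individuals, each observed over 3 periods.
x = {1: [9, 10, 11], 2: [20, 20, 20], 3: [25, 30, 35]}

all_values = [v for series in x.values() for v in series]
overall_mean = sum(all_values) / len(all_values)
individual_mean = {i: sum(s) / len(s) for i, s in x.items()}

# Sums of squared overall, between, and within deviations.
overall_ss = sum((v - overall_mean) ** 2 for v in all_values)
between_ss = sum(
    len(s) * (individual_mean[i] - overall_mean) ** 2 for i, s in x.items()
)
within_ss = sum((v - individual_mean[i]) ** 2 for i, s in x.items() for v in s)

print(overall_ss, between_ss, within_ss)  # 652.0 600.0 52.0
```

Most of the variation in this example is between individuals (600 out of 652), which is why the between and within estimators can give quite different answers on the same data.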
Panel data models describe the individual behavior both across time and across individuals.
There are three types of models: the pooled model, the fixed effects model, and the random
effects model.
Pooled model
The pooled model specifies constant coefficients, which is the usual assumption in cross-sectional
analysis.
This is the most restrictive panel data model and is not used much in the literature.
In other words, the individual-specific effects are the leftover variation in the dependent
variable that cannot be explained by the regressors.
Time dummies can be included in the regressors x.
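Here the composite error is $u_{it} = \alpha_i + \varepsilon_{it}$, where the individual-specific effect $\alpha_i$ has variance $\sigma_\alpha^2$ and the idiosyncratic error $\varepsilon_{it}$ has variance $\sigma_\varepsilon^2$. Then $\mathrm{cov}(u_{it}, u_{is}) = \sigma_\alpha^2$ for $t \neq s$, so
$\rho = \mathrm{corr}(u_{it}, u_{is}) = \sigma_\alpha^2 / (\sigma_\alpha^2 + \sigma_\varepsilon^2)$.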
Rho is the intraclass correlation of the error. Rho is the fraction of the variance in the error due
to the individual-specific effects. It approaches 1 if the individual effects dominate the
idiosyncratic error.
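As a small numerical illustration of rho, the sketch below computes the share of the error variance that comes from the individual-specific effects. The function name and the arguments `var_alpha` (variance of the individual effects) and `var_eps` (variance of the idiosyncratic error) are assumed names, and the numbers are made up.

```python
def intraclass_correlation(var_alpha, var_eps):
    """Share of the composite error variance due to the individual effects."""
    return var_alpha / (var_alpha + var_eps)

# Idiosyncratic error dominates: rho is close to 0.
rho_small = intraclass_correlation(var_alpha=1.0, var_eps=9.0)
# Individual effects dominate: rho is close to 1.
rho_large = intraclass_correlation(var_alpha=9.0, var_eps=1.0)

print(rho_small, rho_large)  # 0.1 0.9
```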
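Consistency
An estimator $\hat{\beta}$ is consistent if its distribution collapses on the true value $\beta$
as n becomes large: $\operatorname{plim}_{n \to \infty} \hat{\beta} = \beta$.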
Efficiency
Efficiency (minimum variance) is usually established relative to specific classes of estimators.
o Example: OLS is efficient (minimum variance) among the class of linear, unbiased
estimators (Gauss-Markov Theorem).
o Maximum likelihood (given correct distributional assumptions) is asymptotically
efficient among consistent estimators.
Pooled OLS estimator
The pooled OLS estimator uses both the between and within variation to estimate the
parameters.
The pooled OLS estimator is obtained by stacking the data over i and t into one long regression
with NT observations and estimating it by OLS:
$y_{it} = \alpha + x_{it}'\beta + u_{it}$, $i = 1, \ldots, N$, $t = 1, \ldots, T$.
If the true model is the pooled model and the regressors are uncorrelated with the error terms,
the pooled OLS estimator is consistent.
If the true model is the fixed effects model, the pooled OLS estimator is inconsistent.
Because the errors for a given individual are likely to be correlated over time, we need
panel-corrected standard errors.
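The stacking step can be sketched with one regressor and a tiny hypothetical panel. The x values are the example used in these notes; the y values are invented for illustration as y_it = alpha_i + 2 x_it with alpha_i = 5, 10, 30, so the individual effects are correlated with x (a fixed effects setting). The helper `ols` is an assumed name for a plain one-regressor OLS fit.

```python
# Hypothetical panel: y_it = alpha_i + 2 * x_it with alpha_i = 5, 10, 30.
x = {1: [9, 10, 11], 2: [20, 20, 20], 3: [25, 30, 35]}
y = {1: [23, 25, 27], 2: [50, 50, 50], 3: [80, 90, 100]}

def ols(xs, ys):
    """Return (intercept, slope) from a one-regressor OLS fit."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Stack the data over i and t into one long regression with NT = 9 rows.
x_long = [v for i in sorted(x) for v in x[i]]
y_long = [v for i in sorted(y) for v in y[i]]

pooled_intercept, pooled_slope = ols(x_long, y_long)
print(len(x_long), round(pooled_slope, 3))
```

The pooled slope comes out at about 3.15, well above the value of 2 used to build the data, which illustrates why pooled OLS is inconsistent when the true model is fixed effects.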
Between estimator
The between estimator only uses the between variation (across individuals).
It uses the time averages of all variables.
o If an individual has a work experience of 9, 10, and 11 years measured over 3 periods
then the average experience is 10.
This is an OLS estimation of the time-averaged dependent variable on the time-averaged
regressors for each individual.
The number of observations is N. The time variation is not considered and the data are
collapsed with one observation per individual.
This estimator is seldom used because the pooled and RE estimators are more efficient.
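A minimal sketch of the between estimator with one regressor follows. It collapses each individual to its time averages and runs OLS on the N = 3 resulting observations. The y values are hypothetical, built as y_it = alpha_i + 2 x_it with alpha_i = 5, 10, 30; the names `x_bar_i` and `y_bar_i` are illustrative.

```python
# Hypothetical panel: y_it = alpha_i + 2 * x_it with alpha_i = 5, 10, 30.
x = {1: [9, 10, 11], 2: [20, 20, 20], 3: [25, 30, 35]}
y = {1: [23, 25, 27], 2: [50, 50, 50], 3: [80, 90, 100]}

# Collapse to one observation per individual: the time averages.
ids = sorted(x)
x_bar_i = [sum(x[i]) / len(x[i]) for i in ids]  # [10.0, 20.0, 30.0]
y_bar_i = [sum(y[i]) / len(y[i]) for i in ids]  # [25.0, 50.0, 90.0]

# One-regressor OLS of the time-averaged y on the time-averaged x.
x_mean = sum(x_bar_i) / len(x_bar_i)
y_mean = sum(y_bar_i) / len(y_bar_i)
sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x_bar_i, y_bar_i))
sxx = sum((a - x_mean) ** 2 for a in x_bar_i)
between_slope = sxy / sxx
between_intercept = y_mean - between_slope * x_mean

print(len(x_bar_i), between_slope)  # 3 3.25
```

Only the three individual averages enter the regression, so all information about changes over time is discarded.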
Within estimator or fixed effects estimator
The within estimator uses the within variation (over time).
It uses time-demeaned variables (the individual-specific deviations of variables from their
time-averaged values).
o If an individual has a work experience of 9, 10, and 11 years measured over 3 periods, the
average experience is 10. So the time-demeaned values are -1, 0, and 1.
This is an OLS estimation of the time-demeaned dependent variable on the time-demeaned
regressors.
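The time-demeaning step can be sketched as follows for one regressor. The y values are hypothetical, built as y_it = alpha_i + 2 x_it with alpha_i = 5, 10, 30, so the true slope is 2 by construction. The helper `demean` is an assumed name.

```python
# Hypothetical panel: y_it = alpha_i + 2 * x_it with alpha_i = 5, 10, 30.
x = {1: [9, 10, 11], 2: [20, 20, 20], 3: [25, 30, 35]}
y = {1: [23, 25, 27], 2: [50, 50, 50], 3: [80, 90, 100]}

def demean(panel):
    """Subtract each individual's time average from its observations."""
    out = []
    for i in sorted(panel):
        mean_i = sum(panel[i]) / len(panel[i])
        out.extend(v - mean_i for v in panel[i])
    return out

x_within = demean(x)  # [-1, 0, 1, 0, 0, 0, -5, 0, 5]
y_within = demean(y)

# OLS on demeaned data: the demeaned variables have mean zero, so no intercept.
within_slope = sum(a * b for a, b in zip(x_within, y_within)) / sum(
    a * a for a in x_within
)
print(within_slope)  # 2.0
```

Demeaning removes alpha_i from every observation, so the slope is recovered exactly even though the individual effects are correlated with x. The second individual has no within variation in x and contributes nothing to the estimate.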
First-differences estimator
The first-difference estimator uses the one-period changes for each individual.
It uses first-differenced variables (the individual-specific one-period changes for each
individual).
o If an individual has a work experience of 9, 10, and 11 years measured over 3 periods,
then the first-differenced experience values are missing (.), 1, and 1.
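This is an OLS estimation of the one-period changes of the dependent variable on the one-period
changes in the regressors:
$y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1})$, $t = 2, \ldots, T$.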
The number of observations is N(T-1). We lose the first observation for each individual
because of differencing.
The individual-specific effects cancel out.
A limitation of the first-differences model is that time-invariant variables are dropped from the
model and their coefficients are not identified.
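A minimal sketch of first differencing with one regressor is given below. The y values are hypothetical, built as y_it = alpha_i + 2 x_it with alpha_i = 5, 10, 30; the helper `first_difference` is an assumed name. The differenced regression is run without an intercept, which is an assumption of this sketch.

```python
# Hypothetical panel: y_it = alpha_i + 2 * x_it with alpha_i = 5, 10, 30.
x = {1: [9, 10, 11], 2: [20, 20, 20], 3: [25, 30, 35]}
y = {1: [23, 25, 27], 2: [50, 50, 50], 3: [80, 90, 100]}

def first_difference(panel):
    """One-period changes within each individual; the first period is lost."""
    out = []
    for i in sorted(panel):
        series = panel[i]
        out.extend(curr - prev for prev, curr in zip(series, series[1:]))
    return out

dx = first_difference(x)  # [1, 1, 0, 0, 5, 5]
dy = first_difference(y)  # [2, 2, 0, 0, 10, 10]

# OLS of the changes in y on the changes in x, without an intercept.
fd_slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
print(len(dx), fd_slope)  # 6 2.0
```

There are N(T-1) = 6 differenced observations, and alpha_i drops out of every difference, so the slope of 2 is recovered.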
| Estimator / true model            | Pooled model |
|-----------------------------------|--------------|
| Pooled OLS estimator              | Consistent   |
| Between estimator                 | Consistent   |
| Within or fixed effects estimator | Consistent   |
| First differences estimator       | Consistent   |
| Random effects estimator          | Consistent   |
The fixed effects estimator will always give consistent estimates, but they may not be the most
efficient.
The random effects estimator is inconsistent if the appropriate model is the fixed effects model.
The random effects estimator is consistent and most efficient if the appropriate model is the
random effects model.
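Breusch-Pagan LM test
The Breusch-Pagan Lagrange multiplier (LM) test is a test for the random effects model based on
the OLS residuals.
Test whether the variance of the individual-specific effects, $\sigma_\alpha^2$, or equivalently the
correlation $\mathrm{corr}(u_{it}, u_{is})$ for $t \neq s$, is significantly different from zero.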
If the LM test is significant, use the random effects model instead of the OLS model.
We still need to test for fixed versus random effects.
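The LM test can be sketched with the standard Breusch-Pagan form of the statistic for a balanced panel, LM = NT / (2(T-1)) * [ sum_i (sum_t u_it)^2 / sum_i sum_t u_it^2 - 1 ]^2, which is chi-square distributed with 1 degree of freedom under the null. This formula is supplied here as an assumption, since the notes do not spell it out; the residuals below are hypothetical stand-ins for pooled OLS residuals, and the function name is illustrative.

```python
import math

def breusch_pagan_lm(residuals):
    """LM statistic and p-value from pooled OLS residuals (balanced panel)."""
    n = len(residuals)
    t = len(next(iter(residuals.values())))
    # Squared sums of each individual's residuals, added over individuals.
    sum_sq_of_sums = sum(sum(u) ** 2 for u in residuals.values())
    # Sum of all squared residuals.
    sum_of_squares = sum(v ** 2 for u in residuals.values() for v in u)
    lm = n * t / (2 * (t - 1)) * (sum_sq_of_sums / sum_of_squares - 1) ** 2
    # With 1 degree of freedom, P(chi2 > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value

# Hypothetical residuals with a strong individual-specific pattern.
residuals = {1: [1.0, 1.0, 1.0], 2: [0.0, 0.0, 0.0], 3: [-1.0, -1.0, -1.0]}
lm, p_value = breusch_pagan_lm(residuals)
print(lm, p_value < 0.05)  # 9.0 True
```

Residuals that keep the same sign within each individual produce a large statistic, pointing toward random effects over pooled OLS.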
Hausman test
The random effects estimator is more efficient so we need to use it if the Hausman test
supports it. If it does not support it, use the fixed effects model.
The Hausman test checks whether there is a significant difference between the fixed and random
effects estimators.
The Hausman test statistic can be calculated only for the time-varying regressors.
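The Hausman test statistic is:
$H = (\hat{\beta}_{RE} - \hat{\beta}_{FE})' [\hat{V}(\hat{\beta}_{FE}) - \hat{V}(\hat{\beta}_{RE})]^{-1} (\hat{\beta}_{RE} - \hat{\beta}_{FE})$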
o It is chi-square distributed with degrees of freedom equal to the number of parameters for
the time-varying regressors.
o If the Hausman test is insignificant, use the random effects model.
o If the Hausman test is significant, use the fixed effects model.
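The decision rule can be sketched for the simplest case of a single time-varying regressor, where the statistic reduces to the squared difference between the two estimates divided by the difference in their estimated variances (fixed effects minus random effects). The function name and all numbers below are hypothetical and serve only to show the mechanics.

```python
import math

def hausman_one_regressor(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for a single time-varying regressor."""
    var_diff = var_fe - var_re
    if var_diff <= 0:
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    h = (b_fe - b_re) ** 2 / var_diff
    # With 1 degree of freedom, P(chi2 > h) = erfc(sqrt(h / 2)).
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value

# Hypothetical estimates, for illustration only.
h, p_value = hausman_one_regressor(b_fe=2.0, b_re=2.6, var_fe=0.05, var_re=0.03)
decision = "fixed effects" if p_value < 0.05 else "random effects"
print(round(h, 2), decision)
```

With these made-up inputs the statistic is about 18, far in the tail of a chi-square with 1 degree of freedom, so the sketch selects fixed effects. With several time-varying regressors the same calculation uses the vector of coefficient differences and the difference of the two covariance matrices.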