Panel Data Notes
Introduction
Panel data or longitudinal data refers to a cross-section repeatedly sampled over time, but
where the same economic agent has been followed throughout the period of the sample. Some
examples of data of this sort include, among others, firm or company data and comparative
country specific macroeconomic data over time. The common feature of these sources of data
is that, while the sample of individuals N is typically relatively large, the number of time periods
T over which these individuals are observed is generally short. The application of estimation
methods which exploit panel data sources has become increasingly prominent in both the
theoretical and applied econometric literature. This popularity is in part a consequence of the
increased availability of data of this type, coupled with the ability for panel data studies to
answer questions not possible either from a cross-section context or with a pure time series.
It is nevertheless true to say that, at least for more advanced applications, the statistical and
econometric methods required for the estimation of panel data models tend to be complex
relative to cross-sectional methods. Some difficulties arise from inevitable problems inherent
in data sources with a longitudinal element. These difficulties mainly relate to the concepts of
attrition and non-randomness of the sample, which can generate statistical biases if not treated
correctly. Attrition occurs when individuals leave the panel before the end of the period of the
panel. For example some organizations in a panel of firms may go out of business. In a sample
of individuals observed repeatedly over time, some may choose not to participate mid-way
through the sampling period; others may simply die and leave the panel. If these occurrences
are random, then the only consequence is loss of information and efficiency. If on the other
hand those who leave the sample are systematically different from those who remain, then
this causes a problem of selection: the panel loses its representativeness and ceases to be
random as the period of observation proceeds. Dealing with non-random attrition may
then require selection correction methods over and above the statistical methods required for
dealing with longitudinal data. Given the above, it is therefore worth mentioning the
advantages of panel data methods.
Why use panel data methods?
Panel data usually give the researcher a large number of data points, increasing the
degrees of freedom and reducing the collinearity among explanatory variables - hence
improving the efficiency of econometric estimates (bearing in mind heterogeneity
bias).
Panel data provide a means of addressing an econometric problem that often arises in
empirical studies, namely the often-heard assertion that the real reason one finds (or does
not find) certain effects is the presence of omitted (mis-measured or unobserved) variables
that are correlated with the explanatory variables. Panel data allow controlling for such
omitted (unobserved or mis-measured) variables.
Some Terminology
(i) Cross-section oriented panel data: The number of cross-sections (N) is more than the
time dimension (T).
(ii) Time-series oriented panel data: The time dimension (T) is greater than the cross-
sections (N).
(iii) Balanced panel data: This is panel data where there are no missing observations for any
cross-section.
(iv) Unbalanced panel data: This is the case, where the cross-sections do not have the same
number of observations. In other words, some cross-sections have missing observations for some periods.
(v) Rotating panels: This is a case where, in order to keep the same number of economic
agents in a survey, the fraction of economic agents that drops out of the sample in the
second period is replaced by an equal number of similar economic agents that are freshly
surveyed. This is a necessity in survey panels where the same economic agent (say a
household) may not want to be interviewed again and again.
(vi) Pseudo-panels/synthetic panels: This is panel data that approximates a genuine panel data
structure (a brief sketch follows this list). For instance, for some countries, panel data may not
exist. Instead the researcher may find annual household surveys based on large random samples of
the population. In repeated cross-section surveys, it may be impossible to track the same household
over time as required in a genuine panel. In pseudo-panels, cohorts are tracked (e.g. males born
between 1970 and 1980). For large samples, successive surveys will generate random samples of
members of each cohort. We can then estimate economic relations based on cohort means rather
than individual observations.
(vii) Spatial Panels: This is panel data dealing with space, for instance a cross-section of
countries, regions, or states. These aggregate units are likely to exhibit cross-sectional
correlation that has to be dealt with using special methods (spatial econometrics).
(viii) Limited dependent/nonlinear panel data: This is panel data where the dependent
variable is not completely continuous: binary (logit/probit models), hierarchical (nested
logit models), ordinal (ordered logit/probit), categorical (multinomial logit/probit), count
(Poisson and negative binomial models), truncated (truncated regression), censored (tobit),
or subject to sample selection (Heckit model).
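To illustrate the cohort-mean idea behind pseudo-panels in (vi), the following Python sketch builds cohort-year means from simulated repeated cross-sections and runs OLS on the means. The cohort definition (birth decade), variable names, and data-generating process are hypothetical, not taken from these notes.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated repeated cross-sections: each survey year draws new individuals.
n, years = 2000, [2010, 2011, 2012]
rows = []
for year in years:
    birth_year = rng.integers(1950, 1990, n)          # used to form cohorts
    x = rng.normal(size=n)                            # e.g. a household characteristic
    y = 1.0 + 0.5 * x + 0.01 * (birth_year - 1970) + rng.normal(size=n)
    rows.append(pd.DataFrame({"year": year, "birth_year": birth_year,
                              "x": x, "y": y}))
data = pd.concat(rows, ignore_index=True)

# Define cohorts by birth decade and collapse to cohort-year means.
data["cohort"] = (data["birth_year"] // 10) * 10
cell_means = data.groupby(["cohort", "year"])[["y", "x"]].mean().reset_index()

# Estimate the economic relation on cohort means rather than individuals.
X = np.column_stack([np.ones(len(cell_means)), cell_means["x"]])
beta = np.linalg.lstsq(X, cell_means["y"], rcond=None)[0]
print("pseudo-panel estimates (intercept, slope):", beta)
```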
Linear Panel Data Models
Consider the following general linear model for panel data;
$$y_{it} = \alpha_{it} + x_{it}'\beta_{it} + \varepsilon_{it}, \qquad i = 1, \ldots, N \text{ and } t = 1, \ldots, T \qquad (1)$$
where $y_{it}$ is the dependent variable, $x_{it}$ is a vector of independent variables, and $\varepsilon_{it}$ is the error term.
The model in equation (1) permits both the intercept and slope coefficients to vary over individuals
and time. The model in equation (1) above is, however, characterized by the following:
It is too general and not estimable; there are more parameters to be estimated than
observations.
There is a need for restrictions on the extent to which $\alpha_{it}$ and $\beta_{it}$ can vary over $i$ and $t$.
Consider now the most restrictive model (pooled panel data). This model specifies constant
coefficients, which do not distinguish between two different individuals or between the same
individual at two different points in time. However, this is not always the case in practice.
$$y_{it} = \alpha + x_{it}'\beta + \varepsilon_{it} \qquad (2)$$
Under this model the following assumptions are made;
Assume the error terms are homoscedastic and serially independent both within and
between individuals (cross-sections); that is,
$Var(\varepsilon_{it}) = \sigma^2$
$Cor(\varepsilon_{it}, \varepsilon_{js}) = 0$ when $i \neq j$ and/or $t \neq s$.
The marginal effects β of the set of k vector of time-varying characteristics xit are taken to be
common across i and t, although this assumption can itself be tested. If the model is correctly
specified and regressors are uncorrelated with the error term, the pooled OLS will produce
consistent and efficient estimates for the parameters.
$$\hat{\beta} = \left(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}_{it}\tilde{x}_{it}'\right)^{-1}\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{x}_{it}\tilde{y}_{it} \qquad (3)$$
where $\bar{x} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}x_{it}$, $\bar{y} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}y_{it}$, $\tilde{x}_{it} = x_{it} - \bar{x}$, $\tilde{y}_{it} = y_{it} - \bar{y}$,
and $\hat{\alpha} = \bar{y} - \hat{\beta}'\bar{x}$.
This formulation does not distinguish between two different individuals and the same individual at
two different points in time. This feature undermines the accuracy of the approach when
differences do exist between cross-sectional units. However, the larger number of observations
obtained by pooling data across time improves efficiency relative to a single cross-section. This model also
does not use any panel information; the data are treated as if there was only one single index.
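As a concrete illustration of the pooled OLS estimator in equation (3), the following Python sketch stacks all NT observations of a simulated balanced panel into a single index and applies the demeaned formula. The variable names and data-generating process are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, k = 100, 5, 2

# Simulate a balanced panel and stack it as NT rows (one single index).
x = rng.normal(size=(N * T, k))
beta_true = np.array([1.5, -0.7])
y = 2.0 + x @ beta_true + rng.normal(size=N * T)

# Pooled OLS: demean around the overall means, as in equation (3).
x_dm = x - x.mean(axis=0)
y_dm = y - y.mean()
beta_hat = np.linalg.solve(x_dm.T @ x_dm, x_dm.T @ y_dm)
alpha_hat = y.mean() - x.mean(axis=0) @ beta_hat   # alpha-hat = y-bar - beta-hat' x-bar
print("pooled OLS:", alpha_hat, beta_hat)
```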
The assumption made here is that the errors are homoscedastic and serially
independent both within and between individuals (cross-sections):
$Var(\varepsilon_{it}) = \sigma^2$
$Cor(\varepsilon_{it}, \varepsilon_{js}) = 0$ when $i \neq j$ and/or $t \neq s$.
When the intercept is instead allowed to vary across individuals, as in $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$,
the individual-specific intercepts $\alpha_i$ are referred to as individual (unobserved) heterogeneity;
the slopes are still the same for all individuals.
In the Seemingly Unrelated Regression (SUR) type model, both the constant terms, $\alpha_i$, and
the slope coefficients, $\beta_i$, vary from individual to individual:
$$y_{it} = \alpha_i + x_{it}'\beta_i + \varepsilon_{it} \qquad (5)$$
In the SUR models, the error terms are assumed to be contemporaneously correlated and
heteroscedastic between individuals.
$Var(\varepsilon_{it}) = \sigma_i^2$
$Cor(\varepsilon_{it}, \varepsilon_{js}) \neq 0$ when $t = s$ (contemporaneous correlation across individuals).
Note: When we have a large number of independent individuals observed for a few time periods
(N >> T), as in cross-sectional panels, it is not possible to estimate different individual slopes
($\beta_i$) for all the exogenous variables; a panel data model is then more appropriate than the
Seemingly Unrelated Regression (SUR) model. When we have a medium-length time series for
relatively few individuals (T > N), the SUR model may be appropriate. Thus, efficient SUR
estimation is mainly used when T ≥ N, and equation-by-equation OLS is used if K ≤ T ≤ N.
The pooled model is therefore the most restrictive, while SUR is the least restrictive.
The one-way error component specification decomposes the error as $\varepsilon_{it} = u_i + v_{it}$, where
$u_i$ denotes the unobserved individual-specific effect and $v_{it}$ denotes the idiosyncratic errors or
idiosyncratic disturbances, which change across time and cross-section. $u_i$ is time invariant and
accounts for any individual-specific effect that is not included in the regression.
Functional Forms and Notation
The parameter estimate of a dummy variable is part of the intercept in a fixed effect model and a
component of error in the random effect model. Slopes remain the same across groups or time
periods. The functional forms of one-way panel data models are as follows.
Fixed effect model: $y_{it} = (\alpha + u_i) + x_{it}'\beta + v_{it}$
Random effect model: $y_{it} = \alpha + x_{it}'\beta + (u_i + v_{it})$, where $v_{it} \sim IID(0, \sigma_v^2)$.
The fixed effects approach essentially uses each individual as his own control. Recall the farm example we gave previously: we said that one
possibility explaining the very high rates is that there is an unobservable factor, say land quality
that the estimation was not controlling for. If we can have data on the output from the land in
several years, we could control for this unobservable factor even though one does not see it.
Unfortunately there are some costs and dangers in doing so.
Let’s see some of the implications. Say we have the following
$$Y_{it} = \beta'X_{it} + \gamma_i + \delta_t + u_{it}$$
where $\delta_t$ is the time effect (say macro) that affects all individuals in the sample at time $t$,
and $\gamma_i$ is the fixed effect for observation $i$. If $\gamma_i$ is correlated with the X's then we will have problems.
The fixed effect is, as the qualifier suggests, fixed through time. Given that we have more than
one observation for each of the sample points, we can remove $\gamma_i$ by taking differences or, when
there are more than two observations, by subtracting (sweeping out) the individual means. Let us
take the first difference:
$$Y_{i2} - Y_{i1} = (\delta_2 - \delta_1) + \beta'(X_{i2} - X_{i1}) + (u_{i2} - u_{i1})$$
$$\Delta Y_i = \Delta\delta + \beta'\Delta X_i + \Delta u_i$$
We can then consistently estimate the parameters since the regression does not have the
unobserved (and correlated with the X's) fixed effect. However, this has some consequences:
1- the new regression has half as many observations as the regression in levels.
2- Differencing will sweep out the fixed effects as well as everything else that does not
change over the period of observation.
3- Differencing in the presence of measurement error may in fact complicate the estimation.
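The following Python sketch illustrates the first-difference transformation for T = 2 on simulated data in which the fixed effect is correlated with the regressor; all names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500

# Fixed effect correlated with the regressor (the source of bias in levels).
alpha_i = rng.normal(size=N)
x1 = alpha_i + rng.normal(size=N)          # period 1 regressor
x2 = alpha_i + rng.normal(size=N)          # period 2 regressor
beta_true = 1.0
y1 = beta_true * x1 + alpha_i + rng.normal(size=N)
y2 = beta_true * x2 + alpha_i + 0.3 + rng.normal(size=N)   # 0.3 is the time effect

# First differences sweep out alpha_i (and anything else time-invariant).
dy, dx = y2 - y1, x2 - x1
X = np.column_stack([np.ones(N), dx])      # intercept picks up the time effect
beta_fd = np.linalg.lstsq(X, dy, rcond=None)[0]
print("first-difference estimates (time effect, beta):", beta_fd)
```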
The least squares dummy variable (LSDV) regression is ordinary least squares (OLS) with dummy
variables. We estimate an LSDV model first by defining a series of individual-specific dummy
variables. In principle one simply estimates the OLS regression of $y_{it}$ on $x_{it}$ and a set of $N-1$ indicator
variables $d_{1it}, d_{2it}, \ldots, d_{N-1,it}$. The resulting estimator of $\beta$ from this model turns out to be
equal to the within (fixed effects) estimator discussed below.
The key issue in LSDV is how to avoid the perfect multicollinearity or so called “dummy variable
trap.” LSDV has three approaches to avoid getting caught in the trap. These approaches are
different from each other with respect to model estimation and interpretation of dummy variable
parameters (Suits 1984). They produce different dummy parameter estimates, but their results are
equivalent.
The first approach, LSDV1, drops a dummy variable. That is, the parameter of the eliminated
dummy variable is set to zero and is used as a baseline. The variable to be dropped needs to be
carefully (as opposed to arbitrarily) selected so that it can play the role of the reference group
effectively.
LSDV1: $y_i = \alpha_0 + \beta_1 x_i + \delta_1 d_{1i} + \varepsilon_i$, or $y_i = \alpha_0 + \beta_1 x_i + \delta_2 d_{2i} + \varepsilon_i$
LSDV2 includes all dummies and, in turn, suppresses the intercept (i.e., sets the intercept to zero).
LSDV2: $y_i = \beta_1 x_i + \delta_1 d_{1i} + \delta_2 d_{2i} + \varepsilon_i$
LSDV3 includes the intercept and all dummies, and then imposes a restriction that the sum of the
parameters of all dummies is zero.
LSDV3: $y_i = \alpha_0 + \beta_1 x_i + \delta_1 d_{1i} + \delta_2 d_{2i} + \varepsilon_i$, subject to $\delta_1 + \delta_2 = 0$
Each approach has a constraint (restriction) that reduces the number of parameters to be estimated
by one and thus makes the model identified.
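A small Python sketch of LSDV1 on simulated data with three groups, dropping one dummy so that its group becomes the baseline; the group labels and data-generating process are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
groups = np.repeat(["a", "b", "c"], 50)     # 3 groups, 50 observations each
x = rng.normal(size=150)
group_effect = pd.Series({"a": 1.0, "b": 2.0, "c": 3.0})[groups].to_numpy()
y = group_effect + 0.5 * x + rng.normal(size=150)

# LSDV1: drop the dummy for group "a"; its intercept becomes the baseline.
d = pd.get_dummies(groups, drop_first=True).to_numpy(dtype=float)   # d_b, d_c
X = np.column_stack([np.ones(150), x, d])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print("intercept (group a), beta, delta_b, delta_c:", coef)
```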
The three approaches end up fitting the same model but the coefficients of dummy variables in
each approach have different meanings and thus are numerically different. A parameter estimate in
LSDV2, $\delta_d$, is the actual intercept (Y-intercept) of group $d$. It is easy to interpret substantively.
The t-test examines whether $\delta_d$ is zero. In LSDV1, a dummy coefficient shows the extent to which the
actual intercept of group d deviates from the reference point (the parameter of the dropped dummy
variable), which is the intercept of LSDV1. The null hypothesis holds that the deviation from the
reference group is zero. In LSDV3, a dummy coefficient indicates how far the actual intercept of
its group is from the average group effect.
Therefore, the null hypothesis is the deviation from the average is zero. In short, each approach
has a different baseline and thus tests a different hypothesis but produces exactly the same
parameter estimates of regressors. They all fit the same model; given one LSDV fitted, in other
words, we can replicate the other two LSDVs.
Which approach is better than the others? You need to consider both estimation and interpretation
issues carefully. In general, LSDV1 is often preferred because of easy estimation in statistical
software packages. Oftentimes researchers want to see how far dummy parameters deviate from
the reference group rather than the actual intercept of each group.
LSDV is widely used because it is relatively easy to estimate and interpret substantively. This
LSDV, however, becomes problematic when there are many groups or subjects in panel data. If T
is fixed and $n \rightarrow \infty$, only the coefficients of the regressors are consistent. The coefficients of the dummy
variables are not consistent, since the number of these parameters increases as $n$ increases
(Baltagi 2001). This is called the incidental parameter problem. Under this circumstance, LSDV is
useless and thus calls for another strategy, the within effect model.
The Within Estimator
The within effect model uses deviations from the group (cross-section) means to estimate $\beta$ only.
This means subtracting the mean for each cross-section from each observation, giving
$$y_{it} - \bar{y}_{i} = (x_{it} - \bar{x}_{i})'\beta + (v_{it} - \bar{v}_{i}).$$
Demeaning the data will not change the estimates for $\beta$.
There are no incidental parameters and the errors still satisfy the usual assumptions.
We can therefore use OLS on the above equation to obtain consistent estimates.
The parameter estimates of regressors in the within effect model are identical to those of LSDV.
The within effect model in turn has several disadvantages.
Since this model does not report dummy coefficients, you need to compute them using the formula
$\hat{d}_i = \bar{y}_i - \hat{\beta}'\bar{x}_i$. Since no dummies are used, the within effect model has larger (apparent)
degrees of freedom for the error, resulting in a smaller MSE (mean square error) and incorrect (smaller)
standard errors of the parameter estimates. Thus, we need to adjust the standard errors using the formula
$$se_k^* = se_k\sqrt{\frac{nT - k}{nT - n - k}}.$$
Note: R2 of the within effect model is not correct because the intercept is suppressed. Also
because dummy variables cannot be used we would be able to say nothing about the relationship
between the dependent variable and the time-invariant characteristics using this estimator.
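A Python sketch of the within transformation and the degrees-of-freedom correction of the standard errors described above, on simulated data; names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 50, 6
ids = np.repeat(np.arange(N), T)

# Simulated panel with individual effects correlated with x.
alpha_i = rng.normal(size=N)
x = alpha_i[ids] + rng.normal(size=N * T)
y = 1.0 * x + alpha_i[ids] + rng.normal(size=N * T)

# Within transformation: subtract each cross-section's mean.
def demean(v):
    means = np.bincount(ids, weights=v) / np.bincount(ids)
    return v - means[ids]

x_w, y_w = demean(x), demean(y)
beta_w = (x_w @ y_w) / (x_w @ x_w)

# Naive OLS standard error from the demeaned regression, then the correction
# se* = se * sqrt((nT - k) / (nT - n - k)), here with k = 1 regressor.
k = 1
resid = y_w - beta_w * x_w
se_naive = np.sqrt(resid @ resid / (N * T - k) / (x_w @ x_w))
se_adj = se_naive * np.sqrt((N * T - k) / (N * T - N - k))
print("within beta:", beta_w, "naive se:", se_naive, "adjusted se:", se_adj)
```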
The Between Estimator
The between group effect model, so called the group mean regression, uses group means of the
dependent and independent variables. This data aggregation reduces the number of observations
down to n. The between estimator is the OLS estimator for the regression of $\bar{y}_i$ on the
time-averaged regressors: $\bar{y}_i = \alpha + \bar{x}_i'\beta + \varepsilon_i$.
This estimator just uses cross-sectional variation. Its major concern is the difference between
different individuals.
Random effects (RE): is more useful for a sample drawn from a large population and both
‘individual effects’ and usual error terms are assumed to be random: note that correlation between
explanatory variables and error terms are assumed to be always zero in RE context.
$$y_{it} = \alpha + x_{it}'\beta + u_i + v_{it},$$
where $u_i \sim IID(0, \sigma_u^2)$ and $v_{it} \sim IID(0, \sigma_v^2)$.
The $u_i$ is assumed independent of $v_{it}$ and $x_{it}$, which are also independent of each other for all $i$
and $t$. This assumption is not necessary in the fixed effect model. The components of the combined
error $w_{it} = u_i + v_{it}$ satisfy $Cov(w_{it}, w_{js}) = \sigma_u^2 + \sigma_v^2$ if $i = j$ and $s = t$ (same cross-section
and time period), and $Cov(w_{it}, w_{js}) = \sigma_u^2$ if $i = j$ and $t \neq s$ (same cross-section, different
periods); the covariance is zero across different cross-sections.
Thus, the matrix or the variance structure of errors looks like
$$\Sigma = \begin{pmatrix}
\sigma_u^2 + \sigma_v^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\
\sigma_u^2 & \sigma_u^2 + \sigma_v^2 & \cdots & \sigma_u^2 \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 + \sigma_v^2
\end{pmatrix}$$
A random effect model is estimated by generalized least squares (GLS) when the variance
structure is known, and by feasible generalized least squares (FGLS) when the variance is
unknown. Compared to fixed effect models, random effect models are relatively difficult to
estimate. In this chapter we assume panel data are balanced.
With the stacked vector of $y_{it}$ formed in the same way as the error term, and a similarly stacked $X$, we have
$$Y = X\beta + u.$$
Generalized Least Squares requires the removal of the non-standard structure of the error covariance
matrix. Define a weight matrix of the form $P = V^{-1/2}$, so that
$$PY = PX\beta + Pu \quad\Leftrightarrow\quad Y^* = X^*\beta + u^*$$
$$E(u^* u^{*\prime}) = E(Puu'P') = P\,E(uu')\,P' = PVP' = I_{NT}.$$
$$P = V^{-1/2} = I_N \otimes \Sigma^{-1/2}, \qquad V = I_N \otimes \Sigma, \qquad \Sigma = \sigma_v^2 I_T + \sigma_u^2 \iota_T\iota_T',$$
where $\iota_T$ is a $T \times 1$ vector of ones.
Assuming the errors in this transformed equation are serially uncorrelated and homoskedastic, OLS (GLS)
can be applied. Note that the RE estimator is obtained by quasi time-demeaning: rather than
removing the time average from the explanatory and dependent variables at each $t$, random
effects removes only a fraction of the time average. If $\hat{\theta}$ is close to unity, the RE and FE estimates
tend to be close.
$$\hat{\theta} = 1 - \left[\frac{1}{1 + T(\hat{\sigma}_u^2/\hat{\sigma}_v^2)}\right]^{1/2}$$
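A Python sketch of the quasi time-demeaning behind the RE (GLS) estimator, plugging the true variance components into the formula for theta above rather than estimating them (as FGLS would); the data are simulated and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 200, 5
ids = np.repeat(np.arange(N), T)

# Random effects data: u_i uncorrelated with x by construction.
sigma_u, sigma_v = 1.0, 1.0
u_i = rng.normal(scale=sigma_u, size=N)
x = rng.normal(size=N * T)
y = 2.0 + 1.0 * x + u_i[ids] + rng.normal(scale=sigma_v, size=N * T)

# theta = 1 - [1 / (1 + T * sigma_u^2 / sigma_v^2)]^(1/2), using the true
# variance components here; FGLS would estimate them first.
theta = 1.0 - np.sqrt(1.0 / (1.0 + T * sigma_u**2 / sigma_v**2))

def quasi_demean(v):
    means = np.bincount(ids, weights=v) / np.bincount(ids)
    return v - theta * means[ids]     # remove only a fraction of the group mean

X = np.column_stack([np.full(N * T, 1.0 - theta), quasi_demean(x)])
coef = np.linalg.lstsq(X, quasi_demean(y), rcond=None)[0]
print("RE (GLS) estimates (intercept, beta):", coef)
```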
Applications of the RE estimator attempt to control for the part of $u_i$ correlated with $x_{it}$ by
including dummy variables for various groups, if we have many observations in each group,
to control for a certain amount of heterogeneity that might be correlated with the time-constant
elements of $x_{it}$. These dummies could be termed, for example, sector fixed effects.
Under pooled OLS the composite error $u_i + v_{it}$ is serially correlated within cross-sections, so we
need to use robust standard errors for t and F statistics in the pooled OLS regression.
o Use the residuals ($\hat{e}_{it}$) to test for serial correlation in $e_{it} = v_{it} - \bar{v}_{i.}$, which are assumed to be serially uncorrelated.
There are various proposed ways, but the most common estimation procedure in the
literature is Swamy-Arora (in EViews, for instance).
The GLS estimator of the REM is a weighted average of the within and between estimators, and it
converges to the within estimator as $T \rightarrow \infty$ and $\theta \rightarrow 1$.
The null hypothesis is that all dummy parameters except the one for the dropped group are zero:
$H_0: \mu_1 = \cdots = \mu_{n-1} = 0$. This hypothesis is tested by an F test based on the loss of
goodness-of-fit. The robust model in the following formula is the LSDV (or within effect) model and
the efficient model is the pooled regression.
$$F = \frac{\left(e'e_{Efficient} - e'e_{Robust}\right)/(n-1)}{e'e_{Robust}/(nT - n - k)}
  = \frac{\left(R^2_{Robust} - R^2_{Efficient}\right)/(n-1)}{\left(1 - R^2_{Robust}\right)/(nT - n - k)}
  \sim F\left(n-1,\ nT - n - k\right)$$
If the null hypothesis is rejected, you may conclude that the fixed group effect model is better than
the pooled OLS model.
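A Python sketch of this F test, computing the SSE of the pooled (efficient) and LSDV (robust) fits on simulated data; the group structure and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n, T, k = 30, 8, 1
ids = np.repeat(np.arange(n), T)

alpha_i = rng.normal(size=n)
x = rng.normal(size=n * T)
y = alpha_i[ids] + 0.8 * x + rng.normal(size=n * T)

def sse(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return r @ r

# Efficient model: pooled OLS with a common intercept.
sse_pooled = sse(y, np.column_stack([np.ones(n * T), x]))

# Robust model: LSDV with a full set of n group dummies (no common intercept).
D = (ids[:, None] == np.arange(n)[None, :]).astype(float)
sse_lsdv = sse(y, np.column_stack([x, D]))

F = ((sse_pooled - sse_lsdv) / (n - 1)) / (sse_lsdv / (n * T - n - k))
print("F statistic for H0: all group effects equal:", F)
```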
The null hypothesis is that the cross-sectional variance component is zero, $H_0: \sigma_u^2 = 0$. Breusch and
Pagan (1980) developed the Lagrange multiplier (LM) test (Greene 2003). It is a test of the presence of
individual-specific random effects against iid errors, because the REM reduces to the pooled model if the
variance of the individual effects becomes zero, i.e. $\sigma_u^2 = 0$, so that
$\theta = 1 - \left[\sigma_v^2/(T\sigma_u^2 + \sigma_v^2)\right]^{1/2} = 0$.
$$H_0: \sigma_u^2 = 0, \qquad H_A: \sigma_u^2 \neq 0$$
$$LM = \frac{NT}{2(T-1)}\left[\frac{\sum_{i=1}^{N}\left(\sum_{t=1}^{T}\hat{v}_{it}\right)^2}{\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{v}_{it}^2} - 1\right]^2 \sim \chi_1^2 \text{ under } H_0.$$
Note: the Breusch and Pagan test is a two-sided test against $H_A: \sigma_u^2 \neq 0$, even though a variance
can never be negative. The LM test for unbalanced/incomplete panels is modified by Baltagi and Li (1990).
Decision: a high $\chi_1^2$ value with a low probability value (less than 0.10 or 0.05) means we reject the
null, and the REM is better.
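A Python sketch of the Breusch-Pagan LM statistic computed from pooled OLS residuals on simulated data with a genuine random individual effect; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T = 100, 5
ids = np.repeat(np.arange(N), T)

# Data with a genuine random individual effect, so H0 should be rejected.
u_i = rng.normal(size=N)
x = rng.normal(size=N * T)
y = 1.0 + 0.5 * x + u_i[ids] + rng.normal(size=N * T)

# Pooled OLS residuals.
X = np.column_stack([np.ones(N * T), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
v = y - X @ b

# LM = NT / (2(T-1)) * [ sum_i (sum_t v_it)^2 / sum_it v_it^2 - 1 ]^2 ~ chi2(1).
group_sums = np.bincount(ids, weights=v)
lm = N * T / (2 * (T - 1)) * (np.sum(group_sums**2) / np.sum(v**2) - 1.0) ** 2
print("Breusch-Pagan LM statistic:", lm)
```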
Two caveats: (1) strict exogeneity is maintained under the null and the alternative; correlation
between $x_{is}$ and $u_{it}$ for any $s$ and $t$ causes both FE and RE to be inconsistent, and generally
their probability limits will differ. (2) The other maintained assumption is that the RE estimator is
efficient under the null. It should be noted that this second assumption is not being tested by the
Hausman statistic: the Hausman statistic has no systematic power against the alternative that the
first assumption is true but the second is false. Failure of the second assumption causes the usual
Hausman test to have a nonstandard limiting distribution, with asymptotic size larger or smaller
than the nominal size.
$$H = (\hat{\beta}_{re} - \hat{\beta}_{w})'\left[V(\hat{\beta}_{w}) - V(\hat{\beta}_{re})\right]^{-1}(\hat{\beta}_{re} - \hat{\beta}_{w}),$$
which is asymptotically $\chi^2$ with $k$ degrees of freedom under the null.
$H_0: \beta_{re} - \beta_{w} = 0$, i.e., the correct model is the REM; $\hat{\beta}_{re}$ is consistent under $H_0$.
$H_A: \beta_{re} - \beta_{w} \neq 0$, i.e., the correct model is the FEM; $\hat{\beta}_{re}$ is inconsistent under $H_A$.
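A minimal Python sketch of the Hausman statistic for a single coefficient; the FE and RE estimates and variances used below are placeholder numbers standing in for output from the two fitted models.

```python
import numpy as np

# Hypothetical within (FE) and RE (GLS) estimates and their variances.
beta_fe, var_fe = 0.52, 0.0009
beta_re, var_re = 0.47, 0.0006

diff = np.array([beta_re - beta_fe])
V = np.array([[var_fe - var_re]])           # V(b_w) - V(b_re)
H = float(diff @ np.linalg.inv(V) @ diff)   # ~ chi2(k) under H0 (RE is correct)
print("Hausman statistic:", H)
```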
Modeling serial correlation: example
AR(1): $v_{it} = \rho v_{it-1} + \varepsilon_{it}$;  AR(2): $v_{it} = \rho_1 v_{it-1} + \rho_2 v_{it-2} + \varepsilon_{it}$;  or MA(1): $v_{it} = \varepsilon_{it} + \lambda\varepsilon_{it-1}$
Autocorrelation coefficient calculated as:
$$r = \frac{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat{\varepsilon}_{it}\,\hat{\varepsilon}_{it-1}}{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat{\varepsilon}_{it}^2},$$
which, suitably standardized, is asymptotically $N(0,1)$ under $H_0$ of no serial correlation.
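A short Python sketch of this autocorrelation coefficient on simulated AR(1) idiosyncratic errors; the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
N, T = 50, 10

# Simulated AR(1) idiosyncratic errors v_it = rho * v_it-1 + eps_it.
rho_true = 0.4
v = np.zeros((N, T))
for t in range(1, T):
    v[:, t] = rho_true * v[:, t - 1] + rng.normal(size=N)

# r = sum_i sum_{t>=2} v_it * v_it-1 / sum_i sum_{t>=2} v_it^2
r = np.sum(v[:, 1:] * v[:, :-1]) / np.sum(v[:, 1:] ** 2)
print("estimated autocorrelation coefficient:", r)
```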
Poolability Test
What is poolability? Poolability tests whether or not slopes are the same across groups or over
time. Thus, the null hypothesis of the poolability test is $H_0: \beta_{ik} = \beta_k$. Remember that slopes
remain constant in fixed and random effect models; only intercepts and error variances matter. The
poolability test is undertaken under the assumption $\varepsilon \sim N(0, s^2 I_{NT})$. This test uses the F
statistic
$$F_{obs} = \frac{\left(e'e - \sum_{i} e_i'e_i\right)\big/\left[(n-1)K\right]}{\sum_{i} e_i'e_i\big/\left[n(T-K)\right]} \sim F\left((n-1)K,\ n(T-K)\right)$$
where $e'e$ is the SSE of the pooled OLS and $e_i'e_i$ is the SSE of the OLS regression for group $i$. If
the null hypothesis is rejected, the panel data are not poolable. Under this circumstance, you may
go to the random coefficient model or a hierarchical regression model.
Similarly, the null hypothesis of the poolability test over time is $H_0: \beta_{tk} = \beta_k$. The F test is
$$F_{obs} = \frac{\left(e'e - \sum_{t} e_t'e_t\right)\big/\left[(T-1)K\right]}{\sum_{t} e_t'e_t\big/\left[T(n-K)\right]} \sim F\left((T-1)K,\ T(n-K)\right).$$
DYNAMIC PANEL DATA MODELS
Consider the following model:
$$y_{it} = \delta y_{it-1} + x_{it}'\beta + \mu_i + v_{it} \qquad \ldots (1)$$
where $\mu_i \sim iid(0, \sigma_\mu^2)$ and $v_{it} \sim iid(0, \sigma_v^2)$.
This implies $y_{it-1}$ and $\mu_i$ (a component of the error term) are correlated; the OLS estimator becomes biased and inconsistent.
B. Fixed Effects
The model: $y_{it} - \bar{y}_{i.} = \delta(y_{it-1} - \bar{y}_{i.,-1}) + (x_{it} - \bar{x}_{i.})'\beta + (v_{it} - \bar{v}_{i.})$
$(y_{it-1} - \bar{y}_{i.,-1})$ and $(v_{it} - \bar{v}_{i.})$ are correlated, as $y_{it-1}$ is correlated with $\bar{v}_{i.}$.
C. Random Effects
The model: $y_{it} - \bar{y}_{i.} = \delta(y_{it-1} - \bar{y}_{i.,-1}) + (x_{it} - \bar{x}_{i.})'\beta + (v_{it} - \bar{v}_{i.})$
$(y_{it-1} - \bar{y}_{i.,-1})$ and $(v_{it} - \bar{v}_{i.})$ are correlated, as $y_{it-1}$ is correlated with $\bar{v}_{i.}$.
Use $\Delta y_{it-2} = y_{it-2} - y_{it-3}$ or $y_{it-2}$ as an instrument for $\Delta y_{it-1} = y_{it-1} - y_{it-2}$ (the Anderson-Hsiao approach).
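A Python sketch of this instrumental variable idea on a simulated dynamic panel, using the level $y_{it-2}$ as the instrument for $\Delta y_{it-1}$ in the first-differenced equation; the simulation design is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(9)
N, T = 500, 6
delta_true = 0.5

# Simulate a dynamic panel y_it = delta * y_it-1 + mu_i + v_it.
mu = rng.normal(size=N)
y = np.zeros((N, T))
y[:, 0] = mu + rng.normal(size=N)
for t in range(1, T):
    y[:, t] = delta_true * y[:, t - 1] + mu + rng.normal(size=N)

# First differences for t = 4..T; instrument dy_{t-1} with the level y_{t-2}.
dy = (y[:, 3:] - y[:, 2:-1]).ravel()          # dependent: delta y_it
dy_lag = (y[:, 2:-1] - y[:, 1:-2]).ravel()    # endogenous: delta y_it-1
z = y[:, 1:-2].ravel()                        # instrument: y_it-2

# Just-identified IV estimator: delta_hat = (z'dy) / (z'dy_lag).
delta_iv = (z @ dy) / (z @ dy_lag)
print("Anderson-Hsiao IV estimate of delta:", delta_iv)
```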
These instrumental variable estimates are not efficient because they do not make use of all the
available moment conditions. The Arellano and Bond first-difference GMM estimator is a two-step
GMM estimator with a more complicated formulation (see Part V, page 5). With x regressors, in
addition to the lagged levels $y_{i1}, y_{i2}, \ldots, y_{iT-2}$, the lagged values of the regressors $x_{ji}$
(where $j$ indexes the regressors) are also available as instruments.
ARELLANO AND BOVER (1995) AND BLUNDELL AND BOND SYSTEM ESTIMATOR
Blundell and Bond (1998) used an extended system GMM estimator that uses lagged differences
of yit as instruments for equations in levels and the system GMM estimator is shown to have
dramatic efficiency gains over the basic first difference GMM. For small T, the use of extra
moment condition improves efficiency and reduces finite sample bias. Blundell and Bond (2000)
found a more reasonable and more precise estimate of coefficient of capital as compared to the
standard first-differenced GMM for US manufacturing over the period 1982-89.
Valid instruments in the case of system GMM (assuming, for simplicity, no predetermined
variables) could be shown as:
For the first-differenced equation $\Delta y_{iT} = \delta\Delta y_{iT-1} + \Delta v_{iT}$, the valid instruments could be
$y_{i1}, y_{i2}, \ldots, y_{iT-2}$ and, for the corresponding levels equation, $\Delta y_{i2}, \Delta y_{i3}, \ldots, \Delta y_{iT-1}$.
Make sure that first order and second order autocorrelation tests AR(1) and AR(2) are
conducted;
The Sargan test of over-identifying restrictions is conducted assuming homoscedasticity, not on
robust standard errors;
Check the effect of assuming predetermined variables as endogenous and exogenous;
Check whether or not the coefficients of the variables are statistically (jointly) zero
using Wald Test.
Make sure that two-step GMM results are used for specification tests rather than for inference,
because two-step standard errors tend to be biased downwards in small samples.
Hausman test, predictions of errors, test, estimates (R).
Problems of Non-stationarity: Low power and nonstandard limiting distribution of time series
(unit root and cointegration) tests as well as spurious regression (t-statistics diverge in
misspecified regression of I(1) variables).
Methods for dealing with serial correlation and heteroskedasticity of the residuals can now be
combined with FE and RE panel estimation methods.
In non-stationary panels:
Test statistics and estimators have normal limiting distribution unlike the time series case;
In a panel data context, it is possible to avoid spurious regression. As T and N goes to
infinity, spurious regression estimates give consistent estimate of the true value of the
parameter, because the panel estimator averages across individuals and the information in the
independent cross-sections leads to a stronger overall signal than a pure time series.
However, letting both $N \rightarrow \infty$ and $T \rightarrow \infty$ introduces some other problems in asymptotic
analysis (Phillips and Moon, 2000).
Two Issues:
i. Panel data tests are the wrong answer to the low power of unit root tests in single
time series.
ii. Hence, panel unit root tests don’t rescue PPP and can’t settle the question of growth
convergence among countries
Tests of Non-stationarity: Tests are notorious for low power and non-standard limiting
distributions.
All panel unit root tests begin with the following specification:
$$y_{it} = \rho_i y_{it-1} + x_{it}'\delta_i + \varepsilon_{it}$$
Then:
$$\Delta y_{it} = y_{it} - y_{it-1} = \alpha_i y_{it-1} + x_{it}'\delta_i + \varepsilon_{it}, \quad \text{with } \alpha_i = (\rho_i - 1).$$
ADF-type model:
$$\Delta y_{it} = \alpha_i y_{it-1} + x_{it}'\delta_i + \sum_{j=1}^{p_i}\beta_{ij}\Delta y_{it-j} + \varepsilon_{it}.$$
1. Levin, Lin and Chu (LLC), Breitung, as well as Hadri assume for these tests that the
persistence parameters are common across cross-section units;
$\alpha_i = \alpha$ for all $i$ (common unit root process).
LLC: (1) Attempts to derive estimates for $\alpha$ from proxies of $\Delta y_{it}$ and $y_{it}$ that are standardized
and free of autocorrelation and deterministic components. (2) Requires a specification of the number of
lags used in each cross-section ADF regression, the exogenous regressors, and kernel choices.
Breitung: (1) Only the autoregressive component (and not the exogenous components) is removed
when constructing the standardized proxies. (2) The proxies are transformed and detrended.
(3) Requires only a specification of the number of lags used in each cross-section ADF regression
and the exogenous regressors. (4) No kernel computation is required.
Hadri: (1) Similar to the KPSS unit root test. (2) Null: no unit root (stationarity). (3) Based on the
residuals from the individual OLS regressions of $y_{it}$ on a constant, or on a constant and a trend.
2. Im, Pesaran and Shin (IPS), as well as the Fisher-ADF and Fisher-PP tests, assume that $\alpha_i$
varies across cross-sections (individual unit root processes):
$$\Delta y_{it} = \alpha_i y_{it-1} + x_{it}'\delta_i + \sum_{j=1}^{p_i}\beta_{ij}\Delta y_{it-j} + \varepsilon_{it}.$$
(1) Estimate separate ADF regressions. (2) Compute the average of the t-statistics for each of
the $\alpha_i$ from the individual ADF regressions, $t_{iT_i}(p_i)$.
The average is
$$\bar{t}_{NT} = \frac{1}{N}\sum_{i=1}^{N} t_{iT_i}(p_i).$$
The standardized $\bar{t}_{NT}$ statistic has an asymptotic standard normal distribution:
$$W_{\bar{t}_{NT}} = \frac{\sqrt{N}\left[\bar{t}_{NT} - N^{-1}\sum_{i=1}^{N}E\left(t_{iT}(p_i)\right)\right]}{\sqrt{N^{-1}\sum_{i=1}^{N}\operatorname{var}\left(t_{iT}(p_i)\right)}} \rightarrow N(0,1).$$
IPS provide the expected means and variances of the ADF t-statistics for various values of $T$
and $p$.
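A Python sketch of the first step of IPS: computing the individual ADF t-statistics and their average t-bar on a simulated panel of random walks, using statsmodels' adfuller. Turning t-bar into the standardized W statistic would require the tabulated IPS moments, which are not reproduced here.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(10)
N, T = 10, 100

# Simulated panel of independent random walks (unit roots under the null).
y = np.cumsum(rng.normal(size=(N, T)), axis=1)

# Individual ADF t-statistics and their average (the t-bar in IPS).
t_stats = [adfuller(y[i], maxlag=1, regression="c", autolag=None)[0]
           for i in range(N)]
t_bar = np.mean(t_stats)
print("individual ADF t-statistics:", np.round(t_stats, 2))
print("t-bar:", t_bar)
```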
COINTEGRATION
Because there is a need for more powerful tests:
Time series tests have low power, especially for short T.
Types of Tests:
Residual based Tests;
o DF and ADF tests (Kao 1999)
o LM test (McCoskey and Kao 1998).
Likelihood based test
o Combined individual tests.
Pedroni (Engle-Granger Based) Panel Cointegration Test
The Engle-Granger (EG, 1987) cointegration test is based on the residuals of a spurious regression
of I(1) variables.
If the I(1) variables are cointegrated the residuals should be I(0), otherwise I(1).
Pedroni (1999, 2004) and Kao (1999) extended EG to panel data and suggested several
tests, allowing heterogeneous intercepts and trend coefficients across cross-sections;
$$y_{it} = \alpha_i + \delta_i t + \beta_{1i} x_{1it} + \cdots + \beta_{Mi} x_{Mit} + e_{it},$$
$$t = 1, 2, \ldots, T; \quad i = 1, 2, \ldots, N; \quad m = 1, 2, \ldots, M.$$
$y$ and $x$ are assumed to be I(1). $\alpha_i$ and $\delta_i$ are individual and trend effects, which
could be set to zero.
H 0 : No cointegration, residuals are I (1).
Steps
o Obtain residuals from the above regression;
o Run an auxiliary regression of the form
$$\hat{e}_{i,t} = \rho_i \hat{e}_{i,t-1} + \sum_{j=1}^{p_i}\psi_{ij}\Delta\hat{e}_{i,t-j} + v_{i,t},$$
and the appropriately standardized panel statistics are asymptotically $N(0,1)$.
Kao's (Engle-Granger based) panel cointegration test follows similar steps: (1) Obtain the
residuals from the above regression; (2) run the following auxiliary regression:
$$\hat{e}_{it} = \rho \hat{e}_{it-1} + v_{it}$$
$H_0: \rho = 1$, no cointegration.
t-value:
$$t_\rho = \frac{(\hat{\rho} - 1)\sqrt{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat{e}_{it-1}^2}}{s_e}$$
Kao presents several statistics, $DF_\rho$, $DF_t$, $DF_\rho^*$, $DF_t^*$ and ADF, which are asymptotically standard normal.
o Maddala and Wu (1999) suggested, based on Fisher (1932), the following combined test.
o If $\pi_i$ is the p-value from the individual cointegration test for cross-section $i$, then the statistic
$-2\sum_{i=1}^{N}\ln(\pi_i)$ is asymptotically $\chi^2$ with $2N$ degrees of freedom.
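A Python sketch of the Fisher-type combination of p-values; the individual p-values below are placeholders standing in for the results of per-cross-section cointegration tests.

```python
import numpy as np
from scipy import stats

# Combine per-cross-section p-values into -2 * sum(log p_i), which is
# chi-square with 2N degrees of freedom under the null.
p_values = np.array([0.03, 0.20, 0.01, 0.08, 0.15])   # hypothetical p_i
N = len(p_values)

fisher_stat = -2.0 * np.sum(np.log(p_values))
p_combined = stats.chi2.sf(fisher_stat, df=2 * N)
print("Fisher statistic:", fisher_stat, "combined p-value:", p_combined)
```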