
Panel Data Models

Introduction
Panel data or longitudinal data refer to a cross-section repeatedly sampled over time, where the same economic agent has been followed throughout the sample period. Examples of data of this sort include, among others, firm or company data and comparative country-specific macroeconomic data over time. The common feature of these data sources is that, while the sample of individuals N is typically relatively large, the number of time periods T over which these individuals are observed is generally short. The application of estimation methods which exploit panel data sources has become increasingly prominent in both the theoretical and applied econometric literature. This popularity is in part a consequence of the increased availability of data of this type, coupled with the ability of panel data studies to answer questions that cannot be addressed either in a cross-section context or with a pure time series.

It is nevertheless true to say that, at least for more advanced applications, the statistical and econometric methods required for the estimation of panel data models tend to be complex relative to cross-sectional methods. Some difficulties arise from problems inherent in data sources with a longitudinal element. These difficulties mainly relate to attrition and non-randomness of the sample, which can generate statistical biases if not treated correctly. Attrition occurs when individuals leave the panel before the end of the panel period. For example, some organizations in a panel of firms may go out of business. In a sample of individuals observed repeatedly over time, some may choose not to participate mid-way through the sampling period; others may simply die and leave the panel. If these occurrences are random, then the only consequence is a loss of information and efficiency. If, on the other hand, those who leave the sample are systematically different from those who remain, then this causes a problem of selection: the panel loses its representativeness and becomes non-random as the period of observation proceeds. Dealing with non-random attrition may then require selection correction methods over and above the statistical methods required for dealing with longitudinal data. Given the above, it is worth mentioning the advantages of panel data methods.
Why use panel data methods?

•	Panel data usually give the researcher a large number of data points, increasing the degrees of freedom and reducing the collinearity among explanatory variables, hence improving the efficiency of econometric estimates (bearing in mind heterogeneity bias).

•	More importantly, longitudinal data allow a researcher to analyze a number of important economic questions that cannot be addressed using cross-sectional or time-series data sets. The oft-touted power of panel data derives from their theoretical ability to isolate the effects of specific actions, treatments, or more general policies.

•	Panel data provide a means of resolving, or reducing the magnitude of, an econometric problem that often arises in empirical studies, namely the often-heard assertion that the real reason one finds (or does not find) certain effects is the presence of omitted (mis-measured or unobserved) variables that are correlated with the explanatory variables. Panel data allow controlling for omitted (unobserved or mismeasured) variables.

•	Panel data involve two dimensions: a cross-sectional dimension N and a time-series dimension T. We would expect the computation of panel data estimators to be more complicated than the analysis of cross-section data alone (where T = 1) or time-series data alone (where N = 1). However, in certain cases the availability of panel data can actually simplify computation and inference.

Some Terminology
(i) Cross-section oriented panel data: The number of cross-sections (N) is more than the
time dimension (T).
(ii) Time-series oriented panel data: The time dimension (T) is greater than the cross-
sections (N).
(iii) Balanced panel data: This is panel data with no missing observations for any cross-section; every cross-section is observed in all time periods.

(iv) Unbalanced panel data: This is the case where the cross-sections do not have the same number of observations. In other words, some cross-sections do not have data for all time periods.
(v) Rotating panels: This is a case where, in order to keep the same number of economic agents in a survey, the fraction of economic agents that drops out of the sample in the second period is replaced by an equal number of similar economic agents that are freshly surveyed. This is a necessity in survey panels where the same economic agent (say, a household) may not want to be interviewed again and again.
(vi) Pseudo-panels/synthetic panels: This is panel data that approximates a genuine panel data structure. For instance, for some countries panel data may not exist. Instead, the researcher may find annual household surveys based on a large random sample of the population. In repeated cross-section surveys it may be impossible to track the same household over time, as required in a genuine panel. In pseudo-panels, cohorts are tracked (e.g. males born between 1970 and 1980). For large samples, successive surveys will generate random samples of members of each cohort. We can then estimate economic relations based on cohort means rather than individual observations.

(vii) Spatial panels: This is panel data dealing with space, for instance cross-sections of countries, regions, or states. These aggregate units are likely to exhibit cross-sectional correlation that has to be dealt with using special methods (spatial econometrics).

(viii) Limited dependent/nonlinear panel data: This is panel data where the dependent variable is not completely continuous: binary (logit/probit models), hierarchical (nested logit models), ordinal (ordered logit/probit), categorical (multinomial logit/probit), count (Poisson and negative binomial models), truncated (truncated regression), censored (tobit), or subject to sample selection (Heckit model).

Types of Panel Data Models


(a) Static panel data models vs dynamic panel data models: A static panel data model has no lagged dependent variable among the explanatory variables, whereas a dynamic model does.
(b) Stationary panel data models vs non-stationary panel data models: A stationary panel data model contains stationary variables (i.e. I(0) variables), as opposed to non-stationary (i.e. I(1)) variables.

Linear Panel Data Models
Consider the following general linear model for panel data;
$y_{it} = \alpha_{it} + x_{it}'\beta_{it} + \varepsilon_{it}$,   $i = 1, \dots, N$ and $t = 1, \dots, T$      (1)

where $y_{it}$ is the dependent variable, $x_{it}$ is a vector of independent variables, $\varepsilon_{it}$ is the disturbance term, $i$ indexes individuals (firms, countries, etc.), and $t$ indexes time.

The model in equation 1 permits both the intercept and the slope coefficients to vary over individuals and time. The model in equation 1 is, however, characterized by the following:

•	It is too general and is not estimable; there are more parameters to be estimated than observations.
•	There is a need for more restrictions on the extent to which $\alpha_{it}$ and $\beta_{it}$ can vary over $i$ and $t$, and on the behaviour of the error term.

Consider now the most restrictive model, the pooled panel data model. This model specifies constant coefficients, which do not distinguish between two different individuals or between the same individual at two different points in time. However, this is not always the case in practice.
$y_{it} = \alpha + x_{it}'\beta + \varepsilon_{it}$      (2)
Under this model the following assumptions are made:
•	The error terms are homoscedastic and serially independent both within and between individuals (cross-sections); that is,
$Var(\varepsilon_{it}) = \sigma^2$
$Cov(\varepsilon_{it}, \varepsilon_{js}) = 0$ when $i \neq j$ and/or $t \neq s$.

The marginal effects $\beta$ of the $k$-vector of time-varying characteristics $x_{it}$ are taken to be common across $i$ and $t$, although this assumption can itself be tested. If the model is correctly specified and the regressors are uncorrelated with the error term, pooled OLS will produce consistent and efficient estimates of the parameters.

This is the pooled least squares model

$\hat{\beta} = \dfrac{\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \tilde{x}_{it}\,\tilde{y}_{it}}{\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \tilde{x}_{it}\,\tilde{x}_{it}}$      (3)

where $\bar{x} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}$, $\bar{y} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} y_{it}$, $\tilde{x}_{it} = x_{it} - \bar{x}$, $\tilde{y}_{it} = y_{it} - \bar{y}$,

and $\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$.

This formulation does not distinguish between two different individuals or between the same individual at two different points in time. This feature undermines the accuracy of the approach when differences do exist between cross-sectional units. However, the larger number of observations obtained by pooling data across time improves efficiency relative to a single cross-section. This model also does not use any panel information; the data are treated as if they carried only a single index.
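As a rough numerical illustration of equation (3), the pooled estimator can be computed directly with numpy/pandas. This is only a sketch, assuming a single regressor stored in a long-format DataFrame; the column names 'y' and 'x' are hypothetical, not from the notes.

import numpy as np
import pandas as pd

def pooled_ols(df: pd.DataFrame, y_col: str = "y", x_col: str = "x"):
    """Pooled OLS for a single regressor, following equation (3):
    beta_hat = sum (x_it - x_bar)(y_it - y_bar) / sum (x_it - x_bar)^2,
    alpha_hat = y_bar - beta_hat * x_bar.
    All NT observations are stacked and treated as a single sample."""
    x = df[x_col].to_numpy(dtype=float)
    y = df[y_col].to_numpy(dtype=float)
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    beta_hat = (x_dev @ y_dev) / (x_dev @ x_dev)
    alpha_hat = y.mean() - beta_hat * x.mean()
    return alpha_hat, beta_hat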

Traditional Panel Data Model


In this case the constant term, $\alpha_i$, varies from individual to individual.

$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$      (4)

•	The assumption made here is that the errors are homoscedastic and serially independent both within and between individuals (cross-sections):
$Var(\varepsilon_{it}) = \sigma^2$
$Cov(\varepsilon_{it}, \varepsilon_{js}) = 0$ when $i \neq j$ and/or $t \neq s$.

This is referred to as individual (unobserved) heterogeneity. Also, the slopes are the same for all individuals.

Seemingly Unrelated Regression (SUR) Model

In this model the constant terms, $\alpha_i$, and the slope coefficients, $\beta_i$, vary from individual to individual.
$y_{it} = \alpha_i + x_{it}'\beta_i + \varepsilon_{it}$      (5)

In the SUR model, the error terms are assumed to be contemporaneously correlated and heteroscedastic across individuals:
$Var(\varepsilon_{it}) = \sigma_i^2$
$Cov(\varepsilon_{it}, \varepsilon_{jt}) = \sigma_{ij}$   (contemporaneous, same-period correlation)
$Cov(\varepsilon_{it}, \varepsilon_{js}) = 0$ when $t \neq s$.

Note: when we have a large number of independent individuals observed for only a few time periods (N >> T), as in cross-sectional panels, it is not possible to estimate different individual slopes ($\beta_i$) for all the exogenous variables; the panel data model is then more appropriate than the Seemingly Unrelated Regression (SUR) model. When we have a medium-length time series for relatively few individuals (T > N), the SUR model may be appropriate. Thus, efficient SUR estimation is mainly used when T ≥ N, and equation-by-equation OLS is used if K ≤ T ≤ N. The pooled model is therefore the most restrictive, while SUR is the least restrictive.

One-Way Error Component Model


Panel data models examine group (individual-specific) effects, time effects, or both. These effects are either fixed effects or random effects. A fixed effects model examines whether intercepts vary across groups or time periods, whereas a random effects model explores differences in error variances. A one-way model includes only one set of dummy variables (e.g., firm), while a two-way model considers two sets of dummy variables (e.g., firm and year).

Consider the traditional panel data model of the form

$y_{it} = \alpha + x_{it}'\beta + \varepsilon_{it}$      (6)

The error term in equation 6 is decomposed into

$\varepsilon_{it} = u_i + \nu_{it}$      (7)

where $u_i$ denotes the unobserved individual-specific effect and $\nu_{it}$ denotes the idiosyncratic errors or idiosyncratic disturbances, which change across both time and cross-section. $u_i$ is time invariant and accounts for any individual-specific effect that is not included in the regression.

Functional Forms and Notation
The parameter estimate of a dummy variable is part of the intercept in a fixed effect model and a
component of error in the random effect model. Slopes remain the same across groups or time
periods. The functional forms of one-way panel data models are as follows.

Fixed group effect model: $y_{it} = (\alpha + u_i) + x_{it}'\beta + \varepsilon_{it}$, where $\varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2)$
Random group effect model: $y_{it} = \alpha + x_{it}'\beta + (u_i + \varepsilon_{it})$, where $\varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2)$

Note that $u_i$ is a fixed or random effect and the errors are independently and identically distributed, $\varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2)$.

The notation used in this chapter includes:

$\bar{y}_{i\cdot}$ : mean of the dependent variable (DV) for group $i$.
$\bar{y}_{\cdot t}$ : mean of the dependent variable (DV) at time $t$.
$\bar{x}_{i\cdot}$ : means of the independent variables (IVs) for group $i$.
$\bar{x}_{\cdot t}$ : means of the independent variables (IVs) at time $t$.
$\bar{y}$ : overall mean of the DV.
$\bar{x}$ : overall means of the IVs.
$n$ : the number of groups or firms.
$T$ : the number of time periods.
$N = nT$ : total number of observations.
$k$ : the number of regressors excluding dummy variables.
$K = k + 1$ (including the intercept).

Panel data and Measurement Error


When surveys collect repeated observations on each individual, the resulting panel data open up possibilities that are not available in a single cross-section. In particular, the opportunity to compare the same individual under different circumstances permits using the individual as his or her own control. Recall the farm example given previously: we said that one possible explanation for the very high estimated rates is an unobservable factor, say land quality, that the estimation was not controlling for. If we have data on the output from the land in several years, we can control for this unobservable factor even though we do not observe it. Unfortunately there are some costs and dangers in doing so.
Let us see some of the implications. Say we have the following:
$Y_{it} = \beta'X_{it} + \alpha_i + \delta_t + u_{it}$
where $\delta_t$ is a time effect (say, a macro shock) that affects all individuals in the sample at time $t$, and $\alpha_i$ is the fixed effect for observation $i$. If $\alpha_i$ is correlated with the X's then we will have problems. The fixed effect is, as the qualifier suggests, fixed through time. Given that we have more than one observation on each of the sample points, we can remove $\alpha_i$ by taking differences or, when there are more than two observations, by subtracting (sweeping out) the individual means. Let us take the first difference:
$Y_{i2} - Y_{i1} = (\delta_2 - \delta_1) + \beta'(X_{i2} - X_{i1}) + (u_{i2} - u_{i1})$
$\Delta Y_i = \Delta\delta + \beta'\Delta X_i + \Delta u_i$
We can then consistently estimate the parameters, since the differenced regression no longer contains the unobserved (and correlated with the X's) fixed effect. However, this has some consequences (a short first-difference sketch follows the list below):
1- the new regression has half as many observations as the regression in levels;
2- differencing sweeps out the fixed effects as well as everything else that does not change over the period of observation;
3- differencing in the presence of measurement error may in fact complicate the estimation.
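A minimal first-difference sketch, assuming a panel held in a long-format pandas DataFrame; the column names 'id', 't', 'y' and 'x' are hypothetical. Differencing removes the fixed effect before OLS is applied, and the constant in the differenced regression picks up the change in the time effect.

import numpy as np
import pandas as pd

def first_difference_beta(df: pd.DataFrame, id_col="id", t_col="t",
                          y_col="y", x_col="x") -> float:
    """Estimate beta from  Delta Y_i = Delta delta + beta * Delta X_i + Delta u_i."""
    df = df.sort_values([id_col, t_col])
    dy = df.groupby(id_col)[y_col].diff()
    dx = df.groupby(id_col)[x_col].diff()
    keep = dy.notna() & dx.notna()              # the first period of each unit is lost
    X = np.column_stack([np.ones(keep.sum()), dx[keep].to_numpy(dtype=float)])
    coef, *_ = np.linalg.lstsq(X, dy[keep].to_numpy(dtype=float), rcond=None)
    return float(coef[1])                        # slope on the differenced regressor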

Fixed Effect Model


There are several strategies for estimating the fixed effect model. The least squares dummy
variable model (LSDV) uses dummy variables, whereas the within effect model does not. These
strategies, of course, produce identical parameter estimates of non-dummy independent variables.
The between effect model fits the model using group and/or time means of dependent and
independent variables without dummies.

The least squares dummy variable model (LSDV)

The least squares dummy variable (LSDV) regression is ordinary least squares (OLS) with dummy variables. We estimate an LSDV model by first defining a series of individual-specific dummy variables. In principle one simply estimates the OLS regression of $y_{it}$ on $x_{it}$ and a set of $N-1$ indicator variables $d_{1it}, d_{2it}, \dots, d_{N-1,it}$. The resulting estimator of $\beta$ from this model turns out to be equal to the within estimator. This is a special case of the Frisch-Waugh-Lovell theorem.

The key issue in LSDV is how to avoid perfect multicollinearity, the so-called “dummy variable trap.” LSDV has three approaches to avoid getting caught in the trap. These approaches differ from each other with respect to model estimation and the interpretation of the dummy variable parameters (Suits 1984). They produce different dummy parameter estimates, but their results are equivalent.

The first approach, LSDV1, drops one dummy variable. That is, the parameter of the eliminated dummy variable is set to zero and is used as a baseline. The variable to be dropped needs to be carefully (as opposed to arbitrarily) selected so that it can effectively play the role of the reference group. With two groups, for example:
LSDV1: $y_i = \beta_0 + \beta_1 x_i + \delta_1 d_{1i} + \varepsilon_i$, or $y_i = \beta_0 + \beta_1 x_i + \delta_2 d_{2i} + \varepsilon_i$

LSDV2 includes all dummies and, in turn, suppresses the intercept (i.e., sets the intercept to zero).
LSDV2: $y_i = \beta_1 x_i + \delta_1 d_{1i} + \delta_2 d_{2i} + \varepsilon_i$

LSDV3 includes the intercept and all dummies, and then imposes the restriction that the sum of the dummy parameters is zero.
LSDV3: $y_i = \beta_0 + \beta_1 x_i + \delta_1 d_{1i} + \delta_2 d_{2i} + \varepsilon_i$, subject to $\delta_1 + \delta_2 = 0$

Each approach has a constraint (restriction) that reduces the number of parameters to be estimated
by one and thus makes the model identified.
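A sketch of LSDV1 (dropping one dummy as the reference group), assuming a long-format pandas DataFrame with hypothetical columns 'id', 'y' and 'x'; LSDV2 and LSDV3 would only change how the dummies and the intercept are handled.

import pandas as pd
import statsmodels.api as sm

def lsdv1(df: pd.DataFrame, id_col="id", y_col="y", x_cols=("x",)):
    """LSDV1: OLS of y on the regressors plus N-1 group dummies
    (one group is dropped and serves as the reference category,
    which avoids the dummy variable trap)."""
    dummies = pd.get_dummies(df[id_col], prefix="d", drop_first=True).astype(float)
    X = sm.add_constant(pd.concat([df[list(x_cols)], dummies], axis=1))
    return sm.OLS(df[y_col].astype(float), X).fit()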

The three approaches end up fitting the same model, but the coefficients of the dummy variables in each approach have different meanings and thus are numerically different. A parameter estimate in LSDV2, $\delta_d$, is the actual intercept (y-intercept) of group $d$. It is easy to interpret substantively. The t-test examines whether $\delta_d$ is zero. In LSDV1, a dummy coefficient shows the extent to which the actual intercept of group $d$ deviates from the reference point (the parameter of the dropped dummy variable), which is the intercept of LSDV1. The null hypothesis is that the deviation from the reference group is zero. In LSDV3, a dummy coefficient measures how far its actual parameter is from the average group effect.

Therefore, the null hypothesis is that the deviation from the average is zero. In short, each approach has a different baseline and thus tests a different hypothesis, but all produce exactly the same parameter estimates of the regressors. They all fit the same model; given one fitted LSDV, in other words, we can replicate the other two.

Which approach is better than the others? You need to consider both estimation and interpretation issues carefully. In general, LSDV1 is often preferred because it is easy to estimate in statistical software packages. Oftentimes researchers want to see how far the dummy parameters deviate from the reference group rather than what the actual intercept of each group is.

LSDV is widely used because it is relatively easy to estimate and to interpret substantively. LSDV, however, becomes problematic when there are many groups or subjects in the panel data. If T is fixed and $nT \to \infty$, only the coefficients of the regressors are consistent. The coefficients of the dummy variables are not consistent, since the number of these parameters increases as $n$ increases (Baltagi 2001). This is called the incidental parameters problem. Under this circumstance, LSDV is of little use and calls for another strategy, the within effect model.

The Within Estimator


Using the “within” estimation we can still allow for individual effects, although we no longer directly estimate them. We demean the data so as to “wipe out” the incidental parameters (individual effects) and estimate $\beta$ only. This means subtracting the mean for each cross-section from each observation. Demeaning the data does not change the estimate of $\beta$.

Consider the following fixed group effect model:

$y_{it} = (\alpha + u_i) + x_{it}'\beta + \varepsilon_{it}$, where $\varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2)$

The group-mean model is given by
$\bar{y}_{i\cdot} = (\alpha + u_i) + \bar{x}_{i\cdot}'\beta + \bar{\varepsilon}_{i\cdot}$

Demeaning the model gives us
$y_{it} - \bar{y}_{i\cdot} = (\alpha - \alpha) + (u_i - u_i) + (x_{it} - \bar{x}_{i\cdot})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_{i\cdot})$
$y_{it} - \bar{y}_{i\cdot} = (x_{it} - \bar{x}_{i\cdot})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_{i\cdot})$

Using OLS on the demeaned equation yields the within estimator

$\hat{\beta}_w = \left[\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_{i\cdot})(x_{it} - \bar{x}_{i\cdot})'\right]^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_{i\cdot})(y_{it} - \bar{y}_{i\cdot})$

There are no incidental parameters and the errors still satisfy the usual assumptions.
We can therefore use OLS on the above equation to obtain consistent estimates.

The parameter estimates of the regressors in the within effect model are identical to those of LSDV. The within effect model, however, has several disadvantages.
Since this model does not report dummy coefficients, you need to compute them using the formula $d_i^* = \bar{y}_{i\cdot} - \bar{x}_{i\cdot}'\hat{\beta}$. Since no dummies are used, the within effect model has larger (apparent) degrees of freedom for the error, resulting in a smaller MSE (mean square error) and incorrect (smaller) standard errors of the parameter estimates. Thus, we need to adjust the standard errors using the formula

$se_k^* = se_k \sqrt{\dfrac{nT - k}{nT - n - k}}$
Note: the R² of the within effect model is not correct because the intercept is suppressed. Also, because dummy variables cannot be used, we can say nothing about the relationship between the dependent variable and time-invariant characteristics using this estimator.
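A sketch of the within estimator and the degrees-of-freedom correction described above, assuming a long-format pandas DataFrame; the column and function names are hypothetical.

import numpy as np
import pandas as pd

def within_estimator(df: pd.DataFrame, id_col="id", y_col="y", x_cols=("x",)):
    """Within (fixed effects) estimator: demean y and x by group means, run OLS,
    then rescale the naive standard errors by sqrt((nT - k)/(nT - n - k))
    because the demeaning uses up n degrees of freedom."""
    cols = list(x_cols)
    g = df.groupby(id_col)
    y_dm = (df[y_col] - g[y_col].transform("mean")).to_numpy(dtype=float)
    X_dm = (df[cols] - g[cols].transform("mean")).to_numpy(dtype=float)
    beta = np.linalg.solve(X_dm.T @ X_dm, X_dm.T @ y_dm)
    resid = y_dm - X_dm @ beta
    nT, n, k = len(df), df[id_col].nunique(), X_dm.shape[1]
    sigma2 = resid @ resid / (nT - k)                      # naive (incorrect) dof
    se_naive = np.sqrt(sigma2 * np.diag(np.linalg.inv(X_dm.T @ X_dm)))
    se_adjusted = se_naive * np.sqrt((nT - k) / (nT - n - k))
    return beta, se_adjusted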

The Between Estimator
The between group effect model, the so-called group mean regression, uses the group means of the dependent and independent variables. This data aggregation reduces the number of observations down to $n$. The between estimator is the OLS estimator for the regression of $\bar{y}_{i\cdot}$ on the time-averaged regressors:
$\bar{y}_{i\cdot} = \alpha + \bar{x}_{i\cdot}'\beta + \bar{\varepsilon}_{i\cdot}$

This estimator uses only the cross-sectional variation. Its major concern is the difference between individuals.

Random Effects model:


The fixed effects (FE) model is appropriate when differences between individual agents may reasonably be viewed as parametric shifts in the regression function. FE is more appropriate if the cross-section represents a broadly exhaustive sample of the population, for example if the sample of firms represents broadly complete coverage of the firms in the industry. If the sample is drawn from a larger population and may not reasonably be considered exhaustive, then it may be more appropriate to view the individual-specific terms in the sample as randomly distributed effects across the full cross-section of agents.

Random effects (RE) is more useful for a sample drawn from a large population; both the ‘individual effects’ and the usual error terms are assumed to be random. Note that the correlation between the explanatory variables and the error terms is assumed to be zero in the RE context.

The one-way random group effect model is formulated as

$y_{it} = \alpha + x_{it}'\beta + u_i + \nu_{it}$, with composite error $w_{it} = u_i + \nu_{it}$,

where $u_i \sim IID(0, \sigma_u^2)$ and $\nu_{it} \sim IID(0, \sigma_\nu^2)$.

The $u_i$ are assumed independent of the $\nu_{it}$ and of the $x_{it}$, which are also independent of each other for all $i$ and $t$. This assumption is not necessary in the fixed effect model. The components of the composite error satisfy
$Cov(w_{it}, w_{js}) = \sigma_u^2 + \sigma_\nu^2$ if $i = j$ and $s = t$ (same cross-section and same time period), and
$Cov(w_{it}, w_{js}) = \sigma_u^2$ if $i = j$ and $s \neq t$ (same cross-section, different time periods).

Thus, the $\Omega$ matrix, the variance structure of the errors for a given individual over the $T$ periods, looks like

$\Omega = \begin{pmatrix} \sigma_u^2 + \sigma_\nu^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_u^2 + \sigma_\nu^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 + \sigma_\nu^2 \end{pmatrix}$
A random effect model is estimated by generalized least squares (GLS) when the variance
structure is known, and by feasible generalized least squares (FGLS) when the variance is
unknown. Compared to fixed effect models, random effect models are relatively difficult to
estimate. In this chapter we assume panel data are balanced.

For the full panel of $NT$ observations,

$V_{(NT \times NT)} = \begin{pmatrix} \Omega & 0 & \cdots & 0 \\ 0 & \Omega & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Omega \end{pmatrix} = I_N \otimes \Omega$

where $I_N$ is an identity matrix of dimension $N$ and $\otimes$ is the Kronecker product. Let $Y$ represent a stacked vector of the $y_{it}$ formed in the same way as the error term, and let $X$ be a similarly stacked matrix; then
$Y = X\beta + u$, where $u$ stacks the composite errors $w_{it}$.
Generalized least squares requires removing the non-standard structure of the error covariance matrix. Define a weight matrix of the form
$P = V^{-1/2}$
$PY = PX\beta + Pu$
$Y^* = X^*\beta + u^*$
$E(u^*u^{*\prime}) = E(Puu'P) = P\,E(uu')\,P = PVP = I_{NT}$

The GLS estimator given $P$ is thus

$\hat{\beta}_{GLS} = (X'V^{-1}X)^{-1}X'V^{-1}Y.$

Because the composite error is serially correlated (the serial independence assumption fails), OLS is inefficient and gives incorrect standard errors and t-statistics; the GLS estimator is simply OLS applied to the transformed data. From the random effects model, one can give a specific form for the weight matrix

$P = V^{-1/2} = I_N \otimes \Omega^{-1/2}$

$V = I_N \otimes \Omega$, with $\Omega = \sigma_\nu^2 I_T + \sigma_u^2\,\iota\iota'$,

where $\iota$ represents a $T$-vector of ones. This allows us to derive the form of

$\Omega^{-1/2} = \dfrac{1}{\sigma_\nu}\left[I_T - \dfrac{\theta}{T}\,\iota\iota'\right]$, where $\theta = 1 - \dfrac{\sigma_\nu}{(\sigma_\nu^2 + T\sigma_u^2)^{1/2}}$

Thus, it is possible to get

$y_i^* = \sigma_\nu\,\Omega^{-1/2} y_i = \begin{pmatrix} y_{i1} - \theta\bar{y}_{i\cdot} \\ y_{i2} - \theta\bar{y}_{i\cdot} \\ \vdots \\ y_{iT} - \theta\bar{y}_{i\cdot} \end{pmatrix}$

Doing the same for the $x$'s and the error term gives:
$y_{it} - \theta\bar{y}_{i\cdot} = (x_{it} - \theta\bar{x}_{i\cdot})'\beta + (w_{it} - \theta\bar{w}_{i\cdot})$

Assuming the errors in this transformed equation are serially uncorrelated and homoskedastic, OLS (i.e. GLS on the original data) can be applied. Note that the RE estimator is obtained by quasi time-demeaning: rather than removing the entire time average from the explanatory and dependent variables at each $t$, random effects removes only a fraction $\theta$ of the time average. If $\hat{\theta}$ is close to unity, the RE and FE estimates tend to be close.
$\hat{\theta} = 1 - \left[\dfrac{1}{1 + T(\hat{\sigma}_u^2/\hat{\sigma}_\nu^2)}\right]^{1/2}$

o	$\hat{\theta} \to 1$ as either $T \to \infty$ or $\hat{\sigma}_u^2$ becomes large relative to $\hat{\sigma}_\nu^2$, or both. As $\theta$ approaches unity, the precision of the random effects estimator approaches that of the fixed effects estimator, and the effects of time-constant explanatory variables become harder to estimate.
o	NOTE: If we are primarily interested in the effect of a time-constant variable in a panel data study, the robustness of the FE estimator to correlation between the unobserved effects and $x_{it}$ is practically useless. Random effects may be our only choice.

Applications of the RE estimator attempt to control for the part of $\alpha_i$ correlated with $x_{it}$ by including dummy variables for various groups, if we have many observations in each group, so as to control for a certain amount of heterogeneity that might be correlated with the time-constant elements of $x_{it}$. These dummies should not be described as fixed effects (for example, call a sector dummy a sector dummy rather than a sector fixed effect).


o	Since we use $\hat{\theta}$ instead of $\theta$, heteroscedasticity in $u_i$ and $\nu_{it}$ and serial correlation in $\nu_{it}$ are possible, so we need to use robust standard errors for the t and F statistics, as for the pooled OLS regression.
o	Use the residuals ($\hat{e}_{it}$) to test for serial correlation in $e_{it} = \nu_{it} - \theta\bar{\nu}_{i\cdot}$, which are assumed to be serially uncorrelated. If we detect serial correlation, the assumption is false, and this has an effect on the t and F statistics.

There are various proposed estimation procedures, but the most common one in the literature is Swamy-Arora (the default in EViews, for instance).

The GLS estimator of the REM is a weighted average of the within and between estimators, and it converges to the within estimator as $T \to \infty$ and $\theta \to 1$:

$\hat{\beta}_{GLS} = w_1\hat{\beta}_{within} + (1 - w_1)\hat{\beta}_{between}$
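A sketch of the feasible GLS (random effects) estimator via quasi time-demeaning, assuming a balanced panel in a long-format pandas DataFrame with hypothetical column names. The variance components are backed out from the within and between regressions, which is one simple variant; as noted above, packages such as EViews typically use the Swamy-Arora estimator.

import numpy as np
import pandas as pd

def random_effects_fgls(df: pd.DataFrame, id_col="id", y_col="y", x_cols=("x",)):
    """Quasi-demeaned (FGLS) random effects estimator:
    regress y_it - theta*ybar_i on (1 - theta) and x_it - theta*xbar_i,
    with theta = 1 - sqrt(sigma_v^2 / (sigma_v^2 + T*sigma_u^2))."""
    cols = list(x_cols)
    g = df.groupby(id_col)
    n = df[id_col].nunique()
    T = len(df) // n                                   # balanced panel assumed

    # sigma_v^2 from the within (demeaned) regression residuals
    yw = (df[y_col] - g[y_col].transform("mean")).to_numpy(dtype=float)
    Xw = (df[cols] - g[cols].transform("mean")).to_numpy(dtype=float)
    bw = np.linalg.lstsq(Xw, yw, rcond=None)[0]
    sigma_v2 = np.sum((yw - Xw @ bw) ** 2) / (n * T - n - len(cols))

    # sigma_u^2 backed out from the between (group-mean) regression residuals
    means = g[[y_col] + cols].mean()
    Xb = np.column_stack([np.ones(n), means[cols].to_numpy(dtype=float)])
    yb = means[y_col].to_numpy(dtype=float)
    bb = np.linalg.lstsq(Xb, yb, rcond=None)[0]
    s_between = np.sum((yb - Xb @ bb) ** 2) / (n - Xb.shape[1])
    sigma_u2 = max(s_between - sigma_v2 / T, 0.0)

    theta = 1.0 - np.sqrt(sigma_v2 / (sigma_v2 + T * sigma_u2))

    # quasi-demean and run OLS; the coefficient on the (1 - theta) column is alpha
    y_star = (df[y_col] - theta * g[y_col].transform("mean")).to_numpy(dtype=float)
    X_star = np.column_stack([
        np.full(len(df), 1.0 - theta),
        (df[cols] - theta * g[cols].transform("mean")).to_numpy(dtype=float),
    ])
    coef = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
    return theta, coef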

Testing Group Effects (Fixed Effects versus Pooled OLS)


In a regression of the form $y_{it} = \alpha + \mu_i + x_{it}'\beta + \varepsilon_{it}$,

the null hypothesis is that all the group-dummy parameters except the one for the dropped group are zero: $H_0: \mu_1 = \dots = \mu_{n-1} = 0$. This hypothesis is tested with an F test based on the loss of goodness-of-fit. The robust (unrestricted) model in the following formula is LSDV (or the within effect model) and the efficient (restricted) model is the pooled regression.

$F = \dfrac{\left(e'e_{Efficient} - e'e_{Robust}\right)/(n-1)}{e'e_{Robust}/(nT - n - k)} = \dfrac{\left(R^2_{Robust} - R^2_{Efficient}\right)/(n-1)}{\left(1 - R^2_{Robust}\right)/(nT - n - k)} \sim F\left(n - 1,\; nT - n - k\right)$
If the null hypothesis is rejected, you may conclude that the fixed group effect model is better than
the pooled OLS model.
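A sketch of this F test, given the residual sums of squares from the pooled (efficient) and LSDV/within (robust) regressions; the function and argument names are hypothetical.

from scipy import stats

def fixed_effects_f_test(sse_pooled: float, sse_within: float, n: int, T: int, k: int):
    """F test of H0: mu_1 = ... = mu_(n-1) = 0 (pooled OLS against fixed effects).
    F = [(sse_pooled - sse_within)/(n - 1)] / [sse_within/(nT - n - k)]."""
    df1, df2 = n - 1, n * T - n - k
    F = ((sse_pooled - sse_within) / df1) / (sse_within / df2)
    return F, stats.f.sf(F, df1, df2)   # statistic and p-value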

Testing Random Effects (Pooled versus Random)

The null hypothesis is that the cross-sectional variance component is zero, $H_0: \sigma_u^2 = 0$. Breusch and Pagan (1980) developed the Lagrange multiplier (LM) test for this hypothesis (Greene 2003). It is a test of the presence of individual-specific random effects against iid errors, because the REM collapses to the pooled model if the variance of the individual effects is zero: when $\sigma_u^2 = 0$,
$\dfrac{\sigma_\nu}{(T\sigma_u^2 + \sigma_\nu^2)^{1/2}} = 1$, so that $\theta = 1 - 1 = 0$.

$H_0: \sigma_u^2 = 0$
$H_A: \sigma_u^2 \neq 0$

$LM = \dfrac{NT}{2(T-1)}\left[\dfrac{\sum_{i=1}^{N}\left(\sum_{t=1}^{T}\hat{v}_{it}\right)^2}{\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{v}_{it}^2} - 1\right]^2 \sim \chi_1^2$ under $H_0$,

where the $\hat{v}_{it}$ are the pooled OLS residuals.
Note: the Breusch-Pagan test is formally a two-sided test against $H_A: \sigma_u^2 \neq 0$, even though a variance can never be negative. The LM test for unbalanced/incomplete panels is a modification by Baltagi and Li (1990). Decision: a high $\chi_1^2$ statistic with a low p-value (less than 0.10 or 0.05) leads to rejection of the null, and the REM is preferred.
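A sketch of the Breusch-Pagan LM statistic, assuming a balanced panel and pooled OLS residuals arranged unit-by-unit as an n-by-T array; the names are hypothetical.

import numpy as np
from scipy import stats

def breusch_pagan_lm(resid, n: int, T: int):
    """LM = NT/(2(T-1)) * [ sum_i (sum_t e_it)^2 / sum_i sum_t e_it^2 - 1 ]^2,
    asymptotically chi-square(1) under H0: sigma_u^2 = 0."""
    e = np.asarray(resid, dtype=float).reshape(n, T)
    ratio = np.sum(e.sum(axis=1) ** 2) / np.sum(e ** 2)
    lm = n * T / (2.0 * (T - 1)) * (ratio - 1.0) ** 2
    return lm, stats.chi2.sf(lm, df=1)   # statistic and p-value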

Hausman Test: Fixed Effects versus Random Effects


The Hausman specification test compares the fixed versus random effects models under the null hypothesis that the individual effects are uncorrelated with the other regressors in the model (Hausman 1978). If they are correlated (H0 is rejected), the random effect model produces biased estimators, violating one of the Gauss-Markov assumptions, so a fixed effect model is preferred. Hausman's essential result is that the covariance of an efficient estimator with its difference from an inefficient estimator is zero (Greene 2003).

Two caveats: (1) strict exogeneity is maintained under both the null and the alternative; correlation between $x_{it}$ and the idiosyncratic error $\nu_{is}$ for any $s$ and $t$ causes both FE and RE to be inconsistent, and generally their plims will differ.


(2) The test is constructed by assuming that the random effects model is correct and that the estimator $\hat{\beta}_{RE}$ is efficient. It should be noted that this assumption is not itself being tested by the Hausman statistic: the Hausman statistic has no systematic power against the alternative that the first assumption is true but the second is false. Failure of the second assumption causes the usual Hausman test to have a nonstandard limiting distribution, with asymptotic size larger or smaller than the nominal size.

$H = (\hat{\beta}_{RE} - \hat{\beta}_{W})'\left[V(\hat{\beta}_{W}) - V(\hat{\beta}_{RE})\right]^{-1}(\hat{\beta}_{RE} - \hat{\beta}_{W})$

$H_0: \mathrm{plim}(\hat{\beta}_{RE} - \hat{\beta}_{W}) = 0$, i.e., the correct model is the REM; $\hat{\beta}_{RE}$ is consistent under $H_0$.
$H_A: \mathrm{plim}(\hat{\beta}_{RE} - \hat{\beta}_{W}) \neq 0$, i.e., the correct model is the FEM; $\hat{\beta}_{RE}$ is inconsistent under $H_A$.

The test statistic is asymptotically $\chi^2$ distributed under the null, with degrees of freedom equal to the number of coefficients being compared.


Decision: if the statistic is not significant, we fail to reject the null, which implies that the REM estimator is consistent (and preferred).
Note: in cases where the difference between the variance matrices is not positive definite, a Moore-Penrose generalised inverse is used.
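A sketch of the Hausman statistic computed from the RE and within (FE) coefficient vectors and their covariance matrices for the common time-varying regressors; the function and argument names are hypothetical. A Moore-Penrose pseudo-inverse is used, as mentioned above, in case the variance difference is not positive definite.

import numpy as np
from scipy import stats

def hausman_test(beta_re, beta_fe, cov_re, cov_fe):
    """H = (b_RE - b_FE)' [V(b_FE) - V(b_RE)]^{-1} (b_RE - b_FE),
    asymptotically chi-square with df = number of compared coefficients under H0."""
    diff = np.asarray(beta_re, dtype=float) - np.asarray(beta_fe, dtype=float)
    v_diff = np.asarray(cov_fe, dtype=float) - np.asarray(cov_re, dtype=float)
    stat = float(diff @ np.linalg.pinv(v_diff) @ diff)
    return stat, stats.chi2.sf(stat, df=diff.size)   # statistic and p-value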

Testing for Heteroscedasticity

•	Assuming homoscedasticity in the presence of heteroscedastic errors yields consistent but inefficient estimates, and the standard errors are biased; one should therefore compute robust standard errors.
•	The composite error variance is $\sigma_i^2 = \sigma_\mu^2 + \sigma_\nu^2$ ($\sigma_\mu^2$ can be heteroscedastic in the REM and $\sigma_\nu^2$ can also be heteroscedastic).
•	Tests: Verbon (1980) proposed an LM test of
$H_0$: homoscedasticity, $\mu_i \sim (0, \sigma_\mu^2)$ and $\nu_{it} \sim (0, \sigma_\nu^2)$
$H_A$: heteroscedasticity, $\mu_i \sim (0, \sigma_{\mu i}^2)$ and $\nu_{it} \sim (0, \sigma_{\nu it}^2)$

LM test: $LM = \dfrac{T}{2}\sum_{i=1}^{N}\left(\dfrac{\hat{\sigma}_i^2}{\hat{\sigma}^2} - 1\right)^2$, asymptotically $\chi^2$ distributed; a one-tailed and imperfect test.
Serial Correlation:
•	Estimates will be consistent but not efficient; the standard errors are biased.
•	Individual-effects-driven serial correlation: given $\varepsilon_{it} = \mu_i + \nu_{it}$, there is correlation over time due to the presence of the same individuals across the panel:

$\rho = Cor(\varepsilon_{it}, \varepsilon_{is}) = \begin{cases} 1 & \text{for } i = j,\ t = s \\ \dfrac{\sigma_\mu^2}{\sigma_\mu^2 + \sigma_\nu^2} & \text{for } i = j,\ t \neq s \end{cases}$

•	Serial correlation may also be driven by time or distance;
•	Modeling serial correlation, for example:
AR(1): $\nu_{it} = \rho\nu_{i,t-1} + \eta_{it}$;  AR(2): $\nu_{it} = \rho_1\nu_{i,t-1} + \rho_2\nu_{i,t-2} + \eta_{it}$;  or MA(1): $\nu_{it} = \eta_{it} + \lambda\eta_{i,t-1}$

The autocorrelation coefficient is calculated as

$r = \dfrac{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat{\varepsilon}_{it}\hat{\varepsilon}_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat{\varepsilon}_{it}^2}$, which, suitably standardized, is asymptotically $N(0,1)$ under the null of no serial correlation.

The Durbin-Watson statistic is also used.


If heteroscedasticity or serial correlation is suspected, the solutions are: (a) model the variance and/or correlation structure of the disturbance term and address it accordingly; or (b) use robust standard errors, for example Arellano's (1987) robust standard errors if autocorrelation or heteroscedasticity is suspected, or White's method for heteroscedasticity.

Poolability Test
What is poolability? Poolability tests whether or not the slopes are the same across groups or over time. Thus, the null hypothesis of the poolability test is $H_0: \beta_{ik} = \beta_k$ for all $i$. Remember that the slopes remain constant in fixed and random effect models; only the intercepts and error variances matter. The poolability test is undertaken under the assumption $\varepsilon \sim N(0, s^2 I_{NT})$. The test uses the F statistic

$F_{obs} = \dfrac{\left(e'e - \sum_{i} e_i'e_i\right)/\left[(n-1)K\right]}{\left(\sum_{i} e_i'e_i\right)/\left[n(T-K)\right]} \sim F\left((n-1)K,\; n(T-K)\right)$

where $e'e$ is the SSE of the pooled OLS regression and $e_i'e_i$ is the SSE of the separate OLS regression for group $i$. If the null hypothesis is rejected, the panel data are not poolable. Under this circumstance, you may turn to a random coefficient model or a hierarchical regression model.

Similarly, the null hypothesis of the poolability test over time is $H_0: \beta_{tk} = \beta_k$ for all $t$. The F test is

$F_{obs} = \dfrac{\left(e'e - \sum_{t} e_t'e_t\right)/\left[(T-1)K\right]}{\left(\sum_{t} e_t'e_t\right)/\left[T(n-K)\right]} \sim F\left((T-1)K,\; T(n-K)\right)$

where $e_t'e_t$ is the SSE of the OLS regression at time $t$.

DYNAMIC PANEL DATA MODELS
Consider the following model:
$y_{it} = \mu_i + \gamma y_{i,t-1} + x_{it}'\beta + \nu_{it}$      ... (1)
where $\mu_i \sim iid(0, \sigma_\mu^2)$ and $\nu_{it} \sim iid(0, \sigma_\nu^2)$.

In this model there are two sources of persistence over time:

•	autocorrelation due to the presence of $y_{i,t-1}$, and
•	individual effects characterizing heterogeneity among the individuals.

Problems in the above model:
A. OLS: $y_{it}$ is a function of $\mu_i$, but so is $y_{i,t-1}$, as in
$y_{i,t-1} = \mu_i + \gamma y_{i,t-2} + x_{i,t-1}'\beta + \nu_{i,t-1}$      ... (2)
This implies that $y_{i,t-1}$ and $\mu_i$ (part of the error term) are correlated; the OLS estimator is biased and inconsistent even if $\nu_{it} \sim (0, \sigma_\nu^2)$ and $cov(\nu_{it}, \nu_{js}) = 0$.

B. Fixed Effects
•	The model: $y_{it} - \bar{y}_{i\cdot} = \gamma(y_{i,t-1} - \bar{y}_{i\cdot,-1}) + (x_{it} - \bar{x}_{i\cdot})'\beta + (\nu_{it} - \bar{\nu}_{i\cdot})$

$(y_{i,t-1} - \bar{y}_{i\cdot,-1})$ and $(\nu_{it} - \bar{\nu}_{i\cdot})$ are correlated, because $y_{i,t-1}$ is correlated with $\bar{\nu}_{i\cdot}$.
FE is therefore biased; its consistency depends on T being large.

C. Random Effects
•	The model: $y_{it} - \theta\bar{y}_{i\cdot} = (1 - \theta)\mu_i + \gamma(y_{i,t-1} - \theta\bar{y}_{i\cdot,-1}) + (x_{it} - \theta\bar{x}_{i\cdot})'\beta + (\nu_{it} - \theta\bar{\nu}_{i\cdot})$

$(y_{i,t-1} - \theta\bar{y}_{i\cdot,-1})$ and $(\nu_{it} - \theta\bar{\nu}_{i\cdot})$ are correlated, because $y_{i,t-1}$ is correlated with $\bar{\nu}_{i\cdot}$.
GLS will therefore also be biased in a dynamic panel.

D. Anderson and Hsiao (1981) estimator

•	The model: remove $\mu_i$ by first-differencing to get $\Delta y_{it} = \gamma\Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta\nu_{it}$

Use $\Delta y_{i,t-2} = y_{i,t-2} - y_{i,t-3}$, or the level $y_{i,t-2}$, as an instrument for $\Delta y_{i,t-1} = y_{i,t-1} - y_{i,t-2}$.

This works because the instruments are not correlated with $\Delta\nu_{it} = \nu_{it} - \nu_{i,t-1}$, given that $\nu_{it}$ is not serially correlated.

The estimates are not efficient, however, because the estimator does not make use of all the available moment conditions.
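A sketch of the Anderson-Hsiao just-identified IV estimator for the pure AR(1) case (exogenous regressors omitted for brevity), using the level y_{i,t-2} as the instrument for the lagged difference; the DataFrame column names are hypothetical.

import numpy as np
import pandas as pd

def anderson_hsiao_gamma(df: pd.DataFrame, id_col="id", t_col="t", y_col="y") -> float:
    """Simple IV on the first-differenced equation
    Delta y_it = gamma * Delta y_{i,t-1} + Delta v_it,
    with instrument z_it = y_{i,t-2}:  gamma_hat = (z'Delta y)/(z'Delta y_{-1})."""
    df = df.sort_values([id_col, t_col]).copy()
    df["dy"] = df.groupby(id_col)[y_col].diff()
    df["dy_lag"] = df.groupby(id_col)["dy"].shift(1)
    df["z"] = df.groupby(id_col)[y_col].shift(2)       # instrument: y_{i,t-2}
    use = df.dropna(subset=["dy", "dy_lag", "z"])
    z = use["z"].to_numpy(dtype=float)
    return float((z @ use["dy"].to_numpy(dtype=float)) /
                 (z @ use["dy_lag"].to_numpy(dtype=float)))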

E. Arellano and Bond (1991) Method – Generalized Method of Moments (GMM)

•	More efficient than the estimators above.
•	Additional instruments become available if one uses the orthogonality conditions between lagged values of $y_{it}$ and $\nu_{it}$.
•	The model: given $x_{it}$, difference out $\mu_i$ to get $y_{it} - y_{i,t-1} = \gamma(y_{i,t-1} - y_{i,t-2}) + (\nu_{it} - \nu_{i,t-1})$
•	Valid instruments: for example, for $y_{iT} - y_{i,T-1} = \gamma(y_{i,T-1} - y_{i,T-2}) + (\nu_{iT} - \nu_{i,T-1})$ the instruments could be $y_{i1}, y_{i2}, \dots, y_{i,T-2}$.
•	The instruments for the first-differenced equations can be collected in a matrix:

$W_i^D = \begin{pmatrix} (y_{i1}) & 0 & \cdots & 0 \\ 0 & (y_{i1}, y_{i2}) & & \vdots \\ \vdots & & \ddots & \\ 0 & \cdots & & (y_{i1}, y_{i2}, \dots, y_{i,T-2}) \end{pmatrix}$

This is a two-step GMM estimator with a complicated formulation (Page 5, Part V). When $x$ regressors are included, in addition to $y_{i1}$ we will have $x_{ji1}$, where $j$ indexes the regressors, and corresponding to $y_{i1}, y_{i2}, \dots, y_{i,T-2}$ we will additionally have $x_{ij1}, x_{ij2}, \dots, x_{ij,T-1}$.

Arellano and Bond suggested Sargan's test of over-identifying restrictions:

$m = \Delta\hat{v}'W\left[\sum_{i} W_i'(\Delta\hat{v}_i)(\Delta\hat{v}_i)'W_i\right]^{-1}W'\Delta\hat{v} \sim \chi^2(p - k - 1)$

where $p$ is the number of columns of $W$ and $\hat{v}$ denotes the residuals from the two-step regression.

ARELLANO AND BOVER (1995) AND BLUNDELL AND BOND SYSTEM ESTIMATOR

Blundell and Bond (1998) used an extended system GMM estimator that also uses lagged differences of $y_{it}$ as instruments for the equations in levels, and the system GMM estimator is shown to have dramatic efficiency gains over the basic first-difference GMM. For small $T$, the use of the extra moment conditions improves efficiency and reduces finite-sample bias. Blundell and Bond (2000) found a more reasonable and more precise estimate of the coefficient on capital, as compared to the standard first-differenced GMM, for US manufacturing over 1982-89.
•	Valid instruments in the case of system GMM, assuming no predetermined variables for simplicity, could be shown as: for $y_{iT} - y_{i,T-1} = \gamma(y_{i,T-1} - y_{i,T-2}) + (\nu_{iT} - \nu_{i,T-1})$, the instruments could be $y_{i1}, y_{i2}, \dots, y_{i,T-2}$ together with $\Delta y_{i2}, \Delta y_{i3}, \dots, \Delta y_{i,T-1}$ (the latter for the levels equations).
•	Make sure that the first-order and second-order autocorrelation tests AR(1) and AR(2) are conducted;
•	The Sargan test of over-identification is conducted assuming homoscedasticity, not on robust standard errors;
•	Check the effect of treating the predetermined variables as endogenous versus exogenous;
•	Check whether or not the coefficients of the variables are statistically (jointly) zero using a Wald test;
•	Make sure that the two-step GMM results are used for specification testing rather than for inference, because two-step standard errors tend to be biased downwards in small samples;
•	Hausman test, predictions of errors, test, estimates (R).

NONSTATIONARY PANEL DATA MODELS

Problems of non-stationarity: low power and nonstandard limiting distributions of time-series (unit root and cointegration) tests, as well as spurious regression (t-statistics diverge in a misspecified regression of I(1) variables).

The problems associated with the use of panel data in this context:

1. The assumption of homogeneity of the regression parameters, which is implicit in the use of a pooled regression model, may not hold. Hence, we may allow for heterogeneity of the parameters by estimating a model, say, for each country.
2. Non-stationarity, spurious regression and cointegration are serious issues in the time-series dimension. Time-series fully modified estimation techniques that account for endogeneity of the regressors (e.g. VAR models) and for correlation and heteroskedasticity of the residuals can now be combined with FE and RE panel estimation methods.

In non-stationary panels:
•	Test statistics and estimators have normal limiting distributions, unlike the pure time-series case;
•	In a panel data context it is possible to avoid spurious regression. As $T$ and $N$ go to infinity, spurious regression estimates give a consistent estimate of the true value of the parameter, because the panel estimator averages across individuals and the information in the independent cross-sections leads to a stronger overall signal than a pure time series;
•	However, letting $N \to \infty$ and $T \to \infty$ introduces some other problems into the asymptotic analysis (Phillips and Moon, 2000).
•	Two issues:
i. Panel data tests can be the wrong answer to the low power of unit root tests in single time series.
ii. Hence, panel unit root tests do not rescue PPP and cannot settle the question of growth convergence among countries.

Tests of non-stationarity: unit root tests are notorious for low power and non-standard limiting distributions.
•	All panel unit root tests begin with the following specification:

$y_{it} = \rho_i y_{i,t-1} + \delta_i'x_{it} + \varepsilon_{it}$

where $x_{it}$ represents the exogenous variables in the model.

•	If $|\rho_i| < 1$, $y_{it}$ is weakly (trend-) stationary and thus the data are stationary.
•	If $\rho_i = 1$, then $y_{it}$ contains a unit root.
•	Then:
$y_{it} - y_{i,t-1} = \rho_i y_{i,t-1} - y_{i,t-1} + \delta_i'x_{it} + \varepsilon_{it}$, with $\alpha_i = (\rho_i - 1)$

ADF-type model: $\Delta y_{it} = \alpha_i y_{i,t-1} + \delta_i'x_{it} + \sum_{j=1}^{p_i}\beta_{ij}\Delta y_{i,t-j} + \varepsilon_{it}$

1. Levin, Lin and Chu (LLC), Breitung, and Hadri assume for these tests that the persistence parameters are common across the cross-section units:
$\rho_i = \rho$   (common unit root process).

•	LLC and Breitung consider the following basic ADF specification:

$\Delta y_{it} = \alpha y_{i,t-1} + \delta'x_{it} + \sum_{j=1}^{p_i}\beta_{ij}\Delta y_{i,t-j} + \varepsilon_{it}$,

where the lag order $p_i$ for the difference terms is allowed to vary across cross-sections;

$H_0: \alpha = 0$ (there is a unit root).
$H_A: \alpha < 0$ (there is no unit root).

•	LLC: (1) attempts to derive estimates of $\alpha$ from proxies for $\Delta y_{it}$ and $y_{i,t-1}$ that are standardized and free of autocorrelation and deterministic components; (2) requires specification of the number of lags used in each cross-section ADF regression, $p_i$, as well as kernel choices.
•	Breitung: (1) only the autoregressive component (and not the exogenous components) is removed when constructing the standardized proxies; (2) the proxies are transformed and detrended; (3) requires only a specification of the number of lags used in each cross-section ADF regression and of the exogenous regressors; (4) no kernel computations are needed.
•	Hadri: (1) similar to the KPSS unit root test; (2) the null hypothesis is that there is no unit root (stationarity); (3) based on the residuals from individual OLS regressions of $y_{it}$ on a constant, or on a constant and a trend; (4) requires a specification of which of these OLS regressions to use.

2. Im, Pesaran and Shin (IPS), as well as the Fisher-ADF and Fisher-PP tests, allow $\rho_i$ to vary across cross-sections. An ADF regression is estimated for each cross-section:

$\Delta y_{it} = \alpha_i y_{i,t-1} + \delta_i'x_{it} + \sum_{j=1}^{p_i}\beta_{ij}\Delta y_{i,t-j} + \varepsilon_{it}$

$H_0: \alpha_i = 0$ for all $i$ (there is a unit root).
$H_A: \alpha_i < 0$ for at least some $i$ (some series are stationary).

•	IPS: (1) estimate separate ADF regressions; (2) average the t-statistics on the $\alpha_i$ from the individual ADF regressions, $t_{iT_i}(p_i)$.
•	The average is

$\bar{t}_{NT} = \dfrac{\sum_{i=1}^{N} t_{iT_i}(p_i)}{N}$

•	After standardization, $\bar{t}_{NT}$ has an asymptotic standard normal distribution:

$W_{\bar{t}_{NT}} = \dfrac{\sqrt{N}\left[\bar{t}_{NT} - N^{-1}\sum_{i=1}^{N}E\left(t_{iT}(p_i)\right)\right]}{\sqrt{N^{-1}\sum_{i=1}^{N}\mathrm{Var}\left(t_{iT}(p_i)\right)}} \to N(0,1)$

IPS provide the expected means and variances of the ADF t-statistics for various values of $T$ and $p$.
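A sketch of the IPS t-bar computation using the ADF test from statsmodels on each cross-section; only the raw average of the individual ADF t-statistics is computed here, since the standardisation into W_tbar requires the expected means and variances tabulated by IPS, which are not reproduced in these notes. The function name and input format are hypothetical.

import numpy as np
from statsmodels.tsa.stattools import adfuller

def ips_t_bar(series_by_unit, maxlag=None) -> float:
    """Average of the individual ADF t-statistics (intercept-only ADF regressions).
    series_by_unit: mapping {unit: 1-D array-like of that unit's time series}."""
    t_stats = [adfuller(np.asarray(s, dtype=float), maxlag=maxlag, regression="c")[0]
               for s in series_by_unit.values()]
    return float(np.mean(t_stats))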

COINTEGRATION
•	Panel cointegration tests are motivated by the need for more powerful tests;
•	Pure time-series cointegration tests have low power, especially for short $T$.

Types of tests:
•	Residual-based tests:
o DF and ADF tests (Kao 1999)
o LM test (McCoskey and Kao, 1998)
•	Likelihood-based tests:
o Combined individual tests.
Pedroni (Engle-Granger Based) Panel Cointegration Test
•	The EG (1987) cointegration test is based on the residuals of a spurious regression of I(1) variables.
•	If the I(1) variables are cointegrated, the residuals should be I(0); otherwise they are I(1).
•	Pedroni (1999, 2004) and Kao (1999) extended EG to panel data and suggested several tests, allowing heterogeneous intercepts and trend coefficients across cross-sections:

$y_{it} = \alpha_i + \delta_i t + \beta_{1i}x_{1it} + \dots + \beta_{Mi}x_{Mit} + e_{it}$,
$t = 1, 2, \dots, T$;  $i = 1, 2, \dots, N$;  $m = 1, 2, \dots, M$.

•	$y$ and the $x$'s are assumed to be I(1). $\alpha_i$ and $\delta_i$ are individual and trend effects, which could be set to zero.
•	$H_0$: no cointegration; the residuals are I(1).

Steps
o Obtain the residuals from the above regression;
o Run an auxiliary ADF-type regression of the form

$\hat{e}_{i,t} = \rho_i\hat{e}_{i,t-1} + \sum_{j=1}^{p_i}\psi_{ij}\Delta\hat{e}_{i,t-j} + v_{i,t}$

•	The Pedroni panel cointegration statistics are constructed from this auxiliary regression and, suitably standardized, are asymptotically $N(0,1)$.

Kao (1999) EG-Based Panel Cointegration Test

Assume common coefficients on the first-stage regressors and individual-specific intercepts. In the bivariate case:
$y_{it} = \alpha_i + \beta x_{it} + e_{it}$, with $y_{it} = y_{i,t-1} + u_{it}$ and $x_{it} = x_{i,t-1} + \epsilon_{it}$

Steps: (1) obtain the residuals from the above regression; (2) run the following auxiliary regression:
$\hat{e}_{it} = \rho\hat{e}_{i,t-1} + v_{it}$

$H_0: \rho = 1$, no cointegration.

t-value:

$t_\rho = \dfrac{(\hat{\rho} - 1)\sqrt{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat{e}_{i,t-1}^2}}{s_e}$

Kao presents several statistics, $DF_\rho$, $DF_t$, $DF_\rho^*$, $DF_t^*$ and ADF, which are asymptotically distributed as standard normal.

Combined Individual Tests (Fisher/Johansen)

o Maddala and Wu (1999) suggested, following Fisher (1932), the following.
o If $\pi_i$ is the p-value from the individual cointegration test for cross-section $i$, then under the null of no cointegration

$-2\sum_{i=1}^{N}\log(\pi_i) \to \chi^2(2N).$
