Panel time-series models

A practitioner's guide using Stata


Prof. Geoff Pugh (Feb.2015)

26/02/2025 1
Terminology and limits
• Methods
1. Panel unit root tests
2. Cointegration tests
3. Empirical estimators
• Development
– 1st generation methods
• Assumed panel members (groups) to be
cross-sectionally independent
– 2nd generation
• Addressed correlation across panel members
Example
• Many examples use data from Eberhardt (2012)
• xtmg to download
– Either: Command Window: findit xtmg
– Or: Help – Search – xtmg
• Command Window, type:
– use manu_prod
– xtset nwcode year
• Testing a simple theory of labour productivity
– Constant returns Cobb-Douglas production function
• $Y = A K^{\alpha_i} L^{1-\alpha_i}$
• Y – value added (deflated)
• K – capital stock (deflated)
• L – labour force
• A - captures Total Factor Productivity (TFP)
– Systematic unobservable influences (technology parameter)
– Can be rewritten in per capita terms and log-linearised for estimation
• where y=Y/L and k=K/L (i.e. per worker)
• $ly_{it} = A_{it} + \alpha_i \, lk_{it} + \varepsilon_{it}$
– $\varepsilon_{it}$ = error term (idiosyncratic unobservable influences)
– $\alpha_i$ = elasticity of labour productivity with respect to capital per worker (to be estimated)
• Heterogeneous across countries
Methods I: Panel unit root tests
• Start with plots of each data series
– xtline
• xtline ly
• xtline lk
– Most series display a nonconstant mean
• Most series certainly not “textbook” stationary!
• Panel unit root tests
– Work best with “large” T and at least “moderate” N

Panel Unit Root testing (PURT) in Stata
• Help - search – multipurt
• Using existing xtfisher & pescadf commands
– If not installed
• help multipurt
– Click on xtfisher
» help xtfisher
– Click on pescadf
» help pescadf

• Also uses data handling practices from xtwest


– If not installed
• help multipurt
– Click on xtwest
» help xtwest

• Panel unit root tests for


– Multiple variables
• Maximum of 9
• Reflects suitability of panel TS modelling for small models
(i.e. limited number of variables)
– Not a limitation of “Micro panel” estimators (difference and system GMM)
– Multiple lags
– Unbalanced panels
– Panels with discontinuous time-series
• Two panel unit root tests building on Dickey-Fuller & Augmented DF regressions
– For models with and without a trend term
User-written programme: Multipurt
• 1st generation test: Maddala and Wu (1999)
– Allows for heterogeneity in the autoregressive coefficient
of the Dickey-Fuller regression
– Ignores cross-section dependence in the data
• 1st generation characteristic
– Null: nonstationarity in all panel members/series
• Alternative: at least one series in the panel is stationary
– If implemented by xtfisher
• drift indicates that the process under the null hypothesis is a random walk with nonzero drift
– May not be used with pp or trend
– pp indicates that the Phillips-Perron test is used rather than the ADF
• trend specifies that a trend term be included in the associated regression
– May not be used with the drift option

• 2nd generation test: Pesaran (2007) CIPS test


– Allows for heterogeneity in the autoregressive coefficient
of the Dickey-Fuller regression
– Allows for cross-section dependence in the data
• 2nd generation characteristic
– Test statistic is constructed from the results of panel-member-specific (A)DF regressions
• Averaging of the group-specific results follows Im, Pesaran and Shin (2003)
– Null: Nonstationarity (i.e. all panels contain a UR)
• Under H0, the test statistic has a non-standard distribution
– Consistent under HA that only a fraction of the series are stationary
– Default test: includes a constant
Implementing PURTs
• All tests: required: lags(numlist)
– Identifies the maximum number of lagged differences to be included
in the group-specific ADF regressions
• To control for serial correlation in the errors
• Command window: multipurt ly lk, lags(2)
– ly
• Conclusion depends on with or without trend
– lk
• Preponderance of results suggest non-rejection of H0
• For robustness checks
– Try xtfisher options
• xtfisher ly, lag(2) trend display
– Display: presents the unit root tests for each individual group
» 4 non-rejections at the one per cent level
» 8 non-rejections at the five per cent level
• Group-level series overwhelmingly non-stationary
• Conclusion: Non-stationary series
– Real economy variables, hence most likely to be I(1)
– Check: multipurt D_ly D_lk, lags(1)
• One lag sufficient with differenced data
• Uniform results: p=0.000
Additional panel-data unit-root tests
• Stata 13: xtunitroot
– Wide range of tests
– Most require a balanced panel
• 2 work also with unbalanced panels
– Variants of the two multipurt tests
• e.g. IPS
– xtunitroot ips ly, lags(aic 2) demean
– xtunitroot ips ly, lags(aic 2) trend demean
• Demean: mitigates the impact of cross-sectional dependence
– for each time period xtunitroot computes the mean of the series across panels
and subtracts this mean from the series
• aic: the number of lags of the series is chosen such that the AIC for the regression is minimized
– Akaike Information Criterion: a measure of fit, not of statistical validity
» Statistical validity requires “white noise” residuals
» Not guaranteed by “best fit”
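The effect of the demean option can be illustrated on simulated data. The sketch below is hypothetical throughout: the variable names (id, year, f, e, y, ybar, y_dm) and the data-generating process are assumptions, not part of the lecture data. It builds a balanced panel of random walks that share a common shock, removes the period-specific cross-section mean by hand, and then runs the IPS test with demean for comparison.

```stata
* Hypothetical balanced panel: 10 groups x 40 years
clear
set seed 12345
set obs 400
gen id = ceil(_n/40)
bysort id: gen year = 1970 + _n

* Common shock f: the same value for every group in a given year
gen double f = rnormal() if id == 1
bysort year (id): replace f = f[1]

* Each series is a random walk driven by the common and an idiosyncratic shock
gen double e = f + rnormal()
bysort id (year): gen double y = sum(e)

* Manual demeaning: subtract the cross-section mean of y in each period
bysort year: egen double ybar = mean(y)
gen double y_dm = y - ybar

* IPS test; the demean option performs the same subtraction internally
xtset id year
xtunitroot ips y, lags(aic 2) demean
```

By construction, y_dm averages to zero across groups in every year, which is the sense in which demeaning strips out a shock that hits all panel members equally.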
Methods II: Cointegration testing
• Conceptual concerns in testing for cointegration in
panels
– How much heterogeneity do we allow across
groups/countries?
– How do we combine the statistics we arrive at if we
opted for heterogeneous tests?
• Major approaches
1. Residual-based tests
• Run some regression, collect residuals
and test for stationarity
2. Error correction tests
• Construct an error correction model and investigate whether
the EC term is significant
1st generation tests
(Pedroni, 1999, 2000 & 2004)
• Introduces flexibility/heterogeneity in terms of
– Cointegrating vector
– Dynamics
• Output 1: Seven tests for cointegration
– Null of no cointegration
– Assume cross-section independence
• Residual tests in the Engle-Granger tradition
• Two groups of statistics:
– ‘group-mean’, allowing for heterogeneity
1. Fit each panel separately
2. Average the resulting test statistics
– ‘panel’ (pooled), imposing a common parameter
1. Pool the data before fitting the model
2. Compute the test statistics from the pooled regression results
Estimate of the long-run cointegrating
vector (equilibrium relationship)
• Output 2: the cointegrating vector
– Implements Pedroni's group-mean panel-dynamic ordinary least-squares (PDOLS) model
• Extends the dynamic ordinary least-squares (DOLS) technique of estimating the cointegrating vector in a single equation to panel time-series data
– Medium to large N, large T
• DOLS involves adding lags and leads of the regressors to eliminate
feedback effects and endogeneity.
• PDOLS
– A DOLS regression is conducted for each individual
– The results are combined for the entire panel following Pedroni's
group-mean approach.
– Variables must cointegrate for PDOLS
Implementation
• Help – Search – xtpedroni
• Command window:
– help xtpedroni
– use xtwestdata
• Analysing influence of log per capita GDP (loggdp)
on log per capita health expenditure (loghex)
– N = 20 (OECD countries)
– T = 32 (Annual data 1970-2001)
– xtset ctr year
• Panel defined by Country; Year
• Examples of syntax
– xtpedroni loghex loggdp, trend lagselect(aic)
– xtpedroni loghex loggdp, trend lagselect(aic) extraobs notdum
• notdum suppresses default common time dummies
– May be appropriate when averaging over the N dimension may destroy the cointegrating relationship
Interpretation
• Each of the statistics is distributed as standard normal when both the time series and the
cross-sectional dimensions of the panel grow large
• “Thus, to test the null of no cointegration, one simply computes the value … and
compares these to the appropriate tails of the normal distribution” (Pedroni, 1999,
pp.666-68)
– “For the panel variance statistic, large positive values imply that
the null of no cointegration is rejected.”
– “For each of the other six test statistics … the left tail of the normal distribution is used to reject
the null hypothesis … large negative values imply that the null of no cointegration is rejected.”
– Baltagi, 2013, p.296: “Rejection of the null hypothesis means that enough of the individual cross-sections have statistics ‘far away’ from the means predicted by theory were they to be generated under the null.”
• A relevant critical value: ±1.96
– Critical value for a one-sided test at the 2.5 per cent significance level
– Statistic beyond 1.96 in absolute value, in the appropriate tail ⇒ reject the null of no cointegration
• Different results according to inclusion/non-inclusion of period dummies
– Non-inclusion makes cross-group error correlation more likely
– But trend should be sufficient to pick up unmodelled growth processes
• If the null of no cointegration is rejected, then strong evidence of a long-run equilibrium relationship
– Without period dummies, health expenditure a “luxury good”
• Strong fiscal implications for growing economies
2nd generation tests
• ECM approach
– Check whether an ECM does/does not have
error correction
• individual group or full panel

• a0i is the error correction/speed of adjustment term.


• Penultimate term includes lags and leads of x
– Otherwise need to assume exogeneity of x
• Estimate separately for each panel (group)
• If a0i = 0 ⇒ no error correction ⇒ y, x not cointegrated
• If a0i < 0 ⇒ error correction (negative feedback) ⇒ cointegration
• In total 4 tests, based on
– ‘group mean’
– ‘pooled panel’
• Large negative values reject H0 of no cointegration.
• If cross-group correlation of errors is suspected: use bootstrap to obtain robust SEs
• Long T is important
– Often strong contrast (homogeneous/heterogeneous) unless T is large
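The error-correction equation behind these tests can be written out as follows. This is a sketch in assumed notation: only $a_{0i}$ is taken from the slide, while $c_i$, $b_i$, $a_{1ij}$, $a_{2ij}$, $p$ and $q$ are labels chosen here for illustration.

```latex
\begin{equation*}
\Delta y_{it} = c_i
  + a_{0i}\,\bigl(y_{i,t-1} - b_i\,x_{i,t-1}\bigr)
  + \sum_{j=1}^{p} a_{1ij}\,\Delta y_{i,t-j}
  + \sum_{j=-q}^{p} a_{2ij}\,\Delta x_{i,t-j}
  + u_{it}
\end{equation*}
```

The term in parentheses is last period's deviation from the long-run relationship, so $a_{0i}$ measures how quickly that deviation is corrected. The penultimate sum runs from $-q$ to $p$, which is how leads as well as lags of $\Delta x$ enter the regression.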
Implementation
• Help – Search – xtwest
– Command window:
• help xtwest
• Command window
– use xtwestdata
• xtset ctr year
– xtwest loghex loggdp, lags(1 3) leads(0 3) lrwindow(3) constant trend westerlund
• To allow for cross-group error correlation
– Bootstrap robust SEs for the test statistics
– xtwest loghex loggdp, lags(1 3) leads(0 3) lrwindow(3) constant trend westerlund bootstrap(50)
Testing for cross-section dependence
• The xtcd command implements the Pesaran (2004) CD test for cross-section independence in macro panel data.
– Not a post-estimation command
– Can apply xtcd to test ‘raw’ variables pre-estimation
• Employs the correlation-coefficients
between the time-series for each panel member
• Allows for
– Analysis of cross-section dependence in any variable and residual series
– Computation of the averaged correlation and absolute correlation coefficients for
up to nine series at a time
– Both balanced and unbalanced panels
• But not discontinuous data (i.e. time series with “gaps”)
• Null hypothesis: cross-section independence
• Robust to
– Nonstationarity (the spuriousness would show up in the averaging)
– Parameter heterogeneity or structural breaks
• Performs well even in small samples
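For reference, the CD statistic for a balanced panel can be written as below, where $\hat{\rho}_{ij}$ denotes the sample correlation between the series (or residuals) of panel members $i$ and $j$. The symbols are chosen here for illustration; the unbalanced-panel version, which weights each pair by its number of common observations, is not shown.

```latex
\begin{equation*}
CD = \sqrt{\frac{2T}{N(N-1)}}\;
     \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} \hat{\rho}_{ij}
\;\;\xrightarrow{d}\;\; N(0,1)
\quad \text{under the null of cross-section independence}
\end{equation*}
```

Because the statistic sums the pairwise correlations rather than their absolute values, positive and negative correlations can offset each other, which is why the averaged absolute correlation is reported alongside it.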
xtcd: Implementation
• Help – Search – xtcd
– Command window: help xtcd
• Command window
– xtcd loghex loggdp
• Compare
– In this case, insufficient common observations
– Panel too small
– Panel too unbalanced
• What now?
– Assume that cross-section dependence exists
• Implication
– Either: Use estimators that address the problem
– Or: estimate with period dummies to attenuate the problem
Methods III: Estimators
1. Bias-Corrected Least Squares Dummy Variable (LSDV)
– Stata user-written programme: xtlsdvc
• Bruno (2005) SJ 5(4) 473-500
2. Panel-dynamic Ordinary Least-squares (PDOLS)
– Stata user-written programme: xtpedroni
• Neal (2014) SJ 14(3)
3. Mean Group (MG)
4. Pooled Mean Group (PMG)
– Stata user-written programme: xtpmg
• Blackburne and Frank (2007) SJ 7(2) 197-208
5. Common Correlated Effects Mean Group (CCEMG)
6. Augmented Mean Group (AMG)
– Stata user-written programme: xtmg
• Eberhardt (2012) SJ 12(1) 61-71
Considerations for the choice of
estimator
• A challenge!
– Assumptions under which estimators deliver an optimal solution almost
never realised in practice
– Estimators typically do not satisfy all considerations
• Rarely a definitive econometric method
– Need to be aware of the trade offs
– Importance of robustness checking
• Otherwise choice of technique can introduce selection bias into estimation
• Review estimators available in Stata, taking into account
1. Panel dimensions
2. Dynamics
3. Parameter heterogeneity
4. Potential endogeneity
5. Cross-group correlation
Consideration 1: Dimensions T &N: Where
are panel TS techniques appropriate?
• Panel data where both cross-section (N) and time series (T)
dimensions are at least “moderately large”
– Say, T>25 and N about the same
– Often T>N
– T large enough for separate estimation for each group
• e.g. datasets from international organisations (typically annual data back to 1950)
– Blackburne and Frank (2007): N=24; T= 34
– Eberhardt (2012): N=48; T=33 (unbalanced)
• e.g. Penn World Table
– Referred to as macro panels; “data fields”
• Default position should not be micro-panel estimators
• Different asymptotics
– Difference- and System GMM
• Pooled estimators allowing only the intercepts to differ across groups
• Do not explicitly address parameter heterogeneity
– But interaction terms can be used
– TS panel estimators allow for parameter heterogeneity
• Assumption of parameter homogeneity inappropriate for large N & large T panels
A note on Seemingly Unrelated
Regression (SUR)
• Separate regressions
– e.g. same model for different countries (but can include equation-specific observables)
• Allows for common unobservable factors that affect all countries at the
same time
– Contemporaneous error correlation provides additional information to improve the
efficiency of estimates
• Advantages
– SUR generally more efficient than
OLS estimation of separate equations
– Allows for complete panel heterogeneity
• Disadvantages
– Long and narrow datasets
• Small N method: N<10 (Cameron and Trivedi, 2009)
• T relatively large: T/N > 3 (at least) (Beck and Katz, 1995)
– Time-series properties of the data not the focus
• Dynamic SUR models problematic (Greene, 2012 – 7th Ed. - pp.344-45)
Indicative dimensions of T and N
for dynamic panel estimation
• Cross-section dimension: Small (N<10); Moderate (10≤N≤25); Large (N>25-30)
• Small T (T<10)
– Small N: insufficient data for dynamic analysis
– Moderate N: Group Mean; Bias-Corrected LSDV
– Large N: micro estimators (but (?) applied to macro panels): Difference GMM; System GMM
• Moderate T (10≤T≤25)
– Group Mean; unobserved components model (CFRs); Bias-Corrected LSDV
Consideration 2: Dynamics
• In economics, history matters!
– Economic variables are not independently realised
in every period
• Instead: expectations; anticipations; persistence; adjustment; path
dependency; hysteresis effects;
and so on
• In macro and in microeconomics
• In econometrics, non-stationarity matters
– Problem of spurious regression unless non-stationary variables cointegrate
– Requires different estimators
• Dynamics typically introduced into models via the lagged dependent variable
– Autoregressive models
Introducing history via the lagged
dependent variable
• Starting with the simplified specification in Equation 2,
repeatedly substitute for the lagged dependent variable.
• Substitute for the once-lagged dependent variable in (2):

• Substitute (3) into (2)

• Substitute for the twice-lagged dependent variable in (4):

• Substitute (5) into (4)

• Gather terms

• … and so on.
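The substitution steps can be written out as follows. The notation is assumed for illustration, since the slide equations are not reproduced here: $\beta$ is the coefficient on $x$, $\lambda$ the coefficient on the lagged dependent variable, and $u$ the error term.

```latex
\begin{align*}
y_t     &= \beta x_t + \lambda y_{t-1} + u_t \tag{2}\\
y_{t-1} &= \beta x_{t-1} + \lambda y_{t-2} + u_{t-1} \tag{3}\\
y_t     &= \beta x_t + \lambda\bigl(\beta x_{t-1} + \lambda y_{t-2} + u_{t-1}\bigr) + u_t \tag{4}\\
y_{t-2} &= \beta x_{t-2} + \lambda y_{t-3} + u_{t-2} \tag{5}\\
y_t     &= \beta x_t + \lambda\beta x_{t-1}
           + \lambda^2\bigl(\beta x_{t-2} + \lambda y_{t-3} + u_{t-2}\bigr)
           + \lambda u_{t-1} + u_t \tag{6}\\
y_t     &= \beta\bigl(x_t + \lambda x_{t-1} + \lambda^2 x_{t-2}\bigr)
           + \bigl(u_t + \lambda u_{t-1} + \lambda^2 u_{t-2}\bigr)
           + \lambda^3 y_{t-3} \tag{6$'$}
\end{align*}
```

With $|\lambda| < 1$ the weights $\lambda, \lambda^2, \lambda^3, \dots$ shrink geometrically, and continuing the substitution indefinitely gives a cumulated effect of a permanent unit change in $x$ equal to $\beta / (1 - \lambda)$.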
Lagged dependent variable introduces the entire
history of the independent variables
• Dynamic specifications
– Repeated substitution demonstrates that the lagged dependent variable introduces the
entire history of the independent variable(s)
• Equation 6’, current y influenced
– not only by current x
– but also by the cumulated effects from x one period back and two periods back … and so on
• Unobserved influences (shocks) u also accumulate
• Persistence effects attenuate the more remote the period
– Shown by the increasing exponent on the coefficient of the lagged dependent variable
• Substitutions demonstrate that a dynamic specification includes
the whole history of both the observed and the unobserved influences
on the current value of the dependent variable
• By taking this history into account, we are able to identify the additional short-run,
contemporaneous effects of x on y
– Informative about the process of adjustment of y to x (short-run dynamics)
• Specifying and estimating a static model in the presence of dynamics
– Biased and inconsistent estimates
• Long-run equilibrium effects and short-run deviations not separately identified
• Hence the need for dynamic estimators
Requirements of a dynamic estimator

• Fully exploit the information content of time series
• From the Granger Representation Theorem
1. The long-run equilibrium relationship between
variables y and x
2. Adjustment towards long-run equilibrium
• Plus
3. Short-run impact of current changes in x on y

Consideration 3: Parameter heterogeneity
• Pesaran and Smith (1995 & 1996)
– Estimating LR relationships from dynamic heterogeneous
panels
Main conclusions:
• Random coefficients models do not extend to dynamic panels
– Pooling dynamic heterogeneous panels can give misleading results
⇒ Test for common slope coefficients
• usually rejected
• Cross-section estimates give average LR effects
– The group mean regression
• Coefficients robust to dynamic misspecification of the underlying time series
Slope heterogeneity in models with panel
data: consequences
• Static models
– Each approach
• Unbiased and consistent estimators of the average effect
– Nothing to choose between the methods
• Dynamic models
– Lagged dependent variable
– Not all 4 approaches valid
Initial four approaches to estimating LR
effects with heterogeneous panels
1. Mean Group estimator
– Separate regressions for each time series, i = 1, 2, …, N
– Take the mean: $\bar{\theta} = \frac{1}{N}\sum_{i=1}^{N} \theta_i$
• Average effect of some exogenous variable on the endogenous variable
2. Effects models: FE and RE
– Restriction: $\theta_i = \theta$
3. Aggregate time series estimator
– In each period, aggregate data over units
• Pooled across groups
– One time series
• Standard time series analysis
4. Aggregate cross section estimator
– In each group, aggregate data over time
– One cross section of group means
• Standard cross-section analysis
⇒ average LR effects
Verdict on initial four approaches
1. Separate time series regressions
– OK with large T
• Normal time series analysis
– Results for individual panel members (groups) unreliable and can be difficult to interpret
• Unless T is large
– But panel averages establish a reliable mean estimate (Eberhardt, 2013, p.442)
• Basis of Mean Group approach
2. FE and RE
– Estimates biased and inconsistent
3. Aggregation over countries
– Estimates inconsistent
4. Aggregation over time
– OK: estimates consistent
• With large T (>25)
– Maybe 15 or less?
• If T small, report that results “valid approximately”
– Be cautious about conclusions!
For large N and large T
– Estimators from 1 and 4 both give the average LR effect of x on y
Estimating time-series panel models in the presence
of slope heterogeneity: 2 basic approaches

1. Mean Group estimation


– Basis of modern time-series panel estimators
• PDOLS
• MG and PMG
• CCEMG and AMG
2. Group Mean estimation
– Aggregate cross section
• Model formulated in terms of LR coefficients
– No SR dynamics

Implementing the Group-Mean estimator
• Heterogeneous dynamic model
$y_{it} = \beta_i x_{it} + \lambda_i y_{i,t-1} + \varepsilon_{it}$ (2.1)
• From Pesaran and Smith (1995), p.87
• Aggregate over time
$\bar{y}_i = \beta \bar{x}_i + \lambda \bar{y}_{i,-1} + \bar{v}_i$ (2.17)
– Where bar = group mean
• Averaged over time t = 1, …, T
– $\bar{v}_i$ is a complex error term
• Containing terms from the random coefficients
• For large T (>25)
– Mean of lagged dependent variable can be omitted
– Hence, estimate: $\bar{y}_i = \theta \bar{x}_i + \bar{v}_i$
• Valid asymptotically
• For a finite sample, estimators biased
– Even with large N
• However, in empirical investigation “the estimates seem quite robust to variations in the number of years used to form averages”
– Pesaran and Smith (1995), p.101
• Supports conventional wisdom!
– Time series estimates ⇒ short-run effects
Why use the Group Mean
cross-section estimator?
• Statistical
– To overcome problems arising from dynamic
heterogeneous panels
– Cross-section analysis much more tractable
• Can implement in any package
• Economic
– To obtain long-run coefficients
• But discards dynamics!
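The aggregate cross-section regression needs only standard commands. The sketch below uses simulated data, so the variable names (id, t, xbar_i, x, y) and the data-generating process are hypothetical, and it uses a static relationship with a true slope of 2 purely to show the mechanics: average each group over time, then run one cross-section regression on the group means.

```stata
* Hypothetical panel: 50 groups x 30 periods
clear
set seed 99
set obs 1500
gen id = ceil(_n/30)
bysort id: gen t = _n

* Group-specific level of x, so that the group means of x differ
gen double xbar_i = rnormal() if t == 1
bysort id (t): replace xbar_i = xbar_i[1]
gen double x = xbar_i + rnormal()
gen double y = 2*x + rnormal()

* Aggregate over time: one observation of group means per group
collapse (mean) y x, by(id)

* Standard cross-section regression on the group means
regress y x
```

After collapse the dataset holds one row per group, which makes the cost of the approach visible: all within-group time variation, and with it the dynamics, has been averaged away.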

Mean group estimation of
time-series panel models
• Parameters vary across groups
– In “large T panels”, parameters cannot be estimated in a
model that imposes cross-group homogeneity
• Unless the slope coefficients are truly identical across groups
• But slope heterogeneity is pervasive
• “Traditional panel analysis” (“small T”)
– Pool the time series
• Assume parameter homogeneity
– Efficiency gain

• Time-series panel analysis
– Separate time series regressions for the N groups
– Estimates averaged
• Allows parameter heterogeneity
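The contrast between pooling and averaging can be sketched with official Stata commands on simulated data. Everything here is hypothetical (variable names id, t, b_i, x, y; slopes drawn around a mean of 1), and the model is static to keep the mechanics visible; it is not the xtmg or xtpmg implementation.

```stata
* Hypothetical panel: 20 groups x 30 periods with heterogeneous slopes
clear
set seed 4321
set obs 600
gen id = ceil(_n/30)
bysort id: gen t = _n

* Group-specific slope b_i drawn around a mean of 1
gen double b_i = 1 + 0.3*rnormal() if t == 1
bysort id (t): replace b_i = b_i[1]
gen double x = rnormal()
gen double y = b_i*x + rnormal()

* Pooled benchmark: imposes a common slope
regress y x

* Mean Group: one regression per group, collected into a new dataset
statsby _b, by(id) clear: regress y x

* Average of the group-specific slopes, with the standard error of that mean
mean _b_x
```

statsby replaces the data with one row of coefficients per group, so the group-specific estimates remain available for inspection before they are averaged.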
Consideration 4: Addressing endogeneity
• According to Pesaran (1997)
– The Autoregressive Distributed Lag (ARDL) approach to time-series panel analysis “continues to be applicable even if the Xt's are endogenous, irrespective of whether they are I(1) or not”.
• Although these results were obtained in a time-series context, they have been applied routinely to panel analysis: for example, according to Pesaran et al. (1999, p.624):
– … it is relatively straightforward to allow for the possible dependence of Xit on εit when estimating the long-run coefficients, as long as Xit have finite-order autoregressive representations.
• Note the double subscripting, indicating panel analysis
• The practical implication
– Augmenting the ARDL specification with an adequate number of lags makes the estimation of the long-run coefficients immune to endogeneity problems
• Irrespective of whether the regressors are stationary
Pesaran’s argument
• The presence of endogenous regressors is not a cause for concern in estimating long-run parameters in the context of ARDL modelling

When the off-diagonal components of the Variance-Covariance matrix are non-zero


then the derivation of the long-run relationship between yt and xt should allow for the
contemporaneous feedback that exists between the variables
If ut and εt are jointly normally distributed, then:

and ƞt is distributed independently of εt

Use this result for Equation 5

The difference between equations (5) and (8) is that in (8) “xt can be treated as strictly exogenous” (Pesaran, 1997, p.183), even if ut and εt are contemporaneously correlated. In this case, the long-run relationship between yt and xt is given by:

where * denotes long-run or equilibrium values.


• This long-run relationship has two important features:
– “it allows for the direct as well as the indirect effects of changes in xt on yt that take
place through the contemporaneous dependence between ut and εt; and
– “the long-run coefficients  and ϴ will be invariant to the parameters of the xt
process … “.
• And the consequences for estimation are that
– … valid asymptotic inferences on the short-run and long-run parameters can be made, using the least squares estimates of the ARDL model (8), once the order of the ARDL is appropriately augmented to allow for possible contemporaneous correlations between ut and εt
Consideration 5: Cross-group
variable and/or error correlation
• Variable and/or residual correlation across panel members: due to
– common shocks (e.g. recession)
– spillover effects
• Cross-section dependence (CSD) can lead to
– imprecise estimates (error correlation)
– identification problem (variable correlation)
• Standard panel estimators assume cross-section independence
• Two approaches:
1. Spatial econometrics:
– Econometrician ‘knows’ how panel members are associated/correlated (e.g.
neighbourhood)
– Models this association explicitly employing a weight matrix
• ‘spatially lagged dependent variable’
2. Time-series panel analysis: Common factor models
– Models dependence with unobserved common factors ft
with heterogeneous impact i
– Trick is to estimate common factors or blend out their impact on estimation.
Estimator 1: Bias-Corrected Least Squares
Dummy Variable (LSDV)
• LSDV (i.e. FE) not consistent for finite T in dynamic panel models
– Bias can be substantial even for T=30
• Bias correction of LSDV in dynamic panel models
– Accounts for >90% of the bias
• Advantages
– Suitable for unbalanced panels
• But gaps cause problems in bootstrapping the SEs
– Suitable for small N (e.g. when 10≤N≤20)
• With small N and strictly exogenous regressors,
outperforms Difference GMM and System GMM
• But some evidence of acceptable finite-sample properties
with weakly exogenous regressors (?)
– Suitable for small T
• Bruno (2005, p.487) gives an example with N=29 and T=9 (or even fewer; p.491)
– Addresses small-sample bias
• Disadvantages
– Requires strictly exogenous regressors
– Not easy to address heteroskedasticity
• But bootstrapped SEs take “full account of the dependency in the DGP”
(data generating process)
– Does not address parameter heterogeneity or cross-group error correlation
• So estimate with a full set of period DVs
Example 1
• Procedure (implemented by the software)
– Standard dynamic panel model

– Subtract bias approximation estimations from LSDV estimates


• Using in Stata
– Download: xtlsdvc
• Command Window
– help xtlsdvc
– use abdata
– xtlsdvc n w k yr1977-yr1984 if ind==4, initial(ab) first bias(3) vcov(100)
• cf Difference GMM (AB) and Bias-Corrected LSDV estimates
– Choice of initial estimator makes little difference, but choose System
GMM (BB) if the dependent variable is highly persistent (say, >0.8)
• Calculate long-run coefficient(s) and SEs, by post estimation
test for non-linear combinations: nlcom
– e.g., nlcom (_b[k]/(1-_b[l1.n]))
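The nlcom step can be checked on a case where the answer is known. The sketch below is hypothetical: a single simulated time series (variables t, x, y are assumed names) with persistence 0.5 and impact coefficient 1, so the true long-run coefficient is 1/(1-0.5) = 2. It uses regress rather than xtlsdvc so that it runs without user-written packages.

```stata
* Hypothetical single time series with a known long-run effect of 2
clear
set seed 2015
set obs 200
gen t = _n
tsset t
gen double x = rnormal()
gen double y = 0

* Build y recursively: y_t = 0.5*y_{t-1} + 1.0*x_t + small noise
replace y = 0.5*y[_n-1] + x + rnormal(0, 0.1) in 2/l

* Dynamic regression, then the long-run coefficient with a delta-method SE
regress y L.y x
nlcom (lr_x: _b[x]/(1 - _b[L.y]))
```

nlcom reports the ratio together with a standard error and confidence interval, which is the reason to use it rather than dividing the two coefficients by hand.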
Example 2
• Using Eberhardt’s data
– N=48, T=33
– xtlsdvc ly lk year_2-year_33, initial(ab) first bias(3) vcov(100) lsdv
• Compare Difference GMM (AB), LSDV and
Bias-Corrected LSDV estimates
• Save in a log file for comparison

Estimator 2: Panel Dynamic
Ordinary Least Squares (PDOLS)
• Model: cointegrating regression
– $Y_{it} = \alpha_i + \beta_i x_{it} + u_{it}$
• Cointegration tests of long-run hypotheses using aggregate panel data
– Extension of the individual time-series dynamic OLS
• Not clear why “dynamic” (does not estimate error-correction)
– Single-equation estimator of the cointegrating vector
– Applied to nonstationary data & cointegrated variables
• “Reasonably large” N and T (Pedroni, 2001: N=20, T=246 monthly observations)
• Regression on each panel group
– $Y_{it} = \alpha_i + \beta_i x_{it} + \sum_{j=-p}^{p} \gamma_{i,j} \Delta x_{i,t-j} + u^*_{it}$
• where j = −p, …, p and p = 1, 2, …, P
– Cointegrating regression augmented with lead and lagged differences
of the regressor(s) to control for the endogenous feedback effect
• Output
– $\beta_i$ coefficients and associated t-statistics averaged over the entire panel
– Individual group estimates of $\beta_i$ coefficients and associated t-statistics
Features
• Allows for greater flexibility in the presence of heterogeneity
of the cointegrating vectors
• Point estimates from the between-dimension estimator interpreted as the
mean value for the cointegrating vectors
• Test statistics constructed from the between-dimension estimators test the
null hypothesis H0: βi = β0 for all i
against the alternative hypothesis HA: βi ≠ β0
• Values for βi are not constrained to be the same under HA
– Important advantage over test statistics constructed from
the within-dimension estimators
• Averaging across groups to create a single time series
⇒ βi the same under both H0 & HA
– Much lower small-sample size distortion
than the within-dimension estimators
• PDOLS does not take account of cross-group error correlation
– But by default does estimate with common time dummies

Examples
• Using Stata
– Download: xtpedroni
• Command Window
– help xtpedroni
– xtset country time
• Example 1: use pedronidata
– xtpedroni logexrate logratio, notest lags(5) mlags(5) b(1)
– xtpedroni logexrate logratio, notest lags(5) mlags(5) b(1) notdum
• Without time dummies
• b(1) – compute all t-stats against H0 slope coefficient=1 (Default: b(0))
– Appropriate for long-run PPP
– xtpedroni logexrate logratio, full notest lags(4) mlags(4) b(1) notdum
• Results for each country (no common time dummies)
• Example 2: use manu_prod (Eberhardt’s data)
– xtpedroni loghex loggdp, trend lagselect(aic)
• lags – default is 2
• mlags – default (chosen automatically for each individual)
Estimators 3 and 4: Mean Group (MG) and
Pooled Mean group (PMG)
• For large N and large T panels
– Large N, small T panels
• Individual groups pooled
• FE or RE or RE+IV estimators (e.g. difference and system GMM)
– Only the intercepts vary across groups
– Large N, Large T
• T large enough to fit the model to each group separately
• Assumption of slope homogeneity not appropriate
• Nonstationarity also a concern
• Hence the need for models to estimate nonstationary,
heterogeneous dynamic panels
– MG
• Estimates N time-series regressions and averages the coefficients
– PMG
• Combination of averaging and pooling
Procedure
• Start with an ARDL model
– p lagged values of the dependent variable (starting from 1)
– q current and lagged values of the independent variable(s) (starting from 0)

• Reparameterise (1) into an Error Correction Model (ECM)
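The two steps can be written out as follows. The notation is assumed here and follows the common presentation of the PMG model (Pesaran et al., 1999; Blackburne and Frank, 2007): $\lambda_{ij}$ are the coefficients on the lagged dependent variable, $\delta_{ij}$ the coefficient vectors on the regressors $X$, and $\mu_i$ the group-specific intercept.

```latex
\begin{align*}
y_{it} &= \sum_{j=1}^{p} \lambda_{ij}\, y_{i,t-j}
        + \sum_{j=0}^{q} \delta_{ij}'\, X_{i,t-j}
        + \mu_i + \varepsilon_{it} \tag{1}\\[4pt]
\Delta y_{it} &= \phi_i \bigl( y_{i,t-1} - \theta_i' X_{it} \bigr)
        + \sum_{j=1}^{p-1} \lambda_{ij}^{*}\, \Delta y_{i,t-j}
        + \sum_{j=0}^{q-1} \delta_{ij}^{*\prime}\, \Delta X_{i,t-j}
        + \mu_i + \varepsilon_{it} \tag{2}
\end{align*}
where
\begin{align*}
\phi_i &= -\Bigl(1 - \sum_{j=1}^{p} \lambda_{ij}\Bigr), &
\theta_i &= \frac{\sum_{j=0}^{q} \delta_{ij}}{1 - \sum_{k=1}^{p} \lambda_{ik}},\\
\lambda_{ij}^{*} &= -\sum_{m=j+1}^{p} \lambda_{im}, \quad j = 1, \dots, p-1, &
\delta_{ij}^{*} &= -\sum_{m=j+1}^{q} \delta_{im}, \quad j = 0, \dots, q-1.
\end{align*}
```

As a check, an ARDL(1,1) model $y_t = \lambda y_{t-1} + \delta_0 x_t + \delta_1 x_{t-1}$ gives $\Delta y_t = (\lambda - 1)\bigl(y_{t-1} - \theta x_t\bigr) - \delta_1 \Delta x_t$ with $\theta = (\delta_0 + \delta_1)/(1 - \lambda)$, which matches the general formulas above.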

Interpretation
• All terms in the ECM are stationary
– No need for unit root tests in ARDL approach
• $\theta_i'$ is the vector of coefficients that define the long-run relationship between the variables
• $\phi_i$ is the error-correcting speed of adjustment
– $\phi_i$ is of the form −(1 − persistence), i.e. minus the complement of persistence
– If $\phi_i = 0$, then there is no evidence of a long-run relationship
– If $\phi_i < 0$, and statistically significant, then there is negative feedback, which means that the LR relationship is stable
• i.e. the dependent variable returns to LR equilibrium over time
• $\delta_{ij}^{*\prime}$ is a vector of coefficients on the differenced values of the
independent variable(s)
– Defines the short-run (impact) relationship(s)
between the differenced dependent variable
and the differenced independent variable(s)
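The interpretation above can be made concrete for the simplest case. The following is a minimal sketch, assuming an ARDL(1,1) with a single regressor; the function name and the numbers are hypothetical and chosen only for illustration:

```python
def ardl11_to_ecm(lam, delta0, delta1):
    """Map ARDL(1,1) coefficients to error-correction parameters.

    ARDL(1,1): y_t  = mu + lam*y_{t-1} + delta0*x_t + delta1*x_{t-1} + e_t
    ECM:       dy_t = mu + phi*(y_{t-1} - theta*x_t) + sr*dx_t + e_t
    """
    if lam == 1.0:
        raise ValueError("lam = 1 implies phi = 0: no long-run relationship")
    phi = -(1.0 - lam)                         # speed of adjustment
    theta = (delta0 + delta1) / (1.0 - lam)    # long-run coefficient
    sr = -delta1                               # short-run coefficient on dx_t
    return phi, theta, sr


# Hypothetical coefficients, for illustration only
phi, theta, sr = ardl11_to_ecm(lam=0.5, delta0=0.25, delta1=0.25)
print(phi, theta, sr)
```

With persistence of 0.5, half of any deviation from the long-run relationship is corrected each period; as persistence approaches 1, φ approaches 0 and the long-run coefficient is no longer defined.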
Three approaches to estimation
• Two extremes
1. Dynamic LSDV estimator (i.e. Dynamic FE)
• Time-series data pooled across groups
• Only the intercepts are allowed to vary by group
• Unless the slopes are identical, estimates are inconsistent and potentially misleading
2. Mean Group estimator
• Fitted separately for each group
• Mean of the estimated coefficients calculated
• Intercepts, slope coefficients, and error variances
are all allowed to differ across groups
• Compromise: Pooled Mean Group estimator
– Combines both pooling and averaging
– Intercepts, short-run coefficients and error variances
all allowed to differ across the groups
• As in MG estimation
– Long-run coefficients constrained to be equal across groups
• As in dynamic FE estimation
• Exploiting similarity between groups to gain efficiency over MG
– Fewer parameters to estimate than in MG
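The contrast between pooling and averaging can be sketched with a deliberately simple static example (no lagged dependent variable, no noise). The data and function names are hypothetical; the sketch only shows that the two extremes give different answers once slopes differ across groups, not the dynamic-panel inconsistency itself:

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def mean_group_slope(panel):
    """MG: fit each group separately, then take the unweighted mean."""
    slopes = [ols_slope(x, y) for x, y in panel.values()]
    return sum(slopes) / len(slopes)


def pooled_fe_slope(panel):
    """Pooled within (FE) estimator: group-specific intercepts, one common slope."""
    sxy = 0.0
    sxx = 0.0
    for x, y in panel.values():
        xbar = sum(x) / len(x)
        ybar = sum(y) / len(y)
        sxy += sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        sxx += sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical two-group panel: group A has slope 0.5, group B has slope 1.5
panel = {
    "A": ([1, 2, 3, 4], [1.5, 2.0, 2.5, 3.0]),
    "B": ([2, 4, 6, 8], [5.0, 8.0, 11.0, 14.0]),
}
print(mean_group_slope(panel), pooled_fe_slope(panel))
```

Here MG returns the simple average of the two slopes, while the pooled estimate is pulled towards group B because its regressor varies more.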
Key assumption
• Short-run parameter heterogeneity
– Unique features of cross-section groups – e.g.
countries – likely to bring about short-run differences
in response of dependent variable to changes in
independent variables
• Long-run parameter homogeneity
– Common features more likely to yield similar long-run
responses, especially if the groups are similar
• e.g. OECD countries
– To check this assumption: Hausman test
• If the true model is heterogeneous
– MG: always consistent
» But not most efficient if LR parameters homogeneous
– PMG: inconsistent
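The logic of the Hausman test mentioned above can be sketched for a single long-run coefficient. This is a minimal illustration, not the Stata implementation; the estimates and standard errors are hypothetical, and 3.841 is the 5% critical value of the chi-squared distribution with one degree of freedom:

```python
def hausman_scalar(b_consistent, se_consistent, b_efficient, se_efficient):
    """Hausman statistic for one coefficient.

    H = (b_c - b_e)^2 / (Var_c - Var_e), chi-squared(1) under H0 that
    both estimators are consistent and the second one is efficient.
    """
    var_diff = se_consistent ** 2 - se_efficient ** 2
    if var_diff <= 0:
        raise ValueError("variance difference is not positive; statistic undefined")
    return (b_consistent - b_efficient) ** 2 / var_diff


CHI2_1_CRIT_5PCT = 3.841

# Hypothetical long-run coefficients: MG (consistent) vs PMG (efficient under H0)
h = hausman_scalar(b_consistent=0.40, se_consistent=0.10,
                   b_efficient=0.35, se_efficient=0.06)
reject_homogeneity = h > CHI2_1_CRIT_5PCT
print(h, reject_homogeneity)
```

With these illustrative numbers the statistic is well below the critical value, so long-run homogeneity would not be rejected and PMG would be preferred on efficiency grounds.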
PMG: Advantages and Disadvantages
• OK in an unbalanced panel
• Even one with gaps
• ARDL platform
– In I(1) panels allows for a mix of cointegration (φi < 0)
and noncointegration (φi = 0)
– Potential endogeneity of X variable(s) addressed
• Assumption of a common long-run equilibrium relationship
is appealing for small sets of arguably ‘similar’ groups
rather than large diverse macro panels
• Error term εit
– Assumed to be identically distributed across groups and time
and uncorrelated with the regressors
• Can use cluster-robust SEs with DFE
– But does not address cross-group error correlation
Example
• Using Eberhardt’s data:
– use manu_prod
• Model will not converge with a full set
(minus one) of period dummies
• Estimate with time trend to take care of unmodelled growth
process(es)
– gen trend = year - 1970
– xtpmg d.ly d.lk , lr(l.ly lk trend ) ec(ec) replace pmg
– estimates store pmg
– xtpmg d.ly d.lk , lr(l.ly lk trend ) ec(ec) replace mg
– estimates store mg
– hausman mg pmg, sigmamore
• Do not reject H0: mg and pmg estimates not systematically different
– Assumption of long-run parameter homogeneity not rejected
• pmg preferred
• Inclusion of the time trend in the CV brings pmg estimate
of the capital effect closer to ccemg estimate
– But not inclusion only of a constant in the CV
• Can add lags of d.ly and d.lk
– Do not make much difference in this case
Estimators 5 and 6: Common Correlated Effects Mean Group and Augmented Mean Group
• “2nd generation” models (Eberhardt)
– Allow for unobserved correlation across panel members (groups)
• Model
– One covariate and one unobserved common factor
• But generalises to multiple covariates
and multiple unobserved common factors

• Both N and T “moderate to large”


– Eberhardt et al. (2013) estimate on N=119, T=22
• CCE estimators “unlikely to perform as expected” in a short T panel
(e.g. average T=7.5) (p.445)
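A sketch of the model with one covariate and one unobserved common factor, assuming the set-up in Eberhardt (2012), whose symbols match those used on the following slide. The equation numbers (1) to (3) are the ones the later slides refer to:

```latex
% Observable relationship with a group-specific slope
y_{it} = \beta_i x_{it} + u_{it} \tag{1}

% Unobservables: group fixed effect plus a common factor with heterogeneous loadings
u_{it} = \alpha_{1i} + \lambda_i f_t + \varepsilon_{it} \tag{2}

% The regressor is itself driven by the common factors f_t and g_t
x_{it} = \alpha_{2i} + \lambda_i f_t + \gamma_i g_t + e_{it} \tag{3}
```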
Explanation
• Observables
– yit and xit
– βi: country-specific slope coefficient on xit
• Unobservables uit
– Group fixed effects α1i
• Capture time-invariant heterogeneity across groups
– Unobserved common factor ft with heterogeneous factor loadings λi
to capture
• Time-variant heterogeneity
• Cross-section dependence
• Unobserved common factor gt
– Allows the observable xit to be driven by common factors other than ft
• Both unobserved common factors ft and gt
– Can be nonlinear and nonstationary
– Influence any cointegrating relationship
• Additional complexity
– ft in both (2) and (3)
– Induces endogeneity in the estimation equation (1)
• εit and eit assumed white noise
Estimation
• All Mean Group estimators follow, in principle,
the same two-step methodology
1. Estimate N group-specific OLS regressions
2. Average the estimated coefficients across groups
• CCEMG and AMG estimators
– Step 1 augmented with additional covariates
– Step 2 the same
• Of interest is the average relationship across panel members
• But all N regression results can be listed
– Enabling analysis of patterns of heterogeneity
– Further insight into the ultimate source of heterogeneity
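Step 2 can be sketched as follows, assuming the group-specific coefficients from Step 1 are already available. The standard error uses the nonparametric variance estimator of Pesaran and Smith (1995), which is based on the dispersion of the group estimates; the function name and the coefficient values are hypothetical:

```python
import math


def mean_group(coefs):
    """Average N group-specific coefficients and return (mean, standard error).

    Variance of the mean: sum_i (b_i - b_bar)^2 / (N * (N - 1)).
    """
    n = len(coefs)
    if n < 2:
        raise ValueError("at least two groups are needed for a standard error")
    mean = sum(coefs) / n
    var_of_mean = sum((b - mean) ** 2 for b in coefs) / (n * (n - 1))
    return mean, math.sqrt(var_of_mean)


# Hypothetical slope estimates from three group-specific regressions
b_mg, se_mg = mean_group([0.2, 0.4, 0.6])
print(b_mg, se_mg)
```

Keeping the full list of group coefficients, rather than only the mean, is what allows the patterns of heterogeneity to be inspected afterwards.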

CCEMG – implementation
• CCEMG estimator allows for the set-up in (1), (2) and (3), which induces
1. cross-section dependence
2. time-variant unobservables with heterogeneous impact across groups
3. identification problems
• βi cannot be identified if xit contains ft
• CCEMG Solution (Pesaran, 2006)
– Augment the group-specific regression equation (1) with cross-section averages of
both the dependent and independent variables
• Using the data for the entire panel
• ȳt and x̄t (the cross-section averages at time t) added as additional regressors
in each of the N regression equations
– Step 2 is the usual MG averaging across panel members
• Estimated coefficients on the cross-section average variables
– Not interpretable in any meaningful way
– Present to “blend out” the biasing impact of the unobservable common factors
• Both “weak” factors (e.g. local spillover effects) and “strong” factors (e.g. global shocks)
• CCEMG robust to nonstationary common factors
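The augmentation step can be sketched as below: compute the cross-section averages of y and x for every period from the whole panel, then attach them to each group's observations as two extra regressors. This is an illustration of the data construction only, not the xtmg implementation; the data layout, function names and numbers are hypothetical:

```python
def cross_section_averages(panel):
    """Return {t: (ybar_t, xbar_t)}, averaging over all groups observed at t."""
    sums = {}
    for series in panel.values():
        for t, (y, x) in series.items():
            sy, sx, n = sums.get(t, (0.0, 0.0, 0))
            sums[t] = (sy + y, sx + x, n + 1)
    return {t: (sy / n, sx / n) for t, (sy, sx, n) in sums.items()}


def augment(panel):
    """Rows (t, y, x, ybar_t, xbar_t) for each group's Step 1 regression."""
    avgs = cross_section_averages(panel)
    return {
        g: [(t, y, x, *avgs[t]) for t, (y, x) in sorted(series.items())]
        for g, series in panel.items()
    }


# Hypothetical panel: group -> {year: (y, x)}
panel = {
    "A": {2000: (1.0, 2.0), 2001: (2.0, 4.0)},
    "B": {2000: (3.0, 6.0), 2001: (4.0, 8.0)},
}
augmented = augment(panel)
print(augmented["A"])
```

Each group's regression of y on x, ȳt, x̄t and an intercept is then run separately, and the coefficients on x are averaged as in any Mean Group estimator.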
Importance of the cross-section dimension
• … as the cross-section dimension becomes large, the
unobserved common factor ft
can be captured by a combination of
cross-sectional averages of y and x
– Nicely explained in Eberhardt et al., 2013, p.443
• Parameters on the cross-sectional averages
and intercepts must be group specific
– Allows for both
1. Common factors (shocks, spillovers, etc) (ft)
2. Group-specific reactions to the common factors (λi)
– See equations (2) and (3)
– Achieved in Mean Group estimators by construction
• Each group estimated separately
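A heuristic sketch of why the averages can stand in for the factor, assuming the model in (1) and (2) is y_it = β_i x_it + α_1i + λ_i f_t + ε_it, that β_i = β + v_i with v_i independent of x_it, and that the average loading is non-zero. Bars denote averages across groups at time t:

```latex
% Average the estimation equation across the N groups at time t
\bar{y}_t = \bar{\alpha}_1 + \frac{1}{N}\sum_{i=1}^{N} \beta_i x_{it}
          + \bar{\lambda} f_t + \bar{\varepsilon}_t

% For large N: \bar{\varepsilon}_t \to 0 and N^{-1}\sum_i \beta_i x_{it} \approx \beta \bar{x}_t
f_t \approx \frac{1}{\bar{\lambda}}
      \left( \bar{y}_t - \bar{\alpha}_1 - \beta \bar{x}_t \right),
      \qquad \bar{\lambda} \neq 0

% Substituting back into the equation for group i
y_{it} \approx \Bigl( \alpha_{1i} - \tfrac{\lambda_i}{\bar{\lambda}} \bar{\alpha}_1 \Bigr)
        + \beta_i x_{it}
        + \tfrac{\lambda_i}{\bar{\lambda}}\, \bar{y}_t
        - \tfrac{\lambda_i}{\bar{\lambda}}\, \beta\, \bar{x}_t
        + \varepsilon_{it}
```

The intercept and the coefficients on the two averages all depend on λ_i, which is why they must be group specific.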
AMG
• Alternative to the Pesaran (2006) CCEMG
– Similar performance with respect to bias and RMSE
in panels with nonstationary variables
and multifactor error terms
• Both cointegrated and non-cointegrated
• Main difference is the intended application to
cross-country production functions
– CCEMG
• Unobservable common factor ft treated as a “nuisance”
– To be accounted for but of no interest
– AMG
• In cross-country production functions, unobservables represent total
factor productivity (TFP), a variable of interest
Other features of CCE estimators
• Focus of CCEMG
– Consistent estimates of the parameters on the observed variable(s) (xit)
– i.e. the mean of the heterogeneous βi
• long-run coefficient(s) - see Equation 1
• Disadvantage
– Not informative about adjustment or short-run dynamics
– Does not address endogeneity of regressor(s)
• Except for endogeneity arising from common factors
• Does not address endogeneity of types common in macro panels
– E.g. simultaneity/feedback effects

• Advantages
– “ … can accommodate a fixed number of strong common factors and an
infinite number of weak common factors … where the former can be thought
of as common global shocks and the latter as local or regional spillover
effects.”
– “… remarkably robust to structural breaks, lack of cointegration, and certain
serial correlation.”
• Eberhardt et al., 2013, p.443
Example
• Command Window: help xtmg
• Using Eberhardt’s data:
– use manu_prod
• N=48, T=33
• Command window
– xtset the data
• Cross-country productivity analysis
– ly and lk both I(1)
• Theory:
– Capital per worker coefficient should be around one third
• Compare
– MG (default): xtmg ly lk, trend res(eMG)
– CCEMG: xtmg ly lk, cce trend res(eCMGt)
• Trend option
– Each group-specific regression augmented with a linear trend term
• Takes account of unmodelled growth process(es)
• Can test MG and CCEMG residuals for cross-group correlation
– But not in this case
• xtcd does not work when there are gaps in the panel
Choice of estimator: Summary
First check suitability for the dimensions of the data

Does the estimator allow: BCLSDV PDOLS MG PMG CCEMG AMG
Unbalanced panel √ √ √ √ √ √
Gaps in time series √ √ √ √ √
Long-run equilibrium √ √ √ √ √
Error Correction √ √
Short-run dynamics √ √
Cointegrating vector for each group √ √ √ √
EC and SR dynamics for each group √ √
Parameter heterogeneity √ √ √ √ √
Potential endogeneity √ √ √
Cross-group error correlation √ √

So, which estimator?
• Lack of systematic comparisons
• One useful study: Banerjee et al. (2010)
– Monte Carlo simulations
• Findings
1. Simultaneity a much larger source of bias than cross-group dependence
2. As yet, no CCE-IV estimator
• Research ongoing
3. Common factors can - at least partially - be addressed by time dummies
• In the observable part of the model rather than
very complicated unobservable effects in the residuals
• Implications (implicit rather than the authors’ own conclusions)
– CCEMG may not always be preferable to PMG
– Maybe use PMG and/or MG augmented with year effects
• If a full set of year DVs infeasible, try period DVs (e.g. before/after GFC)
• No universally preferred estimator
• Recommendation
1. Consider the nature of the problem to be investigated
and match with the most appropriate estimator
2. Robustness check
• Are your results an artefact of your chosen estimator?
Using diagnostic tests to decide?
• Econometric models exist on two levels
– As statistical models
– As economic models
• Model has to be valid on both levels
• Statistical validity a necessary condition for a valid econometric model
– Example 1: Single equation time-series analysis using OLS/DOLS
• Regression residuals assumed NIID
– Normally, Identically and Independently Distributed
• Assumptions checked by standard diagnostic tests
– Serial (Auto)correlation in the residual
– Heteroskedasticity
– Autoregressive Conditional Heteroskedasticity (ARCH)
» Especially for high-frequency data
– Linearity
– Normality
» Not essential for OLS estimation, as long as iid holds
» But informative about the underlying Data Generating process
– Example 2: “Micro” panel analysis (Wide N, Short T) using GMM
• Consistency depends on the validity of the instruments
• Assumptions checked by diagnostic tests
– m1/m2
– Sargan/Hansen

• Usual procedure when deciding between two models
– Give precedence to diagnostic tests
– iff the model is statistically well specified, interpret for economic meaning
Usual procedure not possible in time-series panel analysis
• Mainstream panel analysis
– Collections of cross-section data over short periods of time
• Time-series panel analysis
– Collections of time-series data
• Compare
– Time-series analysis
• Highly developed and well-understood diagnostics
to test for statistical misspecification
– Time-series panel analysis (Banerjee et al. 2010, p.2)
• “By and large … a misspecification-test-free zone … issues central to time-
series estimation such as the investigation of model specification have not
been duly emphasised … the lack of residual diagnostics in macro panel
econometrics now more than ever represents a glaring omission.”
• To date, no diagnostic procedures that practitioners can use routinely and with
confidence
– “A great deal of work remains”! Banerjee et al. 2010, p.34
References
• Key resources: user-written programmes introduced in The Stata Journal
• xtpmg
– Blackburne, E. and Frank, M. (2007). Estimation of nonstationary heterogeneous panels. The Stata Journal, 7(2) 197-208.
• xtlsdvc
– Bruno, G. (2005). Estimation and inference in dynamic unbalanced panel-data models with a small number of individuals. The Stata Journal, 5(4) 473-500.
• xtmg
– Eberhardt, M. (2012). Estimating panel time-series models with heterogeneous slopes. The Stata Journal, 12(1) 61-71.
• xtpedroni
– Neal, T. (2014). Panel cointegration analysis with xtpedroni. The Stata Journal, 14(3) 684-692.
• xtwest
– Persyn, D. and Westerlund, J. (2008). Error-correction-based cointegration tests for panel data. The Stata Journal, 8(2) 232-241.
• Important articles
– Banerjee, A., Eberhardt, M. and Reade, J. (2010). Panel Estimation for Worriers. https://sites.google.com/site/medevecon/publications-and-working-papers
– Eberhardt, M., Helmers, C. and Strauss, H. (2013). Do spillovers matter when estimating private returns to R&D? The Review of Economics and Statistics, 95(2) 436-448.
– Maddala, G.S. and Wu, S. (1999). A comparative study of unit root tests with panel data and a new simple test. Oxford Bulletin of Economics and Statistics, 61(Special Issue), 631-652.
– Pesaran, M.H. (2006). Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica, 74, 967-1012.
– Pesaran, M.H. (2007). A simple panel unit root test in the presence of cross-section dependence. Journal of Applied Econometrics, 22(2), 265-312.
– Pesaran, M.H. and Smith, R. (1995). Estimating long-run relationships from dynamic heterogeneous panels. Journal of Econometrics, 68, 79-113.
– Pesaran, M.H., Smith, R. and Im, K.S. (1996). Dynamic linear models for heterogeneous panels. Chapter 8 in L. Mátyás and P. Sevestre (eds.), The Econometrics of Panel Data: Handbook of Theory and Applications, 2nd edition. Dordrecht: Kluwer Academic Publishers.
– Pesaran, M.H. (1997). The Role of Economic Theory in Modelling the Long-Run. The Economic Journal, 107, 178-191.
– Pesaran, M.H., Shin, Y. and Smith, R. (1999). Pooled Mean Group Estimation of Dynamic Heterogeneous Panels. Journal of the American Statistical Association, 94(446) (June) 621-34.
– Pedroni, P. (1999). Critical values for cointegration tests in heterogeneous panels with multiple regressors. Oxford Bulletin of Economics and Statistics, 61(Special Issue), 653-670.
– Pedroni, P. (2001). Purchasing Power Parity Tests in Cointegrated Panels. The Review of Economics and Statistics, 83(4) 727-731.
• Textbook
– Baltagi, B.H. (2013). Econometric Analysis of Panel Data. 5th Ed. New York: Wiley.
• Presentation
– Eberhardt, M. Panel time-series modelling: New tools for analyzing xt data.