Lecture4 Panelt-Smodels 12-04-2017 Corrections
26/02/2025 1
Terminology and limits
• Methods
1. Panel unit root tests
2. Cointegration tests
3. Empirical estimators
• Development
– 1st generation methods
• Assumed panel members (groups) to be
cross-sectionally independent
– 2nd generation
• Addressed correlation across panel members
Example
• Many examples use data from Eberhardt (2012)
• xtmg to download
– Either: Command Window: findit xtmg
– Or: Help – Search – xtmg
• Command Window, type:
– use manu_prod
– xtset nwcode year
• Testing a simple theory of labour productivity
– Constant returns Cobb-Douglas production function
• $Y = A K^{\alpha_i} L^{1-\alpha_i}$
• Y – value added (deflated)
• K – capital stock (deflated)
• L – labour force
• A - captures Total Factor Productivity (TFP)
– Systematic unobservable influences (technology parameter)
– Can be rewritten in per capita terms and log-linearised for estimation
• where y=Y/L and k=K/L (i.e. per worker)
• $ly_{it} = A_{it} + \alpha_i\, lk_{it} + \varepsilon_{it}$
– $\varepsilon_{it}$ = error term (idiosyncratic unobservable influences)
– $\alpha_i$ = elasticity of labour productivity with respect to capital per worker (to be estimated)
• Heterogeneous across countries
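The log-linearisation step can be written out as follows. This is a sketch under the slide's own assumptions (constant returns, country-specific $\alpha_i$); the group and time subscripts on the levels equation, and reading $A_{it}$ in the estimating equation as being on the log scale, are assumptions about the slide's notation.

```latex
\begin{align*}
Y_{it} &= A_{it}\, K_{it}^{\alpha_i} L_{it}^{1-\alpha_i} \\
\frac{Y_{it}}{L_{it}} &= A_{it} \left( \frac{K_{it}}{L_{it}} \right)^{\alpha_i}
  \quad\Longleftrightarrow\quad y_{it} = A_{it}\, k_{it}^{\alpha_i} \\
\ln y_{it} &= \ln A_{it} + \alpha_i \ln k_{it}
\end{align*}
```

Writing $ly_{it} = \ln y_{it}$ and $lk_{it} = \ln k_{it}$ and adding an error term gives the estimating equation.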
Methods I: Panel unit root tests
• Start with plots of each data series
– xtline
• xtline ly
• xtline lk
– Most series display a nonconstant mean
• Most series certainly not “textbook” stationary!
• Panel unit root tests
– Work best with “large” T and at least “moderate” N
Panel Unit Root testing (PURT) in Stata
• Help - search – multipurt
• Using existing xtfisher & pescadf commands
– If not installed
• help multipurt
– Click on xtfisher
» help xtfisher
– Click on pescadf
» help pescadf
Estimators by time dimension T (table; original layout lost)
• Small (T<10): insufficient (?) data for dynamic analysis
– Micro: Difference GMM; System GMM
– Bias-Corrected LSDV
– Group Mean estimators (but applied to macro panels)
• Moderate (10≤T≤2
– Unobserved components model (CFRs)
– Group Mean
– Bias-Corrected LSDV
Consideration 2: Dynamics
• In economics, history matters!
– Economic variables are not independently realised
in every period
• Instead: expectations; anticipations; persistence; adjustment; path
dependency; hysteresis effects;
and so on
• In macro and in microeconomics
• In econometrics, non-stationarity matters
– Problem of spurious regression unless
non-stationary variables cointegrate
– Requires different estimators
• Dynamics typically introduced into models
via the lagged dependent variable
– Autoregressive models
Introducing history via the lagged
dependent variable
• Starting with the simplified specification in Equation 2,
repeatedly substitute for the lagged dependent variable.
• Substitute for the lagged dependent variable in (2):
• Gather terms
• … and so on.
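The substitution steps can be sketched as follows. The equations on this slide were lost in extraction, so the form of Equation 2 used here, $y_t = \beta x_t + \lambda y_{t-1} + u_t$, and the symbols $\beta$, $\lambda$ and $u_t$ are assumptions chosen to match the heterogeneous dynamic model (2.1) later in the lecture.

```latex
\begin{align*}
y_t &= \beta x_t + \lambda y_{t-1} + u_t
  && \text{(assumed form of Equation 2)} \\
y_{t-1} &= \beta x_{t-1} + \lambda y_{t-2} + u_{t-1}
  && \text{(lag one period)} \\
y_t &= \beta x_t + \lambda\beta x_{t-1} + \lambda^2 y_{t-2} + u_t + \lambda u_{t-1}
  && \text{(substitute and gather terms)} \\
y_t &= \beta x_t + \lambda\beta x_{t-1} + \lambda^2\beta x_{t-2} + \lambda^3 y_{t-3}
      + u_t + \lambda u_{t-1} + \lambda^2 u_{t-2}
  && \text{(substitute again)} \\
y_t &= \beta \sum_{j=0}^{J-1} \lambda^j x_{t-j} + \lambda^J y_{t-J}
      + \sum_{j=0}^{J-1} \lambda^j u_{t-j}
  && \text{(after } J \text{ substitutions)}
\end{align*}
```

With $|\lambda| < 1$ the term $\lambda^J y_{t-J}$ vanishes as $J$ grows, leaving current $y$ as a weighted sum of the whole history of $x$ and $u$.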
Lagged dependent variable introduces the entire
history of the independent variables
• Dynamic specifications
– Repeated substitution demonstrates that the lagged dependent variable introduces the
entire history of the independent variable(s)
• In Equation 6′, current y is influenced
– not only by current x
– but also by the cumulated effects from x one period back,
two periods back … and so on
• Unobserved influences (shocks) u also accumulate
• Persistence effects attenuate the more remote the period
– Shown by the increasing exponent on the coefficient of the lagged dependent variable
• Substitutions demonstrate that a dynamic specification includes
the whole history of both the observed and the unobserved influences
on the current value of the dependent variable
• By taking this history into account, we are able to identify the additional short-run,
contemporaneous effects of x on y
– Informative about the process of adjustment of y to x (short-run dynamics)
• Specifying and estimating a static model in the presence of dynamics
– Biased and inconsistent estimates
• Long-run equilibrium effects and short-run deviations not separately identified
• Hence the need for dynamic estimators
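The claim that the lagged dependent variable carries the whole history of x can be checked numerically. The following is a minimal sketch, assuming the simple model y_t = beta*x_t + lam*y_{t-1} with the shocks switched off; the function names and parameter values are illustrative and not taken from the lecture. The long-run effect beta/(1 - lam) used at the end is the standard result for this model when |lam| < 1.

```python
def simulate_recursive(x, beta, lam, y0=0.0):
    """Generate y_t = beta*x_t + lam*y_{t-1} (no shocks), starting from y0."""
    y, prev = [], y0
    for xt in x:
        prev = beta * xt + lam * prev
        y.append(prev)
    return y


def distributed_lag(x, beta, lam, y0=0.0):
    """Same series written as a weighted sum of current and past x."""
    out = []
    for t in range(len(x)):
        # Weight on x from j periods back is beta * lam**j
        total = sum(beta * lam ** j * x[t - j] for j in range(t + 1))
        # Remaining influence of the starting value
        total += lam ** (t + 1) * y0
        out.append(total)
    return out


# Illustrative values (assumptions, not estimates from the lecture data)
beta, lam = 0.5, 0.8
x = [1.0, 2.0, 0.5, 3.0]
recursive = simulate_recursive(x, beta, lam)
history = distributed_lag(x, beta, lam)

# With x held constant at 1, y converges to the long-run effect beta/(1-lam)
long_run = simulate_recursive([1.0] * 200, beta, lam)[-1]
print(recursive == history or max(abs(a - b) for a, b in zip(recursive, history)))
print(long_run, beta / (1 - lam))
```

The two representations coincide, and the short-run (impact) effect beta is smaller than the long-run effect beta/(1 - lam), which is why a static model cannot separate the two.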
Requirements of a dynamic estimator
Consideration 3: Parameter heterogeneity
• Pesaran and Smith (1995 & 1996)
– Estimating LR relationships from dynamic heterogeneous
panels
Main conclusions:
• Random coefficients models do not extend to dynamic panels
– Pooling dynamic heterogeneous panels can give misleading results
⇒ Test for common slope coefficients
• usually rejected
• Cross-section estimates give average LR effects
– The group mean regression
• Coefficients robust to dynamic misspecification of the underlying time series
Slope heterogeneity in models with panel
data: consequences
• Static models
– Each approach
• Unbiased and consistent estimators
of the average effect
– Nothing to choose between the methods
• Dynamic models
– Lagged dependent variable
– The four approaches are not all valid
Initial four approaches to estimating LR
effects with heterogeneous panels
Mean Group estimator
– Separate regressions for each time series, i = 1, 2, …, N
– Take the mean: $\bar{\beta} = \frac{1}{N} \sum_{i=1}^{N} \hat{\beta}_i$
• Average effect of some exogenous variable
on the endogenous variable
Effects models: FE and RE
– Restriction: $\beta_i = \beta$
Aggregate time series estimator
– In each period, aggregate data over units
• Pooled across groups
– One time series
• Standard time series analysis
Aggregate cross section estimator
– In each group, aggregate data over time
– One cross section of group means
• Standard cross-section analysis
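The Mean Group idea at the top of this slide (one regression per group, then average the slopes) can be sketched in a few lines. This is an illustrative static example with one regressor and made-up, noise-free data; the function names are hypothetical, and it is not the implementation used by any Stata command.

```python
def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def mean_group(panel):
    """Run one time-series regression per group, then average the slopes."""
    slopes = [ols_slope(x, y) for x, y in panel.values()]
    return sum(slopes) / len(slopes), slopes


# Made-up panel: three groups with different intercepts and true slopes 1, 2, 3
years = [1.0, 2.0, 3.0, 4.0]
panel = {
    "A": (years, [1.0 + 1.0 * t for t in years]),
    "B": (years, [1.0 + 2.0 * t for t in years]),
    "C": (years, [0.0 + 3.0 * t for t in years]),
}
mg_estimate, group_slopes = mean_group(panel)
print(group_slopes, mg_estimate)
```

Each group keeps its own intercept and slope; only the final averaging step pools information across groups.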
average LR effects
Verdict on initial four approaches
Separate time series regressions
– OK with large T
• Normal time series analysis
– Results for individual panel members (groups) unreliable
and can be difficult to interpret
• Unless T is large
– But panel averages establish a reliable mean estimate (Eberhardt, 2013, p.442)
• Basis of Mean Group approach
FE and RE
– Estimates biased and inconsistent
Aggregation over countries
– Estimates inconsistent
Aggregation over time
– OK: estimates consistent
• With large T (>25)
– Maybe 15 or less?
• If T small, report that results
“valid approximately”
– Be cautious about conclusions!
Implementing the Group-Mean estimator
• Heterogeneous dynamic model
$y_{it} = \beta_i x_{it} + \lambda_i y_{i,t-1} + \varepsilon_{it}$ (2.1)
• From Pesaran and Smith (1995), p.87
Mean group estimation of
time-series panel models
• Parameters vary across groups
– In “large T panels”, parameters cannot be estimated in a
model that imposes cross-group homogeneity
• Unless the slope coefficients are truly identical across groups
• But slope heterogeneity is pervasive
• “Traditional panel analysis” (“small T”)
– Pool the time series
• Assume parameter homogeneity
– Efficiency gain
Use this result for Equation 5
The difference between equations (5) and (8) is that in (8) “xt can be treated as
strictly exogenous” (Pesaran, 1997, p.183), even if . In this case, the
long-run relationship between yt and xt is given by:
Estimator 2: Panel Dynamic
Ordinary Least Squares (PDOLS)
• Model: cointegrating regression
– $Y_{it} = \alpha_i + \beta_i x_{it} + u_{it}$
• Cointegration tests of long-run hypotheses using aggregate panel data
– Extension of the individual time-series dynamic OLS
• Not clear why “dynamic” (does not estimate error-correction)
– Single-equation estimator of the cointegrating vector
– Applied to nonstationary data & cointegrated variables
• “Reasonably large” N and T (Pedroni, 2001: N=20, T=246 monthly observations)
• Regression on each panel group
– $Y_{it} = \alpha_i + \beta_i x_{it} + \sum_{j=-p}^{p} \gamma_{i,j}\, \Delta x_{i,t-j} + u^*_{it}$
• where j = −p, …, p and p = 1, 2, …, P
– Cointegrating regression augmented with lead and lagged differences
of the regressor(s) to control for the endogenous feedback effect
• Output
– βi coefficients and associated t-statistics averaged over the entire panel
– Individual group estimates of βi coefficients and associated t-statistics
Features
• Allows for greater flexibility in the presence of heterogeneity
of the cointegrating vectors
• Point estimates from the between-dimension estimator interpreted as the
mean value for the cointegrating vectors
• Test statistics constructed from the between-dimension estimators test the
null hypothesis H0: βi = β0 for all i
against the alternative hypothesis HA: βi ≠ β0
• Values for βi are not constrained to be the same under HA
– Important advantage over test statistics constructed from
the within-dimension estimators
• Averaging across groups to create a single time series
βi the same under both H0 & HA
– Much lower small-sample size distortion
than the within-dimension estimators
• PDOLS does not take account of cross-group error correlation
– But by default does estimate with common time dummies
Examples
• Using Stata
– Download: xtpedroni
• Command Window
– help xtpedroni
– xtset country time
• Example 1: use pedronidata
– xtpedroni logexrate logratio, notest lags(5) mlags(5) b(1)
– xtpedroni logexrate logratio, notest lags(5) mlags(5) b(1) notdum
• Without time dummies
• b(1) – compute all t-stats against H0 slope coefficient=1 (Default: b(0))
– Appropriate for long-run PPP
– xtpedroni logexrate logratio, full notest lags(4) mlags(4) b(1) notdum
• Results for each country (no common time dummies)
• Example 2: use manu_prod (Eberhardt’s data)
– xtpedroni loghex loggdp, trend lagselect(aic)
• lags – default is 2
• mlags – default (chosen automatically for each individual)
Estimators 3 and 4: Mean Group (MG) and
Pooled Mean group (PMG)
• For large N and large T panels
– Large N, small T panels
• Individual groups pooled
• FE or RE or RE+IV estimators (e.g. difference and system GMM)
– Only the intercepts vary across groups
– Large N, Large T
• T large enough to fit the model to each group separately
• Assumption of slope homogeneity not appropriate
• Nonstationarity also a concern
• Hence the need for models to estimate nonstationary,
heterogeneous dynamic panels
– MG
• Estimates N time-series regressions and averages the coefficients
– PMG
• Combination of averaging and pooling
Procedure
• Start with an ARDL model
– p lagged values of the dependent variable (starting from 1)
– q current and lagged values of the independent variable(s) (starting from 0)
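The ARDL model and its error-correction form were lost from this slide in extraction. The following sketch uses assumed notation ($a_{ij}$, $b_{ij}$, $\lambda_{ij}$, $\mu_i$ are labels chosen here; $\phi_i$, $\theta_i$ and $\delta_{ij}$ are chosen to match the interpretation given on the next slide) and follows the usual presentation of the PMG setup rather than reproducing the original slide.

```latex
\begin{align*}
\text{ARDL}(p,q):\quad
y_{it} &= \sum_{j=1}^{p} a_{ij}\, y_{i,t-j}
        + \sum_{j=0}^{q} b_{ij}'\, X_{i,t-j} + \mu_i + \varepsilon_{it} \\[4pt]
\text{ECM:}\quad
\Delta y_{it} &= \phi_i \left( y_{i,t-1} - \theta_i' X_{it} \right)
        + \sum_{j=1}^{p-1} \lambda_{ij}\, \Delta y_{i,t-j}
        + \sum_{j=0}^{q-1} \delta_{ij}'\, \Delta X_{i,t-j}
        + \mu_i + \varepsilon_{it} \\[4pt]
\phi_i &= -\Bigl( 1 - \sum_{j=1}^{p} a_{ij} \Bigr), \qquad
\theta_i = \frac{\sum_{j=0}^{q} b_{ij}}{1 - \sum_{k=1}^{p} a_{ik}} \\[4pt]
\text{ARDL}(1,1)\text{ check:}\quad
y_{it} &= a_i y_{i,t-1} + b_{i0} x_{it} + b_{i1} x_{i,t-1} + \mu_i + \varepsilon_{it} \\
\Delta y_{it} &= -(1 - a_i) \left( y_{i,t-1} - \frac{b_{i0} + b_{i1}}{1 - a_i}\, x_{it} \right)
        - b_{i1}\, \Delta x_{it} + \mu_i + \varepsilon_{it}
\end{align*}
```

The ARDL(1,1) case shows the reparameterisation is purely algebraic: no restriction is imposed in moving from the levels form to the error-correction form.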
Interpretation
• All terms in the ECM are stationary
– No need for unit root tests in ARDL approach
• $\theta'_i$ is the vector of coefficients that define the long-run
relationship between the variables
• $\phi_i$ is the error-correcting speed of adjustment
– $\phi_i$ of the form −(1 − persistence), i.e. the greater the persistence, the slower the adjustment
– If $\phi_i = 0$, then there is no evidence of a long-run relationship
– If $\phi_i < 0$, and statistically significant, then there is negative feedback,
which means that the LR relationship is stable
• i.e. the dependent variable returns to LR equilibrium over time
• $\delta'_{ij}$ is a vector of coefficients on the differenced values of the
independent variable(s)
– Defines the short-run (impact) relationship(s)
between the differenced dependent variable
and the differenced independent variable(s)
Three approaches to estimation
• Two extremes
1. Dynamic LSDV estimator (i.e. Dynamic FE)
• Time-series data pooled across groups
• Only the intercepts are allowed to vary by group
• Unless the slopes are identical, estimates are inconsistent and potentially misleading
2. Mean Group estimator
• Fitted separately for each group
• Mean of the estimated coefficients calculated
• Intercepts, slope coefficients, and error variances
are all allowed to differ across groups
• Compromise: Pooled Mean Group estimator
– Combines both pooling and averaging
– Intercepts, short-run coefficients and error variances
all allowed to differ across the groups
• As in MG estimation
– Long-run coefficients constrained to be equal across groups
• As in dynamic FE estimation
• Exploiting similarity between groups to gain efficiency over MG
– Fewer parameters to estimate than in MG
Key assumption
• Short-run parameter heterogeneity
– Unique features of cross-section groups – e.g.
countries – likely to bring about short-run differences
in response of dependent variable to changes in
independent variables
• Long-run parameter homogeneity
– Common features more likely to yield similar long-run
responses, especially if the groups are similar
• e.g. OECD countries
– To check this assumption: Hausman test
• If the true model is heterogeneous
– MG: always consistent
» But not most efficient if LR parameters homogeneous
– PMG: inconsistent
PMG: advantages and disadvantages
• OK in an unbalanced panel
• Even one with gaps
• ARDL platform
– In I(1) panels allows for mix of cointegration ($\phi_i < 0$)
and noncointegration ($\phi_i = 0$)
– Potential endogeneity of X variable(s) addressed
• Assumption of a common long-run equilibrium relationship
is appealing for small sets of arguably ‘similar’ groups
rather than large diverse macro panels
• Error term $\varepsilon_{it}$
– Assumed to be identically distributed across groups and time
and uncorrelated with the regressors
• Can use cluster-robust SEs with DFE
– But does not address cross-group error correlation
Example
• Using Eberhardt’s data:
– use manu_prod
• Model will not converge with a full set
(minus one) of period dummies
• Estimate with time trend to take care of unmodelled growth
process(es)
– gen trend = year - 1970
– xtpmg d.ly d.lk, lr(l.ly lk trend) ec(ec) replace pmg
– estimates store pmg
– xtpmg d.ly d.lk, lr(l.ly lk trend) ec(ec) replace mg
– estimates store mg
– hausman mg pmg, sigmamore
• Do not reject H0: mg and pmg estimates not systematically different
– Assumption of long-run parameter homogeneity not rejected
• pmg preferred
• Inclusion of the time trend in the CV brings pmg estimate
of the capital effect closer to ccemg estimate
– But not inclusion only of a constant in the CV
• Can add lags of d.ly and d.lk
– Do not make much difference in this case
Estimators 5 and 6: Common Correlated Effects
Mean Group and Augmented Mean Group
• “2nd generation” models (Eberhardt)
– Allow for unobserved correlation across panel members (groups)
• Model
– One covariate and one unobserved common factor
• But generalises to multiple covariates
and multiple unobserved common factors
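The model equations were lost from this slide in extraction. A sketch of the one-covariate, one-factor setup is given below; the equation numbers are assigned to match the references to (1), (2) and (3) on the following slides, and the symbols $\alpha_{1i}$, $\alpha_{2i}$, $\gamma_i$, $g_t$ and $e_{it}$ are assumptions following the common presentation of this model.

```latex
\begin{align}
y_{it} &= \beta_i x_{it} + u_{it} \tag{1} \\
u_{it} &= \alpha_{1i} + \lambda_i f_t + \varepsilon_{it} \tag{2} \\
x_{it} &= \alpha_{2i} + \lambda_i f_t + \gamma_i g_t + e_{it} \tag{3}
\end{align}
```

Here $f_t$ is the unobserved common factor with group-specific loadings $\lambda_i$. Because $f_t$ enters both the error in (2) and the regressor in (3), it makes $x_{it}$ correlated with $u_{it}$ and induces correlation across groups.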
CCEMG – implementation
• CCEMG estimator allows for the set up in (1), (2) and (3), which induces
1. cross-section dependence
2. time-variant unobservables with heterogeneous impact across groups
3. Identification problems
• βi cannot be identified if xit contains ft
• CCEMG Solution (Pesaran, 2006)
– Augment the group-specific regression equation (1) with cross-section averages of
both the dependent and independent variables
• Using the data for the entire panel
• y-bar and x-bar (cross-section averages in each period) added as additional regressors
in each of the N regression equations
– Step 2 is the usual MG averaging across panel members
• Estimated coefficients on the cross-section average variables
– Not interpretable in any meaningful way
– Present to “blend out” the biasing impact of the unobservable common factors
• Both “weak” factors (e.g. local spillover effects) &“strong” factors (e.g. global shocks)
• CCEMG robust to nonstationary common factors
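The two CCEMG steps described above (augment each group regression with cross-section averages, then average across groups) can be illustrated with a small simulation. This is a hedged sketch, not the xtmg implementation: the data-generating process, parameter values and function names are all assumptions, and y is deliberately generated without idiosyncratic noise so that the effect of the augmentation is easy to see.

```python
import random


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def mg_and_ccemg(x, y):
    """Return (MG, CCEMG) slope estimates; x and y are N lists of T values."""
    n, periods = len(x), len(x[0])
    # Step 1a: cross-section averages of y and x in each period
    xbar = [sum(x[i][t] for i in range(n)) / n for t in range(periods)]
    ybar = [sum(y[i][t] for i in range(n)) / n for t in range(periods)]
    mg, cce = [], []
    for i in range(n):
        plain = [[1.0, x[i][t]] for t in range(periods)]
        # Step 1b: group regression augmented with the averages
        augmented = [[1.0, x[i][t], ybar[t], xbar[t]] for t in range(periods)]
        mg.append(ols(plain, y[i])[1])
        cce.append(ols(augmented, y[i])[1])
    # Step 2: average the group-specific slopes on x
    return sum(mg) / n, sum(cce) / n


def simulate(n=20, periods=30, beta=0.5, seed=1):
    """Assumed DGP: one common factor f_t drives both x and y."""
    rng = random.Random(seed)
    f = [rng.gauss(0, 1) for _ in range(periods)]
    x, y = [], []
    for _ in range(n):
        loading_y, loading_x = rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5)
        xi = [loading_x * ft + rng.gauss(0, 0.5) for ft in f]
        yi = [beta * xt + loading_y * ft for xt, ft in zip(xi, f)]  # no noise in y
        x.append(xi)
        y.append(yi)
    return x, y


x, y = simulate()
mg_est, cce_est = mg_and_ccemg(x, y)
print("true beta 0.5 | MG:", round(mg_est, 3), "| CCEMG:", round(cce_est, 3))
```

In this idealised design the common factor is an exact linear combination of the two cross-section averages, so the augmented regressions recover the true slope, while the plain MG regressions absorb the factor into the coefficient on x. With idiosyncratic noise added to y the recovery would be approximate and would rely on the cross-section dimension being large.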
Importance of the cross-section
dimension
• … as the cross-section dimension becomes large, the
unobserved common factor ft
can be captured by a combination of
cross-sectional averages of y and x
– Nicely explained in Eberhardt et al., 2013, p.443
• Parameters on the cross-sectional averages
and intercepts must be group specific
– Allows for both
1. Common factors (shocks, spillovers, etc) (ft)
2. Group-specific reactions to the common factors (λi)
– See equations (2) and (3)
– Achieved in Mean Group estimators by construction
• Each group estimated separately
AMG
• Alternative to the Pesaran (2006) CCEMG
– Similar performance with respect to bias and RMSE
in panels with nonstationary variables
and multifactor error terms
• Both cointegrated and non-cointegrated
• Main difference is the intended application to
cross-country production functions
– CCEMG
• Unobservable common factor ft treated as a “nuisance”
– To be accounted for but of no interest
– AMG
• In cross-country production functions, unobservables represent total
factor productivity (TFP), a variable of interest
Other features of CCE estimators
• Focus of CCEMG
– Consistent estimates of the parameters on the observed variable(s) (xit)
– i.e. the mean of the heterogeneous βi
• long-run coefficient(s) - see Equation 1
• Disadvantages
– Not informative about adjustment or short-run dynamics
– Does not address endogeneity of regressor(s)
• Except for endogeneity arising from common factors
• Does not address endogeneity of types common in macro panels
– E.g. simultaneity/feedback effects
• Advantages
– “ … can accommodate a fixed number of strong common factors and an
infinite number of weak common factors … where the former can be thought
of as common global shocks and the latter as local or regional spillover
effects.”
– “… remarkably robust to structural breaks, lack of cointegration, and certain
serial correlation.”
• Eberhardt et al., 2013, p.443
Example
• Command Window: help xtmg
• Using Eberhardt’s data:
– use manu_prod
• N=48, T=33
• Command window
– xtset the data
• Cross-country productivity analysis
– ly and lk both I(1)
• Theory:
– Capital per worker coefficient should be around 1/3
• Compare
– MG (default): xtmg ly lk, trend res(eMG)
– CCEMG: xtmg ly lk, cce trend res(eCMGt)
• Trend option
– Each group-specific regression augmented with a linear trend term
• Takes account of unmodelled growth process(es)
• Can test MG and CCEMG residuals for cross-group correlation
– But not in this case
• xtcd does not work when there are gaps in the panel
Choice of estimator: Summary
First check suitability for the dimensions of the data
So, which estimator?
• Lack of systematic comparisons
• One useful study: Banerjee et al. (2010)
– Monte Carlo simulations
• Findings
1. Simultaneity a much larger source of bias than cross-group dependence
2. As yet, no CCE-IV estimator
• Research ongoing
3. Common factors can - at least partially - be addressed by time dummies
• In the observable part of the model rather than
very complicated unobservable effects in the residuals
• Implications (implicit rather than the authors’ own conclusions)
– CCEMG may not always be preferable to PMG
– Maybe use PMG and/or MG augmented with year effects
• If a full set of year DVs infeasible, try period DVs (e.g. before/after GFC)
• No universally preferred estimator
• Recommendation
1. Consider the nature of the problem to be investigated
and match with the most appropriate estimator
2. Robustness check
• Are your results an artefact of your chosen estimator?
Using diagnostic tests to decide?
• Econometric models exist on two levels
– As statistical models
– As economic models
• Model has to be valid on both levels
• Statistical validity a necessary condition for a valid econometric model
– Example 1: Single equation time-series analysis using OLS/DOLS
• Regression residuals assumed NIID
– Normally, Identically and Independently Distributed
• Assumptions checked by standard diagnostic tests
– Serial (Auto)correlation in the residual
– Heteroskedasticity
– Autoregressive Conditional Heteroskedasticity (ARCH)
» Especially for high-frequency data
– Linearity
– Normality
» Not essential for OLS estimation, as long as iid holds
» But informative about the underlying data-generating process
– Example 2: “Micro” panel analysis (Wide N, Short T) using GMM
• Consistency depends on the validity of the instruments
• Assumptions checked by diagnostic tests
– m1/m2
– Sargan/Hansen