Applied Economics

ISSN: 0003-6846 (Print) 1466-4283 (Online) Journal homepage: http://www.tandfonline.com/loi/raec20

Dynamic panel data modelling using maximum likelihood: an alternative to Arellano-Bond

Enrique Moral-Benito, Paul Allison & Richard Williams

To cite this article: Enrique Moral-Benito, Paul Allison & Richard Williams (2018): Dynamic panel
data modelling using maximum likelihood: an alternative to Arellano-Bond, Applied Economics,
DOI: 10.1080/00036846.2018.1540854

To link to this article: https://doi.org/10.1080/00036846.2018.1540854

Published online: 03 Nov 2018.

Enrique Moral-Benito (a), Paul Allison (b) and Richard Williams (c)
(a) Banco de España, Madrid, Spain; (b) University of Pennsylvania, Philadelphia, PA, USA; (c) University of Notre Dame, Notre Dame, IN, USA

ABSTRACT
The Arellano-Bond estimator is widely used among applied researchers when estimating dynamic panels with fixed effects and predetermined regressors. This estimator might behave poorly in finite samples when the cross-section dimension of the data is small (i.e. small N), especially if the variables under analysis are persistent over time. This paper discusses a maximum likelihood estimator that is asymptotically equivalent to Arellano and Bond (1991) but presents better finite sample behaviour. The estimator is based on an alternative parametrization of the likelihood function introduced in Moral-Benito (2013). Moreover, it is easy to implement in Stata using the xtdpdml command as described in a companion paper published in the Stata Journal, which also discusses further advantages of the proposed estimator for practitioners.

KEYWORDS
Dynamic panel data; maximum likelihood estimation

JEL CLASSIFICATION
C23

I. Introduction

Panel data are very popular among applied researchers in many different fields from economics to sociology. A panel data set is one that follows a given sample of subjects over time, and thus provides multiple observations on each subject in the sample. Subjects may be workers, countries, firms, regions, etc., while the multiple observations per subject usually refer to different moments in time (e.g. years, quarters, or months). Indeed, time series and cross-sectional data can be thought of as special cases of panel data that are in one dimension only (one panel subject for the former, one time point for the latter).

Allowing for the presence of subject-specific unobserved heterogeneity represents one of the key advantages of using panel data. Having multiple observations per individual allows identifying a time-invariant component that is unobserved to the econometrician and may be correlated with other observable characteristics in the data set. For instance, in cross-country studies of economic growth, unobserved heterogeneity at the country level may be associated with cultural differences or geographical characteristics across countries (see Islam 1995).

Moreover, in a regression of y on x, panel data can accommodate feedback effects from current y to future x, so that this particular form of reverse causality can easily be accounted for by using well-known panel data techniques where the x regressors are said to be predetermined (see Chapter 8 in Arellano 2003).1 Predetermined regressors are also labelled as weakly exogenous or sequentially exogenous in the literature (Wooldridge 2010). Dynamic panels in which the regressors include the lagged dependent variable are the best example in this category. This is so because feedback from current y to future y exists by construction (see for instance Arellano and Bond 1991).

*The authors thank valuable comments by Manuel Arellano, Kristin MacDonald, an anonymous referee, and attendants to seminars held at Bank of Spain, the 2016 Spanish Stata Users Group meeting in Barcelona, and the 2015 Stata Users Conference in Columbus, Ohio. Code and data used in this article are available on the website https://www3.nd.edu/~rwilliam/dynamic/index.html which includes further materials related to the practical implementation of the estimator.

CONTACT Enrique Moral-Benito [email protected] Banco de España, Madrid, Spain


1 Intuitively, this assumption implies that only future values of the explanatory variables are affected by the current value of the dependent variable.
© 2018 Informa UK Limited, trading as Taylor & Francis Group

The panel GMM estimator discussed in Arellano and Bond (1991) is probably the most popular alternative for estimating dynamic panels with unobserved heterogeneity and predetermined regressors. To be more concrete, the typical model to be estimated is given by the traditional partial adjustment with feedback model, which is very popular among economists (see Arellano 2003, 143). The beauty of the Arellano and Bond (1991) estimator is that it relies on minimal assumptions and provides consistent estimates even in panels with few time series observations per individual (i.e. small T). However, it does require large samples in the cross-section dimension (i.e. large N) and its finite sample performance might represent a concern when the number of units in the panel is relatively small, especially if the variables under analysis are persistent (see Moral-Benito 2013).

Against this background, several alternative estimators have been proposed in the literature with the same identifying assumption. For instance, Alonso-Borrego and Arellano (1999), Ahn and Schmidt (1995), and Hansen, Heaton, and Yaron (1996) consider different GMM variants of the Arellano and Bond (1991) estimator with better finite sample performance. Also, likelihood-based approaches have been considered under similar identifying assumptions resulting in better finite sample behaviour (e.g. Hsiao, Pesaran, and Tahmiscioglu 2002; Moral-Benito 2013). A practical limitation of these alternatives is that their implementation by practitioners is far from straightforward given the requirement of certain programming capabilities as well as numerical optimization routines.

We do not include in the above category the so-called system-GMM estimator by Arellano and Bover (1995) and Blundell and Bond (1998) because it requires an additional identifying assumption for consistency. In particular, it relies on the mean stationarity assumption that has proved controversial in most empirical settings. Intuitively, this assumption requires that the variables observed in the data set come from dynamic processes that started in the distant past so that they have already reached their steady-state distribution, which is hard to motivate in panels of young workers or firms as well as country panels starting just after WWII (see Barro and Sala-i-Martin 2003). On the other hand, as pointed out by Bazzi and Clemens (2013), concern has intensified in recent years that many instrumental variables of the type considered in panel GMM estimators such as Arellano and Bond (1991) and Arellano and Bover (1995) may be invalid, weak, or both. The effects of this concern may be substantial in practice as recently illustrated by Kraay (2015).

In this paper, we discuss a maximum likelihood estimator based on the same identification assumption as Arellano and Bond (1991) so that both alternatives are asymptotically equivalent. However, we show in Section 3 that our likelihood-based alternative is strongly preferred in terms of finite sample performance, especially when the number of units in the panel (N) is small. Moreover, as illustrated in some of our simulations as well as in Williams, Allison, and Moral-Benito (forthcoming), there are situations in which the likelihood approach may be preferred to standard GMM even when N is large and the unbalancedness represents a concern.

The particular likelihood function presented in this paper is an alternative parametrization to the one presented in Moral-Benito (2013) but based on the same set of assumptions. In particular, it can be interpreted as an intermediate situation between the full covariance structure (FCS) and the simultaneous equation model (SEM) representation discussed in Moral-Benito (2013). This is so because the restrictions are enforced in the covariance matrix as in the SEM representation, but the analysis is not conditional on the initial observations as in the FCS parametrization (see also Allison 2005; Allison, Williams, and Moral-Benito 2017).

This particular likelihood is useful in practice because it can be maximized using numerical optimization techniques available in standard software packages. To be more concrete, the maximum likelihood estimator discussed in this paper is easy to implement in Stata by adapting the sem command as described in the companion paper by Williams, Allison, and Moral-Benito (forthcoming). The intuition is that period-by-period equations from the panel data model are used to form a system of equations of the type
considered in SEM models (see e.g. Bentler and Weeks 1980). Moreover, there are other software packages that can estimate this model by maximum likelihood including LISREL, EQS, Amos, Mplus, PROC CALIS (in SAS), lavaan (for R), and OpenMx (for R).

The rest of the paper is organized as follows: Section 2 describes the likelihood function. Section 3 illustrates the finite sample performance of the proposed estimator in comparison to the Arellano and Bond (1991) GMM alternative. In Section 4 we illustrate the usefulness of the estimator in the context of an empirical application investigating the effect of financial development on economic growth across countries based on Levine, Loayza, and Beck (2000). Section 5 concludes.

II. Partial adjustment with feedback

We consider the following model:

\[ y_{it} = \lambda y_{it-1} + \beta x_{it} + \alpha_i + v_{it} \tag{1} \]

\[ E\left(v_{it} \mid y_i^{t-1}, x_i^{t}, \alpha_i\right) = 0 \quad (t = 1, \ldots, T)\ (i = 1, \ldots, N) \tag{2} \]

where i indexes units in the panel (workers, countries, firms...) and t refers to time periods (decades, years, quarters...). We also define the $t \times 1$ vectors of past realizations $y_i^{t-1} = (y_{i,0}, \ldots, y_{i,t-1})'$ and $x_i^{t} = (x_{i,1}, \ldots, x_{i,t})'$. Note that β and $x_{it}$ can also be vectors including more than one predetermined regressor. In addition, we can easily include strictly exogenous regressors.

This model relaxes the strict exogeneity assumption for the x variables. The assumption in (2) allows for feedback from lagged values of y to the current value for x. Moreover, it implies lack of autocorrelation in $v_{it}$ since lagged vs are linear combinations of the variables in the conditioning set. Crucially, assumption (2) is the only assumption we impose throughout the paper.2 Indeed, this is also the only assumption required for consistency of the Arellano and Bond (1991) GMM estimator.

Time-invariant regressors can also be included, under the assumption that they are uncorrelated with the fixed effects, an advantage over the Arellano and Bond (1991) approach. Finally, in addition to the individual-specific effects $\alpha_i$, we can allow cross-sectional dependence by including a set of time dummies. However, for the sake of exposition, we focus on the specification (1) that features the main ingredients of the approach and facilitates its illustration.

The likelihood function

In the spirit of Allison (2005) and Allison, Williams, and Moral-Benito (2017), this section develops a parameterization of the model in (1)–(2) that leads to a maximum likelihood estimator that is asymptotically equivalent to the Arellano and Bond (1991) estimator augmented with the moment condition arising from lack of autocorrelation as discussed in Ahn and Schmidt (1995). Moral-Benito (2013) also considers alternative parametrizations of the same model. In particular, the restrictions implied by (2) can be placed in either the coefficient matrices or the variance-covariance matrix depending on how the system of equations is written. The parametrization considered here is useful because it can be easily implemented in practice using the sem command in Stata as described in Williams, Allison, and Moral-Benito (forthcoming). Note also that other SEM packages such as Mplus, PROC CALIS in SAS, and lavaan or OpenMx in R can also be used.

In addition to the T equations given by (1), we complete the model with an equation for $y_{i0}$ as well as T additional reduced-form equations for x:3

\[ y_{i0} = v_{i0} \tag{3} \]

\[ x_{i1} = \xi_{i1} \tag{4} \]

\[ \vdots \]

\[ x_{iT} = \xi_{iT} \tag{5} \]

In order to rewrite the system of equations given by (1) and (3)–(5) in matrix form, we define the following vectors of observed data ($R_i$) and disturbances ($U_i$):

\[ R_i = (y_{i1}, \ldots, y_{iT}, y_{i0}, x_{i1}, \ldots, x_{iT})' \tag{6} \]

\[ U_i = (\alpha_i, v_{i1}, \ldots, v_{iT}, v_{i0}, \xi_{i1}, \ldots, \xi_{iT})' \tag{7} \]

Importantly, the covariance matrix of the disturbances captures the restrictions imposed by (2) and it is given by:

\[
\mathrm{Var}(U_i) = \Sigma =
\begin{pmatrix} \Sigma_{11} & \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
=
\begin{pmatrix}
\sigma_{\alpha}^2 & & & & & & & & & \\
0 & \sigma_{v_1}^2 & & & & & & & & \\
0 & 0 & \sigma_{v_2}^2 & & & & & & & \\
\vdots & \vdots & & \ddots & & & & & & \\
0 & 0 & 0 & \cdots & \sigma_{v_T}^2 & & & & & \\
\phi_0 & 0 & 0 & \cdots & 0 & \sigma_{v_0}^2 & & & & \\
\phi_1 & 0 & 0 & \cdots & 0 & \omega_{01} & \sigma_{1}^2 & & & \\
\phi_2 & \psi_{21} & 0 & \cdots & 0 & \omega_{02} & \omega_{12} & \sigma_{2}^2 & & \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \ddots & \\
\phi_T & \psi_{T1} & \psi_{T2} & \cdots & 0 & \omega_{0T} & \omega_{1T} & \omega_{2T} & \cdots & \sigma_{T}^2
\end{pmatrix}
\tag{8}
\]

where the element $\Sigma_{21}$ captures the correlation between the fixed effects and the regressors through the ϕ parameters, and the feedback process from y to x allowing for nonzero correlations between the current vs and future ξs:

\[
\mathrm{cov}(v_{ih}, \xi_{it}) =
\begin{cases}
\psi_{th} & \text{if } h < t \\
0 & \text{otherwise}
\end{cases}
\tag{9}
\]

On the other hand, $\Sigma_{11}$ gathers the lack of autocorrelation in the v disturbances and the fixed effects $\alpha_i$, and $\Sigma_{22}$ gathers all of the contemporaneous and dynamic relationships between the x variables. In contrast to the standard Arellano and Bond (1991) approach, we can accommodate time-varying error variances in $\Sigma_{11}$.

Note that the covariance matrix of the joint distribution of the initial observations $(y_{i0}, x_{i1})$ and the individual effects $\alpha_i$ is unrestricted with the corresponding covariances captured through the parameters $\phi_0$, $\phi_1$ and $\omega_{01}$. This is in sharp contrast with the mean stationarity assumption required by the so-called system-GMM estimator discussed in Arellano and Bover (1995) and Blundell and Bond (1998).

We next define the following matrices of coefficients:

\[
B = \left( \begin{array}{cccccccccc}
1 & 0 & 0 & \cdots & 0 & -\lambda & -\beta & 0 & \cdots & 0 \\
-\lambda & 1 & 0 & \cdots & 0 & 0 & 0 & -\beta & \cdots & 0 \\
0 & -\lambda & 1 & \cdots & 0 & 0 & \vdots & & \ddots & \vdots \\
\vdots & & \ddots & \ddots & \vdots & \vdots & & & & \\
0 & \cdots & 0 & -\lambda & 1 & 0 & 0 & \cdots & 0 & -\beta \\
\multicolumn{5}{c}{0} & \multicolumn{5}{c}{I_{T+1}}
\end{array} \right)
\]

\[ D = \begin{pmatrix} d & I_{2T+1} \end{pmatrix} \]

where $d = (1, \ldots, 1, 0, \ldots, 0)'$ is a column vector with T ones and T + 1 zeros.

We can now write Equations (1) and (3)–(5) in matrix form:

\[ B R_i = D U_i \tag{10} \]

Thus, assuming normality, the joint distribution of $R_i$ is:

\[ R_i \sim N\left(0,\; B^{-1} D \Sigma D' B'^{-1}\right) \tag{11} \]

with resulting log-likelihood:

\[
L \propto -\frac{N}{2} \log \det\left(B^{-1} D \Sigma D' B'^{-1}\right)
- \frac{1}{2} \sum_{i=1}^{N} R_i' \left(B^{-1} D \Sigma D' B'^{-1}\right)^{-1} R_i
\tag{12}
\]

As shown by Moral-Benito (2013), the maximizer of L is asymptotically equivalent to the Arellano and Bond (1991) GMM estimator4 regardless of non-normality. In Appendix A we illustrate, for the case of T = 3, that the number of over-identifying restrictions is the same in both cases. Also, it is worth highlighting that likelihood ratio tests of the model's over-identifying restrictions can be used to test these and other hypotheses of interest.

2 Although we derive the log likelihood under normality, it is important to remark that the resulting estimator is consistent and asymptotically normal regardless of non-normality.
3 Needless to say, additional x predetermined regressors can be included as well as other exogenous covariates. We only discuss this canonical specification for the sake of notation simplicity.
4 To be more concrete, the asymptotic equivalence is only guaranteed if we augment the Arellano and Bond (1991) estimator with moments resulting from lack of autocorrelation in the errors as discussed by Ahn and Schmidt (1995).
5 Note that the GMM approach in Arellano and Bond (1991) can easily handle unbalanced panels by using all information available.

The likelihood function in Equation (12) is derived for balanced panels, i.e., panels in which there are non-missing values for all variables and all individuals at all time periods.5 However, unbalanced panels are very common in practice. The simplest approach for considering the ML
estimator in unbalanced panels is based on the so-called listwise deletion, which eliminates those individuals that have missing values in any of the variables included in the model. This alternative may perform poorly under heavily unbalanced data because the cross-section dimension (N) is drastically reduced, generating convergence failures of the likelihood maximization procedure.

Alternatively, we consider the FIML approach discussed in Arbuckle (1996) in order to implement our ML estimator under unbalanced panels. This approach computes individual-specific contributions to the likelihood function using only those time periods that are observed for each individual. Then, the likelihood function to be maximized is computed by accumulating all the individual-specific likelihoods. This alternative has been shown to perform much better than listwise deletion in cross-sectional settings (see Enders and Bandalos 2001). Indeed, in Section 3 below, we illustrate that the method performs relatively well when working with unbalanced panels using the FIML approach.

III. Simulation results

In this section, we explore the finite sample behaviour of the likelihood-based estimator discussed in this paper in comparison with the Arellano and Bond (1991) GMM estimator.6 For this purpose, we consider the simulation setting in Bun and Kiviet (2006) also considered by Moral-Benito (2013).

To be more concrete, the data for the dependent variable y and the explanatory variable x are generated according to:

\[ y_{it} = \lambda y_{it-1} + \beta x_{it} + \alpha_i + v_{it} \tag{13} \]

\[ x_{it} = \rho x_{it-1} + \phi y_{it-1} + \pi \alpha_i + \xi_{it} \tag{14} \]

where $v_{it}$, $\xi_{it}$ and $\alpha_i$ are generated as $v_{it} \sim i.i.d.(0, 1)$, $\xi_{it} \sim i.i.d.(0, 6.58)$ and $\alpha_i \sim i.i.d.(0, 2.96)$. The parameter ϕ in (14) captures the feedback from the lagged dependent variable to the regressor. This particular data-generating process (DGP) corresponds to scheme 2 in Bun and Kiviet (2006), which is more realistic than their baseline scheme 1, considered for convenience in the evaluation of their analytical results. With respect to the parameter values, we follow the baseline Design 5 in Moral-Benito (2013) and we fix λ = 0.75, β = 0.25, ρ = 0.5, ϕ = 0.17, and π = 0.67. This configuration allows for fixed effects correlated with the regressor as well as feedback from y to x. Bun and Kiviet (2006) provide more details about this particular DGP.

The finite sample performance of the likelihood-based estimator discussed in this paper is compared with the widely used Arellano and Bond (1991) GMM estimator. Our main motivation is to illustrate the potential gains in terms of finite sample biases of using our maximum likelihood estimator as an alternative to the Arellano and Bond (1991) approach.

Table 1 presents the simulation results. Columns (1) and (2) illustrate that our maximum likelihood estimator (henceforth ML) presents much lower biases when estimating λ than the Arellano and Bond (1991) estimator (henceforth AB) as long as N is small. In the case T = 4, the ML bias is negligible

Table 1. Simulation results.

                    Bias λ            Bias β            iqr λ             iqr β             RMSE λ            RMSE β
Sample size         (1) AB   (2) ML   (3) AB   (4) ML   (5) AB   (6) ML   (7) AB   (8) ML   (9) AB   (10) ML  (11) AB  (12) ML
N = 100, T = 4      −0.220   −0.009   −0.087   0.001    0.389    0.203    0.169    0.115    0.375    0.159    0.158    0.090
N = 200, T = 4      −0.138   −0.002   −0.054   0.002    0.312    0.167    0.135    0.088    0.281    0.131    0.119    0.069
N = 500, T = 4      −0.069   0.009    −0.027   0.005    0.226    0.130    0.098    0.061    0.190    0.103    0.081    0.049
N = 1000, T = 4     −0.037   0.010    −0.015   0.007    0.170    0.116    0.074    0.052    0.138    0.093    0.059    0.042
N = 5000, T = 4     −0.007   0.008    −0.003   0.004    0.077    0.069    0.033    0.029    0.061    0.055    0.026    0.024
N = 100, T = 8      −0.069   0.012    −0.014   0.004    0.081    0.091    0.032    0.037    0.094    0.073    0.029    0.029
N = 100, T = 12     −0.041   0.003    −0.004   0.001    0.045    0.050    0.020    0.023    0.054    0.039    0.016    0.017
N = 5000, T = 12    0.001    0.000    0.000    0.000    0.006    0.005    0.003    0.003    0.005    0.004    0.002    0.002

AB refers to the Arellano and Bond (1991) GMM estimator; ML refers to the maximum likelihood estimator discussed in Section 2.1; True parameter values are λ = 0.75 and β = 0.25; Bias refers to the median estimation errors $\hat{\lambda} - \lambda$ and $\hat{\beta} - \beta$; iqr is the 75th-25th interquartile range; RMSE is the root mean square error; results are based on 1,000 replications.
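
The three summary measures defined in the notes to Table 1 can be computed from a set of replication estimates roughly as follows. This is a minimal Python sketch rather than the code used for the paper: the function name `summarize`, the five example estimates, and the quartile convention (the default of `statistics.quantiles`) are assumptions made for illustration.

```python
import math
import statistics


def summarize(estimates, true_value):
    """Return (median bias, interquartile range, RMSE) for a list of estimates."""
    errors = [e - true_value for e in estimates]
    # Bias is reported as the median (not the mean) estimation error.
    bias = statistics.median(errors)
    # 75th minus 25th percentile; the quartile convention is an assumption.
    q1, _, q3 = statistics.quantiles(estimates, n=4)
    iqr = q3 - q1
    # Root mean square error around the true parameter value.
    rmse = math.sqrt(sum(err * err for err in errors) / len(errors))
    return bias, iqr, rmse


# Hypothetical estimates of lambda from five replications (illustrative only).
lam_hat = [0.70, 0.74, 0.75, 0.78, 0.83]
bias, iqr, rmse = summarize(lam_hat, true_value=0.75)
```

In an actual exercise `lam_hat` would hold one estimate per replication (1,000 in the tables), and the same function would be applied separately to the AB and ML estimates of each parameter.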

6 We use the xtdpdml Stata command for the maximum likelihood estimator and the xtdpd Stata command for the Arellano and Bond (1991) GMM estimator.

even with N = 100 while the AB bias is non-negligible (around 5%) even with N = 1,000. Turning to β in columns (3) and (4), the same pattern arises with a bias above 7% in the AB estimator when N = 1,000. This result points to a significantly better finite sample performance of the ML estimator when the cross-section dimension is small. Not surprisingly, the performance of the AB estimator improves as N increases; therefore, when working with sample sizes around 5,000 individuals or more, the gains from using the ML estimator are relatively minor. The bottom rows of Table 1 investigate the effect of increasing T, the time series dimension of the panel, when N is small. Overall, the performance of the AB estimator improves as T increases while that of the ML estimator remains virtually unaffected. In any case, as long as N is small (e.g. N = 100), the ML estimator appears to be preferred to the AB alternative in terms of finite sample biases.

With respect to efficiency, the ML estimator presents lower interquartile ranges for all sample sizes when T = 4 as shown in columns (5)–(8). Indeed, the ML estimator is asymptotically efficient under normality as N → ∞. Only in some cases, when T increases for fixed N, are the ML iqrs slightly larger than those of AB (see the rows N = 100, T = 8 and N = 100, T = 12). However, when looking at the root mean square errors (RMSE) in columns (9)–(12), ML always presents lower RMSEs than AB for λ, and virtually equal ones for β as T increases.

Finally, when both N and T are relatively large (N = 5000, T = 12) as in the last row of Table 1, AB and ML perform similarly with negligible biases and low interquartile ranges in both cases.

Table 2 considers alternative DGPs in which the persistence of the dependent variable is larger than in the baseline design (i.e. λ is closer to 1). Under these circumstances, the AB biases are expected to increase as instruments become weaker (Bond, Hoeffler, and Temple 2001). Indeed, columns (1) and (3) confirm this pattern for both λ and β. In the case of ML in columns (2) and (4), biases are also larger as λ increases but the magnitude of these biases is substantially smaller than that of AB. Turning to efficiency, iqrs tend to increase with λ in columns (5) and (7) for AB, but remain similar or even lower in the case of ML as reported in columns (6) and (8). Finally, RMSEs in columns (9)–(12) summarize these findings pointing to significantly lower RMSEs for the ML estimator. Indeed, the RMSEs of the ML estimator relative to those of the AB estimator are reduced as λ increases: the RMSE for the AB estimator is two times larger than that of ML when λ = 0.75 and four times larger when λ = 0.99.

Table 3 explores the performance of our ML estimator when working with unbalanced panels, which are very common in practice. In particular, we consider samples with different degrees of unbalancedness satisfying the missing at random (MAR) assumption.7 First, we compute a 'probability of missing observation' $P^m_{it}$ that depends on x as follows: $P^m_{it} = \Lambda(0.5 x_{it} + \varsigma_{it})$ where $\varsigma_{it} \sim N(0, 1)$. Second, both y and x are replaced by missing values for those observations below the 1st, 5th and 10th percentiles of the $P^m_{it}$ distribution. Therefore, we explore the performance of the estimators depending on the severity of the unbalancedness.
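
The two-step missingness mechanism just described can be sketched in Python as follows. This is only an illustration under stated assumptions: Λ is read as the logistic cdf, the percentile is taken over the pooled distribution of all unit-period cells, and the names `logistic` and `mar_mask` as well as the placeholder regressor are hypothetical.

```python
import math
import random


def logistic(z):
    """Logistic cdf, an assumed reading of the Lambda function in the text."""
    return 1.0 / (1.0 + math.exp(-z))


def mar_mask(x, share, rng):
    """Flag the cells whose P^m lies in the lowest `share` of its distribution.

    x is a list of N lists of length T holding the regressor. The result has
    the same shape; True marks cells where both y and x would be set missing.
    """
    # Step 1: P^m_it = Lambda(0.5 * x_it + noise), with standard normal noise.
    p = [[logistic(0.5 * x_it + rng.gauss(0.0, 1.0)) for x_it in row] for row in x]
    # Step 2: find the cutoff below which the requested share of cells falls.
    flat = sorted(v for row in p for v in row)
    n_missing = round(share * len(flat))
    if n_missing == 0:
        return [[False] * len(row) for row in p]
    cutoff = flat[n_missing - 1]
    return [[v <= cutoff for v in row] for row in p]


rng = random.Random(12345)
# Placeholder regressor for N = 200 units and T = 4 periods (illustrative only).
x = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
mask = mar_mask(x, share=0.10, rng=rng)
n_flagged = sum(sum(row) for row in mask)
```

Because missingness depends on the observed x and on independent noise only, the resulting pattern is consistent with MAR; with `share` set to 0.01, 0.05 or 0.10 the same function would give the three degrees of unbalancedness considered.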

Table 2. Simulation results for different values of λ.

                    Bias λ            Bias β            iqr λ             iqr β             RMSE λ            RMSE β
True λ              (1) AB   (2) ML   (3) AB   (4) ML   (5) AB   (6) ML   (7) AB   (8) ML   (9) AB   (10) ML  (11) AB  (12) ML
λ = 0.75            −0.138   −0.002   −0.054   0.002    0.312    0.167    0.135    0.088    0.281    0.131    0.119    0.069
λ = 0.80            −0.169   −0.010   −0.067   −0.002   0.339    0.161    0.147    0.087    0.315    0.127    0.133    0.068
λ = 0.85            −0.208   −0.015   −0.083   −0.002   0.373    0.152    0.162    0.087    0.358    0.120    0.152    0.068
λ = 0.90            −0.252   −0.029   −0.103   −0.008   0.413    0.150    0.181    0.086    0.409    0.120    0.175    0.068
λ = 0.95            −0.300   −0.039   −0.125   −0.013   0.455    0.146    0.201    0.086    0.466    0.121    0.200    0.068
λ = 0.99            −0.335   −0.048   −0.142   −0.018   0.478    0.142    0.211    0.086    0.503    0.121    0.218    0.069
λ = 1.00            −0.343   −0.052   −0.146   −0.020   0.481    0.142    0.213    0.086    0.509    0.123    0.221    0.070

AB refers to the Arellano and Bond (1991) GMM estimator; ML refers to the maximum likelihood estimator discussed in Section 2.1; Sample size is N = 200 and T = 4 in all cases; Bias refers to the median estimation errors $\hat{\lambda} - \lambda$ and $\hat{\beta} - \beta$; iqr is the 75th-25th interquartile range; RMSE is the root mean square error; results are based on 1,000 replications.
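
The data-generating process in equations (13)–(14), which underlies Tables 1 and 2, can be simulated along the following lines. This is a hedged Python sketch and not the authors' simulation code: it assumes normal draws, reads the second argument of i.i.d.(0, ·) as a variance, and starts each unit at zero with a burn-in of 50 periods, none of which is stated in the text. The function name `simulate_panel` is hypothetical, and the default parameter values are those printed in Section III.

```python
import math
import random


def simulate_panel(n, t, lam=0.75, beta=0.25, rho=0.5, phi=0.17, pi=0.67,
                   burn_in=50, seed=0):
    """Draw one panel from equations (13)-(14).

    Returns (y, x): y[i] holds y_i0, ..., y_iT (T + 1 values) and x[i] holds
    x_i1, ..., x_iT (T values).
    """
    rng = random.Random(seed)
    # Standard deviations implied by the variances 2.96, 1 and 6.58 (assumed).
    sd_alpha, sd_v, sd_xi = math.sqrt(2.96), 1.0, math.sqrt(6.58)
    y_all, x_all = [], []
    for _ in range(n):
        alpha = rng.gauss(0.0, sd_alpha)
        y_prev, x_prev = 0.0, 0.0  # assumed start-up values before burn-in
        ys, xs = [], []
        for s in range(burn_in + t + 1):
            # Equation (14): x responds to lagged y (feedback) and to alpha.
            x_cur = rho * x_prev + phi * y_prev + pi * alpha + rng.gauss(0.0, sd_xi)
            # Equation (13): partial adjustment with feedback.
            y_cur = lam * y_prev + beta * x_cur + alpha + rng.gauss(0.0, sd_v)
            if s >= burn_in:
                ys.append(y_cur)
                xs.append(x_cur)
            y_prev, x_prev = y_cur, x_cur
        y_all.append(ys)      # y_i0, ..., y_iT
        x_all.append(xs[1:])  # drop x_i0 so that x_i1, ..., x_iT remain
    return y_all, x_all


y, x = simulate_panel(n=200, t=4)
```

Changing `lam` from 0.75 towards 1 would mimic the more persistent designs in Table 2; each replication would use a different `seed`.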

7 Under MAR, the probability that an observation is missing on the variable y can depend on another observed variable x. This condition is thus less restrictive than the missing completely at random (MCAR) assumption that requires missing values on y to be independent of other observed variables x as well as the values of y itself.

Table 3. Simulation results under unbalanced panels.

                    Bias λ            Bias β            iqr λ             iqr β
Unbalancedness      (1) AB   (2) ML   (3) AB   (4) ML   (5) AB   (6) ML   (7) AB   (8) ML
PANEL A: N = 200, T = 4
1%                  −0.171   −0.005   −0.063   0.006    0.336    0.212    0.134    0.099
5%                  −0.218   −0.004   −0.082   0.000    0.381    0.212    0.153    0.091
10%                 −0.268   0.005    −0.111   0.003    0.381    0.222    0.154    0.100
PANEL B: N = 500, T = 4
1%                  −0.090   −0.003   −0.035   −0.003   0.235    0.160    0.100    0.071
5%                  −0.122   0.009    −0.051   0.005    0.282    0.155    0.114    0.070
10%                 −0.163   0.016    −0.065   0.005    0.307    0.175    0.125    0.074
PANEL C: N = 200, T = 8
1%                  −0.049   0.004    −0.009   0.004    0.067    0.067    0.027    0.029
5%                  −0.072   0.015    −0.015   0.010    0.081    0.083    0.032    0.034
10%                 −0.104   0.020    −0.027   0.014    0.099    0.087    0.042    0.036
PANEL D: N = 500, T = 8
1%                  −0.021   0.006    −0.004   0.003    0.043    0.037    0.018    0.017
5%                  −0.035   0.014    −0.008   0.007    0.053    0.043    0.021    0.018
10%                 −0.054   0.022    −0.015   0.011    0.063    0.048    0.026    0.019

AB refers to the Arellano and Bond (1991) GMM estimator; ML refers to the maximum likelihood estimator discussed in Section 2.1 implemented based on the FIML approach as described in the main text (i.e. fiml option in the xtdpdml Stata command); True parameter values are λ = 0.75 and β = 0.25; Bias refers to the median estimation errors $\hat{\lambda} - \lambda$ and $\hat{\beta} - \beta$; iqr is the 75th-25th interquartile range; results are based on 1,000 replications; unbalancedness refers to the share of observations with missing value according to the missing at random (MAR) assumption.
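
To see why listwise deletion can shrink the effective cross-section so sharply while FIML does not, it helps to count what each approach retains. The Python sketch below is only a bookkeeping illustration, not an implementation of FIML or of xtdpdml: the function name `usable_data` and the 4-by-4 observation pattern are hypothetical.

```python
def usable_data(observed):
    """Compare how much data listwise deletion and FIML keep.

    observed[i][t] is True when unit i has data in period t.
    Returns (units kept by listwise deletion, unit-periods used by FIML).
    """
    # Listwise deletion drops a unit as soon as one period is missing.
    listwise_units = sum(1 for row in observed if all(row))
    # FIML builds each unit's likelihood contribution from its observed periods.
    fiml_unit_periods = sum(sum(row) for row in observed)
    return listwise_units, fiml_unit_periods


# Hypothetical pattern: 4 units, 4 periods, two units with one gap each.
observed = [
    [True, True, True, True],
    [True, False, True, True],
    [True, True, True, True],
    [True, True, True, False],
]
listwise_units, fiml_unit_periods = usable_data(observed)
```

In this toy pattern only two of the four units survive listwise deletion, whereas all 14 observed unit-periods still enter the FIML likelihood.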

Two main conclusions emerge from the results in Table 3. First, the larger the severity of the unbalancedness, the larger the finite sample biases. However, in the case of the ML estimator, the biases remain much lower in all cases. Second, the 75th-25th interquartile ranges also increase significantly as the unbalancedness increases. However, the iqr increases are lower in the case of the ML estimator. In any event, we acknowledge that some samples in our simulations produce convergence failures in the ML estimator.8 All in all, while the ML estimator suffers from convergence problems when unbalancedness is severe and the time dimension is low, the finite sample biases in the AB estimator significantly increase under these circumstances. Note also that Williams, Allison, and Moral-Benito

Table 4. Simulation results under nonnormal disturbances.

                    Bias λ            Bias β            iqr λ             iqr β
Unbalancedness      (1) AB   (2) ML   (3) AB   (4) ML   (5) AB   (6) ML   (7) AB   (8) ML
PANEL A: Student t, 4 df. N = 200, T = 4
0%                  −0.165   0.016    −0.063   0.009    0.312    0.190    0.136    0.089
5%                  −0.225   −0.002   −0.079   −0.003   0.353    0.199    0.163    0.094
10%                 −0.269   −0.017   −0.105   −0.007   0.413    0.189    0.181    0.088
PANEL B: Student t, 4 df. N = 500, T = 4
0%                  −0.074   0.012    −0.030   0.006    0.237    0.153    0.101    0.066
5%                  −0.141   −0.008   −0.056   0.002    0.266    0.136    0.113    0.062
10%                 −0.187   −0.012   −0.073   −0.001   0.331    0.154    0.140    0.065
PANEL C: Mixture of Normals. N = 200, T = 4
0%                  −0.181   −0.011   −0.080   −0.002   0.335    0.173    0.166    0.088
5%                  −0.217   −0.023   −0.054   0.013    0.328    0.166    0.136    0.077
10%                 −0.225   −0.030   −0.052   0.005    0.307    0.154    0.115    0.068
PANEL D: Mixture of Normals. N = 500, T = 4
0%                  −0.096   0.005    −0.042   0.006    0.247    0.128    0.121    0.063
5%                  −0.217   −0.023   −0.054   0.013    0.328    0.166    0.136    0.077
10%                 −0.225   −0.030   −0.052   0.005    0.307    0.154    0.115    0.068

AB refers to the Arellano and Bond (1991) GMM estimator; ML refers to the maximum likelihood estimator discussed in Section 2.1 implemented based on the FIML approach as described in the main text (i.e. fiml option in the xtdpdml Stata command); True parameter values are λ = 0.75 and β = 0.25; Bias refers to the median estimation errors $\hat{\lambda} - \lambda$ and $\hat{\beta} - \beta$; iqr is the 75th-25th interquartile range; results are based on 1,000 replications; unbalancedness refers to the share of observations with missing value according to the missing at random (MAR) assumption.
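
The two nonnormal disturbance families used in the panels of Table 4 can be drawn with the Python standard library roughly as follows. This is a sketch under assumptions: the text gives the degrees of freedom (4) and the gap between the mixture means (20), while the mixing weight of 0.8, the unit component variances, the recentring to mean zero, and the function names `student_t` and `normal_mixture` are illustrative choices.

```python
import math
import random


def student_t(df, rng):
    """One Student t draw: standard normal over sqrt(chi-square / df)."""
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)


def normal_mixture(rng, gap=20.0, weight=0.8):
    """One draw from a two-component normal mixture, recentred to mean zero.

    `gap` is the distance between the component means; `weight` and the unit
    component variances are assumptions made for illustration.
    """
    mean_low = -(1.0 - weight) * gap  # chosen so that the mixture mean is zero
    mean_high = mean_low + gap
    mean = mean_low if rng.random() < weight else mean_high
    return rng.gauss(mean, 1.0)


rng = random.Random(2018)
t_draws = [student_t(4, rng) for _ in range(10000)]      # fat tails
mix_draws = [normal_mixture(rng) for _ in range(10000)]  # asymmetric
```

Unequal weights are what make the mixture asymmetric; with equal weights the two-component mixture would be bimodal but symmetric.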

8 The FIML algorithm can fail to converge when working with unbalanced panels, especially with small sample sizes. For example, in Panel A of Table 3 with N = 200 and T = 4, there was a convergence failure in around 20% of the samples, which were excluded from the results shown in the table. However, this figure is around 10% in Panel B with N = 500 and T = 4, and less than 1% in Panel C with N = 200 and T = 8. Indeed, in all samples with T = 8 the percentage of failures is less than 1%. Therefore, we conclude that convergence failures of our estimator may be a concern when exploiting unbalanced panels in which time series dimension is low (around T = 4) and the share of missing values is large (above 10%).

(forthcoming) discuss ways to get models to converge when they initially fail to do so.

The simulation results discussed in this section are expected to hold under non-normality of the disturbances; this is so because ML can be considered a pseudo maximum likelihood estimator that remains consistent and asymptotically normal under non-normality (see Moral-Benito 2013). In Table 4, we explore fat-tailed and skewed disturbances under different degrees of unbalancedness to check the sensitivity of the FIML-based estimates to the normality assumption, especially in the case of unbalanced panels in which the normality assumption might appear more relevant. In particular, we first consider that all errors in the DGP are distributed as a Student's t with 4 degrees of freedom (Panels A and B), implying an infinite kurtosis, that is, fatter tails than the normal distribution. Second, we also consider errors distributed according to a mixture of two normal distributions with different means (the difference between the means being equal to 20) so that the resulting distribution is nonsymmetric (Panels C and D). For both nonnormal disturbances, the results remain very similar to those of the normal case.

IV. Empirical application

The growth regressions literature over the eighties and early nineties was mostly based on cross-section approaches (see e.g. Barro 1991). Starting in the mid-nineties, the mainstream approach was based on panel data methods accounting for country-specific effects and reverse causality between economic growth and potential growth determinants. The Arellano and Bond (1991) estimator was the most popular alternative exploited in this literature (e.g. Caselli, Esquivel, and Lefort 1996). Along these lines, the influential paper by Levine, Loayza, and Beck (2000) found a positive effect of financial development on economic growth after accounting for country-specific fixed effects and reverse causality in a panel data setting. They first considered the Arellano and Bond (1991) first-differenced GMM estimator. However, given the concern of finite sample biases in the first-differenced GMM due to the small N dimension of their data (they only observe 74 countries), they also explored the system-GMM approach by Arellano and Bover (1995). The mean stationarity assumption required for consistency of the system-GMM estimator is especially inappropriate in cross-country datasets starting at the end of a war, as argued by Barro and Sala-i-Martin (2003). In this context, the maximum likelihood approach discussed in this paper is a natural alternative to be explored instead of system-GMM.

In this section, we estimate the effect of financial development on economic growth using the proposed ML estimator in addition to first-differenced GMM. We use a panel dataset of 78 countries ($N = 78$) over the period 1960–1995 (see footnote 9). Following Levine, Loayza, and Beck (2000) we consider 5-year periods to avoid business cycle fluctuations so that we exploit a maximum of 7 observations per country ($T = 7$).

The dependent variable is the log of real per capita GDP taken from the World Development Indicators (WDI). The main regressors of interest are three different proxies for financial development at the country level, namely, liquid liabilities, commercial-central bank, and private credit, all taken from the International Financial Statistics (IFS) database. Liquid liabilities are defined as the liquid liabilities of the financial system (currency plus demand and interest-bearing liabilities of banks and non-bank financial intermediaries) divided by GDP. Commercial-central bank is defined as the assets of deposit money banks divided by assets of deposit money banks plus central bank assets. Private credit refers to the credit by deposit money banks and other financial institutions to the private sector divided by GDP. Finally, the following control variables are also considered: openness to trade (from WDI), government size (from WDI), average years of secondary schooling (from the Barro and Lee dataset), inflation (IFS), and the black market premium (from World Currency Yearbook). For more details on the variables considered see Table 12 in Levine, Loayza, and Beck (2000).

Analogously to Equations (1) and (2), we estimate the following model:
9
Since we do not have the original data set assembled by Levine, Loayza, and Beck (2000), we use an equivalent data set taken from the same public sources
and including four additional countries. We thank Pau Gaya and Alexandro Ruiz for sharing these data with us.

$$ y_{it} = \lambda y_{it-1} + \beta FD_{it} + \gamma w_{it} + \alpha_i + v_{it} \tag{15} $$

where $y_{it}$ refers to the log of real per capita GDP in country $i$ and period $t$ (see footnote 10), $FD_{it}$ refers to one of the three financial development proxies considered by Levine, Loayza, and Beck (2000), and $w_{it}$ refers to a set of control variables. $\alpha_i$ captures time-invariant country-specific heterogeneity potentially correlated with the regressors. In addition, we also include a set of time dummies to account for common shocks to all countries (e.g. the 1973 crisis). $\beta$ is our parameter of interest, as it estimates the effect of financial development on economic growth (see footnote 11).

Following Levine, Loayza, and Beck (2000) we assume that both $FD_{it}$ and the control variables $w_{it}$ are predetermined so that feedback from GDP to financial development and other macroeconomic conditions is allowed:

$$ E\left(v_{it} \mid y_i^{t-1}, w_i^{t}, FD_i^{t}, \alpha_i\right) = 0 \quad (t = 1, \ldots, T)\ (i = 1, \ldots, N) \tag{16} $$

The Arellano and Bond (1991) approach as well as the likelihood-based approach discussed in this paper can estimate the model in (15) under the assumption (16). However, note that the system-GMM estimator, also considered by Levine, Loayza, and Beck (2000), requires the additional assumption of mean-stationarity that seems undesirable in this setting as discussed in Barro and Sala-i-Martin (2003).

Table 5 presents the estimation results. In all cases, the FIML approach was considered in the ML estimator due to the unbalancedness of the panel (see footnote 12). There are 445 observations with non-missing values in all the variables while the total number of observations is 78 × 7 = 546 (i.e. unbalancedness is around 18%; see footnote 13). Still, the ML algorithm achieved convergence in all the specifications producing reasonable estimates, which can be attributed to the availability of a relatively large number of time series observations ($T = 7$) as illustrated in our simulation results in Section 3.

The diff-GMM estimates in Panel A of Table 5 replicate the findings in Levine, Loayza, and Beck (2000). All the three proxies for financial development (liquid liabilities, commercial-central bank, private credit) have a positive and statistically significant effect on economic growth. Moreover, the

Table 5. Financial development and economic growth.

PANEL A: First-differenced GMM estimator (AB)
Lagged dep. variable      0.704    0.617    0.731    0.629    0.638    0.579
                         (0.066)  (0.049)  (0.056)  (0.048)  (0.057)  (0.049)
Liquid liabilities        0.040    0.066
                         (0.019)  (0.017)
Commercial-central bank                     0.039    0.039
                                           (0.011)  (0.010)
Private credit                                                0.050    0.054
                                                             (0.013)  (0.015)
Control variables         Simple   Policy   Simple   Policy   Simple   Policy
Observations              417      397      429      398      417      396

PANEL B: Maximum likelihood estimator (ML)
Lagged dep. variable      1.019    1.004    0.980    0.960    0.955    0.945
                         (0.043)  (0.050)  (0.044)  (0.048)  (0.040)  (0.042)
Liquid liabilities        0.029    0.028
                         (0.012)  (0.014)
Commercial-central bank                     0.044    0.041
                                           (0.008)  (0.008)
Private credit                                                0.053    0.048
                                                             (0.010)  (0.009)
Control variables         Simple   Policy   Simple   Policy   Simple   Policy
Observations              411      397      429      398      417      396

Notes. Dependent variable is the log of real per capita GDP in all cases. The simple set of control variables includes only average years of secondary schooling as
an additional covariate. The policy conditioning information set includes average years of secondary schooling, government size, openness to trade,
inflation, and black market premium as in Levine, Loayza, and Beck (2000). All regressors are normalized to have zero mean and unit standard deviation in
order to ease the interpretation of the coefficients. We denote significance at 10%, 5% and 1% with *, ** and ***, respectively. Standard errors are reported
in parentheses.
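
The long-run effects discussed in the text follow from the model in (15) as $\beta / (1 - \lambda)$. As a minimal sketch, the snippet below applies that formula to the last column of each panel of Table 5; the function and dictionary names are hypothetical and only the coefficient values come from the table.

```python
def long_run_effect(beta, lam):
    """Long-run effect beta / (1 - lambda) implied by the dynamic model."""
    if not abs(lam) < 1:
        # With |lambda| >= 1 the effect does not settle at a finite level.
        raise ValueError("long-run effect requires |lambda| < 1")
    return beta / (1.0 - lam)


# Last column of Table 5: private credit with the policy set of controls.
table5_last_column = {
    "diff-GMM (Panel A)": {"beta": 0.054, "lam": 0.579},
    "ML (Panel B)": {"beta": 0.048, "lam": 0.945},
}

for estimator, est in table5_last_column.items():
    effect = long_run_effect(est["beta"], est["lam"])
    print(f"{estimator}: long-run effect = {effect:.3f}")
```

Because the denominator shrinks as the lagged coefficient approaches one, a modest upward revision of $\lambda$ produces a large change in the implied long-run effect. The guard also matters in practice: the first two ML columns report lagged coefficients of 1.019 and 1.004, for which the formula does not yield a finite positive long-run effect.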

10
Note that we consider seven five-year periods, namely, 1960–1965, 1965–1970, 1970–1975, 1975–1980, 1980–1985, 1985–1990, and 1990–1995.
11
Note that the model in (15) is equivalent to $y_{it} - y_{it-1} = (\lambda - 1) y_{it-1} + \beta FD_{it} + \gamma w_{it} + \alpha_i + v_{it}$, where the dependent variable is GDP growth.
12
We use the fiml option in the xtdpdml Stata command.
13
Note that the inclusion of the lagged dependent variable further reduces the number of observations used in Table 5.

effects are economically large since all regressors are normalized to have zero mean and unit standard deviation. For instance, an increase of one standard deviation in the credit-to-GDP ratio boosts the level of GDP per capita by around 5.4% according to the estimates in the last column of Panel A. The estimated effects of liquid liabilities and commercial-central bank are also large and similar in magnitude. Also, given the estimated persistence of GDP per capita (i.e. the lagged dependent variable coefficient), the long-run effects are even larger. In particular, the long-run effect of a one standard deviation increase in private credit is estimated to be around 13% (i.e. $\beta / (1 - \lambda)$).

Turning to the maximum likelihood estimates in Panel B of Table 5, the estimated effects are overall very similar. For instance, the estimated impact effect of private credit on GDP per capita is 4.8% instead of 5.4% as in Panel A. However, the estimated coefficients for the lagged dependent variable are significantly larger when using the ML estimator, which points to a downward bias in the diff-GMM estimates as shown in our simulations. An important implication of this result is that the estimated long-run effects of financial development on GDP could be much larger. According to the last column of Panel B, the estimated long-run effect on GDP per capita of a one standard deviation increase in private credit is 87% ($0.048 / (1 - 0.945)$) instead of 13% ($0.054 / (1 - 0.579)$) as estimated by diff-GMM. Not surprisingly, all the coefficients are estimated more precisely than in the GMM case, as maximum likelihood is more efficient than GMM under normality.

V. Concluding remarks

The widely used first-differenced GMM estimator discussed in Arellano and Bond (1991) may suffer from finite sample biases when the number of cross-section observations is small. Based on the same identifying assumption, the alternatives proposed in the literature are typically difficult for practitioners to implement as they require some programming capabilities (e.g. Alonso-Borrego and Arellano 1999; Ahn and Schmidt 1995; Hansen, Heaton, and Yaron 1996; Hsiao, Pesaran, and Tahmiscioglu 2002; Moral-Benito 2013; see footnote 14). Moreover, concern has intensified in recent years that many instrumental variables of the type considered in panel GMM estimators such as Arellano and Bond (1991) may be invalid, weak, or both (see Bazzi and Clemens 2013; Kraay 2015).

In this article, we discuss a maximum likelihood estimator that is asymptotically equivalent to the Arellano and Bond (1991) estimator but is strongly preferred in terms of finite sample performance. Moreover, the proposed estimator can be easily implemented in various SEM packages such as Stata (xtdpdml command described in Williams, Allison, and Moral-Benito (forthcoming)), SAS (proc CALIS), Mplus, LISREL, EQS, Amos, lavaan (for R), and OpenMx (for R).

Simulation results presented in the paper indicate that our maximum likelihood estimator has negligible biases in finite samples when the DGP includes fixed effects, a lagged dependent variable regressor, and an additional predetermined explanatory variable. Moreover, these biases are smaller than those of the first-differenced GMM when the number of cross-section observations (N) is small.

As an empirical illustration, we estimate the effect of financial development on economic growth in a panel of countries using the proposed estimator. According to our empirical results, the estimated long-run effect of financial development on economic growth is much larger when using the proposed maximum likelihood estimator than under the GMM estimates presented in Levine, Loayza, and Beck (2000).

Disclosure statement

No potential conflict of interest was reported by the authors.

References

Ahn, S., and P. Schmidt. 1995. “Efficient Estimation of Models for Dynamic Panel Data.” Journal of Econometrics 68: 5–27.
Allison, P. 2005. Fixed Effects Regression Methods for Longitudinal Data Using SAS. Cary, NC: SAS Institute.

14
Note that the so-called system-GMM estimator by Arellano and Bover (1995) and Blundell and Bond (1998) is not included in this category because it
requires the mean-stationarity assumption for consistency, which is not required by first-differenced GMM.

Allison, P., R. Williams, and E. Moral-Benito. 2017. “Maximum Likelihood for Cross-Lagged Panel Models with Fixed Effects.” Socius 3: 1–17.
Alonso-Borrego, C., and M. Arellano. 1999. “Symmetrically Normalized Instrumental-Variable Estimation Using Panel Data.” Journal of Business & Economic Statistics 17: 36–49.
Arbuckle, J. 1996. “Full Information Estimation in the Presence of Incomplete Data.” In Advanced Structural Equation Modeling, edited by G. A. Marcoulides and R. E. Schumacker, 243–277. Mahwah, NJ: Lawrence Erlbaum Associates.
Arellano, M. 2003. Panel Data Econometrics. Oxford, UK: Oxford University Press.
Arellano, M., and S. Bond. 1991. “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations.” Review of Economic Studies 58: 277–297.
Arellano, M., and O. Bover. 1995. “Another Look at the Instrumental Variable Estimation of Error-Components Models.” Journal of Econometrics 68: 29–52.
Barro, R. 1991. “Economic Growth in a Cross Section of Countries.” Quarterly Journal of Economics 106: 407–443.
Barro, R., and X. Sala-i-Martin. 2003. Economic Growth. Cambridge, MA: MIT Press.
Bazzi, S., and M. Clemens. 2013. “Blunt Instruments: Avoiding Common Pitfalls in Identifying the Causes of Economic Growth.” American Economic Journal: Macroeconomics 5: 152–186.
Bentler, P., and D. Weeks. 1980. “Linear Structural Equations with Latent Variables.” Psychometrika 45: 289–308.
Blundell, R., and S. Bond. 1998. “Initial Conditions and Moment Restrictions in Dynamic Panel Data Models.” Journal of Econometrics 87: 115–143.
Bond, S., A. Hoeffler, and J. Temple. 2001. “GMM Estimation of Empirical Growth Models.” CEPR Discussion Papers 3048, London, UK: Centre for Economic Policy Research (CEPR).
Bun, M., and J. Kiviet. 2006. “The Effects of Dynamic Feedbacks on LS and MM Estimator Accuracy in Panel Data Models.” Journal of Econometrics 132: 409–444.
Caselli, F., G. Esquivel, and F. Lefort. 1996. “Reopening the Convergence Debate: A New Look at Cross-Country Growth Empirics.” Journal of Economic Growth 1: 363–389.
Enders, C., and D. Bandalos. 2001. “The Relative Performance of Full Information Maximum Likelihood Estimation for Missing Data in Structural Equation Models.” Structural Equation Modeling 8: 430–457.
Hansen, L. P., J. Heaton, and A. Yaron. 1996. “Finite-Sample Properties of Some Alternative GMM Estimators.” Journal of Business & Economic Statistics 14: 262–280.
Hsiao, C., H. Pesaran, and A. Tahmiscioglu. 2002. “Maximum Likelihood Estimation of Fixed Effects Dynamic Panel Data Models Covering Short Time Periods.” Journal of Econometrics 109: 107–150.
Islam, N. 1995. “Growth Empirics: A Panel Data Approach.” The Quarterly Journal of Economics 110: 1127–1170.
Kraay, A. 2015. “Weak Instruments in Growth Regressions: Implications for Recent Cross-Country Evidence on Inequality and Growth.” Policy Research Working Paper 7494, Washington, DC: The World Bank.
Levine, R., N. Loayza, and T. Beck. 2000. “Financial Intermediation and Growth: Causality and Causes.” Journal of Monetary Economics 46: 31–77.
Moral-Benito, E. 2013. “Likelihood-Based Estimation of Dynamic Panels with Predetermined Regressors.” Journal of Business & Economic Statistics 31: 451–472.
Williams, R., P. Allison, and E. Moral-Benito. 2018. “Linear Dynamic Panel-Data Estimation Using Maximum Likelihood and Structural Equation Modeling.” The Stata Journal 18: 293–326.
Wooldridge, J. 2010. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.

Appendix A. Illustration in the case of three time periods

In order to illustrate the equivalence of our likelihood-based approach outlined in Section 2.1 and the baseline GMM approach exclusively based on the assumption (2), we consider the case $T = 3$ and show that the number of over-identifying restrictions is the same in both estimators.

Appendix A1. The GMM approach

With three time periods and $y_{i0}$ observed by the econometrician, the model in (1)–(2) implies the following moment conditions:

$$ E(y_{i0} \Delta v_{i2}) = 0 \tag{17a} $$
$$ E(x_{i1} \Delta v_{i2}) = 0 \tag{17b} $$
$$ E(y_{i0} \Delta v_{i3}) = 0 \tag{17c} $$
$$ E(y_{i1} \Delta v_{i3}) = 0 \tag{17d} $$
$$ E(x_{i1} \Delta v_{i3}) = 0 \tag{17e} $$
$$ E(x_{i2} \Delta v_{i3}) = 0 \tag{17f} $$
$$ E(\Delta v_{i2} (v_{i3} + \alpha_i)) = 0 \tag{17g} $$

The moments (17a)–(17f) are those typically exploited by first-differenced GMM as in Arellano and Bond (1991), while the moment in (17g) results from the lack of autocorrelation implied by assumption (2) as considered by Ahn and Schmidt (1995).

We thus have seven moment conditions and two parameters to be estimated, $\lambda$ and $\beta$, which give rise to five over-identifying restrictions implied by the model in (1)–(2) when $\lambda$ and $\beta$ are the parameters of interest.

Appendix A2. The likelihood-based approach

The model in structural form given by Equation (10) involves 23 structural parameters when $T = 3$, namely, $\lambda$, $\beta$, $\sigma_\alpha^2$, $\sigma_{v_0}^2$, $\sigma_{v_1}^2$, $\sigma_{v_2}^2$, $\sigma_{v_3}^2$, $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$, $\phi_0$, $\phi_1$, $\phi_2$, $\phi_3$, $\psi_{21}$, $\psi_{31}$, $\psi_{32}$, $\omega_{01}$, $\omega_{02}$, $\omega_{03}$, $\omega_{12}$, $\omega_{13}$ and $\omega_{23}$.

The reduced form version of the model in (10), given by $R_i = B^{-1} D U_i$, involves 28 reduced form parameters coming from the $7 \times 7$ covariance matrix of the reduced-form disturbances $\Xi_i = B^{-1} D U_i$.

The difference between 28 reduced form parameters and 23 structural parameters implies 5 over-identifying restrictions as in the GMM case above, which ensures identification and that our likelihood-based approach does not impose any additional restriction (i.e. it is exclusively based on the assumption (2)).
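
The counting argument of this appendix can be checked mechanically. The sketch below is only a bookkeeping aid under stated assumptions: the parameter labels are plain-text stand-ins for the symbols listed in Appendix A2, and the variable names are hypothetical.

```python
from itertools import combinations

T = 3  # number of time periods, with y_i0 also observed

# Structural parameters listed in Appendix A2 (labels are plain-text stand-ins).
structural = (
    ["lambda", "beta", "sigma2_alpha"]
    + [f"sigma2_v{t}" for t in range(T + 1)]                         # v0, ..., v3
    + [f"sigma2_{t}" for t in range(1, T + 1)]                       # 1, 2, 3
    + [f"phi_{t}" for t in range(T + 1)]                             # 0, ..., 3
    + [f"psi_{s}{t}" for t, s in combinations(range(1, T + 1), 2)]   # 21, 31, 32
    + [f"omega_{s}{t}" for s, t in combinations(range(T + 1), 2)]    # 01, ..., 23
)

# Likelihood-based approach: distinct elements of the 7 x 7 covariance matrix
# minus the number of free structural parameters.
n_observed = 7
n_reduced_form = n_observed * (n_observed + 1) // 2
ml_restrictions = n_reduced_form - len(structural)

# GMM approach: moment conditions (17a)-(17g) minus the two parameters of interest.
gmm_restrictions = 7 - 2

print(f"ML:  {n_reduced_form} - {len(structural)} = {ml_restrictions}")
print(f"GMM: 7 - 2 = {gmm_restrictions}")
```

Both counts equal five, which is the equivalence the appendix sets out to illustrate.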
