xtgls — Fit panel-data models by using GLS
Description
xtgls fits panel-data linear models by using feasible generalized least squares. This command
allows estimation in the presence of AR(1) autocorrelation within panels and cross-sectional correlation
and heteroskedasticity across panels.
Quick start
GLS regression of y on x1, x2, and indicators for levels of categorical variable a using xtset data
xtgls y x1 x2 i.a
With heteroskedastic but uncorrelated errors across panels
xtgls y x1 x2 i.a, panels(heteroskedastic)
With heteroskedastic and correlated errors across panels
xtgls y x1 x2 i.a, panels(correlated)
As above, and with a common first-order autocorrelation within panels
xtgls y x1 x2 i.a, panels(correlated) corr(ar1)
As above, but let autocorrelation structure be panel-specific
xtgls y x1 x2 i.a, panels(correlated) corr(psar1)
As above, but estimate by iterated GLS
xtgls y x1 x2 i.a, panels(correlated) corr(psar1) igls
Menu
Statistics > Longitudinal/panel data > Contemporaneous correlation > GLS regression with correlated disturbances
Syntax
xtgls depvar [indepvars] [if] [in] [weight] [, options]
options Description
Model
noconstant suppress constant term
panels(iid) use i.i.d. error structure
panels(heteroskedastic) use heteroskedastic but uncorrelated error structure
panels(correlated) use heteroskedastic and correlated error structure
corr(independent) use independent autocorrelation structure
corr(ar1) use AR1 autocorrelation structure
corr(psar1) use panel-specific AR1 autocorrelation structure
rhotype(calc) specify method to compute autocorrelation parameter;
see Options for details; seldom used
igls use iterated GLS estimator instead of two-step GLS estimator
force estimate even if observations unequally spaced in time
SE
nmk normalize standard error by N − k instead of N
Reporting
level(#) set confidence level; default is level(95)
display_options control columns and column formats, row spacing, line width,
display of omitted variables and base and empty cells, and
factor-variable labeling
Optimization
optimize_options control the optimization process; seldom used
coeflegend display legend instead of statistics
A panel variable must be specified. For correlation structures other than independent, a time variable must be
specified. A time variable must also be specified if panels(correlated) is specified. Use xtset; see [XT] xtset.
indepvars may contain factor variables; see [U] 11.4.3 Factor variables.
depvar and indepvars may contain time-series operators; see [U] 11.4.4 Time-series varlists.
by and statsby are allowed; see [U] 11.1.10 Prefix commands.
aweights are allowed; see [U] 11.1.6 weight.
coeflegend does not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Options
Model
SE
nmk specifies that standard errors be normalized by N − k, where k is the number of parameters
estimated, rather than N, the number of observations. Different authors have used one or the other
normalization. Greene (2018, 313) remarks that whether a degree-of-freedom correction improves
the small-sample properties is an open question.
Reporting
Optimization
optimize_options control the iterative optimization process. These options are seldom used.
iterate(#) specifies the maximum number of iterations. When the number of iterations equals #,
the optimization stops and presents the current results, even if convergence has not been reached.
The default is iterate(100).
tolerance(#) specifies the tolerance for the coefficient vector. When the relative change in the
coefficient vector from one iteration to the next is less than or equal to #, the optimization process
is stopped. tolerance(1e-7) is the default.
log and nolog specify whether to display the iteration log. The iteration log is displayed by
default unless you used set iterlog off to suppress it; see set iterlog in [R] set iter.
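As an illustration of these options, the following sketch refits a heteroskedastic-panels model by iterated GLS with a larger iteration cap and a looser tolerance. The dataset and variables are those used in the examples of this entry; the values 500 and 1e-6 are arbitrary choices for illustration, not recommendations.

```stata
* Load the investment data used in the examples of this entry
use https://www.stata-press.com/data/r16/invest2, clear

* Iterated GLS with an illustrative iteration cap and tolerance;
* nolog suppresses the iteration log
xtgls invest market stock, panels(heteroskedastic) igls iterate(500) tolerance(1e-6) nolog
```

If the iteration cap is reached first, the results shown are those of the last iteration, as described under iterate(#).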
The following option is available with xtgls but is not shown in the dialog box:
coeflegend; see [R] Estimation options.
Introduction
Information on GLS can be found in Greene (2018), Maddala and Lahiri (2006), Davidson and
MacKinnon (1993), and Judge et al. (1985).
If you have many panels relative to periods, see [XT] xtreg and [XT] xtgee. xtgee, in particular,
provides capabilities similar to those of xtgls but does not allow cross-sectional correlation. On the
other hand, xtgee allows a richer description of the correlation within panels as long as the same
correlations apply to all panels. xtgls provides two unique features:
1. Cross-sectional correlation may be modeled (panels(correlated)).
2. Within panels, the AR(1) correlation coefficient may be unique (corr(psar1)).
xtgls allows models with heteroskedasticity and no cross-sectional correlation, but, strictly
speaking, xtgee does not. xtgee with the vce(robust) option relaxes the assumption of equal
variances, at least as far as the standard error calculation is concerned.
Also, xtgls, panels(iid) corr(independent) nmk is equivalent to regress.
The nmk option uses n − k rather than n to normalize the variance calculation.
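The equivalence with regress can be checked numerically. The sketch below assumes the investment data used in the examples of this entry; the matrix names b_ols, V_ols, b_gls, and V_gls are arbitrary names chosen for illustration.

```stata
* Load the investment data used in the examples of this entry
use https://www.stata-press.com/data/r16/invest2, clear

* Ordinary least squares
regress invest market stock
matrix b_ols = e(b)
matrix V_ols = e(V)

* xtgls under the classic assumptions, normalizing by N - k
xtgls invest market stock, panels(iid) corr(independent) nmk
matrix b_gls = e(b)
matrix V_gls = e(V)

* Relative differences between the two sets of results
display mreldif(b_ols, b_gls)
display mreldif(V_ols, V_gls)
```

Both relative differences should be zero up to numerical precision. Without nmk, the coefficients would still agree, but the variance estimates would differ by the factor used in the normalization.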
To fit a model with autocorrelated errors (corr(ar1) or corr(psar1)), the data must be equally
spaced in time. To fit a model with cross-sectional correlation (panels(correlated)), panels must
have the same number of observations (be balanced).
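Before fitting such models, it can help to inspect the panel structure. The following sketch, which assumes the investment data used in the examples of this entry, shows one way to do so; it reports balance only, so gaps in the time variable should be checked separately.

```stata
* Load the investment data used in the examples of this entry
use https://www.stata-press.com/data/r16/invest2, clear

* Summarize the pattern of observations across panels and time
xtdescribe

* With no arguments, xtset reports the current panel settings
xtset

* xtset leaves the balance status in r(balanced)
display "`r(balanced)'"
```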
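The equation from which the models are developed is given by

$$ y_{i,t} = \mathbf{x}_{i,t}\boldsymbol{\beta} + \epsilon_{i,t} $$

where i = 1, ..., m indexes the units (or panels), m being the number of panels, and t = 1, ..., $T_i$
indexes the observations for panel i, $T_i$ being the number of observations for that panel. This model
can equally be written as

$$
\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_m \end{bmatrix}
=
\begin{bmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_m \end{bmatrix}
\boldsymbol{\beta}
+
\begin{bmatrix} \boldsymbol{\epsilon}_1 \\ \boldsymbol{\epsilon}_2 \\ \vdots \\ \boldsymbol{\epsilon}_m \end{bmatrix}
$$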
For the $\boldsymbol{\Omega}_{i,j}$ matrices to be parameterized to model cross-sectional correlation, they must be square
(balanced panels).
In these models, we assume that the coefficient vector β is the same for all panels and consider a
variety of models by changing the assumptions on the structure of Ω.
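For orientation, a general block form of Ω consistent with the $\boldsymbol{\Omega}_{i,j}$ notation above can be sketched as follows. The scalar factors $\sigma_{i,j}$ are notation assumed here for illustration; block (i, j) has $T_i$ rows and $T_j$ columns.

```latex
E[\boldsymbol{\epsilon}\boldsymbol{\epsilon}'] = \boldsymbol{\Omega} =
\begin{bmatrix}
\sigma_{1,1}\boldsymbol{\Omega}_{1,1} & \sigma_{1,2}\boldsymbol{\Omega}_{1,2} & \cdots & \sigma_{1,m}\boldsymbol{\Omega}_{1,m} \\
\sigma_{2,1}\boldsymbol{\Omega}_{2,1} & \sigma_{2,2}\boldsymbol{\Omega}_{2,2} & \cdots & \sigma_{2,m}\boldsymbol{\Omega}_{2,m} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{m,1}\boldsymbol{\Omega}_{m,1} & \sigma_{m,2}\boldsymbol{\Omega}_{m,2} & \cdots & \sigma_{m,m}\boldsymbol{\Omega}_{m,m}
\end{bmatrix}
```

Under this sketch, the panels() option governs the scalars $\sigma_{i,j}$ and the corr() option governs the within-block structure.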
For the classic OLS regression model, we have
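$$ E[\epsilon_{i,t}] = 0 $$
$$ \operatorname{Var}[\epsilon_{i,t}] = \sigma^2 $$
$$ \operatorname{Cov}[\epsilon_{i,t}, \epsilon_{j,s}] = 0 \quad \text{if } t \neq s \text{ or } i \neq j $$

That is, Ω has the structure

$$
\boldsymbol{\Omega} =
\begin{bmatrix}
\sigma^2\mathbf{I} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \sigma^2\mathbf{I} & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \sigma^2\mathbf{I}
\end{bmatrix}
$$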
whether or not the panels are balanced (the 0 matrices may be rectangular). The classic OLS assumptions
are the default panels(iid) and corr(independent) options for this command.
Example 1
Greene (2012, 1112) reprints data in a classic study of investment demand by Grunfeld and
Griliches (1960). Below we allow the variances to differ for each of the five companies.
. use https://www.stata-press.com/data/r16/invest2
. xtgls invest market stock, panels(hetero)
Cross-sectional time-series FGLS regression
Coefficients: generalized least squares
Panels: heteroskedastic
Correlation: no autocorrelation
Estimated covariances = 5 Number of obs = 100
Estimated autocorrelations = 0 Number of groups = 5
Estimated coefficients = 3 Time periods = 20
Wald chi2(2) = 865.38
Prob > chi2 = 0.0000
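The error structure fit in this example can be written out explicitly. The block below is a sketch assuming the usual formulation of groupwise heteroskedasticity, in which each panel i has its own variance $\sigma_i^2$ and errors are uncorrelated across panels and over time.

```latex
\operatorname{Var}[\epsilon_{i,t}] = \sigma_i^2, \qquad
\boldsymbol{\Omega} =
\begin{bmatrix}
\sigma_1^2\mathbf{I} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \sigma_2^2\mathbf{I} & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \sigma_m^2\mathbf{I}
\end{bmatrix}
```

With m = 5 companies, this structure has five variance parameters, which matches the line "Estimated covariances = 5" in the output.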
Example 2
. xtset
panel variable: company (strongly balanced)
time variable: time, 1 to 20
delta: 1 unit
. xtgls invest market stock, panels(correlated)
Cross-sectional time-series FGLS regression
Coefficients: generalized least squares
Panels: heteroskedastic with cross-sectional correlation
Correlation: no autocorrelation
Estimated covariances = 15 Number of obs = 100
Estimated autocorrelations = 0 Number of groups = 5
Estimated coefficients = 3 Time periods = 20
Wald chi2(2) = 1285.19
Prob > chi2 = 0.0000
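The count of estimated covariances reported above follows from the structure of the model. As a sketch, assuming that errors are correlated across panels only within the same period, with $\sigma_{i,i} = \sigma_i^2$:

```latex
\operatorname{Cov}[\epsilon_{i,t}, \epsilon_{j,s}] =
\begin{cases}
\sigma_{i,j} & \text{if } t = s \\
0 & \text{otherwise}
\end{cases}
\qquad
\boldsymbol{\Omega} =
\begin{bmatrix}
\sigma_{1,1}\mathbf{I} & \sigma_{1,2}\mathbf{I} & \cdots & \sigma_{1,m}\mathbf{I} \\
\sigma_{2,1}\mathbf{I} & \sigma_{2,2}\mathbf{I} & \cdots & \sigma_{2,m}\mathbf{I} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{m,1}\mathbf{I} & \sigma_{m,2}\mathbf{I} & \cdots & \sigma_{m,m}\mathbf{I}
\end{bmatrix}

% Number of distinct elements of the symmetric m x m matrix [\sigma_{i,j}]:
\frac{m(m+1)}{2} = \frac{5 \times 6}{2} = 15
```

The five variances and ten distinct covariances together give the 15 estimated covariances shown in the output.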
Example 3
We can obtain the MLE results by specifying the igls option, which iterates the GLS estimation
technique to convergence:
. xtgls invest market stock, panels(correlated) igls
Iteration 1: tolerance = .2127384
Iteration 2: tolerance = .22817
(output omitted )
Iteration 1046: tolerance = 1.000e-07
Cross-sectional time-series FGLS regression
Coefficients: generalized least squares
Panels: heteroskedastic with cross-sectional correlation
Correlation: no autocorrelation
Estimated covariances = 15 Number of obs = 100
Estimated autocorrelations = 0 Number of groups = 5
Estimated coefficients = 3 Time periods = 20
Wald chi2(2) = 558.51
Log likelihood = -515.4222 Prob > chi2 = 0.0000
Example 4
If corr(ar1) is specified, each group is assumed to have errors that follow the same AR(1)
process; that is, the autocorrelation parameter is the same for all groups.
. xtgls invest market stock, panels(hetero) corr(ar1)
Cross-sectional time-series FGLS regression
Coefficients: generalized least squares
Panels: heteroskedastic
Correlation: common AR(1) coefficient for all panels (0.8651)
Estimated covariances = 5 Number of obs = 100
Estimated autocorrelations = 1 Number of groups = 5
Estimated coefficients = 3 Time periods = 20
Wald chi2(2) = 119.69
Prob > chi2 = 0.0000
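The common AR(1) assumption can be written as follows. This is a sketch in which $\eta_{i,t}$, a mean-zero and serially uncorrelated innovation, is notation introduced here for illustration.

```latex
\epsilon_{i,t} = \rho\,\epsilon_{i,t-1} + \eta_{i,t}, \qquad |\rho| < 1
```

Because a single ρ is shared by all panels, the output reports one estimated autocorrelation; here the estimate is 0.8651.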
Example 5
If corr(psar1) is specified, each group is assumed to have errors that follow a different AR(1)
process.
. xtgls invest market stock, panels(iid) corr(psar1)
Cross-sectional time-series FGLS regression
Coefficients: generalized least squares
Panels: homoskedastic
Correlation: panel-specific AR(1)
Estimated covariances = 1 Number of obs = 100
Estimated autocorrelations = 5 Number of groups = 5
Estimated coefficients = 3 Time periods = 20
Wald chi2(2) = 252.93
Prob > chi2 = 0.0000
Stored results
xtgls stores the following in e():
Scalars
e(N) number of observations
e(N_ic) number of observations used to compute information criteria
e(N_g) number of groups
e(N_t) number of periods
e(N_miss) number of missing observations
e(n_cf) number of estimated coefficients
e(n_cv) number of estimated covariances
e(n_cr) number of estimated correlations
e(df) degrees of freedom
e(df_pear) degrees of freedom for Pearson χ2
e(df_ic) degrees of freedom for information criteria
e(ll) log likelihood
e(chi2) χ2
e(g_min) smallest group size
e(g_avg) average group size
e(g_max) largest group size
e(rank) rank of e(V)
e(rc) return code
Macros
e(cmd) xtgls
e(cmdline) command as typed
e(depvar) name of dependent variable
e(ivar) variable denoting groups
e(tvar) variable denoting time within groups
e(coefftype) estimation scheme
e(corr) correlation structure
e(vt) panel option
e(rhotype) type of estimated correlation
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(chi2type) Wald; type of model χ2 test
e(rho) ρ
e(properties) b V
e(predict) program used to implement predict
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(Sigma) Σ̂ matrix
e(V) variance–covariance matrix of the estimators
Functions
e(sample) marks estimation sample
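As a brief illustration of how these results can be used, the sketch below fits the heteroskedastic-panels model from example 1 and then displays a few of the stored scalars and matrices.

```stata
* Load the investment data used in the examples of this entry
use https://www.stata-press.com/data/r16/invest2, clear

* Fit the model from example 1
xtgls invest market stock, panels(heteroskedastic)

* Scalars describing the sample and the estimated parameters
display "observations: " e(N)
display "groups:       " e(N_g)
display "periods:      " e(N_t)
display "covariances:  " e(n_cv)

* Coefficient vector and estimated Sigma matrix
matrix list e(b)
matrix list e(Sigma)
```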
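The variance of the GLS estimator is estimated by

$$ \widehat{\operatorname{Var}}(\widehat{\boldsymbol{\beta}}_{GLS}) = (\mathbf{X}'\widehat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1} $$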
For all our models, the Ω matrix may be written in terms of the Kronecker product:

$$ \boldsymbol{\Omega} = \boldsymbol{\Sigma}_{m \times m} \otimes \mathbf{I}_{T_i \times T_i} $$
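For context, the point estimator takes the standard GLS form, and the Kronecker structure makes the inverse of Ω inexpensive to compute. The lines below are a sketch under the assumption of balanced panels with a common number of periods T, so that the identity matrix is T × T.

```latex
\widehat{\boldsymbol{\beta}}_{GLS}
  = (\mathbf{X}'\widehat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1}
    \mathbf{X}'\widehat{\boldsymbol{\Omega}}^{-1}\mathbf{y}

\boldsymbol{\Omega}^{-1}
  = (\boldsymbol{\Sigma} \otimes \mathbf{I}_{T})^{-1}
  = \boldsymbol{\Sigma}^{-1} \otimes \mathbf{I}_{T}
```

Only the m × m matrix Σ needs to be inverted, rather than the full mT × mT matrix Ω.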
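The estimated variance matrix is obtained by substituting the estimator $\widehat{\boldsymbol{\Sigma}}$ for Σ, where

$$ \widehat{\Sigma}_{i,j} = \frac{\widehat{\boldsymbol{\epsilon}}_i'\,\widehat{\boldsymbol{\epsilon}}_j}{T} $$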
The residuals used in estimating Σ are first obtained from OLS regression. If the estimation is iterated,
residuals are obtained from the last fitted model.
Maximum likelihood estimates may be obtained by iterating the FGLS estimates to convergence
for models with no autocorrelation, corr(independent).
The GLS estimates and their associated standard errors are calculated using $\widehat{\boldsymbol{\Sigma}}^{-1}$. As Beck and
Katz (1995) point out, the Σ matrix is of rank at most min(T, m) when you use the
panels(correlated) option. For the GLS results to be valid (not based on a generalized inverse), T
must be at least as large as m, as you need at least as many period observations as there are panels.
Beck and Katz (1995) suggest using OLS parameter estimates with asymptotic standard errors that
are corrected for correlation between the panels. This estimation can be performed with the xtpcse
command; see [XT] xtpcse.
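As a minimal sketch of that alternative, the command below fits the same investment equation used in the examples of this entry by OLS with panel-corrected standard errors. Default xtpcse settings are assumed; see [XT] xtpcse for its options.

```stata
* Load the investment data used in the examples of this entry
use https://www.stata-press.com/data/r16/invest2, clear

* OLS coefficients with panel-corrected standard errors
xtpcse invest market stock
```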
References
Baum, C. F. 2001. Residual diagnostics for cross-section time series regression models. Stata Journal 1: 101–104.
Beck, N. L., and J. N. Katz. 1995. What to do (and not to do) with time-series cross-section data. American Political
Science Review 89: 634–647.
Blackwell, J. L., III. 2005. Estimation and testing of fixed-effect panel-data systems. Stata Journal 5: 202–207.
Davidson, R., and J. G. MacKinnon. 1993. Estimation and Inference in Econometrics. New York: Oxford University
Press.
Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
Greene, W. H. 2018. Econometric Analysis. 8th ed. New York: Pearson.
Grunfeld, Y., and Z. Griliches. 1960. Is aggregation necessarily bad? Review of Economics and Statistics 42: 1–13.
Herwartz, H., S. Maxand, F. H. C. Raters, and Y. M. Walle. 2018. Panel unit-root tests for heteroskedastic panels.
Stata Journal 18: 184–196.
Hoechle, D. 2007. Robust standard errors for panel regressions with cross-sectional dependence. Stata Journal 7:
281–312.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee. 1985. The Theory and Practice of Econometrics.
2nd ed. New York: Wiley.
Maddala, G. S., and K. Lahiri. 2006. Introduction to Econometrics. 4th ed. New York: Wiley.
Also see
[XT] xtgls postestimation — Postestimation tools for xtgls
[XT] xtpcse — Linear regression with panel-corrected standard errors
[XT] xtreg — Fixed-, between-, and random-effects and population-averaged linear models
[XT] xtregar — Fixed- and random-effects linear models with an AR(1) disturbance
[XT] xtset — Declare data to be panel data
[R] regress — Linear regression
[TS] newey — Regression with Newey–West standard errors
[TS] prais — Prais–Winsten and Cochrane–Orcutt regression
[U] 20 Estimation and postestimation commands