Generalized Least Squares
In statistics, generalized least squares (GLS) is a method used to estimate the unknown parameters in a linear
regression model. It is used when there is a non-zero amount of correlation between the residuals in the regression
model. GLS is employed to improve statistical efficiency and reduce the risk of drawing erroneous inferences, as
compared to conventional least squares and weighted least squares methods. It was first described by Alexander
Aitken in 1935.[1]
GLS requires knowledge of the covariance matrix of the residuals. If this matrix is unknown, estimating it from the
data gives the method of feasible generalized least squares (FGLS). However, FGLS provides fewer guarantees
of improvement.
Method
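In standard linear regression models, one observes data $\{y_i, x_{ij}\}_{i=1,\dots,n;\ j=2,\dots,k}$ on $n$ statistical units with $k-1$
predictor values and one response value each.
The response values are placed in a vector $\mathbf{y} \equiv (y_1, \dots, y_n)^{\mathsf{T}}$, and the predictor values are placed in the
design matrix
$$\mathbf{X} \equiv \begin{pmatrix} 1 & x_{12} & x_{13} & \cdots & x_{1k} \\ 1 & x_{22} & x_{23} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n2} & x_{n3} & \cdots & x_{nk} \end{pmatrix},$$
where each row is a vector of the $k$ predictor variables (including a constant) for the $i$th data point.
The model assumes that the conditional mean of $\mathbf{y}$ given $\mathbf{X}$ is a linear function of $\mathbf{X}$ and that the conditional
variance of the error term given $\mathbf{X}$ is a known non-singular covariance matrix, $\mathbf{\Omega}$. That is,
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \operatorname{E}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = 0, \qquad \operatorname{Cov}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = \mathbf{\Omega},$$
where $\boldsymbol{\beta} \in \mathbb{R}^{k}$ is a vector of unknown constants, called "regression coefficients", which are estimated from the
data.
If $\mathbf{b}$ is a candidate estimate for $\boldsymbol{\beta}$, then the residual vector for $\mathbf{b}$ is $\mathbf{y} - \mathbf{X}\mathbf{b}$. The generalized least squares method
estimates $\boldsymbol{\beta}$ by minimizing the squared Mahalanobis length of this residual vector:
$$\hat{\boldsymbol{\beta}} = \underset{\mathbf{b}}{\operatorname{argmin}}\ (\mathbf{y} - \mathbf{X}\mathbf{b})^{\mathsf{T}} \mathbf{\Omega}^{-1} (\mathbf{y} - \mathbf{X}\mathbf{b}),$$
which, because $\mathbf{\Omega}^{-1}$ is symmetric, is equivalent to
$$\hat{\boldsymbol{\beta}} = \underset{\mathbf{b}}{\operatorname{argmin}}\ \left[ \mathbf{y}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y} + \mathbf{b}^{\mathsf{T}} \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X} \mathbf{b} - 2\, \mathbf{b}^{\mathsf{T}} \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y} \right],$$
which is a quadratic programming problem. The stationary point of the objective function occurs when
$$2\, \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X} \mathbf{b} - 2\, \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y} = 0,$$
so the estimator is
$$\hat{\boldsymbol{\beta}} = \left( \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y}.$$
The quantity $\mathbf{\Omega}^{-1}$ is known as the precision matrix (or dispersion matrix), a generalization of the diagonal weight
matrix.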
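As a minimal sketch of the closed-form estimator, the NumPy code below computes (XᵀΩ⁻¹X)⁻¹XᵀΩ⁻¹y on simulated data. The function name `gls`, the AR(1)-style covariance Ω_ij = ρ^|i−j|, and all numeric values are assumptions chosen for illustration, not part of the method itself.

```python
import numpy as np

def gls(X, y, omega):
    """Return the GLS estimate (X' Ω⁻¹ X)⁻¹ X' Ω⁻¹ y and its covariance."""
    # Solve linear systems with Ω rather than forming Ω⁻¹ explicitly.
    omega_inv_X = np.linalg.solve(omega, X)
    omega_inv_y = np.linalg.solve(omega, y)
    xtox = X.T @ omega_inv_X                  # X' Ω⁻¹ X
    beta_hat = np.linalg.solve(xtox, X.T @ omega_inv_y)
    cov_beta = np.linalg.inv(xtox)            # Cov[β̂ | X] when Ω is the true covariance
    return beta_hat, cov_beta

# Simulated data (illustrative only): errors with covariance Ω_ij = ρ^|i-j|.
rng = np.random.default_rng(0)
n, k, rho = 200, 3, 0.7
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
idx = np.arange(n)
omega = rho ** np.abs(idx[:, None] - idx[None, :])
errors = np.linalg.cholesky(omega) @ rng.normal(size=n)
y = X @ beta_true + errors

beta_hat, cov_beta = gls(X, y, omega)
print(beta_hat)
```

Solving with Ω instead of inverting it is a numerical convenience only; the result is the same estimator.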
Properties
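The GLS estimator is unbiased, consistent, efficient, and asymptotically normal with
$$\operatorname{E}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = \boldsymbol{\beta}, \qquad \operatorname{Cov}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = \left( \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X} \right)^{-1}.$$
GLS is equivalent to applying ordinary least squares (OLS) to a linearly transformed version of the data. This can
be seen by factoring $\mathbf{\Omega} = \mathbf{C}\mathbf{C}^{\mathsf{T}}$ using a method such as Cholesky decomposition. Left-multiplying both sides of
$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ by $\mathbf{C}^{-1}$ yields an equivalent linear model:
$$\mathbf{y}^{*} = \mathbf{X}^{*}\boldsymbol{\beta} + \boldsymbol{\varepsilon}^{*}, \qquad \text{where } \mathbf{y}^{*} = \mathbf{C}^{-1}\mathbf{y}, \quad \mathbf{X}^{*} = \mathbf{C}^{-1}\mathbf{X}, \quad \boldsymbol{\varepsilon}^{*} = \mathbf{C}^{-1}\boldsymbol{\varepsilon}.$$
In this model, $\operatorname{Var}[\boldsymbol{\varepsilon}^{*} \mid \mathbf{X}] = \mathbf{C}^{-1} \mathbf{\Omega} \left( \mathbf{C}^{-1} \right)^{\mathsf{T}} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix.
This transformation effectively standardizes the scale of and de-correlates the errors. Because the transformed errors
are homoscedastic and uncorrelated, the Gauss–Markov theorem applies to OLS on the transformed data, so the GLS
estimate is the best linear unbiased estimator for $\boldsymbol{\beta}$.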
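The equivalence between GLS and OLS on transformed data can be checked numerically. The sketch below factors Ω = CCᵀ by Cholesky decomposition, whitens the data with C⁻¹, and compares OLS on the whitened data with the direct GLS formula. The simulated Ω and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# An arbitrary symmetric positive-definite Ω (illustrative only).
A = rng.normal(size=(n, n))
omega = A @ A.T / n + np.eye(n)
y = X @ np.array([0.5, 1.5]) + np.linalg.cholesky(omega) @ rng.normal(size=n)

# Factor Ω = C Cᵀ and whiten: y* = C⁻¹ y, X* = C⁻¹ X.
C = np.linalg.cholesky(omega)
y_star = np.linalg.solve(C, y)
X_star = np.linalg.solve(C, X)

# OLS on the transformed data.
beta_ols_star, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

# Direct GLS formula for comparison.
beta_gls = np.linalg.solve(X.T @ np.linalg.solve(omega, X),
                           X.T @ np.linalg.solve(omega, y))

# Covariance of the whitened errors: C⁻¹ Ω C⁻ᵀ, which should be the identity.
whitened_cov = np.linalg.solve(C, np.linalg.solve(C, omega).T)

print(np.allclose(beta_ols_star, beta_gls))
print(np.allclose(whitened_cov, np.eye(n)))
```

A general solver is used here for brevity; a triangular solver would exploit the structure of C.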
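GLS can also be derived by maximum likelihood estimation. Suppose that, for given fit parameters $\mathbf{b}$, the errors are
jointly normal with zero mean and covariance $\mathbf{\Omega}$:
$$p(\boldsymbol{\varepsilon} \mid \mathbf{b}) = \frac{1}{\sqrt{(2\pi)^{n} \det \mathbf{\Omega}}} \exp\left( -\frac{1}{2} \boldsymbol{\varepsilon}^{\mathsf{T}} \mathbf{\Omega}^{-1} \boldsymbol{\varepsilon} \right).$$
By Bayes' theorem,
$$p(\mathbf{b} \mid \boldsymbol{\varepsilon}) = \frac{p(\boldsymbol{\varepsilon} \mid \mathbf{b})\, p(\mathbf{b})}{p(\boldsymbol{\varepsilon})}.$$
In GLS, a uniform (improper) prior is taken for $p(\mathbf{b})$, and as $p(\boldsymbol{\varepsilon})$ is a marginal distribution, it does not depend on
$\mathbf{b}$. Therefore the log-probability is
$$\log p(\mathbf{b} \mid \boldsymbol{\varepsilon}) = \log p(\boldsymbol{\varepsilon} \mid \mathbf{b}) + \cdots = -\frac{1}{2} \boldsymbol{\varepsilon}^{\mathsf{T}} \mathbf{\Omega}^{-1} \boldsymbol{\varepsilon} + \cdots,$$
where the hidden terms are those that do not depend on $\mathbf{b}$, and $\log p(\boldsymbol{\varepsilon} \mid \mathbf{b})$ is the log-likelihood. The maximum a
posteriori (MAP) estimate is then the maximum likelihood estimate (MLE), which is equivalent to the optimization
problem from above,
$$\hat{\boldsymbol{\beta}} = \underset{\mathbf{b}}{\operatorname{argmin}}\ \frac{1}{2} (\mathbf{y} - \mathbf{X}\mathbf{b})^{\mathsf{T}} \mathbf{\Omega}^{-1} (\mathbf{y} - \mathbf{X}\mathbf{b}),$$
where $\mathbf{y} - \mathbf{X}\mathbf{b}$ has been substituted for $\boldsymbol{\varepsilon}$, and the optimization problem has been rewritten using the fact that the
logarithm is a strictly increasing function and that the solution of an optimization problem is unaffected by
terms in the objective function that do not depend on the optimization variable.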
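The link between the Gaussian likelihood and the GLS objective can be illustrated numerically. The sketch below assumes jointly normal errors with known covariance Ω, evaluates the negative log-likelihood at the GLS estimate, and compares it with values at randomly perturbed coefficient vectors. The function name `neg_log_likelihood` and the simulated data are hypothetical.

```python
import numpy as np

def neg_log_likelihood(b, X, y, omega):
    """Negative Gaussian log-likelihood of the residual y - X b with covariance Ω."""
    resid = y - X @ b
    n = y.size
    _, logdet = np.linalg.slogdet(omega)
    quad = resid @ np.linalg.solve(omega, resid)   # squared Mahalanobis length
    # Only `quad` depends on b; the other terms are constants.
    return 0.5 * (quad + logdet + n * np.log(2 * np.pi))

rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
A = rng.normal(size=(n, n))
omega = A @ A.T / n + np.eye(n)                    # illustrative covariance
y = X @ np.array([1.0, -2.0]) + np.linalg.cholesky(omega) @ rng.normal(size=n)

beta_gls = np.linalg.solve(X.T @ np.linalg.solve(omega, X),
                           X.T @ np.linalg.solve(omega, y))

nll_at_gls = neg_log_likelihood(beta_gls, X, y, omega)
nll_nearby = [
    neg_log_likelihood(beta_gls + rng.normal(scale=0.1, size=2), X, y, omega)
    for _ in range(5)
]
print(all(value > nll_at_gls for value in nll_nearby))
```

Because the objective is a strictly convex quadratic in b when X has full column rank, every perturbed point has a larger negative log-likelihood than the GLS estimate.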
In feasible generalized least squares (FGLS), where the error covariance matrix is unknown, estimation proceeds in
two stages:
1. The model is estimated by OLS or another consistent (but inefficient) estimator, and the residuals
are used to build a consistent estimator of the error covariance matrix (to do so, one often needs to
impose additional constraints on the model; for example, if the errors follow a time series
process, a statistician generally needs some theoretical assumptions on this process to ensure that
a consistent estimator is available).
2. Then GLS is applied with the consistent estimate of the error covariance matrix in place of the
true covariance matrix.
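The two stages above can be sketched for one specific case. The code below assumes, purely for illustration, that the errors follow a stationary AR(1) process, so that the covariance matrix is proportional to ρ^|i−j| and only ρ has to be estimated from the OLS residuals. The function names `ols` and `fgls_ar1` and the simulated data are hypothetical; other error structures need a different covariance estimator.

```python
import numpy as np

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def fgls_ar1(X, y):
    """Two-stage FGLS under an assumed AR(1) error process."""
    # Stage 1: OLS, then estimate rho by regressing e_t on e_{t-1}.
    resid = y - X @ ols(X, y)
    rho_hat = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
    # Stage 2: build the estimated covariance (up to a scale factor, which
    # does not affect the coefficient estimate) and apply the GLS formula.
    idx = np.arange(y.size)
    omega_hat = rho_hat ** np.abs(idx[:, None] - idx[None, :])
    beta = np.linalg.solve(X.T @ np.linalg.solve(omega_hat, X),
                           X.T @ np.linalg.solve(omega_hat, y))
    return beta, rho_hat

# Simulated regression with AR(1) errors (illustrative values).
rng = np.random.default_rng(3)
n, rho = 300, 0.6
X = np.column_stack([np.ones(n), rng.normal(size=n)])
shocks = rng.normal(size=n)
errors = np.empty(n)
errors[0] = shocks[0] / np.sqrt(1 - rho**2)       # draw from the stationary distribution
for t in range(1, n):
    errors[t] = rho * errors[t - 1] + shocks[t]
y = X @ np.array([1.0, 2.0]) + errors

beta_fgls, rho_hat = fgls_ar1(X, y)
print(beta_fgls, rho_hat)
```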
Whereas GLS is more efficient than OLS under heteroscedasticity (also spelled heteroskedasticity) or
autocorrelation, this is not necessarily true for FGLS. The feasible estimator is asymptotically more efficient
(provided the error covariance matrix is consistently estimated), but for a small to medium-sized sample, it can
actually be less efficient than OLS. This is why some authors prefer to use OLS and reformulate their inferences by simply
considering an alternative estimator for the variance of the estimator robust to heteroscedasticity or serial
autocorrelation. However, for large samples, FGLS is preferred over OLS under heteroskedasticity or serial
correlation.[3][4] A cautionary note is that the FGLS estimator is not always consistent. One case in which FGLS
might be inconsistent is if there are individual-specific fixed effects.[5]
In general, this estimator has different properties than GLS. For large samples (i.e., asymptotically), all properties
are (under appropriate conditions) common with respect to GLS, but for finite samples, the properties of FGLS
estimators are unknown: they vary dramatically with each particular model, and as a general rule, their exact
distributions cannot be derived analytically. For finite samples, FGLS may be less efficient than OLS in some
cases. Thus, while GLS can be made feasible, it is not always wise to apply this method when the sample is small.
A method used to improve the accuracy of the estimators in finite samples is to iterate; that is, to take the residuals
from FGLS to update the errors' covariance estimator and then update the FGLS estimation, applying the same idea
iteratively until the estimators vary less than some tolerance. However, this method does not necessarily improve
the efficiency of the estimator very much if the original sample was small.
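A minimal sketch of the iteration, under the assumption (made only for this example) that the errors are uncorrelated with one unknown variance per known group of observations. The function names, the `groups` labels, the tolerance, and the iteration cap are all hypothetical choices.

```python
import numpy as np

def weighted_ls(X, y, var):
    """GLS with diagonal covariance diag(var), i.e. weighted least squares."""
    w = 1.0 / var
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

def iterated_fgls(X, y, groups, tol=1e-8, max_iter=100):
    """Iterate FGLS assuming one error variance per group (sketch assumption)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]    # start from OLS
    labels = np.unique(groups)
    for iteration in range(1, max_iter + 1):
        resid = y - X @ beta
        var = np.empty_like(y)
        for g in labels:                           # update the variance estimates
            mask = groups == g
            var[mask] = np.mean(resid[mask] ** 2)
        beta_new = weighted_ls(X, y, var)          # update the coefficients
        if np.max(np.abs(beta_new - beta)) < tol:  # stop when estimates settle
            return beta_new, iteration
        beta = beta_new
    return beta, max_iter

# Simulated data with three variance groups (illustrative values).
rng = np.random.default_rng(4)
groups = np.repeat([0, 1, 2], 50)
sigma = np.array([0.5, 1.0, 3.0])[groups]
X = np.column_stack([np.ones(groups.size), rng.normal(size=groups.size)])
y = X @ np.array([1.0, 2.0]) + sigma * rng.normal(size=groups.size)

beta_iter, n_iter = iterated_fgls(X, y, groups)
print(beta_iter, n_iter)
```

The stopping rule compares successive coefficient vectors; a criterion based on the variance estimates would be an equally reasonable choice.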
A reasonable option when samples are not too large is to apply OLS but discard the classical variance estimator
(which is inconsistent in this framework) and instead use a HAC (Heteroskedasticity and Autocorrelation
Consistent) estimator. In the context of autocorrelation, the Newey–West estimator can be used, and in
heteroscedastic contexts, the Eicker–White estimator can be used instead. This approach is much safer, and it is the
appropriate path to take unless the sample is large, where "large" is sometimes a slippery issue (e.g., if the error
distribution is asymmetric the required sample will be much larger).
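As a sketch of this alternative, the code below keeps the OLS coefficients and replaces the classical variance estimator with a sandwich estimator. With `max_lag=0` it is the Eicker–White (HC0) form; with `max_lag > 0` it adds Newey–West terms with Bartlett weights. The function name `ols_with_hac`, the lag choice, and the simulated data are assumptions, and no small-sample correction is applied.

```python
import numpy as np

def ols_with_hac(X, y, max_lag=0):
    """OLS coefficients with a Newey–West (Bartlett kernel) covariance.

    max_lag=0 reduces to the Eicker–White (HC0) estimator.
    """
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    scores = X * resid[:, None]                  # row t is e_t * x_t
    meat = scores.T @ scores                     # sum of e_t^2 x_t x_t'
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1)       # Bartlett weight
        gamma = scores[lag:].T @ scores[:-lag]   # lagged cross-products
        meat += weight * (gamma + gamma.T)
    cov = xtx_inv @ meat @ xtx_inv               # sandwich form
    return beta, cov

# Simulated heteroscedastic data (illustrative values).
rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + np.abs(x) * rng.normal(size=n)

beta_ols, cov_white = ols_with_hac(X, y, max_lag=0)
_, cov_nw = ols_with_hac(X, y, max_lag=4)
print(beta_ols)
print(np.sqrt(np.diag(cov_white)), np.sqrt(np.diag(cov_nw)))
```

Only the standard errors change between the two calls; the coefficient estimates are the OLS estimates in both.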
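For simplicity, consider the model for heteroscedastic and non-autocorrelated errors. Assume that the variance-
covariance matrix $\mathbf{\Omega}$ of the error vector is diagonal, or equivalently that errors from distinct observations are
uncorrelated. Then each diagonal entry may be estimated from the fitted residuals $\hat{u}_j$, so $\hat{\mathbf{\Omega}}_{\text{OLS}}$ may be constructed
by:
$$\hat{\mathbf{\Omega}}_{\text{OLS}} = \operatorname{diag}\left( \hat{\sigma}_1^2, \hat{\sigma}_2^2, \dots, \hat{\sigma}_n^2 \right).$$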
It is important to notice that the squared residuals cannot be used in the previous expression; an estimator of the
errors' variances is needed. To do so, a parametric heteroskedasticity model or nonparametric estimator can be used.
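One possible parametric approach is sketched below. It assumes, for illustration only, that the error variance has the form exp(zᵢᵀγ) for known variables zᵢ, estimates γ by regressing the log of the squared OLS residuals on Z, and then uses the fitted (smoothed) variances as weights. The function name `fgls_log_variance` and the simulated data are hypothetical.

```python
import numpy as np

def fgls_log_variance(X, y, Z):
    """FGLS assuming Var[e_i | Z] = exp(z_i' gamma) (a modelling assumption)."""
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_ols
    # Auxiliary regression of log squared residuals on Z. The fitted intercept
    # is biased, but that only rescales all variances by a common factor,
    # which leaves the weighted estimate unchanged.
    gamma = np.linalg.lstsq(Z, np.log(resid ** 2), rcond=None)[0]
    var_hat = np.exp(Z @ gamma)                 # fitted variances, always positive
    w = 1.0 / var_hat
    beta_fgls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta_fgls, var_hat

# Simulated data whose error variance depends on the regressor (illustrative).
rng = np.random.default_rng(6)
n = 400
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_var = np.exp(0.5 + 1.0 * x)
y = X @ np.array([1.0, 2.0]) + np.sqrt(true_var) * rng.normal(size=n)

beta_fgls, var_hat = fgls_log_variance(X, y, Z=X)
print(beta_fgls)
```

The fitted values from the auxiliary regression, rather than the raw squared residuals, serve as the variance estimates.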
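Under regularity conditions, the FGLS estimator (or the estimator of its iterations, if a finite number of iterations are
conducted) is asymptotically distributed as:
$$\sqrt{n}\left( \hat{\boldsymbol{\beta}}_{\text{FGLS}} - \boldsymbol{\beta} \right) \ \xrightarrow{d}\ \mathcal{N}(0, \mathbf{V}),$$
where $n$ is the number of samples and
$$\mathbf{V} = \operatorname{p\text{-}lim}\left( \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X} / n \right)^{-1},$$
where p-lim denotes the limit in probability.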
See also
Confidence region
Effective degrees of freedom
Prais–Winsten estimation
References
1. Aitken, A. C. (1935). "On Least Squares and Linear Combinations of Observations". Proceedings of
the Royal Society of Edinburgh. 55: 42–48. doi:10.1017/s0370164600014346
(https://doi.org/10.1017%2Fs0370164600014346).
2. Strutz, T. (2016). Data Fitting and Uncertainty (A practical introduction to weighted least squares
and beyond). Springer Vieweg. ISBN 978-3-658-11455-8. Chapter 3.
3. Baltagi, B. H. (2008). Econometrics (4th ed.). New York: Springer.
4. Greene, W. H. (2003). Econometric Analysis (5th ed.). Upper Saddle River, NJ: Prentice Hall.
5. Hansen, Christian B. (2007). "Generalized Least Squares Inference in Panel and Multilevel Models
with Serial Correlation and Fixed Effects". Journal of Econometrics. 140 (2): 670–694.
doi:10.1016/j.jeconom.2006.07.011 (https://doi.org/10.1016%2Fj.jeconom.2006.07.011).
Further reading
Amemiya, Takeshi (1985). "Generalized Least Squares Theory"
(https://books.google.com/books?id=0bzGQE14CwEC&pg=PA181). Advanced Econometrics
(https://archive.org/details/advancedeconomet00amem). Harvard University Press. ISBN 0-674-00560-0.
Johnston, John (1972). "Generalized Least-squares"
(https://books.google.com/books?id=BZtvwZAGyV0C&pg=PA208). Econometric Methods (Second ed.).
New York: McGraw-Hill. pp. 208–242.
Kmenta, Jan (1986). "Generalized Linear Regression Model and Its Applications"
(https://books.google.com/books?id=Bxq7AAAAIAAJ&pg=PA607). Elements of Econometrics (Second ed.).
New York: Macmillan. pp. 607–650. ISBN 0-472-10886-7.
Beck, Nathaniel; Katz, Jonathan N. (September 1995). "What To Do (and Not to Do) with Time-Series Cross-Section Data"
(https://www.cambridge.org/core/journals/american-political-science-review/article/abs/what-to-do-and-not-to-do-with-timeseries-crosssection-data/0E778B85AB008DAF8D13E0AC63505E37).
American Political Science Review. 89 (3): 634–647. doi:10.2307/2082979
(https://doi.org/10.2307%2F2082979). ISSN 1537-5943 (https://www.worldcat.org/issn/1537-5943).
JSTOR 2082979 (https://www.jstor.org/stable/2082979). S2CID 63222945
(https://api.semanticscholar.org/CorpusID:63222945).