The Overlapping Data Problem
Abstract
Overlapping data are often used in finance and economics, but applied work often uses inefficient
estimators. The article evaluates possible reasons for using overlapping data and provides a guide
about which estimator to use in a given situation.
JEL Classifications: C13.
Keywords: Autocorrelation, Monte Carlo, Newey-West, overlapping data.
a Department of Agricultural Economics, Mississippi State University, PO Box 5187, Mississippi State, MS 39762, U.S.A.; tel.: +1 662 325-2044; email: [email protected]
b Corresponding author. Department of Agricultural Economics, Oklahoma State University, 308 Ag Hall, Stillwater, OK 74075, U.S.A.; tel.: +1 405 744-6836; email: [email protected]
1 Introduction
Time series studies estimating multiple-period changes can use overlapping data in order to
achieve greater efficiency. A common example is using annual returns when monthly data
are available. A one-year change could be calculated from January to December, another
from February to January, and so on. In this example the January to December and
February to January changes would overlap for eleven months. The overlapping of
observations creates a moving average (MA) error term and thus ordinary least squares
(OLS) parameter estimates would be inefficient and hypothesis tests biased (Hansen and
Hodrick, 1980). Past literature has recognized the presence of the moving average error
term. Our article seeks to improve econometric practice when dealing with overlapping
data by synthesizing and adding to the literature on overlapping data. We find limited
statistical reasons for not using the disaggregate data and that the preferred estimation
method can vary depending on the specific problem.
One way of dealing with the overlapping observations problem is to use a reduced
sample in which none of the observations overlap. For the example given above, the
reduced sample will have only one observation per year. Thus, for a 30-year period of
monthly data only 30 annual changes or observations will be used instead of 349 (the maximum number of overlapping observations that can be created for this period) annual
observations. This procedure will eliminate the autocorrelation problem but it is obviously
highly inefficient. A second way involves using average data. For our example this means
using the average of the 12 overlapping observations that can be created for each year. This
procedure results in the same degree of data reduction and apparently ‘uses’ all the
information. In fact, not only is it inefficient, it also does not eliminate the moving average
error term (Gilbert, 1986) and can introduce autocorrelation not present in the original series
(Working, 1960). A third way is to use the overlapping data and to account for the moving
average error term in hypothesis testing. Several heteroskedasticity and autocovariance
consistent (HAC) estimators have been constructed that can provide asymptotically valid
hypothesis tests when using data with overlapping observations. These HAC estimators
include Hansen and Hodrick (HH, 1980), Newey and West (NW, 1987), Andrews and Monahan (AM, 1992), and West (1997). A fourth way is to “transform the long-horizon overlapping regression into a non-overlapping regression of one-period returns onto a set of transformed regressors” (Britten-Jones and Neuberger (B-JN), 2004). A final way is to use OLS estimation with overlapping data, which yields biased hypothesis tests.
To illustrate the extent of the problem, we counted the number of empirical articles using overlapping data in regression analysis in three journals during 1996 and 2004. The journals were The Journal of Finance, The American Economic Review, and The Journal of Futures Markets. The methods of estimation are classified as OLS with
non-overlapping data (OLSNO), OLS with the Newey-West (1987) variance covariance
estimator, OLS with any of the other generalized method of moments (GMM) estimators,
and just OLS.
The portion of articles using overlapping data increased from 1996 to 2004 (Table 1) so
that the majority of articles in finance now use overlapping data. Most of the empirical
articles that used overlapping data studied asset returns or economic growth. A common
feature of these articles is that returns or growth are measured over a period longer than the
observation period. For example, data are observed monthly and the estimation is done
annually. Authors provide several possible reasons for using aggregated data. The most
common reason given is measurement error in independent variables. For example, Jones
and Kaul (1996, p. 469), state that they select “use of quarterly data on all variables as a
compromise between the measurement errors in monthly data...”. Most authors provide no
justification for using overlapping data, but there must be some advantage to using it or it
would not be so widely used. Britten-Jones and Neuberger (2004) contend that the use of overlapping data is based more on economic than on statistical grounds. Here, we
evaluate possible statistical reasons for using overlapping data.
Table 1 also shows each estimation method's frequency of use. The OLSNO and
Newey-West estimation methods are used most often. We defined OLSNO as estimation
using non-overlapping observations. This means that the data exist to create overlapping
observations but the researchers chose to work with non-overlapping observations. It might
be more correct to say that OLSNO is used simply because it is not standard practice to create
overlapping data. The OLSNO method will yield unbiased and consistent parameter
estimates and valid hypothesis tests. But it will be inefficient since it “throws away
information.”
We first demonstrate that the commonly used Newey-West and OLSNO methods can be
grossly inefficient ways of handling the overlapping data problem. This is done by
determining and comparing the small-sample properties of Newey-West, OLSNO,
maximum likelihood estimation (MLE), and generalized least squares (GLS) estimates.
Unrestricted MLE is included as an alternative to GLS to show what happens when the MA
coefficients are estimated.1 Then, we consider possible statistical reasons for using
overlapping data such as nonnormality, missing data, and errors in variables. Finally, we
evaluate ways of handling overlapping data when there are economic reasons for doing so.
While Newey-West and OLSNO estimation provide inefficient estimates, the GLS
estimation cannot be applied in every situation involving overlapping data. An example
would be when lagged values of the dependent variable or some other endogenous variable
are used as an explanatory variable. In this case, as Hansen and Hodrick (1980) argue, the
GLS estimates will be inconsistent since an endogeneity problem is created when the
dependent and explanatory variables are transformed. For the specific case of overlapping
data considered by Hansen and Hodrick (1980), we have little to add to the previous
literature (e.g. Mark, 1995) that favors using the bootstrap to correct the small sample bias
in the Hansen and Hodrick approach. With a general multivariate time series model, often
overlapping data cannot be used to recover estimates of the disaggregate process that
generated the data. The percentage of cases where lagged values of the dependent variable
are used as an explanatory variable is reported in Table 1. In The Journal of Finance less
than 25 percent of articles included a lagged dependent variable as an explanatory variable
(half with the Newey-West estimator and half with OLSNO). In the American Economic
Review about 7 percent (all with the Newey-West estimator) of the articles included a
1 With normality, the GLS estimator is the maximum likelihood estimator. The true MLE would have the parameters of the moving average process known rather than estimated. Such a restricted MLE should be considered with large sample sizes since it uses less storage than GLS.
lagged dependent variable. Thus, in most cases where nonoverlapping data are used, there
are no lagged dependent variables and so more precise estimation methods are available.
The rest of the paper is structured as follows. Section 2 considers the simplest case
where the data represent aggregates and the explanatory variables are strictly exogenous.
Section 3 discusses the OLSNO and Newey-West estimation methods and their
inefficiency. Section 4 conducts a Monte Carlo study to determine the size and power of the
hypothesis tests when using overlapping data and GLS, OLSNO, Newey-West, and
unrestricted MLE estimation methods. Sections 5 and 6 consider possible statistical and
economic reasons, respectively, for using overlapping data. Section 7 discusses how to
handle overlapping data in several special cases that do not fit any of the standard
procedures. Section 8 concludes.
2 Overlapping Data with Strictly Exogenous Regressors

There are many variations on the overlapping data problem. We first consider the simplest
case where the data represent aggregates and the explanatory variables are strictly
exogenous. This is the most common case in the literature such as when annual data are
used for dependent and independent variables and monthly data are available for both.
To consider the overlapping data problem, start with the following regression equation:

$$y_t = \beta' x_t + u_t, \qquad (1)$$

where the $u_t$ are uncorrelated, homoskedastic errors with zero mean. The corresponding regression with overlapping observations is

$$Y_t = \beta' X_t + e_t, \qquad (2)$$

where the overlapping observations are formed by summing the disaggregate data over $k$ periods,

$$Y_t = \sum_{j=0}^{k-1} y_{t+j}, \qquad X_t = \sum_{j=0}^{k-1} x_{t+j}, \qquad e_t = \sum_{j=0}^{k-1} u_{t+j}, \qquad (3)$$

and $k$ is the number of periods for which the changes are estimated. If $n$ is the original sample size, then $n - k + 1$ is the new sample size. These transformations of the dependent and independent variables induce an MA process in the error terms of equation (2). Because the original error terms were uncorrelated with zero mean, it follows that

$$E[e_t] = E\!\left[\sum_{j=0}^{k-1} u_{t+j}\right] = \sum_{j=0}^{k-1} E[u_{t+j}] = 0. \qquad (4)$$
Also, since the successive values of $u_t$ are homoskedastic and uncorrelated, the unconditional variance of $e_t$ is

$$\mathrm{var}[e_t] = \sigma_e^2 = E[e_t^2] = k\sigma_u^2. \qquad (5)$$

Because two different error terms, $e_t$ and $e_{t+s}$, have $k-s$ common original error terms, $u$, for any $k-s>0$, the covariances and correlations between the error terms are

$$\mathrm{cov}[e_t, e_{t+s}] = E[e_t e_{t+s}] = (k-s)\sigma_u^2 \quad \forall\,(k-s)>0, \qquad (6)$$

$$\mathrm{corr}[e_t, e_{t+s}] = \frac{k-s}{k} \quad \forall\,(k-s)>0. \qquad (7)$$
Collecting terms, we have as an example the case of $n = k + 2$:

$$\Omega = \begin{bmatrix}
1 & \frac{k-1}{k} & \cdots & \frac{k-s}{k} & \cdots & \frac{1}{k} & 0 & \cdots & 0 \\
\frac{k-1}{k} & 1 & \frac{k-1}{k} & \cdots & \frac{k-s}{k} & \cdots & \frac{1}{k} & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \ddots & & & \ddots & \vdots \\
0 & \cdots & 0 & \frac{1}{k} & \cdots & \frac{k-s}{k} & \cdots & \frac{k-1}{k} & 1
\end{bmatrix}, \qquad (8)$$

where $\Omega$ is the correlation matrix. The correlation matrix $\Omega$ appears in Gilbert's (1986) article, and the presence of a moving average error term is commonly recognized.
With $\Omega$ derived analytically, the GLS parameter estimates and their variance-covariance matrix can be obtained as follows:

$$\hat{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y \qquad (9)$$

and

$$\mathrm{var}[\hat{\beta}] = \sigma_e^2\,(X'\Omega^{-1}X)^{-1}. \qquad (10)$$
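Because $\Omega$ depends only on $k$, both the matrix and the GLS step are straightforward to program. The following is a minimal sketch in Python/NumPy (function names are illustrative, not from the original study) of building $\Omega$ from equation (8) and computing equations (9) and (10):

```python
import numpy as np

def overlap_corr_matrix(m, k):
    """Correlation matrix of equation (8): ones on the diagonal, (k - s)/k at
    lag s, and zero once two observations share no disaggregate error terms."""
    s = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    return np.where(s < k, (k - s) / k, 0.0)

def gls(X, Y, omega):
    """GLS estimates (9) and their covariance matrix (10) for a known Omega."""
    omega_inv = np.linalg.inv(omega)
    xtox = X.T @ omega_inv @ X
    beta = np.linalg.solve(xtox, X.T @ omega_inv @ Y)
    resid = Y - X @ beta
    sigma2_e = (resid @ omega_inv @ resid) / (len(Y) - X.shape[1])  # estimate of sigma_e^2
    return beta, sigma2_e * np.linalg.inv(xtox)
```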
3 Inefficiency of the OLSNO and Newey-West Estimators

The next issue to be discussed is the OLSNO and Newey-West estimation methods and their inefficiency. We consider only Newey-West rather than the alternative GMM
estimators. As Davidson and MacKinnon (1993, p. 611) say “the Newey-West estimator is
never greatly inferior to that of the alternatives.” With the Newey-West estimation method,
parameter estimates are obtained by using OLS. The OLS estimate βˆ is unbiased and
consistent but inefficient. The OLS estimate of σe2 is biased and inconsistent. The Newey
and West (1987) autocorrelation consistent covariance matrix is computed using the OLS
residuals.
The OLSNO estimation method estimates parameters using OLS with a reduced sample
where the observations do not overlap. The OLSNO estimates of the variance are unbiased
since with no overlap there is no autocorrelation. The OLSNO parameter estimates are less
efficient than the GLS estimates because of the reduced number of observations used in
estimation.
While it is known that GLS is the preferred estimator, the loss from using one of the
inferior estimators in small samples is not known. We use a Monte Carlo study to provide
information about the small-sample differences among the estimators.
4 Monte Carlo Study

A Monte Carlo study was conducted to determine the size and power of the hypothesis tests
when using overlapping data and GLS, OLSNO, Newey-West, and unrestricted MLE,
estimation methods. The Monte Carlo study also provides a measure of the efficiency lost
from using OLSNO or Newey-West and from estimating the MA coefficients. The mean
and the variance of the parameter estimates are calculated to measure bias and efficiency.
Mean-squared error (MSE) is also computed. To determine the size of the hypothesis tests,
the percentage of rejections of true null hypotheses is calculated. To determine the power of the hypothesis tests, the percentage of rejections of false null hypotheses is calculated.
Data are generated using Monte Carlo methods. A single independent2 variable x with an
i.i.d. uniform distribution (0,1) and error terms u with a standard normal distribution are
generated. We also considered a N(0,1) for x but these results are not included here since
the conclusions did not change. The options RANUNI and RANNOR in SAS software are
used. The dependent variable y is calculated based on the relation represented in equation
(1). For simplicity β is assumed equal to one. The data set with overlapping observations
of X and Y is created by summing the x’s and y’s as in equation (3).
The regression defined in equation (2) was estimated using the set of data containing X
and Y. The number of replications is 2000. For each of the 2000 original samples, different
vectors x and u are used. This is based on Edgerton’s (1996) findings that using stochastic
exogenous variables in Monte Carlo studies improves considerably the precision of the
estimates of power and size. Six sample sizes T are used, respectively, 30, 100, 200, 500,
1000, and 2000. Three levels of overlapping k-1 are used, respectively, 1, 11, and 29. The
level 11 is chosen because it corresponds to using annual changes when monthly data are
available.
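The data-generating step can be sketched compactly as follows (in Python rather than the SAS RANUNI and RANNOR routines used in the study; names are illustrative). It generates the disaggregate series of equation (1), aggregates it into overlapping observations as in equation (3), and also returns the non-overlapping subsample used by OLSNO:

```python
import numpy as np

def make_overlapping_sample(n, k, beta=1.0, seed=None):
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)        # single regressor, i.i.d. uniform(0, 1)
    u = rng.standard_normal(n)               # disaggregate errors, N(0, 1)
    y = beta * x + u                          # equation (1) with beta = 1
    w = np.ones(k)
    X = np.convolve(x, w, mode='valid')       # overlapping k-period sums, equation (3)
    Y = np.convolve(y, w, mode='valid')       # new sample size is n - k + 1
    return X, Y, X[::k], Y[::k]               # last two: non-overlapping subsample for OLSNO
```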
2 When the error term follows a first-order autoregressive process, Greene (1997, p. 589) finds that the inefficiency of OLS relative to GLS increases when the x's are positively autocorrelated. Since many real-world datasets have explanatory variables that are positively autocorrelated, the inefficiency of OLS found here may be conservative.
The OLSNO, the Newey-West, and GLS estimates of β were obtained for each of the
2000 samples using PROC IML in SAS software. The unrestricted MLE estimates of β
were obtained using PROC ARIMA in SAS. The Ω matrix to be used in GLS estimation
was derived in equation (8). The Newey-West estimation was validated by comparing it with the estimator programmed in SHAZAM software, using the OLS ... / AUTCOV option. The power of the tests is calculated for the null hypothesis β = 0.
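For readers without SAS, the unrestricted MLE step can be approximated with the statsmodels state-space routines, which estimate the regression with a freely parameterized MA(k−1) error term by maximum likelihood; this is a sketch of an assumed stand-in for PROC ARIMA, not the code used in the study:

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

def mle_overlap(Y, X, k):
    # Regression of Y on X with a freely estimated MA(k-1) error: order=(0, 0, k-1).
    exog = np.asarray(X).reshape(len(Y), -1)
    res = SARIMAX(Y, exog=exog, order=(0, 0, k - 1), trend='n').fit(disp=False)
    return res.params, res.bse   # parameter estimates and their standard errors
```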
The means of the parameter estimates and their standard deviations as well as the MSE
values for the three overlapping levels 1, 11, and 29, for the OLSNO, Newey-West, and
GLS are presented in Tables 2, 3, and 4. The true standard deviations for the GLS
estimation are lower than those for the OLSNO and Newey-West estimation. This
demonstrates that the Newey-West and OLSNO parameter estimates are less efficient than
the GLS estimates. The inefficiency is greater as the degree of overlapping increases and as
the sample size decreases. For a sample size of 100 and overlapping level 29, the sample
variance of the GLS estimates is 0.119 while the sample variance of the Newey-West and
OLSNO estimates is 2.544 and 7.969, respectively. Besides the more efficient parameter
estimates, the difference between the estimated and actual standard deviations of the
parameter estimates is almost negligible for the GLS estimation regardless of sample size
or overlapping level. The estimated standard deviations for the OLSNO estimation show no
biases as expected. The Newey-West estimation tends to underestimate the actual standard
deviations even for overlapping level 1. The degree of underestimation increases with the
increase of overlapping level and as sample size decreases. Sometimes the estimated
standard deviation is only one-fourth of the true value. The Newey-West covariance
estimates have previously been found to be biased downward in small samples (e.g.
Goetzmann and Jorion, 1993; Nelson and Kim, 1993; Britten-Jones and Neuberger, 2004).
The parametric bootstrap suggested by Mark (1995) and used by Irwin, Zulauf and Jackson
(1996) can lead to tests with correct size, but still uses the inefficient OLS estimator.
The inferiority of the Newey-West and OLSNO parameter estimates compared to the
GLS estimates is also supported by the MSE values computed for the three methods of
estimation. Thus, for the sample size 100 and the overlapping level 29, the MSE for the
GLS, Newey-West, and OLSNO estimation is respectively 0.12, 2.55, and 8.02.
Table 2: Parameter estimates, standard deviations, and MSE for OLSNO, Newey-West, and GLS estimation (overlapping 1).

Sample   GLS Estimation                    Newey-West Estimation             Non-overlapping Estimation
Size     Estimate  Std. Dev. (a/b)  MSE    Estimate  Std. Dev. (a/b)  MSE    Estimate  Std. Dev. (a/b)  MSE
30       0.981     0.639/0.663      0.440  0.971     0.631/0.808      0.654  0.970     0.893/0.930      0.865
100      1.005     0.348/0.345      0.119  0.996     0.374/0.423      0.179  0.997     0.490/0.497      0.247
200      0.993     0.246/0.244      0.060  0.993     0.269/0.303      0.092  0.989     0.346/0.345      0.119
500      1.001     0.155/0.154      0.024  1.003     0.172/0.189      0.036  1.001     0.219/0.218      0.048
1000     1.001     0.110/0.109      0.012  0.997     0.122/0.134      0.018  1.005     0.155/0.156      0.024
2000     1.002     0.077/0.082      0.007  0.998     0.086/0.098      0.010  1.002     0.110/0.116      0.014

Notes: The sample sizes are the sizes for samples with overlapping observations. a These are the estimated standard deviations of the parameter estimates. b These are the actual standard deviations of the parameter estimates. The model estimated is equation (2), where Yt and Xt represent some aggregation of the original disaggregated variables. For simplicity β is chosen equal to 1. The model is estimated using Monte Carlo methods involving 2000 replications. The errors for the original process are generated from a standard normal distribution and are homoskedastic and not autocorrelated. As a result of the aggregation, et follows an MA process with the degree of the process depending on the aggregation level applied to x and y.
Table 3: Parameter estimates, standard deviations, and MSE for OLSNO, Newey-
West, and GLS estimation (overlapping 11).
GLS Estimation Newey-West Estimation Non-overlapping Estimation
Sample Parameter Standard MSE Parameter Standard MSE Parameter Standard MSE
Size Estimates Deviations Estimates Deviations Estimates Deviations
Notes: The sample sizes are the sizes for samples with overlapping observations. a These are the estimated
standard deviations of the parameter estimates. b These are the actual standard deviations of the parameter
estimates.
Table 4: Parameter estimates, standard deviations, and MSE for OLSNO, Newey-
West, and GLS estimation (overlapping 29).
GLS Estimation Newey-West Estimation Non-overlapping Estimation
Sample Parameter Standard MSE Parameter Standard MSE Parameter Standard MSE
Size Estimates Deviations Estimates Deviations Estimates Deviations
Notes: The sample sizes are the sizes for samples with overlapping observations. a These are the estimated
standard deviations of the parameter estimates. b These are the actual standard deviations of the parameter
estimates. c These values cannot be estimated because of the very small number of observations.
The means of the parameter estimates and their standard deviations as well as the MSE
values for the three overlapping levels 1, 11, and 29, for the unrestricted MLE are presented
in Table 5. The results are similar to the results presented for the GLS estimation.
However, in small samples the actual standard deviations of the MLE estimates are larger
than those of the GLS estimates. As the degree of overlapping increases, the sample size
for which the standard deviations for both methods are similar also increases (e.g. from 100
for overlapping 1 to 1000 for overlapping 29).
Table 5: Parameter estimates, standard deviations, and MSE for the maximum
likelihood estimates assuming the MA coefficients are unknown for three
levels of overlapping (1, 11, and 29).
Overlapping 1 Overlapping 11 Overlapping 29
Sample Parameter Standard MSE Parameter Standard MSE Parameter Standard MSE
Size Estimates Deviations Estimates Deviations Estimates Deviations
Notes: The sample sizes are the sizes for samples with overlapping observations. a These are the estimated
standard deviations of the parameter estimates. b These are the actual standard deviations of the parameter
estimates. c These values cannot be estimated because of the very small number of observations.
The Newey-West and OLSNO estimation methods also perform considerably worse
than the GLS estimation in hypothesis testing. The hypothesis testing results are presented
in Table 6. The Newey-West estimator rejects true null hypotheses far too often. In one
extreme case, it rejected a true null hypothesis 50.0% of the time instead of the expected
5%. In spite of greatly underestimating standard deviations, the Newey-West estimator has
considerably less power than GLS except with the smallest sample sizes considered. While
the OLSNO estimation has the correct size, the power of the hypothesis tests is much less
than the power of the tests with GLS.
The results of the hypothesis tests for the unrestricted MLE are presented in Table 7.
While the power of the hypothesis tests is similar to the power for the GLS estimation, the
size is generally larger than the size for the GLS estimation. Unrestricted MLE tends to
reject true null hypotheses more often than it should. However, this problem is reduced or
eliminated as larger samples are used, i.e. 500, 1000, 2000 observations. Table 7 also
presents the number of replications as well as the number/percentage of replications that
converge. Fewer replications converge as the degree of overlap increases and as sample
size decreases. Given the convergence problems, as shown in Table 7, it can be concluded
that, when MLE is chosen as the method of estimating equation (2), the MA coefficients
should be restricted rather than estimated unless the sample size is quite large.
Table 6: Power and Size Values of the Hypothesis Tests for OLSNO, Newey-West, and
GLS Estimation (Overlapping 1, 11, 29).
Degree of Sample GLS Estimation Newey-West Estimation Non-overlapping Estimation
Overlapping Size
Power Size Power Size Power Size
Table 7: Power and size values of the hypothesis tests for the maximum likelihood
estimates assuming the MA coefficients are unknown for three levels of
overlap (1, 11, and 29).
Degree of Sample Total Number Replications that Converge
Overlap Size of Replications Power b Sizeb
Number Percentage
Notes: The sample sizes are the sizes for samples with overlapping observations. a These values cannot be
estimated because of the very small number of observations. b These are calculated based on the number of
replications that converged.
5 Statistical Reasons for Using Overlapping Data

If the explanatory variables were strictly exogenous, no observations were missing, and the errors were distributed normally as assumed so far, there would be no statistical reason to use overlapping data since the disaggregate model could be estimated. We now consider possible statistical reasons for using overlapping data.
5.1 Missing Observations

Missing observations can be a reason to use overlapping data. It is not unusual in studies of
economic growth to have key variables observed only every five or ten years at the start of
the observation period, but every year in more recent years. Using overlapping data allows
using all of the data. In this case, the disaggregate model cannot be estimated so OLSNO is
what has been used in the past.
When some observations are missing, one can derive the correlation matrix in equation
(8) as if all observations were available and then delete the respective rows and columns for
the missing overlapping observations and thus use GLS estimation. The Newey-West
estimator assumes autocovariance stationarity and so available software packages that
include the Newey-West estimator would not correctly handle missing observations. It
should, however, be possible to modify the Newey-West estimator to handle missing
observations. From this discussion it can be argued that missing observations are a statistical reason for using overlapping data that stands up to scrutiny, but more efficient estimators than the often-used OLSNO are available.
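A minimal sketch of that adjustment, reusing the two helper functions sketched in Section 2 and assuming the positions of the missing overlapping observations are known:

```python
import numpy as np

def gls_with_missing(X_obs, Y_obs, k, n_total, missing_idx):
    """X_obs, Y_obs contain only the observed overlapping observations; n_total is
    the number of overlapping observations that would exist with no missing data."""
    omega_full = overlap_corr_matrix(n_total, k)           # equation (8), as if complete
    keep = np.setdiff1d(np.arange(n_total), missing_idx)   # rows that were actually observed
    omega = omega_full[np.ix_(keep, keep)]                 # delete missing rows and columns
    return gls(X_obs, Y_obs, omega)
```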
5.2 Nonnormality
The GLS estimator does not assume normality, so estimates with GLS would remain best
linear unbiased and asymptotically efficient even under nonnormality. The hypothesis tests
derived, however, depend on normality. Hypothesis tests based on normality would still be
valid asymptotically provided the assumptions of the central limit theorem hold. As the
degree of overlapping increases, the residuals would approach normality, so nonnormality
would be less of a concern. The Newey-West estimator is also only asymptotically valid.
The GLS transformation of the residuals might also speed the rate of convergence toward
normality since it is “averaging” across more observations than the OLS estimator used
with Newey-West.
We estimated equation (2) with two correlated x’s and with the error term u following a
t-distribution with four degrees of freedom. Results are reported in Table 8. The main
difference with the previous results is the increased standard deviations for all methods of
estimation. Proportionally, the increase in standard deviations is slightly larger for Newey-
West and OLSNO. Thus, the Monte Carlo results support our hypothesis that the
advantages of GLS would be even greater in the presence of nonnormality. This can also be
seen from the hypothesis test results presented in Table 8. The power of the three methods
of estimation is reduced with the biggest reduction occurring for the Newey-West and
OLSNO. Finally, the increase of the standard deviations and the resulting reduction in the
power of hypothesis tests is larger when the correlation between the two x’s increases. This
is true for the three methods of estimation. However, the GLS results are almost identical to
the results from the disaggregate model. This means that lack of normality cannot be a
valid statistical reason for using overlapping data when the disaggregate data are available.
Table 8: Parameter estimates, standard deviations, MSE, and power and size of hypothesis tests for OLSNO, Newey-West, and GLS
estimation with two Xs and nonnormal errors (overlapping 1, 11, and 29).
Degree GLS Estimation Newey-West Estimation Non-overlapping Estimation Disaggregate Estimation
of Sample
Overlap Size Parameter Standard MSE Power Size Parameter Standard MSE Power Size Parameter Standard MSE Power Size Parameter Standard MSE Power Size
Estimates Deviations Estimates Deviations Estimates Deviations Estimates Deviations
1 30 1.014 0.953 a 1.007 0.208 0.046 0.997 0.898 a 1.606 0.288 0.152 1.049 1.334 a 3.220 0.201 0.128 1.012 0.977 a 1.058 0.192 0.050
1.003 b 1.267 b 1.794 b 1.029 b
100 0.969 0.498 a 0.261 0.494 0.053 0.969 0.526 a 0.386 0.460 0.095 0.999 0.700 a 0.766 0.342 0.111 0.970 0.517 a 0.276 0.467 0.052
0.510 b 0.621 b 0.875 b 0.525 b
500 1.008 0.226 a 0.050 0.988 0.051 1.005 0.249 a 0.074 0.956 0.082 0.996 0.317 a 0.152 0.832 0.117 1.008 0.236 a 0.054 0.983 0.055
0.223 b 0.273 b 0.390 b 0.233 b
1000 1.004 0.159 a 0.024 1 0.042 1.001 0.177 a 0.037 0.999 0.070 1.002 0.225 a 0.082 0.971 0.121 1.004 0.166 a 0.027 1 0.039
0.155 b 0.192 b 0.286 b 0.163 b
11 30 1.019 0.943 a 0.890 0.202 0.049 0.977 0.830 a 6.684 0.579 0.541 -- c -- c -- c -- c -- c 1.010 0.833 a 0.757 0.239 0.053
0.943 b 2.585 b -- c 0.870 b
100 0.994 0.507 a 0.274 0.498 0.052 0.998 0.915 a 2.196 0.338 0.244 0.944 2.059 a 4.975 0.072 0.051 1.001 0.502 a 0.274 0.516 0.053
0.523 b 1.482 b 2.230 b 0.523 b
500 1.008 0.226 a 0.051 0.993 0.049 1.010 0.524 a 0.439 0.517 0.138 1.035 0.810 a 0.687 0.236 0.056 1.007 0.233 a 0.054 0.988 0.052
0.225 b 0.663 b 0.828 b 0.233 b
1000 1.003 0.159 a 0.025 1 0.042 1.022 0.378 a 0.209 0.734 0.107 1.016 0.557 a 0.323 0.432 0.057 1.002 0.164 a 0.027 1 0.040
0.159 b 0.457 b 0.568 b 0.166 b
29 30 1.014 0.935 a 0.990 0.193 0.056 1.014 0.654 a 6.833 0.629 0.611 -- c -- c -- c -- c -- c 0.995 0.680 a 0.527 0.319 0.051
0.995 b 2.614 b -- c 0.726 b
100 1.009 0.507 a 0.294 0.513 0.046 0.995 0.911 a 5.420 0.505 0.455 0.982 4.919 a 81.94 0.063 0.059 1.020 0.466 a 0.237 0.599 0.041
0.543 b 2.328 b 9.052 b 0.486 b
500 1.010 0.226 a 0.051 0.989 0.050 0.958 0.759 a 1.085 0.335 0.177 0.950 1.350 a 1.920 0.103 0.052 1.009 0.228 a 0.052 0.988 0.046
0.225 b 1.041 b 1.385 b 0.229 b
1000 1.000 0.160 a 0.026 1 0.058 1.008 0.570 a 0.547 0.464 0.143 1.023 0.898 a 0.818 0.200 0.056 1.001 0.164 a 0.028 1 0.061
0.162 b 0.739 b 0.904 b 0.168 b
Notes: The sample sizes are the sizes for samples with overlapping observations. a These are the estimated standard deviations of the parameter estimates. b These are the actual
standard deviations of the parameter estimates. c These values cannot be estimated because of the very small number of observations.
5.3 Errors in Variables

The most common reason authors give for using overlapping data is errors in the explanatory variables. Errors in the explanatory variables cause parameter estimates to be
biased toward zero, even asymptotically. Using overlapping data reduces this problem, but
the problem is only totally removed as the level of overlap, k, approaches infinity.
We added to the x in equation (1) a measurement error, ω, that is distributed normally
with the same variance as the variance of x, ω ~ N(0, 1/12). We then conducted the Monte
Carlo study with x not being autocorrelated and also with x being autocorrelated with an
autoregressive coefficient of 0.8. In addition to estimating equation (2) with GLS, Newey-
West, and OLSNO, we also estimated equation (1) using the disaggregate data. The results
are reported in Table 9. The estimation was performed only for two sample sizes,
respectively 100 and 1000 observations. In the case when x is not autocorrelated, there is
no gain in using overlapping observations, in terms of reducing the bias due to measurement
error. This is true for all methods of estimation. GLS would be the preferred estimator
since it is always superior to Newey-West and OLSNO in terms of MSE, especially as the
overlapping level increases relative to the sample size.
In the case when x is autocorrelated, for a relatively low level of overlap in relation to the sample size, Newey-West and OLSNO have smaller MSE as a result of smaller bias.
As the degree of overlap increases relative to the same sample size, the GLS estimator
would be preferred compared to Newey-West and OLSNO estimators based on smaller
MSE as a result of the smaller variance. Thus the trade-off for the researcher is between
less biased parameter estimates with Newey-West or OLSNO versus smaller standard
deviations for the parameter estimates with GLS. However, the GLS transformation of the
variables does not further reduce the measurement-error bias, producing estimates that are only slightly less biased than the disaggregate estimates. On the other hand, Newey-West and
OLSNO standard errors are still biased with the bias increasing as the overlapping level
increases. So the preferred estimation method in the presence of large errors in the
variables would be OLS with overlapping data and with standard errors calculated using
Monte Carlo methods.
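The errors-in-variables design itself is easy to reproduce; the sketch below adds the measurement error ω ~ N(0, 1/12) to the regressor, with the AR(1) recursion with uniform innovations being an assumption about details the text does not spell out:

```python
import numpy as np

def make_eiv_regressor(n, ar=0.0, seed=None):
    """Return (true x, observed x*) where x* = x + w and w ~ N(0, 1/12)."""
    rng = np.random.default_rng(seed)
    innov = rng.uniform(0.0, 1.0, size=n)
    x = np.empty(n)
    x[0] = innov[0]
    for t in range(1, n):                  # ar = 0.8 reproduces the autocorrelated case
        x[t] = ar * x[t - 1] + innov[t]
    w = rng.normal(0.0, np.sqrt(1.0 / 12.0), size=n)   # measurement error
    return x, x + w
```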
Table 9: Parameter estimates, standard deviations, and MSE, for GLS, Newey-West, OLSNO, and the disaggregate estimation with measurement
errors in X (overlapping 1, 11, and 29).
Correlation Sample Degree of GLS Estimation Newey-West Estimation Non-overlapping Estimation Disaggregate Estimation
of Size Overlap
X Parameter Standard MSE Parameter Standard MSE Parameter Standard MSE Parameter Standard MSE
Estimates Deviations Estimates Deviations Estimates Deviations Estimates Deviations
0 100 1 0.494 0.252 a 0.320 0.493 0.269 a 0.354 0.494 0.360 a 0.389 0.494 0.250 a 0.318
0.252 b 0.311 b 0.361 b 0.250 b
11 0.509 0.252 a 0.310 0.512 0.479 a 0.784 0.503 0.952 a 1.303 0.510 0.239 a 0.303
0.263 b 0.739 b 1.028 b 0.251 b
29 0.495 0.253 a 0.320 0.480 0.501 a 1.675 0.390 1.789 a 5.709 0.497 0.222 a 0.303
0.254 b 1.185 b 2.310 b 0.223 b
1000 1 0.499 0.079 a 0.257 0.502 0.088 a 0.257 0.501 0.112 a 0.261 0.499 0.079 a 0.257
0.077 b 0.095 b 0.111 b 0.077 b
11 0.502 0.079 a 0.255 0.499 0.189 a 0.303 0.497 0.277 a 0.332 0.501 0.079 a 0.255
0.080 b 0.227 b 0.281 b 0.080 b
29 0.499 0.079 a 0.257 0.517 0.285 a 0.366 0.509 0.441 a 0.440 0.499 0.078 a 0.257
0.078 b 0.364 b 0.445 b 0.077 b
0.8 c 100 1 0.718 0.191 a 0.119 0.816 0.174 a 0.080 0.816 0.218 a 0.084 0.716 0.190 a 0.120
0.199 b 0.214 b 0.223 b 0.198 b
11 0.731 0.187 a 0.111 0.931 0.187 a 0.096 0.934 0.337 a 0.127 0.721 0.181 a 0.113
0.196 b 0.302 b 0.351 b 0.187 b
29 0.730 0.186 a 0.110 0.963 0.174 a 0.186 0.966 0.536 a 0.493 0.720 0.166 a 0.109
0.194 b 0.429 b 0.701 b 0.174 b
1000 1 0.735 0.058 a 0.074 0.833 0.055 a 0.032 0.832 0.066 a 0.033 0.734 0.058 a 0.074
0.060 b 0.065 b 0.067 b 0.060 b
11 0.733 0.058 a 0.075 0.940 0.071 a 0.011 0.941 0.096 a 0.013 0.732 0.058 a 0.075
0.062 b 0.086 b 0.097 b 0.062 b
29 0.736 0.058 a 0.073 0.954 0.091 a 0.016 0.950 0.135 a 0.021 0.735 0.057 a 0.074
0.061 b 0.116 b 0.138 b 0.060 b
Note: The sample sizes are the sizes for samples with overlapping observations. a These are the estimated standard deviations of the parameter estimates. b These are the actual
standard deviations of the parameter estimates. c The x is generated as follows: xt = x0t + ωt, where x0t ~ uniform (0, 1) and ωt ~ N (0, 1/12).
6 Economic Reasons for Using Overlapping Data

The economic reasons for using overlapping data typically involve lagged variables. Even though in most cases where overlapping data are used there are no lagged variables, it is important to consider this case because it has generated the most econometric
research. The lagged variable may be strictly exogenous or be a lagged value(s) of the
dependent variable. With lagged variables and overlapping data, GLS is generally
inconsistent and so the situation has generated considerable research interest. A second
economic reason for using overlapping data is the case of long-horizon regressions.
Examples of long-horizon regressions include the case of expected stock returns (as dependent variable) and dividend yields (as explanatory variable), and the case of GDP growth and nominal money supply. Since the economic reason for using overlapping
data is increased prediction accuracy, we compare prediction errors when using overlapping
data to those using disaggregate data.
6.1 Lagged Dependent Variables

The case of overlapping data and a lagged dependent variable (or some other variable that is
not strictly exogenous) was a primary motivation for Hansen and Hodrick’s (1980)
estimator. In the textbook case of autocorrelation and a lagged dependent variable, ordinary
least squares estimators are inconsistent.
Engle (1969) shows that when the first lag of the aggregated dependent variable is used
as an explanatory variable, using OLS and aggregated data could lead to bias of either
sign and almost any magnitude. Generalized least squares is also inconsistent. Consistent
estimates can be obtained using the maximum likelihood methods developed for time-series
models.
With a lagged dependent variable on the right-hand side (for simplicity we use a lag order of one), equation (1) now becomes

$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \alpha_2 x_t + u_t, \qquad u_t \sim N(0,1), \qquad (11)$$

where for simplicity $\alpha_0 = 0$ and $\alpha_2 = 1$. The value selected for $\alpha_1$ is 0.5. To get the overlapping observations, for $k = 3$ apply equation (3) to (11) to obtain the equivalent of equation (2),

$$Y_t = 0.5\,Y_{t-1} + X_t + e_t. \qquad (12)$$

The resulting error term $\varepsilon_t$ in equation (14) is an MA process of order four in the error term $u_t$.
3 See also Brewer (1973), Wei (1981), and Weiss (1984).
4 The model considered by Hansen and Hodrick (1980) is $Y_t = \beta_0 + \beta_1 Y_{t-3} + \beta_2 Y_{t-4}$.
5 Equation (14) is obtained by substituting for $Y_{t-1}$ and then for $Y_{t-2}$ in equation (12).
Table 10: Parameter estimates of different models for the case of the lagged dependent variable.
Equation Method of
Number Estimation Data Estimated Model
Note: The models in Table 10 are estimated using a large Monte Carlo sample of 500,000 observations. The unrestricted maximum likelihood estimates are
obtained using PROC ARIMA in SAS.
6.2 Long-Horizon Regressions

In long-horizon regressions the explanatory variable is typically modeled as a persistent autoregressive process,

$$x_t = \mu + \rho x_{t-1} + \nu_t. \qquad (17)$$
As Valkanov (2003, p. 205) argues “Intuitively, the aggregation of a series into a long-
horizon variable is thought to strengthen the signal, while eliminating the noise.” Several
studies have attempted to provide more efficient estimators of the standard errors compared
to the ordinary least squares estimates.6 Valkanov (2003) suggests rescaling of the t-
statistic by the square root of the sample size, Hjalmarsson (2004) suggests rescaling the t-
statistic by the square root of the level of aggregation, k, and Hansen and Tuypens (2004)
suggest rescaling the t-statistic by the square root of ⅔ of the level of aggregation, k.
Two scenarios are considered. The first one is the scenario where the returns are
unpredictable which implies that β = 0. The second is when returns are predictable, so that
β ≠ 0 (studies usually assume β = 1). The first scenario is exactly the case considered in
Section 2, so the conclusions from Section 2 apply here.
For the case when returns are predictable, β = 1, we perform Monte Carlo simulations to compare the standard errors of the GLS estimator and the OLS standard errors rescaled by the square root of the sample size (Valkanov, 2003), by the square root of the level of aggregation (Hjalmarsson, 2004), and by the square root of ⅔ of the level of aggregation (Hansen and Tuypens, 2004). We use two levels of aggregation: k = 12 with sample sizes 50, 100, 250, 500, and 1000, and k = 75 with sample sizes 200, 250, 500, 750, and 1000. 5000 replications are done for each case. We conduct Monte Carlo simulations for the commonly used assumptions of $\rho = 1$ in equation (19) and $\sigma_{12} = \sigma_{21} = \pm 0.9$ in equation (18). We also conduct simulations with $\rho$ changed to 0.9 and $\sigma_{12}$ to 0.5 and 0.1. In addition, we assume $\alpha = \mu = 0$.

6 We also conducted Monte Carlo simulations applying the transformations proposed by Britten-Jones and Neuberger (2004) to this model. Results are not reported since the β estimates from the B-JN transformations were inconsistent.
A summary of the results from the above simulation follows. In general, all estimators
and rescaling approaches produce very good power against the alternative hypothesis β = 0.
However, this is not true for the size of the tests. Size is the critical issue when using an
approximation like this because a test should be conservative so that if the test rejects the
null hypothesis, the researcher can be confident that the conclusion is correct. For the high
absolute values of σ 12 considered above, respectively 0.9 and -0.9, the most promising
approach is the one suggested by Hansen and Tuypens (2004) where standard errors are
rescaled by the square root of ⅔ of the level of aggregation. However, this approach still
produces correct test sizes only when the ratio level of aggregation/sample size is close to
1/10. Test sizes are greater than the nominal size when the ratio level of
aggregation/sample size is less than 1/10 and less than the nominal size when the ratio level
of aggregation/sample size is greater than 1/10. Our simulations suggest that better test
sizes are produced when the following adjustment is applied to the Hansen and Tuypens
approach. When the ratio is less than 1/10, the adjustment is $\tfrac{2}{3}(0.9k - 1)$; when the ratio is greater than 1/10, the adjustment is $\tfrac{2}{3}k\left(0.9 + \tfrac{k}{n}\right)$. Test sizes for the Hansen and Tuypens (HT) and our modified version of the HT approach are reported in Table 11. Table 11 also reports test sizes for the cases when $\sigma_{12}$ equals 0.5 and 0.1. In the case when $\sigma_{12} = 0.5$, the
modified Hjalmarsson (2004) rescaling of the standard errors by the square root of the level
of aggregation produces better test sizes for different sample/aggregation-level combinations. In this case, for ratios less than 1/10, the adjustment is $(0.9k - 1)$, and for ratios greater than 1/10 the adjustment is $k\left(0.9 + \tfrac{k}{n}\right)$. Finally, when $\sigma_{12} = 0.1$, the GLS standard errors
produce good test sizes. This is not surprising since a small σ 12 brings us closer to the case
of exogenous independent variables discussed in Section 2. Therefore, no modified
rescaling is needed in this last case. However, we also report in Table 11 the test sizes for
the unmodified Hjalmarsson (2004) rescaling as a comparison to the GLS test sizes.
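For reference, the three published rescalings are simple to apply to an OLS t-statistic from a long-horizon regression; a small sketch (in Python, with illustrative names) collecting them is:

```python
import numpy as np

def rescaled_t(t_ols, n, k, method='HT'):
    """Rescale an OLS t-statistic; n = sample size, k = level of aggregation."""
    if method == 'Valkanov':      # Valkanov (2003): square root of the sample size
        return t_ols / np.sqrt(n)
    if method == 'Hjalmarsson':   # Hjalmarsson (2004): square root of the aggregation level
        return t_ols / np.sqrt(k)
    if method == 'HT':            # Hansen and Tuypens (2004): square root of 2/3 of k
        return t_ols / np.sqrt(2.0 * k / 3.0)
    raise ValueError(method)
```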
Table 11: Size of hypothesis tests (β = 1) for the Hansen and Tuypens (HT) and Hjalmarsson (H) rescaling and our modified HT and H approaches for long-horizon return regressions (overlapping 11 and 74).
σ12 = 0.9
Overlapping level 11
Overlapping level 74
σ12 = 0.5
Overlapping level 11
Overlapping level 74
σ12 = 0.1
Overlapping level 11
Overlapping level 74
6.3 Prediction Accuracy

In this section we compare the prediction accuracy of the aggregate and disaggregate
models. We also compare the prediction accuracy for the Hansen and Hodrick (HH) model
given in footnote 4. The disaggregate model with the HH estimator is similar to the
aggregate model with the only change being the dependent variable is disaggregate (one-
period change). Numerous authors have argued for using overlapping data when predicting
multiperiod changes. Bhansali (1999), however, reviews the theoretical literature and finds
no theoretical advantage to using overlapping data when the number of lags is known.
Marcellino, Stock, and Watson (2006) consider a number of time series and find no
empirical advantage to using overlapping data even when using pretesting to determine the
number of lags to include. Since the models here are known, theory predicts no advantage to
using overlapping data.
The Monte Carlo simulation involves generating 50,000 sample pairs. The first sample
is used to obtain the parameter estimates for the aggregate and disaggregate models for all
cases. The second sample is used to obtain predicted values by utilizing the parameter
estimates for the first sample. Dynamic forecasting is used to obtain predicted values of the
lagged values in the disaggregate model.
The means and standard deviations of the 50,000 root-mean-squared forecast errors are
reported in Table 12. Two levels of aggregation, 12 with five sample sizes, 50, 100, 250,
500, and 1000, and 75 with sample sizes 100, 250, 500, 750, and 1000, are used in the
simulations.
In the case of long-horizon regressions the aggregate and disaggregate models are
roughly equal in forecast accuracy as expected. In the case of the HH model for low levels
of aggregation (level 12) the differences between the aggregate and the disaggregate models
are small. With the high level of aggregation, the disaggregate model sometimes
outperforms the aggregate one. If the economic goal is prediction, overlapping data might
be preferred if they make calculations easier since there is little difference in forecast
accuracy.
Table 12: Prediction accuracy for long-horizon regressions and Hansen and Hodrick model.
                          Long-Horizon Regressions                     Hansen and Hodrick Model
Aggregation  Sample   Aggregate Model    Disaggregate Model    Aggregate Model    Disaggregate Model
Level        Size     Mean a    SD b     Mean a    SD b        Mean a    SD b     Mean a    SD b
12           50       20.95     11.89    20.15     11.85       99.45     103.7    95.21     96.71
Notes: a These are means of the 50,000 estimated root-mean-squared forecast errors. b These are the standard deviations
of the 50,000 estimated root-mean-squared forecast error.
7 Special Cases

There are several special cases of overlapping data that do not fit any of the standard
procedures. Since the solutions are not obvious, we now discuss how to handle overlapping
data in the presence of varying levels of overlap, imperfect overlap, seasonal unit roots,
additional sources of autocorrelation, heteroscedasticity, and generalized autoregressive
conditional heteroskedasticity (GARCH).
7.1 Varying Levels of Overlap

It is not uncommon in studies of hedging to consider different hedging horizons, which leads to varying levels of overlap (i.e. k is not constant). This variation of the missing data problem introduces heteroskedasticity of known form in addition to the autocorrelation. In this case it is easier to work with the covariance matrix than the correlation matrix. The covariance matrix is $\sigma_u^2$ times a matrix that has the number of time periods (the value of $k_t$) used in computing each observation down the diagonal. The off-diagonal terms are then the number of time periods for which the two observations overlap. Allowing for the most general case of different overlap between every two consecutive observations, the unconditional variance of $e_t$, given in equation (5), now is

$$\mathrm{var}[e_t] = \sigma_{e_t}^2 = E[e_t^2] = k_t\sigma_u^2. \qquad (21)$$
Previously, two different error terms, $e_t$ and $e_{t+s}$, had $k-s$ common original error terms, $u$, for any $k-s>0$. Now they may have fewer than $k-s$ common $u$'s, and there is no longer a monotonically decreasing pattern in the number of common $u$'s as $e_t$ and $e_{t+s}$ get further apart. We let $k_{ts}$ represent the number of common $u$'s (overlapping periods) between $e_t$ and $e_{t+s}$. Therefore, the covariances between the error terms $e_t$ and $e_{t+s}$ are

$$\mathrm{cov}[e_t, e_{t+s}] = E[e_t e_{t+s}] = k_{ts}\,\sigma_u^2. \qquad (22)$$
$$\Sigma = \sigma_u^2 \begin{bmatrix}
k_1 & k_{12} & k_{13} & \cdots & 0 & 0 \\
k_{21} & k_2 & k_{23} & \cdots & k_{2s} & 0 \\
 & k_{32} & k_3 & k_{34} & \cdots & k_{3s} \\
 & & \ddots & \ddots & \ddots & \\
0 & 0 & \cdots & k_{t(t-2)} & k_{t(t-1)} & k_t
\end{bmatrix}, \qquad (23)$$

where $k_{ts} = k_{st}$. The standard Newey-West procedure does not handle varying levels of overlap since it assumes autocovariance stationarity.
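A sketch of constructing Σ in equation (23), assuming the aggregation lengths k_t and the pairwise overlaps k_ts are supplied by the researcher (illustrative names):

```python
import numpy as np

def varying_overlap_cov(k_len, overlap, sigma2_u=1.0):
    """k_len[t]: periods aggregated into observation t; overlap(t, s): periods
    that observations t and s have in common (zero when they do not overlap)."""
    m = len(k_len)
    sigma = np.zeros((m, m))
    for t in range(m):
        sigma[t, t] = k_len[t]                          # equation (21)
        for s in range(t + 1, m):
            sigma[t, s] = sigma[s, t] = overlap(t, s)   # equation (22)
    return sigma2_u * sigma
```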
7.2 Imperfect Overlap

Sometimes observations overlap, but they do not overlap in the perfect way assumed here, and so the correlation matrix is no longer known. An example would be where the dependent variable represents six-month returns on futures contracts. Assume that there
are four different contracts in a year, the March, June, September, and December contracts.
Then, the six-month returns for every two consecutive contracts would overlap while the
six-month returns between say March and September contracts would not overlap. The six-
month returns for the March and June contracts would overlap for three months, but they
would not be perfectly correlated during these three months, since the March and June
contracts are two different contracts. Let

$$\mathrm{cov}(u_{jt}, u_{s,t+m}) = \begin{cases} \sigma_{js} & \text{if } m = 0 \\ 0 & \text{otherwise} \end{cases} \qquad (24)$$

be the covariance between the monthly returns m months (or days, if the disaggregate data are daily) apart for the March and June contracts, where $u_{jt}$ and $u_{st}$ are the error terms from regression models with disaggregate data for the March and June contracts. Then,
and

$$\mathrm{cov}(e_{jt}, e_{s,t-m}) = k_{js}\,\sigma_{js}, \qquad (26)$$

where $k_{js}$ is the number of overlapping months between the March and June contracts, and $\sigma_{js} = \rho_i\,\sigma_u^2$, where $\rho_i$ ($i = 1, 2$) is the correlation between the $u$'s for two contracts with maturities three ($\rho_1$) and six ($\rho_2$) months apart. The covariance matrix for
equation (2) with n = 12 in this case (equation (27)) has the banded form of $\Omega$ in equation (8), with each element at lag $s$ scaled by the correlation of the contracts involved: it is proportional to $\tfrac{k-s}{k}$ when the two observations are on the same contract, to $\tfrac{k-s}{k}\rho_1$ when the underlying contracts mature three months apart, to $\tfrac{k-s}{k}\rho_2$ when they mature six months apart, and it is zero when the observations do not overlap, with the whole matrix multiplied by $\sigma_u^2$.
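A sketch of building this covariance matrix directly from equations (24)-(26), assuming the researcher can report, for any two observations, how many months they overlap and how far apart the underlying contracts mature (all names are illustrative, and the diagonal follows equation (5)):

```python
import numpy as np

def imperfect_overlap_cov(n_obs, k, overlap_months, contract_gap, rho, sigma2_u=1.0):
    """rho maps a maturity gap in months to the correlation of the disaggregate
    errors, e.g. {0: 1.0, 3: rho1, 6: rho2}."""
    sigma = np.zeros((n_obs, n_obs))
    for t in range(n_obs):
        sigma[t, t] = k                                           # k unit-variance u's per observation
        for s in range(t + 1, n_obs):
            cov = overlap_months(t, s) * rho[contract_gap(t, s)]  # equations (24)-(26)
            sigma[t, s] = sigma[s, t] = cov
    return sigma2_u * sigma
```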
7.3 Seasonal Unit Roots

The seasonal difference model of Box and Jenkins (1970), which is called a seasonal unit root model in more recent literature, uses data which are in some sense overlapping, but it does not create an overlapping data problem if correctly specified. For annual data, the seasonal unit root model is

$$y_t = \alpha x_t + u_t, \qquad u_t = u_{t-12} + e_t, \qquad (28)$$

where $e_t$ is i.i.d. normal. In this case, the disaggregate model

$$y_t - y_{t-12} = \alpha(x_t - x_{t-12}) + e_t$$

has no autocorrelation. In this example, twelfth differencing leads to a model that can be estimated using overlapping data and ordinary least squares.
largely been used when the research objective was forecasting (e.g. Clements and Hendry,
1997). One problem with the seasonal unit root model is that it is often rejected in
empirical work (e.g. McDougall 1995). Another is that it implies that each month has its
own independent unit root process and so each month’s price can wander aimlessly away
from the prices of the other months. Such a model seems implausible for most economic
time series. Hylleberg et al. (1990) suggest that the seasonal unit roots may be cointegrated,
which can overcome the criticism of one month’s price moving aimlessly away from
another month’s price. Wang and Tomek (2007) present another challenge to the seasonal
unit root model since they argue that commodity prices should not have any unit roots.
While a seasonal unit root model may be an unlikely model, if it is the true model, it does
not create an overlapping data problem.
7.4 Additional Autocorrelation

If the error term in equation (1) follows an ARMA process, then the same procedure can be applied with slight modification. Assume that $u_t$ in equation (1) follows the process

$$m(L)u_t = h(L)\xi_t, \qquad (29)$$
7.5 Heteroskedasticity
If the residuals in the disaggregated data ($u_t$ in equation (1)) are heteroskedastic, then estimation is more difficult. Define $\sigma_{ut}^2$ as the time-varying variance of $u_t$ and $\sigma_{et}^2$ as the time-varying variance of $e_t$. Assume the $u_t$'s are independent, and thus $\sigma_{et}^2 = \sum_{j=0}^{k-1}\sigma_{u,t-j}^2$. For simplicity, assume that $\sigma_{ut}^2$ depends only on $x_t$. If $\sigma_{ut}^2$ is assumed to be a linear function of $x_t$, $\sigma_{ut}^2 = \gamma' x_t$, then the function aggregates nicely so that $\sigma_{et}^2 = \sum_{j=0}^{k-1}\gamma' x_{t-j} = \gamma' X_t$. But if multiplicative heteroskedasticity is assumed, $\sigma_{ut}^2 = \exp(\gamma' x_t)$, then $\sigma_{et}^2 = \sum_{j=0}^{k-1}\exp(\gamma' x_{t-j})$ and there is no way to consistently estimate $\gamma$ using only aggregate data (nonoverlapping data have the same problem).

The covariance between $e_t$ and $e_{t+s}$ for any $k - s \geq 0$ would be

$$\mathrm{cov}(e_t, e_{t+s}) = \sum_{j=s}^{k-1}\sigma_{u(t-j)}^2. \qquad (33)$$
The correlation matrix Ω is known, as given by equation (8), so the covariance matrix can be derived using the relation

$$\Sigma = \Gamma'\,\Omega\,\Gamma, \qquad (34)$$

where $\Gamma = [\gamma' X_1, \gamma' X_2, \ldots, \gamma' X_T] \times I_T$. A feasible generalized least squares estimator can then be developed using equation (12). It might be reasonable to use equation (9) as the first stage in a feasible GLS (FGLS) estimation that corrects for heteroskedasticity.
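Under the linear-variance case, a two-step FGLS could be sketched as follows, reusing the helpers from Section 2; taking the square roots of the fitted variances as the diagonal of Γ is an interpretive choice in this sketch rather than something stated in the text:

```python
import numpy as np

def fgls_linear_hetero(X, Y, k):
    m = len(Y)
    omega = overlap_corr_matrix(m, k)                        # equation (8)
    beta0, _ = gls(X, Y, omega)                              # first stage: equation (9)
    resid = Y - X @ beta0
    gamma, *_ = np.linalg.lstsq(X, resid ** 2, rcond=None)   # fit sigma_et^2 = gamma'X_t
    scale = np.sqrt(np.clip(X @ gamma, 1e-8, None))          # fitted standard deviations of e_t
    sigma = np.outer(scale, scale) * omega                   # covariance matrix in the spirit of (34)
    sigma_inv = np.linalg.inv(sigma)
    beta = np.linalg.solve(X.T @ sigma_inv @ X, X.T @ sigma_inv @ Y)
    return beta, gamma
```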
7.6 GARCH

With financial data, it is common for the disaggregate model to follow a GARCH process (e.g. Yang and Brorsen, 1993). The combination of overlapping data and a GARCH process introduces, besides the MA process in the mean, additional autocorrelation in the second moment. As an example, assume a GARCH(1,1) process for the volatility of the disaggregate process and an overlapping level k = 1. Then the diagonal elements of
the covariance matrix for the aggregate model would follow a GARCH(2,2) process while
the first off-diagonal elements would follow a GARCH(1,1) process. Thus the elements of
the covariance matrix are correlated. The appropriate estimator in such a case could be the
topic of future research.
8 Conclusions
We have evaluated different statistical and economic reasons for using overlapping data.
These reasons are especially important since they provide the motivation for using
overlapping data.
With strictly exogenous regressors as well as other standard assumptions, GLS is vastly
superior to Newey-West and OLSNO. The Newey-West estimator gave hypothesis tests
with incorrect size and low power even with sample sizes as large as 1,000. Unrestricted
MLE tends to reject the true null hypotheses more often than it should. However, this
problem is reduced or eliminated as larger samples are used, i.e. at least 1000 observations.
If overlapping data were the only econometric problem, there would appear to be little
reason to use overlapping data at all since the disaggregate model could be estimated. The
practice of estimating a model with both monthly and annual observations, for example,
would not have any apparent advantage.
We evaluated several statistical reasons for using overlapping data. If the motivation for
using overlapping data is missing observations then GLS is the preferred estimator. Errors
in variables with autocorrelated explanatory variables can be a reason to use overlapping
data, but even with the extreme case considered, the advantage is small. When overlapping
data are used due to nonnormality or errors in variables that are not autocorrelated, then
GLS is still preferred compared to Newey-West or OLSNO. However, the GLS estimator
provides no improvement compared to the disaggregate model. The GLS estimator would
be easier to implement than the Newey-West estimator for varying levels of overlap or
imperfect overlap.
We also evaluated economic reasons for using overlapping data. One such economic
reason involves regressions of long-horizon asset returns with overlapping data as in the
case of asset returns explained by dividend yields. In this case we proposed a modified
rescaling of the errors that produces correct test sizes for different sample sizes and level of
aggregation. Another economic reason is the case when lagged dependent variables are used
as explanatory variables. In this case the GLS estimator is inconsistent. When aggregate
data are used as regressors, consistent parameter estimates can sometimes be obtained with
maximum likelihood. In other cases, aggregation makes it impossible to recover the
parameters of the disaggregate model.
It can be reasonable to use overlapping data when the goal is to predict a multi-period
change. Results showed no advantage in terms of prediction accuracy from directly
predicting the multi-period change rather than using a disaggregate model and a multi-step
forecast. But the aggregate model could be preferred if it were more convenient to use.
Overlapping data are often used in finance and in studies of economic growth. Many of
the commonly used estimators are either inefficient or yield biased hypothesis tests. The
appropriate estimator to use with overlapping data depends on the situation, but authors
could do much better than the methods they presently use.
References
[1] Andrews, D. W. K., and J. C. Monahan (1992). An Improved Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator. Econometrica, 60, 953-966.
[2] Bhansali, R. J. (1999). Parameter Estimation and Model Selection for Multistep Prediction of a Time Series: A Review; in Asymptotics, Nonparametrics, and Time Series, S. Ghosh (Ed), 201-225. Marcel Dekker, New York.
[3] Box, G., and G. Jenkins (1970). Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.
[5] Britten-Jones, M., and A. Neuberger (2004). Improved Inference and Estimation in Regression with Overlapping Observations. EFA 2004 Maastricht Meetings Paper 4156. Available at SSRN: http://ssrn.com/abstract=557090.
[6] Clements, M. P., and D. F. Hendry (1997). An Empirical Study of Seasonal Unit Roots in Forecasting. International Journal of Forecasting, 13, 341-355.
[7] Davidson, R., and J. G. MacKinnon (1993). Estimation and Inference in Econometrics.
Oxford University Press, New York.
[9] Engle, R. F. (1969). Biases from Time-Aggregation of Distributed Lag Models. Ph.D.
Thesis, Cornell University, University Microfilms: Ann Arbor, Michigan.
[10] Gilbert, C. L. (1986). Testing the Efficient Market Hypothesis on Averaged Data. Applied
Economics, 18, 1149-1166.
[11] Goetzmann, W. N., and P. Jorion (1993). Testing the Predictive Power of Dividend
Yields. Journal of Finance, 48, 663-680.
[12] Greene, W. H. (1997). Econometric Analysis, Third Edition. Macmillan Publishing Company, New York.
[13] Hansen, C. S., and B. Tuypens (2004). Long-Run Regressions: Theory and Application
to US Asset Markets. Zicklin School of Business WP 0410018, Baruch College, New York.
[14] Hansen, L. P., and R. J. Hodrick (1980). Forward Exchange Rates as Optimal Predictors
of Future Spot Rates: An Econometric Analysis. Journal of Political Economy, 88, 829-
853.
[15] Hjalmarsson, E. (2004). On the Predictability of Global Stock Returns. Göteborg University, Department of Economics WP 161.
[16] Hylleberg, S., Engle, R. F., Granger, C. W. J., and B. S. Yoo (1990). Seasonal Integration and Cointegration. Journal of Econometrics, 44, 215-238.
[17] Irwin, S. H., Zulauf C.R., and T. E. Jackson (1996). Monte Carlo Analysis of Mean
Reversion in Commodity Futures Prices. American Journal of Agricultural Economics,
78, 387-399.
[18] Jones, C. M., and G. Kaul (1996). Oil and the Stock Markets. The Journal of Finance,
51, 463-491.
[19] Marcellino, M. (1996). Some Temporal Aggregation Issues in Empirical Analysis. University of California at San Diego, Economics WP 96-39.
[21] Marcellino, M., Stock J. H., and M. W. Watson (2006). A Comparison of Direct and
Iterated Multistep AR Methods for Forecasting Macroeconomic Time Series. Journal of
Econometrics, 135, 499-526.
[23] McDougall, R. S. (1995). The Seasonal Unit Root Structure in New Zealand Macroeconomic Variables. Applied Economics, 27, 817-827.
[24] Nelson, C. R., and M. J. Kim (1993). Predictable Stock Returns: The Role of Small
Sample Bias. Journal of Finance, 48, 641-661.
[25] Newey, W. K., and K. D. West (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55, 703-708.
[26] Valkanov, R. (2003). Long-Run Regressions: Theoretical Results and Applications. Journal of Financial Economics, 68, 201-232.
[27] Wang, D., and W. G. Tomek. (2007). Commodity Prices and Unit Root Tests. American
Journal of Agricultural Economics, 89, 873-889.
[29] Weiss, A. A. (1984). Systematic Sampling and Temporal Aggregation in Time Series
Models. Journal of Econometrics, 26, 271-281.
[31] Working, H. (1960). Note on the Correlation of First Difference Averages in a Random
Chain. Econometrica, 28, 916-918.
[32] Yang, S. R., and B. W. Brorsen (1993). Nonlinear Dynamics of Daily Futures Prices: Conditional Heteroskedasticity or Chaos? Journal of Futures Markets, 13, 175-191.