Autocorrelation Notes
INTRODUCTION
Autocorrelation occurs in time-series studies when the errors associated with a given
time period carry over into future time periods. For example, if we are predicting the
growth of stock dividends, an overestimate in one year is likely to lead to
overestimates in succeeding years.
For example, we expect stock market prices to move up or down for several days in succession.
The assumption of no auto or serial correlation in the error term that underlies the
CLRM will be violated.
We experience autocorrelation when
E(ui uj) ≠ 0 for i ≠ j
Sometimes the terms autocorrelation and serial correlation are used interchangeably.
However, some authors prefer to distinguish between them.
Tintner defines autocorrelation as 'lag correlation of a given series within itself, lagged by a number of time units', whereas serial correlation is the 'lag correlation between two different series'.
TYPES OF AUTOCORRELATION
The most common form of Autocorrelation is first-order serial correlation, which can
either be positive or negative.
Positive serial correlation is where a positive error in one period carries over into a positive error for the following period.
Negative serial correlation is where a positive error in one period tends to be followed by a negative error in the next period, so that successive errors alternate in sign.
Second-order serial correlation is where an error affects data two time periods later.
This can happen when your data has seasonality. Orders higher than second-order do
happen, but they are rare.
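A quick way to see these two patterns is to simulate them. The following minimal Python sketch (the ρ values and series length are illustrative assumptions, not from the text) generates first-order autocorrelated errors: with ρ = +0.8 the errors hold their sign for long stretches, while with ρ = −0.8 they tend to alternate.

import numpy as np

rng = np.random.default_rng(0)

def ar1_errors(rho, n=200):
    # Generate errors following u_t = rho * u_{t-1} + e_t
    e = rng.standard_normal(n)
    u = np.zeros(n)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + e[t]
    return u

u_pos = ar1_errors(0.8)    # positive serial correlation: errors keep their sign
u_neg = ar1_errors(-0.8)   # negative serial correlation: errors alternate in sign

for name, u in [("rho = +0.8", u_pos), ("rho = -0.8", u_neg)]:
    r1 = np.corrcoef(u[1:], u[:-1])[0, 1]   # sample lag-1 autocorrelation
    print(name, "-> lag-1 autocorrelation:", round(r1, 2))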
CAUSES OF AUTOCORRELATION
1. Inertia
A salient feature of most economic time series is inertia, or sluggishness. As is well
known, time series such as GNP, price indexes, production, employment, and
unemployment exhibit (business) cycles. Starting at the bottom of the recession, when
economic recovery starts, most of these series start moving upward. In this upswing,
the value of a series at one point in time is greater than its previous value. Thus there
is a “momentum’’ built into them, and it continues until something happens (e.g.,
increase in interest rate or taxes or both) to slow them down. Therefore, in regressions
involving time series data, successive observations are likely to be interdependent.
2. Specification Bias:
a) Excluded Variables:
Perhaps the primary cause of autocorrelation in regression problems involving time series data is failure to include one or more important regressors in the model. For example, suppose that we wish to regress annual sales of a soft drink company against the annual advertising expenditure for that product. Now the growth in population over the period of time used in the study will also influence product sales. If population size is not included in the model, this may cause the errors in the model to be positively autocorrelated, because population size is positively correlated with product sales.
Consider the true model:
Sale (Yt) = β0 + β1X1t + β2X2t + εt ---------------------- ( I )
where Y is sales, X1 is the advertising expenditure, and X2 is the population size.
However, for some reason we run the following regression:
Sale (Yt) = β0 + β1X1t + υt ---------------------- ( II )
Since model ( I ) is the true model but we estimate model ( II ), the omitted influence of X2 is absorbed into the disturbance term υ, which will therefore be autocorrelated.
b) Incorrect Functional Form:
Consider the following cost and output model, where the correct specification is quadratic in output:
Yt = β1 + β2 Xt + β3 Xt² + ut
Instead of using the above form, which is considered to be correct, suppose we fit the following linear model:
Yt = α1 + α2 Xt + υt
In this case, υ will reflect autocorrelation because of the use of an incorrect functional form.
3. Cobweb Phenomenon
In agricultural markets, supply reacts to price with a lag of one time period because supply decisions take time to implement. This is known as the cobweb phenomenon. Thus, at the beginning of this year's planting of crops, farmers are influenced by the price prevailing last year:
Supplyt = β1 + β2 Pt−1 + ut
Obviously, in this situation the disturbances ut are not expected to be random because
if the farmers overproduce in year t, they are likely to reduce their production in t + 1,
and so on, leading to a Cobweb pattern.
4. Lags
In a time series regression, the dependent variable may also depend on its own lagged value, for example
Consumptiont = β1 + β2 Consumptiont−1 + ut
Such a regression is known as an autoregression, and the carry-over from one period to the next makes successive disturbances interdependent. A related source of autocorrelation is data transformation. Consider the level form
Yt = β1 + β2 Xt + ut
and its one-period lag
Yt−1 = β1 + β2 Xt−1 + ut−1
Subtracting the second equation from the first gives
ΔYt = β2 ΔXt + vt,  where vt = ut − ut−1
This equation is known as the first difference form of the dynamic regression model; the previous equation is known as the level form. Note that even if the error term in the level form is not autocorrelated, it can be shown that the error term vt in the first difference form is autocorrelated.
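This last claim is easy to verify numerically. A minimal sketch, assuming i.i.d. level-form errors (an assumption made here purely for illustration): the differenced errors vt = ut − ut−1 have a theoretical lag-1 autocorrelation of −0.5, and the simulation reproduces it.

import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(100_000)   # i.i.d. level-form errors: no autocorrelation
v = u[1:] - u[:-1]                 # first-difference errors v_t = u_t - u_{t-1}

r1 = np.corrcoef(v[1:], v[:-1])[0, 1]
print("lag-1 autocorrelation of v_t:", round(r1, 2))   # theory says -0.5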
5. Non-Stationarity
When dealing with time series data, we should check whether the given time series is stationary. A time series is stationary if its characteristics (e.g. mean, variance and covariance) are time invariant; that is, they do not change over time. If that is not the case, we have a nonstationary time series. It is quite possible that both Y and X are nonstationary and therefore the error u is also nonstationary. In that case, the error term will exhibit autocorrelation.
6. Data Manipulation
In empirical analysis, the raw data are often "manipulated." For example, in time series regressions involving quarterly data, such data are usually derived from the monthly data by simply adding three monthly observations and dividing the sum by 3. This averaging smooths the data, and the smoothness may itself lend a systematic pattern to the disturbances, thereby introducing autocorrelation.
Another source of manipulation is interpolation or extrapolation of data.
7. Omitted Variables
Suppose Yt is related to X2t and X3t, but we wrongly omit X3t from our model. The effect of X3t will then be captured by the disturbances ut. If X3t, like many economic series, exhibits a trend over time, then X3t depends on X3t−1, X3t−2 and so on, and similarly ut depends on ut−1, ut−2 and so on.
8. Misspecification
Suppose Yt is related to X2t through a quadratic relationship:
Yt = β1 + β2 X2t² + ut
but we wrongly assume and estimate a straight line:
Yt = β1 + β2 X2t + ut
Then the error term obtained from the straight line will depend on X2t².
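This is easy to demonstrate with simulated data. In the sketch below (the coefficient values are illustrative assumptions), fitting a straight line to data generated from a quadratic model leaves a smooth pattern in the residuals, which therefore come out strongly autocorrelated.

import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 10, 100)                       # the regressor X2t
y = 1.0 + 0.5 * t**2 + rng.standard_normal(100)   # true model is quadratic in X2t

b2, b1 = np.polyfit(t, y, 1)                      # wrongly fit Y = b1 + b2*X2t
resid = y - (b1 + b2 * t)

# The neglected X2t^2 term leaves a smooth pattern in the residuals,
# so successive residuals are strongly positively correlated.
r1 = np.corrcoef(resid[1:], resid[:-1])[0, 1]
print("lag-1 autocorrelation of residuals:", round(r1, 2))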
9. Systematic Errors in Measurement
Suppose a company updates its inventory at a given period in time. If a systematic error occurs, the cumulative inventory stock will exhibit accumulated measurement errors. These errors will show up as an autocorrelated process.
Markov First-order Autoregressive Scheme
Consider the two-variable model
Yt = β1 + β2 Xt + ut
and assume that the disturbances follow the first-order autoregressive, AR(1), scheme
ut = ρ ut−1 + εt,  −1 < ρ < 1
where εt satisfies the standard OLS assumptions. When the errors are autocorrelated in this way, the ordinary least squares estimators are inefficient (i.e. not "BEST"). As noted, the estimators are no longer BLUE, but the OLS estimators are still unbiased and consistent.
Under the AR(1) scheme, the variance of the OLS slope estimator becomes
Var(β̂2)AR1 = (σ²/Σxt²) [1 + 2ρ (Σxt xt+1)/Σxt² + 2ρ² (Σxt xt+2)/Σxt² + ... + 2ρ^(n−1) (x1 xn)/Σxt²]
instead of the usual OLS formula σ²/Σxt². The variance of β̂2 may therefore be either overestimated or underestimated, depending on the nature of the autocorrelation.
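A small Monte Carlo sketch can make this concrete (the sample size, ρ and regressor here are illustrative assumptions): with positively autocorrelated errors and a trending regressor, the variance reported by the usual OLS formula falls well short of the true sampling variance of β̂2.

import numpy as np

rng = np.random.default_rng(3)
n, rho, reps = 50, 0.8, 2000
x = np.linspace(0, 1, n)
xd = x - x.mean()

estimates, reported_var = [], []
for _ in range(reps):
    e = rng.standard_normal(n)
    u = np.zeros(n)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + e[t]          # AR(1) disturbances
    y = 1.0 + 2.0 * x + u
    b2 = (xd @ (y - y.mean())) / (xd @ xd)    # OLS slope estimate
    resid = (y - y.mean()) - b2 * xd
    s2 = resid @ resid / (n - 2)              # usual sigma^2 estimate
    estimates.append(b2)
    reported_var.append(s2 / (xd @ xd))       # textbook OLS variance formula

print("true sampling variance of b2:", round(float(np.var(estimates)), 3))
print("average OLS-reported variance:", round(float(np.mean(reported_var)), 3))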
As in the case of heteroscedasticity, in the presence of autocorrelation the OLS estimators are still linear, unbiased, consistent and asymptotically normally distributed, but they are no longer efficient (i.e., minimum variance). Again, as in the case of heteroscedasticity, we distinguish two cases. For pedagogical purposes we continue to work with the two-variable model, although the following discussion can be extended to multiple regressions.
OLS Estimation Allowing for Autocorrelation
As noted, β̂2 is not BLUE, and the confidence intervals derived from it are likely to be wider than those based on the GLS procedure. As Kmenta shows, this result is likely to hold even if the sample size increases indefinitely; that is, β̂2 is not asymptotically efficient. The implication of this finding for hypothesis testing is clear: we are likely to declare a coefficient statistically insignificant (i.e., not different from zero) even though in fact (i.e., based on the correct GLS procedure) it may be significant. This difference can be seen clearly from Figure 1, which shows the 95% OLS [AR(1)] and GLS confidence intervals assuming that the true β2 = 0. Consider a particular estimate of β2, say, b2.
Figure 1: 95% OLS [AR(1)] and GLS confidence intervals when the true β2 = 0.
If there is positive autocorrelation, and if the value of a right-hand side variable grows
over time, then the estimate of the standard error of the coefficient estimate of this
variable will be too low and hence the t-statistic too high. In consequence, the usual t
and F tests may not be valid.
The variance of the random term u may be seriously underestimated if the u's are autocorrelated. This leads to a biased (typically overstated) R-square value.
DETECTION OF AUTOCORRELATION
1. Graphical Method - There are various ways of examining the residuals.
A time sequence plot of the residuals can be produced. Alternatively, we can plot the standardized residuals against time. The standardized residuals are simply the residuals divided by the standard error of the regression. If either plot shows a pattern, then the errors may not be random. We can also plot the error term against its first lag.
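The sketch below shows both plots with matplotlib; the residual series here is a fabricated stand-in, since in practice resid would come from your own regression.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
resid = np.cumsum(rng.standard_normal(100)) * 0.1   # stand-in for OLS residuals
std_resid = resid / resid.std()   # stand-in for dividing by the regression SE

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(std_resid)                       # time sequence plot
ax1.set(xlabel="time", ylabel="standardized residual")
ax2.scatter(resid[1:], resid[:-1])        # residual against its first lag
ax2.set(xlabel="u_t", ylabel="u_{t-1}")
plt.show()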
2. The Runs Test - Consider a list of estimated error terms; the error terms can be positive or negative. In the following sequence, there are three runs:
(─ ─ ─ ─ ─ ─ ) ( + + + + + + + + + + + + + ) (─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ )
A run is defined as an uninterrupted sequence of one symbol or attribute, such as + or ─. The length of a run is defined as the number of elements in it. The above sequence has three runs: the first run is 6 minuses, the second has 13 pluses and the last has 11 minuses.
N: total number of observations
N1: number of + symbols (i.e. + residuals)
N2: number of ─ symbols (i.e. ─ residuals)
R: number of runs
Assuming that N1 > 10 and N2 > 10, the number of runs R is approximately normally distributed, with mean and variance given below:
E(R) = 2N1N2/N + 1
σR² = 2N1N2(2N1N2 − N) / [N²(N − 1)]
If the null hypothesis of randomness is sustainable, then, following the properties of the normal distribution, we should expect that
Prob [E(R) − 1.96 σR ≤ R ≤ E(R) + 1.96 σR] = 0.95
Hypothesis: do not reject the null hypothesis of randomness with 95% confidence if R, the number of runs, lies in the preceding confidence interval; reject otherwise.
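The test is straightforward to code. A minimal sketch following the formulas above (SciPy is used only for the normal p-value), applied to the 3-run sequence from the text:

import numpy as np
from scipy import stats

def runs_test(resid):
    signs = np.sign(resid)
    signs = signs[signs != 0]                    # drop exact zeros
    n1 = np.sum(signs > 0)                       # number of + residuals
    n2 = np.sum(signs < 0)                       # number of - residuals
    n = n1 + n2
    runs = 1 + np.sum(signs[1:] != signs[:-1])   # sign changes + 1
    mean_r = 2 * n1 * n2 / n + 1
    var_r = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    z = (runs - mean_r) / np.sqrt(var_r)
    p = 2 * (1 - stats.norm.cdf(abs(z)))         # two-sided p-value
    return runs, mean_r, z, p

resid = np.array([-1.0] * 6 + [1.0] * 13 + [-1.0] * 11)
print(runs_test(resid))   # only 3 runs: strong evidence against randomness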
3. The Durbin Watson Test
The Durbin Watson d statistic is defined as
d = Σ(t=2 to n) (ût − ût−1)² / Σ(t=1 to n) ût²
It is simply the ratio of the sum of squared differences in successive residuals to the RSS. The number of observations in the numerator is n − 1, as one observation is lost in taking successive differences.
Expanding the numerator,
d = (Σût² + Σût−1² − 2Σût ût−1) / Σût² ≈ 2(1 − ρ̂)
since Σût² and Σût−1² are approximately equal, where
ρ̂ = Σût ût−1 / Σût²
But since −1 ≤ ρ ≤ 1, this implies that 0 ≤ d ≤ 4. If the statistic lies near the value 2, there is no serial correlation. But if the statistic lies in the vicinity of 0, there is positive serial correlation: the closer d is to zero, the greater the evidence of positive serial correlation. If it lies in the vicinity of 4, there is evidence of negative serial correlation. If it lies between dL and dU, or between 4 − dU and 4 − dL, then we are in the zone of indecision.
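Computing d from a residual series takes one line, and the approximation d ≈ 2(1 − ρ̂) can be checked directly. A minimal sketch with simulated AR(1) residuals (ρ = 0.7 is an illustrative choice):

import numpy as np

def durbin_watson(resid):
    diff = np.diff(resid)                # u_t - u_{t-1}, for t = 2..n
    return np.sum(diff**2) / np.sum(resid**2)

rng = np.random.default_rng(5)
u = np.zeros(200)
e = rng.standard_normal(200)
for t in range(1, 200):
    u[t] = 0.7 * u[t - 1] + e[t]         # positively autocorrelated residuals

d = durbin_watson(u)
rho_hat = np.sum(u[1:] * u[:-1]) / np.sum(u**2)
print("d =", round(d, 2), "and 2*(1 - rho_hat) =", round(2 * (1 - rho_hat), 2))

An equivalent ready-made function, durbin_watson, is available in statsmodels.stats.stattools.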
REMEDIAL MEASURES: THE GENERALIZED DIFFERENCE TRANSFORMATION
When ρ is known, start from the model Yt = β1 + β2 Xt + ut with ut = ρut−1 + εt, where
εt = ut − ρut−1
satisfies the standard assumptions. Subtracting ρ times the lagged equation Yt−1 = β1 + β2 Xt−1 + ut−1 from the original equation gives the generalized difference equation, which can be written as
Yt* = β1* + β2 Xt* + εt
where Yt* = Yt − ρYt−1, Xt* = Xt − ρXt−1 and β1* = β1(1 − ρ).
Therefore, the error term εt satisfies all the OLS assumptions. Thus we can apply OLS to the transformed variables Y* and X* and obtain estimators with all the optimum properties, namely BLUE.
The original intercept can be recovered from β1* = β1(1 − ρ), i.e. β1 = β1*/(1 − ρ).
When ρ is unknown, there are many ways to estimate it.
If we assume that ρ = +1, the generalized difference equation reduces to the first difference equation:
Yt − Yt−1 = β2 (Xt − Xt−1) + (ut − ut−1)
ΔYt = β2 ΔXt + εt
where εt = ut − ut−1.
One widely used approach when ρ is unknown is the Cochrane-Orcutt iterative procedure: estimate the original equation by OLS, estimate ρ by regressing the residuals on their own first lag (ût = ρ̂ ût−1 + vt), apply the generalized difference transformation using ρ̂, re-estimate, and iterate until ρ̂ converges.
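A hand-rolled sketch of this iteration for the two-variable model is given below (illustrative only; in practice one would rely on an econometrics package, and a convergence tolerance would replace the fixed iteration count):

import numpy as np

def cochrane_orcutt(y, x, iterations=10):
    # Step 1: OLS on the original data
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rho = 0.0
    for _ in range(iterations):
        # Step 2: estimate rho from the residuals, u_t = rho*u_{t-1} + v_t
        u = y - X @ beta
        rho = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)
        # Step 3: generalized difference transformation with rho-hat
        y_star = y[1:] - rho * y[:-1]
        x_star = x[1:] - rho * x[:-1]
        Xs = np.column_stack([np.ones_like(x_star), x_star])
        # Step 4: re-estimate on transformed data; recover beta1 = beta1*/(1 - rho)
        b_star = np.linalg.lstsq(Xs, y_star, rcond=None)[0]
        beta = np.array([b_star[0] / (1 - rho), b_star[1]])
    return beta, rho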
Another option is to retain OLS but correct the standard errors for autocorrelation using the Newey-West (HAC) procedure. This procedure is, strictly speaking, valid in large samples and may not be appropriate in small samples. But in large samples we now have a method that produces autocorrelation-corrected standard errors. Therefore, if a sample is reasonably large, one should use the Newey-West procedure to correct OLS standard errors not only in situations of autocorrelation but also in cases of heteroscedasticity, for the HAC method can handle both, unlike the White method, which was designed specifically for heteroscedasticity.
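With statsmodels this correction is one keyword argument. The sketch below (simulated data; maxlags=4 is an illustrative choice) compares the usual OLS standard error of the slope with its HAC-corrected counterpart:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = np.linspace(0, 1, 100)
u = np.zeros(100)
e = rng.standard_normal(100)
for t in range(1, 100):
    u[t] = 0.6 * u[t - 1] + e[t]        # AR(1) disturbances
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                # usual OLS standard errors
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})  # Newey-West

print("OLS se(b2):", round(float(ols.bse[1]), 3))
print("HAC se(b2):", round(float(hac.bse[1]), 3))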
Useful Links :
1. https://www.statisticshowto.com/serial-correlation-autocorrelation/
2. https://www.rhayden.us/regression-models/consequences-of-using-ols-in-the-presence-of-autocorrelation.html
3. https://www.itl.nist.gov/div898/handbook/eda/section3/eda35c.htm
4. https://aip.scitation.org/doi/abs/10.1063/1.4825890
5. https://online.stat.psu.edu/stat501/lesson/14/14.3
6. https://slideplayer.com/slide/8591570/
7. https://www.slideshare.net/100002907643874/auto-correlation-presentation
Possible Questions:
1. Define Autocorrelation.
2. What are the types of Autocorrelation?
3. What are consequences of Autocorrelation?
4. What are the different methods of detecting Autocorrelation?
5. List out the causes of Autocorrelation.
6. Explain the remedial measures for Autocorrelation.
7. Write a note on the Durbin-Watson d statistic.