Autocorrelation Chapter 10
Yi = β1 + β2X2 + ui
∴ Ŷi = β1 + β2X2
∴ ui = Yi – Ŷi
The first equation is a simple linear regression model. We have a dependent variable Y and an independent variable X2, so this is a two-variable model. We have our intercept β1, our slope coefficient β2 and our residual ui.
What is the residual? The residual is the difference between what we observed, Yi (what the dependent variable actually was at observation i), and what the model predicted that value of Y to be, Ŷi.
What is the model? What is Ŷi?
Ŷi = β1 + β2X2
Ŷi is the straight line, and that straight line is defined by an intercept or constant β1 and a slope parameter β2.
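To make the setup concrete, here is a minimal sketch (not part of the original notes; the data and variable names are invented purely for illustration) of fitting such a two-variable model by OLS in Python and pulling out the residuals:

# Sketch: fit Yi = b1 + b2*X2 + ui by OLS and extract the residuals ui = Yi - Yhat_i.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X2 = rng.uniform(0, 10, size=50)            # hypothetical regressor
Y = 2.0 + 0.5 * X2 + rng.normal(0, 1, 50)   # hypothetical dependent variable

X = sm.add_constant(X2)                     # adds the intercept column of 1s
model = sm.OLS(Y, X).fit()

Y_hat = model.fittedvalues                  # Yhat_i = b1 + b2*X2
u = model.resid                             # residual ui = Yi - Yhat_i
print(model.params)                         # estimates of beta1 and beta2

The later sketches below reuse this fitted model object.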

There are a few very important assumptions that we make about errors when we do linear regression.
One of those assumptions is the assumption of homoscedasticity. What does that mean? It says that the spread of the errors should be constant around our predicted line Ŷi: the variance of these errors must be constant, and we must not see the errors start flaring out at any point of X.

When we have one error variance at the beginning and a larger or different error variance later on, that is heteroscedasticity. When this is present in our errors, our estimators of β1 and β2 are no longer BLUE: they no longer have minimum variance, they are no longer efficient. Why? Because the error variance is no longer a constant σ²; it becomes σ²i, changing from observation to observation. The MSE is no longer a minimum, so we will have incorrect estimates of the variances of β1 and β2. β1 and β2 are still unbiased; it is the variance that now gives the problem. All of this is because our errors are not homoscedastically spread along the regression line.

Autocorrelation also has to do with the errors, or with the residuals. Besides the assumption that the variance of the errors must be constant, we also assume that the correlation between the errors ui and uj is equal to 0 whenever i is not equal to j:
Corr (ui, uj) = 0, i ≠ j
If i were equal to j, Corr (ui, ui) = 1, which is expected.
For any pair of errors where i ≠ j, that correlation must be equal to 0.
This means that the error i should be uninfluenced, unaffected, have no relationship with any other error in
this regression.
If we only had 6 observations in our regression model, that means i = 1, 2, 3, 4, 5, 6.
Our residuals can then be collected as shown in the matrix. We fitted our model, found each residual from Yi – Ŷi, and so obtained 6 residuals.
The no-autocorrelation assumption says that the value of u1 is not affected by what the other residuals are. They are each independent of one another; each one is determined only by itself, its own frame of reference. There is no pattern, there is no structure, it really is all random (stochastic).
Autocorrelation is when this randomness is violated, when the errors are no longer random. It is when the correlation between ui and uj, where i ≠ j, is not 0:
Corr (ui, uj) ≠ 0, i ≠ j
It is something significantly different from 0.
Why is that a problem?
It is a problem because it violates what we assume should hold when we are doing linear regression.
What would it look like if we do have this correlation between errors?
Our errors follow some structure: the residuals run in stretches of negatives and positives (for example n, p, p, p, n, n, n, p). There are parts of the regression where we are always over- or under-estimating the true value of Y.
We are always going to be making some structural error, meaning there will
be parts in our regression where we will always make one type of error and
parts where we will always make another type of error. That is not correct.
That means that our errors are predictable and that there is a trend in them, which means either that we've made a mistake in our modelling, or that there is some important X variable, one that follows the structure we see in our errors, that
we have left out. Why would it show up in the errors? Because the error is essentially everything about Y that is not explained by X.
Yi = β1 + β2X2 + ui
Everything that X can't explain in Y ends up in ui, so if there is some important X3, for example, that we have left out, it will manifest itself in the error.
That is a problem. It means that we don’t know our theory, or we’ve left out something very important and
that the conclusions we draw will be incorrect.
We want to fit our model in such a way that the residuals are just noise, purely random, meaning we can't explain them. The error is a mixture of effects that can't be explained: X2 explains Y, X3 explains Y, but we will never really know exactly what the noise in the error is, and we want it to be random. If it is not random, it means there is something that can be explained which we are not explaining.
That is what autocorrelation is. That is the nature of autocorrelation.
If there is autocorrelation, our errors have some structure, and the big problem is that the MSE tends to be underestimated and we no longer have BLUE estimates; the MSE won't be what it should be.

Page 316-317
There is a list of 7 consequences of AC:
1. The first one is that the OLS estimators are no longer BLUE, they are still linear and they are still
unbiased but they are no longer the best, meaning they no longer have minimum variance.
2. A further consequence is that your t and F tests are going to be inaccurate, because if your βi does not have the lowest variance, what does that mean for your t test statistic? What is a t test statistic?
   It is your estimate of βi divided by the standard error of βi (see the formulas just after this list). So if the standard error of βi isn't as small as it can be, you are dividing βi by a larger value than you should be, which means your t-value is going to be smaller than it should be. If the standard error of βi is bigger than it should be, your t will be smaller than it should be and therefore inaccurate. You can't actually trust that t test statistic.
   Your t and F tests become inaccurate.
3. Another consequence is that your R² also becomes inaccurate, because what do you use to calculate R²? You use ESS/TSS, and ESS is TSS – RSS. The smaller your RSS, the larger your ESS. So if your RSS is inaccurate, your ESS will also be inaccurate, meaning that your R² will not be a true representation of how well the model fits.
Everything becomes inaccurate, all because your error variance has been compromised. Why has it been compromised? Because there is some structural correlation between your errors.
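For reference on points 2 and 3, the quantities involved are the standard textbook ones (they are not written out in the original notes):
t = β̂i / se(β̂i)
R² = ESS/TSS = 1 – RSS/TSS
If the standard error of β̂i or the RSS is distorted by autocorrelation, the t statistic and R² are distorted with it.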

That is what autocorrelation is all about. It is serious: we can't trust our model or any inference or conclusions we draw from it. For that reason, just as with HS, we need to find a way to (1) identify whether the problem of AC is present in our model and (2) if it is present, get rid of it; we can't just keep it in our model.

AC usually happens in time-series data, when our iterator is time. It can occur outside time-series data, but that is the exception; as a rule it appears in time-series data.

Is there AC in my model?
How do you go about checking if you have AC?
1. The first way you can do that is an informal method. This takes the form of plotting your data.
There are two plots we can make here: let’s assume that t is days
   a. The first is a plot of your residuals on the y-axis against i. As the iterations go on, you want to see what the errors do. We want to see pure randomness: no structure, no correlation. If I were to regress ui against i, I should get a straight line with a slope of 0 and therefore a Pearson correlation coefficient of 0.
      In this case, i is equal to t: our iterator is time and our residuals are measured over time, so ui means the residual at time i. We can make a simple plot of our residuals over time/i.
      The residuals can be both positive and negative; a conventional error plot is centred around 0, with the residuals randomly spread around 0.

This study source was downloaded by 100000826489728 from CourseHero.com on 05-09-2024 01:54:16 GMT -05:00
Chapter 10AutocorrelationSTK 310
https://www.coursehero.com/file/193055851/Autocorrelation-Notesdocxpdf/
3

   b. The second plot we can do is to plot the residual of one time period ago (yesterday) against the residual of the time period we are currently assessing (today).
      What do we expect to see here? On our x-axis we plot the residuals of today and on our y-axis the residuals of yesterday. We expect to see a random spread with no pattern: an even spread across the 4 quadrants. This is what we would expect to see if there is no AC.
We can plot our data and already get a good first indication of whether we have AC or not.
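As an illustration (not from the notes), both plots can be produced from the residuals of the earlier OLS sketch; the model object and variable names are assumptions carried over from that sketch:

# Plot (a) residuals over time and (b) residuals against their own lag.
import matplotlib.pyplot as plt
import numpy as np

u = model.resid                          # residuals u_t from the fitted model above

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# (a) u_t against t: should look like random scatter around 0
ax1.plot(np.arange(len(u)), u, marker="o", linestyle="none")
ax1.axhline(0)
ax1.set_xlabel("t")
ax1.set_ylabel("residual u_t")

# (b) u_{t-1} against u_t: should fill all four quadrants with no pattern
ax2.scatter(u[1:], u[:-1])
ax2.axhline(0)
ax2.axvline(0)
ax2.set_xlabel("u_t (today)")
ax2.set_ylabel("u_{t-1} (yesterday)")

plt.tight_layout()
plt.show()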
2. The second method we can use is the formal method. For this we use the Durbin-Watson test.
   This test considers whether there is correlation between our residuals.
   There are four important assumptions we make to decide whether we may use the DW test (slides p21).
   If all the assumptions are valid, I can go ahead and calculate my d. d is a function of only the errors. Why? Because the problem with AC lies in the errors, so in order to pick up whether there is AC in our model, our means of testing will have to be a function of those errors.
   We see that the DW d test statistic is also linked to rho, ρ. Rho measures the strength of the relationship between the errors of today and the errors of yesterday: the further rho is from 0, the stronger that relationship is. For a given rho we would also expect a specific value of d. The closer rho is to 1, the closer d is to 0. The closer rho is to -1, the closer d is to 4. If rho is exactly 0, meaning there is no correlation between ut and ut-1, you'll find that d is exactly 2.
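   The link between d and ρ̂ is the standard approximation (not written out in the original notes):
   d ≈ 2(1 – ρ̂)
   so ρ̂ = 1 gives d ≈ 0, ρ̂ = –1 gives d ≈ 4, and ρ̂ = 0 gives d ≈ 2, exactly the pattern described above.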

   How the d test statistic is linked to rho hinges on the 3rd assumption: the disturbances ut are generated by the following mechanism:
   ut = ρut-1 + νt
   That is why we make this assumption. If we don't make it, the Durbin-Watson d test cannot be used. If the errors are correlated with themselves on a different day in a different way than in this assumption, if it is not this autoregressive order-one (AR(1)) scheme, then the DW d test statistic will be inaccurate.
   It is the same with HS: say we assume that the variance of ui is equal to σ²X², Var(ui) = σ²X², and on that assumption we transform our model by dividing through by X. If the variance is actually equal to σ²X^93.2, the results of our transformed model won't be BLUE. We might think they are BLUE, but we made the wrong assumption about our errors. If the HS structure actually looked like σ²X^93.2 rather than σ²X², we would have to apply a very different transformation, dividing through by √(X^93.2), before we would have BLUE estimates. This is a very extreme example, but it shows that by making the wrong assumptions, and acting upon those assumptions, we can mislead ourselves.
   So the DW d test statistic will only work if we make this assumption, and it is in fact a valid assumption to make.
   In practice it has been found that this is usually a valid assumption to make; it is very rare that your errors follow an AR(p) structure where p is greater than 1.
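   To see what assumption 3 means in practice, here is a small illustrative simulation (not from the notes; rho and the sample size are arbitrary choices) that generates AR(1) disturbances and checks their lag-1 correlation:

# Generate u_t = rho*u_{t-1} + v_t and check the correlation between u_t and u_{t-1}.
import numpy as np

rng = np.random.default_rng(1)
rho, n = 0.8, 500                        # hypothetical rho and sample size
v = rng.normal(0, 1, n)                  # white-noise innovations v_t
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + v[t]         # the AR(1) scheme from assumption 3

print(np.corrcoef(u[1:], u[:-1])[0, 1])  # sample lag-1 correlation, close to 0.8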

Durbin-Watson d Test:

The slides show the formula for the d test statistic and, alongside it, how to estimate rho.
The only difference between the two is the numerator: in the estimate of rho we have the product of the errors of today and the errors of yesterday, while in the d statistic we have the squared difference of the errors of today and the errors of yesterday. They have very similar functional forms; both are functions of the errors of today and the errors of yesterday, with only the numerator differing.
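Written out, the standard textbook forms consistent with that description (the formula images in the original are not reproduced here) are:
d = Σ (ut – ut-1)² / Σ ut²       (numerator summed over t = 2, …, n; denominator over t = 1, …, n)
ρ̂ = Σ ut ut-1 / Σ ut²
which indeed differ only in their numerators.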

We have calculated our Durbin-Watson d test statistic, what does it mean?


Once we have calculated d, it can fall anywhere between 0 and 4. If d falls between the upper critical value dU and 4 minus dU, we do not reject the null hypothesis of no AC. If d falls between the lower critical value dL and dU, or between 4 minus dU and 4 minus dL, we are in a zone of indecision, meaning we can't really make a decision. That is one of the disadvantages of the DW test: there are zones of indecision where we don't know whether we should reject the null hypothesis or not. Then there is the zone between 0 and dL, and the zone between 4 minus dL and 4, where we reject the null hypothesis: in favour of positive AC if d is between 0 and dL, or in favour of negative AC if d is between 4 minus dL and 4. These are the only two areas where we can really decide to reject our null hypothesis of no AC.
Notice that both these hypotheses can be split into two parts.

Note that when you look at the columns of the table, k' denotes the number of explanatory variables in the original model excluding the constant term. The constant term is the intercept, a column vector of 1s. If you have a two-variable model, meaning you've got the column of 1s and one X, you do not look at where k' = 2 but where k' = 1, because there is only one X; forget about the intercept here.
There are two different tables: one with the upper and lower critical values at a 5% level of significance and one at a 1% level of significance.
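As an illustration of the formal check (not from the notes; the critical values below are placeholders, the real dL and dU must be read from the tables for your n and k'), the d statistic and decision rule can be applied as follows, reusing the fitted model from the earlier sketch:

# Compute the Durbin-Watson d statistic from the OLS residuals and apply the decision rule.
from statsmodels.stats.stattools import durbin_watson

d = durbin_watson(model.resid)           # d = sum((u_t - u_{t-1})^2) / sum(u_t^2)
dL, dU = 1.50, 1.59                      # placeholder critical values, not from a real table

if d < dL:
    print(f"d = {d:.3f}: reject H0, evidence of positive AC")
elif d > 4 - dL:
    print(f"d = {d:.3f}: reject H0, evidence of negative AC")
elif dU < d < 4 - dU:
    print(f"d = {d:.3f}: do not reject H0 of no AC")
else:
    print(f"d = {d:.3f}: zone of indecision")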

What do we do if we see that AC is a problem in our model?


Just like HS and MC, there must be a solution to this problem and there is one.
What is the problem?
An AR(1) process, specifically in the residuals, means that the error at some time period is equal to rho times the error of yesterday plus epsilon t:
AR(1): ut = ρut-1 + ϵt
This is an AR(1) structure. What does this structure mean? It means that we assume that ut is a function of ut-1. What does this say about the correlation between ut and ut-1?
Corr (ut,ut-1) ≠ 0
ut-1 explains ut, the relationship depends on rho.
If rho is significantly different from 0, what do you expect the correlation between ut and ut-1 to be? It will
also be significantly different from 0. The correlation between ut and ut-1 is dependent upon rho. That means
that if rho is not equal to 0, we have AC, because if rho is not equal to 0, there is a significant correlation
between ut and ut-1. That is the problem with AC.
If: ut = ρut-1 + ϵt but ρ = 0
What do we end up with then?
ut = 0ut-1 + ϵt = ϵt
What is ϵt?
We assume that ϵt is a residual that follows a normal distribution with mean 0 and variance σ2.
ϵt ~ N(0,σ2)
It satisfies all the other assumptions: it is homoscedastic, it has no AC, and so on.
If rho is equal to zero, what are the errors of today equal to?
They are equal to ϵt, an error of today which does fulfil all the OLS assumptions.
But if rho is not equal to 0, then the residual of today is not just a function of what it should be (ϵt, the ideal error); it is also a function of itself yesterday. That should not be, because then you have AC.
To fix the problem of AC we need to get ut to be a function of ϵt only. How are we going to do that?
How do we get rid of ρut-1? We are going to have to somehow get it away/ remove it from our model.
What is ut?
ut = Yt – Ŷt
In the normal regression model we've got:
Yt = β1 + β2X2 + ut (1)
We want ut to just be a function of ϵt so we are going to have to get rid of ρut-1.
We start off by finding ut-1.
We are going to lag Yt = β1 + β2X2 + ut so that we can get ut-1.
We’ve got our original model, we know ut is autocorrelated with itself at different time periods and now we’re
going to transform this model in order to get rid of that AC.
Lag: Yt-1 = β1 + β2Xt-1 + ut-1 (2)
β1 is a constant, and the lag operator does nothing to a constant, because a constant is the same in every time period.
Xt: since we have only got one X, we can write X2 as Xt in our original model.
So we've got the ut-1, but we also need the rho, because it is ρut-1 that we want to remove. Our next step is to multiply our lagged model by rho:
This transformation assumes that we know rho. What do we do if we don't know rho?
We will never know the true rho, because to know it we would need to know the true population errors, and if we knew those we wouldn't have to do regression in the first place.
If we don't know what rho is, we still follow exactly the same method, but we use an estimate ρ̂. We need to estimate rho; wherever rho appears below, think of how we would actually use ρ̂ in its place.
Multiply by ρ:
ρYt-1 = ρβ1 + ρβ2Xt-1 + ρut-1 (3)
How do we get an error equal to ϵt?
If we take (1) and subtract (3) from it, the error term becomes ut – ρut-1. Substituting ut = ρut-1 + ϵt gives:
ut – ρut-1 = ρut-1 + ϵt – ρut-1
ut – ρut-1 = ϵt, which is exactly the error we want.
These are the errors we are looking for, the ideal errors: they are normally distributed with mean 0 and a constant, homoscedastic variance, and they have no AC, meaning ϵt is not correlated with ϵt-1, or with ϵt-i for that matter, at any lag.


We want to get rid of the AC relationship and have our error be a purely stochastic, ideal error term. We really only want to transform the error, but because the error sits in an equation together with Yt-1, β1 and β2Xt-1, we have to lag, multiply by rho and subtract across the entire model to preserve the equality. In the end we end up with the transformed model:
(Yt – ρYt-1) = (β1 – ρβ1) + β2(Xt – ρXt-1) + (ut – ρut-1), where ut – ρut-1 = ϵt
We want ϵt because it is not autocorrelated.
Corr (ϵi, ϵj) = 0 where i ≠ j
We want this because it means that there is no AC.
If we had to fit this regression model and estimate β1 and β2, they would be BLUE.
You can simplify this model further by taking out common factors, but the point is that you regress the transformed Y against the transformed X, because you know the errors from that transformed model will be the ideal errors. Note that the intercept you then estimate is the transformed intercept β*1 = β1(1 – ρ), so at the end you have to undo that, β1 = β*1/(1 – ρ), to get the true β1 back. The point is that you will be able to get a β1 and a β2 that are BLUE. Most importantly the Best, because under HS and AC the only thing that is lost is that B: the estimator is still linear, still unbiased, just not the best. The transformation gives us that B back and makes the estimator BLUE again.

Yt = β1 + β2Xt + ut                                  (1) original model
Yt-1 = β1 + β2Xt-1 + ut-1                            (2) lag
ρYt-1 = ρβ1 + ρβ2Xt-1 + ρut-1                        (3) multiply (2) by ρ
ut – ρut-1 = ρut-1 + ϵt – ρut-1 = ϵt                 (1) – (3): the new error
(Yt – ρYt-1) = (β1 – ρβ1) + β2(Xt – ρXt-1) + ϵt      transformed model
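As a rough end-to-end illustration (a sketch of a Cochrane-Orcutt-style transformation under the AR(1) assumption, not taken from the notes; the function and variable names are made up, and y and x are assumed to be 1-D numpy arrays), one could estimate rho from the OLS residuals, quasi-difference the data and re-fit:

# 1) fit OLS, 2) estimate rho from the residuals, 3) transform Y and X,
# 4) re-fit OLS on the transformed data, 5) recover beta1 from the transformed intercept.
import numpy as np
import statsmodels.api as sm

def quasi_difference_fit(y, x):
    # Step 1: OLS on the original model Yt = b1 + b2*Xt + ut
    ols = sm.OLS(y, sm.add_constant(x)).fit()
    u = ols.resid

    # Step 2: estimate rho from u_t = rho*u_{t-1} + e_t (regression through the origin)
    rho_hat = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)

    # Step 3: quasi-difference, losing the first observation
    y_star = y[1:] - rho_hat * y[:-1]
    x_star = x[1:] - rho_hat * x[:-1]

    # Step 4: OLS on the transformed model; its errors should now be (close to) e_t
    gls = sm.OLS(y_star, sm.add_constant(x_star)).fit()

    # Step 5: the fitted intercept is beta1* = beta1*(1 - rho), so undo that
    beta1_star, beta2_hat = gls.params
    beta1_hat = beta1_star / (1 - rho_hat)
    return rho_hat, beta1_hat, beta2_hat

statsmodels also provides sm.GLSAR, which iterates essentially this idea; the manual version above just makes the steps described in these notes explicit.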
