Chapter 4: Classical Linear Regression Model Assumptions and Diagnostic Tests
1. E(ut) = 0
2. Var(ut) = σ² < ∞
3. Cov(ui, uj) = 0 for i ≠ j
4. Cov(ut, xt) = 0 (the xt are non-stochastic)
5. ut ∼ N(0, σ²)
• We will now study these assumptions further, and in particular look at:
  - How we test for violations
  - Causes
  - Consequences
    In general we could encounter any combination of 3 problems:
    - the coefficient estimates are wrong
    - the associated standard errors are wrong
    - the distribution that we assumed for the test statistics is inappropriate
  - Solutions
    - modify the model or the data so that the assumptions are no longer violated
    - work around the problem by using alternative techniques which are still valid
• The χ2- version is sometimes called an “LM” test, and only has one degree
of freedom parameter: the number of restrictions being tested, m.
• Asymptotically, the 2 tests are equivalent since the χ2 is a special case of the
F-distribution:
χ²(m)/m → F(m, T − k) as T − k → ∞
• For small samples, the F-version is preferable.
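• As a quick illustration of this asymptotic equivalence (a sketch added here, not part of the original example), one can compare 5% critical values using scipy; m and the degrees of freedom below are purely illustrative.

```python
# Sketch: compare the 5% critical value of F(m, T-k) with that of chi2(m)/m
# for increasing T-k, illustrating their asymptotic equivalence.
from scipy import stats

m = 3  # number of restrictions being tested (illustrative)
for dof in (20, 100, 1000, 100000):              # T - k
    f_crit = stats.f.ppf(0.95, m, dof)           # 5% critical value of F(m, T-k)
    chi2_crit_over_m = stats.chi2.ppf(0.95, m) / m
    print(dof, round(f_crit, 4), round(chi2_crit_over_m, 4))
```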
• The mean of the residuals will always be zero provided that there is a
constant term in the regression.
• We have so far assumed that the variance of the errors is constant, σ² – this is known as homoscedasticity.
• If the errors do not have a constant variance, we say that they are heteroscedastic, e.g. say we estimate a regression and calculate the residuals, ût.
[Figure: plot of the residuals ût against x2t]
• Graphical methods
• Formal tests: there are many of them; we will discuss the Goldfeld-Quandt test and White's test.
3. The test statistic, denoted GQ, is simply the ratio of the two residual
variances where the larger of the two variances must be placed in
the numerator.
GQ = s1² / s2²
4. The test statistic is distributed as an F(T1-k, T2-k) under the null of
homoscedasticity.
5. A problem with the test is that the choice of where to split the sample is usually arbitrary and may crucially affect the outcome of the test.
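• A minimal sketch of the GQ calculation in Python (assuming the observations have already been ordered by the variable suspected of driving the heteroscedasticity; names like y, X and split are illustrative):

```python
# Sketch: Goldfeld-Quandt test - split the ordered sample, run OLS on each part,
# and form the ratio of residual variances with the larger variance on top.
import numpy as np
from scipy import stats

def residual_variance(y, X):
    """OLS residual variance s^2 = u'u / (T - k)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return (resid @ resid) / (len(y) - X.shape[1])

def goldfeld_quandt(y, X, split):
    k = X.shape[1]
    s2_1 = residual_variance(y[:split], X[:split])
    s2_2 = residual_variance(y[split:], X[split:])
    if s2_1 >= s2_2:                       # larger variance goes in the numerator
        gq, dfn, dfd = s2_1 / s2_2, split - k, len(y) - split - k
    else:
        gq, dfn, dfd = s2_2 / s2_1, len(y) - split - k, split - k
    p_value = stats.f.sf(gq, dfn, dfd)     # F(T1-k, T2-k) under homoscedasticity
    return gq, p_value
```

(statsmodels also provides a ready-made version, statsmodels.stats.diagnostic.het_goldfeldquandt.)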
4. If the χ2 test statistic from step 3 is greater than the corresponding value
from the statistical table then reject the null hypothesis that the disturbances
are homoscedastic.
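• A sketch of how White's test might be run in practice with statsmodels; the data below are simulated purely for illustration, and het_white takes the residuals plus the original design matrix (including the constant).

```python
# Sketch: White's test - auxiliary regression of squared residuals on the
# regressors, their squares and cross-products; TR^2 is chi-squared under the null.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(0)
x2, x3 = rng.normal(size=200), rng.normal(size=200)
u = rng.normal(size=200) * (1 + np.abs(x2))          # heteroscedastic errors
y = 1.0 + 0.5 * x2 - 0.3 * x3 + u

X = sm.add_constant(np.column_stack([x2, x3]))
res = sm.OLS(y, X).fit()

lm_stat, lm_pval, f_stat, f_pval = het_white(res.resid, X)
print(f"TR^2 = {lm_stat:.2f}, chi-squared p-value = {lm_pval:.4f}")
# a small p-value => reject the null hypothesis that the disturbances are homoscedastic
```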
• Whether the standard errors calculated using the usual formulae are too big
or too small will depend upon the form of the heteroscedasticity.
• If the form (i.e. the cause) of the heteroscedasticity is known, then we can
use an estimation method which takes this into account (called generalised
least squares, GLS).
• A simple illustration of GLS is as follows: Suppose that the error variance is
related to another variable zt by
var(ut) = σ²zt²
• To remove the heteroscedasticity, divide the regression equation by zt
yt/zt = β1(1/zt) + β2(x2t/zt) + β3(x3t/zt) + vt
where vt = ut/zt is an error term.
• Now var(vt) = var(ut/zt) = var(ut)/zt² = σ²zt²/zt² = σ² for known zt.
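• A minimal sketch of this GLS correction, assuming zt is observed; it is just OLS on the transformed variables, or equivalently weighted least squares with weights 1/zt². The variable names are illustrative.

```python
# Sketch: GLS when var(u_t) = sigma^2 * z_t^2 and z_t is known.
# Dividing every variable (including the constant) by z_t removes the heteroscedasticity.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 300
x2, x3 = rng.normal(size=T), rng.normal(size=T)
z = np.exp(rng.normal(size=T) / 4)                      # known scale variable z_t > 0
y = 1.0 + 0.5 * x2 - 0.2 * x3 + z * rng.normal(size=T)  # var(u_t) = sigma^2 * z_t^2

X = sm.add_constant(np.column_stack([x2, x3]))

# Transformed regression: y_t/z_t on 1/z_t, x2_t/z_t, x3_t/z_t (no additional constant)
gls_res = sm.OLS(y / z, X / z[:, None]).fit()

# Equivalent route: weighted least squares with weights proportional to 1/z_t^2
wls_res = sm.WLS(y, X, weights=1.0 / z**2).fit()
print(gls_res.params, wls_res.params)                   # the two sets of estimates coincide
```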
• Lagged values and first differences, e.g. constructing yt-1 and ∆yt = yt - yt-1:

t        yt     yt-1    ∆yt
1989M09  0.8    -       -
1989M10  1.3    0.8     1.3 - 0.8 = 0.5
1989M11  -0.9   1.3     -0.9 - 1.3 = -2.2
1989M12  0.2    -0.9    0.2 - (-0.9) = 1.1
1990M01  -1.7   0.2     -1.7 - 0.2 = -1.9
1990M02  2.3    -1.7    2.3 - (-1.7) = 4.0
1990M03  0.1    2.3     0.1 - 2.3 = -2.2
1990M04  0.0    0.1     0.0 - 0.1 = -0.1
...      ...    ...     ...
• We assumed of the CLRM's errors that Cov(ui, uj) = 0 for i ≠ j, i.e. that the errors are uncorrelated with one another. This is essentially the same as saying there is no pattern in the errors.
• Obviously we never have the actual u’s, so we use their sample counterpart,
the residuals (the ut’s).
• If there are patterns in the residuals from a model, we say that they are
autocorrelated.
• Some stereotypical patterns we may find in the residuals are illustrated below.
[Figures: plots of ût against ût-1 and of ût against time, illustrating positive autocorrelation, negative autocorrelation, and no pattern (no autocorrelation) in the residuals]
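• As a rough illustrative check for such patterns (a sketch, not part of the original material), one can compute the first-order sample autocorrelation of the residuals or plot ût against ût-1:

```python
# Sketch: first-order autocorrelation of the residuals.
import numpy as np

def first_order_autocorr(resid):
    """Sample correlation between u_hat_t and u_hat_{t-1}."""
    return np.corrcoef(resid[1:], resid[:-1])[0, 1]

resid = np.random.default_rng(2).normal(size=100)  # stand-in for residuals from a fitted model
print(first_order_autocorr(resid))                 # close to zero when there is no pattern
```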
• The coefficient estimates derived using OLS are still unbiased, but they are
inefficient, i.e. they are not BLUE, even in large sample sizes.
• Thus, if the standard error estimates are inappropriate, there exists the
possibility that we could make the wrong inferences.
• All of the models we have considered so far have been static, e.g.
yt = β1 + β2x2t + ... + βkxkt + ut
• But we can easily extend this analysis to the case where the current value
of yt depends on previous values of y or one of the x’s, e.g.
yt = β1 + β2x2t + ... + βkxkt + γ1yt-1 + γ2x2t-1 + … + γkxkt-1+ ut
• We could extend the model even further by adding extra lags, e.g.
x2t-2 , yt-3 .
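• A small sketch of how such lagged terms might be constructed before estimation, using pandas; the column names are illustrative.

```python
# Sketch: building lagged regressors y_{t-1}, x2_{t-1} and the first difference of y.
import pandas as pd

df = pd.DataFrame({"y": [0.8, 1.3, -0.9, 0.2, -1.7, 2.3],
                   "x2": [1.1, 0.7, 0.3, 0.9, 1.4, 0.5]})

df["y_lag1"] = df["y"].shift(1)    # y_{t-1}
df["x2_lag1"] = df["x2"].shift(1)  # x2_{t-1}
df["dy"] = df["y"].diff()          # Delta y_t = y_t - y_{t-1}

df = df.dropna()                   # the first observation is lost when lagging
print(df)
```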
• However, other problems with the regression could cause the null hypothesis of no
autocorrelation to be rejected:
– Omission of relevant variables, which are themselves
autocorrelated.
– If we have committed a “misspecification” error by using
an inappropriate functional form.
– Autocorrelation resulting from unparameterised
seasonality.
• Denote the first difference of yt, i.e. yt - yt-1 as ∆yt; similarly for the x-
variables, ∆x2t = x2t - x2t-1 etc.
If our model is
∆yt = β1 + β2∆x2t + β3x2t-1 + β4yt-1 + ut
then the long-run static solution is obtained by setting the differenced terms to zero (∆yt = 0, ∆x2t = 0) and dropping the time subscripts:
0 = β1 + β3x2 + β4y
β4y = -β1 - β3x2
y = -β1/β4 - (β3/β4)x2
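• A tiny numeric sketch of this long-run solution, using made-up coefficient values purely for illustration:

```python
# Sketch: long-run static solution y = -b1/b4 - (b3/b4) * x2
# from Delta y_t = b1 + b2*Delta x2_t + b3*x2_{t-1} + b4*y_{t-1} + u_t.
b1, b3, b4 = 0.6, 0.4, -0.8        # illustrative estimates (b4 must be non-zero)

def long_run_y(x2):
    return -b1 / b4 - (b3 / b4) * x2

print(long_run_y(2.0))             # long-run value of y when x2 = 2.0
```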
• This problem occurs when the explanatory variables are very highly correlated
with each other.
• Perfect multicollinearity – cannot estimate all the coefficients
  - e.g. suppose x3 = 2x2
    and the model is yt = β1 + β2x2t + β3x3t + β4x4t + ut
• Measuring multicollinearity: look at the matrix of correlations between the individual explanatory variables, e.g.

  Corr   x2     x3     x4
  x2     -      0.2    0.8
  x3     0.2    -      0.3
  x4     0.8    0.3    -
• But another problem: the correlation matrix will not reveal a linear relationship among 3 or more variables
  - e.g. x2t + x3t = x4t
• Note that high correlation between y and one of the x's is not multicollinearity.
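• A small sketch of this kind of check, computing the pairwise correlations between the regressors (variable names illustrative); note, as above, that pairwise correlations cannot reveal a linear relationship involving three or more variables.

```python
# Sketch: pairwise correlation matrix of the explanatory variables.
import numpy as np

rng = np.random.default_rng(3)
x2 = rng.normal(size=200)
x3 = rng.normal(size=200)
x4 = 0.8 * x2 + 0.6 * rng.normal(size=200)          # x4 highly correlated with x2

X = np.column_stack([x2, x3, x4])
print(np.round(np.corrcoef(X, rowvar=False), 2))    # rows/columns correspond to x2, x3, x4
```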
• Essentially the method works by adding higher-order terms of the fitted values (e.g. ŷt², ŷt³ etc.) into an auxiliary regression:
  Regress ût on powers of the fitted values:
  ût = β0 + β1ŷt² + β2ŷt³ + ... + βp-1ŷtᵖ + vt
  Obtain R² from this regression. The test statistic is given by TR² and is distributed as a χ²(p - 1).
• So if the value of the test statistic is greater than the χ²(p - 1) critical value, then reject the null hypothesis that the functional form is correct.
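• A minimal sketch of this auxiliary regression in Python, under the set-up just described; the simulated data and variable names are illustrative.

```python
# Sketch: RESET-type test - regress the OLS residuals on powers of the fitted
# values and compare TR^2 from the auxiliary regression with chi2(p - 1).
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + 0.4 * x**2 + rng.normal(size=200)   # true relationship is non-linear

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

p = 3  # include squared and cubed fitted values
powers = np.column_stack([res.fittedvalues**i for i in range(2, p + 1)])
aux = sm.OLS(res.resid, sm.add_constant(powers)).fit()

TR2 = len(y) * aux.rsquared
p_value = stats.chi2.sf(TR2, p - 1)
print(f"TR^2 = {TR2:.2f}, p-value = {p_value:.4f}")      # small p-value => inadequate functional form
```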
• If the functional form is wrong, one remedy may be to transform the data into logarithms, e.g.
  yt = Axtᵝ e^(ut) ⇔ ln yt = α + β ln xt + ut, where α = ln(A)
[Figures: probability density functions f(x) plotted against x, illustrating a normal distribution versus non-normal alternatives]
• Bera and Jarque formalise this into a test of the residuals for normality, by testing whether the coefficient of skewness and the coefficient of excess kurtosis are jointly zero.
• It can be proved that the coefficients of skewness and kurtosis can be
expressed respectively as:
b1 = E[u³]/(σ²)^(3/2)  and  b2 = E[u⁴]/(σ²)²
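• A sketch of these two coefficients together with the standard Bera–Jarque statistic W = T[b1²/6 + (b2 - 3)²/24], which is distributed as χ²(2) under the null of normality; the residual series below is simulated purely for illustration.

```python
# Sketch: coefficients of skewness and kurtosis of the residuals and the
# Bera-Jarque statistic W = T*(b1^2/6 + (b2 - 3)^2/24) ~ chi2(2) under normality.
import numpy as np
from scipy import stats

def bera_jarque(resid):
    T = len(resid)
    u = resid - resid.mean()           # residuals have mean zero if a constant is included
    sigma2 = np.mean(u**2)
    b1 = np.mean(u**3) / sigma2**1.5   # coefficient of skewness
    b2 = np.mean(u**4) / sigma2**2     # coefficient of kurtosis
    W = T * (b1**2 / 6 + (b2 - 3)**2 / 24)
    return W, stats.chi2.sf(W, 2)

resid = np.random.default_rng(5).standard_t(df=4, size=500)  # fat-tailed example series
print(bera_jarque(resid))              # large W / small p-value => reject normality
```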
• We could use a method which does not assume normality, but this is difficult and its properties are unclear.
• It is often the case that one or two very extreme residuals cause us to reject the normality assumption.
[Figure: plot of the residuals ût over time, showing a single extreme observation around October 1987]
• We have implicitly assumed that the parameters (β1, β2 and β3) are
constant for the entire sample period.
• We can test this implicit assumption using parameter stability tests. The
idea is essentially to split the data into sub-periods and then to estimate up
to three models, for each of the sub-parts and for all the data and then to
“compare” the RSS of the models.
The test statistic is

Test statistic = [(RSS - (RSS1 + RSS2)) / (RSS1 + RSS2)] × [(T - 2k) / k]

where:
RSS = RSS for whole sample
RSS1 = RSS for sub-sample 1
RSS2 = RSS for sub-sample 2
T = number of observations
2k = number of regressors in the "unrestricted" regression (since it comes in two parts)
k = number of regressors in (each part of the) "unrestricted" regression
3. Perform the test. If the value of the test statistic is greater than the
critical value from the F-distribution, which is an F(k, T-2k), then reject
the null hypothesis that the parameters are stable over time.
• Consider the following regression for the CAPM β (again) for the
returns on Glaxo.
• Say that we are interested in estimating Beta for monthly data from
1981-1992. The model for each sub-period is
• 1981M1 - 1987M10
0.24 + 1.2RMt T = 82 RSS1 = 0.03555
• 1987M11 - 1992M12
0.68 + 1.53RMt T = 62 RSS2 = 0.00336
• 1981M1 - 1992M12
0.39 + 1.37RMt T = 144 RSS = 0.0434
H0 : α1 = α 2 and β1 = β2
• The unrestricted model is the model where this restriction is not imposed
Test statistic = [0.0434 - (0.0355 + 0.00336)] / (0.0355 + 0.00336) × (144 - 4)/2
               = 7.698
• The test statistic exceeds the 5% critical value from the F(2, 140) distribution, so we reject H0 at the 5% level: the restriction that the coefficients are the same in the two periods is rejected, i.e. the parameters are not stable over time.
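• A small sketch reproducing this kind of Chow calculation from the reported RSS values (scipy used only for the p-value); the figures are those quoted above, which are rounded, so the statistic printed here may differ slightly from 7.698.

```python
# Sketch: Chow test statistic from whole-sample and sub-sample residual sums of squares.
from scipy import stats

def chow_test(rss, rss1, rss2, T, k):
    """Test statistic ~ F(k, T - 2k) under the null of stable parameters."""
    stat = ((rss - (rss1 + rss2)) / (rss1 + rss2)) * ((T - 2 * k) / k)
    p_value = stats.f.sf(stat, k, T - 2 * k)
    return stat, p_value

print(chow_test(rss=0.0434, rss1=0.03555, rss2=0.00336, T=144, k=2))
```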
• Problem with the Chow test is that we need to have enough data to do the
regression on both sub-samples, i.e. T1>>k, T2>>k.
• An alternative formulation is the predictive failure test.
• What we do with the predictive failure test is estimate the regression over a “long”
sub-period (i.e. most of the data) and then we predict values for the other period
and compare the two.
To calculate the test:
- Run the regression for the whole period (the restricted regression) and obtain the RSS
- Run the regression for the “large” sub-period and obtain the RSS (called RSS1). Note
we call the number of observations T1 (even though it may come second).
Test statistic = [(RSS - RSS1)/RSS1] × [(T1 - k)/T2]

where T2 = number of observations we are attempting to "predict". The test statistic will follow an F(T2, T1 - k).
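• And a matching sketch for the predictive failure statistic, in the same illustrative style as the Chow example above:

```python
# Sketch: predictive failure test statistic ~ F(T2, T1 - k) under the null.
from scipy import stats

def predictive_failure_test(rss, rss1, T1, T2, k):
    stat = ((rss - rss1) / rss1) * ((T1 - k) / T2)
    p_value = stats.f.sf(stat, T2, T1 - k)
    return stat, p_value
```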
- Forward predictive failure tests, where we keep the last few observations
back for forecast testing, e.g. we have observations for 1970Q1-1994Q4.
So estimate the model over 1970Q1-1993Q4 and forecast 1994Q1-1994Q4.
[Figure: time-series plot of the data over the sample period]
- Split the data according to any known important historical events (e.g. stock market crash, new government elected)
- Use all but the last few observations and do a predictive failure test on those.
Our Objective:
• To build a statistically adequate empirical model which
- satisfies the assumptions of the CLRM
- is parsimonious
- has the appropriate theoretical interpretation
- has the right “shape” - i.e.
- all signs on coefficients are “correct”
- all sizes of coefficients are “correct”
- is capable of explaining the results of all competing models
• Little, if any, diagnostic testing was undertaken. But this meant that all
inferences were potentially invalid.
• The advantages of this approach are that it is statistically sensible and also
the theory on which the models are based usually has nothing to say about
the lag structure of a model.
• First step is to form a “large” model with lots of variables on the right hand
side
• This is known as a GUM (generalised unrestricted model)
• At this stage, we want to make sure that the model satisfies all of the
assumptions of the CLRM
• If the assumptions are violated, we need to take appropriate actions to remedy
this, e.g.
- taking logs
- adding lags
- dummy variables
• We need to do this before testing hypotheses
• Once we have a model which satisfies the assumptions, it could be very big
with lots of lags & independent variables