CLRM Assumptions
Abstract
Summary of statistical tests for the Classical Linear Regression Model (CLRM), based on Brooks [1],
Greene [5] [6], Pedace [8], and Zeileis [10].
1 The Classical Linear Regression Model (CLRM)
Let the column vector xk be the T observations on variable xk, k = 1, ⋯, K, and assemble these data in a T × K data matrix X. In most contexts, the first column of X is assumed to be a column of 1s:
x1 = (1, 1, ⋯, 1)′   (a T × 1 column vector)
so that β1 is the constant term in the model. Let y be the T observations y1 , · · · , yT , and let ε be the
column vector containing the T disturbances. The Classical Linear Regression Model (CLRM) can be
written as
y = x1 β1 + ⋯ + xK βK + ε,    xi = (xi1, xi2, ⋯, xiT)′   (T × 1),
or in matrix form
y_{T×1} = X_{T×K} β_{K×1} + ε_{T×1}.
Assumptions of the CLRM (Brooks [1, page 44], Greene [6, pages 16-24]):
(1) Linearity: The model specifies a linear relationship between y and x1 , · · · , xK .
y = Xβ + ε
(2) Full rank: There is no exact linear relationship among any of the independent variables in the model.
This assumption will be necessary for estimation of the parameters of the model (see formula (1)).
(3) Exogeneity of the independent variables: E[εi |xj1 , xj2 , · · · , xjK ] = 0. This states that the
expected value of the disturbance at observation i in the sample is not a function of the independent
variables observed at any observation, including this one. This means that the independent variables will
not carry useful information for prediction of εi .
E[ε|X] = 0.
(4) Homoscedasticity and nonautocorrelation: Each disturbance εi has the same finite variance σ², and is uncorrelated with every other disturbance εj.
E[εε′|X] = σ²I.
(5) Data generation: The data in (xj1 , xj2 , · · · , xjK ) may be any mixture of constants and random
variables.
X may be fixed or random.
(6) Normal distribution: The disturbances are normally distributed.
In order to obtain estimates of the parameters β1, β2, ⋯, βK, the residual sum of squares (RSS)
RSS = ε̂′ε̂ = ∑_{t=1}^{T} ε̂t² = ∑_{t=1}^{T} ( yt − ∑_{i=1}^{K} xit βi )²
is minimised so that the coefficient estimates will be given by the ordinary least squares (OLS) estimator
β̂ = (β̂1, β̂2, ⋯, β̂K)′ = (X′X)⁻¹X′y.    (1)
In order to calculate the standard errors of the coefficient estimates, the variance of the errors, σ², is estimated by the estimator
s² = RSS/(T − K) = ∑_{t=1}^{T} ε̂t² / (T − K)    (2)
where we recall K is the number of regressors including a constant. In this case, K observations are “lost”
as K parameters are estimated, leaving T − K degrees of freedom.
Then the parameter variance-covariance matrix is given by
Var(β̂) = s²(X′X)⁻¹.    (3)
And the coefficient standard errors are simply given by taking the square roots of each of the terms on the leading diagonal. In summary, we have (Brooks [1, pages 91-92])
β̂ = (X′X)⁻¹X′y = β + (X′X)⁻¹X′ε
s² = ∑_{t=1}^{T} ε̂t² / (T − K)    (4)
Var(β̂) = s²(X′X)⁻¹.
The OLS estimator is the best linear unbiased estimator (BLUE), consistent and asymptotically normally
distributed (CAN), and if the disturbances are normally distributed, asymptotically efficient among all CAN
estimators.
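As a small numerical illustration of formulas (1)-(4), the following NumPy sketch computes β̂, s², and the coefficient standard errors on simulated data (the data-generating values below are arbitrary and only serve the example).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: T observations, K = 3 regressors including the constant.
T, K = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
beta_true = np.array([1.0, 0.5, -2.0])
y = X @ beta_true + rng.normal(scale=0.7, size=T)

# OLS estimator (1): beta_hat = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Residual variance estimator (2): s^2 = RSS / (T - K)
resid = y - X @ beta_hat
s2 = resid @ resid / (T - K)

# Variance-covariance matrix (3)-(4) and coefficient standard errors
cov_beta = s2 * XtX_inv
se_beta = np.sqrt(np.diag(cov_beta))
print(beta_hat, se_beta)
```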
The unrestricted regression is the one that minimises the residual sum of squares, with no constraints imposed. Now if, after imposing constraints on the model, a residual sum of squares results that is not much higher than the unconstrained model's residual sum of squares, it
would be concluded that the restrictions were supported by the data. On the other hand, if the residual
sum of squares increased considerably after the restrictions were imposed, it would be concluded that the
restrictions were not supported by the data and therefore that the hypothesis should be rejected.
It can be further stated that RRSS ≥ URSS. Only under a particular set of very extreme circumstances
will the residual sums of squares for the restricted and unrestricted models be exactly equal. This would be
the case when the restriction was already present in the data, so that it is not really a restriction at all.
Finally, we note any hypothesis that could be tested with a t-test could also have been tested using an
F -test, since
t²(T − K) ∼ F(1, T − K).
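This equivalence can be checked numerically with a minimal SciPy sketch (the degrees-of-freedom value below is arbitrary): the square of the two-sided 5% critical value of t(T − K) equals the 5% critical value of F(1, T − K).

```python
from scipy import stats

df = 60                              # an illustrative value of T - K
t_crit = stats.t.ppf(0.975, df)      # two-sided 5% critical value of t(T - K)
f_crit = stats.f.ppf(0.95, 1, df)    # 5% critical value of F(1, T - K)
print(t_crit**2, f_crit)             # essentially identical
```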
As a rule of thumb, VIFs greater than 10 signal a highly likely multicollinearity problem, and VIFs
between 5 and 10 signal a somewhat likely multicollinearity issue. Remember to check also other evidence
of multicollinearity (insignificant t-statistics, sensitive or nonsensical coefficient estimates, and nonsensical
coefficient signs and values). A high VIF is only an indicator of potential multicollinearity, but it may not
result in a large variance for the estimator if the variance of the independent variable is also large.
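The standard definition is VIF_k = 1/(1 − R_k²), where R_k² comes from regressing x_k on the other independent variables. The sketch below computes VIFs with statsmodels on deliberately collinear simulated data (the variable names and simulated design are illustrative only).

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
T = 200
x2 = rng.normal(size=T)
x3 = 0.9 * x2 + 0.1 * rng.normal(size=T)        # deliberately collinear with x2
X = sm.add_constant(np.column_stack([x2, x3]))  # T x K design matrix with constant

# VIF_k = 1 / (1 - R_k^2); values above 10 suggest a serious multicollinearity problem.
vifs = [variance_inflation_factor(X, k) for k in range(1, X.shape[1])]
print(vifs)
```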
4 Violation of Assumptions: Heteroscedasticity
4.1 Detection of heteroscedasticity
This is the situation where E[εi²|X] is not a finite constant.
A simple test is the Goldfeld-Quandt (GQ) test, which splits the total sample of length T into two sub-samples of lengths T1 and T2, runs the regression separately on each, and forms the ratio of the two residual variances
GQ = s1² / s2²
with s1² > s2². The test statistic is distributed as an F(T1 − K, T2 − K) under the null hypothesis, and the null of a constant variance is rejected if the test statistic exceeds the critical value.
The GQ test is simple to construct but its conclusions may be contingent upon a particular, and probably
arbitrary, choice of where to split the sample. An alternative method that is sometimes used to sharpen the
inferences from the test and to increase its power is to omit some of the observations from the centre of the
sample so as to introduce a degree of separation between the two sub-samples.
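A minimal sketch of the GQ test, under the assumption that the rows are already sorted by the variable suspected of driving the error variance (the helper goldfeld_quandt and the simulated data are illustrative only; statsmodels also ships a packaged version, het_goldfeldquandt):

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

def goldfeld_quandt(y, X, drop_frac=0.2):
    """GQ sketch: split the (ordered) sample, omit the central observations,
    and compare the residual variances of the two sub-sample regressions."""
    T, K = X.shape
    d = int(T * drop_frac)                     # central observations to omit
    T1 = (T - d) // 2                          # size of each sub-sample
    res1 = sm.OLS(y[:T1], X[:T1]).fit()
    res2 = sm.OLS(y[-T1:], X[-T1:]).fit()
    s2_1 = res1.ssr / (T1 - K)
    s2_2 = res2.ssr / (T1 - K)
    gq = max(s2_1, s2_2) / min(s2_1, s2_2)     # s1^2 / s2^2 with s1^2 > s2^2
    return gq, 1 - stats.f.cdf(gq, T1 - K, T1 - K)

# Example: error variance grows with x, and the rows are sorted by x.
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(1, 5, size=200))
y = 1 + 0.5 * x + rng.normal(scale=x, size=200)
print(goldfeld_quandt(y, sm.add_constant(x)))
```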
A second approach is White's (1980) general test for heteroscedasticity [9]. Suppose the model is
yt = β1 + β2 x2t + β3 x3t + εt.
(1) To test Var(εt) = σ², estimate the model above, obtaining the residuals ε̂t.
(2) Run the auxiliary regression
ε̂t² = α1 + α2 x2t + α3 x3t + α4 x2t² + α5 x3t² + α6 x2t x3t + νt.
The squared residuals are the quantity of interest since Var(εt) = E[εt²] under the assumption that E[εt] = 0.
The reason that the auxiliary regression takes this form is that it is desirable to investigate whether the
variance of the residuals varies systematically with any known variables relevant to the model. Note also
that this regression should include a constant term, even if the original regression did not, because ε̂t² will always have a non-zero mean.
(3) Given the auxiliary regression, the test can be conducted using two different approaches.
(i) First it is possible to use the F -test framework. This would involve estimating the auxiliary
regression as the unrestricted regression and then running a restricted regression of ε̂t² on a constant only.
The RSS from each specification would then be used as inputs to the standard F -test formula.
(ii) An alternative approach, called Lagrange Multiplier (LM) test, centres around the value of R2
for the auxiliary regression and does not require the estimation of a second (restricted) regression. If one or
more coefficients in the auxiliary regression is statistically significant, the value of R² for that equation will be relatively high, while if none of the variables is significant, R² will be relatively low. The LM test would thus operate by obtaining R² from the auxiliary regression and multiplying it by the number of observations, T. It can be shown that
T R² ∼ χ²(m)
where m is the number of regressors in the auxiliary regression (excluding the constant term), equivalent to
the number of restrictions that would have to be placed under the F -test approach.
(4) The test is of the joint null hypothesis that α2 = α3 = α4 = α5 = α6 = 0. For the LM test, if the χ²-test statistic from step (3) is greater than the corresponding value from the statistical table, then reject the null hypothesis that the errors are homoscedastic.
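The following sketch runs the LM (T·R²) version of the test on simulated heteroscedastic data; the auxiliary regression uses the levels, squares and cross-product as above, and the simulated data are illustrative only. (statsmodels also provides a packaged version, statsmodels.stats.diagnostic.het_white.)

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(3)
T = 300
x2, x3 = rng.normal(size=T), rng.normal(size=T)
y = 1 + 0.5 * x2 - 0.3 * x3 + rng.normal(scale=0.5 + np.abs(x2), size=T)  # heteroscedastic errors

# Step (1): original regression and residuals.
X = sm.add_constant(np.column_stack([x2, x3]))
resid = sm.OLS(y, X).fit().resid

# Step (2): auxiliary regression of squared residuals on levels, squares, cross-product.
Z = sm.add_constant(np.column_stack([x2, x3, x2**2, x3**2, x2 * x3]))
aux = sm.OLS(resid**2, Z).fit()

# Step (3)(ii): LM statistic T R^2 ~ chi^2(m), with m = 5 here.
lm_stat = T * aux.rsquared
pval = 1 - stats.chi2.cdf(lm_stat, df=Z.shape[1] - 1)
print(lm_stat, pval)      # small p-value => reject homoscedasticity
```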
4.3.2 Transformation
A second “solution” for heteroscedasticity is transforming the variables into logs or reducing by some other
measure of “size”. This has the effect of re-scaling the data to “pull in” extreme observations.
4.3.3 The White-corrected standard errors
A third “solution” for heteroscedasticity is to use robust standard errors (White-corrected or heteroscedasticity-corrected standard errors), following White [9]. In a model with one independent variable, the robust standard error is
se(β̂1)HC = √[ ∑_{t=1}^{T} (xt − x̄)² ε̂t² / ( ∑_{t=1}^{T} (xt − x̄)² )² ].
Generalizing this result to a multiple regression model, the robust standard error is
se(β̂k)HC = √[ ∑_{t=1}^{T} ω̂tk² ε̂t² / ( ∑_{t=1}^{T} ω̂tk² )² ]
where the ω̂tk²'s are the squared residuals obtained from the auxiliary regression of xk on all the other
independent variables. Here’s how to calculate robust standard errors:
(1) Estimate your original multivariate model, yt = β1 + β2 x2t + ⋯ + βK xKt + εt, and obtain the squared residuals, ε̂t².
(2) Estimate K − 1 auxiliary regressions of each independent variable on all the other independent variables and retain all T × (K − 1) squared residuals (ω̂tk²).
(3) For any independent variable, calculate the robust standard errors:
se(β̂k)HC = √[ ∑_{t=1}^{T} ω̂tk² ε̂t² / ( ∑_{t=1}^{T} ω̂tk² )² ].
The effect of using the correction is that, if the variance of the errors is positively related to the square of
an explanatory variable, the standard errors for the slope coefficients are increased relative to the usual OLS
standard errors, which would make hypothesis testing more “conservative”, so that more evidence would be
required against the null hypothesis before it would be rejected.
The results of Fabozzi and Francis [4] strongly suggest the presence of heteroscedasticity in the context
of the single index market model. Numerous versions of robust standard errors exist for the purpose of
improving the statistical properties of the heteroskedasticity correction; no form of robust standard error is
preferred above all others.
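In practice these corrections are rarely computed by hand; the hedged sketch below uses statsmodels' cov_type option, which exposes several HC variants (HC0-HC3). The simulated data are illustrative only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
T = 300
x2 = rng.normal(size=T)
y = 1 + 0.5 * x2 + rng.normal(scale=0.2 + np.abs(x2), size=T)   # heteroscedastic errors
X = sm.add_constant(x2)

ols_fit = sm.OLS(y, X).fit()                    # conventional OLS standard errors
robust_fit = sm.OLS(y, X).fit(cov_type="HC1")   # White-corrected (robust) standard errors

print(ols_fit.bse)
print(robust_fit.bse)   # typically larger for the slope when Var(eps) rises with x2^2
```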
5.1.2 The run test (the Geary test)
You want to use the run test if you're uncertain about the nature of the autocorrelation.
A run is defined as a sequence of positive or negative residuals. The hypothesis of no autocorrelation
isn’t sustainable if the residuals have too many or too few runs.
The most common version of the test assumes that runs are distributed normally. If the assumption of
no autocorrelation is sustainable, with 95% confidence, the number of runs should be between
µr ± 1.96σr
where µr is the expected number of runs and σr is the standard deviation. These values are calculated by
μr = 2T1T2/(T1 + T2) + 1,    σr = √[ 2T1T2(2T1T2 − T1 − T2) / ((T1 + T2)²(T1 + T2 − 1)) ]
where r is the number of observed runs, T1 is the number of positive residuals, T2 is the number of negative
residuals, and T is the total number of observations.
If the number of observed runs is below the expected interval, it’s evidence of positive autocorrelation;
if the number of runs exceeds the upper bound of the expected interval, it provides evidence of negative
autocorrelation.
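A minimal sketch of the run test using the normal approximation above (the helper runs_test and the autocorrelated example series are illustrative only):

```python
import numpy as np

def runs_test(resid):
    """Count runs of same-signed residuals and return the observed number of runs
    together with the 95% interval mu_r +/- 1.96 sigma_r."""
    signs = resid > 0
    r = 1 + int(np.sum(signs[1:] != signs[:-1]))     # number of runs
    T1, T2 = int(signs.sum()), int((~signs).sum())   # positive / negative residuals
    mu_r = 2 * T1 * T2 / (T1 + T2) + 1
    sigma_r = np.sqrt(2 * T1 * T2 * (2 * T1 * T2 - T1 - T2)
                      / ((T1 + T2) ** 2 * (T1 + T2 - 1)))
    return r, mu_r - 1.96 * sigma_r, mu_r + 1.96 * sigma_r

# Example: a strongly positively autocorrelated series has too few runs.
rng = np.random.default_rng(5)
resid = np.cumsum(rng.normal(size=200))
print(runs_test(resid))
```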
The Durbin-Watson (DW) test is a test for first-order autocorrelation, i.e. it examines only the relationship between an error and its immediately preceding value,3
εt = ρεt−1 + νt    (6)
H0: ρ = 0,    H1: ρ ≠ 0.
It is not necessary to run the regression given by (6) since the test statistic can be calculated using
quantities that are already available after the first regression has been run
DW = ∑_{t=2}^{T} (ε̂t − ε̂t−1)² / ∑_{t=2}^{T} ε̂t² ≈ 2(1 − ρ̂)    (7)
where ρ̂ is the estimated correlation coefficient that would have been obtained from an estimation of (6).
The intuition of the DW statistic is that the numerator “compares” the values of the error at times t − 1 and
t. If there is positive autocorrelation in the errors, this difference in the numerator will be relatively small,
while if there is negative autocorrelation, with the sign of the error changing very frequently, the numerator
will be relatively large. No autocorrelation would result in a value for the numerator between small and
large.
In order for the DW test to be valid for application, three conditions must be fulfilled:
(i) There must be a constant term in the regression.
(ii) The regressors must be non-stochastic.
(iii) There must be no lags of the dependent variable in the regression.4
The DW test does not follow a standard statistical distribution. It has two critical values: an upper
critical value dU and a lower critical value dL . The rejection and non-rejection regions for the DW test are
illustrated in Figure 1.
3 More generally, the AR(1) processes in time series analysis.
4 If the test were used in the presence of lags of the dependent variable or otherwise stochastic regressors, the test statistic would be biased towards 2, suggesting that in some instances the null hypothesis of no autocorrelation would not be rejected when it should be.
Figure 1: Rejection and non-rejection regions for DW test
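The ratio in (7) is simple to compute directly; statsmodels also provides durbin_watson, which evaluates exactly this ratio (the critical bounds dL and dU still come from tables). The AR(1) simulation below is illustrative only.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(6)
T = 250
x = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):                    # AR(1) errors with rho = 0.6
    eps[t] = 0.6 * eps[t - 1] + rng.normal()
y = 1 + 0.5 * x + eps

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
print(durbin_watson(resid))              # well below 2, pointing to positive autocorrelation
```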
The Breusch-Godfrey test is a more general test for autocorrelation up to rth order. (1) Estimate the original regression and obtain the residuals ε̂t. (2) Regress ε̂t on all of the original regressors plus r lags of the residuals, ε̂t−1, ⋯, ε̂t−r, and obtain R² from this auxiliary (test) regression. (3) Letting T denote the number of observations, the test statistic is given by
(T − r)R² ∼ χ²(r).
Note that (T − r) pre-multiplies R² in the test for autocorrelation rather than T. This arises because the first r observations will effectively have been lost from the sample in order to obtain the r lags used in the test regression, leaving (T − r) observations from which to estimate the auxiliary regression.
One potential difficulty with Breusch-Godfrey is in determining an appropriate value of r. There is no
obvious answer to this, so it is typical to experiment with a range of values, and also to use the frequency of
the data to decide. For example, if the data is monthly or quarterly, set r equal to 12 or 4, respectively.
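A hedged sketch using statsmodels' implementation of the Breusch-Godfrey test with r = 4 lags (the simulated AR(1) errors are illustrative only):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(7)
T = 250
x = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.5 * eps[t - 1] + rng.normal()
y = 1 + 0.5 * x + eps

res = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=4)
print(lm_stat, lm_pval)   # small p-value => reject the null of no autocorrelation
```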
Now suppose the disturbances follow the AR(1) process
εt = ρεt−1 + νt    (8)
where −1 < ρ < 1 and νt is a random variable that satisfies the CLRM assumptions; namely E[νt|εt−1] = 0, Var(νt|εt−1) = σν², and Cov(νt, νs) = 0 for all t ≠ s. By repeated substitution, we obtain
εt = νt + ρνt−1 + ρ²νt−2 + ρ³νt−3 + ⋯ .
Therefore,
E[εt] = 0,    Var(εt) = σν² + ρ²σν² + ⋯ = σν² / (1 − ρ²).
The stationarity assumption (|ρ| < 1) is necessary to keep this variance finite. OLS assumes no autocorrelation; that is, ρ = 0 in the expression σε² = σν²/(1 − ρ²). Consequently, in the presence of autocorrelation, the estimated variances and standard errors from OLS are underestimated.
The Cochrane-Orcutt procedure works as follows.
(1) Consider the model
yt = β1 + ∑_{i=2}^{K} βi xit + εt,    εt = ρεt−1 + νt.    (9)
Estimate the equation using OLS, ignoring the residual autocorrelation.
(2) Obtain the residuals, and run the regression
ε̂t = ρε̂t−1 + νt .
(3) Obtain ρ̂ and construct yt* = yt − ρ̂yt−1, β1* = (1 − ρ̂)β1, x2t* = x2t − ρ̂x2(t−1), etc., so that the original model can be written as
yt* = β1* + ∑_{i=2}^{K} βi xit* + νt.
(4) Run OLS on this transformed model; the slope estimates carry over directly to the original model, while the intercept is recovered as β̂1 = β̂1*/(1 − ρ̂).
Cochrane and Orcutt [2] argue that better estimates can be obtained by repeating steps (2)-(4) until the
change in ρ̂ between one iteration and the next is less than some fixed amount (e.g. 0.01). In practice, a
small number of iterations (no more than 5) will usually suffice. We also note that assumptions like (9) should be tested before the Cochrane-Orcutt or similar procedure is implemented.
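The steps above translate directly into code. The sketch below is a minimal hand-rolled version of the iteration (the helper cochrane_orcutt and the simulated data are illustrative; statsmodels' GLSAR class offers a comparable iterative fit):

```python
import numpy as np
import statsmodels.api as sm

def cochrane_orcutt(y, X, tol=0.01, max_iter=5):
    """Iterative Cochrane-Orcutt sketch following steps (1)-(4); X is assumed to
    contain a leading column of ones."""
    beta = sm.OLS(y, X).fit().params                             # step (1)
    rho = 0.0
    for _ in range(max_iter):
        resid = y - X @ beta
        rho_new = sm.OLS(resid[1:], resid[:-1]).fit().params[0]  # step (2)
        y_star = y[1:] - rho_new * y[:-1]                        # step (3): quasi-differencing
        X_star = X[1:] - rho_new * X[:-1]                        # constant column becomes (1 - rho),
        beta = sm.OLS(y_star, X_star).fit().params               # step (4): coefficients stay on the original scale
        if abs(rho_new - rho) < tol:
            break
        rho = rho_new
    return beta, rho_new

# Example with AR(1) errors (illustrative data):
rng = np.random.default_rng(8)
T = 300
x = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.7 * eps[t - 1] + rng.normal()
y = 1 + 0.5 * x + eps
print(cochrane_orcutt(y, sm.add_constant(x)))
```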
An alternative to adjusting the estimation procedure is to retain the OLS estimates but compute serial correlation robust (HAC) standard errors, as follows.
(1) Estimate your original model yt = β1 + ∑_{i=2}^{K} βi xit + εt and obtain the residuals ε̂t.
(2) Estimate the auxiliary regression x2t = α1 + ∑_{i=3}^{K} αi xit + rt and retain the residuals r̂t.
(3) Find the intermediate adjustment factor, α̂t = r̂t ε̂t, and decide how much serial correlation (the number of lags) you're going to allow. A Breusch-Godfrey test can be useful in making this determination, while EViews uses INTEGER[4(T/100)^{2/9}].
(4) Obtain the error variance adjustment factor
v̂ = ∑_{t=1}^{T} α̂t² + 2 ∑_{h=1}^{g} [1 − h/(g + 1)] ( ∑_{t=h+1}^{T} α̂t α̂t−h ),
where g represents the number of lags determined in Step (3).
(5) Calculate the serial correlation robust standard error. For variable x2,
se(β̂2)HAC = ( se(β̂2) / σ̂ε )² √v̂.
(6) Repeat Steps (2) through (5) for independent variables x3 through xK .
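These steps implement a Newey-West-type HAC correction, which statsmodels exposes via cov_type='HAC'. A hedged sketch on simulated AR(1) errors, using the lag rule quoted in Step (3):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 300
x = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.6 * eps[t - 1] + rng.normal()
y = 1 + 0.5 * x + eps
X = sm.add_constant(x)

g = int(4 * (T / 100) ** (2 / 9))           # the EViews-style lag choice quoted above
hac_fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": g})
print(hac_fit.bse)                          # serial-correlation robust (HAC) standard errors
```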
6 Violation of Assumptions: Non-Stochastic Regressors
The OLS estimator is consistent and unbiased in the presence of stochastic regressors, provided that the
regressors are not correlated with the error term of the estimated equation. However, if one or more of the
explanatory variables is contemporaneously correlated with the disturbance term, the OLS estimator will
not even be consistent. This results from the estimator assigning explanatory power to the variables when in reality it arises from the correlation between the error term and yt.
b1 = E[ε³] / (σ²)^{3/2}   (the coefficient of skewness).
8.3 Functional form: Ramsey’s RESET
Ramsey's regression specification error test (RESET) is conducted by adding a quartic function of the fitted values of the dependent variable (ŷt², ŷt³, and ŷt⁴) to the original regression and then testing the joint significance of the coefficients for the added variables.
The logic of using a quartic of your fitted values is that they serve as proxies for variables that may have
been omitted – higher order powers of the fitted values of y can capture a variety of non-linear relationships,
since they embody higher order powers and cross-products of the original explanatory variables.
The test consists of the following steps:
1. Estimate the model you want to test for specification error. E.g. yt = β1 + β2 x1t + · · · + βK xKt + εt .
2. Obtain the fitted values ŷt after estimating your model and estimate
yt = α1 + α2 ŷt² + ⋯ + αp ŷt^p + ∑_{i=1}^{K} βi xit + νt.    (10)
3. Test the joint significance of the coefficients on the fitted values of yt terms using an F-statistic, or using the test statistic T R², which is distributed asymptotically as χ²(p − 1) (the value of R² is obtained
from the regression (10)). If the value of the test statistic is greater than the critical value, reject the null
hypothesis that the functional form was correct.
A RESET allows you to identify whether misspecification is a serious problem with your model, but it
doesn’t allow you to determine the source.
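A minimal RESET sketch with powers up to p = 4, using the F-test route of step 3; the quadratic data-generating process below is illustrative and deliberately misspecified by the linear model.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
T = 300
x = rng.normal(size=T)
y = 1 + 0.5 * x + 0.8 * x**2 + rng.normal(size=T)     # true relationship is non-linear

X = sm.add_constant(x)
restricted = sm.OLS(y, X).fit()                       # the (misspecified) linear model

yhat = restricted.fittedvalues
X_aug = np.column_stack([X, yhat**2, yhat**3, yhat**4])
unrestricted = sm.OLS(y, X_aug).fit()                 # regression (10)

f_stat, p_value, df_diff = unrestricted.compare_f_test(restricted)
print(f_stat, p_value)   # small p-value => reject the null of correct functional form
```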
The null hypothesis for the Chow test is structural stability. The test splits the data into two sub-periods, estimates the regression over the whole sample (giving RSS) and over each sub-sample (giving RSS1 and RSS2), and forms the statistic
F = [ (RSS − (RSS1 + RSS2)) / K ] / [ (RSS1 + RSS2) / (T − 2K) ].
The larger the F-statistic, the more evidence you have against structural stability and the more likely the coefficients are to vary from group to group. If the value of the test statistic is greater than the critical value from the F-distribution, which is an F(K, T − 2K), then reject the null hypothesis that the parameters are stable over time.
Note the result of the F -statistic for the Chow test assumes homoskedasticity. A large F -statistic only
informs you that the parameters vary between the groups, but it doesn’t tell you which specific parameter(s)
is (are) the source(s) of the structural break.
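A minimal sketch of the Chow statistic under the stated homoskedasticity assumption (the helper chow_test and the break in the simulated slope are illustrative only):

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

def chow_test(y, X, split):
    """F = [(RSS - (RSS1 + RSS2)) / K] / [(RSS1 + RSS2) / (T - 2K)] ~ F(K, T - 2K)."""
    T, K = X.shape
    rss = sm.OLS(y, X).fit().ssr
    rss1 = sm.OLS(y[:split], X[:split]).fit().ssr
    rss2 = sm.OLS(y[split:], X[split:]).fit().ssr
    f = ((rss - (rss1 + rss2)) / K) / ((rss1 + rss2) / (T - 2 * K))
    return f, 1 - stats.f.cdf(f, K, T - 2 * K)

# Example: the slope shifts half-way through the sample.
rng = np.random.default_rng(11)
T = 200
x = rng.normal(size=T)
slope = np.where(np.arange(T) < T // 2, 0.5, 1.5)
y = 1 + slope * x + rng.normal(size=T)
print(chow_test(y, sm.add_constant(x), split=T // 2))
```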
The predictive failure test estimates the regression over one part of the sample and then uses those coefficient estimates for predicting values of y for the other period. These predictions for y are then implicitly compared with the actual values. The null hypothesis for this test is that the prediction errors for all of the forecasted observations are zero.
To calculate the test:
1. Run the regression for the whole period (the restricted regression) and obtain the RSS.
2. Run the regression for the “large” sub-period and obtain the RSS (called RSS1 ). Note the number
of observations for the long estimation sub-period will be denoted by T1 . The test statistic is given by
(RSS − RSS1)/RSS1 × (T1 − K)/T2
where T2 is the number of observations that the model is attempting to “predict”. The test statistic will follow an F(T2, T1 − K) distribution.
Forward predictive failure tests are where the last few observations are kept back for forecast testing.
Backward predictive failure tests attempt to “back-cast” the first few observations. Both types of test offer
further evidence on the stability of the regression relationship over the whole sample period.
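A sketch of a forward predictive failure test that holds back the last T2 observations (the helper predictive_failure_test follows the statistic above; the simulated data are illustrative only):

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

def predictive_failure_test(y, X, T2):
    """Forward predictive failure test: hold back the last T2 observations."""
    T, K = X.shape
    T1 = T - T2
    rss = sm.OLS(y, X).fit().ssr                 # whole-sample (restricted) regression
    rss1 = sm.OLS(y[:T1], X[:T1]).fit().ssr      # "large" sub-period regression
    f = (rss - rss1) / rss1 * (T1 - K) / T2
    return f, 1 - stats.f.cdf(f, T2, T1 - K)

rng = np.random.default_rng(12)
T = 120
x = rng.normal(size=T)
y = 1 + 0.5 * x + rng.normal(size=T)
print(predictive_failure_test(y, sm.add_constant(x), T2=10))
```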
9 The Generalized Linear Regression Model (GLRM)
The generalized linear regression model is
y = Xβ + ε
E[ε|X] = 0    (11)
E[εε′|X] = σ²Ω = Σ,
where Ω is a positive definite matrix. Recall from Section 1 that under the CLRM assumptions the OLS estimator
β̂ = (X′X)⁻¹X′y = β + (X′X)⁻¹X′ε
is the best linear unbiased estimator (BLUE), consistent and asymptotically normally distributed (CAN), and, if the disturbances are normally distributed, asymptotically efficient among all CAN estimators.
In the GLRM, the OLS estimator remains unbiased, consistent, and asymptotically normally distributed.
It will, however, no longer be efficient and the usual inference procedures based on the F and t distributions
are no longer appropriate.
Theorem 1 (Finite Sample Properties of β̂ in the GLRM). If the regressors and disturbances are
uncorrelated, then the least squares estimator is unbiased in the generalized linear regression model. With
non-stochastic regressors, or conditional on X, the sampling variance of the least squares estimator is
Var[β̂|X] = (σ²/n) (X′X/n)⁻¹ (X′ΩX/n) (X′X/n)⁻¹.
If Var[β̂|X] converges to zero, then β̂ is mean square consistent. With well-behaved regressors, (X′X/n)⁻¹ will converge to a constant matrix. But (σ²/n)(X′ΩX/n) need not converge at all.
Theorem 2 (Consistency of OLS in the GLRM). If Q = p lim(X ′ X/n) and p lim(X ′ ΩX/n) are both
finite positive definite matrices, then β̂ is consistent for β. Under the assumed conditions,
p lim β̂ = β.
The conditions in the above theorem depend on both X and Ω. An alternative formula that separates
the two components can be found in Greene [5, page 194-195].
Theorem 3 (Asymptotic Distribution of β̂ in the GLRM). If the regressors are sufficiently well
behaved and the off-diagonal terms in Ω diminish sufficiently rapidly, then the least squares estimator is
asymptotically normally distributed with covariance matrix
Asy.Var[β̂] = (σ²/n) Q⁻¹ p lim( X′ΩX/n ) Q⁻¹.
The matrices of sums of squares and cross products in the left and right matrices are sample data that are
readily estimable, and the problem is the center matrix that involves the unknown σ²Ω = E[εε′|X]. For estimation purposes, we will assume that tr(Ω) = n, as it is when σ²Ω = σ²I in the CLRM.
Let Σ = (σij)i,j = σ²Ω = σ²(ωij)i,j. What is required is an estimator of the K(K + 1)/2 unknown elements in the matrix
Q* = (1/n) X′ΣX = (1/n) ∑_{i,j=1}^{n} σij x̃i x̃j′,
where x̃i is the column vector formed by the transpose of row i of X (see Greene [5, page 805]). To verify
this formula of Q∗ , recall we have the convention
X = [x1, ⋯, xK],    xi = (xi1, xi2, ⋯, xin)′,    i = 1, ⋯, K.
Consequently, X has rows x̃1′, x̃2′, ⋯, x̃n′, so that
X′ = [x̃1, x̃2, ⋯, x̃n]    and    X′ΣX = ∑_{i,j=1}^{n} σij x̃i x̃j′.
The least squares estimator β̂ is a consistent estimator of β, which implies that the least squares residuals
ε̂i are “pointwise” consistent estimators of their population counterparts εi . The general approach, then,
will be to use X and ε̂ to devise an estimator of Q∗ .
9.2.1 HC estimator
Consider the heteroscedasticity case first. White [9] has shown that under very general conditions, the
estimator
S0 = (1/n) ∑_{i=1}^{n} ε̂i² x̃i x̃i′
has
p lim S0 = p lim Q*.
Therefore, the White heteroscedasticity consistent (HC) estimator is
Est.Asy.Var[β̂] = (1/n) (X′X/n)⁻¹ [ (1/n) ∑_{i=1}^{n} ε̂i² x̃i x̃i′ ] (X′X/n)⁻¹ = n (X′X)⁻¹ S0 (X′X)⁻¹.
Turning to the autocorrelation case, the natural counterpart for estimating Q* would be
Q̂* = (1/n) ∑_{i,j=1}^{n} ε̂i ε̂j x̃i x̃j′.
But there are two problems with this estimator. The first one is that it is difficult to conclude yet that
Q̂∗ will converge to anything at all, since the matrix is 1/n times a sum of n2 terms. We can achieve the
convergence of Q̂∗ by assuming that the rows of X are well behaved and that the correlations diminish with
increasing separation in time.
The second problem is a practical one: Q̂* need not be positive definite. Newey and West [7] have
devised an estimator, the Newey–West autocorrelation consistent (AC) covariance estimator, that
overcomes this difficulty:
Q̂* = S0 + (1/n) ∑_{l=1}^{L} ∑_{t=l+1}^{n} wl ε̂t ε̂t−l (x̃t x̃t−l′ + x̃t−l x̃t′),    wl = 1 − l/(L + 1).
It must be determined in advance how large L is to be. In general, there is little theoretical guidance. Current
practice specifies L ≈ T^{1/4}. Unfortunately, the result is not quite as crisp as that for the heteroscedasticity
consistent estimator.
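The HC and AC formulas above are straightforward to code directly. Below is a minimal NumPy sketch of the Newey-West covariance estimator (the function name and simulated inputs are illustrative; in applied work one would typically rely on a packaged implementation such as statsmodels' HAC option).

```python
import numpy as np

def newey_west_cov(X, resid, L):
    """(1/n) (X'X/n)^{-1} Q*_hat (X'X/n)^{-1}, where Q*_hat adds Bartlett-weighted
    autocovariance terms (weights 1 - l/(L+1)) to White's S0."""
    n, _ = X.shape
    S = (X * resid[:, None] ** 2).T @ X / n                          # S0
    for l in range(1, L + 1):
        w = 1 - l / (L + 1)
        gamma = (X[l:] * (resid[l:] * resid[:-l])[:, None]).T @ X[:-l] / n
        S += w * (gamma + gamma.T)
    XtX_inv = np.linalg.inv(X.T @ X / n)
    return XtX_inv @ S @ XtX_inv / n

# Illustrative use with AR(1) errors and L close to T^{1/4}:
rng = np.random.default_rng(13)
n = 300
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.5 * eps[t - 1] + rng.normal()
y = 1 + 0.5 * x + eps
resid = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
print(np.sqrt(np.diag(newey_west_cov(X, resid, L=int(n ** 0.25)))))
```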
References
[1] Brooks, C. (2008). Introductory Econometrics for Finance, 2nd ed. New York: Cambridge University Press.
[2] Cochrane, D. and Orcutt, G. H. (1949). "Application of Least Squares Regression to Relationships Containing Autocorrelated Error Terms", Journal of the American Statistical Association 44, 32–61.
[3] Dougherty, C. (2007). Introduction to Econometrics, 3rd ed. Oxford University Press.
[4] Fabozzi, F. J. and Francis, J. C. (1980). "Heteroscedasticity in the Single Index Model", Journal of Economics and Business 32, 243–248.
[5] Greene, W. H. (2002). Econometric Analysis, 5th ed. Prentice Hall.
[6] Greene, W. H. (2012). Econometric Analysis, 7th ed. Prentice Hall.
[7] Newey, W. K. and West, K. D. (1987). "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix", Econometrica 55, 703–708.
[8] Pedace, R. (2013). Econometrics for Dummies. Hoboken: John Wiley & Sons.
[9] White, H. (1980). "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity", Econometrica 48, 817–838.
[10] Zeileis, A. (2004). "Econometric Computing with HC and HAC Covariance Matrix Estimators", Journal of Statistical Software 11(10).
[11] Zeng, Y. (2016). "Book Summary: Econometrics for Dummies", version 1.0.5. Unpublished manuscript.