Econometrics Chapter Five

Chapter Five discusses the violations of classical model assumptions in econometrics, focusing on issues like multicollinearity, autocorrelation, and their implications for parameter estimation and hypothesis testing. It highlights the consequences of these violations, such as inefficient estimators, biased variance estimates, and invalid statistical tests, and presents methods for detecting multicollinearity and autocorrelation, including the Variance Inflation Factor (VIF) and Durbin-Watson tests. The chapter concludes with potential remedial measures to address these issues in regression analysis.


Bule Hora University

College of Business and Economics


Department of Economics
CHAPTER FIVE
Violations of the Assumptions of the Classical Model
By:
Girja Boru

03/17/2025 1
Introduction
• The classical assumptions do not always hold. If one or more of these assumptions are violated (say, in the presence of autocorrelation, heteroscedasticity and/or multicollinearity):
• Estimates of the parameters will not be accurate.
• The OLS estimators may no longer possess the BLUE property.
• Tests of hypotheses using the standard t and F statistics will no longer be valid.
• Conclusions/inferences drawn will be misleading.
• Formal tests are required to identify whether these assumptions are satisfied or not.

2
MULTICOLLINEARITY (MC)

• The assumption of the classical linear regression model (CLRM) is that there is no high multicollinearity among the regressors included in the regression model.
• Multicollinearity means the existence of a perfect (exact) or inexact (near) linear relationship among some or all of the explanatory variables of a regression model.
• If no exact linear relationship exists between any of the explanatory variables, or if all explanatory variables are uncorrelated with each other, we speak of the absence of MC.

3
MULTICOLLINEARITY (MC)
• These are two extreme cases and rarely exist in practice. Of particular interest are the cases in between: a moderate to high degree of MC.
• MC is common in macroeconomic time series data (such as GNP, money supply, income, etc.) since economic variables tend to move together.
• Consequences of Perfect MC
• We say that there is perfect MC if two or more explanatory variables are perfectly correlated, that is, if an exact linear relationship of the form λ₂X₂ᵢ + λ₃X₃ᵢ + ... + λₖXₖᵢ = 0 (with not all λ's equal to zero) holds between the explanatory variables.
• One consequence of perfect MC is the non-identifiability of the regression coefficient vector β: the values of certain parameters in the regression model cannot be uniquely determined.
• Another consequence of perfect MC is that we cannot estimate the regression coefficients.

4
Illustration:
• Consider the model in deviation form (k = 3):
yᵢ = β₂x₂ᵢ + β₃x₃ᵢ + εᵢ ......(*)
• Suppose the regressors are perfectly collinear, say x₂ᵢ = 5x₃ᵢ for all i.
• We have seen earlier that the OLS estimator of β₂ is:
β̂₂ = [ (Σx₂y)(Σx₃²) − (Σx₃y)(Σx₂x₃) ] / [ (Σx₂²)(Σx₃²) − (Σx₂x₃)² ]
Since x₂ = 5x₃, we can replace x₂ by 5x₃:
β̂₂ = [ 5(Σx₃y)(Σx₃²) − (Σx₃y)(5Σx₃²) ] / [ 25(Σx₃²)² − (5Σx₃²)² ] = 0/0
Thus β̂₂ is indeterminate. It can also be shown that β̂₃ is indeterminate. Therefore, in the presence of perfect MC, the regression coefficients cannot be estimated. (A numerical illustration follows below.)

5
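The point can also be seen numerically: under perfect collinearity the cross-product matrix X'X is singular, so the normal equations have no unique solution. Below is a minimal Python/numpy sketch (an illustration only, not part of the chapter's SPSS material; the data are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x3 = rng.normal(size=n)
x2 = 5 * x3                                # perfect collinearity: x2 = 5*x3
y = 2 * x2 + 3 * x3 + rng.normal(size=n)   # arbitrary illustrative model

X = np.column_stack([x2, x3])              # deviation-form model, no intercept
XtX = X.T @ X

# det(X'X) is (numerically) zero and the rank is 1 instead of 2,
# so beta cannot be determined uniquely.
print("det(X'X)  =", np.linalg.det(XtX))
print("rank(X'X) =", np.linalg.matrix_rank(XtX))
```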
Cont….
• Major implications of a high degree of MC:
1.OLS coefficient estimates are still unbiased.
2. OLS coefficient estimates will have large variances (or the variance
will be inflated).
3.There is high probability of accepting the null hypothesis of zero
coefficients (using the t test) when in fact the coefficient is significantly
different from zero.
4. The regression model may do well, that is, R² may be quite high.
5. The OLS estimates and their standard errors may be quite sensitive
to small changes in the data.

6
Method of Detecting MC
• MC almost always exists in most applications.
• So the question is not whether it is present or not; it is a question of degree!
• Also, MC is not a statistical problem; it is a data (sample) problem.
• Therefore, we do not 'test for MC'; rather, we measure its degree in any particular sample (using some rules of thumb).
Some methods of detecting MC are:
1. High R² but few (or no) significant t-ratios.
2. High pair-wise correlations among the regressors. Note that this is a sufficient condition but not a necessary one; that is, small pair-wise correlations for all pairs of regressors do not guarantee the absence of MC.
3. Variance Inflation Factor (VIF).

7
CONT….

Consider the regression model
Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + ... + βₖXₖᵢ + εᵢ ......(*)
The VIF of β̂ⱼ is defined as:
VIF(β̂ⱼ) = 1 / (1 − Rⱼ²),  j = 2, 3, ..., k
where Rⱼ² is the coefficient of determination obtained when the variable Xⱼ is regressed on the remaining explanatory variables (the so-called auxiliary regression). For example, the VIF of β̂₂ is defined as:
VIF(β̂₂) = 1 / (1 − R₂²)
where R₂² is the coefficient of determination of the auxiliary regression:
X₂ᵢ = α₁ + α₃X₃ᵢ + ... + αₖXₖᵢ + vᵢ
8
Cont….
Rule of Thumb
A) If VIF(β̂ⱼ) exceeds 10, then β̂ⱼ is poorly estimated because of MC (or the j-th regressor Xⱼ is responsible for MC).
B) (Klein's rule) MC is troublesome if any Rⱼ² exceeds the overall R² (the coefficient of determination of the regression equation).
Example: Consider the data on imports (Y), GDP (X₂), stock formation (X₃) and consumption (X₄) for the years 1949-1967.
• The auxiliary regression of GDP (X₂) on stock formation (X₃) and consumption (X₄) is:
X₂ᵢ = α₁ + α₃X₃ᵢ + α₄X₄ᵢ + vᵢ
Using SPSS, R₂² = 0.998203. The VIF of β̂₂ is thus:
VIF(β̂₂) = 1 / (1 − R₂²) = 1 / (1 − 0.998203) ≈ 556.58
(A computational sketch of the VIF follows below.)
9
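A rough computational sketch of the VIF idea in Python/numpy (the chapter's calculations were done in SPSS; the commented usage line with gdp, stock and cons is hypothetical):

```python
import numpy as np

def vif(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X, where R_j^2
    comes from the auxiliary regression of X_j on the remaining columns."""
    n, k = X.shape
    out = []
    for j in range(k):
        y_aux = X[:, j]
        X_aux = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(X_aux, y_aux, rcond=None)
        resid = y_aux - X_aux @ beta
        tss = (y_aux - y_aux.mean()) @ (y_aux - y_aux.mean())
        r2 = 1 - (resid @ resid) / tss
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Hypothetical usage with columns [GDP, stock formation, consumption]:
# print(vif(np.column_stack([gdp, stock, cons])))   # values above 10 signal MC
```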
Cont….
• Since this figure far exceeds 10, we can conclude that the coefficient of GDP is poorly estimated because of MC (or that GDP is responsible for MC).
• Remedial Measures
1. Include additional observations.
2. Drop a variable.
In our example, we expect GDP to have an impact on imports; dropping one of the highly collinear regressors can reduce MC, but at the risk of omitting a relevant variable.
3. A priori information
If we have a priori information relating the coefficients (for example, that β₂ is a known multiple of β₃), it can be imposed. But such information is rarely available.
10
Autocorrelation

• It occurs when the classical assumption that cov( i ,  j ) 0 fori  j


is violated.
• The assumption tells as that the error term at time t is not correlated
with the error term at any other point of time.
• It occurs most frequently when estimating models using time series
data and hence also called serial correlation.

11
Autocorrelation
• In general, there are many conditions under which the errors are autocorrelated (AC). In such cases we have:
Cov(εᵢ, εⱼ) = E(εᵢεⱼ) ≠ 0 for i ≠ j
• In order to see the consequences of AC, we have to specify the nature (mathematical form) of the AC.
• Usually we assume that the errors (disturbances) follow the first-order autoregressive scheme, abbreviated AR(1).

12
Autocorrelation
• Autoregressive model of order one (the AR(1) model)
• The error process
εₜ = ρεₜ₋₁ + uₜ
where E(uₜ) = 0, var(uₜ) = E(uₜ²) = σᵤ² and E(uₜuₛ) = 0 for t ≠ s
(that is, the uₜ's satisfy all assumptions of the CLRM) is called an autoregressive process of order one (an AR(1) process). A small simulation of this process is sketched below.
• Thus, ρ is nothing but the coefficient of correlation between εₜ and εₜ₋₁.
• If the errors are autocorrelated, and yet we persist in using OLS, then the variances of the regression coefficients will be under-estimated, leading to narrower confidence intervals, high values of R² and inflated t-ratios.
13
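As a quick illustration of the AR(1) scheme above, here is a minimal Python/numpy sketch that simulates such an error process; the values ρ = 0.8 and T = 100 are arbitrary choices for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
T, rho, sigma_u = 100, 0.8, 1.0          # illustrative values

u = rng.normal(scale=sigma_u, size=T)    # white-noise innovations u_t
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + u[t]     # AR(1): eps_t = rho*eps_{t-1} + u_t

# Successive errors are correlated: this sample correlation is close to rho.
print(np.corrcoef(eps[1:], eps[:-1])[0, 1])
```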
Autocorrelation

• Implications of AC
1. OLS estimators are still unbiased.
2. OLS estimators are consistent, i.e., their variance approaches zero as the sample size gets larger.
3. OLS estimators are no longer efficient.
4. The estimated variances of the OLS estimators are biased, and, as a consequence, the conventional confidence intervals and tests of significance are not valid.

14
Autocorrelation
• Tests for the presence of AC:
1. Graphical method
• Plot the estimated residuals ε̂ₜ = yₜ − ŷₜ against time. If we see a clustering of neighboring residuals on one or the other side of the line ε̂ = 0, then such clustering is a sign that the errors are autocorrelated.
2. Durbin-Watson (DW) test
The DW test statistic is computed as (a computational sketch is given below):
d = Σₜ₌₂ᵀ (ε̂ₜ − ε̂ₜ₋₁)² / Σₜ₌₁ᵀ ε̂ₜ²

15
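A minimal Python sketch of the DW statistic, computed directly from this formula (the chapter's own computations are done in SPSS):

```python
import numpy as np

def durbin_watson(resid):
    """d = sum_{t=2..T} (e_t - e_{t-1})^2 / sum_{t=1..T} e_t^2."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Usage (with `resid` holding the OLS residuals): d near 2 suggests no AR(1)
# correlation, d well below 2 suggests positive autocorrelation.
# print(durbin_watson(resid))
```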
Autocorrelation

• To test of H o :  0 versus H A :  0 , we can use the Durbin-


Watson lower (dL) and upper (du) bounds (critical values).
Decision rule:
Reject H o if d  d L
Do not reject H o if d  d L
The test is inclusive if d L  d  d u

Some advantage of DW test


• It is used only for first-order serial correlation
• It contain inclusive region
16
Autocorrelation

17
Autocorrelation
• Limitations of the DW test
a) There are certain regions where the test is inconclusive.
b) The test is valid only when there is an intercept term in the model.
c) The test is invalid when lagged values of the dependent variable appear as regressors.
d) The test is valid for the AR(1) error scheme only.
3. Breusch-Godfrey (BG) test
• Suppose we estimate the model
yₜ = α₀ + α₁x₁ₜ + α₂x₂ₜ + ... + αₙxₙₜ + εₜ
• Assume the error term has the following lag structure:
εₜ = ρ₁εₜ₋₁ + ρ₂εₜ₋₂ + ρ₃εₜ₋₃ + ... + ρₚεₜ₋ₚ + vₜ
18
Autocorrelation

• If there is no autocorrelation problem, it is naturally expected that the coefficients


of the lagged error terms should be zero. Hence, the null is:
H 0 : 1  2 3 ...  n 0

The procedure is:


ˆt
• Predict the residual
R2
ˆ•t 
Run
 0 the
1 xauxiliary regression andˆ obtainˆ
1t   2 x2 t  ...   n xnt  1 t  1   2 t  2  3 t  3  ...   n  t  n  vt
ˆ ˆ
R2
 the sample size T is large , BG have shown that (T-p) follows
• If
2
the Chi-square
( 2
(T  p ) R with p degree of freedom.
)distribution  2

• If exceeds the tabulated value from the distribution with p


Degree of freedom we reject the null hypothesis of No AC
• If tests prove that there is a true autocorrelation problem, we may use other techniques
such as GLS (Generalized Least Square).
19
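A rough Python/numpy sketch of the BG procedure described above (pre-sample residuals are set to zero, a common convention; the 5% critical value comes from scipy):

```python
import numpy as np
from scipy import stats

def breusch_godfrey(X, resid, p=1):
    """Regress the OLS residuals on the original regressors plus p lagged
    residuals; (T - p) * R^2 is chi-square(p) under 'no autocorrelation'."""
    resid = np.asarray(resid, dtype=float)
    T = len(resid)
    lags = np.column_stack([np.r_[np.zeros(i), resid[:-i]] for i in range(1, p + 1)])
    Z = np.column_stack([np.ones(T), X, lags])          # auxiliary regressors
    g, *_ = np.linalg.lstsq(Z, resid, rcond=None)
    e = resid - Z @ g
    r2 = 1 - (e @ e) / ((resid - resid.mean()) @ (resid - resid.mean()))
    stat = (T - p) * r2
    return stat, stats.chi2.ppf(0.95, df=p)             # statistic, 5% critical value

# Reject "no autocorrelation" when the statistic exceeds the critical value.
```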
Autocorrelation
Advantages of the BG test
a) The test is always conclusive.
b) The test is valid when lagged values of the dependent variable appear as regressors.
c) The test is valid for higher-order AR schemes (not just for the AR(1) error scheme).
4. Test based on the partial autocorrelation function (PACF) of the OLS residuals.
Plot the PACF of the OLS residuals (see the plotting sketch below). If the function at lag one is outside the 95% upper and lower confidence limits, then this is an indication that the errors follow the AR(1) process. Higher-order processes can be detected similarly.

20
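A small plotting sketch using statsmodels' plot_pacf; the AR(1) series generated here merely stands in for the OLS residuals so that the snippet runs on its own:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf

# Stand-in "residuals": an AR(1) series simulated only to make the example self-contained.
rng = np.random.default_rng(0)
u = rng.normal(size=100)
residuals = np.zeros(100)
for t in range(1, 100):
    residuals[t] = 0.8 * residuals[t - 1] + u[t]

# A spike outside the 95% bands at lag 1 (and only at lag 1) suggests AR(1) errors.
plot_pacf(residuals, lags=10, method="ywm")
plt.show()
```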
Autocorrelation
Correcting for error AC (of the AR(1) scheme)
Consider the model
Yₜ = α + βXₜ + εₜ, t = 1, 2, ..., T ......(*)
where the errors are generated according to the AR(1) scheme:
εₜ = ρεₜ₋₁ + uₜ, |ρ| < 1
Here uₜ satisfies all the assumptions of the CLRM, that is, E(uₜ) = 0, var(uₜ) = E(uₜ²) = σᵤ² and E(uₜuₛ) = 0 for t ≠ s.
Suppose by applying any one of the above tests you come to the conclusion that the errors are autocorrelated. What to do next?
21
Autocorrelation
• Lagging equation (*) by one period and multiplying throughout by ρ, we get:
ρYₜ₋₁ = ρα + ρβXₜ₋₁ + ρεₜ₋₁ ......(**)
Subtracting equation (**) from equation (*), we get:
Yₜ − ρYₜ₋₁ = α(1 − ρ) + β(Xₜ − ρXₜ₋₁) + (εₜ − ρεₜ₋₁)
i.e., Yₜ* = α* + βXₜ* + uₜ ......(***)
The above transformation is known as the Cochrane-Orcutt transformation. Since uₜ = εₜ − ρεₜ₋₁ fulfils all the assumptions of the CLRM, we can apply OLS to equation (***) to get estimators which are BLUE.
Problem: the above transformation requires knowledge of the value of ρ. Thus we need to estimate it.
22
Autocorrelation
• Methods of estimating ρ
a) Using the Durbin-Watson statistic.
It can be shown that, as T (the sample size) gets larger, the DW statistic d approaches 2(1 − ρ), i.e., d → 2(1 − ρ) as T → ∞. Thus, we can use this fact to construct an estimator of ρ as:
ρ̂ = 1 − d/2
Note: This estimator is highly inaccurate if the sample size is small.
b) From the OLS residuals
Regress the OLS residuals ε̂ₜ on ε̂ₜ₋₁ without a constant term:
ε̂ₜ = ρε̂ₜ₋₁ + uₜ
An estimate of ρ is the estimated coefficient of ε̂ₜ₋₁. (A sketch combining this estimate with the transformation is given below.)
23
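A minimal Python sketch of method (b) combined with one round of the Cochrane-Orcutt transformation for the simple model Yₜ = α + βXₜ + εₜ (an illustration, not the chapter's SPSS procedure):

```python
import numpy as np

def cochrane_orcutt(y, x):
    """One round of Cochrane-Orcutt for y_t = alpha + beta*x_t + eps_t with AR(1) errors."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    T = len(y)
    X = np.column_stack([np.ones(T), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b                                   # OLS residuals

    # Estimate rho by regressing e_t on e_{t-1} without a constant term.
    rho = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])

    # Quasi-difference the data: y*_t = y_t - rho*y_{t-1}, likewise for x.
    y_star = y[1:] - rho * y[:-1]
    x_star = x[1:] - rho * x[:-1]
    Xs = np.column_stack([np.ones(T - 1), x_star])
    b_star, *_ = np.linalg.lstsq(Xs, y_star, rcond=None)

    alpha_hat = b_star[0] / (1 - rho)               # recover alpha from alpha* = alpha(1 - rho)
    return rho, alpha_hat, b_star[1]                # rho, alpha, beta
```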
Autocorrelation
c) Durbin's method
Run the regression of Yₜ on Yₜ₋₁, Xₜ and Xₜ₋₁:
Yₜ = α* + ρYₜ₋₁ + βXₜ + γXₜ₋₁ + uₜ
An estimate of ρ is the estimated coefficient of Yₜ₋₁.

Illustrative Example
The following data are on investment and the value of outstanding shares for the years 1935-1953.

24
Autocorrelation
Year    Investment (Y)    Value of outstanding shares (X)
1935 317.6 3078.5
1936 391.8 4661.7
1937 410.6 5387.1
1938 257.7 2792.2
1939 330.8 4313.2
1940 461.2 4643.9
1941 512 4551.2
1942 448 3244.1
1943 499.6 4053.7
1944 547.5 4379.3
1945 561.2 4840.9
1946 688.1 4900.9
1947 568.9 3526.5
1948 529.2 3254.7
1949 555.1 3700.2
1950 642.9 3755.6
1951 755.9 4833
1952 891.2 4924.9
1953 1304.4 6241.7
25
Autocorrelation
• The estimated regression equation of Y on X is given by the SPSS output (not reproduced here). The F-statistic is significant at the 1% level. This indicates that the model is adequate.

26
Autocorrelation
• AC diagnostics
• Although the model passes the ANOVA test, we plot the estimated residuals against time and look for signs of model misspecification.
• The graph (scatter plot) of the estimated disturbances (residuals) is shown below. We can see a clustering of neighboring residuals on one or the other side of the line ε̂ₜ = 0.
• This might be a sign that the errors are autocorrelated. However, we do not make a final judgment until we apply formal tests of autocorrelation.

27
Autocorrelation

28
Autocorrelation
• The DW test statistic is equal to d = 0.553. At the 5% level of significance, the Durbin-Watson critical values (for T = 19) are d_L = 1.180 and d_U = 1.401. Since the test statistic is less than d_L, we reject H₀: ρ = 0.
• We can also test for error AC using the Breusch-Godfrey (BG) test.
• We first apply OLS and obtain the residuals ε̂ₜ.
• We then run the following auxiliary regression and obtain its R²:
ε̂ₜ = α + βXₜ + ρ₁ε̂ₜ₋₁ + vₜ
The SPSS output is shown below:
29
Autocorrelation

• Model Summary
R = .835, R Square = .698, Adjusted R Square = .658, Std. Error of the Estimate = 137.99705
a. Predictors: (Constant), lagged residual, valsh

The BG test statistic is: (T − p)R² = (19 − 1)(0.698) = 12.564. The tabulated value from the χ² distribution with p = 1 degree of freedom at the 5% level of significance is 3.841. Since the calculated test statistic exceeds the critical value, we reject the null hypothesis H₀: ρ₁ = 0 and conclude that there is error AC.

30
Autocorrelation
• Another method of detecting error AC is to use the partial autocorrelation function (PACF) of the residuals, shown below. We can see that the PACF at lag one is outside the confidence limits. This is an indication that the errors follow the AR(1) process.

31
Autocorrelation
• All the tests indicate that there is error AC. Thus, we need to apply the Cochrane-Orcutt transformation.
• To obtain an estimate ρ̂ of ρ, we regress the OLS residuals ε̂ₜ on ε̂ₜ₋₁ without a constant term.
• This gives the following result (output not reproduced): the estimate of ρ is ρ̂ = 0.805.
32
Autocorrelation
• The Cochrane-Orcutt transformation
• We apply the following (Cochrane-Orcutt) transformation:
Yₜ − ρ̂Yₜ₋₁ = α(1 − ρ̂) + β(Xₜ − ρ̂Xₜ₋₁) + (εₜ − ρ̂εₜ₋₁)
i.e., Yₜ* = α* + βXₜ* + uₜ ......(***)
Note that equation (***) fulfils all the basic assumptions and, thus, we can estimate the parameters in this equation by an OLS procedure. Using ρ̂ = 0.805, we obtain Yₜ* (traninv) and Xₜ* (tranvalsh) and estimate the regression of Yₜ* on Xₜ*.
33
Autocorrelation
• The results are (SPSS output):

34
Autocorrelation
• The partial autocorrelation function of the residuals in the transformed model is shown below. It can be seen that the function lies within the upper and lower confidence limits, indicating that the autocorrelation structure has been properly dealt with.

35
Heteroscedasticity
• The homoscedasticity assumption says that the error variance remains constant across all observations.
• But there are many situations in which this assumption may not hold. For example, the variance of the error term may increase or decrease with the dependent variable or with one of the independent variables. Under such circumstances, we have the case of heteroscedasticity.
• Heteroscedasticity often occurs in cross-sectional data.
• Under heteroscedasticity, the OLS estimators of the regression coefficients are not BLUE and not efficient. Generally, under heteroscedasticity we have the following:
1. The OLS estimators of the regression coefficients are still unbiased and consistent.
2. The estimated variances of the OLS estimators are biased, and the conventionally calculated confidence intervals and tests of significance are invalid.
36
Heteroscedasticity
Tests of Heteroscedasticity
1. Inspection (graphical method)
• Testing for heteroscedasticity is generally the rule rather than the exception when working with cross-sectional data.
• The graph of the squared residuals against the dependent variable gives a rough indication of the existence of heteroscedasticity.
• If there is a systematic trend in the graph, it may indicate the presence of heteroscedasticity (Figures b, c, d and e); Figure (a) shows no systematic trend.

37
Heteroscedasticity

38
Heteroscedasticity

However, there are formal ways of detecting heteroscedasticity

Formal tests:

a) White’s general test


b) Goldfeld-Quandt test
c) Breusch-Pagan-Godfrey test (BP)

39
Heteroscedasticity
• White's test
This test involves applying OLS to:
ε̂ᵢ² = α₀ + α₁Z₁ᵢ + α₂Z₂ᵢ + ... + αₚZₚᵢ + uᵢ
and calculating its coefficient of determination R²_w, where the ε̂ᵢ are the OLS residuals from the original model and the Z's are typically the regressors, their squares and their cross-products. The null hypothesis is:
H₀: α₁ = α₂ = ... = αₚ = 0
The test statistic is:
χ²_cal = nR²_w
40
Heteroscedasticity
• Decision rule: Reject H₀ (the hypothesis of homoscedasticity) if the above test statistic exceeds the critical value from the chi-square distribution with p degrees of freedom for a given level of significance α, i.e., if
χ²_cal > χ²_α(p)
(A computational sketch follows below.)
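A rough Python sketch of White's test for the single-regressor case used later in the chapter, regressing the squared residuals on a constant, X and X² (the chapter's own computations are done in SPSS):

```python
import numpy as np
from scipy import stats

def white_test(x, resid):
    """Regress squared residuals on [1, x, x^2]; n*R^2 ~ chi-square(2) under
    the null of homoscedasticity."""
    x, resid = np.asarray(x, float), np.asarray(resid, float)
    n = len(resid)
    e2 = resid ** 2
    Z = np.column_stack([np.ones(n), x, x ** 2])
    g, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    u = e2 - Z @ g
    r2 = 1 - (u @ u) / ((e2 - e2.mean()) @ (e2 - e2.mean()))
    return n * r2, stats.chi2.ppf(0.95, df=2)    # statistic, 5% critical value

# Reject homoscedasticity when the statistic exceeds the critical value.
```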

Goldfeld-Quandt test
• Suppose we have a model with one explanatory variable X and let Y be the dependent variable. The steps in this test are the following (a computational sketch is given after the decision rule below):
(a) Order the observations by the magnitude of X and arrange them into three parts: n₁ observations in the first part, p observations in the middle part, and n₂ observations in the second part (n₁ + n₂ + p = n). Usually p is taken to be about one-sixth of n.
(b) Drop the middle p observations.
41
Heteroscedasticity
(c) Run a regression on the first n₁ observations, obtain the residuals ε̂₁ᵢ, and calculate the residual variance
s₁² = Σ ε̂₁ᵢ² / (n₁ − 2)
Similarly, run a regression on the last n₂ observations, obtain the residuals ε̂₂ᵢ, and calculate
s₂² = Σ ε̂₂ᵢ² / (n₂ − 2)
(d) Calculate the test statistic:
F_cal = s₂² / s₁²
42
Heteroscedasticity

• Decision rule: Reject the null hypothesis Ho :   2


2 2
1 ( and
conclude that the errors are heteroscedastic) if:
Fcal  F (n1  2, n2  2)
where Fcal  F (n1  2, n2  2) is the critical value from the F  distribution
with n1  2 and n2  2 degree of freedom.

43
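A minimal Python sketch of the Goldfeld-Quandt steps for a single regressor (the fraction of middle observations dropped is a parameter; the chapter's example drops 4 of 20):

```python
import numpy as np
from scipy import stats

def goldfeld_quandt(x, y, drop_frac=1/6):
    """Sort by x, drop the middle observations, fit separate regressions on
    the two tails and compare their residual variances with an F test."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    order = np.argsort(x)
    x, y = x[order], y[order]
    n = len(y)
    p = int(round(n * drop_frac))                  # middle observations to drop
    n1 = (n - p) // 2
    n2 = n - p - n1

    def resid_var(xs, ys):
        X = np.column_stack([np.ones(len(xs)), xs])
        b, *_ = np.linalg.lstsq(X, ys, rcond=None)
        e = ys - X @ b
        return (e @ e) / (len(ys) - 2)

    s1 = resid_var(x[:n1], y[:n1])                 # low-x part
    s2 = resid_var(x[-n2:], y[-n2:])               # high-x part
    f = s2 / s1
    return f, stats.f.ppf(0.95, n2 - 2, n1 - 2)    # statistic, 5% critical value

# Reject homoscedasticity when F exceeds the critical value.
```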
Heteroscedasticity
• Breusch-Pagan test
• This involves applying OLS to:
ε̂ᵢ² / σ̂² = α₀ + α₁X₁ᵢ + α₂X₂ᵢ + ... + αₖXₖᵢ + uᵢ
and calculating the regression (explained) sum of squares (RSS) of this auxiliary regression. The test statistic is:
χ²_cal = RSS / 2
Decision rule: Reject H₀ (the null hypothesis of homoscedasticity, α₁ = α₂ = ... = αₖ = 0) if:
χ²_cal > χ²_α(k)
where χ²_α(k) is the critical value from the chi-square distribution with k degrees of freedom for a given value of α. (A computational sketch follows below.)
44
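A rough Python sketch of the Breusch-Pagan statistic as defined above (note that "RSS" here means the explained, or regression, sum of squares of the auxiliary regression):

```python
import numpy as np
from scipy import stats

def breusch_pagan(X, resid):
    """Regress e_i^2 / sigma_hat^2 on the regressors; ESS/2 ~ chi-square(k)
    under the null of homoscedasticity."""
    resid = np.asarray(resid, dtype=float)
    n = len(resid)
    sigma2 = (resid @ resid) / n
    g = resid ** 2 / sigma2                        # scaled squared residuals
    Z = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(Z, g, rcond=None)
    fitted = Z @ b
    ess = np.sum((fitted - g.mean()) ** 2)         # explained (regression) sum of squares
    k = Z.shape[1] - 1                             # regressors excluding the constant
    return ess / 2, stats.chi2.ppf(0.95, df=k)     # statistic, 5% critical value

# Reject homoscedasticity when ESS/2 exceeds the critical value.
```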
Heteroscedasticity

Correction for heteroscedasticity


• If we are sure that there is a genuine heteroscedasticity problem, we
can deal with the problem using:
— Heteroscedasticity-robust statistics after estimation by OLS.
— Weighted least squares (WLS) estimation.

45
Heteroscedasticity
Illustrative examples: Consider the following data on consumption
expenditure (Y) and income (X) for 20 households ( both in thousands of
Dollars).

46
Cont….
• Applying OLS, we get the following results (SPSS output, not reproduced here):
R² = 0.989, F = 1262.637 (p-value < 0.001)
47



Heteroscedasticity
• A plot of the residuals ε̂ᵢ against the values of the explanatory variable Xᵢ is shown below.

48
Heteroscedasticity
• It can clearly be seen that the scatter of the residuals (i.e., the variance of the residuals) increases with Xᵢ. This is an indication of a heteroscedasticity problem. However, we should not come to a conclusion until we apply formal tests of the hypothesis of homoscedasticity.
1. Goldfeld-Quandt test
In order to apply this test, we first order the observations based on the magnitude of the explanatory variable X. We divide the data into three parts: n₁ = 8, p = 4 and n₂ = 8.
To increase the power of the test we drop the middle p = 4 observations. We then run a separate regression on the first and the second parts and calculate the residual variance for each of the two parts.
49
Heteroscedasticity
Therefore,
s₁² = 0.316 and s₂² = 3.383
Calculate the Goldfeld-Quandt test statistic as:
F_cal = s₂² / s₁² = 3.383 / 0.316 = 10.706
For α = 0.05, F_α(n₁ − 2, n₂ − 2) = F₀.₀₅(6, 6) = 4.28
(see the statistical tables in Gujarati (2009), Basic Econometrics, p. 880)
H₀: homoscedasticity; Hₐ: heteroscedasticity
Decision: Since F_cal = 10.706 is greater than the tabulated value, we reject the null hypothesis of homoscedasticity at the 5% significance level.
50
Heteroscedasticity
2. Breusch-Pagan test
This involves applying OLS to:
ε̂ᵢ² / σ̂² = α₀ + α₁Xᵢ + uᵢ, where σ̂² = 1.726,
and computing the regression (explained) sum of squares (RSS). The OLS result indicates that RSS = 12.132. The Breusch-Pagan test statistic is then:
χ²_cal = RSS / 2 = 12.132 / 2 = 6.066
For α = 0.05, the critical value is χ²₀.₀₅(1) = 3.841.
Decision rule: Reject H₀ (the null hypothesis of homoscedasticity, α₁ = 0) if χ²_cal > χ²_α(k).
51
Heteroscedasticity

•OR
 2
• Compute Variable …..square of residual (sqresid)…. (  )
In numeric expression
RES_1* RES_1
Analyze or estimate
 2
 o  1 X i  U

Go to ANOVA table
If P-value > 5%........... Homoscedasticity
If P-value < 5%............Hetroscedasticity
52
Heteroscedasticity
• Decision: We reject the null hypothesis of homoscedasticity at the 5% significance level; the errors are heteroscedastic.

53
Heteroscedasticity

3. The White test


 2
This involves applying OLS to:   0   X   X  Ui
i 1 i 2 i
2

2
And computing the coefficient of determination w .This yields w 0.878
R 2
R
The White test statistics is:
 cal 2 nR 2 w 20(0.878) 17.56
2

We compare this value with  ( p) for a given level of significance α.
For α=0.05,  2 0.05 (2) .5.991
Decision: Since 17.56 is greater than tabulated value, we reject the
cal
2

null hypothesis of homoscedasticity at the 5% level of significance.

54
Heteroscedasticity
• Weighted Least Squares (WLS)
• All of the tests indicate that the disturbances are heteroscedastic. Thus, the regression coefficients obtained by OLS are not efficient.
• In such cases, we have to apply Weighted Least Squares (WLS). One method of correcting for heteroscedasticity is based on the assumption that the variance of the disturbances is positively associated with the level of income X, that is,
σᵢ² = σ²Xᵢ²
The model we are going to estimate is then obtained by dividing through by Xᵢ:
Yᵢ / Xᵢ = α(1/Xᵢ) + β + εᵢ/Xᵢ
i.e., Yᵢ* = β + αXᵢ* + εᵢ*
55
Heteroscedasticity
• This simply means that we apply OLS by regressing Yᵢ/Xᵢ on 1/Xᵢ (a sketch of this transformation is given after this slide). The SPSS output is shown below.
• The model is adequate as judged by the F-test at the 5% level of significance. The estimated model is shown below.

56
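A minimal Python sketch of this weighted least squares step under the assumption σᵢ² = σ²Xᵢ², i.e., OLS on the transformed data Yᵢ/Xᵢ and 1/Xᵢ (an illustration of the transformation, not the chapter's SPSS run):

```python
import numpy as np

def wls_divide_by_x(x, y):
    """OLS of y/x on a constant and 1/x. In the transformed model the constant
    estimates beta (the original slope) and the coefficient on 1/x estimates
    alpha (the original intercept)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    y_star = y / x
    x_star = 1.0 / x
    Z = np.column_stack([np.ones(len(x)), x_star])
    b, *_ = np.linalg.lstsq(Z, y_star, rcond=None)
    beta_hat, alpha_hat = b[0], b[1]
    return alpha_hat, beta_hat
```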
Heteroscedasticity

Note that the estimated constant term and slope from the transformed model correspond to the estimates of β and α respectively. Thus the estimated model is:
Ŷᵢ = 0.612 + 0.910Xᵢ
(2.297) (52.624)
• A plot of the residuals from the transformed model is shown below. The plot does not indicate any increasing or decreasing pattern in the scatter of the residuals.

57
Heteroscedasticity

58
Classical Normal Regression Model: Testing Normality

59
The End
60
