Topic 1
A. Multicollinearity
Existence of a linear relationship among some or all explanatory variables:
(1) Perfect multicollinearity
(2) Imperfect multicollinearity
Multicollinearity: consequences
1. Estimates will still be unbiased.
2. Standard errors of the estimates increase.
3. t-statistics are small.
4. Goodness of fit is not diminished.
5. Estimates vary widely if the model specification is changed.
6. Slope estimates for independent variables without multicollinearity are not seriously affected.
Detection of Multicollinearity
(1) High R2 but few significant t-ratios
(2) High pair-wise correlations among the X's
(3) Variance Inflation Factor: VIF = 1/(1 - R2aux), where R2aux is the R2 from regressing one regressor on the remaining regressors
(4) Tolerance: TOL = 1/VIF
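The VIF and TOL measures above can be sketched with statsmodels; the data below are simulated (my own hypothetical setup, not from the lecture) so that one regressor is nearly a linear function of another:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools import add_constant

# Simulated data: x2 is almost a linear function of x1 (near-perfect collinearity)
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2 * x1 + rng.normal(scale=0.1, size=200)   # highly correlated with x1
x3 = rng.normal(size=200)                        # unrelated regressor

X = add_constant(np.column_stack([x1, x2, x3]))  # design matrix with intercept

# VIF_j = 1 / (1 - R2_aux), from regressing regressor j on the other regressors
vifs = [variance_inflation_factor(X, j) for j in range(1, X.shape[1])]
tols = [1.0 / v for v in vifs]                   # TOL = 1 / VIF
```

The collinear pair (x1, x2) should show very large VIFs, while the unrelated regressor x3 stays near 1.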
Treating multicollinearity
1. Do nothing (if the goal is prediction and forecasts are unaffected).
2. Drop one of the collinear variables.
3. Transform the variables (e.g., first differences or ratios).
4. Increase the sample size.
Example: consumption-income
Assuming consumption expenditure is linearly related to income and wealth, we obtain the following regression:
1. Regression estimates
2. Correlation matrix
B. Heteroscedasticity
Violation of the homoscedasticity assumption of the classical linear regression model:
E(ei²) = σi², i.e. the error variance differs across observations.
Heteroscedasticity: nature
Causes:
1. Error-learning models: learning behavior across time.
2. Model misspecification: omitted variable or improper functional form.
3. Outliers.
4. Skewness in the regressors.
5. Changes in data collection or definitions.
Heteroscedasticity: consequences
The OLS estimators remain unbiased and consistent, but they are no longer efficient, and the usual standard errors are biased, so the t and F tests are unreliable.
Detecting heteroscedasticity
Informal: graphical inspection of the residuals (nature of the problem).
Formal tests: Park test, Glejser test, Breusch-Pagan test, White test.
Heteroscedasticity: remedies
Common remedies include weighted least squares (when the error variances are known or can be modeled), transforming the variables (e.g., logs), and heteroscedasticity-robust (White) standard errors.
Regression model
Graphical analysis
Scatter plots of the squared residuals (RESID_SQ) against the regressors PGNP and FLR, and a plot of the CM residuals, used to look for systematic patterns in the spread of the residuals.
Park Test
Glejser Test
Breusch-Pagan Test
White Heteroscedasticity Test
C. Autocorrelation
The Classical Linear Regression Model (CLRM) assumes that the disturbance term is not autocorrelated, i.e. E(ut us) = 0 for t ≠ s.
The term autocorrelation can be defined as correlation between members of a series of observations ordered in time (as in time-series data) or in space (as in cross-sectional data).
Pure autocorrelation
Pure autocorrelation occurs when the classical assumption of uncorrelated observations of the error term is violated in a correctly specified equation.
It exists in the form of first-order autocorrelation and higher-order autocorrelation.
First-order autocorrelation:
ut = ρ ut-1 + et,  t = 1, ..., T
where -1 < ρ < 1 (ρ: rho) and et is white noise.
Positive Autocorrelation
Negative Autocorrelation
Plots of the residual ut against time and against ut-1 reveal the pattern: a systematic pattern indicates positive or negative autocorrelation, while no pattern in the residuals indicates no autocorrelation.
Higher-order autocorrelation
Impure autocorrelation
By impure autocorrelation we mean serial correlation that is caused by a specification error, such as an omitted variable or an incorrect functional form.
While pure autocorrelation is caused by the underlying distribution of the error term of the true specification of an equation (which cannot be changed by the researcher), impure serial correlation is caused by a specification error that often can be corrected.
Durbin-Watson d Test
Steps involved in the Durbin-Watson test:
1. Run the OLS regression and obtain the residuals ût.
2. Compute the d statistic:
d = Σt=2..n (ût - ût-1)² / Σt=1..n ût²
With the estimate ρ̂ = Σt=2..n ût ût-1 / Σt=1..n ût², we have d ≈ 2(1 - ρ̂).
3. Compare d with the critical values dL and dU for the given sample size and number of regressors, and apply the decision rules.
d lies between 0 and 4; d = 2 implies the residuals are uncorrelated, d near 0 suggests positive autocorrelation, and d near 4 suggests negative autocorrelation.
Durbin-Watson d Test
Assumptions underlying the d statistic:
1. The regression model includes an intercept term.
2. The regressors are nonstochastic (fixed in repeated sampling).
3. The disturbances follow the first-order autoregressive scheme ut = ρ ut-1 + et.
4. The regression does not include lagged values of the dependent variable.
5. There are no missing observations in the data.
The Remedies
Only after the specification of the equation has been reviewed carefully should the possibility of an adjustment for pure serial correlation be considered.
If you conclude that you have pure serial correlation, the appropriate response is to consider the application of Generalized Least Squares or Newey-West standard errors.
In practice
(a) For the presence of unnecessary variables:
o The t test, the F test
(b) For the omission of variable(s):
o Look for model adequacy (R-bar squared, t-ratios, signs of estimated coefficients, DW).
o If found not adequate, this may be attributed to one of the following: omission of a relevant variable, use of a wrong functional form, presence of serial correlation, etc.
Omitted variables
Incorrect functional form
Correlation between X and the disturbance term, which may be caused by measurement error in X, simultaneous-equation considerations, or a combination of lagged y values and serially correlated disturbances.
RESET Test:
Ramsey's RESET test for functional-form misspecification: re-estimate the model with powers of the fitted values added as regressors and test their joint significance.
4. Errors of measurement
If there are errors of measurement in the regressand only, OLS estimators are unbiased as well as consistent, but they are less efficient (estimated variances are now larger than in the case where there are no such errors of measurement).
If there are errors of measurement in the regressors, OLS estimators are biased as well as inconsistent.
6. Non-Normal Errors
The CNLRM assumes that the error term follows the normal distribution.
The central limit theorem (CLT) is invoked to justify the normality assumption.
Because of the CLT, we are able to establish that the OLS estimators are also normally distributed.
As a result, we are able to perform hypothesis testing using t/F tests regardless of the sample size.
The Jarque-Bera test can be used to detect whether the residuals are normally distributed.
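The Jarque-Bera statistic combines the sample skewness and kurtosis of the residuals; a quick sketch with statsmodels on simulated residuals (one normal sample, one deliberately skewed, both my own illustration):

```python
import numpy as np
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(7)
normal_resid = rng.normal(size=1000)        # consistent with normality
skewed_resid = rng.exponential(size=1000)   # clearly non-normal (skewed)

# jarque_bera returns (JB statistic, p-value, skewness, kurtosis)
jb_n, p_n, skew_n, kurt_n = jarque_bera(normal_resid)
jb_s, p_s, skew_s, kurt_s = jarque_bera(skewed_resid)
```

The skewed sample yields a very large JB statistic and a tiny p-value, rejecting normality, while the normal sample does not.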