Regression Problems (Practical)

This document discusses problems that can arise in regression analysis due to violations of assumptions. It covers multicollinearity, which occurs when independent variables are highly linearly related. When multicollinearity is present, it can cause regression coefficients to be poorly estimated with large variances and low precision. The document discusses ways to detect multicollinearity, such as by examining correlation coefficients between variables and calculating variance inflation factors. Remedies discussed include dropping collinear variables or increasing the sample size.

Uploaded by

APPIAH ELIJAH

11/21/2022

PROBLEMS IN REGRESSION WITH R

HELLO!

I am Elijah Appiah from Ghana. I am an Economist by profession. I love everything about R!

secret behind the smile!

You can reach me: [email protected]

Presentations on Inference

Lesson Goals
1. Understand the concepts of:
◦Multicollinearity
◦Heteroscedasticity
◦Autocorrelation

Assumptions of CLRM
Assumption 1: The regression model is linear in the parameters.
Assumption 2: The values of the regressors, X’s, are fixed, or are independent of the error term.
Assumption 3: The mean value of the error term is zero.
Assumption 4: The variance of 𝒖 is constant or homoscedastic.
Assumption 5: There is no autocorrelation between disturbances.
Assumption 6: The number of observations must be greater than the number of parameters to be estimated.
Assumption 7: Sufficient variation in X’s.
Assumption 8: There is no exact collinearity between X’s.
Assumption 9: Model is specified correctly.
Assumption 10: The stochastic error term is normally distributed.

Lesson Goals
1. What is the nature of multicollinearity?
2. Is multicollinearity really a problem?
3. What are its practical consequences?
4. How does one detect it?
5. What remedial measures can be taken to alleviate the problem
of multicollinearity?

Multicollinearity
• Assumption: There is no exact collinearity between the independent variables, that is, no exact linear relationship between $X_1$ and $X_2$ in
  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$
• Violation of this assumption leads to the problem of MULTICOLLINEARITY.

Multicollinearity - Nature
• Perfect (Exact) and Imperfect (Inexact) Multicollinearity

X1    X2    X3
5     25    28
12    60    60
8     40    47
15    75    76
3     15    17
20    100   105

X2 is exactly 5 × X1 (perfect collinearity). X3 is X2 with some random numbers added (3, 0, 7, 1, 2, 5), so it is only imperfectly collinear with X1 and X2.

So, how do we examine the relationship between these variables? Correlation Coefficient (𝒓)

Now, let’s practice

Multicollinearity - Nature
Code           Correlation Coefficient (r)   Relationship (Collinearity)
cor(X1, X2)    1                             Perfect
cor(X1, X3)    0.9967                        Imperfect
cor(X2, X3)    0.9967                        Imperfect

Correlation Matrix
      X1          X2          X3
X1    1.0000000   1.0000000   0.9967306
X2    1.0000000   1.0000000   0.9967306
X3    0.9967306   0.9967306   1.0000000
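
The correlations above come from R's cor(). As a cross-check, the same Pearson coefficients can be computed by hand. The sketch below uses plain Python rather than R so that it runs without any packages; the helper name pearson_r is an illustrative assumption, not something from the slides.

```python
from math import sqrt

# Data from the slide: X2 = 5 * X1, and X3 = X2 plus a few random numbers.
X1 = [5, 12, 8, 15, 3, 20]
X2 = [25, 60, 40, 75, 15, 100]
X3 = [28, 60, 47, 76, 17, 105]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of squares and cross-products in deviation-from-mean form.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r12 = pearson_r(X1, X2)  # perfect collinearity
r13 = pearson_r(X1, X3)  # imperfect collinearity
r23 = pearson_r(X2, X3)  # imperfect collinearity
print(round(r12, 4), round(r13, 4), round(r23, 4))
```

A coefficient of exactly 1 signals perfect collinearity; values close to, but below, 1 signal imperfect collinearity.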

Multicollinearity - Nature
𝑪𝒐𝒏𝒔𝒖𝒎𝒑𝒕𝒊𝒐𝒏𝒊 = 𝜷𝟎 + 𝜷𝟏 𝑰𝒏𝒄𝒐𝒎𝒆𝒊 + 𝜷𝟐 𝑾𝒆𝒂𝒍𝒕𝒉𝒊 + 𝒖𝒊

• Multicollinearity is a question of degree and not of kind. The problem is with the degree of multicollinearity.
• The “Do Nothing” school of thought says: “Multicollinearity is God’s phenomenon.”

Multicollinearity - Nature

O. J. Blanchard, Comment, Journal of Business and Economic Statistics, vol. 5, 1987, pp. 449–451. The quote is reproduced from Peter Kennedy, A Guide to Econometrics, 4th ed., MIT Press, Cambridge, Mass., 1998, p. 190.

Multicollinearity - Effects
• The estimators are still BLUE, but have large variances and low precision.
• If the true variance is $\sigma^2$, multicollinearity shows up as an inflated variance of $\hat{\beta}_i$.
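
For reference, the standard textbook expression for the variance of an OLS slope makes the inflation explicit. This is a sketch in assumed notation (observation index $t$, and $R^2_{X_i}$ for the $R^2$ from regressing $X_i$ on the other regressors); it is not a reconstruction of the exact formula shown on the slide.

```latex
\operatorname{var}(\hat{\beta}_i)
  = \frac{\sigma^2}{\sum_t \left(X_{it} - \bar{X}_i\right)^2}
    \cdot \frac{1}{1 - R^2_{X_i}}
```

With no collinearity ($R^2_{X_i} = 0$) the second factor equals 1. As $R^2_{X_i}$ approaches 1, the factor, and with it the variance, grows without bound.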

Multicollinearity - Effects
• Confidence intervals widen, so the “zero null hypothesis” (that a coefficient equals zero) is more readily accepted.
• High R-squared value.

Multicollinearity - Effects
If multicollinearity is perfect the regression coefficients
of the X variables are indeterminate and their standard
errors are infinite.
If multicollinearity is less than perfect, the regression
coefficients, although determinate, possess large
standard errors (in relation to the coefficients
themselves), which means the coefficients cannot be
estimated with great precision or accuracy.
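
One way to see why perfect collinearity leaves the coefficients indeterminate: in a two-regressor model, both OLS slope formulas divide by Sxx·Szz − Sxz² (sums of squares and cross-products in deviation form). The sketch below evaluates that denominator for the X1, X2, X3 data used earlier in the deck. It is written in plain Python, and the helper name slope_denominator is an illustrative assumption.

```python
# Data from the earlier slide: X2 = 5 * X1 exactly; X3 = X2 plus random numbers.
X1 = [5, 12, 8, 15, 3, 20]
X2 = [25, 60, 40, 75, 15, 100]
X3 = [28, 60, 47, 76, 17, 105]


def slope_denominator(x, z):
    """Sxx * Szz - Sxz**2, the common denominator of the two OLS slope
    estimates (and of their variances) when Y is regressed on x and z."""
    n = len(x)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    szz = sum((b - mean_z) ** 2 for b in z)
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    return sxx * szz - sxz ** 2


det_perfect = slope_denominator(X1, X2)    # zero: slopes are 0/0, indeterminate
det_imperfect = slope_denominator(X1, X3)  # positive but small relative to Sxx * Szz
print(det_perfect, det_imperfect)
```

A zero denominator means the slopes cannot be computed and their variances are infinite. A small but nonzero denominator gives determinate slopes with large standard errors.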

Multicollinearity - Detection
• High overall $R^2$ but few significant t-ratios of the $\hat{\beta}_i$’s.
• High correlation between the $X_i$’s.
• Variance Inflation Factor: $VIF = \dfrac{1}{1 - R^2_{X_i}}$
• Tolerance: $TOL = 1 - R^2_{X_i} = \dfrac{1}{VIF}$

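Multicollinearity - Detection
$VIF = \dfrac{1}{1 - R^2_{X_i}}$
Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$
Auxiliary Regression: $X_{1i} = \alpha_0 + \alpha_1 X_{2i} + v_i$
Obtain $R^2_{X_i}$ from this auxiliary regression.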

Multicollinearity - Detection
$R^2_{X_i} = 0$: $VIF = \dfrac{1}{1 - 0} = 1$
$R^2_{X_i} = 0.5$: $VIF = \dfrac{1}{1 - 0.5} = 2$
$R^2_{X_i} = 0.9$: $VIF = \dfrac{1}{1 - 0.9} = 10$
$R^2_{X_i} = 1$: $VIF = \dfrac{1}{1 - 1} = \text{Inf}$
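
The VIF arithmetic above can be sketched in a few lines. The example below is plain Python, not the R workflow used in the course. It assumes a single auxiliary regressor, so the auxiliary $R^2$ is simply the squared correlation. The function names vif and aux_r_squared are illustrative, and the data are the X1 and X3 columns from the earlier slide.

```python
from math import inf

X1 = [5, 12, 8, 15, 3, 20]
X3 = [28, 60, 47, 76, 17, 105]


def vif(r_squared):
    """Variance inflation factor 1 / (1 - R^2); infinite when R^2 = 1."""
    if r_squared >= 1:
        return inf
    return 1 / (1 - r_squared)


def aux_r_squared(x, z):
    """R^2 from regressing x on z alone (equal to their squared correlation)."""
    n = len(x)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    szz = sum((b - mean_z) ** 2 for b in z)
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    return sxz ** 2 / (sxx * szz)


# The four cases listed on the slide.
slide_cases = [vif(r2) for r2 in (0, 0.5, 0.9, 1)]

# VIF and tolerance for X1 when X3 is the other regressor.
r2_x1 = aux_r_squared(X1, X3)
vif_x1 = vif(r2_x1)
tolerance_x1 = 1 - r2_x1
print(slide_cases, round(vif_x1, 2), round(tolerance_x1, 4))
```

Here the auxiliary $R^2$ is about 0.993, so the VIF is far above the value of 10 that the slide obtains for $R^2_{X_i} = 0.9$.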

Multicollinearity - Remedies
• Drop the variable causing multicollinearity.
• Increase sample size.

Multicollinearity in R
Packages: performance, lmtest

Now, let’s practice

Heteroscedasticity
• Assumption: The variance of the error terms, $u_i$, is constant (homoscedastic).
• Assumption violation leads to the problem of
HETEROSCEDASTICITY / HETEROSKEDASTICITY.

Heteroscedasticity - Nature
[Figure: Homoscedasticity vs. Heteroscedasticity]

Heteroscedasticity - Effects
• Estimates of regression coefficients are still LINEAR and
UNBIASED but …
• … no longer BEST (minimum variance).
• Estimate of true variance, 𝝈𝟐 , is biased and the direction of bias is unknown. Thus, the estimated standard errors of the coefficients are also biased.
• T-test and F-test become unreliable.

Heteroscedasticity - Detection
Specific Tests
• Park Test
• Glejser Test
• Goldfeld-Quandt Test
General Tests
• White’s Test
• Breusch-Pagan Test

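Heteroscedasticity - Detection
• These tests follow the hypothesis:
  H0: The error variance is constant (homoscedasticity).
  H1: The error variance is not constant (heteroscedasticity).
• Rejecting the null hypothesis implies that the problem of heteroscedasticity is present.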
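
To make the test logic concrete, here is a minimal sketch of the Breusch-Pagan idea in its studentized (Koenker) form for one regressor: regress the squared OLS residuals on X and compare LM = n·R² with a chi-square distribution with 1 degree of freedom. It is plain Python, not the lmtest implementation. The data are synthetic and purely illustrative, and all function names are assumptions.

```python
from math import erfc, sqrt


def fit_line(x, y):
    """OLS intercept and slope for a simple regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y):
    """R^2 of the simple regression of y on x."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    if ss_tot == 0:
        return 0.0
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return 1 - ss_res / ss_tot


def breusch_pagan(x, y):
    """Return (LM statistic, p-value) for H0: constant error variance."""
    intercept, slope = fit_line(x, y)
    resid_sq = [(b - intercept - slope * a) ** 2 for a, b in zip(x, y)]
    lm = max(0.0, len(x) * r_squared(x, resid_sq))  # guard against rounding
    p_value = erfc(sqrt(lm / 2))  # upper tail of chi-square with 1 df
    return lm, p_value


# Synthetic illustration: the error size grows with x in the first series
# and stays constant in the second.
x = list(range(1, 21))
y_hetero = [2 + 3 * xi + ((-1) ** xi) * 0.5 * xi for xi in x]
y_homo = [2 + 3 * xi + ((-1) ** xi) * 1.0 for xi in x]

lm_hetero, p_hetero = breusch_pagan(x, y_hetero)
lm_homo, p_homo = breusch_pagan(x, y_homo)
print(lm_hetero, p_hetero, lm_homo, p_homo)
```

A p-value below the chosen significance level (for example 0.05) leads to rejecting H0, which indicates heteroscedasticity.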

Heteroscedasticity - Remedy
• Take logs (compress data into smaller values)
• Weighted Least Squares
• Revisit the model (there might be a misspecification)

Heteroscedasticity in R
Packages: performance, lmtest

Now, let’s practice

Autocorrelation - Nature
• Assumption: No autocorrelation between disturbances: $\operatorname{cov}(u_i, u_j \mid X_i, X_j) = 0$ for $i \neq j$.
• Assumption violation leads to the problem of
AUTOCORRELATION.
• What happens if the error terms are correlated?
We’ll find out soon!

Autocorrelation - Nature
• No autocorrelation means: THERE SHOULDN’T BE
ANY CLEAR PATTERN BETWEEN THE RESIDUALS!
• Autocorrelation and Serial Correlation are treated
synonymously.

Autocorrelation - Effects
• Estimates of regression coefficients are still LINEAR
and UNBIASED but …
• … no longer BEST (minimum variance).
• Estimate of true variance, 𝝈𝟐 , is biased and the direction of bias is unknown. Thus, the estimated standard errors of the coefficients are also biased.
• T-test and F-test become unreliable.

Autocorrelation - Detection
• Graphical Method of Residuals
• Durbin-Watson (DW) Test
• Breusch-Godfrey (BG) Test
Test Hypothesis
H0: No autocorrelation
H1: Autocorrelation is present
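
As an illustration of the Durbin-Watson test listed above, its statistic can be computed directly from a residual series as d = Σ(e_t − e_{t−1})² / Σe_t². The sketch below is plain Python with made-up residuals, not output from lmtest::dwtest(). It returns only the statistic, without the critical values or p-value a full test needs.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals. It ranges from 0 to 4."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Illustrative residual patterns (hypothetical values).
persistent = [1, 1, 1, -1, -1, -1]   # neighbours look alike: positive autocorrelation
alternating = [1, -1, 1, -1, 1, -1]  # neighbours flip sign: negative autocorrelation

d_persistent = durbin_watson(persistent)
d_alternating = durbin_watson(alternating)
print(d_persistent, d_alternating)
```

Values near 2 are consistent with no first-order autocorrelation. Values well below 2 point to positive autocorrelation, and values well above 2 point to negative autocorrelation.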

Autocorrelation - Remedies
• Revisit the model (maybe, model is misspecified)
• Use the Newey-West method (an extension of White’s heteroscedasticity-consistent standard errors) to obtain standard errors corrected for autocorrelation.
• Generalized Least Squares
• HAC (Heteroscedasticity-Autocorrelation consistent)
standard errors

Autocorrelation – Detection & Remedies

Packages: performance, lmtest

performance::check_model()
performance::check_autocorrelation()

lmtest::bgtest()
lmtest::dwtest()
lmtest::coeftest()

Now, let’s practice

Thank You!
Any questions?
Email:
[email protected]

LinkedIn:
https://www.linkedin.com/in/appiah-elijah-383231123/
