Regression Problems (Practical)
PROBLEMS IN REGRESSION
11/21/2022
Presentations on Inference
Lesson Goals
1. Understand the concepts of:
◦ Multicollinearity
◦ Heteroscedasticity
◦ Autocorrelation
Assumptions of CLRM
Assumption 1: The regression model is linear in the parameters.
Assumption 2: The values of the regressors, X's, are fixed, or are independent of the error term.
Assumption 3: The mean value of the error term is zero.
Assumption 4: The variance of 𝒖 is constant or homoscedastic.
Assumption 5: There is no autocorrelation between disturbances.
Assumption 6: The number of observations must be greater than the number of parameters to be estimated.
Assumption 7: There is sufficient variation in the X's.
Assumption 8: There is no exact collinearity between the X's.
Assumption 9: The model is specified correctly.
Assumption 10: The stochastic error term is normally distributed.
Lesson Goals
1. What is the nature of multicollinearity?
2. Is multicollinearity really a problem?
3. What are its practical consequences?
4. How does one detect it?
5. What remedial measures can be taken to alleviate the problem
of multicollinearity?
Multicollinearity
• Assumption: There is no exact collinearity between the independent variables.
Yi = β0 + β1X1i + β2X2i + ui
• Violation of this assumption (i.e., an exact linear relationship among the X's) leads to the problem of MULTICOLLINEARITY.
Multicollinearity - Nature
• Perfect (Exact) and Imperfect (Inexact) Multicollinearity

X1    X2    X3
 5    25    28
12    60    60
 8    40    47
15    75    76
 3    15    17
20   100   105

Here X2 = 5·X1, an exact linear relationship (perfect collinearity). X3 is X2 plus some random numbers (3, 0, 7, 1, 2, 5), giving imperfect collinearity.
So, how do we examine the relationship between these variables? The Correlation Coefficient (r).
Multicollinearity - Nature

Code          Correlation Coefficient (r)   Relationship (Collinearity)
cor(X1, X2)   1.0000                        Perfect
cor(X1, X3)   0.9967                        Imperfect
cor(X2, X3)   0.9967                        Imperfect

Correlation Matrix:
      X1         X2         X3
X1  1.0000000  1.0000000  0.9967306
X2  1.0000000  1.0000000  0.9967306
X3  0.9967306  0.9967306  1.0000000
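The deck's examples use R's cor(); as a language-neutral check, here is a minimal Python sketch that reproduces the coefficients above for the toy data:

```python
# Toy data from the slide: X2 = 5*X1 (perfect), X3 = X2 + noise (imperfect).
X1 = [5, 12, 8, 15, 3, 20]
X2 = [5 * x for x in X1]
X3 = [28, 60, 47, 76, 17, 105]

def pearson_r(x, y):
    """Pearson correlation coefficient: cov(x, y) / (sd(x) * sd(y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

print(round(pearson_r(X1, X2), 7))  # 1.0       -> perfect collinearity
print(round(pearson_r(X1, X3), 7))  # 0.9967306 -> imperfect collinearity
```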
Multicollinearity - Nature
𝑪𝒐𝒏𝒔𝒖𝒎𝒑𝒕𝒊𝒐𝒏𝒊 = 𝜷𝟎 + 𝜷𝟏 𝑰𝒏𝒄𝒐𝒎𝒆𝒊 + 𝜷𝟐 𝑾𝒆𝒂𝒍𝒕𝒉𝒊 + 𝒖𝒊
Multicollinearity - Nature
• O. J. Blanchard, "Comment," Journal of Business and Economic Statistics, vol. 5, 1987, pp. 449–451. The quote is reproduced from Peter Kennedy, A Guide to Econometrics, 4th ed., MIT Press, Cambridge, Mass., 1998, p. 190.
Multicollinearity - Effects
• The estimators are still BLUE, but have large variances and low precision.
Var(β̂j) = σ² / [Σx²ji · (1 − R²j)]
• As R²j approaches 1, the (1 − R²j) term shrinks and the variance of β̂j is inflated.
Multicollinearity - Effects
• Confidence intervals widen, leading to acceptance of the "zero null hypothesis" (βj = 0).
• High R-squared value despite few significant t-ratios.
Multicollinearity - Effects
If multicollinearity is perfect, the regression coefficients of the X variables are indeterminate and their standard errors are infinite.
If multicollinearity is less than perfect, the regression coefficients, although determinate, possess large standard errors (in relation to the coefficients themselves), which means the coefficients cannot be estimated with great precision or accuracy.
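The "indeterminate coefficients" case can be verified directly: with X2 = 5·X1 from the earlier table, the (centered) cross-product matrix X'X is singular, so the normal equations have no unique solution. A minimal Python sketch:

```python
# With X2 = 5*X1, the centered cross-product matrix X'X is singular:
# its determinant is zero, so (X'X)^(-1) does not exist and the OLS
# coefficients are indeterminate.
X1 = [5, 12, 8, 15, 3, 20]
X2 = [5 * x for x in X1]           # perfect collinearity
n = len(X1)
m1, m2 = sum(X1) / n, sum(X2) / n
s11 = sum((a - m1) ** 2 for a in X1)
s22 = sum((b - m2) ** 2 for b in X2)
s12 = sum((a - m1) * (b - m2) for a, b in zip(X1, X2))
det = s11 * s22 - s12 ** 2         # determinant of centered X'X
print(det)  # 0.0 -> X'X is not invertible
```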
Multicollinearity - Detection
• High overall R² but few significant t-ratios of the β̂j's.
• High pairwise correlations among the Xi's.
• Variance Inflation Factor:
VIF = 1 / (1 − R²_Xi)
• Tolerance:
TOL = 1 / VIF = 1 − R²_Xi
Multicollinearity - Detection
VIF = 1 / (1 − R²_Xi)
Model: Yi = β0 + β1X1i + β2X2i + ui
Auxiliary Regression: X1i = α0 + α1X2i + vi
Obtain R²_Xi from this auxiliary regression.
Multicollinearity - Detection
R²_Xi = 0:   VIF = 1/(1 − 0) = 1
R²_Xi = 0.5: VIF = 1/(1 − 0.5) = 2
R²_Xi = 0.9: VIF = 1/(1 − 0.9) = 10
R²_Xi = 1:   VIF = 1/(1 − 1) = ∞
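Applying this to the slide's imperfect pair (X1, X3): with a single auxiliary regressor, R²_Xi is just the squared correlation, so the VIF follows directly. A minimal Python sketch:

```python
# VIF = 1 / (1 - R^2), where R^2 comes from regressing one X on the other.
# With a single auxiliary regressor, R^2 equals the squared correlation r^2.
X1 = [5, 12, 8, 15, 3, 20]
X3 = [28, 60, 47, 76, 17, 105]    # X2 plus random noise -> imperfect collinearity

n = len(X1)
m1, m3 = sum(X1) / n, sum(X3) / n
cov = sum((a - m1) * (b - m3) for a, b in zip(X1, X3))
v1 = sum((a - m1) ** 2 for a in X1)
v3 = sum((b - m3) ** 2 for b in X3)
r2 = cov ** 2 / (v1 * v3)         # R^2 of the auxiliary regression
vif = 1 / (1 - r2)
print(round(vif, 1))  # 153.2 -> far above the common rule-of-thumb cutoff of 10
```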
Multicollinearity - Remedies
• Do nothing: the estimators remain BLUE, so the model may still serve for prediction.
• Drop one of the collinear variables (beware of specification bias).
• Transform the variables (e.g., first differences or ratios).
• Acquire additional or new data.
• Combine the collinear variables into a single index.
Multicollinearity in R
Packages: performance, lmtest
Heteroscedasticity
• Assumption: The variance of the error terms, ui, is constant or homoscedastic.
• Violation of this assumption leads to the problem of HETEROSCEDASTICITY (also spelled HETEROSKEDASTICITY).
Heteroscedasticity - Nature
[Figures: under homoscedasticity the residuals show constant spread around the regression line; under heteroscedasticity the spread changes with X, e.g., fanning out as X grows.]
Heteroscedasticity - Effects
• Estimates of regression coefficients are still LINEAR and UNBIASED but ...
• ... no longer BEST (minimum variance).
• The estimate of the true variance, σ², is biased and the direction of the bias is unknown. Thus, the estimated variances (standard errors) of the coefficients are also biased.
• t-tests and F-tests become unreliable.
Heteroscedasticity - Detection
Specific Tests
• Park Test
• Glejser Test
• Goldfeld-Quandt Test
General Tests
• White’s Test
• Breusch-Pagan Test
Heteroscedasticity - Detection
• These tests follow the hypothesis:
H0: The error variance is constant (homoscedasticity).
H1: The error variance is not constant (heteroscedasticity).
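As a concrete illustration of the Breusch-Pagan logic under these hypotheses, here is a minimal pure-Python sketch; the data are made up so that the error spread grows with x:

```python
# Breusch-Pagan idea in miniature: regress the squared OLS residuals on the
# regressor; if they are explainable by x, the error variance is not constant.
# All data below are hypothetical.

def ols(x, y):
    """Simple bivariate OLS; returns intercept, slope, residuals, R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    syy = sum((b - my) ** 2 for b in y)
    r2 = 1 - sum(r ** 2 for r in resid) / syy
    return intercept, slope, resid, r2

# Deterministic "errors" whose magnitude grows with x (variance ~ x^2):
x = list(range(1, 21))
errors = [0.5 * xi * (1 if i % 2 == 0 else -1) for i, xi in enumerate(x)]
y = [2 + 3 * xi + ei for xi, ei in zip(x, errors)]

_, _, resid, _ = ols(x, y)
_, _, _, aux_r2 = ols(x, [r ** 2 for r in resid])  # auxiliary regression: e^2 on x
lm = len(x) * aux_r2   # LM statistic, compared with chi-square(1) critical value 3.84
print(lm > 3.84)       # True -> reject H0: homoscedasticity
```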
Heteroscedasticity - Remedy
• Take logs (compresses the data into a smaller range of values)
• Weighted Least Squares
• Revisit the model (there might be a misspecification)
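As a sketch of the Weighted Least Squares remedy, suppose (purely for illustration) that var(ui) is proportional to x²i; dividing the model through by xi makes the transformed error homoscedastic, and OLS on the transformed data is WLS. The numbers below are hypothetical:

```python
# WLS sketch under the illustrative assumption var(u_i) = sigma^2 * x_i^2:
# dividing y_i = b0 + b1*x_i + u_i through by x_i gives
#   y_i/x_i = b0*(1/x_i) + b1 + u_i/x_i,
# whose new error u_i/x_i has constant variance sigma^2.
x = [1.0, 2.0, 4.0, 5.0, 8.0, 10.0]
y = [3.5, 8.0, 14.0, 18.0, 26.0, 33.0]      # hypothetical observations

ystar = [yi / xi for xi, yi in zip(x, y)]   # transformed response
xstar = [1 / xi for xi in x]                # transformed regressor
# OLS of ystar on xstar: its slope estimates b0, its intercept estimates b1.
n = len(x)
mx, my = sum(xstar) / n, sum(ystar) / n
sxx = sum((a - mx) ** 2 for a in xstar)
sxy = sum((a - mx) * (b - my) for a, b in zip(xstar, ystar))
b0 = sxy / sxx          # estimated intercept of the original model
b1 = my - b0 * mx       # estimated slope of the original model
print(round(b0, 2), round(b1, 2))  # 0.28 3.42
```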
Heteroscedasticity in R
Packages: performance, lmtest
Autocorrelation - Nature
• Assumption: No autocorrelation between disturbances: cov(ui, uj) = 0 for i ≠ j.
• Violation of this assumption leads to the problem of AUTOCORRELATION.
• What happens if the error terms are correlated? We'll find out soon!
Autocorrelation - Nature
• No autocorrelation means: THERE SHOULDN’T BE
ANY CLEAR PATTERN BETWEEN THE RESIDUALS!
• Autocorrelation and Serial Correlation are treated
synonymously.
Autocorrelation - Nature
[Figures: residual plots over time showing clear patterns — cyclical or trending — that indicate autocorrelation.]
Autocorrelation - Effects
• Estimates of regression coefficients are still LINEAR and UNBIASED but ...
• ... no longer BEST (minimum variance).
• The estimate of the true variance, σ², is biased and the direction of the bias is unknown. Thus, the estimated variances (standard errors) of the coefficients are also biased.
• t-tests and F-tests become unreliable.
Autocorrelation - Detection
• Graphical Method of Residuals
• Durbin-Watson (DW) Test
• Breusch-Godfrey (BG) Test

Test Hypothesis:
H0: ρ = 0 (no autocorrelation)
H1: ρ ≠ 0 (autocorrelation present)
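The DW statistic itself is simple to compute: d = Σ(et − et−1)² / Σe²t, with d ≈ 2 indicating no autocorrelation, d near 0 positive autocorrelation, and d near 4 negative autocorrelation. A minimal Python sketch with hypothetical residual patterns:

```python
def durbin_watson(e):
    """Durbin-Watson d: sum of squared successive differences / sum of squares."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v ** 2 for v in e)

# Hypothetical residual series:
trend = [t - 4.5 for t in range(10)]          # smooth drift -> positive autocorrelation
alternating = [(-1) ** t for t in range(10)]  # sign flips   -> negative autocorrelation

print(round(durbin_watson(trend), 2))        # 0.11 (close to 0)
print(round(durbin_watson(alternating), 2))  # 3.6  (close to 4)
```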
Autocorrelation - Remedies
• Revisit the model (it may be misspecified).
• Use the Newey-West method (an extension of White's heteroscedasticity-consistent standard errors) to obtain standard errors corrected for autocorrelation.
• Generalized Least Squares
• HAC (Heteroscedasticity- and Autocorrelation-Consistent) standard errors

Autocorrelation in R
lmtest::bgtest()
lmtest::dwtest()
Thank You!
Any questions?
Email:
[email protected]
LinkedIn:
https://www.linkedin.com/in/appiah-elijah-383231123/