
MULTICOLLINEARITY

DISMAS ALEX
Dept of Economics and Tax Management
The Institute of Finance Management
In this lecture we take a critical look at the assumption of no multicollinearity by
seeking answers to the following questions:
1. What is the nature of multicollinearity?
2. Is multicollinearity really a problem?
3. What are its practical consequences?
4. How does one detect multicollinearity?
5. What remedial measures can be taken to alleviate the problem of multicollinearity?
INTUITION
• Consider the following regression of salary on years of experience and age:

Salaryi = β0 + β1Experiencei + β2Agei + ei

Interpretations of the regression coefficients:

β1 - the marginal effect on salary of 1 additional year of experience, holding other variables constant

β2 - the marginal effect on salary of 1 additional year of age, holding other variables constant
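Because age and years of experience tend to move together almost one-for-one, the data contain little independent variation with which to separate β1 from β2. A minimal simulation sketch of this intuition (not from the lecture; all variable names and numbers are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100

# Age and experience move together: older workers usually have more experience.
age = rng.uniform(25, 60, n)
experience = age - 22 + rng.normal(0, 1.5, n)   # roughly age minus years of schooling, plus noise

# Hypothetical salary process that depends mainly on experience (salary in $1,000s).
salary = 30 + 1.2 * experience + 0.1 * age + rng.normal(0, 5, n)

print(np.corrcoef(age, experience)[0, 1])        # close to 1: near-collinear regressors

X = sm.add_constant(np.column_stack([experience, age]))
print(sm.OLS(salary, X).fit().summary())         # note the inflated standard errors on both slopes
```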
DEFINITION OF MULTICOLLINEARITY
Perfect multicollinearity is a violation of the assumption that no explanatory
variable is a perfect linear function of any other explanatory variables.
• Perfect (or Exact) Multicollinearity
If two or more independent variables have an exact linear relationship
between them then we have perfect multicollinearity.
• Multicollinearity occurs when the X variables are themselves related
• Here is an example of perfect multicollinearity in a model with two
explanatory variables:
Yi = β0 + β1X1i + β2X2i + ei

X1i = α0 + α1X2i
MULTICOLLINEARITY CONT…
• Consequence: OLS cannot generate estimates of the regression
coefficients (error message).

• Why? OLS cannot estimate the marginal effect of X1 on Y
while holding X2 constant, because X2 moves exactly when X1
moves!
• Solution: Easy - drop one of the variables!
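A minimal numerical sketch (hypothetical data) of why estimation breaks down under perfect collinearity: when one regressor is an exact linear function of another, the design matrix loses a rank and the OLS normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 3.0 * x1 + 2.0                          # X2 is an exact linear function of X1

X = np.column_stack([np.ones(50), x1, x2])   # design matrix with a constant term
print(np.linalg.matrix_rank(X))              # 2, not 3: the design matrix is rank-deficient
print(np.linalg.cond(X.T @ X))               # astronomically large (or infinite): X'X is singular,
                                             # so regression software reports an error or drops a column
```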
Consequences of Multicollinearity
1. Difficult to identify separate effects of each variable in the model.
2. The variances and the standard errors of the regression
coefficient estimates will increase. This means lower t-statistics.
3. Estimates of parameters may not appear significantly different
from zero even though the F-test for the joint significance of the
correlated variables may be large.
4. Regression coefficients will be sensitive to specification.
Regression coefficients can change substantially when variables
are added or dropped.
5. The overall fit of the regression equation will be largely unaffected
by multicollinearity. This also means that forecasting and
prediction will be largely unaffected.
(Read also Gujarati, D. N. (2004), p. 350.)
Why Care?
• What does multicollinearity do to my regression results?
Why Care? Cont….
• In passing, note that multicollinearity, as we have defined it, refers only to
linear relationships among the X variables. It does not rule out nonlinear
relationships among them. For example, consider the following regression
model:
• Yi = β0 + β1Xi + β2Xi² + β3Xi³ + ui
• where, say, Y = total cost of production and X = output. The variables Xi²
(output squared) and Xi³ (output cubed) are obviously functionally related
to Xi, but the relationship is nonlinear.
• Why does the classical linear regression model assume that there is no
multicollinearity among the X’s? The reasoning is this:
1. If multicollinearity is perfect, the regression coefficients of the X
variables are indeterminate and their standard errors are infinite.
2. If multicollinearity is less than perfect, the regression coefficients,
although determinate, possess large standard errors which means the
coefficients cannot be estimated with great precision or accuracy.
Sources of multicollinearity
There are several sources of multicollinearity.
1. The data collection method employed, for example, sampling over a
limited range of the values taken by the regressors in the population.
2. Constraints on the model or in the population being sampled. For
example, in the regression of electricity consumption on income (X2)
and house size (X3), high income (X2) almost always goes with a large
house (X3).
3. Model specification, for example, adding polynomial terms to a
regression model, especially when the range of the X variable is small.
4. An overdetermined model. This happens when the model has more
explanatory variables than the number of observations.
• An additional reason for multicollinearity, especially in time series
data, may be that the regressors included in the model share a
common trend, that is, they all increase or decrease over time.
The Detection of Multicollinearity
• High Correlation Coefficients
Pairwise correlations among independent variables might be high
(in absolute value). Rule of thumb: If the correlation > 0.9 then
severe multicollinearity may be present.
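A minimal check of this rule of thumb with a pairwise correlation matrix (hypothetical, simulated variables; the names are illustrative only):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
income = rng.normal(50, 10, 200)
wealth = 8 * income + rng.normal(0, 15, 200)       # wealth tracks income very closely
house_size = rng.normal(120, 30, 200)              # roughly unrelated to the other two

X = pd.DataFrame({"income": income, "wealth": wealth, "house_size": house_size})
print(X.corr().round(2))   # flag any off-diagonal entry above about 0.9 in absolute value
```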
The Detection of Multicollinearity
• High R² with Low t-Statistic Values
It is possible for individual regression coefficients to be insignificant
but for the overall fit of the equation to be high.
• High Variance Inflation Factors (VIFs)
A VIF measures the extent to which multicollinearity has
increased the variance of an estimated coefficient. It looks at the
extent to which an explanatory variable can be explained by all
the other explanatory variables in the equation.
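For regressor Xj, the standard formula is VIFj = 1 / (1 − Rj²), where Rj² comes from regressing Xj on all the other explanatory variables; a common rule of thumb treats VIFs above about 10 as a sign of serious multicollinearity. A minimal sketch using statsmodels (regenerating the same hypothetical variables as in the previous sketch so the snippet stands alone):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
income = rng.normal(50, 10, 200)
wealth = 8 * income + rng.normal(0, 15, 200)
house_size = rng.normal(120, 30, 200)

exog = sm.add_constant(np.column_stack([income, wealth, house_size]))
for j, name in enumerate(["const", "income", "wealth", "house_size"]):
    # The VIF for the constant term is not meaningful; look at the regressors.
    print(name, round(variance_inflation_factor(exog, j), 1))
```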
Remedies for Multicollinearity
No single solution exists that will eliminate multicollinearity. Certain
approaches may be useful:
1. Do Nothing
Live with what you have.
2. Drop a Redundant Variable: If a variable is redundant, it should have never
been included in the model in the first place. So dropping it actually is just
correcting for a specification error. Use economic theory to guide your
choice of which variable to drop.
3. Transform the Multicollinear Variables: Sometimes you can reduce
multicollinearity by re-specifying the model, for instance, by creating a
combination of the multicollinear variables. As an example, rather than
including the variables GDP and population in the model, include
GDP/population (GDP per capita) instead (a small sketch of this idea
follows this list).
4. Increase the Sample Size: with more observations, the variances of the
coefficient estimates shrink, which can partly offset the inflation caused by
collinearity.
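A minimal sketch of remedy 3 (hypothetical, simulated figures, not real GDP data): combining two highly collinear variables into one ratio removes most of their common movement.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100
population = rng.uniform(1, 100, n)                    # millions of people, hypothetical
gdp = 30 * population * rng.lognormal(0, 0.15, n)      # GDP scales almost proportionally with population

df = pd.DataFrame({"gdp": gdp, "population": population})
df["gdp_per_capita"] = df["gdp"] / df["population"]

print(round(df["gdp"].corr(df["population"]), 2))             # close to 1: severe collinearity
print(round(df["gdp_per_capita"].corr(df["population"]), 2))  # close to 0: the transformed regressor is not collinear
```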
AN ILLUSTRATIVE EXAMPLE: CONSUMPTION EXPENDITURE
IN RELATION TO INCOME AND WEALTH
• Let us consider the consumption–income example in Table 10.5 of Gujarati (2004).
We obtain the following regression:
Ŷi = 24.7747 + 0.9415X2i − 0.0424X3i
se = (6.7525)   (0.8229)   (0.0807)
t  = (3.6690)   (1.1442)   (−0.5261)                          (10.6.1)
R² = 0.9635   adjusted R² = 0.9531   df = 7
• Regression (10.6.1) shows that income and wealth together explain about
96 percent of the variation in consumption expenditure, and yet neither of
the slope coefficients is individually statistically significant.
• The wealth variable has the wrong sign. Although β̂2 and β̂3 are
individually statistically insignificant, if we test the hypothesis that β2 = β3 = 0
simultaneously, this hypothesis can be rejected, as Table 10.6 shows.
Source of variation      SS            df    MSS
Due to regression        8,565.5541     2    4,282.7770
Due to residual            324.4459     7       46.3494

• Under the usual assumption we obtain:
F = 4,282.7770 / 46.3494 = 92.4019                            (10.6.2)
• This F value is obviously highly significant. Our example shows
dramatically what multicollinearity does.
• The fact that the F test is significant but the t values of X2 and X3 are
individually insignificant means that the two variables are so highly
correlated that it is impossible to isolate the individual impact of
either income or wealth on consumption.
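A minimal simulation sketch of this pattern (hypothetical figures, not the Table 10.5 data): with a small sample and nearly collinear income and wealth, the joint F test tends to be highly significant even though the individual t statistics are not.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10                                           # small sample, as in the textbook example
income = rng.uniform(80, 260, n)
wealth = 10 * income + rng.normal(0, 50, n)      # wealth is almost an exact multiple of income
consumption = 25 + 0.6 * income + rng.normal(0, 8, n)

X = sm.add_constant(np.column_stack([income, wealth]))
fit = sm.OLS(consumption, X).fit()
print(fit.rsquared, fit.fvalue, fit.f_pvalue)    # high R² and a highly significant F statistic
print(fit.tvalues, fit.pvalues)                  # yet the individual slope t statistics are typically insignificant
```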
• If instead of regressing Y on X2, we regress it on X3, we obtain
Ŷi = 24.411 + 0.0498X3i
se = (6.874)   (0.0037)                                       (10.6.5)
t  = (3.551)   (13.29)        R² = 0.9567
• We see that wealth now has a significant impact on
consumption expenditure, whereas in (10.6.1) it appeared to have
no effect on consumption expenditure.
• Regressions (10.6.4) and (10.6.5) show very clearly that in
situations of extreme multicollinearity dropping the highly
collinear variable will often make the other X variable
statistically significant. This result would suggest that a way out
of extreme collinearity is to drop the collinear variable.
THANK YOU
