Multicollinearity
Extra reading: Basic Econometrics by Gujarati and
Introduction to Econometrics by Maddala
Introduction
• In this chapter we take a critical look at the assumption of no multicollinearity among the regressors by seeking
answers to the following questions:
• 1. What is the nature of multicollinearity?
• 2. Is multicollinearity really a problem?
• 3. What are its practical consequences?
• 4. How does one detect it?
• 5. What remedial measures can be taken to alleviate the problem of
multicollinearity?
THE NATURE OF MULTICOLLINEARITY
• Multicollinearity originally meant the existence of a “perfect,” or exact,
linear relationship among some or all explanatory variables of a regression
model. For the k-variable regression involving the explanatory variables X1, X2, . . .
, Xk (where X1 = 1 for all observations to allow for the intercept term), an
exact linear relationship is said to exist if the following condition is satisfied:
λ1X1 + λ2X2 + · · · + λkXk = 0 (10.1.1)
where λ1, λ2, . . . , λk are constants such that not all of them are zero
simultaneously.
Today, however, the term multicollinearity is used to include the case where the
X variables are intercorrelated but not perfectly so, as follows:
λ1X1 + λ2X2 + · · · + λkXk + vi = 0 (10.1.2)
where vi is a stochastic error term.
The Nature of Multicollinearity
• To see the difference between perfect and less than perfect multicollinearity, assume, for example, that λ2 ≠ 0.
Then, Eq. (10.1.1) can be written as:

• X2i = −(λ1/λ2)X1i − (λ3/λ2)X3i − · · · − (λk/λ2)Xki (10.1.3)

• which shows how X2 is exactly linearly related to the other variables. In this situation, the coefficient of correlation
between the variable X2 and the linear combination on the right side of (10.1.3) is bound to be unity.
• Similarly, if λ2 ≠ 0, Eq. (10.1.2) can be written as:

• X2i = −(λ1/λ2)X1i − (λ3/λ2)X3i − · · · − (λk/λ2)Xki − (1/λ2)vi (10.1.4)

• which shows that X2 is not an exact linear combination of the other X’s because it is also determined by the
stochastic error term vi.
The Nature of Multicollinearity
• As a numerical example, consider the following hypothetical data:
• X2    X3    X*3
• 10    50     52
• 15    75     75
• 18    90     97
• 24   120    129
• 30   150    152
• It is apparent that X3i = 5X2i . Therefore, there is perfect collinearity between X2 and X3
since the coefficient of correlation r23 is unity. The variable X*3 was created from X3 by
simply adding to it the following numbers, which were taken from a table of random
numbers: 2, 0, 7, 9, 2. Now there is no longer perfect collinearity between X2 and X*3
(X*3i = 5X2i + vi). However, the two variables are highly correlated because calculations
will show that the coefficient of correlation between them is 0.9959.
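• The figures quoted above are easy to verify. Below is a minimal Python/numpy sketch (the variable names are ours, chosen for illustration) that reproduces the hypothetical data and both correlation coefficients:

```python
import numpy as np

# Hypothetical data from the text
X2 = np.array([10, 15, 18, 24, 30], dtype=float)
X3 = 5 * X2                                  # perfect collinearity: X3i = 5*X2i
v = np.array([2, 0, 7, 9, 2], dtype=float)   # the random numbers from the text
X3_star = X3 + v                             # X*3i = 5*X2i + vi

print(np.corrcoef(X2, X3)[0, 1])        # r23 = 1.0 exactly
print(np.corrcoef(X2, X3_star)[0, 1])   # approximately 0.9959
```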
The Nature of Multicollinearity
• The preceding algebraic approach to multicollinearity can be
portrayed in Figure 10.1. In this figure the circles Y, X2, and X3
represent, respectively, the variations in Y (the dependent variable)
and X2 and X3 (the explanatory variables). The degree of collinearity
can be measured by the extent of the overlap (shaded area) of the
X2 and X3 circles. In the extreme, if X2 and X3 were to overlap
completely (or if X2 were completely inside X3, or vice versa),
collinearity would be perfect.
The Nature of Multicollinearity
• In passing, note that multicollinearity, as we have defined it, refers only to linear relationships
among the X variables. It does not include nonlinear relationships among them. For example,
consider the following regression model:
• Yi = β0 + β1Xi + β2Xi² + β3Xi³ + ui (10.1.5)
• where, say, Y = total cost of production and X = output. The variables Xi² (output squared) and Xi³
(output cubed) are obviously functionally related to Xi, but the relationship is nonlinear.
•
• Why does the classical linear regression model assume that there is no multicollinearity among
the X’s? The reasoning is this:
• If multicollinearity is perfect, the regression coefficients of the X variables are indeterminate and
their standard errors are infinite.
• If multicollinearity is less than perfect, the regression coefficients, although determinate, possess
large standard errors, which means that the coefficients cannot be estimated with great precision or
accuracy.
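• Both consequences can be made concrete with a small simulation. The following Python sketch (numpy on made-up data; the coefficient values are arbitrary) shows that under perfect collinearity the cross-product matrix X'X is rank-deficient, while under near-perfect collinearity the OLS estimates exist but carry very large standard errors:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x2 = rng.normal(size=n)

# Perfect collinearity: X'X is singular, so the normal equations
# have no unique solution -- the coefficients are indeterminate.
x3 = 5 * x2
X = np.column_stack([np.ones(n), x2, x3])
print(np.linalg.matrix_rank(X.T @ X))   # 2, not 3

# Near-perfect collinearity: estimates exist but are very imprecise.
x3 = 5 * x2 + rng.normal(scale=0.01, size=n)
X = np.column_stack([np.ones(n), x2, x3])
y = 1 + 2 * x2 + 3 * x3 + rng.normal(size=n)
b = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimates
resid = y - X @ b
sigma2 = resid @ resid / (n - 3)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
print(b)    # the slope estimates are determinate...
print(se)   # ...but their standard errors are enormous
```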
Sources of Multicollinearity
• There are several sources of multicollinearity.
• 1. The data collection method employed, for example, sampling over a
limited range of the values taken by the regressors in the population.
• 2. Constraints on the model or in the population being sampled. For
example, in the regression of electricity consumption on income (X2) and
house size (X3), families with higher incomes generally have larger houses,
so high X2 almost always goes together with high X3.
• 3. Model specification, for example, adding polynomial terms to a regression
model, especially when the range of the X variable is small.
• 4. An overdetermined model. This happens when the model has more
explanatory variables than the number of observations.
• An additional reason for multicollinearity, especially in time series data, may
be that the regressors included in the model share a common trend, that
is, they all increase or decrease over time.
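• The common-trend point is easy to demonstrate. In the Python sketch below, two entirely made-up series are related only through a shared time trend, yet their sample correlation is close to unity:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(100, dtype=float)

# Two regressors related only through a shared upward time trend
income = 50 + 0.5 * t + rng.normal(scale=2, size=100)
wealth = 200 + 2.0 * t + rng.normal(scale=8, size=100)

print(np.corrcoef(income, wealth)[0, 1])   # close to 1, driven by the trend alone
```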
ESTIMATION IN THE PRESENCE OF HIGH BUT IMPERFECT MULTICOLLINEARITY
• Generally, there is no exact linear relationship among the X variables. Thus, turning to the three-variable
model in the deviation form given in (10.2.1), instead of exact multicollinearity, we may have
• x3i = λx2i + vi (10.3.1)
• where λ ≠ 0 and where vi is a stochastic error term such that Σx2i vi = 0.
• In this case, estimation of the regression coefficients β2 and β3 may be possible. For example, substituting
(10.3.1) into (7.4.7), we obtain

• β̂2 = [(Σyi x2i)(λ²Σx2i² + Σvi²) − (λΣyi x2i + Σyi vi)(λΣx2i²)] / [(Σx2i²)(λ²Σx2i² + Σvi²) − (λΣx2i²)²] (10.3.2)

• where use is made of Σx2i vi = 0. A similar expression can be derived for β̂3.
• Now, unlike (10.2.2), there is no reason to believe a priori that (10.3.2) cannot be estimated. Of course, if vi is
sufficiently small, say, very close to zero, (10.3.1) will indicate almost perfect collinearity and we shall be
back to the indeterminate case of (10.2.2).
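• The substitution behind (10.3.2) can be verified numerically. The following Python sketch (simulated data; λ, the sample size, and the coefficient values are arbitrary choices) constructs x3 as in (10.3.1) with Σx2i vi = 0 imposed exactly, and confirms that the substituted expression reproduces the direct OLS estimate from (7.4.7):

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 100, 2.0

# Deviation form: all variables are measured from their sample means
x2 = rng.normal(size=n)
x2 -= x2.mean()
v = rng.normal(size=n)
v -= v.mean()
v -= (v @ x2) / (x2 @ x2) * x2   # enforce the condition sum(x2i*vi) = 0
x3 = lam * x2 + v                # Eq. (10.3.1)
y = 2 * x2 + 3 * x3 + rng.normal(size=n)
y -= y.mean()

# Direct two-regressor OLS estimate of beta2 from (7.4.7)
num = (y @ x2) * (x3 @ x3) - (y @ x3) * (x2 @ x3)
den = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2
b2_direct = num / den

# The same estimate via the substituted expression (10.3.2)
s22, sv2 = x2 @ x2, v @ v
num2 = (y @ x2) * (lam**2 * s22 + sv2) - (lam * (y @ x2) + y @ v) * (lam * s22)
den2 = s22 * (lam**2 * s22 + sv2) - (lam * s22) ** 2

print(b2_direct, num2 / den2)    # the two values coincide
```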
PRACTICAL CONSEQUENCES OF MULTICOLLINEARITY