Multicollinearity Among The Regressors Included in The Regression Model
X2i = − (λ1/λ2) X1i − (λ3/λ2) X3i − ... − (λk/λ2) Xki                (3)
which shows how X2 is exactly linearly related to other variables or how it can be derived
from a linear combination of other X variables. In this situation, the coefficient of
correlation between the variable X2 and the linear combination on the right side of (3) is
bound to be unity.
Similarly, if λ2 ≠ 0, Eq. (2) can be written as
X2i = − (λ1/λ2) X1i − (λ3/λ2) X3i − ... − (λk/λ2) Xki − (1/λ2) vi                (4)
which shows that X2 is not an exact linear combination of other X’s because it is also
determined by the stochastic error term vi.
As a numerical example, consider the following hypothetical data:

X2:    10    15    18    24    30
X3:    50    75    90    120   150
X3*:   52    75    97    129   152
It is apparent that X3i = 5X2i. Therefore, there is perfect collinearity between X2 and
X3 since the coefficient of correlation r23 is unity. The variable X*3 was created from X3 by
simply adding to it the numbers taken from a table of random numbers: 2, 0, 7, 9, 2. Now
there is no longer perfect collinearity between X2 and X*3. However, the two variables are
highly correlated because calculations will show that the coefficient of correlation between
them is 0.9959.
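These claims are easy to verify numerically. The following Python sketch (using the X2 values from the table above; numpy is assumed available) reproduces both correlation coefficients:

```python
# Quick check of the collinearity claims above, using the hypothetical data
# from the table: X3 = 5*X2, and X3* adds the random digits 2, 0, 7, 9, 2.
import numpy as np

X2 = np.array([10, 15, 18, 24, 30], dtype=float)
X3 = 5 * X2                                   # perfect collinearity
X3_star = X3 + np.array([2, 0, 7, 9, 2])      # near-perfect collinearity

print(f"r(X2, X3)  = {np.corrcoef(X2, X3)[0, 1]:.4f}")       # exactly 1.0
print(f"r(X2, X3*) = {np.corrcoef(X2, X3_star)[0, 1]:.4f}")  # about 0.9959
```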
Diagrammatically, multicollinearity can be portrayed using the following Ballentine (Venn diagram).
In this figure the circles Y, X2, and X3 represent, respectively, the variations in Y
(the dependent variable) and X2 and X3 (the explanatory variables). The degree of
collinearity can be measured by the extent of the overlap (shaded area) of the X2 and X3
circles. In Fig. (a) there is no overlap between X2 and X3, and hence no collinearity.
In Fig. (b) through (e) there is a “low” to “high” degree of collinearity—the greater the
overlap between X2 and X3 (shaded area), the higher the degree of collinearity.
In the extreme, if X2 and X3 were to overlap completely (or if X2 were completely
inside X3, or vice versa), collinearity would be perfect.
Note that multicollinearity, as we have defined it, refers only to linear relationships
among the X variables. It does not rule out nonlinear relationships among them. For
example, consider the following regression model:
Yi = β0 + β1Xi + β2Xi² + β3Xi³ + ui                (5)
where, say, Y = total cost of production and X = output. The variables Xi² (output squared)
and Xi³ (output cubed) are obviously functionally related to Xi, but the relationship is
nonlinear.
Therefore, models like (5) do not violate the assumption of no multicollinearity.
However, in concrete applications, the conventionally measured correlation coefficient
will show Xi, Xi², and Xi³ to be highly correlated, which will make it difficult to estimate
the parameters of (5) with great precision (i.e., with small standard errors).
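The point can be illustrated with a short Python sketch (the output values are hypothetical):

```python
# Output and its powers are nonlinearly related, yet their sample
# correlations are very high; the output levels below are hypothetical.
import numpy as np

X = np.linspace(1.0, 10.0, 20)   # hypothetical output levels
Xsq, Xcu = X**2, X**3            # output squared and cubed

print(np.corrcoef([X, Xsq, Xcu]))
# The off-diagonal correlations are all well above 0.9, which is what makes
# the parameters of model (5) hard to estimate precisely.
```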
The classical linear regression model assumes no multicollinearity among the X’s
because, if multicollinearity is perfect in the sense of (1), the regression coefficients of the
X variables are indeterminate and their standard errors are infinite. If multicollinearity is
less than perfect, as in (2), the regression coefficients, although determinate, possess large
standard errors (in relation to the coefficients themselves), which means the coefficients
cannot be estimated with great precision or accuracy.
Sources of multicollinearity [as given by Montgomery and Peck]:
1. The data collection method employed, for example, sampling over a limited range of the
values taken by the regressors in the population.
2. Constraints on the model or in the population being sampled. For example, in the regression
of electricity consumption on income (X2) and house size (X3) there is a physical constraint
in the population in that families with higher incomes generally have larger homes than
families with lower incomes.
3. Model specification, for example, adding polynomial terms to a regression model,
especially when the range of the X variable is small.
4. An over-determined model. This happens when the model has more explanatory
variables than the number of observations. This could happen in medical research
where there may be a small number of patients about whom information is collected
on a large number of variables.
5. In time series data, regressors included in the model share a common trend, i.e., they
all increase or decrease over time. Thus, in the regression of consumption expenditure
on income, wealth, and population, the regressors income, wealth, and population may
all be growing over time at more or less the same rate, leading to collinearity among
these variables.
Problem with Multicollinearity (MC)
Estimation of the regression coefficients becomes a problem: they cannot be estimated precisely, and under perfect collinearity they cannot be estimated at all.
In the case of perfect multicollinearity
Here the regression coefficients remain indeterminate and their standard errors are infinite.
This fact can be demonstrated in terms of the three-variable regression model. Using the deviation
form, where all the variables are expressed as deviations from their sample means, we can write
the three-variable regression model as

yi = ˆβ2 x2i + ˆβ3 x3i + ˆui

Assume that X3i = λX2i, where λ is a nonzero constant (e.g., 2, 4, 1.8, etc.). Substituting this
into the equation above, we obtain

yi = ˆβ2 x2i + ˆβ3 (λx2i) + ˆui = ( ˆβ2 + λ ˆβ3) x2i + ˆui = ˆα x2i + ˆui

where ˆα = ˆβ2 + λ ˆβ3. Applying the usual OLS formula to this equation gives

ˆα = ( ˆβ2 + λ ˆβ3) = Ʃx2i yi / Ʃx2i²

which is only one equation in two unknowns (note that λ is given), so there is an infinity of
solutions for ˆβ2 and ˆβ3 for given values of ˆα and λ.
To put this idea in concrete terms, let ˆα = 0.8 and λ = 2. Then we have

0.8 = ˆβ2 + 2 ˆβ3    or    ˆβ2 = 0.8 − 2 ˆβ3
Now, choose a value of ˆβ3 arbitrarily, and we will have a solution for ˆβ2. Choose another
value for ˆβ3, and we will have another solution for ˆβ2. No matter how hard we try, there is no
unique value for ˆβ2.
Thus in the case of perfect multicollinearity one cannot get a unique solution for the
individual regression coefficients. But notice that one can get a unique solution for linear
combinations of these coefficients. The linear combination (β2 + λβ3) is uniquely estimated by ˆα,
given the value of λ.
In the case of perfect multicollinearity the variances and standard errors of ˆβ2 and ˆβ3
individually are infinite.
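A minimal numerical sketch of this indeterminacy, using simulated data with λ = 2 (the true coefficients 1.0 and 0.5 are illustrative choices):

```python
# Under perfect collinearity X'X is singular, so beta2 and beta3 are not
# identified; only the combination alpha = beta2 + lambda*beta3 is estimable.
import numpy as np

rng = np.random.default_rng(0)
X2 = rng.normal(size=50)
X3 = 2.0 * X2                           # perfect collinearity: lambda = 2
Y = 1.0 * X2 + 0.5 * X3 + rng.normal(scale=0.1, size=50)

# deviation (mean-centered) form, as in the text
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()

X = np.column_stack([x2, x3])
print(np.linalg.matrix_rank(X.T @ X))   # rank 1: X'X is singular

# but alpha = beta2 + lambda*beta3 is uniquely estimable:
alpha_hat = (x2 @ y) / (x2 @ x2)
print(alpha_hat)                        # close to the true value 1.0 + 2*0.5 = 2.0
```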
Estimation in the Presence of “High” but “Imperfect” Multicollinearity
The perfect multicollinearity situation is a pathological extreme. Generally, there is no
exact linear relationship among the X variables, especially in data involving economic time series.
Thus, turning to the three-variable model in the deviation form, instead of exact multicollinearity,
we may have

x3i = λx2i + vi

where λ ≠ 0 and vi is a stochastic error term such that Ʃx2ivi = 0. Incidentally, the Ballentines
shown in Figs. (b) to (e) earlier represent cases of imperfect collinearity.

In this case, estimation of the regression coefficients β2 and β3 may be possible.
For example, substituting x3i = λx2i + vi into the usual OLS formula

ˆβ2 = [(Ʃyix2i)(Ʃx3i²) − (Ʃyix3i)(Ʃx2ix3i)] / [(Ʃx2i²)(Ʃx3i²) − (Ʃx2ix3i)²]

we obtain

ˆβ2 = [(Ʃyix2i)(λ²Ʃx2i² + Ʃvi²) − (λƩyix2i + Ʃyivi)(λƩx2i²)] / [(Ʃx2i²)(λ²Ʃx2i² + Ʃvi²) − (λƩx2i²)²]

where use is made of Ʃx2ivi = 0. A similar expression can be derived for ˆβ3. Unlike the
perfect-collinearity case, there is no reason to believe a priori that this expression cannot be estimated.
Theoretical Consequences of Multicollinearity
Even if multicollinearity is very high, as in the case of near multicollinearity, the OLS
estimators still retain the property of BLUE.
The only effect of multicollinearity is to make it hard to obtain coefficient estimates with
small standard errors. The multicollinearity problem has largely to do with the sample (and the
sample size), not with the population. If the sample size is small, the standard errors, and hence
the variances, of the estimates tend to be large.
Practical Consequences of High Multicollinearity
1. Although BLUE, the OLS estimators have large variances and covariances, making precise
estimation difficult.
2. Because of consequence 1, the confidence intervals tend to be much wider, leading to the
acceptance of the “zero null hypothesis” (i.e., the true population coefficient is zero) more readily.
3. Also because of consequence 1, the t ratio of one or more coefficients tends to be statistically
insignificant.
4. Although the t ratio of one or more coefficients is statistically insignificant, R2, the overall
measure of goodness of fit, can be very high.
5. The OLS estimators and their standard errors can be sensitive to small changes in the data.
Demonstration of consequences
Large Variances and Covariances of OLS Estimators
To see the large variances and covariances, recall that for the three-variable model the
variances and covariances of ˆβ2 and ˆβ3 are given by

var ( ˆβ2) = σ² / [Ʃx2i² (1 − r23²)]
var ( ˆβ3) = σ² / [Ʃx3i² (1 − r23²)]
cov ( ˆβ2, ˆβ3) = − r23 σ² / [(1 − r23²) √(Ʃx2i²) √(Ʃx3i²)]

where r23 is the coefficient of correlation between X2 and X3. It is apparent from the first two
equations that as r23 tends toward 1, that is, as collinearity increases, the variances of the two
estimators increase, and in the limit when r23 = 1 they are infinite. It is equally clear from the third
equation that as r23 increases toward 1, the covariance of the two estimators also increases in
absolute value. [Note: cov ( ˆβ2, ˆβ3) ≡ cov ( ˆβ3, ˆβ2).]
If r23² = 1, the variances and the standard errors are infinite, and the covariance also increases
without bound.
Variance-Inflating Factor (VIF)
The speed with which variances and covariances increase can be seen with the
variance-inflating factor (VIF), defined as

VIF = 1 / (1 − r23²)

so that var ( ˆβ2) = σ²/Ʃx2i² × VIF and var ( ˆβ3) = σ²/Ʃx3i² × VIF. The VIF shows how the
variance of an estimator is inflated by the presence of multicollinearity, and it is also a
convenient device for detecting it.

As r23² approaches 1, the VIF approaches infinity: as the extent of collinearity increases,
the variance of an estimator increases, and in the limit it becomes infinite. If there is no
collinearity between X2 and X3, the VIF equals 1.

Thus the variances of ˆβ2 and ˆβ3 are directly proportional to the VIF and inversely
proportional to the sums of squared deviations Ʃx2i² and Ʃx3i².
The inverse of the VIF is called tolerance (TOL).
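A small Python sketch of this arithmetic for the two-regressor case (the r23 values below are illustrative):

```python
def vif(r23):
    """Variance-inflating factor for the two-regressor case: 1 / (1 - r23^2)."""
    return 1.0 / (1.0 - r23 ** 2)

for r23 in (0.0, 0.5, 0.9, 0.95, 0.995):
    # var(beta2_hat) = sigma^2 / sum(x2i^2) * VIF, so the VIF is the factor by
    # which the variance is inflated relative to the no-collinearity case.
    print(f"r23 = {r23:5.3f}   VIF = {vif(r23):8.2f}   TOL = {1 / vif(r23):.4f}")
```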
To get an idea of how fast the variances and covariances increase as r23 increases, consider
the following illustrative values, computed from the formulas above with A = σ²/Ʃx2i² and
B = σ²/(√Ʃx2i² √Ʃx3i²):

r23      VIF        var ( ˆβ2)       |cov ( ˆβ2, ˆβ3)|
0.00     1.00       A                0
0.50     1.33       1.33 × A         0.67 × B
0.70     1.96       1.96 × A         1.37 × B
0.80     2.78       2.78 × A         2.22 × B
0.90     5.26       5.26 × A         4.73 × B
0.95     10.26      10.26 × A        9.74 × B
0.97     16.92      16.92 × A        16.41 × B
0.99     50.25      50.25 × A        49.75 × B
0.995    100.25     100.25 × A       99.75 × B
0.999    500.25     500.25 × A       499.75 × B

The increases in r23 have a dramatic effect on the estimated variances and covariances of the OLS
estimators. When r23 = 0.50, var ( ˆβ2) is 1.33 times the variance when r23 is zero; by the time
r23 reaches 0.95 it is about 10 times as high as when there is no collinearity; and an increase of
r23 from 0.95 to 0.995 makes the estimated variance 100 times that when collinearity is zero. The
same dramatic effect is seen in the estimated covariance.
Detection of Multicollinearity
Having studied the nature and consequences of multicollinearity, the natural question is: how does
one detect the presence of collinearity in any given situation, especially in models involving more
than two explanatory variables? Here it is worth bearing in mind Kmenta's warning:
Multicollinearity is a question of degree and not of kind. The meaningful distinction is not
between the presence and the absence of multicollinearity, but between its various degrees.
Since multicollinearity refers to the condition of the explanatory variables that are assumed to
be non-stochastic, it is a feature of the sample and not of the population. Therefore, we do not
“test for multicollinearity” but can, if we wish, measure its degree in any particular sample.
Since multicollinearity is essentially a sample phenomenon, arising out of the largely non-
experimental data collected in the social sciences, there is no unique method of detecting it or
measuring its strength. Instead, there are some rules of thumb, some informal and some formal:
1. High R2 but few significant t ratios: This is the “classic” symptom of multicollinearity. If R2 is
high, say, in excess of 0.8, the F test in most cases will reject the hypothesis that the partial
slope coefficients are simultaneously equal to zero, but the individual t tests will show that none
or very few of the partial slope coefficients are statistically different from zero.
Although this diagnostic is sensible, its disadvantage is that “it is too strong in the sense that
multicollinearity is considered as harmful only when all of the influences of the explanatory
variables on Y cannot be disentangled.”
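This symptom is easy to reproduce in a small simulation; the sketch below uses hypothetical data with two nearly identical regressors and computes R² and the t ratios directly with numpy:

```python
# Simulated illustration (hypothetical data): x2 and x3 are nearly identical,
# so the regression has a high R^2 while the individual slope t ratios are small.
import numpy as np

rng = np.random.default_rng(42)
n = 30
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.01, size=n)          # near-perfect collinearity
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x2, x3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
print(f"R^2 = {r2:.3f}")                           # high
print("t ratios (const, x2, x3):", np.round(beta / se, 2))
# With this much collinearity the slope t ratios are typically insignificant
# even though the regression as a whole fits well.
```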
2. High pair-wise correlations among regressors: If the pair-wise or zero-order correlation
coefficient between two regressors is high, say, in excess of 0.8, then multicollinearity is a
serious problem. The problem with this criterion is that, although high zero-order correlations
may suggest collinearity, it is not necessary that they be high to have collinearity in any specific
case. To put the matter somewhat technically, high zero-order correlations are a sufficient but
not a necessary condition for the existence of multicollinearity because it can exist even though
the zero-order or simple correlations are comparatively low (say, less than 0.50).
Suppose, for example, that in a four-variable model

X4i = λ2X2i + λ3X3i

where λ2 and λ3 are constants, not both zero. Obviously, X4 is an exact linear combination of X2 and
X3, giving R²4.23 = 1, the coefficient of determination in the regression of X4 on X2 and X3.
Recalling the formula

R²4.23 = (r²42 + r²43 − 2 r42 r43 r23) / (1 − r²23)

with r42 = 0.5, r43 = 0.5, and r23 = −0.5, which are not very high values, we get R²4.23 = 1.
Therefore, in models involving more than two explanatory variables, the simple or zero-
order correlation will not provide an infallible guide to the presence of multicollinearity. Of
course, if there are only two explanatory variables, the zero-order correlations will suffice.
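The arithmetic of this example can be checked directly from the formula above:

```python
# Check of the example: modest pairwise correlations, yet R^2(4.23) = 1.
r42, r43, r23 = 0.5, 0.5, -0.5
R2_4_23 = (r42**2 + r43**2 - 2 * r42 * r43 * r23) / (1 - r23**2)
print(R2_4_23)   # 1.0: X4 is an exact linear combination of X2 and X3
```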
3. Examination of partial correlations: Farrar and Glauber have suggested that one should look
at the partial correlation coefficients. In the regression of Y on X2, X3, and X4, a finding that
R²1.234 is very high but r²12.34, r²13.24, and r²14.23 are comparatively low may suggest that
the variables X2, X3, and X4 are highly inter-correlated and that at least one of these variables is
superfluous.
In other words, if the partial correlations between the dependent variable and the individual
regressors are low while the overall R² is high, the presence of multicollinearity may be suspected.
4. Auxiliary regressions: Since multicollinearity arises because one or more of the regressors are
exact or approximately linear combinations of the other regressors, one way of finding out
which X variable is related to other X variables is to regress each Xi on the remaining X
variables and compute the corresponding R2, which we designate as R2i; each one of these
regressions is called an auxiliary regression, auxiliary to the main regression of Y on the X’s.
Then, following the relationship between F and R², the variable

Fi = [R²i / (k − 2)] / [(1 − R²i) / (n − k + 1)]

follows the F distribution with (k − 2) and (n − k + 1) df, where n is the sample size, k is the
number of explanatory variables including the intercept term, and R²i is the coefficient of
determination in the regression of Xi on the remaining X variables. If the computed F exceeds the
critical F at the chosen level of significance, the particular Xi is taken to be collinear with the
other X's.
5. Eigenvalues and condition index: From the eigenvalues of the X'X matrix one can obtain the
condition number k, defined as the ratio of the maximum eigenvalue to the minimum eigenvalue (this
k is unrelated to the number of regressors above), and the condition index CI = √k.
Rule of thumb:
If k is between 100 and 1000 there is moderate to strong multicollinearity, and if it exceeds
1000 there is severe multicollinearity.
Alternatively, if the CI (= √k) is between 10 and 30, there is moderate to strong
multicollinearity, and if it exceeds 30 there is severe multicollinearity.
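A minimal sketch of this diagnostic on simulated data (the regressors and their degree of collinearity are illustrative; in practice the columns of X are often rescaled before the eigenvalues are computed):

```python
# Condition number k = max eigenvalue / min eigenvalue of X'X, and CI = sqrt(k),
# on simulated (hypothetical) data with two highly correlated regressors.
import numpy as np

rng = np.random.default_rng(1)
n = 100
x2 = rng.normal(size=n)
x3 = 0.95 * x2 + rng.normal(scale=0.05, size=n)   # nearly collinear with x2
X = np.column_stack([np.ones(n), x2, x3])

eig = np.linalg.eigvalsh(X.T @ X)                 # eigenvalues of X'X
k = eig.max() / eig.min()
print(f"condition number k = {k:.0f}, condition index CI = {np.sqrt(k):.1f}")
# Compare k and CI with the rule of thumb given above.
```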
6. Tolerance and variance inflation factor: As R2j, increases towards unity, that is, as the
collinearity of Xj with the other regressors increases, VIF also increases. The larger the value of
VIFj, the more “troublesome” or collinear the variable Xj.
As a rule of thumb, if the VIF of a variable exceeds 10, which will happen if R²j exceeds 0.90,
that variable is said to be highly collinear. The closer TOLj (the inverse of VIFj) is to zero, the
greater the degree of collinearity of that variable with the other regressors, while the closer
TOLj is to 1, the greater the evidence that Xj is not collinear with the other regressors.
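The auxiliary-regression route to VIFj and TOLj can be sketched as follows, again on simulated data (the variable names and degree of collinearity are illustrative):

```python
# VIF_j and TOL_j for each regressor via auxiliary regressions
# (regress X_j on the other X's); the data are simulated.
import numpy as np

def vifs(X):
    """VIF for each column of X (X should not contain a constant column)."""
    n, p = X.shape
    out = []
    for j in range(p):
        xj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ coef
        r2_j = 1.0 - (resid @ resid) / np.sum((xj - xj.mean()) ** 2)
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)

rng = np.random.default_rng(3)
n = 200
x2 = rng.normal(size=n)
x3 = 0.9 * x2 + rng.normal(scale=0.2, size=n)   # collinear with x2
x4 = rng.normal(size=n)                          # unrelated regressor
X = np.column_stack([x2, x3, x4])

v = vifs(X)
print("VIF:", np.round(v, 2), "  TOL:", np.round(1 / v, 3))
# x2 and x3 typically show VIFs well above the rule-of-thumb value of 10,
# while x4's VIF stays near 1.
```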
According to Goldberger, exact micronumerosity (the counterpart of exact
multicollinearity) arises when n, the sample size, is zero, in which case any kind of estimation is
impossible. Near micronumerosity, like near multicollinearity, arises when the number of
observations barely exceeds the number of parameters to be estimated.
Remedial Measures
What can be done if multicollinearity is serious? We have two choices: (1) do nothing, or
(2) follow some rules of thumb.
The “do nothing” school of thought is expressed by Blanchard as follows:
When students run their first ordinary least squares (OLS) regression, the first problem
that they usually encounter is that of multicollinearity. Many of them conclude that there is
something wrong with OLS; some resort to new and often creative techniques to get around the
problem. But, we tell them, this is wrong. Multicollinearity is God’s will, not a problem with OLS
or statistical technique in general.
What Blanchard is saying is that multicollinearity is essentially a data deficiency problem
(micronumerosity, again) and sometimes we have no choice over the data we have available for
empirical analysis. Also, it is not that all the coefficients in a regression model are statistically
insignificant. Moreover, even if we cannot estimate one or more regression coefficients with
greater precision, a linear combination of them (i.e., estimable function) can be estimated
relatively efficiently.
Rule-of-Thumb Procedures
One can try rules of thumb to address the problem of multicollinearity, the success
depending on the severity of the collinearity problem.
1. A priori information: Suppose we consider the model

Yi = β1 + β2X2i + β3X3i + ui

where Y = consumption, X2 = income, and X3 = wealth. As noted, the income and wealth variables tend
to be highly collinear. But suppose a priori we believe that β3 = 0.10β2; that is, the rate of change of
consumption with respect to wealth is one-tenth the corresponding rate with respect to income. We
can then run the following regression:

Yi = β1 + β2X2i + 0.10 β2X3i + ui = β1 + β2Xi + ui

where Xi = X2i + 0.1X3i. Once we obtain ˆβ2, we can estimate ˆβ3 from the postulated relationship
between β2 and β3.
The a priori information could come from previous empirical work in which the
collinearity problem happens to be less serious or from the relevant theory underlying the field of
study.
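A small numerical sketch of this device, using simulated consumption data whose true coefficients happen to satisfy the restriction β3 = 0.1β2 (all numbers are illustrative):

```python
# Imposing the a priori restriction beta3 = 0.1*beta2 on simulated data:
# regress Y on the single combined regressor X = X2 + 0.1*X3.
import numpy as np

rng = np.random.default_rng(7)
n = 50
income = rng.normal(100, 10, size=n)
wealth = 10 * income + rng.normal(0, 20, size=n)            # highly collinear with income
consumption = 5 + 0.8 * income + 0.08 * wealth + rng.normal(0, 2, size=n)

X = np.column_stack([np.ones(n), income + 0.1 * wealth])    # restricted regressor
(b1, b2), *_ = np.linalg.lstsq(X, consumption, rcond=None)
b3 = 0.10 * b2                      # recovered from the postulated relationship
print(f"beta2_hat = {b2:.3f}, beta3_hat = {b3:.3f}")   # near 0.8 and 0.08
```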
2. Combining cross-sectional and time series data:
A variant of the extraneous or a priori information technique is the combination of cross-
sectional and time series data, known as pooling the data. Suppose we want to study the demand
for automobiles in the United States and assume we have time series data on the number of cars
sold, average price of the car, and consumer income. Suppose also that

ln Yt = β1 + β2 ln Pt + β3 ln It + ut

where Y = number of cars sold, P = average price, I = income, and t = time. In time series data the
price and income variables generally tend to be highly collinear, so the model cannot be estimated
precisely from the time series alone. Suppose, however, that we also have cross-sectional data from
which a comparatively reliable estimate of the income elasticity β3, say ˆβ3, can be obtained. Using
this estimate, the time series regression can be written as

Y*t = β1 + β2 ln Pt + ut

where Y* = ln Y − ˆβ3 ln I, that is, Y* represents that value of Y after removing from it the effect of
income. We can now obtain an estimate of the price elasticity β2 from the preceding regression.
3. Dropping a variable(s) and specification bias:
When faced with severe multicollinearity, one of the “simplest” things to do is to drop one
of the collinear variables, so that a previously insignificant variable may become significant (or vice versa).
But in dropping a variable from the model we may be committing a specification bias or
specification error. Specification bias arises from incorrect specification of the model used in the
analysis.
4. Transformation of variables: Suppose we have time series data on consumption expenditure,
income, and wealth. One reason for high multicollinearity between income and wealth in such
data is that over time both the variables tend to move in the same direction. One way of
minimizing this dependence is to proceed as follows
If the relation

Yt = β1 + β2X2t + β3X3t + ut

holds at time t, it must also hold at time t − 1 because the origin of time is arbitrary anyway.
Therefore, we have

Yt−1 = β1 + β2X2,t−1 + β3X3,t−1 + ut−1

Subtracting the second equation from the first, we obtain

Yt − Yt−1 = β2 (X2t − X2,t−1) + β3 (X3t − X3,t−1) + vt

where vt = ut − ut−1. This equation is known as the first difference form because we run the
regression, not on the original variables, but on the differences of successive values of the
variables.
The first difference regression model often reduces the severity of multicollinearity because,
although the levels of X2 and X3 may be highly correlated, there is no a priori reason to believe
that their differences will also be highly correlated.
An incidental advantage of the first-difference transformation is that it may make a
nonstationary time series stationary. Loosely speaking, a time series, say, Yt, is stationary if its
mean and variance do not change systematically over time.
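A quick simulated illustration of why differencing helps (the trends and noise levels are arbitrary):

```python
# Two trending series are almost perfectly correlated in levels, but their
# first differences are not; all numbers below are arbitrary.
import numpy as np

rng = np.random.default_rng(11)
t = np.arange(100)
income = 100 + 2.0 * t + rng.normal(scale=5, size=t.size)    # trending series
wealth = 500 + 9.0 * t + rng.normal(scale=20, size=t.size)   # trending series

print(np.corrcoef(income, wealth)[0, 1])                     # close to 1 (levels)
print(np.corrcoef(np.diff(income), np.diff(wealth))[0, 1])   # much lower (differences)
```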
Another commonly used transformation in practice is the ratio transformation.
Consider the model:

Yt = β1 + β2X2t + β3X3t + ut

where Y is consumption expenditure in real dollars, X2 is GDP, and X3 is total population. Since
GDP and population grow over time, they are likely to be correlated. One “solution” to this
problem is to express the model on a per capita basis, that is, to divide through by X3, obtaining:

Yt / X3t = β1 (1/X3t) + β2 (X2t/X3t) + β3 + (ut/X3t)
5. Additional or new data:
Since multicollinearity is a sample feature, it is possible that in another sample involving
the same variables collinearity may not be so serious as in the first sample. Sometimes simply
increasing the size of the sample (if possible) may attenuate the collinearity problem.
For example, in the three-variable model

Yi = β1 + β2X2i + β3X3i + ui

we saw earlier that var ( ˆβ2) = σ² / [Ʃx2i² (1 − r23²)]. As the sample size increases, Ʃx2i² will
generally increase, so, for any given r23, the variance of ˆβ2 will decrease, thus decreasing the
standard error and enabling us to estimate β2 more precisely.
Summary

1. One of the assumptions of the classical linear regression model is that there is no
multicollinearity among the explanatory variables. Broadly interpreted, multicollinearity refers to
a situation where there is an exact or an approximately exact linear relationship among the X variables.
2. If there is perfect collinearity among the X's, their regression coefficients are indeterminate
and their standard errors are not defined. If collinearity is high but not perfect, the coefficients
can be estimated, but their standard errors tend to be large, so the population values of the
coefficients cannot be estimated precisely; linear combinations of the coefficients (estimable
functions), however, can still be estimated.
3. Although there are no sure methods of detecting collinearity, there are several indicators of it:
a. The clearest sign is a very high R2 with few, if any, statistically significant t ratios.
b. In models with just two explanatory variables, the zero-order (simple) correlation coefficient
between them gives a fairly good idea of collinearity.
c. In models with more than two X variables, zero-order correlations can be misleading, since low
zero-order correlations can coexist with high multicollinearity; in such cases one may examine the
partial correlation coefficients.
d. If R2 is high but the partial correlations are low, multicollinearity is a possibility and one or
more variables may be superfluous; but if R2 is high and the partial correlations are also high,
multicollinearity may not be readily detectable. Also, as pointed out by C. Robert Wichers,
Krishna Kumar, John O'Hagan, and Brendan McCabe, there are some statistical problems with the
partial correlation test suggested by Farrar and Glauber.
e. Therefore, one may regress each of the Xi variables on the remaining X variables in the
model and find out the corresponding coefficients of determination R2i. A high R2i
would suggest that Xi is highly correlated with the rest of the X’s. Thus, one may drop
that Xi from the model, provided it does not lead to serious specification bias.
4. Detection of multicollinearity is half the battle. The other half is concerned with how to get rid
of the problem. Again there are no sure methods, only a few rules of thumb. Some of these
rules are:
a. using extraneous or prior information,
b. combining cross-sectional and time series data,
c. omitting a highly collinear variable,
d. transforming data, and
e. obtaining additional or new data.
Which of these rules will work in practice will depend on the nature of the data and
severity of the collinearity problem.
5. We noted the role of multicollinearity in prediction and pointed out that unless the collinearity
structure continues in the future sample it is hazardous to use the estimated regression that has
been plagued by multicollinearity for the purpose of forecasting.
6. Although multicollinearity has received extensive (some would say excessive) attention in the
literature, an equally important problem encountered in empirical research is that of
micronumerosity, smallness of sample size. According to Goldberger, “When a research
article complains about multicollinearity, readers ought to see whether the complaints would
be convincing if 'micronumerosity' were substituted for 'multicollinearity.'” He suggests that
one ought to decide how small n, the number of observations, is before deciding that one
has a small-sample problem, just as one decides how high an R2 value is in an auxiliary
regression before declaring that the collinearity problem is very severe.