Multicollinearity: Nature of Multicollinearity
Types of Multicollinearity:
There are two types of multicollinearity:
1. Exact/perfect multicollinearity.
2. Near (less than perfect) multicollinearity.
Exact multicollinearity: If a perfect linear relationship exists among the explanatory
variables, it is called exact multicollinearity. In this case the design (data) matrix X is
not of full rank, and consequently (X′X)⁻¹ does not exist; that is, |X′X| = 0.
Example: For the k-variable regression model involving the explanatory variables
X1, X2, …, Xk (where X1 = 1 for all observations to allow for the intercept term), an exact
linear relationship is said to exist if the following condition is satisfied:
λ1X1 + λ2X2 + … + λkXk = 0,
where λ1, λ2, …, λk are constants such that not all of them are zero simultaneously.
Assuming λ1 ≠ 0, the equation can be written as
X1 = -(λ2/λ1)X2 - (λ3/λ1)X3 - … - (λk/λ1)Xk,
which shows that X1 is exactly linearly related to the other explanatory variables.
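As a minimal illustration, the sketch below (using made-up numbers) builds a design matrix whose third column is an exact linear combination of the others, and confirms that X is rank-deficient and X′X singular:

```python
import numpy as np

# Illustrative sketch: the third column is an exact linear combination
# of the first two (X3 = 2*X1 + 3*X2), so X is not of full column rank.
# The data values are made up for illustration.
X1 = np.ones(5)                      # intercept column
X2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X3 = 2 * X1 + 3 * X2                 # exact linear dependence
X = np.column_stack([X1, X2, X3])

XtX = X.T @ X
print(np.linalg.det(XtX))            # ~0 (up to floating-point error)
print(np.linalg.matrix_rank(X))      # 2, not 3: X is rank-deficient
# np.linalg.inv(XtX) would raise LinAlgError or return numerically
# meaningless values, matching |X'X| = 0 above.
```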
Near multicollinearity: If the explanatory variables are strongly (but not perfectly)
correlated, it is called near multicollinearity. In this case (X′X)⁻¹ exists, but its
diagonal elements are relatively large; that is, |X′X| ≠ 0 but is close to zero.
Example: When the explanatory variables are intercorrelated but not perfectly so, a
near-linear relationship is said to exist if
λ1X1 + λ2X2 + … + λkXk + vi = 0,
where vi is a stochastic error term and the λ's are constants such that not all of them are
zero simultaneously.
Assuming λ1 ≠ 0, the equation can be written as
X1 = -(λ2/λ1)X2 - (λ3/λ1)X3 - … - (λk/λ1)Xk - vi/λ1,
which shows that X1 is not exactly linearly related to the other explanatory variables.
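A companion sketch (again with made-up numbers; a small noise term plays the role of the stochastic error vi above) shows the near case: (X′X)⁻¹ exists, but its diagonal elements are inflated:

```python
import numpy as np

# Illustrative sketch: X3 is almost, but not exactly, a linear function
# of X2. All numbers are made up for illustration.
rng = np.random.default_rng(0)
n = 100
X1 = np.ones(n)
X2 = rng.normal(size=n)
X3 = 3 * X2 + rng.normal(scale=0.01, size=n)   # near-perfect dependence
X = np.column_stack([X1, X2, X3])

XtX = X.T @ X
print(np.linalg.det(XtX))        # small but nonzero: |X'X| != 0
XtX_inv = np.linalg.inv(XtX)     # exists, unlike the exact case
print(np.diag(XtX_inv))          # large diagonal entries -> inflated variances
```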
Sources of multicollinearity: Multicollinearity may arise from several sources.
1. The data collection method employed, for example, sampling over a limited range
of the values taken by the regressors in the population.
2. Constraints on the model or in the population being sampled. For example, in the
regression of electricity consumption on income (X2) and house size (X3) there is a
physical constraint in the population, in that families with higher incomes generally
have larger homes than families with lower incomes.
3. Model specifications, for example, adding polynomial terms to a regression
model, especially when the range of the X variable is small.
4. An overdetermined model. This happens when the model has more explanatory
variables than the number of observations. This could happen in medical research
where there may be a small number of patients about whom information is
collected on a large number of variables.
Detection of multicollinearity:
The indicators for detecting multicollinearity are as follows:
1. Eigenvalues and condition index: Form X′X from the data matrix and solve the
characteristic equation |X′X − λI| = 0; the roots λ are the eigenvalues. The condition
index is then
CI = √(maximum eigenvalue / minimum eigenvalue).
As a rule of thumb, if CI lies between 10 and 30 there is moderate multicollinearity,
and if CI exceeds 30 there is severe multicollinearity.
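The following sketch (with a made-up, nearly collinear data matrix) computes the condition index exactly as defined above:

```python
import numpy as np

# Sketch: condition index from the eigenvalues of X'X.
# The data matrix is made up; X3 is nearly collinear with X2.
rng = np.random.default_rng(0)
n = 100
X2 = rng.normal(size=n)
X3 = 3 * X2 + rng.normal(scale=0.05, size=n)
X = np.column_stack([np.ones(n), X2, X3])

eigvals = np.linalg.eigvalsh(X.T @ X)        # eigenvalues of symmetric X'X
ci = np.sqrt(eigvals.max() / eigvals.min())  # CI = sqrt(lambda_max / lambda_min)
print(ci)  # 10-30: moderate; > 30: severe (rule of thumb from the text)
```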
2. High R2 but few significant t-ratios: This is the classic symptom of multicollinearity.
If R2 is high, say in excess of 0.8, the F-test will in most cases reject the hypothesis H0
that the partial slope coefficients are simultaneously equal to zero, yet the individual
t-tests will show that none, or very few, of the partial slope coefficients are statistically
different from zero. In short, multicollinearity is suspected when R2 is very high but
none of the regression coefficients is individually significant.
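A small simulation, with arbitrary coefficient values and noise scales chosen only for illustration, reproduces this symptom:

```python
import numpy as np
import statsmodels.api as sm

# Sketch of the classic symptom: y depends on two almost-collinear
# regressors. All parameter values here are made up for illustration.
rng = np.random.default_rng(1)
n = 50
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.05, size=n)     # x3 nearly duplicates x2
y = 1 + x2 + x3 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x2, x3]))
res = sm.OLS(y, X).fit()
print(res.rsquared)        # high R^2
print(res.f_pvalue)        # overall F-test strongly rejects H0
print(res.pvalues[1:])     # yet individual t-tests are often insignificant
```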
3. High pairwise correlations among regressors: Another suggested rule of thumb is that
if a pairwise (zero-order) correlation coefficient between two regressors is high,
say in excess of 0.8, then multicollinearity is a serious problem.
4. Examination of partial correlations:
If R2 is high but the partial correlations are comparatively low, this may suggest that
the explanatory variables are highly correlated.
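The sketch below (made-up data: three regressors driven by one common factor) contrasts the zero-order correlation matrix with the partial correlations, computed here from the inverse of the correlation matrix:

```python
import numpy as np

# Sketch: zero-order vs. partial correlations. Partial correlations are
# obtained from P = R^{-1} via r_ij.rest = -P[i,j] / sqrt(P[i,i]*P[j,j]).
# The data are made up for illustration.
rng = np.random.default_rng(2)
n = 200
z = rng.normal(size=n)                    # common driver of all regressors
W = np.column_stack([z + rng.normal(scale=0.3, size=n) for _ in range(3)])

R = np.corrcoef(W, rowvar=False)          # zero-order correlation matrix
P = np.linalg.inv(R)
D = np.sqrt(np.outer(np.diag(P), np.diag(P)))
partial = -P / D
np.fill_diagonal(partial, 1.0)
print(R.round(2))        # high pairwise correlations
print(partial.round(2))  # partial correlations are noticeably lower
```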
5. Tolerance and variance inflation factor: The speed with which variances and
covariances increase can be seen with the variance inflation factor (VIF), which for the
two-regressor case is defined as
VIF = 1 / (1 − r23²),
where r23 is the correlation coefficient between the regressors X2 and X3. The VIF
shows how the variance of an estimator is inflated by the presence of multicollinearity.
Its reciprocal, TOL = 1/VIF = 1 − r23², is called the tolerance; a tolerance close to
zero signals strong collinearity.
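As a minimal sketch of this definition, the following computes the VIF for one regressor through its auxiliary regression on the other (data made up for illustration):

```python
import numpy as np
import statsmodels.api as sm

# Sketch: VIF for regressor X2 via the auxiliary regression of X2 on the
# remaining regressor, so that VIF = 1 / (1 - R2_aux). Data are made up.
rng = np.random.default_rng(3)
n = 100
x3 = rng.normal(size=n)
x2 = 0.9 * x3 + rng.normal(scale=0.5, size=n)   # x2 correlated with x3

aux = sm.OLS(x2, sm.add_constant(x3)).fit()     # auxiliary regression
vif = 1.0 / (1.0 - aux.rsquared)
print(vif)   # VIF > 10 is a common rule of thumb for serious collinearity

# statsmodels also ships a helper with the same logic:
# from statsmodels.stats.outliers_influence import variance_inflation_factor
```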
6. Low value of |X′X|: in the case of exact multicollinearity, |X′X| = 0.
Remedial Measures:
If multicollinearity has serious effects on the coefficient estimates of important factors,
one should adopt one of the following solutions:
1. Model specification:
Multicollinearity may be overcome by re-specifying the model. This can be done in the
following ways:
a) One approach is to redefine the regressors.
b) Another is to re-specify lagged variables or other explanatory variables, for
example within a distributed-lag model.