
Multicollinearity

Nature of Multicollinearity: The term multicollinearity was introduced into economic analysis by the economist Ragnar Frisch. Multicollinearity refers to the existence of a perfect or exact linear relationship among some or all of the explanatory variables of a regression model.
For the k-variable regression involving the explanatory variables X1, X2, X3, …, Xk, such a linear relationship has the form
α1X1 + α2X2 + … + αkXk = 0
Example: In the demand function of a commodity, suppose the quantity demanded of commodity A depends on its own price and on the price of commodity B. If the two prices are correlated with each other, it will be difficult to separate the influence of each price on the demand for the commodity. Such a problem is known as the multicollinearity problem.

Types of Multicollinearity:
There are two types of multicollinearity. They are:
1. Exact/perfect multicollinearity.
2. Near (less than perfect) multicollinearity.
Exact Multicollinearity: If a perfect linear relationship exists among the explanatory variables, it is treated as exact multicollinearity. In the case of exact multicollinearity the design (data) matrix X is not of full rank, and consequently (X′X)⁻¹ does not exist. In this case |X′X| = 0.
Example: For the k-variable regression model involving the explanatory variables X1, X2, …, Xk (where X1 = 1 for all observations to allow for the intercept term), an exact linear relationship is said to exist if the following condition is satisfied:
λ1X1 + λ2X2 + … + λkXk = 0
where λ1, λ2, …, λk are constants such that not all of them are zero simultaneously.
Assume that λ1 ≠ 0; then the equation can be written as
X1 = −(λ2/λ1)X2 − (λ3/λ1)X3 − … − (λk/λ1)Xk
which shows that X1 is exactly linearly related to the other explanatory variables (X's).
Near Multicollinearity: If the explanatory variables (X's) are strongly, but not perfectly, correlated, it is called near multicollinearity. In this case (X′X)⁻¹ exists but has relatively large diagonal elements, and |X′X| ≠ 0 (though it is close to zero).
Example: When the explanatory variables (X's) are intercorrelated but not perfectly, a near linear relationship is said to exist if
λ1X1 + λ2X2 + … + λkXk + vi = 0
where vi is a stochastic error term and the λ's are constants such that not all of them are zero simultaneously.
Assume that λ1 ≠ 0; then the equation can be written as
X1 = −(λ2/λ1)X2 − (λ3/λ1)X3 − … − (λk/λ1)Xk − vi/λ1
which shows that X1 is not exactly linearly related to the other explanatory variables.
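To make the distinction concrete, here is a minimal numerical sketch (not part of the original notes; the data are simulated and the variable names are illustrative), showing that an exact linear relation among the regressors makes |X′X| vanish, while a near relation leaves X′X invertible but nearly singular.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x2 = rng.normal(size=n)

# Exact multicollinearity: the third column is an exact linear combination of the second.
X_exact = np.column_stack([np.ones(n), x2, 2.0 * x2])
print("det(X'X), exact case:", np.linalg.det(X_exact.T @ X_exact))  # ~0, so (X'X)^-1 does not exist

# Near multicollinearity: x3 equals 2*x2 plus a small stochastic error.
x3 = 2.0 * x2 + rng.normal(scale=0.01, size=n)
X_near = np.column_stack([np.ones(n), x2, x3])
XtX = X_near.T @ X_near
print("det(X'X), near case:", np.linalg.det(XtX))                   # small but nonzero
print("diagonal of (X'X)^-1:", np.diag(np.linalg.inv(XtX)))         # relatively large entries
```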
Sources of multicollinearity: There are several sources of multicollinearity.
1. The data collection method employed, for example, sampling over a limited range
of the values taken by the regressors in the population.
2. Constraints on the model or in the population being sampled. For example, in the
regression of electricity consumption on income (X2) and house size (X3) there is a
physical constraint in the population, in that families with higher incomes generally
have larger homes than families with lower incomes.
3. Model specifications, for example, adding polynomial terms to a regression
model, especially when the range of the X variable is small.
4. An overdetermined model. This happens when the model has more explanatory
variables than the number of observations. This can occur in medical research,
where there may be a small number of patients about whom information is
collected on a large number of variables.

Consequences of multicollinearity: In the case of near or high multicollinearity, one is likely to encounter the following consequences.
1. Although BLUE, the OLS estimators have large variances and covariances, making precise estimation difficult (a small simulation illustrating this follows the list).
2. Because of consequence 1, the confidence intervals tend to be much wider, leading to acceptance of the "zero null hypothesis" more readily.
3. Also because of consequence 1, the t ratios of one or more coefficients tend to be statistically insignificant.
4. Although the t ratios of one or more coefficients are statistically insignificant, R², the overall measure of goodness of fit, can be very high.
5. The OLS estimators and their standard errors can be sensitive to small changes in the data.
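The following small simulation (simulated data with illustrative names; not from the original notes) reproduces consequences 1–4: with two nearly collinear regressors the OLS standard errors and covariances are large, the individual t ratios look insignificant, and yet R² is high.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.05, size=n)            # x3 is almost collinear with x2
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)  # true coefficients are known here

res = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()
print("standard errors:", res.bse)          # inflated for the collinear regressors
print("covariance matrix:\n", res.cov_params())
print("t ratios:", res.tvalues)             # individually often insignificant
print("R^2:", res.rsquared)                 # overall fit is still very high
```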

If multicollinearity is perfect among the explanatory variables, the regression coefficients of the X variables are indeterminate and their standard errors are infinite. If multicollinearity is less than perfect, the regression coefficients, although determinate, possess large standard errors (in relation to the coefficients themselves), which means that the coefficients cannot be estimated with great precision or accuracy. But when there is no multicollinearity among the X variables, the regression coefficients can be estimated easily.
For this reason the CLRM assumes that there is no multicollinearity among the X's.

Detection of multicollinearity:
The indicators for detecting multicollinearity are as follows:
1. Eigenvalues and Condition Index: Here we use the eigenvalues and the condition index to detect multicollinearity (see the sketch after this list). First form X′X from the data matrix X. Then solving |X′X − λI| = 0 gives the values of λ, which are the eigenvalues. The condition index is
Condition Index (CI) = √(maximum eigenvalue / minimum eigenvalue)
If CI lies between 10 and 30 there is moderate multicollinearity, and if CI exceeds 30 there is severe multicollinearity.
2. High R² but few significant t ratios: This is the classic symptom of multicollinearity. If R² is high, say in excess of 0.8, the F test will in most cases reject the null hypothesis that the partial slope coefficients are simultaneously equal to zero, but the individual t tests will show that none or very few of the partial slope coefficients are statistically different from zero. In short, R² is very high but none of the regression coefficients is individually statistically significant (this is also illustrated in the sketch after the list).
3. High Pairwise Correlations among Regressors: Another suggested rule of thumb is that if the pairwise (zero-order) correlation coefficient between two regressors is high, say in excess of 0.8, then multicollinearity is a serious problem.
4. Examination of Partial Correlations: If R² is high but the partial correlations are comparatively low, this may suggest that the explanatory variables are highly correlated.
5. Tolerance and Variance Inflation Factor: The speed with which variances and covariances increase can be seen with the variance inflation factor (VIF), which for the two-regressor case is defined as
VIF = 1 / (1 − r23²)
where r23 is the correlation coefficient between the two regressors X2 and X3. The VIF shows how the variance of an estimator is inflated by the presence of multicollinearity; tolerance is its reciprocal, TOL = 1/VIF (see the sketch after this list).
6. Low value of |X′X|: in the case of exact multicollinearity, |X′X| = 0.
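Here is a minimal sketch tying the quantitative checks above together (simulated data and illustrative names; the sketch assumes numpy and statsmodels are available). It computes the eigenvalues and condition index of X′X (item 1), contrasts the high R² and significant F statistic with the weak individual t ratios (item 2), and computes the VIFs from auxiliary regressions (item 5).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 100
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.05, size=n)             # highly correlated regressors
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)
X = sm.add_constant(np.column_stack([x2, x3]))

# Item 1: eigenvalues from |X'X - lambda*I| = 0 and the condition index.
eigenvalues = np.linalg.eigvalsh(X.T @ X)
ci = np.sqrt(eigenvalues.max() / eigenvalues.min())
print("eigenvalues:", eigenvalues, " CI:", ci)        # CI > 30 suggests severe multicollinearity

# Item 2: high R^2 and a significant F test, but insignificant individual t ratios.
res = sm.OLS(y, X).fit()
print("R^2:", res.rsquared, " F p-value:", res.f_pvalue)
print("individual t p-values:", res.pvalues[1:])

# Item 5: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the other regressors.
for j, name in ((1, "x2"), (2, "x3")):
    others = np.delete(X, j, axis=1)                  # keeps the constant column
    r2_aux = sm.OLS(X[:, j], others).fit().rsquared
    print(f"VIF({name}):", 1.0 / (1.0 - r2_aux))      # rule of thumb: VIF > 10 is problematic
```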

Remedial Measures:
If multicollinearity has serious effects on the coefficient estimates of important factors, one should adopt one of the following solutions:

1. Increase The Sample Size:


The easiest way to overcome the problem of multicollinearity is to increase the sample
size. Investigators are advised to collect more data to reduce the intensity of collinearity.
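A quick sketch of why collecting more data helps (simulated data, not from the notes): holding the degree of collinearity fixed, the standard errors of the slope estimates shrink as the sample size grows.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
for n in (50, 500, 5000):
    x2 = rng.normal(size=n)
    x3 = x2 + rng.normal(scale=0.1, size=n)              # same degree of collinearity each time
    y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)
    res = sm.OLS(y, sm.add_constant(np.column_stack([x2, x3]))).fit()
    print(n, "slope standard errors:", res.bse[1:])       # fall roughly like 1/sqrt(n)
```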

2. Using Extraneous Estimates:


To eliminate the effects of multicollinearity, another commonly adopted method is the use of extraneous (prior) information in estimating the parameters. Suppose our model is
Y = α0 + α1X1 + α2X2 + u
where X1 and X2 are correlated. If we know that α2 = 0.5α1, then the model becomes
Y = α0 + α1X1 + 0.5α1X2 + u
  = α0 + α1(X1 + 0.5X2) + u
  = α0 + α1X′ + u
where X′ = X1 + 0.5X2. Now we can estimate α1 by OLS and hence obtain α2 = 0.5α1.
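A minimal sketch of this restricted estimation (simulated data; the true coefficients are chosen so that α2 = 0.5α1 holds, and the names are illustrative): the single constructed regressor X′ = X1 + 0.5X2 replaces the two collinear regressors.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)                 # X1 and X2 are highly correlated
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)      # true alpha_1 = 2, alpha_2 = 1 = 0.5 * alpha_1

x_prime = x1 + 0.5 * x2                                 # X' = X1 + 0.5 * X2
res = sm.OLS(y, sm.add_constant(x_prime)).fit()
a1_hat = res.params[1]                                  # estimate of alpha_1
a2_hat = 0.5 * a1_hat                                   # alpha_2 recovered from the prior restriction
print("alpha_1 estimate:", a1_hat, " alpha_2 estimate:", a2_hat)
```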
3. Dropping Variables:
When faced with severe multicollinearity, one of the simplest things to do is to drop one of the collinear variables.

4. Combining Cross-Sectional And Time Series Data:


Generally, time series data are affected by the multicollinearity problem. So if we combine cross-sectional data with time series data, the multicollinearity problem should be reduced.

5. Model Specification:
Multicollinearity may be overcome if we re-specify our model; this can be done in the following ways (a small sketch follows the list):
a) One approach is to redefine the regressors.
b) Re-specify a lagged variable or other explanatory variables in a distributed-lag form.
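As one possible illustration of (a) — this particular reparameterization is my own example, not one given in the notes — two nearly collinear regressors X2 and X3 can be replaced by X2 and the difference (X3 − X2); the redefined regressors are far less correlated, so the VIFs drop sharply.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
x2 = rng.normal(size=n)
x3 = x2 + rng.normal(scale=0.1, size=n)                  # x3 nearly collinear with x2

def max_vif(regressors):
    """Largest VIF among the columns of `regressors` (auxiliary-regression definition)."""
    X = sm.add_constant(regressors)
    vifs = []
    for j in range(1, X.shape[1]):
        r2 = sm.OLS(X[:, j], np.delete(X, j, axis=1)).fit().rsquared
        vifs.append(1.0 / (1.0 - r2))
    return max(vifs)

print("max VIF, original regressors:", max_vif(np.column_stack([x2, x3])))          # very large
print("max VIF, redefined regressors:", max_vif(np.column_stack([x2, x3 - x2])))    # close to 1
```

Note that redefining regressors changes the interpretation of the individual coefficients, so any redefinition has to make economic sense for the model at hand.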

Is multicollinearity necessarily bad?


It has been said that if the purpose of regression analysis is prediction or forecasting, then multicollinearity is not a serious problem, because the higher the R², the better the prediction.
However, if the objective of the analysis is not only prediction but also reliable estimation of the parameters, serious multicollinearity will be a problem, because we have seen that it tends to produce large standard errors for the estimators.
In one situation, however, multicollinearity may not pose a serious problem: this is the case when R² is high and the regression coefficients are individually significant, as revealed by high t-values.
