
Multicollinearity

Dr. Ivy Das Gupta


Analysis of Collinear Data: Multicollinearity
➢ In a multiple linear regression model some regressors may be correlated
➢ When regressors are highly correlated, the problem of multicollinearity arises
➢ The assumption of independence among the regressors is violated
➢ In a multiple linear regression model, the full rank condition implies that the regressors are linearly independent of one another
➢ Multicollinearity is a high degree of correlation among several independent variables
➢ Multicollinearity may also appear when we include a variable that is defined in terms of another variable already present in the model
Multicollinearity
➢ In some cases multicollinearity is a result of the structure of the data
➢ If the regressors can be expressed as exact linear combinations of one another, the regression model suffers from perfect multicollinearity
➢ Perfect multicollinearity is a rare event
➢ We discuss the problem in terms of departures from independence of the
regressors with one another
➢ It is difficult to retain the ceteris paribus condition if the correlation is very
strong
Multiple Correlation & Partial Correlation
In a multiple linear regression model we have different types of correlation:

1) Simple correlation between two variables
2) Multiple correlation
3) Partial correlation

The explained sum of squares (ESS) in a multiple linear regression model with two regressors is:

ESS = β̂1 S1y + β̂2 S2y

The coefficient of multiple determination: R²y.12 = ESS/TSS = (β̂1 S1y + β̂2 S2y)/Syy

The positive square root of the coefficient of multiple determination is called the multiple correlation coefficient
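These formulas can be checked numerically. The following is a minimal sketch (simulated data; the variable names are illustrative, not from the slides) that computes ESS = β̂1 S1y + β̂2 S2y and R²y.12 = ESS/Syy and compares them with the usual fitted-value definition of R²:

```python
# Minimal sketch: verify ESS = b1*S1y + b2*S2y and R^2_y.12 = ESS/Syy
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)          # mildly correlated regressors
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# OLS of y on a constant, x1 and x2
X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
b1, b2 = beta_hat[1], beta_hat[2]

# Deviation-form sums of squares and cross products
S1y = np.sum((x1 - x1.mean()) * (y - y.mean()))
S2y = np.sum((x2 - x2.mean()) * (y - y.mean()))
Syy = np.sum((y - y.mean()) ** 2)

ESS = b1 * S1y + b2 * S2y                    # explained sum of squares
R2_y12 = ESS / Syy                           # coefficient of multiple determination
R_y12 = np.sqrt(R2_y12)                      # multiple correlation coefficient

# Cross-check against the definition based on fitted values
y_hat = X @ beta_hat
R2_check = np.sum((y_hat - y.mean()) ** 2) / Syy
print(R2_y12, R2_check, R_y12)
```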
Partial Correlation
➢ The RSS in a multiple linear regression model is:
➢ RSS = Syy(1 - R²y.12)
➢ The OLS estimates β̂1 & β̂2 have a partial-effect or ceteris paribus interpretation:
➢ β̂1 = Σ û1 y / Σ û1²
➢ where û1 = x1 - b̂ x2 is the estimated residual from the regression of x1 on x2, measuring the part of x1 that is uncorrelated with x2 (verified numerically in the sketch below)
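A minimal sketch of this partialling-out interpretation, assuming simulated data (all names are illustrative): the coefficient on x1 from the full regression coincides with Σ û1 y / Σ û1², where û1 are the residuals from regressing x1 on x2:

```python
# Minimal sketch: beta1_hat from the full regression equals
# sum(u1_hat * y) / sum(u1_hat**2), with u1_hat the residuals of x1 on x2
import numpy as np

rng = np.random.default_rng(1)
n = 300
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)           # x1 correlated with x2
y = 0.5 + 1.2 * x1 - 0.7 * x2 + rng.normal(size=n)

# Full multiple regression: y on constant, x1, x2
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 1: regress x1 on a constant and x2, keep the residuals u1_hat
Z = np.column_stack([np.ones(n), x2])
gamma = np.linalg.lstsq(Z, x1, rcond=None)[0]
u1_hat = x1 - Z @ gamma                      # part of x1 uncorrelated with x2

# Step 2: the partial-effect formula for beta1_hat
b1_partial = np.sum(u1_hat * y) / np.sum(u1_hat ** 2)

print(beta_full[1], b1_partial)              # the two values coincide
```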
Partial Correlation
➢ The correlation between the dependent variable y and the part of x1 that is not explained by the other regressor x2 is called the partial correlation between y and x1; its square is denoted r²y1.2
➢ Similarly, when we regress y on x2 after eliminating the effect of x1 on x2, the proportion explained by x2 is measured by r²y2.1
➢ Therefore, the proportion that remains unexplained by x2 is (1 - r²y2.1)
Partial Correlation
➢ When we regress y on x1 in a simple regression framework, we have
➢ RSS1 = Syy(1 - r²y1)
➢ Then, by adding the effect of x2 on y after eliminating the effect of x1, the residual sum of squares becomes:
➢ RSS = Syy(1 - r²y1)(1 - r²y2.1)
➢ Now, RSS = Syy(1 - R²y.12) in the multiple linear regression
➢ Thus, Syy(1 - R²y.12) = Syy(1 - r²y1)(1 - r²y2.1)
➢ Therefore, (1 - R²y.12) = (1 - r²y1)(1 - r²y2.1), an identity that is checked numerically in the sketch below
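A minimal numerical check of the identity, assuming simulated data; the partial correlation of y and x2 given x1 is computed as the correlation between the residuals of y and of x2 after removing x1 from each:

```python
# Minimal sketch: check (1 - R^2_y.12) = (1 - r^2_y1) * (1 - r^2_y2.1)
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 + 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

def rss(y, X):
    """Residual sum of squares of an OLS fit of y on X (a constant is added)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return np.sum(resid ** 2)

Syy = np.sum((y - y.mean()) ** 2)
R2_y12 = 1 - rss(y, np.column_stack([x1, x2])) / Syy    # multiple determination
r2_y1 = np.corrcoef(y, x1)[0, 1] ** 2                   # squared simple correlation

# Partial correlation of y and x2 given x1: correlate the residuals of y and of
# x2 after removing the (simple-regression) influence of x1 from each
res_y = y - np.polyval(np.polyfit(x1, y, 1), x1)
res_x2 = x2 - np.polyval(np.polyfit(x1, x2, 1), x1)
r2_y2_1 = np.corrcoef(res_y, res_x2)[0, 1] ** 2

print(1 - R2_y12, (1 - r2_y1) * (1 - r2_y2_1))          # the two sides agree
```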
Partial Correlation
➢ The equation shows the relationship between multiple correlation, simple
correlation and partial correlation
➢ In a multiple linear regression model with more than two regressors we have partial correlations of different orders
➢ The concept of partial correlation is very important in locating multicollinearity
➢ For two regressors x1 & x2, if the simple correlation between them, r²12, is very high while the partial correlation between y and x1, r²y1.2, is very low, then there is a problem of multicollinearity
Multicollinearity: Problems
➢ Perfect multicollinearity violates the assumption that the X matrix has full rank, so OLS cannot be applied
➢ When the full rank condition is not satisfied, the inverse of X′X does not exist and the OLS estimate is undefined
➢ The multiple linear regression model:
➢ Y = Xβ + ε
➢ The OLS estimate is:
➢ β̂ = (X′X)⁻¹X′Y (a small numerical sketch follows below)
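A minimal sketch of the matrix formula, assuming simulated, well-conditioned data; np.linalg.solve is used in place of an explicit inverse, which is numerically safer but gives the same estimate:

```python
# Minimal sketch: beta_hat = (X'X)^(-1) X'Y on simulated data
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)   # solve (X'X) b = X'y instead of inverting
print(beta_hat)                            # close to beta_true
```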
Multicollinearity: Problems
➢ The mean and variance of the OLS estimates are:
➢ E(β̂) = β and V(β̂) = σ²(X′X)⁻¹
➢ The assumption of full column rank implies that the explanatory variables are linearly independent
➢ In the ideal case the explanatory variables are orthogonal
➢ Now suppose the multiple linear regression model is:
➢ y = β0 + β1x1 + β2x2 + ε
Multicollinearity: Problems
➢ The explanatory variables are related as:
➢ ax1 + x2 = b
➢ Therefore,
➢ y = β0 + β1x1 + β2(b - ax1) + ε
➢ y = (β0 + β2b) + (β1 - aβ2)x1 + ε
➢ We cannot estimate β0, β1 & β2 separately
➢ When one variable is an exact linear function of another (here x2 = b - ax1), the two are perfectly correlated (see the sketch below)
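A minimal sketch of what goes wrong under the exact relation ax1 + x2 = b, with illustrative values of a and b: X′X loses rank and the OLS formula cannot be evaluated:

```python
# Minimal sketch: perfect multicollinearity makes X'X rank deficient
import numpy as np

rng = np.random.default_rng(4)
n = 50
a, b = 2.0, 3.0
x1 = rng.normal(size=n)
x2 = b - a * x1                              # exact linear relation with x1
X = np.column_stack([np.ones(n), x1, x2])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))            # 2 rather than 3: full rank fails
print(np.linalg.cond(XtX))                   # enormous condition number
# Inverting X'X is not meaningful here: depending on rounding, np.linalg.inv
# either raises LinAlgError ("Singular matrix") or returns numerically
# meaningless values, so the OLS estimator cannot be computed.
```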
Multicollinearity: Problems
➢ Multicollinearity means a high value of r²12
➢ But a high value of r²12 does not necessarily imply the presence of multicollinearity
➢ When multicollinearity is present, the variances of the coefficients are inflated
➢ Multicollinearity reduces the precision of the estimates
➢ It weakens the statistical power of the regression model
➢ Individual p-values may fail to reject the null hypothesis even when the regressors are jointly significant
➢ In the presence of multicollinearity the OLS estimates remain unbiased, but their sampling variances become very large (illustrated in the sketch below)
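A small simulation sketch of these consequences, with purely illustrative settings: the empirical mean of β̂1 stays near its true value whether or not the regressors are correlated, but its empirical standard deviation is much larger under strong correlation:

```python
# Minimal sketch: unbiasedness survives, precision does not
import numpy as np

rng = np.random.default_rng(5)
n, reps = 100, 2000
beta = np.array([1.0, 2.0, -1.0])            # intercept, beta1, beta2

def simulate(rho):
    """Empirical mean and std of beta1_hat when corr(x1, x2) is about rho."""
    estimates = np.empty(reps)
    for r in range(reps):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1 - rho ** 2) * rng.normal(size=n)
        y = beta[0] + beta[1] * x1 + beta[2] * x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    return estimates.mean(), estimates.std()

print(simulate(0.0))    # mean near 2, small spread
print(simulate(0.95))   # mean still near 2, but a much larger spread
```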
Detection of Multicollinearity

A simple way to detect multicollinearity is to calculate the correlation coefficients for all possible pairs of predictor variables.

But a high value of the correlation coefficient does not necessarily imply multicollinearity.

In the presence of multicollinearity, R² is quite high and the overall F statistic is highly significant, but the individual coefficients have very high standard errors and low significance levels (Greene, 2000).

Several diagnostic measures are available.


Detection of Multicollinearity
➢ Determinant of (X′X):
★ As the degree of multicollinearity increases, |X′X| ⟶ 0
★ When there is perfect multicollinearity |X′X| = 0 and the rank of X′X is less
than k
★ If the explanatory variables have very low variability, then |X′X| ⟶ 0 even in the absence of collinearity
★ This measure is not bounded (see the sketch below)
★ Moreover, it does not help in detecting which variable is causing the multicollinearity
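A minimal sketch of this measure, assuming simulated data: |X′X| falls towards zero as the correlation between the regressors rises, and rescaling a regressor changes the determinant arbitrarily, which is why the measure is unbounded:

```python
# Minimal sketch: det(X'X) shrinks with collinearity and depends on scale
import numpy as np

rng = np.random.default_rng(6)
n = 200

def det_XtX(rho, scale=1.0):
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho ** 2) * rng.normal(size=n)
    X = np.column_stack([x1 * scale, x2])    # regressors only
    return np.linalg.det(X.T @ X)

for rho in (0.0, 0.9, 0.99, 0.999):
    print(rho, det_XtX(rho))                 # determinant falls as rho rises

print(det_XtX(0.9, scale=100.0))             # rescaling inflates |X'X|: unbounded
```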
Detection of Multicollinearity
Determinant of the correlation matrix:

★ The determinant of the correlation matrix, D, lies between 0 and 1
★ If D = 0, the explanatory variables are exactly linearly dependent
★ If D = 1, the columns of the matrix X are orthonormal
★ It is a bounded measure, 0 ≤ D ≤ 1
★ It is not affected by the dispersion of the explanatory variables (see the sketch below)
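A minimal sketch of this bounded measure, assuming simulated data: D lies in [0, 1], shrinks when regressors are collinear, and is unchanged when the regressors are rescaled:

```python
# Minimal sketch: determinant of the correlation matrix as a bounded diagnostic
import numpy as np

rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n)   # collinear with x1
x3 = rng.normal(size=n)

X = np.column_stack([x1, x2, x3])
D = np.linalg.det(np.corrcoef(X, rowvar=False))
print(D)                                     # between 0 and 1; small when collinear

X_scaled = X * np.array([100.0, 0.01, 1.0])  # rescale the columns
print(np.linalg.det(np.corrcoef(X_scaled, rowvar=False)))    # D is unchanged
```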
Detection of Multicollinearity

Inspection of the correlation matrix:

★ Inspection of the off-diagonal elements rij of X′X in correlation form gives an idea of the presence of multicollinearity
★ If Xi and Xj are nearly linearly dependent, then |rij| will be close to 1
★ But pairwise inspection of the correlation coefficients is not sufficient for detecting multicollinearity in the data
Detection of Multicollinearity
Measure based on partial correlation:

★ Let R² be the coefficient of determination in the full model based on all the predictors
★ R²(-j) is the coefficient of determination when the jth variable is dropped
★ Similarly, we can calculate R²(-1), R²(-2), …, R²(-k)
★ We then find R²m = max(R²(-1), R²(-2), …, R²(-k))
★ If multicollinearity is present, R²m will be high (close to R²)
★ This indicates a high degree of multicollinearity (R²(-j), R²m and Theil's measure are computed together in the sketch after the next measure)
Detection of Multicollinearity
Theil’s measure of multicollinearity:

★ It is defined as m = R² - Σj (R² - R²(-j)), where the sum runs over the k predictors (see the sketch below)
★ When m = 0, there is no multicollinearity: the incremental contributions of the individual regressors exactly add up to R²
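A minimal sketch computing R²(-j), R²m and Theil's m together, assuming simulated data with one nearly collinear pair of regressors (all names are illustrative):

```python
# Minimal sketch: R^2_(-j), R^2_m = max_j R^2_(-j), and Theil's m
import numpy as np

rng = np.random.default_rng(8)
n = 300
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95 ** 2) * rng.normal(size=n)   # collinear with x1
x3 = rng.normal(size=n)                                        # unrelated regressor
y = 1 + 2 * x1 + 1 * x2 + 0.5 * x3 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def r_squared(y, X):
    """R^2 of an OLS fit of y on X (a constant term is added)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
    resid = y - Xc @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

R2_full = r_squared(y, X)
R2_minus = np.array([r_squared(y, np.delete(X, j, axis=1)) for j in range(X.shape[1])])

R2_m = R2_minus.max()                            # close to R2_full under multicollinearity
theil_m = R2_full - np.sum(R2_full - R2_minus)   # 0 when the regressors are orthogonal

print(R2_full, R2_minus, R2_m, theil_m)
```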

Variance Inflation Factor (VIF):

★ It is a very simple test. It quantifies how much the variances of the estimated coefficients are inflated by collinearity
★ When the regressors are orthogonal, the variance of the OLS estimate β̂j takes its smallest value: V(β̂j)min = σ²/Sjj
Detection of Multicollinearity

VIF (Variance Inflation Factor):

★ For a multiple linear regression model with correlated predictors, the variance of the estimate β̂j is: V(β̂j) = σ²/[Sjj(1 - R²j)]
★ Here R²j is the value of R² obtained from regressing the jth predictor on the remaining predictors
★ The VIF is the ratio of the two variances: VIFj = V(β̂j)/V(β̂j)min = 1/(1 - R²j)
Illustration using STATA
VIF
• The minimum value of VIF is unity
• A value of 1 indicates that there is no correlation between this
independent variable and any others.
• There is no universally agreed cutoff for the tolerance value or the VIF.
• Some argue that a tolerance value less than 0.1, or a VIF greater than 10, roughly indicates significant multicollinearity (a worked computation follows below).
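The Stata output itself is not reproduced in these notes (in Stata these numbers typically come from estat vif after regress). As a Python analogue, the following minimal sketch, assuming simulated data and that statsmodels is installed, computes VIFj = 1/(1 - R²j) by hand and checks it against statsmodels' variance_inflation_factor:

```python
# Minimal sketch: VIF_j = 1 / (1 - R^2_j), by hand and via statsmodels
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(9)
n = 300
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95 ** 2) * rng.normal(size=n)   # nearly collinear
x3 = rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))   # column 0 is the constant

# VIF for each predictor (columns 1..3): regress it on the remaining columns
for j in range(1, X.shape[1]):
    others = np.delete(X, j, axis=1)
    r2_j = sm.OLS(X[:, j], others).fit().rsquared
    print(j, 1.0 / (1.0 - r2_j), variance_inflation_factor(X, j))  # the two agree
```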
Dealing with Multicollinearity

There are several approaches:

1) Deletion of variables
2) Centering the variables
3) Restricted least squares
4) Principal components
5) Ridge regression (a brief sketch follows below)
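As an example of the last remedy, a minimal ridge regression sketch on simulated, nearly collinear data (the penalty value λ = 10 is purely illustrative): the ridge estimator (X′X + λI)⁻¹X′y shrinks the coefficients and stabilises them relative to OLS when X′X is nearly singular:

```python
# Minimal sketch: ridge regression on centred data as a remedy for collinearity
import numpy as np

rng = np.random.default_rng(10)
n = 200
x1 = rng.normal(size=n)
x2 = 0.98 * x1 + np.sqrt(1 - 0.98 ** 2) * rng.normal(size=n)   # nearly collinear
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

# Centre the variables so the intercept can be left unpenalised
Xc = np.column_stack([x1 - x1.mean(), x2 - x2.mean()])
yc = y - y.mean()

def ridge(Xc, yc, lam):
    """Ridge estimator on centred data: (X'X + lam*I)^(-1) X'y."""
    k = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(k), Xc.T @ yc)

print(ridge(Xc, yc, 0.0))     # lam = 0 reproduces OLS (unstable under collinearity)
print(ridge(Xc, yc, 10.0))    # a positive lam shrinks and stabilises the estimates
```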
