Multicollinearity
The coefficient of multiple determination: $R^2_{y.12} = \mathrm{ESS}/\mathrm{TSS} = (\hat{\beta}_1 S_{1y} + \hat{\beta}_2 S_{2y})/S_{yy}$
The positive square root of the coefficient of multiple determination is called the multiple correlation coefficient
Partial Correlation
➢ The RSS in a multiple linear regression model is:
➢ $\mathrm{RSS} = S_{yy}\,(1 - R^2_{y.12})$
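A minimal numerical check of the two identities above, in Python (not part of the original slides; the simulated data and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# Center the variables so the S-quantities are plain sums of squares/cross-products.
x1c, x2c, yc = x1 - x1.mean(), x2 - x2.mean(), y - y.mean()
S11, S22, S12 = x1c @ x1c, x2c @ x2c, x1c @ x2c
S1y, S2y, Syy = x1c @ yc, x2c @ yc, yc @ yc

# OLS slopes from the normal equations of the centered two-regressor model.
b1, b2 = np.linalg.solve([[S11, S12], [S12, S22]], [S1y, S2y])

R2 = (b1 * S1y + b2 * S2y) / Syy       # R^2_{y.12} = ESS/TSS
RSS = Syy * (1 - R2)                   # RSS = Syy (1 - R^2_{y.12})
resid = yc - b1 * x1c - b2 * x2c
assert np.isclose(RSS, resid @ resid)  # agrees with the directly computed RSS
print(R2, RSS)
```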
➢ The OLS estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ have a partial-effect, or ceteris paribus, interpretation:
➢ $\hat{\beta}_1 = \sum \hat{u}_1 y \,/\, \sum \hat{u}_1^2$
➢ where $\hat{u}_1 = x_1 - \hat{b}x_2$ is the estimated residual from the regression of $x_1$ on $x_2$, measuring the part of $x_1$ that is uncorrelated with $x_2$
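A sketch of this residual-regression (Frisch-Waugh-Lovell) construction on simulated data; the data-generating values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)     # x1 deliberately correlated with x2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# Step 1: regress x1 on x2 and keep the residual u1hat, the part of x1
# that is uncorrelated with x2.
X2 = np.column_stack([np.ones(n), x2])
u1 = x1 - X2 @ np.linalg.lstsq(X2, x1, rcond=None)[0]

# Step 2: beta1hat = sum(u1hat * y) / sum(u1hat^2)
beta1_partial = (u1 @ y) / (u1 @ u1)

# It matches the x1 coefficient from the full multiple regression.
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
assert np.isclose(beta1_partial, beta_full[1])
```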
➢ The correlation between the dependent variable $y$ and the part of $x_1$ that is not explained by the other regressor $x_2$ is called the partial correlation between $y$ and $x_1$, written $r_{y1.2}$
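A sketch of this quantity on simulated data; note that the standard computation of $r_{y1.2}$ residualizes both $y$ and $x_1$ on $x_2$ before correlating (names and values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

X2 = np.column_stack([np.ones(n), x2])

def purge(v):
    """Residual of v after regressing it on (1, x2)."""
    return v - X2 @ np.linalg.lstsq(X2, v, rcond=None)[0]

r_y1_2 = np.corrcoef(purge(y), purge(x1))[0, 1]  # partial correlation r_{y1.2}
print(r_y1_2)
```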
➢ The assumption of full column rank implies that the explanatory variables are linearly independent: no regressor is an exact linear combination of the others
➢ In the ideal extreme case the explanatory variables are orthogonal (mutually uncorrelated)
➢ Now suppose the multiple linear regression model is:
➢ $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
Multicollinearity: Problems
➢ Suppose the explanatory variables are exactly linearly related: $a x_1 + x_2 = b$
➢ Substituting $x_2 = b - a x_1$,
➢ $y = \beta_0 + \beta_1 x_1 + \beta_2 (b - a x_1) + \varepsilon$
➢ $y = (\beta_0 + \beta_2 b) + (\beta_1 - a \beta_2)\, x_1 + \varepsilon$
➢ Only the two composite coefficients are identified, so we cannot estimate $\beta_0$, $\beta_1$ and $\beta_2$ separately (see the sketch below)
➢ When one variable is an exact linear function of another, the two are perfectly correlated
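A sketch of why estimation breaks down here: with $x_2 = b - a x_1$ the design matrix loses full column rank, so $X'X$ is singular and the normal equations have no unique solution ($a$ and $b$ below are arbitrary illustrative constants):

```python
import numpy as np

rng = np.random.default_rng(3)
n, a, b = 100, 2.0, 5.0
x1 = rng.normal(size=n)
x2 = b - a * x1                        # exact linear relation: a*x1 + x2 = b
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.matrix_rank(X.T @ X))  # 2, not 3: X'X is singular
# np.linalg.solve(X.T @ X, X.T @ y) raises LinAlgError here:
# beta0, beta1 and beta2 cannot be recovered separately.
```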
➢ Multicollinearity implies a high value of $r^2_{12}$
➢ But a high value of $r^2_{12}$ does not necessarily indicate the presence of multicollinearity
➢ When multicollinearity is present, the variances of the estimated coefficients are inflated
➢ Multicollinearity therefore reduces the precision of the estimates
➢ It weakens the statistical power of the regression model
➢ t-tests may fail to reject the null hypothesis even for relevant regressors (p-values become large)
➢ In the presence of multicollinearity the OLS estimates remain unbiased, but their sampling variances become very large (as the simulation below illustrates)
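A small Monte Carlo sketch of the last two points (all parameter values are illustrative): OLS remains unbiased whether or not the regressors are correlated, but the sampling variance of $\hat{\beta}_1$ explodes as the correlation grows:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 100, 2000

def beta1_draws(rho):
    """Sampling draws of beta1hat when corr(x1, x2) is roughly rho."""
    draws = np.empty(reps)
    for i in range(reps):
        x2 = rng.normal(size=n)
        x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        draws[i] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    return draws

for rho in (0.0, 0.95):
    d = beta1_draws(rho)
    # Mean stays near the true value 2 (unbiased); variance is far larger at rho = 0.95.
    print(rho, d.mean(), d.var())
```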
Detection of Multicollinearity
A simple way to detect multicollinearity is to calculate the correlation coefficients for all possible pairs of predictor variables
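A minimal sketch of this pairwise check (the simulated predictors are illustrative; the second column nearly duplicates the first):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
z1, z2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([z1, z1 + 0.05 * rng.normal(size=n), z2])

corr = np.corrcoef(X, rowvar=False)    # k-by-k correlation matrix of the predictors
print(np.round(corr, 2))               # an off-diagonal |r| near 1 flags a collinear pair
```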
★ Let $R^2$ be the coefficient of determination of the full model based on all $k$ predictors
★ $R^2_{-j}$ is the coefficient of determination when the $j$th variable is dropped
★ In this way we can calculate $R^2_{-1}, R^2_{-2}, \ldots, R^2_{-k}$
★ We then find $R^2_m = \max(R^2_{-1}, R^2_{-2}, \ldots, R^2_{-k})$
★ If multicollinearity is present, $R^2_m$ will be high (close to the full-model $R^2$), because dropping one of a set of collinear variables costs little explanatory power
★ A high $R^2_m$ therefore indicates a high degree of multicollinearity
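A sketch of this drop-one comparison on simulated data where $x_2$ nearly duplicates $x_1$ (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)    # x2 nearly duplicates x1
x3 = rng.normal(size=n)
y = 1.0 + x1 + x2 + x3 + rng.normal(size=n)
Xs = [x1, x2, x3]

def r2(cols):
    """Coefficient of determination of y regressed on (1, cols)."""
    X = np.column_stack([np.ones(n)] + cols)
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

R2_full = r2(Xs)
R2_minus = [r2(Xs[:j] + Xs[j + 1:]) for j in range(3)]  # R^2_{-1}, R^2_{-2}, R^2_{-3}
R2_m = max(R2_minus)
print(R2_full, R2_m)  # R2_m stays close to R2_full: dropping x1 or x2 barely hurts the fit
```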
Theil’s measure of multicollinearity:
★ It is defined as $m = R^2 - \sum_{j=1}^{k} (R^2 - R^2_{-j})$
★ When $m = 0$, there is no multicollinearity
★ It is a very simple test
★ The variance inflation factor (VIF) quantifies how much the variances of the estimated coefficients are inflated by multicollinearity
★ The variance of the OLS estimate $\hat{\beta}_j$ is smallest when $x_j$ is orthogonal to the other regressors: $V(\hat{\beta}_j)_{\min} = \sigma^2 / S_{jj}$. With multicollinearity it grows to $\mathrm{VIF}_j \cdot \sigma^2 / S_{jj}$, where $\mathrm{VIF}_j = 1/(1 - R^2_j)$ and $R^2_j$ comes from regressing $x_j$ on the remaining predictors
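A sketch computing Theil's $m$ and the VIFs on the same kind of near-duplicate data; the helper and all data-generating values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)    # collinear pair
x3 = rng.normal(size=n)
y = 1.0 + x1 + x2 + x3 + rng.normal(size=n)
Xs = [x1, x2, x3]

def r2(target, cols):
    """Coefficient of determination of target regressed on (1, cols)."""
    X = np.column_stack([np.ones(n)] + cols)
    resid = target - X @ np.linalg.lstsq(X, target, rcond=None)[0]
    return 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)

# Theil's measure: m = R^2 - sum_j (R^2 - R^2_{-j}); m = 0 means no multicollinearity.
R2_full = r2(y, Xs)
m = R2_full - sum(R2_full - r2(y, Xs[:j] + Xs[j + 1:]) for j in range(3))

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing x_j on the other predictors.
vifs = [1 / (1 - r2(Xs[j], Xs[:j] + Xs[j + 1:])) for j in range(3)]
print(m, vifs)                         # large VIFs flag x1 and x2 as the collinear pair
```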
Remedies for Multicollinearity
1) Deletion of variables
2) Centering the variables
3) Restricted least squares
4) Principal components
5) Ridge regression (see the sketch below)
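A minimal ridge-regression sketch for remedy 5, assuming centered variables and an arbitrary illustrative penalty $k$ (how to choose $k$ is a separate question):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)    # near-collinear pair
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

# Center everything so the intercept drops out of the penalized problem.
X = np.column_stack([x1 - x1.mean(), x2 - x2.mean()])
yc = y - y.mean()

k = 1.0                                # illustrative penalty, not a recommendation
beta_ols = np.linalg.solve(X.T @ X, X.T @ yc)
beta_ridge = np.linalg.solve(X.T @ X + k * np.eye(2), X.T @ yc)
print(beta_ols, beta_ridge)            # the penalty k*I makes the system well conditioned
                                       # and shrinks the unstable OLS coefficients
```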