
Regression Diagnostics – I

Multicollinearity

• Definition

• While estimating multiple regression models, quite often we obtain unsatisfactory results.

• This happens when we have high values of the variances of the estimated coefficients, and hence high standard errors.

• This is possible when there is little variation in the explanatory variables, or high inter-correlations among the explanatory variables, or both.

• Multicollinearity → the explanatory variables of the multiple regression model get highly correlated → it arises only in the context of multiple regression.
• Multicollinearity represents a lack of independent movement in the sample data on the explanatory variables → it is a feature of the sample data, and is absent in the context of population data.

• When multicollinearity is present, we cannot separate out the effects of the explanatory variables on the dependent variable → if $Y_i = f(X_{1i}, X_{2i})$, and $X_{1i}$ and $X_{2i}$ are perfectly correlated, then either could predict $Y_i$ and the other would become superfluous.

• Perfect correlation between the explanatory variables is rarely observed → what is most frequently observed is a situation of high correlation among the explanatory variables → we face difficulty in obtaining precise estimates of the unknown parameters.
• Consequences of Multicollinearity

Consider the MLRM

$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

Three possible cases:
1. Absence of multicollinearity: $r_{12}^2 = 0$
2. Perfect multicollinearity: $r_{12}^2 = 1$
3. Some degree of multicollinearity: $0 < r_{12}^2 < 1$

• Case I → the MLRM effectively collapses into separate SLRMs, yet we should continue with the MLRM, as the variances of the coefficients estimated from the SLRMs would be upwardly biased → when working with multivariate data, it is advisable to fit multiple regressions.

• Case II → it is impossible to obtain OLS estimates of the unknown parameters of the MLRM.
• Case III → if $r_{12}^2$ is close to 1 → a high degree of multicollinearity → it is possible to perform OLS estimation of the MLRM, but the variances of the estimates would become very large. This follows from

$Var(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum x_{1i}^2\,(1 - r_{12}^2)}$  and  $Var(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{12}^2)}$
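Note (added illustration): a minimal Python simulation sketch, not part of the original notes, showing how the sampling variance of $\hat{\beta}_1$ grows as the correlation between $X_1$ and $X_2$ approaches 1. The parameter values and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def var_of_beta1_hat(rho, n=100, n_sims=2000, sigma=1.0):
    """Empirical sampling variance of the OLS estimate of beta_1
    when X1 and X2 have correlation rho (illustrative values)."""
    estimates = []
    for _ in range(n_sims):
        x1 = rng.standard_normal(n)
        # construct X2 with the desired correlation with X1
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        y = 1.0 + 2.0 * x1 + 3.0 * x2 + sigma * rng.standard_normal(n)
        X = np.column_stack([np.ones(n), x1, x2])
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(beta_hat[1])  # coefficient on X1
    return np.var(estimates)

for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"rho = {rho:4.2f} -> Var(beta1_hat) ~ {var_of_beta1_hat(rho):.4f}")
```

The printed variances rise sharply as rho approaches 1, in line with the formula above.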

Important Observations:

• Multicollinearity makes the OLS estimates imprecise or unreliable, as they have large variances and hence large standard errors.

• The coefficients affected by multicollinearity will have low (computed) t-values, so that their associated variables become statistically insignificant.

• The signs of the estimated coefficients may be reversed under the influence of multicollinearity.
• Tests for Multicollinearity

• The Variance-Inflation Factor (VIF)

Multicollinearity generates high variances for the OLS estimates, thereby producing insignificant regression results.

So, the VIF test compares the variances of the OLS estimates under two situations:

(i) Ideal situation → multicollinearity is absent, and

(ii) Observed situation → multicollinearity is present

When multicollinearity is present,

$Var(\hat{\beta}_k) = \dfrac{\sigma^2}{\sum x_{ki}^2\,(1 - R_k^2)}$

where $R_k^2$ is the $R^2$ from regressing $X_k$ on the remaining explanatory variables. Under the 'ideal situation', $R_k^2 = 0$, so that

$Var(\hat{\beta}_k) = \dfrac{\sigma^2}{\sum x_{ki}^2}$
The VIF is computed as the ratio of these two variances:

$VIF(\hat{\beta}_k) = \dfrac{\sigma^2 \big/ \sum x_{ki}^2\,(1 - R_k^2)}{\sigma^2 \big/ \sum x_{ki}^2} = \dfrac{1}{1 - R_k^2}$

Now, if $VIF(\hat{\beta}_k) = 1$, we say that there is no multicollinearity, while $VIF(\hat{\beta}_k) > 1$ indicates its presence.

• Rule of thumb: $VIF > 10$ → there is serious multicollinearity involving the corresponding explanatory variable.

Note:
• VIFs are computed for each of the estimated slope coefficients.
• VIF values help to identify the multicollinear variables.
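Note (added illustration): a short Python sketch, assuming the statsmodels library, of how the VIF of each slope coefficient could be computed. The dataset and column names (X1, X2, X3) are invented for illustration; X2 is built to be highly correlated with X1.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative data: X2 is constructed to be nearly collinear with X1.
rng = np.random.default_rng(1)
n = 200
x1 = rng.standard_normal(n)
x2 = 0.95 * x1 + 0.1 * rng.standard_normal(n)
x3 = rng.standard_normal(n)
X = pd.DataFrame({"X1": x1, "X2": x2, "X3": x3})

# Add a constant so the VIFs refer to the slope coefficients only.
X_const = sm.add_constant(X)

# One VIF per explanatory variable (skip the constant at position 0).
vifs = {col: variance_inflation_factor(X_const.values, i)
        for i, col in enumerate(X_const.columns) if col != "const"}
print(vifs)  # X1 and X2 should show large VIFs; X3 should be near 1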
• The Condition Number (CN)

CN provides an overall measure of multicollinearity.

It is computed as

$CN = \dfrac{\text{highest eigenvalue of the matrix } X'X}{\text{lowest eigenvalue of the matrix } X'X}$

• Rules of thumb (degree of multicollinearity):
i. $CN = 1$ → absent
ii. $1 < CN < 10$ → negligible
iii. $10 < CN < 30$ → moderate to strong
iv. $CN > 30$ → severe
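Note (added illustration): a minimal sketch computing the CN exactly as defined above, i.e. as the ratio of the largest to the smallest eigenvalue of $X'X$; the data are invented. (Some texts instead report the square root of this ratio, often called the condition index.)

```python
import numpy as np

def condition_number(X):
    """CN as defined above: ratio of the largest to the smallest
    eigenvalue of X'X."""
    eigvals = np.linalg.eigvalsh(X.T @ X)  # eigenvalues of X'X, ascending
    return eigvals[-1] / eigvals[0]

# Illustrative use with made-up data (X2 nearly collinear with X1):
rng = np.random.default_rng(2)
x1 = rng.standard_normal(100)
x2 = 0.98 * x1 + 0.05 * rng.standard_normal(100)
X = np.column_stack([np.ones(100), x1, x2])
print(condition_number(X))  # very large, signalling severe multicollinearity
```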
Remedial Measures

o Of all econometric problems, multicollinearity is the most serious one → no measure can completely remove it.

o The measures suggested attempt to minimize its impact, so that reasonable regression results are obtained.

• Increasing sample size → helps to reduce the severity of multicollinearity → this becomes clear from the following:

$Var(\hat{\beta}_k) = \dfrac{\sigma^2}{\sum x_{ki}^2\,(1 - R_k^2)}$

As sample size increases,
• $\sum x_{ki}^2$ increases and $Var(\hat{\beta}_k)$ falls, unless the values of all additional observations on $X_{ki}$ are equal to the sample mean $\bar{X}_k$ (so that the deviations $x_{ki}$ contribute nothing), which is most unlikely to happen.
• $R_k^2$ also falls, which further reduces $Var(\hat{\beta}_k)$.
• Transformation of variables → the intensity of multicollinearity may fall when transformed variables (ratios, first differences, etc.) are used instead of variables in 'levels'.
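Note (added illustration): a quick made-up example of how first-differencing can lower the correlation between two trending explanatory variables.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(100)

# Two 'level' series that share a common trend -> highly correlated.
x1 = 0.5 * t + rng.standard_normal(100)
x2 = 0.5 * t + rng.standard_normal(100)
print("correlation in levels:           ", np.corrcoef(x1, x2)[0, 1])

# First differences remove the shared trend.
dx1, dx2 = np.diff(x1), np.diff(x2)
print("correlation in first differences:", np.corrcoef(dx1, dx2)[0, 1])
```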

• Dropping variables → one of the easiest ways to overcome the multicollinearity problem.

After identifying the multicollinear variables, the researcher often drops some of them from the model, especially those whose estimated coefficients have an absolute computed t-value less than 1.
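Note (added illustration): a sketch, assuming statsmodels and invented data, of how one might inspect the computed t-values and flag slope coefficients with |t| below 1 as candidates for dropping.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
x1 = rng.standard_normal(n)
x2 = 0.97 * x1 + 0.05 * rng.standard_normal(n)  # nearly collinear with X1
y = 1.0 + 2.0 * x1 + rng.standard_normal(n)     # X2 has no real effect here

X = sm.add_constant(pd.DataFrame({"X1": x1, "X2": x2}))
results = sm.OLS(y, X).fit()

# Slope coefficients with |t| < 1 are candidates for dropping.
tvals = results.tvalues.drop("const")
print(tvals)
print("candidates to drop:", list(tvals[tvals.abs() < 1].index))
```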

• Other methods → multivariate techniques like principal components analysis, factor analysis, ridge regression, etc.
