CH 5 - Multicollinearity
Multicollinearity
Outline
• The nature of multicollinearity
• Estimation in the presence of multicollinearity
• Practical consequences
• Detection of multicollinearity
• Remedial measures
1. The Nature of Multicollinearity
• Originally it meant the existence of a “perfect,” or exact,
linear relationship among some or all explanatory variables of
a regression model.
• Today, it includes perfect multicollinearity and less than
perfect multicollinearity.
• Wooldridge (2004): High (but not perfect) correlation between
two or more independent variables is called multicollinearity.
• Perfect multicollinearity:
λ1X1 + λ2X2 + ··· + λkXk = 0
where λ1, λ2, ..., λk are constants, not all zero simultaneously.
• Imperfect (less than perfect) multicollinearity:
λ1X1 + λ2X2 + ··· + λkXk + vi = 0
where vi is a stochastic error term.
1. The Nature of Multicollinearity
• A numerical example: let X2 = (10, 15, 18, 24, 30) and X3 = 5X2, so that
5X2 - X3 = 0 exactly (perfect collinearity with λ2 = 5, λ3 = -1).
• Adding a small error term, X3 = 5X2 + vi, gives high but less than
perfect collinearity between X2 and X3.
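The exact linear relation in such an example can be checked computationally. A minimal numpy sketch (an assumption on my part; the lecture's own examples use Stata): when one regressor is an exact multiple of another, the design matrix loses full column rank, which is exactly why OLS breaks down under perfect multicollinearity.

```python
import numpy as np

# Illustrative data: X3 = 5 * X2 exactly, i.e. 5*X2 - X3 = 0
# (lambda2 = 5, lambda3 = -1 in the definition above).
X2 = np.array([10.0, 15.0, 18.0, 24.0, 30.0])
X3 = 5.0 * X2

# Design matrix with an intercept column.
X = np.column_stack([np.ones(5), X2, X3])

# Perfect multicollinearity <=> X has less than full column rank,
# so X'X is singular and the normal equations have no unique solution.
print(np.linalg.matrix_rank(X))   # 2 rather than 3
print(np.linalg.det(X.T @ X))     # near zero (singular up to rounding)
```

Dropping either X2 or X3 restores full rank, which is what regression software effectively does when it omits a perfectly collinear variable.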
2. Estimation in the presence of multicollinearity
• Perfect multicollinearity: the regression coefficients are
indeterminate and their standard errors are infinite; OLS cannot
estimate the individual coefficients at all.
• High (but imperfect) multicollinearity: the coefficients can be
estimated and OLS remains BLUE, but with the following practical
consequences.
1. The OLS estimators have large variances and covariances,
making precise estimation difficult.
2. The confidence intervals tend to be much wider, leading to the
acceptance of the “zero null hypothesis” (i.e., the true population
coefficient is zero) more readily.
3. The t ratio of one or more coefficients tends to be statistically
insignificant.
4. Although the t ratio of one or more coefficients is statistically
insignificant, R2, the overall measure of goodness of fit, can be
very high.
5. The OLS estimators and their standard errors can be sensitive
to small changes in the data.
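Consequences 1 and 5 can be reproduced in a small Monte Carlo sketch (an illustrative numpy simulation, not from the lecture): as the correlation r between two regressors grows, the standard error of each slope inflates roughly by the factor sqrt(1/(1 - r^2)).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

def ols_se_beta1(r):
    # Simulate y = 1 + 2*x1 + 2*x2 + e with corr(x1, x2) = r,
    # then return the usual OLS standard error of the x1 coefficient.
    x1 = rng.standard_normal(n)
    x2 = r * x1 + np.sqrt(1 - r**2) * rng.standard_normal(n)
    y = 1 + 2 * x1 + 2 * x2 + rng.standard_normal(n)
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - 3)                # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)           # OLS covariance matrix
    return np.sqrt(cov[1, 1])

# Standard errors blow up as collinearity increases.
for r in (0.0, 0.9, 0.99):
    print(r, round(ols_se_beta1(r), 3))
```

With r = 0.99 the inflation factor is about 1/sqrt(1 - 0.99^2) ≈ 7, so the same data deliver confidence intervals roughly seven times wider than under orthogonal regressors.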
Example
• Income and wealth together explain about 96 percent of the
variation in consumption expenditure.
• Neither of the slope coefficients is individually
statistically significant.
• Not only is the wealth variable statistically insignificant,
but it also has the wrong sign.
• H0 (the slope coefficients are jointly zero) is rejected
(F = 92.40): consumption expenditure is related to income and wealth.
When collinearity is high, tests on individual regressors are
not reliable.
Example
• Correlation matrix (exper2 = 2*exper, so it is an exact linear
function of exper)

. use "D:\Bai giang\Kinh te luong\datasets\WAGE2.DTA", clear
. gen exper2=exper*2

             educ    exper   exper2   tenure      age     sibs  brthord
    educ   1.0000
   exper  -0.4556*  1.0000
  exper2  -0.4556*  1.0000*  1.0000
  tenure  -0.0362   0.2437*  0.2437*  1.0000
     age  -0.0123   0.4953*  0.4953*  0.2706*  1.0000
    sibs  -0.2393*  0.0643*  0.0643* -0.0392  -0.0407   1.0000
 brthord  -0.2050*  0.0883*  0.0883* -0.0285   0.0054   0.5939*  1.0000
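Besides inspecting the correlation matrix, a standard detection tool is the variance inflation factor, VIF_j = 1/(1 - R_j²), where R_j² comes from the auxiliary regression of regressor j on the others. A numpy sketch with made-up data (not the WAGE2 file):

```python
import numpy as np

def vif(X, j):
    """VIF of column j of X: regress X[:, j] on the remaining columns
    (plus a constant) and return 1 / (1 - R^2)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

# Hypothetical regressors: x2 is a noisy copy of x1, x3 is independent.
rng = np.random.default_rng(1)
x1 = rng.standard_normal(300)
x2 = x1 + 0.1 * rng.standard_normal(300)   # highly collinear with x1
x3 = rng.standard_normal(300)
X = np.column_stack([x1, x2, x3])
print([round(vif(X, j), 1) for j in range(3)])  # x1, x2 large; x3 near 1
```

A common rule of thumb treats VIF above 10 as a sign of serious collinearity; for the perfectly correlated exper/exper2 pair above, the VIF would be infinite.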
Example
• Regression results
. reg lwage educ exper exper2 tenure age sibs brthord
note: exper omitted because of collinearity
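Why Stata omits exper: with exper2 = 2*exper the design matrix is rank-deficient, so only the combination beta_exper + 2*beta_exper2 is identified, not the two coefficients separately. A toy numpy illustration (simulated data, not the WAGE2 file):

```python
import numpy as np

# exper2 = 2 * exper makes the design matrix rank-deficient.
rng = np.random.default_rng(2)
exper = rng.integers(1, 20, size=50).astype(float)
exper2 = 2.0 * exper
y = 3.0 + 0.5 * exper + rng.standard_normal(50)

X = np.column_stack([np.ones(50), exper, exper2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimum-norm solution

# Any (b1, b2) with the same value of b1 + 2*b2 fits the data equally
# well; software must drop one column (as Stata does) to report results.
print(round(b[1] + 2 * b[2], 2))  # close to the true combined effect 0.5
```

lstsq picks one of the infinitely many solutions (the minimum-norm one); the identified combination is stable even though the individual coefficients are arbitrary.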
5. Remedial Measures
• Do nothing
• A priori information
• Combining cross-sectional and time series data
• Dropping a variable(s) and specification bias
• Transformation of variables
• Additional or new data
Do Nothing
• Multicollinearity is essentially a data
deficiency problem and sometimes we have no
choice over the data we have available for
empirical analysis.
A priori information
Ex: Suppose we consider the Cobb-Douglas production function of a
country:

    Qi = A Li^α Ki^β e^ui

or, in log form:

    ln Qi = β0 + α ln Li + β ln Ki + ui

- High correlation between K and L leads to large variances of the
coefficient estimators.
- Based on the findings in the prior literature, we know that the
country has constant returns to scale: α + β = 1.
A priori information
Replacing β with 1 - α, we obtain:

    ln Qi = β0 + α ln Li + (1 - α) ln Ki + ui

or

    ln(Qi/Ki) = β0 + α ln(Li/Ki) + ui

where Qi/Ki is the output-capital ratio and Li/Ki is the
labor-capital ratio. We estimate α̂ from this single-regressor model
and then compute β̂ = 1 - α̂.
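The restricted (a priori) estimation can be sketched in numpy on simulated data (illustrative parameter values, not the lecture's country data): impose α + β = 1, estimate α from the single-regressor model, then recover β from the restriction.

```python
import numpy as np

# Simulated Cobb-Douglas data with constant returns to scale:
# alpha = 0.7, beta = 0.3, so alpha + beta = 1 (illustrative values).
rng = np.random.default_rng(3)
n = 100
lnL = rng.normal(4.0, 0.5, n)
lnK = 0.9 * lnL + 0.1 * rng.standard_normal(n)   # K, L highly correlated
lnQ = 1.0 + 0.7 * lnL + 0.3 * lnK + 0.02 * rng.standard_normal(n)

# Impose alpha + beta = 1: ln(Q/K) = b0 + alpha * ln(L/K) + u.
# The single regressor ln(L/K) sidesteps the K-L collinearity entirely.
y = lnQ - lnK
x = lnL - lnK
Z = np.column_stack([np.ones(n), x])
b0, alpha_hat = np.linalg.lstsq(Z, y, rcond=None)[0]
beta_hat = 1.0 - alpha_hat   # recover beta from the restriction

print(round(alpha_hat, 2), round(beta_hat, 2))
```

The restriction must come from credible outside information; if constant returns to scale is false, the restricted estimates are biased.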
Combining cross-sectional and time series data
• Examine the demand for automobiles with time series data:

    ln Yt = β1 + β2 ln Pt + β3 ln It + ut

• where Y = number of cars sold, P = average price, and I = income.
• In time series, price and income tend to be highly collinear, so β2
and β3 cannot be estimated precisely.
• If the income elasticity β3 can be estimated from cross-sectional
data, form Yt* = ln Yt - β̂3 ln It and regress Yt* on ln Pt to
estimate the price elasticity β2.
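A two-step sketch of this idea in numpy (simulated illustrative data, not an actual automobile study): take the income elasticity as known from cross-sectional evidence, net income out of the dependent variable, and estimate the price elasticity from the time series.

```python
import numpy as np

# Illustrative time series where price and income both trend upward,
# so they are highly collinear (true elasticities: -1.2 and 1.5).
rng = np.random.default_rng(4)
T = 60
lnI = np.linspace(0.0, 1.0, T) + 0.02 * rng.standard_normal(T)
lnP = 0.8 * lnI + 0.05 * rng.standard_normal(T)
lnY = 2.0 - 1.2 * lnP + 1.5 * lnI + 0.05 * rng.standard_normal(T)

# Step 1: income elasticity taken as known from a cross-section study.
beta3_cross = 1.5

# Step 2: net income out of the dependent variable, then regress on price.
y_star = lnY - beta3_cross * lnI
Z = np.column_stack([np.ones(T), lnP])
b1, beta2_hat = np.linalg.lstsq(Z, y_star, rcond=None)[0]
print(round(beta2_hat, 2))   # price elasticity, near the true -1.2
```

The method assumes the cross-sectional elasticity also applies to the time series; if the two populations differ, the imported β̂3 contaminates the price-elasticity estimate.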
• Dropping a variable(s) and specification bias