
Chapter 4B

Multicollinearity
Outline
• The nature of multicollinearity
• Estimation in the presence of multicollinearity.
• Practical consequences
• Detection of multicollinearity
• Remedial measures
1. The Nature of Multicollinearity
• Originally it meant the existence of a “perfect,” or exact,
linear relationship among some or all explanatory variables of
a regression model.
• Today, it includes perfect multicollinearity and less than
perfect multicollinearity.
• Wooldridge (2004): High (but not perfect) correlation between
two or more independent variables is called multicollinearity.
• Perfect multicollinearity:
λ1X1 + λ2X2 + · · · + λkXk = 0
where λ1, λ2, ..., λk are constants that are not all zero simultaneously.
• Imperfect (less than perfect) multicollinearity:
λ1X1 + λ2X2 + · · · + λkXk + vi = 0
where vi is a stochastic error term.
1. The Nature of Multicollinearity

• A numerical example:

• X3i = 5X2i → there is perfect collinearity between X2 and X3.

• The variable X*3 was created from X3 by simply adding to it the numbers vi = 2, 0, 7, 9, 2. Now there is no longer perfect collinearity between X2 and X*3. However, the two variables are highly correlated: the coefficient of correlation between them is 0.9959.
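A minimal Stata sketch of this construction (the data follow Gujarati's textbook version of the example, which these slides appear to be based on; the added numbers 2, 0, 7, 9, 2 are those quoted on the slide):

* build X3 as an exact multiple of X2, then break the exact relation with v
clear
input x2 x3 v
10  50 2
15  75 0
18  90 7
24 120 9
30 150 2
end
gen x3star = x3 + v       // X3* = X3 + v
corr x2 x3 x3star         // corr(x2, x3) = 1; corr(x2, x3star) ≈ 0.9959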
2. Estimation in the presence of multicollinearity
Perfect multicollinearity

• Suppose X3i = λX2i, where λ is a nonzero constant (perfect collinearity).
• Substituting this into the OLS formulas makes both the numerator and the denominator of β̂2 (and β̂3) equal to zero → the estimators are indeterminate: OLS cannot separate the individual effects of X2 and X3.

2. Estimation in the presence of multicollinearity
High multicollinearity
• In the model Yi = β1 + β2X2i + β3X3i + ui, the variances and covariance of β̂2 and β̂3 are given by
var(β̂2) = σ² / [Σx2i²(1 − r23²)]
var(β̂3) = σ² / [Σx3i²(1 − r23²)]
cov(β̂2, β̂3) = −r23σ² / [(1 − r23²)√(Σx2i²)√(Σx3i²)]
where r23 is the coefficient of correlation between X2 and X3 and the lowercase x's denote deviations from sample means.
• As r23 tends toward 1, i.e., as collinearity increases, the variances and covariance of the estimators increase.
• Perfect collinearity: r23 = 1 and the variances are infinite.
2. Estimation in the presence of multicollinearity

• The speed with which variances and covariances increase can be seen with the variance-inflating factor (VIF), which is defined as
VIF = 1 / (1 − r23²)
• Using this definition, we can express
var(β̂2) = (σ² / Σx2i²) · VIF and var(β̂3) = (σ² / Σx3i²) · VIF
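• A quick numerical illustration (arithmetic added here, not taken from the slide): if r23 = 0.95, then VIF = 1/(1 − 0.95²) = 1/0.0975 ≈ 10.3, so the variances of β̂2 and β̂3 are roughly ten times what they would be if X2 and X3 were uncorrelated; with r23 = 0.99 the factor rises to about 50.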


3. Practical consequences of Multicollinearity

High multicollinearity
1. The OLS estimators have large variances and covariances,
making precise estimation difficult.
2. The confidence intervals tend to be much wider, leading to the
acceptance of the “zero null hypothesis” (i.e., the true population
coefficient is zero) more readily.
3. The t ratio of one or more coefficients tends to be statistically
insignificant.
4. Although the t ratio of one or more coefficients is statistically
insignificant, R2, the overall measure of goodness of fit, can be
very high.
5. The OLS estimators and their standard errors can be sensitive
to small changes in the data.
Example
• Income and wealth together explain about 96 percent of the variation in consumption expenditure.
• Neither of the slope coefficients is individually statistically significant.
• Not only is the wealth variable statistically insignificant, but it also has the wrong sign.
• H0: β2 = β3 = 0 is rejected (F = 92.40) → consumption expenditure is related to income and wealth.
→ When collinearity is high, tests on individual regressors are not reliable.
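A minimal Stata sketch of this diagnosis (the dataset and the variable names cons, income, and wealth are hypothetical, chosen only to mirror the example):

* jointly significant regressors, individually insignificant t ratios
use consumption.dta, clear
regress cons income wealth    // high overall F, low individual t ratios
test income wealth            // joint test of H0: both slope coefficients are zero
estat vif                     // large VIFs confirm the collinearity between income and wealth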
Example
• Correlation matrix
. use "D:\Bai giang\Kinh te luong\datasets\WAGE2.DTA", clear

. gen exper2=exper*2

. pwcorr educ exper exper2 tenure age sibs brthord, star(0.05)

educ exper exper2 tenure age sibs brthord

educ 1.0000
exper -0.4556* 1.0000
exper2 -0.4556* 1.0000* 1.0000
tenure -0.0362 0.2437* 0.2437* 1.0000
age -0.0123 0.4953* 0.4953* 0.2706* 1.0000
sibs -0.2393* 0.0643* 0.0643* -0.0392 -0.0407 1.0000
brthord -0.2050* 0.0883* 0.0883* -0.0285 0.0054 0.5939* 1.0000
Example
• Regression results
. reg lwage educ exper exper2 tenure age sibs brthord
note: exper omitted because of collinearity

Source SS df MS Number of obs = 852


F( 6, 845) = 28.37
Model 24.6147323 6 4.10245539 Prob > F = 0.0000
Residual 122.201332 845 .144616961 R-squared = 0.1677
Adj R-squared = 0.1617
Total 146.816065 851 .172521815 Root MSE = .38029

lwage Coef. Std. Err. t P>|t| [95% Conf. Interval]

educ .0675684 .0071676 9.43 0.000 .0534999 .0816368


exper 0 (omitted)
exper2 .0047813 .0020189 2.37 0.018 .0008186 .0087439
tenure .0119621 .0026717 4.48 0.000 .0067181 .0172062
age .0098871 .0050935 1.94 0.053 -.0001103 .0198845
sibs -.0067073 .0071707 -0.94 0.350 -.0207818 .0073672
brthord -.0134547 .0102047 -1.32 0.188 -.0334841 .0065748
_cons 5.406201 .1685784 32.07 0.000 5.07532 5.737083
Example
• Regression results without sibs (here and below, exper is used in place of exper2)
. reg lwage educ exper tenure age brthord

Source SS df MS Number of obs = 852


F( 5, 846) = 33.87
Model 24.4882037 5 4.89764073 Prob > F = 0.0000
Residual 122.327861 846 .14459558 R-squared = 0.1668
Adj R-squared = 0.1619
Total 146.816065 851 .172521815 Root MSE = .38026

lwage Coef. Std. Err. t P>|t| [95% Conf. Interval]

educ .0684463 .0071054 9.63 0.000 .0545001 .0823926


exper .0096979 .0040349 2.40 0.016 .0017782 .0176175
tenure .0120242 .0026707 4.50 0.000 .0067822 .0172662
age .0099569 .0050926 1.96 0.051 -.0000387 .0199524
brthord -.018937 .008353 -2.27 0.024 -.035332 -.002542
_cons 5.383053 .1667398 32.28 0.000 5.055781 5.710325
Example
• Regression results without brthord
. reg lwage educ exper tenure age sibs

Source SS df MS Number of obs = 935


F( 5, 929) = 35.96
Model 26.8611795 5 5.37223589 Prob > F = 0.0000
Residual 138.795104 929 .149402695 R-squared = 0.1622
Adj R-squared = 0.1576
Total 165.656283 934 .177362188 Root MSE = .38653

lwage Coef. Std. Err. t P>|t| [95% Conf. Interval]

educ .0683854 .0069057 9.90 0.000 .0548328 .0819381


exper .0114801 .0039255 2.92 0.004 .0037763 .019184
tenure .0124307 .0026148 4.75 0.000 .0072992 .0175622
age .0086558 .0049387 1.75 0.080 -.0010366 .0183481
sibs -.0121724 .0056602 -2.15 0.032 -.0232807 -.0010642
_cons 5.384746 .1630053 33.03 0.000 5.064844 5.704647
4. Detection of Multicollinearity

• High R2 but few significant t ratios. If R2 is high,


say, in excess of 0.8, the F test in most cases will
reject the hypothesis that the partial slope
coefficients are simultaneously equal to zero, but the
individual t tests will show that none or very few of
the partial slope coefficients are statistically
different from zero.
4. Detection of Multicollinearity

• High pair-wise correlations among regressors: a rule of thumb says that if the pair-wise correlation between two regressors is high, say in excess of 0.8, then multicollinearity is a serious problem.
→ This is a sufficient but not a necessary condition: in models involving more than two explanatory variables, the pair-wise correlations will not provide an infallible guide to the presence of multicollinearity.
• Auxiliary regressions: regress each Xi on the remaining X variables and compute the corresponding R². If the computed F of an auxiliary regression exceeds the critical F at the chosen level of significance, the particular Xi is taken to be collinear with the other X's (see the sketch below).
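A minimal Stata sketch of an auxiliary regression, using the WAGE2 regressors shown earlier (the VIF computed from its R² is shown only to illustrate the procedure):

* regress one explanatory variable on the remaining ones
regress educ exper tenure age sibs brthord
display "R-squared = " e(r2) "  F = " e(F) "  VIF = " 1/(1 - e(r2))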
4. Detection of Multicollinearity

• Tolerance and variance inflation factor: for regressor Xj,
VIFj = 1 / (1 − R²j)
where R²j is the R² from the auxiliary regression of Xj on the other regressors. If the VIF of a variable exceeds 10, which happens when R²j exceeds 0.90, that variable is said to be highly collinear.
• The inverse of the VIF is called tolerance (TOL). That is,
TOLj = 1 / VIFj = 1 − R²j
When R²j = 1 (i.e., perfect collinearity), TOLj = 0.
When R²j = 0 (i.e., no collinearity whatsoever), TOLj = 1.
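In Stata, VIFs and tolerances are available directly after a regression; a sketch for the wage equation used earlier (exper2 is left out so that the perfectly collinear pair does not enter):

regress lwage educ exper tenure age sibs brthord
estat vif        // reports VIF and 1/VIF (tolerance) for each regressor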
5. Remedial Measures

• Do Nothing:
• A priori information.
• Combining cross-sectional and time series data.
• Dropping a variable(s) and specification bias
• Transformation of variables
• Additional or new data.
Do Nothing
• Multicollinearity is essentially a data
deficiency problem and sometimes we have no
choice over the data we have available for
empirical analysis.
A priori information
Ex: Suppose we consider the Cobb-Douglas production function of a country:
Yt = A Lt^α Kt^β e^(ut)
or, in log form:
lnYt = lnA + α lnLt + β lnKt + ut
- High correlation between K and L leads to large variances of the coefficient estimators.
- Based on findings in the prior literature, we know that the country has constant returns to scale: α + β = 1.
A priori information
Replacing β with 1 − α, we obtain:
lnYt = lnA + α lnLt + (1 − α) lnKt + ut
or
ln(Yt/Kt) = lnA + α ln(Lt/Kt) + ut
where Yt/Kt is the output-capital ratio and Lt/Kt is the labour-capital ratio.
We estimate α̂ from this regression and then compute β̂ = 1 − α̂.
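A minimal Stata sketch of estimation under the constraint (the variable names Y, L, and K are hypothetical):

* impose constant returns to scale by estimating the ratio form
gen ln_y_k = ln(Y/K)            // log of the output-capital ratio
gen ln_l_k = ln(L/K)            // log of the labour-capital ratio
regress ln_y_k ln_l_k           // the slope estimate is alpha-hat
display "beta-hat = " 1 - _b[ln_l_k]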
Combining cross-sectional and time series data
• Examine the demand for automobiles:
lnYt = β1 + β2 lnPt + β3 lnIt + ut
where Y = number of cars sold, P = average price, I = income, t = time.
• We want to estimate the price elasticity β2 and the income elasticity β3.
• In time series data, the price and income variables tend to be highly collinear.
Combining cross-sectional and time series data
• If we have cross-sectional data, for example data generated by consumer panels or budget studies conducted by various private and governmental agencies, we can obtain a fairly reliable estimate of the income elasticity β3 (call it β̂3), because prices do not vary much in a cross section.
• Time series regression:
Y*t = β1 + β2 lnPt + ut
where Y*t = lnYt − β̂3 lnIt, i.e., Y adjusted for the income effect using the cross-sectional estimate β̂3.
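A hypothetical Stata sketch of the two-step procedure (the variable names Y, P, I and the value 0.9 for the cross-sectional income elasticity are purely illustrative):

* step 2: purge the income effect with the cross-sectional estimate,
* then estimate the price elasticity from the time series
scalar b3_cross = 0.9                    // income elasticity taken from a cross-sectional study
gen ystar = ln(Y) - b3_cross*ln(I)       // Y adjusted for the income effect
gen lnP = ln(P)
regress ystar lnP                        // the slope estimate is the price elasticity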
Dropping a variable(s) and specification bias
- When we drop the wealth variable, the income variable becomes highly significant.
- But we may be committing a specification bias or specification error. Economic theory says that both income and wealth should be included in the model explaining consumption expenditure, so dropping the wealth variable would constitute specification bias.
- The remedy may be worse than the disease: multicollinearity may prevent precise estimation of the parameters, whereas omitting a variable may seriously mislead us as to the true values of the parameters.
Transformation of variables - first difference form
Regression model:
Yt = β1 + β2X2t + β3X3t + ut
It must also hold at time (t − 1):
Yt−1 = β1 + β2X2,t−1 + β3X3,t−1 + ut−1
Subtracting, we obtain the first-difference form:
ΔYt = β2ΔX2t + β3ΔX3t + vt, where vt = ut − ut−1
→ The first-difference regression model often reduces the severity of multicollinearity.
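A minimal Stata sketch of the first-difference regression (the variable names y, x2, x3 and the time variable t are hypothetical):

tsset t                              // declare the time variable
regress D.y D.x2 D.x3, noconstant    // regress delta-Y on delta-X2 and delta-X3; the derived model has no intercept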
Additional or new data
• Increasing the size of the sample may attenuate the collinearity problem.
• As the sample size increases, Σx2i² will generally increase → var(β̂2) = σ² / [Σx2i²(1 − r23²)] will decrease.
