MULTICOLLINEARITY
BASIC ECONOMETRICS
INTRODUCTION
• Means the existence of a “perfect”, or exact, linear
relationship among some or all explanatory variables of a
regression model (Ragnar Frisch, 1934).
• But today, the term multicollinearity is used in a broader sense
to include the case where the X variables are intercorrelated but
not perfectly.
• Multicollinearity (M-C) violates Assumption 10 of the CLRM,
which states that there is no M-C among the regressors included
in the regression model.
• Why does the CLRM assume there is no M-C among the X’s?
• Reason:
– If M-C is perfect, the regression coefficients of the X
variables are indeterminate and their standard errors are infinite.
– If M-C is less than perfect, the regression coefficients possess
large standard errors (in relation to the coefficients themselves),
which means the coefficients cannot be estimated with great
precision or accuracy.
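• A quick way to see the perfect case in Stata (a sketch with
hypothetical variables y and x2; x3 is generated as an exact
multiple of x2): Stata cannot separate the two coefficients and
drops one regressor rather than report infinite standard errors.
. gen x3 = 2*x2
. regress y x2 x3
Stata responds with a note that x3 was omitted because of
collinearity.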
SOURCES OF M-C
• The data collection method employed. Sampling over a
limited range of the values taken by the regressors in the
population.
• Constraints on the model or in the population being
sampled. Example: regression of electricity consumption on income
(X2) and house size (X3) – families with higher incomes
generally have larger homes.
• Model specification. Example: adding polynomial terms to a
regression model, especially when the range of the X variable
is small.
• An overdetermined model. This happens when the model has
more explanatory variables than the number of observations.
CONSEQUENCES OF M-C
1. Although BLUE, the OLS estimators have large variances and
covariances, making precise estimation difficult.
2. Because of (1), the confidence intervals tend to be much
wider, leading to the acceptance of the “zero null hypothesis” (H0).
3. Also because of (1), the t ratios of one or more coefficients
tend to be statistically insignificant.
4. Although the t ratios are statistically insignificant, R², the
overall measure of goodness of fit, can be very high.
5. The OLS estimators and their standard errors can be
sensitive to small changes in the data.
(refer to Gujarati for a more detailed discussion)
DETECTION OF M-C
• High R² but few significant t ratios.
– This is the classic symptom of M-C.
– If R² is high, say R² > 0.8, the F test will in most cases reject the
hypothesis that the slope coefficients are simultaneously equal to zero.
– But the individual t tests show that none or very few of the coefficients
are statistically different from zero.
• High pair-wise correlations among regressors.
– The pair-wise or zero-order correlation coefficient between two
regressors is high (> 0.8).
– But high zero-order correlations are a sufficient, not a necessary,
condition for the existence of M-C, because it can exist even when
the zero-order or simple correlations are comparatively low.
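– As a quick check in Stata (x2, x3 and x4 are placeholder
regressor names), the pairwise correlation matrix flags pairs
above the 0.8 rule of thumb, bearing in mind the caveat above:
. pwcorr x2 x3 x4, sig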
• Examination of partial correlations.
– In the regression of Y on X2, X3 and X4, a finding that R²1.234 is very high
but r²12.34, r²13.24 and r²14.23 are comparatively low may suggest
that the variables X2, X3 and X4 are highly intercorrelated.
– Although a study of the partial correlations may be useful, there is
no guarantee that they will provide an infallible guide to M-C, for it may
happen that both R² and all the partial correlations are sufficiently high.
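– In Stata, the pcorr command reports the partial correlations of
the first listed variable with the others, holding the rest
constant (placeholder names again); run it once per regressor:
. pcorr x2 x3 x4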
• Variance inflation factor (VIF).
– Measures M-C by regressing one independent variable on all of the
remaining independent variables.
– If we have 3 independent variables X1, X2 and X3, to use the VIF to look
for possible M-C we run 3 regressions, one for each independent variable.
– We would run the following three auxiliary regressions:
X1t = a1 + a2 X2t + a3 X3t + e1t
X2t = b1 + b2 X1t + b3 X3t + e2t
X3t = c1 + c2 X1t + c3 X2t + e3t
– Next, for each of these regressions, we calculate the VIF using this
formula:
VIFj = 1 / (1 − R²j)
– where R²j is the unadjusted R² from the auxiliary regression using Xj as
the dependent variable (not from the original regression that we suspect
has M-C).
– The idea behind the VIF: if the R²j in the equation above is high, the
variances of the slope estimates (and their standard errors) will also be
high, or inflated.
– Some researchers believe that a VIF greater than 4 indicates a serious
M-C problem.
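– A minimal sketch of this calculation in Stata, assuming three
regressors x1, x2 and x3 (regress stores the unadjusted R² in e(r2)):
. regress x1 x2 x3
. display "VIF for x1 = " 1/(1 - e(r2))
. regress x2 x1 x3
. display "VIF for x2 = " 1/(1 - e(r2))
. regress x3 x1 x2
. display "VIF for x3 = " 1/(1 - e(r2))
– This is what Stata’s estat vif automates after a regression, as
shown in the diagnostic section below.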
REMEDIAL MEASURES
• Do nothing.
– Blanchard (1967): M-C is essentially a data deficiency problem, and
sometimes we have no choice over the data we have available for
empirical analysis.
– Not all the coefficients need be statistically insignificant. Even if we
cannot estimate one or more regression coefficients with great precision, a
linear combination of them can be estimated relatively efficiently.
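– Stata’s lincom command illustrates the last point (y, x1 and x2
are placeholders): even when the individual coefficients have large
standard errors, the standard error of their sum can be much smaller.
. regress y x1 x2
. lincom x1 + x2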
• Combining cross-sectional and time series data.
– Known as pooling the data.
– Assume we have time series data on the number of cars sold, the average
price of a car and consumer income:
ln Yt = β1 + β2 ln Pt + β3 ln It + ut
– where Y = number of cars sold, P = average price, I = income and t = time.
– In time series data, price and income generally tend to be highly
collinear.
– If we run this regression, we face the M-C problem.
– Tobin (1950) suggested that if we have cross-sectional data, we can obtain a
fairly reliable estimate of the income elasticity β3 because in such data,
which are at a point in time, prices do not vary much.
• Dropping a variable(s) and specification bias.
– One of the ‘simplest’ things to do is to drop one of the collinear
variables.
– But in dropping a variable we may be committing a specification bias, or
specification error.
– Specification bias arises from an incorrect specification of the model used
in the analysis.
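– In Stata this remedy is a one-line change (placeholder names; the
specification-bias caveat above still applies):
. regress y x1 x2
. regress y x1
– The second regression drops x2; it is justified only if theory says
x2 does not belong in the model.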
• Transformation of variables.
– Suppose we have time series data on consumption expenditure (Y), income
(X1) and wealth (X2):
Yt = β1 + β2 X1t + β3 X2t + ut
– Income and wealth show high collinearity – over time, both variables
tend to move in the same direction.
• At time t−1,
Yt−1 = β1 + β2 X1,t−1 + β3 X2,t−1 + ut−1
• Subtracting the t−1 equation from the original equation, we obtain
Yt − Yt−1 = β2(X1t − X1,t−1) + β3(X2t − X2,t−1) + vt
• where vt = ut − ut−1.
• The equation above is known as the first difference form because we run
the regression not on the original variables, but on their differences.
• The first difference model often reduces the severity of M-C
because there is no a priori reason to expect that the differences will
also be highly correlated.
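• Once the data are tsset, Stata’s difference operator D. estimates
the first difference form directly (y, x1 and x2 are placeholder names):
. tsset t
. regress D.y D.x1 D.x2
• Note that differencing costs one observation, and the new error term
vt = ut − ut−1 may itself be serially correlated, so this transformation
is not costless.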
• Additional or new data.
– Since M-C is a sample feature, it is possible that in another sample
involving the same variables, collinearity may not be as serious as in the
first sample.
– Sometimes, simply increasing the sample size may attenuate the
collinearity problem.
DIAGNOSTIC TEST
THE MODEL ESTIMATED
• Let the model we want to estimate be as below:
ln 𝑃𝐶𝐸𝑡 = 𝛼0 + 𝛼1 ln 𝑃𝐷𝐼𝑡 + 𝛼2 ln 𝐺𝐷𝑃𝑡 + 𝑢𝑡 (1)
where:
PCE = personal consumption expenditure (billions of 1987 dollars)
PDI = personal disposable income (billions of 1987 dollars)
GDP = gross domestic product (billions of 1987 dollars)
CHANGE DATA INTO LOG FORM
• Because the model is in log form, we must transform the data
by creating new variables in log form in Stata.
. gen lngdp = log(gdp)
. gen lnpdi = log(pdi)
. gen lnpce = log(pce)
• Now we have created new variables, namely lngdp, lnpdi and
lnpce, in log form.
• Make sure you have created the time variable t (when using
time series data), then at the Stata command line:
. tsset t
time variable: t, 1 to 88
delta: 1 unit
REGRESS THE MODEL
• Now, based on the model in Eq. (1), we will perform an OLS
regression. (The constant is included by default.)
. regress lnpce lnpdi lngdp
      Source |       SS       df       MS              Number of obs =      88
-------------+------------------------------           F(  2,    85) =15955.37
       Model |  2.93006108     2  1.46503054           Prob > F      =  0.0000
    Residual |  .007804743    85  .000091821           R-squared     =  0.9973
-------------+------------------------------           Adj R-squared =  0.9973
       Total |  2.93786582    87  .033768573           Root MSE      =  .00958

------------------------------------------------------------------------------
       lnpce |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lnpdi |   .4871041   .0601343     8.10   0.000      .367541    .6066672
       lngdp |   .6091764   .0636003     9.58   0.000     .4827219    .7356309
       _cons |  -1.060787   .0689796   -15.38   0.000    -1.197937   -.9236368
------------------------------------------------------------------------------
MULTICOLLINEARITY TEST
• We will now test whether multicollinearity exists in our model
by using the VIF test.
. estat vif
    Variable |      VIF       1/VIF
-------------+----------------------
       lngdp |   102.35    0.009770
       lnpdi |   102.35    0.009770
-------------+----------------------
    Mean VIF |   102.35
• Our VIF = 102.35, which indicates serious multicollinearity
(far above the threshold of 4 mentioned earlier).
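• As a cross-check, the VIF can be reproduced by hand from the
auxiliary regression of one log regressor on the other, using the
formula from the detection section; with two regressors, both VIFs
are identical:
. regress lngdp lnpdi
. display 1/(1 - e(r2))
• This should return approximately 102.35, matching estat vif.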