
CH 10

MULTICOLLINEARITY: WHAT HAPPENS IF THE REGRESSORS ARE CORRELATED?

An assumption of the classical linear regression model (CLRM) is that there is no multicollinearity among the regressors included in the regression model. In this chapter we take a critical look at this assumption by seeking answers to the following questions:
1. What is the nature of multicollinearity?
2. Is multicollinearity really a problem?
3. What are its practical consequences?
4. How does one detect it?
5. What remedial measures can be taken to alleviate the problem of multicollinearity?
THE NATURE OF MULTICOLLINEARITY
The term multicollinearity is due to Ragnar Frisch. Originally it meant the existence of a
“perfect,” or exact, linear relationship among some or all explanatory variables of a regression
model. For the k-variable regression involving explanatory variables X1, X2, . . . , Xk (where X1 = 1
for all observations to allow for the intercept term), an exact linear relationship is said to exist if
the following condition is satisfied:
λ1X1 + λ2X2 + · · · + λkXk = 0        (1)

where λ1, λ2, . . . , λk are constants such that not all of them are zero simultaneously.

As a quick illustration, consider the following small data set, in which X3 = 3X2 exactly, X5 = 2X2 + Ui (Ui a random number), and X4 = X2²:

X2    X3 = 3X2    X5 = 2X2 + Ui    X4 = X2²
 2        6             5               4
 3        9             8               9
 9       27            23              81

Here r23 = 1 (a perfect linear relationship), while r25 = 0.9997 and r24 = 0.9972 (highly linear, but not perfect, relationships).

Today, however, the term multicollinearity is used in a broader sense to include the case of
perfect multicollinearity, as shown by (1), as well as the case where the X variables are
intercorrelated but not perfectly so, as follows:
λ1X1 + λ2X2 + · · · + λkXk + vi = 0
where vi is a stochastic error term.

As a numerical example, consider the following hypothetical data:


It is apparent that X3i = 5X2i . Therefore, there is perfect collinearity between X2 and X3 since the
coefficient of correlation r23 is unity. The variable X*3 was created from X3 by simply adding to it
the following numbers, which were taken from a table of random numbers: 2, 0, 7, 9, 2. Now
there is no longer perfect collinearity between X2 and X*3. However, the two variables are highly
correlated because calculations will show that the coefficient of correlation between them is
0.9959.
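To make the distinction concrete, here is a minimal Python sketch (using made-up numbers, not the chapter's table) showing that an exact relationship such as X3 = 5X2 yields a correlation of exactly 1, while adding small random numbers pushes the correlation just below 1:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values, chosen only for illustration
X2 = np.array([4.0, 7.0, 11.0, 16.0, 22.0])
X3 = 5 * X2                                        # exact relationship: X3 = 5 * X2
X3_star = X3 + rng.integers(0, 10, size=X2.size)   # add small random numbers to X3

print(np.corrcoef(X2, X3)[0, 1])       # exactly 1: perfect collinearity
print(np.corrcoef(X2, X3_star)[0, 1])  # just below 1: high but imperfect collinearity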

If multicollinearity is perfect, as between X2 and X3 above, the regression coefficients of the X variables are indeterminate and their standard errors are infinite. If multicollinearity is less than perfect, as between X2 and X*3, the regression coefficients, although determinate, possess large standard errors (in relation to the coefficients themselves), which means the coefficients cannot be estimated with great precision or accuracy.
There are several sources of multicollinearity. As Montgomery and Peck note, multicollinearity may be due to the following factors:
1. The data collection method employed, for example, sampling over a limited range of the
values taken by the regressors in the population.
2. Constraints on the model or in the population being sampled. For example, in the regression
of electricity consumption on income (X2) and house size (X3) there is a physical constraint in
the population in that families with higher incomes generally have larger homes than families
with lower incomes.
3. Model specification, for example, adding polynomial terms to a regression model, especially when the range of the X variable is small, as in Y = b1 + b2X + b3X² + b4X³ + ui.

4. An overdetermined model. This happens when the model has more explanatory variables than
the number of observations.

PRACTICAL CONSEQUENCES OF MULTICOLLINEARITY


In cases of near or high multicollinearity, one is likely to encounter the following consequences:
1. Although BLUE, the OLS estimators have large variances and covariances, making precise
estimation difficult.
2. Because of consequence 1, the confidence intervals tend to be much wider, leading to the
acceptance of the “zero null hypothesis” (i.e., the true population coefficient is zero) more
readily.
3. Also because of consequence 1, the t ratio of one or more coefficients tends to be statistically
insignificant.
4. Although the t ratio of one or more coefficients is statistically insignificant, R2, the overall
measure of goodness of fit, can be very high.
5. The OLS estimators and their standard errors can be sensitive to small changes in the data.
Large Variances and Covariances of OLS Estimators
To see this, recall that for the two-regressor model Yi = β1 + β2X2i + β3X3i + ui the variances and covariances of βˆ2 and βˆ3 are given by

var(βˆ2) = σ² / [Σx2i² (1 − r23²)]
var(βˆ3) = σ² / [Σx3i² (1 − r23²)]
cov(βˆ2, βˆ3) = −r23 σ² / [(1 − r23²) √(Σx2i²) √(Σx3i²)]

where r23 is the coefficient of correlation between X2 and X3 and the lowercase x's denote deviations from sample means. Since a coefficient's t ratio is t = βˆ/se(βˆ), an inflated standard error translates directly into a depressed t ratio.

The speed with which variances and covariances increase can be seen with the variance-inflating factor (VIF), which is defined as

VIF = 1 / (1 − r23²)

The VIF shows how the variance of an estimator is inflated by the presence of multicollinearity. As r23² approaches 1, the VIF approaches infinity. For example, r23 = 0.90 gives VIF ≈ 5.3, while r23 = 0.99 gives VIF ≈ 50.
For the k-variable model this generalizes to

var(βˆj) = (σ² / Σxj²) · VIFj

where Σxj² is the sum of squared deviations of Xj about its mean. As this expression shows, var(βˆj) is proportional to σ² and to VIFj but inversely proportional to Σxj². Thus, whether var(βˆj) is large or small depends on three ingredients: (1) σ², (2) VIFj, and (3) Σxj². The last of these, which ties in with Assumption 8 of the classical model, says that the larger the variability in a regressor, the smaller the variance of that regressor's coefficient, holding the other two ingredients constant, and therefore the greater the precision with which that coefficient can be estimated.
Before proceeding further, it may be noted that the inverse of the VIF is called tolerance (TOL). That is,

TOLj = 1 / VIFj = 1 − Rj²

where Rj² is the coefficient of determination in the regression of Xj on the remaining regressors. When Rj² = 1 (i.e., perfect collinearity), TOLj = 0, and when Rj² = 0 (i.e., no collinearity whatsoever), TOLj = 1. Because of the intimate connection between VIF and TOL, one can use them interchangeably.
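As a small illustrative sketch (not part of the chapter), the definitions above translate directly into a helper that converts an auxiliary-regression Rj² into VIFj and TOLj:

def vif_and_tol(r_squared_j):
    """Return (VIF_j, TOL_j) given the auxiliary-regression R^2_j."""
    if r_squared_j >= 1.0:
        return float("inf"), 0.0        # perfect collinearity: variance blows up
    vif = 1.0 / (1.0 - r_squared_j)     # VIF_j = 1 / (1 - R^2_j)
    return vif, 1.0 / vif               # TOL_j = 1 / VIF_j = 1 - R^2_j

print(vif_and_tol(0.0))     # (1.0, 1.0): no collinearity at all
print(vif_and_tol(0.90))    # about (10, 0.10): the usual warning threshold
print(vif_and_tol(0.99))    # about (100, 0.01): severe collinearity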

To illustrate the various points made thus far, let us reconsider the consumption–income example. In the accompanying table we reproduce the earlier consumption–income data and add data on the wealth of the consumer. If we assume that consumption expenditure is linearly related to income and wealth, we obtain the corresponding regression of consumption on income and wealth.

That regression shows that income and wealth together explain about 96 percent of the variation in consumption expenditure, and yet neither of the slope coefficients is individually statistically significant. Moreover, the wealth variable is not only statistically insignificant but also carries the wrong sign.

DETECTION OF MULTICOLLINEARITY
Having studied the nature and consequences of multicollinearity, the natural question is: How does one know that collinearity is present in any given situation, especially in models involving more than two explanatory variables? Here it is useful to bear in mind Kmenta's warning:
1. Multicollinearity is a question of degree and not of kind. The meaningful distinction is not between the presence and the absence of multicollinearity, but between its various degrees.
2. Since multicollinearity refers to the condition of the explanatory variables, which are assumed to be nonstochastic, it is a feature of the sample and not of the population. Therefore, we do not "test for multicollinearity" but can, if we wish, measure its degree in any particular sample.
Since multicollinearity is essentially a sample phenomenon, arising out of the largely nonexperimental data collected in most social sciences, we do not have one unique method of detecting it or measuring its strength. What we have are some rules of thumb, some informal and some formal, but rules of thumb all the same. We now consider some of these rules.

1. High R2 but few significant t ratios. As noted, this is the “classic” symptom of
multicollinearity. If R2 is high, say, in excess of 0.8, the F test in most cases will reject the
hypothesis that the partial slope coefficients are simultaneously equal to zero, but the individual t
tests will show that none or very few of the partial slope coefficients are statistically different
from zero.
Example
The data are time series for the years 1947–1962 and pertain to Y = number of people employed, in thousands; X1 = GNP implicit price deflator; X2 = GNP, in millions of dollars; X3 = number of people unemployed, in thousands; X4 = number of people in the armed forces; X5 = noninstitutionalized population over 14 years of age; and X6 = year, equal to 1 in 1947, 2 in 1948, and so on up to 16 in 1962.

Yt = β1 + β2X1t + β3X2t + β4X3t + β5X4t + β6X5t + β7X6t + ut
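As a sketch of how this regression might be reproduced, assuming the statsmodels package is available (it distributes a copy of the Longley data set):

import statsmodels.api as sm

# The Longley employment data: annual observations, 1947-1962 (n = 16)
data = sm.datasets.longley.load_pandas()
y = data.endog                    # total employment
X = sm.add_constant(data.exog)    # deflator, GNP, unemployment, armed forces, population, year

fit = sm.OLS(y, X).fit()
print(fit.summary())              # a very high R-squared, yet several insignificant t ratios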


2. High pair-wise correlations among regressors. Another suggested rule of thumb is that if
the pair-wise or zero-order correlation coefficient between two regressors is high, say, in excess
of 0.8, then multicollinearity is a serious problem. The problem with this criterion is that,
although high zero-order correlations may suggest collinearity, it is not necessary that they be
high to have collinearity in any specific case.

As you can see from the correlation matrix of these regressors, several of the pair-wise correlations are quite high, suggesting that there may be a severe collinearity problem. Of course, remember the warning given earlier that such pair-wise correlations may be a sufficient but not a necessary condition for the existence of multicollinearity.
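Assuming the statsmodels copy of the Longley data from the sketch above, the zero-order correlation matrix of the regressors can be inspected in one line with pandas:

import statsmodels.api as sm

exog = sm.datasets.longley.load_pandas().exog
print(exog.corr().round(3))   # several pair-wise correlations are well above the 0.8 rule of thumb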
3. Examination of partial correlations. Because of the problem just mentioned in relying on zero-order correlations, Farrar and Glauber have suggested that one should look at the partial correlation coefficients. Thus, in the regression of Y on X2, X3, and X4, a finding that R²1.234 is very high but r²12.34, r²13.24, and r²14.23 are comparatively low may suggest that the variables X2, X3, and X4 are highly intercorrelated and that at least one of these variables is superfluous. Although a study of the partial correlations may be useful, there is no guarantee that they will provide an infallible guide to multicollinearity, for it may happen that both R² and all the partial correlations are sufficiently high.

4. Auxiliary regressions. Since multicollinearity arises because one or more of the regressors are exact or approximate linear combinations of the other regressors, one way of finding out which X variable is related to the other X variables is to regress each Xj on the remaining X variables and compute the corresponding coefficient of determination, Rj²; each of these regressions is called an auxiliary regression. One may then adopt Klein's rule of thumb, which suggests that multicollinearity may be a troublesome problem only if the R² obtained from an auxiliary regression is greater than the overall R², that is, the one obtained from the regression of Y on all the regressors. Of course, like all other rules of thumb, this one should be used judiciously.

5. Tolerance and variance inflation factor. We have already introduced TOL and VIF. As Rj², the coefficient of determination in the regression of regressor Xj on the remaining regressors in the model, increases toward unity, that is, as the collinearity of Xj with the other regressors increases, VIF also increases, and in the limit it can be infinite. Some authors therefore use the VIF as an indicator of multicollinearity: the larger the value of VIFj, the more "troublesome" or collinear the variable Xj. As a rule of thumb, if the VIF of a variable exceeds 10, which will happen if Rj² exceeds 0.90, that variable is said to be highly collinear. Of course, one could also use TOLj as a measure of multicollinearity in view of its intimate connection with VIFj. The closer TOLj is to zero, the greater the degree of collinearity of that variable with the other regressors; on the other hand, the closer TOLj is to 1, the greater the evidence that Xj is not collinear with the other regressors.
To shed further light on the nature of the multicollinearity problem in the Longley data, let us run the auxiliary regressions, that is, the regression of each X variable on the remaining X variables. To save space, only the R² values obtained from these regressions are reported. Since the R² values in the auxiliary regressions are very high (with the possible exception of the regression of X4 on the remaining X variables), it seems that we do have a serious collinearity problem. The same information is obtained from the tolerance factors: as noted previously, the closer the tolerance factor is to zero, the greater is the evidence of collinearity. Applying Klein's rule of thumb, we see that the R² values obtained from the auxiliary regressions exceed the overall R² value (that is, the one obtained from the regression of Y on all the X variables) of 0.9954 in 3 out of the 6 auxiliary regressions, again suggesting that the Longley data are indeed plagued by multicollinearity.
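These diagnostics can also be computed directly. The following sketch (again assuming statsmodels and its copy of the Longley regressors) reports, for each regressor, the auxiliary-regression Rj², VIFj, and TOLj; the variance_inflation_factor helper fits each auxiliary regression internally:

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(sm.datasets.longley.load_pandas().exog)

rows = []
for j, name in enumerate(X.columns):
    if name == "const":
        continue                                # skip the intercept column
    vif = variance_inflation_factor(X.values, j)
    rows.append({"regressor": name,
                 "aux_R2": 1.0 - 1.0 / vif,     # R^2_j of Xj on the other regressors
                 "VIF": vif,
                 "TOL": 1.0 / vif})

print(pd.DataFrame(rows).round(4))              # VIFs far above 10 signal severe collinearity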

REMEDIAL MEASURES
What can be done if multicollinearity is serious? We have two choices:
(1) do nothing or (2) follow some rules of thumb.
Do Nothing
The "do nothing" school of thought is expressed by Blanchard as follows:
When students run their first ordinary least squares (OLS) regression, the first problem that they usually encounter is that of multicollinearity. Many of them conclude that there is something wrong with OLS; some resort to new and often creative techniques to get around the problem. But, we tell them, this is wrong. Multicollinearity is God's will, not a problem with OLS or statistical technique in general.
Rule-of-Thumb Procedures
1. A priori information. Suppose we consider the model
Yi = β1 + β2X2i + β3 X3i + ui
where Y = consumption, X2 = income, and X3 = wealth. As noted before, income and wealth
variables tend to be highly collinear. But suppose a priori we believe that β3 = 0.10β2; that is, the
rate of change of consumption with respect to wealth is one-tenth the corresponding rate with
respect to income. We can then run the following regression:
Yi = β1 + β2X2i + 0.10β2X3i + ui

Yi = β1 + β2 (X2i + 0.10X3i) + ui

Yi = β1 + β2 Zi + ui
where Zi = X2i + 0.1X3i. Once we obtain βˆ2 from this regression, we can estimate βˆ3 from the postulated relationship between β2 and β3, namely βˆ3 = 0.10βˆ2.
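A minimal sketch of this device, using hypothetical consumption, income, and wealth series (the numbers below are invented purely so that income and wealth are nearly collinear):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Hypothetical data in which wealth is roughly ten times income
income = np.linspace(80.0, 260.0, 25)                         # X2
wealth = 10.0 * income + rng.normal(0.0, 20.0, income.size)   # X3
consumption = 25.0 + 0.45 * income + 0.045 * wealth + rng.normal(0.0, 5.0, income.size)

# Impose the prior beta3 = 0.10 * beta2 by forming Z = X2 + 0.10 * X3 and regressing Y on Z
Z = income + 0.10 * wealth
fit = sm.OLS(consumption, sm.add_constant(Z)).fit()

beta2_hat = fit.params[1]         # estimate of beta2
beta3_hat = 0.10 * beta2_hat      # beta3 recovered from the postulated relationship
print(beta2_hat, beta3_hat)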
2. Combining cross-sectional and time series data.
3. Dropping a variable(s) and specification bias. When faced with severe multicollinearity, one of the "simplest" things to do is to drop one of the collinear variables. Thus, in our consumption–income–wealth illustration, when we drop the wealth variable, we obtain a regression which shows that, whereas in the original model the income variable was statistically insignificant, it is now "highly" significant. But in dropping a variable from the model we may be committing a specification bias or specification error.

4. Transformation of variables.

Example
Let us reconsider our original model.
First of all, we could express GNP not in nominal terms, but in real terms, which we can do by
dividing nominal GNP by the implicit price deflator. Second, since noninstitutional population
over 14 years of age grows over time because of natural population growth, it will be highly
correlated with time, the variable X6 in our model. Therefore, instead of keeping both these
variables, we will keep the variable X5 and drop X6. Third, there is no compelling reason to
include X3, the number of people unemployed; perhaps the unemployment rate would have been
a better measure of labor market conditions. But we have no data on the latter. So, we will drop
the variable X3. Making these changes, we obtain the corresponding regression results (RGNP = real GNP).
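A sketch of these transformations, assuming the statsmodels copy of the Longley data (whose regressor columns are named GNPDEFL, GNP, UNEMP, ARMED, POP, and YEAR):

import statsmodels.api as sm

data = sm.datasets.longley.load_pandas()
X = data.exog.copy()

# Real GNP: deflate nominal GNP by the implicit price deflator (an index around 100)
X["RGNP"] = X["GNP"] / (X["GNPDEFL"] / 100.0)

# Keep armed forces (X4) and population (X5); drop unemployment (X3) and the time trend (X6)
X_new = sm.add_constant(X[["RGNP", "ARMED", "POP"]])

print(sm.OLS(data.endog, X_new).fit().summary())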

5. Additional or new data.


6. Other methods of remedying multicollinearity. Multivariate statistical techniques such as
factor analysis and principal components or techniques such as ridge regression are often
employed to “solve” the problem of multicollinearity. Unfortunately, these techniques are
beyond the scope of this book, for they cannot be discussed competently without resorting to
matrix algebra.

Skill Development
Questions 10.2, 10.3, 10.5, 10.7, 10.10, 10.11, 10.12, 10.19, 10.24, and 10.25.
