
Heteroscedasticity

Asif Tariq
Overview of the presentation
• What is the nature of Heteroscedasticity?
• What are the causes of Heteroscedasticity?
• What are its consequences?
• How does one detect it?
• What is the remedy?
What is Heteroscedasticity?
• Recall the assumption of homoskedasticity implied that conditional on the explanatory
variables, the variance of the unobserved error, u, was constant.
$\mathrm{Var}(u_i) = \sigma^2 \quad \text{for all } i$

or

$\mathrm{Var}(u_i \mid X_i) = \sigma^2$
• If this is not true, that is, if the variance of u is different for different values of the X’s, then
the errors are heteroscedastic.
$\mathrm{Var}(u_i \mid X_i) = \sigma_i^2$

• A homoscedastic error is one that has constant variance. Equivalently, this means that the
dispersion of the observed values of Y around the regression line is the same across all
observations.

• A heteroscedastic error is one that has a non-constant variance.


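The contrast can be illustrated with a small simulation. A minimal sketch, assuming Python with NumPy (the library choice and parameter values are illustrative, not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.uniform(1, 10, n)

# Homoscedastic errors: constant variance sigma^2 for every i
u_homo = rng.normal(0, 2.0, n)

# Heteroscedastic errors: the standard deviation grows with x,
# so Var(u_i | x_i) = (0.5 * x_i)^2 differs across observations
u_hetero = rng.normal(0, 0.5 * x, n)

y_homo = 1.0 + 2.0 * x + u_homo
y_hetero = 1.0 + 2.0 * x + u_hetero
```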
Why Worry about Heteroscedasticity? The Consequences

• The OLS estimators are still unbiased and consistent, yet they are no longer efficient because the minimum variance property is violated, making statistical inference less reliable (i.e., the estimated t and F values may not be reliable).

• Thus, estimators are not best linear unbiased estimators (BLUE); they are simply linear
unbiased estimators (LUE).

• Standard errors and confidence intervals are no longer correct.

• As a result, the t and F tests based on the standard assumptions of the CLRM may not be reliable, resulting in erroneous conclusions regarding the statistical significance of the estimated regression coefficients.

• In the presence of heteroscedasticity, the BLUE estimators are provided by the method of
weighted least squares (WLS)/generalised least squares (GLS).
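A small Monte Carlo can make the unreliability of the usual standard errors concrete. A minimal sketch, assuming Python with NumPy and statsmodels (the data-generating process is hypothetical):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, reps = 100, 2000
slopes, reported_se = [], []

for _ in range(reps):
    x = rng.uniform(1, 10, n)
    u = rng.normal(0, 0.5 * x, n)   # error variance rises with x
    y = 1.0 + 2.0 * x + u
    res = sm.OLS(y, sm.add_constant(x)).fit()
    slopes.append(res.params[1])
    reported_se.append(res.bse[1])

# The homoskedasticity-based SE is a biased estimate of the true
# sampling variability, so t and F statistics built on it mislead.
print("true sampling SD of slope :", np.std(slopes))
print("average OLS-reported SE   :", np.mean(reported_se))
```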
Causes
• Error Learning Models.

• Variance of errors may increase as the value of explanatory variable increases:

• Low income → low savings.

• High income → possibility of greater savings.

• With high income the variability as well as errors increase.

• Measurement Errors.

• Presence of Outliers.

• Model Misspecification.

• Skewness in the Distribution.


Causes

• Other Sources:

1. Incorrect transformation of data.

2. Incorrect functional form.

3. Missing variables: important variables may be omitted from the model.


Detection
• Graphical method:
• Plot the squared residuals $\hat{u}_i^2$ against X (in the case of simple linear regression).
• Plot $\hat{u}_i^2$ against the predicted $\hat{Y}_i$ or any X (in the case of multiple linear regression).

• Formal tests such as:


• Park test.
• Spearman’s rank correlation test.
• Breusch-Pagan (BP) test*.
• White’s test.
• Goldfeld - Quandt test.
• Glejser test.
• Koenker–Bassett (KB) test.
Graphical method – In case of Simple Linear Regression
• If there is no a priori or empirical information about the nature of heteroscedasticity, in practice one can carry out the regression analysis on the assumption that there is no heteroscedasticity and then do a postmortem examination of the squared residuals $\hat{u}_i^2$ to see if they exhibit any systematic pattern.
• Figure (a) does not show any evidence of
patterns associated with heteroscedasticity.
• Figures (b)–(d) show patterns associated with
heteroscedasticity.
• Figure (b) has a “spray-shaped” residual pattern
that is consistent with the variance of the error
term increasing as x-values increase.
• Figure (c) has a “funnel-shaped” residual
pattern that is consistent with the variance of the
error term decreasing as x-values increase.
• Figure (d) has a “bow-tie” residual pattern that
is consistent with the variance of the error term
decreasing and then increasing as x-values
increase.
Graphical method - In case of Multiple Linear Regression

• In Figure (a) we see that there is no systematic pattern between the two variables, suggesting that perhaps no heteroscedasticity is present in the data.
• Figures (b) to (e), however, exhibit definite patterns.
• For instance, Figure (c) suggests a linear relationship, whereas Figures (d) and (e) indicate a quadratic relationship between $\hat{u}_i^2$ and $\hat{Y}_i$.
• The same procedure can be applied by plotting $\hat{u}_i^2$ against each X, as in the sketch below.
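A minimal plotting sketch, assuming Python with statsmodels and matplotlib (the simulated data are only a stand-in for your own y and X):

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Simulated data for illustration; replace with your own y and X.
rng = np.random.default_rng(1)
X = rng.uniform(1, 10, size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(0, 0.5 * X[:, 0])

res = sm.OLS(y, sm.add_constant(X)).fit()
u2 = res.resid ** 2  # squared residuals

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(res.fittedvalues, u2)   # u-hat^2 against fitted Y
ax1.set(xlabel="fitted Y", ylabel="squared residuals")
ax2.scatter(X[:, 0], u2)            # u-hat^2 against one regressor
ax2.set(xlabel="X1", ylabel="squared residuals")
plt.tight_layout()
plt.show()
# A fan-, funnel-, or bow-tie-shaped cloud suggests heteroscedasticity.
```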
Park’s Test
• Advocated by Rolla Edward Park

• Park formalises the graphical method by suggesting that $\sigma_i^2$ is some function of the explanatory variable X.

• The functional form he suggests is

$\sigma_i^2 = \sigma^2 X_i^{\beta} e^{v_i}$ (1)

or

$\ln \sigma_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i$ (2)

• Since $\sigma_i^2$ is generally not known, Park suggests using $\hat{u}_i^2$ as a proxy and running the following regression:

$\ln \hat{u}_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i$ (3)

$\ln \hat{u}_i^2 = \alpha + \beta \ln X_i + v_i$ (4)
Park’s Test
• If β in eq. (4) turns out to be statistically significant, it would suggest that
heteroscedasticity is present in the data. If it turns out to be insignificant, we may accept
the assumption of homoscedasticity.

• The Park test is thus a two-stage procedure:

• In the first stage we run the OLS regression disregarding the heteroscedasticity question.

• We obtain $\hat{u}_i$ from this regression, and then in the second stage we run regression (4).
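The two stages translate directly into code. A minimal sketch, assuming Python with NumPy and statsmodels (the data are simulated purely for illustration):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data for illustration; replace with your own y and x.
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 * x)

# Stage 1: OLS disregarding heteroscedasticity; keep the residuals.
res1 = sm.OLS(y, sm.add_constant(x)).fit()
u2 = res1.resid ** 2

# Stage 2: regress ln(u-hat^2) on ln(X), as in Eq. (4).
res2 = sm.OLS(np.log(u2), sm.add_constant(np.log(x))).fit()
print(res2.params, res2.pvalues)
# A statistically significant slope suggests heteroscedasticity.
```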
Limitation of Park’s Test
• Although empirically appealing, the Park test has some problems.

• Goldfeld and Quandt* have argued that the error term vi entering into Eq. (4) may not satisfy
the OLS assumptions and may itself be heteroscedastic. Nonetheless, as a strictly
exploratory method, one may use the Park test.
Spearman’s Rank Correlation Test
• The Spearman's rank correlation coefficient is defined as:

$\hat{\rho} = 1 - 6\left[\dfrac{\sum d_i^2}{n(n^2 - 1)}\right]$ (5)  where $d_i$ = difference in the ranks

• This rank correlation coefficient can be used to detect heteroscedasticity as follows:

Assume $Y_i = \beta_0 + \beta_1 X_i + u_i$

Step 1: Fit the regression to the data on Y and X and obtain the residuals $\hat{u}_i$.

Step 2: Ignoring the sign of $\hat{u}_i$, that is, taking their absolute value $|\hat{u}_i|$, rank both $|\hat{u}_i|$ and $X_i$ (or $\hat{Y}_i$) in ascending or descending order and compute the Spearman's rank correlation coefficient as per equation (5).
Spearman’s Rank Correlation Test
• Step 3: Assuming that the population rank correlation coefficient is zero and n > 8, the significance of the sample $\hat{\rho}$ can be tested by the t-test as follows:

$t = \dfrac{\hat{\rho}\sqrt{n-2}}{\sqrt{1-\hat{\rho}^2}}$ (6)  with df = n − 2

• Decision Rule: If the computed t-value exceeds the critical t-value, we may reject the null
hypothesis of no heteroscedasticity; otherwise we don’t reject it.

• Note: If the regression model involves more than one X variable, $\hat{\rho}$ can be computed between $|\hat{u}_i|$ and each of the X variables separately and tested for statistical significance by the t-test given in Eq. (6).
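The three steps can be carried out in a few lines. A minimal sketch, assuming Python with statsmodels and SciPy (simulated data stand in for a real sample):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Simulated data for illustration; replace with your own y and x.
rng = np.random.default_rng(3)
n = 50
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 * x)

# Step 1: fit the regression and obtain the residuals.
res = sm.OLS(y, sm.add_constant(x)).fit()

# Step 2: rank |u-hat| and X and compute Spearman's rho, Eq. (5).
rho, _ = stats.spearmanr(np.abs(res.resid), x)

# Step 3: t-test with n - 2 degrees of freedom, Eq. (6).
t = rho * np.sqrt(n - 2) / np.sqrt(1 - rho ** 2)
p = 2 * stats.t.sf(abs(t), df=n - 2)
print(f"rho = {rho:.3f}, t = {t:.3f}, p-value = {p:.3f}")
```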
Breusch-Pagan Test
• Breusch and Pagan (1979) developed a Lagrange multiplier (LM) test for heteroskedasticity.
In the following model:

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i$ (1)

• The Breusch–Pagan test involves the following steps:

• Step 1: Run the regression of Equation (1) and obtain the residuals $\hat{u}_i$ of this regression.

• Step 2: Run the following auxiliary regression:

$\hat{u}_i^2 = a_1 + a_2 X_{2i} + a_3 X_{3i} + \cdots + a_k X_{ki} + v_i$ (2)

where the $X_{ki}$ are the explanatory variables of the original regression (1).
Breusch-Pagan Test
• Step 3: Formulate the null and the alternative hypotheses. The null hypothesis of
homoskedasticity is:

$a_2 = a_3 = \cdots = a_k = 0$

while the alternative is that at least one of the 𝒂’s is different from zero and that at
least one of the X’s affects the variance of the residuals.

• Step 4: Compute the statistic $LM = nR^2$, where n is the number of observations used to estimate the auxiliary regression in Step 2 and $R^2$ is the coefficient of determination of that regression. The LM statistic follows the $\chi^2$ distribution with k − 1 degrees of freedom.

• Step 5: Reject the null and conclude that there is significant evidence of heteroskedasticity when the LM statistic is greater than the critical value [LM > $\chi^2_{(k-1,\alpha)}$]. Alternatively, compute the p-value and reject the null if the p-value is less than the level of significance α (usually α = 0.05)*.
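A minimal sketch of the test, assuming Python with statsmodels, whose het_breuschpagan diagnostic runs the auxiliary regression of Step 2 internally (the data here are simulated for illustration):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Simulated data for illustration; replace with your own y and X.
rng = np.random.default_rng(4)
X = sm.add_constant(rng.uniform(1, 10, size=(200, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5 * X[:, 1])

res = sm.OLS(y, X).fit()

# Returns the LM statistic with its chi-square p-value, plus an
# F-statistic variant of the same test.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print(f"LM = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")
# Reject homoskedasticity when the p-value is below alpha (e.g. 0.05).
```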
Remedy
• As we have seen, heteroscedasticity does not destroy the unbiasedness and consistency properties of the OLS estimators, but they are no longer efficient, not even asymptotically (i.e., in large samples).

• This lack of efficiency makes the usual hypothesis-testing procedure of dubious value.
Therefore, remedial measures may be called for.

• There are two approaches to remediation:

• when $\sigma_i^2$ is known, and

• when $\sigma_i^2$ is not known.


When $\sigma_i^2$ is Known: The Method of Weighted Least Squares

• If $\sigma_i^2$ is known, the most straightforward method of correcting heteroscedasticity is by means of weighted least squares, for the estimators thus obtained are BLUE.

• To see how this procedure works, let us continue with the now-familiar two-variable model:

$Y_i = \beta_0 + \beta_1 X_i + u_i$ (7)

which for ease of algebraic manipulation we write as:

$Y_i = \beta_0 X_{0i} + \beta_1 X_i + u_i$ (8)  where $X_{0i} = 1$ for each i.

Now assume that the heteroscedastic variances $\sigma_i^2$ are known. Divide Eq. (8) through by $\sigma_i$ to obtain:

$\dfrac{Y_i}{\sigma_i} = \beta_0\left(\dfrac{X_{0i}}{\sigma_i}\right) + \beta_1\left(\dfrac{X_i}{\sigma_i}\right) + \left(\dfrac{u_i}{\sigma_i}\right)$ (9)
When $\sigma_i^2$ is Known: The Method of Weighted Least Squares

• For ease of exposition, Eq. (9) can be written as:

$Y_i^* = \beta_0^* X_{0i}^* + \beta_1^* X_i^* + u_i^*$

where the starred, or transformed, variables are the original variables divided by (the known) $\sigma_i$. We use the notation $\beta_0^*$ and $\beta_1^*$, the parameters of the transformed model, to distinguish them from the usual OLS parameters $\beta_0$ and $\beta_1$.

• This procedure of transforming the original variables in such a way that the transformed
variables satisfy the assumptions of the CLRM and then applying OLS to them is known as
the method of Generalized Least Squares (GLS)/Weighted Least Squares(WLS).

• In short, GLS is OLS on the transformed variables that satisfy the standard least-squares
assumptions. The estimators thus obtained are known as GLS estimators, and it is these
estimators that are BLUE.
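A minimal sketch of WLS as OLS on the transformed variables, assuming Python with statsmodels and a simulated case where $\sigma_i$ is known (here, hypothetically, sd$(u_i) = 0.5\,x_i$):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data with known per-observation error sd: sd(u_i) = 0.5*x_i.
rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 200)
sigma_i = 0.5 * x
y = 1.0 + 2.0 * x + rng.normal(0, sigma_i)

X = sm.add_constant(x)

# WLS with weights 1/sigma_i^2 is equivalent to OLS on the variables
# divided by sigma_i, i.e. the transformed model of Eq. (9).
wls = sm.WLS(y, X, weights=1.0 / sigma_i ** 2).fit()
ols = sm.OLS(y, X).fit()
print("WLS slope SE:", wls.bse[1], " OLS slope SE:", ols.bse[1])
```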
When $\sigma_i^2$ is not Known

• As noted earlier, if the true $\sigma_i^2$ are known, we can use the WLS method to obtain BLUE estimators.

• Since the true $\sigma_i^2$ are rarely known, is there a way of obtaining consistent (in the statistical sense) estimates of the variances and covariances of the OLS estimators even if there is heteroscedasticity?

• The answer is Yes.

• White's Heteroscedasticity-Consistent Variances and Standard Errors:

• White has shown that such estimates can be obtained so that asymptotically valid (i.e., large-sample) statistical inferences can be made about the true parameter values.

• Nowadays, several computer packages report White's heteroscedasticity-corrected variances and standard errors (HCE) along with the usual OLS variances and standard errors.

• Incidentally, White's heteroscedasticity-corrected standard errors are also known as robust standard errors.
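A sketch of requesting White-type robust standard errors, assuming Python with statsmodels (HC1 is one of several robust covariance variants the package offers; the data are simulated):

```python
import numpy as np
import statsmodels.api as sm

# Simulated heteroscedastic data for illustration.
rng = np.random.default_rng(6)
x = rng.uniform(1, 10, 200)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 * x)
X = sm.add_constant(x)

usual = sm.OLS(y, X).fit()                  # classical OLS SEs
robust = sm.OLS(y, X).fit(cov_type="HC1")   # White-type robust SEs
print("usual SE :", usual.bse[1])
print("robust SE:", robust.bse[1])
# Coefficient estimates are identical; only the standard errors
# (and hence the t statistics) change.
```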
