Heteroscedasticity
Asif Tariq
Overview of the presentation
• What is the nature of Heteroscedasticity?
• What are the causes of Heteroscedasticity?
• What are its consequences?
• How does one detect it?
• What is the remedy?
What is Heteroscedasticity?
• Recall that the assumption of homoscedasticity implied that, conditional on the explanatory
variables, the variance of the unobserved error, u, was constant.
Var(uᵢ) = σ²  for all i
or
Var(uᵢ | Xᵢ) = σ²
• If this is not true, that is, if the variance of u is different for different values of the X’s, then
the errors are heteroscedastic.
Var(uᵢ | Xᵢ) = σᵢ²
• A homoscedastic error is one that has constant variance. Equivalently, this means that the
dispersion of the observed values of Y around the regression line is the same across all
observations.
• The OLS estimators are still unbiased and consistent, yet they are no longer efficient
because the minimum-variance property is violated, making statistical inference less
reliable (i.e., the estimated t and F values may not be reliable).
• Thus, estimators are not best linear unbiased estimators (BLUE); they are simply linear
unbiased estimators (LUE).
• As a result, the t and F tests based on the standard assumptions of the CLRM may not be
reliable, resulting in erroneous conclusions regarding the statistical significance of the
estimated regression coefficients.
• In the presence of heteroscedasticity, the BLUE estimators are provided by the method of
weighted least squares (WLS)/generalised least squares (GLS).
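As a rough illustration of these properties, the simulation below (a hypothetical data-generating process, NumPy only) checks that OLS slope estimates remain centred on the true value even when the error variance varies with X:

```python
import numpy as np

# Hypothetical DGP for illustration: y = 1 + 2x + u, with Var(u_i | x_i) = (0.5 x_i)^2
rng = np.random.default_rng(0)
n, reps = 200, 2000
slopes = []
for _ in range(reps):
    x = rng.uniform(1, 10, n)
    u = rng.normal(0, 0.5 * x)            # error s.d. grows with x: heteroscedastic
    y = 1.0 + 2.0 * x + u
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    slopes.append(b[1])

mean_slope = float(np.mean(slopes))
print(mean_slope)                         # close to the true slope 2.0: still unbiased
```

Unbiasedness survives; what the simulation does not show is that the usual OLS standard errors for these slopes are no longer trustworthy, which is the efficiency/inference problem described above.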
Causes
• Error Learning Models.
• Measurement Errors.
• Presence of Outliers.
• Model Misspecification (e.g., incorrect functional form).
• Other Sources.
Graphical Method
• Plot the squared residuals û²ᵢ against the estimated Yᵢ to see whether they show any
systematic pattern.
• The same procedure can be done by plotting û²ᵢ against each X.
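In place of an actual scatter plot, the same visual check can be sketched numerically (NumPy only; the data-generating process is invented for illustration):

```python
import numpy as np

# Hypothetical DGP: error s.d. = 0.5 x, so the residual spread should widen with x
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 * x)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
u2 = (y - X @ b) ** 2          # squared residuals: the y-axis of the diagnostic plot

# numerical stand-in for the plot: compare mean u^2 in the low-x and high-x halves
lo = u2[x < 5.5].mean()
hi = u2[x >= 5.5].mean()
print(lo, hi)                  # hi exceeding lo mirrors the widening funnel in the plot
```

In practice one would plot `u2` against `x` (or against the fitted values) and look for a funnel shape; the split-sample comparison is just the same idea in numbers.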
Park’s Test
• Advocated by Rolla Edward Park
• Park formalises the graphical method by suggesting that σᵢ² is some function of the
explanatory variable X:
σᵢ² = σ² Xᵢ^β e^(vᵢ)    (1)
or
ln σᵢ² = ln σ² + β ln Xᵢ + vᵢ    (2)
• Since σᵢ² is generally not known, Park suggests using û²ᵢ as a proxy and running the
regression:
ln û²ᵢ = ln σ² + β ln Xᵢ + vᵢ    (3)
ln û²ᵢ = α + β ln Xᵢ + vᵢ    (4)
where α = ln σ².
• If β in eq. (4) turns out to be statistically significant, it would suggest that
heteroscedasticity is present in the data. If it turns out to be insignificant, we may accept
the assumption of homoscedasticity.
• In the first stage we run the OLS regression disregarding the heteroscedasticity question.
• We obtain ûᵢ from this regression, and then in the second stage we run the regression
in Eq. (4).
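The two-stage procedure can be sketched in a few lines of NumPy (the data-generating process is hypothetical, chosen so that ln σᵢ² really is linear in ln Xᵢ):

```python
import numpy as np

# Hypothetical DGP: Var(u_i | x_i) = (0.5 x_i)^2, so ln(sigma_i^2) is linear in ln(x_i)
rng = np.random.default_rng(2)
n = 400
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 * x)

# Stage 1: OLS on the original model, keep the residuals
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ b

# Stage 2: regress ln(u_hat^2) on ln(x), as in Eq. (4)
Z = np.column_stack([np.ones(n), np.log(x)])
ly = np.log(u_hat ** 2)
g = np.linalg.lstsq(Z, ly, rcond=None)[0]

# t statistic for the slope beta of Eq. (4)
resid = ly - Z @ g
s2 = resid @ resid / (n - 2)
se_beta = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
t_beta = g[1] / se_beta
print(g[1], t_beta)      # a clearly significant slope signals heteroscedasticity
```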
Limitation of Park’s Test
• Although empirically appealing, the Park test has some problems.
• Goldfeld and Quandt* have argued that the error term vᵢ entering into Eq. (4) may not satisfy
the OLS assumptions and may itself be heteroscedastic. Nonetheless, as a strictly
exploratory method, one may use the Park test.
Spearman’s Rank Correlation Test
• The Spearman rank correlation coefficient is defined as:
ρ̂ = 1 − 6 [ Σ dᵢ² / (n(n² − 1)) ]    (5)
where dᵢ = difference in the ranks and n = number of observations.
Assume Yᵢ = β₀ + β₁Xᵢ + uᵢ.
Step 1: Fit the regression to the data on Y and X and obtain the residuals ûᵢ.
Step 2: Rank both |ûᵢ| and Xᵢ and compute the Spearman rank correlation coefficient ρ̂
given in Eq. (5).
Step 3: Test the significance of ρ̂ by the t test:
t = ρ̂ √(n − 2) / √(1 − ρ̂²)    (6)    with df = n − 2
• Decision Rule: If the computed t-value exceeds the critical t-value, we may reject the null
hypothesis of no heteroscedasticity; otherwise we don’t reject it.
• Note: If the regression model involves more than one X variable, ρ̂ can be computed between
|ûᵢ| and each of the X variables separately and can be tested for statistical significance by
the t test given in Eq. (6).
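Steps 1–3 can be sketched as follows (NumPy only; the model and data are invented for illustration):

```python
import numpy as np

# Hypothetical DGP for illustration: error s.d. grows with x
rng = np.random.default_rng(3)
n = 100
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 * x)

# Step 1: OLS residuals
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
u_abs = np.abs(y - X @ b)

def ranks(a):
    # rank 1 = smallest value (no ties expected with continuous data)
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(1, len(a) + 1)
    return r

# Step 2: rank |u_hat| and x, then apply Eq. (5)
d = ranks(u_abs) - ranks(x)
rho = 1 - 6 * (d @ d) / (n * (n ** 2 - 1))

# Step 3: t test of Eq. (6), df = n - 2
t = rho * np.sqrt(n - 2) / np.sqrt(1 - rho ** 2)
print(rho, t)
```

A clearly significant positive t here says that large absolute residuals go together with large X, i.e. the error spread rises with X.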
Breusch-Pagan Test
• Breusch and Pagan (1979) developed a Lagrange multiplier (LM) test for heteroskedasticity.
• In the first step we estimate the original regression by OLS and obtain the residuals ûᵢ;
in the second step we run the auxiliary regression:
û²ᵢ = a₁ + a₂X₂ᵢ + ⋯ + a_kX_kᵢ + vᵢ
where X₂ᵢ, …, X_kᵢ are the explanatory variables of the original regression.
• Step 3: Formulate the null and the alternative hypotheses. The null hypothesis of
homoskedasticity is:
a₂ = a₃ = ⋯ = a_k = 0
while the alternative is that at least one of the 𝒂’s is different from zero and that at
least one of the X’s affects the variance of the residuals.
• Step 4: Compute the LM = nR² statistic, where n is the number of observations used to
estimate the auxiliary regression in Step 2, and R² is the coefficient of determination of
this regression. The LM statistic follows the χ² distribution with k − 1 degrees of freedom.
• Step 5: Reject the null and conclude that there is significant evidence of heteroskedasticity
when the LM statistic is greater than the critical value [LM > χ²(k−1, α)]. Alternatively,
compute the p-value and reject the null if the p-value is less than the level of significance α
(usually α = 0.05).
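A minimal sketch of the whole procedure, assuming a single regressor (so k = 2 and the LM statistic has 1 degree of freedom; the data-generating process is hypothetical):

```python
import numpy as np

# Hypothetical heteroscedastic DGP with one regressor, so k = 2
rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 * x)

# Steps 1-2: OLS residuals, then the auxiliary regression of u_hat^2 on the X's
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
u2 = (y - X @ b) ** 2

g = np.linalg.lstsq(X, u2, rcond=None)[0]
R2 = 1 - ((u2 - X @ g) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()

# Steps 4-5: LM = n R^2, compared with the chi2(k - 1) = chi2(1) critical value
LM = n * R2
print(LM)        # reject homoscedasticity if LM > 3.84 (5% critical value, 1 df)
```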
Remedy
• As we have seen, heteroscedasticity does not destroy the unbiasedness and consistency
properties of the OLS estimators, but they are no longer efficient, not even asymptotically
(i.e., in large samples).
• This lack of efficiency makes the usual hypothesis-testing procedure of dubious value.
Therefore, remedial measures may be called for.
• To see how this procedure works, let us continue with the now-familiar two-variable model:
Yᵢ = β₀ + β₁Xᵢ + uᵢ    (7)
Now assume that the heteroscedastic variances σᵢ² are known. Divide Eq. (7) through by σᵢ to
obtain:
Yᵢ/σᵢ = β₀(X₀ᵢ/σᵢ) + β₁(Xᵢ/σᵢ) + (uᵢ/σᵢ)    (8)
where X₀ᵢ = 1 for all i.
When 𝝈𝟐𝒊 is Known: The Method of Weighted Least Squares
Yᵢ* = β₀* X₀ᵢ* + β₁* Xᵢ* + uᵢ*
where the starred, or transformed, variables are the original variables divided by (the
known) σᵢ. We use the notation β₀* and β₁*, the parameters of the transformed model, to
distinguish them from the usual OLS parameters β₀ and β₁.
• This procedure of transforming the original variables in such a way that the transformed
variables satisfy the assumptions of the CLRM and then applying OLS to them is known as
the method of Generalized Least Squares (GLS)/Weighted Least Squares(WLS).
• In short, GLS is OLS on the transformed variables that satisfy the standard least-squares
assumptions. The estimators thus obtained are known as GLS estimators, and it is these
estimators that are BLUE.
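Under the assumption that the σᵢ are known, the transformation can be sketched as (hypothetical data; σᵢ = 0.5xᵢ by construction):

```python
import numpy as np

# Illustration assuming the true sigma_i are known: sigma_i = 0.5 x_i (hypothetical DGP)
rng = np.random.default_rng(5)
n = 300
x = rng.uniform(1, 10, n)
sigma = 0.5 * x
y = 1.0 + 2.0 * x + rng.normal(0, sigma)

# Divide every variable, including the intercept column X0 = 1, by sigma_i
X = np.column_stack([np.ones(n), x])
X_star = X / sigma[:, None]
y_star = y / sigma

# OLS on the transformed (homoscedastic) model = WLS on the original model
b_wls = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
print(b_wls)     # close to the true (beta0, beta1) = (1.0, 2.0)
```

Note that the transformed model has no ordinary intercept: the constant term becomes the coefficient on the column 1/σᵢ, which is why the intercept column is divided by σᵢ along with everything else.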
When σᵢ² is not Known
• As noted earlier, if true 𝛔𝟐𝐢 are known, we can use the WLS method to obtain BLUE estimators.
• Since the true 𝛔𝟐𝐢 are rarely known, is there a way of obtaining consistent (in the statistical sense)
estimates of the variances and covariances of OLS estimators even if there is heteroscedasticity?
• White has shown that such consistent variance estimates can be obtained, so that
asymptotically valid (i.e., large-sample) statistical inferences can be made about the true
parameter values.
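White's heteroscedasticity-consistent covariance estimator can be sketched as follows (the basic HC0 form, on hypothetical data; with variance rising in x, the robust slope standard error typically exceeds the conventional one):

```python
import numpy as np

# Hypothetical heteroscedastic DGP; compare usual and White (HC0) standard errors
rng = np.random.default_rng(6)
n = 400
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 * x)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ b

XtX_inv = np.linalg.inv(X.T @ X)
# conventional OLS covariance estimate (valid only under homoscedasticity)
V_ols = (u @ u / (n - 2)) * XtX_inv
# White's heteroscedasticity-consistent (HC0) covariance estimate:
# (X'X)^{-1} (sum_i u_i^2 x_i x_i') (X'X)^{-1}
V_white = XtX_inv @ (X.T @ (X * (u ** 2)[:, None])) @ XtX_inv

se_ols = np.sqrt(np.diag(V_ols))
se_white = np.sqrt(np.diag(V_white))
print(se_ols, se_white)
```

The point estimates are unchanged; only the standard errors (and hence t statistics) differ, which is why robust standard errors are the usual large-sample fix when σᵢ² is unknown.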