L11_2023
Heteroskedasticity
Introduction
Consider the following questions:
Heteroscedasticity
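• Heteroscedasticity arises when the conditional variances of Yi are not the same across observations.
• Symbolically, $E(u_i^2) = \sigma_i^2$, where the subscript i indicates that the variance may differ from one observation to the next.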
There are several reasons why the variances of ui may be variable,
some of which are as follows:
4. Heteroscedasticity can also arise as a result of the presence of
outliers.
• Under homoscedasticity
OLS Estimation
• Is the OLS estimator still BLUE when we drop only the homoscedasticity
assumption and replace it with the assumption of heteroscedasticity?
• Given that $\hat{\beta}_2$ is still linear, unbiased, and consistent, is it “efficient” or
“best”?
• The answer to both questions is no: $\hat{\beta}_2$ is no longer best, and the
minimum variance is not given by the previous equation.
• We may write it as
GLS
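• Assume that the heteroscedastic variances $\sigma_i^2$ are known.
• Divide the equation through by $\sigma_i$, which for the two-variable model gives
$\frac{Y_i}{\sigma_i} = \beta_1 \frac{1}{\sigma_i} + \beta_2 \frac{X_i}{\sigma_i} + \frac{u_i}{\sigma_i}$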
• This procedure of transforming the original variables in such a way
that the transformed variables satisfy the assumptions of the
classical model and then applying OLS to them is known as the
method of generalized least squares (GLS).
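The transformation described above can be sketched in Python. This is a minimal illustration, assuming a two-variable model $Y_i = \beta_1 + \beta_2 X_i + u_i$ with known $\sigma_i$; the function name `gls_known_sigma` and the sample data are hypothetical and do not come from the slides.

```python
def gls_known_sigma(x, y, sigma):
    """GLS for Y = b1 + b2*X + u when each sigma_i is known (illustrative sketch).

    Every variable is divided by sigma_i, then OLS without an intercept
    is applied to the transformed variables (1/sigma_i, X_i/sigma_i).
    """
    y_star = [yi / si for yi, si in zip(y, sigma)]
    x0_star = [1.0 / si for si in sigma]           # transformed "intercept" regressor
    x1_star = [xi / si for xi, si in zip(x, sigma)]

    # Normal equations of the two-regressor model without a constant term
    a11 = sum(v * v for v in x0_star)
    a12 = sum(u * v for u, v in zip(x0_star, x1_star))
    a22 = sum(v * v for v in x1_star)
    c1 = sum(u * v for u, v in zip(x0_star, y_star))
    c2 = sum(u * v for u, v in zip(x1_star, y_star))

    det = a11 * a22 - a12 * a12
    b1 = (c1 * a22 - a12 * c2) / det
    b2 = (a11 * c2 - a12 * c1) / det
    return b1, b2


# Hypothetical data: an exact line Y = 2 + 3X, with sigma_i proportional to X_i
x = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 3.0 * xi for xi in x]
print(gls_known_sigma(x, y, sigma))
```

Because the transformed disturbance $u_i/\sigma_i$ has unit variance for every observation, the transformed model satisfies the classical homoscedasticity assumption, which is why plain OLS can be applied to it.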
• The coefficient and the variances are
• Using the variance formula seen earlier, and assuming $\sigma_i^2$ are
known.
• The usual OLS standard errors are either too large (for the
intercept) or generally too small (for the slope coefficient) in relation
to those obtained by OLS allowing for heteroscedasticity.
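A small numerical sketch of the comparison for the slope coefficient, under stated assumptions: the variance allowing for heteroscedasticity is taken to be the standard two-variable result $\sum x_i^2 \sigma_i^2 / (\sum x_i^2)^2$ with $x_i = X_i - \bar{X}$, and the "usual" formula $\sigma^2 / \sum x_i^2$ is evaluated with the average of the $\sigma_i^2$ standing in for $\sigma^2$. That stand-in, the function name `slope_variances`, and the data are illustrative choices, not taken from the slides.

```python
def slope_variances(x, sigma2):
    """Return (variance allowing for heteroscedasticity, usual-formula variance)
    of the OLS slope, given known error variances sigma2[i]."""
    n = len(x)
    xbar = sum(x) / n
    dev2 = [(xi - xbar) ** 2 for xi in x]      # squared deviations x_i^2
    sxx = sum(dev2)

    # Variance of the OLS slope when each observation has its own variance
    var_het = sum(d * s for d, s in zip(dev2, sigma2)) / sxx ** 2

    # ASSUMPTION: the usual formula is evaluated at the average error variance
    var_usual = (sum(sigma2) / n) / sxx
    return var_het, var_usual


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(slope_variances(x, [4.0] * 5))                 # equal variances: both formulas agree
print(slope_variances(x, [xi ** 2 for xi in x]))     # variance rising with X
```

In the second call, where the error variance grows with X, the usual formula gives a smaller value than the one that allows for heteroscedasticity, in line with the statement about the slope coefficient above.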
Detection of Heteroscedasticity
There are two types of methods:
○ Informal methods
○ Formal methods
Informal Methods
1. Nature of the Problem
2. Graphical Method
• In Figure a we see that there is no
systematic pattern between the
two variables, suggesting that
perhaps no heteroscedasticity is
present in the data.
• Figures b to e, however, exhibit
definite patterns.
• Instead of plotting $\hat{u}_i^2$ against $Y_i$, one may plot them against one of the
explanatory variables, especially if plotting $\hat{u}_i^2$ against $Y_i$ results in the
pattern shown in Figure a.
Formal Methods
Park Test
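• The functional form suggested is $\sigma_i^2 = \sigma^2 X_i^{\beta} e^{v_i}$, or equivalently
$\ln \sigma_i^2 = \ln \sigma^2 + \beta \ln X_i + v_i$, where $v_i$ is the stochastic disturbance term.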
• Since $\sigma_i^2$ is generally not known, Park suggests using $\hat{u}_i^2$ as a proxy
and running the following regression:
$\ln \hat{u}_i^2 = \alpha + \beta \ln X_i + v_i$
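The Park test is thus a two-stage procedure: in the first stage we run the OLS regression disregarding the heteroscedasticity question and obtain the residuals; in the second stage we regress the log of the squared residuals on the log of X.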
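The two stages can be sketched in Python as follows. This is a minimal illustration for a single regressor with strictly positive X values; the helper names `ols_simple` and `park_test` and the data are hypothetical, not taken from the slides.

```python
import math


def ols_simple(x, y):
    """OLS of y on a constant and x.
    Returns (intercept, slope, slope standard error, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)   # estimated error variance
    return b1, b2, math.sqrt(s2 / sxx), resid


def park_test(x, y):
    """Stage 1: OLS of Y on X. Stage 2: regress ln(residual^2) on ln(X).
    Returns the stage-2 slope and its standard error."""
    _, _, _, resid = ols_simple(x, y)
    ln_u2 = [math.log(e * e) for e in resid]   # requires nonzero residuals
    ln_x = [math.log(xi) for xi in x]          # requires X > 0
    _, beta, se_beta, _ = ols_simple(ln_x, ln_u2)
    return beta, se_beta


# Hypothetical data in which the scatter around the line widens as X grows
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.7, 6.5, 7.2, 11.4, 10.9, 16.8, 14.1]
beta, se_beta = park_test(x, y)
print("slope:", beta, "t ratio:", beta / se_beta)
```

A statistically significant stage-2 slope would suggest heteroscedasticity; an insignificant one is consistent with homoscedasticity.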
      Source |       SS         df       MS        Number of obs =       9
-------------+--------------------------------     F(1, 7)       =    0.45
       Model |   .95900152       1   .95900152     Prob > F      =  0.5257
    Residual |  15.0546945       7  2.15067064     R-squared     =  0.0599
-------------+--------------------------------     Adj R-squared = -0.0744
       Total |   16.013696       8    2.001712     Root MSE      =  1.4665
Glejser Test
• The Glejser test is similar in spirit to the Park test.
• After obtaining the residuals $\hat{u}_i$ from the OLS regression, Glejser
suggests regressing the absolute values of $\hat{u}_i$ on the X variable.
• Glejser uses the following functional forms:
Example
• We take the previous example and use the absolute value of
residuals.
• Run the regression model:
Glejser Test
• Goldfeld and Quandt point out that the error term vi has some
problems in that its expected value is nonzero, it is serially
correlated, and heteroscedastic.
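The Glejser procedure can be sketched in Python using the simplest specification, a linear regression of $|\hat{u}_i|$ on $X_i$. This is only one possible functional form; the helper names `ols_simple` and `glejser_linear` and the data are hypothetical, not taken from the slides.

```python
import math


def ols_simple(x, y):
    """OLS of y on a constant and x.
    Returns (intercept, slope, slope standard error, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)   # estimated error variance
    return b1, b2, math.sqrt(s2 / sxx), resid


def glejser_linear(x, y):
    """Regress the absolute OLS residuals on X (linear form only).
    Returns the slope of that auxiliary regression and its standard error."""
    _, _, _, resid = ols_simple(x, y)
    abs_u = [abs(e) for e in resid]
    _, slope, se_slope, _ = ols_simple(x, abs_u)
    return slope, se_slope


# Hypothetical data in which the scatter around the line widens as X grows
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.7, 6.5, 7.2, 11.4, 10.9, 16.8, 14.1]
slope, se_slope = glejser_linear(x, y)
print("slope:", slope, "t ratio:", slope / se_slope)
```

Given the Goldfeld and Quandt criticism of the error term $v_i$, the t ratio from this auxiliary regression is best read as a rough diagnostic rather than an exact test.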
Step 1. Fit the regression to the data on Y and X and obtain the
residuals $\hat{u}_i$.
Step 2. Ignoring the sign of $\hat{u}_i$, that is, taking their absolute value $|\hat{u}_i|$,
rank both $|\hat{u}_i|$ and $X_i$ (or $Y_i$)
Goldfeld–Quandt Test
• The test assumes that the heteroscedastic variance, $\sigma_i^2$, is positively
related to one of the explanatory variables in the regression model, for example $\sigma_i^2 = \sigma^2 X_i^2$,
• where $\sigma^2$ is a constant
Goldfeld and Quandt suggest the following steps:
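• These RSS each have (n − c)/2 − k, that is (n − c − 2k)/2, df, where k is the number of parameters to be estimated, including the intercept.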
• If we assume ui are normally distributed, and if the assumption of
homoscedasticity is valid, then λ follows the F distribution with
numerator and denominator df each of (n − c − 2k)/2.
• In case there is more than one X variable in the model, the ranking
of observations, the first step in the test, can be done according to
any one of them.
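A minimal Python sketch of the Goldfeld–Quandt computation for one regressor (k = 2), assuming the usual recipe: order the observations by X, omit c central observations, fit separate regressions to the first and last (n − c)/2 observations, and form λ as the ratio of the two residual sums of squares, each divided by its df. The function names and the artificial data are hypothetical, not taken from the slides.

```python
def rss_simple(pairs):
    """Residual sum of squares from OLS of y on a constant and x."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys))
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(xs, ys))


def goldfeld_quandt(x, y, c, k=2):
    """Return (lambda, df) where df = (n - c - 2k)/2 applies to both
    the numerator and the denominator."""
    pairs = sorted(zip(x, y))          # rank observations by X
    n = len(pairs)
    m = (n - c) // 2                   # size of each sub-sample
    rss1 = rss_simple(pairs[:m])       # small-X group
    rss2 = rss_simple(pairs[n - m:])   # large-X group
    df = m - k
    lam = (rss2 / df) / (rss1 / df)
    return lam, df


# Artificial data: 30 observations whose disturbances scale with X
pattern = [0.3, -0.5, 0.2, -0.1, 0.4, -0.3]
x = [float(i + 1) for i in range(30)]
y = [1 + 2 * xi + xi * pattern[i % 6] for i, xi in enumerate(x)]
lam, df = goldfeld_quandt(x, y, c=4)
print("lambda:", lam, "df:", df)
```

With n = 30 and c = 4, each sub-sample has 13 observations and 11 df. The computed λ is then compared with the critical F value for (df, df) degrees of freedom.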
EXAMPLE
• Regression model based on first 13 observations:
• Regression model based on last 13 observations:
• Obtain the critical F value for 11 numerator and 11 denominator df
at the 5 percent level.
lambda = 4.0745946
pvalue = .01408971
crit = 2.8179305
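• The critical F value for 11 numerator and 11 denominator df at the 5
percent level is 2.82. Since the computed λ = 4.07 exceeds this critical value, we reject the hypothesis of homoscedasticity at the 5 percent level.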
Breusch–Pagan–Godfrey Test
• Consider the k-variable linear regression model
$Y_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$
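• Specifically, assume that $\sigma_i^2 = \alpha_1 + \alpha_2 Z_{2i} + \cdots + \alpha_m Z_{mi}$, that is, $\sigma_i^2$ is a linear function of the Z's.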
• If $\alpha_2 = \alpha_3 = \cdots = \alpha_m = 0$, then $\sigma_i^2 = \alpha_1$, which is a constant.
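• Step 1. Estimate Eq. by OLS and obtain the residuals $\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_n$.
• Step 2. Obtain $\tilde{\sigma}^2 = \sum \hat{u}_i^2 / n$.
• Step 3. Construct the variables $p_i = \hat{u}_i^2 / \tilde{\sigma}^2$.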
• Step 4. Regress $p_i$ thus constructed on the Z’s as
$p_i = \alpha_1 + \alpha_2 Z_{2i} + \cdots + \alpha_m Z_{mi} + v_i$
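• Assuming ui are normally distributed, if there is homoscedasticity
and if the sample size n increases indefinitely, then $\Theta = \frac{1}{2}\,\text{ESS}$ follows the chi-square distribution with (m − 1) df, where ESS is the explained sum of squares from the regression of $p_i$ on the Z’s.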
• Step 1. Regressing Y on X:
      Source |       SS         df       MS        Number of obs =      30
-------------+--------------------------------     F(1, 28)      =  496.72
       Model |  41886.7134       1  41886.7134     Prob > F      =  0.0000
    Residual |  2361.15325      28  84.3269018     R-squared     =  0.9466
-------------+--------------------------------     Adj R-squared =  0.9447
       Total |  44247.8667      29  1525.78851     Root MSE      =   9.183
EXAMPLE
• Step 3. Divide the squared residuals $\hat{u}_i^2$ obtained from the regression by
78.7051 (the residual sum of squares divided by n, that is, 2361.15325/30) to construct the variable pi.
Step 4. Assuming that pi are linearly related to Xi (= Zi), we obtain the
regression using the command:
Step 5.
• Use the command to get model ESS:
theta = 5.2140103
pvalue = .0224056
crit_5 = 3.8414588
crit_1 = 6.6348966
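• For 1 df, the 5 percent critical chi-square value is 3.8414 and the 1
percent critical χ² value is 6.6349. The computed theta of 5.2140 exceeds the former but not the latter, so homoscedasticity is rejected at the 5 percent level but not at the 1 percent level.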
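The Breusch–Pagan–Godfrey steps used in this example can be sketched in Python for the single-regressor case with Z = X. This is a minimal illustration, assuming $\tilde{\sigma}^2$ is the residual sum of squares divided by n and the statistic is half the explained sum of squares of the auxiliary regression; the function names and the data are hypothetical, not taken from the slides.

```python
def ols_simple(x, y):
    """OLS of y on a constant and x.
    Returns (intercept, slope, residuals, explained sum of squares)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    ess = b2 * sxy                      # explained sum of squares
    return b1, b2, resid, ess


def bpg_theta(x, y):
    """Return (theta, p) for the BPG test with Z = X."""
    n = len(x)
    _, _, resid, _ = ols_simple(x, y)                 # Step 1: OLS residuals
    sigma2_tilde = sum(e * e for e in resid) / n      # Step 2: RSS / n
    p = [e * e / sigma2_tilde for e in resid]         # Step 3: scaled squared residuals
    _, _, _, ess = ols_simple(x, p)                   # Step 4: regress p on Z (= X)
    return ess / 2.0, p                               # Step 5: theta = ESS / 2


# Hypothetical data in which the scatter around the line widens as X grows
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.7, 6.5, 7.2, 11.4, 10.9, 16.8, 14.1]
theta, p = bpg_theta(x, y)
print("theta:", theta)
```

With a single Z variable the statistic is compared with the chi-square critical value for 1 df. By construction the $p_i$ average to one, which is a convenient sanity check on Step 3.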
• Step 1. Given the data, we estimate Eq. and obtain the residuals, $\hat{u}_i$.
nr2 = 4.2232337
crit = 7.8147279
pvalue = .23834602
Remedial Measures
When $\sigma_i^2$ Is Known: The Method of Weighted Least Squares
• The results of WLS are as follows:
      Source |       SS         df       MS        Number of obs =       9
-------------+--------------------------------     F(2, 7)       = 5258.53
       Model |  174.509886       2  87.2549429     Prob > F      =  0.0000
    Residual |  .116151268       7  .016593038     R-squared     =  0.9993
-------------+--------------------------------     Adj R-squared =  0.9991
       Total |  174.626037       9   19.402893     Root MSE      =  .12881
Concluding Examples
• Data on research and development (R&D) expenditure, sales, and
profits for 18 industry groupings in the United States, all figures in
millions of dollars.
• Run the regression