Heteroskedasticity

Introduction
Consider the following questions:

1. What is the nature of heteroscedasticity?

2. What are its consequences?

3. How does one detect it?

4. What are the remedial measures?



The Nature of Heteroscedasticity

• Homoscedasticity means equal (homo) spread (scedasticity), that is, equal variance.

• The conditional variance of Yi (which is equal to that of ui), conditional upon the given Xi, remains the same for all observations:

E(ui²) = σ²,  i = 1, 2, . . . , n

Heteroscedasticity
• When the conditional variances of Yi are not the same.
• Symbolically,

E(ui²) = σi²,  i = 1, 2, . . . , n

where the subscript on σ² indicates that the variance can differ from observation to observation.

Heteroscedasticity
There are several reasons why the variances of ui may vary, some of which are as follows:

1. Following the error-learning models, as people learn, their errors of behavior become smaller over time or more consistent, so σi² is expected to decrease.

2. As incomes grow, people have more discretionary income and hence more scope for choice about the disposition of their income, so σi² is likely to rise with income.

3. As data-collecting techniques improve, σi² is likely to decrease.

Heteroscedasticity
4. Heteroscedasticity can also arise as a result of the presence of outliers.

5. It can arise when the regression model is not correctly specified.

6. Skewness in the distribution of one or more regressors included in the model can also produce it.

7. Heteroscedasticity can also arise because of (1) incorrect data transformation (e.g., ratio or first-difference transformations) and (2) incorrect functional form (e.g., linear versus log–linear models).

OLS Estimation in the Presence of Heteroscedasticity
• The two-variable model: Yi = β1 + β2Xi + ui
• Under homoscedasticity, the variance of the OLS slope estimator is

var(β̂2) = σ² / Σxi²

• Under heteroscedasticity it becomes

var(β̂2) = Σxi²σi² / (Σxi²)²

where xi = Xi − X̄.

OLS Estimation
• Is β̂2 still BLUE when we drop only the homoscedasticity assumption and replace it with the assumption of heteroscedasticity?

• β̂2 is still linear and unbiased.

• The variance of ui, homoscedastic or heteroscedastic, plays no part in the determination of the unbiasedness property.

• β̂2 is also a consistent estimator despite heteroscedasticity; that is, as the sample size increases indefinitely, the estimated β2 converges to its true value.

OLS Estimation
• Given that β̂2 is still linear, unbiased, and consistent, is it "efficient" or "best"?

• That is, does it have minimum variance in the class of unbiased estimators?

• And is that minimum variance given by the previous equation?

• The answer is no to both questions: β̂2 is no longer best, and the minimum variance is not given by the previous equation.

The Method of Generalized Least Squares (GLS)
• Two-variable model: Yi = β1 + β2Xi + ui

• We may write it as

Yi = β1X0i + β2Xi + ui

• where X0i = 1 for each i.

GLS
• Assume that the heteroscedastic variances σi² are known.
• Divide the equation through by σi:

Yi/σi = β1(X0i/σi) + β2(Xi/σi) + (ui/σi)

• The transformed disturbance ui/σi is homoscedastic: var(ui/σi) = var(ui)/σi² = 1.

GLS
• This procedure of transforming the original variables in such a way that the transformed variables satisfy the assumptions of the classical model and then applying OLS to them is known as the method of generalized least squares (GLS).

• In short, GLS is OLS on the transformed variables that satisfy the standard least-squares assumptions.

• The estimators thus obtained are known as GLS estimators, and it is these estimators that are BLUE.

GLS
• With wi = 1/σi², the GLS coefficient and its variance are

β̂2* = [(Σwi)(ΣwiXiYi) − (ΣwiXi)(ΣwiYi)] / [(Σwi)(ΣwiXi²) − (ΣwiXi)²]

var(β̂2*) = Σwi / [(Σwi)(ΣwiXi²) − (ΣwiXi)²]

Difference between OLS and GLS

• In OLS we minimize

Σûi² = Σ(Yi − β̂1 − β̂2Xi)²

which gives equal weight to every observation.

• In GLS we minimize

Σwi(Yi − β̂1*X0i − β̂2*Xi)²,  wi = 1/σi²

• The weight assigned to each observation is thus inversely proportional to its σi, that is, observations coming from a population with larger σi will get relatively smaller weight and those from a population with smaller σi will get proportionately larger weight in minimizing the RSS.

Consequences of Using OLS in the Presence of Heteroscedasticity
OLS Estimation Allowing for Heteroscedasticity

• Using the heteroscedasticity variance formula seen earlier, and assuming the σi² are known, can we establish confidence intervals and test hypotheses with the usual t and F tests?

• The answer generally is no, because var(β̂2*) ≤ var(β̂2): confidence intervals based on the OLS estimator will be unnecessarily wide.

Consequences of Using OLS in the Presence of Heteroscedasticity
OLS Estimation Disregarding Heteroscedasticity

• Suppose we use the usual homoscedastic formula var(β̂2) = σ²/Σxi² while ignoring the heteroskedasticity.

• This variance is a biased estimator of the true variance.

• If we persist in using the usual testing procedures despite heteroscedasticity, whatever conclusions we draw or inferences we make may be very misleading.

Consequences of Using OLS in the Presence of Heteroscedasticity
OLS Estimation Disregarding Heteroscedasticity

• The usual OLS standard errors are either too large (for the intercept) or generally too small (for the slope coefficient) in relation to those obtained by OLS allowing for heteroscedasticity.

• The message is clear: in the presence of heteroscedasticity, use GLS.

Detection of Heteroscedasticity
There are two types of methods:

○ Informal methods

○ Formal methods

Informal Methods
1. Nature of the Problem

• In cross-sectional data involving heterogeneous units, heteroscedasticity may be the rule rather than the exception.

• Thus, in a cross-sectional analysis involving investment expenditure in relation to sales, rate of interest, etc., heteroscedasticity is generally expected if small-, medium-, and large-size firms are sampled together.

Informal Methods
2. Graphical Method
• Plot the squared residuals ûi² against the estimated Ŷi.
• If, as in Figure a, there is no systematic pattern between the two variables, perhaps no heteroscedasticity is present in the data.
• Figures b to e, however, exhibit definite patterns, suggesting heteroscedasticity.
• Instead of plotting ûi² against Ŷi, one may plot them against one of the explanatory variables, especially if plotting ûi² against Ŷi results in the pattern shown in Figure a.

Formal Methods
Park Test
• Park suggests that σi² is some function of the explanatory variable Xi. The functional form suggested is

σi² = σ² Xi^β e^vi,  or  ln σi² = ln σ² + β ln Xi + vi

• where vi is the stochastic disturbance term.

Park Test
• Since σi² is generally not known, Park suggests using ûi² as a proxy and running the following regression:

ln ûi² = ln σ² + β ln Xi + vi = α + β ln Xi + vi

• If β turns out to be statistically significant, it would suggest that heteroscedasticity is present in the data. If it turns out to be insignificant, we may accept the assumption of homoscedasticity.

Park Test
The Park test is thus a two-stage procedure.

• In the first stage, we run the OLS regression disregarding the heteroscedasticity question.

• We obtain ûi² from this regression, and then in the second stage we run the above regression model.

EXAMPLE: Relationship between Compensation and Productivity
• Y = average compensation in thousands of dollars
• X = average productivity in thousands of dollars

The following data are available:

    Y       X
    3396    9355
    3787    8584
    4013    7962
    4104    8275
    4146    8389
    4241    9418
    4388    9795
    4538    10281
    4843    11750

EXAMPLE: Relationship between Compensation and Productivity
• Run the regression of compensation (y) on productivity (x):

Source SS df MS Number of obs = 9
F(1, 7) = 5.44
Model 619377.506 1 619377.506 Prob > F = 0.0523
Residual 796278.05 7 113754.007 R-squared = 0.4375
Adj R-squared = 0.3572
Total 1415655.56 8 176956.944 Root MSE = 337.27

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x .2329993 .0998528 2.33 0.052 -.0031151 .4691137
_cons 1992.062 936.6123 2.13 0.071 -222.6741 4206.798

• As labor productivity increases by a dollar, labor compensation on the average increases by about 23 cents.

EXAMPLE: Relationship between Compensation and Productivity
• Obtain the residuals from this regression.

• Generate the log of the squared residuals.

• Generate the log of the variable X.

• A sketch of these commands follows below.
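A plausible Stata sketch of these steps (the variable names y, x, ehat, lehat2, and lx are assumptions, not taken from the slides):

    * first-stage OLS regression of compensation on productivity
    regress y x
    * save the residuals
    predict ehat, residuals
    * log of the squared residuals, for the Park test regression
    generate lehat2 = ln(ehat^2)
    * log of the regressor
    generate lx = ln(x)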

EXAMPLE: Relationship between Compensation and Productivity
• Run the Park regression of the log squared residuals on log X:

Source SS df MS Number of obs = 9
F(1, 7) = 0.45
Model .95900152 1 .95900152 Prob > F = 0.5257
Residual 15.0546945 7 2.15067064 R-squared = 0.0599
Adj R-squared = -0.0744
Total 16.013696 8 2.001712 Root MSE = 1.4665

lehat2 Coef. Std. Err. t P>|t| [95% Conf. Interval]
lx -2.802023 4.196131 -0.67 0.526 -12.7243 7.12025
_cons 35.82686 38.3227 0.93 0.381 -54.79192 126.4456

EXAMPLE: Relationship between Compensation and Productivity
• The estimated Park regression is ln ûi² = 35.83 − 2.80 ln Xi, with an insignificant slope (p = 0.526).

• There is no statistically significant relationship between the two variables.

• Following the Park test, we may conclude that there is no heteroscedasticity in the error variance.

Glejser Test
• The Glejser test is similar in spirit to the Park test.
• After obtaining the residuals ûi from the OLS regression, Glejser suggests regressing the absolute values of ûi on the X variable.
• Glejser uses the following functional forms:

|ûi| = β1 + β2Xi + vi
|ûi| = β1 + β2√Xi + vi
|ûi| = β1 + β2(1/Xi) + vi
|ûi| = β1 + β2(1/√Xi) + vi
|ûi| = √(β1 + β2Xi) + vi
|ûi| = √(β1 + β2Xi²) + vi

Example
• We take the previous example and use the absolute values of the residuals.

• Generate the absolute-residuals variable; a command sketch follows below.
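A plausible Stata sketch (aehat is an assumed name; ehat is the residual series saved in the Park-test step):

    * absolute value of the OLS residuals
    generate aehat = abs(ehat)
    * Glejser regression of the absolute residuals on X
    regress aehat x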

Example
• Run the regression model of the absolute residuals on X:

Source SS df MS Number of obs = 9
F(1, 7) = 0.09
Model 4724.55491 1 4724.55491 Prob > F = 0.7718
Residual 363925.371 7 51989.3388 R-squared = 0.0128
Adj R-squared = -0.1282
Total 368649.926 8 46081.2408 Root MSE = 228.01

aehat Coef. Std. Err. t P>|t| [95% Conf. Interval]
x -.0203497 .0675047 -0.30 0.772 -.179973 .1392736
_cons 407.476 633.1895 0.64 0.540 -1089.779 1904.731

Example
• There is no statistically significant relationship between the absolute value of the residuals and the regressor, average productivity (p = 0.772).

• This reinforces the conclusion based on the Park test.

Glejser Test
• Goldfeld and Quandt point out that the error term vi has some problems: its expected value is nonzero, it is serially correlated, and, ironically, it is heteroscedastic.

Spearman's Rank Correlation Test

The formula is

rs = 1 − 6 [Σdi² / (n(n² − 1))]

where di is the difference in the ranks assigned to the same individual on the two characteristics and n is the number of individuals ranked.

Step 1. Fit the regression to the data on Y and X and obtain the residuals ûi.

Step 2. Ignoring the sign of ûi, that is, taking their absolute values |ûi|, rank both |ûi| and Xi (or Ŷi) and compute the rank correlation coefficient.

Spearman's Rank Correlation Test

Step 3. Assuming that the population rank correlation coefficient ρs is zero and n > 8, the significance of the sample rs can be tested by the t test as follows:

t = rs √(n − 2) / √(1 − rs²)

with df = n − 2.

• If the computed t value exceeds the critical t value, we may accept the hypothesis of heteroscedasticity; otherwise we may reject it.
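A plausible Stata sketch of the ranking step, using Stata's spearman command (aehat is the absolute-residual series from the Glejser step):

    * Spearman rank correlation between |u-hat| and X
    spearman aehat x
    * the reported rho is rs; test it with t = rs*sqrt(n-2)/sqrt(1-rs^2), df = n-2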


Goldfeld–Quandt Test
• The assumption is that the heteroscedastic variance, σi², is positively related to one of the explanatory variables in the regression model.

• Consider the usual two-variable model:

Yi = β1 + β2Xi + ui

• Suppose σi² is positively related to Xi as

σi² = σ²Xi²

• where σ² is a constant.

Goldfeld–Quandt Test
Goldfeld and Quandt suggest the following steps:

Step 1. Order or rank the observations according to the values of Xi, beginning with the lowest X value.

Step 2. Omit c central observations, where c is specified a priori, and divide the remaining (n − c) observations into two groups, each of (n − c)/2 observations.

Goldfeld–Quandt Test
Goldfeld and Quandt suggest the following steps:

Step 3. Fit separate OLS regressions to the first (n − c)/2 observations and the last (n − c)/2 observations, and obtain the respective residual sums of squares RSS1 and RSS2: RSS1 representing the RSS from the regression corresponding to the smaller Xi values (the small-variance group) and RSS2 that from the larger Xi values (the large-variance group).

Goldfeld–Quandt Test
• Each of these RSS has

df = (n − c)/2 − k = (n − c − 2k)/2

• where k is the number of parameters to be estimated, including the intercept.

Step 4. Compute the ratio

λ = (RSS2/df) / (RSS1/df)

Goldfeld–Quandt Test
• If we assume the ui are normally distributed, and if the assumption of homoscedasticity is valid, then λ follows the F distribution with numerator and denominator df each of (n − c − 2k)/2.

• If in an application the computed λ (= F) is greater than the critical F at the chosen level of significance, we can reject the hypothesis of homoscedasticity; that is, we can say that heteroscedasticity is very likely.

Goldfeld–Quandt Test
• In case there is more than one X variable in the model, the ranking of observations, the first step in the test, can be done according to any one of them.

EXAMPLE: The Goldfeld–Quandt Test
• Data on consumption expenditure in relation to income for a cross section of 30 families.

• Dropping the middle 4 observations, the OLS regressions based on the first 13 and the last 13 observations and their associated residual sums of squares are as follows:

EXAMPLE
• Regression model based on the first 13 observations:

Source SS df MS Number of obs = 13
F(1, 11) = 87.79
Model 3010.06452 1 3010.06452 Prob > F = 0.0000
Residual 377.166253 11 34.2878412 R-squared = 0.8887
Adj R-squared = 0.8785
Total 3387.23077 12 282.269231 Root MSE = 5.8556

ry Coef. Std. Err. t P>|t| [95% Conf. Interval]
rx .6967742 .074366 9.37 0.000 .5330958 .8604526
_cons 3.409429 8.704924 0.39 0.703 -15.74998 22.56884

EXAMPLE
• Regression model based on the last 13 observations:

Source SS df MS Number of obs = 13
F(1, 11) = 36.42
Model 5088.89274 1 5088.89274 Prob > F = 0.0001
Residual 1536.79957 11 139.709052 R-squared = 0.7681
Adj R-squared = 0.7470
Total 6625.69231 12 552.141026 Root MSE = 11.82

ry Coef. Std. Err. t P>|t| [95% Conf. Interval]
rx .7941373 .1315819 6.04 0.000 .5045274 1.083747
_cons -28.02717 30.64214 -0.91 0.380 -95.47006 39.41573

EXAMPLE
• Compute the ratio λ = (RSS2/11)/(RSS1/11) = (1536.7996/11)/(377.1663/11) ≈ 4.07.

EXAMPLE
• Obtain the critical F value for 11 numerator and 11 denominator df at the 5 percent level; a command sketch follows below.
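A plausible Stata sketch of the λ computation, using the two residual sums of squares reported above:

    scalar lambda = (1536.79957/11) / (377.166253/11)
    * p-value and 5 percent critical value of F with (11, 11) df
    scalar pvalue = Ftail(11, 11, lambda)
    scalar crit   = invFtail(11, 11, 0.05)
    scalar list lambda pvalue crit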

EXAMPLE

lambda = 4.0745946
pvalue = .01408971
crit = 2.8179305

• Since the estimated F (= λ) value exceeds the critical value, we may conclude that there is heteroscedasticity in the error variance.


Breusch–Pagan–Godfrey Test
• Consider the k-variable linear regression model

Yi = β1 + β2X2i + · · · + βkXki + ui

• Assume that the error variance σi² is described as

σi² = f(α1 + α2Z2i + · · · + αmZmi)

• That is, σi² is some function of the nonstochastic Z variables; some or all of the X's can serve as Z's.

Breusch–Pagan–Godfrey Test
• Specifically, assume that

σi² = α1 + α2Z2i + · · · + αmZmi

• that is, σi² is a linear function of the Z's.

Breusch–Pagan–Godfrey Test
• If α2 = α3 = · · · = αm = 0, then σi² = α1, which is a constant.

• Therefore, to test whether σi² is homoscedastic, one can test the hypothesis that α2 = α3 = · · · = αm = 0.

• This is the basic idea behind the Breusch–Pagan–Godfrey test.

• The actual test procedure is as follows.

Breusch–Pagan–Godfrey Test
• Step 1. Estimate the regression by OLS and obtain the residuals û1, û2, . . . , ûn.

• Step 2. Obtain σ̃² = Σûi²/n (the maximum-likelihood estimator of σ²).

• Step 3. Construct variables pi defined as pi = ûi²/σ̃².

Breusch–Pagan–Godfrey Test
• Step 4. Regress the pi thus constructed on the Z's as

pi = α1 + α2Z2i + · · · + αmZmi + vi

• Step 5. Obtain the ESS (explained sum of squares) from this regression and define Θ = ESS/2.

Breusch–Pagan–Godfrey Test
• Assuming the ui are normally distributed, if there is homoscedasticity and if the sample size n increases indefinitely, then

Θ ∼ χ²(m−1) asymptotically

• That is, Θ follows the chi-square distribution with (m − 1) degrees of freedom.

Breusch–Pagan–Godfrey Test
• Therefore, if in an application the computed Θ (= χ²) exceeds the critical χ² value at the chosen level of significance, we can reject the hypothesis of homoscedasticity; otherwise we do not reject it.

EXAMPLE: The Breusch–Pagan–Godfrey (BPG) Test
• We use the previous FULL data set (30 observations).
• Step 1. Regress Y on X:

Source SS df MS Number of obs = 30
F(1, 28) = 496.72
Model 41886.7134 1 41886.7134 Prob > F = 0.0000
Residual 2361.15325 28 84.3269018 R-squared = 0.9466
Adj R-squared = 0.9447
Total 44247.8667 29 1525.78851 Root MSE = 9.183

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x .6377846 .0286167 22.29 0.000 .579166 .6964031
_cons 9.290307 5.231386 1.78 0.087 -1.4257 20.00632

EXAMPLE: The Breusch–Pagan–Godfrey (BPG) Test
• Step 2.
• Generate the residuals, their squares, the sum of the squared residuals, and the number of observations.
• Compute the sigma-tilde value σ̃² = Σûi²/n; a command sketch follows below.
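A plausible Stata sketch for Step 2 (uhat, uhat2, and sig2 are assumed names):

    * residuals from the Step 1 regression of y on x
    predict uhat, residuals
    generate uhat2 = uhat^2
    * sum of squared residuals and number of observations
    quietly summarize uhat2
    * sigma-tilde squared = sum of squared residuals / n
    scalar sig2 = r(sum) / r(N)
    scalar list sig2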

EXAMPLE
• Step 3. Divide the squared residuals ûi² obtained from the regression by σ̃² = 2361.1533/30 = 78.7051 to construct the variable pi.

EXAMPLE
Step 4. Assuming that the pi are linearly related to Xi (= Zi), we run the regression of p on x:

Source SS df MS Number of obs = 30
F(1, 28) = 5.97
Model 10.4280495 1 10.4280495 Prob > F = 0.0211
Residual 48.9100398 28 1.74678714 R-squared = 0.1757
Adj R-squared = 0.1463
Total 59.3380894 29 2.04614101 Root MSE = 1.3217

p Coef. Std. Err. t P>|t| [95% Conf. Interval]
x .0100632 .0041187 2.44 0.021 .0016265 .0184999
_cons -.7426146 .7529284 -0.99 0.332 -2.284918 .7996892

EXAMPLE
• The estimated relation is p̂i = −0.7426 + 0.0101Xi.

EXAMPLE
Step 5.
• Obtain the model ESS (explained sum of squares) from the Step 4 regression and compute Θ = ESS/2; a command sketch follows below.
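A plausible Stata sketch for Steps 3 to 5 (p is an assumed name; e(mss) is the model sum of squares Stata stores after regress):

    * Step 3: construct p = squared residuals / sigma-tilde squared
    generate p = uhat2/sig2
    * Step 4: regress p on X (= Z)
    regress p x
    * Step 5: theta = ESS/2, with critical values and p-value for 1 df
    scalar theta  = e(mss)/2
    scalar pvalue = chi2tail(1, theta)
    scalar crit_5 = invchi2tail(1, 0.05)
    scalar crit_1 = invchi2tail(1, 0.01)
    scalar list theta pvalue crit_5 crit_1

Stata's built-in estat hettest command, run after the original regression, performs a version of this test directly.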

EXAMPLE

theta = 5.2140103
pvalue = .0224056
crit_5 = 3.8414588
crit_1 = 6.6348966

• For 1 df, the 5 percent critical chi-square value is 3.8414 and the 1 percent critical χ² value is 6.6349.

• Thus, the observed chi-square value of 5.2140 is significant at the 5 percent but not the 1 percent level of significance.

White's General Heteroscedasticity Test

• Does not rely on the normality assumption.

• Consider the three-variable model

Yi = β1 + β2X2i + β3X3i + ui

• The White test proceeds as follows:

• Step 1. Given the data, we estimate this equation and obtain the residuals ûi.

White's General Heteroscedasticity Test

• Step 2. We then run the following (auxiliary) regression:

ûi² = α1 + α2X2i + α3X3i + α4X2i² + α5X3i² + α6X2iX3i + vi

• There is a constant term in this equation even though the original regression may or may not contain it.

White's General Heteroscedasticity Test

• Step 3. Under the null hypothesis that there is no heteroscedasticity, the sample size (n) times the R² obtained from the auxiliary regression asymptotically follows the chi-square distribution with df equal to the number of regressors (excluding the constant term) in the auxiliary regression:

n · R² ∼ χ²(df) asymptotically

White's General Heteroscedasticity Test

• Step 4. If the chi-square value obtained exceeds the critical chi-square value at the chosen level of significance, the conclusion is that there is heteroscedasticity.

White's Heteroscedasticity Test

The White test can be a test of (pure) heteroscedasticity, or specification error, or both.

• If no cross-product terms are present in the White test procedure, then it is a test of pure heteroscedasticity.

• If cross-product terms are present, then it is a test of both heteroscedasticity and specification bias.

EXAMPLE: White's Heteroscedasticity Test

• Use the dataset HPRICE1 from Wooldridge.

• Run the regression of price on lotsize, sqrft, and bdrms:

Source SS df MS Number of obs = 88
F(3, 84) = 57.46
Model 617130.701 3 205710.234 Prob > F = 0.0000
Residual 300723.805 84 3580.0453 R-squared = 0.6724
Adj R-squared = 0.6607
Total 917854.506 87 10550.0518 Root MSE = 59.833

price Coef. Std. Err. t P>|t| [95% Conf. Interval]
lotsize .0020677 .0006421 3.22 0.002 .0007908 .0033446
sqrft .1227782 .0132374 9.28 0.000 .0964541 .1491022
bdrms 13.85252 9.010145 1.54 0.128 -4.065141 31.77018
_cons -21.77031 29.47504 -0.74 0.462 -80.38466 36.84405

EXAMPLE: White's Heteroscedasticity Test

• Compute the residuals and their squared terms, then run the auxiliary regression of the squared residuals on the regressors (a command sketch follows below):

Source SS df MS Number of obs = 88
F(3, 84) = 5.34
Model 701213780 3 233737927 Prob > F = 0.0020
Residual 3.6775e+09 84 43780003.5 R-squared = 0.1601
Adj R-squared = 0.1301
Total 4.3787e+09 87 50330276.7 Root MSE = 6616.6

resid2 Coef. Std. Err. t P>|t| [95% Conf. Interval]
lotsize .2015209 .0710091 2.84 0.006 .0603116 .3427302
sqrft 1.691037 1.46385 1.16 0.251 -1.219989 4.602063
bdrms 1041.76 996.381 1.05 0.299 -939.6526 3023.173
_cons -5522.795 3259.478 -1.69 0.094 -12004.62 959.0348
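A plausible Stata sketch of these steps (resid and resid2 are assumed names):

    * estimate the house-price equation and save the squared residuals
    regress price lotsize sqrft bdrms
    predict resid, residuals
    generate resid2 = resid^2
    * auxiliary regression of the squared residuals on the regressors
    regress resid2 lotsize sqrft bdrms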

EXAMPLE: White's Heteroscedasticity Test

• Form scalars for R² and the number of observations.

• Compute the test statistic nR².

• Find the critical value for 3 df at the 5 percent significance level; a command sketch follows below.
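A plausible Stata sketch, run right after the auxiliary regression (e(N) and e(r2) are the stored sample size and R²):

    scalar nr2    = e(N) * e(r2)
    scalar crit   = invchi2tail(3, 0.05)
    scalar pvalue = chi2tail(3, nr2)
    scalar list nr2 crit pvalue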

EXAMPLE: White's Heteroscedasticity Test

• Compute the p-value and list the values:

nr2 = 14.092386
crit = 7.8147279
pvalue = .00278206

• Thus, we reject the null hypothesis of no heteroskedasticity.

EXAMPLE: White's Heteroscedasticity Test

• Perform the same analysis using the logarithmic transformation of the variables:

Source SS df MS Number of obs = 88
F(3, 84) = 50.42
Model 5.15503875 3 1.71834625 Prob > F = 0.0000
Residual 2.86256415 84 .034078145 R-squared = 0.6430
Adj R-squared = 0.6302
Total 8.0176029 87 .092156355 Root MSE = .1846

lprice Coef. Std. Err. t P>|t| [95% Conf. Interval]
llotsize .1679668 .0382811 4.39 0.000 .0918406 .2440931
lsqrft .7002321 .0928653 7.54 0.000 .5155594 .8849049
bdrms .0369583 .0275313 1.34 0.183 -.0177907 .0917073
_cons -1.297041 .6512836 -1.99 0.050 -2.59219 -.0018916

EXAMPLE: White's Heteroscedasticity Test

• Generate the squared residuals and run the auxiliary regression:

Source SS df MS Number of obs = 88
F(3, 84) = 1.41
Model .022620136 3 .007540045 Prob > F = 0.2451
Residual .448718204 84 .005341883 R-squared = 0.0480
Adj R-squared = 0.0140
Total .471338339 87 .005417682 Root MSE = .07309

resid2 Coef. Std. Err. t P>|t| [95% Conf. Interval]
llotsize -.0070156 .0151563 -0.46 0.645 -.0371556 .0231244
lsqrft -.0627367 .0367674 -1.71 0.092 -.1358526 .0103792
bdrms .0168407 .0109002 1.54 0.126 -.0048357 .038517
_cons .5099937 .2578573 1.98 0.051 -.0027838 1.022771

EXAMPLE: White's Heteroscedasticity Test

• Compute the test statistic, critical value, and p-value as before.

EXAMPLE: White's Heteroscedasticity Test

• Show all the values and make a decision:

nr2 = 4.2232337
crit = 7.8147279
pvalue = .23834602

• As the p-value is high, we do not reject the null hypothesis of homoscedasticity.

• This means the usual OLS standard errors of the log model are valid.


Remedial Measures

When σi² Is Known: The Method of Weighted Least Squares
EXAMPLE: Illustration of the Method of Weighted Least Squares

• Suppose we want to study the relationship between compensation and employment size.


When σi² Is Known: The Method of Weighted Least Squares
The procedure: divide Yi, the intercept term, and Xi through by the known σi and run OLS on the transformed variables without a separate constant term; a command sketch follows below.
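A plausible Stata sketch of the transformation (sigma is an assumed variable holding the known σi; yt, inter, and xt are assumed names matching the output below):

    * divide Y, the intercept term, and X through by sigma
    generate yt    = y / sigma
    generate inter = 1 / sigma
    generate xt    = x / sigma
    * WLS = OLS on the transformed variables, suppressing the constant
    regress yt inter xt, noconstant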

When σi² Is Known: The Method of Weighted Least Squares
• Run the regression on the transformed variables:

Source SS df MS Number of obs = 9
F(2, 7) = 5258.53
Model 174.509886 2 87.2549429 Prob > F = 0.0000
Residual .116151268 7 .016593038 R-squared = 0.9993
Adj R-squared = 0.9991
Total 174.626037 9 19.402893 Root MSE = .12881

yt Coef. Std. Err. t P>|t| [95% Conf. Interval]
inter 3392.686 77.17705 43.96 0.000 3210.192 3575.181
xt 154.3817 16.16221 9.55 0.000 116.1641 192.5992

When σi² Is Known: The Method of Weighted Least Squares
• The results of WLS: the estimated intercept is 3392.686 (se = 77.18) and the estimated slope is 154.3817 (se = 16.16); both are highly significant.
When σi² Is Known
• For comparison, we give the usual or unweighted OLS regression results:

Source SS df MS Number of obs = 9
F(1, 7) = 121.62
Model 1354804.27 1 1354804.27 Prob > F = 0.0000
Residual 77979.7333 7 11139.9619 R-squared = 0.9456
Adj R-squared = 0.9378
Total 1432784 8 179098 Root MSE = 105.55

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x 150.2667 13.62593 11.03 0.000 118.0465 182.4869
_cons 3400.333 76.6774 44.35 0.000 3219.02 3581.647

When σi² Is Not Known

• White's Heteroscedasticity-Consistent Variances and Standard Errors.

• White's heteroscedasticity-corrected standard errors are also known as robust standard errors; a command sketch follows below.
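In Stata, White's robust standard errors can be requested directly; a minimal sketch, assuming the variables are named y and x:

    * OLS point estimates with heteroscedasticity-robust standard errors
    regress y x, vce(robust)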

Plausible Assumptions about Heteroscedasticity Pattern

• Consider the two-variable regression model:

Yi = β1 + β2Xi + ui

• Consider several assumptions about the pattern of heteroscedasticity.

Plausible Assumptions about Heteroscedasticity Pattern

• Assumption 1: The error variance is proportional to Xi²: E(ui²) = σ²Xi².

• Transform the original model as follows: divide the original model through by Xi:

Yi/Xi = β1(1/Xi) + β2 + ui/Xi

• The transformed disturbance ui/Xi is homoscedastic with variance σ².

Plausible Assumptions about Heteroscedasticity Pattern

• Assumption 2: The error variance is proportional to Xi: E(ui²) = σ²Xi.

• Transform the original model by dividing through by √Xi:

Yi/√Xi = β1(1/√Xi) + β2√Xi + ui/√Xi

Plausible Assumptions about Heteroscedasticity Pattern

• Assumption 3: The error variance is proportional to the square of the mean value of Y: E(ui²) = σ²[E(Yi)]².

• Transform the original model by dividing through by E(Yi), in practice approximated by the fitted Ŷi:

Yi/Ŷi = β1(1/Ŷi) + β2(Xi/Ŷi) + ui/Ŷi

Plausible Assumptions about Heteroscedasticity Pattern

• Assumption 4: A log transformation such as

ln Yi = β1 + β2 ln Xi + ui

very often reduces heteroscedasticity when compared with the regression Yi = β1 + β2Xi + ui.

Concluding Examples
• Data on research and development (R&D) expenditure, sales, and profits for 18 industry groupings in the United States, all figures in millions of dollars.
• Run the regression of R&D expenditure on sales:

Source SS df MS Number of obs = 18
F(1, 16) = 14.67
Model 111675212 1 111675212 Prob > F = 0.0015
Residual 121806834 16 7612927.12 R-squared = 0.4783
Adj R-squared = 0.4457
Total 233482046 17 13734238 Root MSE = 2759.2

rd Coef. Std. Err. t P>|t| [95% Conf. Interval]
sales .0319003 .008329 3.83 0.001 .0142436 .049557
_cons 192.9932 990.9858 0.19 0.848 -1907.803 2293.789

Test for Heteroskedasticity

• Obtain the residuals and their absolute and squared values; a command sketch follows below.
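A plausible Stata sketch (uhat, auhat, and uhat2 are assumed names):

    * residuals from the R&D regression, plus absolute and squared values
    predict uhat, residuals
    generate auhat = abs(uhat)
    generate uhat2 = uhat^2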

Test for Heteroskedasticity

• Park Test (a command sketch follows below)
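A plausible Stata sketch of the Park test for this model (luhat2 and lsales are assumed names):

    * Park test: regress the log squared residuals on log sales
    generate luhat2 = ln(uhat2)
    generate lsales = ln(sales)
    regress luhat2 lsales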

