EC229 Part II Answers
1. Explain the nature, causes, and effects of heteroscedasticity and what are the remedies of this problem. How does it violate the Gauss-Markov conditions? What are its implications for the parameter estimates obtained by ordinary least squares?
Heteroscedasticity refers to a situation in which the variance of the disturbances is not constant across observations; in other words, the classical linear regression assumption of homoscedasticity (constant variance),

$\operatorname{var}(u_i \mid X_i) = \sigma^2,$

is violated.
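In standard notation, heteroscedasticity means the conditional variance carries an observation subscript, so each disturbance may have its own spread:

$\operatorname{var}(u_i \mid X_i) = \sigma_i^2, \qquad i = 1, \dots, n.$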
One cause of heteroscedasticity is the presence of outliers. An outlier is an observation that deviates significantly from the others in the dataset: an unusually high or low value relative to the rest of the data. The inclusion or exclusion of such an observation, especially when the sample size is small, can alter the results of the regression analysis and produce unequal error variances.
Another cause is described by error-learning models. As people perform a task repeatedly, for example as researchers gain experience, their errors decline or become more consistent over time, so the error variance decreases across observations and heteroscedasticity results.
Furthermore, as data collection techniques improve, the error variance is likely to decrease. For example, banks with advanced data-processing equipment are likely to make fewer errors.
Skewness in the data is a further cause. If the distribution of the dependent variable is asymmetric, heteroscedasticity is likely to be present.
In addition to the reasons stated above, heteroscedasticity can arise from an incorrect data transformation or an incorrect functional form.
Moving on to the effects of heteroscedasticity, the first thing to consider is the Gauss-Markov theorem. Under the classical assumptions, this theorem establishes the optimal properties of the ordinary least squares coefficient estimators, summarized by the acronym BLUE: an estimator is BLUE when it is linear, unbiased, and has minimum variance, so that it is efficient. Heteroscedasticity violates one of the Gauss-Markov conditions because the error variance is not constant throughout the data, so the OLS estimators no longer have minimum variance and are therefore no longer efficient. The estimated parameters do, however, remain unbiased.
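A standard textbook result (added here as an illustration) makes the loss of efficiency concrete in the bivariate model $Y_i = \beta_1 + \beta_2 X_i + u_i$. With heteroscedastic errors the variance of the OLS slope estimator is

$$\operatorname{var}(\hat{\beta}_2) = \frac{\sum_i x_i^2 \sigma_i^2}{\left(\sum_i x_i^2\right)^2}, \qquad x_i = X_i - \bar{X},$$

which reduces to the familiar $\sigma^2 / \sum_i x_i^2$ only when $\sigma_i^2 = \sigma^2$ for every observation; when it does not, the usual OLS variance formula measures the wrong quantity.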
In addition to the point mentioned above, heteroscedasticity makes the usual OLS standard errors biased. This leads to unreliable confidence intervals and hypothesis tests, which in turn produce incorrect conclusions about the significance of variables.
Because the standard errors are biased, the usual F-tests and t-tests of significance are no longer valid, increasing the incidence of Type I and Type II errors (false positives and false negatives).
Heteroscedasticity can also inflate R^2, making the model appear to fit the data better than it actually does.
Furthermore, when heteroscedasticity arises because the model is misspecified, the estimates of the regression coefficients themselves will be misleading.
Lastly, heteroscedasticity makes comparisons across datasets or models difficult, as differences in error variance complicate the comparison of goodness-of-fit measures.
There are a number of methods for dealing with heteroscedasticity. The first remedy is White's correction, a simple method developed by Hal White that is used when the error variances are not known. White's correction provides heteroscedasticity-robust standard errors for the regression coefficients, and these robust standard errors allow valid statistical inference even in the presence of heteroscedasticity.
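As a sketch of how such robust standard errors might be obtained in practice (the simulated data and variable names below are my own assumptions, not part of the question), Python's statsmodels library can refit the same OLS regression with a heteroscedasticity-consistent covariance matrix:

```python
import numpy as np
import statsmodels.api as sm

# Simulate a bivariate model whose error spread grows with x (heteroscedasticity).
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
u = rng.normal(scale=0.5 * x)      # standard deviation of the error rises with x
y = 2.0 + 1.5 * x + u

X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                    # conventional standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")   # White-type robust standard errors

print("conventional SEs:", ols.bse)
print("robust SEs:      ", robust.bse)
```

The coefficient estimates are identical in the two fits; only the standard errors, and hence the t-statistics and confidence intervals, change.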
Weighted least squares is another solution to heteroscedasticity. This method assigns each observation a weight inversely proportional to the variance of its error term, which stabilizes the variance and restores the efficiency of the estimates. It is applied when the error variances are known.
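A minimal weighted least squares sketch with statsmodels, assuming purely for illustration that var(u_i) is proportional to x_i squared, so each weight is 1/x_i^2:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data with sd(u) proportional to x, as in the previous sketch.
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1, 10, n)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5 * x)

X = sm.add_constant(x)

# statsmodels expects weights proportional to 1 / var(u_i).
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()
print(wls.params)   # coefficient estimates
print(wls.bse)      # standard errors under the assumed variance structure
```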
Lastly, when heteroscedasticity is caused by differences between subgroups within the data, analysing these subgroups separately can help address the issue.
In summary, these are the causes, effects, and remedies of heteroscedasticity; ensuring homoscedasticity is crucial for reliable regression analysis.
2. Explain the nature, causes, and effects of autocorrelation and what are the remedies of this problem. How does it violate the Gauss-Markov conditions? What are its implications for the parameter estimates obtained by ordinary least squares?
Autocorrelation refers to a situation in which the error term in one period is correlated with the error terms in previous periods. It results from violating the assumption of no serial correlation between the disturbances, and it usually arises with time series data.
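A common textbook representation of this (added here for clarity, not specified in the question) is the first-order autoregressive, AR(1), scheme for the disturbances:

$$u_t = \rho u_{t-1} + \varepsilon_t, \qquad -1 < \rho < 1,$$

where $\varepsilon_t$ is a well-behaved (white-noise) error. The no-serial-correlation assumption corresponds to $\rho = 0$; $\rho > 0$ gives positive autocorrelation and $\rho < 0$ negative autocorrelation.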
Causes of Autocorrelation:
(i) Inertia: an important feature of most economic time series is inertia, or sluggishness. As is well known, time series such as GNP, price indexes, production, employment, and unemployment exhibit (business) cycles.
(ii) Specification bias (omitted-variable bias)
(iii) Lags
(iv) Cobweb phenomenon
(v) Non-stationarity
Effects of Autocorrelation:
The residual variance is likely to underestimate the true error variance, which results in R^2 being overestimated.
OLS is also no longer efficient: the estimators have a larger variance, and hence larger standard errors, than necessary. As a result, the usual significance tests such as the t-test and F-test are no longer valid and give misleading results; for example, a calculated t-statistic may fall below 1.96, suggesting that a variable is statistically insignificant when in fact it is not.
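To see why the conventional formulas break down, a standard textbook result (added as an illustration, not part of the original answer) gives the true variance of the OLS slope estimator in the bivariate model when the disturbances follow the AR(1) scheme above:

$$\operatorname{var}(\hat{\beta}_2)_{AR(1)} = \frac{\sigma^2}{\sum_t x_t^2}\left[1 + 2\rho\,\frac{\sum_t x_t x_{t+1}}{\sum_t x_t^2} + 2\rho^2\,\frac{\sum_t x_t x_{t+2}}{\sum_t x_t^2} + \cdots\right], \qquad x_t = X_t - \bar{X}.$$

The conventional OLS formula keeps only the leading term $\sigma^2 / \sum_t x_t^2$, so whenever $\rho \neq 0$ the reported standard errors, t-statistics, and F-statistics are based on the wrong variance.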
Remedies of Autocorrelation:
First-order autocorrelation refers to the correlation between a variable and its value in the immediately preceding period. The difference between positive and negative first-order autocorrelation is as follows:
Positive first-order autocorrelation:
Pattern: The residuals show a consistent pattern, rising or falling in sustained runs, indicating a tendency for errors to maintain their direction over time.
Implication: In time series data, this often suggests a momentum effect, where the process shows persistence or trend-following behaviour.
Negative first-order autocorrelation:
Pattern: The residuals alternate signs more frequently, indicating a tendency for errors to reverse their direction from one period to the next.
Implication: This often suggests a mean-reverting process, where deviations from the mean are corrected in subsequent periods.
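To make the two patterns concrete, the small sketch below (my own illustration, using simulated AR(1) errors rather than anything from the question) counts how often consecutive errors change sign under positive and negative autocorrelation:

```python
import numpy as np

def simulate_ar1(rho: float, n: int = 1000, seed: int = 0) -> np.ndarray:
    """Simulate an AR(1) error process u_t = rho * u_{t-1} + eps_t."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=n)
    u = np.zeros(n)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]
    return u

def sign_changes(u: np.ndarray) -> int:
    """Number of times consecutive errors have opposite signs."""
    return int(np.sum(np.sign(u[1:]) != np.sign(u[:-1])))

# Positive autocorrelation: long runs of same-signed errors (few sign changes).
# Negative autocorrelation: errors alternate sign frequently (many sign changes).
print("rho = +0.8:", sign_changes(simulate_ar1(0.8)), "sign changes")
print("rho = -0.8:", sign_changes(simulate_ar1(-0.8)), "sign changes")
```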
Summary
Coefficient vs. Marginal Effect: Coefficients in binary choice models affect the latent variable
or log-odds, while marginal effects describe changes in the probability of the outcome.
R² vs. Likelihood Ratio Index: R² measures variance explained in linear models, while the
likelihood ratio index assesses model fit in binary choice models.
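As an illustration of the coefficient versus marginal-effect distinction (a sketch with simulated data; the variable names are assumptions, not the actual ASVABC dataset), statsmodels can report the logit coefficients, the average marginal effects, and McFadden's pseudo-R^2, which is the likelihood ratio index:

```python
import numpy as np
import statsmodels.api as sm

# Simulated binary-outcome data: P(y = 1) rises with the score variable.
rng = np.random.default_rng(0)
n = 1000
score = rng.normal(50, 10, n)
p = 1 / (1 + np.exp(-(-5 + 0.1 * score)))
y = rng.binomial(1, p)

X = sm.add_constant(score)
logit = sm.Logit(y, X).fit(disp=False)

print(logit.params)                   # coefficients: effect on the log-odds
print(logit.get_margeff().summary())  # average marginal effects on P(y = 1)
print(logit.prsquared)                # McFadden's pseudo-R2 (likelihood ratio index)
```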
A one-unit increase in the ASVABC score increases the probability of graduating by 0.106 on average, that is, by 10.6 percentage points.
b) Comment on the significance of the variable estimates and the overall significance.
Based on the STATA output, both parameters are significant:
The slope coefficient is statistically significant, and the intercept is also statistically significant, since in each case the calculated t-statistic exceeds the critical value.
The overall significance of the regression model, assessed through the F-test, is also established: the model is statistically significant because the calculated F-statistic exceeds the critical value.