EC229 Part II Answers

1. Explain the nature, causes, and effects of heteroscedasticity, and discuss the remedies for this problem. How does it violate the Gauss-Markov conditions? What are its implications for the parameter estimates obtained by ordinary least squares?

Heteroscedasticity refers to a situation in which the variance of the residuals is not constant over the range of measured values; it arises when the linear regression assumption of homoscedasticity (constant error variance) is violated.

The occurrence of heteroscedasticity can be written mathematically as

$\operatorname{var}(u_i \mid X_i) = \sigma_i^2, \quad i = 1, \dots, n,$

where the error variance $\sigma_i^2$ differs across observations rather than being a single constant $\sigma^2$.

Diagrammatically, this appears as a residual plot in which the spread of the residuals changes systematically with X rather than staying roughly constant (diagram not reproduced here).
Turning to the causes of heteroscedasticity, six are commonly cited. The first is a violation of the classical linear regression assumption that the model is correctly specified: when this assumption is violated and there is omitted variable bias, heteroscedasticity can occur.

A second cause is the presence of outliers. An outlier is an observation that deviates markedly from the rest of the dataset, taking an unusually high or low value. The inclusion or exclusion of such observations, especially when the sample size is small, can alter the results of the regression analysis and produce unequal error variances.

A third cause follows from error-learning models. As people gain experience, for example while conducting research, their errors become smaller and more consistent, so the error variance decreases over time, producing heteroscedasticity.

Furthermore, as data collection techniques improve, the error variance is likely to decrease. For example, banks with advanced data processing equipment are likely to make fewer errors.

A fifth cause is skewness in the data. If the distribution of the dependent variable is asymmetric, heteroscedasticity is likely to be present.

In addition to the causes stated above, heteroscedasticity can arise from incorrect data transformation or an incorrect functional form.

Moving on to the effects of heteroscedasticity, the first point to consider is the Gauss-Markov theorem. This theorem sets out the optimal properties that ordinary least squares estimators possess under the classical assumptions, summarised by the acronym BLUE: an estimator is BLUE when it is linear, unbiased, and has minimum variance, so that it is efficient. Under heteroscedasticity the Gauss-Markov conditions are violated because the estimators are no longer efficient; the error variance is not constant across the data, so the OLS variances are no longer minimal. The estimated parameters do, however, remain unbiased.
In addition to the point above, heteroscedasticity produces biased standard errors. This leads to unreliable confidence intervals and hypothesis tests, which in turn leads to incorrect conclusions about the significance of variables.

Furthermore, because the standard errors are biased, the usual F-tests and t-tests of significance are no longer valid, increasing the risk of Type I and Type II errors (false positives and false negatives).

Heteroscedasticity can also inflate the R-squared, making the model appear to fit the data better than it actually does.

Furthermore, when heteroscedasticity arises because the model is misspecified, the estimates of the regression coefficients will themselves be misleading.

Lastly, heteroscedasticity makes comparisons across datasets difficult, as differences in error variance complicate the comparison of goodness-of-fit measures.

There are a number of methods for resolving heteroscedasticity. The first remedy is White's correction, a simple method developed by Halbert White and used when the population error variances are unknown. White's correction provides heteroscedasticity-robust standard errors for the regression coefficients, which allow valid statistical inference even in the presence of heteroscedasticity.
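As an illustration only (not part of the original answer), the following Python sketch uses the statsmodels library with simulated, hypothetical data to show how White-type robust standard errors are obtained.

```python
# Minimal sketch: White (heteroscedasticity-robust) standard errors in statsmodels.
# The data are simulated purely for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 10, n)
u = rng.normal(0, 0.5 * x)        # error spread grows with x: heteroscedastic errors
y = 2 + 3 * x + u

X = sm.add_constant(x)
ols_res = sm.OLS(y, X).fit()                   # conventional standard errors
robust_res = sm.OLS(y, X).fit(cov_type="HC0")  # White-corrected (robust) standard errors

print(ols_res.bse)     # may misstate the true sampling variability
print(robust_res.bse)  # valid for inference under heteroscedasticity
```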

Weighted least squares is another efficient solution: it assigns each observation a weight inversely proportional to the variance of its error term, which stabilises the variance and improves the efficiency of the estimates. It is applied when the form of the error variance is known.
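The sketch below (again simulated, hypothetical data) shows weighted least squares in statsmodels, assuming for illustration that the error variance is proportional to x², so that weights of 1/x² are appropriate.

```python
# Hedged sketch: weighted least squares with weights inversely proportional to
# the assumed error variance (here taken to be proportional to x**2).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x)   # heteroscedastic errors

X = sm.add_constant(x)
wls_res = sm.WLS(y, X, weights=1.0 / x**2).fit()  # down-weight the noisier observations
print(wls_res.bse)   # typically smaller than the OLS standard errors when the weights are appropriate
```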

Generalized least squares (GLS) is a further remedy: it is an extension of ordinary least squares (OLS) that accounts for heteroscedasticity by transforming the model so that the transformed error terms have constant variance.
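To make the transformation concrete, a textbook-style sketch (not from the original answer) for a simple bivariate model with known error variances $\sigma_i^2$ is:

\[
Y_i = \beta_1 + \beta_2 X_i + u_i
\;\Longrightarrow\;
\frac{Y_i}{\sigma_i} = \beta_1 \frac{1}{\sigma_i} + \beta_2 \frac{X_i}{\sigma_i} + \frac{u_i}{\sigma_i},
\qquad
\operatorname{var}\!\left(\frac{u_i}{\sigma_i}\right) = \frac{\sigma_i^2}{\sigma_i^2} = 1,
\]

so OLS applied to the transformed variables (which amounts to weighted least squares with weights $1/\sigma_i^2$) has homoscedastic errors and is efficient.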
Furthermore, since one of the causes of heteroscedasticity is violation of the correct-specification assumption, model misspecification can be remedied by ensuring that the model is correctly specified, including all relevant variables and interactions.

Lastly, when heteroscedasticity is due to differences between subgroups within the data, analysing these subgroups separately can help address the issue.

In summary, these are the causes, effects, and remedies of heteroscedasticity; securing homoscedasticity is crucial for valid regression analysis.
2. Explain the nature, causes, and effects of autocorrelation, and discuss the remedies for this problem. How does it violate the Gauss-Markov conditions? What are its implications for the parameter estimates obtained by ordinary least squares?

Autocorrelation refers to a situation in which the error term in one period is correlated with the error terms in previous periods. It results from violating the assumption of no serial correlation between the disturbances, and it usually occurs with time series data.

Autocorrelation can be stated symbolically as $E(u_i u_j) \neq 0$ for $i \neq j$; in the first-order case the errors follow $u_t = \rho u_{t-1} + \varepsilon_t$ with $\rho \neq 0$.

Causes of Autocorrelation:

(i) Inertia: an important feature of most economic time series is inertia, or sluggishness. As is well known, time series such as GNP, price indexes, production, employment and unemployment exhibit (business) cycles.
(ii) Specification bias (omitted variable bias)
(iii) Lags
(iv) Cobweb phenomenon
(v) Non-stationarity

Effects of Autocorrelation:

- Autocorrelation violates the Gauss-Markov condition of efficiency, as the variance of the estimators is no longer at its minimum value, although the estimators remain unbiased.

- The residual variance underestimates the true error variance. This results in R² being overestimated.

- Because of this inefficiency, the estimators have larger variances and hence larger standard errors. This makes the usual significance tests such as the t-test and F-test invalid and gives misleading results; for example, a computed t-statistic may fall below 1.96, suggesting that a variable is statistically insignificant when this is not true.

Remedies of Autocorrelation

In the case of pure autocorrelation (autocorrelation that is not a result of model misspecification):

(i) Use an appropriate transformation of the original model.
(ii) Newey-West standard errors. These are used with large samples to obtain robust standard errors that correct for autocorrelation and heteroscedasticity, so that valid inferences can be drawn.
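As an illustrative sketch (not in the original answer), Newey-West standard errors can be requested in statsmodels as HAC standard errors; the simulated AR(1)-type errors and the lag length of 4 below are assumptions made only for the example.

```python
# Hedged sketch: Newey-West (HAC) standard errors for a regression with
# autocorrelated errors. Data are simulated for illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
T = 300
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.normal()   # AR(1) errors: serially correlated
y = 1 + 2 * x + u

X = sm.add_constant(x)
hac_res = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(hac_res.bse)   # standard errors robust to autocorrelation (and heteroscedasticity)
```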
3. What is the difference between positive and negative first-order autocorrelation?

First-order autocorrelation refers to the correlation between a variable and its immediate previous
value. Here’s the difference between positive and negative first-order autocorrelation:

Positive First-Order Autocorrelation


Definition: This occurs when positive error terms in one period are likely to be followed by
positive error terms in the next period, and similarly, negative error terms are likely to be
followed by negative error terms. E.g. Positive Autocorrelation: If sales in one month are high,
sales in the next month are also likely to be high (and vice versa).

Pattern: The residuals show long runs of the same sign, rising and falling together over successive periods, indicating a tendency for errors to maintain their direction over time.

Implication: In time series data, this often suggests a momentum effect, where the process
shows persistence or trend-following behavior.

Negative First-Order Autocorrelation:


Definition: This occurs when positive error terms in one period are likely to be followed by
negative error terms in the next period, and vice versa. E.g. Negative Autocorrelation: If a stock's
price goes up one day, it is likely to go down the next day (and vice versa).

Pattern: The residuals alternate signs more frequently, indicating a tendency for errors to
reverse their direction from one period to the next.

Implication: This often suggests a mean-reverting process, where deviations from the mean are
corrected in subsequent periods.

Identifying Positive and Negative Autocorrelation through the Durbin-Watson Test:

- Positive autocorrelation: a Durbin-Watson statistic close to 0 indicates positive autocorrelation.

- Negative autocorrelation: a Durbin-Watson statistic close to 4 indicates negative autocorrelation.

- No autocorrelation: a Durbin-Watson statistic around 2 indicates no autocorrelation.

Understanding the nature of autocorrelation is crucial for selecting appropriate modelling techniques and ensuring the validity of statistical inferences in time series analysis.
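As a small illustration (not from the original answer), the Durbin-Watson statistic can be computed from OLS residuals with statsmodels; the positively autocorrelated series below is simulated and purely hypothetical.

```python
# Hedged sketch: computing the Durbin-Watson statistic from regression residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal()   # positive first-order autocorrelation
y = 0.5 + 1.5 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(res.resid))   # a value well below 2 signals positive autocorrelation
```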
4. Explain the difference between the following pairs of terms in the context of binary
choice models: (i) coefficient and marginal effect, (ii) R2 and likelihood ratio index,
(iii) predicted Y and observed Y

i) Coefficient and Marginal Effect

Coefficient
- Definition: In binary choice models (e.g., logistic regression, probit models), the coefficient represents the change in the log-odds (for logistic regression) or in the latent variable (for probit models) for a one-unit change in the predictor variable.
- Interpretation: Coefficients in these models are not directly interpretable in terms of the probability of the binary outcome, because they affect the latent variable or log-odds, not the probability itself.

Marginal Effect
- Definition: The marginal effect measures the change in the probability of the binary outcome for a one-unit change in the predictor variable, holding other variables constant.
- Interpretation: Marginal effects provide a more intuitive understanding of the impact of a predictor on the probability of the outcome. They are often computed at the mean values of the predictors or averaged over the sample.
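To make the distinction concrete, here is a minimal sketch (hypothetical simulated data, not from the original answers) using a statsmodels logit model: the raw coefficients are on the log-odds scale, while get_margeff() reports average marginal effects on the probability scale.

```python
# Hedged sketch: logit coefficients (log-odds scale) versus marginal effects
# (probability scale). Simulated data, for illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))   # true probabilities
y = rng.binomial(1, p_true)

X = sm.add_constant(x)
logit_res = sm.Logit(y, X).fit(disp=0)

print(logit_res.params)                   # coefficients: change in log-odds per unit of x
print(logit_res.get_margeff().summary())  # average marginal effects on P(y = 1)
```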

ii) R² and Likelihood Ratio Index

R²
- Definition: In linear regression, R² measures the proportion of the variance in the dependent variable that is predictable from the independent variables.
- Interpretation: Higher values of R² indicate a better fit of the model to the data. However, in binary choice models a direct analogue of R² is not typically used, because the dependent variable is binary.

Likelihood Ratio Index
- Definition: This is a measure used in the context of binary choice models (e.g., logistic regression). It is calculated as 1 - (log-likelihood of the fitted model / log-likelihood of the null model), where the null model contains only an intercept.
- Interpretation: The likelihood ratio index ranges from 0 to 1, with higher values indicating a better fit. Unlike R² in linear regression, it does not directly measure the proportion of variance explained but gives an indication of model fit relative to the null model.
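The sketch below (simulated data, illustrative only) computes the likelihood ratio index directly from the fitted and null log-likelihoods of a statsmodels logit model; statsmodels also reports the same quantity as McFadden's pseudo R-squared.

```python
# Hedged sketch: likelihood ratio index = 1 - lnL(fitted model) / lnL(null model).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + x))))

res = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
lri = 1 - res.llf / res.llnull     # fitted vs intercept-only log-likelihood
print(lri, res.prsquared)          # prsquared is the same McFadden measure
```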
iii) Predicted Y and Observed Y

Predicted Y
- Definition: In binary choice models, predicted Y refers to the predicted probability that the binary outcome equals 1 for a given set of predictor values.
- Interpretation: Predicted probabilities range from 0 to 1 and indicate the likelihood of the event occurring (e.g., the likelihood of a customer making a purchase).

Observed Y
- Definition: Observed Y refers to the actual binary outcome observed in the data, typically coded as 0 or 1.
- Interpretation: Observed Y values are the ground truth against which the predictions are compared to assess the accuracy and performance of the model.
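A short sketch (simulated data, illustrative only) of how predicted and observed Y are compared in practice; the 0.5 classification cutoff is an assumption for the example, not a rule from the original answer.

```python
# Hedged sketch: comparing predicted probabilities with observed 0/1 outcomes.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 400
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 1.5 * x))))   # observed Y (0 or 1)

X = sm.add_constant(x)
res = sm.Logit(y, X).fit(disp=0)

p_hat = res.predict(X)               # predicted Y: probabilities between 0 and 1
y_hat = (p_hat >= 0.5).astype(int)   # classify with an assumed 0.5 cutoff
print((y_hat == y).mean())           # share of observations predicted correctly
```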

Summary

Coefficient vs. Marginal Effect: Coefficients in binary choice models affect the latent variable
or log-odds, while marginal effects describe changes in the probability of the outcome.

R² vs. Likelihood Ratio Index: R² measures variance explained in linear models, while the
likelihood ratio index assesses model fit in binary choice models.

Predicted Y vs. Observed Y: Predicted Y represents model-derived probabilities, while observed Y is the actual outcome in the dataset.
5. Using relevant example(s) show how and explain why the linear probability model is
considered an inappropriate model for the estimation of dummy dependent
variables.

6. A second-year student investigates the factors influencing graduating from high school. She defines a variable GRAD that is equal to 1 for those individuals who graduated, and 0 for those who dropped out, and regresses it on ASVABC, the composite cognitive ability test score. The regression output (not reproduced here) shows the result of fitting this linear probability model.

a) Interpret the regression results


ASVABC (the test score) is scaled so that it has a mean of zero and its units are standard deviations. A one-unit (one standard deviation) increase in the ASVABC score on average increases the probability of graduating by 0.106, that is, by 10.6 percentage points.
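Since the original STATA output is not reproduced here, the following sketch only illustrates how such a linear probability model would be estimated and read; the small data frame is a hypothetical placeholder, although the variable names GRAD and ASVABC come from the question.

```python
# Hedged sketch: estimating the linear probability model GRAD = b1 + b2*ASVABC + u by OLS.
# The data frame below is a hypothetical placeholder, not the student's actual dataset.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "GRAD":   [1, 1, 0, 1, 0, 1, 1, 0, 1, 1],
    "ASVABC": [0.5, 1.2, -0.8, 0.1, -1.5, 0.9, 0.3, -0.4, 1.8, -0.2],
})

# Robust (HC1) standard errors, since LPM errors are heteroscedastic by construction.
lpm = smf.ols("GRAD ~ ASVABC", data=data).fit(cov_type="HC1")
print(lpm.params["ASVABC"])   # slope: estimated change in P(GRAD = 1) per one-SD rise in ASVABC
```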

b) Comment on the significance of the variable estimates and the overall significance.

Based on the results shown in the STATA output, both parameters are significant:

ASVABC is statistically significant because, according to the t-test, the calculated t-statistic exceeds the critical value.

The intercept is also statistically significant, since its calculated t-statistic likewise exceeds the critical value.

The overall significance of the regression model, assessed through the F-test, is confirmed because the calculated F-statistic exceeds the critical value.
