333 Practice Final Solutions


University of the South

Fall 2023

Practice Questions for Final Examination.

1 Multiple Choices
1) In the model y = β0 + β1 x1 + β2 x2 + u, when is it appropriate to interpret β1 as the
marginal effect of x1 on the probability of a success?
a) When x1 is a dummy variable.
b) When all variables but y are dummy variables.
c) When y is a dummy variable.
d) When y is any quantity between 0 and 1.
e) Both c) and d) are correct.
2) Why does STATA always omit one of the dummy variables (out of a set of
mutually exclusive dummy variables) when performing a regression?
a) To prevent redundancy among the independent variables.
b) To avoid biases in the slope estimates.
c) To prevent heteroskedasticity.
d) None of the other answers.
3) Suppose you estimate the probability of getting a college degree as a function of
individual characteristics, using a sample of 200 observations. Suppose that ŷ > 0.5 for 120
observations but the data contain 140 individuals with a college degree. Which of
the following is necessarily true?
a) The percentage of correctly predicted observations is 85.7%.
b) The probability of getting a degree is predicted to be less than 50% for 60 individuals.
c) The percentage of correctly predicted observations is at most 90%.
d) None of the above.
4) Consider the following regression linking savings (in $ per year) to income (also in
$ per year) and the interest rate (as a percentage). The regression yields the following:
ln(savings) = 250 + 0.01 income + 0.1 interestrate. Which interpretation is true?
a) Savings increase by 0.1% for every percentage increase in the interest rate.

b) Savings increase by 10% for every percentage point increase in the interest rate.
c) Savings increase by 0.1% for every percentage point increase in the interest rate.
d) Individuals with zero income are always able to save even when the interest rate is
zero.
5) Which of the following is a valid reason to use an interaction term in a
regression?
a) When some of the independent variables are endogenous but some are not.
b) When the effect of an independent variable depends on the value taken by another
independent variable.
c) When the goodness of fit of the current regression is too low.
d) Whenever you have many correlated dummy variables as independent variables.

2 Short Answers
Question 1: In this problem, we study a sample of college basketball games. The
dependent variable is favwin, a dummy variable that takes the value 1 when the favored team
won. Being favored is measured by a variable spread, which corresponds to the so-called
Las Vegas spread. The spread is a measure of advantage (odds of winning) and the variable
ranges from 1 to 39 with a mean of 9.6 in the sample.
a) Consider the Linear Probability Model (LPM), where favwin = β1 + β2 spread.
Results of the OLS regression are given in table 1 from the other file. Test H0: β1 = 0.5 at
the 5% significance level. Interpret the result.
Answer: The t-statistic for this test is (0.576 − 0.5)/0.031 ≈ 2.45, which exceeds the 5%
critical value of about 1.96 and leads to a rejection of the null hypothesis at the 5% level.
Note that the value 0.5 is not included in the 95% confidence interval. This means the
probability of the favored team winning when the spread is zero is not 50% (here it tends to
be a bit more than that).
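
The arithmetic behind this test can be sketched in Python. The estimate 0.576 and the standard error 0.031 are the values quoted in the answer; the two-sided 5% critical value of 1.96 is an assumption based on the large-sample normal approximation, and the function name is purely illustrative.

```python
def t_stat(estimate, hypothesized, std_error):
    """t-statistic for H0: coefficient = hypothesized value."""
    return (estimate - hypothesized) / std_error

# Values quoted in the answer (LPM intercept and its standard error).
t = t_stat(0.576, 0.5, 0.031)

# Assumed critical value: two-sided 5% test, normal approximation.
CRITICAL_5PCT = 1.96
reject_h0 = abs(t) > CRITICAL_5PCT

print(f"t = {t:.2f}, reject H0 at 5%: {reject_h0}")
```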

b) Is spread statistically significant? If so, what is the impact of a one-unit increase in
spread?
Answer: Yes, very much so, even at the 1% level. A one-unit increase in spread increases the
probability of the favored team winning by about 0.019, i.e. about 1.9 percentage points.

c) According to the LPM estimates, compute the estimated probability of a success
when spread = 1, spread = 9.6 and when spread = 39. Comment on your findings.

Answer:

Value of spread    ŷ
1                  0.595
9.6                0.761
39                 1.328

The main issue to point out is the out-of-range prediction for spread = 39. The probability of a
success is predicted to be well above 1, which is impossible. This is a known limitation of the
LPM when considering extreme values of the independent variables. The model predicts that the
probability of success reaches 1 at a spread of about 22.31.
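
These predictions can be reproduced approximately with a short Python sketch. It assumes the rounded coefficients quoted in parts a) and b) (0.576 and 0.019), so the fitted values for spread = 9.6 and 39 come out slightly below the tabulated ones, which presumably rely on the unrounded estimates in table 1. The function names are illustrative.

```python
# Rounded LPM coefficients quoted in the answers to parts a) and b).
INTERCEPT = 0.576
SLOPE = 0.019

def lpm_predict(spread):
    """Fitted probability from the linear probability model."""
    return INTERCEPT + SLOPE * spread

def spread_where_prob_is_one():
    """Value of spread at which the fitted probability reaches 1."""
    return (1.0 - INTERCEPT) / SLOPE

for s in (1, 9.6, 39):
    p = lpm_predict(s)
    flag = "  <- outside [0, 1]" if not 0.0 <= p <= 1.0 else ""
    print(f"spread = {s:>4}: fitted probability = {p:.3f}{flag}")

print(f"fitted probability reaches 1 at spread = {spread_where_prob_is_one():.2f}")
```

Because the fitted line is unbounded, any spread beyond this crossing point yields a predicted probability above 1.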

Question 2:
a) Based on this regression, does it appear that job training, which took place in 1976
and 1977, had a positive effect on real labor earnings in 1978?
Answer: Yes. The estimate is positive and statistically significant at the 5% level. Given that the
dependent variable is measured in thousands of real dollars, the training increases earnings by
$1,794, on average.

b) Now using the first-difference in real labor earnings, cre = re78 − re75, as the
dependent variable, the following regression has been performed: cre = β1 ∗ train + vi
(see table 4). What is the estimated effect of training? Discuss how it compares with the
estimate in part a). Explain carefully.

Answer: It is now a $1,529 increase in real earnings, statistically significant at the 5% level.
That is less than the estimate in part a), which suggests an upward bias there: individuals who
were more concerned with increasing their productivity, and so took the training, may also have
been the more productive ones to begin with.

c) Find the 95% confidence interval for the training effect using the heteroskedasticity-
robust standard error reported in table 4, and describe your findings.

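Answer: See table 4. The interval ranges from 0.12 to 2.93 (in thousands of dollars), so the job
training effect lies between $120 and $2,930. Since the interval excludes zero, the effect is
statistically significant at the 5% level.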
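
As a rough consistency check, the quoted interval can be converted to dollars and inverted to recover the midpoint and the standard error it implies. This is only a sketch: the critical value 1.96 is an assumed normal approximation, the standard error is backed out from the rounded bounds rather than read from table 4, and the function names are illustrative.

```python
Z_95 = 1.96  # assumed large-sample critical value for a 95% interval

def ci_to_dollars(lower, upper, unit=1000):
    """Convert interval bounds from thousands of dollars to dollars."""
    return round(lower * unit), round(upper * unit)

def implied_estimate_and_se(lower, upper, z=Z_95):
    """Back out the midpoint and standard error implied by a symmetric CI."""
    midpoint = (lower + upper) / 2
    std_error = (upper - lower) / (2 * z)
    return midpoint, std_error

lower, upper = 0.12, 2.93  # bounds quoted in the answer
print(ci_to_dollars(lower, upper))

midpoint, se = implied_estimate_and_se(lower, upper)
print(f"midpoint = {midpoint:.3f}, implied robust SE = {se:.3f}")
```

The midpoint lands close to the point estimate of 1.529 from part b), as expected for a symmetric interval built from rounded bounds.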

Question 3
Consider the following model:
yit = β1 + β2xit + β3wit + ϵit
where ϵit = ai + uit
As usual, the index it refers to individual i at period t. To add some context, suppose
that y is log(wage), x is experience (measured in years) and w is a dummy for gender (= 1
if male).
a) What does the term ai represent? Explain and provide an example.
Answer: It represents an unobserved individual fixed effect that is constant over time. In this
context, it could capture elements like race or other individual traits. It might also include
location, type of work, etc., depending on the time frame.
b) Suppose you have access to a panel data set and perform the usual (pooled) OLS
regression to estimate the βs. Under which assumption(s) would you expect them to be
BLUE?

Answer: As usual, we need assumptions MLR1-MLR5 to hold for BLUE. In particular, x and w must be
uncorrelated with the composite error term ϵit (both the fixed effect ai and uit).
c) To make better use of the panel data set, someone suggests that you run a first-
difference (FD) regression. Explain why it might be a good idea to do so from an
econometric point of view.
Answer: Because differencing removes the unobserved fixed effect ai, the potential bias that might
come from a non-zero correlation between the independent variables and ai is no longer a concern.
The FD estimator is usually less biased than pooled OLS, if it is biased at all.
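
A small simulation can illustrate the point. This is a sketch on hypothetical data, not the wage example from the question: it assumes a two-period panel with a single regressor x that is correlated with ai, a true slope of 0.5, and made-up variances. All names are illustrative.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def simulate(n=2000, beta=0.5, seed=0):
    """Return (pooled OLS slope, first-difference slope) for a two-period panel."""
    rng = random.Random(seed)
    x_all, y_all, dx, dy = [], [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)                           # unobserved fixed effect a_i
        xs = [a + rng.gauss(0, 1) for _ in range(2)]  # x_it is correlated with a_i
        ys = [1 + beta * x + a + rng.gauss(0, 0.5) for x in xs]
        x_all += xs
        y_all += ys
        dx.append(xs[1] - xs[0])                      # differencing removes a_i
        dy.append(ys[1] - ys[0])
    return ols_slope(x_all, y_all), ols_slope(dx, dy)

pooled, fd = simulate()
print(f"true slope = 0.5, pooled OLS = {pooled:.3f}, first difference = {fd:.3f}")
```

In this setup pooled OLS absorbs part of ai into the slope, while the first-difference slope should land near the true value because ai cancels out of yi2 − yi1.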

Question 4
a) Explain the concept of heteroskedasticity and explain why it is important to account
for it when performing regressions.

Answer: Heteroskedasticity is present when the variance of the error term depends on the values
taken by the independent variables, i.e. when MLR5 is violated. Not accounting for it makes the
usual standard errors incorrect, which could render the statistical inference (t-tests and
F-tests) invalid. That is why using heteroskedasticity-robust standard errors is important.
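
To make the difference concrete, the sketch below compares the conventional standard error of a slope with White's heteroskedasticity-robust (HC0) standard error in a simple regression. The data are simulated under an assumed design in which the error's standard deviation grows with x squared; the seed, sample size and coefficients are hypothetical, and software may apply a finite-sample correction that this sketch omits.

```python
import math
import random

def ols_standard_errors(x, y):
    """Return (slope, conventional SE, HC0 robust SE) for a simple regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Conventional formula: assumes a constant error variance (MLR5).
    sigma2 = sum(u ** 2 for u in resid) / (n - 2)
    se_conventional = math.sqrt(sigma2 / sxx)
    # White (HC0) formula: lets each observation have its own error variance.
    meat = sum(((xi - mx) ** 2) * (u ** 2) for xi, u in zip(x, resid))
    se_robust = math.sqrt(meat) / sxx
    return slope, se_conventional, se_robust

rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(5000)]
# Hypothetical data: the error's standard deviation grows with x squared.
y = [1 + 2 * xi + (xi ** 2) * rng.gauss(0, 1) for xi in x]

slope, se_conv, se_rob = ols_standard_errors(x, y)
print(f"slope = {slope:.3f}, conventional SE = {se_conv:.3f}, robust SE = {se_rob:.3f}")
```

In this design the conventional formula understates the sampling variability of the slope, so t-statistics built on it would look more significant than they should.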
b) What is an interaction variable? Provide an example.
Answer: It is the product of two variables. For example, if the return to education depends on
race, then adding a term Education*Race (possibly one term for each race) would allow the model
to capture that relationship. We covered many examples in class so just mention one of them here.
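
The mechanics can be sketched as follows, assuming a model with a single 0/1 group dummy standing in for one race category. The coefficients are hypothetical and chosen only for illustration; they do not come from any regression in this document.

```python
def return_to_education(b_educ, b_interaction, group):
    """Marginal effect of one more year of education on y.

    Assumed model:
    y = b0 + b_educ*educ + b_group*group + b_interaction*educ*group + u,
    where group is a 0/1 dummy.
    """
    return b_educ + b_interaction * group

# Hypothetical coefficients, for illustration only.
B_EDUC, B_INTERACTION = 0.08, 0.02

for g in (0, 1):
    effect = return_to_education(B_EDUC, B_INTERACTION, g)
    print(f"group = {g}: return to education = {effect:.2f}")
```

The coefficient on the interaction term is the difference in the return to education between the two groups.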
c) Explain what an endogeneity bias is.
Answer: It occurs when an independent variable is endogenous, i.e. when it correlates with the
error term. This violates MLR4 and leads to biased estimates. It is fairly common to encounter
such variables.
