Econ7020X FinalReview (Answers)
Explain which Gauss-Markov assumption each of these models break and give a potential
solution.
1. The Cobb-Douglas production function is a classic example of a production function
often used in macroeconomics. It is commonly expressed as:
$Y = A \cdot K^{\alpha} \cdot L^{\beta}$
Y is the total production (output),
K is the amount of capital input,
L is the amount of labor input,
A represents the total factor productivity, and
α and β are the output elasticities of capital and labor, respectively.
Violates the linearity assumption: the parameters α and β appear as exponents, so the model is
not linear in its parameters.
Potential solution: take the natural log of both sides, which gives a form that is linear in the
parameters:
$\ln(Y) = \ln(A K^{\alpha} L^{\beta}) = \ln(A) + \ln(K^{\alpha}) + \ln(L^{\beta}) = \ln(A) + \alpha \ln(K) + \beta \ln(L)$
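The log transformation can be checked on simulated data. The sketch below is only an illustration: the "true" values of A, α, and β, the sample size, and the multiplicative error term are all assumptions made up for the example, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical "true" parameters, chosen only for this illustration.
A_true, alpha_true, beta_true = 2.0, 0.3, 0.7

K = rng.uniform(1.0, 100.0, n)  # capital input
L = rng.uniform(1.0, 100.0, n)  # labor input

# Assumed multiplicative error, so it becomes additive after taking logs.
Y = A_true * K**alpha_true * L**beta_true * np.exp(rng.normal(0.0, 0.05, n))

# OLS on ln(Y) = ln(A) + alpha * ln(K) + beta * ln(L) + error
X = np.column_stack([np.ones(n), np.log(K), np.log(L)])
coef, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)
ln_A_hat, alpha_hat, beta_hat = coef

print("A:", np.exp(ln_A_hat), "alpha:", alpha_hat, "beta:", beta_hat)
```

The intercept of the log regression estimates ln(A), so it has to be exponentiated to get back to A.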
2. If we are constructing a model of precipitation and the variables we include in our model
are:
Relative humidity in percentage
Cloud coverage in percentage
Temperature in degrees Celsius
Temperature in degrees Fahrenheit
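This violates the no perfect multicollinearity assumption: temperature in Fahrenheit is an exact
linear function of temperature in Celsius ($T_F = 32 + 1.8\,T_C$), so in the model
$R = \beta_0 + \beta_1 H + \beta_2 L + \beta_3 T_C + \beta_4 T_F + \varepsilon$
the coefficients $\beta_3$ and $\beta_4$ cannot be estimated separately.
Potential solution: drop one of the two temperature variables.
$R = \beta_0 + \beta_1 H + \beta_2 L + \beta_3 T_C + \varepsilon$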
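Because Fahrenheit is an exact linear function of Celsius, including both makes the design matrix rank deficient. A minimal sketch of that check follows; the six observations are hypothetical numbers invented for the example, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical observations, made up purely for illustration.
humidity = np.array([60.0, 75.0, 80.0, 55.0, 90.0, 70.0])  # H, percent
cloud = np.array([40.0, 65.0, 90.0, 20.0, 95.0, 50.0])     # L, percent
temp_c = np.array([18.0, 22.0, 15.0, 30.0, 12.0, 25.0])    # T_C
temp_f = 32.0 + 1.8 * temp_c                                # T_F, exact function of T_C

ones = np.ones_like(temp_c)
X_full = np.column_stack([ones, humidity, cloud, temp_c, temp_f])  # 5 columns
X_reduced = np.column_stack([ones, humidity, cloud, temp_c])       # 4 columns

rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)

print("columns:", X_full.shape[1], "rank:", rank_full)
print("columns:", X_reduced.shape[1], "rank:", rank_reduced)
```

When the rank is smaller than the number of columns, X'X is singular and OLS has no unique solution. Dropping one temperature variable restores full column rank.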
3. A simple model of housing prices (Y) based on square footage (X). Consider the error
terms.
This likely violates homoskedasticity: as the square footage of the house increases, the error
terms are likely to become more dispersed.
Potential solution: a logarithmic transformation (for example, modeling the log of price).
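The following sketch illustrates why a log transformation can help. It assumes a hypothetical data generating process with a multiplicative error (price = 150 × square footage × exp(u)). That process is an assumption for the example, not something established about real housing data. The code compares the residual spread for large and small houses, first in levels and then in logs.

```python
import math
import random
import statistics

random.seed(0)
n = 2000

# Hypothetical DGP: multiplicative error, so dispersion grows with size.
sqft = [random.uniform(500.0, 5000.0) for _ in range(n)]
price = [150.0 * s * math.exp(random.gauss(0.0, 0.2)) for s in sqft]


def ols_residuals(x, y):
    """Residuals from a simple OLS regression of y on x with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def spread_ratio(x, resid):
    """Residual SD for above-median x divided by residual SD for the rest."""
    cutoff = statistics.median(x)
    small = [r for a, r in zip(x, resid) if a <= cutoff]
    large = [r for a, r in zip(x, resid) if a > cutoff]
    return statistics.pstdev(large) / statistics.pstdev(small)


ratio_levels = spread_ratio(sqft, ols_residuals(sqft, price))

log_sqft = [math.log(s) for s in sqft]
log_price = [math.log(p) for p in price]
ratio_logs = spread_ratio(log_sqft, ols_residuals(log_sqft, log_price))

print("spread ratio in levels:", ratio_levels)
print("spread ratio in logs:", ratio_logs)
```

A ratio well above 1 in levels signals heteroskedasticity. Under this assumed process the ratio in logs is close to 1.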
4. Suppose we run a study examining the effect of education on income, where the
researcher fails to account for unobserved factors such as natural ability or motivation. If
these unobserved factors are correlated with both education and income, what assumption
is violated?
Strict exogeneity, because the unobserved factors are causing correlation between the regressor
education and the error term.
$\text{Income} = \beta_0 + \beta_1\,\text{education} + \varepsilon$
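A small simulation can show how the omitted factor biases the estimate. Everything in the sketch below is hypothetical: ability is given a made-up effect on both education and income, and the true coefficient on education is set to 1. Under these assumed numbers, the simple regression slope converges to 1 + 3·2/5 = 2.2 rather than 1.

```python
import random

random.seed(1)
n = 5000
true_beta1 = 1.0  # hypothetical true effect of education on income

# Hypothetical DGP: unobserved ability raises both education and income.
ability = [random.gauss(0.0, 1.0) for _ in range(n)]
education = [12.0 + 2.0 * a + random.gauss(0.0, 1.0) for a in ability]
income = [
    10.0 + true_beta1 * e + 3.0 * a + random.gauss(0.0, 1.0)
    for e, a in zip(education, ability)
]


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    return cov_xy / var_x


# Ability is left out, so it sits in the error term and is correlated
# with education: the slope absorbs part of the effect of ability.
beta1_hat = ols_slope(education, income)
print("true beta1:", true_beta1, "estimated beta1:", beta1_hat)
```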
Explain what a dummy variable is, when they are useful, and what the interpretation of the effect
of a dummy variable is.
E.g. for a dummy variable, D, what is the interpretation of gamma?
$Y = \beta_0 + \beta_1 X_1 + D_1 \gamma_1 + \varepsilon$
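A dummy variable is a binary variable that takes a value of either zero or one. It is typically
useful for denoting membership in a category or the presence of a characteristic.
Examples: Is_female, Is_black, Is_approved, Attended_college
Education via dummy variables:
Less than high school, high school diploma, some college, associate's degree, bachelor's, master's,
doctoral, professional
$Y = \beta_0 + D_1 \gamma_1 + \varepsilon$
For example, with Y = Attended_college and D_1 = Is_female, gamma is the marginal effect of being
a woman on the expected value of attending college. In other words, it is the difference in the
expected value of Y between the group with $D_1 = 1$ and the base group with $D_1 = 0$:
$\gamma_1 = E[Y \mid D_1 = 1] - E[Y \mid D_1 = 0]$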
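In a regression with only an intercept and one dummy, the OLS estimate of gamma equals the difference in group means, and the intercept equals the mean of the base group. The sketch below checks this on eight hypothetical observations invented for the example.

```python
# Hypothetical data, made up for illustration only.
is_female = [1, 0, 1, 1, 0, 0, 1, 0]          # D_1
attended_college = [1, 0, 1, 0, 1, 0, 1, 0]   # Y

n = len(is_female)
mean_d = sum(is_female) / n
mean_y = sum(attended_college) / n

# Simple OLS of Y on D_1 with an intercept.
cov_dy = sum((d - mean_d) * (y - mean_y) for d, y in zip(is_female, attended_college))
var_d = sum((d - mean_d) ** 2 for d in is_female)
gamma_hat = cov_dy / var_d
beta0_hat = mean_y - gamma_hat * mean_d

# Group means for comparison.
y_women = [y for d, y in zip(is_female, attended_college) if d == 1]
y_base = [y for d, y in zip(is_female, attended_college) if d == 0]
mean_women = sum(y_women) / len(y_women)
mean_base = sum(y_base) / len(y_base)

print("beta0_hat:", beta0_hat, "base-group mean:", mean_base)
print("gamma_hat:", gamma_hat, "difference in means:", mean_women - mean_base)
```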
[Figure: chart with vertical axis from 0 to 160 and horizontal axis from 1995 to 2035]
Your colleague approaches you about a fancy new trick they learned in Microsoft Excel that
allows them to forecast data using a polynomial regression. Excitedly, they explain that by
making a graph of a column in their spreadsheet, clicking "add trendline", and then "format
trendline", they can turn that inaccurate linear forecast into a really closely fitting
'polynomial 4' regression.
Explain to your colleague what a polynomial 4 regression is and why it might not be a good idea
to use it for this dataset, which you happen to know grows exponentially. Why does this forecast
look so good? Specifically, you know the true data generating process is
ROUND(0.003*EXP(A2)+A2*2+RANDBETWEEN(0,5), 0), where the A column contains an
index from 1 to 10.
Index Value
1 6
2 4
3 6
4 10
5 15
6 14
7 17
8 26
9 44
10 86
[Figure: "Polynomial Regressions", value (vertical axis, 0 to 100) against index (horizontal axis, 1 to 10)]
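A 'polynomial 4' regression fits a fourth-degree polynomial in the index:
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 x^4 + \varepsilon$
This is a classic example of overfitting. With five parameters and only ten observations, the
curve is flexible enough to follow sampling variation in the data very closely, even though a
polynomial is not the true (exponential) relationship. Also, R^2 is a measure of goodness of
fit, not a measure of statistical significance.
Anscombe's Quartet: descriptive statistics are not everything. Do not rely only on statistics
to justify the model.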
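The overfitting point can be illustrated with the ten observations from the table. The sketch below fits a straight line and a fourth-degree polynomial with NumPy (assumed to be available, standing in for Excel's trendline), then compares the quartic forecast with the systematic part of the stated data generating process beyond the sample. The helper name `dgp_mean` and the forecast range 11 to 15 are choices made for this example.

```python
import numpy as np

index = np.arange(1, 11)
value = np.array([6, 4, 6, 10, 15, 14, 17, 26, 44, 86], dtype=float)


def r_squared(y, fitted):
    """Share of the variation in y explained by the fitted values."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


linear = np.polyfit(index, value, deg=1)
quartic = np.polyfit(index, value, deg=4)

r2_linear = r_squared(value, np.polyval(linear, index))
r2_quartic = r_squared(value, np.polyval(quartic, index))
print("R^2 linear:", r2_linear, "R^2 quartic:", r2_quartic)


def dgp_mean(x):
    """Systematic part of the stated DGP, ignoring the random term and rounding."""
    return 0.003 * np.exp(x) + 2 * x


# Out-of-sample comparison: the exponential term eventually dominates
# any fixed-degree polynomial.
future = np.arange(11, 16)
quartic_forecast = np.polyval(quartic, future)
dgp_forecast = dgp_mean(future)
for x, q, d in zip(future, quartic_forecast, dgp_forecast):
    print(x, round(float(q), 1), round(float(d), 1))
```

A higher in-sample R^2 for the quartic is guaranteed, because the linear model is nested inside it. It says nothing about how well the curve forecasts outside the sample.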
a. What is an instrumental variable?
b. What are the two conditions for a valid instrumental variable, and how might we assess the
validity of an instrument?
b. Suppose you are presented with regression results for the following probit model,
and your colleague comments that the logit and probit model results cannot be
right: there is a 3.5% higher chance of working for the government given a one-year
increase in schooling, but the probit model says it should be a 27% increase
and the logit model says over 55%. How would you respond to this comment?
For context, schooling refers to years of schooling, and the dependent variable is the
probability of working in a government job.
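One way to frame a response is that probit and logit coefficients are not changes in probability. The marginal effect of schooling is the coefficient multiplied by the density of the latent index, so it depends on where it is evaluated. The sketch below treats 0.27 and 0.55 as the probit and logit coefficients. That reading of the colleague's quote is an assumption, since the regression table is not reproduced here, and the index values are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density, the scaling factor for probit marginal effects."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def logistic_pdf(z):
    """Logistic density, the scaling factor for logit marginal effects."""
    p = 1.0 / (1.0 + math.exp(-z))
    return p * (1.0 - p)


# Assumed coefficients on schooling, read off the colleague's comment.
beta_probit = 0.27
beta_logit = 0.55


def probit_marginal_effect(index):
    return normal_pdf(index) * beta_probit


def logit_marginal_effect(index):
    return logistic_pdf(index) * beta_logit


# Hypothetical values of the latent index at which to evaluate the effect.
for index in (0.0, -1.0, -2.0):
    print(index, probit_marginal_effect(index), logit_marginal_effect(index))
```

Both densities peak at an index of zero (about 0.399 for the normal and 0.25 for the logistic), so the marginal effect is always well below the raw coefficient and shrinks further away from zero.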
Question 1, single variable statistics (10 pts):
Let $Y_1, Y_2, \ldots, Y_n$ be i.i.d. draws from a distribution with mean $\mu$. A test of
$H_0: \mu \geq 10$ versus $H_A: \mu < 10$
using the usual t-statistic yields a p-value of 0.03.
a. Can we reject the null hypothesis at the 5% significance level ($\alpha = 0.05$)? Explain.
b. How about at the 1% significance level ($\alpha = 0.01$)? Explain.
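The decision rule behind both parts is to reject the null hypothesis when the p-value is below the chosen significance level. A minimal sketch, with `reject_null` as a hypothetical helper name:

```python
def reject_null(p_value, alpha):
    """Reject H0 when the p-value is smaller than the significance level."""
    return p_value < alpha


p_value = 0.03  # reported in the question

reject_at_5_percent = reject_null(p_value, 0.05)
reject_at_1_percent = reject_null(p_value, 0.01)

print("reject at 5%:", reject_at_5_percent)
print("reject at 1%:", reject_at_1_percent)
```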
Mark Questions
11. Which of the following best describes the term "endogeneity" in regression analysis?
a) The independent variable is caused by the dependent variable.
b) The variance of the error term changes over observations.
c) The dependent variable is caused by the independent variable.
d) The error term is correlated with one or more independent variables.
12. Which of the following statistics is commonly used to detect the presence of
autocorrelation in the residuals of a time-series regression?
a) F-statistic
b) T-statistic
c) Durbin-Watson statistic
d) Jarque-Bera statistic
13. In the context of OLS regression, which assumption, if violated, can lead to
heteroskedasticity?
a) No endogeneity
b) Linearity in parameters
c) Constant variance of errors
d) No multicollinearity
14. What is the primary purpose of adding dummy variables in a regression model?
a) To account for autocorrelation
b) To represent categorical variables
c) To handle missing data
d) To improve the R-squared value
15. In a simple linear regression, if all data points lie perfectly on a straight line, what will be
the value of R-squared?
a) 0
b) 0.5
c) 0.75
d) 1
Regression Results
Which of the following coefficients are statistically significant results?
Interpret the base case model.
Data Results
Without reading the full paper, tell me everything you can about this research design and
findings, just with this data visualization.