Problem Set 3

This document contains a problem set with multiple questions regarding multiple linear regression analysis. It includes questions about omitted variable bias, estimating regression coefficients, and making predictions using regression models. Students are asked to consider regressions using different variables and datasets to analyze effects, interpret coefficients, and check for potential omitted variable bias.

Uploaded by

Luca Vanz
Copyright
© All Rights Reserved

Universidad Carlos III de Madrid

ME-MIEM
Econometrics
Multiple Linear Regression. Estimation II
Problem Set 3

1. $(Y_i, X_{1i}, X_{2i})$ satisfy the assumptions of the multiple regression model RLM.1-RLM.4. You are interested in $\beta_1$,
the causal effect of $X_1$ on $Y$. Assume that $X_1$ and $X_2$ are uncorrelated. $\beta_1$ is estimated by regressing $Y$ on $X_1$
(so that $X_2$ is not included in the regression). Does this estimator suffer from omitted variable bias? Explain.
2. $(Y_i, X_{1i}, X_{2i})$ satisfy the assumptions of the multiple regression model RLM.1-RLM.4. Furthermore,
$\mathrm{Var}(U_i \mid X_{1i}, X_{2i}) = 4$ and $\mathrm{Var}(X_{1i}) = 6$. A random sample of size $n = 400$ is drawn from the population.

(a) Suppose that $X_1$ and $X_2$ are uncorrelated. Calculate the variance of $\hat{\beta}_1$.
(b) Suppose that $\mathrm{Corr}(X_1, X_2) = 0.5$. Calculate the variance of $\hat{\beta}_1$.
(c) Comment on the following statement: "If $X_1$ and $X_2$ are correlated, the variance of $\hat{\beta}_1$ is bigger than it
would be if $X_1$ and $X_2$ were uncorrelated. Therefore, if we are interested in $\beta_1$, it is better to leave $X_2$ out
of the regression if it is correlated with $X_1$."
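As a rough check on the calculations in parts (a) and (b), the sketch below evaluates the standard large-sample variance of the OLS slope in a two-regressor model with homoskedastic errors, $\mathrm{Var}(\hat{\beta}_1) \approx \frac{1}{n} \cdot \frac{1}{1 - \rho^2} \cdot \frac{\sigma_U^2}{\sigma_{X_1}^2}$, where $\rho$ is the correlation between $X_1$ and $X_2$. The function name `var_beta1_hat` is an illustrative assumption, not part of the problem set.

```python
def var_beta1_hat(n, var_u, var_x1, corr_x1_x2):
    """Large-sample variance of the OLS slope on X1 in a two-regressor
    model with homoskedastic errors (hypothetical helper for illustration)."""
    return (1.0 / n) * (1.0 / (1.0 - corr_x1_x2 ** 2)) * (var_u / var_x1)


# Values taken from the problem statement: n = 400, Var(U | X1, X2) = 4, Var(X1) = 6.
var_uncorrelated = var_beta1_hat(n=400, var_u=4.0, var_x1=6.0, corr_x1_x2=0.0)
var_correlated = var_beta1_hat(n=400, var_u=4.0, var_x1=6.0, corr_x1_x2=0.5)

print(var_uncorrelated)                   # variance when Corr(X1, X2) = 0
print(var_correlated)                     # variance when Corr(X1, X2) = 0.5
print(var_correlated / var_uncorrelated)  # inflation factor 1 / (1 - 0.5**2)
```

The ratio in the last line depends only on the correlation. This is the variance cost that statement (c) refers to, and it has to be weighed against the bias from dropping $X_2$.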

3. A school district runs an experiment to estimate the effect of class size on test scores in the second-year
exams. The district allocates 50% of its first-year students from the previous year to small second-year
classes (18 students per class) and 50% to classes of normal size (21 students per class). New students in the
district are treated differently: 20% are randomly assigned to small classes and 80% to normal-size classes. At
the end of the second year, each student takes a standardized test. Let $Y_i$ be the
grade obtained by the $i$th student, let $X_{1i}$ be a binary variable equal to 1 if the student is assigned to a small class,
and let $X_{2i}$ be a binary variable equal to 1 if the student is new to the district. Let $\beta_1$ be the causal effect on
test scores of reducing class size from normal to small.

(a) Consider the regression $Y_i = \beta_0 + \beta_1 X_{1i} + U_i$. Do you think that $E(U_i \mid X_{1i}) = 0$? Is the OLS estimator
of $\beta_1$ unbiased and consistent? Explain.
(b) Consider the regression $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i$. Do you think that $E(U_i \mid X_{1i}, X_{2i})$ depends on
$X_{1i}$? Explain. Do you think that $E(U_i \mid X_{1i}, X_{2i})$ depends on $X_{2i}$? Explain. Will the OLS estimator of $\beta_2$
provide an unbiased and consistent estimate of the causal effect of the change to the new school (that
is, of being a new student)? Explain.

4. Using the data set CollegeDistance carry out the following exercises.

(a) Run a regression of years of completed education (ED) on distance to the nearest college (Dist). What is
the estimated slope?
(b) Run a regression of ED on Dist, but include some additional regressors to control for characteristics of
the student, the student's family and the local labor market. In particular, include as additional regressors
the variables Bytest, Female, Black, Hispanic, Incomehi, Ownhome, DadColl, Cue80, and Stwmfg80.
What is the estimated effect of Dist on ED?
(c) Is the estimated effect of Dist on ED in the regression in (b) substantively different from the regression
in (a)? Based on this, does the regression in (a) seem to suffer from important omitted variable bias?
(d) Compare the fit of the regressions in (a) and (b) using the regression standard errors, $R^2$ and $\bar{R}^2$. Why are
the $R^2$ and $\bar{R}^2$ so similar in regression (b)?
(e) The value of the coefficient on DadColl is positive. What does this coefficient measure?
(f) Explain why Cue80 and Stwmfg80 appear in the regression. Are the signs of their estimated coefficients
(+ or -) what you would have believed? Interpret the magnitudes of these coefficients.
(g) Bob is a black male. His high school was 20 miles from the nearest college. His test score (Bytest) was 58.
His family income in 1980 was $26,000 and his family owned a home. His mother attended college, but
his father did not. The unemployment rate in his county was 7.5%, and the state average manufacturing
hourly wage was $9.74. Predict Bob's years of completed schooling using the regression in
(b).
(h) Jim has the same characteristics as Bob except that his high school was 40 miles from the nearest college.
Predict Jim's years of completed schooling using the regression in (b).
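Parts (g) and (h) differ only in distance, so the gap between the two predictions is the Dist coefficient times the change in Dist. The sketch below illustrates this, assuming Dist is recorded in tens of miles (as in the standard CollegeDistance data) and using the rounded slope of -0.032 reported in the ANSWERS section. The helper `prediction_gap` is hypothetical, and the rounded slope gives a slightly different figure than an unrounded coefficient would.

```python
def prediction_gap(slope_dist, miles_a, miles_b, miles_per_unit=10.0):
    """Difference in predicted ED (person b minus person a) when the two
    people differ only in distance to the nearest college.

    miles_per_unit=10.0 encodes the assumption that Dist is measured in
    tens of miles.
    """
    return slope_dist * (miles_b - miles_a) / miles_per_unit


# Bob lives 20 miles from the nearest college, Jim 40 miles.
gap_jim_minus_bob = prediction_gap(slope_dist=-0.032, miles_a=20.0, miles_b=40.0)
print(round(gap_jim_minus_bob, 3))  # -0.064
```

Because the model is linear, none of the other regressors matter for this difference: they are identical for Bob and Jim and cancel out.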

5. A researcher plans to study the causal effect of police on crime using data from a random sample of US
counties. A regression of the county's crime rate on the size (per capita) of the county police force is proposed.

(a) Explain why this regression is likely to suffer from omitted variable bias. What variables would you add to
the regression to control for the important omitted variables?
(b) Use your answer in (a) and the expression for the omitted variable bias to determine whether the regression is
likely to over- or underestimate the effect of the police on the crime rate (i.e., do you think that $\hat{\beta}_1 > \beta_1$
or that $\hat{\beta}_1 < \beta_1$?)
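The omitted variable bias expression mentioned in (b) says that, when $X_2$ is left out, the OLS slope on $X_1$ converges to $\beta_1 + \beta_2 \, \mathrm{Cov}(X_1, X_2) / \mathrm{Var}(X_1)$. The simulation below is a minimal sketch of that result on synthetic data. The coefficient values, the normal distributions, and the function name are all assumptions chosen for illustration and have nothing to do with actual crime data.

```python
import math
import random


def short_regression_slope(rho, beta1=1.0, beta2=2.0, n=100_000, seed=0):
    """OLS slope from regressing Y on X1 alone when the true model is
    Y = beta1*X1 + beta2*X2 + U and Corr(X1, X2) = rho (synthetic data)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Build X2 with unit variance and correlation rho with X1.
    x2 = [rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for a in x1]
    y = [beta1 * a + beta2 * b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]
    mean_x1 = sum(x1) / n
    mean_y = sum(y) / n
    cov_x1_y = sum((a - mean_x1) * (c - mean_y) for a, c in zip(x1, y))
    var_x1 = sum((a - mean_x1) ** 2 for a in x1)
    return cov_x1_y / var_x1


slope_uncorrelated = short_regression_slope(rho=0.0)  # close to beta1 = 1
slope_correlated = short_regression_slope(rho=0.5)    # close to 1 + 2 * 0.5 = 2
print(slope_uncorrelated, slope_correlated)
```

With uncorrelated regressors the short regression recovers $\beta_1$. With correlation 0.5 the slope is pushed toward $\beta_1 + \beta_2 \cdot 0.5$, so the sign of the bias is the sign of $\beta_2$ times the sign of the correlation.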

6. This problem deals with the difference between a linear and a causal relation, and with misspecification bias. Given
two variables $Y$ and $X$, we know that

$$E(Y \mid X) = \alpha_0 + \alpha_1 \log X,$$

where $\alpha_0$ and $\alpha_1$ are two unknown parameters. We know that $\alpha_1 \neq 0$. However, we estimate the following
model by OLS:
$$Y = \beta_0 + \beta_1 X + \varepsilon, \qquad (1)$$
where $\beta_0$ and $\beta_1$ are unknown parameters, and we know that the error term $\varepsilon$ satisfies $E(\varepsilon) = E(\varepsilon X) = 0$.

(a) Establish the relation between $\beta_1$ and $\alpha_1$.
(b) Establish the relation between the OLS estimator of $\beta_1$ in model (1) and the OLS estimator of $\alpha_1$ in the
model
$$Y = \alpha_0 + \alpha_1 \log X + U,$$
where $U$ is an error term.
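One possible route for part (a) is sketched below. The labels are assumptions made here for readability: $\alpha_0, \alpha_1$ denote the parameters of the conditional mean in $\log X$, and $\beta_0, \beta_1$ those of model (1). The sketch also assumes that $X > 0$, that the relevant moments exist, and that $\mathrm{Var}(X) > 0$.

```latex
\begin{align*}
\beta_1 &= \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
  && \text{(implied by } E(\varepsilon) = E(\varepsilon X) = 0\text{)} \\
\operatorname{Cov}(X, Y) &= \operatorname{Cov}\bigl(X, E(Y \mid X)\bigr)
  = \alpha_1 \operatorname{Cov}(X, \log X)
  && \text{(law of iterated expectations)} \\
\beta_1 &= \alpha_1 \, \frac{\operatorname{Cov}(X, \log X)}{\operatorname{Var}(X)}
\end{align*}
```

Under these assumptions $\beta_1$ is the slope of the best linear approximation to the conditional mean. Because $\log$ is increasing, $\operatorname{Cov}(X, \log X) > 0$ for non-degenerate $X$, so $\beta_1$ has the same sign as $\alpha_1$ but in general a different magnitude.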

ANSWERS:
4. (a) -0.073
(b) -0.032
(c) The coefficient has fallen by more than 50%. Thus, it seems that the result in (a) did suffer from omitted
variable bias.
(d) The regression in (b) fits the data much better, as indicated by the $R^2$, $\bar{R}^2$, and SER. The $R^2$ and $\bar{R}^2$ are
similar because the number of observations is large ($n = 3796$).
(e) Students with a “dadcoll = 1” (so that the student’s father went to college) complete 0.696 more years of
education, on average, than students with “dadcoll = 0” (so that the student’s father did not go to college).
(f) These terms capture the opportunity cost of attending college. As Stwmfg80, the 1980 state hourly wage
in manufacturing, increases, forgone wages increase, so that, on average, college attendance declines. The
negative sign on the coefficient is consistent with this. As Cue80, the county unemployment rate, increases, it
is more difficult to find a job, which lowers the opportunity cost of attending college, so that college attendance
increases. The positive sign on the coefficient is consistent with this.
(g) Bob's predicted years of education are 14.75.
(h) Jim's predicted years of education are 0.0630 lower than Bob's. Thus, Jim's predicted years of education are
14.69.
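The point made in answer (d) can be checked numerically with the usual adjusted $R^2$ formula, $\bar{R}^2 = 1 - \frac{n - 1}{n - k - 1}(1 - R^2)$. The sketch below uses $n = 3796$ from the answer and $k = 10$ regressors (Dist plus the nine controls listed in 4(b)). The $R^2$ value of 0.25 is a hypothetical placeholder, not the regression's actual $R^2$.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a regression with n observations and
    k regressors (excluding the intercept)."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)


n, k = 3796, 10                   # sample size from answer (d); regressors in 4(b)
penalty = (n - 1) / (n - k - 1)   # degrees-of-freedom correction factor
r2_example = 0.25                 # hypothetical R^2, for illustration only
gap = r2_example - adjusted_r2(r2_example, n, k)

print(penalty)  # barely above 1 because n is large relative to k
print(gap)      # difference between R^2 and adjusted R^2
```

With a correction factor this close to 1, the two fit measures can differ only in the third decimal place, whatever the actual $R^2$ is.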
