Econ 3180 Final Exam, April 15th 2013 Ryan Godwin
You may use a calculator. Answer all questions in the answer book provided. The exam is 3
hours long and consists of 300 marks.
A formula sheet, a table of probabilities from the standard Normal distribution, and critical
values from the F-distribution, are provided at the back of the exam booklet.
Easy Question
[28 marks]
1)
a. a random variable.
b. a constant.
c. don’t pick this.
d. don’t pick this either.
5) A type I error is
6) Degrees of freedom
a. in the context of the sample variance formula means that estimating the mean uses
up some of the information in the data.
b. can correct for omitted variable bias.
c. are (n-2) when replacing the population mean by the sample mean.
d. ensure that $s_Y^2 = \sigma_Y^2$.
a. cannot be negative.
b. will never be greater than the unadjusted R2.
c. equals the square of the correlation coefficient r.
d. cannot decrease when an additional explanatory variable is added.
12) When there are omitted variables in the regression, which are determinants of the
dependent variable, then
a. you cannot measure the effect of the omitted variable, but the estimator of your
included variable(s) is (are) unaffected.
b. this has no effect on the estimator of your included variable because the other
variable is not included.
c. this will always bias the OLS estimator of the included variable.
d. the OLS estimator is biased if the omitted variable is correlated with the included
variable.
13) You have to worry about perfect multicollinearity in the multiple regression model
because
14) In the multiple regression model, the least squares estimator is derived by
16) One of the least squares assumptions in the multiple regression model is that you have
random variables which are “i.i.d.” This stands for
a. added unless the estimated coefficient on the added regressor is exactly zero.
b. added.
c. added unless there is heteroskedasticity.
d. greater than 1.96 in absolute value.
19) In the multiple regression model, the t-statistic for testing that the slope is significantly
different from zero is calculated
20) Let $R^2_{unrestricted}$ and $R^2_{restricted}$ be 0.4366 and 0.4149 respectively. The difference between the
unrestricted and the restricted model is that you have imposed two restrictions. There are
420 observations. The F-statistic in this case is
a. 4.61.
b. 8.01.
c. 10.34.
d. 7.71.
21) A 95% confidence set for two or more coefficients is a set that contains
22) When there are two coefficients, the resulting confidence sets are
a. rectangles.
b. ellipses.
c. squares.
d. trapezoids.
23) All of the following are true, with the exception of one condition:
a. a high $R^2$ or $\bar{R}^2$ does not mean that the regressors are a true cause of the
dependent variable.
b. a high $R^2$ or $\bar{R}^2$ does not mean that there is no omitted variable bias.
c. a high $R^2$ or $\bar{R}^2$ always means that an added variable is statistically significant.
d. a high $R^2$ or $\bar{R}^2$ does not necessarily mean that you have the most appropriate set
of regressors.
24) If the estimates of the coefficients of interest change substantially across specifications,
a. makes little sense, because variables in the real world are related linearly.
b. can be adequately described by a straight line between the dependent variable and
one of the explanatory variables.
c. is a concept that only applies to the case of a single or two explanatory variables
since you cannot draw a line in four dimensions.
d. is a function with a slope that is not constant.
a. $Y_i = \beta_0 + \beta_1 X + \beta_2 Y^2 + u_i$.
b. $Y_i = \beta_0 + \beta_1 \ln(X) + u_i$.
c. $Y_i = \beta_0 + \beta_1 X + \beta_2 X^2 + u_i$.
d. $Y_i^2 = \beta_0 + \beta_1 X + u_i$.
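27) In the model $Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2) + u_i$, the expected effect $\Delta Y / \Delta X_1$ is
a. $\beta_1 + \beta_3 X_2$.
b. $\beta_1$.
c. $\beta_1 + \beta_3$.
d. $\beta_1 + \beta_3 X_1$.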
28) Consider a sample of three observations collected from the random variable, Y:
𝑌 = {2,4,6}
29) Briefly explain how the OLS estimator $\hat{\beta}$, in a single regressor model, is derived.
30) Consider the following estimated regression line: $\widehat{TestScore} = 698.9 - 2.28 \times STR$, where
STR is a variable that describes the student-teacher ratio in a classroom. What is the predicted
test score in a classroom of size 30?
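The arithmetic behind a prediction from an estimated regression line can be sketched in a few lines of Python. This is only an illustration: the function name `predict_test_score` is hypothetical, and it assumes that "a classroom of size 30" means STR = 30.

```python
def predict_test_score(str_value, intercept=698.9, slope=-2.28):
    """Return the fitted value intercept + slope * STR.

    The default coefficients are the estimates quoted in question 30.
    """
    return intercept + slope * str_value


# Assumption: a classroom of size 30 corresponds to STR = 30.
predicted = predict_test_score(30)
print(f"Predicted test score at STR = 30: {predicted:.1f}")
```

The same function can be evaluated at other student-teacher ratios to see how the slope of -2.28 changes the prediction for each one-unit change in STR.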
32) Briefly explain what it means for an estimator to be unbiased and consistent.
33) Suppose that a researcher, using wage data on 180 randomly selected workers with a
university education and 200 workers without a university education, estimates the OLS regression,
where Wage is measured in $/hour and UNI is a “dummy” variable that is equal to 1 if the
person has a university education and 0 if the person does not have a university education.
Conduct a formal hypothesis test to determine whether or not obtaining a university education
will affect hourly wage.
where male is a dummy variable that takes on the value “1” if the individual is male, and “0” if
the individual is female. Consider the variable female, which takes on the value “1” if the
individual is female and “0” if the individual is male. What is the problem with adding female to
the model?
35) Why should you use adjusted R-square ($\bar{R}^2$) instead of the unadjusted R-square ($R^2$) in the
multiple regression model?
36) Suppose you have estimated a model with multiple regressors, and two of them are
individually statistically insignificant (based on the t-statistics). Can you test the joint hypothesis
that both coefficients are equal to zero using t-tests? Why or why not?
37) Provide an example of a non-linear relationship between two variables. Why is it important
to try to capture non-linear effects in our regressions?
$F = \dfrac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k - 1)}$
Describe intuitively why a large value for “F” indicates that we should reject the null hypothesis.
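The F-statistic formula above can be evaluated numerically with a minimal sketch like the one below. The function name `homoskedastic_f` is hypothetical. The inputs reuse the numbers from question 20 (R-squared values of 0.4366 and 0.4149, q = 2, n = 420). The value k = 3 is an assumption made only for illustration, since the exam does not state the number of regressors in the unrestricted model.

```python
def homoskedastic_f(r2_unrestricted, r2_restricted, q, n, k):
    """F-statistic from the R-squared form of the formula.

    q is the number of restrictions, n the sample size, and k the
    number of regressors in the unrestricted model.
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / (n - k - 1)
    return numerator / denominator


# Assumption: k = 3 is NOT given in the exam; it is chosen for illustration.
f_stat = homoskedastic_f(0.4366, 0.4149, q=2, n=420, k=3)
print(f"F = {f_stat:.2f}")

# A larger drop in R-squared from imposing the restrictions gives a larger F.
f_bigger_gap = homoskedastic_f(0.4366, 0.3000, q=2, n=420, k=3)
print(f"F with a larger R-squared gap = {f_bigger_gap:.2f}")
```

Holding everything else fixed, the statistic grows with the gap between the two R-squared values, which is the quantity the numerator measures.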
39) This question uses the same CollegeDistance data that was used in assignments #3 and #4.
These data are taken from the High School and Beyond survey conducted by the Department of
Education in 1980, with a follow-up in 1986. The survey included students from approximately
1100 high schools.
Name       Description
ed         Years of Education Completed (See below)
female     1 = Female / 0 = Male
black      1 = Black / 0 = Not-Black
hispanic   1 = Hispanic / 0 = Not-Hispanic
bytest     Base Year Composite Test Score. (These are achievement tests given to high
           school seniors in the sample)
dadcoll    1 = Father is a College Graduate / 0 = Father is not a College Graduate
momcoll    1 = Mother is a College Graduate / 0 = Mother is not a College Graduate
incomehi   1 = Family Income > $25,000 per year / 0 = Income ≤ $25,000 per year
ownhome    1 = Family Owns Home / 0 = Family Does not Own Home
cue80      County Unemployment Rate in 1980
stwmfg80   State Hourly Wage in Manufacturing in 1980
dist       Distance from 4yr College in 10's of miles
tuition    Avg. State 4yr College Tuition in $1000's
Years of Education: Rouse (the author) computed years of education by assigning 12 years to all
members of the senior class. Each additional year of secondary education counted as one year.
Students with vocational degrees were assigned 13 years, AA degrees were assigned 14 years,
BA degrees were assigned 16 years, those with some graduate education were assigned 17 years,
and those with a graduate degree were assigned 18 years.
Below is a table of estimated models which you should use for parts (a) – (k)
(a) Suppose we are only concerned with the effect of distance on education. Why bother including all the
other variables?
(b) Using model (2), if dist increases from 4 to 5, how are years of education expected to change?
(c) What is the interaction term dadcoll × momcoll measuring? Why has the inclusion of dadcoll ×
momcoll caused the estimated coefficients on dadcoll and momcoll to change from model (2) to model
(3)?
(d) Do you think that dist has a non-linear or a linear effect on education?
(e) Using model (4), what is the difference between the average number of years of education obtained by
women, versus the average number of years of education obtained by men?
(g) What does the interaction term, incomehi × dist, measure? Why include this in the regression?
(h) Why does the $R^2$ increase when going from model (3) to model (4), but the $\bar{R}^2$ decreases?
(i) State the null and alternative hypotheses associated with the “F-stat. (overall)”.
(j) Suppose you added another variable to the above regression. The p-value associated with the t-statistic
on the estimated coefficient is 0.03. How many stars (*) would you put next to the estimated coefficient in
order to make it compatible with the information in the above table?
(k) Using model (2), construct a 95% confidence interval for the coefficient on cue80. Interpret this interval.
END.
standard deviation of Y        $\sigma_Y = \sqrt{\sigma_Y^2}$
OLS residuals                  $\hat{u}_i = Y_i - \hat{Y}_i$
regression $R^2$                 $R^2 = \dfrac{ESS}{TSS}$
standard error of regression   $SER = \sqrt{\dfrac{1}{n-2} \times SSR}$
L.S.A. #1 𝐸(𝑢|𝑋 = 𝑥) = 0
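
As an informal illustration of how the formula-sheet quantities fit together, the sketch below fits a single-regressor OLS line and then computes the residuals, the regression R-squared as ESS/TSS, and the standard error of the regression as the square root of SSR/(n - 2). The data values and the function name `ols_summary` are made up for this example and are not part of the exam.

```python
import math


def ols_summary(x, y):
    """Fit Y = b0 + b1*X by OLS and return the formula-sheet quantities."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Slope and intercept from the usual single-regressor OLS formulas.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]  # u_hat = Y - Y_hat

    ess = sum((fi - y_bar) ** 2 for fi in fitted)  # explained sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)       # total sum of squares
    ssr = sum(r ** 2 for r in residuals)           # sum of squared residuals

    return {
        "b0": b0,
        "b1": b1,
        "r2": ess / tss,
        "ser": math.sqrt(ssr / (n - 2)),
        "ess": ess,
        "tss": tss,
        "ssr": ssr,
    }


# Hypothetical data, chosen only to keep the arithmetic small.
result = ols_summary([1, 2, 3, 4], [2, 4, 5, 8])
print(result)
```

With an intercept in the regression, ESS + SSR = TSS, so the same R-squared can also be computed as 1 - SSR/TSS.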