
Econ 3180 Final Exam, April 15th 2013

Ryan Godwin

You may use a calculator. Answer all questions in the answer book provided. The exam is 3
hours long and consists of 300 marks.

A formula sheet, a table of probabilities from the standard Normal distribution, and critical
values from the F-distribution, are provided at the back of the exam booklet.

DO NOT OPEN THE EXAM UNTIL YOU ARE INSTRUCTED TO DO SO.



Easy Question
[28 marks]

1)

Consider a random variable, Y:

𝑌 = 1, with probability 0.5

𝑌 = 2, with probability 0.5

What is the expected value of Y? (What is 𝐸(𝑌)?)

Part A Multiple Choice


[104 marks – 4 marks each]

2) The sample average is

a. a random variable.
b. a constant.
c. don’t pick this.
d. don’t pick this either.

3) Binary variables (dummy variables)

a. are generally used to control for outliers in your sample.


b. can take on more than two values.
c. exclude certain individuals from your sample.
d. can take on only two values.

4) The regression R² is a measure of

a. whether or not X causes Y.


b. the goodness of fit of your regression line.
c. whether or not ESS > TSS.
d. the square of the determinant of R.

5) A type I error is

a. always the same as (1-type II) error.


b. the error you make when rejecting the null hypothesis when it is true.
c. the error you make when rejecting the alternative hypothesis when it is true.
d. always 5%.

6) Degrees of freedom

a. in the context of the sample variance formula means that estimating the mean uses
up some of the information in the data.
b. can correct for omitted variable bias.
c. are (n-2) when replacing the population mean by the sample mean.
d. ensure that sY² = σY².

7) In the simple linear regression model, the regression intercept

a. indicates by how many percent Y increases, given a one percent increase in X.


b. when multiplied with the explanatory variable will give you the predicted Y.
c. indicates by how many units Y increases, given a one unit increase in X.
d. represents the expected value of Y when X is zero.

8) The OLS estimator is unbiased:

a. if least squares assumption #1 holds.


b. if least squares assumption #3 holds.
c. if the total sum of squares is minimized.
d. always.

9) Finding a large value of the p-value (e.g. more than 10%)

a. indicates evidence in favor of the null hypothesis.


b. implies that the t-statistic is less than 1.96.
c. indicates evidence against the null hypothesis.
d. will only happen roughly one in twenty samples.

10) In the multiple regression model, the adjusted R-square, R̄²,

a. cannot be negative.
b. will never be greater than the unadjusted R².
c. equals the square of the correlation coefficient r.
d. cannot decrease when an additional explanatory variable is added.

11) Under imperfect multicollinearity

a. the OLS estimator cannot be computed.


b. two or more of the regressors are highly correlated.
c. the OLS estimator is biased even in samples of n > 100.
d. the error terms are highly, but not perfectly, correlated.

12) When there are omitted variables in the regression, which are determinants of the
dependent variable, then

a. you cannot measure the effect of the omitted variable, but the estimator of your
included variable(s) is (are) unaffected.
b. this has no effect on the estimator of your included variable because the other
variable is not included.
c. this will always bias the OLS estimator of the included variable.
d. the OLS estimator is biased if the omitted variable is correlated with the included
variable.

13) You have to worry about perfect multicollinearity in the multiple regression model
because

a. many economic variables are perfectly correlated.


b. the OLS estimator is no longer consistent.
c. the OLS estimator cannot be computed in this situation.
d. in real life, economic variables change together all the time.

14) In the multiple regression model, the least squares estimator is derived by

a. minimizing the sum of squared prediction mistakes.


b. setting the sum of squared errors equal to zero.
c. minimizing the absolute difference of the residuals.
d. forcing the smallest distance between the actual and fitted values.

15) The OLS residuals in the multiple regression model

a. cannot be calculated because there is more than one explanatory variable.


b. can be calculated by subtracting the fitted values from the actual values.
c. are zero because the predicted values are another name for forecasted values.
d. are typically the same as the population regression function errors.

16) One of the least squares assumptions in the multiple regression model is that you have
random variables which are “i.i.d.” This stands for

a. initially indeterminate differences.


b. irregularly integrated dichotomies.
c. identically initiated deltas.
d. independently and identically distributed.

17) Omitted variable bias

a. will always be present as long as the regression R² < 1.
b. is always there but is negligible in almost all economic examples.
c. exists if the omitted variable is correlated with the included regressor but is not a
determinant of the dependent variable.
d. exists if the omitted variable is correlated with the included regressor and is a
determinant of the dependent variable.

18) In multiple regression, the R² increases whenever a regressor is

a. added unless the estimated coefficient on the added regressor is exactly zero.
b. added.
c. added unless there is heteroskedasticity.
d. greater than 1.96 in absolute value.

19) In the multiple regression model, the t-statistic for testing that the slope is significantly
different from zero is calculated

a. by dividing the estimate by its standard error.


b. from the square root of the F-statistic.
c. by multiplying the p-value by 1.96.
d. using the adjusted R2 and the confidence interval.

20) Let R²unrestricted and R²restricted be 0.4366 and 0.4149, respectively. The difference between the
unrestricted and the restricted model is that you have imposed two restrictions. There are
420 observations. The F-statistic in this case is

a. 4.61.
b. 8.01.
c. 10.34.
d. 7.71.

21) A 95% confidence set for two or more coefficients is a set that contains

a. the sample values of these coefficients in 95% of randomly drawn samples.


b. integer values only.
c. the same values as the 95% confidence intervals constructed for the coefficients.
d. the population values of these coefficients in 95% of randomly drawn samples.

22) When there are two coefficients, the resulting confidence sets are

a. rectangles.
b. ellipses.
c. squares.
d. trapezoids.

23) All of the following are true, with the exception of one condition:

a. a high R² or R̄² does not mean that the regressors are a true cause of the
dependent variable.
b. a high R² or R̄² does not mean that there is no omitted variable bias.
c. a high R² or R̄² always means that an added variable is statistically significant.
d. a high R² or R̄² does not necessarily mean that you have the most appropriate set
of regressors.

24) If the estimates of the coefficients of interest change substantially across specifications,

a. then this can be expected from sample variation.


b. then you should change the scale of the variables to make the changes appear to
be smaller.
c. then this often provides evidence that the original specification had omitted
variable bias.
d. then choose the specification for which your coefficient of interest is most
significant.

25) A nonlinear function

a. makes little sense, because variables in the real world are related linearly.
b. can be adequately described by a straight line between the dependent variable and
one of the explanatory variables.
c. is a concept that only applies to the case of a single or two explanatory variables
since you cannot draw a line in four dimensions.
d. is a function with a slope that is not constant.

26) An example of a quadratic regression model is

a. Yi = β0 + β1X + β2Y² + ui.
b. Yi = β0 + β1ln(X) + ui.
c. Yi = β0 + β1X + β2X² + ui.
d. Yi² = β0 + β1X + ui.

27) In the model Yi = β0 + β1X1 + β2X2 + β3(X1 × X2) + ui, the expected effect ΔY/ΔX1 is

a. β1 + β3X2.
b. β1.
c. β1 + β3.
d. β1 + β3X1.

Part B Short Answer


[80 marks – 8 marks each]

28) Consider a sample of three observations collected from the random variable, Y:

𝑌 = {2,4,6}

Estimate the variance of Y.

29) Briefly explain how the OLS estimator β̂1, in a single regressor model, is derived.

30) Consider the following estimated regression line: TestScore = 698.9 − 2.28 × STR, where
STR is a variable that describes the student-teacher ratio in a classroom. What is the predicted
test score in a classroom of size 30?

31) What problems arise when there is heteroskedasticity?

32) Briefly explain what it means for an estimator to be unbiased and consistent.

33) Suppose that a researcher, using wage data on 180 randomly selected workers with a
university education and 200 workers without a university education, estimates the OLS
regression,

Wage = 12.34 + 6.52 × UNI,  R² = 0.28
       (1.45)  (4.22)

where Wage is measured in $/hour and UNI is a "dummy" variable that is equal to 1 if the
person has a university education and 0 if the person does not have a university education.

Conduct a formal hypothesis test to determine whether or not obtaining a university education
will affect hourly wage.

34) Consider the population regression model:

wage = β0 + β1age + β2male + u

where male is a dummy variable that takes on the value “1” if the individual is male, and “0” if
the individual is female. Consider the variable female, which takes on the value “1” if the
individual is female and “0” if the individual is male. What is the problem with adding female to
the model?

35) Why should you use the adjusted R-square (R̄²) instead of the unadjusted R-square (R²) in the
multiple regression model?

36) Suppose you have estimated a model with multiple regressors, and two of them are
individually statistically insignificant (based on the t-statistics). Can you test the joint hypothesis
that both coefficients are equal to zero using t-tests? Why or why not?

37) Provide an example of a non-linear relationship between two variables. Why is it important
to try to capture non-linear effects in our regressions?

38) Consider the formula:

F = [(R²unrestricted − R²restricted) / q] / [(1 − R²unrestricted) / (n − k − 1)]

Describe intuitively why a large value for “F” indicates that we should reject the null hypothesis.

Part C Long Answer


[88 marks total – 8 marks for each part]

39) This question uses the same CollegeDistance data that was used in assignment #3 and #4.

These data are taken from the High School and Beyond survey conducted by the Department of
Education in 1980, with a follow-up in 1986. The survey included students from approximately
1100 high schools.

Series in Data Set

Name Description
ed Years of Education Completed (See below)
female 1 = Female/0 = Male
black 1 = Black/0 = Not-Black
hispanic 1 = Hispanic/0 = Not-Hispanic
bytest Base Year Composite Test Score. (These are achievement tests given to high
school seniors in the sample)
dadcoll 1 = Father is a College Graduate/ 0 = Father is not a College Graduate
momcoll 1 = Mother is a College Graduate/ 0 = Mother is not a College Graduate
incomehi 1 = Family Income > $25,000 per year/ 0 = Income ≤ $25,000 per year.
ownhome 1= Family Owns Home / 0 = Family Does not Own Home
cue80 County Unemployment rate in 1980
stwmfg80 State Hourly Wage in Manufacturing in 1980
dist Distance from 4yr College in 10's of miles
tuition Avg. State 4yr College Tuition in $1000's

Years of Education: Rouse (the author) computed years of education by assigning 12 years to all
members of the senior class. Each additional year of secondary education counted as one year.
Students with vocational degrees were assigned 13 years, AA degrees were assigned 14 years,
BA degrees were assigned 16 years, those with some graduate education were assigned 17 years,
and those with a graduate degree were assigned 18 years.
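Rouse's coding rule can be summarized as a lookup table. In this sketch the category labels are paraphrases of the description above; only the assigned years come from the text:

```python
# Sketch of the years-of-education coding rule described above.
# Category labels are paraphrased; the assigned years are from the description.
YEARS_OF_ED = {
    "high school senior": 12,
    "vocational degree": 13,
    "AA degree": 14,
    "BA degree": 16,
    "some graduate education": 17,
    "graduate degree": 18,
}
```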

Below is a table of estimated models which you should use for parts (a) – (k). The dependent
variable in each model is ed; standard errors are in parentheses.

Regressor              (1)        (2)        (3)        (4)        (5)
dist                -0.037**   -0.081**   -0.081**   -0.110**   -0.113**
                    (0.013)    (0.026)    (0.256)    (0.029)    (0.025)
dist2                           0.005*     0.005*     0.007**    0.007**
                                (0.002)    (0.002)    (0.003)    (0.002)
tuition             -0.191     -0.193     -0.194     -0.210*    -0.290**
                    (0.101)    (0.101)    (0.101)    (0.101)    (0.097)
female               0.143**    0.143**    0.141**    0.142**    0.133**
                    (0.050)    (0.050)    (0.050)    (0.050)    (0.051)
black                0.351**    0.334**    0.331**    0.333**
                    (0.071)    (0.072)    (0.072)    (0.072)
hispanic             0.362**    0.333**    0.330**    0.323**
                    (0.077)    (0.079)    (0.079)    (0.079)
bytest               0.093**    0.093**    0.093**    0.093**    0.087**
                    (0.003)    (0.003)    (0.003)    (0.003)    (0.003)
incomehi             0.372**    0.370**    0.362**    0.217*     0.334**
                    (0.061)    (0.061)    (0.061)    (0.091)    (0.061)
ownhome              0.139*     0.143*     0.141*     0.144*     0.099
                    (0.067)    (0.067)    (0.067)    (0.067)    (0.067)
dadcoll              0.571**    0.561**    0.654**    0.663**    0.642**
                    (0.074)    (0.074)    (0.084)    (0.084)    (0.085)
momcoll              0.378**    0.378**    0.569**    0.568**    0.591**
                    (0.082)    (0.081)    (0.117)    (0.117)    (0.118)
dadcoll × momcoll                         -0.367*    -0.356*    -0.389*
                                          (0.161)    (0.162)    (0.162)
cue80                0.029**    0.026**    0.026**    0.026**    0.030**
                    (0.010)    (0.010)    (0.010)    (0.010)    (0.010)
stwmfg80            -0.043*    -0.043*    -0.042*    -0.042*    -0.052**
                    (0.020)    (0.020)    (0.020)    (0.020)    (0.020)
incomehi × dist                                       0.124
                                                     (0.064)
incomehi × dist2                                     -0.009
                                                     (0.007)
intercept            8.921**    9.012**    9.002**    9.042**    9.627**
                    (0.252)    (0.256)    (0.256)    (0.256)    (0.229)

F-stat. (overall)    124.8      115.6      107.8      94.73      122.4
                    (0.000)    (0.000)    (0.000)    (0.000)    (0.000)
R²                   0.2836     0.2844     0.2854     0.2863     0.2796
R̄²                   0.2813     0.2819     0.2827     0.2832     0.2774

Significance at the *5% and **1% significance level.

(a) Suppose we are only concerned with the effect of distance on education. Why bother including all the
other variables?

(b) Using model (2), if dist increases from 4 to 5, how are years of education expected to change?

(c) What does the interaction term dadcoll × momcoll measure? Why has the inclusion of dadcoll ×
momcoll caused the estimated coefficients on dadcoll and momcoll to change from model (2) to model
(3)?

(d) Do you think that dist has a non-linear or a linear effect on education?

(e) Using model (4), what is the difference between the average number of years of education obtained by
women, versus the average number of years of education obtained by men?

(f) Does ethnicity affect the number of years of education obtained?

(g) What does the interaction term, incomehi × dist, measure? Why include this in the regression?

(h) Why does the R² increase when going from model (3) to model (4), but the R̄² decrease?

(i) State the null and alternative hypotheses associated with the “F-stat. (overall)”.

(j) Suppose you added another variable to the above regression. The p-value associated with the t-statistic
on the estimated coefficient is 0.03. How many stars (*) would you put next to the estimated coefficient in
order to make it compatible with the information in the above table?

(k) Using model (2), construct a 95% confidence interval for the coefficient on cue80. Interpret this interval.

END.

Econ 3180 - Final Formula Sheet

expected value of Y (mean of Y):  μY

variance of Y:  σY² = E[(Y − μY)²] = E(Y²) − (μY)²

standard deviation of Y:  σY = √(σY²)

covariance between X and Y:  σXY = E[(X − μX)(Y − μY)]

correlation coefficient (between X and Y):  ρXY = σXY / (σX σY)

expected value of the sample average, Ȳ:  E(Ȳ) = μY

variance of the sample average, Ȳ:  σȲ² = σY² / n

t-statistic for testing μY (for large n, and when σY is known):  t = (Ȳ − μY,0) / σȲ ~ N(0,1)

sample variance (estimator for σY²):  sY² = [1/(n − 1)] Σ(Yi − Ȳ)²

sample covariance (estimator for covariance):  sXY = [1/(n − 1)] Σ(Xi − X̄)(Yi − Ȳ)

sample correlation (estimator for correlation):  rXY = sXY / (sX sY)

standard error of Ȳ (estimator for the standard deviation of Ȳ):  sȲ = sY / √n

t-statistic for testing μY (for large n, and when σY is unknown):  t = (Ȳ − μY,0) / sȲ ~ N(0,1)

95% confidence interval for μY (for large n):  conf. int. = Ȳ ± 1.96 × sȲ
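The mean-inference formulas above can be checked numerically. The sketch below uses a small made-up sample and the null value μY,0 = 0, both illustrative rather than taken from the exam:

```python
import math

# Hypothetical sample (illustrative only)
Y = [2.0, 4.0, 6.0, 8.0, 10.0]
n = len(Y)

Ybar = sum(Y) / n                                    # sample average
s2_Y = sum((y - Ybar) ** 2 for y in Y) / (n - 1)     # sample variance, divisor n - 1
se_Ybar = math.sqrt(s2_Y) / math.sqrt(n)             # standard error of the sample average
t = (Ybar - 0.0) / se_Ybar                           # t-statistic for H0: mu_Y = 0
ci = (Ybar - 1.96 * se_Ybar, Ybar + 1.96 * se_Ybar)  # 95% confidence interval for mu_Y
```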
population linear regression model with one regressor:  Yi = β0 + β1Xi + ui,  i = 1, …, n

OLS estimator of the slope (β1):  β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

OLS estimator of the intercept (β0):  β̂0 = Ȳ − β̂1X̄

OLS predicted values:  Ŷi = β̂0 + β̂1Xi

OLS residuals:  ûi = Yi − Ŷi
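A minimal numeric sketch of the OLS formulas above (slope, intercept, predicted values, residuals), using hypothetical data:

```python
# Hypothetical data (illustrative only)
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.1, 3.9, 6.2, 7.8]
n = len(X)

Xbar = sum(X) / n
Ybar = sum(Y) / n

# OLS slope: sum of cross-deviations over the sum of squared deviations of X
b1 = sum((x - Xbar) * (y - Ybar) for x, y in zip(X, Y)) / sum((x - Xbar) ** 2 for x in X)
b0 = Ybar - b1 * Xbar                              # OLS intercept
fitted = [b0 + b1 * x for x in X]                  # OLS predicted values
resid = [y - yhat for y, yhat in zip(Y, fitted)]   # OLS residuals
```

A useful check of the algebra: when an intercept is included, the residuals sum to zero.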

explained sum of squares 𝐸𝑆𝑆 = 𝑌 −𝑌

total sum of squares 𝑇𝑆𝑆 = (𝑌 − 𝑌)

sum of squared residuals 𝑆𝑆𝑅 = 𝑢

𝐸𝑆𝑆
regression 𝑅 𝑅 =
𝑇𝑆𝑆
1
standard error of regression × 𝑆𝑆𝑅
𝑛−2

L.S.A. #1 𝐸(𝑢|𝑋 = 𝑥) = 0

L.S.A. #2 (𝑋 , 𝑌 ), 𝑖 = 1, … , 𝑛, are i.i.d.

L.S.A. #3 Large outliers are rare.


𝑣𝑎𝑟[(𝑋 − 𝜇 )𝑢 ]
The sampling distribution of 𝛽 (for large n) 𝛽 ~𝑁 𝛽 ,
𝑛𝜎
𝛽 −𝛽 ,
t-statistic for testing 𝛽 𝑡=
𝑆𝐸 𝛽
95% confidence interval for 𝛽 (for large n) 𝑐𝑜𝑛𝑓. 𝑖𝑛𝑡. = 𝛽 ± 1.96 × 𝑆𝐸 𝛽
𝑆𝑆𝑅
alternative regression 𝑅 𝑅 =1−
𝑇𝑆𝑆
𝑆𝑆𝑅 𝑛−1
adjusted R-square (𝑅 ) 𝑅 = 1−
𝑇𝑆𝑆 𝑛 − 𝑘 − 1
(𝑆𝑆𝑅 − 𝑆𝑆𝑅 )/𝑞
F-statistic 𝐹=
𝑆𝑆𝑅 /(𝑛 − 𝑘 − 1)
(𝑅 − 𝑅 )/𝑞
F-statistic 𝐹=
(1 − 𝑅 )/(𝑛 − 𝑘 − 1)
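The R²-form F-statistic above can be written as a small function. The numbers in the example (restrictions, sample size, regressor count, and both R² values) are hypothetical:

```python
def f_stat(r2_u, r2_r, q, n, k):
    """Homoskedasticity-only F-statistic, R-squared form.

    r2_u: unrestricted R-squared; r2_r: restricted R-squared;
    q: number of restrictions; n: observations;
    k: regressors in the unrestricted model.
    """
    return ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k - 1))

# Hypothetical example: 2 restrictions, 100 observations, 5 regressors
F = f_stat(0.40, 0.35, q=2, n=100, k=5)
```

Note how F grows with the gap between the two R² values: the more the fit deteriorates when the restrictions are imposed, the larger F becomes.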
