
ECON 6001 Assignment1 2023

1. This document provides instructions for an assignment in applied econometrics. It includes 20 multiple choice questions covering topics like probability, hypothesis testing, and simple and multiple linear regression. It also includes a review probability question involving a joint probability distribution table. 2. The assignment is due on September 16th, 2023. Students are instructed to show their work and answer the questions independently while being allowed to discuss with classmates and consult references. 3. The multiple choice questions cover concepts like the mean and variance of random variables, probability, hypothesis testing, p-values, correlation, regression coefficients, residuals, and the assumptions and properties of simple and multiple linear regression models.

Uploaded by

雷佳璇

Applied Econometrics, 2023

Assignment 1
DUE DATE: September 16th, 2023

Instructions. Do your best to make your arguments rigorous. You may discuss this problem
set with your classmates and consult any books or notes, but write out the answers on your own
using your own words and show your derivation so that your understanding is transparent from
the answers.

Multiple Choice

1. The mean and variance of a Bernoulli random variable are given as:
A) np and np(1 − p).
B) p and p(1 − p).
C) p and √(p(1 − p)).
D) None of above.

2. The probability that event A or event B occurs equals:

A) Pr(A) / Pr(B).
B) Pr(A) × Pr(B).
C) Pr(A) + Pr(B) if A and B are not mutually exclusive.
D) Pr(A) + Pr(B) if A and B are mutually exclusive.

3. Assume that you assign the following subjective probabilities for your final grade in your
econometrics course (the standard GPA scale of 4 = A to 0 = F applies):

Grade Probability
A 0.40
B 0.35
C 0.15
D 0.10
F 0.00

The expected value is:


A) 3.0.
B) 3.5.
C) 2.78.
D) 3.05.
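
As a reminder of the mechanics behind question 3, the expected value of a discrete random variable is the probability-weighted sum of its outcomes. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the grade points follow the 4 = A to 0 = F scale stated in the question.

```python
# Grade points on the 4 = A ... 0 = F scale, paired with the subjective
# probabilities from the table in question 3.
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
probabilities = {"A": 0.40, "B": 0.35, "C": 0.15, "D": 0.10, "F": 0.00}


def expected_value(values, probs):
    """Return the sum of value * probability over all outcomes."""
    # A valid probability distribution must sum to one.
    assert abs(sum(probs.values()) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(values[k] * probs[k] for k in values)


expected_gpa = expected_value(grade_points, probabilities)
print(round(expected_gpa, 2))
```

The same function applies to any finite distribution, provided both dictionaries share the same keys.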

4. When the sample size is large, the law of large numbers and the central limit theorem are
useful. Which of the statements related to the law of large numbers and/or the central limit
theorem is true?

A) The central limit theorem says that when the sample size is large, Ȳ will be close to µY
with very high probability.
B) The law of large numbers states conditions under which a variable involving the sum of
Y1, ..., Yn i.i.d. variables becomes the standard normal distribution.
C) The central limit theorem states conditions under which a variable involving the sum of
Y1, ..., Yn i.i.d. variables becomes the standard normal distribution.
D) The central limit theorem only holds in the presence of the law of large numbers.

5. Standardization puts different variables on the same scale, which allows you to compare
scores between different types of variables. To standardize a random variable, you:
A) Multiply the variable with its standard deviation.
B) Subtract the mean and divide by the standard deviation.
C) Multiply the variable with its variance.
D) Subtract the mean and divide by the variance.

6. An estimate and an estimator are different and should not be misused. They are respectively:
A) efficient if it has the largest variance possible; efficient if it has the smallest variance
possible.
B) unbiased if its expected value equals the population value; a formula that gives an efficient
guess of the true population value.
C) a nonrandom number; a random variable.
D) a random variable; a nonrandom number.

7. To compare different estimators, we introduce unbiasedness, consistency, and efficiency, which
are the desirable characteristics of an estimator. Let µY denote the mean value of Y in a
population. Which of the following statements is correct?
A) Suppose µ̂Y and µ̃Y are unbiased estimators. We say that µ̂Y is more efficient than µ̃Y if
µ̂Y has a larger variance than µ̃Y.
B) We say that µ̂Y is a consistent estimator of µY if the probability that µ̂Y is within a
small interval of the true value µY approaches 0 as the sample size increases.
C) If E(µ̂Y) equals µY, then µ̂Y is an unbiased estimator of µY.
D) None of above.

8. Type I and type II errors are often discussed in hypothesis testing. Which of
the following statements about type I error and/or type II error is correct?
A) Type I error is typically larger than the type II error.
B) Type I error is always the same as (1-type II) error.
C) Type I error is the error you make when rejecting the null hypothesis when it is true.
Type II error is the error you make when not rejecting the null hypothesis when it is false.
D) The power of a test equals one minus the probability of committing a type I error.

9. In null-hypothesis (H0) significance testing, the p-value is a frequently cited probability.
Which of the following statements about the p-value is correct?
A) A large p-value implies that the observed value is inconsistent with H0.
B) It is the probability of drawing a statistic at least as adverse to the null hypothesis as the
one you actually computed in your sample, assuming the null hypothesis is correct.
C) The critical value of a two-sided t-test computed from a large sample is the same as the
p-value.
D) A large p-value implies rejection of H0.

10. The correlation coefficient is an important measure of the statistical relationship between two
variables. Which of the following statements about the correlation coefficient is correct?
A) The correlation coefficient lies between -1 and 1.
B) A low correlation coefficient implies that the two variables are unrelated.
C) The correlation coefficient is a measure of nonlinear association.
D) A scatterplot relates the covariance of X and Y to the correlation coefficient.

11. Which of the following statements is correct?

A) TSS ≤ ESS.
B) R² = 1 − ESS/TSS.
C) ESS = SSR + TSS.
D) TSS = ESS + SSR.

12. In the simple linear regression model, the regression slope:


A) indicates by how many units Y increases, given a one percent increase in X.
B) when multiplied with the explanatory variable will give you the predicted Y .
C) indicates by how many percent Y increases, given a one unit increase in X.
D) represents the elasticity of Y on X.

13. The OLS residuals, ûi, are defined as:


A) Ŷi − β̂0 − β̂1 Xi .
B) Yi − Ŷi .
C) (Yi − Ŷi )2 .
D) Yi − Ȳi .
E) (Yi − Ȳi )2 .

14. The only difference between a one- and a two-sided hypothesis test is:
A) the sign of the slope coefficient.
B) dependent on the sample size n.
C) how you interpret the t-statistic.
D) the null hypothesis.

15. In the presence of heteroskedasticity, and assuming that the usual least squares assumptions
hold, the OLS estimator is:
A) efficient.
B) BLUE.
C) unbiased but not consistent.
D) consistent but not unbiased.
E) unbiased and consistent.

16. In the multiple regression model, the adjusted R², R̄²:

A) equals the square of the correlation coefficient r.
B) cannot decrease when an additional explanatory variable is added.
C) will never be greater than the regression R².
D) cannot be negative.

17. The reason for including control variables in multiple regressions is to:
A) increase the regression R².
B) reduce imperfect multicollinearity.
C) make the variables of interest no longer correlated with the error term, once the control
variables are held constant.
D) reduce heteroskedasticity in the error term.

18. Heteroskedasticity means that:


A) homogeneity cannot be assumed automatically for the model.
B) the observed units have different preferences.
C) the variance of the error term is not constant.
D) agents are not all rational.

19. Which of the following statements related to the multiple regression model is true?
A) The least squares estimator in the multiple regression model is derived by setting the sum
of squared errors equal to zero.
B) The OLS residuals in the multiple regression model can be calculated by subtracting the
fitted values from the actual values.
C) The least squares estimator in the multiple regression model is derived by maximizing the
sum of squared prediction mistakes.
D) The least squares estimator in the multiple regression model is derived by forcing the
smallest distance between the actual and fitted values.

20. Omitted variable bias:


A) will always be present as long as the regression R² < 1.
B) is always there but is negligible in almost all economic examples.
C) exists if the omitted variable is correlated with the included regressor and is a determinant
of the dependent variable.
D) exists if the omitted variable is correlated with the included regressor but is not a
determinant of the dependent variable.

Review of Probability

21. (Stock and Watson 2.6) The following table gives the joint probability distribution between
employment status and college graduation among those either employed or looking for work
(unemployed) in the working-age population of South Africa.

                             Unemployed   Employed   Total
                             (Y = 0)      (Y = 1)
Non-college grads (X = 0)    0.07         0.60       0.67
College grads (X = 1)        0.04         0.29       0.33
Total                        0.11         0.89       1.000

a. Compute E(Y ).
b. The unemployment rate is the fraction of the labor force that is unemployed. Show that
the unemployment rate is given by 1 − E(Y ).
c. Calculate E(Y |X = 1) and E(Y |X = 0).
d. Calculate the unemployment rate for (i) college graduates and (ii) non-college graduates
respectively.
e. A randomly selected member of this population reports being unemployed. What is the
probability that this worker is a college graduate? A non-college graduate?
f. Are educational achievement and employment status independent? Please explain.
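
The quantities requested in question 21 all follow from the joint table by summing or dividing cells. The sketch below shows one way to organize those calculations; the helper names are hypothetical, and the strict numerical tolerance used in the independence check is an assumption made for illustration.

```python
# Joint probabilities Pr(X = x, Y = y) copied from the table in question 21.
# X = 1 for college graduates, Y = 1 for employed.
joint = {
    (0, 0): 0.07, (0, 1): 0.60,
    (1, 0): 0.04, (1, 1): 0.29,
}


def marginal_x(x):
    """Pr(X = x): sum the joint distribution over Y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(y):
    """Pr(Y = y): sum the joint distribution over X."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


def cond_mean_y_given_x(x):
    """E(Y | X = x) = sum over y of y * Pr(X = x, Y = y) / Pr(X = x)."""
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / marginal_x(x)


def prob_x_given_y(x, y):
    """Pr(X = x | Y = y) from the definition of conditional probability."""
    return joint[(x, y)] / marginal_y(y)


def is_independent(tol=1e-9):
    """Independence requires every cell to equal the product of its marginals."""
    return all(
        abs(p - marginal_x(x) * marginal_y(y)) < tol
        for (x, y), p in joint.items()
    )


# E(Y) is the probability-weighted sum of Y over all four cells.
mean_y = sum(y * p for (_, y), p in joint.items())
```

Because Y is binary, E(Y | X = x) is the employment rate within group x, so one minus that value is the group's unemployment rate.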

Review of Statistics

22. (Stock and Watson 3.15) $Y_a$ and $Y_b$ are Bernoulli random variables from two different
populations, denoted a and b. Suppose $E(Y_a) = p_a$ and $E(Y_b) = p_b$. A random sample of size $n_a$
is chosen from population a, with a sample average denoted $\hat{p}_a$, and a random sample of size
$n_b$ is chosen from population b, with a sample average denoted $\hat{p}_b$. Suppose the sample from
population a is independent of the sample from population b.
a. Show that $E(\hat{p}_a) = p_a$ and $\mathrm{var}(\hat{p}_a) = p_a(1 - p_a)/n_a$.
Show that $E(\hat{p}_b) = p_b$ and $\mathrm{var}(\hat{p}_b) = p_b(1 - p_b)/n_b$.
b. Show that $\mathrm{var}(\hat{p}_a - \hat{p}_b) = p_a(1 - p_a)/n_a + p_b(1 - p_b)/n_b$. (Hint: Samples are independent.)
c. Suppose $n_a$ and $n_b$ are large. Show that a 95% confidence interval for $p_a - p_b$ is given by
$(\hat{p}_a - \hat{p}_b) \pm 1.96 \sqrt{\frac{\hat{p}_a(1 - \hat{p}_a)}{n_a} + \frac{\hat{p}_b(1 - \hat{p}_b)}{n_b}}$.
d. Suppose $n_a$ and $n_b$ are large. From part (c), how would you construct 90% and 99%
confidence intervals for $p_a - p_b$?
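
The interval in part (c) can be evaluated numerically once sample proportions are in hand. The sketch below is a hedged illustration: the function name and the sample values are hypothetical (no data accompany this question), and the critical value is taken from the standard normal distribution, which assumes both samples are large.

```python
import math
from statistics import NormalDist


def diff_in_proportions_ci(p_hat_a, n_a, p_hat_b, n_b, level=0.95):
    """Large-sample confidence interval for p_a - p_b from independent samples."""
    # Standard error: square root of the sum of the two estimated variances.
    se = math.sqrt(p_hat_a * (1 - p_hat_a) / n_a + p_hat_b * (1 - p_hat_b) / n_b)
    # Two-sided critical value, e.g. about 1.96 when level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_hat_a - p_hat_b
    return diff - z * se, diff + z * se


# Hypothetical sample proportions and sizes, for illustration only.
for level in (0.90, 0.95, 0.99):
    low, high = diff_in_proportions_ci(0.55, 400, 0.50, 500, level=level)
    print(level, round(low, 4), round(high, 4))
```

Changing `level` only changes the critical value; the point estimate and standard error stay the same, which is the idea behind part (d).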

Linear Regression with a Single Regressor

23. (Stock and Watson 4.1) Suppose that a researcher, using data on class size (CS) and average
test scores from 50 third-grade classes, estimates the OLS regression:

$$\widehat{TestScore} = 643 - 4.93 \times CS, \quad R^2 = 0.1, \quad SER = 8.3.$$

a. A classroom has 25 students. What is the regression’s prediction for that classroom’s
average test score?
b. Last year a classroom had 20 students, and this year it has 26 students. What is the
regression’s prediction for the change in the classroom average test score?
c. The sample average class size across the 50 classrooms is 22.8. What is the sample average
of the test scores across the 50 classrooms?
d. What is the sample standard deviation of test scores across the 50 classrooms? (Hint:
You may use SSR and TSS.)

24. (Stock and Watson 4.6) Show that the first least squares assumption, E(ui | Xi) = 0, implies
that E(Yi | Xi) = β0 + β1 Xi when there is only one regressor Xi.

25. (Stock and Watson 5.2) Suppose that a researcher, using wage data on 200 randomly selected
male workers and 240 female workers, estimates the OLS regression

$$\widehat{Wage} = \underset{(0.16)}{10.73} + \underset{(0.58)}{3.56} \times Male, \quad R^2 = 0.08, \quad SER = 3.5,$$

where Wage is measured in dollars per hour and Male is a binary variable that is equal to
1 if the person is a male and 0 if the person is a female. Define the wage gender gap as the
difference in mean earnings between men and women.

a. What is the estimated gender gap?


b. Is the estimated gender gap significantly different from 0? (Compute the p-value for
testing the null hypothesis that there is no gender gap.)
c. Construct a 95% confidence interval for the gender gap.
d. In the sample, what is the mean wage of women? Of men?
e. Another researcher uses these same data but regresses Wage on Female, a variable that
is equal to 1 if the person is female and 0 if the person is male. What are the regression
estimates calculated from this regression? (What are A, B, C, and D?) Please explain.

$$\widehat{Wage} = A + B \times Female, \quad R^2 = C, \quad SER = D.$$
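
Parts (b) and (c) of question 25 rely on the t-statistic, its two-sided p-value, and a confidence interval for a single coefficient. The sketch below shows those mechanics under the large-sample normal approximation; the function name is hypothetical, and the inputs are the coefficient on Male and the standard error reported in parentheses beneath it.

```python
from statistics import NormalDist


def slope_inference(estimate, std_error, null_value=0.0, level=0.95):
    """Return the t-statistic, two-sided p-value, and confidence interval."""
    # t-statistic: distance of the estimate from the null, in standard errors.
    t_stat = (estimate - null_value) / std_error
    # Two-sided p-value using the standard normal CDF (large-sample assumption).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    # Critical value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (estimate - z * std_error, estimate + z * std_error)
    return t_stat, p_value, ci


# Coefficient on Male and its standard error, as reported in question 25.
t_stat, p_value, ci = slope_inference(3.56, 0.58)
print(t_stat, p_value, ci)
```

With 440 observations the normal approximation is a reasonable working assumption; a Student t critical value would give a nearly identical interval.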

Linear Regression with Multiple Regressors

26. (Stock and Watson 6.1 - 6.4) The following table shows the estimated regressions which are
computed using data for 2015 from the Current Population Survey. The data set consists
of information on 7178 full-time, full-year workers. The highest educational achievement
for each worker was either a high school diploma or a bachelor’s degree. The workers’ ages
ranged from 25 to 34 years. The data set also contains information on the region of the
country where the person lived, marital status, and number of children. For the purposes of
these exercises, let
AHE = average hourly earnings
College = binary variable (1 if college, 0 if high school)
Female = binary variable (1 if female, 0 if male)
Age = age (in years)
Northeast = binary variable (1 if Region = Northeast, 0 otherwise)
Midwest = binary variable (1 if Region = Midwest, 0 otherwise)
South = binary variable (1 if Region = South, 0 otherwise)
West = binary variable (1 if Region = West, 0 otherwise)

a. Compute R̄² for each of the regressions.


b. Using the regression results in column (1):
b1. Do workers with college degrees earn more, on average, than workers with only high
school diplomas? How much more?
b2. Do men earn more than women, on average? How much more?
c. Using the regression results in column (2):
c1. Is age an important determinant of earnings? Explain.
c2. Sally is a 30-year-old female college graduate. Jimmy is a 34-year-old male college
graduate. Predict Sally’s and Jimmy’s earnings.
d. Using the regression results in column (3):
d1. Do there appear to be important regional differences?
d2. Why is the regressor West omitted from the regression? What would happen if it
were included?
d3. Juanita is a 28-year-old female college graduate from the South. Jennifer is a 28-year-
old female college graduate from the Midwest. Calculate the expected difference in earnings
between Juanita and Jennifer.

Dependent variable: AHE
Regressor           (1)       (2)       (3)
College (X1)        10.49     10.43     10.38
Female (X2)         -4.55     -4.52     -4.53
Age (X3)                       0.61      0.61
Northeast (X4)                           0.72
Midwest (X5)                            -1.51
South (X6)                              -0.38
Intercept           18.20      0.21      0.62
Summary Statistics
SER                 12.09     11.93     11.87
R²                  0.169     0.187     0.201
R̄²
n                   7178      7178      7178

Table 1: Results of Regressions of Average Hourly Earnings on Sex and Education Binary Variables
and Other Characteristics, Using 2015 Data from the Current Population Survey
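
The blank R̄² row in Table 1 can be filled in from R², n, and the number of regressors k (excluding the intercept) using the standard formula R̄² = 1 − [(n − 1)/(n − k − 1)](1 − R²). The sketch below applies it to each column; the function and variable names are hypothetical, and the values of k are read off the table (2, 3, and 6 regressors).

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a regression with k regressors plus an intercept."""
    # The factor (n - 1) / (n - k - 1) penalizes additional regressors.
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


n = 7178
# Column label -> (R-squared from Table 1, number of regressors k).
columns = {"(1)": (0.169, 2), "(2)": (0.187, 3), "(3)": (0.201, 6)}

adjusted = {name: adjusted_r2(r2, n, k) for name, (r2, k) in columns.items()}
for name, value in adjusted.items():
    print(name, round(value, 3))
```

With n this large relative to k, the adjustment factor is very close to one, so R̄² and R² differ only slightly.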

Programming

27. (Stock and Watson Empirical Exercise 6.1) On the text website [1], you will find the
Birthweight_Smoking data set. Use it to answer the following questions.

(a) Regress Birthweight on Smoker. What is the estimated effect of smoking on birth
weight?
(b) Regress Birthweight on Smoker, Alcohol, and Nprevist.
i. For omitted variable bias to occur, two conditions must be true:
1. The regressor X is correlated with the omitted variable.
2. The omitted variable is a determinant of the dependent variable, Y.

Please use these concepts to explain why the exclusion of Alcohol and Nprevist could
lead to omitted variable bias in the regression estimated in (a).
ii. Is the estimated effect of smoking on birth weight substantially different from the
regression that excludes Alcohol and Nprevist? Does the regression in (a) seem to
suffer from omitted variable bias?
iii. Jane smoked during her pregnancy, did not drink alcohol, and had 8 prenatal care
visits. Use the regression to predict the birth weight of Jane’s child.
iv. Compute R² and R̄². Are they similar? Please explain.
v. How should you interpret the coefficient on Nprevist? Does the coefficient measure
a causal effect of prenatal visits on birth weight? If not, what does it measure?
(c) The Frisch-Waugh theorem states that the OLS coefficient on X1 in the multiple
regression model can be computed by a sequence of shorter regressions in three steps:
1. Regress X1 on X2, X3, ..., Xk, and let X̃1 denote the residuals from this regression;
2. Regress Y on X2, X3, ..., Xk, and let Ỹ denote the residuals from this regression; and
3. Regress Ỹ on X̃1.

[1] http://www.pearsonglobaleditions.com

Estimate the coefficient on Smoker for the multiple regression model in (b), using
the three-step process in the Frisch-Waugh theorem. Verify that the three-step process
yields the same estimated coefficient for Smoker as that obtained in (b).
(d) An alternative way to control for prenatal visits is to use the binary variables Tripre0
through Tripre3. Regress Birthweight on Smoker, Alcohol, Tripre0, Tripre2, and
Tripre3.
i. Why is Tripre1 excluded from the regression? What would happen if you included
it in the regression?
ii. The estimated coefficient on Tripre0 is large and negative. What does this
coefficient measure? Interpret its value.
iii. Interpret the value of the estimated coefficients on Tripre2 and Tripre3.
iv. Does the regression in (d) explain a larger fraction of the variance in birth weight
than the regression in (b)?
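
The three-step Frisch-Waugh procedure described in part (c) can be checked numerically before applying it to the assignment data. The sketch below is a minimal pure-Python illustration on simulated data, not the data set from the text website: the variables `x1`, `x2`, `x3`, and `y`, the coefficients used to generate them, and the helper functions `solve`, `ols`, and `residuals` are all assumptions made for this example.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """OLS coefficients of y on an intercept plus the given regressor columns."""
    design = [[1.0] * len(y)] + columns  # list of columns, intercept first
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    return solve(xtx, xty)  # normal equations (X'X) beta = X'y


def residuals(columns, y):
    """Residuals y - fitted from the regression of y on an intercept and columns."""
    beta = ols(columns, y)
    design = [[1.0] * len(y)] + columns
    fitted = [sum(b * col[i] for b, col in zip(beta, design)) for i in range(len(y))]
    return [yi - fi for yi, fi in zip(y, fitted)]


# Simulated data (hypothetical): x1 is correlated with x2 and x3.
random.seed(0)
n = 500
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.5 * a - 0.3 * b + random.gauss(0, 1) for a, b in zip(x2, x3)]
y = [1.0 + 2.0 * a - 1.0 * b + 0.5 * c + random.gauss(0, 1)
     for a, b, c in zip(x1, x2, x3)]

# Full multiple regression: coefficient on x1 (index 0 is the intercept).
beta_full = ols([x1, x2, x3], y)[1]

# Step 1: residuals of x1 on the other regressors.
x1_tilde = residuals([x2, x3], x1)
# Step 2: residuals of y on the other regressors.
y_tilde = residuals([x2, x3], y)
# Step 3: regress the y residuals on the x1 residuals.
beta_fwl = ols([x1_tilde], y_tilde)[1]

print(beta_full, beta_fwl)
```

The two printed coefficients should agree up to floating-point error. For the assignment itself, the same three steps would be run in a regression package with Smoker in the role of X1 and Alcohol and Nprevist as the other regressors.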
