hw2 2024spring Solution
2. The slope estimator, β1, has a smaller standard error, other things equal, if
a. there is more variation in the explanatory variable, X.
b. there is a large variance of the error term, u.
c. the sample size is smaller.
d. the intercept, β0, is small.
Answer: a
5. Imperfect multicollinearity
a. implies that it will be difficult to estimate precisely one or more of the partial effects using the data at hand
b. violates one of the four Least Squares assumptions in the multiple regression model
c. means that you cannot estimate the effect of at least one of the Xs on Y
d. suggests that a standard spreadsheet program does not have enough power to estimate the multiple regression model
Answer: a
6. Consider the multiple regression model with two regressors X1 and X2, where both variables are determinants of the dependent variable. You first regress Y on X1 only and find no relationship. However, when regressing Y on X1 and X2, the slope coefficient changes by a large amount. This suggests that your first regression suffers from
a. heteroskedasticity
b. perfect multicollinearity
c. omitted variable bias
d. dummy variable trap
Answer: c
7. If you reject a joint null hypothesis using the F-test in a multiple hypothesis setting, then
a. a series of t-tests may or may not give you the same conclusion.
b. the regression is always significant.
c. all of the hypotheses are always simultaneously rejected.
d. the F-statistic must be negative.
Answer: a
8. All of the following linear hypotheses can be tested using the F-test with the exception of
a. β2 = 1 and β3 = β4/β5
b. β2 = 0
c. β1 + β2 = 1 and β3 = −2β4
d. β0 = β1 and β1 = 0
Answer: a
9. A nonlinear function
a. makes little sense, because variables in the real world are related linearly.
b. can be adequately described by a straight line between the dependent
variable and one of the explanatory variables.
c. is a concept that only applies to the case of a single or two explanatory
variables since you cannot draw a line in four dimensions.
d. is a function with a slope that is not constant.
Answer: d
\[
\widehat{TestScore} = 607.3 + 3.85\,Income - 0.0423\,Income^2
\]
where TestScore is the average of the reading and math scores on the Stan-
ford 9 standardized test administered to 5th grade students in 420 California
school districts in 1998 and 1999. Income is the average annual per capita in-
come in the school district, measured in thousands of 1998 dollars. The equation
a. suggests a positive relationship between test scores and income for most
of the sample.
b. is positive until a value of Income of 610.81.
c. does not make much sense since the square of income is entered.
d. suggests a positive relationship between test scores and income for all of
the sample.
Answer: a
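A quick numerical check of answer (a): the slope of the fitted quadratic is \(3.85 - 2 \times 0.0423 \times Income\), which is zero at \(Income = 3.85/(2 \times 0.0423) \approx 45.5\) (thousand dollars). The Python sketch below uses only the two reported coefficients. It shows that option (b)'s 610.81 is not the turning point; whether the slope is positive "for most of the sample" also depends on how many districts have incomes below about 45.5, which the sketch does not check.

```python
# Coefficients of the fitted quadratic TestScore equation
B_INCOME = 3.85        # coefficient on Income
B_INCOME_SQ = -0.0423  # coefficient on Income^2

def slope(income: float) -> float:
    """Marginal effect of Income on predicted TestScore: b1 + 2 * b2 * Income."""
    return B_INCOME + 2 * B_INCOME_SQ * income

# The slope is zero where b1 + 2 * b2 * Income = 0
turning_point = -B_INCOME / (2 * B_INCOME_SQ)
print(f"Predicted TestScore rises with Income up to about {turning_point:.2f} thousand dollars")
```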
(4 points) (i) Explain what the coefficient values 689.88 and 10.67 mean.
Answer: The coefficient 10.67 is the marginal effect of Age on AWE; that is, average weekly earnings are expected to increase by $10.67 for each additional year of age. 689.88 is the intercept of the regression line. Literally, it is the predicted AWE of a worker aged 0, which has no meaningful interpretation here; it determines the overall level of the line.
(2 points) (ii) The regression R² is 0.045. What are the units of measurement for the R²? (Dollars? Years? Or is R² unit-free?)
Answer:
Unit-free.
(2 points) (iii) What is the regression's predicted earnings for a 28-year-old
worker? A 38-year-old worker?
Answer:
The regression's predicted average weekly earnings for a 28-year-old worker is $988.64: \(\widehat{AWE} = 689.88 + 10.67 \times 28 = 988.64\). For a 38-year-old worker it is $1095.34: \(\widehat{AWE} = 689.88 + 10.67 \times 38 = 1095.34\).
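Both predictions are plug-ins into the fitted line. A minimal Python sketch using only the estimated intercept and slope reported above (the function name `predicted_awe` is ours, for illustration):

```python
INTERCEPT = 689.88  # estimated intercept (dollars per week)
SLOPE = 10.67       # estimated coefficient on Age (dollars per week per year of age)

def predicted_awe(age: float) -> float:
    """Fitted average weekly earnings from the estimated regression line."""
    return INTERCEPT + SLOPE * age

for age in (28, 38):
    print(f"Age {age}: predicted AWE = ${predicted_awe(age):.2f}")
```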
(4 points) (iv) Will the regression give reliable predictions for a 95-year-old
worker? Why or why not?
Answer:
No. The oldest worker in the sample is 65 years old, so 95 years is far outside the range of the sample data, and the linear relationship estimated from the sample need not hold there.
(2 points) (v) The average age in this sample is 41 years. What is the average
value of AWE in the sample?
Answer:
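The sample mean of AWE is $1127.35. Because the OLS regression line passes through the point of sample means, \(\overline{AWE} = 689.88 + 10.67 \times \overline{Age} = 689.88 + 10.67 \times 41 = 1127.35\).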
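The step from the average age to the average of AWE relies on a property of OLS with an intercept: since \(\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}\), the fitted line always passes through \((\bar{X}, \bar{Y})\) and the residuals sum to zero. The sketch below illustrates this on simulated data. The ages and earnings are randomly generated and are not the homework sample.

```python
import random

def ols_simple(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # forces the line through (x_bar, y_bar)
    return intercept, slope

random.seed(0)
# Simulated (hypothetical) data, NOT the homework sample
age = [random.uniform(25, 65) for _ in range(500)]
awe = [690 + 10.7 * a + random.gauss(0, 600) for a in age]

b0, b1 = ols_simple(age, awe)
age_bar = sum(age) / len(age)
awe_bar = sum(awe) / len(awe)
# The fitted value at the mean age equals the mean of AWE
print(f"fitted at mean age: {b0 + b1 * age_bar:.4f}, mean AWE: {awe_bar:.4f}")
```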
2.2. This question is about omitted variable bias. The following model estimates the effects of age on time spent sleeping by adults:
\[
\widehat{Entertainment} = \underset{(62.13)}{1800.15} - \underset{(8.5)}{30.34}\times age,\qquad n = 706,\quad R^2 = 0.008
\]
this condition holds in reality? If yes, what do you expect the sign of the omitted variable bias to be?
Answer: For omitted variable bias to occur, totwrk must be correlated with age, which is quite plausible. Among adults, people are likely to work less as they grow older, so totwrk and age are likely to be negatively correlated. Because more time at work leaves less time for other activities, the coefficient on totwrk is expected to be negative. The sign of the bias is the product of these two signs, (−) × (−), so there is a positive omitted variable bias. (Students could also argue that people work more as they grow older, so that totwrk and age are positively correlated and the bias is negative. As long as the argument is consistent with the sign of the omitted variable bias, it is correct.)
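The sign argument above is the omitted variable bias formula: the slope from the short regression (age only) equals the long-regression slope on age plus \(\hat\beta_{totwrk} \times \hat\delta\), where \(\hat\delta\) is the slope from regressing totwrk on age. The sketch below checks this identity on simulated data. The data-generating process and all its numbers are hypothetical, chosen only so that totwrk falls with age and lowers the outcome `y`.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cross(a, b):
    """Sum of cross products of deviations from the means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

random.seed(1)
n = 5000
# Hypothetical data-generating process (illustrative numbers only)
age = [random.uniform(20, 70) for _ in range(n)]
totwrk = [2500 - 15 * a + random.gauss(0, 400) for a in age]  # falls with age
y = [3000 - 5 * a - 0.2 * w + random.gauss(0, 300) for a, w in zip(age, totwrk)]

# Long regression: y on age and totwrk (two-regressor closed form)
s11, s22, s12 = cross(age, age), cross(totwrk, totwrk), cross(age, totwrk)
s1y, s2y = cross(age, y), cross(totwrk, y)
det = s11 * s22 - s12 ** 2
b_age_long = (s22 * s1y - s12 * s2y) / det
b_totwrk_long = (s11 * s2y - s12 * s1y) / det

# Short regression: y on age only (totwrk omitted)
b_age_short = s1y / s11
# Auxiliary regression: totwrk on age
delta = s12 / s11

print(f"short {b_age_short:.3f} = long {b_age_long:.3f} + bias {b_totwrk_long * delta:.3f}")
```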
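\[
\begin{aligned}
\widehat{\ln(Earn_i)} = {} & \underset{(0.08)}{12.45} + \underset{(0.026)}{0.052}\times Years_i + \underset{(0.00020)}{0.00089}\times Innings_i + \underset{(0.0018)}{0.0032}\times Saves_i \\
& - \underset{(0.0168)}{0.0085}\times ERA_i,\qquad R^2 = 0.45,\quad SER = 0.874
\end{aligned}
\]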
where Earn is annual salary in dollars, Years is number of years in the major leagues, Innings is number of innings pitched during the career before the 1998 season, Saves is number of saves during the career before the 1998 season, and ERA is the earned run average before the 1998 season.
(a) What happens to earnings when the pitcher stays in the league for one additional year? Compare the salaries of two relievers, one with 10 more saves than the other. What effect does pitching 100 more innings have on the salary of the pitcher? What effect does reducing his ERA by 1.5 have? Do the signs correspond to your expectations? Explain.
Answer: Staying an additional year in the league increases the pitcher's earnings by approximately 5.2 percent. On average, the reliever with 10 more saves earns approximately 3.2 percent more. Pitching 100 additional innings results in approximately 8.9 percent higher earnings, and lowering the ERA by 1.5 increases earnings by approximately 1.3 percent. ERA, innings pitched, and number of saves are all indicators of the quality of the pitcher's input and should therefore have the signs found in the regression above (a lower ERA indicates better pitching, hence its negative sign). Years in the major leagues serves as a proxy for on-the-job training and should therefore carry a positive sign.
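Because the dependent variable is ln(Earn), 100 × coefficient × change is an approximate percentage change; the exact change is \(100 \times (e^{\beta \Delta X} - 1)\). The sketch below reproduces the four figures in the answer and shows how close the log approximation is for changes of this size.

```python
import math

# Coefficients from the log-earnings regression above
coef = {"Years": 0.052, "Innings": 0.00089, "Saves": 0.0032, "ERA": -0.0085}
# Changes considered in part (a)
change = {"Years": 1, "Innings": 100, "Saves": 10, "ERA": -1.5}

results = {}
for name, b in coef.items():
    d_log = b * change[name]                    # change in predicted ln(Earn)
    approx = 100 * d_log                        # log approximation, in percent
    exact = 100 * (math.exp(d_log) - 1)         # exact percentage change
    results[name] = (approx, exact)

for name, (approx, exact) in results.items():
    print(f"{name:8s} approx {approx:5.2f}%   exact {exact:5.2f}%")
```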
(b) Are the individual coefficients statistically significant? Indicate the level of significance you used and the type of alternative hypothesis you considered.
Answer: Given that there are prior expectations about the signs of the coefficients, you should conduct one-sided hypothesis tests. (It is fine if you are not aware of this and use two-sided tests instead.) Using one-sided tests, all variables with the exception of ERA carry coefficients that are statistically significant at the 5% level. With a two-sided test, the coefficient on Saves (t ≈ 1.78) is significant only at the 10% level.
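The t-statistics behind this answer are each coefficient divided by its standard error. The sketch below computes them and compares them with the large-sample critical values 1.645 (one-sided, 5%) and 1.96 (two-sided, 5%). For the one-sided comparison it assumes, as the answer does, that each estimate has the expected sign, so |t| can be compared with 1.645.

```python
estimates = {  # name: (coefficient, standard error)
    "Years": (0.052, 0.026),
    "Innings": (0.00089, 0.00020),
    "Saves": (0.0032, 0.0018),
    "ERA": (-0.0085, 0.0168),
}
ONE_SIDED_5 = 1.645  # large-sample one-sided 5% critical value
TWO_SIDED_5 = 1.96   # large-sample two-sided 5% critical value

t_stats = {name: b / se for name, (b, se) in estimates.items()}
for name, t in t_stats.items():
    # abs(t) is valid for the one-sided test only because each sign matches expectations
    print(f"{name:8s} t = {t:6.2f}  one-sided 5%: {abs(t) > ONE_SIDED_5}  "
          f"two-sided 5%: {abs(t) > TWO_SIDED_5}")
```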
(c) Although you are quite impressed with the fit of the regression, someone suggests that you should include the squares of years and innings as additional explanatory variables. Your results change as follows:
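\[
\begin{aligned}
\widehat{\ln(Earn_i)} = {} & \underset{(0.05)}{12.15} + \underset{(0.039)}{0.160}\times Years_i + \underset{(0.00030)}{0.00268}\times Innings_i + \underset{(0.0010)}{0.0063}\times Saves_i \\
& - \underset{(0.0165)}{0.0584}\times ERA_i - \underset{(0.0026)}{0.0165}\times Years_i^2 - \underset{(0.00000012)}{0.00000045}\times Innings_i^2,\qquad R^2 = 0.69,\quad SER = 0.666
\end{aligned}
\]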
What is her reasoning? Are the coefficients of the quadratic terms statistically significant? Are they meaningful?
Answer: Her reasoning is that the effects of experience and innings on earnings may not be constant. Allowing the quadratic terms to enter results in an inverted U-shape for the relationship between the log of earnings and both years in the league and innings pitched. Both quadratic coefficients are highly significant, and including them has also resulted in a significant ERA coefficient.
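To see what the inverted U implies, the sketch below computes the t-statistics of the quadratic terms (and of ERA) and the turning points \(x^* = -\hat\beta_{linear}/(2\hat\beta_{quadratic})\), holding the other regressors fixed. The turning points are derived from the reported estimates only, as an illustration.

```python
# (coefficient, standard error) pairs from the extended regression
estimates = {
    "Years^2": (-0.0165, 0.0026),
    "Innings^2": (-0.00000045, 0.00000012),
    "ERA": (-0.0584, 0.0165),
}
t_stats = {name: b / se for name, (b, se) in estimates.items()}
for name, t in t_stats.items():
    print(f"{name:10s} t = {t:6.2f}")

# Turning point of an inverted U, holding the other regressors fixed:
# x* = -b_linear / (2 * b_quadratic)
years_peak = -0.160 / (2 * -0.0165)
innings_peak = -0.00268 / (2 * -0.00000045)
print(f"Predicted ln(Earn) peaks near {years_peak:.1f} years and {innings_peak:.0f} innings")
```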
(d) Calculate the effect of moving from two to three years, as opposed to from 12 to 13 years.
Answer: For a pitcher who has played for two years, staying one more year in the league results in a predicted earnings increase of approximately 7.8 percent, while staying an additional year after 12 years in the majors results in a predicted decrease of approximately 25.3 percent.
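Because of the quadratic term, the effect of a one-year move depends on the starting point: the change in predicted ln(Earn) is \(0.160 \times 1 - 0.0165 \times (end^2 - start^2)\). A short sketch of the calculation (the helper names are ours):

```python
B_YEARS, B_YEARS_SQ = 0.160, -0.0165

def years_part(years: float) -> float:
    """Contribution of Years and Years^2 to predicted ln(Earn)."""
    return B_YEARS * years + B_YEARS_SQ * years ** 2

def pct_effect(start: float, end: float) -> float:
    """Approximate percentage change in earnings: 100 x change in predicted ln(Earn)."""
    return 100 * (years_part(end) - years_part(start))

print(f"{pct_effect(2, 3):.2f}")    # 7.75, i.e. about 7.8 percent
print(f"{pct_effect(12, 13):.2f}")  # -25.25, i.e. about -25.3 percent
```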
(e) You also decide to test the specification for stability across leagues (National League and American League) by including a dummy variable for the National League and allowing the intercept and all slopes to differ. The resulting F-statistic for restricting all coefficients that involve the National League dummy variable to zero is 0.40. Compare this to the relevant critical value from the table and decide whether or not these additional variables should be included.
Answer: The F-statistic of 0.40 is not statistically significant at the 5% or 10% level. Hence you cannot reject the null hypothesis that the coefficients are equal across leagues, and the additional variables should not be included.
effect of Beauty on Course_Eval large or small? Explain what you mean by large and small.
(iii) Dr. Qin has an average value of Beauty, while Dr. Pretty's value of Beauty is one standard deviation above the average. Predict Dr. Qin's and Dr. Pretty's course evaluations.
(iv) Dr. Qin would like to add some personal characteristics, female, age and minority, to the regression. Please run the regression for Dr. Qin, report it in the same table and explain the outcome. Could you explain why Dr. Qin wants to add them to the regression?
Answer:
(i) The scatter graph does not show any relationship between the variables. The regression outcome (reported in column 1) suggests a weak positive relationship between course evaluation and the beauty index. The full sample includes observations whose teaching evaluations are far outside the standard scale of 1 to 5. These outliers make the regression results imprecise; specifically, they may bias the coefficients towards 0.
(ii) After dropping the outliers, I rerun the regression on the new data set and report the result in column (2). The slope is 0.131, which means that a one-unit increase in the Beauty index is associated with a 0.131-point increase in course evaluation. The standard deviation of course evaluations is 0.55 and the standard deviation of Beauty is 0.789. A one-standard-deviation increase in Beauty is expected to increase course evaluation by 0.131 × 0.789 = 0.103, or about 1/5 of a standard deviation of course evaluations. The effect therefore seems small.
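The "large or small" judgment rests on rescaling the slope into standard-deviation units. A minimal sketch using the three numbers reported in the answer:

```python
slope = 0.131      # coefficient on Beauty, column (2)
sd_beauty = 0.789  # sample standard deviation of Beauty
sd_eval = 0.55     # sample standard deviation of Course_Eval

effect = slope * sd_beauty       # change in Course_Eval for a 1-SD rise in Beauty
share_of_sd = effect / sd_eval   # the same effect in SDs of Course_Eval
print(f"{effect:.3f} points = {share_of_sd:.2f} SD of course evaluations")
```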
(iii) The average value of Beauty is 0.002831 and the standard deviation of Beauty is 0.7891652. Dr. Qin's predicted course evaluation = 4.00 + 0.131 × 0.002831 = 4.000371. Dr. Pretty's predicted course evaluation = 4.00 + 0.131 × (0.002831 + 0.7891652) = 4.103752.
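The two predictions can be reproduced from the fitted line; note that the gap between them is just the slope times one standard deviation of Beauty. A sketch assuming the intercept 4.00 as rounded in the answer:

```python
INTERCEPT = 4.00  # intercept from column (2), as rounded in the answer
SLOPE = 0.131     # coefficient on Beauty
mean_beauty = 0.002831
sd_beauty = 0.7891652

def predicted_eval(beauty: float) -> float:
    """Predicted course evaluation for a given Beauty value."""
    return INTERCEPT + SLOPE * beauty

qin = predicted_eval(mean_beauty)
pretty = predicted_eval(mean_beauty + sd_beauty)
print(f"Dr. Qin: {qin:.6f}, Dr. Pretty: {pretty:.6f}")
```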
(iv) Answer: The coefficients on age and minority are not statistically significant. A one-unit increase in the beauty index is associated with a 0.138-point increase in the course evaluation score. On average, the course evaluation score for female instructors is 0.207 points lower than that for male instructors. After including female, age and minority, the coefficient on beauty increases, and R² rises from 0.035 to 0.073. Excluding the personal characteristics would therefore induce a negative omitted variable bias for beauty (compare 0.138 with 0.131). This is why Dr. Qin wants to add them: characteristics that affect evaluations and are correlated with beauty would otherwise bias the coefficient on beauty. Those variables should be included in the regression.
3.2 Use the data birth_weight.dta for this exercise. Suppose that you are hired by a health center to analyze the factors that may affect babies' birth weight (bwght). Please use the R command str() to find out what each variable refers to.
(i) You first want to discuss how mom's smoking habit affects the baby's birth weight. You estimate the model
Report the result in a table, including the sample size and R-squared. Interpret the estimate of β0. If cigs increases by one sample standard deviation, what is the estimated effect on birth weight?
(ii) Second, you want to discuss whether mom's ethnicity affects the baby's birth weight. Based on the data you have, please design a regression model (write down the model) and use the estimation result to answer the question. Make sure you interpret the related coefficient(s) correctly.
(iii) Third, you want to discuss whether mother's education matters in determining the baby's birth weight. You add meduc to the regression model that you designed in (ii), and run the regression. Report the result in a table (it can be the same table as in (i) and/or (ii)). What can you say about the impact of mother's education from the result? Compare the coefficients for the ethnic groups in this regression with the ones from (ii): is there any difference? If you observe a difference, what may cause it? Compare the coefficient for cigs from this regression with the one from (ii): is there any difference?
(iv) Now, let's give mother's education one more try. Please generate a new variable, meduc2, the square of meduc. Including meduc2 in the regression from (iii), estimate the model and report the result. Comparing the result with the one from (iii), do you find any significant change in the coefficient estimates? Does mother's education matter for the baby's birth weight? Can you construct a test to support your conclusion?
Answer:
(i) Answer: The estimation result is reported in column (1). The intercept is 9.730, which is the predicted birth weight of a female baby whose mother does not smoke; the estimate is statistically significant at the 1% level. The sample standard deviation of cigs is 2.55. A one-standard-deviation increase in moms' daily cigarette consumption is expected to decrease the baby's birth weight by 2.55 × 0.007 = 0.01785 pounds.
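The one-standard-deviation effect is the coefficient times the sample standard deviation. A two-line check, assuming (as the arithmetic above implies) an estimated coefficient on cigs of −0.007:

```python
b_cigs = -0.007  # coefficient on cigs implied by the answer above (assumed)
sd_cigs = 2.55   # sample standard deviation of cigs

effect = b_cigs * sd_cigs  # change in birth weight (pounds) for a 1-SD rise in cigs
print(f"{effect:.5f} pounds")
```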
(ii) Answer: To discuss whether mom's ethnicity affects the baby's birth weight, we include the variables mblck and masian in the model (you can include any two of the ethnic dummies). The suggested model should be
The regression result is shown in column (2). Among the ethnic dummies, only the coefficient on masian is statistically significant at the 5% level. No significant birth weight difference is found between a white mom and an African American mom. However, holding other factors constant, a baby of an Asian mom tends to weigh 0.051 pounds less than that of a white mom. Hence mom's ethnicity does affect the baby's birth weight.
(iii) Answer: The result is reported in column (3). The coefficient on mom's education is statistically insignificant, so we cannot reject the null hypothesis that mom's education has no impact on birth weight. Comparing the results in (ii) and (iii), the magnitude of the coefficient on the Asian group increases, and the coefficient on mblck is still not statistically significant at the 10% level. The coefficient on cigs becomes less negative after including meduc in the model. The possible reason is omitted variable bias: mother's education is correlated with ethnic group and with smoking, while at the same time mother's education is a determinant of babies' birth weight.
From the correlation table we can see that mothers with higher education tend to smoke less (corr(meduc, cigs) < 0), are less likely to be African American and are more likely to be Asian (corr(meduc, mblck) < 0; corr(meduc, masian) > 0). Meanwhile, better-educated moms tend to have babies with higher birth weight (corr(meduc, bwght) > 0). So when meduc is omitted, the (negative) effect of smoking is overestimated in magnitude, and there is a positive bias for Asian moms and a negative bias for black moms.
(iv) Answer: The result is shown in column (4). There is a significant change in the coefficient on meduc, from 0.0032 in the model from (iii) to −0.06, and it is now statistically significant at the 1% level. The coefficient on the quadratic term meduc2 is also significant at the 1% level, which suggests a nonlinear relationship between mom's education and the baby's birth weight. We use an F-test to test the joint significance of mom's education, both the linear and the quadratic term. The null hypothesis is H0: β_meduc = 0 and β_meduc2 = 0, that is, the coefficients on meduc and meduc2 are both zero. The p-value of the F-test is 0.0003226, which is smaller than 1%. Thus, we reject the null hypothesis and conclude that mother's education matters for babies' birth weight.
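For reference, the textbook homoskedasticity-only F-statistic for q exclusion restrictions can be computed from the unrestricted and restricted R². The sketch below codes that formula together with a large-sample p-value for the q = 2 case used here (2F is then chi-squared with 2 degrees of freedom, whose upper tail is \(e^{-x/2}\)). The inputs in the example are hypothetical, and the sketch is not claimed to reproduce the p-value above, which may come from a different (for example, heteroskedasticity-robust) variant in the software used.

```python
import math

def f_stat_from_r2(r2_unrestricted: float, r2_restricted: float,
                   q: int, n: int, k: int) -> float:
    """Homoskedasticity-only F-statistic for q exclusion restrictions.

    k is the number of regressors in the unrestricted model, not counting the intercept.
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k - 1)
    return numerator / denominator

def large_sample_p_value_q2(f: float) -> float:
    """Large-sample p-value for q = 2: P(chi2_2 > 2F) = exp(-F)."""
    return math.exp(-f)

# Hypothetical inputs, NOT the homework results
f = f_stat_from_r2(r2_unrestricted=0.050, r2_restricted=0.035, q=2, n=1000, k=6)
print(f"F = {f:.2f}, large-sample p-value = {large_sample_p_value_q2(f):.4f}")
```

As a sanity check, the q = 2 p-value function crosses 0.05 at F ≈ 3.00, the familiar 5% critical value of F(2, ∞).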
3.3 Please use VOTE2016.dta to answer the following questions. The following model can be used to study whether campaign expenditures affect election outcomes:
Answer: Yes. When campaign expenditure by A increases by 1%, the vote share of candidate A is expected to increase by 0.06 percentage points; when campaign expenditure by B increases by 1%, the vote share of candidate A is expected to decrease by 0.066 percentage points. Both coefficients are statistically significant, so we can reject the null hypothesis that each is zero.
(b) Can you tell whether a 1% increase in A's expenditures is offset by a 1% increase in B's expenditure? How? Please suggest a regression or test and then answer the question according to your result.
Answer: Yes. We can tell from an F-test. The null hypothesis of the F-test is that the sum of the two coefficients equals zero, H0: β1 + β2 = 0, which means that a 1% increase in A's expenditures is offset by a 1% increase in B's expenditure. The F-statistic is 0.7094 and the p-value is 0.4008, so we cannot reject the null hypothesis that a 1% increase in A's expenditures is offset by a 1% increase in B's expenditure.
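With a single restriction, the F-statistic equals the square of the t-statistic for \(\hat\theta_1 = \hat\beta_1 + \hat\beta_2\), whose standard error needs the covariance of the two estimates: \(SE(\hat\theta_1) = \sqrt{Var(\hat\beta_1) + Var(\hat\beta_2) + 2\,Cov(\hat\beta_1, \hat\beta_2)}\). The sketch below codes this together with a large-sample (standard normal) p-value. The coefficient and covariance inputs in the example are hypothetical, since the covariance matrix is not reported here. Applied to the reported F = 0.7094, the large-sample p-value is about 0.40, in line with the reported 0.4008.

```python
import math

def wald_test_sum(b1: float, b2: float, var_b1: float, var_b2: float, cov_b12: float):
    """Test H0: beta1 + beta2 = 0. Returns (t, F, large-sample two-sided p-value)."""
    theta_hat = b1 + b2
    se_theta = math.sqrt(var_b1 + var_b2 + 2 * cov_b12)  # SE of b1 + b2
    t = theta_hat / se_theta
    p = math.erfc(abs(t) / math.sqrt(2))  # P(|Z| > |t|) for a standard normal Z
    return t, t ** 2, p                   # with one restriction, F = t^2

def p_value_from_f_one_restriction(f: float) -> float:
    """Large-sample p-value for an F-statistic with a single restriction."""
    return math.erfc(math.sqrt(f / 2))

# Hypothetical estimates and covariances, NOT the homework results
t, f, p = wald_test_sum(b1=1.0, b2=-0.8, var_b1=0.04, var_b2=0.05, cov_b12=-0.01)
print(f"t = {t:.3f}, F = {f:.3f}, p = {p:.3f}")
print(f"reported F = 0.7094 -> p = {p_value_from_f_one_restriction(0.7094):.4f}")
```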
another regression and test to solve question (c).
Answer: We can also transform the regression to test the same hypothesis. Now let's write θ1 = β1 + β2, or β1 = θ1 − β2. Plugging this into the original equation and rearranging gives