hw2 Spring2023 Econ3005 Solution
1.2 The slope estimator, β̂1, has a smaller standard error, other things equal,
if
a. there is more variation in the explanatory variable, X.
b. there is a large variance of the error term, u.
c. the sample size is smaller.
d. the intercept, β0, is small.
Answer: a
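As a reminder of why (a) is the right choice, the large-sample variance of the slope estimator takes a simple form under the extra assumption of homoskedastic errors. This is the usual textbook approximation, not something stated in the question; σ_u² denotes the error variance and σ_X² the variance of X.

```latex
\[
  \operatorname{var}(\hat{\beta}_1) \approx \frac{1}{n}\,\frac{\sigma_u^2}{\sigma_X^2}
\]
```

More variation in X (a larger σ_X²) lowers the standard error, while a larger error variance or a smaller sample raises it. The intercept does not enter the expression.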
1.5 The only difference between a one- and two-sided hypothesis test is
a. the null hypothesis
b. dependent on the sample size n
c. the sign of the slope coefficient
d. how you interpret the t-statistic
Answer: d
1.6 Using the textbook example of 420 California school districts and the
regression of test scores on the student-teacher ratio, you find that the standard
error on the slope coefficient is 0.51 when using the heteroskedasticity-robust
formula, while it is 0.48 when employing the homoskedasticity-only formula.
When calculating the t-statistic, the recommended procedure is to
a. use the homoskedasticity-only formula because the t-statistic becomes
larger
b. first test for homoskedasticity of the errors and then make a decision
c. use the heteroskedasticity-robust formula
d. make a decision depending on how different the estimate of the
slope is under the two procedures
Answer: c
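The two standard-error formulas can be compared side by side with a minimal base-R sketch. The data below are simulated (not the California school data), the variable names are hypothetical, and the robust variance is the HC1 formula computed by hand; in practice a package function such as sandwich::vcovHC(fit, type = "HC1") would normally be used.

```r
# Simulated data with heteroskedastic errors (assumption: error spread grows with x)
set.seed(1)
n <- 420
x <- runif(n, 14, 26)
u <- rnorm(n, sd = 0.5 * x)
y <- 700 - 2 * x + u

fit <- lm(y ~ x)
X <- model.matrix(fit)          # n x 2 design matrix (intercept, x)
e <- residuals(fit)

# Homoskedasticity-only standard errors (the lm default)
se_homo <- sqrt(diag(vcov(fit)))

# Heteroskedasticity-robust (HC1) standard errors:
# (X'X)^{-1} [sum of e_i^2 x_i x_i'] (X'X)^{-1}, scaled by n / (n - k)
XtX_inv <- solve(crossprod(X))
meat <- crossprod(X * e)
V_hc1 <- n / (n - ncol(X)) * XtX_inv %*% meat %*% XtX_inv
se_robust <- sqrt(diag(V_hc1))

# t-statistic for the slope based on the robust standard error
t_robust <- unname(coef(fit)[2] / se_robust[2])

print(rbind(homoskedasticity_only = se_homo, robust_HC1 = se_robust))
print(t_robust)
```

The robust formula is valid whether or not the errors are homoskedastic, which is why it is the recommended default even when it gives the smaller t-statistic.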
1.7 When there are omitted variables in the regression, which are determinants
of the dependent variable, then
a. you cannot measure the effect of the omitted variable, but the estimator
of your included variable(s) is (are) unaffected.
b. this has no effect on the estimator of your included variable because the
other variable is not included.
c. this will always bias the OLS estimator of the included variable.
d. the OLS estimator is biased if the omitted variable is correlated with the
included variable.
Answer: d
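A small simulation illustrates why the bias depends on the correlation between the omitted and the included variable. This is only a sketch on made-up data: the true slope on x is set to 1, and all variable names are hypothetical.

```r
set.seed(42)
n <- 10000
x <- rnorm(n)
z_corr  <- 0.5 * x + rnorm(n)   # omitted determinant of y, correlated with x
z_indep <- rnorm(n)             # omitted determinant of y, uncorrelated with x
u <- rnorm(n)

y_corr  <- 1 * x + 1 * z_corr  + u
y_indep <- 1 * x + 1 * z_indep + u

# Slope on x when the determinant is left out of the regression
b_corr  <- unname(coef(lm(y_corr  ~ x))[2])   # biased: close to 1 + 0.5 = 1.5
b_indep <- unname(coef(lm(y_indep ~ x))[2])   # unbiased: close to 1

# Slope on x once the correlated determinant is included
b_long  <- unname(coef(lm(y_corr ~ x + z_corr))[2])   # close to 1

print(c(omitted_correlated = b_corr,
        omitted_uncorrelated = b_indep,
        included = b_long))
```

Omitting a determinant that is uncorrelated with x leaves the slope estimate centered on the true value, which is why (c) is too strong and (d) is correct.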
full-time workers aged 25-65 yields the following:
(2 points) (i) Explain what the coefficient values 689.88 and 10.67 mean.
Answer: The coefficient 10.67 shows the marginal effect of Age on AWE;
that is, AWE is expected to increase by $10.67 for each additional year of age.
689.88 is the intercept of the regression line. It determines the overall level of
the line.
(2 points) (ii) The regression R² is 0.045. What are the units of measurement
for the R²? (Dollars? Years? Or is R² unit-free?)
Answer: R² is unit-free.
(2 points) (iii) What is the regression's predicted earnings for a 28-year-old
worker? A 38-year-old worker?
Answer:
The regression's predicted earnings for a 28-year-old worker are $988.64:
\[ \widehat{AWE} = 689.88 + 10.67 \times 28 = 988.64. \]
The prediction for a 38-year-old worker is $1095.34:
\[ \widehat{AWE} = 689.88 + 10.67 \times 38 = 1095.34. \]
(3 points) (iv) Will the regression give reliable predictions for a 95-year-old
worker? Why or why not?
Answer: No. The oldest worker in the sample is 65 years old, so 95 years is far
outside the range of the sample data.
(2 points) (v) The average age in this sample is 41 years. What is the average
value of AWE in the sample?
Answer: β̂0 = Ȳ − β̂1 X̄, so that Ȳ = β̂0 + β̂1 X̄. Thus the sample mean of
AWE is $1127.35:
\[ \overline{AWE} = 689.88 + 10.67 \times 41 = 1127.35. \]
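The predictions in part (iii) and the sample mean in part (v) all come from plugging an age into the same fitted line, so the arithmetic can be checked in a few lines of R. The function name predict_awe is a hypothetical helper; the coefficients are the ones used in the calculations above.

```r
b0 <- 689.88   # intercept
b1 <- 10.67    # slope on Age

# Fitted average weekly earnings at a given age
predict_awe <- function(age) b0 + b1 * age

pred <- predict_awe(c(28, 38, 41))
print(round(pred, 2))
# 988.64 1095.34 1127.35
```

The value at age 41 equals the sample mean of AWE because the OLS regression line passes through the point of sample means (X̄, Ȳ).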
(20 points) 2.2 You have collected data for 104 countries to address the
difficult questions of the determinants for differences in the standard of living
among the countries of the world. You recall from your macroeconomics lectures
that the neoclassical growth model suggests that output per worker (per capita
income) levels are determined by, among others, the saving rate and population
growth rate. To test the predictions of this growth model, you run the following
regression:
\[ \widehat{RelPersInc} = 0.339 - 12.894 \times n + 1.397 \times s_k, \quad R^2 = 0.621 \]
where RelPersInc is GDP per worker relative to the United States, n is the
average population growth rate, 1980-1990, and s_k is the average investment
share of GDP from 1960 to 1990 (remember investment equals saving).
(6 points) (a) Interpret the results. Do the signs correspond to what you
expected them to be? Explain. (Hint: The Solow growth model predicts higher
productivity with higher saving rates and lower population growth.)
Answer: The Solow growth model predicts higher productivity with higher
saving rates and lower population growth. The signs therefore correspond to
prior expectations. A 10 percentage point increase in the saving rate is
associated with a roughly 14 percentage point increase in per capita income
relative to the United States. Lowering the population growth rate by
1 percentage point is associated with a roughly 13 percentage point higher
per capita income relative to the United States. It is best not to interpret the
intercept. The regression explains approximately 62 percent of the variation in
per capita income among the 104 countries in the sample.
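The two magnitudes quoted in the answer follow directly from the estimated coefficients. The worked arithmetic below assumes that the population growth rate and the investment share are measured as fractions (so 0.01 is one percentage point), which the question does not state explicitly.

```latex
\[
  \Delta \widehat{RelPersInc} = 1.397 \times 0.10 \approx 0.140,
  \qquad
  \Delta \widehat{RelPersInc} = -12.894 \times (-0.01) \approx 0.129
\]
```

Because RelPersInc is itself a ratio to U.S. GDP per worker, changes of 0.140 and 0.129 correspond to about 14 and 13 percentage points of the U.S. level.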
(8 points) (b) You remember that human capital in addition to physical
capital also plays a role in determining the standard of living of a country. You
therefore collect additional data on the average educational attainment in years
for 1985, and add this variable (Educ) to the above regression. This results in
Footnote 1: No need to show how you derive the sign according to the formula.
You just need to state the sign of the omitted variable bias and then describe
the mechanism. We show you once how the direction of the omitted variable bias
is derived. According to the omitted variable bias formula,
\[ \hat{\beta}_i \xrightarrow{p} \beta_i + \frac{\sigma_u}{\sigma_{x_i}} \rho_{x_i u}. \]
As β̂_n < β_n (we can regard the estimate obtained when controlling for the
omitted variable as the one closer to the true β), we know the direction of the
omitted variable bias is negative, ρ_{n,u} < 0; and as β̂_sk > β_sk, we know the
direction of the omitted variable bias is positive, ρ_{sk,u} > 0.
Part 3 Empirical Exercise (45 points in total)
For all regressions, please report the heteroskedasticity-robust standard errors.
(20 points) 1. Please download the World Bank Development Report Data
wbdr.dta from Moodle and answer the questions. The data contain the following
variables, which describe the situation in 1997 unless noted otherwise.
code: country code (alphabetical)
country: country name
illit_f: % illiterate, female aged 15+
illit_m: % illiterate, male aged 15+
illit_t: % illiterate, total aged 15+
mort_inf: infant mortality rate, per 1000
mort_5: < age 5 mortality rate, per 100
gnppc: GNP per capita (US$1995)
gnppcppp: GNP per capita (PPP)
mort77: 1977 infant mort rate, per 1000
gnppc77: 1977 GNP per capita (US$1995)
Please report all the regression outcomes in one single table.
(3 points) (i) Regress the illiteracy rate (illit_t) on per capita GNP in 1997
(gnppc). Is the sign of the coefficient what you expected? Explain the result
briefly.
Answer:
The result is reported in column (i). The coefficient on per capita GNP is
−0.001, with a standard error of 0.0001, statistically significant at the 1% level.
The coefficient tells us that illiteracy and GNP per capita are negatively
correlated: on average, a $1000 rise in per capita GNP is associated with a
1 percentage point (1000 × 0.001 = 1) decline in the illiteracy rate. (Remember
that correlation does not imply causality. Just because two variables are highly
correlated does not mean one causes the other.) The sign of the coefficient is
what we would have expected: we expect higher-GNP countries to have better
education because they can spend more on schools. Conversely, with a more
educated labor force, countries can produce more. Hence the negative
correlation between gnppc and illiteracy.
(3 points) (ii) Regress the infant mortality rate in 1997 on GNP per capita
in 1997. Is the coefficient on per capita GNP significantly different from zero?
How do you know? Interpret the coefficient in terms of a $1000 difference in
per capita GNP.
Answer:
The result is reported in column (ii). The coefficient on per capita GNP is
significantly different from zero at the 1% level of significance. We know this
from the t-statistic, which tests the null hypothesis that the true coefficient
equals zero; the t-statistic is greater (in absolute value) than 2.58, so we reject
the null hypothesis and conclude that the coefficient is significant. (You can also
use the p-value to explain.) The coefficient means that a $1000 increase in GNP
per capita is associated with 2 fewer infant deaths per 1000 births
(1000 × (−0.0020) = −2).
(4 points) (iii) Regress the infant mortality rate in 1997 on the illiteracy rate.
Graph a scatter plot of the data as well as the regression line. Please interpret
the coefficient of the illiteracy rate. (Use the R command
ggplot(...) + geom_point() + geom_smooth(method = lm, ...) to produce the graph.)
Answer:
The result is reported in column (iii). The graph is shown below.
[Figure: scatter plot of mort_inf (vertical axis, 0 to 150) against illit_t (horizontal axis, 0 to 75), with the fitted regression line.]
(4 points) (iv) Regress the infant mortality rate in 1997 on the illiteracy rate
and GNP per capita in 1997. Please interpret the results. Did the coefficient of
the illiteracy rate change a lot from result (iii)? Why or why not?
Answer: The result is reported in column (iv). The result suggests that a
1 percentage point increase in the illiteracy rate is associated with around 1.2
more infant deaths per 1000 births. Compared with the result in (iii), the
coefficient on the illiteracy rate dropped by 16% but is still statistically
significant at the 1% level. The drop is due to the omitted variable bias caused
by leaving GNP per capita in 1997 out of regression (iii). Regression (iv)
suggests that a $10000 increase in GNP per capita is associated with 8 fewer
infant deaths per 1000 births (corr(gnppc, mort_inf) < 0). Since higher GNP per
capita is also associated with a lower illiteracy rate (as suggested by
regression (i), corr(gnppc, illit_t) < 0), omitting GNP per capita in
regression (iii) results in a positive bias in the estimate of the coefficient on
the illiteracy rate.
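The sign argument can be written out with the standard omitted variable bias formula for a regression that leaves out one regressor. The symbols are assumptions introduced here for illustration: b̂ is the slope on illit_t in regression (iii), and β_1 and β_2 are the coefficients on illit_t and gnppc in regression (iv).

```latex
\[
  \hat{b} \xrightarrow{p} \beta_1
  + \beta_2 \,
    \frac{\operatorname{cov}(\mathit{illit\_t}, \mathit{gnppc})}
         {\operatorname{var}(\mathit{illit\_t})},
  \qquad
  \beta_2 < 0, \quad \operatorname{cov}(\mathit{illit\_t}, \mathit{gnppc}) < 0
  \;\Longrightarrow\; \text{bias} > 0
\]
```

A negative coefficient multiplied by a negative covariance gives a positive bias term, consistent with the coefficient on illiteracy being larger in (iii) than in (iv).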
(6 points) (v) Using the results from part i-iv, what can we say about the
causal relationship between illiteracy, infant mortality, and income (GNP)?
Answer: Results from (i)-(iv) do not establish causality; most likely, causality
runs both ways between each pair of variables.
GNP could cause illiteracy to decrease because richer countries can spend more
on education. Conversely, being literate probably increases worker productivity
(among other things), so decreasing illiteracy could cause an increase in GNP.
Literate mothers can read public health information better; hence decreasing
illiteracy could cause a decrease in infant mortality. Infant mortality is unlikely
to have a direct effect on illiteracy (although we could probably imagine some
channels). However, illiteracy and infant mortality might both decrease if income
increased; hence the joint determination problem.
Richer countries generally have better health facilities; hence increased
GNP can decrease infant mortality. Reverse causality, such that infant mortality
increases or decreases GNP, is unlikely. However, infant mortality and GNP
might be jointly determined by illiteracy (e.g., literacy increases GNP).
(25 points) 3.2 This question deals with the estimation of betas of the Capital
Asset Pricing Model (CAPM), and it is a relatively straightforward application
of a simple linear regression.
\[ R_t^e = \alpha + \beta R_{mt}^e + u_t \]
Here R_t^e is the stock return (return) and R_{mt}^e is the market return
(market). You are given data on monthly stock returns for 15 companies in
7 industries for the period from January 1978 to December 1987. They are:
Industries          Companies
Oil                 Mobil (11)
                    Texaco (14)
Computers           IBM (10)
                    DEC (Digital Equipment Corporation) (6)
                    DataGen (Data General) (5)
Electric Utilities  ConEd (Consolidated Edison) (3)
                    PSNH (Public Service of New Hampshire) (13)
Forest Products     Weyer (Weyerhaeuser) (15)
                    Boise (1)
Airlines            PanAm (Pan American Airways) (12)
                    Delta (7)
Banks               Contil (Continental Illinois) (4)
                    Citcrp (Citicorp) (2)
Foods               Gerber (9)
                    GenMil (General Mills) (8)
These data are contained in the file capm3.dta. The file also contains
information on the market monthly return (market, a value-weighted average of
returns on stocks listed on the New York Stock Exchange) and information on
the risk-free rate of return (return, the return on 30-day U.S. Treasury Bills).
The stock and market returns in the file are excess returns over the risk-free
rate of return.
From the list of industries, choose DEC from the Computers industry
(comparatively risky) and GenMil from the Foods industry (relatively safe).
(Hint: The variable ncomp runs from 1 to 15 and identifies the company in each
observation; the corresponding number for each company is listed in
parentheses in the table. You can use the R command subset to select the
sample to run the regression on. For example,
DEC <- subset(capm, capm$ncomp == 6) uses the data from company DEC only.)
Answer:
The regression outcomes are reported in table 2.
DEC(i) GenMil(i) DEC(ii) GenMil(ii)
(0.123) (0.093)
(0.123) (0.093)
Standard errors are heteroskedasticity robust. *** p < 0.01; ** p < 0.05; * p < 0.1.
(6 points) (i) Estimate α and β in the CAPM by OLS for each of the two
firms. How do the estimates of α and β differ between the two firms? Does this
accord with your expectation?
Answer: Columns 1 and 2 report the results. From the results, we can see
that DEC is more sensitive to the market than GenMil, β_DEC > β_GenMil; i.e., the
demand for computer products is more dependent on economic conditions than
the demand for foods. As the question describes Computers as the riskier
industry, we should expect a positive change in the market return to be
associated with a larger positive change in the returns for computer companies
than in the returns for food products companies. The results are consistent with
our expectations.
(7 points) (ii) The monthly stock and market returns (return and market)
are in decimal. Convert them into percentage and re-estimate α and β. Are the
new estimates different from the estimates you got in part (i)? Explain.
Answer:
Columns 3 and 4 report the results. The estimate of β does not differ from
the estimate in part (i), but the estimate of α is 100 times the estimate of α
in part (i). A straightforward way to show this would be to use the formula for
the OLS estimator. Here is an intuitive way to explain it. Note that we can
multiply both sides of a CAPM regression
\[ R_t^e = \alpha + \beta R_{mt}^e + u_t, \]
where R_t^e and R_{mt}^e are in decimal, by 100 and obtain a CAPM regression
with returns in percentage:
\[ 100 \times R_t^e = 100 \times \alpha + \beta (100 \times R_{mt}^e) + 100 \times u_t, \]
\[ \tilde{R}_t^e = \tilde{\alpha} + \beta \tilde{R}_{mt}^e + \tilde{u}_t. \]
The OLS estimator of α̃ is 100 times that of α and the OLS estimator of β does
not change.
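The rescaling result can also be confirmed numerically. The sketch below uses simulated monthly returns rather than capm3.dta, so the variable names and parameter values are assumptions chosen only for illustration.

```r
set.seed(7)
n <- 120                                              # 10 years of monthly data
market <- rnorm(n, mean = 0.005, sd = 0.05)           # market excess return, decimal
ret    <- 0.002 + 1.2 * market + rnorm(n, sd = 0.06)  # stock excess return, decimal

fit_dec <- lm(ret ~ market)                    # returns in decimal
fit_pct <- lm(I(100 * ret) ~ I(100 * market))  # returns in percentage

coef_dec <- unname(coef(fit_dec))
coef_pct <- unname(coef(fit_pct))

# Intercept scales by 100, slope is unchanged
print(rbind(decimal = coef_dec, percentage = coef_pct))
print(coef_pct[1] / coef_dec[1])
print(coef_pct[2] - coef_dec[2])
```

Multiplying both the dependent variable and the regressor by the same constant rescales the intercept and the residuals but leaves the slope and the R² unchanged.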
(6 points) (iii) For each company, compute the proportion of total risk that
is market risk. Are the results consistent with your expectations?
Answer:
In the CAPM, the proportion of total risk that is market risk is R², so the
proportion of market risk is 34.2% for DEC and 8% for GenMil. This agrees with
expectations, since the factors that explain higher market returns also explain
the demand for computer products, which leads to higher returns for computer
companies. The relationship between food demand and market performance is not
expected to be as strong.
(6 points) (iv) Do large estimates of β correspond to higher R² values? Do
you expect this to be the case? Why or why not?
Answer:
While in the case of these two companies the company with the higher β
also had the higher R², this may not always be the case. β and R² do not
have a monotonic relationship in general. For example, if the stock return of a
company is subject to large idiosyncratic risks unrelated to the market risk (u_t
has a large standard deviation), then R² can be quite low while its stock return
has a high β.
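The point can be made explicit with the population counterpart of R² in the CAPM regression. The expression below is a sketch in assumed notation: σ_m² is the variance of the market return, σ_u² is the variance of the idiosyncratic error, and the error is taken to be uncorrelated with the market return.

```latex
\[
  R^2 = \frac{\beta^2 \sigma_m^2}{\beta^2 \sigma_m^2 + \sigma_u^2}
\]
```

For a given σ_u², a larger |β| raises R², but a stock with a high β and a very large σ_u² can still have a lower R² than a stock with a modest β and little idiosyncratic risk.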