PS4 Sol
Econometrics
1. Suppose that a researcher collects data on houses that have sold in a particular neighborhood
over the past year and obtains the regression results in the table shown below.
Price: sale price ($)
View: =1 if the house has a nice view
Bedroom: number of bedrooms
Pool: =1 if the house has a swimming pool
Size: house size in square feet
Condition: =1 if the real estate agent reports that the house is in excellent condition
(a) Using the results in column (1), what is the expected change in price from building
a 500-square-foot addition to a house? Construct a 95% CI for the percentage
change in price.
$0.0004 \times 500 \times 100\% = 20\%$. The house price is expected to increase by 20%
with an additional 500 square feet. The 95% CI for the change is
$(0.0004 \pm 1.96 \times 0.000038) \times 500 \times 100\% = [16.28\%, 23.72\%]$.
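The arithmetic above can be sketched in a few lines of Python. The function name `pct_change_ci` and its parameters are illustrative assumptions, not part of the problem set; the coefficient and standard error are the column (1) values quoted above.

```python
# Percentage change in price implied by a log-linear regression
# ln(Price) = ... + b * Size, with a normal-approximation 95% CI.
# pct_change_ci is a hypothetical helper name used only for illustration.

def pct_change_ci(coef, se, delta, z=1.96):
    """Return (point, lower, upper) in percent for a change of `delta` units."""
    point = coef * delta * 100
    half_width = z * se * delta * 100
    return point, point - half_width, point + half_width


# Column (1): coefficient on Size and its standard error, 500 extra square feet.
point, lower, upper = pct_change_ci(coef=0.0004, se=0.000038, delta=500)
print(f"{point:.2f}% [{lower:.2f}%, {upper:.2f}%]")
```

The same helper covers the pool effect in part (c) by passing `delta=1`.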
(b) Comparing columns (1) and (2), is it better to use Size or ln(Size) to explain house
prices?
Column (2) has a higher $R^2$ than column (1), so it explains house prices
better. Both columns use the same dependent variable, so their $R^2$ values are
directly comparable. Thus, it is better to use ln(Size) rather than Size in levels.
(c) Using column (2), what is the estimated effect of a pool on price? (Make sure you
get the units right.) Construct a 95% CI for this effect.
$0.071 \times 100\% = 7.1\%$. The house price is expected to be 7.1% higher if the
house has a pool. The 95% CI for the effect is
$(0.071 \pm 1.96 \times 0.034) \times 100\% = [0.44\%, 13.76\%]$.
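The 7.1% figure uses the usual approximation of 100 times the coefficient. For a dummy variable in a log-price regression, the exact percentage difference is $100 \times (e^{b} - 1)$. The sketch below compares the two for the pool coefficient; the function name is an illustrative assumption.

```python
import math

# Exact percentage effect of a dummy variable in a log-linear model.
# exact_pct_effect is a hypothetical helper name used only for illustration.

def exact_pct_effect(coef):
    """Exact percent change in the level of y when the dummy switches from 0 to 1."""
    return 100 * (math.exp(coef) - 1)


pool_coef = 0.071                       # column (2) coefficient on Pool
approx = 100 * pool_coef                # the approximation used in the answer
exact = exact_pct_effect(pool_coef)     # slightly larger than the approximation
print(f"approx = {approx:.2f}%, exact = {exact:.2f}%")
```

For a coefficient this small the two numbers are close, which is why the approximation is acceptable here.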
(d) The regression in column (3) adds the number of bedrooms to the regression. How
large is the estimated effect of an additional bedroom? Is the effect statistically
significant? Why do you think the estimated effect is so small? (Hint: which other
variables are being held constant?)
2. Suppose that a researcher collects data on workers who live in the US and obtains the
regression results in the table shown below.
(a) Consider a man with 16 years of education and 2 years of experience who is from
a western state. Use the results from column (4) of the table above to estimate the
expected change in the logarithm of average hourly earnings associated with an
additional year of experience.
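\[ \hat{E}(\ln(AHE) \mid \ldots, Exper = 2, \ldots) = \ldots + 0.0143 \times 2 - 0.000211 \times 2^2 + \ldots \]
\[ \hat{E}(\ln(AHE) \mid \ldots, Exper = 3, \ldots) = \ldots + 0.0143 \times 3 - 0.000211 \times 3^2 + \ldots \]
The expected difference is
\[ \hat{E}(\ln(AHE) \mid \ldots, Exper = 3, \ldots) - \hat{E}(\ln(AHE) \mid \ldots, Exper = 2, \ldots) \approx 0.013 \quad (1.3\%). \]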
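\[ \hat{E}(\ln(AHE) \mid \ldots, Exper = 10, \ldots) = \ldots + 0.0143 \times 10 - 0.000211 \times 10^2 + \ldots \]
\[ \hat{E}(\ln(AHE) \mid \ldots, Exper = 11, \ldots) = \ldots + 0.0143 \times 11 - 0.000211 \times 11^2 + \ldots \]
The expected difference is
\[ \hat{E}(\ln(AHE) \mid \ldots, Exper = 11, \ldots) - \hat{E}(\ln(AHE) \mid \ldots, Exper = 10, \ldots) \approx 0.0099 \quad (0.99\%). \]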
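The two differences can be checked with a short Python sketch. The coefficients 0.0143 and -0.000211 are the ones used in the calculations above; the function and constant names are illustrative assumptions.

```python
# Change in predicted ln(AHE) from one more year of experience in a model
# with Exper and Exper^2. Names below are hypothetical, chosen for illustration.

B_EXPER = 0.0143        # coefficient on Exper
B_EXPER_SQ = -0.000211  # coefficient on Exper^2


def experience_effect(exper_from, exper_to=None):
    """Predicted change in ln(AHE) when experience moves from exper_from to exper_to."""
    if exper_to is None:
        exper_to = exper_from + 1
    linear_part = B_EXPER * (exper_to - exper_from)
    quadratic_part = B_EXPER_SQ * (exper_to ** 2 - exper_from ** 2)
    return linear_part + quadratic_part


effect_at_2 = experience_effect(2)    # from 2 to 3 years
effect_at_10 = experience_effect(10)  # from 10 to 11 years
print(f"{effect_at_2:.4f} {effect_at_10:.4f}")
```

Only the terms involving experience enter the difference, so the values of the other regressors cancel out.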
(c) Explain why the answers to (a) and (b) are different.
The coefficients on Exper and $Exper^2$ are both significant, which implies that the
marginal effect of experience is nonlinear and depends on the level of experience.
In particular, the positive coefficient on Exper and the negative coefficient on
$Exper^2$ imply a diminishing marginal effect of experience. Thus,
the marginal effect of experience decreases as years of experience increase.
(d) Is the difference in the answers to (a) and (b) statistically significant at the 5%
level? Explain.
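Let $\Delta_1$ denote the difference in the expected ln(AHE) with Exper = 2, and $\Delta_2$
the corresponding difference with Exper = 10. So we need to test
\[ H_0: \Delta_2 - \Delta_1 = \beta_{Exper^2} \times 16 = 0. \]
This is equivalent to testing whether the coefficient on $Exper^2$ is zero, which can be
done with its t statistic.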
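The factor of 16 follows from differencing the quadratic terms. A short derivation, writing $\beta_{Exper}$ and $\beta_{Exper^2}$ for the two experience coefficients (symbols chosen here for illustration) and taking $\Delta_1$ as the change from 2 to 3 years and $\Delta_2$ as the change from 10 to 11 years:

```latex
\begin{aligned}
\Delta_1 &= \beta_{Exper}(3 - 2) + \beta_{Exper^2}(3^2 - 2^2)
          = \beta_{Exper} + 5\,\beta_{Exper^2}, \\
\Delta_2 &= \beta_{Exper}(11 - 10) + \beta_{Exper^2}(11^2 - 10^2)
          = \beta_{Exper} + 21\,\beta_{Exper^2}, \\
\Delta_2 - \Delta_1 &= (21 - 5)\,\beta_{Exper^2} = 16\,\beta_{Exper^2}.
\end{aligned}
```

Because 16 is a nonzero constant, the null holds exactly when $\beta_{Exper^2} = 0$.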
(e) Would your answers to (a) through (d) change if the person were a woman? If the
person were from the South? Explain.
No. Experience is not interacted with any other regressor, so the marginal
effect of experience does not depend on gender or region.
(f) How would you change the regression if you suspected that the effect of experience
on earnings was different for men than for women?
3. Suppose we consider the following equation
\[ price = \beta_0 + \beta_1\,assess + U. \]
The variable price is house price and assess is the assessed housing price (before the
house was sold). Suppose we would like to test whether the assessed housing price
is a rational valuation. If this is the case, then one unit change in assess should be
associated with one unit change in price.
\[ \widehat{price} = \underset{(16.27)}{-14.47} + \underset{(0.049)}{0.976}\,assess \]
We use the SSR form of the F statistic. We are testing q = 2 restrictions and
the df in the unrestricted model is 86. We are given $SSR_r = 209{,}448.99$ and
$SSR_{ur} = 165{,}622.51$. Therefore,
\[ F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/df} = \frac{(209{,}448.99 - 165{,}622.51)/2}{165{,}622.51/86} \approx 11.38, \]
which is a strong rejection of $H_0$: from the F-distribution table, the 1% critical
value with q = 2 and df = 90 is 4.85.
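As a quick check on the SSR form of the F statistic, the following Python sketch plugs in the sums of squared residuals, q, and df quoted above. The function name `f_stat_ssr` is an illustrative assumption, and 4.85 is the tabulated 1% critical value cited in the answer rather than something computed here.

```python
# SSR form of the F statistic: F = [(SSR_r - SSR_ur) / q] / [SSR_ur / df_ur].
# f_stat_ssr is a hypothetical helper name used only for illustration.

def f_stat_ssr(ssr_r, ssr_ur, q, df_ur):
    """F statistic for q restrictions given restricted and unrestricted SSR."""
    numerator = (ssr_r - ssr_ur) / q
    denominator = ssr_ur / df_ur
    return numerator / denominator


f_value = f_stat_ssr(ssr_r=209448.99, ssr_ur=165622.51, q=2, df_ur=86)
critical_1pct = 4.85                  # table value quoted in the answer
rejects_at_1pct = f_value > critical_1pct
print(f"F = {f_value:.2f}, reject at 1%: {rejects_at_1pct}")
```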
where lotsize is the size of the lot, in feet, sqrft is square footage, and bdrms is the
number of bedrooms. The $R^2$ from estimating this model using the same 88 houses
is 0.829.
(d) If the variance of price changes with assess, lotsize, sqrft, or bdrms, what can you
say about the F test from part (c)?
4. Suppose that the following equation was estimated from data:
\[ \widehat{rdintens} = \underset{(0.429)}{2.613} + \underset{(0.00014)}{0.0003}\,sales - \underset{(0.0000000037)}{0.000000007}\,sales^2 \]
N = 32, $R^2 = 0.1484$.
The variable rdintens is R&D spending as a percentage of sales, and sales is firm sales
in millions of dollars.
(a) Would you keep the quadratic term in the model? Explain.
Probably. Its t statistic is about $-1.89$, which is significant against the one-sided
alternative $H_1: \beta_2 < 0$ at the 5% level (critical value $-t_{0.05} \approx -1.70$ with
df = 29), where $\beta_2$ is the coefficient on $sales^2$. In fact, the p-value is about
0.0344. Against the two-sided alternative $H_1: \beta_2 \neq 0$, the term is insignificant
at the 5% level, but significant at the 10% level.
Because sales gets divided by 1,000 to obtain salesbil, the corresponding coefficient
gets multiplied by 1,000: $(1{,}000)(0.00030) = 0.30$. The standard
error gets multiplied by the same factor. As stated in the hint,
$salesbil^2 = sales^2/1{,}000{,}000$, and so the coefficient on the quadratic gets multiplied by
one million: $(1{,}000{,}000)(0.0000000070) = 0.0070$; its standard error also
gets multiplied by one million. Nothing happens to the intercept (because
rdintens has not been rescaled) or to the $R^2$:
\[ \widehat{rdintens} = \underset{(0.429)}{2.613} + \underset{(0.14)}{0.3}\,salesbil - \underset{(0.0037)}{0.007}\,salesbil^2 \]
N = 32, $R^2 = 0.1484$.
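A small Python sketch of the rescaling argument: multiplying a coefficient and its standard error by the same factor leaves the t statistic unchanged. The numbers are those reported in the two equations; the function and variable names are illustrative assumptions.

```python
# Rescaling a regressor by 1/c multiplies its coefficient and standard error by c,
# so the t statistic is unaffected. Names below are hypothetical.

def rescale(coef, se, factor):
    """Return the coefficient and standard error after multiplying both by factor."""
    return coef * factor, se * factor


b_sales, se_sales = 0.0003, 0.00014    # sales, in millions of dollars
b_sq, se_sq = -0.000000007, 0.0000000037   # sales^2

# salesbil = sales / 1,000, so salesbil^2 = sales^2 / 1,000,000.
b_bil, se_bil = rescale(b_sales, se_sales, 1_000)
b_bil_sq, se_bil_sq = rescale(b_sq, se_sq, 1_000_000)

t_before = b_sq / se_sq
t_after = b_bil_sq / se_bil_sq
print(f"{b_bil:.2f} {b_bil_sq:.4f} t: {t_before:.2f} -> {t_after:.2f}")
```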
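5. Suppose that the following equation was estimated from data:
\[ \widehat{\log(wage)} = \underset{(0.13)}{5.65} + \underset{(0.01)}{0.047}\,educ + \underset{(0.00021)}{0.00078}\,educ \cdot pareduc + \underset{(0.004)}{0.019}\,exper + \underset{(0.003)}{0.010}\,tenure \]
N = 722, $R^2 = 0.169$.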
The variable pareduc is the total amount of both parents’ education, and tenure is
years with the current employer.
(a) Interpret the coefficient on the interaction term. It might help to choose two
specific values for pareduc, for example pareduc = 32 if both parents have a
college education, or pareduc = 24 if both parents have a high school education,
and to compare the estimated return to education.
We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient
on $educ \cdot pareduc$. The difference in the estimated return to education is
$0.00078 \times (32 - 24) = 0.0062$, or about 0.62 percentage points.
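The comparison can be written as a function of parents' education. This is a minimal sketch using the coefficients 0.047 (educ) and 0.00078 (interaction) from the estimated equation; the function name is an illustrative assumption.

```python
# Estimated return to one more year of education when educ is interacted
# with parents' education: d log(wage) / d educ = b_educ + b_inter * pareduc.
# return_to_educ is a hypothetical helper name used only for illustration.

def return_to_educ(pareduc, b_educ=0.047, b_inter=0.00078):
    """Estimated proportional change in wage from one more year of education."""
    return b_educ + b_inter * pareduc


college = return_to_educ(32)      # both parents college educated
high_school = return_to_educ(24)  # both parents high school educated
gap = college - high_school
print(f"{100 * college:.2f}% vs {100 * high_school:.2f}%, gap = {100 * gap:.2f} points")
```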
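\[ \widehat{\log(wage)} = \underset{(0.38)}{4.49} + \underset{(0.027)}{0.097}\,educ + \underset{(0.017)}{0.034}\,pareduc - \underset{(0.0012)}{0.0016}\,educ \cdot pareduc + \underset{(0.004)}{0.020}\,exper + \underset{(0.003)}{0.010}\,tenure \]
N = 722, $R^2 = 0.174$.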
Does the estimated return to education now depend positively on parent educa-
tion? Test the null hypothesis that the return to education does not depend on
parent education.
When we add pareduc by itself, the coefficient on the interaction term is negative.
The t statistic on $educ \cdot pareduc$ is about $-1.33$, which is not significant
at the 10% level against a two-sided alternative. Note that the coefficient on
pareduc is significant at the 5% level against a two-sided alternative. This
provides a good example of how omitting a level effect (pareduc in this case)
can lead to biased estimation of the interaction effect.
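For reference, the return to education implied by the second equation and the t ratio used above can be written out as follows, using only the reported coefficient and standard error on the interaction term:

```latex
\frac{\partial\,\widehat{\log(wage)}}{\partial\,educ} = 0.097 - 0.0016\,pareduc,
\qquad
t_{educ \cdot pareduc} = \frac{-0.0016}{0.0012} \approx -1.33
```

The null hypothesis that the return to education does not depend on parents' education corresponds to a zero coefficient on the interaction term.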
6. Suppose we want to estimate the effects of alcohol consumption (alcohol) on college
grade point average (colGPA). In addition to collecting this information, we also
obtain attendance information (say, percentage of lectures attended, called attend). A
standardized test score (say, SAT) and high school GPA (hsGPA) are also available.
(a) Should we include attend along with alcohol as regressors in a multiple regression
model? (Think about how you would interpret $\beta_{alcohol}$.)
The answer is not entirely obvious, but one must properly interpret the coefficient
on alcohol in either case. If we include attend, then we are measuring
the effect of alcohol consumption on college GPA, holding attendance fixed.
Because attendance is likely to be an important mechanism through which
drinking affects performance, we probably do not want to hold it fixed in the
analysis. If we do include attend, then we interpret the estimate of $\beta_{alcohol}$ as
measuring those effects on colGPA that are not due to attending class. (For
example, we could be measuring the effects that drinking alcohol has on study
time.) To get a total effect of alcohol consumption, we would leave attend out.
7. Suppose that the following equation was estimated from data:
\[ \widehat{sat} = \underset{(6.29)}{1{,}028.1} + \underset{(3.83)}{19.3}\,hsize - \underset{(0.53)}{2.19}\,hsize^2 - \underset{(4.29)}{45.09}\,female \]
N = 4,137, $R^2 = 0.0858$.
The variable sat is the college entrance exam score, hsize is the size of the student’s high
school graduating class, in hundreds, female is a gender dummy variable, and black is
a race dummy variable equal to one for blacks and zero otherwise.
(a) Is there strong evidence that $hsize^2$ should be included in the model?
The t statistic on $hsize^2$ is over four in absolute value, so there is very strong
evidence that it belongs in the equation.
(b) Holding hsize fixed, what is the estimated difference in sat score between nonblack
females and nonblack males? How statistically significant is this estimated
difference?
This is given by the coefficient on female (since black = 0): nonblack females
have SAT scores about 45 points lower than nonblack males. The t statistic
is about $-10.51$, so the difference is very statistically significant. (The very
large sample size certainly contributes to the statistical significance.)
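The t statistics quoted in parts (a) and (b) are simply the coefficient divided by its standard error. A minimal Python sketch using the reported estimates follows; the function name is an illustrative assumption.

```python
# t statistic for H0: coefficient = null_value, from a reported estimate and
# standard error. t_stat is a hypothetical helper name used only for illustration.

def t_stat(coef, se, null_value=0.0):
    """Return (coef - null_value) / se."""
    return (coef - null_value) / se


t_hsize_sq = t_stat(-2.19, 0.53)   # coefficient on hsize^2
t_female = t_stat(-45.09, 4.29)    # coefficient on female
print(f"hsize^2: {t_hsize_sq:.2f}, female: {t_female:.2f}")
```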
(c) What is the estimated difference in sat score between nonblack males and black
males? Test the null hypothesis that there is no difference between their scores,
against the alternative that there is a difference.
Because female = 0, the coefficient on black implies that a black male has an
estimated SAT score almost 170 points less than a comparable nonblack male.
The t statistic is over 13 in absolute value, so we easily reject the hypothesis
that there is no ceteris paribus difference.
(d) What is the estimated difference in sat score between black females and nonblack
females? What would you need to do to test whether the difference is statistically
significant?
8. Suppose you collect data from a survey on wages, education, experience, and gender.
In addition, you ask for information about marijuana usage. The original question is :
“On how many separate occasions last month did you smoke marijuana?”
(a) Write an equation that would allow you to estimate the effects of marijuana usage
on wage, while controlling for other factors. You should be able to make statements
such as, “Smoking marijuana five more times per month is estimated to change
wage by x%.”
\[ \log(wage) = \beta_0 + \beta_1\,usage + \beta_2\,educ + \beta_3\,exper + \beta_4\,exper^2 + \beta_5\,female + U \]
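With log(wage) as the dependent variable and usage in levels, the statement requested in the question follows directly, writing $\beta_1$ for the coefficient on usage (the symbol is an assumption where the extracted text is unclear):

```latex
\Delta \log(wage) = \beta_1\,\Delta usage
\quad\Longrightarrow\quad
\%\Delta wage \approx 100 \cdot \beta_1 \cdot 5 = 500\,\beta_1
\quad \text{for } \Delta usage = 5 .
```

So x in the quoted sentence is approximately $500\,\hat{\beta}_1$, holding education, experience, and gender fixed.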
(b) Write a model that would allow you to test whether drug usage has different effects
on wages for men and women. How would you test that there are no differences in
the effects of drug usage for men and women?
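\[ \log(wage) = \beta_0 + \beta_1\,usage + \beta_2\,educ + \beta_3\,exper + \beta_4\,exper^2 + \beta_5\,female + \beta_6\,female \cdot usage + U \]
The null hypothesis that the effect of marijuana usage does not differ by gender
is $H_0: \beta_6 = 0$. The test can be done using the t statistic on $\hat{\beta}_6$.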
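To see why a single coefficient carries the gender difference, assume $\beta_6$ multiplies the interaction $female \cdot usage$ and $\beta_1$ multiplies usage. The effect of usage on log(wage) is then:

```latex
\frac{\partial \log(wage)}{\partial\,usage} = \beta_1 + \beta_6\,female =
\begin{cases}
\beta_1 & \text{for men } (female = 0), \\
\beta_1 + \beta_6 & \text{for women } (female = 1).
\end{cases}
```

The two effects coincide exactly when $\beta_6 = 0$, which is the null hypothesis being tested.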
(c) Suppose you think it is better to measure marijuana usage by putting people into
one of four categories: nonuser, light user (1 to 5 times per month), moderate user
(6 to 10 times per month), and heavy user (more than 10 times per month). Now,
write a model that allows you to estimate the effects of marijuana usage on wage.
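We take the base group to be nonusers. Then we need dummy variables for
the other three groups:
lightuser = 1 if light user, = 0 otherwise,
moduser = 1 if moderate user, = 0 otherwise,
heavyuser = 1 if heavy user, = 0 otherwise.
\[ \log(wage) = \beta_0 + \delta_1\,lightuser + \delta_2\,moduser + \delta_3\,heavyuser + \beta_2\,educ + \beta_3\,exper + \beta_4\,exper^2 + \beta_5\,female + U. \]
Then $\delta_1$ captures the effect of light marijuana usage on wage relative
to nonusers, and the coefficients on the other two dummy variables are interpreted analogously.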
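Constructing the three dummies from the raw survey count might look like the sketch below. The function name, the dummy names, and the treatment of exactly 5 occasions as light use (so that moderate use starts at 6) are assumptions made for illustration.

```python
# Map the monthly usage count to three dummy variables, with nonusers as the
# base group. Names and the cutoff handling are hypothetical: 1-5 is light,
# 6-10 is moderate, more than 10 is heavy.

def usage_dummies(times_per_month):
    """Return (lightuser, moduser, heavyuser); nonusers get (0, 0, 0)."""
    if times_per_month < 0:
        raise ValueError("usage count cannot be negative")
    lightuser = int(1 <= times_per_month <= 5)
    moduser = int(6 <= times_per_month <= 10)
    heavyuser = int(times_per_month > 10)
    return lightuser, moduser, heavyuser


# Each respondent falls in exactly one category, or in the base group.
for count in (0, 3, 8, 12):
    print(count, usage_dummies(count))
```

Leaving out a dummy for nonusers avoids perfect collinearity with the intercept.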
(d) Using the model in (c), explain in detail how to test the null hypothesis that
marijuana usage has no effect on wage. Be very specific and include a careful
listing of degrees of freedom.
(e) What are some potential problems with drawing causal inference using the survey
data that you collected?
The error term could contain factors, such as family background (including
parental history of drug abuse), that could directly affect wages and also be
correlated with marijuana usage. We are interested in the effects of a person’s
drug usage on his or her wage, so we would like to hold other confounding
factors fixed. We could try to collect data on relevant background information.