PS4 Sol

Uploaded by

ongaribelia

Problem Sets 3

Econometrics

1. Suppose that a researcher collects data on houses that have sold in a particular
neighborhood over the past year and obtains the regression results in the table shown below.

Regression Results: Dependent Variable ln(Price)

Regressor       (1)          (2)        (3)        (4)        (5)
Size            0.0004
                (0.000038)
ln(Size)                     0.69       0.68       0.57       0.69
                             (0.054)    (0.087)    (2.03)     (0.055)
[ln(Size)]^2                                       0.0078
                                                   (0.14)
Bedrooms                                0.0036
                                        (0.037)
Pool            0.082        0.071      0.071      0.071      0.071
                (0.032)      (0.034)    (0.026)    (0.036)    (0.035)
View            0.037        0.027      0.026      0.027      0.027
                (0.029)      (0.028)    (0.026)    (0.029)    (0.030)
Pool x View                                                   0.0022
                                                              (0.10)
Condition       0.13         0.12       0.12       0.12       0.12
                (0.045)      (0.035)    (0.035)    (0.036)    (0.035)
Intercept       10.97        6.60       6.63       7.02       6.60
                (0.069)      (0.39)     (0.53)     (7.50)     (0.4)
Adjusted R^2    0.72         0.74       0.73       0.73       0.73

Standard errors are reported in parentheses.

Price: sale price ($)
Size: house size in square feet
Bedrooms: number of bedrooms
Pool: =1 if the house has a swimming pool
View: =1 if the house has a nice view
Condition: =1 if the real estate agent reports the house is in excellent condition

(a) Using the results in column (1), what is the expected change in price from building
a 500-square-foot addition to a house? Construct a 95% CI for the percentage
change in price.

$0.0004 \times 500 \times 100\% = 20\%$. The house price is expected to increase by 20%
with an additional 500 square feet. The 95% CI for the change is
$(0.0004 \pm 1.96 \times 0.000038) \times 500 \times 100\% = [16.28\%, 23.72\%]$.
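
The arithmetic in (a) can be reproduced with a short sketch. The helper name `pct_change_ci` and its arguments are illustrative, not part of the problem set; it relies on the usual approximation that, in a regression of ln(Y) on X, a change of `delta_x` in X changes Y by about 100 × coefficient × `delta_x` percent.

```python
def pct_change_ci(coef, se, delta_x, z=1.96):
    """Approximate % change in Y, with a normal-based CI, in a log-linear model.

    coef, se : estimated coefficient on X and its standard error
    delta_x  : change in X being considered
    z        : normal critical value (1.96 for a 95% interval)
    """
    point = 100 * coef * delta_x
    half_width = 100 * z * se * delta_x
    return point, (point - half_width, point + half_width)


# Column (1): coefficient on Size and its standard error, 500 extra square feet.
point, (lower, upper) = pct_change_ci(coef=0.0004, se=0.000038, delta_x=500)
print(f"point estimate: {point:.2f}%, 95% CI: [{lower:.2f}%, {upper:.2f}%]")
```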

(b) Comparing columns (1) and (2), is it better to use Size or ln(Size) to explain house
prices?

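Column (2) has a higher adjusted $R^2$ than column (1) (0.74 versus 0.72), which implies
that column (2) explains the house price better. Both regressions have the same dependent
variable, ln(Price), so the two values are comparable. Thus, it is better to use the log of
size rather than size in levels.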

(c) Using column (2), what is the estimated effect of a pool on price? (Make sure you
get the units right.) Construct a 95% CI for this effect.

$0.071 \times 100\% = 7.1\%$. The house price is expected to increase by 7.1% if the
house has a pool. The 95% CI for the change is
$(0.071 \pm 1.96 \times 0.034) \times 100\% = [0.44\%, 13.76\%]$.
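
As a check on the units, the sketch below compares the 100 × coefficient approximation used in (c) with the exact percentage change implied by a log specification, 100 × (exp(b) − 1). The exact formula is a standard property of models with a logged dependent variable and is not part of the original answer; the variable names are illustrative.

```python
import math

# Column (2): coefficient on Pool and its standard error.
b_pool, se_pool = 0.071, 0.034

# Approximation used in the answer: 100 * b percent.
approx_pct = 100 * b_pool

# Exact percentage difference implied by ln(Price) rising by b_pool.
exact_pct = 100 * (math.exp(b_pool) - 1)

# 95% CI on the approximate percentage scale.
ci_lower = 100 * (b_pool - 1.96 * se_pool)
ci_upper = 100 * (b_pool + 1.96 * se_pool)

print(f"approximate: {approx_pct:.2f}%, exact: {exact_pct:.2f}%")
print(f"95% CI (approximate scale): [{ci_lower:.2f}%, {ci_upper:.2f}%]")
```

For a coefficient this small the two numbers are close, which is why the approximation is acceptable here.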

(d) The regression in column (3) adds the number of bedrooms to the regression. How
large is the estimated effect of an additional bedroom? Is the effect statistically
significant? Why do you think the estimated effect is so small? (Hint: which other
variables are being held constant?)

$0.0036 \times 100\% = 0.36\%$. The house price is expected to increase by 0.36%
with an additional bedroom. But the effect is statistically insignificant: the t statistic
is $0.0036/0.037 \approx 0.10$. This is the marginal effect of a bedroom keeping the other
variables constant. If the size of the house does not increase along with the additional
bedroom, the effect of an additional bedroom would be small.

(e) Is the quadratic term $[\ln(Size)]^2$ important?

The quadratic term $[\ln(Size)]^2$ in column (4) is not important because it is not
statistically significant at the 5% significance level: its t statistic is
$0.0078/0.14 \approx 0.06$.
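
The significance claims in (d) and (e) rest on two t statistics that can be checked directly. This is a minimal sketch; the helper `t_stat` is a hypothetical name, and 1.96 is the large-sample two-sided 5% critical value.

```python
def t_stat(coef, se, null_value=0.0):
    """t statistic for H0: coefficient == null_value."""
    return (coef - null_value) / se


CRITICAL_5PCT = 1.96  # two-sided, large-sample normal approximation

# Column (3): Bedrooms; column (4): [ln(Size)]^2.
t_bedrooms = t_stat(0.0036, 0.037)
t_lnsize_sq = t_stat(0.0078, 0.14)

for name, t in [("Bedrooms", t_bedrooms), ("[ln(Size)]^2", t_lnsize_sq)]:
    verdict = "significant" if abs(t) > CRITICAL_5PCT else "not significant"
    print(f"{name}: t = {t:.3f} ({verdict} at 5%)")
```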

2. Suppose that a researcher collects data on workers who live in the US and obtains the
regression results in the table shown below.

Regression Results: Dependent Variable ln(hourly earnings) = ln(AHE)

Regressor       (1)         (2)         (3)         (4)
Educ            0.1035      0.1050      0.1001      0.1032
                (0.0009)    (0.0009)    (0.0011)    (0.0012)
Female                      -0.263      -0.432      -0.451
                            (0.004)     (0.024)     (0.0024)
Female x Educ                           0.0121      0.0134
                                        (0.0017)    (0.0017)
Exper                                               0.0143
                                                    (0.0012)
Exper^2                                             -0.000211
                                                    (0.000023)
Midwest                                             -0.095
                                                    (0.006)
South                                               -0.092
                                                    (0.006)
West                                                -0.023
                                                    (0.007)
Intercept       1.533       1.629       1.697       1.503
                (0.012)     (0.012)     (0.016)     (0.023)
Adjusted R^2    0.208       0.258       0.258       0.267

Female: =1 if female
Midwest: =1 if the worker lives in the Midwest
South: =1 if the worker lives in the South
West: =1 if the worker lives in the West
The omitted region is the Northeast. N = 52,790.

(a) Consider a man with 16 years of education and 2 years of experience who is from
a western state. Use the results from column (4) of the table above to estimate the
expected change in the logarithm of average hourly earnings associated with an
additional year of experience.

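The expected ln(AHE) with Exper = 2 and with Exper = 3 are

$$\hat{E}(\ln(AHE) \mid \ldots, Exper = 2, \ldots) = \ldots + 0.0143 \times 2 - 0.000211 \times 2^2 + \ldots$$
$$\hat{E}(\ln(AHE) \mid \ldots, Exper = 3, \ldots) = \ldots + 0.0143 \times 3 - 0.000211 \times 3^2 + \ldots$$

The expected difference is
$\hat{E}(\ln(AHE) \mid \ldots, Exper = 3, \ldots) - \hat{E}(\ln(AHE) \mid \ldots, Exper = 2, \ldots) = 0.0143 - 0.000211 \times (3^2 - 2^2) \approx 0.013$ (1.3%).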

(b) Repeat (a) assuming 10 years of experience.

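The expected ln(AHE) with Exper = 10 and with Exper = 11 are

$$\hat{E}(\ln(AHE) \mid \ldots, Exper = 10, \ldots) = \ldots + 0.0143 \times 10 - 0.000211 \times 10^2 + \ldots$$
$$\hat{E}(\ln(AHE) \mid \ldots, Exper = 11, \ldots) = \ldots + 0.0143 \times 11 - 0.000211 \times 11^2 + \ldots$$

The expected difference is
$\hat{E}(\ln(AHE) \mid \ldots, Exper = 11, \ldots) - \hat{E}(\ln(AHE) \mid \ldots, Exper = 10, \ldots) = 0.0143 - 0.000211 \times (11^2 - 10^2) \approx 0.0099$ (0.99%).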
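
The calculations in (a) and (b) differ only in the starting level of experience, so they can be sketched with one function. The function names are illustrative; the coefficients are the column (4) estimates on Exper and Exper², and all other regressors cancel out of the difference.

```python
B_EXPER = 0.0143        # column (4) coefficient on Exper
B_EXPER_SQ = -0.000211  # column (4) coefficient on Exper^2


def exper_contribution(exper):
    """Part of predicted ln(AHE) that depends on experience."""
    return B_EXPER * exper + B_EXPER_SQ * exper ** 2


def effect_of_one_more_year(exper):
    """Change in predicted ln(AHE) when experience rises from exper to exper + 1."""
    return exper_contribution(exper + 1) - exper_contribution(exper)


delta_at_2 = effect_of_one_more_year(2)
delta_at_10 = effect_of_one_more_year(10)
print(f"from 2 to 3 years:   {delta_at_2:.4f}")
print(f"from 10 to 11 years: {delta_at_10:.4f}")
```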

(c) Explain why the answers to (a) and (b) are different.

Since the coefficients on $Exper$ and $Exper^2$ are significant, the marginal effect of
experience is nonlinear and depends on the level of experience. In particular, the
negative coefficient on $Exper^2$ and the positive coefficient on $Exper$ imply a
diminishing marginal effect of experience. Thus, the marginal effect of experience
decreases as years of experience increase.

(d) Is the difference in the answers to (a) and (b) statistically significant at the 5%
level? Explain.

Let $\Delta_1$ denote the difference in the expected ln(AHE) with Exper = 2, and $\Delta_2$
denote the difference in the expected ln(AHE) with Exper = 10. Then,

$$\Delta_1 = 0.0143 - 0.000211 \times (3^2 - 2^2)$$
$$\Delta_2 = 0.0143 - 0.000211 \times (11^2 - 10^2)$$
$$\Longrightarrow \Delta_2 - \Delta_1 = -0.000211 \times (21 - 5) = -0.000211 \times 16.$$

So we need to test
$$H_0: \Delta_2 - \Delta_1 = \beta_{Exper^2} \times 16 = 0.$$

Since $\beta_{Exper^2}$ is statistically significant at the 5% significance level (that is, we
reject that it is zero; its t statistic is $-0.000211/0.000023 \approx -9.2$), so is
$\beta_{Exper^2} \times 16$. We can reject the null hypothesis and conclude that the
difference is significantly different from zero.
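
The key step in (d) is that the difference between the two answers is a fixed multiple of a single coefficient, so its t statistic equals the t statistic on Exper². A minimal sketch with illustrative variable names:

```python
b_exper_sq = -0.000211   # column (4) coefficient on Exper^2
se_exper_sq = 0.000023   # its standard error

# (11^2 - 10^2) - (3^2 - 2^2) = 21 - 5 = 16
scale = (11 ** 2 - 10 ** 2) - (3 ** 2 - 2 ** 2)

diff = scale * b_exper_sq            # estimate of Delta_2 - Delta_1
se_diff = abs(scale) * se_exper_sq   # SE of a constant times one coefficient
t_diff = diff / se_diff              # the constant cancels in the ratio
t_coef = b_exper_sq / se_exper_sq

print(f"difference = {diff:.6f}, SE = {se_diff:.6f}, t = {t_diff:.2f}")
```

Because the constant 16 cancels, no covariance between coefficients is needed for this particular test.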

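(e) Would your answers to (a) through (d) change if the person were a woman? If the
person were from the South? Explain.

No. In column (4), experience enters only through $Exper$ and $Exper^2$, with no
interaction with Female or the region dummies, so the estimated marginal effect of
experience is the same for women and for workers from the South.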

(f) How would you change the regression if you suspected that the effect of experience
on earnings was different for men than for women?

If we include an interaction term between Female and experience, then the
coefficient on the interaction term would capture the difference in the effect by
gender.

3. Suppose we consider the following equation

$$price = \beta_0 + \beta_1 assess + U.$$

The variable price is the house price and assess is the assessed housing price (before the
house was sold). Suppose we would like to test whether the assessed housing price
is a rational valuation. If this is the case, then a one-unit change in assess should be
associated with a one-unit change in price.

(a) The estimated equation is

$$\widehat{price} = \underset{(16.27)}{-14.47} + \underset{(0.049)}{0.976}\, assess$$

$N = 88$, $R^2 = 0.82$, $SSR = 165{,}622.51$.

Test the hypothesis $H_0: \beta_1 = 1$ against the two-sided alternative. What do you
conclude?

The t statistic for $H_0: \beta_1 = 1$ is $(0.976 - 1)/0.049 \approx -0.49$, which is not
significant at any conventional significance level. We fail to reject $H_0$.

(b) To test the joint hypothesis $H_0: \beta_0 = 0, \beta_1 = 1$, we need the SSR in
the restricted model. This amounts to computing $\sum_{i=1}^{N} (price_i - assess_i)^2$, where
$N = 88$, since the residuals in the restricted model are just $price_i - assess_i$. (No
estimation is needed for the restricted model because both parameters are specified
under $H_0$.) This turns out to yield $SSR = 209{,}448.99$. Carry out the F-test for
the joint hypothesis.

We use the SSR form of the F statistic. We are testing $q = 2$ restrictions and
the df in the unrestricted model is $88 - 2 = 86$. We are given $SSR_r = 209{,}448.99$ and
$SSR_{ur} = 165{,}622.51$. Therefore,

$$F = \frac{(209{,}448.99 - 165{,}622.51)/2}{165{,}622.51/86} \approx 11.38,$$

which is a strong rejection of $H_0$: from the F-distribution table, the 1% critical
value with $q = 2$ and 90 denominator df is 4.85.
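
The SSR form of the F statistic used in (b) can be written as a small function. This is a sketch with a hypothetical function name; the critical value 4.85 is the tabulated figure quoted in the answer, not something the code computes.

```python
def f_stat_ssr(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """F statistic from restricted and unrestricted sums of squared residuals."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator


n_obs = 88
n_params_unrestricted = 2  # intercept and slope on assess

f_joint = f_stat_ssr(
    ssr_restricted=209_448.99,
    ssr_unrestricted=165_622.51,
    q=2,
    df_unrestricted=n_obs - n_params_unrestricted,
)
CRITICAL_1PCT = 4.85  # from the F table, as quoted in the answer
print(f"F = {f_joint:.2f}, reject H0 at 1%: {f_joint > CRITICAL_1PCT}")
```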

(c) Now, test $H_0: \beta_2 = 0, \beta_3 = 0$, and $\beta_4 = 0$ in the model

$$price = \beta_0 + \beta_1 assess + \beta_2 lotsize + \beta_3 sqrft + \beta_4 bdrms + U,$$

where lotsize is the size of the lot, in feet, sqrft is the square footage, and bdrms is the
number of bedrooms. The $R^2$ from estimating this model using the same 88 houses
is 0.829.

We use the $R^2$ form of the F statistic. We are testing $q = 3$ restrictions
and there are $88 - 5 = 83$ df in the unrestricted model. The F statistic is

$$F = \frac{(0.829 - 0.820)/3}{(1 - 0.829)/83} \approx 1.46.$$

The 10% critical value (again using 90 denominator df in the F-distribution table) is
2.15, so we fail to reject $H_0$ at even the 10% level. In fact, the p-value is about 0.23.
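
The $R^2$ form used in (c) is the SSR form with both sums of squares divided by the total sum of squares (SST), which is the same in both models because the dependent variable is the same. The sketch below computes the statistic both ways to show that they agree; the function names are illustrative and `sst` is an arbitrary placeholder value, since it cancels.

```python
def f_stat_r2(r2_unrestricted, r2_restricted, q, df_unrestricted):
    """F statistic from the R-squared of the unrestricted and restricted models."""
    return ((r2_unrestricted - r2_restricted) / q) / (
        (1 - r2_unrestricted) / df_unrestricted
    )


def f_stat_ssr(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """Equivalent SSR form of the same statistic."""
    return ((ssr_restricted - ssr_unrestricted) / q) / (
        ssr_unrestricted / df_unrestricted
    )


q, df_ur = 3, 88 - 5
f_r2 = f_stat_r2(0.829, 0.820, q, df_ur)

sst = 1.0  # placeholder: any positive total sum of squares gives the same F
f_ssr = f_stat_ssr((1 - 0.820) * sst, (1 - 0.829) * sst, q, df_ur)

print(f"R^2 form: {f_r2:.2f}, SSR form: {f_ssr:.2f}")
```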

(d) If the variance of price changes with assess, lotsize, sqrft, or bdrms, what can you
say about the F test from part (c)?

If heteroskedasticity were present, Assumption MLR.6 would be violated, and
the F statistic would not have an F distribution under the null hypothesis.
Therefore, comparing the F statistic against the usual critical values, or
obtaining the p-value from the F distribution, would not be especially meaningful.

4. Suppose that, using data, the following equation was estimated

$$\widehat{rdintens} = \underset{(0.429)}{2.613} + \underset{(0.00014)}{0.0003}\, sales - \underset{(0.0000000037)}{0.000000007}\, sales^2$$

$N = 32$, $R^2 = 0.1484$.

The variable rdintens is R&D spending as a percentage of sales, and sales is firm sales
in millions of dollars.

(a) Would you keep the quadratic term in the model? Explain.

Probably. Let $\beta_2$ denote the coefficient on $sales^2$. Its t statistic is about $-1.89$,
which is significant against the one-sided alternative $H_1: \beta_2 < 0$ at the 5% level
(the critical value is about $-1.70$ with df = 29). In fact, the p-value is about 0.0344.
If we consider the null hypothesis $H_0: \beta_2 = 0$ against the two-sided alternative,
then it is insignificant at the 5% level, but significant at the 10% level.
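
The one-sided and two-sided conclusions in (a) can be laid side by side. In this sketch the critical values for 29 degrees of freedom (1.699 and 2.045) are taken from a standard t table rather than computed, and the variable names are illustrative.

```python
coef_sales_sq = -0.000000007
se_sales_sq = 0.0000000037
t_sales_sq = coef_sales_sq / se_sales_sq

# Critical values for df = 29, read from a standard t table (assumed, not computed).
T_ONE_SIDED_5 = 1.699   # also the two-sided 10% critical value
T_TWO_SIDED_5 = 2.045
T_TWO_SIDED_10 = 1.699

reject_one_sided_5 = t_sales_sq < -T_ONE_SIDED_5        # H1: beta_2 < 0
reject_two_sided_5 = abs(t_sales_sq) > T_TWO_SIDED_5    # H1: beta_2 != 0
reject_two_sided_10 = abs(t_sales_sq) > T_TWO_SIDED_10  # H1: beta_2 != 0

print(f"t = {t_sales_sq:.2f}")
print(f"one-sided 5%: {reject_one_sided_5}, two-sided 5%: {reject_two_sided_5}, "
      f"two-sided 10%: {reject_two_sided_10}")
```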

(b) Define salesbil as sales measured in billions of dollars: $salesbil = sales/1000$.
Rewrite the estimated equation with salesbil and $salesbil^2$ as the regressors. Be
sure to report the SEs and the $R^2$. [Hint: Note that $salesbil^2 = sales^2/(1000)^2$.]

Because sales gets divided by 1,000 to obtain salesbil, the corresponding
coefficient gets multiplied by 1,000: $(1{,}000)(0.00030) = 0.30$. The standard
error gets multiplied by the same factor. As stated in the hint,
$salesbil^2 = sales^2/1{,}000{,}000$, and so the coefficient on the quadratic gets multiplied by
one million: $(1{,}000{,}000)(0.0000000070) = 0.0070$; its standard error also
gets multiplied by one million. Nothing happens to the intercept (because
rdintens has not been rescaled) or to the $R^2$:

$$\widehat{rdintens} = \underset{(0.429)}{2.613} + \underset{(0.14)}{0.3}\, salesbil - \underset{(0.0037)}{0.007}\, salesbil^2$$

$N = 32$, $R^2 = 0.1484$.
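
The rescaling rule in (b) can be verified numerically: after rescaling, the fitted values and the t statistics are unchanged. This is a sketch; `rescale` is a hypothetical helper and the sales figure of 5,000 (million dollars) is an arbitrary illustration.

```python
def rescale(coef, se, factor):
    """Coefficient and SE after the regressor is divided by `factor`."""
    return coef * factor, se * factor


# Original equation, sales in millions of dollars.
b0 = 2.613
b1, se1 = 0.0003, 0.00014
b2, se2 = -0.000000007, 0.0000000037

# salesbil = sales / 1,000 and salesbil^2 = sales^2 / 1,000,000.
b1_bil, se1_bil = rescale(b1, se1, 1_000)
b2_bil, se2_bil = rescale(b2, se2, 1_000_000)

sales = 5_000.0           # illustrative value, in millions
salesbil = sales / 1_000  # the same firm, in billions

fit_millions = b0 + b1 * sales + b2 * sales ** 2
fit_billions = b0 + b1_bil * salesbil + b2_bil * salesbil ** 2

print(f"salesbil coefficients: {b1_bil:.2f} ({se1_bil:.2f}), {b2_bil:.4f} ({se2_bil:.4f})")
print(f"fitted rdintens: {fit_millions:.3f} vs {fit_billions:.3f}")
```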

5. Suppose that, using data, the following equation was estimated

$$\widehat{\log(wage)} = \underset{(0.13)}{5.65} + \underset{(0.01)}{0.047}\, educ + \underset{(0.00021)}{0.00078}\, educ \cdot pareduc + \underset{(0.004)}{0.019}\, exper + \underset{(0.003)}{0.010}\, tenure$$

$N = 722$, $R^2 = 0.169$.

The variable pareduc is the total amount of both parents' education, and tenure is
years with the current employer.

(a) Interpret the coefficient on the interaction term. It might help to choose two
specific values for pareduc, for example pareduc = 32 if both parents have a
college education, or pareduc = 24 if both parents have a high school education,
and to compare the estimated return to education.

We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient
on $educ \cdot pareduc$. The difference in the estimated return to education is
$0.00078 \times (32 - 24) = 0.0062$, or about 0.62 percentage points.
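
With the interaction term, the estimated return to a year of education is 0.047 + 0.00078 × pareduc, which the following sketch evaluates at the two suggested values. The function name is illustrative.

```python
B_EDUC = 0.047
B_EDUC_PAREDUC = 0.00078


def return_to_educ(pareduc):
    """Estimated effect of one more year of education on log(wage)."""
    return B_EDUC + B_EDUC_PAREDUC * pareduc


r_college = return_to_educ(32)      # both parents with a college education
r_high_school = return_to_educ(24)  # both parents with a high school education
gap = r_college - r_high_school

print(f"pareduc = 32: {100 * r_college:.2f}% per year")
print(f"pareduc = 24: {100 * r_high_school:.2f}% per year")
print(f"difference:   {100 * gap:.2f} percentage points")
```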

(b) When pareduc is added as a separate variable to the equation, we get

$$\widehat{\log(wage)} = \underset{(0.38)}{4.49} + \underset{(0.027)}{0.097}\, educ + \underset{(0.017)}{0.034}\, pareduc - \underset{(0.0012)}{0.0016}\, educ \cdot pareduc + \underset{(0.004)}{0.020}\, exper + \underset{(0.003)}{0.010}\, tenure$$

$N = 722$, $R^2 = 0.174$.

Does the estimated return to education now depend positively on parent education?
Test the null hypothesis that the return to education does not depend on
parent education.

When we add pareduc by itself, the coefficient on the interaction term is negative,
so the estimated return to education no longer depends positively on parent
education. The t statistic on $educ \cdot pareduc$ is about $-1.33$, which is not significant
at the 10% level against a two-sided alternative, so we fail to reject the null
hypothesis. Note that the coefficient on pareduc is significant at the 5% level against
a two-sided alternative. This provides a good example of how omitting a level effect
(pareduc in this case) can lead to biased estimation of the interaction effect.

6. Suppose we want to estimate the effects of alcohol consumption (alcohol) on college
grade point average (colGPA). In addition to collecting this information, we also
obtain attendance information (say, percentage of lectures attended, called attend). A
standardized test score (say, SAT) and high school GPA (hsGPA) are also available.

(a) Should we include attend along with alcohol as regressors in a multiple regression
model? (Think about how you would interpret $\beta_{alcohol}$.)

The answer is not entirely obvious, but one must properly interpret the coefficient
on alcohol in either case. If we include attend, then we are measuring
the effect of alcohol consumption on college GPA, holding attendance fixed.
Because attendance is likely to be an important mechanism through which
drinking affects performance, we probably do not want to hold it fixed in the
analysis. If we do include attend, then we interpret the estimate of $\beta_{alcohol}$ as
measuring those effects on colGPA that are not due to attending class. (For
example, we could be measuring the effects that drinking alcohol has on study
time.) To get a total effect of alcohol consumption, we would leave attend out.

(b) Should SAT and hsGPA be included as regressors? Explain.

We would want to include SAT and hsGPA as controls, as these measure
student abilities and motivation. Drinking behavior in college could be correlated
with one's performance in high school and on standardized tests. Other
factors, such as family background, would also be good controls.

7. Suppose that, using data, the following equation was estimated

$$\widehat{sat} = \underset{(6.29)}{1{,}028.1} + \underset{(3.83)}{19.3}\, hsize - \underset{(0.53)}{2.19}\, hsize^2 - \underset{(4.29)}{45.09}\, female - \underset{(12.71)}{169.81}\, black + \underset{(18.15)}{62.31}\, female \cdot black$$

$N = 4{,}137$, $R^2 = 0.0858$.

The variable sat is the college entrance exam score, hsize is the size of the student's
high school graduating class, in hundreds, female is a gender dummy variable, and black is
a race dummy variable equal to one for blacks and zero otherwise.

(a) Is there strong evidence that $hsize^2$ should be included in the model?

The t statistic on $hsize^2$ is $-2.19/0.53 \approx -4.13$, over four in absolute value, so
there is very strong evidence that it belongs in the equation.

(b) Holding hsize fixed, what is the estimated difference in sat score between
nonblack females and nonblack males? How statistically significant is this estimated
difference?

This is given by the coefficient on female (since black = 0): nonblack females
have SAT scores about 45 points lower than nonblack males. The t statistic
is about $-10.51$, so the difference is very statistically significant. (The very
large sample size certainly contributes to the statistical significance.)

(c) What is the estimated difference in sat score between nonblack males and black
males? Test the null hypothesis that there is no difference between their scores,
against the alternative that there is a difference.

Because female = 0, the coefficient on black implies that a black male has an
estimated SAT score almost 170 points less than a comparable nonblack male.
The t statistic is over 13 in absolute value, so we easily reject the hypothesis
that there is no ceteris paribus difference.

(d) What is the estimated difference in sat score between black females and nonblack
females? What would you need to do to test whether the difference is statistically
significant?

We plug in black = 1, female = 1 for black females and black = 0,
female = 1 for nonblack females. The difference is therefore
$-169.81 + 62.31 = -107.50$. Because the estimate depends on two coefficients, we cannot
construct a t statistic from the information given. The easiest approach is to
define dummy variables for three of the four race/gender categories and choose
nonblack females as the base group. We can then obtain the t statistic we
want as the t statistic on the black female dummy variable.
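
The three group comparisons in (b) through (d) all come from plugging dummy values into the same estimated equation; the intercept and the hsize terms cancel when hsize is held fixed. A minimal sketch, with illustrative names:

```python
COEF = {"female": -45.09, "black": -169.81, "female_black": 62.31}


def group_shift(female, black):
    """Part of predicted sat that depends on the gender and race dummies."""
    return (
        COEF["female"] * female
        + COEF["black"] * black
        + COEF["female_black"] * female * black
    )


# (b) nonblack females vs nonblack males
diff_b = group_shift(female=1, black=0) - group_shift(female=0, black=0)
# (c) black males vs nonblack males
diff_c = group_shift(female=0, black=1) - group_shift(female=0, black=0)
# (d) black females vs nonblack females
diff_d = group_shift(female=1, black=1) - group_shift(female=1, black=0)

print(f"(b) {diff_b:.2f}  (c) {diff_c:.2f}  (d) {diff_d:.2f}")
```

Only the comparison in (d) combines two coefficients, which is why its standard error cannot be read off the reported output.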

8. Suppose you collect data from a survey on wages, education, experience, and gender.
In addition, you ask for information about marijuana usage. The original question is:
“On how many separate occasions last month did you smoke marijuana?”

(a) Write an equation that would allow you to estimate the effects of marijuana usage
on wage, while controlling for other factors. You should be able to make statements
such as, “Smoking marijuana five more times per month is estimated to change
wage by x%.”

We want to have a constant semi-elasticity model, so a standard wage equation
with marijuana usage included would be

$$\log(wage) = \beta_0 + \beta_1 usage + \beta_2 educ + \beta_3 exper + \beta_4 exper^2 + \beta_5 female + U.$$

Then $100 \cdot \beta_1$ is the approximate percentage change in wage when marijuana
usage increases by one time per month.

(b) Write a model that would allow you to test whether drug usage has different effects
on wages for men and women. How would you test that there is no difference in
the effects of drug usage for men and women?

We would add an interaction term in female and usage:

$$\log(wage) = \beta_0 + \beta_1 usage + \beta_2 educ + \beta_3 exper + \beta_4 exper^2 + \beta_5 female + \beta_6 female \cdot usage + U.$$

The null hypothesis that the effect of marijuana usage does not differ by gender
is $H_0: \beta_6 = 0$. The test can be done using the t statistic on $\beta_6$.

(c) Suppose you think it is better to measure marijuana usage by putting people into
one of four categories: nonuser, light user (1 to 5 times per month), moderate user
(6 to 10 times per month), and heavy user (more than 10 times per month). Now,
write a model that allows you to estimate the effects of marijuana usage on wage.

We take the base group to be nonuser. Then we need dummy variables for
the other three groups:

lghtuser = 1 if light user, = 0 otherwise,
moduser = 1 if moderate user, = 0 otherwise,
hvyuser = 1 if heavy user, = 0 otherwise.

Assuming no interactive effect with gender, the model would be

$$\log(wage) = \beta_0 + \delta_1 lghtuser + \delta_2 moduser + \delta_3 hvyuser + \beta_2 educ + \beta_3 exper + \beta_4 exper^2 + \beta_5 female + U.$$

Then $\delta_1$ will capture the effect of light usage of marijuana on wage relative
to nonusers, and the other two dummy coefficients are interpreted analogously.
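
Building the three dummies from the raw survey count might look like the sketch below. The function name is hypothetical, and the cutoffs assume that exactly 5 times per month counts as light use and 6 to 10 as moderate use, so that the categories do not overlap.

```python
def usage_dummies(times_per_month):
    """Return (lghtuser, moduser, hvyuser); nonusers are the base group (0, 0, 0).

    Cutoffs are an assumption: 1-5 light, 6-10 moderate, more than 10 heavy.
    """
    lghtuser = int(1 <= times_per_month <= 5)
    moduser = int(6 <= times_per_month <= 10)
    hvyuser = int(times_per_month > 10)
    return lghtuser, moduser, hvyuser


for count in (0, 3, 8, 12):
    print(count, usage_dummies(count))
```

Each respondent gets at most one dummy equal to one, and the coefficient on each dummy is measured relative to nonusers.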

(d) Using the model in (c), explain in detail how to test the null hypothesis that
marijuana usage has no effect on wage. Be very specific and include a careful
listing of degrees of freedom.

The null hypothesis is $H_0: \delta_1 = 0, \delta_2 = 0, \delta_3 = 0$, for a total of $q = 3$
restrictions. The unrestricted model has seven regressors plus an intercept, so if $N$
is the sample size, the df in the unrestricted model (the denominator df in the
F distribution) is $N - 8$. So we would obtain the critical value from the
$F_{q, N-8}$ distribution.
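
The degrees-of-freedom bookkeeping in (d) can be sketched as follows. The helper name, the regressor labels, and the sample size of 1,000 are illustrative assumptions; the problem does not give a sample size.

```python
def f_test_df(n_obs, regressors, tested):
    """Numerator and denominator df for an F test of exclusion restrictions."""
    missing = [name for name in tested if name not in regressors]
    if missing:
        raise ValueError(f"tested variables not in the model: {missing}")
    q = len(tested)                         # number of restrictions
    df_denom = n_obs - len(regressors) - 1  # minus slopes and the intercept
    return q, df_denom


regressors = ["lghtuser", "moduser", "hvyuser", "educ", "exper", "exper_sq", "female"]
tested = ["lghtuser", "moduser", "hvyuser"]

n_obs = 1_000  # hypothetical sample size
q, df_denom = f_test_df(n_obs, regressors, tested)
print(f"F distribution with ({q}, {df_denom}) degrees of freedom")
```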

(e) What are some potential problems with drawing causal inference using the survey
data that you collected?

The error term could contain factors, such as family background (including
parental history of drug abuse), that could directly affect wages and also be
correlated with marijuana usage. We are interested in the effects of a person's
drug usage on his or her wage, so we would like to hold other confounding
factors fixed. We could try to collect data on relevant background information.
