Final Exam 102 w10 Solutions

This document provides the solutions to a final exam for an economics course. It includes instructions for completing the exam, formulas that may be useful, and multiple choice and short answer questions. The multiple choice section has 25 questions testing concepts like how adding regressors affects R-squared, factors that influence confidence interval width, hypothesis testing of regression coefficients, and sources of bias. The short answer questions require showing work.

Uploaded by

Belay Daba

Ecn 102 - Analysis of Economic Data

University of California - Davis March 19, 2010


Instructor: John Parman

Final Exam - Solutions


You have until 5:30pm to complete this exam. Please remember to put your name, section and
ID number on both your scantron sheet and the exam. Fill in test form A on the scantron sheet.
Answer all multiple choice questions on your scantron sheet. Choose the single best answer for
each multiple choice question. Answer the long answer questions directly on the exam. Keep your
answers complete but concise. For the long answer questions, you must show your work where
appropriate for full credit.

Name: ID Number: Section:

(POTENTIALLY) USEFUL FORMULAS


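$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
$CV = \frac{s}{\bar{x}}$
$skew = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3$
$kurt = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$
$\mu = E(X)$
$z^* = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
$t^* = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
$Pr[T_{n-k} > t_{\alpha, n-k}] = \alpha$
$Pr[|T_{n-k}| > t_{\alpha/2, n-k}] = \alpha$
$\sum_{i=1}^{n} a = na$
$\sum_{i=1}^{n} (a x_i) = a \sum_{i=1}^{n} x_i$
$\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
$s^2 = \bar{x}(1 - \bar{x})$ for proportions data
$t_{\alpha, n-k} = TINV(2\alpha, n-k)$
$Pr(|T_{n-k}| \geq |t^*|) = TDIST(|t^*|, n-k, 2)$
$Pr(T_{n-k} \geq t^*) = TDIST(t^*, n-k, 1)$

$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
$r_{xy} = \frac{s_{xy}}{\sqrt{s_{xx} \cdot s_{yy}}}$
$b_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
$b_1 = \bar{y} - b_2 \bar{x}$
$\hat{y}_i = b_1 + b_2 x_i$
$s_e^2 = \frac{1}{n-2}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2$
$ESS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$R^2 = 1 - \frac{ESS}{TSS}$
$s_{b_2} = \sqrt{\frac{s_e^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$
$F = \frac{R^2}{1 - R^2} \cdot \frac{n-k}{k-1}$
$F = \frac{ESS_r - ESS_u}{ESS_u} \cdot \frac{n-k}{k-g}$
$F = \frac{R_u^2 - R_r^2}{1 - R_u^2} \cdot \frac{n-k}{k-g}$
$\bar{R}^2 = 1 - \frac{n-1}{n-k} \cdot \frac{ESS}{TSS}$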

SECTION I: MULTIPLE CHOICE (60 points)

1. Suppose we regress SAT score on parent’s education and parent’s income. If we run the
regression again but also include the student’s GPA as an additional regressor:
(a) The R² for the regression will either stay the same or increase.
(b) The adjusted R² for the regression will either stay the same or increase.
(c) Both (a) and (b) are true.
(d) Neither (a) nor (b) is true.
(a) When adding an additional regressor, our fit should be at least as good as before,
so the R² for the regression should either stay the same or increase. Adjusted R²
may decrease if the additional regressor had little to no explanatory power.
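The contrast between R² and adjusted R² can be checked numerically with the adjusted R² formula from the formula sheet. The sketch below is only an illustration: the sample size and the two R² values are hypothetical numbers chosen to represent a regressor that adds almost no explanatory power.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (n-1)/(n-k) * ESS/TSS, using ESS/TSS = 1 - R^2."""
    return 1 - (n - 1) / (n - k) * (1 - r2)

# Hypothetical values (not from the exam): n observations, k estimated
# coefficients including the constant.
n = 30
r2_before, k_before = 0.300, 3   # constant + parent's education + income
r2_after, k_after = 0.301, 4     # GPA added, fit barely improves

adj_before = adjusted_r2(r2_before, n, k_before)
adj_after = adjusted_r2(r2_after, n, k_after)

# R^2 did not fall, but adjusted R^2 did.
print(round(adj_before, 4), round(adj_after, 4))
```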
2. Suppose we have a sample of the heights of Davis students and want to use the sample mean
to get a confidence interval for the mean height in the population. Which of the following
would increase the width of this confidence interval?
(a) Switching from a 95% confidence interval to a 90% confidence interval.
(b) Increasing the sample size used to calculate the sample mean.
(c) Switching from a 95% confidence interval to a 99% confidence interval.
(d) All of the above.
(c) The smaller we make α, the wider our confidence interval will get. A larger
sample size would make the confidence interval narrower.
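As a rough check on this answer, the width of a confidence interval for a mean can be computed for different confidence levels and sample sizes. This is a minimal sketch that assumes a normal critical value instead of the t critical value used in the course, and the sample standard deviation is a hypothetical number.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(s, n, confidence):
    """Full width of a confidence interval for the mean, normal approximation."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha/2 in each tail
    return 2 * z * s / sqrt(n)

s = 3.0  # hypothetical sample standard deviation of height, in inches

w90 = ci_width(s, 100, 0.90)
w95 = ci_width(s, 100, 0.95)
w99 = ci_width(s, 100, 0.99)
w95_big_sample = ci_width(s, 400, 0.95)

print(w90, w95, w99, w95_big_sample)
```

The widths increase with the confidence level and shrink as the sample size grows.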
3. Suppose we can reject the null hypothesis that β2 ≥ 0 at a 5% significance level where β2 is
the slope coefficient from a bivariate regression. Which of the following is definitely true?
(a) Our test statistic was negative.
(b) We can reject the null hypothesis that β2 = 0 at a 5% significance level.
(c) We can reject the null hypothesis that β2 ≥ 0 at a 2.5% significance level.
(d) We can reject the null hypothesis that β2 < 0 at a 5% significance level.
(a) The critical value for a lower one-tailed hypothesis test will be negative and we
will reject the null when the test statistic is more negative than the critical value.
4. Suppose we regress y on x2. Which of the following would lead to a biased coefficient for x2?
(a) There is a variable x3 that is correlated with y but not with x2.
(b) There is a variable x3 that is correlated with x2 but not with y.
(c) y is measured with some random error.
(d) x2 is measured with some random error.
(d) An omitted variable will bias the coefficient on x2 only if it is correlated with
both x2 and with y. Measurement error in x2 will bias the coefficient on x2 since
it will lead to errors that are negatively correlated with x2 . Measurement error in
y will decrease the precision of the estimated slope coefficient but will not bias the
coefficient.
5. When testing the significance of a subset of regressors, the R² of the unrestricted model will
always be:
(a) Greater than or equal to the R² of the restricted model.
(b) Less than or equal to the R² of the restricted model.
(c) Equal to the R² of the restricted model.
(d) It could be greater than, less than or equal to the R² of the restricted model.
(a) The unrestricted model contains all of the regressors in the restricted model
plus additional regressors. The unrestricted model can achieve the same fit as the
restricted model by simply having the coefficients on the additional regressors set to
zero. More likely is that these coefficients will be nonzero and the fit will improve.
6. Suppose that the true population model of the relationship between weight (W ) and hours
of exercise per day (H) is:
W = 200 − 10H + ε
where ε meets all of our assumptions. If hours of exercise is measured with some random,
mean-zero error, which of the following statements about the estimated slope coefficient b̃2 is
true?
(a) E(b̃2) = −10.
(b) E(b̃2) = 10.
(c) E(b̃2) > −10.
(d) E(b̃2) < −10.
(c) The slope coefficient will be biased toward zero due to the measurement error.
In this case, that means that the expected value of b̃2 will be between −10 and 0.
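The bias toward zero can be illustrated with a small simulation. This is a sketch under assumed settings: the range of exercise hours, the error standard deviations, the sample size and the random seed are all hypothetical choices, and the slope is computed with the b2 formula from the formula sheet.

```python
import random

def ols_slope(x, y):
    """Bivariate OLS slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

true_h = [rng.uniform(0, 3) for _ in range(n)]             # true hours of exercise
weight = [200 - 10 * h + rng.gauss(0, 5) for h in true_h]  # W = 200 - 10H + error
noisy_h = [h + rng.gauss(0, 1) for h in true_h]            # H observed with mean-zero error

slope_true_h = ols_slope(true_h, weight)    # close to -10
slope_noisy_h = ols_slope(noisy_h, weight)  # pulled toward zero

print(slope_true_h, slope_noisy_h)
```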
7. Doubling the value of the largest observation in a sample of incomes will:
(a) Increase the median.
(b) Increase the mode.
(c) Increase the mean.
(d) All of the above.
(c) Doubling the largest observation will increase the average value of the variable
but will not change the position or value of the 50th percentile of the distribution
of values.
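A tiny example makes the point concrete. The income values below are hypothetical and chosen only so that the mode and median are easy to read off.

```python
from statistics import mean, median, mode

incomes = [20, 30, 30, 50, 100]                # hypothetical incomes, in $1000s
doubled = incomes[:-1] + [2 * incomes[-1]]     # double only the largest observation

print(mean(incomes), median(incomes), mode(incomes))
print(mean(doubled), median(doubled), mode(doubled))
```

Only the mean moves; the median and mode are unchanged.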
8. Suppose that the risk of catching the flu is high for young children and the elderly but low
for teenagers and younger adults. Which of the following equations would be the best choice
for modeling the relationship between flu risk (R) and age (A)?
(a) R = β1 + β2 A + ε.
(b) ln(R) = β1 + β2 ln(A) + ε.
(c) R = β1 + β2 A + β3 A² + ε.
(d) R = β1 + β2 ln(A) + ε.
(c) Based on the description, flu risk is first decreasing with age and then increasing
with age. We need a polynomial to fit this type of parabolic curve.
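To see how a quadratic in age produces the U-shape described, the sketch below evaluates such a curve at a few ages. The coefficient values are hypothetical and chosen only for illustration; with a negative coefficient on A and a positive coefficient on A², risk falls until the turning point and rises afterward.

```python
# Hypothetical coefficients for R = b1 + b2*A + b3*A^2 (not estimated from data).
b1, b2, b3 = 10.0, -0.4, 0.005

def flu_risk(age):
    """Predicted flu risk from the quadratic model."""
    return b1 + b2 * age + b3 * age ** 2

# The parabola reaches its minimum where the derivative b2 + 2*b3*A equals zero.
turning_point = -b2 / (2 * b3)

for age in (5, 40, 80):
    print(age, flu_risk(age))
print("turning point:", turning_point)
```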
9. Suppose that snow depth measured in feet is included as a regressor in a multivariate
regression and the magnitude of the estimated coefficient for snow depth is 5. If we rerun the
regression using snow depth measured in inches, the magnitude of the new estimated coefficient
on snow depth will be:
(a) Larger than 5.
(b) Smaller than 5.
(c) Still equal to 5.
(d) Not enough information.
(b) The coefficient is giving us the change in y with a change in snow depth of one
foot. The change in y with a change in snow depth of one inch will be 1/12 of this
value.
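The rescaling can be verified directly with a bivariate regression. The data below are made up for illustration, and a bivariate regression is used instead of the multivariate one in the question to keep the sketch short.

```python
def ols_slope(x, y):
    """Bivariate OLS slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

depth_feet = [0.5, 1.0, 2.0, 3.0, 4.0]   # hypothetical snow depths in feet
y = [3.0, 6.0, 9.0, 16.0, 21.0]          # hypothetical outcome variable
depth_inches = [12 * d for d in depth_feet]

slope_feet = ols_slope(depth_feet, y)
slope_inches = ols_slope(depth_inches, y)

# The slope per inch is the slope per foot divided by 12.
print(slope_feet, slope_inches)
```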
10. When regressing annual work hours on income, a researcher finds that the variance of the
residuals increases as work hours increases. This will affect:
(a) The expected value of the slope coefficient for income.
(b) The magnitude of the standard error for the slope coefficient for income.
(c) Both (a) and (b).
(d) Neither (a) nor (b).
(b) This is a case of heteroskedasticity. Heteroskedasticity will change the standard
errors of our estimates but will not bias the coefficients.
11. The histogram for hours of study per week based on a sample of 400 Davis students is
symmetric and centered at 15 hours. Which of the following statements is true?
(a) The sample median is 15 hours.
(b) The sample mean is 15 hours.
(c) The skewness for the sample is zero.
(d) All of the above.
(d) Because the distribution is symmetric, 50 percent of the observations will be
to the right of 15 hours and 50 percent will be to the left of 15 hours, making 15
hours the median. The symmetry will also lead to the mean being equal to 15
hours (for every observation that is larger than 15, there is a corresponding
observation that is smaller than 15 by the same amount). For a symmetric distribution,
skewness is zero.
12. When running a bivariate regression, which of the following is not possible?
(a) The error sum of squares is larger than the total sum of squares.
(b) The error sum of squares is equal to the total sum of squares.
(c) The error sum of squares is zero.
(d) The error sum of squares is positive.
(a) The largest the error sum of squares can ever be is the magnitude of the total
sum of squares. If it were larger you could achieve a better fit by simply setting all
of your slope coefficients to zero.
13. Which of the following would definitely not lead to the error term being correlated with a
regressor x?
(a) Random measurement error in x.
(b) An omitted variable correlated with x.
(c) Choosing an incorrect functional form for the regression equation.
(d) Random measurement error in y.
(d) If the measurement error in y is truly random, then it will be independent of
the value of x. So adding this measurement error into the error term will leave the
error term uncorrelated with x.
14. Adding an irrelevant variable to a regression will:
(a) Have no effect on the regression results.
(b) Tend to bias the coefficients for the other regressors.
(c) Lower the R².
(d) None of the above.
(d) Including an irrelevant variable may increase our standard errors. It will also
tend to lower the adjusted R² for the regression, while the R² itself cannot fall.
15. Suppose we run a regression with GPA as the dependent variable and SAT score as the
independent variable. Which of the following statements is definitely true?
(a) The sign of the estimated slope coefficient will be the same as the sign of the correlation
between GPA and SAT score.
(b) The sign of the estimated slope coefficient could be different than the sign of the
correlation between GPA and SAT score if there are omitted variables.
(c) The magnitude of the slope coefficient will be equal to the magnitude of the correlation
between GPA and SAT score.
(d) The slope coefficient will be statistically significant.
(a) The slope coefficient is a function of the correlation between GPA and SAT
score. The signs will be the same. Even if the sign of the true relationship is the
opposite of the estimated coefficient due to omitted variable bias, the sign of the
correlation will match up with the sign of the estimated coefficient (the correlation
does not control for the omitted variable either).
16. Which of the following is not a measure of central tendency?
(a) The mean.
(b) The mode.
(c) The sample range.
(d) The median.
(c) The sample range is a measure of dispersion. If the entire sample distribution
was shifted, the center of the distribution would certainly shift but the sample range
would stay exactly the same.
17. Suppose we use an F test after running a multivariate regression to test the null hypothesis
that β3 = β4 = 0 and get an F statistic that is larger than the critical value for a 5%
significance level. We would conclude that:
(a) β3 ≠ β4.
(b) β3 > 0 or β4 > 0.
(c) β3 ≠ 0 and β4 ≠ 0.
(d) None of the above.
(d) We would reject the null hypothesis that β3 = β4 = 0 in favor of the alternative
hypothesis that at least one of the coefficients is different from zero. We cannot say
anything about whether both are different than zero or whether they are different
from each other.
18. Which of the following would make you more likely to reject the hypothesis that an individual
slope coefficient is equal to zero?
(a) A larger standard error for that slope coefficient.
(b) A smaller t statistic for that slope coefficient.
(c) A smaller F statistic for the regression.
(d) A larger value for the ratio of the coefficient to its standard error.
(d) When testing whether a slope coefficient is different than zero, the test statistic
is simply the ratio of the coefficient to the standard error. We are more likely to
reject the null hypothesis that the coefficient is equal to zero when this test statistic
is larger in magnitude.
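As a quick illustration, the t statistic can be recomputed from a coefficient and its standard error. The numbers below are the typhoiddeaths coefficient and standard error from the first regression output in Section II; the doubled standard error is a hypothetical comparison.

```python
def t_stat(coef, std_err):
    """Test statistic for H0: coefficient = 0."""
    return coef / std_err

coef = 0.02171783        # typhoiddeaths coefficient, first regression output
std_err = 0.009977496    # its standard error

t_reported_inputs = t_stat(coef, std_err)
t_larger_se = t_stat(coef, 2 * std_err)  # hypothetical: same coefficient, doubled standard error

print(t_reported_inputs, t_larger_se)
```

A larger standard error shrinks the ratio and makes rejection less likely.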
19. Suppose that we include a dummy variable for male and a dummy variable for female in a
regression. This will create a:
(a) Omitted variable bias problem.
(b) Heteroskedasticity problem.
(c) Multicollinearity problem.
(d) Homoskedasticity problem.
(c) The dummy variables will be perfectly collinear (the value of one always tells
us exactly what the value of the other one is).
20. The distribution of the sample mean will:
(a) Be centered at zero.
(b) Have a smaller variance for smaller sample sizes.
(c) Be centered at the population mean.
(d) (b) and (c).
(c) The sample mean is (at least approximately, for large samples) normally distributed
with a mean equal to the population mean and a variance that decreases as sample size increases.
21. Suppose the R² for a bivariate regression is equal to 1. This tells us that:
(a) The correlation between the dependent and independent variables is equal to 1.
(b) The slope coefficient is equal to 1 or -1.
(c) The error sum of squares is equal to the total sum of squares.
(d) The dependent and independent variables are perfectly correlated.
(d) An R² of 1 tells us that the error sum of squares is equal to zero and the
variables are perfectly correlated. It does not tell us whether the correlation is
equal to 1 or -1.
22. Suppose we ran a regression of Y on X 1000 times using 1000 different samples and made a
histogram of the resulting slope coefficient values. Which of the following is true about the
distribution shown on the histogram?
(a) It would be centered at zero.
(b) It would look like a normal distribution.
(c) All of the observations would be located at the true value of the slope coefficient.
(d) It would be right skewed.
(b) The estimated slope coefficient is simply a random variable. It will be
distributed normally with a mean equal to the true population value of the slope
coefficient.
23. Which of the following depends on the units variables are measured in?
(a) Correlation.
(b) Coefficient of variation.
(c) Estimated slope coefficient.
(d) t statistic.
(c) The estimated slope coefficient is in the units of y divided by the units of x.
Changing the units of either y or x will rescale the slope coefficient.
24. Suppose the size of your social network grows exponentially over time. Which of the following
equations would be the most appropriate for modeling social network size (S) as a function
of time (T ):
(a) S = β1 + β2 T + ε.
(b) ln(S) = β1 + β2 ln(T ) + ε.
(c) S = β1 + β2 ln(T ) + ε.
(d) ln(S) = β1 + β2 T + ε.
(d) If S grows exponentially, then S increases by a constant percentage for every
one unit change in time. This can be modeled with a log-linear equation.
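The constant-percentage property of the log-linear model can be checked with a short calculation. The coefficient values here are hypothetical and the error term is set to zero so that only the systematic part of the model is shown.

```python
from math import exp, log

# Hypothetical coefficients for ln(S) = b1 + b2*T (error term omitted).
b1, b2 = log(10), 0.2

def network_size(t):
    """Network size implied by the log-linear model."""
    return exp(b1 + b2 * t)

# Ratio of sizes one period apart; it equals exp(b2) at every T.
growth_ratios = [network_size(t + 1) / network_size(t) for t in range(5)]
print(growth_ratios)
```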
25. Suppose that we want to test whether eye color influences the likelihood of being hired for a
job. Our dataset includes five different values for the eye color variable. If we want to regress
the probability of being hired on eye color, we will:
(a) Convert each eye color to a number and include this new eye color number variable
in the regression.
(b) Create dummy variables for each eye color and include all of the dummy variables as
regressors.
(c) Create dummy variables for each eye color and include four of the dummy variables as
regressors.
(d) Create dummy variables for each eye color and include three of the dummy variables as
regressors.
(c) We always include one fewer dummy variable than the total number of
categories. If we didn’t do this, we would run into the dummy variable trap and have
a perfect collinearity problem (one of the dummy variables could be rewritten in
terms of the other dummy variables).
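The dummy variable trap is easy to see in a small example. The eye color labels and the sample below are hypothetical. With all five dummies included, each row of dummies sums to one, which is exactly the constant term, so one category is dropped to serve as the baseline.

```python
colors = ["brown", "blue", "green", "hazel", "gray"]        # five hypothetical categories
sample = ["blue", "brown", "gray", "green", "brown", "hazel"]

# One dummy per category: 1 if the observation has that eye color, else 0.
all_dummies = [[1 if obs == c else 0 for c in colors] for obs in sample]

# Every row sums to 1, so the five dummies add up to the constant column.
row_sums = [sum(row) for row in all_dummies]

# Drop the first category (brown) as the omitted baseline, leaving four regressors.
kept_colors = colors[1:]
kept_dummies = [row[1:] for row in all_dummies]

print(row_sums, kept_colors)
```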

SECTION II: SHORT ANSWER (40 points)

1. (14 points) Suppose that the number of traffic accidents (N ) is a function of the number of
cars on the road (C) and the average speed of cars on the road (S). The true population
relationship between accidents, cars and average speed is given by:

N = 1000 + 50C + 25S + ε (1)

where ε is a random error that meets all of our standard assumptions. The number of
cars on the road is negatively correlated with average speed due to the increased congestion
associated with additional cars. The true population relationship between the number of cars
and average speed is given by:
S = 80 − 4C + ν (2)
where ν is a random error that meets all of our standard assumptions.
(a) If you ran a regression with N as the dependent variable and C and S as the independent
variables, what would the expected value of the estimated slope coefficient for C be?
Assume that you include a constant term in your regression.
Given that ε meets all of our standard assumptions, the estimated coefficient
will be unbiased. So its expected value will be equal to the true population
value of 50.
(b) If you ran a regression with N as the dependent variable and C as the only independent
variable, what would the expected value of the estimated slope coefficient for C be?
Assume that you include a constant term in your regression.
By omitting S from the regression equation, S enters the error term, making
the error term correlated with C and creating an omitted variable bias. The
expected value of the estimated coefficient for C will be equal to the true value
plus a bias term that captures the indirect effect of S on N. Writing γc for the
slope coefficient on C in equation (2), so that γc = −4:

E(b̃c) = βc + βs · γc

E(b̃c) = 50 + 25 · (−4)
E(b̃c) = −50
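The omitted variable bias result can be checked by simulating data from equations (1) and (2) and regressing N on C alone. This is a sketch under assumed settings: the range of C, the error standard deviations, the sample size and the seed are hypothetical choices that the problem does not specify.

```python
import random

def ols_slope(x, y):
    """Bivariate OLS slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000

cars = [rng.uniform(1, 15) for _ in range(n)]                   # C (assumed range)
speed = [80 - 4 * c + rng.gauss(0, 2) for c in cars]            # equation (2)
accidents = [1000 + 50 * c + 25 * s + rng.gauss(0, 10)          # equation (1)
             for c, s in zip(cars, speed)]

expected_short_slope = 50 + 25 * (-4)       # true value plus bias term
short_slope = ols_slope(cars, accidents)    # regression of N on C only

print(expected_short_slope, short_slope)
```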
(c) Suppose that you ran a regression with average speed as the dependent variable and
number of cars on the road as the independent variable but you forced the intercept to
be zero (in other words, you do not include a constant term). Will the expected
value of the estimated slope coefficient be greater than, equal to or less than the true
population value of the slope coefficient? Include a written explanation and a scatter
plot showing speed as a function of number of cars to illustrate your answer. Assume
that we always observe positive numbers of cars and positive average speeds.
We know that all of our data points will have positive values for number of cars
and average speed, so they will all lie above and to the right of the origin on a
graph with S on the vertical axis and C on the horizontal axis. We are forcing
our regression line to pass through the origin and through this scatter of data points
above and to the right of the origin. This means that we will get a positive
slope for the regression line. Given that the true value of the slope coefficient is
negative, the estimated slope will certainly be greater than the true value. This
situation is depicted on the graph below.

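[Graph: S on the vertical axis, C on the horizontal axis. The true population line starts at S = 80 with slope = −4. The estimated regression line passes through the origin with slope > 0.]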
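The sign of the through-the-origin slope can be confirmed with a short calculation. When the intercept is forced to zero, the least squares slope is the sum of C·S divided by the sum of C², which is positive whenever all values of C and S are positive. The car counts below are hypothetical, and the error term in equation (2) is set to zero to keep the sketch deterministic.

```python
def slope_through_origin(x, y):
    """Least squares slope when the intercept is forced to zero: sum(x*y) / sum(x^2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

cars = list(range(1, 16))              # hypothetical positive numbers of cars
speed = [80 - 4 * c for c in cars]     # equation (2) with the error set to zero

true_slope = -4
forced_slope = slope_through_origin(cars, speed)

print(forced_slope)
```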

SUMMARY OUTPUT: height as dependent variable

Regression Statistics
Multiple R          0.747320875
R Square            0.558488491
Adjusted R Square   0.533259262
Standard Error      0.357993439
Observations        75

ANOVA
             df   SS            MS         F          Significance F
Regression    4   11.34802735   2.837007   22.13657   7.68637E-12
Residual     70   8.971151182   0.128159
Total        74   20.31917853

                Coefficients   Standard Error   t Stat     P-value
Intercept       68.41515323    0.082797499      826.2949   2.3E-141
northeast       -0.731809115   0.102140602      -7.16472   6.25E-10
south           -0.260178112   0.137428521      -1.89319   0.062467
west            0.256275028    0.141697586      1.808605   0.074807
typhoiddeaths   0.02171783     0.009977496      2.176682   0.03288

Omitted region dummy variable is midwest.

SUMMARY OUTPUT: height as dependent variable

Regression Statistics
Multiple R          0.260580275
R Square            0.06790208
Adjusted R Square   0.055133615
Standard Error      0.509357157
Observations        75

ANOVA
             df   SS            MS         F          Significance F
Regression    1   1.379714485   1.379714   5.317952   0.023948764
Residual     73   18.93946405   0.259445
Total        74   20.31917853

                Coefficients   Standard Error   t Stat     P-value
Intercept       68.09642269    0.07802453       872.7566   2E-148
typhoiddeaths   0.026346173    0.011424714      2.306068   0.023949

2. (14 points) For this problem, use the regression output shown on the previous page. Both
regressions use the same dataset. The dataset is a sample of 75 cities. height is a variable
giving the average height in inches of adult males from the city. typhoiddeaths is a variable
giving the number of typhoid deaths per 1,000 people in the city. The variables northeast,
south and west are all dummy variables that are equal to one if the city is in that region
and zero otherwise. All cities are located either in the Northeast, the South, the West or the
Midwest.
(a) Based on the regression results, what is the difference in the average male height between
a city in the South and a city in the Northeast?

E(height|south = 1) = b1 + b2 northeast + b3 south + b4 west + b5 typhoiddeaths

E(height|south = 1) = b1 + b2 · 0 + b3 · 1 + b4 · 0 + b5 typhoiddeaths
E(height|south = 1) = b1 + b3 + b5 typhoiddeaths
E(height|northeast = 1) = b1 + b2 northeast + b3 south + b4 west + b5 typhoiddeaths
E(height|northeast = 1) = b1 + b2 · 1 + b3 · 0 + b4 · 0 + b5 typhoiddeaths
E(height|northeast = 1) = b1 + b2 + b5 typhoiddeaths
E(height|south = 1) − E(height|northeast = 1) = b1 + b3 + b5 typhoiddeaths − b1 − b2 − b5 typhoiddeaths
E(height|south = 1) − E(height|northeast = 1) = b3 − b2
E(height|south = 1) − E(height|northeast = 1) = (−.26) − (−.73)
E(height|south = 1) − E(height|northeast = 1) = .47
So the average height in a southern city is .47 inches greater than the average
height in a northeastern city.
(b) What is the average male height for a city in the West with no typhoid deaths?

E(height|west = 1, typhoid = 0) = b1 + b2 · 0 + b3 · 0 + b4 · 1 + b5 · 0

E(height|west = 1, typhoid = 0) = b1 + b4
E(height|west = 1, typhoid = 0) = 68.42 + .26 = 68.68
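The arithmetic in parts (a) and (b) can be reproduced from the coefficients in the first regression output. The dictionary and helper function below are illustrative names and are not part of the exam. With unrounded coefficients the West prediction comes out to about 68.67; the 68.68 above results from rounding each coefficient to two decimals before adding.

```python
# Coefficients from the first regression output (midwest is the omitted region).
coef = {
    "intercept": 68.41515323,
    "northeast": -0.731809115,
    "south": -0.260178112,
    "west": 0.256275028,
    "typhoiddeaths": 0.02171783,
}

def predicted_height(northeast=0, south=0, west=0, typhoiddeaths=0.0):
    """Predicted average male height in inches for a city with the given characteristics."""
    return (coef["intercept"] + coef["northeast"] * northeast + coef["south"] * south
            + coef["west"] * west + coef["typhoiddeaths"] * typhoiddeaths)

# (a) South minus Northeast at the same typhoid death rate: b3 - b2.
south_minus_northeast = predicted_height(south=1) - predicted_height(northeast=1)

# (b) A city in the West with no typhoid deaths: b1 + b4.
west_no_typhoid = predicted_height(west=1, typhoiddeaths=0.0)

print(round(south_minus_northeast, 2), round(west_no_typhoid, 2))
```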
(c) Based on the first set of regression results, can you reject the null hypothesis that the
coefficient for typhoid deaths is less than or equal to zero at a 5% significance level? Be
certain to justify your answer.
Notice that the p-value for the typhoid deaths coefficient is .033. This value
corresponds to a two-tailed test and means that we would reject the null hypothesis
that the coefficient is equal to 0 at a 5% significance level (.05 > .033) and that
our t-statistic is to the right of t.025,n−k (since our coefficient is positive). For
an upper one-sided test, we would reject the null if the t-statistic is to the
right of t.05,n−k . Notice that t∗ > t.025,n−k > t.05,n−k , so we will reject the null
hypothesis that the coefficient is less than or equal to zero at a 5% significance
level.
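An equivalent way to reach this conclusion is to convert the two-tailed p-value into a one-tailed p-value. Because the t distribution is symmetric, the upper-tail p-value is half the two-tailed p-value when the estimated coefficient is positive. The function name below is a hypothetical helper, not something from the course.

```python
def upper_one_sided_p(p_two_sided, coef):
    """p-value for H0: beta <= 0 against Ha: beta > 0, given the two-tailed p-value."""
    if coef > 0:
        return p_two_sided / 2
    return 1 - p_two_sided / 2

p_two_sided = 0.03288   # typhoiddeaths P-value from the first regression output
coef = 0.02171783       # typhoiddeaths coefficient from the first regression output

p_one_sided = upper_one_sided_p(p_two_sided, coef)
reject_at_5_percent = p_one_sided < 0.05

print(p_one_sided, reject_at_5_percent)
```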
(d) Calculate the test statistic you would use to test the following set of hypotheses:

Ho : βne = βs = βw = 0

Ha : at least one of βne , βs and βw is different than zero


We are testing the significance of a subset of regressors. This requires calculating
an F statistic:
$F^* = \frac{R_u^2 - R_r^2}{1 - R_u^2} \cdot \frac{n - k}{k - g}$
$F^* = \frac{.56 - .07}{1 - .56} \cdot \frac{75 - 5}{5 - 2}$
$F^* = 25.98$
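The F statistic can be recomputed from the two regression outputs. The sketch below uses both the R² form and the ESS form from the formula sheet. The function names are illustrative. With R² rounded to two decimals the result is 25.98, as above; with the unrounded values from the output it is roughly 25.93.

```python
def f_stat_from_r2(r2_u, r2_r, n, k, g):
    """F = (R2_u - R2_r) / (1 - R2_u) * (n - k) / (k - g)."""
    return (r2_u - r2_r) / (1 - r2_u) * (n - k) / (k - g)

def f_stat_from_ess(ess_r, ess_u, n, k, g):
    """F = (ESS_r - ESS_u) / ESS_u * (n - k) / (k - g)."""
    return (ess_r - ess_u) / ess_u * (n - k) / (k - g)

n, k, g = 75, 5, 2   # observations, unrestricted and restricted coefficient counts

f_rounded = f_stat_from_r2(0.56, 0.07, n, k, g)
f_unrounded = f_stat_from_r2(0.558488491, 0.06790208, n, k, g)
f_ess = f_stat_from_ess(18.93946405, 8.971151182, n, k, g)  # residual SS from each output

print(round(f_rounded, 2), f_unrounded, f_ess)
```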
(e) Explain how you would use your test statistic from part (d) to decide whether or not to
reject the null hypothesis. Be as specific as possible.
We could take either the p-value approach or the critical value approach. For
the p-value approach, we would use FDIST() in Excel to calculate the p-value
associated with our F statistic and our degrees of freedom. We would have Excel
calculate FDIST(25.98, 3, 70). The result would be our p-value. We would
reject the null hypothesis if this p-value is less than our chosen significance level
α.
For the critical value approach, we would need to calculate the critical value
corresponding to our chosen significance level α. We could do this in Excel by
calculating FINV(α, 3, 70). If the resulting critical value is less than our F
statistic, we would reject the null hypothesis.
3. (12 points) Suppose that we are interested in the relationship between hours of weekly exercise
and resting heart rate. The more a person exercises on average, the lower his or her resting
heart rate is. For individuals who don’t exercise at all, males have a lower resting heart
rate on average than females. The decrease in resting heart rate from an additional hour of
exercise per week is bigger for males than females.
(a) Write down the regression model you would use to estimate the relationship between
resting heart rate, gender and weekly exercise. Resting heart rate should be your
dependent variable. Provide clear definitions of all variables you include in your model.
Our regression model will have to include resting heart rate, weekly exercise and
a variable capturing gender. Since gender is a categorical variable, we will need
to use a dummy variable. We have two values for gender (male and female) so
we will need one dummy variable. Let’s make our dummy variable for male, so
it equals one if gender is male and equals zero if gender is female. This
gives us the following set of variables:
• R - resting heart rate
• E - amount of weekly exercise
• M - dummy variable equal to one if male, zero if female
Our regression equation will have R as the dependent variable. E and M will
be independent variables. We also need to include an interaction term between
E and M since the marginal effect of E on R depends on the value of M . This
gives us the following regression model:

R = β1 + β2 E + β3 M + β4 E · M + ε

(b) Based on the information given above, what are the expected signs for each of your
coefficients in the regression model you specified in part (a)?
Notice that β2 is the marginal effect of exercise on resting heart rate for females
(since the interaction term will be zero). According to the problem, more
exercise lowers the resting heart rate, so β2 should be negative. The marginal effect
of exercise on heart rate is larger (more negative) for males than females. For males,
this marginal effect is captured by β2 + β4 , so β4 should be negative. For individuals
who do not exercise at all (E = 0), the difference between the average male heart rate
and the average female heart rate will be β3 . We are told that, among individuals who
do not exercise, males have a lower heart rate than females. So β3 should be
negative. Finally, heart rate has to be positive overall, so the constant term β1
should be positive (if it were negative, we would predict that a female who does
not exercise has a negative heart rate). To summarize:

β1 > 0

β2 < 0
β3 < 0
β4 < 0
Note that if you used a dummy variable equal to one for females and zero for
males, your signs for β3 and β4 would be reversed. The signs for β1 and β2
would stay the same.
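The roles of the four coefficients can be illustrated by evaluating the model at a few points. The coefficient values below are hypothetical, chosen only to match the expected signs, and the error term is omitted.

```python
# Hypothetical coefficients with the expected signs (not estimates).
b1, b2, b3, b4 = 75.0, -1.0, -5.0, -0.5

def resting_heart_rate(exercise_hours, male):
    """R = b1 + b2*E + b3*M + b4*E*M, with M = 1 for males and 0 for females."""
    return b1 + b2 * exercise_hours + b3 * male + b4 * exercise_hours * male

# Gap between males and females who do not exercise: b3.
gap_no_exercise = resting_heart_rate(0, 1) - resting_heart_rate(0, 0)

# Effect of one more hour of exercise: b2 for females, b2 + b4 for males.
effect_female = resting_heart_rate(3, 0) - resting_heart_rate(2, 0)
effect_male = resting_heart_rate(3, 1) - resting_heart_rate(2, 1)

print(gap_no_exercise, effect_female, effect_male)
```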
(c) Suppose people tend to make random mistakes when measuring their heart rate. What
effects will this have on the estimation results when you run the regression model specified
in part (a)?
Measurement error in the dependent variable will not bias our coefficients. So
the expected values of the coefficients will stay the same. However, the
measurement error does add variance to the error term which will lead to less precise
estimates of the coefficients (larger standard errors).
