First Assignment
HOMEWORK 1
(Review of linear econometrics and review of
methods)
(1) (3 points) Generate a histogram of the medical costs and compute
descriptive statistics (mean, median, standard deviation, minimum,
maximum). Is the distribution symmetric? Why or why not, in your view?
(5) (4 points) We want to test whether the coefficient θ2 for bmi is
statistically significant. Test the hypothesis using the relevant test statistic.
Does bmi have more or less explanatory power than age?
(6) (3 points) We want to test whether the coefficient θ2 for bmi is
statistically significant. Test the hypothesis using the relevant p-value.
(7) (5 points) Test the single linear restriction θ1 = 3θ2 using the relevant
test statistic.
(8) (3 points) Test the single linear restriction θ1 = 3θ2 using the relevant
p-value.
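The mechanics of testing a single linear restriction such as θ1 = 3θ2 can be sketched on synthetic data. Everything below is an assumption made for illustration only: the simulated regressors named age and bmi, the coefficient values, the homoskedastic covariance formula, and the helper name restriction_t_stat. None of it uses the assignment's data set. The restriction is written as r'θ = 0 with r = (0, 1, −3), and the statistic is r'θ̂ divided by the square root of r'V̂r.

```python
import numpy as np

def restriction_t_stat(X, y, r, q=0.0):
    """t-statistic for H0: r'theta = q in the OLS model y = X theta + u.

    Hypothetical helper; assumes homoskedastic errors.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    theta_hat = XtX_inv @ X.T @ y
    resid = y - X @ theta_hat
    sigma2_hat = resid @ resid / (n - k)   # error-variance estimate
    cov_hat = sigma2_hat * XtX_inv         # estimated covariance of theta_hat
    se = np.sqrt(r @ cov_hat @ r)          # standard error of r'theta_hat
    return (r @ theta_hat - q) / se, theta_hat

# Synthetic data (assumed values, not the medical-cost data).
rng = np.random.default_rng(0)
n = 2000
age = rng.uniform(18, 65, n)
bmi = rng.uniform(18, 45, n)
X = np.column_stack([np.ones(n), age, bmi])
theta_true = np.array([1.0, 0.6, 0.2])     # chosen so that theta1 = 3 * theta2
y = X @ theta_true + rng.normal(0, 1, n)

r = np.array([0.0, 1.0, -3.0])             # encodes theta1 - 3 * theta2 = 0
t_stat, theta_hat = restriction_t_stat(X, y, r)
print("theta_hat:", theta_hat)
print("t statistic for theta1 = 3*theta2:", t_stat)
```

Because the restriction holds in the simulated model, the statistic should usually be small in absolute value. With real data, it would be compared with a critical value from the t distribution with n − k degrees of freedom.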
(11) (4 points) Using the estimated model, predict medical costs for a
50-year-old person with bmi = 36 and 4 children. Is the prediction lower or
higher than the mean of the distribution of the medical costs? (Recall
that the regression gives you a prediction for the log of the medical
costs (say, log(y)), not for the medical costs (say, y). Hence, after you
find the prediction for the log of the medical costs, you need to make
a transformation to find a prediction for the medical costs themselves.
Hint: if log(y) is normal, y is lognormal. What is E(y) for a lognormal
random variable?)
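The hint can be checked numerically. If log(y) is normal with mean μ and variance σ², then E(y) = exp(μ + σ²/2), so simply exponentiating the predicted log understates the mean. The sketch below uses made-up values for the predicted log cost and the error variance (both are assumptions, not estimates from the assignment's data) and compares the formula with a Monte Carlo average.

```python
import math
import random

def lognormal_mean(mu, sigma2):
    """E[y] when log(y) ~ N(mu, sigma2)."""
    return math.exp(mu + 0.5 * sigma2)

# Hypothetical inputs: a predicted log cost and an estimated error variance.
mu_hat, sigma2_hat = 9.0, 0.25

naive = math.exp(mu_hat)                        # ignores the variance term
corrected = lognormal_mean(mu_hat, sigma2_hat)  # uses exp(mu + sigma2 / 2)

# Monte Carlo check: average of exp(z) with z ~ N(mu_hat, sigma2_hat).
random.seed(0)
sd = math.sqrt(sigma2_hat)
draws = [math.exp(random.gauss(mu_hat, sd)) for _ in range(200_000)]
mc_mean = sum(draws) / len(draws)

print("naive exp(mu):      ", naive)
print("exp(mu + sigma2/2): ", corrected)
print("Monte Carlo mean:   ", mc_mean)
```

The simulated mean should sit close to the corrected value and above the naive one. The gap grows with the error variance.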
Now, take the categorical variables into account using dummy variables
(https://en.wikipedia.org/wiki/Dummy_variable_(statistics)).
(12) (3 points) How much more (or less) do males spend relative to females
(controlling for all other variables)?
(13) (3 points) How much more (or less) do smokers spend relative to
nonsmokers (controlling for all other variables)?
(14) (3 points) In which region are medical costs higher (controlling for all
other variables)?
(15) (3 points) What is the difference in medical costs between the northeast
and the southwest (controlling for all other variables)?
(16) (4 points) Are the coefficients associated with the dummies individually
statistically significant?
(17) (4 points) Using your model, predict medical costs for a 50-year-old
male smoker with bmi = 36 who lives in the southwest and has 4
children.
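A minimal sketch of dummy-variable coding follows, assuming the dependent variable is still the log of medical costs. The data are simulated, and the category labels, coefficient values, and the helper one_hot_drop_first are hypothetical. One category per variable is omitted as the reference group, so each dummy coefficient measures a difference relative to that group, holding the other regressors fixed.

```python
import numpy as np

def one_hot_drop_first(labels):
    """Dummy columns for every category except the first (sorted) one,
    which serves as the reference group. Hypothetical helper."""
    cats = sorted(set(labels))
    kept = cats[1:]
    D = np.column_stack([[1.0 if v == c else 0.0 for v in labels] for c in kept])
    return D, kept

# Simulated data (assumed values, not the assignment's data set).
rng = np.random.default_rng(1)
n = 5000
smoker = rng.choice(["no", "yes"], size=n)
region = rng.choice(["northeast", "northwest", "southeast", "southwest"], size=n)
age = rng.uniform(18, 65, n)

# Log costs: smokers +1.5 in logs, southwest -0.1 relative to northeast.
log_y = (7.0 + 0.03 * age + 1.5 * (smoker == "yes")
         - 0.1 * (region == "southwest") + rng.normal(0, 0.4, n))

D_smoker, smoker_names = one_hot_drop_first(smoker.tolist())
D_region, region_names = one_hot_drop_first(region.tolist())
X = np.column_stack([np.ones(n), age, D_smoker, D_region])
names = (["const", "age"]
         + [f"smoker_{c}" for c in smoker_names]
         + [f"region_{c}" for c in region_names])

theta_hat, *_ = np.linalg.lstsq(X, log_y, rcond=None)
coef = dict(zip(names, theta_hat))

# In a log model, a dummy coefficient b multiplies the predicted cost by
# exp(b) relative to the reference group, other regressors held fixed.
smoker_ratio = np.exp(coef["smoker_yes"])
for name, value in coef.items():
    print(f"{name:>18s}: {value: .4f}")
print("smoker / nonsmoker cost ratio:", smoker_ratio)
```

With northeast as the omitted region, the coefficient on region_southwest is directly the northeast-southwest difference in logs. A difference between two non-reference regions would be the difference of their two coefficients.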
(1) (4 points) Show that the sample variance $s_x^2$ is biased for the true
variance $\sigma^2$.
(2) (3 points) How would you correct the bias?
(3) (3 points) What is the bias of the infeasible variance estimator
$s_{x,\mathrm{inf}}^2 = \frac{1}{T}\sum_{t=1}^{T}(x_t - \mu)^2$? Why am I calling this estimator infeasible?
(2)
(4) (4 points) Show that $s_x^2$ is consistent for $\sigma^2$ by applying the LLN to
(a), (b) and (c) in Eq. (1).
(5) (4 points) Show that $\sqrt{T}(s_x^2 - \sigma^2)$ is asymptotically normal by
applying the LLN, the CLT and Slutsky's theorem to $(a^*)$, $(b^*)$ and $(c^*)$
in Eq. (2).
Notice that consistency is a statement about sample averages, like $s_x^2$,
converging (as $T \to \infty$) to expected values. Asymptotic normality is a statement
about demeaned (by $\sigma^2$, in our example) and standardized (by $\sqrt{T}$, in
our example) sample averages, like $\sqrt{T}(s_x^2 - \sigma^2)$, converging (as $T \to \infty$) to
a mean-zero normal distribution.
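The asymptotic-normality statement can be illustrated by simulation. The sketch below assumes $s_x^2 = \frac{1}{T}\sum_{t=1}^{T}(x_t - \bar{x})^2$ (divisor $T$) and Uniform(0, 1) observations, for which $\sigma^2 = 1/12$ and the fourth central moment is $\mu_4 = 1/80$. It compares the simulated variance of $\sqrt{T}(s_x^2 - \sigma^2)$ with the standard limit $\mu_4 - \sigma^4$. The sample size, number of replications, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
T, reps = 1000, 4000          # sample size and number of replications (assumed)

sigma2 = 1.0 / 12.0           # variance of Uniform(0, 1)
mu4 = 1.0 / 80.0              # fourth central moment of Uniform(0, 1)

# Each row is one sample of size T.
x = rng.uniform(0.0, 1.0, size=(reps, T))
s2 = x.var(axis=1)            # ddof=0, i.e. divides by T
z = np.sqrt(T) * (s2 - sigma2)

print("mean of z:        ", z.mean())
print("variance of z:    ", z.var())
print("theory mu4-sigma^4:", mu4 - sigma2 ** 2)
```

The mean of z should be close to zero and its variance close to the theoretical value. A histogram of z would show the approximately normal shape.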
(6) (6 points) Use my sample Python code from Lecture 1 to write code
that shows the consistency of $s_x^2$. You should draw your observations from
a random variable which is neither exponential nor normal.
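One possible shape for such a simulation is sketched below. It does not reproduce the Lecture 1 code. It assumes $s_x^2$ is the divisor-$T$ sample variance and draws from a Uniform(0, 2) distribution, whose variance is $(b - a)^2/12 = 1/3$. The sample sizes and seed are arbitrary, and the function name sample_variance is hypothetical.

```python
import numpy as np

def sample_variance(x):
    """(1/T) * sum (x_t - xbar)^2, the assumed definition of s_x^2."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** 2)

rng = np.random.default_rng(3)
a, b = 0.0, 2.0
sigma2 = (b - a) ** 2 / 12.0           # true variance of Uniform(a, b)

sample_sizes = [10, 100, 1_000, 10_000, 100_000, 1_000_000]
errors = []
for T in sample_sizes:
    x = rng.uniform(a, b, size=T)      # neither exponential nor normal
    s2 = sample_variance(x)
    errors.append(abs(s2 - sigma2))
    print(f"T = {T:>9d}   s2 = {s2:.6f}   |s2 - sigma2| = {errors[-1]:.6f}")
```

The absolute error should shrink toward zero as T grows, although not necessarily monotonically in a single run. Averaging the error over many replications at each T would give a smoother picture.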