Properties of Sums: Problem Set 1 - Due July 16th ECON 139/239 2010 Summer Term II
Properties of Sums
$\sum_{i=1}^{n} x_i$ stands for the sum $x_1 + x_2 + \dots + x_{n-1} + x_n$.
Some useful properties of sums are:
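$$\sum_{i=1}^{N} a = Na$$
$$\sum_{i=1}^{n} a x_i = a \sum_{i=1}^{n} x_i$$
$$\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$$
$$\sum_{i=1}^{n} \sum_{j=1}^{m} x_i y_j = \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{j=1}^{m} y_j\right)$$
$$\sum_{i=1}^{n} \sum_{j=1}^{m} (x_{ij} + y_{ij}) = \sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij} + \sum_{i=1}^{n} \sum_{j=1}^{m} y_{ij}$$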
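$$\sum_{i=1}^{n} \sum_{j=1}^{m} x_i y_{ij} = \sum_{i=1}^{n} x_i \left(\sum_{j=1}^{m} y_{ij}\right)$$
Here $x_i$ can be taken out of the sum indexed by $j$ because it does not depend on $j$. Taking $y_{ij}$ out of that sum instead would be wrong. Why? Because $y_{ij}$ depends on both indices ($i$ and $j$), so we can’t take it out of the sum indexed by $j$.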
Finally, you should also be aware that
$$\sum_{i=1}^{n} \sum_{j=1}^{n} x_i y_j \neq \sum_{i=1}^{n} x_i y_i$$
If you find some of the above properties difficult to follow, please practice using the following
exercise. It will not be graded, so you do not need to submit the answers.
Given $x_1 = 2$, $x_2 = 5$, $x_3 = 3$, $x_4 = -2$, $x_5 = 7$, $x_6 = 3$, compute (think of organizing
your computations efficiently!):
(a) $\sum_{i=1}^{6} x_i$   (b) $\sum_{i=2}^{4} x_i$   (c) $\sum_{i=1}^{6} x_i x_{7-i}$   (d) $\sum_{i=1}^{6} x_i^2$   (e) $\sum_{i=1}^{6} (x_i - 3)$
For the following two problems, try to avoid computing and express the required sums
in terms of the ones you’ve already computed in a–e before:
(f) $\sum_{i=1}^{6} \sum_{k=0}^{2} x_i k$   (g) $\sum_{i=1}^{6} \sum_{k=0}^{2} x_i^k$
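For readers who want to check their hand computations, the sums in this exercise can be evaluated with a short Python sketch. The variable names are illustrative; the list is padded at index 0 so that `x[i]` matches $x_i$. Parts (f) and (g) are computed both by brute force and through the properties of sums, assuming the double sums are read as $\sum_i \sum_k x_i k$ and $\sum_i \sum_k x_i^k$.

```python
# Values from the exercise; x[0] is a placeholder so that x[i] matches x_i.
x = [None, 2, 5, 3, -2, 7, 3]
n = 6

a = sum(x[i] for i in range(1, n + 1))             # (a) sum of x_i, i = 1..6
b = sum(x[i] for i in range(2, 5))                 # (b) sum of x_i, i = 2..4
c = sum(x[i] * x[7 - i] for i in range(1, n + 1))  # (c) sum of x_i * x_{7-i}
d = sum(x[i] ** 2 for i in range(1, n + 1))        # (d) sum of x_i squared
e = sum(x[i] - 3 for i in range(1, n + 1))         # (e) sum of (x_i - 3)

# (f) and (g) by brute force over both indices, k = 0, 1, 2
f_direct = sum(x[i] * k for i in range(1, n + 1) for k in range(3))
g_direct = sum(x[i] ** k for i in range(1, n + 1) for k in range(3))

# The same two sums expressed through (a) and (d):
# (f) = (0 + 1 + 2) * sum x_i, and (g) = sum 1 + sum x_i + sum x_i^2
f_short = (0 + 1 + 2) * a
g_short = n + a + d

print(a, b, c, d, e)
print(f_direct, f_short, g_direct, g_short)
```

The shortcut versions illustrate why organizing the computation pays off: the constant factor and the additive pieces come out of the double sum, so nothing new has to be added up.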
Graded Problems:
1. An Econometrics instructor wonders if midterm scores are higher for the students who
handed it in early (before the end of the exam) compared to the students who worked
on it until the end of the 75-minute exam period.
Let R denote a dummy (binary) variable equal to one if the student hands in the midterm
early. Let H denote a dummy variable equal to one if the student scored above the median
on the midterm. Suppose that the joint distribution of R and H is known and is given
by the following table:
         H = 0    H = 1    Total
R = 0     0.4      0.3      0.7
R = 1     0.1      0.2      0.3
Total     0.5      0.5
Solution:
Since H is binary,
$$E(H \mid R = 0) = P(H = 1 \mid R = 0) = \frac{P(H=1, R=0)}{P(R=0)} = \frac{0.3}{0.7} = \frac{3}{7},$$
$$E(H \mid R = 1) = P(H = 1 \mid R = 1) = \frac{P(H=1, R=1)}{P(R=1)} = \frac{0.2}{0.3} = \frac{2}{3}.$$
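The conditional expectations above can be reproduced from the joint table with exact fractions. The following is a minimal sketch; the function names are hypothetical, and the covariance line uses the fact that for two dummy variables $E(RH) = P(R=1, H=1)$.

```python
from fractions import Fraction

# Joint distribution P(R = r, H = h) from the table, keyed by (r, h).
joint = {
    (0, 0): Fraction(4, 10), (0, 1): Fraction(3, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 10),
}

def prob_r(r):
    """Marginal probability P(R = r)."""
    return sum(p for (ri, _), p in joint.items() if ri == r)

def prob_h(h):
    """Marginal probability P(H = h)."""
    return sum(p for (_, hi), p in joint.items() if hi == h)

def cond_mean_h(r):
    """E(H | R = r) = sum over h of h * P(R = r, H = h) / P(R = r)."""
    return sum(h * p for (ri, h), p in joint.items() if ri == r) / prob_r(r)

# Cov(R, H) = E(RH) - E(R)E(H); for dummies E(RH) = P(R = 1, H = 1).
cov_rh = joint[(1, 1)] - prob_r(1) * prob_h(1)

print(cond_mean_h(0), cond_mean_h(1), cov_rh)
```

A nonzero covariance means the joint probability differs from the product of the marginals, which is also the comparison needed when checking independence.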
(c) Are R and H independent? Justify your answer.
(d) In part (b), you have found that R and H are positively correlated. Does this mean
that students should hand in their midterms earlier to get a better grade? Explain.
Solution: No, correlation does not prove causation. There could be other factors
that affect both R and H and make them correlated. For example,
students who are better prepared are likely to solve the exam faster and get a
better grade.
3. Let Y follow a binomial distribution with parameters (n, p), i.e., the probability function
of Y is
$$P(Y = y) = \frac{n!}{y!(n-y)!}\, p^y (1-p)^{n-y} \qquad (1)$$
Calculate the limiting distribution of $Y/n$ when $n \to \infty$. (Hint: represent Y as a sum
of Bernoulli random variables.) Justify your answer thoroughly.
Solution: A binomial random variable Y can be represented as a sum of n iid
Bernoulli random variables $X_i$ with mean $p$ and variance $p(1-p)$:
$$Y = \sum_{i=1}^{n} X_i \qquad (2)$$
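Because $Y/n$ is then the sample mean of iid Bernoulli variables, the law of large numbers suggests it concentrates around $p$ as $n$ grows. The simulation below is only an illustration of that behaviour, not a proof; the value `p = 0.3`, the seeds, and the function name are arbitrary choices made for the sketch.

```python
import random

def binomial_fraction(n, p, rng):
    """Draw Y ~ Binomial(n, p) as a sum of n Bernoulli(p) draws and return Y / n."""
    y = sum(1 for _ in range(n) if rng.random() < p)
    return y / n

rng = random.Random(0)   # fixed seed so the run is reproducible
p = 0.3                  # illustrative value, not taken from the problem set
for n in (10, 1_000, 100_000):
    print(n, binomial_fraction(n, p, rng))
```

With larger `n` the printed fractions should sit closer and closer to `p`, which is what convergence in probability to a constant looks like in a simulation.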
4. In this problem, you should show each step of your proofs, justifying each one of them
as precisely as you can.
Suppose you want to estimate $\sigma_X^2$, the variance of a random variable $X$. We want to
study the properties of a couple of estimators for the variance. You have an iid sample
of $n$ observations. Let $\mu_X = E(X)$. You know that by definition, $\sigma_X^2 = E\left[(X - \mu_X)^2\right]$.
Note that the variance is just an expected value. How do we estimate an expected value?
Using a sample mean! So, one estimator for the variance might be
$$\hat{\sigma}_X^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X)^2 \qquad (1)$$
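(a) Show that $E[\hat{\sigma}_X^2] = \sigma_X^2$.
Solution:
$$E[\hat{\sigma}_X^2] = E\left[\frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X)^2\right] = \frac{1}{n} \sum_{i=1}^{n} E\left[(X_i - \mu_X)^2\right] = \frac{1}{n} \cdot n \sigma_X^2 = \sigma_X^2$$
The last step uses $E\left[(X_i - \mu_X)^2\right] = \sigma_X^2$ for every $i$, which holds because the sample is iid.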
Note that to use (1) above, you need to know $\mu_X$. But now suppose that you have
to estimate $\mu_X$ as well. Of course, we will use a sample mean for this parameter
too. Then the estimator for the variance becomes a “two-step estimator”. In a first
step you estimate $\mu_X$ by using $\bar{X}$, then you plug this estimate into (1) to obtain
$\hat{\sigma}_X^2$. We know that the sample mean is defined as
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
We also know that the sample mean is an unbiased estimator for $\mu_X$. That is,
$E(\bar{X}) = \mu_X$.
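(b) Prove that
$$\operatorname{Var}(\bar{X}) = \frac{\sigma_X^2}{n}$$
Solution:
$$\operatorname{Var}(\bar{X}) = \operatorname{Var}\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n^2} \operatorname{Var}\left(\sum_{i=1}^{n} X_i\right)$$
Again, since we have an iid sample the covariances between $X_i$ and $X_j$ are all
zero and therefore we can express the variance of the sum as the sum of the
variances:
$$\operatorname{Var}(\bar{X}) = \frac{1}{n^2} \left(\sum_{i=1}^{n} \operatorname{Var}(X_i)\right) = \frac{1}{n^2} \left(\sum_{i=1}^{n} \sigma_X^2\right)$$
and therefore:
$$\operatorname{Var}(\bar{X}) = \frac{\sigma_X^2}{n}$$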
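The result in (b) can be checked exactly on a small example. The sketch below assumes a hypothetical three-point distribution for $X$ (the values and probabilities are illustrative, not from the problem set), enumerates every possible iid sample of size $n$, and compares $\operatorname{Var}(\bar{X})$ with $\sigma_X^2 / n$ using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical discrete distribution for X (illustrative values only).
values = [0, 1, 4]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

mu = sum(v * p for v, p in zip(values, probs))
sigma2 = sum((v - mu) ** 2 * p for v, p in zip(values, probs))

def var_of_sample_mean(n):
    """Exact Var(X-bar) for an iid sample of size n, by enumerating every sample."""
    total = Fraction(0)
    for sample in product(range(len(values)), repeat=n):
        prob = Fraction(1)
        for idx in sample:
            prob *= probs[idx]          # independence: probabilities multiply
        xbar = Fraction(sum(values[idx] for idx in sample), n)
        total += prob * (xbar - mu) ** 2  # E(X-bar) = mu, so this sums to Var(X-bar)
    return total

for n in (1, 2, 3, 4):
    print(n, var_of_sample_mean(n), sigma2 / n)
```

Enumeration is only feasible for tiny samples, but it avoids simulation noise, so the two printed columns agree exactly.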
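(c) Given that you have to use $\bar{X}$ instead of $\mu_X$ in equation (1), the new estimator
becomes
$$\tilde{\sigma}_X^2 = \frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2$$
Show that the above expression can be rewritten as
$$\tilde{\sigma}_X^2 = \frac{1}{n} \sum_{i=1}^{n} \left[(X_i - \mu_X) - \left(\bar{X} - \mu_X\right)\right]^2 \qquad (2)$$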
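Solution:
$$\tilde{\sigma}_X^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
Adding and subtracting $\mu_X$ inside the parentheses of every term:
$$\tilde{\sigma}_X^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X + \mu_X - \bar{X})^2$$
and therefore:
$$\tilde{\sigma}_X^2 = \frac{1}{n} \sum_{i=1}^{n} \left[(X_i - \mu_X) - (\bar{X} - \mu_X)\right]^2$$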
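(d) Expand the square in (2) and rearrange the terms to show that
$$\tilde{\sigma}_X^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X)^2 + \frac{1}{n} \sum_{i=1}^{n} \left(\bar{X} - \mu_X\right)^2 - \frac{2}{n} \sum_{i=1}^{n} (X_i - \mu_X)\left(\bar{X} - \mu_X\right) \qquad (3)$$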
Solution:
$$E\left[\frac{1}{n} \sum_{i=1}^{n} (\bar{X} - \mu_X)^2\right] = E\left[(\bar{X} - \mu_X)^2\right]$$
This is due to the fact that the summation is over terms that do not depend
on $i$ and therefore can be thought of as computing the arithmetic average of a
constant. The resulting expectation is nothing but the variance of $\bar{X}$ which has
already been computed in part (b):
$$E\left[\frac{1}{n} \sum_{i=1}^{n} (\bar{X} - \mu_X)^2\right] = \frac{\sigma_X^2}{n}$$
(f) Next, prove that $\frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X)\left(\bar{X} - \mu_X\right) = \bar{X}^2 - 2\mu_X \bar{X} + \mu_X^2$.
Solution:
$$\frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X)(\bar{X} - \mu_X) = (\bar{X} - \mu_X) \cdot \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X)$$
because the term $(\bar{X} - \mu_X)$ does not depend on $i$ it can be taken out of the
summation. Computing the summation of the remaining term yields:
$$\frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X)(\bar{X} - \mu_X) = (\bar{X} - \mu_X) \cdot (\bar{X} - \mu_X) = \bar{X}^2 - 2\bar{X}\mu_X + \mu_X^2$$
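(g) Now show that $E\left[\bar{X}^2 - 2\mu_X \bar{X} + \mu_X^2\right] = \frac{\sigma_X^2}{n}$. (This is tricky. Think about the
variance of the sample mean.)
Solution: The easiest way to show this is to realize that the term inside
the expectation is a perfect square,
so that
$$E\left[\bar{X}^2 - 2\mu_X \bar{X} + \mu_X^2\right] = E\left[(\bar{X} - \mu_X)^2\right] = \operatorname{Var}(\bar{X}) = \frac{\sigma_X^2}{n}$$
Alternatively, you can calculate this expectation term by term:
$$E\left[\bar{X}^2 - 2\bar{X}\mu_X + \mu_X^2\right] = E\left[\bar{X}^2\right] - 2\mu_X \cdot E\left[\bar{X}\right] + \mu_X^2$$
and therefore, since $E[\bar{X}] = \mu_X$:
$$E\left[\bar{X}^2 - 2\bar{X}\mu_X + \mu_X^2\right] = E\left[\bar{X}^2\right] - \mu_X^2$$
The term on the RHS is the variance of the sample mean $\bar{X}$: remember that
$\operatorname{Var}(z) = E[(z - \mu_z)^2] = E[z^2] - \mu_z^2$. Hence
$$E\left[\bar{X}^2 - 2\bar{X}\mu_X + \mu_X^2\right] = \frac{\sigma_X^2}{n}$$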
(h) Finally, using the results from the previous steps, show that
$$E\left[\tilde{\sigma}_X^2\right] = \frac{n-1}{n} \sigma_X^2$$
Solution: Putting these results together:
$$E[\tilde{\sigma}_X^2] = \sigma_X^2 + \frac{\sigma_X^2}{n} - 2\frac{\sigma_X^2}{n} = \frac{n-1}{n} \sigma_X^2$$
(i) So, is $\tilde{\sigma}_X^2$ a biased or unbiased estimator of the variance $\sigma_X^2$? If you think it’s
unbiased, prove it. If not, calculate the bias and find an unbiased estimator.
Solution: $\tilde{\sigma}_X^2$ is a biased estimator of the variance $\sigma_X^2$ since its expectation has
a different value:
$$E[\tilde{\sigma}_X^2] \neq \sigma_X^2$$
The bias is given by the difference between the expected value of the estimator
and the true parameter value:
$$\text{bias} = \frac{n-1}{n} \sigma_X^2 - \sigma_X^2 = -\frac{1}{n} \sigma_X^2$$
To find an unbiased estimator, start from the expected value of $\tilde{\sigma}_X^2$:
$$E[\tilde{\sigma}_X^2] = \frac{n-1}{n} \sigma_X^2$$
Now, multiply both sides of the equality by $\frac{n}{n-1}$:
$$\frac{n}{n-1} \cdot E[\tilde{\sigma}_X^2] = \sigma_X^2$$
Therefore, if what is on the LHS were the expected value of an estimator, that estimator would be
unbiased. The term outside the expected value is a constant and therefore it can
be put inside the expectation (it is the opposite operation of taking a constant
out):
$$E\left[\frac{n}{n-1} \cdot \tilde{\sigma}_X^2\right] = \sigma_X^2$$
Now let’s substitute $\tilde{\sigma}_X^2$ with its expression:
$$E\left[\frac{n}{n-1} \cdot \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2\right] = \sigma_X^2$$
Hence,
$$E\left[\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2\right] = \sigma_X^2$$
As can be seen, an unbiased estimator of the variance can be obtained by simply
dividing the sum of squared deviations from the mean by the sample size minus
one ($n - 1$) instead of by the sample size ($n$).
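The bias factor $(n-1)/n$ and the effect of dividing by $n - 1$ can be verified exactly on a toy example. The sketch below assumes a hypothetical three-point distribution for $X$ (illustrative values, not from the problem set) and enumerates all iid samples of size $n$ to compute the expected value of both estimators with exact fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical discrete distribution for X (illustrative values only).
values = [0, 1, 4]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

mu = sum(v * p for v, p in zip(values, probs))
sigma2 = sum((v - mu) ** 2 * p for v, p in zip(values, probs))

def expected_estimators(n):
    """Exact expectations of the divide-by-n and divide-by-(n-1) estimators, n >= 2."""
    e_div_n = Fraction(0)
    e_div_n_minus_1 = Fraction(0)
    for sample in product(range(len(values)), repeat=n):
        prob = Fraction(1)
        for idx in sample:
            prob *= probs[idx]
        xs = [values[idx] for idx in sample]
        xbar = Fraction(sum(xs), n)
        ss = sum((x - xbar) ** 2 for x in xs)  # squared deviations from X-bar
        e_div_n += prob * ss / n
        e_div_n_minus_1 += prob * ss / (n - 1)
    return e_div_n, e_div_n_minus_1

for n in (2, 3, 4):
    biased, unbiased = expected_estimators(n)
    print(n, biased, Fraction(n - 1, n) * sigma2, unbiased, sigma2)
```

For each `n` the first two printed numbers coincide, and so do the last two, matching the algebra in parts (h) and (i).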
(j) Do you think $\tilde{\sigma}_X^2$ is a consistent estimator for the variance $\sigma_X^2$? Justify your answer.
Solution: The formal proof of the consistency of the sample variance can be
found in Appendix 3.3 of the textbook. Here I’ll provide an “intuitive”
justification for the answer.
Using the law of large numbers (LLN), it is fairly straightforward to see why
$$\hat{\sigma}_X^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X)^2$$
is a consistent estimator of $\sigma_X^2$. How? Notice that if we define $Y_i = (X_i - \mu_X)^2$,
then using the LLN we can conclude that $\bar{Y} = \frac{1}{n} \sum Y_i = \frac{1}{n} \sum (X_i - \mu_X)^2$ is
a consistent estimator of $\mu_Y = E(Y) = E\left[(X_i - \mu_X)^2\right] \equiv \sigma_X^2$. Now, since we
know that $\bar{X} \xrightarrow{p} \mu_X$, a similar argument applies to
$$\tilde{\sigma}_X^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
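$$E(W_3) = E\left(\frac{Y_1}{2}\right) + E\left(\frac{1}{2(n-1)} \sum_{i=2}^{n} Y_i\right) = \frac{E(Y_1)}{2} + \frac{1}{2(n-1)} \sum_{i=2}^{n} E(Y_i) = \frac{\mu}{2} + \frac{\mu}{2} = \mu$$
$$\operatorname{plim} W_1 = \operatorname{plim} \bar{Y} = \mu$$
$$\operatorname{plim} W_2 = \operatorname{plim}\left(\frac{n}{n+3} \cdot \frac{1}{n} \sum_{i=1}^{n} Y_i\right) = \operatorname{plim} \frac{n}{n+3} \times \operatorname{plim} \frac{1}{n} \sum_{i=1}^{n} Y_i = 1 \times \operatorname{plim} \bar{Y} = \mu$$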
Now, the second part of $W_3$, $\frac{1}{2(n-1)} \sum_{i=2}^{n} Y_i$, converges in probability to $0.5\mu$.
However, the first part of $W_3$, $Y_1/2$, does not depend on the sample size $n$ and
therefore does not converge to a constant as $n$ goes to infinity. Therefore, $W_3$
does not converge in probability to $\mu$ and thus is an inconsistent estimator of
the mean.
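The contrast between the three estimators can be seen in a small simulation. The sketch below takes the definitions implied by the formulas above ($W_1 = \bar{Y}$, $W_2 = \frac{1}{n+3} \sum Y_i$, $W_3 = \frac{Y_1}{2} + \frac{1}{2(n-1)} \sum_{i=2}^{n} Y_i$) and assumes, purely for illustration, normally distributed data with mean 5 and standard deviation 2. It reports how much each estimator varies across repeated samples.

```python
import random
import statistics

def estimators(sample):
    """Return (W1, W2, W3) for one sample, using the definitions stated in the lead-in."""
    n = len(sample)
    w1 = sum(sample) / n
    w2 = sum(sample) / (n + 3)
    w3 = sample[0] / 2 + sum(sample[1:]) / (2 * (n - 1))
    return w1, w2, w3

def spread(n, reps, mu, sigma, rng):
    """Standard deviation of each estimator across `reps` simulated samples of size n."""
    draws = [estimators([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]
    return tuple(statistics.pstdev(column) for column in zip(*draws))

rng = random.Random(0)  # fixed seed for reproducibility
for n in (10, 100, 1_000):
    s1, s2, s3 = spread(n, reps=500, mu=5.0, sigma=2.0, rng=rng)
    print(n, round(s1, 3), round(s2, 3), round(s3, 3))
```

In runs of this kind the spread of $W_1$ and $W_2$ shrinks as $n$ grows, while the spread of $W_3$ stays near $\sigma/2$ because of the $Y_1/2$ term, which is the behaviour the argument above describes.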
(c) Our old friend from Princeton, Drew, suggested using $e^{W_1}$ as an estimator of $e^{\mu}$.
Explain whether it is, or is not, a good estimator.
$$E(e^{W_1}) \neq e^{E(W_1)} = e^{\mu}$$
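The inequality can be made concrete with a toy case. The sketch below assumes, for illustration only, that each $Y_i$ equals 0 or 2 with probability 1/2 (so $\mu = 1$) and computes $E(e^{\bar{Y}})$ exactly from the binomial distribution of the number of draws equal to 2.

```python
import math

def expected_exp_of_mean(n):
    """E[exp(Y-bar)] when Y_1..Y_n are iid, each equal to 0 or 2 with probability 1/2."""
    # Y-bar = 2K/n, where K ~ Binomial(n, 1/2) counts the draws equal to 2.
    return sum(math.comb(n, k) / 2 ** n * math.exp(2 * k / n) for k in range(n + 1))

mu = 1.0  # mean of the illustrative distribution
for n in (1, 2, 10, 100):
    print(n, expected_exp_of_mean(n), math.exp(mu))
```

In this example the expectation of $e^{\bar{Y}}$ exceeds $e^{\mu}$ for every sample size shown, and the gap narrows as $n$ increases.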
6. (Past Exam Question) As a soon-to-be college graduate, you are interested in estimating
the salary you can expect to receive when you graduate from Duke. Although
you believe you are an above average student, you decide that you would be satisfied
knowing something about the salaries of typical Duke graduates. To help with your
estimation, the Dean provides you with a sample of 200 recent Duke graduates. The
sample mean salary is $78,000 and the sample standard deviation is $7,000.
(a) What’s the first thing you would like to know about the sample? Why?
Solution: You would like to know that the sample is in fact a random sample.
If not, the point estimates of the population parameters you calculate will be
biased and so will your confidence intervals and hypothesis tests.
(b) Assuming that the Dean’s answer to the previous question is satisfactory to you,
construct a 95% confidence interval for the mean salary of Duke graduates. Interpret
this confidence interval.
Solution: The CI takes the form $\bar{X} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} = 78000 \pm 1.96 \cdot \frac{7000}{\sqrt{200}} = (77{,}030;\ 78{,}970)$.
The confidence interval is the set of all null hypotheses about
the population mean that would not be rejected at the 5% level. Alternatively,
it is an interval constructed so that there is a 95% chance that the interval will
contain the true population mean µ.
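The interval can be reproduced with Python's standard library. This is a minimal sketch: the function name and the `level` parameter are illustrative, and the critical value comes from the standard normal distribution, as in the solution.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(xbar, s, n, level=0.95):
    """Large-sample two-sided confidence interval for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 when level = 0.95
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

low, high = mean_confidence_interval(78_000, 7_000, 200)
print(round(low), round(high))
```

The same function applies to any sample mean, standard deviation, and sample size, provided the sample is large enough for the normal approximation to be reasonable.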
(c) At a 5% level of significance, test the hypothesis that the average starting salary
of Duke graduates is greater than $80,000. Write down the null and alternative
hypotheses. What do you conclude?
(d) Your friend Drew, who is a student at Princeton, obtains a sample of 200 recent
Princeton graduates. The sample mean for the Princeton sample is $75,000 with a
sample standard deviation of $6,800. At a 5% level, test the hypothesis that the
average starting salary of Duke graduates is higher than the average starting salary
of Princeton graduates. What do you conclude?
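Solution: Now, $H_0: \mu_D - \mu_P = 0$ vs. $H_A: \mu_D - \mu_P > 0$.
The acceptance region (AR) takes the form $\left(-\infty,\ \mu_D - \mu_P + z_{\alpha} \cdot \sqrt{\frac{s_D^2}{n_D} + \frac{s_P^2}{n_P}}\right)$. Since
$\sqrt{\frac{s_D^2}{n_D} + \frac{s_P^2}{n_P}} = \sqrt{\frac{7000^2 + 6800^2}{200}} \approx 690$, the AR is $(-\infty,\ 0 + 1.64 \cdot 690) = (-\infty,\ 1132)$. Here we have
$\bar{X}_D - \bar{X}_P = 3000$, which lies outside the AR, so we can reject that they are equal.
You could also have calculated the CI $\left(\bar{X}_D - \bar{X}_P - z_{\alpha} \cdot \sqrt{\frac{s_D^2}{n_D} + \frac{s_P^2}{n_P}},\ \infty\right) = (1868, \infty)$.
Since $\mu_D - \mu_P = 0$ is not in the CI we reject the null.
You could also have calculated the p-value:
$$p\text{-value} = \Pr\left(z \geq \frac{3000 - 0}{\sqrt{\frac{s_D^2}{n_D} + \frac{s_P^2}{n_P}}}\right) = \Pr(z \geq 4.35) \approx 0$$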
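The numbers in this solution can be recomputed with a few lines of Python. The sketch uses the rounded critical value 1.64 that appears in the solution (a more precise value is about 1.645, which would shift the bounds slightly); the function name is illustrative.

```python
import math

def one_sided_bounds(diff, s_a, n_a, s_b, n_b, z_alpha=1.64):
    """Standard error, acceptance-region upper bound under H0 (difference = 0),
    one-sided CI lower bound, and z-statistic for a difference in means."""
    se = math.sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)
    margin = z_alpha * se
    acceptance_upper = 0 + margin   # AR is (-inf, margin)
    ci_lower = diff - margin        # CI is (diff - margin, +inf)
    z_stat = diff / se
    return se, acceptance_upper, ci_lower, z_stat

se, ar_upper, ci_lower, z_stat = one_sided_bounds(78_000 - 75_000, 7_000, 200, 6_800, 200)
print(round(se, 2), round(ar_upper), round(ci_lower), round(z_stat, 2))
```

All three routes (acceptance region, one-sided confidence interval, z-statistic) rest on the same standard error, so they necessarily lead to the same decision.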
(e) Your friend Drew, who is taking the econometrics course at Princeton, tells you that
the starting salaries of college graduates are not normally distributed, but have an
asymmetric distribution with a long right tail (it might look like an F distribution,
for example). Your friend claims that this should change the results in parts b and
c. Do you agree? Why or why not?
7. Empirical. Please attach STATA output and supporting graphs to your homework!
The file hprice.csv contains data collected from the real estate pages of the Boston
Globe in 1990. (These are homes selling in Boston, MA area). In particular, we have
the following variables: id, house ID number; price, the selling price of the house in
1000$; bdrms, the number of bedrooms; lotsize, the size of the lot in square feet; sqrft,
the size of the house in square feet; colonial, a dummy variable which is equal to 1 if
the home is colonial style.
(a) Import the dataset using the insheet command or through the Import option under
the File menu.
(b) Scatter plot each of the variables against the house id number. Visually examine
the data. Is there any evidence for outliers or errors in the data?
Solution: These are the scatter plots of the data against the house id number:
[Figure: scatter plots of each variable against id. Panels: Price (price vs id), Number of bedrooms (bdrms vs id), lotsize vs id, sqrft vs id, Colonial Style (colonial vs id).]
A possibly problematic observation is the one with a huge lot size, nearly 9
standard deviations above the mean! It is not necessarily a data error, but is
definitely an outlier which should be kept in mind in later empirical work.
(c) Calculate and report the average values and standard deviations for each variable
except house id number. How many houses built in colonial style are in the sample?
Solution: The summary statistics for the 5 variables of interest follow:
Variable | Obs Mean Std. Dev.
-------------+-----------------------------------
price | 101 288.3074 97.77324
bdrms | 101 3.544554 .8188376
lotsize | 101 8755.287 9559.392
sqrft | 101 1980.574 551.6402
colonial | 101 .7029703 .4592288
-------------+-----------------------------------
The average value of the binary variable colonial gives the proportion of
houses built in colonial style. Thus, the number of houses in colonial style is
0.7030 × 101 = 71.
(d) Construct a 95% confidence interval for the mean price of houses. Interpret this
confidence interval.
Solution: The CI takes the form $\overline{price} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} = 288.31 \pm 1.96 \cdot \frac{97.77}{\sqrt{101}} = (269.24;\ 307.38)$.
The confidence interval is the set of all null hypotheses about
the population mean that would not be rejected at the 5% level. Alternatively,
it is an interval constructed so that there is a 95% chance that the interval will
contain the true population mean µ.
(e) What is the most common number of bedrooms per house in our sample? How
many houses have 5 bedrooms or more? (Hint: use the command tabulate, or tab
for short.)
7 + 1 + 1 = 9.
(f) Do bigger houses typically have more bedrooms? Check your answer visually by
looking at the scatter plot of the two variables and compute a sample correlation
coefficient between the number of bedrooms and the house size. In a similar way,
check whether the price of the house co-moves in the expected direction with number
of bedrooms, lot size and house size.
Solution: The correlation of the number of bedrooms per house and its size
is about 0.52, which is positive and pretty large. We can also detect a strong
positive relationship between the two variables from their scatter plot:
[Figure: Number of bedrooms vs house size (scatter plot; vertical axis: bdrms).]
Price of the house co-moves very strongly with the size of the house (correlation
coefficient is 0.79), with number of bedrooms (0.50) and less so with the size of
the lot (0.35). The scatter plots confirm our intuition that these variables are
important determinants of the price of the house:
[Figures: Price of the house vs house size; Price of the house vs number of bedrooms; Price of the house vs lot size (scatter plots; vertical axis: price).]
Notice how the lot size outlier influences the results in the last graph: the
correlation of price with the size of the lot increases twofold to 0.68 if we omit
this problematic observation.
(g) What is the average price of the house if it is built in colonial style? What is the
average price of the house if it is not built in colonial style? (Hint: think about the
if option of the summarize command, sum for short.) Our friend Drew thinks that the
colonial style houses are more expensive than the ones not built in colonial style.
Test this hypothesis at the 5% level.
Solution: There are 71 houses built in colonial style. Their average price is
295.34 (in thousands of dollars) with a standard deviation of 93.76. Similarly, we have 30 houses not built
in colonial style, with an average price of 271.67 and a standard deviation of
106.48.
We want to test the null that
$$H_0: \mu(\text{colonial}) = \mu(\text{not colonial})$$
against the alternative
$$H_1: \mu(\text{colonial}) > \mu(\text{not colonial})$$
We can calculate the standard error of the difference in sample averages of house
prices in the following way:
$$SE(\overline{price}_c - \overline{price}_{nc}) = \sqrt{\frac{s_c^2}{n_c} + \frac{s_{nc}^2}{n_{nc}}} = \sqrt{\frac{93.76^2}{71} + \frac{106.48^2}{30}} = 22.40$$
The t-statistic is $(295.34 - 271.67)/22.40 \approx 1.06$. As the t-statistic is less than 1.65, the critical value for a one-sided test with a 5%
significance level, we cannot reject the null. Alternatively, the p-value, computed
as $1 - \Phi(t) = 0.14$, is higher than the significance level, so we cannot reject the
null. Therefore, we are unable to confirm our friend’s intuition that the colonial
style houses are more expensive.
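The test can be recomputed from the reported summary statistics. The sketch below uses the rounded figures quoted in the solution, so the p-value may differ in the last digit from one computed on the raw data; the function name is illustrative.

```python
import math
from statistics import NormalDist

def one_sided_mean_test(mean_a, s_a, n_a, mean_b, s_b, n_b):
    """Test statistic and p-value for H0: mu_a = mu_b against H1: mu_a > mu_b,
    using the large-sample normal approximation."""
    se = math.sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)
    t = (mean_a - mean_b) / se
    p_value = 1 - NormalDist().cdf(t)
    return se, t, p_value

# Colonial vs. non-colonial houses, prices in thousands of dollars.
se, t, p_value = one_sided_mean_test(295.34, 93.76, 71, 271.67, 106.48, 30)
critical = NormalDist().inv_cdf(0.95)  # one-sided 5% critical value, about 1.645

print(round(se, 2), round(t, 2), round(p_value, 3), t > critical)
```

The final printed value is `False`, meaning the statistic does not exceed the critical value and the null is not rejected.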