Exercise 6
Exercise 1
Let X1, X2, ..., Xn be a simple random sample of X with μ = E[X]. Consider the following estimator of μ:
$$\check{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i + \frac{1}{n}$$
(a) Explain the difference between μ̌ and the “usual” estimator of the mean, μ̂ = x̄.
Let
$$\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
be the sample mean, i.e. the usual estimator of μ. We can write
$$\check{\mu} = \hat{\mu} + \frac{1}{n}$$
That is, μ̌ is the “usual” estimator μ̂ plus an additional term 1/n.
This is a bad estimator. Why would you add the term 1/n? But it is here to show
you an example of an estimator that is biased but consistent.
Recall that the sample mean is an unbiased estimator of μ, i.e. E(μ̂) = μ. Using this, and
the rules for calculation with expectations, we get:
$$E(\check{\mu}) = E\left(\hat{\mu} + \frac{1}{n}\right) = E(\hat{\mu}) + E\left(\frac{1}{n}\right) = \mu + \frac{1}{n}$$
Hence, we conclude that E(μ̌) = μ + 1/n ≠ μ, so μ̌ is a biased estimator of μ.
The bias is E(μ̌) − μ = 1/n.
Using the rules for calculations with variances (in particular that additive constants do not
alter the variance), we get:
$$V(\check{\mu}) = V\left(\hat{\mu} + \frac{1}{n}\right) = V(\hat{\mu})$$
This shows that the means of two estimators can differ while their variances are the same.
(d) Argue that μ̌ is a consistent estimator of μ.
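$$E(\check{\mu}) = \mu + \frac{1}{n}, \qquad \lim_{n\to\infty} E(\check{\mu}) = \lim_{n\to\infty}\left(\mu + \frac{1}{n}\right) = \mu$$
$$V(\check{\mu}) = V(\hat{\mu}) = \frac{\sigma^2}{n}, \qquad \lim_{n\to\infty} V(\check{\mu}) = \lim_{n\to\infty} \frac{\sigma^2}{n} = 0$$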
Thus, the two conditions for an estimator's consistency are fulfilled: the bias vanishes and the variance goes to zero as n → ∞.
We can conclude that an estimator can be biased and still be consistent.
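The argument can also be checked numerically. The following is a minimal simulation sketch, not part of the original solution: the normal distribution, the values of `mu` and `sigma`, the number of replications, and the helper name `simulate_estimators` are all illustrative assumptions.

```r
# Simulation sketch: compare the sample mean with the shifted estimator.
set.seed(1)          # arbitrary seed, for reproducibility only
mu    <- 5           # assumed true mean (illustrative)
sigma <- 2           # assumed true standard deviation (illustrative)
reps  <- 2000        # number of simulated samples per sample size

simulate_estimators <- function(n) {
  # Each column of the matrix is one simple random sample of size n.
  samples  <- matrix(rnorm(n * reps, mean = mu, sd = sigma), nrow = n)
  mu_hat   <- colMeans(samples)   # usual estimator (sample mean)
  mu_check <- mu_hat + 1 / n      # estimator from the exercise
  c(n          = n,
    bias_hat   = mean(mu_hat) - mu,
    bias_check = mean(mu_check) - mu,
    var_hat    = var(mu_hat),
    var_check  = var(mu_check))
}

# One row per sample size.
results <- t(sapply(c(10, 100, 1000), simulate_estimators))
print(round(results, 4))
```

In each row the two bias columns differ by exactly 1/n and the two variance columns coincide. Both the extra bias and the variance shrink as n grows, which is the behaviour the consistency argument relies on.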
(e) Compare the two estimators, μ̂ and μ̌, in terms of the properties of estimators
(unbiasedness, efficiency, and consistency). Based on this, which of the two estimators would
you recommend using in practice?
Both estimators are consistent. They are also equally efficient, since V(μ̂) = V(μ̌). But μ̌ is
biased, while μ̂ is unbiased. Hence, we would prefer to use μ̂, and not μ̌, in practice.
Exercise 2
A school nurse selects a simple random sample of 200 children’s heights from a school with a
population of 2000 pupils. Assume that the (unknown) population’s mean value is μ = 140cm, and
that its variance is σ² = 64.
(b) Using the distribution from your last answer, compute the probability of obtaining a sample
with an average of less than 136cm.
(c) Find the probability that the sample average will be within 2cm of the population’s mean
value.
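One possible way to compute the probabilities in (b) and (c) in R is sketched below. It assumes that the sample average is approximately normal with mean μ and variance σ²/n (central limit theorem), and it ignores any finite-population correction for drawing 200 out of 2000 pupils. Both are assumptions of this sketch, and the variable names are illustrative.

```r
mu     <- 140   # population mean from the exercise (cm)
sigma2 <- 64    # population variance from the exercise
n      <- 200   # sample size

# Standard deviation of the sample average under the normal approximation.
se <- sqrt(sigma2 / n)

# (b) P(sample average < 136)
p_below <- pnorm(136, mean = mu, sd = se)

# (c) P(138 < sample average < 142), i.e. within 2 cm of the population mean
p_within <- pnorm(mu + 2, mean = mu, sd = se) - pnorm(mu - 2, mean = mu, sd = se)

print(c(se = se, p_below = p_below, p_within = p_within))
```

Since 136 lies about seven standard deviations below the mean, `p_below` is practically zero, while `p_within` is very close to one.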
Exercise 3
In this exercise, you are asked to analyze the outcome of a slot machine. Let the winnings from one
game in the slot machine be denoted by the random variable X. Let μ denote the mean of X, i.e.
μ = E(X), and σ² denote the variance of X, i.e. σ² = Var(X).
(a) Use R to load the data set in the file “tutorial6 slot1.csv”. The file contains the outcome of a
simple random sample of outcomes from the slot machine, X1, X2, . . . , Xn. What is the
sample size n?
barplot(table(dat1))
From a simple random sample of X, we may estimate the mean of X, μ = E(X), via the sample
average. For this, we use the R command mean (note that we should access the values in
dat1 with the syntax dat1$X):
mean(dat1$X)
Our estimate of the mean of X is μ̂ = 9.86.
(e) Also calculate the standard error of the estimate in 3c, SE(μ̂).
sqrt(6.04/50)  # sqrt(s^2 / n), taking 6.04 as the sample variance and 50 as the sample size
The standard error is SE(μ̂) = 0.3475629.
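The same standard error can be computed directly from the data rather than from typed-in numbers. The sketch below uses a simulated stand-in data frame, `dat_example`, because the course file is not reproduced here. The Poisson draws and the seed are purely illustrative and will not reproduce the values above.

```r
# Hypothetical stand-in for dat1: 50 simulated winnings, NOT the course data.
set.seed(42)
dat_example <- data.frame(X = rpois(50, lambda = 10))

n  <- nrow(dat_example)      # sample size
s2 <- var(dat_example$X)     # sample variance (var() divides by n - 1)
se <- sqrt(s2 / n)           # standard error of the sample mean

print(c(n = n, mean = mean(dat_example$X), s2 = s2, se = se))
```

With the real data, replacing `dat_example` by `dat1` should give the same number as the manual calculation, up to rounding of the variance.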
(f) Re-do questions 3a–3e using the simple random sample in the file “tutorial6 slot2.csv”.
Comment on the difference to the answers from above.
(g) Re-do questions 3a–3e using the simple random sample in the file “tutorial6 slot3.csv”.
Comment on the difference to the answers from above.
Exercise 4
In Exercise Set 5, you analyzed a sample of income data in the file “cps15 stat sample1.csv”.
We now analyze these data again.
(a) In Exercise Set 5, you estimated the mean income and mean age of the population.
Now, calculate also the standard errors of your estimates.
mean(dat$hhincome)
## [1] 82785.79
mean(dat$age)
## [1] 42.39437
sqrt(var(dat$hhincome)/nrow(dat))
## [1] 9267.176
sqrt(var(dat$age)/nrow(dat))
## [1] 0.9750641
The standard error of the estimate of the mean income in the population is 9267, and
the standard error of the estimate of the mean age in the population is 0.98.
(b) Load the data set “cps15 stat sample2.csv” and let n be the number of observations.
What is n?
(c) Use this new data set to estimate the mean income and mean age in the population.
Calculate also the standard errors of your estimates.
mean(dat2$hhincome)
## [1] 83025.41
mean(dat2$age)
## [1] 42.93
sqrt(var(dat2$hhincome)/nrow(dat2))
## [1] 2565.261
sqrt(var(dat2$age)/nrow(dat2))
## [1] 0.3387179
(d) Comment on the estimates and standard errors found in 4a and 4c. What accounts for
their differences?
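Because the standard error of a mean is sqrt(s²/n), the sample size is a natural candidate when comparing 4a and 4c. The sketch below illustrates the effect with a hypothetical log-normal "income" population. The population, the two sample sizes, and the helper `se_of_mean` are assumptions of this sketch and are not taken from the CPS files.

```r
set.seed(7)
# Hypothetical population of incomes (log-normal is an assumption, not the CPS data).
population <- rlnorm(100000, meanlog = 11, sdlog = 0.8)

# Standard error of the sample mean: sqrt(s^2 / n).
se_of_mean <- function(x) sqrt(var(x) / length(x))

small <- sample(population, 100)    # small simple random sample
large <- sample(population, 1600)   # sample that is 16 times larger

print(c(mean_small = mean(small), se_small = se_of_mean(small),
        mean_large = mean(large), se_large = se_of_mean(large)))
```

Both sample means estimate the same population mean. A sample 16 times larger gives a standard error roughly sqrt(16) = 4 times smaller, up to sampling noise in the estimated variance.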
Exercise 5
Consider again the following opinion poll from Exercise Set 5 (note: n = 1003 people
responded to the poll):
(a) In Exercise Set 5, you estimated pG, the fraction of the population that will vote for the
“government parties” (A, M, V). Based on the opinion poll, provide the standard error
of your estimate.
(b) Assume that the opinion poll came from a sample of size n = 10 000 instead of n = 1003.
Provide the standard error of your estimate in this case.
(c) Comment on the reason for the difference between the two standard errors calculated
in 5(a) and 5(b). What can you conclude from the two standard errors?
The standard error is much lower in 5(b) compared to 5(a). This is a consequence of the larger
sample size in 5(b). Since the standard error is lower in 5(b), we can be more confident in this
estimate than the estimate in 5(a). (This holds even though the two estimates are actually
equal, i.e. they are both 43%. The second estimate is more precise, in the sense that we are
more confident that it is closer to the true value pG.)
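For reference, the two standard errors can be computed with the usual formula for a sample proportion, sqrt(p̂(1 − p̂)/n). The sketch below takes p̂ = 0.43 from the answer above. The helper name `se_prop` is illustrative.

```r
p_hat <- 0.43   # estimated fraction voting for the government parties

# Standard error of a sample proportion.
se_prop <- function(p, n) sqrt(p * (1 - p) / n)

se_a <- se_prop(p_hat, 1003)     # 5(a): n = 1003
se_b <- se_prop(p_hat, 10000)    # 5(b): n = 10 000

print(c(se_a = se_a, se_b = se_b, ratio = se_a / se_b))
```

The ratio of the two standard errors equals sqrt(10000/1003), so roughly ten times as many respondents gives a standard error about three times smaller.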