Solutions Stat CH 5
Exercise 1
a. We want to compare two population proportions with two independent samples:
use the formula $\hat{p}_1 - \hat{p}_2 \pm c\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$, with $\Phi(c) = 1 - \frac{1}{2}\alpha$, where:
$n_1 = 500$, $n_2 = 500$, difference in proportions $\hat{p}_1 - \hat{p}_2 = \frac{140}{500} - \frac{100}{500} = 0.08$, standard error
$\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = 0.0269$ and $c = 2.575$ is such that $\Phi(c) = 1 - \frac{1}{2}\alpha = 0.995$.
99%-CI($p_1 - p_2$) = (0.08 − 2.575 × 0.0269, 0.08 + 2.575 × 0.0269) ≈ (0.011, 0.149)
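The interval in a. can be checked numerically. The following is a minimal sketch, not part of the original solution: the function name `diff_proportion_ci` is hypothetical, and the critical value is computed with `statistics.NormalDist` instead of being read from the normal table, so it uses c ≈ 2.576 rather than the tabulated 2.575.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, conf=0.99):
    """Large-sample confidence interval for p1 - p2 (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    # Standard error with separate (unpooled) variance estimates
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # c satisfies Phi(c) = 1 - alpha/2
    c = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = p1 - p2
    return diff - c * se, diff + c * se, se


low, high, se = diff_proportion_ci(140, 500, 100, 500)
print(round(se, 4), round(low, 3), round(high, 3))
```

Rounded to three decimals the bounds agree with (0.011, 0.149); the small difference in c does not change the rounded result.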
b. Using this interval we can state that there is a significant difference in proportions: at a
confidence level of 99% the difference in mortality rate is between +1.1 and +14.9 percentage points, so the
difference 0 (= 𝑝1 – 𝑝2 if 𝐻0 is true) is excluded by this interval.
c. For $\hat{p}_1 = \frac{140}{500} = 0.28$ we have: $\hat{p}_1 \pm c\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}} = 0.28 \pm 0.052$, so 99%-CI($p_1$) = (0.228, 0.332)
And for $\hat{p}_2 = \frac{100}{500} = 0.20$: $\hat{p}_2 \pm c\sqrt{\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = 0.20 \pm 0.046$, so 99%-CI($p_2$) = (0.154, 0.246)
These intervals show some overlap, which implies that possibly $p_1 = p_2$.
Remark: it is best to base a statement on the difference $p_1 - p_2$ on an interval estimate of $p_1 - p_2$, and
not on two separate intervals of $p_1$ and $p_2$, respectively.
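The two separate intervals in c. and their overlap can be reproduced with a short sketch. The helper name `proportion_ci` is hypothetical, and the critical value again comes from `statistics.NormalDist` (about 2.576) rather than from the table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, conf=0.99):
    """Large-sample confidence interval for a single proportion (hypothetical helper)."""
    p = x / n
    c = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = c * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


ci1 = proportion_ci(140, 500)
ci2 = proportion_ci(100, 500)
# Two intervals overlap when each starts before the other ends
overlap = ci1[0] <= ci2[1] and ci2[0] <= ci1[1]
print([round(b, 3) for b in ci1], [round(b, 3) for b in ci2], overlap)
```

The overlap here, next to the interval for the difference that excludes 0, illustrates why the remark recommends working with the interval for the difference directly.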
Exercise 2
1. Let $X_1$ and $X_2$ be the numbers of rats that died among the untreated and the treated rats, resp.
$X_1$ and $X_2$ are independent and $B(500, p_1)$- resp. $B(500, p_2)$-distributed with mortality rates $p_1$ and $p_2$.
2. We test $H_0: p_1 = p_2$ versus $H_1: p_1 > p_2$, with $\alpha$ between 1% and 10%.
3. Test statistic: $Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ with $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$
4. Under 𝐻0 𝑍 has a 𝑁(0,1)-distribution.
5. Outcome of $Z$: $\hat{p} = \frac{140+100}{500+500} = 0.24$, so: $z = \frac{0.28 - 0.20}{\sqrt{0.24 \cdot 0.76\left(\frac{1}{500} + \frac{1}{500}\right)}} \approx 2.96$.
6. Reject 𝐻0 if the p-value = 𝑃(𝑍 ≥ 2.96) ≤ 𝛼. The p-value = 1 - Φ(2.96) = 1 – 0.9985 = 0.15%
(Or: reject 𝐻0 if 𝑍 ≥ 𝑐. The significance level α is between 1% and 10%, so 𝑐 is between 1.28 and 2.33.)
7. The p-value is less than every value of α between 1% and 10%, so reject 𝐻0 .
(Or: 𝑧 = 2.96 is in the Rejection Region for all α between 1% and 10%, so reject 𝐻0 .)
8. We consider the statement that the mortality rate of rats decreases when using the homeopathic
medicine, to be proven at all levels of significance between 1% and 10%.
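The test statistic and the one-sided p-value of this exercise can be checked with a small sketch. The function name `pooled_z` is hypothetical, and the p-value is computed with `statistics.NormalDist` instead of the normal table.

```python
from math import sqrt
from statistics import NormalDist


def pooled_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2, using the pooled proportion (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


z = pooled_z(140, 500, 100, 500)
# Right-sided alternative H1: p1 > p2, so the p-value is P(Z >= z)
p_value = 1 - NormalDist().cdf(z)
print(round(z, 2), round(p_value, 4))
```

The p-value of roughly 0.15% lies below 1%, so the rejection holds for every α in the stated range.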
Exercise 3
This exercise deals with a comparison of two population proportions. Using the testing procedure in 8 steps:
1. We define $X_1$ and $X_2$ to be the numbers of dissertations completed within 6 years among 229 female and 795 male
PhDs, resp. $X_1$ and $X_2$ are independent and both binomially distributed with success probabilities
$p_1$ and $p_2$.
2. Test $H_0: p_1 = p_2$ against $H_1: p_1 \neq p_2$ with α = 5%.
3. Test statistic: $Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ with $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$
4. Under 𝐻0 𝑍 has a 𝑁(0,1)-distribution.
5. Observed value of $Z$: $\hat{p} = \frac{98 + 423}{229 + 795} = 0.509$, so: $z = \frac{\frac{98}{229} - \frac{423}{795}}{\sqrt{0.509 \cdot 0.491\left(\frac{1}{229} + \frac{1}{795}\right)}} \approx -2.77$.
6. Two-sided test: reject 𝐻0 if 𝑍 ≤ −𝑐 or 𝑍 ≥ 𝑐. Significance level 5%: Φ(𝑐) = 0.975, so 𝑐 = 1.96.
7. The observed value -2.77 lies in the rejection region, so reject 𝐻0 .
8. We showed, at a 5% level of significance, that the proportions of dissertations within 6 years for female
and male PhD’s are different.
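As a check on steps 5 to 7, the sketch below recomputes the statistic for the two-sided test. The function name `pooled_z` is hypothetical; the two-sided p-value is an addition for illustration and is not part of the original solution.

```python
from math import sqrt
from statistics import NormalDist


def pooled_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2, using the pooled proportion (hypothetical helper)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


z = pooled_z(98, 229, 423, 795)
c = 1.96  # table value for alpha = 5%, two-sided
reject = abs(z) >= c
# Two-sided p-value: both tails beyond |z|
p_value = 2 * NormalDist().cdf(-abs(z))
print(round(z, 3), reject, round(p_value, 4))
```

The unrounded statistic is about −2.777, well inside the rejection region |Z| ≥ 1.96.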
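Exercise 4
The condition on the confidence interval is that its width satisfies $2c\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \leq 0.02$,
where $c = 1.96$, $n_1 = n_2 = n$ and $p(1-p) \leq \frac{1}{4}$.
Hence $1.96\sqrt{\frac{1/4}{n} + \frac{1/4}{n}} = \frac{1.96}{\sqrt{2n}} \leq 0.01$, so $\sqrt{2n} \geq 196$, implying: $n \geq 19208$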
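The sample size bound can be verified with exact arithmetic. This is a sketch under the same worst-case assumption p(1 − p) ≤ 1/4; the function name `min_sample_size` is hypothetical, and `fractions.Fraction` is used only to avoid floating-point rounding at the boundary value.

```python
from fractions import Fraction
from math import ceil


def min_sample_size(max_width, c, p=Fraction(1, 2)):
    """Smallest n (per sample) with 2*c*sqrt(2*p*(1-p)/n) <= max_width (hypothetical helper)."""
    max_width, c, p = Fraction(max_width), Fraction(c), Fraction(p)
    # Solving the width condition for n gives n >= 2*p*(1-p)*(2c/width)^2
    return ceil(2 * p * (1 - p) * (2 * c / max_width) ** 2)


n = min_sample_size("0.02", "1.96")
print(n)
```

Passing the decimals as strings keeps them exact, so the bound n ≥ 19208 is reproduced without rounding error.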
Exercise 5
a. Obviously we have two independent samples, drawn from two (separate) subpopulations of students, one
subpopulation has to answer questions before and the other after the introduction. We will try to
compare the mean scores µ1 and µ2 in those subpopulations by assessing the means in the samples.
b. This is a case of paired samples: there is only one population and each person is assessed twice, so the
observations are pairwise dependent. We are interested in a (systematic) difference µ between the two
scores.
c. We have one random sample in this case.
d. The researcher evaluates two methods of measuring a fluid with a fixed concentration: the outcomes of
all measurements are independent. If there is a systematic difference in the outcome of the two methods,
that is, in µ1 and µ2, the samples will show a difference in sample means. We will have to conduct a test
on the difference µ1 - µ2, based on two independent samples.
Exercise 6
a. Let $X_1, X_2, \dots, X_9$ be the crop quantities of variety $A$ and $Y_1, Y_2, \dots, Y_{11}$ the crop quantities of variety $B$.
We denote the sample means by $\bar{X}_1 = \frac{1}{9}\sum_{i=1}^{9} X_i$ and $\bar{X}_2 = \frac{1}{11}\sum_{j=1}^{11} Y_j$ and the sample variances by $S_1^2$
and $S_2^2$.
Observed values (simple calculator!): $A$: $\bar{x}_1 = 35.0$ and $s_1 = 2.598$, and $B$: $\bar{x}_2 = 39.0$ and $s_2 = 3.286$.
The standard deviations differ by less than a factor 2, indicating, according to the rough rule of thumb, that
the variances can be assumed equal.
b. To be more precise we will conduct the F-test to get the assumption of equal variances confirmed (or,
more accurately, to show that the assumption of equal variances is not rejected):
1. Probability model: the crop quantities $X_1, \dots, X_9, Y_1, \dots, Y_{11}$ are independent with $X_i \sim N(\mu_1, \sigma_1^2)$ and
$Y_j \sim N(\mu_2, \sigma_2^2)$.
2. Test $H_0: \sigma_1^2 = \sigma_2^2$ (or $\sigma_1 = \sigma_2$) against $H_1: \sigma_1^2 \neq \sigma_2^2$ with $\alpha = 5\%$.
3. Test statistic $F = \frac{S_1^2}{S_2^2}$.
4. Distribution under $H_0$: $F \sim F^{9-1}_{11-1}$
5. Observed value: $F = \frac{s_1^2}{s_2^2} = \frac{2.598^2}{3.286^2} \approx 0.625$
6. We have a two-sided test: reject $H_0$ if $F \leq c_1$ or $F \geq c_2$.
$P(F^{8}_{10} \geq c_2) = \frac{\alpha}{2} = 0.025$, so according to the $F^{8}_{10}$-table: $c_2 = 3.85$
$P(F^{8}_{10} \leq c_1) = P\left(F^{10}_{8} \geq \frac{1}{c_1}\right) = \frac{\alpha}{2} = 0.025$, so $\frac{1}{c_1} = 4.30$, or $c_1 \approx 0.23$
7. Since 𝐹 = 0.625 does not lie in the Rejection Region, we cannot reject 𝐻0 .
8. At a significance level of 5% we cannot prove that the variances of the crop quantities are different.
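The F-test in b. can be sketched as follows. The function name `f_test_two_sided` and its parameter names are hypothetical; the two critical values are the table values quoted in step 6 and are passed in as given rather than computed.

```python
def f_test_two_sided(s1, s2, upper_crit, upper_crit_swapped):
    """Two-sided F-test on two sample standard deviations (hypothetical helper).

    upper_crit:         upper alpha/2 point of F with (n1-1, n2-1) df
    upper_crit_swapped: upper alpha/2 point of F with (n2-1, n1-1) df
    """
    f = s1 ** 2 / s2 ** 2
    c2 = upper_crit
    # The lower critical value is the reciprocal of the swapped-df upper value
    c1 = 1 / upper_crit_swapped
    return f, c1, c2, (f <= c1 or f >= c2)


# Table values from step 6: 3.85 for (8, 10) df and 4.30 for (10, 8) df
f, c1, c2, reject = f_test_two_sided(2.598, 3.286, 3.85, 4.30)
print(round(f, 3), round(c1, 2), c2, reject)
```

The observed F of about 0.625 lies between c1 ≈ 0.23 and c2 = 3.85, so the sketch reaches the same "do not reject" decision.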
c. 1. Model assumptions (“statistical assumptions”):
We have two independent random samples of crop quantities here, one drawn from a $N(\mu_1, \sigma^2)$-
distribution for variety $A$ and the other from a $N(\mu_2, \sigma^2)$-distribution for variety $B$ (equal σ’s!)
(Stated more formally and shorter: the crop quantities $X_1, \dots, X_9, Y_1, \dots, Y_{11}$ are independent, where
$X_i \sim N(\mu_1, \sigma^2)$ and $Y_j \sim N(\mu_2, \sigma^2)$.)
2. We will test $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \neq \mu_2$ with α = 5%
3. Test statistic $T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S^2\left(\frac{1}{9} + \frac{1}{11}\right)}}$ with $S^2 = \frac{8S_1^2 + 10S_2^2}{9 + 11 - 2}$
4. $T$ is under $H_0$ $t$-distributed with $df = n_1 + n_2 - 2 = 18$
5. Observed: $s^2 = \frac{8 \times 2.598^2 + 10 \times 3.286^2}{18} \approx 9.00$, so $t = \frac{35.0 - 39.0}{\sqrt{9.00\left(\frac{1}{9} + \frac{1}{11}\right)}} = -2.97$
6. This test is two-tailed: reject 𝑯𝟎 if 𝑻 ≤ −𝒄 or 𝑻 ≥ 𝒄.
where 𝑐 = 2.101, taken from the 𝑡18 -table.
7. 𝑡 = -2.97 lies in the Rejection Region, so reject 𝐻0 .
8. The mean crop quantities of the two varieties are significantly different at a 5% level.
6./7. Using the p-value at the observed $t = -2.97$: p-value $= 2 \cdot P(T \geq |t|) = 2 \cdot P(T_{18} \geq 2.97)$.
$P(T_{18} \geq 2.97)$ lies between 0.1% and 0.5%, so the p-value is between 0.2% and 1% $< \alpha$: reject $H_0$.
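The pooled two-sample t statistic of part c. can be recomputed from the summary statistics. In this sketch the function name `pooled_t` is hypothetical, and the critical value 2.101 is the $t_{18}$ table value from step 6, taken as given.

```python
from math import sqrt


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic with pooled variance (hypothetical helper)."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    s2_pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t = (mean1 - mean2) / sqrt(s2_pooled * (1 / n1 + 1 / n2))
    return t, s2_pooled, df


t, s2_pooled, df = pooled_t(35.0, 2.598, 9, 39.0, 3.286, 11)
c = 2.101  # t-table value for df = 18, two-sided alpha = 5%
reject = abs(t) >= c
print(round(s2_pooled, 2), round(t, 2), df, reject)
```

The pooled variance comes out at about 9.00 and the statistic at about −2.97, matching step 5.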
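d. We can use the formula of the interval bounds: $\bar{X}_1 - \bar{X}_2 \pm c\sqrt{S^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, in which
$n_1 = 9$, $n_2 = 11$, $\bar{x}_1 = 35.0$, $\bar{x}_2 = 39.0$ (a.),
and (see c.): $s^2 = 9.00$ and $c = 2.101$ such that $P(T_{18} \geq c) = 0.025$.
( $c\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 2.101 \cdot \sqrt{9.00\left(\frac{1}{9} + \frac{1}{11}\right)} \approx 2.83$ )
So: 95%-CI($\mu_1 - \mu_2$) ≈ (−4.0 − 2.8, −4.0 + 2.8) = (−6.8, −1.2)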
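The interval in d. follows directly from the rounded pooled variance. The sketch below uses the hypothetical helper `mean_diff_ci`; $s^2 = 9.00$ and $c = 2.101$ are the values stated in the solution.

```python
from math import sqrt


def mean_diff_ci(mean1, mean2, s2_pooled, n1, n2, c):
    """Confidence interval for mu1 - mu2 with pooled variance (hypothetical helper)."""
    margin = c * sqrt(s2_pooled * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - margin, diff + margin, margin


low, high, margin = mean_diff_ci(35.0, 39.0, 9.00, 9, 11, 2.101)
print(round(margin, 2), round(low, 1), round(high, 1))
```

Both bounds are negative, which is the numerical form of the observation that 0 is not in the interval.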
e. The value 0 for 𝜇1 − 𝜇2 is not contained in the confidence interval in d.: the interval is completely negative. So
“at a confidence level of 95%” one can state that the expected crop quantities are different, confirming
the conclusion in c.
Exercise 7
a. We have 2 observations for each store: we need to apply a paired samples approach: we will consider the
differences 𝑍𝑖 = 𝑌𝑖 − 𝑋𝑖 only and apply the 8 steps procedure:
1. The differences (𝒂𝒇𝒕𝒆𝒓 − 𝒃𝒆𝒇𝒐𝒓𝒆) 𝑍1 , 𝑍2 , … , 𝑍7 are independent and all 𝑁(𝜇, 𝜎²)-distributed, where
the expected difference µ is unknown and the variance 𝜎² of the differences is unknown as well.
(the observed sample mean is $\bar{z} = 516.0$ and the accompanying sample standard deviation is $s_Z = 622.7$)
2 . We will test $H_0: \mu = 0$ against $H_1: \mu > 0$ with $\alpha_0 = 0.05$.
3 . Test statistic: $T = \frac{\bar{Z}}{S_Z/\sqrt{n}} = \frac{\bar{Z}}{S_Z/\sqrt{7}}$
4 . Under $H_0$: $T \sim t_6$
5 . Outcome of $T$: $t = \frac{516.0}{622.7/\sqrt{7}} \approx 2.19$
6 . We will reject 𝐻0 if 𝑇 ≥ 𝑐.
Since 𝛼0 = 0.05, we will find in the 𝑡6 -table: 𝑐 = 1.943.
7 . Outcome 𝑡 = 2.19 lies in the rejection region ⟹ reject 𝐻0 .
8 . At a 5% significance level we have proven that the sales numbers after the ad campaign
have increased.
b. In the two-sided test we reject 𝐻0 : 𝜇 = 0 in favour of 𝐻1 : 𝜇 ≠ 0 if 𝑇 ≥ 2.447 or 𝑇 ≤ −2.447 (𝑡6 -table, 5% level),
so in this case (𝑡 = 2.19) we will not reject 𝐻0 (“not sufficient proof of different sales numbers before and after the
ad campaign”, at a 5% level of significance).
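Parts a. and b. use the same observed statistic with different rejection regions. The sketch below makes that contrast explicit; the function name `paired_t` is hypothetical, and 1.943 and 2.447 are the $t_6$ table values quoted in the solution.

```python
from math import sqrt


def paired_t(mean_diff, sd_diff, n):
    """One-sample t statistic on paired differences (hypothetical helper)."""
    return mean_diff / (sd_diff / sqrt(n))


t = paired_t(516.0, 622.7, 7)
c_one_sided = 1.943  # t6 table, alpha = 5%, right-sided (part a.)
c_two_sided = 2.447  # t6 table, alpha = 5%, two-sided (part b.)
reject_one_sided = t >= c_one_sided
reject_two_sided = abs(t) >= c_two_sided
print(round(t, 2), reject_one_sided, reject_two_sided)
```

With t ≈ 2.19 the one-sided test rejects while the two-sided test does not, because the two-sided critical value is larger at the same significance level.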
Exercise 8
a. Since the two groups of cockerels are treated differently we can assume that the samples are
independent. If both samples are random and drawn from normal distributions with the same variance,
we can apply the 2 samples 𝑡 −procedure with equal variances.
a. We will apply the following formula (on the formula sheet!), interchanging $\bar{X}$ and $\bar{Y}$ (such that $\bar{Y} - \bar{X}$ is
positive):
$\left(\bar{Y} - \bar{X} - c\sqrt{S^2\left(\frac{1}{20} + \frac{1}{20}\right)},\ \bar{Y} - \bar{X} + c\sqrt{S^2\left(\frac{1}{20} + \frac{1}{20}\right)}\right)$, where
$c = 1.68$ from the t-table with df = 20 + 20 − 2 = 38 (using the $t_{40}$-table) and
$s^2 = \frac{(n_1 - 1)s_X^2 + (n_2 - 1)s_Y^2}{n_1 + n_2 - 2} = \frac{19 \times 50.80^2 + 19 \times 42.73^2}{38} = 2203.25$ ($= 46.94^2$, so $s$ lies between $s_1$ and $s_2$)
Result after substitution: 90%-CI($\mu_1 - \mu_2$) = (38.45 − 24.94, 38.45 + 24.94) = (13.51, 63.39)
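The substitution can be followed step by step in a short sketch. The function name `pooled_ci_from_sds` is hypothetical; the mean difference 38.45, the standard deviations, and $c = 1.68$ are taken from the solution as given.

```python
from math import sqrt


def pooled_ci_from_sds(mean_diff, s_x, s_y, n_x, n_y, c):
    """Pooled-variance confidence interval from two sample SDs (hypothetical helper)."""
    df = n_x + n_y - 2
    s2_pooled = ((n_x - 1) * s_x ** 2 + (n_y - 1) * s_y ** 2) / df
    margin = c * sqrt(s2_pooled * (1 / n_x + 1 / n_y))
    return s2_pooled, margin, (mean_diff - margin, mean_diff + margin)


s2_pooled, margin, ci = pooled_ci_from_sds(38.45, 50.80, 42.73, 20, 20, 1.68)
print(round(s2_pooled, 2), round(margin, 2), [round(b, 2) for b in ci])
```

With equal sample sizes the pooled variance is simply the average of the two sample variances, which is why the pooled $s$ lies between the two standard deviations.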
b. The assumptions are, in detail:
1. 2 independent random samples: $X_1, \dots, X_{20}$ and $Y_1, \dots, Y_{20}$ are independent.
2. $X_1, \dots, X_{20}$ is a random sample drawn from the $N(\mu_1, \sigma_1^2)$-distribution
3. $Y_1, \dots, Y_{20}$ is a random sample drawn from the $N(\mu_2, \sigma_2^2)$-distribution
4. The variances are equal: $\sigma_1^2 = \sigma_2^2$
For short: $X_1, \dots, X_{20}, Y_1, \dots, Y_{20}$ are independent and $X_i \sim N(\mu_1, \sigma^2)$ and $Y_j \sim N(\mu_2, \sigma^2)$.
We will apply the testing procedure for the $F$-test to check the “equal variances”-assumption:
1. Assumptions: the increases of the weights $X_1, \dots, X_{20}, Y_1, \dots, Y_{20}$ are independent and
$X_i \sim N(\mu_1, \sigma_1^2)$ and $Y_j \sim N(\mu_2, \sigma_2^2)$
2. Test $H_0: \sigma_1^2 = \sigma_2^2$ (or $\sigma_1 = \sigma_2$) against $H_1: \sigma_1^2 \neq \sigma_2^2$ with $\alpha = 10\%$
3. Test statistic $F = \frac{S_X^2}{S_Y^2}$
4. Distribution under $H_0$: $F \sim F^{20-1}_{20-1}$
5. Observed value: $F = \frac{s_X^2}{s_Y^2} = \frac{50.8^2}{42.73^2} \approx 1.41$
6. It is a two-sided test: reject $H_0$ if $F \leq c_1$ or $F \geq c_2$.
$P(F^{19}_{19} \geq c_2) = \frac{\alpha}{2} = 0.05$, so (according to the $F^{20}_{19}$-table) $c_2 = 2.16$
(or using interpolation of the table values in $F^{20}_{19}$ and $F^{15}_{19}$: $c_2 = 2.17$)
$P(F^{19}_{19} \leq c_1) = P\left(F^{19}_{19} \geq \frac{1}{c_1}\right) = \frac{\alpha}{2} = 0.05$, so $\frac{1}{c_1} = 2.16$, or $c_1 \approx 0.46$
7. Since 𝐹 = 1.41 is not in the Rejection Region, we will not reject 𝐻0 .
8. The variances of the increase of the weights are not statistically significantly different at a 10%
significance level.
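Because both samples have 19 degrees of freedom, a single table lookup gives both critical values. The sketch below illustrates this with the table value 2.16 from step 6, taken as given; the variable names are chosen for illustration only.

```python
s_x, s_y = 50.80, 42.73

# Upper 5% point from the solution's table lookup (approximation for 19 and 19 df)
c2 = 2.16
# Equal degrees of freedom: the lower critical value is the reciprocal of the upper one
c1 = 1 / c2

f = s_x ** 2 / s_y ** 2
reject = f <= c1 or f >= c2
print(round(f, 2), round(c1, 2), reject)
```

Using the interpolated value 2.17 instead of 2.16 does not change the decision, since F ≈ 1.41 is far from either bound.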
Exercise 9
The observed numbers of products in stock are obtained for one store chain per pair, so we will consider the
difference of the numbers in stock with and without import quota for each store chain, “without – with”:
51 8 2 -4 -2 11 29 4
The testing procedure results in the following 8 steps:
1. The differences $X_1, \dots, X_8$ are independent and $N(\mu, \sigma^2)$-distributed.
2. Test $H_0: \mu = 0$ against $H_1: \mu > 0$ with $\alpha_0 = 0.05$ (one-sided test).
3. Test statistic: $T = \frac{\bar{X}}{S/\sqrt{n}}$
4. Under $H_0$: $T \sim t_7$.
5. The observed value of $T$ can be computed by first calculating: $\bar{x} = 12.38$ and $s = 18.68$,
so: $t = \frac{12.38}{18.68/\sqrt{8}} \approx 1.875$
6. Reject 𝐻0 if 𝑇 ≥ 𝑐. The level of significance is 5%, so from the 𝑡7 -table we find: 𝑐 = 1.895
7. The observed 1.875 does not lie in the Rejection Region: we cannot reject 𝐻0 .
8. We cannot consider it proven, at a 5% level of significance, that the number of colour televisions in
stock has increased (on average) after abolition of the import quota.
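Since the eight differences are listed in full, the statistic can be recomputed from the raw data. This is a minimal sketch using the standard-library `statistics` module; the critical value 1.895 is the $t_7$ table value from step 6, taken as given.

```python
from math import sqrt
from statistics import mean, stdev

# Differences "without - with" import quota, one per store chain
diffs = [51, 8, 2, -4, -2, 11, 29, 4]

n = len(diffs)
x_bar = mean(diffs)
s = stdev(diffs)  # sample standard deviation (divides by n - 1)
t = x_bar / (s / sqrt(n))

c = 1.895  # t7 table, alpha = 5%, right-sided
reject = t >= c
print(round(x_bar, 3), round(s, 2), round(t, 3), reject)
```

The unrounded statistic is about 1.874, just below the critical value, so the conclusion not to reject does not depend on the rounding of the mean and standard deviation.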