CUHK STAT3004 Assignment2 Solution
Cheng Li
16 Oct 2020
1 Problem 1
For this problem we can use the one-sample t-test (applied to the paired differences) as the parametric test and the Wilcoxon signed-rank test as the non-parametric test. Let $d_i = x_i - y_i$, where $x_i$ is the femoral-neck BMD of the lighter-smoking twin in the $i$-th pair and $y_i$ is the femoral-neck BMD of the heavier-smoking twin. Let $\mu$ be the mean and $\delta$ the median of the $d_i$. We then have two sets of hypotheses:
\[ H_0: \mu = 0, \qquad H_1: \mu \neq 0, \]
and
\[ H_0: \delta = 0, \qquad H_1: \delta \neq 0. \]
With R, the p-value of the one-sample t-test is 0.96 > 0.05 and the p-value of the Wilcoxon signed-rank test is 0.856 > 0.05. Thus, we cannot reject either null hypothesis.
2 Problem 2
2.1 (a)
We can use the sign test. The test statistic is $C = 15$ with $n = 23$, so we use the normal-theory test. The rejection region is given by $C > c_{upper}$ or $C < c_{lower}$, where
\[ c_{upper} = \frac{n}{2} + \frac{1}{2} + z_{0.975}\sqrt{\frac{n}{4}} = 16.7, \]
\[ c_{lower} = \frac{n}{2} - \frac{1}{2} - z_{0.975}\sqrt{\frac{n}{4}} = 6.3. \]
Since $6.3 \le C = 15 \le 16.7$, we cannot reject $H_0$ at the 5% level. However, this analysis assumes that the periodontal status of patients would remain unchanged in the absence of the program, which is a questionable assumption. A better study design would follow a control group that did not receive the education program over the same 6 months and compare the results in the two groups.
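The critical values above can be checked numerically. The following is a minimal sketch in Python (the solution itself used R); the function name `sign_test_bounds` is hypothetical, and only the values $n = 23$ and $C = 15$ come from the solution.

```python
from math import sqrt
from statistics import NormalDist

def sign_test_bounds(n, alpha=0.05):
    """Normal-theory acceptance bounds for the two-sided sign test."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{0.975} when alpha = 0.05
    half_width = 0.5 + z * sqrt(n / 4)       # continuity term plus z * sd
    return n / 2 - half_width, n / 2 + half_width

n, C = 23, 15  # values reported in the solution
c_lower, c_upper = sign_test_bounds(n)
reject = C < c_lower or C > c_upper
print(f"c_lower = {c_lower:.1f}, c_upper = {c_upper:.1f}, reject H0: {reject}")
```

Because $C = 15$ falls between the two bounds, the sketch reports that $H_0$ is not rejected, in agreement with the conclusion above.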
2.2 (b)
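From the question, the rank sum $R$ of the positive differences $d^+$ is 185. The test statistic is given by
\[ T = \frac{R - \frac{n(n+1)}{4} - 0.5}{\sqrt{\frac{n(n+1)(2n+1)}{24} - \sum_{i=1}^{g}\frac{t_i^3 - t_i}{48}}} = 1.436 \sim N(0, 1) \text{ under } H_0, \]
where $g$ is the number of tied groups and $t_i$ is the number of observations in the $i$-th tied group. The two-sided p-value is 0.151 > 0.05. Thus, the periodontal status of the patients has not significantly changed over time, even when accounting for the magnitude of improvement or decline.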
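As a rough check of the signed-rank calculation, the sketch below (Python, standard library only) implements the normal approximation with the tie correction. The names `signed_rank_z` and `two_sided_p` are hypothetical. The tie-group sizes are not listed in the solution, so the sketch evaluates the statistic without the tie correction and, separately, converts the reported $T = 1.436$ into a p-value. The absolute value is an assumption added so the function also handles a rank sum below its expectation; for $R = 185$ it changes nothing.

```python
from math import sqrt
from statistics import NormalDist

def signed_rank_z(rank_sum, n, tie_sizes=()):
    """Continuity-corrected normal statistic for the Wilcoxon signed-rank test.

    tie_sizes holds the size t_i of each group of tied absolute differences.
    """
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_sizes) / 48
    return (abs(rank_sum - mean) - 0.5) / sqrt(var)

def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Statistic without the tie correction (tie sizes are not given in the solution).
z_no_ties = signed_rank_z(185, 23)

# p-value for the tie-corrected statistic reported in the solution.
p_reported = two_sided_p(1.436)
print(f"z without ties = {z_no_ties:.3f}, p for T = 1.436: {p_reported:.3f}")
```

The uncorrected statistic is slightly smaller than 1.436 because the tie correction reduces the variance in the denominator.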
2.3 (c)
The normal-theory test can be used, since $\min(n_1, n_2) = 12 \ge 10$. The test statistic is given by
\[ T = \frac{R - \frac{n_1(n_1+n_2+1)}{2} - 0.5}{\sqrt{\frac{n_1 n_2 (n_1+n_2+1)}{12}}} = 2.513 \sim N(0, 1) \text{ under } H_0, \]
where $R$ is the rank sum of the sample of size $n_1$. The two-sided p-value is 0.012 < 0.05, so we reject $H_0$.
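The rank-sum statistic and its p-value can be sketched in the same way. The function name `rank_sum_z` is hypothetical, and the rank sum and sample sizes are not listed in the solution, so only the reported $T = 2.513$ is converted into a p-value here. As an assumption, the function takes the absolute deviation so that it works for a rank sum on either side of its expectation.

```python
from math import sqrt
from statistics import NormalDist

def rank_sum_z(rank_sum, n1, n2):
    """Continuity-corrected normal statistic for the Wilcoxon rank-sum test
    (no tie correction), where rank_sum belongs to the sample of size n1."""
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (abs(rank_sum - mean) - 0.5) / sd

# Two-sided p-value for the statistic reported in the solution.
p_value = 2 * (1 - NormalDist().cdf(2.513))
print(f"p-value = {p_value:.3f}")
```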
3 Problem 3
3.1 (a)
We should use the chi-square test. The hypotheses are
\[ H_0: p_1 = p_2, \qquad H_1: p_1 \neq p_2, \]
where $p_1$ and $p_2$ are the prevalences in the two groups.
3.2 (b)
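We can form the 2*2 table relating outcome to group. Its observed counts, which are the values entering the test statistic, are $O_{11} = 4$, $O_{12} = 72$, $O_{21} = 41$, $O_{22} = 34$.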
The expected counts under the null hypothesis are as follows:
E11 = 22.65,
E12 = 53.35,
E21 = 22.35,
E22 = 52.65.
Since no expected count is less than 5, we can use the chi-square test for 2*2 tables. The continuity-corrected statistic is
\[ \chi^2_{corr} = \frac{(|4 - 22.65| - 0.5)^2}{22.65} + \frac{(|72 - 53.35| - 0.5)^2}{53.35} + \frac{(|41 - 22.35| - 0.5)^2}{22.35} + \frac{(|34 - 52.65| - 0.5)^2}{52.65} = 41.71 \sim \chi^2_1 \text{ under } H_0. \]
Since $\chi^2_{1,0.999} = 10.83 < 41.71$, we have $p < 0.001$. Thus, there is a highly significant difference in prevalence between the 2 groups.
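The expected counts and the corrected statistic can be reproduced from the four observed counts that appear in the formula. This is a minimal Python sketch; the function name `yates_chi_square` is hypothetical. The p-value uses the identity that a $\chi^2_1$ variable is the square of a standard normal, so its upper tail equals `erfc(sqrt(x / 2))`.

```python
from math import erfc, sqrt

def yates_chi_square(table):
    """Continuity-corrected chi-square statistic for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (abs(observed - expected) - 0.5) ** 2 / expected
    return stat

observed = [[4, 72], [41, 34]]  # observed counts from the statistic above
chi2_corr = yates_chi_square(observed)
p_value = erfc(sqrt(chi2_corr / 2))  # upper tail of chi-square with 1 df
print(f"chi2_corr = {chi2_corr:.2f}, p < 0.001: {p_value < 0.001}")
```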
4 Problem 4
4.1 (a)
We can form a 2*2 table relating the type of bird to the type of sunflower seeds eaten:

                     Type of seed
Type of bird     Black oil   Striped   Total
Titmouse              1         4         5
Gold finch           19         5        24
Total                20         9        29

The hypotheses are
\[ H_0: p_1 = p_2, \qquad H_1: p_1 \neq p_2, \]
where p1 is the proportion of titmice who prefer black oil seeds and p2 is the proportion of gold finches who
prefer black oil seeds.
4.2 (b)
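To perform Fisher's exact test, we enumerate all possible tables with the same row and column margins as the observed table. Each table is labelled by its count in the (Titmouse, black oil) cell:

Table   Titmouse (black oil, striped)   Gold finch (black oil, striped)
"0"              0,  5                          20,  4
"1"              1,  4                          19,  5
"2"              2,  3                          18,  6
"3"              3,  2                          17,  7
"4"              4,  1                          16,  8
"5"              5,  0                          15,  9

The exact probabilities of these tables are Pr(0) = 0.001, Pr(1) = 0.021, Pr(2) = 0.134, Pr(3) = 0.346, Pr(4) = 0.367, Pr(5) = 0.131. Since the observed table is the "1" table, the two-tailed p-value is $2 \times (\Pr(0) + \Pr(1)) = 0.045$ (computed from the unrounded probabilities). Therefore, we reject $H_0$ at the 0.05 level.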
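The table probabilities follow the hypergeometric distribution with the margins held fixed. The sketch below recomputes them in Python; the function name `table_probabilities` is hypothetical. The two-tailed p-value doubles the lower tail, following the solution's approach.

```python
from math import comb

def table_probabilities(row1, row2, col1):
    """Hypergeometric probability of each 2x2 table with fixed margins,
    keyed by the count a in the (row 1, column 1) cell."""
    n = row1 + row2
    lo = max(0, col1 - row2)  # smallest feasible value of a
    hi = min(row1, col1)      # largest feasible value of a
    return {
        a: comb(row1, a) * comb(row2, col1 - a) / comb(n, col1)
        for a in range(lo, hi + 1)
    }

# Margins from the observed table: 5 titmice, 24 gold finches, 20 black oil.
probs = table_probabilities(5, 24, 20)
p_two_tailed = 2 * (probs[0] + probs[1])  # observed table is the "1" table
for a, p in probs.items():
    print(f"Pr({a}) = {p:.3f}")
print(f"two-tailed p-value = {p_two_tailed:.3f}")
```

Summing the unrounded Pr(0) and Pr(1) before doubling gives 0.045, whereas doubling the rounded values 0.001 + 0.021 would give 0.044.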
4.3 (c)
We display the observed and expected counts in a 2*4 table as shown below (expected counts in parentheses):

                                      Day
Type of seed       1            2            3            4          Total
Black oil      19 (14.2)    14 (14.2)     9 (8.88)    45 (49.71)      87
Striped         5 (9.8)     10 (9.8)      6 (6.12)    39 (34.29)      60
Total          24           24           15           84             147
The smallest expected value is 6.12 > 5. Thus, we can use the chi-square test for R*C tables to test the hypotheses
\[ H_0: p_1 = p_2 = p_3 = p_4, \qquad H_1: \text{at least two of the } p_i \text{ are different}, \]
where $p_i$ is the proportion of gold finches who prefer black oil seeds on the $i$-th day, $i = 1, \dots, 4$.
4.4 (d)
The expected count for the $(i, j)$ cell (listed in parentheses in the above table) is obtained from $E_{ij} = R_i C_j / N$, $i = 1, 2$; $j = 1, 2, 3, 4$, where $R_i$ is the $i$-th row total, $C_j$ is the $j$-th column total, and $N$ is the grand total. The test statistic is $\chi^2 = \sum_{i,j} (O_{ij} - E_{ij})^2 / E_{ij} = 5.07 \sim \chi^2_3$ under $H_0$. Since $5.07 < \chi^2_{3,0.95} = 7.81$, the p-value is larger than 0.05. Thus, there is no significant difference in feeding preferences by day.
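The expected counts and the statistic for the 2*4 table can be recomputed directly from the observed counts. This is a minimal Python sketch; the function name `chi_square_rxc` and the constant `CRITICAL_3DF_95` are hypothetical names introduced for illustration.

```python
def chi_square_rxc(table):
    """Pearson chi-square statistic and degrees of freedom for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij = R_i * C_j / N
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

observed = [
    [19, 14, 9, 45],  # black oil, days 1 to 4
    [5, 10, 6, 39],   # striped, days 1 to 4
]
stat, df = chi_square_rxc(observed)
CRITICAL_3DF_95 = 7.815  # upper 5% point of chi-square with 3 df
print(f"chi2 = {stat:.2f}, df = {df}, reject H0: {stat > CRITICAL_3DF_95}")
```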