Midterm2021R1 Sol PDF
September–December, 2021.
Instructor: Jiahua Chen
Total marks: 70
• Put your name and student ID in the upper-right corner of every sheet.
• You must write your solutions on paper (not on a device). Use a camera or
scanner to upload your written solutions to Canvas as a PDF.
• Correct answers are usually short. Please be brief and answer the questions
with complete sentences.
1. [6] Name and explain (in complete sentences) the three principles in
design of experiments that were emphasized in STAT 404 lectures.
Use your own discretion to decide how much to write based on the
following example:
Random variable: a function defined on sample space. It provides a
numerical summary of the output of an experiment.
Answer
The three principles in design of experiments are as follows:
2. Consider a two-sample problem based on self-reported body temperatures
of 15 male and 20 female students. Suppose that the equal variance
assumption holds. We wish to test the equal mean temperature null
against the one-sided alternative that male students have a higher mean
body temperature at nominal level 4%.
A student mistakenly uses the critical value from the 4% level two-sided
t-test as the critical value in their one-sided t-test.
(a) [2] What is the actual size of the type I error of the student’s test?
Answer
The actual size of the type I error is 2%. The critical value of the
4% level two-sided t-test is the 98% quantile (the upper 2% point) of
the $t_{33}$ distribution, where $33 = 15 + 20 - 2$. Under $H_0$, the
probability that the test statistic exceeds this quantile is 0.02 (2%).
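The size calculation can be checked numerically. The following is a minimal sketch in R, assuming the pooled two-sample t-test with $15 + 20 - 2 = 33$ degrees of freedom; the variable names are illustrative only.

```r
df <- 15 + 20 - 2                    # pooled two-sample t-test degrees of freedom
crit.wrong <- qt(1 - 0.04 / 2, df)   # two-sided 4% critical value: the 98% quantile
crit.right <- qt(1 - 0.04, df)       # correct one-sided 4% critical value: the 96% quantile
# Actual size of the student's test: P(T > crit.wrong) when T ~ t_33 under H0
size.actual <- pt(crit.wrong, df, lower.tail = FALSE)
size.actual
```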
(b) [2] If the alternative hypothesis holds in the specific application,
is the power of the student’s test higher or lower than the correct
one-sided t-test?
Answer
If the alternative hypothesis holds, the power of the student's test
is lower than that of the correct one-sided test. The student's critical
value (the 98% quantile) is larger than the correct one (the 96%
quantile), so the probability that the test statistic exceeds the
student's critical value is smaller than the probability that it exceeds
the correct critical value.
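The power comparison can be illustrated with the noncentral t distribution. This is a sketch under a hypothetical alternative in which the male mean exceeds the female mean by 0.5 common standard deviations; that effect size is an assumption chosen only for illustration.

```r
df <- 15 + 20 - 2
effect <- 0.5                             # hypothetical standardized mean difference
ncp <- effect / sqrt(1 / 15 + 1 / 20)     # noncentrality of the pooled t statistic
# Power = P(T > critical value) with T ~ noncentral t_33
power.student <- pt(qt(0.98, df), df, ncp = ncp, lower.tail = FALSE)
power.correct <- pt(qt(0.96, df), df, ncp = ncp, lower.tail = FALSE)
c(student = power.student, correct = power.correct)
```

Any positive effect size gives the same ordering, because the student's rejection region is a strict subset of the correct one.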
3. A linear regression model assumes that the response values in an
experiment can be expressed as
$$y_i = x_i^\top \beta + \epsilon_i.$$
(a) [4] For the given data set, obtain the least squares estimates of β.
Answer
The estimates of the coefficients $(\beta_0, \beta_1, \beta_2)^\top$ are given by
$$\hat\beta = (X^\top X)^{-1} X^\top Y = (169.460, 5.946, 0.404)^\top.$$
R code:
invXTX = solve(t(X) %*% X)
beta.hat = invXTX %*% t(X) %*% Y
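As a sanity check on the matrix formula, the sketch below fits the same kind of model to a made-up data set (an intercept, a hypothetical 0/1 gender indicator and hypothetical foot lengths; this is not the exam data) and compares the normal-equation solution with `lm()`. Solving $(X^\top X)\beta = X^\top Y$ directly avoids forming the inverse explicitly.

```r
set.seed(404)                         # synthetic data only, not Midterm2021Q3.txt
N <- 35
gender <- rep(c(1, 0), c(15, 20))     # hypothetical 0/1 indicator
foot <- rnorm(N, mean = 25, sd = 2)   # hypothetical foot lengths
Y <- 150 + 6 * gender + 0.5 * foot + rnorm(N, sd = 6)
X <- cbind(1, gender, foot)           # design matrix with intercept column

# Solve the normal equations (X^T X) beta = X^T Y
beta.hat <- solve(crossprod(X), crossprod(X, Y))
fit <- lm(Y ~ gender + foot)          # reference fit for comparison
cbind(normal.eq = drop(beta.hat), lm = coef(fit))
```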
(b) [4] Estimate the error variance $\sigma^2$ (use the method given in class).
Answer
The estimate of the error variance $\sigma^2$ is given by the residual mean
sum of squares
$$\hat\sigma^2 = \frac{(Y - X\hat\beta)^\top (Y - X\hat\beta)}{N - k - 1} = 34.0188.$$
R code:
N = nrow(X); k = ncol(X)-1
resid = Y - X %*% beta.hat
MSS.error = sum(resid^2) / (N-k-1)
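The residual mean sum of squares is the same quantity that `lm()` reports as the squared residual standard error. A sketch on synthetic data (hypothetical values, not the exam data) confirms the match.

```r
set.seed(404)                         # synthetic data only
N <- 35
gender <- rep(c(1, 0), c(15, 20))
foot <- rnorm(N, mean = 25, sd = 2)
Y <- 150 + 6 * gender + 0.5 * foot + rnorm(N, sd = 6)
X <- cbind(1, gender, foot)
k <- ncol(X) - 1                      # number of non-intercept predictors

beta.hat <- solve(crossprod(X), crossprod(X, Y))
resid <- Y - X %*% beta.hat
MSS.error <- sum(resid^2) / (N - k - 1)   # residual mean sum of squares
fit <- lm(Y ~ gender + foot)
c(MSS.error = MSS.error, lm = summary(fit)$sigma^2)
```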
(c) [4] Estimate the variance matrix of β̂.
Answer
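The variance matrix of $\hat\beta$ is given by
$$\mathrm{Var}(\hat\beta) = (X^\top X)^{-1} X^\top \mathrm{Var}(Y) X (X^\top X)^{-1} = \sigma^2 (X^\top X)^{-1},$$
since $\mathrm{Var}(Y) = \sigma^2 I_N$.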
Thus, an estimate is given by
$$\widehat{\mathrm{Var}}(\hat\beta) = \hat\sigma^2 (X^\top X)^{-1} = \begin{pmatrix} 24.3488 & 1.1793 & -0.8704 \\ 1.1793 & 1.0305 & -0.0408 \\ -0.8704 & -0.0408 & 0.0324 \end{pmatrix}.$$
R code:
var.beta = solve(t(X) %*% X) * MSS.error
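The estimated variance matrix $\hat\sigma^2 (X^\top X)^{-1}$ should agree with `vcov()` applied to an `lm` fit. The sketch below checks this on synthetic data; the data values are hypothetical.

```r
set.seed(404)                         # synthetic data only
N <- 35
gender <- rep(c(1, 0), c(15, 20))
foot <- rnorm(N, mean = 25, sd = 2)
Y <- 150 + 6 * gender + 0.5 * foot + rnorm(N, sd = 6)
X <- cbind(1, gender, foot)
k <- ncol(X) - 1

beta.hat <- solve(crossprod(X), crossprod(X, Y))
MSS.error <- sum((Y - X %*% beta.hat)^2) / (N - k - 1)
var.beta <- MSS.error * solve(crossprod(X))   # sigma.hat^2 (X^T X)^{-1}
fit <- lm(Y ~ gender + foot)
all.equal(unname(var.beta), unname(vcov(fit)))
```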
(d) [4] Construct 95%, two-sided individual (not simultaneous) CIs
for β1 and β2 (effects of gender and foot-length). Hint: remember
the general recipe for CI construction.
Answer
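Consider the two-sided t-test of $H_0: \beta_i = b$ against $H_1: \beta_i \neq b$
for $i \in \{1, 2\}$. The test statistic
$$T_i = \frac{\hat\beta_i - b}{\mathrm{SE}(\hat\beta_i)},$$
where $\mathrm{SE}(\hat\beta_i)$ is the square root of the corresponding diagonal
entry of $\widehat{\mathrm{Var}}(\hat\beta)$, follows a $t_{N-k-1} = t_{32}$ distribution
under $H_0$. A 95% CI consists of all values $b$ that this test does not
reject at the 5% level. Therefore, the CI for $\beta_1$ is
$$\hat\beta_1 \pm t_{0.975,32} \times \mathrm{SE}(\hat\beta_1) = (3.8783, 8.0138)$$
and the CI for $\beta_2$ is
$$\hat\beta_2 \pm t_{0.975,32} \times \mathrm{SE}(\hat\beta_2) = (0.03684, 0.77038).$$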
R code:
# beta1
beta.hat[1+1] - qt(1-0.05/2, N-k-1)*var.beta[2,2]^.5;
beta.hat[1+1] + qt(1-0.05/2, N-k-1)*var.beta[2,2]^.5
# beta2
beta.hat[1+2] - qt(1-0.05/2, N-k-1)*var.beta[3,3]^.5;
beta.hat[1+2] + qt(1-0.05/2, N-k-1)*var.beta[3,3]^.5
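The same recipe, estimate ± t quantile × standard error, is what `confint()` uses for `lm` fits. A vectorized sketch on synthetic data (hypothetical values, not the exam data) computes all intervals at once and compares them.

```r
set.seed(404)                         # synthetic data only
N <- 35
gender <- rep(c(1, 0), c(15, 20))
foot <- rnorm(N, mean = 25, sd = 2)
Y <- 150 + 6 * gender + 0.5 * foot + rnorm(N, sd = 6)
X <- cbind(1, gender, foot)
k <- ncol(X) - 1

beta.hat <- drop(solve(crossprod(X), crossprod(X, Y)))
MSS.error <- sum((Y - X %*% beta.hat)^2) / (N - k - 1)
se <- sqrt(diag(MSS.error * solve(crossprod(X))))   # standard errors
t.crit <- qt(1 - 0.05 / 2, N - k - 1)
ci <- cbind(lower = beta.hat - t.crit * se, upper = beta.hat + t.crit * se)
ci[2:3, ]                              # rows for beta1 (gender) and beta2 (foot)
fit <- lm(Y ~ gender + foot)
confint(fit)[2:3, ]
```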
$$\mathrm{Var}(\hat\beta_2 - \hat\beta_1) = \mathrm{Var}((0, -1, 1)\hat\beta) = (0, -1, 1)\,\mathrm{Var}(\hat\beta)\,(0, -1, 1)^\top.$$
Thus,
4. Consider a hypothetical one-way layout comparing k = 5 treatments.
The response values are given as follows:
R code:
y = c(yy1, yy2, yy3, yy4, yy5)
trt = as.factor(rep(1:5, c(length(yy1), length(yy2), length(yy3),
                           length(yy4), length(yy5))))
anova_data = data.frame(y = y, trt = trt)
bar.y = mean(anova_data$y)
bar.yi = tapply(anova_data$y, INDEX = anova_data$trt, FUN = mean)
n = length(y)
ni = summary(anova_data$trt)
k = length(ni)
ss.trt = sum(ni * (bar.yi - bar.y) ^ 2)
(b) [4] Compute the error sum of squares (also called the residual sum of squares), SS(err).
Answer
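By the decomposition SS(total) = SS(trt) + SS(err),
$$\mathrm{SS(err)} = \mathrm{SS(total)} - \mathrm{SS(trt)} = \sum_{i=1}^{5}\sum_{j=1}^{n_i} (y_{ij} - \bar y_{\cdot\cdot})^2 - \mathrm{SS(trt)} = 32.55049.$$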
R code:
ss.total = sum((y - bar.y)^2)
ss.error = ss.total - ss.trt
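Computing SS(err) by subtraction relies on the sum of squares decomposition. The sketch below checks it against the direct definition $\sum_i\sum_j (y_{ij} - \bar y_{i\cdot})^2$ on a synthetic one-way layout; the group sizes and means are hypothetical and the data are not the exam's.

```r
set.seed(404)                          # synthetic one-way layout, not the exam data
ni <- c(6, 7, 6, 7, 7)                 # hypothetical group sizes (n = 33)
trt <- factor(rep(1:5, ni))
y <- rnorm(sum(ni), mean = rep(c(10, 12, 11, 14, 9), ni))

bar.y <- mean(y)
bar.yi <- tapply(y, trt, mean)
ss.total <- sum((y - bar.y)^2)
ss.trt <- sum(ni * (bar.yi - bar.y)^2)
# Direct definition: squared deviations from each group's own mean
ss.error.direct <- sum((y - bar.yi[as.integer(trt)])^2)
c(by.subtraction = ss.total - ss.trt, direct = ss.error.direct)
```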
Source DF SS MSS F
Treatment
Error
Total
Answer
Source DF SS MSS F
Treatment 4 108.4607 27.1152 23.3245
Error 28 32.5505 1.1625
Total 32 141.0112
R code:
mss.trt = ss.trt / (k - 1)
mss.error = ss.error / (n - k)
f.test = mss.trt / mss.error
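The hand-built ANOVA quantities can be cross-checked against R's `anova()` on an `lm` fit, and the p-value for the F test in the next problem is the upper tail of $F_{k-1,\,n-k}$. A sketch on synthetic data (hypothetical group sizes and means):

```r
set.seed(404)                          # synthetic data only
ni <- c(6, 7, 6, 7, 7)
trt <- factor(rep(1:5, ni))
y <- rnorm(sum(ni), mean = rep(c(10, 12, 11, 14, 9), ni))
n <- length(y); k <- nlevels(trt)

bar.yi <- tapply(y, trt, mean)
ss.trt <- sum(ni * (bar.yi - mean(y))^2)
ss.error <- sum((y - bar.yi[as.integer(trt)])^2)
mss.trt <- ss.trt / (k - 1)
mss.error <- ss.error / (n - k)
f.test <- mss.trt / mss.error
p.value <- pf(f.test, k - 1, n - k, lower.tail = FALSE)  # upper tail of F_{k-1, n-k}
ref <- anova(lm(y ~ trt))              # built-in table for comparison
c(F = f.test, p = p.value)
```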
5. Continuing the last problem:
(a) [4] Test the hypothesis that all treatment means are equal at the
5% level. Be sure to state the null and alternative hypotheses,
and state what the test statistic is and its reference distribution.
Then, clearly state your conclusions.
Answer
Let µi be the true mean response value of treatment i. The null
hypothesis is
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_5$$
and the alternative hypothesis is that at least two of $\{\mu_i : i = 1, 2, \ldots, 5\}$
are not equal. The test statistic is
$$F = \frac{\mathrm{MSS(trt)}}{\mathrm{MSS(err)}},$$
which has an $F_{k-1,\,n-k} = F_{4,28}$ distribution under $H_0$. The p-value is
(b) [4] Construct simultaneous 95% confidence intervals for mean differences
using Tukey's method, but present only the first three
(that is, 1 vs 2; 1 vs 3; 2 vs 3).
Answer
The lower and upper limits of the 95% Tukey confidence intervals
for the first three are in the following table:
R code:
cv.tukey = qtukey(0.95, k, n - k) / sqrt(2)
diff.means = c(bar.yi[1]-bar.yi[2],
bar.yi[1]-bar.yi[3],
bar.yi[2]-bar.yi[3])
se = sqrt(mss.error * c((1 / ni[1] + 1 / ni[2]),
(1 / ni[1] + 1 / ni[3]),
(1 / ni[2] + 1 / ni[3])))
(diff.means - se * cv.tukey)
(diff.means + se * cv.tukey)
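The manual Tukey limits (with the $1/n_i + 1/n_j$ adjustment for unequal group sizes) can be compared with R's `TukeyHSD()`. The sketch uses a synthetic layout with hypothetical group sizes and means. Note that `TukeyHSD()` reports differences as $j - i$ (for example `"2-1"`), so its limits for $i - j$ are the negated, swapped ones.

```r
set.seed(404)                          # synthetic data only
ni <- c(6, 7, 6, 7, 7)
trt <- factor(rep(1:5, ni))
y <- rnorm(sum(ni), mean = rep(c(10, 12, 11, 14, 9), ni))
n <- length(y); k <- nlevels(trt)
bar.yi <- tapply(y, trt, mean)
mss.error <- sum((y - bar.yi[as.integer(trt)])^2) / (n - k)

cv.tukey <- qtukey(0.95, k, n - k) / sqrt(2)
idx <- rbind(c(1, 2), c(1, 3), c(2, 3))        # pairs 1 vs 2, 1 vs 3, 2 vs 3
diff.means <- bar.yi[idx[, 1]] - bar.yi[idx[, 2]]
se <- sqrt(mss.error * (1 / ni[idx[, 1]] + 1 / ni[idx[, 2]]))
manual <- cbind(lwr = diff.means - cv.tukey * se, upr = diff.means + cv.tukey * se)

hsd <- TukeyHSD(aov(y ~ trt), conf.level = 0.95)$trt[c("2-1", "3-1", "3-2"), ]
manual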
(c) [3] Find the estimated effects of the 5 treatments and the estimated
error variance (i.e., $\hat\tau_j$ and $\hat\sigma^2$).
Answer
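The estimated treatment effects are given by $\hat\tau_i = \bar y_{i\cdot} - \bar y_{\cdot\cdot}$,
$i = 1, \ldots, 5$, and the estimated error variance is
$\hat\sigma^2 = \mathrm{MSS(err)} = 1.162$.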
R code:
hat.taui = bar.yi - bar.y
hat.sigma2 = mss.error
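With unequal group sizes, these estimates satisfy the weighted constraint $\sum_i n_i \hat\tau_i = 0$, which in turn gives $\mathrm{SS(trt)} = \sum_i n_i \hat\tau_i^2$. A short sketch on synthetic data (hypothetical values) illustrates both facts.

```r
set.seed(404)                          # synthetic data only
ni <- c(6, 7, 6, 7, 7)
trt <- factor(rep(1:5, ni))
y <- rnorm(sum(ni), mean = rep(c(10, 12, 11, 14, 9), ni))

bar.y <- mean(y)
bar.yi <- tapply(y, trt, mean)
hat.taui <- bar.yi - bar.y             # estimated treatment effects
weighted.sum <- sum(ni * hat.taui)     # zero up to rounding error
ss.trt <- sum(ni * hat.taui^2)         # equals SS(trt) since the weighted mean effect is 0
c(weighted.sum = weighted.sum, ss.trt = ss.trt)
```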
(d) [4] Suppose the true effects of the 5 treatments are the same as
the estimates found in part (c), but the true error variance is ten
times larger than the estimate. What is the power of the test for
H0 : τ1 = · · · = τ5 ?
Answer
Let $\delta = \sum_{i=1}^{k} n_i(\tau_i - \bar\tau)^2/\sigma^2$, where $\bar\tau = N^{-1}\sum_{i=1}^{k} n_i\tau_i$, the null
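When the true effects are given as in part (c) and $\sigma^2 = 10\hat\sigma^2$, we
have $\bar\tau = 0$ (because $\sum_i n_i\hat\tau_i = 0$), so
$\delta = \mathrm{SS(trt)}/(10\hat\sigma^2) = 108.4607/(10 \times 32.5505/28) = 9.3298$.
The power is computed as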
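One way to carry out this power computation in R is sketched below. It assumes the test is run at the 5% level as in part (a) and uses the fact that, under the stated alternative, the F statistic follows a noncentral F distribution with $k - 1 = 4$ and $n - k = 28$ degrees of freedom and noncentrality $\delta$. The SS values are taken from the ANOVA table in Problem 4.

```r
k <- 5; n <- 33
ss.trt <- 108.4607; ss.err <- 32.5505           # from the ANOVA table
delta <- ss.trt / (10 * ss.err / (n - k))       # noncentrality with sigma^2 = 10 * sigma.hat^2
crit <- qf(0.95, k - 1, n - k)                  # 5% level critical value of F_{4,28}
power <- pf(crit, k - 1, n - k, ncp = delta, lower.tail = FALSE)
power
```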
6. Continuing the last problem:
(a) [4] Suppose the random effects model is more suitable instead, so
that the variance of the treatment effect is $\sigma_\tau^2$.
Obtain an expression for $E[(\bar y_{1\cdot} - \bar y_{2\cdot})^2]$ in terms of $\sigma_\tau^2$ and $\sigma^2$.
Answer
Since $\bar y_{1\cdot}$ and $\bar y_{2\cdot}$ have the same mean and are independent,
we have
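Under the random effects model $y_{ij} = \mu + \tau_i + \epsilon_{ij}$, each group mean has variance $\mathrm{Var}(\bar y_{i\cdot}) = \sigma_\tau^2 + \sigma^2/n_i$, and independence with equal means gives $E[(\bar y_{1\cdot} - \bar y_{2\cdot})^2] = \mathrm{Var}(\bar y_{1\cdot}) + \mathrm{Var}(\bar y_{2\cdot})$. The Monte Carlo sketch below checks the resulting expression, assuming normal effects and errors and hypothetical values of $\sigma_\tau$, $\sigma$, $n_1$ and $n_2$.

```r
set.seed(404)
sigma.tau <- 1.5; sigma <- 1; n1 <- 6; n2 <- 7   # hypothetical parameter values
B <- 100000
tau <- matrix(rnorm(2 * B, sd = sigma.tau), ncol = 2)   # random treatment effects
# Group means: tau_i plus the average of n_i N(0, sigma^2) errors (mu cancels)
ybar1 <- tau[, 1] + rnorm(B, sd = sigma / sqrt(n1))
ybar2 <- tau[, 2] + rnorm(B, sd = sigma / sqrt(n2))
mc <- mean((ybar1 - ybar2)^2)
theory <- 2 * sigma.tau^2 + sigma^2 * (1 / n1 + 1 / n2)
c(monte.carlo = mc, theory = theory)
```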
7. [5] Someone claims that the ratio of height over foot-length is larger
for males than females.
Using a randomization test with the data we collected (Midterm2021Q3.txt
from Q3), provide your opinion on this claim as a statistician.
Note: computing all permutations is infeasible due to the size
of this data set. You may estimate the p-value via simulations.
Answer
See Assignment 3.
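A simulation-based randomization test of this claim can be sketched as follows. The heights and foot lengths below are made up for illustration (the exam's Midterm2021Q3.txt is not reproduced here). The statistic is the male-minus-female difference in mean height/foot-length ratio, and the one-sided p-value is estimated by randomly relabelling the groups. The $(1 + \text{count})/(B + 1)$ form is one common choice for Monte Carlo p-values.

```r
set.seed(404)
# Hypothetical data only; replace with the columns of Midterm2021Q3.txt
height <- c(rnorm(15, 178, 7), rnorm(20, 165, 6))
foot   <- c(rnorm(15, 27, 1.2), rnorm(20, 24, 1.1))
male   <- rep(c(TRUE, FALSE), c(15, 20))
ratio  <- height / foot

# Test statistic: difference in mean ratio, males minus females
stat <- function(r, g) mean(r[g]) - mean(r[!g])
obs <- stat(ratio, male)

B <- 10000
perm <- replicate(B, stat(ratio, sample(male)))   # random relabelling of groups
p.value <- (1 + sum(perm >= obs)) / (B + 1)       # one-sided: claim is males larger
p.value
```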