Midterm 2023 Sol
Name: Student #:
• Put your name and student ID in the upper-right corner of every sheet.
• Correct answers are usually short. Answer questions in brief but complete sentences.
If you do not show your work and your answer is incorrect, part marks will not be
given. For example, a satisfactory answer is:
The treatment sum of squares is given by
$$SS_{trt} = \sum_{i=1}^{k} n_i (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 = 4 \times (5 - 3.2)^2 + 6 \times (2 - 3.2)^2 = 21.6 .$$
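The arithmetic in this example can be checked in a few lines. The sketch below is written in Python for illustration only (submissions must use R); it recomputes the treatment sum of squares from the group sizes and group means, with the grand mean taken as the size-weighted average of the group means. The function name `treatment_ss` is hypothetical.

```python
def treatment_ss(sizes, means):
    """Treatment sum of squares from group sizes n_i and group means ybar_i."""
    total_n = sum(sizes)
    # Grand mean ybar.. is the size-weighted average of the group means (3.2 here).
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total_n
    return sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))


print(treatment_ss([4, 6], [5, 2]))  # approximately 21.6
```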
• Use R for simple calculations such as the sample mean and sample variance (as in the
assignments). Answers obtained using one-line R functions will not be accepted.
• Save the R code you used in a .doc, .docx, .rtf, or .txt file. Include comments
describing which question each code block is used for. Leave sufficient space between
the code for different questions. Submit your code to Canvas.
• Use the conventional 5% level for tests, two-sided alternative hypotheses, and the
95% confidence level.
• A bonus mark will be given to everyone unless the presentation is judged to be very
messy.
MIDTERM PROBLEMS START HERE
1. [6] List the three principles of design of experiments we discussed in STAT 404.
Explain each principle in 1–2 complete sentences.
Answer.
(b) Replication: repeating the same treatment on several experimental units, which
improves the precision of the estimated treatment effects.
(c) Blocking: grouping similar experimental units so that different treatments are
compared under similar conditions, which removes the effect of a factor that is not of interest.
2. [8] The standard two-sample t-test is formulated under strict model assumptions.
(a) [4] Name two of the model assumptions. Describe each assumption in one
sentence.
Answer. Any two of the following (or other relevant assumptions) are acceptable:
(b) [4] For each assumption, explain in 1–2 complete sentences how the assumption
affects the standard t-test (based on the lecture discussions).
Answer.
• Normality: without normality, the numerator of the t-statistic does not have
a normal distribution, and so the test statistic does not have a t-distribution.
Other answers may also be acceptable if they are correct and answer the question.
3. [10] We collected data on the gender, footlength and height of 35 adults. The
dataset (BodyHeight.txt) and code to read in the data are on Canvas.
Answer. Using the pooled estimator, the variance is estimated to be $s_p^2 = 38.167$.
The test statistic is then
$$T = \frac{\bar y_m - \bar y_f}{s_p \sqrt{\frac{1}{n_m} + \frac{1}{n_f}}} = 6.158 .$$
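The pooled two-sample statistic above can be sketched as follows, in Python for illustration only. The helper `pooled_t` and the toy heights are hypothetical (they are not the BodyHeight.txt data); the pooled variance weights each sample variance by its $n - 1$, and the statistic is referred to a $t$ distribution on $n_m + n_f - 2$ degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (s_p^2, T) for the equal-variance two-sample t-test of mean(a) - mean(b)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: sample variances (n - 1 denominators) weighted by their df.
    sp2 = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    t_stat = (mean(group_a) - mean(group_b)) / (sqrt(sp2) * sqrt(1 / n_a + 1 / n_b))
    return sp2, t_stat


# Hypothetical toy heights in cm (NOT the course dataset).
sp2, t_stat = pooled_t([170, 175, 180], [160, 165, 170])
print(sp2, t_stat)  # 25.0 and about 2.449 (= sqrt(6))
```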
Answer.
4. [28] A linear regression model assumes that the response values in a study can be
expressed as
$$y_i = x_i^\top \beta + \epsilon_i, \quad i = 1, 2, \ldots, n,$$
where $x_i^\top \beta$ is the expected value of $y_i$ given $x_i$ and the $\epsilon_i$'s are iid $N(0, \sigma^2)$ random
variables.
You are asked to analyze the same dataset as in Q3: model the height against
gender and footlength (the numbers have been doctored slightly).
(a) [4] Obtain an expression for the expected height of a female with foot-length
25 cm. Hint: it is linear in $\beta_0$, $\beta_1$ and $\beta_2$.
(c) [4] Compute the predicted height of a female with foot-length 25 cm based on
the fitted linear regression.
(d) [4] Estimate the error variance $\sigma^2$ (use the method given in class).
$$\hat\sigma^2 = \frac{\sum (y - \hat y)^2}{35 - 2 - 1} = 27.960 .$$
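A minimal sketch of this estimator, in Python for illustration only: $\hat\sigma^2$ is the residual sum of squares divided by $n - p - 1$, which is $35 - 2 - 1 = 32$ here for $p = 2$ slopes. It assumes a design matrix with an intercept column, a gender column and a foot-length column; the gender coding (+1 male, -1 female) is an assumption consistent with the vector $(1, -1, 25)$ used for a female in part (f). The helper `fit_ols` and the toy numbers are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit; returns (beta_hat, sigma2_hat) with sigma2_hat = RSS / (n - p - 1)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, n_cols = X.shape  # n_cols = p + 1 because the intercept column is included
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta_hat
    sigma2_hat = residuals @ residuals / (n - n_cols)
    return beta_hat, sigma2_hat


# Hypothetical toy data (NOT the course dataset): intercept, gender (+1/-1), foot length (cm).
X = np.array([[1, 1, 27], [1, 1, 28], [1, 1, 26], [1, -1, 24], [1, -1, 25], [1, -1, 23]])
y = np.array([178.0, 181.0, 175.0, 163.0, 167.0, 160.0])
beta_hat, sigma2_hat = fit_ols(X, y)
print(beta_hat, sigma2_hat)
```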
(e) [4] Estimate the variance matrix of β̂.
(f) [4] Estimate the variance of the predicted height of a female with foot-length
25 cm.
Answer. Let $x = (1, -1, 25)^\top$. The estimated variance of the predicted height of a
female with foot-length 25 cm is
$$\widehat{\mathrm{Var}}(\hat y) = \widehat{\mathrm{Var}}(x^\top \hat\beta) = \hat\sigma^2 x^\top (X^\top X)^{-1} x = 1.627 .$$
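The formula above uses the estimated variance matrix $\hat\sigma^2 (X^\top X)^{-1}$ of $\hat\beta$ and sandwiches it between $x^\top$ and $x$. A sketch of both steps in Python, for illustration only; `beta_cov`, `fitted_mean_var`, the toy design and the value of $\hat\sigma^2$ are hypothetical, and gender is assumed coded +1 male, -1 female.

```python
import numpy as np


def beta_cov(X, sigma2_hat):
    """Estimated variance matrix of beta_hat: sigma2_hat * (X^T X)^{-1}."""
    X = np.asarray(X, dtype=float)
    return sigma2_hat * np.linalg.inv(X.T @ X)


def fitted_mean_var(X, sigma2_hat, x0):
    """Estimated Var(x0^T beta_hat) = sigma2_hat * x0^T (X^T X)^{-1} x0."""
    x0 = np.asarray(x0, dtype=float)
    return x0 @ beta_cov(X, sigma2_hat) @ x0


# Hypothetical toy design (intercept, gender +1/-1, foot length) and a made-up sigma2_hat.
X = np.array([[1, 1, 27], [1, 1, 28], [1, 1, 26], [1, -1, 24], [1, -1, 25], [1, -1, 23]])
print(fitted_mean_var(X, sigma2_hat=4.0, x0=[1, -1, 25]))
```

As a sanity check, with an intercept-only design of $n$ rows the result reduces to $\hat\sigma^2 / n$, the familiar variance of a sample mean.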
(g) [4] Construct a two-sided 95% confidence interval for the height of a female
with foot-length 25 cm. Hint: remember the general recipe for constructing CIs.
Answer. A two-sided 95% confidence interval for the height of a female with
foot-length 25 cm is given by
$$\hat y \pm t_{32}(0.975) \sqrt{\widehat{\mathrm{Var}}(\hat y)} = 166.263 \pm 2.037 \sqrt{1.627} = (163.667, 168.863) .$$
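The interval follows the general recipe "estimate plus or minus quantile times standard error". A short Python sketch for illustration, plugging in the rounded values shown above ($\hat y = 166.263$, $\widehat{\mathrm{Var}}(\hat y) = 1.627$, $t_{32}(0.975) = 2.037$); the helper name `mean_ci` is hypothetical.

```python
from math import sqrt


def mean_ci(estimate, var_hat, t_quantile):
    """Two-sided CI: estimate +/- t_quantile * sqrt(var_hat)."""
    half_width = t_quantile * sqrt(var_hat)
    return estimate - half_width, estimate + half_width


# Rounded values taken from the solution above.
lower, upper = mean_ci(166.263, 1.627, 2.037)
print(round(lower, 3), round(upper, 3))
```

With these rounded inputs the bounds agree with the reported interval to within about 0.002.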
(a) [4] Estimate the treatment effects of the 5 groups (i.e., $\hat\tau_j$).
Answer. The treatment sum of squares is
$$SS_{trt} = \sum_{i=1}^{5} n_i \hat\tau_i^2 = 1714.365 .$$
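For reference, the estimated treatment effects are $\hat\tau_j = \bar y_{j\cdot} - \bar y_{\cdot\cdot}$, and the treatment sum of squares is $\sum_j n_j \hat\tau_j^2$. A Python sketch for illustration only, on hypothetical toy data with three groups (not the exam data); `treatment_effects` is a hypothetical helper. Note that the weighted effects $\sum_j n_j \hat\tau_j$ always sum to zero.

```python
def treatment_effects(groups):
    """Return (tau_hat, ss_trt) for a one-way layout given a list of groups of observations."""
    sizes = [len(g) for g in groups]
    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / sum(sizes)
    # tau_hat_j = ybar_j. - ybar..
    tau_hat = [m - grand_mean for m in group_means]
    ss_trt = sum(n * t ** 2 for n, t in zip(sizes, tau_hat))
    return tau_hat, ss_trt


# Hypothetical toy data (NOT the exam data).
tau_hat, ss_trt = treatment_effects([[3, 5, 7], [8, 10], [1, 2, 3]])
print(tau_hat, ss_trt)  # [0.125, 4.125, -2.875] 58.875
```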
(c) [4] Compute the error (or residual) sum of squares $SS_{err}$.
(d) [4] Complete the one-way layout ANOVA table. Not every cell needs to be filled.
Source      DF   SS         MSS       F
Treatment    4   1714.365   428.591   14.434
Error       26    772.022    29.693
Total       30   2486.387
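The remaining entries of the table follow from $SS_{tot} = SS_{trt} + SS_{err}$ and the degrees of freedom $k - 1$ and $N - k$. A Python sketch for illustration, assuming $k = 5$ groups and $N = 31$ observations (implied by the total DF of 30); `complete_anova` is a hypothetical helper. With these inputs the Error row comes out as DF 26, SS 772.022 and MS 29.693, and the F ratio rounds to 14.434, in line with the Treatment row and the $F_{4,26}$ reference distribution used in part (f).

```python
def complete_anova(ss_trt, ss_total, k, n_total):
    """Fill a one-way ANOVA table from SS_trt, SS_total, k groups and N observations."""
    df_trt, df_err, df_total = k - 1, n_total - k, n_total - 1
    ss_err = ss_total - ss_trt  # uses SS_total = SS_trt + SS_err
    ms_trt, ms_err = ss_trt / df_trt, ss_err / df_err
    return {
        "Treatment": (df_trt, ss_trt, ms_trt, ms_trt / ms_err),
        "Error": (df_err, ss_err, ms_err),
        "Total": (df_total, ss_total),
    }


# Values from the ANOVA table: Total DF = 30 implies N = 31 observations in k = 5 groups.
for source, row in complete_anova(1714.365, 2486.387, k=5, n_total=31).items():
    print(source, [round(v, 3) for v in row])
```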
(f) [4] Test the hypothesis that all treatment means are equal at the 10% level.
State the hypotheses, the test statistic and its reference distribution, the p-value or
critical value, and your conclusions.
$$H_0: \mu_1 = \cdots = \mu_5, \qquad H_1: \mu_i \neq \mu_j \text{ for some } i \neq j .$$
The test statistic is the F-statistic whose value is given in the ANOVA table. Under $H_0$
it has an $F_{4,26}$ distribution. The p-value is computed to be $2.484 \times 10^{-6}$. We reject
$H_0$ at any reasonable significance level (including the 10% level) and conclude
that at least two of the treatment means differ.
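When the numerator degrees of freedom $d_1$ is even, the upper tail of the $F_{d_1,d_2}$ distribution has a closed form through the regularized incomplete beta function:
$P(F > f) = x^{d_2/2} \sum_{j=0}^{d_1/2 - 1} \frac{(d_2/2)_j}{j!} (1 - x)^j$ with $x = d_2 / (d_2 + d_1 f)$ and $(a)_j$ the rising factorial.
The Python sketch below uses this identity (instead of a library routine such as `scipy.stats.f.sf`) to reproduce the p-value for $F = 14.434$ on $(4, 26)$ degrees of freedom; the function name is hypothetical.

```python
def f_upper_tail_even_d1(f_stat, d1, d2):
    """P(F > f_stat) for F ~ F(d1, d2) when d1 is even (closed-form incomplete-beta series)."""
    if d1 % 2 != 0:
        raise ValueError("this closed form requires an even numerator df d1")
    a = d2 / 2
    x = d2 / (d2 + d1 * f_stat)
    term, total = 1.0, 1.0
    for j in range(1, d1 // 2):
        # term_j = (a)_j / j! * (1 - x)^j, built up from term_{j-1}
        term *= (a + j - 1) / j * (1 - x)
        total += term
    return x ** a * total


p_value = f_upper_tail_even_d1(14.434, 4, 26)
print(f"{p_value:.3e}")  # about 2.484e-06
```

For $d_1 = 2$ the series has a single term and reduces to $(1 + 2f/d_2)^{-d_2/2}$, which is a convenient check.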
(g) [6] Construct simultaneous 90% CIs for the mean differences using Tukey's
method. Pretend that you are computing all simultaneous CIs but show only the
first 3 (1 vs 2; 1 vs 3; 2 vs 3) in writing.
All three confidence intervals contain 0, so none of the pairwise differences among the
first three group means is significant.
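For reference, the standard Tukey (Tukey-Kramer, for possibly unequal group sizes) simultaneous interval for $\mu_i - \mu_j$ has the form below, where $q_{0.90;\,5,\,26}$ denotes the 90% quantile of the studentized range distribution for $k = 5$ groups and $26$ error degrees of freedom, and MSE is the error mean square from the ANOVA table. This is a sketch of the general formula; the group means and sizes needed to evaluate it are not reproduced here.

```latex
\bar y_{i\cdot} - \bar y_{j\cdot} \;\pm\; \frac{q_{0.90;\,5,\,26}}{\sqrt{2}}
\sqrt{\mathrm{MSE}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}
```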
6. [6] Under standard model assumptions of the one-way layout, the treatment and
error sums of squares are independent. This is largely due to two facts. The first
fact is that the total sum of squares has the decomposition
The second fact is the following, which you will prove. Assume the common notation
used in lectures.
Hint: work out the special case i = 1, j = 1, and comment on why the proof holds
for any i, j.
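As in the standard one-way layout, we assume that the k groups have the same
number of units n. Write $\hat\tau_i = \bar y_{i\cdot} - \bar y_{\cdot\cdot}$ and $r_{ij} = y_{ij} - \bar y_{i\cdot}$. By properties of covariance, we have
$$\mathrm{Cov}(\hat\tau_i, r_{ij}) = \mathrm{Cov}(\bar y_{i\cdot} - \bar y_{\cdot\cdot}, y_{ij} - \bar y_{i\cdot}) = \mathrm{Cov}(\bar y_{i\cdot}, y_{ij}) - \mathrm{Cov}(\bar y_{i\cdot}, \bar y_{i\cdot}) - \mathrm{Cov}(\bar y_{\cdot\cdot}, y_{ij}) + \mathrm{Cov}(\bar y_{\cdot\cdot}, \bar y_{i\cdot}) .$$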
By the assumptions of the one-way layout, observations j and j′ in group i are
independent, and so
$$\mathrm{Cov}(\bar y_{i\cdot}, y_{ij}) = \frac{1}{n}\mathrm{Cov}(y_{ij}, y_{ij}) = \frac{1}{n}\mathrm{Var}(y_{ij}) = \frac{\sigma^2}{n} .$$
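Also, by the assumptions of the one-way layout, groups i and i′ are independent of one
another, and so
$$\mathrm{Cov}(\bar y_{\cdot\cdot}, \bar y_{i\cdot}) = \frac{1}{k}\mathrm{Cov}(\bar y_{i\cdot}, \bar y_{i\cdot}) = \frac{1}{k}\mathrm{Var}(\bar y_{i\cdot}) = \frac{\sigma^2}{kn} ,$$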
and
$$\mathrm{Cov}(\bar y_{\cdot\cdot}, y_{ij}) = \frac{1}{kn}\mathrm{Cov}(y_{ij}, y_{ij}) = \frac{1}{kn}\mathrm{Var}(y_{ij}) = \frac{\sigma^2}{kn} .$$
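Since also $\mathrm{Cov}(\bar y_{i\cdot}, \bar y_{i\cdot}) = \mathrm{Var}(\bar y_{i\cdot}) = \sigma^2/n$, replacing the covariances with these simplified
expressions in the above equation gives
$$\mathrm{Cov}(\hat\tau_i, r_{ij}) = \frac{\sigma^2}{n} - \frac{\sigma^2}{n} - \frac{\sigma^2}{kn} + \frac{\sigma^2}{kn} = 0 .$$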
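The result can also be checked numerically. The Python sketch below, for illustration only, simulates a balanced one-way layout with $k = 3$ groups of $n = 4$ iid $N(0, \sigma^2)$ observations (all group means set to 0, which does not affect covariances) and estimates $\mathrm{Cov}(\hat\tau_1, r_{11})$ by Monte Carlo; the function name, group counts, seed and number of replications are arbitrary choices.

```python
import random


def simulate_cov(k=3, n=4, sigma=1.0, reps=20000, seed=404):
    """Monte Carlo estimate of Cov(tau_hat_1, r_11) in a balanced one-way layout."""
    rng = random.Random(seed)
    taus, resids = [], []
    for _ in range(reps):
        # k independent groups of n iid N(0, sigma^2) observations.
        data = [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(k)]
        group_means = [sum(g) / n for g in data]
        grand_mean = sum(group_means) / k  # equal n, so ybar.. is the mean of the group means
        taus.append(group_means[0] - grand_mean)  # tau_hat_1 = ybar_1. - ybar..
        resids.append(data[0][0] - group_means[0])  # r_11 = y_11 - ybar_1.
    mean_tau, mean_r = sum(taus) / reps, sum(resids) / reps
    return sum((t - mean_tau) * (r - mean_r) for t, r in zip(taus, resids)) / (reps - 1)


print(simulate_cov())  # close to 0, up to Monte Carlo error
```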