36-401 Modern Regression HW #6 Solutions: Problem 1 (32 Points)
[Plot: Weight vs. Time, one panel per chick (coplot given Chick)]
Figure 1: Weight Progression of 50 Chicks in the ChickWeight Data Set
(b) (9 pts.)
The results of a naive simple linear fit of Chick 6's weight on Time are shown in Figure 2. The residuals in the right
panel are strongly correlated (they follow a systematic pattern over time), so the model does not fit well.
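One way to put a number on this residual correlation is the lag-1 correlation between consecutive residuals, taken in time order. This is a minimal sketch: it assumes Chick 6 is selected with `subset(ChickWeight, Chick == "6")`, and the `lag1` statistic is an illustrative choice, not the method used to produce Figure 2.

```r
# Naive linear fit for Chick 6 plus a lag-1 residual autocorrelation check
data(ChickWeight)
Chick6 <- subset(ChickWeight, Chick == "6")   # assumed subsetting of Chick 6
Chick6 <- Chick6[order(Chick6$Time), ]        # make sure residuals are in time order

model1 <- lm(weight ~ Time, data = Chick6)
r <- residuals(model1)

# Correlation between each residual and the next one in time:
# values near 0 suggest little serial structure, large positive values
# suggest the straight line is systematically missing the curve
lag1 <- cor(head(r, -1), tail(r, -1))
lag1
```

A large positive `lag1` is the numerical counterpart of the runs of same-signed residuals described above.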
[Figure 2 panels: Weight vs. Time with fitted line (left); Residuals vs. Time (right)]
Figure 3 displays the results of a degree-four polynomial regression of Weight on Time. The lack of visible
correlation in the residuals suggests this is a much more suitable fit.
[Figure 3 panels: Weight vs. Time with degree-four polynomial fit (left); Residuals vs. Time (right)]
Table 1: Summary of Polynomial Regression for Chick 6
Not surprisingly, given the polynomial shape of Chick 6's weight progression, our polynomial regression
fits the data much better than the naive linear regression. The most notable aspect of the estimated
coefficients is that Time is no longer statistically significant once the higher-order terms are in the model.
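A nested-model F-test gives a numerical counterpart to the visual comparison: it tests whether the Time^2, Time^3 and Time^4 terms jointly improve on the straight line. A sketch reusing the appendix's `model1` and `model2` formulas; the `Chick6` subset is assumed to be `subset(ChickWeight, Chick == "6")`.

```r
data(ChickWeight)
Chick6 <- subset(ChickWeight, Chick == "6")   # assumed subsetting of Chick 6

model1 <- lm(weight ~ Time, data = Chick6)
model2 <- lm(weight ~ Time + I(Time ^ 2) + I(Time ^ 3) + I(Time ^ 4), data = Chick6)

# Partial F-test of H0: the Time^2, Time^3 and Time^4 coefficients are all zero
cmp <- anova(model1, model2)
cmp

# Estimate, SE, t value and p-value for Time in each model
summary(model1)$coefficients["Time", ]
summary(model2)$coefficients["Time", ]
```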
(c) (9 pts.)
[Figure 4 panels: Weight vs. Time for all chicks with fitted line (left); Residuals vs. Time (right)]
As we saw with Chick 6, and as we could infer from Figure 1, the weight progression of individual chicks is
often more suitable to model with a polynomial than a simple linear regression. However, when the data are
aggregated over all chicks, a simple linear model is actually not a poor choice. The model fit and residuals
are shown in Figure 4. For the most part the line fits the data fairly well. However, some residual curvature
is apparent in the earlier stages of the chick lifetime. Therefore, adding a quadratic term will likely benefit
the model. Figure 5 shows that the extra term does indeed level out the residuals nicely.
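The same comparison for the pooled data can be sketched with `anova()`, reusing the appendix's `model3` and `model4`. The mean residual of the linear fit at each time point is included as a rough, illustrative numeric summary of the curvature; it is not part of the original analysis.

```r
data(ChickWeight)
model3 <- lm(weight ~ Time, data = ChickWeight)
model4 <- lm(weight ~ Time + I(Time ^ 2), data = ChickWeight)

# Partial F-test for adding the quadratic term
cmp <- anova(model3, model4)
cmp

# Mean residual of the straight-line fit at each measurement time;
# a systematic sign pattern (rather than scatter around 0) indicates curvature
mean_res <- tapply(residuals(model3), ChickWeight$Time, mean)
round(mean_res, 1)
```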
[Figure 5 panels: Weight vs. Time for all chicks with quadratic fit (left); Residuals vs. Time (right)]
(d) (9 pts.)
[Plot: Weight vs. Time with fitted curves for Diet 1, Diet 2, Diet 3 and Diet 4]
Figure 6: Results of Polynomial Regression of Chick Weight on Time and Diet
Simply adding Diet as a predictor, without interacting it with Time, forces the growth curves of all diet
groups to have the same shape, differing only in their intercepts. As Figure 6 shows, this model
specification is not suitable here. Given the HW instructions, it is fine to fit this model as long as you
comment on why it is ill-suited.
Since a chick’s birthweight is independent of the diet it is thereafter fed, it is more sensible to force each
diet group to have the same intercept, but possibly differing curvatures. The fits of this model are shown in
Figure 7 (superposed) and Figure 8 (separated into respective Diet groups).
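The common-intercept property can be checked directly: with a shared intercept and Time term, and Time^2 interacted with Diet but no Diet main effect, every diet's fitted curve takes the same value at Time = 0. A sketch using the appendix's `model6`; `model5` is a hypothetical additive fit standing in for the Figure 6 model, whose code is not shown in this document.

```r
data(ChickWeight)
# Hypothetical additive model: same curve shape, diet-specific intercepts
model5 <- lm(weight ~ Time + I(Time ^ 2) + factor(Diet), data = ChickWeight)
# Appendix model: common intercept and Time term, diet-specific Time^2 terms
model6 <- lm(weight ~ Time + I(Time ^ 2) : factor(Diet), data = ChickWeight)

at_zero <- data.frame(Time = 0, Diet = 1:4)
p5 <- predict(model5, newdata = at_zero)   # differ across diets
p6 <- predict(model6, newdata = at_zero)   # identical across diets
rbind(additive = p5, common_intercept = p6)
```

Because `I(Time ^ 2)` has no main effect in `model6`, R codes the interaction with one Time^2 coefficient per diet, giving six coefficients in total.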
[Plot: Weight vs. Time with fitted curves for Diet 1, Diet 2, Diet 3 and Diet 4]
Figure 7: Polynomial Regression of Chick Weight with Time^2 and Diet Interacted
[Plot panels: Weight vs. Time with fitted curve, one panel each for Diet = 1, 2, 3, 4]
Figure 8: Polynomial Regression of Chick Weight with Time^2 and Diet Interacted
The residuals of the model are plotted against Time in Figure 9 (superposed) and Figure 10 (separated by
Diet group). The residuals look reasonably healthy overall, but there is clear correlation in the residuals of
Diet groups 1 and 4. A box plot can also be used for diet-wise residual diagnostics (see Figure 11).
However, the substantial heteroskedasticity within every diet group makes this plot nearly useless; we are
much better off examining Figure 10. Notice that we are only a few predictor interactions shy of performing
four separate regressions anyway.
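To put numbers on the heteroskedasticity, one can compare the spread of the residuals across diets and across measurement times. A minimal sketch using the appendix's `model6`; the grouping by `Time` is an illustrative choice.

```r
data(ChickWeight)
model6 <- lm(weight ~ Time + I(Time ^ 2) : factor(Diet), data = ChickWeight)
res <- residuals(model6)

# Residual standard deviation within each diet group
sd_by_diet <- tapply(res, ChickWeight$Diet, sd)
# Residual standard deviation at each measurement time; a spread that
# changes with Time is a sign of heteroskedasticity
sd_by_time <- tapply(res, ChickWeight$Time, sd)

round(sd_by_diet, 1)
round(sd_by_time, 1)
```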
[Plot: Residuals vs. Time, colored by Diet 1 to 4]
Figure 9: Residuals of chosen model
[Figure 10 panels: Residuals vs. Time, one panel each for Diet = 1, 2, 3, 4]
[Plot: box plots of Residuals by Diet (1, 2, 3, 4)]
Figure 11: Diet-wise Distribution of Residuals
Problem 2 [36 points]
(a) (9 pts.)
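The intercept column is the sum of the three country indicators, $X_0 = X_1 + X_2 + X_3$, so the columns of
the $12 \times 4$ design matrix $X$ are linearly dependent and $\operatorname{rank}(X) \le 3$. Since $X$ is not square,
$\det(X)$ is not defined, so we argue through the rank: because $\operatorname{rank}(X^T X) = \operatorname{rank}(X)$, the
$4 \times 4$ matrix $X^T X$ has rank at most 3, hence
$$\det(X^T X) = 0,$$
so $X^T X$ is not invertible.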
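The rank argument can be checked numerically. A sketch that builds the full $12 \times 4$ design matrix from the indicator vectors defined in part (b):

```r
X0 <- rep(1, 12)
X1 <- c(rep(1, 4), rep(0, 8))
X2 <- c(rep(0, 4), rep(1, 4), rep(0, 4))
X3 <- c(rep(0, 8), rep(1, 4))
X <- cbind(X0, X1, X2, X3)          # 12 x 4 design with all three country dummies

stopifnot(all(X0 == X1 + X2 + X3))  # intercept column = sum of the dummies

XtX <- crossprod(X)                 # t(X) %*% X, a 4 x 4 matrix
qr(X)$rank                          # columns are linearly dependent
det(XtX)                            # zero up to floating-point rounding
```

Calling `solve(XtX)` on this matrix fails, which is why part (b) leaves out `X3`.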
(b) (9 pts.)
library(knitr)   # for kable()
Y <- c(33,36,35,35,31,29,31,29,37,39,36,36)   # incomes (thousands): France, Italy, USA
X0 <- rep(1,12)                                 # intercept column
X1 <- c(rep(1,4),rep(0,8))                      # France indicator
X2 <- c(rep(0,4),rep(1,4),rep(0,4))             # Italy indicator
X3 <- c(rep(0,8),rep(1,4))                      # USA indicator
X <- cbind(X0,X1,X2) # leave out X3 (USA becomes the baseline)
model7 <- lm(Y ~ X - 1) # leave out default intercept since we have already included one
tmp <- summary(model7)$coefficients
rownames(tmp) <- c("(Intercept)","France","Italy")
kable(tmp, caption = "Income Regression Summary")
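All coefficients in the regression are significant at the 5% level. The France and Italy coefficients measure
each country's difference from the USA baseline, so their significance indicates that mean income in both
France and Italy differs significantly from mean income in the USA (the France-vs-Italy difference is not
directly tested by these coefficients).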
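The same regression can be fit without building the design matrix by hand, by coding country as a factor with USA as the reference level. A sketch (the `country` factor is introduced here for illustration) showing that both codings give identical coefficients:

```r
Y <- c(33,36,35,35,31,29,31,29,37,39,36,36)
X0 <- rep(1, 12)
X1 <- c(rep(1, 4), rep(0, 8))
X2 <- c(rep(0, 4), rep(1, 4), rep(0, 4))
X <- cbind(X0, X1, X2)
model7 <- lm(Y ~ X - 1)   # hand-built dummy coding, as above

# Factor coding; listing "USA" first makes it the reference level
country <- factor(rep(c("France", "Italy", "USA"), each = 4),
                  levels = c("USA", "France", "Italy"))
model7f <- lm(Y ~ country)

cbind(dummies = coef(model7), factor = coef(model7f))
```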
(c) (9 pts.)
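$\widehat{\mathrm{E}}[\text{Income} \mid \text{France}] = \hat\beta_0 + \hat\beta_1 = 34.75$ (thousand)

$\widehat{\mathrm{E}}[\text{Income} \mid \text{Italy}] = \hat\beta_0 + \hat\beta_2 = 30.00$ (thousand)

$\widehat{\mathrm{E}}[\text{Income} \mid \text{USA}] = \hat\beta_0 = 37.00$ (thousand)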
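Because each country forms its own group, these fitted means are simply the sample mean incomes, which gives a quick check (the `country` labels are illustrative):

```r
Y <- c(33,36,35,35,31,29,31,29,37,39,36,36)
country <- rep(c("France", "Italy", "USA"), each = 4)

# Group means, ordered alphabetically by country name
m <- tapply(Y, country, mean)
m   # France = 34.75, Italy = 30, USA = 37
```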
(d) (9 pts.)
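Continuing with the same notation, the true mean income of France is given by $\beta_0 + \beta_1$. Assuming the
noise is normally distributed implies
$$\hat\beta_0 \sim N\big(\beta_0, \operatorname{Var}(\hat\beta_0)\big) \quad\text{and}\quad \hat\beta_1 \sim N\big(\beta_1, \operatorname{Var}(\hat\beta_1)\big),$$
and, since $\hat\beta_0$ and $\hat\beta_1$ are jointly normal,
$$\hat\beta_0 + \hat\beta_1 \sim N\big(\beta_0 + \beta_1, \operatorname{Var}(\hat\beta_0 + \hat\beta_1)\big) = N\big(\beta_0 + \beta_1, \operatorname{Var}(\hat\beta_0) + \operatorname{Var}(\hat\beta_1) + 2\operatorname{Cov}(\hat\beta_0, \hat\beta_1)\big).$$
$\operatorname{Var}(\hat\beta_0 + \hat\beta_1)$ depends on the unknown noise variance $\sigma^2$, so we replace it with its unbiased
estimator and use the $t$-distribution with $n - p = 12 - 3 = 9$ residual degrees of freedom. This yields the 95%
confidence interval
$$(\hat\beta_0 + \hat\beta_1) \pm t_{n-p}(0.025)\sqrt{\widehat{\operatorname{Var}}(\hat\beta_0 + \hat\beta_1)}$$
$$\implies (\hat\beta_0 + \hat\beta_1) \pm t_{9}(0.025)\sqrt{\widehat{\operatorname{Var}}(\hat\beta_0) + \widehat{\operatorname{Var}}(\hat\beta_1) + 2\,\widehat{\operatorname{Cov}}(\hat\beta_0, \hat\beta_1)}$$
$$\implies 34.75 \pm 1.448,$$
where the estimated variances and covariance come from the vcov function.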
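The interval can be reproduced from `vcov()` in a few lines. A sketch that refits with a factor (USA as baseline, an illustrative recoding) so the coefficient vector is ordered as $(\beta_0, \beta_1, \beta_2)$; the degrees of freedom are taken from `df.residual()`, and `predict()` with `interval = "confidence"` serves as a cross-check.

```r
# 95% CI for the France mean income, beta0 + beta1
Y <- c(33,36,35,35,31,29,31,29,37,39,36,36)
country <- factor(rep(c("France", "Italy", "USA"), each = 4),
                  levels = c("USA", "France", "Italy"))   # USA = baseline
fit <- lm(Y ~ country)

b <- coef(fit)      # (beta0, beta1 = France, beta2 = Italy)
V <- vcov(fit)      # estimated covariance matrix of the coefficients
a <- c(1, 1, 0)     # linear combination selecting beta0 + beta1

est <- sum(a * b)
se <- sqrt(drop(t(a) %*% V %*% a))           # sqrt(Var0 + Var1 + 2 Cov01)
tcrit <- qt(0.975, df = df.residual(fit))    # residual df = n - p
ci <- est + c(-1, 1) * tcrit * se
ci

# Cross-check: predict() gives the same confidence interval for the France mean
new <- data.frame(country = factor("France", levels = levels(country)))
pr <- predict(fit, newdata = new, interval = "confidence", level = 0.95)
pr
```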
Problem 3 [32 points]
(a) [5 pts.]
[Plot: scatterplot matrix (pairs plot) of age, lwt, race, smoke, ptl, ht, ui, ftv and bwt]
(b) [9 pts.]
(c) [9 pts.]
Overall, the residuals look quite good plotted against the fitted values and the numerical predictors. Case
226 (a 45-year-old mother) has substantial leverage on the fit, since all other mothers are much younger.
Furthermore, this case is also a very large outlier. In practice, we would therefore discard this observation when
building a predictive model. For the categorical predictors, there is some heteroskedasticity across levels;
however, the varied sample sizes also make it challenging to judge this from the box plots alone. That is,
levels containing many observations can give the appearance of a wider distribution, while levels with few
observations naturally appear to have low variance. To make these comparisons while accounting for sample size, we
could, for example, perform an F-test for equality of variances or (more sophisticated) a Kolmogorov-Smirnov
test for equality of distributions.
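A sketch of these checks in R. The model formula in the residual-plot title is cut off after `factor(`; the code below assumes the final term is `factor(ftv)`, which is a guess. Leverage is measured with `hatvalues()` against the common 2p/n rule of thumb (an illustrative threshold), and the two distributional comparisons use base R's `var.test()` (F-test) and `ks.test()`; the particular groupings tested are illustrative.

```r
library(MASS)   # provides the birthwt data
# Assumed full model: the final factor(ftv) term is a guess
fit <- lm(bwt ~ age + lwt + factor(race) + smoke + factor(ptl) + ht + ui + factor(ftv),
          data = birthwt)

h <- hatvalues(fit)                  # leverage of each case (names are row names)
p <- fit$rank
n <- nrow(birthwt)
h["226"]                             # leverage of case 226
names(h)[h > 2 * p / n]              # cases above the 2p/n rule of thumb

r <- residuals(fit)
# F-test for equal residual variance: mothers with vs. without hypertension
var.test(r[birthwt$ht == 1], r[birthwt$ht == 0])
# Kolmogorov-Smirnov test for equal residual distributions: smokers vs. non-smokers
ks.test(r[birthwt$smoke == 1], r[birthwt$smoke == 0])
```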
[Residual diagnostic plots for lm(bwt ~ age + lwt + factor(race) + smoke + factor(ptl) + ht + ui + factor( ...: residuals vs. fitted values (cases 226, 16 and 10 labelled), residuals vs. Age, and further panels of residuals against the remaining predictors, including box plots by Race, Smoke and History of hypertension]
(d) [9 pts.]
Table 3 (part b) suggests we may have included more predictors than are useful for predicting birth weight.
A more predictive model would likely use fewer covariates. We could, for example, perform a sequence of
partial F-tests (as in stepwise regression; 36-402) or fit a LASSO regression (also 36-402). Given the full set
of predictors, the multiple linear regression suggests that a baby's birth weight is significantly associated with
the mother's weight, race, smoking status, history of hypertension, and presence of uterine irritability (and
possibly the number of previous premature labours).
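The first suggestion can be tried with base R. A sketch, assuming the full model from part (c); its plot title is truncated after `factor(`, so the final `factor(ftv)` term here is a guess. `drop1()` reports a partial F-test for removing each term from the full model; `step()` runs a backward search but scores models by AIC rather than F-tests, so it is a related but not identical procedure. LASSO is not shown since it needs an add-on package.

```r
library(MASS)   # provides the birthwt data
# Assumed full model: the final factor(ftv) term is a guess
fit <- lm(bwt ~ age + lwt + factor(race) + smoke + factor(ptl) + ht + ui + factor(ftv),
          data = birthwt)

# Partial F-test for dropping each term, one at a time, from the full model
d <- drop1(fit, test = "F")
d

# Backward stepwise search by AIC; trace = 0 suppresses the step-by-step log
reduced <- step(fit, direction = "backward", trace = 0)
formula(reduced)
```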
Appendix
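# addTrans(color, trans): add transparency (alpha 0 to 255) to colors; function header and last two lines reconstructed
addTrans <- function(color, trans) {
  if (length(color) != length(trans) & !any(c(length(color), length(trans)) == 1)) {
    stop("Vector lengths not correct")
  }
  if (length(color) == 1 & length(trans) > 1) color <- rep(color, length(trans))
  if (length(trans) == 1 & length(color) > 1) trans <- rep(trans, length(color))
  m <- col2rgb(color)  # 3 x k matrix of red/green/blue values (0 to 255)
  rgb(m[1, ], m[2, ], m[3, ], alpha = trans, maxColorValue = 255)
}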
data(ChickWeight)
names(ChickWeight)
attach(ChickWeight)
Chick6 <- subset(ChickWeight, Chick == "6")  # Chick 6 only, used in part (b)
(a) (4 pts.)
(b) (9 pts.)
par(mfrow=c(1,2))
with(Chick6, plot(Time, weight, col = NA, pch = 19, cex = 1.25,
axes = FALSE, xlab="",ylab=""))
axis(side = 1, at = c(0,5,10,15,20), as.character(c(0,5,10,15,20)),
font = 5)
axis(side = 2, at = seq(0,300,25), as.character(seq(0,300,25)),
font = 5)
abline(v = c(0,5,10,15,20), h = seq(50,300,25), col = "gray70",
lty = 2)
mtext(side = 2, text = "Weight", font = 3, line = 3)
mtext(side = 1, text = "Time", font = 3, line = 3)
with(Chick6, points(Time, weight, col = addTrans("orange",120),
cex = 1.25, pch = 19))
with(Chick6, points(Time, weight, col = "orange", pch = 1,
cex = 1.25))
model1 <- lm(weight ~ Time, data = Chick6)
abline(model1, col = "seagreen", lwd = 2)
par(mfrow=c(1,2))
with(Chick6, plot(Time, weight, col = NA, pch = 19, cex = 1.25,
axes = FALSE, xlab="",ylab=""))
axis(side = 1, at = c(0,5,10,15,20), as.character(c(0,5,10,15,20)),
font = 5)
axis(side = 2, at = seq(0,300,25), as.character(seq(0,300,25)),
font = 5)
abline(v = c(0,5,10,15,20), h = seq(50,300,25), col = "gray70",
lty = 2)
mtext(side = 2, text = "Weight", font = 3, line = 3)
mtext(side = 1, text = "Time", font = 3, line = 3)
with(Chick6, points(Time, weight, col = addTrans("orange",120),
cex = 1.25, pch = 19))
with(Chick6, points(Time, weight, col = "orange", pch = 1, cex = 1.25))
model2 <- lm(weight ~ Time + I(Time ^ 2) + I(Time ^ 3) + I(Time ^ 4),
data = Chick6)
lines(Chick6$Time, fitted(model2), col = "seagreen", lwd = 2)
points(Chick6$Time, residuals(model2), col = addTrans("orange",120),
pch = 19, cex = 1.25)
points(Chick6$Time, residuals(model2), col = "orange", cex = 1.25)
panel.smooth(Chick6$Time, residuals(model2), col = NA,cex = 0.5,
col.smooth = "seagreen", span = 0.5, iter = 3)
library(knitr)
kable(summary(model2)$coefficients, caption = "Summary of Polynomial Regression for Chick 6")
(c) (9 pts.)
par(mfrow=c(1,2))
with(ChickWeight, plot(Time, weight, col = NA, pch = 19, cex = 0.8,
axes = FALSE, xlab="",ylab=""))
axis(side = 1, at = c(0,5,10,15,20,25), as.character(c(0,5,10,15,20,25)),
font = 5)
axis(side = 2, at = seq(0,350,50), as.character(seq(0,350,50)),
font = 5)
abline(v = c(0,5,10,15,20,25), h = seq(50,350,50), col = "gray70",
lty = 2)
mtext(side = 2, text = "Weight", font = 3, line = 3)
mtext(side = 1, text = "Time", font = 3, line = 3)
with(ChickWeight, points(Time, weight, col = addTrans("orange",120),
cex = 0.8, pch = 19))
with(ChickWeight, points(Time, weight, col = "orange", pch = 1,
cex = 0.8))
model3 <- lm(weight ~ Time, data = ChickWeight)
abline(model3, col = "seagreen", lwd = 2)
par(mfrow=c(1,2))
with(ChickWeight, plot(Time, weight, col = NA, pch = 19, cex = 0.8,
axes = FALSE, xlab="",ylab=""))
axis(side = 1, at = c(0,5,10,15,20,25), as.character(c(0,5,10,15,20,25)),
font = 5)
axis(side = 2, at = seq(0,350,50), as.character(seq(0,350,50)), font = 5)
abline(v = c(0,5,10,15,20,25), h = seq(50,350,50), col = "gray70", lty = 2)
mtext(side = 2, text = "Weight", font = 3, line = 3)
mtext(side = 1, text = "Time", font = 3, line = 3)
with(ChickWeight, points(Time, weight, col = addTrans("orange",120), cex = 0.8, pch = 19))
with(ChickWeight, points(Time, weight, col = "orange", pch = 1, cex = 0.8))
model4 <- lm(weight ~ Time + I(Time ^ 2), data = ChickWeight)
lines(seq(0,max(ChickWeight$Time),length = 50),
predict(model4, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),length = 50))),
col = "seagreen", lwd = 2)
(d) (9 pts.)
with(ChickWeight, plot(Time, weight, col = NA, pch = 19, cex = 0.8, axes = FALSE,
xlab="",ylab=""))
axis(side = 1, at = c(0,5,10,15,20,25), as.character(c(0,5,10,15,20,25)), font = 5)
axis(side = 2, at = seq(0,350,50), as.character(seq(0,350,50)), font = 5)
abline(v = c(0,5,10,15,20,25), h = seq(50,350,50), col = "gray70", lty = 2)
mtext(side = 2, text = "Weight", font = 3, line = 3)
mtext(side = 1, text = "Time", font = 3, line = 3)
with(ChickWeight, points(Time, weight, col = addTrans("orange",120), cex = 0.8,
pch = 19))
with(ChickWeight, points(Time, weight, col = "orange", pch = 1, cex = 0.8))
model6 <- lm(weight ~ Time + I(Time ^ 2) : factor(Diet), data = ChickWeight)
lines(seq(0,max(ChickWeight$Time),length = 50),
predict(model6, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),length = 50),
Diet = 1)),
col = "seagreen", lwd = 2)
lines(seq(0,max(ChickWeight$Time),length = 50),
predict(model6, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),length = 50),
Diet = 2)),
col = "red", lwd = 2)
lines(seq(0,max(ChickWeight$Time),length = 50),
predict(model6, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),length = 50),
Diet = 3)),
col = "blue", lwd = 2)
lines(seq(0,max(ChickWeight$Time),length = 50),
predict(model6, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),length = 50),
Diet = 4)),
col = "purple", lwd = 2)
legend(x = "topleft", legend = c("Diet 1", "Diet 2", "Diet 3", "Diet 4"),
col = c("seagreen","red","blue","purple"), lwd = 2, bg = "white")
par(mfrow=c(2,2))
cols <- c("seagreen","red","blue","purple")
for (itr in 1:4){
with(ChickWeight, plot(Time, weight, col = NA, pch = 19, cex = 0.8,
axes = FALSE, xlab="",ylab=""))
axis(side = 1, at = c(0,5,10,15,20,25), as.character(c(0,5,10,15,20,25)),
font = 5)
axis(side = 2, at = seq(0,350,50), as.character(seq(0,350,50)),
font = 5)
abline(v = c(0,5,10,15,20,25), h = seq(50,350,50), col = "gray70",
lty = 2)
mtext(side = 2, text = "Weight", font = 3, line = 3)
mtext(side = 1, text = "Time", font = 3, line = 3)
mtext(side = 3, text = paste0("Diet = ",itr), font = 3)
with(ChickWeight[which(ChickWeight$Diet==itr),], points(Time, weight,
col = addTrans("orange",120),
cex = 0.8, pch = 19))
with(ChickWeight[which(ChickWeight$Diet==itr),], points(Time, weight,
col = "orange",
pch = 1, cex = 0.8))
model6 <- lm(weight ~ Time + I(Time ^ 2) : factor(Diet),
data = ChickWeight)
lines(seq(0,max(ChickWeight$Time),length = 50),
predict(model6, newdata=data.frame(Time = seq(0,max(ChickWeight$Time),
length = 50), Diet = itr)),
col = cols[itr], lwd = 2)
}
plot(ChickWeight$Time, residuals(model6), col = NA, axes = FALSE,
xlab= "Time", ylab = "Residuals", font.lab = 3)
axis(side = 1, at = c(0,5,10,15,20,25), as.character(c(0,5,10,15,20,25)),
font = 5)
axis(side = 2, at = seq(-200,200,40), labels = as.character(seq(-200,200,40)),
font = 5)
abline(h = seq(-200,200,40), v = c(0,5,10,15,20,25), col = "gray70", lty = 2)
abline(0,0, lty = 2, col = "gray45")
points(ChickWeight$Time, residuals(model6),
col = addTrans(c("seagreen","red","blue","purple")[ChickWeight$Diet],120),
pch = 19, cex = 0.8)
points(ChickWeight$Time, residuals(model6),
col = c("seagreen","red","blue","purple")[ChickWeight$Diet], cex = 0.8)
legend(x = "topleft", legend = c("Diet 1", "Diet 2", "Diet 3", "Diet 4"),
col = addTrans(c("seagreen","red","blue","purple"),120), pch = 19, bg = "white")
par(mfrow=c(2,2))
cols <- c("seagreen","red","blue","purple")
for (itr in 1:4){
with(ChickWeight, plot(Time, residuals(model6), col = NA,
pch = 19, cex = 0.8, axes = FALSE, xlab="",ylab=""))
axis(side = 1, at = c(0,5,10,15,20,25), as.character(c(0,5,10,15,20,25)),
font = 5)
axis(side = 2, at = seq(-200,200,40), labels = as.character(seq(-200,200,40)),
font = 5)
abline(h = seq(-200,200,40), v = c(0,5,10,15,20,25), col = "gray70", lty = 2)
abline(0,0, lty = 2, col = "gray45")
mtext(side = 2, text = "Residuals", font = 3, line = 3)
mtext(side = 1, text = "Time", font = 3, line = 3)
mtext(side = 3, text = paste0("Diet = ",itr), font = 3)
with(ChickWeight, points(Time[which(ChickWeight$Diet==itr)],
residuals(model6)[which(ChickWeight$Diet==itr)],
col = addTrans(c("seagreen","red","blue","purple"),120)[itr],
cex = 0.8, pch = 19))
with(ChickWeight, points(Time[which(ChickWeight$Diet==itr)],
residuals(model6)[which(ChickWeight$Diet==itr)],
col = c("seagreen","red","blue","purple")[itr], pch = 1,
cex = 0.8))
}
boxplot(residuals(model6) ~ ChickWeight$Diet,
col = addTrans(c("seagreen","red","blue","purple"),120),
border = c("seagreen","red","blue","purple"),
xlab = "Diet",ylab = "Residuals", pch = 19)
abline(0,0, lty = 2, col = "gray45")
Problem 3 [32 points]
library(MASS)
(a) [5 pts.]
pairs(birthwt[,2:10])
(b) [9 pts.]
(c) [9 pts.]
labels = as.character(seq(-1800,1500,300)), font = 5)
abline(h = seq(-1800,1500,300), v = seq(80,250,20), col = "gray70", lty = 2)
abline(0,0, lty = 2, col = "gray45")
points(birthwt$lwt, residuals(model8), col = addTrans("orange",120),
pch = 19, cex = 0.8)
points(birthwt$lwt, residuals(model8), col = "orange", cex = 0.8)
panel.smooth(birthwt$lwt, residuals(model8), col = NA,cex = 0.5,
col.smooth = "seagreen", span = 2/3, iter = 3)
boxplot(residuals(model8) ~ birthwt$ptl,
col = addTrans(c("seagreen","orange","red","blue"),120),
border = c("seagreen","orange","red","blue"),
xlab = "Number of previous premature labours",
ylab = "Residuals", pch = 19)
abline(0,0, lty = 2, col = "gray45")
boxplot(residuals(model8) ~ birthwt$ftv,
col = addTrans(c("seagreen","orange","red","blue","purple","yellow2","pink"),120),
border = c("seagreen","orange","red","blue","purple","yellow2","pink"),
xlab = "Number of physician visits during the first trimester.",
ylab = "Residuals", pch = 19)
abline(0,0, lty = 2, col = "gray45")