HW5
2024-10-16
Problem 6.11
a)
H0: B1 = B2 = B3 = 0
Ha: not all Bk = 0 (k = 1, 2, 3)
Decision rule: reject H0 if the test statistic F* exceeds the critical value F; otherwise fail to reject.
F (critical value) = 2.798061
F* (test statistic) = 35.33703
Since F* is far larger than the critical value, we reject the null hypothesis.
The p-value = 3.315708e-12 is also very small (less than .05), so we reject the null hypothesis and conclude that there is a regression relation, i.e., at least one of B1, B2, B3 is relevant.
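A sketch of how these values could be reproduced in R, assuming the fitted model mlr = lm(Y ~ X1 + X2 + X3, data = data) from the code appendix (error df = 48):
# Overall F test for a regression relation (sketch, assuming mlr from the appendix)
fstat <- summary(mlr)$fstatistic                 # named vector: value, numdf, dendf
fstat["value"]                                   # F* (test statistic)
qf(0.95, df1 = 3, df2 = 48)                      # F (critical value)
pf(fstat["value"], 3, 48, lower.tail = FALSE)    # p-value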
b)
B1 CI = ( -5.64608e-05 , 0.001630622 )
B3 CI = ( 478.6096 , 768.4993 )
The CI for B1 includes 0, so we can't reject H0: b1 = 0 with 95% confidence.
The CI for B3 does not include 0, so we can reject H0: b3 = 0 with 95% confidence.
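These are Bonferroni joint intervals (g = 2; the appendix uses the multiplier qt(1 - .05/4, 48)). A sketch of the same intervals via confint(), assuming mlr from the appendix:
# Bonferroni joint 95% intervals for B1 and B3 (each at individual level 1 - .05/2)
confint(mlr, parm = c("X1", "X3"), level = 1 - 0.05/2)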
c)
R squared = 0.6883342
This is the proportion of the total variation in Y that is explained by X1, X2, and X3 in the model.
Problem 6.16
a)
H0: B1 = B2 = B3 = 0
Ha: not all Bk = 0 (k = 1, 2, 3)
Decision rule: reject H0 if the test statistic F* exceeds the critical value F; otherwise fail to reject.
F (critical value) = 2.219059
F* (test statistic) = 35.33703
Since F* is far larger than the critical value, we reject the null hypothesis.
The p-value = 1.541973e-10 is also very small, so we reject the null hypothesis and conclude that there is a regression relation, i.e., at least one of B1, B2, B3 is relevant.
b)
B1 CI = ( -1.614248 , -0.6689755 )
B2 CI = ( -1.52451 , 0.6405013 )
B3 CI = ( -29.09203 , 2.151701 )
I used Bonferroni joint CIs because we were given a family confidence coefficient.
The CI for B1 does not include 0, so we can reject H0: b1 = 0 with 90% confidence.
The CI for B2 includes 0, so we can’t reject H0: b2 = 0 with 90% confidence.
The CI for B3 includes 0, so we can’t reject H0: b3 = 0 with 90% confidence.
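A sketch of the same family-90% Bonferroni intervals via confint() (g = 3, so each interval uses level 1 - .10/3), assuming mlr2 = lm(Y2.0 ~ X1.2 + X2.2 + X3.2, data = data2) from the code appendix:
# Bonferroni joint 90% intervals for B1, B2, B3
confint(mlr2, parm = c("X1.2", "X2.2", "X3.2"), level = 1 - 0.10/3)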
c)
R squared = 0.6821943
This is the proportion of the total variation in Y that is explained by X1, X2, and X3 in the model.
Problem 6.17
a)
90% CI for Yh hat = ( 64.52854 , 73.49204 )
This interval estimates the mean response at the given X values: we are 90% confident that the mean response for this combination of X values falls in this range.
b)
The 90% prediction interval = ( 52.09065 , 85.92992 )
For a future observation with the given X parameter values, we are 90% confident it will fall within our prediction interval.
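Both intervals could also be obtained directly from predict(); a sketch assuming mlr2 from the appendix and the Xh values used there (X1 = 35, X2 = 45, X3 = 2.2):
# Interval estimates at Xh, assuming mlr2 from the appendix
newX <- data.frame(X1.2 = 35, X2.2 = 45, X3.2 = 2.2)
predict(mlr2, newdata = newX, interval = "confidence", level = 0.90)   # 90% CI for the mean response
predict(mlr2, newdata = newX, interval = "prediction", level = 0.90)   # 90% prediction interval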
Problem 7.4
a)
Extra sum of squares for X1 = 136366.2
Extra sum of squares for X3 | X1 = 2033565
Extra sum of squares for X2 | X1,X3 = 6674.588
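These are the sequential sums of squares when the predictors enter in the order X1, X3, X2; a sketch assuming the data frame data from the code appendix:
# Sequential SS: SSR(X1), SSR(X3|X1), SSR(X2|X1,X3)
anova(lm(Y ~ X1 + X3 + X2, data = data))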
b)
H0: B2 = 0 (reduced model)
Ha: B2 != 0 (full model)
F* (test statistic) = 0.3250843
F (critical value) = 4.042652
Since F* < F, we cannot say with 95% confidence that B2 should be in the model. We fail to reject H0, so we should probably drop B2.
P-value = 0.5712274
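The same partial F test can be run by comparing the reduced and full models; a sketch assuming the data frame data from the appendix:
# Partial F test of H0: B2 = 0 (reduced vs. full model)
reduced <- lm(Y ~ X1 + X3, data = data)
full <- lm(Y ~ X1 + X2 + X3, data = data)
anova(reduced, full)    # gives F* and the p-value for adding X2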
c)
Expression 1 = 142092.2
Expression 2 = 142092.2
They are equal in this case, and they will always be equal: both expressions are decompositions of SSR(X1, X2), the total variation in Y explained by X1 and X2 together.
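A quick sketch of this identity, SSR(X1) + SSR(X2|X1) = SSR(X2) + SSR(X1|X2) = SSR(X1, X2), assuming the data frame data from the appendix:
# Both orderings of the sequential SS add up to SSR(X1, X2)
sum(anova(lm(Y ~ X1 + X2, data = data))[1:2, "Sum Sq"])   # SSR(X1) + SSR(X2|X1)
sum(anova(lm(Y ~ X2 + X1, data = data))[1:2, "Sum Sq"])   # SSR(X2) + SSR(X1|X2)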
Problem 7.9
H0: B1 = -1, B2 = 0 (reduced model) Y = B0 - X1 + B3*X3
Ha: not both B1 = -1 and B2 = 0 (full model) Y = B0 + B1*X1 + B2*X2 + B3*X3
anova(mlr2)[1,2] = 8275.389 (SSR for X1.2)
qf(.975, 2, 42) = 4.03271 (critical value)
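A sketch of how this general linear test could be completed, assuming data2 and the full model mlr2 from the code appendix (n = 46, so the full-model error df is 42). Under H0 the model reduces to Y + X1 = B0 + B3*X3 + error, so the reduced fit regresses Y2.0 + X1.2 on X3.2:
# General linear test of H0: B1 = -1, B2 = 0 at alpha = .025
reduced <- lm(I(Y2.0 + X1.2) ~ X3.2, data = data2)   # reduced model under H0 (df = 44)
SSE_R <- sum(resid(reduced)^2)
SSE_F <- sum(resid(mlr2)^2)                          # full-model SSE (df = 42)
Fstar <- ((SSE_R - SSE_F) / 2) / (SSE_F / 42)        # F* test statistic
Fstar
qf(0.975, 2, 42)                                     # critical value
pf(Fstar, 2, 42, lower.tail = FALSE)                 # p-value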
summary(mlrX3)
##
## Call:
## lm(formula = Y2.0 ~ X3.2 - X1.2, data = dataX3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.369 -9.606 -1.946 9.212 31.631
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 146.449 15.304 9.569 2.55e-12 ***
## X3.2 -37.117 6.637 -5.593 1.33e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.33 on 44 degrees of freedom
## Multiple R-squared: 0.4155, Adjusted R-squared: 0.4022
## F-statistic: 31.28 on 1 and 44 DF, p-value: 1.335e-06
anova(mlr2)
# sm = summary(mlr)
# pVal = glance(sm)$p.value[[1]]
# ```
# H0: b1=b2=b3=0
#
# Ha: not all b# = 0
#
# Decision rule: reject H0 if the test statistic F* exceeds the critical value F; otherwise fail to reject.
#
# ```{r, echo=FALSE, results='asis'}
# cat("F = ",qf(.95,3,48),"\n\n")
# cat("F* = ",glance(sm)$statistic, "\n\n")
# ```
#
# Since F* is far larger than the critical value, we reject the null hypothesis.
#
# The p-value is also very small (less than .05) so we reject the null hypothesis
# and state that there is a regression relation.
#
# This implies that at least one of B1, B2, B3 is relevant
# ```{r, echo=FALSE, results='asis'}
# cat("p-value = ", pVal)
# ```
#
# b)
#
# ```{r, echo=FALSE, results='asis'}
# B0 = mlr$coefficients[[1]]
# B1 = mlr$coefficients[[2]]
# B2 = mlr$coefficients[[3]]
# B3 = mlr$coefficients[[4]]
# B = qt(1-.05/4,48)
# SEB1 = sm$coef[[2,2]]
# SEB3<-sm$coef[[4,2]]
# cat("B1 CI = (", B1 - B*SEB1, ", ", B1 + B*SEB1,")")
# cat("B3 CI = (", B3 - B*SEB3, ", ", B3 + B*SEB3,")")
# ```
# The CI for B1 includes 0, so we can't reject H0: b1 = 0 with 95% confidence.
#
# The CI for B3 does not include 0, so we can reject H0: b3 = 0 with 95% confidence.
#
# c)
#
# ```{r, echo=FALSE, results='asis'}
# cat("R squared = ", sm$r.squared)
# ```
#
# This is the proportion of the total variation in Y that is explained by X1, X2, and X3 in the model
#
# Problem 6.16
#
# a)
#
# ```{r, echo=FALSE, results='asis'}
# Y2.0<-c(48,57,66,70,89,36,46,54,26,77,89,67,47,51,57,66,79,88,60,49,77,52,60,86,43,34,63,72,57,55,59,8
# X1.2<-c(50,36,40,41,28,49,42,45,52,29,29,43,38,34,53,36,33,29,33,55,29,44,43,23,47,55,25,32,32,42,33,3
# X2.2<-c(51,46,48,44,43,54,50,48,62,50,48,53,55,51,54,49,56,46,49,51,52,58,50,41,53,54,49,46,52,51,42,4
# X3.2<-c(2.3,2.3,2.2,1.8,1.8,2.9,2.2,2.4,2.9,2.1,2.4,2.4,2.2,2.3,2.2,2,2.5,1.9,2.1,2.4,2.3,2.9,2.3,1.8,
# data2 <- data.frame(Y2.0, X1.2, X2.2, X3.2)
# mlr2<-lm(Y2.0 ~ X1.2+X2.2+X3.2, data = data2)
# sm2 = summary(mlr2)
# pVal2 = glance(sm2)$p.value[[1]]
# ```
#
# H0: b1=b2=b3=0
#
# Ha: not all b# = 0
#
# Decision rule: reject H0 if the test statistic F* exceeds the critical value F; otherwise fail to reject.
#
# ```{r, echo=FALSE, results='asis'}
# cat("F = ",qf(.90,3,42),"\n\n")
# cat("F* = ",glance(sm)$statistic, "\n\n")
# ```
#
# Since F* is far larger than the critical value, we reject the null hypothesis.
#
# The p-value is also very small (less than .05) so we reject the null hypothesis
# and state that there is a regression relation.
#
# This implies that at least one of B1, B2, B3 is relevant
# ```{r, echo=FALSE, results='asis'}
# cat("p-value = ", pVal2)
# ```
#
# b)
#
# ```{r, echo=FALSE, results='asis'}
# B0.2 = mlr2$coefficients[[1]]
# B1.2 = mlr2$coefficients[[2]]
# B2.2 = mlr2$coefficients[[3]]
# B3.2 = mlr2$coefficients[[4]]
# B2.0 = qt(1-.1/6,42)
# SEB1.2 = sm2$coef[[2,2]]
# SEB2.2 = sm2$coef[[3,2]]
# SEB3.2 = sm2$coef[[4,2]]
# cat("B1 CI = (", B1.2 - B2.0*SEB1.2, ", ", B1.2 + B2.0*SEB1.2,")\n\n")
# cat("B2 CI = (", B2.2 - B2.0*SEB2.2, ", ", B2.2 + B2.0*SEB2.2,")\n\n")
# cat("B3 CI = (", B3.2 - B2.0*SEB3.2, ", ", B3.2 + B2.0*SEB3.2,")\n\n")
# ```
#
# I used Bonferroni joint CIs because we were given a family confidence coefficient
#
# The CI for B1 does not include 0, so we can reject H0: b1 = 0 with 90% confidence.
#
# The CI for B2 includes 0, so we can't reject H0: b2 = 0 with 90% confidence.
#
# The CI for B3 includes 0, so we can't reject H0: b3 = 0 with 90% confidence.
#
# ```{r, echo=FALSE, results='asis'}
# cat("R squared = ", sm2$r.squared)
# ```
#
# This is the proportion of the total variation in Y that is explained by X1, X2, and X3 in the model
#
# Problem 6.17
#
# a)
#
# ```{r, echo=FALSE, results='asis'}
# bVec = c(B0.2,B1.2,B2.2,B3.2)
# xHVec = c(1,35,45,2.2)
# Temp = rep(1, length(Y2.0))  # column of 1s for the intercept
# Xmatrix = matrix(c(Temp,X1.2,X2.2,X3.2),ncol=4)
#
# MSE = (t(Y2.0) %*% Y2.0 - t(bVec) %*% t(Xmatrix) %*% Y2.0)/42
# MSE1 = sum((B0.2+X1.2*B1.2+X2.2*B2.2+X3.2*B3.2 - Y2.0)^2)/42
# XTX = t(Xmatrix)%*%Xmatrix
#
# library(matlib)
# seBsquared = MSE1*(solve(XTX))
#
# YHhat = t(xHVec) %*% bVec
# SEYHhat = sqrt(t(xHVec) %*% seBsquared %*% xHVec)
# tVal7 = qt(.95,42)
# cat("90% CI for Yh hat = (", YHhat - tVal7*SEYHhat, ",", YHhat + tVal7*SEYHhat, ")")
# ```
#
# This interval estimates the mean response at the given X values with 90% confidence
#
# b)
#
# ```{r, echo=FALSE, results='asis'}
# # standard error for predicting a new observation at xH: sqrt(MSE*(1 + xH' (X'X)^-1 xH))
# SEpred = sqrt(MSE1*(1 + t(xHVec) %*% solve(XTX) %*% xHVec))
# cat("The 90% prediction interval = (", YHhat-tVal7*SEpred, ",", YHhat+tVal7*SEpred,")")
# ```
# For a future observation with the given X parameter values, we are 90% confident it will fall within our prediction interval
#
# Problem 7.4
#
# a)
#
# ```{r, echo=FALSE, results='asis'}
# #Extra sum of squares is SSE(model) - SSE(model + new var(s))
# dataX1 <- data.frame(Y, X1)
# mlrX1<-lm(Y ~ X1, data = dataX1)
# B0X1 = mlrX1$coefficients[[1]]
# B1X1 = mlrX1$coefficients[[2]]
# SSEX1 = sum((B0X1+X1*B1X1 - Y)^2)
#
# dataX13 <- data.frame(Y, X1, X3)
# mlrX13<-lm(Y ~ X1+X3, data = dataX13)
# B0X13 = mlrX13$coefficients[[1]]
# B1X13 = mlrX13$coefficients[[2]]
# B3X13 = mlrX13$coefficients[[3]]
# SSEX13 = sum((B0X13+X1*B1X13+X3*B3X13 - Y)^2)
#
# data <- data.frame(Y, X1, X2, X3)
# mlr<-lm(Y ~ X1+X2+X3, data = data)
# SSEfull = sum((B0+X1*B1+B2*X2+X3*B3 - Y)^2)
#
#
# dataX2 <- data.frame(Y, X2)
# mlrX2<-lm(Y ~ X2, data = dataX2)
# B0X2 = mlrX2$coefficients[[1]]
# B2X2 = mlrX2$coefficients[[2]]
# SSEX2 = sum((B0X2+B2X2*X2 - Y)^2)
#
# cat("Extra sum of squares for X1 = ", anova(mlr)[[1,2]])
# cat("\n\nExtra sum of squares for X3 | X1 = ", SSEX1-SSEX13)
# cat("\n\nExtra sum of squares for X2 | X1,X3 = ", SSEX13-SSEfull)
# ```
#
# b)
#
# ```{r, echo=FALSE, results='asis'}
# #F* = MSR(X2|X1,X3)/MSE(full)
# Fdrop = (SSEX13-SSEfull)*48/SSEfull
# ```
#
# H0 : B2 = 0 (reduced model)
# Ha : B2 != 0 (full model)
#
# ```{r, echo=FALSE, results='asis'}
# cat("F* = ", Fdrop)
# cat("\n\nF =",qf(.95,1,48))
# ```
#
# Since F* < F we cannot say at with 95% confidence that B2 should be in the model. We fail to reject H0
#
# ```{r, echo=FALSE, results='asis'}
# cat("P-value = ", pf(Fdrop,1,48,lower.tail=FALSE))
# ```
#
# c)
#
# ```{r, echo=FALSE, results='asis'}
# dataX12 <- data.frame(Y, X1, X2)
# mlrX12<-lm(Y ~ X1+X2, data = dataX12)
# B0X12 = mlrX12$coefficients[[1]]
# B1X12 = mlrX12$coefficients[[2]]
# B2X12 = mlrX12$coefficients[[3]]
# SSEX12 = sum((B0X12+X1*B1X12+X2*B2X12 - Y)^2)
#
# cat("Expression 1 =", anova(mlr)[[1,2]] + (SSEX1-SSEX12))
#
# cat("\n\nExpression 2 = ", anova(mlrX2)[[1,2]] + (SSEX2-SSEX12))
# ```
#
# They are equal in this case, and they will always be equal: both expressions are decompositions of SSR(X1, X2), the total variation in Y explained by X1 and X2 together.
#
# Problem 7.9
#
# H0: B1 = -1 , B2 = 0 (reduced model) Y = B0 - X1 +B3*X3
#
# Ha: not both B1 = -1 and B2 = 0 (full model) Y = B0+B1*X1+B2*X2+B3*X3
#
# ```{r, echo=FALSE, results='asis'}
# #SSR(X1,X2|X3)*42/SSE(full)/2
# #SSE(b1 = -1, X3)-SSE(X1,X2,X3) / SSE full * 21
# dataX3 <- data.frame(Y2.0, X3.2)
# mlrX3<-lm(Y2.0 ~ X3.2-X1.2, data = dataX3)
# B0X3 = mlrX3$coefficients[[1]]
# B3X3 = mlrX3$coefficients[[2]]
# SSEX3 = sum((B0X3+B3X3*X3.2 - Y2.0)^2)
# anova(mlr2)[1,2]
# qf(.975,2,42)
# ```
# ```{r}
# summary(mlrX3)
# anova(mlr2)
# ```
# As you can see, at the .025 level, since F* > F (critical value) we reject the null hypothesis