HW3 Solutions - Stats 500: Problem 1
Problem 1.
library(faraway)
data(teengamb)
gambmod = lm(gamble ~ ., data = teengamb)
RSS = sum(residuals(gambmod)^2)  # residual sum of squares of the full model
DF = df.residual(gambmod)        # residual degrees of freedom: 47 - 5 = 42
summary(gambmod)
##
## Call:
## lm(formula = gamble ~ ., data = teengamb)
##
## Residuals:
## Min 1Q Median 3Q Max
## -51.082 -11.320 -1.451 9.452 94.252
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.55565 17.19680 1.312 0.1968
## sex -22.11833 8.21111 -2.694 0.0101 *
## status 0.05223 0.28111 0.186 0.8535
## income 4.96198 1.02539 4.839 1.79e-05 ***
## verbal -2.95949 2.17215 -1.362 0.1803
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.69 on 42 degrees of freedom
## Multiple R-squared: 0.5267, Adjusted R-squared: 0.4816
## F-statistic: 11.69 on 4 and 42 DF, p-value: 1.815e-06
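RSS and DF above feed directly into the quantities printed by summary(). As a sketch, the R code below checks two of these relationships: the residual standard error is sqrt(RSS / DF), and R-squared is 1 - RSS / TSS. It uses the built-in mtcars data as a stand-in (an assumption, so it runs without faraway); the model formula is purely illustrative.

```r
# Illustrative model on the built-in mtcars data (stand-in for teengamb)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

rss <- sum(residuals(fit)^2)   # residual sum of squares
rdf <- df.residual(fit)        # n - p = 32 - 4 = 28

# Residual standard error: sigma-hat = sqrt(RSS / (n - p))
sigma_hat <- sqrt(rss / rdf)

# Multiple R-squared: 1 - RSS / TSS, where TSS is taken about the mean of y
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2 <- 1 - rss / tss

c(sigma_hat = sigma_hat, summary_sigma = summary(fit)$sigma,
  r2 = r2, summary_r2 = summary(fit)$r.squared)
```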
(a).
At the 5% level, only sex and income are significant, with p-values 0.0101 and 1.79e-05, respectively; status (p = 0.8535) and verbal (p = 0.1803) are not.
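Each p-value in the coefficient table comes from a t-test of H0: beta_j = 0, with t = estimate / standard error compared against a t distribution on the residual degrees of freedom. The sketch below reproduces that column by hand; it assumes the built-in mtcars data and an arbitrary illustrative model rather than teengamb.

```r
# Illustrative model on mtcars (assumed stand-in data)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

est  <- coef(fit)
se   <- sqrt(diag(vcov(fit)))   # standard errors from the coefficient covariance matrix
tval <- est / se                # t statistic for H0: beta_j = 0
# Two-sided p-value from a t distribution with n - p degrees of freedom
pval <- 2 * pt(abs(tval), df = df.residual(fit), lower.tail = FALSE)

round(cbind(Estimate = est, SE = se, t = tval, p = pval), 4)
names(pval)[pval < 0.05]        # terms significant at the 5% level
```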
(b).
Holding status, income, and verbal constant, a female teenager (sex = 1) is expected to spend about £22.12 per year less on gambling than a male teenager (sex = 0).
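The interpretation of a 0/1 indicator coefficient can be checked directly: predictions for two cases that differ only in the indicator differ by exactly that coefficient. This sketch assumes the built-in mtcars data, where am is a 0/1 indicator (0 = automatic, 1 = manual); the covariate values wt = 3 and hp = 150 are arbitrary.

```r
# am plays the role of the 0/1 sex indicator in this stand-in example
fit <- lm(mpg ~ am + wt + hp, data = mtcars)

# Two hypothetical cars, identical except for am
newd <- data.frame(am = c(0, 1), wt = 3, hp = 150)
pred <- predict(fit, newdata = newd)

d <- unname(pred[2] - pred[1])   # change in predicted mpg when am goes from 0 to 1
d
unname(coef(fit)["am"])          # the am coefficient
```

Because the model is linear and additive, the difference does not depend on which fixed values of wt and hp are chosen.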
(c).
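gambmod2 = lm(gamble ~ income, data = teengamb)  # reduced model: income only
rss = sum(residuals(gambmod2)^2); df = df.residual(gambmod2)
F = ((rss - RSS) / (df - DF)) / (RSS / DF)        # RSS, DF from the full model
pvalue = pf(F, df - DF, DF, lower.tail = FALSE)
pvalue
## [1] 0.01177211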
Since the p-value (0.0118) is below 0.05, we conclude that, at the 5% level, including the other predictors significantly improves the fit of the model.
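The nested-model F statistic is F = ((RSS_reduced - RSS_full) / (df_reduced - df_full)) / (RSS_full / df_full). The sketch below computes it by hand and compares it with R's anova() on two nested models; the mtcars data and both formulas are illustrative assumptions, not part of the assignment.

```r
small <- lm(mpg ~ wt, data = mtcars)              # reduced model
big   <- lm(mpg ~ wt + hp + qsec, data = mtcars)  # full model (contains small)

rss_r <- sum(residuals(small)^2); df_r <- df.residual(small)
rss_f <- sum(residuals(big)^2);   df_f <- df.residual(big)

Fstat <- ((rss_r - rss_f) / (df_r - df_f)) / (rss_f / df_f)
pval  <- pf(Fstat, df_r - df_f, df_f, lower.tail = FALSE)

tab <- anova(small, big)   # built-in comparison of nested models
c(manual_F = Fstat, anova_F = tab[["F"]][2])
c(manual_p = pval, anova_p = tab[["Pr(>F)"]][2])
```

Using anova(reduced, full) avoids redefining F and df, which otherwise mask R's built-in F (FALSE) and df() (the F density).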
Problem 2.
(a).
data(sat)
satmod = lm(total ~ expend + ratio + salary, data = sat)
summary(satmod)
##
## Call:
## lm(formula = total ~ expend + ratio + salary, data = sat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -140.911 -46.740 -7.535 47.966 123.329
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1069.234 110.925 9.639 1.29e-12 ***
## expend 16.469 22.050 0.747 0.4589
## ratio 6.330 6.542 0.968 0.3383
## salary -8.823 4.697 -1.878 0.0667 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 68.65 on 46 degrees of freedom
## Multiple R-squared: 0.2096, Adjusted R-squared: 0.1581
## F-statistic: 4.066 on 3 and 46 DF, p-value: 0.01209
Based on the summary, the p-value for $H_0: \beta_{salary} = 0$ is 0.0667, so at the 5% level we fail to reject $\beta_{salary} = 0$.
On the other hand, the p-value for the overall F-test of $H_0: \beta_{expend} = \beta_{ratio} = \beta_{salary} = 0$ is 0.01209, so at the 5% level we reject this null and conclude that at least one of these predictors has an effect on the response.
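The overall F-test in the last line of summary() is the nested-model F-test against the intercept-only model. The sketch below confirms this; it assumes the built-in mtcars data and an illustrative three-predictor model.

```r
fit      <- lm(mpg ~ wt + hp + qsec, data = mtcars)
null_fit <- lm(mpg ~ 1, data = mtcars)   # intercept-only model

fs <- summary(fit)$fstatistic            # named vector: value, numdf, dendf
p_overall <- pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)

tab <- anova(null_fit, fit)
c(summary_F = unname(fs["value"]), anova_F = tab[["F"]][2])
c(summary_p = unname(p_overall), anova_p = tab[["Pr(>F)"]][2])
```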
(b).
satmod2 = lm(total ~ expend + ratio + salary + takers, data = sat)  # adds takers
RSS = sum(residuals(satmod2)^2); DF = df.residual(satmod2)  # larger model
rss = sum(residuals(satmod)^2); df = df.residual(satmod)    # model from (a)
F = ((rss - RSS) / (df - DF)) / (RSS / DF)
pvalue = pf(F, df - DF, DF, lower.tail = FALSE)
pvalue
## [1] 2.606559e-16
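As expected, the F-test gives the same p-value as the t-test for the added predictor: when two nested models differ by a single predictor, the F statistic equals the square of that predictor's t statistic.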
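The equivalence F = t^2 for a single added predictor can be verified numerically. This sketch assumes the built-in mtcars data, with qsec as the one added term; any single added predictor would show the same identity.

```r
small <- lm(mpg ~ wt + hp, data = mtcars)
big   <- lm(mpg ~ wt + hp + qsec, data = mtcars)  # adds exactly one predictor

tab   <- anova(small, big)
Fstat <- tab[["F"]][2]
p_F   <- tab[["Pr(>F)"]][2]

coefs <- summary(big)$coefficients
tval  <- coefs["qsec", "t value"]
p_t   <- coefs["qsec", "Pr(>|t|)"]

c(F = Fstat, t_squared = tval^2)   # equal up to rounding
c(p_F = p_F, p_t = p_t)            # identical p-values
```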
Problem 3.
(a).
(b).
(c).
(d).
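x2 = x0                      # x0 and transformed.model come from the earlier parts
x2[1, ] <- c(1, 20, 1, 10)   # assumed column order: sex, status, income, verbal
predict(transformed.model, newdata = x2, interval = "prediction")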
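A 95% prediction interval for a new case x_new is yhat +/- t(0.975, n - p) * sigma-hat * sqrt(1 + x_new' (X'X)^(-1) x_new). The sketch below computes this by hand and compares it with predict(..., interval = "prediction"). It assumes the built-in mtcars data and a hypothetical new car (wt = 3, hp = 150). If the response has been transformed, these endpoints are on the transformed scale and must be mapped back to the original units before interpreting them.

```r
fit  <- lm(mpg ~ wt + hp, data = mtcars)
newd <- data.frame(wt = 3, hp = 150)   # hypothetical new case

X     <- model.matrix(fit)
x_new <- c(1, 3, 150)                  # intercept, wt, hp (same order as coef(fit))
yhat  <- sum(x_new * coef(fit))
s     <- sigma(fit)                    # residual standard error
# Leverage-type term x_new' (X'X)^{-1} x_new
h  <- drop(t(x_new) %*% solve(crossprod(X)) %*% x_new)
tq <- qt(0.975, df.residual(fit))
half <- tq * s * sqrt(1 + h)           # the "1 +" accounts for new-observation noise
manual <- c(fit = yhat, lwr = yhat - half, upr = yhat + half)

builtin <- predict(fit, newdata = newd, interval = "prediction", level = 0.95)
manual
builtin
```

Dropping the "1 +" term gives the narrower confidence interval for the mean response (interval = "confidence").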
[Figure: plot with Income on the horizontal axis]