ApplStats Spring2022 Final Practice
1. The data for this practice question is based on the cars dataset, which
comes with R.
(a) Let dist be the response variable and speed be the explanatory variable.
Fit a quintic polynomial regression (including the intercept). Which
individual coefficient has the highest statistical significance?
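One possible way to set up the fit in part (a) is sketched below. It assumes raw (non-orthogonal) powers of speed are intended; the object names fit5, coef_table and most_significant are illustrative only.

```r
# Quintic polynomial fit of stopping distance on speed (built-in cars data).
# raw = TRUE keeps the plain powers speed, speed^2, ..., speed^5 as regressors
# (an assumption; the question does not say which polynomial basis to use).
fit5 <- lm(dist ~ poly(speed, 5, raw = TRUE), data = cars)

# Coefficient table: estimate, standard error, t value, p-value.
coef_table <- summary(fit5)$coefficients
print(coef_table)

# Name of the coefficient with the smallest p-value.
most_significant <- rownames(coef_table)[which.min(coef_table[, "Pr(>|t|)"])]
print(most_significant)
```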
(b) Using stepwise backwards elimination, continue to drop the least sta-
tistically significant regressors (but do not drop the intercept) until
all (non-intercept) regressors have p-values of less than 0.05. Which
regressors remain?
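The elimination loop in part (b) could be sketched as follows. The column names s1 to s5 and the variables d, terms_in and fit are hypothetical helpers, and the intercept is always kept, as the question requires.

```r
# Explicit power columns so that single terms can be dropped by name.
d <- data.frame(dist = cars$dist,
                s1 = cars$speed,   s2 = cars$speed^2, s3 = cars$speed^3,
                s4 = cars$speed^4, s5 = cars$speed^5)

terms_in <- c("s1", "s2", "s3", "s4", "s5")
repeat {
  # Refit with the current regressors (intercept always included).
  fit <- lm(reformulate(terms_in, response = "dist"), data = d)
  p <- setNames(summary(fit)$coefficients[terms_in, "Pr(>|t|)"], terms_in)
  # Stop when every remaining regressor is significant at the 0.05 level,
  # or when only one regressor is left.
  if (all(p < 0.05) || length(terms_in) == 1) break
  # Otherwise drop the regressor with the largest p-value.
  terms_in <- setdiff(terms_in, names(which.max(p)))
}
print(terms_in)
print(summary(fit)$coefficients)
```

For part (c), the same loop would also need to compare the intercept's p-value and refit with `- 1` in the formula if the intercept is the term dropped; that variant is not shown here.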
(c) Now treat the intercept as just another regressor. Using stepwise back-
wards elimination, continue to drop the least statistically significant
regressors (drop the intercept if it is least significant) until all regres-
sors have p-values of less than 0.05. Which regressors remain?
(d) First regress on the intercept only. Then using stepwise forward selec-
tion, continue to include the most statistically significant regressors (up
to and including the quintic term) until no more additional regressors
would have p-values of less than 0.05. Which regressors are selected for
the model?
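A minimal sketch of the forward selection in part (d) follows, assuming that "most statistically significant" means the candidate with the smallest p-value when added to the current model. The names d, s1 to s5, selected and candidates are hypothetical.

```r
# Explicit power columns so that single terms can be added by name.
d <- data.frame(dist = cars$dist,
                s1 = cars$speed,   s2 = cars$speed^2, s3 = cars$speed^3,
                s4 = cars$speed^4, s5 = cars$speed^5)

selected   <- character(0)                    # start from the intercept-only model
candidates <- c("s1", "s2", "s3", "s4", "s5")

repeat {
  if (length(candidates) == 0) break
  # p-value each candidate would have if added to the current model.
  p_new <- sapply(candidates, function(v) {
    fit_v <- lm(reformulate(c(selected, v), response = "dist"), data = d)
    summary(fit_v)$coefficients[v, "Pr(>|t|)"]
  })
  # Stop when no candidate would enter with a p-value below 0.05.
  if (min(p_new) >= 0.05) break
  best       <- names(which.min(p_new))
  selected   <- c(selected, best)
  candidates <- setdiff(candidates, best)
}
print(selected)
```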
(e) Which two regressors (plus intercept) give the best fit? And which set
of regressors gives the best BIC? (Hint: Use the leaps package.)
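Following the hint, a sketch using regsubsets() from the leaps package might look like this. It assumes leaps is installed and that "best fit" means the smallest residual sum of squares among models of a given size; the names subsets and s are illustrative.

```r
# Exhaustive best-subset search over the five powers of speed.
# The block is skipped if the leaps package is not installed.
if (requireNamespace("leaps", quietly = TRUE)) {
  subsets <- leaps::regsubsets(
    dist ~ speed + I(speed^2) + I(speed^3) + I(speed^4) + I(speed^5),
    data = cars, nvmax = 5)
  s <- summary(subsets)

  # Row k of s$which flags the regressors in the best model with k regressors.
  print(s$which[2, ])

  # Model size with the lowest BIC, and the regressors it contains.
  print(s$which[which.min(s$bic), ])
}
```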
2. The data for this practice question is based on the cars dataset, which
comes with R. The intercept will always be included.
(c) Plot a histogram of the AIC values. Does the distribution look skewed left,
skewed right, or symmetric?
3. The data for this practice question is based on the Titanic_train.csv
file, which is available on Blackboard. The intercept will always be included.
(b) The difference between the null and residual deviance is distributed as
chi-squared with how many degrees of freedom?
(c) Make a box plot of the Pearson residuals versus the Pclass variable.
Hint: If you get a mismatched length error, make an adjustment to the
appropriate parameter in your glm call. See R glm() documentation
for help.
(d) Find the mean Pearson residual for Pclass = 2. Hint: One method is
to regress the Pearson residuals versus Pclass as a categorical variable.
Another method is to use the aggregate() function.
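The two hints in parts (c) and (d) can be illustrated with the sketch below. The model Survived ~ Age is only a placeholder, because the actual regressors are not given here; the function name pearson_by_class and the column names Survived and Age are assumptions. By default glm() drops rows with missing values, so the residual vector is shorter than the data; na.action = na.exclude pads the residuals with NA so the lengths match.

```r
# `titanic` is assumed to be the data frame read from the course file, e.g.
#   titanic <- read.csv("Titanic_train.csv")
pearson_by_class <- function(titanic) {
  # Placeholder logistic model (hypothetical choice of regressors).
  # na.exclude keeps residuals aligned with the rows of `titanic`.
  fit <- glm(Survived ~ Age, family = binomial, data = titanic,
             na.action = na.exclude)
  titanic$pearson <- residuals(fit, type = "pearson")

  # Box plot of Pearson residuals by passenger class.
  boxplot(pearson ~ Pclass, data = titanic,
          xlab = "Pclass", ylab = "Pearson residual")

  # Mean Pearson residual per class (rows with NA residuals are omitted).
  aggregate(pearson ~ Pclass, data = titanic, FUN = mean)
}
```

Calling pearson_by_class(titanic) returns one row per Pclass value; the row for Pclass = 2 holds the mean asked for in part (d), under the placeholder model.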
4. The data for this practice question is based on the Titanic_train.csv
file, which is available on Blackboard. The intercept will always be included.
(c) Is a survivor more or less likely to have embarked from France? How
much do the log odds change?
5. The data for this practice question is based on the Titanic_train.csv
file, which is available on Blackboard. The intercept will always be included.
(b) Do set.seed(0). Using the mice package and the default method, create
five imputed datasets. What are the five imputed ages for passenger
number 6?
(c) Do set.seed(0) and repeat the above using the norm.boot method.
What are the five imputed ages for passenger number 6?
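A hedged sketch of the imputation workflow in parts (b) and (c) is given below. It assumes the mice package is installed, that dat holds only the columns chosen for imputation (which columns to use is not specified here), and that the age column is named Age. The helper name impute_age is hypothetical.

```r
# Returns the imputed Age values: one row per passenger with a missing Age,
# one column per imputed dataset.
impute_age <- function(dat, method = NULL, m = 5) {
  set.seed(0)  # seed set as the question instructs
  # method = NULL lets mice pick its default method for each column.
  imp <- mice::mice(dat, m = m, method = method, printFlag = FALSE)
  imp$imp$Age
}
```

With default row names, the imputed ages for the sixth row could then be read off as impute_age(dat)["6", ] for part (b) and impute_age(dat, method = "norm.boot")["6", ] for part (c), provided Age is missing in that row.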