
ApplStats Spring2022 Final Practice

Uploaded by

Aaron Zhu

AMS 553.414/614: Applied Statistics and Data Analysis


Practice questions for final exam

1. The data for this practice question is based on the cars dataset, which
automatically comes with R.

(a) Let dist be the response variable and speed be the explanatory
variable. Do quintic polynomial regression (including the intercept). Which
individual coefficient has the highest statistical significance?
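A minimal R sketch for part (a), assuming the regressors are the raw powers speed, speed^2, ..., speed^5 (poly(..., raw = TRUE)). With R's default orthogonal polynomials the individual coefficients and their p-values would differ (only the quintic term's test is the same in both bases). The names fit5, pvals and most_sig are illustrative.

```r
# cars ships with base R: 50 rows with columns speed and dist.
# raw = TRUE regresses on speed, speed^2, ..., speed^5 directly.
fit5 <- lm(dist ~ poly(speed, 5, raw = TRUE), data = cars)

# Column 4 of the coefficient table holds the two-sided p-values,
# one per coefficient (intercept included).
pvals <- summary(fit5)$coefficients[, 4]
print(pvals)

# Highest significance = smallest p-value.
most_sig <- names(which.min(pvals))
print(most_sig)
```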

(b) Using stepwise backward elimination, drop the least statistically
significant regressor one at a time (but do not drop the intercept) until
all (non-intercept) regressors have p-values of less than 0.05. Which
regressors remain?

(c) Now treat the intercept as just another regressor. Using stepwise
backward elimination, drop the least statistically significant regressor
one at a time (drop the intercept if it is the least significant) until all
regressors have p-values of less than 0.05. Which regressors remain?
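One way to sketch parts (b) and (c) in base R, again assuming the candidates are the raw powers of speed. The intercept is an explicit column named one (a hypothetical name) and lm's automatic intercept is suppressed with 0 +, so the same loop serves both parts: keep_intercept = TRUE shields that column from removal, while FALSE lets it be dropped like any other regressor. X, y and backward_eliminate are illustrative names.

```r
# Explicit intercept column plus the five raw powers of speed.
X <- data.frame(
  one = 1,
  s1 = cars$speed, s2 = cars$speed^2, s3 = cars$speed^3,
  s4 = cars$speed^4, s5 = cars$speed^5
)
y <- cars$dist

# Backward elimination by p-value: refit, find the largest p-value among the
# removable columns, drop it if it is >= alpha, and repeat.
backward_eliminate <- function(X, y, alpha = 0.05, keep_intercept = TRUE) {
  vars <- names(X)
  while (length(vars) > 0) {
    # "0 +" turns off lm's own intercept; the column "one" plays that role.
    fit <- lm(y ~ 0 + ., data = X[, vars, drop = FALSE])
    p <- summary(fit)$coefficients[, 4]
    candidates <- if (keep_intercept) p[names(p) != "one"] else p
    if (length(candidates) == 0 || max(candidates) < alpha) break
    vars <- setdiff(vars, names(which.max(candidates)))
  }
  vars
}

print(backward_eliminate(X, y, keep_intercept = TRUE))   # part (b)
print(backward_eliminate(X, y, keep_intercept = FALSE))  # part (c)
```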

(d) First regress on the intercept only. Then, using stepwise forward
selection, add the most statistically significant remaining regressor one
at a time (up to and including the quintic term) until no additional
regressor would have a p-value of less than 0.05. Which regressors are
selected for the model?
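A base-R sketch of part (d) under the same raw-power assumption. At each step every remaining power is tried alongside those already selected (lm supplies the intercept), and the candidate whose own p-value would be smallest is added if that p-value is below 0.05. Powers are not forced to enter in order; that reading of the question is an assumption, and forward_select is an illustrative name.

```r
X <- data.frame(s1 = cars$speed, s2 = cars$speed^2, s3 = cars$speed^3,
                s4 = cars$speed^4, s5 = cars$speed^5)
y <- cars$dist

forward_select <- function(X, y, alpha = 0.05) {
  selected <- character(0)   # start from the intercept-only model
  repeat {
    remaining <- setdiff(names(X), selected)
    if (length(remaining) == 0) break
    # p-value each remaining candidate would have if added to the current model
    p_new <- sapply(remaining, function(v) {
      fit <- lm(y ~ ., data = X[, c(selected, v), drop = FALSE])
      summary(fit)$coefficients[v, 4]
    })
    if (min(p_new) >= alpha) break
    selected <- c(selected, names(which.min(p_new)))
  }
  selected
}

print(forward_select(X, y))
```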

(e) Which two regressors (plus intercept) give the best fit? And which set
of regressors gives the best BIC? (Hint: Use the leaps package.)
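The hint points to leaps::regsubsets(), whose default method is an exhaustive subset search. The sketch below performs the same exhaustive search in base R by fitting all 31 non-empty subsets of the five raw powers, so it needs no extra package. "Best fit" for two regressors is taken to mean the smallest residual sum of squares (equivalently the largest R-squared). For models fit to the same rows, BIC() and the bic column of summary(regsubsets(...)) should differ only by a constant, so they ought to rank models identically. All object names are illustrative.

```r
X <- data.frame(s1 = cars$speed, s2 = cars$speed^2, s3 = cars$speed^3,
                s4 = cars$speed^4, s5 = cars$speed^5)
dat <- data.frame(dist = cars$dist, X)

# Every non-empty subset of the five powers: sizes 1..5, 31 subsets in total.
subsets <- unlist(lapply(1:5, function(k) combn(names(X), k, simplify = FALSE)),
                  recursive = FALSE)
fits <- lapply(subsets, function(v) lm(reformulate(v, response = "dist"), data = dat))

rss <- sapply(fits, deviance)   # residual sum of squares of each fit
bic <- sapply(fits, BIC)
sizes <- lengths(subsets)

# Best two-regressor model (plus intercept): smallest RSS among size-2 subsets.
two <- which(sizes == 2)
best_two <- subsets[[two[which.min(rss[two])]]]
# Best model overall by BIC.
best_bic <- subsets[[which.min(bic)]]

print(best_two)
print(best_bic)
```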

2. The data for this practice question is based on the cars dataset, which
automatically comes with R. The intercept is included.

(a) Regress dist on speed. What is the AIC?
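A short R sketch for part (a). It also checks AIC() against the definition AIC = -2 log L + 2k; for a Gaussian linear model, k counts the two coefficients plus the error variance. The names fit, ll, k and aic_manual are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)
print(AIC(fit))

# Recompute AIC from the log-likelihood; attr(ll, "df") is the number of
# estimated parameters (intercept, slope, error variance).
ll <- logLik(fit)
k <- attr(ll, "df")
aic_manual <- -2 * as.numeric(ll) + 2 * k
print(aic_manual)
```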

(b) Do set.seed(0). Use bootstrapping to create 10,000 more AIC statistics.
What is their standard deviation? (Hint: Use dplyr::sample_n
to appropriately sample rows from a dataframe.)

(c) Plot a histogram of the AICs. Does the distribution look skewed left,
skewed right, or symmetric?
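A base-R sketch of parts (b) and (c). It resamples row indices with sample() instead of dplyr::sample_n(cars, nrow(cars), replace = TRUE) from the hint. Both draw n rows with replacement, but the two routes may not consume the random stream identically, so the numbers after set.seed(0) can differ from the dplyr version. The skewness value is an extra numeric check to go with the histogram, not something the question asks for.

```r
set.seed(0)
n <- nrow(cars)
B <- 10000

# Each replicate refits dist ~ speed on n rows drawn with replacement
# and records that fit's AIC.
boot_aic <- replicate(B, {
  rows <- sample(n, n, replace = TRUE)
  AIC(lm(dist ~ speed, data = cars[rows, ]))
})
print(sd(boot_aic))   # part (b)

# Part (c): histogram of the bootstrap AICs.
hist(boot_aic, breaks = 50, main = "Bootstrap AICs", xlab = "AIC")

# Moment-based sample skewness: < 0 suggests left skew, > 0 right skew.
dev <- boot_aic - mean(boot_aic)
skew <- mean(dev^3) / mean(dev^2)^1.5
print(skew)
```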

3. The data for this practice question is based on Titanic_train.csv,
which is available on Blackboard. The intercept will always be included.

(a) Logistically regress Survived (the response variable) on the regressors
Pclass (treat as a cardinal variable), Sex and Age. What is the least
significant regressor?

(b) The difference between the null and residual deviance is distributed as
chi-squared with how many degrees of freedom?
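A sketch of parts (a) and (b) as R functions, so they can be applied once the course file has been read, for example with titanic <- read.csv("Titanic_train.csv"). It assumes the Kaggle-style columns Survived (0/1), Pclass (1 to 3), Sex ("female"/"male") and Age. The function names are illustrative.

```r
# Pclass enters as a number (cardinal); the character column Sex becomes a
# factor with "female" as baseline; rows with missing Age are dropped by
# glm's default na.action.
fit_survival <- function(titanic) {
  glm(Survived ~ Pclass + Sex + Age, family = binomial, data = titanic)
}

# Part (a): the least significant regressor has the largest Wald p-value.
least_significant <- function(fit) {
  p <- summary(fit)$coefficients[-1, 4]   # drop the intercept row
  names(which.max(p))
}

# Part (b): null deviance minus residual deviance is asymptotically
# chi-squared with df = df.null - df.residual (the non-intercept coefficients).
lr_test <- function(fit) {
  df <- fit$df.null - fit$df.residual
  stat <- fit$null.deviance - fit$deviance
  c(df = df, statistic = stat, p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Usage:
# fit <- fit_survival(titanic)
# least_significant(fit)
# lr_test(fit)
```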

(c) Make a box plot of the Pearson residuals versus the Pclass variable.
Hint: If you get a mismatched length error, adjust the
appropriate parameter in your glm call. See the R glm() documentation
for help.

(d) Find the mean Pearson residual for Pclass = 2. Hint: One method is
to regress the Pearson residuals versus Pclass as a categorical variable.
Another method is to use the aggregate() function.
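A sketch of parts (c) and (d), assuming the same columns as above. The length mismatch comes from the rows dropped for missing Age: with glm's default na.omit, residuals() returns one value per complete row. Setting na.action = na.exclude makes residuals() pad those rows with NA so the vector lines up with titanic$Pclass. The means use the aggregate() route from the hint; the function names are illustrative.

```r
# Part (c): Pearson residuals aligned row-for-row with the input data.
pearson_by_class <- function(titanic) {
  fit <- glm(Survived ~ Pclass + Sex + Age, family = binomial,
             data = titanic, na.action = na.exclude)
  data.frame(Pclass = titanic$Pclass,
             pearson = residuals(fit, type = "pearson"))
}

# Part (d): class means of the residuals; aggregate()'s formula method
# skips the NA-padded rows by default.
mean_pearson_by_class <- function(res) {
  aggregate(pearson ~ Pclass, data = res, FUN = mean)
}

# Usage:
# res <- pearson_by_class(titanic)
# boxplot(pearson ~ Pclass, data = res, xlab = "Pclass", ylab = "Pearson residual")
# mean_pearson_by_class(res)   # read off the Pclass == 2 row
```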

4. The data for this practice question is based on Titanic_train.csv,
which is available on Blackboard. The intercept will always be included.

(a) What is the most common value of Embarked?
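A small R helper for part (a). It assumes missing ports may appear either as NA or as empty strings (as read.csv produces for blank text cells) and ignores both before counting. most_common is an illustrative name.

```r
# Tabulate the labels, ignoring NA and empty strings, and return the one
# with the highest count (ties go to the first label in table() order).
most_common <- function(x) {
  x <- x[!is.na(x) & x != ""]
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Usage: most_common(titanic$Embarked)
```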

(b) Do multinomial logistic regression with Embarked as the response
variable and Pclass (treat as a cardinal variable), Sex, Age and Survived
as the regressors. Use the most common value of Embarked as the
reference value. For predicting which passengers embarked from France,
what is the most significant regressor? The least significant regressor?

(c) Is a survivor more or less likely to have embarked from France? By how
much do the log odds change?
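A sketch of parts (b) and (c) using nnet::multinom(), one common R route for multinomial logit (the question does not name a package, so this choice is an assumption). It assumes the port codes S/C/Q, with C (Cherbourg) the French port, and that blank Embarked entries should be dropped. summary() on a multinom fit reports estimates and standard errors but no p-values, so two-sided Wald p-values are computed from z = estimate / SE. The function names are illustrative.

```r
library(nnet)  # multinom(); nnet is a recommended package bundled with R

# ref should be the most common port found in part (a).
fit_embarked <- function(titanic, ref) {
  d <- titanic[!is.na(titanic$Embarked) & titanic$Embarked != "", ]
  d$Embarked <- relevel(factor(d$Embarked), ref = ref)
  multinom(Embarked ~ Pclass + Sex + Age + Survived, data = d, trace = FALSE)
}

# One row per non-reference port, one column per coefficient.
wald_pvalues <- function(fit) {
  s <- summary(fit)
  z <- s$coefficients / s$standard.errors
  2 * pnorm(-abs(z))
}

# Usage:
# fit <- fit_embarked(titanic, ref = "S")   # "S" is a placeholder for (a)'s answer
# wald_pvalues(fit)["C", ]                  # part (b): row for Cherbourg, France
# coef(fit)["C", "Survived"]                # part (c): change in log odds of C vs ref
```

In part (c), a positive Survived coefficient in the C row means survivors have higher log odds of having embarked at C rather than the reference port, holding the other regressors fixed; a negative one means lower.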

5. The data for this practice question is based on Titanic_train.csv,
which is available on Blackboard. The intercept will always be included.

(a) How many values of Age are missing?

(b) Do set.seed(0). Using the mice package and the default method, create
five imputed datasets. What are the five imputed ages for passenger
number 6?

(c) Do set.seed(0) and repeat the above using the norm.boot method.
What are the five imputed ages for passenger number 6?
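A sketch of parts (b) and (c) with the mice package. The predictor set is a hypothetical choice: mice expects numeric or factor columns, so text columns such as Name, Ticket and Cabin are left out, and the imputed values depend on which columns are kept. It also assumes row names match PassengerId (as they do when read.csv reads a file ordered by PassengerId) and that passenger 6's Age is missing; otherwise the lookup returns an NA row. impute_ages is an illustrative name.

```r
impute_ages <- function(titanic, age_method = NULL, passenger = 6, seed = 0) {
  d <- titanic[, c("Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare")]
  d$Sex <- factor(d$Sex)
  # Start from mice's default methods ("pmm" for numeric Age, "" for
  # complete columns), optionally overriding the method used for Age.
  meth <- mice::make.method(d)
  if (!is.null(age_method)) meth["Age"] <- age_method
  set.seed(seed)
  imp <- mice::mice(d, m = 5, method = meth, printFlag = FALSE)
  # imp$imp$Age: one row per missing Age (named by the original row name),
  # one column per imputed dataset.
  imp$imp$Age[as.character(passenger), ]
}

# Usage:
# sum(is.na(titanic$Age))                          # part (a)
# impute_ages(titanic)                             # part (b), default method
# impute_ages(titanic, age_method = "norm.boot")   # part (c)
```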
