
HW 6

This homework assignment involves analyzing properties of adjusted R-squared and using various regression techniques to predict college acceptance rates. Specifically, it asks students to: 1) investigate properties of adjusted R-squared through proofs and counterexamples; 2) fit linear regression models, select smaller models with forward and backward selection, AIC, BIC, and adjusted R-squared, and evaluate them using training, cross-validation, and test errors; 3) compare shrinkage methods (ridge regression and the lasso) and dimension-reduction methods (PCR and PLS) in terms of test error and, for the lasso, variable selection; 4) recommend the best approach for this dataset based on the test errors of the different models.

Uploaded by Dhroov Patel
Copyright © All Rights Reserved

STATS 415 Homework 6

Please use R Markdown to write up your solutions. Submit your work
through Canvas by uploading a PDF file that contains your solutions and
a separate Rmd file that contains your code.

1. Read ISLR chapter 6.
2. Complete ISLR chapter 6 exercises 1, 2, 3, 4, and 6.
3. This exercise investigates properties of the adjusted $R^2$.

(a) We know that $R^2$ (for a least-squares fit that includes an intercept) is
guaranteed to be between 0 and 1. Are both bounds ($\ge 0$ and $\le 1$) true
for $R_a^2$? For each bound, either prove it is true or give a counterexample.
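For reference, ISLR chapter 6 defines the adjusted $R^2$ of a least-squares model with $d$ predictors and $n$ observations as below. The second form follows from $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$. The comments sketch one way to approach each bound, assuming $n > d + 1$ so that the denominator is positive.

```latex
R_a^2 = 1 - \frac{\mathrm{RSS}/(n-d-1)}{\mathrm{TSS}/(n-1)}
      = 1 - (1 - R^2)\,\frac{n-1}{n-d-1},
\qquad R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}.

% Upper bound: (1 - R^2) >= 0 and (n-1)/(n-d-1) > 0, so R_a^2 <= 1.
% Lower bound can fail: take R^2 = 0 with d >= 1, so that
R_a^2 = 1 - \frac{n-1}{n-d-1} < 0,
\qquad \text{e.g. } n = 11,\ d = 1:\quad R_a^2 = 1 - \tfrac{10}{9} = -\tfrac{1}{9}.
```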
(b) Suppose you have p = 500 predictors and 501 observations in your
dataset, and you fit a linear regression model. Predictors 1-50 are
correlated with the response, and when a linear model with just these 50
predictors is fit, we get $R^2 = 0.5$. The remaining 450 predictors have 0
correlation with the response, so adding any of them to the model does
not change the $R^2$. How many of these extra “uninformative” predictors
must be added to the model to make the adjusted $R^2$ exactly 0?
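With $R_a^2 = 1 - (1 - R^2)(n-1)/(n-d-1)$, part (b) reduces to arithmetic. The following is a minimal Python sketch (the write-up itself is in R). It holds $R^2$ at 0.5 and scans the total number of predictors $d$ until the adjusted $R^2$ first reaches 0. The value $d = 500$ is skipped because $n - d - 1 = 0$ leaves the adjusted $R^2$ undefined. The function names are illustrative only and do not come from any package.

```python
def adjusted_r2(r2, n, d):
    """Adjusted R^2 for a least-squares fit with an intercept and d predictors."""
    if n - d - 1 <= 0:
        raise ValueError("adjusted R^2 is undefined when n - d - 1 <= 0")
    return 1 - (1 - r2) * (n - 1) / (n - d - 1)


def predictors_for_zero_adj_r2(r2, n, d0, d_max):
    """Smallest total d in [d0, d_max] whose adjusted R^2 is <= 0, with R^2 fixed at r2."""
    for d in range(d0, d_max + 1):
        if n - d - 1 > 0 and adjusted_r2(r2, n, d) <= 1e-12:
            return d
    return None


# Setting from part (b): R^2 stays at 0.5 for every d >= 50, and n = 501.
total = predictors_for_zero_adj_r2(0.5, 501, 50, 500)
extra = total - 50          # uninformative predictors beyond the first 50
print(total, extra)
```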

4. In this exercise, we will predict the acceptance rate of a college (number of
applications accepted / number of applications received) using the College
dataset from the ISLR package.

(a) Split the dataset into a training set and a test set. Fix the random seed
to the value 234, choose 30% (rounded down to the nearest integer) of the
data at random for testing, and use the rest for training. Define a new
response variable Accept/Apps. Plot this variable against every other variable
in the dataset (make sure you use the appropriate type of plot for each
predictor). Comment on which variables appear to be most predictive.
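Here is a minimal Python sketch of the split in part (a). It assumes the College data is already loaded with 777 rows (one per college). The 30% test size is computed in integer arithmetic, so the rounding down is exact. Python's `random.Random(234)` does not reproduce R's `set.seed(234)` followed by `sample()`, so the rows chosen will differ from an R solution; only the set sizes carry over. `split_indices` and `acceptance_rate` are hypothetical helper names.

```python
import random


def split_indices(n, seed=234):
    """Return (train, test) row indices with floor(30% of n) rows held out."""
    n_test = (3 * n) // 10            # 30%, rounded down, using exact integer arithmetic
    rng = random.Random(seed)          # Python's RNG: NOT the same stream as R's set.seed(234)
    test = sorted(rng.sample(range(n), n_test))
    held_out = set(test)
    train = [i for i in range(n) if i not in held_out]
    return train, test


def acceptance_rate(accept, apps):
    """New response Accept/Apps, computed row by row."""
    return [a / b for a, b in zip(accept, apps)]


# Hypothetical usage: the College data has 777 rows.
train_idx, test_idx = split_indices(777)
print(len(train_idx), len(test_idx))   # 544 233
```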
(b) Fit a linear model using least squares on the training set, and report
the training and test error obtained, with Accept/Apps as the response
variable and all other variables as predictors.
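The following is a sketch of part (b) in Python with NumPy. It assumes the predictors are already in a numeric matrix. The categorical `Private` column would need a 0/1 dummy, which R's `lm()` creates automatically. The sketch also takes "error" to mean mean squared error. The only library call is `numpy.linalg.lstsq`; the helper names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients [intercept, slopes...] for y ~ X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def predict_ols(beta, X):
    """Fitted values from intercept-first coefficients."""
    return beta[0] + X @ beta[1:]


def mse(y, y_hat):
    """Mean squared error, used here as the training and test 'error'."""
    return float(np.mean((y - y_hat) ** 2))


# Hypothetical usage with arrays X_train, y_train, X_test, y_test:
# beta = fit_ols(X_train, y_train)
# train_err = mse(y_train, predict_ols(beta, X_train))
# test_err = mse(y_test, predict_ols(beta, X_test))
```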
(c) Perform forward and backward selection with the threshold α = 0.05,
using the predictors of the full model from 4b as the candidate set, to
select a potentially smaller model. Report which model each method chose,
and the training and test errors for their chosen models.
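Part (c) does not spell out the exact rule, so this Python sketch assumes the common p-value version. Forward selection starts from the intercept-only model and adds the candidate with the smallest p-value while that p-value is below α. Backward selection starts from the full model and drops the predictor with the largest p-value while that p-value exceeds α. As a stated assumption, p-values use a normal approximation to the t distribution. This is close when the residual degrees of freedom are in the hundreds, as with a training set of roughly 544 rows. An exact version would use the t-based p-values that R's `summary()` of an `lm` fit reports. All function names are hypothetical.

```python
import math

import numpy as np


def coef_pvalues(X, y, cols):
    """Two-sided p-values for the slopes of an OLS fit of y on X[:, cols].

    Assumption: normal approximation to the t distribution.
    """
    X1 = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (len(y) - X1.shape[1])      # RSS / (n - d - 1)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    t_stats = beta[1:] / se[1:]                           # skip the intercept
    return [math.erfc(abs(t) / math.sqrt(2)) for t in t_stats]


def forward_select(X, y, alpha=0.05):
    """Start empty; add the most significant candidate while its p-value < alpha."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        # p-value of candidate j when it is added last to the current model
        pvals = {j: coef_pvalues(X, y, selected + [j])[-1] for j in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


def backward_select(X, y, alpha=0.05):
    """Start full; drop the least significant predictor while its p-value > alpha."""
    selected = list(range(X.shape[1]))
    while selected:
        pvals = coef_pvalues(X, y, selected)
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break
        selected.pop(worst)
    return selected
```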
(d) Use AIC, BIC, and adjusted $R^2$ instead to select a potentially smaller
model from among all subsets of the predictors used in 4b. Report which
model each method chose, and the training and test errors for their chosen
model(s).
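This is an exhaustive best-subset sketch for part (d) in Python. It scores every subset with all three criteria. AIC and BIC are written in the $n \log(\mathrm{RSS}/n) + \text{penalty} \cdot k$ form, with $k$ counting the intercept. That is the Gaussian log-likelihood version up to an additive constant. ISLR writes them with a variance estimate $\hat\sigma^2$ instead, and the two versions can rank subsets differently. The search needs $2^p$ fits, so it is only practical for modest $p$. In R, `regsubsets()` from the `leaps` package performs this search more efficiently. The function names here are hypothetical.

```python
import itertools
import math

import numpy as np


def rss_of(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on X[:, cols]."""
    X1 = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    r = y - X1 @ beta
    return float(r @ r)


def best_subsets(X, y):
    """Best subset (tuple of column indices) under AIC, BIC, and adjusted R^2."""
    n, p = X.shape
    tss = float(((y - y.mean()) ** 2).sum())
    best = {"aic": (math.inf, None), "bic": (math.inf, None), "adjr2": (-math.inf, None)}
    for d in range(p + 1):
        for cols in itertools.combinations(range(p), d):
            rss = rss_of(X, y, list(cols))
            k = d + 1                                   # coefficients incl. intercept
            aic = n * math.log(rss / n) + 2 * k
            bic = n * math.log(rss / n) + math.log(n) * k
            adjr2 = 1 - (rss / (n - d - 1)) / (tss / (n - 1))
            if aic < best["aic"][0]:
                best["aic"] = (aic, cols)
            if bic < best["bic"][0]:
                best["bic"] = (bic, cols)
            if adjr2 > best["adjr2"][0]:
                best["adjr2"] = (adjr2, cols)
    return {name: cols for name, (_, cols) in best.items()}
```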
(e) Use 5-fold cross-validation to estimate the test error from the training
data, for the candidate smaller model(s) you found so far, and for the full
model from 4b. Compare the training, CV, and test errors and comment
on the results.
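This is a sketch of 5-fold cross-validation for part (e). The same function scores any candidate model: pass `X[:, cols]` for a subset model, or the full matrix for the model from 4b. Fold labels come from a shuffled `arange(n) % k`, so fold sizes differ by at most one. The estimate pools squared errors over all held-out rows. This equals ISLR's average of the $k$ fold MSEs when the folds are equal in size. The helper names are hypothetical. In R, `cv.glm()` from the `boot` package is one common route.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]


def predict_ols(beta, X):
    return beta[0] + X @ beta[1:]


def kfold_cv_mse(X, y, fit=fit_ols, predict=predict_ols, k=5, seed=234):
    """K-fold CV estimate of test MSE; every row is held out exactly once."""
    n = len(y)
    fold_id = np.random.default_rng(seed).permutation(n) % k   # balanced fold labels
    sse = 0.0
    for fold in range(k):
        held = fold_id == fold
        model = fit(X[~held], y[~held])
        sse += float(((y[held] - predict(model, X[held])) ** 2).sum())
    return sse / n                                              # pooled over held-out rows
```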
(f) Fit a ridge regression model on the training set, with λ chosen by cross-
validation. Report the training and test errors.
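This is a closed-form ridge sketch for part (f). It standardizes the predictors and centers the response, then picks $\lambda$ from a user-supplied grid by 5-fold CV. In R, `cv.glmnet()` from the `glmnet` package with `alpha = 0` does this. As an assumption to keep in mind, glmnet scales its objective differently, so the $\lambda$ values here are not comparable to glmnet's. The function names are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge on standardised predictors: minimise ||y_c - Z b||^2 + lam * ||b||^2."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd
    ybar = y.mean()                       # intercept, since Z's columns are centred
    p = X.shape[1]
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ (y - ybar))
    return mu, sd, ybar, beta


def ridge_predict(model, X):
    mu, sd, ybar, beta = model
    return ybar + ((X - mu) / sd) @ beta


def ridge_cv(X, y, lambdas, k=5, seed=234):
    """Return the lambda with the smallest k-fold CV error, plus all CV errors."""
    fold_id = np.random.default_rng(seed).permutation(len(y)) % k
    cv_err = []
    for lam in lambdas:
        sse = 0.0
        for fold in range(k):
            te = fold_id == fold
            model = ridge_fit(X[~te], y[~te], lam)   # standardise with training-fold stats
            sse += float(((y[te] - ridge_predict(model, X[te])) ** 2).sum())
        cv_err.append(sse / len(y))
    return lambdas[int(np.argmin(cv_err))], cv_err
```

After choosing $\lambda$, you would refit `ridge_fit` on the whole training set and compute the training and test MSE from `ridge_predict`.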
(g) Fit a lasso model on the training set, with λ chosen by cross-validation.
Report which variables are included in the model, and the training and
test errors obtained.
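This is a coordinate-descent lasso sketch for part (g). It minimizes $\frac{1}{2n}\lVert y_c - Zb\rVert^2 + \lambda\lVert b\rVert_1$ on standardized predictors $Z$, which has the same shape as glmnet's lasso objective. glmnet's internal scaling details may differ, so $\lambda$ values need not match those from `cv.glmnet(..., alpha = 1)` in R. The variables "included in the model" are the indices with nonzero coefficients. All names are hypothetical.

```python
import numpy as np


def soft_threshold(a, lam):
    """sign(a) * max(|a| - lam, 0): the lasso's one-coordinate solution."""
    return np.sign(a) * max(abs(a) - lam, 0.0)


def lasso_fit(X, y, lam, max_iter=1000, tol=1e-8):
    """Coordinate descent for (1/(2n)) * ||y_c - Z b||^2 + lam * ||b||_1."""
    n, p = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd                  # columns have mean 0 and Z_j'Z_j / n == 1
    ybar = y.mean()
    r = y - ybar                       # residual for b = 0
    beta = np.zeros(p)
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            old = beta[j]
            rho = Z[:, j] @ r / n + old    # inner product with the partial residual
            beta[j] = soft_threshold(rho, lam)
            step = beta[j] - old
            if step != 0.0:
                r -= step * Z[:, j]        # keep the residual in sync with beta
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return mu, sd, ybar, beta


def lasso_cv(X, y, lambdas, k=5, seed=234):
    """Return the lambda with the smallest k-fold CV error."""
    fold_id = np.random.default_rng(seed).permutation(len(y)) % k
    errs = []
    for lam in lambdas:
        sse = 0.0
        for fold in range(k):
            te = fold_id == fold
            mu, sd, ybar, beta = lasso_fit(X[~te], y[~te], lam)
            pred = ybar + ((X[te] - mu) / sd) @ beta
            sse += float(((y[te] - pred) ** 2).sum())
        errs.append(sse / len(y))
    return lambdas[int(np.argmin(errs))]


# Variables included at a given lambda: np.flatnonzero(lasso_fit(X, y, lam)[3])
```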
(h) Fit a PCR model on the training set, with M chosen by cross-validation.
Report the test error obtained, along with the value of M selected by
cross-validation.
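This is a principal components regression sketch for part (h). It takes principal components of the standardized training predictors via an SVD, regresses the centered response on the first $M$ component scores, and chooses $M$ from $1, \dots, p$ by 5-fold CV. In R, `pcr()` from the `pls` package with `scale = TRUE, validation = "CV"` is the usual route. The helper names here are hypothetical.

```python
import numpy as np


def pcr_fit(X, y, M):
    """Regress centred y on the first M principal components of standardised X."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:M].T                              # p x M loading matrix
    scores = Z @ V                            # n x M component scores
    ybar = y.mean()
    theta = np.linalg.lstsq(scores, y - ybar, rcond=None)[0]
    return mu, sd, ybar, V, theta


def pcr_predict(model, X):
    mu, sd, ybar, V, theta = model
    return ybar + ((X - mu) / sd) @ V @ theta


def choose_M_pcr(X, y, k=5, seed=234):
    """Pick M in 1..p by k-fold CV (pooled held-out squared error)."""
    n, p = X.shape
    fold_id = np.random.default_rng(seed).permutation(n) % k
    cv = []
    for M in range(1, p + 1):
        sse = 0.0
        for fold in range(k):
            te = fold_id == fold
            model = pcr_fit(X[~te], y[~te], M)
            sse += float(((y[te] - pcr_predict(model, X[te])) ** 2).sum())
        cv.append(sse / n)
    return int(np.argmin(cv)) + 1, cv
```

A useful sanity check: with $M = p$ the components span the same space as the standardized predictors, so PCR reproduces the least-squares fit.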
(i) Fit a PLS model on the training set, with M chosen by cross-validation.
Report the test error obtained, along with the value of M selected by
cross-validation.
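This is a partial least squares sketch for part (i), using the NIPALS form of PLS1 (single response) on standardized predictors. Each component's weight vector is proportional to $E^\top f$, the covariance direction between the current predictor residual $E$ and response residual $f$. Both are then deflated, and the slopes are recovered as $W (P^\top W)^{-1} q$. $M$ is chosen by 5-fold CV. In R, `plsr()` from the `pls` package with `scale = TRUE, validation = "CV"` is the usual route. The function names here are hypothetical.

```python
import numpy as np


def pls_fit(X, y, M):
    """PLS1 via NIPALS on standardised X and centred y, keeping M components."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    E = (X - mu) / sd
    ybar = y.mean()
    f = y - ybar
    W, P, q = [], [], []
    for _ in range(M):
        w = E.T @ f
        w /= np.linalg.norm(w)                 # unit weight vector
        t = E @ w                              # component scores
        tt = t @ t
        p_load = E.T @ t / tt                  # X loadings
        q_a = f @ t / tt                       # y loading
        E = E - np.outer(t, p_load)            # deflate X
        f = f - q_a * t                        # deflate y
        W.append(w)
        P.append(p_load)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)     # slopes on the standardised scale
    return mu, sd, ybar, coef


def pls_predict(model, X):
    mu, sd, ybar, coef = model
    return ybar + ((X - mu) / sd) @ coef


def choose_M_pls(X, y, k=5, seed=234):
    """Pick M in 1..p by k-fold CV (pooled held-out squared error)."""
    n, p = X.shape
    fold_id = np.random.default_rng(seed).permutation(n) % k
    cv = []
    for M in range(1, p + 1):
        sse = 0.0
        for fold in range(k):
            te = fold_id == fold
            model = pls_fit(X[~te], y[~te], M)
            sse += float(((y[te] - pls_predict(model, X[te])) ** 2).sum())
        cv.append(sse / n)
    return int(np.argmin(cv)) + 1, cv
```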
(j) Comment on the results obtained. How accurately can we predict the
acceptance rate? How much difference is there among the test errors
resulting from different approaches? Which approach would you recommend
for this dataset and why?
