R Tutorial 2
1 November 2024
Important Instructions
• These weekly exercises are highly relevant to the group assignment.
Exercise 1: Linear Regression on the Auto Data
The task in this exercise is to apply multiple linear regression to the Auto data set.
1. Produce a scatterplot matrix including all variables in the data set (you can
use ggpairs() from the GGally package).
2. Compute the correlation matrix between the variables. Exclude the name
variable, as it is qualitative.
3. Use lm() to perform multiple linear regression with mpg as the response and
all other variables (except name) as predictors. Use summary() to print the
results. Address the following:
4. Plot residuals versus fitted values (this is called a residual plot). Does the
residual plot suggest a non-linear relationship between predictors and response?
Does it suggest any large outliers?
5. Use the * and : symbols to include interaction effects in the model. Identify
any statistically significant interactions.
6. Experiment with transformations such as log(X), √X, and X². Comment on
their effectiveness.
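
The steps in Exercise 1 can be sketched in R as follows. This is a minimal starting point, not a model answer. It assumes the ISLR2 package (which ships the Auto data) is installed. The object names (auto_num, fit_full and so on), the interaction terms, and the choice of horsepower for the transformations are illustrative picks that the exercise does not prescribe.

```r
# Assumption: the ISLR2 package is installed and provides the Auto data frame.
library(ISLR2)

# Drop the qualitative name column; the remaining columns are numeric.
auto_num <- subset(Auto, select = -name)

# 1. Scatterplot matrix (base R). GGally::ggpairs(auto_num) is an alternative
#    if GGally is installed.
pairs(auto_num)

# 2. Correlation matrix of the numeric variables.
cor_mat <- cor(auto_num)
print(round(cor_mat, 2))

# 3. Multiple linear regression of mpg on all other variables.
fit_full <- lm(mpg ~ ., data = auto_num)
print(summary(fit_full))

# 4. Residuals versus fitted values.
plot(fitted(fit_full), resid(fit_full),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# 5. Interaction effects. a * b expands to a + b + a:b, while a:b adds
#    only the interaction term. The terms below are arbitrary examples.
fit_int <- lm(mpg ~ weight * horsepower + year:origin, data = auto_num)
print(summary(fit_int))

# 6. Transformations of a single predictor (horsepower chosen for illustration).
fit_log  <- lm(mpg ~ log(horsepower), data = auto_num)
fit_sqrt <- lm(mpg ~ sqrt(horsepower), data = auto_num)
fit_sq   <- lm(mpg ~ horsepower + I(horsepower^2), data = auto_num)

# Adjusted R-squared gives one rough way to compare the transformed fits.
adj_r2 <- sapply(list(log = fit_log, sqrt = fit_sqrt, squared = fit_sq),
                 function(m) summary(m)$adj.r.squared)
print(adj_r2)
```

Note that I() is needed around horsepower^2 because ^ has a special meaning inside a model formula.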
Exercise 2
1. Fit a multiple regression model to predict Sales using Price, Urban, and US.
4. Fit a reduced model using only the predictors with statistically significant
relationships to the response.
5. Compare the fit of the full model and the reduced model.
6. Using the reduced model, obtain 95% confidence intervals for the coefficients.
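
The model-fitting steps above can be sketched as follows. The sheet does not name the data set here. The variable names Sales, Price, Urban, and US match the Carseats data in the ISLR2 package, so the sketch assumes that data set. Which predictors belong in the reduced model should be read off your own summary() output; the formula below is a placeholder that assumes Urban is the one to drop.

```r
# Assumption: the exercise uses the Carseats data from the ISLR2 package.
library(ISLR2)

# Full model with all three predictors.
fit_full <- lm(Sales ~ Price + Urban + US, data = Carseats)
print(summary(fit_full))

# Reduced model. Placeholder choice: keep Price and US, assuming the
# summary above shows no significant effect for Urban.
fit_reduced <- lm(Sales ~ Price + US, data = Carseats)
print(summary(fit_reduced))

# Compare the two fits: an F-test for the nested models, plus
# adjusted R-squared and residual standard error side by side.
print(anova(fit_reduced, fit_full))
fit_stats <- rbind(
  full    = c(adj_r2 = summary(fit_full)$adj.r.squared,
              rse    = summary(fit_full)$sigma),
  reduced = c(adj_r2 = summary(fit_reduced)$adj.r.squared,
              rse    = summary(fit_reduced)$sigma)
)
print(fit_stats)

# 95% confidence intervals for the coefficients of the reduced model.
ci <- confint(fit_reduced, level = 0.95)
print(ci)
```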
Exercise 3: Predicting Gas Mileage Using Logistic Regression and KNN
In this exercise, you will develop a model to predict whether a car gets high or low
gas mileage based on the Auto data set.
1. Create a binary variable mpg01, which is 1 if mpg is above its median and 0
otherwise.
2. Explore the data graphically to assess the relationship between mpg01 and the
other features. Use scatterplots and boxplots to identify useful predictors.
4. Perform logistic regression on the training data using the most relevant
predictors identified in the previous step. Report the test error.
5. Perform K-Nearest Neighbors (KNN) on the training data with different values
of K to predict mpg01. Report the test errors and determine which value of K
performs best.
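
One possible workflow for Exercise 3 is sketched below. Several details are assumptions rather than part of the exercise: the ISLR2 package as the source of Auto, a random 70/30 training/test split with a fixed seed, the three predictors (weight, horsepower, displacement), the 0.5 classification threshold, and the candidate values of K. Replace them with whatever your own exploration suggests.

```r
# Assumptions: ISLR2 provides Auto; the class package provides knn().
library(ISLR2)
library(class)

# 1. Binary response: 1 if mpg is above its median, 0 otherwise.
auto <- Auto
auto$mpg01 <- as.integer(auto$mpg > median(auto$mpg))

# 2. Example graphical check: boxplot of one candidate predictor by class.
boxplot(weight ~ mpg01, data = auto, xlab = "mpg01", ylab = "weight")

# Training/test split (assumed 70/30; the seed only makes it reproducible).
set.seed(1)
train_idx <- sample(nrow(auto), size = round(0.7 * nrow(auto)))
train <- auto[train_idx, ]
test  <- auto[-train_idx, ]

# Illustrative predictor set; choose your own from the plots.
predictors <- c("weight", "horsepower", "displacement")

# 4. Logistic regression fitted on the training data only.
glm_fit  <- glm(mpg01 ~ weight + horsepower + displacement,
                data = train, family = binomial)
glm_prob <- predict(glm_fit, newdata = test, type = "response")
glm_pred <- as.integer(glm_prob > 0.5)
glm_err  <- mean(glm_pred != test$mpg01)   # test error rate
print(glm_err)

# 5. KNN. Predictors are standardised with the training means and standard
#    deviations, because KNN distances depend on the scale of each variable.
train_x <- scale(train[, predictors])
test_x  <- scale(test[, predictors],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))

k_values <- c(1, 3, 5, 10, 20)
knn_err <- sapply(k_values, function(k) {
  pred <- knn(train_x, test_x, cl = factor(train$mpg01), k = k)
  mean(as.integer(as.character(pred)) != test$mpg01)
})
names(knn_err) <- k_values
print(knn_err)

# Value of K with the lowest test error among the candidates tried.
best_k <- k_values[which.min(knn_err)]
print(best_k)
```

The test errors depend on the random split, so repeating the procedure with different seeds gives a sense of how stable the comparison between K values is.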