Tutorial 2

The document outlines a tutorial on data science methods in finance, specifically focusing on linear regression and predictive modeling using R. It includes exercises on applying multiple linear regression to the Auto and Carseats data sets, as well as logistic regression and KNN for predicting gas mileage. The tutorial emphasizes the importance of exploring data, interpreting coefficients, and assessing model performance without requiring formal submissions.

Uploaded by

q.s.b.bibo

Data Science Methods in Finance

R Tutorial 2

1 November 2024

Important Instructions
• These weekly exercises are highly relevant to the group assignment.

• They are optional, but we strongly encourage you to work through them.

NO write-up of your answers or submission is required.

Exercise 1: Linear Regression on the Auto Data
The task in this exercise is to apply multiple linear regression to the Auto data set.

1. Produce a scatterplot matrix including all variables in the data set (you can
use ggpairs() from the GGally library).

2. Compute the correlation matrix between the variables. Exclude the name
variable, as it is qualitative.

3. Use lm() to perform multiple linear regression with mpg as the response and
all other variables (except name) as predictors. Use summary() to print the
results. Address the following:

• Is there a relationship between the predictors and the response?

• Which predictors have a statistically significant relationship with the
response?

• What does the coefficient for the year variable suggest?

4. Plot residuals versus fitted values (this is called a residual plot). Does the
residual plot suggest a non-linear relationship between predictors and
response? Does it suggest any large outliers?

5. Use the * and : symbols to include interaction effects in the model. Identify
any statistically significant interactions.

6. Experiment with transformations such as log(X), √X, and X². Comment on
their effectiveness.
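
The steps of Exercise 1 can be sketched in R roughly as follows. This is one possible starting point, not a model answer. It assumes the Auto data come from the ISLR2 package (the tutorial does not say where to load them from), and the variables used in the interaction and transformation models (weight, horsepower, year) are arbitrary illustrative choices.

```r
# Assumption: Auto is loaded from ISLR2; the tutorial does not name the package.
library(ISLR2)
library(GGally)   # provides ggpairs()

auto <- na.omit(Auto)
auto_num <- subset(auto, select = -name)   # drop the qualitative name column

# 1. Scatterplot matrix of all remaining variables
print(ggpairs(auto_num))

# 2. Correlation matrix (numeric columns only)
cor_mat <- cor(auto_num[sapply(auto_num, is.numeric)])
print(round(cor_mat, 2))

# 3. Multiple regression of mpg on every variable except name
fit <- lm(mpg ~ . - name, data = auto)
print(summary(fit))

# 4. Residuals versus fitted values
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# 5. Interaction effects: a * b expands to a + b + a:b
#    (weight and horsepower are an illustrative pair, not a recommendation)
fit_int <- lm(mpg ~ weight * horsepower + year, data = auto)
print(summary(fit_int))

# 6. Transformations of one predictor; I() protects ^ inside a formula
fit_log  <- lm(mpg ~ log(horsepower) + weight + year, data = auto)
fit_sqrt <- lm(mpg ~ sqrt(horsepower) + weight + year, data = auto)
fit_sq   <- lm(mpg ~ horsepower + I(horsepower^2) + weight + year, data = auto)
print(sapply(list(log = fit_log, sqrt = fit_sqrt, square = fit_sq),
             function(m) summary(m)$adj.r.squared))
```

Comparing adjusted R² across the transformed models is only one way to judge them; the residual plot from step 4 can be redrawn for each fit as well.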

Exercise 2: Linear Regression on the Carseats Data


The objective is to fit and interpret a multiple regression model using the Carseats
data set.

1. Fit a multiple regression model to predict Sales using Price, Urban, and US.

2. Provide an interpretation of each coefficient, noting that some predictors are
qualitative.

3. Identify which predictors allow you to reject the null hypothesis H0 : βj = 0.

4. Fit a reduced model using only the predictors with statistically significant
relationships to the response.

5. Compare the fit of the full model and the reduced model.

6. Using the reduced model, obtain 95% confidence intervals for the coefficients.
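
A minimal sketch of the Exercise 2 workflow, assuming the Carseats data come from the ISLR2 package. The 0.05 significance threshold and the use of drop1() to pick the terms for the reduced model are assumptions made for illustration; reading the p-values directly from summary() works just as well.

```r
# Assumption: Carseats is loaded from ISLR2; the tutorial does not name the package.
library(ISLR2)

# 1. Full model: Urban and US are factors, so lm() creates dummy variables
full <- lm(Sales ~ Price + Urban + US, data = Carseats)

# 2-3. Coefficients with t-tests of H0: beta_j = 0
print(summary(full))

# 4. Keep the terms whose single-term F-test p-value is below 0.05
#    (the threshold is an illustrative assumption)
tests <- drop1(full, test = "F")
pvals <- tests[-1, "Pr(>F)"]            # the first row is "<none>"
keep  <- rownames(tests)[-1][pvals < 0.05]
reduced <- lm(reformulate(keep, response = "Sales"), data = Carseats)

# 5. Compare the two fits
print(anova(reduced, full))
print(c(full    = summary(full)$adj.r.squared,
        reduced = summary(reduced)$adj.r.squared))

# 6. 95% confidence intervals for the reduced model
ci <- confint(reduced, level = 0.95)
print(ci)
```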

Exercise 3: Predicting Gas Mileage Using Logistic Regression and KNN
In this exercise, you will develop a model to predict whether a car gets high or low
gas mileage based on the Auto data set.

1. Create a binary variable mpg01, which is 1 if mpg is above its median and 0
otherwise.

2. Explore the data graphically to assess the relationship between mpg01 and the
other features. Use scatterplots and boxplots to identify useful predictors.

3. Split the data into training and test sets.

4. Perform logistic regression on the training data using the most relevant
predictors identified in the previous step. Report the test error.

5. Perform K-Nearest Neighbors (KNN) on the training data with different
values of K to predict mpg01. Report the test errors and determine which value
of K performs best.
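
The steps of Exercise 3 could look roughly like the sketch below. Several choices here are assumptions rather than part of the tutorial: the data are loaded from ISLR2, the split is 70/30 with a fixed seed, the predictor set `vars` is a placeholder for whatever your own plots suggest, the classification cutoff is 0.5, and the candidate values of K are arbitrary.

```r
# Assumptions: Auto from ISLR2, 70/30 split, placeholder predictors, cutoff 0.5.
library(ISLR2)
library(class)    # provides knn()

auto <- na.omit(Auto)

# 1. Binary response: 1 if mpg is above its median, 0 otherwise
auto$mpg01 <- as.integer(auto$mpg > median(auto$mpg))

# 2. One example of a graphical check; repeat for the other features
boxplot(weight ~ mpg01, data = auto, xlab = "mpg01", ylab = "weight")

# 3. Random training/test split
set.seed(1)
train_idx <- sample(nrow(auto), size = round(0.7 * nrow(auto)))
train <- auto[train_idx, ]
test  <- auto[-train_idx, ]

# 4. Logistic regression on placeholder predictors
vars <- c("weight", "horsepower", "displacement", "cylinders")
glm_fit  <- glm(reformulate(vars, response = "mpg01"),
                data = train, family = binomial)
prob     <- predict(glm_fit, newdata = test, type = "response")
glm_pred <- as.integer(prob > 0.5)
glm_err  <- mean(glm_pred != test$mpg01)

# 5. KNN: standardise with training means and sds, then try several K
train_x <- scale(train[, vars])
test_x  <- scale(test[, vars],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))
ks <- c(1, 3, 5, 10, 20)
knn_err <- sapply(ks, function(k) {
  pred <- knn(train_x, test_x, cl = factor(train$mpg01), k = k)
  mean(as.character(pred) != as.character(test$mpg01))
})
names(knn_err) <- ks
best_k <- ks[which.min(knn_err)]

print(glm_err)
print(knn_err)
print(best_k)
```

Standardising the predictors before KNN matters because the method relies on distances, and variables such as weight are on a much larger scale than cylinders.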
