ISYE6501 Homework 5
2024-02-13
rm(list=ls())
setwd("~/Georgia Tech - OMSA/ISYE 6501")
if(!require(pacman)) install.packages("pacman")
library(pacman)
p_load(tinytex, tidyverse, caret, ggplot2, datasets)
Question 8.1
As a webmaster overseeing an e-commerce platform, I often rely on data-driven insights to forecast sales
performance. One situation where a linear regression model proves invaluable is in predicting monthly
revenue. In this scenario, I typically consider several predictors (a brief sketch of such a model follows the list):
1. Marketing Spend: I analyze the impact of our marketing investments, like digital ads and social media
campaigns, on sales. Tracking how different spending levels correlate with revenue helps optimize our
marketing budget.
2. Website Traffic: Monitoring the number of visitors to our site provides a clear indication of potential
sales. Understanding traffic patterns helps anticipate demand and tailor promotional efforts accord-
ingly.
3. Seasonal Trends: Recognizing how seasonal variations affect sales allows me to adjust strategies accord-
ingly. Holidays and special events often drive fluctuations in consumer behavior, influencing purchasing
decisions.
4. Product Pricing: By examining how changes in product pricing impact sales, I can refine pricing
strategies to maximize revenue. Finding the balance between competitiveness and profitability is key.
5. Customer Feedback: Incorporating customer reviews and ratings into the model gives insight into
product perception. Understanding how sentiment influences purchasing decisions helps refine product
offerings and marketing approaches.
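Such a model could be sketched in R as follows; the monthly_sales data frame and its column names are hypothetical placeholders for the predictors listed above, not data from this report.
# hypothetical monthly data: one row per month with the predictors described above
revenue_model <- lm(revenue ~ marketing_spend + website_traffic + season + avg_price + avg_rating,
                    data = monthly_sales)
summary(revenue_model)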
Question 8.2
# Store the data point as test
test <- data.frame(M = 14.0, So = 0, Ed = 10.0, Po1 = 12.0, Po2 = 15.5, LF = 0.640, M.F = 94.0, Pop = 150,
                   NW = 1.1, U1 = 0.120, U2 = 3.6, Wealth = 3200, Ineq = 20.1, Prob = 0.04, Time = 39.0)
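The crime data frame itself has to be read in before any model is fit; the loading step is not shown in the original, and the file name below is assumed from the course materials.
# read the US crime data set (file name assumed)
crime <- read.table("uscrime.txt", header = TRUE, stringsAsFactors = FALSE)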
Objective: We need to identify the significant predictor variables that impact the outcome
of our model when predicting the crime rate for the provided test data point. Our hypotheses are:
• Ho : the selected variable does not impact the outcome (the crime rate)
• Ha : the selected variable does have some impact on predicting the outcome
Methods: First, we will create a naive linear regression model with lm() using all the available
predictors and observe the result. Then we will apply recursive feature elimination rfe()
with cross-validation to drop the less important predictors and produce an optimal model.
set.seed(1)
# create a naive model and print result
naive_model <- lm(Crime~., data=crime)
summary(naive_model)
##
## Call:
## lm(formula = Crime ~ ., data = crime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -395.74 -98.09 -6.69 112.99 512.67
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.984e+03 1.628e+03 -3.675 0.000893 ***
## M 8.783e+01 4.171e+01 2.106 0.043443 *
## So -3.803e+00 1.488e+02 -0.026 0.979765
## Ed 1.883e+02 6.209e+01 3.033 0.004861 **
## Po1 1.928e+02 1.061e+02 1.817 0.078892 .
## Po2 -1.094e+02 1.175e+02 -0.931 0.358830
## LF -6.638e+02 1.470e+03 -0.452 0.654654
## M.F 1.741e+01 2.035e+01 0.855 0.398995
## Pop -7.330e-01 1.290e+00 -0.568 0.573845
## NW 4.204e+00 6.481e+00 0.649 0.521279
## U1 -5.827e+03 4.210e+03 -1.384 0.176238
## U2 1.678e+02 8.234e+01 2.038 0.050161 .
## Wealth 9.617e-02 1.037e-01 0.928 0.360754
## Ineq 7.067e+01 2.272e+01 3.111 0.003983 **
## Prob -4.855e+03 2.272e+03 -2.137 0.040627 *
## Time -3.479e+00 7.165e+00 -0.486 0.630708
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 209.1 on 31 degrees of freedom
## Multiple R-squared: 0.8031, Adjusted R-squared: 0.7078
## F-statistic: 8.429 on 15 and 31 DF, p-value: 3.539e-07
# let's predict with the naive model and compare the result with the Crime rate range
predict(naive_model, test)
## 1
## 155.4349
range(crime$Crime)
## [1]  342 1993
First Observation: The predicted crime rate (155.4) falls well outside the observed Crime
rate range (342 - 1993). One possibility is that the naive model is overfitting by using all
of the predictors.
Taking a closer look at the Pr(>|t|) column of the Coefficients table and comparing the values to
the significance codes, we conclude that every predictor with a p-value of 0.05 or above should be
excluded from our final model, since it does not significantly impact the outcome. Keep in mind
that a predictor's p-value must be lower than alpha (0.05) in order for us to reject the null
hypothesis.
With these insignificant predictors excluded, we end up with 4 strong predictors: M (0.043),
Ed (0.0048), Ineq (0.0039), and Prob (0.040), so we are ready to build, hopefully, a better model.
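As a quick sanity check, the same four predictors can be pulled out programmatically; the short helper below is not part of the original analysis and simply filters the naive model's coefficient table at alpha = 0.05.
# extract the coefficient table from the naive model
coefs <- summary(naive_model)$coefficients
# keep predictors whose p-values fall below alpha = 0.05, dropping the intercept
setdiff(rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05], "(Intercept)")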
set.seed(1)
better_model <- lm(Crime ~ M + Ed + Ineq + Prob, data = crime, x = TRUE, y = TRUE)
summary(better_model)
##
## Call:
## lm(formula = Crime ~ M + Ed + Ineq + Prob, data = crime, x = TRUE,
## y = TRUE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -532.97 -254.03 -55.72 137.80 960.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1339.35 1247.01 -1.074 0.28893
## M 35.97 53.39 0.674 0.50417
## Ed 148.61 71.92 2.066 0.04499 *
## Ineq 26.87 22.77 1.180 0.24458
## Prob -7331.92 2560.27 -2.864 0.00651 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 347.5 on 42 degrees of freedom
## Multiple R-squared: 0.2629, Adjusted R-squared: 0.1927
## F-statistic: 3.745 on 4 and 42 DF, p-value: 0.01077
# predict the crime rate for the test point with the reduced model
predict(better_model, test)
## 1
## 897.2307
Second observation: There is an improvement this time: the predicted crime rate of 897.2307
falls within the observed range, although the Adjusted R-squared value dropped from 0.7078
(naive model) to 0.1927. However, we don't think we have built the best model yet, since the
excluded predictors might still have a positive effect when combined with the included ones.
For this, we want to go the extra mile and apply recursive feature elimination rfe() to train
our model and identify the best predictors using cross-validation.
set.seed(1)
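# The call that builds rfe_lm is not shown in the original output; the lines below are a
# reconstruction, assuming caret's lmFuncs with 10-fold cross-validation repeated 25 times
# to match the resampling summary printed further down.
ctrl <- rfeControl(functions = lmFuncs, method = "repeatedcv", number = 10, repeats = 25)
rfe_lm <- rfe(x = crime[, names(crime) != "Crime"], y = crime$Crime,
              sizes = 1:15, rfeControl = ctrl)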
rfe_lm
##
## Recursive feature selection
##
## Outer resampling method: Cross-Validated (10 fold, repeated 25 times)
##
## Resampling performance over subset size:
##
## Variables RMSE Rsquared MAE RMSESD RsquaredSD MAESD Selected
## 1 360.8 0.3296 291.5 133.31 0.2986 100.85
## 2 345.8 0.3610 277.7 134.61 0.2975 110.02
## 3 352.0 0.3339 285.5 132.77 0.2993 109.49
## 4 333.4 0.4115 277.6 99.41 0.3142 87.97
## 5 316.4 0.4805 260.8 96.84 0.3183 86.07
## 6 306.9 0.4952 253.8 96.65 0.3224 88.09
## 7 308.9 0.5038 256.0 93.59 0.3101 84.88
## 8 270.1 0.5720 224.2 85.00 0.3001 74.44
## 9 235.8 0.6502 190.8 94.79 0.2851 78.07
## 10 230.9 0.6636 186.9 85.05 0.2645 68.75 *
## 11 236.8 0.6374 191.7 90.27 0.2725 70.84
## 12 249.0 0.6112 199.9 90.39 0.2848 72.46
## 13 252.8 0.6063 203.9 92.74 0.2881 75.30
## 14 261.2 0.5911 210.2 97.12 0.2918 80.58
## 15 265.2 0.5841 215.0 94.74 0.2874 81.74
##
## The top 5 variables (out of 10):
## U1, Prob, LF, Po1, Ed
Third observation: Comparing RMSE and R-squared across the subset sizes, the best-fitting
model is the one built with the 10 strongest predictors (10 variables, RMSE: 230.9,
Rsquared: 0.6636). Now let's list our 10 best predictors and predict the outcome with the
best fitted model.
## [1] "U1" "Prob" "LF" "Po1" "Ed" "U2" "Po2" "M" "Ineq" "So"
best_model <- lm(Crime ~ U1 + Prob + LF + Po1 + Ed + U2 + Po2 + M + Ineq + So, data = crime, x = TRUE, y = TRUE)
predict(best_model, test)
## 1
## 870.6834