Regression Models Assignment 1

Uploaded by

Maadhav Sehgal


Automatic or Manual?

Yash Kumar Singh

08 April 2017

Executive Summary

In this study we look at the mtcars dataset, which records several aspects of automobile design for 32
automobiles, and explore how those aspects relate to miles per gallon (MPG). We focus on two questions:
is an automatic or a manual transmission better for MPG, and how large is the MPG difference between
automatic and manual transmissions?
To achieve our objectives we take the following steps:
• Data preprocessing
• Exploratory Analysis
• Model Selection
• Model Examination
• Conclusion

Data Preprocessing

First, we convert the ‘am’ variable, which denotes whether a car has an automatic or a manual transmission,
to a factor. We also convert the other categorical variables (cyl, gear and vs) to factors so that they are
treated as discrete rather than continuous.
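# Work on a copy so the built-in mtcars stays untouched
data("mtcars")
data <- mtcars
# am: 0 = automatic ("A"), 1 = manual ("M")
data$am <- as.factor(data$am)
levels(data$am) <- c("A", "M")
data$cyl <- as.factor(data$cyl)
data$gear <- as.factor(data$gear)
# vs: 0 = V-shaped engine ("V"), 1 = straight engine ("S")
data$vs <- as.factor(data$vs)
levels(data$vs) <- c("V", "S")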
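
As a quick sanity check on the recoding, the sketch below cross-tabulates the original 0/1 codes against the new labels. It relies on the coding documented in the mtcars help page (am = 0 is automatic, am = 1 is manual); the name `check` is illustrative and not part of the original analysis.

```r
# Rebuild the recoded column so the snippet runs on its own.
data("mtcars")
data <- mtcars
data$am <- as.factor(data$am)
levels(data$am) <- c("A", "M")

# Cross-tabulate the original codes against the new labels:
# every 0 should land in "A" and every 1 in "M".
check <- table(original = mtcars$am, recoded = data$am)
print(check)
```

The off-diagonal cells of the table should be zero; a non-zero cell would mean the labels were assigned in the wrong order.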

Exploratory Analysis

First, let’s look at the structure of the dataset to see which fields it contains.
str(data)

## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : Factor w/ 2 levels "V","S": 1 1 2 2 1 2 1 2 2 2 ...
## $ am : Factor w/ 2 levels "A","M": 2 2 2 1 1 1 1 1 1 1 ...
## $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
head(data, n = 5)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  V  M    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  V  M    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  S  M    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  S  A    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  V  A    3    2
To see the relationship between mpg and am more clearly, let’s create a boxplot.
library(ggplot2)
g <- ggplot(data, aes(am, mpg))
g <- g + geom_boxplot(aes(fill = am))
print(g)

[Figure: boxplot of mpg (y-axis) by transmission type am (x-axis; A = automatic, M = manual), filled by am.]

The plot shows that cars with a manual transmission tend to have higher mpg than cars with an automatic
transmission. However, we may be overlooking other factors. Before building a model we should therefore
look at the other variables that are strongly correlated with mpg. Let’s list all the variables whose
correlation with mpg is at least as strong, in absolute value, as that of am.
correlation <- cor(mtcars$mpg, mtcars)
correlation <- correlation[,order(-abs(correlation[1, ]))]
correlation

##        mpg         wt        cyl       disp         hp       drat
##  1.0000000 -0.8676594 -0.8521620 -0.8475514 -0.7761684  0.6811719
##         vs         am       carb       gear       qsec
##  0.6640389  0.5998324 -0.5509251  0.4802848  0.4186840
variables <- names(correlation)[1: which(names(correlation) == "am")]
variables

## [1] "mpg" "wt" "cyl" "disp" "hp" "drat" "vs" "am"
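
One reason to be cautious about the boxplot is that transmission type may itself be tied to the predictors at the top of this list. The sketch below, which assumes the same recoding as in the preprocessing step, compares mean mpg and mean weight by transmission type; `by_am` and `wt_am_cor` are illustrative names.

```r
# Rebuild the recoded column so the snippet runs on its own.
data("mtcars")
data <- mtcars
data$am <- factor(data$am, labels = c("A", "M"))

# Mean mpg and mean weight (wt, in 1000 lbs) for each transmission type.
by_am <- aggregate(cbind(mpg, wt) ~ am, data = data, FUN = mean)
print(by_am)

# Correlation between weight and the original 0/1 transmission indicator.
wt_am_cor <- cor(mtcars$wt, mtcars$am)
print(wt_am_cor)
```

If the manual cars in this sample are also the lighter ones, part of the raw mpg gap may reflect weight rather than transmission, which is the motivation for the multivariable models in the next section.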

Model Selection

Now that we know mpg is more strongly correlated with several other variables than with am, a model based
on am alone is unlikely to be the most accurate one. Let’s nevertheless start the process by fitting mpg
on am alone.
first <- lm(mpg ~ am, data)
summary(first)

##
## Call:
## lm(formula = mpg ~ am, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## amM 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
Here the p-value for am is very low, but the R-squared is only 0.36: transmission type alone explains about
36% of the variance in mpg. Let’s now go to the other extreme and regress mpg on all the variables.
last <- lm(mpg ~ ., data)
summary(last)

##
## Call:
## lm(formula = mpg ~ ., data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2015 -1.2319 0.1033 1.1953 4.3085
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.09262 17.13627 0.881 0.3895
## cyl6 -1.19940 2.38736 -0.502 0.6212
## cyl8 3.05492 4.82987 0.633 0.5346
## disp 0.01257 0.01774 0.708 0.4873
## hp -0.05712 0.03175 -1.799 0.0879 .
## drat 0.73577 1.98461 0.371 0.7149
## wt -3.54512 1.90895 -1.857 0.0789 .
## qsec 0.76801 0.75222 1.021 0.3201
## vsS 2.48849 2.54015 0.980 0.3396
## amM 3.34736 2.28948 1.462 0.1601
## gear4 -0.99922 2.94658 -0.339 0.7382
## gear5 1.06455 3.02730 0.352 0.7290
## carb 0.78703 1.03599 0.760 0.4568
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.616 on 19 degrees of freedom
## Multiple R-squared: 0.8845, Adjusted R-squared: 0.8116
## F-statistic: 12.13 on 12 and 19 DF, p-value: 1.764e-06
Here the R-squared has clearly improved, but now none of the individual coefficients is significant at the
0.05 level, most probably because of overfitting: 12 predictor terms are estimated from only 32 observations.
So let’s use the ‘step’ function, which adds and drops terms to minimise AIC, to search for a better
model.
best <- step(last, direction = "both", trace = FALSE)
summary(best)

##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## amM 2.9358 1.4109 2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
Here the adjusted R-squared (0.83) is the highest of the three models and every predictor is significant at
the 0.05 level. We therefore take this model as our best fit.
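
The three fits are nested (am only, then wt + qsec + am, then all variables), so they can also be compared formally. The sketch below is one way to do that, not part of the original analysis: it refits the models, hard-coding the formula that `step` reported rather than re-running the search, and then applies `anova` and `AIC`. The names `comparison` and `aic` are illustrative.

```r
# Rebuild the recoded data frame so the snippet runs on its own.
data("mtcars")
data <- mtcars
data$am   <- factor(data$am, labels = c("A", "M"))
data$vs   <- factor(data$vs, labels = c("V", "S"))
data$cyl  <- factor(data$cyl)
data$gear <- factor(data$gear)

first <- lm(mpg ~ am, data = data)
best  <- lm(mpg ~ wt + qsec + am, data = data)  # formula reported by step()
last  <- lm(mpg ~ ., data = data)

# Sequential F-tests between the nested models.
comparison <- anova(first, best, last)
print(comparison)

# AIC for each model (lower is better); step() selects on this criterion.
aic <- AIC(first, best, last)
print(aic)
```

In the `anova` table, a small p-value in the second row indicates that adding wt and qsec improves on the am-only model, while a large p-value in the third row indicates that the remaining variables add little beyond that.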

Model Examination

The model we selected, ‘best’, describes the dependence of mpg on wt and qsec in addition to am. Let’s
draw some residual plots to learn more about this fit.
layout(matrix(c(1,2,3,4),2,2))
plot(best)

[Figure: diagnostic plots for the ‘best’ model in four panels: Residuals vs Fitted, Normal Q-Q,
Scale-Location, and Residuals vs Leverage (with Cook's distance contours). Labelled points include
Chrysler Imperial, Fiat 128, Toyota Corolla and Merc 230.]
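
The diagnostic plots can be complemented with a few numeric summaries. The following is a minimal sketch, assuming the model formula reported by `step`; the choice of a Shapiro-Wilk test and of listing the top three cars is ours, and the names `normality`, `cooks` and `lev` are illustrative.

```r
# Refit the selected model so the snippet runs on its own.
data("mtcars")
data <- mtcars
data$am <- factor(data$am, labels = c("A", "M"))
best <- lm(mpg ~ wt + qsec + am, data = data)

# Shapiro-Wilk test of residual normality (complements the Normal Q-Q panel).
normality <- shapiro.test(residuals(best))
print(normality)

# Three most influential cars by Cook's distance.
cooks <- sort(cooks.distance(best), decreasing = TRUE)
print(head(cooks, 3))

# Three highest-leverage cars (hat values).
lev <- sort(hatvalues(best), decreasing = TRUE)
print(head(lev, 3))
```

With only 32 observations these checks have limited power, so they are best read alongside the plots rather than as a replacement for them.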

Conclusion

The first question, whether an automatic or a manual transmission is better for mpg, is answered the same
way by all three models: holding the other variables in the model constant, the coefficient for manual
transmission is positive, so manual transmission is associated with higher mpg.
The second question is harder to answer. Based on the ‘best’ model, we conclude that cars with a manual
transmission get about 2.94 more mpg than cars with an automatic transmission (p < 0.05, R-squared 0.85).
The Residuals vs Fitted plot, however, suggests that something is missing from the model, which may be a
consequence of the small sample of 32 observations. So although we conclude that manual transmissions
perform better with respect to mpg, it is doubtful whether the model will fit all future observations.
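
To express the uncertainty in the quantified difference, a confidence interval for the transmission coefficient is a natural companion to the point estimate. The sketch below assumes the model formula reported by `step` and a conventional 95% level; `ci` is an illustrative name.

```r
# Refit the selected model so the snippet runs on its own.
data("mtcars")
data <- mtcars
data$am <- factor(data$am, labels = c("A", "M"))
best <- lm(mpg ~ wt + qsec + am, data = data)

# 95% confidence interval for the manual-vs-automatic coefficient (amM).
ci <- confint(best, "amM", level = 0.95)
print(ci)
```

Because the reported p-value for amM (0.0467) is only just below 0.05, the lower end of this interval should sit close to zero, so the size of the effect is much less certain than its sign.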
