AMTA – FINAL EXAMS

This document provides R code to analyze Toyota Corolla data and model Price using regression. It covers: 1) checking for missing data and visualizing relationships between Price and variables such as KM and Age; 2) splitting the data into training and test sets and building a linear regression model to predict Price from variables such as Age, KM, and Fuel_Type; 3) evaluating the model on the training and test sets with measures such as RMSE and R-squared; and 4) using stepwise regression to improve the model by removing insignificant variables.

1. Toyota car data

Code:

# Load the ToyotaCorolla.csv

df = read.csv(file.choose())

# Missing Values

# First, check whether any values are missing

library(randomForest)

library(missForest)

sapply(df, function(x) sum(is.na(x))) # count of missing values in each column

# From the output we see that there are no missing values in the data set,
# so no imputation is needed.
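
Had any values been missing, they could have been imputed with missForest, which is loaded above. A minimal, hypothetical sketch (not needed here, since the data set is complete; character columns would first need converting to factors, and high-cardinality columns such as a model-name column dropped):

# Hypothetical sketch only -- this data set has no missing values
df.imp <- missForest(df)
df <- df.imp$ximp   # data frame with imputed values
df.imp$OOBerror     # out-of-bag estimate of the imputation error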

attach(df)

# Load the libraries

library(caret)

library(e1071)

library(foreach)

library(ggplot2)

library(ISLR)

i) Scatterplot
Price vs Kilometres (KM)
Code:
# Price vs KM
plot(Price ~ KM)

Interpretation

• Old cars that have travelled fewer kilometres are priced high, whereas old cars that have travelled more kilometres are priced low.

Price vs Age

Code:

# Price vs Age

plot(Price ~ Age_08_04)
Interpretation

• As the age of the car increases, its price decreases, and vice versa.
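
To quantify what the two scatterplots show, the correlations can be computed; both should come out negative (a quick sketch):

# correlation of Price with KM and with Age
cor(df$Price, df$KM)
cor(df$Price, df$Age_08_04)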

ii) Multicollinearity
Code:
library(car)
# vif() is applied to a fitted model, not to the data frame itself
vif(lm(Price ~ Age_08_04 + KM + Fuel_Type + HP + Met_Color + Automatic + CC + Doors + Quarterly_Tax + Weight, data = df))
# As a rule of thumb, the variance inflation factor should be below 3 for every predictor
# Under multicollinearity the overall F-test can be significant while the
# individual coefficients appear insignificant
# A high VIF means a predictor is strongly correlated with the other predictors
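
An additional quick check is the correlation matrix of the numeric predictors; a high pairwise correlation between two predictors hints at multicollinearity. A minimal sketch, assuming the standard ToyotaCorolla.csv column names:

# pairwise correlations among the numeric predictors, rounded for readability
num.vars <- c("Age_08_04", "KM", "HP", "CC", "Doors", "Quarterly_Tax", "Weight")
round(cor(df[, num.vars]), 2)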
iii) Split Data Set
Code:

library(caret)
set.seed(123) # assumed seed, for a reproducible split
split <- createDataPartition(Price, p = .30, list = F)
train = df[-split, ] # 70% of rows, used to fit the model
test = df[ split, ]  # 30% of rows, held out for validation

# sanity check: the split should roughly preserve the average Price
mean(train$Price)
mean(test$Price)
mean(Price)
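
The split sizes can be checked as well (a quick sketch):

# about 70% of rows should land in train and 30% in test
nrow(train)
nrow(test)
nrow(test) / nrow(df) # should be close to 0.30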

iv) Regression model

Code:
# 1.4 Predict the price based on Age, KM, Fuel_Type, HP, Met_Color,
# Automatic, CC, Doors, Quarterly_Tax, Weight

lmtrain = lm(Price ~ Age_08_04 + KM + Fuel_Type + HP + Met_Color + Automatic + CC + Doors + Quarterly_Tax + Weight, data = train)

v) Significance
Code:
summary(lmtrain)
# Age_08_04, Fuel_Type, KM, HP, Quarterly_Tax and Weight are the most
# significant variables for predicting Price
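
The p-values can also be pulled directly out of the summary (a quick sketch):

# estimates, standard errors, t-values and p-values as a matrix
round(summary(lmtrain)$coefficients, 4)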
vi) Performance Validation
Code:
# 1.6 Performance on the validation set
# 95% prediction intervals for the test data, based on the model fit on train

dfpredtest = predict(lmtrain, test, interval = "prediction", level = .95)


dfpredtest = data.frame(dfpredtest)
dfpredtest$price = test$Price

# returns 1 if y falls inside the interval [a, b), else 0
middle = function(a, b, y) {
  val = ifelse(y >= a & y < b, 1, 0)
  return(val)
}

dfpredtest$validate = mapply(middle, dfpredtest$lwr, dfpredtest$upr, dfpredtest$price)

(sum(dfpredtest$validate) / dim(dfpredtest)[1]) * 100 # % of test prices that fall inside their 95% prediction interval

# saving predicted values


trainPrice = predict(lmtrain, train)
testPrice = predict(lmtrain, test)

# model performance measures

# R square: train
cor(train$Price, trainPrice)^2
# R square: test
cor(test$Price, testPrice)^2
# RMSE
RMSE(train$Price, trainPrice)
RMSE(test$Price, testPrice)
# The lower the RMSE, the better the model; a large gap between train and test RMSE suggests overfitting.
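
caret's RMSE() implements the usual root-mean-squared-error formula; a quick manual check for the test set:

# should match RMSE(test$Price, testPrice) above
sqrt(mean((test$Price - testPrice)^2))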
vii) Stepwise Regression

Code:

# 1.7 Stepwise Regression

lm.fit = lm(Price ~ Age_08_04 + KM + Fuel_Type + HP + Met_Color + Automatic + CC + Doors + Quarterly_Tax + Weight, data = df)

step.fit = train(Price ~ Age_08_04 + KM + Fuel_Type + HP + Met_Color + Automatic + CC + Doors + Quarterly_Tax + Weight,
                 data = df, method = "lmStepAIC", trace = FALSE)

summary(step.fit)

# a variable is significant when its p-value is less than 0.05

#checking for AIC

AIC(lm.fit)

AIC(step.fit$finalModel)

# Lower AIC is better; therefore the stepwise AIC model is preferred

# AIC estimates the relative quality of a model: it is an information criterion
# that trades off goodness of fit against model complexity
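
To see which variables the stepwise procedure kept, the coefficients of the final model can be listed (a quick sketch):

# variables dropped by stepwise AIC selection will not appear here
coef(step.fit$finalModel)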


2) Decision Tree
Code:
# rpart and rpart.plot are needed for fitting and plotting the trees;
# df2 is the data set used for the tree models, with carat as the response
library(rpart)
library(rpart.plot)

# Building the base tree
basetree = rpart(carat ~ ., data = df2)
rpart.plot(basetree)

# identifying the pruning parameter: plotcp shows cross-validated error against cp

plotcp(basetree)

# post-pruning: cut the grown tree back at the chosen complexity parameter

prunedtree = prune(basetree, cp = .015)

rpart.plot(prunedtree)

# pre-pruning: restrict tree growth while fitting

preprunedtree <- rpart(carat ~ ., data = df2,
                       control = rpart.control(maxdepth = 2, minsplit = 50))


rpart.plot(preprunedtree)
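
A quick way to compare the three trees is their in-sample RMSE (a minimal sketch using caret's RMSE, assuming df2 and the trees fitted above):

# the pruned trees trade a little in-sample accuracy for simplicity
RMSE(predict(basetree, df2), df2$carat)
RMSE(predict(prunedtree, df2), df2$carat)
RMSE(predict(preprunedtree, df2), df2$carat)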

TOPIC 2: MODELLING IN CARET

ctrl.cv = trainControl(method = "cv", number = 6) # 6-fold cross-validation

tree.fit = train(carat ~ .,
                 data = df2,
                 method = "rpart2",
                 trControl = ctrl.cv,
                 tuneGrid = expand.grid(maxdepth = c(1:10)),
                 control = rpart.control(minsplit = 10, minbucket = 5))

tree.fit

summary(tree.fit)

plot(tree.fit$finalModel)
text(tree.fit$finalModel) # add split labels to the base-graphics plot

rpart.plot(tree.fit$finalModel) # nicer plot of the same tree

tree.fit$finalModel
# RMSE is lowest at maxdepth = 6

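caret can also report which predictors drive the tuned tree (a quick sketch):

# variable importance for the final rpart model
varImp(tree.fit)
plot(varImp(tree.fit))
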
2.2 Cluster Analysis

Code :

library(ISLR)

library(cluster)

library(caret)

library(leaps)

# Mixed Variable Cluster Analysis Using Partitioning Around Medoids

str(df2)

# computing the Gower distance (it handles mixed numeric and categorical variables)

gowerdist <- daisy(df2, metric = "gower")

class(gowerdist) # a 'dissimilarity' (distance) object

gower.mat = as.matrix(gowerdist)

class(gower.mat)

# cluster analysis using partitioning around medoids:

# making k clusters from the Gower distance matrix

pamsol = pam(gower.mat, diss = TRUE, k = 3)

pamsol

# pulling out the cluster centers

pamsol$medoids

# pulling out the clustering vector

pamsol$clustering

df2$clusmem = pamsol$clustering # new column recording each observation's cluster

head(df2)

# pulling out silinfo


pamsol$silinfo$avg.width

# average silhouette width for k = 2..7 clusters; higher is better
SIL = NULL

for (i in 2:7)
  SIL[i] = pam(gower.mat, diss = TRUE, k = i)$silinfo$avg.width

plot(SIL, type = "l")

# So, 2 is the optimal number of clusters from the plot.

# refitting with 2 clusters (same process)


pamsol = pam(gower.mat, diss = TRUE, k = 2)

pamsol$medoids

df2$clusmem = pamsol$clustering # update the cluster membership column

head(df2)

# to see which variables matter most for separating the clusters,
# regress cluster membership on all the other variables

model = glm(clusmem ~ ., data = df2)

summary(model)

varImp(model)
The medoids obtained are observations "58" and "263".
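
The clusters can also be profiled directly, for example by size and by the average carat within each cluster (a minimal sketch, assuming the clusmem column created above):

# cluster sizes
table(df2$clusmem)
# mean of the response within each cluster
aggregate(carat ~ clusmem, data = df2, FUN = mean)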
