Regression Trees & Random Forests Analysis

The regression tree fitted to the Carseats training data had a test MSE of 4.149. Pruning the tree did not improve test error. Bagging and random forests achieved lower test MSEs of 2.604 and 3.296, respectively. Price and ShelveLoc were identified as the most important variables. For random forests on the Boston data, test error was minimized with mtry near p/2 (about 7) and ntree between 30 and 35.

Uploaded by

Aman Chheda

Assignment 6 Tree Based Methods

Name: Aman Chheda

UIN: 426009694

Problem 1:

In the lab, a classification tree was applied to the Carseats data set after converting Sales into a
binary response variable. This question will seek to predict Sales using regression trees and related
approaches, treating the response as a quantitative variable (that is, without the conversion).

(a) Split the data set into a training set and a test set.

(b) Fit a regression tree to the training set. Plot the tree, and interpret the results. Then compute the
test MSE.

[Figure: plot of the unpruned regression tree. The root split is ShelveLoc: Bad,Medium; lower splits use Price, CompPrice, Age, Advertising, Income, and ShelveLoc.]

Test MSE = 4.149

Sales are high when the shelf location is medium and the price is low. Sales are low when the
shelf location is bad, the price is high, and age is high.
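
The test MSE quoted in (b) is the average squared difference between observed and predicted Sales on the held-out rows. A minimal Python sketch of the split and the metric follows. It is an illustration only: the plot labels suggest the assignment itself was done in R, and the helper names `split_indices` and `mean_squared_error`, the 50/50 split, the seed, and the toy numbers are all assumptions rather than details from the assignment.

```python
import random


def split_indices(n_rows, test_fraction=0.5, seed=1):
    """Shuffle row indices and return (train_indices, test_indices)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is reproducible
    n_test = round(n_rows * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def mean_squared_error(observed, predicted):
    """Average of the squared prediction errors."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical toy values, not taken from the Carseats data.
train_idx, test_idx = split_indices(10)
toy_mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # (0 + 0 + 4) / 3
```

The model is fitted on the training rows only, and `mean_squared_error` is then applied to its predictions on the test rows, so the reported error reflects data the tree has not seen.
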
(c) Prune the tree obtained in (b). Use cross validation to determine the optimal level of tree
complexity. Plot the pruned tree and interpret the results. Compute the test MSE of the pruned tree.
Does pruning improve the test error?

Cross-validation selected a tree with 8 terminal nodes as the optimal level of complexity. Pruning
does not improve the test MSE in this case.
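
Choosing the tree size by cross-validation amounts to picking the size whose cross-validated deviance is smallest. The Python sketch below shows only that selection step. `best_tree_size` is a hypothetical helper, breaking ties in favor of the smaller tree is an assumption, and the numbers are invented for illustration rather than read from the cv.carseats output.

```python
def best_tree_size(sizes, deviances):
    """Return the tree size with the lowest cross-validated deviance.

    Ties are broken in favor of the smaller tree (an assumption).
    """
    if len(sizes) != len(deviances) or not sizes:
        raise ValueError("sizes and deviances must be non-empty and of equal length")
    # Tuples compare by deviance first, then by size, so min() picks the
    # smallest deviance and, among equals, the smallest tree.
    return min(zip(deviances, sizes))[1]


# Hypothetical values: sizes 3 and 4 tie on deviance, so size 3 is chosen.
toy_size = best_tree_size([1, 2, 3, 4], [9.0, 5.0, 4.0, 4.0])
```
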

[Figure: cross-validation deviance (cv.carseats$dev) plotted against tree size (cv.carseats$size).]

[Figure: plot of the pruned regression tree with 8 terminal nodes. The root split is ShelveLoc: Bad,Medium; lower splits use Price, Age, and ShelveLoc.]

Sales are high when the shelf location is medium, and specifically when the price is low (i.e.
less than 113). Sales are low when the shelf location is bad and both price and age are high.
(d) Use the bagging approach to analyze the data. What test MSE do you obtain? Determine which
variables are most important.

Test MSE = 2.604, which is much lower than that of the single regression tree in (b) (4.149). The
most important variables are Price and ShelveLoc.
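
Bagging fits the same kind of model to many bootstrap resamples of the training set and averages their predictions. The Python sketch below illustrates the idea under simplifying assumptions: the base learner is a one-split regression stump on a single predictor rather than a full tree, and `fit_stump`, `bagged_predict`, and the toy data are hypothetical. It is not the procedure used to obtain the 2.604 figure.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimizing the residual sum of squares."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2  # candidate cut halfway between adjacent distinct values
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        left_mean, right_mean = mean(left), mean(right)
        rss = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or rss < best[0]:
            best = (rss, cut, left_mean, right_mean)
    if best is None:  # every x is identical, so predict the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, cut, left_mean, right_mean = best
    return lambda x: left_mean if x < cut else right_mean


def bagged_predict(xs, ys, x_new, n_trees=25, seed=1):
    """Average the predictions of stumps fitted to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        stump = fit_stump([xs[i] for i in sample], [ys[i] for i in sample])
        predictions.append(stump(x_new))
    return mean(predictions)


# Hypothetical toy data with a step between x = 3 and x = 4.
toy_x = [1, 2, 3, 4, 5, 6]
toy_y = [2.0, 2.0, 2.0, 8.0, 8.0, 8.0]
single = fit_stump(toy_x, toy_y)
bagged = bagged_predict(toy_x, toy_y, 1.5)
```

Averaging over resamples reduces the variance of a single fitted model, which is consistent with the bagged test MSE being lower than that of the single tree.
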
(e) Use random forests to analyse the data. What test MSE do you obtain? Determine which
variables are most important.

Test MSE = 3.296, which is lower than that of the single regression tree (4.149) but higher than
that of bagging (2.604). The most important variables are again Price and ShelveLoc.
Problem 2:

In the lab, we applied random forests to the Boston data using mtry=6 and ntree=100.

(a) Consider a more comprehensive range of values for mtry: 1, 2, ..., 13. Given each value of mtry,
find the test error resulting from random forests on the Boston data (using ntree=100). Create a plot
displaying the test error rate vs. the value of mtry. Comment on the results in the plot.

[Figure: test error (randomforest.error) plotted against mtry = 1, ..., 13.]

Conclusion from the plot: the test error reaches its minimum near mtry = p/2, i.e. about 7. At mtry
around 3 the error is not as low.
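
In a random forest, each split considers only a random subset of mtry predictors out of p, and with mtry = p every predictor is a candidate at every split, which is bagging. The Python sketch below shows that sampling step only. `candidate_predictors` is a hypothetical helper, and p = 13 is taken from the mtry range in the question.

```python
import random


def candidate_predictors(p, mtry, rng):
    """Return the indices of the mtry predictors considered at one split."""
    if not 1 <= mtry <= p:
        raise ValueError("mtry must be between 1 and p")
    return sorted(rng.sample(range(p), mtry))  # sampled without replacement


rng = random.Random(1)
subset = candidate_predictors(13, 7, rng)           # 7 of the 13 predictors
all_predictors = candidate_predictors(13, 13, rng)  # mtry = p: all predictors, i.e. bagging
```

A fresh subset is drawn at every split, so small values of mtry decorrelate the trees at the cost of sometimes hiding the strongest predictors from a split.
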
(b) Similarly, consider a range of values for ntree (between 5 and 200). Given each value of ntree, find
the test error resulting from random forests (using mtry=6). Create a plot displaying the test error vs.
the value of ntree. Comment on the results in the plot.

[Figure: test error (randomforest.error) plotted against ntree, from 0 to 200.]

Conclusion from the plot: the test error reaches its minimum at around ntree = 30 to 35. Thereafter
the results are stable and close to the minimum.
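
A forest's prediction is the average over its trees, so one way to trace an error curve against ntree is to average the first k trees for each k and score that ensemble on the test set. The Python sketch below assumes the per-tree test predictions are already available as a list of lists. `error_by_ntree` and the toy numbers are hypothetical, and this differs from refitting a separate forest for each value of ntree, which is what the question describes.

```python
def error_by_ntree(per_tree_predictions, observed):
    """Test MSE of the ensemble made of the first k trees, for k = 1, 2, ..."""
    n_obs = len(observed)
    running_sum = [0.0] * n_obs
    errors = []
    for k, preds in enumerate(per_tree_predictions, start=1):
        if len(preds) != n_obs:
            raise ValueError("each tree needs one prediction per observation")
        running_sum = [s + p for s, p in zip(running_sum, preds)]
        # Ensemble prediction is the running mean over the first k trees.
        mse = sum((s / k - o) ** 2 for s, o in zip(running_sum, observed)) / n_obs
        errors.append(mse)
    return errors


# Hypothetical toy case: three trees, two test observations.
toy_errors = error_by_ntree([[1.0, 3.0], [3.0, 1.0], [2.0, 2.0]], [2.0, 2.0])
```

In the toy case the first tree alone has an MSE of 1.0, while the errors of the first two trees cancel in the average, which mirrors how the curve drops quickly and then flattens as trees are added.
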
