
Assignment 6 Tree Based Methods

Name: Aman Chheda


UIN: 426009694

Problem 1:

In the lab, a classification tree was applied to the Carseats data set after converting Sales into a
binary response variable. This question will seek to predict Sales using regression trees and related
approaches, treating the response as a quantitative variable (that is, without the conversion).

(a) Split the data set into a training set and a test set.
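A minimal sketch of one way this split could be done in R; the seed and the 50/50 split proportion are assumptions, as the report does not state them:

library(ISLR)                                           # Carseats data
set.seed(1)                                             # assumed seed
train <- sample(1:nrow(Carseats), nrow(Carseats) / 2)   # assumed 50/50 split
carseats.train <- Carseats[train, ]
carseats.test  <- Carseats[-train, ]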

(b) Fit a regression tree to the training set. Plot the tree, and interpret the results. Then compute the
test MSE.
[Figure: unpruned regression tree fit to the training data. The root split is on ShelveLoc (Bad, Medium), with further splits on Price, CompPrice, Age, Advertising, and Income; terminal-node predictions range from about 2.2 to 12.1.]

Test MSE = 4.149

Sales are high when the shelve location is Medium and the price is low. Sales are low when the shelve location is Bad and both the price and age are high.
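A sketch of how the tree could be fit and evaluated, following the style of the ISLR lab and assuming the train and carseats.test objects from (a):

library(tree)
tree.carseats <- tree(Sales ~ ., data = Carseats, subset = train)
summary(tree.carseats)                     # variables used and residual mean deviance
plot(tree.carseats)
text(tree.carseats, pretty = 0)            # produces the tree plot above
yhat <- predict(tree.carseats, newdata = carseats.test)
mean((yhat - carseats.test$Sales)^2)       # test MSE, reported as 4.149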
(c) Prune the tree obtained in (b). Use cross validation to determine the optimal level of tree
complexity. Plot the pruned tree and interpret the results. Compute the test MSE of the pruned tree.
Does pruning improve the test error?

The optimal level of tree complexity was found to be 8 terminal nodes. Pruning does not improve the test MSE in this case.
[Figure: cross-validation results, deviance (cv.carseats$dev) versus tree size (cv.carseats$size).]

[Figure: pruned regression tree with 8 terminal nodes. The root split is on ShelveLoc (Bad, Medium), with further splits on Price and Age; terminal-node predictions range from about 3.3 to 12.1.]

Sales are high when the shelve location is Medium and, specifically, when the price is low (i.e. less than 113). Sales are low when the shelve location is Bad and both the price and age are high.
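A sketch of the cross-validation and pruning step, assuming the tree.carseats and carseats.test objects from (b); the seed is an assumption:

set.seed(2)                                            # assumed seed
cv.carseats <- cv.tree(tree.carseats)
plot(cv.carseats$size, cv.carseats$dev, type = "b")    # CV plot above
prune.carseats <- prune.tree(tree.carseats, best = 8)  # 8 terminal nodes chosen by CV
plot(prune.carseats)
text(prune.carseats, pretty = 0)
yhat.prune <- predict(prune.carseats, newdata = carseats.test)
mean((yhat.prune - carseats.test$Sales)^2)             # pruned-tree test MSE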
(d) Use the bagging approach to analyze the data. What test MSE do you obtain? Determine which
variables are most important.

Test MSE = 2.604, which is much lower than that of the single regression tree. The most important variables are Price and ShelveLoc.
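A sketch of the bagging fit: bagging corresponds to a random forest with mtry equal to the full number of predictors (p = 10 for Carseats); the seed is an assumption:

library(randomForest)
set.seed(3)                                                   # assumed seed
bag.carseats <- randomForest(Sales ~ ., data = Carseats, subset = train,
                             mtry = 10, importance = TRUE)    # mtry = p gives bagging
yhat.bag <- predict(bag.carseats, newdata = carseats.test)
mean((yhat.bag - carseats.test$Sales)^2)                      # test MSE, reported as 2.604
importance(bag.carseats)                                      # Price and ShelveLoc rank highest
varImpPlot(bag.carseats)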
(e) Use random forests to analyze the data. What test MSE do you obtain? Determine which variables are most important.

The test MSE = 3.296. The most important variables are again Price and ShelveLoc.
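A sketch of the random forest fit; the report does not state which mtry it used, so the usual regression default mtry ≈ p/3 = 3 is assumed:

set.seed(4)                                                   # assumed seed
rf.carseats <- randomForest(Sales ~ ., data = Carseats, subset = train,
                            mtry = 3, importance = TRUE)      # mtry = 3 is an assumption
yhat.rf <- predict(rf.carseats, newdata = carseats.test)
mean((yhat.rf - carseats.test$Sales)^2)                       # test MSE, reported as 3.296
importance(rf.carseats)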
Problem 2:

In the lab, we applied random forests to the Boston data using mtry=6 and ntree=100.

(a) Consider a more comprehensive range of values for mtry: 1, 2, ..., 13. Given each value of mtry,
find the test error resulting from random forests on the Boston data (using ntree=100). Create a plot
displaying the test error rate vs. the value of mtry. Comment on the results in the plot.
[Figure: random forest test MSE (randomforest.error, roughly 12 to 20) versus mtry = 1, ..., 13, with ntree = 100.]
Conclusion from the plot: the test error reaches a minimum near the p/2 region, i.e. mtry ≈ 7. At smaller values of mtry, around 3, the error is not as low.
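A sketch of how the mtry sweep could be run on the Boston data; the seed and the 50/50 train/test split are assumptions:

library(MASS)                                       # Boston data
library(randomForest)
set.seed(1)                                         # assumed seed
train <- sample(1:nrow(Boston), nrow(Boston) / 2)   # assumed 50/50 split
boston.test <- Boston[-train, "medv"]
randomforest.error <- rep(NA, 13)
for (m in 1:13) {
  rf.boston <- randomForest(medv ~ ., data = Boston, subset = train,
                            mtry = m, ntree = 100)
  yhat <- predict(rf.boston, newdata = Boston[-train, ])
  randomforest.error[m] <- mean((yhat - boston.test)^2)   # test MSE for this mtry
}
plot(1:13, randomforest.error, type = "b", xlab = "mtry", ylab = "Test MSE")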
(b) Similarly, consider a range of values for ntree (between 5 and 200). Given each value of ntree, find
the test error resulting from random forests (using mtry=6). Create a plot displaying the test error vs.
the value of ntree. Comment on the results in the plot.

[Figure: random forest test MSE (randomforest.error, roughly 12 to 18) versus ntree from 5 to 200, with mtry = 6.]

Conclusion from the plot: the test error reaches a minimum at around ntree = 30-35. Thereafter the results are stable and close to the minimum.
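A sketch of the ntree sweep, reusing the Boston train/test split from (a); the exact grid of ntree values is an assumption:

ntree.grid <- seq(5, 200, by = 5)                   # assumed grid of ntree values
randomforest.error <- rep(NA, length(ntree.grid))
for (i in seq_along(ntree.grid)) {
  rf.boston <- randomForest(medv ~ ., data = Boston, subset = train,
                            mtry = 6, ntree = ntree.grid[i])
  yhat <- predict(rf.boston, newdata = Boston[-train, ])
  randomforest.error[i] <- mean((yhat - boston.test)^2)   # test MSE for this ntree
}
plot(ntree.grid, randomforest.error, type = "b", xlab = "ntree", ylab = "Test MSE")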
