Part 1: Print decision tree
a. We begin by setting the working directory, loading the required packages
(rpart and mlbench) and then loading the Ionosphere dataset.
#set working directory if needed (modify path as needed)
setwd("working directory")
#load required libraries – rpart for classification and regression trees
library(rpart)
#mlbench for Ionosphere dataset
library(mlbench)
#load Ionosphere
data(Ionosphere)
> setwd('C:\\Users\\Admin\\Downloads')
> library(rpart)
> library(mlbench)
> data(Ionosphere)
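b. Use the rpart() method to create a classification tree for the data. Because the response, Class, is a factor (good/bad), rpart fits a classification tree rather than a regression tree.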
rpart(Class~.,Ionosphere)
> rpart.ionosphere=rpart(Class~.,Ionosphere)
> rpart.ionosphere
n= 351
node), split, n, loss, yval, (yprob)
      * denotes terminal node
 1) root 351 126 good (0.35897436 0.64102564)
   2) V5< 0.23154 77 4 bad (0.94805195 0.05194805) *
   3) V5>=0.23154 274 53 good (0.19343066 0.80656934)
     6) V27>=0.999945 52 13 bad (0.75000000 0.25000000)
      12) V1=0 19 0 bad (1.00000000 0.00000000) *
      13) V1=1 33 13 bad (0.60606061 0.39393939)
        26) V3< 0.73004 8 0 bad (1.00000000 0.00000000) *
        27) V3>=0.73004 25 12 good (0.48000000 0.52000000)
          54) V22>=0.47714 9 1 bad (0.88888889 0.11111111) *
          55) V22< 0.47714 16 4 good (0.25000000 0.75000000) *
     7) V27< 0.999945 222 14 good (0.06306306 0.93693694) *
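Each node line follows the header node), split, n, loss, yval, (yprob). As a quick check of how the columns relate, the sketch below recomputes the root node's class proportions from its n and loss values. The variable names are illustrative only and are not part of rpart.

```r
# Root node from the printed tree: 351 cases, predicted class "good",
# loss of 126 (the cases that do not belong to the predicted class).
n_root <- 351
loss_root <- 126

# yprob lists the class proportions in factor-level order (bad, good).
p_bad <- loss_root / n_root
p_good <- 1 - p_bad

round(c(bad = p_bad, good = p_good), 8)   # 0.35897436 0.64102564
```

The same relationship holds at every node: loss divided by n is the proportion of cases that do not belong to the node's predicted class.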
c. Use the plot() and text() methods to plot the decision tree.
> plot(rpart.ionosphere)
> text(rpart.ionosphere,pretty=0)
Part 2: Estimate accuracy
a. Split the data into training and test subsets using the sample() method. Here 200 of the 351 rows are drawn at random for training, leaving 151 for testing.
> set.seed(42)
> train=sample(1:nrow(Ionosphere),200)
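The split can be sanity-checked without loading the data. The following is a minimal sketch that assumes only the row count of 351 reported in the tree printout; the name test is introduced here for illustration.

```r
# Assumption: 351 rows, as reported by the tree printout (n= 351).
n <- 351

set.seed(42)                     # set.seed() is called as a function so the draw is repeatable
train <- sample(1:n, 200)        # 200 distinct row indices (sampling is without replacement)
test <- setdiff(1:n, train)      # every index not drawn for training

length(test)                     # 151
length(intersect(train, test))   # 0, so the two subsets do not overlap
```

Indexing with -train, as in Ionosphere$Class[-train], selects the same rows as the test vector in this sketch.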
b. Use the rpart method to create a decision tree using the training data.
rpart(Class~.,Ionosphere,subset=train)
> rpart.ionosphere=rpart(Class~.,Ionosphere,subset=train)
> rpart.ionosphere
n= 200
node), split, n, loss, yval, (yprob)
      * denotes terminal node
 1) root 200 73 good (0.36500000 0.63500000)
   2) V5< 0.02313 40 0 bad (1.00000000 0.00000000) *
   3) V5>=0.02313 160 33 good (0.20625000 0.79375000)
     6) V27>=0.99921 31 9 bad (0.70967742 0.29032258)
      12) V22>=-0.009455 20 2 bad (0.90000000 0.10000000) *
      13) V22< -0.009455 11 4 good (0.36363636 0.63636364) *
     7) V27< 0.99921 129 11 good (0.08527132 0.91472868) *
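c. Use the predict method to find the predicted class labels for the testing data. The test set consists of every row whose index is not in train.
> Ionosphere.test=Ionosphere[-train,]
> rpart.pred=predict(rpart.ionosphere,Ionosphere.test,type="class")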
d. Use the table method to create a table of the predictions versus true labels and then
compute the accuracy. The accuracy is the number of correctly assigned good cases
(true positives) plus the number of correctly assigned bad cases (true negatives) divided
by the total number of testing cases.
> table(rpart.pred,Ionosphere$Class[-train])
rpart.pred bad good
      bad   37    3
      good  16   95
> (37+95)/(37+3+16+95)
[1] 0.8741722
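The accuracy can also be computed from the table itself rather than by typing the counts by hand. The sketch below rebuilds the confusion matrix from the numbers shown above; the names cm and accuracy are illustrative. In a live session, the object returned by table() could be used in place of cm.

```r
# Confusion matrix copied from the table above:
# rows are predicted labels, columns are true labels.
cm <- matrix(c(37, 16, 3, 95), nrow = 2,
             dimnames = list(predicted = c("bad", "good"),
                             actual = c("bad", "good")))

# Correct predictions sit on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
accuracy                  # 0.8741722

# Fraction of each true class that was labelled correctly.
diag(cm) / colSums(cm)
```

The per-class fractions show where the errors come from: most of the misclassified test cases are true bad cases predicted as good (16 of 53).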