Decision Tree Is an Upside-Down Tree
A decision tree is an upside-down tree that makes decisions based on the conditions present in the data. Now the question arises: why a decision tree and not another algorithm? The answer is quite simple: a decision tree gives excellent results when the data is mostly categorical in nature and depends on conditions. Still confusing? Let us illustrate to make it easy. Take a dataset and assume that we choose a decision tree for building our final model. Internally, the algorithm will build a decision tree something like the one shown below.
In the above representation of a tree, conditions such as salary, office location and facilities keep splitting into branches until they reach a decision on whether a person should accept or decline the job offer. The conditions are known as internal nodes, and they split until they arrive at a decision, which is known as a leaf.
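To make the idea concrete, here is a minimal sketch of the same job-offer decision expressed as plain conditional logic in Python; the feature names, thresholds and example values (a salary of 50,000, a "near home" location, a facilities flag) are purely illustrative assumptions and are not taken from any real dataset.

def accept_offer(salary, office_location, has_good_facilities):
    # Each `if` mirrors an internal node of the tree; each return is a leaf.
    if salary >= 50000:                       # illustrative threshold
        if office_location == "near home":    # illustrative condition
            return "accept"
        return "accept" if has_good_facilities else "decline"
    return "decline"

print(accept_offer(60000, "near home", True))   # -> accept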
Two Types of Decision Tree
1. Classification
2. Regression
Classification trees are applied when the outcome is discrete or categorical in nature, such as the presence or absence of students in a class, whether a person died or survived, approval of a loan, etc. Regression trees are used when the outcome is continuous in nature, such as prices, the age of a person, the length of a stay in a hotel, etc.
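As a quick illustration of the two types, scikit-learn (which is used later in the case study) exposes them as two separate estimators; the tiny datasets below are made up purely for demonstration.

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: discrete outcome (e.g. loan approved or declined)
X_cls = [[25, 30000], [40, 80000], [35, 50000], [50, 90000]]   # [age, salary]
y_cls = ["declined", "approved", "declined", "approved"]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[45, 85000]]))     # -> a class label

# Regression: continuous outcome (e.g. a price)
X_reg = [[1], [2], [3], [4]]          # e.g. length of stay
y_reg = [100.0, 180.0, 260.0, 350.0]  # e.g. price
reg = DecisionTreeRegressor().fit(X_reg, y_reg)
print(reg.predict([[2.5]]))           # -> a continuous value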
Assumptions
Despite the simplicity of a decision tree, it holds certain assumptions:
1. Continuous variables need to be discretized.
2. The data taken for training is considered as the root in its entirety.
3. Records are distributed in a recursive manner on the basis of attribute values.
Algorithms used in Decision Tree
Different libraries in different programming languages use particular default algorithms to build a decision tree, and it is often unclear to a data scientist how these algorithms differ. Here we will discuss those algorithms.
1. ID3
ID3 generates a tree by considering the whole set S as the root node. It then iterates over every attribute, splitting the data into subsets and calculating the entropy or information gain of that attribute. After splitting, the algorithm recurses on every subset, considering only those attributes that have not been used before. It is not an ideal algorithm, as it generally overfits the data, and splitting on continuous variables can be time consuming.
2. C4.5
It is more advanced than ID3, as it works from already classified (labelled) samples. Splitting is done based on the normalized information gain, or gain ratio, and the feature with the highest gain ratio makes the decision. Unlike ID3, it can handle both continuous and discrete attributes efficiently, and after building a tree it undergoes pruning, removing the branches that have low importance.
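A minimal sketch of the normalized information gain (gain ratio) that C4.5 relies on, assuming labels are supplied as NumPy arrays; the helper functions and the perfect-split example below are illustrative only.

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(parent, children):
    # children: list of label arrays produced by a candidate split
    n = len(parent)
    weights = np.array([len(c) / n for c in children])
    info_gain = entropy(parent) - np.sum(weights * np.array([entropy(c) for c in children]))
    split_info = -np.sum(weights * np.log2(weights))   # penalizes splits with many tiny branches
    return info_gain / split_info

parent = np.array([1, 1, 1, 0, 0, 0])
children = [np.array([1, 1, 1]), np.array([0, 0, 0])]   # a perfect split
print(gain_ratio(parent, children))                     # -> 1.0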
3. CART
CART can perform both classification and regression tasks. It creates decision points by considering the Gini index, unlike ID3 and C4.5, which use information gain and gain ratio for splitting. For splitting, CART follows a greedy algorithm that aims only to reduce the cost function. For classification, a cost function such as the Gini index is used to indicate the purity of the leaf nodes; for regression, the sum of squared errors is chosen by the algorithm as the cost function to find the best prediction.
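A minimal sketch of this greedy search for a regression split, using the sum of squared errors as the cost function; the single feature and target values below are made up, and only one feature is scanned for brevity.

import numpy as np

def sse(y):
    # cost of a node: sum of squared deviations from the node's mean prediction
    return np.sum((y - y.mean()) ** 2) if len(y) else 0.0

def best_split(x, y):
    # greedily test every midpoint between consecutive feature values and keep
    # the threshold that minimizes the combined SSE of the two child nodes
    xs = np.unique(x)                 # unique feature values, already sorted
    best_t, best_cost = None, np.inf
    for t in xs[:-1] + np.diff(xs) / 2:
        cost = sse(y[x <= t]) + sse(y[x > t])
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.5, 20.0, 21.0, 19.5])
print(best_split(x, y))   # the best threshold falls between 3 and 10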
4. CHAID
CHAID, or Chi-square Automatic Interaction Detector, is a process that can deal with any type of variable, be it nominal, ordinal or continuous. In regression trees it uses the F-test, and in classification trees it uses the chi-square test. In this analysis, continuous predictors are split into intervals with an equal number of observations until an outcome is achieved. It is far less used and adopted in real-world problems compared with the other algorithms.
5. MARS
MARS, or Multivariate Adaptive Regression Splines, is an analysis implemented mainly in regression problems where the data is mostly nonlinear in nature.
Applications
As decision trees are very simple in nature and can be easily interpreted by senior management, they are used in a wide range of industries and disciplines, such as:
1. In healthcare industries
In healthcare industries, a decision tree can tell whether a patient is suffering from a disease or not based on conditions such as age, weight, sex and other factors. Other applications include deciding the effect of a medicine based on factors such as its composition, period of manufacture, etc. A decision tree can also be very effective in the diagnosis of medical reports.
The above flowchart represents a decision tree deciding whether a cure is possible after performing surgery or by prescribing medicines.
2. In banking sectors
Whether a person is eligible for a loan or not, based on his financial status, family members, salary, etc., can be decided by a decision tree. Other applications include credit card fraud, bank schemes and offers, loan defaults, etc., which can be prevented by using a proper decision tree.
The above tree represents a decision on whether a person can be granted a loan or not based on his financial conditions.
3. In educational sectors
In colleges and universities, the shortlisting of a student can be decided based upon his merit scores, attendance, overall score, etc. A decision tree can also decide the overall promotional strategy of the faculty present in a university.
The above tree decides whether a student will like the class or not based on his prior
programming interest.
There are many other applications where a decision tree can be a problem-solving strategy despite its drawbacks.
Advantages and Disadvantages of a Decision Tree
Advantages of Decision Tree
1. A decision tree model is very interpretable and can be easily presented to senior management and stakeholders.
2. Preprocessing of the data, such as normalization and scaling, is not required, which reduces the effort in building a model.
3. A decision tree algorithm can handle both categorical and numeric data and is quite efficient compared with other algorithms.
4. Missing values present in the data do not affect a decision tree much, which is why it is considered a flexible algorithm.
These are the advantages, but hold on: a decision tree also lacks certain things in real-world scenarios, which count as disadvantages. Some of them are:
1. A decision tree performs badly in regression, as it fails to cope when the data has too much variation.
2. A decision tree is sometimes unstable and unreliable, as a small alteration in the data can push the tree into a poor structure, which may affect the accuracy of the model.
3. If the data is not properly discretized, a decision tree algorithm can give inaccurate results and will perform badly compared to other algorithms.
4. Calculations become complex when the outcomes are linked, and this can consume time while training a model.
Processes involved in Decision Making
A decision tree usually starts by considering the entire data as the root. Then, on particular conditions, it starts splitting by means of branches, or internal nodes, and keeps making decisions until it produces the outcome as a leaf. The one important thing to know is that, while building the tree, it reduces the impurity present in the attributes and simultaneously gains information to achieve the proper outcomes.
Although the algorithm is simple in nature, it involves certain parameters that are very important for a data scientist to know, because these parameters decide how well a decision tree performs during the final building of a model.
1. Entropy
It is defined as a measure of the impurity present in the data. Entropy is close to zero when the sample attains homogeneity and is one when the sample is equally divided between classes. The lower the entropy, the better the model in terms of prediction, as the classes are segregated better. Entropy is calculated with the following formula:
Entropy = − Σ pᵢ × log₂(pᵢ), summed over the n classes
Here n is the number of classes and pᵢ is the proportion of samples belonging to class i. For a binary outcome, entropy is at its maximum value of 1 in the middle, where the classes are equally mixed, and at its minimum value of 0 at the ends, where the node is pure.
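A short sketch of this formula on toy label arrays, assuming NumPy:

import numpy as np

def entropy(labels):
    # proportion p_i of each class, then -sum(p_i * log2(p_i))
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy(np.array([1, 1, 1, 1])))   # homogeneous sample -> 0 (no impurity)
print(entropy(np.array([1, 1, 0, 0])))   # equally divided    -> 1.0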
2. Information Gain
It is a measure of how much the impurity (entropy) of a dataset is reduced by a split. The higher the information gain, the lower the entropy after the split. An event with a low probability of occurring carries a lot of information when it does occur, whereas an event with a high probability carries little information; entropy averages this information over all outcomes. Information gain is calculated as:
Information Gain = Entropy of parent − Σ (weighted % × Entropy of child)
Weighted % = number of observations in a particular child / sum of observations in all child nodes
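A brief sketch of this calculation on a hypothetical parent node split into two children, assuming NumPy; the entropy helper is included so the snippet stands alone.

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    # entropy of the parent minus the weighted average entropy of the children
    weights = [len(c) / len(parent) for c in children]
    return entropy(parent) - sum(w * entropy(c) for w, c in zip(weights, children))

parent = np.array([1, 1, 1, 0, 0, 0])
children = [np.array([1, 1, 1]), np.array([0, 0, 0])]   # a perfect split
print(information_gain(parent, children))               # -> 1.0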
3. Gini
It is a measure of misclassification and is used when the data contains multi-class labels. Gini is similar to entropy but is much quicker to calculate. Algorithms such as CART (Classification and Regression Trees) use Gini as the impurity criterion.
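A one-function sketch of the Gini impurity, 1 − Σ pᵢ², on toy label arrays (NumPy assumed):

import numpy as np

def gini(labels):
    # 1 minus the sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini(np.array(["a", "a", "a"])))   # pure node         -> 0.0
print(gini(np.array(["a", "b", "c"])))   # three-way mixture -> about 0.667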
4. Reduction in Variance
Reduction in variance is used when the decision tree is built for regression and the output is continuous in nature. The algorithm splits the population by using the variance formula.
A splitting criterion is selected only when it reduces the variance the most. The variance is calculated by the basic formula:
Variance = Σ (X − X̄)² / n
where X̄ (X bar) is the mean of the values, X is each actual value and n is the number of values.
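A minimal sketch of variance reduction on made-up regression targets, using NumPy's population variance (the Σ(X − X̄)²/n above):

import numpy as np

def variance_reduction(parent, children):
    # variance of the parent minus the weighted variance of the child nodes
    weights = [len(c) / len(parent) for c in children]
    return np.var(parent) - sum(w * np.var(c) for w, c in zip(weights, children))

parent = np.array([10.0, 12.0, 11.0, 50.0, 52.0, 51.0])
children = [parent[:3], parent[3:]]            # a candidate split
print(variance_reduction(parent, children))    # a large reduction -> a good split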
Challenges faced in Decision Tree
A decision tree can be implemented in all types of classification and regression problems, but despite this flexibility it works best only when the data contains categorical variables and when the outcome mostly depends on conditions.
Overfitting
There is also a possibility of overfitting when the branches involve features that have very low importance. Overfitting can be avoided by two methods:
1. Pruning
Pruning is the process of chopping off branches that rely on features of low importance. It can begin either from the root or from the leaves, for example by replacing a node with its most popular class when this does not hurt accuracy. Another method adds a parameter that decides whether to remove a node based on the size of its sub-tree. These methods are known as post-pruning. Pre-pruning, on the other hand, stops the tree from making further decisions by producing leaves on smaller samples. As the name suggests, it is applied at an early stage to avoid overfitting.
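In scikit-learn, which is used in the case study below, pre-pruning corresponds to growth limits such as max_depth and min_samples_leaf, while post-pruning is available as cost-complexity pruning through the ccp_alpha parameter (scikit-learn 0.22 or later); the parameter values below are arbitrary illustrations rather than tuned settings.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: stop the tree early with growth limits
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

# Post-pruning: grow the tree, then prune weak branches by cost-complexity
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01).fit(X, y)

print(pre_pruned.get_depth(), post_pruned.get_depth())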
2. Ensemble methods such as bagging and boosting
Ensemble methods such as random forests are used to overcome overfitting by repeatedly resampling the training data and building multiple decision trees. Boosting is also a powerful technique, used in both classification and regression problems, in which new learners are trained with more importance given to the instances that were previously misclassified. AdaBoost is one commonly used boosting technique.
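A brief sketch of both ensemble approaches with scikit-learn, run on the built-in iris data purely for illustration; the estimator counts are arbitrary defaults, not tuned values.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Bagging: many trees, each grown on a bootstrap resample of the training data
forest = RandomForestClassifier(n_estimators=100)

# Boosting: sequential learners that focus on previously misclassified samples
booster = AdaBoostClassifier(n_estimators=50)

print(cross_val_score(forest, X, y, cv=5).mean())
print(cross_val_score(booster, X, y, cv=5).mean())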
Discretization
When the data contains too many numerical values, discretization is required, as the algorithm struggles to make good decisions on such small and rapidly changing values. Such a process can be time consuming and can produce inaccurate results when it comes to training the data.
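If discretization is needed, two commonly available options are pandas.cut and scikit-learn's KBinsDiscretizer; the column name Age below simply mirrors the kyphosis attribute used in the next section, and the values are made up.

import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

ages = pd.DataFrame({"Age": [2, 15, 40, 81, 105, 128, 154, 175]})  # made-up values, in months

# pandas: equal-width bins with readable labels
ages["Age_band"] = pd.cut(ages["Age"], bins=3, labels=["low", "mid", "high"])

# scikit-learn: equal-frequency (quantile) bins encoded as ordinal codes
disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
ages["Age_bin"] = disc.fit_transform(ages[["Age"]]).ravel()

print(ages)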
Case Study in Python
We will cover a case study by implementing a decision tree in Python, using the very popular scikit-learn library.
Step 1
We will import all the basic libraries required for the data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Step 2
Now we will import the kyphosis data, which contains the records of 81 patients who underwent corrective spinal surgery, together with whether kyphosis was present after the treatment. The dataset is small, so we will not discretize the numeric values present in the data. It contains the following attributes:
Age – in months
Number – the number of vertebrae involved
Start – the number of the first (topmost) vertebra operated on
Let us read the data.
df = pd.read_csv('kyphosis.csv')
Now let us check what are the attributes and the outcome.
df.head()
Step 3
The dataset is in good shape and further preprocessing of the attributes is not required, so we will jump directly into splitting the data for training and testing.
from sklearn.model_selection import train_test_split
X = df.drop('Kyphosis', axis=1)
y = df['Kyphosis']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
Here, we have split the data into 70% and 30% for training and testing. You can define your
own ratio for splitting and see if it makes any difference in accuracy.
Step 4
Now we will import the DecisionTreeClassifier from scikit-learn to build the model.
from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(X_train,y_train)
Step 5
Now that we have fitted the training data to a Decision Tree Classifier, it is time to predict the
output of the test data.
predictions = dtree.predict(X_test)
Step 6
Now the final step is to evaluate our model and see how well it is performing. For that, we use metrics such as the confusion matrix, precision and recall.
from sklearn.metrics import classification_report,confusion_matrix
print(classification_report(y_test,predictions))
From the evaluation, we can see that the model is performing reasonably well overall, but the 'present' label gives only about 40% precision and recall, which needs to be improved. Let us look at the confusion matrix for the misclassifications.
print(confusion_matrix(y_test,predictions))
[[17  3]
 [ 3  2]]
Step 7
Now the model building is over, but we have not seen the tree yet. scikit-learn has built-in support for visualizing a tree, but it is not often used; for visualization here, we need to install the pydot library and run the following code.
from IPython.display import Image
from io import StringIO   # sklearn.externals.six has been removed in recent scikit-learn versions
from sklearn.tree import export_graphviz
import pydot

features = list(df.columns[1:])
dot_data = StringIO()
export_graphviz(dtree, out_file=dot_data, feature_names=features, filled=True, rounded=True)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph[0].create_png())
After running the above code, we get the following tree as given below.
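If installing pydot is inconvenient, a possible alternative (available from scikit-learn 0.21 onward) is the built-in plot_tree function, which draws the same fitted tree with matplotlib:

from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 8))
plot_tree(dtree, feature_names=features, class_names=list(dtree.classes_), filled=True, rounded=True)
plt.show()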
Case Study in R
Now we will build decision trees using R.
The following examples showcase how R can be used to create the two types of decision trees, namely classification and regression trees. The first decision tree classifies the type of flower based on petal length and width (the iris data), while the second focuses on predicting house prices (the Boston housing data).
#loading the libraries needed
library(party)   # provides ctree()
library(caret)   # provides createDataPartition()

#model-1: classification tree on the iris data
#splitting data
createDataPartition(iris$Species, p = 0.70, list = F) -> split_tag
iris[split_tag,] -> train
iris[-split_tag,] -> test

#building tree
ctree(Species ~ ., data = train) -> mytree
plot(mytree)

#predicting values
predict(mytree, test, type = "response") -> mypred
table(test$Species, mypred)
##             mypred
##              setosa versicolor virginica
##   setosa         17          0         0
##   versicolor      0         17         0
##   virginica       0          2        15

#model-2: regression tree on the Boston housing data
#splitting data
library(MASS)    # provides the Boston dataset
Boston -> boston
createDataPartition(boston$medv, p = 0.70, list = F) -> split_tag
boston[split_tag,] -> train
boston[-split_tag,] -> test

#building model (the original call is missing; an rpart regression tree is assumed here)
library(rpart)
rpart(medv ~ ., data = train) -> my_tree2

#predicting
predict(my_tree2, newdata = test) -> predict_tree2