Islamic University of Gaza
Computer Engineering Department
Artificial Intelligence ECOM 5038
LAB (1)
Decision Tree
Eng. Mohammed W. Awwad
Eng. Lina Y. Al-Aloul
March 2020
Objectives
1- To be familiar with classification decision trees.
2- To be able to build a classification decision tree using Python.
Introduction
A decision tree is a map of the possible outcomes of a series of related choices. It allows an
individual or organization to weigh possible actions against one another based on their
costs, probabilities, and benefits. Decision trees can also be used to map out an algorithm
that mathematically predicts the best choice.
A decision tree typically starts with a single node, which branches into possible outcomes.
Each of those outcomes leads to additional nodes, which branch off into other possibilities.
Types Of Trees
Classification and Regression Trees (CART) is a term introduced by Leo Breiman to refer to the
decision tree algorithm, which can be learned for classification or regression predictive
modeling problems.
Classification predictive modeling is the task of approximating a mapping function (f) from
input variables (X) to discrete output variables (y).
The output variables are often called labels or categories. The mapping function predicts the
class or category for a given observation.
Regression predictive modeling is the task of approximating a mapping function (f) from
input variables (X) to a continuous output variable (y).
A continuous output variable is a real-valued quantity, such as an integer or floating-point value.
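To make the distinction concrete, here is a minimal sketch (our own illustration, not part of the lab's code) using scikit-learn's two CART implementations on a tiny made-up dataset:

    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    X = [[0, 0], [1, 1], [2, 2]]

    # Classification: X maps to discrete labels y.
    y_labels = [0, 1, 1]
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X, y_labels)
    print(clf.predict([[1.5, 1.5]]))   # predicts a class label

    # Regression: X maps to a continuous value y.
    y_values = [0.0, 0.8, 2.1]
    reg = DecisionTreeRegressor(random_state=0)
    reg.fit(X, y_values)
    print(reg.predict([[1.5, 1.5]]))   # predicts a real value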
Figure 2(a): A partial tree that splits the data into two branches. Figure 2(b): The vertical line A as a splitter at 2.45.
Figure 3(a): A tree that splits the data into two branches based on 4.95. Figure 3(b): The vertical line B as a splitter at 4.95.
In the image in Figure 3(a), the tree has a maximum depth of 2. Tree depth is a measure of
how many splits a tree can make before coming to a prediction. Splitting could be continued
until the tree is as pure as possible, but repeating the process many times can lead to a very
deep classification tree with many nodes. Luckily, most classification tree implementations
let you cap the maximum depth of a tree, which reduces overfitting. In other words, you can
set the maximum depth to stop the growth of the decision tree past a certain depth. For a
visual understanding of maximum depth, you can look at the image in Figure 4.
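In scikit-learn, which this lab uses, the cap is a single constructor argument; a minimal sketch:

    from sklearn.tree import DecisionTreeClassifier

    # Any split that would take the tree past depth 2 is simply not made.
    clf = DecisionTreeClassifier(max_depth=2)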
Selection Criterion
The decision tree algorithm uses information gain (IG) to decide where to split a node; Gini
impurity or entropy is the criterion used to calculate it:
IG = impurity before splitting (parent) - weighted impurity after splitting (children)
Gini impurity is a measure of how often a randomly chosen element from the set would be
incorrectly labeled if it were labeled randomly according to the distribution of labels in the
subset.
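To make both measures concrete, here is a small sketch (our own illustration, working on plain Python lists of class labels) that computes Gini impurity, entropy, and the information gain of a split:

    import math

    def gini(labels):
        """Gini impurity: chance of mislabeling a random element."""
        n = len(labels)
        probs = [labels.count(c) / n for c in set(labels)]
        return 1.0 - sum(p * p for p in probs)

    def entropy(labels):
        """Shannon entropy of the label distribution."""
        n = len(labels)
        probs = [labels.count(c) / n for c in set(labels)]
        return -sum(p * math.log2(p) for p in probs)

    def information_gain(parent, children, impurity=gini):
        """IG = impurity(parent) - weighted impurity of the children."""
        n = len(parent)
        weighted = sum(len(c) / n * impurity(c) for c in children)
        return impurity(parent) - weighted

    parent = [0, 0, 0, 1, 1, 1]
    left, right = [0, 0, 0], [1, 1, 1]              # a perfect split
    print(information_gain(parent, [left, right]))  # 0.5 with Gini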
Tree Parameters
One of the benefits of decision tree training is that you can stop training based on several
thresholds. (The parameter names below are those of R's rpart package; the scikit-learn
equivalents are shown in the sketch below.)
The minbucket option gives the smallest number of observations allowed in a terminal
node. If a split would produce a node with fewer observations than minbucket, the split is
rejected.
The minsplit parameter is the smallest number of observations a parent node must contain
to be split further. The default is 20, so a parent node with fewer than 20 records is labeled
as a terminal node.
Finally, the maxdepth parameter prevents the tree from growing past a certain depth/height.
The default is 30. You can use the maxdepth option to create single-rule trees.
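A minimal sketch of the scikit-learn equivalents (the values here are illustrative, not recommendations):

    from sklearn.tree import DecisionTreeClassifier

    clf = DecisionTreeClassifier(
        min_samples_leaf=5,    # ~ minbucket: smallest allowed terminal node
        min_samples_split=20,  # ~ minsplit: smallest node that may be split
        max_depth=30,          # ~ maxdepth: hard limit on tree depth
    )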
Disadvantages:
1. A small change in the data can cause a large change in the structure of the decision tree,
causing instability.
2. The calculations for a decision tree can sometimes become far more complex than for other
algorithms.
3. Decision trees often take more time to train than other models.
4. Decision tree training is therefore relatively expensive in complexity and time.
5. The decision tree algorithm is often inadequate for regression and for predicting continuous
values.
6. Decision trees are prone to overfitting.
Note: You should reshape the iris target from shape (150,) to (150, 1) to treat it as a
column vector.
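A sketch of that step, assuming scikit-learn's bundled iris dataset (the lab's own code appears as a screenshot):

    from sklearn.datasets import load_iris

    iris = load_iris()
    X = iris.data                   # shape (150, 4)
    y = iris.target.reshape(-1, 1)  # (150,) -> (150, 1), a column vector
    print(X.shape, y.shape)         # (150, 4) (150, 1)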
4- Building Model
To draw the tree, we need to install the graphviz library by entering the command
conda install python-graphviz in the Anaconda Prompt.
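The lab shows this step as a screenshot; the sketch below reconstructs the idea with scikit-learn's export_graphviz (variable names are our own):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, export_graphviz
    import graphviz

    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.25, random_state=0)

    # Fit the classification tree on the training split.
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X_train, y_train)

    # Export the fitted tree to DOT format and render it with graphviz.
    dot_data = export_graphviz(
        clf, out_file=None,
        feature_names=iris.feature_names,
        class_names=iris.target_names,
        filled=True, rounded=True)
    graphviz.Source(dot_data).render("iris_tree")  # writes iris_tree.pdf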
7- Feature importance
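A sketch of how to read the importance weights off a fitted model (our own reconstruction, again using the bundled iris data):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

    # One weight per input feature; they sum to 1.
    for name, weight in zip(iris.feature_names, clf.feature_importances_):
        print(f"{name}: {weight:.3f}")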
As we can see, petal length and petal width have the highest feature importance
weights. Keep in mind that if a feature has a low feature importance value, it doesn't
necessarily mean that the feature isn't important for prediction; it just means that the
feature wasn't chosen at a particularly early level of the tree. It could also be that the
feature is identical to, or highly correlated with, another informative feature.
One way to improve the performance of our model is to find the optimal value
for the max_depth hyperparameter. The code below outputs the accuracy of decision
trees with different values of max_depth.
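(The lab shows this code as a screenshot; the sketch below reconstructs the idea, with the depth range chosen purely for illustration.)

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, random_state=0)

    # Train one tree per candidate depth and report test accuracy.
    for depth in range(1, 7):
        clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
        clf.fit(X_train, y_train)
        print(f"max_depth={depth}: accuracy = {clf.score(X_test, y_test):.3f}")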
Model A
Model B
Model C
Model D
Model E
Good Luck :)