Examples
A decision tree is a classification and prediction tool having a tree-like structure, where each
internal node denotes a test on an attribute, each branch represents an outcome of the test, and
each leaf node (terminal node) holds a class label.
Above we have a small decision tree. An important advantage of the decision tree is that it is
highly interpretable. Here, if height > 180 cm, or if height < 180 cm and weight > 80 kg, the
person is male; otherwise, female. Did you ever think about how we came up with this decision tree?
I will try to explain it using the weather dataset.
Before going further, I will explain some important terms related to decision trees.
Entropy
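Entropy is a measure of the randomness or impurity of a set of examples: it is 0 when all
examples belong to the same class and highest when the classes are evenly mixed. For a set $S$
with class proportions $p_c$, it is given by

$$
E(S) = -\sum_{c} p_c \log_2 p_c
$$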
Information Gain
Information gain can be defined as the amount of information gained about a random variable
or signal from observing another random variable. It can be considered as the difference
between the entropy of the parent node and the weighted average entropy of the child nodes.
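In formula form, for a split of a set $S$ on a feature $A$:

$$
IG(S, A) = E(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, E(S_v)
$$

where $S_v$ is the subset of examples for which $A$ takes the value $v$.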
Gini Impurity
Gini impurity is a measure of how often a randomly chosen element from the set would be
incorrectly labeled if it was randomly labeled according to the distribution of labels in the
subset.
Gini impurity is lower bounded by 0, with 0 occurring if the data set contains only one class.
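In formula form, for class proportions $p_c$:

$$
Gini(S) = 1 - \sum_{c} p_c^2
$$

For example, a node with only one class has $Gini = 1 - 1^2 = 0$, while a 50/50 binary split has
$Gini = 1 - 0.5^2 - 0.5^2 = 0.5$.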
There are many algorithms to build a decision tree. They are:
1. CART (Classification and Regression Trees) — This makes use of Gini impurity as
the metric.
2. ID3 (Iterative Dichotomiser 3) — This uses entropy and information gain as the metrics.
In this article, I will go through ID3. Once you get it, it is easy to implement the same using
CART.
Here there are four independent variables to determine the dependent variable. The
independent variables are Outlook, Temperature, Humidity, and Wind. The dependent
variable is whether to play football or not.
As the first step, we have to find the parent (root) node of our decision tree. For that, we begin
by calculating the entropy of the class label over the whole dataset.
Note: here we typically take logarithms to base 2. In total there are 14 examples, of which 9 are
yes and 5 are no; these counts give the class probabilities 9/14 and 5/14 used in the entropy calculation.
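Written out with those counts, the parent entropy is:

$$
E(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.94
$$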
From the above data, we can easily arrive at a frequency table for Outlook: how many yes and
no examples fall under sunny, overcast, and rainy.
Now we have to calculate the average weighted entropy, i.e., the sum of the entropy of each
branch weighted by the fraction of examples that fall into that branch.
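For Outlook, assuming the usual counts of the standard 14-row weather dataset (sunny: 2 yes / 3 no,
overcast: 4 yes / 0 no, rainy: 3 yes / 2 no), this works out to roughly:

$$
E(S, \text{Outlook}) = \frac{5}{14}E(\text{sunny}) + \frac{4}{14}E(\text{overcast}) + \frac{5}{14}E(\text{rainy})
\approx \frac{5}{14}(0.971) + \frac{4}{14}(0) + \frac{5}{14}(0.971) \approx 0.693
$$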
The next step is to find the information gain. It is the difference between the parent entropy and
the average weighted entropy we found above.
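With the assumed Outlook counts above, that gives:

$$
IG(S, \text{Outlook}) \approx 0.94 - 0.693 = 0.247
$$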
Now select the feature having the largest information gain. Here it is Outlook, so it forms the
first node (root node) of our decision tree.
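If you want to check these numbers yourself, here is a minimal sketch in Python. The per-value
yes/no counts below are assumptions based on the standard weather dataset (the article's own tables
are not reproduced here), and the helper names are just illustrative:

```python
from math import log2

def entropy(counts):
    """Entropy (base 2) of a list of class counts, e.g. [9, 5]."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, branch_counts):
    """Parent entropy minus the weighted average entropy of the branches."""
    total = sum(parent_counts)
    weighted = sum(sum(b) / total * entropy(b) for b in branch_counts)
    return entropy(parent_counts) - weighted

# Assumed [yes, no] counts for the standard 14-row weather dataset.
parent = [9, 5]
features = {
    "Outlook":     [[2, 3], [4, 0], [3, 2]],   # sunny, overcast, rainy
    "Temperature": [[2, 2], [4, 2], [3, 1]],   # hot, mild, cool
    "Humidity":    [[3, 4], [6, 1]],           # high, normal
    "Wind":        [[6, 2], [3, 3]],           # weak, strong
}

for name, branches in features.items():
    print(f"IG({name}) = {information_gain(parent, branches):.3f}")
# Outlook should come out with the largest gain (~0.247), matching the choice above.
```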
The next step is to find the next node in our decision tree. Now we will find the one under sunny:
we have to determine which of Temperature, Humidity, or Wind has the highest information gain
on the subset of examples where Outlook is sunny.
Calculating these gains in the same way, we find that IG(sunny, Humidity) is the largest value.
So Humidity is the node that comes under sunny.
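Under the same assumed counts, the sunny branch contains 2 yes / 3 no examples, and the gains
come out roughly as:

$$
IG(\text{sunny}, \text{Humidity}) \approx 0.971, \qquad
IG(\text{sunny}, \text{Temperature}) \approx 0.571, \qquad
IG(\text{sunny}, \text{Wind}) \approx 0.020
$$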
For Humidity, from the above table we can say that play will occur if humidity is normal and
will not occur if it is high. Similarly, find the nodes under rainy.
As the next step, we will see how to do the same with CART by calculating the Gini gain. For
that, we first find the average weighted Gini impurity of Outlook, Temperature, Humidity, and Wind.
Choose the one that has the highest Gini gain. The Gini gain is highest for Outlook, so we can
choose it as our root node.
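As a rough check with the same assumed counts, the parent Gini impurity and the weighted Gini
impurity for Outlook are:

$$
Gini(S) = 1 - \left(\tfrac{9}{14}\right)^2 - \left(\tfrac{5}{14}\right)^2 \approx 0.459
$$

$$
Gini(S, \text{Outlook}) \approx \tfrac{5}{14}(0.48) + \tfrac{4}{14}(0) + \tfrac{5}{14}(0.48) \approx 0.343,
\qquad \text{Gini gain} \approx 0.459 - 0.343 = 0.116
$$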
Now you have got an idea of how to proceed further. Repeat the same steps we used in the
ID3 algorithm.
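If you prefer not to do the splits by hand, scikit-learn's DecisionTreeClassifier implements a
CART-style tree. Here is a minimal sketch; the few rows below are made up in the spirit of the
weather dataset (they are illustrative, not the article's full 14-row table):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# A few illustrative rows shaped like the weather dataset.
data = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Overcast"],
    "Humidity": ["High",  "Normal", "High",    "High",  "Normal", "Normal"],
    "Wind":     ["Weak",  "Strong", "Weak",    "Strong", "Weak",  "Strong"],
    "Play":     ["No",    "Yes",    "Yes",     "No",    "Yes",    "Yes"],
})

X = pd.get_dummies(data.drop(columns="Play"))   # one-hot encode the categorical features
y = data["Play"]

# criterion="entropy" mimics ID3-style information gain; criterion="gini" is the CART default.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)
print(export_text(clf, feature_names=list(X.columns)))
```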
Advantages:
Disadvantages:
1. More likely to overfit noisy data. The probability of overfitting on noise increases as the
tree gets deeper. A solution for this is pruning. You can read more about pruning in
my Kaggle notebook. Another way to avoid overfitting is to use bagging techniques
like Random Forest. You can read more about Random Forest in an article from
neptune.ai. A short sketch of both remedies follows below.
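As a hedged sketch of those two remedies using scikit-learn (not the exact code from the Kaggle
notebook or the neptune.ai article):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Pruning / pre-pruning: limit how far the tree can grow so it cannot memorize noise.
pruned_tree = DecisionTreeClassifier(max_depth=3, ccp_alpha=0.01, random_state=0)

# Bagging: a random forest averages many decorrelated trees, which reduces overfitting.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Both expose the same fit/predict API, e.g. (with your own training data):
# pruned_tree.fit(X_train, y_train); forest.fit(X_train, y_train)
```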