6 Decision Trees: ID3 and CART
A decision tree is a classification and prediction tool having a tree-like structure, where each
internal node denotes a test on an attribute, each branch represents an outcome of the test,
and each leaf node (terminal node) holds a class label.
Above we have a small decision tree. An important advantage of the decision tree is that it is
highly interpretable: here, a person is classified as male if height > 180 cm, or if height < 180 cm
and weight > 80 kg; otherwise, the person is classified as female. Have you ever wondered how we
come up with such a decision tree? I will try to explain it using the weather dataset.
Before going further, I will explain some important terms related to decision trees.
Entropy
In machine learning, entropy is a measure of the randomness in the information being processed. The
higher the entropy, the harder it is to draw any conclusions from that information.
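To make the definition concrete, here is a minimal Python sketch of entropy computed over a list of class labels (the helper name entropy is illustrative, not from any particular library):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * log2(c / total) for c in counts.values())

print(entropy(["yes"] * 9 + ["no"] * 5))    # ~0.940 (mixed classes)
print(entropy(["yes"] * 4))                 # 0.0 (a pure node)
```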
Information Gain
Information gain can be defined as the amount of information gained about a random variable or signal
from observing another random variable. It can be considered as the difference between the entropy of
the parent node and the weighted average entropy of the child nodes.
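In code, this is the parent entropy minus the weighted average entropy of the child subsets produced by the split. A minimal sketch, reusing the entropy helper above (the names and the row/label layout are illustrative assumptions):

```python
def information_gain(rows, labels, attribute_index):
    """Information gain from splitting `rows` on the attribute at `attribute_index`."""
    # Group the class labels by the value the attribute takes in each row.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute_index], []).append(label)
    # Weighted average entropy of the child nodes.
    total = len(labels)
    weighted_child_entropy = sum(
        (len(subset) / total) * entropy(subset) for subset in groups.values()
    )
    return entropy(labels) - weighted_child_entropy
```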
Gini Impurity
Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly
labeled if it was randomly labeled according to the distribution of labels in the subset.
Gini impurity is lower bounded by 0, with 0 occurring if the data set contains only one class.
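Gini impurity is just as easy to compute; a small sketch along the same lines (again, the helper name is only for illustration):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

print(gini_impurity(["yes"] * 9 + ["no"] * 5))   # ~0.459 (mixed classes)
print(gini_impurity(["yes"] * 4))                # 0.0 (a single class)
```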
There are many algorithms for building a decision tree. Two of the most common are:
1. CART (Classification and Regression Trees) — This makes use of Gini impurity as the metric.
2. ID3 (Iterative Dichotomiser 3) — This uses entropy and information gain as the metrics.
In this article, I will go through ID3. Once you understand it, it is easy to implement the same approach using CART.
Here, there are four independent variables used to determine the dependent variable. The independent variables
are Outlook, Temperature, Humidity, and Wind. The dependent variable is whether to play football or not.
As the first step, we have to find the root node of our decision tree. For that, we first calculate the entropy
of the whole dataset and then the average weighted entropy of each attribute, i.e., the sum, over the attribute's
values, of the weight of each value (the fraction of records taking it) multiplied by the entropy of the
corresponding subset.
E(S, Outlook) = (5/14)*E(3,2) + (4/14)*E(4,0) + (5/14)*E(2,3)
= (5/14)*(-(3/5)log2(3/5) - (2/5)log2(2/5)) + (4/14)*(0) + (5/14)*(-(2/5)log2(2/5) - (3/5)log2(3/5))
= 0.693
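As a quick sanity check, the same figure can be reproduced in a few lines of Python from the class counts per Outlook value (2 Yes/3 No under Sunny, 4 Yes/0 No under Overcast, 3 Yes/2 No under Rain); subtracting it from the entropy of the whole dataset then gives Gain(S, Outlook). The helper name is illustrative:

```python
from math import log2

def entropy_from_counts(yes, no):
    """Entropy of a node with the given Yes/No counts; 0*log2(0) is treated as 0."""
    total = yes + no
    return sum(-(c / total) * log2(c / total) for c in (yes, no) if c > 0)

parent   = entropy_from_counts(9, 5)                 # ~0.940
weighted = (5/14) * entropy_from_counts(2, 3) \
         + (4/14) * entropy_from_counts(4, 0) \
         + (5/14) * entropy_from_counts(3, 2)        # ~0.693
print(round(parent - weighted, 3))                   # Gain(S, Outlook) ~0.247
```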
The next step is to find the next node in our decision tree. Now we will find the node under the Sunny branch:
we have to determine which of Temperature, Humidity, and Wind has the highest information gain.
For Humidity, from the above table we can see that play occurs when the humidity is normal and does not occur
when it is high. Similarly, find the nodes under the Rainy branch.
Classification using CART is similar, but instead of entropy we use Gini impurity.
So, as the first step, we will find the root node of our decision tree. For that, calculate the Gini index of the
class variable:
Gini(S) = 1 - [(9/14)² + (5/14)²] = 0.459
As the next step, we will calculate the Gini gain. For that, we first find the average weighted Gini
impurity of Outlook, Temperature, Humidity, and Wind.
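For example, using the same per-Outlook class counts as in the ID3 walkthrough (2 Yes/3 No under Sunny, 4 Yes/0 No under Overcast, 3 Yes/2 No under Rain), the Gini gain of Outlook can be sketched as follows (helper names are illustrative):

```python
def gini_from_counts(yes, no):
    """Gini impurity of a node with the given Yes/No counts."""
    total = yes + no
    return 1.0 - (yes / total) ** 2 - (no / total) ** 2

gini_s   = gini_from_counts(9, 5)                    # ~0.459
weighted = (5/14) * gini_from_counts(2, 3) \
         + (4/14) * gini_from_counts(4, 0) \
         + (5/14) * gini_from_counts(3, 2)           # ~0.343
print(round(gini_s - weighted, 3))                   # Gini gain of Outlook ~0.116
```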
Now you have got an idea of how to proceed further. Repeat the same steps we used in the ID3 algorithm.
The Classification and Regression Tree (CART) algorithm is a classification algorithm that builds a decision
tree on the basis of the Gini impurity index.
It is a basic machine learning algorithm and covers a wide variety of use cases. The statistician Leo
Breiman coined the phrase to describe decision tree algorithms that may be used for classification or
regression predictive modeling problems.
CART is an umbrella term that refers to the following types of decision trees:
Classification trees: When the target variable is categorical, the tree is used to find the "class" into
which the target variable is most likely to fall.
Regression trees: These are used to forecast the value of a continuous target variable.
The CART algorithm is also the building block of Random Forest, which is one of the most powerful algorithms in
machine learning. The CART algorithm is organized as a series of questions, the responses to which determine
the next question, if any. The ultimate outcome of these questions is a tree-like structure that ends in terminal
nodes when there are no more questions to ask.
The Gini impurity index measures how strongly each attribute is associated with the outcome of an instance;
in other words, it quantifies how much each attribute contributes to determining the resulting class.
The Gini index is widely used in real-life scenarios.
Step by Step ID3 Decision Tree Example
Decision tree algorithms transform raw data into rule-based decision trees. ID3 is one of the most common
decision tree algorithms. It was introduced in 1986, and its name is an acronym for Iterative Dichotomiser 3.
First of all, dichotomisation means dividing into two completely opposite things. That is why the algorithm
iteratively divides the attributes into two groups, the most dominant attribute and the others, to construct
the tree. It calculates the entropy and information gain of each attribute, and in this way the most dominant
attribute is found and placed on the tree as a decision node. Entropy and gain scores are then calculated again
among the remaining attributes to find the next most dominant attribute. This procedure continues until a
decision is reached for each branch. That is why it is called Iterative Dichotomiser.
No matter which decision tree algorithm you are running (ID3, C4.5, CART, CHAID, or regression trees), they all
look for the feature offering the best split, for example the highest information gain. They then add a decision
rule for that feature and recursively build another decision tree on the resulting sub-dataset until a decision
is reached.
Besides, regular decision tree algorithms are designed to create branches for categorical features. Still, we
are able to build trees with continuous and numerical features: the trick is to convert the continuous feature
into a categorical one by splitting the numerical feature at the point that offers the highest information gain,
as sketched below.
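A minimal sketch of that trick: try each candidate threshold of the numerical feature and keep the one whose two-way split yields the highest information gain (this reuses the entropy helper from earlier; all names are illustrative):

```python
def best_threshold(values, labels):
    """Return the numeric threshold whose <= / > split maximises information gain."""
    parent_entropy = entropy(labels)
    best_gain, best_t = 0.0, None
    for t in sorted(set(values)):
        left  = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        if not left or not right:     # skip splits that leave one side empty
            continue
        weighted = (len(left) / len(labels)) * entropy(left) \
                 + (len(right) / len(labels)) * entropy(right)
        gain = parent_entropy - weighted
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain
```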
ID3 in Python
This blog post gives a detailed explanation of the ID3 algorithm, and we will solve a problem step by step.
On the other hand, you might just want to run the ID3 algorithm, and its mathematical background might not
attract your attention.
Herein, you can find a Python implementation of the ID3 algorithm; you can build ID3 decision trees
with a few lines of code. The package supports the most common decision tree algorithms such as
ID3, C4.5, CART, CHAID, and regression trees, as well as some bagging methods such as random forest and some
boosting methods such as gradient boosting and AdaBoost.
Objective
Decision rules will be found based on the entropy and information gain of each feature.
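To make the objective concrete, here is a compact sketch of the recursive ID3 procedure, reusing the entropy and information_gain helpers from earlier; it is only an illustration of the idea, not the implementation from the package mentioned above:

```python
from collections import Counter

def id3(rows, labels, attributes):
    """Build an ID3 tree as nested dicts keyed by attribute index and value."""
    # Pure node: every remaining instance carries the same label.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to split on: fall back to the majority label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Choose the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    for value in set(row[best] for row in rows):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows   = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        tree[best][value] = id3(sub_rows, sub_labels, remaining)
    return tree
```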
Data set
For instance, the following table records the factors used to decide whether to play tennis outside over the
previous 14 days.
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
We will use the standard entropy and information gain formulas:
Entropy(S) = – Σ p(i) . log2 p(i), summed over the classes i
Gain(S, A) = Entropy(S) – Σ (|Sv| / |S|) . Entropy(Sv), summed over the values v of attribute A
These formulas might look confusing at first; working through the example will make them clear.
Entropy
We need to calculate the entropy first. The Decision column consists of 14 instances and includes two labels:
Yes (9 instances) and No (5 instances).
Entropy(Decision) = – (9/14) . log2(9/14) – (5/14) . log2(5/14) = 0.940
Notice that if the number of instances of a class were 0 and the total number of instances were n, then we would
need to calculate –(0/n) . log2(0/n). Here, log2(0) is –∞, and we cannot multiply 0 by ∞ directly. This special
case often appears in decision tree applications. Even though a computer cannot evaluate the expression as
written, calculus tells us that x . log2(x) tends to 0 as x approaches 0, so the term is taken to be 0.
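In code, this special case is usually handled by simply treating the term as 0 whenever the probability is 0. A small sketch:

```python
from math import log2

def entropy_term(p):
    """The contribution -p*log2(p), with the 0*log2(0) case defined as 0."""
    return 0.0 if p == 0 else -p * log2(p)

print(entropy_term(0))      # 0.0 instead of a math domain error
print(entropy_term(0.5))    # 0.5
```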
Here, there are 6 instances with strong wind, and the decision is divided into two equal parts.
1- Entropy(Decision|Wind=Strong) = – p(No) . log2p(No) – p(Yes) . log2p(Yes)
2- Entropy(Decision|Wind=Strong) = – (3/6) . log2(3/6) – (3/6) . log2(3/6) = 1
1- Gain(Outlook=Sunny|Temperature) = 0.570
2- Gain(Outlook=Sunny|Humidity) = 0.970
3- Gain(Outlook=Sunny|Wind) = 0.019
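These three gains can be reproduced from the Sunny rows of the table above (days 1, 2, 8, 9, and 11), again using the entropy helper from earlier; the label groupings below are read directly from that table:

```python
sunny = ["No", "No", "No", "Yes", "Yes"]          # decisions on the 5 Sunny days
parent = entropy(sunny)                           # ~0.971

# Temperature: Hot -> [No, No], Mild -> [No, Yes], Cool -> [Yes]
gain_temp = parent - (2/5) * entropy(["No", "No"]) \
                   - (2/5) * entropy(["No", "Yes"]) \
                   - (1/5) * entropy(["Yes"])           # ~0.57

# Humidity: High -> [No, No, No], Normal -> [Yes, Yes]
gain_humidity = parent - (3/5) * entropy(["No"] * 3) \
                       - (2/5) * entropy(["Yes"] * 2)   # ~0.97

# Wind: Weak -> [No, No, Yes], Strong -> [No, Yes]
gain_wind = parent - (3/5) * entropy(["No", "No", "Yes"]) \
                   - (2/5) * entropy(["No", "Yes"])     # ~0.02

print(round(gain_temp, 3), round(gain_humidity, 3), round(gain_wind, 3))
```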
Now, Humidity is the decision node under the Sunny branch because it produces the highest gain when the outlook
is sunny.
On the other hand, the decision will always be Yes if the humidity is normal.
Here, Wind produces the highest score when the outlook is rain. That is why we need to check the Wind attribute
at the second level when the outlook is rain.
So, it is revealed that the decision will always be Yes if the wind is weak and the outlook is rain.
What is more, the decision will always be No if the wind is strong and the outlook is rain.
Feature Importance
Decision trees are naturally explainable and interpretable algorithms. Besides, we can compute feature
importance values to understand how the model works.
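One common way (though not the only one) to define feature importance in a single tree is the total impurity reduction each feature contributes, weighted by the fraction of samples reaching the split and normalised to sum to 1. A hedged sketch using the gains worked out in this example (Outlook at the root, Humidity under Sunny, Wind under Rain); the exact numbers are illustrative:

```python
raw_importance = {
    "Outlook":  (14/14) * 0.247,   # root split: Gain(S, Outlook)
    "Humidity": (5/14)  * 0.970,   # split on the Sunny branch (5 of 14 samples)
    "Wind":     (5/14)  * 0.971,   # split on the Rain branch (5 of 14 samples)
}
total = sum(raw_importance.values())
importance = {feature: round(v / total, 3) for feature, v in raw_importance.items()}
print(importance)   # roughly {'Outlook': 0.26, 'Humidity': 0.37, 'Wind': 0.37}
```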
References:
1. https://sefiks.com/2017/11/20/a-step-by-step-id3-decision-tree-example/
2. https://medium.datadriveninvestor.com/decision-tree-algorithm-with-hands-on-example-e6c2afb40d38
3. https://www.saedsayad.com/decision_tree.htm