Decision Tree
o A Decision Tree is a supervised learning technique that can be used for both
classification and regression problems, but it is mostly preferred for solving
classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision rules, and each leaf
node represents the outcome.
o In a Decision Tree, there are two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make decisions and have multiple branches,
whereas leaf nodes are the outputs of those decisions and do not contain any further
branches.
o The decisions or the tests are performed on the basis of the features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
Example: Suppose there is a candidate who has a job offer and wants to decide whether
he should accept it or not. To solve this problem, the decision tree starts with the root
node (the Salary attribute, chosen by an attribute selection measure, ASM). The root node
splits further into the next decision node (Distance from the office) and one leaf node
based on the corresponding labels. That decision node further splits into one decision node
(Cab facility) and one leaf node. Finally, this decision node splits into two leaf nodes
(Offer accepted and Offer declined). Consider the below diagram:
There are many algorithms for building a decision tree. Two of the most common are:
1. CART (Classification and Regression Trees), which uses Gini impurity as the
splitting metric.
2. ID3 (Iterative Dichotomiser 3), which uses entropy and information gain as the splitting metric.
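As a quick, hedged illustration (not part of the original example), both criteria can be tried through scikit-learn's DecisionTreeClassifier; the Iris dataset below is only a stand-in, and scikit-learn's tree is CART-based regardless of the chosen criterion.

# A minimal sketch (assumed example, not from the text above): comparing the two
# split criteria with scikit-learn's DecisionTreeClassifier on the Iris data.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# CART-style splitting: Gini impurity (scikit-learn's default criterion).
cart_tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

# ID3-style criterion: entropy / information gain (the tree itself is still CART).
id3_like_tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

print(cart_tree.get_depth(), id3_like_tree.get_depth())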
Here there are four independent variables used to determine the dependent variable. The
independent variables are Outlook, Temperature, Humidity, and Wind. The dependent
variable is whether to play football or not.
As the first step, we have to find the parent (root) node for our decision tree. For that, follow the
steps below:
Find the entropy of the class variable.
E(S) = -[(9/14) log2(9/14) + (5/14) log2(5/14)] = 0.94
Note: the logarithm is typically taken to base 2. In total there are 14 yes/no examples, of which 9 are
yes and 5 are no; these counts give the probabilities used above.
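A short sketch of this calculation in Python (the counts of 9 yes and 5 no come directly from the text above):

# Entropy of the class variable: 9 "yes" and 5 "no" out of 14 examples.
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 2))  # 0.94, matching E(S) above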
From the above data, for Outlook we can easily arrive at the following table.
Now we have to calculate the average weighted entropy, i.e. the sum, over each value of the feature,
of the fraction of examples taking that value multiplied by the entropy of that subset.
The next step is to find the information gain. It is the difference between the parent entropy and
the average weighted entropy found above.
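A sketch of these two steps for Outlook, assuming the usual per-value counts (Sunny: 2 yes / 3 no, Overcast: 4 yes / 0 no, Rainy: 3 yes / 2 no); these counts are an assumption here, since the table itself is not reproduced above, but they are consistent with the 9 yes / 5 no total.

# Weighted entropy and information gain for Outlook.
# The per-value counts below are assumed (they sum to the 9 yes / 5 no above).
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

outlook = {"Sunny": [2, 3], "Overcast": [4, 0], "Rainy": [3, 2]}
n_total = sum(sum(c) for c in outlook.values())          # 14 examples

parent_entropy = entropy([9, 5])                          # 0.94

# Average weighted entropy: each value's entropy weighted by its share of the rows.
weighted_entropy = sum((sum(c) / n_total) * entropy(c) for c in outlook.values())

information_gain = parent_entropy - weighted_entropy
print(round(weighted_entropy, 3), round(information_gain, 3))  # 0.694 0.247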
Now select the feature having the largest information gain. Here it is Outlook, so it forms the first
node (root node) of our decision tree.
Since Overcast contains only examples of class 'Yes', we can set it to Yes. That means if the outlook is
overcast, football will be played. Now our decision tree looks as follows.
The next step is to find the next node in our decision tree; now we will find the node under the Sunny
branch. We have to determine which of Temperature, Humidity, or Wind has the highest information
gain.
Similarly we get
Here IG(Sunny, Humidity) is the largest value, so Humidity is the node that comes under Sunny.
For Humidity, from the above table, we can say that play will occur if the humidity is normal and will not
occur if it is high. Similarly, find the nodes under Rainy.
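As a final sketch, the same information-gain comparison can be reproduced for the Sunny subset; all per-attribute counts below are assumed from the classic play-tennis data (2 yes / 3 no under Sunny), since the original tables are not reproduced in the text.

# Information-gain comparison inside the Sunny branch (all counts are assumed).
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, splits):
    n = sum(parent_counts)
    weighted = sum((sum(c) / n) * entropy(c) for c in splits)
    return entropy(parent_counts) - weighted

sunny = [2, 3]  # yes, no counts when Outlook = Sunny (assumed)
candidates = {
    "Humidity":    [[0, 3], [2, 0]],          # High, Normal
    "Temperature": [[0, 2], [1, 1], [1, 0]],  # Hot, Mild, Cool
    "Wind":        [[1, 2], [1, 1]],          # Weak, Strong
}

for name, splits in candidates.items():
    print(name, round(information_gain(sunny, splits), 3))
# Humidity has the largest gain (0.971), so it becomes the node under Sunny.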