Decision Trees
• A decision tree is a non-parametric supervised learning technique that can
be used for both classification and regression problems.
• A decision tree contains two types of nodes: decision nodes and leaf nodes.
• Decision nodes are used to make decisions and have multiple branches.
• Leaf nodes are the output of those decisions and do not contain any further
branches.
Decision Tree
• The decisions or tests are performed on the basis of the features of the
given dataset.
• A decision tree simply asks a question and, based on the answer (Yes/No),
further splits the tree into subtrees (see the sketch below).
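As a rough sketch (not from the slides), the same idea can be written as nested Yes/No questions in Python; the feature names used here are placeholders only:

# A toy decision tree written as nested if/else questions (illustrative only).
def toy_tree(fever: str, breathing_issues: str) -> str:
    if breathing_issues == "YES":       # decision node
        if fever == "YES":              # decision node
            return "Infected"           # leaf node
        return "Not Infected"           # leaf node
    return "Not Infected"               # leaf node

print(toy_tree("YES", "YES"))  # -> Infected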
Examples
Types of Decision Trees
• There are two main types of Decision Trees:
• Classification trees
• Regression trees
• Classification trees (Yes/No types) predict a categorical target, while regression trees predict a continuous numeric value.
• Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split
further once a leaf node is reached.
• Splitting: Splitting is the process of dividing the decision node/root node into sub-
nodes according to the given conditions.
• Pruning: Pruning is the process of removing the unwanted branches from the tree.
• Parent/Child node: A node that is divided into sub-nodes is called the parent of
those sub-nodes, and the sub-nodes are its children (the root node has no parent).
Working principles of Decision Tree
algorithm
• Step-1: Begin the tree with the root node, say S, which contains the complete
dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection Measure
(ASM).
• Step-3: Divide S into subsets, one for each possible value of the best attribute.
• Step-4: Generate the decision tree node that contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset
created in Step-3. Continue this process until the nodes cannot be classified any
further; such a final node is called a leaf node (a scikit-learn sketch of the
procedure follows below).
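The steps above can be sketched with scikit-learn, assuming that library is available; criterion="entropy" makes the classifier pick splits by information gain, which matches the attribute selection measures discussed next. The toy data below is illustrative only.

# Minimal scikit-learn sketch of the procedure above (assumes scikit-learn is installed).
from sklearn.tree import DecisionTreeClassifier

# Toy encoding of three Yes/No features (1 = YES, 0 = NO): Fever, Cough, Breathing issues.
X = [[0, 0, 0], [1, 1, 1], [1, 1, 0], [1, 0, 1]]
y = [0, 1, 0, 1]                      # target: Infected (1 = YES, 0 = NO)

tree = DecisionTreeClassifier(criterion="entropy")
tree.fit(X, y)                        # Steps 1-5: recursively split on the best attribute
print(tree.predict([[1, 0, 1]]))      # -> [1]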
Attribute Selection Measures
• An attribute selection measure (ASM) is used to select the best attribute for the
nodes of the tree. The two most popular measures are:
• Information Gain
• Gini Index
• Information Gain
• Information gain is the measurement of the change in entropy after the dataset is
segmented on an attribute.
• It measures how good an attribute is at predicting the class of each training
example.
• The attribute with the highest information gain is chosen to split the node when
building the decision tree.
Entropy
• Entropy, also called Shannon entropy and denoted H(S) for a finite set S, is the
measure of the amount of uncertainty or randomness in the data. For a two-class
(Yes/No) set:
H(S) = -P(Yes) log2 P(Yes) - P(No) log2 P(No)
Where,
• S = the current set of samples
• P(Yes) = probability of Yes
• P(No) = probability of No
Example
• For the set S = {Y,Y,Y,N,N,N,N,N}
• Total instances: 8
• Instances of N: 5
• Instances of Y: 3
Entropy H(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
= -(3/8) log2(3/8) - (5/8) log2(5/8) ≈ 0.954
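One way to check this value in Python (a small sketch, not part of the original slides):

import math

def entropy(p_yes, p_no):
    # H(S) = -P(yes) log2 P(yes) - P(no) log2 P(no); a zero probability contributes 0.
    return -sum(p * math.log2(p) for p in (p_yes, p_no) if p > 0)

# S = {Y,Y,Y,N,N,N,N,N}: 3 Y and 5 N out of 8 samples.
print(round(entropy(3 / 8, 5 / 8), 3))    # -> 0.954
# The 14-row COVID-19 dataset used later: 8 YES and 6 NO.
print(round(entropy(8 / 14, 6 / 14), 2))  # -> 0.99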
Tree pruning
• Identify and remove branches that reflect noise or outliers.
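One hedged way to sketch post-pruning is scikit-learn's cost-complexity pruning; the ccp_alpha value below is arbitrary, and larger values remove more branches:

# Post-pruning sketch with cost-complexity pruning (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)

# The pruned tree has fewer nodes, i.e. weak or noisy branches have been cut back.
print(full_tree.tree_.node_count, pruned_tree.tree_.node_count)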
Decision Tree classifier (ID3)
ID3 Steps
• Calculate the Information Gain of each feature.
• If the rows do not all belong to the same class, split the dataset S into subsets
using the feature for which the Information Gain is maximum.
• Make a decision tree node using the feature with the maximum Information Gain.
• If all rows belong to the same class, make the current node a leaf node with the
class as its label.
• Repeat for the remaining features until we run out of features or the decision
tree consists only of leaf nodes (a Python sketch of these steps follows below).
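The sketch below is a simplified illustration of these ID3 steps for categorical features (columns are addressed by index), not the exact implementation behind the slides:

import math
from collections import Counter

def entropy_of(labels):
    # H(S) over a list of class labels.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    # IG(S, A) = H(S) - sum over values v of A of (|S_v| / |S|) * H(S_v).
    gain = entropy_of(labels)
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        gain -= (len(subset) / len(labels)) * entropy_of(subset)
    return gain

def id3(rows, labels, features):
    # Returns a nested dict tree; leaves are class labels.
    if len(set(labels)) == 1:                      # all rows in one class -> leaf
        return labels[0]
    if not features:                               # no features left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    node = {}
    for value in set(row[best] for row in rows):
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        node[(best, value)] = id3([rows[i] for i in keep],
                                  [labels[i] for i in keep],
                                  [f for f in features if f != best])
    return node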
Example: dataset of COVID-19 infection
ID   Fever   Cough   Breathing issues   Infected
1    NO      NO      NO                 NO
2    YES     YES     YES                YES
3    YES     YES     NO                 NO
4    YES     NO      YES                YES
5    YES     YES     YES                YES
6    NO      YES     NO                 NO
7    YES     NO      YES                YES
8    YES     NO      YES                YES
9    NO      YES     YES                YES
10   YES     YES     NO                 YES
11   NO      YES     NO                 NO
12   NO      YES     YES                YES
13   NO      YES     YES                NO
14   YES     YES     NO                 NO
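As a side check (not part of the slides), the table can be fed to scikit-learn with YES/NO encoded as 1/0; note that scikit-learn's binary-split tree may not exactly match the hand-built ID3 tree developed in the following slides:

# The dataset above, encoded with 1 = YES and 0 = NO (assumes scikit-learn is installed).
from sklearn.tree import DecisionTreeClassifier, export_text

X = [
    [0, 0, 0], [1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, 0], [1, 0, 1],
    [1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 1, 0], [0, 1, 1], [0, 1, 1], [1, 1, 0],
]
y = [0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0]   # Infected

clf = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(clf, feature_names=["Fever", "Cough", "Breathing issues"]))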
Example Cont’d
• From the total of 14 rows in our dataset S, there are 8 rows
with the target value YES and 6 rows with the target value
NO.
H(S) = -(8/14) log2(8/14) - (6/14) log2(6/14) ≈ 0.99
Expanding the summation in the IG formula:
IG(S, A) = H(S) - sum over each value v of attribute A of (|S_v| / |S|) * H(S_v)
For a Yes/No attribute A this expands to:
IG(S, A) = H(S) - (|S_YES| / |S|) * H(S_YES) - (|S_NO| / |S|) * H(S_NO)
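As an added illustration computed from the table above, for the Breathing issues attribute: the 8 rows with Breathing issues = YES contain 7 YES and 1 NO (entropy ≈ 0.544), and the 6 rows with Breathing issues = NO contain 1 YES and 5 NO (entropy ≈ 0.650), so
IG(S, Breathing issues) ≈ 0.99 - (8/14)(0.544) - (6/14)(0.650) ≈ 0.40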
But since there is only one unused feature left, we have no other choice but to make it the right
branch of the root node. So our tree now looks like this:
Example Cont’d
• Since all the values in the target column are YES, we label the left leaf node
as YES, but to make it more logical we label it Infected.
• Similarly, for the right node of Fever we look at the subset of rows from the original
dataset that have Breathing Issues = YES and Fever = NO.
Example Cont’d
• Here, not all but most of the values are NO; hence NO, or Not Infected, becomes
our right leaf node. We repeat the same process for the node Cough; however, here
both the left and right leaves turn out to be the same, i.e. NO or Not Infected, as
shown below: