
Decision Trees

Decision Trees
• A Decision Tree is a non-parametric supervised learning technique that can
be used for both classification and regression problems.

• It is a tree-structured classifier in which internal nodes represent the
features of a dataset, branches represent the decision rules, and each leaf
node represents the outcome.

• A decision tree contains two types of nodes: decision nodes and leaf nodes.

• Decision nodes are used to make decisions and have multiple branches.

• Leaf nodes are the outcomes of those decisions and do not contain any further
branches.
Decision Tree
• The decisions or tests are performed on the basis of the features of the
given dataset.

• It is a graphical representation for obtaining all possible solutions to a
problem/decision based on given conditions.

• A decision tree simply asks a question and, based on the answer (Yes/No),
splits further into subtrees.
Examples
Types of Decision Trees
• There are two main types of Decision Trees:

• Classification trees (Yes/No types): the decision or outcome variable is
categorical, such as Yes/No.

• Regression trees (continuous data types): the decision or outcome variable is
continuous, e.g. a number like 123.

• Iterative Dichotomiser 3 (ID3) is a classical algorithm for building
classification trees.
Why use Decision Trees?

• Decision Trees mimic the way humans make decisions, so they are easy to
understand.

• The logic behind a decision tree can be easily followed because it shows a
tree-like structure.
Decision Tree Terminologies
• Root Node: The root node is where the decision tree starts. It represents the
entire dataset, which is further divided into two or more homogeneous sets.

• Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split
further after reaching a leaf node.

• Splitting: Splitting is the process of dividing a decision node/root node into
sub-nodes according to the given conditions.

• Branch/Sub-Tree: A subtree formed by splitting the tree.

• Pruning: Pruning is the process of removing unwanted branches from the tree.

• Parent/Child node: A node that is split into sub-nodes is called their parent
node, and the resulting sub-nodes are called its child nodes.
Working principles of the Decision Tree algorithm
• Step-1: Begin the tree with the root node, say S, which contains the complete
dataset.

• Step-2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM).

• Step-3: Divide S into subsets, one for each possible value of the best
attribute.

• Step-4: Generate the decision tree node that contains the best attribute.

• Step-5: Recursively build new decision trees from the subsets created in
Step-3. Continue this process until a stage is reached where the nodes cannot
be split further; such final nodes are called leaf nodes. A minimal sketch of
this recursive loop is shown below.
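
The steps above can be written as a short recursive procedure. The following Python sketch is illustrative only and is not part of the original slides; build_tree and asm_score are hypothetical names, and asm_score stands in for whichever Attribute Selection Measure is chosen in Step-2.

# Minimal sketch of Steps 1-5, assuming each row is a dictionary and
# asm_score(rows, attribute, target) returns the score of splitting on attribute.
def build_tree(rows, attributes, asm_score, target="label"):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                  # Step-5 stop: node is pure -> leaf
        return labels[0]
    if not attributes:                         # no attributes left -> majority-class leaf
        return max(set(labels), key=labels.count)
    best = max(attributes, key=lambda a: asm_score(rows, a, target))   # Step-2
    node = {best: {}}                          # Step-4: decision node for the best attribute
    for value in set(r[best] for r in rows):   # Step-3: one subset per attribute value
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        node[best][value] = build_tree(subset, remaining, asm_score, target)   # Step-5: recurse
    return node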
Attribute Selection Measures
• An Attribute Selection Measure (ASM) is used to select the best attribute for
the nodes of the tree. Two popular measures are:

• Information Gain
• Gini Index

• Information Gain
• Information gain is the change in entropy after segmenting a dataset on an
attribute.
• It measures how good an attribute is at predicting the class of each training
example.
• The node is split on the attribute with the highest information gain, and the
decision tree is built accordingly.
Entropy
• Entropy, also called Shannon entropy and denoted H(S) for a finite set S, is a
measure of the amount of uncertainty or randomness in the data.

Entropy H(S) = -P(Yes) log₂ P(Yes) - P(No) log₂ P(No)

Where,
• S = the set of samples
• P(Yes) = probability of Yes
• P(No) = probability of No
Example
• For the set S = {Y, Y, Y, N, N, N, N, N}
• Total instances: 8
• Instances of N: 5
• Instances of Y: 3

Entropy H(S) = -(3/8) log₂(3/8) - (5/8) log₂(5/8) ≈ 0.954

• If the number of Yes equals the number of No, then P(Yes) = P(No) = 0.5 and
Entropy(S) = 1.
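
The formula and the worked example can be checked with a few lines of Python. This is a small illustrative helper, not from the slides, assuming the class labels are given as a plain list:

import math

def entropy(labels):
    # H(S) = -sum over classes of p * log2(p)
    total = len(labels)
    return -sum(p * math.log2(p)
                for p in (labels.count(c) / total for c in set(labels))
                if p > 0)

print(entropy(["Y"] * 3 + ["N"] * 5))   # S = {Y,Y,Y,N,N,N,N,N} -> about 0.954
print(entropy(["Y", "N"]))              # equal Yes/No counts   -> 1.0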
Information Gain
• Information Gain = Entropy(S) - [(Weighted Avg) × Entropy(each subset)]

or

Information Gain = H(S) - H(S|X)
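
Building on the entropy helper sketched above, the weighted-average form of information gain can be written directly. The list-of-dictionaries data layout and the function name are assumptions made for illustration:

def information_gain(rows, attribute, target="label"):
    # IG = H(S) - sum over values v of (|Sv| / |S|) * H(Sv)
    base = entropy([r[target] for r in rows])
    total = len(rows)
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return base - remainder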


Gini Index
• The Gini index is a measure of impurity or purity used while creating a
decision tree in the CART (Classification and Regression Tree) algorithm.
• An attribute with a low Gini index should be preferred over one with a high
Gini index.
• CART creates only binary splits, and it uses the Gini index to choose them.
• The Gini index can be calculated using the formula below:

Gini Index = 1 - Σⱼ (Pⱼ)², where Pⱼ is the probability of class j.
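
As with entropy, the Gini index is straightforward to compute from the class proportions. This helper follows the 1 - Σ P² form given above and is a sketch, not CART's actual implementation:

def gini_index(labels):
    # Gini = 1 - sum over classes of p^2 ; 0 means a perfectly pure node
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

print(gini_index(["Y"] * 3 + ["N"] * 5))   # mixed node -> about 0.469
print(gini_index(["Y", "Y", "Y"]))         # pure node  -> 0.0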
Decision Tree classifier (ID3)
Decision tree generation consists of two phases:

Tree construction
• Initially, all the training examples are at the root.
• Attributes are categorical (if continuous-valued, they are discretized in
advance).
• Examples are partitioned based on selected attributes.
• Attributes are selected on the basis of a heuristic or statistical measure
(e.g., information gain).

Tree pruning
• Identify and remove branches that reflect noise or outliers.
Decision Tree classifier (ID3)
ID3 Steps
• Calculate the Information Gain of each feature.

• If not all rows belong to the same class, split the dataset S into subsets
using the feature for which the Information Gain is maximum.

• Make a decision tree node using the feature with the maximum Information
Gain.

• If all rows belong to the same class, make the current node a leaf node with
that class as its label.

• Repeat for the remaining features until we run out of features or the
decision tree consists entirely of leaf nodes. A sketch combining these steps
with the helpers above is given below.
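
Plugging the information_gain helper into the build_tree skeleton sketched earlier gives an ID3-style learner; a small traversal function then classifies new rows. Both function names are hypothetical and the code remains a sketch under the same assumptions as before:

def id3(rows, attributes, target="label"):
    # ID3 = recursive splitting with information gain as the Attribute Selection Measure
    return build_tree(rows, attributes, information_gain, target)

def predict(tree, row):
    # Walk internal nodes of the form {attribute: {value: subtree}} down to a leaf label.
    # Note: an attribute value never seen during training raises a KeyError in this sketch.
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][row[attribute]]
    return tree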
Example: dataset of COVID-19 infection

ID   Fever   Cough   Breathing Issues   Infected
1    NO      NO      NO                 NO
2    YES     YES     YES                YES
3    YES     YES     NO                 NO
4    YES     NO      YES                YES
5    YES     YES     YES                YES
6    NO      YES     NO                 NO
7    YES     NO      YES                YES
8    YES     NO      YES                YES
9    NO      YES     YES                YES
10   YES     YES     NO                 YES
11   NO      YES     NO                 NO
12   NO      YES     YES                YES
13   NO      YES     YES                NO
14   YES     YES     NO                 NO
Example Cont’d
• Of the 14 rows in our dataset S, there are 8 rows with the target value YES
and 6 rows with the target value NO.

• The entropy of S is calculated as:

Entropy H(S) = -(8/14) * log₂(8/14) - (6/14) * log₂(6/14)

= 0.99

• The next step is to calculate the Information Gain for each feature.


Example Cont’d
• Information Gain calculation for Fever:
In our dataset there are 8 rows with YES for Fever; among them, 6 rows have
target value YES and 2 rows have target value NO.

Total rows: |S| = 14

For v = YES, |Sᵥ| = 8
Entropy(Sᵥ) = -(6/8) * log₂(6/8) - (2/8) * log₂(2/8) = 0.81

For v = NO, |Sᵥ| = 6
Entropy(Sᵥ) = -(2/6) * log₂(2/6) - (4/6) * log₂(4/6) = 0.91

Expanding the summation in the IG formula:

IG(S, Fever) = Entropy(S) - (|Sʏᴇꜱ| / |S|) * Entropy(Sʏᴇꜱ) - (|Sɴᴏ| / |S|) *
Entropy(Sɴᴏ)

IG(S, Fever) = 0.99 - (8/14) * 0.81 - (6/14) * 0.91 = 0.13


Example Cont’d
• Information Gain calculation for Cough:
In our dataset there are 10 rows with YES for Cough; among them, 5 rows have
target value YES and 5 rows have target value NO. The remaining 4 rows
(Cough = NO) have 3 YES and 1 NO. The 10 rows with Cough = YES are shown below:

Fever   Cough   Breathing Issues   Infected
YES     YES     YES                YES
YES     YES     NO                 NO
YES     YES     YES                YES
NO      YES     NO                 NO
NO      YES     YES                YES
YES     YES     NO                 YES
NO      YES     NO                 NO
NO      YES     YES                YES
NO      YES     YES                NO
YES     YES     NO                 NO

Total rows: |S| = 14

For v = YES, |Sᵥ| = 10
Entropy(Sᵥ) = -(5/10) * log₂(5/10) - (5/10) * log₂(5/10) = 1.00

For v = NO, |Sᵥ| = 4
Entropy(Sᵥ) = -(3/4) * log₂(3/4) - (1/4) * log₂(1/4) = 0.81

Expanding the summation in the IG formula:

IG(S, Cough) = Entropy(S) - (|Sʏᴇꜱ| / |S|) * Entropy(Sʏᴇꜱ) - (|Sɴᴏ| / |S|) *
Entropy(Sɴᴏ)

IG(S, Cough) = 0.99 - (10/14) * 1.00 - (4/14) * 0.81 = 0.04


Example Cont’d
• Information Gain calculation for Breathing Issues:
In our dataset there are 8 rows with YES for Breathing Issues; among them,
7 rows have target value YES and 1 row has target value NO. These 8 rows are
shown below:

ID   Fever   Cough   Breathing Issues   Infected
2    YES     YES     YES                YES
4    YES     NO      YES                YES
5    YES     YES     YES                YES
7    YES     NO      YES                YES
8    YES     NO      YES                YES
9    NO      YES     YES                YES
12   NO      YES     YES                YES
13   NO      YES     YES                NO

Total rows: |S| = 14

For v = YES, |Sᵥ| = 8
Entropy(Sᵥ) = -(7/8) * log₂(7/8) - (1/8) * log₂(1/8) = 0.54

For v = NO, |Sᵥ| = 6
Entropy(Sᵥ) = -(1/6) * log₂(1/6) - (5/6) * log₂(5/6) = 0.65

Expanding the summation in the IG formula:

IG(S, Breathing Issues) = Entropy(S) - (|Sʏᴇꜱ| / |S|) * Entropy(Sʏᴇꜱ) -
(|Sɴᴏ| / |S|) * Entropy(Sɴᴏ)

IG(S, Breathing Issues) = 0.99 - (8/14) * 0.54 - (6/14) * 0.65 = 0.40
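
The three information gain values above can be reproduced by running the earlier entropy and information_gain sketches on the 14-row table; the data list below simply transcribes that table:

data = [
    {"Fever": f, "Cough": c, "Breathing Issues": b, "Infected": i}
    for f, c, b, i in [
        ("NO", "NO", "NO", "NO"),     ("YES", "YES", "YES", "YES"),
        ("YES", "YES", "NO", "NO"),   ("YES", "NO", "YES", "YES"),
        ("YES", "YES", "YES", "YES"), ("NO", "YES", "NO", "NO"),
        ("YES", "NO", "YES", "YES"),  ("YES", "NO", "YES", "YES"),
        ("NO", "YES", "YES", "YES"),  ("YES", "YES", "NO", "YES"),
        ("NO", "YES", "NO", "NO"),    ("NO", "YES", "YES", "YES"),
        ("NO", "YES", "YES", "NO"),   ("YES", "YES", "NO", "NO"),
    ]
]

print(round(entropy([r["Infected"] for r in data]), 2))            # 0.99
for feature in ["Fever", "Cough", "Breathing Issues"]:
    print(feature, round(information_gain(data, feature, "Infected"), 2))
# Fever 0.13, Cough 0.04, Breathing Issues 0.4 -> Breathing Issues becomes the root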
Example Cont’d
• Since the feature Breathing Issues has the highest Information Gain, it is
used to create the root node. Hence, after this initial step our tree looks
like this:

• Next, from the remaining two unused features, namely Fever and Cough, we
decide which one is best for the left branch of Breathing Issues.
Example Cont’d

• Since the left branch of Breathing Issues denotes YES, we work with the
subset of the original data, i.e. the set of rows having YES in the Breathing
Issues column. These 8 rows are shown below:

Information Gain(Sʙʏ, Fever) = 0.20
Information Gain(Sʙʏ, Cough) = 0.09

• The IG of Fever is greater than that of Cough, so we select Fever as the left
branch of Breathing Issues (the small check below reproduces these values).
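
The same information_gain helper, restricted to the rows where Breathing Issues is YES, reproduces the subset gains quoted above, assuming the data list from the previous sketch:

breathing_yes = [r for r in data if r["Breathing Issues"] == "YES"]
for feature in ["Fever", "Cough"]:
    print(feature, round(information_gain(breathing_yes, feature, "Infected"), 2))
# Fever 0.2, Cough 0.09 -> Fever becomes the left child of Breathing Issues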
Example Cont’d

Our tree now looks like this:

But, since there is only one unused feature left, we have no other choice but to make it the right
branch of the root node. So our tree now looks like this:
Example Cont’d

• There are no more unused features, so we stop here and move to the final step
of creating the leaf nodes. For the left leaf node of Fever, we look at the
subset of rows from the original dataset that have both Breathing Issues and
Fever equal to YES.

• Since all the values in the target column are YES, we label the left leaf node
as YES, but to make it more descriptive we label it Infected.

• Similarly, for the right node of Fever we look at the subset of rows from the
original dataset that have Breathing Issues = YES and Fever = NO.
Example Cont’d

• Here the values are mixed (rows 9, 12 and 13 give two YES and one NO), so the
leaf is labelled with the majority class of this subset. We repeat the same
process for the node Cough (the right branch of the root); here both left and
right leaves turn out to be the same, i.e. NO or Not Infected, as shown below:
