Decision Tree
It is a prediction model in which a series of tests is conducted and the answers are used to make a prediction.
A decision tree consists of a root node (or starting node), interior nodes, and leaf nodes (or terminating nodes) that are connected by branches.
The tree is kept shallow by testing the descriptive features that best discriminate between instances with different target feature values toward the top of the tree.
When a card is chosen at random from the pack shown in the figure, it is not known for sure which card will be picked, as every card is equally likely to be chosen: this corresponds to very high entropy (high impurity).
SHANNON’S ENTROPY MODEL
An attractive characteristic of this function is that the range of values taken by the binary logarithm of a probability, [−∞, 0], is much larger than the range taken by the probability itself, [0, 1].
Figure: (a) a graph illustrating how the value of a binary log (the log to the base 2) of a probability changes across the range of probability values; (b) the impact of multiplying these values by −1.
Shannon's model of entropy is a weighted sum of the logs of the probabilities of each possible outcome when we make a random selection from a set:
H(t) = - Σ P(t = i) × log2( P(t = i) ), summed over the possible outcomes i.
We use 2 as the base when we calculate entropy, which means that we measure entropy in bits.
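As a minimal sketch (not from the slides), this weighted sum can be computed directly in Python; the helper name `entropy` and the probability-list input are illustrative choices.

```python
import math

def entropy(probabilities):
    """Shannon entropy, in bits: H = -sum_i P(i) * log2(P(i)).

    Outcomes with zero probability contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin: two equally likely outcomes give exactly 1 bit of entropy.
print(entropy([0.5, 0.5]))  # 1.0
```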
ENTROPY CALCULATION: EXAMPLE 1
Entropy for a card being picked from a pack of 52 cards.
The probability of randomly selecting any specific card i, P(card = i), from a set of 52 cards is 1/52. The entropy is therefore
H(card) = - [ 52 × 1/52 log2(1/52) ] = log2(52) ≈ 5.70 bits
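A quick way to check this example (a sketch using only the standard library): 52 equally likely outcomes give log2(52) bits.

```python
import math

# 52 equally likely cards, each with probability 1/52
H_card = -sum((1 / 52) * math.log2(1 / 52) for _ in range(52))
print(round(H_card, 4))         # 5.7004
print(round(math.log2(52), 4))  # same value: entropy of n equally likely outcomes is log2(n)
```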
ENTROPY CALCULATION: EXAMPLE 2
Entropy for the suit of a card picked from a pack of 52 cards (4 different suits).
Calculating the entropy of a set of 52 playing cards if we only distinguish between cards based on their suit (heart, club, diamond or spade):
H(suit) = - [ 4 × 13/52 log2(13/52) ] = 2 bits
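The same check for the four suits (again only a standard-library sketch): the entropy drops to 2 bits because there are now only four equally likely outcomes.

```python
import math

# Four suits, 13 cards each: P(suit) = 13/52 = 1/4 for every suit
H_suit = -sum((13 / 52) * math.log2(13 / 52) for _ in range(4))
print(H_suit)  # 2.0 bits -- much lower entropy than distinguishing all 52 cards
```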
INFORMATION GAIN
We need to develop a formal model that captures these intuitions about the informativeness of features. Shannon's entropy model does this; the measure of informativeness that we will use is known as information gain.
COMPUTATION OF INFORMATION GAIN
Step 1: Compute the entropy of the original dataset with respect to the target feature.
Step 2: For each descriptive feature, create the sets that result from partitioning the instances in the dataset using that feature's values, and then sum the entropy scores of each of these sets, weighted by the fraction of instances in each set (the remaining entropy).
Step 3: Subtract the remaining entropy value (computed in Step 2) from the original entropy value (computed in Step 1) to give the information gain.
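These three steps can be sketched in Python as follows; the list-of-dicts dataset representation and the helper names `entropy`, `rem`, and `information_gain` are illustrative assumptions rather than anything prescribed by the slides.

```python
import math
from collections import Counter

def entropy(dataset, target):
    """Step 1: entropy of the dataset with respect to the target feature."""
    counts = Counter(row[target] for row in dataset)
    total = len(dataset)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def rem(dataset, feature, target):
    """Step 2: weighted sum of the entropies of the partitions created by splitting on `feature`."""
    total = len(dataset)
    remaining = 0.0
    for level in {row[feature] for row in dataset}:
        partition = [row for row in dataset if row[feature] == level]
        remaining += (len(partition) / total) * entropy(partition, target)
    return remaining

def information_gain(dataset, feature, target):
    """Step 3: original entropy minus the remaining entropy."""
    return entropy(dataset, target) - rem(dataset, feature, target)
```

Evaluating `information_gain` for every descriptive feature and picking the largest value identifies the most informative feature to split on.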
COMPUTATION OF INFORMATION GAIN (CONTD.)
Step 1: Total entropy of the dataset with respect to the target feature.
Step 2: Remaining entropy for each descriptive feature, obtained by partitioning the dataset on that feature's values (e.g. UNKNOWN SENDER = true and UNKNOWN SENDER = false):
Computation of remaining entropy for the SUSPICIOUS WORDS feature
Computation of remaining entropy for the UNKNOWN SENDER feature
Computation of remaining entropy for the CONTAINS IMAGES feature
COMPUTATION OF INFORMATION GAIN (CONTD.)
Step 3: Information gain calculation for each descriptive feature.
CHOICE OF THE ROOT NODE
The feature that possesses the highest information gain is the right choice for the root node to start with. As the tree grows, the entropy model allows us to decide which test we should add to the sequence next.
ITERATIVE DICHOTOMIZER 3 (ID3) ALGORITHM (A DECISION TREE INDUCTION ALGORITHM)
Require: a set of descriptive features, d, and a set of training instances, D
1: if all the instances in D have the same target level C then
2: return a decision tree consisting of a leaf node with label C
3: else if d is empty then
4: return a decision tree consisting of a leaf node with the label of the majority target level in D
5: else if D is empty then
6: return a decision tree consisting of a leaf node with the label of the majority target level of the dataset of the immediate parent node
7: else
8: d[best] ← the feature in d with the highest information gain, arg max IG(d, D)
9: make a new node, Node d[best], and label it with d[best]
10: partition D using d[best]
11: remove d[best] from d
12: for each partition, grow a branch from Node d[best] by rerunning the algorithm on that partition
Lines 1-6 are the base cases: the algorithm stops growing the current path and adds a leaf node to the tree. Lines 7-12 extend the current path by adding an interior node to the tree and growing its branches iteratively.
In the ID3 algorithm, the base cases are the situations where we stop splitting the dataset and construct a leaf node with an associated target level.
There are two important things to remember when designing these base cases:
1. The dataset of training instances considered at each of the interior nodes in the tree is not the complete dataset.
2. Once a feature has been tested, it is not considered for selection again along that path in the tree.
The ID3 algorithm uses the information gain metric to choose the best feature to test at each
node in the tree
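The pseudocode above can be rendered as a short recursive Python sketch; the nested-dict tree representation, the function names, and the list-of-dicts dataset format are illustrative assumptions rather than part of the algorithm's definition.

```python
import math
from collections import Counter

def entropy(D, target):
    counts = Counter(row[target] for row in D)
    return -sum((c / len(D)) * math.log2(c / len(D)) for c in counts.values())

def info_gain(D, feature, target):
    rem = 0.0
    for level in {row[feature] for row in D}:
        part = [row for row in D if row[feature] == level]
        rem += (len(part) / len(D)) * entropy(part, target)
    return entropy(D, target) - rem

def id3(D, features, target, parent_majority=None):
    """Returns a target level (leaf) or a nested dict {feature: {level: subtree}}."""
    if not D:                              # base case: empty partition ->
        return parent_majority             # majority level of the parent's dataset
    levels = {row[target] for row in D}
    if len(levels) == 1:                   # base case: all instances agree ->
        return levels.pop()                # leaf labelled with that target level
    majority = Counter(row[target] for row in D).most_common(1)[0][0]
    if not features:                       # base case: no features left ->
        return majority                    # leaf labelled with the majority level
    # Otherwise split on the feature with the highest information gain.
    best = max(features, key=lambda f: info_gain(D, f, target))
    tree = {best: {}}
    for level in {row[best] for row in D}:  # grow one branch per observed level
        partition = [row for row in D if row[best] == level]
        remaining = [f for f in features if f != best]
        tree[best][level] = id3(partition, remaining, target, majority)
    return tree
```

Each recursive call only sees the partition of the data routed to that node, and the tested feature is removed from the candidate list for that path, matching the two remarks above. In this sketch branches are grown only for feature levels observed in the partition, so the empty-partition base case is defensive.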
EXAMPLE
THE TOTAL ENTROPY FOR THIS DATASET
H(vegetation) = - [ 3/7 log2(3/7) + 2/7 log2(2/7) + 2/7 log2(2/7) ] = 1.5567 bits
CALCULATING THE ENTROPY OF EVERY FEATURE
Calculating the entropy for feature = STREAM
Consider the values taken by the feature STREAM, i.e. (true, false), and the vegetation level of the instances in each partition:
STREAM = true: riparian, riparian, conifer, chapparal
STREAM = false: chapparal, chapparal, conifer
Calculating the entropy for feature = ELEVATION
ELEVATION = high: chapparal, conifer, chapparal
ELEVATION = medium: chapparal, riparian
ELEVATION = highest and ELEVATION = low: one instance each
H(elevation = high) = - [ 2/3 log2(2/3) + 1/3 log2(1/3) + 0/3 log2(0/3) ] = 0.9183 bits
H(elevation = medium) = - [ 1/2 log2(1/2) + 1/2 log2(1/2) + 0/2 log2(0/2) ] = 1 bit
H(elevation = highest) = - [ 1/1 log2(1/1) + 0 + 0 ] = 0 bits
H(elevation = low) = - [ 1/1 log2(1/1) + 0 + 0 ] = 0 bits
Rem(elevation, D) = 3/7 (0.9183) + 2/7 (1) + 0 + 0 = 0.6793 bits
IG(elevation) = H(vegetation) - Rem(elevation) = 1.5567 - 0.6793 = 0.877 bits
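These figures can be reproduced with a few lines of Python; the seven-instance dataset below is an assumption reconstructed from the partitions and entropy values shown in this example, so treat it as a sketch rather than the slides' exact table.

```python
import math
from collections import Counter

# Assumed seven-instance vegetation dataset, reconstructed from the partitions above
D = [
    {"stream": False, "slope": "steep",    "elevation": "high",    "vegetation": "chapparal"},
    {"stream": True,  "slope": "moderate", "elevation": "low",     "vegetation": "riparian"},
    {"stream": True,  "slope": "steep",    "elevation": "medium",  "vegetation": "riparian"},
    {"stream": False, "slope": "steep",    "elevation": "medium",  "vegetation": "chapparal"},
    {"stream": False, "slope": "flat",     "elevation": "high",    "vegetation": "conifer"},
    {"stream": True,  "slope": "steep",    "elevation": "highest", "vegetation": "conifer"},
    {"stream": True,  "slope": "steep",    "elevation": "high",    "vegetation": "chapparal"},
]

def H(rows):
    counts = Counter(r["vegetation"] for r in rows)
    return -sum((c / len(rows)) * math.log2(c / len(rows)) for c in counts.values())

def rem(rows, feature):
    total = 0.0
    for level in {r[feature] for r in rows}:
        part = [r for r in rows if r[feature] == level]
        total += (len(part) / len(rows)) * H(part)
    return total

print(round(H(D), 4))                        # 1.5567 bits: total entropy of the dataset
print(round(rem(D, "elevation"), 4))         # 0.6793 bits
print(round(H(D) - rem(D, "elevation"), 4))  # 0.8774 bits: IG(elevation)
print(round(H(D) - rem(D, "stream"), 4))     # 0.3060 bits: IG(stream)
print(round(H(D) - rem(D, "slope"), 4))      # 0.5774 bits: IG(slope)
```

Under this assumed dataset, ELEVATION has the largest information gain, which is consistent with it being chosen as the root node.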
CONSIDERING THE FIRST NODE BELOW THE ROOT FOR SPLITTING (THE ELEVATION = MEDIUM PARTITION)
H(stream = false) = - [ 1/1 log2(1) ] = 0
H(stream = true) = - [ 1/1 log2(1) ] = 0
Rem(stream) = 1/2 (0) + 1/2 (0) = 0
IG(stream) = H(elevation = medium) - Rem(stream) = 1 - 0 = 1
H(slope = steep) = - [ 1/2 log2(1/2) + 1/2 log2(1/2) ] = 1
Rem(slope) = 2/2 (1) = 1
IG(slope) = H(elevation = medium) - Rem(slope) = 1 - 1 = 0
STREAM has a higher information gain than SLOPE and so is the best feature to split on at this node.
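A small check of this split, assuming the two-instance ELEVATION = medium partition implied by the calculations above:

```python
import math
from collections import Counter

# Assumed ELEVATION = medium partition: (stream, slope, vegetation)
medium = [("true", "steep", "riparian"), ("false", "steep", "chapparal")]

def H(labels):
    counts = Counter(labels)
    return -sum((c / len(labels)) * math.log2(c / len(labels)) for c in counts.values())

def ig(index):
    rem = 0.0
    for level in {row[index] for row in medium}:
        part = [row[2] for row in medium if row[index] == level]
        rem += (len(part) / len(medium)) * H(part)
    return H([row[2] for row in medium]) - rem

print(ig(0))  # IG(stream) = 1.0
print(ig(1))  # IG(slope)  = 0.0
```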
SPLITTING THE TREE FURTHER
Consider the partition where ELEVATION = high:
H(elevation = high) = - [ 2/3 log2(2/3) + 1/3 log2(1/3) + 0 ] = 0.9183
STREAM = false: chapparal, conifer; STREAM = true: chapparal
SLOPE = steep: chapparal, chapparal; SLOPE = flat: conifer
H(stream = false) = - [ 1/2 log2(1/2) + 1/2 log2(1/2) ] = 1
H(stream = true) = - [ 1/1 log2(1) ] = 0
Rem(stream) = 2/3 (1) + 1/3 (0) = 0.666
IG(stream) = H(elevation = high) - Rem(stream) = 0.9183 - 0.666 = 0.2517
H(slope = steep) = - [ 2/2 log2(2/2) ] = 0
H(slope = flat) = - [ 1/1 log2(1/1) ] = 0
Rem(slope) = 2/3 (0) + 1/3 (0) = 0
IG(slope) = H(elevation = high) - Rem(slope) = 0.9183 - 0 = 0.9183
SLOPE has a higher information gain than STREAM and so is chosen to split this node.
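And the corresponding check for the three-instance ELEVATION = high partition (values taken from the partition tables above):

```python
import math
from collections import Counter

# ELEVATION = high partition, as tabulated above: (stream, slope, vegetation)
high = [
    ("false", "steep", "chapparal"),
    ("false", "flat",  "conifer"),
    ("true",  "steep", "chapparal"),
]

def H(labels):
    counts = Counter(labels)
    return -sum((c / len(labels)) * math.log2(c / len(labels)) for c in counts.values())

def ig(index):
    rem = 0.0
    for level in {row[index] for row in high}:
        part = [row[2] for row in high if row[index] == level]
        rem += (len(part) / len(high)) * H(part)
    return H([row[2] for row in high]) - rem

print(round(ig(0), 4))  # IG(stream) ~ 0.2516
print(round(ig(1), 4))  # IG(slope)  ~ 0.9183
```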
FINAL DECISION TREE
The final tree has ELEVATION at the root: the ELEVATION = low and ELEVATION = highest branches end directly in leaf nodes, the ELEVATION = medium branch is split on STREAM, and the ELEVATION = high branch is split on SLOPE, with each of those branches ending in a pure leaf node.