L5 - Decision Tree - B
Vinh Vo
Ho Chi Minh city University of Banking
Outline
• Introduction
• Decision Tree: theoretical review
– Entropy and Information Gain
– Extension versions
– Overfitting and Tree Pruning
• Case study: Banking dataset
• Worked exercises
Introductory Problems
• John works as a salesman at a computer store. He collected data about his previous customers, as shown in the table on the right.
• John wants to use these data to predict whether a new customer will buy a computer or not, by applying rules based on information such as age, income, student status, and credit rating.
• This lecture introduces an algorithm for this question: the ID3 Decision Tree.
Review: The Classification Problem
• General pattern: input x^(i) → hypothesis h_θ(x) (classifier) → output y^(i) ∈ {0, 1}
– Previous lecture: the Logistic Model; now: the Decision Tree
• 0: “Negative Class” (e.g., spam email)
• 1: “Positive Class” (e.g., not spam)
• These problems are binary classification:
– The output is a discrete value, and
– It takes only one out of two possible values
• In what follows we build a Decision Tree on another dataset, called “play-tennis”. We leave the customer dataset on the previous slide as an exercise at the end.
Data Set for “Play-Tennis” Example
ID   Outlook   Temp.  Humidity  Wind    Play Tennis
D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Strong  Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Weak    Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No
• This is a typical dataset for Decision Tree illustration
• 14 objects in two classes {Y, N}; each object is described by 4 attributes
• Dom{Outlook} = {Sunny, Overcast, Rain}
• Dom{Temperature} = {Hot, Mild, Cool}
• Dom{Humidity} = {High, Normal}
• Dom{Wind} = {Weak, Strong}
• We will build an ID3 Decision Tree for “Play-Tennis” step by step
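For the worked computations later in the lecture, the table above can be written down in Python along the following lines; the variable name play_tennis and the tuple layout are my own choices, not part of the slides.

```python
# Play-Tennis data as (ID, Outlook, Temp, Humidity, Wind, PlayTennis) tuples.
play_tennis = [
    ("D1",  "Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("D2",  "Sunny",    "Hot",  "High",   "Strong", "No"),
    ("D3",  "Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("D4",  "Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("D5",  "Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("D6",  "Rain",     "Cool", "Normal", "Strong", "No"),
    ("D7",  "Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("D8",  "Sunny",    "Mild", "High",   "Weak",   "No"),
    ("D9",  "Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("D10", "Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("D11", "Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("D12", "Overcast", "Mild", "High",   "Strong", "Yes"),
    ("D13", "Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("D14", "Rain",     "Mild", "High",   "Strong", "No"),
]
COLUMNS = ["ID", "Outlook", "Temp", "Humidity", "Wind", "PlayTennis"]
```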
Two Possible Decision Trees for “Play-Tennis”
Entropy
• If the collection S contains k distinct classes of objects, then the entropy is defined by:
Entropy(S) = Σ_{i=1}^{k} −p_i log₂ p_i
           = −p_1 log₂ p_1 − p_2 log₂ p_2 − … − p_k log₂ p_k
Entropy: Example
• From the Play-Tennis dataset:
Entropy(9+, 5−) = −(9/14) log₂(9/14) − (5/14) log₂(5/14) = 0.940
• If all members of S belong to the same class (the purest set), then Entropy(S) = 0. For example, if all members are positive (p_+ = 1), then p_− = 0, and Entropy(S) = −1·log₂ 1 − 0·log₂ 0 = 0.
• If the collection contains an equal number of positive and negative examples (p_+ = p_− = 0.5), then Entropy(S) = 1 (maximum impurity).
• If the numbers of positive and negative examples are unequal, then the entropy is between 0 and 1.
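Using the entropy_from_counts sketch above, these three cases can be checked numerically:

```python
print(round(entropy_from_counts([9, 5]), 3))   # 0.94  (9 Yes, 5 No in Play-Tennis)
print(entropy_from_counts([14, 0]))            # 0.0   -> a pure set
print(entropy_from_counts([7, 7]))             # 1.0   -> equal split, maximum impurity
```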
Information Gain: Definition
• Information Gain is a measure of the effectiveness of an
attribute in classifying data.
• It is the expected reduction in entropy caused by partitioning
the objects according to this attribute.
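A hedged sketch of Gain(S, A) applied to the play_tennis tuples encoded earlier; the function names are my own, and the gains quoted in the comment are the values this computation is expected to reproduce for the Play-Tennis data.

```python
import math
from collections import Counter, defaultdict

def _entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index, target_index=-1):
    """Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)."""
    subsets = defaultdict(list)
    for r in rows:
        subsets[r[attr_index]].append(r[target_index])
    remainder = sum(len(v) / len(rows) * _entropy(v) for v in subsets.values())
    return _entropy([r[target_index] for r in rows]) - remainder

# On the play_tennis tuples this is expected to give roughly:
#   Outlook 0.246, Humidity 0.151, Wind 0.048, Temp 0.029
for name, idx in [("Outlook", 1), ("Temp", 2), ("Humidity", 3), ("Wind", 4)]:
    print(name, round(information_gain(play_tennis, idx), 3))
```

The attribute with the largest gain (Outlook) is the one ID3 would place at the root of the tree.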
Source: https://www.kaggle.com
Case Study: Explore The Data
Customer Job Distribution Marital status distribution
Case Study: Explore The Data
Barplot for credit in default Barplot for housing loan
Case Study: Explore The Data
Barplot for personal loan Barplot for previous marketing campaign outcome
Case Study: Explore The Data
Barplot for the y variable (class counts: 36548 vs 4640) Correlation matrix
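A sketch of how these exploratory plots could be produced with pandas/matplotlib; the file name bank-additional-full.csv, the ';' separator, and the column names are assumptions about the Kaggle bank-marketing data, not details stated in the slides.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumed file name and separator for the bank-marketing data; adjust to your copy.
bank = pd.read_csv("bank-additional-full.csv", sep=";")

# One barplot per categorical variable explored in the slides.
for col in ["job", "marital", "default", "housing", "loan", "poutcome", "y"]:
    bank[col].value_counts().plot(kind="bar", title=f"Barplot for {col}")
    plt.tight_layout()
    plt.show()

# Correlation matrix of the numeric columns.
print(bank.select_dtypes("number").corr().round(2))
```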
Rules to Classify Males/Females
(Decision tree figure: Yes/No branches leading to Male/Female leaves; the entropy of each candidate split is shown below.)
• Entropy(1F, 3M) = −(1/4) log₂(1/4) − (3/4) log₂(3/4) = 0.8113
• Entropy(3F, 2M) = −(3/5) log₂(3/5) − (2/5) log₂(2/5) = 0.9710
• Entropy(4F, 1M) = −(4/5) log₂(4/5) − (1/5) log₂(1/5) = 0.7219
• Entropy(0F, 4M) = −(0/4) log₂(0/4) − (4/4) log₂(4/4) = 0
• Entropy(3F, 3M) = −(3/6) log₂(3/6) − (3/6) log₂(3/6) = 1
• Entropy(1F, 2M) = −(1/3) log₂(1/3) − (2/3) log₂(2/3) = 0.9183
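These six values can be reproduced with the entropy_from_counts sketch from the entropy section:

```python
for f, m in [(1, 3), (3, 2), (4, 1), (0, 4), (3, 3), (1, 2)]:
    print(f"Entropy({f}F,{m}M) = {entropy_from_counts([f, m]):.4f}")
# expected: 0.8113, 0.9710, 0.7219, 0.0000, 1.0000, 0.9183
```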
Exercise 2
• The entropy of a binary classification is shown in the figure on the right (“Entropy for Binary Class”).
• Explain why the entropy is maximum when p = 0.5.
• Recall:
Entropy(S) = Σ_{i=1}^{2} −p_i log₂ p_i = −p_1 log₂ p_1 − p_2 log₂ p_2
Note that p_2 = 1 − p_1
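As a numerical hint only (the exercise asks for an explanation), a grid sweep with an assumed helper binary_entropy shows where the curve peaks:

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

grid = [i / 100 for i in range(101)]            # p = 0.00, 0.01, ..., 1.00
p_star = max(grid, key=binary_entropy)
print(p_star, binary_entropy(p_star))           # expected: 0.5 1.0
```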
What have we learned so far?
• Introduction
• Decision Tree: theoretical review
– Entropy and Information Gain
– Extension versions
– Overfitting and Tree Pruning
• Case study: Banking dataset
• Worked exercises
THE END