
Topic: Decision Tree

DAT706: Data Science for Business


Date: 2 March 2023

Vinh Vo
Ho Chi Minh City University of Banking
Outline
• Introduction
• Decision Tree: theoretical review
– Entropy and Information Gain
– Extension versions
– Overfitting and Tree Pruning
• Case study: Banking dataset
• Worked exercises
Outline
• Introduction
• Decision Tree: theoretical review
– Entropy and Information Gain
– Extension versions
– Overfitting and Tree Pruning
• Case study: Banking dataset
• Worked exercises
Introductory Problems
• John works as a salesman at a computer store. He has collected data about his previous customers, as shown in the table on the right.
• John wants to use these data to predict whether a new customer will buy a computer or not, applying rules based on information such as age, income, student status, and credit rating.
• This lecture introduces an algorithm for this problem: the ID3 Decision Tree.
Review: The Classification Problem
• General pattern: Input x^(i) → Hypothesis h_θ(x) (classifier) → Output y^(i) ∈ {0, 1}
• Previous lecture: Logistic Model. Now: Decision Tree.
• These problems are binary classification:
– The output is a discrete value, and
– It takes only one out of two possible values
• 0: “Negative Class” (e.g., spam email)
• 1: “Positive Class” (e.g., not spam)
• In the following we build a Decision Tree on another dataset, called “play-tennis”. We leave the customer dataset on the previous slide as an exercise at the end.
Data Set for “Play-Tennis” Example

ID   Outlook   Temp.  Humidity  Wind    Play Tennis
D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Strong  Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Weak    Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No

• This is a typical dataset for Decision Tree illustration.
• 14 objects in two classes {Y, N}; each row has 4 attributes.
• Dom{Outlook} = {Sunny, Overcast, Rain}
• Dom{Temperature} = {Hot, Mild, Cool}
• Dom{Humidity} = {High, Normal}
• Dom{Wind} = {Weak, Strong}
• We will build an ID3 Decision Tree for “Play-Tennis” step by step.
Two Possible Decision Trees for “Play-Tennis”
• Occam’s Principle: “If two theories explain the facts equally well, then the simpler theory is preferred.”
⇒ Prefer the smallest tree that correctly classifies all training examples.
• The tree that selects “Outlook” at the root is much simpler.
• Question: How do we select a good attribute to split a decision node?
Which attribute is better?
• The “play-tennis” set S contains 9 positive objects (+) and 5 negative objects (−), denoted by [9+, 5−].
• If the attributes “Humidity” and “Wind” split S into sub-nodes with the proportions of positive and negative objects shown below, which attribute is better?
• Idea: select the attribute whose sub-nodes contain purer data.
• Question: How can we measure the purity of a set of objects? Solution: Entropy and Information Gain.
Outline
• Introduction
• Decision Tree: theoretical review
– Entropy and Information Gain
– Extension versions
– Overfitting and Tree Pruning
• Case study: Banking dataset
• Worked exercises
Entropy: Definition
• Entropy characterizes the impurity (purity) of an arbitrary collection of objects.
– S: a collection of positive and negative objects
– p₊: proportion of positive objects in S
– p₋: proportion of negative objects in S
– Our dataset: |S| = 14, p₊ = 9/14, p₋ = 5/14
• Entropy is defined by:
  Entropy(S) = −p₊·log₂(p₊) − p₋·log₂(p₋)
Entropy: Definition
• The figure “Entropy for Binary Class” (omitted here) plots the entropy of a binary classification as the proportion p of positive objects varies between 0 and 1.
• If the collection has k distinct classes of objects, then the entropy is defined by:
  Entropy(S) = Σ_{i=1..k} −pᵢ·log₂(pᵢ) = −p₁·log₂(p₁) − p₂·log₂(p₂) − ⋯ − pₖ·log₂(pₖ)
Entropy: Example
• From the Play-Tennis dataset:
  Entropy([9+, 5−]) = −(9/14)·log₂(9/14) − (5/14)·log₂(5/14) = 0.940
• If all members of S belong to the same class (the purest set), then Entropy(S) = 0. For example, if all members are positive (p₊ = 1), then p₋ = 0 and Entropy(S) = −1·log₂(1) − 0·log₂(0) = 0.
• If the collection contains an equal number of positive and negative examples (p₊ = p₋ = 0.5), then Entropy(S) = 1 (maximal impurity).
• If the numbers of positive and negative examples are unequal, the entropy lies between 0 and 1.
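To make the definition concrete, here is a minimal Python sketch (the helper name entropy is ours) that reproduces the values above:

    import math

    def entropy(counts):
        """Entropy of a node given its class counts, e.g. [9, 5] for [9+, 5-]."""
        total = sum(counts)
        return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

    print(round(entropy([9, 5]), 3))   # 0.94  -- the "play-tennis" set
    print(entropy([14, 0]))            # 0.0   -- a pure node
    print(entropy([7, 7]))             # 1.0   -- a maximally impure node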
Information Gain: Definition
• Information Gain measures the effectiveness of an attribute in classifying data.
• It is the expected reduction in entropy caused by partitioning the objects according to that attribute:
  Gain(S, A) = Entropy(S) − Σ_{v ∈ Value(A)} (|S_v| / |S|) · Entropy(S_v)
– Value(A): the set of all possible values of the attribute A.
– S_v: the subset of S for which A has value v.
Information Gain: Example
• S = [9+, 5−]
• Value(Wind) = {Weak, Strong}
• S_Weak = [6+, 2−];  S_Strong = [3+, 3−]
• Gain(S, Wind) = Entropy(S) − (8/14)·Entropy(S_Weak) − (6/14)·Entropy(S_Strong)
  = 0.940 − (8/14)·0.811 − (6/14)·1.0 ≈ 0.048

ID   Wind    Play Tennis
D1   Weak    No
D8   Weak    No
D3   Weak    Yes
D4   Weak    Yes
D5   Weak    Yes
D9   Weak    Yes
D10  Weak    Yes
D13  Weak    Yes
D2   Strong  No
D6   Strong  No
D14  Strong  No
D7   Strong  Yes
D11  Strong  Yes
D12  Strong  Yes
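The same computation in Python, reusing the entropy helper sketched earlier (gain_from_counts is our own name):

    def gain_from_counts(parent, splits):
        """Information gain given the parent's class counts and the class
        counts of each sub-node, e.g. parent [9, 5], splits [[6, 2], [3, 3]]."""
        n = sum(parent)
        return entropy(parent) - sum(sum(s) / n * entropy(s) for s in splits)

    # Gain(S, Wind): S = [9+, 5-], S_Weak = [6+, 2-], S_Strong = [3+, 3-]
    print(round(gain_from_counts([9, 5], [[6, 2], [3, 3]]), 3))   # 0.048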
Which attribute is a better classifier?
Information gain of each attribute:
– Gain(S, Outlook) = 0.246
– Gain(S, Humidity) = 0.151
– Gain(S, Wind) = 0.048
– Gain(S, Temperature) = 0.029
⇒ Root level: split on Outlook.
Next level: consider the remaining attributes after splitting by Outlook.
Next step in growing the decision tree

ID   Outl.     Temp.  Humi.   Wind    Play?
D1   Sunny     Hot    High    Weak    No
D2   Sunny     Hot    High    Strong  No
D8   Sunny     Mild   High    Weak    No
D9   Sunny     Cool   Normal  Weak    Yes
D11  Sunny     Mild   Normal  Strong  Yes
D3   Overcast  Hot    High    Weak    Yes
D7   Overcast  Cool   Normal  Strong  Yes
D12  Overcast  Mild   High    Strong  Yes
D13  Overcast  Hot    Normal  Weak    Yes
D6   Rain      Cool   Normal  Strong  No
D14  Rain      Mild   High    Strong  No
D4   Rain      Mild   High    Weak    Yes
D5   Rain      Cool   Normal  Weak    Yes
D10  Rain      Mild   Normal  Weak    Yes

• S_Sunny = {D1, D2, D8, D9, D11}
– Gain(S_Sunny, Humidity) = 0.970
– Gain(S_Sunny, Temperature) = 0.570
– Gain(S_Sunny, Wind) = 0.019
– We select the attribute Humidity.
• Similarly, on S_Rain we select the attribute Wind.
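The gains inside the Sunny branch can be checked the same way with the gain_from_counts helper from earlier; small differences from the slide’s figures are only rounding:

    # Outlook = Sunny subset (D1, D2, D8, D9, D11): 2 Yes, 3 No -> [2, 3]
    print(round(gain_from_counts([2, 3], [[0, 3], [2, 0]]), 3))          # Humidity (High/Normal)      ~0.971
    print(round(gain_from_counts([2, 3], [[0, 2], [1, 1], [1, 0]]), 3))  # Temperature (Hot/Mild/Cool) ~0.571
    print(round(gain_from_counts([2, 3], [[1, 2], [1, 1]]), 3))          # Wind (Weak/Strong)          ~0.020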
The Resulting Decision Tree & Its Rules
The final tree (figure omitted) has Outlook at the root: the Sunny branch splits on Humidity (High → No, Normal → Yes), the Overcast branch is a pure leaf (Yes), and the Rain branch splits on Wind (Strong → No, Weak → Yes).
Decision Tree: Stopping Condition
(Figure omitted: a tree with its root node and leaf nodes.)
The growing process can be stopped if:
• The entropy in a node is 0 (purest), or
• The number of data points in a node is less than a threshold α, or
• The path from a sub-node to the root node reaches a threshold (depth) β, or
• The reduction in entropy is less than a threshold δ, or
• The total number of leaf nodes reaches a threshold γ.
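These stopping conditions map naturally onto the parameters of a recursive tree builder. A minimal ID3 sketch (reusing the entropy and gain_from_counts helpers from earlier; the parameter names and defaults are ours, not part of the lecture):

    from collections import Counter

    def id3(rows, attributes, target, depth=0,
            min_samples=2, max_depth=10, min_gain=1e-3):
        """rows: list of dicts, attributes: list of column names, target: label column.
        min_samples, max_depth and min_gain play the role of alpha, beta and delta."""
        labels = [r[target] for r in rows]
        majority = Counter(labels).most_common(1)[0][0]
        # Stopping conditions: pure node, too few points, depth limit, no attributes left.
        if len(set(labels)) == 1 or len(rows) < min_samples \
                or depth >= max_depth or not attributes:
            return majority

        def gain_of(a):
            groups = {}
            for r in rows:
                groups.setdefault(r[a], []).append(r[target])
            return gain_from_counts(list(Counter(labels).values()),
                                    [list(Counter(g).values()) for g in groups.values()])

        best = max(attributes, key=gain_of)
        if gain_of(best) < min_gain:              # reduction in entropy too small
            return majority
        tree = {best: {}}
        for value in set(r[best] for r in rows):  # one branch per attribute value
            subset = [r for r in rows if r[best] == value]
            rest = [a for a in attributes if a != best]
            tree[best][value] = id3(subset, rest, target, depth + 1,
                                    min_samples, max_depth, min_gain)
        return tree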
Outline
• Introduction
• Decision Tree: theoretical review
– Entropy and Information Gain
– Extension versions
– Overfitting and Tree Pruning
• Case study: Banking dataset
• Worked exercises
Decision Tree: Extension Versions
• For non-categorical attributes, we need to convert the continuous values into discrete values by setting a threshold. For example:
– If threshold_Temp = 83, then (Temp. ≥ 83) ∈ {True, False}
– If threshold_Humi = 80, then (Humi. ≥ 80) ∈ {True, False}
• The threshold should be a value that maximizes the gain for that attribute.
• The C4.5 Decision Tree performs a binary split based on such a threshold value.

ID   Outlook   Temp. (≥83?)  Humi. (≥80?)  Wind    Play?
D1   Sunny     85 (T)        85 (T)        Weak    No
D2   Sunny     80 (F)        90 (T)        Strong  No
D3   Overcast  83 (T)        78 (F)        Weak    Yes
D4   Rain      70 (F)        96 (T)        Weak    Yes
D5   Rain      68 (F)        80 (T)        Weak    Yes
D6   Rain      65 (F)        70 (F)        Strong  No
D7   Overcast  64 (F)        65 (F)        Strong  Yes
D8   Sunny     72 (F)        95 (T)        Weak    No
D9   Sunny     69 (F)        70 (F)        Weak    Yes
D10  Rain      75 (F)        80 (T)        Weak    Yes
D11  Sunny     75 (F)        70 (F)        Strong  Yes
D12  Overcast  72 (F)        90 (T)        Strong  Yes
D13  Overcast  81 (F)        75 (F)        Weak    Yes
D14  Rain      71 (F)        80 (T)        Strong  No
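A small sketch of the threshold search in Python (reusing gain_from_counts from earlier; the midpoint strategy is a common choice, and the gain-maximizing threshold need not equal the illustrative value 80 used above):

    from collections import Counter

    def best_threshold(values, labels):
        """Pick the numeric split point that maximizes information gain,
        trying midpoints between consecutive distinct sorted values."""
        parent = list(Counter(labels).values())
        candidates = sorted(set(values))
        best_t, best_g = None, -1.0
        for lo, hi in zip(candidates, candidates[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in zip(values, labels) if x <= t]
            right = [y for x, y in zip(values, labels) if x > t]
            g = gain_from_counts(parent, [list(Counter(left).values()),
                                          list(Counter(right).values())])
            if g > best_g:
                best_t, best_g = t, g
        return best_t, best_g

    # Humidity values and Play labels for D1..D14 from the table above.
    humidity = [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80]
    play = ['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
    print(best_threshold(humidity, play))   # roughly (82.5, 0.10) on this data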
Extension Version: An Example - C4.5
• In the ID3 algorithm, we calculated a gain for each attribute. Here (C4.5), we calculate gain ratios instead of gains:
  GainRatio(A) = Gain(A) / SplitInfo(A)
  SplitInfo(A) = Σᵢ −(|Sᵢ| / |S|)·log₂(|Sᵢ| / |S|), where S₁, …, Sₙ are the subsets produced by splitting S on A.
• The dataset is the thresholded “play-tennis” table from the previous slide.
• Details of C4.5 can be found here.
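A short Python sketch of the gain ratio, reusing the math import and the gain_from_counts helper from the earlier sketches (split_info and gain_ratio are our names):

    def split_info(sizes):
        """SplitInfo(A): the entropy of the split sizes themselves."""
        n = sum(sizes)
        return sum(-(s / n) * math.log2(s / n) for s in sizes if s > 0)

    def gain_ratio(parent, splits):
        return gain_from_counts(parent, splits) / split_info([sum(s) for s in splits])

    # Wind on the play-tennis data: 8 Weak objects vs 6 Strong objects.
    print(round(gain_ratio([9, 5], [[6, 2], [3, 3]]), 3))   # ~0.049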
Decision Tree: Extension Versions
• Besides Information Gain, other measures can be used for attribute selection, such as the Gain Ratio (C4.5) and the Gini index (CART).
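For example, the Gini index rewards the same kind of purity as entropy; a minimal sketch (Gini is the classification criterion used by CART, and it is also scikit-learn’s default):

    def gini(counts):
        """Gini impurity of a node given its class counts."""
        total = sum(counts)
        return 1.0 - sum((c / total) ** 2 for c in counts)

    print(round(gini([9, 5]), 3))   # ~0.459 for the play-tennis set
    print(gini([7, 7]))             # 0.5  -- maximally impure binary node
    print(gini([14, 0]))            # 0.0  -- pure node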
Outline
• Introduction
• Decision Tree: theoretical review
– Entropy and Information Gain
– Extension versions
– Overfitting and Tree Pruning
• Case study: Banking dataset
• Worked exercises
Decision Tree: Overfitting
• Classification task: fit a “model” to a set of training data, so as to be able to make reliable predictions on general, unseen data.
• Overfitting: the statistical model describes random error or noise instead of the underlying relationship.
• Overfitting occurs when a model fits the training set very well but performs poorly on general data.
Decision Tree: Overfitting
• The generated tree may overfit the training data:
– Too many branches, some of which may reflect anomalies due to noise or outliers.
– The result is poor performance on unseen objects.
• Two approaches to avoid overfitting:
– Prepruning: halt tree construction early; do not split a node if this would make the goodness measure fall below a threshold.
  It is difficult to choose an appropriate threshold.
– Postpruning: remove branches from a “fully grown” tree, obtaining a sequence of progressively pruned trees.
  Use a set of data different from the training data to decide which is the “best pruned tree”.
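One common way to do postpruning in practice is cost-complexity pruning in scikit-learn, selecting the pruning strength on held-out data as described above. A sketch (using a built-in dataset as a stand-in, not the lecture’s data):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    # Candidate pruning strengths derived from the fully grown tree.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

    best_alpha, best_score = 0.0, 0.0
    for alpha in path.ccp_alphas:
        tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
        score = tree.score(X_val, y_val)   # held-out data picks the "best pruned tree"
        if score > best_score:
            best_alpha, best_score = alpha, score
    print(best_alpha, best_score)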
Decision Tree: Pros & Cons
Pros:
• Understandable prediction rules are created from the training data.
• Builds a short tree quickly.
• Only needs to test enough attributes until all data are classified.
• Finding leaf nodes enables test data to be pruned, reducing the number of tests.
• The whole dataset is considered when creating the tree.
Cons:
• Only one attribute is considered at a time.
• Computationally expensive for continuous data (C4.5).
• If new data are incorrectly classified near the root level, the result can vary greatly.
• If an attribute can take many categorical values, the decision tree may contain many branches and nodes for that attribute, which is not useful for prediction.
ID3 Decision Tree: An Overview
(Pipeline figure summarized as text.)
Training phase: raw data → feature extraction (feature engineering; supported by data exploration, data visualization, correlation matrix, …) → extracted features → training set {x^(i), y^(i)} → learning algorithm: maximize Information Gain → decision tree.
Testing phase: testing set {x^(t), y^(t)} → decision tree → predicted ŷ ∈ {0, 1}; evaluation metrics: Accuracy, F-Score, TPR, FNR, etc.
Extensions: C4.5, CART.
Outline
• Introduction
• Decision Tree: theoretical review
– Entropy and Information Gain
– Extension versions
– Overfitting and Tree Pruning
• Case study: Banking dataset
• Worked exercises
Case Study: Banking Dataset
• The dataset comes from the UCI Machine Learning
repository, and it is related to direct marketing campaigns
(phone calls) of a Portuguese banking institution.
• Goal: predict whether the client will buy a product or not
(binary classification problem)
• Dataset:
Ø 41,188 instances.
Ø Each instance consists of 21 attributes.
Case Study: Banking Dataset

#    Attribute  Description
1    age        numeric value
2    job        job type (categorical: admin, retired, student, unknown, etc.)
3    marital    marital status (categorical: divorced, married, single, unknown)
5    default    has credit in default? (categorical: no, yes, unknown)
6    housing    has housing loan? (categorical: no, yes, unknown)
7    loan       has personal loan? (categorical: no, yes, unknown)
15   poutcome   outcome of the previous marketing campaign (categorical: failure, nonexistent, success)
…    …          …
21   y          target variable: has the client bought the product? (1: Yes, 0: No)
Banking Dataset: Snapshot

Source: https://www.kaggle.com
Case Study: Explore The Data
Customer Job Distribution Marital status distribution
Case Study: Explore The Data
Barplot for credit in default Barplot for housing loan
Case Study: Explore The Data
Barplot for personal loan; barplot for previous marketing campaign outcome
Case Study: Explore The Data
• Barplot for the y variable: 36,548 “no” vs 4,640 “yes” (the classes are imbalanced).
• Correlation matrix — observation: the attributes are nearly independent of each other.
Case Study: Feature Selection
• The features used in this lecture are:
– job type
– contact
– previous
– euribor3m
– the outcome of the previous marketing campaigns
• Decision Tree version: CART
• Drop the variables that we do not need.
• Split the data into training and test sets (41,188 instances in total):

                      Training Set  Testing Set
#positive instances   25,638        10,910
#negative instances   3,193         1,447
Total instances       28,831        12,357
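A sketch of this setup with scikit-learn’s CART implementation; the CSV file name, separator, and exact column names are assumptions about the UCI/Kaggle export, and max_depth is an arbitrary choice:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    df = pd.read_csv("bank-additional-full.csv", sep=";")
    features = ["job", "contact", "previous", "euribor3m", "poutcome"]
    X = pd.get_dummies(df[features])            # one-hot encode the categorical features
    y = (df["y"] == "yes").astype(int)          # 1 if the client bought the product

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = DecisionTreeClassifier(criterion="gini", max_depth=5, random_state=0)  # CART-style
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))            # test-set accuracy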
Case Study: Experiment Results
Confusion matrix on the test set (12,357 instances):

                  Predicted: No   Predicted: Yes
True label: No    10,749          161
True label: Yes   1,144           303

Accuracy = (10,749 + 303) / 12,357 ≈ 89.44%
Incorrect predictions = (161 + 1,144) / 12,357 ≈ 10.56%

           Precision  Recall  F1-Score
Class 0    0.90       0.99    0.94
Class 1    0.65       0.21    0.32
Avg/total  0.87       0.89    0.87

• F1-Score: the higher, the better. We will return to these metrics later in the course.
• We may try different features to increase the classifier’s performance.
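Continuing the sketch above, the confusion matrix and per-class precision/recall/F1 can be produced with scikit-learn:

    from sklearn.metrics import classification_report, confusion_matrix

    y_pred = clf.predict(X_test)
    print(confusion_matrix(y_test, y_pred))       # rows: true labels, columns: predicted labels
    print(classification_report(y_test, y_pred))  # precision, recall, F1 per class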
Bank Dataset: Tree Visualization (1/2)
Bank Dataset: Tree Visualization (2/2)
Outline
• Introduction
• Decision Tree: theoretical review
– Entropy and Information Gain
– Extension versions
– Overfitting and Tree Pruning
• Case study: Banking dataset
• Worked exercises
Worked Exercise: The Simpsons

Person  Hair Length  Weight  Age  Class
Homer   0″           250     36   M
Marge   10″          150     34   F
Bart    2″           90      10   M
Lisa    6″           78      8    F
Maggie  4″           20      1    F
Abe     1″           170     70   M
Selma   8″           160     41   F
Otto    10″          180     38   M
Krusty  6″           200     45   M
Comic   8″           290     38   ?

Convert the attributes into categorical values:
• threshold(Hair) = 5
• threshold(Weight) = 160
• threshold(Age) = 40
Goal: determine the information gain for each attribute:
• Gain(Hair)?
• Gain(Weight)?
• Gain(Age)?
Note: the threshold at each level of the tree may be different.
This exercise is adapted from the lecture on ID3 by Prof. Allan Neymark.
Entropy(S) = −(p/(p+n))·log₂(p/(p+n)) − (n/(p+n))·log₂(n/(p+n))
Entropy(4F, 5M) = −(4/9)·log₂(4/9) − (5/9)·log₂(5/9) = 0.9911

Let us try splitting on Hair Length (Hair Length ≤ 5?):
• yes branch (1F, 3M): Entropy(1F, 3M) = −(1/4)·log₂(1/4) − (3/4)·log₂(3/4) = 0.8113
• no branch (3F, 2M): Entropy(3F, 2M) = −(3/5)·log₂(3/5) − (2/5)·log₂(2/5) = 0.9710

Gain(A) = E(current set) − Σᵥ (|Sᵥ|/|S|)·E(child set v)
Gain(Hair Length ≤ 5) = 0.9911 − (4/9 · 0.8113 + 5/9 · 0.9710) = 0.0911


Entropy(S) = −(p/(p+n))·log₂(p/(p+n)) − (n/(p+n))·log₂(n/(p+n))
Entropy(4F, 5M) = −(4/9)·log₂(4/9) − (5/9)·log₂(5/9) = 0.9911

Let us try splitting on Weight (Weight ≤ 160?):
• yes branch (4F, 1M): Entropy(4F, 1M) = −(4/5)·log₂(4/5) − (1/5)·log₂(1/5) = 0.7219
• no branch (0F, 4M): Entropy(0F, 4M) = −(0/4)·log₂(0/4) − (4/4)·log₂(4/4) = 0

Gain(Weight ≤ 160) = 0.9911 − (5/9 · 0.7219 + 4/9 · 0) = 0.5900


Entropy(S) = −(p/(p+n))·log₂(p/(p+n)) − (n/(p+n))·log₂(n/(p+n))
Entropy(4F, 5M) = −(4/9)·log₂(4/9) − (5/9)·log₂(5/9) = 0.9911

Let us try splitting on Age (Age ≤ 40?):
• yes branch (3F, 3M): Entropy(3F, 3M) = −(3/6)·log₂(3/6) − (3/6)·log₂(3/6) = 1
• no branch (1F, 2M): Entropy(1F, 2M) = −(1/3)·log₂(1/3) − (2/3)·log₂(2/3) = 0.9183

Gain(Age ≤ 40) = 0.9911 − (6/9 · 1 + 3/9 · 0.9183) = 0.0183
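The three gains can be double-checked with the gain_from_counts helper sketched earlier (class counts written as [F, M]):

    print(round(gain_from_counts([4, 5], [[1, 3], [3, 2]]), 4))   # Hair Length <= 5 : 0.0911
    print(round(gain_from_counts([4, 5], [[4, 1], [0, 4]]), 4))   # Weight <= 160    : 0.59
    print(round(gain_from_counts([4, 5], [[3, 3], [1, 2]]), 4))   # Age <= 40        : 0.0183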


At the root level:
• Gain(Hair Length ≤ 5) = 0.0911
• Gain(Weight ≤ 160) = 0.5900
• Gain(Age ≤ 40) = 0.0183
⇒ We split the root node on Weight ≤ 160.
At the next level (on the Weight ≤ 160 branch), we find that Hair Length ≤ 2 is the best split. The final decision tree is given on the next slide.
Convert Decision Trees into Rules

Final decision tree:
• Weight ≤ 160?
– no → Male
– yes → Hair Length ≤ 2?
  • yes → Male
  • no → Female

How would these people be classified?
Rules to Classify Males/Females
• If Weight is greater than 160, classify as Male.
• Else if Hair Length is less than or equal to 2, classify as Male.
• Else classify as Female.
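These rules translate directly into a tiny Python function; applying it to the unlabeled row (Comic: 8″, 290, 38) shows how a new person would be classified:

    def classify(hair_length, weight, age):
        """The extracted rules as code (age is not used by the final tree)."""
        if weight > 160:
            return "Male"
        elif hair_length <= 2:
            return "Male"
        return "Female"

    print(classify(hair_length=8, weight=290, age=38))   # Comic -> Male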
Exercise 1
The dataset that John collected about his previous customers is shown on the right.
Question: Apply the ID3 algorithm to build a decision tree for the concept “buy-computer”.
Solution
• Age ≤ 30 → Student?
– No → No
– Yes → Yes
• Age 31–40 → Yes
• Age > 40 → Credit Rating?
– Fair → Yes
– Excellent → No
Exercise 2
• The entropy of a binary classification is shown in the figure “Entropy for Binary Class” on the right.
• Explain why the entropy is maximal when p = 0.5.

Entropy(S) = Σ_{i=1..2} −pᵢ·log₂(pᵢ) = −p₁·log₂(p₁) − p₂·log₂(p₂)
Note that p₂ = 1 − p₁.
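A sketch of the calculus argument (writing p for p₁, so Entropy(S) = H(p)):

    H(p)  = −p·log₂(p) − (1 − p)·log₂(1 − p),  for 0 < p < 1
    H′(p) = −log₂(p) + log₂(1 − p) = log₂((1 − p) / p)
    H′(p) = 0  ⇔  (1 − p)/p = 1  ⇔  p = 1/2
    H″(p) = −1 / (p·(1 − p)·ln 2) < 0, so p = 1/2 is a maximum, with H(1/2) = 1.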
What we have learned so far?
• Introduction
• Decision Tree: theoretical review
– Entropy and Information Gain
– Extension versions
– Overfitting and Tree Pruning
• Case study: Banking dataset
• Worked exercises
THE END
