
Decision Trees

Jihoon Yang
Machine Learning Research Laboratory
Department of Computer Science & Engineering
Sogang University
Email: [email protected]
Decision tree representation

• In the simplest case
  – Each internal node tests on an attribute
  – Each branch corresponds to an attribute value
  – Each leaf node corresponds to a class label

• In general
  – Each internal node corresponds to a test (on input instances) with mutually exclusive and exhaustive outcomes – tests may be univariate or multivariate
  – Each branch corresponds to an outcome of a test
  – Each leaf node corresponds to a class label

Decision tree representation

Data set (attributes x, y; class c):

  Example   x   y   c
  1         1   1   A
  2         0   1   B
  3         1   0   A
  4         0   0   B

Tree 1 – test x at the root:
  x = 1 → c = A
  x = 0 → c = B

Tree 2 – test y at the root, then test x in each branch:
  y = 1:  x = 1 → c = A,  x = 0 → c = B
  y = 0:  x = 1 → c = A,  x = 0 → c = B

• Should we choose Tree 1 or Tree 2? Why?

Learning decision tree classifiers

• Ockham’s razor recommends that we pick the simplest decision tree that is consistent with the training set

• There are far too many trees that are consistent with a training set

• Searching for the simplest tree that is consistent with the training set is not typically computationally feasible

• Solution
  – Use a greedy algorithm – not guaranteed to find the simplest tree – but works well in practice
  – Or restrict the space of hypotheses to a subset of simple trees

Information and Shannon Entropy

• Suppose we have a message that conveys the result of a random experiment with m possible discrete outcomes, with probabilities p_1, p_2, ..., p_m

• The expected information content of such a message is called the entropy of the probability distribution

  H(p_1, p_2, ..., p_m) = \sum_{i=1}^{m} p_i I(p_i)

  I(p_i) = -\log_2 p_i   provided p_i > 0
  I(p_i) = 0             otherwise
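A minimal Python sketch of these two definitions (the helper names `surprisal` and `entropy` are mine, not from the slides):

```python
import math

def surprisal(p):
    """I(p) = -log2(p) for p > 0, and 0 otherwise."""
    return -math.log2(p) if p > 0 else 0.0

def entropy(probs):
    """H(p_1, ..., p_m) = sum_i p_i * I(p_i)."""
    return sum(p * surprisal(p) for p in probs)

print(entropy([0.5, 0.5]))   # 1.0 bit, as computed on the next slide
print(entropy([0.0, 1.0]))   # 0.0 bits
print(entropy([3/8, 5/8]))   # ~0.954 bits, reused in the worked example later
```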

Shannon’s entropy as a measure of information


Let P = (p_1, ..., p_n) be a discrete probability distribution.
The entropy of the distribution P is given by

  H(P) = \sum_{i=1}^{n} p_i \log_2 (1 / p_i) = -\sum_{i=1}^{n} p_i \log_2 (p_i)

  H(1/2, 1/2) = -\sum_{i=1}^{2} p_i \log_2 (p_i) = -(1/2) \log_2 (1/2) - (1/2) \log_2 (1/2) = 1 bit

  H(0, 1) = -\sum_{i=1}^{2} p_i \log_2 (p_i) = 1 \cdot I(1) + 0 \cdot I(0) = 0 bits

The entropy for a binary variable
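This slide's figure is the binary entropy curve. A minimal matplotlib sketch that reproduces it, assuming the plot is H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p) against p (it peaks at 1 bit when p = 0.5 and is 0 at p = 0 or p = 1):

```python
import math
import matplotlib.pyplot as plt

ps = [i / 1000 for i in range(1001)]
H = [0.0 if p in (0.0, 1.0)
     else -p * math.log2(p) - (1 - p) * math.log2(1 - p)
     for p in ps]

plt.plot(ps, H)
plt.xlabel("p (probability of one of the two outcomes)")
plt.ylabel("H(p) in bits")
plt.title("Entropy of a binary variable")
plt.show()
```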

Learning decision tree classifiers

• On average, the information needed to convey the class membership of a random instance drawn from nature is H(P)

  [Figure: Nature → Instance → Classifier → Class label, with the training data S partitioned by class into S1, S2, ..., Sm]

  H(\hat{P}) = -\sum_{i=1}^{m} \hat{p}_i \log_2 (\hat{p}_i) = H(X)

  where \hat{P} is an estimate of P and X is a random variable with distribution \hat{P}

  Si is the multi-set of training examples belonging to class Ci, so \hat{p}_i = |Si| / |S|
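A small sketch of estimating this entropy from the empirical class distribution of a training multiset (the function name and label encoding are my own):

```python
import math
from collections import Counter

def class_entropy(labels):
    """Entropy of the empirical class distribution p_i = |S_i| / |S|."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

# 5 positive and 3 negative examples, matching the worked example below
print(class_entropy(['+', '+', '-', '-', '-', '+', '+', '+']))   # ~0.954 bits
```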


Learning decision tree classifiers

• The task of the learner then is to extract the needed information from the training set and store it in the form of a decision tree for classification

• Information gain based decision tree learner

  Start with the entire training set at the root
  Recursively add nodes to the tree corresponding to tests that yield the greatest expected reduction in entropy (or the largest expected information gain)
  until some termination criterion is met (e.g. the training data at every leaf node has zero entropy)
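A compact sketch of this greedy learner (an ID3-style recursion; the function names and the dataset format – a list of (attribute-dict, label) pairs plus a set of attribute names – are my own assumptions, not code from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(examples, attr):
    """Expected entropy of the class label after testing attr."""
    groups = {}
    for x, y in examples:
        groups.setdefault(x[attr], []).append(y)
    n = len(examples)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def grow_tree(examples, attrs):
    """examples: list of (attribute dict, class label); attrs: set of attribute names."""
    labels = [y for _, y in examples]
    if entropy(labels) == 0 or not attrs:               # termination criterion
        return Counter(labels).most_common(1)[0][0]     # leaf labelled with majority class
    best = min(attrs, key=lambda a: split_entropy(examples, a))   # largest information gain
    tree = {'test': best, 'branches': {}}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        tree['branches'][value] = grow_tree(subset, attrs - {best})
    return tree
```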

Learning decision tree classifiers – Example

Instances are ordered 3-tuples of attribute values corresponding to
  Height (tall, short)
  Hair (dark, blonde, red)
  Eye (blue, brown)

Training Data
  Instance        Class label
  I1 (t, d, l)    +
  I2 (s, d, l)    +
  I3 (t, b, l)    −
  I4 (t, r, l)    −
  I5 (s, b, l)    −
  I6 (t, b, w)    +
  I7 (t, d, w)    +
  I8 (s, b, w)    +

Learning decision tree classifiers – Example


All eight instances I1, ..., I8 reach the root:

  H(X) = -(3/8) \log_2 (3/8) - (5/8) \log_2 (5/8) = 0.954 bits

Test on Height: branch t holds St = {I1, I3, I4, I6, I7}, branch s holds Ss = {I2, I5, I8}

  H(X | Height = t) = -(3/5) \log_2 (3/5) - (2/5) \log_2 (2/5) = 0.971 bits
  H(X | Height = s) = -(2/3) \log_2 (2/3) - (1/3) \log_2 (1/3) = 0.918 bits

  H(X | Height) = (5/8) H(X | Height = t) + (3/8) H(X | Height = s) = (5/8)(0.971) + (3/8)(0.918) = 0.95 bits

Similarly,

  H(X | Hair) = (3/8) H(X | Hair = d) + (4/8) H(X | Hair = b) + (1/8) H(X | Hair = r) = 0.5 bits

  H(X | Eye) = 0.607 bits

• Hair is the most informative attribute because it yields the largest reduction in entropy; the test on the value of Hair is therefore chosen as the root of the decision tree
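A short check of these numbers, reusing the `entropy` and `split_entropy` helpers sketched above (repeated here so the snippet stands alone; the dictionary encoding of the data is mine):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(examples, attr):
    groups = {}
    for x, y in examples:
        groups.setdefault(x[attr], []).append(y)
    n = len(examples)
    return sum(len(g) / n * entropy(g) for g in groups.values())

# (Height, Hair, Eye) -> class, transcribed from the training data slide
data = [({'Height': 't', 'Hair': 'd', 'Eye': 'l'}, '+'),
        ({'Height': 's', 'Hair': 'd', 'Eye': 'l'}, '+'),
        ({'Height': 't', 'Hair': 'b', 'Eye': 'l'}, '-'),
        ({'Height': 't', 'Hair': 'r', 'Eye': 'l'}, '-'),
        ({'Height': 's', 'Hair': 'b', 'Eye': 'l'}, '-'),
        ({'Height': 't', 'Hair': 'b', 'Eye': 'w'}, '+'),
        ({'Height': 't', 'Hair': 'd', 'Eye': 'w'}, '+'),
        ({'Height': 's', 'Hair': 'b', 'Eye': 'w'}, '+')]

print(entropy([y for _, y in data]))        # ~0.954 bits
for a in ('Height', 'Hair', 'Eye'):
    print(a, split_entropy(data, a))        # ~0.951, 0.500, ~0.607 -> Hair wins
```

Running `grow_tree(data, {'Height', 'Hair', 'Eye'})` from the earlier sketch on this data reproduces the tree on the next slide, with Hair at the root and Eye tested under Hair = b.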
Learning decision tree classifiers – Example

Resulting tree:

  Hair
    d → +
    b → Eye
          l → −
          w → +
    r → −

  (Compare the result with Naïve Bayes)

• In practice, we need some way to prune the tree to avoid overfitting the training data

Learning, generalization, overfitting

• Consider the error of a hypothesis h over
  – Training data: ErrorTrain(h)
  – Entire distribution D of data: ErrorD(h)

• Hypothesis h ∈ H overfits the training data if there is an alternative hypothesis h' ∈ H such that

  ErrorTrain(h) < ErrorTrain(h')

  and

  ErrorD(h) > ErrorD(h')

Overfitting in decision tree learning
(e.g. diabetes dataset)

Causes of overfitting

• As we move further away from the root, the data set used to choose the best test becomes smaller → poor estimates of entropy

• Noisy examples can further exacerbate overfitting

Minimizing overfitting

• Use roughly the same size sample at every node to estimate entropy – when there is a large data set from which we can sample

• Stop when a further split fails to yield statistically significant information gain (estimated from a validation set)

• Grow the full tree, then prune

• Minimize size(tree) + size(exceptions(tree))

Rule post-pruning

• Convert the tree to an equivalent set of rules

  IF (Outlook = Sunny) ∧ (Humidity = High)
    THEN PlayTennis = No
  IF (Outlook = Sunny) ∧ (Humidity = Normal)
    THEN PlayTennis = Yes
  ...

Rule post-pruning

1. Convert the tree to an equivalent set of rules

2. Prune each rule independently of the others by dropping one condition at a time, as long as doing so does not reduce estimated accuracy (at the desired confidence level)

3. Sort the final rules in order of lowest to highest error for classifying new instances

• Advantage – can potentially correct bad choices made close to the root

• Post-pruning based on a validation set is the most commonly used method in practice
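A minimal sketch of step 2 – greedily dropping conditions from one rule, judged on a held-out validation set (the rule representation, a list of (attribute, value) conditions plus a predicted class, is my own assumption):

```python
def rule_matches(conditions, x):
    """A rule fires when every (attribute, value) condition holds for instance x."""
    return all(x[a] == v for a, v in conditions)

def rule_accuracy(conditions, label, validation):
    """Accuracy of the rule on the validation instances it covers (1.0 if it covers none)."""
    covered = [(x, y) for x, y in validation if rule_matches(conditions, x)]
    if not covered:
        return 1.0
    return sum(y == label for _, y in covered) / len(covered)

def prune_rule(conditions, label, validation):
    """Greedily drop conditions while estimated accuracy does not decrease."""
    conditions = list(conditions)
    improved = True
    while improved and conditions:
        improved = False
        base = rule_accuracy(conditions, label, validation)
        for c in list(conditions):
            rest = [d for d in conditions if d != c]
            if rule_accuracy(rest, label, validation) >= base:
                conditions = rest
                improved = True
                break
    return conditions
```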

Classification of instances

• Unique classification – possible when each leaf has zero entropy and there are no missing attribute values

• Most likely or probabilistic classification – based on the distribution of classes at a node when there are no missing attributes

Handling different types of attribute values

• Types of attributes
– Nominal – values are names

– Ordinal – values are ordered

– Cardinal (numeric) – values are numbers (hence ordered)

– …

Handling numeric attributes

  Attribute T   40   48   50   54   60   70
  Class         N    N    Y    Y    Y    N

Candidate splits:   T < (48 + 50) / 2 = 49 ?      T < (60 + 70) / 2 = 65 ?

  E(S | T < 49 ?) = (2/6)(0) + (4/6) [ -(3/4) \log_2 (3/4) - (1/4) \log_2 (1/4) ] ≈ 0.54 bits

• Sort instances by value of numeric attribute under consideration

• For each attribute, find the test which yields the lowest entropy

• Greedily choose the best test across all attributes
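A minimal sketch of threshold selection for a single numeric attribute, placing candidate thresholds at midpoints between consecutive sorted values whose class labels differ (function and variable names are mine):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, expected entropy) of the best binary split 'value < t'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float('inf'))
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        if y1 == y2 or v1 == v2:
            continue                      # only class boundaries are candidates
        t = (v1 + v2) / 2
        left = [y for v, y in pairs if v < t]
        right = [y for v, y in pairs if v >= t]
        e = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if e < best[1]:
            best = (t, e)
    return best

print(best_threshold([40, 48, 50, 54, 60, 70], list("NNYYYN")))   # (49.0, ~0.54)
```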


Handling numeric attributes

[Figure: two scatter plots of classes C1 and C2 – one separated by an axis-parallel split, the other by an oblique split]

• Oblique splits cannot be realized by univariate tests

Two-way versus multi-way splits

• Entropy criterion favors many-valued attributes
  – Pathological behavior – what if, in a medical diagnosis data set, social security number is one of the candidate attributes?

• Solutions
  – Only two-way splits (CART): A = value versus A ≠ value
  – Gain ratio (C4.5)

  GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)

  SplitInformation(S, A) = -\sum_{i=1}^{|Values(A)|} (|S_i| / |S|) \log_2 (|S_i| / |S|)
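A small sketch of this computation (my own illustration of the formula, not C4.5's code; the dataset format matches the earlier snippets):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(examples, attr):
    """Information gain of attr divided by its split information."""
    labels = [y for _, y in examples]
    n = len(examples)
    groups = {}
    for x, y in examples:
        groups.setdefault(x[attr], []).append(y)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0
```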

See5/C5.0 [1997]

• Boosting

• New data types (e.g. dates), N/A values, variable misclassification costs, attribute pre-filtering

• Unordered rulesets: all applicable rules are found and voted

• Improved scalability: multi-threading, multi-core/CPUs

Summary of decision trees

• Simple

• Fast (linear in size of the tree, linear in the size of the training set,
linear in the number of attributes)

• Produce easy to interpret rules

• Good for generating simple predictive rules from data with lots of
attributes

• Popular extensions: GBDT (Friedman, 2001), XGBoost (Chen & Guestrin, 2016), LightGBM (Ke et al., 2017)

