CS6364 Lecture18 - ML Decision Tree(3)
Chapter 18
Section 1 – 3
Outline
• Learning agents
• Inductive learning
• Decision tree learning
Learning
• Learning is essential for unknown environments,
– i.e., when the designer lacks omniscience
• Type of feedback:
– Supervised learning: correct answers for each
example
– Unsupervised learning: correct answers not given
– Reinforcement learning: occasional rewards
Inductive learning
• Simplest form: learn a function from examples
• Given a training set of pairs (x, f(x)), find a hypothesis h such that h ≈ f
Inductive learning method
• Construct/adjust h to agree with f on training set
• (h is consistent if it agrees with f on all examples)
• E.g., curve fitting:
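A minimal curve-fitting sketch of the idea above in Python (the data points and polynomial degrees are hypothetical choices of mine, not from the lecture): each fitted polynomial is a candidate hypothesis h, and a high enough degree agrees with f on every training example.

import numpy as np

# Hypothetical training set: pairs (x, f(x)) for an unknown target f.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1])

# Construct/adjust h to agree with f on the training set:
# fit polynomials of increasing degree and inspect the training error.
for degree in (1, 2, 4):
    h = np.polyfit(x, y, degree)                 # candidate hypothesis h
    err = np.abs(np.polyval(h, x) - y).max()
    print(f"degree {degree}: max training error = {err:.4f}")

# The degree-4 fit passes through all 5 points (consistent with the data),
# but the simpler line may generalize better to new examples.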
Decision trees
• One possible representation for hypotheses
• E.g., here is the “true” tree for deciding whether to wait:
Expressiveness
• Decision trees can express any function of the input attributes.
• E.g., for Boolean functions, each truth table row is a path from root to leaf:
• Trivially, there is a consistent decision tree for any training set, with one path
to a leaf for each example (unless f is nondeterministic in x), but such a tree
probably won't generalize to new examples
– So the question is how to arrange the tree over the training set so that it is
as small and shallow as possible => make the decision tree as small as possible.
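As a small illustration of this expressiveness (my own example, not the lecture's), the Boolean function A XOR B can be written as a decision tree with one root-to-leaf path per truth-table row; the nested-dict representation below is just a sketch.

# Decision tree for f(A, B) = A xor B: internal nodes test an attribute,
# leaves hold the Boolean answer.
xor_tree = {
    "attr": "A",
    0: {"attr": "B", 0: False, 1: True},
    1: {"attr": "B", 0: True, 1: False},
}

def classify(tree, example):
    """Follow the path from the root down to a leaf for one example."""
    while isinstance(tree, dict):
        tree = tree[example[tree["attr"]]]
    return tree

# Each truth-table row corresponds to exactly one root-to-leaf path.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", classify(xor_tree, {"A": a, "B": b}))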
Decision tree learning
• Aim: find a small tree consistent with the training examples
• Idea: (recursively) choose "most significant" attribute as root of
(sub)tree
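A hedged Python sketch of this recursive idea (names and representation are mine; examples are assumed to be (attribute-dict, Boolean-label) pairs, and information_gain is the helper sketched after the formulas below):

from collections import Counter

def learn_tree(examples, attributes, parent_majority=None):
    """Recursively build a decision tree, choosing the most significant
    (highest information gain) attribute as the root of each subtree."""
    if not examples:
        return parent_majority                       # no data: fall back to parent majority
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:
        return labels[0]                             # all positive or all negative
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # no attributes left: majority vote
    majority = Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    tree = {"attr": best}
    for value in {ex[best] for ex, _ in examples}:
        subset = [(ex, label) for ex, label in examples if ex[best] == value]
        rest = [a for a in attributes if a != best]
        tree[value] = learn_tree(subset, rest, majority)
    return tree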
Choosing an attribute
• Aim: find a small tree consistent with the training examples.
• Approach: (recursively) choose the “most significant” attribute as the
root of each (sub)tree
• Idea: a good attribute splits the examples into subsets
that are (ideally) "all positive" or "all negative"
$$I\!\left(\frac{p}{p+n},\, \frac{n}{p+n}\right) = -\frac{p}{p+n}\log_2\frac{p}{p+n} \;-\; \frac{n}{p+n}\log_2\frac{n}{p+n}$$
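Translated directly into a small Python helper (my naming, treating 0·log2 0 as 0):

from math import log2

def entropy(p, n):
    """I(p/(p+n), n/(p+n)) in bits, for p positive and n negative examples."""
    result = 0.0
    for count in (p, n):
        if count:
            q = count / (p + n)
            result -= q * log2(q)
    return result

print(entropy(6, 6))   # 1.0 bit, as for the 12-example training set below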
Information gain
• A chosen attribute A divides the training set E into subsets
E1, … , Ev according to their values for A, where A has v
distinct values.
$$\mathrm{remainder}(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\; I\!\left(\frac{p_i}{p_i + n_i},\, \frac{n_i}{p_i + n_i}\right)$$
$$IG(A) = I\!\left(\frac{p}{p+n},\, \frac{n}{p+n}\right) - \mathrm{remainder}(A)$$
• Choose the attribute with the largest IG
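A sketch of remainder(A) and IG(A) built on the entropy helper above (same assumed representation of examples as in the earlier sketch):

def remainder(examples, attr):
    """Expected entropy remaining after splitting the examples on attr."""
    total = len(examples)
    rem = 0.0
    for value in {ex[attr] for ex, _ in examples}:
        subset = [label for ex, label in examples if ex[attr] == value]
        p_i = sum(subset)                 # positive examples in this subset
        n_i = len(subset) - p_i
        rem += (len(subset) / total) * entropy(p_i, n_i)
    return rem

def information_gain(examples, attr):
    """IG(A) = I(p/(p+n), n/(p+n)) - remainder(A)."""
    p = sum(label for _, label in examples)
    n = len(examples) - p
    return entropy(p, n) - remainder(examples, attr)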
Information gain
For the training set, p = n = 6, I(6/12, 6/12) = 1 bit
Consider the attributes Patrons and Type (and others too):
$$IG(\mathit{Patrons}) = 1 - \left[\tfrac{2}{12}\, I(0,1) + \tfrac{4}{12}\, I(1,0) + \tfrac{6}{12}\, I\!\left(\tfrac{2}{6},\tfrac{4}{6}\right)\right] \approx 0.541 \text{ bits}$$
$$IG(\mathit{Type}) = 1 - \left[\tfrac{2}{12}\, I\!\left(\tfrac{1}{2},\tfrac{1}{2}\right) + \tfrac{2}{12}\, I\!\left(\tfrac{1}{2},\tfrac{1}{2}\right) + \tfrac{4}{12}\, I\!\left(\tfrac{2}{4},\tfrac{2}{4}\right) + \tfrac{4}{12}\, I\!\left(\tfrac{2}{4},\tfrac{2}{4}\right)\right] = 0 \text{ bits}$$
Patrons has the highest IG of all attributes and so is chosen by the DTL
algorithm as the root
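These numbers can be checked directly from the subset sizes on the slide with the entropy helper sketched earlier:

# Patrons splits the 12 examples into subsets of sizes 2 (0+, 2-), 4 (4+, 0-), 6 (2+, 4-).
ig_patrons = entropy(6, 6) - (2/12 * entropy(0, 2)
                              + 4/12 * entropy(4, 0)
                              + 6/12 * entropy(2, 4))
print(round(ig_patrons, 3))   # 0.541 bits

# Type yields four subsets, each with equally many positives and negatives.
ig_type = entropy(6, 6) - (2/12 * entropy(1, 1) + 2/12 * entropy(1, 1)
                           + 4/12 * entropy(2, 2) + 4/12 * entropy(2, 2))
print(ig_type)                # 0.0 bits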
Example contd.
• Decision tree learned from the 12 examples: