Machine Learning Notes - Lec 04 - Decision Tree Learning
Lecture 04
Decision Tree Learning
Reading:
Chapter 3 of Mitchell
Sections 4.3 and 6.1 of Witten and Frank
What are Decision Trees?
Decision tree learning is a method for approximating
discrete-valued target functions, in which the learned
function is represented by a decision tree.
Learned trees can also be re-represented as sets of if-then
rules to improve human readability.
One of the most popular inductive inference algorithms.
Successfully applied to a broad range of tasks.
Decision Tree for PlayTennis
[Figure: decision tree for PlayTennis, rooted at Outlook, with Yes/No leaves]
Decision Tree for Conjunction: Outlook=Sunny ∧ Wind=Weak
[Figure: tree rooted at Outlook; the Sunny branch tests Wind (Strong → No, Weak → Yes), while the Overcast and Rain branches are No leaves]
Decision Tree for XOR: Outlook=Sunny XOR Wind=Weak
[Figure: tree rooted at Outlook (Sunny, Overcast, Rain); each branch tests Wind, with the Yes/No leaves following the XOR]
What are Decision Trees?
Note that
each path through a decision tree forms a conjunction of
attribute tests
the tree as a whole forms a disjunction of such paths; i.e. a
disjunction of conjunctions of attribute tests
Preceding example could be re-expressed as:
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)
where ∧ denotes AND and ∨ denotes OR
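For illustration, this disjunction of conjunctions can be written directly as an executable rule. A minimal Python sketch, assuming the PlayTennis attributes above (the function and argument names are made up for the example):

    def play_tennis(outlook, humidity, wind):
        # Each line below is one root-to-leaf path of the tree;
        # the whole return value is the disjunction of those paths.
        return ((outlook == "Sunny" and humidity == "Normal")  # path 1
                or outlook == "Overcast"                       # path 2
                or (outlook == "Rain" and wind == "Weak"))     # path 3

    print(play_tennis("Sunny", "Normal", "Strong"))  # False
    print(play_tennis("Rain", "Normal", "Weak"))     # True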
What are Decision Trees? (cont)
As a complex rule, such a decision tree could be coded by
hand.
However, the challenge for machine learning is to propose
algorithms for learning decision trees from examples.
Entropy and Information Gain
Entropy(S) ≡ −p⊕ log2 p⊕ − p⊖ log2 p⊖
where p⊕ (p⊖) is the proportion of positive (negative) examples in S.
Gain(S, A) ≡ Entropy(S) − Σ v∈Values(A) (|Sv| / |S|) Entropy(Sv)
where
Values(A) is the set of values attribute A can take on
Sv is the subset of S for which A has value v
The first term in Gain(S, A) is the entropy of the original set; the second
term is the expected entropy after partitioning on A, i.e. the sum of the
entropies of the subsets Sv, each weighted by the fraction |Sv| / |S|.
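For illustration, a minimal Python sketch of these two definitions (the names entropy and gain are made up; this is one possible implementation, not code from the lecture):

    import math

    def entropy(labels):
        # Entropy(S) = -p+ log2 p+ - p- log2 p-; looping only over the
        # label values actually present means 0 log 0 never arises.
        n = len(labels)
        result = 0.0
        for value in set(labels):
            p = labels.count(value) / n
            result -= p * math.log2(p)
        return result

    def gain(examples, labels, attribute):
        # Gain(S, A) = Entropy(S) - sum over v of (|Sv|/|S|) * Entropy(Sv),
        # where each example is a dict mapping attribute names to values.
        total = entropy(labels)
        n = len(examples)
        for v in set(ex[attribute] for ex in examples):
            sv = [lab for ex, lab in zip(examples, labels) if ex[attribute] == v]
            total -= len(sv) / n * entropy(sv)
        return total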
Information Gain (cont)
[Figure: partially learned tree rooted at Outlook. Sunny → Ssunny = {D1, D2, D8, D9, D11} [2+,3−], node still to be expanded (?); Overcast → {D3, D7, D12, D13} [4+,0−], leaf Yes; Rain → {D4, D5, D6, D10, D14} [3+,2−], node still to be expanded (?)]
Which attribute should be tested here?
Ssunny = {D1, D2, D8, D9, D11}
Gain(Ssunny, Humidity) = 0.970 − (3/5) × 0.0 − (2/5) × 0.0 = 0.970
Gain(Ssunny, Temp) = 0.970 − (2/5) × 0.0 − (2/5) × 1.0 − (1/5) × 0.0 = 0.570
Gain(Ssunny, Wind) = 0.970 − (2/5) × 1.0 − (3/5) × 0.918 = 0.019
Second step
Working on Outlook=Sunny node:
Gain(Ssunny, Humidity) = 0.970 − (3/5) × 0.0 − (2/5) × 0.0 = 0.970
Gain(Ssunny, Wind) = 0.970 − (2/5) × 1.0 − (3/5) × 0.918 = 0.019
Gain(Ssunny, Temp) = 0.970 − (2/5) × 0.0 − (2/5) × 1.0 − (1/5) × 0.0 = 0.570
Humidity provides the best prediction for the target (these gains are checked numerically in the sketch below).
Let's grow the tree:
add to the tree a successor for each possible value of Humidity
partition the training samples according to the value of Humidity
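The three gains above can be checked with the entropy/gain sketch given earlier, assuming the standard PlayTennis attribute values for D1, D2, D8, D9 and D11 from Mitchell's table:

    s_sunny = [
        {"Temp": "Hot",  "Humidity": "High",   "Wind": "Weak"},    # D1, No
        {"Temp": "Hot",  "Humidity": "High",   "Wind": "Strong"},  # D2, No
        {"Temp": "Mild", "Humidity": "High",   "Wind": "Weak"},    # D8, No
        {"Temp": "Cool", "Humidity": "Normal", "Wind": "Weak"},    # D9, Yes
        {"Temp": "Mild", "Humidity": "Normal", "Wind": "Strong"},  # D11, Yes
    ]
    labels = ["No", "No", "No", "Yes", "Yes"]  # [2+, 3-], entropy ~0.970

    for a in ("Humidity", "Temp", "Wind"):
        print(a, round(gain(s_sunny, labels, a), 3))
    # prints approximately: Humidity 0.971, Temp 0.571, Wind 0.02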
Final tree for S:
[Figure: Outlook at the root; Sunny → Humidity (High: No, Normal: Yes); Overcast → Yes; Rain → Wind (Strong: No, Weak: Yes)]
[Figure: ID3's simple-to-complex search through the space of decision trees, successively refining candidate trees over attributes A1–A4, with +/− training examples at the leaves]
Hypothesis Space Search by ID3
ID3 searches a space of hypotheses (set of possible
decision trees) for one fitting the training data.
The search is a simple-to-complex, hill-climbing search guided by the
information gain evaluation function.
The hypothesis space of ID3 is the complete space of finite,
discrete-valued functions w.r.t. the available attributes
(contrast with incomplete hypothesis spaces, such as the conjunctive
hypothesis space).
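For illustration, a minimal recursive sketch of this search, reusing the gain helper from the earlier sketch (a simplified ID3, not the full algorithm):

    def id3(examples, labels, attributes):
        # Pure node: all examples share one label, so emit a leaf.
        if len(set(labels)) == 1:
            return labels[0]
        # No attributes left to test: fall back to the majority label.
        if not attributes:
            return max(set(labels), key=labels.count)
        # Hill-climbing step: greedily pick the highest-gain attribute.
        best = max(attributes, key=lambda a: gain(examples, labels, a))
        tree = {best: {}}
        rest = [a for a in attributes if a != best]
        # Grow one subtree per observed value of the chosen attribute.
        for v in set(ex[best] for ex in examples):
            subset = [(ex, lab) for ex, lab in zip(examples, labels)
                      if ex[best] == v]
            sub_ex, sub_lab = zip(*subset)
            tree[best][v] = id3(list(sub_ex), list(sub_lab), rest)
        return tree

On the Ssunny data above, id3(s_sunny, labels, ["Temp", "Humidity", "Wind"]) returns {'Humidity': {'High': 'No', 'Normal': 'Yes'}}, matching the gains computed by hand.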
Refinements to Basic Decision Tree Learning
Suppose a noisy, incorrectly labeled negative example is added to the
training set, e.g. ‹Sunny, Hot, Normal, Strong, PlayTennis = No›.
The addition of this incorrect example will now cause ID3 to construct a
more complex tree.
The new example will be sorted into the second leaf node from
the left in the learned tree, along with the previous positive
examples D9 and D11.
Because the new example is labeled as a negative example, ID3 will
search for further refinements to the tree below this node.
The result will be a tree that performs well on the (errorful) training
examples, but less well on new, unseen instances.
Refinements to Basic Decision Tree Learning
Since we previously had the correct examples:
‹Sunny, Cool, Normal, Weak, PlayTennis = Yes›
‹Sunny, Mild, Normal, Strong, PlayTennis = Yes›
The tree will be elaborated below the right branch of the Humidity node.
The result will be a tree that performs well on the (errorful) training
examples, but less well on new, unseen instances.
Accuracy of the tree over the training examples increases monotonically as
the tree grows (to be expected).
Accuracy of the tree over independent test examples increases until about
25 nodes, then decreases.
Drawback of the validation-set remedy: holding data back for a validation
set reduces the data available for training.
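For illustration, a small sketch of the validation-set idea, scoring a tree in the nested-dict form of the ID3 sketch above on held-out examples (the helper names are made up):

    def classify(tree, example):
        # Walk from the root until a leaf label is reached; assumes every
        # attribute value in the example was also seen during training.
        while isinstance(tree, dict):
            attribute = next(iter(tree))
            tree = tree[attribute][example[attribute]]
        return tree

    def accuracy(tree, examples, labels):
        # Fraction of held-out examples the tree classifies correctly.
        hits = sum(classify(tree, ex) == lab
                   for ex, lab in zip(examples, labels))
        return hits / len(labels)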
Refinements: Rule Post-Pruning
Perhaps the most frequently used method (e.g., C4.5)
Proceed as follows:
1. Convert tree to equivalent set of rules
2. Prune each rule independently of others
3. Sort final rules into desired sequence for use
Convert the tree to rules by making the conjunction of the attribute tests
along each branch the antecedent of a rule and each leaf the consequent.
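For illustration, a small sketch of step 1 on the nested-dict trees used above (the helper name is made up):

    def tree_to_rules(tree, conditions=()):
        # A leaf closes one root-to-leaf path: the tests collected on the
        # way down form the antecedent, the leaf label the consequent.
        if not isinstance(tree, dict):
            return [(conditions, tree)]
        attribute = next(iter(tree))
        rules = []
        for value, subtree in tree[attribute].items():
            rules += tree_to_rules(subtree,
                                   conditions + ((attribute, value),))
        return rules

    # e.g. ((("Outlook", "Sunny"), ("Humidity", "High")), "No") encodes
    # IF Outlook = Sunny AND Humidity = High THEN PlayTennis = No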
Trend vs Comfort
Control vs Quit
Make a Schedule with a particular focus on 3 things
3. Exercise REGULARLY
Exercise is any bodily activity that enhances or maintains physical
fitness and overall health and wellness.
I am 55 years old and I can run (or brisk walk) five kilometers in one go
(Prof. Roger Moore, University of Sheffield, UK)
No Pain No Gain