Machine Learning - Decision Trees
-Decision Trees-
Gulden Uchyigit
Advances in AI
MACHINE LEARNING
• Introduction
• Decision trees
Machine learning
• So far, we have focused on building systems that do something (behaviour/performance), given some knowledge.
• (Machine) learning aims to improve that behaviour/performance with experience:
Definition
• A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Learning architecture
Learning Issues
Assessing performance
• Cross-validation: the set of examples is split into a training set (to learn from) and a test set (to check the learnt system); a minimal split sketch follows below
• Learning curves: as the training set grows, how does the behaviour of the learnt system improve on the test set?
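A minimal sketch of such a split, assuming a list of labelled examples; the learn and accuracy helpers in the usage comment are hypothetical placeholders, not part of the lecture:

```python
# Minimal train/test split for assessing performance.
# `examples` is assumed to be a list of labelled examples.
import random

def train_test_split(examples, test_fraction=0.3, seed=0):
    """Shuffle the examples and split them into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Usage (assuming hypothetical learn() and accuracy() functions exist):
# train_set, test_set = train_test_split(examples)
# model = learn(train_set)
# print(accuracy(model, test_set))
```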
Learning techniques
Decision Trees - Introduction
Goal: Categorization
• Given an event, predict its category. Examples:
• Who won a given football match?
• How should we file a given e-mail?
• Event = list of attributes. Examples:
• Football: Who was the goalie?
• Email: Who sent the e-mail?
Introduction (cont.)
[Figure: training events and their categories are fed to the learner, which builds a decision tree; the learnt tree then assigns a category to a new event]
What is a Decision Tree?
• It is a classifier in the form of a tree structure.
[Figure: an example tree structure, with internal decision nodes and leaf nodes]
Example
Decision trees: example
Goal predicate: attend-party
[Figure: a decision tree for attend-party; the root tests prior commitment?, then distance? (short / med / long), with further tests on friends attending?, tired? and raining? before reaching yes/no leaves]
∀P attend-party(P) iff
  [not prior(P) & dist(P, short) & friends(P)] or
  [not prior(P) & dist(P, med) & not tired(P) & not rain(P)]
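As a quick check, this predicate can be transcribed directly into code; the following sketch simply mirrors the two disjuncts above, with parameter names taken from the formula:

```python
def attend_party(prior, dist, friends, tired, rain):
    """Direct transcription of the attend-party goal predicate."""
    return ((not prior and dist == "short" and friends) or
            (not prior and dist == "med" and not tired and not rain))
```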
Decision tree learning algorithm
• Start with a set of examples (training set), a set of attributes SA, and a default value for the goal predicate.
• If the set of examples is empty, then add a leaf with the default value for the goal predicate and terminate, otherwise
• If all examples have the same classification, then add a leaf with that classification and terminate, otherwise
• If the set of attributes SA is empty, then return the default value for the goal predicate and terminate, otherwise
• Choose the "best" attribute, split the examples on its values, and recursively build a subtree for each value, using the majority classification of the current examples as the new default (see the sketch below).
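A minimal Python sketch of this recursive procedure (ID3-style), assuming examples are (attribute-dict, classification) pairs; the attribute-selection step is left as a placeholder until information gain is introduced later:

```python
from collections import Counter

def learn_tree(examples, attributes, default):
    """examples: list of (attribute_dict, classification) pairs."""
    # Empty set of examples: leaf with the default value.
    if not examples:
        return default
    classes = [c for _, c in examples]
    # All examples share one classification: leaf with that classification.
    if len(set(classes)) == 1:
        return classes[0]
    # No attributes left (e.g. noisy data): leaf with the default value.
    if not attributes:
        return default
    best = choose_attribute(examples, attributes)      # ideally: highest information gain
    majority = Counter(classes).most_common(1)[0][0]   # default for the subtrees
    tree = {best: {}}
    for value in {ex[best] for ex, _ in examples}:
        subset = [(ex, c) for ex, c in examples if ex[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = learn_tree(subset, remaining, majority)
    return tree

def choose_attribute(examples, attributes):
    # Placeholder: the lecture selects the attribute with the highest information gain.
    return attributes[0]
```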
Example: training set (step 1)
Set of attributes SA
Example: decision tree learning
[Figure: splitting the training examples on distance; one branch contains only example 7, which is negative (all negative), so it becomes a leaf; default value: N]
Example: decision tree learning
distance (branches: short, med, long)

One branch contains examples 3, 4, 6 — default value: Y
    friend tired rain | classification
 3.   Y     Y    Y    |      Y
 4.   N     Y    N    |      N
 6.   Y     Y    N    |      Y

Another branch contains examples 2, 5, 8 — default value: N; its next test is prior commit?
    prior friend tired rain | classification
 2.   N     N     Y    Y    |      N
 5.   N     Y     N    N    |      Y
 8.   Y     Y     Y    Y    |      N
[Figure: the resulting subtrees, testing prior commit?, tired? and friends attending? at internal nodes, with yes/no leaves]
Empty set of attributes if noise
    A  B | classification
 1. Y  N |      N
 2. N  Y |      N   ← noise: examples 2 and 3 have identical
 3. N  Y |      Y     attribute values but different classifications

[Figure: splitting on A isolates example 1 (negative); the remaining branch {2−, 3+} is split on B, after which no attributes remain but the examples still disagree, so the leaf is given the default value]
Entropy
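The slide's own formula is not reproduced here; the standard definition assumed by the gain computations below, for a set S with proportion p₊ of positive and p₋ of negative examples, is:

```latex
\mathrm{Entropy}(S) = -\,p_{+}\log_2 p_{+} \;-\; p_{-}\log_2 p_{-}
```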
Example
The Entropy Function
[Plot: the entropy function for a boolean classification, plotted against the proportion p+ of positive examples; it is 0 at p+ = 0 and at p+ = 1, and reaches its maximum of 1.0 at p+ = 0.5]
Information Gain
Gain(S, A) = expected reduction in entropy due to sorting (splitting) on A

Gain(S, A) ≡ Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

where
  S is the set of training examples,
  A is an attribute,
  Values(A) is the set of possible values of A, and
  S_v is the subset of S for which attribute A has value v.
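A small sketch of these two quantities in code, assuming (as in the earlier sketch) that examples are (attribute-dict, classification) pairs:

```python
from collections import Counter
from math import log2

def entropy(examples):
    """Entropy(S) = -sum over classes c of p_c * log2(p_c)."""
    counts = Counter(c for _, c in examples)
    total = len(examples)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def information_gain(examples, attribute):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)."""
    total = len(examples)
    remainder = 0.0
    for value in {ex[attribute] for ex, _ in examples}:
        subset = [(ex, c) for ex, c in examples if ex[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(examples) - remainder
```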
A Worked Example
Day Outlook Temperature Humidity Wind PlayTennis
[Table: the 14 training examples D1–D14 (9 positive, 5 negative) with these attributes]
[Figure: comparing candidate splits on Humidity and Wind]
Selecting the Root Attribute
S: [9+,5−], E = 0.940

Splitting on Outlook:
  Sunny:    [2+,3−], E = 0.970
  Overcast: [4+,0−], E = 0
  Rain:     [3+,2−], E = 0.970

Gain(S, Outlook)
  = 0.940 − (5/14)×0.970 − (4/14)×0 − (5/14)×0.970
  = 0.246
• Gain(S, Outlook) = 0.246
• Gain(S, Humidity) = 0.151
• Gain(S, Wind) = 0.048
• Gain(S, Temperature) = 0.029
The "best" attribute is the one with the highest information gain value.
The "worst" attribute is the one with the smallest information gain value.
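As a check of the formula (not code from the lecture), the Outlook gain can be recomputed from the counts on the previous slide, and the root chosen as the attribute with the highest gain:

```python
from math import log2

def binary_entropy(pos, neg):
    """Entropy of a [pos+, neg-] collection of examples."""
    total = pos + neg
    result = 0.0
    for n in (pos, neg):
        if n:
            p = n / total
            result -= p * log2(p)
    return result

e_s = binary_entropy(9, 5)                  # ~0.940
branches = [(2, 3), (4, 0), (3, 2)]         # Sunny, Overcast, Rain
gain_outlook = e_s - sum((p + n) / 14 * binary_entropy(p, n) for p, n in branches)
print(round(gain_outlook, 3))               # 0.247 exactly; the slide's rounded figures give 0.246

gains = {"Outlook": 0.246, "Humidity": 0.151, "Wind": 0.048, "Temperature": 0.029}
root = max(gains, key=gains.get)            # "Outlook" has the highest gain
```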
Selecting root attribute
[Figure: the partially grown tree with Outlook at the root and branches Sunny, Overcast, Rain; the Sunny branch holds examples {D1, D2, D8, D9, D11} ([2+,3−]) and is marked ? because it still needs a further split]