
Machine Learning

-Decision Trees-
Gulden Uchyigit

Advances in AI
MACHINE LEARNING

• Introduction
• Decision trees

Russell and Norvig, AI: A Modern Approach (second edition),
Chapter 18, sections 1-3
Machine learning
• So far, we have focused on building systems that do something
  (behaviour/performance), given some knowledge.
• (Machine) learning aims to
  • improve behaviour/performance:
    • learn to perform new tasks (more)
    • increase ability on existing tasks (better)
    • increase speed on existing tasks (faster)
  • produce and increase knowledge:
    • formulate explicit concept descriptions
    • formulate explicit rules
    • discover regularities in data
    • discover the way the world behaves
Overall, machine learning promotes the autonomy of agents.

Definition
• A computer program is said to learn from experience E with respect to
  some class of tasks T and performance measure P, if its performance at
  tasks in T, as measured by P, improves with experience E.
• E.g. a computer program that learns to play checkers might improve its
  performance, as measured by its ability to win at the class of tasks
  involving playing checkers games, through experience obtained by
  playing games against itself.
Learning architecture

[Diagram: experience (e.g. inputs/outputs) and background knowledge/bias
feed a learning element; the learning element updates a reasoning/performing
element, which takes a problem/task and produces an answer/performance.]

Learning Issues

• Expressiveness - what can be learnt?


• Efficiency - how easily is learning performed?
• Transparency - can we understand what has been
learnt?
• Bias - which hypotheses are preferred?
• Background knowledge - available or not?
• Assessing performance - cross-validation and
learning curves
• Coping with noise

Assessing performance
• Cross-validation: set of examples split into
training set (to learn) + test set (to check)
• Learning curves: growing the training set, how
does the behaviour of the learnt system improve
upon the test set?

[Learning curve figure from Russell & Norvig]
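To make the assessment procedure concrete, here is a minimal Python sketch of a train/test split and a learning curve; `learn` and `accuracy` are hypothetical placeholders for whatever learner and scoring function are being assessed, not functions defined in these slides.

```python
import random

def split(examples, test_fraction=0.3):
    """Split examples into a training set (to learn) and a test set (to check)."""
    shuffled = examples[:]                       # copy so the input list is untouched
    random.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def learning_curve(examples, learn, accuracy, steps=5):
    """Grow the training set and record accuracy of the learnt system on the test set."""
    train, test = split(examples)
    curve = []
    for i in range(1, steps + 1):
        n = max(1, len(train) * i // steps)      # use a growing prefix of the training set
        model = learn(train[:n])
        curve.append((n, accuracy(model, test))) # check performance on held-out examples
    return curve
```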

Learning techniques

• Decision tree learning


• Learning neural networks
• Reinforcement learning
• Inductive logic programming

Decision Trees - Introduction

Goal: Categorization
• Given an event, predict its category. Examples:
  • Who won a given football match?
  • How should we file a given e-mail?
• Event = list of attributes. Examples:
  • Football: Who was the goalie?
  • Email: Who sent the e-mail?

Introduction (cont.)

• Use a decision tree to predict categories for new events.
• Use training data to build the decision tree.

[Diagram: training events & categories are used to build the decision tree;
new events are passed through the decision tree, which outputs a category.]
What is a Decision Tree?
• It is a classifier in the form of a tree structure, built from decision
  nodes and leaf nodes.
• Each decision node is labeled with an attribute.
• Each arc is labeled with a value for the node's attribute.
• Each leaf node is labeled with a category.
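A minimal sketch (not from the slides) of this structure in Python: decision nodes labeled with an attribute, arcs labeled with attribute values, leaves labeled with a category. The attribute and category names below are made up purely for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Leaf:
    category: str                                  # leaf node: labeled with a category

@dataclass
class DecisionNode:
    attribute: str                                 # decision node: labeled with an attribute
    branches: dict = field(default_factory=dict)   # arc value -> child (Leaf or DecisionNode)

def classify(node, event):
    """Follow the arc matching the event's value for each tested attribute down to a leaf."""
    while isinstance(node, DecisionNode):
        node = node.branches[event[node.attribute]]
    return node.category

# Tiny illustrative tree and event:
tree = DecisionNode("friends", {"yes": Leaf("attend"), "no": Leaf("stay home")})
print(classify(tree, {"friends": "yes"}))          # -> attend
```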

Example

• Whether or not I will be going to a party depends on the following five
  attributes:
  • Distance - distance of the party (short, medium, long).
  • Friends - whether or not any of my friends are going (yes, no).
  • Prior - whether or not I have any prior plans (yes, no).
  • Rain - whether or not it is raining (yes, no).
  • Tired - whether or not I am tired (yes, no).

Decision trees: example
Goal predicate: attend-party

prior commitment?
  yes -> no
  no  -> distance?
           short -> friends attending?
                      yes -> yes
                      no  -> no
           med   -> tired?
                      yes -> no
                      no  -> raining?
                               yes -> no
                               no  -> yes
           long  -> no


Reading the Decision Tree

• Decision trees implicitly define logical sentences (conjunctions of
  implications), e.g. for the tree above:

∀P attend-party(P) if not prior(P) & dist(P,short) & friends(P)
∀P attend-party(P) if not prior(P) & dist(P,med) & not tired(P) & not rain(P)

∀P attend-party(P) iff
  [not prior(P) & dist(P,short) & friends(P)] or
  [not prior(P) & dist(P,med) & not tired(P) & not rain(P)]
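The same reading written as an executable predicate (a sketch: each path to a yes-leaf becomes one conjunct of the disjunction):

```python
def attend_party(prior, dist, friends, tired, rain):
    # attend-party iff [not prior & dist = short & friends]
    #             or   [not prior & dist = med & not tired & not rain]
    return ((not prior and dist == "short" and friends) or
            (not prior and dist == "med" and not tired and not rain))

print(attend_party(prior=False, dist="short", friends=True, tired=True, rain=True))  # True
```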
Decision tree learning algorithm
• Start with a set of examples (training set), a set of attributes SA, and
  a default value for the goal predicate.
• If the set of examples is empty, then add a leaf with the default value
  for the goal predicate and terminate; otherwise
• If all examples have the same classification, then add a leaf with that
  classification and terminate; otherwise
• If the set of attributes SA is empty, then return the default value for
  the goal predicate and terminate; otherwise

Decision tree learning algorithm (cont.)
• 1) Choose an attribute A to split on.
• 2) Add a corresponding test to the tree.
• 3) Create new branches for each value of the attribute.
• 4) Assign each example to the appropriate branch.
• 5) Iterate from step 1) on each branch, with set of attributes SA - {A}
  and, as default value, the majority value for the current set of
  examples (a code sketch of the full procedure follows below).
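A sketch of the whole procedure in Python, reusing the `Leaf`/`DecisionNode` classes from the earlier sketch. Examples are assumed to be dicts with a "classification" key; `choose_attribute` is left pluggable (information gain, introduced later, is the usual choice). Branches are created only for attribute values observed in the current examples, so the empty-examples/default case would only arise with a known attribute domain.

```python
from collections import Counter

def majority(examples):
    """Most common classification in a set of examples."""
    return Counter(e["classification"] for e in examples).most_common(1)[0][0]

def learn_tree(examples, attributes, default, choose_attribute):
    if not examples:                                   # no examples: leaf with default value
        return Leaf(default)
    classes = {e["classification"] for e in examples}
    if len(classes) == 1:                              # all examples agree: leaf with that class
        return Leaf(classes.pop())
    if not attributes:                                 # attributes exhausted (e.g. noise): default
        return Leaf(default)
    a = choose_attribute(examples, attributes)         # 1) choose an attribute A to split on
    node = DecisionNode(a)                             # 2) add the corresponding test
    for v in {e[a] for e in examples}:                 # 3) one branch per observed value of A
        subset = [e for e in examples if e[a] == v]    # 4) assign examples to the branch
        node.branches[v] = learn_tree(                 # 5) iterate with SA - {A} and the
            subset, [x for x in attributes if x != a], #    majority value as the new default
            majority(examples), choose_attribute)
    return node
```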
Example: training set (step 1)
Set of attributes SA

prior dist friend tired rain classification


1. Y L N Y N N
2. N M N Y Y N
3. N S Y Y Y Y
4. N S N Y N N
5. N M Y N N Y
6. N S Y Y N Y
7. Y S Y Y N N
8. Y M Y Y Y N
9. Y L Y Y N N
10. Y L Y Y Y N
Default value: Y

Example: decision tree learning

Splitting the full training set (examples 1-10, default value Y) on the
attribute distance gives three branches:
  short: examples 3+, 4-, 6+, 7-
  med:   examples 2-, 5+, 8-
  long:  examples 1-, 9-, 10-

Example: decision tree learning

  short branch (3+, 4-, 6+, 7-), default value N:
      prior friend tired rain  classification
  3.    N     Y     Y    Y     Y
  4.    N     N     Y    N     N
  6.    N     Y     Y    N     Y
  7.    Y     Y     Y    N     N

  med branch (2-, 5+, 8-), default value N:
      prior friend tired rain  classification
  2.    N     N     Y    Y     N
  5.    N     Y     N    N     Y
  8.    Y     Y     Y    Y     N

  long branch (1-, 9-, 10-): all negative, so it becomes a leaf labeled no.

Example: decision tree learning

The short branch (3+, 4-, 6+, 7-, default value N) is split on prior
commitment:
  no:  examples 3+, 4-, 6+
  yes: example 7-
The med branch (2-, 5+, 8-, default value N) is still to be expanded; the
long branch is the leaf no.

Example: decision tree learning

Under prior commitment = no (short branch), the remaining examples are
(default value Y):
      friend tired rain  classification
  3.    Y     Y    Y     Y
  4.    N     Y    N     N
  6.    Y     Y    N     Y
The med branch table (2-, 5+, 8-, default value N) still remains to be
expanded.


Example: decision tree learning

Final tree:

distance?
  short -> prior commitment?
             no  -> friends attending?
                      yes -> yes
                      no  -> no
             yes -> no
  med   -> tired?
             no  -> yes
             yes -> no
  long  -> no

Empty set of attributes if noise

   A B  classification
1. Y N  N
2. N Y  N    (noise: examples 2 and 3 have identical attribute
3. N Y  Y     values but different classifications)

Splitting on A sends example 1 (negative) down one branch and examples
2-, 3+ down the other; splitting that branch on B still leaves 2-, 3+
together. The set of attributes is now empty, so the algorithm has to
return the default value ("?").

Choose the "best" attribute?

• Intuition:
  • The aim is to minimise the depth of the final tree.
  • Choose the attribute that provides as exact a classification as possible:
    • "perfect" attribute: the examples in each resulting set are either all
      positive or all negative
    • "useless" attribute: the proportion of positive and negative examples
      in the new sets is roughly the same as in the original set
• Information theory is used to define "perfect/useful/useless" attributes,
  by computing the information gain from choosing each attribute.

Entropy

• S is a sample of training examples
• p+ is the proportion of positive examples in S
• p- is the proportion of negative examples in S

  Entropy(S) = - p+ log2 p+ - p- log2 p-

Example

• Imagine that a sample of 14 examples contains 9 positive examples and
  5 negative examples.

  Entropy([9+,5-]) = - (9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

  Note: log2(x) = ln(x) / ln(2)
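A small Python check of this calculation (a sketch, not from the slides):

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                        # the 0 * log2(0) term is taken to be 0
            p = count / total
            result -= p * log2(p)
    return result

print(round(entropy(9, 5), 3))           # 0.94, matching Entropy([9+,5-]) above
```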
The Entropy Function

[Plot of Entropy(S) against p+: the entropy is 0.0 when all examples are
positive (or all negative), and reaches its maximum of 1.0 when there are
equal numbers of positive and negative examples (p+ = 0.5).]

Information Gain

• Gain(S, A) = expected reduction in entropy due to sorting on A:

  Gain(S, A) ≡ Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) Entropy(Sv)

  where S is the set of training examples, A is the attribute, v ranges
  over the values of A, and Sv is the subset of S for which A has value v.
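The same formula as a Python sketch, building on the `entropy` helper above. Examples here are assumed to be dicts of attribute values with a boolean "positive" label, a representation chosen only for illustration.

```python
def entropy_of(examples):
    pos = sum(1 for e in examples if e["positive"])
    return entropy(pos, len(examples) - pos)          # entropy() from the sketch above

def information_gain(examples, attribute):
    """Entropy(S) minus the weighted entropies of the subsets Sv, one per value v of A."""
    gain = entropy_of(examples)
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        gain -= len(subset) / len(examples) * entropy_of(subset)
    return gain
```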

A Worked Example
Day   Outlook   Temperature  Humidity  Wind    PlayTennis
D1    Sunny     Hot          High      Weak    No
D2    Sunny     Hot          High      Strong  No
D3    Overcast  Hot          High      Weak    Yes
D4    Rain      Mild         High      Weak    Yes
D5    Rain      Cool         Normal    Weak    Yes
D6    Rain      Cool         Normal    Strong  No
D7    Overcast  Cool         Normal    Strong  Yes
D8    Sunny     Mild         High      Weak    No
D9    Sunny     Cool         Normal    Weak    Yes
D10   Rain      Mild         Normal    Weak    Yes
D11   Sunny     Mild         Normal    Strong  Yes
D12   Overcast  Mild         High      Strong  Yes
D13   Overcast  Hot          Normal    Weak    Yes
D14   Rain      Mild         High      Strong  No

Selecting the Root Attribute

Splitting on Humidity (S: [9+,5-], E = 0.940):
  High:   [3+,4-], E = 0.985
  Normal: [6+,1-], E = 0.592
  Gain(S, Humidity) = 0.940 − (7/14) × 0.985 − (7/14) × 0.592 = 0.151

Splitting on Wind (S: [9+,5-], E = 0.940):
  Weak:   [6+,2-], E = 0.811
  Strong: [3+,3-], E = 1.00
  Gain(S, Wind) = 0.940 − (8/14) × 0.811 − (6/14) × 1.0 = 0.048

Which attribute is the best classifier?

Selecting the Root Attribute

Splitting on Outlook (S: [9+,5-], E = 0.940):
  Sunny:    [2+,3-], E = 0.970
  Overcast: [4+,0-], E = 0
  Rain:     [3+,2-], E = 0.970
  Gain(S, Outlook) = 0.940 − (5/14) × 0.970 − (4/14) × 0 − (5/14) × 0.970 = 0.246

Information Gain Values

• Gain(S, Outlook) = 0.246
• Gain(S, Humidity) = 0.151
• Gain(S, Wind) = 0.048
• Gain(S, Temperature) = 0.029

The "best" attribute is the one with the highest information gain value.
The "worst" attribute is the one with the smallest information gain value.
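These values can be reproduced with the `information_gain` sketch above on the worked-example table (rounding only the final results, rather than the intermediate entropies as the slides do, gives 0.247 and 0.152 instead of 0.246 and 0.151):

```python
rows = [
    # Outlook, Temperature, Humidity, Wind, PlayTennis (True = Yes)
    ("Sunny", "Hot", "High", "Weak", False),    ("Sunny", "Hot", "High", "Strong", False),
    ("Overcast", "Hot", "High", "Weak", True),  ("Rain", "Mild", "High", "Weak", True),
    ("Rain", "Cool", "Normal", "Weak", True),   ("Rain", "Cool", "Normal", "Strong", False),
    ("Overcast", "Cool", "Normal", "Strong", True), ("Sunny", "Mild", "High", "Weak", False),
    ("Sunny", "Cool", "Normal", "Weak", True),  ("Rain", "Mild", "Normal", "Weak", True),
    ("Sunny", "Mild", "Normal", "Strong", True), ("Overcast", "Mild", "High", "Strong", True),
    ("Overcast", "Hot", "Normal", "Weak", True), ("Rain", "Mild", "High", "Strong", False),
]
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]
examples = [dict(zip(attributes + ["positive"], row)) for row in rows]

for a in sorted(attributes, key=lambda a: -information_gain(examples, a)):
    print(a, round(information_gain(examples, a), 3))
# Outlook 0.247, Humidity 0.152, Wind 0.048, Temperature 0.029
```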

Selecting root attribute

[Partial tree: Outlook is chosen as the root. Its Sunny branch contains
D1, D2, D8, D9, D11 ([2+,3-]) and still needs an attribute; the Overcast
and Rain branches are not shown expanded.]

Which attribute goes here?

Selecting the next attribute

Splitting the Sunny subset (S: [2+,3-], E = 0.970; examples D9+, D11+,
D1-, D2-, D8-) on Temperature:
  Hot:  [D1-, D2-]    S: [0+,2-], E = ?
  Mild: [D11+, D8-]   S: [1+,1-], E = ?
  Cool: [D9+]         S: [1+,0-], E = ?
