Decision Tree
Dept of CS & IT
Bahauddin Zakariya University, Sahiwal Campus
Artificial Intelligence
Amjad Rana
DECISION TREES
Introduction
It is a method that induces concepts from examples (inductive learning)
It is the most widely used and practical learning method
The learning is supervised, i.e. the classes or categories of the data instances are known
It represents concepts as decision trees (which can be rewritten as if-then rules)
The target function can be Boolean or discrete valued
Decision Tree Representation
1. Each internal node corresponds to an attribute
2. Each branch corresponds to an attribute value
3. Each leaf node assigns a classification
Example
A Decision Tree for the concept PlayTennis:
[Figure: root node Outlook with branches Sunny, Overcast and Rain; the Sunny branch tests Humidity (High / Normal), the Rain branch tests Wind (Strong / Weak), and the Overcast branch leads directly to a leaf]
An unknown observation is classified by testing its attributes and reaching a leaf node
For example, the instance
<Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong>
corresponds to the left-most branch of the tree and is hence classified as a negative instance
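As an illustration, the classification of this instance can be traced with a few nested tests. Below is a minimal Python sketch whose structure mirrors the tree above (the leaf labels assume the usual PlayTennis tree; the code itself is not from the slides):

# A hypothetical instance, using the attribute names from the tree
instance = {"Outlook": "Sunny", "Temperature": "Hot",
            "Humidity": "High", "Wind": "Strong"}

# Test Outlook first, then the sub-test on the matching branch
if instance["Outlook"] == "Sunny":
    label = "Yes" if instance["Humidity"] == "Normal" else "No"
elif instance["Outlook"] == "Overcast":
    label = "Yes"
else:  # Outlook == "Rain"
    label = "Yes" if instance["Wind"] == "Weak" else "No"

print(label)  # prints "No": a negative instance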
Decision Tree Representation
Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances
Each path from the tree root to a leaf corresponds to a conjunction of attribute tests (one rule for classification)
The tree itself corresponds to a disjunction of these conjunctions (set of rules for classification)
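For the PlayTennis tree shown earlier, this expression reads (with the usual leaf labels for that tree):
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)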
Basic Decision Tree Learning Algorithm
Most algorithms for growing decision trees are variants of a basic algorithm
An example of this core algorithm is the ID3 algorithm developed by Quinlan (1986)
It employs a top-down, greedy search through the space of possible decision trees
First of all we select the best attribute to be tested at the root of the tree
For making this selection, each attribute is evaluated using a statistical test to determine how well it alone classifies the training examples
We have
- 14 observations (D1 … D14)
- 4 attributes
  • Outlook
  • Temperature
  • Humidity
  • Wind
- 2 classes (Yes, No)
[Figure: the 14 training examples sorted by the root attribute Outlook into its Sunny, Overcast and Rain branches]
The selection process is then repeated using the training examples associated with each descendant node to select the best attribute to test at that point in the tree
[Figure: the partially grown tree, rooted at Outlook, with the training examples sorted to the Sunny, Overcast and Rain branches]
What is the “best” attribute to test at this point? The possible choices are Temperature, Wind & Humidity
Which Attribute is the Best Classifier?
The central choice in the ID3 algorithm is selecting which attribute to test at each node in the tree
We would like to select the attribute which is most useful for classifying examples
For this we need a good quantitative measure
For this purpose a statistical property, called information gain, is used
Which Attribute is the Best Classifier?: Definition of Entropy
In order to define information gain precisely, we begin by defining entropy
Entropy is a measure commonly used in information theory
Entropy characterizes the impurity of an arbitrary collection of examples
• S is a sample of training examples
• p+ is the proportion of positive examples in S
• p– is the proportion of negative examples in S
• Entropy measures the impurity of S as:
  Entropy(S) = – p+ log2(p+) – p– log2(p–)
Suppose S is a collection of 14 examples of some Boolean concept, including 9 positive and 5 negative examples (we adopt the notation [9+, 5–] to summarize such a sample of data). Then the entropy of this Boolean classification is
Entropy([9+, 5–]) = – (9/14) log2 (9/14) – (5/14) log2 (5/14) = 0.940
Note:
• Entropy is 0 if all the members of S belong to the same class
• Entropy is 1 if S contains an equal number of positive and negative examples
• When the collection contains unequal numbers of positive and negative examples, the entropy is between 0 and 1
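A minimal sketch of this calculation in Python (the function name and interface are illustrative, not from the slides):

import math

def entropy(num_pos, num_neg):
    # Entropy of a Boolean-labelled sample, given its class counts
    total = num_pos + num_neg
    ent = 0.0
    for count in (num_pos, num_neg):
        if count > 0:  # 0 * log2(0) is taken as 0
            p = count / total
            ent -= p * math.log2(p)
    return ent

print(entropy(9, 5))    # [9+, 5-]    -> about 0.940
print(entropy(7, 7))    # equal split -> 1.0
print(entropy(14, 0))   # pure sample -> 0.0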
Which Attribute is the Best Classifier?: Information Gain
Gain(S, A) = expected reduction in entropy due to sorting on A
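Written out, using the standard definition (Mitchell, Section 3.4):
Gain(S, A) = Entropy(S) – Σ_v (|S_v| / |S|) Entropy(S_v), where the sum runs over v ∈ Values(A)
Here Values(A) is the set of possible values of attribute A, and S_v is the subset of S for which attribute A has value v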
Let’s investigate the attribute Wind
The collection has 9 positive examples and 5 negative ones
Eight of these examples (6 positive and 2 negative) have the attribute value Wind = Weak
Six of these examples (3 positive and 3 negative) have the attribute value Wind = Strong
The information gain obtained by separating the examples according to the attribute Wind is calculated as:
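Values(Wind) = {Weak, Strong}
S = [9+, 5–], S_Weak = [6+, 2–], S_Strong = [3+, 3–]
Entropy(S_Weak) = Entropy([6+, 2–]) = 0.811
Entropy(S_Strong) = Entropy([3+, 3–]) = 1.000
Gain(S, Wind) = Entropy(S) – (8/14) Entropy(S_Weak) – (6/14) Entropy(S_Strong)
             = 0.940 – (8/14)(0.811) – (6/14)(1.000)
             = 0.048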
We calculate the Info Gain for each attribute and select the attribute having the highest Info Gain
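A rough Python sketch of this selection step (the data layout, a list of attribute-value dicts with a parallel list of class labels, and the function names are assumptions for illustration; the entropy function here generalizes the count-based version sketched above to a list of labels):

from collections import Counter
import math

def entropy(labels):
    # Entropy of a list of class labels
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(examples, labels, attribute):
    # Expected reduction in entropy from sorting the examples on `attribute`
    gain = entropy(labels)
    for v in set(ex[attribute] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == v]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

def best_attribute(examples, labels, attributes):
    # The attribute with the highest information gain
    return max(attributes, key=lambda a: info_gain(examples, labels, a))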
Example
Which attribute should be selected as the first test?
“Outlook” provides the most information
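Assuming the standard 14-example PlayTennis training set used in Mitchell, the gains work out to:
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029
so Outlook is selected as the root test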
The process of selecting a new attribute is now repeated for each (non-terminal) descendant node, this time using only the training examples associated with that node
Attributes that have been incorporated higher in the tree are excluded, so that any given attribute can appear at most once along any path through the tree
This process continues for each new leaf node until either:
1. Every attribute has already been included along this path through the tree, or
2. The training examples associated with a leaf node have zero entropy
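Putting the pieces together, a compact recursive sketch of this growing procedure in Python (it reuses the illustrative info_gain and best_attribute helpers sketched earlier; this is an outline of the basic ID3 loop, not a complete implementation):

from collections import Counter

def id3(examples, labels, attributes):
    # Returns a class label (leaf) or a nested dict {attribute: {value: subtree}}
    if len(set(labels)) == 1:
        return labels[0]                              # zero entropy: all one class
    if not attributes:
        return Counter(labels).most_common(1)[0][0]   # no attributes left: majority class

    a = best_attribute(examples, labels, attributes)  # greedy choice of the test attribute
    remaining = [attr for attr in attributes if attr != a]

    tree = {a: {}}
    for v in set(ex[a] for ex in examples):
        branch = [(ex, lab) for ex, lab in zip(examples, labels) if ex[a] == v]
        sub_examples = [ex for ex, _ in branch]
        sub_labels = [lab for _, lab in branch]
        tree[a][v] = id3(sub_examples, sub_labels, remaining)
    return tree

# Example call (attribute names as in the PlayTennis data):
# tree = id3(examples, labels, ["Outlook", "Temperature", "Humidity", "Wind"])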
Reference
Sections 3.1 – 3.4 of T. Mitchell, Machine Learning, McGraw-Hill, 1997
Assignment
• Implement the decision tree algorithm which can accept any data set and construct a tree – Due date 17th October
• Use a dataset from the UCI Machine Learning Repository
GOLDEN PEARL 7
• DO NOT BE A TEACHER EVERYWHERE.
• DO NOT GIVE ADVICE EVERYWHERE, EVERY TIME.