AIML Lect5 Decision Tree
[Decision tree diagram: Outlook at the root with branches sunny, overcast, and rain; the sunny branch tests Humidity, the overcast branch leads directly to Yes, and the rain branch tests Windy; the leaves are labeled No / Yes.]
When to consider Decision Trees
• Instances are described by attribute-value pairs
• Target function is discrete-valued
• Training data may contain missing attribute values
• Examples:
– Medical diagnosis
– Credit risk analysis
– Object classification
Decision Tree
Given
– Database schema contains {A1, A2, …, Ah}
– D = {t1, …, tn} where ti=<ti1, …, tih>
– Classes C={C1, …., Cm}
Decision or Classification Tree is a tree associated with D
such that
– Each internal node is labeled with an attribute, Ai
– Each arc is labeled with a predicate that can be
applied to the attribute at its parent node
– Each leaf node is labeled with a class, Cj
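To make the definition concrete, here is a minimal Python sketch (the Leaf, Node, and classify names are my own) of one way to represent such a tree and use it to classify a tuple; the example tree is the Outlook tree from the diagram above, with arc values filled in from the training table that follows.

```python
# Minimal representation of a decision/classification tree (illustrative only).
class Leaf:
    def __init__(self, label):
        self.label = label            # class Cj

class Node:
    def __init__(self, attribute, branches):
        self.attribute = attribute    # attribute Ai tested at this internal node
        self.branches = branches      # arc predicate (here: attribute value) -> subtree

def classify(tree, tuple_):
    """Follow the arc whose predicate matches the tuple until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[tuple_[tree.attribute]]
    return tree.label

# Hand-built tree for the Outlook example:
tree = Node("Outlook", {
    "Sunny":    Node("Humidity", {"High": Leaf("No"), "Normal": Leaf("Yes")}),
    "Overcast": Leaf("Yes"),
    "Rain":     Node("Wind", {"Strong": Leaf("No"), "Weak": Leaf("Yes")}),
})
print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # -> Yes
```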
Decision Tree
[Partial decision tree diagram: Outlook selected as the root attribute.]
Decision Tree
Day Outlook Temperature Humidity Wind PlayTennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Decision Tree Learning
Day Outlook Temperature Humidity Wind PlayTennis
(S = Sunny, O = Overcast, R = Rain; H = Hot, M = Mild, C = Cool; H = High, N = Normal; W = Weak, S = Strong)
D1 S H H W No
D2 S H H S No
D3 O H H W Yes
D4 R M H W Yes
D5 R C N W Yes
D6 R C N S No
D7 O C N S Yes
D8 S M H W No
D9 S C N W Yes
D10 R M N W Yes
D11 S M N S Yes
D12 O M H S Yes
D13 O H N W Yes
D14 R M H S No
Decision Tree Issues
• Choosing Splitting Attributes
• Ordering of Splitting Attributes
• Number of Splits
• Tree Structure
• Stopping Criteria
• Training Data
• Pruning
• etc.
Information
DT Induction
• When all the marbles in the bowl are mixed up, little
information is given.
• When the marbles in the bowl are all from one class, and
those from the other two classes are separated out on either side,
more information is given.
Entropy
• Entropy measures the amount of randomness, surprise, or
uncertainty.
• Entropy E is defined as:
– E(D) = Σ_{i=1}^{c} pi · log2(1/pi),
• where D is a dataset,
• c is the number of classes, and
• pi is the proportion of the training dataset that belongs to
class i
• Goal in classification
– no surprise (entropy = 0)
– by convention, 0 log2 0 = 0
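As a quick numeric check of this definition, here is a small Python sketch (the entropy function name is my own) that computes E(D) from a list of class labels:

```python
import math
from collections import Counter

def entropy(labels):
    """E(D) = sum over classes of p_i * log2(1/p_i); absent classes contribute 0 (0 log2 0 = 0)."""
    n = len(labels)
    return sum((count / n) * math.log2(n / count) for count in Counter(labels).values())

print(entropy(["Yes"] * 4))                         # 0.0  (pure node: no surprise)
print(entropy(["Yes", "No"]))                       # 1.0  (even 50/50 split: maximum uncertainty)
print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))  # ≈ 0.940 for the 9-Yes / 5-No training set below
```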
ID3
• It builds the tree using information-theory concepts.
• It chooses as the split attribute the one with the highest
information gain:
– Gain(D, A) = E(D) − Σ_{v ∈ Values(A)} (|Dv| / |D|) · E(Dv),
where Dv is the subset of D for which attribute A takes value v
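A minimal sketch of that gain computation in Python (self-contained; the entropy and information_gain names are my own), matching the Gain(D, A) formula above:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum((count / n) * math.log2(n / count) for count in Counter(labels).values())

def information_gain(rows, attribute, target="PlayTennis"):
    """Gain(D, A) = E(D) - sum over values v of A of |Dv|/|D| * E(Dv)."""
    labels = [row[target] for row in rows]
    g = entropy(labels)
    for value in set(row[attribute] for row in rows):
        subset = [row[target] for row in rows if row[attribute] == value]
        g -= len(subset) / len(rows) * entropy(subset)
    return g
```

The worked example on the following slides applies exactly this computation to the 14-day training set.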
Top-Down Induction of Decision Trees ID3
1. Let A be the “best” decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a new descendant
4. Sort the training examples down to the leaf nodes according to
the attribute value of the branch
5. If all training examples are perfectly classified (same
value of the target attribute), stop; else iterate over the new leaf
nodes.
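Putting steps 1–5 together, a compact recursive sketch of this induction loop might look as follows (Python; the function names and the majority-class fallback when attributes run out are my own additions, not part of the slide):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum((count / n) * math.log2(n / count) for count in Counter(labels).values())

def gain(rows, attr, target):
    labels = [row[target] for row in rows]
    g = entropy(labels)
    for v in set(row[attr] for row in rows):
        part = [row[target] for row in rows if row[attr] == v]
        g -= len(part) / len(rows) * entropy(part)
    return g

def id3(rows, attributes, target):
    labels = [row[target] for row in rows]
    # Step 5: stop when every example at this node has the same target value.
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:                                   # assumed fallback: majority class
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-2: choose the "best" attribute (highest information gain) and test it here.
    best = max(attributes, key=lambda a: gain(rows, a, target))
    tree = {best: {}}
    # Steps 3-4: one branch per value of best; sort the examples down the branches.
    for v in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == v]
        tree[best][v] = id3(subset, [a for a in attributes if a != best], target)
    return tree

# On the 14-day table below, id3(rows, ["Outlook", "Temperature", "Humidity", "Wind"], "PlayTennis")
# yields (up to key order) {"Outlook": {"Overcast": "Yes",
#                                       "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
#                                       "Rain": {"Wind": {"Weak": "Yes", "Strong": "No"}}}}
```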
• log2(X) = log10(X)/ log10(2)
Decision Tree Learning
Day Outlook Temperature Humidity Wind PlayTennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Decision Tree Learning
• G(D,Outlook) = 0.246
• G(D,Temperature) = 0.029
• G(D,Humidity) = 0.1515
• G(D,Wind) = 0.048
• Maximum gain is for Outlook ➔ Outlook is the best
splitting attribute
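These four gains can be reproduced with the sketches above; a self-contained check (the rows list is just the abbreviated training table typed out as Python dicts):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum((count / n) * math.log2(n / count) for count in Counter(labels).values())

def gain(rows, attr, target="PlayTennis"):
    labels = [row[target] for row in rows]
    g = entropy(labels)
    for v in set(row[attr] for row in rows):
        subset = [row[target] for row in rows if row[attr] == v]
        g -= len(subset) / len(rows) * entropy(subset)
    return g

# The 14 training days, abbreviated as in the table above (e.g. "SHHW No" = Sunny, Hot, High, Weak, No).
raw = ["SHHW No", "SHHS No", "OHHW Yes", "RMHW Yes", "RCNW Yes", "RCNS No", "OCNS Yes",
       "SMHW No", "SCNW Yes", "RMNW Yes", "SMNS Yes", "OMHS Yes", "OHNW Yes", "RMHS No"]
rows = []
for s in raw:
    code, label = s.split()
    o, t, h, w = code
    rows.append({"Outlook": o, "Temperature": t, "Humidity": h, "Wind": w, "PlayTennis": label})

for attr in ["Outlook", "Temperature", "Humidity", "Wind"]:
    print(attr, round(gain(rows, attr), 3))
# Outlook ≈ 0.247, Temperature ≈ 0.029, Humidity ≈ 0.152, Wind ≈ 0.048
# (agreeing with the slide values up to rounding) -> Outlook has the highest gain.
```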
Decision Tree Learning: A Simple Example
• Outlook is the best splitting attribute; the examples are then sorted down its Sunny, Overcast, and Rain branches (see the sketch below)
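A small self-contained sketch of that partition (only the Outlook value and the PlayTennis label of each day are needed here):

```python
from collections import Counter

# Outlook value and PlayTennis label for the 14 training days (from the table above).
days = [("Sunny","No"), ("Sunny","No"), ("Overcast","Yes"), ("Rain","Yes"), ("Rain","Yes"),
        ("Rain","No"), ("Overcast","Yes"), ("Sunny","No"), ("Sunny","Yes"), ("Rain","Yes"),
        ("Sunny","Yes"), ("Overcast","Yes"), ("Overcast","Yes"), ("Rain","No")]

for branch in ("Sunny", "Overcast", "Rain"):
    counts = Counter(label for outlook, label in days if outlook == branch)
    print(branch, dict(counts))
# Sunny {'No': 3, 'Yes': 2}   -> still mixed, split again (Humidity turns out to separate it)
# Overcast {'Yes': 4}         -> pure, becomes a Yes leaf
# Rain {'Yes': 3, 'No': 2}    -> still mixed, split again (Wind turns out to separate it)
```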
Decision Tree Learning: A Simple Example