AIML Lect5 Decision Tree

This document discusses decision trees, which are a type of supervised learning model used for classification and regression. It describes how decision trees are constructed by splitting the sample data into branches and leaves based on attribute values. The tree consists of decision nodes that test attributes, and leaf nodes that provide a class label. The example shows how a decision tree can be built to classify whether to play cricket based on weather attributes. It discusses concepts like information gain, entropy, and how the ID3 algorithm uses these to select the optimal attribute at each split in the tree.


Decision Tree

Decision Tree Learning


• A decision tree encodes rules for classifying data using tests on attribute values.
• The tree consists of decision nodes and leaf nodes.
• A decision node has two or more branches, each
representing values for the attribute tested.
• A leaf node represents a homogeneous result (all samples belong to
one class), so no further classification testing is required.
Decision Tree Example

[Figure: example decision tree. The root tests Outlook with branches sunny, overcast, and rain. The sunny branch tests Humidity (high → No, normal → Yes), the overcast branch is a Yes leaf, and the rain branch tests Windy (true → No, false → Yes).]
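To make the example concrete, here is a minimal Python sketch that encodes the tree above as nested if/else tests (the function name classify_weather and its string/boolean arguments are illustrative choices, not from the lecture):

```python
def classify_weather(outlook, humidity, windy):
    """Classify a day with the example tree: returns "Yes" (play) or "No"."""
    if outlook == "overcast":
        return "Yes"                                      # overcast branch is a pure leaf
    elif outlook == "sunny":
        return "Yes" if humidity == "normal" else "No"    # sunny branch tests Humidity
    else:  # outlook == "rain"
        return "No" if windy else "Yes"                   # rain branch tests Windy

# Example: a sunny day with high humidity is classified as "No".
print(classify_weather("sunny", "high", False))  # -> No
```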
When to consider Decision Trees
• Instances are described by attribute-value pairs
• The target function is discrete-valued
• The training data may contain missing attribute values
• Examples:
– Medical diagnosis
– Credit risk analysis
– Object classification

Decision Tree
Given
– The database schema contains the attributes {A1, A2, …, Ah}
– D = {t1, …, tn} where ti = <ti1, …, tih>
– Classes C = {C1, …, Cm}
A decision (or classification) tree is a tree associated with D
such that
– Each internal node is labeled with an attribute, Ai
– Each arc is labeled with a predicate that can be applied to the
attribute of its parent node
– Each leaf node is labeled with a class, Cj
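As a rough illustration of this definition, an internal node can store the attribute Ai it tests and one branch per arc, while a leaf stores a class Cj. The names DecisionNode and predict, and the simplification of each arc predicate to an equality test on a categorical value, are assumptions of this sketch, not the lecture's notation:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class DecisionNode:
    attribute: Optional[str] = None      # attribute Ai tested at an internal node
    branches: Dict[str, "DecisionNode"] = field(default_factory=dict)  # arc label -> child
    label: Optional[str] = None          # class Cj if this node is a leaf

def predict(node: DecisionNode, example: Dict[str, str]) -> str:
    """Follow the arc matching the example's attribute value until a leaf is reached."""
    while node.label is None:
        node = node.branches[example[node.attribute]]
    return node.label

# A tiny illustrative tree: Overcast -> Yes, otherwise No.
tree = DecisionNode(attribute="Outlook",
                    branches={"Overcast": DecisionNode(label="Yes"),
                              "Sunny": DecisionNode(label="No"),
                              "Rain": DecisionNode(label="No")})
print(predict(tree, {"Outlook": "Overcast"}))  # -> Yes
```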

Decision Tree

[Figure: partial decision tree. The root Outlook has branches Sunny, Overcast, and Rain; the Sunny branch tests Humidity (High → No, Normal → Yes).]

• Each internal node tests an attribute
• Each branch corresponds to an attribute value
• Each leaf node assigns a classification
Decision Tree
Day Outlook Temperature Humidity Wind PlayCricket
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Decision Tree Learning (abbreviated: Outlook S=Sunny, O=Overcast, R=Rain; Temperature H=Hot, M=Mild, C=Cool; Humidity H=High, N=Normal; Wind W=Weak, S=Strong)
Day Outlook Temperature Humidity Wind PlayCricket
D1 S H H W No
D2 S H H S No
D3 O H H W Yes
D4 R M H W Yes
D5 R C N W Yes
D6 R C N S No
D7 O C N S Yes
D8 S M H W No
D9 S C N W Yes
D10 R M N W Yes
D11 S M N S Yes
D12 O M H S Yes
D13 O H N W Yes
D14 R M H S No
Decision Tree Issues
• Choosing Splitting Attributes
• Ordering of Splitting Attributes
• Number of Splits
• Tree Structure
• Stopping Criteria
• Training Data
• Pruning
• etc

Information
DT Induction

• When marbles from all the classes are mixed together in one bowl,
knowing which bowl a marble comes from gives little information.
• When each bowl contains marbles from only one class, knowing the
bowl identifies the class exactly, so more information is given.

Entropy
• Entropy measures the amount of randomness, surprise, or
uncertainty in a dataset.
• Entropy E is defined as:
– E(D) = Σ_{i=1}^{c} p_i log2(1/p_i)
• where D is a dataset,
• c is the number of classes, and
• p_i is the proportion of the training dataset that belongs to
class i.
• Goal in classification
– no surprise (entropy = 0)
– by convention, 0 · log2(0) = 0
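A minimal Python sketch of this definition (the function name entropy and the use of collections.Counter are choices made here; zero-count classes are skipped, which implements the 0 · log2(0) = 0 convention):

```python
import math
from collections import Counter

def entropy(labels):
    """E(D) = sum_i p_i * log2(1/p_i) over the class proportions of `labels`."""
    counts = Counter(labels)
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in counts.values() if c > 0)

# The 14-example training set with 9 Yes and 5 No:
print(round(entropy(["Yes"] * 9 + ["No"] * 5), 2))  # -> 0.94
```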
ID3
• ID3 builds the tree using concepts from information theory.
• At each node it chooses the splitting attribute with the highest
information gain:

• G(D,S) = E(D) − Σ_{i=1}^{s} P(D_i) · E(D_i)

• where S is the splitting attribute and D_1, …, D_s are the subsets
of D produced by splitting on the values of S, with P(D_i) = |D_i| / |D|
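For illustration, a matching sketch of the gain computation (the helper names entropy and information_gain, and the dictionary-based row format, are assumptions of this sketch rather than the lecture's notation):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(rows, split_attr, target_attr):
    """G(D, S) = E(D) - sum_i P(D_i) * E(D_i), where the D_i partition D by split_attr."""
    labels = [r[target_attr] for r in rows]
    partitions = defaultdict(list)
    for r in rows:
        partitions[r[split_attr]].append(r[target_attr])
    weighted = sum(len(part) / len(rows) * entropy(part) for part in partitions.values())
    return entropy(labels) - weighted
```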

Top-Down Induction of Decision Trees (ID3)
1. Let A be the “best” decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a new descendant node
4. Sort the training examples to the leaf nodes according to
the attribute value of their branch
5. If all training examples are perfectly classified (same
value of the target attribute), stop; else iterate over the new leaf
nodes
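These five steps can be condensed into a short recursive sketch. It is a minimal illustration assuming categorical attributes; the function name id3 and the nested-dict tree representation are choices made here, not the lecture's notation:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def id3(rows, attributes, target):
    """Return a nested dict {attribute: {value: subtree_or_label}} built top-down."""
    labels = [r[target] for r in rows]
    # Step 5: stop when all examples share the same target value.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left: fall back to the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-2: pick the attribute with the highest information gain.
    def gain(attr):
        parts = defaultdict(list)
        for r in rows:
            parts[r[attr]].append(r[target])
        weighted = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
        return entropy(labels) - weighted
    best = max(attributes, key=gain)
    # Steps 3-4: create one descendant per observed value of the chosen attribute.
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)
    return tree
```

Called as id3(rows, ["Outlook", "Temperature", "Humidity", "Wind"], "PlayCricket") on the table above, this sketch should reproduce the Outlook-rooted tree developed in the following slides.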

• log2(X) = log10(X)/ log10(2)
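For example, log2(9/14) = log10(0.643) / log10(2) ≈ −0.637 and log2(5/14) = log10(0.357) / log10(2) ≈ −1.485; both values are used in the entropy calculation below.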
Decision Tree Learning

• The training set has 14 examples: 9 positive (Yes) and 5 negative (No).

• Let E([X+,Y-]) denote the entropy of a set with X positive (Yes)
examples and Y negative (No) examples.
• Therefore the entropy of the training dataset, E(D),
can be written as E([9+,5-]).
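This notation can be mirrored directly in code by a tiny helper that takes the positive and negative counts (the name E_pm is hypothetical):

```python
import math

def E_pm(pos, neg):
    """Entropy of a set written as E([pos+, neg-]); 0 * log2(0) is treated as 0."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # skip zero counts, implementing the 0 * log2(0) = 0 convention
            p = count / total
            result -= p * math.log2(p)
    return result

print(round(E_pm(9, 5), 2))  # E([9+,5-]) -> 0.94
```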
Decision Tree Learning: A Simple Example

• E(D) = Σ_{i=1}^{c} p_i log2(1/p_i) = Σ_{i=1}^{c} −p_i log2(p_i)
Initial Entropy of the Training Set.
E(D) = E([9+,5-])
= (-9/14 log2 9/14) + (-5/14 log2 5/14)
= 0.94
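Expanding the two terms: −(9/14) · log2(9/14) ≈ 0.643 × 0.637 ≈ 0.410 and −(5/14) · log2(5/14) ≈ 0.357 × 1.485 ≈ 0.530, giving 0.410 + 0.530 ≈ 0.94.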
Decision Tree Learning: A Simple Example

• Calculate the information gain G(D,S) for each attribute S taken
from the set {Outlook, Temperature, Humidity, Wind}:

• G(D,S) = E(D) − Σ_{i=1}^{s} P(D_i) · E(D_i)
Decision Tree Learning: A Simple Example
• The information gain for Outlook is:
• G(D,Outlook) = E(D) − [5/14 · E(Outlook=Sunny) + 4/14 · E(Outlook=Overcast) + 5/14 · E(Outlook=Rain)]
• G(D,Outlook) = E([9+,5-]) − [5/14 · E([2+,3-]) + 4/14 · E([4+,0-]) + 5/14 · E([3+,2-])]
• G(D,Outlook) = 0.94 − [5/14 · 0.971 + 4/14 · 0.0 + 5/14 · 0.971]
• G(D,Outlook) = 0.246
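The subset entropies used here come from the same formula: E([2+,3-]) = E([3+,2-]) = −(2/5) log2(2/5) − (3/5) log2(3/5) ≈ 0.529 + 0.442 = 0.971, while E([4+,0-]) = 0 because the Overcast subset is pure.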
Decision Tree Learning: A Simple Example
• G(D,Temperature) = 0.94 − [4/14 · E(Temperature=Hot) + 6/14 · E(Temperature=Mild) + 4/14 · E(Temperature=Cool)]
• G(D,Temperature) = 0.94 − [4/14 · E([2+,2-]) + 6/14 · E([4+,2-]) + 4/14 · E([3+,1-])]
• G(D,Temperature) = 0.94 − [4/14 · 1.0 + 6/14 · 0.918 + 4/14 · 0.811]
• G(D,Temperature) = 0.029
Decision Tree Learning: A Simple Example
• G(D,Humidity) = 0.94 − [7/14 · E(Humidity=High) + 7/14 · E(Humidity=Normal)]
• G(D,Humidity) = 0.94 − [7/14 · E([3+,4-]) + 7/14 · E([6+,1-])]
• G(D,Humidity) = 0.94 − [7/14 · 0.985 + 7/14 · 0.592]
• G(D,Humidity) = 0.1515
Decision Tree Learning: A Simple Example
• G(D,Wind) = 0.94 − [8/14 · E(Wind=Weak) + 6/14 · E(Wind=Strong)]
• G(D,Wind) = 0.94 − [8/14 · E([6+,2-]) + 6/14 · E([3+,3-])]
• G(D,Wind) = 0.94 − [8/14 · 0.811 + 6/14 · 1.00]
• G(D,Wind) = 0.048
Maximum Gain

• G(D,Outlook) = 0.246
• G(D,Temperature) = 0.029
• G(D,Humidity) = 0.1515
• G(D,Wind) = 0.048
• The maximum gain is for Outlook ➔ Outlook is the best
splitting attribute
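These four gains can be reproduced with a short standalone script over the 14-row table (a sketch; the tuple encoding of the rows and the helper names are choices made here):

```python
import math
from collections import Counter, defaultdict

# (Outlook, Temperature, Humidity, Wind, PlayCricket) for days D1..D14
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def gain(rows, col):
    """Information gain of splitting `rows` on column index `col`; class is the last field."""
    labels = [r[-1] for r in rows]
    parts = defaultdict(list)
    for r in rows:
        parts[r[col]].append(r[-1])
    return entropy(labels) - sum(len(p) / len(rows) * entropy(p) for p in parts.values())

for i, name in enumerate(ATTRS):
    print(f"G(D, {name}) = {gain(DATA, i):.3f}")
# Approximate output: Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048
# (the slides round these to 0.246, 0.029, 0.1515, and 0.048).
```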
Decision Tree Learning: A Simple Example
• Outlook is the best splitting attribute
Decision Tree Learning: A Simple Example

• The root of our decision tree is Outlook, with branches Sunny,
Overcast, and Rain.
• Next, recursively find the nodes that should go below it.
Decision Tree Learning: A Simple Example
• For the five Outlook=Rain examples, E(Outlook=Rain) = E([3+,2-]) = 0.971.
• G(Outlook=Rain, Humidity) = 0.971 − [2/5 · E(Outlook=Rain ∧ Humidity=High) + 3/5 · E(Outlook=Rain ∧ Humidity=Normal)]
• G(Outlook=Rain, Humidity) = 0.02

• G(Outlook=Rain, Wind) = 0.971 − [3/5 · 0 + 2/5 · 0]
• G(Outlook=Rain, Wind) = 0.971
• G(Outlook=Rain, Temperature) = ????
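A quick way to check these subset gains, and to answer the Temperature question, is to rerun the same gain computation on the Outlook=Rain rows. This sketch assumes the DATA table and gain helper from the script shown after the Maximum Gain slide:

```python
# Assumes DATA (14 rows) and gain(rows, col) from the earlier sketch.
rain_rows = [r for r in DATA if r[0] == "Rain"]          # D4, D5, D6, D10, D14

for i, name in enumerate(["Outlook", "Temperature", "Humidity", "Wind"]):
    if name != "Outlook":                                 # Outlook is already fixed on this branch
        print(f"G(Outlook=Rain, {name}) = {gain(rain_rows, i):.3f}")
# Wind comes out highest (0.971), so the Rain branch splits on Wind.
```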
Decision Tree Learning: A Simple Example
• The resulting decision tree has Outlook at the root: the Overcast branch is a Yes leaf, the Rain branch splits on Wind (Strong → No, Weak → Yes), and, continuing the recursion on the Sunny branch, Humidity (High → No, Normal → Yes), matching the structure of the example tree shown at the start of these notes.
