Chapter 3: Decision Tree Algorithm
Mark A. Magumba
Decision Trees: ID3 (Iterative Dichotomizer)
• Decision trees in the basic ID3 formulation operate on categorical data
• They are primarily classification algorithms
• However, they can be modified into regression trees
• One basic algorithm is ID3 which is based on entropy
• Entropy of some set X is given by:
Entropy(X) = − Σ p(x) log2 p(x), summed over the possible outcomes x of X
Entropy

• Entropy is a measure of uncertainty: it is lowest when all of the probability mass
is in one outcome and highest when the probability mass is uniformly distributed
across the outcomes
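• A minimal Python sketch of this entropy calculation (the function name and the example counts below are our own illustration, not from the slides):

import math

def entropy(class_counts):
    """Shannon entropy (base 2) of a label distribution given as class counts."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count == 0:
            continue  # a zero-probability outcome contributes nothing
        p = count / total
        result -= p * math.log2(p)
    return result

print(entropy([7, 7]))   # uniform mass over two outcomes -> maximal entropy, 1.0
print(entropy([14, 0]))  # all mass in one outcome -> least entropy, 0.0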
ID3 Concrete Example: Data
The worked example uses a small weather (“play tennis”) dataset of 14 instances with
categorical attributes Outlook, Temperature, Humidity and Windy and a Yes/No target:
9 instances are “Yes” and 5 are “No”.
ID3 Steps
1. Compute the entropy of the data (Entropy(S))
On our data (14 instances: 9 “Yes” and 5 “No”):
Entropy(S) = −(9/14) log2(9/14) − (5/14) log2(5/14)
= 0.94
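• Continuing the Python sketch above, the 0.94 figure can be reproduced from the 9 “Yes” / 5 “No” class split:

print(round(entropy([9, 5]), 2))  # 0.94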
2. Next, compute the entropy of the data given some attribute value v
This can be expressed as:
Entropy(S|v) = − Σ p(x|v) log2 p(x|v), summed over the classes x
In other words, this time we take the class probabilities given each value v of the
attribute’s value set V
Entropy given v
For Outlook, v = “Sunny” (5 instances: 2 “Yes” and 3 “No”):
Entropy(S|“Sunny”) = −(2/5) log2(2/5) − (3/5) log2(3/5)
= 0.97
Entropy given v
• Similarly, the entropy for the other branches of Outlook can be computed
• Entropy(S|“Rainy”) = 0.97
• Entropy(S|“Overcast”) = 0
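• Reusing the same entropy helper, the per-branch figures can be checked; the (Yes, No) counts per Outlook value assumed below (Sunny 2/3, Overcast 4/0, Rainy 3/2) are the ones consistent with the numbers quoted above:

# (Yes, No) counts within each Outlook branch
branches = {"Sunny": (2, 3), "Overcast": (4, 0), "Rainy": (3, 2)}
for value, counts in branches.items():
    print(value, round(entropy(counts), 2))
# Sunny 0.97, Overcast 0.0, Rainy 0.97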
Information Gain
• Next we compute the information gain of the attribute
• This can be obtained by:
Gain(S, A) = Entropy(S) − Σ (|Sv|/|S|) Entropy(Sv), summed over the values v of
attribute A, where Sv is the subset of S for which A = v
• In our case this is:
Gain(S, Outlook) = Entropy(S) − (5/14) Entropy(S|“Sunny”) − (4/14) Entropy(S|“Overcast”) − (5/14) Entropy(S|“Rainy”)
= 0.94 − 5/14 × 0.97 − 4/14 × 0 − 5/14 × 0.97
= 0.246
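• The gain computation can be sketched in the same way; information_gain below is our own helper built on the earlier entropy function:

def information_gain(parent_counts, branch_counts):
    """Gain(S, A) = Entropy(S) - sum over values v of (|Sv|/|S|) * Entropy(Sv)."""
    total = sum(parent_counts)
    remainder = sum(sum(c) / total * entropy(c) for c in branch_counts)
    return entropy(parent_counts) - remainder

# Outlook: branches Sunny (2 Yes, 3 No), Overcast (4, 0), Rainy (3, 2)
print(information_gain((9, 5), [(2, 3), (4, 0), (3, 2)]))
# ~0.247, matching the 0.246 above up to intermediate rounding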
Building the tree
• The Information gain for the other attributes can be similarly
computed to obtain the following values
• Gain(S, Temperature) = 0.029
• Gain(S, Humidity) = 0.152
• Gain(S, Windy) = 0.048
• From these values we find that the maximal information gain is on the Outlook
attribute, and this becomes our root node
Building the tree
• Outlook becomes the root node, with one branch for each of its values: “Sunny”,
“Overcast” and “Rainy”
Building the tree: Branching and leaf nodes
• ID3 is iterative
• The algorithm recursively calls itself on the branches until it encounters some
stopping condition (a minimal Python sketch of this recursion follows below)
• These are the basic stopping conditions:
• When there are no more attributes to check, ID3 returns the majority class
• When all remaining examples belong to one class, ID3 returns that class, e.g. in our
example all instances where Outlook = “Overcast” are positive instances, hence ID3
returns the leaf node “Yes”
• The branches for Outlook = “Sunny” and Outlook = “Rainy” have mixed instances,
and since we still have unused attributes we can continue growing them
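• Putting the pieces together, here is a minimal recursive ID3 sketch in Python. It is an illustration under our own naming (examples as dictionaries, a target label key), not code from the original slides:

import math
from collections import Counter

def entropy_of(rows, target):
    counts = Counter(row[target] for row in rows).values()
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts)

def gain(rows, attribute, target):
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy_of(subset, target)
    return entropy_of(rows, target) - remainder

def id3(rows, attributes, target):
    labels = [row[target] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping conditions: all examples share one class, or no attributes remain
    if len(set(labels)) == 1 or not attributes:
        return majority
    # Split on the attribute with the highest information gain
    best = max(attributes, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree

# Tiny usage example with made-up rows in the style of the weather data
data = [
    {"Outlook": "Overcast", "Windy": "False", "Play": "Yes"},
    {"Outlook": "Sunny", "Windy": "True", "Play": "No"},
    {"Outlook": "Sunny", "Windy": "False", "Play": "Yes"},
    {"Outlook": "Rainy", "Windy": "True", "Play": "No"},
]
print(id3(data, ["Outlook", "Windy"], "Play"))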
ID3 Weaknesses
• Given enough attributes, ID3 can end up perfectly classifying all of the training
instances and overfitting.
• Solutions: Use enough training data, and in some cases it is beneficial to specify
a maximum tree depth to avoid very deep trees (see the sketch below)
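• As an illustration of the depth cap, scikit-learn’s DecisionTreeClassifier (which implements CART on numeric features rather than ID3, but supports an entropy criterion) exposes a max_depth parameter; the synthetic data here is only for demonstration:

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data purely for illustration
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

unrestricted = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
capped = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0).fit(X, y)

print(unrestricted.get_depth(), capped.get_depth())  # the capped tree is at most 3 deep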

• The ID3 algorithm may also favor attributes with many values (many branches),
leading to sub-optimal splits.
• Solution: Later versions of the algorithm such as C4.5 adjust for this
algorithmically. C4.5 normalizes the information gain by dividing it by the split
information, giving the gain ratio. The split information is given by:
SplitInfo(S, A) = − Σ (|Sv|/|S|) log2(|Sv|/|S|), summed over the values v of attribute A
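• The split information and the resulting gain ratio can be sketched in the same plain-Python style; the branch sizes 5, 4 and 5 below are those of the Outlook split:

import math

def split_information(branch_sizes):
    """SplitInfo(S, A) = -sum over values v of (|Sv|/|S|) * log2(|Sv|/|S|)."""
    total = sum(branch_sizes)
    return -sum(s / total * math.log2(s / total) for s in branch_sizes if s)

def gain_ratio(info_gain, branch_sizes):
    """C4.5 gain ratio: information gain normalized by split information."""
    return info_gain / split_information(branch_sizes)

print(round(split_information([5, 4, 5]), 3))  # ~1.577 for the Outlook split
print(round(gain_ratio(0.246, [5, 4, 5]), 3))  # ~0.156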
Random Forests
• An additional solution to both problems is random forests
• Random forest algorithms generate multiple trees, each trained on a random subset
of the data (and typically a random subset of the attributes at each split)
• The final decision is made by aggregating the outputs of these different trees,
e.g. by majority vote for classification
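• A short scikit-learn illustration of the random-forest idea (again on synthetic numeric data, since scikit-learn’s trees are CART rather than ID3):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data purely for illustration
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# 100 trees, each fit on a bootstrap sample of the data with a random subset of
# features considered at each split; predictions are aggregated by majority vote
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())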
