Decision Tree Using ID3 Algorithm
Compiled by,
Dr. Shashank Shetty
DECISION TREE REPRESENTATION
• Decision trees classify instances by sorting them down the tree from
the root to some leaf node, which provides the classification of the
instance.
• Each node in the tree specifies a test of some attribute of the
instance, and each branch descending from that node corresponds to
one of the possible values for this attribute.
• An instance is classified by starting at the root node of the tree,
testing the attribute specified by this node, and then moving down the
tree branch corresponding to the value of the attribute in the given
example. This process is then repeated for the subtree rooted at the
new node (a code sketch of this walk appears after this list).
• Decision trees represent a disjunction of conjunctions of constraints
on the attribute values of instances.
• Each path from the tree root to a leaf corresponds to a conjunction of
attribute tests, and the tree itself to a disjunction of these
conjunctions. For example, the decision tree for the classic PlayTennis
problem corresponds to the expression
(Outlook = Sunny ∧ Humidity = Normal) ∨
(Outlook = Overcast) ∨
(Outlook = Rain ∧ Wind = Weak)
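As a concrete illustration of this top-down walk, here is a minimal Python sketch. It hand-encodes the tree for the expression above as nested dicts; the encoding and the function name classify are illustrative choices, not part of ID3 itself.

tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }
}

def classify(node, instance):
    # Walk from the root: test the attribute at this node, follow the branch
    # matching the instance's value for that attribute, repeat until a leaf.
    while isinstance(node, dict):
        attribute = next(iter(node))                 # attribute tested here
        node = node[attribute][instance[attribute]]  # descend one branch
    return node

print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))  # Yes
print(classify(tree, {"Outlook": "Rain", "Humidity": "High", "Wind": "Strong"}))   # No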
Appropriate Problems for Decision Tree Learning:
• Decision tree learning is generally best suited to problems with the
following characteristics:
1. Instances are represented by attribute-value pairs – Instances are
described by a fixed set of attributes and their values (a small example
appears after this list).
2. The target function has discrete output values – The decision tree assigns
a Boolean classification (e.g., yes or no) to each example. Decision tree
methods easily extend to learning functions with more than two possible
output values.
3. Disjunctive descriptions may be required – As noted above, decision trees
naturally represent disjunctive expressions.
4. The training data may contain errors – Decision tree learning methods are
robust to errors, both errors in classifications of the training examples and
errors in the attribute values that describe these examples.
5. The training data may contain missing attribute values – Decision tree
methods can be used even when some training examples have unknown
values.
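To make characteristics 1 and 2 concrete, a single training example in attribute-value form with a discrete target label might look as follows. The attribute names follow the classic PlayTennis example; the Python dict encoding is merely one convenient choice.

example = {
    "Outlook": "Sunny",      # attribute-value pairs describing the instance
    "Temperature": "Hot",
    "Humidity": "High",
    "Wind": "Weak",
    "PlayTennis": "No",      # discrete (Boolean-style) target value
}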
What is ID3?
• A mathematical algorithm for building decision trees.
• Invented by J. Ross Quinlan in 1979.
• Uses Information Theory invented by Shannon in 1948.
• Builds the tree from the top down, with no backtracking.
• Information Gain is used to select the most useful attribute for
classification.
Entropy
• A measure of the homogeneity (purity) of a sample.
• A completely homogeneous sample has entropy of 0.
• An equally divided sample has entropy of 1.
• Entropy(S) = −p₊ log₂(p₊) − p₋ log₂(p₋) for a sample S with a proportion
p₊ of positive and p₋ of negative elements (a term with a zero proportion
is taken to be 0).
• More generally, for c classes the formula for entropy is:
Entropy(S) = −Σᵢ pᵢ log₂(pᵢ), where pᵢ is the proportion of examples in S
belonging to class i and the sum runs over the c classes.
Entropy Example
For a sample S containing 14 examples, 9 positive and 5 negative:
Entropy(S) = −(9/14) log₂(9/14) − (5/14) log₂(5/14) ≈ 0.940
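A small Python sketch of this calculation (the function signature is an illustrative choice); it reproduces the 0.940 above and the two boundary cases:

import math

def entropy(p_pos, p_neg):
    # Entropy(S) = -p+ log2(p+) - p- log2(p-); a zero proportion
    # contributes nothing (that term is taken to be 0).
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

print(round(entropy(9/14, 5/14), 3))  # 0.94  -> the example above
print(entropy(1.0, 0.0))              # 0.0   -> completely homogeneous
print(entropy(0.5, 0.5))              # 1.0   -> equally divided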
Information Gain (IG)
• The information gain is based on the decrease in entropy after a dataset is split on an
attribute.
• Which attribute creates the most homogeneous branches?
• First the entropy of the total dataset is calculated.
• The dataset is then split on the different attributes.
• The entropy for each branch is calculated, then added in proportion to the
branch's share of the examples, to get the total entropy for the split.
• The resulting entropy is subtracted from the entropy before the split.
• The result is the Information Gain, or decrease in entropy:
Gain(S, A) = Entropy(S) − Σᵥ (|Sᵥ|/|S|) Entropy(Sᵥ), where the sum runs over
the values v of attribute A and Sᵥ is the subset of S with A = v.
• The attribute that yields the largest IG is chosen for the decision node.
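A hedged Python sketch of this procedure on a toy split; the four-example dataset and all names are invented for illustration:

import math
from collections import Counter

def entropy(labels):
    # Entropy over the class proportions of a list of labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute, target):
    # Gain(S, A): entropy before the split minus the proportion-weighted
    # entropy of the branches produced by splitting on the attribute.
    before = entropy([ex[target] for ex in examples])
    after = 0.0
    for value in {ex[attribute] for ex in examples}:
        branch = [ex[target] for ex in examples if ex[attribute] == value]
        after += (len(branch) / len(examples)) * entropy(branch)
    return before - after

examples = [
    {"Wind": "Weak", "Play": "Yes"},
    {"Wind": "Weak", "Play": "Yes"},
    {"Wind": "Strong", "Play": "No"},
    {"Wind": "Strong", "Play": "Yes"},
]
print(round(information_gain(examples, "Wind", "Play"), 3))  # 0.311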
Information Gain (cont’d)
• A branch set with entropy of 0 is a leaf node.
• Otherwise, the branch needs further splitting to classify its dataset.
• The ID3 algorithm is run recursively on the non-leaf branches, until all data
is classified.
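Putting the pieces together, here is a compact sketch of the recursive procedure: a simplified ID3 on a toy dataset, with the nested-dict tree encoding and all names chosen for illustration.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attr, target):
    # Decrease in entropy from splitting the examples on attr.
    split = 0.0
    for v in {ex[attr] for ex in examples}:
        branch = [ex[target] for ex in examples if ex[attr] == v]
        split += (len(branch) / len(examples)) * entropy(branch)
    return entropy([ex[target] for ex in examples]) - split

def id3(examples, attributes, target):
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:            # branch entropy 0 -> leaf node
        return labels[0]
    if not attributes:                   # no tests left -> majority-label leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a, target))
    children = {}
    for v in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == v]
        rest = [a for a in attributes if a != best]
        children[v] = id3(subset, rest, target)   # recurse on each branch
    return {best: children}

examples = [
    {"Outlook": "Sunny", "Wind": "Weak", "Play": "No"},
    {"Outlook": "Sunny", "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Wind": "Weak", "Play": "Yes"},
    {"Outlook": "Rain", "Wind": "Weak", "Play": "Yes"},
    {"Outlook": "Rain", "Wind": "Strong", "Play": "No"},
]
print(id3(examples, ["Outlook", "Wind"], "Play"))
# e.g. {'Outlook': {'Sunny': 'No', 'Overcast': 'Yes',
#                   'Rain': {'Wind': {'Weak': 'Yes', 'Strong': 'No'}}}}
# (branch order may vary)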
Input Parameters:
• Examples: The training examples with known attribute values and corresponding class labels.