
ID3 Algorithm

Allan Neymark

CS157B Spring 2007

Agenda
Decision Trees
What is ID3?
Entropy
Calculating Entropy with Code
Information Gain
Advantages and Disadvantages
Example
Decision Trees
Rules for classifying data using attributes.
The tree consists of decision nodes and leaf nodes.
A decision node has two or more branches, each representing values for the attribute tested.
A leaf node produces a homogeneous result (all records in one class), which requires no further classification testing.
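As a rough illustration (not from the original slides; class and field names are hypothetical), the two node types described above can be sketched in Python:

```python
class DecisionNode:
    """Tests one attribute; has one branch per value of that attribute."""
    def __init__(self, attribute, branches):
        self.attribute = attribute   # the attribute tested at this node
        self.branches = branches     # dict mapping attribute value -> child node

class LeafNode:
    """Homogeneous result: every record that reaches it belongs to one class."""
    def __init__(self, label):
        self.label = label

def classify(node, record):
    # Follow branches until a leaf is reached; no further testing is needed there.
    while isinstance(node, DecisionNode):
        node = node.branches[record[node.attribute]]
    return node.label

# Hypothetical usage, following the weather tree shown on the next slide:
tree = DecisionNode("Outlook", {
    "overcast": LeafNode("Yes"),
    "sunny": DecisionNode("Humidity", {"high": LeafNode("No"), "normal": LeafNode("Yes")}),
    "rain": DecisionNode("Windy", {"true": LeafNode("No"), "false": LeafNode("Yes")}),
})
print(classify(tree, {"Outlook": "sunny", "Humidity": "normal"}))   # Yes
```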
Decision Tree Example
[Tree for the weather data: the root tests Outlook (sunny / overcast / rain). The sunny branch tests Humidity (high -> No, normal -> Yes), overcast leads directly to Yes, and the rain branch tests Windy (true -> No, false -> Yes).]


What is ID3?
A mathematical algorithm for building the decision tree.
Invented by J. Ross Quinlan in 1979.
Uses Information Theory invented by Shannon in 1948.
Builds the tree from the top down, with no backtracking.
Information Gain is used to select the most useful attribute for classification.
Entropy
A formula to calculate the homogeneity of a sample.
A completely homogeneous sample has entropy of 0.
An equally divided sample has entropy of 1.
The formula for entropy, for a sample S of positive and negative elements, is:
Entropy(S) = -p+ log2(p+) - p- log2(p-)
Entropy Example
For a sample with 9 elements of one class and 5 of the other:
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
Calculating Entropy with Code
Most programming languages and calculators do not have a log2 function.
Use a conversion factor: take the log of 2 in a base you do have, and divide by it.
Example: log10(2) = 0.301
Then divide to get log2(n): log2(3/5) = log10(3/5) / 0.301

Calculating Entropy with Code (contd)
Taking log10(0) produces an error.
Do not try to calculate (0/3) log10(0/3); substitute 0 for that term instead.
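A minimal Python sketch of the approach described above (function names are illustrative, not from the slides): log2 is obtained by dividing log10(n) by log10(2), and the p = 0 term is skipped rather than passed to the log function.

```python
import math

def log2(x):
    # Get log2 by dividing by log10(2) = 0.301, as described above.
    return math.log10(x) / math.log10(2)

def entropy(counts):
    """Entropy of a sample given the count of elements in each class."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        p = c / total
        if p > 0:          # skip the p = 0 term instead of calling log10(0)
            result -= p * log2(p)
    return result

print(entropy([9, 5]))   # about 0.940, the example above
print(entropy([3, 0]))   # 0.0: a completely homogeneous sample
```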

Information Gain (IG)
The information gain is based on the decrease in entropy after a dataset is split on an attribute.
Which attribute creates the most homogeneous branches?
First, the entropy of the total dataset is calculated.
The dataset is then split on the different attributes.
The entropy for each branch is calculated, then added proportionally to get the total entropy for the split.
The resulting entropy is subtracted from the entropy before the split.
The result is the Information Gain, or decrease in entropy.
The attribute that yields the largest IG is chosen for the decision node.

Information Gain (contd)
A branch set with entropy of 0 is a leaf node.
Otherwise, the branch needs further splitting to classify its dataset.
The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.
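A minimal sketch of this procedure (Python; function names are illustrative, not from the slides): compute the entropy before the split, then subtract the size-weighted entropy of each branch.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(parent_labels, branch_label_sets):
    # Entropy before the split minus the size-weighted entropy of each branch.
    n = len(parent_labels)
    after = sum(len(branch) / n * entropy(branch) for branch in branch_label_sets)
    return entropy(parent_labels) - after

# The Weight <= 160 split worked through in the example later in these slides:
parent = ["F"] * 4 + ["M"] * 5                 # 4 females, 5 males
under_160 = ["F", "F", "F", "F", "M"]          # "yes" branch
over_160 = ["M", "M", "M", "M"]                # "no" branch
print(information_gain(parent, [under_160, over_160]))   # about 0.59
```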

Advantages of using ID3
Understandable prediction rules are created from the training data.
Builds the fastest tree.
Builds a short tree.
Only needs to test enough attributes until all data is classified.
Finding leaf nodes enables test data to be pruned, reducing the number of tests.
The whole dataset is searched to create the tree.
Disadvantages of using ID3
Data may be over-fitted or over-classified if a small sample is tested.
Only one attribute at a time is tested for making a decision.
Classifying continuous data may be computationally expensive, as many trees must be generated to see where to break the continuum.
Example: The Simpsons
Person   Hair Length   Weight   Age   Class
Homer         0          250     36     M
Marge        10          150     34     F
Bart          2           90     10     M
Lisa          6           78      8     F
Maggie        4           20      1     F
Abe           1          170     70     M
Selma         8          160     41     F
Otto         10          180     38     M
Krusty        6          200     45     M
Comic         8          290     38     ?
Let us try splitting on Hair Length

Hair Length <= 5?  (yes / no)

Entropy(4F, 5M) = -(4/9) log2(4/9) - (5/9) log2(5/9) = 0.9911

Entropy(S) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

Gain(A) = E(current set) - E(all child sets)

Gain(Hair Length <= 5) = 0.9911 - (4/9 * 0.8113 + 5/9 * 0.9710) = 0.0911
Let us try splitting on Weight

Weight <= 160?  (yes / no)

Entropy(4F, 5M) = -(4/9) log2(4/9) - (5/9) log2(5/9) = 0.9911

Gain(Weight <= 160) = 0.9911 - (5/9 * 0.7219 + 4/9 * 0) = 0.5900
Let us try splitting on Age

Age <= 40?  (yes / no)

Entropy(4F, 5M) = -(4/9) log2(4/9) - (5/9) log2(5/9) = 0.9911

Gain(Age <= 40) = 0.9911 - (6/9 * 1 + 3/9 * 0.9183) = 0.0183
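As a check on the three gains above, here is a small Python script (not part of the original slides) that recomputes them from the nine labeled rows of the table; the column indices and thresholds follow the example.

```python
import math

# The nine labeled rows from the table above (Comic, the unlabeled row, is left out).
# Each tuple: (name, hair length, weight, age, class)
people = [
    ("Homer",  0, 250, 36, "M"), ("Marge", 10, 150, 34, "F"),
    ("Bart",   2,  90, 10, "M"), ("Lisa",   6,  78,  8, "F"),
    ("Maggie", 4,  20,  1, "F"), ("Abe",    1, 170, 70, "M"),
    ("Selma",  8, 160, 41, "F"), ("Otto",  10, 180, 38, "M"),
    ("Krusty", 6, 200, 45, "M"),
]

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(c) / total) * math.log2(labels.count(c) / total)
                for c in set(labels))

def gain(rows, column, threshold):
    # Parent entropy minus the size-weighted entropy of the two branches.
    parent = [r[-1] for r in rows]
    yes = [r[-1] for r in rows if r[column] <= threshold]
    no  = [r[-1] for r in rows if r[column] >  threshold]
    weighted = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(rows)
    return entropy(parent) - weighted

print(f"Hair Length <= 5: {gain(people, 1, 5):.4f}")    # 0.0911
print(f"Weight <= 160:    {gain(people, 2, 160):.4f}")  # 0.5900
print(f"Age <= 40:        {gain(people, 3, 40):.4f}")   # 0.0183
```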
Of the 3 features we had, Weight was best. But while people who weigh over 160 are perfectly classified (as males), the under-160 people are not perfectly classified, so we simply recurse!
This time we find that we can split on Hair Length, and we are done!
We don't need to keep the data around, just the test conditions:

Weight <= 160?
  no  -> Male
  yes -> Hair Length <= 2?
           yes -> Male
           no  -> Female

How would these people be classified?
It is trivial to convert Decision Trees to rules.
Rules to Classify Males/Females

If Weight greater than 160, classify as Male
Elseif Hair Length less than or equal to 2, classify as Male
Else classify as Female
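The rules above translate directly into code; a minimal Python sketch (function name and return strings are illustrative):

```python
def classify(weight, hair_length):
    # Direct translation of the three rules above.
    if weight > 160:
        return "Male"
    elif hair_length <= 2:
        return "Male"
    else:
        return "Female"

# The unlabeled "Comic" row from the example table (Weight 290, Hair Length 8):
print(classify(weight=290, hair_length=8))   # Male, since 290 > 160
```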
References
Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning, 1, 81-106.
http://dms.irb.hr/tutorial/tut_dtrees.php
http://www.dcs.napier.ac.uk/~peter/vldb/dm/node11.html
http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/4_dtrees2.html
Professor Sin-Min Lee, SJSU. http://cs.sjsu.edu/~lee/cs157b/cs157b.html
