
ID3 Algorithm

Allan Neymark

CS157B Spring 2007

Agenda
Decision Trees
What is ID3?
Entropy
Calculating Entropy with Code
Information Gain
Advantages and Disadvantages
Example
Decision Trees
Rules for classifying data using attributes.
The tree consists of decision nodes and leaf nodes.
A decision node has two or more branches, each representing values for the attribute tested.
A leaf node produces a homogeneous result (all records in one class), which requires no further classification testing.
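As a rough illustration (not from the original slides; class and field names are hypothetical), the two node types described above can be sketched in Python:

```python
class DecisionNode:
    """Tests one attribute; has one branch per value of that attribute."""
    def __init__(self, attribute, branches):
        self.attribute = attribute   # the attribute tested at this node
        self.branches = branches     # dict mapping attribute value -> child node

class LeafNode:
    """Homogeneous result: every record that reaches it belongs to one class."""
    def __init__(self, label):
        self.label = label

def classify(node, record):
    # Follow branches until a leaf is reached; no further testing is needed there.
    while isinstance(node, DecisionNode):
        node = node.branches[record[node.attribute]]
    return node.label

# Hypothetical usage, following the weather tree shown on the next slide:
tree = DecisionNode("Outlook", {
    "overcast": LeafNode("Yes"),
    "sunny": DecisionNode("Humidity", {"high": LeafNode("No"), "normal": LeafNode("Yes")}),
    "rain": DecisionNode("Windy", {"true": LeafNode("No"), "false": LeafNode("Yes")}),
})
print(classify(tree, {"Outlook": "sunny", "Humidity": "normal"}))   # Yes
```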
Decision Tree Example
[Tree for the weather data: the root tests Outlook (sunny / overcast / rain). The sunny branch tests Humidity (high -> No, normal -> Yes), overcast leads directly to Yes, and the rain branch tests Windy (true -> No, false -> Yes).]


What is ID3?
A mathematical algorithm for building the decision tree.
Invented by J. Ross Quinlan in 1979.
Uses Information Theory invented by Shannon in 1948.
Builds the tree from the top down, with no backtracking.
Information Gain is used to select the most useful attribute for classification.
Entropy
A formula to calculate the homogeneity of a sample.
A completely homogeneous sample has entropy of 0.
An equally divided sample has entropy of 1.
The formula for entropy, for a sample S of positive and negative elements, is:
Entropy(S) = -p+ log2(p+) - p- log2(p-)
Entropy Example
For a sample with 9 elements of one class and 5 of the other:
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
Calculating Entropy with Code
Most programming languages and calculators do not have a log2 function.
Use a conversion factor: take the log of 2 in a base you do have, and divide by it.
Example: log10(2) = 0.301
Then divide to get log2(n): log2(3/5) = log10(3/5) / 0.301

Calculating Entropy with Code (contd)
Taking log10(0) produces an error.
Do not try to calculate (0/3) log10(0/3); substitute 0 for that term instead.
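A minimal Python sketch of the approach described above (function names are illustrative, not from the slides): log2 is obtained by dividing log10(n) by log10(2), and the p = 0 term is skipped rather than passed to the log function.

```python
import math

def log2(x):
    # Get log2 by dividing by log10(2) = 0.301, as described above.
    return math.log10(x) / math.log10(2)

def entropy(counts):
    """Entropy of a sample given the count of elements in each class."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        p = c / total
        if p > 0:          # skip the p = 0 term instead of calling log10(0)
            result -= p * log2(p)
    return result

print(entropy([9, 5]))   # about 0.940, the example above
print(entropy([3, 0]))   # 0.0: a completely homogeneous sample
```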

Information Gain (IG)
The information gain is based on the decrease in entropy after a dataset is split on an attribute.
Which attribute creates the most homogeneous branches?
First, the entropy of the total dataset is calculated.
The dataset is then split on the different attributes.
The entropy for each branch is calculated, then added proportionally to get the total entropy for the split.
The resulting entropy is subtracted from the entropy before the split.
The result is the Information Gain, or decrease in entropy.
The attribute that yields the largest IG is chosen for the decision node.

Information Gain (contd)
A branch set with entropy of 0 is a leaf node.
Otherwise, the branch needs further splitting to classify its dataset.
The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.
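A minimal sketch of this procedure (Python; function names are illustrative, not from the slides): compute the entropy before the split, then subtract the size-weighted entropy of each branch.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(parent_labels, branch_label_sets):
    # Entropy before the split minus the size-weighted entropy of each branch.
    n = len(parent_labels)
    after = sum(len(branch) / n * entropy(branch) for branch in branch_label_sets)
    return entropy(parent_labels) - after

# The Weight <= 160 split worked through in the example later in these slides:
parent = ["F"] * 4 + ["M"] * 5                 # 4 females, 5 males
under_160 = ["F", "F", "F", "F", "M"]          # "yes" branch
over_160 = ["M", "M", "M", "M"]                # "no" branch
print(information_gain(parent, [under_160, over_160]))   # about 0.59
```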

Advantages of using ID3
Understandable prediction rules are created from the training data.
Builds the fastest tree.
Builds a short tree.
Only needs to test enough attributes until all data is classified.
Finding leaf nodes enables test data to be pruned, reducing the number of tests.
The whole dataset is searched to create the tree.
Disadvantages of using ID3
Data may be over-fitted or over-classified if a small sample is tested.
Only one attribute at a time is tested for making a decision.
Classifying continuous data may be computationally expensive, as many trees must be generated to see where to break the continuum.
Example: The Simpsons
Person   Hair Length   Weight   Age   Class
Homer         0          250     36     M
Marge        10          150     34     F
Bart          2           90     10     M
Lisa          6           78      8     F
Maggie        4           20      1     F
Abe           1          170     70     M
Selma         8          160     41     F
Otto         10          180     38     M
Krusty        6          200     45     M
Comic         8          290     38     ?
Let us try splitting on Hair Length

Hair Length <= 5?  (yes / no)

Entropy(4F, 5M) = -(4/9) log2(4/9) - (5/9) log2(5/9) = 0.9911

Entropy(S) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

Gain(A) = E(current set) - E(all child sets)

Gain(Hair Length <= 5) = 0.9911 - (4/9 * 0.8113 + 5/9 * 0.9710) = 0.0911
Let us try splitting on Weight

Weight <= 160?  (yes / no)

Entropy(4F, 5M) = -(4/9) log2(4/9) - (5/9) log2(5/9) = 0.9911

Gain(Weight <= 160) = 0.9911 - (5/9 * 0.7219 + 4/9 * 0) = 0.5900
Let us try splitting on Age

Age <= 40?  (yes / no)

Entropy(4F, 5M) = -(4/9) log2(4/9) - (5/9) log2(5/9) = 0.9911

Gain(Age <= 40) = 0.9911 - (6/9 * 1 + 3/9 * 0.9183) = 0.0183
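As a check on the three gains above, here is a small Python script (not part of the original slides) that recomputes them from the nine labeled rows of the table; the column indices and thresholds follow the example.

```python
import math

# The nine labeled rows from the table above (Comic, the unlabeled row, is left out).
# Each tuple: (name, hair length, weight, age, class)
people = [
    ("Homer",  0, 250, 36, "M"), ("Marge", 10, 150, 34, "F"),
    ("Bart",   2,  90, 10, "M"), ("Lisa",   6,  78,  8, "F"),
    ("Maggie", 4,  20,  1, "F"), ("Abe",    1, 170, 70, "M"),
    ("Selma",  8, 160, 41, "F"), ("Otto",  10, 180, 38, "M"),
    ("Krusty", 6, 200, 45, "M"),
]

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(c) / total) * math.log2(labels.count(c) / total)
                for c in set(labels))

def gain(rows, column, threshold):
    # Parent entropy minus the size-weighted entropy of the two branches.
    parent = [r[-1] for r in rows]
    yes = [r[-1] for r in rows if r[column] <= threshold]
    no  = [r[-1] for r in rows if r[column] >  threshold]
    weighted = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(rows)
    return entropy(parent) - weighted

print(f"Hair Length <= 5: {gain(people, 1, 5):.4f}")    # 0.0911
print(f"Weight <= 160:    {gain(people, 2, 160):.4f}")  # 0.5900
print(f"Age <= 40:        {gain(people, 3, 40):.4f}")   # 0.0183
```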
Of the 3 features we had, Weight was best. But while people who weigh over 160 are perfectly classified (as males), the under-160 people are not perfectly classified, so we simply recurse!
This time we find that we can split on Hair Length, and we are done!
We don't need to keep the data around, just the test conditions:

Weight <= 160?
  no  -> Male
  yes -> Hair Length <= 2?
           yes -> Male
           no  -> Female

How would these people be classified?
It is trivial to convert Decision Trees to rules.
Rules to Classify Males/Females

If Weight greater than 160, classify as Male
Elseif Hair Length less than or equal to 2, classify as Male
Else classify as Female
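The rules above translate directly into code; a minimal Python sketch (function name and return strings are illustrative):

```python
def classify(weight, hair_length):
    # Direct translation of the three rules above.
    if weight > 160:
        return "Male"
    elif hair_length <= 2:
        return "Male"
    else:
        return "Female"

# The unlabeled "Comic" row from the example table (Weight 290, Hair Length 8):
print(classify(weight=290, hair_length=8))   # Male, since 290 > 160
```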
References
Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning, 1, 81-106.
http://dms.irb.hr/tutorial/tut_dtrees.php
http://www.dcs.napier.ac.uk/~peter/vldb/dm/node11.html
http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/4_dtrees2.html
Professor Sin-Min Lee, SJSU. http://cs.sjsu.edu/~lee/cs157b/cs157b.html
