Building Classification Models: ID3 and C4.5
Introduction
Basic Definitions
The ID3 Algorithm
Using Gain Ratios
C4.5 Extensions
Pruning Decision Trees and Deriving Rule Sets
Classification Models in the Undergraduate AI Course
References

Introduction
ID3 and C4.5 are algorithms introduced by Quinlan for inducing Classification Models, also called Decision Trees, from data.
We are given a set of records. Each record has the same structure, consisting of a number of attribute/value pairs. One of these attributes represents the category of the record. The problem is to determine a decision tree that, on the basis of answers to questions about the non-category attributes, correctly predicts the value of the category attribute. Usually the category attribute takes only the values {true, false}, or {success, failure}, or something equivalent. In any case, one of its values will mean failure.

For example, we may have the results of measurements taken by experts on some widgets. For each widget we know the value of each measurement and what was decided: whether to pass, scrap, or repair it. That is, we have a record with the measurements as non-categorical attributes and the disposition of the widget as the categorical attribute.

Here is a more detailed example. We are dealing with records reporting on weather conditions for playing golf. The categorical attribute specifies whether or not to Play. The non-categorical attributes are:

ATTRIBUTE   | POSSIBLE VALUES
============+=======================
outlook     | sunny, overcast, rain
------------+-----------------------
temperature | continuous
------------+-----------------------
humidity    | continuous
------------+-----------------------
windy       | true, false
============+=======================

and the training data is:

OUTLOOK | TEMPERATURE | HUMIDITY | WINDY | PLAY
=====================================================
sunny | 85 | 85 | false | Don't Play
sunny | 80 | 90 | true | Don't Play
overcast| 83 | 78 | false | Play
rain | 70 | 96 | false | Play
rain | 68 | 80 | false | Play
rain | 65 | 70 | true | Don't Play
overcast| 64 | 65 | true | Play
sunny | 72 | 95 | false | Don't Play
sunny | 69 | 70 | false | Play
rain | 75 | 80 | false | Play
sunny | 75 | 70 | true | Play
overcast| 72 | 90 | true | Play
overcast| 81 | 75 | false | Play
rain | 71 | 80 | true | Don't Play

Notice that in this example two of the attributes, Temperature and Humidity, have continuous ranges. ID3 does not deal with such cases directly, though below we examine how it can be extended to do so. A decision tree is important not because it summarizes what we know, i.e. the training set, but because we hope it will correctly classify new cases. Thus when building classification models one should have both training data to build the model and test data to verify how well it actually works.

A simpler example from the stock market, involving only discrete ranges, has Profit as the categorical attribute, with values {up, down}. Its non-categorical attributes are:

ATTRIBUTE   | POSSIBLE VALUES
============+=======================
age         | old, midlife, new
------------+-----------------------
competition | no, yes
------------+-----------------------
type        | software, hardware
============+=======================

and the training data is (midlife, software, and hardware are abbreviated as mid, swr, and hwr):

AGE | COMPETITION | TYPE | PROFIT
=========================================
old | yes | swr | down
--------+-------------+---------+--------
old | no | swr | down
--------+-------------+---------+--------
old | no | hwr | down
--------+-------------+---------+--------
mid | yes | swr | down
--------+-------------+---------+--------
mid | yes | hwr | down
--------+-------------+---------+--------
mid | no | hwr | up
--------+-------------+---------+--------
mid | no | swr | up
--------+-------------+---------+--------
new | yes | swr | up
--------+-------------+---------+--------
new | no | hwr | up
--------+-------------+---------+--------
new | no | swr | up
--------+-------------+---------+--------

For a more complex example, here are files that provide records for a series of votes in Congress. The first file describes the structure of the records, the second provides the Training Set, and the third the Test Set.

The basic ideas behind ID3 are that:

In the decision tree, each node corresponds to a non-categorical attribute and each arc to a possible value of that attribute. A leaf of the tree specifies the expected value of the categorical attribute for the records described by the path from the root to that leaf. [This defines what a Decision Tree is.]

Each node of the decision tree should be associated with the non-categorical attribute that is most informative among the attributes not yet considered in the path from the root. [This establishes what a "Good" decision tree is.]

Entropy is used to measure how informative a node is. [This defines what we mean by "Good". By the way, this notion was introduced by Claude Shannon in Information Theory.]

C4.5 is an extension of ID3 that accounts for unavailable values, continuous attribute value ranges, pruning of decision trees, rule derivation, and so on.

Definitions
If there are n equally probable possible messages, then the probability p of each is 1/n and the information conveyed by a message is -log(p) = log(n). [In what follows all logarithms are in base 2.] That is, if there are 16 messages, then log(16) = 4 and we need 4 bits to identify each message.

In general, if we are given a probability distribution P = (p1, p2, .., pn) then the Information conveyed by this distribution, also called the Entropy of P, is:
I(P) = -(p1*log(p1) + p2*log(p2) + .. + pn*log(pn))

For example, if P is (0.5, 0.5) then I(P) is 1; if P is (0.67, 0.33) then I(P) is 0.92; if P is (1, 0) then I(P) is 0. [Note that the more uniform the probability distribution, the greater its information.]
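The entropy formula translates almost literally into code. This is a minimal Python sketch, assuming the distribution is given as a list of probabilities that sums to 1; the name `entropy` is our own. Terms with p = 0 are skipped, following the usual convention that 0*log(0) = 0, which is what makes I(1, 0) come out as 0.

```python
from math import log2


def entropy(probs):
    """I(P) = -(p1*log(p1) + ... + pn*log(pn)), with logarithms in base 2.

    Terms with p == 0 are skipped (the convention 0*log(0) = 0).
    """
    return sum(-p * log2(p) for p in probs if p > 0)


print(entropy([0.5, 0.5]))            # 1.0
print(round(entropy([2/3, 1/3]), 2))  # 0.92
print(entropy([1, 0]))                # 0.0
print(entropy([1/16] * 16))           # 4.0, i.e. log(16) for 16 equally probable messages
```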

If a set T of records is partitioned into disjoint exhaustive classes C1, C2, .., Ck on the basis of the value of the categorical attribute, then the information needed to identify the class of an element of T is Info(T) = I(P), where P is the probability distribution of the partition (C1, C2, .., Ck):
P = (|C1|/|T|, |C2|/|T|, ..., |Ck|/|T|)

In our golfing example, we have Info(T) = I(9/14, 5/14) = 0.94,

and in our stock market example we have Info(T) = I(5/10,5/10) = 1.0.
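Info(T) depends only on how many records fall into each class, so a sketch can work directly from the class column. The function name `info` is illustrative, and the two label lists simply reproduce the class counts of the golf table (9 Play, 5 Don't Play) and the stock market table (5 up, 5 down).

```python
from collections import Counter
from math import log2


def info(labels):
    """Info(T): entropy of the class distribution of the records in T."""
    n = len(labels)
    return sum(-(k / n) * log2(k / n) for k in Counter(labels).values())


# Class columns only, matching the counts in the two training tables.
golf_play = ["Play"] * 9 + ["Don't Play"] * 5
stock_profit = ["up"] * 5 + ["down"] * 5

print(round(info(golf_play), 2))     # 0.94
print(round(info(stock_profit), 2))  # 1.0
```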

If we first partition T on the basis of the value of a non-categorical attribute X into sets T1, T2, .., Tn, then the information needed to identify the class of an element of T becomes the weighted average of the information needed to identify the class of an element of Ti, i.e. the weighted average of Info(Ti):
Info(X,T) = Sum for i from 1 to n of |Ti|/|T| * Info(Ti)

In the case of our golfing example, for the attribute Outlook we have
Info(Outlook,T) = 5/14*I(2/5,3/5) + 4/14*I(4/4,0) + 5/14*I(3/5,2/5)
= 0.694
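The weighted average can be checked mechanically. The sketch below groups the class labels by the value of X and sums |Ti|/|T| * Info(Ti); the names `info_x`, `outlook`, and `play` are our own, and the two lists are the Outlook and Play columns of the golf table in row order (P = Play, D = Don't Play).

```python
from collections import Counter, defaultdict
from math import log2


def info(labels):
    n = len(labels)
    return sum(-(k / n) * log2(k / n) for k in Counter(labels).values())


def info_x(values, labels):
    """Info(X,T): weighted average of Info(Ti) over the subsets Ti induced by X."""
    subsets = defaultdict(list)
    for v, y in zip(values, labels):
        subsets[v].append(y)
    n = len(labels)
    return sum(len(t) / n * info(t) for t in subsets.values())


# Outlook and Play columns of the 14 golf records, in table order.
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
           "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"]
play = ["D", "D", "P", "P", "P", "D", "P", "D", "P", "P", "P", "P", "P", "D"]

print(round(info_x(outlook, play), 3))  # 0.694
```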

Consider the quantity Gain(X,T) defined as

Gain(X,T) = Info(T) - Info(X,T)

This represents the difference between the information needed to identify the class of an element of T and the information needed to identify the class of an element of T after the value of attribute X has been obtained; that is, it is the gain in information due to attribute X.

In our golfing example, for the Outlook attribute the gain is:
Gain(Outlook,T) = Info(T) - Info(Outlook,T) = 0.94 - 0.694 = 0.246.

If we instead consider the attribute Windy, we find that Info(Windy,T) is 0.892 and Gain(Windy,T) is 0.048. Thus Outlook offers a greater informational gain than Windy.
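Both gains can be recomputed from the golf table. This is a sketch under the same assumptions as before (illustrative names, columns copied from the table in row order, P = Play, D = Don't Play); `gain` simply subtracts Info(X,T) from Info(T).

```python
from collections import Counter, defaultdict
from math import log2


def info(labels):
    n = len(labels)
    return sum(-(k / n) * log2(k / n) for k in Counter(labels).values())


def gain(values, labels):
    """Gain(X,T) = Info(T) - Info(X,T)."""
    subsets = defaultdict(list)
    for v, y in zip(values, labels):
        subsets[v].append(y)
    n = len(labels)
    return info(labels) - sum(len(t) / n * info(t) for t in subsets.values())


# Outlook, Windy and Play columns of the golf table, in table order.
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
           "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"]
windy = [False, True, False, False, False, True, True,
         False, False, False, True, True, False, True]
play = ["D", "D", "P", "P", "P", "D", "P", "D", "P", "P", "P", "P", "P", "D"]

print(round(gain(outlook, play), 3))  # 0.247
print(round(gain(windy, play), 3))    # 0.048
```

At full precision the gain for Outlook is about 0.247; the 0.246 in the text comes from subtracting the already rounded values 0.94 and 0.694. The ranking of the two attributes is the same either way.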

We can use this notion of gain to rank attributes and to build decision trees where each node holds the attribute with the greatest gain among the attributes not yet considered in the path from the root.

The intent of this ordering is twofold:

To create small decision trees so that records can be identified after only a few questions.

To match a hoped-for minimality of the process represented by the records being considered (Occam's Razor).

The ID3 Algorithm

The ID3 algorithm is used to build a decision tree, given a set R of non-categorical attributes, the categorical attribute C, and a training set S of records.

function ID3 (R: a set of non-categorical attributes,
              C: the categorical attribute,
              S: a training set) returns a decision tree;
begin
   If S is empty, return a single node with value Failure;
   If S consists of records all with the same value for
      the categorical attribute,
      return a single node with that value;
   If R is empty, then return a single node with as value
      the most frequent of the values of the categorical attribute
      that are found in records of S; [note that then there
      will be errors, that is, records that will be improperly
      classified];
   Let D be the attribute with largest Gain(D,S)
      among attributes in R;
   Let {dj| j=1,2, .., m} be the values of attribute D;
   Let {Sj| j=1,2, .., m} be the subsets of S consisting
      respectively of records with value dj for attribute D;
   Return a tree with root labeled D and arcs labeled
      d1, d2, .., dm going respectively to the trees
      ID3(R-{D}, C, S1), ID3(R-{D}, C, S2), .., ID3(R-{D}, C, Sm);
end ID3;

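In the Golfing example we obtain the following decision tree (with Humidity split at 75, as described under C4.5 Extensions):

                    Outlook
                   /   |   \
                  /    |    \
           sunny /     |     \ rain
                /      |overcast\
               /       |         \
        Humidity      Play       Windy
         /    \                  /   \
   <=75 /      \ >75       true /     \ false
       /        \              /       \
     Play   Don't Play   Don't Play    Play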

In the stock market case the decision tree is:

                  Age
                /  |  \
               /   |   \
           new/    |mid \old
             /     |     \
           Up  Competition  Down
                 /     \
                /       \
             no/         \yes
              /           \
            Up            Down
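The ID3 pseudocode can be turned into a short runnable Python sketch; applied to the stock market table, it reproduces the tree above. The representation is our own assumption, not the document's: records are dicts, and a tree is either a class value (a leaf) or a nested dict {attribute: {value: subtree}}. Branches are created only for values that actually occur in S, so the Failure case is reached only when the initial training set is empty.

```python
from collections import Counter, defaultdict
from math import log2


def info(rows, target):
    n = len(rows)
    counts = Counter(r[target] for r in rows)
    return sum(-(k / n) * log2(k / n) for k in counts.values())


def gain(attr, rows, target):
    subsets = defaultdict(list)
    for r in rows:
        subsets[r[attr]].append(r)
    n = len(rows)
    return info(rows, target) - sum(len(s) / n * info(s, target) for s in subsets.values())


def id3(attrs, target, rows):
    """attrs = R, target = C, rows = S; returns a leaf value or {D: {dj: subtree}}."""
    if not rows:
        return "Failure"
    classes = Counter(r[target] for r in rows)
    if len(classes) == 1:          # all records share one class value
        return next(iter(classes))
    if not attrs:                  # no attributes left: use the majority class
        return classes.most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(a, rows, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3(rest, target, [r for r in rows if r[best] == v])
                   for v in sorted({r[best] for r in rows})}}


columns = ("age", "competition", "type", "profit")
table = [
    ("old", "yes", "swr", "down"), ("old", "no", "swr", "down"),
    ("old", "no", "hwr", "down"), ("mid", "yes", "swr", "down"),
    ("mid", "yes", "hwr", "down"), ("mid", "no", "hwr", "up"),
    ("mid", "no", "swr", "up"), ("new", "yes", "swr", "up"),
    ("new", "no", "hwr", "up"), ("new", "no", "swr", "up"),
]
records = [dict(zip(columns, row)) for row in table]

tree = id3(["age", "competition", "type"], "profit", records)
print(tree)
# {'age': {'mid': {'competition': {'no': 'up', 'yes': 'down'}}, 'new': 'up', 'old': 'down'}}
```

Age wins at the root (gain 0.6), and within the mid subset Competition separates up from down perfectly, while Type gives no gain there.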

Here is the decision tree, just as produced by c4.5, for the voting example introduced earlier.

Using Gain Ratios

The notion of Gain introduced earlier tends to favor attributes that have a large number of values. For example, if we have an attribute D that has a distinct value for each record, then Info(D,T) is 0, thus Gain(D,T) is maximal. To compensate for this, Quinlan suggests using the following ratio instead of Gain:

GainRatio(D,T) = Gain(D,T) / SplitInfo(D,T)

where SplitInfo(D,T) is the information due to the split of T on the basis of the value of the non-categorical attribute D. Thus SplitInfo(D,T) is

I(|T1|/|T|, |T2|/|T|, .., |Tm|/|T|)

where {T1, T2, .. Tm} is the partition of T induced by the value of D.

In the case of our golfing example, SplitInfo(Outlook,T) is

-5/14*log(5/14) - 4/14*log(4/14) - 5/14*log(5/14) = 1.577

thus the GainRatio of Outlook is 0.246/1.577 = 0.156. And SplitInfo(Windy,T) is

-6/14*log(6/14) - 8/14*log(8/14) = 6/14*1.222 + 8/14*0.807 = 0.985

thus the GainRatio of Windy is 0.048/0.985 = 0.049.
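SplitInfo uses only the sizes of the subsets, not their classes, so a sketch can take the subset sizes directly. The names below are illustrative; the gains are recomputed from the per-subset class counts of the golf table (Outlook: sunny 2/3, overcast 4/0, rain 3/2; Windy: true 3/3, false 6/2).

```python
from math import log2


def entropy(probs):
    return sum(-p * log2(p) for p in probs if p > 0)


def split_info(sizes):
    """SplitInfo(D,T) = I(|T1|/|T|, ..., |Tm|/|T|), from the subset sizes alone."""
    n = sum(sizes)
    return entropy([s / n for s in sizes])


info_t = entropy([9/14, 5/14])
gain_outlook = info_t - (5/14 * entropy([2/5, 3/5]) + 4/14 * entropy([1.0])
                         + 5/14 * entropy([3/5, 2/5]))
gain_windy = info_t - (6/14 * entropy([3/6, 3/6]) + 8/14 * entropy([6/8, 2/8]))

print(round(split_info([5, 4, 5]), 3))                 # 1.577
print(round(gain_outlook / split_info([5, 4, 5]), 3))  # 0.156
print(round(split_info([6, 8]), 3))                    # 0.985
print(round(gain_windy / split_info([6, 8]), 3))       # 0.049
```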

You can run PAIL to see how ID3 generates the decision tree [you need to have an X-server and to allow access (xhost) from yoda.cis.temple.edu].

C4.5 Extensions
C4.5 introduces a number of extensions of the original ID3 algorithm.

In building a decision tree, we can deal with training sets that have records with unknown attribute values by evaluating the gain, or the gain ratio, for an attribute by considering only the records where that attribute is defined.

In using a decision tree, we can classify records that have unknown attribute values by estimating the probability of the various possible results. In our golfing example, if we are given a new record for which the outlook is sunny and the humidity is unknown, we proceed as follows:

We move from the Outlook root node to the Humidity node following the arc labeled 'sunny'. At that point, since we do not know the value of Humidity, we observe that if the humidity is at most 75 there are two records, in both of which one plays, and if the humidity is over 75 there are three records, in all of which one does not play. Thus we can answer for this record with the probabilities (0.4, 0.6) to play or not to play.
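The procedure above can be sketched as a recursive lookup that, at a node whose attribute is unknown, follows every arc and weights each result by the fraction of training records that went that way. The tree representation is a hypothetical one chosen for this sketch: a leaf is a `Counter` of the training classes that reached it, an internal node is a pair (attribute, {arc label: subtree}), and humidity is assumed to be already discretized into the arcs "<=75" and ">75". The rain branch is the one ID3 grows from the 14 golf records, where Windy separates the five rain records.

```python
from collections import Counter

golf_tree = ("outlook", {
    "sunny": ("humidity", {"<=75": Counter({"Play": 2}),
                           ">75": Counter({"Don't Play": 3})}),
    "overcast": Counter({"Play": 4}),
    "rain": ("windy", {"true": Counter({"Don't Play": 2}),
                       "false": Counter({"Play": 3})}),
})


def size(node):
    """Number of training records that reached this node."""
    if isinstance(node, Counter):
        return sum(node.values())
    return sum(size(child) for child in node[1].values())


def classify(node, record):
    """Return {class: probability}; an unknown (None or absent) value splits over all arcs."""
    if isinstance(node, Counter):
        total = sum(node.values())
        return {c: k / total for c, k in node.items()}
    attr, arcs = node
    value = record.get(attr)
    if value is not None:
        return classify(arcs[value], record)
    result = Counter()
    for child in arcs.values():
        weight = size(child) / size(node)
        for c, p in classify(child, record).items():
            result[c] += weight * p
    return dict(result)


print(classify(golf_tree, {"outlook": "sunny", "humidity": None}))
# {'Play': 0.4, "Don't Play": 0.6}
```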

We can deal with the case of attributes with continuous ranges as follows. Say that attribute Ci has a continuous range. We examine the values for this attribute in the training set. Say they are, in increasing order, A1, A2, .., Am. Then for each value Aj, j=1,2,..,m-1, we partition the records into those that have Ci values up to and including Aj, and those that have values greater than Aj. For each of these partitions we compute the gain, or gain ratio, and choose the partition that maximizes it.
In our Golfing example, for humidity at the node reached by Outlook = sunny, the best partition separates the records with humidity 70 from those with humidity 85, 90, and 95; placing the threshold at 75, the range for this attribute becomes {<=75, >75}. Notice that this method involves a substantial number of computations.
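A direct sketch of this search, assuming gain (not gain ratio) as the criterion; `best_threshold` is an illustrative name. It tries every distinct value except the largest as a cut "value <= Aj", which is why it costs one full gain computation per candidate. On the five sunny records it reports 70, which induces the same partition as the 75 used in the text, since no sunny record has humidity between 70 and 85.

```python
from collections import Counter
from math import log2


def info(labels):
    n = len(labels)
    return sum(-(k / n) * log2(k / n) for k in Counter(labels).values())


def best_threshold(values, labels):
    """Try 'value <= Aj' for every distinct Aj except the largest and
    return the (Aj, gain) pair with the highest information gain."""
    n, base = len(labels), info(labels)
    best = None
    for a in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= a]
        right = [y for x, y in zip(values, labels) if x > a]
        g = base - (len(left) / n * info(left) + len(right) / n * info(right))
        if best is None or g > best[1]:
            best = (a, g)
    return best


# Humidity and Play for the five sunny records, in table order.
t, g = best_threshold([85, 90, 95, 70, 70], ["D", "D", "D", "P", "P"])
print(t, round(g, 3))  # 70 0.971

# The same search over all 14 records of the training set.
humidity = [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80]
play = ["D", "D", "P", "P", "P", "D", "P", "D", "P", "P", "P", "P", "P", "D"]
t, g = best_threshold(humidity, play)
print(t, round(g, 3))  # 80 0.102
```

Run over all 14 records rather than only the sunny ones, the same search prefers a cut at 80, so the threshold a continuous attribute receives depends on which records reach the node being split.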

Pruning Decision Trees and Deriving Rule Sets

The decision tree built using the training set, because of the way it was built, deals correctly with most of the records in the training set. In fact, in order to do so, it may become quite complex, with long and very uneven paths.

Pruning of the decision tree is done by replacing a whole subtree by a leaf node. The replacement takes place if a decision rule establishes that the expected error rate in the subtree is greater than in the single leaf. For example, if the simple decision tree

           Color
           /   \
       red/     \blue
         /       \
    Success     Failure

is obtained with one training red success record and two training blue failures, and then in the Test set we find three red failures and one blue success, we might consider replacing this subtree by a single Failure node. After replacement we will have only two errors (one on the training set, one on the test set) instead of four (all on the test set).
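The error counts for this example are easy to verify. In this sketch the two classifiers are hand-written functions standing in for the subtree and for the replacement leaf; the names are ours. Each line of output gives training errors, then test errors.

```python
# The Color example: (color, actual class) for training and test records.
training_set = [("red", "Success"), ("blue", "Failure"), ("blue", "Failure")]
test_set = [("red", "Failure")] * 3 + [("blue", "Success")]


def subtree(color):
    """The unpruned tree: red -> Success, blue -> Failure."""
    return "Success" if color == "red" else "Failure"


def leaf(color):
    """The single Failure leaf that would replace it."""
    return "Failure"


def errors(classify, records):
    return sum(classify(color) != actual for color, actual in records)


for name, classify in (("subtree", subtree), ("leaf", leaf)):
    print(name, errors(classify, training_set), errors(classify, test_set))
# subtree 0 4
# leaf 1 1
```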

Winston shows how to use Fisher's exact test to determine if the category attribute is truly dependent on a non-categorical attribute. If it is not, then the non-categorical attribute need not appear in the current path of the decision tree.

Quinlan and Breiman suggest more sophisticated pruning heuristics.

It is easy to derive a rule set from a decision tree: write a rule for each path in the decision tree from the root to a leaf. In that rule the left-hand side is easily built from the labels of the nodes and the labels of the arcs.
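Rule extraction is a depth-first walk that accumulates (attribute, value) conditions along each path. This sketch assumes a hypothetical nested-dict tree representation, {attribute: {arc label: subtree}} with class values at the leaves, and uses the stock market tree from the ID3 section.

```python
def rules(tree, conditions=()):
    """Yield (conditions, class) for every root-to-leaf path.

    A tree is a class value (leaf) or a dict {attribute: {arc label: subtree}}.
    """
    if not isinstance(tree, dict):
        yield list(conditions), tree
        return
    (attr, arcs), = tree.items()   # an internal node has exactly one attribute
    for value, subtree in arcs.items():
        yield from rules(subtree, conditions + ((attr, value),))


stock_tree = {"age": {"new": "up",
                      "mid": {"competition": {"no": "up", "yes": "down"}},
                      "old": "down"}}

for lhs, rhs in rules(stock_tree):
    print("if", " and ".join(f"{a} = {v}" for a, v in lhs), "then profit =", rhs)
# if age = new then profit = up
# if age = mid and competition = no then profit = up
# if age = mid and competition = yes then profit = down
# if age = old then profit = down
```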

The resulting rule set can be simplified:

Let LHS be the left-hand side of a rule. Let LHS' be obtained from LHS by eliminating some of its conditions. We can certainly replace LHS by LHS' in this rule if the subsets of the training set that satisfy LHS and LHS', respectively, are equal.

A rule may be eliminated by using metaconditions such as "if no other rule applies".

You can run the C45 program here [you need to have an X-server and to allow access (xhost) from yoda.cis.temple.edu].

Classification Models in the Undergraduate AI Course

It is easy to find implementations of ID3: for example, a Prolog program by Shoham and a nice PAIL module.

The software for C4.5 can be obtained with Quinlan's book. A wide variety of training and test data is available, some provided by Quinlan, some at specialized sites such as the University of California at Irvine.

Student projects may involve the implementation of these algorithms. It is more interesting for students to collect or find a significant data set, partition it into training and test sets, determine a decision tree, simplify it, determine the corresponding rule set, and simplify the rule set.

The study of methods to evaluate the error performance of a decision tree is probably too advanced for most undergraduate courses.

References
Breiman, Friedman, Olshen, Stone: Classification and Regression Trees
Wadsworth, 1984
A decision science perspective on decision trees.

Quinlan, J.R.: C4.5: Programs for Machine Learning
Morgan Kaufmann, 1993
A very readable, thorough book, with actual usable programs that are available on the internet. Also available are a number of interesting data sets.

Quinlan, J.R.: Simplifying decision trees
International Journal of Man-Machine Studies, 27, 221-234, 1987

Winston, P.H.: Artificial Intelligence, Third Edition
Addison-Wesley, 1992
Excellent introduction to ID3 and its use in building decision trees and, from them, rule sets.
