ID3 Algorithm

The ID3 algorithm constructs a decision tree from a set of examples to classify future samples based on attributes. It uses entropy and information gain to determine the best attributes for decision nodes, with leaf nodes representing class names. The document illustrates the application of ID3 in deciding whether tennis is playable based on various weather factors, calculating gains for attributes like wind, outlook, temperature, and humidity.

ID3 ALGORITHM

Abstract
+ ID3 builds a decision tree from a fixed set of examples.
+ Using this decision tree, future samples are classified.
+ Each example has several attributes and belongs to a class.
+ The leaf nodes of the decision tree contain the class name, whereas a non-leaf node is a decision node.
+ A decision node is an attribute test, with each branch being a possible value of the attribute.
+ ID3 uses information gain to decide which attribute goes into a decision node.

Algorithm
+ Calculate the entropy of every attribute using the data set.
+ Split the set into subsets using the attribute for which entropy is minimum (or, equivalently, information gain is maximum).
+ Make a decision tree node containing that attribute.
+ Recurse on the subsets using the remaining attributes.

Entropy and information gain
+ Entropy is a measure of the randomness in the information being processed.
+ If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided between two classes the entropy is one.
+ Entropy can be calculated as:
  Entropy(S) = Σ [ - p(i) . log2 p(i) ], summed over the classes i.
+ Information gain is based on the decrease in entropy after a data set is split on an attribute.
+ Information gain can be calculated as:
  Gain(S, A) = Entropy(S) - Σ [ p(S_v) . Entropy(S_v) ], summed over the values v of attribute A, where S_v is the subset of S with A = v.

Decision tree for deciding if tennis is playable, using data from the past 14 days
(Table: 14 days of past observations with the attributes Outlook, Temperature, Humidity and Wind, and the play-tennis Decision: 9 Yes and 5 No.)
+ Entropy(Decision) = - p(Yes) . log2 p(Yes) - p(No) . log2 p(No)
+ Entropy(Decision) = - (9/14) . log2(9/14) - (5/14) . log2(5/14) = 0.940

Wind factor on decision
+ The Wind attribute has two labels: Weak and Strong.
+ Gain(Decision, Wind) = Entropy(Decision) - [ p(Wind=Weak) . Entropy(Decision|Wind=Weak) ] - [ p(Wind=Strong) . Entropy(Decision|Wind=Strong) ]
+ We therefore need to calculate Entropy(Decision|Wind=Weak) and Entropy(Decision|Wind=Strong).

Weak wind factor
+ There are 8 instances with weak wind. The decision is No for 2 of them and Yes for 6.
+ Entropy(Decision|Wind=Weak) = - (2/8) . log2(2/8) - (6/8) . log2(6/8) = 0.811

Strong wind factor
+ There are 6 instances with strong wind. The decision is divided into two equal parts.
+ Entropy(Decision|Wind=Strong) = - (3/6) . log2(3/6) - (3/6) . log2(3/6) = 1

Wind factor on decision
+ The information gain can now be calculated as:
  Gain(Decision, Wind) = Entropy(Decision) - [ p(Wind=Weak) . Entropy(Decision|Wind=Weak) ] - [ p(Wind=Strong) . Entropy(Decision|Wind=Strong) ]
  = 0.940 - [ (8/14) . 0.811 ] - [ (6/14) . 1 ] = 0.048

Other factors on decision
On applying a similar calculation to the other columns, we get:
+ Gain(Decision, Outlook) = 0.246
+ Gain(Decision, Temperature) = 0.029
+ Gain(Decision, Humidity) = 0.152
Outlook has the largest gain, so it becomes the root decision node.

Overcast outlook on decision
+ The decision is always Yes when the outlook is Overcast.

Sunny outlook on decision
+ There are 5 instances with a sunny outlook: the decision is No for 3 of them (3/5) and Yes for 2 (2/5).
+ Gain(Outlook=Sunny | Temperature) = 0.570
+ Gain(Outlook=Sunny | Humidity) = 0.970
+ Gain(Outlook=Sunny | Wind) = 0.019
+ The decision is always No when humidity is High.
+ The decision is always Yes when humidity is Normal.

Rain outlook on decision
The information gains for the Rain outlook are:
+ Gain(Outlook=Rain | Temperature) = 0.02
+ Gain(Outlook=Rain | Humidity) = 0.02
+ Gain(Outlook=Rain | Wind) = 0.971
+ The decision is always Yes when the wind is Weak and the outlook is Rain.
+ The decision is always No when the wind is Strong and the outlook is Rain.
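The calculations above can be reproduced with a short program. The sketch below is a minimal Python implementation of the entropy, information-gain and ID3 steps described in these slides, not the slides' own code. The 14-row examples list is the standard play-tennis dataset that the counts in the slides (9 Yes / 5 No, 8 weak-wind and 6 strong-wind days) correspond to; it is filled in here as an assumption, since the original table is not reproduced in this text.

# Minimal ID3 sketch for the play-tennis example.
# Assumption: the 14 rows below are the standard play-tennis dataset; its class
# counts match the slides (9 Yes / 5 No, 8 Weak-wind days, 6 Strong-wind days).
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = sum over classes i of -p(i) * log2 p(i)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(examples, attribute, target="Decision"):
    """Gain(S, A) = Entropy(S) - sum over values v of A of p(S_v) * Entropy(S_v)."""
    total = len(examples)
    gain = entropy([e[target] for e in examples])
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

def id3(examples, attributes, target="Decision"):
    """Build a tree of nested dicts; inner keys are attribute tests, leaves are class names."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:            # homogeneous subset -> leaf node
        return labels[0]
    if not attributes:                   # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    remaining = [a for a in attributes if a != best]
    return {best: {value: id3([e for e in examples if e[best] == value], remaining, target)
                   for value in {e[best] for e in examples}}}

columns = ("Outlook", "Temperature", "Humidity", "Wind", "Decision")
rows = [("Sunny", "Hot", "High", "Weak", "No"),        ("Sunny", "Hot", "High", "Strong", "No"),
        ("Overcast", "Hot", "High", "Weak", "Yes"),    ("Rain", "Mild", "High", "Weak", "Yes"),
        ("Rain", "Cool", "Normal", "Weak", "Yes"),     ("Rain", "Cool", "Normal", "Strong", "No"),
        ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
        ("Sunny", "Cool", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "Normal", "Weak", "Yes"),
        ("Sunny", "Mild", "Normal", "Strong", "Yes"),  ("Overcast", "Mild", "High", "Strong", "Yes"),
        ("Overcast", "Hot", "Normal", "Weak", "Yes"),  ("Rain", "Mild", "High", "Strong", "No")]
examples = [dict(zip(columns, row)) for row in rows]

print(entropy([e["Decision"] for e in examples]))   # 0.940
print(information_gain(examples, "Wind"))           # 0.048
print(information_gain(examples, "Outlook"))        # 0.247 (0.246 with the slides' rounded intermediates)
print(id3(examples, ["Outlook", "Temperature", "Humidity", "Wind"]))

Run as-is, the learned tree roots at Outlook, with the Overcast branch a pure Yes leaf, the Sunny branch split on Humidity, and the Rain branch split on Wind, matching the conclusions in the slides.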
