Machine Learning Unit-3.2
• Root Node: Root node is from where the decision tree starts. It
represents the entire dataset, which further gets divided into two
or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further once a leaf node is reached.
• Parent/Child node: The root node of the tree is called the parent
node, and other nodes are called the child nodes.
What is overfitting in a decision tree?
• Unless a stopping criterion is applied, the decision tree algorithm may keep growing indefinitely, splitting on every feature and dividing the data into smaller and smaller partitions until every training example is perfectly classified.
• Pruning reduces the size of the tree so that the model generalizes better and can classify unknown, unlabelled data more accurately.
• In the case of pre-pruning, the tree is stopped from growing further once it reaches a certain number of decision nodes. Hence, this strategy both avoids overfitting and reduces computational cost.
• In the case of post-pruning, on the other hand, the tree is allowed to grow to its full extent. Then, using a pruning criterion such as the error rates at the nodes, the size of the tree is reduced (see the sketch after this list).
• Decision trees can work well with both small and large training data sets.
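To make the two pruning strategies concrete, here is a minimal sketch using scikit-learn (an assumption; the slides name no library, and the iris data set is just a stand-in). max_depth acts as a pre-pruning stopping criterion, while ccp_alpha performs cost-complexity post-pruning on a fully grown tree.

# Minimal pruning sketch, assuming scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop the tree from growing past a fixed number of levels.
pre_pruned = DecisionTreeClassifier(max_depth=3, random_state=0)
pre_pruned.fit(X_train, y_train)

# Post-pruning: grow the full tree, then cut it back with
# cost-complexity pruning (larger ccp_alpha = more aggressive pruning).
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
post_pruned.fit(X_train, y_train)

print("pre-pruned test accuracy :", pre_pruned.score(X_test, y_test))
print("post-pruned test accuracy:", post_pruned.score(X_test, y_test))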
• As a simple example, the entropy of a coin toss is
Entropy = -probability(a)*log2(probability(a)) - probability(b)*log2(probability(b)),
where probability(a) is the probability of getting a head and probability(b) is the probability of getting a tail.
What is "Entropy" and what is its function?
• In machine learning, entropy is a measure of the randomness
in the information being processed.
• It is given by
Entropy(D) = -Σ pi log2(pi),
where the sum runs over the classes and pi is the probability that a tuple in D belongs to class i.
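As a quick illustration (a hypothetical helper, not part of the slides), the same formula can be evaluated in Python from raw class counts:

import math

def entropy(counts):
    # Entropy in bits of a class distribution given as raw counts, e.g. [9, 5].
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([1, 1]))  # a fair coin: maximum entropy of 1.0 bit
print(entropy([4, 0]))  # a pure node: entropy 0.0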
• Consider the buys_computer example: the data set D has 9 yes and 5 no tuples, so Info(D) = -(9/14)log2(9/14) - (5/14)log2(5/14) = 0.940 bits.
• For the age category "youth," there are 2 yes tuples and 3 no tuples.
• For the category "middle aged," there are 4 yes tuples and 0 no tuples.
• For the category "senior," there are 3 yes tuples and 2 no tuples.
• The expected information requirement for age is therefore
Info_age(D) = (5/14)*[-(2/5)log2(2/5) - (3/5)log2(3/5)] + (4/14)*[-(4/4)log2(4/4)] + (5/14)*[-(3/5)log2(3/5) - (2/5)log2(2/5)]
= 0.694
• Hence, the gain in information from such a partitioning would be
Gain(age) = Info(D) - Info_age(D)
= 0.940 - 0.694
= 0.246 bits
• Next, we need to compute the expected information
requirement for the attribute Income.
• For the income category "high," there are 2 yes tuples and 2 no tuples; for "medium," 4 yes tuples and 2 no tuples; and for "low," 3 yes tuples and 1 no tuple.
Info_income(D) = (4/14)*[-(2/4)log2(2/4) - (2/4)log2(2/4)] + (6/14)*[-(4/6)log2(4/6) - (2/6)log2(2/6)] + (4/14)*[-(3/4)log2(3/4) - (1/4)log2(1/4)]
= (4/14)*[0.5*1.0 + 0.5*1.0] + (6/14)*[0.6667*0.5850 + 0.3333*1.5850] + (4/14)*[0.75*0.4150 + 0.25*2.0]
= (4/14)*1.0 + (6/14)*0.918 + (4/14)*0.811
= 0.911
Hence Gain(income) = 0.940 - 0.911 = 0.029 bits.
Info_student(D) = (7/14)*[-(6/7)log2(6/7) - (1/7)log2(1/7)] + (7/14)*[-(3/7)log2(3/7) - (4/7)log2(4/7)]
= (7/14)*[0.8571*0.2224 + 0.1429*2.8074] + (7/14)*[0.4286*1.2224 + 0.5714*0.8074]
= 0.5*(0.1906 + 0.4011) + 0.5*(0.5239 + 0.4613)
= 0.2959 + 0.4926
= 0.788
Gain(student) = 0.940-0.788
= 0.152 bits
• Similarly, we can compute Gain(credit_rating)= 0.048 bits.
• Because age yields the highest information gain, node N is labeled with age, and branches are grown for each of the attribute's values.
• The final decision tree returned by the algorithm was shown earlier
in Figure 8.2.
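The attribute-selection step above can be reproduced with a short script (a sketch; the (yes, no) counts are taken from the worked example, and the entropy helper from earlier is repeated so the snippet runs on its own):

import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# (yes, no) counts for each value of each attribute, from the example above.
partitions = {
    "age":     [(2, 3), (4, 0), (3, 2)],   # youth, middle_aged, senior
    "income":  [(2, 2), (4, 2), (3, 1)],   # high, medium, low
    "student": [(6, 1), (3, 4)],           # yes, no
}
info_D = entropy([9, 5])                    # 0.940 bits for the whole set

for attr, groups in partitions.items():
    # Expected information requirement after splitting on this attribute.
    info_attr = sum((sum(g) / 14) * entropy(g) for g in groups)
    print(attr, "gain =", round(info_D - info_attr, 3))
# Prints gains of about 0.247 (age), 0.029 (income) and 0.152 (student),
# matching the hand calculations up to rounding.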
• Step 1: Calculate the entropy of the target attribute.
Prediction: Clare will play tennis / will not play tennis.
• Probability of playing tennis:
Number of favourable events: 9
Number of total events: 14
Probability = (Number of favourable events) / (Number of total events)
= 9/14
= 0.643
• Now, let us find the probability of not playing tennis.
• Probability of not playing tennis:
Number of favourable events: 5
Number of total events: 14
Probability = (Number of favourable events) / (Number of total events)
= 5/14
= 0.357
• Entropy at source = -(Probability of playing tennis) * log2(Probability of playing tennis) - (Probability of not playing tennis) * log2(Probability of not playing tennis)
• E(S) = -(9/14)log2(9/14) - (5/14)log2(5/14)
= -0.643*log2(0.643) - 0.357*log2(0.357)
= 0.940
So, the entropy of the whole system before we ask our first question is 0.940.
Note: here, and throughout, logarithms are taken to base 2.
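As a quick sanity check, the entropy helper sketched earlier gives the same figure from the raw class counts:

print(round(entropy([9, 5]), 3))  # -> 0.94 bits for the 9-yes / 5-no split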
• From the above data for Outlook we can arrive at the following table easily:

Outlook  | Yes | No | Total
Sunny    |  2  | 3  |  5
Overcast |  4  | 0  |  4
Rain     |  3  | 2  |  5
• First consider the Sunny branch, whose own entropy is E(Sunny) = E(2,3) = -(2/5)log2(2/5) - (3/5)log2(3/5) = 0.971.
• E(Sunny, Humidity) = (3/5)*E(0,3) + (2/5)*E(2,0)
= (3/5)*0 + (2/5)*0
= 0
IG(Sunny, Humidity) = E(Sunny) - E(Sunny, Humidity)
= 0.971 - 0
= 0.971
For Humidity, from the above table we can say that play will occur if humidity is normal and will not occur if it is high. We evaluate the remaining attributes under Sunny in the same way, and then, similarly, find the nodes under the Rain branch.
                 Play Tennis?
Sunny / Temp. | Yes | No | Total
Hot           |  0  |  2 |   2
Mild          |  1  |  1 |   2
Cool          |  1  |  0 |   1
Total         |     |    |   5
E(Sunny, Temperature) = (2/5)*E(0,2) + (2/5)*E(1,1) + (1/5)*E(1,0)
= (2/5)*0 + (2/5)*[-(1/2)log2(1/2) - (1/2)log2(1/2)] + (1/5)*0
= 0 + (2/5)*1.0 + 0 = 0.40
IG(Sunny, Temperature) = 0.971 - 0.40 = 0.571

E(Sunny, Wind) = (2/5)*E(1,1) + (3/5)*E(1,2)
= (2/5)*[-(1/2)log2(1/2) - (1/2)log2(1/2)] + (3/5)*[-(1/3)log2(1/3) - (2/3)log2(2/3)]
= 0.4*1.0 + 0.6*(0.3333*1.5850 + 0.6667*0.5850)
= 0.4 + 0.6*0.9183
= 0.4 + 0.5510
= 0.951
IG(Sunny, Wind) = 0.971 - 0.951 = 0.020

Since IG(Sunny, Humidity) = 0.971 is the largest of the three gains, Humidity is chosen as the splitting attribute under the Sunny branch.
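The same gain computation can be run over the Sunny subset (a sketch; the counts come from the tables above, and the entropy helper is repeated so the snippet is self-contained):

import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# (yes, no) counts per attribute value within the 5 Sunny days.
sunny = {
    "Humidity":    [(0, 3), (2, 0)],           # high, normal
    "Temperature": [(0, 2), (1, 1), (1, 0)],   # hot, mild, cool
    "Wind":        [(1, 1), (1, 2)],           # strong, weak
}
e_sunny = entropy([2, 3])                       # 0.971 bits

for attr, groups in sunny.items():
    conditional = sum((sum(g) / 5) * entropy(g) for g in groups)
    print(attr, "IG =", round(e_sunny - conditional, 3))
# Humidity: 0.971, Temperature: 0.571, Wind: 0.02; Humidity wins.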
Now consider the Rain branch.
E(Rain) = E(3,2) = -(3/5)log2(3/5) - (2/5)log2(2/5)
= 0.6*0.7370 + 0.4*1.3219
= 0.971
                  Play Tennis?
Rain / Humidity | Yes | No | Total
High            |  1  |  1 |   2
Normal          |  2  |  1 |   3
Total           |     |    |   5
E(Rain, Humidity) = (2/5)*E(1,1) + (3/5)*E(2,1)
= (2/5)*[-(1/2)log2(1/2) - (1/2)log2(1/2)] + (3/5)*[-(2/3)log2(2/3) - (1/3)log2(1/3)]
= 0.4*1.0 + 0.6*(0.6667*0.5850 + 0.3333*1.5850)
= 0.4 + 0.6*0.9183
= 0.951
IG(Rain, Humidity) = E(Rain) - E(Rain, Humidity)
= 0.971 - 0.951
= 0.020
Day | Outlook | Wind   | Play Tennis?
D4  | Rain    | Weak   | Yes
D5  | Rain    | Weak   | Yes
D6  | Rain    | Strong | No
D10 | Rain    | Weak   | Yes
D14 | Rain    | Strong | No
From the table above, all three Weak days are Yes and both Strong days are No, so each partition is pure:
E(Rain, Wind) = (3/5)*E(3,0) + (2/5)*E(0,2)
= (3/5)*0 + (2/5)*0
= 0
IG(Rain, Wind) = E(Rain) - E(Rain, Wind)
= 0.971 - 0
= 0.971
Since IG(Rain, Wind) = 0.971 is far greater than IG(Rain, Humidity) = 0.020, Wind is chosen as the splitting attribute under the Rain branch: play occurs when the wind is weak and does not occur when it is strong.
Day | Outlook | Temp | Humidity | Wind   | Class: Play Tennis?
D4  | Rain    | Mild | High     | Weak   | Yes
D5  | Rain    | Cool | Normal   | Weak   | Yes
D6  | Rain    | Cool | Normal   | Strong | No
D10 | Rain    | Mild | Normal   | Weak   | Yes
D14 | Rain    | Mild | High     | Strong | No
ID3 Algorithm
[Figure: partial decision tree after the first split. The root node tests Outlook on samples D1-D14 (9+, 5-); the Overcast branch is already a pure "Yes" leaf, while the Sunny and Rain branches still require a test at their nodes.]
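Putting all the steps together, here is a compact recursive ID3 sketch (a hypothetical Python illustration, not the slides' code) applied to the classic 14-day play-tennis data used above:

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    # Gain = entropy of the node minus the weighted entropy of its partitions.
    gain = entropy(labels)
    for value in set(row[attr] for row in rows):
        subset = [l for row, l in zip(rows, labels) if row[attr] == value]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:                  # pure node: make a leaf
        return labels[0]
    if not attrs:                              # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        sub_rows = [r for r in rows if r[best] == value]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == value]
        rest = [a for a in attrs if a != best]
        tree[best][value] = id3(sub_rows, sub_labels, rest)
    return tree

attrs = ["Outlook", "Temp", "Humidity", "Wind"]
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
rows = [dict(zip(attrs, d[:4])) for d in data]
labels = [d[4] for d in data]
print(id3(rows, labels, attrs))
# Root splits on Outlook; Overcast is a pure "Yes" leaf,
# Sunny splits on Humidity, and Rain splits on Wind.

The printed nested dictionary matches the tree built by hand above: Outlook at the root, Humidity under Sunny, and Wind under Rain.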