Assignment 4: Decision Tree - Classification
Consider a data set (take a reasonable number of observations) from the literature or research
papers or some other source to construct a Decision Tree using the ID3 algorithm.
Use Entropy and Information Gain to perform the calculations...
Using the following medical diagnostic data, I will construct a decision tree with the ID3 algorithm.
Sore Throat   Fever   Swollen Glands   Congestion   Headache   Diagnosis
Yes           Yes     Yes              Yes          Yes        Strep throat
No            No      No               Yes          Yes        Allergy
Yes           Yes     No               Yes          No         Cold
Yes           No      Yes              No           No         Strep throat
No            Yes     No               Yes          No         Cold
No            No      No               Yes          No         Allergy
No            No      Yes              No           No         Strep throat
Yes           No      No               Yes          Yes        Allergy
No            Yes     No               Yes          Yes        Cold
Yes           No      No               Yes          Yes        Cold
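To make the hand calculations below easy to verify, here is the same data set encoded in Python (a minimal sketch; the dictionary keys simply mirror the column names in the table above):

```python
# The 10 training records from the table above, encoded as dictionaries.
records = [
    {"Sore Throat": "Yes", "Fever": "Yes", "Swollen Glands": "Yes", "Congestion": "Yes", "Headache": "Yes", "Diagnosis": "Strep throat"},
    {"Sore Throat": "No",  "Fever": "No",  "Swollen Glands": "No",  "Congestion": "Yes", "Headache": "Yes", "Diagnosis": "Allergy"},
    {"Sore Throat": "Yes", "Fever": "Yes", "Swollen Glands": "No",  "Congestion": "Yes", "Headache": "No",  "Diagnosis": "Cold"},
    {"Sore Throat": "Yes", "Fever": "No",  "Swollen Glands": "Yes", "Congestion": "No",  "Headache": "No",  "Diagnosis": "Strep throat"},
    {"Sore Throat": "No",  "Fever": "Yes", "Swollen Glands": "No",  "Congestion": "Yes", "Headache": "No",  "Diagnosis": "Cold"},
    {"Sore Throat": "No",  "Fever": "No",  "Swollen Glands": "No",  "Congestion": "Yes", "Headache": "No",  "Diagnosis": "Allergy"},
    {"Sore Throat": "No",  "Fever": "No",  "Swollen Glands": "Yes", "Congestion": "No",  "Headache": "No",  "Diagnosis": "Strep throat"},
    {"Sore Throat": "Yes", "Fever": "No",  "Swollen Glands": "No",  "Congestion": "Yes", "Headache": "Yes", "Diagnosis": "Allergy"},
    {"Sore Throat": "No",  "Fever": "Yes", "Swollen Glands": "No",  "Congestion": "Yes", "Headache": "Yes", "Diagnosis": "Cold"},
    {"Sore Throat": "Yes", "Fever": "No",  "Swollen Glands": "No",  "Congestion": "Yes", "Headache": "Yes", "Diagnosis": "Cold"},
]
```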
We will use the following formulas for information, entropy, and gain, evaluating the logarithms with a calculator.
Information of a set with class counts s1, s2, s3 (where s = s1 + s2 + s3):
I(s1, s2, s3) = -(s1/s) log2(s1/s) - (s2/s) log2(s2/s) - (s3/s) log2(s3/s)
Entropy of an attribute A (summing over its values v, where S_v is the subset of records with A = v):
E(A) = sum over v of (|S_v|/|S|) * I(class counts in S_v)
Gain(A) = I(s1, s2, s3) - E(A)
Using these formulas, we will build the decision tree.
All logarithms in this example are base 2.
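Before starting the hand calculations, here is a small Python helper for the information formula above (a minimal sketch; the function name info is my own choice, and it uses log base 2):

```python
from math import log2

def info(counts):
    """Information I(s1, ..., sk) = -sum((si/s) * log2(si/s)) over the class counts."""
    s = sum(counts)
    return -sum((c / s) * log2(c / s) for c in counts if c > 0)
```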
First, I have to find the information of the whole data set.
The sample space contains S = ST + A + C = 10 records:
Strep throat = 3
Allergy = 3
Cold = 4
I(S) = -[(3/10) log2(3/10) + (3/10) log2(3/10) + (4/10) log2(4/10)]
     = -[0.3 log2(0.3) + 0.3 log2(0.3) + 0.4 log2(0.4)]
     = -[-0.521 - 0.521 - 0.529]
     = 1.571
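The helper sketched above reproduces this value from the class counts:

```python
print(round(info([3, 3, 4]), 3))  # 1.571, the information of the whole data set
```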
Next, we find the best splitting attribute by computing the gain of each attribute.
(i). Sore Throat
Sore Throat   Strep throat   Allergy   Cold
Yes           2              1         2
No            1              2         2
To find the entropy of Sore Throat, we first compute the information of the "Yes" and "No" subsets.
Info(Yes) = -[(2/5) log2(2/5) + (1/5) log2(1/5) + (2/5) log2(2/5)]
          = -[-0.53 - 0.46 - 0.53]
          = 1.52
Info(No) = -[(1/5) log2(1/5) + (2/5) log2(2/5) + (2/5) log2(2/5)]
         = -[-0.46 - 0.53 - 0.53]
         = 1.52
Now we calculate the entropy of Sore Throat as the weighted sum of the subset informations:
Entropy(Sore Throat) = (5/10)*Info(Yes) + (5/10)*Info(No)
                     = 0.5*1.52 + 0.5*1.52
                     = 0.76 + 0.76
                     = 1.52
Now we calculate the gain:
Gain(A) = I(S) - E(A)
I(S) was already found above: 1.571.
Gain(Sore Throat) = 1.571 - 1.52
                  = 0.05
The gain of the first attribute has been found.
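The same calculation can be automated with two more small helpers built on the records list and info function sketched earlier (the names entropy_of and gain are my own):

```python
from collections import Counter

def entropy_of(records, attribute, target="Diagnosis"):
    """E(A) = sum over values v of A of (|S_v|/|S|) * I(class counts in S_v)."""
    total = len(records)
    result = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r for r in records if r[attribute] == value]
        counts = Counter(r[target] for r in subset)
        result += (len(subset) / total) * info(list(counts.values()))
    return result

def gain(records, attribute, target="Diagnosis"):
    """Gain(A) = I(S) - E(A)."""
    counts = Counter(r[target] for r in records)
    return info(list(counts.values())) - entropy_of(records, attribute, target)

print(round(gain(records, "Sore Throat"), 2))  # about 0.05
```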
Now for the second attribute, Fever.
(ii). Fever

Fever   ST   A   C
Yes     1    0   3
No      2    3   1

I have to find the entropy of Fever.
To do that, I need the information of the "Yes" subset and of the "No" subset.
Entropy(Fever) = (4/10)*Info(Yes) + (6/10)*Info(No)
Info(Yes) = -[(1/4) log2(1/4) + 0*log2(0) + (3/4) log2(3/4)]   (the 0*log2(0) term is taken as 0)
          = -[-0.5 - 0 - 0.311]
          = 0.811
Info(No) = -[(2/6) log2(2/6) + (3/6) log2(3/6) + (1/6) log2(1/6)]
         = -[-0.52 - 0.5 - 0.43]
         = -[-1.45]
         = 1.45
Entropy(Fever) = (4/10)*0.811 + (6/10)*1.45
               = 0.32 + 0.87
               = 1.19
Gain of Fever = I(S) - E(Fever)
              = 1.571 - 1.19
              = 0.38
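The helper gives roughly the same gain for Fever; the small difference from 0.38 comes only from rounding the intermediate values in the hand calculation:

```python
print(round(gain(records, "Fever"), 2))  # about 0.37 without intermediate rounding
```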
(iii). Swollen glands
Swollen Glands   ST   A   C
Yes              3    0   0
No               0    3   4

We have to calculate the entropy of Swollen Glands.
Info(Yes) = -[(3/3) log2(3/3)]
          = 0
Info(No) = -[(3/7) log2(3/7) + (4/7) log2(4/7)]
         = -[-0.53 - 0.46]
         = 0.99
Entropy(Swollen Glands) = (3/10)*0 + (7/10)*0.99
                        = 0.69
Gain of Swollen Glands = 1.571 - 0.69 = 0.88
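Swollen Glands scores highest largely because its "Yes" branch is pure (all three records are Strep throat), so that branch contributes zero entropy. The helper confirms the gain:

```python
print(round(gain(records, "Swollen Glands"), 2))  # about 0.88
```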
(iv) Congestion
Congestion   ST   A   C
Yes          1    3   4
No           2    0   0

Now we calculate the entropy of Congestion.
Info(Yes) = -[(1/8) log2(1/8) + (3/8) log2(3/8) + (4/8) log2(4/8)]
          = -[-0.38 - 0.53 - 0.5]
          = 1.41
Info(No) = -[(2/2) log2(2/2)]
         = 0
Entropy(Congestion) = (8/10)*1.41 + (2/10)*0
                    = 1.128
Gain of Congestion = 1.571 - 1.128
=0.44
(v). Headache

Headache   ST   A   C
Yes        1    2   2
No         2    1   2
Now I have to calculate Entropy of headache.
Info(Yes) = -[(1/5) log2(1/5) + (2/5) log2(2/5) + (2/5) log2(2/5)]
          = -[-0.46 - 0.53 - 0.53]
          = 1.52
Info(No) = -[(2/5) log2(2/5) + (1/5) log2(1/5) + (2/5) log2(2/5)]
         = -[-0.53 - 0.46 - 0.53]
         = 1.52
Entropy(Headache) = (5/10)*1.52 + (5/10)*1.52
                  = 0.76 + 0.76
                  = 1.52
Gain of Headache = 1.571 - 1.52
                 = 0.05
We now have all the gains, which are summarized below.

Attribute        Gain
Sore Throat      0.05
Fever            0.38
Swollen Glands   0.88
Congestion       0.44
Headache         0.05
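The whole table can be reproduced in one loop with the helpers sketched earlier (the exact values differ in the last digit for Fever and Congestion because the hand calculations round intermediate terms):

```python
for attribute in ["Sore Throat", "Fever", "Swollen Glands", "Congestion", "Headache"]:
    print(attribute, round(gain(records, attribute), 2))
# Sore Throat 0.05, Fever 0.37, Swollen Glands 0.88, Congestion 0.45, Headache 0.05
```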
Now we have to create the decision tree.
We choose the attribute with the highest gain, Swollen Glands (gain 0.88), as the root node.
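For completeness, here is a minimal recursive ID3 sketch that starts from that choice and keeps splitting until each subset is pure (it reuses the records list and the gain helper above; build_tree is my own name, not a library function):

```python
def build_tree(records, attributes, target="Diagnosis"):
    """Minimal ID3: return a leaf label for a pure subset, otherwise
    split on the attribute with the highest gain and recurse."""
    classes = {r[target] for r in records}
    if len(classes) == 1:            # pure subset -> leaf node
        return classes.pop()
    if not attributes:               # no attributes left -> majority class
        return Counter(r[target] for r in records).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(records, a, target))
    tree = {best: {}}
    for value in {r[best] for r in records}:
        subset = [r for r in records if r[best] == value]
        tree[best][value] = build_tree(subset, [a for a in attributes if a != best], target)
    return tree

attributes = ["Sore Throat", "Fever", "Swollen Glands", "Congestion", "Headache"]
print(build_tree(records, attributes))  # the root split is on Swollen Glands
```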