Data Mining 3
i    X    Y
A1   2    10
A2   2    5
A3   8    4
A4   5    8
A5   7    5
A6   6    4
A7   1    2
A8   4    9
Step1: Calculate the distance from each point to the initial cluster centers A1 and A2
i    A1 (cluster 1)    A2 (cluster 2)
A1   0                 5
A2   5                 0
A3   8.49              6.08
A4   3.61              4.24
A5   7.07              5
A6   7.21              4.12
A7   8.06              3.16
A8   2.24              4.47
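The distances in the table above can be reproduced with a short script. This is a minimal sketch, assuming Euclidean distance and A1 and A2 as the initial centers; the names `points`, `centers` and `nearest_cluster` are illustrative and do not come from the original solution.

```python
import math

# Coordinates copied from the table of points.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}

# Assumption: A1 seeds cluster 1 and A2 seeds cluster 2.
centers = {1: points["A1"], 2: points["A2"]}

def nearest_cluster(p, centers):
    """Return (cluster id, distances) for point p using Euclidean distance."""
    dists = {cid: math.dist(p, c) for cid, c in centers.items()}
    return min(dists, key=dists.get), dists

assignment = {}
for name, p in points.items():
    cid, dists = nearest_cluster(p, centers)
    assignment[name] = cid
    print(name, round(dists[1], 2), round(dists[2], 2), "-> cluster", cid)
```

Each point goes to the center with the smaller distance, which puts A1, A4 and A8 in cluster 1 and the remaining points in cluster 2.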
New mean of cluster 1 (A1, A4, A8):
X= (2+5+4)/3= 3.67
Y= (10+8+9)/3= 9
New mean of cluster 2 (A2, A3, A5, A6, A7):
X= (2+8+7+6+1)/5= 4.8
Y= (5+4+5+4+2)/5= 4
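The mean calculation can be checked in a few lines. This is a sketch under the assumption that cluster 1 holds A1, A4 and A8 and cluster 2 holds the other five points, as the distance table implies; `cluster_mean` is a hypothetical helper name.

```python
# Coordinates copied from the table of points.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}

# Assumed assignment after the first pass (nearest of A1 and A2).
assignment = {"A1": 1, "A2": 2, "A3": 2, "A4": 1,
              "A5": 2, "A6": 2, "A7": 2, "A8": 1}

def cluster_mean(cid):
    """Average the X and Y coordinates of all points assigned to cluster cid."""
    members = [points[n] for n, c in assignment.items() if c == cid]
    xs, ys = zip(*members)
    return sum(xs) / len(xs), sum(ys) / len(ys)

mean1 = cluster_mean(1)
mean2 = cluster_mean(2)
print(round(mean1[0], 2), mean1[1])  # 3.67 9.0
print(round(mean2[0], 2), mean2[1])  # 4.8 4.0
```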
Step2: Recalculate the distance from each point to the cluster means
i    Cluster 1 (3.67, 9)    Cluster 2 (4.8, 4)
A1   1.94                   6.62
A2   4.33                   2.97
A3   6.62                   3.2
A4   1.67                   4
A5   5.21                   2.42
A6   5.52                   1.2
A7   7.49                   4.29
A8   0.33                   5.06
Assign each point to the cluster with the smaller distance:
i X Y Cluster
A1 2 10 1
A2 2 5 2
A3 8 4 2
A4 5 8 1
A5 7 5 2
A6 6 4 2
A7 1 2 2
A8 4 9 1
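No point changed cluster, so the recalculated means are the same as before and the algorithm stops:
Cluster 1: X= (2+5+4)/3= 3.67, Y= (10+8+9)/3= 9
Cluster 2: X= (2+8+7+6+1)/5= 4.8, Y= (5+4+5+4+2)/5= 4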
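The whole procedure (assign, recompute means, repeat until nothing changes) can be sketched as a small loop. This is an illustrative implementation, not the original author's code; it assumes Euclidean distance, A1 and A2 as starting centers, and a hypothetical function name `kmeans`.

```python
import math

# Coordinates copied from the table of points.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}

def kmeans(points, centers, max_iter=100):
    """Plain k-means: stop when no point changes cluster."""
    centers = list(centers)
    assignment = {}
    for _ in range(max_iter):
        new_assignment = {}
        for name, p in points.items():
            dists = [math.dist(p, c) for c in centers]
            new_assignment[name] = dists.index(min(dists)) + 1  # clusters numbered from 1
        if new_assignment == assignment:
            break  # converged: assignments are stable
        assignment = new_assignment
        for k in range(len(centers)):
            members = [points[n] for n, c in assignment.items() if c == k + 1]
            if members:  # keep the old center if a cluster ends up empty
                centers[k] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centers

assignment, centers = kmeans(points, [points["A1"], points["A2"]])
print(assignment)
print([tuple(round(v, 2) for v in c) for c in centers])
```

With these starting centers the second pass produces the same assignment as the first, so the loop exits with means of about (3.67, 9) and (4.8, 4).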
Q2
E(S)= - PN * log2 PN - PY * log2 PY
= - (7/10) * log2 (7/10) - (3/10) * log2 (3/10)
=0.88
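The value 0.88 is the entropy of a set with 3 Yes and 7 No records. A minimal sketch of that calculation, where `entropy` is a hypothetical helper taking a list of class counts:

```python
import math

def entropy(counts):
    """Shannon entropy in bits for a list of class counts; empty classes contribute 0."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

e_s = entropy([3, 7])   # 3 Yes, 7 No
print(round(e_s, 2))    # 0.88
```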
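Find the Information Gain for each attribute
Attribute: Marital Status
Root: 3 Yes, 7 No
E(Married)= 0

Attribute with values Yes / No:
Root: 3 Yes, 7 No
Yes: 3 No
No: 3 Yes, 4 No
E(YES)= 0
E(NO)= - PN * log2 PN - PY * log2 PY
= - (4/7) * log2 (4/7) - (3/7) * log2 (3/7)
= 0.98
Gain = 0.88 - ((3/10)*0 + (7/10)*0.98)
= 0.19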
Attribute: Taxable Income
Root: 3 Yes, 7 No
>100: 3 No
<=100: 3 Yes, 4 No
E(>100)= 0
E(<=100)= - PN * log2 PN - PY * log2 PY
= - (4/7) * log2 (4/7) - (3/7) * log2 (3/7)
= 0.98
G(S, "Taxable Income") = E(S) - (P(>100)*E(>100) + P(<=100)*E(<=100))
= 0.88 - ((3/10)*0 + (7/10)*0.98)
= 0.19
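The gain for the Taxable Income split can be verified from the branch counts alone. This is a sketch, assuming the >100 branch holds 3 No records and the <=100 branch holds 3 Yes and 4 No; `entropy` and `information_gain` are illustrative helper names.

```python
import math

def entropy(counts):
    """Shannon entropy in bits for a list of class counts; empty classes contribute 0."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, branches):
    """Parent entropy minus the size-weighted entropy of the branches."""
    total = sum(parent)
    weighted = sum(sum(b) / total * entropy(b) for b in branches)
    return entropy(parent) - weighted

# Counts are [Yes, No]: root, then the >100 branch and the <=100 branch.
gain = information_gain([3, 7], [[0, 3], [3, 4]])
print(round(gain, 2))  # 0.19
```

The same function works for any attribute once its per-branch Yes/No counts are known, so the gains can be compared directly before choosing the root.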
From the calculated information gains, select the attribute with the highest value.
So, select Marital Status as the root of the tree, as it has the highest gain.
Marital Status
Branches: Divorced, Married, Single