DWM Question Bank Solution
We select A2 (2,6), A7 (5,10), and A15 (6,11) as the centroids of the initial clusters.
Now we will find the Euclidean distance between each point and the
centroids. Based on the minimum distance of each point from the centroids,
we will assign the points to a cluster.
Point        Distance from      Distance from       Distance from       Assigned
             Centroid 1 (2,6)   Centroid 2 (5,10)   Centroid 3 (6,11)   Cluster
A3 (11,11)   10.29563           6.082763            5                   Cluster 3
A9 (10,12)   10                 5.385165            4.123106            Cluster 3
A13 (3,10)   4.123106           2                   3.162278            Cluster 2
A15 (6,11)   6.403124           1.414214            0                   Cluster 3
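The assignment step used for the table above can be sketched in a few lines of Python. This is only an illustration: the function names `euclidean` and `assign` are hypothetical, and only the four points whose rows appear above are included.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def assign(point, centroids):
    """Return (1-based cluster number, list of distances to every centroid)."""
    distances = [euclidean(point, c) for c in centroids]
    return distances.index(min(distances)) + 1, distances

centroids = [(2, 6), (5, 10), (6, 11)]          # initial centroids A2, A7, A15
points = {"A3": (11, 11), "A9": (10, 12), "A13": (3, 10), "A15": (6, 11)}

assignments = {}
for name, p in points.items():
    cluster, distances = assign(p, centroids)
    assignments[name] = cluster
    print(name, p, [round(d, 6) for d in distances], "Cluster", cluster)
```

Each printed row should agree with the corresponding row of the table, since both use the minimum of the three distances to pick the cluster.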
Now, we will calculate the distance of each data point from the new centroids.
Point        Distance from               Distance from         Distance from           Assigned
             Centroid 1 (3.833, 5.167)   Centroid 2 (4, 9.6)   Centroid 3 (9, 11.25)   Cluster
A9 (10,12)   9.204                       6.462                 1.250                   Cluster 3
A11 (9,11)   7.792                       5.192                 0.250                   Cluster 3
A13 (3,10)   4.904                       1.077                 6.129                   Cluster 2
A15 (6,11)   6.223                       2.441                 3.010                   Cluster 2
Now, we will calculate the new centroid for each cluster for the third
iteration.
Now, we will calculate the distance of each data point from the new
centroids.
Point        Distance from   Distance from   Distance from   Assigned
             Centroid 1      Centroid 2      Centroid 3      Cluster
A3 (11,11)   9.485           7.004           1.054           Cluster 3
A9 (10,12)   9.527           6.341           0.667           Cluster 3
A11 (9,11)   8.122           5.063           1.054           Cluster 3
A12 (4,6)    1.400           3.574           8.028           Cluster 1
A13 (3,10)   5.492           1.221           7.126           Cluster 2
A15 (6,11)   6.705           2.343           4.014           Cluster 2
Now, we will calculate the new centroid for each cluster for the third iteration.
Here, you can observe that no point has changed its cluster compared to
the previous iteration. Due to this, the centroids also remain unchanged.
Therefore, we can say that the clusters have stabilized.
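The whole procedure (assign, recompute centroids, repeat until no point changes cluster) can be sketched as below. This is a minimal sketch under stated assumptions: the function name `kmeans` is hypothetical, and the point list contains only the eight data points whose coordinates are visible in this solution, not the full dataset, so the centroids of clusters 1 and 2 will differ from the ones in the tables.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain k-means: stop as soon as an iteration changes no assignment."""
    centroids = list(centroids)          # work on a copy
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break                        # clusters have stabilized
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment, centroids

# Assumed subset: A2, A3, A7, A9, A11, A12, A13, A15
points = [(2, 6), (11, 11), (5, 10), (10, 12), (9, 11), (4, 6), (3, 10), (6, 11)]
assignment, centroids = kmeans(points, [(2, 6), (5, 10), (6, 11)])
print(assignment)
print(centroids)
```

On this subset, A15 (6,11) starts in cluster 3 and later moves to cluster 2, which mirrors the change seen between the first two tables.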
Q.2
Following are two points from the dataset that we have selected as medoids.
M1 = (3, 4)
M2 = (7, 3)
Iteration 1
Now, we will calculate the distance between each data point and the medoids
using the Manhattan distance measure.
Point       Distance from M1 (3, 4)   Distance from M2 (7, 3)   Assigned Cluster
A1 (2, 6)   3                         8                         Cluster 1
A2 (3, 8)   4                         9                         Cluster 1
A3 (4, 7)   4                         7                         Cluster 1
A4 (6, 2)   5                         2                         Cluster 2
A5 (6, 4)   3                         2                         Cluster 2
A6 (7, 3)   5                         0                         Cluster 2
A7 (7, 4)   4                         1                         Cluster 2
A8 (8, 5)   6                         3                         Cluster 2
A9 (7, 6)   6                         3                         Cluster 2
The clusters made with medoids (3, 4) and (7, 3) are as follows.
Points in cluster 1 = {(2, 6), (3, 8), (4, 7), (3, 4)}
Points in cluster 2 = {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}
After assigning clusters, we will calculate the cost for each cluster and find
their sum. The cost is the sum of the distances of all the data points
from the medoid of the cluster they belong to. For the medoids (3, 4) and
(7, 3), the total cost is 3+4+4+2+2+0+1+3+3+0=22.
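The cost calculation can be checked with a short Python sketch. The helper names `manhattan` and `total_cost` are hypothetical. The point list is the nine points A1 to A9 plus (3, 4), which is listed as a member of cluster 1 above.

```python
def manhattan(p, q):
    """Manhattan (city-block) distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def total_cost(points, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(manhattan(p, m) for m in medoids) for p in points)

points = [(2, 6), (3, 8), (4, 7), (6, 2), (6, 4),
          (7, 3), (7, 4), (8, 5), (7, 6), (3, 4)]

cost = total_cost(points, [(3, 4), (7, 3)])
print(cost)  # 22
```

Because every point is assigned to its nearest medoid, taking the minimum distance per point gives the same total as summing the per-cluster costs.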
Iteration 2
Now, let us change the medoid of cluster 2 to (7, 4). Hence, the new medoids are:
M1 = (3, 4)
M2 = (7, 4)
Now, let us calculate the distance between all the data points and the current
medoids.
Point       Distance from M1 (3, 4)   Distance from M2 (7, 4)   Assigned Cluster
A1 (2, 6)   3                         7                         Cluster 1
A2 (3, 8)   4                         8                         Cluster 1
A3 (4, 7)   4                         6                         Cluster 1
A4 (6, 2)   5                         3                         Cluster 2
A5 (6, 4)   3                         1                         Cluster 2
A6 (7, 3)   5                         1                         Cluster 2
A7 (7, 4)   4                         0                         Cluster 2
A8 (8, 5)   6                         2                         Cluster 2
A9 (7, 6)   6                         2                         Cluster 2
The data points haven't changed clusters after changing the medoid.
Hence, the clusters are still:
Points in cluster 1 = {(2, 6), (3, 8), (4, 7), (3, 4)}
Points in cluster 2 = {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}
The total cost this time is 3+4+4+3+1+1+0+2+2+0=20.
Here, the current cost (20) is less than the cost calculated in the previous
iteration (22). Hence, we will make the swap permanent and make (7, 4) the
medoid for cluster 2. If the cost this time had been greater than the previous
cost, i.e. 22, we would have had to revert the change. The new medoids after this
iteration are (3, 4) and (7, 4), with no change in the clusters.
Iteration 3
Now, let us again change the medoid of cluster 2 to (6, 4). Hence, the new
medoids for the clusters are M1=(3, 4) and M2= (6, 4 ).
Let us calculate the distance between the data points and the above medoids
to find the new cluster.
Point       Distance from M1 (3, 4)   Distance from M2 (6, 4)   Assigned Cluster
A1 (2, 6)   3                         6                         Cluster 1
A2 (3, 8)   4                         7                         Cluster 1
A3 (4, 7)   4                         5                         Cluster 1
A4 (6, 2)   5                         2                         Cluster 2
A5 (6, 4)   3                         0                         Cluster 2
A6 (7, 3)   5                         2                         Cluster 2
A7 (7, 4)   4                         1                         Cluster 2
A8 (8, 5)   6                         3                         Cluster 2
A9 (7, 6)   6                         3                         Cluster 2
Now, let us again calculate the cost for each cluster and find their sum. The
total cost this time will be 3+4+4+2+0+2+1+3+3+0=22.
The current cost is 22 which is greater than the cost in the previous iteration
i.e. 20. Hence, we will revert the change and the point (7, 4) will again be
made the medoid for cluster 2.
So, the clusters after this iteration will be cluster 1 = {(2, 6), (3, 8), (4, 7), (3, 4)}
and cluster 2 = {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}. The medoids are (3, 4) and
(7, 4).
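The accept-or-revert rule used in iterations 2 and 3 can be sketched as follows. This is an illustrative sketch, not a full PAM implementation: the helper `try_swap` is a hypothetical name, and the candidate medoids are simply the two tried above rather than an exhaustive search.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def total_cost(points, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(manhattan(p, m) for m in medoids) for p in points)

def try_swap(points, medoids, index, candidate):
    """Replace medoids[index] by candidate only if the total cost decreases."""
    trial = list(medoids)
    trial[index] = candidate
    old_cost = total_cost(points, medoids)
    new_cost = total_cost(points, trial)
    if new_cost < old_cost:
        return trial, new_cost           # keep the swap
    return list(medoids), old_cost       # revert the swap

points = [(2, 6), (3, 8), (4, 7), (6, 2), (6, 4),
          (7, 3), (7, 4), (8, 5), (7, 6), (3, 4)]

medoids = [(3, 4), (7, 3)]                              # cost 22
medoids, cost = try_swap(points, medoids, 1, (7, 4))    # 20 < 22, accepted
medoids, cost = try_swap(points, medoids, 1, (6, 4))    # 22 > 20, reverted
print(medoids, cost)
```

The final state matches the result above: medoids (3, 4) and (7, 4) with a total cost of 20.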
Q.3
To obtain the new distance matrix, we need to remove the entries for 3 and 5 and replace them
with a single entry "35".
This gives us the new distance matrix. The items with the smallest distance get clustered next.
This will be 2 and 4.
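The matrix update can be sketched in Python. The distance values below are an assumed 5-item matrix chosen only for illustration (the original matrix is not reproduced in this section), and `merge` and `closest` are hypothetical helper names. The sketch uses complete linkage, i.e. the distance from a merged cluster to another item is the larger of the two old distances.

```python
from itertools import combinations

def merge(dist, a, b, linkage=max):
    """Return a new matrix in which clusters a and b are replaced by a + b."""
    merged = a + b                                   # e.g. "3" + "5" -> "35"
    others = {k for pair in dist for k in pair} - {a, b}
    new = {frozenset((x, y)): dist[frozenset((x, y))]
           for x, y in combinations(sorted(others), 2)}
    for o in others:
        new[frozenset((merged, o))] = linkage(dist[frozenset((a, o))],
                                              dist[frozenset((b, o))])
    return new

def closest(dist):
    """Pair of clusters with the smallest distance, and that distance."""
    pair = min(dist, key=dist.get)
    return tuple(sorted(pair)), dist[pair]

# Assumed illustrative distances between items 1..5
raw = {("1", "2"): 9, ("1", "3"): 3, ("1", "4"): 6, ("1", "5"): 11,
       ("2", "3"): 7, ("2", "4"): 5, ("2", "5"): 10,
       ("3", "4"): 9, ("3", "5"): 2, ("4", "5"): 8}
d0 = {frozenset(k): v for k, v in raw.items()}

first = closest(d0)              # (('3', '5'), 2)
d1 = merge(d0, "3", "5")         # complete linkage by default
second = closest(d1)             # (('2', '4'), 5)
print(first, second)
```

Passing `linkage=min` instead of the default `max` would turn the same update into single linkage.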
Complete Linkage
Q.4
With single linkage, the distance between two clusters is the minimum distance between
the original objects of the two clusters.
Using the input distance matrix, distance between cluster (D, F) and cluster A is
computed as
The distance between cluster (D, F) and cluster (A, B) is the minimum distance between all
objects involved in the two clusters.
Distance between cluster ((D, F), E) and cluster C yields the minimum distance of 1.41.
This distance is computed as
After that, we merge cluster ((D, F), E) and cluster C into a new cluster named (((D, F),
E), C).
The updated distance matrix is shown in the figure below
Now, if we merge the remaining two clusters, we will get a single cluster that contains
all 6 objects. Thus, our computation is finished. We summarize the results of the
computation as follows:
Using this information, we can now draw the final dendrogram.
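The sequence of merges that a dendrogram records can be reproduced with a small single-linkage sketch. The coordinates below are an assumption: the input distance matrix is not reproduced in this section, so the points were chosen so that the merge order and the 1.41 distance agree with the steps described above. The function name `single_linkage` is hypothetical.

```python
import math
from itertools import combinations

def single_linkage(points):
    """Agglomerative clustering; returns [(members of new cluster, distance)]."""
    clusters = [(name,) for name in points]
    merges = []
    while len(clusters) > 1:
        def gap(pair):
            # Single linkage: minimum distance over all cross-cluster pairs.
            a, b = pair
            return min(math.dist(points[x], points[y]) for x in a for y in b)
        a, b = min(combinations(clusters, 2), key=gap)
        merges.append((tuple(sorted(a + b)), round(gap((a, b)), 2)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Assumed coordinates for the six objects
points = {"A": (1, 1), "B": (1.5, 1.5), "C": (5, 5),
          "D": (3, 4), "E": (4, 4), "F": (3, 3.5)}

merges = single_linkage(points)
for members, distance in merges:
    print(members, distance)
```

Each printed line corresponds to one horizontal join in the dendrogram, with the distance giving the height of the join.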
Module 5
Q.2
The Apriori Algorithm makes the given assumptions
Step 1
Make a frequency table of all the products that appear in all the transactions. Now,
shorten the frequency table to include only those products with a support level
of at least 50 percent. We get the following frequency table.
Product Frequency (Number of transactions)
Rice (R) 4
Pulse(P) 5
Oil(O) 4
Milk(M) 4
Step 2
Create pairs of products such as RP, RO, RM, PO, PM, OM. You will get the following
frequency table.
RP 4
RO 3
RM 2
PO 4
PM 3
OM 2
Step 3
Apply the same threshold support of 50 percent and keep only the pairs
that meet it. In our case, that means a frequency of at least 3.
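Steps 1 to 3 amount to counting itemsets and discarding those below the threshold. The sketch below illustrates this in Python. The transaction list is hypothetical: the original transactions are not reproduced in this section, so six baskets were constructed to reproduce the single-item and pair frequencies in the tables above. The function name `count_itemsets` is also an assumption.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets (R = Rice, P = Pulse, O = Oil, M = Milk)
transactions = [
    {"R", "P", "O", "M"},
    {"R", "P", "O"},
    {"R", "P", "O"},
    {"R", "P", "M"},
    {"P", "O", "M"},
    {"M"},
]
min_count = 3   # 50 percent of 6 transactions

def count_itemsets(transactions, size):
    """Count how many transactions contain each itemset of the given size."""
    counts = Counter()
    for basket in transactions:
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1
    return counts

singles = count_itemsets(transactions, 1)
pairs = count_itemsets(transactions, 2)
frequent_pairs = {p for p, c in pairs.items() if c >= min_count}

print(dict(singles))
print(dict(pairs))
print(sorted(frequent_pairs))
```

Itemsets are stored as alphabetically sorted tuples, so the pair RP appears as `('P', 'R')`.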
Step 4
Now, look for a set of three products that the customers buy together. We get the
given combination.
Step 5
Calculate the frequency of these three-product itemsets, and you will get the following
frequency table.
RPO 4
POM 3
Q.3
Solution:
Q.4
Support threshold=50%, Confidence= 60%
Table 1:
T1 I1,I2,I3
T2 I2,I3,I4
T3 I4,I5
T4 I1,I2,I4
T5 I1,I2,I3,I5
T6 I1,I2,I3,I4
Item Count
I1 4
I2 5
I3 4
I4 4
I5 2
Item Count
I2 5
I1 4
I3 4
I4 4
Build FP Tree
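The counting, filtering, ordering, and tree construction can be sketched as below, using the six transactions of Table 1. This is a minimal sketch: the `Node` class and `show` helper are hypothetical names, the minimum count of 3 is 50% of 6 transactions, and ties in frequency are assumed to be broken by item name, which reproduces the order I2, I1, I3, I4 shown above.

```python
from collections import Counter

transactions = {
    "T1": ["I1", "I2", "I3"],
    "T2": ["I2", "I3", "I4"],
    "T3": ["I4", "I5"],
    "T4": ["I1", "I2", "I4"],
    "T5": ["I1", "I2", "I3", "I5"],
    "T6": ["I1", "I2", "I3", "I4"],
}
min_count = 3   # support threshold of 50% over 6 transactions

# Count items, drop infrequent ones, order by descending count.
counts = Counter(item for items in transactions.values() for item in items)
frequent = {item: c for item, c in counts.items() if c >= min_count}
order = sorted(frequent, key=lambda item: (-frequent[item], item))

class Node:
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}

# Insert every transaction, with its frequent items in the global order.
root = Node(None)
for items in transactions.values():
    node = root
    for item in [i for i in order if i in items]:
        node = node.children.setdefault(item, Node(item))
        node.count += 1

def show(node, depth=0):
    """Print the tree, one node per line, indented by depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

show(root)
```

Transactions that share a prefix in the ordered item list share a path in the tree, which is why the counts along the I2 branch add up to 5.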
Module 6
Q.3
Page Rank formula
PR(A) = (1-β) + β * [PR(B) / Cout(B) + PR(C) / Cout(C)+ ...... +
PR(N) / Cout(N)]
Here, β is the teleportation factor, i.e. 0.8, and Cout(X) is the number of outgoing links of page X.
The number of incoming and outgoing links of each node is considered when calculating the rank.
Let us create a table of the 0th Iteration, 1st Iteration, and 2nd Iteration.
Iteration 0:
For iteration 0, assume that each page has page rank = 1/Total no. of
nodes.
Therefore, PR(A) = PR(B) = PR(C) = PR(D) = PR(E) = PR(F) = 1/6 ≈ 0.167
Iteration 1:
By using the above-mentioned formula
PR(A) = (1-0.8) + 0.8 * [PR(B)/4 + PR(C)/2]
= (1-0.8) + 0.8 * [0.167/4 + 0.167/2]
= 0.3
What we have done here is as follows: for node A, we look at its incoming
links, which come from B and C, so PR(B) and PR(C) appear in the formula.
For each incoming link, we divide the page rank of its source page by the
number of outgoing links of that page, i.e. B has 4 outgoing links and C
has 2 outgoing links. The same procedure applies to the
remaining nodes and iterations.
NOTE: USE THE UPDATED PAGE RANK FOR FURTHER
CALCULATIONS.
Iteration 2:
By using the above-mentioned formula
PR(A) = (1-0.8) + 0.8 * [PR(B)/4 + PR(C)/2]
= (1-0.8) + 0.8 * [(0.32/4) + (0.32/2)]
= 0.392
NOTE: USE THE UPDATED PAGE RANK FOR FURTHER
CALCULATIONS.
PR(B) = (1-0.8) + 0.8 * PR(A)/2
= (1-0.8) + 0.8 * 0.392/2
= 0.3568
PR(C) = (1-0.8) + 0.8 * PR(A)/2
= (1-0.8) + 0.8 * 0.392/2
= 0.3568
PR(D) = (1-0.8) + 0.8 * PR(B)/4
= (1-0.8) + 0.8 * 0.3568/4
= 0.2714
PR(E) = (1-0.8) + 0.8 * PR(B)/4
= (1-0.8) + 0.8 * 0.3568/4
= 0.2714
PR(F) = (1-0.8) + 0.8 * [PR(B)/4 + PR(C)/2]
= (1-0.8) + 0.8 * [(0.3568/4) + (0.3568/2)]
= 0.4141
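So, the final PAGE RANK for the above-given question is:
NODES   ITERATION 0   ITERATION 1   ITERATION 2
A       0.167         0.3           0.392
B       0.167         0.32          0.3568
C       0.167         0.32          0.3568
D       0.167         0.264         0.2714
E       0.167         0.264         0.2714
F       0.167         0.392         0.4141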
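The two iterations can be reproduced with a short Python sketch. The link structure below is inferred from the formulas used in the worked solution (the graph figure is not reproduced here): A receives links from B and C, B and C from A, D and E from B, and F from B and C, with 2, 4, and 2 outgoing links for A, B, and C. The names `in_links`, `out_count`, and `iterate` are hypothetical. Following the note above, each node uses the already updated ranks of the nodes computed before it.

```python
beta = 0.8

# Inferred structure: which pages link to each node, and out-link counts.
in_links = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A"],
    "D": ["B"],
    "E": ["B"],
    "F": ["B", "C"],
}
out_count = {"A": 2, "B": 4, "C": 2}   # only sources that appear in in_links

def iterate(pr):
    """One pass over A..F, updating ranks in place (updated values are reused)."""
    for node in in_links:
        incoming = sum(pr[src] / out_count[src] for src in in_links[node])
        pr[node] = (1 - beta) + beta * incoming
    return pr

pr = {node: 1 / 6 for node in in_links}      # iteration 0
history = [dict(pr)]
for _ in range(2):
    history.append(dict(iterate(pr)))

for i, ranks in enumerate(history):
    print(i, {node: round(value, 4) for node, value in ranks.items()})
```

Updating in place is a design choice that matches the worked solution; a variant that computes every new rank from the previous iteration's values only would give different intermediate numbers.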