Problems
Questions
1. Given the following points: 2, 4, 10, 12, 3, 20, 30, 11, 25, and k = 3, use the K-Means algorithm to
compute the clusters and update the means in each iteration. The initial means are:
µ1 = 2, µ2 = 4, µ3 = 6
Show the clusters obtained and the new means after each iteration.
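To check a hand computation for Problem 1, the assign-and-update loop can be sketched in a few lines of Python. This is only a minimal sketch: the function name `kmeans_1d`, the rule that ties go to the lowest-numbered mean, and the rule that an empty cluster keeps its old mean are assumptions, not part of the problem statement. A different tie rule may change the intermediate clusters.

```python
def kmeans_1d(points, means, max_iter=100):
    """Return a list of (clusters, new_means), one entry per iteration."""
    history = []
    means = list(means)
    for _ in range(max_iter):
        clusters = [[] for _ in means]
        for p in points:
            # Nearest mean; min() keeps the first minimum, so ties go to
            # the lowest-numbered mean (assumption).
            j = min(range(len(means)), key=lambda i: abs(p - means[i]))
            clusters[j].append(p)
        # An empty cluster keeps its previous mean (assumption).
        new_means = [sum(c) / len(c) if c else m
                     for c, m in zip(clusters, means)]
        history.append((clusters, new_means))
        if new_means == means:  # means unchanged: converged
            break
        means = new_means
    return history


points = [2, 4, 10, 12, 3, 20, 30, 11, 25]
history = kmeans_1d(points, [2, 4, 6])
for step, (clusters, new_means) in enumerate(history, start=1):
    print(f"iteration {step}: clusters={clusters} means={new_means}")
```

The last entry of `history` is the iteration in which the means stop changing.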
2. Compute the distance matrix D_ij = dist(x_i, x_j), using the Manhattan (L1) distance, given the
data from the following table:
Point   Feature 1   Feature 2
x1       0           0
x2       1           0
x3       2           0
x4      −0.5        −1
x5       0.5        −1
x6       0          −1.5
Then, perform single-linkage, average-linkage and complete-linkage agglomerative clustering and
draw the dendrograms.
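For Problem 2, the distance matrix and the three linkage criteria can be cross-checked with a short Python sketch. The names `manhattan`, `D` and `linkage` are illustrative, and the sketch only evaluates the distance between two given clusters. It does not carry out the merging or draw the dendrograms.

```python
# Coordinates from the table in Problem 2.
data = {
    "x1": (0.0, 0.0),
    "x2": (1.0, 0.0),
    "x3": (2.0, 0.0),
    "x4": (-0.5, -1.0),
    "x5": (0.5, -1.0),
    "x6": (0.0, -1.5),
}


def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(u - v) for u, v in zip(a, b))


names = list(data)
# D[i][j] is the Manhattan distance between point i and point j.
D = [[manhattan(data[a], data[b]) for b in names] for a in names]


def linkage(D, A, B, how):
    """Distance between clusters A and B, given as lists of row indices."""
    pairs = [D[i][j] for i in A for j in B]
    if how == "single":
        return min(pairs)               # closest pair
    if how == "complete":
        return max(pairs)               # farthest pair
    if how == "average":
        return sum(pairs) / len(pairs)  # mean over all pairs
    raise ValueError(f"unknown linkage: {how}")


for name, row in zip(names, D):
    print(name, row)
```

For example, `linkage(D, [0], [1, 2], "single")` compares the cluster {x1} with the cluster {x2, x3}.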
3. Use single-link and complete-link divisive clustering to group the data described by the following
distance matrix. Show the dendrograms.
     A   B   C   D
A    0   1   4   5
B        0   2   6
C            0   3
D                0
4. Use the K-Means algorithm and Euclidean distance to cluster the following 8 examples into 3
clusters:
A1 = (2, 10), A2 = (2, 5), A3 = (8, 4), A4 = (5, 8), A5 = (7, 5), A6 = (6, 4), A7 = (1, 2), A8 = (4, 9)
Suppose the initial seeds (centers of each cluster) are A1, A4, and A7. Run the algorithm for
4 epochs and answer the following for each epoch:
i. The new clusters (i.e., the examples belonging to each cluster).
ii. The centers of the new clusters.
iii. The sum of squared errors (SSE).
iv. How many more iterations are needed to converge? Draw the result for each epoch.
v. In a 10 by 10 space, plot all 8 points and show the clusters after the first epoch and the new
centroids.
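A single epoch of Problem 4 (assignment, centroid update, SSE) can be sketched in Python as below. The function name `epoch` is illustrative. Two choices are assumptions of this sketch rather than part of the problem: the SSE is measured against the updated centroids, and a cluster that receives no points keeps its old center. If your course measures the SSE against the centers used for the assignment, adjust the last step accordingly.

```python
# The eight examples from Problem 4.
examples = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}


def sq_dist(p, q):
    """Squared Euclidean distance (gives the same nearest center as Euclidean)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def epoch(examples, centers):
    """One k-means epoch: returns (clusters, new_centers, sse)."""
    clusters = [[] for _ in centers]
    for name, p in examples.items():
        # Assign each example to its nearest center.
        j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[j].append(name)
    new_centers = []
    for members, old in zip(clusters, centers):
        if members:
            xs = [examples[n][0] for n in members]
            ys = [examples[n][1] for n in members]
            new_centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
        else:
            new_centers.append(old)  # empty cluster keeps its center (assumption)
    # SSE relative to the updated centroids (assumption).
    sse = sum(sq_dist(examples[n], c)
              for members, c in zip(clusters, new_centers)
              for n in members)
    return clusters, new_centers, sse


seeds = [examples["A1"], examples["A4"], examples["A7"]]
clusters, centers, sse = epoch(examples, seeds)
print(clusters, centers, sse)
```

Calling `epoch(examples, centers)` again with the returned centers gives the next epoch.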
5. A binary classification model is evaluated on two datasets with different class ratios. Each dataset
contains 100 samples. The confusion matrices are as follows:
Sample   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
Actual   C1  C1  C1  C2  C2  C2  C3  C3  C3  C1  C2  C3  C1  C4  C3  C4  C4  C4  C4  C4
Pred     C1  C2  C1  C2  C3  C2  C3  C1  C4  C1  C2  C3  C1  C4  C3  C4  C3  C2  C4  C1
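The per-sample labels in the table above can be tallied into a confusion matrix with a short Python sketch. The names `confusion` and `labels` are illustrative, and putting actual classes in rows and predicted classes in columns is an assumed convention.

```python
from collections import Counter

# Labels copied from the Actual and Pred rows of the table.
actual = "C1 C1 C1 C2 C2 C2 C3 C3 C3 C1 C2 C3 C1 C4 C3 C4 C4 C4 C4 C4".split()
pred = "C1 C2 C1 C2 C3 C2 C3 C1 C4 C1 C2 C3 C1 C4 C3 C4 C3 C2 C4 C1".split()

labels = sorted(set(actual) | set(pred))
counts = Counter(zip(actual, pred))

# confusion[i][j]: samples whose actual class is labels[i]
# and whose predicted class is labels[j].
confusion = [[counts[(a, p)] for p in labels] for a in labels]

# Accuracy is the share of samples on the diagonal.
accuracy = sum(counts[(c, c)] for c in labels) / len(actual)

print("      " + "  ".join(labels))
for a, row in zip(labels, confusion):
    print(a, "  ", row)
print("accuracy:", accuracy)
```

Per-class precision and recall follow from the column and row sums of `confusion`.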