Assignment 10: Introduction To Machine Learning Prof. B. Ravindran
2. (1 mark) For the previous question, in how many iterations will the k-means algorithm
converge?
(a) 2
(b) 3
(c) 4
(d) 6
(e) 7
Sol. (c)
3. (1 mark) In the lecture on the BIRCH algorithm, it is stated that using the number of points
N, sum of points SUM and sum of squared points SS, we can determine the centroid and
radius of the combination of any two clusters A and B. How do you determine the centroid of
the combined cluster? (In terms of N, SUM and SS of both the clusters)
(a) $SUM_A + SUM_B$
(b) $\frac{SUM_A}{N_A} + \frac{SUM_B}{N_B}$
(c) $\frac{SUM_A + SUM_B}{N_A + N_B}$
(d) $\frac{SS_A + SS_B}{N_A + N_B}$
Sol. (c)
Apply the centroid formula to the combined cluster points. It’s simply the sum of all points
divided by the total number of points.
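As a minimal sketch of this calculation, the snippet below builds the (N, SUM, SS) summary for two small clusters, merges them by adding the components, and reads off the centroid. The helper names (`clustering_feature`, `merge`, `centroid`, `radius`) and the sample points are hypothetical, chosen only for illustration. The radius is taken here to be the root-mean-square distance of the points from the centroid, which is an assumption about the definition used in the lecture.

```python
import math

def clustering_feature(points):
    """Summarise a cluster as (N, SUM, SS): count, vector sum, sum of squared norms."""
    n = len(points)
    dim = len(points[0])
    vec_sum = [sum(p[d] for p in points) for d in range(dim)]
    sq_sum = sum(x * x for p in points for x in p)
    return n, vec_sum, sq_sum

def merge(cf_a, cf_b):
    """The summary of the combined cluster is the component-wise sum of the two summaries."""
    n_a, sum_a, ss_a = cf_a
    n_b, sum_b, ss_b = cf_b
    return n_a + n_b, [a + b for a, b in zip(sum_a, sum_b)], ss_a + ss_b

def centroid(cf):
    """Centroid = SUM / N."""
    n, vec_sum, _ = cf
    return [x / n for x in vec_sum]

def radius(cf):
    """Assumed definition: sqrt(SS / N - ||centroid||^2)."""
    n, _, sq_sum = cf
    c = centroid(cf)
    return math.sqrt(max(sq_sum / n - sum(x * x for x in c), 0.0))

# Hypothetical sample clusters.
cluster_a = [(0, 0), (2, 0)]
cluster_b = [(4, 4), (6, 4)]

merged = merge(clustering_feature(cluster_a), clustering_feature(cluster_b))
print(centroid(merged))  # [3.0, 2.0], i.e. (SUM_A + SUM_B) / (N_A + N_B)
print(radius(merged))    # 3.0
```

The merged summary never touches the raw points again, which is why the three quantities are enough to combine clusters.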
4. (1 mark) What assumption does the CURE clustering algorithm make with regard to the
shape of the clusters?
(a) No assumption
(b) Spherical
(c) Elliptical
Sol. (a)
CURE does not make any assumption about the shape of the clusters.
5. (1 mark) What would be the effect of increasing MinPts in DBSCAN while retaining the same
Eps parameter? (Note that more than one statement may be correct)
(a) Increase in the sizes of individual clusters
(b) Decrease in the sizes of individual clusters
(c) Increase in the number of clusters
(d) Decrease in the number of clusters
Sol. (b), (c)
Increasing MinPts requires more points in the Eps-neighborhood of a point before that point
can seed or extend a cluster, so only denser regions form clusters. A not-so-dense cluster can
then break into more than one part, which reduces the cluster sizes and increases the number
of clusters.
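The effect can be illustrated with a small sketch. The code below is a minimal pure-Python DBSCAN written for this note, not the lecture's implementation, and the data set (two dense groups joined by a sparse bridge) is made up for the purpose. The names `region_query`, `dbscan`, `cluster_sizes`, `POINTS` and `EPS` are hypothetical. A point's neighborhood is assumed to include the point itself.

```python
def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points)
            if (x - xi) ** 2 + (y - yi) ** 2 <= eps ** 2]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = noise              # may be claimed later as a border point
            continue
        cluster += 1                       # i is a core point: start a new cluster
        labels[i] = cluster
        stack = list(neighbours)
        while stack:
            j = stack.pop()
            if labels[j] == noise:
                labels[j] = cluster        # border point reached from a core point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            reachable = region_query(points, j, eps)
            if len(reachable) >= min_pts:  # only core points keep expanding
                stack.extend(reachable)
    return labels

def cluster_sizes(labels):
    """Sorted sizes of the clusters, ignoring noise."""
    sizes = {}
    for label in labels:
        if label != -1:
            sizes[label] = sizes.get(label, 0) + 1
    return sorted(sizes.values())

# Made-up data: two dense groups on a line, joined by two sparse bridge points.
POINTS = [(x, 0) for x in (0, 1, 2, 3, 4, 9, 14, 19, 20, 21, 22, 23)]
EPS = 5

for min_pts in (3, 5):
    print(min_pts, cluster_sizes(dbscan(POINTS, EPS, min_pts)))
# 3 [12]
# 5 [6, 6]
```

With MinPts = 3 the bridge points are core points and everything joins into one cluster of 12. With MinPts = 5 they stop being core points, so the single cluster splits into two clusters of 6.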
For the next question, kindly download the dataset DS1. The first two columns in the
dataset correspond to the co-ordinates of each data point. The third column corresponds to
the actual cluster label.
DS1: https://bit.ly/2Lm75Ly
6. (2 marks) Visualize the dataset DS1. Which of the following algorithms will be able to recover
the true clusters? (First check by visual inspection, then write code to see if the result
matches what you expected.)
(a) K-means clustering
(b) Single link hierarchical clustering
(c) Complete link hierarchical clustering
(d) Average link hierarchical clustering
Sol. (b)
The dataset contains spiral clusters. Single link hierarchical clustering can recover spiral
clusters with appropriate parameter settings.
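The following sketch shows why single link suits this shape. It does not load DS1. Instead it generates a synthetic pair of interleaved spirals as a stand-in, so the arm spacing and point density are assumptions and may differ from the real data. Single link is implemented here by merging the closest pairs first (Kruskal-style, with a union-find) until k clusters remain, which gives the same partition as cutting the single-link dendrogram at k clusters. All function names are hypothetical.

```python
import math

def two_spirals(n_per_arm=81, start=2.0, step=0.1):
    """Two interleaved Archimedean spirals (r = theta); arm 1 is arm 0 rotated by 180 degrees."""
    points, truth = [], []
    for i in range(n_per_arm):
        theta = start + step * i
        x, y = theta * math.cos(theta), theta * math.sin(theta)
        points += [(x, y), (-x, -y)]
        truth += [0, 1]
    return points, truth

def single_link(points, k):
    """Single-link clustering into k clusters: merge the closest pairs first."""
    n = len(points)
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = n
    for _, i, j in edges:
        if components <= k:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:               # the edge joins two different clusters
            parent[root_i] = root_j
            components -= 1

    # Relabel the union-find roots as 0, 1, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

points, truth = two_spirals()
labels = single_link(points, k=2)
print(labels == truth)  # True: each arm is recovered as one cluster
```

Neighbouring points along an arm are closer to each other than any point is to the other arm, so chaining nearest neighbours follows each spiral. Centroid-based methods such as k-means cannot represent this shape, because both arms share roughly the same centre.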
7. (1 mark) Consider the similarity matrix given below. Which of the following shows the
hierarchy of clusters created by the single link clustering algorithm?
      P1      P2      P3      P4      P5      P6
P1  1.0000  0.7895  0.1579  0.0100  0.5292  0.3542
P2  0.7895  1.0000  0.3684  0.2105  0.7023  0.5480
P3  0.1579  0.3684  1.0000  0.8421  0.5292  0.6870
P4  0.0100  0.2105  0.8421  1.0000  0.3840  0.5573
P5  0.5292  0.7023  0.5292  0.3840  1.0000  0.8105
P6  0.3542  0.5480  0.6870  0.5573  0.8105  1.0000
Sol. (b)
8. (1 mark) For the similarity matrix given in the previous question, which of the following shows
the hierarchy of clusters created by the complete link clustering algorithm?
Sol. (d)
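The merge order for the two questions on this similarity matrix can be checked with a short sketch. The code below is a naive agglomerative procedure written for this note, with hypothetical names (`agglomerate`, `SIM`, `LABELS`). Because the matrix holds similarities rather than distances, each step merges the pair of clusters with the highest similarity: single link scores a pair of clusters by its most similar members (max), complete link by its least similar members (min).

```python
from itertools import combinations

LABELS = ["P1", "P2", "P3", "P4", "P5", "P6"]
SIM = [
    [1.0000, 0.7895, 0.1579, 0.0100, 0.5292, 0.3542],
    [0.7895, 1.0000, 0.3684, 0.2105, 0.7023, 0.5480],
    [0.1579, 0.3684, 1.0000, 0.8421, 0.5292, 0.6870],
    [0.0100, 0.2105, 0.8421, 1.0000, 0.3840, 0.5573],
    [0.5292, 0.7023, 0.5292, 0.3840, 1.0000, 0.8105],
    [0.3542, 0.5480, 0.6870, 0.5573, 0.8105, 1.0000],
]

def agglomerate(sim, labels, linkage):
    """Return the merges as (members of the new cluster, similarity at the merge)."""
    score = max if linkage == "single" else min
    clusters = [(i,) for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            s = score(sim[i][j] for i in a for j in b)
            if best is None or s > best[0]:
                best = (s, a, b)
        s, a, b = best
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append(([labels[i] for i in merged], s))
    return merges

for linkage in ("single", "complete"):
    print(linkage)
    for members, s in agglomerate(SIM, LABELS, linkage):
        print("  merge", members, "at similarity", s)
```

Both linkages first form {P3, P4}, {P5, P6} and {P1, P2}. Single link then joins {P1, P2} with {P5, P6} at 0.7023 and adds {P3, P4} at 0.6870, while complete link joins {P3, P4} with {P5, P6} at 0.3840 and adds {P1, P2} at 0.0100. These orders can be compared against the dendrograms in the answer options.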