
Assignment 10: Introduction To Machine Learning Prof. B. Ravindran

This document contains an assignment on machine learning clustering algorithms with 8 multiple choice questions. Key details:
- Question 1 asks about k-means clustering results on a 1D dataset with k=3 initial centers.
- Question 2 asks how many iterations k-means takes to converge for the dataset in Question 1.
- Question 3 asks about determining the centroid of combined clusters using sums and counts of data points.
- Question 4 asks about assumptions of the CURE clustering algorithm's cluster shapes.
- Question 5 asks about the effect of increasing the MinPts parameter in DBSCAN.
- Question 6 asks to visualize a dataset and identify the clustering algorithm that can recover the true clusters.

Assignment 10

Introduction to Machine Learning


Prof. B. Ravindran
1. (2 marks) Consider the following one-dimensional data set: 12, 22, 2, 3, 33, 27, 5, 16, 6, 31, 20, 37, 8 and 18.
Given k = 3 and initial cluster centers 5, 6 and 31, what are the final cluster centers
obtained on applying the k-means algorithm?
(a) 5, 18, 30
(b) 5, 18, 32
(c) 6, 19, 32
(d) 4.8, 17.6, 32
(e) None of the above
Sol. (d)

2. (1 mark) For the previous question, in how many iterations will the k-means algorithm
converge?
(a) 2
(b) 3
(c) 4
(d) 6
(e) 7
Sol. (c)
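The answers to questions 1 and 2 can be checked with a minimal sketch of the standard k-means loop (assign each point to its nearest center, then move each center to the mean of its cluster). The function name `kmeans_1d` and the counting convention are assumptions made here for illustration: the iteration in which the centers stop changing is counted as well.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain k-means on 1-D data; returns (final centers, iterations used)."""
    centers = list(centers)
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            # No center moved in this pass, so the algorithm has converged.
            return centers, iteration
        centers = new_centers
    return centers, max_iter

data = [12, 22, 2, 3, 33, 27, 5, 16, 6, 31, 20, 37, 8, 18]
final_centers, iterations = kmeans_1d(data, [5, 6, 31])
print(final_centers, iterations)  # [4.8, 17.6, 32.0] 4
```

The centers move from (5, 6, 31) to roughly (3.33, 12, 28.33), then (4, 14.8, 30), then (4.8, 17.6, 32); the fourth pass leaves them unchanged.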
3. (1 mark) In the lecture on the BIRCH algorithm, it is stated that using the number of points
N, sum of points SUM and sum of squared points SS, we can determine the centroid and
radius of the combination of any two clusters A and B. How do you determine the centroid of
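the combined cluster? (In terms of N, SUM and SS of both the clusters)
(a) $\mathrm{SUM}_A + \mathrm{SUM}_B$
(b) $\frac{\mathrm{SUM}_A}{N_A} + \frac{\mathrm{SUM}_B}{N_B}$
(c) $\frac{\mathrm{SUM}_A + \mathrm{SUM}_B}{N_A + N_B}$
(d) $\frac{\mathrm{SS}_A + \mathrm{SS}_B}{N_A + N_B}$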

Sol. (c)
Apply the centroid formula to the combined cluster points. It’s simply the sum of all points
divided by the total number of points.
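As a small illustration of why the triple (N, SUM, SS) is enough, the sketch below merges two clusters by adding their triples and then reads the centroid and radius off the merged triple. The helper names (`make_cf`, `merge_cf`, `centroid`, `radius`) and the two example clusters are assumptions made here, and the radius is taken to be the root-mean-square distance of the points from the centroid, restricted to 1-D data.

```python
import math

def make_cf(points):
    """Clustering feature (N, SUM, SS) of a list of 1-D points."""
    return (len(points), sum(points), sum(x * x for x in points))

def merge_cf(cf_a, cf_b):
    """Clustering feature of the union of two disjoint clusters: entries add."""
    return tuple(a + b for a, b in zip(cf_a, cf_b))

def centroid(cf):
    n, total, _ = cf
    return total / n  # (SUM_A + SUM_B) / (N_A + N_B) for a merged cluster

def radius(cf):
    """Root-mean-square distance from the centroid: sqrt(SS/N - (SUM/N)^2)."""
    n, total, ss = cf
    return math.sqrt(max(ss / n - (total / n) ** 2, 0.0))

# Hypothetical example clusters.
cluster_a = [2, 3, 5, 6, 8]
cluster_b = [12, 16, 18, 20, 22]
merged = merge_cf(make_cf(cluster_a), make_cf(cluster_b))
print(centroid(merged), radius(merged))
```

No individual point is needed after the triples are built, which is what lets the merged centroid be computed from the summaries alone.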
4. (1 mark) What assumption does the CURE clustering algorithm make with regard to the
shape of the clusters?

(a) No assumption
(b) Spherical
(c) Elliptical
Sol. (a)
Explanation: CURE does not make any assumption about the shape of the clusters.
5. (1 mark) What would be the effect of increasing MinPts in DBSCAN while retaining the same
Eps parameter? (Note that more than one statement may be correct)
(a) Increase in the sizes of individual clusters
(b) Decrease in the sizes of individual clusters
(c) Increase in the number of clusters
(d) Decrease in the number of clusters
Sol. (b), (c)
Increasing MinPts means that a point needs more neighbors within Eps before it is
included in a cluster. In effect, a larger MinPts looks for denser clusters.
This can break not-so-dense clusters into more than one part, which reduces the
cluster sizes and increases the number of clusters.
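The effect described above can be seen on a toy example. The following is a minimal sketch of DBSCAN for 1-D points, written from the usual definitions (core points have at least MinPts points, themselves included, within Eps; clusters grow outward from core points only). The function names and the data are assumptions chosen for illustration: two dense groups joined by a sparse bridge.

```python
def dbscan_1d(points, eps, min_pts):
    """Minimal DBSCAN for 1-D points. Returns one label per point; -1 is noise."""
    n = len(points)
    # Eps-neighborhood of every point (a point counts as its own neighbor).
    neighbors = [[j for j in range(n) if abs(points[i] - points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    cluster_id = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        # Start a new cluster at an unlabeled core point and expand it.
        labels[i] = cluster_id
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    if is_core[q]:
                        stack.append(q)  # only core points keep expanding
        cluster_id += 1
    return labels

def cluster_sizes(labels):
    """Sorted sizes of the clusters, ignoring noise."""
    sizes = {}
    for label in labels:
        if label != -1:
            sizes[label] = sizes.get(label, 0) + 1
    return sorted(sizes.values())

# Hypothetical data: dense groups at 10..18 and 42..50, sparse bridge at 26 and 34.
points = [10, 12, 14, 16, 18, 26, 34, 42, 44, 46, 48, 50]
for min_pts in (3, 4):
    print(min_pts, cluster_sizes(dbscan_1d(points, eps=8, min_pts=min_pts)))
```

With MinPts = 3 the bridge points are core points and everything forms one cluster of 12 points; with MinPts = 4 they are no longer core, and the data splits into two clusters of 6 points each. This is one data set only; on other data a larger MinPts can also turn a sparse cluster entirely into noise.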

For the next question, kindly download the dataset DS1. The first two columns in the
dataset correspond to the coordinates of each data point. The third column corresponds to
the actual cluster label.
DS1: https://bit.ly/2Lm75Ly
6. (2 marks) Visualize the dataset DS1. Which of the following algorithms will be able to recover
the true clusters? (First check by visual inspection and then write code to see if the result
matches what you expected.)
(a) K-means clustering
(b) Single link hierarchical clustering
(c) Complete link hierarchical clustering
(d) Average link hierarchical clustering
Sol. (b)
The dataset contains spiral clusters. Single link hierarchical clustering can recover spiral
clusters with appropriate parameter settings.
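DS1 itself is not reproduced here, so the sketch below uses a synthetic stand-in: two interleaved spiral arms generated by a hypothetical `two_spirals` helper. Single link clustering is implemented directly from its definition (repeatedly join the two clusters that contain the closest pair of points), using a union-find structure; all names and parameters are assumptions for illustration.

```python
import math

def two_spirals(t_max=8.0, step=0.1):
    """Synthetic stand-in for DS1: two interleaved spiral arms with true labels."""
    points, labels = [], []
    n = int(round((t_max - 1.0) / step)) + 1
    for k in range(n):
        t = 1.0 + k * step
        x, y = t * math.cos(t), t * math.sin(t)
        points += [(x, y), (-x, -y)]  # second arm is the first rotated by 180 degrees
        labels += [0, 1]
    return points, labels

def single_link(points, n_clusters):
    """Single link agglomerative clustering, stopped at n_clusters clusters."""
    parent = list(range(len(points)))

    def find(i):
        # Representative of the cluster containing point i (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process point pairs from closest to farthest; merging the clusters of the
    # closest remaining pair is exactly the single link rule.
    pairs = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(len(points)) for j in range(i + 1, len(points)))
    remaining = len(points)
    for _, i, j in pairs:
        if remaining == n_clusters:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            remaining -= 1
    return [find(i) for i in range(len(points))]

points, truth = two_spirals()
predicted = single_link(points, n_clusters=2)
# The clustering is correct if each predicted cluster contains exactly one arm.
print(len(set(zip(predicted, truth))) == 2)  # True
```

Single link works on this stand-in because neighboring points along an arm are much closer to each other than any point is to the other arm, so each arm is chained together before the two arms are ever joined.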
7. (1 mark) Consider the similarity matrix given below. Which of the following shows the
hierarchy of clusters created by the single link clustering algorithm?

      P1      P2      P3      P4      P5      P6
P1  1.0000  0.7895  0.1579  0.0100  0.5292  0.3542
P2  0.7895  1.0000  0.3684  0.2105  0.7023  0.5480
P3  0.1579  0.3684  1.0000  0.8421  0.5292  0.6870
P4  0.0100  0.2105  0.8421  1.0000  0.3840  0.5573
P5  0.5292  0.7023  0.5292  0.3840  1.0000  0.8105
P6  0.3542  0.5480  0.6870  0.5573  0.8105  1.0000

Sol. (b)

8. (1 mark) For the similarity matrix given in the previous question, which of the following shows
the hierarchy of clusters created by the complete link clustering algorithm?

Sol. (d)
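The merge order for questions 7 and 8 can be computed directly from the similarity matrix and then compared with the candidate hierarchies. The sketch below is a minimal agglomerative procedure written for this purpose (the function name `agglomerate` is an assumption). Because the matrix holds similarities rather than distances, single link scores a pair of clusters by their most similar pair of points (`max`) and complete link by their least similar pair (`min`); in both cases the pair of clusters with the highest score is merged next.

```python
NAMES = ["P1", "P2", "P3", "P4", "P5", "P6"]
SIM = [
    [1.0000, 0.7895, 0.1579, 0.0100, 0.5292, 0.3542],
    [0.7895, 1.0000, 0.3684, 0.2105, 0.7023, 0.5480],
    [0.1579, 0.3684, 1.0000, 0.8421, 0.5292, 0.6870],
    [0.0100, 0.2105, 0.8421, 1.0000, 0.3840, 0.5573],
    [0.5292, 0.7023, 0.5292, 0.3840, 1.0000, 0.8105],
    [0.3542, 0.5480, 0.6870, 0.5573, 0.8105, 1.0000],
]

def agglomerate(sim, names, linkage):
    """Greedy agglomerative clustering on a similarity matrix.

    linkage=max gives single link, linkage=min gives complete link.
    Returns the merges in order as (members of the new cluster, similarity).
    """
    clusters = [(i,) for i in range(len(names))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = linkage(sim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or score > best[0]:
                    best = (score, a, b)
        score, a, b = best
        merged = tuple(sorted(clusters[a] + clusters[b]))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        merges.append((tuple(names[i] for i in merged), score))
    return merges

single = agglomerate(SIM, NAMES, max)
complete = agglomerate(SIM, NAMES, min)
for label, merges in (("single link", single), ("complete link", complete)):
    print(label)
    for members, score in merges:
        print("  ", members, score)
```

Both linkages first form {P3, P4} (0.8421), {P5, P6} (0.8105) and {P1, P2} (0.7895). They differ at the fourth merge: single link joins {P1, P2} with {P5, P6} at 0.7023 and adds {P3, P4} last at 0.6870, while complete link joins {P3, P4} with {P5, P6} at 0.3840 and adds {P1, P2} last at 0.0100.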
