Machine Learning Homework 8
Collaboration: None
Acknowledgments: None
Exercise 1.
Solution 1.
[Figure: Data points with three initial centroids. Legend: Initial Centroids, Data Points. Axes: x and y.]
The figure above shows the eight data points, A1 to A8, in 2-D space.
We run one epoch of K-means clustering on these data points. The K-means clustering
algorithm is shown below. Following the algorithm, we take the initial centroids given in the
exercise. Next, we form 3 clusters. To do this, we calculate the Euclidean distance from each
centroid to every non-centroid point. The results are shown below:
Algorithm 1 K-means Clustering Algorithm
1: Select K points as the initial centroids.
2: repeat
3: Form K clusters by assigning all points to the closest centroid.
4: Recompute the centroid of each cluster.
5: until The centroids don’t change
Points   A1      A4     A7
A2       5       3√2    √10
A3       6√2     5      √53
A5       5√2     √13    3√5
A6       2√13    √17    √29
A8       √5      √2     √58
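The distance table can be reproduced with a short script. This is only a sketch: the text does not list the point coordinates, so the values in `points` are an assumption, chosen because they reproduce the tabulated distances. The names `points`, `centroids`, `squared_distance` and `distance_table` are illustrative.

```python
import math

# ASSUMPTION: coordinates are not given in the text; these values are
# consistent with the distances in the table.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
centroids = ["A1", "A4", "A7"]  # initial centroids used in the solution


def squared_distance(p, q):
    """Squared Euclidean distance; kept as an integer to avoid rounding."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def distance_table(points, centroids):
    """Map each non-centroid point to its squared distance to every centroid."""
    table = {}
    for name, p in points.items():
        if name in centroids:
            continue
        table[name] = {c: squared_distance(p, points[c]) for c in centroids}
    return table


table = distance_table(points, centroids)
for name, row in table.items():
    cells = "  ".join(
        f"{c}: sqrt({d2}) = {math.sqrt(d2):.3f}" for c, d2 in row.items()
    )
    print(f"{name}  {cells}")
```

Working with squared distances keeps the arithmetic exact; the square root is taken only for display.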
The table shows that A2 is nearest to A7, while A3, A5, A6, and A8 are nearest to A4. As a
result, we have three clusters:
Cluster1 = [A1]
Cluster2 = [A3, A4, A5, A6, A8]
Cluster3 = [A2, A7]
Finally, we recompute the centroid of each cluster:
[Figure: Result after 1 epoch of K-means. Legend: Cluster 1, Cluster 2, Cluster 3, with the recomputed centroids Centroid1, Centroid2, Centroid3. Axes: x and y.]
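One epoch of K-means (assign each point to its nearest centroid, then recompute each centroid as the mean of its members) can be sketched as follows. The coordinates are again an assumption, since the text does not list them, and the function names `assign` and `recompute` are illustrative.

```python
# ASSUMPTION: coordinates are not given in the text; these values are
# consistent with the distance table in Solution 1.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
initial_centroids = {
    "Cluster1": points["A1"],
    "Cluster2": points["A4"],
    "Cluster3": points["A7"],
}


def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def assign(points, centroids):
    """Assign every point to the cluster whose centroid is closest."""
    clusters = {label: [] for label in centroids}
    for name, p in points.items():
        nearest = min(centroids, key=lambda c: squared_distance(p, centroids[c]))
        clusters[nearest].append(name)
    return clusters


def recompute(points, clusters):
    """New centroid = coordinate-wise mean of the cluster members.

    Assumes no cluster is empty, which holds for this data set.
    """
    new_centroids = {}
    for label, members in clusters.items():
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        new_centroids[label] = (sum(xs) / len(xs), sum(ys) / len(ys))
    return new_centroids


clusters = assign(points, initial_centroids)
new_centroids = recompute(points, clusters)
print(clusters)
print(new_centroids)
```

Under the assumed coordinates, this reproduces the three clusters listed above and gives recomputed centroids of (2, 10), (6, 6) and (1.5, 3.5).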
Exercise 2.
Solution 2.
Following the algorithm, we first determine which points are core points, border points, or
noise points.
A core point must have at least 2 other points within its neighbourhood of radius ϵ = 2.
As a result, the core points are [A3, A5, A6].
There are no border points, since no core point has a non-core point in its neighbourhood.
All points other than the core points are noise points.
We connect the three core points [A3, A5, A6] into one cluster. So with ϵ = 2 and min sample = 2,
DBSCAN discovers only 1 cluster, C = [A3, A5, A6]. The graph below shows this cluster.
[Figure: Result of DBSCAN with ϵ = 2. Legend: Noise Point, Cluster. Axes: x and y.]
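The core, border and noise classification for ϵ = 2 can be checked with the sketch below. It rests on two assumptions that are not stated in the text: the point coordinates (chosen to agree with the distance table in Solution 1), and the reading that a core point needs at least 2 other points at distance less than or equal to ϵ. Some library implementations count the point itself toward the minimum, which would change the result. The name `classify` is illustrative.

```python
# ASSUMPTION: coordinates are not given in the text.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}


def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def classify(points, eps_squared, min_neighbours):
    """Split points into core, border and noise lists.

    ASSUMPTION: a neighbour is any *other* point with distance <= eps;
    squared distances are compared so the test stays exact.
    """
    neighbours = {
        a: [
            b for b in points
            if b != a and squared_distance(points[a], points[b]) <= eps_squared
        ]
        for a in points
    }
    core = [a for a in points if len(neighbours[a]) >= min_neighbours]
    border = [
        a for a in points
        if a not in core and any(b in core for b in neighbours[a])
    ]
    noise = [a for a in points if a not in core and a not in border]
    return core, border, noise


core, border, noise = classify(points, eps_squared=2 ** 2, min_neighbours=2)
print("core:", core)
print("border:", border)
print("noise:", noise)
```

Under these assumptions the core points come out as A3, A5 and A6, with no border points, which agrees with the solution.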
By increasing ϵ to √10, the core point list becomes [A3, A5, A6, A8]. The border point list
is [A1, A4], as these points are in the neighbourhood of A8, a core point. The noise points
are A2 and A7.
We connect the core points A3, A5, and A6, as the distances between them are smaller than √10,
so these three points form a cluster. No other core point lies within √10 of A8, so A8 forms
a cluster by itself.
The two border points A1 and A4 are in the neighbourhood of A8, so we assign them to the
cluster created by A8. A2 and A7 are noise points, so they are not assigned to any cluster.
To conclude, DBSCAN with ϵ = √10 and min sample = 2 discovers two clusters: [A3, A5, A6]
and [A1, A4, A8].
[Figure: Result of DBSCAN with ϵ = √10. Axes: x and y.]
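The cluster-forming step (connect core points that are neighbours, then attach border points to the cluster of a neighbouring core point) can be sketched as below for ϵ = √10. As before, the coordinates and the "at least 2 other points within distance ϵ" reading of min sample are assumptions, and `dbscan` is an illustrative name rather than a library function.

```python
# ASSUMPTION: coordinates are not given in the text.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}


def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def dbscan(points, eps_squared, min_neighbours):
    """Return (clusters, noise) using a simple expand-from-core search."""
    names = list(points)
    neighbours = {
        a: [
            b for b in names
            if b != a and squared_distance(points[a], points[b]) <= eps_squared
        ]
        for a in names
    }
    core = {a for a in names if len(neighbours[a]) >= min_neighbours}

    labels = {}
    cluster_id = 0
    for a in names:
        if a not in core or a in labels:
            continue
        # Start a new cluster from an unlabelled core point.
        cluster_id += 1
        labels[a] = cluster_id
        frontier = [a]
        while frontier:
            current = frontier.pop()
            for b in neighbours[current]:
                if b not in labels:
                    labels[b] = cluster_id
                    # Only core points keep expanding the cluster;
                    # border points are attached but not expanded.
                    if b in core:
                        frontier.append(b)

    clusters = {}
    for name in names:
        if name in labels:
            clusters.setdefault(labels[name], []).append(name)
    noise = [name for name in names if name not in labels]
    return clusters, noise


# eps = sqrt(10), so eps squared is exactly 10.
clusters, noise = dbscan(points, eps_squared=10, min_neighbours=2)
print("clusters:", clusters)
print("noise:", noise)
```

Under these assumptions the call yields the clusters [A3, A5, A6] and [A1, A4, A8], with A2 and A7 left as noise. Calling the same function with `eps_squared=4` gives the single cluster found for ϵ = 2.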