CSE3506 - Essentials of Data Analytics: Facilitator: Dr Sathiya Narayanan S
Email: [email protected]
Handphone No.: +91-9944226963
Module 3: Clustering
Introduction to Clustering
K-Means Clustering
K-Medoids Clustering
Hierarchical Clustering
Applications of Clustering
Introduction to Clustering
K-Means Clustering
The cluster centres are updated iteratively as
$$z_i(n+1) = \frac{1}{N_i} \sum_{x \in G_i(n)} x,$$
where $G_i(n)$ is the set of data points assigned to cluster $i$ at iteration $n$ and $N_i$ is the number of points in $G_i(n)$.
Question 3.1
Apply K-means clustering to cluster the following samples/data points:
(0,0), (0,1), (1,0), (3,3), (5,6), (8,9), (9,8) and (9,9).
Fix K = 2 and choose (0,0) and (5,6) as the initial cluster centres.
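A minimal NumPy sketch of the K-means iteration for this question is given below. The data points and initial centres are taken from the question itself; the function and variable names are illustrative, and the sketch assumes no cluster ever becomes empty.

```python
import numpy as np

def kmeans(points, centres, max_iter=100):
    """Plain K-means: alternate nearest-centre assignment and mean update."""
    for _ in range(max_iter):
        # Assign each point to the nearest centre (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update each centre to the mean of its assigned points.
        new_centres = np.array([points[labels == k].mean(axis=0)
                                for k in range(len(centres))])
        if np.allclose(new_centres, centres):
            break  # centres stopped moving: converged
        centres = new_centres
    return centres, labels

points = np.array([(0, 0), (0, 1), (1, 0), (3, 3),
                   (5, 6), (8, 9), (9, 8), (9, 9)], dtype=float)
init = np.array([(0, 0), (5, 6)], dtype=float)  # K = 2, as specified
centres, labels = kmeans(points, init)
print(centres)  # final cluster centres
print(labels)   # cluster index of each point
```

Running this sketch, (3,3) first joins the (5,6) cluster and then moves to the origin cluster, giving final centres (1, 1) and (7.75, 8).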
K-Medoids Clustering
In K-medoids clustering, each cluster is represented by a cluster
medoid, which is one of the data points in the cluster.
The medoid of a cluster is defined as the data point in the cluster
whose average dissimilarity to all the other data points in the cluster
is minimal. Since the medoid is the most centrally located point in the
cluster, the cluster representatives are easier to interpret than the
centroids produced by K-means.
K-medoids clustering can use arbitrary dissimilarity measures, whereas
K-means generally requires Euclidean distance for good
performance. In practice, K-medoids typically uses Manhattan distance
and minimizes the sum of pairwise dissimilarities.
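A minimal alternating K-medoids sketch using Manhattan distance is shown below; it illustrates the idea rather than the full PAM algorithm, and all names in it are assumptions of this sketch.

```python
import numpy as np

def manhattan(a, b):
    # Pairwise Manhattan (L1) distances between rows of a and rows of b.
    return np.abs(a[:, None, :] - b[None, :, :]).sum(axis=2)

def kmedoids(points, medoid_idx, max_iter=100):
    """Alternate two steps: assign each point to its nearest medoid,
    then re-pick each medoid as the cluster member whose total
    dissimilarity to the other members is minimal."""
    medoid_idx = np.asarray(medoid_idx)
    for _ in range(max_iter):
        labels = manhattan(points, points[medoid_idx]).argmin(axis=1)
        new_idx = medoid_idx.copy()
        for k in range(len(medoid_idx)):
            members = np.where(labels == k)[0]
            # Total dissimilarity from each member to all other members.
            costs = manhattan(points[members], points[members]).sum(axis=1)
            new_idx[k] = members[costs.argmin()]
        if np.array_equal(new_idx, medoid_idx):
            break  # medoids stopped changing: converged
        medoid_idx = new_idx
    return points[medoid_idx], labels

points = np.array([(0, 0), (0, 1), (1, 0), (3, 3),
                   (5, 6), (8, 9), (9, 8), (9, 9)], dtype=float)
medoids, labels = kmedoids(points, [0, 4])  # start from (0,0) and (5,6)
```

Unlike the K-means mean update, the representative always remains an actual data point, which is what makes the medoid interpretable.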
As in the case of K-means, the value of K needs to be specified
beforehand. A heuristic approach, the 'silhouette method', can be
used to determine the optimal value of K.
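A sketch of the silhouette method using scikit-learn's silhouette_score is given below. K-means is used here only because scikit-learn ships it; the same K-selection loop applies unchanged to any K-medoids implementation (silhouette_score also accepts metric='manhattan').

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

points = np.array([(0, 0), (0, 1), (1, 0), (3, 3),
                   (5, 6), (8, 9), (9, 8), (9, 9)], dtype=float)

# Silhouette scores lie in [-1, 1]; pick the K with the highest score.
for k in range(2, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    print(k, silhouette_score(points, labels))
```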
Hierarchical Clustering
The dendrogram obtained at the end of hierarchical clustering shows
the hierarchical relationship between the clusters.
In agglomerative hierarchical clustering, each data point starts in its
own cluster and the two most similar clusters are merged at every step.
After each merging step, the similarity matrix must be updated. The
update can be based on (i) the two most similar points across the
clusters (single-linkage), (ii) the two least similar points across the
clusters (complete-linkage), or (iii) the centres of the clusters
(mean or average-linkage). Refer to Figure 2.
The choice of similarity or distance metric and the choice of linkage
criteria are always application-dependent.
Hierarchical clustering can also be performed by initially treating all
data points as one cluster and then successively splitting it. This
approach is called divisive hierarchical clustering.
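The three linkage criteria above map directly onto SciPy's agglomerative routines; the sketch below (toy data, illustrative names) draws one dendrogram per linkage.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

points = np.random.default_rng(0).random((10, 2))  # toy 2-D data

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, method in zip(axes, ["single", "complete", "average"]):
    Z = linkage(points, method=method)  # agglomerative merge history
    dendrogram(Z, ax=ax)                # hierarchy of merges
    ax.set_title(f"{method}-linkage")
plt.show()
```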
Question 3.2
Consider the similarity matrix given below.
Applications of Clustering
Module-3 Summary