4.1 Clustering
AY 2023-2024 SEM-VI
MIT School of Computing
Department of Information Technology
Unit-VI Syllabus
Introduction to Clustering
Clustering (Unsupervised Learning)
Given: examples <x1, x2, …, xn>
Find: A natural clustering (grouping) of the data
Example Applications:
Identify similar energy-use customer profiles
<x> = a time series of energy usage
Clustering is subjective
Similarity is hard to define, but… “We know it when we see it.”
Defining Distance Measures
Slide from Eamonn Keogh
Definition: Let O1 and O2 be two objects from the universe of possible objects. The distance (dissimilarity) between O1 and O2 is a real number denoted by D(O1, O2).
[Figure: whatever the objects are (e.g. the strings “Peter” and “Piotr”), their distance is summarised by a single real number, such as 0.23, 3, or 342.7 depending on the measure.]
Slide based on one by Eamonn Keogh
D(A,B) = D(B,A)
Otherwise you could claim “Alex looks like Bob, but Bob looks nothing like Alex.”
D(A,A) = 0
Otherwise you could claim “Alex looks more like Bob, than Bob does.”
Slide based on one by Eamonn Keogh
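As a small illustration (not from the original slides), the snippet below defines Euclidean distance and checks the two properties just stated. The function name and the sample points are invented for the example.

import math

def euclidean(a, b):
    # Euclidean distance between two equal-length numeric vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

A = (1.0, 2.0)
B = (4.0, 6.0)

print(euclidean(A, B))                       # 5.0
assert euclidean(A, B) == euclidean(B, A)    # symmetry: D(A,B) = D(B,A)
assert euclidean(A, A) == 0.0                # self-similarity: D(A,A) = 0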
Two families of clustering methods: Hierarchical and Partitional
Slide based on one by Eamonn Keogh
Partitional Clustering
• Non-hierarchical: each instance is placed in exactly one of K non-overlapping clusters.
• Since only one set of clusters is output, the user normally has to specify the desired number of clusters K.
Slide based on one by Eamonn Keogh
[Figure: five snapshots of k-means with K = 3. The cluster centres k1, k2, k3 are placed, each point is assigned to its nearest centre, the centres are re-estimated, and the process repeats until the assignments stop changing. The horizontal axis is labelled “expression in condition 1”.]
Slide based on one by Eamonn Keogh
Comments on k-Means
• Strengths
– Relatively efficient: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations; normally k, t << n.
• Weaknesses
– Often terminates at a local optimum rather than the global optimum.
– Applicable only when a mean is defined; what about categorical data?
– The number of clusters k must be specified in advance.
– Sensitive to noisy data and outliers.
– Not suited to discovering clusters with non-convex shapes.
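To make the O(tkn) behaviour concrete, here is a minimal NumPy sketch of the standard (Lloyd's) k-means loop. It is illustrative only: the function name, the random initialisation, and the convergence test are assumptions, and empty clusters are not handled.

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal Lloyd's k-means: X is an (n, d) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # k data points as initial centroids
    for _ in range(n_iters):                                   # at most t iterations
        # assignment step: each point goes to its nearest centroid (O(kn) distances per iteration)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each centroid becomes the mean of its assigned points
        # (this sketch does not handle a cluster becoming empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):              # stop once centroids no longer move
            break
        centroids = new_centroids
    return labels, centroids

# example call on random 2-D data:
# labels, centers = kmeans(np.random.default_rng(1).normal(size=(100, 2)), k=3)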
[Figure: edit-distance example with the names “Peter” and “Piotr”: a sequence of substitutions, insertions and deletions transforms one string into the other, e.g. deleting the “e” turns “Pioter” into “Piotr”.]
Slide based on one by Eamonn Keogh
But should we use Euclidean Distance?
[Figure: a small one-dimensional data set of points spread along the axis from 1 to 10, used to illustrate the k-means objective function.]
Slide based on one by Eamonn Keogh
When k = 2, the objective function is 173.1
Slide based on one by Eamonn Keogh
When k = 3, the objective function is 133.6
Slide based on one by Eamonn Keogh
We can plot the objective function values for k equals 1 to 7…
[Figure: the objective function value (y-axis, from 0 to 1.00E+03) plotted against the number of clusters k (x-axis, 1 to 6).]
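A common way to read such a plot is the “elbow” heuristic: choose the k beyond which the objective stops dropping sharply. The values 173.1 and 133.6 quoted above come from the slides' own example; the sketch below only shows how such a curve could be computed with scikit-learn's KMeans (whose inertia_ attribute is the within-cluster sum of squared distances) on made-up one-dimensional data.

import numpy as np
from sklearn.cluster import KMeans

# toy 1-D data with two obvious groups (values are illustrative only)
X = np.array([[1.0], [1.5], [2.0], [2.5], [8.0], [8.5], [9.0], [9.5]])

for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the k-means objective: sum of squared distances to the nearest centroid
    print(k, round(km.inertia_, 1))

# the objective drops sharply from k = 1 to k = 2 and only slightly afterwards,
# so the "elbow" suggests k = 2 for this data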
High-Dimensional Data Poses Problems for Clustering
• Difficult to find the true clusters
– Irrelevant and redundant features
– In high dimensions pairwise distances become nearly equal, so all points look almost equally close
Animation: https://shabal.in/visuals/kmeans/1.html
• There are various distance measures for computing the similarity between data points or clusters. Some of them are:
• Euclidean distance
• Manhattan distance
• Canberra distance
• Binary distance
• Minkowski distance
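For reference, SciPy's scipy.spatial.distance module implements most of these measures. A small hedged sketch with arbitrary sample vectors; the “binary” distance is taken here to mean the Jaccard distance on 0/1 vectors.

from scipy.spatial import distance

a = [1.0, 2.0, 3.0]
b = [4.0, 0.0, 3.0]

print(distance.euclidean(a, b))               # sqrt((1-4)^2 + (2-0)^2 + 0^2) ≈ 3.606
print(distance.cityblock(a, b))               # Manhattan distance: |1-4| + |2-0| + 0 = 5
print(distance.canberra(a, b))                # Canberra: sum of |a_i - b_i| / (|a_i| + |b_i|)
print(distance.minkowski(a, b, p=3))          # Minkowski of order p (p = 2 gives Euclidean)
print(distance.jaccard([1, 0, 1], [1, 1, 0])) # Jaccard ("binary") distance on 0/1 vectors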
Solved Example
• Apply the classic k-means algorithm (k = 2) to the data (185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77) for two iterations and show the resulting clusters. Initially choose the first two objects as the initial centroids.
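A minimal sketch of the requested computation in plain Python, assuming Euclidean distance and that ties go to the first centroid; the variable names are illustrative.

import math

points = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
centroids = [points[0], points[1]]            # first two objects as initial centroids

def dist(a, b):
    return math.dist(a, b)                    # Euclidean distance

for it in range(1, 3):                        # two iterations, as asked
    # assignment step: each point goes to its nearest centroid (ties to the first)
    clusters = [[], []]
    for p in points:
        j = min(range(2), key=lambda j: dist(p, centroids[j]))
        clusters[j].append(p)
    # update step: each centroid becomes the mean of its assigned points
    # (assumes no cluster becomes empty, which holds for this data)
    centroids = [tuple(sum(c) / len(pts) for c in zip(*pts)) for pts in clusters]
    print(f"iteration {it}: clusters={clusters} centroids={centroids}")

# After both iterations the grouping is {(185,72), (179,68), (182,72), (188,77)}
# and {(170,56), (168,60)}, with centroids (183.5, 72.25) and (169.0, 58.0).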
2. Hierarchical clustering
[Figure: a dendrogram in which one isolated branch is very different from all the others; that point is labelled as an outlier.]
Agglomerative and Divisive Clustering
Agglomerative Clustering Algorithm
1. Form as many clusters as there are data points (i.e. begin with N clusters).
2. Take the two nearest data points and merge them into one cluster (leaving N-1 clusters).
3. Take the two nearest clusters (or points) and merge them (leaving N-2 clusters).
4. Repeat step 3 until only one cluster remains (a small code sketch of these steps follows below).
Slide based on one by Eamonn Keogh
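A small pure-Python sketch of these steps using single linkage, i.e. the distance between two clusters is the distance between their closest members; the function name and the sample points are made up.

import math

def single_link_agglomerative(points):
    """Start with one cluster per point, repeatedly merge the two nearest clusters
    (single linkage) until only one cluster remains; return the merge history."""
    clusters = [[p] for p in points]                      # step 1: N singleton clusters
    merges = []
    while len(clusters) > 1:
        # find the pair of clusters with the smallest single-link distance
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]           # merge cluster j into cluster i
        del clusters[j]                                   # steps 2-4: N-1, N-2, ... clusters
    return merges

for a, b, d in single_link_agglomerative([(1, 1), (1.5, 1), (5, 5), (5, 6), (9, 9)]):
    print(f"merge {a} + {b} at distance {d:.2f}")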
[Figure: a pairwise distance matrix over the objects, holding D(Oi, Oj) for every pair; two highlighted entries read D(·, ·) = 8 and D(·, ·) = 1.]
Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
(This slide and the next four are based on slides by Eamonn Keogh.)
Consider all possible merges… choose the best. This step is repeated at each level until all clusters are fused.
[Figure: the merge step illustrated over five successive levels of the hierarchy.]
[Figure: dendrograms of the same 30 items built with single linkage and with average linkage; the leaf orderings differ.]
The similarity between two objects in a dendrogram is represented as the height of the lowest internal node they share.
Slide based on one by Eamonn Keogh
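In practice such dendrograms are usually produced with SciPy. Single linkage merges the pair of clusters whose closest members are nearest, while average linkage uses the mean of all pairwise distances. A hedged sketch on made-up 2-D data; the variable names are illustrative.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (15, 2)), rng.normal(6, 1, (15, 2))])  # 30 points, two groups

Z_single = linkage(X, method="single")     # merge by the closest pair between clusters
Z_average = linkage(X, method="average")   # merge by the average pairwise distance

# the height of the lowest shared internal node reflects how (dis)similar two items are
dendrogram(Z_average)
plt.show()

# cut the tree to obtain a flat clustering, e.g. into 2 clusters
labels = fcluster(Z_average, t=2, criterion="maxclust")
print(labels)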