AIMLB PGP 2024 Session 12
Applications of Cluster Analysis
• Understanding: e.g., grouping stocks with similar price fluctuations (Cabletron-Sys-DOWN, CISCO-DOWN, HP-DOWN, …)
• Summarization: reduce the size of large data sets (e.g., clustering precipitation data for Australia)
Notion of a Cluster can be Ambiguous
[Figure: the same set of points partitioned in several different, equally plausible ways.]
K-means Clustering – Details
• Simple iterative algorithm (a code sketch follows this list):
• Choose initial centroids;
• repeat { assign each point to the nearest centroid; re-compute the cluster centroids }
• until the centroids stop changing.
• Initial centroids are often chosen randomly.
• The clusters produced can therefore vary from one run to another.
• K-means will converge for common proximity measures with an appropriately defined centroid.
• Most of the convergence happens in the first few iterations.
• The stopping condition is therefore often relaxed to ‘until relatively few points change clusters’.
• Complexity is O(n · K · I · d), where n = number of points, K = number of clusters, I = number of iterations, and d = number of attributes.
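A minimal NumPy sketch of the loop above; this is an illustrative implementation, not the course's reference code, and the function name `kmeans` is assumed:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Basic K-means on an (n, d) array X with k clusters."""
    rng = np.random.default_rng(seed)
    # Initial centroids: k distinct points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each point to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-compute each centroid as the mean of its assigned points;
        # keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop when the centroids no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```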
K-means Objective Function
• A common objective function (used with the Euclidean distance measure) is the Sum of Squared Errors (SSE)
• For each point, the error is the distance to the nearest
cluster center
• To get SSE, we square these errors and sum them.
SSE = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}^2(m_i, x)

where C_i is the i-th cluster and m_i is its centroid.
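As a small illustration, the SSE above can be computed from the output of the `kmeans` sketch earlier (the helper name `sse` is assumed):

```python
import numpy as np

def sse(X, centroids, labels):
    # Squared Euclidean distance from each point to its own cluster centroid,
    # summed over all points: the SSE objective defined above.
    diffs = X - centroids[labels]
    return float((diffs ** 2).sum())
```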
[Figure: 'Original Points' alongside several K-means clustering results.]
Limitations of K-means
• K-means has difficulty with clusters of differing sizes, densities, or non-globular shapes, and is sensitive to outliers.
[Figure: example dendrogram over points 1…6, with merge heights on the vertical axis.]
Strengths of Hierarchical Clustering
• No need to commit to a particular number of clusters in advance: any desired number can be obtained by cutting the dendrogram at the right level.
• The hierarchies produced may correspond to meaningful taxonomies.
Agglomerative Clustering Algorithm
• Basic algorithm (a code sketch follows the steps):
1. Compute the proximity matrix
2. Let each data point be a cluster
3. Repeat
4. Merge the two closest clusters
5. Update the proximity matrix
6. Until only a single cluster remains
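A naive sketch of these steps in NumPy (illustrative only: the function name is assumed, and linkage distances are recomputed from the raw points instead of updating an explicit proximity matrix, which is simpler but slower):

```python
import numpy as np

def agglomerative(X, linkage="single"):
    """Naive agglomerative clustering; returns the sequence of merges."""
    # Each data point starts as its own cluster (list of point indices).
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        # Find the two closest clusters under the chosen linkage.
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.linalg.norm(
                    X[clusters[a]][:, None, :] - X[clusters[b]][None, :, :], axis=2
                )
                dist = (d.min() if linkage == "single"
                        else d.max() if linkage == "complete"
                        else d.mean())  # group average otherwise
                if dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        # Merge the two closest clusters and record the merge.
        merges.append((clusters[a], clusters[b], dist))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```

Each element of `merges` records the two clusters joined and the linkage distance at which they merged, i.e. the information a dendrogram displays.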
[Figure: starting situation: each point p1…p12 is its own cluster, and the proximity matrix holds all pairwise distances.]
Intermediate Situation
[Figure: intermediate situation: clusters C1…C5 remain, with their proximity matrix.]
Step 4
• We want to merge the two closest clusters (C2 and
C5) and update the proximity matrix.
[Figure: the proximity matrix over C1…C5, with the closest pair C2 and C5 highlighted.]
Step 5
• After merging C2 and C5, the question is how to update the proximity matrix (a sketch follows the figure).
[Figure: the merged cluster 'C2 U C5' in the proximity matrix, with its new entries marked '?'.]
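For MIN and MAX linkage the update has a closed form that needs only the old matrix rows (special cases of the Lance-Williams recurrence); a hedged sketch, with the helper name `merged_row` assumed:

```python
import numpy as np

def merged_row(D, a, b, linkage="single"):
    # New proximities from the merged cluster (a U b) to every other cluster:
    # single linkage takes the elementwise min of the two old rows,
    # complete linkage the elementwise max.
    row = np.minimum(D[a], D[b]) if linkage == "single" else np.maximum(D[a], D[b])
    return np.delete(row, [a, b])  # drop the entries for a and b themselves
```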
How to Define Inter-Cluster Similarity
[Figure: two clusters of points p1…p5 and their proximity matrix; which entries define the similarity between the clusters?]
• MIN (single linkage)
• MAX (complete linkage)
• Group average
• Distance between centroids
• Other methods driven by an objective function
– Ward’s Method uses squared error
MIN, MAX, and Group Average
• MIN
• Can handle non-elliptical shapes
• Sensitive to noise and outliers
• MAX
• Less susceptible to noise and outliers
• Tends to break large clusters and is biased towards globular clusters
• Group average
• Compromise between MIN and MAX
• Less susceptible to noise and outliers
• Biased towards globular clusters
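In practice these linkage schemes are available in standard libraries; a minimal example using SciPy's `scipy.cluster.hierarchy` (assuming SciPy is installed):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(20, 2))    # toy data
for method in ("single", "complete", "average", "ward"):
    Z = linkage(X, method=method)                    # (n-1, 4) merge history
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 clusters
    print(method, labels)
```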
Hierarchical Clustering: Comparison
[Figure: the same six points clustered by MIN, MAX, Group Average, and Ward's Method, each producing a different hierarchy.]
Hierarchical Clustering: Time and Space Requirements
• O(N²) space, since the proximity matrix must be stored (N = number of points).
• O(N³) time for a straightforward implementation: there are N merge steps, and at each step the proximity matrix must be searched and updated.
[Figure: SSE versus the number of clusters K.]
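A hedged sketch of producing such a curve, reusing the `kmeans` and `sse` sketches from earlier (assumed to be in scope):

```python
import numpy as np

X = np.random.default_rng(1).normal(size=(300, 2))  # toy data
for k in range(1, 11):
    centroids, labels = kmeans(X, k)
    print(k, sse(X, centroids, labels))
# SSE always decreases as K grows; look for the "elbow" where the drop flattens.
```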
Thank You