Clustering Algorithms
$$d_{\max}(C, C^*) = \max_{x \in C,\; y \in C^*} d(x, y)$$
for all elements x in C and y in C*
• Distance between two clusters is the largest distance between any pair of their elements
(complete-linkage)
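As a minimal sketch, d_max can be computed straight from its definition by checking every pair. The slides leave d unspecified, so this assumes Euclidean distance (Python's `math.dist`) and represents each cluster as a list of numeric tuples; the names `d_max`, `C` and `C_star` are illustrative.

```python
import math
from itertools import product


def d_max(C, C_star, d=math.dist):
    """Complete-linkage distance: the largest d(x, y) over all x in C, y in C*."""
    return max(d(x, y) for x, y in product(C, C_star))


# Hypothetical 2-D clusters
C = [(0.0, 0.0), (1.0, 0.0)]
C_star = [(4.0, 0.0), (6.0, 0.0)]
print(d_max(C, C_star))  # 6.0, from the pair (0, 0) and (6, 0)
```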
$$d_{\mathrm{avg}}(C, C^*) = \frac{1}{|C|\,|C^*|} \sum_{x \in C,\; y \in C^*} d(x, y)$$
for all elements x in C and y in C*
• Distance between two clusters is the average distance between all pairs of their elements
(average-linkage)
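A matching sketch for d_avg, again assuming Euclidean distance and clusters stored as lists of numeric tuples (illustrative names): sum d(x, y) over all |C|·|C*| pairs and divide by the number of pairs.

```python
import math
from itertools import product


def d_avg(C, C_star, d=math.dist):
    """Average-linkage distance: mean of d(x, y) over all |C| * |C*| pairs."""
    total = sum(d(x, y) for x, y in product(C, C_star))
    return total / (len(C) * len(C_star))


# Hypothetical 2-D clusters: pair distances are 4, 6, 3 and 5
C = [(0.0, 0.0), (1.0, 0.0)]
C_star = [(4.0, 0.0), (6.0, 0.0)]
print(d_avg(C, C_star))  # 4.5
```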
Single Linkage example
[Figure: single-linkage worked example; labels A, B, D, F]
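The single-linkage example survives only as a figure, so here is an illustrative sketch of naive agglomerative clustering under stated assumptions: single linkage is taken as the minimum pairwise Euclidean distance between two clusters (the standard definition), the 1-D points are hypothetical, and `agglomerate` is a helper name chosen for this example. It scans every pair of clusters at each step, which is fine for slide-sized data but not for large inputs.

```python
import math
from itertools import combinations


def single_link(A, B):
    """Single-linkage distance: smallest Euclidean distance between any a in A and b in B."""
    return min(math.dist(a, b) for a in A for b in B)


def agglomerate(points, linkage):
    """Naive agglomerative clustering.

    Starts with one cluster per point and repeatedly merges the closest pair
    (according to `linkage`) until one cluster remains. Returns the merges as
    (cluster_i, cluster_j, distance) tuples in the order they happened.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Pick the pair (i, j), i < j, with the smallest linkage distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        del clusters[j]          # j > i, so deleting j leaves index i valid
        clusters[i] = merged
    return merges


points = [(0.0,), (1.0,), (3.0,), (7.0,)]  # hypothetical 1-D data
merges = agglomerate(points, single_link)
print([d for _, _, d in merges])  # [1.0, 2.0, 4.0]
```

Each merge height is the gap between the nearest members of the two clusters, which is why single linkage can "chain" through closely spaced points.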
Complete Linkage Method
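Complete-linkage examples are often worked on a distance matrix. A sketch of that bookkeeping, assuming a symmetric distance table stored as a dict of dicts with hypothetical labels a to d (1-D points 0, 1, 3, 7): after clusters a and b merge, the distance from the merged cluster to any other cluster c becomes max(D[a][c], D[b][c]), which equals d_max for the union. `complete_link` is an illustrative name.

```python
def complete_link(D):
    """Agglomerative clustering on a symmetric distance table D (dict of dicts).

    When clusters a and b merge, the distance from the merged cluster to any
    other cluster c is max(D[a][c], D[b][c]), i.e. d_max of the union.
    Returns (a, b, distance) for each merge, in order.
    """
    D = {name: dict(row) for name, row in D.items()}  # work on a copy
    merges = []
    while len(D) > 1:
        # Closest pair of current clusters (p < q so each pair is seen once).
        a, b = min(((p, q) for p in D for q in D if p < q),
                   key=lambda pq: D[pq[0]][pq[1]])
        merges.append((a, b, D[a][b]))
        new = a + b  # label the merged cluster by concatenating its parts
        D[new] = {}
        for c in D:
            if c not in (a, b, new):
                D[new][c] = D[c][new] = max(D[a][c], D[b][c])
        # Remove the two old clusters from the table.
        for old in (a, b):
            del D[old]
            for row in D.values():
                row.pop(old, None)
    return merges


coords = {"a": 0.0, "b": 1.0, "c": 3.0, "d": 7.0}  # hypothetical 1-D points
D = {p: {q: abs(coords[p] - coords[q]) for q in coords if q != p} for p in coords}
print(complete_link(D))
# [('a', 'b', 1.0), ('ab', 'c', 3.0), ('abc', 'd', 7.0)]
```

On the same four points, single linkage (min instead of max) would merge at heights 1, 2 and 4, so complete linkage produces taller, more conservative merges.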
Which Distance Measure is Better?
• Each method has advantages and disadvantages, and the best choice is application-dependent; single-link and complete-link are the most common methods
• Single-link
• Can find irregular-shaped clusters
• Sensitive to outliers
• Complete-link and average-link
• Robust to outliers
• Tend to break large clusters
• Prefer compact, roughly spherical clusters of smaller size
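A small illustration of the outlier point, using hypothetical 1-D data: one stray point lying between two well-separated groups collapses the single-link distance (closest pair) but leaves the complete-link distance (farthest pair) unchanged. `d_min` and `d_max` are defined here only for this example, with Euclidean distance assumed.

```python
import math
from itertools import product


def d_min(C, C_star):
    """Single-link distance: smallest pairwise Euclidean distance."""
    return min(math.dist(x, y) for x, y in product(C, C_star))


def d_max(C, C_star):
    """Complete-link distance: largest pairwise Euclidean distance."""
    return max(math.dist(x, y) for x, y in product(C, C_star))


left = [(0.0,), (1.0,)]      # hypothetical 1-D groups
right = [(10.0,), (11.0,)]
print(d_min(left, right), d_max(left, right))  # 9.0 11.0

# One stray point (9.5) that has been absorbed into the left group.
noisy_left = left + [(9.5,)]
print(d_min(noisy_left, right), d_max(noisy_left, right))  # 0.5 11.0
```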
Partitional clustering
• Determines all clusters at once, instead of building them by successive merges or splits
Examples include:
• K-means and derivatives
• Fuzzy c-means clustering
• QT clustering algorithm
K-Means clustering
• Consider an example in which the vectors (profiles) have 2 dimensions
[Figure: 2-D example showing profiles (data points) and cluster centers, marked +]
• Each iteration involves two steps:
• Assignment of each profile to the cluster with the nearest center
• Re-computation of the cluster centers (means)
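The two steps can be sketched in plain Python as Lloyd's algorithm. Assumptions, since the slides do not specify them: profiles are equal-length numeric tuples, distance is squared Euclidean, the initial centers are K distinct profiles drawn at random, and an empty cluster keeps its previous center. `kmeans` and `nearest` are hypothetical helper names.

```python
import math
import random


def nearest(p, centers):
    """C(i): index k minimising the squared distance ||p - f_k||^2."""
    return min(range(len(centers)), key=lambda k: math.dist(p, centers[k]) ** 2)


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k distinct profiles
    for _ in range(max_iter):
        # Step 1: assign every profile to its nearest center.
        assign = [nearest(p, centers) for p in points]
        # Step 2: recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, c in zip(points, assign) if c == j]
            if members:
                new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centers.append(centers[j])  # assumption: empty cluster keeps its center
        if new_centers == centers:
            break  # centers stopped moving, so assignments are stable
        centers = new_centers
    # Final labels are computed against the centers actually returned.
    return centers, [nearest(p, centers) for p in points]


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, labels = kmeans(pts, k=2)
print(labels)  # the first three profiles share one label, the last three the other
```

Each pass can only lower the total within-cluster squared distance, so the loop stops at a local optimum; in practice the result depends on the initial centers, which is why k-means is often restarted from several seeds.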
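Assignment step:
$$C(i) = \arg\min_{1 \le k \le K} \lVert x_i - f_k \rVert^2$$
Re-computation step:
$$f_k = \frac{1}{N_k} \sum_{i \,:\, C(i) = k} x_i$$
where N_k is the number of profiles assigned to cluster k.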
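One iteration of the two equations worked by hand, using hypothetical data chosen for easy arithmetic (K = 2, points x_1 = (0,0), x_2 = (0,2), x_3 = (10,0), initial centers f_1 = (0,0), f_2 = (8,0)):

```latex
% Hypothetical data: K = 2; x_1=(0,0), x_2=(0,2), x_3=(10,0); f_1=(0,0), f_2=(8,0)
\begin{aligned}
\text{Assignment: } & \lVert x_1 - f_1\rVert^2 = 0,   \quad \lVert x_1 - f_2\rVert^2 = 64 && \Rightarrow C(1) = 1\\
                    & \lVert x_2 - f_1\rVert^2 = 4,   \quad \lVert x_2 - f_2\rVert^2 = 68 && \Rightarrow C(2) = 1\\
                    & \lVert x_3 - f_1\rVert^2 = 100, \quad \lVert x_3 - f_2\rVert^2 = 4  && \Rightarrow C(3) = 2\\[4pt]
\text{Re-computation: } & f_1 = \tfrac{1}{2}\big((0,0) + (0,2)\big) = (0,1), \qquad N_1 = 2\\
                        & f_2 = \tfrac{1}{1}\,(10,0) = (10,0), \qquad N_2 = 1
\end{aligned}
```

A second iteration would leave every assignment unchanged, so for this data the algorithm has converged.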