Lecture 3. Partitioning-Based Clustering Methods
Session 1: Basic Concepts of Partitioning Algorithms
Partitioning Algorithms: Basic Concepts
Partitioning method: Discovering the groupings in the data by optimizing a specific
objective function and iteratively improving the quality of partitions
K-partitioning method: Partitioning a dataset D of n objects into a set of K clusters
so that an objective function is optimized (e.g., the sum of squared distances is
minimized), where c_k is the centroid or medoid of cluster C_k
A typical objective function: Sum of Squared Errors (SSE)

SSE(C) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \| x_i - c_k \|^2
Problem definition: Given K, find a partition of K clusters that optimizes the chosen
partitioning criterion
Global optimum: finding it requires exhaustively enumerating all possible partitions
Heuristic methods (i.e., greedy algorithms): K-Means, K-Medians, K-Medoids, etc.
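The SSE objective above can be computed directly (a minimal NumPy sketch; the helper name `sse` is mine):

```python
import numpy as np

def sse(points, labels, centroids):
    """Sum of Squared Errors: squared L2 distance from each point
    to the centroid of the cluster it is assigned to."""
    return sum(np.sum((points[labels == k] - c) ** 2)
               for k, c in enumerate(centroids))

# Toy 1-D data: two clusters with centroids 0.5 and 10.5
points = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.5], [10.5]])
print(sse(points, labels, centroids))  # 4 * 0.5^2 = 1.0
```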
Session 2: The K-Means Clustering Method
The K-Means Clustering Method
K-Means (MacQueen’67, Lloyd’57/’82)
Each cluster is represented by the center of the cluster
Given K, the number of clusters, the K-Means clustering algorithm is outlined as follows:
Select K points as the initial cluster centroids
Repeat
Assign each point to the cluster with the closest centroid
Recompute the centroid (mean) of each cluster
Until the assignment no longer changes
Distance measures: Manhattan distance (L1 norm), Euclidean distance (L2 norm), Cosine similarity
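The loop above can be sketched in NumPy (a minimal Lloyd-style implementation under the L2 norm; the function name and empty-cluster handling are mine):

```python
import numpy as np

def k_means(X, K, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate between assigning points to the
    nearest centroid and recomputing centroids as cluster means."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), K, replace=False)]  # random seeds
    for _ in range(n_iter):
        # Assignment step: nearest centroid under Euclidean (L2) distance
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: mean of each cluster (keep old centroid if empty)
        new = np.array([X[labels == k].mean(axis=0) if (labels == k).any()
                        else centroids[k] for k in range(K)])
        if np.allclose(new, centroids):   # no change => converged
            break
        centroids = new
    return labels, centroids

X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
labels, centroids = k_means(X, K=2)
```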
Example: K-Means Clustering
[Figure: K-Means iterations: assign points to clusters, then recompute cluster centers]
Session 3: Initialization of K-Means Clustering
Initialization of K-Means
Different initializations may generate rather different clustering
results (some could be far from optimal)
Original proposal (MacQueen’67): Select K seeds randomly
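One common safeguard against a bad random seed (a sketch of standard practice, not an algorithm from the slides; all names are mine): run K-Means from several random seeds and keep the partition with the lowest SSE.

```python
import numpy as np

def lloyd(X, K, seed, n_iter=50):
    """One K-Means run from a random seed; returns (SSE, labels)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iter):
        lab = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
        C = np.array([X[lab == k].mean(axis=0) if (lab == k).any() else C[k]
                      for k in range(K)])
    return ((X - C[lab]) ** 2).sum(), lab

def best_of_restarts(X, K, n_restarts=10):
    """Keep the partition with the lowest SSE across random restarts."""
    return min((lloyd(X, K, s) for s in range(n_restarts)),
               key=lambda run: run[0])

X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
best_sse, best_labels = best_of_restarts(X, K=2)
```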
Example: Poor Initialization May Lead to Poor Clustering
[Figure: with a poor initialization, assigning points to clusters and recomputing cluster centers converges to a poor clustering]
Session 4: The K-Medoids Clustering Method
Handling Outliers: From K-Means to K-Medoids
The K-Means algorithm is sensitive to outliers, since an object with an extremely
large value may substantially distort the distribution of the data
K-Medoids: Instead of taking the mean value of the objects in a cluster as a reference
point, a medoid can be used, which is the most centrally located object in a cluster
The K-Medoids clustering algorithm:
Select K points as the initial representative objects (i.e., as initial K medoids)
Repeat
Assign each point to the cluster with the closest medoid
Randomly select a non-representative object oi
Compute the total cost S of swapping the medoid m with oi
If S < 0, then swap m with oi to form the new set of medoids
Until convergence criterion is satisfied
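The swap-based outline above can be sketched as follows (a minimal randomized version; the names and the fixed iteration budget are mine):

```python
import numpy as np

def k_medoids(X, K, n_iter=200, seed=0):
    """Randomized-swap K-Medoids: try swapping a medoid with a random
    non-medoid and keep the swap when the total cost decreases."""
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(len(X), K, replace=False))

    def cost(meds):
        # Total distance of every point to its nearest medoid
        d = np.linalg.norm(X[:, None] - X[meds][None], axis=2)
        return d.min(axis=1).sum()

    best = cost(medoids)
    for _ in range(n_iter):
        o = rng.integers(len(X))          # candidate non-medoid o_i
        if o in medoids:
            continue
        trial = medoids.copy()
        trial[rng.integers(K)] = o        # swap one medoid with o_i
        c = cost(trial)
        if c < best:                      # cost change S < 0: accept
            medoids, best = trial, c
    d = np.linalg.norm(X[:, None] - X[medoids][None], axis=2)
    return medoids, d.argmin(axis=1)

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
medoids, labels = k_medoids(X, K=2)
```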
PAM: A Typical K-Medoids Algorithm
[Figure: PAM on a toy data set with K = 2. Select initial K medoids randomly: arbitrarily choose K objects as initial medoids, then assign each remaining object to the nearest medoid. Repeat: randomly select a non-medoid object O_random, compute the total cost of swapping a medoid with O_random, and perform the swap if it improves the clustering quality.]
Discussion on K-Medoids Clustering
K-Medoids Clustering: Find representative objects (medoids) in clusters
PAM (Partitioning Around Medoids: Kaufmann & Rousseeuw 1987)
Starts from an initial set of medoids, and
Iteratively replaces one of the medoids by one of the non-medoids if it improves
the total sum of the squared errors (SSE) of the resulting clustering
PAM works effectively for small data sets but does not scale well for large data
sets (due to the computational complexity)
Computational complexity: PAM is O(K(n − K)²) per iteration (quite expensive!)
Efficiency improvements on PAM
CLARA (Kaufmann & Rousseeuw, 1990):
PAM on samples: O(Ks² + K(n − K)), where s is the sample size
CLARANS (Ng & Han, 1994): Randomized re-sampling, ensuring efficiency + quality
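CLARA's sample-then-evaluate idea can be sketched like this (the inner `pam_swap` is a tiny randomized stand-in for full PAM, and all names and parameters here are mine):

```python
import numpy as np

def total_cost(X, medoid_idx):
    """Total distance from every point to its nearest medoid."""
    d = np.linalg.norm(X[:, None] - X[medoid_idx][None], axis=2)
    return d.min(axis=1).sum()

def pam_swap(X, idx, K, rng, n_iter=100):
    """Crude PAM stand-in: randomized improving swaps restricted to idx."""
    meds = list(rng.choice(idx, K, replace=False))
    best = total_cost(X, meds)
    for _ in range(n_iter):
        trial = meds.copy()
        trial[rng.integers(K)] = rng.choice(idx)
        c = total_cost(X, trial)
        if len(set(trial)) == K and c < best:
            meds, best = trial, c
    return meds

def clara(X, K, n_samples=5, s=40, seed=0):
    """CLARA: run PAM on random samples, keep the medoid set that has
    the lowest cost on the FULL data set."""
    rng = np.random.default_rng(seed)
    best_meds, best_cost = None, np.inf
    for _ in range(n_samples):
        sample = rng.choice(len(X), size=min(s, len(X)), replace=False)
        meds = pam_swap(X, sample, K, rng)
        c = total_cost(X, meds)           # evaluate on ALL of X
        if c < best_cost:
            best_meds, best_cost = meds, c
    return best_meds
```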
Session 5: The K-Medians and K-Modes Clustering Methods
K-Medians: Handling Outliers by Computing Medians
Medians are less sensitive to outliers than means
Think of the median salary vs. mean salary of a large firm when adding a few top
executives!
K-Medians: Instead of taking the mean value of the objects in a cluster as a reference
point, medians are used (with the L1 norm as the distance measure)
The criterion function for the K-Medians algorithm:

S = \sum_{k=1}^{K} \sum_{x_i \in C_k} \sum_{j} | x_{ij} - \mathrm{med}_{kj} |

The K-Medians clustering algorithm:
Select K points as the initial cluster medians
Repeat
Assign each point to the cluster with the closest median (L1 distance)
Recompute the median of each dimension in each cluster
Until convergence criterion is satisfied
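A sketch of the K-Medians loop (same shape as K-Means, but with L1 assignment and per-dimension medians; the names are mine):

```python
import numpy as np

def k_medians(X, K, n_iter=100, seed=0):
    """K-Medians: assign by L1 (Manhattan) distance, then update each
    cluster center as the per-dimension median of its members."""
    rng = np.random.default_rng(seed)
    M = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iter):
        dists = np.abs(X[:, None, :] - M[None, :, :]).sum(axis=2)  # L1
        labels = dists.argmin(axis=1)
        new = np.array([np.median(X[labels == k], axis=0) if (labels == k).any()
                        else M[k] for k in range(K)])
        if np.allclose(new, M):
            break
        M = new
    return labels, M

X = np.array([[0, 0], [0, 2], [0, 4], [10, 10], [10, 12], [10, 14]], dtype=float)
labels, medians = k_medians(X, K=2)
```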
Kernel K-Means: each cluster centroid in the kernel space is

c_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} \varphi(x_i)

Clustering can be performed without the actual individual projections φ(x_i) and φ(x_j)
for the data points x_i, x_j ∈ C_k
Example: Kernel Functions and Kernel K-Means Clustering
Gaussian radial basis function (RBF) kernel: K(X_i, X_j) = e^{-\| X_i - X_j \|^2 / (2\sigma^2)}
Suppose there are 5 original 2-dimensional points:
x1(0, 0), x2(4, 4), x3(-4, 4), x4(-4, -4), x5(4, -4)
If we set σ to 4, we can compute the pairwise kernel values for these points
E.g., \| x_1 - x_2 \|^2 = (0-4)^2 + (0-4)^2 = 32, therefore K(x_1, x_2) = e^{-32/(2 \cdot 4^2)} = e^{-1}
[Figure: the original data set; the result of K-Means clustering; the result of Gaussian Kernel K-Means clustering]
The above data set cannot be clustered well by K-Means since it contains non-convex clusters
Gaussian RBF kernel transformation maps the data to a kernel matrix K: for any two points
x_i, x_j, K_{x_i x_j} = \varphi(x_i) \cdot \varphi(x_j), where for the Gaussian kernel K(X_i, X_j) = e^{-\| X_i - X_j \|^2 / (2\sigma^2)}
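The worked example above can be checked numerically (a small sketch; `rbf_kernel_matrix` is my name):

```python
import numpy as np

def rbf_kernel_matrix(X, sigma):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))"""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2 * sigma ** 2))

# The 5 points from the example above
X = np.array([[0, 0], [4, 4], [-4, 4], [-4, -4], [4, -4]], dtype=float)
K = rbf_kernel_matrix(X, sigma=4)
print(K[0, 1])  # ||x1 - x2||^2 = 32, so e^(-32/32) = e^-1 ≈ 0.3679
```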
Recommended Readings
J. MacQueen. Some Methods for Classification and Analysis of Multivariate Observations. In Proc.
of the 5th Berkeley Symp. on Mathematical Statistics and Probability, 1967
S. Lloyd. Least Squares Quantization in PCM. IEEE Trans. on Information Theory, 28(2), 1982
A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988
L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John
Wiley & Sons, 1990
R. Ng and J. Han. Efficient and Effective Clustering Method for Spatial Data Mining. VLDB'94
B. Schölkopf, A. Smola, and K. R. Müller. Nonlinear Component Analysis as a Kernel Eigenvalue
Problem. Neural Computation, 10(5):1299–1319, 1998
I. S. Dhillon, Y. Guan, and B. Kulis. Kernel K-Means: Spectral Clustering and Normalized Cuts. KDD’04
D. Arthur and S. Vassilvitskii. K-means++: The Advantages of Careful Seeding. SODA’07
C. K. Reddy and B. Vinzamuri. A Survey of Partitional and Hierarchical Clustering Algorithms, in
(Chap. 4) Aggarwal and Reddy (eds.), Data Clustering: Algorithms and Applications. CRC Press, 2014
M. J. Zaki and W. Meira, Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms.
Cambridge Univ. Press, 2014