19.1. Partitioning-Based Clustering Algorithms
Clustering Algorithms
CSED, TIET
Partitioning-Based Algorithms- Introduction
▪ Partitioning-based methods discover the grouping in the data by optimizing a specific objective function and iteratively improving the quality of the clusters.
▪ These algorithms partition a dataset D into K clusters so that the objective function is optimized.
▪ Finding the global optimum would require exhaustively enumerating all possible partitions, which is computationally infeasible.
▪ Realistically, we use the following heuristic (greedy) methods to find the clusters:
▪ K-Means
▪ K-Medians
▪ K-Medoids
K-Means Clustering- Introduction
▪ It is the most popular and most widely used clustering method.
▪ It was proposed by MacQueen in 1967.
▪ In K-means clustering algorithm, each cluster is represented by the center of the cluster.
▪ Given K, the number of clusters, the K-means clustering algorithm is outlined as follows:
▪ Select K points as the initial centroids.
▪ Repeat until convergence or for a fixed number of iterations:
▪ Assignment step: Form K clusters by assigning each point to its closest centroid.
▪ Update step: Recompute the centroid (i.e., mean point) of each cluster.
▪ Different distance measures, such as Manhattan distance (L1 norm), Euclidean distance (L2 norm), and cosine distance, can be used to find the distance of each point from the centroids.
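As a quick illustration, here is a minimal sketch of the three distance measures using SciPy (the point and centroid values are made up for the example):

```python
import numpy as np
from scipy.spatial.distance import cityblock, euclidean, cosine

# Hypothetical point and centroid, for illustration only
point = np.array([1.0, 2.0])
centroid = np.array([4.0, 6.0])

print(cityblock(point, centroid))   # Manhattan (L1) distance: |1-4| + |2-6| = 7.0
print(euclidean(point, centroid))   # Euclidean (L2) distance: sqrt(9 + 16) = 5.0
print(cosine(point, centroid))      # cosine distance: 1 - cos(angle between the vectors)
```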
K-Means Algorithm
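The original slide shows the algorithm as a figure; as a stand-in, here is a minimal NumPy sketch of the procedure outlined above (function and variable names are my own, and empty clusters are not handled):

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Minimal K-means sketch: X is an (n, d) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Select K points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: index of the closest centroid for each point
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each centroid as the mean of its cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return labels, centroids
```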
K-Means- Numerical Example
Suppose we have four types of medicines with two features: weight index and pH.
Group these medicines into 2 groups based on their features using the K-means algorithm.
After a few iterations, the centroids and cluster assignments no longer change, so the algorithm stops. The final clusters are (Medicine A, Medicine B) and (Medicine C, Medicine D).
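The iteration-by-iteration solution appears as figures in the original slides. Assuming the attribute values from the classic version of this example (A = (1, 1), B = (2, 1), C = (4, 3), D = (5, 4); these values are an assumption, not stated in the text above), the result can be reproduced with the k_means sketch defined earlier:

```python
import numpy as np

# Assumed (weight index, pH) values for the four medicines
X = np.array([[1.0, 1.0],   # Medicine A
              [2.0, 1.0],   # Medicine B
              [4.0, 3.0],   # Medicine C
              [5.0, 4.0]])  # Medicine D

labels, centroids = k_means(X, k=2)
print(labels)     # e.g., [0 0 1 1] -> clusters {A, B} and {C, D}
print(centroids)  # e.g., [[1.5 1. ], [4.5 3.5]]
```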
K-Means Objective Function
▪ The K-means algorithm partitions a dataset D of n objects into a set of K clusters so that the objective function is minimized.
▪ In particular, it uses the sum of squared errors (deviations) of each sample from the center of the cluster to which it belongs.
▪ The sum of squared errors is given by:

$$\mathrm{SSE}(C) = \sum_{k=1}^{K} \sum_{x_i \in c_k} \lVert x_i - c_k \rVert^2$$

where $c_k$ denotes the center of the k-th cluster.
▪ The above sum of squared errors is also called inertia, and its mean over the samples is called distortion.
▪ The SSE never increases after we recompute the centers in the K-means algorithm, because the new center of each cluster is the mean of the data points in that cluster, which is precisely the point that minimizes the cluster's SSE.
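To see why, set the gradient of a cluster's SSE with respect to its center to zero (a one-line derivation in the notation above):

```latex
\frac{\partial}{\partial c_k} \sum_{x_i \in c_k} \lVert x_i - c_k \rVert^2
  = -2 \sum_{x_i \in c_k} (x_i - c_k) = 0
  \;\Longrightarrow\;
  c_k = \frac{1}{|c_k|} \sum_{x_i \in c_k} x_i
```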
How to choose K?
▪ There are a variety of ways to find the optimal number of clusters (i.e., K).
▪ The most popular method is the elbow method.
▪ In this method, we consider a number of distinct values of K, and for each value of K, the sum of squared deviations of the samples from their cluster centers is computed.
▪ To determine the optimal number of clusters, we select the value of K at the "elbow", i.e., the point after which the distortion/inertia starts decreasing in a roughly linear fashion.
▪ For instance, in the elbow plot accompanying this slide, we conclude that the optimal number of clusters for the data is 3.
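A minimal sketch of the elbow method using scikit-learn (X here is placeholder data; model.inertia_ is the SSE of the fitted clustering):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(200, 2))  # placeholder data

inertias = []
for k in range(1, 10):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)  # SSE of the fitted clustering

# Inspect where the decrease flattens out: that K is the "elbow"
for k, sse in zip(range(1, 10), inertias):
    print(k, round(sse, 2))
```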
How to choose K? (Contd.)
▪ We can also find the optimal number of clusters using clustering evaluation metrics.
▪ We can choose different values for the number of clusters (K), and for each value of K, the silhouette coefficient is computed. The value of K for which the silhouette coefficient is maximum is chosen.
▪ In case the ground truth is available, we can also consider external evaluation metrics
such as accuracy, Rand Index, Adjusted Rand Index, Purity, etc.
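A sketch of silhouette-based selection, assuming scikit-learn and the same placeholder X as in the elbow example:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

scores = {}
for k in range(2, 10):  # the silhouette coefficient needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)  # K with the highest silhouette coefficient
print(best_k, scores[best_k])
```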
K-Means: Key Points
▪ Efficiency: O(tKn), where n is the number of objects, K the number of clusters, and t the number of iterations.
▪ Normally K, t << n; thus it is an efficient method.
▪ K-means clustering often terminates at a local optimum.
▪ Initialization can be important for finding high-quality clusters.
▪ Sensitive to noisy data and outliers.
▪ Variations: K-medians, K-medoids, etc.
▪ K-means is applicable only to objects in a continuous n-dimensional space.
▪ Use K-modes for categorical data.
▪ It is not suitable for discovering clusters with non-convex shapes.
▪ Use density-based methods, kernel K-means, etc., for such data.
Variations of K-Means
▪ There are many variations of the K-means method, differing in aspects such as:
1. Choosing better initial cluster centroids.
2. Using a different cluster representative, as in K-medians, which replaces the mean with the median (see the sketch after this list):
❑ Update: Recompute the center of each cluster as the median of the data points in that cluster.
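A minimal sketch of the K-medians update step (coordinate-wise median; X and labels follow the conventions of the earlier k_means sketch):

```python
import numpy as np

def k_medians_update(X, labels, k):
    """Recompute each center as the coordinate-wise median of its cluster,
    which minimizes the sum of Manhattan (L1) distances within the cluster."""
    return np.array([np.median(X[labels == j], axis=0) for j in range(k)])
```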
K-Modes
▪ K-means cannot handle non-numerical (categorical) data.
▪ Mapping categorical values to numerical values does not generate quality clusters for high-dimensional data.
▪ K-modes instead represents each cluster by the mode of each categorical attribute and measures distance by simple attribute matching (see the sketch below).
▪ For a mixture of categorical and numerical features, the K-prototypes algorithm (a combination of the K-means and K-modes algorithms) is used.
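As an illustration of the simple-matching dissimilarity that K-modes uses (the records below are hypothetical):

```python
def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records disagree."""
    return sum(x != y for x, y in zip(a, b))

record = ("red", "small", "round")
mode   = ("red", "large", "round")
print(matching_dissimilarity(record, mode))  # 1 (they differ only in size)
```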
Kernel K-Means
▪ The K-means algorithm cannot be used to detect non-convex clusters.
▪ It can only detect clusters that are linearly separable (convex shapes).
▪ Convex-shaped clusters are those in which the line segment joining any two points lies completely inside the cluster.
▪ Idea: Project the data onto a high-dimensional kernel space, and then perform K-means clustering there:
▪ Map the data points to a high-dimensional feature space using kernel functions.
▪ Apply K-means clustering in the mapped feature space.
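scikit-learn has no built-in kernel K-means, so as one practical stand-in, the sketch below approximates the RBF kernel feature map with Nystroem and then runs ordinary K-means in the mapped space (the dataset and parameter values are chosen for illustration):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.kernel_approximation import Nystroem
from sklearn.cluster import KMeans

# Two concentric rings: non-convex clusters that plain K-means cannot separate
X, _ = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)

# Map points into an (approximate) RBF kernel feature space
feature_map = Nystroem(kernel="rbf", gamma=5.0, n_components=100, random_state=0)
X_mapped = feature_map.fit_transform(X)

# Ordinary K-means in the mapped space can now separate the two rings
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_mapped)
print(np.bincount(labels))  # cluster sizes
```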
Kernel K-Means (Contd.)
▪ Its computational complexity is higher than that of K-means.
▪ We need to compute and store the n×n kernel matrix generated by applying the kernel function to the original data.
Some Kernel Functions
▪ Gaussian radial basis function (RBF):

$$K(x_i, x_j) = \exp\!\left( -\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2} \right)$$
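A short sketch computing this kernel matrix with NumPy/SciPy (sigma is the free bandwidth parameter):

```python
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel_matrix(X, sigma=1.0):
    """n x n matrix with K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_dists = cdist(X, X, metric="sqeuclidean")
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(rbf_kernel_matrix(X, sigma=1.0))
```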