Clustering Analysis
Clustering Algorithms
1 Introduction
Clustering is often considered the most important unsupervised learning problem; like
every other problem of this kind, it deals with finding structure in a collection of
unlabeled data. A cluster is therefore a collection of objects that are similar to one
another and dissimilar to the objects belonging to other clusters. Besides the term
data clustering, synonyms such as cluster analysis, automatic classification, numerical
taxonomy, botryology and typological analysis are also in use.
There exist a large number of clustering algorithms in the literature. The choice of
clustering algorithm depends both on the type of data available and on the particular
purpose and application. If cluster analysis is used as a descriptive or exploratory tool, it
is possible to try several algorithms on the same data to see what the data may disclose.
In general, major clustering methods can be classified into the following categories.
1. Partitioning methods
2. Hierarchical methods
3. Density-based methods
4. Grid-based methods
5. Model-based methods
D.C. Wyld et al. (Eds.): ACITY 2011, CCIS 198, pp. 472–481, 2011.
© Springer-Verlag Berlin Heidelberg 2011
Comparison between K-Means and K-Medoids Clustering Algorithms 473
2 Partitional Clustering
Partitioning algorithms specify an initial number of groups and iteratively
reallocate objects among the groups until convergence. Such algorithms
typically determine all clusters at once. Most applications adopt one of
two popular heuristic methods:
k-means algorithm
k-medoids algorithm
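The two methods differ mainly in how a group is represented: in the usual formulations, k-means uses the coordinate-wise mean of the group's objects, whereas k-medoids uses one of the objects themselves. The following is a minimal sketch of that difference, not the paper's implementation; the points in `group` and the helper names `centroid` and `medoid` are assumptions made for illustration, and Euclidean distance is assumed.

```python
import math

# Hypothetical 2-D objects forming one group (values are NOT from the text).
group = [(1.0, 1.0), (2.0, 1.0), (1.5, 2.0), (9.0, 9.0)]


def centroid(points):
    """Coordinate-wise mean of the points; need not coincide with any object."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


def medoid(points):
    """The object whose total Euclidean distance to all objects is smallest."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))


center_mean = centroid(group)    # representative as used by k-means
center_medoid = medoid(group)    # representative as used by k-medoids
```

In this sketch the medoid is always one of the input objects, while the mean is pulled toward the distant object at (9.0, 9.0).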
K-means demonstration
Suppose we have 4 objects as training data points, and each object has 2
attributes. Each attribute represents a coordinate of the object.
We also know beforehand that these objects belong to two groups of Sno (cluster 1
and cluster 2). The problem now is to determine which Sno's belong to cluster 1 and
which Sno's belong to cluster 2.
The basic procedure of k-means clustering is simple. In the beginning we fix the
number of clusters K and assume a centroid, or center, for each of these clusters. We
can take any K random objects as the initial centroids, or the first K objects in
sequence can also serve as the initial centroids.
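The two initialization options described above can be sketched as follows. This is an illustrative sketch only: the coordinates in `objects` are hypothetical (the text gives no values at this point), and the function names `first_k_centroids` and `random_centroids` are assumptions introduced here.

```python
import random

# Hypothetical coordinates for the 4 objects, 2 attributes each
# (the text does not state actual values here).
objects = [(1.0, 1.0), (2.0, 1.0), (4.0, 3.0), (5.0, 4.0)]
K = 2  # number of clusters, fixed in advance


def first_k_centroids(points, k):
    """Use the first k objects in sequence as the initial centroids."""
    return list(points[:k])


def random_centroids(points, k, seed=None):
    """Use k distinct, randomly chosen objects as the initial centroids."""
    rng = random.Random(seed)
    return rng.sample(points, k)


initial_sequential = first_k_centroids(objects, K)
initial_random = random_centroids(objects, K, seed=0)
```

Passing a fixed `seed` makes the random choice repeatable, which helps when comparing runs; different initial centroids can lead to different final clusters.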
The k-means algorithm then repeats the three steps below until convergence, that is,
until the grouping is stable (no object moves to a different group):
1. Determine the centroid coordinate