Clustering Analysis

Comparison between K-Means and K-Medoids

Clustering Algorithms

Tagaram Soni Madhulatha

Alluri Institute of Management Sciences


Warangal, Andhra Pradesh

Abstract. Clustering is a common technique for statistical data analysis.
Clustering is the process of grouping similar objects into different groups, or
more precisely, the partitioning of a data set into subsets according to some
defined distance measure. Clustering is an unsupervised learning technique,
in which interesting patterns and structures can be found directly from very large
data sets with little or no background knowledge. It is used in many
fields, including machine learning, data mining, pattern recognition, image
analysis and bioinformatics. In this research, the most representative algorithms,
K-Means and K-Medoids, were examined and analyzed based on their basic
approach.

Keywords: Clustering, partitional algorithm, K-mean, K-medoid, distance measure.

1 Introduction

Clustering can be considered the most important unsupervised learning problem.
Like every other problem of this kind, it deals with finding a structure in a collection of
unlabeled data. A cluster is therefore a collection of objects that are similar to one
another and dissimilar to the objects belonging to other clusters. Besides the term
data clustering, synonyms such as cluster analysis, automatic classification, numerical
taxonomy, botryology and typological analysis are also in use.
There exist a large number of clustering algorithms in the literature. The choice of
clustering algorithm depends both on the type of data available and on the particular
purpose and application. If cluster analysis is used as a descriptive or exploratory tool, it
is possible to try several algorithms on the same data to see what the data may disclose.
In general, major clustering methods can be classified into the following categories.
1. Partitioning methods
2. Hierarchical methods
3. Density-based methods
4. Grid-based methods
5. Model-based methods

D.C. Wyld et al. (Eds.): ACITY 2011, CCIS 198, pp. 472–481, 2011.
© Springer-Verlag Berlin Heidelberg 2011

Some clustering algorithms integrate the ideas of several clustering methods, so
that it is sometimes difficult to classify a given algorithm as uniquely belonging to
only one clustering method category.

2 Partitional Clustering
Partitioning algorithms are based on specifying an initial number of groups
and iteratively reallocating objects among groups until convergence. Such
algorithms typically determine all clusters at once. Most applications adopt one of
two popular heuristic methods:
k-means algorithm
k-medoids algorithm

2.1 K-Means Algorithm

The K-means clustering algorithm was developed by J. MacQueen and then by J. A.
Hartigan and M. A. Wong around 1975. Simply speaking, k-means clustering is an
algorithm that classifies objects, based on their attributes/features, into K groups,
where K is a positive integer. The grouping is done by minimizing the sum of squares
of distances between the data and the corresponding cluster centroid. Thus the purpose of
K-means clustering is to classify the data.
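
The sum-of-squares criterion described above can be written compactly. The notation is an assumption introduced here for illustration and does not come from the text: C_j denotes the set of objects assigned to cluster j, c_j the centroid of that cluster, and x_i an individual object.

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^{2},
\qquad
c_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

Under this reading, K-means looks for the assignment of objects to the K groups that makes J as small as possible.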

K-means demonstration
Suppose we have 4 objects as training data points, and each object has 2
attributes. Each attribute represents a coordinate of the object.

Table 1. Sample data points

SNO   Mid-I   Mid-II
A     1       1
B     2       1
C     4       3
D     5       4

We also know beforehand that these objects belong to two groups (cluster 1
and cluster 2). The problem now is to determine which objects (Sno) belong to cluster 1 and
which belong to the other cluster.
The basic step of k-means clustering is simple. In the beginning we determine
the number of clusters K and we assume the centroids, or centers, of these clusters. We can
take any random objects as the initial centroids, or the first K objects in sequence can
also serve as the initial centroids.
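
The two initialization choices mentioned above can be sketched in a few lines of Python using the data of Table 1. This is only an illustration: the function names `first_k_centroids` and `random_centroids` and the `seed` parameter are hypothetical and do not come from the paper.

```python
import random

# Objects A, B, C, D from Table 1 as (Mid-I, Mid-II) pairs.
points = [(1, 1), (2, 1), (4, 3), (5, 4)]
K = 2  # number of clusters, fixed in advance


def first_k_centroids(points, k):
    """Use the first k objects in sequence as the initial centroids."""
    return list(points[:k])


def random_centroids(points, k, seed=None):
    """Pick k distinct objects at random as the initial centroids."""
    rng = random.Random(seed)  # seed is optional, for reproducibility
    return rng.sample(points, k)


print(first_k_centroids(points, K))  # objects A and B
print(random_centroids(points, K, seed=0))
```

With the first-K rule, objects A and B become the starting centroids. The random rule may give a different starting point on each run unless a seed is fixed.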
The K-means algorithm then repeats the three steps below until it is stable, that is,
until no object moves to a different group:
1. Determine the centroid coordinate
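
The iteration can be sketched in Python on the data of Table 1. This is a minimal sketch under stated assumptions, not the author's implementation: it assumes the usual K-means loop (assign each object to its nearest centroid by Euclidean distance, then recompute each centroid as the mean of its members), it takes objects A and B as initial centroids, and the function name `kmeans` and the parameter `max_iter` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Table 1 data: object label -> (Mid-I, Mid-II)
points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}


def kmeans(points, initial_centroids, max_iter=100):
    """Plain K-means sketch: iterate until no object changes group."""
    centroids = list(initial_centroids)
    assignment = {}
    for _ in range(max_iter):
        # Assign every object to the index of its nearest centroid.
        new_assignment = {
            label: min(range(len(centroids)),
                       key=lambda j: dist(p, centroids[j]))
            for label, p in points.items()
        }
        if new_assignment == assignment:
            break  # stable: no object moved to another group
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for j in range(len(centroids)):
            members = [points[l] for l, c in assignment.items() if c == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return assignment, centroids


# Assumption: the first K = 2 objects (A and B) serve as initial centroids.
assignment, centroids = kmeans(points, [points["A"], points["B"]])
print(assignment)
print(centroids)
```

On this data the loop stops after three passes, with A and B in one group and C and D in the other, and centroids (1.5, 1.0) and (4.5, 3.5).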
