0% found this document useful (0 votes)

15 views

A Comprehensive Survey of Clustering Algorithms

Uploaded by

ridhatullah216

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

15 views

A Comprehensive Survey of Clustering Algorithms

Uploaded by

ridhatullah216

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 30

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/282523039

A Comprehensive Survey of Clustering Algorithms

Article in Annals of Data Science · August 2015

DOI: 10.1007/s40745-015-0040-1

CITATIONS READS

1,572 6,756

2 authors:

Dongkuan Xu Yingjie Tian

Pennsylvania State University Chinese Academy of Sciences
28 PUBLICATIONS 1,863 CITATIONS 249 PUBLICATIONS 6,349 CITATIONS

SEE PROFILE SEE PROFILE

All content following this page was uploaded by Dongkuan Xu on 12 October 2015.

The user has requested enhancement of the downloaded file.

Ann. Data. Sci.
DOI 10.1007/s40745-015-0040-1

A Comprehensive Survey of Clustering Algorithms

Dongkuan Xu1,2 · Yingjie Tian2,3

Received: 25 May 2015 / Revised: 18 July 2015 / Accepted: 31 July 2015

Abstract Data analysis is used as a common method in modern science research,

which is across communication science, computer science and biology science. Clus-
tering, as the basic composition of data analysis, plays a significant role. On one hand,
many tools for cluster analysis have been created, along with the information increase
and subject intersection. On the other hand, each clustering algorithm has its own
strengths and weaknesses, due to the complexity of information. In this review paper,
we begin at the definition of clustering, take the basic elements involved in the cluster-
ing process, such as the distance or similarity measurement and evaluation indicators,
into consideration, and analyze the clustering algorithms from two perspectives, the
traditional ones and the modern ones. All the discussed clustering algorithms will be
compared in detail and comprehensively shown in Appendix Table 22.

Keywords Clustering · Clustering algorithm · Clustering analysis · Survey ·

Unsupervised learning

B Yingjie Tian
[email protected]
Dongkuan Xu
[email protected]

1 School of Mathematical Sciences, University of Chinese Academy of Sciences,

Beijing 100049, China
2 Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences,
Beijing 100190, China
3 Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences,
Beijing 100190, China

123
Ann. Data. Sci.

1 Introduction

Clustering, considered as the most important question of unsupervised learning, deals

with the data structure partition in unknown area and is the basis for further learning.
The complete definition for clustering, however, isn’t come to an agreement, and a
classic one is described as follows [1]:
(1) Instances, in the same cluster, must be similar as much as possible;
(2) Instances, in the different clusters, must be different as much as possible;
(3) Measurement for similarity and dissimilarity must be clear and have the practical
meaning;
The standard process of clustering can be divided into the following several steps
[2]:
(1) Feature extraction and selection: extract and select the most representative features
from the original data set;
(2) Clustering algorithm design: design the clustering algorithm according to the
characteristics of the problem;
(3) Result evaluation: evaluate the clustering result and judge the validity of algo-
rithm;
(4) Result explanation: give a practical explanation for the clustering result;
In the rest of this paper, the common similarity and distance measurements will be
introduced in Sect. 2, the evaluation indicators for the clustering result will be listed in
section 3, the traditional clustering algorithms and the modern ones will be analyzed
systematically respectively in Sects. 4 and 5, and the final conclusion will be drawn
in Sect. 6.

2 Distance and Similarity

Distance (dissimilarity) and similarity are the basis for constructing clustering algo-
rithms. As for quantitative data features, distance is preferred to recognize the
relationship among data. And similarity is preferred when dealing with qualitative
data features [2].
The common used distance functions for quantitative data feature are summarized
in Table 1.
The common used similarity functions for qualitative data feature are summarized
in Table 2.

3 Evaluation Indicator

The main purpose of evaluation indicator is to test the validity of algorithm. Evaluation
indicators can be divided into two categories, the internal evaluation indicators and
the external evaluation indicators, in terms of the test data whether in the process of
constructing the clustering algorithm.
The internal evaluation takes the internal data to test the validity of algorithm. It,
however, can’t absolutely judge which algorithm is better when the scores of two

123
Ann. Data. Sci.

Table 1 Distance functions

Name Formula Explanation

1/n
d

Minkowski distance xil − x jl n A set of definitions for
l=1 distance:
1. City-block distance when n
=1
2. Euclidean distance when n
=2
3. Chebyshev distance when
n→ ∞
1/2
d x −x 2
il jl
Standardized Euclidean sl 1. S stands for the standard
distance l=1 deviation
2. A weighted Euclidean
distance based on the
deviation
xiT x j
Cosine distance 1 − cos α = 1. Stay the same in face of the
xi x j
rotation change of data
2. The most commonly used
distance in document area

Cov xi ,x j
Pearson correlation distance 1− √ 1. Cov stands for the
D (xi ) D x j covariance for and D stands
for the variance
2. Measure the distance based
on linear correlation
T
Mahalanobis distance xi − x j S −1 xi − x j 1. S is the covariance matrix
inside the cluster
2. With high computation
complexity

algorithms are not equal based on the internal evaluation indicators [5]. There are
three commonly used internal indicators, summarized in Table 3.
The external evaluation, which is called as the gold standard for testing method,
takes the external data to test the validity of algorithm. However, it turns out that the
external evaluation is not completely correct recently [6]. There are six common used
external evaluation indicators, summarized in Table 4.
In the following sections, especially in the analysis of time complexity, n stands for
the number of total objects/data points, k stands for the number of clusters, s stands
for the number of sample objects/data points, and t stands for the number of iterations.

4 Traditional Clustering Algorithms

The traditional clustering algorithms can be divided into 9 categories which mainly
contain 26 commonly used ones, summarized in Table 5.

123
Ann. Data. Sci.

Table 2 Similarity functions

Name Function formula or Explanation

measure method

Jaccard similarity J (A, B) = |A∩B|

|A∪B| 1. Measure the similarity of
two sets
2. |X| Stands for the number
of elements of set X
3. Jaccard distance = 1 −
Jaccard similarity
Hamming similarity The minimum number of The number is smaller, the
substitutions needed to similarity is more
change one data point
into the other
Hamming distance is the
opposite of Hamming
similarity
Especially for the data of
string
For data of mixed type Map the feature into (0, 1) [3,4]
Transform the feature into
dichotomous one

d
Si j = d1 Si jl
l=1
Si j =
d d
l=1 ηi jl Si jl / l=1 ηi jl

4.1 Clustering Algorithm Based on Partition

The basic idea of this kind of clustering algorithms is to regard the center of data
points as the center of the corresponding cluster. K-means [7] and K-medoids [8]
are the two most famous ones of this kind of clustering algorithms. The core idea of
K-means is to update the center of cluster which is represented by the center of data
points, by iterative computation and the iterative process will be continued until some
criteria for convergence is met. K-mediods is an improvement of K-means to deal with
discrete data, which takes the data point, most near the center of data points, as the
representative of the corresponding cluster. The typical clustering algorithms based
on partition also include PAM [9], CLARA [10], CLARANS [11].
For more information about this kind of clustering algorithms, you can refer to
[12–14].
Analysis:
(1) Time complexity (Table 6):
(2) Advantages: relatively low time complexity and high computing efficiency in
general;

123
Ann. Data. Sci.

Table 3 Evaluation indicators

Name Formula or measure method Explanation

k σi +σ j
Davies–Bouldin D B = k1 max K stands for the number of
d ci ,c j
indicator i=1 i = j clusters, C x is the center of
cluster x, σx is the average
distance between any data
incluster x and C x ,
d ci , c j is the distance
between ci and c j
Dunn indicator D= 1. Mainly for the data that has
d(i, j) even density and
min min max1≤k≤n d (k)
1≤i≤n 1≤ j≤n,i = j distribution

2. d ci , c j is the distance
between ci and c j , d (k)
stands for the distance in
cluster k
Silhouette Evaluate the clustering result
coefficient based on the average distance
between a data point and other
data points in the same cluster
and average distance among
different clusters

(3) Disadvantages: not suitable for non-convex data, relatively sensitive to the outliers,
easily drawn into local optimal, the number of clusters needed to be preset, and
the clustering result sensitive to the number of clusters.;
(4) AP algorithm [15], which will be discussed in the section Clustering algorithm
based on affinity propagation, can also be considered as one of this kind of clus-
tering algorithm.

4.2 Clustering Algorithm Based on Hierarchy

The basic idea of this kind of clustering algorithms is to construct the hierarchical
relationship among data in order to cluster [16]. Suppose that each data point stands
for an individual cluster in the beginning, and then, the most neighboring two clusters
are merged into a new cluster until there is only one cluster left. Or, a reverse process.
Typical algorithms of this kind of clustering include BIRCH [17], CURE [18], ROCK
[19], Chameleon [20]. BIRCH realizes the clustering result by constructing the fea-
ture tree of clustering, CF tree, of which one node stands for a subcluster. CF tree
will dynamically grow when a new data point comes. CURE, suitable for large-scale
clustering, takes random sampling technique to cluster sample separately and inte-
grates the results finally. ROCK is an improvement of CURE for dealing with data of
enumeration type, which takes the effect on the similarity from the data around the
cluster into consideration. Chameleon, at first, divides the original data into clusters
with smaller size based on the nearest neighbor graph, and then the clusters with small

123
Ann. Data. Sci.

Table 4 Evaluation indicators

Name Formula or measure method Explanation

Rand indicator T P+T N

R I = T P+F 1. TP is the number of true
P+F N +T N
positives
2. TN is the number of true
negatives
3. FP is the number of false
positives
4. FN is the number of false
negatives
β 2 +1 ·P·R
F indicator Fβ = TP
1. P = T P+F
β 2 ·P+R P stands for
the accuracy, R = T P+F TP
N
stands for the recall rate
2. TP, TN, FP, and FN are
defined as before
Jaccard indicator J (A, B) = |A∩B|
|A∪B| = 1. Measure the similarity of
TP two sets
T P+F P+F N
2. |X| Stands for the number
of elements of set X
3. TP, TN, FP, and FN are
defined as before
FM = TP TP
Fowlkes–Mallows T P+F P · T P+F N TP, TN, FP, and FN are
indicator defined as before
Mutual To measure, based on
information information theory, how much
information is shared by two
clusters, between which the
nonlinear correlation can be
detected
Confusion matrix To figure out the difference
between a cluster and a
gold-standard cluster

size are merged into a cluster with bigger size, based on agglomerative algorithm, until
satisfied.
For more information about this kind of clustering algorithms, you can refer to
[21,22].
Analysis:
(1) Time complexity (Table 7):
(2) Advantages: suitable for the data set with arbitrary shape and attribute of arbitrary
type, the hierarchical relationship among clusters easily detected, and relatively
high scalability in general;
(3) Disadvantages: relatively high in time complexity in general, the number of clus-
ters needed to be preset.

123
Ann. Data. Sci.

Table 5 Traditional algorithms

Category Typical algorithm

Clustering algorithm based K-means, K-medoids, PAM,

on partition CLARA, CLARANS
Clustering algorithm based BIRCH, CURE, ROCK,
on hierarchy Chameleon
Clustering algorithm based FCM, FCS, MM
on fuzzy theory
Clustering algorithm based DBCLASD, GMM
on distribution
Clustering algorithm based DBSCAN, OPTICS,
on density Mean-shift
Clustering algorithm based CLICK, MST
on graph theory
Clustering algorithm based STING, CLIQUE
on grid
Clustering algorithm based FC
on fractal theory
Clustering algorithm based COBWEB, GMM, SOM,
on model ART

Table 6 Time complexity

K-means K-medoids PAM CLARA CLARANS

O(knt) O(k(n-k)ˆ2) O(kˆ3*nˆ2) O(ksˆ2+k(n-k)) O(nˆ2)

Low High High Middle High

Table 7 Time complexity

BIRCH CURE ROCK Chameleon

O(n) O(sˆ2s), O(nˆ2logn) O(nˆ2)

Low Low High High

4.3 Clustering Algorithm Based on Fuzzy Theory

The basic idea of this kind of clustering algorithms is that the discrete value of belong-
ing label, {0, 1}, is changed into the continuous interval [0, 1], in order to describe
the belonging relationship among objects more reasonably. Typical algorithms of this
kind of clustering include FCM [23–25], FCS [26] and MM [27]. The core idea of
FCM is to get membership of each data point to every cluster by optimizing the object
function. FCS, different from the traditional fuzzy clustering algorithms, takes the
multidimensional hypersphere as the prototype of each cluster, so as to cluster with
the distance function based on the hypersphere. MM, based on the Mountain Function,
is used to find the center of cluster.
For more information about this kind of clustering algorithms, you can refer to
[28–30].

123
Ann. Data. Sci.

Table 8 Time complexity

FCM FCS MM

O(n) (kernel) O(vˆ2*n)

Low High Middle

Table 9 Time complexity

DBCLASD GMM

O(n*logn) O(nˆ2*kt)
Middle High

Analysis:
1) Time complexity (Table 8):
2) The time complexity of FCS is high for the kernel involved in the algorithm;
3) Advantages: more realistic to give the probability of belonging, relatively high
accuracy of clustering;
4) Disadvantages: relatively low scalability in general, easily drawn into local opti-
mal, the clustering result sensitive to the initial parameter values, and the number
of clusters needed to be preset.

4.4 Clustering Algorithm Based on Distribution

The basic idea is that the data, generated from the same distribution, belongs to the
same cluster if there exists several distributions in the original data. The typical algo-
rithms are DBCLASD [31] and GMM [32]. The core idea of DBCLASD, a dynamic
incremental algorithm, is that if the distance between a cluster and its nearest data
point satisfies the distribution of expected distance which is generated from the exist-
ing data points of that cluster, the nearest data point should belong to this cluster. The
core idea of GMM is that GMM consists of several Gaussian distributions from which
the original data is generated and the data, obeying the same independent Gaussian
distribution, is considered to belong to the same cluster.
For more information about this kind of clustering algorithms, you can refer to
[33,34].
Analysis:
(1) Time complexity (Table 9):
(2) Advantages: more realistic to give the probability of belonging, relatively high
scalability by changing the distribution, number of clusters and so on, and sup-
ported by the well developed statistical science;
(3) Disadvantages: the premise not completely correct, involved in many parameters
which have a strong influence on the clustering result and relatively high time
complexity.

123
Ann. Data. Sci.

Table 10 Time complexity

DBSCAN OPTICS Mean-shift

O(nlogn) O(nlogn) (kernel)

Middle Middle High

4.5 Clustering Algorithm Based on Density

The basic idea of this kind of clustering algorithms is that the data which is in the region
with high density of the data space is considered to belong to the same cluster [35]. The
typical ones include DBSCAN [36], OPTICS [37] and Mean-shift [38]. DBSCAN is
the most well known density-based clustering algorithm, which is generated from the
basic idea of this kind of clustering algorithms directly. OPTICS is an improvement
of DBSCAN and it overcomes the shortcoming of DBSCAN that being sensitive to
two parameters, the radius of the neighborhood and the minimum number of points in
a neighborhood. In the process of Mean-shift, the mean of offset of current data point
is calculated at first, the next data point is figured out based on the current data point
and the offset then, and last, the iteration will be continued until some criteria are met.
For more information about this kind of clustering algorithms, you can refer to
[39–42].
Analysis:

(1) Time complexity (Table 10):

(2) The time complexity of Mean-shift is high for the kernel involved in the algorithm;
(3) Advantages: clustering in high efficiency and suitable for data with arbitrary
shape;
(4) Disadvantages: resulting in a clustering result with low quality when the density
of data space isn’t even, a memory with big size needed when the data volume is
big, and the clustering result highly sensitive to the parameters;
(5) DENCLUE algorithm [43], which will be discussed in the section Clustering
algorithm for large-scale data, can also be considered as one of this kind of
clustering algorithms.

4.6 Clustering Algorithm Based on Graph Theory

According to this kind of clustering algorithms, clustering is realized on the graph

where the node is regarded as the data point and the edge is regarded as the relationship
among data points. Typical algorithms of this kind of clustering are CLICK [44] and
MST-based clustering [45]. The core idea of CLICK is to carry out the minimum weight
division of the graph with iteration in order to generate the clusters. Generating the
minimum spanning tree from the data graph is the key step to do the cluster analysis
for the MST-based clustering algorithm.
For more detailed information about this kind of clustering algorithms, you can
refer to [1,20,46–49].

123
Ann. Data. Sci.

Table 11 Time complexity

CLICK MST

O(kf(v, e)) O(elogv)

Low Middle

Table 12 Time complexity

STING CLIQUE

O(n) O(n+kˆ2)
Low Low

Analysis:
(1) Time complexity (Table 11):
where v stands for the number of vertices, e stands for the number of edges, and
f(v, e) stands for the time complexity of computing a minimum cut;
(2) Advantages: clustering in high efficiency, the clustering result with high accuracy;
(3) Disadvantages: the time complexity increasing dramatically with the increasing
of graph complexity;
(4) SM algorithm [50] and NJW algorithm [51], which will be discussed in the section
Clustering algorithm based on spectral graph theory, can also be considered as
ones of this kind of clustering algorithms.

4.7 Clustering Algorithm Based on Grid

The basic idea of this kind of clustering algorithms is that the original data space is
changed into a grid structure with definite size for clustering. The typical algorithms
of this kind of clustering are STING [52] and CLIQUE [53]. The core idea of STING
which can be used for parallel processing is that the data space is divided into many
rectangular units by constructing the hierarchical structure and the data within different
structure levels is clustered respectively. CLIQUE takes advantage of the grid-based
clustering algorithms and the density-based clustering algorithms.
For more detailed information about this kind of clustering algorithms, you can
refer to [41,54–57].
Analysis:
(1) Time complexity (Table 12):
(2) Advantages: low time complexity, high scalability and suitable for parallel
processing and increment updating;
(3) Disadvantages: the clustering result sensitive to the granularity (the mesh size),
the high calculation efficiency at the cost of reducing the quality of clusters and
reducing the clustering accuracy;
4) Wavecluster algorithm [54], which will be discussed in the section Clustering
algorithm for spatial data, can also be considered as ones of this kind of clustering
algorithms.

123
Ann. Data. Sci.

4.8 Clustering Algorithm Based on Fractal Theory

Fractal stands for the geometry that can be divided into several parts which share
some common characters with the whole [58]. The typical algorithm of this kind of
clustering is FC [59] of which the core idea is that the change of any inner data of a
cluster does not have any influence on the intrinsic quality of the fractal dimension.
For more detailed information about this kind of clustering algorithms, you can
refer to [60–63].
Analysis:
(1) The time complexity of FC is O(n);
(2) Advantages: clustering in high efficiency, high scalability, dealing with outliers
effectively and suitable for data with arbitrary shape and high dimension;
(3) Disadvantages: the premise not completely correct, the clustering result sensitive
to the parameters.

4.9 Clustering Algorithm Based on Model

The basic idea is to select a particular model for each cluster and find the best fitting
for that model. There are mainly two kinds of model-based clustering algorithms, one
based on statistical learning method and the other based on neural network learning
method.
The typical algorithms, based on statistical learning method, are COBWEB [64]
and GMM [32]. The core idea of COBWEB is to build a classification tree, based on
some heuristic criteria, in order to realize hierarchical clustering on the assumption that
the probability distribution of each attribute is independent. The typical algorithms,
based on neural network learning method, are SOM [65] and ART [66–69]. The core
idea of SOM is to build a mapping of dimension reduction from the input space of
high dimension to output space of low dimension on the assumption that there exists
topology in the input data. The core idea of ART, an incremental algorithm, is to
generate a new neuron dynamically to match a new pattern to create a new cluster
when the current neurons are not enough. GMM has been discussed in the section
Clustering algorithm based on distribution.
For more detailed information about this kind of clustering algorithms, you can
refer to [70–75].
Analysis:
(1) Time complexity (Table 13):
(2) The time complexity of COBWEB is generally low, which depends on the distri-
bution involved in the algorithm;

Table 13 Time complexity

COBWEB GMM SOM ART

(distribution) O(nˆ2*kt) (layer) (type+layer)

Low High High Middle

123
Ann. Data. Sci.

(3) The time complexity of SOM is generally high, which depends on the layer
construction involved in the algorithm;
(4) The time complexity of ART is generally middle, which depends on the type of
ART and the layer construction involved in the algorithm;
(5) Advantages: diverse and well developed models providing means to describe data
adequately and each model having its own special characters that may bring about
some significant advantages in some specific areas;
(6) Disadvantages: relatively high time complexity in general, the premise not com-
pletely correct, and the clustering result sensitive to the parameters of selected
models.

5 Modern Clustering Algorithms

The modern clustering algorithms can be divided into 10 categories which mainly
contain 45 commonly used ones, summarized in Table 14.

Table 14 Modern algorithms

Category Typical algorithm

Clustering algorithm based on kernel K-means, kernel SOM, kernel

kernel FCM, SVC, MMC, MKC
Clustering algorithm based on Methods for generating the set of
ensemble clusters: 4 types Consensus
function: CSPA, HGPA, MCLA,
VM, HCE, LAC, WPCK, sCSPA,
sMCLA, sHBGPA
Clustering algorithm based on ACO_based(LF), PSO_based,
swarm intelligence SFLA_based, ABC_based
Clustering algorithm based on QC, DQC
quantum theory
Clustering algorithm based on SM, NJW
spectral graph theory
Clustering algorithm based on AP
affinity propagation
Clustering algorithm based on DD
density and distance
Clustering algorithm for spatial DBSCAN, STING, Wavecluster,
data CLARANS
Clustering algorithm for data STREAM, CluStream, HPStream,
stream DenStream
Clustering algorithm for K-means, BIRCH, CLARA, CURE,
large-scale data DBSCAN, DENCLUE,
Wavecluster, FC

123
Ann. Data. Sci.

Table 15 Time complexity

kernel K-means kernel SOM kernel FCM SVC MMC MKC

(kernel) (kernel) (kernel) (kernel) (kernel) (kernel)

High High High High High High

5.1 Clustering Algorithm Based on Kernel

The basic idea of this kind of clustering algorithms is that data in the input space is
transformed into the feature space of high dimension by the nonlinear mapping for
the cluster analysis. The typical algorithms of this kind of clustering include kernel K-
means [76], kernel SOM [77], kernel FCM [78], SVC [79], MMC [80] and MKC [81].
The basic idea of kernel K-means, kernel SOM and kernel FCM is to take advantage of
the kernel method and the original clustering algorithm, transforming the original data
into a high dimensional feature space by nonlinear kernel function in order to carry out
the original clustering algorithm. The core idea of SVC is to find the sphere with the
minimum radius that can cover all the data point in the high dimensional feature space,
then map the sphere back into the original data space to form the isoline, namely the
border of clusters, covering the data, and the data in the closed isoline should belong
to the same cluster. MMC tries to find the hyperplane with the maximum margin
to cluster and it can be promoted for the multi-label clustering problem. MKC, an
improvement of MMC, tries to find the best hyperplane based on several kernels to
cluster. MMC and MKC share the limitation of computation to a degree.
For more detailed information about this kind of clustering algorithms, you can
refer to [82–84].
Analysis:
(1) Time complexity (Table 15):
(2) The time complexity of this kind of clustering algorithms is generally high for
the kernel involved in the algorithm;
(3) Advantages: more easy to cluster in the high dimensional feature space, suitable
for data with arbitrary shape, able to analyze the noise and separate the overlapping
clusters, and not needed to have the preliminary knowledge about the topology
of data;
(4) Disadvantages: the clustering result sensitive to the type of kernel and its para-
meters, time complexity being high, and not suitable for large-scale data.

5.2 Clustering Algorithm Based on Ensemble

Clustering algorithm based on ensemble is also called ensemble clustering, of which

the core idea is to generate a set of initial clustering results by a particular method
and the final clustering result is got by integrating the initial clustering results.
There are mainly 4 kinds of methods to get the set of initial clustering results as
follows:

123
Ann. Data. Sci.

Table 16 Consensus functions

Name Typical algorithm and application

Based on co-association matrix [85]

Based on graph partition CSPA, HGPA and MCLA [86]
Based on relabeling and voting VM [88]
Based on the hybrid model [89]
Based on information theory [90]
Based on genetic algorithm HCE [91]
Based on local adaptation LAC [92]
Based on kernel method WPCK [93]
Based on fuzzy theory sCSPA, sMCLA and sHBGPA [94]

(1) For the same data set, employ the same algorithm with the different parameters
or the different initial conditions [85];
(2) For the same data set, employ the different algorithms [86];
(3) For the subsets, carry out the clustering respectively [86];
(4) For the same data set, carry out the clustering in different feature spaces based on
different kernels [87].
The initial clustering results are integrated by means of the consensus function.
The consensus functions can be divided into the following 9 categories, summarized
in Table 16:
For more detailed information about this kind of clustering algorithms, you can
refer to [95].
Analysis:
(1) The time complexity of this kind of algorithm is based on the specific method and
algorithms involved in the algorithm;
(2) Advantages: robust, scalable, able to be parallel and taking advantage of the
strengths of the involved algorithms;
(3) Disadvantages: inadequate understanding about the difference among the initial
clustering results, existing deficiencies of the design of the consensus function.

5.3 Clustering Algorithm Based on Swarm Intelligence

The basic idea of this kind of clustering algorithms is to simulate the changing
process of the biological population. Typical algorithms include the 4 main cate-
gories: ACO_based [96,97], PSO_based [97,98], SFLA_based [99] and ABC_based
[100]. The core idea of LF [101], the typical algprithm of the ACO_based, is that data
is distributed randomly on the grid of two dimensions first, then the data is selected
or not for further operation based on the decision of an ant and this process is iterated
until a satisfactory clustering result is got. The PSO_based algorithms regard the data
point as a particle. The initial clusters of particles is got by the other clustering algo-
rithm first, then the clusters of particles is updated continuously based on the center

123
Ann. Data. Sci.

Table 17 Time complexity

ACO_based (LF) PSO_based SFLA_based ABC_based

High High High High

of clusters and the location and speed of each particle, until a satisfactory clustering
result is got. The core idea of the SFLA_based algorithms is to simulate the infor-
mation interaction of frogs and taking advantage of the local search and the global
information interaction. The core idea of the ABC_based algorithms is to simulate the
foraging behavior of three types of bee, of which the duty is to determine the food
source, in a bee population and making use of the exchange of local information and
global information for clustering.
For more detailed information about this kind of clustering algorithms, you can
refer to [102–104].
Analysis:
(1) Time complexity (Table 17):
(2) The time complexity of this kind of algorithm is high, mainly for the large number
of iterations;
(3) Advantages: algorithm with the character of overcoming being easily drawn into
local optimal and getting the global optimal, easy to understand the algorithm;
(4) Disadvantages: low scalability, low operating efficiency and not suitable for high
dimensional or large-scale data.

5.4 Clustering Algorithm Based on Quantum Theory

The clustering algorithm based on quantum theory is called quantum clustering, of

which the basic idea is to study the distribution law of sample data in the scale space
by studying the distribution law of particles in the energy field. The typical algorithms
of this kind include QC [105,106] and DQC [107]. The core idea of QC (quantum
clustering), suitable for high dimensional data, is to get the potential energy of each
object by Schrodinger Equation using the iterative gradient descent algorithm, regard
the object with low potential energy as the center of the cluster, and put the objects
into different clusters by the defined distance function. DQC, an improvement of QC,
adopts the time-based Schrodinger Equation in order to study the change of the original
data set and the structure of the quantum potential energy function dynamically.
For more detailed information about this kind of clustering algorithms, you can
refer to [108–110].
Analysis:
(1) Time complexity (Table 18):
(2) The time complexity of QC is high, for the process of solving the Schrodinger
Equation and the large number of iterations;
(3) The time complexity of DQC which is more practical compared with DQ, is
middle for the process of solving the Schrodinger Equation;

123
Ann. Data. Sci.

Table 18 Time complexity

QC DQC

(Schrodinger Equation + a large number of iterations) (Schrodinger Equation)

High Middle

(4) Advantages: the number of parameters involved in this kind of algorithm being
small, the determination of the center of a cluster based on the potential informa-
tion of sample data;
(5) Disadvantages: the clustering result sensitive to the parameters of the algorithm,
the algorithm model not able to describe the change law of data completely.

5.5 Clustering Algorithm Based on Spectral Graph Theory

The basic idea of this kind of clustering algorithms is to regard the object as the
vertex and the similarity among objects as the weighted edge in order to transform
the clustering problem into a graph partition problem. And the key is to find a method
of graph partition making the weight of connection between different groups small
as much as possible and the total weight of connection among the edges within the
same group high as much as possible [111]. The typical algorithms of this kind of
clustering can be mainly divided into two categories, recursive spectral and multiway
spectral and the typical algorithms of this two categories are SM [50] and NJW [51]
respectively. The core idea of SM which is usually used for image segmentation is to
minimize Normalized Cut by heuristic method, based on the eigenvector. And NJW
carries out the clustering analysis in the feature space constructed by the eigenvectors
corresponding to the k largest eigenvalues of the Laplacian matrix.
For more detailed information about this kind of clustering algorithms, you can
refer to [51,84,112–114].
Analysis:

(1) Time complexity (Table 19):

(2) The time complexity of SM is high, for the process of figuring out the eigenvectors
and the heuristic method involved in the algorithm;
(3) The time complexity of NJW is high, for the process of figuring out the eigen-
vectors;
(4) Advantages: suitable for the data set with arbitrary shape and high dimension,
converged to the global optimal, only the similarity matrix needed as the input,
and not sensitive to the outliers;
(5) Disadvantages: the clustering result sensitive to the scaling parameter, time com-
plexity relatively high, unclear about the construction of similarity matrix, the
selection of eigenvector not optimized and the number of clusters needed to be
preset.

123
Ann. Data. Sci.

Table 19 Time complexity

SM NJW

(Eigenvector + heuristic method) (Eigenvector)

High High

5.6 Clustering Algorithm Based on Affinity Propagation

AP (affinity propagation clustering) is a significant algorithm, which was proposed in

Science in 2007. The core idea of AP is to regard all the data points as the potential
cluster centers and the negative value of the Euclidean distance between two data
points as the affinity. So, the sum of the affinity of one data point for other data
points is bigger, the probability of this data point to be the cluster center is higher. AP
algorithm takes the greedy strategy which maximizes the value of the global function
of the clustering network during every iteration [15].
For more detailed information about this kind of clustering algorithms, you can
refer to [115–117].
Analysis:
(1) The time complexity of AP is O(nˆ2*logn);
(2) Advantages: simply and clear algorithm idea, insensitive to the outliers and the
number of clusters not needed to be preset;
(3) Disadvantages: high time complexity, not suitable for very large data set, and the
clustering result sensitive to the parameters involved in AP algorithm.

5.7 Clustering Algorithm Based on Density and Distance

DD (Density and distance-based clustering) is another significant clustering algorithm

proposed in Science in 2014 [118], of which the core idea is novel. And the main
characteristic of DD is for the description of the cluster center, which is shown as
follows:
(1) with high local density: the number of data points near the cluster center within
a certain scope must be big enough;
(2) away from other data points with high local density: cluster center must be away
from other data points that could be the center of a cluster.
The core idea of DD is to figure out, based on the distance function, the local density
of each data point and the shortest distance among each data point and other data
points with higher local density in order to construct the decision graph first, select
the cluster centers based on the decision graph then, and put the remaining data points
into the nearest cluster with higher local density at last.
Analysis:
(1) The time complexity of DD is O(nˆ2);
(2) Advantages: simply and clear algorithm idea, suitable for the data set with arbi-
trary shape and insensitive to the outliers;

123
Ann. Data. Sci.

Table 20 Time complexity

DBSCAN STING Wavecluster CLARANS

O(n*logn) O(n) O(n) O(nˆ2)

Middle Low Low High

(3) Disadvantages: relatively high time complexity, relatively strong subjectivity for
the selection of the cluster center based on the decision graph and the clustering
result sensitive to the parameters involved in DD algorithm.

5.8 Clustering Algorithm for Spatial Data

Spatial data refers to the data with the two dimensions, time and space, at the same time,
sharing the characteristics of large in scale, high in speed and complex in information.
The typical algorithms of this kind of clustering include DBSCAN [36], STING [52],
Wavecluster [54] and CLARANS [11]. The core idea of Wavecluster which can be
used for parallel processing is to carry out the clustering in the new feature space by
applying the Wavelet Transform to the original data. And the core idea of CLARANS
is to sample based on CLARA [10] and carry out clustering by PAM [9]. DBSCAN
has been discussed in the section Clustering algorithm based on density and STING
has been discussed in the section Clustering algorithm based on grid.
For more detailed information about this kind of clustering algorithms, you can
refer to [119–122], ST-DBSCAN [123].
Time complexity (Table 20):

5.9 Clustering Algorithm for Data Stream

Data stream shares the characteristics of arriving based on sequence, large in scale and
limited frequency of reading. The typical algorithms of this kind of clustering include
STREAM [124], CluStream [125], HPStream [126], DenStream [127] and the latter
three are incremental algorithms. STREAM, based on the idea of divide and conquer,
deals with the data successively according to the sequence of data arriving in order to
construct the hierarchical clustering structure. CluStream, which mainly deals with the
shortcoming of STREAM that only describing the original data statically, regards data
as a dynamic changing process. So CluStream can not only give the timely response for
a request, but it also gives the clustering result in terms of different time granularities
by figuring out the Micro-clusters online and offline. HPStream, an improvement of
CluStream, takes the attenuation of data’s influence over time into consideration and
is more suitable for clustering data with high dimension. DenStream, which takes the
core idea of the clustering algorithm based on density, is suitable for the nonconvex
data set and can deal with outliers efficiently, compared with the algorithms mentioned
above in this section.
For more detailed information about this kind of clustering algorithms, you can
refer to [128–131], D-Stream [41,132].
Time complexity (Table 21):
The time complexity of CluStream, HPStream and DenStream is involved in the
online and offline processes.

123
Ann. Data. Sci.

Table 21 Time complexity

STREAM CluStream HPStream DenStream

O(kn) (online and offline processes)

Low Low

5.10 Clustering Algorithm for Large-Scale Data

Big data shares the characteristics of 4 V’s, large in volume, rich in variety, high in
velocity and doubt in veracity [133]. The main basic ideas of clustering for big data
can be summarized in the following 4 categories:
(1) sample clustering [10,18];
(2) data merged clustering [17,134];
(3) dimension-reducing clustering [135,136];
(4) parallel clustering [114,137–139];
Typical algorithms of this kind of clustering are K-means [7], BIRCH [17], CLARA
[10], CURE [18], DBSCAN [36], DENCLUE [43], Wavecluster [54] and FC [59].
For more detailed information about this kind of clustering algorithms, you can
refer to [2,13,140,141].
The time complexity of DENCLUE is O(nlogn) and the complexities of K-means,
BIRCH, CLARA, CURE, DBSCAN, Wavecluster and FC have been described before
in other sections.

6 Conclusions

This paper starts at the basic definitions of clustering and the typical procedure, lists the
commonly used distance (dissimilarity) functions, similarity functions, and evaluation
indicators that lay the foundation of clustering, and analyzes the clustering algorithms
from two perspectives, the traditional ones that contain 9 categories including 26
algorithms and the modern ones that contain 10 categories including 45 algorithms.
The detailed and comprehensive comparisons of all the discussed clustering algorithms
are summarized in Appendix Table 22.
The main purpose of the paper is to introduce the basic and core idea of each
commonly used clustering algorithm, specify the source of each one, and analyze the
advantages and disadvantages of each one. It is hard to present a complete list of all the
clustering algorithms due to the diversity of information, the intersection of research
fields and the development of modern computer technology. So 19 categories of the
commonly used clustering algorithms, with high practical value and well studied, are
selected and one or several typical algorithm(s) of each category is(are) discussed in
detail so as to give readers a systematical and clear view of the important data analysis
method, clustering.
Acknowledgments This work has been partially supported by grants form National Natural Science Foun-
dations of China (Nos. 61472390, 11271361, 71331005, and 11226089), Major International (Regional)
Joint Research Project (No. 71110107026).

Appendix

123
123
Table 22 The detailed and comprehensive comparisons of all the discussed clustering algorithms

Category Typical Complexity Scalability For large- For high Shape Sensitive Sensitive to References
algorithm (time) scale dimensional of to the noise/out
data data suitable sequence of lier
data set inputting
data

Based on K-means Low 0(knt) Middle Yes no Convex Highly Highly [7]
partition
K-medoids High O(k(n-k)ˆ2) Low No No Convex Moderately Little [8]
PAM High 0(kˆ3*nˆ2) Low No No Convex Moderately Little [9]
CLARA Middle High Yes No Convex Moderately Little [10]
0(ksˆ2+k(n-k))
CLARANS High 0(nˆ2) Middle Yes No Convex Highly Little [11]
AP * * * * * * * [15]
Based on BIRCH Low 0(n) High Yes No Convex Moderately Little [17]
hierarchy
CURE Low 0(sˆ2*logs), High Yes Yes Arbitrary Moderately Little [18]
ROCK High 0(nˆ2*logn) Middle No Yes Arbitrary Moderately Little [19]
Chameleon High 0(nˆ2) High No No Arbitrary Moderately Little [20]
Based on fuzzy FCM Low 0(n) Middle No No Convex Moderately Highly [23–25]
theory
FCS High (kernel) Low No No Arbitrary Moderately Highly [26]
MM Middle 0(vˆ2*n) Low No No Arbitrary Moderately Little [27]
Ann. Data. Sci.
Ann. Data. Sci.

Table 22 continued

Based on DBCLASD Middle 0(n*logn) Middle Yes Yes Arbitrary Little Little [31]
distribution
GMM High 0(nˆ2*kt) High No No Arbitrary Highly Little [32]
Based on DBSCAN Middle 0(n*logn) Middle Yes No Arbitrary Moderately Little [36]
density
OPTICS Middle 0(n*logn) Middle Yes No Arbitrary Little Little [37]
Mean-shift High (kernel) Low No No Arbitrary Little Little [38]
DENCLUE * * * * * * * [43]
Based on graph CLICK Low 0(k*f(v,e)) High Yes No Arbitrary Highly Highly [44]
theory
MST Middle 0(e*logv) High Yes No Arbitrary Highly Highly [45]
SM * * * * * * * [50]
NJW * * * * * * * [51]

123
Table 22 continued

123
Category Typical Complexity Scalability For large- For high Shape Sensitive Sensitive to References
algorithm (time) scale dimensional of to the noise/out
data data suitable sequence of lier
data set inputting
data

Based on grid STING Low 0(n) High Yes Yes Arbitrary Little Little [52]
CLIQUE Low 0(n+kˆ2) High No Yes Convex Little Moderately [53]
Wavecluster * * * * * * * [54]
Based on FC Low 0(n) High Yes Yes Arbitrary Highly Little [59]
fractal
theory
Based on model COBWEB Low (distribution) Middle Yes No Arbitrary Little Moderately [64]
GMM * * * * * * * [32]
SOM High (layer) Low No Yes Arbitrary Little Little [65]
ART Middle (type+layer) High Yes No Arbitrary Highly Highly [66–69]
Based on kernel kernel K-means High (kernel) Middle No No Arbitrary Moderately Little [76]
kernel SOM High (kernel) High No No Arbitrary Little Little [77]
kernel FCM High (kernel) Middle No No Arbitrary Moderately Little [78]
SVC High (kernel) Low No No Arbitrary Little Little [79]
MMC High (kernel) Low No No Arbitrary Little Little [80]
MKC High (kernel) Low No No Arbitrary Little Little [81]
Based on NA NA NA NA NA NA NA NA [85–94]
ensemble
Ann. Data. Sci.
Table 22 continued

Category Typical Complexity (time) Scalability For For high Shape Sensitive Sensitive to References
algorithm large- dimensional of to the noise/out
scale data suitable sequence of lier
Ann. Data. Sci.

data data set inputting

data

Based on swarm ACO_based High Low No No Arbitrary Little Highly [101]

intelligence (LF) (iterations)
PSO_based High Low No No Arbitrary Moderately moderately [98]
(iterations)
SFLA_based High Low No No Arbitrary Moderately moderately [97,99]
(iterations)
ABC_based High Low No No Arbitrary Moderately moderately [100]
(iterations)
Based on QC High Middle No No Convex Little Little [105,106]
quantum (Schrodinger
theory Equation
+iterations)
DQC Middle Middle No No Convex Little Little [107]
(Schrodinger
Equation)
Spectral SM High Middle No Yes Arbitrary Little Little [50]
clustering (eigenvector
+heuristics)
NJW High Middle No Yes Arbitrary Little Little [51]
(eigenvector)
Based on AP High Low No No Convex Moderately Little [15]
affinity 0(nˆ2*logn)
propagation
Based on DD High 0(nˆ2) Low No No Arbitrary Little Little [118]
density and
distance

123
Table 22 continued

Category Typical Complexity Scalability For large- For high Shape Sensitive Sensitive to References

123
algorithm (time) scale dimensional of to the noise/out
data data suitable sequence of lier
data set inputting
data

For spatial DBSCAN * * * * * * * [36]

data
STING * * * * * * * [52]
Wavecluster Low 0(n) High Yes No Arbitrary Little [54]
CLARANS * * * * * * * [11]
For data STREAM Low 0(kn) Middle Yes No Convex Highly Highly [124]
stream
CluStream Low (online+offline)High Yes No Convex Highly Highly [125]
HPStream High Yes Yes Convex Highly Highly [126]
DenStream High Yes No Arbitrary Highly Little [127]
For K-means * * * * * * * [7]
large-scale
data
BIRCH * * * * * * * [17]
CLARA * * * * * * * [10]
CURE * * * * * * * [18]
DBSCAN * * * * * * * [36]
DENCLUE Middle 0(nlogn) Middle Yes Yes Arbitrary Moderately Little [43]
Wavecluster * * * * * * * [54]
FC * * * * * * * [59]

Label NA in the row of algorithm based on ensemble indicates that the evaluation value depends on the specific selected method/model/algorithm
Label * indicates that this algorithm has been or will be discussed in other section
Ann. Data. Sci.
Ann. Data. Sci.

References
1. Jain A, Dubes R (1988) Algorithms for clustering data. Prentice-Hall, Inc, Upper Saddle River
2. Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16:645–678
3. Everitt B, Landau S, Leese M (2001) Clustering analysis, 4th edn. Arnold, London
4. Gower J (1971) A general coefficient of similarity and some of its properties. Biometrics 27:857–871
5. Estivill-Castro V (2002) Why so many clustering algorithms: a position paper. ACM SIGKDD Explor
Newsl 4:65–75
6. Färber I, Günnemann S, Kriegel H, Kröger P, Müller E, Schubert E, Seidl T, Zimek A (2010) On using
class-labels in evaluation of clusterings. In MultiClust: 1st international workshop on discovering,
summarizing and using multiple clusterings held in conjunction with KDD, Washington, DC
7. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. Proc
Fifth Berkeley Symp Math Stat Probab 1:281–297
8. Park H, Jun C (2009) A simple and fast algorithm for K-medoids clustering. Expert Syst Appl
36:3336–3341
9. Kaufman L, Rousseeuw P (1990) Partitioning around medoids (program pam). Finding groups in
data: an introduction to cluster analysis. Wiley, Hoboken
10. Kaufman L, Rousseeuw P (2008) Finding groups in data: an introduction to cluster analysis, vol 344.
Wiley, Hoboken. doi:10.1002/9780470316801
11. Ng R, Han J (2002) Clarans: a method for clustering objects for spatial data mining. IEEE Trans
Knowl Data Eng 14:1003–1016
12. Boley D, Gini M, Gross R, Han E, Hastings K, Karypis G, Kumar V, Mobasher B, Moore J (1999)
Partitioning-based clustering for web document categorization. Decis Support Syst 27:329–341
13. Jain A (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31:651–666
14. Velmurugan T, Santhanam T (2011) A survey of partition based clustering algorithms in data mining:
an experimental approach. Inf Technol J 10:478–484
15. Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science
315(5814):972–976
16. Johnson S (1967) Hierarchical clustering schemes. Psychometrika 32:241–254
17. Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large
databases. ACM SIGMOD Rec 25:103–104
18. Guha S, Rastogi R, Shim K (1998) CURE: an efficient clustering algorithm for large databases. ACM
SIGMOD Rec 27:73–84
19. Guha S, Rastogi R, Shim K (1999) ROCK: a robust clustering algorithm for categorical attributes.
In: Proceedings of the 15th international conference on data engineering, pp 512-521
20. Karypis G, Han E, Kumar V (1999) Chameleon: hierarchical clustering using dynamic modeling.
Computer 32:68–75
21. Murtagh F (1983) A survey of recent advances in hierarchical clustering algorithms. Comput J 26:354–
359
22. Carlsson G, Mémoli F (2010) Characterization, stability and convergence of hierarchical clustering
methods. J Mach Learn Res 11:1425–1470
23. Dunn J (1973) A fuzzy relative of the ISODATA process and its use in detecting compact well-
separated clusters. J Cybern 3:32–57
24. Bezdek J (1981) Pattern recognition with fuzzy objective function algorithms. Plenum, New York
25. Bezdek J, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci
10:191–203
26. Dave R, Bhaswan K (1992) Adaptive fuzzy c-shells clustering and detection of ellipses. IEEE Trans
Neural Netw 3:643–662
27. Yager R, Filev D (1994) Approximate clustering via the mountain method. IEEE Trans Syst Man
Cybern 24:1279–1284
28. Yang M (1993) A survey of fuzzy clustering. Math Comput Model 18:1–16
29. Baraldi A, Blonda P (1999) A survey of fuzzy clustering algorithms for pattern recognition. I. IEEE
Trans Syst Man Cybern Part B 29:778–785
30. Höppner F (1999) Fuzzy cluster analysis: methods for classification, data analysis and image recog-
nition. Wiley, Hoboken

123
Ann. Data. Sci.

31. Xu X, Ester M, Kriegel H, Sander J (1998) A distribution-based clustering algorithm for mining in
large spatial databases. In: Proceedings of the fourteenth international conference on data engineering,
pp 324-331
32. Rasmussen C (1999) The infinite Gaussian mixture model. Adv Neural Inf Process Syst 12:554–560
33. Preheim S, Perrotta A, Martin-Platero A, Gupta A, Alm E (2013) Distribution-based clustering: using
ecology to refine the operational taxonomic unit. Appl Environ Microbiol 79:6593–6603
34. Jiang B, Pei J, Tao Y, Lin X (2013) Clustering uncertain data based on probability distribution
similarity. IEEE Trans Knowl Data Eng 25:751–763
35. Kriegel H, Kröger P, Sander J, Zimek A (2011) Densitybased clustering. Wiley Interdiscip Rev
1:231–240
36. Ester M, Kriegel H, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large
spatial databases with noise. In: Proceedings of the second ACM SIGKDD international conference
on knowledge discovery and data mining, pp 226–231
37. Ankerst M, Breunig M, Kriegel H, Sander J (1999) OPTICS: ordering points to identify the clustering
structure. In: Proceedings on 1999 ACM SIGMOD international conference on management of data,
vol 28, pp 49–60
38. Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE
Trans Pattern Anal Mach Intell 24:603–619
39. Januzaj E, Kriegel H, Pfeifle M (2004) Scalable density-based distributed clustering. In: Proceedings
of the 8th european conference on principles and practice of knowledge discovery in databases, pp
231–244
40. Kriegel H, Pfeifle M (2005) Density-based clustering of uncertain data. In: Proceedings of the eleventh
ACM SIGKDD international conference on knowledge discovery in data mining, pp 672–677
41. Chen Y, Tu L (2007) Density-based clustering for real-time stream data. In: Proceedings of the 13th
ACM SIGKDD international conference on knowledge discovery and data mining, pp 133–142
42. Duan L, Xu L, Guo F, Lee J, Yan B (2007) A local-density based spatial clustering algorithm with
noise. Inf Syst 32:978–986
43. Hinneburg A, Keim D (1998) An efficient approach to clustering in large multimedia databases with
noise. In Proceedings of the 4th ACM SIGKDD international conference on knowledge discovery
and data mining 98: 58–65
44. Sharan R, Shamir R (2000) CLICK: a clustering algorithm with applications to gene expression
analysis. In: Proc international conference intelligent systems molecular biolgy, pp 307–316
45. Jain A, Murty M, Flynn P (1999) Data clustering: a review. ACM Comput Surv (CSUR) 31:264–323
46. Ben-Dor A, Shamir R, Yakhini Z (1999) Clustering gene expression patterns. J Comput Biol 6:281–
297
47. Hartuv E, Shamir R (2000) A clustering algorithm based on graph connectivity. Inf Process Lett
76:175–181
48. Estivill-Castro V, Lee I (2000) Amoeba: hierarchical clustering based on spatial proximity using
delaunay diagram. In: Proceedings of the 9th international symposium on spatial data handling,
Beijing
49. Cherng J, Lo M (2001) A hypergraph based clustering algorithm for spatial data sets. In: Proceedings
of the 2001 IEEE international conference on data mining, pp 83–90
50. Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell
22:888–905
51. Ng A, Jordan M, Weiss Y (2002) On spectral clustering: analysis and an algorithm. Adv Neural Inf
Process Syst 2:849–856
52. Wang W, Yang J, Muntz R (1997) STING: a statistical information grid approach to spatial data
mining. In VLDB, pp 186–195
53. Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high
dimensional data for data mining applications. In: Proceedings 1998 ACM sigmod international
conference on management of data, vol 27, pp 94–105
54. Sheikholeslami G, Chatterjee S, Zhang A (1998) Wavecluster: A multi-resolution clustering approach
for very large spatial databases. In: VLDB, pp 428–439
55. Ma E, Chow T (2004) A new shifting grid clustering algorithm. Pattern Recognit 37:503–514
56. Park N, Lee W (2004) Statistical grid-based clustering over data streams. ACM SIGMOD Rec 33:32–
37

123
Ann. Data. Sci.

57. Pilevar A, Sukumar M (2005) GCHL: a grid-clustering algorithm for high-dimensional very large
spatial data bases. Pattern Recognit Lett 26:999–1010
58. Mandelbrot B (1983) The fractal geometry of nature. Macmillan, London
59. Barbará D, Chen P (2000) Using the fractal dimension to cluster datasets. In: Proceedings of the sixth
ACM SIGKDD international conference on knowledge discovery and data mining, pp 260–264
60. Zhang A, Cheng B, Acharya R (1996) A fractal-based clustering approach in large visual database
systems. In Representation and retrieval of visual media in, multimedia systems, pp 49–68
61. Menascé D, Abrahao B, Barbará D, Almeida V, Ribeiro F (2002) Fractal characterization of web
workloads. In: Proceedings of the “ Web Engineering” Track of WWW2002, pp 7–11
62. Barry R, Kinsner W (2004) Multifractal characterization for classification of network traffic. Conf
Electr Comput Eng 3:1453–1457
63. Al-Shammary D, Khalil I, Tari Z (2014) A distributed aggregation and fast fractal clustering approach
for SOAP traffic. J Netw Comput Appl 41:1–14
64. Fisher D (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2:139–172
65. KohonenKohonen T (1990) The self-organizing map. Proc IEEE 78:1464–1480
66. Carpenter G, Grossberg S (1987) A massively parallel architecture for a self-organizing neural pattern
recognition machine. Comput Vis Gr Image Process 37:54–115
67. Carpenter G, Grossberg S (1988) The ART of adaptive pattern recognition by a self-organizing neural
network. Computer 21:77–88
68. Carpenter G, Grossberg S (1987) ART 2: self-organization of stable category recognition codes for
analog input patterns. Appl Opt 26:4919–4930
69. Carpenter G, Grossberg S (1990) ART 3: hierarchical search using chemical transmitters in self-
organizing pattern recognition architectures. Neural Netw 3:129–152
70. Meilă M, Heckerman D (2001) An experimental comparison of model-based clustering methods.
Mach Learn 42:9–29
71. Fraley C, Raftery A (2002) Model-based clustering, discriminant analysis, and density estimation. J
Am Stat Assoc 97:611–631
72. McLachlan G, Bean R, Peel D (2002) A mixture model-based approach to the clustering of microarray
expression data. Bioinformatics 18:413–422
73. Medvedovic M, Sivaganesan S (2002) Bayesian infinite mixture model based clustering of gene
expression profiles. Bioinformatics 18:1194–1206
74. Zhong S, Ghosh J (2003) A unified framework for model-based clustering. J Mach Learn Res 4:1001–
1037
75. McNicholas P, Murphy T (2010) Model-based clustering of microarray expression data via latent
Gaussian mixture models. Bioinformatics 26:2705–2712
76. Schölkopf B, Smola A, Müller K (1998) Nonlinear component analysis as a kernel eigenvalue problem.
Neural Comput 10:1299–1319
77. MacDonald D, Fyfe C (2000) The kernel self-organising map. Proc Fourth Int Conf Knowl-Based
Intell Eng Syst Allied Technol 1:317–320
78. Wu Z, Xie W,Yu J (2003) Fuzzy c-means clustering algorithm based on kernel method. In: Proceedings
of the fifth ICCIMA, pp 49–54
79. Ben-Hur A, Horn D, Siegelmann H, Vapnik V (2002) Support vector clustering. J Mach Learn Res
2:125–137
80. Xu L, Neufeld J, Larson B, Schuurmans D (2004) Maximum margin clustering. In: Advances in
neural information processing systems, pp 1537–1544
81. Zhao B, Kwok J, Zhang C (2009) Multiple kernel clustering. In SDM, pp 638–649
82. Müller K, Mika S, Rätsch G, Tsuda K, Schölkopf B (2001) An introduction to kernel-based learning
algorithms. IEEE Trans Neural Netw 12:181–201
83. Girolami M (2002) Mercer kernel-based clustering in feature space. IEEE Trans Neural Netw 13:780–
784
84. Filippone M, Camastra F, Masulli F, Rovetta S (2008) A survey of kernel and spectral methods for
clustering. Pattern Recognit 41:176–190
85. Fred A, Jain A (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans
Pattern Anal Mach Intell 27:835–850
86. Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple
partitions. J Mach Learn Res 3:583–617

123
Ann. Data. Sci.

87. Fern X, Brodley C (2003) Random projection for high dimensional data clustering: a cluster ensemble
approach. ICML 3:186–193
88. Dimitriadou E, Weingessel A, Hornik K (2001) Voting-merging: an ensemble method for clustering.
In: ICANN, pp 217–224
89. Topchy A, Jain A, Punch W (2004) A mixture model for clustering ensembles. In: Proceedings of
the SIAM international conference on data mining, pp 379
90. Topchy A, Jain A, Punch W (2005) Clustering ensembles: models of consensus and weak partitions.
IEEE Trans Pattern Anal Mach Intell 27:1866–1881
91. Yoon H, Ahn S, Lee S, Cho S, Kim J (2006) Heterogeneous clustering ensemble method for combining
different cluster results. In: Data mining for biomedical applications, pp 82–92
92. Domeniconi C, Gunopulos D, Ma S, Yan B, Al-Razgan M, Papadopoulos D (2007) Locally adaptive
metrics for clustering high dimensional data. Data Min Knowl Discov 14:63–97
93. Vega-Pons S, Correa-Morris J, Ruiz-Shulcloper J (2010) Weighted partition consensus via kernels.
Pattern Recognit 43:2712–2724
94. Punera K, Ghosh J (2008) Consensus-based ensembles of soft clusterings. Appl Artif Intell 22:780–
810
95. Vega-Pons S, Ruiz-Shulcloper J (2011) A survey of clustering ensemble algorithms. Int J Pattern
Recognit Artif Intell 25:337–372
96. Handl J, Meyer B (2007) Ant-based and swarm-based clustering. Swarm Intell 1:95–113
97. Abraham A, Das S, Roy S (2008) Swarm intelligence algorithms for data clustering. In: Soft computing
for knowledge discovery and data mining, pp 279–313
98. Van der Merwe D, Engelbrecht A (2003) Data clustering using particle swarm optimization. Congr
Evol Comput 1:215–220
99. Amiri B, Fathian M, Maroosi A (2009) Application of shuffled frog-leaping algorithm on clustering.
Int J Adv Manuf Technol 45:199–209
100. Karaboga D, Ozturk C (2011) A novel clustering approach: artificial bee colony (ABC) algorithm.
Appl Soft Comput 11:652–657
101. Lumer E, Faieta B (1994) Diversity and adaptation in populations of clustering ants. Proc Third Int
Conf Simul Adapt Behav 3:501–508
102. Shelokar P, Jayaraman V, Kulkarni B (2004) An ant colony approach for clustering. Anal Chim Acta
509:187–195
103. Karaboga D, Akay B (2009) A survey: algorithms simulating bee swarm intelligence. Artif Intell Rev
31:61–85
104. Xu R, Xu J, Wunsch D (2012) A comparison study of validity indices on swarm-intelligence-based
clustering. IEEE Trans Syst Man Cybern Part B 42:1243–1256
105. Horn D, Gottlieb A (2001) Algorithm for data clustering in pattern recognition problems based on
quantum mechanics. Phys Rev Lett 88:018702
106. Horn D, Gottlieb A (2001) The method of quantum clustering. In: Advances in neural information
processing systems, pp 769–776
107. Weinstein M, Horn D (2009) Dynamic quantum clustering: a method for visual exploration of struc-
tures in data. Phys Rev E 80:066117
108. Horn D (2001) Clustering via Hilbert space. Phys A 302:70–79
109. Horn D, Axel I (2003) Novel clustering algorithm for microarray expression data in a truncated SVD
space. Bioinformatics 19:1110–1115
110. Aïmeur E, Brassard G, Gambs S (2007) Quantum clustering algorithms. In: ICML, pp 1–8
111. Von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17:395–416
112. Yu S, Shi J (2003) Multiclass spectral clustering. In: Proceedings of the ninth IEEE international
conference on computer vision, pp 313–319
113. Verma D, Meila M (2003) A comparison of spectral clustering algorithms. University of Washington
Tech Rep UWCSE030501 1: 1–18
114. Chen W, Song Y, Bai H, Lin C, Chang E (2011) Parallel spectral clustering in distributed systems.
IEEE Trans Pattern Anal Mach Intell 33:568–586
115. Lu Z, Carreira-Perpinan M (2008) Constrained spectral clustering through affinity propagation. In:
IEEE conference on computer vision and pattern recognition, pp 1–8
116. Givoni I, Frey B (2009) A binary variable model for affinity propagation. Neural Comput 21:1589–
1600

123
Ann. Data. Sci.

117. Shang F, Jiao L, Shi J, Wang F, Gong M (2012) Fast affinity propagation clustering: a multilevel
approach. Pattern Recognit 45:474–486
118. Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344:1492–
1496
119. Ng R, Han J (1994) Efficient and effective clustering methods for spatial data mining. In: VLDB, pp
144–155
120. Sander J, Ester M, Kriegel H, Xu X (1998) Density-based clustering in spatial databases: the algorithm
gdbscan and its applications. Data Min Knowl Discov 2:169–194
121. Harel D, Koren Y (2001) Clustering spatial data using random walks. In: Proceedings of the seventh
ACM SIGKDD international conference on knowledge discovery and data mining, pp 281–286
122. Zaïane O, Lee C (2002) Clustering spatial data when facing physical constraints. In: Proceedings of
the IEEE international conference on data mining, pp 737–740
123. Birant D, Kut A (2007) ST-DBSCAN: an algorithm for clustering spatial-temporal data. Data Knowl
Eng 60:208–221
124. O’callaghan L, Meyerson A, Motwani R, Mishra N, Guha S (2002) Streaming-data algorithms for
high-quality clustering. In: ICDE, p 0685
125. Aggarwal C, Han J, Wang J, Yu P (2003) A framework for clustering evolving data streams. In:
VLDB, pp 81–92
126. Aggarwal C, Han J, Wang J, Yu P (2004) A framework for projected clustering of high dimensional
data streams. In: VLDB, pp 852–863
127. Cao F, Ester M, Qian W, Zhou A (2006) Density-based clustering over an evolving data stream with
noise. SDM 6:328–339
128. Guha S, Mishra N, Motwani R, O’Callaghan L (2000) Clustering data streams. In: Proceedings of
the 41st annual symposium on foundations of computer science, pp 359–366
129. Barbará D (2002) Requirements for clustering data streams. ACM SIGKDD Explor Newsl 3:23–27
130. Guha S, Meyerson A, Mishra N, Motwani R, O’Callaghan L (2003) Clustering data streams: theory
and practice. IEEE Trans Knowl Data Eng 15:515–528
131. Beringer J, Hüllermeier E (2006) Online clustering of parallel data streams. Data Knowl Eng 58:180–
204
132. Silva J, Faria E, Barros R, Hruschka E, de Carvalho A, Gama J (2013) Data stream clustering: a
survey. ACM Comput Surv 46:13
133. Leskovec J, Rajaraman A, Ullman JD (2014) Mining massive datasets. Cambridge University Press,
Cambridge
134. Steinbach M, Karypis G, Kumar V (2000) A comparison of document clustering techniques. KDD
Workshop Text Min 400:525–526
135. Parsons L, Haque E, Liu H (2004) Subspace clustering for high dimensional data: a review. ACM
SIGKDD Explor Newsl 6:90–105
136. Kriegel H, Kröger P, Zimek A (2009) Clustering high-dimensional data: a survey on subspace clus-
tering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Discov Data 3:1
137. Judd D, McKinley P, Jain A (1996) Large-scale parallel data clustering. In: Proceedings of the 13th
international conference on pattern recognition, vol 4, pp 488–493
138. Tasoulis D, Vrahatis M (2004) Unsupervised distributed clustering. In: Parallel and distributed com-
puting and networks, pp 347–351
139. Zhao W, Ma H, He Q (2009) Parallel k-means clustering based on mapreduce. In: Cloud computing,
pp 674–679
140. Herwig R, Poustka A, Müller C, Bull C, Lehrach H, O’Brien J (1999) Large-scale clustering of
cDNA-fingerprinting data. Genome Res 9:1093–1105
141. Hinneburg A, Keim D (2003) A general approach to clustering in large databases with noise. Knowl
Inf Syst 5:387–415

123
View publication stats

MIT6 006S20 Ps2-Solutions
No ratings yet
MIT6 006S20 Ps2-Solutions
10 pages
Paper-2 Clustering Algorithms in Data Mining A Review
No ratings yet
Paper-2 Clustering Algorithms in Data Mining A Review
7 pages
Graph Partitioning Advance Clustering Technique
No ratings yet
Graph Partitioning Advance Clustering Technique
14 pages
Aiml 5th Module Part2
No ratings yet
Aiml 5th Module Part2
28 pages
Module 4 ML
No ratings yet
Module 4 ML
11 pages
What Is Cluster Analysis?: - Cluster: A Collection of Data Objects
No ratings yet
What Is Cluster Analysis?: - Cluster: A Collection of Data Objects
51 pages
Xu, R., & Wunsch, D. (2005) - Survey of Clustering Algorithms
No ratings yet
Xu, R., & Wunsch, D. (2005) - Survey of Clustering Algorithms
35 pages
dm 4
No ratings yet
dm 4
76 pages
Unit - 4 - Modified
No ratings yet
Unit - 4 - Modified
152 pages
CLUSTRING
No ratings yet
CLUSTRING
13 pages
Ijcttjournal V1i1p12
No ratings yet
Ijcttjournal V1i1p12
3 pages
Cluster Analysis
No ratings yet
Cluster Analysis
60 pages
Data Mining - UNIT-IV
No ratings yet
Data Mining - UNIT-IV
24 pages
DM Clustering
No ratings yet
DM Clustering
51 pages
Clustering
No ratings yet
Clustering
5 pages
w6 Clustering
No ratings yet
w6 Clustering
29 pages
Unit-7 Finalized
No ratings yet
Unit-7 Finalized
20 pages
Unit-V Cluster Analysis?: Unsupervised Classification Stand-Alone Tool Preprocessing Step
No ratings yet
Unit-V Cluster Analysis?: Unsupervised Classification Stand-Alone Tool Preprocessing Step
24 pages
DWDM Unit-5
No ratings yet
DWDM Unit-5
52 pages
M4 - Clustering
No ratings yet
M4 - Clustering
43 pages
ML4 Unsupervised Learning
No ratings yet
ML4 Unsupervised Learning
60 pages
AIML Chapter 13
No ratings yet
AIML Chapter 13
26 pages
Fds Unit03
No ratings yet
Fds Unit03
11 pages
Week 9 - Clustering
No ratings yet
Week 9 - Clustering
63 pages
Data Clustering A Review
No ratings yet
Data Clustering A Review
60 pages
45
No ratings yet
45
16 pages
DW & DM Unit 4 Notes
No ratings yet
DW & DM Unit 4 Notes
40 pages
Chapter 5 Clustering
No ratings yet
Chapter 5 Clustering
40 pages
W6 Clustering
No ratings yet
W6 Clustering
29 pages
Survey of Clustering Algorithms: Rui Xu, Student Member, IEEE and Donald Wunsch II, Fellow, IEEE
No ratings yet
Survey of Clustering Algorithms: Rui Xu, Student Member, IEEE and Donald Wunsch II, Fellow, IEEE
59 pages
Ambo University Inistitute of Technology Department of Computer Science
No ratings yet
Ambo University Inistitute of Technology Department of Computer Science
13 pages
DM 10,11 Clustering PDF
No ratings yet
DM 10,11 Clustering PDF
65 pages
Supervised Learning vs. Unsupervised Learning
No ratings yet
Supervised Learning vs. Unsupervised Learning
7 pages
Unit 2 - Introduction to Cluster Analysis
No ratings yet
Unit 2 - Introduction to Cluster Analysis
53 pages
Clustering and Association Rule
No ratings yet
Clustering and Association Rule
69 pages
Sine Cosine Based Algorithm For Data Clustering
No ratings yet
Sine Cosine Based Algorithm For Data Clustering
5 pages
Lecture 01 - Unsupervised Learning (Optional)
No ratings yet
Lecture 01 - Unsupervised Learning (Optional)
57 pages
Data Clustering Seminar
No ratings yet
Data Clustering Seminar
34 pages
Data Clustering: A Review
No ratings yet
Data Clustering: A Review
60 pages
PR Assignment 02 - Seemal Ajaz (206979)
No ratings yet
PR Assignment 02 - Seemal Ajaz (206979)
5 pages
Survey of Clustering Algorithms: Rui Xu, Student Member, IEEE and Donald Wunsch II, Fellow, IEEE
No ratings yet
Survey of Clustering Algorithms: Rui Xu, Student Member, IEEE and Donald Wunsch II, Fellow, IEEE
34 pages
02 - Clustering
No ratings yet
02 - Clustering
43 pages
Cluster Analysis: Basic Concepts and Algorithms
No ratings yet
Cluster Analysis: Basic Concepts and Algorithms
141 pages
Cluster Analysis
No ratings yet
Cluster Analysis
29 pages
Machine Learning & Data Mining: Understanding
No ratings yet
Machine Learning & Data Mining: Understanding
7 pages
UNIT 4 NOTES
No ratings yet
UNIT 4 NOTES
66 pages
(PML ITS - Week 10) - Clustering
No ratings yet
(PML ITS - Week 10) - Clustering
42 pages
Clustering Theory Applications and Algorithms
No ratings yet
Clustering Theory Applications and Algorithms
9 pages
Data Mining: Clustering
No ratings yet
Data Mining: Clustering
46 pages
Research on k Mean Algorithm
No ratings yet
Research on k Mean Algorithm
5 pages
ML-UNIT-III
No ratings yet
ML-UNIT-III
12 pages
07Clustering
No ratings yet
07Clustering
34 pages
MDA Session 4
No ratings yet
MDA Session 4
5 pages
4 Clustering
No ratings yet
4 Clustering
9 pages
Cluster
100% (1)
Cluster
72 pages
CCPS521 WIN2023 Week08 Clustering
No ratings yet
CCPS521 WIN2023 Week08 Clustering
15 pages
Clustering Examples
No ratings yet
Clustering Examples
47 pages
ML Unit-4-1
No ratings yet
ML Unit-4-1
39 pages
Clustering
No ratings yet
Clustering
29 pages
Learn Statistics Fast: A Simplified Detailed Version for Students
From Everand
Learn Statistics Fast: A Simplified Detailed Version for Students
Hesbon R.M
No ratings yet
Conformal Mapping
From Everand
Conformal Mapping
Zeev Nehari
4/5 (1)
CDAC - Common Admission Test Syllabus Section A - English (20 Questions)
No ratings yet
CDAC - Common Admission Test Syllabus Section A - English (20 Questions)
5 pages
Unit 8 Sorting
No ratings yet
Unit 8 Sorting
37 pages
22CY205-AA
No ratings yet
22CY205-AA
3 pages
Data Structures: J.B. Institute of Engineering and Technology
No ratings yet
Data Structures: J.B. Institute of Engineering and Technology
270 pages
Systolic Arrays & Their Applications
No ratings yet
Systolic Arrays & Their Applications
36 pages
Lecture 2
No ratings yet
Lecture 2
21 pages
EXIT EXAM 2017
No ratings yet
EXIT EXAM 2017
66 pages
Master's Theorem
No ratings yet
Master's Theorem
4 pages
15 Class Design
No ratings yet
15 Class Design
12 pages
Detecting The Fault From Spectrograms by Using Genetic Algorithm Techniques
No ratings yet
Detecting The Fault From Spectrograms by Using Genetic Algorithm Techniques
10 pages
CS502 Finaltermsolved Mcqswithreferencesby Moaaz
No ratings yet
CS502 Finaltermsolved Mcqswithreferencesby Moaaz
43 pages
C ++
No ratings yet
C ++
187 pages
The Islamia University of Bahawalpur: Minutes
No ratings yet
The Islamia University of Bahawalpur: Minutes
3 pages
Big O Notation: 1 Comparing Algorithms
No ratings yet
Big O Notation: 1 Comparing Algorithms
6 pages
Dsa Basic Data Structure
No ratings yet
Dsa Basic Data Structure
72 pages
Adsa-1 Unit
No ratings yet
Adsa-1 Unit
44 pages
Daa Unit I
No ratings yet
Daa Unit I
15 pages
Common Functions Used in Analysis
No ratings yet
Common Functions Used in Analysis
7 pages
Data Structures & Algorithms Cheatsheet
No ratings yet
Data Structures & Algorithms Cheatsheet
5 pages
Tutorial 02
No ratings yet
Tutorial 02
1 page
9.5 Shapley Values: 9.5.1 General Idea
No ratings yet
9.5 Shapley Values: 9.5.1 General Idea
14 pages
Bcs 042
No ratings yet
Bcs 042
15 pages
Data Structures And Algorithms MCQ Questions
No ratings yet
Data Structures And Algorithms MCQ Questions
9 pages
Analys and Design of Algorithm
No ratings yet
Analys and Design of Algorithm
3 pages
Instant download Algorithmic number theory lattices number fields curves and cryptography 1st Edition J.P. Buhler pdf all chapter
100% (3)
Instant download Algorithmic number theory lattices number fields curves and cryptography 1st Edition J.P. Buhler pdf all chapter
60 pages
Dalalyan - 2017 - Theoretical Guarantees For Approximate Sampling From Smooth and Log-Concave Densities
No ratings yet
Dalalyan - 2017 - Theoretical Guarantees For Approximate Sampling From Smooth and Log-Concave Densities
26 pages
Preview - Looking For A Challenge
No ratings yet
Preview - Looking For A Challenge
48 pages
Time and Space Complexity
No ratings yet
Time and Space Complexity
13 pages
Assignment 1
No ratings yet
Assignment 1
18 pages

A Comprehensive Survey of Clustering Algorithms

Uploaded by

A Comprehensive Survey of Clustering Algorithms

Uploaded by

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

A Comprehensive Survey of Clustering Algorithms

Article in Annals of Data Science · August 2015

Dongkuan Xu Yingjie Tian

SEE PROFILE SEE PROFILE

The user has requested enhancement of the downloaded file.

A Comprehensive Survey of Clustering Algorithms

Dongkuan Xu1,2 · Yingjie Tian2,3

Received: 25 May 2015 / Revised: 18 July 2015 / Accepted: 31 July 2015

Abstract Data analysis is used as a common method in modern science research,

Keywords Clustering · Clustering algorithm · Clustering analysis · Survey ·

1 School of Mathematical Sciences, University of Chinese Academy of Sciences,

Clustering, considered as the most important question of unsupervised learning, deals

2 Distance and Similarity

Table 1 Distance functions

Name Formula Explanation

4 Traditional Clustering Algorithms

Table 2 Similarity functions

Name Function formula or Explanation

Jaccard similarity J (A, B) = |A∩B|

4.1 Clustering Algorithm Based on Partition

Table 3 Evaluation indicators

Name Formula or measure method Explanation

4.2 Clustering Algorithm Based on Hierarchy

Table 4 Evaluation indicators

Name Formula or measure method Explanation

Rand indicator T P+T N

Table 5 Traditional algorithms

Clustering algorithm based K-means, K-medoids, PAM,

Table 6 Time complexity

K-means K-medoids PAM CLARA CLARANS

O(knt) O(k(n-k)ˆ2) O(kˆ3*nˆ2) O(ksˆ2+k(n-k)) O(nˆ2)

Table 7 Time complexity

O(n) O(sˆ2*s), O(nˆ2*logn) O(nˆ2)

4.3 Clustering Algorithm Based on Fuzzy Theory

Table 8 Time complexity

O(n) (kernel) O(vˆ2*n)

Table 9 Time complexity

4.4 Clustering Algorithm Based on Distribution

Table 10 Time complexity

O(n*logn) O(n*logn) (kernel)

4.5 Clustering Algorithm Based on Density

(1) Time complexity (Table 10):

4.6 Clustering Algorithm Based on Graph Theory

According to this kind of clustering algorithms, clustering is realized on the graph

Table 11 Time complexity

O(k*f(v, e)) O(e*logv)

Table 12 Time complexity

4.7 Clustering Algorithm Based on Grid

4.8 Clustering Algorithm Based on Fractal Theory

4.9 Clustering Algorithm Based on Model

Table 13 Time complexity

(distribution) O(nˆ2*kt) (layer) (type+layer)

5 Modern Clustering Algorithms

Table 14 Modern algorithms

Category Typical algorithm

Clustering algorithm based on kernel K-means, kernel SOM, kernel

Table 15 Time complexity

kernel K-means kernel SOM kernel FCM SVC MMC MKC

(kernel) (kernel) (kernel) (kernel) (kernel) (kernel)

5.1 Clustering Algorithm Based on Kernel

5.2 Clustering Algorithm Based on Ensemble

Clustering algorithm based on ensemble is also called ensemble clustering, of which

Table 16 Consensus functions

Name Typical algorithm and application

Based on co-association matrix [85]

5.3 Clustering Algorithm Based on Swarm Intelligence

Table 17 Time complexity

ACO_based (LF) PSO_based SFLA_based ABC_based

High High High High

5.4 Clustering Algorithm Based on Quantum Theory

The clustering algorithm based on quantum theory is called quantum clustering, of

Table 18 Time complexity

(Schrodinger Equation + a large number of iterations) (Schrodinger Equation)

5.5 Clustering Algorithm Based on Spectral Graph Theory

(1) Time complexity (Table 19):

O(n) O(sˆ2s), O(nˆ2logn) O(nˆ2)

O(nlogn) O(nlogn) (kernel)

O(kf(v, e)) O(elogv)