Automatic clustering algorithms: a systematic review and bibliometric analysis of relevant literature
https://doi.org/10.1007/s00521-020-05395-4
REVIEW
Received: 5 August 2020 / Accepted: 24 September 2020 / Published online: 10 October 2020
Springer-Verlag London Ltd., part of Springer Nature 2020
Abstract
Cluster analysis is an essential tool in data mining. Several clustering algorithms have been proposed and implemented, most of which are able to find good-quality clustering results. However, the majority of the traditional clustering algorithms, such as K-means, K-medoids, and Chameleon, still depend on being provided a priori with the number of clusters, and they may struggle to deal with problems where the number of clusters is unknown. This lack of vital information may impose additional computational burdens or requirements on the relevant clustering algorithms. In real-world data clustering analysis problems, the number of clusters in data objects cannot easily be identified in advance, and determining the optimal number of clusters for a dataset of high density and dimensionality is quite a difficult task. Therefore, sophisticated automatic clustering techniques are indispensable because of their flexibility and effectiveness. This paper presents a systematic taxonomical overview and bibliometric analysis of the trends and progress in nature-inspired metaheuristic clustering approaches, from the early attempts in the 1990s until today's novel solutions. Finally, key issues with the formulation of metaheuristic algorithms as a clustering problem and major application areas are also covered in this paper.
which clustering has found relevance are social network analysis, customer segmentation, market research, image segmentation, biological data analysis, data summarization, image retrieval, machine learning, data mining, and data analysis [4–10, 15, 82, 98, 118]. Jain [106] reviewed some of the principal methods of data clustering and also presented the evolution of, and trends in, the data clustering techniques that have been developed over the last few decades. According to Ramadas and Abraham [178], the process of clustering data proceeds in seven stages: data collection, initial screening of data, data representation, clustering tendency, clustering strategy, data validation, and clustering interpretation.

Data collection involves acquiring and gathering data from their various sources. Initial screening consists of the preliminary examination of the different data that have been extracted from those sources. The extracted data are then prepared and represented in such a way that they are made fit for a particular algorithm. Not all the extracted data will be useful; hence the need to first verify whether or not there is a tendency for them to be clustered. On the one hand, if data or a group of data have been verified as suitable for being placed in a cluster, a clustering strategy chooses the right algorithm and parameters. After a particular algorithm or method has been selected, it is then used to examine and test the set of data manually. Finally, the resulting clustering solutions are interpreted, and further analyses are suggested and performed. On the other hand, some data are ambiguous, large, and complex, such that grouping or naming them is almost impossible. This is because their intrinsic natural properties are unknown or difficult to know, as found primarily in real-world problems where information regarding the data in view is unavailable. Due to its inability to handle the grouping of large datasets, and the limitations associated with clustering [13], the traditional method of clustering is exhausting and daunting. Therefore, automatic clustering techniques have emerged as a promising solution to these problems.

Automatic data clustering has the same outcome as traditional clustering, with the advantage that we do not need any background information relating to the data objects in question. This essentially accommodates real-world scenarios, which inevitably involve large and complex data that must be accurately partitioned into small clusters. Furthermore, real-life datasets are usually unlabeled, so the manual identification and classification of the data points is nearly impossible. With traditional clustering methods, the major problem has always been the difficulty of determining the optimal number of clusters and an appropriate partitioning of the data objects. This poor performance of traditional clustering approaches is linked to some degree to inherent limitations in the algorithms. For example, traditional algorithms are mostly local search algorithms, and except in linear and convex optimization they cannot guarantee global optimality, so results often depend on the initial starting points. Most traditional methods also tend to be problem-specific (for example, the k-means algorithm, partitioning around medoids (PAM), Clustering for Large Applications (CLARA), and the Cobweb clustering algorithm), and they struggle to cope with problems of discontinuity [13]. However, the recent shift toward automatic data clustering techniques implemented using nature-inspired metaheuristic approaches has helped to overcome these challenges in clustering analysis and has also offered several improvements in the methods of clustering [14, 75, 111].

Clustering methods can broadly be classified into two main categories: hierarchical and partitional clustering [178]. The hierarchical clustering algorithms are iterative clustering procedures, which generate outputs resembling a hierarchical tree or dendrogram that shows
a sequence of clusterings, with each of the clusters belonging to a partition of the data objects [38, 39]. By contrast, the partitional clustering algorithms start by decomposing the datasets into a set of disjoint clusters based on specific optimization criteria. Examples of the many data clustering algorithms that have been categorized and implemented over the years include the density-based algorithms [65], prototype-based algorithms [144], graph-based (hierarchical agglomerative clustering) methods [216], and hybrid algorithms [119]. Under the partitional clustering method, we find the well-known traditional clustering methods such as k-means, fuzzy c-means, and simulated annealing, among which the k-means clustering algorithm and its variants dominate, due primarily to their implementation simplicity and the ease of adapting such methods to unsupervised learning. However, associated with the simplicity of these techniques are some drawbacks, which make them less scalable and robust. For example, as the number of features, attributes, and dimensionality of the data objects increases, most of the methods easily become entrapped in local optima [5, 8], and their effectiveness is highly dependent on the initial solution.

In recent times, several nature-inspired metaheuristic techniques (evolutionary algorithms, swarm intelligence, and stochastic, population-based algorithms) have been developed to mitigate some of the drawbacks of the traditional optimization methods. Among the well-known metaheuristic algorithms that have received much desired attention in the area of automatic clustering we note, in particular, the firefly algorithm (FA) [14], particle swarm optimization (PSO) [63], the genetic algorithm (GA) [85], differential evolution (DE) [211], the artificial bee colony (ABC) [115], the symbiotic organisms search algorithm (SOS) [176], and teaching learning-based optimization (TLBO) [183], which have all played significant roles in different application domains [69–74] and have also become dominant problem-solving methods in other aspects of clustering analysis. These algorithms are referred to as nature-inspired metaheuristic algorithms because their development is inspired by concepts inherent in naturally occurring phenomena. Metaheuristic techniques require a higher level of search procedure due to the trade-off balance between local search and randomization [79]. Since automatic data clustering is aimed at minimizing the dissimilarity within a cluster and maximizing the dissimilarity between clusters, it has been classified as an optimization problem, and therefore most metaheuristic approaches are judged to fit well into the context of the new clustering paradigm [129]. Our detailed review of the various proposed metaheuristic algorithms for clustering analysis is discussed in the subsequent sections of this paper. Figure 2 shows the taxonomy of clustering methods, starting from the classical clustering methods up to the most recent metaheuristic search-based clustering methods. It is interesting to note here that metaheuristic search algorithms are the most applied techniques commonly used for the implementation of automatic clustering algorithms, as identified in the study conducted by José-García and Gómez-Flores [111].

This paper presents an in-depth and systematic review of nature-inspired metaheuristic algorithms used for automatic clustering analysis. The focus of this paper is on the metaheuristic algorithms that have been employed to solve clustering problems over the last three decades. There have been some review studies on cluster analysis reported in the literature, some with overwhelming citation counts by other researchers. For example, Nanda and Panda in 2014 presented a survey study on nature-inspired metaheuristic algorithms with a specific focus on partitional clustering methods. In his work, Jain [106] provided a brief overview of most of the well-known clustering methods, with the emphasis on the significant challenges and critical issues in designing clustering algorithms. Further, Jain also highlighted some of the emerging and useful research directions, which point to the ideas of semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large-scale data clustering, which to an extent are all trending and active clustering research areas. Jafar and Sivakumar [105], in their paper, gave a brief review of the application of biologically inspired data clustering techniques, with a focus on ant-based clustering algorithms. Xu and Wunsch [233] presented a comprehensive survey of major clustering algorithms for data sets appearing in research fields such as statistics, computer science, and machine learning. Berkhin [30] presented a more general overview of some clustering techniques, with his survey concentrating mainly on clustering algorithms from a data mining perspective. The survey study presented by Abbasi and Younis [1] focused on a general overview discussion of clustering algorithms with application interests in wireless sensor networks, while Yu and Chong [237] gave a comprehensive survey study of clustering schemes for mobile ad hoc networks. A study that might be assumed to be more closely related to the current survey is the work of José-García and Gómez-Flores [111], whose research presented a review of sixty-five clustering methods based on nature-inspired metaheuristics that can be used for automatic clustering analysis. However, we strongly believe that the current study differs largely from these existing review works, because it is based on a 30-year detailed bibliometric analysis and an up-to-date systematic review of all nature-inspired metaheuristic algorithms for automatic data clustering problems.

Considering the considerable growth of interdisciplinary interests, and the dynamics in the application of clustering analysis to different research domains, it is obviously true to say that much has been achieved since the latest previous
The problem is compounded by the fact that the number of clusters is usually unknown. The search space size for finding the optimal number of clusters is given as follows:

$$B(N) = \sum_{K=1}^{N} S(N, K), \qquad (2)$$

where $B(N)$ is known as the Bell number. Because the data clustering (or grouping) problem of finding an optimal solution is considered to be NP-hard when $K > 3$ [77], even for moderate-sized problems the clustering task can be computationally prohibitive [46].

Fuzzy clustering may be defined in terms of fuzzy sets, in which each pattern may belong to more than one cluster simultaneously, with a certain degree of membership $u_j \in [0, 1]$. The membership value of the $i$th pattern in the $j$th cluster should satisfy both of the following two conditions:

$$\sum_{j=1}^{K} u_j(X_i) = 1, \quad i = 1, \ldots, N;$$

$$\sum_{i=1}^{N} u_j(X_i) < N, \quad j = 1, \ldots, K.$$
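To make the growth of this search space concrete, the short sketch below computes $B(N)$ from the Stirling numbers of the second kind $S(N, K)$ using the standard recurrence. It is an illustrative aid added here, not part of the original survey.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Stirling number of the second kind S(n, k): the number of ways
    to partition n data points into exactly k non-empty clusters."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # Recurrence: point n either joins one of k existing clusters
    # or starts a new cluster on its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def bell(n: int) -> int:
    """Bell number B(n) = sum over K of S(n, K), i.e., Eq. (2):
    the size of the search space when the number of clusters is unknown."""
    return sum(stirling2(n, k) for k in range(1, n + 1))

if __name__ == "__main__":
    for n in (5, 10, 20):
        print(f"B({n}) = {bell(n):,}")
    # B(5) = 52, B(10) = 115,975, B(20) = 51,724,158,235,372:
    # exhaustive enumeration is hopeless even for tiny datasets.
```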
Initially used in the library and information sciences, bibliometric or scientometric analysis has now become a separate, independent research domain, having extended its roots into different technical fields. This type of analysis makes overt the exclusive, intrinsic, and hidden structures of a research area, based on the publication data (also known as bibliometric data). The technique provides not only a fruitful field where young researchers can kick-start their research journey, but also useful insights into the field, thereby helping researchers to make novel contributions. In the literature, one can find two types of bibliometric studies these days: journal-specific and research-area-specific. Some of the notable journal-specific bibliometric studies have focused on Soft Computing [152], Applied Soft Computing [158], IEEE Transactions on Fuzzy Systems [236], and Neurocomputing [109]. Journal-specific studies are centered on the publications of that specific journal. An alternative focus for bibliometric studies may be the whole research area, listing out all the bibliometric information. There are studies available in the

The methodology and data statistics employed in the current study are solely dependent on the bibliometric data, which contain all the information regarding a publication, such as authors, journal, keywords, countries, document type, etc. These bibliometric data are generally indexed by Web of Science (WoS), Scopus, Google Scholar, etc. In this paper, we have considered only the WoS database, as this platform indexes only high-quality journals and ranked international conferences, thereby ensuring that the publications are of high quality.

This paper is based on quite recent automatic clustering approaches, which are a subset of the clustering approaches. Therefore, for the research methodology, we chose the keyword query to be "clustering algorithms" or "Automatic Clustering Algorithm" (ACA). Collectively, we call the output of the search query ACA. This query returns all those publications where any of these keywords appear in the title, abstract, or authors' keywords. The search was performed on April 19, 2020, and resulted in 5063 papers. However, because this was only 4 months into 2020, we have considered data only until December
2019. Therefore, the full-year range is 1989–2019. The refined query resulted in a total of 4875 publications. These publications were classified by the WoS into 13 document types, which are shown in Table 1. The document types include articles (4751), proceedings papers (465), reviews (87), and early access (14). It should be noted that the total numbers for all these document types and the total percentage may be higher than expected (4875 for publications and 100% for percentage), as a few publications were classified under more than one document type.

Table 1 Document types in Web of Science (WoS)

Document type | Total number | Percentage (%)
Article | 4751 | 97.45
Proceedings paper | 465 | 9.53
Review | 87 | 1.78
Early access | 14 | 0.28
Editorial material | 9 | 0.18
Letter | 8 | 0.16
Correction | 6 | 0.12
Meeting abstract | 6 | 0.12
Note | 4 | 0.08
Book chapter | 3 | 0.06
News item | 2 | 0.04
Database review | 1 | 0.02
Software review | 1 | 0.02

The year-wise publication count, total publications (TP), is shown in Fig. 3. Since the growth in publications was increasing over the years, the maximum number of publications came in 2019 (TP = 577). We take the 5-year intervals to be as follows: Y1 (1989–1994), Y2 (1995–1999), Y3 (2000–2004), Y4 (2005–2009), Y5 (2010–2014), and Y6 (2015–2019). The interval Y1 has a range of 6 years to make all the other ranges 5 years. The maximum percentage growth rate can be seen during the year range Y3 (169.43%), followed by Y4 (123.17%) and Y6 (74.70%). Interestingly, the lowest growth rate is observed during Y5. Irrespective of this variation, the overall growth has increased over the years, which confirms the growing interest in the research community in contributing to and evolving this ACA domain.

Corresponding to the data in Fig. 3, we have Fig. 4, which shows the number of citations received by all the papers in ACA. As would be expected from the results for the number of publications, the highest citations were received in 2019, with a citation count of 19,012. Apart from this, there are five other years for which the received citations were more than 10,000, i.e., 2018 (TC = 16,945), 2017 (TC = 15,445), 2016 (TC = 14,191), 2015 (TC = 12,802), and 2014 (TC = 11,295). To further explore the citation structure, Table 2 shows the citation structure of all the publications in ACA. From this table, we find that there is one paper (that is, 0.02% of all the papers) which received more than 6000 citations. This paper is "Data clustering: A review" by Jain et al. [107] in ACM Computing Surveys, which has received a total of 6154 citations since September 1999. There are three other papers (0.06% of all the papers), each with more than 3000 but less than 6000 citations. Specifically, these are "CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure" by Jakobsson and Rosenberg [108], "A tutorial on spectral clustering" by Von Luxburg [226], and "Data clustering: 50 years beyond K-means" by Jain [106]. In addition, two papers (0.04%) have between 2000 and 3000 citations, while nine papers (0.19%) have between 1000 and 2000 citations. Notably, Jain [106] is singularly the most highly cited author, contributing 9266 citations of the total of 148,134 citations. More details on the authors' contributions are discussed in Sect. 3.3.

3.2 Source/journal analysis

In considering the development of a research field like ACA, the source of publications plays a very prominent role as the propagator or server of information. Moreover, it gives a straightforward direction to young researchers. Table 3 lists the top 25 journals, ranked on the basis of the number of publications in ACA. This table also contains the citations received by those published papers, the citations per paper received, and the IF of the journals.

As it turns out, the journal Pattern Recognition published the most papers in ACA (TP = 120), followed by IEEE Access (TP = 114), BMC Bioinformatics (TP = 105), and Expert Systems with Applications (TP = 104). These are the only four journals with more than 100 publications, although all the journals in the top 25 have published more than 20 papers in the field. With respect to influential journals, as judged by the number of citations (TC), Bioinformatics is the most influential among the top 25, with a total citation count of 9330. Other influential journals in the list are IEEE Transactions on Pattern Analysis and Machine Intelligence (TC = 6059), Pattern Recognition (TC = 5993), and Pattern Recognition Letters (TC = 5633).
Fig. 3 Year-wise total publications (TP) in ACA from 1989 to 2019 (bar chart; Total Publications per year, rising from 3 in the earliest years to a peak of 577 in 2019)
Fig. 4 Number of citations over the years from 1989 to 2019 (bar chart; annual citation counts rising from 2 in the earliest years to a peak of 19,012 in 2019)
Since ACA covers the analysis of clusters of datasets, there are journals in the top 25 that are related to medical fields, such as BMC Bioinformatics (TP = 105, TC = 3065), PLOS One (TP = 76, TC = 1098), Bioinformatics (TP = 67, TC = 9330), and IEEE-ACM Transactions on Computational Biology and Bioinformatics (TP = 22, TC = 382). That Bioinformatics is also the most influential journal, as discussed above, shows the importance of clustering approaches for data in the medical field. Interestingly, Bioinformatics has also received the second largest number of citations per paper (CPP = 139.25), after IEEE Transactions on Pattern Analysis and Machine Intelligence (CPP = 144.26). IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics also has more than 100 citations per paper, i.e., 104.38. For comparison, while still falling into the top 25 for number of publications, the journals with less than 5 CPP are IEEE Access (CPP = 2.12), Mathematical Problems in Engineering (CPP = 2.28), Journal of Intelligent & Fuzzy Systems (CPP = 3.89), Intelligent Data Analysis (CPP = 3.98), and Wireless Personal Communications (CPP = 4.21).

Table 2 Citations' structure of publications in ACA

Number of citations | # of publications | % publications
≥ 6000 | 1 | 0.02
≥ 3000 | 3 | 0.06
≥ 2000 | 2 | 0.04
≥ 1000 | 9 | 0.19
≥ 500 | 21 | 0.43
≥ 250 | 52 | 1.06
≥ 100 | 166 | 3.39
≥ 50 | 294 | 6.03
≥ 25 | 4347 | 89.16
Total publications | 4875 | 100
The next part of the journal analysis is the bibliographic coupling of ACA publications. A minimum threshold of 10
documents published by each journal was used. Of the 1344 journals identified, only 87 meet the set threshold; the bibliographic coupling data for the top 50 journals are plotted in Fig. 5. Here, the links between the journals represent the articles which most frequently refer to common documents. The size of each node represents the number of publications associated with it, which implies that the bigger the node, the greater the number of coupled papers. This pattern can be verified from the results of Table 3. For example, as shown earlier, the most productive journals are Pattern Recognition, IEEE Access, and BMC Bioinformatics, which produce the biggest nodes in Fig. 5.

Different colors are used in Fig. 5, such that different colors indicate sets of nodes that form clusters. Same-color clusters are those journals wherein articles mostly cite the same documents. Of interest are the medical journals forming a blue-colored cluster consisting of BMC Bioinformatics, Bioinformatics, PLOS One, IEEE-ACM Transactions on Computational Biology and Bioinformatics, Computers in Biology and Medicine, etc. It is to be noted that some journal names are not visible, which may be either because their node is too small or because they are hiding behind a bigger node. This is just a limitation of VOSviewer. For instance, Pattern Recognition Letters (green node) should be the fifth biggest node according to the number of publications, but it is hiding behind the biggest node, which is Pattern Recognition. Some of the major journals in the red color cluster are Pattern Recognition, Neurocomputing, IEEE Access, and IEEE Transactions on Knowledge and Data Engineering. The green node cluster comprises journals like Pattern Recognition Letters, Applied Soft Computing, Expert Systems with Applications, IEEE Transactions on Fuzzy Systems, etc. It seems that this cluster of journals cites papers mostly from the fields of artificial intelligence, machine learning, optimization, etc.

Figure 6 shows the co-citation analysis among the journals that have published papers on ACA. Here, co-citation means that a link is established between two journals if a document has cited a publication from both journals.
Fig. 5 Bibliographic coupling of the sources publishing on ACA (color figure online)
Specifically, in Fig. 6 we can see that biomedical journals like Bioinformatics are on the right side and have links with the journals on the left, which are the ultimate sources of the algorithms on clustering. As clustering is more of a data mining/machine learning approach, the papers on its theory and implementation are published in journals like Pattern Recognition, Expert Systems with Applications, etc.

3.3 Authors' analysis

Author analysis is also important in that it highlights the most prominent authors who are publishing and getting recognition for their work in ACA. Scholars can then follow the work of the respective authors and become aware of developments from the previously published papers. Table 4 shows the top 25 authors publishing in ACA. On
the left side of this table, authors are sorted based on TP (productivity) and on the right side according to TC (influence).

Bezdek is the most productive author in the ACA field, with 26 publications and a total of 2776 citations received. With the next most publications, we find Jiao L. and Yang M-S., with 21 ACA papers each. However, Jiao stands second as he has received more citations (TC = 376). The same ranking approach has been followed for all the authors who have the same number of publications. Maulik is the fourth most productive author, with 18 publications and an apparently higher CPP (26.94) than the above two authors. Bezdek has an even greater CPP of 106.77. There are two authors with 15 papers, namely Yang M.S. (TC = 953, CPP = 63.53) and Wang S. (TC = 509, CPP = 33.93).

In terms of the most influential authors, Jain has an astonishing citations-per-paper figure of 1491.8, translating to 7459 citations from just 5 papers. This remarkable achievement is primarily due to the most cited clustering paper, "Data clustering: A review," which was published in 1999 and has received 6154 citations over the 20 years up to 2019. Jain is followed by Von Luxburg and Bezdek with, respectively, 3481 and 2776 citations and 696.9 and 106.77 citations per paper.

The bibliographic coupling analysis involved two thresholds: authors should have a minimum of five documents with 10 citations. These criteria returned only 133 authors out of all the 13,970 authors. We selected the top 50 of these to plot the visual representation, as shown in Fig. 7. Bibliographic coupling among authors implies that these authors have cited the same set of papers in their publications. These sets of authors form clusters, as indicated by different colors in the figure. The most prominent authors in each of the colored clusters are Jiao L. (red cluster), Maulik (yellow cluster), Yang M-S. (green cluster), and Liang (blue cluster). One might have expected the author Bezdek to have the largest node, but the data had been saved as either "Bezdek, James C" or "Bezdek, J. C." Thus, there were two entries in the visualization, which is a limitation of the method. However, this mistake is corrected in the corresponding Table 4.
Fig. 7 Bibliographic coupling of the authors publishing on ACA (color figure online)
Figure 8 shows the co-citation analysis among the authors who have published in ACA. Co-citation analysis for authors implies that some third author's document has cited papers of the first two authors. This can be verified by the largest node in the figure (Jain, A.K.), since the paper by Jain et al. [107], being a review paper, is the most cited paper and lies in the middle of all the authors, with particularly strong links with Bezdek, Kohonen, etc.

3.4 Country and institution analysis

Country and institution analysis is also very important in a bibliometric study, as it indicates the regional source of the work in the ACA domain and the universities where the work is being performed. Tables 5 and 6 show the side-by-side comparison according to the most productive and most influential of the top 25 countries and institutions, respectively. These tables also contain the citations per publication for each country and institution.

China tops the list as the most productive country with 1160 publications, followed by USA (TP = 1140), India (TP = 313), and England (TP = 249). Notably, third-position India has 72.5% fewer publications than second-position USA. There are only 15 countries with more than 100 publications in ACA. This trend can be verified by the contributions of the universities in these countries. The Chinese Academy of Sciences and Xidian University in China top the list of most productive institutions, with 78 and 49 publications, respectively. The Indian Statistical Institute stands at third position with 42 publications. Although China (TC = 16,114) is the most productive country, the USA stands at the top as the most influential nation in publications on ACA, with 65,666 citations. The next two countries in the list, with more than 10,000 citations each, are India (TC = 12,326) and Germany (TC = 11,580). Once again, the USA has considerably more citation counts than the countries behind it. This almost stands to reason because the most cited author (Bezdek) is from the USA. More evidence is presented in Table 6, where the same effect can also be seen, with the most influential institutions being from the USA, namely Michigan State University (TC = 10,834) and Ohio State University (TC = 6562). The institute from India, the Indian Institute of Science, is at number three with 6501 citations. The next three universities are also from the USA: University of Michigan (TC = 4263), University of Washington (TC = 3900), and University of Missouri (TC = 3252).

Next, we present a graphical visualization of countries and institutions, corresponding to the two Tables 5 and 6.
Figures 9 and 10 show the bibliographic coupling of countries and institutions, respectively. As shown in Fig. 9, a bigger node represents a country with more publications, which can also be verified in Table 5. Here, the coupling means that the publications from these countries have cited the same references; thus, there is a link between the countries. The same-color nodes and links suggest that these countries' publications have mostly cited the papers published in the corresponding colored-node countries. We can see four clusters here: China is prominent in the green color cluster containing South Korea, Turkey, India, Taiwan, Iran, Egypt, etc. The USA is prominent in the blue color cluster containing countries such as England, Norway, Austria, Ireland, and Scotland. Similar behavior can be seen in Fig. 10, corresponding to Table 5. The bibliographic coupling indicates that these universities have cited papers mostly from linked or same-cluster universities. In addition, this also reveals that authors seem to work and communicate within their geographic region and community.

3.5 Authors' keyword analysis

This section portrays the keywords most commonly used by the authors who have published papers in the domain of ACA. Figure 11 shows the analysis of the co-occurrence of authors' keywords from papers published in ACA. As would be expected, "clustering" is the biggest node, followed by "clustering algorithms," "data mining," "fuzzy clustering," etc. The keywords in each colored cluster signify that they are mostly used together. The blue colored cluster has keywords like "clustering algorithms," "optimization," "feature selection," "partition algorithms," etc. Similarly, the green color cluster has keywords such as "clustering," "data mining," "classification," "subspace clustering," and "microarray."

4 Metaheuristic clustering algorithms

Cluster analysis has over the years evolved to be an important and fast-growing area with widespread applications, especially in data mining. Many studies have been carried out in this regard, and different approaches to data clustering have been proposed and implemented. Nature-inspired metaheuristic algorithms are broadly categorized into three research fields, namely evolutionary algorithms (for example, evolutionary strategies, genetic algorithms, and differential evolution algorithms), swarm intelligence (for example, ant colony optimization, particle swarm optimization, firefly, cuckoo search, and symbiotic organisms search algorithms), and stochastic, population-
based, nature-inspired optimization algorithms. In the next subsections, the various nature-inspired metaheuristic clustering analysis methods are presented systematically and discussed.

Driver and Kroeber, in 1932, were the first to introduce and use the concept of cluster analysis, applying it in the field of anthropology. Later, Zubin applied it in psychology in 1938, and Tryon in 1939. Further, it was used for trait theory classification in personality psychology by Cattell in early 1943. Subsequently, cluster analysis has grown significantly due to its relevance in diverse areas. Jain [106] presented a review of the prominent clustering approaches that had been in existence for over five decades, indicating their evolution and showing trends in data clustering techniques. Also, Fahad et al. [76] and Nerurkar et al. [164] gave reviews and comparisons of the different clustering techniques and their applications to big data. As mentioned in Sect. 1, clustering algorithms fall into the two main categories of hierarchical or partitional; all the clustering approaches that have been developed so far are designed based on these two classifications. These methods of clustering have emerged from the classical (traditional) methods, through the evolutionary (naturally evolving ones from biological occurrences) methods and those based on a group of particles called a swarm, up to those that are chemistry-based, geography-based, music-based, and even sport-based. Although these classes of algorithms operate by different modalities, there are nevertheless apparent similarities among their behaviors. Next, we present the generalized algorithmic design framework for the two main classes of metaheuristic techniques.
Table 6 Top 25 most productive and most influential institutions publishing in ACA

Rank | Most productive institution | TP | TC | CPP | Most influential institution | TC | TP | CPP
1 | Chinese Academy of Sciences | 78 | 1696 | 21.74 | Michigan State University | 10,834 | 16 | 677.13
2 | Xidian University | 49 | 617 | 12.59 | Ohio State University | 6562 | 17 | 386.00
3 | Indian Statistical Institute | 43 | 1911 | 44.44 | Indian Institute of Science | 6501 | 9 | 722.33
4 | Nanyang Technological University | 42 | 1371 | 32.64 | University of Michigan | 4263 | 16 | 266.44
5 | City University Hong Kong | 42 | 1115 | 26.55 | University of Washington | 3900 | 28 | 139.29
6 | Hong Kong Polytech University | 41 | 853 | 20.80 | University of Missouri | 3252 | 18 | 180.67
7 | Islamic Azad University | 36 | 592 | 16.44 | Korea University | 3139 | 16 | 196.19
8 | University of Sydney | 34 | 908 | 26.71 | University of Toronto | 2202 | 21 | 104.86
9 | Beijing Jiaotong University | 34 | 434 | 12.76 | Cornell University | 2191 | 7 | 313.00
10 | Jadavpur University | 32 | 1141 | 35.66 | University of Chicago | 2042 | 5 | 408.40
11 | University of Sao Paulo | 31 | 754 | 24.32 | Stanford University | 2028 | 27 | 75.11
12 | University of Illinois | 31 | 705 | 22.74 | MIT | 2002 | 21 | 95.33
13 | Chung Yuan Christian University | 30 | 713 | 23.77 | Harvard University | 1931 | 20 | 96.55
14 | Harbin Institute of Technology | 30 | 486 | 16.20 | Indian Statistical Institute | 1911 | 43 | 44.44
15 | Dalian University of Technology | 30 | 354 | 11.80 | Imperial College of Science, Tech. & Medicine | 1837 | 10 | 183.70
16 | University Elect Sci & Tech. China | 29 | 537 | 18.52 | University of Texas | 1743 | 20 | 87.15
17 | University of Washington | 28 | 3900 | 139.29 | Chinese Academy of Sciences | 1696 | 78 | 21.74
18 | Jiangnan University | 28 | 446 | 15.93 | University W Florida | 1673 | 10 | 167.30
19 | Wuhan University | 28 | 261 | 9.32 | CSIRO | 1605 | 5 | 321.00
20 | Stanford University | 27 | 2028 | 75.11 | Athens University Econ & Business | 1560 | 7 | 222.86
21 | Natl Taiwan University Sci & Technol | 27 | 556 | 20.59 | Nanjing University Aeronaut & Astronaut | 1517 | 17 | 89.24
22 | Shenzhen University | 27 | 278 | 10.30 | University of Maryland | 1487 | 18 | 82.61
23 | Tsinghua University | 26 | 382 | 14.69 | University of Minnesota | 1428 | 25 | 57.12
24 | Natl University Singapore | 26 | 324 | 12.46 | Nanyang Technol University | 1371 | 42 | 32.64
25 | Tianjin University | 26 | 213 | 8.19 | University S Florida | 1349 | 15 | 89.93
4.1 Generalized metaheuristic algorithmic design

In general, the algorithmic design frameworks and procedures of the swarm intelligence optimization and evolutionary algorithms appear to be somewhat similar in terms of their design and solution representation. However, solution representation is a prerequisite for the overall performance of both optimization approaches. Similarly, the two broad classes of algorithms share the same design steps, which begin with the first step of random initialization of a population of a given size. The second step is the evaluation of the initialized population to identify the candidate individuals or particles, which in this case would represent the choice of solution. The third step is to generate a new population by modifying individuals with specified variation operators, and this is done iteratively, after which the new individuals are evaluated based on their fitness. An update is made depending on which of the candidate individuals is better in terms of the problem-defined objective function. Generally, the selection of the best candidate solutions is made by comparing the previous solution with the current solutions, with precedence often given to the current best solution. These steps are iteratively repeated until the termination condition is met, and the optimal best solutions are identified. The general algorithm design for both the swarm intelligence and evolutionary algorithms is shown in Algorithm listings 1 and 2.
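As an illustration of the shared design steps described above (initialize, evaluate, vary, select, repeat), the following minimal Python sketch implements the generic population-based loop. It is our illustrative rendering, not the paper's Algorithm listings: the `variation` operator is a placeholder that a concrete algorithm (GA crossover/mutation, PSO velocity update, etc.) would supply, and the default Gaussian perturbation is an assumption for demonstration.

```python
import random
from typing import Callable, List, Optional, Sequence

def metaheuristic_loop(
    objective: Callable[[Sequence[float]], float],  # problem-defined fitness; lower is better
    dim: int,
    bounds: tuple,
    pop_size: int = 30,
    max_iters: int = 200,
    variation: Optional[Callable[[List[List[float]]], List[List[float]]]] = None,
) -> List[float]:
    lo, hi = bounds
    # Step 1: random initialization of the population
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    # Step 2: evaluate the initialized population
    fitness = [objective(ind) for ind in pop]
    best, best_fit = min(zip(pop, fitness), key=lambda p: p[1])
    for _ in range(max_iters):
        # Step 3: generate a new population with a variation operator
        # (crossover/mutation for EAs, velocity updates for swarms, ...)
        offspring = variation(pop) if variation else [
            [x + random.gauss(0.0, 0.1 * (hi - lo)) for x in ind] for ind in pop
        ]
        off_fitness = [objective(ind) for ind in offspring]
        # Step 4: selection, with precedence given to the better solution
        for i in range(pop_size):
            if off_fitness[i] < fitness[i]:
                pop[i], fitness[i] = offspring[i], off_fitness[i]
        # Track the best-so-far solution until the termination condition
        cand, cand_fit = min(zip(pop, fitness), key=lambda p: p[1])
        if cand_fit < best_fit:
            best, best_fit = cand, cand_fit
    return best

# Example: minimize the sphere function in 5 dimensions
solution = metaheuristic_loop(lambda x: sum(v * v for v in x), dim=5, bounds=(-5.0, 5.0))
```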
Algorithm 2 represents a generic framework design for the swarm intelligence-based algorithmic concept, which includes the PSO, ACO, and SOS.

4.2 Metaheuristic solution representation and encoding scheme

Generally, for any optimization problem model design, a well-formulated solution representation is vital for the successful performance and scalability of most nature-inspired metaheuristic algorithms applied to the problem at hand. In other words, the successful application of metaheuristic techniques to solve any real-world problem is linked to a well-formulated solution representation. The solution representation for a candidate clustering problem
can be formulated as follows. Consider a dataset $X$, which contains $n$ data points $x_1, x_2, \ldots, x_n$ with $d$-dimensional attributes (features, variables, components) [75]. Formally, this can be expressed in vector form as $X = \{x_1, x_2, \ldots, x_n\}$, representing a set of $n$ data points, each having $d$ real-valued features. In other words, $x_i \in \mathbb{R}^d, \; \forall i = 1, 2, \ldots, n$, and $x_i = (x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{id})$, where the $x_{ij}$ denote the features of $x_i$. Therefore, the population matrix can be initialized as follows:

$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_i \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{i,1} & x_{i,2} & \cdots & x_{i,d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d} \end{bmatrix}$$

The population matrix, in this case, comprises strings composed of real numbers that encode the centers of the cluster partitions. Now let the individuals in the population encode the number of clusters $k_i$. The grouping of individuals in the population into a similar class is estimated based on the lower bound, denoted $k_{\min}$, and the upper bound, denoted $k_{\max}$, of the number of groups in the population. The number of clusters for each individual is evaluated using the following expression:

$$k_i = \mathrm{rand}(0, 1) \cdot (k_{\min} + (k_{\max} - k_{\min}))$$

where the function $\mathrm{rand}()$ denotes a randomly generated number between 0 and 1.

Again, for a $d$-dimensional space, the length of a $d$-dimensional dataset is denoted as $d \times k_{\max}$, while an individual consists of a vector of real numbers of dimension $k_{\max} + k_{\max} \times d$. The length of the $i$th individual is given as $d \times k_i$. Note that the first $k_{\max}$ values are positive floating-point numbers in [0, 1], and these values are used to determine the suitability of the corresponding clusters for the classification of the data points. The remaining $k_{\max}$ values are set aside for the $k_{\max}$ cluster centers, each of $d$ dimensions [13, 75]. Also note that the initial population for the metaheuristic algorithms is generated randomly. For example, in the case of the PSO, FA, SOS, and IWO algorithms, a random number in [0, 1] is generated for each individual particle's position in the first part, whereas for the initial centroids, a data point is picked randomly for each cluster centroid.
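A minimal sketch of this encoding, under the assumptions stated above (the first k_max entries are activation values in [0, 1] and the rest hold k_max candidate centroids), is given below. The 0.5 activation threshold is a common convention in the automatic-clustering literature, assumed here for illustration rather than fixed by this section.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def init_individual(data: np.ndarray, k_max: int) -> np.ndarray:
    """Build one individual: k_max activation values in [0, 1] followed by
    k_max candidate centroids, each sampled from the data points."""
    n, d = data.shape
    activations = rng.random(k_max)                    # first k_max entries
    centroids = data[rng.integers(0, n, size=k_max)]   # k_max random data points
    return np.concatenate([activations, centroids.ravel()])

def decode(individual: np.ndarray, k_max: int, d: int, threshold: float = 0.5):
    """Decode an individual into its active cluster centers.
    A cluster counts as active when its activation exceeds the threshold."""
    activations = individual[:k_max]
    centroids = individual[k_max:].reshape(k_max, d)
    active = activations > threshold
    if not active.any():            # guard: keep at least one cluster active
        active[np.argmax(activations)] = True
    return centroids[active]

# Example: a toy dataset with n = 100 points in d = 2 dimensions, k_max = 5
data = rng.random((100, 2))
ind = init_individual(data, k_max=5)
centers = decode(ind, k_max=5, d=2)
print(f"{len(centers)} active clusters out of 5")
```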
4.3 Trend from traditional algorithms to metaheuristic algorithms

Traditional approaches to clustering do not require an in-depth or rigorous search process. Also called "heuristic" methods, these traditional approaches are instead driven by "trial and error" in order to find or discover solutions to a specific problem. There is, therefore, no guarantee that the optimal solutions will be found in a reasonable time, because they are prone to be trapped within the search space. Xu and Tian [232] have provided a comprehensive review of clustering algorithms based on different characteristics. The k-means clustering algorithm [238], which is the most common and popular classical and partitional clustering algorithm, and is also considered to be one of the top ten algorithms in data mining [231], has been at the forefront of these heuristic techniques, due to its ease of implementation, flexibility, and efficiency [5, 8]. However, the major limitations of the k-means stem from the method relying on predetermined information about the data, as well as on the initial solution. This dependence means the algorithm may be easily trapped within a local optimum [245]. Another classical clustering method is the fuzzy c-means (FCM) algorithm [32], which, although not as popular as the k-means algorithm, is the most common in the field of fuzzy clustering. In 2017, Sathapan and colleagues [198] carried out a literature study on numerous traditional clustering algorithms for uncertain data. Some of these traditional algorithms include the k-means, k-medoids, global kernel k-means, k-mode, u-rule, uk-means, and fuzzy c-means. Their study was motivated by the increasing need to deal with the complexities associated with real-world data. To overcome the challenges arising from these limitations, researchers have devised some other more productive and efficient approaches. Various approaches have over time been implemented, which have been designed specially to handle high-dimensional and complex real-world problems. These are either evolutionary-based or swarm intelligence algorithms and are referred to as "metaheuristic" because they require a higher-level heuristic search in finding optimal solutions. They are thus more significant in solving general problems, especially optimization and computational problems. Further, metaheuristic methods look for the most promising (optimal) solution in their search space, keeping a balance between diversification and intensification, and they try as much as possible to prevent their search from jumping into unpromising regions within the search space. It is noteworthy to mention here that the intensification feature helps metaheuristic methods to obtain the best values for decision variables, while the diversification feature makes them well suited to problems with large search spaces.

Hussain et al. [103] provided a study of the metaheuristic methods that had been in use for about 33 years in the area of optimization, in which they also investigated the trend by which metaheuristic methods have grown over that time. Evolutionary-based algorithms (EAs) are
strategies that are based on the processes occurring during the natural evolution of different species [42] and are, more specifically, based on Darwin's concept of the evolution of living things according to the "survival of the fittest" [178, 246]. These algorithms include the GA [85] and differential evolution (DE) [53, 211]. The two are similar in that the DE uses the same genetic functions as the GA, namely crossover, mutation, and selection, but the GA depends more on the crossover function, while the DE is based more on the mutation factor. Evolutionary programming (EP) [217], which is one of the foundational approaches to modern methods, also uses the mutation and crossover operators. Other examples include teaching learning-based optimization, which is based on the actual learning experience of students in a learning environment [116]. Evolutionary algorithms usually have a few random individuals to initiate the search process, and then the algorithm proceeds with the search for the best solution while being guided by rules [177]. Hruschka et al. [101] presented a study on evolutionary computation-based clustering algorithms. Akyol and Alatas [19] classified evolutionary algorithms into nine groups, based on the method and approach by which they are inspired, which is similar to the study carried out by Rajpurohit et al. [177], where a total of 176 metaheuristic algorithms were reviewed. Some of the metaheuristic algorithms, as mentioned earlier, are chemistry-based, such as the artificial chemical reaction optimization algorithm (ACROA) [20], or biology-based, such as bacterial colony optimization (BCO) [166], the bacterial evolutionary algorithm (BEA) [52], and bacterial swarming (BS) [42]. Swarm intelligence (SI) algorithms are based on the collective and social behavior of living creatures; they are also referred to as "population-based" algorithms [246]. Some of these SI algorithms are, for instance, particle swarm optimization (PSO) [63], the firefly algorithm (FA) [234], the artificial bee colony (ABC) [115], invasive weed optimization (IWO) [151], cuckoo search (CS) [204], ant colony optimization (ACO) [55, 56], teaching learning-based optimization (TLBO) [183, 246], the FIFA World Cup algorithm [185], which is one of the new generation of metaheuristic algorithms, and so forth. Recently, Molina et al. [155] presented a comprehensive study on the taxonomies of nature-inspired and bio-inspired optimization algorithms. The study includes several metaheuristic algorithms classified by their source of inspiration and the behavior of the particles or organisms.

Almost all of these metaheuristic algorithms have been effectively applied to solve clustering problems. In this section, we review published studies that have been done on both non-automatic and automatic clustering using nature-inspired metaheuristic algorithms. The categorization done in this section is according to the taxonomy study presented in [155].

4.4 Clustering with swarm intelligence-based algorithms

All swarm intelligence methods, also called population-based algorithms, are inspired by the collective intelligent interactions which emerge from the social behavior of a group of individuals (organisms) or particles in an environment. It should be noted that, while not all population-based methods are swarm intelligence algorithms, all swarm intelligence algorithms are population-based, because they require the co-interaction of participating individuals to carry out an exhaustive search for possible and promising solutions. In this section, we review the clustering algorithms based on swarm intelligence.

4.4.1 Particle swarm optimization (PSO)

Van der Merwe and Engelbrecht were the first to solve clustering problems using the PSO [224], whereby the randomly generated particles were each mapped to one data vector, such that a data vector represents a cluster centroid. They proposed two new approaches using PSO for data clustering. The first approach was to show that PSO can be used to find the centroids of a user-specified number of clusters; the second extends the PSO with the k-means algorithm to seed the initial swarm, thus refining the newly formed clusters. Results showed that their approaches have better convergence, larger inter-cluster distances, and smaller intra-cluster distances than the other compared methods from the literature. However, their proposed methods were not able to automatically determine the number of clusters, and the methods also could not scale to high-dimensional data.
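The following is a minimal sketch in the spirit of this centroid-encoding PSO, not the authors' exact algorithm: each particle holds k candidate centroids, fitness is the total intra-cluster distance, and the standard PSO velocity and position updates are applied. The parameter values (inertia w and coefficients c1, c2) are typical defaults assumed for illustration.

```python
import numpy as np

def pso_clustering(data, k, n_particles=20, iters=100,
                   w=0.72, c1=1.49, c2=1.49, seed=0):
    """PSO where each particle encodes k centroids of dimension d;
    fitness is the sum of distances from points to their nearest centroid."""
    rng = np.random.default_rng(seed)
    n, d = data.shape

    def fitness(flat):
        centroids = flat.reshape(k, d)
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        return dists.min(axis=1).sum()   # intra-cluster compactness

    # Initialize particles at randomly chosen data points (centroid seeds)
    pos = np.stack([data[rng.choice(n, k, replace=False)].ravel()
                    for _ in range(n_particles)])
    vel = np.zeros_like(pos)
    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    g = np.argmin(pbest_fit)
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Standard PSO update: inertia + cognitive + social components
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([fitness(p) for p in pos])
        improved = fit < pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        if pbest_fit.min() < gbest_fit:
            g = np.argmin(pbest_fit)
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    return gbest.reshape(k, d), gbest_fit

# Example usage on random 2-D data with a user-specified k = 3 clusters
centroids, score = pso_clustering(np.random.rand(200, 2), k=3)
```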
Similar to the study in [224], Zhao et al. [241] proposed an improved PSO by integrating it with the k-means algorithm, to avoid the hybrid PSO algorithm's performance being directly affected by the original clusters. Experimental results and a comparison with the classical k-means algorithm showed that the improved PSO with k-means was superior to the classical k-means with respect to time. Nevertheless, the proposed method also could not determine the appropriate number of clusters automatically.

Chuang et al. [43] proposed a new strategy by combining the particle swarm optimization strategy with an acceleration strategy. The new chaotic optimization algorithm for data clustering was called ACPSO. The proposed method searches through arbitrary datasets for appropriate centroids, thus efficiently finding better solutions. They compared their results with six other algorithms, and the results for their proposed method showed its superiority over the other compared methods in terms of robustness in finding cluster centroids and time efficiency. ACPSO,
finding cluster centroids and time efficiency. ACPSO, enhanced and intelligent operations than the standard PSO.
however, cannot automatically identify clusters, as it The enhanced modifications of PSO included a chaotic
requires the number of clusters to be defined or known a initial population generation with a systematic migration
priori. procedure; thus, the ECPSO was able to improve the
Silva Filho et al. [209] proposed two data clustering exploration ability and convergence rate of the original
methods by hybridizing PSO with fuzzy c-means, called PSO. Simulation results were compared with those for four
FCM-IDPSO and FCM2-IDPSO. Their methods were other algorithms and showed that ECPSO yielded more
aimed at dynamically adjusting PSO parameters during optimized solutions and a higher degree of purity than did
execution, hence to provide the right balance between the other algorithms. The authors mentioned that ECPSO
exploitation and exploration, while avoiding falling into was able to achieve this superiority because the number of
local minima, and so providing better solutions. Their clusters was predefined before the proposed method was
results, when compared with other existing methods and trained on the chosen datasets. Hence, ECPSO was not able
those based on the PSO, showed more accuracy and to automatically determine the number of clusters. The
obtained better solutions. A limitation of their proposed authors also suggested that for future works, the method
model was that it could not automatically determine the could be integrated or combined with some other evolu-
number of clusters. The authors, however, proposed that tionary algorithms for better efficiency.
their model should be extended so it could be able to obtain Recently, Alswaitti et al. [22] implemented a density-
clusters automatically. In other words, they suggested that based PSO, called DPSO, for data clustering. In trying to
automatic data clustering is an active approach that could find a balance between intensification and diversification
handle the shortfall. and address the issue of premature convergence with the
Rana et al. presented a boundary-restricted adaptive classical PSO, they used a combination of a kernel-based
particle swarm optimization, the BR-APSO algorithm, for density estimation technique that is associated with a new
data clustering [180]. The boundary restriction was intro- bandwidth estimation method and also estimated multi-
duced into the standard PSO to allow particles to go outside dimensional gravitational learning coefficients. DPSO used
the boundaries of their search spaces. Still, it could also the Dunn index to evaluate its effectiveness. Simulation
forcibly bring back particles that had gone outside the results were compared with those from five other state-of-
search space during the evaluation process. Experimental the-art algorithms, and DPSO showed better performance
results showed that, when compared with seven other in terms of classification greater accuracy and cluster
algorithms, the proposed adaptive PSO outdid them in compactness, with lesser computational time than did the
terms of robustness, accuracy, and convergence speed. The others.
BR-APSO was, nevertheless, unable to detect clusters Duan et al. [58] implemented a hybrid approach using
automatically. the artificial bee colony (ABC) and PSO, called ABCPS, in
Another use of PSO for data clustering is found in the order to establish a form of diversity within the swarm
study carried out by Cura [48], where he applied a new during exploration and to give fast convergence. ABCPS
approach of PSO to data clustering. His new approach made use of the modified partition coefficient index
followed the ‘‘gbest neighborhood topology’’ of the stan- (MOC), Fukuyama and Sugeno (FS) index, and weighted
dard PSO algorithm, such that a new particle moves toward inter-intra (Wint) index. From simulation results, their
its previous best position and toward the best particle in the proposed hybrid method outperformed three other com-
search space. Cura compared results from his study with pared methods, in terms of better solutions.
those from three existing algorithms and could show that A clustering algorithm based on the hybridization of
his proposed method outperformed the others in terms of PSO with k-means, called IKPSO, was developed by
robustness, effectiveness and easy tuning. Although his Atabay et al. [26]. The authors exploited the simplicity and
method solved the clustering problem in which the number speed of the k-means and the generalization and effec-
of clusters was not known, it, however, did not give the tiveness of PSO. Results were compared with those from
distinct number of clusters present in the datasets used. the classical PSO and k-means algorithm and showed that
Also, while implementing the method for an unknown IKPSO outperformed the other two in terms of accuracy
number of clusters, robustness was reduced due to the large and speed.
difference obtained between the best and worst fitness Similarly, Nayak et al. [161] hybridized an improved
values. PSO with a genetic algorithm (GA) and k-means for cluster
Lashkari and Moattar [135] proposed an extended analysis (GA-IPSO-K-means). The improved PSO was
chaotic PSO, known as ECPSO, for data clustering. The used to determine the number of clusters; the GA being
proposed ECPSO used the purity index to evaluate the integrated to improve the quality of particles, while the
clustering solutions, and it was shown that it had more k-means algorithm refined the solutions in the last search
phase to avoid premature convergence. Simulation results, the local compactness of each cluster by the local density
when compared with results from other algorithms, showed function, which makes the PSO drift toward maximizing
that the proposed method outperformed the others in terms compactness, thereby avoiding many clusters to be iden-
of fast convergence and optimal solution. The proposed tified during evolution. A distance constraint that was
method, however, was not able to choose the number of based on local fitness mechanisms and a partition mea-
clusters automatically, neither could it find better fitness surement were also incorporated to maintain diversity in
value. Furthermore, the number of clusters needed to be the population and provide good performance. The per-
defined before training with the method. formance of PLDC, when compared with six other state-of-
Huang et al. [102] hybridized the PSO with ant colony optimization (ACO) for data clustering, calling their method ACOR-PSO. Their proposed method incorporated a continuous ACO with PSO to improve the searchability by looking into four models of hybridization, namely the sequence approach, the parallel approach, the sequence approach with an enlarged pheromone-particle table, and global exchange. Simulation results, when compared with those from the k-means algorithm, classical PSO, and classical ACO, showed that the sequence approach with the enlarged pheromone-particle table is superior in effectiveness to the other approaches because of the diversity of new solutions that the pheromone table offers from the ACO, which in turn prevents entrapment within local optima.
A dynamic clustering approach, called DCPSO, was developed by Omran et al. [167] and applied to image segmentation. DCPSO automatically obtained the appropriate number of clusters and simultaneously partitioned the clusters without the need for human interference. The algorithm starts by partitioning the dataset into a large number of clusters to reduce initial condition effects. Then, with the aid of a binary PSO, the clusters are chosen, and finally, the centroids of the selected clusters are refined by the k-means algorithm. Results showed that DCPSO was able to generate the appropriate number of clusters for the images used for the test.
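The DCPSO pipeline just described, namely over-partition, let a binary particle switch candidate clusters on or off, and refine the survivors with k-means, can be sketched compactly. The sketch below assumes scikit-learn's KMeans and a pool of candidate centroids produced by the initial over-partitioning; the guard that keeps at least two clusters is an illustrative repair rule rather than a detail of the original method.

    import numpy as np
    from sklearn.cluster import KMeans

    def decode_and_refine(mask, pool_centroids, data):
        # DCPSO-style decoding: a binary particle switches candidate
        # centroids from a large initial pool on or off; the surviving
        # centroids then seed a k-means run that refines their positions.
        chosen = pool_centroids[mask.astype(bool)]
        if len(chosen) < 2:  # illustrative repair: keep at least two clusters
            return pool_centroids, None
        km = KMeans(n_clusters=len(chosen), init=chosen, n_init=1).fit(data)
        return km.cluster_centers_, km.labels_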
A dynamic clustering using combinatorial PSO was presented by Masoud et al. [148]. The proposed model, called CPSOII, was implemented to automatically find the best number of clusters, as well as group or partition the data effectively. As a preprocessing step, the model used a renumbering approach and so extended the PSO operators in order to improve population diversity, quality of solutions, and convergence speed. Further, CPSOII used the variance ratio criterion (VRC) and DB index to evaluate its performance. Experimental results showed that CPSOII, when compared with three other algorithms, achieved superiority in terms of effectiveness and robustness. The authors suggested that for future study, CPSOII could be integrated with multi-objective PSO to improve its performance.
In the study carried out by Ling et al. [138], their proposed method, called PLDC, was able to estimate the number of clusters automatically. Their method measured the local compactness of each cluster by the local density function, which makes the PSO drift toward maximizing compactness, thereby preventing too many clusters from being identified during evolution. A distance constraint based on local fitness mechanisms and a partition measurement were also incorporated to maintain diversity in the population and provide good performance. The performance of PLDC, when compared with six other state-of-the-art methods using the Rand index (RI), showed that it was able to precisely and appropriately determine the number of clusters, as well as achieve a better grouping. The method, however, could not handle outliers that were detected in the datasets.
Kuo and Zulvia [131] implemented an improved particle swarm optimization for automatic data clustering. Their proposed algorithm, called automatic data clustering using PSO (ACPSO), addressed two main issues in automatic clustering: the first part addressed the problem of determining the number of clusters, while the second part handled the representation of the cluster centroids. Further, they employed a sigmoid function to handle infeasible solutions and then used the k-means algorithm to adjust the cluster centroids. Their experimental results showed that their proposed ACPSO algorithm outperformed three other related algorithms when compared in terms of accuracy and consistency.

Nanda and Panda [160] proposed a multi-objective immunized PSO algorithm (MOIMPSO) to classify actions of 3D human models. Their proposed algorithm provided a suitable Pareto optimal archive for unsupervised problems by automatically evolving cluster centers and simultaneously optimizing two different objective functions. Also, from the Pareto optimal archive, a single best solution that satisfies users' requirements was provided. The resulting analysis showed that the performance of the proposed algorithm was better in terms of result accuracy and computational time when compared to results from other related algorithms.
A kernel-based modified PSO (kernel_MEPSO) by Abraham et al. [3] and a multi-elitist PSO (MEPSO) by Das et al. [51] were developed and implemented for automatic clustering of complex data. The studies are similar, and the two sets of authors proposed two algorithms in which, instead of using the conventional squared-distance measure, they adopted a kernel-induced similarity function. This adaptation enabled data that are non-separable in their original form to be clustered into homogeneous groups in a high-dimensional feature space transformation. Comparison of the kernel_MEPSO with other algorithms showed that, for some test cases, it is statistically significantly superior to them.
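The kernel trick adopted in these two studies replaces the Euclidean distance between a point and a centroid with a distance computed in an implicit high-dimensional feature space. A minimal sketch, assuming a Gaussian (RBF) kernel with an illustrative bandwidth sigma:

    import numpy as np

    def rbf_kernel(x, y, sigma=1.0):
        return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

    def kernel_distance_sq(x, v, sigma=1.0):
        # Squared distance in the implicit feature space:
        # ||phi(x) - phi(v)||^2 = K(x, x) - 2K(x, v) + K(v, v),
        # which for the Gaussian kernel reduces to 2 * (1 - K(x, v)).
        return 2.0 * (1.0 - rbf_kernel(x, v, sigma))

Because K(x, x) = 1 for the Gaussian kernel, the feature-space distance depends only on K(x, v), which is what lets groups that are non-separable in the input space become separable.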
Kao and Chen [113] implemented a hybrid PSO for automatic clustering for generalized cell formation, called PSOAC. The method adopted an integer number and a set of real numbers, which were then used to encode the number of machine cells and the machine clustering, respectively; a discrete PSO was used to search for the number of machine cells, and a continuous PSO was used for the machine clustering. The method searched for the number of machine cells in two ways: either by random selection or by inheritance of past best results. The former option keeps PSOAC from becoming trapped within local optima, and the latter allows the proposed method to exploit the best machine cell solution found, as well as to reduce infeasible solutions from occurring, thus saving computational time. Experimental results showed that PSOAC was able to determine the number of clusters automatically, and it also assigned the most suitable routing process for each part. Further, results showed that PSOAC was more time-efficient on large-sized problems than other compared methods from the literature.

Another study on a multi-objective approach to automatic data clustering was conducted by Abubaker et al. [11] by integrating a multi-objective PSO with multi-objective simulated annealing (SA). The proposed method, called MOPSOSA, simultaneously optimized three cluster validity indices in order for the suitable number of clusters to be established, as well as for appropriate partitioning. The first validity index (DB index) focused on the Euclidean distance, the second (Sym index) on the point symmetry distance, while the third (Conn index) was centered on short distance (i.e., a relative neighborhood graph concept). MOPSOSA addressed the issue of automatic identification of suitable clusters and partitioning of the identified clusters in the datasets. Comparing results from MOPSOSA with those from six other automatic clustering algorithms showed MOPSOSA's superiority in terms of accurate results, obtaining the correct number of clusters, and handling overlapped datasets, datasets with various irregular shapes, and datasets that contained many clusters.
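Of the three validity indices optimized by MOPSOSA, the Davies–Bouldin (DB) index recurs throughout this review, so a small reference computation may be helpful. This is a plain sketch of the standard definition, not of MOPSOSA's internal code; scikit-learn ships an equivalent davies_bouldin_score.

    import numpy as np

    def davies_bouldin(data, labels, centroids):
        # DB index: the mean over clusters of the worst-case ratio of
        # within-cluster scatter to between-centroid separation
        # (lower values indicate a better partition).
        k = len(centroids)
        scatter = np.array([
            np.linalg.norm(data[labels == i] - centroids[i], axis=1).mean()
            for i in range(k)])
        worst = []
        for i in range(k):
            ratios = [(scatter[i] + scatter[j]) /
                      np.linalg.norm(centroids[i] - centroids[j])
                      for j in range(k) if j != i]
            worst.append(max(ratios))
        return float(np.mean(worst))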
A fast and high-performance PSO algorithm called MPREPSO was implemented by Tsai et al. [221] to handle the time complexity associated with the classical PSO. The proposed method adopted two other operators in addition to those of the classical PSO. The first, a pattern reduction operator, determined whether a pattern could be regarded as static, thus compressing it, and the second, a multi-start operator, improved the quality of the final results obtained. Experimental results showed that MPREPSO reduced the computational time and also provided better results when compared with results from five other existing algorithms.
Recently, Sharma and Chhabra [202] introduced a mutation operator into a hybrid PSO for sustainable automatic data clustering, calling their hybrid HPSOM. The proposed HPSOM was used to group data generated from different networks, which are usually dynamic and heterogeneous, and where the number of clusters is also unknown in advance. The HPSOM was further extended to AHPSOM, in order to generate and readjust the clusters automatically over the mobile network devices, thereby facilitating the generation of sustainable clusters. The performance of HPSOM was compared with some known evolutionary clustering methods. The effectiveness of AHPSOM was evaluated using cluster numbers, inter- and intra-cluster distances, ARI, and F-measure, as well as being compared with other state-of-the-art automatic clustering techniques. Results showed that the proposed algorithms were both superior to the competing algorithms, in terms of well-separated, compact, and sustainable clusters.
Rana et al. [179] proposed a hybrid sequential approach, which integrated PSO into a sequence with the k-means algorithm for data clustering. The proposed algorithm was able to handle the limitations of both the PSO and k-means algorithms and provide improved quality of clustering while also keeping the solutions from being trapped in local optima. The authors used the quantization error, intra-cluster distance, and inter-cluster distance to evaluate the quality of the clustering solutions. Further, in the proposed approach, the PSO was used to start the clustering search process due to its fast convergence rate. Then, the results from the PSO algorithm were fine-tuned by the k-means algorithm. The performance of the proposed hybrid sequential algorithm was compared with that of four other existing algorithms, and results showed that it generated better clustering solutions than the competing algorithms.
4.4.2 Firefly algorithm (FA)

The firefly algorithm has been widely and successfully applied to solve data sorting problems due to its several benefits, which include robustness, efficiency, versatility, and the ability to handle problems in different fields and domains, including those that are NP-hard. Comprehensive reviews of the FA were carried out by Fister et al. in 2013 and 2014, which discuss the diverse areas and broad spectrum of real-world applications where the algorithm has been successfully applied with satisfactory results. In both works, the authors went so far as to suggest future directions for the FA. Although the FA has been studied extensively and shown to have a good track record across diverse domains, its implementation in data clustering and automatic data clustering is, however, still very sparse. Few works were identified for this review on the application of the firefly algorithm to data clustering, and even fewer previous studies were found concerning its application to automatic data clustering. These two applications are discussed next.
A performance study on the firefly algorithm (FA) for data clustering was carried out by Senthilnath et al. [201]. They acknowledged the strengths of FA and applied the classification error percentage (CEP) to generate optimal cluster centroids. The standard FA was implemented for data clustering by focusing primarily on the attractiveness, light absorption, population size, and distance; CEP was applied in order to check the method that generated the optimal number of clusters. Further, FA was compared with ABC, PSO, and nine other clustering methods. Results showed that the classification efficiency of FA is superior to the others in terms of reliability, efficiency, excellent global performance, and robustness.
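The attractiveness and light-absorption mechanics mentioned above drive every firefly variant in this subsection. A minimal sketch of the standard movement rule follows, with illustrative defaults for beta0 (attractiveness at distance zero), gamma (the light-absorption coefficient), and alpha (the randomization weight):

    import numpy as np

    def firefly_move(xi, xj, beta0=1.0, gamma=1.0, alpha=0.2):
        # Move firefly i toward a brighter firefly j; attractiveness
        # decays with the squared distance via the absorption term gamma.
        r2 = np.sum((xi - xj) ** 2)
        beta = beta0 * np.exp(-gamma * r2)
        return xi + beta * (xj - xi) + alpha * (np.random.rand(xi.size) - 0.5)

In a clustering context, each firefly would encode candidate centroids, and brightness would be the clustering objective (e.g., intra-cluster distance or CEP).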
Hassanzadeh and Meybodi in 2012 presented a new hybrid approach based on FA and k-means for data clustering [95]. The proposed model, called K-FA, was implemented such that FA was first used to find cluster centroids for a user-specified number of clusters, and then the FA was extended using the k-means algorithm. The extension of the algorithm was aimed at refining the cluster centroids that had been detected by FA; also, global optima were used to improve the standard FA. Experimental results showed that K-FA outperformed three other clustering algorithms in terms of better efficiency and a decrease in intra-cluster distances, which allowed the k-means method to have a proper initialization.

Banati and Bajaj in 2013 conducted a viability performance analysis of FA for data clustering. The proposed method, called FClust, which is centroid-based, adopted the flashing behavior of fireflies as the objective function of the clustering problem to obtain the optimal solution. The performance of FClust was evaluated using two statistical criteria, namely the trace within criteria (TWR) and the variance ratio criteria (VRC) [171]. A comparison of the simulation results of FClust with those from the standard PSO and DE showed that FClust achieved the best mean fitness and standard deviation values on the VRC measure. Further, the quality of solutions obtained by FClust was evaluated using the number of function evaluations via the run length distribution (RLD) approach [99]. RLD for FClust showed that, when compared with results from the same algorithms as in Banati and Bajaj [28], it achieved the best function evaluation value and a faster convergence rate.
In 2015, Kaushik and Arora integrated FA with an improved genetic algorithm [120]; the hybrid was called FGA. The proposed model selects its initial population from a population pool, which is based on solutions from the firefly algorithm, i.e., the initial population is generated from the global best solutions of the firefly algorithm. FGA operates in two ways: first, the classical FA is applied to sets of a randomly selected initial population, which generates the chromosomes of a set, and secondly, the chromosomes are then positioned in the mating pool from where they partake in the mutation and crossover operations of the genetic algorithm. Also, the initialization stage of FGA results in global optimization, which prevents the solutions from getting trapped within the local optima. The test results, when compared to the classical genetic algorithm and firefly algorithm, showed that FGA had better inter-cluster and intra-cluster distances and more satisfactory results.
Nayak et al. [162] implemented an improved FA by incorporating a fuzzy c-means algorithm; the hybrids, called FAFCM and improved FAFCM, were used for real-world clustering datasets. These two hybrids addressed the shortfalls of the fuzzy c-means method: specifically, local optima entrapment and high sensitivity to initialization. FAFCM was designed with two stages: firstly, a standard firefly algorithm with fuzzy c-means clustering, and secondly, an improved firefly algorithm with fuzzy c-means clustering. The first handled the limitations of the fuzzy c-means algorithm by minimizing the objective function, while the second phase refined the cluster centers that had been identified from the first phase, and it also helped in further minimization of the objective function. FAFCM performance was compared with that of three other clustering algorithms, and the results showed that FAFCM had consistent results over the test datasets, a faster convergence speed, as well as a minimized objective function, although the number of clusters was predefined before centroid assignment by FAFCM.

An efficient hybrid method based on a modified FA and a dynamic k-means algorithm for data clustering was developed by Sundararajan and Karthikeyan [213]. The proposed algorithm is called the hybrid modified firefly and dynamic k-means algorithm. The dynamic k-means algorithm was incorporated so that it could adequately find the optimal number of clusters during execution time, as well as improve the cluster quality and optimality. Since the model works well for a predefined number of clusters, at each iteration it increments the cluster counter by one and determines new centroids until the required cluster quality is achieved. Experimental results showed that the proposed model found better cluster quality in less time, with increased optimality, than did the compared algorithm.
4.4.3 Artificial bee colony (ABC)

Karaboga and Ozturk [117] implemented the classical artificial bee colony algorithm for data clustering, applying it to handle the problems that are present in the classification of some benchmark datasets. The datasets were divided into training and testing datasets, and the classification error percentage (CEP) was used to evaluate the percentage of the test datasets with incorrectly classified patterns. Further, the ABC was used to minimize the sum of the Euclidean distances between the data points and the centroids for all the training datasets. The performance of ABC was compared with that of the PSO and nine other clustering methods, and results showed that ABC was able to classify the datasets more successfully than the other competing algorithms.
A hybrid between the discrete ABC and a greedy randomized adaptive search procedure, called hybrid DABC-GRASP, was proposed and implemented by Marinakis et al. [147] to optimize the clustering solution with a known (user-defined) number of clusters. The algorithm comprised two stages: firstly, feature selection, and then the clustering solution. In the proposed model, the feature selection is addressed by the discrete ABC, while the greedy randomized procedure solves the clustering phase. Experimental results showed that the hybrid DABC-GRASP had a better performance, with the largest number of correct clusters, than the other eleven compared algorithms, although the number of clusters was defined a priori.

Tran et al. [220] proposed a hybrid clustering method based on hybridizing an enhanced ABC with the k-means algorithm, the new approach being called EABCK. In the method, ABC was enhanced by a new mutation operator, which was guided by the best global solution obtained from the enhanced ABC alone, without the k-means algorithm (EABC). After that, the global best solution in each iteration was updated using the k-means algorithm. EABCK was compared with six other clustering algorithms, and the results showed that EABCK outperformed the others in terms of convergence speed and accuracy. Since the number of clusters was predefined by the method, the authors suggested that for future research, the proposed method should be applied to solve high-dimensional datasets, as well as the automatic clustering problem.
Similarly, Karaboga and Ozturk [117], together with Ozturk et al. [168], devised an improved binary ABC for dynamic clustering, called IDisABC. The discrete ABC has the shortfall of depending on measuring the similarity between binary vectors through the Jaccard coefficient; the IDisABC addresses this shortfall by using all the similarities to efficiently enhance the discrete ABC through the genetic components. The crossover and swap operators are then used on the newly generated solutions according to the similarity cases. The VI index and correct classification percentage (CCP) were used to evaluate the efficiency of the proposed model and the quality of the clustering results. The performance analysis of the method was compared with that of five other clustering algorithms, and the results clearly show that IDisABC obtained the optimal number of clusters automatically, with good quality of solutions and fast convergence rates. Further, IDisABC was applied to image segmentation, where it was able to achieve optimal clusters with reasonable classifications for the set of images used.

Kuo et al. [129] integrated ABC with kernel clustering to devise a method called AKC-BCO. When their proposed algorithm was implemented to solve automatic data clustering, it determined the appropriate number of clusters as well as correctly assigning data points to clusters. They accomplished this by using a kernel function, which increased the clustering capability of the ABC algorithm. Experimental results showed that AKC-BCO was superior to the three other algorithms chosen for comparison in terms of faster convergence, no local optimum entrapment, and better and more stable clustering results. The AKC-BCO was further applied to a real-life case of a prostate cancer prognosis system, and results revealed that AKC-BCO clustered patients' test data appropriately and was also able to predict survival chances for patients diagnosed with the disease.
Similar to the study in Kuo et al. [129], Kuo and Zulvia [128] proposed improving the ABC by incorporating the k-means algorithm for automatic data clustering, and this was further applied to customer segmentation [128]. The hybrid method, called iABC, improved the classical ABC by directing the movements of bees to better positions and providing better initial centroids for the clusters defined by the k-means algorithm. These centroids then provide the onlooker bees with an improved method of finding a better solution faster than was possible with the first onlooker bees' movement. Simulation results showed that iABC, which used the VI index to evaluate its effectiveness, was superior to seven other automatic clustering algorithms in terms of better and steadier solutions, although with a relatively high computational time. When the method was further applied to customer segmentation, results showed that iABC classified customers appropriately into ten different clusters so that organizations would be able to identify potential customers and design the most suitable marketing strategy to bring new customers onboard, thereby increasing profit.

4.4.4 Ant colony optimization (ACO)

Pacheco et al. [169] addressed the problem of automatic grouping of data by implementing an automatic clustering method based on the collective intelligence of ants in the ant colony optimization (ACO) algorithm. The proposed method, called Anthill, made use of adaptive strategies to speed up the process of building the solution. The silhouette index and visual inspection were used to evaluate the performance of the proposed model and also to assess the quality of the generated clusters. Experimental results on the proposed Anthill algorithm indicated excellent performance when compared to results from three other existing methods, and it obtained significant partitioning of the found clusters.
Niknam et al. [165] proposed an efficient hybrid evolutionary algorithm, called ACO-SA, that combined the ant colony optimization (ACO) and simulated annealing (SA) algorithms to solve the clustering analysis problem. The proposed model, which is applicable only when the number of clusters is known a priori, was intended to find optimal or near-optimal solutions for clustering problems. Simulation results of the ACO-SA showed that the hybrid algorithm outperformed the basic SA, ACO, and k-means, individually, for the partitional clustering problem in terms of robustness and efficiency.

Liu and Fu [142] proposed the ESacc clustering algorithm, which was based on ant colony optimization, to solve unsupervised clustering. Their proposed method iteratively keeps the best solutions stochastically. The proposed method made use of the Dunn, Jaccard, Fowlkes–Mallows, and Rand indices to evaluate the optimal number of clusters. Computational results obtained from ESacc were compared with those from the original Sacc algorithm and showed that ESacc had a shorter run time, a better clustering effect, more performance stability, and greater efficiency.
The paper presented by Boryczka [35] used a modification of Lumer and Faieta's algorithm for data clustering, called the ant-based clustering algorithm (ACA). The approach mimics the clustering behavior that had been observed in real ant colonies. It improved clustering convergence and the spatial separation between clusters. Further, the algorithm was able to detect the number of clusters automatically without the prior need for information about the data objects. They used the Euclidean distance, cosine measure, and Gower measure to evaluate the quality of the clustering solutions so obtained. Although ACA dealt with numerical databases, it did not require any information about the features of the clusters or the number of clusters. Also, the ACA algorithm was able to obtain comparable clustering results when compared with another existing algorithm. The authors, however, suggested that the ACA algorithm could be hybridized with other metaheuristics to improve its performance and efficiency.
4.4.5 Symbiotic organism search (SOS)

A new swarm metaheuristic algorithm, called symbiotic organism search (SOS), was implemented for automatic data clustering by Zhou et al. [245]. SOS mimics the symbiotic interactions of organisms needed for survival and proliferation in an ecosystem; it was proposed to address the shortfalls of the k-means algorithm. SOS adopts three biological interaction phases, namely mutualism, commensalism, and parasitism. In the mutualism phase, organisms benefit from each other without either of them impeding the other; in commensalism, only one organism benefits from the interaction but does not cause any harm to the other; while in the parasitism phase, one organism benefits from the interaction while causing harm to the other. The simulation results from SOS were compared with those for seven other metaheuristic algorithms, and it was shown that SOS outperformed the others in terms of quality of solutions, accuracy, stability, and convergence speed. The computational time, however, was higher than for any of those compared, which is due to it having the most function evaluations.
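As an illustration of the mutualism phase described above, the sketch below follows the standard SOS update in which two organisms move toward the current best solution via their mutual vector; the benefit factors drawn from {1, 2} and the greedy replacement rule are part of the canonical SOS formulation, while the fitness callback (e.g., a cluster validity index over decoded centroids) is an assumed placeholder.

    import numpy as np

    def mutualism_phase(xi, xj, xbest, fitness):
        # Both organisms benefit: each moves toward the best organism,
        # guided by the pair's mutual vector and a random benefit factor.
        mutual = (xi + xj) / 2.0
        bf1, bf2 = np.random.randint(1, 3), np.random.randint(1, 3)
        xi_new = xi + np.random.rand(xi.size) * (xbest - mutual * bf1)
        xj_new = xj + np.random.rand(xj.size) * (xbest - mutual * bf2)
        # Greedy selection: keep a new organism only if it improves.
        xi = xi_new if fitness(xi_new) < fitness(xi) else xi
        xj = xj_new if fitness(xj_new) < fitness(xj) else xj
        return xi, xj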
4.4.6 Bacterial evolutionary algorithm (BEA)

Das et al. [52] proposed a bacterial evolutionary algorithm for automatic data clustering (ACBEA). The proposed method, according to the authors, was inspired by biological microbial evolution, and it uses the operations of bacterial mutation (which mimics the process occurring at the genetic level in bacteria and improves the chromosome parts) and gene transfer (in which information is exchanged between chromosomes in the population). The operators were then modified to handle the variable length of the chromosomes that encode different clustering classifications. The CS index was used to evaluate the performance of the proposed algorithm, which was then compared with that of two other clustering algorithms, to show that ACBEA was superior in terms of result accuracy.

4.4.7 Grey wolf optimizer (GWO)

A grey wolf optimizer (GWO)-based automatic clustering for satellite image segmentation was proposed by Kapoor et al. [114]. The algorithm was further applied to two satellite images of New Delhi, and its performance was evaluated using the DB index, inter-cluster distance, and intra-cluster distance. Computational results showed that GWO is computationally efficient, and its accuracy is superior to those of the other three compared clustering algorithms. Furthermore, the result of the image segmentation showed that GWO reveals the growth of urbanization and infrastructure and a decrease in green forest vegetation in the surrounding areas of New Delhi.
4.4.8 Sine–cosine algorithm (SCA)

More recently, Elaziz et al. [64] proposed an automatic data clustering algorithm, called ASOCSA, which is based on the hybridization of the sine–cosine algorithm (SCA) with atom search optimization (ASO). The main objective of the proposed algorithm is to automatically find the optimal number of centroids and their respective positions to minimize the CS index. To achieve this, ASOCSA improves on the original ASO by adopting SCA as the local search operator. The effectiveness of the proposed clustering algorithm was evaluated using different validity indices, such as the Dunn index, silhouette index, Davies–Bouldin index, and the Calinski–Harabasz index. ASOCSA showed superiority over five other existing clustering algorithms in terms of robustness and efficacy.

4.4.9 Cuckoo search (CS)

Goel et al. [84] proposed a cuckoo search clustering algorithm (CSCA). The CSCA was able to group a set of data points into clusters having similar attributes. The algorithm was also able to work in an unsupervised way without having to consider the class of the data points during the clustering process. The Davies–Bouldin (DB) index was used to evaluate the performance of the proposed method. Experimental results showed that the CSCA algorithm demonstrated high accuracy. The authors further applied the CSCA algorithm to a satellite image and used it to extract images of water from a real-time multispectral remote sensing image.
Senthilnath et al. [200] performed a comparative study based on three nature-inspired techniques: the genetic algorithm (GA), particle swarm optimization (PSO), and cuckoo search (CS) were implemented and analyzed for their performance on the clustering problem. The cuckoo search exploits the Lévy flight mechanism, which is heavy-tailed and helps in covering the output domain efficiently. To evaluate the clustering solutions obtained by the three algorithms, the authors employed the classification error percentage (CEP) and a statistical significance test. In the experimental results, the GA took less time than the PSO, and the CS algorithm required considerably less time than the PSO as well. The heavy-tailed property of the Lévy flights also helped the solutions to converge quickly, thus increasing efficiency.
Bouyer et al. [36] implemented a hybrid approach based on integrating the cuckoo search and differential evolution algorithms for data clustering, calling the hybrid HCSDE. The HCSDE algorithm firstly initializes a random population. Then, the cuckoo search uses the Mantegna Lévy distribution to produce new nests and also boost the local search capability. As validity metrics to evaluate the performance of HCSDE, the authors used the intra-cluster distance measure (for an internal quality measure) and the error rate (ER) (for an external quality measure). Experimental results were compared with those for six other algorithms, which showed that HCSDE was superior in terms of convergence speed, accuracy, and a better total within-variance value. However, in some instances, the HCSDE algorithm became trapped in local minima. Hence, the authors proposed that this shortfall should be a focus of future research.
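Both cuckoo-search studies above rely on heavy-tailed Lévy flights for exploration. A minimal sketch of Mantegna's algorithm, the usual way such steps are drawn; the exponent beta = 1.5 is a conventional illustrative choice:

    import numpy as np
    from math import gamma, pi, sin

    def levy_step(beta=1.5, size=1):
        # Mantegna's algorithm for heavy-tailed Levy flight steps, as
        # used by cuckoo-search variants to generate new nests.
        sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
                 (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
        u = np.random.normal(0.0, sigma, size)
        v = np.random.normal(0.0, 1.0, size)
        return u / np.abs(v) ** (1 / beta)

Occasional large steps from the heavy tail are what allow the search to escape local basins.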
4.4.10 Bat algorithm (BA)

Jensi and Jiji [110] implemented a modified bat algorithm, called MBA-LF, for data clustering. In their work, they employed the Lévy flight mechanism to accelerate the movement and foraging abilities of the bats in order to enhance the search process. Further, the Lévy flight was used to improve the quality of the clustering results. They used the Euclidean distance to evaluate the distances between the clusters that had been obtained. The computational results, when compared with those from three other existing algorithms, showed that MBA-LF achieved better clusters for the test data objects, escaped entrapment in local optima, and effectively explored the search space.

4.4.11 Bee-inspired algorithm (BeeA)

A bee-inspired algorithm built around a new encoding scheme, called cOptBees, was presented by Cruz et al. [47]. The proposed algorithm employed an encoding scheme whereby each bee represented a prototype of the generated clusters. The proposed method was able to generate and maintain the diversity of solutions by finding multiple suboptimal solutions in a single run. Furthermore, the method explored the multimodality feature that is associated with bee colonies. They used entropy, the classification error percentage (CEP), purity, and the silhouette index as the fitness functions to evaluate the quality of the clustering solutions obtained and the performance of the algorithm. The test results that were presented showed that, when cOptBees was compared to five other algorithms on bi-dimensional and n-dimensional datasets, it obtained better performance than the other competing algorithms. It was able to find high-quality cluster partitions without needing to know the appropriate number of clusters in the dataset.
4.5 Clustering using the plant-based algorithm

4.5.1 Flower pollination algorithm (FPA)

In the study carried out by Wang et al. [228], a flower pollination algorithm (FPA) with bee pollinators was proposed to solve the clustering problem. The proposed method, called BPFPA, employed the discard pollen operator and the crossover operator to increase diversity in the population, and it enhanced the local search ability by using the elite-based mutation operator. The pollens represent the centroids of the predefined clusters. These operators were incorporated to address the local entrapment problem and the poor explorative ability of the basic FPA, thus enhancing the explorative ability, improving the convergence speed, and increasing the diversity in the population. Experimental results were compared with those from six other metaheuristic algorithms and showed the BPFPA's superiority in terms of a higher level of stability, higher accuracy, and faster convergence, indicating it was more competitive in solving clustering problems. The authors suggested that BPFPA could be extended to determine the optimal number of clusters dynamically and that its applicability to higher-dimensional problems should be investigated.
Agarwal and Mehta [12] studied the application of a modified flower pollination algorithm (FPA) to solve the data clustering problem. The objective function of the proposed MFPA-C is to maximize both intra-cluster similarity and inter-cluster dissimilarity. The performance of MFPA-C was compared to that of three other clustering algorithms, which showed that it achieved more promising results, with more consistent performance, than did the others.

4.6 Clustering using breeding-based algorithms

4.6.1 Differential evolution (DE)
Das et al. [50] proposed an improved differential evolution (DE) algorithm for the automatic clustering of unlabeled datasets, called ACDE. The proposed algorithm was able to find the optimal number of clusters automatically and was applicable to high-dimensional datasets. Further, the performance of the proposed method was validated with the CS index [40] and DB index [54]. From the statistical analysis of the experiments, ACDE outperformed the other compared state-of-the-art algorithms, including the classical DE clustering algorithms, although it did not win in all the instances.
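A recurring design question in the automatic clustering methods of this section is how a fixed-length search vector can represent a variable number of clusters. The sketch below illustrates the activation-threshold encoding commonly associated with ACDE-style algorithms: each candidate carries k_max thresholds plus k_max centroids, and only centroids whose threshold exceeds 0.5 take part in the partition. The repair rule forcing at least two active clusters is one common convention, shown here purely for illustration.

    import numpy as np

    def decode_acde(vector, k_max, dim):
        # The first k_max genes are activation thresholds; the remaining
        # k_max * dim genes are candidate centroids. A centroid is active
        # only if its threshold exceeds 0.5, so the effective number of
        # clusters can grow and shrink as the vector evolves.
        thresholds = vector[:k_max]
        centroids = vector[k_max:].reshape(k_max, dim)
        active = thresholds > 0.5
        if active.sum() < 2:  # illustrative repair: force two clusters
            active[np.argsort(thresholds)[-2:]] = True
        return centroids[active]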
Lee and Chen [136] implemented an improved differential evolution (ACDE-O) algorithm using a crisp number of oscillations for automatic clustering. The oscillation mechanism was used to improve the search possibility of finding more possible clusters in the case where the number of initial clusters was inadequate as a result of bad clusters. Their test results, when compared to those for another clustering algorithm, showed that ACDE-O was better at finding a more suitable number of clusters.
Saha et al. [192] implemented a differential evolution (DE)-based fuzzy clustering for automatic cluster evolution (ADEFC), where they used the Xie–Beni index to assign points to different clusters. The Xie–Beni index was also used as the validity measure for the cluster partitioning, and then, the centers of the clusters were encoded in vectors represented by 0's and 1's. A masker cell value of 1 indicates that the encoded center of the vector can participate in the fuzzy cluster, while a value of 0 indicates otherwise. The superiority of the proposed technique, as compared with other algorithms, showed that ADEFC consistently performed better than did the other clustering techniques.
Maulik and Saha [150] then extended their study and proposed a new real-coded modified DE-based automatic fuzzy clustering method, called MoDEAFC [149]. The method was implemented to address the issues of proper cluster numbering, as well as good partitioning. It extended the ADEFC by using a fixed-length representation to encode the centroids of each individual and a masker to activate or deactivate a centroid. The Xie–Beni index was also used in assigning proper points to different clusters by minimization, while also considering the Euclidean distance. A new mutation operator, which replaced the mutation operator of the classical DE, decayed exponentially within the range [1, 0.5]. Experimental results showed that MoDEAFC consistently performed better in terms of the accurate number of clusters than did the four other compared algorithms. Further, its application to IRS satellite images of Calcutta and Mumbai showed its efficiency in image segmentation.
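Since both ADEFC and MoDEAFC steer their search by minimizing the Xie–Beni (XB) index, a small reference computation may help. The sketch assumes a fuzzy membership matrix u of shape (n, k) and the usual fuzzifier m = 2:

    import numpy as np

    def xie_beni(data, centroids, u, m=2.0):
        # XB index: fuzzy within-cluster compactness divided by the
        # minimum squared separation between centroids; smaller values
        # indicate compact, well-separated partitions.
        d2 = np.linalg.norm(data[:, None, :] - centroids[None, :, :],
                            axis=2) ** 2
        compactness = np.sum((u ** m) * d2)
        k = len(centroids)
        sep = min(np.sum((centroids[i] - centroids[j]) ** 2)
                  for i in range(k) for j in range(k) if i != j)
        return compactness / (len(data) * sep)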
Another multi-objective application of DE to automatic fuzzy clustering, called MODE, was presented by Suresh et al. [214]. These authors used a real-encoded centroid-based scheme for their search variables, which also contained the variable number of cluster centers. Further, the best solutions from the Pareto optimal set were obtained using a gap statistic. MODE was compared with four other state-of-the-art multi-objective methods, and results showed that it produced better clustering results than did the others.

Kundu et al. [127] implemented GADE, which is an integration of a multi-objective DE-based algorithm with the genetic algorithm (GA), for automatic clustering. The proposed model incorporated some operators of the classical genetic algorithm and used the XB and FCM indices as the objective functions to be optimized. Computational experiments showed that GADE, when compared with the results obtained from two other algorithms, achieved the best performance in terms of the adjusted Rand index and silhouette index with an equal number of runs for all the generations. Zhong et al. [243] also optimized the XB and FCM indices for multi-objective DE by utilizing a two-layer fuzzy clustering technique, called AFCMDE.
4.6.2 Genetic algorithm (GA)

A genetic algorithm (GA) that was modified to improve the accuracy of classification in cluster analysis was devised by Wang and Wu [229]. Called the chaotic genetic algorithm (CGA), the proposed method adopted the ergodic property of chaotic phenomena to optimize the initial population in order to speed up the process of the selection, crossover, and mutation operators, as well as the convergence of the genetic algorithm. Experimental results showed that the proposed CGA attained global cluster centroids and greatly improved the amplitude of operation compared with the three other existing models. However, CGA was not able to assign clusters automatically.
Dutta et al. [62] implemented a mixed-feature multi-objective GA with k-means for data clustering, called MOGA. MOGA addressed the issues of continuous and mixed features that are present in datasets. It simultaneously optimized the intra-cluster distance (homogeneity) and inter-cluster distance (separation) by using a unique distance feature, which was sufficient for both the continuous and mixed features. Experimental results showed that MOGA achieved accurate cluster centroids. The model, however, was not able to deal with unseen data points and missing features and did not obtain the optimum number of clusters.

The earliest attempt at using the genetic algorithm for automatic data clustering, called CLUSTERING, was presented by Tseng and Yang in 2001 [223]. The proposed approach addressed clustering in three ways: firstly, a nearest-neighbor clustering method was used to group together data points that are similar in order for small clusters to be obtained. Secondly, the proposed method merged the set of small clusters into larger ones, which was done by using a weighted difference between the BGS and WGS indices that define the fitness function. Lastly, the appropriate cluster partitioning was done using a heuristic approach. Simulation results showed that CLUSTERING outperformed three other compared algorithms.
An automatic clustering algorithm that used the genetic algorithm, called AGCUK, was implemented by Liu et al. in 2011. A noisy selector and a division-absorption mutation operator were used to create a balance between selection pressure and population diversity. The model also adopted a cluster-based representation, whereby an individual represents a real-coded chromosome of variable length, which is randomly selected. Further, they used the DB index to compute the fitness of an individual. AGCUK outperformed four other automatic clustering algorithms in terms of obtaining the optimum number of clusters and lower misclassification rates.

Agusti et al. [16] presented an automatic clustering method based on the grouping genetic algorithm (GGA). In the proposed GGA, they applied a type of partition-based encoding scheme, and the DBI index was used as the objective function. The GGA operated as follows: firstly, the crossover operation was performed between the groups, and the offspring were arranged according to the generated groups; then, the mutation operator was applied by merging and splitting clusters; and finally, a local search was used to find the local optimum close to the solution. Although GGA was not tested on real-world datasets, the representation scheme resulted in a high time complexity for a high volume of data.
Similar to the study in [16], Salcedo-Sanz et al. [195] focused on using a fuzzy version of the DBI index, and their encoding scheme was composed of the membership matrix and group members for their proposed GGA-based method. However, the encoding scheme they adopted also resulted in high time complexity as a result of the data size.

Also, similar to the studies carried out in [16, 195], Raposo et al. implemented an automatic clustering method using a genetic algorithm with new solution encoding and operations, called the automatic clustering genetic algorithm (ACGA) [184]. The method adopted a new solution encoding-based scheme that had not been tested with the classical GA, and new genetic operators (two new mutations and one new crossover) were developed to ensure that there was high diversity in the population. The CH index was used to test the effectiveness of ACGA; experimental results showed that ACGA outperformed two classical algorithms in terms of better convergence and higher fitness function values.
In 2012, He and Tan proposed a two-stage genetic algorithm for automatic clustering, which they called TGCA. The model employed the selection and mutation operators of the classical genetic algorithm but changed the probabilities of these operators according to the consistency of the number of clusters present in the population. TGCA focused firstly on searching for the best number of clusters and then gradually moved to finding the globally optimal centroids. The model was evaluated using the CH index as the fitness function. The efficiency of TGCA was shown when it was compared with three automatic clustering algorithms. It was evident that TGCA did better in terms of automatically finding the correct number of clusters and clustering accuracy. Two limitations of the method are, firstly, that the quality of the final clustering solution may not be good enough due to the failure to capture the chromosomes representing all clusters as a result of the random selection, and, secondly, that the method does not rearrange the chromosomes before the crossover operation.
To address these limitations, Rahman and Islam [175] proposed a different GA-based approach, called GenClust, that is capable of identifying the right chromosomes using a novel initial population selection approach, chromosome rearrangement, a twin removal operator, and a fitness function, and then also finding the right number of clusters automatically. GenClust avoided a user-defined number of clusters while achieving clusters of high quality. The superiority of GenClust over five other existing approaches proved that it was able to rearrange chromosomes with the aid of k-means, which produced better results than those of He and Tan's algorithm [97]. Further, GenClust's initial selection of a population was based on a deterministic and random process. GenClust, however, was not able to handle datasets of high dimension due to the increased complexity of the GA.
4.6.3 Invasive weed optimization (IWO)

The first attempt at using IWO for automatic data clustering was proposed by Chowdhury et al. in 2011. The algorithm made use of a modified Sym-K index as the fitness function, in order to evaluate the appropriate partitioning of the datasets. The performance of the algorithm was compared with that of three other algorithms, and results showed that IWO partitioned data better than they did, which was evident in the Minkowski scores on the real-life datasets. However, the optimal solutions were derived with a minimum number of populations, which also reduced the computational time.

Zhao and Zhou [242] proposed an improved kernel possibilistic fuzzy c-means algorithm for the clustering analysis problem, called IWO-KPFCM, which was based on the IWO algorithm; the proposed model was designed to handle the issues arising from both the fuzzy c-means and the possibilistic fuzzy c-means algorithms. IWO-KPFCM at first uses the basic IWO to find the optimal solutions for the initial centroids, and then maps the input data from the sample space into the high-dimensional feature space by using the kernel approach. Further, the sample variance is infused into the objective function to measure the degree of data compactness, and then, the improved algorithm clusters the data. Results clearly showed that the proposed method had increased cluster accuracy, faster convergence speed, and a more robust ability to repel noise and outliers than did the two other compared algorithms.

Liu et al. [139] implemented a multi-objective IWO algorithm, called MOIWO, to solve the clustering problem [139]. A feedback-update mechanism was employed to maintain the diversity of the number of clusters during the iteration process. The feedback-update mechanism helps the solution set to accommodate all the types of cluster numbers that are identified. Further, the silhouette index was used to evaluate the efficiency of the proposed algorithm as well as to select the best solution. Experimental results showed good performance of the multi-objective IWO, although it was not able to detect the optimal number of clusters automatically.
4.7 Clustering with the social human behavior-based algorithms

4.7.1 Teaching learning-based optimization (TLBO)

Satapathy and Naik [197] developed a TLBO algorithm, which they used to find the centroids of a user-specified number of clusters. The proposed method made use of the two phases of the TLBO algorithm, namely the teacher phase and the learner phase. The teacher phase is when learning from the teacher occurs, while in the learner phase learning comes from the interactions occurring between learners. The learner phase corresponds to the fitness function, while the teacher phase relates to the best solution. The proposed algorithm halted once the user-specified number of iterations was exceeded. Experimental results showed that the proposed TLBO method had more potential to find appropriate centroids for the predefined number of clusters than did the two other compared algorithms.
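The teacher and learner phases described above translate directly into two simple update rules. The sketch below shows one canonical TLBO generation over a real-valued population; the teaching factor drawn from {1, 2} follows the standard formulation, while the fitness callback (for clustering, typically an intra-cluster distance over decoded centroids) is an assumed placeholder.

    import numpy as np

    def tlbo_generation(pop, fitness):
        scores = np.array([fitness(x) for x in pop])
        teacher = pop[scores.argmin()]       # best learner acts as teacher
        mean = pop.mean(axis=0)
        n, d = pop.shape
        for i in range(n):
            # Teacher phase: move toward the teacher, away from the mean.
            tf = np.random.randint(1, 3)     # teaching factor in {1, 2}
            cand = pop[i] + np.random.rand(d) * (teacher - tf * mean)
            if fitness(cand) < scores[i]:
                pop[i], scores[i] = cand, fitness(cand)
            # Learner phase: learn from a randomly chosen peer.
            j = np.random.choice([p for p in range(n) if p != i])
            step = pop[j] - pop[i] if scores[j] < scores[i] else pop[i] - pop[j]
            cand = pop[i] + np.random.rand(d) * step
            if fitness(cand) < scores[i]:
                pop[i], scores[i] = cand, fitness(cand)
        return pop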
In another related study, Sahoo and Kumar [193] proposed two different modifications of the TLBO method to enhance its performance in clustering domains. The modifications were such that, instead of random initialization, a predefined method was used to exploit the initial centroids. Also, the technique could handle data vectors that had gone out of the boundary conditions. The performance of the proposed modified TLBO was evaluated based on the quantization error, intra-cluster distance, and inter-cluster distance. Further, a comparison was made between the modified method and three other algorithms, including the basic TLBO algorithm. From the experimental results, the proposed modified TLBO method showed more accurate results than did the others.

Similar to the study carried out by Das et al. [50] (Sect. 4.4.1), Murty et al. [159] implemented a teaching learning-based optimization (AUTO-TLBO) for automatic data clustering. The effectiveness of their proposed method was evaluated with the CS validity index and compared with that of four other existing algorithms. Results showed that AUTO-TLBO was superior to the other techniques in terms of optimally finding the number of clusters automatically and a fast convergence rate, although their method did not win for all the test instances.
4.7.2 Imperialist competitive algorithm (ICA)

The imperialist competitive algorithm was recently applied for the first time to solve automatic data clustering problems, in a study carried out by Aliniya and Mirroshandel in 2019. The proposed algorithm, called automatic clustering using an imperialist competitive algorithm (AC-ICA), was based on a novel combinatorial merge-split method. In the proposed method, the authors introduced a change at the assimilation step of colonies in order to increase the exploration ability of the colonies' movement. Furthermore, a new method was provided to change the number of clusters by combining a random and a homogeneity-based merge-split approach. Also, for the re-initialization of empty centroids, an efficient method based on density was adopted. To address the automatic clustering problems, the initialization and imperialist competition steps were changed. AC-ICA used the functions of purity, entropy, the Rand index (RI), and the adjusted Rand index (ARI) to determine the fitness values and quality of the solutions obtained. Computational results were compared with the imperialist competitive algorithm (ICA) and its three other variants, and they showed that AC-ICA achieved a more accurate number of clusters, a better convergence rate, higher accuracy of solutions, and better homogeneity. Further, when AC-ICA was applied to face recognition, the results achieved were satisfactory.
4.8 Clustering with physics-based algorithms

4.8.1 Gravitational search algorithm (GSA)

Kumar and Sahoo [126] carried out a review study on the gravitational search algorithm (GSA), which is based on the theory of gravity, and its application to data clustering. The algorithm is able to solve large problems, including optimization problems, because it requires only two parameters to be adjusted and has the ability to find near-global optimum solutions. This ability allows the algorithm to provide better results when compared with other nature-inspired algorithms. The authors went further to discuss the variants of GSA and hybrid methods. As reported, hybrids based on GSA with other algorithms handled a more comprehensive range of problems, thus providing more robust solutions and enhancing the capabilities of GSA. Furthermore, GSA and four of its hybrid-based variants were compared with seven other algorithms, and results showed that two of the hybrid-based GSA algorithms obtained a better quality of solutions, better computational time, and more efficient convergence. Moreover, it was reported that the GSA was widely applied and reported in many publications in the areas of computer science and computing, as well as civil and mechanical engineering.

Kumar et al., in 2014, implemented a gravitational search algorithm for the automatic evolution of clusters and applied it to image segmentation (Kumar, Chhabra, and Kumar [124]). The proposed model, called ACGSA, used a variable chromosome representation to encode cluster centroids for different numbers of clusters. Further, two new operations of threshold setting and weighted cluster centroids were employed to refine the centroids. Experimental results demonstrated that ACGSA outperformed five other existing automatic clustering methods in terms of efficiency, determining the accurate number of clusters, best partition, and effectiveness. Moreover, its application to the automatic segmentation of both grayscale and colored images also showed its efficacy.
4.8.2 Harmony search (HS)

In 2016, Kumar et al. [125] developed a parameter adaptive harmony search algorithm (ACPAHS) for automatic data clustering. The authors used a real-coded variable-length harmony vector, which was able to detect the number of clusters automatically. Furthermore, the assignment of data points to the different cluster centers was done using a new approach of weighted Euclidean distance, which was able to detect any type of cluster regardless of its geometric shape. The authors also applied their method to automatic image segmentation and compared it with four other existing clustering techniques. Experimental results showed that ACPAHS outperformed the other techniques in detecting the number of clusters automatically and gave better clustering results.
4.8.3 Black hole (BH) algorithm

Hatamlou [96] proposed another data clustering algorithm, called the BH algorithm, which is inspired by the black hole phenomenon. His proposed algorithm has two significant advantages over the compared algorithms: first, it has a simple structure and easy implementation; second, it is free from parameter tuning issues. Further, in the proposed BH algorithm, the best candidate among all the candidates in the search space at each iteration is selected as the black hole, while all the other candidates are generated as the normal stars. The creation of the black hole is not a random process; instead, it involves one of the real candidates of the population. After that, all the candidates migrate toward the black hole, based on their current location and a random number. The BH algorithm also has the ability for the black hole to absorb the stars that surround it. The performance of the BH algorithm was evaluated using the intra-cluster distance measure (for an internal quality measure) and the error rate (ER) measure (for an external quality measure). Experimental results, when compared with those from four other existing algorithms, showed that the BH algorithm obtained higher-quality clusters than the others. For future research on this work, the author suggested that the BH algorithm could be combined with other nature-inspired algorithms for greater effectiveness.
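The star movement and absorption rules just described have a compact form: each star drifts toward the black hole by a random fraction of the remaining distance, and a star that crosses the event horizon (commonly taken as the black hole's fitness divided by the swarm's total fitness) is replaced by a fresh random candidate. A minimal sketch under those assumptions, for a minimization problem with positive fitness values:

    import numpy as np

    def black_hole_step(stars, fitness, lo, hi):
        # Stars drift toward the best candidate (the black hole); any star
        # crossing the event horizon is swallowed and re-initialized,
        # which keeps the search exploring.
        scores = np.array([fitness(s) for s in stars])
        bh = stars[scores.argmin()].copy()
        radius = scores.min() / scores.sum()   # event horizon
        n, d = stars.shape
        for i in range(n):
            stars[i] = stars[i] + np.random.rand(d) * (bh - stars[i])
            if np.linalg.norm(stars[i] - bh) < radius:
                stars[i] = lo + np.random.rand(d) * (hi - lo)
        return stars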
Abdulwahab et al. [2] addressed the issue of the exploration capabilities associated with the original black hole algorithm by introducing the Lévy flight mechanism. Their proposed algorithm, called Lévy flight black hole (LBH), was able to optimize the performance of the original black hole algorithm with an enhanced global search capacity to avoid being trapped in local minima. In the proposed LBH, the movement of each star depends mainly on the step size that is generated by the Lévy distribution. Each star explores an area that is far from the current black hole when the step size is big, while, when the step size is small, the star explores an area that is near the current black hole. Further, the Lévy flight resolved the issue of global optimization and efficiently improved the search capability of the stars within the search space. The performance of the LBH algorithm was evaluated using the Euclidean distance measure. Furthermore, the performance of the LBH algorithm was compared with that of nine other clustering algorithms, and results showed that LBH clustered the data objects efficiently, escaped local optima entrapment, and explored the search space more effectively than the other competing algorithms did.
clustering effects on high-dimensional datasets than did the computation between antibodies and antigens. In the new
other competing algorithms. method, the Euclidean distance was used to measure the
distance between the antigens and cells (data vector), as
4.9.2 Dynamic local search follows. On the one hand, if the Euclidean distance
obtained is less than or equal to the network affinity
Liu et al. [140] implemented an automatic clustering threshold (NAT), the value is selected and retained in the
algorithm, called DLSIAC, which was based on a dynamic long-term memory. If, on the other hand, the distance
local search. The proposed algorithm was developed to measure is more than one, the cell identifies the antigen,
automatically generate the number of clusters as well as and then, the closest cell is selected. Once all the cells have
proper partitioning for some selected datasets. Two tech- been presented to the antigen, the closest cell that has been
niques contributed to the performance of the proposed selected eliminates the antigen. The cells can identify more
algorithm. Firstly, a dynamic local search strategy was than one antigen, and the resulting cells are the represen-
employed to find the correct numbers of clusters and tation of the data that is being processed. Further, the
123
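To make the NAT rule above concrete, here is a minimal sketch of the matching step, with illustrative data and threshold (the names and values are assumptions for the example, not the authors' implementation):

```python
import numpy as np

def present_antigen(antigen, cells, nat):
    """One matching step: cells within the NAT are retained in memory;
    the closest cell recognizes (and may eliminate) the antigen."""
    dists = np.linalg.norm(cells - antigen, axis=1)  # Euclidean distances
    retained = cells[dists <= nat]                   # kept in long-term memory
    closest = cells[np.argmin(dists)]                # recognizes the antigen
    return retained, closest

rng = np.random.default_rng(0)
cells = rng.random((20, 2))        # randomly generated cells (data vectors)
antigen = rng.random(2)            # one antigen (data point)
memory, winner = present_antigen(antigen, cells, nat=0.25)
```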
Further, the selected cells represent the data being compressed. In the second phase, the algorithm was used for clustering and data visualization. All the cells that had been selected and stored in the memory were then matched to other similar/identical cells to form clusters. The NAT value in this phase is recalculated using the memory cells and a new randomly chosen vector. If the value of the Euclidean distance is less than or equal to the NAT value, the cells that are close to each other are linked together to form clusters. Furthermore, the elimination of the antigens reduces the time complexity of repeatedly matching similar cells, which is used as the stopping criterion. The AIS algorithm was efficient enough in the clustering results that were obtained.

4.9.5 Metaheuristics for automatic clustering

José-García and Gómez-Flores [111] presented a survey study on some nature-inspired metaheuristic algorithms that had been used for automatic data clustering. The study also covered some cluster validity indices (CVIs) that were applied to evaluate the quality of automatic clustering algorithms as well as clustering solutions. The authors reviewed a total of 65 automatic clustering approaches, which were based on single-solution, single-objective, and multiobjective metaheuristic techniques. According to José-García and Gómez-Flores, the usage percentages of the three aforementioned techniques were recorded as 3%, 69%, and 28%, respectively. Among their findings, they reported that even though the single-objective clustering algorithms appear to be suitable and efficient for the task of grouping linearly separable clusters, many researchers had recently become more inclined to use multiobjective algorithms to address nonlinearly separable problems.

More recently, Ezugwu [75] carried out a comprehensive study on the major nature-inspired metaheuristic algorithms that had been applied to solve automatic data clustering problems. The publication included a similar comparative study of several modified well-known global metaheuristic algorithms in order to evaluate their suitability for solving automatic clustering problems. Further, Ezugwu implemented several representatives of single and hybrid swarm intelligence and evolutionary algorithms, namely the particle swarm differential evolution algorithm, the firefly differential evolution algorithm, and the invasive weed optimization differential evolution algorithm, to deal with the task of automatic data clustering.

In Table 7, we present a summary of the state-of-the-art literature works carried out on the classical and automatic data clustering techniques. The summary presented in Table 7 focuses mainly on the contributing authors, the clustering approach used, and finally the cluster validity index used in evaluating the quality of the proposed clustering algorithms.

5 Clustering similarity measures

One question that comes to mind once a clustering algorithm has classified a dataset is, how well does the classification fit the input data? This is pertinent because no single clustering algorithm is optimal, so given different conditions, different algorithms, or even the same algorithm, would yield different results. This makes it necessary to estimate how well a classification fits the underlying structure of the input dataset. The evaluation criteria for validating the results of a clustering algorithm are a fundamental aspect of data clustering. Cluster validation criteria are usually classified as being either internal or external validation. However, there is a third classification called relative validation [31, 137].

5.1 Internal validation criteria

In real-world applications, the underlying structure of the dataset is usually unknown; therefore, there is no way of knowing the correct partitioning of the dataset. The internal validation criteria focus on the partitioned dataset (by the clustering algorithm); they measure the intra-cluster compactness and the inter-cluster separation. A variety of criteria have been proposed, which are outlined below.

5.1.1 Sum of squared error

Sum of squared error (SSE) is one of the most popular cluster evaluation criteria; it is defined as follows:

$$\mathrm{SSE} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2 \qquad (7)$$

where $C_k$ is the set of all instances in cluster $k$ and $\mu_k$ is the vector mean of cluster $k$. So, we need to look for the partition with the lowest SSE [93, 222].

5.1.2 Scatter criteria

For any $k$th cluster, the scatter criteria [59, 188] are given as follows:

$$S_k = \sum_{x \in C_k} (x - \mu_k)(x - \mu_k)^T. \qquad (8)$$
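As a quick illustration of Eqs. (7) and (8), the following NumPy sketch computes the SSE and a per-cluster scatter matrix for an already-partitioned toy dataset (array shapes and values are assumptions for the example):

```python
import numpy as np

def sse(X, labels, centroids):
    """Sum of squared errors, Eq. (7): squared distances to cluster means."""
    return sum(np.sum((X[labels == k] - centroids[k]) ** 2)
               for k in range(len(centroids)))

def scatter_matrix(X, labels, centroids, k):
    """Scatter matrix S_k of the kth cluster, Eq. (8)."""
    D = X[labels == k] - centroids[k]  # deviations from the cluster mean
    return D.T @ D                     # sum of outer products

X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [8.2, 7.9]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])
print(sse(X, labels, centroids), scatter_matrix(X, labels, centroids, 0))
```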
Table 7 Summary of some metaheuristics that have been applied to non-automatic clustering and automatic clustering

References | Clustering method | Application area | Clustering type | Cluster validity index (CVI)
Van der Merwe and Engelbrecht [224] | PSO | Cluster analysis | Non-automatic | Intra-cluster and inter-cluster distance
Zhao et al. [241] | PSO | Cluster analysis | Non-automatic | Intra-cluster distance
Chuang et al. [43] | ACPSO | Cluster analysis | Non-automatic | –
Silva Filho et al. [209] | FCM-IDPSO, FCM2-IDPSO | Cluster analysis | Non-automatic | Rand index (RI)
Rana et al. [180] | BR-APSO | Cluster analysis | Non-automatic | –
Cura [48] | PSO | Cluster analysis | Non-automatic | –
Lashkari and Moattar [135] | ECPSO | Cluster analysis | Non-automatic | Purity index
Alswaitti et al. [22] | DPSO | Cluster analysis | Non-automatic | Dunn index
Duan et al. [58] | ABCPS | Cluster analysis | Non-automatic | MOC index, Fukuyama and Sugeno index, weighted inter–intra-index
Atabay et al. [26] | IKPSO | Cluster analysis | Non-automatic | –
Nayak et al. [161] | GA-IPSO-K | Cluster analysis | Non-automatic | –
Huang et al. [102] | ACOR-PSO | Cluster analysis, continuous optimization problem | Non-automatic | –
Omran et al. [167] | DCPSO | Cluster analysis, image segmentation | Automatic | Dunn index (DI), Turi index, S_Dbw index
Masoud et al. [148] | CPSOII | Cluster analysis, combinatorial optimization problem | Automatic | Variance ratio criterion (VRC), Davies–Bouldin (DB) index
Ling et al. [138] | PLDC | Cluster analysis | Automatic | RI
Kuo and Zulvia [131] | ACPSO | Cluster analysis | Automatic | VI index
Nanda and Panda [160] | MOIMPSO | Cluster analysis, 3D human models | Automatic | –
Das et al. [51] | MEPSO, Kernel_MEPSO | Cluster analysis | Automatic | –
Kao and Chen [113] | PSOAC | Cluster analysis, cell formation | Automatic | –
Abubaker et al. [11] | MOPSOSA | Cluster analysis | Automatic | DB index, symmetry (Symm) index, Conn index
Tsai et al. [221] | MPREPSO | Cluster analysis, image segmentation | Non-automatic | Compact-separated (CS) index
Sharma and Chhabra [202] | HPSOM, AHPSOM | Cluster analysis, mobile network data | Automatic | Inter-cluster distance, intra-cluster distance, adjusted Rand index (ARI), F-measure
Aliniya and Mirroshandel [21] | AC-ICA | Cluster analysis, face recognition | Automatic | Purity entropy Rand index (PERI), ARI, F-measure
Rana et al. [179] | Hybrid sequential | Cluster analysis | Non-automatic | Intra-cluster distance, inter-cluster distance, quantization error
Boryczka [35] | ACA | Cluster analysis | Non-automatic | Cosine measure, Gower measure, Euclidean distance
Bouyer et al. [36] | HCSDE | Cluster analysis | Non-automatic | Inter-cluster distance, intra-cluster distance, error rate
Senthilnath et al. [200] | PSO, GA, CS | Cluster analysis | Non-automatic | CEP, statistical significance test
Goel et al. [84] | CSCA | Cluster analysis, satellite image | Non-automatic | DB index
Jensi and Jiji [110] | MBA-LF | Cluster analysis | Non-automatic | Euclidean distance
Cruz et al. [47] | cOptBees | Cluster analysis | Non-automatic | Entropy, CEP, purity, SI
Kumar and Sahoo [126] | GSA | Cluster analysis | Non-automatic | –
Abdulwahab et al. [2] | LBH | Cluster analysis | Non-automatic | Euclidean distance
Hatamlou [96] | BH | Cluster analysis | Non-automatic | Error rate, inter-cluster distance, intra-cluster distance
Younsi and Wang [235] | AIS | Cluster analysis | Non-automatic | Euclidean distance
Ezugwu [75] | FA, DE, PSO, IWO | Cluster analysis | Automatic | DB and CS index
Rajah and Ezugwu [176] | SOS, SOSFA, SOSDE, SOSTLBO, SOSPSO | Cluster analysis | Automatic | DB and CS index
Agbaje et al. [14] | FAPSO | Cluster analysis | Automatic | DB and CS index
5.1.3 Condorcet's criterion

Condorcet [44] proposed an evaluation criterion, which is given as follows:

$$\sum_{C_i \in C} \; \sum_{\substack{x_j, x_k \in C_i \\ x_j \neq x_k}} s(x_j, x_k) \;+\; \sum_{C_i \in C} \; \sum_{x_j \in C_i,\; x_k \notin C_i} d(x_j, x_k) \qquad (9)$$

where $s(x_j, x_k)$ and $d(x_j, x_k)$, respectively, are the similarity and the distance between the vectors $x_j$ and $x_k$.
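A direct transcription of Eq. (9), assuming a precomputed pairwise similarity matrix S, distance matrix D, and a label vector (all illustrative), might read:

```python
import numpy as np

def condorcet(S, D, labels):
    """Condorcet's criterion, Eq. (9): within-cluster similarities plus
    between-cluster distances (each unordered pair is counted twice,
    which only rescales the criterion)."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]    # same-cluster pair mask
    off_diag = ~np.eye(len(labels), dtype=bool)  # exclude x_j == x_k
    return S[same & off_diag].sum() + D[~same & off_diag].sum()
```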
5.1.4 The C-criterion

An extension of Condorcet's validity index is given in [80]. The C-criterion is defined as follows:

$$\sum_{C_i \in C} \; \sum_{\substack{x_j, x_k \in C_i \\ x_j \neq x_k}} \big( s(x_j, x_k) - c \big) \;+\; \sum_{C_i \in C} \; \sum_{x_j \in C_i,\; x_k \notin C_i} \big( c - s(x_j, x_k) \big) \qquad (10)$$

where $c$ is a threshold value.

5.1.5 Category utility metric

The category utility metric measures the goodness of fit of the category [45, 83]. It is the evaluation criterion used in the popular conceptual clustering algorithm called COBWEB [78]. Given a set of entities, the binary feature set of size $n$ is defined as $F = \{f_i\},\ i = 1, 2, \ldots, n$, and for the binary category $C = \{c, \bar{c}\}$ the category utility is defined as follows:

$$CU(C, F) = p(c) \sum_{i=1}^{n} p(f_i \mid c) \log p(f_i \mid c) \;+\; p(\bar{c}) \sum_{i=1}^{n} p(f_i \mid \bar{c}) \log p(f_i \mid \bar{c}) \;-\; \sum_{i=1}^{n} p(f_i) \log p(f_i), \qquad (11)$$

given that:
$p(c)$ is the prior probability of an entity belonging to the positive category $c$;
$p(f_i \mid c)$ is the conditional probability of the feature $f_i$, given that it belongs to the positive category $c$;
$p(f_i \mid \bar{c})$ is the conditional probability of the feature $f_i$, given that it belongs to the negative category $\bar{c}$;
and $p(f_i)$ is the prior probability of the feature $f_i$.

5.1.6 Bayesian information criterion (BIC) index

The BIC aims to solve the problem of overfitting of the partitions generated by the algorithm [174] and is defined as follows:

$$BIC = -\ln(L) + v \ln(n) \qquad (12)$$

where $n$ is the number of entities, $L$ is the likelihood of the parameters to generate the data in the model, and $v$ is the number of free parameters in the Gaussian model. Minimizing the BIC is the goal.

5.1.7 Calinski–Harabasz index

The Calinski–Harabasz validity index measures compactness by calculating the distances between the points in a cluster and their centroids, while the separation is calculated by measuring the distance from the centroids to the global centroid [37]. This index is defined as

$$CH = \frac{\mathrm{trace}(S_B)}{\mathrm{trace}(S_W)} \cdot \frac{n_p - 1}{n_p - k} \qquad (13)$$

where $S_B$ is the inter-cluster scatter matrix, $S_W$ the intra-cluster scatter matrix, $n_p$ the number of entities in the dataset, and $k$ the number of clusters.

5.1.8 Davies–Bouldin index (DB)

The DB index measures the average similarity between each cluster and the one that is closest to it [54]. It requires that information of at least two clusters be known. The Davies–Bouldin index is defined as follows:

$$DB = \frac{1}{c} \sum_{i=1}^{c} \max_{i \neq j} \left\{ \frac{d(X_i) + d(X_j)}{d(c_i, c_j)} \right\} \qquad (14)$$

where $c$ is the number of clusters, $i, j$ are cluster labels, $d(X_i)$ and $d(X_j)$ are the average distances of all entities in clusters $i$ and $j$ to their respective centroids, and $d(c_i, c_j)$ is the distance between the cluster centroids. Minimizing DB results in a "better" clustering solution.

5.1.9 Silhouette index

The silhouette index measures the compactness and separation of clusters [189]. It requires that information of at least two clusters be known. Given a cluster $X_j\ (j = 1, \ldots, c)$, the index assigns to the $i$th entity of $X_j$ the silhouette width $s(i)\ (i = 1, \ldots, m)$. This value indicates the degree of confidence that the $i$th sample belongs in cluster $X_j$. The index is defined as:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \qquad (15)$$

where $a(i)$ is the average distance between the $i$th entity and the remaining entities of cluster $X_j$, and $b(i)$ is the minimum average distance between the $i$th entity and all of the entities clustered in $X_k\ (k = 1, \ldots, c;\ k \neq j)$.
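Reference implementations of the silhouette, Davies–Bouldin, and Calinski–Harabasz indices are available in scikit-learn; the illustrative sketch below (synthetic data, arbitrary parameter values) scores k-means partitions over a range of cluster counts:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (calinski_harabasz_score,
                             davies_bouldin_score, silhouette_score)

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k,
          round(silhouette_score(X, labels), 3),         # maximize
          round(davies_bouldin_score(X, labels), 3),     # minimize
          round(calinski_harabasz_score(X, labels), 1))  # maximize
```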
5.1.10 Dunn index

The focus of the Dunn index is to estimate the ratio between the smallest inter-cluster distance and the largest intra-cluster distance in a partitioning [60]. It requires that information of at least two clusters be known. Several variants of this index exist in the literature [31, 170]. The Dunn index is defined as follows:

$$Dunn = \min_{1 \le i \le c} \left\{ \min_{j \neq i} \left[ \frac{d(c_i, c_j)}{\max_{1 \le k \le c} d(X_k)} \right] \right\} \qquad (16)$$

where $d(c_i, c_j)$ is the distance between clusters $X_i$ and $X_j$, $d(X_k)$ represents the distance between members of cluster $X_k$, and $c$ is the number of clusters in the dataset. Maximizing the Dunn index gives a good clustering solution. Some setbacks of the Dunn index include its time complexity and its sensitivity to noise in datasets.

5.1.11 NIVA index

The NIVA validation index [186] is defined as follows:

$$NIVA(C) = \frac{Compac(C)}{SepxG(C)} \qquad (17)$$

where $Compac(C)$ is the average compactness of the clustering $C$ and $SepxG(C)$ is the average separability of the clustering $C$.

5.1.12 Gamma index

The gamma index [27] is defined as:

$$G(C) = \frac{\sum_{c_k \in C} \sum_{x_i, x_j \in c_k} d_l(x_i, x_j)}{n_w \left( \binom{N}{2} - n_w \right)} \qquad (18)$$

where $d_l(x_i, x_j)$ counts the object pairs in $X$ that lie closer together than $x_i$ and $x_j$ while belonging to different clusters, and $n_w$ is the number of within-cluster pairs.

5.1.13 Score function

This index measures the cluster separation by estimating the distance from the cluster centroids to the global centroid. The compactness of the clusters is measured by estimating the distance from the points in a cluster to their centroid [194]. The index is defined as follows:

$$SF(C) = 1 - \frac{1}{e^{\,e^{\,bcd(C) + wcd(C)}}} \qquad (19)$$

where

$$bcd(C) = \frac{\sum_{c_k \in C} |c_k| \cdot d_e(c_k, \bar{X})}{N \cdot K}$$

and

$$wcd(C) = \sum_{c_k \in C} \frac{1}{|c_k|} \sum_{x_i \in c_k} d_e(x_i, c_k),$$

with $\bar{X}$ denoting the global centroid of the dataset.

5.1.14 C-index

The C-index can be used for varying input data, and it is easy to compute. It is easily generalized to estimate the cohesion of clusters [49]. The C-index is defined as:

$$CI(C) = \frac{S(C) - S_{\min}(C)}{S_{\max}(C) - S_{\min}(C)} \qquad (20)$$

where

$$S(C) = \sum_{c_k \in C} \sum_{x_i, x_j \in c_k} d_e(x_i, x_j),$$

$S_{\min}(C)$ is the sum of the $n_w$ smallest distances $d_e(x_i, x_j)$ over all pairs $x_i, x_j \in X$, and $S_{\max}(C)$ is the sum of the $n_w$ largest such distances.

5.1.15 Sym-index

The theoretical base for the sym-index is the I-index, which uses a point symmetry distance measure [29]. It is defined as follows:

$$Sym(C) = \frac{\max_{c_k, c_l \in C} \; d_e(c_k, c_l)}{K \sum_{c_k \in C} \sum_{x_i \in c_k} d_{ps}(x_i, c_k)} \qquad (21)$$

5.1.16 COP index

In the COP index, compactness is measured as the distance between the cluster points and their centroids, whereas the separation is a measure of the largest distance between neighbors [25]. It is defined as follows:

$$COP(C) = \frac{1}{N} \sum_{c_k \in C} |c_k| \; \frac{\frac{1}{|c_k|} \sum_{x_i \in c_k} d_e(x_i, c_k)}{\min_{x_i \notin c_k} \max_{x_j \in c_k} d_e(x_i, x_j)}. \qquad (22)$$

5.1.17 Negentropy increment

The negentropy index measures the normality rather than the compactness or separation of the clusters [133]:

$$NI(C) = \frac{1}{2} \sum_{c_k \in C} p(c_k) \log \lvert \Sigma_{c_k} \rvert \;-\; \frac{1}{2} \log \lvert \Sigma_X \rvert \;-\; \sum_{c_k \in C} p(c_k) \log p(c_k) \qquad (23)$$

where $\Sigma_{c_k}$ and $\Sigma_X$ are the covariance matrices of cluster $c_k$ and of the whole dataset, respectively.
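The Dunn index of Eq. (16) is simple to implement directly; a minimal NumPy/SciPy sketch of one common variant (minimum pairwise inter-cluster distance over maximum cluster diameter) follows:

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

def dunn_index(X, labels):
    """Dunn index, Eq. (16): smallest inter-cluster distance over largest
    intra-cluster diameter. Requires at least two clusters; O(n^2) pairwise
    distances, and (as noted above) sensitive to noise."""
    clusters = [X[labels == k] for k in np.unique(labels)]
    diameters = [pdist(c).max() if len(c) > 1 else 0.0 for c in clusters]
    min_sep = min(cdist(a, b).min()
                  for i, a in enumerate(clusters)
                  for b in clusters[i + 1:])
    return min_sep / max(diameters)
```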
5.1.18 SV-index

The SV-index relates the separation of the cluster centroids to the compactness of the cluster boundary points:

$$SV(C) = \frac{\sum_{c_k \in C} \min_{c_l \in C \setminus \{c_k\}} d_e(c_k, c_l)}{\sum_{c_k \in C} \frac{10}{|c_k|} \sum_{x_i \in B_k} d_e(x_i, c_k)} \qquad (24)$$

where $B_k$ denotes the $0.1|c_k|$ entities of cluster $c_k$ that are farthest from its centroid.

5.1.19 OS-index

The OS-index [57] is defined as follows:

$$OS(C) = \frac{\sum_{c_k \in C} \sum_{x_i \in c_k} OV(x_i, c_k)}{\sum_{c_k \in C} \frac{10}{|c_k|} \sum_{x_i \in B_k} d_e(x_i, c_k)} \qquad (25)$$

where $OV(x_i, c_k)$ is the overlap value of entity $x_i$ with respect to cluster $c_k$, as defined in [57], and the denominator is the same boundary-compactness term used in the SV-index.

Two auxiliary quantities, the average scattering of the clusters, $Scat(n_c)$, and the total separation between the cluster centroids, $Dis(n_c)$, underlie the SD validity index and the S_Dbw index below:

$$Scat(n_c) = \frac{1}{n_c} \sum_{i=1}^{n_c} \frac{\lVert \sigma(v_i) \rVert}{\lVert \sigma(X) \rVert}$$

$$Dis(n_c) = \frac{D_{\max}}{D_{\min}} \sum_{k=1}^{n_c} \left( \sum_{z=1}^{n_c} \lVert v_k - v_z \rVert \right)^{-1}$$

5.1.22 S_Dbw validity index

The S_Dbw validity index uses the underlying characteristics of the clusters to measure the validity of the results from the clustering algorithm [88]. It is defined as follows:

$$S\_Dbw(n_c) = Scat(n_c) + Dens\_bw(n_c) \qquad (28)$$

where

$$Dens\_bw(n_c) = \frac{1}{n_c (n_c - 1)} \sum_{i=1}^{n_c} \sum_{\substack{j = 1 \\ j \neq i}}^{n_c} \frac{density(u_{ij})}{\max\{density(v_i), density(v_j)\}}$$

and $v_i, v_j$ are the centroids of clusters $c_i, c_j$, and $u_{ij}$ is the middle point of the line segment defined by the two centroids.

5.1.23 Root-mean-square standard deviation (RMSSTD)

The RMSSTD index measures the homogeneity of the formed clusters as the pooled standard deviation of all the attributes in the partition; smaller values indicate more compact clusters.

5.1.25 Compact-separated (CS) index

The CS index is a validity measure [75, 121] that estimates the ratio of the sum of within-cluster scatter to between-cluster separation. A large value of the CS index indicates low compactness or separation, while a smaller value means a better clustering. It has been shown that the CS index offers more efficiency in handling clusters of different dimensions, densities, or sizes. Although it is computationally more expensive than the DB index in terms of execution time, it does, however, produce better quality solutions than does the DB index. Let the within-cluster scatter be denoted as $X_i$ and the between-cluster separation be represented as $X_j$, such that the distance measure $V$ is given as $V(X_i, X_j)$. Hence, the CS index for a clustering $Q$ is computed as follows [75]:
$$CS(Q, V) = \frac{\frac{1}{P} \sum_{i=1}^{P} \left[ \frac{1}{|Q_i|} \sum_{X_i \in Q_i} \max_{X_j \in Q_i} V(X_i, X_j) \right]}{\frac{1}{P} \sum_{i=1}^{P} \min_{j \in P,\, j \neq i} \{ V(x_i, x_j) \}} = \frac{\sum_{i=1}^{P} \left[ \frac{1}{|Q_i|} \sum_{X_i \in Q_i} \max_{X_j \in Q_i} V(X_i, X_j) \right]}{\sum_{i=1}^{P} \min_{j \in P,\, j \neq i} \{ V(x_i, x_j) \}} \qquad (31)$$

5.2 External validation criteria

External validation criteria assess how well the partition produced by a clustering algorithm matches externally supplied information, such as the true class labels of the dataset.

5.2.1 Mutual information-based measure

This index is based on the mutual interdependency between the partition result and the underlying structure of the dataset [212]. For $m$ instances in clusters $C = C_1, C_2, \ldots, C_g$ and a target attribute $z$ with domain $dom(z) = \{c_1, c_2, \ldots, c_k\}$, the index is defined as follows:

$$C = \sqrt{ \frac{2}{m} \sum_{i=1}^{g} \sum_{h=1}^{k} m_{i,h} \log_{g \cdot k} \left( \frac{m_{i,h} \cdot m}{m_{\cdot,h} \cdot m_{i,\cdot}} \right) } \qquad (32)$$

where $m_{i,h}$ is the number of instances that are in cluster $C_i$ and in class $c_h$, $m_{\cdot,h}$ denotes the total number of instances in class $c_h$, and $m_{i,\cdot}$ denotes the number of instances in cluster $C_i$.

5.2.2 Rand index

This index shows the similarity between the partitions of the clustering algorithm and the underlying structure of the dataset [181]. The index is defined as follows:

$$RAND = \frac{TP + TN}{TP + FP + FN + TN} \qquad (33)$$

where $TP$ is the number of true positives, $TN$ the number of true negatives, $FP$ the number of false positives, and $FN$ the number of false negatives. The index usually lies in the range 0 to 1, with a Rand index of 1 indicating a perfect match.

5.2.3 F-measure

Equal weighting of the false positives and false negatives usually results in undesirable features, so the F-measure index addresses this by using a recall-weighting parameter $\beta > 0$ to balance the false negatives [187]. In its standard form, the F-measure is defined in terms of the precision $P$ and the recall $R$ as follows:

$$F_\beta = \frac{(\beta^2 + 1) \, P \, R}{\beta^2 P + R}$$

5.2.4 Jaccard index

The Jaccard coefficient measures the agreement between two partitions $A$ and $B$ as the ratio of the pairs on which they agree positively to all pairs except the negative agreements:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{TP}{TP + FP + FN}$$

If $A$ and $B$ are both empty, then $J(A, B) = 1$; in general, $0 \le J(A, B) \le 1$.

5.2.5 Fowlkes–Mallows index

This index measures the compactness of the clusters obtained from a clustering algorithm; maximizing the index indicates higher similarity [81]. It is defined as

$$FM = \sqrt{ \frac{TP}{TP + FP} \cdot \frac{TP}{TP + FN} } \qquad (36)$$

5.2.6 NMI measure

The normalized mutual information (NMI) is defined as follows:

$$NMI(X, Y) = \frac{I(X, Y)}{\sqrt{H(X) H(Y)}} \qquad (37)$$

where $I(X, Y)$ is the mutual information between two random variables $X$ and $Y$, $H(X)$ denotes the entropy of $X$, $X$ is the partition produced by a clustering algorithm, and $Y$ represents the true labels of the dataset [137].
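When ground-truth labels are available, several of the external criteria above can be computed with scikit-learn, as in this short illustrative snippet (label vectors are made up for the example):

```python
from sklearn.metrics import (adjusted_rand_score, fowlkes_mallows_score,
                             normalized_mutual_info_score)

y_true = [0, 0, 0, 1, 1, 1, 2, 2]   # ground-truth classes
y_pred = [0, 0, 1, 1, 1, 1, 2, 2]   # labels from a clustering algorithm

print(adjusted_rand_score(y_true, y_pred))           # chance-corrected Rand
print(fowlkes_mallows_score(y_true, y_pred))         # Eq. (36)
print(normalized_mutual_info_score(y_true, y_pred))  # Eq. (37)
```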
5.2.7 Purity

Purity applies to a set of clusters. The purity for each cluster $P_j$ is defined as:

$$P_j = \frac{1}{n_j} \max_i \left( n_{ij} \right) \qquad (38)$$

The purity for the set of clusters is calculated as a weighted sum of the individual purities [123]. This is given as:

$$Purity = \sum_{j=1}^{m} \frac{n_j}{n} P_j \qquad (39)$$

where $n_j$ denotes the size of cluster $j$, $m$ is the number of clusters, and $n$ is the total number of entities.

5.2.8 Entropy

Entropy increases as the classification of the objects in a cluster becomes more varied; if all the objects in a cluster belong to one label, then the entropy is 0 [57]. In this case, entropy is defined as follows:

$$E_j = -\sum_i p_{ij} \log p_{ij} \qquad (40)$$

where $p_{ij}$ is the probability that a member of cluster $j$ belongs to class $i$.
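Purity and entropy are straightforward to compute from the cluster-class contingency counts; a minimal NumPy sketch of Eqs. (38)-(40), with illustrative label vectors, follows:

```python
import numpy as np

def purity_and_entropy(y_true, y_pred):
    """Overall purity, Eq. (39), and per-cluster entropies, Eq. (40).
    Class labels must be non-negative integers (for np.bincount)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    purities, entropies, weights, n = [], [], [], len(y_true)
    for j in np.unique(y_pred):
        counts = np.bincount(y_true[y_pred == j])    # n_ij for cluster j
        p = counts[counts > 0] / counts.sum()
        purities.append(counts.max() / counts.sum()) # Eq. (38)
        entropies.append(-(p * np.log(p)).sum())     # Eq. (40)
        weights.append(counts.sum() / n)
    return np.dot(weights, purities), entropies      # Eq. (39), per-cluster E_j

purity, ent = purity_and_entropy([0, 0, 1, 1, 2], [0, 0, 0, 1, 1])
```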
5.2.9 Relative validation

This scheme tries to validate the partitions of a clustering algorithm by estimating the best clustering scheme possible under certain assumptions and parameters. It tunes the parameters and evaluates or compares the resulting cluster structures produced by the algorithm. Relative validation involves a lot of statistical testing [89]. Assume the clustering problem is defined thus:

Let $P_{alg}$ be the set of parameters associated with a specific clustering algorithm (e.g., the number of clusters $n_c$). Among the clustering schemes $C_i$, $i = 1, \ldots, n_c$, defined by a specific algorithm, for different values of the parameters in $P_{alg}$, choose the one that best fits the data set. [89]

The following cases hold:

• $P_{alg}$ does not contain $n_c$ as a parameter. The idea here is to tune the parameters over a wide range of values, run the clustering algorithm, and then choose the largest range for which $n_c$ remains constant. Normally, $n_c \ll N$, where $N$ is the number of tuples.

• $P_{alg}$ contains $n_c$ as a parameter. Define a minimum and a maximum range first, then run the algorithm $r$ times for each $n_c$ between the minimum and the maximum, tuning the parameters during each run. Then plot the best values of the index obtained against $n_c$. The plot may indicate the best clustering.

6 Discussion

Clustering is aimed at ensuring that a set of objects is efficiently classified into their various clusters. The approach taken to achieve the clustering of such objects relies largely on the algorithms designed for the task. Hence, algorithms for clustering need to be very efficient in their approach to classifying or categorizing objects into their most fitting class or category. Moreover, the challenge of managing extensive data, which are frequently generated from social media and other online network-based platforms with high-throughput streams, requires that clustering algorithms evolve to allow for efficient deployment. In addition to the challenge of data size, memory usage and management are also a design issue with clustering algorithms, meaning that algorithms that are designed to function well in an environment of limited memory are considered efficient. Clustering algorithms that are aimed at streaming data would demonstrate an interesting performance in this scenario.

The application of clustering techniques and the corresponding array of algorithms to different domains or fields, such as engineering, medicine, and data mining, continues to affect the development and modification of clustering algorithms and their related techniques. The development of a clustering algorithm entails the following phases: feature selection, pattern proximity, cluster formation, and clustering validation [215]. The task of designing and developing clustering algorithms also factors in the automatic or non-automatic pattern of clustering objects that the proposed clustering algorithm adopts. The design of automatic clustering algorithms is non-trivial because the algorithm design must accommodate the number of clusters being unknown a priori [196], in contrast with non-automatic clustering algorithms, which require that such a parameter be known. This, therefore, is one important concept that has shaped the trends in the emergence of new clustering algorithms, which we shall discuss in the following paragraphs.

Clustering algorithms have evolved and are evolving due to the nature of the data being classified, the approach to similarity measures (like the Euclidean distance), the evaluation criteria, the validity and accuracy of the clusters generated, and the high dimensionality of data, which often leads to increased computational cost. Although the evaluation of clusters derived from a clustering algorithm may follow the method of homogeneity, completeness, and V-measure, the models used for the evaluation and performance measures are also worth noting. These include Condorcet's criterion, edge-cut metrics, the sum of squared error, the category utility metric, scatter criteria, and the C-criterion in the category of internal performance measures, while the likes
of the Fowlkes–Mallows index, the Rand index, the mutual information-based measure, the confusion matrix, the F-measure, and the Jaccard index may form the external performance measures. These performance measures may not directly affect the design of a clustering algorithm. They do, however, weigh in on its performance, which invariably influences the trends in design, evolution, and application of such clustering algorithms. Accordingly, we shall focus our discussion on the evolution of popular traditional or classical clustering algorithms, namely BIRCH, DBSCAN, k-Means, Mini-Batch k-Means, Mean Shift, OPTICS, Spectral Clustering, and Mixture of Gaussians. Also, we will discuss trends for the task of data clustering using algorithms that adopt or repurpose approaches based on nature-inspired (NI) algorithms. Meanwhile, in our discussion of the trends in clustering, we will touch on issues like how clustering algorithms are able to handle high dimensionality of data while also managing computational cost, their effect on the consistency of algorithms, and other relevant issues.

6.1 Recent trends in clustering algorithms

In Sects. 1 and 2, it was noted that clustering techniques might be categorized as either hierarchical or partitional, with some literature including the classifications of grid based, density based, or model based. Therefore, the following subsections present trends in clustering algorithms based on their methods, namely hierarchical based, partitional based, grid based, density based, and model based.

6.1.1 Hierarchical-based algorithms

Clustering algorithms based on a hierarchical approach exploit the hierarchical structure inherent in the data. An interesting feature of hierarchical clustering algorithms is their non-sensitivity to the chosen distance metric and the automatic nature of the discovery of the number of clusters existing in a dataset. These characteristics, especially the free selection of the distance metric, are unusual among the other categories of clustering algorithms. Some popular clustering algorithms that fit well into this category are balanced iterative reducing and clustering using hierarchies (BIRCH) and clustering using representatives (CURE), as well as ROCK and CHAMELEON.

As clustering algorithms have evolved, the need for robustness in handling outliers and continuous increments in the size of data (high dimensionality of data) when clustering has motivated the use of BIRCH. An attempt to further improve the performance of BIRCH in this direction has resulted in the variants named Bubble and Bubble-FM. In addition to this capability of BIRCH, it is known to achieve and maintain a computational complexity of $O(N)$.

Similarly, CURE, a multi-centroid clustering algorithm, has been shown to be tolerant of outliers and able to handle large-scale databases well. However, CURE achieves a computational complexity of $O(N^2 \log N)$, compared to the low value associated with BIRCH, and has an ineffective approach to handling noise. Nevertheless, it performs well on high-dimensional datasets having varying densities of data points, and it also shows the capacity to locate non-spherically shaped clusters and clusters of widely varying size [86], even though it does not use a distance function. The ability of CURE to support the clustering of non-spherically shaped datasets lies in its mechanism for representing clusters using well-scattered points per cluster, thereby yielding more than one representative point per cluster, which leads to its geometrical flexibility and its ability to shrink the representatives to detect outliers.

Although BIRCH and CURE are not purely hierarchical clustering algorithms, they do represent improved versions of hierarchical clustering algorithms; as such, they are two integrated hierarchical clustering algorithms. Sometimes, the use of BIRCH is complemented by other clustering algorithms, which are applied to the summaries generated by BIRCH. BIRCH's summarizing nature allows it to minimize memory usage during the clustering operation. While CURE performs well on non-spherical data, BIRCH suffers some limitations that are overcome by improved versions of it: for instance, Link BIRCH (LBIRCH) leverages the concept of the link, as used in ROCK [87].

The robust clustering algorithm (ROCK) uses links instead of a distance function for the purpose of clustering. It is a clustering algorithm used on categorical datasets and has demonstrated an interesting performance in cluster forming, cluster merging, and other cluster-based operations. It has a computational complexity of $O(N^2 + N m_m m_a + N^2 \log N)$, meaning that it scales poorly in this area. QROCK is a quick version of the ROCK algorithm for the clustering of categorical data.

In contrast, the CHAMELEON clustering algorithm is known for handling low-dimensional spaces and allows for the merging of clusters using the proximity between two clusters. The algorithm's capability of operating on a sparse graph, where nodes denote data items and where weighted edges represent similarities between data objects, makes it perform well as a clustering algorithm. CHAMELEON, despite having a computational complexity of $O(Nm + N \log N + m^2 \log N)$, has been proven to outperform DBSCAN and CURE. For better time complexity, BIRCH may outperform CURE, even though CURE effectively handles larger datasets and with a better quality of clustering [199].
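scikit-learn provides an implementation of BIRCH whose parameters mirror the CF-tree construction described above; the following illustrative sketch (synthetic data, arbitrary parameter values) shows the radius threshold and branching factor in use:

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=5, random_state=0)

# threshold bounds the radius of each CF subcluster; branching_factor bounds
# the fan-out of CF-tree nodes; n_clusters runs a global step on the summaries.
model = Birch(threshold=0.5, branching_factor=50, n_clusters=5)
labels = model.fit_predict(X)
```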
6.1.2 Partitional-based algorithms

The partitional-based category of clustering algorithms applies a distance criterion to the data objects, with the Euclidean distance being the most commonly used. Examples of clustering algorithms found in this category include k-means, expectation–maximization (EM), fuzzy c-means, PAM, CLARA, and CLARANS.

One of the best-known and probably most used classical clustering algorithms is the k-means algorithm [199], with a computational complexity of $O(N)$, meaning that it is computationally efficient. Its simplicity has probably won it the attention it has garnered, sometimes making it a benchmark for clustering algorithms. The letter k in its name defines k centroids, one for each cluster, such that each point in the dataset is assigned to the closest centroid. However, this makes it sensitive to the initial, randomly set k value, thereby limiting its performance on data objects that are not clustered in a spherical shape or are inconsistent [219]. Its method of handling features is similar to the approach that the BIRCH clustering algorithm adopts in dealing with metric attributes.

The fuzzy c-means (FCM) algorithm changes the discrete values of the belonging label, $\{0, 1\}$, into the continuous interval $[0, 1]$. As with the k-means clustering algorithm, FCM has a computational complexity of $O(N)$, except for some of its approaches, such as the use of fuzzy logic in determining clusters. However, FCM has the drawbacks of susceptibility to local optima, dependence on its initial partition, and sensitivity to noise and outliers. Because FCM often arrives at inexact clustering results, an improved version of the algorithm was proposed, namely the weighted fuzzy c-means (WFCM), which uses a two-stage scheme of feature selection and weighting. Efforts to address the memory and speed issues associated with the k-means algorithm resulted in what is known as MapReduce-based k-means (PK-means), demonstrating the distributive nature of its parent algorithm. The PK-means clustering algorithm allows computation tasks to be distributed among participating machines, which thereby improves the performance of k-means because it allows for easy scaling up while speed and dataset sizes are also increased.

To improve on the sensitivity of k-means to outliers, another clustering algorithm, k-medians, was proposed. It uses the median vector of the group to compute the center, although it is slower on larger datasets owing to the cost of computing the median vector. Another centroid-based clustering algorithm is mean shift clustering, which works by updating candidates for center points to be the mean of the points within a sliding window; it uses this sliding-window-based procedure to find dense areas among the data points. The algorithm outdoes k-means through its mechanism of using the mean shift to discover the number of clusters, and also through the ability to use its windows to eliminate near-duplicates. While k-means and its variants leverage the computing of means and medians, the expectation–maximization (EM) clustering algorithm uses Gaussian mixture models (GMMs), under which the data points are assumed to be Gaussian-distributed. By using this approach, the EM clustering algorithm circumvents the assumption that its clusters must be circular.

Partitioning around medoids (PAM) is another memory-demanding clustering algorithm, which stores the result of its pairwise dissimilarity matrix computation in memory, thereby limiting its application to large datasets. The computational complexity of the algorithm is $O(K(N-K)^2)$, where $K$ is the number of clusters and $N$ represents the number of points in the data. To overcome this memory demand of PAM, another clustering algorithm, clustering large applications (CLARA), was proposed. CLARA minimizes the average dissimilarity between objects and the objects closest to them. It does this by reducing the search space, searching only a sub-graph prepared from $O(K)$ sampled data points, although it has the ability to draw multiple samples. This leads to a computational complexity of $O(K(40+K)^2 + K(N-K))$ while providing users with a clustering algorithm capable of handling large datasets, thereby earning it the rank of the best partitioning clustering algorithm considering the output. To further improve the efficiency of CLARA, a variant clustering algorithm named CLARANS was proposed, which has a computational complexity of $O(KN^2)$. It operates by searching the entire graph while seeking to obtain an optimal local solution. Unlike CLARA, this algorithm achieves its sampling through a dynamic approach that uses iterative operations for the search procedure. This dynamic sampling leads to the efficient performance of CLARANS and also influences the clusters derived from its operation. Nevertheless, a study has revealed that BIRCH outperforms CLARANS [163]. Generally speaking, some of the clustering algorithms in this category are known to present the drawback of being unable to adjust themselves once a merge or split decision has been executed [182].
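The contrast drawn above between k-means and EM/GMM clustering is easy to reproduce; in the illustrative sketch below (synthetic data stretched so that clusters are elongated), the Gaussian mixture can recover non-circular clusters that violate the spherical assumption of k-means:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=600, centers=3, random_state=0)
X = X @ np.array([[0.6, -0.6], [-0.4, 0.8]])  # stretch: non-circular clusters

km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
gmm_labels = GaussianMixture(n_components=3, covariance_type="full",
                             random_state=0).fit_predict(X)
```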
6.1.3 Grid-based clustering algorithms

The approach to clustering adopted in grid-based algorithms is similar to the geometric settings of a grid structure; it uses a multi-resolution grid data structure. It does this by quantizing the clustering space into a given number of cells before performing the required operations on the quantized space. Clustering algorithms like STING, Wave-Cluster, and CLIQUE constitute members of this category of clustering technique.

The STING clustering algorithm has been proven relevant in parallel processing because it operates by breaking the available space of objects down into cells of rectangular shape in a hierarchical format. The resulting data in the hierarchical structure are considered as its clusters [232]. The clustering algorithm does this successfully in such a way as to remove the resource burden incurred during clustering and query-based problems. STING is often rated as outperforming DBSCAN, BIRCH, and even CLARANS, although the algorithm suffers from slower execution in comparison with some of those it outperforms, like DBSCAN. Another grid-based clustering algorithm, which is efficient in terms of the computational time complexity trade-off, is the so-called Wave-Cluster algorithm, in which the detection of arbitrarily shaped clusters is based on wavelet transformations. Its outstanding performance is widely reported to be 30 times better than CLARANS and 10 times more efficient than the hierarchical-based BIRCH [23]. In another effort, the grid-based clustering algorithm Clustering In QUEst (CLIQUE) follows a DNF expression-like approach to generate its clusters. This makes it insensitive to the sequence in which inputs are entered into the algorithm, as it searches for clusters by exploiting density-based clusters in subspaces. Its support for the detection of clusters in subspaces of high dimensionality makes it different from, and better than, other clustering algorithms. The assumption is that such high-dimensional data possess high-density clusters in subspaces.

6.1.4 Density-based clustering algorithms

The limitations of the grid-based clustering approach, as outlined in Table 8, have resulted in density-based clustering techniques designed to overcome them. In this section, we shall explore trends in some density-based clustering algorithms like DBSCAN, OPTICS, DBCLASD, and DENCLUE.

The first, and presumably the most popular, clustering algorithm in this category is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). DBSCAN is built so that cluster points that are closely packed together, in any direction, are grouped, and as a result it yields clusters of different shapes. However, DBSCAN is limited by its nature in that it does not capture the various kinds of noise points in clusters of different densities. Meanwhile, a parallel form of DBSCAN was proposed, namely PDBSCAN. Furthermore, to advance the concept of generalization in density-based clustering algorithms, another clustering algorithm (GDBSCAN) was proposed to enable support for clustering objects by both their numerical and categorical attribute forms. Recently, another variant of the DBSCAN algorithm is MR-DBSCAN, or MapReduce-based DBSCAN. By exploiting the provision of Hadoop frameworks in MapReduce, MR-DBSCAN adopts an approach that favors scalability and a reduction in computational cost through the use of a data partitioning method. Also, to take advantage of GPUs, which have thousands of cores that propel their speed and computational power, G-DBSCAN was developed. This variant of the DBSCAN algorithm enjoys the merits of a parallel computing environment. Other variants and improved versions of DBSCAN, proposed to overcome its limitations, are LD-BSCA [230], FDBSCAN, VDBSCAN, IDBSCAN, Revised DBSCAN (RDBSCAN, capable of easily identifying the borders of objects lying close to adjacent borders), and the shared nearest-neighbor algorithm (SNN, which leverages some concepts of the ROCK clustering algorithm to produce a density-based clustering algorithm). In related work, a study reported how experimentation involving k-means, k-medoids, fuzzy c-means, DBSCAN, OPTICS, and hierarchical clustering algorithms combined DBSCAN with other algorithms to provide a simple Amplitude Modulation (AMC) algorithm [156]. Advances related to DBSCAN were also reported by Vo-Van et al. [227], who adopted the epsilon radius neighbors used in DBSCAN to identify the number and shape of clusters automatically. Table 9 presents a category-based comparison of the performance of clustering algorithms.

DBSCAN is associated with the problems of being unable to detect interesting clusters in datasets presenting varying densities and of sensitivity to the radius of the neighborhood and the minimum number of points in a neighborhood. A density-based clustering algorithm aimed at tackling these limitations of DBSCAN is the connectivity-based algorithm named OPTICS. OPTICS yields more efficiency than does DBSCAN at a computational complexity of $O(n \log n)$, although it can only generate clusters with locally non-similar densities. Meanwhile, another density-based clustering algorithm proposed is Distribution-based Clustering of Large Spatial Databases (DBCLASD), which performs well by building clusters from large spatial databases. DBCLASD essentially employs an incremental approach to place points in a cluster.

In related work to advance the performance and speed of DBSCAN, and even of the CLARANS clustering algorithm, DENsity-based CLUstEring (DENCLUE) was able to speed up DBSCAN while forming center-defined and multi-center-defined types of clusters. DENCLUE, based on kernel density estimation, uses strong mathematical models, making it capable of working with datasets having noise and of modeling arbitrarily shaped clusters in high-dimensional datasets. DENCLUE has also demonstrated good clustering properties and results on datasets with large amounts of noise.
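The two DBSCAN parameters discussed here, the neighborhood radius and the minimum neighborhood size, appear directly in the scikit-learn implementations; an illustrative sketch (synthetic data, arbitrary parameter values):

```python
from sklearn.cluster import DBSCAN, OPTICS
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=400, noise=0.06, random_state=0)

# eps = neighborhood radius; min_samples = minimum points in a neighborhood.
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)  # -1 marks noise

# OPTICS orders points by reachability and avoids fixing a single eps,
# easing the varying-density problem noted above.
opt_labels = OPTICS(min_samples=5).fit_predict(X)
```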
Table 8 Comparison of classical clustering algorithms

Clustering algorithm | Description | Complexity | Shape of cluster | Scalability | Type of dataset | Suitability for large/high-dimensional data | Advantage | Disadvantage
BIRCH | Uses a hierarchical data structure called the CF-tree for partitioning the incoming data points in an incremental and dynamic way | O(n) | Non-convex | High | Numerical | Yes/no | Reasonably fast; can be used as a more intelligent alternative to data sampling in order to improve the scalability of other clustering algorithms | May not work well when clusters are not "spherical", because it uses the concept of radius or diameter to control the boundary of a cluster; order-sensitive, as it may generate different clusters for different orders of the same input data
CURE | A constant number of representative points are chosen to represent a cluster | O(n^2 log n) | Varying shapes | High | Numerical | Yes/yes | Can recognize arbitrarily shaped clusters and is robust to the presence of outliers | Ignores the information about the aggregate inter-connectivity of objects in two clusters
ROCK | Uses the Jaccard coefficient to measure similarity; accepts as input the set S of n sampled points to be clustered (drawn randomly from the original data set) and the number of desired clusters k | O(n^2 + n m_m m_a + n^2 log n) | Varying shapes | Middle | Categorical | No/yes | Robust and appropriate for large datasets | Space complexity depends on the initialization of local heaps
CHAMELEON | – | O(n^2) | Varying shapes | High | Numerical, categorical, spatial, multivariate, and others | No/no | Effective on datasets that contain points in 2D space and clusters of different shapes, densities, sizes, noise, and artifacts | Limited to low-dimensional spaces, not applied to high dimensions; time complexity in high dimensions is high
PAM | – | O(k(n-k)^2) | Non-convex | Low | Numerical | No/no | Robust to outliers | Number of clusters must be predetermined
CLARA | – | O(k(40+k)^2 + k(n-k)) | Non-convex | High | Numeric | Yes/no | Can handle large datasets | Not robust to outliers
CLARANS | – | O(kn^2) | Non-convex | Middle | Numeric | Yes/no | Robust to outliers | Has high computational cost
Table 8 (continued)

Clustering algorithm | Description | Complexity | Shape of cluster | Scalability | Type of dataset | Suitability for large/high-dimensional data | Advantage | Disadvantage
Fuzzy c-means | Similar to the k-means algorithm, but each point carries a membership value | O(n) | Non-convex | Middle | Numerical | Yes/no | – | –
K-means | Partitions the set of feature vectors into K disjoint clusters | O(n) | Non-convex | Middle | Numerical | Yes/no | Effective in dealing with huge datasets; often terminates at a local optimum | Fails in cases where the clusters are not circular
STING | Forms a hierarchical structure from several levels of rectangular cells | O(k) | Varying shapes | – | Spatial | Yes/no | Allows parallelization and multiresolution | Resulting clusters are all bounded horizontally or vertically, never diagonally
WaveCluster | – | O(n) | Varying shapes | – | Spatial | Yes/no | No need for a priori information on the number of clusters | Not efficient in high-dimensional space
CLIQUE | Generates the set of two-dimensional cells that might possibly be dense from one-dimensional spaces | O(Ck + mk) | Varying shapes | High | Numerical | Yes/yes | Good scalability as the number of attributes is increased | Prone to problems with high-dimensional clusters
SOM net | Uses two layers of neural network and their neurons as cluster centers | O(n^2 m) | Non-convex | – | Multivariate | No/yes | Easily detects outliers and can deal with missing data values | –
DENCLUE | Based on the minimization of both the WGS and DB indices to explore and exploit the search space, respectively | O(log |D|) | Varying shapes | – | Large volumes of data | Yes/– | Solid mathematical base; capable of generalizing various clustering methods like partitioning-, hierarchical-, and density-based methods | The density parameter and the noise threshold need to be selected carefully, as they significantly affect the quality of results
DBCLASD | Designs good clusters for spatial databases | O(3n^2) | Varying shapes | – | Spatial data with uniformly distributed points | – | Useful when time does not matter but clustering quality is desired | Slow computational procedure
Table 9 Category-based comparison of the performance of clustering algorithms

Category | Complexity | Advantage | Disadvantage | Parameters
Hierarchical | O(n^2) | Handles any form of similarity or distance function and is applicable to any attribute type | Already created clusters are not revisited for improvement | Radius of cluster, branching factor
Partitioning | O(n) | Simple to understand and implement; takes less time to execute compared to other techniques | Gives poor outcomes when a point is close to the center of another cluster, due to overlapping of data points | Number of clusters
Grid based | O(n) | Quick processing time, independent of the number of data objects | Adversely affected by the number of cells in each dimension of the quantized space | Size of grid, number of objects in a cell
Density based | O(n log n) | Efficiently handles large amounts of noise in a dataset and needs no a priori specification | Performs poorly on high-dimensionality data | Threshold, radius
In another variation, the SUBspace CLUstering (SUBCLU) algorithm uses the approach of cluster identification in which dense regions are separated from the sparse ones, building clusters with a bottom-up model. Finally, for this section on density-based clustering algorithms, the Fast Density-Based Clustering (FDC) algorithm uses a density-linked relationship defined by equivalence.

6.1.5 Model-based clustering algorithms

In this section, we discuss two model-based clustering algorithms, namely COBWEB and SOM. Model-based clustering algorithms are designed to use selected models for the representation of clusters. The clustering algorithms in this category are usually categorized into the statistical learning method (COBWEB) and the neural network learning method (SOM and ART). SOM leverages the presumed existence of a topology to translate mappings from a high dimension in the input space to a lower dimension in the output space. SOM uses the Euclidean distance function for its distance measure. On the one hand, the notable performance of the SOM clustering algorithm has been observed even when the number of clusters increases, although with slower performance; however, SOM is sensitive to noise in datasets. On the other hand, COBWEB uses some exploratory criteria to build clusters through classification trees, which translates into a hierarchical clustering.

6.1.6 Modern clustering algorithms

In the previous sections, we largely centered our discussion on trends in traditional or classical clustering algorithms. The challenge with some such clustering algorithms is that they are easily entrapped within local optima and present difficulties in handling complex datasets. Therefore, we felt it necessary to also dwell more on modern clustering algorithms. This category of clustering algorithms consists of those based on an ensemble of models: quantum theory (e.g., quantum clustering QC and DQC), spectral graph theory (e.g., SM and NJW), affinity propagation (AP), and nature-inspired (NI) oriented clustering algorithms. We shall focus our discussion in this section on several clustering algorithms using nature-inspired metaheuristic algorithms to solve data clustering problems. For instance, metaheuristic algorithms of biotic and abiotic forms, like cuckoo search (CS), the firefly algorithm, the BAT algorithm, the genetic algorithm (GA), particle swarm optimization (PSO), ant colony optimization (ACO), and the gravitational and Tabu search algorithms, have been well exploited for clustering tasks. We shall therefore divide our discussion of clustering algorithms in this context into three: evolutionary algorithms, biotic algorithms together with collective intelligence metaheuristic approaches, and abiotic algorithms.

Clustering algorithms resulting from repurposing approaches based on evolutionary algorithms like evolution strategies (ES), evolutionary programming (EP), the genetic algorithm (GA), particle swarm optimization (PSO), differential evolution (DE), and ant colony optimization (ACO) have yielded powerful clustering algorithms.
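To make this formulation concrete, the sketch below shows, under simplifying assumptions (a fixed number of clusters, SciPy's differential evolution as the metaheuristic, and the DB index as the objective), how a candidate solution can encode cluster centroids and be scored by a validity index; it is an illustration of the general idea, not any specific published algorithm:

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
K, d = 4, X.shape[1]  # number of clusters fixed here for simplicity

def db_fitness(flat):
    """Decode a candidate (K concatenated centroids) and score its partition."""
    centroids = flat.reshape(K, d)
    labels = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    if len(np.unique(labels)) < 2:  # degenerate partition: penalize heavily
        return 1e6
    return davies_bouldin_score(X, labels)  # lower DB index = better clustering

# One (min, max) bound per centroid coordinate.
bounds = [(X[:, j].min(), X[:, j].max()) for _ in range(K) for j in range(d)]
result = differential_evolution(db_fitness, bounds, maxiter=50, seed=0)
best_centroids = result.x.reshape(K, d)
```

An automatic-clustering variant would additionally encode activation thresholds in each candidate so that the number of active centroids, and hence the number of clusters, is itself evolved.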
Examples of these clustering algorithms are the EP-based and GA-based clustering algorithms, such as the evolutionary programming clustering algorithm (EPC) and its improved version GEP-cluster. The GA-based clustering approach has been leveraged in tackling problems of automatic clustering, of which CLUSTERING, the quantum-based QGA, the two-staged TGCA, the multi-objective MOKGA, the multi-objective soft subspace-based MOEASSC, VGA-clustering, the k-means-based KMQGA, the fuzzy-based FVGA-clustering, the symmetry-based VGAPS, the fuzzy-based variant of VGAPS named FVGAPS, and AGCUK (which does not require the cluster number to be specified a priori) are good examples. The DE-based approach has also been optimized to produce clustering algorithms like the automatic ACDE; a corresponding fuzzy-based AFDE, which was further enhanced to use kernels to produce KFNDE; and an automatic version of AFDE called MoDEAFC. Other examples include multi-objective versions of clustering algorithms like MODE, DEMO, and the fuzzy-based MOMoDEFC.

The biotic approaches, which include swarm intelligence (SI), artificial immune systems (AIS), invasive weed optimization (IWO), and simulated annealing (SA), have also been harnessed successfully to develop new clustering algorithms. Some of these clustering algorithms include the SI PSO-based DCPSO clustering algorithm, the segmentation-based MEPSO, the point-symmetry-based PSOPS, a similitude of AGCUK named CPSO, DCPG (resulting from the hybridization of PSO and GA), an ant-based algorithm called Ant-clustering, an improved ant-based version called ATTA-C, the automatic single-objective IWO-clustering, the BCO-based automatic AKC-BCO (which uses kernels), an AIS-based cluster algorithm named GTCSA (which uses the clonal selection algorithm), and DLSIAC. Other multi-objective clustering algorithms include IDCMC, the SI-based MOPSO, MOIMPSO and MOPSOSA (resulting from hybridization efforts), MOCLONAL, the SA-based AMOSA, MOIWO (which is similar to CPSO but does not need to know the number of clusters a priori), the Bird Flock Gravitational Search Algorithm (BFGSA), and specifically the three-objective-based VAMOSA and GenClustMOO. Some studies have also attempted to hybridize traditional clustering algorithms with NI-based methods to yield clustering algorithms like k-Means-ALO (ant lion optimization), KMeans-PSO, and KMeans-FA.

Finally, clustering algorithms in the category of the abiotic approach have also been proposed and developed. For instance, a clustering algorithm named GRIN is based on the theory of gravity in physics and has proved to be effective due to its non-sensitivity to the distribution of the data set [122].

Considerable attention has recently been given to automatic clustering algorithms based on nature-inspired metaheuristics [111] and their applications. Due to the limitations of single-objective metaheuristics-based clustering algorithms in automatic clustering, multi-objective clustering has generated many clustering approaches, and algorithms are continuously being optimized to solve even nonlinearly separable problems. This is necessary given that single-objective clustering algorithms are good at efficiently grouping linearly separable clusters but suffer from getting entrapped in local regions. Although NI-based approaches in single-objective automatic clustering using evolution strategies (ES), the genetic algorithm (GA), evolutionary programming (EP), or differential evolution (DE) have been attained, multiobjective metaheuristics optimize more than one objective function simultaneously.

NI-based clustering algorithms make use of their foundational approaches to demonstrate the ability to learn or adapt to new situations while solving clustering problems, even in complex and changing environments [145]. Studies have reported state-of-the-art performance of such algorithms, as seen in an improved firefly algorithm that was hybridized with the particle swarm optimization algorithm to solve automatic data clustering problems [14]. Similarly, an NI-based clustering that uses the SOS algorithm was applied to solve clustering problems [244].

6.2 Open challenges and further research directions

In the previous subsection, we chronicled the advances and trends in the evolution of most of the clustering algorithms, while highlighting their strengths and weaknesses. This analysis has, accordingly, revealed the challenges and opportunities inherent in building new clustering algorithms: the gap that researchers can exploit. We assert that some open challenges associated with the clustering algorithms we have reviewed should be clearly outlined in this section so as to provide readers with directions from which concepts and ideas can be generated for building new clustering algorithms. Although other mainstream issues like the evaluation criteria and the distance function or approach to measuring similarity may be considered open opportunities for further research, sophisticated automatic clustering and the problems associated with the widening fields of application of such clustering algorithms leave much room for future work. Meanwhile, the proliferation of clustering algorithms also provides users with an array of options from which to select the most appropriate algorithm for their domain. We observed that this is also an avenue where developers of clustering algorithms may attempt to focus algorithm design on domain specification rather than generalizing their operations. We argue that such a focus might reveal some latent clustering properties shared by domains of application, thereby highlighting likely areas for fruitful hybridization of clustering concepts.
when handling this problem and may result in unattractive data need improvement or a complete redesign of
clusters. such.
Furthermore, the peculiarity of the categories of clustering techniques and algorithms discussed in Sect. 6.1 presents developers of related algorithms with open challenges for revolutionizing approaches to designing clustering algorithms. For instance, one consideration is that hierarchical-based clustering algorithms are associated with a growing time complexity as the number of instances increases. Another is that grid-based clustering algorithms are limited by their difficulty in discovering clusters of varied shapes or sizes. By contrast, density-based clustering algorithms successfully identify clusters of varied shapes and can effectively handle noise in datasets. Clustering algorithms based on partitioning require a priori knowledge of the distribution of the data, and hence of the number of clusters, since such information is needed as input to this class of algorithms. Careful exploitation of the pros and cons of these clustering algorithms holds interesting avenues of exploration for algorithm designers.
Also, through the widely adopted approach of developing clustering algorithms by leveraging nature-inspired algorithms, many of the problems associated with automatic clustering are being addressed. This research effort has produced interesting clustering algorithms that have proven effective and have outperformed classical clustering algorithms [111]. However, even clustering algorithms patterned after this approach show evidence of the need for further enhancement, which leaves another area open for advancing clustering algorithm design. Meanwhile, other challenges that may have been only partially addressed by some clustering algorithms, and so provide opportunities for improvement, are the difficulty of extending clusters from low-dimensional to high-dimensional cases; the challenge of discovering the important parameters for tuning clustering algorithms for effective application across domains; and the difficulty of verifying and interpreting clusters of high dimensionality.
Considering all the challenges above, the following list outlines some areas open for advancing the design and development of clustering algorithms and their techniques. Although some of them might not appear completely new, they do, however, need further improvement.
1. The increase in sources and platforms for data generation continues to present data analysts with challenges in extracting knowledge from terabytes and petabytes of data. Therefore, clustering algorithms targeted at effectively clustering such data need improvement or a complete redesign.
2. The change in focus of design patterns, from developing clustering algorithms in the non-automatic category to automatic ones, has revealed problems peculiar to the resulting automatic clustering algorithms. In addition to designing clustering algorithms robust enough to handle these problems, there is also a need for sophisticated automatic clustering techniques that allow for flexibility and effectiveness in use.
3. The challenge of sophisticated or complex automatic clustering problems may arise from solutions in (2), probably due to the nature of the input or the dataset. However, such clustering algorithms may be further improved by designing mechanisms that discover the intrinsic nature of the input, so as to choose between single-objective and multi-objective optimization.
4. Notwithstanding the advances made through the application of the NI-based approach, clustering algorithms originating from some of these approaches, such as the gravitational search algorithm, bacterial foraging, and firefly optimization, still leave room for developing improved versions capable of being used in the automatic clustering task.
5. The widening fields of application for clustering techniques, especially in real-time systems, are pushing research toward clustering algorithms that trade reduced time complexity for memory, or vice versa. Hence, clustering algorithm design might even be skillfully exploited for improved performance on both parameters. Whatever direction research on this issue takes, it will result in more effective measures for handling large datasets with varied forms of data.
6. Studies have shown that when operators are combined in algorithm design, the tendency is to poise such algorithms to robustly handle diversity in the data or population, thereby improving the quality of the result within a short time [191]. Leveraging this concept, designers of clustering algorithms have the opportunity to develop algorithms capable of handling input with diverse attributes.
7. Similarly, clustering algorithms can be further improved to learn techniques for adapting to clusters with non-uniform sparsity and size. This adaptation mechanism must make room for handling outliers.
8. The role of the objective function in clustering algorithm design has largely influenced the creation and advancement of different clustering algorithms, as presented in Sect. 6.1. There is therefore a need for studies focused on investigating the performance of different objective functions across various classification techniques, with the hope of motivating the design of better clustering algorithms, especially in multi-objective optimization (see the sketch after this list).
9. We mentioned in Sect. 6.1.6 that ensembles of clustering algorithms are recent advances associated with modern clustering techniques. In addition to such ensembles, researchers may consider building distributed clustering algorithms. This has become necessary because of the limitations of classical algorithms in handling huge amounts of data: at a sample size of a petabyte, clustering is a challenge.
10. Advances in ensembles of clustering algorithms may also be furthered through the hybridization of NI-based clustering algorithms, combining such algorithms in a way that enhances performance.
11. Moreover, hybridization arising from the idea in (10) above leads to a new clustering algorithm demonstrating the properties and advantages of two or more metaheuristics-based clustering algorithms. One example of such hybridization is the hybrid clustering algorithm resulting from integrating the FCM algorithm with feature weighting by a three-layer neural network; another is the hybrid clustering approach based on MODE and GA that resulted in the GADE algorithm.
12. In addition, the combinatorial effect of these clustering methods has the potential to increase the computational time complexity and resource requirements, and hence reduce the efficiency, of the resulting clustering algorithms.
13. Another option to those in (9–11) could be to integrate domain-based requirements into a new, single algorithm. This approach looks toward the application of the clustering algorithm rather than the algorithm procedure itself.
14. The increasing availability of computational resources with high-capacity GPUs, together with the advantages of parallel computing, may help in devising design patterns aimed at delivering better clustering algorithms. It is also reported that MapReduce-based clustering algorithms promise to provide scalable and faster clustering [205].
15. New clustering algorithms can also emerge from designing solutions to some fundamental challenges of non-automatic and automatic clustering.
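As a small illustration of point 8 above, the snippet below (our own toy setup) evaluates the same k-means partitions under three internal objectives. The within-cluster sum of squares necessarily improves as k grows, whereas the Davies–Bouldin index [54] and the silhouette index [189] can each prefer a different k, which is precisely why the choice of objective function shapes the resulting algorithm.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
print(" k        SSE   DB(lower=better)  silhouette(higher=better)")
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"{k:2d} {km.inertia_:10.1f} {davies_bouldin_score(X, km.labels_):12.3f}"
          f" {silhouette_score(X, km.labels_):16.3f}")
```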
6.3 Trending application areas of clustering algorithms

Clustering analysis is broadly used in several real-world application areas, such as sport, education, market research, pattern recognition, data analysis, image processing, advertisements (recommender systems), big data analysis, and drug activity prediction. The focus of this section is to briefly introduce, to a wider audience and data mining enthusiasts, some of the trending real-world application areas in which state-of-the-art clustering algorithms have been applied to solve difficult clustering problems. Specifically, those with recent practical applications are of interest to us here; for example, recent challenging application domains include big data analysis, satellite image processing, wireless sensor networks, and gene sequence clustering in bioinformatics. Out of the endless list of useful applications of clustering analysis, a few selected application areas are discussed below.

6.3.1 Identifying fake news

Although social media provides a platform for quick and seamless access to information, the propagation of false information, especially in recent years, has raised major concerns, given that social media are the primary source of information for much of the world's population. The impact of fake news cannot be underestimated. Simply put, fake news items often spread faster than genuine news because of their tendency to manipulate an individual's beliefs, with devastating consequences in a country where such news is accepted as the norm. Therefore, one major challenge is to automatically identify false information by categorizing all articles into different types and then to notify users about the credibility of the chosen article or of information shared online. In this case, automatic clustering algorithms can easily be applied to solve the problem. In the study conducted by Hosseinimotlagh and Papalexakis [100], the authors explored fake news identification using tensor decomposition ensembles. The clustering algorithm presented in [100] accepts as input the content of the possibly fake news article and the corpus, examines the words used in the article, and then clusters those words. This clustering process is what helps the technique distinguish between genuine and fake news: certain words are found more commonly in sensationalized, "click-bait" articles, and a high percentage of such terms in an article indicates a higher probability that the material is fake news.
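The tensor decomposition ensemble of [100] is too involved for a short listing, but the word-clustering intuition it rests on can be sketched as below: represent each vocabulary term by its usage profile across a corpus, cluster the terms, and profile a new article by how heavily it draws on each term cluster. The toy corpus, the choice of two clusters, and the cluster_profile helper are all illustrative assumptions, not the method of [100].

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus; a real system would use a large collection of articles.
corpus = [
    "shocking miracle cure doctors hate this trick",
    "you wont believe what happens next click now",
    "the central bank raised interest rates by a quarter point",
    "researchers published peer reviewed results on vaccine efficacy",
]

# Represent each term by its tf-idf profile across articles, then cluster the
# terms: sensational "click-bait" words tend to co-occur and group together.
tfidf = TfidfVectorizer().fit(corpus)
term_doc = tfidf.transform(corpus).T.toarray()          # one row per term
term_clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(term_doc)

def cluster_profile(article):
    """Share of the article's known words falling into each term cluster."""
    ids = [tfidf.vocabulary_[w] for w in article.lower().split() if w in tfidf.vocabulary_]
    return np.bincount(term_clusters[ids], minlength=2) / max(len(ids), 1)

# A profile skewed toward the sensational term cluster suggests click-bait content.
print(cluster_profile("miracle trick you wont believe"))
```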
well-known standard clustering algorithms such as the k-means, k-medoids, fuzzy c-means, and expectation maximization algorithms, have been used to develop efficient clustering techniques that are able to monitor students' academic performance [61]. These algorithms are able to conduct unsupervised learning-based clustering tasks on students' raw academic scores, helping to classify each student into a well-defined cluster that clearly reflects the behavior and learning style of the students in that cluster.
6.3.8 Customer segmentation

The task of customer segmentation requires that producers, business owners and sellers effectively categorize their customers into groups that are intelligible enough to provide sensitive information for improving sales or performance. Traditional approaches rely on business analytics packages, which have not demonstrated robustness and efficiency in the face of huge data volumes. Hence, the use of clustering techniques has produced interesting performance in this area. Major clustering models and algorithms, such as k-means, agglomerative hierarchical, and other hierarchical and non-hierarchical clustering algorithms, have been adopted for different research needs. Segmentation of customers may take forms based on customer behavior, demography, psychographics, needs, wants, characteristics and even geographical location, such that each segment comprises customers who share similar market characteristics. Ezenkwu et al. [66] and Kansal et al. [112] are good examples of researchers who have attempted to apply clustering techniques and algorithms to the customer segmentation problem. The first study applied the k-means clustering algorithm to the issue of segmenting customers, adopting the MATLAB platform for implementing the algorithm, which they trained on a z-score normalized two-feature dataset. The approach generated four clusters of customers from a dataset whose features related to the average amount of goods purchased and the average number of customer visits. As a result, they were able to class customers as High-Buyers-Regular-Visitors (HBRV), High-Buyers-Irregular-Visitors (HBIV), Low-Buyers-Regular-Visitors (LBRV) and Low-Buyers-Irregular-Visitors (LBIV) [66]. Similarly, the second study applied the k-means clustering algorithm, in addition to agglomerative and mean-shift clustering algorithms, to segment customers. They applied their method to a dataset containing customer features based on the mean amount of shopping and average visits to the shop. As a result, their studies generated five customer segments: Careless, Careful, Standard, Target and Sensible. Furthermore, the application of the mean-shift clustering algorithm revealed further segments with the characteristics of High Buyers and Frequent Visitors (HBFV) and High Buyers and Occasional Visitors (HBOV) [112].
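A minimal reconstruction of the first study's setup, with synthetic records standing in for the customer data and the MATLAB implementation replaced by Python, might look as follows; the cluster-to-segment naming (HBRV, HBIV, LBRV, LBIV) still has to be read off the cluster means:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in: [average purchase amount, average number of visits].
rng = np.random.default_rng(1)
customers = np.vstack([
    rng.normal([800, 20], [60, 2], (50, 2)),   # high buyers, regular visitors
    rng.normal([750, 4],  [60, 1], (50, 2)),   # high buyers, irregular visitors
    rng.normal([90, 18],  [20, 2], (50, 2)),   # low buyers, regular visitors
    rng.normal([80, 3],   [20, 1], (50, 2)),   # low buyers, irregular visitors
])

# z-score normalization, as in [66], so both features carry equal weight.
z = (customers - customers.mean(axis=0)) / customers.std(axis=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(z)

# Inspect per-cluster means to attach the HBRV/HBIV/LBRV/LBIV names.
for c in range(4):
    print(c, customers[labels == c].mean(axis=0).round(1))
```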
6.3.9 Recommender system

A recommender system (RS) is built to provide support in automating the process of deriving proposals close to the user's real interest. Such a system can accurately predict a user's opinion of, or even ratings on, issues, products, or services by leveraging the user's past responses or behavior. Clustering techniques have been widely applied to this task of recommending appropriate items or services to users. On the one hand, the authors in [154] used the learning-based Gravitational Search Algorithm and Learning Automata (GSA-LA), a single-objective hybrid evolutionary clustering technique, for grouping items in an offline collaborative filtering RS. On the other hand, Kuźelewska [132] developed a clustering method for identifying similar features existing among users, as well as their profiles, in order to aid the user's selection of products [132]. Their similarity measures were Euclidean distance, the log-likelihood function, the correlation coefficient, and cosine similarity.
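The general pattern — group users by a similarity measure over their rating profiles, then recommend within a user's group — can be sketched as follows with cosine distance; this is a toy illustration of the idea, not Kuźelewska's published method:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Toy user-item rating matrix (rows: users, columns: items; 0 = unrated).
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [0, 1, 5, 4, 5],
    [1, 0, 4, 5, 4],
], dtype=float)

# Cluster users by cosine distance between rating profiles.
clusters = fcluster(linkage(pdist(ratings, metric="cosine"), method="average"),
                    t=2, criterion="maxclust")

# Recommend the unrated item best liked inside the user's own cluster.
user = 0
peers = ratings[clusters == clusters[user]]
scores = peers.mean(axis=0) * (ratings[user] == 0)   # mask items already rated
print("recommend item", int(np.argmax(scores)))
```

Swapping pdist's metric for "euclidean" or "correlation" reproduces two of the other similarity models mentioned above.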
6.3.10 Wireless sensor network-based applications

The use of wireless sensor networks (WSNs) has gained wide acceptance in different fields of research. Health facilities, home surveillance systems, educational or institutional centers, military establishments, and even industry are areas where WSNs are frequently used. The application of clustering techniques to WSNs is becoming even more attractive because of the problems of node energy efficiency and network lifetime in WSNs. Hence, clustering algorithms can be adopted to improve the energy of WSN nodes as well as the scalability of the nodes. Singh et al. [210] presented the application of clustering algorithms to the challenges listed earlier. Their work outlines different studies that have successfully applied clustering algorithms to this task. They revealed that different clustering approaches have used centralized, distributed, hybrid, equal and unequal clustering techniques in solving the problems of sensor networks [210]. Furthermore, the authors stated that most of the studies using these approaches leverage the residual energy of nodes and the distance to the base station as parameters for selecting cluster heads. Zanjireh et al. [240], by optimizing the distribution of cluster heads across the network, derived a new clustering algorithm for wireless sensor networks, which they claim successfully reduced the energy consumption of the network, thereby prolonging the lifetime of the nodes [240].
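The cluster-head selection criterion described above can be sketched as a weighted score over residual energy and distance to the base station. The use of k-means to form the clusters, the weight w, and all numbers below are our illustrative assumptions rather than any specific published WSN protocol:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_cluster_heads(positions, energy, base_station, n_clusters, w=0.5):
    """One head per cluster, favoring high residual energy and proximity
    to the base station, both normalized to [0, 1] before weighting."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(positions)
    d_bs = np.linalg.norm(positions - base_station, axis=1)
    score = w * energy / energy.max() + (1 - w) * (1 - d_bs / d_bs.max())
    heads = [int(np.flatnonzero(labels == c)[score[labels == c].argmax()])
             for c in range(n_clusters)]
    return labels, heads

rng = np.random.default_rng(0)
nodes = rng.uniform(0, 100, (60, 2))        # sensor coordinates on a 100 x 100 field
energy = rng.uniform(0.2, 1.0, 60)          # residual energy per node (illustrative units)
labels, heads = select_cluster_heads(nodes, energy, np.array([50.0, 120.0]), n_clusters=4)
print("cluster heads:", heads)
```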
6.3.11 Drug activity prediction

The application of clustering techniques and their related algorithms has also attracted the interest of pharmaceutical companies. Complex and large pharmacology networks can be made simpler through the use of clustering algorithms for grouping existing drugs into new clusters. These new clusters can form the basis for repositioning or repurposing the drugs for other uses. This classification task often follows the sequence of compound selection, virtual library generation, High-Throughput Screening (HTS), Quantitative Structure–Activity Relationship (QSAR) analysis, and Absorption, Distribution, Metabolism, Elimination and Toxicity (ADMET) prediction. Malhat et al. [146] presented chemoinformatics clustering algorithms using k-means, bisecting k-means, and Ward clustering for drug discovery processes. The authors also carried out a comparative analysis of the performance of the algorithms to discover the most effective one for this task. Applying the algorithms to homogeneous and heterogeneous chemical datasets, they obtained results showing that the k-means algorithm is better suited to a small number of clusters, while the bisecting k-means and Ward algorithms are better suited to forming large numbers of clusters, for both homogeneous and heterogeneous data sets, in terms of time and standard deviation. Similarly, Hameed et al. [91] applied the two-tiered drug-centric unsupervised clustering algorithms they proposed for drug repositioning. Their clustering approach to the drug repositioning problem integrated information such as drug–chemical, drug–disease, drug–gene, drug–protein and drug–side-effect relations, first clustering drugs using the Growing Self-Organizing Map (GSOM) based on their homogeneous information, and then clustering the resulting groups using drug–drug relation matrices [91].
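A compact way to reproduce this kind of comparison on hypothetical fingerprint data is sketched below. Bisecting k-means is written out by hand (scikit-learn only gained a built-in BisectingKMeans in recent releases), and the random binary matrix stands in for real chemical descriptors:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans

def bisecting_kmeans(X, k, seed=0):
    """Repeatedly split the largest cluster with 2-means until k clusters exist."""
    labels = np.zeros(len(X), dtype=int)
    while labels.max() + 1 < k:
        target = np.bincount(labels).argmax()          # largest current cluster
        idx = np.flatnonzero(labels == target)
        sub = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X[idx])
        labels[idx[sub == 1]] = labels.max() + 1
    return labels

rng = np.random.default_rng(0)
fingerprints = rng.integers(0, 2, (200, 64)).astype(float)  # hypothetical compounds
k = 5
kmeans_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(fingerprints)
bisect_labels = bisecting_kmeans(fingerprints, k)
ward_labels = fcluster(linkage(fingerprints, method="ward"), t=k, criterion="maxclust")
print([len(set(l)) for l in (kmeans_labels, bisect_labels, ward_labels)])
```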
7 Summary

In real life, the relevance of clustering analysis cannot be overemphasized, specifically for those application areas where decision making and exploratory pattern analysis must be carried out on large-scale datasets. In most cases, extracting the essential information from several millions of data samples with tens of thousands of dimensions is a very daunting task. Therefore, the only methods that have worked well with data analysis of such magnitude are unsupervised learning algorithms equipped with mechanisms that can efficiently help computing resources to explore and understand the detailed structural composition of any data objects. The recent developments in the application of nature-inspired metaheuristic optimization algorithms have paved the way for researchers to develop some of the simplest yet most robust data abstraction and analysis tools, which do not require any prior knowledge of the data to be processed. It is noteworthy that several studies have revealed that nature-inspired metaheuristic clustering algorithms are ideally suited for achieving better, simpler and more compact representations of data objects in complex and large-scale datasets.
This paper has provided a detailed, state-of-the-art collection in the form of a comprehensive overview and bibliometric analysis of the well-known clustering algorithms. A taxonomy of clustering algorithms is presented and discussed. The focus is centered on clustering algorithms and automatic clustering algorithms in the bibliometric analysis. The publication and citation structures are analyzed from the early 1990s to 2019. A total of 5063 papers were extracted, among which 97.41% were articles. Publication numbers and citation counts have grown significantly every year, which may be attributed to the current era of large datasets in almost all domains. Four papers each had more than 3000 citations. Pattern Recognition is the journal with the highest number of publications. As to authors, Bezdek has been the most productive (highest number of papers), while Jain is the most influential (highest number of citations). China has published the highest number of papers, while papers from the USA have been cited the most. Among institutions, the Chinese Academy of Sciences is the most productive and Michigan State University is the most influential. The top keywords used by authors are clustering, data mining, clustering algorithms, etc. For each of the bibliometric indicators, we have also presented visualizations using VOSviewer.
Furthermore, an all-inclusive review of the metaheuristic clustering approaches from the early 1990s to date was presented. The current study has introduced the basic and core ideas of some of the commonly used clustering algorithms; specifically, this paper has focused on reviewing nature-inspired metaheuristic clustering techniques with the primary aim of identifying their common inspirational sources, taxonomical classification, and the advantages and disadvantages of each. However, it is burdensome to present a complete list of all the clustering algorithms due to the intersection of research fields and the diversity of application areas. Therefore, this study has only considered well-researched and commonly used clustering algorithms with high practical value. The overall review of the
clustering algorithms presented in this paper has the major goal of providing interested readers with a systematic and clear understanding of the importance of existing data analysis algorithms and their application areas.
In conclusion, the subject of data clustering or analysis is considered an interesting, useful, and challenging problem. Moreover, the task of clustering involves a very complex sequence of processes that must be carefully sorted and executed in order to obtain any meaningful result from the candidate datasets. It is also interesting to note that the literature in the area of clustering is quite diverse, with much practical potential in real-world application areas such as pattern recognition, marketing and sales research, predictive gaming, web network traffic classification, and document filtering and retrieval. However, there are still some major concerns with the problem of dealing with large-scale datasets, in terms of determining the number of clusters automatically, the selection of clustering methods, and more efficient automatic clustering algorithms to handle real-world clustering problems. Nevertheless, research in these areas is still very active within the research community. Finally, considering the relevance of clustering tasks to most real-world problems, it is still possible to explore and exploit further potential application areas with the most efficient data abstraction algorithms, specifically using state-of-the-art nature-inspired metaheuristic clustering algorithms.
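As a baseline for what "determining the number of clusters automatically" asks of an algorithm, the simplest remedy just sweeps candidate values of k and keeps the partition with the best internal validity score, here the silhouette index [189]; metaheuristic automatic clustering instead encodes k directly in the search. The dataset and the range of k below are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=400, centers=5, random_state=3)
best_k, best_score = None, -1.0
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score
print("estimated number of clusters:", best_k)
```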
As a way forward, considering the large volume of literature available on clustering and its applications, it is possible that the current study missed some recently published clustering methods. Therefore, we recommend that this specific limitation be considered in any future research. Further, it will also be interesting to consider extending the current literature review to include a more constructive discussion of the merits and demerits of all the reviewed state-of-the-art clustering algorithms presented and discussed in this paper.

Compliance with ethical standards

Conflict of interest The authors declare that there is no conflict of interest regarding the publication of this paper.
References

1. Abbasi AA, Younis M (2007) A survey on clustering algorithms for wireless sensor networks. Comput Commun 30(14–15):2826–2841
2. Abdulwahab HA, Noraziah A, Alsewari AA, Salih SQ (2019) An enhanced version of black hole algorithm via levy flight for optimization and data clustering problems. IEEE Access 7:142085–142096
3. Abraham A, Das S, Konar A (2007) Kernel based automatic clustering using modified particle swarm optimization algorithm. In: Proceedings of the 9th annual conference on genetic and evolutionary computation. ACM, pp 2–9
4. Abualigah LM, Khader AT (2017) Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering. J Supercomput 73(11):4773–4795
5. Abualigah LM, Khader AT, Al-Betar MA (2016) Multi-objectives-based text clustering technique using K-mean algorithm. In: 2016 7th international conference on computer science and information technology (CSIT). IEEE, pp 1–6
6. Abualigah LM, Khader AT, Hanandeh ES (2018) A hybrid strategy for krill herd algorithm with harmony search algorithm to improve the data clustering. Intell Decis Technol 12(1):3–14
7. Abualigah LM, Khader AT, Al-Betar MA, Alomari OA (2017) Text feature selection with a robust weight scheme and dynamic dimension reduction to text document clustering. Expert Syst Appl 84:24–36
8. Abualigah LM, Khader AT, Al-Betar MA, Awadallah MA (2016) A krill herd algorithm for efficient text documents clustering. In: 2016 IEEE symposium on computer applications & industrial electronics (ISCAIE). IEEE, pp 67–72
9. Abualigah LM, Khader AT, AlBetar MA, Hanandeh ES (2016) A new hybridization strategy for krill herd algorithm and harmony search algorithm applied to improve the data clustering. In: 1st EAI international conference on computer science and engineering. European Alliance for Innovation (EAI), p 54
10. Abualigah LM, Khader AT, AlBetar MA, Hanandeh ES (2017) Unsupervised text feature selection technique based on particle swarm optimization algorithm for improving the text clustering. In: EAI international conference on computer science and engineering
11. Abubaker A, Baharum A, Alrefaei M (2015) Automatic clustering using multi-objective particle swarm and simulated annealing. PLoS One 10(7):e0130995
12. Agarwal P, Mehta S (2016) Enhanced flower pollination algorithm on data clustering. Int J Comput Appl 38(2–3):144–155
13. Agarwal P, Alam MA, Biswas R (2011) Issues, challenges and tools of clustering algorithms. arXiv preprint arXiv:1110.2610
14. Agbaje MB, Ezugwu AE, Els R (2019) Automatic data clustering using hybrid firefly particle swarm optimization algorithm. IEEE Access 7:184963–184984
15. Aggarwal CC (ed) (2014) Data classification: algorithms and applications. CRC Press, Boca Raton
16. Agusti LE, Salcedo-Sanz S, Jiménez-Fernández S, Carro-Calvo L, Del Ser J, Portilla-Figueras JA (2012) A new grouping genetic algorithm for clustering problems. Expert Syst Appl 39(10):9695–9703
17. Akinyelu AA, Ezugwu AE (2019) Nature inspired instance selection techniques for support vector machine speed optimization. IEEE Access 7:154581–154599
18. Akinyelu AA, Ezugwu AE, Adewumi AO (2019) Ant colony optimization edge selection for support vector machine speed optimization. Neural Comput Appl 32:1–33
19. Akyol S, Alatas B (2017) Plant intelligence based metaheuristic optimization algorithms. Artif Intell Rev 47(4):417–462
20. Alatas B (2011) ACROA: artificial chemical reaction optimization algorithm for global optimization. Expert Syst Appl 38(10):13170–13180
21. Aliniya Z, Mirroshandel SA (2019) A novel combinatorial merge-split approach for automatic clustering using imperialist competitive algorithm. Expert Syst Appl 117:243–266
22. Alswaitti M, Albughdadi M, Isa NAM (2018) Density-based particle swarm optimization algorithm for data clustering. Expert Syst Appl 91:170–186
23. Anand N, Vikram P (2015) Comprehensive analysis & performance comparison of clustering algorithms for big data. Rev Comput Eng Res 4:54–80
24. Anari B, Torkestani JA, Rahmani AM (2017) Automatic data clustering using continuous action-set learning automata and its application in segmentation of images. Appl Soft Comput 51:253–265
25. Arbelaitz O, Gurrutxaga I, Muguerza J, Pérez JM, Perona I (2013) An extensive comparative study of cluster validity indices. Pattern Recognit 46(1):243–256
26. Atabay HA, Sheikhzadeh MJ, Torshizi M (2016) A clustering algorithm based on integration of K-means and PSO. In: 2016 1st conference on swarm intelligence and evolutionary computation (CSIEC). IEEE, pp 59–63
27. Baker FB, Hubert LJ (1975) Measuring the power of hierarchical cluster analysis. J Am Stat Assoc 70:31–38
28. Banati H, Bajaj M (2013) Performance analysis of firefly algorithm for data clustering. Int J Swarm Intell 1(1):19–35
29. Bandyopadhyay S, Saha S (2008) A point symmetry-based clustering technique for automatic evolution of clusters. IEEE Trans Knowl Data Eng 20:1441–1457
30. Berkhin P (2006) A survey of clustering data mining techniques. In: Kogan J, Nicholas C, Teboulle M (eds) Grouping multidimensional data. Springer, Berlin, pp 25–71
31. Bezdek JC, Pal NR (1998) Some new indexes of cluster validity. IEEE Trans Syst Man Cyber Part B 28(3):301–315
32. Bezdek JC (2013) Pattern recognition with fuzzy objective function algorithms. Springer, Berlin
33. Blanco-Mesa F, León-Castro E, Merigó JM (2019) A bibliometric analysis of aggregation operators. Appl Soft Comput 81:105488
34. Blanco-Mesa F, Merigó JM, Gil-Lafuente AM (2017) Fuzzy decision making: a bibliometric-based review. J Intell Fuzzy Syst 32(3):2033–2050
35. Boryczka U (2009) Finding groups in data: cluster analysis with ants. Appl Soft Comput 9(1):61–70
36. Bouyer A, Ghafarzadeh H, Tarkhaneh O (2015) An efficient hybrid algorithm using cuckoo search and differential evolution for data clustering. Indian J Sci Technol 8(24):1–12
37. Calinski T, Harabasz J (1974) A dendrite method for cluster analysis. Commun Stat 3:1–27
38. Chang DX, Zhang XD, Zheng CW, Zhang DM (2010) A robust dynamic niching genetic algorithm with niche migration for automatic clustering problem. Pattern Recognit 43(4):1346–1360
39. Chang H, Yeung DY (2008) Robust path-based spectral clustering. Pattern Recognit 41(1):191–203
40. Chou CH, Su MC, Lai E (2004) A new cluster validity measure and its application to image compression. Pattern Anal Appl 7(2):205–220
41. Chowdhury A, Bose S, Das S (2011) Automatic clustering based on invasive weed optimization algorithm. In: International conference on swarm, evolutionary, and memetic computing. Springer, Berlin, Heidelberg, pp 105–112
42. Chu Y, Mi H, Liao H, Ji Z, Wu QH (2008) A fast bacterial swarming algorithm for high-dimensional function optimization. In: 2008 IEEE congress on evolutionary computation (IEEE world congress on computational intelligence). IEEE, pp 3135–3140
43. Chuang LY, Hsiao CJ, Yang CH (2011) Chaotic particle swarm optimization for data clustering. Expert Syst Appl 38(12):14555–14563
44. Condorcet MJAN (1785) Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. L'Imprimerie Royale, Paris
45. Corter JE, Gluck MA (1992) Explaining basic categories: feature predictability and information. Psychol Bull 111(2):291–303
46. Cowgill MC, Harvey RJ, Watson LT (1999) A genetic algorithm approach to cluster analysis. Comput Math Appl 37(7):99–108
47. Cruz DPF, Maia RD, de Castro LN (2013) A new encoding scheme for a bee-inspired optimal data clustering algorithm. In: 2013 BRICS congress on computational intelligence and 11th Brazilian congress on computational intelligence. IEEE, pp 136–141
48. Cura T (2012) A particle swarm optimization approach to clustering. Expert Syst Appl 39(1):1582–1588
49. Dalrymple-Alford EC (1970) The measurement of clustering in free recall. Psychol Bull 74:32–34
50. Das S, Abraham A, Konar A (2007) Automatic clustering using an improved differential evolution algorithm. IEEE Trans Syst Man Cybern Part A Syst Hum 38(1):218–237
51. Das S, Abraham A, Konar A (2008) Automatic kernel clustering with a multi-elitist particle swarm optimization algorithm. Pattern Recognit Lett 29(5):688–699
52. Das S, Chowdhury A, Abraham A (2009) A bacterial evolutionary algorithm for automatic data clustering. In: 2009 IEEE congress on evolutionary computation. IEEE, pp 2403–2410
53. Das S, Mullick SS, Suganthan PN (2016) Recent advances in differential evolution—an updated survey. Swarm Evol Comput 27:1–30
54. Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 2:224–227
55. Dorigo M, Birattari M (2010) Ant colony optimization. Springer, Berlin, pp 36–39
56. Dorigo M, Stützle T (2019) Ant colony optimization: overview and recent advances. In: Gendreau M, Potvin JY (eds) Handbook of metaheuristics. Springer, Cham, pp 311–351
57. Drewes B (2005) Some industrial applications of text mining. In: Knowledge mining. StudFuzz, vol 185. Springer, Berlin, Heidelberg, pp 233–238
58. Duan G, Hu W, Zhang Z (2016) A novel data clustering algorithm based on modified adaptive particle swarm optimization. Int J Signal Process Image Process Pattern Recognit 9(3):179–188
59. Duda RO, Hart PE, Stork DG (2001) Pattern classification. Wiley, New York
60. Dunn JC (1973) A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybern 3:32–57
61. Dutt A, Ismail MA, Herawan T (2017) A systematic review on educational data mining. IEEE Access 5:15991–16005
62. Dutta D, Dutta P, Sil J (2012) Data clustering with mixed features by multi objective genetic algorithm. In: 2012 12th international conference on hybrid intelligent systems (HIS). IEEE, pp 336–341
63. Eberhart R, Kennedy J (1995) A new optimizer using particle swarm theory. In: MHS'95. Proceedings of the sixth international symposium on micro machine and human science. IEEE, pp 39–43
64. Elaziz MA, Neggaz N, Ewees AA, Lu S (2019) Automatic data clustering based on hybrid atom search optimization and sine–cosine algorithm. In: 2019 IEEE congress on evolutionary computation (CEC). IEEE, pp 2315–2322
65. Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, vol 96, no 34, pp 226–231
66. Ezenkwu CP, Ozuomba S, Kalu C (2015) Application of K-means algorithm for efficient customer segmentation: a strategy for targeted customer services. Int J Adv Res Artif Intell (IJARAI) 4(10):40–44
67. Ezugwu AE (2019) Enhanced symbiotic organisms search algorithm for unrelated parallel machines manufacturing scheduling with setup times. Knowl-Based Syst 172:15–32
68. Ezugwu AES, Adewumi AO (2017) Discrete symbiotic organisms search algorithm for travelling salesman problem. Expert Syst Appl 87:70–78
69. Ezugwu AE, Adewumi AO (2017) Soft sets based symbiotic organisms search algorithm for resource discovery in cloud computing environment. Future Gener Comput Syst 76:33–50
70. Ezugwu AE, Akutsah F (2018) An improved firefly algorithm for the unrelated parallel machines scheduling problem with sequence-dependent setup times. IEEE Access 6:54459–54478
71. Ezugwu AE, Prayogo D (2019) Symbiotic organisms search algorithm: theory, recent advances and applications. Expert Syst Appl 119:184–209
72. Ezugwu AE, Adeleke OJ, Viriri S (2018) Symbiotic organisms search algorithm for the unrelated parallel machines scheduling with sequence-dependent setup times. PLoS One 13(7):e0200030
73. Ezugwu AE, Adeleke OJ, Akinyelu AA, Viriri S (2019) A conceptual comparison of several metaheuristic algorithms on continuous optimisation problems. Neural Comput Appl 32:1–45
74. Ezugwu AE, Akutsah F, Olusanya MO, Adewumi AO (2018) Enhanced intelligent water drops algorithm for multi-depot vehicle routing problem. PLoS One 13(3):e0193751
75. Ezugwu AE (2020) Nature-inspired metaheuristic techniques for automatic clustering: a survey and performance study. SN Appl Sci 2(2):273
76. Fahad A, Alshatri N, Tari Z, Alamri A, Khalil I, Zomaya AY, Foufou S, Bouras A (2014) A survey of clustering algorithms for big data: taxonomy and empirical analysis. IEEE Trans Emerg Top Comput 2(3):267–279
77. Falkenauer E (1998) Genetic algorithms and grouping problems. Wiley, Chichester
78. Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2(2):139–172. https://doi.org/10.1007/BF00114265
79. Fister I, Fister I Jr, Yang XS, Brest J (2013) A comprehensive review of firefly algorithms. Swarm Evol Comput 13:34–46
80. Fortier JJ, Solomon H (1966) Clustering procedures. In: Krishnaiah PR (ed) Multivariate analysis, vol 62. Academic Press, New York
81. Fowlkes EB, Mallows CL (1983) A method for comparing two hierarchical clusterings. J Am Stat Assoc 78(383):553–569
82. Gan G, Ma C, Wu J (2007) Data clustering: theory, algorithms, and applications, vol 20. SIAM, Philadelphia
83. Gluck MA, Corter JE (1985) Information, uncertainty, and the utility of categories. In: Program of the 7th annual conference of the cognitive science society, pp 283–287
84. Goel S, Sharma A, Bedi P (2011) Cuckoo search clustering algorithm: a novel strategy of biomimicry. In: 2011 world congress on information and communication technologies. IEEE, pp 916–921
85. Goldberg DE, Holland JH (1988) Genetic algorithms and machine learning. Mach Learn 3(2):95–99
86. Guha S, Rastogi R, Shim K (2001) Cure: an efficient clustering algorithm for large databases. Inf Syst 26(1):35–58
87. Guo D, Chen J, Chen Y, Li Z (2018) LBIRCH: an improved BIRCH algorithm based on link. ICMLC 2018:74–79
88. Halkidi M, Vazirgiannis M (2001) Clustering validity assessment: finding the optimal partitioning of a data set. In: Proceedings 2001 IEEE international conference on data mining. IEEE, pp 187–194
89. Halkidi M, Batistakis Y, Vazirgiannis M (2002) Clustering validity checking methods: part II. ACM Sigmod Rec 31(3):19–27
90. Halkidi M, Vazirgiannis M, Batistakis I (2000) Quality scheme assessment in the clustering process. In: Proceedings of PKDD, Lyon, France
91. Hameed PN, Verspoor K, Kusljic S, Halgamuge S (2018) A two-tiered unsupervised clustering approach for drug repositioning through heterogeneous data integration. BMC Bioinform 19:29
92. Hamerly G, Elkan C (2004) Learning the k in k-means. In: Advances in neural information processing systems. MIT Press, Cambridge, pp 281–288
93. Hamilton JD (1994) Time series analysis. Princeton University Press, Princeton, p 159
94. Hartigan JA, Wong MA (1979) Algorithm AS 136: A k-means clustering algorithm. J R Stat Soc Ser C (Appl Stat) 28(1):100–108
95. Hassanzadeh T, Meybodi MR (2012) A new hybrid approach for data clustering using firefly algorithm and K-means. In: The 16th CSI international symposium on artificial intelligence and signal processing (AISP 2012). IEEE, pp 007–011
96. Hatamlou A (2013) Black hole: a new heuristic optimization approach for data clustering. Inf Sci 222:175–184
97. He H, Tan Y (2012) A two-stage genetic algorithm for automatic clustering. Neurocomputing 81:49–59
98. He Q, Jin X, Du C, Zhuang F, Shi Z (2014) Clustering in extreme learning machine feature space. Neurocomputing 128:88–95
99. Hoos HH, Stützle T (2004) Stochastic local search: foundations and applications. Elsevier, Amsterdam
100. Hosseinimotlagh S, Papalexakis EE (2018) Unsupervised content-based identification of fake news articles with tensor decomposition ensembles. In: Proceedings of the workshop on misinformation and misbehavior mining on the web (MIS2)
101. Hruschka ER, Campello RJ, Freitas AA (2009) A survey of evolutionary algorithms for clustering. IEEE Trans Syst Man Cybern Part C (Appl Rev) 39(2):133–155
102. Huang CL, Huang WC, Chang HY, Yeh YC, Tsai CY (2013) Hybridization strategies for continuous ant colony optimization and particle swarm optimization applied to data clustering. Appl Soft Comput 13(9):3864–3872
103. Hussain K, Salleh MNM, Cheng S, Shi Y (2019) Metaheuristic research: a comprehensive survey. Artif Intell Rev 52(4):2191–2233
104. Jaccard P (1901) Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines. Bulletin de la Société Vaudoise des Sciences Naturelles 37:241–272
105. Jafar OM, Sivakumar R (2010) Ant-based clustering algorithms: a brief survey. Int J Comput Theory Eng 2(5):787
106. Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31(8):651–666
107. Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv (CSUR) 31(3):264–323
108. Jakobsson M, Rosenberg NA (2007) CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23(14):1801–1806
109. Janmaijaya M, Shukla AK, Abraham A, Muhuri PK (2018) A scientometric study of neurocomputing publications (1992–2018): an aerial overview of intrinsic structure. Publications 6(3):32
110. Jensi R, Jiji GW (2015) MBA-LF: a new data clustering method using modified BAT algorithm and levy flight. ICTACT J Soft Comput 6(1):1093–1101
111. José-García A, Gómez-Flores W (2016) Automatic clustering using nature-inspired metaheuristics: a survey. Appl Soft Comput 41:192–213
112. Kansal T, Bahuguna S, Singh V, Choudhury T (2018) Customer segmentation using K-means clustering. In: 2018 international conference on computational techniques, electronics and mechanical systems (CTEMS)
113. Kao Y, Chen CC (2014) Automatic clustering for generalised cell formation using a hybrid particle swarm optimisation. Int J Prod Res 52(12):3466–3484
114. Kapoor S, Zeya I, Singhal C, Nanda SJ (2017) A grey wolf optimizer based automatic clustering algorithm for satellite image segmentation. Proc Comput Sci 115:415–422
115. Karaboga D (2005) An idea based on honey bee swarm for numerical optimization, vol 200. Technical report-tr06. Erciyes University, Engineering Faculty, Computer Engineering Department, pp 1–10
116. Karaboğa D, Ökdem S (2004) A simple and global optimization algorithm for engineering problems: differential evolution algorithm. Turk J Electr Eng Comput Sci 12(1):53–60
117. Karaboga D, Ozturk C (2011) A novel clustering approach: artificial bee colony (ABC) algorithm. Appl Soft Comput 11(1):652–657
118. Karthikeyan M, Aruna P (2013) Probability based document clustering and image clustering using content-based image retrieval. Appl Soft Comput 13(2):959–966
119. Karypis G, Han EH, Kumar V (1999) Chameleon: hierarchical clustering using dynamic modeling. IEEE Comput 32(8):68–75
120. Kaushik K, Arora V (2015) A hybrid data clustering using firefly algorithm based improved genetic algorithm. Proc Comput Sci 58:249–256
121. Kosters WA, Laros JF (2007) Metrics for mining multisets. In: International conference on innovative techniques and applications of artificial intelligence. Springer, London, pp 293–303
122. Kotsiantis S, Pintelas EP (2004) Recent advances in clustering: a brief survey. WSEAS Trans Inf Sci Appl 1:73–81
123. Kovács F, Ivancsy R (2006) Cluster validity measurement for arbitrary shaped clustering. In: Proceedings of the 5th WSEAS international conference on artificial intelligence, knowledge engineering and data bases, Madrid, Spain, February 15–17, 2006, pp 372–377
124. Kumar V, Chhabra JK, Kumar D (2014) Automatic cluster evolution using gravitational search algorithm and its application on image segmentation. Eng Appl Artif Intell 29:93–103
125. Kumar V, Chhabra JK, Kumar D (2016) Automatic data clustering using parameter adaptive harmony search algorithm and its application to image segmentation. J Intell Syst 25(4):595–610
126. Kumar Y, Sahoo G (2014) A review on gravitational search algorithm and its applications to data clustering & classification. Int J Intell Syst Appl 6(6):79
127. Kundu D, Suresh K, Ghosh S, Das S, Abraham A, Badr Y (2009) Automatic clustering using a synergy of genetic algorithm and multi-objective differential evolution. In: International conference on hybrid artificial intelligence systems. Springer, Berlin, Heidelberg, pp 177–186
128. Kuo RJ, Zulvia FE (2018) Automatic clustering using an improved artificial bee colony optimization for customer segmentation. Knowl Inf Syst 57(2):331–357
129. Kuo RJ, Huang YD, Lin CC, Wu YH, Zulvia FE (2014) Automatic kernel clustering with bee colony optimization algorithm. Inf Sci 283:107–122
130. Kuo RJ, Syu YJ, Chen ZY, Tien FC (2012) Integration of particle swarm optimization and genetic algorithm for dynamic clustering. Inf Sci 195:124–140
131. Kuo R, Zulvia F (2013) Automatic clustering using an improved particle swarm optimization. J Ind Intell Inf 1(1):46–51
132. Kuźelewska U (2014) Clustering algorithms in hybrid recommender system on MovieLens data. Stud Log Gramm Rhetor 37(50):125–139
133. Lago-Fernández LF, Corbacho F (2010) Normality-based validation for crisp clustering. Pattern Recognit 43(3):782–795
134. Landers JR, Duperrouzel B (2018) Machine learning approaches to competing in fantasy leagues for the NFL. IEEE Trans Games 11(2):159–172
135. Lashkari M, Moattar MH (2015) The improved K-means clustering algorithm using the proposed extended PSO algorithm. In: 2015 international congress on technology, communication and knowledge (ICTCK). IEEE, pp 429–434
136. Lee WP, Chen SW (2010) Automatic clustering with differential evolution using cluster number oscillation method. In: 2010 2nd international workshop on intelligent systems and applications. IEEE, pp 1–4
137. Legány C, Juhász S, Babos A (2006) Cluster validity measurement techniques. In: Proceedings of the 5th WSEAS international conference on artificial intelligence, knowledge engineering and data bases, Madrid, Spain, February 15–17, 2006, pp 388–393
138. Ling HL, Wu JS, Zhou Y, Zheng WS (2016) How many clusters? A robust PSO-based local density model. Neurocomputing 207:264–275
139. Liu R, Wang X, Li Y, Zhang X (2012) Multi-objective invasive weed optimization algorithm for clustering. In: 2012 IEEE congress on evolutionary computation. IEEE, pp 1–8
140. Liu R, Zhu B, Bian R, Ma Y, Jiao L (2015) Dynamic local search based immune automatic clustering algorithm and its applications. Appl Soft Comput 27:250–268
141. Liu T, Rosenberg C, Rowley HA (2007) Clustering billions of images with large scale nearest neighbor search. In: 2007 IEEE workshop on applications of computer vision (WACV'07). IEEE, p 28
142. Liu X, Fu H (2010) An effective clustering algorithm with ant colony. JCP 5(4):598–605
143. Liu Y, Wu X, Shen Y (2011) Automatic clustering using genetic algorithms. Appl Math Comput 218(4):1267–1279
144. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1, no 14, pp 281–297
145. Majhi SK, Biswal S (2018) Optimal cluster analysis using hybrid K-means and ant lion optimizer. Karbala Int J Mod Sci 4(4):347–360
146. Malhat MG, Mousa HM, El-Sisi AB (2014) Clustering of chemical data sets for drug discovery. In: 2014 9th international conference on informatics and systems
147. Marinakis Y, Marinaki M, Matsatsinis N (2009) A hybrid discrete artificial bee colony-GRASP algorithm for clustering. In: 2009 international conference on computers & industrial engineering. IEEE, pp 548–553
148. Masoud H, Jalili S, Hasheminejad SMH (2013) Dynamic clustering using combinatorial particle swarm optimization. Appl Intell 38(3):289–314
149. Maulik U, Saha I (2009) Modified differential evolution based fuzzy clustering for pixel classification in remote sensing imagery. Pattern Recognit 42(9):2135–2149
150. Maulik U, Saha I (2010) Automatic fuzzy clustering using modified differential evolution for image classification. IEEE Trans Geosci Remote Sens 48(9):3503–3510
151. Mehrabian AR, Lucas C (2006) A novel numerical optimization algorithm inspired from weed colonization. Ecol Inform 1(4):355–366
152. Merigó JM, Cobo MJ, Laengle S, Rivas D, Herrera-Viedma E (2019) Twenty years of soft computing: a bibliometric overview. Soft Comput 23(5):1477–1497
153. Milligan GW, Cooper MC (1987) Methodology review: clustering methods. Appl Psychol Meas 11(4):329–354
154. Mohammadpour T, Bidgoli AM, Enayatifar R, Javadi HH (2019) Efficient clustering in collaborative filtering recommender system: hybrid method based on genetic algorithm and gravitational emulation local search algorithm. Genomics 111(6):1902–1912
155. Molina D, Poyatos J, Del Ser J, García S, Hussain A, Herrera F (2020) Comprehensive taxonomies of nature- and bio-inspired optimization: inspiration versus algorithmic behavior, critical analysis and recommendations. arXiv preprint arXiv:2002.08136
156. Mouton JP, Ferreira M, Helberg SJA (2020) A comparison of clustering algorithms for automatic modulation classification. Expert Syst Appl 151:113317
157. Muhuri PK, Shukla AK, Abraham A (2019) Industry 4.0: a bibliometric analysis and detailed overview. Eng Appl Artif Intell 78:218–235
158. Muhuri PK, Shukla AK, Janmaijaya M, Basu A (2018) Applied soft computing: a bibliometric analysis of the publications and citations during (2004–2016). Appl Soft Comput 69:381–392
159. Murty MR, Naik A, Murthy JVR, Reddy PP, Satapathy SC, Parvathi K (2014) Automatic clustering using teaching learning based optimization. Appl Math 5(8):1202
160. Nanda SJ, Panda G (2013) Automatic clustering algorithm based on multi-objective immunized PSO to classify actions of 3D human models. Eng Appl Artif Intell 26(5–6):1429–1441
161. Nayak J, Kanungo DP, Naik B, Behera HS (2016) Evolutionary improved swarm-based hybrid K-means algorithm for cluster analysis. In: Proceedings of the second international conference on computer and communication technologies. Springer, New Delhi, pp 343–352
162. Nayak J, Nanda M, Nayak K, Naik B, Behera HS (2014) An improved firefly fuzzy c-means (FAFCM) algorithm for clustering real world data sets. In: Advanced computing, networking and informatics, vol 1. Springer, Cham, pp 339–348
163. Nayyar A, Puri V (2017) Comprehensive analysis & performance comparison of clustering algorithms for big data. Rev Comput Eng Res 4(2):54–80
164. Nerurkar P, Shirke A, Chandane M, Bhirud S (2018) Empirical analysis of data clustering algorithms. Proc Comput Sci 125:770–779
165. Niknam T, Olamaie J, Amiri B (2008) A hybrid evolutionary algorithm based on ACO and SA for cluster analysis. J Appl Sci 8(15):2695–2702
166. Niu B, Wang H (2012) Bacterial colony optimization. Discrete Dyn Nat Soc 2012:698057. https://doi.org/10.1155/2012/698057
167. Omran MG, Salman A, Engelbrecht AP (2006) Dynamic clustering using particle swarm optimization with application in image segmentation. Pattern Anal Appl 8(4):332
168. Ozturk C, Hancer E, Karaboga D (2015) Dynamic clustering with improved binary artificial bee colony algorithm. Appl Soft Comput 28:69–80
169. Pacheco TM, Gonçalves LB, Ströele V, Soares SSR (2018) An ant colony optimization for automatic data clustering problem. In: 2018 IEEE congress on evolutionary computation (CEC). IEEE, pp 1–8
170. Pal NR, Biswas J (1997) Cluster validation using graph theoretic concepts. Pattern Recognit 30(6):847–857
171. Paterlini S, Krink T (2006) Differential evolution and particle swarm optimisation in partitional clustering. Comput Stat Data Anal 50(5):1220–1247
172. Pelleg D (2000) Extending K-means with efficient estimation of the number of clusters. In: Proceedings of the 17th international conference on machine learning (ICML), pp 277–281
173. Peng H, Wang J, Shi P, Riscos-Núñez A, Pérez-Jiménez MJ (2015) An automatic clustering algorithm inspired by membrane computing. Pattern Recognit Lett 68:34–40
174. Raftery A (1986) A note on Bayes factors for log-linear contingency table models with vague prior information. J R Stat Soc 48(2):249–250
175. Rahman MA, Islam MZ (2014) A hybrid clustering technique combining a novel genetic algorithm with K-means. Knowl-Based Syst 71:345–365
176. Rajah V, Ezugwu AE (2020) Hybrid symbiotic organism search algorithms for automatic data clustering. In: 2020 conference on information communications technology and society (ICTAS). IEEE, pp 1–9
177. Rajpurohit J, Sharma TK, Abraham A, Vaishali A (2017) Glossary of metaheuristic algorithms. Int J Comput Inf Syst Ind Manag Appl 9:181–205
178. Ramadas M, Abraham A (2019) Metaheuristics for data clustering and image segmentation. Springer, Berlin
179. Rana S, Jasola S, Kumar R (2010) A hybrid sequential approach for data clustering using K-means and particle swarm optimization algorithm. Int J Eng Sci Technol 2(6)
180. Rana S, Jasola S, Kumar R (2013) A boundary restricted adaptive particle swarm optimization for data clustering. Int J Mach Learn Cybern 4(4):391–400
181. Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850
182. Rani Y, Rohil H (2013) A study of hierarchical clustering algorithm. Int J Inf Comput Technol 3(10):1115–1122
183. Rao RV, Savsani VJ, Vakharia DP (2011) Teaching–learning-based optimization: a novel method for constrained mechanical design optimization problems. Comput Aided Des 43(3):303–315
184. Raposo C, Antunes CH, Barreto JP (2014) Automatic clustering using a genetic algorithm with new solution encoding and operators. In: International conference on computational science and its applications. Springer, Cham, pp 92–103
185. Razmjooy N, Khalilpour M, Ramezani M (2016) A new metaheuristic optimization algorithm inspired by FIFA world cup competitions: theory and its application in PID designing for AVR system. J Control Autom Electr Syst 27(4):419–440
186. Rendon LE, Garcia R, Abundez I, Gutierrez C et al (2002) Niva: a robust cluster validity. In: 2nd WSEAS international conference on scientific computation and soft computing, Crete, Greece, pp 209–213
187. Rijsbergen CJ van (1979) Information retrieval. Butterworths, London
188. Rokach L (2005) Clustering methods. In: Data mining and knowledge discovery handbook. Springer, Berlin, pp 331–352
189. Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65
190. Sabau AS (2012) Survey of clustering based financial fraud detection research. Inform Econ 16(1):110
191. Saemi B, Hosseinabadi AA, Kardgar M, Balas VE (2018) Nature inspired partitioning clustering algorithms: a review and analysis. In: Advances in intelligent systems and computing, pp 96–116
192. Saha I, Maulik U, Bandyopadhyay S (2009) A new differential evolution based fuzzy clustering for automatic cluster evolution. In: 2009 IEEE international advance computing conference. IEEE, pp 706–711
193. Sahoo AJ, Kumar Y (2014) Modified teacher learning based optimization method for data clustering. In: Advances in signal processing and intelligent recognition systems. Springer, Cham, pp 429–437
194. Saitta S, Raphael B, Smith I (2007) A bounded index for cluster validity. In: Perner P (ed) Machine learning and data mining in pattern recognition. Lecture notes in computer science, vol 4571. Springer, Berlin, pp 174–187
195. Salcedo-Sanz S, Carro-Calvo L, Portilla-Figueras A, Cuadra L, Camacho D (2013) Fuzzy clustering with grouping genetic algorithms. In: International conference on intelligent data engineering and automated learning. Springer, Berlin, Heidelberg, pp 334–341
196. Sarsoh JT, Hashim KM, Miften FS (2009) Comparisons between automatic and non-automatic clustering algorithms. J Coll Educ Pure Sci 4(1):221–227
197. Satapathy SC, Naik A (2011) Data clustering based on teaching-learning-based optimization. In: International conference on swarm, evolutionary, and memetic computing. Springer, Berlin, Heidelberg, pp 148–156
198. Sathappan S, Sridhar S, Tomar DC (2017) A literature study on traditional clustering algorithms for uncertain data. J Adv Math Comput Sci 21:1–21
199. Saxena A, Prasad M, Gupta A, Bharill N, Patel OP, Tiwari A et al (2017) A review of clustering techniques and developments. Neurocomputing 267:664–681
200. Senthilnath J, Das V, Omkar SN, Mani V (2013) Clustering using levy flight cuckoo search. In: Proceedings of seventh international conference on bio-inspired computing: theories and applications (BIC-TA 2012). Springer, India, pp 65–75
201. Senthilnath J, Omkar SN, Mani V (2011) Clustering using firefly algorithm: performance study. Swarm Evol Comput 1(3):164–171
202. Sharma M, Chhabra JK (2019) Sustainable automatic data clustering using hybrid PSO algorithm with mutation. Sustain Comput Inform Syst 23:144–157
203. Sharma SC (1996) Applied multivariate techniques. Wiley, New York
204. Shehab M, Khader AT, Al-Betar MA (2017) A survey on applications and variants of the cuckoo search algorithm. Appl Soft Comput 61:1041–1059
205. Shirkhorshidi AS, Aghabozorgi S, Wah TY, Herawan T (2014) Big data clustering: a review. Lecture notes in computer science. Springer, Cham, pp 707–720
206. Shukla AK, Banshal SK, Seth T, Basu A, John R, Muhuri PK (2020) A bibliometric overview of the field of type-2 fuzzy sets and systems [discussion forum]. IEEE Comput Intell Mag 15(1):89–98
207. Shukla AK, Sharma R, Muhuri PK (2018) A review of the scopes and challenges of the modern real-time operating systems. Int J Embed Real-Time Commun Syst (IJERTCS) 9(1):66–82
208. Shukla N, Merigó JM, Lammers T, Miranda L (2020) Half a century of computer methods and programs in biomedicine: a bibliometric analysis from 1970 to 2017. Comput Methods Programs Biomed 183:105075
209. Silva Filho TM, Pimentel BA, Souza RM, Oliveira AL (2015) Hybrid methods for fuzzy clustering based on fuzzy c-means and improved particle swarm optimization. Expert Syst Appl 42(17–18):6315–6328
210. Singh J, Kumar R, Mishra AK (2015) Clustering algorithms for wireless sensor networks: a review. In: 2015 2nd international conference on computing for sustainable global development (INDIACom)
211. Storn R, Price K (1997) Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359
212. Strehl A, Ghosh J (2000) Clustering guidance and quality evaluation using relationship-based visualization. In: Intelligent engineering systems through artificial neural networks, St. Louis, Missouri, USA, pp 483–488
213. Sundararajan S, Karthikeyan S (2014) An efficient hybrid approach for data clustering using dynamic K-means algorithm and firefly algorithm. J Eng Appl Sci 9(8):1348–1353
214. Suresh K, Kundu D, Ghosh S, Das S, Abraham A (2009) Automatic clustering with multi-objective differential evolution algorithms. In: 2009 IEEE congress on evolutionary computation. IEEE, pp 2590–2597
215. Taghva K, Sharma M (2007) Comparison of automatic clustering and manual categorization of documents. In: Akhgar B (ed) ICCS
216. Tan PN, Steinbach M, Kumar V (2013) Data mining cluster analysis: basic concepts and algorithms. In: Introduction to data mining, pp 487–533
217. Tang WH, Wu QH (2011) Evolutionary computation. In: Tang WH, Wu QH (eds) Condition monitoring and assessment of power transformers using computational intelligence. Springer, London, pp 15–36
218. Theodoridis S, Koutroubas K (1999) Pattern recognition. Academic Press, Cambridge
219. Thomas MC, Romagnoli J (2016) Extracting knowledge from historical databases for process monitoring using feature extraction and data clustering. In: Proceedings of the 26th European symposium on computer aided process engineering—ESCAPE, vol 26, pp 861–864
220. Tran DC, Wu Z, Wang Z, Deng C (2015) A novel hybrid data clustering algorithm based on artificial bee colony algorithm and k-means. Chin J Electron 24(4):694–701
221. Tsai CW, Huang KW, Yang CS, Chiang MC (2015) A fast particle swarm optimization for clustering. Soft Comput 19(2):321–338
222. Tsay RS (2005) Analysis of financial time series. Wiley, New York
223. Tseng LY, Yang SB (2001) A genetic approach to the automatic clustering problem. Pattern Recognit 34(2):415–424
224. Van der Merwe DW, Engelbrecht AP (2003) Data clustering using particle swarm optimization. In: The 2003 congress on evolutionary computation, 2003. CEC'03, vol 1. IEEE, pp 215–220
225. Van Eck NJ, Waltman L (2010) Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 84(2):523–538
226. Von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
227. Vo-Van T, Nguyen-Hai A, Tat-Hong MV, Nguyen-Trang T (2020) A new clustering algorithm and its application in assessing the quality of underground water. Sci Program 2020:6458576. https://doi.org/10.1155/2020/6458576
228. Wang R, Zhou Y, Qiao S, Huang K (2016) Flower pollination algorithm with bee pollinator for cluster analysis. Inf Process Lett 116(1):1–14
229. Wang S, Wu Y (2010) Clustering analysis based on chaos genetic algorithm. In: 2010 Chinese control and decision conference. IEEE, pp 16–19
230. Wei G, Liu H, Xie M (2009) Clustering large spatial data with local-density and its application. Inf Technol J 8(4):476–485
231. Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Philip SY, Zhou ZH (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14(1):1–37
232. Xu D, Tian Y (2015) A comprehensive survey of clustering algorithms. Ann Data Sci 2(2):165–193
233. Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678
234. Yang XS (2010) Firefly algorithm, nature inspired metaheuristic 242. Zhao XQ, Zhou JH (2015) Improved kernel possibilistic fuzzy
algorithms, 2010. LUniversityer Press, Frome clustering algorithm based on invasive weed optimization.
235. Younsi R, Wang W (2004) A new artificial immune system J Shanghai Jiaotong Univ (Sci) 20(2):164–170
algorithm for clustering. In: International conference on intel- 243. Zhong Y, Zhang S, Zhang L (2013) Automatic fuzzy clustering
ligent data engineering and automated learning. Springer, Ber- based on adaptive multi-objective differential evolution for
lin, Heidelberg, pp 58–64 remote sensing imagery. IEEE J Sel Top Appl Earth Obs
236. Yu D, Xu Z, Kao Y, Lin CT (2017) The structure and citation Remote Sens 6(5):2290–2301
landscape of IEEE Transactions on Fuzzy Systems (1994–2015). 244. Zhou Y, Wu H, Luo Q, Abdel-Baset M (2018) Automatic data
IEEE Trans Fuzzy Syst 26(2):430–442 clustering using nature-inspired symbiotic organism search
237. Yu JY, Chong PHJ (2005) A survey of clustering schemes for algorithm. Knowl-Based Syst 163:546–557
mobile ad hoc networks. IEEE Commun Surv Tutor 7(1):32–48 245. Zhou Y, Wu H, Luo Q, Abdel-Baset M (2019) Automatic data
238. Žalik KR (2008) An efficient k0 -means clustering algorithm. clustering using nature-inspired symbiotic organism search
Pattern Recognit Lett 29(9):1385–1391 algorithm. Knowl-Based Syst 163:546–557
239. Žalik KR, Žalik B (2011) Validity index for clusters of different 246. Zou F, Chen D, Xu Q (2019) A survey of teaching–learning-
sizes and densities. Pattern Recognit Lett 32(2):221–234 based optimization. Neurocomputing 335:366–383
240. Zanjireh MM, Shahrabi A, Larijani H (2013) ANCH: a new
clustering algorithm for wireless sensor networks Publisher’s Note Springer Nature remains neutral with regard to
241. Zhao M, Tang H, Guo J, Sun Y (2014) Data clustering using jurisdictional claims in published maps and institutional affiliations.
particle swarm optimization. In: Future information technology.
Springer, Berlin, Heidelberg, pp 607–612