A Comprehensive Survey of Clustering Algorithms: State-of-the-Art Machine Learning Applications, Taxonomy, Challenges (2022)
Survey paper
1. Introduction

Clustering (an aspect of data mining) is considered an active method of grouping data into many collections or clusters according to the similarities of data point features and characteristics (Jain, 2010; Abualigah, 2019). Over the past years, dozens of data clustering techniques have been proposed and implemented to solve data clustering problems (Zhou et al., 2019; Abualigah et al., 2018a,b). In general, clustering analysis techniques can be divided into two main groups: hierarchical and partitional (Tan, 2018). Although methods in these two groups have proved to be very effective and efficient, they generally depend on prior knowledge or information about the exact number of clusters for each dataset to be clustered and analyzed (Chang et al., 2010). More so, when dealing with real-world datasets, it is normal not to expect or have any prior information regarding the number of naturally occurring groups in the data objects (Liu et al., 2011). Therefore, the concept of automatic data clustering algorithms is introduced to address this limitation. Automatic clustering algorithms refer to any clustering techniques used to automatically determine the number of clusters in a dataset.
taxonomy of existing clustering algorithms, debating each algorithm's various measures of similarity and evaluation criteria. Nagpal (2013) carried out a comparative analysis of the different clustering algorithms on both mixed and categorical datasets, with the observation that no clustering algorithm can be adjudged the best for handling large datasets of either the mixed or the categorical kind. Oyelade et al. (2016) examined various clustering algorithms and their suitability for gene expression data, to discover and provide helpful knowledge that will guarantee stability and a high degree of accuracy in the area. Jain (2010) summarized well-known clustering methods with a discussion of critical issues and challenges in the design of clustering algorithms. Jain et al. (1999) discussed emerging techniques for non-numeric constraints and large sets of patterns. Ezugwu et al. (2020a) presented an in-depth and systematic review of nature-inspired metaheuristic algorithms used for automatic clustering analysis, focusing on the metaheuristic algorithms that have been employed to solve clustering problems over the last three decades.

Evidently, there has been considerable growth of interdisciplinary interest in the application of clustering analysis to different research domains, indicating that, without a doubt, much has been achieved regarding clustering, with new emerging research directions in automatic clustering algorithms (Ezugwu et al., 2020a). However, despite the decades of reported research on clustering methods and algorithms, the existing literature is remarkably segmented. Moreover, applied researchers find it challenging to acquire systematic information on research progress and advancement on the subject (Ezugwu, 2020a). Therefore, there is a need for a comprehensive systematic survey of the literature on both the traditional and the recently proposed clustering techniques that have been applied in different fields. Hence, the following main research question for this study has been formulated:

''What are the various state-of-the-art clustering methods and algorithms discussed in the literature, and in what research domains have they been applied?''

Towards answering the main research question, the following sub-research questions are formulated:

(a) What are the various traditional and recently proposed clustering techniques and algorithms in existence today?
(b) What research has been conducted using both the traditional and recently proposed clustering techniques to address identified challenges of clustering?
(c) In what domains have both the traditional and recently proposed clustering techniques been applied in solving clustering problems?
(d) How have various similarity measures been employed in traditional and recently proposed clustering techniques?
(e) What are the characteristic differences between the traditional and recently proposed clustering techniques that have been applied in different fields?
(f) What other challenges of clustering problems are yet to be explored by researchers in this research area?

This survey aims to provide an up-to-date comprehensive review of the different clustering techniques applied to many data mining-related fields. We also highlight novel and recent practical application areas of clustering. The survey is intended to provide a convenient research path for new researchers, furnishing them with a comprehensive study of the various data clustering techniques and of the research progression in clustering techniques over the years. It will also help experts develop new algorithms for emerging challenges in the research area. The main contributions of this survey study are as follows:

• Provides an up-to-date comprehensive systematic review of the traditional and recently proposed clustering techniques that have been applied in different fields.
• Provides a concise presentation of the concepts, architecture, and taxonomy of clustering algorithms.
• Presents a discussion of recent open research issues relating to clustering problems.
• Defines possible future research trends and directions regarding the implementation and application of clustering algorithms in different research domains.

2. Methodology

This section presents the procedure used in selecting and reviewing the various clustering methods considered in this survey. In this comprehensive review process and methodology, the standard approach for systematic literature review was adopted to ensure that the topic of interest is sufficiently covered and to reduce bias in the review work. The literature review procedure proposed by Weidt and Silva (2016) was used in this paper; moreover, the work of Thilakaratne et al. (2019) served as a guide. The search techniques, search keywords, databases, data sources, and the inclusion and exclusion criteria used in this survey are explained below.

2.1. Keywords

In order to obtain the relevant literature, different keywords were selected according to the defined goal of this survey. Initially, several keywords were formulated but later streamlined to reflect the research objective. The keywords used in the extraction of articles include: ''Clustering'', ''non-supervised classification'', ''Clustering Algorithms'', ''Clustering Methods'', ''Evolutionary Clustering Algorithms'', ''Nature Inspired Clustering Algorithms'', ''Data Mining Algorithms'', ''Clustering Application Areas'', and ''Clustering in Data Mining''. Each of the various clustering methods from the taxonomy was paired with the word ''clustering'' to search for articles solely on that technique, for example, ''mode seeking clustering'' and ''subspace clustering''. The various application areas discussed were also paired with the term ''clustering'' to search for articles that reported clustering activities in the field and the clustering algorithms employed.

2.2. Searching the articles

Two different rounds of searches were performed. The first round was carried out from the 11th to the 28th of November 2020, while the second round was conducted between the 7th and the 18th of December 2020. More searches were conducted from late December 2020 through mid-January 2021. During the search, more related papers were extracted from the citations of the selected relevant articles.

2.3. Academic databases

The formulated keywords were used in the retrieval of the literature. Reputable peer-reviewed journals, conference proceedings, and edited books indexed in different academic databases were targeted in the study. Table 1 below contains the list of the academic databases targeted during the search. No specific interval period was stated at the beginning of the investigation; this enables the proper capturing and in-depth study of the existing traditional clustering methods, which served as a bedrock for the more recent ones. However, in selecting previous related reviews and clustering applications, the selected articles were streamlined to those published from 1999 to date, in order to capture updated works on unsupervised classification techniques for the different domains.
Table 2
The inclusion and exclusion criteria used.

Inclusion criteria:
• The review focused on the various clustering methods and algorithms.
• Articles that reviewed specific clustering methods and their variants.
• Articles on the application of a specific method in a specific domain.
• Only articles written in the English language were considered.
• Only published articles from reputable peer-reviewed journals, conference proceedings, and edited books were considered.

Exclusion criteria:
• Articles on general data mining techniques were not considered.
• Articles focusing on the review of the performance of a specific method in a specific domain were excluded.
• Articles that focused on the comparison of various methods in a specific domain were not considered.
• Articles written in other languages were excluded from the review.
• Keynote speeches and PowerPoint presentation slides were not considered.
Table 3
Summary of the previous related reviews on clustering algorithms.
Reference Publication date Remark
Jain et al. (1999) 1999 Areas covered include pattern representation, similarity computation, grouping process, and cluster representations.
Other areas include statistical, fuzzy, neural, evolutionary, and knowledge-based approaches for clustering.
Application areas covered include image segmentation, object recognition, document retrieval and data mining.
Murtagh (1983) 1983 The study presented an in-depth survey of agglomerative hierarchical clustering algorithms and discussed efficient
implementations in R and other software environments. Similarly, a review of grid-based clustering focusing on
hierarchical density-based approaches was also presented.
Belkin et al. (2006) 2006 Covered a few classical and evolutionary clustering algorithms, with their associated challenges and the identified
practical application areas
José-García and 2016 A total of 65 automatic clustering approaches were reviewed based on single-solution, single-objective, and
Gómez-Flores (2016) multiobjective metaheuristics, whose usage percentages are 3%, 69%, and 28%, respectively.
Saxena et al. (2017) 2017 Presented a comprehensive study on existing clustering methods and developments made at various times. The
similarity and the evaluation criteria, which are the central components of clustering, are also presented in the paper.
Bandaru et al. (2017) 2017 The paper surveyed the different data mining methods that can be applied to extract knowledge about multi-objective
optimization problems from the solutions generated during optimization.
Yang and Wang (2018) 2018 This paper serves as an introductory text and survey for multi-view clustering. Simultaneously, the study summarizes
many multi-view clustering algorithms and provides a taxonomy according to the mechanisms and principles involved.
Hancer et al. (2020) 2020 The paper introduces a comprehensive survey on feature selection approaches for clustering, reflecting on some
advantages/disadvantages of current approaches from different perspectives and identifying promising trends for future
research.
Ezugwu et al. (2020a) 2020 Presented a systematic taxonomical overview and bibliometric analysis of trends and progress in nature-inspired
metaheuristic clustering approaches from the early attempts in the 1990s until today’s novel solutions
Bhattacharjee and Mitra 2021 The survey presents a comprehensive study of various density-based clustering algorithms (DBCLAs) over the last two
(2021) decades, along with their classification.
This study 2021 Presents a comprehensive survey, taxonomy, and analysis of the state-of-the-art clustering algorithms identified in the
literature, from the earliest attempts to today's novel developments, together with their practical application areas.
cluster to any member of the other cluster. The complete-linkage algorithm's clusters are more compact and tightly bound than those of single-linkage clustering (Jain and Dubes, 1988).

In calculating the inter-cluster distances, the three proximity measures mentioned earlier consider all the points of a pair of clusters; they are regarded as graph methods (Xu and Wunsch, 2005). Sibson (1973) implemented the single-linkage hierarchical clustering algorithm to produce SLINK; Voorhees (1986) and Defays (1977) implemented Voorhees' method and CLINK, which are implementations of the average-link and complete-link clustering algorithms, respectively. Using a central point for determining the proximity measure, other geometric methods were developed based on the same idea. These include the median linkage, centroid linkage, and minimum variance linkage metrics (Berkhin, 2001; Murtagh, 1983; Day and Edelsbrunner, 1984). A distance-based proximity measure captures the inter-cluster closeness, while the similarity measures capture the intra-cluster connectivity.
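To make the linkage metrics concrete, the short sketch below (Python, assuming NumPy and SciPy are available; the toy data are illustrative, not drawn from any cited study) builds single-, complete-, and average-linkage hierarchies over the same points and cuts each into two clusters:

    # Minimal sketch of linkage-based agglomerative clustering (illustrative).
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    # Two loose groups of 2-D points.
    data = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(5, 0.5, (10, 2))])

    for method in ("single", "complete", "average"):
        # linkage() repeatedly merges the two closest clusters; 'method'
        # selects how the inter-cluster distance (linkage metric) is measured.
        tree = linkage(data, method=method)
        labels = fcluster(tree, t=2, criterion="maxclust")  # cut into 2 clusters
        print(method, labels)

On well-separated data the three metrics agree; on elongated or noisy data, single linkage tends to chain, while complete linkage yields the more compact clusters described above.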
Hierarchical clustering algorithms can easily handle any similarity measure and a flexible level of granularity (Berkhin, 2006; Punit, 2018). As a result, they apply to any attribute type. However, hierarchical clustering methods are plagued by irreversible split or merge processes, such that already formed clusters cannot be revisited to reassign wrongly assigned objects. The application of hierarchical clustering to large-scale data sets is also limited by its high computational complexity: according to Saxena et al. (2017), most hierarchical clustering algorithms have a computational complexity of at least O(N²), and Berkhin (2006) reported that hierarchical clustering algorithms based on linkage metrics suffer from high time complexity. Apart from these, there is also the problem of vague termination criteria (Berkhin, 2006) and a lack of robustness due to sensitivity to noise and outliers. It has also been reported that linkage metrics-based hierarchical clustering algorithms that use the Euclidean distance tend to form spherical shapes (Berkhin, 2006). Fraley (1998) observed that ''A drawback of agglomerative methods is that those that are practical in terms of time efficiency require memory usage proportional to the square of the number of groups in the initial partition''.

4.1.1. Agglomerative hierarchical clustering
In the agglomerative method, clusters are built up from single objects, which are iteratively merged into larger clusters that form the various levels of the hierarchy, until all objects are included in a single cluster or a stopping criterion is met. The single cluster forms the root of the hierarchy. During a merging operation, the two closest objects (based on the similarity measure used) are combined to form a cluster. At most n iterations are required to complete the clustering operation, since two clusters are merged in each iteration.

Ward's clustering method implements an agglomerative clustering algorithm that is not based on a linkage metric; it was introduced by Ward in 1963 (Ward, 1963). It is based on the K-Means objective function, with the merger decision dependent on its effect on the objective function.
Table 4
Summary of major survey literature on clustering algorithms.
Clustering methods Study covered Application area Author and year Impact as of 2021
Clustering algorithms with Survey of clustering algorithms for data sets Benchmark data sets, such as the Xu and Wunsch (2005) 6369
design concepts that are appearing in statistics, computer science, traveling salesman problem and
based on the following: and machine learning. Several tightly bioinformatics
graph theory, related topics, such as proximity measure
combinatorial search and cluster validation, were also covered in
techniques, fuzzy set the paper
theory, neural networks,
and kernel techniques
FCM, BIRCH, DENCLUE, The survey provided a comprehensive study Big data clustering covering Adil et al. (2014) 785
OptiGrid, and EM of the clustering algorithms proposed in the
literature, namely Fuzzy C-Means (FCM),
BIRCH algorithm, DENCLUE algorithm,
Optimal Grid (OptiGrid), and
Expectation-Maximization (EM)
Traditional and modern The survey covered at least 19 categories of Generic application Xu and Tian (2015) 680
clustering algorithms the commonly used clustering algorithms,
with high practical relevance and
well-studied in the literature.
Hierarchical and General discussion and presentation of Document collection Rasmussen (1992) 677
nonhierarchical methods clustering algorithms specifically targeted at
information retrieval applications
Text clustering algorithms Provided a detailed survey of the problem Text clustering relative to social Aggarwal and Zhai 653
including Hierarchical of text clustering. The key challenges of the network and linked data (2012)
methods clustering problem, as it applies to the text
domain were discussed as well.
BIRCH, CluStream, Presented a survey of data stream clustering Network intrusion detection, sensor Silva et al. (2013) 463
ClusTree, D-Stream, algorithms, providing a thorough discussion networks, and stock market analysis
DenStream, DGClust, of the main design components of
ODAC, Scalable K-Means, state-of-the-art algorithms.
Single-pass K-Means,
Stream, Stream LSearch,
StreamKM++,
SWClustering
Evolutionary algorithms, Provided an up-to-date review of all major Character recognition, traveling Nanda and Panda 374
physical algorithms, swarm nature-inspired metaheuristic algorithms salesman problem, blind channel (2014)
intelligence, bio-inspired that were employed for partitional equalizer design, human action
algorithms, and other clustering classification, book clustering, texture
nature inspired algorithms segmentation, tourism market
segmentation, analysis of gene
expression patterns,
electrocardiogram processing, the
security assessment in power
systems, manufacturing cell design,
clustering of sensor nodes,
identification of clusters for accurate
analysis of seismic catalogs.
Fuzzy c-means, The paper provided surveys and summaries Business and socio-economics, Liao (2005) 2524
Agglomerative hierarchical, of most previous works investigating the engineering, science, medicine, art
K-Means, fuzzy c-means, clustering of time series data in various and entertainment
K-Medoids-based genetic application domains.
clustering, Neural network
clustering performed by a
batch EM version of
minimal free energy vector
quantization
Evolutionary algorithms This paper provides an up-to-date overview Image processing, computer security, Hruschka et al. (2009) 731
of evolutionary algorithms for clustering, and bioinformatics
including advanced topics such as
multiobjective and ensemble-based
evolutionary clustering. Similarly, the study
also provides a taxonomy that highlights
some important aspects in the context of
evolutionary data clustering, namely, fixed
or variable number of clusters,
cluster-oriented or non-oriented operators,
context-sensitive or context-insensitive
operators, guided or unguided operators,
binary, integer, or real encodings,
centroid-based, medoid-based, label-based,
tree-based, or graph-based representations,
among others.
Furthermore, it is considered a general agglomerative hierarchical clustering procedure. The criterion for choosing the pair of clusters to merge at each step is based on the optimal value of an objective function. This clustering method is most appropriate for quantitative variables and not binary variables. Gowda and Krishna (1978) developed a non-parametric hierarchical, agglomerative clustering algorithm based on the use of a conventional nearest neighbor to determine the mutual neighborhood value (MNV) and mutual nearest neighbors (MNN) of a sample point. Their simple, non-deterministic and non-iterative algorithm requires low storage and can discern both non-spherical and spherical clusters. More so, their method was reported to have the ability to discern mutually homogeneous clusters and to apply to a wide class of data of arbitrary shape, large size, and high dimensionality.

Clustering Using REpresentatives (CURE) implements an agglomerative hierarchical clustering algorithm for large databases. It is more robust to outliers and can identify clusters of different sizes and shapes, though with lower cluster quality than BIRCH. It has a time complexity of O(N² log N), and its performance is good on 2-dimensional data sets. CURE achieves scalability by using data sampling and partitioning; clusters of fine granularity are first constructed in partitions. Clusters are represented by a fixed number of points scattered around them, and the distance between two clusters is obtained by finding the minimum distance between the two clusters' representative points. The use of scattered representative points enables CURE to identify clusters of diverse sizes and shapes. The scattered representative points are shrunk towards the cluster's geometric centroid as the clustering progresses, based on a user-specified factor. The choice of the input parameters for CURE (the shrink factor, the number of representative points, the sample size, and the number of partitions) affects the clustering output. CURE was developed to work with datasets with numerical attributes (Berkhin, 2001).

4.1.2. Divisive hierarchical clustering
Divisive hierarchical clustering is the reverse of the agglomerative clustering process: it effectively divides every cluster into smaller chunks, beginning with every object in a single cluster, until the required
number of clusters is attained. In contrast to the agglomerative clustering method, the divisive approach uses a top-down method whereby the data objects are initially considered one fused cluster, which is progressively divided until the required cluster number is acquired (Boley, 1998; Savaresi et al., 2002; Chavent et al., 2007). The standard method of splitting a cluster into two subsets that each contain one or more elements requires the consideration of every likely bipartition. Though it is natural to analyze all the likely bipartitions, whereby each cluster can be split into two sub-clusters, it is evident that the full enumeration process offers a global optimum but is very expensive in terms of computation cost. Various divisive clustering approaches that do not consider all bipartitions have therefore been investigated. For instance, Karypis and Kumar (2000) proposed a bisecting K-Means divisive clustering method to attain more accurate results than the traditional K-Means or agglomerative methods. In another study, Zhong et al. (2008) investigated a novel clustering method called ''reference-point-based dissimilarity measure'' (DIVFRP), combining it with the divisive clustering method for the purpose of dataset partitioning. An improved particle swarm optimizer (IDPSO) was proposed by Feng et al. (2010) to determine the closest optimal partition hyperplane for splitting designated clusters into two smaller chunks; the divisive hierarchical approach that uses this splitting method is both practical and efficient. Macnaughton-Smith et al. (1964) and Kaufman and Rousseeuw (2009) used an average dissimilarity between an object and a set of objects to investigate the iterative divisive procedure. However, a different approach that uses a dissimilarity matrix as an input relies on optimization criteria that include the partition or bipartition (Guénoche et al., 1991; Wang et al., 1996).
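As a concrete illustration of a divisive strategy that avoids enumerating all bipartitions, the sketch below (Python/NumPy; the helper names and the 'split the largest cluster' rule are illustrative assumptions in the spirit of bisecting K-Means, not the exact procedure of Karypis and Kumar, 2000) repeatedly bisects one cluster with a 2-means step:

    # Sketch of bisecting (divisive) clustering; illustrative, not the
    # exact bisecting K-Means of Karypis and Kumar (2000).
    import numpy as np

    def two_means(points, iters=20, seed=0):
        # One 2-means bisection; assumes the cluster holds at least 2 points.
        rng = np.random.default_rng(seed)
        centers = points[rng.choice(len(points), 2, replace=False)]
        for _ in range(iters):
            d = np.linalg.norm(points[:, None] - centers[None], axis=2)
            assign = d.argmin(axis=1)
            for j in (0, 1):
                if (assign == j).any():
                    centers[j] = points[assign == j].mean(axis=0)
        return assign

    def bisecting_clustering(points, k):
        clusters = [np.arange(len(points))]      # start: one fused cluster
        while len(clusters) < k:                 # top-down splitting until k
            big = max(range(len(clusters)), key=lambda i: len(clusters[i]))
            idx = clusters.pop(big)
            assign = two_means(points[idx])
            clusters += [idx[assign == 0], idx[assign == 1]]
        return clusters

Only k − 1 bisections are performed, instead of scoring every possible bipartition of every cluster.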
Divisive clustering can be classified into two primary methods: monothetic and polythetic. A divisive cluster is monothetic if a logical characteristic involving one variable is essential and adequate for cluster membership (Sneath and Sokal, 1973). Monothetic divisive clusters are acquired by employing a single variable in each split, partitioning the objects with a specific value of that variable from those without it. Monothetic clustering is usually a variant of the ''association analysis method'' (Williams and Lambert, 1959) and is proposed for binary data. Various studies have applied monothetic clusters for problem-solving. For instance, Kim (2009) and Brito and Chavent (2012) employed a monothetic clustering approach on interval and histogram data. Similarly, Kim and Billard (2012) utilized the monothetic clustering method on multi-modal data. The monothetic approach decreases the number of computations required to identify an optimum bipartition, such that only p(n − 1) bipartitions need to be tested to determine the optimum bipartition instead of every 2^(n−1) − 1 likely bipartitions. The larger the number of objects, the more the number of possible bipartitions is decreased by the monothetic method. Besides, monothetic methods offer binary questions that facilitate the interpretation of clustering structures. On the other hand, the polythetic divisive clustering approach utilizes all variables concurrently via dissimilarity or distance values. It does not rely on a single variable order but depends entirely on distance values, and the distance values reflect the dissimilarity of all variables concurrently (Kim and Billard, 2011).

Most divisive clustering methods are polythetic, such that all variables are considered concurrently when assessing the similarity of two instances (Kim et al., 2017). However, a large number of variables may lead to a scalability issue; in that case, monothetic clustering, which works on a single feature per cycle, is the better option. Wang et al. (2014) proposed a modern divisive clustering algorithm termed 'Hierarchical grid clustering using data field' (HGCUDF). In this approach, hierarchical grids divide and conquer large datasets through a hierarchy of subsets, and the clustering regions limit the search scope, minimizing the data space for producing data fields. HGCUDF exhibits rapid execution and stability, which improves the clustering results on large automated datasets.

Fig. 2. A dendrogram representation for hierarchical clustering of data objects 1, 2, 3, 4, 5, 6, 7.

In another study, Naim et al. (2014) investigated a model-based clustering technique for high-dimensional datasets. Its operation is divided into three stages: multi-modal splitting, iterative weighted sampling, and uni-modality preserving merging, which together scale the model-based clustering approach to large high-dimensional datasets. This clustering method solves the problem of small datasets and scales effectively to large datasets when evaluated with synthetic datasets, compared with conventional methods. It is helpful in immune response tasks and can handle tremendously rare populations.

Most of the clustering algorithms in the literature have focused on binary data. Recently, the clustering of categorical data has attracted many researchers seeking a solution to the categorical data-clustering problem, and many different divisive hierarchical clustering algorithms have been proposed to combat it. For instance, Herawan et al. (2010) suggested 'Maximum Dependency of Attributes' (MDA) for divisive hierarchical clustering attribute selection. The maximum dependency of attributes is built on attribute dependency in rough set theory, which measures the dependency among the dataset's attributes. Mazlack et al. (2000) investigated a bi-clustering method for choosing two-valued attributes by considering multi-valued attributes and a Total Roughness (TR) approach; they maintained that attributes with high total roughness attain optimum performance and are suitable for cluster splitting. In another study, Parmar et al. (2007) developed a Min–Min-Roughness (MMR) metric to resolve the uncertainty in the categorical data clustering process. However, MMR is TR's reverse and does not yield clustering algorithms with comparative improvement in complexity or accuracy (Herawan and Deris, 2009; Herawan et al., 2010). Xiong et al. (2009) investigated a divisive method for categorical data based on ''Multiple Correspondence Analysis'' (MCA). Similarly, Qin et al. (2014) implemented information theory-based divisive clustering for categorical data by employing the Mean Gain Ratio (MGR) to choose clustering attributes and selecting class equivalents on the cluster attribute using cluster entropy. Although divisive clustering is appealing in terms of computational time, the quality of agglomeratively partitioned clusters is better than that of divisive ones.

Naim et al. (2014) proposed the model-based SWIFT (Scalable Weighted Iterative Flow-clustering Technique) for high-dimensional large datasets. The model consists of three stages, namely multimodality splitting, iterative weighted sampling, and unimodality preserving merging, constructed so that model-based clustering scales effectively to large high-dimensional datasets, offering a significant enhancement compared with earlier soft clustering approaches (Lo et al., 2008; Ge and Sealfon, 2012). These three major SWIFT stages are motivated by two main requirements: rare population identification and scalability to large datasets. In SWIFT, multimodality splitting and weighted iterative sampling identify rare populations. This algorithm is mainly meant for Flow Cytometry
(FC) data and finding rare populations. The multimodality stage plays a vital role in identifying rare subpopulations. When evaluated with synthetic datasets, the algorithm handles small datasets and can effectively scale to large datasets compared to conventional methods. SWIFT may also be employed to represent skewed clusters by LDA-based agglomerative merging, which decreases the number of clusters while preserving the separate unimodal populations.

The interaction between the merging and the multimodality splitting across many clusters uses a reasonable heuristic (cluster modality), which is more reasonable than the knee point in entropy plots formerly employed (Lo et al., 2008; Finak et al., 2009). The algorithm is advantageous for immune response tasks and for efficient scaling on large FC datasets. Moreover, the soft clustering method utilized in SWIFT is essential for understanding overlapping clusters, in contrast to hard clustering approaches like K-Means (Murphy, 1985) or spectral clustering (Zare et al., 2010). SWIFT has the power to handle tremendously rare populations. SWIFT is partially similar to flowPeaks (Ge and Sealfon, 2012), since both depend on a unimodality criterion; however, flowPeaks focuses on the significant peaks without modality splitting and tends to miss tiny overlapping clusters. Consequently, one of the limitations of SWIFT is that it is restricted to a specific clustering task (Naim et al., 2014).

Fig. 3. A partition with 𝑛 = 154 and 𝑘 = 4.
4.1.3. Some implementations for improving hierarchical clustering
The traditional algorithms have been enhanced over time to overcome the deficiencies of hierarchical clustering. One improvement that takes the hierarchical clustering limitation on large datasets into consideration is the Balanced Iterative Reducing and Clustering Using Hierarchies (BIRCH) clustering algorithm (Zhang et al., 1996). BIRCH employs the idea of Clustering Features (CF), presented as a triple CF = (n, LS, SS) containing the total number n of objects in the cluster, the linear sum LS of the attribute values of the cluster objects, and the sum of squares SS of the attribute values of the cluster objects. The CF triple is a data structure that summarizes the information maintained about a cluster (Oyelade et al., 2016). CF triples are kept in a tree form, and only the tuples are kept in the main memory.
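The appeal of the CF triple is that two sub-clusters can be merged by simple componentwise addition of their triples, and summary statistics such as the centroid and radius can be computed from (n, LS, SS) alone. A minimal sketch of that bookkeeping (Python/NumPy; the class and method names are illustrative, not BIRCH's actual code):

    # Sketch of a BIRCH-style Clustering Feature CF = (n, LS, SS).
    import numpy as np

    class CF:
        def __init__(self, point):
            self.n = 1                                # objects in the cluster
            self.ls = np.asarray(point, dtype=float)  # linear sum of attributes
            self.ss = float((self.ls ** 2).sum())     # sum of squared attributes

        def merge(self, other):
            # Merging two sub-clusters is just adding their CF triples.
            self.n += other.n
            self.ls += other.ls
            self.ss += other.ss

        def centroid(self):
            return self.ls / self.n

        def radius(self):
            # Root-mean-square distance of members from the centroid,
            # recoverable from (n, LS, SS) without storing the points.
            c = self.centroid()
            return float(np.sqrt(max(self.ss / self.n - (c ** 2).sum(), 0.0)))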
BIRCH is reported to be robust to outliers and can achieve O(N) computational complexity (Nagpal, 2013). The BIRCH algorithm consists of four phases, with phases 2 and 4 being optional (Oyelade et al., 2016). The scanning of the entire dataset and the construction of the CF tree are handled in phase 1. The clustering information stored in the CF tree is arranged to reflect all the information in the dataset as adequately as possible while still accommodating the limitation imposed by the memory space, with crowded data points grouped as fine sub-clusters. During the CF tree formation, outliers are treated as sparse data points and are removed from the dataset. The generation of the CF tree in phase 1 ensures that no other input–output operation is required in the subsequent phases, thus reducing the computation time for the remaining steps. The clustering activity is also reduced to the smaller sub-datasets of each sub-cluster in the leaf entries of the CF tree, because these are generated through incremental updating of the CF (Oyelade et al., 2016). The order of leaf entries in the initial tree construction produces better data locality, enhancing the clustering output. The BIRCH algorithm is credited with the ability to handle outliers and large datasets, and with a good clustering output that is not affected by the order of the input data. The efficiency of BIRCH is, however, dependent on proper parameter settings. It is also biased against non-spherical clusters, because the diameter/radius is used to control the cluster boundary. Evaluation of BIRCH using both synthetic and real datasets showed that it returns good results in computational time complexity, robustness, and cluster quality.

COBWEB is another implementation of the hierarchical clustering algorithm, for categorical data, characterized by two major qualities: incremental learning and the ability of clusters to be modeled or described intrinsically instead of being regarded as a collection of points. By processing one data point at a time, COBWEB dynamically builds the hierarchy of clusters instead of following the merging or splitting approach of the agglomerative or divisive methods.

4.2. Partitional clustering algorithm

In a partitional clustering algorithm, data is organized into a single partition of groups, rather than a nested sequence, without any hierarchical structure (Jain and Dubes, 1988; Jain, 2010). Jain et al. (1999) stated that the partitioning method is suitable for handling clustering problems in applications involving large data sets for which the construction of a dendrogram is computationally prohibitive. Its operation is based on generating data clusters so as to recover the natural groupings inherent in the dataset. Fig. 3 illustrates the clustering pattern representation of the partitional clustering method.

The dataset of n objects is iteratively partitioned into a predetermined number k of distinct subsets through the optimization of a criterion function (Ahmad and Dey, 2007). The squared-error criterion is the criterion function on which the most commonly used partitional clustering algorithms are based. The general objective is to find the partition for which a fixed number of clusters minimizes the square error. In this case, the error represents the deviations of the patterns from the centroids, with the patterns viewed as a collection of k spherically shaped clusters. The target cost function ζ to be minimized is given in Eq. (1):

ζ = ∑_{i=1}^{n} ‖d_i − C_j‖^q    (1)

where C_j is defined as the center of the jth cluster and is the center nearest to the data object d_i, the variable n denotes the number of elements in the data set, and q is an integer that defines the nature of the distance function (q = 2 for the Euclidean distance), as discussed in Ahmad and Dey (2007).

The partitional clustering algorithm starts with an initial dataset partition and iteratively reassigns data points or patterns to clusters to reduce the square error. A set of K seed points well separated from each other can be chosen randomly from the pattern matrix as the initial partition. Nagpal (2013) noted that good seed points are obtained if the selected initial points come from existing data objects and are sufficiently distanced from one another. As the number of clusters increases, the square error tends to decrease, so this minimization is only meaningful for a fixed number of clusters.

In some algorithms, the square-error criterion function in partitional clustering makes the generated K clusters as compact and separated as possible, and it is less computationally demanding than other criterion functions (Jain and Dubes, 1988). Because square-error-based algorithms can converge to local minima, different initial partitions can produce varying clusters as output, especially if the initial points
are not well separated (Nagpal, 2013). According to Jain and Dubes (1988), partitional techniques are frequently used in engineering applications where single partitions are most important and appropriate for the efficient representation and compression of large databases. It has also been observed that partitional algorithms are preferred in pattern recognition due to the nature of the available data (Jain, 2010). The partitional clustering method is a local search technique (Khaled et al., 1995) with local convergence; therefore, the optimal global solution cannot be guaranteed (Sanse and Sharma, 2015a).
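This behavior can be seen in a short sketch (Python/NumPy; the toy data and iteration counts are illustrative) that evaluates the squared-error cost of Eq. (1) with q = 2 after a simple K-Means-style refinement from several random initial partitions; different seeds typically converge to different local minima:

    # Eq. (1) with q = 2, evaluated from several random starts; the data
    # and parameters are illustrative assumptions.
    import numpy as np

    def square_error(data, centers):
        # zeta = sum_i || d_i - C_j ||^2, C_j being the nearest center to d_i.
        d = np.linalg.norm(data[:, None] - centers[None], axis=2)
        return float((d.min(axis=1) ** 2).sum())

    def refine(data, k, rng, iters=25):
        centers = data[rng.choice(len(data), k, replace=False)]
        for _ in range(iters):
            assign = np.linalg.norm(data[:, None] - centers[None], axis=2).argmin(1)
            for j in range(k):
                if (assign == j).any():
                    centers[j] = data[assign == j].mean(axis=0)
        return centers

    rng = np.random.default_rng(1)
    data = np.vstack([rng.normal(m, 0.4, (30, 2)) for m in (0, 3, 6)])
    costs = [square_error(data, refine(data, 3, rng)) for _ in range(5)]
    print(costs, "-> keep the partition with the smallest cost")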
Partition-based clustering is an NP-hard optimization problem, for which the standard approach is to find an approximate solution (Harshada et al., 2015). Jain et al. (1999) stated that ''the combinatorial search of the set of possible labelings for an optimum value of a criterion function is computationally prohibitive''. As a result, the typical partitional clustering algorithm is run several times with varying starting partitions, and the run that gives the best clustering output is chosen as the optimal solution (Jain et al., 1999).

Fig. 4. Initial dataset represented as an undirected graph.
One major disadvantage of partitional clustering algorithms is the need for a predefined user value for the parameter k, whose choice is usually non-deterministic (Suganya et al., 2018; Jain et al., 1999). An arbitrary choice of cluster centroids leads to wrong clustering output (Oyelade et al., 2016). Clustering algorithms based on the partitional methods usually generate clusters of approximately similar sizes, because data objects are permanently assigned to the nearest centroid, which invariably results in incorrect cut borders between clusters (Harshada et al., 2015). The partitional clustering method has also been noted for its bias towards spherically shaped clusters and for its inability to handle highly connected clusters and high-dimensional datasets (Oyelade et al., 2016).

Partitional clustering algorithms can be categorized based on the various techniques adopted in generating the clusters and the nature of the resultant clusters produced. These categories include Hard/Crisp clustering, Fuzzy clustering, and Mixture-Resolving clustering.
Fig. 5. Resulting clustered subgraphs with cuttings at points a, b, c, d.
4.2.1. Hard/Crisp clustering
Each data object belongs to only one cluster in a hard or crisp clustering algorithm. The clustering methods under this category include graph-theoretic clustering, density-based clustering, model-based clustering, subspace clustering, and miscellaneous clustering.

4.2.1.1 Graph-theoretic clustering. A graph is a data structure made up of nodes and edges connecting the nodes. A graph can model relationships between features of data objects and list important, relevant features during data analysis. In graph-theoretic clustering, clusters are represented as graphs (Saxena et al., 2017). The data objects are represented as nodes, which are connected by edges. The edges reflect the proximities between pairs of data points (Xu and Wunsch, 2005). Nodes are divided into clusters so that the edge density across clusters is small compared to the edge density within clusters (Saxena et al., 2017). Edges whose length (weight) is substantially larger than the average of the nearby edges are termed inconsistent. Nodes are grouped into clusters based on the graph topology, so that the output clusters are characterized by high intra-connectivity/homogeneity within clusters and low inter-connectivity among the generated clusters. Representing clusters as graphs is convenient, but it is not robust in handling outliers.

Graph theory can be used to represent both hierarchical and non-hierarchical clusters. Graph methods that deal directly with connectivity graphs can be used for linkage metrics-based hierarchical clustering when the connectivity N × N matrix is sparse (Jain and Dubes, 1988; Berkhin, 2001). This clustering method uses some topological properties of graphs to build clusters from a network of data objects. Finding the maximally connected subgraphs in a graph structure is the same as the problem of single-linkage hierarchical clustering. In like manner, finding the maximally complete subgraphs in a graph structure is equivalent to complete-linkage hierarchical clustering (Jain and Dubes, 1988). The k-nearest-neighbor graph model was used to develop Chameleon (an agglomerative hierarchical clustering algorithm) (Xu and Wunsch, 2005). Using the k-nearest-neighbor graph approach, Chameleon constructs a sparse graph in which each data object represents a vertex, with edges existing between pairs of vertices. The weight of each edge indicates the similarity (distance) between the corresponding vertices. The k-nearest-neighbor graph is partitioned into several relatively small sub-clusters using a graph partitioning algorithm, in such a way as to minimize the weight of the edges to be cut. The clustering eliminates edges whose vertices are not within the k closest points of each other and uses an agglomerative hierarchical clustering algorithm to merge similar sub-clusters.

Another graph representation for hierarchical clustering is the Delaunay triangulation graph (DTG), which uses a hypergraph in which more than two vertices are connected by an edge, creating the hypergraph structure (Cherng and Lo, 2001). Zahn's clustering algorithm (Zahn, 1971) is an example of graph theory applied to non-hierarchical clustering. Uneven edges in minimum spanning trees are detected and discarded in a bid to obtain connected components as clusters (Jain and Dubes, 1988). However, pre-knowledge of the cluster shape is needed to select the proper heuristic for identifying irregular edges. Cluster Identification via Connectivity Kernels (CLICK) is another example of a graph theory-based clustering algorithm. In CLICK, a minimum weight division is performed on the graph to generate clusters (Dongkuan, 2015; Sharan and Shamir, 2000). Specifying suitable parameters and criterion properties leads to some practical difficulties that must be addressed (Jain and Dubes, 1988). According to Jain and Dubes (1988), ''no theory exists for choosing among the various properties of graphs to select the best clustering method for a particular application''. Figs. 4 and 5 show examples of graph-theoretic clustering approaches.
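A compact sketch of the Zahn-style approach (Python, assuming SciPy is available; the inconsistency rule used here is a simple global threshold on edge length, an assumption rather than Zahn's full neighborhood heuristic): build the minimum spanning tree, delete unusually long edges, and report the connected components as clusters, much as in Figs. 4 and 5.

    # MST-based (Zahn-style) graph-theoretic clustering sketch.
    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
    from scipy.spatial.distance import pdist, squareform

    rng = np.random.default_rng(2)
    data = np.vstack([rng.normal(0, 0.3, (15, 2)), rng.normal(4, 0.3, (15, 2))])

    dist = squareform(pdist(data))               # complete graph on the points
    mst = minimum_spanning_tree(dist).toarray()  # keep only the spanning tree
    cutoff = 2.0 * mst[mst > 0].mean()           # 'inconsistent' edge threshold
    mst[mst > cutoff] = 0                        # discard the uneven edges
    n_clusters, labels = connected_components(mst, directed=False)
    print(n_clusters, labels)                    # components are the clusters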
4.2.1.2 Subspace clustering. Subspace clustering is an extension of traditional clustering whose primary aim is to find clusters in the different subspaces in which a dataset exists. It is often better to use the subspaces in which a dataset exists for its description instead of describing a large-dimensional dataset as a whole (Parsons et al., 2004). This way, the subspace clustering technique helps discover hidden knowledge in such sizeable-dimensional datasets. Clusters existing in multiple overlapping subspaces are easily identifiable using subspace clustering. In subspace clustering, redundant and irrelevant dimensions are removed using feature selection, leaving only the relevant dimensions that the clustering algorithm uses to find the clusters in the dataset.

The subspace clustering algorithms are categorized into two subsections according to their search strategies: the top-down and bottom-up approaches. The bottom-up subspace method uses an APRIORI-style approach to leverage the downward closure property of density to reduce the search space. The idea of the downward closure property of density is that if dense units exist in k dimensions, then dense units also exist in their (k−1)-dimensional projections. Based on this, the bottom-up method creates a histogram for each dimension and selects the dimensions whose density is above a given threshold. Examples of bottom-up subspace clustering include CLIQUE (Agrawal et al., 1998), ENCLUS (Cheng et al., 1999), MAFIA (Goil et al., 1999), CBF (Chang and Jin, 2002), CLTree (Cheng et al., 1999), and DOC (Procopiuc et al., 2002).

In top-down subspace clustering, an initial approximation of the clusters in the whole feature space, with equally weighted dimensions, is first found. In the next step, a weight is assigned to each dimension in each cluster, using a sampling technique to improve the algorithm's performance. Clusters formed using this approach form partitions of the given dataset, with each instance of the data objects belonging to exactly one cluster. There is a need to specify the number of clusters and the size of the subspaces ahead of time, which is a bottleneck for the approach, and parameter tuning must be performed to achieve a meaningful result. Dealing with outliers in the dataset is another challenge of this approach. PROCLUS (Aggarwal et al., 1999), ORCLUS (Aggarwal et al., 2000), FINDIT (Woo and Lee, 2002), COSA (Friedman and Meulman), and δ-Clusters (Yang et al., 2002) are examples of clustering methods that use this subspace clustering approach.
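The downward-closure pruning at the heart of the bottom-up strategy can be sketched in a few lines (Python/NumPy; a CLIQUE-flavoured illustration with an assumed grid resolution and density threshold, not the full algorithm): a 2-D grid unit is kept only if it is dense and both of its 1-D projections are dense.

    # Bottom-up subspace sketch: dense 1-D units first, then 2-D units kept
    # only with dense 1-D projections. Grid size and threshold are assumed.
    import numpy as np
    from collections import Counter
    from itertools import combinations

    rng = np.random.default_rng(3)
    data = rng.random((200, 4))
    data[:100, :2] *= 0.2                   # a dense region in dimensions 0, 1

    bins, tau = 5, 15                       # grid resolution, density threshold
    cells = np.floor(data * bins).clip(0, bins - 1).astype(int)

    dense1d = {}
    for dim in range(data.shape[1]):
        units, counts = np.unique(cells[:, dim], return_counts=True)
        dense1d[dim] = set(units[counts >= tau])

    for d1, d2 in combinations(range(data.shape[1]), 2):
        pairs = Counter(zip(cells[:, d1], cells[:, d2]))
        dense2d = {p for p, c in pairs.items()
                   if c >= tau and p[0] in dense1d[d1] and p[1] in dense1d[d2]}
        if dense2d:
            print(f"dense 2-D units in subspace ({d1}, {d2}):", sorted(dense2d))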
4.2.1.3 Density-based clustering. In density-based clustering, dense regions in the pattern space separated by regions of low pattern density are viewed as clusters. The high-density regions, called modes, are associated with a cluster center, while the objects in the sparse areas separating the clusters are considered noise and outliers (Harshada et al., 2015). The data points are then added to the cluster with the closest center. A histogram is constructed by dividing the pattern space into non-overlapping regions to identify the modes of the pattern space. The regions with high-frequency counts form the potential modes, and the valleys of the histogram structure form the boundaries between the clusters. The major concern in using a histogram to measure the density function is that the pattern space must be large enough to identify the sections (Jain and Dubes, 1988). Furthermore, clusters that are small in size are usually very noisy because they cannot be adequately defined, while enormous clusters cannot properly define the cluster properties because of the varied properties of the member patterns. It is also difficult to locate the precise values for the peaks and valleys in the histogram.

Several works proposing the general concept for mode identification have been reported in the literature (Jain and Dubes, 1988). This clustering method has been used extensively in engineering, mostly in remote sensing applications (Wharton, 1983). In some other cases, clusters are formed based on the density of data points within a region. Data points are added to the cluster until the neighborhood's density falls below a given threshold; in this case, a cluster in the neighborhood of a given radius must contain a minimum number of objects with respect to the specified threshold. Generating clusters this way enables the building of clusters with arbitrary shapes, and outliers or noisy data points are naturally eliminated. Examples include Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Ordering Points To Identify the Clustering Structure (OPTICS), and DENsity-based CLUstEring (DENCLUE). DBSCAN has a well-defined cluster model with fairly low complexity (Harshada et al., 2015). OPTICS solved DBSCAN's problem of choosing an appropriate value for the range parameter, producing a hierarchical output similar to linkage clustering (Harshada et al., 2015).

The use of a spatial index in finding a data point's neighborhood has been reported to improve the complexity of the model from O(n²) to O(n log n) compared with other methods (Nagpal, 2013). The density-based clustering method is reported to be resistant to outliers and insensitive to data object ordering, to form arbitrary-shape clusters, and to need no pre-stated number of clusters (Sanse and Sharma, 2015b). However, these methods are not ideal for large data sets due to dimensionality. Treating the low-density areas as noise makes the algorithms based on this clustering method unable to detect the intrinsic cluster structure common in real-life data. There is also the problem of cluster border detection, because a drop in data-point density is needed to show the demarcation between clusters (Harshada et al., 2015). Fig. 6 presents the clustering pattern of the density-based clustering method.

4.2.1.4 Model-based clustering. In model-based clustering, data are assumed to be generated by an underlying probability distribution or model (Fraley and Raftery, 1998). Each component of the distribution represents a different cluster. The principle is to recover the model and use it, or the probability functions, to determine the data points that satisfy the generated model when building clusters of similar data points. Model-based clustering seeks to optimize the fit between the predefined model and the given data. Since clusters are generated using the given data points, the total number of clusters present can be generated automatically, so outliers are easily identified. In model-based clustering, a mixture model is used to represent the data, and the components of the model correspond to the different clusters.

Fraley and Raftery (1998) reported two ways of formulating models for the composition of clusters: the classification likelihood approach and the mixture likelihood approach. Model parameters can be found using the Maximum Likelihood Estimation (MLE) criterion (Sanse and Sharma, 2015b) as well as the Bayesian Information Criterion (BIC) (Fraley and Raftery, 1998; Dasgupta and Raftery, 1998; Mukerjee et al., 1998). The BIC can also determine, between two clusters, the one closest to which a data point will be assigned (Campbell et al., 1997). Data clustering uses two major approaches under this method: the statistical approach and the neural network approach (Sanse and Sharma, 2015b). Examples of the model-based clustering method include EM (Expectation-Maximization) (Fraley and Raftery, 1998; Dempster et al., 1977; Mclachlan and Krishnan, 1997), COBWEB, and SOM. The Expectation-Maximization algorithm for maximum likelihood can determine the partition. A parametric mixture distribution for a random vector A can be written as

f(ϑ) = ∑_{b=1}^{B} π_b f_b(a|θ_b)    (2)

where the π_b > 0, with ∑_{b=1}^{B} π_b = 1, are regarded as the mixing proportions. f_b(a|θ_b) is the bth component density, with the parameter vector of the distribution represented as ϑ = (π, θ_1, …, θ_B) and π = (π_1, …, π_B). f(ϑ) is called the B-component finite mixture density. The densities f_1(θ_1), f_2(θ_2), …, f_B(θ_B) represent the distribution components, that is, the clusters of the parametric mixture distribution, and the distribution components are of the same type for all b. Shekar et al. (1987) proposed a knowledge-based clustering scheme by introducing the notion of conceptual cohesiveness as precise enough to be adopted for the semantic grouping of related objects based on a cohesion forest knowledge structure. The authors presented a set of axioms that should be satisfied to give meaning to the generated clusters.
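For Gaussian components, Eq. (2) can be fitted with a few EM iterations. The sketch below (Python/NumPy; one-dimensional components, with illustrative initialization and iteration count) estimates the mixing proportions π_b and the component parameters θ_b = (μ_b, σ_b), after which each point is softly assigned to the component with the largest responsibility:

    # EM sketch for a B-component 1-D Gaussian mixture, i.e., Eq. (2) with
    # Gaussian f_b. Initialization and iteration count are assumptions.
    import numpy as np

    rng = np.random.default_rng(4)
    a = np.concatenate([rng.normal(0, 1, 200), rng.normal(6, 1, 100)])

    B = 2
    pi = np.full(B, 1.0 / B)                  # mixing proportions, sum to 1
    mu = np.quantile(a, [0.25, 0.75])         # crude initial component means
    sigma = np.ones(B)

    def gauss(x, m, s):
        return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

    for _ in range(50):
        # E-step: responsibility r[i, b] of component b for point i.
        r = pi * gauss(a[:, None], mu, sigma)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate pi_b, mu_b, sigma_b from the responsibilities.
        nb = r.sum(axis=0)
        pi = nb / len(a)
        mu = (r * a[:, None]).sum(axis=0) / nb
        sigma = np.sqrt((r * (a[:, None] - mu) ** 2).sum(axis=0) / nb)

    print(pi, mu, sigma)          # hard labels, if needed: r.argmax(axis=1)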
4.2.1.5 Search-based clustering Search-based clustering algorithms are nature-inspired metaheuristic approaches, also termed automatic data clustering algorithms. They spontaneously determine the structure and number of clusters in a dataset without prior information on the dataset's attribute values (Aliniya and Mirroshandel, 2019). They emerged as a solution to the need to supply traditional clustering algorithms with a priori information (Agarwal, 2011) on the number of clusters to be generated (Ezugwu, 2020a). The need to provide this vital information usually imposes additional computational burdens or requirements on the relevant traditional clustering algorithms (Ezugwu, 2020a). Determining the best estimate of the number of clusters is a fundamental problem in cluster analysis, tagged the 'automatic clustering problem' (José-García and Gómez-Flores, 2016). The problem becomes more pronounced in real-world data clustering analysis characterized by high-density and high-dimensionality datasets. The lack of prior domain knowledge makes it difficult to choose an appropriate number of clusters, especially in high-dimensional datasets whose clusters vary widely in shape, size, and density and sometimes overlap. Determining the optimal number of clusters for such datasets is a profoundly difficult task, so pre-identifying the number of clusters for a data clustering algorithm is not easy.

Automatic clustering techniques, which do not have this requirement, are therefore a better option for real-world datasets with high density and dimensionality. Automatic clustering algorithms produce the same results as traditional clustering techniques without being supplied any background information about the datasets (Aliniya and Mirroshandel, 2019; Jain and Dubes, 1988; Agrawal et al., 2005; Ezugwu, 2020a). They have also been found appropriate for the automatic identification and classification of unlabeled data points in real-world datasets, which is evidently difficult, and almost impossible, to do manually. Automatic clustering algorithms have a higher chance of obtaining globally optimal solutions, unlike traditional clustering algorithms, which are mostly local search algorithms whose solutions are influenced by the initial starting points and which cannot guarantee global optimality except for linear and convex optimization problems (Ezugwu, 2020a). Besides this, nature-inspired clustering algorithms have demonstrated more flexibility in handling clustering problems in various fields than traditional clustering algorithms, which are mostly problem-specific and lack continuity (Agarwal et al., 2011). Since the main aim of a clustering algorithm is to generate clusters exhibiting reduced intra-cluster distance and increased inter-cluster distance, automatic clustering algorithms treat clustering problems as optimization problems, minimizing the dissimilarity within a cluster and maximizing the dissimilarity between clusters (Ezugwu, 2020a; Kuo et al., 2014).

As an optimization problem, finding an optimal solution for a clustering problem is classified as NP-hard when the number of clusters is more than three, that is, k > 3 (Falkenauer, 1998). Thus, clustering tasks for even moderately sized problems can be computationally prohibitive. This makes metaheuristic approaches suitable for finding solutions to data clustering problems (Kuo et al., 2014), and metaheuristic search algorithms have become the most applied techniques for implementing automatic clustering algorithms (José-García and Gómez-Flores, 2016). Nature-inspired metaheuristic algorithms are practically designed to handle high-dimensional and complex real-world problems (Ezugwu et al., 2020b). Moreover, their strong heuristic search capability lets them seek the most promising (optimal) solution while balancing intensification and diversification in the search. Also, while searching for optimal solutions, they ensure that the generated solutions avoid unpromising regions of the search space. These nature-inspired metaheuristic algorithms have solved a wide range of continuous and discrete combinatorial optimization problems, particularly the GA, DE, PSO, FA, and IWO (Ezugwu, 2020b). Automatic clustering algorithms are superior in performance to traditional clustering algorithms in terms of convergence speed and their ability to produce good-quality solutions.
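To make this optimization view concrete, the sketch below scores a candidate, centroid-encoded solution by the within-cluster dissimilarity that such metaheuristics minimize. It is a generic illustration rather than any specific surveyed method; the data, the encoding, and k are illustrative assumptions:

```python
import numpy as np

def clustering_fitness(centroids_flat, X, k):
    """Objective when clustering is cast as optimization: total
    within-cluster (intra-cluster) dissimilarity, to be minimized."""
    centroids = centroids_flat.reshape(k, X.shape[1])
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.min(axis=1).sum()  # each point contributes its nearest-centroid distance

# Any metaheuristic (GA, DE, PSO, ...) can evolve candidate centroid vectors
# by repeatedly evaluating this function.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
candidate = rng.normal(size=3 * 2)  # k = 3 centroids in 2-D, centroid-encoded
print(clustering_fitness(candidate, X, k=3))
```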
Some of the nature-inspired algorithms that have been deployed as search-based clustering algorithms include the Genetic Algorithm (GA) (Jain and Dubes, 1988; He and Tan, 2012; Doval et al., 1999), Differential Evolution (DE) (Paterlini and Krink, 2006; Suresh et al., 2009), the Artificial Bee Colony Optimization Algorithm (ABC) (Kuo et al., 2014; Su et al., 2012), Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO) (Izakian et al., 2016; Das and Roy, 2008), Invasive Weed Optimization (IWO) (Chowdhury et al., 2011), Symbiotic Organisms Search (SOS), the Bacterial Evolutionary Algorithm (BEA) (Das et al., 2009), Variable Neighborhood Search (VNS), the Firefly Algorithm (FA) (Senthilnath et al., 2011), and the Tabu Search (TS) Algorithm.

Metaheuristics-based clustering algorithms can be classified into Evolutionary and Swarm Intelligence metaheuristic algorithms. The GA and DE come under the Evolutionary group, while the rest fall under the Swarm Intelligence group. These two broad classes of algorithms share common design steps: randomly initializing a population, then identifying suitable candidate individuals representing the choice solutions (Ezugwu et al., 2020a). This is achieved by evaluating the candidate members of the initial generation. The choice solutions are then used to generate a new population by modifying individuals with solution-specific variation operators. The second and third steps are repeated iteratively, and an update is made concerning which candidate individual is best fitted in terms of the defined objective function of the problem. The best candidate is chosen by comparing the current generation's solution with the previous generation's solution, with precedence given to the current best solution. The subsequent sections review research applying various nature-inspired algorithms to clustering problems.
(a.) Evolutionary algorithm
i. Genetic Algorithm-Based Clustering Techniques
The genetic algorithm is a single-objective evolutionary computation algorithm that has been used for automatic clustering. Holland developed the algorithm in the early 1970s (Holland, 1975); its idea stemmed from Charles Darwin's principle of evolution by natural selection. In genetic algorithms, some fundamental genetic ideas are borrowed and used artificially to construct robust search algorithms with minimal problem information (Sheikh et al., 2008). The search is performed in large, complex multimodal landscapes, providing a near-optimal solution for the search problem's stated objective or fitness function. In GA-based clustering techniques, the capability of the GA is applied to evolve the proper number of clusters and to provide an appropriate clustering (Sheikh et al., 2008).

The search space parameters are represented as strings called chromosomes. A combination of cluster centroids encodes each chromosome, and a collection of chromosomes forms the algorithm's population. At the first generation, a random population representing different search space solutions is created. Each chromosome has an associated objective and fitness function which measures its degree of goodness. Using the principle of survival of the fittest, the best-fit among the existing chromosomes are selected to 'birth' the next generation of chromosomes through the biologically inspired crossover and mutation operators. The selection, crossover, and mutation operations are repeated iteratively for a given number of generations or until a stopping criterion is met (Goldberg, 1989). In GA-based clustering techniques, the selection operators control the search direction, while the crossover and mutation operators generate new regions for the search.

Several research efforts on developing GA-based clustering algorithms have been reported in the literature. Krovi (1992) investigated the potential feasibility of using the GA for clustering. Krishna and Murty (1999) proposed the Genetic K-Means Algorithm (GKA), a novel hybrid GA that finds a globally optimized partition of a given dataset into a specified number of clusters, using the K-Means algorithm to avoid expensive crossover operations: the K-Means operators were used as search operators in place of crossover. GKA searches faster, converges to the global optimum, and minimizes the total within-cluster variation (TWCV) (Krishna and Murty, 1999). The Fast Genetic K-Means Algorithm (FGKA) by Lu et al. (2004a) is another GA-based clustering algorithm inspired by GKA and featuring several improvements over it. Other GA-based clustering algorithms include the Incremental Genetic K-Means Algorithm (IGKA) (Lu et al., 2004b), GA-clustering (Maulik and Bandyopadhyay, 2000), and the Genetically Guided Algorithm (GGA) (Hall et al., 1999). Sheikh et al. (2008), José-García and Gómez-Flores (2016) and Ezugwu (2020a) provide further references on GA-based clustering. Based on the encoding scheme used, José-García and Gómez-Flores (2016) discussed four categories of GA-based automatic clustering: centroid-based encoding of variable length, centroid-based encoding of fixed length, label-based encoding, and binary-based encoding.
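To make the centroid-based encoding concrete, the sketch below evolves fixed-length, centroid-encoded chromosomes with tournament selection, one-point crossover, and Gaussian mutation on synthetic 2-D data. It is a minimal illustration of the scheme described above, not a reimplementation of GKA or any other surveyed algorithm, and all parameter values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, .4, (60, 2)), rng.normal(4, .4, (60, 2))])
K, DIM, POP, GENS = 2, 2, 30, 60

def fitness(chrom):                       # within-cluster distance (minimize)
    c = chrom.reshape(K, DIM)
    return np.linalg.norm(X[:, None] - c[None], axis=2).min(1).sum()

pop = rng.uniform(X.min(), X.max(), (POP, K * DIM))  # centroid-encoded chromosomes
for _ in range(GENS):
    f = np.array([fitness(p) for p in pop])
    # tournament selection: the fitter of two random chromosomes survives
    idx = rng.integers(0, POP, (POP, 2))
    parents = pop[np.where(f[idx[:, 0]] < f[idx[:, 1]], idx[:, 0], idx[:, 1])]
    # one-point crossover between consecutive parents
    cut = rng.integers(1, K * DIM, POP)
    mask = np.arange(K * DIM)[None, :] < cut[:, None]
    children = np.where(mask, parents, np.roll(parents, 1, axis=0))
    # Gaussian mutation applied to roughly 10% of genes
    children += rng.normal(0, .1, children.shape) * (rng.random(children.shape) < .1)
    pop = children

best = pop[np.argmin([fitness(p) for p in pop])].reshape(K, DIM)
print("evolved centroids:\n", best)
```

A production GA would normally add elitism (carrying the best chromosome forward unchanged) so the best-so-far solution is never lost between generations.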
(b.) Swarm Intelligence Algorithm
i. Ant Colony Optimization Clustering Algorithm
The ACO algorithm is a stochastic metaheuristic for combinatorial optimization (Dorigo and Stützle, 2004) classified under Swarm Intelligence (SI). SI is a category of artificial intelligence paradigms inspired by the study of emergent behavior in decentralized, self-organized systems. SI methods aim to imitate such behavior and apply it to finding solutions to hard computational problems; they are credited with simple design, scalability, and robustness. There are other algorithms based on natural swarm behavior, such as bee colonies, flocks of birds, and schools of fish, but the ACO is categorized under the ant-based techniques within SI. Ant-based clustering methods are directly modeled on ants' social behavior (Dorigo and Stützle, 2004) and form the most widely used group of swarm-based clustering algorithms. There are two major approaches to ant-based clustering: those that directly mimic ants' behavioral nature and those that are less directly inspired by nature. The first group considers the gathering of items and the occasional sorting activities observed in the nest and brood care of ants (Deneubourg et al., 1991); this behavior is imitated directly in the clustering of abstract data, where the clustering objective is implicitly defined. The second group, less directly inspired by nature, handles clustering as an optimization task, using ant-based optimization to generate good or near-optimal clusters. The second group has the advantage of an explicitly specified objective function, offering a better understanding and prediction of the clustering performance (Handl and Meyer, 2007). The ACO clustering algorithm falls under this second group. It is inspired by the foraging behavior of mass-recruiting ants, which use pheromones to mark areas of promising forage and potential food sources (Handl and Meyer, 2007; Dorigo and Stützle, 2004; Dorigo et al., 1996). Runkler (2005) and Saatchi and Hung (2005) carried out research on ACO-based clustering algorithms. An ant-based clustering algorithm was also presented by Kanade and Hall (2004), which finds an adequate number of clusters and initializes the fuzzy c-means algorithm. Handl et al. (2006) worked on an adaptive time-dependent transporter ant for clustering.

ii. Particle Swarm Optimization
The PSO is another general-purpose metaheuristic optimization algorithm, inspired by the collective behavior of unsophisticated agents that interact locally with neighboring individuals and their environment to produce more complex, valuable behavior for solving optimization problems (José-García and Gómez-Flores, 2016; Engelbrecht, 2005). Kennedy and Eberhart introduced PSO as a population-based search algorithm in which individual population members are grouped into a swarm. During optimization, the swarm of particles moves cooperatively in the region defined by the objective function, with each particle representing a complete solution. Each particle moves in response to forces that attract it to good positions in the search space previously explored by other swarm members or by itself. According to José-García and Gómez-Flores (2016), ''the particles explore the search space by adjusting their trajectories iteratively according to self-experience and neighboring particles''. At the start of the search, several particles with randomly assigned velocities are placed at random positions in the search space. At every iteration, each particle evaluates the objective function at its position and updates its position, velocity, and the memory of its individual best position (Handl and Knowles, 2007).

Omran first introduced the application of PSO to clustering problems in 2002. The algorithm used a fixed number of clusters, with PSO searching for the optimal centroids of the clusters and each data point assigned to the closest centroid. The work was further extended in the presentation of DCPSO, a dynamic clustering approach based on PSO (Omran et al., 2002; José-García and Gómez-Flores, 2016). Another PSO-based segmentation algorithm for automatically grouping image pixels into different regions was proposed by Das et al. (2006). Other research on PSO-based clustering algorithms can be found in Qu et al. (2010), Ouadfel et al. (2010), Cura (2012), Kuo et al. (2012, 2014), Van der Merwe and Engelbrecht (2003), Cui et al. (2005), Cui and Potok (2005) and Kanungo et al. (2000). PSO-based clustering algorithms have been observed to give excellent clustering quality and to find the correct cluster number. The basic PSO algorithm is characterized by extreme simplicity and is mostly used to optimize functions of continuous variables.
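The canonical position-velocity update just described can be sketched for clustering as follows, with each particle encoding a full set of K centroids; the inertia and acceleration coefficients below are illustrative assumptions rather than tuned values:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, .4, (60, 2)), rng.normal(4, .4, (60, 2))])
K, D, SWARM, ITERS = 2, 2, 20, 80
W, C1, C2 = 0.7, 1.5, 1.5            # inertia and acceleration weights (assumed)

def fitness(p):                       # each particle encodes K centroids
    c = p.reshape(K, D)
    return np.linalg.norm(X[:, None] - c[None], axis=2).min(1).sum()

pos = rng.uniform(X.min(), X.max(), (SWARM, K * D))
vel = rng.normal(0, .1, pos.shape)
pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[np.argmin(pbest_f)]

for _ in range(ITERS):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    # attraction towards each particle's own best and the swarm's best
    vel = W * vel + C1 * r1 * (pbest - pos) + C2 * r2 * (gbest - pos)
    pos += vel
    f = np.array([fitness(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[np.argmin(pbest_f)]

print("best centroids:\n", gbest.reshape(K, D))
```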
a ‘9’ shaped path is created around the prey from a depth of 12 m by the The Variable Neighborhood Search (VNS) is a metaheuristic algo-
humpback whale and start to swim up towards the surface. The prey is rithm that represents a flexible framework for a heuristic building to
captured in the second maneuver in three behavioral patterns called: find approximate solutions to combinatorial and non-linear continuous
the coral loop, lobtail and capture loop (Goldbogen et al., 2013). optimization problems. It was proposed by Mladenovic and Hansen in
Nasiri et al. (2018) proposed the Whale Clustering Optimization 1997 (Mladenovic and Hansen, 1997). It is characterized by systematic
Algorithm based on the humpback whales’ foraging behavior for data exploitation of changes of neighborhood, finding the local minimum in
clustering. The advantages of WOA, among which include, low number the descent phase while escaping from the corresponding valley. The
of parameters and lack of local optimal entrapment, were harnessed in neighborhood structures are systematically changed while searching
the proposed clustering algorithm. The main goal was to use the WOA for an optimal (or near-optimal) solution. According to Brimberg et al.
for a complete search to cluster unlabeled data for better clustering (2017), it is a proven heuristic framework for finding good solutions
results using a simple solution. Soppari and Chandra (2020) used an to combinatorial and global optimization problems. Compared with
optimized clustering approach to develop an effective framework for other metaheuristics, the VNS basic scheme and its variants are simple,
digital watermarking. They combined Least Favorable-based Whale Op- requiring non or few parameters. The reasons for these characteristic
timization Algorithm (LF-WOA) with optimized Fuzzy-Cmeans (FCM) behaviors of VNS are stated by Alguwaizani et al. (2011) as based on
for selecting the initial centroid to identify regions for watermarks the following properties:
insertion in digital watermarking. More recent literature regarding
the WOA-based clustering algorithm includes (Reddy and Babu, 2019; • Relativity of a local optimum to the corresponding neighborhood
Rahnema and Gharehchopogh, 2020; Jadhav and Gomathi, 2018). structure, that is, a local optimum relative to one neighborhood
structure, is not necessarily a local optimum for another neighbo-
iv. Crow search algorithm
The Crow Search Algorithm (CSA) is a population-based metaheuris- rhood structure.
tic optimizer inspired by the intelligent behavior of crows developed • Consideration of global optimality with reference to local optimal
by Askarzadeh (2016). The crows are distributed widely and are con- in terms of all neighborhood structure. ‘A global optimum is
sidered among the world’s most intelligent birds. Crows stores excess a local optimum with respect to all neighborhood structures’
food in certain places, which are later retrieved when needed. They are (Hansen and Mladenovic, 2018).
known to be greedy, following each other to obtain better food sources. • The relative closeness of all or majority of local optimal to one
A crow watches and observes the location where other birds hide their another. In the words of Alguwaizani et al. (2011) ‘Empirical
food to steal it. It also takes precautionary steps of moving its own evidence shows that all or a large majority of the local optima
hiding place to prevent others from stealing its food. If a crow detects are relatively close to each other for many problems.
that another one is following it, the crow goes to another position of
The VNS increasingly uses complex moves to find the local optimal in
the environment to fool the one following it.
all the neighborhood structures. If the local optima found is poor, sev-
Lakshmi et al. (2018) combined CSA with K-means algorithm to
eral neighborhoods are used. An increase in exploitation of the vicinity
improve the performance of K-means algorithm to achieve global op-
of the incumbent solution is also suggested (Hansen and Mladenovic,
timum clustering solution. The CSAK algorithm was used to find the
2018).
optimum solution for the initial centroids for the K-means algorithm.
The basic VNS algorithm, as with other metaheuristic algorithms,
Balavand et al. (2018) also combined K-means with CSA for automatic
clustering based on data envelopment analysis which measures the starts with a set of randomly generated initial solutions. This is followed
efficiency of the decision making units of the algorithm. In this case, by the random generation of a neighbor of the incumbent solution. This
the CSA carry out the clustering processing using the initial cluster phase is called the shaking step. After this phase, the moving step is
centers generated by the K-means algorithm. Wu et al. (2015) proposed then performed where the local optimal and the incumbent solution are
a hybrid clustering algorithm based on WOA and CSA, harnessing compared, and the incumbent solution is updated if the local optimal
the advantages of the two algorithms with respect to their search solution is better. These two phases are repeated until the maximum
strategy. In Anter et al. (2019), the CSA improved the Fast Fuzzy C- neighborhood number’s termination condition is met.
means algorithm for data clustering. The CSA generates the initial The VNS and its variants have been used to solve many clustering
cluster centers for the fuzzy C-means algorithm for a more accurate problems. Alguwaizani et al. (2011) used VNS to solve the harmonic
cluster result. In the proposed algorithm, generating an optimal cluster means clustering problem and reported that VNS compared favorably
center ensures the FFCM avoids getting stuck in the local minimal and with solutions obtained from Tabu Search TS and Simulated Annealing
improves computational performance. SA Heuristics. The capacitated clustering problem was solved using two
VNS based heuristics by Brimberg et al. (2019) and the performance
v. Emperor Penguin Optimizer
accessed on benchmark instances from the literature. According to
The Emperor Penguin Optimizer is a bio-inspired metaheuristic
their report, all VNS procedures outperform the state-of-the-art in the
algorithm introduced by Dhiman and Kumar (2018). It mimics the em-
stated problem. Consoli et al. (2019), Orlov et al. (2018); Mladenovic
peror penguin’s huddling behavior for successful survival in the depth
and Hansen (1997), Hansen (2005), Hansen and Mladenovic (2018),
of the Antarctic winter. The huddling behavior is made up of four major
Hansen et al. (2009), Rozhnov et al. (2019), Hansen and Mladenovic
steps: the huddle boundary generation, computation of the temperature
around the huddle, calculating the distance and finding the effective (2001), Martins (2020), Carrizosa et al. (2013) are other research
mover. Ragaventhiran and Kavithadevi (2020)) adopted CSA as an reports on using VNS and its variants for solving clustering problems.
optimizer in their proposed Frequent Pattern Mining in which Affinity Ros and Guillaume (2019), in related work on neighborhood search,
propagation-based clustering is implemented in one of the five pro- proposed a clustering method referred to as Munec to address the chal-
cesses with the main objective of performing preprocessing to remove lenge of new clustering algorithms in finding an appropriate number of
data redundancy. Furthermore, the authors validate the performance clusters in complex datasets and demonstrate self-tuning capability and
of their implemented method with previous approaches for succeeding adaptiveness for input parameters in identifying acceptable solutions.
metrics that are execution time, response time, load balancing rate, and Their algorithm adopted the nearest neighbor technique in group re-
scalability. The results obtained in the work revealed that the authors’ lated data objects. Nearest neighbor data objects are merged without
proposed map-optimize-reduce mining technique on Hadoop achieved constraints at the beginning until the number of groups attains the
excellent results compared to existing literature results. maximum (at least two groups). Subsequent merging is then based on
mutual neighbor groups with a similar distance between neighbors.
c. Other Algorithms
The experimental tests carried out using two-dimensional datasets
i. Variable Neighborhood Search revealed that Munec proved to match a ground truth target highly
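A compact sketch of this shake-descend-move cycle applied to centroid-based clustering follows; the perturbation scale per neighborhood and the K-Means-style descent are illustrative assumptions, not a surveyed implementation:

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, .4, (60, 2)), rng.normal(4, .4, (60, 2))])
K = 2

def cost(c):
    return np.linalg.norm(X[:, None] - c[None], axis=2).min(1).sum()

def local_search(c):                  # descent phase: K-Means-style updates
    for _ in range(20):
        labels = np.linalg.norm(X[:, None] - c[None], axis=2).argmin(1)
        c = np.array([X[labels == k].mean(0) if np.any(labels == k) else c[k]
                      for k in range(K)])
    return c

incumbent = local_search(X[rng.choice(len(X), K, replace=False)])
k_nbhd, k_max = 1, 5
for _ in range(200):                  # overall iteration budget
    if k_nbhd > k_max:
        break
    # shaking step: random neighbor, larger k_nbhd -> larger perturbation
    shaken = incumbent + rng.normal(0, 0.3 * k_nbhd, incumbent.shape)
    candidate = local_search(shaken)
    if cost(candidate) < cost(incumbent) - 1e-9:   # moving step
        incumbent, k_nbhd = candidate, 1           # accept; restart neighborhoods
    else:
        k_nbhd += 1                                # try the next, larger neighborhood

print("final centroids:\n", incumbent)
```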
The VNS and its variants have been used to solve many clustering problems. Alguwaizani et al. (2011) used VNS to solve the harmonic means clustering problem and reported that VNS compared favorably with the solutions obtained from Tabu Search (TS) and Simulated Annealing (SA) heuristics. The capacitated clustering problem was solved using two VNS-based heuristics by Brimberg et al. (2019), with the performance assessed on benchmark instances from the literature; according to their report, all the VNS procedures outperform the state-of-the-art on the stated problem. Consoli et al. (2019), Orlov et al. (2018), Mladenovic and Hansen (1997), Hansen (2005), Hansen and Mladenovic (2018), Hansen et al. (2009), Rozhnov et al. (2019), Hansen and Mladenovic (2001), Martins (2020) and Carrizosa et al. (2013) are other research reports on using VNS and its variants for solving clustering problems. Ros and Guillaume (2019), in related work on neighborhood search, proposed a clustering method referred to as Munec, addressing the challenge new clustering algorithms face in finding an appropriate number of clusters in complex datasets while demonstrating self-tuning capability and adaptiveness of the input parameters in identifying acceptable solutions. Their algorithm adopts the nearest-neighbor technique to group related data objects. Nearest-neighbor data objects are merged without constraints at the beginning until the number of groups reaches the maximum (at least two groups); subsequent merging is then based on mutually neighboring groups with a similar distance between neighbors. Experimental tests carried out on two-dimensional datasets revealed that Munec matches a ground-truth target highly effectively. More so, under the same input configuration, Munec can identify clusters of various densities and arbitrary shapes amid a large amount of noise (Ros and Guillaume, 2019).

ii. Tabu Search Algorithm
The Tabu Search (TS) Algorithm is a higher-level heuristic procedure for optimization problem-solving, designed to guide other methods in escaping entrapment in local optima (Glover, 1990). It uses a guided local search procedure that avoids local optima by rejecting already-visited points, which are kept on a tabu list, in the search space (Batres, 2012). A thorough information search is enabled in tabu search through a flexible memory structure, which helps the algorithm strategically constrain and free the search process, with memory functions of varying periods for intensification and diversification. According to Glover (1990), ''the form of guidance provided by tabu search is highly flexible and often motivates the creation of new types of moves and evaluation criteria to take advantage of its adaptability to different problem structures and strategic goals''. TS has been credited with finding solutions superior to the best solutions of alternative methods in various problem settings. Implementing the TS algorithm is easy, and it can handle additional considerations, for instance constraints that are not included in the original formulation of the problem. The TS algorithm was applied to subset clustering problems by Glover et al. in 1985, with a solution obtained in less than one minute on a V77 minicomputer (Glover, 1990). As mentioned earlier, three basic themes can be identified in the development of the TS algorithm, with its core embedded in the short-term memory process:

• usage of a flexible attribute-based memory structure;
• an associated control mechanism for employing the memory structure; and
• incorporation of memory functions of different time spans, from short to long term.

In recent times, there have been several reports on using TS algorithms to solve various clustering problems. Cao et al. (2015) presented a TS algorithm for solving cohesive clustering problems in various business applications. They introduced an objective function for generating clusters that are as pure as possible, maximizing the intra-cluster similarity, and employed the intensification and diversification strategies of tabu search to enhance the clustering outcome. Sung and Jin (2000) combined the tabu search algorithm with complementary packing and releasing procedures for solving the clustering problem. Lu et al. (2018) proposed a tabu search-based clustering algorithm and its parallel implementation on Spark; their design was adapted to alleviate the challenges associated with big data applications by taking advantage of parallel processing based on the Spark framework, and they found the system superior to other similar systems in terms of scalability, accuracy, and effectiveness. Other applications of the TS algorithm to clustering problems can be found in Kharrousheh et al. (2011), Xia et al. (2018), Yaghini and Ghazanfari (2020) and Ibrahim et al. (1994).
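As a rough illustration of the tabu-list mechanism in a clustering setting, the deliberately stripped-down sketch below explores single-point reassignment moves while forbidding the reversal of recent moves for a fixed tenure. It omits aspiration criteria and long-term memory and is a caricature of TS rather than any surveyed implementation:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, .5, (30, 2)), rng.normal(4, .5, (30, 2))])
K, ITERS, TENURE = 2, 150, 15

def cost(labels):                      # total distance to cluster means
    return sum(np.linalg.norm(X[labels == k] - X[labels == k].mean(0), axis=1).sum()
               for k in range(K) if np.any(labels == k))

labels = rng.integers(0, K, len(X))
best, best_cost = labels.copy(), cost(labels)
tabu = deque(maxlen=TENURE)            # short-term memory: recently reversed moves

for _ in range(ITERS):
    i = rng.integers(len(X))           # candidate move: reassign point i
    new_k = (labels[i] + 1) % K
    if (i, new_k) in tabu:             # move is tabu: skip it
        continue
    trial = labels.copy()
    trial[i] = new_k
    tabu.append((i, labels[i]))        # forbid undoing this move for TENURE steps
    labels = trial                     # always move; the tabu list steers the search
    c = cost(trial)
    if c < best_cost:                  # record the best solution seen so far
        best, best_cost = trial.copy(), c

print("best cost:", round(best_cost, 2))
```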
4.2.1.6 Square error clustering The square error clustering method is a partitioning clustering method that assigns data points to a specified number of clusters based on the sum-of-square-error criterion function. The squared differences between each data point and the estimated center value of the group into which the data point has been assigned are summed. In cases where the sum of squared errors for a group of data objects equals zero, the cluster's data points are identical (or very close). The formula for the sum of square error is:

Sum of square error = \sum_{i=1}^{n} (x_i - \bar{x})^2    (3)

where n represents the number of data points, x_i represents the ith data point in the group, and \bar{x} is the center object of the group. The k-means clustering algorithm is the best-known squared-error-based clustering algorithm (Xu and Wunsch, 2005).

i. K-Means Clustering
The K-Means clustering algorithm is a centroid-based partitioning technique in which data objects are distributed into a specified number of k clusters. The distribution is done through an objective function that assesses the quality of the partition, ensuring that the similarity of objects within a cluster (intra-cluster similarity) is higher than that of objects in different clusters (inter-cluster similarity). K-Means uses the mean to represent the centroid of a cluster, the centroid being a measure of the cluster's center point. A specified number k of data points/objects are randomly selected from the existing data points as the representative centers of the k clusters. The Euclidean distance between the remaining data points and each assumed center point is then iteratively measured, and each data point is assigned to the cluster at the smallest distance. The intra-cluster similarity is improved each time a new data point is added to a cluster by computing a new mean from the objects previously assigned to that cluster; the new means are then used to reassign the data objects. This procedure is repeated until stability is achieved.

The sum-of-square function over the Euclidean distances produces compact and well-separated clusters, and the K-Means algorithm tries to minimize this sum-of-squared-error criterion (Ezugwu, 2020a; Hartigan and Wong, 1979; MacQueen, 1967). The major problems identified with the K-Means clustering algorithm include defining the initial number of clusters at the algorithm's onset: no efficient and universal method for determining the initial number of clusters and the initial partition has been found. The K-Means algorithm is reported to be very sensitive to the initial centroid selection, such that a suboptimal solution may be produced when the centroids are chosen wrongly (Punit, 2018), and convergence to the global optimum cannot be guaranteed. Using means as centroids limits the K-Means algorithm's application to data objects with numerical variables (Xu and Wunsch, 2005). The K-Means algorithm is also sensitive to outliers: objects that are quite far from the cluster centroid are forced into the cluster, distorting the cluster's shape (Saxena et al., 2017). It works on the assumption that the variance of the distribution of each attribute is spherical and thus produces clusters with roughly equal numbers of observations. Moreover, its memory space requirement is high, and the number of iterations needed to obtain a stable distribution is unknown. Nevertheless, due to its simplicity of implementation and low computational complexity (Jain, 2010), the K-Means algorithm remains popular and widely used today (Ezugwu, 2020a).

Some research extending K-Means has been reported, for example the G-means (Hamerly and Elkan, 2004) and X-means (Pelleg, 2000) algorithms. The sum-of-square function over the Euclidean distances for the K-Means algorithm is given as:

d_{ik} = \sum_{j=1}^{m} (x_{ij} - c_{kj})^2    (4)

where d_{ik} is the (squared Euclidean) distance between the ith data point and the kth centroid, x_{ij} is the jth attribute of the ith data point, and c_{kj} is the jth attribute of the kth cluster centroid.

K-Means is arguably the most popular clustering method, but it is plagued with drawbacks such as poor scalability, sensitivity to initialization and outliers, assumed knowledge of the cluster count, and production of local rather than global optima. It is noteworthy that the most recent extensions and improvements of K-Means seek to advance the state of the art in addressing these issues.
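The assignment-update loop and the criteria of Eqs. (3) and (4) can be sketched from scratch in a few lines; the random initialization and iteration budget below are illustrative assumptions:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]  # random initial centers
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance, Eq. (4)
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # update step: recompute each centroid as the mean of its members
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):   # stability reached
            break
        centroids = new
    sse = d.min(axis=1).sum()             # sum-of-squared-error criterion, Eq. (3)
    return labels, centroids, sse

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, .5, (50, 2)), rng.normal(5, .5, (50, 2))])
labels, centroids, sse = kmeans(X, k=2)
print("SSE:", round(sse, 2))
```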
ii. K-MCI (K-Means Modified Cohort Intelligence) Clustering Algorithm
The K-MCI is an efficient hybrid evolutionary data clustering algorithm that combines the K-Means algorithm with modified cohort intelligence (Krishnasamy et al., 2014). Cohort Intelligence (CI) is an optimization algorithm inspired by the natural and societal tendency of cohort candidates/individuals to learn from one another; it was proposed by Kulkarni et al. (2013). In cohort intelligence, each candidate tries to improve its behavior while observing every other candidate. The MCI is a modified cohort intelligence with improved accuracy and speed of convergence over the traditional CI. In K-MCI, the K-Means algorithm enhances the similarity between the objects. The ELM proposed by Huang et al. (2006) is a learning algorithm that randomly generates hidden nodes for single-hidden-layer feedforward neural networks (SLFNs) and determines the output weights of the SLFNs analytically. The ELM is credited with a meager computational cost for its operations and has been used to solve classification and regression problems.

iv. K-means based multiview clustering methods and K-means subspace clustering models
The generation of high-dimensional data due to the rapid development of social networks has posed a significant challenge to traditional K-means clustering, generally tagged the curse of dimensionality. Redundant features and noise in such data make efficient clustering very difficult. K-means-based multiview clustering methods were developed to provide simple and efficient algorithms for accurately exploring the shared information in multiview data. Zheng et al. (2018) proposed a robust discriminative multiview K-means clustering with feature selection and group sparsity learning. The proposed algorithm addresses the extreme time consumption and sensitivity to outliers that are common when clustering high-dimensional feature spaces; it efficiently handles the curse of dimensionality by using group sparsity constraints to select the most important views and the most relevant features.

In handling the high-dimensional data of real-world applications, using eigenvalue decomposition in existing K-means subspace clustering algorithms to find an approximate solution is less efficient; moreover, their loss functions exhibit sensitivity to outliers or suffer from small loss errors (Wang et al., 2019). A new adaptive multiview subspace clustering method was recently proposed by Yan et al. (2020) for integrating heterogeneous data in a low-dimensional feature space; their work extends K-Means clustering with feature learning capability for handling high-dimensional data. Wang et al. (2019) developed a fast adaptive K-means (FAKM) type subspace clustering model embedded with a mechanism for a flexible cluster indicator using an adaptive loss function. According to Wang et al., the existing methods combining subspace learning with K-means clustering still exhibit some limitations, including no thorough capture of the discriminative information in the low-dimensional subspace, rare consideration of intrinsic geometric information, and the vulnerability to noise of the optimization procedure for a discrete cluster indicator. They proposed a robust dimension reduction for clustering with a local adaptive learning algorithm to address these limitations; the proposed algorithm adaptively explores the discriminative information by unifying K-means clustering with local adaptive subspace learning.

4.2.1.7 Miscellaneous clustering techniques i. Time series Clustering
A time series is a sequence of real numbers collected regularly in time, where each number represents a value. It is the simplest form of temporal data and is naturally characterized by high dimensionality and large data size (Chiş et al., 2009a,b; Antunes and Oliveira, 2001; Warrenliao, 2005; Rani and Sikka, 2012; Lin et al., 2004). Since the features of time-series data change as a function of time, they are classified as dynamic data. Each time series is made up of many data points, but it can also be seen as a single object. Time-series clustering is an aspect of temporal data mining research that provides useful information in various domains (Liao, 2005; Aghabozorgi et al., 2015; Wang et al., 2002; Das et al., 1998).

By clustering time-series data, diverse scientific areas have discovered patterns that data analysts have used to extract valuable information from complex and massive datasets (Aghabozorgi et al., 2015). According to Aghabozorgi et al. (2015) and Chiş et al. (2009a,b), time series clustering, as an exploratory data mining technique, is the most used approach and has also served as a subroutine in more complex data mining algorithms. Visual representations of time-series cluster structures help users quickly understand the clusters, the anomalies, the structure of the data, and other regularities in datasets. Time series clusters have been used to answer numerous real-world problems, such as anomaly, novelty, or discord detection (Keogh et al., 2002; Chan and Mahoney, 2005; Wei et al., 2005; Leng et al., 2009), recognition of dynamic changes in time series (He et al., 2011), prediction and recommendation (Sfetsos and Siriopoulos, 2004; Pavlidis et al., 2006; Ito et al., 2009; Graves and Pedrycz, 2010), and pattern discovery (Wang et al., 2002; Das et al., 1998). The enormous size of time-series data requires that they be stored on disk during processing, resulting in an exponential decrease in the speed of the clustering process.

Time-series clustering can be classified into three types (Aghabozorgi et al., 2015): whole time series clustering, subsequence clustering, and time point clustering. Performing clustering operations on many individual time series, to group similar ones into clusters, is called whole time series clustering; in this case, each time series is treated as an object, and a conventional clustering algorithm is employed. Subsequence clustering entails clustering a set of sliding-window extractions of a single time series, intending to find the similarities and differences among the extracted time windows. According to Chiş et al. (2009a,b), subsequence clustering is a standard subroutine in rule discovery algorithms (Das et al., 1998; Fu et al., 2004; Halkidi and Vazirgiannis, 2001), indexing, classification algorithms, prediction algorithms (Ormerod and Mounfield, 2000; Popivanov and Miller, 2002), and anomaly detection (Steinback et al., 2002). Time point clustering (Aghabozorgi et al., 2015; Gionis and Mannila, 2003; Ultsch and Mörchen, 2005; Morchen et al., 2005) involves clustering time points based on a combination of each time point's temporal proximity and the similarity of the corresponding values. Fig. 7 presents a time-series clustering taxonomy.

There are different ways to cluster time series data recorded in the literature. These include:

• customizing existing conventional clustering algorithms (Aghabozorgi et al., 2015; Warrenliao, 2005);
• converting time series data into simple objects used as input to conventional clustering algorithms (Aghabozorgi et al., 2015; Warrenliao, 2005);
• using multiple resolutions of the time series as input to a multi-step approach (Aghabozorgi et al., 2015).

Three different approaches to clustering time series data have been identified: feature-based, model-based, and shape-based. The shape-based approach employs conventional clustering methods to match two time-series shapes by non-linear stretching and contracting of the time axes, using a distance or similarity measure appropriate for time series. The feature-based approach converts the raw time series into a lower-dimensional feature vector to which a conventional clustering algorithm is then applied.
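A minimal sketch of the feature-based approach just described follows: each raw series is summarized by a small, assumed set of summary statistics before a conventional algorithm (here K-Means) clusters the feature vectors. The toy series and the choice of features are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
# 40 toy series: slow sine waves vs. noisy fast oscillations
t = np.linspace(0, 4 * np.pi, 200)
series = np.vstack(
    [np.sin(t + rng.normal()) + rng.normal(0, .1, t.size) for _ in range(20)] +
    [np.sin(6 * t + rng.normal()) + rng.normal(0, .4, t.size) for _ in range(20)])

# feature-based approach: replace each raw series by a short feature vector
feats = np.column_stack([series.mean(1), series.std(1),
                         np.abs(np.diff(series, axis=1)).mean(1)])  # mean, spread, roughness
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
print(labels)
```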
Mode is a measure of central tendency that returns the value occurring most frequently in a data set. The mode can be determined for qualitative and quantitative attributes, and a single data set can have more than one mode. In mode-seeking clustering algorithms, estimated density functions generate the clusters (Fukunaga and Hostetler, 1975; Comaniciu and Meer, 2002); the modes are the local maxima of the probability density function. Mode-seeking clustering assigns cluster labels by associating data samples with their nearest modes (Sasaki et al., 2018), so the number of detected modes automatically determines the number of generated clusters. According to Duin et al. (2012c), mode-seeking clustering can be considered an agglomerative approach in which a density function is estimated for the dataset (running a mean-shift iteration initialized at every data point) and each mode defines one cluster. In the clustering phase, to decide which mode an object belongs to, the density gradient from that object is followed until a mode is found; objects that end up at the same mode belong to the same cluster. This procedure makes the number of clusters identical to the number of modes (Duin et al., 2012c; Carreira-Perpiñán, 2015). Duin et al. (2012c) discussed two mode-seeking procedures, distinguished by the non-parametric density estimate used: the mean shift procedure, which uses the Parzen kernel for mode seeking (Fukunaga and Hostetler, 1975; Cheng, 2002), and the kNN mode-seeking procedure, which uses the k-nearest-neighbor estimator (Koontz et al., 1976; Kittler, 1976; Shaffer et al., 1979). Both procedures have a width parameter that influences the number of modes in the density estimate and, in turn, the clustering.

Mean shift clustering is a mode-seeking clustering algorithm that initially considers all objects of a data set as candidates for cluster centers, which are then updated iteratively towards the nearest mode of the estimated density by following the density gradient. In Myhre et al. (2018) and Comaniciu and Meer (2002), mode seeking is a prominent density-based clustering method, represented mainly by the mean shift algorithm. To some extent, mode-seeking clustering algorithms can capture nonlinear clusters because the density adapts locally to the data. As stated earlier, the density estimate determines the number of clusters. The approach is also robust to outliers, because an outlier is represented by its own cluster and, based on its density value, can easily be thresholded away (Myhre et al., 2018).

The kNN mode-seeking procedure defines, for every object, a pointer to the object with the highest density in its neighborhood, where the density of every object is inversely related to the distance to its kth neighbor. The pointers are then followed to the object that points to itself, since that object represents a mode in the density, being the object with the highest density in its own neighborhood. The kNN procedure has been reported to be significantly faster than the mean shift algorithm and to handle larger datasets (in terms of both large numbers of objects and high dimensionality), whereas the mean shift algorithm can handle large datasets only in low-dimensional spaces (Duin et al., 2012a,b), because it has problems tracking the density gradient in high dimensions. In conclusion (Duin et al., 2012c), mode-seeking clustering is presented as the most natural procedure for cluster analysis, but the dataset needs to be sufficiently large before a good density estimate can be obtained.
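For illustration, scikit-learn's MeanShift implements the Parzen-window variant of this procedure; the bandwidth below plays the role of the width parameter that governs how many modes, and hence clusters, are found. The data and the quantile used to estimate the bandwidth are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, .4, (60, 2)), rng.normal(3, .4, (60, 2))])

bw = estimate_bandwidth(X, quantile=0.2)          # width parameter (assumed quantile)
ms = MeanShift(bandwidth=bw).fit(X)
print("modes found:", len(ms.cluster_centers_))   # one cluster per detected mode
```

Shrinking the bandwidth produces more modes (more, smaller clusters); enlarging it merges modes, so the cluster count is controlled indirectly rather than specified by the user.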
iv. Multiview Clustering
The big data paradigm has introduced multiview data in recent times: data observed from different views or generated from different sources (Yang and Wang, 2018). Each view of multiview data exhibits its own specific, heterogeneous properties while holding potential connections with the others. The specific property of a particular view may be associated with a particular knowledge discovery task, with the other views containing complementary information that may be exploited. Multiview clustering involves advanced techniques that exploit the complementary and consensus information across the multiple views from which the data are drawn; it provides a means of discovering the hidden power of the knowledge embedded in such data. Early work on multiview clustering includes reinforcement clustering for multi-type interrelated data (Wang et al., 2003), two-view versions of EM-based and agglomerative algorithms (Bickel and Scheffer, 2004), and a multiview version of DBSCAN (Kailing et al., 2004).

The inherent problem addressed by multiview clustering is maximizing the clustering quality within each view while maintaining clustering consistency across the different views. Another challenge of multiview clustering is the successful handling of incomplete multiview data, in which some data objects have no observation in some of the views or have only part of their features registered in a view.

Five categories of multiview clustering algorithms were discussed in the survey on multiview clustering carried out by Yang and Wang (2018): multi-kernel learning, co-training style algorithms, multi-view subspace clustering, multi-task multi-view clustering, and multi-view graph clustering. In multi-kernel learning, predefined kernels corresponding to the different views are used; the kernels are then combined linearly or non-linearly to improve clustering performance. The co-training style algorithms use a co-training strategy to treat multiview data, using prior knowledge or learning knowledge obtained from the other views to bootstrap each view's clustering. This process is performed iteratively, with each view's clustering results tending towards each other to produce the broadest consensus cutting across all the views.

Multi-view subspace clustering assumes that all views share a unified representation. Based on this assumption, the unified representation, which serves as input to a clustering model, is learned from the subspace features of all the views and used for clustering. The methods under this category include the subspace-learning-based method and the non-negative matrix-factorization-based method. Multi-task multi-view clustering saddles each view with one or more related tasks; inter-task knowledge is transferred among the various views to exploit the multiview relationships and the multitasking capability and thereby improve the performance of the clustering process. Multi-view graph clustering applies a graph clustering algorithm, or a related algorithm such as spectral clustering, to a sought-out fusion graph that cuts across all the views of the multiview data; this category is further subdivided into three methods based on the clustering method applied: the graph-based, network-based, and spectral-based methods.

The success and effectiveness of multiview clustering rest on two related principles: the complementary principle and the consensus principle. These two principles reflect the underlying assumptions employed in the clustering process and how such algorithms are modeled and operated. The complementary principle reflects the necessity of employing multiple views for a comprehensive and accurate description of the data objects, while the consensus principle handles consistency maximization across multiple distinct views, based on the generalization error analysis proposed by Dasgupta et al. (2002).

According to Yang and Wang (2018), each category of multiview algorithm has its pros and cons. For instance, in the co-training style, the clusters of different views are enhanced interactively through information exchange, but the approach becomes intractable when the
view size exceeds three. The kernel-based methods have the advantages of the kernel but high computational complexity. The interpretability of the multiview subspace method is straightforward, but it suffers from dependence on initialization parameters. The multi-view graph method harnesses spectral graph theory and its advantages but relies on the constructed affinity. The multi-task multiview method enjoys the advantages inherent in both multi-view clustering and multi-task clustering; nonetheless, research in this area is still relatively new.

v. Deep Learning Clustering
Deep learning clustering methods use deep neural networks to learn clustering representations (Min et al., 2018). The optimization objective of deep clustering, usually referred to as the loss function, has two parts: the clustering loss L_c and the network loss L_n. The network loss L_n learns the feasible features and avoids irrelevant solutions, while L_c fosters the formation of groups of feature points and makes them discriminative. The loss function is given as:

L = \lambda L_n + (1 - \lambda) L_c    (5)

where \lambda \in [0, 1] is a hyper-parameter balancing L_n and L_c. The use of deep neural networks for data clustering makes it possible to learn non-linear mappings that transform data into a more clustering-friendly representation, eliminating the need for manual feature extraction or selection.

The similarity methods used in conventional data clustering cause poor performance when clustering high-dimensional data (Min et al., 2018). Feature transformation and dimensionality reduction methods have been applied to map raw data into a new feature space in which the generated data can easily be separated by existing classifiers. However, the high complexity of the latent structure of data still challenges the effectiveness of existing clustering methods. Data can be transformed into a more clustering-friendly representation using deep learning models such as deep neural networks (DNNs) because of their inherently highly non-linear transformation characteristics, and deep neural-network-based clustering methods have shown promise for the effective and efficient clustering of real-world data (Aljalbout et al., 2018).

There are novel deep learning-based clustering methods that combine deep neural networks with clustering methods. According to Li et al. (2018), the various approaches can be categorized into two: first, unified approaches that jointly optimize the clustering objective and the deep representation learning, and second, sequential methods that apply clustering to a learned DNN representation. Xie et al. (2016) proposed the deep embedded clustering (DEC) method, which uses deep neural networks to simultaneously learn feature representations and cluster assignments; its operation involves mapping the data to a lower-dimensional feature space while iteratively optimizing a clustering objective. Li et al. (2018) used fully convolutional auto-encoders to learn image features as the base of a unified clustering framework for jointly learning image representations and cluster centers. In the same vein, Yang et al. (2017) proposed a joint K-means clustering and dimensionality reduction approach, with the dimensionality reduction accomplished by learning a deep neural network. Other methods include DEPICT, deep embedded regularized clustering (Dizaji et al., 2017); VaDE, variational deep embedding (Jiang et al., 2016); CCNN, CNN-based joint clustering (Hsu and Lin, 2018); and DTAGnet, a deep learning task-specific and graph-regularized network (Wang et al., 2016).

Pitchai et al. (2021) proposed brain tumor segmentation using deep learning and fuzzy K-means for magnetic resonance images; artificial neural networks and fuzzy K-means were combined to segment the tumor locale in their work. In Huang et al. (2021), the hidden representations associated with different implicit lower-level attributes are learned using their proposed robust deep K-means model. A systematic taxonomy of clustering methods that use deep neural networks, based on a comprehensive review of recent work, was presented by Aljalbout et al. (2018).
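A toy numeric sketch of the combined objective in Eq. (5) follows, with stand-in arrays in place of a real network's reconstruction and embeddings. The choice of reconstruction error for L_n and nearest-centroid distance for L_c is an assumption made for illustration, not a description of any one surveyed method:

```python
import numpy as np

def joint_deep_clustering_loss(X, X_hat, Z, centroids, lam=0.5):
    """Eq. (5): L = lam * L_n + (1 - lam) * L_c, with the network loss taken
    as reconstruction error and the clustering loss as the squared distance
    of each embedding to its nearest cluster center (illustrative choices)."""
    L_n = np.mean((X - X_hat) ** 2)                     # network loss
    d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    L_c = np.mean(d.min(axis=1) ** 2)                   # clustering loss
    return lam * L_n + (1 - lam) * L_c

rng = np.random.default_rng(8)
X = rng.normal(size=(32, 10))               # raw inputs
X_hat = X + rng.normal(0, .1, X.shape)      # stand-in for an autoencoder's reconstruction
Z = rng.normal(size=(32, 2))                # stand-in for learned embeddings
centroids = rng.normal(size=(3, 2))
print(joint_deep_clustering_loss(X, X_hat, Z, centroids, lam=0.6))
```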
4.2.2 Mixture Resolving Algorithms
The mixture resolving algorithm, or mixture-based algorithm, assumes that the set of observed objects emanates from a mixture of instances of multiple probabilistic clusters. To generate each observed object, a probabilistic cluster is first chosen according to the clusters' probabilities, and a sample is then drawn according to the probability density function of the chosen cluster. During clustering, the data set is assumed to be a mixture of a given number of different cluster groups in varying proportions. The mixture likelihood-based approach to clustering is model-based, because the specification of each component density of the observations is required in advance: Aitkin and Rubin (1985) stated that the statistical model to be used must be stated or known ahead of clustering samples from a population. With this overlapping relationship between model-based clustering and mixture-based algorithms, it is possible to conduct estimation analysis and hypothesis testing of clustering methods based on mixture models using standard statistical theory. Marriott (1974), in support of this, stated that the mixture likelihood-based approach 'is about the only clustering technique that is entirely satisfactory from the mathematical point of view. It assumes a well-defined mathematical model, investigates it by well-established statistical techniques, and provides a test of significance for the results.' The most suitable number of clusters can be determined easily in a mixture-based algorithm because it has a clear probabilistic foundation (Berkhin, 2012). McLachlan and Basford (1988) stated that providing an effective clustering of various data sets under various experimental designs is one of the mixture model's strengths. However, the assumptions made regarding the data distribution are rather strong, and the computational complexity is high; moreover, each cluster is viewed as a single simple distribution, constraining the cluster's shape (Grira et al., 2005).

4.2.2.1 Expectation maximization The expectation-maximization (EM) algorithm is a framework that employs two major steps in approaching the maximum-likelihood estimates of the parameters of a statistical model: the Expectation step and the Maximization step. In the Expectation step, objects are assigned to clusters based on the probabilistic clusters' parameters; in the Maximization step, the new clustering or parameters that maximize the expected likelihood are found. Given initial random values for the probabilistic distribution parameters, such as the mean and standard deviation, the E-step and the M-step are conducted iteratively until the parameters converge or the change is sufficiently small. During clustering, each object's probability of belonging to each distribution is calculated, and the probabilistic distribution parameters are adjusted in the M-step to maximize the expected likelihood of the objects in each cluster. Several computations are required for each iteration of the EM algorithm, and this iterative computation scales linearly with the product of the number of data points and the number of mixture components, limiting the EM algorithm's applicability in large-scale applications (Verbeek, 2004). The EM algorithm is easy to implement, and there is no need to set any parameters that influence the optimization algorithm (Verbeek, 2004). However, as with all local optimization methods, the found solution is highly sensitive to the initial parameter values.
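The sketch below fits a two-component Gaussian mixture with scikit-learn, whose fit routine runs exactly this E-step/M-step alternation; the synthetic data and the component count are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(0, .5, (100, 2)), rng.normal(4, 1.0, (100, 2))])

gm = GaussianMixture(n_components=2, max_iter=200, random_state=0).fit(X)
resp = gm.predict_proba(X)   # E-step output: each object's probability per component
labels = gm.predict(X)       # hard assignment to the most likely probabilistic cluster
print(gm.means_.round(2))    # M-step output: fitted component means
```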
4.2.3 Fuzzy clustering
Fuzzy clustering is a clustering method based on the fuzzy sets developed by Zadeh (1965). The clusters are defined as fuzzy sets, with each pattern belonging simultaneously to more than one cluster, as shown in Fig. 9. Data points are assigned to two or more clusters with a degree of membership in each of those clusters, thereby building a non-binary relationship (Ezugwu, 2020a; Saxena et al., 2017). In this way, clusters are allowed to overlap, exhibiting what is regarded as fuzzy overlap; fuzzy overlap reflects the fuzziness of the cluster boundaries and enumerates the number of data points with significant membership in the overlapping clusters. This clustering method is beneficial for clusters of data points whose boundaries are ambiguous and not well separated (Kaufman and Rousseeuw, 1990). The degree of membership assigned to a data point lies between 0 and 1, and a point's memberships across all clusters typically sum to one.
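A minimal fuzzy c-means sketch of these soft memberships follows, using the standard membership and center update rules; the fuzzifier m = 2 and the synthetic data are assumed settings:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means sketch: each point gets a degree of membership
    in every cluster (rows of U sum to 1), so clusters may overlap."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # fuzzy-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)                      # standard FCM membership update
    return U, centers

rng = np.random.default_rng(10)
X = np.vstack([rng.normal(0, .5, (50, 2)), rng.normal(3, .5, (50, 2))])
U, centers = fuzzy_c_means(X, c=2)
print(U[:3].round(3))   # soft memberships of the first three points
```

Larger values of the fuzzifier m make the memberships softer (more overlap); as m approaches 1, the algorithm degenerates towards hard K-Means assignments.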
Table 5
Summary of recent work on clustering algorithms.

Clustering methods | Study covered | Application area | Author and year | Impact as of 2021
Ratio of Deviation of Sum-of-squares and Euclid distance | The authors designed a cluster validity evaluation technique based on the Ratio of Deviation of Sum-of-squares and Euclid distance. | Artificial and real-world datasets, including Iris plants, Glass, Wine, Gauss, and shape datasets | Li et al. (2020) | 5
Entropy-based initialization method for the K-Means algorithm | The authors proposed a technique for calculating the optimal number of clusters in a dataset and also designed an entropy-based initialization method for the K-Means algorithm. | 2-dimensional and 3-dimensional image datasets | Chowdhury et al. (2020) | 5
Depth difference (DeD) for the K-Means algorithm | The authors proposed a novel method, called depth difference (DeD), for obtaining the optimal number of clusters in a dataset. | 2-D synthetic datasets | Patil and Baidari (2019) | 23
A novel technique called the U-K-Means algorithm | The authors proposed an unsupervised learning procedure for the K-Means clustering algorithm, called the U-K-Means algorithm. | Medical dataset, Iris, Seeds, Australian credit approval, Flowmeter D, Sonar, Wine, Horse, and Waveform | Sinaga and Yang (2020) | 23
Bisecting K-Means algorithm and a unique splitting measure | The authors designed a technique for automatically determining the number of clusters for large-scale datasets. | Different large-scale datasets | Safari et al. (2020) | –
Fuzzy C-Medoids (FCMd) algorithm | The authors proposed a fuzzy clustering-based method for handling mixed features, based on the Fuzzy C-Medoids algorithm, with a weighting scheme to calculate the weight of each attribute in a dataset. | Two simulation studies and two empirical applications | D'Urso and Massari (2019) | 16
Modified K-Means algorithm | The authors proposed an algorithm for clustering mixed data, called CLustering mixed-type data Including COncept Trees (ClicoT), based on the principle of Minimum Description Length (MDL). | Synthetic and real-world datasets, including the Automobile and Adult datasets | Behzadi et al. (2020) | 3
K-Means clustering | The authors proposed a framework for handling mixed-type datasets, called the COrrelation-Preserving Embedding framework (COPE). The framework uses an autoencoder (Vincent et al., 2008) to learn the representations of categorical features in mixed-type data. | Real-world UCI datasets, including KDD99, Income, Titanic, and Echo | Tran et al. (2021) | –
k-prototype clustering algorithm | The authors introduced a mixed-type data clustering technique for risk management. | Life insurance | Yin et al. (2021) | –
numerical, and embedded data. The technique also preserves the correlation between numerical and categorical attributes. Extensive experiments were performed on different real-world datasets, and the results show that the proposed method generates very good representations of categorical features.

Death benefits are one of the largest items that affect life insurance companies. Moreover, some life insurance companies do not have a process for effectively tracking and monitoring death claims. Yin et al. (2021) introduced a mixed-type data clustering technique for risk management. They used the technique to examine the difference between actual and expected death claims. The authors used the k-prototype clustering method to extract insights from a real-world mixed-type dataset containing policy information on life insurance. They used gap statistics to obtain the optimal clusters from the dataset, and each cluster had low actual-to-expected death claims. The method was evaluated, and the results showed that it identified a policy-holder feature that can improve decision-making (see Table 5).

6 Discussion and open challenges

Many clustering-based algorithms have been proposed in the literature, and some of them have performed remarkably well. This section presents a discussion of various issues in clustering analysis. The discussion is divided into three sub-sections. The first presents a discussion of the performance of existing clustering algorithms; the second presents some open issues in clustering algorithms; and the third presents some validation and similarity measures used in both traditional and recently proposed clustering techniques.

6.1 Performance of clustering algorithms

The performance of clustering algorithms can be characterized using nine properties (Al-Jabery et al., 2019). These properties form the major criteria for evaluating any clustering algorithm and are presented and discussed below.

Scalability measures the running time and memory requirements for executing the clustering algorithm. It is the top priority for a clustering algorithm because of the ever-increasing data from different big data mining sources. Linear or near-linear complexity is therefore highly desirable for all clustering algorithms.

High dimensionality: This measures the algorithm's ability to handle data with many features, which may sometimes outnumber the objects in the dataset. Identifying relevant features or capturing the intrinsic dimension is important for describing the real data structure.

Robustness: A dataset is usually not pure, because a level of contamination is introduced at the different stages of measurement, storage,
21
A.E. Ezugwu, A.M. Ikotun, O.O. Oyelade et al. Engineering Applications of Artificial Intelligence 110 (2022) 104743
and processing, hence the need for data cleaning in data mining. It is inevitable for noise and outliers to be present in the data. The measure of the robustness of a clustering algorithm is its ability to detect and remove possible outliers and noise in the data set.

User-dependent K: Knowing the number of clusters apriori is the most fundamental problem in cluster analysis. Many existing algorithms require the number of clusters to be specified as part of the user parameters for running the algorithm. Determining this parameter automatically, from the data itself, is known as automatic clustering, and it has continued to attract attention since, for many of the recent algorithms, it is difficult to decide without prior knowledge. The ability to specify the correct number of clusters assists in obtaining optimal solutions to the clustering problems in many applications. Ideally, the algorithm should determine the number of clusters based on the data properties. Clustering can therefore be cast as an optimization problem and solved using metaheuristic algorithms, which can be used both to discover the number of clusters automatically and to find the clusters themselves.
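As a minimal sketch of this idea, assuming scikit-learn (the synthetic dataset, the candidate range, and the choice of the silhouette index are illustrative, not a method prescribed by the surveyed works), one can run a conventional algorithm over a range of candidate K values and keep the value that maximizes an internal validity index:

```python
# Estimate K from the data rather than asking the user:
# sweep candidate K values and keep the best silhouette score.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

best_k, best_score = None, -1.0
for k in range(2, 11):  # candidate numbers of clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)  # internal validity index
    if score > best_score:
        best_k, best_score = k, score

print(f"estimated number of clusters: {best_k} (silhouette = {best_score:.3f})")
```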
Parameter reliance: Apart from the requirement of specifying the number of clusters apriori, many existing clustering algorithms have other sensitive, user-defined parameters that must be set correctly for the algorithm to function properly. This leaves such algorithms' performance at the mercy of a wide range of users' guesses. Thus, providing practical guidance for the self-determination of such parameters by the algorithm itself, or incorporating schemes that decrease the reliance of algorithms on user-dependent parameters, is a good measure of a clustering algorithm's quality.

Irregular cluster shape: The ability to discover irregular clusters is another challenge for clustering algorithms. In many applications the data may not form clusters with regular shapes. For such datasets, an optimal solution to the clustering problem presents clusters of their natural shape (regular or irregular). Rather than being confined to some particular shape, a good clustering algorithm should detect irregular cluster shapes.

Order dependence: The order in which input patterns are presented should not affect the correctness of the resulting clusters. This characteristic is common in incremental or online (stream) data, where the clustering solutions produced may vary with different orders of presentation of the input patterns. A major challenge in incremental learning is therefore achieving reduced sensitivity, or outright insensitivity, to the order of the input patterns.

Visualization: Good presentation of clustering output enhances the proper interpretation of the result and aids the extraction of useful information. A good visual representation of clustering output aids its interpretability in the problem domain, thus assisting users in understanding the results and extracting useful information from the data.

Mixed data types: Clustering algorithms are expected to be flexible enough to handle any data type in which the dataset is presented. This is very important because data obtained from different sources may be characterized by different kinds of features, such as categorical or continuous ones. Some studies have shown that combining operators in algorithm design tends to make such algorithms handle diversity in the data/population robustly, thereby improving the quality of results within a short time (Saemi et al., 2018). Leveraging this concept, designers of clustering algorithms have the opportunity to develop algorithms capable of handling inputs with different attribute types.

6.2. Open issues in clustering algorithms

One of the major challenges in clustering analysis is identifying the number of clusters apriori. This challenge occurs due to a lack of prior domain knowledge. It also occurs when a dataset has many dimensions with different shapes, sizes, densities, and overlapping among groups. Although effort has been deployed towards handling this problem, it remains a major challenge. Future studies can explore nature-inspired algorithms to solve this problem (José-García and Gómez-Flores, 2016). José-García and Gómez-Flores (2016) noted that nature-inspired approaches such as bacterial foraging optimization, firefly optimization, and gravitational search algorithms could be considered beyond non-automatic clustering and extended to automatic clustering problems. Furthermore, the authors revealed that only a few studies have considered hybrid nature-inspired algorithms. Nature-inspired techniques can be hybridized with traditional techniques to design more efficient and faster cluster-based algorithms; such hybrids should combine related algorithms in a way that enhances performance and produces improved results. Moreover, swarm intelligence-based clustering algorithms have not been fully explored in solving NP-hard problems in computational biology (Das et al., 2008b). Many more open issues in clustering analysis exist in the literature, and some of them are discussed under the sub-headings below.

Computational complexity: Some clustering algorithms have computational complexity issues, especially when applied to datasets with large numbers of instances and high-dimensional feature spaces. This problem can be mitigated by increasing computational resources, for example with high-capacity GPUs (Shirkhorshidi et al., 2014). Moreover, exploiting the advantages of parallel computing may help deliver better clustering algorithms. Two separate studies (Shirkhorshidi et al., 2014; Zerhari et al., 2015) reported that clustering algorithms based on parallel computing appear to be very useful but suffer from implementation complexity. MapReduce-based clustering algorithms are an alternative to parallel computing: they are more scalable and faster, and deploying clustering algorithms on GPU-based MapReduce frameworks can achieve better scalability and speed.

In addition to increasing computational resources, such as high-capacity GPUs, to tackle computational complexity issues, further refinement and enhancement of the clustering algorithm itself might reduce complexity. This is necessary considering each clustering algorithm's different computational complexity, given that each can still achieve some commendable measure of clustering quality. We opine that finding an optimal representation of each clustering algorithm, without losing its clustering behaviour, helps minimize complexity while maintaining or maximizing the quality of the clusters derived from its operation.

Considering computational complexity from the general point of view of clustering categorizations, the hierarchical clustering method is known to have a complexity of O(n²), partitioning is O(n), the grid-based method is O(n), and the density-based method is O(n log n). Although each of these categories covers a wide range of clustering algorithms that may not have the same complexity as their parent category, we argue that finding a minimization mechanism for representing the clustering algorithm will help reduce complexity. To further support this assertion, we list the complexity of some clustering algorithms belonging to the different clustering methods (hierarchical, density-based, grid-based, and partitional) and how they differ from their parent category: BIRCH is O(n), CURE is O(n² log n), ROCK is O(n² + n·m_m·m_a + n² log n) (with m_m and m_a the maximum and average numbers of neighbours), CHAMELEON is O(n²), PAM is O(k(n−k)²), CLARA is O(k(40+k)² + k(n−k)), CLARANS is O(kn²), DBSCAN is O(n log n), Fuzzy C-means is O(n), K-Means is O(n), STING is O(k), WaveCluster is O(n), CLIQUE is O(C^k + mk), SOM net is O(n²m), DENCLUE is O(log |D|), and DBCLASD is O(3n²). A careful observation of each algorithm's complexity relative to the method it falls into reveals slight variations, indicating that algorithmic enhancement can reduce complexity while sustaining good, qualitative clustering results.
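As a rough empirical illustration (ours, not taken from the surveyed works), the growth rates quoted above can be probed by timing two representative algorithms as the dataset size doubles; absolute times are machine-dependent, and only the trend is of interest. A minimal sketch, assuming scikit-learn:

```python
# Time K-Means (near O(n)) and DBSCAN (O(n log n) with a spatial
# index) as the dataset size doubles; dataset and parameters are arbitrary.
import time
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

for n in (2_000, 4_000, 8_000, 16_000):
    X, _ = make_blobs(n_samples=n, centers=5, random_state=0)
    t0 = time.perf_counter()
    KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
    t1 = time.perf_counter()
    DBSCAN(eps=0.5, min_samples=5).fit(X)
    t2 = time.perf_counter()
    print(f"n={n:6d}  k-means: {t1 - t0:.3f}s  dbscan: {t2 - t1:.3f}s")
```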
Refinement of clusters: The clusters resulting from a clustering operation often require further improvement, using either the same clustering algorithm or another clustering algorithm. This refinement aims to ensure that objects wrongly clustered due to inefficient similarity measures are moved to the cluster where they fit well. Some clustering methods, such as the divisive method, apply two approaches to the refinement of clusters, namely monothetic and polythetic: the former splits a cluster using only one attribute, while the latter splits a cluster using all attributes. We consider such approaches proof that more techniques can evolve to improve the quality of clusters. Refinement is especially necessary considering the effect of wrongly classifying objects into clusters in life-threatening applications. In fact, hybrids of metaheuristic algorithms may be considered to achieve optimal performance on the refinement task.

Speed of convergence: A wide range of metaheuristic algorithms, inspired by nature and human activities, have been used to solve optimization problems, including clustering, effectively. Since good convergence is one of the pointers to the effectiveness of a clustering algorithm and to the quality of the resulting clusters, continued research into applying metaheuristic algorithms to clustering problems is encouraged. In addition to repurposing metaheuristic algorithms for solving clustering convergence problems, other related clustering problems that may be optimized are sensitivity to the initialization phase, multi-objective functions involving both inter- and intra-cluster measurements, and escape from local optima. Most clustering algorithms suffer from these operational issues, which remain open and allow for further research. We argue that the effective repurposing of metaheuristic algorithms for solving these problems promises great performance gains for clustering algorithms. For instance, variants of the firefly algorithm have been applied to the problems of initialization and escape from local optima in K-Means clustering (Xie et al., 2019), and PSO has been applied to multi-objective clustering (Gong et al., 2017). We note, however, that there are other clustering operations, such as convergence and spatial clustering, which are yet to be optimized through the application of optimization algorithms.

Data dimensionality: Algorithms such as K-Means, Gaussian mixture model (GMM) clustering, maximum-margin clustering, and information-theoretic clustering cannot be easily applied to problems with high-dimensional data. This problem can be addressed by projecting the original data onto a low-dimensional subspace and then clustering the feature embedding, such as sparse codes (Wang et al., 2016).
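A minimal sketch of this remedy, assuming scikit-learn (PCA is used here as the low-dimensional projection in place of sparse codes; the dataset and dimensions are illustrative):

```python
# Project 200-dimensional data onto a 10-dimensional subspace,
# then cluster the embedding with K-Means.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

X, _ = make_blobs(n_samples=1_000, n_features=200, centers=5, random_state=0)

pipeline = make_pipeline(
    PCA(n_components=10),                              # dimensionality reduction
    KMeans(n_clusters=5, n_init=10, random_state=0),   # clustering on the embedding
)
labels = pipeline.fit_predict(X)
```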
Effectiveness and scalability: Effectiveness and scalability are two major challenges that open further research in clustering methods related to Big Data. The deep learning approach has been introduced as a potential solution to this challenge. Also, decreasing the reliance of algorithms on user-dependent parameters can improve the effectiveness of clustering algorithms. Future studies can integrate domain-based requirements into a new single algorithm. Additionally, future research can develop new clustering algorithms emerging from designing solutions to some fundamental challenges of non-automatic and automatic clustering, and can design improved algorithms that deal with newly occurring data without relearning from scratch.

Data object representation: Data object representation is another challenge for clustering methods. Data objects are not always represented in a suitable format, and they are represented differently across different application areas: some are represented as feature vectors, while some are represented as graphs together with a notion of object similarity (Plant and Böhm, 2009). These differences in data object representation across application areas portend a viable research interest. One relevant outcome of finding an efficient way of representing data for the clustering operation is that it supports clustering algorithms' performance by reducing computational complexity. This allows the clustering algorithm to be scalable by identifying regions or spatial distributions in the data. Some useful regions to identify are the compressible aspects of the data, regions that do not need to be swapped out of memory but maintained in main memory, and regions that are discardable as noise or irrelevant to the outcome of the clustering operation.

Evaluation measures: The following measures can be used as yardsticks to evaluate and compare the performance of different clustering algorithms: accuracy, algorithm stability, and dataset normalization (Chaouni et al., 2019). Moreover, there is a need to design algorithmic approaches to compare different clustering methods based on different validity indices, such as internal, stability, and biological indices (Bouveyron et al., 2012). Although Kokate et al. (2018) suggested that a single algorithm may not satisfy all evaluation measures, starting with one algorithmic solution may lead to further hybridized or robust solutions.

Data streams: The peculiarity of data streams makes the clustering process more demanding than clustering static data. Kokate et al. (2018) reported some challenges of clustering methods on data streams. Clustering methods should be robust enough to deal with the existence of outliers/noise. Moreover, clustering algorithms should be capable of sharply detecting changes in context and in the grouping of streaming data objects to support the analysis of trends in data streams. Further, the increasing volume of data streams generated from different media, such as social networks, demands improvements in the computational capability and memory-space optimization of clustering algorithms. More research effort will be needed to produce adaptive models for clustering evolving data streams and to improve existing context-based adaptive clustering methods.

Knowledge extraction: Another problem in clustering is knowledge extraction from big datasets, caused by the increase in data sources and data generation (Ezugwu et al., 2020a). This problem poses a big challenge to data analysts, as they cannot effectively extract knowledge from terabytes and petabytes of data. Future studies can design improved techniques that overcome this limitation, such as distributed clustering and parallel evolutionary algorithms. Moreover, future studies can develop new clustering methods that can choose between single-objective and multi-objective optimization.

Therefore, in light of the studies and discussions mentioned above, a comprehensive survey paper focused on presenting an exhaustive list of clustering algorithms may be necessary, since it is often reported that it is hard to give a complete list of all clustering algorithms due to the diversity of information (Xu and Tian, 2015).

6.3. Cluster similarity and validation measures

This section presents the various cluster similarity and validation measures employed in traditional and recently proposed clustering techniques. A similar framework that briefly discussed most of the commonly used clustering validation measures was presented in Ezugwu et al. (2020a).

6.3.1. Clustering similarity measures

A cluster similarity measure gives the degree of closeness or separation of the characteristics of various data points. The degree of separation or closeness can be defined explicitly or implicitly, reflecting the strength of the relationship between data points. All clustering methods clearly define cluster relationships among the data objects they are applied to, which plays a significant role in the clustering method's
success or otherwise (Patidar et al., 2012). The following presents a brief overview of commonly used similarity measures in traditional and recently proposed clustering techniques.

Euclidean distance: Euclidean distance is considered the standard or most commonly used metric for numerical data. Simply put, it is the distance between two points, X and Y. Euclidean distance has wide acceptance in many clustering problems, and it is the default distance measure used with the K-Means algorithm. The Euclidean distance is shown in Eq. (6) (Singh et al., 2013):

$\mathrm{Dist}_{XY} = \sqrt{\sum_{k=1}^{m} \left(X_{ik} - X_{jk}\right)^{2}}$  (6)

where X and Y are two objects or sets of a certain class in cluster k.

Cosine distance: The cosine distance measures the cosine of the angle between two data points, as given by Eq. (7) (Pandit and Gupta, 2011). Here θ gives the angle between two data vectors, A and B are n-dimensional vectors, and the measure has excellent application in document similarity:

$\theta = \arccos\left(\frac{A \cdot B}{\|A\| \, \|B\|}\right)$  (7)

Jaccard distance: The Jaccard distance or coefficient typically measures the similarity between two data objects by evaluating the size of their intersection divided by the size of their union, as shown in Eq. (8) (Pandit and Gupta, 2011). The Jaccard similarity measure has been applied in the ecological clustering of species (Choi et al., 2010):

$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$  (8)

Manhattan distance: Manhattan distance measures the sum of the absolute differences between the coordinates of a pair of data objects X and Y, as shown in Eq. (9) (Singh et al., 2013). The Manhattan distance in clustering algorithms results in hyper-rectangular-shaped clusters (Xu and Wunsch, 2005):

$\mathrm{Dist}_{XY} = \sum_{k=1}^{m} \left|X_{ik} - X_{jk}\right|$  (9)

Chebyshev distance: The determination of the maximum absolute difference between the coordinates of a pair of data objects X and Y is Chebyshev's goal. Eq. (10) gives the formula for the Chebyshev distance (Singh et al., 2013):

$\mathrm{Dist}_{XY} = \max_{k} \left|X_{ik} - X_{jk}\right|$  (10)

Minkowski distance: The Minkowski distance is also called the generalized distance metric because, in Eq. (11), when p = 2 the distance becomes the Euclidean distance, and taking the limit p → ∞ gives the Chebyshev distance. The Minkowski distance has a significant advantage when the embedded clusters in the dataset are compact or isolated; otherwise, it performs poorly (Mao and Jain, 1996):

$\mathrm{Dist}_{XY} = \left(\sum_{k=1}^{d} \left|X_{ik} - X_{jk}\right|^{p}\right)^{1/p}$  (11)

Average distance: As a solution to the drawbacks of the Euclidean distance, the average distance, a modified version of the Euclidean distance, was proposed to improve the results in the work of Gan et al. (2007). The average distance is defined in Eq. (12):

$D_{ave} = \sqrt{\frac{1}{n} \sum_{k=1}^{m} \left(X_{ik} - X_{jk}\right)^{2}}$  (12)

Weighted Euclidean distance: The weighted Euclidean distance modifies the Euclidean distance (Hand et al., 2001). It can be used when each attribute's relative importance or weight is available. This distance is defined in Eq. (13):

$D_{weight} = \sqrt{\sum_{k=1}^{m} w_{k} \left(X_{ik} - X_{jk}\right)^{2}}$  (13)

Chord distance: Given two normalized or non-normalized data points within a hypersphere of radius one, the chord length joining the points X and Y is called the chord distance. The chord distance is another variant of the Euclidean distance (Gan et al., 2007), defined by Eq. (14):

$d_{chord} = \left(2 - 2\,\frac{\sum_{i=1}^{n} x_{i} y_{i}}{\|x\|_{2}\,\|y\|_{2}}\right)^{1/2}$  (14)

where $\|x\|_{2}$ is the $L_{2}$-norm, $\|x\|_{2} = \sqrt{\sum_{i=1}^{n} x_{i}^{2}}$.

Mahalanobis distance: The Mahalanobis distance is a data-driven similarity measure quite different from the previously discussed metrics (Boriah et al., 2008). It can extract hyper-ellipsoidal clusters and resolve the issues caused by linear correlation among the measured features (Mao and Jain, 1996; Jain et al., 1999). The Mahalanobis distance is defined by Eq. (15) (Abonyi and Feil, 2007):

$d_{mah} = \sqrt{(x - y)\, S^{-1}\, (x - y)^{T}}$  (15)

where S is the covariance matrix of the dataset.

Pearson correlation: The Pearson correlation has found great application in clustering gene expression data, where it evaluates the similarity in the shape of gene expression patterns (Xu and Wunsch, 2005). The Pearson correlation is defined by Eq. (16):

$\mathrm{Pearson}(x, y) = \frac{\sum_{i=1}^{n} \left(x_{i} - \mu_{x}\right)\left(y_{i} - \mu_{y}\right)}{\sqrt{\sum_{i=1}^{n} \left(x_{i} - \mu_{x}\right)^{2}}\,\sqrt{\sum_{i=1}^{n} \left(y_{i} - \mu_{y}\right)^{2}}}$  (16)

where $\mu_{x}$ and $\mu_{y}$ are the means of x and y, respectively.

Multi viewpoint-based similarity measure: The multi viewpoint-based similarity measure has excellent advantages in document clustering, where multiple viewpoints can be used to make a more informative assessment of similarity. It was proposed by Sruthi and Reddy (2013), where the two data points whose similarity is to be measured must be in the same cluster, whereas the viewpoint is outside the cluster. Eq. (17) defines the measure:

$\mathrm{MVS}\left(d_{i}, d_{j} \mid d_{i}, d_{j} \in S_{r}\right) = \frac{1}{n - n_{r}} \sum_{d_{h} \notin S_{r}} \cos\left(d_{i} - d_{h},\, d_{j} - d_{h}\right) \left\|d_{i} - d_{h}\right\| \left\|d_{j} - d_{h}\right\|$  (17)

where $d_{i}$ and $d_{j}$ are points in cluster $S_{r}$ and $d_{h}$ is the viewpoint.

Bilateral slope-based distance: A new similarity measure was proposed by Kamalzadeh et al. (2020) in their work on time-series clustering. It combines a simple representation of time series, the slope of each segment of the time series, the Euclidean distance, and dynamic time warping. The bilateral slope-based distance (BSD) is defined as shown in Eq. (18):

$d_{BSD}\left(TS_{i}^{(1)}, TS_{j}^{(2)}\right) = \left|x_{i}^{(1)} - x_{j}^{(2)}\right| + \left|\sin\theta_{i}^{(1)} - \sin\theta_{j}^{(2)}\right| + \left|\sin\theta_{i-1}^{(1)} - \sin\theta_{j-1}^{(2)}\right|$  (18)

Distance measures for non-continuous-valued attributes: The distance measures discussed above are best applied to continuous-valued attributes. Nevertheless, not all data attributes are continuous; therefore, the distance measures are revised for categorical, binary, ordinal, or mixed-type attributes.

For binary attributes, a contingency table may be used, and the distance is given by the simple matching coefficient shown in Eq. (19):

$d\left(x_{i}, x_{j}\right) = \frac{r + s}{q + r + s + t}$  (19)

where q is the number of attributes equal to 1 for both objects, t the number equal to 0 for both, and r and s the numbers of mismatching attributes.

For nominal attributes, the simple matching in Eq. (20) may be used:

$d\left(x_{i}, x_{j}\right) = \frac{p - m}{p}$  (20)

where p is the total number of attributes and m is the number of matches.
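Most of these measures have off-the-shelf implementations; a minimal sketch, assuming NumPy/SciPy (the vectors and the sample used to estimate the covariance are purely illustrative):

```python
# Several of the distance measures of Eqs. (6)-(15) via SciPy.
import numpy as np
from scipy.spatial import distance

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 1.0, 0.5])

d_euclid = distance.euclidean(x, y)          # Eq. (6)
d_manhattan = distance.cityblock(x, y)       # Eq. (9)
d_chebyshev = distance.chebyshev(x, y)       # Eq. (10)
d_minkowski = distance.minkowski(x, y, p=3)  # Eq. (11) with p = 3

# Eq. (7): SciPy's `cosine` returns 1 - cos(theta); recover the angle,
# clipping to guard against floating-point values just outside [-1, 1].
theta = np.arccos(np.clip(1.0 - distance.cosine(x, y), -1.0, 1.0))

# Eq. (15): the Mahalanobis distance needs the inverse covariance
# matrix of the dataset, here estimated from a small illustrative sample.
data = np.array([[1.0, 2.0, 3.0], [2.0, 1.0, 4.0],
                 [0.5, 2.5, 2.0], [3.0, 0.5, 1.0]])
VI = np.linalg.inv(np.cov(data, rowvar=False))
d_mahalanobis = distance.mahalanobis(x, y, VI)
```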
6.3.2. Cluster validation measures

The analogous question in cluster analysis is ''how to evaluate the goodness of the resulting clusters''. The resulting clusters are greatly influenced by each clustering method's parameters and initial conditions; therefore, evaluating the goodness of the clusters should consider this range of parameters and constraints. Though cluster validation is an arduous task, it plays a significant role in avoiding situations where patterns are found in the presence of noise, in comparing clustering algorithms, in comparing two sets of clusters, and in comparing two clusters. Cluster validation criteria are usually internal or external (Bezdek and Pal, 1998); however, a third classification, called relative validation, also exists (Legány et al., 2006).

6.3.2.1. Internal validation criteria

The underlying structure of the dataset plays a significant role in successfully partitioning the dataset. In practice, the dataset's underlying structure is usually unknown, and there is no way of knowing the correct partitioning of the dataset. The internal validation criteria measure the intra-cluster compactness and the inter-cluster separation after the dataset is partitioned by the clustering algorithm. A variety of criteria have been proposed; they are outlined below.

Sum of squared error: The sum of squared error (SSE) is one of the most popular cluster evaluation criteria, and it is defined as follows:

$SSE = \sum_{k=1}^{K} \sum_{\forall x_{i} \in C_{k}} \left\|x_{i} - \mu_{k}\right\|^{2}$  (21)

where $C_{k}$ is the set of all instances in cluster k and $\mu_{k}$ is the vector mean of cluster k. The partition with the lowest SSE is considered the best (Hamilton, 1994; Tsay, 2005).

Scatter criteria: The scatter criterion (Rokach, 2005; Duda et al., 2001) is given as follows:

$S_{k} = \sum_{x \in C_{k}} \left(x - \mu_{k}\right)\left(x - \mu_{k}\right)^{T}$  (22)

Condorcet's criterion: Condorcet's criterion is given as follows:

$\sum_{C_{i} \in C} \sum_{\substack{x_{j}, x_{k} \in C_{i} \\ x_{j} \neq x_{k}}} s\left(x_{j}, x_{k}\right) + \sum_{C_{i} \in C} \sum_{x_{j} \in C_{i};\, x_{k} \notin C_{i}} d\left(x_{j}, x_{k}\right)$  (23)

where $s(x_{j}, x_{k})$ and $d(x_{j}, x_{k})$, respectively, are the similarity and distance between the vectors $x_{j}$ and $x_{k}$.

C-criterion: An extension of Condorcet's validity index is given in Fortier and Solomon (1996). The C-criterion is defined as follows:

$\sum_{C_{i} \in C} \sum_{\substack{x_{j}, x_{k} \in C_{i} \\ x_{j} \neq x_{k}}} \left(s\left(x_{j}, x_{k}\right) - \gamma\right) + \sum_{C_{i} \in C} \sum_{x_{j} \in C_{i};\, x_{k} \notin C_{i}} \left(\gamma - s\left(x_{j}, x_{k}\right)\right)$  (24)

where γ is a threshold value.

Category utility metric: Given a set of entities, the binary feature set of size n is defined as $F = \{f_{i}\},\, i = 1, 2, \ldots, n$, and the binary category $C = \{c, \bar{c}\}$ is defined as follows:

$CU(C, F) = p(c) \sum_{i=1}^{n} p\left(f_{i} \mid c\right) \log p\left(f_{i} \mid c\right) + p\left(\bar{c}\right) \sum_{i=1}^{n} p\left(f_{i} \mid \bar{c}\right) \log p\left(f_{i} \mid \bar{c}\right) - \sum_{i=1}^{n} p\left(f_{i}\right) \log p\left(f_{i}\right)$  (25)

given that p(c) is the prior probability of an entity belonging to the positive category c, $p(f_{i} \mid c)$ is the conditional probability of the feature $f_{i}$ given that it belongs to the positive category c, $p(f_{i} \mid \bar{c})$ is the conditional probability of the feature $f_{i}$ given that it belongs to the negative category $\bar{c}$, and $p(f_{i})$ is the prior probability of the feature $f_{i}$ (Corter and Gluck, 1992; Ezugwu et al., 2020a).

Bayesian information criterion (BIC) index: The problem of overfitting the partitions generated by the clustering algorithm is huge, and the BIC tries to solve it. The BIC is a minimization problem (Raftery, 1986) and is defined as follows:

$BIC = -\ln(L) + v \ln(n)$  (26)

where n is the number of entities, L is the likelihood of the parameters to generate the data in the model, and v is the number of free parameters in the Gaussian model.

Calinski–Harabasz index: The Calinski–Harabasz validity index measures the compactness or closeness of the clusters by calculating the distances between the points in a cluster and their centroid; likewise, the separation is calculated by measuring the distance from the centroids to the global centroid (Calinski and Harabasz, 1974; Ezugwu et al., 2020a). This index is defined as:

$CH = \frac{\mathrm{trace}\left(S_{B}\right)}{\mathrm{trace}\left(S_{W}\right)} \cdot \frac{n_{p} - 1}{n_{p} - k}$  (27)

where $S_{B}$ is the inter-cluster scatter matrix, $S_{W}$ the intra-cluster scatter matrix, $n_{p}$ the number of entities in a cluster, and k the number of clusters.

Davies–Bouldin index (DB): The DB index evaluates, for each cluster, its similarity to its nearest cluster and averages these worst-case values; this information is critical to the success of the index (Davies and Bouldin, 1979). For better results, DB is minimized. The Davies–Bouldin index is defined as follows:

$DB = \frac{1}{c} \sum_{i=1}^{c} \max_{i \neq j} \left\{\frac{d\left(x_{i}\right) + d\left(x_{j}\right)}{d\left(c_{i}, c_{j}\right)}\right\}$  (28)

where c is the number of clusters, i and j are cluster labels, $d(x_{i})$ and $d(x_{j})$ are the average within-cluster distances of the entities in clusters i and j, and $d(c_{i}, c_{j})$ is the distance between the cluster centroids.

Silhouette index: The silhouette index requires that information about the compactness and separation of at least two clusters be known (Rousseeuw, 1987). Given a cluster $X_{j}$ (j = 1, ..., c), the index assigns to the i-th entity of $X_{j}$ the silhouette width $s(i)$ (i = 1, ..., m). This value gives a degree of likelihood that the i-th sample belongs in cluster $X_{j}$. The index is defined as:

$s(i) = \frac{b(i) - a(i)}{\max\left\{a(i),\, b(i)\right\}}$  (29)

where a(i) is the average distance between the i-th entity and the remaining entities of its cluster $X_{j}$, and b(i) is the minimum average distance between the i-th entity and all of the entities clustered in $X_{k}$ (k = 1, ..., c; k ≠ j).

Dunn index: The Dunn index basically looks for the ratio between the smallest inter-cluster distance and the largest intra-cluster distance in a partitioning (Dunn, 1973). The Dunn index is defined as follows:

$Dunn = \min_{1 \leq i \leq c} \left\{\min_{i \neq j} \left\{\frac{d\left(c_{i}, c_{j}\right)}{\max_{1 \leq k \leq c} d\left(X_{k}\right)}\right\}\right\}$  (30)

where $d(c_{i}, c_{j})$ is the distance between clusters $X_{i}$ and $X_{j}$, $d(X_{k})$ represents the intra-cluster distance between members of cluster $X_{k}$, and c is the number of clusters in the dataset. The Dunn index is a maximization problem; its setbacks include its time complexity and its sensitivity to noise in datasets.

NIVA index: The NIVA validation index (Rendon et al., 2008) is defined as follows:

$NIVA(C) = \frac{Compac(C)}{SepxG(C)}$  (31)

where Compac(C) is the average compactness of the clustering C and SepxG(C) is the average separability of the clustering C.
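Several of the internal indices above ship with common libraries; a minimal sketch, assuming scikit-learn (the dataset and partition are illustrative):

```python
# Silhouette (Eq. (29)), Calinski-Harabasz (Eq. (27)) and
# Davies-Bouldin (Eq. (28)) computed for one partition.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (calinski_harabasz_score,
                             davies_bouldin_score, silhouette_score)

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print("silhouette       :", silhouette_score(X, labels))         # higher is better
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels))  # higher is better
print("Davies-Bouldin   :", davies_bouldin_score(X, labels))     # lower is better
```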
Gamma index: The gamma index (Baker and Hubert, 1975) is defined as:

$G(C) = \frac{\sum_{c_{k} \in C} \sum_{x_{i}, x_{j} \in c_{k}} d_{l}\left(x_{i}, x_{j}\right)}{n_{w}\left(\binom{N}{2} - n_{w}\right)}$  (32)

Score function: The score function evaluates the dispersion among clusters by estimating the distance between cluster centroids and the global centroid, and it measures the compactness of the clusters by estimating the distance from the points in a cluster to their centroid (Saitta, Raphael and Smith, 2007; Ezugwu et al., 2020a). The index is defined as follows:

$SF(C) = 1 - \frac{1}{e^{\,e^{\,bcd(C) - wcd(C)}}}$  (33)

where

$bcd(C) = \frac{\sum_{c_{k} \in C} \left|c_{k}\right| d_{e}\left(c_{k}, \bar{X}\right)}{N \times K}$ and $wcd(C) = \sum_{c_{k} \in C} \frac{1}{\left|c_{k}\right|} \sum_{x_{i} \in c_{k}} d_{e}\left(x_{i}, c_{k}\right)$.

C-Index: The C-index (Dalrymple-Alford, 1970) is defined as:

$CI(C) = \frac{S(C) - S_{\min}(C)}{S_{\max}(C) - S_{\min}(C)}$  (34)

where $S(C) = \sum_{C_{k} \in C} \sum_{x_{i}, x_{j} \in C_{k}} d_{e}\left(x_{i}, x_{j}\right)$ is the sum of within-cluster pairwise distances, and $S_{\min}(C)$ and $S_{\max}(C)$ are, respectively, the sums of the $n_{w}$ smallest and $n_{w}$ largest pairwise distances $d_{e}(x_{i}, x_{j})$ over all pairs in X, with $n_{w}$ the number of within-cluster pairs.

Sym-index: The sym-index (Bandyopadhyay and Saha, 2008) is defined as follows:

$Sym(C) = \frac{\max_{c_{k}, c_{l} \in C}\left\{d_{e}\left(c_{k}, c_{l}\right)\right\}}{K \sum_{c_{k} \in C} \sum_{x_{i} \in c_{k}} d_{ps}\left(x_{i}, c_{k}\right)}$  (35)

where $d_{ps}$ denotes the point-symmetry distance.

COP index: The COP index measures compactness as the distance from the cluster points to their centroid; likewise, the separation is measured as the largest distance between neighbors (Arbelaitz et al., 2013). It is defined as follows:

$COP(C) = \frac{1}{N} \sum_{c_{k} \in C} \left|c_{k}\right| \frac{\frac{1}{\left|c_{k}\right|} \sum_{x_{i} \in c_{k}} d_{e}\left(x_{i}, c_{k}\right)}{\min_{x_{j} \notin c_{k}} \max_{x_{i} \in c_{k}} d_{e}\left(x_{i}, x_{j}\right)}$  (36)

Negentropy increment: Measuring the normality of clusters, instead of the compactness or separation of the clusters, is the goal of the negentropy index (Lago-Fernández and Corbacho, 2010):

$NI(C) = \frac{1}{2} \sum_{c_{k} \in C} p\left(c_{k}\right) \log \left|\Sigma_{c_{k}}\right| - \frac{1}{2} \log \left|\Sigma_{X}\right| - \sum_{c_{k} \in C} p\left(c_{k}\right) \log p\left(c_{k}\right)$  (37)

where $\left|\Sigma_{c_{k}}\right|$ and $\left|\Sigma_{X}\right|$ are the determinants of the covariance matrices of cluster $c_{k}$ and of the whole dataset, respectively.

SV-index: The SV-index evaluates cluster separation as a measure between nearest neighbors, and compactness from the border points to the centroids of the cluster (Žalik and Žalik, 2011). It is defined as follows:

$SV(C) = \frac{\sum_{c_{k} \in C} \min_{c_{l} \in C \setminus c_{k}}\left\{d_{e}\left(c_{k}, c_{l}\right)\right\}}{\sum_{c_{k} \in C} \frac{10}{\left|c_{k}\right|} \sum \max_{x_{i} \in c_{k}}^{0.1\left|c_{k}\right|}\left\{d_{e}\left(x_{i}, c_{k}\right)\right\}}$  (38)

where $\sum \max^{0.1|c_{k}|}$ denotes the sum over the $0.1|c_{k}|$ points of $c_{k}$ farthest from the centroid (the border points).

OS-Index: The OS-index (Drewes, 2005) is defined as follows:

$OS(C) = \frac{\sum_{c_{k} \in C} \sum_{x_{i} \in c_{k}} OV\left(x_{i}, c_{k}\right)}{\sum_{c_{k} \in C} \frac{10}{\left|c_{k}\right|} \sum \max_{x_{i} \in c_{k}}^{0.1\left|c_{k}\right|}\left\{d_{e}\left(x_{i}, c_{k}\right)\right\}}$  (39)

The modified Hubert Γ statistic: The modified Hubert Γ statistic (Theodoridis and Koutroubas, 1999) is defined as follows:

$\Gamma = \frac{1}{M} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} P(i, j)\, Q(i, j)$  (40)

where N is the number of points in the dataset, $M = \frac{N(N-1)}{2}$, P is the proximity matrix of the dataset, and Q is an N × N matrix.

SD validity index: The SD validity index measures the mean intra- and inter-cluster scattering (Halkidi et al., 2002). The definition of this index is as follows:

$SD\left(n_{c}\right) = a \cdot Scat\left(n_{c}\right) + Dis\left(n_{c}\right)$  (41)

where

$Scat\left(n_{c}\right) = \frac{1}{n_{c}} \sum_{i=1}^{n_{c}} \frac{\left\|\sigma\left(v_{i}\right)\right\|}{\left\|\sigma(X)\right\|}$ and $Dis\left(n_{c}\right) = \frac{D_{\max}}{D_{\min}} \sum_{k=1}^{n_{c}} \left(\sum_{z=1}^{n_{c}} \left\|v_{k} - v_{z}\right\|\right)^{-1}$

S_Dbw validity index: The clusters' underlying characteristics are used by the S_Dbw validity index, which aims to measure the validity of the results produced by the clustering algorithm (Halkidi and Vazirgiannis, 2001). It is defined as follows:

$S\_Dbw\left(n_{c}\right) = Scat\left(n_{c}\right) + Dens\_bw\left(n_{c}\right)$  (42)

where

$Dens\_bw\left(n_{c}\right) = \frac{1}{n_{c}\left(n_{c} - 1\right)} \sum_{i=1}^{n_{c}} \left(\sum_{\substack{j=1 \\ j \neq i}}^{n_{c}} \frac{density\left(u_{ij}\right)}{\max\left\{density\left(v_{i}\right),\, density\left(v_{j}\right)\right\}}\right)$

and $v_{i}$, $v_{j}$ are the centroids of clusters $c_{i}$, $c_{j}$, and $u_{ij}$ is the middle point of the line segment defined by them.

Root-mean-square standard deviation (RMSSTD): The RMSSTD evaluates the square root of the variance of all the attributes used in the clustering process (Davies and Bouldin, 1979). It is defined as:

$RMSSTD = \left[\frac{\sum_{\substack{i=1 \ldots n_{c} \\ j=1 \ldots v}} \sum_{k=1}^{n_{ij}} \left(x_{k} - \bar{x}_{k}\right)^{2}}{\sum_{\substack{i=1 \ldots n_{c} \\ j=1 \ldots v}} \left(n_{ij} - 1\right)}\right]^{1/2}$  (43)

R-squared (RS): The RS index (Sharma, 1996) is defined as follows:

$RS = \frac{SS_{b}}{SS_{t}} = \frac{SS_{t} - SS_{w}}{SS_{t}}$  (44)

Compact-Separated (CS) index: The CS index measures the ratio of the sum of within-cluster scatter to between-cluster separation (Kosters and Laros, 2007). Minimizing the CS index leads to better clustering. Let the within-cluster scatter be denoted as $X_{i}$ and the between-cluster separation be represented as $X_{j}$, such that the distance measure V is given as $V(X_{i}, X_{j})$. Hence, the CS index for a clustering Q is computed as follows (Ezugwu et al., 2020a):

$CS(Q, V) = \frac{\frac{1}{P} \sum_{i=1}^{P} \left[\frac{1}{\left|Q_{i}\right|} \sum_{X_{i} \in Q_{i}} \max_{X_{j} \in Q_{i}}\left\{V\left(X_{i}, X_{j}\right)\right\}\right]}{\frac{1}{P} \sum_{i=1}^{P} \left[\min_{j \in P,\, j \neq i}\left\{V\left(x_{i}, x_{j}\right)\right\}\right]} = \frac{\sum_{i=1}^{P} \left[\frac{1}{\left|Q_{i}\right|} \sum_{X_{i} \in Q_{i}} \max_{X_{j} \in Q_{i}}\left\{V\left(X_{i}, X_{j}\right)\right\}\right]}{\sum_{i=1}^{P} \left[\min_{j \in P,\, j \neq i}\left\{V\left(x_{i}, x_{j}\right)\right\}\right]}$  (45)

where $\left|Q_{i}\right|$ represents the number of data points in cluster $Q_{i}$, the function $V(X_{i}, X_{j})$ is the distance between the within-cluster scatter $X_{i}$ and the between-cluster separation $X_{j}$, $V(x_{i}, x_{j})$ is the distance of data points from their centroids, and P is the number of clusters in Q.
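Many of the indices in this subsection are short enough to compute directly from a partition. As an illustration, a plain-NumPy sketch of the C-index (Eq. (34)) follows (an unoptimized sketch, assuming at least one within-cluster pair; not a reference implementation):

```python
import numpy as np
from scipy.spatial.distance import pdist

def c_index(X, labels):
    labels = np.asarray(labels)
    dvec = pdist(X)                           # condensed pairwise distances
    i, j = np.triu_indices(len(labels), k=1)  # pair indices in pdist order
    within = labels[i] == labels[j]           # within-cluster pair mask
    n_w = int(within.sum())                   # number of within-cluster pairs
    s = dvec[within].sum()                    # S(C): within-cluster sum
    dsorted = np.sort(dvec)
    s_min = dsorted[:n_w].sum()               # sum of the n_w smallest distances
    s_max = dsorted[-n_w:].sum()              # sum of the n_w largest distances
    return (s - s_min) / (s_max - s_min)
```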
Ball–Hall index: The Ball–Hall index gives the mean of the mean dispersion of all the clusters, and it is given as follows (Ball and Hall, 1965):

$\mathrm{Ball\text{-}Hall}(C) = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{n_{k}} \sum_{i \in I_{k}} \left\|M_{i}^{\{k\}} - G^{\{k\}}\right\|^{2}$  (46)

Banfeld–Raftery index: The Banfeld–Raftery index evaluates the weighted sum of the logarithms of the traces of the variance–covariance matrix of each cluster. The definition is given as follows (Banfield and Raftery, 1993):

$C = \sum_{k=1}^{K} n_{k} \log\left(\frac{Tr\left(WG^{\{k\}}\right)}{n_{k}}\right)$  (47)

Det Ratio index: The Det Ratio index is defined as follows (Scott and Symons, 1971):

$\mathrm{Det\ Ratio} = \frac{\det(T)}{\det(WG)}$  (48)

where T is the total scatter matrix and WG the within-group scatter matrix (the sum of the individual cluster matrices).

Baker–Hubert Gamma index: Given two vectors, A and B, of the same dataset size, the Baker–Hubert Gamma index evaluates the two vectors' correlation. The index is an adaptation of the Γ index, and it is defined as follows (Baker and Hubert, 1975):

$C = \Gamma = \frac{S^{+} - S^{-}}{S^{+} + S^{-}}$  (49)

where $S^{+} = \sum_{(r,s) \in I_{B}} \sum_{(u,v) \in I_{W}} \mathbf{1}\{d_{uv} < d_{rs}\}$ and $S^{-} = \sum_{(r,s) \in I_{B}} \sum_{(u,v) \in I_{W}} \mathbf{1}\{d_{uv} > d_{rs}\}$.

GDI index: The Generalized Dunn's Index (GDI) evaluates the intra-cluster and inter-cluster distances (Bezdek and Pal, 1998):

$C = \frac{\min_{k \neq k'} \delta\left(C_{k}, C_{k'}\right)}{\max_{k} \Delta\left(C_{k}\right)}$  (50)

where δ is a measure of inter-cluster distance and Δ is a measure of intra-cluster distance, with 1 ≤ k ≤ K and 1 ≤ k′ ≤ K.

G-plus index: The G-plus index (Rohlf, 1974) is defined as follows:

$G+ = \frac{2 S^{-}}{N_{T}\left(N_{T} - 1\right)}$  (51)

Ksq_DetW index: Also denoted as $k^{2}|W|$, this index (Marriot, 1975) is defined as follows:

$C = K^{2} \det(WG)$  (52)

where WG is the sum of the individual cluster scatter matrices.

Log_Det_Ratio index: This index is the logarithmic version of the Det Ratio index of Eq. (48), and it is defined as follows (Scott and Symons, 1971):

$C = N \log\left(\frac{\det(T)}{\det(WG)}\right)$  (53)

Log_SS_Ratio index: The Log_SS_Ratio index evaluates the ratio of the traces of the matrices BG and WG. The definition of the index is given as follows (Hartigan, 1975):

$C = \log\left(\frac{BGSS}{WGSS}\right)$  (54)

McClain–Rao index: The McClain–Rao index calculates the ratio of the mean intra-cluster and inter-cluster distances (McClain and Rao, 1975). The index is defined as follows:

$C = \frac{N_{B}}{N_{W}} \cdot \frac{S_{W}}{S_{B}}$  (55)

where

$S_{W} = \sum_{k=1}^{K} \sum_{\substack{i,j \in I_{k} \\ i < j}} d\left(M_{i}, M_{j}\right)$ and $S_{B} = \sum_{k < k'} \sum_{\substack{i \in I_{k},\, j \in I_{k'} \\ i < j}} d\left(M_{i}, M_{j}\right)$,

$N_{B}$ is the total number of pairs of points that do not belong to the same cluster, and $N_{W}$ is the total number of pairs of points that belong to the same cluster.

PBM index: The PBM index evaluates the distances between the points and their barycenters, and the distances between the barycenters themselves (Pakhira et al., 2004). PBM is the acronym formed from the initials of its authors (Pakhira, Bandyopadhyay, and Maulik):

$C = \left(\frac{1}{K} \times \frac{E_{T}}{E_{W}} \times D_{B}\right)^{2}$  (56)

where

$D_{B} = \max_{k < k'} d\left(G^{\{k\}}, G^{\{k'\}}\right)$, $E_{W} = \sum_{k=1}^{K} \sum_{i \in I_{k}} d\left(M_{i}, G^{\{k\}}\right)$, and $E_{T} = \sum_{i=1}^{N} d\left(M_{i}, G\right)$.

Point-Biserial index: The Point-Biserial index (Milligan, 1981) is defined as follows:

$C = s_{n} \times r_{pb}(A, B) = \left(\frac{S_{W}}{N_{W}} - \frac{S_{B}}{N_{B}}\right) \sqrt{\frac{N_{W} N_{B}}{N_{T}^{2}}}$  (57)

where

$r_{pb}(A, B) = \frac{M_{A1} - M_{A0}}{s_{n}} \sqrt{\frac{n_{A0}\, n_{A1}}{n^{2}}}$

$M_{A1}$ is the mean of the intra-cluster distances and $M_{A0}$ is the mean of the inter-cluster distances; $s_{n}$ is the standard deviation of A, and $n_{A0}$, $n_{A1}$ are the numbers of elements in each group. Set A represents distances between pairs of cluster points, and the value of B is 1 if a pair of points are in the same cluster and 0 otherwise.

Ratkowsky–Lance index: The Ratkowsky–Lance index (Ratkowsky and Lance, 1978) is defined as follows:

$C = \frac{\bar{R}}{\sqrt{K}} = \frac{\bar{c}}{\sqrt{K}}$  (58)

where

$\bar{c}^{2} = \bar{R} = \frac{1}{p} \sum_{j=1}^{p} \frac{BGSS_{j}}{TSS_{j}}$

and $BGSS_{j}$ is the j-th diagonal term of the matrix BG.

Ray–Turi index: The Ray–Turi index (Ray and Turi, 1999) can be defined as follows:

$C = \frac{1}{N} \cdot \frac{WGSS}{\min_{k < k'} \Delta_{kk'}^{2}}$  (59)

The numerator is the mean of the squared distances of all points with respect to the barycenter of the cluster they belong to; the denominator is the minimum of the squared distances between all the cluster barycenters.
Scott–Symons index: The Scott–Symons index evaluates, for each cluster, the weighted sum of the logarithms of the determinant of the variance–covariance matrix (Scott and Symons, 1971):

$C = \sum_{k=1}^{K} n_{k} \log \det\left(\frac{WG^{\{k\}}}{n_{k}}\right)$  (60)

where the $WG^{\{k\}}$ are the within-cluster scatter matrices, whose determinants are assumed positive.

Tau index: The Tau index can be defined as follows:

$C = \frac{s^{+} - s^{-}}{\sqrt{N_{B} N_{W} \frac{N_{T}\left(N_{T} - 1\right)}{2}}}$  (61)

When inter-cluster and intra-cluster distances are equal, the numerator is not affected because $s^{+}$ and $s^{-}$ do not count ties.

Trace_W index: This index is defined as follows (Edwards and Cavalli-Sforza, 1965):

$C = Tr(WG) = WGSS$  (62)

where WGSS is the within-cluster sum of squares and WG is the sum of the within-cluster scatter matrices over all the clusters.

Trace_WiB index: The Trace_WiB index (Friedman and Rubin, 1967) can be defined as follows:

$C = Tr\left(WG^{-1} \cdot BG\right)$  (63)

Wemmert–Gançarski index: The Wemmert–Gançarski index evaluates the weighted mean of the quantities $J_{k}$ over all the clusters. This index is defined as follows:

$C = \frac{1}{N} \sum_{k=1}^{K} \max\left\{0,\; n_{k} - \sum_{i \in I_{k}} R\left(M_{i}\right)\right\}$  (64)

where, for M belonging to cluster $C_{k}$:

$J_{k} = \max\left\{0,\; 1 - \frac{1}{n_{k}} \sum_{i \in I_{k}} R\left(M_{i}\right)\right\}$ and $R(M) = \frac{\left\|M - G^{\{k\}}\right\|}{\min_{k' \neq k}\left\|M - G^{\{k'\}}\right\|}$

6.3.2.2. External validation criteria

In the notation used for the external measures, $m_{i,h}$ denotes the number of instances that are in cluster $C_{i}$ and in class $c_{h}$, $m_{\cdot,h}$ the total number of instances in class $c_{h}$, and $m_{i,\cdot}$ the number of instances in cluster $C_{i}$.

Rand index: The Rand index refers to the similarity between the partitions produced by the clustering algorithm and the dataset's underlying structure (Rand, 1971; Ezugwu et al., 2020a). The index is defined as follows:

$RAND = \frac{TP + TN}{TP + FP + FN + TN}$  (67)

F-measure: The consequence of equal weighting for the false positives and false negatives is that we usually end up with undesirable features. The F-measure index uses the weighting recall parameter η > 0 to balance the false negatives (Rijsbergen, 1979; Ezugwu et al., 2020a). The F-measure is defined as follows:

$F = \frac{\left(\eta^{2} + 1\right) P \cdot R}{\eta^{2} P + R}$  (68)

where P is the precision rate and R is the recall rate. The influence of recall ranges from none (η = 0) to increasing effect as η increases.

Jaccard index: The Jaccard index evaluates the ratio of the intersection of the elements of both datasets to the union of the elements in both datasets. The Jaccard index is defined as follows:

$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{TP}{TP + FP + FN}$  (69)

with $0 \leq J(A, B) \leq 1$ (J(A, B) is defined as 1 when A and B are both empty).

Fowlkes–Mallows index: The Fowlkes–Mallows index measures the compactness of the clusters obtained from a clustering algorithm; maximizing the index indicates higher similarity (Fowlkes and Mallows, 2010; Ezugwu et al., 2020a). The Fowlkes–Mallows index is defined as follows:

$FM = \sqrt{\frac{TP}{TP + FP} \cdot \frac{TP}{TP + FN}}$  (70)

NMI measure: The normalized mutual information (NMI) is defined as follows:

$NMI(X, Y) = \frac{I(X, Y)}{\sqrt{H(X)\, H(Y)}}$  (71)

where I(X, Y) is the mutual information between the partitions X and Y, and H(X) and H(Y) are their entropies.
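Several of these external indices are available off the shelf; a minimal sketch, assuming a recent scikit-learn (the ground-truth and predicted labels are illustrative):

```python
# Rand (Eq. (67)), Fowlkes-Mallows (Eq. (70)) and NMI (Eq. (71))
# comparing a clustering against ground-truth labels.
from sklearn.metrics import (fowlkes_mallows_score,
                             normalized_mutual_info_score,
                             rand_score)  # rand_score needs scikit-learn >= 0.24

y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 0]

print("Rand           :", rand_score(y_true, y_pred))
print("Fowlkes-Mallows:", fowlkes_mallows_score(y_true, y_pred))
print("NMI            :", normalized_mutual_info_score(y_true, y_pred))
```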
6.3.2.3. Relative validation criteria

involves a lot of statistical testing (Halkidi et al., 2002; Ezugwu et al., 2020a). Assume the clustering problem is defined as follows:

''Let $P_{alg}$ be the set of parameters associated with a specific clustering algorithm (e.g. the number of clusters $n_{c}$). Among the clustering schemes $C_{i}$, $i = 1, \ldots, n_{c}$, defined by a specific algorithm, for different values of the parameters in $P_{alg}$, choose the one that best fits the data set'' (Halkidi et al., 2002; Ezugwu et al., 2020a).

The following cases hold:

∙ $P_{alg}$ does not contain $n_{c}$ as a parameter. The idea here is to tune the parameters over a wide range of values, run the clustering algorithm, and choose the widest range for which $n_{c}$ remains constant (normally $n_{c} \ll N$, where N is the number of tuples).

∙ $P_{alg}$ contains $n_{c}$ as a parameter. First define a minimum and a maximum for $n_{c}$, then run the algorithm r times for each $n_{c}$ between the minimum and maximum, tuning the remaining parameters during each run. Then plot the best values of the index obtained against $n_{c}$; the plot may indicate the best number of clusters.
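A minimal sketch of the second case, assuming scikit-learn and matplotlib (the use of K-Means and of the Davies-Bouldin index as the relative index is an illustrative choice):

```python
# Sweep n_c, record a validity index per run, and inspect the plot.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=600, centers=4, random_state=0)

n_c_range = list(range(2, 11))
scores = [
    davies_bouldin_score(
        X, KMeans(n_clusters=n_c, n_init=10, random_state=0).fit_predict(X))
    for n_c in n_c_range
]

plt.plot(n_c_range, scores, marker="o")
plt.xlabel("number of clusters $n_c$")
plt.ylabel("Davies-Bouldin index (lower is better)")
plt.show()
```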
7. Trending application areas of clustering algorithms

Clustering algorithms can be applied to many different domains. This section presents the diverse areas in which cluster analysis has been successfully utilized. Specifically, we present the applicability of the clustering algorithms reviewed in Section 4 to the fields of medicine, the financial sector, artificial intelligence, the aviation sector, the marketing and sales sector, industry and manufacturing, urban development, privacy protection, and robotics. Fig. 10 summarizes these fields by presenting a taxonomy of the applicability of clustering algorithms.

7.1. Web usage

The ever-growing size of electronic data originating from the proliferation of web applications has motivated the exploration of hidden information in text content. Nirkhi and Hande (2008) presented a summary of the use of web page rank algorithms (such as Hyperlink Induced Topic Search and PageRank) and web page-based clustering algorithms (such as the Suffix Tree, Vivisimo, and Lingo clustering algorithms). Similarly, Ivancsy and Kovacs (2006) listed some newer and relevant approaches to web clustering algorithms, such as fuzzy clustering.

The web server logs information for each user accessing a page, including IP address, time of access, file path, browser, and amount of transferred data. Vast volumes of web server log data are generated every day, and they can be used for commercial and non-commercial applications such as designing online shops or providing users with personalized content in digital libraries. Madhulatha (2015) supported this claim, noting that clustering algorithms have been used for library book ordering, document classification, and clustering web log data to discover groups of similar access patterns. Another study proposed using the KEA-Means algorithm, which combines the keyphrase extraction algorithm with the K-Means algorithm, to generate clusters of web documents from a dataset (Ware and Dhawas, 2012). Lin et al. applied a novel hierarchical clustering algorithm (HSClus) to extract a similarity matrix among pages via in-page and cross-page link structures, resulting in clusters that hierarchically group densely linked web pages into semantic clusters (Lin et al., 2009). Tang Rui et al. (2012) carried out an investigative study to discover the possibilities of applying other nature-inspired optimization algorithms, namely Fireflies, Cuckoos, Bats, and Wolves, to clustering over Web Intelligence data. Sardar and Ansari (2018a) applied the K-Means algorithm, using the MapReduce programming model, to the task of document clustering (Sardar and Ansari, 2018a,b).

7.2. Speech processing

Speech is an important communication medium among humans, and even among animals. The large volumes of data sourced from communication on phone and tele-media platforms may require clustering methods to discover knowledge. Sonkamble and Doye proposed a clustering algorithm named Modified K-MeansLBG and applied it to obtain a good codebook for vector quantization, which is widely used in speech coding, image coding, speech recognition, speech synthesis, and speaker recognition (Sonkamble and Doye, 2012). Rovetta et al. applied fuzzy clustering to the task of recognizing emotion from speech signals, in addition to using probabilistic, possibilistic, and graded-possibilistic c-means techniques (Rovetta et al., 2019). Neel (2005) used K-Means and fuzzy K-Means clustering to capture phonetic classification effectively, using clustering to identify the kinds of phonetic features to group and to assess the efficiency of automatic clustering on speech data. In another study, Bach and Jordan (2006) applied spectral clustering to the blind one-microphone speech separation problem, casting the problem as one of segmentation of the spectrogram. Vani and Anusuya (2019) investigated the performance of clustering techniques (such as the K-Means, Fuzzy C-Means, and Kernel Fuzzy C-Means algorithms) for clustering noisy speech signals. Single-space and multi-level clustering methods have also been applied to speech processing problems (Räsänen, 2007). Fig. 10 gives the taxonomy of clustering algorithm applications.

7.3. Medical science: disease onset and progression

The healthcare sector is one of the predominant fields of human endeavor that requires improvement alongside contemporary society's development, and it is one of the main sectors that impact members of the public. Presently, clustering analysis has contributed substantially to the transformation of healthcare services. Clustering algorithms have contributed tremendously to the disease-diagnosis aspects of healthcare services, for example simplifying the ocular disease detection process using retinal blood vessel segmentation (Waheed et al., 2015), detecting tumors using the K-Means clustering algorithm (Patel et al., 2013), and detecting neovascularization in retina images by employing multivariate m-Medoids clustering algorithms (Akram et al., 2013). Besides diagnosis, medical imaging is another crucial aspect of medicine that plays a role in patient treatment. It assists the medical research domain in investigating different parts of human anatomy and understanding the effects of particular illnesses. For instance, mean-shift clustering has been employed in blood-oxygen-level-dependent functional MRI activation detection (Ai et al., 2014). Also, semi-supervised clustering has been utilized for brain image segmentation (Saha et al., 2016). Similarly, fuzzy clustering, in a hybrid approach, has been successfully applied to segment inhomogeneous medical images (Rastgarpour et al., 2014), and medical image analysis uses spectral clustering (Kuo et al., 2014). Recent research has been conducted on genetics, a much-discussed area of modern medicine, to produce personalized treatment systems, and clustering analysis has facilitated such research (Datta and Datta, 2003; Aouf et al., 2008; Oyelade et al., 2016).

In a study (Magoeva et al., 2018), the authors investigated the possibility of applying some clustering algorithms (such as DBSCAN and K-Means) to detect critical conditions in patients with coronary syndrome using medical-parameter time series. The study successfully performed a preliminary analysis showing that outlier clustering is viable for potential outlier classification and analysis. Their results revealed that the investigated clustering algorithms achieve a moderate performance in critical patient detection. Another study (Newcomer et al., 2011) reported using cluster analysis to identify sub-populations of complex patients who may benefit from targeted care management strategies. This was achieved by using a massive 2-year cohort of health maintenance organization members with two or more chronic conditions.
A new study proposes a medical-oriented big data clustering algorithm using an improved immune evolutionary method (an evolutionary algorithm combined with FCM), applying the clustering algorithm to medical data through encoding, constructing a fitness function, and selecting genetic operators (Yu et al., 2020). To unravel patient heterogeneity, Forte et al. (2019) applied selected hard and soft clustering algorithms (chosen based on the expected overlap between sub-phenotypes and the size of the dataset) to conduct a panorama of clustering analyses using heterogeneous and complex ICU data. In a different study (Venkataramana et al., 2017), the authors applied clustering methods (Fuzzy C-Means and K-Means) to speed up the analysis of patients' samples in high-volume hospitals, to aid in deciding the stage of the disease.

7.4. Image processing and segmentation

Clustering methods have been applied to the problems of image processing and image segmentation. While image processing describes a situation where a wide range of computational operations are applied to an image for knowledge discovery or image refinement, image segmentation is one component of image processing. Image segmentation can be expressed as the exhaustive partitioning of an input image into regions such that each is homogeneous with respect to some image property. Chopade and Sheetlani (2017) reported that studies had employed evolutionary fuzzy clustering methods with knowledge-based evaluation to handle image segmentation problems; a GA-based clustering method was applied to the problem, formulated as an optimization, so that clustering of small regions in a color feature space was achieved. A study by Saxena et al. (2017) reported that K-Means had been successfully applied to image segmentation. The quality of images produced by Magnetic Resonance Imaging (MRI) for visualizing the internal structures of objects and living organisms makes MRI a candidate for the segmentation task, which can be formulated as a clustering problem in which feature vectors, obtained through transformed image measurements and pixel positions, are grouped into several structures. Dhanachandra et al. combined the K-Means and subtractive clustering algorithms for image segmentation: they pre-processed the image using partial stretching enhancement and used the hybridized algorithm to generate the initial centers. The K-Means algorithm then uses the initial centers to segment the image, before unwanted regions are removed using a median filter (Dhanachandra et al., 2015). In a similar work (Chena et al., 2015), the authors applied the DP clustering algorithm: they directly provided the cluster number of the image based on the decision graph and identified cluster centers to carry out hierarchical segmentation. In another study (Nameirakpam and Jina, 2015), the authors combined local histogram equalization and K-Means clustering for image segmentation. Bora and Gupta carried out a similar study using a hard K-Means clustering algorithm with the cosine distance measure, while filtering and analyzing the segmented image with the Sobel filter and watershed algorithm (Bora and Gupta, 2014).

Gulhane et al. (2012) investigated the association between the densities of data points in a given dataset of image pixels using K-Means and the M-step separately. Parida (2018) carried out image segmentation using fuzzy C-Means clustering, applying it to the variance feature image to separate its transitional features (Parida, 2018). An attempt to apply clustering algorithms to the novel COVID-19 disease was made by the authors in ElazizID et al. (2020), who carried out image segmentation on COVID-19 Computed Tomography (CT) images. They proposed a clustering method that improves density peaks clustering (DPC) combined with the generalized extreme value (GEV) distribution.
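A minimal sketch of K-Means colour segmentation in the spirit of the studies above, assuming scikit-learn and Pillow (the file name and cluster count are placeholders, and this is not the pipeline of any particular cited work):

```python
# Cluster the pixels of an image in RGB space and replace
# each pixel by its cluster centre.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

image = np.asarray(Image.open("photo.jpg"), dtype=np.float64) / 255.0
h, w, c = image.shape
pixels = image.reshape(-1, c)  # one row per pixel

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(pixels)
segmented = kmeans.cluster_centers_[kmeans.labels_].reshape(h, w, c)

Image.fromarray((segmented * 255).astype(np.uint8)).save("segmented.jpg")
```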
7.5 Information retrieval

The field of information retrieval is focused on finding effective computational approaches to automate the storage and retrieval of documents. This information retrieval task has proven relevant in online search engines and library management systems. A study (Manning et al., 2009) summarized potential application areas for clustering methods in information retrieval, such as search result clustering, scatter–gather, collection clustering, language modeling, and cluster-based retrieval. Vries (2014) used two novel clustering algorithms (TopSig K-tree and EM-tree) to evaluate document cluster quality for large-scale collections by first performing document clustering in information retrieval. The authors in Prabhu (2011) designed a technique for information retrieval using K-Means clustering. Elbattah and Molloy (2017) used K-Means to support decision-making regarding elderly healthcare in Ireland, with a particular focus on hip fracture care. Another study (Bellot and El-Bèze, 2000) applied hierarchical clustering methods to classify sets of documents retrieved by any information retrieval system. Liu and Croft performed cluster-based retrieval using K-Means in conjunction with language models for the task of information retrieval (Liu and Croft, 2004).
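A minimal sketch of the cluster-based retrieval idea (the toy corpus, the query, and the two-cluster choice are assumptions, not data from the studies above) clusters TF-IDF document vectors with K-Means and then ranks only the documents in the cluster nearest to the query:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

docs = ["stock markets fell sharply", "central bank raises interest rates",
        "new vaccine trial shows promise", "hospital reports flu outbreak"]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)                      # documents as TF-IDF vectors
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

query = vec.transform(["interest rates and markets"])
best = cosine_similarity(query, km.cluster_centers_).argmax()  # route the query
members = [i for i, c in enumerate(km.labels_) if c == best]
scores = cosine_similarity(query, X[members]).ravel()          # rank members only
print([docs[members[i]] for i in scores.argsort()[::-1]])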
7.6 Aviation and automotive systems

Clustering methods have been applied to aviation- and automotive-related issues. Specifically, clustering algorithms have been used to detect faults, manage emergencies, and support proactive risk management. Li et al. (2015) investigated the applicability of cluster-based solutions to detect flight anomalies, aiding the detection of associated risks in routine airline operations. They used DBSCAN to achieve a cluster analysis that successfully detected abnormal flights. In addition, multiple-kernel anomaly detection was able to identify operationally significant anomalies, surpassing the capability of exceedance detection. In a related study, the authors applied trajectory clustering by deriving a framework to monitor aircraft behavior in a given airspace (Gariel et al., 2011). This trajectory clustering method was achieved using the K-Means and DBSCAN algorithms. Wang and Pham proposed using ANOVA and Scheffé post hoc tests with a clustering analysis task to evaluate and improve service networks operated at airports (Wang and Pham, 2020). A similar study (Rose et al., 2020) applied the K-Means algorithm with a combination of 2-D mapping and t-Distributed Stochastic Neighbor Embedding (t-SNE) in a cluster post-processing routine that identifies driving factors in each cluster and builds a hierarchical structure of cluster and sub-cluster labels. Mangortey et al. (2020) investigated the performance of model-based clustering, the Self-Organizing Tree Algorithm (SOTA), Divisive Analysis (DIANA), agglomerative hierarchical clustering, PAM, CLARA, and K-Means clustering algorithms. The algorithms were used to group similar flights and identify abnormal operations and anomalies.
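The anomaly-detection use of DBSCAN can be sketched as follows (a toy illustration under assumed per-flight features and parameter values, not the data of Li et al.): flights that fall in no dense region of the feature space receive the label -1 and are flagged for review.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
# Hypothetical per-flight features: approach speed (kt), descent rate (ft/min).
normal = rng.normal([140.0, 700.0], [5.0, 50.0], size=(200, 2))
odd = np.array([[175.0, 1200.0], [110.0, 300.0]])   # two abnormal flights
flights = np.vstack([normal, odd])

X = StandardScaler().fit_transform(flights)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print("flagged as anomalous:", np.where(labels == -1)[0])  # noise points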
7.7 Bioinformatics

The application of clustering methods to bioinformatics can be grouped into two categories, namely, (i) the analysis of gene expression data generated from DNA microarray technologies and (ii) clustering processes that work directly on linear deoxyribonucleic acid (DNA) or protein sequences (Saxena et al., 2017). Gene expression analysis has benefited from clustering methods, which enable biologists to identify patterns within datasets in this domain. A huge amount of molecular biology data is being generated from different experiments in this domain. Such data consist of gene expression features, which capture the abundance of the corresponding gene product, together with concurrent measurements of the expression levels of thousands of genes under hundreds of conditions. Data generated in this form are often represented in a data matrix, in which rows are genes and columns are different experimental conditions, different tissues, consecutive time slots, or different patients. Other tasks that have benefited from clustering algorithms include: identification of homology; discovering the natural structure inherent in gene expression data; identifying subtypes of cells; understanding gene functions and gene regulation; and mining useful information from noisy data (Oyelade et al., 2016; Guzzi et al., 2014).

The authors in Chopade and Sheetlani (2017) disclosed that, in handling microarray data analysis, researchers employed the clustering method to group thousands of genes by the similarity of their expression levels, thereby supporting the task of analyzing gene expression profiles. Similarly, it is widely reported that Genetic Convex Clustering Algorithms (GCCA) have been successfully applied to address the problem of clustering on an unbounded number of processors. A study in Sugavaneswaran (2017) stated that clustering algorithms had been used in gene expression analysis: the genes are analyzed and grouped based on similarity of profiles using one of the widely used K-Means clustering algorithms. Lakhani et al. reported that, to analyze biological sequences and group them into similar genes, clustering algorithms such as evolutionary clustering, hierarchical clustering, K-Means, and bi-clustering have been successfully applied to the domain (Lakhani et al., 2015). Mishra et al. also reinforce the observation that the K-Means clustering algorithm has proven relevant in bioinformatics (Mishra et al., 2015). Research in Zahoor and Zafar (2020) proposed a warzone-inspired infiltration tactics-based optimization algorithm (ITO) to classify microarray gene expression. Nunes (2011), in a Master's thesis, proposed time series-based clustering algorithms for separating and organizing unlabeled data into groups whose biomedical signals are similar to each other. Two variants of the K-Means genetic algorithm (KGA) were applied to the selection process of the GA; these algorithms can be applied to medical informatics and bioinformatics (Chehouri et al., 2017). In the study (Kordos and Blachnik, 2019), the authors attempt to overcome a genetic algorithm's limitation by applying a fuzzy c-means (FCM) clustering algorithm to reduce the chromosome size and solve the cluster border problem by allowing the clusters to overlap.
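The gene-by-condition data matrix described above maps directly onto standard clustering APIs. The sketch below (synthetic expression values and an assumed two-group structure) groups genes with similar expression profiles across conditions using K-Means:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Synthetic expression matrix: rows are genes, columns are conditions.
up = rng.normal(2.0, 0.3, size=(30, 8))      # co-regulated, up-regulated genes
down = rng.normal(-2.0, 0.3, size=(30, 8))   # co-regulated, down-regulated genes
expr = np.vstack([up, down])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(expr)
for c in range(2):
    print(f"cluster {c}: {np.sum(km.labels_ == c)} genes")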
7.8 Financial systems and economics

The sensitivity of, and the level of security mechanisms built around, the systems associated with banking and financial institutions require that the data generated from them be intelligently handled by automated systems. Cai et al. reported that banking and financial institutions are widely using clustering methods to capture the natural structure of data in a quest to improve their customer services and profit margins (Cai et al., 2010). Dardac and Boitan carried out a study to investigate the performance of clustering analysis on data from Romanian institutions. They aimed to uncover banks' risk profiles by grouping such institutions into smaller, homogeneous clusters to assess which credit institutions have similar patterns according to their risk profile and profitability (Dardac and Boitan, 2019). In another study, the author presented research into the main cluster-type methodologies for grouping data for use in economic fields (Stefan, 2014). The study selected six macroeconomic indicators necessary to reveal a country's economic development and then applied hierarchical-based cluster methods on sets of complex and heterogeneous data. In a related study, Brauksa (2013) applied K-Means cluster analysis to compare the socio-economic development of different municipalities. Another study curated economic data, carried out clustering analysis using K-clustering, and proposed techniques that use similarity measures for data files with nominal variables (Řezanková, 2014). In a different consideration of the applicability of clustering methods, the authors in Novaliendry et al. (2015) performed clustering using K-Means to ease the handling of extensive data in a systematic, detailed list of receipts, expenditures, and local spending within a year. An improved K-Means algorithm based on historical financial ratios was applied to analyze indicators related to the economic attributes of enterprises listed in Zhejiang province (Qian, 2006). A different study (Boyko et al., 2020) combined the Apriori and K-Means clustering algorithms to obtain a user behavior analysis template that predicts a person's location for the next month.

The banking sector is one of the key thriving sectors at the forefront of global digitization. However, numerous threats have surfaced to hamper the advancement of digital banking initiatives. Clustering analysis can offer a convenient solution to such threats. One of the big threats expected in the banking sector is money laundering. The money laundering issue can be addressed by implementing Density-Based Spatial Clustering of Applications with Noise (DBSCAN) in an anti-money laundering regulatory application system. In such a case, DBSCAN is employed for detecting and reporting suspicious banking transactions. The anti-money laundering regulatory application system has been evaluated on large financial data, where it successfully detected and prevented money laundering in likely suspicious transactions (Yang et al., 2014).

Every banking institution must acknowledge the reality of the threats emanating from customers' ignorance of basic safe-banking practices, such as giving out confidential information like security passwords and PINs to strangers, which results in huge bank fraud. Banking institutions can apply clustering analysis to identify such customers and give them extra warnings to be cautious about such activities, in addition to the general notices provided to all customers. K-Means++ and K-Means clustering approaches have recorded tremendous success in estimating the share of cardholders vulnerable to bank scams such as skimming and phishing (Alkhasov et al., 2015). Besides threat mitigation in banking institutions, clustering can also support other vital applications such as location siting for optimal business extension. Every banking institution would like to expand its service-coverage locations to attract more customers, thereby growing the organization's profit margins. This can be implemented by strategically installing e-corners and ATMs to cover important economic locations and by creating branches at locations with excessive demand.

We can observe that the location-siting option plays a vital role in expanding the financial economy. Such a task can be addressed by applying the generalized density-based clustering algorithm GFDBSCAN (Kisore and Koteswaraiah, 2017). The contemporary world is heavily reliant on industrial relationships, and financial crises are crucial to any nation. Thus, long-term prediction of failure in the banking sector presents a significant challenge. Fuzzy refinement domain adaptation offers a good answer to such a challenge (Behbood et al., 2013).
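A minimal sketch of the DBSCAN-based suspicious-transaction idea (the transaction features, parameter values, and synthetic data are assumptions, not details of the system evaluated by Yang et al.) flags as noise the transactions that fall in no dense region of the feature space:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(7)
# Hypothetical features per transaction: amount and daily frequency.
routine = np.column_stack([rng.normal(80, 20, 500), rng.normal(3, 1, 500)])
layering = np.array([[9900, 40], [9800, 35], [9950, 38]])  # structured deposits
tx = np.vstack([routine, layering])

X = StandardScaler().fit_transform(tx)
labels = DBSCAN(eps=0.5, min_samples=15).fit_predict(X)
print("transactions to review:", np.where(labels == -1)[0])  # noise points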
7.9 Robotics

Asmaa and Sadok presented a cluster-based solution to the Multi-Robot Task Allocation (MRTA) problem (Asmaa and Sadok, 2019). The authors proposed a clustering algorithm based on a dynamic distributed PSO technique and called it ACD2PSO. The new clustering algorithm first groups the robot tasks into clusters using dynamic distributed particle swarm optimization (D2PSO) and then allocates the robots to the clusters using the concept of the multiple traveling salesman problem (MTSP). In another study, Guérin et al. (2018) addressed the problem of unsupervised robotic sorting (URS) using a combination of deep CNN feature extraction and standard clustering algorithms (K-Means, Minibatch K-Means (MBKM), Affinity Propagation, Mean Shift, agglomerative hierarchical clustering, and DBSCAN) to implement an industrial robotic system. Using hierarchical clustering, Arslan et al. (2016) addressed the problem of feedback motion planning and control to achieve a provably correct, computationally efficient coordinated multi-robot motion design. Boldt-Christmas and Wong (2015) applied a shift clustering algorithm that allows for the placement of robotic agents anywhere within communicative range. Janati et al. applied the K-Means clustering algorithm to the problem of assigning tasks to robots, so that the robots can handle their assigned tasks effectively and efficiently (Janati et al., 2017).
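The cluster-then-allocate idea can be sketched in a few lines (a toy two-stage illustration with assumed task and robot positions and a simple greedy assignment, not the ACD2PSO method itself):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
tasks = rng.uniform(0, 100, size=(40, 2))        # task locations on the floor
robots = np.array([[10.0, 10.0], [90.0, 20.0], [50.0, 90.0]])

# Stage 1: group tasks into as many clusters as there are robots.
km = KMeans(n_clusters=len(robots), n_init=10, random_state=0).fit(tasks)
# Stage 2: greedily assign each cluster to the closest free robot.
dist = np.linalg.norm(km.cluster_centers_[:, None] - robots[None, :], axis=2)
for cluster in range(len(robots)):
    robot = np.argmin(dist[cluster])
    dist[:, robot] = np.inf                      # mark the robot as taken
    print(f"robot {robot} handles task cluster {cluster}")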
a new clustering algorithm (named Video Representation based High-
7.10 Text mining Density Peaks Search (VRHDPS)) to analyze properties of video and
gather similar frames into the cluster. In another related work on video
Text mining applications are essential to research, considering the surveillance, Damnjanovic et al. applied a spectral clustering algorithm
ever-growing databases across different platforms. It is widely used that detects events from video through two types of summaries: static
in several sectors such as publishing and media; telecommunications, and dynamic (Damnjanovic et al., 2008).
energy, and other services industries; information technology sector
and Internet; banks, insurance and financial markets; political insti- 7.12 Marketing
tutions, political analysts, public administration and legal documents;
pharmaceutical and research companies and healthcare (G. & K., 2015). Considering the wide range of clustering algorithms available in the
With this wide range of applicability, clustering analysis has also been research domain, different studies have applied them to the problem
researched and applied to the task of text mining. of analyzing and predicting customers so that services are provided
The automation of extracting knowledge from text documents ex- for them based on their requirements. In addition to this, automating
ploits the solutions provided by clustering algorithms to fast-track this market segmentation tasks, new product development, and product
task. In-text documents mining, clusters are built based on themes positioning have become possible through clustering methods.
from text documents. This is often achieved by first transforming text Marketers have taken advantage of the automation of customer
documents into high-dimensional feature vectors based on frequency recommendation systems for processing reviews from customers. Clus-
features. This results in a data matrix that contains rows and columns, tering methods are now being widely applied to this task by using
where columns represent the count of one particular term. Further- captured customer reviews and grouping them into reviews with similar
more, the processed documents are grouped based on the frequency of preferences for market analysis. Such analysis provides markets with
words and similar words in a subset of terms; these subsets are related means for strategizing on aggressive marketing to beat or outper-
to the theme of the documents. This may further reveal some important form their market competitors. Clustering algorithms have provided
knowledge about the documents ranging from knowing the different solutions to aid this process by grouping customers with overlapping
themes a document may consist of and how to deduce the category preferences based on product type. K-Means and EM clustering al-
to which the document may be assigned. Studies have shown that the gorithms were used in Hanumanth Sastry and PrasadaBabu (2013)
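The document-to-matrix transformation and a spherical-K-Means-style grouping can be sketched as follows (toy headlines; L2-normalizing TF-IDF vectors so that Euclidean K-Means approximates cosine-based spherical K-Means is a common simplification, not the exact multi-cluster SKM of Tunali et al.):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize
from sklearn.cluster import KMeans

headlines = ["team wins league title", "striker scores twice in final",
             "parliament passes budget", "minister resigns over budget row"]
# TfidfVectorizer already L2-normalizes by default; calling normalize()
# makes the spherical (unit-length) intent explicit.
X = normalize(TfidfVectorizer().fit_transform(headlines))
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(dict(zip(headlines, km.labels_)))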
7.11 Video surveillance

Text-based data may appear to be increasing in size on the internet and across different automated systems, but sometimes the amount of information or knowledge we can discover from them might not be comparable with video files. A small video file can contain more information than text documents and other media files such as audio and images. Asad et al. (2019) proposed a clustering algorithm that is a variation of the hierarchical agglomerative clustering algorithm; the algorithm was designed to cluster face images of humans. Another study (Auslander et al., 2011) applied a combination of the K-Means clustering algorithm and the k-NN Localized p-value Estimator (KNN-LPE) to surveillance tasks for detecting threats in the domain of ground-based maritime video surveillance. The authors in Wu et al. (2015) formulated static video summarization as a clustering problem by using a new clustering algorithm (named Video Representation based High-Density Peaks Search, VRHDPS) to analyze properties of a video and gather similar frames into clusters. In another related work on video surveillance, Damnjanovic et al. applied a spectral clustering algorithm that detects events from video through two types of summaries: static and dynamic (Damnjanovic et al., 2008).
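A static-summarization sketch in this spirit (synthetic frame histograms and K-Means standing in for VRHDPS; the two-scene structure is an assumption) clusters per-frame color histograms and keeps the frame nearest each centroid as a keyframe:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
# Stand-ins for per-frame 16-bin intensity histograms of two scenes.
dark = rng.dirichlet(np.r_[np.full(8, 5.0), np.full(8, 0.5)], size=60)
bright = rng.dirichlet(np.r_[np.full(8, 0.5), np.full(8, 5.0)], size=60)
frames = np.vstack([dark, bright])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(frames)
for c, center in enumerate(km.cluster_centers_):
    members = np.where(km.labels_ == c)[0]
    # Keyframe: the member frame closest to the cluster centroid.
    key = members[np.argmin(np.linalg.norm(frames[members] - center, axis=1))]
    print(f"keyframe for scene {c}: frame {key}")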
7.12 Marketing

Considering the wide range of clustering algorithms available in the research domain, different studies have applied them to the problem of analyzing and predicting customers so that services are provided to them based on their requirements. In addition, automating market segmentation tasks, new product development, and product positioning have become possible through clustering methods.

Marketers have taken advantage of the automation of customer recommendation systems for processing reviews from customers. Clustering methods are now widely applied to this task by taking captured customer reviews and grouping them into sets of reviews with similar preferences for market analysis. Such analysis provides marketers with means for strategizing aggressive marketing to beat or outperform their market competitors. Clustering algorithms aid this process by grouping customers with overlapping preferences based on product type. K-Means and EM clustering algorithms were used in Hanumanth Sastry and PrasadaBabu (2013) to analyze a steel company's annual sales data with respect to sales volume and value and dependent attributes such as products, customers, and quantities sold. In a similar report, the author confirmed that clustering methods had been applied to finding groups of customers with similar behavior, given a large database of customer data containing their properties and past buying records (Madhulatha, 2015). Rajagopal attempted to identify high-profit, high-value, and low-risk customers using one of the data mining techniques, customer clustering (Rajagopal, 2011); Örnek and Subaşı developed a methodology using K-Means and EM to identify the characteristics of customers (Örnek and Subaşı, 2011). A study (M., Hanji, & Hanji, 2014) attempted to achieve market segmentation of customers by using an enhanced K-Means that employs similarity measures for efficient segmentation of two-wheeler market data. In a similar study, Piggott (2015) designed a new airline-based market segmentation model with data clustering by investigating the performance of K-Means, expectation–maximization (EM), X-Means, hierarchical, and random clustering.
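A customer-segmentation sketch (hypothetical spend and frequency attributes and a two-segment assumption, not the datasets used in the studies above) standardizes the features so that one attribute does not dominate and then applies K-Means:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(11)
# Hypothetical customers: [annual spend, purchases per year]
budget = np.column_stack([rng.normal(300, 60, 100), rng.normal(4, 1, 100)])
premium = np.column_stack([rng.normal(5000, 800, 30), rng.normal(25, 5, 30)])
customers = np.vstack([budget, premium])

X = StandardScaler().fit_transform(customers)    # keep spend from dominating
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for c in range(2):
    seg = customers[km.labels_ == c]
    print(f"segment {c}: {len(seg)} customers, mean spend {seg[:, 0].mean():.0f}")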
7.13 Object recognition and character recognition

A wide range of clustering methods and algorithms has been proposed for grouping views of 3D objects for object recognition in range data. This is made possible by first representing each object in terms of a library of range images of that object, so that clustering tasks can be applied. Some studies, such as Saxena et al. (2017), Chellapilla et al. (2006), and Connell and Jain (1999), identified lexemes in handwritten text for writer-independent handwriting recognition using clustering algorithms.
7.14 Data mining and big data mining

Nature-inspired optimization algorithms have been applied to transform high-dimensional time series data onto a lower-dimensional space by selecting important points in the time series (Fuad, 2017). Nanda (2014) addressed the problem of efficiently handling large data related to environmental disaster management by proposing an enhancement to nature-inspired clustering algorithms through the development of multi-objective and constrained approaches. Sardar and Ansari (2018a,b) emphasized the need to exploit the MapReduce programming paradigm for handling clustering challenges in real-world large-dataset clustering through evolving new partition-based clustering algorithms.

7.16 Data transfer through network

Nowadays, huge amounts of user-generated data are shared online via social media sites and forums. An effective transmission system is required to attain high-speed data transmission and prevent setbacks, and the application of clustering analysis can assist in such situations. For instance, DBSCAN, the AutoClass algorithm, and K-Means can be used to group similar traffic using transport-layer statistics centered on each application's unique features as data are transmitted over a particular network (Erman et al., 2006). This facilitates data transfer by transferring clusters of the same traffic over the network together. It is also preferred that transmitting sensor data through a sensor network to the processing end be achieved using the lowest energy. In such instances, hierarchical clustering has greatly impacted such tasks by significantly decreasing power consumption (Erman et al., 2006). This can be performed by partitioning wireless sensor networks into clusters.
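The traffic-grouping idea can be sketched as follows (assumed per-flow transport-layer statistics with synthetic values; K-Means stands in for the algorithms named by Erman et al.):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Hypothetical transport-layer statistics per flow:
# [mean packet size (bytes), flow duration (s)]
bulk = np.column_stack([rng.normal(1400, 60, 150), rng.normal(120, 30, 150)])
inter = np.column_stack([rng.normal(90, 20, 150), rng.normal(2, 0.5, 150)])
flows = np.vstack([bulk, inter])

X = StandardScaler().fit_transform(flows)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# Flows in the same cluster can then be scheduled and handled together.
print("cluster sizes:", np.bincount(labels))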
7.17 Urban development
Fig. 11. A distribution of the application of clustering analysis among the studies summarized in Sections 7.1 through 7.14.
A major challenge in information security settings is identifying the family to which a particular malware sample belongs, which supports the signature generation of anti-virus systems. Malware that contradicts normal data is typically regarded as an outlier and, as a result, eliminated from the normal data clusters. Standard clustering techniques such as X-Means and K-Means, as well as co-clustering algorithms, have been successfully employed for network anomaly detection (Ahmed et al., 2016). Behavioral malware clustering has been made possible by exploiting hierarchical (single-linkage) clustering to secure data from attack (Biggio et al., 2014). Even though K-Means clustering has succeeded in anomaly-based intrusion detection, the modified cluster labeling algorithm and 'Opt-Grid Clustering' are more capable of attaining enhanced performance in the high-performance intrusion identification domain (Ishida et al., 2005). Clustering algorithms have made a tremendous contribution to the domain of computer security. Thus, a data clustering algorithm can deal extensively with cyber-attacks and simultaneously improve data privacy and data mining.
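A K-Means-based anomaly-scoring sketch (hypothetical connection features, an assumed percentile threshold, and synthetic data; not the Opt-Grid or modified cluster labeling methods) fits centroids on mostly normal traffic and scores new records by their distance to the nearest centroid:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(9)
# Hypothetical connection records: [bytes sent, connections per minute]
normal = np.column_stack([rng.normal(500, 100, 300), rng.normal(5, 1, 300)])
scan = np.array([[50.0, 400.0]])                 # e.g. a port-scan burst
conns = np.vstack([normal, scan])

scaler = StandardScaler().fit(normal)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaler.transform(normal))
# Distance to the nearest centroid serves as the anomaly score.
d = km.transform(scaler.transform(conns)).min(axis=1)
threshold = np.percentile(d[:len(normal)], 99)   # assumed cut-off
print("records to review:", np.where(d > threshold)[0])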
As reviewed in the sub-sections above, a summary of the applications of clustering methods to different domains is presented in graphical format. Our interest is to give a visual overview of the most applied clustering algorithms or methods and the distribution of their application in each domain.

The graph displayed in Fig. 11 shows the distribution of the reviewed publications across the application areas of clustering algorithms. This study found that image segmentation/processing, medicine, and bioinformatics attract the most research interest in applying clustering algorithms or methods to tackle data-object grouping problems. These are followed by data mining and financial or economic-related application areas. The findings in this study will give prospective research work impetus to investigate the applicability of clustering algorithms to an anticipated domain and to know the extent to which, and how often, the clustering algorithms are applied. This will also allow examining the application of newer or even hybrid clustering algorithms to a domain of choice depending on the trend observed, as shown in Fig. 12. Furthermore, in Fig. 13, we studied and presented how each clustering algorithm has been applied across domains. This study discovered that while some clustering algorithms enjoy an extensive range of applicability, others are rarely adopted. This points to many research opportunities for keen researchers. For instance, one could investigate the properties of a clustering algorithm that make it suitable for application, and widely adopted, in a particular domain. This could further reveal possible improvements that could be made to that specific clustering algorithm, rather than investing research effort in a clustering algorithm that may not be profitable for that particular domain of interest.

To further study the trends in applying clustering methods to different domains of interest, this study plotted a categorization of the use of all clustering algorithms in each domain/field. Again, this was carried out to check the peak and distribution of the frequency of applicability of clustering algorithms in each group. For instance, the field of financial and economic institutions/systems attracts clustering algorithms more than other fields of human endeavor, whereas clustering algorithms are least adopted in some other fields. This leaves a thoughtful researcher with questions about investigating the characteristics of each domain's problems versus the properties of the clustering algorithms of interest, and how the two can be mapped to solve the problems in the field efficiently.

Fig. 12. An illustration of the distribution of the application of the clustering algorithms/methods across all studies considered in this section.

Fig. 13. Trend of the applicability of clustering algorithms/methods to different domains across all studies considered in this section.

8. Concluding remarks

Clustering is a powerful data mining and analysis tool used in many fields, including machine learning, bioinformatics, robotics, pattern recognition, and image analysis. Identifying the number of clusters a priori is the most fundamental problem in cluster analysis. Specifying the correct number of clusters a priori can help obtain optimal solutions to many clustering problems. Because of this, automatic clustering algorithms are taking over from traditional clustering algorithms. Automatic clustering algorithms are designed to perform clustering without prior knowledge of the datasets; they can also determine the optimal number of clusters in noisy datasets. This study presents a comprehensive and up-to-date survey of traditional and state-of-the-art clustering algorithms. The paper will be beneficial for both practitioners and researchers. It outlines the strengths and weaknesses of various clustering algorithms. Moreover, it presents many open issues that interested researchers and practitioners can explore. It also offers valuable insights into clustering algorithms' practical applications in different sectors, including medical science, image processing, robotics, aviation, automotive, financial systems, and big data mining.

The findings in this study show that the fields of bioinformatics, financial systems and economics, and image processing and segmentation attract more research interest than other domains, such as data mining and object recognition. The study also shows that the K-Means algorithm, Fuzzy C-Means, and hierarchical clustering techniques are among the most widely used clustering algorithms in the literature. Moreover, this study shows that many clustering algorithms (such as nature-inspired ones) have not been explored fully. These findings should give impetus for future research studies. Future research can investigate the applicability of clustering algorithms to different domains.

Furthermore, future studies can investigate the characteristics of the problems experienced in different domains versus the properties of different clustering algorithms, and how the two can be mapped to solve problems in other application domains efficiently. Moreover, future studies can investigate the application of newer or even hybrid clustering algorithms to a field of choice depending on the various trends shown in this survey. This survey will serve as a good reference point for researchers and practitioners to design improved and efficient state-of-the-art clustering algorithms.
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Abonyi, J., Feil, B., 2007. Cluster Analysis for Data Mining and System Identification. Birkhäuser: Springer Science & Business Media, Basel.
Abualigah, L.M.Q., 2019. Feature Selection and Enhanced Krill Herd Algorithm for Text Document Clustering. Springer, Berlin, pp. 1–165.
Abualigah, L., Gandomi, A.H., Elaziz, M.A., Hussien, A.G., Khasawneh, A.M., Alshinwan, M., et al., 2020. Nature-inspired optimization algorithms for text document clustering—A comprehensive analysis. Algorithms 13 (345), 1–32.
Abualigah, L.M., Khader, A.T., Hanandeh, E.S., 2018a. A new feature selection method to improve the document clustering using particle swarm optimization algorithm. J. Comput. Sci. 25, 456–466.
Abualigah, L.M., Khader, A.T., Hanandeh, E.S., 2018b. Hybrid clustering analysis using improved krill herd algorithm. Appl. Intell. 48 (11), 4047–4071.
Ackermann, Marcel R., Martens, Marcus, Raupach, Christoph, Swierkot, Kamil, Lammersen, Christiane, Sohler, Christian, 2012. StreamKM++: A clustering algorithm for data streams. ACM J. Exp. Algorithmics 17 (4).
Adil, Fahad, Najlaa, Alshatri, Zahir, Tari, Abdullah, Alamri, Ibrahim, Khalil, Albert, Zomaya, Sebti, Foufou, Abdelaziz, Bouras, 2014. A survey of clustering algorithms for big data: Taxonomy and empirical analysis. IEEE Trans. Emerg. Top. Comput. 2, 267–279. http://dx.doi.org/10.1109/TETC.2014.2330519.
Agarwal, P., Alam, M.A., Biswas, R., 2011. Issues, challenges and tools of clustering algorithms. arXiv preprint arXiv:1110.2610.
Aggarwal, C.C., Hinneburg, A., Keim, D.A., 2000. On the Surprising Behavior of Distance Metrics in High Dimensional Space. IBM Research Report, RC 21739.
Aggarwal, C.C., Philip, S.Y., Han, J., Wang, J., 2003. A framework for clustering evolving data streams. In: Proceedings 2003 VLDB Conference. Morgan Kaufmann, pp. 81–92.
Aggarwal, C.C., Procopiuc, C., Wolf, J.L., Yu, P.S., Park, J.S., 1999. Fast algorithms for projected clustering. In: Proceedings of the ACM SIGMOD Conference, 61–72, Philadelphia, PA.
Aggarwal, C.C., Zhai, C., 2012. A survey of text clustering algorithms. In: Mining Text Data. Springer, Boston, MA, pp. 77–128.
Aghabozorgi, Saeed, Shirkhorshidi, Ali Seyed, Wah, Teh Ying, 2015. Time-series clustering – A decade review. Inf. Syst. 53, 16–38.
Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P., 1998. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM SIGMOD Conference, 94–105, Seattle, WA.
Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P., 2005. Automatic subspace clustering of high dimensional data. Data Min. Knowl. Discov. 11 (1), 5–33.
Ahmad, A., Dey, L., 2007. A k-mean clustering algorithm for mixed numeric and categorical data. Data Knowl. Eng. 63 (2), 503–527.
Ahmed, T., Shaffer, P., Connelly, K., Crandall, D., Kapadia, A., 2016. Addressing physical safety, security, and privacy for people with visual impairments. In: Twelfth Symposium on Usable Privacy and Security (SOUPS 2016), Denver, Colorado, pp. 341–354.
Ai, L., Gao, X., Xiong, J., 2014. Application of mean-shift clustering to blood oxygen level dependent functional MRI activation detection. BMC Med. Imaging 14 (1), 1–10.
Aitkin, M., Rubin, D., 1985. Estimation and hypothesis testing in finite mixture models. J. R. Stat. Soc. B 47, 67–75.
Akram, M.U., Khalid, S., Tariq, A., Javed, M.Y., 2013. Detection of neovascularization in retinal images using multivariate m-mediods based classifier. Comput. Med. Imaging Graph. 37 (5–6), 346–357.
Al-Jabery, K., Obafemi-Ajayi, T., Olbricht, G., Wunsch, D., 2019. Computational Learning Approaches to Data Analytics in Biomedical Applications. Academic Press.
Alguwaizani, A., Hansen, P., Mladenović, N., Ngai, E., 2011. Variable neighborhood search for harmonic means clustering. Appl. Math. Model. 35 (6), 2688–2694.
Aliniya, Z., Mirroshandel, S.A., 2019. A novel combinatorial merge-split approach for automatic clustering using imperialist competitive algorithm. Expert Syst. Appl. 117, 243–266.
Aljalbout, E., Golkov, V., Siddiqui, Y., Strobel, M., Cremers, D., 2018. Clustering with deep learning: Taxonomy and new methods.
Alkhasov, S.S., Tselykh, A.N., Tselykh, A.A., 2015. Application of cluster analysis for the assessment of the share of fraud victims among bank card holders. In: Proceedings of the 8th International Conference on Security of Information and Networks, Sochi, Russia, September 8–10, pp. 103–106. http://dx.doi.org/10.1145/2799979.2800033.
Alshamiri, A.K., Surampudi, B.R., Singh, A., 2015. A novel ELM K-means algorithm for clustering. In: Panigrahi, B., Suganthan, P., Das, S. (Eds.), Swarm, Evolutionary, and Memetic Computing. SEMCCO 2014. Lecture Notes in Computer Science, vol. 8947, Springer, Cham. http://dx.doi.org/10.1007/978-3-319-20294-5_19.
Amini, A., Wah, T.Y., Teh, Y.W., 2012. DENGRIS-Stream: A density-grid based clustering algorithm for evolving data streams over sliding window. In: Proc. International Conference on Data Mining and Computer Engineering. pp. 206–210.
Anter, A.M., Hassenian, A.E., Oliva, D., 2019. An improved fast fuzzy c-means using crow search optimization algorithm for crop identification in agricultural. Expert Syst. Appl. 118, 340–354. http://dx.doi.org/10.1016/j.eswa.2018.10.009.
Antunes, C., Oliveira, A.L., 2001. Temporal data mining: an overview. In: KDD Workshop on Temporal Data Mining. pp. 1–13.
Aouf, M., Lyanage, L., Hansen, S., 2008. Review of data mining clustering techniques to analyze data with high dimensionality as applied in gene expression data (June 2008). In: 2008 International Conference on Service Systems and Service Management. IEEE, Melbourne, VIC, Australia, pp. 1–5.
Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I., 2013. An extensive comparative study of cluster validity indices. Pattern Recognit. 46 (1), 243–256.
Arslan, O., Guralnik, D.P., Koditschek, D.E., 2016. Clustering-based robot navigation and control. In: 2016 IEEE International Conference on Robotics and Automation, Workshop on Emerging Topological Techniques in Robotics.
Asad, M.-U., Mustafa, R., Hossain, M.S., 2019. An efficient strategy for face clustering use in video surveillance system. In: Joint 2019 8th International Conference on Informatics, Electronics & Vision (ICIEV) & 3rd International Conference on Imaging, Vision & Pattern Recognition (IVPR). pp. 12–17.
Askarzadeh, A., 2016. A novel metaheuristic method for solving constrained engineering optimization problems: Crow search algorithm. Comput. Struct. 169, 1–12. http://dx.doi.org/10.1016/J.COMPSTRUC.2016.03.001.
Asmaa, A., Sadok, B., 2019. PSO-based dynamic distributed algorithm for automatic task clustering in a robotic swarm. In: Procedia Computer Science, 23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, Vol. 159. pp. 1103–1112.
Auslander, B., Gupta, K.M., Aha, D.W., 2011. A Comparative Evaluation of Anomaly Detection Algorithms for Maritime Video Surveillance. Knexus Research Corporation, Springfield, pp. 1–15.
Bach, F.R., Jordan, M.I., 2006. Learning spectral clustering, with application to speech separation. J. Mach. Learn. Res. 1963–2001.
Baker, F.B., Hubert, L.J., 1975. Measuring the power of hierarchical cluster analysis. J. Amer. Statist. Assoc. 70, 31–38.
Balavand, A., Kashan, A.H., Saghaei, A., 2018. Automatic clustering based on crow search algorithm-kmeans (CSA-Kmeans) and data envelopment analysis (DEA). In: International Journal of Computational Intelligence Systems, Vol. 11.
Ball, G.H., Hall, D.J., 1965. ISODATA, a Novel Method of Data Analysis and Pattern Classification. Stanford Research Inst., Menlo Park, CA.
Bandaru, S., Ng, A.H., Deb, K., 2017. Data mining methods for knowledge discovery in multi-objective optimization: Part A - Survey. Expert Syst. Appl. 70, 139–159. http://dx.doi.org/10.1016/j.eswa.2016.10.015.
Bandyopadhyay, S., Saha, S., 2008. A point symmetry-based clustering technique for automatic evolution of clusters. IEEE Trans. Knowl. Data Eng. 20, 1441–1457.
Banfield, J.D., Raftery, A.E., 1993. Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803–821.
Behbood, V., Lu, J., Zhang, G., 2013. Fuzzy refinement domain adaptation for long term prediction in banking ecosystem. IEEE Trans. Ind. Inf. 10 (2), 1637–1646.
Behzadi, S., Müller, N.S., Plant, C., Böhm, C., 2020. Clustering of mixed-type data considering concept hierarchies: problem specification and algorithm. Int. J. Data Sci. Anal. 10 (3), 233–248.
Belkin, M., Niyogi, P., Sindhwani, V., 2006. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 7 (11).
Bellot, P., El-Bèze, M., 2000. A clustering method for information retrieval.
Benabdellah, A.C., Benghabrit, A., Bouhaddou, I., 2018. A survey of clustering algorithms for an industrial context. In: Procedia Computer Science, Second International Conference on Intelligent Computing in Data Sciences (ICDS 2018), Vol. 148. pp. 291–302.
Benabdellah, Abla Chouni, Benghabrit, Asmaa, Bouhaddou, Imane, 2019. A survey of clustering algorithms for an industrial context. Procedia Comput. Sci. (ISSN: 1877-0509) 148, 291–302.
Berkhin, P., Beche, J.D., Randall, D.J., 2001. Interactive path analysis of web site traffic. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 414–419. http://dx.doi.org/10.1145/502512.502574.
Bezdek, J.C., 2013. Pattern Recognition with Fuzzy Objective Function Algorithms. Springer Science & Business Media, Berlin.
Bezdek, J.C., Pal, N.R., 1998. Some new indexes of cluster validity. IEEE Trans. Syst. Man Cybern. B 28 (3), 301–315.
Bhattacharjee, P., Mitra, P., 2021. A survey of density based clustering algorithms. Front. Comput. Sci. 15 (1), 1–27.
Bickel, S., Scheffer, T., 2004. Multi-view clustering. In: ICDM, Vol. 4. pp. 19–26.
Biggio, B., Corona, I., Nelson, B., Rubinstein, B.I., Maiorca, D., Fumera, G., Roli, F., 2014. Security evaluation of support vector machines in adversarial environments. In: Support Vector Machines Applications. Springer, Cham, pp. 105–153.
Bindra, K., Mishra, A., 2017. A detailed study of clustering algorithms. In: 6th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO). Noida, pp. 371–376. http://dx.doi.org/10.1109/ICRITO.2017.8342454.
Boldt-Christmas, A., Wong, B., 2015. A Study of Algorithms for 2-Dimensional Self-Assembly in Swarm Robotics. Degree Project in Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden.
Boley, D., 1998. Principal direction divisive partitioning. Data Min. Knowl. Discov. 2 (4), 325–344.
Bora, D.J., Gupta, A.K., 2014. A novel approach towards clustering based image segmentation. Int. J. Emerg. Sci. Eng. (IJESE) 2 (11), 6–10.
Boriah, S., Chandola, V., Kumar, V., 2008. Similarity measures for categorical data: A comparative evaluation. In: Proceedings of the 2008 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, pp. 243–254.
Borlea, I.D., Precup, R.E., Borlea, A.B., Iercan, D., 2021. A unified form of fuzzy C-means and K-means algorithms and its partitional implementation. Knowl.-Based Syst. 214. http://dx.doi.org/10.1016/j.knosys.2020.106731.
Bouveyron, C., Hammer, B., Villmann, T., 2012. Recent developments in clustering algorithms. In: ESANN 2012 Proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. pp. 447–458.
Boyko, N., Komarnytska, H., Kryvenchuk, Y., Malynovskyy, Y., 2020. Clustering algorithms for economic and psychological analysis of human behavior. In: CMiGIN-2019: International Workshop on Conflict Management in Global Information Networks. pp. 1–13.
Brauksa, I., 2013. Use of cluster analysis in exploring economic indicator differences among regions: The case of Latvia. J. Econ. Bus. Manage. 1 (1), 42–45.
Brimberg, J., Janićijević, S., Mladenović, N., Urošević, D., 2017. Solving the clique partitioning problem as a maximally diverse grouping problem. Optim. Lett. 11 (6), 1123–1135.
Brimberg, J., Mladenović, N., Todosijević, R., et al., 2019. Solving the capacitated clustering problem with variable neighbourhood search. Ann. Oper. Res. 272, 289–321. http://dx.doi.org/10.1007/s10479-017-2601-5.
Brito, P., Chavent, M., 2012. Divisive monothetic clustering for interval and histogram-valued data.
Cai, F., Le-Khac, N.-A., Kechadi, M.-T., 2010. Clustering approaches for financial data analysis: a survey.
Calinski, T., Harabasz, J., 1974. A dendrite method for cluster analysis. Commun. Stat. 3, 1–27.
Campbell, J.G., Fraley, C., Murtagh, F., Raftery, A.E., 1997. Linear flaw detection in woven textiles using model based clustering. Pattern Recognit. Lett. 18, 1539–1548.
Cao, F., Ester, M., Qian, W., Zhou, A., 2006. Density-based clustering over an evolving data stream with noise. https://epubs.siam.org/page/term.
Cao, Buyang, Glover, Fred, Rego, Cesar, 2015. A tabu search algorithm for cohesive clustering problems. J. Heuristics 21. http://dx.doi.org/10.1007/s10732-015-9285-2.
Carreira-Perpiñán, M.Á., 2015. A review of mean-shift algorithms for clustering. arXiv preprint arXiv:1503.00687.
Carrizosa, E., Mladenovic, N., Todosijevic, R., 2013. Variable neighbourhood search for minimum sum-of-squares clustering on networks. European J. Oper. Res. 230, 356–363.
Chan, P.K., Mahoney, M.V., 2005. Modeling multiple time series for anomaly detection. In: Proceedings of Fifth IEEE International Conference on Data Mining. pp. 90–97.
Chang, J.W., Jin, D.S., 2002. A new cell-based clustering method for large, high-dimensional data in data mining applications. In: Proceedings of the 2002 ACM Symposium on Applied Computing. ACM Press, pp. 503–507.
Chang, D.X., Zhang, X.D., Zheng, C.W., Zhang, D.M., 2010. A robust dynamic niching genetic algorithm with niche migration for automatic clustering problem. Pattern Recognit. 43 (4), 1346–1360.
Chaouni Benabdellah, A., Benghabrit, A., Bouhaddou, I., 2019. A survey of clustering algorithms for an industrial context. Procedia Comput. Sci. 148, 291–302.
Chavent, M., Lechevallier, Y., Briant, O., 2007. DIVCLUS-T: A monothetic divisive hierarchical clustering method. Comput. Statist. Data Anal. 52 (2), 687–701.
Chehouri, A., Younes, R., Khoder, J., Perron, J., Ilinca, A., 2017. A selection process for genetic algorithm using clustering analysis. Algorithms 10 (123), 1–15.
Chellapilla, K., Simard, P., Abdulkader, A., 2006. Allograph based writer adaptation for handwritten character recognition. In: Tenth International Workshop on Frontiers in Handwriting Recognition. Suvisoft.
Chen, Yixin, Tu, Li, 2007. Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007.
Chena, Z., Qi, Z., Meng, F., Cui, L., Shi, Y., 2015. Image segmentation via improving clustering algorithms with density and distance. In: Procedia Computer Science, Information Technology and Quantitative Management (ITQM 2015), Vol. 55. pp. 1015–1022.
Cheng, Y., 2002. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell. 17 (8), 790–799.
Cheng, C., Fu, A., Zhang, Y., 1999. Entropy-based subspace clustering for mining numerical data. In: Proceedings of the 5th ACM SIGKDD, San Diego, CA. pp. 84–93.
Cherng, J., Lo, M., 2001. A hypergraph based clustering algorithm for spatial data sets. In: Proc. IEEE Int. Conf. Data Mining (ICDM'01), pp. 83–90.
Chiş, M., Banerjee, S., Hassanien, A.E., 2009a. Clustering time series data: an evolutionary approach. Found. Comput. Intell. 6 (1), 193–207.
Chiş, Monica, Banerjee, Soumya, Hassanien, Aboul Ella, 2009b. Clustering time series data: An evolutionary approach. In: Abraham, A., et al. (Eds.), Foundations of Comput. Intel. Vol. 6, SCI 206. Springer-Verlag, Berlin Heidelberg, pp. 193–207.
Choi, S.S., Cha, S.H., Tappert, C.C., 2010. A survey of binary similarity and distance measures. J. Syst. Cybern. Inform. 8 (1), 43–48.
Chopade, N., Sheetlani, J., 2017. Recent trends in incremental clustering: A review. IOSR J. Comput. Eng. 19 (1), 19–24.
Chowdhury, A., Bose, S., Das, S., 2011. Automatic clustering based on invasive weed optimization algorithm. In: International Conference on Swarm, Evolutionary, and Memetic Computing. Springer, Berlin, pp. 105–112.
Chowdhury, K., Chaudhuri, D., Pal, A.K., 2020. An entropy-based initialization method of K-Means clustering on the optimal number of clusters. Neural Comput. Appl. 1–18.
Comaniciu, D., Meer, P., 2002. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24 (5), 603–619.
Connell, Scott D., Jain, Anil K., 1999. Writer adaptation of online handwritten models. In: Proc. 5th Int. Conf. Document Analysis and Recognition, pp. 434–437.
Consoli, S., Korst, J., Pauws, S., Geleijnse, G., 2019. Improved variable neighbourhood search heuristic for quartet clustering. In: Sifaleras, A., Salhi, S., Brimberg, J. (Eds.), Variable Neighbourhood Search. ICVNS 2018. Lecture Notes in Computer Science, vol. 11328, Springer, Cham. http://dx.doi.org/10.1007/978-3-030-15843-9_1.
Corter, J.E., Gluck, M.A., 1992. Explaining basic categories: Feature predictability and information. Psychol. Bull. 111 (2), 291–303.
Cui, X., Potok, T.E., 2005. Document clustering analysis based on hybrid PSO+K-Means algorithm. J. Comput. Sci. 5, 27–33.
Cui, X., Potok, T.E., Palathingal, P., 2005. Document clustering using particle swarm optimization. In: Proceedings of the IEEE Swarm Intelligence Symposium, SIS 2005. IEEE Press, Piscataway, pp. 185–191.
Cura, T., 2012. A particle swarm optimization approach to clustering. Expert Syst. Appl. 39 (1), 1582–1588. http://dx.doi.org/10.1016/j.eswa.2011.07.123.
Dafir, Z., Lamari, Y., Slaoui, S.C., 2021. A survey on parallel clustering algorithms for big data. Artif. Intell. Rev. 54 (4), 2411–2443.
Dalrymple-Alford, E.C., 1970. The measurement of clustering in free recall. Psychol. Bull. 74, 32–34.
Damnjanovic, U., Fernandez, V., Izquierdo, E., Martinez, J.M., 2008. Event detection and clustering for surveillance video summarization. In: Ninth International Workshop on Image Analysis for Multimedia Interactive Services. pp. 63–66.
Dang, X.H., Lee, V., Ng, W.K., Ciptadi, A., Ong, K.L., 2009. An EM-based algorithm for clustering data streams in sliding windows. In: International Conference on Database Systems for Advanced Applications. Springer, Berlin, Heidelberg, pp. 230–235.
Dardac, N., Boitan, I.A., 2019. A cluster analysis approach for banks' risk profile: The Romanian evidence. Eur. Res. Stud. 7 (1), 109–118.
Das, S., Abraham, A., Konar, A., 2008b. Swarm intelligence algorithms in bioinformatics. Stud. Comput. Intell. (SCI) 94, 113–147.
Das, S., Abraham, A., Sarkar, S., 2006. A hybrid rough set–particle swarm algorithm for image pixel classification. http://dx.doi.org/10.1109/HIS.2006.264909, 26.
Das, S., Chowdhury, A., Abraham, A., 2009. A bacterial evolutionary algorithm for automatic data clustering. In: 2009 IEEE Congress on Evolutionary Computation. IEEE, pp. 2403–2410.
Das, G., Lin, K.I., Mannila, H., Renganathan, G., Smyth, P., 1998. Rule discovery from time series. Knowl. Discov. Data Min. 98, 16–22.
Das, A.S., Roy, S., 2008. Swarm intelligence algorithms for data clustering. In: Soft Computing for Knowledge Discovery and Data Mining. Springer, Boston, MA, USA, pp. 279–313.
Dasgupta, S., Littman, M.L., McAllester, D., 2002. PAC generalization bounds for co-training. In: Advances in Neural Information Processing Systems, Vol. 1. pp. 375–382.
Dasgupta, A., Raftery, A.E., 1998. Detecting features in spatial point processes with clutter via model-based clustering. J. Amer. Statist. Assoc. 93, 294–302.
Datta, S., Datta, S., 2003. Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics 19 (4), 459–466.
Davies, D.L., Bouldin, D.W., 1979. A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. (2), 224–227.
Day, W.H.E., Edelsbrunner, H., 1984. Efficient algorithms for agglomerative hierarchical clustering methods. J. Classification 1, 7–24. http://dx.doi.org/10.1007/BF01890115.
Defays, D., 1977. An efficient algorithm for a complete link method. Comput. J. 20, 364–366.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood for incomplete data via the EM algorithm. J. R. Stat. Soc. B 39, 1–38.
Deneubourg, J.L., Goss, S., Franks, N., Sendova-Franks, A., Detrain, C., Chrétien, L., 1991. The dynamics of collective sorting: robot-like ants and ant-like robots. In: Meyer, J.-A., Wilson, S. (Eds.), From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior. MIT Press, Cambridge, pp. 356–365.
Dhanachandra, N., Manglem, K., Chanu, Y.J., 2015. Image segmentation using K-means clustering algorithm and subtractive clustering algorithm. In: Procedia Computer Science, Eleventh International Multi-Conference on Information Processing-2015 (IMCIP-2015), Vol. 54. pp. 764–771.
Dhiman, G., Kumar, V., 2018. Emperor penguin optimizer: A bio-inspired algorithm for engineering problems. http://dx.doi.org/10.1016/j.knosys.2018.06.001.
Dizaji, K., Herandi, A., Deng, C., Cai, W., Huang, H., 2017. Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization.
Djouzi, K., Beghdad-Bey, K.A., 2019. Review of clustering algorithms for big data. In: 2019 International Conference on Networking and Advanced Systems (ICNAS). Annaba, Algeria, pp. 1–6. http://dx.doi.org/10.1109/ICNAS.2019.8807822.
Dongkuan, Xu, 2015. A comprehensive survey of clustering algorithms.
Dorigo, M., Maniezzo, V., Colorni, A., 1996. Ant system: optimization by a colony of cooperating agents. IEEE Trans. Syst. Man Cybern. B 26 (1), 29–41.
Dorigo, M., Stützle, T., 2004. Ant Colony Optimization. MIT Press, Cambridge.
Doval, D., Mancoridis, S., Mitchell, B.S., 1999. Automatic clustering of software systems using a genetic algorithm. In: STEP'99. Proceedings Ninth International Workshop Software Technology and Engineering Practice. IEEE, pp. 73–81.
Drew, J., Moore, T., 2014. Automatic identification of replicated criminal websites using combined clustering. In: 2014 IEEE Security and Privacy Workshops. IEEE, pp. 116–123.
Drewes, B., 2005. Some industrial applications of text mining. In: Knowledge Mining. Springer, Berlin, Heidelberg, pp. 233–238.
Duda, R.O., Hart, P.E., Stork, D.G., 2001. Pattern Classification. Wiley Publications.
Duin, R.P., Fred, A.L., Loog, M., Pękalska, E., 2012c. Mode seeking clustering by KNN and mean shift evaluated. In: Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). Springer, Berlin, Heidelberg, pp. 51–59.
Duin, R.P.W., Fred, A.L.N., Loog, M., Pękalska, E., 2012a. Mode seeking clustering by KNN and mean shift evaluated. In: Gimel'farb, G., et al. (Eds.), Structural, Syntactic, and Statistical Pattern Recognition. SSPR/SPR 2012. Lecture Notes in Computer Science, vol. 7626, Springer, Berlin, Heidelberg. http://dx.doi.org/10.1007/978-3-642-34166-3_6.
Duin, Robert P.W., Fred, Ana L.N., Loog, Marco, Pękalska, Elzbieta, 2012b. Mode seeking clustering by KNN and mean shift evaluated. In: Gimel'farb, G.L., et al. (Eds.), SSPR & SPR. LNCS, vol. 7626, Springer-Verlag Berlin Heidelberg, pp. 51–59.
Dunn, J.C., 1973. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J. Cybern. 3 (3), 32–57.
D'urso, P., Massari, R., 2019. Fuzzy clustering of mixed data. Inform. Sci. 505, 513–534.
Edwards, A.W., Cavalli-Sforza, L.L., 1965. A method for cluster analysis. Biometrics 362–375.
Elaziz, M.A., Al-qaness, M.A., Zaid, E.O., Lu, S., Ibrahim, R.A., Ewees, A.A., 2020. Automatic clustering method to segment COVID-19 CT images. PLoS ONE 16 (1), 1–13.
Elbattah, M., Molloy, O., 2017. Clustering-aided approach for predicting patient outcomes with application to elderly healthcare in Ireland. In: The AAAI-17 Joint Workshop on Health Intelligence WS-17-09. pp. 533–541.
Engelbrecht, A.P., 2005. Fundamentals of Computational Swarm Intelligence. John Wiley and Sons Ltd.
Erdogmus, P., Kayaalp, F., 2020. Introductory chapter: Clustering with nature-inspired optimization algorithms. In: Erdogmus, P., Kayaalp, F. (Eds.), Introduction to Data Science and Machine Learning. IntechOpen, p. 16.
Erkin, Z., Veugen, T., Toft, T., Lagendijk, R.L., 2013. Privacy-preserving distributed clustering. EURASIP J. Inf. Secur. 2013 (1), 1–15. http://dx.doi.org/10.1186/1687-417X-2013-4.
Erman, J., Arlitt, M., Mahanti, A., 2006. Traffic classification using clustering algorithms. In: Proceedings of the 2006 SIGCOMM Workshop on Mining Network Data, Pisa, Italy, September 11–15, 2006. pp. 281–286. http://dx.doi.org/10.1145/1162678.1162679.
Ezugwu, A.E., 2020a. Nature-inspired metaheuristic techniques for automatic clustering: a survey and performance study. SN Appl. Sci. 2 (2). http://dx.doi.org/10.1007/s42452-020-2073-0.
Ezugwu, A.E., 2020b. Nature-inspired metaheuristic techniques for automatic clustering: a survey and performance study. SN Appl. Sci. 2 (2). http://dx.doi.org/10.1007/s42452-020-2073-0.
Ezugwu, A.E., Shukla, A.K., Agbaje, M.B., Oyelade, O.N., José-García, A., Agushaka, J.O., 2020a. Automatic clustering algorithms: a systematic review and bibliometric analysis of relevant literature. Neural Comput. Appl. 1–60.
Ezugwu, A.E., Shukla, A.K., Agbaje, M.B., Oyelade, O.N., José-García, A., Agushaka, J.O., 2020b. Automatic clustering algorithms: a systematic review and bibliometric analysis of relevant literature. Neural Comput. Appl. 1–60.
Falkenauer, E., 1998. Genetic Algorithms and Grouping Problems. John Wiley and Sons Ltd., Chichester, England.
Feng, L., Qiu, M.-H., Wang, Y.-X., Xiang, Q.-L., Yang, Y.-F., Liu, K., 2010. A fast divisive clustering algorithm using an improved discrete particle swarm optimizer. Pattern Recognit. Lett. 31 (11), 1216–1225.
Finak, G., Bashashati, A., Brinkman, R., Gottardo, R., 2009. Merging mixture components for cell population identification in flow cytometry. Adv. Bioinform. 2009, 1–12. http://dx.doi.org/10.1155/2009/247646.
Forte, J.C., Perner, A., Horst, I.C., 2019. The use of clustering algorithms in critical care research to unravel patient heterogeneity. Intensive Care Med. 45, 1025–1028.
Fortier, J., Solomon, H., 1996. Clustering procedures. In: Krishnaiah, P.R. (Ed.), Proceedings of the Multivariate Analysis, '66. pp. 493–506.
Fowlkes, E.B., Mallows, C.L., 2010. A method for comparing two hierarchical clusterings. J. Amer. Statist. Assoc. 78 (383), 553–569.
Fraley, Chris, Raftery, Adrian E., 1998. How many clusters? Which clustering method? Answers via model-based cluster analysis.
Friedman, J.H., Meulman, J.J., 2002. Clustering objects on subsets of attributes. http://citeseer.nj.nec.com/friedman02clustering.html.
Friedman, H.P., Rubin, J., 1967. On some invariant criteria for grouping data. J. Amer. Statist. Assoc. 62 (320), 1159–1178.
Fu, T.K., Chung, F., Ng, C.M., 2004. Financial time series segmentation based on specialized binary tree representation.
Fuad, M.M., 2017. Applying Nature-Inspired Optimization Algorithms for Selecting Important Timestamps to Reduce Time Series Dimensionality. Coventry University, pp. 1–13.
Fukunaga, K., Hostetler, L.D., 1975. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans. Inform. Theory 21 (1), 32–40.
Gama, J., Gaber, M. (Eds.), 2007. Learning from Data Streams. Springer.
Gan, G., Ma, C., Wu, J., 2007. Data Clustering: Theory, Algorithms, and Applications. Society for Industrial and Applied Mathematics, Philadelphia, PA.
Gariel, M., Srivastava, A.N., Feron, E., 2011. Trajectory clustering and an application to airspace monitoring. IEEE Trans. Intell. Transp. Syst. 12 (4), 1511–1524.
Gath, I., Geva, A., 1989. Unsupervised optimal fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 11 (7), 773–781.
Ge, Y., Sealfon, S.C., 2012. Flowpeaks: A fast unsupervised clustering for flow cytometry data via K-means and density peak finding. Bioinformatics 28 (15), 2052–2058. http://dx.doi.org/10.1093/bioinformatics/bts300.
Gionis, A., Mannila, H., 2003. Finding recurrent sources in sequences. In: Proceedings of the Seventh Annual International Conference on Research in Computational Molecular Biology, 2003. pp. 123–130.
Glover, Fred, 1990. Tabu search: A tutorial. Interfaces 20, 74–94. http://dx.doi.org/10.1287/inte.20.4.74.
Glover, Fred, McMillan, C., Novick, Beth, 1985. Interactive decision software and computer graphics for architectural and space planning. Ann. Oper. Res. 5, 557–573. http://dx.doi.org/10.1007/BF02023611.
Goil, S., Nagesh, H., Choudhary, A., 1999. MAFIA: Efficient and Scalable Subspace Clustering for Very Large Data Sets. Technical Report CPDC-TR-9906-010, Northwestern University.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, New York.
Goldbogen, J.A., Friedlaender, A.S., Calambokidis, J., McKenna, M.F., Simon, M., Nowacek, D.P., 2013. Integrative approaches to the study of baleen whale diving behavior, feeding performance, and foraging ecology. BioScience 63 (2), 90–100. http://dx.doi.org/10.1525/bio.2013.63.2.5.
Gong, C., Chen, H., He, W., Zhang, Z., 2017. Improved multi-objective clustering algorithm using particle swarm optimization. PLoS One.
Gowda, K.C., Krishna, G., 1978. Agglomerative clustering using the concept of mutual nearest neighbourhood. Pattern Recognit. 10 (2), 105–112. http://dx.doi.org/10.1016/0031-3203(78)90018-3.
Graves, D., Pedrycz, W., 2010. Proximity fuzzy clustering and its application to time series clustering and prediction. In: Proceedings of the 2010 10th International Conference on Intelligent Systems Design and Applications ISDA10. pp. 49–54.
Grira, N., Crucianu, M., Boujemaa, N., 2005. Unsupervised and Semi-Supervised Clustering: A Brief Survey. A Review of Machine Learning Techniques for Processing Multimedia Content, Report of the MUSCLE European Network of Excellence (6th Framework Programme).
Guénoche, A., Hansen, P., Jaumard, B., 1991. Efficient algorithms for divisive hierarchical clustering with the diameter criterion. J. Classification 8 (1), 5–30.
Guérin, J., Thiery, S., Nyiri, E., Gibaru, O., 2018. Unsupervised robotic sorting: Towards autonomous decision making robots. Int. J. Artif. Intell. Appl. (IJAIA) 9 (2), 81–98.
Guha, S., Mishra, N., Motwani, R., O'Callaghan, L., 2000. Clustering data streams. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science. IEEE Computer Society.
Gulhane, A., Paikrao, P.L., Chaudhari, D.S., 2012. A review of image data clustering techniques. Int. J. Soft Comput. Eng. (IJSCE) 2 (1), 212–215.
Guzzi, P.H., Masciari, E., Mazzeo, G.M., Zaniolo, C., 2014. A discussion on the biological relevance of clustering results. In: ITBAM 2014. LNCS, pp. 30–44.
Halkidi, M., Batistakis, Y., Vazirgiannis, M., 2002. Clustering validity checking methods: part II. ACM Sigmod Rec. 31 (3), 19–27.
Halkidi, M., Vazirgiannis, M., 2001. Clustering validity assessment: Finding the optimal partitioning of a data set. In: Proceedings 2001 IEEE International Conference on Data Mining. IEEE, pp. 187–194.
Hall, L.O., Ozyurt, I.B., Bezdek, J.C., 1999. Clustering with a genetically optimized approach. IEEE Trans. Evol. Comput.
Hamerly, G., Elkan, C., 2004. Learning the k in K-means. In: Advances in Neural Information Processing Systems, Vol. 2003. MIT Cambridge Press, pp. 281–288.
Hamilton, J.D., 1994. Time Series Analysis. Princeton University Press, p. 159.
Hancer, E., Xue, B., Zhang, M., 2020. A survey on feature selection approaches for clustering. Artif. Intell. Rev. 53 (6), 4519–4545.
Hand, D.J., Mannila, H., Smyth, P., 2001. Principles of Data Mining (Adaptive Computation and Machine Learning). MIT Press.
Handl, J., Knowles, J., 2007. An evolutionary approach to multiobjective clustering. IEEE Trans. Evol. Comput. 11 (1), 56–76.
Handl, J., Knowles, J., Dorigo, M., 2006. Ant-based clustering and topographic mapping. Artif. Life 12 (1), 35–62.
Hansen, P., 2005. Variable neighbourhood search. In: Burke, E.K., Kendall, G. (Eds.), Search Methodologies. Springer, New York, NY, USA, pp. 211–238.
Hansen, P., Brimberg, J., Urosevic, D., Mladenovic, N., 2009. Solving large p-median clustering problems by primal–dual variable neighbourhood search. Data Min. Knowl. Discov. 19, 351–375.
Hansen, P., Mladenovic, N., 2001. J-Means: A new local search heuristic for minimum sum-of-squares clustering. Pattern Recognit. 34, 405–413.
Hansen, P., Mladenovic, N., 2018. Variable neighbourhood search. In: Martí, R., Pardalos, P., Resende, M. (Eds.), Handbook of Heuristics. Springer, Cham, Switzerland.
Hanumanth Sastry, S., PrasadaBabu, M., 2013. Analysis & prediction of sales data in SAP ERP system using clustering algorithms. Int. J. Comput. Sci. Inf. Technol. (IJCSITY) 1 (4), 95–109.
Harshada, S., Deshmukh, P., Ramteke, L., 2015. Comparing the techniques of cluster analysis for big data.
Hartigan, J.A., 1975. Clustering Algorithms. John Wiley and Sons, Inc., New York.
Hartigan, J.A., Wong, M.A., 1979. Algorithm AS 136: A K-means clustering algorithm. J. R. Stat. Soc. Ser. C (Appl. Stat.) 28 (1), 100–108.
He, W., Feng, G., Wu, Q., He, T., Wan, S., Chou, J., 2011. A new method for abrupt dynamic change detection of correlated time series. Int. J. Climatol. 32 (10), 1604–1614.
He, H., Tan, Y., 2012. A two-stage genetic algorithm for automatic clustering. Neurocomputing 81, 49–59.
Herawan, T., Deris, M.M., 2009. A framework on rough set-based partitioning attribute selection. In: Paper Presented at the International Conference on Intelligent Computing.
Herawan, T., Deris, M.M., Abawajy, J.H., 2010. A rough set approach for selecting clustering attribute. Knowl.-Based Syst. 23 (3), 220–231.
Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Michigan, USA.
Hruschka, E.R., Campello, R.J., Freitas, A.A., 2009. A survey of evolutionary algorithms for clustering. IEEE Trans. Syst. Man Cybern. C 39 (2), 133–155.
Hsu, C.C., Lin, C.W., 2018. CNN-based joint clustering and representation learning with feature drift compensation for large-scale image data. IEEE Trans. Multimed. 20 (2), 421–429. http://dx.doi.org/10.1109/TMM.2017.2745702.
Huang, S., Kang, Z., Xu, Z., Liu, Q., 2021. Robust deep k-means: An effective and simple method for data clustering. Pattern Recognit. 117. http://dx.doi.org/10.1016/j.patcog.2021.107996.
Huang, G.B., Zhu, Q.Y., Siew, C.K., 2006. Extreme learning machine: theory and applications. Neurocomputing 70, 489–501.
Ibrahim, Osman, H., Christofides, Nicos, 1994. Capacitated clustering problems by hybrid simulated annealing and tabu search. Int. Trans. Oper. Res. 1 (3), 317–336. http://dx.doi.org/10.1016/0969-6016(94)90032-9.
Ishida, C., Arakawa, Y., Sasase, I., Takemori, K., 2005. Forecast techniques for predicting increase or decrease of attacks using Bayesian inference. In: PACRIM. 2005 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing. IEEE, Victoria, BC, Canada, pp. 450–453. http://dx.doi.org/10.1109/PACRIM.2005.1517323.
Ito, F., Hiroyasu, T., Miki, M., Yokouchi, H., 2009. Detection of preference shift timing using time-series clustering. pp. 1585–1590.
Ivancsy, R., Kovacs, F., 2006. Clustering techniques utilized in web usage mining. In: Proceedings of the 5th WSEAS Int. Conf. on Artificial Intelligence, Knowledge Engineering and Data Bases, Madrid, Spain, pp. 237–242.
Izakian, Z., Mesgari, M.S., Abraham, A., 2016. Automated clustering of trajectory data using a particle swarm optimization. Comput. Environ. Urban Syst. 55, 55–65.
Jadhav, A.N., Gomathi, N., 2018. WGC: Hybridization of exponential grey wolf optimizer with whale optimization for data clustering. Alex. Eng. J. 57 (3), 1569–1584. http://dx.doi.org/10.1016/j.aej.2017.04.013.
Jain, A.K., 2010. Data clustering: 50 years beyond K-Means. Pattern Recognit. Lett. 31 (8), 651–666.
Jain, A.K., Dubes, R.C., 1988. Algorithms for Clustering Data. Prentice-Hall, Inc., Upper Saddle River, NJ.
Jain, A.K., Murty, M.N., Flynn, P.J., 1999. Data clustering: a review. ACM Comput. Surv. 31 (3), 264–323.
Janati, F., Abdollahi, F., Ghidary, S.S., Jannatifar, M., Baltes, J., Sadeghnejad, S., 2017. Multi-robot task allocation using clustering method. In: Robot Intelligence Technology and Applications, Advances in Intelligent Systems and Computing. pp. 223–247.
Jiang, Z., Zheng, Y., Tan, H., Tang, B., Zhou, H., 2016. Variational deep embedding: An unsupervised and generative approach to clustering. arXiv preprint arXiv:1611.05148.
José-García, A., Gómez-Flores, W., 2016. Automatic clustering using nature-inspired metaheuristics: A survey. Appl. Soft Comput. 41, 192–213.
Kailing, K., Kriegel, H.P., Pryakhin, A., Schubert, M., 2004. Clustering multi-represented objects with noise. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Berlin, Heidelberg, pp. 394–403.
Kalyanasundaram, C., Snehal, A., Gaurav, J., Jain, S., 2015. Text clustering for information retrieval system using supplementary information. Int. J. Comput. Sci. Inf. Technol. 6 (2), 1613–1615.
Kamalzadeh, H., Ahmadi, A., Mansour, S., 2020. Clustering time-series by a novel slope-based similarity measure considering particle swarm optimization. Appl. Soft Comput. 96, 106701.
Kanade, P.M., Hall, L.O., 2004. Fuzzy ant clustering by centroid positioning. In: Proceedings of the IEEE International Conference on Fuzzy Systems, Vol. 1. IEEE Press, Piscataway, pp. 371–376.
Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y., 2000. The analysis of a simple K-Means clustering algorithm. In: Symposium on Computational Geometry. ACM, New York, pp. 100–109.
Karypis, G., Kumar, V., 2000. Multilevel k-way hypergraph partitioning. VLSI Des. 11 (3), 285–300.
Kaufman, L., Rousseeuw, P., 1990. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, New York.
Kaufman, L., Rousseeuw, P.J., 2009. Finding Groups in Data: An Introduction to Cluster Analysis, Vol. 344. John Wiley & Sons.
Keogh, E., Lonardi, S., Chiu, B.Y., 2002. Finding surprising patterns in a time series database in linear time and space. In: Proceedings of the Eighth ACM SIGKDD. pp. 550–556.
Khaled, S., Al-Sultan, M., Khan, Maroof, 1995. Computational experience on four algorithms for the hard clustering problem.
Kharrousheh, Adnan, Abdullah, Salwani, Nazri, Mohd Zakree Ahmad, 2011. A modified tabu search approach for the clustering problem. J. Appl. Sci. 11, 3447–3453.
Kim, J., 2009. Dissimilarity Measures for Histogram-Valued Data and Divisive Clustering of Symbolic Objects. University of Georgia, Athens, GA, USA.
Kim, J., Billard, L., 2011. A polythetic clustering process and cluster validity indexes for histogram-valued objects. Comput. Statist. Data Anal. 55 (7), 2250–2262.
Kim, J., Billard, L., 2012. Dissimilarity measures and divisive clustering for symbolic multimodal-valued data. Comput. Statist. Data Anal. 56 (9), 2795–2808.
Kim, J., Lee, W., Song, J.J., Lee, S.-B., 2017. Optimized combinatorial clustering for stochastic processes. Cluster Comput. 20 (2), 1135–1148.
Kisore, N.R., Koteswaraiah, C.B., 2017. Improving ATM coverage area using density based clustering algorithm and Voronoi diagrams. Inform. Sci. 376, 1–20. http://dx.doi.org/10.1016/j.ins.2016.09.058.
Kittler, J.V., 1976. A locally sensitive method for cluster analysis. Pattern Recognit. 8 (1), 23–33.
Kokate, U., Deshpande, A., Mahalle, P., Patil, P., 2018. Review: Data stream clustering techniques, applications, and models: Comparative analysis and discussion. Big Data Cogn. Comput. 2 (32). http://dx.doi.org/10.3390/bdcc2040032.
Koontz, W.L.G., Narendra, P.M., Fukunaga, K., 1976. A graph-theoretic approach to nonparametric cluster analysis. IEEE Trans. Comput. 25, 936–944.
Kordos, M., Blachnik, M., 2019. Improving evolutionary instance selection with clustering and ensembles. In: PP-RAI2019 Conference Proceeding. pp. 302–305.
Kosters, W., Laros, J., 2007. Metrics for mining multisets. In: International Conference on Innovative Techniques and Applications of Artificial Intelligence. Springer, London, pp. 293–303.
Kovács, F., Ivancsy, R., 2006. Cluster validity measurement for arbitrary shaped clustering. In: Proceedings of the 5th WSEAS Int. Conf. on Artificial Intelligence, Knowledge Engineering and Data Bases, Madrid, Spain. pp. 372–377.
Krishna, K., Murty, M.N., 1999. Genetic K-means algorithm. IEEE Trans. Syst. Man Cybern. B 29 (3).
Krishnapuram, R., Keller, J., 1993. A possibilistic approach to clustering. IEEE Trans. Fuzzy Syst. 1 (2), 98–110.
Krishnasamy, G., Kulkarni, A.J., Paramesran, R., 2014. A hybrid approach for data clustering based on modified cohort intelligence and K-Means. Expert Syst. Appl. 41 (13), 6009–6016. http://dx.doi.org/10.1016/j.eswa.2014.03.021.
Krovi, R., 1992. Genetic algorithms for clustering: a preliminary investigation. In: Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences, Vol. IV. pp. 540–544.
Kulkarni, A.J., Durugkar, I.P., Kumar, M., 2013. Cohort intelligence: A self-supervised learning behavior. In: IEEE International Conference on Systems, Man, and Cybernetics (SMC). pp. 1396–1400.
Kuo, R.J., Huang, Y.D., Lin, C.C., Wu, Y.H., Zulvia, F.E., 2014. Automatic kernel clustering with bee colony optimization algorithm. Inform. Sci. 283, 107–122.
Kuo, R.J., Syu, Y.J., Chen, Z.Y., Tien, F.C., 2012. Integration of particle swarm optimization and genetic algorithm for dynamic clustering. Inform. Sci. 195, 124–140. http://dx.doi.org/10.1016/j.ins.2012.01.021.
Lago-Fernández, L.F., Corbacho, F., 2010. Normality-based validation for crisp clustering. Pattern Recognit. 43 (3), 782–795.
Lakhani, J., Chowdhary, A., Harwani, D., 2015. Clustering techniques for biological sequence analysis: A review. J. Appl. Inf. Sci. 14–32.
Lakshmi, K., Karthikeyani Visalakshi, N., Shanthi, S., 2018. Data clustering using K-means based on crow search algorithm. Sādhanā 43. http://dx.doi.org/10.1007/s12046-018-0962-3.
Lama, P., 2013. Clustering System Based on Text Mining using the K-Means Algorithm (Bachelor's thesis). (UAS), Information Technology.
Legány, C., Juhász, S., Babos, A., 2006. Cluster validity measurement techniques. In: Proceedings of the 5th WSEAS Int. Conf. on Artificial Intelligence, Knowledge Engineering and Data Bases, February 15–17. WSEAS, Madrid, Spain, pp. 388–393.
Leng, M., Lai, X., Tan, G., Xu, X., 2009. Time series representation for anomaly detection. In: Proceedings of the 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009. pp. 628–632.
Li, L., Das, S., John Hansman, R., Palacios, R., Srivastava, A.N., 2015. Analysis of flight data using clustering techniques for detecting abnormal operations. J. Aerosp. Inf. Syst. 12 (9).
Li, X., Liang, W., Zhang, X., Qing, S., Chang, P.C., 2020. A cluster validity evaluation method for dynamically determining the near-optimal number of clusters. Soft Comput. 24 (12), 9227–9241.
Li, F., Qiao, H., Zhang, B., 2018. Discriminatively boosted image clustering with fully convolutional auto-encoders. Pattern Recognit. 83, 161–173. http://dx.doi.org/10.1016/j.patcog.2018.05.019.
Liao, T.W., 2005. Clustering of time series data — a survey. Pattern Recognit. http://dx.doi.org/10.1016/j.patcog.2005.01.025.
Lin, J., Lin, H., 2009. A density-based clustering over evolving heterogeneous data stream. In: 2009 ISECS International Colloquium on Computing, Communication, Control, and Management, Vol. 4. IEEE, pp. 275–277.
Lin, J., Vlachos, M., Keogh, E., Gunopulos, D., 2004. Iterative incremental clustering of time series. Adv. Database Technol. 521–522.
Lin, C.X., Yu, Y., Han, J., Liu, B., 2009. Hierarchical Web-Page Clustering via In-Page and Cross-Page Link Structures. University of Illinois at Urbana-Champaign.
Liu, X., Croft, W.B., 2004. Cluster-based retrieval using language models. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. University of Sheffield, UK, pp. 186–193. http://dx.doi.org/10.1145/1008992.1009026.
Liu, Y., Wu, X., Shen, Y., 2011. Automatic clustering using genetic algorithms. Appl. Math. Comput. 218 (4), 1267–1279.
Lo, K., Brinkman, R.R., Gottardo, R., 2008. Automated gating of flow cytometry data via robust model-based clustering. Cytometry A 73 (4), 321–332. http://dx.doi.org/10.1002/cyto.a.20531.
Lu, Y., Cao, B., Rego, C., Glover, F., 2018. A tabu search based clustering algorithm and its parallel implementation on Spark. Appl. Soft Comput. 63, 97–109.
Lu, Yi, Lu, Shiyong, Fotouhi, Farshad, 2004a. FGKA: A fast genetic K-means clustering algorithm. In: SAC'04, Nicosia, Cyprus. ACM.
Lu, Yi, Lu, Shiyong, Fotouhi, Farshad, Deng, Youping, Brown, Susan J., 2004b. An incremental genetic K-means algorithm and its application in gene expression data analysis. BMC Bioinformatics.
Lydia, E.L., Govindaswamy, P., Lakshmanaprabu, S.K., Ramya, D., 2018. Document clustering based on text mining K-means algorithm using Euclidean distance similarity. J. Adv. Res. Dyn. Control Syst. 10 (02-Special Issue).
Macnaughton-Smith, P., Williams, W., Dale, M., Mockett, L., 1964. Dissimilarity analysis: a new technique of hierarchical sub-division. Nature 202 (4936), 1034–1035.
MacQueen, J., 1967. Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, No. 14. pp. 281–297.
Madhulatha, T.S., 2015. An overview on clustering methods. IOSR J. Eng. 2 (4), 719–725.
Magoeva, K., Krzhizhanovskaya, V.V., Kovalchuk, S.V., 2018. Application of clustering methods for detecting critical acute coronary syndrome patients. In: Procedia Computer Science, 7th International Young Scientist Conference on Computational Science, Vol. 138. pp. 370–379.
Mangortey, E., Monteiro, D., Ackley, J., Gao, Z., Puranik, T.G., Kirby, M., et al., 2020. Application of Machine Learning Techniques to Parameter Selection for Flight Risk Identification. Georgia Inst. Technol., Atlanta, pp. 1–39.
Manning, C.D., Raghavan, P., Schütze, H., 2009. Flat clustering. In: Clustering in Information Retrieval. Cambridge University Press, Cambridge, pp. 349–375.
Mansalis, S., Ntoutsi, E., Pelekis, N., Theodoridis, Y., 2018. An evaluation of data stream clustering algorithms. Statist. Anal. Data Min. ASA Data Sci. J. 11. http://dx.doi.org/10.1002/sam.11380.
Mao, J., Jain, A.K., 1996. A self-organizing network for hyperellipsoidal clustering (HEC). IEEE Trans. Neural Netw. 7 (1), 16–29.
Marriot, F.H., 1975. Practical problems in a method of cluster analysis. Biometrics 27, 456–460.
Marriott, F.H.C., 1974. The Interpretation of Multiple Observations. Academic Press, London.
Martins, P., 2020. Goal clustering: VNS based heuristics. Available online: https://arxiv.org/abs/1705.07666v4 (accessed on 24 October 2020).
Maulik, U., Bandyopadhyay, S., 2000. Genetic algorithm-based clustering technique. Pattern Recognit. 33.
Mazlack, L.J., He, A., Zhu, Y., 2000. A rough set approach in choosing partitioning attributes. In: Paper Presented at the Proceedings of the ISCA 13th International Conference (CAINE-2000).
McClain, J.O., Rao, V.R., 1975. Clustisz: A program to test for the quality of clustering of a set of objects. J. Mar. Res. 456–460.
Mclachlan, G., Basford, K., 1988. Mixture Models: Inference and Applications to Clustering. Marcel Dekker, New York, NY.
Mclachlan, G., Krishnan, T., 1997. The EM Algorithm and Extensions. John Wiley & Sons, New York, NY.
Meng, Y., Liu, X., 2007. Application of K-means algorithm based on ant clustering algorithm in macroscopic planning of highway transportation hub. In: 2007 First IEEE International Symposium on Information Technologies and Applications in Education. IEEE, Kunming, China, pp. 483–488. http://dx.doi.org/10.1109/ISITAE.2007.4409331.
Van der Merwe, D., Engelbrecht, A., 2003. Data clustering using particle swarm optimization. In: Proceedings of the 2003 Congress on Evolutionary Computation. IEEE Press, Piscataway, pp. 215–220.
Milligan, G.W., 1981. A Monte Carlo study of thirty internal criterion measures for cluster analysis. Psychometrika 46 (2), 187–199.
Min, E., Guo, X., Liu, Q., Zhang, G., Cui, J., Long, J., 2018. A survey of clustering with deep learning: From the perspective of network architecture. IEEE Access 6, 39501–39514. http://dx.doi.org/10.1109/ACCESS.2018.2855437.
Mirjalili, S., Lewis, A., 2016. The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67. http://dx.doi.org/10.1016/j.advengsoft.2016.01.008.
Mishra, J., Agarwal, V., Sharma, M., Srivastava, J., 2015. Clustering algorithms: Brief review in bioinformatics. Int. J. Sci. Res. (IJSR) 1012–1019.
Mitsa, T., 2009. Temporal Data Mining, Vol. 33. Chapman & Hall/CRC, Taylor and Francis Group, Boca Raton, FL.
Mladenovic, N., Hansen, P., 1997. Variable neighbourhood search. Comput. Oper. Res. 24, 1097–1100.
Mörchen, F., Ultsch, A., Hoos, O., 2005. Extracting interpretable muscle activation patterns with time series knowledge mining. J. Knowl. Based 9 (3), 197–208.
Mukerjee, S., Feigelson, E.D., Babu, G.J., Murtagh, F., Fraley, C., Raftery, A., 1998. Three types of gamma ray bursts. Astrophys. J. 508, 314–327.
Murphy, R.F., 1985. Automated identification of subpopulations in flow cytometric list mode data using cluster analysis. Cytometry, Vol. 6. Alan R. Liss, Inc.
Murtagh, F., 1983. A survey of recent advances in hierarchical clustering algorithms. Comput. J. 26, 354–359. http://dx.doi.org/10.1093/comjnl/26.4.354.
Murtagh, F., 1985. A survey of algorithms for contiguity-constrained clustering and related problems. Comput. J. 28, 82–88. http://dx.doi.org/10.1093/comjnl/28.1.82.
Myhre, J.N., Mikalsen, K.Ø., Løkse, S., Jenssen, R., 2018. Robust clustering using a kNN mode seeking ensemble. Pattern Recognit. 76, 491–505.
Nagpal, Arpita, 2013. Review based on data clustering algorithms.
Naim, I., Datta, S., Rebhahn, J., Cavenaugh, J.S., Mosmann, T.R., Sharma, G., 2014. SWIFT—scalable clustering for automated identification of rare cell populations in large, high-dimensional flow cytometry datasets, part 1: Algorithm design. Cytometry A 85 (5), 408–421.
Nameirakpam, D., Jina, C.Y., 2015. Image segmentation method using K-means clustering algorithm for color image. Adv. Res. Electr. Electron. Eng. 2 (11), 68–72.
Nanda, S.J., 2014. Nature inspired clustering algorithms for analysis of natural databases. Hydrol. Meteorol. 5, 4.
Nanda, S.J., Panda, G., 2014. A survey on nature inspired metaheuristic algorithms for partitional clustering. Swarm Evol. Comput. 16, 1–18.
Nasiri, J., Khiyabani, F.M., 2018. A whale optimization algorithm (WOA) approach for clustering. http://dx.doi.org/10.1080/25742558.2018.1483565.
Neel, J., 2005. Cluster Analysis Methods for Speech Recognition (Master Thesis in Speech Technology). Department of Speech, Music and Hearing, Royal Institute of Technology, Stockholm.
Newcomer, S.R., Steiner, J.F., Bayliss, E.A., 2011. Identifying subgroups of complex patients with cluster analysis. Am. J. Manage. Care 17 (8), 324–332.
Nirkhi, S., Hande, K., 2008. A survey on clustering algorithms for web applications. In: SWWS. pp. 124–129.
Novaliendry, D., Hendriyani, Y., Yang, C.-H., Hamimi, H. The optimized K-means clustering algorithms to analyzed the budget revenue expenditure in Padang. In: Proceeding of International Conference on Electrical Engineering, Computer Science and Informatics (EECSI 2015), Palembang, Indonesia, pp. 61–66.
Ntoutsi, Irene, Zimek, Arthur, Palpanas, Themis, Kroger, Peer, Kriegel, Hans-Peter, 2012. Density-based projected clustering over high dimensional data streams. In: Proc. of the 12th SIAM International Conference on Data Mining.
Nunes, N.F., 2011. Algorithms for Time Series Clustering Applied to Biomedical Signals. New University of Lisbon, Faculty of Sciences and Technology, Physics Department, Lisbon.
Olson, C., 1995. Parallel algorithms for hierarchical clustering. Parallel Comput. 21, 1313–1325.
Omran, M., Salman, A., Engelbrecht, A., 2002. Image classification using particle swarm optimization. In: Wang, L., Tan, K.C., Furuhashi, T., Kim, J.-H., Yao, X. (Eds.), Proceedings of the Fourth Asia-Pacific Conference on Simulated Evolution and Learning, SEAL'02. IEEE Press, Piscataway, pp. 370–374.
Orlov, V., Kazakovtsev, L., Rozhnov, I., Popov, N., Fedosov, V., 2018. Variable neighbourhood search algorithm for K-means clustering. IOP Conf. Ser.: Mater. Sci. Eng. 450, 022035.
Ormerod, P., Mounfield, C., 2000. Localised structures in the temporal evolution of asset prices. In: New Approaches to Financial Economics. Santa Fe Conference.
Örnek, Ö., Subaşı, A., 2011. Clustering marketing datasets with data mining techniques. Journal 408–412.
Ouadfel, S., Batouche, M., Taleb-Ahmed, A., 2010. A modified particle swarm optimization algorithm for automatic image clustering. In: Proceedings of the 1st International Symposium on Modeling and Implementing Complex Systems. Constantine, Algeria.
Oyelade, J., Isewon, I., Oladipupo, F., Aromolaran, O., Uwoghiren, E., Ameh, F., et al., 2016. Clustering algorithms: Their application to gene expression data. Bioinform. Biol. Insights 10, 237–253.
Pakhira, M.K., Bandyopadhyay, S., Maulik, U., 2004. Validity index for crisp and fuzzy clusters. Pattern Recognit. 37 (3), 487–501.
Pandit, S., Gupta, S., 2011. A comparative study on distance measuring approaches for clustering. Int. J. Res. Comput. Sci. 2 (1), 29–31.
Parida, P., 2018. Fuzzy clustering based transition region extraction for image segmentation. Future Comput. Inform. J. 32, 1–333.
Parmar, D., Wu, T., Blackhurst, J., 2007. MMR: an algorithm for clustering categorical data using rough set theory. Data Knowl. Eng. 63 (3), 879–893.
Parsons, Lance, Haque, Ehtesham, Liu, Huan, 2004. Subspace clustering for high dimensional data: A review. SIGKDD Explor. 6, 90–105. http://dx.doi.org/10.1145/1007730.1007731.
Patel, P.M., Shah, B.N., Shah, V., 2013. Image segmentation using K-mean clustering for finding tumor in medical application. Int. J. Comput. Trends Technol. 4 (5), 1239–1242.
Paterlini, S., Krink, T., 2006. Differential evolution and particle swarm optimisation in partitional clustering. Comput. Stat. Data Anal. 50 (5), 1220–1247.
Patidar, A.K., Agrawal, J., Mishra, N., 2012. Analysis of different similarity measure functions and their impacts on shared nearest neighbor clustering approach. Int. J. Comput. Appl. 40 (16), 1–5.
Patil, C., Baidari, I., 2019. Estimating the optimal number of clusters k in a dataset using data depth. Data Sci. Eng. 4 (2), 132–140.
Pavlidis, N., Plagianakos, V.P., Tasoulis, D.K., Vrahatis, M.N., 2006. Financial forecasting through unsupervised clustering and neural networks. Oper. Res. 6 (2), 103–127.
Pelleg, D., 2000. Extending K-means with efficient estimation of the number of clusters. In: Proceedings of the 17th International Conference on Machine Learning (ICML). pp. 277–281.
Peng, X., Zhou, C., Hepburn, D.M., Judd, M.D., Siew, W.H., 2013. Application of K-means method to pattern recognition in on-line cable partial discharge monitoring. IEEE Trans. Dielectr. Electr. Insul. 20 (3), 754–761.
Piggott, J., 2015. Identification of Business Travelers Through Clustering Algorithms (Master thesis, Business and Information Technology). Faculty of Management and Governance, Faculty of Electrical Engineering, Mathematics and Computer Science.
Pitchai, R., Supraja, P., Sulthana, R., Veeramakali, T., 2021. Brain tumor segmentation and prediction using fuzzy neighborhood learning approach for 3D MRI images.
Plant, C., Böhm, C., 2009. Novel Trends in Clustering. Technische Universität München, Munich, Germany, pp. 1–28.
Pomente, A., Aleandri, D., 2017. Convolutional expectation maximization for population estimation. In: CLEF (Working Notes). http://ceur-ws.org/Vol-1866/paper_138.pdf (Accessed July 7, 2021).
Popivanov, I., Miller, R.J., 2002. Similarity search over time series data using wavelets. In: Proceedings of the 18th Int'l Conference on Data Engineering, San Jose, CA, February 26–March 1, pp. 212–221.
Prabhu, P., 2011. Document clustering for information retrieval – a general perspective. Indian Streams Res. J. 1–4.
Procopiuc, C.M., Jones, M., Agarwal, P.K., Murali, T.M., 2002. A Monte Carlo algorithm for fast projective clustering. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data. ACM Press, pp. 418–427.
Punit, R., 2018. Big Data Cluster Analysis and its Applications (Doctor of Philosophy). The University of Melbourne.
Qian, Y., 2006. K-means algorithm and its application for clustering companies listed in Zhejiang province. Data Min. VII: Data Text Web Min. Bus. Appl. 35–44.
Qin, H., Ma, X., Herawan, T., Zain, J.M., 2014. MGR: An information theory based hierarchical divisive clustering algorithm for categorical data. Knowl.-Based Syst. 67, 401–411.
Qu, J., Shao, Z., Liu, X., 2010. Mixed PSO clustering algorithm using point symmetry distance. J. Comput. Inf. Syst. 6 (6), 2027–2035.
Raftery, A., 1986. A note on Bayes factors for log-linear contingency table models with vague prior information. J. R. Statist. Soc. 48 (2), 249–250.
Ragaventhiran, J., Kavithadevi, M.K., 2020. Map-optimize-reduce: CAN tree assisted FP-growth algorithm for clusters based FP mining on Hadoop. Future Gener. Comput. Syst. 103, 111–122. http://dx.doi.org/10.1016/j.future.2019.09.041.
Rahnema, N., Gharehchopogh, F.S., 2020. An improved artificial bee colony algorithm based on whale optimization algorithm for data clustering. Multimedia Tools Appl. 79 (43–44), 32169–32194. http://dx.doi.org/10.1007/s11042-020-09639-2.
Rajagopal, S., 2011. Customer data clustering using data mining technique. Int. J. Database Manage. Syst. (IJDMS) 3 (4).
Rand, W.M., 1971. Objective criteria for the evaluation of clustering methods. J. Amer. Statist. Assoc. 66 (336), 846–850.
Rani, S., Sikka, G., 2012. Recent techniques of clustering of time series data: a survey. Int. J. Comput. Appl. 52 (15).
Räsänen, O., 2007. Speech Segmentation and Clustering Methods for a New Speech Recognition Architecture. Helsinki University of Technology, Department of Electrical and Communications Engineering, Laboratory of Acoustics and Audio Signal Processing.
Rasmussen, E.M., 1992. Clustering algorithms. In: Information Retrieval: Data Structures & Algorithms. pp. 419–442.
Rastgarpour, M., Shanbehzadeh, J., Soltanian-Zadeh, H., 2014. A hybrid method based on fuzzy clustering and local region-based level set for segmentation of inhomogeneous medical images. J. Med. Syst. 38 (8), 1–15.
Ratkowsky, D.A., Lance, G.N., 1978. A criterion for determining the number of groups in a classification. Aust. Comput. J. 10, 115–117.
Ray, S., Turi, R.H., 1999. Determination of number of clusters in K-means clustering and application in colour image segmentation. In: Proceedings of the 4th International Conference on Advances in Pattern Recognition and Digital Techniques. Narosa Pub. House, Calcutta, pp. 137–143.
Reddy, M., Babu, M.R., 2019. Implementing self adaptiveness in whale optimization for cluster head section in Internet of Things. Cluster Comput. 22 (1), 1361–1372.
Ren, J., Cai, B., Hu, C., 2011. Clustering over data streams based on grid density and index tree. J. Converg. Inf. Technol. 6 (1), 83–93.
Ren, J., Ma, R., 2009. Density-based data streams clustering over sliding windows. In: 2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery, Vol. 5. IEEE, pp. 248–252.
Rendon, E., Garcia, R., Abundez, I., Gutierrez, C., Gasca, E., Del Razo, F., Gonzalez, A., 2008. NIVA: A robust cluster validity. In: WSEAS International Conference Proceedings, Mathematics and Computers in Science and Engineering. World Scientific and Engineering Academy and Society, Crete, Greece, pp. 209–213.
Řezanková, H., 2014. Cluster analysis of economic data. Statistika 94 (1), 73–86.
Rijsbergen, V., 1979. Information Retrieval. Butterworths, London.
Rohlf, F.J., 1974. Methods of comparing classifications. Annu. Rev. Ecol. Syst. 5 (1), 101–113.
Rokach, L., 2005. Clustering methods. In: Data Mining and Knowledge Discovery Handbook. Springer, pp. 331–352.
Ros, F., Guillaume, S., 2019. Munec: A mutual neighbor-based clustering algorithm. Inform. Sci.
Rose, R.L., Puranik, T.G., Mavris, D.N., 2020. Natural language processing based method for clustering and analysis of aviation safety narratives. Aerospace 7 (143), 1–22.
Rousseeuw, P., 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65.
Rovetta, S., Mnasri, Z., Masulli, F., Cabri, A., 2019. Emotion recognition from speech signal using fuzzy clustering. In: Atlantis Studies in Uncertainty Modelling, 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019). pp. 120–127.
Rozhnov, I.P., Orlov, V.I., Kazakovtsev, L.A., 2019. VNS-based algorithms for the centroid-based clustering problem. Facta Univ. Ser. Math. Inform. 34, 957–972.
Ruiz, Carlos, Menasalvas, Ernestina, Spiliopoulou, Myra, 2009. C-DenStream: Using domain knowledge on a data stream. In: Proc. of the International Conference on Information Engineering and Computer Science, ICIECS.
Runkler, T.A., 2005. Ant colony optimization of clustering models. Int. J. Intell. Syst. 20 (12), 1233–1251.
Saatchi, S., Hung, C.C., 2005. Hybridization of the ant colony optimization with the K-means algorithm for clustering. In: Image Analysis. In: Lecture Notes in Computer Science, vol. 3540, Springer, Berlin.
Saemi, B., Hosseinabadi, A.A., Kardgar, M., Balas, V.E., Ebadi, H., 2018. Nature inspired partitioning clustering algorithms: A review and analysis. Soft Comput. Appl. Adv. Intell. Syst. Comput. 643, 97–116.
Safari, Z., Mursi, K.T., Zhuang, Y., 2020. Fast automatic determination of cluster numbers for high dimensional big data. In: Proceedings of the 2020 4th International Conference on Compute and Data Analysis. pp. 50–57.
Saha, S., Alok, A.K., Ekbal, A., 2016. Brain image segmentation using semi-supervised clustering. Expert Syst. Appl. 52, 50–63.
Saitta, R.B., Smith, I., 2007. A bounded index for cluster validity. In: Perner, P. (Ed.), Machine Learning and Data Mining in Pattern Recognition. In: Lecture Notes in Computer Science, vol. 4571, Springer, Berlin, Heidelberg, pp. 174–187.
Sanse, K., Sharma, M., 2015a. Clustering methods for big data analysis. Int. J. Adv. Res. Comput. Eng. Technol. 4 (3), 642–648.
Sanse, Keshav, Sharma, Meena, 2015b. Clustering methods for big data analysis.
Sardar, T.H., Ansari, Z., 2018a. An analysis of MapReduce efficiency in document clustering using parallel K-means algorithm. Future Comput. Inform. J. 3, 200–209.
Sardar, T.H., Ansari, Z., 2018b. Partition based clustering of large datasets using MapReduce framework: An analysis of recent themes and directions. Future Comput. Inform. J. 3, 143–151.
Sasaki, Hiroaki, Kanamori, Takafumi, Hyvarinen, Aapo, Niu, Gang, Sugiyama, Masashi, 2018. Mode-seeking clustering and density ridge estimation via direct estimation of density-derivative-ratios. J. Mach. Learn. Res. 18, 1–47.
Sathya Priya, P., Priyadharshini, S., 2012. Clustering technique in data mining for text documents. Int. J. Comput. Sci. Inf. Technol. 2943–2947.
Savaresi, S.M., Boley, D.L., Bittanti, S., Gazzaniga, G., 2002. Cluster selection in divisive clustering algorithms. In: Paper Presented at the Proceedings of the 2002 SIAM International Conference on Data Mining.
Saxena, A., Prasad, M., Gupta, A., Bharill, N., Patel, O.P., Tiwari, A., et al., 2017. A review of clustering techniques and developments. Neurocomputing. http://dx.doi.org/10.1016/j.neucom.2017.06.053.
Scott, A.J., Symons, M.J., 1971. Clustering methods based on likelihood ratio criteria. Biometrics 387–397.
Senthilnath, J., Omkar, S.N., Mani, V., 2011. Clustering using firefly algorithm: performance study. Swarm Evol. Comput. 1 (3), 164–171.
Sfetsos, A., Siriopoulos, C., 2004. Time series forecasting with a hybrid clustering scheme and pattern recognition. IEEE Trans. Syst. Man Cybern. 34 (3), 399–405.
Shaffer, E., Dubes, R.C., Jain, A.K., 1979. Single-link characteristics of a mode-seeking clustering algorithm. Pattern Recognit. 11 (1), 65–70.
Sharan, R., Shamir, R., 2000. CLICK: a clustering algorithm with applications to gene expression analysis. In: Proc. International Conference on Intelligent Systems for Molecular Biology. pp. 307–316.
Sharma, S., 1996. Applied Multivariate Techniques. John Wiley & Sons.
Sheikh, R.H., Raghuwanshi, M.M., Jaiswal, A.N., 2008. Genetic algorithm based clustering: a survey. In: First International Conference on Emerging Trends in Engineering and Technology, Vol. 2. IEEE, Nagpur, India, pp. 314–319.
Shekar, B., Murty, M.N., Krishna, G., 1987. A knowledge-based clustering scheme. Pattern Recognit. Lett. 5 (4), 253–259.
Shi, Z., Wu, D., Guo, C., Zhao, C., Cui, Y., Wang, F.Y., 2021. FCM-RDpA: TSK fuzzy regression model construction using fuzzy C-means clustering, regularization, DropRule, and Powerball AdaBelief. Inform. Sci. 574, 490–504. http://dx.doi.org/10.1016/j.ins.2021.05.084.
Shirkhorshidi, A.S., Aghabozorgi, S., Wah, T.Y., Herawan, T., 2014. Big data clustering: A review. In: ICCSA 2014, Part V. In: LNCS 8583, Springer International Publishing, Switzerland, pp. 707–720.
Sibson, R., 1973. SLINK: An optimally efficient algorithm for the single-link cluster method. Comput. J. 16 (1), 30–34. http://dx.doi.org/10.1093/comjnl/16.1.30.
Silva, J.A., Faria, E.R., Barros, R.C., Hruschka, E.R., Carvalho, A.C.D., Gama, J., 2013. Data stream clustering: A survey. ACM Comput. Surv. 46 (1), 1–31.
Sinaga, K.P., Yang, M.S., 2020. Unsupervised K-means clustering algorithm. IEEE Access 8, 80716–80727.
Singh, S., Srivastava, S., 2020. Review of clustering techniques in control system. Procedia Comput. Sci. 173, 272–280. http://dx.doi.org/10.1016/j.procs.2020.06.032.
Singh, A., Yadav, A., Rana, A., 2013. K-means with three different distance metrics. Int. J. Comput. Appl. 67 (10).
Sneath, P.H., Sokal, R.R., 1973. Numerical Taxonomy: The Principles and Practice of Numerical Classification.
Sonkamble, B.A., Doye, D.D., 2012. Speech recognition using vector quantization through modified K-MeansLBG algorithm. Comput. Eng. Intell. Syst. 3 (7), 137–145.
Soppari, K., Chandra, N.S., 2020. Development of improved whale optimization-based FCM clustering for image watermarking. Comp. Sci. Rev. 37, 100287. http://dx.doi.org/10.1016/j.cosrev.2020.100287.
Sruthi, K., Reddy, M.B., 2013. Document clustering on various similarity measures. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 3 (8), 1269–1273.
Stefan, R.-M., 2014. Cluster type methodology for grouping data. Procedia Econ. Finance: Emerging Markets Queries in Finance and Business 15, 357–362.
Steinback, M., Tan, P.N., Kumar, V., Klooster, S., Potter, C., 2002. Temporal data mining for the discovery and analysis of ocean climate indices. In: The 2nd Workshop on Temporal Data Mining, at the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, Alberta, Canada.
Strehl, A., Ghosh, J., 2000. Clustering guidance and quality evaluation using relationship-based visualization. In: Intelligent Engineering Systems through Artificial Neural Networks. St. Louis, Missouri, USA, pp. 483–488.
Su, Z.G., Wang, P.H., Shen, J., Li, Y.G., Zhang, Y.F., Hu, E.J., 2012. Automatic fuzzy partitioning approach using variable string length artificial bee colony (VABC) algorithm. Appl. Soft Comput. 12 (11), 3421–3441.
Suganya, R., Pavithra, M., Nandhini, P., 2018. Algorithms and challenges in big data clustering. Int. J. Eng. Tech. 4 (4), 40–47.
Sugavaneswaran, L., 2017. Mathematical modeling of gene networks. Encycl. Biomed. Eng. 1–23. http://dx.doi.org/10.1016/B978-0-12-801238-3.64118-1.
Sung, C.S., Jin, H.W., 2000. A tabu-search-based heuristic for clustering. Pattern Recognit. 33, 849–858. http://dx.doi.org/10.1016/S0031-3203(99)00090-4.
Suresh, K., Kundu, D., Ghosh, S., Das, S., Abraham, A., 2009. Data clustering using multi-objective differential evolution algorithms. Fund. Inform. 97 (4), 381–403.
Tan, P.N., 2018. Introduction to Data Mining. Pearson Education India.
Tang, R., Fong, S., Yang, X.-S., Deb, S., 2012. Nature-inspired clustering algorithms for web intelligence data. In: 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology. pp. 147–153.
Theodoridis, S., Koutroubas, K., 1999. Pattern Recognition. Academic Press.
Thilakaratne, M., Falkner, K., Atapattu, T., 2019. A systematic review on literature-based discovery workflow. PeerJ Comput. Sci. 5, e235.
Thomas, M.C., Romagnoli, J., 2016. Extracting knowledge from historical databases for process monitoring using feature extraction and data clustering. In: Proceedings of the 26th European Symposium on Computer Aided Process Engineering – ESCAPE 26. pp. 859–864.
Tran, L., Fan, L., Shahabi, C., 2021. Mixed-type data with correlation-preserving embedding. In: International Conference on Database Systems for Advanced Applications. Springer, Cham, pp. 342–358.
Tsay, R.S., 2005. Analysis of Financial Time Series. John Wiley & Sons.
Tunali, V., Bilgin, T., Camurcu, A., 2015. An improved clustering algorithm for text mining: Multi-cluster spherical K-means. Int. Arab J. Inform. Technol. 1, 2–19.
Ultsch, A., Mörchen, F., 2005. ESOM-Maps: Tools for Clustering, Visualization, and Classification with Emergent SOM.
Vaidya, J., Clifton, C., 2003. Privacy-preserving k-means clustering over vertically partitioned data. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 206–215.
Vani, H., Anusuya, M.A., 2019. Fuzzy clustering algorithms – comparative studies for noisy speech signals. ICTACT J. Soft Comput. 9 (3), 1920–1926.
Venkataramana, B., Padmasree, L., Srinivasa Rao, M., Ganesan, G., Rama Krishna, K., 2017. Implementation of clustering algorithms for real datasets in medical diagnostics using MATLAB. J. Soft Comput. Appl. 2017 (1), 53–66.
Verbeek, J., 2004. Mixture Models for Clustering and Dimension Reduction (Doctoral dissertation). Universiteit van Amsterdam.
Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A., 2008. Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning. pp. 1096–1103.
Vlachos, M., Gunopulos, D., Das, G., 2004. Indexing time-series under conditions of noise. In: Last, M., Kandel, A., Bunke, H. (Eds.), Data Mining in Time Series Databases. World Scientific, Singapore, p. 67.
Voorhees, Ellen M., 1986. Implementing agglomerative hierarchic clustering algorithms for use in document retrieval. Inf. Process. Manage. 22 (6), 465–476.
Vries, C.M., 2014. Document clustering algorithms, representations and evaluation for information retrieval. In: Computational Intelligence and Signal Processing. Queensland University of Technology, Brisbane, Australia.
Žalik, K.R., Žalik, B., 2011. Validity index for clusters of different sizes and densities. Pattern Recognit. Lett. 32 (2), 221–234.
Waheed, A., Akram, M.U., Khalid, S., Waheed, Z., Khan, M.A., Shaukat, A., 2015. Hybrid features and mediods classification based robust segmentation of blood vessels. J. Med. Syst. 39 (10), 1–14.
Wan, L., Ng, W.K., Dang, X.H., Yu, P.S., Zhang, K., 2009. Density-based clustering of data streams at multiple resolutions. ACM Trans. Knowl. Discov. Data (TKDD) 3 (3), 1–28.
Wang, Z., Chang, S., Zhou, J., Wang, M., Huang, T.S., 2016. Learning a task-specific deep architecture for clustering. In: Proceedings of the 2016 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, pp. 369–377.
Wang, X.D., Chen, R.C., Yan, F., Zeng, Z.Q., Hong, C.Q., 2019. Fast adaptive K-means subspace clustering for high-dimensional data. IEEE Access 7, 42639–42651. http://dx.doi.org/10.1109/ACCESS.2019.2907043.
Wang, S., Fan, J., Fang, M., Yuan, H., 2014. HGCUDF: Hierarchical grid clustering using data field. Chinese J. Electron. 23 (1), 37–42.
Wang, T.-C., Pham, Y.T., 2020. An application of cluster analysis method to determine Vietnam Airlines' ground handling service quality benchmarks. J. Adv. Transp. 1–13.
Wang, H., Wang, W., Yang, J., Yu, P.S., 2002. Clustering by pattern similarity in large data sets. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, SIGMOD '02. p. 394.
Wang, Y., Yan, H., Sriskandarajah, C., 1996. The weighted sum of split and diameter clustering. J. Classification 13 (2), 231–248.
Wang, J., Zeng, H., Chen, Z., Lu, H., Tao, L., Ma, W.Y., 2003. ReCoM: reinforcement clustering of multi-type interrelated data objects. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 274–281.
Ward, J.H., 1963. Hierarchical grouping to optimize an objective function. J. Amer. Statist. Assoc. 58, 236–244.
Ware, S., Dhawas, N.A., 2012. Web document clustering using KEA-means algorithm. Int. J. Comput. Technol. Appl. 3 (5), 1720–1725.
Warrenliao, T., 2005. Clustering of time series data — a survey. Pattern Recognit. 38 (11), 1857–1874.
Wei, L., Kumar, N., Lolla, V., Keogh, E., 2005. Assumption-free anomaly detection in time series. In: Proceedings of the 17th International Conference on Scientific and Statistical Database Management. pp. 237–240.
Weidt, F., Silva, R., 2016. Systematic Literature Review in Computer Science: A Practical Guide. Relatórios Técnicos do DCC/UFJF, 1.
Wharton, S.W., 1983. A generalized histogram clustering scheme for multidimensional image data. Pattern Recognit. 16 (2), 193–199.
Williams, W.T., Lambert, J.M., 1959. Multivariate methods in plant ecology: I. Association-analysis in plant communities. J. Ecol. 83–101.
Woo, K.-G., Lee, J.-H., 2002. FINDIT: A Fast and Intelligent Subspace Clustering Algorithm Using Dimension Voting (Ph.D. thesis). Korea Advanced Institute of Science and Technology, Taejon, Korea.
Wu, J., Zhong, S.-h., Jiang, J., Yang, Y., 2015. A novel clustering method for static video summarization. Multimedia Tools Appl. http://dx.doi.org/10.1007/s11042-016-3569-x.
Xia, Y., Fu, Z., Pan, L., Duan, F., 2018. Tabu search algorithm for the distance-constrained vehicle routing problem with split deliveries by order. PLoS ONE 13 (5), e0195457. http://dx.doi.org/10.1371/journal.pone.0195457.
Xie, X.L., Beni, G., 1991. A validity measure for fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 13 (8), 841–847.
Xie, J., Girshick, R., Farhadi, A., 2016. Unsupervised deep embedding for clustering analysis. https://github.com/piiswrong/dec.
Xie, H., Zhang, L., Lim, C.P., Yu, Y., Liu, C., Liu, H., et al., 2019. Improving K-means clustering with enhanced firefly algorithms. Appl. Soft Comput. 84.
Xiong, T., Wang, S., Mayers, A., Monga, E., 2009. A new MCA-based divisive hierarchical algorithm for clustering categorical data. In: Paper Presented at the 2009 Ninth IEEE International Conference on Data Mining.
Xu, D., Tian, Y., 2015. A comprehensive survey of clustering algorithms. Ann. Data Sci. 2 (2), 165–193.
Xu, R., Wunsch, D., 2005. Survey of clustering algorithms. IEEE Trans. Neural Netw. 16, 645–678. http://dx.doi.org/10.1109/TNN.2005.845141.
Yager, R., Filev, D., 1994. Approximate clustering via the mountain method. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 24 (8), 1279–1284.
Yaghini, M., Ghazanfari, N., 2020. Tabu-KM: A hybrid clustering algorithm based on tabu search approach. Int. J. Ind. Eng. Prod. Res. (2010), 71–79.
Yan, F., Wang, X.D., Zeng, Z.Q., Hong, C.Q., 2020. Adaptive multi-view subspace clustering for high-dimensional data. Pattern Recognit. Lett. 130, 299–305.
Yang, B., Fu, X., Sidiropoulos, N.D., Hong, M., 2017. Towards K-means-friendly spaces: Simultaneous deep learning and clustering. https://github.com/boyangumn/DCN.
Yang, Y., Ma, Z., Yang, Y., Nie, F., Shen, H.T., 2014. Multitask spectral clustering by exploring intertask correlation. IEEE Trans. Cybern. 45 (5), 1083–1094.
Yang, Y., Wang, H., 2018. Multi-view clustering: A survey. Big Data Min. Anal. 1 (2), 83–107. http://dx.doi.org/10.26599/BDMA.2018.9020003.
Yang, J., Wang, W., Wang, H., Yu, P., 2002. δ-Clusters: capturing subspace correlation in a large data set. In: Proceedings of the 18th International Conference on Data Engineering. pp. 517–528.
Yin, S., Gan, G., Valdez, E.A., Vadiveloo, J., 2021. Applications of clustering with mixed type data in life insurance. Risks 9 (3), 47.
Yu, J., Li, H., Liu, D., 2020. Modified immune evolutionary algorithm for medical data clustering and feature extraction under cloud computing environment. J. Healthcare Eng. 1–11.
Zadeh, L., 1965. Fuzzy sets. Inform. Control 8 (3), 338–353.
Zahn, C.T., 1971. Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. Comput. C-20 (1), 68–86. http://dx.doi.org/10.1109/T-C.1971.223083.
Zahoor, J., Zafar, K., 2020. Classification of microarray gene expression data using an infiltration tactics optimization (ITO) algorithm. Genes 11 (819), 1–28.
Zare, H., Shooshtari, P., Gupta, A., Brinkman, R.R., 2010. Data reduction for spectral clustering to analyze high throughput flow cytometry data. BMC Bioinformatics 11, 403. http://www.biomedcentral.com/1471-2105/11/403.
Zerhari, B., Lahcen, A.A., Mouline, S., 2015. Big data clustering: Algorithms and challenges. In: Conference: LRIT, Unité Associée au CNRST URAC 29, Mohammed V University, Rabat, Morocco.
Zhang, T., Ramakrishnan, R., Livny, M., 1996. BIRCH: An efficient method for very large databases. In: ACM SIGMOD.
Zhong, C., Miao, D., Wang, R., Zhou, X., 2008. DIVFRP: An automatic divisive hierarchical clustering method based on the furthest reference points. Pattern Recognit. Lett. 29 (16), 2067–2077.
Zhou, Aoying, Cao, Feng, Qian, Weining, Jin, Cheqing, 2008. Tracking clusters in evolving data streams over sliding windows. Knowl. Inf. Syst. 15 (2).
Zhou, L., Wang, D., Guo, L., Wang, L., Jiang, J., Liao, W., 2017. FDS analysis for multilayer insulation paper with different aging status in traction transformer of high-speed railway. IEEE Trans. Dielectr. Electr. Insul. 24 (5), 3236–3244.
Zhou, Y., Wu, H., Luo, Q., Abdel-Baset, M., 2019. Automatic data clustering using nature-inspired symbiotic organism search algorithm. Knowl.-Based Syst. 163, 546–557.
Zolhavarieh, S., Aghabozorgi, S., Teh, Y.W., 2014. A review of subsequence time series clustering. Sci. World J. 2014, 312521. http://dx.doi.org/10.1155/2014/312521.