

Cluster analysis using optimization algorithms with newly designed objective functions

D. Binu
Aloy Labs, Bengaluru, India
E-mail address: [email protected]

Expert Systems with Applications (2015), http://dx.doi.org/10.1016/j.eswa.2015.03.031
Article history: Available online xxxx

Keywords: Clustering; Optimization; Genetic algorithm (GA); Cuckoo search (CS); Particle swarm optimization (PSO); Kernel space

Abstract

Clustering finds various applications in fields such as medicine and telecommunication for unsupervised learning, which is much required in expert systems and their applications. Many algorithms have been developed for clustering in the fifty years since the introduction of k-means clustering. Recently, optimization algorithms have been applied to clustering to find optimal clusters with the help of different objective functions. Accordingly, in this research, clustering is performed using three newly designed objective functions along with four existing objective functions, with the help of optimization algorithms such as the genetic algorithm, cuckoo search, and particle swarm optimization. The three new objective functions include the cumulative summation of fuzzy membership and distance values, computed in the original data space, in kernel space, and in multiple kernel space. With seven objective functions in total, 21 different clustering algorithms are discussed, and their performance is validated on 16 different datasets covering synthetic data and small- and large-scale real data. The comparison is made with five different evaluation metrics to validate effectiveness and efficiency. From the research outcome, suggestions are presented for selecting a suitable algorithm among the 21 for particular data, and the results prove that the effectiveness of cluster analysis depends mainly on the objective function, while its efficiency depends on the search algorithm.

© 2015 Published by Elsevier Ltd.
1. Introduction

Expert systems are intelligent software programs designed for taking useful managerial decisions in domains ranging from agriculture, finance, education, and medicine to military science, process control, space technology, and engineering. Expert systems require different data mining methods to support the decision-making process; among these, classification and clustering are two important methods applied widely in expert systems. Clustering, which is unsupervised learning, has received significant attention among researchers due to its wide applicability in the fifty years since the introduction of the k-means clustering algorithm (McQueen, 1967), well known for its simplicity. Owing to the reception of k-means clustering, variants of it have been introduced by different researchers to address problems such as initialization (Khan & Ahmad, 2004), the choice of k (Pham, Dimov, & Nguyen, 2004), and distance computation. One of the most accepted clustering methods after k-means is fuzzy c-means clustering (FCM) (Bezdek, 1981), which introduces the fuzzy concept into the computation of cluster centroids. Numerous variants of the FCM algorithm have also appeared among researchers (Ji, Pang, Zhou, Han, & Wang, 2012; Ji et al., 2012; Kannana, Ramathilagam, & Chung, 2012; Linda & Manic, 2012; Maji, 2011). The important variants of FCM are kernel fuzzy clustering (Zhang & Chen, 2004) and multiple kernel-based clustering (Chen, Chen, & Lu, 2011), which build on FCM with the inclusion of kernels and are widely accepted for their capability of handling non-linear data. More interestingly, all of these algorithms have found importance in image segmentation (Chen et al., 2011; Zhang & Chen, 2004) and in applications related to image segmentation (Ji, Pang, et al., 2012; Ji et al., 2012; Li & Qi, 2007; Sulaiman & Isa, 2010; Szilágyi, Szilágyi, Benyó, & Benyó, 2011; Zhao, Jiao, & Liu, 2013).

After the introduction of soft computing techniques, the clustering problem was transformed into an optimization problem: finding the optimal clusters in a defined search space. Accordingly, most optimization algorithms have been applied to clustering problems. For example, the pioneering optimization algorithm, the GA (Mualik & Bandyopadhyay, 2002), was applied to clustering first, and then the PSO algorithm (Premalatha & Natarajan, 2008), Artificial Bee Colony (Zhang, Ouyang, & Ning, 2010), Bacterial Foraging Optimization (Wan, Li, Xiao, Wang, & Yang, 2012), Simulated Annealing (Selim & Alsultan, 1991), the Differential Evolution algorithm (Das, Abraham, & Konar, 2008), Evolutionary algorithms (Castellanos-Garzón & Diaz, 2013), and Firefly (Senthilnath, Omkar, & Mani, 2011) were subsequently applied to clustering.
Recently, İnkaya, Kayalıgil, and Özdemirel (2015) utilized Ant Colony Optimization for a clustering methodology using two objective functions, namely adjusted compactness and relative separation. Liyong, Witold, Wei, Xiaodong, and Li (2014) utilized genetically guided alternating optimization for fuzzy c-means clustering; here, interval numbers were introduced for attribute weighting in weighted fuzzy c-means (WFCM) clustering to obtain appropriate weights more easily from the viewpoint of geometric probability. Hoang, Yadav, Kumar, and Panda (2014) utilized the recent Harmony Search algorithm for clustering. Yuwono, Su, Moulton, and Nguyen (2014) developed Rapid Centroid Estimation, which uses the rules of the PSO algorithm to reduce computational complexity and produce clusters with higher purity. These recent algorithms all used traditional objective functions for evaluating the clustering solution.

Hybrid algorithms have also been applied to clustering to exploit the advantages of both algorithms taken for hybridization. Two optimization algorithms may be combined for the clustering task, as with GA and PSO (Kuo, Syu, Chen, & Tien, 2012); evidently, whenever a new optimization algorithm appears, researchers are eager to apply it to clustering. Following the success of such hybrids, researchers have also hybridized traditional clustering algorithms with optimization algorithms. For example, GA combined with k-means clustering yields genetic k-means (Krishna & Murty, 1999), and similar work is given in Niknam and Amiri (2010). Recently, Krishnasamy, Kulkarni, and Paramesran (2014) proposed a hybrid evolutionary data clustering algorithm referred to as K-MCI, in which k-means and modified cohort intelligence are combined. Wei, Yingying, Soon Cheol, and Xuezhong (2015) developed a hybrid evolutionary computation approach using quantum-behaved particle swarm optimization for data clustering. Garcia-Piquer, Fornells, Bacardit, Orriols-Puig, and Golobardes (2014) developed multiobjective clustering that guides the search through a cycle based on evolutionary algorithms. Tengke, Shengrui, Qingshan, and Huang (2014) proposed a cascade optimization framework that combines the weighted conditional probability distribution (WCPD) and WFI models for data clustering.

In optimization-based clustering, the procedure is driven by a fitness function that validates the clusters achieved. The constraint here is that the fitness function must be capable of ensuring good cluster quality: the objective function is responsible both for validating the clustering output and for directing the search towards the optimal cluster centroids. Looking at clustering fitness functions, most optimization-based algorithms use the k-means objective (minimum mean squared distance) as the fitness function for the optimal search of clusters (Wan et al., 2012) because of its simple computation. Similarly, the FCM objective has been applied as a fitness function for finding optimal cluster centroids (Ouadfel & Meshoul, 2012) due to its flexibility and effectiveness. Some authors apply cluster validity indices within swarm intelligence-based optimization (Xu, Xu, & Wunsch, 2012) to capture a different perspective of cluster quality. In addition, fuzzy cluster validity indices incorporating fuzzy theory have been developed and applied within the GA (Pakhira, Bandyopadhyay, & Maulik, 2005). Going further, multiple objectives have been combined for clustering optimization, as in Bandyopadhyay (2011), where cluster stability and validity are combined as the fitness and solved with an optimization algorithm, simulated annealing (Saha & Bandyopadhyay, 2009).

From this overall analysis, our finding is that most optimization algorithms use the k-means (KM) or FCM objective for clustering optimization. Moreover, to the best of our knowledge, the MKFCM (multiple kernel FCM) objective has not previously been solved through optimization-based clustering. So, to perform the clustering task with optimization, two well-known objectives (KM and FCM), two recent objectives (KFCM and MKFCM), and three newly designed objective functions are used here. These objectives are selected for their (i) applicability and popularity (KM and FCM), (ii) recency and standing (KFCM and MKFCM), and (iii) effectiveness and importance (the three newly designed objective functions). We then need optimization algorithms to solve these objectives. Although various optimization algorithms are presented in the literature, three are chosen for the clustering task: GA, because it is traditional and popular (Goldberg & David, 1989); PSO, because it is an intelligent algorithm accepted by various researchers for its capability of adapting its state according to its most optimistic position (Kennedy & Eberhart, 1995); and CS, because it is a recent and effective algorithm proved better for various complex engineering tasks (Yang & Deb, 2010).

The paper is organized as follows: Section 2 presents the contributions of the paper and Section 3 discusses objective measures taken from the literature. Section 4 presents the newly designed objective functions and Section 5 describes the solution encoding procedure. Section 6 discusses the optimization algorithms taken for data clustering and Section 7 presents the experimentation with detailed results. Finally, the conclusion is summed up in Section 8.

2. Contributions of the paper

The most important contributions of the paper are as follows:

(i) Clustering with optimization: We develop the clustering process through optimization techniques in order to achieve optimal cluster quality. Two traditional objective functions (KM and FCM), two recent objective functions (KFCM and MKFCM), and three newly developed objective functions are used for the task, together with the optimization algorithms GA, PSO, and CS.

(ii) Hybridization: To the best of our knowledge, the MKFCM objective is solved here with optimization algorithms for the first time. The three optimization algorithms GA, PSO, and CS are combined with the MKFCM objective to obtain three new hybrid algorithms (GA-MKFCM, PSO-MKFCM, and CS-MKFCM) not previously presented in the literature.

(iii) New objective functions: We design three new objective functions (FCM + CF, KFCM + KCF, MKFCM + MKCF) that include the cumulative summation of fuzzy membership and distance values; the same cumulative summation is also performed in kernel space and in multiple kernel space. These three new objective functions are derived with full mathematical formulation, and the corresponding theorems and proofs are provided.


(iv) Algorithms: We present nine new algorithms (GA-FCM + CF, PSO-FCM + CF, CS-FCM + CF, GA-KFCM + KCF, PSO-KFCM + KCF, CS-KFCM + KCF, GA-MKFCM + MKCF, PSO-MKFCM + MKCF, and CS-MKFCM + MKCF) with the help of the three newly designed objective functions. Three further algorithms (GA-MKFCM, PSO-MKFCM, and CS-MKFCM) are presented by hybridizing in a way not found in the literature, and nine existing algorithms are also considered, so in total 21 algorithms are discussed in this paper. These algorithms, identified after detailed analysis of the literature, are a major contribution of this paper. The clustering algorithms formulated from the hybridization of the optimization algorithms and the objective functions are specified in Table 1.

(v) Validation: To validate the 21 algorithms, five evaluation metrics and 16 datasets are used: eight real datasets, two image datasets, and six synthetically generated datasets. The performance of the algorithms is then extensively analyzed from three different perspectives (search algorithm, objective function, and hybridization) to examine their effect on the clustering results. Finally, the most suitable algorithms for clustering are suggested with respect to the characteristics of the input data.

Table 1. Algorithms.

              GA                   PSO                   CS
KM            GA-KM (c)            PSO-KM (c)            CS-KM (c)
FCM           GA-FCM (c)           PSO-FCM (c)           CS-FCM (c)
KFCM          GA-KFCM (c)          PSO-KFCM (c)          CS-KFCM (c)
MKFCM         GA-MKFCM (b)         PSO-MKFCM (b)         CS-MKFCM (b)
FCM + CF      GA-FCM + CF (a)      PSO-FCM + CF (a)      CS-FCM + CF (a)
KFCM + KCF    GA-KFCM + KCF (a)    PSO-KFCM + KCF (a)    CS-KFCM + KCF (a)
MKFCM + MKCF  GA-MKFCM + MKCF (a)  PSO-MKFCM + MKCF (a)  CS-MKFCM + MKCF (a)

(a) Algorithms with a new objective function (novel work). (b) Algorithms with an old objective function (no similar prior work). (c) Existing algorithms.

2.1. Problem definition

Let X be the database, consisting of n data points located in the d-dimensional real space, x_i ∈ R^d. Clustering is defined as dividing the n data points into g clusters, which means that the cluster centres m_j (1 ≤ j ≤ g) should be identified from the input database.

3. Objective measures considered from the literature

3.1. Objective function 1: (KM)

The clustering problem defined above is converted into an optimization problem by minimizing the summation of the distances between all data points and their nearest cluster centres. The objective function of k-means clustering (McQueen, 1967) is

$$OB_{KM} = \sum_{i=1}^{n}\sum_{j=1}^{g} \left\| x_i - m_j \right\|^2 \tag{1}$$

subject to the following constraints:

$$\text{(i)}\ m_i \cap m_j = \phi,\ \ i,j = 1,2,\ldots,g,\ i \neq j;\quad \text{(ii)}\ m_i \neq \phi,\ \ i = 1,2,\ldots,g;\quad \text{(iii)}\ \bigcup_{j=1}^{g}\{x_i \in m_j\} = X,\ \ i = 1,2,\ldots,n \tag{2}$$

3.2. Objective function 2: (FCM)

The clustering objective can be represented in another way using the fuzzy membership function along with the distance variable. The objective function of FCM (Bezdek, 1981) is

$$OB_{FCM} = \sum_{i=1}^{n}\sum_{j=1}^{g} u_{ij}^{b} \left\| x_i - m_j \right\|^2 \tag{3}$$

subject to constraints (i)-(iii) of Eq. (2) together with

$$\text{(iv)}\ \sum_{j=1}^{g} u_{ij} = 1,\ \ i = 1,2,\ldots,n;\qquad \text{(v)}\ u_{ij}^{b} = \frac{\left\| x_i - m_j \right\|^{-\frac{1}{b-1}}}{\sum_{j=1}^{g} \left\| x_i - m_j \right\|^{-\frac{1}{b-1}}} \tag{4}$$

3.3. Objective function 3: (KFCM)

The same clustering problem can be stated as an optimization problem including distance and fuzzy membership as variables, but with the computation done in kernel space instead of the original data space. The kernel-based clustering problem is given as (Zhang & Chen, 2004)

$$OB_{KFCM} = \sum_{i=1}^{n}\sum_{j=1}^{g} u_{ij}^{b}\left(1 - k(x_i, m_j)\right) \tag{5}$$

subject to constraints (i)-(iv) as above together with

$$\text{(v)}\ u_{ij}^{b} = \frac{\left(1 - k(x_i, m_j)\right)^{-\frac{1}{b-1}}}{\sum_{j=1}^{g} \left(1 - k(x_i, m_j)\right)^{-\frac{1}{b-1}}};\qquad \text{(vi)}\ k(x_i, m_j) = \exp\!\left(-\frac{\left\| x_i - m_j \right\|^2}{\sigma^2}\right) \tag{6}$$

As suggested in Chen et al. (2011), the value of σ and the fuzzification coefficient b are fixed as 150 and 2, respectively.
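To make the three literature objectives concrete, the following is a minimal NumPy sketch (an illustration only, not the paper's MATLAB implementation; the function names and the nearest-centroid reading of Eq. (1) are ours). It evaluates Eqs. (1), (3), and (5) for a data matrix X of shape (n, d) and a centroid matrix M of shape (g, d), with the membership helper following the form of constraint (v).

```python
import numpy as np

def memberships(diss, b=2.0, eps=1e-12):
    """u_ij^b of constraint (v): proportional to diss^(-1/(b-1)), rows sum to 1.
    `diss` is an (n, g) matrix of point-to-centroid dissimilarities."""
    w = np.maximum(diss, eps) ** (-1.0 / (b - 1.0))
    return w / w.sum(axis=1, keepdims=True)

def ob_km(X, M):
    """Eq. (1): total squared distance of each point to its nearest centroid."""
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)   # (n, g)
    return d2.min(axis=1).sum()

def ob_fcm(X, M, b=2.0):
    """Eq. (3): fuzzy-membership-weighted squared distances."""
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
    return (memberships(np.sqrt(d2), b) * d2).sum()

def ob_kfcm(X, M, b=2.0, sigma=150.0):
    """Eq. (5): as Eq. (3) but with the kernel-induced dissimilarity 1 - k(x, m),
    k being the Gaussian kernel of constraint (vi) with the paper's sigma."""
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
    kd = 1.0 - np.exp(-d2 / sigma ** 2)
    return (memberships(kd, b) * kd).sum()
```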


3.4. Objective function 4: (MKFCM)

The purpose of grouping the data points in the original database X can also be denoted as an optimization problem (Chen et al., 2011):

$$OB_{MKFCM} = \sum_{i=1}^{n}\sum_{j=1}^{g} u_{ij}^{b}\left(1 - k_{com}(x_i, m_j)\right) \tag{7}$$

subject to constraints (i)-(iv) as above together with

$$\text{(v)}\ u_{ij}^{b} = \frac{\left(1 - k_{com}(x_i, m_j)\right)^{-\frac{1}{b-1}}}{\sum_{j=1}^{g} \left(1 - k_{com}(x_i, m_j)\right)^{-\frac{1}{b-1}}};\quad \text{(vi)}\ k_{com}(x_i, m_j) = k_1(x_i, m_j)\,k_2(x_i, m_j);\quad \text{(vii)}\ k_1(x_i, m_j) = k_2(x_i, m_j) = \exp\!\left(-\frac{\left\| x_i - m_j \right\|^2}{\sigma^2}\right) \tag{8}$$

4. Devising of new objective functions

4.1. Objective function 5: (FCM + CF)

The new objective function is introduced with concern for the distance and fuzzy variables along with two additional variables not defined previously, called the cumulative distance and cumulative fuzzy values. These two variables are added to the clustering objective because outlier data points or noise can contribute to the objective function even more than the original data points; this contribution can be suppressed by adding the cumulative distance and cumulative membership values to the old objective function. With this intention, the two variables are added to the FCM objective function to make it more suitable for clustering even when outlier data points are present. The following objective function is designed by adding to the FCM objective the newly introduced term called the cumulative function (CF):

$$OB_{FCM+CF} = \sum_{j=1}^{g}\sum_{\substack{i=1\\ i \in j}}^{n} u_{ij}^{b}\left\| x_i - m_j \right\|^2 + \left(\sum_{j=1}^{g}\sum_{\substack{i=1\\ i \in j}}^{n} u_{ij}^{b}\right)\left(\sum_{\substack{i=1\\ i \in j}}^{n} \left\| x_i - m_j \right\|^2\right) \tag{9}$$

subject to the same constraints (i)-(v) as the FCM objective in Eq. (4). (10)

4.2. Objective function 6: (KFCM + KCF)

The new definition of the cumulative function in kernel space is now given, and it is used along with the KFCM objective to define the clustering optimization. It is only a small variant of the cumulative function defined above, but it affects the performance of the algorithm considerably: the difference is simply that the cumulative function above is computed in the original data space, whereas here the kernel space is used for the membership and distance computation instead. Hence we name it the kernel cumulative function (KCF). The new objective function for clustering considering KCF is

$$OB_{KFCM+KCF} = \sum_{j=1}^{g}\sum_{\substack{i=1\\ i \in j}}^{n} u_{ij}^{b}\left(1 - k(x_i, m_j)\right) + \left(\sum_{j=1}^{g}\sum_{\substack{i=1\\ i \in j}}^{n} u_{ij}^{b}\right)\left(\sum_{\substack{i=1\\ i \in j}}^{n} \left(1 - k(x_i, m_j)\right)\right) \tag{11}$$

subject to the same constraints (i)-(vi) as the KFCM objective in Eq. (6). (12)

4.3. Objective function 7: (MKFCM + MKCF)

Even though kernel space plays a significant role in the distance computation of clustering algorithms, multiple kernels also play a key role in differentiating the data points. Here, multiple kernels are combined to act as the kernel: the objective is simply the variant of the kernel-based objective function with the combined kernel. The cumulative function is likewise computed with multiple kernels, so the function used from here on is named the multiple kernel cumulative function (MKCF). The sum of these two terms taken for clustering optimization is

$$OB_{MKFCM+MKCF} = \sum_{j=1}^{g}\sum_{\substack{i=1\\ i \in j}}^{n} u_{ij}^{b}\left(1 - k_{com}(x_i, m_j)\right) + \left(\sum_{j=1}^{g}\sum_{\substack{i=1\\ i \in j}}^{n} u_{ij}^{b}\right)\left(\sum_{\substack{i=1\\ i \in j}}^{n} \left(1 - k_{com}(x_i, m_j)\right)\right) \tag{13}$$

subject to the same constraints (i)-(vii) as the MKFCM objective in Eq. (8). (14)

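As an illustration of how the cumulative function modifies the FCM objective, the following sketch shows one reading of Eq. (9) (our own, not the paper's code): the condition i ∈ j is taken as hard nearest-centroid assignment, and the CF term is the product of the cumulative membership sum and the cumulative within-cluster distance sum.

```python
import numpy as np

def ob_fcm_cf(X, M, b=2.0, eps=1e-12):
    """Sketch of Eq. (9): the FCM term plus the cumulative function (CF) term."""
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)    # (n, g)
    w = np.maximum(np.sqrt(d2), eps) ** (-1.0 / (b - 1.0))
    u_b = w / w.sum(axis=1, keepdims=True)                     # u_ij^b, constraint (v)
    mask = np.eye(M.shape[0])[d2.argmin(axis=1)]               # "i in j": nearest centroid
    fcm_term = (u_b * d2 * mask).sum()
    cf_term = (u_b * mask).sum() * (d2 * mask).sum()           # cumulative function
    return fcm_term + cf_term
```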

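The multiple kernel variants of Eqs. (7) and (13) simply replace the Gaussian kernel by the product kernel k_com of constraints (vi)-(vii). A hedged sketch of that substitution follows (assuming, as in the paper, two identical Gaussian factors; allowing the two σ values to differ is our own generalization):

```python
import numpy as np

def k_gauss(d2, sigma):
    """Gaussian kernel of constraint (vii), as a function of squared distance."""
    return np.exp(-d2 / sigma ** 2)

def k_com(d2, sigma1=150.0, sigma2=150.0):
    """Combined kernel of constraint (vi): entry-wise product of two kernels.
    With sigma1 == sigma2 this reduces to the paper's choice k1 = k2."""
    return k_gauss(d2, sigma1) * k_gauss(d2, sigma2)

# Eqs. (7) and (13) then use the dissimilarity 1 - k_com in place of 1 - k.
```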
[Fig. 1. Solution encoding.]

5. Solution encoding

The solution encoding procedure employed in the clustering is given in Fig. 1. Every solution defined in the problem space is a vector consisting of d × g values, signifying that the solution holds the centroid values for the given dataset X. Suppose the dataset has three attributes (three dimensions); then the solution vector will contain six elements if the number of clusters needed is two: the first three elements of the solution vector form the first centroid, and the last three elements form the second centroid. This way of representing the solution can fulfil the optimization criterion with low computation time even when the dimension of the solution is high.

6. Optimization algorithms for data clustering

Solving optimization problems requires well-established heuristic procedures from the field of intelligent search for decision making in expert systems. Metaheuristics are widely recognized as efficient approaches for many hard optimization problems, including cluster analysis, and their application to real-world optimization problems is a rapidly growing field of research. This is due to the importance of optimization problems in both the scientific and the industrial world for faster decision making: as metaheuristic technologies mature and lead to widespread deployment of expert systems, finding optimal solutions becomes ever more important. Here, three different heuristic search algorithms are used for solving the clustering problem.

6.1. Genetic algorithm

Step 1: Initial population: Initially, P solutions are given in a population, every chromosome being a d × g vector.
Step 2: Fitness computation: For every chromosome (centroid set), the objective function is computed.
Step 3: Selection: From the population, a set of chromosomes is selected randomly based on the selection rate.
Step 4: Crossover: The crossover operator is applied to the two selected candidates, producing two new individuals.
Step 5: Mutation: The new individuals are then fed to the mutation operator, which again provides a new set of chromosomes.
Step 6: Termination: After the crossover and mutation operators are performed, the algorithm returns to Step 2 until the maximum number of iterations specified by the user is reached.

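For instance, a hypothetical driver combining the sketches above (ga_cluster with the ob_km helper) might look as follows:

```python
import numpy as np

# Two well-separated Gaussian blobs; recover their centroids with the GA sketch.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(5, 1, (100, 3))])
centroids = ga_cluster(X, g=2, fitness=ob_km, pop=10, iters=100)
print(centroids)
```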
6.2. PSO algorithm

Step 1: Initially, P solutions (centroid sets) are given in an initial set of particles. The initial solutions are taken from the input dataset X, and the velocity of every particle is set to zero.
Step 2: For every particle (centroid set), the fitness is computed based on the objective function, and the particle position with minimum fitness is assigned as p_best for the current iteration; p_best is the local best, guiding each particle towards the best position it has reached.
Step 3: Determine the particle with minimum fitness over all iterations executed so far and record it as g_best, the global best position found by all particles in the search space.
Step 4: Once p_best and g_best are found, the particles' (centroids') velocities are regenerated using the following equation:

$$v_{t+1} = w \cdot v_t + \phi_1 \cdot rnd \cdot (p_{best} - x_t) + \phi_2 \cdot rnd \cdot (g_{best} - x_t) \tag{15}$$

where φ1 and φ2 are set to two, v_t is the old velocity of the particle, rnd is a random number in (0, 1), and x_t is the current particle position.
Step 5: New positions for all the particles are then found from the new velocities and the previous positions:

$$x_{t+1} = x_t + v_{t+1} \tag{16}$$

Once a new position is generated, lower-bound and upper-bound conditions are checked against the bounds available in the input database: if a new position value is less than the lower bound it is replaced with the lower-bound value, and if it is greater than the upper bound it is replaced with the upper-bound value.
Step 6: Go to Step 2 until the maximum iteration is reached.

6.3. Cuckoo search algorithm

Step 1: Initially, P solutions are given in an initial set of nests, and every nest is represented by a d × g matrix.
Step 2: Choose a random index j between 1 and P through the Lévy flight equation, and select the corresponding solution (centroid set).
Step 3: Evaluate the fitness of the nest at location j based on the objective function; generate a random index i between 1 and P blindly, and take the solution at location i of the population to find its fitness.
Step 4: Replace nest j by a new solution if the fitness belonging to j is less than that of i; the fitnesses of the two solutions from the previous steps are compared, and the new solution x_{t+1} for the worst nest is generated by

$$x_{t+1} = x_t + \alpha \oplus \mathrm{Levy}(\lambda) \tag{17}$$

where α > 0 is the step size, which should be related to the scale of the problem of interest, and the product ⊕ means entry-wise multiplication. Lévy flights essentially provide a random walk whose random steps are drawn from a Lévy distribution for large steps:

$$\mathrm{Levy} \sim u = t^{-\lambda},\quad 1 < \lambda \leq 3 \tag{18}$$

Once a new position is generated, lower-bound and upper-bound conditions are checked against the bounds available in the input database: if a value obtained from the Lévy flight exceeds the upper bound it is replaced with the upper-bound value, and if it is below the lower bound it is replaced with the lower-bound value.
Step 5: Based on the probability p_a given in the algorithm, the worst nests are identified and new ones are built in the corresponding locations.
Step 6: The best set of nests is maintained in every iteration based on the objective function, and the process continues from Steps 2 to 5 until the maximum iteration is reached.
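For the PSO update of Section 6.2, a minimal sketch of Eqs. (15) and (16) for one iteration over flat d × g particles is given below (our own illustration; the parameter values follow the experimental setup of Section 7.3 rather than the φ1 = φ2 = 2 mentioned in the step description).

```python
import numpy as np

def pso_step(P, V, pbest, gbest, lo, hi, w=0.72, phi1=1.49, phi2=1.49, rng=None):
    """One PSO iteration: Eq. (15) velocity update, Eq. (16) position update,
    then clamping to the per-dimension bounds of the input data."""
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(P.shape), rng.random(P.shape)
    V = w * V + phi1 * r1 * (pbest - P) + phi2 * r2 * (gbest - P)   # Eq. (15)
    P = np.clip(P + V, lo, hi)                                      # Eq. (16) + bounds
    return P, V
```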

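For the Lévy flight of Eqs. (17)-(18) in Section 6.3, a common concrete choice (an assumption on our part; the paper does not fix one) is Mantegna's algorithm for generating Lévy-stable steps:

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(shape, lam=1.5, rng=None):
    """Mantegna-style Levy step with exponent lam (1 < lam <= 3), cf. Eq. (18)."""
    rng = rng or np.random.default_rng()
    num = gamma(1 + lam) * sin(pi * lam / 2)
    den = gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2)
    sigma_u = (num / den) ** (1 / lam)
    u = rng.normal(0, sigma_u, shape)
    v = rng.normal(0, 1, shape)
    return u / np.abs(v) ** (1 / lam)

def cuckoo_move(x, lo, hi, alpha=1.0, rng=None):
    """Eq. (17): new candidate via an entry-wise Levy perturbation, clamped to bounds."""
    return np.clip(x + alpha * levy_step(x.shape, rng=rng), lo, hi)
```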

7. Results and discussion

This section presents the experimental validation of the clustering techniques. First, the evaluation metrics and the datasets taken for validating the clustering techniques are described in full. Detailed experimental results are then given in tables, with the corresponding discussion.

7.1. Evaluation metrics

The performance of the algorithms is measured through effectiveness and efficiency. Effectiveness is evaluated with four different metrics: the clustering accuracy (CA) given in Yang and Chen (2011), the Rand coefficient (RC) and Jaccard coefficient (JC) given in Wan et al. (2012), and the adjusted Rand index (ARI). Efficiency is measured by computation time.

The metrics are defined as follows. Clustering accuracy:

$$CA = \frac{1}{K}\sum_{i=1}^{K} \max_{j \in \{1,2,\ldots,g\}} \frac{2\left|C_i \cap P_{m_j}\right|}{\left|C_i\right| + \left|P_{m_j}\right|} \tag{19}$$

Here, C = {C_1, ..., C_K} is a labeled data set that offers the ground truth, and P_m = {P_{m_1}, ..., P_{m_g}} is the partition produced by a clustering algorithm for the data set.

$$\text{Rand coefficient:}\quad RC = \frac{SS + DD}{SS + SD + DS + DD} \tag{20}$$

$$\text{Jaccard coefficient:}\quad JC = \frac{SS}{SS + SD + DS} \tag{21}$$

$$\text{Adjusted Rand index:}\quad ARI = \frac{\binom{n}{2}(SS + DD) - \left[(SS + SD)(SS + DS) + (DS + DD)(SD + DD)\right]}{\binom{n}{2}^{2} - \left[(SS + SD)(SS + DS) + (DS + DD)(SD + DD)\right]} \tag{22}$$

Here, SS, SD, DS, and DD count the possible pairs of data points for which:

SS: both data points belong to the same cluster and the same group.
SD: both data points belong to the same cluster but different groups.
DS: both data points belong to different clusters but the same group.
DD: both data points belong to different clusters and different groups.

Computation time: The efficiency of all the algorithms is evaluated through the execution time, measured with the MATLAB functions tic and toc; toc reads the elapsed time from the stopwatch timer started by the tic function.

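A sketch of the pair-counting metrics of Eqs. (20)-(22) follows (our own implementation, not the paper's; `truth` and `pred` are integer label arrays for ground-truth groups and predicted clusters, and the loop is quadratic in n, which is fine for an illustration):

```python
import numpy as np
from itertools import combinations

def pair_counts(truth, pred):
    """Count SS, SD, DS, DD over all point pairs (same/different cluster vs. group)."""
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(truth)), 2):
        same_c = pred[i] == pred[j]
        same_g = truth[i] == truth[j]
        ss += same_c and same_g
        sd += same_c and not same_g
        ds += (not same_c) and same_g
        dd += (not same_c) and not same_g
    return ss, sd, ds, dd

def rc_jc_ari(truth, pred):
    """Eqs. (20)-(22): Rand coefficient, Jaccard coefficient, adjusted Rand index."""
    ss, sd, ds, dd = pair_counts(truth, pred)
    n2 = ss + sd + ds + dd                      # = C(n, 2), the total number of pairs
    rc = (ss + dd) / n2
    jc = ss / (ss + sd + ds)
    expected = (ss + sd) * (ss + ds) + (ds + dd) * (sd + dd)
    ari = (n2 * (ss + dd) - expected) / (n2 ** 2 - expected)
    return rc, jc, ari
```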

7.2. Datasets description

The experimental validation is performed with 16 different datasets falling into four categories: small-scale real data, synthetic data, large-scale real data, and image data. Table 2 details the total data objects, total attributes, number of classes, and dimension of the solution for all 16 datasets taken for the experimentation, and Fig. 2 shows the generated synthetic data and the image data.

Small-scale real data: For small-scale validation, eight datasets are taken from the UCI machine learning repository (UCI, 2013), including the Iris, PID, Wine, Sonar, Blood Transfusion, and Mammogram datasets.

Synthetic data: The synthetic data is generated to validate the performance of the algorithms with respect to various sizes, shapes, overlaps, and class counts; this experimentation shows to what extent the algorithms cope with different shapes, densities of data records, data sizes, overlapping, and classes. Accordingly, the SD1 and SD5 datasets are generated to evaluate clustering of letter symbols, SD2 is generated to evaluate clustering of widely accepted shapes, SD6 is generated to validate performance under overlapping, and SD3 and SD4 are used for validating performance on irregular cluster shapes. The synthetic datasets are generated through synthetic code that takes as input an image drawn by the user. The image is converted to data samples by finding the locations of pixels with intensity greater than 0; based on the pixel locations, 2D data is formed for every pixel, referencing its x and y coordinate values.

Large-scale real data: Two datasets, Secom and Madelon, with solution dimensions of nearly 1000, are chosen from the UCI repository (UCI, 2013).

Image data: Two medical images, an MRI and a microarray image, are taken, and data is formulated based on grey-level intensity. The total number of data records equals the total number of pixels in the image, and the attributes of every pixel are its intensity value and the values of its four neighbouring pixels.

Table 2. Description of datasets.

Dataset name       Notation  Data objects  Attributes  No. of classes  Dimension of solution
Iris               RD1       150           4           3               9
PID                RD2       768           8           2               14
Wine               RD3       178           13          3               36
Sonar              RD4       208           60          2               118
Blood Transfusion  RD5       748           5           2               8
Mammogram          RD6       961           6           2               10
Secom              LRD1      1567          591         2               1180
Madelon            LRD2      4400          500         2               998
Synthetic          SD1       10,029        3           2               4
Synthetic          SD2       12,745        3           3               6
Synthetic          SD3       42,248        3           2               4
Synthetic          SD4       55,647        3           2               4
Synthetic          SD5       5761          3           2               4
Synthetic          SD6       22,162        3           2               4
Image              Micro     65,025        6           2               10
Image              MRI       65,025        6           3               15

[Fig. 2. Visualization of synthetic and image data: panels SD1-SD6, microarray image, and MRI image.]

7.3. Experimental set up

The clustering algorithms are written in MATLAB (version R2011a), and the results are taken after running on a system with a 2.13 GHz Intel(R) Pentium(R) CPU and 2 GB RAM. For the genetic algorithm, the crossover rate and mutation rate are fixed at 0.8 and 0.005, respectively, based on the suggestion given in Hong and Kwong (2008). For the PSO algorithm, the parameters are fixed as w = 0.72, φ1 = 1.49, φ2 = 1.49, based on the recommendation of Merwe and Engelbrecht (2003). For the cuckoo search algorithm, p_a = 0.25 and α = 1 are set as per Elkeran (2013). Another operator considered is the cluster size, which is fixed based on the ground-truth classes in the dataset: the number of classes in the dataset equals the cluster size given for clustering. Since the cluster size and attributes are fixed, the dimension of the solution is also fixed. The two remaining parameters common to all the optimization algorithms, iteration count and population size, are fixed at 100 and 10, respectively, as these two govern global convergence for all of them.

7.4. Performance evaluation

7.4.1. Experimentation with synthetic data

The experimentation of all 21 algorithms and k-means on the synthetic data is reported in this section. The average performance of the algorithms for synthetic data is given in three tables, organized from three different views: the search-algorithm perspective, the objective perspective, and the hybridization perspective. After finding the performance measures for all the algorithms, the results are summarized by taking the mean over the six synthetic datasets. For the hybridization perspective, six samples (measures on the six synthetic datasets) are collected for each of the 21 algorithms and for k-means to fill Table 5. Similarly, the algorithmic-perspective summary for synthetic data in Table 3 is filled by taking seven samples per dataset (out of the 21 algorithms, each search algorithm is used seven times), so globally the average performance covers 42 samples (7 algorithms × 6 synthetic datasets); every entry of Table 3 is thus the average performance of the seven GA-based (respectively PSO- or CS-based) algorithms of Table 1 over the six synthetic datasets. Table 4 summarizes all the clustering algorithms from the perspective of objective formulation; its values are filled by taking three samples per dataset (out of the 21 algorithms, each objective function is shared by three), so globally the average covers 18 samples (3 same-objective algorithms × 6 synthetic datasets), collected by performing the clustering with the three same-objective algorithms of Table 1 on the six synthetic data samples.

For all six synthetic datasets, the maximum performance (one) is achieved by at least one of the 21 algorithms in terms of CA, RC, and JC. In terms of ARI, CS-KM achieved the maximum performance, reaching 0.9936. The minimum computation times for the SD1, SD2, SD3, SD4, SD5, and SD6 data are 148.206113 s, 280.510192 s, 701.163344 s, 950.327078 s, 87.518454 s, and 5415 s, respectively. The minimum varies with the dataset and the objective function, so performance cannot be compared through these objectives alone. From Table 3, CS obtained 0.9164 in terms of CA, which is higher than the genetic and PSO algorithms. For RC, the CS algorithm is better, achieving 0.8735, higher than the other two algorithms; similarly, in terms of ARI, the CS algorithm is better, achieving 0.9179. The CS algorithm thus outperformed the existing algorithms on synthetic data. For objective minimization, the KM objective achieved 0.9870, higher than the other existing and proposed objective functions; from the algorithmic perspective also, the KM-combined algorithms provided better results.

Table 3. Summary of data experimentation in the view of search algorithm.

                          GA        PSO       CS        K-means
Synthetic data      CA    0.8921    0.8748    0.9164    0.8944
                    RC    0.8724    0.8548    0.8735    0.8463
                    JC    0.8609    0.7647    0.8745    0.7747
                    ARI   0.9028    0.8885    0.9179    0.9014
                    Time  49288.0   2237.0    5665.1    5415.2
Small scale real    CA    0.809386  0.808186  0.819169  0.7535
data                RC    0.59194   0.611712  0.599895  0.6522
                    JC    0.45826   0.481398  0.441381  0.4768
                    ARI   0.8047    0.7975    0.8168    0.7605
                    Time  54.9      2.8       7.2       24.3
Image data          CA    0.9229    0.911     0.9293    0.9211
                    RC    0.8772    0.8322    0.8506    0.8567
                    JC    0.8833    0.7506    0.7910    0.8083
                    ARI   0.9338    0.9277    0.9287    0.9286
                    Time  55994.0   1665.0    3771.0    13490.0
Large scale real    CA    0.7961    0.8104    0.8059    0.7625
data                RC    0.5967    0.6094    0.5944    0.5149
                    JC    0.6274    0.6374    0.6337    0.5769
                    ARI   0.7513    0.7584    0.7562    0.7346
                    Time  903.6     34.9      79.7      295.3

Table 4. Summary of data experimentation in the view of objective formalization.

                          KM      FCM     KFCM    MKFCM   FCM+CF  KFCM+KCF  MKFCM+MKCF
Synthetic data      CA    0.9870  0.9620  0.8030  0.7418  0.9486  0.9227    0.7392
                    RC    0.9876  0.9436  0.8226  0.6556  0.8854  0.8775    0.6513
                    JC    0.9728  0.9146  0.8707  0.5145  0.8334  0.8140    0.4968
                    ARI   0.9772  0.9187  0.9555  0.7568  0.9464  0.8633    0.7660
                    Time  5415.0  7341.0  8979.0  5079.0  3980.0  3015.0    6067.0
Small scale real    CA    0.7539  0.7944  0.8670  0.6135  0.8409  0.9309    0.8848
data                RC    0.6523  0.6215  0.6194  0.4806  0.6431  0.6744    0.5167
                    JC    0.4768  0.4629  0.4777  0.3678  0.4586  0.5465    0.4319
                    ARI   0.7608  0.7893  0.8769  0.5803  0.8458  0.9322    0.8592
                    Time  24.3    24.4    21.0    20.2    23.8    18.6      19.3
Image data          CA    0.9362  0.9564  0.9641  0.8115  0.9769  0.9456    0.8568
                    RC    0.9068  0.8886  0.8938  0.7312  0.9141  0.8699    0.7924
                    JC    0.8728  0.8287  0.8537  0.6862  0.8759  0.8181    0.7228
                    ARI   0.9212  0.9500  0.9780  0.8252  0.9948  0.9654    0.8759
                    Time  13490   22132   22138   30323   10477   14392     30387
Large scale real    CA    0.7628  0.7235  0.8525  0.8527  0.7321  0.8529    0.8529
data                RC    0.5149  0.4738  0.6872  0.6874  0.4636  0.6872    0.6872
                    JC    0.5770  0.5496  0.6873  0.6877  0.5536  0.6873    0.6873
                    ARI   0.7228  0.7150  0.7797  0.7796  0.7188  0.7797    0.7797
                    Time  295.4   306.1   359.1   364.4   306.2   434.5     376.7

Table 5. Summary of synthetic data experimentation in the view of objective formalization and search algorithm.

                        CA      RC      JC      ARI     Time
K-means                 0.8944  0.8463  0.7747  0.9014  5415.2
GA-KM (c)               0.9792  0.9628  0.9395  0.9711  14735.7
PSO-KM (c)              1       1       1       0.9669  454.9
CS-KM (c)               1       1       1       0.9936  1055.1
GA-FCM (c)              0.9713  0.9296  0.8826  0.9740  12661.2
PSO-FCM (c)             1       0.9578  0.9297  0.9749  455.2
CS-FCM (c)              0.9844  0.9615  0.9315  0.9872  1174.6
GA-KFCM (c)             0.9751  0.9254  0.8742  0.9699  15075.4
PSO-KFCM (c)            0.8884  0.8648  0.8054  0.9151  650.9
CS-KFCM (c)             0.9864  0.9608  0.9308  0.9815  1224.7
GA-MKFCM (b)            0.7259  0.6422  0.4996  0.7741  25063.1
PSO-MKFCM (b)           0.7244  0.6390  0.5124  0.7221  604.7
CS-MKFCM (b)            0.7752  0.6857  0.5314  0.7742  1611.6
GA-FCM+CF (a)           0.8904  0.8623  0.7835  0.9233  13021.1
PSO-FCM+CF (a)          0.9742  0.9140  0.8627  0.9741  555.5
CS-FCM+CF (a)           0.9212  0.8799  0.8540  0.9419  2584.4
GA-KFCM+KCF (a)         0.9752  0.9243  0.8725  0.9524  8800.7
PSO-KFCM+KCF (a)        0.7232  0.6664  0.5015  0.6533  628.5
CS-KFCM+KCF (a)         0.7168  0.6148  0.4518  0.5814  2616.8
GA-MKFCM+MKCF (a)       0.7712  0.6627  0.5035  0.7547  14249.6
PSO-MKFCM+MKCF (a)      0.7186  0.6189  0.4510  0.7776  986.4
CS-MKFCM+MKCF (a)       0.7711  0.6722  0.5357  0.7655  2965.1

(a) New objective function (novel work). (b) Old objective function, new hybridization. (c) Existing algorithm.

7.4.2. Experimentation with real data of small and medium scale dimension

The summarized experimental results for the Iris, PID, Wine, Sonar, Transfusion, and Mammogram data are given in Tables 3, 4 and 6. This experimentation tests the effectiveness and efficiency of the clustering algorithms on small and medium-dimensional problems. On the PID, Sonar, Transfusion, and Mammogram datasets, the maximum CA has been reached. The maximum accuracy reached on the Iris data is 0.9867; its maximum RC is 0.8923 and JC is 0.719, and in terms of ARI the maximum value is 0.9480.

Every clustering algorithm takes the six real datasets as input, and the clustering outputs are measured with CA, JC, RC, ARI, and computation time. Summaries of the experimental results from the three perspectives are presented in Tables 3, 4 and 6. In Table 3, every tuple belonging to real data is found by taking the mean of CA, JC, and RC over 42 samples (seven same-search-algorithm methods × six datasets). Based on this analysis, the PSO algorithm provided better performance in terms of RC and JC, but for cuckoo search the summary value is 0.819 in terms of CA and 0.8168 in terms of ARI, which is higher than the other two algorithms. The computational cost is much lower for the PSO algorithm on these real datasets.

In Table 4, the summary is based on the objective-function-wise performance on real data; each value is the mean over 18 samples (three same-objective algorithms out of the 21 × six datasets). Here, the proposed kernel-based objective outperformed all other objectives in terms of CA, RC, and JC: the proposed KFCM + KCF achieved 0.9309 in CA and 0.9322 in ARI, higher than the existing objective functions. Likewise, the summary of the hybridized algorithms on real data is given in Table 6, where the values are the average performance over the six real datasets. Here, GA-KFCM + KCF provided 0.94807 as the CA value and 0.7089 as the RC value; in terms of JC, PSO-KFCM + KCF provided better results.


Table 6. Summary of real data experimentation in the view of objective formalization and search algorithm.

                        CA      RC      JC      ARI     Time
K-means                 0.7535  0.6522  0.4768  0.7605  24.3
GA-KM (c)               0.7729  0.6550  0.4848  0.7771  62.5
PSO-KM (c)              0.7148  0.6457  0.4590  0.7268  3.3
CS-KM (c)               0.7741  0.6562  0.4867  0.7780  7.2
GA-FCM (c)              0.7489  0.5913  0.4332  0.7323  62.6
PSO-FCM (c)             0.8179  0.6376  0.4783  0.8276  3.2
CS-FCM (c)              0.8163  0.6356  0.4772  0.8079  7.3
GA-KFCM (c)             0.9144  0.6380  0.5049  0.9069  52.8
PSO-KFCM (c)            0.8699  0.6266  0.4806  0.8804  3.1
CS-KFCM (c)             0.8167  0.5935  0.4475  0.8441  7.1
GA-MKFCM (b)            0.5833  0.4631  0.3939  0.5814  50.2
PSO-MKFCM (b)           0.5927  0.4621  0.3807  0.5223  3.2
CS-MKFCM (b)            0.6646  0.5165  0.3287  0.6374  7.1
GA-FCM+CF (a)           0.7900  0.6228  0.4523  0.7912  56.3
PSO-FCM+CF (a)          0.8846  0.6620  0.4734  0.8889  2.9
CS-FCM+CF (a)           0.8480  0.6444  0.4502  0.8572  12.2
GA-KFCM+KCF (a)         0.9480  0.7089  0.5494  0.9585  48.7
PSO-KFCM+KCF (a)        0.9458  0.6404  0.5892  0.9464  2.1
CS-KFCM+KCF (a)         0.8990  0.6740  0.5006  0.8918  4.9
GA-MKFCM+MKCF (a)       0.9079  0.4641  0.3890  0.8858  51.2
PSO-MKFCM+MKCF (a)      0.8312  0.6073  0.5082  0.7909  2.0
CS-MKFCM+MKCF (a)       0.9152  0.4787  0.3984  0.9011  4.6

Table 7. Summary of image data experimentation in the view of objective formalization and search algorithm.

                        CA      RC      JC      ARI     Time
K-means                 0.9211  0.8567  0.8083  0.9286  13490.6
GA-KM (c)               0.9480  0.9191  0.9401  0.9366  35772.9
PSO-KM (c)              0.8820  0.8382  0.7335  0.8512  1683.7
CS-KM (c)               0.9787  0.9632  0.9447  0.9786  3013.7
GA-FCM (c)              0.9722  0.9606  0.9383  0.9745  60674.2
PSO-FCM (c)             0.9114  0.7962  0.6981  0.8896  1903.7
CS-FCM (c)              0.9857  0.9090  0.8497  0.9855  3818.4
GA-KFCM (c)             0.9563  0.9342  0.9381  0.9615  61348.8
PSO-KFCM (c)            0.9706  0.8777  0.8170  0.8239  1658.9
CS-KFCM (c)             0.9656  0.8695  0.8061  0.9706  3406.6
GA-MKFCM (b)            0.8783  0.8212  0.8783  0.9106  85831.7
PSO-MKFCM (b)           0.7475  0.6918  0.5933  0.7762  1673.6
CS-MKFCM (b)            0.8088  0.6807  0.5870  0.7892  3463.7
GA-FCM+CF (a)           0.9689  0.9469  0.9492  0.9963  27193.3
PSO-FCM+CF (a)          0.9749  0.8857  0.8258  0.9822  1380.1
CS-FCM+CF (a)           0.9870  0.9096  0.8529  0.8859  2858.5
GA-KFCM+KCF (a)         0.8970  0.8544  0.8194  0.8942  38051.3
PSO-KFCM+KCF (a)        0.9542  0.8497  0.7850  0.8954  1658.3
CS-KFCM+KCF (a)         0.9857  0.9055  0.8500  0.9928  3468.1
GA-MKFCM+MKCF (a)       0.8401  0.7740  0.7199  0.8621  83090.0
PSO-MKFCM+MKCF (a)      0.9363  0.8861  0.8016  0.9830  1699.5
CS-MKFCM+MKCF (a)       0.7941  0.7171  0.6471  0.7822  6371.3

7.4.3. Experimentation with image data

The experimentation of all 21 algorithms on the image data is performed in this section. To analyze the clustering algorithms further, the image data (microarray and MRI images) is taken and all the algorithms are applied to it. The maximum CA reached by any of the algorithms on the microarray data is 0.9895, the maximum RC is 0.9792, and the JC reached is 0.9751. Similarly, for the MRI image, the maximum values obtained over all the algorithms are 0.9992, 0.9774, and 0.9639 in terms of CA, RC, and JC, respectively. The minimum times taken are 1159.035 s for the microarray data and 1601.326 s for the MRI image data.

The summarized results of the two image datasets are given in Tables 3, 4 and 7. In Table 3, the performance metrics are found by averaging over the same search algorithm on the two datasets (14 samples in total). Based on Table 3, the CS algorithm has given 0.9293 in terms of CA, while the other metrics are better for GA. The summarized image-data results from the perspective of objective functions are given in Table 4; here, the proposed FCM + CF-based objectives outperformed all the presented objectives, achieving nearly 0.9 for all the evaluation metrics CA, RC, ARI, and JC. The summarized results of the hybridized algorithms are given in Table 7, where the average value over the two datasets is given for every algorithm. Based on Table 7, except for RC, the proposed CS-FCM + CF provided the better results of 0.987 in CA and 0.9492 in JC, and the proposed GA-FCM + CF provided the better result of 0.9963 in ARI.

7.4.4. Experimentation with real data of large scale dimension

For the large-scale dimension analysis, the two real datasets Secom and Madelon are taken, and the performance of the algorithms is analyzed to find their suitability for large-scale cluster analysis. Based on this experimentation, the maximum CA reached on the Secom data is 0.9994 and the maximum RC reached is 0.877; the maximum ARI for the Secom data is 0.8525, and the minimum computation time required is 26.73115 s. For the Madelon data, the maximum performance in terms of CA, RC, and JC is 0.7065, 0.4995, and 0.5085, respectively, and the minimum computation time required is 34.75245 s.

The summarized results are presented in Tables 3, 4 and 8. In Table 3, the average performance is computed by taking the mean over the seven same-search-algorithm methods on the two datasets, i.e. over 14 samples. Table 3 clearly indicates that the PSO algorithm outperformed the other search algorithms on all the evaluation metrics: PSO has given 0.8104 as CA, whereas the second rank goes to CS, which reached only 0.8059. The computation time of the PSO algorithm for large-scale data is half that of the CS algorithm. From Table 4, the MKFCM objectives are better compared with the other objective functions; the proposed MKFCM + MKCF provided 0.8529 as CA, which is higher than the existing algorithms. From Table 8, PSO-MKFCM achieved better results in all of the evaluation metrics taken (CA, RC, ARI, and JC); the values for this objective function are 0.85295, 0.68825, and 0.68845.

7.5. Discussion

A thorough discussion of the experimentation with the 21 different algorithms on the 16 different datasets is provided in this section. The performance of the clustering algorithms is analyzed and discussed through four different categories (based on the datasets) with respect to effectiveness and efficiency, along with three perspectives (search algorithm, objective function, and hybridized form). Effectiveness is determined based on CA, JC, and RC; efficiency is determined using computation time.


Table 8 The computation effort is minimized for PSO algorithm. For the 769
Summary of large data experimentation in the view of objective formalization and objective-based analysis, the proposed KFCM + KCF proved better 770
search algorithm.
in all the effectiveness measure as well as efciency as compared 771
CA RC JC ARI Time with other objective function. In terms of hybridized results, GA- 772
K-means 0.7625 0.5149 0.5765 0.7346 590.7 KFCM + KCF scored the highest rank in terms of CA and RC. On 773
GA-KMc 0.7601 0.5015 0.5739 0.7333 784.3 the other hand, PSO-KFCM + KCF are better in JC. The better ef- 774
PSO-KMc 0.7664 0.5565 0.5800 0.7364 30.9 ciency is achieved by the PSO-KFCM + KCF. The existing objective 775
CS-KMc 0.7620 0.4868 0.5771 0.7340 70.9
GA-FCMc 0.6678 0.4609 0.5156 0.6871 816.3
functions are not much provided the effectiveness as compared 776

PSO-FCMc 0.7616 0.4944 0.5753 0.7345 30.9 with the proposed kernel-based objective function. The kernel- 777
CS-FCMc 0.7412 0.4663 0.5581 0.7275 71.1 based proposed objective function provided better result for the 778
GA-KFCMc 0.8529 0.6872 0.6873 0.7795 956.3 real data which are from different application. The performance 779
PSO-KFCMc 0.8529 0.6872 0.6873 0.7795 36.8
achieved by the proposed algorithm ensures that the application 780
CS-KFCMc 0.8529 0.6872 0.6873 0.7797 84.5
GA-MKFCMb 0.8526 0.6866 0.6868 0.7795 968.6 and its data range is not a problem for kernel-based objective 781
PSO-MKFCMb 0.8529 0.6882 0.6884 0.7795 38.0 function. 782
CS-MKFCMb 0.8526 0.6875 0.6878 0.7795 86.5 For the third category (experimentation with image data), GA 783
GA-FCM + CFa 0.7333 0.4667 0.5538 0.7199 816.5 algorithm performed better in two evaluation metrics and other 784
PSO-FCM + CFa 0.7333 0.4651 0.5561 0.7199 31.3
one by CS algorithm. Again, the time is minimized for PSO algo- 785
CS-FCM + CFa 0.7272 0.4591 0.5508 0.7165 70.9
GA-KFCM + KCFa 0.8529 0.6872 0.6873 0.7795 979.4 rithm. In the perspective of objective formulation, the proposed 786
PSO-KFCM + KCFa 0.8529 0.6872 0.6873 0.7795 41.2 FCM + CF has outperformed in both effectiveness and efciency. 787
CS-KFCM + KCFa 0.8529 0.6872 0.6873 0.7795 86.3 In the hybridized form, GA-FCM + CF have provided better in terms 788
GA-MKFCM + MKCFa 0.8529 0.6872 0.6873 0.7795 1004.2
of JC and CS-FCM + CF has provided better in terms of CA. The com- 789
PSO-MKFCM + MKCFa 0.8529 0.6872 0.6873 0.7795 38.4
CS-MKFCM + MKCFa 0.8529 0.6872 0.6873 0.7795 87.6 putation effort is minimized when PSO-FCM + CF is used for the 790
image data, so this ensures that the proposed FCM + CF provided 791
good signicance if the data range is constant for all the attributes. 792
For the fourth category (experimentation with large scale data), 793
7.5.1. Complexity analysis

The computational complexity of the objective function is computed based on big O notation. For the worst case, the computational complexity (CC) of the proposed objective function is computed as follows:

CC = O(ngd + 1 + n log g + 2ng log g)    (22)

For the best case,

CC = O(ngd/3 + 1)    (23)

For the average case,

CC = O(ngd/1.5 + g)    (24)

where n is the total number of data points, d is the dimension of the real data space and g is the number of clusters.
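As a quick plausibility check of Eqs. (22)-(24), the sketch below (our illustration, using the equation forms as reconstructed above, with the natural logarithm assumed) evaluates the three cost expressions for a sample configuration; the values are indicative operation counts, not measured times.

import math

def cc_worst(n, g, d):
    # Eq. (22): CC = O(ngd + 1 + n log g + 2ng log g)
    return n * g * d + 1 + n * math.log(g) + 2 * n * g * math.log(g)

def cc_best(n, g, d):
    # Eq. (23): CC = O(ngd/3 + 1)
    return n * g * d / 3 + 1

def cc_average(n, g, d):
    # Eq. (24): CC = O(ngd/1.5 + g)
    return n * g * d / 1.5 + g

n, g, d = 10_000, 5, 20   # e.g. 10,000 points, 5 clusters, 20 attributes
print(cc_best(n, g, d), cc_average(n, g, d), cc_worst(n, g, d))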
7.5.2. Findings

For the first category (experimentation with synthetic data), the CS-based algorithms outperformed in the three different effectiveness metrics, and the efficiency is better for the PSO-based algorithms, in the case of the search-algorithm-dependent perspective. In the perspective of objective functions, the KM objective outperformed for all three evaluation metrics considered to prove the effectiveness. With respect to the hybridized clustering, PSO-KM and CS-KM provided the maximum accuracy, and PSO-KM provides less computation time. This performance ensures that the existing KM is better suitable if the data is integer.

From the results, we can easily understand that the multiple kernel-based algorithms struggled to find the exact number of clusters as compared with the other clustering algorithms. While analyzing the performance based on the optimization algorithms, there is no big performance deviation among the GA, PSO and CS algorithms. Also, the objective function-based performance changes for every objective function, which is the core of the analysis of effectiveness. Here, the multiple kernel-based algorithms, whether with the proposed objective function or the existing one, do not contribute much to the performance of the clustering process as compared with the other objective functions.

For the second category (experimentation with small and medium scale real data), the PSO algorithm outperformed in two evaluation metrics and the other one by the cuckoo search algorithm. The computation effort is minimized for the PSO algorithm. For the objective-based analysis, the proposed KFCM + KCF proved better in all the effectiveness measures as well as in efficiency as compared with the other objective functions. In terms of the hybridized results, GA-KFCM + KCF scored the highest rank in terms of CA and RC. On the other hand, PSO-KFCM + KCF is better in JC. The better efficiency is achieved by PSO-KFCM + KCF. The existing objective functions did not provide as much effectiveness as the proposed kernel-based objective function. The proposed kernel-based objective function provided better results for the real data, which come from different applications. The performance achieved by the proposed algorithm ensures that the application and its data range are not a problem for the kernel-based objective function.

For the third category (experimentation with image data), the GA algorithm performed better in two evaluation metrics and the other one by the CS algorithm. Again, the time is minimized for the PSO algorithm. In the perspective of objective formulation, the proposed FCM + CF outperformed in both effectiveness and efficiency. In the hybridized form, GA-FCM + CF provided better results in terms of JC and CS-FCM + CF provided better results in terms of CA. The computation effort is minimized when PSO-FCM + CF is used for the image data, so this ensures that the proposed FCM + CF provides good significance if the data range is constant for all the attributes.

For the fourth category (experimentation with large scale data), the PSO-based algorithms outperformed in both effectiveness and efficiency. In the case of the objective function-based analysis, the effectiveness-based measures are better improved by MKFCM and MKFCM + MKCF. The better efficiency is obtained by k-means for the large scale data. For the hybridized form, PSO-MKFCM is the better choice to obtain more effective clustering. The efficient result is achieved by the PSO-FCM algorithm for the large scale data. The conclusion from the experimentation with large scale data is that, if the attribute size is very large and the data ranges and variances are not uniform, the proposed MKFCM + MKCF provides better results as compared with the existing objective functions.
7.5.3. Suggestions

Based on the analysis, we can easily say that the algorithmic effectiveness is decided by the objective function and the algorithmic efficiency is decided by the search algorithm. For better efficiency, the PSO algorithm is the right choice for all the different sets of data, which may be small scale or large scale. For effectiveness, the right objective function should be selected based on the characteristics of the dataset, such as the range of values, the dimension, image content and the data type (integer or floating point). Based on these data characteristics, the suggestion is that KM can be chosen if (i) the data is fully integers and within a constant interval, and (ii) the dimension of the solution is low. The KFCM + KCF can be chosen if (i) the data may be any integer or floating point value data, and (ii) the dimension may be small or medium. The FCM + CF can be chosen only if (i) the range of values is constant for all attributes, (ii) the values are medium, and (iii) the data is suitable for image analysis. Finally, the MKFCM + MKCF can be chosen only if (i) the dimension of the data is high, (ii) the range of values is not constant, and (iii) the data may be integer or floating point.
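These rules can be summarized as a simple decision helper, sketched below for illustration only; the predicate names and the numeric thresholds (what counts as a low or high dimension) are our own assumptions, not values prescribed by the analysis.

def suggest_objective(integer_only, constant_range, n_dims, image_data):
    # integer_only:   all attributes are integers within a constant interval
    # constant_range: all attributes share a constant value range
    # n_dims:         number of attributes (thresholds are assumed, not given)
    # image_data:     the data originates from image analysis
    if integer_only and constant_range and n_dims <= 10:
        return "KM"
    if image_data and constant_range:
        return "FCM + CF"
    if n_dims > 50 and not constant_range:
        return "MKFCM + MKCF"
    return "KFCM + KCF"   # any numeric data of small or medium dimension

print(suggest_objective(integer_only=False, constant_range=False,
                        n_dims=120, image_data=False))   # MKFCM + MKCF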
8. Conclusion

We have presented 21 different techniques to find optimal clusters with the help of different objective functions and optimization algorithms for expert systems and their applications in the fields of medical, telecommunication and engineering. Here, three objective functions are newly designed along with four existing objective functions to incorporate with the optimization algorithms. The new objective
functions were introduced with the consideration of distance and fuzzy variables, along with two additional variables that were not defined previously, called the cumulative distance and cumulative fuzzy values. In total, 21 different clustering algorithms are discussed with mathematical formulation after blending each objective function with a search algorithm such as GA, CS or PSO. The performance of the algorithms is evaluated with 16 different datasets of different shapes and characteristics. The effectiveness and efficiency of the algorithms are compared using three different evaluation metrics together with the computation time. From the research outcome, we can conclude that the algorithmic effectiveness depends mainly on the objective function, whereas the algorithmic efficiency is decided by the search algorithm. Finally, the right choice of algorithm is suggested depending on the characteristics of the input data.

Clustering is a potential method with various applications, especially in expert systems, and various expert systems need a good unsupervised learning strategy to fulfill their requirements. The important practical applications found in the literature for clustering are: speaker clustering (Tang, Palo Alto, Chu, Hasegawa-Johnson, & Huang, 2012), analysis of fMRI data (Zhang, Xianguo, Zhen, Wei, & Huafu, 2011), wireless sensor networks (Youssef, Youssef, & Younis, 2009), grouping of text documents (Cao, Zhiang, Junjie, & Hui, 2013), auditory scene categorization (Cai, Lie, & Hanjalic, 2008), news story clustering (Xiao, Chong-Wah, & Hauptmann, 2008), target tracking (Liang et al., 2010), network clustering (Huang, Heli, Qinbao, Hongbo, & Jiawei, 2013), cancer gene expression profiles (Zhiwen, Le, You, Hau-San, & Guoqiang, 2012) and social networks (Caimei, Xiaohua, & Jung-ran, 2011).

The major limitation of the proposed algorithms is the user-given cluster size, which requires data knowledge from the user. The second limitation is the number of iterations, which is the termination criterion utilized here for convergence; a critical analysis is required on defining a better termination criterion. Also, the multiple parametric inputs and the optimal fixing of threshold values are to be overcome for a better application of the clustering process.

The proposed clustering algorithms can be applied to clinical decision support systems, disease diagnosis, agricultural research, forecasting and routing to obtain more effective results based on the findings of this research. Also, the clustering can be extended by modifying the optimization search algorithm towards reducing the time complexity. Again, the effectiveness can be further improved by including different constraints in the objective functions as per the datasets or the user.
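One common alternative to the fixed iteration count noted among the limitations above, sketched here purely for illustration (this is not the procedure used in the experiments), is to terminate when the best objective value has stopped improving for a given number of consecutive iterations; step is a placeholder for one iteration of the chosen search algorithm (GA, CS or PSO).

def run_until_stall(step, max_iter=500, patience=20, tol=1e-6):
    # step() runs one iteration of the search and returns the best
    # (minimized) objective value found so far.
    best, stall = float("inf"), 0
    for it in range(max_iter):
        value = step()
        if best - value > tol:   # meaningful improvement
            best, stall = value, 0
        else:
            stall += 1
        if stall >= patience:    # no recent improvement: converged
            return best, it + 1
    return best, max_iter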
9. Uncited reference

Graves and Pedrycz (2010).
Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.eswa.2015.03.031.
References

Bandyopadhyay, S. (2011). Multiobjective simulated annealing for fuzzy clustering with stability and validity. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 41(5), 682–691.
Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.
Cai, R., Lie, L., & Hanjalic, A. (2008). Co-clustering for auditory scene categorization. IEEE Transactions on Multimedia, 10(4), 596–606.
Caimei, L., Xiaohua, H., & Jung-ran, P. (2011). The social tagging network for web clustering. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 41(5), 840–852.
Cao, J., Zhiang, W., Junjie, W., & Hui, X. (2013). SAIL: Summation-based incremental learning for information-theoretic text clustering. IEEE Transactions on Cybernetics, 43(2), 570–584.
Castellanos-Garzón, J. A., & Diaz, F. (2013). An evolutionary computational model applied to cluster analysis of DNA microarray data. Expert Systems with Applications, 40(7), 2575–2591.
Chen, L., Chen, C. L. P., & Lu, M. (2011). A multiple-kernel fuzzy c-means algorithm for image segmentation. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 41(5).
Das, S., Abraham, A., & Konar, A. (2008). Automatic clustering using an improved differential evolution algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 38(1), 218–237.
Hoang, D. C., Yadav, P., Kumar, R., & Panda, S. K. (2014). Implementation of a harmony search algorithm-based clustering protocol for energy-efficient wireless sensor networks. IEEE Transactions on Industrial Informatics, 10(1), 774–783.
Elkeran, A. (2013). A new approach for sheet nesting problem using guided cuckoo search and pairwise clustering. European Journal of Operational Research, 231(3), 757–769.
Krishnasamy, G., Kulkarni, A. J., & Paramesran, R. (2014). Hybrid approach for data clustering based on modified cohort intelligence and K-means. Expert Systems with Applications, 41(13), 6009–6016.
Garcia-Piquer, A., Fornells, A., Bacardit, J., Orriols-Puig, A., & Golobardes, E. (2014). Large-scale experimental evaluation of cluster representations for multiobjective evolutionary clustering. IEEE Transactions on Evolutionary Computation, 18(1), 36–53.
Goldberg, D. (1989). Genetic algorithms in search, optimization and machine learning. Reading, MA: Addison-Wesley Professional.
Graves, D., & Pedrycz, W. (2010). Kernel-based fuzzy clustering and fuzzy clustering: A comparative experimental study. Fuzzy Sets and Systems, 161, 522–543.
Hong, Y., & Kwong, S. (2008). To combine steady-state genetic algorithm and ensemble learning for data clustering. Pattern Recognition Letters, 29(9), 1416–1423.
Huang, J., Heli, S., Qinbao, S., Hongbo, D., & Jiawei, H. (2013). Revealing density-based clustering structure from the core-connected tree of a network. IEEE Transactions on Knowledge and Data Engineering, 25(8), 1876–1889.
İnkaya, T., Kayalıgil, S., & Özdemirel, N. E. (2015). Ant colony optimization based clustering methodology. Applied Soft Computing, 28, 301–311.
Ji, J., Pang, W., Zhou, C., Han, X., & Wang, Z. (2012). A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data. Knowledge-Based Systems, 30, 129–135.
Ji, Z., Xi, Y., Chen, Q., Sun, Q., Xia, D., & Feng, D. D. (2012). Fuzzy c-means clustering with weighted image patch for image segmentation. Applied Soft Computing, 12, 1659–1667.
Kannan, S. R., Ramathilagam, S., & Chung, P. C. (2012). Effective fuzzy c-means clustering algorithms for data clustering problems. Expert Systems with Applications, 39(7), 6292–6300.
Kennedy, J., & Eberhart, R. C. (1995). Particle swarm optimization. In Proc. of the IEEE int'l conf. on neural networks (Vol. IV, pp. 1942–1948). Piscataway, NJ: IEEE Service Center.
Khan, S. S., & Ahmad, A. (2004). Cluster center initialization algorithm for K-means clustering. Pattern Recognition Letters, 25, 1293–1302.
Krishna, K., & Murty, M. N. (1999). Genetic K-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 29, 433–439.
Kuo, R. J., Syu, Y. J., Chen, Z. Y., & Tien, F. C. (2012). Integration of particle swarm optimization and genetic algorithm for dynamic clustering. Information Sciences, 19, 124–140.
Liang, Z., Chaovalitwongse, W. A., Rodriguez, A. D., Jeffcoat, D. E., Grundel, D. A., & O'Neal, J. K. (2010). Optimization of spatiotemporal clustering for target tracking from multisensor data. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 40(2), 176–188.
Linda, O., & Manic, M. (2012). General type-2 fuzzy c-means algorithm for uncertain fuzzy clustering. IEEE Transactions on Fuzzy Systems, 20(5), 883–897.
Li & Qi (2007). Spatial kernel K-harmonic means clustering for multi-spectral image segmentation. IET Image Processing, 1(2), 156–167.
Liyong, Z., Witold, P., Wei, L., Xiaodong, L., & Li, Z. (2014). An interval weighed fuzzy c-means clustering by genetically guided alternating optimization. Expert Systems with Applications, 41(13), 5960–5971.
Maji, P. (2011). Fuzzy-rough supervised attribute clustering algorithm and classification of microarray data. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 41(1), 222–233.
McQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proc. of the fifth Berkeley symposium on mathematical statistics and probability (pp. 281–297).
Merwe, D. W. V., & Engelbrecht, A. P. (2003). Data clustering using particle swarm optimization. In The congress on evolutionary computation (Vol. 1, pp. 215–220).
Mualik, U., & Bandyopadhyay, S. (2002). Genetic algorithm based clustering technique. Pattern Recognition, 33, 1455–1465.
Niknam, T., & Amiri, B. (2010). An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis. Applied Soft Computing, 10, 183–197.
Ouadfel, S., & Meshoul, S. (2012). Handling fuzzy image clustering with a modified ABC algorithm. International Journal of Intelligent Systems and Applications, 12, 65–74.
Pakhira, M. K., Bandyopadhyay, S., & Maulik, U. (2005). A study of some fuzzy cluster validity indices, genetic clustering and application to pixel classification. Fuzzy Sets and Systems, 155(2).
Pham, D. T., Dimov, S. S., & Nguyen, C. D. (2004). Selection of K in K-means clustering. In Proc. IMechE (p. 219).
Premalatha, K., & Natarajan, A. M. (2008). A new approach for data clustering based on PSO with local search. Computer and Information Science, 1(4).
Saha, S., & Bandyopadhyay, S. (2009). A new multiobjective simulated annealing based clustering technique using symmetry. Pattern Recognition Letters, 30, 1392–1403.
Selim, S. Z., & Alsultan, K. (1991). A simulated annealing algorithm for the clustering problem. Pattern Recognition, 10(24), 1003–1008.
Senthilnath, J., Omkar, S. N., & Mani, V. (2011). Clustering using firefly algorithm: Performance study. Swarm and Evolutionary Computation, 1, 164–171.
Sulaiman, S. N., & Isa, N. A. M. (2010). Adaptive fuzzy-K-means clustering algorithm for image segmentation. IEEE Transactions on Consumer Electronics, 56(4), 2661–2668.
Szilágyi, L., Szilágyi, S. M., Benyó, B., & Benyó, Z. (2011). Intensity inhomogeneity compensation and segmentation of MR brain images using hybrid c-means clustering models. Biomedical Signal Processing and Control, 6, 3–12.
Tang, H., Palo Alto, C. A., Chu, S. M., Hasegawa-Johnson, M., & Huang, T. S. (2012). Partially supervised speaker clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(5), 959–971.
Tengke, X., Shengrui, W., Qingshan, J., & Huang, J. Z. (2014). A novel variable-order Markov model for clustering categorical sequences. IEEE Transactions on Knowledge and Data Engineering, 26(10), 2339–2353.
UCI machine learning repository from <http://archive.ics.uci.edu/ml/datasets.html>.
Wan, M., Li, L., Xiao, J., Wang, C., & Yang, Y. (2012). Data clustering using bacterial foraging optimization. Journal of Intelligent Information Systems, 38(2), 321–341.
Wei, S., Yingying, Q., Soon Cheol, P., & Xuezhong, Q. (2015). A hybrid evolutionary computation approach with its application for optimizing text document clustering. Expert Systems with Applications, 42(5), 2517–2524.
Xiao, W., Chong-Wah, N., & Hauptmann, A. G. (2008). Multimodal news story clustering with pairwise visual near-duplicate constraint. IEEE Transactions on Multimedia, 10(2), 188–199.
Xu, R., Xu, J., & Wunsch, D. C. (2012). A comparison study of validity indices on swarm-intelligence-based clustering. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 42(4), 1243–1256.
Yang, Y., & Chen, K. (2011). Temporal data clustering via weighted clustering ensemble with different representations. IEEE Transactions on Knowledge and Data Engineering, 23(2).
Yang, X. S., & Deb, S. (2010). Engineering optimization by cuckoo search. International Journal of Mathematical Modelling and Numerical Optimisation, 1(4), 330–343.
Youssef, M., Youssef, A., & Younis, M. (2009). Overlapping multihop clustering for wireless sensor networks. IEEE Transactions on Parallel and Distributed Systems, 20(12), 1844–1856.
Yuwono, M., Su, S. W., Moulton, B. D., & Nguyen, H. T. (2014). Clustering using variants of rapid centroid estimation. IEEE Transactions on Evolutionary Computation, 18(3), 366–377.
Zhang, D. Q., & Chen, S. C. (2004). A novel kernelized fuzzy C-means algorithm with application in medical image segmentation. Artificial Intelligence in Medicine, 32(1), 37–50.
Zhang, C., Ouyang, D., & Ning, J. (2010). An artificial bee colony approach for clustering. Expert Systems with Applications, 37, 4761–4767.
Zhang, J., Xianguo, T., Zhen, Y., Wei, L., & Huafu, C. (2011). Analysis of fMRI data using an integrated principal component analysis and supervised affinity propagation clustering approach. IEEE Transactions on Biomedical Engineering, 58(11), 3184–3196.
Zhao, F., Jiao, L., & Liu, H. (2013). Kernel generalized fuzzy c-means clustering with spatial information for image segmentation. Digital Signal Processing, 23, 184–199.
Zhiwen, Y., Le, L., You, J., Hau-San, W., & Guoqiang, H. (2012). SC(3): Triple spectral clustering-based consensus clustering framework for class discovery from cancer gene expression profiles. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(6), 1751–1765.