Abstract—There are some variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data distributed across different sites. Those methods have been studied under different names, like collaborative and parallel fuzzy clustering. In this study, we offer some augmentation of two FCM-based clustering algorithms used to cluster distributed data by arriving at constructive ways of determining essential parameters of the algorithms (including the number of clusters) and by forming a set of systematically structured guidelines for the selection of a specific algorithm depending upon the nature of the data environment and the assumptions being made about the number of clusters. A thorough complexity analysis, including space, time, and communication aspects, is reported. A series of detailed numeric experiments is used to illustrate the main ideas discussed in the study.

Index Terms—Distributed Knowledge Discovery, Collaborative and Parallel Fuzzy Clustering, Validity Indices, Design and Selection Guidelines.

L. Coletta, L. Vendramin, E. Hruschka and R. Campello are with the Department of Computer Science, University of Sao Paulo, Sao Carlos, Brazil. W. Pedrycz is with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada, and the Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland.

I. INTRODUCTION

The ultimate objective of any clustering algorithm is to determine a finite set of categories (groups) that describe a data set according to similarities among its objects [1], [2]. In particular, fuzzy clustering deals with overlapping data clusters [3]. In this sense, when a fuzzy clustering algorithm is applied to a data set, the result is a partition of the data into a certain number (c) of fuzzy clusters, where one admits partial belongingness of an object to a cluster. The well-known Fuzzy C-Means (FCM) algorithm [4], [3] is an extension of the classical K-means algorithm [5], [6]. While K-means assumes that every object must belong exclusively to a single cluster, FCM relaxes this restriction, so that every object belongs to some (possibly nonzero) degree to every cluster [3], [7]. There are numerous variants of the generic FCM algorithm applied to a single, "centralized" data set — see [8], [7].

Along with various tasks of clustering and fuzzy clustering applied to a single data set, there is a growing area of clustering realized over a finite collection of data sets, where each data set is considered individually (so they cannot be treated en bloc) but the intent is to discover a structure over all the data. We will refer to this type of clustering as collaborative clustering. In essence, collaborative clustering, or distributed clustering, gives rise to a discovery of structure through clustering locally available data and establishing a certain interaction among the clustering algorithms operating at the individual data sites.

In our study, we are particularly interested in the FCM variants conceived to handle distributed data, such as the algorithms described in [9], [10], [11]. Algorithms of this type are useful for what has been called in the literature "Distributed Knowledge Discovery" [12]. More specifically, such algorithms have been designed to perform Distributed Data Clustering [13], [14], [15], [16], [17], under the assumption that the data to be clustered are distributed across a number of data sites.

The term "Collaborative Fuzzy Clustering" was originally proposed in [9]; it invokes an active way of forming information granules (clusters) in which a concept of communicating information granules plays an essential role, making it possible to share summarized knowledge structures present at different data sites and to reconcile potential differences [10]. In this context, it is particularly important not to violate domain constraints (e.g., privacy and security issues, network bandwidth capacity, etc.). Taking all these aspects into account, a state-of-the-art algorithm for collaborative fuzzy clustering was proposed in [10]. With a different motivation in mind (improving the computational efficiency of FCM), the widely known Parallel Fuzzy C-Means (PFCM) was introduced in [11]; it can also be used for related tasks, as will be discussed later in the paper. The improvement of particular characteristics of these algorithms, and the study of application scenarios in which a given algorithm could be more appropriate, have motivated our work.

The main contributions of this study are as follows. First, we carefully revisit the class of FCM-based clustering algorithms for distributed data. Second, by identifying some existing limitations, we propose improvements to the CFC algorithm [10] as well as develop two new algorithms that are capable of estimating the number of clusters from data. Third, the intrinsic nature of the collaborative fuzzy clustering problem has led us to consider the Parallel Fuzzy C-Means (PFCM) algorithm [11] as a potentially promising distributed clustering vehicle for some application scenarios. However, PFCM also requires that the number of clusters be provided in advance. In order to circumvent
this limitation, we develop an extension of this algorithm which, while keeping its sound properties, allows for the automatic estimation of the number of clusters. For all these algorithms, a thorough complexity analysis including space, time, and communication criteria is reported. Finally, in practice, depending on the underlying assumptions about the data (i.e., data that come from the same population or from different populations¹), as well as on the difficulty of setting some user-defined parameters (such as the number of clusters and/or the interaction level between data sites), some algorithms may be preferred to others. From this viewpoint, we suggest a decision-tree-like structure for the user interested in choosing among the available algorithms. Based on this decision-tree-like structure, we discuss illustrative numeric examples that highlight performance differences among the algorithms under study versus the assumptions made about the data, allowing us to derive a number of conclusions that can be useful for the practice of collaborative fuzzy clustering.

¹As usual, the population is the entire finite or infinite aggregate of individuals or items from which samples are drawn.

The paper is organized as follows. Section II addresses related work, whereas Section III focuses on the main topics of this study, that is, the development of collaborative fuzzy clustering algorithms, their categorization, and some general guidelines on the use of a specific version of the algorithm depending upon the application scenario and the assumptions being made. From this perspective, a number of numeric examples demonstrating the performance of each of the algorithms are provided in Section IV. Finally, Section V concludes the paper.

In this study, we adhere to the standard notation encountered in the area. Scalar values are described by small italic letters (except for the number of objects and the number of data sites, which are denoted by N and P, respectively), sets are described by capital italic letters, vectors are denoted by small boldface letters, and matrices by capital boldface letters.

II. RELATED STUDIES

A. Fuzzy clustering – a few notes

When a (non-fuzzy) clustering algorithm of partitioning nature is applied to a set of N data X = {x_1, ..., x_N}, x_j = [x_{j1}, ..., x_{jd}]^T ∈ ℜ^d, each composed of d attributes (features), the final result is a Boolean partition of the data into a certain number c of clusters, such that:

  H = [h_{ij}]_{c \times N}, \quad h_{ij} \in \{0, 1\}, \quad \sum_{i=1}^{c} h_{ij} = 1 \;\; \forall j \in \{1, ..., N\}    (1)

where H is a c × N Boolean partition matrix whose element h_ij is 1 if the jth object belongs to the ith cluster and 0 otherwise. In fuzzy clustering, we admit partial membership to a cluster, meaning that the entries of the partition matrix assume values from the [0,1] interval. In other words, we form c fuzzy clusters described by a fuzzy partition, such that:

  U = [u_{ij}]_{c \times N}, \quad u_{ij} \in [0, 1]    (2)

where U is a c × N fuzzy partition matrix whose element u_ij represents the membership of the jth object to the ith cluster. Fuzzy C-Means (FCM) [4], [3] is one of the most commonly studied methods in this area. In what follows, we briefly review the essence of this method.

B. Fuzzy C-Means algorithm and its relatives

In a nutshell, FCM [4], [3] is an iterative procedure that finds local solutions (local minima) to the following optimization problem:

  \min_{u_{ij}, v_i} \; J = \sum_{j=1}^{N} \sum_{i=1}^{c} u_{ij}^2 \, \|x_j - v_i\|^2    (3)

  s.t. \quad 0 \le u_{ij} \le 1, \quad \sum_{i=1}^{c} u_{ij} = 1 \;\; \forall j \in \{1, ..., N\}, \quad 0 < \sum_{j=1}^{N} u_{ij} < N \;\; \forall i \in \{1, ..., c\}

where x_j (j = 1, ..., N) are the data (objects) to be clustered into c clusters (groups), v_i = [v_{i1}, ..., v_{id}]^T ∈ ℜ^d (i = 1, ..., c) are the cluster prototypes, u_ij stands for the membership of the jth object to the ith fuzzy cluster, and ‖·‖ denotes an inner-product norm (the Euclidean distance is adopted in this work). The iterative procedure to solve the problem formulated in (3) is summarized as Algorithm 1.

Remark 1: The relationship (4) requires that ‖x_j − v_l‖² > 0 for all j ∈ {1, ..., N} and l ∈ {1, ..., c}. For every j, if ‖x_j − v_l‖² = 0 for l ∈ I ⊆ {1, ..., c}, then we define u_ij as follows: a) u_ij = 0 for i ∈ Ī; and b) Σ_{i∈I} u_ij = 1.

Remark 2: An effective convergence criterion is that the maximal absolute difference between elements of the partition matrix in two consecutive iterations be lower than a given positive threshold ε. A usual setting, also adopted in this paper, is ε = 10⁻³.

Leaving out minor variants of the above algorithm — dealing simply with a slightly different stopping rule and/or initialization procedure (e.g., starting from an initial partition matrix, as in our work, instead of from initial prototypes) — there is an impressive number of different algorithms based on modifications and/or extensions of the ordinary FCM formulation that have been developed during the past 35 years. Probably the most popular one regards the use of a different value of the exponent (other than two), called the fuzzifier m, to control the weight of the partition matrix elements in the optimization functional J in (3) and, as a consequence, the fuzziness of the resulting clusters [3].
Algorithm 1: Fuzzy C-Means (FCM) [4], [3]
1 Set up a value of c (the number of clusters);
2 Select initial cluster prototypes v_1, v_2, ..., v_c from x_j, j = 1, 2, ..., N;
3 Compute the distances ‖x_j − v_i‖ between objects and prototypes;
4 Compute the elements of the fuzzy partition matrix (i = 1, 2, ..., c; j = 1, 2, ..., N):

  u_{ij} = \left[ \sum_{l=1}^{c} \left( \frac{\|x_j - v_i\|}{\|x_j - v_l\|} \right)^{2} \right]^{-1}    (4)

5 Compute the cluster prototypes (i = 1, 2, ..., c):

  v_i = \frac{\sum_{j=1}^{N} u_{ij}^2 \, x_j}{\sum_{j=1}^{N} u_{ij}^2}    (5)

6 Stop if convergence is attained or the number of iterations exceeds a given limit. Otherwise, go to Step 3;
Step 3; Proximity-Based FCM (P-FCM) [45] and other related
knowledge-based algorithms [46], [47];
• Algorithms for handling distributed data, such as those
described in [9], [10], [11]. This last category of FCM
[3]. Whether such a fuzzifier is adopted or not, the use of
variants falls directly within the scope of the present
a fixed inner-product norm in the FCM algorithm induces
work, as it will be discussed in Section III.
fuzzy clusters of a certain shape (geometry). For instance,
hyperspherical clusters are induced when the Euclidean
C. Fuzzy cluster validity measures
norm is adopted. An important class of FCM variants refers
to algorithms designed to find clusters with different (pos- Most of the existing fuzzy clustering algorithms (includ-
sibly adaptive) geometries. This class includes the Fuzzy ing FCM and its variants) require that the number c of fuzzy
C-Varieties (FCV) algorithm [3], which is able to detect clusters to be defined in advance by the user [7]. Moreover,
linear structures (such as lines and planes) in data [8], these algorithms are usually subject to possible problems
and the well-known Gustafson-Kessel (GK) algorithm [18], of getting trapped in local minima. For these reasons, it
which is able to find hyperellipsoidal fuzzy clusters with is indispensable to devise means to evaluate the quality of
different spatial orientations. Other algorithms, such as the different clustering solutions provided by different settings
Fuzzy Maximum Likelihood Estimates (FMLE) proposed of a given algorithm (or even by different algorithms). In
in [19] and the Extended FCM and GK (E-FCM and E- other words, the discovery of the best fuzzy clustering
GK) algorithms introduced in [20], are believed to be more solution among a set of candidates requires accurate criteria
suitable to handle data sets with uneven spatial distributions, to quantitatively measure the quality of the fuzzy partitions
namely, data containing clusters with different volumes and obtained.
densities [21], [8], [20]. Several criteria for the assessment of fuzzy clustering
have been proposed — see [7], [48], [49], [50], and ref-
Another important category of FCM relatives concerns
erences therein. In the experiments to be shown in the
algorithms that are more robust (less sensitive) to outliers
present paper, a recently developed criterion, which may
and noise [22]. This category includes, for instance, L1
exhibit a good trade-off between efficacy and computational
norm-based FCM variants [23], [24] and possibilistic (rather
burden, will be adopted. This criterion, named simplified
than probabilistic) versions of FCM, such as the Possibilistic
Fuzzy Silhouette [51] (FS), is a fuzzy set-based extension
C-Means (PCM) [25], [26], the Fuzzy Possibilistic C-Means
of a simplified (faster) version of the traditional Average
(FPCM) [27], the Possibilistic Fuzzy C-Means (PFCM) [28],
Silhouette Width Criterion (ASWC) [1] originally developed
and other related algorithms [29], [30], [31].
for Boolean clustering assessment. The ASWC as well as
There are many other categories of FCM variants that
its simplified and fuzzy versions are briefly reviewed.
have been proposed in the literature, such as: 1) Average Silhouette Width Criterion: In order to in-
• Algorithms for handling objects with non-numerical troduce this criterion, consider an object j ∈ {1, 2, ..., N }
(categorical/symbolic) attributes [32], whose main rep- belonging to cluster r ∈ {1, ..., c}. In the context of
resentatives are relational-data algorithms [33], such Boolean partitions produced by a prototype-based clustering
4
algorithm (e.g., K-means), this means that object j is closer to the prototype of cluster r than to any other prototype. In the more general context of fuzzy partitions, on the other hand, this means that the membership of the jth object to the rth fuzzy cluster, u_rj, is higher than the membership of this object to any other fuzzy cluster, i.e., u_rj > u_qj for every q ∈ {1, ..., c}, q ≠ r.

Let the average distance of object j to all other objects belonging to cluster r be denoted by a_rj. Also, let the average distance of this object to all objects belonging to another cluster q, q ≠ r, be called d_qj. Finally, let b_rj be the minimum d_qj computed over q = 1, ..., c, q ≠ r, which represents the dissimilarity of object j to its closest neighboring cluster. Then the silhouette of object j is defined as:

  s_j = \frac{b_{rj} - a_{rj}}{\max\{a_{rj}, b_{rj}\}}    (6)

where the denominator is used just as a normalization term. Clearly, the higher s_j, the better the assignment of object j to cluster r. In case r is a singleton, i.e., if it is constituted uniquely by object j, the silhouette of this object is defined as s_j = 0 [1]. This prevents the ASWC, defined as the average of s_j over j = 1, 2, ..., N, i.e.:

  ASWC = \frac{1}{N} \sum_{j=1}^{N} s_j    (7)

from finding the trivial solution c = N, with each object of the data set forming a cluster on its own. This way, the best partition is achieved when the ASWC in (7) is maximized, which implies minimizing the intra-cluster distance a_rj while maximizing the inter-cluster distance b_rj.

One can note that a problem with the ASWC measure is that it requires the computationally intensive calculation of all pairwise distances among data objects. In order to get around this problem, it was proposed in [52], [53] to replace the terms a_rj and b_rj in (6) with simplified versions based on the distances between the objects and the prototypes of the corresponding clusters. This modification has been shown not to degrade accuracy while significantly reducing the computational burden from O(N²) to O(N) [54]. Moreover, it does not change the dependency of the ASWC on average distances (in this case represented by the prototypes), which is a desirable property concerning robustness to noise [55].
2) Fuzzy Silhouette criterion: Both the original and simplified (prototype-based) versions of the ASWC discussed in Section II-C1 can be used to evaluate fuzzy partitions [1], [2]. To do so, however, they do not make explicit use of the fuzzy partition matrix in their calculations. Instead, the fuzzy partition matrix U = [u_ij]_{c×N} is used only to impose on the data set a Boolean partition matrix H = [h_ij]_{c×N} to which the ASWC measure can be applied. In particular, the partition matrix H is such that h_ij = 1 if i = arg max_m u_mj (m ∈ {1, ..., c}) and h_ij = 0 otherwise. Consequently, the ASWC may not be able to discriminate between overlapping data clusters — even if these clusters each have their own (distinct) regions of higher data density — since it neglects the information contained in the fuzzy partition matrix U about the degrees to which clusters overlap one another. This information can be used to reveal those regions of the data space with high data densities by stressing the importance of data objects concentrated in the vicinity of the cluster prototypes while reducing the importance of objects lying in overlapping areas. To do so, a generalized silhouette criterion, named the Fuzzy Silhouette (FS), has been defined as [51]:

  FS = \frac{\sum_{j=1}^{N} (u_{rj} - u_{qj}) \, s_j}{\sum_{j=1}^{N} (u_{rj} - u_{qj})}    (8)

where u_rj and u_qj are the first and second largest elements in the jth column of the fuzzy partition matrix, respectively, and s_j is the silhouette of object j according to (6), possibly in its faster, prototype-based version described in Section II-C1.

There is an important aspect of (8) that deserves particular attention. This expression differs from (7) in being a weighted average (instead of an arithmetic mean) of the individual silhouettes given by (6). The weight of each term is determined by the difference between the membership degrees of the corresponding object to its first and second best matching fuzzy clusters. This way, an object in the near vicinity of a cluster prototype is given more importance than an object located in an overlapping area (where the membership degrees of the object to two or more fuzzy clusters are similar).
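The following sketch (ours, in Python/NumPy) computes the prototype-based silhouettes s_j of (6) and aggregates them both as the ASWC of (7) and as the FS of (8); U and V are assumed to be the partition matrix and prototypes returned by an FCM run such as the one sketched above.

  import numpy as np

  def fuzzy_silhouette(X, U, V):
      """Simplified (prototype-based) silhouettes s_j, Eq. (6), aggregated
      as the ASWC, Eq. (7), and as the Fuzzy Silhouette, Eq. (8)."""
      c, N = U.shape
      # distances between each object and each prototype, shape (c, N)
      D = np.sqrt(((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2))
      order = np.argsort(-U, axis=0)            # memberships, largest first
      r, q = order[0], order[1]                 # best and second-best cluster per object
      j = np.arange(N)
      a = D[r, j]                               # simplified a_rj: distance to own prototype
      Dq = D.copy()
      Dq[r, j] = np.inf                         # exclude the own cluster
      b = Dq.min(axis=0)                        # simplified b_rj: closest other prototype
      s = (b - a) / np.maximum(a, b)
      aswc = s.mean()                           # Eq. (7)
      w = U[r, j] - U[q, j]                     # weights u_rj - u_qj
      fs = (w * s).sum() / w.sum()              # Eq. (8)
      return aswc, fs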
D. Estimating the number of fuzzy clusters

In practice, several approaches for determining the appropriate number of clusters can be used. Such approaches are usually based on a general procedure. First, the data are partitioned for different values of the number of clusters. Then, clustering validity measures, also known as relative indices [56], are used to assess the appropriateness of the obtained partitions. In this work, a commonly adopted (practical) approach for estimating the number of clusters is used. This approach, named Ordered Multiple Runs of FCM (FCM-c*) [57], runs FCM repeatedly for an increasing number of clusters (c). For each value of c, a number of partitions achieved by FCM are assessed by means of some validity index (the simplified version of the Fuzzy Silhouette in this work), for which the best obtained value is kept for further reference. After running FCM for every value of c in a given range, the best obtained partition (according to the validity index) is chosen². FCM-c* is detailed as Algorithm 2.

²A well-known, analogous approach involves recognizing a knee in a chart that depicts the number of clusters versus some measure of their compactness [8].

Algorithm 2: Ordered Multiple Runs of FCM (FCM-c*)
1 Let c_max be the maximum acceptable number of clusters, c* be the number of clusters to be estimated by the algorithm, S_C be the stopping criterion for a single run of FCM, V_VC be the value of the validity index after a single run of FCM, V*_VC be the value of the validity index for the best partition found by the algorithm, and n_p be the number of different partitions generated for each number c ≤ c_max of clusters. Let us assume that the smallest possible value for V_VC is V_min.
2 Choose c_max and S_C;
3 V*_VC ← V_min;
4 for c = 2, ..., c_max do
5   for i = 1, ..., n_p do
6     Generate a random partition with c clusters;
7     Run FCM until S_C is met;
8     Compute V_VC for the resulting partition;
9     if V_VC > V*_VC then
10      V*_VC ← V_VC;
11      c* ← c;
12      Hold the resulting partition S for c*;
13    end
14  end
15 end
16 Return V*_VC, c*, and the corresponding partition S for c*;

Note that this algorithm is suitable for maximization criteria like the Fuzzy Silhouette. However, it can be directly adapted for minimization criteria as well.

The time complexity of the generic implementation of the basic FCM algorithm described in Section II-B is O(n_t · N · c² · d), where n_t is the number of iterations, N is the number of objects, c is the number of fuzzy clusters, and d is the number of attributes. Then, noting that FCM-c* runs FCM n_p times for each c from 2 through c_max, its overall computational cost is estimated as O(n_p · n_t · N · c_max³ · d).
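Algorithm 2 then amounts to a thin wrapper around FCM. A minimal sketch (ours), reusing the fcm and fuzzy_silhouette routines sketched earlier and assuming a maximization criterion:

  import numpy as np

  def fcm_c_star(X, c_max=6, n_p=10, seed=0):
      """Ordered Multiple Runs of FCM (Algorithm 2): over c = 2..c_max and
      n_p random starts, keep the partition with the best simplified FS."""
      rng = np.random.default_rng(seed)
      best = (-np.inf, None, None, None)         # (V*_VC, c*, U, V); FS >= -1, so -inf plays V_min
      for c in range(2, c_max + 1):
          for _ in range(n_p):
              U, V = fcm(X, c, rng=rng)          # one run from a random initialization
              _, fs = fuzzy_silhouette(X, U, V)  # validity index V_VC
              if fs > best[0]:
                  best = (fs, c, U, V)
      return best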
E. Collaborative fuzzy clustering with a fixed interaction level

Collaborative clustering comprises a variety of different schemes [9]. Here we assume that each data site has different objects, but that these are described by the same features. Following [9], collaborative clustering concerns a process of revealing structures that may exist (at least approximately) and to some extent are common across different data sites. This definition suggests, as its implicit underlying assumption, that the (sample) data in each data site were generated from different populations. Thus, one should not expect to identify all those different clusters from such different populations at every data site. On the contrary, it might be expected that every data site only takes advantage of the (relatively) similar information communicated by the other data sites. This information may be useful for refining similar clusters found at different data sites, as realized by the Collaborative Fuzzy Clustering (CFC) algorithm proposed in [10].

In order to provide further details about the algorithm developed in [10], let us consider P data sites D[1], D[2], ..., D[P], each of them formed by N[1], N[2], ..., N[P] objects defined over the same feature set. Initially, CFC induces clusterings formed by c[1], c[2], ..., c[P] clusters using FCM as described in Section II-B. More precisely, CFC assumes that these numbers of clusters are a priori known and invariant across different sites, i.e., c[1] = c[2] = ... = c[P] = c, where c is a user-defined parameter. After running FCM, a set of c prototypes is obtained for every data site. These prototypes summarize the information captured by the clusters, thus allowing these clusters to be represented. Let us now assume that the clustering results of a given data site can be clearly identified by its index. So, for the iith data site we have the partition matrix U[ii] = [u_ik[ii]] (i = 1, ..., c; k = 1, 2, ..., N[ii]; ii = 1, ..., P). The corresponding prototypes are denoted by v_1[ii], v_2[ii], ..., v_c[ii]. For simplicity and without any loss of generality, let us assume that the collaborative clustering takes place at a single data site, e.g., in D[ii]. In this case, all the remaining data sites D[1], D[2], ..., D[ii−1], D[ii+1], ..., D[P] communicate their respective prototypes v_i[jj] (i = 1, 2, ..., c; jj = 1, 2, ..., ii−1, ii+1, ..., P) to D[ii]. Then, using the prototypes communicated by the jjth data site and the objects from D[ii] in Eq. (4), we can generate an induced partition matrix Ũ[ii|jj]. Analogously, we can perform such a computation for every data site jj ≠ ii, finding P − 1 induced partition matrices, namely Ũ[ii|1], Ũ[ii|2], ..., Ũ[ii|ii−1], Ũ[ii|ii+1], ..., Ũ[ii|P].

The collaborative process is guided by the minimization of the objective function given in the following form:

  Q[ii] = \sum_{k=1}^{N[ii]} \sum_{i=1}^{c} u_{ik}^2[ii] \, d_{ik}^2 + \beta \sum_{\substack{jj=1 \\ jj \neq ii}}^{P} \sum_{k=1}^{N[ii]} \sum_{i=1}^{c} \left( u_{ik}[ii] - \tilde{u}_{ik}[ii|jj] \right)^2 d_{ik}^2    (9)

where β is a non-negative coefficient whose value is provided by the user and d²_ik = ‖x_k − v_i[ii]‖².

The objective function Q[ii] consists of two terms. The first is equivalent to the objective function used by FCM — J in (3). The second term reflects the impact of the clustering structures found in the other data sites. In particular, the
distance between the partition matrix to be optimized and the respective induced matrices must be minimized. The value of β specifies the collaboration intensity. In other words, β determines the level of influence that the clustering structures found in other data sites have on the local structure (D[ii]). High values of β imply strong collaboration, whereas for β = 0 there is no collaboration at all — in this case we have P independent clusterings, one for each data site.

The CFC algorithm can be broken into two phases:

Initial Phase: The user provides the number of clusters, c. Then FCM is run at every data site, thus producing a set of prototypes v_i[ii], i = 1, 2, ..., c; ii = 1, ..., P.

Collaborative Phase: The user provides a non-negative value for β. Then the prototypes (information granules) are shared among the different data sites in a number of collaboration stages. At each stage, the goal is to minimize the objective function (9) for every data site D[ii], ii = 1, ..., P. Such a minimization can be accomplished through an iterative procedure, which involves the computation of the local partition matrix — Eq. (10) — and of the prototypes — Eq. (11) [10]:

  u_{rs}[ii] = \frac{1}{\sum_{j=1}^{c} d_{rs}^2 / d_{js}^2} \left[ 1 - \frac{\beta \sum_{\substack{jj=1 \\ jj \neq ii}}^{P} \sum_{j=1}^{c} \tilde{u}_{js}[ii|jj]}{1 + \beta(P-1)} \right] + \frac{\beta \sum_{\substack{jj=1 \\ jj \neq ii}}^{P} \tilde{u}_{rs}[ii|jj]}{1 + \beta(P-1)}    (10)

  v_{rt}[ii] = \frac{\sum_{k=1}^{N[ii]} u_{rk}^2[ii] \, x_{kt} + \beta \sum_{\substack{jj=1 \\ jj \neq ii}}^{P} \sum_{k=1}^{N[ii]} (u_{rk}[ii] - \tilde{u}_{rk}[ii|jj])^2 \, x_{kt}}{\sum_{k=1}^{N[ii]} u_{rk}^2[ii] + \beta \sum_{\substack{jj=1 \\ jj \neq ii}}^{P} \sum_{k=1}^{N[ii]} (u_{rk}[ii] - \tilde{u}_{rk}[ii|jj])^2}    (11)

where r = 1, 2, ..., c; s = 1, 2, ..., N[ii]; t = 1, 2, ..., d; d²_rs = ‖x_s − v_r[ii]‖² and d²_js = ‖x_s − v_j[ii]‖².

Algorithm 3 summarizes this phase.
underlying assumptions about the data (i.e., distributed data
Algorithm 3: Collaborative Phase of CFC that come from the same or from different populations), as
1 repeat // each iteration is a collaboration well as on the difficulty to set some user-defined parameters
stage. (such as the number of clusters and/or the interaction level
2 Communicate cluster prototypes from each data between data sites), some algorithms may be preferred to
site to all others; the others. From this viewpoint, we suggest a decision-tree-
3 foreach data site D[ii], ii = 1, ..., P , do like structure for the user interested in choosing among the
4 Compute the induced partition matrices based available algorithms in a given application scenario.
on (4) with local data and cluster prototypes
sent from the other data sites; A. Collaborative Fuzzy Clustering (CFC) with variable
5 repeat interaction level: CFC-βd and CFC-βf algorithms
6 Compute the elements of the local partition The CFC algorithm [10] presented in Section II-E requires
matrix using (10); that the user sets the interaction level, β, which is used for
7 Compute the local cluster prototypes using all pairs of data sites and kept constant (fixed) during the col-
(11); laboration stages. The algorithms here described are capable
8 until the objective function in (9) is minimized; of automatically estimating the interaction levels from data.
9 end In particular, the algorithm named CFC-βd dynamically
10 until cluster prototypes do not significantly change adjusts, during the collaborative process, a particular β[ii|jj]
between two consecutive iterations; value, ii, jj = 1, 2, ..., P for each pair of data sites. In
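One collaboration stage at a single site (the inner loop of Algorithm 3) can be sketched as follows. This is our illustrative rendering of Eqs. (10) and (11): U_ind stands for the list of induced matrices Ũ[ii|jj] (held fixed within the stage), and a fixed number of inner iterations replaces the convergence test on (9).

  import numpy as np

  def cfc_stage(X, U, V, U_ind, beta, n_inner=20):
      """One collaboration stage at one site: alternate Eq. (10) and Eq. (11).
      X: (N, d) local objects; U: (c, N); U_ind: list of (c, N) induced matrices."""
      P1 = len(U_ind)                                 # P - 1 collaborating sites
      S_full = sum(Ut.sum(axis=0) for Ut in U_ind)    # sum_jj sum_j u~_js, shape (N,)
      S_row = sum(Ut for Ut in U_ind)                 # sum_jj u~_rs, shape (c, N)
      denom = 1.0 + beta * P1
      for _ in range(n_inner):
          # Eq. (10): partition update
          D2 = np.maximum(((X[None] - V[:, None]) ** 2).sum(-1), 1e-12)
          inv = (1.0 / D2).sum(axis=0, keepdims=True)
          U = (1.0 / (D2 * inv)) * (1.0 - beta * S_full / denom) + beta * S_row / denom
          # Eq. (11): prototype update
          num = (U ** 2) @ X
          den = (U ** 2).sum(axis=1, keepdims=True)
          for Ut in U_ind:
              diff2 = (U - Ut) ** 2
              num += beta * (diff2 @ X)
              den += beta * diff2.sum(axis=1, keepdims=True)
          V = num / den
      return U, V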
III. ALTERNATIVE ALGORITHMS OF COLLABORATIVE FUZZY CLUSTERING

As addressed in Section II-E, CFC [10] depends on two user-defined parameters, namely β and c. In order to get rid of the former parameter, we propose two data-driven approaches that result in the algorithms here called CFC-βd and CFC-βf. Concerning the latter parameter, c, we develop two new algorithms (named CFCM-c* and CFCM-βf-c*) that are capable of estimating the number of clusters from data. In addition, the intrinsic nature of the collaborative fuzzy clustering problem has led us to consider the widely known PFCM algorithm [11] as a potentially promising clustering vehicle for some application scenarios, especially when one assumes that the distributed data to be clustered come from the same population, as will hopefully become evident later on. However, PFCM [11] also requires that the number of clusters be provided by the user. In order to circumvent such a limitation, we have developed an extension of this algorithm that, while keeping its useful properties, allows the automatic estimation of the number of clusters. The resulting algorithm is here named PFCM-c*. Finally, in practice, depending on the underlying assumptions about the data (i.e., distributed data that come from the same or from different populations), as well as on the difficulty of setting some user-defined parameters (such as the number of clusters and/or the interaction level between data sites), some algorithms may be preferred to others. From this viewpoint, we suggest a decision-tree-like structure for the user interested in choosing among the available algorithms in a given application scenario.

A. Collaborative Fuzzy Clustering (CFC) with variable interaction level: the CFC-βd and CFC-βf algorithms

The CFC algorithm [10] presented in Section II-E requires that the user set the interaction level, β, which is used for all pairs of data sites and kept constant (fixed) during the collaboration stages. The algorithms described here are capable of automatically estimating the interaction levels from data. In particular, the algorithm named CFC-βd dynamically adjusts, during the collaborative process, a particular β[ii|jj] value, ii, jj = 1, 2, ..., P, for each pair of data sites. In brief, if the cluster structures in two data sites are very
different from each other, then β should be low, leading to a weak collaboration between the two data sites under consideration. On the other hand, a pair of similar data sites should lead to a high value of β, which suggests that a strong collaboration between them can be accomplished.

More precisely, let us consider two data sites (D[ii] and D[jj]) and their respective prototypes (v_i[ii] and v_i[jj], i = 1, ..., c). Let us suppose that the prototypes v_i[jj] have been sent to the data site D[ii]. Following the CFC algorithm (Section II-E), we compute the induced partition matrix Ũ[ii|jj]. From this matrix we can compute the value of the (induced) objective function:

  \tilde{J}[ii|jj] = \sum_{k=1}^{N[ii]} \sum_{i=1}^{c} \tilde{u}_{ik}^2[ii|jj] \, \|x_k - v_i[jj]\|^2    (14)

where ũ_ik[ii|jj] is an element of the induced partition matrix Ũ[ii|jj] and x_k ∈ D[ii].

The FCM objective function — Eq. (3) — for D[ii] is denoted here as J[ii]. Then, the interaction level β[ii|jj] between two data sites D[ii] and D[jj], at a given collaboration stage, can be defined as:

  \beta[ii|jj] = \min\left\{ 1, \frac{J[ii]}{\tilde{J}[ii|jj]} \right\}    (15)

The value of J̃[ii|jj] is typically greater than J[ii] because it is computed based on the prototypes sent from the jjth data site, whereas J[ii] has been optimized for the iith data site itself. Thus, values of β[ii|jj] close to 0 suggest that the data sites are very different, which implies that the collaboration should be weak. Conversely, if one assumes that the prototypes alone convey all the information that describes the cluster structures, then it is legitimate to consider that the partitions in data sites D[ii] and D[jj] will be very similar when β[ii|jj] is close to 1, and vice versa. In this case, the collaboration level should be high.

The idea captured by (15) has been incorporated into the CFC algorithm in order to dynamically adjust the interaction level between every pair of data sites at every collaboration stage. Now, u_rs[ii] and v_rt[ii] are computed using Eqs. (12) and (13), respectively:

  u_{rs}[ii] = \frac{1}{\sum_{j=1}^{c} d_{rs}^2 / d_{js}^2} \left[ 1 - \frac{\sum_{\substack{jj=1 \\ jj \neq ii}}^{P} \beta[ii|jj] \sum_{j=1}^{c} \tilde{u}_{js}[ii|jj]}{1 + \sum_{\substack{jj=1 \\ jj \neq ii}}^{P} \beta[ii|jj]} \right] + \frac{\sum_{\substack{jj=1 \\ jj \neq ii}}^{P} \beta[ii|jj] \, \tilde{u}_{rs}[ii|jj]}{1 + \sum_{\substack{jj=1 \\ jj \neq ii}}^{P} \beta[ii|jj]}    (12)

  v_{rt}[ii] = \frac{\sum_{k=1}^{N[ii]} u_{rk}^2[ii] \, x_{kt} + \sum_{\substack{jj=1 \\ jj \neq ii}}^{P} \beta[ii|jj] \sum_{k=1}^{N[ii]} (u_{rk}[ii] - \tilde{u}_{rk}[ii|jj])^2 \, x_{kt}}{\sum_{k=1}^{N[ii]} u_{rk}^2[ii] + \sum_{\substack{jj=1 \\ jj \neq ii}}^{P} \beta[ii|jj] \sum_{k=1}^{N[ii]} (u_{rk}[ii] - \tilde{u}_{rk}[ii|jj])^2}    (13)

The resulting algorithm is named CFC-βd. Following this idea, a more computationally efficient algorithm can be developed. Such an algorithm, named CFC-βf, estimates the values of β[ii|jj] only once — before the collaboration process takes place — and keeps them fixed throughout the collaboration stages. When using these two algorithms (as well as the original CFC), the user implicitly assumes that the data from different sites come from different populations, because he or she expects that only data sites that are similar (in some relative sense) should take advantage of the collaborative clustering process. In other words, it is assumed, a priori, that for very different data sites there is no reason to incorporate the shared information provided by their respective prototypes into the collaborative process.
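A sketch of the estimation of β[ii|jj] via Eqs. (14) and (15) follows (our illustration; J[ii] is evaluated here by inducing memberships from the local prototypes with Eq. (4), which coincides with the optimized FCM objective at convergence):

  import numpy as np

  def interaction_level(X_ii, V_ii, V_jj):
      """beta[ii|jj] of Eq. (15): ratio of the local objective J[ii] to the
      objective J~[ii|jj] induced by the prototypes received from D[jj]."""
      def induced_J(X, V):
          # memberships induced by Eq. (4), then the objective of Eq. (3)/(14)
          D2 = np.maximum(((X[None] - V[:, None]) ** 2).sum(-1), 1e-12)
          U = 1.0 / (D2 * (1.0 / D2).sum(axis=0, keepdims=True))
          return ((U ** 2) * D2).sum()
      return min(1.0, induced_J(X_ii, V_ii) / induced_J(X_ii, V_jj))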
B. Collaborative fuzzy clustering with a variable number of clusters: CFCM-c* and CFCM-βf-c* algorithms

The algorithms addressed so far (CFC, CFC-βd and CFC-βf) require that the same number of clusters, c, be chosen a priori for all data sites and kept constant during the collaboration process. Aimed at relaxing such a constraint, so that the number of clusters can be automatically estimated from data, we propose two new algorithms.

In order to introduce them, let us consider P data sites, D[1], D[2], ..., D[P], consisting of N[1], N[2], ..., N[P] data defined in the same feature space. We are interested in revealing structures by forming c[1], c[2], ..., c[P] clusters in the respective data sites. The FCM algorithm reviewed in Section II-B is here used as a clustering vehicle. Again, all results produced at a given site are clearly identified by the index of that site. Thus, for the iith site the partition matrix is denoted by U[ii] = [u_ik[ii]], i = 1, ..., c[ii], k = 1, ..., N[ii], while the corresponding prototypes are given as v_1[ii], v_2[ii], ..., v_{c[ii]}[ii].

For the sake of simplicity, let us initially assume that the collaborative clustering will take place at a single data site, e.g., in D[ii]. In this case, all the other data sites D[1], D[2], ..., D[ii−1], D[ii+1], ..., D[P] communicate their respective prototypes v_i[jj] (i = 1, 2, ..., c[jj]; jj = 1, 2, ..., ii−1, ii+1, ..., P) to D[ii]. Such prototypes can be viewed as representatives of the clusters. In this sense, different prototypes may represent clusters that summarize the information encoded in different numbers of objects. To make this point clear, let us derive a Boolean partition from the fuzzy partition induced at a particular data site D[jj] by classifying its objects as belonging to the clusters of D[jj]. The kth object of D[jj] is classified to the ith fuzzy cluster of D[jj] if the membership u_ik[jj] is higher than the membership of this object to any other fuzzy cluster, i.e., u_ik[jj] > u_qk[jj] for every q ∈ {1, ..., c[jj]}, q ≠ i. By doing so, one can compute the number of objects assigned to each fuzzy cluster of D[jj]. Let this number be represented by n_i[jj], i = 1, 2, ..., c[jj]. Roughly speaking, it captures how representative a given prototype of D[jj] is. However, this approach may lead to some information loss, because the membership values obtained from fuzzy clustering have now been discretized and restricted to {0, 1}. To overcome this possible shortcoming, each n_i[jj] could alternatively be determined by the sigma-count measure [58] of the ith fuzzy cluster, i.e.:

  n_i[jj] = \sum_{k=1}^{N[jj]} u_{ik}[jj]    (16)

thus making better use of the information provided by the fuzzy clusters found in D[jj].

If desired, a factor γ_i that reflects the interestingness and/or reliability of each cluster of an obtained data partition can be incorporated into the computation of n_i[jj]. For instance, we could set n_i[jj] = γ_i Σ_k u_ik[jj], γ_i ∈ [0, 1]. High values of γ_i (e.g., 0.95) would suggest that the respective cluster is very interesting for collaboration purposes, whereas low values would suggest the opposite. For instance, a domain expert may want to state that some cluster is more interesting for collaboration purposes than another. As a particular case of this framework we can have γ_1 = γ_2 = ... = γ_{c[jj]} = γ, in which we quantify in a single measure, γ, the overall interestingness of the prototypes of a given data site for collaboration purposes. One way to determine the value of the interestingness factor automatically is to use the interaction level β[ii|jj] — Eq. (15) — by setting γ = β[ii|jj] when transmitting prototypes from D[jj] to D[ii]. It is worth mentioning that, by transmitting γ indirectly by means of n_i[jj], we do not incur additional communication overhead.

Clustering algorithms induce clusters no matter whether the data are naturally clustered or purely random. One must be particularly conscious of this fact in a collaborative clustering setting, in which random clustering structures may negatively impact the collaboration process in a significant way. For instance, let us assume that, for a given data site D[jj], all sets of locations in some region of the d-dimensional space are equally likely to appear. In other words, consider that each of the N points (objects) is inserted at random into the region, leading to what is called the random position hypothesis in the statistical data analysis literature. Clearly, clustering structures found at D[jj] are then just an artifact of the clustering algorithm. As a consequence, the prototypes determined at this site are detrimental to the overall collaborative clustering process. Therefore, ideally we should be able to validate each clustering structure found at a given data site involved in the collaboration process by quantifying the randomness underlying the data set at each data site. In a nutshell, this can be accomplished, for instance, via Monte Carlo simulation. More precisely, a baseline distribution of some statistic of interest — e.g., the functional J in (3) — under the null hypothesis of no structure (randomness) must initially be established. In this sense, a simple approach involves generating m − 1 random data sets that "match" the given data set at D[jj] in terms of the number of objects, the number of features, and the overall spread. Then, a value of J, considering a fixed number of clusters (c[jj]) that could have been defined a priori or estimated from the data in D[jj], is computed for each of the m − 1 randomly generated data sets. Note that we are dealing with m data sets, namely the m − 1 randomly generated ones plus the one available at D[jj]. Next, one could perform a statistical hypothesis test aimed at rejecting the null hypothesis of randomness at a given significance level if the J value of the data set used in the collaborative process is among the best (smallest) values achieved in the Monte Carlo trials. For collaborative purposes, we do not necessarily need to fix a particular significance level that leads to the rejection of the null hypothesis, forcing us to decide whether the clustering structure found at D[jj] is valid or not. Actually, we just want to quantify the randomness underlying the data set at each data site. This notion can be simply captured by defining n_i[jj] = (1 − r*/m) Σ_k u_ik[jj], where r* is the rank of the partition found in D[jj] according to the m values obtained for J. Thus, partitions that can be considered unusually good (under the null hypothesis of no structure) will have more weight in the collaborative clustering process than partitions induced from random data. Note that, in our collaborative clustering setting, it is straightforward to take both interestingness and randomness into account by multiplying the two factors, i.e., computing the product:

  n_i[jj] = \left( 1 - \frac{r^*}{m} \right) \gamma \sum_{k=1}^{N[jj]} u_{ik}[jj]    (17)
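Putting Eqs. (16) and (17) together with the Monte Carlo procedure just described, the prototype weights of a site can be sketched as follows. This is our illustration only: the null data sets are drawn uniformly over the bounding box of the real data as one simple way to "match" its spread, the rank convention assigns rank 1 to the smallest J, and fcm is the routine sketched in Section II-B.

  import numpy as np

  def prototype_weights(X, U, V, gamma=1.0, m=100, rng=np.random.default_rng(0)):
      """Weights n_i[jj] of Eq. (17): the sigma-count of each fuzzy cluster,
      Eq. (16), scaled by the interestingness factor gamma and by the
      Monte Carlo randomness factor (1 - r*/m)."""
      sigma_count = U.sum(axis=1)                      # Eq. (16)
      D2 = np.maximum(((X[None] - V[:, None]) ** 2).sum(-1), 1e-12)
      J_real = ((U ** 2) * D2).sum()                   # J of Eq. (3) on the real data
      c = V.shape[0]
      lo, hi = X.min(axis=0), X.max(axis=0)
      J_null = []
      for _ in range(m - 1):
          # random data matching D[jj] in size, dimension, and spread
          Xr = rng.uniform(lo, hi, size=X.shape)
          Ur, Vr = fcm(Xr, c, rng=rng)
          D2r = np.maximum(((Xr[None] - Vr[:, None]) ** 2).sum(-1), 1e-12)
          J_null.append(((Ur ** 2) * D2r).sum())
      r_star = 1 + sum(j < J_real for j in J_null)     # rank of the real J among the m values
      return (1.0 - r_star / m) * gamma * sigma_count  # Eq. (17)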
Let us now consider that the collaborative clustering will be performed for a number of iterations (t_max). Schematically, we can portray the collaborative clustering scheme taking place at data site D[ii] at the tth iteration as presented in Figure 1. After the collaboration process performed at iteration t, an induced partition matrix U^t[ii] = [u^t_ik[ii]], i = 1, ..., c^t[ii], k = 1, 2, ..., N[ii], is obtained and kept for future reference. In addition, the prototypes obtained at iteration t — v^t_1[ii], v^t_2[ii], ..., v^t_{c^t[ii]}[ii] — as well as their corresponding values of n^t_i[ii], i = 1, 2, ..., c^t[ii], are kept to be used in the next collaboration stage. The induced partition can be computed by adapting the FCM algorithm, as described in Algorithm 4.
[Figure 1: each data site D[jj] (objects x_K[jj] ∈ ℜ^d, K = 1, ..., N[jj]; c[jj] clusters) sends its prototypes {v_i[jj]} and the corresponding weights {n_i[jj]} to D[ii] at the tth iteration of the clustering collaboration.]

Fig. 1. Collaborative clustering scheme taking place at data site D[ii] at the tth iteration. Analogous schemes can be depicted for all data sites D[jj], jj = 1, 2, ..., P, jj ≠ ii. For these remaining cases, the set of prototypes v_i[ii], i = 1, 2, ..., c[ii], as well as their respective values of n_i[ii], i = 1, 2, ..., c[ii], achieved from clustering x_K[ii] ∈ ℜ^d (K = 1, 2, ..., N[ii]) in D[ii], are also used for collaboration purposes for all D[jj], jj = 1, 2, ..., P, jj ≠ ii.
This algorithm essentially describes the FCM, except for Steps 5 and 15. The former fundamentally involves making room for additional, virtual objects that are not originally available at D[ii]. Such virtual objects are actually the prototypes transmitted by the other data sites, which will be taken into account in the clustering process performed at D[ii]. It can also be noted from Step 5 that each virtual object is given a weight w_K. This weight can capture the representativeness, interestingness, and randomness of each virtual object (prototype) — according to our previous discussions — and is used to compute the cluster prototypes in Step 15 — Eq. (19). It is important to note that Algorithm 4 keeps the nice properties of FCM that are well studied in the literature. Also, it does not require any additional parameter for collaboration purposes beyond those used to capture the representativeness, interestingness, and randomness, which can be automatically estimated from data (if so desired).

The collaborative clustering portrayed in Figure 1 is performed at each data site for t = 1, ..., t_max. At t = 0, FCM is run for each data site, and no collaboration is performed. Then, for t = 1, 2, ..., t_max, every data site receives information from all the other sites, and clustering collaboration takes place by considering updated information from each data site. For each data site, the number of clusters can be defined beforehand if domain knowledge is available, or it can be automatically estimated from data (e.g., by using Algorithm 2 described in Section II-D). From this standpoint, two different approaches are considered. First, one can estimate the number of fuzzy clusters for each of the P data sites — D[1], D[2], ..., D[P] — using only the information available within their respective data sets x_k[1] ∈ ℜ^d, k = 1, ..., N[1]; x_k[2] ∈ ℜ^d, k = 1, ..., N[2]; ...; x_k[P] ∈ ℜ^d, k = 1, ..., N[P], thus finding c[1], c[2], ..., c[P] clusters to be fed as fixed parameters to Algorithm 4. In this case, any such estimates could be dynamically revised if the respective data set is continuously updated, as in the case of streaming data, for instance. Second, we may wish to estimate the number of clusters at each data site D[ii] by taking into account the information provided by the other data sites, i.e., by considering not only the objects of D[ii] but also the transmitted prototypes (in the role of virtual objects, as just discussed) when estimating the number of clusters at D[ii]. In this case, Algorithm 4 can be combined with Algorithm 2.

Both of the algorithms addressed in this section (CFCM-c* and CFCM-βf-c*) are based on Algorithm 2 — which allows estimating the number of clusters — but FCM in Step 7 of Algorithm 2 is replaced with CFCM (Algorithm 4). Notice, however, that an additional adaptation is needed, namely multiplying the numerator and the denominator of the Fuzzy Silhouette in (8) by the object weights (w_K). For computing such weights, the algorithm CFCM-c* uses Eq. (16), thus trying to induce partitions as if the data were gathered at a single data site. This characteristic suggests that CFCM-c* is useful for data that come from the same population. CFCM-βf-c*, in its turn, computes the object weights by using Eq. (17) with γ = β[ii|jj]. Such a β[ii|jj] is estimated from data, before the collaboration phase, and is kept fixed for the whole collaboration process — analogously to what is done by CFC-βf, discussed in Section III-A.
Algorithm 4: Collaborative Fuzzy C-Means (CFCM)
1 Let c^t[ii] be the number of fuzzy clusters. Let us assume, for now, that c^t[ii] has been fixed a priori. Alternatively, it can be estimated from data, e.g., by Algorithm 2.
2 Select initial cluster prototypes v^t_1[ii], v^t_2[ii], ..., v^t_{c^t[ii]}[ii] from x_K, K = 1, 2, ..., N[ii];
3 for K = 1, 2, ..., N[ii] do w_K = 1; // initial weights for each object of D[ii].
4 L ← 0;
5 for jj = 1, 2, ..., ii − 1, ii + 1, ..., P do
6   for i = 1, 2, ..., c^{t−1}[jj] do
7     K ← K + 1; // an additional (virtual) object is created for each prototype from D[jj].
8     x_K = v^{t−1}_i[jj];
9     w_K = n^{t−1}_i[jj];
10    L ← L + 1; // number of virtual objects.
11  end
12 end
13 Compute the distances ‖x_K − v^t_i[ii]‖ between objects and prototypes (i = 1, 2, ..., c^t[ii]; K = 1, 2, ..., N[ii] + L);
14 Compute the elements of the fuzzy partition matrix (i = 1, 2, ..., c^t[ii]; K = 1, 2, ..., N[ii] + L):

  u^t_{iK}[ii] = \left[ \sum_{l=1}^{c^t[ii]} \left( \frac{\|x_K - v^t_i[ii]\|}{\|x_K - v^t_l[ii]\|} \right)^{2} \right]^{-1}    (18)

15 Compute the cluster prototypes:

  v^t_i[ii] = \frac{\sum_{K=1}^{N[ii]+L} u_{iK}^2 \, x_K \, w_K}{\sum_{K=1}^{N[ii]+L} u_{iK}^2 \, w_K}    (19)

16 Stop if convergence is attained or the number of iterations exceeds a given limit. Otherwise, go back to Step 13;
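A sketch of Algorithm 4 (ours, in Python/NumPy): the received prototypes are appended to the local data as weighted virtual objects, Eq. (18) is Eq. (4) on the augmented set, and Eq. (19) is the weighted prototype update. protos_in and weights_in are assumed to be the lists of prototype matrices and weight vectors received from the other sites.

  import numpy as np

  def cfcm(X_local, protos_in, weights_in, c, eps=1e-3, max_iter=100,
           rng=np.random.default_rng(0)):
      """CFCM (Algorithm 4): FCM over the local objects plus the prototypes
      received from the other sites, treated as virtual objects with the
      weights w_K set up in Steps 3-12."""
      # Steps 3-12: augment the data with the virtual objects and their weights
      X = np.vstack([X_local] + list(protos_in))               # N[ii] + L objects
      w = np.concatenate([np.ones(len(X_local))] + list(weights_in))
      V = X[rng.choice(len(X_local), size=c, replace=False)].copy()
      U = np.zeros((c, len(X)))
      for _ in range(max_iter):
          # Steps 13-14 / Eq. (18)
          D2 = np.maximum(((X[None] - V[:, None]) ** 2).sum(-1), 1e-12)
          U_new = 1.0 / (D2 * (1.0 / D2).sum(axis=0, keepdims=True))
          # Step 15 / Eq. (19): weighted prototype update
          Ww = (U_new ** 2) * w                                # u_iK^2 * w_K
          V = (Ww @ X) / Ww.sum(axis=1, keepdims=True)
          if np.abs(U_new - U).max() < eps:
              U = U_new
              break
          U = U_new
      return U, V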
From this observation, CFCM-βf-c* has the same characteristic already discussed for the algorithms CFC, CFC-βd, and CFC-βf, namely, it is suitable for data that come from different populations. Finally, note that our collaborative clustering setting allows the use of several relative indices of partitional adequacy (other than the simplified Fuzzy Silhouette used here), as long as weights can be incorporated into the objects. Also, more sophisticated clustering schemes, which make use, for instance, of cluster ensembles, consensus partitions, and the like, can also be incorporated into the discussed collaborative setting.

C. Parallel fuzzy clustering with a variable number of clusters

The Parallel Fuzzy C-Means (PFCM) algorithm [11] is capable of finding, for a fixed number of clusters, c, a data partition that is precisely the same as would have been obtained if all the data (originally distributed across different data sites) were gathered at a single data site. This nice property suggests that this algorithm is particularly useful for application scenarios in which one assumes that the data lying in different data sites come from the same population. In this case, it is desirable that the local clusterings represent, as accurately as possible, the population as a whole, which would be better captured if one could put all the available data in a single data site, so that as big a sample as possible would be available for the modeling process. Therefore, PFCM [11] is a natural choice for this kind of application scenario. However, it requires that the number of clusters, c, be chosen by the user. In order to get around this limitation, one can simply replace FCM with PFCM in Step 7 of Algorithm 2 and employ a distributed version of the simplified Fuzzy Silhouette (FS) — Eq. (8) — as the validity index computed in Step 8 of the algorithm. From the observation that the FS requires only distances between objects and prototypes (which are shared by all data sites), it is straightforward to show that the (partial) silhouettes computed separately for each site — by means of (8) — can be aggregated in a single site so that the global silhouette is obtained exactly (without any approximation)³. The extension of the PFCM algorithm that is capable of automatically estimating the number of clusters is here named PFCM-c*.

³Note that there are other relative validity criteria for which this property also holds — e.g., [59].
several relative indexes of partitional adequacy (different data come from the same population (Scenario 2), PFCM-c*
from the simplified Fuzzy Silhouette here used) as long as might be preferred to CFCM-c* when the number of clusters
weights can be incorporated into the objects. Also, more is unknown for it is more in conformity with the assumption
sophisticated clustering schemes, which make use, for in- about the population distribution — see experiments in
stance, of cluster ensembles, consensus partitions, and alike,
3 Note that there are other relative validity criteria for which this property
can also be incorporated into the discussed collaborative
setting. also holds — e.g., [59].
[Figure 2: a decision tree whose root node, "Data distribution assumption," branches into "data sites come from different populations" (Scenario 1) and "data sites come from the same population" (Scenario 2); each branch then splits on whether the number of clusters (c) is known.]

Fig. 2. Decision-tree-like structure for clustering algorithm selection. Algorithms: [1] CFC [10] — Section II-E; [2] CFC-βd — Section III-A; [3] CFC-βf — Section III-A; [4] CFCM-c* — Section III-B; [5] CFCM-βf-c* — Section III-B; [6] PFCM [11]; [7] PFCM-c* — Section III-C.

TABLE I
TIME, SPACE, AND COMMUNICATION COMPLEXITY OF THE CLUSTERING ALGORITHMS.
Assuming that the data are generated from different populations (Scenario 1) and that the number of clusters is known a priori, three algorithms can be used. However, we note that CFC requires that the user set a parameter, β (the interaction level between data sites), which may be a difficult task in practice. In turn, the algorithms CFC-βd and CFC-βf automatically estimate the corresponding parameter directly from the data, thus being more suitable. Finally, CFC-βf is more computationally efficient than CFC-βd — the results of the computational complexity analysis are summarized in Table I.

IV. ILLUSTRATIVE EXAMPLES

Although several measures for the quantitative assessment of clustering algorithms do exist, including a number of validity criteria — see [54] — the evaluation of clustering algorithms is, in general, tricky. This is particularly true for collaborative (fuzzy) clustering algorithms. Actually, to the best of our knowledge, a sound methodology for performing such an evaluation does not yet exist. In brief, we would like to be able to answer, in a sound way, the following question: what should be the expected result of the collaborative clustering process for a given number of data sites? Trying to answer this question is particularly difficult when the data in different sites are assumed to come from different populations, as will hopefully become clear from the analysis of our illustrative examples. Fortunately, however, it is indeed possible to choose some illustrative examples that highlight performance differences among the algorithms under study w.r.t. the assumptions about the data (as summarized in Section III-D). In this sense, although such analyses are inherently subjective (as is the very definition of what a cluster is), they are useful and allow us to derive a number of conclusions that can be helpful for the reader interested in applying collaborative fuzzy clustering algorithms in practice. To that end, we will focus on the application scenarios depicted in Figure 2, which are characterized by the data distribution assumption.
Fig. 3. Clusters obtained by FCM-c* — Data set Synth1 — (a) D1, (b) D2, and (c) D1 ∪ D2.
Two data sets are used in our examples:

Synth1: This data set has two bi-dimensional data sites (D1 and D2). Each data site has 400 objects clustered into groups according to a mixture of Gaussians whose mean vectors are D1: v1 = [10.0, 4.0], v2 = [4.0, 10.0] and D2: v1 = [10.0, 10.0], v2 = [4.0, 4.0];

Synth2: This data set consists of two bi-dimensional data sites (D1 and D2). Each data site has 600 objects, clustered into three groups that follow Gaussian distributions whose mean vectors are D1: v1 = [10.0, 4.0], v2 = [4.0, 4.5], v3 = [8.5, 10.0] and D2: v1 = [10.0, 10.0], v2 = [6.0, 4.0], v3 = [6.0, 10.0].

The same covariance matrix, the 2 × 2 identity \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, has been used to generate both data sets.
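Under the stated means and unit covariance, the two data sets can be reproduced, for instance, as follows (our sketch; the per-cluster sizes are split evenly and the random seed is an arbitrary choice, since neither is specified above):

  import numpy as np

  def make_site(means, n_per_cluster, rng):
      """Objects of one data site: a mixture of unit-covariance Gaussians."""
      return np.vstack([rng.multivariate_normal(m, np.eye(2), n_per_cluster)
                        for m in means])

  rng = np.random.default_rng(42)
  # Synth1: 400 objects per site, two Gaussians each
  synth1_D1 = make_site([[10.0, 4.0], [4.0, 10.0]], 200, rng)
  synth1_D2 = make_site([[10.0, 10.0], [4.0, 4.0]], 200, rng)
  # Synth2: 600 objects per site, three Gaussians each
  synth2_D1 = make_site([[10.0, 4.0], [4.0, 4.5], [8.5, 10.0]], 200, rng)
  synth2_D2 = make_site([[10.0, 10.0], [6.0, 4.0], [6.0, 10.0]], 200, rng)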
from data, can be used. This algorithm was run for the same
data set, using cmax = 6, np = 100, and simplified FS as its
A. Scenario 1 – Data from different populations

Let us first consider data set Synth1. Before applying collaborative fuzzy clustering algorithms to this data set, let us explore the clustering structure present in its constituent data sites. Running FCM-c* (Algorithm 2 in Section II-D based on the simplified Fuzzy Silhouette), with cmax = 6 and np = 100, for each data site (D1 and D2) results in the expected data partitions (from visual inspection) depicted in Figure 3 (a) and (b). Similarly, running FCM-c* on a virtual data set formed by D1 ∪ D2 leads to the (expected) data partitions shown in Figure 3 (c).
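Conceptually, FCM-c* amounts to running FCM for every candidate number of clusters and keeping the partition with the best validity score. The sketch below illustrates this loop under two stated substitutions: it scores candidates with the Xie–Beni index [59] instead of the simplified Fuzzy Silhouette used here, and the n_starts restarts loosely play the role of the np parameter.

    import numpy as np

    def fcm(X, c, m=2.0, n_iter=100, seed=0):
        # Standard FCM: alternate the membership and prototype updates.
        rng = np.random.default_rng(seed)
        V = X[rng.choice(len(X), size=c, replace=False)]   # initial prototypes
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) + 1e-12
            U = d2 ** (-1.0 / (m - 1))        # u_ik proportional to d_ik^(-2/(m-1))
            U /= U.sum(axis=1, keepdims=True) # each object's memberships sum to 1
            Um = U ** m
            V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        return U, V

    def xie_beni(X, U, V, m=2.0):
        # Xie-Beni validity index [59]: compactness / separation; lower is better.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)
        compactness = ((U ** m) * d2).sum()
        separation = min(((V[i] - V[j]) ** 2).sum()
                         for i in range(len(V)) for j in range(i + 1, len(V)))
        return compactness / (len(X) * separation)

    def fcm_c_star(X, c_max=6, n_starts=10):
        # Try every candidate c with several restarts; keep the best partition.
        best = None
        for c in range(2, c_max + 1):
            for s in range(n_starts):
                U, V = fcm(X, c, seed=s)
                score = xie_beni(X, U, V)
                if best is None or score < best[0]:
                    best = (score, c, U, V)
        return best   # (score, estimated c, memberships, prototypes)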
At this point, we would like to anticipate that this result is also the expected one for algorithms that assume that the data come from the same population (as will be discussed in Section IV-B), which is not the case here. So, let us turn our discussion to the application scenario in which one assumes that the data in different sites come from different populations. According to Figure 2, the algorithms suggested for this kind of application scenario are CFC, CFC-βd, CFC-βf, and CFCM-βf-c*, of which only the last one is capable of automatically estimating the number of clusters. For the remaining algorithms, we set c = 2 (the correct number of clusters).

Figure 4 illustrates the results obtained by CFC-βf after four collaboration stages. Note that the depicted clusters are practically identical to those illustrated in Figure 3 (a) and (b), except for very small differences w.r.t. the position of the prototypes. This observation suggests that significant collaborative activities did not happen (site D1 was not affected by the very different clustering structure present in D2, and vice-versa), which is the desired result under the assumption that the data come from different populations. Very similar results were obtained for CFC (with β = 0.10, optimized from data after running the algorithm for several β values) and CFC-βd. As previously discussed, all these algorithms (CFC, CFC-βd, and CFC-βf) require that the number of clusters be given in advance. When the number of clusters is unknown, CFCM-βf-c*, which is capable of estimating it from data, can be used. This algorithm was run for the same data set, using cmax = 6, np = 100, and the simplified FS as its validity index. The clusters obtained after two collaboration stages are very similar to those shown in Figure 3 (a) and (b). This algorithm also automatically estimates (from data, before collaboration starts) the interaction levels between every pair of data sites. Note that the interaction levels are, by the definition in Eq. (15), asymmetric. To avoid this potential difficulty, we have taken the average between every pair of data sites. For this particular data set we have β[1|2] = β[2|1] = (β[1|2] + β[2|1])/2 = 0.0987. This low interaction level reasonably quantifies the differences between the clustering structures of these sites.
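Operationally, the symmetrization just described is a one-line matrix operation. In the sketch below the raw entries of B are hypothetical, chosen only so that their average matches the reported value of 0.0987:

    import numpy as np

    # B[i, j] holds the estimated interaction level beta[i|j]; by Eq. (15)
    # it is asymmetric, so each off-diagonal pair is replaced by its average.
    B = np.array([[0.0,    0.1037],      # hypothetical raw values
                  [0.0937, 0.0   ]])
    B_sym = (B + B.T) / 2.0              # beta[1|2] = beta[2|1] = 0.0987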
In the second example for Scenario 1 we use the data set Synth2. Again, before running collaborative clustering algorithms, we first run FCM-c* (cmax = 6 and np = 100) at each data site (D1 and D2). Figure 5 (a) and (b) show the obtained clusters. Note that these data sites have a number of points (objects) in some shared regions of the two-dimensional space.
Fig. 5. Clusters obtained by FCM-c* — Data set Synth2 — (a) D1, (b) D2, and (c) D1 ∪ D2.
More precisely, these data sites were generated by selecting some points from the data set in Figure 5 (c), which is indeed the union of the sets of objects of D1 and D2. From this point of view, it is reasonable to say that, in some sense, they have some degree of similarity. Figure 5 (c) also shows the clusters that should be obtained by collaborative clustering algorithms that assume that data come from the same population. However, when considering Scenario 1, where data are assumed to come from different populations, the situation is different, namely the collaborative clustering algorithms should be able to find (and make good use of) common cluster structures present in the data sites. This is essentially what collaboration means in those scenarios. In other words, clusters that are (partially) similar should be refined by the collaborative process, whereas clusters completely different should not be affected. Having this in mind, let us analyze the performance of the studied algorithms.

Algorithms CFC (β = 0.35, optimized from data), CFC-βd, and CFC-βf, all with c = 3, have not shown significant changes in their prototypes beyond eleven collaboration stages (i.e., the prototypes converged). Figure 6 shows the clusters obtained by CFC, where one can see that the prototypes are very similar to those illustrated in Figure 5 (a) and (b). Thus, this algorithm has just made slight adjustments to the prototypes as a result of the collaborative process. CFC-βd has shown results very similar to those obtained by CFC. The algorithm CFC-βf, in its turn, has exhibited different results, which are shown in Figure 7. This figure shows a cluster of objects represented by triangles in D1, whose prototype of coordinates (8.5, 10.1) has influenced the clusters in data site D2 in such a way that two clusters of D2 were merged due to the information communicated by the prototypes of D1 — see Figure 7 (b) and observe Figure 5 (c) as well.
A possible explanation behind this behavior is that objects with attribute2 > 7 and attribute1 around 8.5 in Figure 7 (b) could be considered as underrepresented, e.g., due to some idiosyncrasy of the sampling process. On the other hand, objects with similar characteristics are found around the coordinates (8.5, 10.1) in Figure 7 (a). The prototype representing these objects has influenced the clustering results in D2, making the algorithm find a single cluster formed by triangles there. One may argue that, by doing this, CFC-βf is working as if the data come from the same population. This analysis is correct for the clusters represented by triangles in Figure 7. However, it only explains part of the results found by CFC-βf in these data sites, i.e., we still need to consider why the algorithm split the other cluster in D2. Our plausible explanation comes from two key observations: (i) the algorithm is limited to inducing a fixed number of clusters (three in this case); and (ii) objects that form the right-hand-side cluster (squares) in D2 can really be considered as belonging to another cluster — in fact, the cluster formed by squares shown in Figure 5 (c). Please note, however, that the prototype of the cluster formed by squares in Figure 7 (b) is very different from the respective prototype in Figure 5 (c), which should ideally be obtained by an algorithm that (unlike CFC-βf) assumes that the data were generated by the same population (as addressed in the next section).

This discussion can be further elaborated by taking into account the results obtained by CFCM-βf-c* (cmax = 6, np = 100). Figure 8 illustrates the obtained prototypes (after four collaboration stages, β[1|2] = β[2|1] = 0.3509). Following the same lines of reasoning as above (for CFC-βf), two clusters in D2 were merged — compare Figure 8 to Figures 5 (a), (b), and (c). In addition, the cluster represented by circles in Figure 8 (a), whose prototype has coordinates (9.8, 3.9), has not shown enough sample evidence (weight in Algorithm 4) to be represented in Figure 8 (b). Considering that the underlying assumption is that the data come from different populations, this is an interesting result, in that very different clusters present in other data sites do not have a strong influence on a particular data site. Note, however, that the prototype coordinates of the cluster formed by triangles in Figure 8 (b) have been slightly adjusted by the collaborative process — taking into account the prototypes (mainly from the clusters formed by squares and circles) in Figure 8 (a).

B. Scenario 2 – Data from the same population

Given the decision-tree-like structure shown in Figure 2, we illustrate the performance of PFCM, PFCM-c*, and CFCM-c*. The results obtained by PFCM (with the assumed correct number of clusters) are the same as those achieved by PFCM-c*. For this reason, we only analyze the performance of the latter, which does not require the number of clusters to be specified in advance.
Figures 9 and 10 show the prototypes obtained by PFCM-c* on data sets Synth1 and Synth2. As expected, these prototypes are the same as the ones that would be obtained if the data were gathered and stored in a single centralized repository — see Figures 3 (c) and 5 (c) — thus appropriately representing data that come from the same population.
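The reason a PFCM-style algorithm can reproduce the centralized prototypes is that the FCM prototype update is additive over objects, so each site can compute memberships locally and ship only two small partial sums per iteration. The following sketch illustrates one such prototype update; it is our illustration of this general idea, not necessarily the exact message layout of the algorithm in [11]:

    import numpy as np

    def local_sums(X, V, m=2.0):
        # One site's contribution: memberships are computed locally against
        # the current global prototypes; only these two arrays leave the site.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1) + 1e-12
        U = d2 ** (-1.0 / (m - 1))
        U /= U.sum(axis=1, keepdims=True)
        Um = U ** m
        return Um.T @ X, Um.sum(axis=0)   # (c, p) numerator, (c,) denominator

    def global_prototype_update(sites, V, m=2.0):
        # Because the FCM prototype update is a sum over objects, merging
        # the per-site sums reproduces exactly the centralized update.
        parts = [local_sums(X, V, m) for X in sites]
        num = sum(p[0] for p in parts)
        den = sum(p[1] for p in parts)
        return num / den[:, None]

Iterating global_prototype_update until the prototypes stabilize yields, up to numerical noise and initialization, the same result as running FCM on the pooled data.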
Fig. 9. Clusters obtained by PFCM-c* — Data set Synth1 — (a) D1 and (b) D2.
Fig. 10. Clusters obtained by PFCM-c* — Data set Synth2 — (a) D1 and (b) D2.
Fig. 11. Clusters obtained by CFCM-c* — Data set Synth2 — (a) D1 and (b) D2.
Fig. 12. Data set Synth1 — (a) and (b) clusters obtained by CFCM-c*, respectively, in D1 and D2 (no cluster structure) and (c) clusters obtained by CFCM-βf-c* in D1.
TABLE IV. Prototypes found by CFCM-c* — Synth1-8 data set.
TABLE V. Mean vectors — Synth2-4 data set.
the prototypes that are very similar to those reported in Table IV.

Let us now consider the four-dimensional data set Synth2-4, obtained from the original two-dimensional data by adding two new features. Table V summarizes the mean vectors for this data set, formed by two data sites (D1 and D2), each of which has 600 objects clustered into three groups.

Table VI shows the prototypes obtained after running FCM-c* (cmax = 6 and np = 100) in the data set formed from the union of the two data sites (D1 ∪ D2). One can observe that two clusters were found, thus suggesting that the three clusters from each data site significantly overlap. The algorithm CFCM-βf-c*, on the other hand, found three and two clusters in D1 and D2, respectively — see Table VII. In this case, the prototypes found in D1 are very similar to the original mean vectors (see Table V). In the case of data site D2, which was more affected by the collaboration process, CFCM-βf-c* formed two clusters, which are very similar to those reported in Table VI. It is noticeable that the first two coordinates (variables) of the prototypes in Table VII are very similar to those of the prototypes in Figure 8. Finally, CFC and CFC-βf also produced results similar to those contained in Figures 6 and 7 (this closeness is visible for the first two coordinates of the prototypes).

PFCM-c* constructed two clusters in each data site of Synth2-4. The respective prototypes are equal to those shown in Table VI. This is again quite an expected result for the algorithms designed for dealing with data that come from the same population (Scenario 2). CFCM-c* also obtained prototypes very similar to those in Table VI.

2) Breast Cancer Data: The Wisconsin Breast Cancer Data Set [60] comprises 683 objects described by nine features. It was obtained from the University of Wisconsin Hospitals, where samples arrive periodically as doctors report clinical cases. The data set therefore reflects this chronological grouping of the data. Each object has one of two possible classes (benign or malignant). There are 444 objects from the Benign class and 239 from the Malignant class. Running FCM-c* in this data set (without the class labels) results in prototypes very similar to the centroids of each class. Thus, the classes can be appropriately modeled by means of two clusters.
TABLE VI. Prototypes obtained by FCM-c* — D1 ∪ D2 from Synth2-4.
Data site D1 ∪ D2:
  c1: 8.16  10.04   9.99  11.03
  c2: 6.49   4.16   3.63   3.97

TABLE VII. Prototypes obtained by CFCM-βf-c* for each data site of Synth2-4.
Data site D1:
  c1: 9.81   3.97   4.95   5.99
  c2: 4.28   4.48   3.07   2.33
  c3: 8.34  10.06  10.02  11.04
Data site D2:
  c1: 8.02  10.03   9.98  11.04
  c2: 6.29   4.04   3.30   4.06

TABLE VIII. Running times (in seconds) for Breast Cancer data (MATLAB codes — 2.1 GHz Intel Core 2 Duo, 4 GB RAM).
The objects have been distributed across two data sites (D1 and D2) by means of a stratified random sampling procedure.
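A stratified split of this kind can be sketched as follows; the exact sampling recipe used in the experiments is not detailed in the text, so the 50/50 fraction and the seed below are our assumptions:

    import numpy as np

    def stratified_split(X, y, frac=0.5, seed=0):
        # Split the objects into two sites while preserving, per class,
        # the proportion given by frac.
        rng = np.random.default_rng(seed)
        idx1, idx2 = [], []
        for label in np.unique(y):
            idx = rng.permutation(np.where(y == label)[0])
            cut = int(frac * len(idx))
            idx1.append(idx[:cut])
            idx2.append(idx[cut:])
        idx1, idx2 = np.concatenate(idx1), np.concatenate(idx2)
        return (X[idx1], y[idx1]), (X[idx2], y[idx2])

For Breast Cancer, this keeps each site's benign/malignant ratio close to the global 444/239 proportion.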
The algorithms that are appropriate for Scenario 2 (data come from the same population) have found two clusters for each data site. The Euclidean distances between the prototypes obtained by CFCM-c* and PFCM-c* (cmax = 6 and np = 20), with respect to the corresponding prototypes found by FCM-c* (when applied to the complete, centralized data set), are 0.2 and 0.0, respectively. These small differences between prototypes are indeed an expected result, i.e., the obtained prototypes are essentially the same that would have been obtained if the data were gathered and stored in a centralized repository.
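Note that such prototype-to-prototype distances presuppose a pairing between the two sets of prototypes. Since the text does not spell out the matching rule, the sketch below uses an optimal one-to-one assignment as one plausible choice:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def prototype_distance(V_a, V_b):
        # Pair the prototypes one-to-one so that the total Euclidean
        # distance is minimal, then return that total.
        cost = np.linalg.norm(V_a[:, None, :] - V_b[None, :, :], axis=-1)
        rows, cols = linear_sum_assignment(cost)
        return cost[rows, cols].sum()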
In the case one assumes that the data come from different populations, quantitative analyses are harder to perform. In brief, all the algorithms suitable for Scenario 1 have also found two clusters in each data site. The Euclidean distances between the prototypes obtained by CFC, CFC-βd, and CFC-βf (c = 2) w.r.t. the respective prototypes found by FCM-c* (when applied to the complete, centralized data set) are all equal to 2.1. Analogously, the respective value for CFCM-βf-c* (cmax = 6 and np = 20) is 2.2. If compared to the results obtained for Scenario 2, these are expected results. In particular, algorithms for Scenario 1 adjust prototypes by taking into account that data come from different populations. From this point of view, the degree of adjustment tends to be smaller than that in Scenario 2, which can explain the greater distances observed in relation to the prototypes found by FCM-c* in the centralized data. Relative comparisons among algorithms for Scenario 1 are especially difficult to perform because, in the end, we should be able to know (a priori) what the expected results of the collaborative clustering process are (e.g., the resulting prototypes).

Finally, we also use the Breast Cancer data set to illustrate the magnitude of the constant terms neglected by time complexity analyses. The running times (in seconds) of the collaborative phase of each algorithm are summarized in Table VIII, which also shows the number of collaboration stages and their respective average times.
V. CONCLUSIONS

We addressed and discussed several existing and new algorithms for collaborative fuzzy clustering. We studied and proposed some improvements to the CFC algorithm [10], whose performance depends on two user-defined parameters (β and c). In order to make it more autonomous with regard to these two parameters, we introduced two data-driven approaches that result in the algorithms called CFC-βd and CFC-βf. Concerning the latter parameter, we developed two new algorithms (named CFCM-c* and CFCM-βf-c*, respectively) that are capable of estimating the number of clusters from data. In addition, the intrinsic nature of the collaborative fuzzy clustering problem has led us to consider the Parallel Fuzzy C-Means (PFCM) algorithm [11] as a potentially promising distributed clustering vehicle for some application scenarios, especially when one assumes that the data located at different data sites come from the same population. However, PFCM [11] also requires that the number of clusters be provided by the user. In order to circumvent this limitation, we have developed an extension of this algorithm that, while keeping its sound properties, allows the automatic estimation of the number of clusters. Finally, in practice, depending on the underlying assumptions about the data (i.e., data that come from the same or from different populations), as well as on the difficulty of setting some user-defined parameters (such as the number of clusters and/or the interaction level between data sites), some algorithms may be preferred over others. From this viewpoint, we suggested a decision-tree-like structure for the user interested in choosing among the available algorithms. By considering such a decision-tree-like structure, illustrative numeric examples that highlight performance differences among the algorithms under study versus the assumptions made about the data were discussed, allowing us to derive a number of conclusions that can be useful for the practice of collaborative fuzzy clustering algorithms. The proposed decision-tree-like structure also shows that, in two cases, more than one algorithm can be used. In particular, when it is assumed that the data sites come from different populations and the number of clusters is known a priori, the CFC-βf algorithm will usually be preferred, because it automatically estimates the interaction levels directly from data and is more computationally efficient than CFC-βd. In the case one assumes that the data come from the same population and the number of clusters is not known in advance, the PFCM-c* algorithm will usually be preferred because it keeps track of an implicit correspondence problem between clusters and their respective prototypes.

ACKNOWLEDGMENT

The financial support from the Research Agencies CNPq and FAPESP is greatly appreciated.
REFERENCES

[1] L. Kaufman and P. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, 1990.
[2] B. S. Everitt, S. Landau, and M. Leese, Cluster Analysis, 4th ed. Wiley, 2001.
[3] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. Kluwer Academic, 1981.
[4] J. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters," Journal of Cybernetics, vol. 3, pp. 32–57, 1973.
[5] J. B. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 1967, pp. 281–297.
[6] A. Jain, "Data clustering: 50 years beyond k-means," in Machine Learning and Knowledge Discovery in Databases, ser. Lecture Notes in Computer Science, vol. 5211, 2008, pp. 3–4.
[7] F. Hoppner, F. Klawonn, R. Kruse, and T. Runkler, Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition, 1999.
[8] R. Babuska, Fuzzy Modeling for Control. Kluwer Academic, 1998.
[9] W. Pedrycz, "Collaborative fuzzy clustering," Pattern Recognition Letters, vol. 23, no. 14, pp. 1675–1686, 2002.
[10] W. Pedrycz and P. Rai, "Collaborative clustering with the use of fuzzy c-means and its quantification," Fuzzy Sets and Systems, vol. 159, no. 18, pp. 2399–2427, 2008.
[11] S. Rahimi, M. Zargham, A. Thakre, and D. Chhillar, "A parallel fuzzy c-mean algorithm for image segmentation," in IEEE Annual Meeting of the North American Fuzzy Information Processing Society, 2004, pp. 234–237.
[12] H. Kargupta and P. Chan, Advances in Distributed and Parallel Knowledge Discovery. MIT Press, 2000.
[13] N. Samatova, G. Ostrouchov, A. Geist, and A. Melechko, "RACHET: An efficient cover-based merging of clustering hierarchies from distributed datasets," Distributed and Parallel Databases, vol. 11, no. 2, pp. 157–180, 2002.
[14] M. Klusch, S. Lodi, and G. Moro, "Agent-based distributed data mining: The KDEC scheme," Lecture Notes in Artificial Intelligence, vol. 2586, pp. 104–122, 2003.
[15] J. Silva, C. Giannella, R. Bhargava, H. Kargupta, and M. Klusch, "Distributed data mining and agents," Engineering Applications of Artificial Intelligence, vol. 18, pp. 791–807, 2005.
[16] S. Merugu and J. Ghosh, "A privacy-sensitive approach to distributed clustering," Pattern Recognition Letters, vol. 26, no. 4, pp. 399–410, 2005.
[17] J. C. da Silva and M. Klusch, "Inference in distributed data clustering," Engineering Applications of Artificial Intelligence, vol. 19, no. 4, pp. 363–369, 2006.
[18] D. E. Gustafson and W. C. Kessel, "Fuzzy clustering with a fuzzy covariance matrix," in Proc. IEEE CDC, 1979, pp. 761–776.
[19] J. Bezdek and J. Dunn, "Optimal fuzzy partitions: A heuristic for estimating the parameters in a mixture of normal distributions," IEEE Transactions on Computers, vol. C-24, no. 8, pp. 835–838, 1975.
[20] U. Kaymak and M. Setnes, "Fuzzy clustering with volume prototypes and adaptive cluster merging," IEEE Transactions on Fuzzy Systems, vol. 10, no. 6, pp. 705–712, 2002.
[21] I. Gath and A. Geva, "Unsupervised optimal fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 773–780, 1989.
[22] R. Dave and R. Krishnapuram, "Robust clustering methods: a unified view," IEEE Transactions on Fuzzy Systems, vol. 5, no. 2, pp. 270–293, 1997.
[23] P. Kersten, "Implementation issues in the fuzzy c-medians clustering algorithm," in Proceedings of the Sixth IEEE International Conference on Fuzzy Systems, vol. 2, 1997, pp. 957–962.
[24] R. Hathaway, J. Bezdek, and Y. Hu, "Generalized fuzzy c-means clustering strategies using Lp norm distances," IEEE Transactions on Fuzzy Systems, vol. 8, no. 5, pp. 576–582, 2000.
[25] R. Krishnapuram and J. Keller, "A possibilistic approach to clustering," IEEE Transactions on Fuzzy Systems, vol. 1, no. 2, pp. 98–110, 1993.
[26] ——, "The possibilistic c-means algorithm: insights and recommendations," IEEE Transactions on Fuzzy Systems, vol. 4, no. 3, pp. 385–393, 1996.
[27] N. Pal, K. Pal, and J. Bezdek, "A mixed c-means clustering model," in Proceedings of the Sixth IEEE International Conference on Fuzzy Systems, vol. 1, 1997, pp. 11–21.
[28] N. Pal, K. Pal, J. Keller, and J. Bezdek, "A possibilistic fuzzy c-means clustering algorithm," IEEE Transactions on Fuzzy Systems, vol. 13, no. 4, pp. 517–530, 2005.
[29] M. Barni, V. Cappellini, and A. Mecocci, "Comments on 'A possibilistic approach to clustering'," IEEE Transactions on Fuzzy Systems, vol. 4, no. 3, pp. 393–396, 1996.
[30] H. Timm, C. Borgelt, C. Döring, and R. Kruse, "An extension to possibilistic fuzzy cluster analysis," Fuzzy Sets and Systems, vol. 147, no. 1, pp. 3–16, 2004.
[31] M.-S. Yang and C.-Y. Lai, "A robust automatic merging possibilistic clustering method," IEEE Transactions on Fuzzy Systems, vol. 19, no. 1, pp. 26–41, 2011.
[32] Y. El-Sonbaty and M. Ismail, "Fuzzy clustering for symbolic data," IEEE Transactions on Fuzzy Systems, vol. 6, no. 2, pp. 195–204, 1998.
[33] R. Krishnapuram, A. Joshi, and L. Yi, "A fuzzy relative of the k-medoids algorithm with application to web document and snippet clustering," in IEEE Fuzzy Systems Conference Proceedings, vol. 3, 1999, pp. 1281–1286.
[34] R. J. Hathaway, J. W. Davenport, and J. C. Bezdek, "Relational duals of the c-means clustering algorithms," Pattern Recognition, vol. 22, no. 2, pp. 205–212, 1989.
[35] R. J. Hathaway and J. C. Bezdek, "NERF c-means: non-Euclidean relational fuzzy clustering," Pattern Recognition, vol. 27, no. 3, pp. 429–437, 1994.
[36] R. Hathaway and Y. Hu, "Density-weighted fuzzy c-means clustering," IEEE Transactions on Fuzzy Systems, vol. 17, no. 1, pp. 243–252, 2009.
[37] R. Hathaway and J. Bezdek, "Fuzzy c-means clustering of incomplete data," IEEE Transactions on Systems, Man, and Cybernetics, vol. 31, no. 5, pp. 735–744, 2001.
[38] R. L. Cannon, J. V. Dave, and J. C. Bezdek, "Efficient implementation of the fuzzy c-means clustering algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 2, pp. 248–255, 1986.
[39] F. Hoppner, "Speeding up fuzzy c-means: using a hierarchical data organisation to control the precision of membership calculation," Fuzzy Sets and Systems, vol. 128, no. 3, pp. 365–376, 2002.
[40] S. Eschrich, J. Ke, L. Hall, and D. Goldgof, "Fast accurate fuzzy clustering through data reduction," IEEE Transactions on Fuzzy Systems, vol. 11, no. 2, pp. 262–270, 2003.
[41] M.-C. Hung and D.-L. Yang, "An efficient fuzzy c-means clustering algorithm," in Proceedings of the IEEE International Conference on Data Mining. IEEE Computer Society, 2001, pp. 225–232.
[42] T. W. Cheng, D. B. Goldgof, and L. O. Hall, "Fast fuzzy clustering," Fuzzy Sets and Systems, vol. 93, no. 1, pp. 49–56, 1998.
[43] N. Pal and J. Bezdek, "Complexity reduction for large image processing," IEEE Transactions on Systems, Man, and Cybernetics, vol. 32, no. 5, pp. 598–611, 2002.
[44] M. S. Kamel and S. Z. Selim, "New algorithms for solving the fuzzy clustering problem," Pattern Recognition, vol. 27, no. 3, pp. 421–428, 1994.
[45] V. Loia, W. Pedrycz, and S. Senatore, "P-FCM: a proximity-based fuzzy clustering for user-centered web applications," International Journal of Approximate Reasoning, vol. 34, no. 2-3, pp. 121–144, 2003.
[46] W. Pedrycz, Knowledge-Based Clustering: From Data to Information Granules. Wiley Interscience, 2005.
[47] ——, "Collaborative and knowledge-based fuzzy clustering," International Journal of Innovative Computing, Information and Control, vol. 3, no. 1, pp. 1–12, 2007.
[48] D. Dumitrescu, B. Lazzerini, and L. C. Jain, Fuzzy Sets and Their Application to Clustering and Training. CRC, 2000.
[49] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "On clustering validation techniques," Journal of Intelligent Information Systems, vol. 17, pp. 107–145, 2001.
[50] I. Sledge, J. Bezdek, T. Havens, and J. Keller, "Relational generalizations of cluster validity indices," IEEE Transactions on Fuzzy Systems, vol. 18, no. 4, pp. 771–786, 2010.
[51] R. Campello and E. Hruschka, "A fuzzy extension of the silhouette width criterion for cluster analysis," Fuzzy Sets and Systems, vol. 157, no. 21, pp. 2858–2875, 2006.
[52] E. Hruschka, L. de Castro, and R. Campello, "Evolutionary algorithms for clustering gene-expression data," in Fourth IEEE International Conference on Data Mining, 2004, pp. 403–406.
[53] E. R. Hruschka, R. J. Campello, and L. N. de Castro, "Evolving clusters in gene-expression data," Information Sciences, vol. 176, no. 13, pp. 1898–1927, 2006.
[54] L. Vendramin, R. J. G. B. Campello, and E. R. Hruschka, "Relative clustering validity criteria: A comparative overview," Statistical Analysis and Data Mining, vol. 3, no. 4, pp. 209–235, 2010.
[55] J. Bezdek and N. Pal, "Some new indexes of cluster validity," IEEE Transactions on Systems, Man, and Cybernetics, vol. 28, no. 3, pp. 301–315, 1998.
[56] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Prentice-Hall, 1988.
[57] R. Campello, E. Hruschka, and V. Alves, "On the efficiency of evolutionary fuzzy clustering," Journal of Heuristics, vol. 15, pp. 43–75, 2009.
[58] W. Pedrycz and F. Gomide, Fuzzy Systems Engineering: Toward Human-Centric Computing. Wiley, 2007.
[59] X. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.
[60] A. Asuncion and D. Newman, "UCI machine learning repository," 2007.

Luiz F. S. Coletta received his B.Sc. degree in Computer Information Systems and M.Sc. degree in Computer Science from the University of Sao Paulo (USP) in 2009 and 2011, respectively. Currently, he is pursuing his Ph.D. studies at the University of Sao Paulo, Sao Carlos, Brazil. His main research interests involve Data Mining, Machine Learning, and Bio-Inspired Computing.

Lucas Vendramin received his B.Sc. degree (with high honors) in Computer Science from the University of Sao Paulo (USP), Sao Carlos, Brazil, in 2011. He is currently working toward the M.Sc. degree in Computer Science at the University of Sao Paulo. His current research interests include Machine Learning and Data Mining.

Eduardo Raul Hruschka received his B.Sc. degree in Civil Engineering from the Federal University of Parana, Brazil, in 1995, and his M.Sc. and Ph.D. degrees in Computational Systems from the Federal University of Rio de Janeiro in 1998 and 2001, respectively. He is with the Department of Computer Sciences of the University of Sao Paulo (USP) at Sao Carlos, Brazil. From 2010 to 2012 he was a visiting scholar at the Intelligent Data Exploration and Analysis Laboratory (IDEAL) at the University of Texas at Austin, USA. He has authored or coauthored more than 70 research publications in peer-reviewed reputed journals, book chapters, and conference proceedings. Dr. Hruschka has been a reviewer for several journals such as IEEE TFS, IEEE TSMC, IEEE TKDE, IEEE TEC, IEEE TNN, Information Sciences, Neurocomputing, Journal of Heuristics, Pattern Recognition Letters, Applied Soft Computing, and Computational Statistics & Data Analysis.

Ricardo J. G. B. Campello received the B.Sc. degree in Electronics Engineering from the State University of Sao Paulo (Unesp), Ilha Solteira/SP, Brazil, in 1994, and the M.Sc. and Ph.D. degrees in Electrical Engineering from the School of Electrical and Computer Engineering of the State University of Campinas (Unicamp), Campinas/SP, Brazil, in 1997 and 2002, respectively. In 2002 he was a visiting scholar at the Laboratoire d'Informatique, Signaux et Systemes de Sophia Antipolis, Universite de Nice - Sophia Antipolis (UNSA), France. Since 2007 he has been with the Department of Computer Sciences of the University of Sao Paulo (USP) at Sao Carlos/SP, Brazil, currently as an Associate Professor. He is currently on a sabbatical leave at the Department of Computing Sciences of the University of Alberta, Edmonton/AB, Canada. He has been a merit scholar of the Brazilian National Research Council since 2005. His current research interests fall primarily into the areas of Soft Computing, Machine Learning, and Data Mining.