Survey of Clustering Algorithms
Rui Xu, Student Member, IEEE, and Donald Wunsch II, Fellow, IEEE
Abstract—Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure and cluster validation, are also discussed.

Index Terms—Adaptive resonance theory (ART), clustering, clustering algorithm, cluster validation, neural networks, proximity, self-organizing feature map (SOFM).

I. INTRODUCTION

… [60], [167]. When the inducer reaches convergence or terminates, an induced classifier is generated [167].

In unsupervised classification, called clustering or exploratory data analysis, no labeled data are available [88], [150]. The goal of clustering is to separate a finite unlabeled data set into a finite and discrete set of "natural," hidden data structures, rather than provide an accurate characterization of unobserved samples generated from the same probability distribution [23], [60]. This can make the task of clustering fall outside of the framework of unsupervised predictive learning problems, such as vector quantization [60] (see Section II-C), probability density function estimation [38] (see Section II-D), [60], and entropy maximization [99]. It is noteworthy that clustering differs from multidimensional scaling (perceptual maps), whose goal is to depict all the evaluated objects in a way that minimizes the topographical distortion while using as few dimensions as possible.

Fig. 1. Clustering procedure. The typical cluster analysis consists of four steps with a feedback pathway. These steps are closely related to each other and affect the derived clusters.
• Hierarchical clustering attempts to construct a tree-like, nested partition structure of $X$, $H = \{H_1, \ldots, H_Q\}$ $(Q \le N)$, such that $C_i \in H_m$, $C_j \in H_l$, and $m > l$ imply $C_i \subset C_j$ or $C_i \cap C_j = \emptyset$ for all $i$, $j \ne i$, $m, l = 1, \ldots, Q$.

… proximity matrix, as defined in Section II-A. Once a proximity measure is chosen, the construction of a …
For hard partitional clustering, each pattern belongs to only one cluster. However, a pattern may also be allowed to belong to all clusters with a degree of membership $u_{ij} \in [0, 1]$, which represents the membership coefficient of the $j$th object in the $i$th cluster and satisfies the following two constraints:
$$\sum_{i=1}^{K} u_{ij} = 1 \;\; \forall j \qquad \text{and} \qquad \sum_{j=1}^{N} u_{ij} < N \;\; \forall i.$$
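As a small illustration of these two constraints (not part of the original text; the matrix values below are invented for the example), the following Python check verifies that a candidate fuzzy partition matrix satisfies both conditions and shows how a hard partition arises as a special case:

```python
import numpy as np

# Hypothetical fuzzy membership matrix U: rows index clusters (i), columns index objects (j).
U = np.array([
    [0.9, 0.2, 0.5, 0.1],
    [0.1, 0.8, 0.5, 0.9],
])
K, N = U.shape

# Constraint 1: each object's memberships over all clusters sum to 1.
assert np.allclose(U.sum(axis=0), 1.0)

# Constraint 2: the memberships within each cluster sum to less than N.
assert np.all(U.sum(axis=1) < N)

# A hard partition is the special case in which every entry is 0 or 1.
hard_labels = U.argmax(axis=0)
print(hard_labels)  # [0 1 0 1] (the tie for the third object is broken toward the first cluster)
```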
— … clustering identification via connectivity kernels (CLICK), cluster affinity search technique (CAST)
• F. Combinatorial Search Techniques-Based
— Genetically guided algorithm (GGA), TS clustering, SA clustering
• G. Fuzzy
— Fuzzy $c$-means (FCM), mountain method (MM), possibilistic $c$-means clustering algorithm (PCM), fuzzy $c$-shells (FCS)
• H. Neural Networks-Based
— Learning vector quantization (LVQ), self-organizing feature map (SOFM), ART, simplified ART (SART), hyperellipsoidal clustering network (HEC), self-splitting competitive learning network (SPLL)
• I. Kernel-Based
— Kernel $K$-means, support vector clustering (SVC)
• J. Sequential Data
— Sequence similarity
— Indirect sequence clustering
— Statistical sequence clustering
• K. Large-Scale Data Sets (see also Table II)
— CLARA, CURE, CLARANS, BIRCH, DBSCAN, DENCLUE, WaveCluster, FC, ART
• L. Data Visualization and High-Dimensional Data
— PCA, ICA, projection pursuit, Isomap, LLE, CLIQUE, OptiGrid, ORCLUS
• M. How Many Clusters?

Applications in two benchmark data sets, the traveling salesman problem, and bioinformatics are illustrated in Section III. We conclude the paper in Section IV.

II. CLUSTERING ALGORITHMS

Different starting points and criteria usually lead to different taxonomies of clustering algorithms [33], [88], [124], [150], [152], [171]. A rough but widely agreed frame is to classify clustering techniques as hierarchical clustering and partitional clustering, based on the properties of the clusters generated [88], [152]. Hierarchical clustering groups data objects with a sequence of partitions, either from singleton clusters to a cluster including all individuals or vice versa, while partitional clustering directly divides data objects into some prespecified number of clusters without the hierarchical structure. We follow this frame in surveying the clustering algorithms in the literature. Beginning with a discussion of the proximity measure, which is the basis for most clustering algorithms, we focus on hierarchical clustering and classical partitional clustering algorithms in Sections II-B–D. Starting from part E, we introduce and analyze clustering algorithms based on a wide variety of theories and techniques, including graph theory, combinatorial search techniques, fuzzy set theory, neural networks, and kernel techniques. Compared with graph theory and fuzzy set …
TABLE II
COMPUTATIONAL COMPLEXITY OF CLUSTERING ALGORITHMS

A data object is described by a set of features, usually represented as a multidimensional vector. The features can be quantitative or qualitative, continuous or binary, nominal or ordinal, which determine the corresponding measure mechanisms.

A distance or dissimilarity function on a data set is defined to satisfy the following conditions:
1) Symmetry: $D(x_i, x_j) = D(x_j, x_i)$;
2) Positivity: $D(x_i, x_j) \ge 0$ for all $x_i$ and $x_j$.
If the conditions
3) Triangle inequality: $D(x_i, x_j) \le D(x_i, x_k) + D(x_k, x_j)$ for all $x_i$, $x_j$, and $x_k$; and
4) Reflexivity: $D(x_i, x_j) = 0$ iff $x_i = x_j$
also hold, it is called a metric.

Commonly used similarity measures for binary features include the Jaccard coefficient, the Sokal and Sneath measure, and the Gower and Legendre measure.
These measures focus on the co-occurrence features while ignoring the effect of co-absence. For nominal features that have more than two states, a simple strategy is to map them into new binary features [161], while a more effective method utilizes the matching criterion, in which the dissimilarity between two objects is measured by the number of features on which their states do not match.
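For concreteness, a minimal Python sketch of two of the proximity measures mentioned above follows (our illustration; the toy vectors and function names are assumptions, not from the survey):

```python
import numpy as np

def jaccard_coefficient(a, b):
    """Similarity for binary features: co-occurrences divided by everything except co-absences."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    both = np.sum(a & b)        # features present in both objects
    either = np.sum(a | b)      # features present in at least one object
    return both / either if either else 0.0

def matching_dissimilarity(a, b):
    """Dissimilarity for multi-state nominal features: count of mismatching states."""
    a, b = np.asarray(a), np.asarray(b)
    return int(np.sum(a != b))

# Toy examples (made-up data).
print(jaccard_coefficient([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))                           # 0.5
print(matching_dissimilarity(["red", "round", "small"], ["red", "square", "small"]))   # 1
```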
B. Hierarchical Clustering

Hierarchical clustering (HC) algorithms organize data into a hierarchical structure according to the proximity matrix. The results of HC are usually depicted by a binary tree or dendrogram. The root node of the dendrogram represents the whole data set and each leaf node is regarded as a data object. The intermediate nodes thus describe the extent to which the objects are proximal to each other, and the height of the dendrogram usually expresses the distance between each pair of objects or clusters, or between an object and a cluster. The ultimate clustering results can be obtained by cutting the dendrogram at different levels. This representation provides very informative descriptions and visualization of the potential data clustering structures, especially when real hierarchical relations exist in the data, such as the data from evolutionary research on different species of organisms.

HC algorithms are mainly classified as agglomerative methods and divisive methods. Agglomerative clustering starts with $N$ clusters, each of which includes exactly one object. A series of merge operations is then carried out that finally leads all objects into the same group. Divisive clustering proceeds in the opposite way. In the beginning, the entire data set belongs to one cluster, and a procedure successively divides it until all clusters are singleton clusters. For a cluster with $N$ objects, there are $2^{N-1}-1$ possible two-subset divisions, which is very expensive in computation [88]. Therefore, divisive clustering is not commonly used in practice. We focus on agglomerative clustering in the following discussion.

The general agglomerative clustering can be summarized by the following procedure.
1) Start with $N$ singleton clusters and calculate the proximity matrix for the $N$ clusters.
2) Search the minimal distance $D(C_i, C_j) = \min_{m \ne l} D(C_m, C_l)$ in the proximity matrix, and combine clusters $C_i$ and $C_j$ to form a new cluster.
3) Update the proximity matrix by computing the distances between the new cluster and the other clusters.
4) Repeat steps 2)–3) until all objects are in the same
cluster.
Based on the different definitions of the distance between two clusters, there are many agglomerative clustering algorithms. The simplest and most popular methods include the single linkage technique [256] and the complete linkage technique [258]. For the single linkage method, the distance between two clusters is determined by the two closest objects in different clusters, so it is also called the nearest-neighbor method. On the contrary, the complete linkage method uses the farthest distance between a pair of objects to define the inter-cluster distance. Both the single linkage and the complete linkage methods can be generalized by the recurrence formula proposed by Lance and Williams [178] as

$$D(C_l, (C_i, C_j)) = \alpha_i D(C_l, C_i) + \alpha_j D(C_l, C_j) + \beta D(C_i, C_j) + \gamma\,|D(C_l, C_i) - D(C_l, C_j)|$$

where $(C_i, C_j)$ denotes the cluster formed by merging $C_i$ and $C_j$, $C_l$ is any other cluster, and different choices of the coefficients $\alpha_i$, $\alpha_j$, $\beta$, and $\gamma$ recover single linkage, complete linkage, and several other commonly used variants.
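The following short Python sketch (ours, not from the survey) runs the agglomerative procedure of steps 1)–4) while updating distances with the Lance and Williams recurrence; $\alpha_i = \alpha_j = 0.5$ and $\beta = 0$ are hard-coded, and the $\gamma$ values shown are the standard choices for single and complete linkage.

```python
import numpy as np

# gamma = -0.5 gives single linkage (minimum); gamma = +0.5 gives complete linkage (maximum).
def agglomerate(D, gamma=-0.5):
    """Naive agglomerative clustering driven by the Lance-Williams update.

    D is a symmetric dissimilarity matrix; returns the merge history."""
    D = D.astype(float)
    np.fill_diagonal(D, np.inf)              # ignore self-distances
    active = list(range(len(D)))
    merges = []
    while len(active) > 1:
        # Step 2: find the closest pair of active clusters.
        i, j = min(((a, b) for a in active for b in active if a < b),
                   key=lambda ab: D[ab])
        merges.append((i, j, float(D[i, j])))
        # Step 3: Lance-Williams update of distances to the merged cluster (stored in slot i).
        for l in active:
            if l not in (i, j):
                D[i, l] = D[l, i] = (0.5 * D[l, i] + 0.5 * D[l, j]
                                     + gamma * abs(D[l, i] - D[l, j]))
        active.remove(j)                     # slot j is retired
    return merges

# Toy proximity matrix for four objects (made-up numbers).
D = np.array([[0, 1, 4, 5],
              [1, 0, 3, 6],
              [4, 3, 0, 2],
              [5, 6, 2, 0]])
print(agglomerate(D, gamma=-0.5))            # single-linkage merge sequence
```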
… with a set of $d$-dimensional vectors, where each such vector, known as a mode, is defined to minimize the sum of distances between itself and the objects in the corresponding cluster. The proposed $K$-modes algorithm operates in a similar way as $K$-means. Several recent advances on $K$-means and other squared-error based clustering algorithms, together with their applications, can be found in [125], [155], [222], [223], [264], and [277].

The best estimate can be achieved by solving the log-likelihood equations $\partial \ln L(\theta)/\partial \theta = 0$. Unfortunately, since the solutions of the likelihood equations cannot be obtained analytically in most circumstances [90], [197], iteratively suboptimal approaches are required to approximate the ML estimates. Among these methods, the expectation-maximization (EM) algorithm is the most popular [196]. EM regards the data set as incomplete and divides each data point $x_j$ into two parts, $x_j = \{x_j^g, x_j^m\}$, where $x_j^g$ represents the observable features and $x_j^m$ is the missing data, whose components take the value 1 or 0 according to whether $x_j$ belongs to the corresponding mixture component or not. The mixture can be formed with components of any type, but more commonly, multivariate Gaussian densities are used.
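As a rough illustration of the EM iteration sketched above, here is a minimal Python example for a mixture of spherical Gaussian components (our own sketch; the spherical-covariance simplification and the toy data are assumptions made to keep it short):

```python
import numpy as np

def em_gmm(X, K, iters=50, seed=0):
    """Minimal EM for a mixture of spherical Gaussians (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    means = X[rng.choice(N, K, replace=False)]       # initial component means
    variances = np.full(K, X.var())                  # one variance per component
    weights = np.full(K, 1.0 / K)                    # mixing proportions
    for _ in range(iters):
        # E-step: responsibilities, i.e., expected values of the "missing" indicators.
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)          # (N, K)
        log_p = (np.log(weights) - 0.5 * d * np.log(2 * np.pi * variances)
                 - sq / (2 * variances))
        log_p -= log_p.max(axis=1, keepdims=True)                        # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the expected complete-data statistics.
        Nk = resp.sum(axis=0)
        weights = Nk / N
        means = (resp.T @ X) / Nk[:, None]
        sq_new = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        variances = (resp * sq_new).sum(axis=0) / (d * Nk)
    return weights, means, variances

# Toy data: two made-up blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
print(em_gmm(X, K=2)[1])   # estimated component means, roughly (0, 0) and (5, 5) in some order
```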
FCM attempts to find a partition into $c$ fuzzy clusters that minimizes the cost function
$$J(U, M) = \sum_{i=1}^{c} \sum_{j=1}^{N} (u_{ij})^{m} D_{ij}^{2}$$
where
$U = [u_{ij}]$ is the fuzzy partition matrix and $u_{ij} \in [0, 1]$ is the membership coefficient of the $j$th object in the $i$th cluster;
$M = [m_1, \ldots, m_c]$ is the cluster prototype (mean or center) matrix;
$m \in [1, \infty)$ is the fuzzification parameter, usually set to 2 [129];
$D_{ij} = D(x_j, m_i)$ is the distance measure between $x_j$ and $m_i$.
We summarize the standard FCM as follows, in which the Euclidean ($L_2$-norm) distance function is used.
1) Select appropriate values for $m$, $c$, and a small positive number $\varepsilon$. Initialize the prototype matrix $M$ randomly. Set the step variable $t = 0$.
2) Calculate (at $t = 0$) or update (at $t > 0$) the membership matrix $U$ by
$$u_{ij}^{(t+1)} = \left[ \sum_{l=1}^{c} \left( D_{ij} / D_{lj} \right)^{2/(m-1)} \right]^{-1} \quad \text{for } i = 1, \ldots, c \text{ and } j = 1, \ldots, N.$$
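The excerpt breaks off before the remaining steps, which update the prototype matrix and repeat until the prototypes change by less than $\varepsilon$. A minimal Python sketch of the full loop in its standard form follows (ours; the variable names and toy data are assumptions):

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=200, seed=0):
    """Standard fuzzy c-means with the Euclidean norm (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    N = len(X)
    M = X[rng.choice(N, c, replace=False)]                   # step 1: random prototypes
    for _ in range(max_iter):
        D = np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2) + 1e-12   # (N, c)
        # Step 2: membership update; rows of U index objects here (the transpose of U in the text).
        ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)                          # each row sums to 1
        # Remaining steps (not shown in the excerpt): prototypes become membership-weighted
        # means, and the loop stops once they barely move.
        M_new = (U.T ** m @ X) / (U.T ** m).sum(axis=1, keepdims=True)
        if np.linalg.norm(M_new - M) < eps:
            M = M_new
            break
        M = M_new
    return U, M

# Toy data: two made-up groups of points.
X = np.vstack([np.random.default_rng(1).normal(0, 0.5, (50, 2)),
               np.random.default_rng(2).normal(4, 0.5, (50, 2))])
U, M = fcm(X, c=2)
print(M)                 # prototypes near (0, 0) and (4, 4), in either order
print(U[:3].round(2))    # fuzzy memberships of the first three points
```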
… if it is an uncommitted node first activated, and … if it is a committed node.
I. Kernel-Based Clustering

Kernel-based learning algorithms [209], [240], [274] are based on Cover's theorem. By nonlinearly transforming a set of complex and nonlinearly separable patterns into a higher-dimensional feature space, we can obtain the possibility of separating these patterns linearly [132]. The difficulty of the curse of dimensionality can be overcome with the kernel trick, arising from Mercer's theorem [132]. By designing and calculating an inner-product kernel, we can avoid the time-consuming, sometimes even infeasible, process of explicitly describing the nonlinear mapping and computing the corresponding points in the transformed space.
In [241], Schölkopf, Smola, and Müller depicted a kernel $K$-means algorithm in the online mode. Suppose we have a set of patterns $x_j$ and a nonlinear map $\Phi$ from the input space into a feature space $F$ with arbitrarily high dimensionality. The object of the algorithm is to find $K$ centers so that we can minimize the distance between the mapped patterns and their closest center,
$$J = \sum_{j}\sum_{k=1}^{K} C_k(x_j)\,\|\Phi(x_j) - m_k\|^2, \qquad C_k(x_j) = \begin{cases} 1, & \text{if } x_j \text{ belongs to cluster } k \\ 0, & \text{otherwise.} \end{cases}$$

Then the kernel $K$-means algorithm can be formulated as follows.
1) Initialize the centers $m_k$ with the first $K$ observation patterns.
2) Take a new pattern $x$ and calculate the assignment indicators $C_k(x)$: the indicator of the closest center is set to 1 and the others to 0, with the squared distances $\|\Phi(x) - m_k\|^2$ evaluated entirely through kernel computations.
3) Update the mean vector $m_k$ whose corresponding $C_k(x)$ is 1,
$$m_k^{\text{new}} = m_k^{\text{old}} + \xi\,\big(\Phi(x) - m_k^{\text{old}}\big)$$
where $\xi$ is a learning rate.
4) Adapt the coefficients of the kernel expansion of $m_k$ accordingly, scaling the existing coefficients by $(1-\xi)$ and assigning the coefficient $\xi$ to the new pattern.
5) Repeat steps 2)–4) until convergence is achieved.
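A compact Python sketch of this online scheme follows (our illustration; the RBF kernel and the $1/n_k$ learning rate are assumed choices that the text above does not prescribe):

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between two pattern matrices."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def online_kernel_kmeans(X, K, sigma=1.0):
    """Online kernel k-means: each center m_k is kept as a kernel expansion
    m_k = sum_j gamma[k, j] * Phi(x_j), so only expansion coefficients are stored."""
    N = len(X)
    gamma = np.zeros((K, N))
    gamma[np.arange(K), np.arange(K)] = 1.0      # step 1: first K patterns as centers
    counts = np.ones(K)
    Kmat = rbf(X, X, sigma)                      # all kernel evaluations k(x_i, x_j)
    for t in range(K, N):
        # Step 2: squared feature-space distance to each center, via kernels only:
        # ||Phi(x) - m_k||^2 = k(x, x) - 2 * gamma_k . k(x, .) + gamma_k K gamma_k^T
        d2 = (Kmat[t, t] - 2 * gamma @ Kmat[:, t]
              + np.einsum('ki,ij,kj->k', gamma, Kmat, gamma))
        k = int(np.argmin(d2))
        # Steps 3-4: move the winning center toward Phi(x_t) by rescaling its coefficients.
        counts[k] += 1
        xi = 1.0 / counts[k]                     # assumed learning rate
        gamma[k] *= (1 - xi)
        gamma[k, t] += xi
    return gamma

# Toy run on made-up data.
X = np.vstack([np.random.default_rng(0).normal(0, 0.3, (30, 2)),
               np.random.default_rng(3).normal(3, 0.3, (30, 2))])
print(online_kernel_kmeans(X, K=2).shape)        # (2, 60) coefficient matrix
```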
Two variants of kernel $K$-means were introduced in [66], motivated by SOFM and ART networks. These variants consider the effects of neighborhood relations while adjusting the cluster assignment variables, and use a vigilance parameter to control the process of producing mean vectors. The authors also illustrated the application of these approaches in case-based reasoning systems.

An alternative kernel-based clustering approach is given in [107]. The problem was formulated as determining an optimal partition that minimizes the trace of the within-group scatter matrix in the feature space.

… and, by adjusting the width parameter of the RBF kernel, SVC can form either agglomerative or divisive hierarchical clusters. When some points are allowed to lie outside the hypersphere, SVC can deal with outliers effectively. An extension, called multiple spheres support vector clustering, was proposed in [62], which combines the concept of fuzzy membership.

Kernel-based clustering algorithms have many advantages.
1) It is more possible to obtain a linearly separable hyperplane in the high-dimensional, or even infinite, feature space.
2) They can form arbitrary clustering shapes other than the hyperellipsoid and hypersphere.
3) Kernel-based clustering algorithms, like SVC, have the capability of dealing with noise and outliers.
4) For SVC, there is no requirement for prior knowledge to determine the system topological structure. In [107], the kernel matrix can provide the means to estimate the number of clusters.

Meanwhile, there are also some problems requiring further consideration and investigation. Like many other algorithms, how to determine the appropriate parameters, for example, the width of the Gaussian kernel, is not trivial. The problem of computational complexity may become serious for large data sets.

The process of constructing the sum-of-squared clustering algorithm [107] and the kernel $K$-means algorithm [241] presents a good example of how to reformulate more powerful nonlinear versions of many existing linear algorithms, provided that the scalar product can be obtained. Theoretically, it is important to investigate whether these nonlinear variants can keep some useful and essential properties of the original algorithms and how Mercer kernels contribute to the improvement of the algorithms. The effect of different types of kernel functions, which are rich in the literature, is also an interesting topic for further exploration.
[Table residue: criteria for estimating the number of clusters, including the CH index, Akaike's information criterion (AIC), and the Bayesian information criterion (BIC) [242].]
III. APPLICATIONS
We illustrate applications of clustering techniques in three aspects. The first is for two classical benchmark data sets that are widely used in pattern recognition and machine learning. Then, we show an application of clustering for the traveling salesman problem. The last topic is bioinformatics. We deal with the classical benchmarks in Sections III-A and III-B and the traveling salesman problem in Section III-C. A more extensive discussion of bioinformatics is in Sections III-D and III-E.
TABLE III
SOME CLUSTERING RESULTS FOR THE IRIS DATA SET
Fig. 7. Hierarchical and SOFM clustering of the SRBCT gene expression data set. (a) Hierarchical clustering result for the 100 selected genes under 83 tissue samples. The gene expression matrix is visualized through a color scale. (b) Hierarchical clustering result for the 83 tissue samples. Here, the dimension is 100, as 100 genes are selected as in (a). (c) SOFM clustering result for the 2308 genes. A 5 × 5 SOFM is used and 25 clusters are formed. Each cluster is represented by the average values.
… focusing on specific problems [199]. … Some recent results can be accessed in [29], [247], and [291].
Fig. 8. HMM architecture [177]. There are three different states, match (M), insert (I), and delete (D), corresponding to the substitution, insertion, and deletion operations, respectively. A begin (B) and an end (E) state are also introduced to represent the start and end of the process. The process goes through a series of states according to the transition probabilities and emits symbols from either the 4-letter nucleotide alphabet or the 20-letter amino acid alphabet based on the emission probabilities.
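To make the generative process described in this caption concrete, here is a toy Python sketch (ours; the single match/insert/delete column and all probabilities are invented) that walks from the begin state to the end state, emitting nucleotides from match and insert states while delete states stay silent:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy state space: begin, one match/insert/delete column, end (all probabilities made up).
transition = {                      # P(next state | current state)
    "B":  {"M1": 0.8, "I1": 0.1, "D1": 0.1},
    "M1": {"I1": 0.2, "E": 0.8},
    "I1": {"I1": 0.3, "E": 0.7},
    "D1": {"E": 1.0},
}
emission = {                        # only match and insert states emit a nucleotide
    "M1": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    "I1": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

def sample_sequence():
    """Follow transitions from B to E, emitting a symbol whenever the state allows it."""
    state, seq = "B", []
    while state != "E":
        if state in emission:       # begin, delete, and end states are silent
            symbols, probs = zip(*emission[state].items())
            seq.append(rng.choice(symbols, p=probs))
        nxt, probs = zip(*transition[state].items())
        state = rng.choice(nxt, p=probs)
    return "".join(seq)

print([sample_sequence() for _ in range(5)])   # e.g., ['A', 'AA', '', 'A', 'T']
```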
IV. CONCLUSION
ACKNOWLEDGMENT
REFERENCES
[1] F. Abascal and A. Valencia, “Clustering of proximal sequence space
for the identification of protein families,” Bioinformatics, vol. 18, pp.
908–921, 2002.
[2] C. Aggarwal and P. Yu, “Redefining clustering for high-dimensional ap-
plications,” IEEE Trans. Knowl. Data Eng., vol. 14, no. 2, pp. 210–
225, Feb. 2002.
[3] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, “Automatic
subspace clustering of high dimensional data for data mining applica-
tions,” in Proc. ACM SIGMOD Int. Conf. Management of Data,
1998, pp. 94–105.
[4] H. Akaike, “A new look at the statistical model identification,” IEEE
Trans. Autom. Control, vol. AC-19, no. 6, pp. 716–722, Dec. 1974.
[5] A. Alizadeh et al., “Distinct types of diffuse large B-cell Lymphoma
identified by gene expression profiling,” Nature, vol. 403, pp. 503–
511, 2000.
[6] U. Alon, N. Barkai, D. Notterman, K. Gish, S. Ybarra, D. Mack, and
[7] C. Alpert and A. Kahng, “Multi-way partitioning via spacefilling [36] J. Bezdek and R. Hathaway, “Numerical convergence and
curves and dynamic programming,” in Proc. 31st ACM/IEEE Design interpretation of the fuzzy -shells clustering algorithms,” IEEE Trans.
Automa- tion Conf., 1994, pp. 652–657. Neural Netw., vol. 3, no. 5, pp. 787–793, Sep. 1992.
[8] , “Recent directions in netlist partitioning: A survey,” VLSI J., [37] J. Bezdek and N. Pal, “Some new indexes of cluster validity,” IEEE
vol. 19, pp. 1–81, 1995. Trans. Syst., Man, Cybern. B, Cybern., vol. 28, no. 3, pp. 301–315,
[9] K. Al-Sultan, “A Tabu search approach to the clustering problem,” Jun. 1998.
Pat- tern Recognit., vol. 28, no. 9, pp. 1443–1451, 1995. [38] C. Bishop, Neural Networks for Pattern Recognition. New York: Ox-
[10] S. Altschul et al., “Gapped BLAST and PSI-BLAST: A new ford Univ. Press, 1995.
generation of protein database search programs,” Nucleic Acids Res., [39] L. Bobrowski and J. Bezdek, “c-Means clustering with the and
vol. 25, pp. 3389–3402, 1997. norms,” IEEE Trans. Syst., Man, Cybern., vol. 21, no. 3, pp. 545–554,
[11] S. Altschul et al., “Basic local alignment search tool,” J. Molec. Biol., May-Jun. 1991.
vol. 215, pp. 403–410, 1990. [40] H. Bock, “Probabilistic models in cluster analysis,” Comput. Statist.
[12] G. Anagnostopoulos and M. Georgiopoulos, “Hypersphere ART and Data Anal., vol. 23, pp. 5–28, 1996.
ARTMAP for unsupervised and supervised incremental learning,” in [41] E. Bolten, A. Sxhliep, S. Schneckener, D. Schomburg, and R.
Proc. IEEE-INNS-ENNS Int. Joint Conf. Neural Networks Schrader, “Clustering protein sequences—Structure prediction by
(IJCNN’00), vol. 6, Como, Italy, pp. 59–64. transitive ho- mology,” Bioinformatics, vol. 17, pp. 935–941, 2001.
[13] , “Ellipsoid ART and ARTMAP for incremental unsupervised [42] N. Boujemaa, “Generalized competitive clustering for image segmen-
and supervised learning,” in Proc. IEEE-INNS-ENNS Int. Joint Conf. tation,” in Proc. 19th Int. Meeting North American Fuzzy Information
Processing Soc. (NAFIPS’00), Atlanta, GA, 2000, pp. 133–137.
Neural Networks (IJCNN’01), vol. 2, Washington, DC, 2001, pp.
[43] P. Bradley and U. Fayyad, “Refining initial points for -means clus-
1221–1226.
tering,” in Proc. 15th Int. Conf. Machine Learning, 1998, pp. 91–99.
[14] M. Anderberg, Cluster Analysis for Applications. New York: Aca-
[44] P. Bradley, U. Fayyad, and C. Reina, “Scaling clustering algorithms to
demic, 1973.
large databases,” in Proc. 4th Int. Conf. Knowledge Discovery and
[15] G. Babu and M. Murty, “A near-optimal initial seed value selection in
Data Mining (KDD’98), 1998, pp. 9–15.
-means algorithm using a genetic algorithm,” Pattern Recognit. [45] , “Clustering very large databases using EM mixture models,” in
Lett., vol. 14, no. 10, pp. 763–769, 1993. Proc. 15th Int. Conf. Pattern Recognition, vol. 2, 2000, pp. 76–80.
[16] , “Clustering with evolution strategies,” Pattern Recognit., vol. [46] , “Clustering very large databases using EM mixture models,” in
27, no. 2, pp. 321–329, 1994. Proc. 15th Int. Conf. Pattern Recognition, vol. 2, 2000, pp. 76–80.
[17] E. Backer and A. Jain, “A clustering performance measure based on [47] D. Brown and C. Huntley, “A practical application of simulated an-
fuzzy set decomposition,” IEEE Trans. Pattern Anal. Mach. Intell., nealing to clustering,” Pattern Recognit., vol. 25, no. 4, pp. 401–412,
vol. PAMI-3, no. 1, pp. 66–75, Jan. 1981. 1992.
[18] P. Baldi and S. Brunak, Bioinformatics: The Machine Learning Ap- [48] C. Burges, “A tutorial on support vector machines for pattern recogni-
proach, 2nd ed. Cambridge, MA: MIT Press, 2001. tion,” Data Mining Knowl. Discov., vol. 2, pp. 121–167, 1998.
[19] P. Baldi and K. Hornik, “Neural networks and principal component anal- [49] J. Burke, D. Davison, and W. Hide, “d2 Cluster: A validated method for
ysis: Learning from examples without local minima,” Neural Netw., clustering EST and full-length cDNA sequences,” Genome Res., vol.
vol. 2, pp. 53–58, 1989. 9, pp. 1135–1142, 1999.
[20] P. Baldi and A. Long, “A Bayesian framework for the analysis of mi- [50] I. Cadez, S. Gaffney, and P. Smyth, “A general probabilistic framework
croarray expression data: Regularized t-test and statistical inferences for clustering individuals and objects,” in Proc. 6th ACM SIGKDD
of gene changes,” Bioinformatics, vol. 17, pp. 509–519, 2001. Int. Conf. Knowledge Discovery and Data Mining, 2000, pp. 140–149.
[21] G. Ball and D. Hall, “A clustering technique for summarizing multi- [51] G. Carpenter and S. Grossberg, “A massively parallel architecture for
variate data,” Behav. Sci., vol. 12, pp. 153–155, 1967. a self-organizing neural pattern recognition machine,” Comput. Vis.
[22] S. Bandyopadhyay and U. Maulik, “Nonparametric genetic clustering: Graph. Image Process., vol. 37, pp. 54–115, 1987.
Comparison of validity indices,” IEEE Trans. Syst., Man, Cybern. C, [52] , “ART2: Self-organization of stable category recognition codes
Appl. Rev., vol. 31, no. 1, pp. 120–125, Feb. 2001. for analog input patterns,” Appl. Opt., vol. 26, no. 23, pp. 4919–4930,
[23] A. Baraldi and E. Alpaydin, “Constructive feedforward ART 1987.
clustering networks—Part I and II,” IEEE Trans. Neural Netw., vol. [53] , “The ART of adaptive pattern recognition by a self-organizing
13, no. 3, pp. 645–677, May 2002. neural network,” IEEE Computer, vol. 21, no. 3, pp. 77–88, Mar.
[24] A. Baraldi and P. Blonda, “A survey of fuzzy clustering algorithms for 1988.
pattern recognition—Part I and II,” IEEE Trans. Syst., Man, Cybern. [54] , “ART3: Hierarchical search using chemical transmitters in self-
B, Cybern., vol. 29, no. 6, pp. 778–801, Dec. 1999. organizing pattern recognition architectures,” Neural Netw., vol. 3, no.
[25] A. Baraldi and L. Schenato, “Soft-to-hard model transition in clustering: 23, pp. 129–152, 1990.
A review,”, Tech. Rep. TR-99-010, 1999. [55] G. Carpenter, S. Grossberg, N. Markuzon, J. Reynolds, and D. Rosen,
[26] D. Barbará and P. Chen, “Using the fractal dimension to cluster datasets,” “Fuzzy ARTMAP: A neural network architecture for incremental
in Proc. 6th ACM SIGKDD Int. Conf. Knowledge Discovery and Data super- vised learning of analog multidimensional maps,” IEEE Trans.
Mining, 2000, pp. 260–264. Neural Netw., vol. 3, no. 5, pp. 698–713, 1992.
[27] M. Belkin and P. Niyogi, “Laplacian eigenmaps and spectral techniques [56] G. Carpenter, S. Grossberg, and J. Reynolds, “ARTMAP: Supervised
for embedding and clustering,” in Advances in Neural Information real-time learning and classification of nonstationary data by a self-or-
Processing Systems, T. G. Dietterich, S. Becker, and Z. Ghahramani, ganizing neural network,” Neural Netw., vol. 4, no. 5, pp. 169–181, 1991.
[57] G. Carpenter, S. Grossberg, and D. Rosen, “Fuzzy ART: Fast stable
Eds. Cambridge, MA: MIT Press, 2002, vol. 14.
learning and categorization of analog patterns by an adaptive
[28] R. Bellman, Adaptive Control Processes: A Guided Tour. Princeton,
resonance system,” Neural Netw., vol. 4, pp. 759–771, 1991.
NJ: Princeton Univ. Press, 1961.
[58] G. Celeux and G. Govaert, “A classification EM algorithm for clustering
[29] A. Ben-Dor, R. Shamir, and Z. Yakhini, “Clustering gene expression
and two stochastic versions,” Comput. Statist. Data Anal., vol. 14, pp.
patterns,” J. Comput. Biol., vol. 6, pp. 281–297, 1999.
315–332, 1992.
[30] Y. Bengio, “Markovian models for sequential data,” Neural Comput. [59] P. Cheeseman and J. Stutz, “Bayesian classification (AutoClass):
Surv., vol. 2, pp. 129–162, 1999. Theory and results,” in Advances in Knowledge Discovery and Data
[31] A. Ben-Hur, D. Horn, H. Siegelmann, and V. Vapnik, “Support vector Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R.
clustering,” J. Mach. Learn. Res., vol. 2, pp. 125–137, 2001. Uthurusamy, Eds. Menlo Park, CA: AAAI Press, 1996, pp. 153–180.
[32] , “A support vector clustering method,” in Proc. Int. Conf. [60] V. Cherkassky and F. Mulier, Learning From Data: Concepts,
Pattern Recognition, vol. 2, 2000, pp. 2724–2727. Theory, and Methods. New York: Wiley, 1998.
[33] P. Berkhin. (2001) Survey of clustering data mining techniques. [On- [61] J. Cherng and M. Lo, “A hypergraph based clustering algorithm for
line]. Available: http://www.accrue.com/products/rp_cluster_review.pdf spa- tial data sets,” in Proc. IEEE Int. Conf. Data Mining (ICDM’01),
http://citeseer.nj.nec.com/berkhin02survey.html 2001, pp. 83–90.
[34] K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, “When is nearest [62] J. Chiang and P. Hao, “A new kernel-based fuzzy clustering approach:
neighbor meaningful,” in Proc. 7th Int. Conf. Database Theory, 1999, Support vector clustering with cell growing,” IEEE Trans. Fuzzy Syst.,
pp. 217–235. vol. 11, no. 4, pp. 518–527, Aug. 2003.
[35] J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algo- [63] C. Chinrungrueng and C. Séquin, “Optimal adaptive -means algo-
rithms. New York: Plenum, 1981. rithm with dynamic adjustment of learning rate,” IEEE Trans. Neural
Netw., vol. 6, no. 1, pp. 157–169, Jan. 1995.
tering,” Mach. Learn., vol. 2, pp. 139–172, 1987.
[64] S. Chu and J. Roddick, “A clustering algorithm using the Tabu search
approach with simulated annealing,” in Data Mining II—Proceedings
of Second International Conference on Data Mining Methods and
Databases, N. Ebecken and C. Brebbia, Eds, Cambridge, U.K., 2000,
pp. 515–523.
[65] I. H. G. S. Consortium, “Initial sequencing and analysis of the human
genome,” Nature, vol. 409, pp. 860–921, 2001.
[66] J. Corchado and C. Fyfe, “A comparison of kernel methods for instan-
tiating case based reasoning systems,” Comput. Inf. Syst., vol. 7, pp.
29–42, 2000.
[67] M. Cowgill, R. Harvey, and L. Watson, “A genetic algorithm
approach to cluster analysis,” Comput. Math. Appl., vol. 37, pp. 99–
108, 1999.
[68] C. Cummings and D. Relman, “Using DNA microarray to study host-
microbe interactions,” Genomics, vol. 6, no. 5, pp. 513–525, 2000.
[69] E. Dahlhaus, “Parallel algorithms for hierarchical clustering and appli-
cations to split decomposition and parity graph recognition,” J. Algo-
rithms, vol. 36, no. 2, pp. 205–240, 2000.
[70] R. Davé, “Adaptive fuzzy -shells clustering and detection of
ellipses,”
IEEE Trans. Neural Netw., vol. 3, no. 5, pp. 643–662, Sep. 1992.
[71] R. Davé and R. Krishnapuram, “Robust clustering methods: A unified
view,” IEEE Trans. Fuzzy Syst., vol. 5, no. 2, pp. 270–293, May 1997.
[72] M. Delgado, A. Skármeta, and H. Barberá, “A Tabu search approach
to the fuzzy clustering problem,” in Proc. 6th IEEE Int. Conf. Fuzzy
Systems, vol. 1, 1997, pp. 125–130.
[73] D. Dembélé and P. Kastner, “Fuzzy -means method for clustering mi-
croarray data,” Bioinformatics, vol. 19, no. 8, pp. 973–980, 2003.
[74] Handbook of Pattern Recognition and Computer Vision, C. Chen, L.
Pau, and P. Wang, Eds., World Scientific, Singapore, 1993, pp. 3–32. R.
Dubes, “Cluster analysis and related issue”.
[75] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New
York: Wiley, 2001.
[76] J. Dunn, “A fuzzy relative of the ISODATA process and its use in de-
tecting compact well separated clusters,” J. Cybern., vol. 3, no. 3, pp.
32–57, 1974.
[77] B. Duran and P. Odell, Cluster Analysis: A Survey. New York:
Springer-Verlag, 1974.
[78] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence
Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cam-
bridge, U.K.: Cambridge Univ. Press, 1998.
[79] M. Eisen and P. Brown, “DNA arrays for analysis of gene
expression,”
Methods Enzymol., vol. 303, pp. 179–205, 1999.
[80] M. Eisen, P. Spellman, P. Brown, and D. Botstein, “Cluster analysis and
display of genome-wide expression patterns,” in Proc. Nat. Acad. Sci.
USA, vol. 95, 1998, pp. 14 863–14 868.
[81] Y. El-Sonbaty and M. Ismail, “Fuzzy clustering for symbolic data,” IEEE
Trans. Fuzzy Syst., vol. 6, no. 2, pp. 195–204, May 1998.
[82] T. Eltoft and R. deFigueiredo, “A new neural network for cluster-de-
tection-and-labeling,” IEEE Trans. Neural Netw., vol. 9, no. 5, pp.
1021–1035, Sep. 1998.
[83] A. Enright and C. Ouzounis, “GeneRAGE: A robust algorithm for se-
quence clustering and domain detection,” Bioinformatics, vol. 16, pp.
451–457, 2000.
[84] S. Eschrich, J. Ke, L. Hall, and D. Goldgof, “Fast accurate fuzzy clus-
tering through data reduction,” IEEE Trans. Fuzzy Syst., vol. 11, no. 2,
pp. 262–270, Apr. 2003.
[85] M. Ester, H. Kriegel, J. Sander, and X. Xu, “A density-based
algorithm for discovering clusters in large spatial databases with
noise,” in Proc. 2nd Int. Conf. Knowledge Discovery and Data Mining
(KDD’96), 1996, pp. 226–231.
[86] V. Estivill-Castro and I. Lee, “AMOEBA: Hierarchical clustering
based on spatial proximity using Delaunay diagram,” in Proc. 9th Int.
Symp. Spatial Data Handling (SDH’99), Beijing, China, 1999, pp.
7a.26–7a.41.
[87] V. Estivill-Castro and J. Yang, “A fast and robust general purpose
clus- tering algorithm,” in Proc. 6th Pacific Rim Int. Conf. Artificial
Intelli- gence (PRICAI’00), R. Mizoguchi and J. Slaney, Eds.,
Melbourne, Aus- tralia, 2000, pp. 208–218.
[88] B. Everitt, S. Landau, and M. Leese, Cluster Analysis. London:
Arnold, 2001.
[89] D. Fasulo, “An analysis of recent work on clustering algorithms,”
Dept. Comput. Sci. Eng., Univ. Washington, Seattle, WA, Tech. Rep.
01-03-02, 1999.
[90] M. Figueiredo and A. Jain, “Unsupervised learning of finite mixture
models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 3, pp.
381–396, Mar. 2002.
[91] D. Fisher, “Knowledge acquisition via incremental conceptual clus-
Computer Science and Computational Biology. Cambridge, U.K.:
Cambridge Univ. Press, 1997.
[92] R. Fisher, “The use of multiple measurements in taxonomic
[121] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, “Cluster validity
problems,”
methods: Part I & II,” SIGMOD Record, vol. 31, no. 2–3, 2002.
Annu. Eugenics, pt. II, vol. 7, pp. 179–188, 1936.
[122] L. Hall, I. Özyurt, and J. Bezdek, “Clustering with a genetically opti-
[93] D. Fogel, “An introduction to simulated evolutionary
mized approach,” IEEE Trans. Evol. Comput., vol. 3, no. 2, pp. 103–112,
optimization,”
1999.
IEEE Trans. Neural Netw., vol. 5, no. 1, pp. 3–14, Jan. 1994.
[94] E. Forgy, “Cluster analysis of multivariate data: Efficiency vs.
inter- pretability of classifications,” Biometrics, vol. 21, pp. 768–
780, 1965.
[95] C. Fraley and A. Raftery, “MCLUST: Software for model-based
cluster analysis,” J. Classificat., vol. 16, pp. 297–306, 1999.
[96] , “Model-Based clustering, discriminant analysis, and density
esti- mation,” J. Amer. Statist. Assoc., vol. 97, pp. 611–631, 2002.
[97] J. Friedman, “Exploratory projection pursuit,” J. Amer. Statist.
Assoc., vol. 82, pp. 249–266, 1987.
[98] H. Frigui and R. Krishnapuram, “A robust competitive clustering
algo- rithm with applications in computer vision,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 21, no. 5, pp. 450–465, May 1999.
[99] B. Fritzke. (1997) Some competitive learning methods. [Online].
Avail- able: http://www.neuroinformatik.ruhr-uni-
bochum.de/ini/VDM/re- search/gsn/JavaPaper
[100] B. Gabrys and A. Bargiela, “General fuzzy min-max neural
network for clustering and classification,” IEEE Trans. Neural
Netw., vol. 11, no. 3, pp. 769–783, May 2000.
[101] V. Ganti, R. Ramakrishnan, J. Gehrke, A. Powell, and J. French,
“Clus- tering large datasets in arbitrary metric spaces,” in Proc.
15th Int. Conf. Data Engineering, 1999, pp. 502–511.
[102] I. Gath and A. Geva, “Unsupervised optimal fuzzy clustering,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 7, pp. 773–781,
Jul. 1989.
[103] GenBank Release Notes 144.0.
[104] A. Geva, “Hierarchical unsupervised fuzzy clustering,” IEEE
Trans. Fuzzy Syst., vol. 7, no. 6, pp. 723–733, Dec. 1999.
[105] D. Ghosh and A. Chinnaiyan, “Mixture modeling of gene
expression data from microarray experiments,” Bioinformatics, vol.
18, no. 2, pp. 275–286, 2002.
[106] A. Ghozeil and D. Fogel, “Discovering patterns in spatial data
using evolutionary programming,” in Proc. 1st Annu. Conf. Genetic
Program- ming, 1996, pp. 512–520.
[107] M. Girolami, “Mercer kernel based clustering in feature space,”
IEEE Trans. Neural Netw., vol. 13, no. 3, pp. 780–784, May 2002.
[108] F. Glover, “Tabu search, part I,” ORSA J. Comput., vol. 1, no. 3,
pp. 190–206, 1989.
[109] T. Golub, D. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.
Mesirov,
H. Coller, M. Loh, J. Downing, M. Caligiuri, C. Bloomfield, and E.
Lander, “Molecular classification of cancer: Class discovery and
class prediction by gene expression monitoring,” Science, vol. 286,
pp. 531–537, 1999.
[110] A. Gordon, “Cluster validation,” in Data Science, Classification, and
Re- lated Methods, C. Hayashi, N. Ohsumi, K. Yajima, Y. Tanaka, H.
Bock, and Y. Bada, Eds. New York: Springer-Verlag, 1998, pp.
22–39.
[111] , Classification, 2nd ed. London, U.K.: Chapman & Hall, 1999.
[112] J. Gower, “A general coefficient of similarity and some of its
properties,”
Biometrics, vol. 27, pp. 857–872, 1971.
[113] S. Grossberg, “Adaptive pattern recognition and universal encoding
II: Feedback, expectation, olfaction, and illusions,” Biol. Cybern.,
vol. 23, pp. 187–202, 1976.
[114] P. Grünwald, P. Kontkanen, P. Myllymäki, T. Silander, and H.
Tirri, “Minimum encoding approaches for predictive modeling,” in
Proc. 14th Int. Conf. Uncertainty in AI (UAI’98), 1998, pp. 183–
192.
[115] X. Guan and L. Du, “Domain identification by clustering sequence
alignments,” Bioinformatics, vol. 14, pp. 783–788, 1998.
[116] S. Guha, R. Rastogi, and K. Shim, “CURE: An efficient clustering
algo- rithm for large databases,” in Proc. ACM SIGMOD Int. Conf.
Manage- ment of Data, 1998, pp. 73–84.
[117] , “ROCK: A robust clustering algorithm for categorical
attributes,”
Inf. Syst., vol. 25, no. 5, pp. 345–366, 2000.
[118] S. Gupata, K. Rao, and V. Bhatnagar, “ -means clustering
algorithm for categorical attributes,” in Proc. 1st Int. Conf. Data
Warehousing and Knowledge Discovery (DaWaK’99), Florence,
Italy, 1999, pp. 203–208.
[119] V. Guralnik and G. Karypis, “A scalable algorithm for clustering
sequen- tial data,” in Proc. 1st IEEE Int. Conf. Data Mining
(ICDM’01), 2001, pp. 179–186.
[120] D. Gusfield, Algorithms on Strings, Trees, and Sequences:
[123] R. Hammah and J. Curran, “Validity measures for the fuzzy cluster anal- [153] D. Jiang, C. Tang, and A. Zhang, “Cluster analysis for gene
ysis of orientations,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. expression data: A survey,” IEEE Trans. Knowl. Data Eng., vol. 16,
12, pp. 1467–1472, Dec. 2000. no. 11, pp. 1370–1386, Nov. 2004.
[124] P. Hansen and B. Jaumard, “Cluster analysis and mathematical program- [154] C. Jutten and J. Herault, “Blind separation of sources, Part I: An adaptive
ming,” Math. Program., vol. 79, pp. 191–215, 1997. algorithms based on neuromimetic architecture,” Signal Process., vol.
[125] P. Hansen and N. Mladenoviæ, “J-means: A new local search heuristic 24, no. 1, pp. 1–10, 1991.
for minimum sum of squares clustering,” Pattern Recognit., vol. 34, [155] T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A.
pp. 405–413, 2001. Wu, “An efficient -means clustering algorithm: Analysis and imple-
[126] F. Harary, Graph Theory. Reading, MA: Addison-Wesley, 1969. mentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7,
[127] J. Hartigan, Clustering Algorithms. New York: Wiley, 1975. pp. 881–892, Jul. 2000.
[128] E. Hartuv and R. Shamir, “A clustering algorithm based on graph con- [156] N. Karayiannis, “A methodology for construction fuzzy algorithms for
nectivity,” Inf. Process. Lett., vol. 76, pp. 175–181, 2000. learning vector quantization,” IEEE Trans. Neural Netw., vol. 8, no. 3,
[129] R. Hathaway and J. Bezdek, “Fuzzy -means clustering of incomplete pp. 505–518, May 1997.
data,” IEEE Trans. Syst., Man, Cybern., vol. 31, no. 5, pp. 735–744, [157] N. Karayiannis, J. Bezdek, N. Pal, R. Hathaway, and P. Pai, “Repairs
2001. to GLVQ: A new family of competitive learning schemes,” IEEE
[130] R. Hathaway, J. Bezdek, and Y. Hu, “Generalized fuzzy -means clus- Trans. Neural Netw., vol. 7, no. 5, pp. 1062–1071, Sep. 1996.
tering strategies using norm distances,” IEEE Trans. Fuzzy Syst., vol. [158] J. Karhunen, E. Oja, L. Wang, R. Vigário, and J. Joutsensalo, “A class
8, no. 5, pp. 576–582, Oct. 2000. of neural networks for independent component analysis,” IEEE Trans.
[131] B. Hay, G. Wets, and K. Vanhoof, “Clustering navigation patterns on Neural Netw., vol. 8, no. 3, pp. 486–504, May 1997.
a website using a sequence alignment method,” in Proc. Intelligent [159] G. Karypis, E. Han, and V. Kumar, “Chameleon: Hierarchical clustering
Tech- niques for Web Personalization: 17th Int. Joint Conf. Artificial using dynamic modeling,” IEEE Computer, vol. 32, no. 8, pp. 68–75,
Intelli- gence, vol. s.l, 2001, pp. 1–6, 200. Aug. 1999.
[160] R. Kathari and D. Pitts, “On finding the number of clusters,” Pattern
[132] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd
Recognit. Lett., vol. 20, pp. 405–416, 1999.
ed. Englewood Cliffs, NJ: Prentice-Hall, 1999.
[161] L. Kaufman and P. Rousseeuw, Finding Groups in Data: An
[133] Q. He, “A review of clustering algorithms as applied to IR,” Univ. Illinois
Introduction to Cluster Analysis: Wiley, 1990.
at Urbana-Champaign, Tech. Rep. UIUCLIS-1999/6+IRG, 1999.
[162] W. Kent and A. Zahler, “Conservation, regulation, synteny, and
[134] M. Healy, T. Caudell, and S. Smith, “A neural architecture for pattern
introns in a large-scale C. Briggsae—C. elegans genomic alignment,”
sequence verification through inferencing,” IEEE Trans. Neural Genome Res., vol. 10, pp. 1115–1125, 2000.
Netw., vol. 4, no. 1, pp. 9–20, Jan. 1993. [163] P. Kersten, “Implementation issues in the fuzzy -medians clustering
[135] A. Hinneburg and D. Keim, “An efficient approach to clustering in algorithm,” in Proc. 6th IEEE Int. Conf. Fuzzy Systems, vol. 2, 1997,
large multimedia databases with noise,” in Proc. 4th Int. Conf. pp. 957–962.
Knowledge Discovery and Data Mining (KDD’98), 1998, pp. 58–65. [164] J. Khan, J. Wei, M. Ringnér, L. Saal, M. Ladanyi, F. Westermann, F.
[136] , “Optimal grid-clustering: Toward breaking the curse of dimen- Berthold, M. Schwab, C. Antonescu, C. Peterson, and P. Meltzer, “Clas-
sionality in high-dimensional clustering,” in Proc. 25th VLDB Conf., sification and diagnostic prediction of cancers using gene expression
1999, pp. 506–517. profiling and artificial neural networks,” Nature Med., vol. 7, no. 6,
[137] F. Hoeppner, “Fuzzy shell clustering algorithms in image processing: pp. 673–679, 2001.
Fuzzy -rectangular and 2-rectangular shells,” IEEE Trans. Fuzzy Syst., [165] S. Kirkpatrick, C. Gelatt, and M. Vecchi, “Optimization by simulated
vol. 5, no. 4, pp. 599–613, Nov. 1997. annealing,” Science, vol. 220, no. 4598, pp. 671–680, 1983.
[138] J. Hoey, “Clustering contextual facial display sequences,” in Proc. 5th [166] J. Kleinberg, “An impossibility theorem for clustering,” in Proc. 2002
IEEE Int. Conf. Automatic Face and Gesture Recognition (FGR’02), Conf. Advances in Neural Information Processing Systems, vol. 15,
2002, pp. 354–359. 2002, pp. 463–470.
[139] T. Hofmann and J. Buhmann, “Pairwise data clustering by deterministic [167] R. Kohavi, “A study of cross-validation and bootstrap for accuracy es-
annealing,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 1, pp. timation and model selection,” in Proc. 14th Int. Joint Conf. Artificial
1–14, Jan. 1997. Intelligence, 1995, pp. 338–345.
[140] J. Holland, Adaption in Natural and Artificial Systems. Ann Arbor, MI: [168] T. Kohonen, “The self-organizing map,” Proc. IEEE, vol. 78, no. 9,
Univ. Michigan Press, 1975. pp. 1464–1480, Sep. 1990.
[141] F. Höppner, F. Klawonn, and R. Kruse, Fuzzy Cluster Analysis: Methods [169] , Self-Organizing Maps, 3rd ed. New York: Springer-Verlag,
for Classification, Data Analysis, and Image Recognition. New York: 2001.
Wiley, 1999. [170] T. Kohonen, S. Kaski, K. Lagus, J. Salojärvi, J. Honkela, V. Paatero, and
[142] Z. Huang, “Extensions to the -means algorithm for clustering large A. Saarela, “Self organization of a massive document collection,”
data sets with categorical values,” Data Mining Knowl. Discov., vol. 2, IEEE Trans. Neural Netw., vol. 11, no. 3, pp. 574–585, May 2000.
pp. 283–304, 1998. [171] E. Kolatch. (2001) Clustering algorithms for spatial databases: A
[143] J. Huang, M. Georgiopoulos, and G. Heileman, “Fuzzy ART properties,” Survey. [Online]. Available: http://citeseer.nj.nec.com/436 843.html
Neural Netw., vol. 8, no. 2, pp. 203–213, 1995. [172] J. Kolen and T. Hutcheson, “Reducing the time complexity of the
fuzzy -means algorithm,” IEEE Trans. Fuzzy Syst., vol. 10, no. 2, pp.
[144] P. Huber, “Projection pursuit,” Ann. Statist., vol. 13, no. 2, pp. 435–475,
263–267, Apr. 2002.
1985. [173] K. Krishna and M. Murty, “Genetic -means algorithm,” IEEE Trans.
[145] R. Hughey and A. Krogh, “Hidden Markov models for sequence anal- Syst., Man, Cybern. B, Cybern., vol. 29, no. 3, pp. 433–439, Jun. 1999.
ysis: Extension and analysis of the basic method,” CABIOS, vol. 12, [174] R. Krishnapuram, H. Frigui, and O. Nasraoui, “Fuzzy and possiblistic
no. 2, pp. 95–107, 1996. shell clustering algorithms and their application to boundary detection
[146] M. Hung and D. Yang, “An efficient fuzzy -means clustering algo- and surface approximation—Part I and II,” IEEE Trans. Fuzzy Syst.,
rithm,” in Proc. IEEE Int. Conf. Data Mining, 2001, pp. 225–232. vol. 3, no. 1, pp. 29–60, Feb. 1995.
[147] L. Hunt and J. Jorgensen, “Mixture model clustering using the MUL- [175] R. Krishnapuram and J. Keller, “A possibilistic approach to clustering,”
TIMIX program,” Australia and New Zealand J. Statist., vol. 41, pp. IEEE Trans. Fuzzy Syst., vol. 1, no. 2, pp. 98–110, Apr. 1993.
153–171, 1999. [176] R. Krishnapuram, O. Nasraoui, and H. Frigui, “The fuzzy spherical
[148] J. Hwang, J. Vlontzos, and S. Kung, “A systolic neural network archi- shells algorithm: A new approach,” IEEE Trans. Neural Netw., vol. 3,
tecture for hidden Markov models,” IEEE Trans. Acoust., Speech, Signal no. 5, pp. 663–671, Sep. 1992.
Process., vol. 37, no. 12, pp. 1967–1979, Dec. 1989. [177] A. Krogh, M. Brown, I. Mian, K. Sjölander, and D. Haussler, “Hidden
[149] A. Hyvärinen, “Survey of independent component analysis,” Neural Markov models in computational biology: Applications to protein
Comput. Surv., vol. 2, pp. 94–128, 1999. mod- eling,” J. Molec. Biol., vol. 235, pp. 1501–1531, 1994.
[150] A. Jain and R. Dubes, Algorithms for Clustering Data. Englewood [178] G. Lance and W. Williams, “A general theory of classification sorting
Cliffs, NJ: Prentice-Hall, 1988. strategies: 1. Hierarchical systems,” Comput. J., vol. 9, pp. 373–380,
[151] A. Jain, R. Duin, and J. Mao, “Statistical pattern recognition: A 1967.
review,” [179] M. Law and J. Kwok, “Rival penalized competitive learning for
IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 1, pp. 4–37, 2000. model- based sequence clustering,” in Proc. 15th Int. Conf. Pattern
[152] A. Jain, M. Murty, and P. Flynn, “Data clustering: A review,” ACM Recognition, vol. 2, 2000, pp. 195–198.
Comput. Surv., vol. 31, no. 3, pp. 264–323, 1999.
[180] Y. Leung, J. Zhang, and Z. Xu, “Clustering by scale-space filtering,” [207] T. Morzy, M. Wojciechowski, and M. Zakrzewicz, “Pattern-oriented
IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 12, pp. 1396– hierarchical clustering,” in Proc. 3rd East Eur. Conf. Advances in
1410, Dec. 2000. Databases and Information Systems, 1999, pp. 179–190.
[181] E. Levine and E. Domany, “Resampling method for unsupervised es- [208] S. Mulder and D. Wunsch, “Million city traveling salesman problem
timation of cluster validity,” Neural Comput., vol. 13, pp. 2573–2593, solution by divide and conquer clustering with adaptive resonance neural
2001. networks,” Neural Netw., vol. 16, pp. 827–832, 2003.
[182] C. Li and G. Biswas, “Temporal pattern generation using hidden [209] K. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, “An intro-
Markov model based unsupervised classification,” in Advances in duction to kernel-based learning algorithms,” IEEE Trans. Neural Netw.,
Intelligent Data Analysis. ser. Lecture Notes in Computer Science, D. vol. 12, no. 2, pp. 181–201, Mar. 2001.
Hand, K. Kok, and M. Berthold, Eds. New York: Springer-Verlag, [210] F. Murtagh, “A survey of recent advances in hierarchical clustering al-
1999, vol. 1642. gorithms,” Comput. J., vol. 26, no. 4, pp. 354–359, 1983.
[183] , “Unsupervised learning with mixed numeric and nominal data,” [211] F. Murtagh and M. Berry, “Overcoming the curse of dimensionality in
IEEE Trans. Knowl. Data Eng., vol. 14, no. 4, pp. 673–690, Jul.-Aug. clustering by means of the wavelet transform,” Comput. J., vol. 43, no.
2002. 2, pp. 107–120, 2000.
[184] C. Li, H. Garcia-Molina, and G. Wiederhold, “Clustering for approxi- [212] S. Needleman and C. Wunsch, “A general method applicable to the
mate similarity search in high-dimensional spaces,” IEEE Trans. Knowl. search for similarities in the amino acid sequence of two proteins,” J.
Data Eng., vol. 14, no. 4, pp. 792–808, Jul.-Aug. 2002. Molec. Biol., vol. 48, pp. 443–453, 1970.
[185] W. Li, L. Jaroszewski, and A. Godzik, “Clustering of highly [213] R. Ng and J. Han, “CLARANS: A method for clustering objects for
homologous sequences to reduce the size of large protein databases,” spatial data mining,” IEEE Trans. Knowl. Data Eng., vol. 14, no. 5,
Bioinformatics, vol. 17, pp. 282–283, 2001. pp. 1003–1016, Sep.-Oct. 2002.
[186] A. Likas, N. Vlassis, and J. Verbeek, “The global -means clustering
[214] T. Oates, L. Firoiu, and P. Cohen, “Using dynamic time warping to boot-
algorithm,” Pattern Recognit., vol. 36, no. 2, pp. 451–461, 2003.
strap HMM-based clustering of time series,” in Sequence Learning.
[187] S. Lin and B. Kernighan, “An effective heuristic algorithm for the
ser. LNAI 1828, R. Sun and C. Giles, Eds. Berlin, Germany: Springer-
trav- eling salesman problem,” Operat. Res., vol. 21, pp. 498–516,
1973. Verlag, 2000, pp. 35–52.
[188] R. Lipshutz, S. Fodor, T. Gingeras, and D. Lockhart, “High density [215] E. Oja, “Principal components minor components, and linear neural net-
synthetic oligonucleotide arrays,” Nature Genetics, vol. 21, pp. 20–24, works,” Neural Netw., vol. 5, pp. 927–935, 1992.
1999. [216] J. Oliver, R. Baxter, and C. Wallace, “Unsupervised learning using
[189] G. Liu, Introduction to Combinatorial Mathematics. New York: Mc- MML,” in Proc. 13th Int. Conf. Machine Learning (ICML’96),
Graw-Hill, 1968. Lorenza, Saitta, 1996, pp. 364–372.
[190] J. Lozano and P. Larrañaga, “Applying genetic algorithms to search for [217] C. Olson, “Parallel algorithms for hierarchical clustering,” Parallel
the best hierarchical clustering of a dataset,” Pattern Recognit. Lett., vol. Comput., vol. 21, pp. 1313–1325, 1995.
20, pp. 911–918, 1999. [218] C. Ordonez and E. Omiecinski, “Efficient disk-based K-means clus-
[191] J. MacQueen, “Some methods for classification and analysis of mul- tering for relational databases,” IEEE Trans. Knowl. Data Eng., vol.
tivariate observations,” in Proc. 5th Berkeley Symp., vol. 1, 1967, pp. 16, no. 8, pp. 909–921, Aug. 2004.
281–297. [219] L. Owsley, L. Atlas, and G. Bernard, “Self-organizing feature maps
[192] S. C. Madeira and A. L. Oliveira, “Biclustering algorithms for biolog- and hidden Markov models for machine-tool monitoring,” IEEE
ical data analysis: A survey,” IEEE/ACM Trans. Computat. Biol. Trans. Signal Process., vol. 45, no. 11, pp. 2787–2798, Nov. 1997.
Bioin- formatics, vol. 1, no. 1, pp. 24–45, Jan. 2004. [220] N. Pal and J. Bezdek, “On cluster validity for the fuzzy -means model,”
[193] Y. Man and I. Gath, “Detection and separation of ring-shaped clusters IEEE Trans. Fuzzy Syst., vol. 3, no. 3, pp. 370–379, Aug. 1995.
using fuzzy clustering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, [221] N. Pal, J. Bezdek, and E. Tsao, “Generalized clustering networks and
no. 8, pp. 855–861, Aug. 1994. Kohonen’s self-organizing scheme,” IEEE Trans. Neural Netw., vol.
[194] J. Mao and A. Jain, “A self-organizing network for hyperellipsoidal 4, no. 4, pp. 549–557, Jul. 1993.
clus- tering (HEC),” IEEE Trans. Neural Netw., vol. 7, no. 1, pp. 16– [222] G. Patanè and M. Russo, “The enhanced-LBG algorithm,” Neural Netw.,
29, Jan. 1996. vol. 14, no. 9, pp. 1219–1237, 2001.
[195] U. Maulik and S. Bandyopadhyay, “Genetic algorithm-based [223] , “Fully automatic clustering system,” IEEE Trans. Neural Netw.,
clustering technique,” Pattern Recognit., vol. 33, pp. 1455–1465, vol. 13, no. 6, pp. 1285–1298, Nov. 2002.
2000. [224] W. Pearson, “Improved tools for biological sequence comparison,” Proc.
[196] G. McLachlan and T. Krishnan, The EM Algorithm and Exten- Nat. Acad. Sci., vol. 85, pp. 2444–2448, 1988.
sions. New York: Wiley, 1997.
[225] D. Peel and G. McLachlan, “Robust mixture modeling using the t-dis-
[197] G. McLachlan and D. Peel, Finite Mixture Models. New York: Wiley,
tribution,” Statist. Comput., vol. 10, pp. 339–348, 2000.
2000.
[198] G. McLachlan, D. Peel, K. Basford, and P. Adams, “The EMMIX [226] D. Pelleg and A. Moore, “X-means: Extending -means with efficient
soft- ware for the fitting of mixtures of normal and t-components,” J. estimation of the number of clusters,” in Proc. 17th Int. Conf.
Statist. Software, vol. 4, 1999. Machine Learning (ICML’00), 2000, pp. 727–734.
[199] C. Miller, J. Gurd, and A. Brass, “A RAPID algorithm for sequence data- [227] J. Peña, J. Lozano, and P. Larrañaga, “An empirical comparison of four
base comparisons: Application to the identification of vector contami- initialization methods for the -means algorithm,” Pattern Recognit.
nation in the EMBL databases,” Bioinformatics, vol. 15, pp. 111–121, Lett., vol. 20, pp. 1027–1040, 1999.
1999. [228] C. Pizzuti and D. Talia, “P-AutoClass: Scalable parallel clustering for
[200] R. Miller et al., “A comprehensive approach to clustering of expressed mining large data sets,” IEEE Trans. Knowl. Data Eng., vol. 15, no. 3,
human gene sequence: The sequence tag alignment and consensus pp. 629–641, May-Jun. 2003.
knowledge base,” Genome Res., vol. 9, pp. 1143–1155, 1999. [229] L. Rabiner, “A tutorial on hidden Markov models and selected
[201] W. Miller, “Comparison of genomic DNA sequences: Solved and un- applica- tions in speech recognition,” Proc. IEEE, vol. 77, no. 2, pp.
solved problems,” Bioinformatics, vol. 17, pp. 391–397, 2001. 257–286, Feb. 1989.
[202] G. Milligan and M. Cooper, “An examination of procedures for deter- [230] Ralf-Herwig, A. Poustka, C. Müller, C. Bull, H. Lehrach, and
mining the number of clusters in a data set,” Psychometrika, vol. 50, J. O’Brien, “Large-scale clustering of cDNA-fingerprinting data,”
pp. 159–179, 1985. Genome Res., pp. 1093–1105, 1999.
[203] R. Mollineda and E. Vidal, “A relative approach to hierarchical [231] A. Rauber, J. Paralic, and E. Pampalk, “Empirical evaluation of clus-
clustering,” in Pattern Recognition and Applications, Frontiers in tering algorithms,” J. Inf. Org. Sci., vol. 24, no. 2, pp. 195–209, 2000.
Artificial Intelligence and Applications, M. Torres and A. Sanfeliu, [232] S. Ridella, S. Rovetta, and R. Zunino, “Plastic algorithm for adaptive
Eds. Amsterdam, The Netherlands: IOS Press, 2000, vol. 56, pp. 19– vector quantization,” Neural Comput. Appl., vol. 7, pp. 37–51, 1998.
28. [233] J. Rissanen, “Fisher information and stochastic complexity,” IEEE
[204] B. Moore, “ART1 and pattern clustering,” in Proc. 1988 Trans. Inf. Theory, vol. 42, no. 1, pp. 40–47, Jan. 1996.
Connectionist Models Summer School, 1989, pp. 174–185. [234] K. Rose, “Deterministic annealing for clustering, compression,
[205] S. Moore, “Making chips to probe genes,” IEEE Spectr., vol. 38, no. classifi- cation, regression, and related optimization problems,” Proc.
3, pp. 54–60, Mar. 2001. IEEE, vol. 86, no. 11, pp. 2210–2239, Nov. 1998.
[206] Y. Moreau, F. Smet, G. Thijs, K. Marchal, and B. Moor, “Functional
bioinformatics of microarray data: From expression to regulation,”
Proc. IEEE, vol. 90, no. 11, pp. 1722–1743, Nov. 2002.
[235] S. Roweis and L. Saul, “Nonlinear dimensionality reduction by locally [262] K. Stoffel and A. Belkoniene, “Parallel -means clustering for
linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[236] D. Sankoff and J. Kruskal, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Stanford, CA: CSLI Publications, 1999.
[237] O. Sasson, N. Linial, and M. Linial, “The metric space of proteins—Comparative study of clustering algorithms,” Bioinformatics, vol. 18, pp. s14–s21, 2002.
[238] U. Scherf, D. Ross, M. Waltham, L. Smith, J. Lee, L. Tanabe, K. Kohn, W. Reinhold, T. Myers, D. Andrews, D. Scudiero, M. Eisen, E. Sausville, Y. Pommier, D. Botstein, P. Brown, and J. Weinstein, “A gene expression database for the molecular pharmacology of cancer,” Nature Genetics, vol. 24, no. 3, pp. 236–244, 2000.
[239] P. Scheunders, “A comparison of clustering algorithms applied to color image quantization,” Pattern Recognit. Lett., vol. 18, pp. 1379–1384, 1997.
[240] B. Schölkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press, 2002.
[241] B. Schölkopf, A. Smola, and K. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computat., vol. 10, no. 5, pp. 1299–1319, 1998.
[242] G. Schwarz, “Estimating the dimension of a model,” Ann. Statist., vol. 6, no. 2, pp. 461–464, 1978.
[243] G. Scott, D. Clark, and T. Pham, “A genetic clustering algorithm guided by a descent algorithm,” in Proc. Congr. Evolutionary Computation, vol. 2, Piscataway, NJ, 2001, pp. 734–740.
[244] P. Sebastiani, M. Ramoni, and P. Cohen, “Sequence learning via Bayesian clustering by dynamics,” in Sequence Learning, ser. LNAI 1828, R. Sun and C. Giles, Eds. Berlin, Germany: Springer-Verlag, 2000, pp. 11–34.
[245] S. Selim and K. Alsultan, “A simulated annealing algorithm for the clustering problems,” Pattern Recognit., vol. 24, no. 10, pp. 1003–1008, 1991.
[246] R. Shamir and R. Sharan, “Algorithmic approaches to clustering gene expression data,” in Current Topics in Computational Molecular Biology, T. Jiang, T. Smith, Y. Xu, and M. Zhang, Eds. Cambridge, MA: MIT Press, 2002, pp. 269–300.
[247] R. Sharan and R. Shamir, “CLICK: A clustering algorithm with applications to gene expression analysis,” in Proc. 8th Int. Conf. Intelligent Systems for Molecular Biology, 2000, pp. 307–316.
[248] G. Sheikholeslami, S. Chatterjee, and A. Zhang, “WaveCluster: A multi-resolution clustering approach for very large spatial databases,” in Proc. 24th VLDB Conf., 1998, pp. 428–439.
[249] P. Simpson, “Fuzzy min-max neural networks—Part 2: Clustering,” IEEE Trans. Fuzzy Syst., vol. 1, no. 1, pp. 32–45, Feb. 1993.
[250] J. Sklansky and W. Siedlecki, “Large-scale feature selection,” in Handbook of Pattern Recognition and Computer Vision, C. Chen, L. Pau, and P. Wang, Eds. Singapore: World Scientific, 1993, pp. 61–124.
[251] T. Smith and M. Waterman, “New stratigraphic correlation techniques,” J. Geology, vol. 88, pp. 451–457, 1980.
[252] P. Smyth, “Clustering using Monte Carlo cross-validation,” in Proc. 2nd Int. Conf. Knowledge Discovery and Data Mining, 1996, pp. 126–133.
[253] P. Smyth, “Clustering sequences with hidden Markov models,” in Advances in Neural Information Processing, M. Mozer, M. Jordan, and T. Petsche, Eds. Cambridge, MA: MIT Press, 1997, vol. 9, pp. 648–654.
[254] P. Smyth, “Model selection for probabilistic clustering using cross-validated likelihood,” Statist. Comput., vol. 10, pp. 63–72, 1998.
[255] P. Smyth, “Probabilistic model-based clustering of multivariate and sequential data,” in Proc. 7th Int. Workshop on Artificial Intelligence and Statistics, 1999, pp. 299–304.
[256] P. Sneath, “The application of computers to taxonomy,” J. Gen. Microbiol., vol. 17, pp. 201–226, 1957.
[257] P. Somervuo and T. Kohonen, “Clustering and visualization of large protein sequence databases by means of an extension of the self-organizing map,” in LNAI 1967, 2000, pp. 76–85.
[258] T. Sorensen, “A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons,” Biologiske Skrifter, vol. 5, pp. 1–34, 1948.
[259] H. Späth, Cluster Analysis Algorithms. Chichester, U.K.: Ellis Horwood, 1980.
[260] P. Spellman, G. Sherlock, M. Ma, V. Iyer, K. Anders, M. Eisen, P. Brown, D. Botstein, and B. Futcher, “Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization,” Mol. Biol. Cell, vol. 9, pp. 3273–3297, 1998.
[261] “Tech. Rep. 00-034,” Univ. Minnesota, Minneapolis, 2000.
large data sets,” in Proc. EuroPar’99 Parallel Processing, 1999, pp. 1451–1454.
[263] M. Su and H. Chang, “Fast self-organizing feature map algorithm,” IEEE Trans. Neural Netw., vol. 11, no. 3, pp. 721–733, May 2000.
[264] M. Su and C. Chou, “A modified version of the k-means algorithm with a distance based on cluster symmetry,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 674–680, Jun. 2001.
[265] R. Sun and C. Giles, “Sequence learning: Paradigms, algorithms, and applications,” in LNAI 1828. Berlin, Germany: Springer-Verlag, 2000.
[266] C. Sung and H. Jin, “A Tabu-search-based heuristic for clustering,” Pattern Recognit., vol. 33, pp. 849–858, 2000.
[267] SWISS-PROT Protein Knowledgebase Release 45.0 Statistics.
[268] P. Tamayo, D. Slonim, J. Mesirov, Q. Zhu, S. Kitareewan, E. Dmitrovsky, E. Lander, and T. Golub, “Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation,” Proc. Nat. Acad. Sci., pp. 2907–2912, 1999.
[269] S. Tavazoie, J. Hughes, M. Campbell, R. Cho, and G. Church, “Systematic determination of genetic network architecture,” Nature Genetics, vol. 22, pp. 281–285, 1999.
[270] J. Tenenbaum, V. Silva, and J. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, pp. 2319–2323, 2000.
[271] R. Tibshirani, T. Hastie, M. Eisen, D. Ross, D. Botstein, and P. Brown, “Clustering methods for the analysis of DNA microarray data,” Dept. Statist., Stanford Univ., Stanford, CA, Tech. Rep.
[272] R. Tibshirani and K. Knight, “The covariance inflation criterion for adaptive model selection,” J. Roy. Statist. Soc. B, vol. 61, pp. 529–546, 1999.
[273] L. Tseng and S. Yang, “A genetic approach to the automatic clustering problem,” Pattern Recognit., vol. 34, pp. 415–424, 2001.
[274] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[275] J. Venter et al., “The sequence of the human genome,” Science, vol. 291, pp. 1304–1351, 2001.
[276] J. Vesanto and E. Alhoniemi, “Clustering of the self-organizing map,” IEEE Trans. Neural Netw., vol. 11, no. 3, pp. 586–600, May 2000.
[277] K. Wagstaff, S. Rogers, and S. Schroedl, “Constrained k-means clustering with background knowledge,” in Proc. 8th Int. Conf. Machine Learning, 2001, pp. 577–584.
[278] C. Wallace and D. Dowe, “Intrinsic classification by MML—The SNOB program,” in Proc. 7th Australian Joint Conf. Artificial Intelligence, 1994, pp. 37–44.
[279] H. Wang, W. Wang, J. Yang, and P. Yu, “Clustering by pattern similarity in large data sets,” in Proc. ACM SIGMOD Int. Conf. Management of Data, 2002, pp. 394–405.
[280] C. Wei, Y. Lee, and C. Hsu, “Empirical comparison of fast clustering algorithms for large data sets,” in Proc. 33rd Hawaii Int. Conf. System Sciences, Maui, HI, 2000, pp. 1–10.
[281] J. Williamson, “Gaussian ARTMAP: A neural network for fast incremental learning of noisy multidimensional maps,” Neural Netw., vol. 9, no. 5, pp. 881–897, 1996.
[282] M. Windham and A. Cutler, “Information ratios for validating mixture analysis,” J. Amer. Statist. Assoc., vol. 87, pp. 1188–1192, 1992.
[283] S. Wu, A. W.-C. Liew, H. Yan, and M. Yang, “Cluster analysis of gene expression data based on self-splitting and merging competitive learning,” IEEE Trans. Inf. Technol. Biomed., vol. 8, no. 1, pp. 5–15, Jan. 2004.
[284] D. Wunsch, “An optoelectronic learning machine: Invention, experimentation, analysis of first hardware implementation of the ART1 neural network,” Ph.D. dissertation, Univ. Washington, Seattle, WA, 1991.
[285] D. Wunsch, T. Caudell, C. Capps, R. Marks, and R. Falk, “An optoelectronic implementation of the adaptive resonance neural network,” IEEE Trans. Neural Netw., vol. 4, no. 4, pp. 673–684, Jul. 1993.
[286] Y. Xiong and D. Yeung, “Mixtures of ARMA models for model-based time series clustering,” in Proc. IEEE Int. Conf. Data Mining, 2002, pp. 717–720.
[287] R. Xu, G. Anagnostopoulos, and D. Wunsch, “Tissue classification through analysis of gene expression data using a new family of ART architectures,” in Proc. Int. Joint Conf. Neural Networks (IJCNN’02), vol. 1, 2002, pp. 300–304.
[288] Y. Xu, V. Olman, and D. Xu, “Clustering gene expression data using graph-theoretic approach: An application of minimum spanning trees,” Bioinformatics, vol. 18, no. 4, pp. 536–545, 2002.
[289] R. Yager, “Intelligent control of the hierarchical agglomerative clustering process,” IEEE Trans. Syst., Man, Cybern., vol. 30, no. 6, pp. 835–845, 2000.
[290] R. Yager and D. Filev, “Approximate clustering via the mountain method,” IEEE Trans. Syst., Man, Cybern., vol. 24, no. 8, pp. 1279–1284, 1994.
[291] K. Yeung, D. Haynor, and W. Ruzzo, “Validating clustering for gene expression data,” Bioinformatics, vol. 17, no. 4, pp. 309–318, 2001.
[292] F. Young and R. Hamer, Multidimensional Scaling: History, Theory, and Applications. Hillsdale, NJ: Lawrence Erlbaum, 1987.
[293] L. Zadeh, “Fuzzy sets,” Inf. Control, vol. 8, pp. 338–353, 1965.
[294] J. Zhang and Y. Leung, “Improved possibilistic C-means clustering algorithms,” IEEE Trans. Fuzzy Syst., vol. 12, no. 2, pp. 209–217, Apr. 2004.
[295] T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: An efficient data clustering method for very large databases,” in Proc. ACM SIGMOD Conf. Management of Data, 1996, pp. 103–114.
[296] Y. Zhang and Z. Liu, “Self-splitting competitive learning: A new on-line clustering paradigm,” IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 369–380, Mar. 2002.
[297] X. Zhuang, Y. Huang, K. Palaniappan, and Y. Zhao, “Gaussian mixture density modeling, decomposition, and applications,” IEEE Trans. Image Process., vol. 5, no. 9, pp. 1293–1302, Sep. 1996.

Donald C. Wunsch II (S’87–M’92–SM’94–F’05) received the B.S. degree in applied mathematics from the University of New Mexico, Albuquerque, and the M.S. degree in applied mathematics and the Ph.D. degree in electrical engineering from the University of Washington, Seattle.
He is the Mary K. Finley Missouri Distinguished Professor of Computer Engineering, University of Missouri-Rolla, where he has been since 1999. His prior positions were Associate Professor and Director of the Applied Computational Intelligence Laboratory, Texas Tech University, Lubbock; Senior Principal Scientist, Boeing; Consultant, Rockwell International; and Technician, International Laser Systems. He has well over 200 publications and has attracted over $5 million in research funding. He has produced eight Ph.D. recipients: four in electrical engineering, three in computer engineering, and one in computer science.
Dr. Wunsch has received the Halliburton Award for Excellence in Teaching and Research and the National Science Foundation CAREER Award. He served as a Voting Member of the IEEE Neural Networks Council, Technical Program Co-Chair for IJCNN’02, General Chair for IJCNN’03, and International Neural Networks Society Board of Governors Member, and is now President of the International Neural Networks Society.