Fuzzy Clustering Algorithms and Their Applications
Department of Computing
Imperial College of Science, Technology and Medicine
University of London, London SW7 2AZ.
December 2000.
Abstract
The general problem of data clustering is concerned with the discovery of a group-
ing structure within a finite number of data points. Fuzzy Clustering algorithms pro-
vide a fuzzy description of the discovered structure. The main advantage of this de-
scription is that it captures the imprecision encountered when describing real-life data.
Thus, the user is provided with more information about the structure in the data com-
pared to a crisp, non-fuzzy scheme.
During the early part of our research, we investigated the popular Fuzzy c-Means
(FCM) algorithm and in particular its problem of being unable to correctly identify
clusters with grossly different populations. We devised a suite of benchmark data
sets to investigate the reasons for this shortcoming. We found that the shortcoming
originates from the formulation of the objective function of FCM which allows clusters
with relatively large population and extent to dominate the solution. This led us to
formulate a new objective function, from which we subsequently derived the
Population Diameter Independent (PDI) algorithm. PDI
was tested on the same benchmark data used to study FCM and was found to perform
better than FCM. We have also analysed PDI’s behaviour and identified how it can be
further improved.
Since image segmentation is fundamentally a clustering problem, the next step was
to investigate the use of fuzzy clustering techniques for image segmentation. We have
identified the main decision points in this process. Furthermore, we have used fuzzy
clustering to detect the left ventricular blood pool in cardiac cine images. Specifically,
the images were of the Magnetic Resonance (MR) modality, containing blood velocity
data as well as tissue density data. We analysed the relative impact of the velocity
data on the accuracy of the detection. Our work would typically be used for
qualitative analysis of anatomical structures and quantitative analysis of anatomical
measures.
Dedication
Acknowledgments
While a thesis has a single author by definition, many people are responsible for
its existence. Dr Peter Burger, my supervisor, is perhaps the most important of these
people. Peter provided me with regular weekly meetings and many ideas. I wish to
thank him sincerely for being very supportive and friendly throughout the whole of
my PhD. I would also like to thank my examiners: Professor Michael Fairhurst of
the Electronic Engineering Department, University of Kent, Canterbury and Professor
Xiaohui Liu of the Department of Computer Science, Brunel University. I am grateful
to Dr. Daniel Rückert for commenting extensively on an earlier draft of this thesis. I
would also like to acknowledge Dr Guang-Zhong Yang for his help in my mock viva.
During my PhD journey I met a number of excellent people with whom I have
become good friends and therefore made the journey particularly enjoyable. I would
hope that we remain friends after we have all gone separate ways. Moustafa Ghanem:
thank you for your wise and light-hearted chats. Daniel Rückert and Gerardo Ivar
Sanchez-Ortiz: thank you for being special friends with whom boundaries faded. Ioan-
nis Akrotirianakis: thank you for our many shared magical moments. Tarkan Tahseen:
thank you for your friendship, inspiration, and all those netmaze sessions. Khurrum
Sair: thank you for putting up with all sorts of inconveniences from me and for our new
friendship. Outside of College, I would like to thank Atif Sharaf and Walid Zgallai for
being my good (half-Egyptian!) Arab friends with whom I shared many a good time.
On a more personal level, I would like to thank my parents, Amaal and Ismail —
their hard work gave me the opportunity to choose the path that led here — and my
sisters, Fatima and Iman, for their continuous support and encouragement.
Last but not least, I must thank the Department of Computing, Imperial College for
kindly allowing me to use its facilities even after the expiry of my registration period.
Contents
1 Introduction 17
1.1 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.1.1 Clustering Applications . . . . . . . . . . . . . . . . . . . . 19
1.1.2 Clustering Paradigms . . . . . . . . . . . . . . . . . . . . . . 21
1.1.3 Fuzzy Clustering . . . . . . . . . . . . . . . . . . . . . . . . 21
1.2 Image Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.3 General Framework and Motivation . . . . . . . . . . . . . . . . . . 24
1.4 Research Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.5 Main Research Contributions . . . . . . . . . . . . . . . . . . . . . . 25
1.6 Outline of this Dissertation . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.4 Example: Hard c-Means (HCM) . . . . . . . . . . . . . . 40
2.4 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3 Fuzzy Clustering 44
3.1 Fuzzy Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2 The Fuzzy c-Means Algorithm . . . . . . . . . . . . . . . . . 46
3.2.1 FCM Optimisation Model . . . . . . . . . . . . . . . . . . . 47
3.2.2 Conditions for Optimality . . . . . . . . . . . . . . . . . . . 48
3.2.3 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2.4 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2.5 Analysis of FCM Model . . . . . . . . . . . . . . . . . . . . 51
3.2.6 Notes on Using FCM . . . . . . . . . . . . . . . . . . . . . . 53
3.2.7 Strengths and Weaknesses . . . . . . . . . . . . . . . . . . . 54
3.3 Extensions of FCM . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.1 Fuzzy Covariance Clustering . . . . . . . . . . . . . . . . . . 56
3.3.2 Fuzzy c-Elliptotypes Clustering . . . . . . . . . . . . . . 58
3.3.3 Shell Clustering . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.4 Modifications to the FCM Model . . . . . . . . . . . . . . . . . . . . 60
3.4.1 Possibilistic c-Means (PCM) Clustering . . . . . . . . . . 60
3.4.2 High Contrast . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.4.3 Competitive Agglomeration . . . . . . . . . . . . . . . . . . 63
3.4.4 Credibilistic Clustering . . . . . . . . . . . . . . . . . . . . . 64
3.5 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2.3 Shape of FCM Objective Function . . . . . . . . . . . . . . . 86
4.3 Population-Diameter Independent Algorithm . . . . . . . . . . . . . 89
4.3.1 The New Objective Function . . . . . . . . . . . . . . . . . . 90
4.3.2 Conditions for Optimality . . . . . . . . . . . . . . . . . . . 91
4.3.3 PDI’s Improvement on FCM . . . . . . . . . . . . . . . . . . 91
4.4 Observations on PDI . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.4.1 Shape of Objective Function . . . . . . . . . . . . . . . . . . 101
4.4.2 Varying the r-Exponent . . . . . . . . . . . . . . . . . . 104
4.4.3 Resilience to Initialisation . . . . . . . . . . . . . . . . . . . 107
4.5 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . 107
7 Conclusions and Further Work 146
7.1 Summary of Main Results . . . . . . . . . . . . . . . . . . . . . . . 146
7.2 Further Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
7.3 Final Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
References 151
List of Figures
3.5 FCM results on data set containing noise points. . . . . . . . . . . . . 55
3.6 The Gustafson-Kessel Algorithm . . . . . . . . . . . . . . . . . . . . 57
5.1 The process of clustering image data for the purposes of segmentation. 111
5.2 MR brain images in three features. . . . . . . . . . . . . . . . . . . . 113
5.3 Intensity-based segmentation using clustering . . . . . . . . . . . . . 114
5.4 Locations of found prototypes plotted on top of the histogram of the previous image . . . . . . . . . . . . . 115
5.5 Intensity- and spatial-coordinates-based segmentation . . . . . . . . . 117
5.6 A membership image. . . . . . . . . . . . . . . . . . . . . . . . . . . 120
5.7 A colour-coded segmentation of MR image. . . . . . . . . . . . . . . 121
5.8 A synthetic image with w = 5 . . . . . . . . . . . . . . . . . . . . . 123
5.9 Intensity distribution for w = 3, ..., 11. . . . . . . . . . . . . . . . . 125
5.10 FCM and PDI results on synthetic images . . . . . . . . . . . . . . . 127
5.11 FCM and PDI results on synthetic images (contd.) . . . . . . . . . . . 128
List of Tables
CHAPTER 1
Introduction
This dissertation contributes to the subject area of Data Clustering, and also to the
application of Clustering to Image Analysis. Data clustering acts as an intelligent
tool, a method that allows the user to handle large volumes of data effectively. The
basic function of clustering is to transform data of any origin into a more compact
form, one that represents accurately the original data. The compact representation
should allow the user to deal with and utilise more effectively the original volume of
data. The accuracy of the clustering is vital because it would be counter-productive if
the compact form of the data does not accurately represent the original data. One of
our main contributions is addressing the accuracy of an established fuzzy clustering
algorithm.
In this introductory Chapter, we provide brief descriptions of the subjects of our re-
search, and establish the motivations and aims of the research we conducted. Section
1.5 provides a summary of the main research contributions presented in this disserta-
tion. The Chapter concludes with an outline of the remainder of the dissertation.
1.1 Clustering
Research on Clustering is well-established; it dates back to the 1950s and is widely re-
ported in various current journals. The research problem is concerned with discovering
a grouping structure within a number of objects.
The Swedish botanist Carolus Linnaeus, who was concerned with classification
in the plant and animal kingdom, wrote in his seminal 1737 work Genera Plantarum
[Everitt, 1974]:
All the real knowledge which we possess depends on methods by which we
distinguish the similar from the dissimilar. The greater number of nat-
ural distinctions this method comprehends the clearer becomes our idea
of things. The more numerous the objects which employ our attention the
more difficult it becomes to form such a method and the more necessary.
For we must not join in the same genus the horse and the swine, tho’ both
species had been one hoof’d nor separate in different genera the goat, the
reindeer, and the elk, tho’ they differ in the form of their horns. We ought
therefore by attentive and diligent observation to determine the limits of
the genera, since they cannot be determined a priori. This is the great
work, the important labour, for should the Genera be confused, all would
be confusion.
The explosion of sensory and textual information available to us today has caused
many data analysts to turn to clustering algorithms to make sense of the data (thereby
heeding Linnaeus’s warning on “confusion”). It has become a primary tool for so-
called knowledge discovery [Fayyad et al., 1996a; Fayyad et al., 1996b], data mining,
There is a case, though, for its inclusion back into the clustering domain, especially in concept-forming
and machine learning applications [Mirkin, 1999]. Our research, however, has followed the
established distinction between feature selection and clustering.
and intelligent data analysis [Liu, 2000]. In fact, the massively-sized data sets of these
applications have placed high demands on the performance of the computationally
expensive clustering algorithms.
1. Formulating hypotheses concerning the origin of the data (e.g., evolution stud-
ies).
3. Predicting the future behaviour of types of this data (e.g., modelling economic
processes).
If the temporal data tends to cluster, the predictive process can be simplified by
identifying patterns of temporal behaviour based on clusters. This can then be
generalised to similar types of data.
1.1.2 Clustering Paradigms

In Table 1.1 we list the five main clustering paradigms. We describe the main
feature of each paradigm and give recent examples from the literature. These
paradigms are not mutually exclusive and considerable overlap exists between them. In Chapter
2, we will concentrate on only the hierarchical and partitional paradigms.
1.1.3 Fuzzy Clustering

Our research has used the paradigm of fuzzy clustering, which is based on the elements
of fuzzy set theory. Fuzzy set theory employs the notion that for a given universe of
discourse, every element in the universe belongs to a varying degree to all sets defined
in the universe. In fuzzy clustering, the universe of discourse is all the objects and the
sets defined on the universe are the clusters. Objects are not classified as belonging
to one and only one cluster, but instead, they all possess a degree of membership with
each of the clusters.
The most widely used fuzzy clustering algorithm is called fuzzy c-means, or FCM.
In the five years from January 1995 to December 1999 there were 124 journal
papers containing “fuzzy c-means” in their titles or abstracts. The subject areas of
the journals were many and included Process Monitoring, Soil Science, and Protein
Engineering. The papers were split between those reporting on an application of FCM
and those reporting on improving its performance in some way. Being so widely used,
1.2 Image Analysis
Today, imaging plays an important role in medical diagnosis, and in planning, exe-
cuting, and evaluating surgical and radiotherapeutic procedures. The information
extracted from images may include functional descriptions of anatomical structures,
geometric models of anatomical structures, or diagnostic assessment.
Most medical imaging modalities provide data in two spatial dimensions (2D) as
well as in time (2D + time cine sequence). Data in three spatial dimensions (3D)
as well as in time (3D + time, so-called 4D) are also becoming common. The large
amount of data involved necessitates the identification or segmentation of the objects
of interest before further analysis can be made. The result of this segmentation process
is the grouping or labelling of pixels into meaningful regions or objects. Currently,
segmentation is often carried out manually by experienced clinicians or radiologists.
1.3 General Framework and Motivation
The ability to learn is an outstanding human faculty. This faculty allows us to interact
and deal successfully with new situations and to improve our performance at whatever
task we are performing. A simplified model of learning is that of a process over
time in which an agent uses its percepts, or perceptive input from sensors, to continuously add to
and refine its knowledge about its environment [Rumelhart et al., 1986; Russel & Norvig,
1995].
The discipline of science concerned with designing computer programs that learn,
so-called Machine Learning, concentrates on supervised learning methods [Niyogi,
1995; Mitchell, 1997]. These methods must be presented with prior training examples
so that they can perform in a successful manner when dealing with new data. The
training examples consist of a finite number of input-output pairs. From this training
set, the learning agent must discover the learning function so that when it is presented
with unencountered data, it produces a “reasonable” output. The learning function
represents the knowledge gained by the learning agent. Supervised methods, thus,
assume the existence of a training set for the percepts of the learning agent. What
about when there is no training set, as is often the case in early learning experiences?
Here, unsupervised learning methods must be used. These methods operate on only
the input percepts because no training examples are available. They must work on the
basis of minimal assumptions about the data. Thus, it is these methods that most closely capture
the formative part of learning [Michalski & Stepp, 1983; Stepp & Michalski,
1986]. Unsupervised learning acts as an exploratory tool, a tool by means of which a
preliminary model may be generated.
One of the primary unsupervised learning methods is clustering. Thus, the research
we carried out was motivated by the desire to improve and develop clustering methods
further so that better learning agents may be built.
Our research was also motivated by another interest. Human beings can by seeing
a picture recognise things in it as well as learn new things about the scene. Studies of
the human visual system suggest that one of the primary operations carried out is clus-
tering of visual sensory data [Ahuja & Tuceryan, 1989; Mohan, 1992; Li, 1997]. The
research we undertook, particularly in the application of clustering to image analysis,
was motivated by the similarities between clustering and perceptual grouping in the
human visual system.
1.4 Research Aims

1. To investigate the main fuzzy clustering algorithms and to identify their strengths
and weaknesses.
2. To study the process of using clustering for image segmentation and analysis.
1.5 Main Research Contributions

1. We studied and investigated the FCM algorithm thoroughly and identified its
main strengths and weaknesses.
3. We proposed a new algorithm, based on FCM, which performs far more accu-
rately than FCM on data sets like those described above. We also investigated
performance properties of our new algorithm.
5. We carried out a case study in which we applied fuzzy clustering as the main
image analysis tool for a novel type of image in cardiac Magnetic Resonance
Imaging (MRI).
1.6 Outline of this Dissertation

This dissertation can be viewed as constituting two parts: the first part is concerned
with the clustering of data of any type, whereas the second part is concerned with the
clustering of data extracted from images. Chapters 2, 3, and 4 focus on the first part,
and Chapters 5 and 6 focus on the second part.
Chapter 2, The Basics of Data Clustering, furnishes the reader with the general
framework of the data clustering problem. The nomenclature that we used throughout
the dissertation is presented. Examples of data typically used in clustering papers are
shown. Hierarchical and Partitional clustering are described. A brief outline of two
well-established clustering algorithms is given in order to familiarise the reader with
the approaches used in solving the clustering problem. Finally, a brief commentary on
new ideas in the clustering literature is presented.
Chapter 3, Fuzzy Clustering, presents a critical review of the fuzzy clustering field,
but particularly algorithms based on an objective function model and relating to FCM.
First, the FCM algorithm is examined in detail. Second, extensions and developments
on FCM are briefly reviewed. The Chapter concludes with an overview of the weak-
nesses of FCM.
Chapter 4, A New Algorithm for Fuzzy Clustering, presents the Population Diame-
ter Independent (PDI) algorithm. This is an algorithm we propose that alleviates one of
the important weaknesses of the FCM algorithm which is its tendency to mis-classify
a data set containing smaller clusters located close to larger ones. An experiment is
presented to analyse FCM’s shortcoming and to motivate the new algorithm, PDI. The
name Population-Diameter Independent is given to the algorithm because its performance
remains more accurate than FCM's, independently of the populations and
diameters of the clusters involved. The Chapter concludes with a review of some of PDI's
performance parameters.
Chapter 6, Application to Medical Image Analysis, presents the results of our work
to analyse Magnetic Resonance cardiac images. The work aims to track the left ven-
tricle in cine images of the heart. The types of image we used contain velocity data
as well as tissue density data. We followed the framework we outlined previously and
conclude by reporting our results on this novel application.
Chapter 7, Conclusions and Further Work, summarises the conclusions of our re-
search and outlines several ideas for further work based on the results we achieved.
CHAPTER 2
The Basics of Data Clustering
In this Chapter, we expand on this definition and provide an introduction to the field.
We defer the subject of Fuzzy Clustering to the next Chapter. Definitions of the nomen-
clature used for the remainder of the dissertation are provided in Section 2.1, and ex-
amples of dot patterns encountered in clustering literature are presented in Section 2.2.
Classically, clustering algorithms have been divided into Partitional and Hierar-
chical. In Section 2.3, hierarchical and partitional algorithms are described with the
specific examples of the Single Link hierarchical algorithm and the Hard c-Means
(HCM) partitional algorithm. The Chapter concludes with a brief review of new directions
in clustering.
2.1 Notation and Terminology
(Data matrix: N observations, each described by p features.)

X = \{ x_1, x_2, \ldots, x_N \}
where

S_i \cap S_j = \emptyset, \qquad i, j \in \{1, \ldots, c\}, \; i \neq j
and

S_i \neq \emptyset, \qquad i \in \{1, \ldots, c\}
In fuzzy clustering (described in detail in Chapter 3), the goal would be to find
the partition matrix, U. The partition matrix is a real c × N matrix that defines
membership degrees for each feature vector. U is defined by:

U = [u_{ik}] \in \mathbb{R}^{c \times N}, \qquad i \in \{1, \ldots, c\}, \; k \in \{1, \ldots, N\}
Clusters should contain feature vectors relatively similar to one another. In the
general case, therefore, the results of a given clustering method very much depend on
the similarity measure used. The similarity measure will provide an indication of
proximity, likeness, affinity, or association. The more two data objects resemble one
another, the larger a similarity index and, conversely, the smaller a dissimilarity index.
In this sense, the Euclidean distance between two data vectors is a dissimilarity index,
whereas the correlation is a similarity index.
Data sets may not always contain only numeric data. Many feature observations,
especially data collected from humans, are binary. These would require an appropriate
similarity measure, such as matching coefficients. In some cases, feature observa-
tions would have been obtained from a time-series. An appropriate similarity measure
should then take account of the temporal nature of these data. Furthermore, there are
situations where the features would be of mixed types, or when data observations are
missing. We refer the reader to [Backer, 1995] for an introduction to common ways of
extracting similarity measures for binary, mixed, and missing data.
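As an illustrative sketch (our own, not from this dissertation), the following Python fragment computes two common matching-based similarity measures for binary feature vectors; the function names and the example vectors are our own choices:

import numpy as np

def simple_matching_coefficient(a, b):
    # Fraction of feature positions in which the two binary vectors agree.
    a, b = np.asarray(a), np.asarray(b)
    return float(np.mean(a == b))

def jaccard_coefficient(a, b):
    # Ignores 0-0 agreements; useful when presence (1) is more informative
    # than absence (0).
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    either = np.sum(a | b)
    return float(np.sum(a & b) / either) if either else 1.0

print(simple_matching_coefficient([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
print(jaccard_coefficient([1, 0, 1, 1], [1, 1, 1, 0]))          # 0.5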
In some applications dissimilarity data are collected directly in the form of a dissimilarity matrix.
Most partitional clustering methods utilise the concept that for a given cluster
i, there exists an ideal point p_i, such that p_i ∈ R^p, which best represents cluster
i’s members. This point is called the prototype of the cluster. Thus, the clustering
problem becomes that of finding a set of prototypes,
P = \{ p_1, p_2, \ldots, p_c \}, \qquad p_i \in \mathbb{R}^p \; \forall i \in \{1, \ldots, c\}
that best represent the clustering structure in X .
We note that in the general case, prototypes are not restricted to points. This is so
that they can better represent any possible cluster shape. For example, a ring-shaped
cluster would be best represented with a circle-prototype. Further, a prototype may be
composed of a set of points instead of a single point. However, choosing non-single-
point prototypes renders the clustering problem harder. We do not delve into this in
this dissertation. For now, we assume a set of single-point prototypes as defined above.
The most commonly used distance metric is the Euclidean distance:

\| x_k - p_i \| = \sqrt{ (x_{k1} - p_{i1})^2 + \cdots + (x_{kp} - p_{ip})^2 }   (2.1)

If scale-invariance is desired, the Mahalanobis metric, d^2(x_k, p_i) = (x_k - p_i)^T C_x^{-1} (x_k - p_i), can be used instead,
where C_x is the covariance matrix of X. The price to pay for the scale-invariance of the
Mahalanobis metric is the determination of the covariance matrix and the added computational
complexity.
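As a hedged illustration (our own sketch, not the dissertation's code), the following Python fragment computes the Euclidean distance of Equation 2.1 and the scale-invariant Mahalanobis distance on a small made-up data set:

import numpy as np

def euclidean(x_k, p_i):
    # Equation 2.1: ordinary Euclidean distance.
    return float(np.sqrt(np.sum((x_k - p_i) ** 2)))

def mahalanobis(x_k, p_i, C_x_inv):
    # Distance weighted by the inverse covariance matrix of the whole data set X.
    d = x_k - p_i
    return float(np.sqrt(d @ C_x_inv @ d))

X = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [4.0, 40.0]])
C_x_inv = np.linalg.inv(np.cov(X, rowvar=False))

x, p = X[0], X.mean(axis=0)
print(euclidean(x, p))             # dominated by the larger-scale second feature
print(mahalanobis(x, p, C_x_inv))  # unaffected by the difference in feature scales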
Figure 2.2: Sample dot patterns of two clusters of varying densities and separation —
reproduced from [Zahn, 1971].
The six dot patterns of Figures 2.2(a)-(f) show a pair of clusters
with varying point densities and varying degrees of separation. In
(a), the two clusters have approximately the same density. In (b), they have unequal
densities. In (c), the densities vary proportionally to the distance from the mean. In
(d), the clusters have smoothly varying densities, and the separation appears nearest
the points of highest densities of the two classes. In (e) and (f), the separation between
clusters becomes almost non-existent, as the clusters touch each other. These six dot
patterns should not pose a problem to most of the established algorithms available to-
day. However, in certain situations the accuracy of detected clustering structure may
be compromised. Our research has examined this issue in detail and we shall describe
our results in Chapter 4.
Figure 2.3 shows a plot of clusters of linear fragments with a branch-like structure.
Here, humans might themselves be unable to agree on whether there is any clustering
and if so, what it is. However, given the information that plots of this kind consist
of linear fragments, most of us would not have problems identifying the clustering
structure. On the other hand, clustering algorithms that are specifically designed to
detect linear cluster structures might fail.
Figure 2.4 shows a plot of two well defined clusters, but in a different type of pat-
tern than that of Figures 2.2(a), (b), or (c). Here, the performance of many algorithms
would be ad-hoc, depending on the length of each “string” of points and how far the
strings are apart.
Figure 2.5 shows a plot of clusters with one class enclosed by the other, but both
well-defined. With the exception of the shell clustering algorithms, no other algo-
rithms would be capable of handling ring-like patterns like this. Shell clustering is a
recent development in cluster analysis and suffers from the fact that it looks for only
shells. The non-ring shaped cluster within the shell in Figure 2.5 may confuse such
algorithms.
Figure 2.6 shows a point pattern that may have been extracted from an image pro-
cessing application. Region- or edge-based operators may have been applied to the
original image and the resulting image then thresholded. Most clustering algorithms
we know would fail with this point pattern because of the containment of one group of
points within another.
Having realised the limited success achieved in Clustering so far, we should hasten
to add that with regards to the point patterns of Figure 2.6 our expectations are mainly
Figure 2.6: Sample dot pattern possibly extracted from an image after some image-
processing stages.
2.3 Hierarchical and Partitional Clustering

Clustering methods tend to be divided in the literature into hierarchical and partitional
methods. In hierarchical clustering (the older of the two), a tree-structured partitioning
of the data set is produced. The tree is either constructed top-down or bottom-up, with
an all-inclusive cluster at the top of it and the individual data points at the bottom of it.
Different partitions may be suggested according to where we “cut” the tree.
Hierarchical clustering algorithms transform a proximity data set into a tree-like struc-
ture which for historical reasons is called a dendrogram [Jardine & Sibson, 1971]. The
Figure 2.7: An example of the dendrogram that might be produced by a hierarchical algorithm
from the data shown on the right. The dotted lines indicate different partitions
at different levels of dissimilarity.
dendrogram is constructed as a sequence of partitions such that its root is a cluster covering
all the points and the leaves are clusters containing only one point. In the middle,
child clusters partition the points assigned to their common parent according to a dissimilarity
level. This is illustrated in Figure 2.7. (We remark that the dendrogram is not
a binary tree.) The dendrogram is most useful up to a few levels deep, as the clustering
becomes more trivial as the tree depth increases.
2 Find the smallest entry in the dissimilarity matrix and merge the corresponding
two clusters
3 Update the dissimilarities between the new cluster and other clusters
Agglomerative algorithms differ in the way the dissimilarities are updated in step (3). In Section 2.3.2 we describe one such way.
Both agglomerative and divisive techniques suffer from the fact that if, say, at one
point during the construction of the dendrogram, a misclassification is made, it is built
on until the end of the process. At some point of the dendrogram's growth an observation
may be designated as belonging to a cluster in the hierarchy. It remains associated
with the successors of that cluster until the dendrogram is finished. It is impossible to
correct this misclassification while the clustering process is still in progress. Optimisation of
clusterings is then called for [Fisher, 1996].
After the tree has been produced, a multitude of possible clustering interpretations
are available. A practical problem with hierarchical clustering, thus, is: at which value
of dissimilarity should the dendrogram be cut, or in other words, at which level should
the tree be cut. One heuristic commonly used is to choose that value of dissimilarity
where there is a large "gap" in the dendrogram. This assumes that a cluster that merges
at a much higher value of dissimilarity than that at which it was formed is more “mean-
ingful”. However, this heuristic does not work all the time [Jain, 1986].
2 The smallest entry in the matrix is chosen, and the two points, a and b, are fused
together as one group.
3 The dissimilarity matrix is updated by reducing its size by one and recalculating
the distances using the nearest neighbour rule. Thus for observation k and the
newly formed (ab) cluster:

d_{(ab)k} = \min\{ d_{ak}, d_{bk} \}
in Figure 2.9. For a description of other possible algorithms see [Everitt, 1974].
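The following Python fragment is a minimal sketch of the single-link procedure above (our own illustration, assuming a symmetric dissimilarity matrix as input), using the nearest-neighbour update d_{(ab)k} = min(d_{ak}, d_{bk}):

import numpy as np

def single_link(D, num_clusters):
    # D: symmetric matrix of pairwise dissimilarities between N observations.
    D = D.astype(float).copy()
    np.fill_diagonal(D, np.inf)
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > num_clusters:
        # Step 2: pick the smallest entry and fuse the two corresponding clusters.
        a, b = np.unravel_index(np.argmin(D), D.shape)
        if a > b:
            a, b = b, a
        clusters[a].extend(clusters.pop(b))
        # Step 3: nearest-neighbour rule, d((ab), k) = min(d(a, k), d(b, k)).
        merged = np.minimum(D[a, :], D[b, :])
        D[a, :] = merged
        D[:, a] = merged
        D = np.delete(np.delete(D, b, axis=0), b, axis=1)
        np.fill_diagonal(D, np.inf)
    return clusters

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
D0 = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
print(single_link(D0, 2))   # [[0, 1], [2, 3]]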
1 Fix c, 2 ≤ c < N, choose the objective function you wish to minimise, and
initialise the partition matrix
2 Evaluate the objective function, and modify the partition matrix accordingly
Since the objective function is, in general, non-linear, the optimal partition will usually have to be searched for algorithmically.
The initial placement of the prototypes, thus, is important since there can be many
suboptimal solutions that will trap the prototypes and terminate the algorithm.
Objective functions are specified using the data set, X , a distance metric, d, the
partition matrix, U , and the set of cluster prototypes P . The data set X and the metric
d are fixed and act as input. U and P are variables whose optimal values are being
sought. This can be represented mathematically as:
\min \; J(P, U; X, d, \ldots)
where J is a generic objective function whose minimum value is being sought. The
dots after d indicate that a given formulation of the objective function can use its
own set of parameters. The squared error criterion, which minimises offsets between
a prototype and its nearest points, is the most common formulation of the objective
function.
2.3.4 Example: Hard c-Means (HCM)

The HCM algorithm has appeared in different equivalent versions over the years since
its first appearance in the sixties. It was given the name Hard because it produces a
crisp, or hard, partition (as opposed to a fuzzy, or soft, partition, as described before).
Further, HCM shares the c-means part of its name with many prototype-based partitional
algorithms. The reason is that they all search for prototypes, which intuitively
are the means, or centroids, of the clusters they represent. The objective function min-
imised in this algorithm is:
J = \sum_{i=1}^{c} \left( \sum_{x_k \in S_i} d_{ik}^2 \right) = \sum_{i=1}^{c} \left( \sum_{x_k \in S_i} \| x_k - p_i \|^2 \right)

and the prototypes are the cluster centroids,

p_i = \frac{\sum_{k=1}^{N} u_{ik} \, x_k}{\sum_{k=1}^{N} u_{ik}}   (2.4)
Most versions of HCM operate in the same way as the oldest and frequently cited
algorithm of Forgy [Forgy, 1965] which is given in Figure 2.11. Its intuitive proce-
dure is: guess c hard clusters, find their centroids, reallocate cluster memberships to
minimise squared errors between the data and current prototypes; stop when looping
ceases to lower J . Since the memberships are discrete, either 0 or 1, the notion of local
minimum is not defined for J , and likewise convergence would be undefined.
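A compact sketch of this Forgy-style procedure is given below; it is our own illustration (assuming the Euclidean metric and prototypes initialised from randomly chosen data points), not the implementation referred to in Figure 2.11:

import numpy as np

def hard_c_means(X, c, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    P = X[rng.choice(len(X), size=c, replace=False)]   # initial guessed prototypes
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Reallocate: each point joins the cluster of its nearest prototype.
        d2 = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute prototypes as centroids of their members (Equation 2.4).
        new_P = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                          else P[i] for i in range(c)])
        if np.allclose(new_P, P):      # looping no longer lowers J: stop
            break
        P = new_P
    return P, labels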
There are probably hundreds of papers detailing the theory and applications of
HCM (other names like ISODATA, k-means, etc., have also been used); [Duda & Hart,
1973] surveys some of this literature. HCM suffers from the weakness of producing
spurious solutions because the algorithm’s iterative steps may not converge [Bezdek,
1980; Selim & Kamel, 1992]. It also does not provide the wealth of information fuzzy
clustering provides.
2.4 Remarks
1 studying the raw data in terms of processing it, dealing with missing values in it,
or deciding on the features to use,
3 studying the parameter list of the algorithm and setting the parameters to appro-
priate values, perhaps revisiting this step a number of times to experiment with
different values,
Before moving to the main theme of our research, fuzzy clustering, we conclude
this Chapter with examples of recent novel clustering approaches.
The notion of scale space was used for hierarchical clustering in [Roberts, 1996],
producing good results. However, the problem of where to cut the resulting tree still
persists. Scale space was also used in [Kothari & Pitts, 1999] to find and validate
clustering results.
CHAPTER 3
Fuzzy Clustering
In the previous Chapter we described the general clustering problem and gave exam-
ples of crisp hierarchical and partitional algorithms.
3.1 Fuzzy Set Theory
Fuzzy Set Theory was developed by Lotfi Zadeh [Zadeh, 1965] in order to describe
mathematically the imprecision or vagueness that is present in our everyday language.
Imprecisely defined classes play an important role when humans communicate and
learn. Despite this imprecision, humans still make sensible decisions. In order to
deal with these classes, Zadeh introduced the concept of a fuzzy set. Fuzzy sets parallel
ordinary mathematical sets but are more general than them in having a continuum of
grades, or degrees, of membership.
Based on the above definition for the fuzzy set, extensions for definitions involving
ordinary sets like empty, equal, containment, complement, union, and intersection have
been proposed. We refer the reader here to the wide literature available on this matter
[Kosko, 1993; Zadeh & Klir, 1996; Klir et al., 1997; Cox, 1998].
In the fuzzy clustering setting, a cluster is viewed as a fuzzy set in the data set, X .
Thus each feature vector in the data set will have membership values with all clusters
— membership indicating a degree of belonging to the cluster under consideration.
The goal of a given fuzzy clustering method will be to define each cluster by finding
its membership function.
In the general case, the fuzzy sets framework provides a way of dealing with prob-
lems in which the source of imprecision is the absence of sharply defined criteria of
class membership rather than the presence of random variables. Fuzzy clustering fits
well with the rest of the fuzzy sets and systems applications. It has been used with
success in, for example, optimising membership functions for forming fuzzy inference
Fuzzy set theory is widely used as a modeling tool in various Pattern Recognition
and Image Analysis problems, [Rosenfeld, 1979; Philip et al., 1994] for example, be-
cause of the relative ease with which it can be applied to a problem and the robustness
of the resulting solution.
For a discussion of the future directions of fuzzy logic as seen by its founder see
[Zadeh, 1995; Zadeh, 1996; Zadeh, 1999]. Fuzzy logic is seen ultimately as a method-
ology for computing with words (CW) in which words are used in place of numbers
for computing and reasoning. The rationale for CW is that words become a necessity
when the available information is too imprecise to justify the use of numbers, and also
when there is a tolerance for imprecision which can be exploited to achieve tractability,
robustness, low solution cost, and better human-computer interaction.
3.2 The Fuzzy c-Means Algorithm

The FCM algorithm took several names before FCM. These include Fuzzy ISODATA
and Fuzzy k -Means. The idea of using fuzzy set theory for clustering is credited to
Ruspini [Ruspini, 1969; Ruspini, 1970]. Dunn is credited with the first specific for-
mulation of FCM, [Dunn, 1973], but its generalisation and current framing is credited
to Bezdek, [Bezdek, 1981]. A collection of influential papers in the development of
fuzzy clustering methods can be found in [Bezdek & Pal, 1992]. The FCM objective
function and its generalisations are the most heavily studied fuzzy model in Pattern
Recognition.
FCM is, thus, first and foremost an objective function. The way that most
researchers have solved the optimisation problem has been through an iterative locally-
optimal technique, called the FCM algorithm. This is not the only way to solve the
FCM objective function, for example, in [AlSultan & Selim, 1993] it is solved by the
Simulated Annealing optimisation technique; in [Hathaway & Bezdek, 1995] the prob-
lem is reformulated and general optimisation methods are suggested for its solution;
in [Al-Sultan & Fedjki, 1997] it is solved by a combinatorial optimisation technique
called Tabu Search; in [Hall et al., 1999] it is solved by the genetic algorithm which
is an optimisation technique based on evolutionary computation; and in [Runkler &
Bezdek, 1999] it is solved within an alternate optimisation framework. In fact, it is not
impossible that an exact solution to the problem may be formulated.
\text{Minimise } J_{FCM}(P, U; X, c, m) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ik})^m \, d_{ik}^2(x_k, p_i)   (3.1)

\text{subject to the constraint } \sum_{i=1}^{c} u_{ik} = 1 \quad \forall k \in \{1, \ldots, N\},   (3.2)

where P and U are the variables whose optimal values are being sought. X, c, and m
are input parameters of J_FCM, where:

m ≥ 1 is a fuzzification exponent that controls how fuzzy the result will be.
The larger the value of m the fuzzier the solution. At m = 1 FCM collapses to
HCM, giving crisp results. At very large values of m, all the points will have
equal memberships with all the clusters.
u_ik describes the degree of membership of feature vector x_k with the cluster
represented by p_i. U = [u_ik] is the c × N fuzzy partition matrix satisfying the
constraint stated in Equation 3.2.
A is any positive definite matrix which in the case of Euclidean distance is the
identity matrix.
p_i = \frac{\sum_{k=1}^{N} u_{ik}^m \, x_k}{\sum_{k=1}^{N} u_{ik}^m}   (3.3)

and

u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( d_{ik}^2 / d_{jk}^2 \right)^{1/(m-1)}}   (3.4)
The FCM algorithm is a sequence of iterations through the equations above,
which are referred to as the update equations. (This is referred to as Picard iteration
in [Bezdek, 1981]; Picard iteration [Greenberg, 1998] is a successive approximation
scheme commonly used to solve differential equations, which starts with initial guesses
of the variables and by means of successive substitution arrives at a solution.) When
the iteration converges, a fuzzy c-partition matrix and the pattern prototypes are obtained.
A proof of the convergence of the iterations to a local minimum can be found
in [Bezdek, 1980; Selim & Kamel, 1992].
1. Fix c, 2 ≤ c < N; choose any inner product norm; fix m, 1 ≤ m < ∞; initialise
the fuzzy membership matrix, U.

2. Calculate the cluster prototypes using Equation 3.3.

3. Update the fuzzy membership matrix U using Equation 3.4.

4. Compare the change in the membership values using an appropriate norm; if the
change is small, stop. Else return to 2.
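The following Python sketch (our own illustration, assuming the Euclidean metric; not the code used for the experiments in this dissertation) implements this iteration directly from Equations 3.3 and 3.4:

import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                                   # enforce Equation 3.2
    for _ in range(max_iter):
        Um = U ** m
        P = (Um @ X) / Um.sum(axis=1, keepdims=True)     # Equation 3.3
        d2 = ((X[None, :, :] - P[:, None, :]) ** 2).sum(axis=2) + 1e-12
        w = d2 ** (-1.0 / (m - 1.0))
        U_new = w / w.sum(axis=0)                        # Equation 3.4
        if np.abs(U_new - U).max() < tol:                # step 4: small change
            U = U_new
            break
        U = U_new
    return P, U

# Toy usage with c = 2 on two compact groups of points:
X = np.array([[1.8, 2.0], [2.0, 2.2], [2.0, 1.8], [2.2, 2.0],
              [8.8, 3.0], [9.0, 3.2], [9.0, 2.8], [9.2, 3.0]])
P, U = fcm(X, c=2)
print(np.round(P, 2))              # prototypes near (2.0, 2.0) and (9.0, 3.0)
print(np.round(U.sum(axis=0), 3))  # every column of U sums to 1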
3.2.4 An Example
Let us give an example of FCM in action. Figure 3.2 shows the data set that we used as
input to FCM (c = 2). The table on the right of Figure 3.2 tabulates the found partition
matrix.
Whereas the solution is an approximately correct one, the locations of the found
prototypes are not satisfactory since they should be at the centres of the diamond-
like patterns. It is clear that the points located away from the diamond patterns have
influenced FCM’s solution in that they have “pulled” the prototypes away from the
ideal locations. We note that, as expected, the membership values per each point add
up to one.
Data            Memberships
x     y     Cluster 1   Cluster 2
1.8   2.0   0.997       0.003
2.0   2.2   1.000       0.000
2.0   1.8   0.995       0.005
2.2   2.0   0.997       0.003
2.0   3.5   0.968       0.032
8.8   3.0   0.000       1.000
9.0   3.2   0.003       0.997
9.0   2.8   0.003       0.997
9.2   3.0   0.006       0.994
7.0   2.8   0.100       0.900
Figure 3.2: A 10-point data set with two clusters and two outlying points. Input data
points are marked with a + and the prototypes found by FCM are marked with x.
Membership values provided by FCM are tabulated on the right hand side. The found
prototypes are at (2.0, 2.2) and (8.7, 3.0) instead of the ideal placement at (2.0, 2.0) and
(9.0, 3.0).
We remark here on our definition of outlier and noise points. There is a lot of literature
on outlier detection and rejection (see [Millar & Hamilton, 1999] for a recent review).
In this dissertation, we took the view that every outlier point can be associated with one
cluster in the data in the sense that it would be lying close to that cluster. Also, we took
the view that the few points in a data set that cannot be said to be close to any cluster,
be considered noise points. We recognise that a dense collection of outliers could
become, at some scale, a “small” cluster of its own, but we operate on the assumption
that the number of outliers is insignificant and that we already know the correct number
of clusters.
3.2.5 Analysis of FCM Model

Let us first start by describing the HCM (hard c-means) model. The optimisation
approach to the clustering problem uses an objective function to measure how good
a suggested partition is compared to an ideal, generalised one. This is facilitated by
using the concept of cluster prototypes; by introducing them, the formulation of the
objective function is made easier. In the ideal scenario, the prototypes are located
within very tightly packed clusters of points so that the distances between every cluster
of points and its prototype would be almost zero. Deviations from this model can then
be formulated, in squared-error fashion, as:
\sum_{i=1}^{c} \; \sum_{k:\, x_k \in S_i} d_{ik}^2(x_k, p_i)
FCM generalised the notion of membership to emulate the fuzzy clustering struc-
tures found in the real-world. The FCM objective function weighted the distance be-
tween a given data point and a given prototype by the corresponding degree of mem-
bership between the two (the respective entry in the fuzzy partition matrix). Thus,
partitions that minimise this function are those that weight small distances by high
membership values and large distances by low membership values. This was formu-
lated as per Equation 3.1. To visualise this, consider Figure 3.3. If point 6 is given a
high membership value with prototype B as compared to points 2 and 3, the overall
objective function score will be minimal compared to any other membership scheme
involving those three points and that prototype.
Figure 3.3: The distances between points 1-8 and prototypes A and B are weighted
by the degrees of memberships. Here, the distances and memberships concerning only
prototype B are shown.
Figure 3.4: Fixing d_ax at 1, u_ax is plotted versus d_bx for m = 1.1, 2.0 and 5.0. This
clearly illustrates that u_ax changes in value depending on the location of the prototype
b. Note that as m approaches 1 the membership decision becomes a crisp one.
However, if things were left at the objective function formulation, without the con-
straint of Equation 3.2, all the uik ’s would take the value of zero as this would set J
to the absolute minimal value of zero, which is a trivial solution. In order to force the
uik ’s to take values greater than zero, the constraint was imposed. This way, degrees
of membership must take non-trivial values.
Looking now at the minimisers of the objective function, Equations 3.3 and 3.4,
we see that the prototypes are the fuzzy centroids, or means, of their respective mem-
bership function. This is an intuitively-pleasing result.
Further, we see that a point’s membership with a given prototype is affected by how
far that same point is from the other prototypes. This is illustrated in Figure 3.4, where,
for two prototypes a and b,

u_{ax} = \frac{1}{1 + \left( d_{ax}^2 / d_{bx}^2 \right)^{1/(m-1)}}
Figure 3.4: all curves pass through 0.5 for d_bx = 1). The intuitive solution would be
to award such points equal but small membership degrees with each cluster. However,
such a solution would violate the constraint of equation 3.2 (memberships must add
to 1). If we observe Figure 3.4, we notice that a point’s membership degree is not a
function of anything but its relative distances to each prototype. The presence of many
points close to one prototype which is our (human) cue to the “noiseness” of a point,
is not included. Later in this Chapter, we will present brief summaries of some ideas
proposed to alleviate this counter-intuitive behaviour.
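The behaviour can be seen directly from Equation 3.4. In the sketch below (our own illustration), a point equidistant from two prototypes receives a membership of 0.5 with each cluster no matter how far away it lies:

import numpy as np

def memberships(x, prototypes, m=2.0):
    # Equation 3.4 applied to a single feature vector x.
    d2 = np.sum((prototypes - x) ** 2, axis=1) + 1e-12
    w = d2 ** (-1.0 / (m - 1.0))
    return w / w.sum()

prototypes = np.array([[0.0, 0.0], [10.0, 0.0]])
print(memberships(np.array([5.0, 1.0]), prototypes))     # [0.5, 0.5]
print(memberships(np.array([5.0, 1000.0]), prototypes))  # distant noise point: still [0.5, 0.5]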
Several investigations have been made on the best value to choose for the fuzzification
exponent, m, which is chosen a priori. A recent study [Pal & Bezdek, 1995] concludes
empirically that m = 2.0 is a "good" value. This value for m has the further
advantage of simplifying the update equations and can therefore speed up computer
implementations of the algorithm.
Many investigations have been made on the convergence properties of FCM, for
example, [Bezdek, 1980; Selim & Kamel, 1992]. The conclusion is that the constraint
of Equation 3.2 is a necessary condition for the proof of convergence to a local mini-
mum of the FCM algorithm.
[Cannon et al., 1986]. Recent examples of such studies are geared towards image
analysis applications [Cheng et al., 1998; Smith, 1998], and report orders of magnitude
speed-ups.
3.2.7 Strengths and Weaknesses

The FCM algorithm has proven a very popular method of clustering for many reasons.
In terms of programming implementation, it is relatively straightforward. It employs an
objective function that is intuitive and easy to grasp. For data sets composed of hyper-
spherically-shaped, well-separated clusters, FCM discovers these clusters accurately.
Furthermore, because of its fuzzy basis, it performs robustly: it always converges to a
solution, and it provides consistent membership values.
2. Initialisation
(a) It requires initialisation for the prototypes, good initialisation positions are
difficult to assess.
(b) If the iterative algorithm commonly employed for finding solutions of the
FCM objective function is used, it may find more than one solution de-
pending on the initialisation. This relates to the general problem of local
and global optimisation.
3. It looks for clusters of the same shape (hyper-spheres if using the Euclidean
metric); different cluster shapes cannot be mixed.
4. Its objective function is not a good clustering criterion when clusters are close to
one another but are not equal in size or population. This is studied comprehen-
sively in Chapter 4.
Data              Memberships
x      y      Cluster 1   Cluster 2
12.0   3.0    0.975       0.025
12.0   4.0    0.983       0.017
11.5   3.5    0.989       0.011
12.5   3.5    0.967       0.033
21.0   10.0   0.028       0.972
21.0   11.0   0.009       0.991
20.5   10.5   0.014       0.986
21.5   10.5   0.021       0.979
2.0    4.0    0.845       0.155
19.0   20.0   0.174       0.826
11.0   12.0   0.588       0.412
Figure 3.5: A data set containing noise points. The prototypes found by FCM are
also plotted. Membership values provided as output are shown on the right hand side.
The presence of noise points strongly affected the positions of the found prototypes;
furthermore, the noise points’ membership values may be consistent but they are not
intuitive.
5. Its accuracy is sensitive to noise and outlier points (as demonstrated in Figure 3.2
and also again in Figure 3.5 where the placement of the prototypes was affected
by the outlying points). This is so because it squares the “error” between a
prototype and a point; thus, the effect of outlier and noise points is emphasised.
6. It gives counter-intuitive membership values for noise points. Noise points are
those that do not belong to any cluster; thus, their memberships should
not necessarily sum to one. In Figure 3.5, for example, the far points to the top
and bottom of the plot should have low memberships with both clusters. How-
ever, FCM gives each of them a membership value of more than 0.8 with their
nearest cluster.

3.3 Extensions of FCM
Despite its weaknesses, the strengths of FCM have led researchers to generalise and
extend it further. In fuzzy covariance clustering, covered in Section 3.3.1, hyper-
ellipsoids can be detected instead of only hyperspheres. In elliptotype clustering, cov-
ered in Section 3.3.2, lines or planes can be detected by means of looking for hyper-
ellipsoids with a flat thickness. In shell clustering, covered in 3.3.3, boundaries of
spheres and ellipsoids are detected. None of these extensions can mix cluster shapes,
i.e., they cannot look for a line and a circular shell simultaneously. Furthermore, they
are all very sensitive to initialisation and much more computationally expensive than
FCM. However, they must be considered as necessary evolutionary steps in the devel-
opment of better fuzzy clustering algorithms. This view also underlies our own work
in Chapter 4.
3.3.1 Fuzzy Covariance Clustering

Gustafson and Kessel [Gustafson & Kessel, 1979] introduced a new variation on the
FCM functional given by Equation 3.1 by allowing the inner product inducing matrix
A used in the distance metric to vary per each cluster. In other words, they allowed
each cluster to have its own A-norm with which to measure distances from its proto-
type. This allows different clusters to have differing ellipsoidal shapes. Thus, their
objective function is:

J(P, U, A; X, c, m) = \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik}^m \, \| x_k - p_i \|_{A_i}^2 = \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik}^m \, (x_k - p_i)^T A_i (x_k - p_i)   (3.5)

where A_i is a positive definite symmetric matrix. An additional constraint to the
constraint of Equation 3.2 was imposed. This is that the determinant of each A_i is fixed:

|A_i| = \rho_i, \qquad \rho_i > 0
4. Calculate the A_i's by

A_i = \left( \rho_i \, |C_i| \right)^{1/p} C_i^{-1}

where C_i, the fuzzy covariance matrix, is given by:

C_i = \frac{\sum_{k=1}^{N} u_{ik}^m (x_k - p_i)(x_k - p_i)^T}{\sum_{k=1}^{N} u_{ik}^m}

5. If the termination condition is not achieved, return to step 2.
The resulting optimality conditions remain the same with the addition of an update
equation for the A_i's. The modified algorithm is described in Figure 3.6. Allowing
A_i to vary for each cluster enables the detection of ellipsoidal-shaped clusters, each
with a differing orientation. The new constraint above limits the volume within which
an A-norm metric can have influence. The new constraint may have been placed to
simplify deriving update equations that would allow implementation of the method.
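As an illustration (our own sketch, with the volume parameter ρ_i fixed at 1; not the thesis's implementation), the norm-inducing matrix A_i of a single cluster can be computed from the fuzzy covariance matrix as follows:

import numpy as np

def gk_norm_matrix(X, u_i, p_i, m=2.0, rho_i=1.0):
    # Fuzzy covariance matrix C_i of cluster i, then the volume-constrained
    # norm-inducing matrix A_i = (rho_i |C_i|)^(1/p) C_i^(-1).
    w = u_i ** m
    diff = X - p_i
    C_i = (w[:, None] * diff).T @ diff / w.sum()
    p = X.shape[1]
    A_i = (rho_i * np.linalg.det(C_i)) ** (1.0 / p) * np.linalg.inv(C_i)
    return A_i, C_i

def gk_distance2(x, p_i, A_i):
    # Squared A_i-norm distance appearing in Equation 3.5.
    d = x - p_i
    return float(d @ A_i @ d)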
3.3.2 Fuzzy c-Elliptotypes Clustering

The Fuzzy c-Elliptotypes (FCE) algorithm was proposed by Bezdek et al. to detect
clusters that have the shape of lines or planes [Bezdek, 1981]. Its main idea is to
discount Euclidean distances for points lying along the main eigenvector directions
of a cluster (like those lying on a line) while taking the Euclidean distance in full
for other points. This is achieved by means of using a distance measure which is a
weighted combination of two distance measures:
d2 (xk ; pi ) = d2V ik + (1 d2 :
) Eik (3.7)
X
r
d2V ik =k xi pi k2
x pi) eij )
(( k
j =1
where r ∈ [1, p], and e_ij is the jth eigenvector of the covariance matrix C_i of cluster
i. (The operator · denotes the dot product of the two vectors.) The eigenvectors are
assumed to be arranged in descending order of the corresponding eigenvalues. Thus,
the first eigenvector describes the direction of the longest axis of the cluster. When r = 1,
d²_Vik can be used to detect lines, and when r = 2 it can be used to detect planes. The
value of α in Equation 3.7 varies from 0 to 1 and needs to be specified a priori, but there
is a dynamic method commonly used in the algorithm's implementations (see [Davé,
1992]). It has been shown [Krishnapuram & Kim, 1999] that by allowing this dynamic
variation the FCE algorithm avoids the G-K algorithm’s shortcoming of looking for
clusters of equal volumes. However, since it looks for only linear structures, it would
therefore fit these structures onto data that may not contain them. The update equations
for this algorithm can be shown to be equivalent to those of fuzzy covariance clustering.
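A sketch of the elliptotype distance (our own illustration; the parameter names and defaults are ours) shows how the components along the r leading eigenvectors are discounted:

import numpy as np

def elliptotype_distance2(x, p_i, C_i, r=1, alpha=0.9):
    # Equation 3.7 style distance: the full Euclidean term, plus a term that
    # discounts the components along the r leading eigenvectors of C_i.
    diff = x - p_i
    d2_e = float(diff @ diff)
    eigvals, eigvecs = np.linalg.eigh(C_i)    # eigenvalues in ascending order
    leading = eigvecs[:, ::-1][:, :r]         # r eigenvectors of largest eigenvalue
    d2_v = d2_e - float(np.sum((diff @ leading) ** 2))
    return alpha * d2_v + (1.0 - alpha) * d2_e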
3.3.3 Shell Clustering

The main application of shell clustering algorithms is in image processing. Images are
pre-processed for edge detection and the edge pixels are then fed to these algorithms
for boundary detection. There are several variants of shell clustering algorithms and a
full review of them can be found in [Hoppner et al., 1999].
The main innovation behind every shell clustering algorithm is the distance mea-
sure it uses. In the Fuzzy c-shells algorithm by Davé, the prototype for a circular shell
cluster is described by its centre point and radius, pi and ri, respectively. The distance
measure is:
d^2(x_k, (p_i, r_i)) = \left( \| x_k - p_i \| - r_i \right)^2
In the fuzzy c-spherical shells algorithm the distance measure used instead is:
d^2(x_k, (p_i, r_i)) = \left( \| x_k - p_i \|^2 - r_i^2 \right)^2
This distance measure is more sensitive to points on the outside of the shell than on
the inside but has the advantage of simplifying the update equations. In the adaptive
fuzzy c-shells algorithm, shells in the shapes of ellipses are detected by means of the
distance measure:
d^2(x_k, (p_i, A)) = \left( \sqrt{ (x_k - p_i)^T A (x_k - p_i) } - 1 \right)^2
where A is a positive definite matrix that contains the axes and orientations of the
ellipse. A more complex distance measure for shell ellipsoids is described in [Frigui
3.4 Modifications to the FCM Model

Several attempts have been made to remedy one or more of the shortcomings we men-
tioned in Section 3.2.7. In Possibilistic Clustering, covered in Section 3.4.1, the mem-
bership value of a point with a cluster does not depend on the location of other cluster
prototypes. In High Contrast Clustering, covered in 3.4.2, mixtures of the hard and
fuzzy c-means algorithms will be formulated. In Competitive Agglomeration, covered
in 3.4.3, the requirement for specifying c is overcome by means of starting with a large
value for it and subsequently letting bigger clusters compete for the smaller ones. In
Credibilistic Clustering, covered in 3.4.4, noise points are identified first as not credi-
ble representatives of the data set and awarded membership values that do not sum up
to 1.
3.4.1 Possibilistic c-Means (PCM) Clustering

Krishnapuram and Keller [Krishnapuram & Keller, 1993a] removed what they termed
the probabilistic constraint of Equation 3.2 by allowing the degrees of membership, uij ,
to take on any value within the [0, 1] range. Their argument for the removal of the
constraint was: the membership function of a set should not depend on the membership
functions of other fuzzy sets defined in the same domain of discourse. The uij ’s were
therefore allowed to take on any value within the [0, 1] range, but in order to avoid
the trivial solution in which all memberships are zero, the following condition was imposed:

\max_i \, u_{ij} > 0 \qquad \forall j   (3.8)
Thus, the membership values generated are taken as absolute, not relative, and denote
degrees of belonging or typicality of the cluster in question.
The objective function minimised is:

J(U, P; X) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^m \, d_{ik}^2 + \sum_{i=1}^{c} \eta_i \sum_{k=1}^{N} (1 - u_{ik})^m   (3.9)
where the η_i are positive numbers. The first term is the normal FCM objective function
which is minimised for compact and well-separated clusters, whereas the second term
forces the u_ik's to be as large as possible, thus avoiding the trivial solution. This
formulation of the objective function leads to the update equation of uik to be modified
to
u_{ik} = \frac{1}{1 + \left( d_{ik}^2 / \eta_i \right)^{1/(m-1)}}   (3.10)
The value of η_i determines the distance at which the membership value of a point
in a cluster becomes 0.5. If all clusters are expected to be similar in size, this value
could be the same for all of them. In the objective function we notice that the value
of η_i determines the relative importance of the second term, and the authors observe
that it should therefore be of the same range as d_ik^2 if equal weighting to both terms is
desired.
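As an illustration only (not code from the original work), the possibilistic update of Equation 3.10 can be written in a few lines of Python/NumPy; the squared distances d_ik² and the reference distances η_i are assumed to have been computed elsewhere, for instance from an initial FCM run as the authors recommend:

import numpy as np

def possibilistic_memberships(d2, eta, m=2.0):
    """Possibilistic membership update (Equation 3.10).

    d2  : (c, N) array of squared distances d_ik^2
    eta : (c,) array of positive reference distances eta_i
    m   : fuzzifier, m > 1
    Returns a (c, N) array of memberships; columns need not sum to 1.
    """
    eta = np.asarray(eta, dtype=float).reshape(-1, 1)
    return 1.0 / (1.0 + (np.asarray(d2, dtype=float) / eta) ** (1.0 / (m - 1.0)))

# a point whose squared distance to prototype i equals eta_i gets membership 0.5
print(possibilistic_memberships([[1.0, 4.0]], [1.0]))   # [[0.5, 0.2]]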
This definition of possibilistic clustering can be applied to the other fuzzy clustering
algorithms. So, if we use the FCM algorithm but update u_ik according to Equation 3.10
above (plugging in suitable values for the η_i's), the algorithm becomes the Possibilistic
c-Means algorithm (PCM). Likewise we may have the Possibilistic Gustafson-Kessel
algorithm, the Possibilistic c-Spherical Shells algorithm, and so on.
The success of PCM is very much dependent on the initialisation, as it may not
converge or the prototypes may even coincide [Barni et al., 1996; Krishnapuram
& Keller, 1996]. The values of η_i to use are probably the most difficult choice to
make when using this algorithm. The authors themselves recommend running FCM
once and estimating η_i from its output, then running PCM and adjusting η_i in its first
few iterations in order to provide the most meaningful values of u_ik while bypassing
the danger of not converging to a stationary point. The main advantage of PCM is
that it is more resilient to noise than FCM, and, after taking the above guidelines into
consideration, the membership values it finds are more intuitive by human perception
standards.
3.4.2 High Contrast Clustering
Except in the case where a data point coincides with the location of a prototype, degrees
of membership found by FCM are never either 0 or 1. This is so even when a point is
very close to a prototype. The reason for this is the “sharing” constraint of Equation 3.2
imposed on the FCM optimisation problem. This constraint leads to update Equation
3.4 from which we can see that a membership value will never be zero, since it is a ratio
of distances. This peculiarity causes core points of a cluster to receive membership
values of less than one, even though we would clearly see them as being typical of the
cluster.
Approaches of the "High Contrast" kind, though not developed fully in [Rousseeuw
et al., 1995; Pei et al., 1996], aim to classify clear-cut, core points in a crisp manner,
while leaving other points to be classified in a fuzzy manner.
In [Rousseeuw et al., 1995], the u_ik² term in the objective function is replaced by
f(u_ik) = α u_ik + (1 − α) u_ik², where 0 < α < 1 is termed a contrast factor. When α = 0,
f(u_ik) = u_ik², which gives a fuzzy solution identical to standard FCM. When α = 1,
f(u_ik) = u_ik, which gives a crisp solution identical to standard HCM. Varying α between
0 and 1 changes the "contrast" of the clustering results from none whatsoever (fuzzy)
to full (crisp). Rousseeuw et al. conclude empirically that α = 0.3 is a good value for
the contrast factor. However, the general case of m ≠ 2.0 was not mentioned in the
paper, nor were the differences between their approach and dynamically varying m
stated.
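The substitution is simple enough to state directly; the sketch below (ours, not the authors' code) shows the transformed weighting term and its two limiting cases:

import numpy as np

def contrast_weight(u, alpha):
    """High-contrast replacement f(u) = alpha*u + (1 - alpha)*u^2 for the u^2
    term of the FCM objective; alpha = 0 gives FCM's u^2, alpha = 1 gives HCM's u."""
    u = np.asarray(u, dtype=float)
    return alpha * u + (1.0 - alpha) * u ** 2

u = np.linspace(0.0, 1.0, 5)
print(contrast_weight(u, 0.0))   # u^2 (fuzzy, FCM)
print(contrast_weight(u, 0.3))   # contrast factor suggested by Rousseeuw et al.
print(contrast_weight(u, 1.0))   # u (crisp, HCM)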
3.4.3 Competitive Agglomeration
The CA algorithm [Frigui & Krishnapuram, 1997] was proposed as a robust successor
to FCM attempting to remedy several of its shortcomings. First, it requires only a
maximum number of clusters as input rather than the exact number, it will then find
the “correct” number of clusters itself. It does so by first partitioning the data set
into the given (large) number of small clusters. As the algorithm progresses, adjacent
clusters compete for data points and the clusters that lose the competition gradually
become depleted and vanish.
The CA algorithm minimises the following objective function, noting that α is
dynamically updated by the algorithm:

J(P, U) = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^2 \, \| x_k - p_i \|_A^2 \; - \; \alpha \sum_{i=1}^{c} \Big[ \sum_{k=1}^{N} u_{ik} \Big]^2    (3.11)

subject to

\sum_{i=1}^{c} u_{ik} = 1 \qquad \forall k \in \{1, \ldots, N\}    (3.12)
The objective function has two components. The first component is similar to the
FCM objective function (m = 2.0), while the second component is the sum of squares
of the fuzzy cardinalities of the clusters (the cardinality of a cluster is the number of
points belonging to it; the fuzzy cardinality of a cluster is the sum of the memberships
of all points with it). The global minimum of the first component is achieved when
the number of clusters is equal to the number of samples N, i.e.,
each cluster contains a single point. The global minimum of the second component
is achieved when all points are grouped into one cluster, and all other clusters are
empty. Based on the premise that α is chosen properly, the final partition resulting from
this algorithm will find compact clusters while at the same time finding the smallest
possible number of clusters.
3.4.4 Credibilistic Clustering
Noise points, i.e., points that do not lie close to any particular cluster, are not distin-
guished as such by FCM. They share memberships with all clusters just like all points
even though we may clearly identify them as not belonging to any cluster. Noise points
affect the accuracy of the FCM algorithm.
The credibilistic fuzzy c-means algorithm was proposed by Chintalapudi and Kam
[Chintalapudi & Kam, 1998] to combat FCM’s sensitivity to noise points. Their re-
quirement was to assign to noise points low membership values with all clusters. In
this way, noise points will not affect the location of the prototypes.
\sum_{i=1}^{c} u_{ik} = \psi_k \qquad \forall k

where ψ_k denotes the credibility of point x_k; noise points receive low credibility values,
so their memberships do not sum to 1. However, the authors suggest that their algorithm
performs well in most cases using the default values for the parameters.
3.5 Remarks
In this Chapter, we reviewed in detail the Fuzzy c-Means clustering model, and we
also briefly reviewed some of its extensions and modifications. We explained that a
lot of the algorithms mentioned in this Chapter were motivated by one or more of the
shortcomings we listed in Section 3.2.7. In the next Chapter, we will focus only on
FCM’s inability to perform accurately on data sets containing clusters close to one
another but not equal in size or population. Since we have only mentioned a few of the
large body of algorithms based on FCM, we conclude with a quick look at two threads
of fuzzy clustering research we did not include in our review.
The first thread of research is concerned with finding the optimal number of clusters
in the data. This problem is continually being addressed in the literature. The first
approach is to validate fuzzy partitions obtained at different values of c by means of
an index, and then selecting the value of c corresponding to the partition that scored
best on the index. In comparison to many indices, the Xie-Beni index [Xie & Beni,
1991] performs best (as studied in [Pal & Bezdek, 1995]), though there are some new
competitors [Kwon, 1998; Rezaee et al., 1998]. Further, there have been attempts
to integrate the validation step into the FCM clustering process such as the validity-
guided clustering method of Bensaid [Bensaid et al., 1996]. The second approach is
to fuse an agglomeration process with the clustering process, starting at a reasonably
high value for c. Section 3.4.3 already described an algorithm of this type. Another
recent algorithm is that of [Geva, 1999], which fuses hierarchical clustering with fuzzy
clustering.
The second thread of research is concerned with making FCM more robust by
enhancing its response to noise points. We have already discussed one such algorithm
in Section 3.4.4, which addressed that point; however, the recent and still developing
work of [Davé & Krishnapuram, 1997; Frigui & Krishnapuram, 1999] should also be
highlighted. These works use statistical methods such as the M-estimator and the
weighted least-squares technique to supplement the objective function.
CHAPTER 4
In the previous Chapter we described the FCM algorithm and detailed several algo-
rithms based on it.
4.1 The Experimental Framework
Figure 4.1: In this two-clusters example, the inputs to the dot pattern generator are:
the populations, p1 and p2 , the diameters, d1 and d2 , of each cluster, and the central
locations of each cluster, (x1 ; y1 ) and (x2 ; y2 ). A clustering algorithm should now
attempt to match this description of the clusters by examining only the dot pattern.
Assume that we have a dot pattern generator that generates clusters of points in a given
p-dimensional feature space, Rp . Assume, further, that the points of every cluster are
distributed uniformly around that cluster’s centre-point. This assumed generator will
require as input a number of parameters. First, the number of clusters we want to have
in the dot pattern. Second, the central location of each cluster. Finally, for each cluster
its population and diameter. We define the diameter of a cluster as the diameter of a
hyper-sphere (or a circle in 2D) that contains the entire population of the cluster. This
is illustrated in Figure 4.1.
The test for any clustering algorithm would be to produce an accurate description
of the clusters present in the dot pattern, given only the dot pattern and no other in-
formation. Since the clustering structure of the dot pattern is already known, accuracy
of the clustering can be computed by comparing the known description to the one dis-
covered by the clustering algorithm. This is illustrated in Figure 4.2. Thus, for the
example in Figure 4.1, we would ideally like any clustering algorithm to output the in-
[Figure 4.2: Parameters → Dot Pattern Generator → Data → Clustering Algorithm → Structure → Accuracy Evaluation.]
formation: number of clusters is two; the locations of the prototypes of the two clusters
are (x1 ; y1 ) and (x2 ; y2 ); the diameters of the two clusters are d1 and d2 respectively;
as well as a classification of the points from which we can calculate the populations p1
and p2 .
The generator we have described above is ideal for objective-function (OF) methods
that minimise the aggregate distances between data points and suggested prototypes.
Methods of this type, as discussed earlier, search for hyper-spherical clusters of points
(assuming the Euclidean distance metric). Prototype-based, sum-of-squared-error
objective function methods like FCM should therefore perform with maximum accuracy
because the generated data consist of hyper-spherical clusters.
A further test for any clustering algorithm is to find the correct solution every time it
is run. Since the results of some clustering algorithms may depend on the initialisation,
the correct solution should be found irrespective of the initialisation. Otherwise, the
algorithm would not be suitable for non-expert use. This challenge, however, can be
assumed to be of less priority than the other challenges since it depends also on the
optimisation scheme used.
Realistically, we know that most clustering algorithms will not be able to pass
all these tests successfully. For example, most objective-function-based algorithms
require the number of clusters, c, as an input parameter beforehand. They thus fail test
I. This shortcoming is not addressed in this dissertation; instead we assume that the
correct number of clusters has been estimated beforehand. If no such estimate exists,
the common way of handling this shortcoming is to validate the solutions resulting
from different values of c and choose the best one [Windham, 1982; Gath & Geva,
1989; Pal & Bezdek, 1995].
Objective-function-based methods can deliver on tests II, III, and IV. Their perfor-
mance on these tests, however, may seem ad-hoc, for they can find the correct solution
in one run but fail to do so in another. The reason is that OF-based algorithms are
iterative and locally optimal, and therefore produce results that depend on their initial-
isation. Unless an exhaustive or an (as-yet undiscovered) analytical solving method is
used, different solutions may be found. Therefore, short of initialising these algorithms
identically each time they are run, obtaining the same correct solution should not be
expected. Thus, in order to measure an algorithm’s accuracy on any of the three tests
above, we need to use identical initialisation. This initialisation should be favourable
to finding the correct solution by being close to it. If an algorithm now fails a test, we
will know that it cannot ever find the correct solution starting from a near-correct one.
Turning our attention now to tests II, III, and IV, we observe that within our dot
pattern generator framework, we can vary three sets of variables: the central locations
of the clusters, their populations, and their diameters.
In the next Section we will describe how we used these three variables in a two-
dimensional two-cluster setting to generate a suite of synthetic data sets. Our aim is to
construct a benchmark covering many of the data sets that could be encountered within
this setting and then to see if an OF algorithm like FCM will pass tests II, III, and IV
on each of the data sets in the benchmark suite. We should note here that while the
framework as described above is ideal for squared-error-type prototype-based methods,
its basic structure is valid for other types of methods.
We now use our dot pattern generator to generate a suite of two-dimensional two-
cluster data sets. Thus, 6 variables must be set: the two centre-points at (x1 ; y1) and
(x2 ; y2 ), the diameter and population of cluster 1 (the lhs cluster) d1 and p1 , and the
diameter and population of cluster 2 (the rhs cluster), d2 and p2. If we do not consider
overlapping clusters and sample the range of possibilities, a suite of data sets that covers
many cluster configurations can be generated.
The Centre-Points
The centre-points of the two clusters were fixed at (x1, y1) = (0, 0) and (x2, y2) = (1, 0),
as given in Table 4.1.
The Populations
We now have to consider that p1 and p2 can vary. Our approach has been to fix a
minimum value for the population of any cluster in any of the data sets: pmin. Then, to
use configurations where clusters have populations that are whole-number multiples of
pmin. Using this new scale, we rename p1 and p2 to P1 and P2 respectively. We chose
to limit the range of both P1 and P2 to 1 to 20. A configuration with P1:P2 = 1:20
indicates that the lhs cluster has the minimum population while the rhs cluster has
twenty times that population. To reduce the number of data sets generated, we sampled
the range of P1 and P2 at 1, 10, and 20 only. Thus, there will be 3² = 9 population
configurations. These are (in the form of P1:P2): 1:1, 1:10, 1:20, 10:1, 10:10,
10:20, 20:1, 20:10, and 20:20.
The Diameters
With regard to d1 and d2, both can have values from zero to infinity, a range that
has to be sampled. Let us choose to sample the distance between (0, 0) and (1, 0) 20
times and to restrict the diameters to those 20 levels. Thus, a cluster with diameter-level
10 will touch the point (0.5, 0) and one with diameter-level 20 will touch the other's
centre-point. Using this normalised scale, d1 and d2 are renamed to D1 and D2
respectively, where the latter are measured in discrete-level units.
In order to not lose focus of our goals, we leave further details of the generation
of the data suite to Appendix A. We will say at this juncture that all in all 900 data
sets were generated in correspondence to the various combinations of populations and
diameters available.
Table 4.1 shows the values we used in our actual generated suite of data sets. Figure
4.3 illustrates some examples of the 900 data sets generated.
x1 = 0    y1 = 0    pmin = 300     dmin = 0.05
x2 = 1    y2 = 0    pmax = 6000    dmax = 0.95
Table 4.1: Parameters used to generate the suite of data sets.
The data points of each cluster were generated within a circle centred at the points
stated above. The area of each circle is divided into 10 shells of equal areas. The
population of a shell, i.e., the number of points inside it, is the result of dividing the
total population of the cluster by the number of shells. For each point, two polar
coordinates (r, θ) were picked from a random number generator of uniform distribution.
These coordinates were then transformed to Cartesian coordinates.
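The following sketch (our reconstruction in Python/NumPy, not the generator actually used) follows this recipe for a single cluster: the enclosing circle is split into 10 equal-area shells, each shell receives an equal share of the population, and (r, θ) are drawn uniformly before conversion to Cartesian coordinates:

import numpy as np

def generate_cluster(centre, diameter, population, n_shells=10, rng=None):
    """Generate one cluster of the benchmark suite as described above."""
    rng = np.random.default_rng() if rng is None else rng
    cx, cy = centre
    R = diameter / 2.0
    # equal-area shells: the bounding radii grow as the square root of the area
    radii = R * np.sqrt(np.linspace(0.0, 1.0, n_shells + 1))
    per_shell = population // n_shells
    points = []
    for r_in, r_out in zip(radii[:-1], radii[1:]):
        r = rng.uniform(r_in, r_out, per_shell)
        theta = rng.uniform(0.0, 2.0 * np.pi, per_shell)
        points.append(np.column_stack((cx + r * np.cos(theta),
                                       cy + r * np.sin(theta))))
    return np.vstack(points)

# e.g. the lhs cluster of a P1:P2 = 1:20, D1:D2 = 5:10 configuration:
# population pmin = 300 and diameter level 5, i.e. 5 * 0.05 = 0.25
lhs = generate_cluster((0.0, 0.0), 0.25, 300)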
[Six sample dot patterns: P1:P2 = 1:1 with D1:D2 = 2:4; P1:P2 = 10:10 with D1:D2 = 10:10; P1:P2 = 10:20 with D1:D2 = 3:17; P1:P2 = 20:10 with D1:D2 = 3:17; P1:P2 = 1:20 with D1:D2 = 5:10; P1:P2 = 20:1 with D1:D2 = 5:10.]
Figure 4.3: Samples of the benchmark data suite. The population and diameter settings
for a pattern are located at the top of its plot.
4.2 The Behaviour of FCM
Having established a framework and designed our suite of benchmark data we will
now examine the behaviour of the FCM algorithm. We will first present the results
of FCM clustering of the benchmark data, then, we will discuss the performance of
FCM and its behaviour. In Section 4.3, we will present our new Population Diameter
Independent (PDI) algorithm and its results on the same benchmark data.
FCM was run on the full 900 data sets described in Section 4.1.2. In Figures 4.4 and
4.5, samples of the 900 clustered sets are shown. The prototypes found by FCM are
marked out with arrows. Also, the points are classified according to the max rule, which
specifies that a point is classified according to its maximum degree of membership.
FCM was run with m set at 2.0 and the initial prototypes placed at (0.05, 0.05) and
(1.05, 0.05), i.e., at positions which are very close to the ideal positions. It is not our
aim here to test FCM’s shortcoming of getting entrapped in local solutions. Our aim is
to see if the ideal solution can indeed be an FCM solution.
We can clearly see from Figures 4.4 and 4.5 that FCM’s performance is affected by
the relative widths of the clusters and by their relative populations. We can also see that
in some cases gross misclassification has occurred. Since the prototype initialisation
was very favourable (by being very close to the correct locations), we can deduce that
in these cases placing the prototypes at the correct locations is not a minimal solution
for the OF of FCM.
Let us first provide a summary of the FCM results. To achieve this, we need to
decide on our accuracy measures. There are potentially three different measures of a
given clustering algorithm’s accuracy within our framework. They are:-
[Panels (a) to (f); recovered panel titles: D1:D2 = 5:11 and 5:13 (D1 = 5 column), 8:10 and 8:12 (D1 = 8 column), all with P1:P2 = 10:10.]
Figure 4.4: FCM clustering of synthetic dot patterns with two colours representing the
two found clusters. Prototypes are marked out by the dotted blue lines. P1:P2 ratio
fixed at 10:10. D2 is varied while D1=5 for column (a), (b), and (c), and D1=8 for
column (d), (e), and (f).
[Panels (a) to (f); recovered panel titles: P1:P2 = 10:10 and 10:1 with D1:D2 = 10:10, and P1:P2 = 1:20 and 20:1 with D1:D2 = 5:10.]
Figure 4.5: FCM clustering results. D1:D2 fixed at 10:10 for column (a), (b), and (c),
and 5:10 for column (d), (e), and (f). The P1:P2 ratio is varied.
1. how well it performs in finding the correct centre-point locations of the clusters;
2. how well it performs in finding the correct diameters of the clusters;
3. and, also, how well it performs in finding the correct populations of the clusters.
In FCM’s case and with this type of symmetrical cluster, the three measures are not
all required. If FCM fails in finding the centre-point, the other two measures become
misleading. So, our first priority will be, for every data set, to see how far off each of
the found prototypes is from its correct location. We decided that, for a given data set,
the maximum of the two prototype offsets will be our measure of accuracy.
Defining e1 as the distance between p̂1 and (x1, y1) (where p̂1 is the closest found
prototype to (x1, y1)), and e2 similarly, we can define the maximum prototype offset,
e, as:

e = \max(e_1, e_2), \quad \text{where } e_1 = \| \hat{p}_1 - (x_1, y_1) \| \text{ and } e_2 = \| \hat{p}_2 - (x_2, y_2) \|.
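A direct transcription of this measure (our illustration; found and true are assumed to be 2 x 2 arrays holding the two found prototypes and the two true centre-points):

import numpy as np

def max_prototype_offset(found, true):
    """e = max(e1, e2): each true centre-point is paired with the closest
    found prototype and the larger of the two offsets is returned."""
    found = np.asarray(found, dtype=float)
    true = np.asarray(true, dtype=float)
    d = np.linalg.norm(true[:, None, :] - found[None, :, :], axis=-1)
    return float(d.min(axis=1).max())

print(max_prototype_offset([[0.1, 0.0], [0.9, 0.1]], [[0.0, 0.0], [1.0, 0.0]]))  # about 0.14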
In Figure 4.6 we plotted e against D2 for all nine population configurations, while
fixing D1 at 1. Each curve represents a constant population ratio. We note that, apart
from the population configurations where P1:P2 = 1:10 and 1:20, the curves proceed in
a somewhat uniform upward trend. However, for the aforementioned configurations, the
curve takes a very steep upward climb and then slowly falls afterwards. In both these
configurations cluster 2 becomes very large by comparison to cluster 1. This largeness
is twofold: both in diameter and in population. Thus, cluster 1's prototype moved
toward cluster 2 while cluster 2's prototype moved towards the right side of its own
cluster. This is illustrated in Figure 4.7.
As cluster 2 became larger, p̂1 was "drawn" towards it and took large steps in that
direction. This explains the steep climb. However, after a certain point, the diameter of
cluster 2 extended into the middle region between the two clusters, towards cluster 1;
thus, p̂1 moved back again towards the left side of the graph, causing the decline in e.
As D2 went through and past the middle towards cluster 1, p̂1 followed it progressively
Figure 4.6: Plot of e against D 2. D 1 = 1. All nine population configurations are
shown. Each curve has a constant population ratio.
[Figure 4.7: FCM results for P1:P2 = 1:20 with D1:D2 = 1:11 and 1:13, panels (a) to (d).]
back towards the left hand side, albeit with a large margin of error.
The upward trend of the curves for the other population configurations (i.e., those
excluding 1:10 and 1:20) can be explained by noting that in these configurations
cluster 2 never became as large or "dominant", in terms of population, as in the two
other cases. Thus, there was less incentive for p̂1 to move into cluster 2's territory.
However, as D2/D1 got bigger, the error worsened proportionally.
Figure 4.8: Plot of e against D2 for D1 = 5. Each curve has a constant population ratio.
From the results above, we can deduce preliminarily that as long as the separation
between clusters is high, FCM will not have a problem in identifying the output of the
pattern generator. Once one of the clusters extends into the middle region between the
two centre-points, FCM will produce very bad results. The question of the ratio of
the populations of the clusters plays a role in these diameter configurations and makes
Figure 4.9: Plot of e against D1 for P1:P2 = 1:20. Each curve has a constant D2.
Figure 4.10: Plot of e against D1 for P1:P2 = 20:1. Each curve has a constant D2.
the error more severe. FCM, effectively, lets clusters with larger populations and larger
diameters dominate its solution.
Recall the FCM optimisation problem:

\text{Minimise } J_{FCM}(P, U; X, c, m) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ik})^m \, d_{ik}^2(x_k, p_i)

subject to

\sum_{i=1}^{c} u_{ik} = 1 \qquad \forall k \in \{1, \ldots, N\}.
The constraint forces a point's membership with a prototype to take into account its
memberships with all the other prototypes.
Therefore, points that lie very close to a prototype take memberships of almost zero
with the other prototypes. However, points lying in the middle between two proto-
types will take membership degrees that are close to 0.5. In this way they add to
both prototypes' OF contributions. If one prototype can be moved to a position that
will “neutralise” these midway points without incurring much penalty from its former
neighbourhood, it will be moved. This is because the new location would be close to
the optimal solution of the OF.
In the next Section we will attempt to visualise the shape of the OF of FCM. This
will help us to explain the sensitivity of FCM to the middle region between the two pro-
totypes. As we observed in the previous Section, when one of the clusters approaches
the diameter level of 7, FCM’s accuracy deteriorates significantly.
We now wish to visualise the shape of the OF of FCM. As before, we assume that there
are two cluster prototypes in a two-dimensional feature space. The left-hand prototype
is placed at the origin of the coordinate system, (0, 0). The right-hand prototype is
placed at coordinate (1, 0). Assuming now that a given data point is placed anywhere
Figure 4.12: FCM (m = 2): Plot of point k's OF contribution, Jk, against x_k's x
position. x_k is constrained to move along the x-axis only. The prototypes are located
at (0, 0) and (1, 0).
in this 2D feature space, and given an OF, we can calculate the contribution of this
point to the OF.
Let us assume that we denote the OF contribution value for a data point, xk , by
Jk . First, let us constrain xk to be located along the x-axis. In Figure 4.12, we plot
xk ’s contribution to the FCM OF versus its location along the x-axis. We left the
mathematical derivations of the equation for the curve to Appendix B. From Figure
4.12 we observe each prototype has appropriated symmetrically a region of low cost
around it. In the middle between the two prototypes, there is a local peak. A point
placed at exactly half-way between both prototypes costs the most amongst points
lying between the prototypes. Furthermore, as a point heads away from the prototypes,
its cost rises steeply.
Now we allow the location of xk to move freely in the 2D space. Thus, we can
plot contour lines around the prototypes; points lying on these contour lines contribute
identical values towards the OF. Such a contour plot is illustrated in Figure 4.13. We
observe again that FCM creates symmetrical contours around the prototypes. As a
generalisation of Figure 4.12 in 2D, we observe that the rate at which contributions
change in the “valleys” around each prototype is less than further afield. Once again,
[Figure 4.13: Contour plot of Jk in the 2D feature space; points on a contour line contribute equally to the OF. The prototypes are at (0, 0) and (1, 0).]
we left the mathematical derivations to Appendix B. Based on the contour plot we can
see the shape of ideal clusters for FCM, and we can gauge how well it will perform
given any particular constellation of points.
A point of note: if we were to integrate the area under the curve between (0, 0)
and (1, 0) in Figure 4.12, what would that represent? It would represent the total
contribution of a continuous line of data points along the x-axis between the two
prototypes. Let us now work out the bounds of a region centred around the mid-point
which would cover only half of the computed area under the curve. The significance
of this computation would be to find the region, along the line between the prototypes,
that contributes as much as the remaining parts of the line. The bounds, as worked out
in Appendix B, are the points (0.38, 0) and (0.62, 0). In our benchmark data suite, these
approximate to values for either D1 or D2 of between 7 and 8. This is confirmed by our
results as previously presented. This calculation also shows that there is a relatively
narrow region of width 0.24 centred around the half-way point which “costs” FCM
twice as much as either region to the side of it.
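This can be checked numerically. The sketch below is ours: it assumes m = 2 and prototypes at (0, 0) and (1, 0), and uses the closed form J_k = 1/(1/d_1k² + 1/d_2k²) that results from substituting the FCM membership update into the OF (the analytical treatment is left to Appendix B). It confirms that the region between x = 0.38 and x = 0.62 accounts for roughly half of the area under the curve of Figure 4.12:

import numpy as np

# J_k for a point at (x, 0), with prototypes at (0, 0) and (1, 0) and m = 2
x = np.linspace(1e-9, 1.0 - 1e-9, 200001)
jk = 1.0 / (1.0 / x ** 2 + 1.0 / (x - 1.0) ** 2)

dx = x[1] - x[0]
total = jk.sum() * dx                               # area under the curve on [0, 1]
middle = jk[np.abs(x - 0.5) <= 0.12].sum() * dx     # area over the region (0.38, 0.62)
print(middle / total)                               # about 0.48, i.e. roughly half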
4.3 Population-Diameter Independent Algorithm
When we first investigated FCM, we focused on its inability to cluster accurately when
the data set contains one cluster which is highly populated in comparison to the other
(in a two-cluster case). Thus, we thought of dividing each cluster’s contribution to
the objective function by its population. This way, the new ratio of one cluster’s con-
tribution to another would not be as disproportionate as the old one. In other words,
the lightly-populated cluster’s contribution would be increased, and that of the highly-
populated one decreased.
However, upon further study, as evidenced above, we concluded that as well as the
populations problem, there is also another problem. This occurs when there is a sharp
difference in the spans of the clusters (represented by diameters in our experiments)
and the larger cluster's span reaches into the middle region between the two. The
diameters problem can either be compounded or alleviated by the populations problem,
depending on the populations ratio and which cluster it favours. Thus, we concluded
that the effects of population and diameter are correlated and it would not be easy to
compensate for their effects separately. We found it more precise to talk about the
"relative contributions" of clusters.
Obviously, FCM’s objective function does not account for these effects. This is
why we introduced the Population-Diameter Independent (PDI) Algorithm. The main
idea behind our new algorithm is to normalise the cluster contributions found in the
FCM objective function. Thus, in PDI’s objective function, we divide each cluster’s
(FCM) contribution by a number that should represent the strength of the contribution.
The result of the division would give the cluster’s new (PDI) contribution.
We call this number the cluster's normaliser, γ_i, and raise it to an exponent r that
controls how strongly the normalisation is applied. The optimisation problem is then
to minimise

J_{PDI}(P, U, \gamma; X, c, m, r) = \sum_{i=1}^{c} \frac{1}{\gamma_i^{\,r}} \sum_{k=1}^{N} (u_{ik})^m \, d_{ik}^2(x_k, p_i)

subject to

\sum_{i=1}^{c} u_{ik} = 1 \qquad \forall k \in \{1, \ldots, N\}

and

\sum_{i=1}^{c} \gamma_i = 1.    (4.3)
From the above formulation we can derive an algorithm to achieve a minimal so-
lution. This is effected by means of using the Lagrange multiplier method, setting
the differentials to zero, obtaining the update equations for each variable, and then us-
ing the Picard successive substitution strategy, as was used with FCM. We leave the
derivation of the update equations to Appendix C and now only state them.
u_{ik} = \frac{ (\gamma_i^{\,r} / d_{ik}^2)^{1/(m-1)} }{ \sum_{j=1}^{c} (\gamma_j^{\,r} / d_{jk}^2)^{1/(m-1)} },    (4.4)

and

p_i = \frac{ \sum_{k=1}^{N} u_{ik}^m \, x_k }{ \sum_{k=1}^{N} u_{ik}^m },    (4.5)

and

\gamma_i = \frac{ \big[ \sum_{k=1}^{N} (u_{ik})^m d_{ik}^2 \big]^{1/(r+1)} }{ \sum_{j=1}^{c} \big[ \sum_{k=1}^{N} (u_{jk})^m d_{jk}^2 \big]^{1/(r+1)} }.    (4.6)
We note that the optimality condition for γ_i has an intuitive meaning: it is a ratio of
cluster i's contribution to the sum of all the clusters' contributions. The equations also
confirm that, as with the OF, setting r = 0 collapses PDI to FCM.
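To make the update cycle concrete, the sketch below implements one Picard iteration of PDI from Equations 4.4 to 4.6 (Python/NumPy, our own illustration rather than the thesis code; the small eps floor is our addition to avoid divisions by zero):

import numpy as np

def pdi_iteration(X, P, gamma, m=2.0, r=1.5, eps=1e-12):
    """One Picard iteration of PDI.

    X     : (N, p) data points          P : (c, p) prototypes
    gamma : (c,) normalisers summing to 1
    Returns the updated (U, P, gamma)."""
    d2 = ((X[None, :, :] - P[:, None, :]) ** 2).sum(axis=-1) + eps     # (c, N)

    # Equation 4.4: memberships
    w = (gamma[:, None] ** r / d2) ** (1.0 / (m - 1.0))
    U = w / w.sum(axis=0, keepdims=True)

    # Equation 4.5: prototypes
    Um = U ** m
    P = (Um @ X) / Um.sum(axis=1, keepdims=True)

    # Equation 4.6: normalisers, from the contributions to the new prototypes
    d2 = ((X[None, :, :] - P[:, None, :]) ** 2).sum(axis=-1) + eps
    contrib = (Um * d2).sum(axis=1) ** (1.0 / (r + 1.0))
    gamma = contrib / contrib.sum()
    return U, P, gamma

Setting r = 0 makes the γ_i^r factors equal to 1, so the membership and prototype updates reduce to those of FCM, in line with the remark above.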
We start with Figure 4.14; the plots can be compared directly to those of FCM
shown in Figure 4.4. Through visual assessment, we can observe a great overall
improvement in clustering accuracy. The data sets of Figures 4.14(a), (b), (d), and (e)
were clustered perfectly. Figures 4.14(c) and 4.14(f) were not; however, compared to
FCM, PDI's performance is a great improvement.
[Panels (a) to (f); recovered panel titles: D1:D2 = 5:11, 5:13, 8:10, and 8:12, all with P1:P2 = 10:10.]
Figure 4.14: PDI clustering of synthetic dot patterns with two colours representing the
two found clusters. Prototypes are marked out by the dotted blue lines. Compare with
the results in Figure 4.4.
Figure 4.15: PDI results as D 2 increases from (a) 9, (b) 11, (c) 13, (d) 15, (e) 17, to (f)
19. D 1 is fixed at 1. This is for P 1 : P 2 = 1 : 20.
In Figure 4.15 we observe how PDI copes extremely well with increasing D2
incrementally. It is only in the difficult case of D2 = 19 that PDI's accuracy is
compromised.
Figure 4.16 continues with more PDI results; the plots can be compared to those
of FCM shown in Figure 4.5. In columns (a), (b), and (c) of Figure 4.16, the case
of two equal-sized, touching clusters (D1:D2 = 10:10) is tested with changing
population ratios. Here we observe an interesting behaviour of PDI: it finds a cluster
within a cluster. This behaviour is also observed in Figure 4.16(f), where population
ratios are varied while the diameters remain fixed at D1:D2 = 5:10. This anomaly
of finding a cluster within a cluster is due to the light density of one of the clusters as
compared to the other. Because of the light density, the contribution is weak and thus
the corresponding cluster normaliser takes a low value. This in turn marks a smaller
region of influence for the cluster prototype. We explain this in more detail in Section
4.4.
We now plot the improvement of PDI over FCM in a summarised manner,
corresponding to Figures 4.6 to 4.10. In these plots, we use e_FCM - e_PDI as our
measure of PDI's improvement on FCM.
We start with all data sets with a D1 = 1 configuration and plot the improvement
in Figure 4.17. The plot resembles almost exactly that of e_FCM in Figure 4.6. Thus,
it confirms that PDI effectively equalises disproportionate objective-function
contributions for configurations with D1 = 1.
Comparing Figure 4.19 to Figure 4.9, where we fix the population ratio at
Figure 4.16: PDI clustering: D1:D2 fixed at 10:10 for column (a), (b), and (c), and
5:10 for column (d), (e), and (f). The P1:P2 ratio is varied. Compare with the results
in Figure 4.5.
Figure 4.17: Plot of e_FCM - e_PDI against D2 for D1 = 1. Each curve has a constant population ratio.
Figure 4.18: Plot of e_FCM - e_PDI against D2 for D1 = 5. Each curve has a constant population ratio.
Figure 4.19: Plot of e_FCM - e_PDI against D1 for P1:P2 = 1:20. Each curve has a constant D2.
P1:P2 = 1:20, we observe that whereas PDI effectively corrects FCM for values of D1
less than 6, its performance declines afterwards. However, PDI still retains a margin
of improvement over FCM for values of D1 > 6. The decline in performance is due to
the fact that at D1 > 6 the LHS cluster makes such a light contribution that correct
placement of its prototype would necessitate a small value for the corresponding
normaliser; thus the prototype moves towards the left and PDI identifies only a
subsection of the cluster.
Finally, comparing Figure 4.20 to Figure 4.10, where we fix the population ratio at
P1:P2 = 20:1, we observe that the plot follows the same trend as FCM's except
that it ventures below zero for values of 3 ≤ D1 ≤ 6. This is the same behaviour as
mentioned above. Once again, we note that the margin of error is not great and that for
most cases PDI effectively corrects for FCM's shortcomings.
Figure 4.20: Plot of e_FCM - e_PDI against D1 for P1:P2 = 20:1. Each curve has a constant D2.
4.4 Observations on PDI
The avenues of enquiry that PDI opens are quite numerous. In this Section we observe
the shape of the OF of PDI and compare it to that of FCM. We also touch on our
experience with the r exponent, and with PDI’s resilience to different initialisations.
If we move along only the x-axis and plot the Jk curve, Figures 4.22 to 4.25 show
the variations caused by different values of the normalisers and of r. These can be
compared to the FCM curve of Figure 4.12. When the two normalisers are equal, the
shape of the OF is the same as for FCM: two symmetrical valleys around each prototype.
The OF magnitudes are not, however, directly comparable. Setting γ1 = 0.1 causes a
thinner valley around the LHS prototype.
Figure 4.23: PDI (r = 1, γ1 = 0.1): The plot forms a thin valley around the LHS prototype, thereby giving a wider "scope" to the RHS prototype.
Figure 4.24: PDI (r = 1.5, γ1 = 0.1): Raising the value of r causes even stronger emphasis around the LHS prototype, and a much wider scope around the RHS prototype.
Figure 4.25: PDI (r = 0, γ1 = 0.1): Despite the low value of γ1, PDI's OF collapses to FCM's symmetrically-shaped one because r was set to 0. This plot is equal in magnitude to that of Figure 4.12.
The exponent of the normalisers plays an important role in how PDI performs. The
higher its value, the sharper the emphasis of the normalisers. The lower its value the
more PDI resembles FCM. In Figure 4.26 we demonstrate the results of applying PDI
at various values of r to a data set similar to those in our suite.
At small positive values of r, the boundary between both classes becomes slightly
curved, indicating that the normalisers have begun to have some effect. Beginning at
r = 2.4, we see that PDI classified a subset of the small cluster as a cluster of its own.
At r = 3.0 only one point in the small cluster is identified! The small-cluster prototype
is placed at the ideal location. This result indicates that PDI "spotted" the small cluster.
However, this result is very sensitive to the initialisation. Our experience is that if the
initialisation is far away from the ideal
[Panels for r = 0.0, 0.5, 1.0, 2.4, 2.6, and 3.0.]
Figure 4.26: Results of varying r in PDI. The value of r is labelled at the top of each graph. r = 0 renders PDI to be FCM. An interesting behaviour happens at r = 2.4.
r      γ1         γ2
0.0    0.468998   0.531002
0.5    0.444052   0.555948
1.0    0.032148   0.967852
1.2    0.031990   0.968010
1.4    0.037311   0.962689
1.8    0.050938   0.949062
2.0    0.057516   0.942484
2.2    0.063063   0.936937
2.4    0.066393   0.933607
2.6    0.061108   0.938892
2.8    0.000000   1.000000
3.0    0.000000   1.000000
4.0    0.000000   1.000000
Table 4.2: The effect of varying r on γ1 and γ2 in the data set of Figure 4.26.
In Table 4.2, the different values of r we used are tabulated against the corresponding
values of the normalisers γ1 and γ2; γ1 represents the small cluster. At r = 0,
the normalisers are approximately balanced. At r = 1.0, a steep descent in the value
of γ1 is clearly observed and the solution found is the correct one. The ratio γ2/γ1
here is about 30. At r = 2.4, the "aperture" of the small cluster begins to narrow, and
by r = 2.8 it has become only wide enough for a very small number of points. The
points are located around the ideal location for the prototype. The solution is therefore
technically correct! However, as mentioned above, this solution is sensitive to
initialisation.
We further observe that at values of r ≥ 2.8 the results became of doubtful use.
It is clear some form of divergence has occurred. In algorithmic implementations of
PDI such behaviour can be prevented by checking whether one of the normalisers is
heading towards an infinitesimally small value.
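One possible form of such a check (our own sketch; the threshold is arbitrary and would in practice be tuned):

import numpy as np

def normaliser_collapsing(gamma, threshold=1e-3):
    """Flag a PDI run in which a cluster normaliser is vanishing, as observed
    for r >= 2.8 in Table 4.2; the run can then be stopped or r reduced."""
    return bool(np.min(gamma) < threshold)

print(normaliser_collapsing(np.array([0.066393, 0.933607])))   # r = 2.4 row: False
print(normaliser_collapsing(np.array([0.000000, 1.000000])))   # r = 2.8 row: True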
PDI's performance was achieved. This we can justify on the basis that the normaliser
γ1 is at a local minimum with respect to its other values as r is varied. So, it would
be interesting to conduct a study on tuning the value of r based on the variation of the
normalisers, and finding out whether the tuned value correlates with better clustering
accuracy.
4.5 Summary and Conclusions
In the early parts of this Chapter, we established a shortcoming of FCM: its clustering
accuracy drops sharply in situations where there are small clusters lying close to large
ones. We rectified this shortcoming by introducing cluster-strength variables, one per
cluster, to normalise cluster contributions. In this way, solutions that identify the
clustering structure correctly become optimal in the eyes of the PDI OF.
The rationale for the weighting mechanism in FCM is to place one prototype in
the middle of each group of points in the data set. The rationale for PDI’s additional
weighting mechanism is to allow small clusters to be represented. For FCM, prototype
locations determine membership values and, therefore, the value of the OF. For PDI,
prototype locations are matched with normaliser values and together they determine
the membership values and, therefore, the value of the OF. Thus, normalisers grant a
scope to each prototype that matches the prototype’s relative contribution.
To fully assess this new algorithm, we reported in full its results on a variety of
data sets. We also showed that PDI remedies FCM's shortcoming. Our new OF has,
on the other hand, shown a shortcoming of its own: it may over-emphasise small,
compact clusters. It is also very sensitive to the value of r.
Our approach in this Chapter has been a fundamental one. We set up an idealised
framework and accordingly designed data sets to test specific hypotheses. We believe
new clustering algorithms, particularly ones derived from or similar to FCM, should
be tested on the specific behavioural properties we raised in this Chapter using our data
sets.
In the next Chapter, we present our experience with the use of clustering for image
analysis.
CHAPTER 5
Fuzzy clustering provides a good framework for medical image analysis. The main
advantages of this framework are that it provides a way to represent and manipulate
the fuzzy data contained in medical images, and that it provides flexibility in presenting
extracted knowledge to clinicians and radiologists.
This Chapter discusses the issues involved in the analysis of images in general,
but with particular attention to medical images, using fuzzy clustering methods. Since
segmentation is often considered the main step in the image analysis process, we will
mainly be discussing the segmentation of medical images using clustering.
We first give a brief background on medical imaging and the main medical imag-
ing modalities involved. In Section 5.2, a segmentation framework based on clustering
will be outlined; the decision points within this framework: feature extraction, method,
and post-processing, will be discussed. Continuing on our work in the previous Chap-
ter, in Section 5.3, we describe a synthetic 2D model of cardiac images on which we
compared the performances of FCM and PDI.
5.1 Medical Image Analysis
Medical imaging has developed exponentially in the past few years in terms of
technological advance and widespread use. High-resolution, three-dimensional
anatomical information can now be obtained in a routine manner with magnetic
resonance imaging (MRI) and X-ray computed tomography (CT). These two modalities
provide
complementary information; CT shows detail of bony structures and some contrast
between hard and soft tissues while MRI shows detail of soft tissue structures, with
almost no detail of bony structures. CT imaging, like all X-ray techniques, exposes the
patient to a dose of X-rays, thus, incurring some health risks. MRI does not expose the
patient to radiation, but uses the magnetic properties of the patient’s tissues to provide
contrast in the image, and as far as we know at present it is completely harmless.
5.2 Segmentation as a Process Involving Clustering
There is strong similarity between "clustering a data set" and "segmenting an image".
Both these processes share the goal of finding “true” classification of the input. “True”
here depends very much on the application at hand. In general, however, there is a
stronger requirement for accuracy placed on the segmentation process. This is mainly
because while the data processed by clustering methods may not represent a physical
reality, medical images represent physical anatomy.
The general clustering process, because of its exploratory nature, has license to
interpret and may be imprecise. Its main strength is that it is unsupervised, i.e., it does
[Figure 5.1: Image Data → Feature Extraction → Feature Vectors → Clustering Algorithm → Clustering Structure → Post-Processing → Image Description.]
Figure 5.1: The process of clustering image data for the purposes of segmentation.
not require any training data, and is automatic (requires minimal user interaction).
Segmentation methods on the other hand are not generally required to interpret, but
instead have to be accurate. While many segmentation methods require training data
and are only semiautomatic, automatic methods are welcome since they require no
training effort, or human resources.
Segmenting images using clustering defines three decision points for the process,
as shown in Figure 5.1. The first decision point that arises is: how will we present the
image data to the clustering algorithm? This we have named feature extraction and
we address below. The next decision point is: what algorithm do we choose to run
on the data, and of course, how do we set it up? In response to this, we have already
discussed a variety of algorithms in Chapters 2 and 3 and so we will not discuss this
further in this Chapter. Embedded in any algorithm chosen, will be the question of
choice of distance metric by which to measure the similarity between two constituent
points in the extracted data set. The last decision point is: how do we use the output
of the clustering method? In some cases, all that may be needed is a suitable colouring
scheme or similar human-computer-interaction device so that clinicians (experts) can
use the results easily. In Section 5.2.2, we discuss some of the methods to post-process
the output of fuzzy clustering methods.
Arguably, workers in the field of image analysis have dealt with the above three
questions with increasing sophistication over the past two decades. About twenty
years ago, most researchers made straightforward choices when clustering image data
[Schachter et al., 1979; Mui et al., 1977]. Recent works have delved deeper into
Also, new metrics specifically designed for image data have been proposed. For
example, in [Udupa & Samarasekera, 1996] the notion of “fuzzy connectedness” is in-
troduced as a natural, but computationally complex, measure of distance best-suited to
images. Also, in [Gath et al., 1997] a data-induced measure of distance was introduced
for the purpose of extracting non-convex patterns in images.
The pragmatic idea of carrying out the three steps of Figure 5.1 and then repeating
them in order to produce better results has also been considered in the literature. For
example, in [Bensaid et al., 1996] an automatic evaluation of the segmentation result
is formulated so that based on this evaluation, the process is repeated with a new set of
parameters beginning at the second step.
We now address three ways in which image data may be presented to a clustering
algorithm. These are: using only the voxel intensities, using the voxel intensities and
spatial coordinates, and extracting locality measures from the image data. We have
called this step feature extraction because, in the data analysis framework, clustering
methods work on “feature vectors”.
In general, image data arrive in the form of one or more 2-D, 3-D, or even 4-D
(including time) data lattices containing the image measurements, or intensities. Every
Figure 5.2: An example of three different images obtained by measuring three differ-
ent properties in MR brain imaging. These are, from left to right: PD, T1, and T2
respectively.
cell in the image lattice is called a voxel (or pixel if the image is 2D). In the cases where
there is more than one lattice, each image provides a specific type of measurement.
For example, in MR brain imaging there are usually three images acquired at different
times: T1 and T2 weighted, and proton density PD. This is illustrated in Figure 5.2.
To illustrate how data are organised, assume two equally-sized 3D image lattices
M1 and M2 . The voxels in each of these lattices are accessed via the spatial cartesian
coordinates (x; y; z ). So, if at voxel coordinates (xk ; yk ; zk ), the intensity as measured
on M1 is m1k, then m1k = M1[xk][yk][zk].
Voxel Intensities
The simplest way to extract image data into a clustering algorithm is to define the
feature-set as the available image measurements. Every spatial location in every image
lattice provides a feature element. These feature vectors are then constructed to serve
as X, the input data set. For example, we construct data set X consisting of two
features, corresponding to M1 and M2, as follows:

x_k = (m_{1k}, m_{2k}) \qquad \forall k \in \{1, \ldots, N\}.

Figure 5.3: An original tissue density MR image is shown on the left, while its PDI-
clustered segmented version is shown on the right (c = 4). Only the intensity data was
used. The max rule was used for defuzzification.
The simplicity of this approach and its sometimes quite accurate results are its main
strengths. Its most common application is when there are several feature images of the
same scene as in MR brain images or CT images [Clark et al., 1994; Clark et al.,
1998; Mansfield et al., 1998]. In such cases, the feature set consists of a given voxel’s
intensity in each image.
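As a small illustration (ours, with made-up lattice sizes), this feature extraction is a voxel-wise stacking of the lattices:

import numpy as np

def intensity_features(M1, M2):
    """Build X with x_k = (m_1k, m_2k): one two-dimensional feature vector per voxel
    of two equally-sized image lattices."""
    assert M1.shape == M2.shape
    return np.column_stack((M1.ravel(), M2.ravel()))     # shape (N, 2)

M1 = np.zeros((64, 64, 16))     # hypothetical 64 x 64 x 16 lattices
M2 = np.zeros((64, 64, 16))
print(intensity_features(M1, M2).shape)   # (65536, 2)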
Figure 5.3 shows a cardiac MR image of the type we use in our research. Using
pixel intensity as the only feature of the data set, a segmentation of the image into
four regions using PDI (randomly initialised) is shown. The histogram of the image is
shown in Figure 5.4. The placement of the prototypes by PDI is also shown.
Figure 5.4: The histogram of the MR image of Figure 5.3 for different bin sizes. The
vertical lines mark the locations of the found prototypes by PDI.
A simple way of addressing this problem is to append to each feature vector in
X (which corresponds to a voxel location) additional features containing that voxel’s
spatial coordinates.
Voxel Intensities and Spatial Coordinates
Clustering voxel intensities only, as described above, does not utilise the proximity
relationships that exist between neighbouring voxels. The direct way of taking this
into account is to add features for the spatial coordinates of the voxel.
For example, we construct data set X consisting of five features that correspond to
M1, M2, and the three spatial Cartesian coordinates, as follows:

x_k = (m_{1k}, m_{2k}, x_k, y_k, z_k) \qquad \forall k \in \{1, \ldots, N\},

where N is the size of either of the image lattices. Note that we may use a different
coordinate system, like polar or cylindrical, instead of the Cartesian one.
The values of the coordinates can be plotted as an image in their own right. Thus,
using the same framework as above, we have the original image lattices plus one or
more lattices containing coordinate information. By visualising things in this manner
we can see that the data set will contain a lot of regularity. Assuming a 2D image with
an x-y coordinate system and a single intensity feature, the data set would be
regularised on the grid of x-y coordinates and would look like a 3D rugged terrain.
This has influenced the design of special clustering algorithms that have no
general utility beyond this type of data, e.g., mountain clustering [Velthuizen et al.,
1997].
Intensity and spatial coordinate data will almost certainly not share the same units
and range. Thus, it is important to determine the weighting to give to each feature. This
is however a largely empirical exercise. In the image of Figure 5.3, the intensity (tissue
Figure 5.5: An original tissue density MR image is shown on the left, and different
FCM-clustered segmented versions are shown on the right (c = 3, q = 1.5). The first
segmentation was produced with zero weighting given to the x-y coordinates; then a
weighting of 10 was given to x and y, then 20, then 40, and finally a weighting of 60
was used. In the final image the clusters divide the x-y space equally between them.
density) values range from 0 to approximately 3700, while the x and y coordinates
range from 0 to 77 only.
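A sketch of this construction for a single 2D lattice (our illustration; w_xy stands for the coordinate weighting of 10, 20, 40, or 60 used to produce Figure 5.5):

import numpy as np

def intensity_and_position_features(M, w_xy=10.0):
    """Feature vectors (intensity, w_xy * x, w_xy * y) for a 2D image lattice M;
    w_xy rescales the pixel coordinates (here 0..77) towards the intensity range
    (here 0..3700) so that both kinds of feature carry comparable weight."""
    ny, nx = M.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    return np.column_stack((M.ravel(), w_xy * xx.ravel(), w_xy * yy.ravel()))

X = intensity_and_position_features(np.zeros((78, 78)), w_xy=20.0)
print(X.shape)   # (6084, 3)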
One approach to overcome this is to dynamically weight the spatial features and
then choose the value of the weight that minimises a suitable clustering validity crite-
rion [Boudraa et al., 1993]. In this case, the usual clustering validity measures may
not be suitable to make a judgement which is grounded in physical anatomy. However,
they may be useful in guiding the user to choose between different clustering results.
But a further problem lies in the fact that the objects in the image may not cluster in
shapes recognisable by the algorithm, e.g., spheres or ellipsoids.
Locality Measures
In this feature extraction approach, voxel intensity values are supplemented with other
“locality” features. A data point in X will therefore be composed of the intensity values
at the corresponding voxel and other numeric indicators that may be edge- or region-
based. These are usually measured over a small window centered around the voxel.
The histogram of this window region will have such features as mean, mean square
value (average energy), dispersion, skew, and so on. Results from this approach are
empirical and vary from one application to another [Tuceryan & Jain, 1993; Marchette
et al., 1997].
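For illustration only (this route was not pursued further in our work), local window statistics such as the mean and the mean square value can be gathered with a simple shift-and-add box filter:

import numpy as np

def window_features(M, half=2):
    """Per-pixel features (intensity, local mean, local mean square) computed over a
    (2*half + 1)^2 window centred on each pixel of the 2D lattice M."""
    ny, nx = M.shape
    padded = np.pad(M.astype(float), half, mode="edge")
    mean = np.zeros((ny, nx))
    energy = np.zeros((ny, nx))
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            win = padded[half + dy: half + dy + ny, half + dx: half + dx + nx]
            mean += win
            energy += win ** 2
    n = float((2 * half + 1) ** 2)
    return np.column_stack((M.ravel(), (mean / n).ravel(), (energy / n).ravel()))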
As we have not conducted much research into this approach, we can only say that
whereas it may provide very accurate results, it requires much more experimentation
than the above two approaches. There are many studies on novel locality
measures, and while these may be effectively applied to images containing textures,
most medical imaging modalities produce pictures that may not be aptly described by
mixtures of textures.
We now address three ways in which the output of a fuzzy clustering algorithm may
be processed for the purposes of obtaining a segmentation. First, the fuzzy member-
ship images provided by the algorithm can be thresholded to obtain crisp, segmented
images. Second, the fuzzy membership images can be combined to provide image
enhancement or used for segmentation display. Third, a small knowledge-base can be
used to supplement the fuzzy output of the algorithm.
Crisp Segmentation
From the outset, we should say that obtaining crisp membership values from fuzzy
ones involves throwing away information. This is one of the conundrums of fuzzy
logic applications. However, the argument of fuzzy logic proponents is: it is better to
have more information, which may be pared down at some point, than less information,
which may be wrong. Obtaining a fuzzy partition of the image gives us the option
of assessing the fuzziness of the solution before applying the “de-fuzzification” step.
Furthermore, the fuzzy partition provides more information than a crisp one, in case
high-level processing were conducted.
In such a case, the solution is very fuzzy and de-fuzzification may lead to tentative
(inaccurate) results.
Provided a cluster of interest has been determined, the memberships of that cluster can be plotted as a gray-level image. In such a case, maximum membership, 1, may be shown
as white and all other membership values scaled accordingly. Gray-level membership
images can provide good enhancement of an object of interest. Like standard contrast
enhancement techniques which give a bigger dynamic range to a particular section of
the intensity histogram, a fuzzy membership image will emphasise those pixels that
Figure 5.7: The image on the left is a colour-coded segmentation obtained using FCM (c = 3), while the image on the right is its PDI counterpart.
Also, when the number of clusters is small (ideally three or four), the membership values of all clusters can be plotted as a colour image. A colour is selected to represent each cluster and a given membership value is allocated a proportional strength of that colour. The resulting colour image provides at the very least a neat summary of the fuzzy output. This is shown in Figure 5.7, where both FCM's and PDI's combined membership images are displayed using a colour coding. In these images, the pixels are labelled with varying strengths of red, green, and blue, depending on their respective cluster memberships. The dark pixels are, therefore, those whose membership values are not strongly in favour of any one cluster.
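A sketch of this colour coding for up to three clusters follows; the assignment of clusters to the red, green, and blue channels is an arbitrary illustrative choice.

    import numpy as np

    def colour_code(U, shape):
        """Map the memberships of (up to) the first three clusters onto the red,
        green and blue channels.  Pixels without a strong membership in any one
        cluster come out dark, as in Figure 5.7."""
        n, c = U.shape
        rgb = np.zeros((n, 3))
        rgb[:, :min(c, 3)] = U[:, :3]
        return (255 * rgb).astype(np.uint8).reshape(*shape, 3)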
them [Rosenfeld, 1984; Krishnapuram & Keller, 1993b; Chi et al., 1996]. Often,
this is done with the purpose of designing an automatic classifier; [Clark et al., 1998]
provides a good example of this type of work.
5.3 FCM and PDI for Synthetic Image Segmentation
Having explained how clustering is used in image analysis, in this Section, we provide
a comparison between the performance of FCM and PDI on synthetic images that have
some similarity to the medical images we used in our research. We first describe our
synthetic model, then we present the results of both algorithms.
Figure 5.8: A synthetic image with w = 5. Class 0 is the background, class 1 is the
shell, and class 2 is the inside of the shell.
The shell was given a width, w, which we varied in our experiments. Figure 5.8 shows an example of one such image with w = 5. Our methodology is to vary w and observe its effect on the quality of both FCM's and PDI's clustering. We measure the quality by counting the number of misclassified pixels.
In all our experiments below, we use m = 2 and, for PDI, r = 1.5. These values were selected in accordance with our experience from the previous Chapter. We chose the values 0, 45, and 80 for a, b, and c respectively, and the values 45, 35, and 4 for the corresponding intensity ranges of the three classes. These values were arbitrary but selected to test the familiar problem of close clusters of different sizes (Classes 0 and 1), this time with a third cluster present (Class 2). Class 2 is a relatively compact and well-separated cluster in comparison to the other two. This is evident in Figure 5.9, which shows the histogram distributions of the synthetic images corresponding to w = 3, 5, 7, 9, and 11 respectively.
[Figure 5.9, panels (a) to (e): histograms of pixel intensity (horizontal axis, -22.5 to 80) against pixel count (vertical axis, 0 to 3500) for the five synthetic images.]
Figure 5.9: Plots (a), (b), (c), (d), and (e) are the histogram distributions of the synthetic
images corresponding to w =3,5,7,9, and 11 respectively. The columns in each plot
correspond, from left to right, to classes 0, 1, and 2 respectively (background, shell,
and inside of the shell). The height of a column depicts the number of pixels in the
class it represents. The width of a column depicts the intensity distribution of the class.
The background and shell contain a varying number of pixels according to w and have wide, almost-touching intensity ranges, whereas the inside of the shell has a narrow range.
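The synthetic images and the misclassification count can be generated along the following lines; the circular shell geometry, the image size, the inner radius, and the uniform intensity draw are assumptions made for illustration, while the class centres (0, 45, 80) and ranges (45, 35, 4) are the values quoted above.

    import numpy as np

    rng = np.random.default_rng(0)

    def synthetic_image(w, size=64, r_inner=10):
        """Class 0 is the background, class 1 a circular shell of width w, and
        class 2 the inside of the shell.  Intensities are drawn uniformly around
        the class centres 0, 45 and 80 with ranges 45, 35 and 4 respectively."""
        ys, xs = np.mgrid[0:size, 0:size]
        radius = np.hypot(xs - size / 2, ys - size / 2)
        labels = np.zeros((size, size), dtype=int)
        labels[(radius >= r_inner) & (radius < r_inner + w)] = 1      # the shell
        labels[radius < r_inner] = 2                                  # inside the shell
        centres, ranges = (0, 45, 80), (45, 35, 4)
        image = np.empty((size, size))
        for k in range(3):
            mask = labels == k
            image[mask] = rng.uniform(centres[k] - ranges[k] / 2,
                                      centres[k] + ranges[k] / 2, mask.sum())
        return image, labels

    def misclassified(predicted_labels, true_labels):
        """Count wrongly classified pixels, after matching cluster indices to classes."""
        return int((predicted_labels != true_labels).sum())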
5.3.2 Results
PDI’s segmentation results were a great improvement over FCM’s. This is confirmed
by Table 5.1, which compares FCM and PDI in terms of classification
accuracy. The visual segmentation results obtained for both FCM and PDI are shown
in Figures 5.10 and 5.11. We observe that FCM performs rather badly at smaller values
of w .
At w = 7, most of the pixels in class 1 are correctly classified by PDI and near-
perfect results are attained at w = 9. FCM continues to struggle. We note that whereas PDI misclassifies a small section of class 1's pixels at smaller values of w, by w = 11 (where class 1 is now more populous than class 0) it extends class 1 to cover some of the noisier points in class 0.
Figure 5.10: The left side column shows FCM results and the right side column shows
PDI results. The top row shows results for w = 3, next is w = 5, and bottom-most is
w = 7.
Figure 5.11: The left side column shows FCM results and the right side column shows
PDI results. The top row shows results for w = 9 and the bottom row is for w = 11.
This study shows how the population and diameter of a cluster affect a clustering algorithm's performance. FCM would have coped well with this problem had there been a large separation between the intensity ranges of the classes. PDI performs much better on this type of problem, in which cluster sizes and populations are grossly unequal.
5.4 Conclusions
This Chapter provided a summary of our experience with clustering images for the
purpose of segmentation. We have divided the segmentation-by-clustering process into
three decision phases: feature extraction, clustering, and post-processing. Within the
clustering phase itself there are also decisions to be made about algorithm and distance
metric. We also demonstrated the advantage of PDI over FCM for some synthetic
images. Furthermore, we briefly reviewed the image clustering literature.
Since most clustering algorithms suffer from shortcomings that may affect accuracy, it is essential for the user to be aware of those of their preferred algorithm. Some segmentations are impossible to produce using clustering unless the right features are extracted to act as input to the clustering algorithm. Thus, empirical feature extraction plays an important role, as will be seen in the next Chapter.
CHAPTER 6
This Chapter presents the results of our published work on using fuzzy clustering in a
cardiac imaging application. The aim was to segment and track the volume of the left
ventricle during a complete cardiac cycle. The images used are MR images containing
tissue density and velocity data. Since there is no other published work on analysing
this type of image using fuzzy clustering, our application is a novel one. Our results may be viewed as an investigation into the feasibility of this approach.
The Chapter proceeds as follows. Section 6.1 presents a brief review of the anatomy
and physiology of the cardiovascular system. Section 6.2 describes the type of velocity
(or flow) images we used in this research. Section 6.3 gives the specifics of our appli-
cation and Section 6.4 describes our results in full. The research presented here uses
PDI for clustering.
6.1 The Cardiovascular System
The cardiovascular system is responsible for blood circulation in the human body. It
supplies blood to cells throughout the body. Blood acts as a transport medium, carrying oxygen from the lungs to the cells and carbon dioxide from the cells back to the lungs. This circulation of the blood is achieved by a pump — the heart — which
forces the blood through elastic tubes — the blood vessels.
Blood Vessels
The main function of the blood vessels is to carry the blood throughout the body. If
the blood flows away from the heart the blood vessels are called arteries. If the blood
flows to the heart the blood vessels are called veins. The largest artery is the aorta
which is characterised by a number of bifurcations. A third type of blood vessel, the capillaries, connects the arteries to the veins.
Heart Structure
A schematic diagram of the heart is shown in Figure 6.1. The heart consists of two pairs
of chambers: the left and right ventricles and the left and right atria. The ventricles act as
pumps while the atria act as reservoirs. Blood enters the heart from its long journey
around the body through the superior and inferior vena cava into the right atrium. This
blood has very little, if any, oxygen. It then passes through the tricuspid valve into the right ventricle.

[Figure 6.1: schematic diagram of the heart, with the aorta, pulmonary artery, vena cava, veins, right and left atria, semilunar valves, tricuspid valve, mitral valve, and chordae tendinae labelled.]

After the right ventricle contracts, the blood is forced through the pulmonary
semilunar valve and into the pulmonary artery. The pulmonary artery splits into the right and left pulmonary arteries, through which the still oxygen-deficient blood travels to the lungs. The blood becomes enriched with oxygen and travels back toward the heart. It enters the heart via the right and left pulmonary veins, which come directly from the lungs, and then enters the left atrium. The bicuspid (mitral) valve opens and the blood falls into the left ventricle. The ventricle contracts and the blood rushes past the aortic semilunar valve and into the aorta, which is the largest artery
in the body. Now the blood is on its way back to the body.
The walls of the ventricles are composed of muscular tissue and form what is known as
the myocardium. During the cardiac cycle, the myocardium contracts, pumping blood
out of the ventricular chambers and through the semilunar valves. The myocardium’s
inner surface is called the endocardium, while the outer surface is called the epicardium.
In normal conditions the human heart beats between 65 and 75 times per minute.
Each heart beat corresponds to an entire cardiac cycle which can be characterised by a
contraction phase (systole) and a relaxation phase (diastole) of the atria and ventricles.
The systole can be divided into two phases. In the first phase the atrioventricular valves
close, the ventricular muscle starts to contract, and the ventricular pressure increases
due to the closed artery valves. At this stage the volume does not change and the phase
is referred to as iso-volumetric contraction. In normal conditions this phase lasts for 60 ms. In the second phase, the artery valves open due to the increased pressure, the ventricular muscles contract and the ejection starts. Normally, the left ventricle ejects only half of its volume of ca. 130 ml as stroke volume into the aorta. At the end of this phase a rest volume of ca. 70 ml remains in the ventricle, and the artery valves close.
Similarly to systole, diastole can also be divided into two phases. During the first
phase of the relaxation all valves are closed and the relaxation is iso-volumetric. The
ventricular pressure drops rapidly. During the second phase the valves separating atria
and ventricles open and the ventricles are filled first rapidly and then more slowly. The
ventricular pressure increases slightly. Then the cardiac cycle starts again.
Quantitative Measurements
There are a number of quantitative measurements which can provide valuable clinical
information for the assessment of the heart [Mohiaddin & Longmore, 1993]. Myocardial functionality can be assessed by measuring the ventricular volume, the stroke volume and the rest volume. Based on these quantities it is possible to calculate the ejection fraction of the ventricles, which is the ratio of the stroke volume to the end-diastolic volume. Other indicators of myocardial functionality are the muscle thickness and mass as well as wall motion and thickening during the cardiac cycle. Arterial functionality can be assessed by measuring the distensibility or elasticity of the arteries in terms of compliance, which is defined as the change in volume per change in pressure during the cardiac cycle.
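For example, with the approximate figures quoted earlier (an end-diastolic volume of ca. 130 ml and a rest volume of ca. 70 ml), the stroke volume is about 60 ml, giving an ejection fraction of roughly 60/130 ≈ 0.46.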
6.2 MR Imaging and Velocity Quantification

Magnetic Resonance images picture anatomic detail by measuring tissue density in the
plane of imaging. Every pixel in an MR image carries a value that is proportional to the
average tissue density registered by the MR scanner at the corresponding approximate
location in the plane of imaging.
The magnetic resonance signals are caused by hydrogen nuclei present in the tissue. The nuclei spin on their axes, generating magnetic moments that make them behave as magnetic dipoles. When these nuclei are placed in the magnetic field of the scanner,
the axes of spin precess about the direction of the applied magnetic field. The fre-
quency of precession is directly proportional to the strength of the magnetic field each
nucleus experiences.
Flow velocity quantification [Rueckert, 1997; Yang, 1998] is based on the observa-
tion that as spins move along an imaging magnetic field gradient, they acquire a shift
in their angular position relative to those spins that are stationary. This is called a spin
phase shift, and it is proportional to the velocity with which a spin moves. This shift
in the phase angle of the spins is a parameter contained within the detected MR signal
and can be readily measured.
The composite MR signal provides two images. The first one is the conventional image, called the modulus, or magnitude, image, in which the image signal intensity is simply related to the magnitude of the MR signal. The second image is the phase
image in which the signal intensity is proportional to the shift in spin phase relative to
the stationary spins. This phase image, therefore, provides a pixel-by-pixel mapping of
spin velocities, given that both the strength of the magnetic field gradient and the time
during which the spins are exposed to the gradient are known. Since these features of
the sequence can be explicitly determined, it is possible for the user to define a desired
amount of spin phase shift per unit velocity and consequently determine flow rates
from the phase image.
To display flow in two opposite directions, a gray scale for displaying the spin phases is chosen so that zero phase shift is mid-gray. Spins that move into the scanner will typically acquire positive phase shifts of 0 to 180 degrees; these are assigned a proportional intensity from mid-gray to white. Spins that move in the opposite direction acquire negative phase shifts of 0 to 180 degrees; these are assigned a proportional intensity from mid-gray to black. This is similar to colour Doppler echocardiography, in which flow toward and away from the transducer is displayed with two different colours, red and blue.
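As a rough sketch of this mapping, assume the phase image stores signed phase shifts in degrees and that the operator has chosen the velocity corresponding to a 180-degree shift (the velocity-encoding value, called venc below); the array layout and the names are assumptions.

    import numpy as np

    def phase_to_velocity(phase_deg, venc):
        """Map spin phase shifts (degrees, in [-180, 180]) to velocities:
        +180 degrees corresponds to +venc, -180 degrees to -venc."""
        return (np.asarray(phase_deg, dtype=float) / 180.0) * venc

    def phase_to_gray(phase_deg):
        """Display convention described in the text: zero shift -> mid-gray,
        +180 degrees -> white, -180 degrees -> black."""
        g = 127.5 + np.asarray(phase_deg, dtype=float) * (127.5 / 180.0)
        return np.clip(np.round(g), 0, 255).astype(np.uint8)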
6.3 Novel Application to Velocity Images

We now detail the results of our work [Shihab & Burger, 1998a; Shihab & Burger,
1998b] using cardiac velocity MR images. We describe the feature extraction, clus-
tering, and post-processing decisions we made in this specific application. Our appli-
cation consists of analysing MR image cine sequences acquired at the mid-ventricular
plane of the heart. The images are conventional MR tissue density images as well as
velocity images. Our objective is to segment and track the Left Ventricle (LV).
The cine sequences of images are aligned with the short-axis of the left ventricle
Figure 6.2: A plane of imaging that provides a short-axis view of the heart would be parallel to the plane shown. © Auckland Cardiac MRI Unit.
Figure 6.3: Examples of tissue density images: frames 0, 2, 4, 6, 8, 10, 12, and 14 in
an image sequence.
(illustrated in Figure 6.2). The velocity data is rendered as three images, vx, vy, and vz, corresponding to the cartesian components of the velocity vector field V at each pixel. The
reference coordinate system has the x-y plane lying on the plane of imaging (aligned
with the short-axis of the left ventricle) and the z axis perpendicular to it (aligned with
the LV long-axis).
The image sequences contain 16 frames. The sequences start at systole and end at early diastole. The time spacing between consecutive frames is approximately 40 ms. Figure 6.3 displays example frames from a sequence. Figure 6.4 displays four frames from each of the three velocity components. We remark that each image is normally generated from 256 heartbeats and therefore depicts the average behaviour of the heart over a large number of heartbeats. However, the information
provided is useful for observing the global dynamics of the heart and we can still refer
in a meaningful manner to a particular time of the cine sequence since it belongs to a
definite phase of the cardiac cycle.
[Figure 6.4: four example frames from each of the three velocity components, shown as rows (a), (b), and (c).]
Figure 6.5: θ and ϕ define the direction of the velocity vector (vx, vy, vz) at a given point.
Each frame in a cine sequence contains several types of data: the tissue density data, I, and the velocity data, vx, vy, and vz. Further, we can use the x and y spatial coordinates of each pixel, assuming a cartesian coordinate system, or the r and θ coordinates, assuming a polar coordinate system. The cartesian velocity data
can also be transformed to spherical or cylindrical data values. Thus, with very little
pre-processing, many possible features can be selected for each pixel.
In all our experiments, we used the two cartesian spatial coordinates, x and y , as
features. However, we did not enter into the issue of finding suitable weighting for the
spatial features. As their range is much smaller than that of either the tissue density or velocity data, they had little effect on the results. Nevertheless, we left them in since they are useful in the post-processing stage.
We assessed the impact of velocity features by clustering first without them, and
then with combinations of them. The features for the first experiment consisted of
x, y , and I (tissue density data without V ). In the second experiment we added V
which is the magnitude of the three velocity components at each pixel: vx , vy , and vz
(V = √(vx² + vy² + vz²)). In the third experiment, we removed V and replaced it with θ and ϕ. These angles describe the direction of the velocity field at a given pixel, as
shown in Figure 6.5.
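A sketch of how the three per-pixel feature sets can be assembled is given below; the exact formulae used for θ and ϕ are assumptions based on Figure 6.5.

    import numpy as np

    def feature_sets(I, vx, vy, vz):
        """Build the three feature sets: (x, y, I), (x, y, I, V) and (x, y, I, theta, phi).
        All inputs are 2D arrays of the same shape."""
        h, w = I.shape
        ys, xs = np.mgrid[0:h, 0:w]
        V = np.sqrt(vx ** 2 + vy ** 2 + vz ** 2)                 # velocity magnitude
        theta = np.arctan2(vy, vx)                               # in-plane direction
        phi = np.arctan2(vz, np.hypot(vx, vy))                   # elevation towards vz
        base = [xs.ravel(), ys.ravel(), I.ravel()]
        X1 = np.column_stack(base)
        X2 = np.column_stack(base + [V.ravel()])
        X3 = np.column_stack(base + [theta.ravel(), phi.ravel()])
        return X1, X2, X3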
6.3.2 Method
In all experiments we ran the PDI algorithm. The fuzziness factor m was set at 1.5, and the normalisers' exponent r was fixed at 1.0. Also, the number of clusters, c, was set to four, as this gave the most intuitive segmentation of the images. As is known, PDI's output is in
the form of cluster prototypes, membership matrix, and normalisers. In the results we
present here, we utilised the membership matrix.
For each data set belonging to a frame after the first one, we initialised PDI with the prototype locations found for the previous frame. The first frame's data was randomly initialised. An entire patient sequence would take between 3 and 4 minutes on a recent Pentium PC.
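A sketch of this frame-to-frame warm starting is given below; cluster_fn stands in for the PDI routine, and the interface assumed for it (data, number of clusters, m, r, and initial prototypes in; prototypes and memberships out) is an assumption rather than the actual implementation.

    import numpy as np

    def track_sequence(frames, cluster_fn, c=4, m=1.5, r=1.0, seed=0):
        """Cluster each frame of a cine sequence, initialising every frame after
        the first with the prototypes found in the previous frame.  frames is a
        list of (N, d) feature matrices, one per frame."""
        rng = np.random.default_rng(seed)
        results, prototypes = [], None
        for X in frames:
            if prototypes is None:
                init = X[rng.choice(len(X), size=c, replace=False)]   # random start
            else:
                init = prototypes                                     # warm start
            prototypes, memberships = cluster_fn(X, c, m, r, init)
            results.append((prototypes, memberships))
        return results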
6.3.3 Post-Processing
Having clustered a patient’s data (in the three ways stated above), we then selected the
cluster corresponding to the LV blood pool area. This can be effected in two ways: the first is to estimate which of the found prototypes represents the LV; the second is to plot a max-rule segmentation of the first frame, from which one can visually determine the LV cluster. Membership images of the LV cluster for the two cases of without-V and
with-V are shown for a normal patient in Figures 6.6 and 6.7.
Once we have determined the LV cluster, we can now count the pixels in the LV
area. Using the x and y features of the LV cluster’s prototype as a seed, we ran a region
growing routine on the max-rule segmented images. These provided us with a count
of the pixels in the LV area for each of the chosen data sets, for each patient.
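A sketch of this count via region growing on the max-rule labelled image is given below; the 4-neighbour connectivity and the argument layout are assumptions.

    import numpy as np
    from collections import deque

    def lv_area(labels, seed, lv_label):
        """Count the pixels connected (4-neighbourhood) to the seed that carry the
        LV label in a max-rule segmented image.  labels: 2D integer label image;
        seed: (row, col) taken from the x, y features of the LV cluster prototype."""
        h, w = labels.shape
        visited = np.zeros(labels.shape, dtype=bool)
        queue, count = deque([tuple(seed)]), 0
        while queue:
            i, j = queue.popleft()
            if not (0 <= i < h and 0 <= j < w) or visited[i, j] or labels[i, j] != lv_label:
                continue
            visited[i, j] = True
            count += 1
            queue.extend([(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)])
        return count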
Figure 6.6: First experiment (only tissue density data): membership images of the LV
cluster tracked from frames 0 to 15 (left-to-right, top-to-bottom) for a normal patient.
Figure 6.7: Second experiment (tissue density and V data): membership images of
the LV cluster tracked from frames 0 to 15 (left-to-right, top-to-bottom) for the same
patient as in Figure 6.6.
6.3.4 Results
We remark here that we faced difficulties in our investigations due to the unreliable data values sometimes produced in phase-contrast MRI studies, and due to the length of time required to collect the data for a single patient study. We therefore clarify that our intention is to illustrate the application of fuzzy clustering to this type of study, rather than to present a complete, validated medical investigation.
In Figure 6.8, we compare the calculated areas of the left ventricle obtained by the three routes we took with a 'ground truth' established by a clinician. The cine sequence is that of a normal patient.
[Figure 6.8: plot of LV planar area (pixels, 0 to 4000) against frame number (0 to 16) for the ground truth and the three max-rule segmentations: density data only, density + V, and density + directions.]
Figure 6.8: Comparison of calculated LV area for the three data sets used.
The general trend of all the curves as compared to the ground truth is correct.
However, we observe that using the velocity-magnitude feature causes somewhat erratic estimates of LV area. Furthermore, these estimates were generally greater than
the correct values. In general, it was difficult to distinguish between the results of
density-only and density-and-velocity-direction features. As can be seen in the plot,
the estimates using these two feature sets were consistently less than the correct values.
6.4 Conclusions
In this Chapter, we studied the cardiac system and then investigated the viability of us-
ing fuzzy clustering as the principal method for segmentation and tracking of the LV.
We proceeded along the same steps outlined in the previous Chapter: feature extrac-
tion, clustering, and post-processing. In the feature extraction step, we experimented
with novel feature sets that include velocity data made available through phase con-
trast MR. In the clustering step, we used our novel PDI clustering algorithm. In the
post-processing step, we took a conventional route and used the max rule.
We conclude by reviewing our experience. First, our results were generally accu-
rate and can be used for quantifying cardiac measures. Clinicians easily understood
the concept of clustering and immediately grasped its application. The strength of the
method lies in its general flexibility and accuracy. Decisions such as setting a value for c (the number of clusters), fixing values for the clustering parameters, and identifying the cluster of interest give the user flexibility. Once these decisions have been made for one patient, the processing of the other data sets can be automated.
Second, in studying the effect of using extra velocity-related features, we found that
they enhanced accuracy for only one frame out of the 16, as compared to a conven-
tional feature set containing tissue density data. We also found that velocity-directional
features provided more accurate results than velocity-magnitude features.
bly necessitate using polar coordinates instead of cartesian ones and weighting the spa-
tial coordinates suitably. Including velocity features would probably increase the ex-
tent of accurate segmentation because of the relative lack of motion of the myocardium.
CHAPTER 7
This dissertation investigated the FCM algorithm and devised a new algorithm, PDI,
to address a behavioural shortcoming of FCM. The shortcoming is that FCM does
not classify accurately a data set containing small clusters lying close to big clusters.
We found the reason for this to be that the objective function which is at the heart of
FCM becomes inadequate in situations like those stated above. It does not have the
flexibility of narrowing or widening the scope of a cluster prototype. By scope of a
cluster prototype we mean an area around the prototype in which points would add
little cost to the objective function. If the objective function allows a given prototype
to possess a relatively wider scope than other prototypes, points that lie far from the
given prototype, but within its scope, would not be costly. FCM’s objective function
gives an equal amount of scope to each prototype, and this causes the correct solution to be costly when clusters are of unequal sizes; the situation is made worse if the clusters are of unequal populations as well.
the smaller clusters to be found. For each prototype, PDI redefines its “cluster contri-
bution” to be the same as FCM’s but divides it by a variable, the cluster normaliser.
This normalisation creates a non-equal distribution of scopes for each prototype. Thus,
small clusters are granted small scopes, because they take small normaliser values, and
they therefore become less costly and have a higher chance of being found.
This dissertation also critically investigated the process of analysing image data by
using fuzzy clustering. We highlighted three decision points in this process: feature extraction, algorithm and parameters, and post-processing method. We described ex-
amples of each of these decision points. Furthermore, we compared FCM’s and PDI’s
clustering of medical MR images, and designed synthetic data to test this.
Finally, the thesis presented the results of a novel application of fuzzy clustering
in medical image analysis. We used velocity data obtained by using a phase-sensitive
MR technique, as well as the usual tissue density data, to track the left ventricle in
image cine sequences. We found that the availability of velocity-directional data increases the accuracy of the overall clustering.
(e) It might be useful to extend PDI in some of the ways FCM was extended.
So, for example, how would a PDI-G-K algorithm (see Section 3.3.1) differ
from the plain G-K algorithm? Likewise, we can create a possibilistic (see
Section 3.4.1) version of PDI and compare its performance to the original.
The points we propose below are independent of the clustering algorithm used, except where mentioned.
(a) In Chapter 6, we only clustered the image data available in one cross-
sectional slice. Even though there is no spatial continuity in multi-slice
(b) Similarly to the above, clustering the volume data together with time-periodic information summarised in a phase-angle feature per voxel (an extension of the approach in [Boudraa et al., 1993]) may yield an improvement in the accuracy of the results obtained via point (a) above.
The goal of clustering methods, namely detecting an inherent clustering in the data set and then accurately describing it, is a complex exploratory process. In two-dimensional feature space, it seems that no method or strategy is as versatile as a human observer. In practical applications, therefore, misleading interpretations of cluster structure will have to be detected and corrected by human expertise.
7.3 Final Conclusions
Widening our view beyond our PhD research, we offer the following conclusions on the subjects of clustering and image analysis:

1. The usual logic, which consists of applying a clustering algorithm first and then assessing the clustering tendency from the algorithm's results, assumes perfect accuracy of the clustering algorithm, which is not guaranteed. Furthermore, this two-step computational effort ought to be replaced with a simpler one-off test. The approaches of [Dunn, 1973; Windham, 1982] are interesting and should be followed up.
2. Graph-theoretic methods have not been combined with objective function methods. This would seem to be a fruitful research area, as objective function methods rely on distance metrics that do not "see" connectivity or the lack of it, while that is the graph-theoretic methods' strongest point.
References
Ahuja, N, & Tuceryan, M. 1989. Extraction of early perceptual structure in dot pat-
terns: Integrating region, boundary and component gestalt. Computer Vision,
Graphics, and Image Processing, 48(3), 304–356.
Al-Sultan, Khaled S, & Fedjki, Chawki A. 1997. A Tabu Search-Based Algorithm for
the Fuzzy Clustering Problem. Pattern Recognition, 30(12), 2023–2030.
AlSultan, K S, & Khan, M M. 1996. Computational experience on four algorithms for
the hard clustering problem. Pattern Recognition Letters, 17(3), 295–308.
AlSultan, K S, & Selim, S Z. 1993. A Global Algorithm for the Fuzzy Clustering
Problem. Pattern Recognition, 26(9), 1357–1361.
Backer, E. 1995. Computer-Assisted Reasoning in Cluster Analysis. Prentice Hall.
Bajcsy, P, & Ahuja, N. 1998. Location- and density-based hierarchical clustering using
similarity analysis. IEEE Transactions on Pattern Analysis and Machine Intelli-
gence, 20(9), 1011–1015.
Banfield, Jeffrey D, & Raftery, Adrian E. 1993. Model-Based Gaussian and Non-
Gaussian Clustering. Biometrics, 49(September), 803–821.
Barni, M, Cappellini, V, & Mecocci, A. 1996. A possibilistic approach to clustering -
Comments. IEEE Transactions on Fuzzy Systems, 4(3), 393–396.
Bensaid, Amine M, Hall, Lawrence O, Bezdek, James C, Clarke, Laurence P, Sil-
biger, Martin L, Arrington, John A, & Murtagh, Reed F. 1996. Validity-Guided
(Re)Clustering with Applications to Image Segmentation. IEEE Transactions on
Fuzzy Systems, 4(2), 112–123.
Bezdek, J C, Hall, LO, Clark, MC, Goldgof, Dmitri B, & Clarke, LP. 1997. Medical
Image Analysis with Fuzzy Models. Statistical Methods in Medical Research, 6,
191–214.
Bezdek, James. 1981. Pattern Recognition with Fuzzy Objective Function Algorithms.
Plenum Press.
Bezdek, James C. 1980. A Convergence Theorem for the Fuzzy ISODATA Clustering
Algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2(1).
Bezdek, James C, & Pal, Sankar K (eds). 1992. Fuzzy Models for Pattern Recognition.
IEEE Press.
Bobrowski, Leon, & Bezdek, James C. 1991. c-means Clustering with the l1 and l∞ Norms. IEEE Transactions on Systems, Man, and Cybernetics, 21(3), 545–554.
Boudraa, A, Mallet, J-J, Besson, J-E, Bouyoucef, S, & Champier, J. 1993. Left Ven-
tricle Automated Detection Method in Gated Isotopic Ventriculography Using
Fuzzy Clustering. IEEE Transactions on Medical Imaging, 12(3).
Chen, Mu-Song, & Wang, Shinn-Wen. 1999. Fuzzy Clustering Analysis for Optimiz-
ing Fuzzy Membership Functions. Fuzzy Sets and Systems, 103, 239–254.
Cheng, TW, Goldgof, DB, & Hall, LO. 1998. Fast Fuzzy Clustering. Fuzzy Sets and
Systems, 93(1), 49–56.
Chi, Zheru, Yan, Hong, & Pham, Tuan. 1996. Fuzzy Algorithms: With Applications to
Image Processing and Pattern Recognition. World Scientific.
Chintalapudi, Krishna K, & Kam, Moshe. 1998 (October). The Credibilistic Fuzzy C
Means clustering algorithm. Pages 2034–2039 of: IEEE International Conference
on Systems, Man, and Cybernetics.
Clark, MC, Hall, LO, Goldgof, DB, Velthuizen, R, Murtagh, FR, & Silbiger, MS.
1998. Automatic tumor segmentation using knowledge-based techniques. IEEE
Transactions on Medical Imaging, 17(2), 187–201.
Cox, Earl. 1998. The Fuzzy Systems Handbook. 2nd edn. Academic Press/Morgan
Kaufmann.
Davé, Rajesh N. 1992. Boundary Detection through Fuzzy Clustering. Pages 127–134
of: IEEE Conference on Fuzzy Systems. IEEE Press.
Davé, Rajesh N, & Krishnapuram, Raghu. 1997. Robust clustering methods: A unified
view. IEEE Transactions on Fuzzy Systems, 5(2), 270–293.
Davson, H., & Segal, M. B. 1975. Introduction to Physiology. Vol. 1. Academic Press.
Duda, R O, & Hart, P E. 1973. Pattern Classification and Scene Analysis. Wiley, New
York.
Dunn, J C. 1973. A Fuzzy Relative of the ISODATA Process and Its Use in Detecting
Compact Well-Separated Clusters. J. Cybernetics, 3(3), 32–57.
Everitt, Brian. 1978. Graphical Techniques for Multivariate Data. North Holland
Publ.
Fayyad, Usama, Haussler, David, & Stolorz, Paul. 1996b (August). KDD for science
data analysis: issues and examples. In: Proceedings of the second international
conference on knowledge discovery and data mining KDD-96.
Fraley, C, & Raftery, A E. 1998. How many clusters? Which clustering method?
Answers via model-based cluster analysis. Computer Journal, 41(8), 578–588.
Frigui, Hichem, & Krishnapuram, Raghu. 1999. A robust competitive clustering algo-
rithm with applications in computer vision. IEEE Transactions on Pattern Analy-
sis and Machine Intelligence, 21(5), 450–465.
Gath, I, & Geva, A. 1989. Unsupervised Optimal Fuzzy Clustering. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 11, 773–781.
Gath, I, Iskoz, A S, & Cutsem, B Van. 1997. Data induced metric and fuzzy clustering
of non-convex patterns of arbitrary shape. Pattern Recognition Letters, 18, 541–
553.
Gustafson, Donald E, & Kessel, William C. 1979 (Jan. 10-12). Fuzzy Clustering with
a Fuzzy Covariance Matrix. Pages 761–766 of: Proc. IEEE CDC.
Hall, Lawrence O, Ozyurt, Ibrahim Burak, & Bezdek, James C. 1999. Clustering
with a Genetically Optimized Approach. IEEE Transactions on Evolutionary
Computation, 3(2), 103–112.
Hathaway, Richard J, & Bezdek, James C. 1993. Switching Regression Models and
Fuzzy Clustering. IEEE Transactions on Fuzzy Systems, 1(3), 195–203.
Hathaway, Richard J, Bezdek, James C, & Pedrycz, Witold. 1996. A Parametric Model
for Fusing Heterogeneous Fuzzy Data. IEEE Transactions on Fuzzy Systems, 4(3),
270–281.
Hoppner, Frank, Klawonn, Frank, Kruse, Rudolf, & Runkler, Thomas. 1999. Fuzzy
Cluster Analysis. Wiley.
Huang, ZX. 1998. Extensions to the k-means algorithm for clustering large data sets
with categorical values. Data Mining and Knowledge Discovery, 2(3), 283–304.
Jain, Anil K. 1986. Cluster Analysis. Chap. 2 of: Young, Tzay Y, & Fu, King-Sun
(eds), Handbook of Pattern Recognition and Image Processing, vol. 1. Academic
Press.
Klir, George J., Clair, Ute St., & Yuan, Bo. 1997. Fuzzy Set Theory: Foundations and
Applications. Prentice Hall.
Kothari, Ravi, & Pitts, Dax. 1999. On finding the number of clusters. Pattern Recog-
nition Letters, 20, 405–416.
Krishnapuram, Raghu, & Keller, James M. 1996. The possibilistic C-means algorithm:
Insights and recommendations. IEEE Transactions on Fuzzy Systems, 4(3), 385–
393.
Krishnapuram, Raghu, & Kim, Jongwoo. 1999. A note on the Gustafson-Kessel and
Adaptive Fuzzy Clustering Algorithms. IEEE Transactions on Fuzzy Systems,
7(4), 453–461.
Kwon, S H. 1998. Cluster Validity Index for Fuzzy Clustering. Electronics Letters,
34(22).
Li, Zhaoping. 1997 (August). Visual Segmentation without Classification in a Model
of the Primary Visual Cortex. Tech. rept. 1613. AI Lab, MIT.
Lin, Ja-Chen, & Lin, Wu-Ja. 1996. Real-time and Automatic Two-Class Clustering by
Analytical Formulas. Pattern Recognition, 29(11), 1919–1930.
Liu, Xiaohui. 2000. Progress in Intelligent Data Analysis. Applied Intelligence, 11(3).
Mansfield, J R, Sowa, M G, Payette, J R, Abdulrauf, B, Stranc, M F, & Mantsch, H H.
1998. Tissue Viability by Multispectral Near Infrared Imaging: A fuzzy c-Means
Clustering Analysis. IEEE Transactions on Medical Imaging, 17(6).
Marchette, D J, Lorey, R A, & Priebe, C E. 1997. An Analysis of Local Feature
Extraction in Digital Mammography. Pattern Recognition, 30(9), 1547–54.
McLachlan, Geoffrey J, & Basford, Kaye E. 1988. Mixture models: inference and
applications to clustering. Marcel Dekker.
Michalski, R. S., & Stepp, R. 1983. Automated Construction of Classifications: Con-
ceptual Clustering versus Numerical Taxonomy. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 5(4), 396–410.
Millar, Anne Michele, & Hamilton, David C. 1999. Modern Outlier Detection Methods
and their Effect on Subsequent Inference. Journal of Statistical Computation and
Simulation, 64(2), 125–150.
Mirkin, Boris. 1999. Concept Learning and Feature Selection Based on Square-Error Clustering. Machine Learning, 35(1), 25–39.
Mitchell, Tom M. 1997. Machine Learning. WCB McGraw-Hill.
Mohan, Rakesh. 1992. Perceptual Organization for Scene Segmentation and Descrip-
tion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(6).
Mohiaddin, R. H., & Longmore, D. B. 1993. Functional aspects of cardiovascular
nuclear magnetic resonance imaging. Circulation, 88(1), 264–281.
Mui, J K, Bacus, J W, & Fu, K S. 1977. A scene segmentation technique for microscopic cell images. Pages 99–106 of: Proceedings of the Symposium on Computer-Aided Diagnosis of Medical Images. IEEE, New York, NY, USA.
Niyogi, Partha. 1995. The Information Complexity of Learning from Examples. Ph.D.
thesis, Department of Electrical Engineering and Computer Science, MIT, USA.
Pacheco, F A L. 1998. Finding the number of natural clusters in groundwater data sets
using the concept of equivalence class. Computers and Geosciences, 24(1), 7–15.
Pal, Nikhil R, & Bezdek, James C. 1995. On Cluster Validity for the Fuzzy c-Means
Model. IEEE Transactions on Fuzzy Systems, 3(3).
Pal, Nikhil R, & Pal, Sankar K. 1993. A Review on Image Segmentation Techniques.
Pattern Recognition, 26(9), 1277–1294.
Pei, JH, Fan, JL, Xie, WX, & Yang, X. 1996. A New Effective Soft Clustering Method
— Section Set Fuzzy C-Means (S2FCM) Clustering. Pages 773–776 of: ICSP
’96 - 1996 3rd International Conference On Signal Processing, Proceedings, Vols
I And Ii.
Pham, Dzung L, & Prince, Jerry L. 1999. An Adaptive Fuzzy C-Means Algorithm
for Image Segmentation in the Presence of Intensity Inhomogeneities. Pattern
Recognition Letters, 20(1), 57–68.
Rezaee, M Ramze, Lelieveldt, BPF, & Reiber, JHC. 1998. A new cluster validity index
for the fuzzy c-means. Pattern Recognition Letters, 19, 237–246.
Rosenfeld, Azriel. 1979. Fuzzy Digital Topology. Information and Control, 40(1), 76–87.
Rosenfeld, Azriel. 1984. The Fuzzy Geometry of Image Subsets. Pattern Recognition
Letters, 2(5), 311–317.
Runkler, T A, & Bezdek, J C. 1999. Alternating cluster estimation: A new tool for
clustering and function approximation. IEEE Transactions on Fuzzy Systems,
7(4), 377–393.
Russell, Stuart, & Norvig, Peter. 1995. Artificial Intelligence. Prentice Hall.
Selim, Shokri Z, & Kamel, M S. 1992. On the Mathematical Properties of the Fuzzy
c-means Algorithm. Fuzzy Sets and Systems, 49, 181–191.
Shapiro, Larry S. 1995. Affine Analysis of Image Sequences. Ph.D. thesis, University
of Oxford.
Shihab, Ahmed Ismail, & Burger, Peter. 1998a. The Analysis of Cardiac Velocity MR
Images Using Fuzzy Clustering. Pages 176–183 of: Proc. SPIE Medical Imaging
1998 — Physiology and Function from Multidimensional Images, vol. 3337. San
Diego, USA: SPIE.
Shihab, Ahmed Ismail, & Burger, Peter. 1998b (July). Tracking the LV in Velocity
MR Images Using Fuzzy Clustering. In: Proc. Medical Image Understanding
and Analysis.
Smith, Norman Ronald. 1998. Fast and Automatic Techniques for 3D Visualisation of
MRI Data. Ph.D. thesis, Imperial College, University of London, London, UK.
Tolias, Yannis A., & Panas, Stavros M. 1998a. A Fuzzy Vessel Tracking Algorithm
for Retinal Images Based on Fuzzy Clustering. IEEE Transactions On Medical
Imaging, 17(2), 263–273.
Tolias, Yannis A., & Panas, Stavros M. 1998b. Image Segmentation by a Fuzzy Clus-
tering Algorithm Using Adaptive Spatially Constrained Membership Functions.
IEEE Transactions on Systems, Man, and Cybernetics — Part A: Systems and
Humans, 28(3), 359–370.
Tuceryan, M, & Jain, A K. 1993. Texture Analysis. Pages 235–276 of: Handbook of
Pattern Recognition and Computer Vision. World Scientific.
Tyree, Eric W, & Long, J A. 1999. The use of linked line segments for cluster repre-
sentation and data reduction. Pattern Recognition Letters, 20(1), 21–29.
Underwood, R., & Firmin, D. 1991. Magnetic Resonance of the Cardiovascular Sys-
tem. Blackwell Scientific Publications.
van der Wall, E. E., & de Roos, A. 1991. Magnetic Resonance Imaging in Coronary
Artery Disease. Kluwer Academic Publishers.
Wilson, Kathleen J W. 1990. Anatomy and Physiology in Health and Illness. 7th edn.
ELBS Churchill Livingstone.
Windham, Michael P. 1982. Cluster Validity for the Fuzzy c-Means Clustering Al-
gorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 4(4),
357–363.
Xie, X-L, & Beni, G. 1991. Validity measure for fuzzy clustering. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 13(8), 841–847.
Yang, Guang-Zhong. 1998. Exploring In Vivo Blood Flow Dynamics. IEEE Engineer-
ing in Medicine and Biology, 17(3).
Zadeh, Lotfi A. 1995. Probability Theory and Fuzzy Logic are Complementary rather
than Competitive. Technometrics, 37, 271–276.
Zadeh, Lotfi A. 1996. Fuzzy Logic = Computing with Words. IEEE Transactions on
Fuzzy Systems, 4(2), 103–111.
Zadeh, Lotfi A. 1999. From Computing with Numbers to Computing with Words
– From Manipulation of Measurements to Manipulation of Perceptions. IEEE
Transactions on Circuits and Systems, 45(1), 105–119.
Zadeh, Lotfi A, & Klir, George J. 1996. Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems :
Selected Papers by Lotfi A. Zadeh. World Scientific Pub Co.
APPENDIX A
In the above, when we cut down the number of possible diameter configurations to 100, we said that a configuration of D1 = 5 and D2 = 10 is equivalent to D1 = 10 and D2 = 5. However, if we keep the populations as they are but swap the diameters, the resulting configuration, (P1 = 1, D1 = 10) and (P2 = 10, D2 = 5), is not equivalent to the former configuration. This is illustrated in Figure A.2. Thus, it seems we must keep the second configuration as it describes a different data set, and we cannot discard the "equivalent" region of Figure A.1.
[Figure A.1: plot of D1 (radius of the LHS cluster) against D2 (radius of the RHS cluster), each from 1 to 20, showing the line of touching clusters, the region of overlapping clusters, and the region of equivalent arrangements of clusters.]
Figure A.1: Plot of possible diameter configurations. Data sets corresponding to the
black dots in the triangular region were generated. If we eliminate overlapping and
equivalent configurations only 100 data sets remain.
Figure A.2: Each row illustrates equivalent p-d configurations. Only one of each suf-
fices when generating the suite of data sets.
and 5 : 10 diameter ratio, we will discover that it is the same as that of the second configuration above. Therefore, in order not to count the same p-d configurations twice, we can still consider only the 100 diameter configurations of Figure A.1 for each of the nine population configurations.
APPENDIX B
These are the derivations used to plot the shapes of the FCM and PDI objective functions in Mathematica.
B.1 FCM's Derivations

Assume two 1D cluster prototypes located at the origin and at (1, 0) respectively. Denote the prototype at the origin by a and the other by b. Assume a point located at position x somewhere on the x-axis, and let us calculate its contribution towards the FCM objective function, taking m = 2. The memberships are

u_{xa} = \frac{1/d_{xa}^2}{(1/d_{xa}^2) + (1/d_{xb}^2)}, \qquad
u_{xb} = \frac{1/d_{xb}^2}{(1/d_{xa}^2) + (1/d_{xb}^2)},

so the point's contribution is

J_x = u_{xa}^2 d_{xa}^2 + u_{xb}^2 d_{xb}^2
    = \frac{(1/d_{xa}^2) + (1/d_{xb}^2)}{\left[(1/d_{xa}^2) + (1/d_{xb}^2)\right]^2}
    = \frac{1}{(1/d_{xa}^2) + (1/d_{xb}^2)}.

Since d_{xa} = x and d_{xb} = x - 1,

J_x = \frac{1}{(1/x^2) + (1/(x-1)^2)}.

For the two-dimensional case, where the point is located anywhere on the plane with coordinates (x, y), the point's contribution J_{xy} is

J_{xy} = \frac{1}{\frac{1}{x^2 + y^2} + \frac{1}{(x-1)^2 + y^2}}.
B.2 PDI's Derivations

Assume the same two cluster prototypes located at the origin and at (1, 0), denoted a and b respectively, and let the normaliser of prototype a be \gamma_a, so that the normaliser of b is 1 - \gamma_a (the normalisers sum to one). With m = 2 the memberships are

u_{xa} = \frac{\gamma_a^r / d_{xa}^2}{(\gamma_a^r / d_{xa}^2) + ((1-\gamma_a)^r / d_{xb}^2)}, \qquad
u_{xb} = \frac{(1-\gamma_a)^r / d_{xb}^2}{(\gamma_a^r / d_{xa}^2) + ((1-\gamma_a)^r / d_{xb}^2)}.

Since d_{xa} = x and d_{xb} = x - 1, the point's contribution is

J_x = \frac{1}{\frac{\gamma_a^r}{x^2} + \frac{(1-\gamma_a)^r}{(x-1)^2}}.

For the two-dimensional case, where the point is located anywhere on the plane with coordinates (x, y), the point's contribution is

J_{xy} = \frac{1}{\frac{\gamma_a^r}{x^2 + y^2} + \frac{(1-\gamma_a)^r}{(x-1)^2 + y^2}}.
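The same point contributions can also be evaluated numerically. The following sketch, written in Python rather than Mathematica, reproduces the two J_xy surfaces on a grid; the grid extent, the value of r, the example value of the normaliser, and the small epsilon that avoids division by zero at the prototypes are illustrative choices.

    import numpy as np

    def j_fcm(x, y):
        """FCM point contribution for prototypes at (0, 0) and (1, 0), m = 2."""
        da2 = x ** 2 + y ** 2 + 1e-9                 # epsilon avoids division by zero
        db2 = (x - 1) ** 2 + y ** 2 + 1e-9
        return 1.0 / (1.0 / da2 + 1.0 / db2)

    def j_pdi(x, y, gamma, r=1.5):
        """PDI point contribution with normalisers gamma and 1 - gamma."""
        da2 = x ** 2 + y ** 2 + 1e-9
        db2 = (x - 1) ** 2 + y ** 2 + 1e-9
        return 1.0 / (gamma ** r / da2 + (1 - gamma) ** r / db2)

    # Evaluate both surfaces on a grid around the two prototypes.
    xs, ys = np.meshgrid(np.linspace(-1, 2, 121), np.linspace(-1, 1, 81))
    J_fcm_surface = j_fcm(xs, ys)
    J_pdi_surface = j_pdi(xs, ys, gamma=0.2)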
APPENDIX C
Finding a solution is effected with the Lagrange multipliers method [Bertsekas, 1996]. Since an exact analytical solution cannot be obtained, the first-order optimality conditions are found and these are used as update equations in Picard iterations. The algorithm is started with any initial values for U, P, and the normalisers \gamma, and these are then iteratively improved until convergence is attained.
L(U, \gamma, P) = \sum_{i=1}^{c} \frac{1}{\gamma_i^r} \sum_{k=1}^{N} u_{ik}^m d_{ik}^2
  + \sum_{k=1}^{N} \lambda_k \Big( \sum_{i=1}^{c} u_{ik} - 1 \Big)
  + \mu \Big( \sum_{i=1}^{c} \gamma_i - 1 \Big),

where \lambda and \mu are the Lagrange multipliers of the constraints; \lambda is a vector of N elements, and \mu is a single value.

According to the Lagrange multipliers method, the necessary first-order optimality conditions are:

\nabla_U L = 0, \qquad (C.1)

\nabla_P L = 0, \qquad (C.2)

\nabla_\gamma L = 0, \qquad (C.3)
\sum_{i=1}^{c} u_{ik} - 1 = 0 \qquad \forall k = 1..N, \qquad (C.4)

and

\sum_{i=1}^{c} \gamma_i - 1 = 0. \qquad (C.5)

Setting the derivative of L with respect to u_{ik} to zero gives

\frac{m u_{ik}^{m-1} d_{ik}^2}{\gamma_i^r} + \lambda_k = 0, \qquad (C.6)

\Rightarrow \quad u_{ik} = \Big[ \frac{-\lambda_k \gamma_i^r}{m d_{ik}^2} \Big]^{1/(m-1)}. \qquad (C.7)

Substituting C.7 into the constraint C.4,

\sum_{i=1}^{c} \Big[ \frac{-\lambda_k \gamma_i^r}{m d_{ik}^2} \Big]^{1/(m-1)} = 1
\quad \Rightarrow \quad
-\lambda_k = \frac{m}{\Big[ \sum_{i=1}^{c} (\gamma_i^r / d_{ik}^2)^{1/(m-1)} \Big]^{m-1}}, \qquad (C.8)

so that

u_{ik} = \frac{(\gamma_i^r / d_{ik}^2)^{1/(m-1)}}{\sum_{j=1}^{c} (\gamma_j^r / d_{jk}^2)^{1/(m-1)}}. \qquad (C.9)
From the optimality condition of equation C.2, and noting that d_{ik} is any inner-product induced norm on the difference between x_k and p_i, we obtain:

\sum_{k=1}^{N} \frac{2}{\gamma_i^r} u_{ik}^m (x_k - p_i) = 0
\quad \Rightarrow \quad
\sum_{k=1}^{N} u_{ik}^m x_k = p_i \sum_{k=1}^{N} u_{ik}^m
\quad \Rightarrow \quad
p_i = \frac{\sum_{k=1}^{N} u_{ik}^m x_k}{\sum_{k=1}^{N} u_{ik}^m}.
A similar derivation, from the optimality condition C.3 together with the constraint C.5, gives the multiplier

\mu = r \Big[ \sum_{i=1}^{c} \Big[ \sum_{k=1}^{N} u_{ik}^m d_{ik}^2 \Big]^{1/(r+1)} \Big]^{r+1}, \qquad (C.13)

and hence the normaliser update

\gamma_i = \frac{\Big[ \sum_{k=1}^{N} u_{ik}^m d_{ik}^2 \Big]^{1/(r+1)}}{\sum_{j=1}^{c} \Big[ \sum_{k=1}^{N} u_{jk}^m d_{jk}^2 \Big]^{1/(r+1)}}.
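The membership update C.9, the prototype formula, and the normaliser update can be combined into Picard iterations. A minimal Python sketch of such an iteration loop follows; the fixed iteration count, the random initialisation, and the small epsilon guarding the distances are illustrative choices rather than the thesis implementation.

    import numpy as np

    def pdi(X, c, m=2.0, r=1.5, iters=100, seed=0):
        """Picard iterations for PDI: alternately update memberships (C.9),
        prototypes (weighted means), and normalisers (closed form from C.3
        and C.5) for a fixed iteration budget."""
        rng = np.random.default_rng(seed)
        N = len(X)
        P = X[rng.choice(N, size=c, replace=False)]            # initial prototypes
        gamma = np.full(c, 1.0 / c)                            # normalisers, sum to 1
        for _ in range(iters):
            d2 = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=2) + 1e-12   # (N, c)
            U = (gamma ** r / d2) ** (1.0 / (m - 1))                          # eq. C.9
            U /= U.sum(axis=1, keepdims=True)
            W = U ** m
            P = (W.T @ X) / W.sum(axis=0)[:, None]             # prototype update
            S = (W * d2).sum(axis=0)                           # per-cluster weighted scatter
            gamma = S ** (1.0 / (r + 1))
            gamma /= gamma.sum()                               # keep normalisers summing to 1
        return P, U, gamma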