Robust Fuzzy Clustering Algorithms
Rajesh N. Dave
Department of Mechanical and Industrial Engineering
New Jersey Institute of Technology
Newark, New Jersey 07102
Abstract - A class of fuzzy clustering algorithms based on a recently introduced "noise cluster" concept is proposed. A "noise prototype" is defined such that it is equidistant from all the points in the data-set. This allows for detection of clusters amongst data with or without noise. It is shown that this concept is applicable to all the generalizations of fuzzy or hard k-means algorithms. Various applications are also considered, including the application of this concept to a variety of regression problems. It is shown that the results of this approach are comparable to many robust regression techniques. The paper concludes with a summary and directions for future work.

Index Terms - Fuzzy clustering, robust clustering, image processing, noisy data, cluster analysis, pattern recognition, robust regression

I. INTRODUCTION

Fuzzy c-means (FCM) algorithms and their generalizations have found continued application in a variety of areas including image processing [1-7]. The major problems with these algorithms, however, are poor performance when there is noise in the data, and the requirement that the number of clusters in the data be known a priori. Several techniques have been proposed to counter these problems, for example, [7-9]. The problem of an unknown number of clusters has been addressed in many different ways [8,9], the usual practice being the use of cluster validity measures to determine the correct number of clusters [10]. The problem of noisy data has been a difficult one to solve. In theory, the FCM algorithms have a zero breakdown point, i.e., even a single outlier may completely throw off the prototypes. In fact, most clustering algorithms, whether they are based on the principle of least-squared error minimization or not, are not robust against noise. From the practical point of view, real applications involve analysis of data with some amount of noise. Therefore, in many cases, an FCM type algorithm may have little practical value unless special care is taken to handle the noisy data.

Amongst the different techniques proposed to handle noisy data [7-9,11], the recently introduced concept of "noise clustering" [7] appears to have the best ability to handle noisy data. This approach is applicable to all squared-error type algorithms, either hard or fuzzy. In this paper, we show that this approach is very general, and applies to all the generalizations of the FCM algorithm. It also applies to a variety of regression problems. In what follows, the background on noise clustering is presented first, followed by the application of this approach to a variety of fuzzy clustering algorithms, including examples. Applications to regression problems are also presented, followed by the summary and recommendations for further work.

II. NOISE CLUSTERING ALGORITHM

The main idea in the noise clustering algorithm is the concept of a "noise prototype". Although defining a separate cluster to dump noise points is not a new idea, the idea of defining the noise itself as a prototype is new. The concept of noise as a prototype requires definition. Following the notation in Dave [7], the noise prototype is defined below.

Noise prototype: The noise prototype is a universal entity such that it is always at the same distance from every point in the data-set. Let V_n be the noise prototype, and x_k be a point in feature space, V_n, x_k \in \mathbb{R}^p. Then the noise prototype is such that the distance d_{nk}, the distance of point x_k from V_n, is

d_{nk} = \delta, \quad \forall k. \qquad (1)

Although the above definition does not tell us what the distance \delta is, it does imply that all the points in the data-set are at the same distance from the noise cluster, and thus it defines the noise prototype.

Next, we re-formulate the conventional FCM algorithm using this concept. Let there be c good clusters in the data-set, so we add one cluster, i.e., the (c+1)th cluster, as the noise cluster. Hereafter, we denote n = c+1. Then the functional J_n including the noise cluster is defined as,

J_n(U, V) = \sum_{i=1}^{n} \sum_{k=1}^{N} (u_{ik})^m (d_{ik})^2, \qquad (2)

where the distances are defined by,

(d_{ik})^2 = (x_k - V_i)^T A_i (x_k - V_i), \quad for all k and i = 1, \ldots, c, \qquad (3a)

Partial support for this work was received from USDOE contract DE-AC22-91PC90181, and NSF grant MSS-90068322.
0-7803-0614-7/93 $03.00 © 1993 IEEE
and, (d_{ik})^2 = \delta^2, \quad for i = n (= c+1). \qquad (3b)
Here, x_k is the feature vector of point k (k = 1 through N, the number of points), and V_i is the cluster prototype of class i. The distances are measured through the norm induced by the symmetric, positive definite matrices A_i, m is the exponent, 1 < m < \infty,
and u_{ik} is the membership of point k in class i. Assuming that the distance \delta is specified, the minimization gives the following equations.

u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{(d_{ik})^2}{(d_{jk})^2} \right)^{\frac{1}{m-1}} + \left( \frac{(d_{ik})^2}{\delta^2} \right)^{\frac{1}{m-1}} \right]^{-1}, \qquad (4)

V_i = \frac{\sum_{k=1}^{N} (u_{ik})^m x_k}{\sum_{k=1}^{N} (u_{ik})^m}, \quad for i = 1 to c. \qquad (5)

Fig. 1(a). Three clusters with noise.
The noise distance \delta must be specified; ideally, it should be based on the statistics of the data-set. The following recommendation is made in [7].

\delta^2 = \lambda \left[ \frac{\sum_{i=1}^{c} \sum_{k=1}^{N} (d_{ik})^2}{N c} \right], \qquad (8)

where \lambda is the value of the multiplier used to obtain \delta from the average of distances. Even \lambda may be based on some higher order statistics of the data. We present one example here to illustrate this.
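The iteration defined by (4), (5), and (8) can be sketched in code. The sketch below assumes the Euclidean special case (A_i = I); the initialization, the fixed iteration count, and the variable names are our own choices, not from the paper.

```python
import numpy as np

def noise_fcm(X, c, m=2.0, lam=0.5, iters=100):
    """Noise-clustering FCM sketch: c good clusters plus a noise class that
    sits at the same distance delta from every point (Euclidean case, A_i = I)."""
    N, p = X.shape
    V = X[:c].copy()                                  # crude deterministic init
    for _ in range(iters):
        # squared distances to the c good prototypes, shape (N, c)
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)
        d2 = np.maximum(d2, 1e-12)
        delta2 = lam * d2.mean()                      # eq. (8): delta^2 from the average distance
        e = 1.0 / (m - 1.0)
        inv = d2 ** -e
        # eq. (4): the noise class contributes the extra term delta2**-e
        u = inv / (inv.sum(axis=1, keepdims=True) + delta2 ** -e)
        w = u ** m
        V = (w.T @ X) / w.sum(axis=0)[:, None]        # eq. (5): prototype update
    u_noise = 1.0 - u.sum(axis=1)                     # leftover membership goes to noise
    return V, u, u_noise
```

Because an outlier is far from every good prototype, the noise term dominates its membership sum, so it lands mostly in the noise class instead of dragging the prototypes.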
Fig. 2(a). Edge map of an image of the top of a cube.
Fig. 2(b). Output of GK for the data of Fig. 2(a).
Fig. 3(a). Edge data of the shaft with markers.
Fig. 3(b). Circles detected using noise FCS algorithm.
Fig. 2(c). Output of Noise-GK for the data of Fig. 2(a).
Fig. 3(c). Plot of computed rotation and actual rotation versus time (ms).
method such as least squared median error minimization (LMS) is recommended, which works well even when up to half of the points are noisy. The concept of noise clustering can also be applied to the standard LS method to make it robust against noise. We formulate the problem as follows.

Given (x_k, y_k), k = 1 to N pairs, find the "best fit" function f(a, x), where a is the vector of the parameters of the function f. For a line fit, a could be a vector with two parameters, i.e., slope and intercept. We minimize the following functional.

J_{NLS}(a, u, w) = \sum_{k=1}^{N} \left[ (u_k)^m (f(a, x_k) - y_k)^2 + (w_k)^m (\delta_k)^2 \right], \qquad (12)

where \delta_k is the distance of the point from the noise class. All \delta_k are fixed to a constant value. A point has a membership u_k in the good class, and w_k in the noise class. The following conditions on the memberships are also used.

0 \le u_k \le 1, \quad 0 \le w_k \le 1, \quad and \quad u_k + w_k = 1. \qquad (13)

Minimization of (12) with respect to the parameters a is essentially the same as in the conventional LS fit; using the Lagrange multiplier technique and a certain amount of re-arrangement, we obtain the following for the memberships.

Fig. 4. Noise LS fit to the telephone call data from [15].
u_k = \left[ 1 + \left( \frac{(f(a, x_k) - y_k)^2}{(\delta_k)^2} \right)^{\frac{1}{m-1}} \right]^{-1} \quad and \quad w_k = \left[ 1 + \left( \frac{(\delta_k)^2}{(f(a, x_k) - y_k)^2} \right)^{\frac{1}{m-1}} \right]^{-1}

Fig. 5(a). The star cluster data from [15].
The above can be applied to the line fit problem. An example of phone call data from [15] (page 25) is considered. Fig. 4 shows the data plotted as solid circles. The conventional LS fit does not pick up the desired trend in the data, while the noise LS (NLS) fit using the above formulation picks up the correct trend according to [15]. The results of LMS are similar.
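The alternating scheme implied by (12) and the membership equations above can be sketched for the line-fit case. Assumptions of ours, not the paper's: alternating weighted least squares as the solver for a, a fixed iteration count, and the ordinary LS fit as the starting point.

```python
import numpy as np

def noise_ls_line(x, y, m=2.0, delta=1.0, iters=50):
    """Noise LS sketch for a line y = a[0]*x + a[1]: alternate between the
    membership update and a weighted LS solve for the parameters a, with all
    delta_k fixed to the constant delta as in the text."""
    A = np.column_stack([x, np.ones_like(x)])
    a = np.linalg.lstsq(A, y, rcond=None)[0]          # ordinary LS start
    for _ in range(iters):
        r2 = (A @ a - y) ** 2                         # squared residuals
        # good-class membership: u_k = 1 / (1 + (r_k^2/delta^2)^(1/(m-1)))
        u = 1.0 / (1.0 + (r2 / delta**2) ** (1.0 / (m - 1.0)))
        sw = (u ** m) ** 0.5                          # sqrt of the LS weights u_k^m
        a = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return a, u
```

Gross outliers have residuals far above delta, so their memberships collapse toward zero and the refit ignores them, which matches the behavior described for the telephone-call example.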
Another example is locating the center of multivariate data, where one tries to fit an ellipsoidal shape to the data. Once again, an example from [15] (page 261) is used, as shown in Fig. 5. Part (a) shows the original data, while part (b) shows the correct ellipsoid, discounting the effect of the outliers, as identified by the noise clustering algorithm. The figure shows solid circles as outliers, and the two lines show the principal axes of the ellipsoid. Similar results are obtained using the MVE (minimum volume ellipsoid) estimator presented in [15].
Fig. 5(b). Best ellipsoid fit using the noise clustering approach.
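For the multivariate-center problem, the same idea can be sketched with a single good cluster under a Mahalanobis-type norm plus a noise class. The update order, the covariance regularization, and the choice delta^2 = lam * mean(d^2) (in the spirit of (8)) are our assumptions, not details given in the paper.

```python
import numpy as np

def noise_center(X, m=2.0, lam=1.0, iters=30):
    """Robust center/covariance sketch: one good cluster with Mahalanobis
    distances, plus a noise class; memberships use the two-class form above."""
    N, p = X.shape
    u = np.ones(N)                                    # start with all points "good"
    for _ in range(iters):
        w = u ** m
        mu = (w[:, None] * X).sum(0) / w.sum()        # weighted center
        D = X - mu
        C = (w[:, None] * D).T @ D / w.sum() + 1e-9 * np.eye(p)  # weighted covariance
        d2 = np.einsum('ij,ij->i', D @ np.linalg.inv(C), D)      # Mahalanobis distances
        delta2 = lam * d2.mean()                      # noise distance, in the spirit of (8)
        u = 1.0 / (1.0 + (d2 / delta2) ** (1.0 / (m - 1.0)))
    return mu, C, u
```

The solid-circle outliers of Fig. 5 would receive small memberships u, so the fitted center and principal axes come from the good points, much as the MVE estimator discounts them.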
VI. CONCLUSIONS

The results presented show the potential of noise clustering algorithms in improving the performance of fuzzy clustering algorithms. Although not explicitly shown here, this concept is equally applicable to hard k-means type algorithms [7]. Through this approach, one can solve one of the major problems associated with the FCM and hard k-means type algorithms. Examples shown here are from real digital image analysis applications. It is shown that this approach can be successfully used to find lines and curves in digital images.

Application to regression analysis problems shows another new area for application of this approach. In terms of curve fitting, this approach represents an important new tool in robust data fitting. In multivariate data analysis, this approach competes well with techniques like MVE estimators. In terms of computational cost, it is expected that this approach has an advantage over other statistical methods. Another major advantage of this approach is its applicability to handling multiple lines or ellipsoids in the data-set.

The ideas presented here need further improvement in terms of developing better strategies to define the noise distance \delta and the multiplier \lambda. In addition, the algorithm should select these quantities adaptively, so that \delta is changed to a smaller value as the algorithm progresses. The main shortcoming of the FCM type algorithms is that there is no guarantee of global convergence. This disadvantage is also present in the proposed modification. It is expected that the use of techniques such as progressive clustering [9] can significantly improve the performance, and also overcome the other shortcoming, i.e., the need to know the number of clusters.

ACKNOWLEDGMENT

The author wishes to thank Kurra Bhaswan for coding the algorithms, and Jim Yu for providing the results of Fig. 3(c).

REFERENCES

1. J. C. Bezdek and S. K. Pal, Fuzzy Models for Pattern Recognition, IEEE Press, New York, 1992.
2. R. N. Dave, "Boundary detection through fuzzy clustering," Invited Paper, IEEE International Conference on Fuzzy Systems, San Diego, California, March 8-12, pp. 127-134, 1992.
3. R. L. Cannon, J. V. Dave, J. C. Bezdek and M. M. Trivedi, "Segmentation of a thematic mapper image using the fuzzy c-means clustering algorithm," IEEE Trans. on Geos. and Remote Sens., vol. GE-24(3), pp. 400-408, 1986.
4. R. N. Dave, "Fuzzy shell-clustering and applications to circle detection in digital images," International J. of General Systems, vol. 16, pp. 343-355, 1990.
5. R. N. Dave and K. Bhaswan, "Adaptive fuzzy c-shells clustering and detection of ellipses," IEEE Trans. on Neural Networks, vol. 3(5), 1992.
6. R. Krishnapuram, O. Nasraoui and H. Frigui, "The fuzzy c-shells algorithm: A new approach," IEEE Trans. on Neural Networks, vol. 3(5), 1992.
7. R. N. Dave, "Characterization and detection of noise in clustering," Pattern Rec. Letters, vol. 12(11), pp. 657-664, 1991.
8. R. Krishnapuram and C.-P. Freg, "Fitting an unknown number of lines and planes to image data through compatible cluster merging," Pattern Recognition, 1992.
9. R. N. Dave and K. J. Patel, "Progressive fuzzy clustering algorithms for characteristic shape recognition," Proceedings of NAFIPS'90, pp. 121-124, 1990.
10. J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
11. J. Jolion and A. Rosenfeld, "Cluster detection in background noise," Pattern Recognition, vol. 22(5), pp. 603-607, 1989.
12. D. E. Gustafson and W. C. Kessel, "Fuzzy clustering with a fuzzy covariance matrix," in Proc. IEEE CDC, San Diego, Calif., pp. 761-766, 1979.
13. R. N. Dave, "Use of the adaptive fuzzy clustering algorithm to detect lines in digital images," Intelligent Robots and Computer Vision VIII, vol. 1192(2), pp. 600-611, 1989.
14. A. D. Rosato, R. N. Dave, I. S. Fischer and W. N. Carr, "Development of a non-intrusive particle tracing technique for granular chute flows," Quarterly Progress Report, US DOE contract DE-AC22-91PC90181, Pittsburgh Energy Technology Center, April 1992.
15. P. J. Rousseeuw and A. M. Leroy, Robust Regression and Outlier Detection, John Wiley, New York, 1987.