A New Generalized Learning Vector Quantization Algorithm
where r ≥ 1. All Minkowski metrics satisfy the following five properties [6]:
1) d(i,i) = 0, ∀i;
2) d(i,j) = d(j,i), ∀i,j;
3) d(i,j) ≥ 0, ∀i,j;
4) d(i,j) = 0 only if x_i = x_j;
5) d(i,j) ≤ d(i,l) + d(l,j), ∀i,j,l.
The three most common Minkowski metrics are defined below:
1) r = 2 (Euclidean distance):
$$d(i,j) = \Big( \sum_{k=1}^{K} (x_{ik} - x_{jk})^2 \Big)^{1/2} = \big[ (\mathbf{x}_i - \mathbf{x}_j)^T (\mathbf{x}_i - \mathbf{x}_j) \big]^{1/2} \qquad (3a)$$
2) r = 1 (Manhattan or city-block distance):
$$d(i,j) = \sum_{k=1}^{K} \lvert x_{ik} - x_{jk} \rvert \qquad (3b)$$
3) r → ∞ (the "sup" distance):
$$d(i,j) = \max_{1 \le k \le K} \lvert x_{ik} - x_{jk} \rvert \qquad (3c)$$
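As a concrete illustration, the three special cases above can be evaluated in a few lines of Python. This is a minimal sketch written for this text; the function name `minkowski` and the sample vectors are illustrative, not from the paper:

```python
import numpy as np

def minkowski(xi, xj, r):
    """Minkowski distance of order r >= 1 between two pattern vectors."""
    return float(np.sum(np.abs(xi - xj) ** r) ** (1.0 / r))

xi = np.array([1.0, 2.0])
xj = np.array([4.0, 6.0])

print(minkowski(xi, xj, 2))              # Euclidean distance (3a): 5.0
print(minkowski(xi, xj, 1))              # Manhattan distance (3b): 7.0
print(float(np.max(np.abs(xi - xj))))    # "sup" distance (3c): 4.0
```

The sup distance is the limit of `minkowski(xi, xj, r)` as r grows, which is why it is computed directly with a maximum rather than with a large finite r.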
In order to translate the concept of symmetry about clusters into a reasonable mathematical formula, and to keep it computationally simple, we define the symmetry as follows. Given n patterns x_i, 1 ≤ i ≤ n, and a reference vector c, the symmetrical distance between pattern x_i and c is defined as
$$d_s(\mathbf{x}_i, \mathbf{c}) = \min_{\substack{k=1,\ldots,n \\ k \neq i}} \frac{\lVert (\mathbf{x}_i - \mathbf{c}) + (\mathbf{x}_k - \mathbf{c}) \rVert}{\lVert \mathbf{x}_i - \mathbf{c} \rVert + \lVert \mathbf{x}_k - \mathbf{c} \rVert} \qquad (4)$$
where k ≠ i. If the data set does contain the mirrored pattern x_i* of x_i, the value of d_s(x_i, c) is zero, because x_i and x_i* are symmetrical with respect to c (i.e. (x_i − c) + (x_i* − c) = 0); Fig.1 shows an example.
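Equation (4) can be evaluated by a brute-force search over all candidate mirror patterns. The following Python sketch is our own illustration; the helper name `symmetrical_distance` is not from the paper:

```python
import numpy as np

def symmetrical_distance(i, X, c):
    """Symmetrical distance d_s(x_i, c) of Eq. (4): search the other
    patterns x_k for the best mirror image of x_i about c."""
    a = X[i] - c
    best = np.inf
    for k in range(len(X)):
        if k == i:                       # the minimum is taken over k != i
            continue
        b = X[k] - c
        best = min(best, np.linalg.norm(a + b) /
                   (np.linalg.norm(a) + np.linalg.norm(b)))
    return best

# x_1 is the exact mirror of x_0 about c = (0, 0), so d_s is zero.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [2.0, 0.5]])
print(symmetrical_distance(0, X, np.zeros(2)))   # 0.0
```

Note that the denominator normalizes the measure, so d_s always lies in [0, 1], and that the search costs O(n) per pattern-reference pair.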
IV. A New Generalized Learning Vector Quantization (GLVQ) Algorithm

The basic idea of the algorithm is to integrate the symmetrical distance into the LVQ algorithm: we assign patterns to the clusters that are closest to them in the symmetrical sense. The algorithm is summarized below.

The new GLVQ algorithm:
STEP 1: Fix the number of clusters c > 0; the threshold θ_0 (0.0 ≤ θ_0 ≤ 1.0); the learning rates ρ+ and ρ− (0.0 ≤ ρ− ≤ ρ+ ≤ 1.0); the learning-rate decay factors ρ+_rate and ρ−_rate (0.0 ≤ ρ+_rate, ρ−_rate ≤ 1.0); a small positive constant ε; and the iteration limit t_max.
STEP 2: Initialize v_0 = (v_{1,0}, v_{2,0}, ..., v_{c,0}).
STEP 3: Let ρ+_new = ρ+ and ρ−_new = ρ−.
For t = 1, 2, ..., t_max:
A. For i = 1, ..., n:
a. Compute the distances d_{ij,t−1}: if (min_{j=1,...,c} d^s_{ij,t−1}) < θ_t, then d_{ij,t−1} = d^s_{ij,t−1} for j = 1, ..., c; else d_{ij,t−1} = ||x_i − v_{j,t−1}|| for j = 1, ..., c, where
$$d^s_{ij,t-1} = \min_{\substack{k=1,\ldots,n \\ k \neq i}} \frac{\lVert (\mathbf{x}_i - \mathbf{v}_{j,t-1}) + (\mathbf{x}_k - \mathbf{v}_{j,t-1}) \rVert}{\lVert \mathbf{x}_i - \mathbf{v}_{j,t-1} \rVert + \lVert \mathbf{x}_k - \mathbf{v}_{j,t-1} \rVert}, \qquad j = 1, \ldots, c,$$
and the threshold grows with t as
$$\theta_t = \theta_0 + (1 - \theta_0)\,\frac{t}{t_{\max}}.$$
b. Locate the winner j* = arg min_j d_{ij,t−1}.
c. If the winner j* has the same label as x_i, then
v_{j*,t} = v_{j*,t−1} + ρ+_new (x_i − v_{j*,t−1});
else
v_{j*,t} = v_{j*,t−1} − ρ−_new (x_i − v_{j*,t−1}),
and locate the nearest winner j*_near with the same label as x_i:
v_{j*near,t} = v_{j*near,t−1} + ρ+_new (x_i − v_{j*near,t−1}).
d. Next i.
B. Compute ρ+_new = ρ+_new · ρ+_rate and ρ−_new = ρ−_new · ρ−_rate.
C. Compute the convergence measure E_t.
D. If ε > E_t, stop; else next t.
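The stepwise description above can be condensed into a short training loop. The following Python sketch is one possible reading of it, not the authors' code: the parameter defaults, the use of plain Euclidean distance for locating the nearest correct prototype j*_near, and the stopping quantity E_t, taken here as ||V_t − V_{t−1}||, are all our assumptions.

```python
import numpy as np

def sym_dist(xi, X, v):
    """Symmetrical distance between pattern xi and prototype v
    (Eq. (4) with the reference vector c replaced by v)."""
    a = xi - v
    best = np.inf
    for xk in X:
        if np.array_equal(xk, xi):               # skip k = i
            continue
        b = xk - v
        denom = np.linalg.norm(a) + np.linalg.norm(b)
        if denom > 0.0:
            best = min(best, np.linalg.norm(a + b) / denom)
    return best

def glvq(X, labels, V, proto_labels, theta0=0.5, rho_p=0.1, rho_m=0.05,
         rho_p_rate=0.95, rho_m_rate=0.95, eps=1e-4, t_max=100):
    """Sketch of STEPs 1-3; V is the c-by-K float array of prototypes."""
    rp, rm = rho_p, rho_m                                # STEP 3
    for t in range(1, t_max + 1):
        V_old = V.copy()
        theta_t = theta0 + (1.0 - theta0) * t / t_max    # threshold schedule
        for i, xi in enumerate(X):                       # A.
            ds = np.array([sym_dist(xi, X, v) for v in V])
            if ds.min() < theta_t:                       # a. symmetrical distances
                d = ds
            else:                                        #    ... else Euclidean
                d = np.array([np.linalg.norm(xi - v) for v in V])
            j = int(np.argmin(d))                        # b. winner
            if proto_labels[j] == labels[i]:             # c. attract correct winner
                V[j] += rp * (xi - V[j])
            else:                                        #    repel wrong winner,
                V[j] -= rm * (xi - V[j])
                same = [k for k in range(len(V))
                        if proto_labels[k] == labels[i]]
                j_near = min(same, key=lambda k: np.linalg.norm(xi - V[k]))
                V[j_near] += rp * (xi - V[j_near])       #    attract nearest correct one
        rp *= rho_p_rate                                 # B. decay learning rates
        rm *= rho_m_rate
        E_t = np.linalg.norm(V - V_old)                  # C. (assumed) prototype change
        if E_t < eps:                                    # D. stop when change is small
            break
    return V
```

Because d_s is normalized to [0, 1] while θ_t grows from θ_0 toward 1, the loop relies on the symmetrical distance for more and more patterns as training proceeds.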
V. Experimental Results

Several two-dimensional data sets are used so that the reader can "see" the effectiveness of the proposed symmetrical distance. The algorithm is then tested on Anderson's IRIS data to show that it places no restriction on the dimension of the pattern space and is applicable in higher-dimensional spaces. For comparison, we also tested the data sets using the classical LVQ algorithm.

Example 1: The data set consists of two crossed lines, as shown in Fig.2(a). The total number of data points is 100. The clustering results achieved by the classical LVQ and the new GLVQ algorithms are shown in Fig.2(b) and Fig.2(c), respectively.

Example 2: The data set consists of two crossed ring-shaped clusters, as shown in Fig.3(a). The total number of data points is 100. The clustering results achieved by the classical LVQ and the new GLVQ algorithms are shown in Fig.3(b) and Fig.3(c), respectively.

Example 3: The data set consists of a compact linear cluster, a compact spherical cluster and a ring-shaped cluster, as shown in Fig.4(a). The total number of data points is 300. The clustering results achieved by the classical LVQ and the new GLVQ algorithms are shown in Fig.4(b) and Fig.4(c), respectively.

Example 4: The data set consists of two compact linear clusters, as shown in Fig.5(a). The total number of data points is 400. The clustering results achieved by the classical LVQ and the new GLVQ algorithms are shown in Fig.5(b) and Fig.5(c), respectively.

Example 5: The data set consists of a compact spherical cluster and two compact ellipsoidal clusters, as shown in Fig.6(a). The total number of data points is 579. The clustering results achieved by the classical LVQ and the new GLVQ algorithms are shown in Fig.6(b) and Fig.6(c), respectively.

Example 6: Anderson's IRIS data set has three subsets (Iris Setosa, Iris Versicolor and Iris Virginica), two of which overlap. The data lie in a four-dimensional space, and there are 150 patterns in total. The comparison results are tabulated in Table 1.

VI. Conclusion

In this paper, we have proposed a new version of the GLVQ algorithm based on the concept of symmetry. The improved algorithm employs the symmetrical distance, which is capable of detecting linear, spherical and ellipsoidal clusters. Several two-dimensional data sets are used to demonstrate the method, and the test on Anderson's IRIS data shows that it places no restriction on the dimension of the pattern space and is applicable in higher-dimensional spaces. Moreover, the method can also handle crossed clusters. The price paid is an increase in computational time and complexity.
References
[1] T. Kohonen, "Learning Vector Quantization", Helsinki University of Technology, Laboratory of Computer and Information Science, Report TKK-F-A-601, 1986.
[2] J.C. Bezdek and N.R. Pal, "Two Soft Relatives of Learning Vector Quantization", Neural Networks, 8(5):729-743, 1995.
[3] R. Lippmann, "An Introduction to Computing with Neural Nets", IEEE ASSP Magazine, pp. 4-22, April 1987.
[4] N.R. Pal, J.C. Bezdek and E.C.K. Tsao, "Generalized Clustering Networks and Kohonen's Self-organizing Scheme", IEEE Trans. Neural Networks, 4(4):549-557, 1993.
Fig.2(a) The data set consisted of two crossed lines.
Fig.2(b) The clustering result achieved by the classical LVQ algorithm (mistakes=33).
Fig.2(c) The clustering result achieved by the new GLVQ algorithm (mistakes=0).
Fig.1 An example of the symmetrical distance between a pattern x_i and its mirror x_i*.
Fig.3(a) The data set consisted of two crossed ring-shaped clusters.
Fig.3(b) The clustering result achieved by the classical LVQ algorithm (mistakes=26).
Fig.3(c) The clustering result achieved by the new GLVQ algorithm (mistakes=0).
Fig.4(a) The data set consisted of a compact linear cluster, a compact spherical cluster and a ring-shaped cluster.
Fig.4(b) The clustering result achieved by the classical LVQ algorithm (mistakes=10).
Fig.4(c) The clustering result achieved by the new GLVQ algorithm (mistakes=0).
Fig.5(a) The data set consisted of two compact linear clusters.
Fig.5(b) The clustering result achieved by the classical LVQ algorithm (mistakes=82).
Fig.5(c) The clustering result achieved by the new GLVQ algorithm (mistakes=0).
Fig.6(a) The data set consisted of a compact spherical cluster and two compact ellipsoidal clusters.
Fig.6(b) The clustering result achieved by the classical LVQ algorithm (mistakes=17).
Fig.6(c) The clustering result achieved by the new GLVQ algorithm (mistakes=3).