
A New Generalized Learning Vector Quantization Algorithm

Ching-Tang Hsieh, Mu-Chun Su, Uei-Jyh Chen and Horng-Jae Lee


Department of Electrical Engineering
Tamkang University, Tamsui, Taipei Hsien 25137, Taiwan, R.O.C.
E-mail: [email protected]

Abstract

A new approach to data clustering which is capable of detecting clusters of different shapes is proposed. In classical clustering approaches, it is a great challenge to separate clusters whose prototypes are difficult to represent by a mathematical formula. In this paper, we propose an improved learning vector quantization (LVQ) algorithm using the concept of symmetry. Through several computer simulations, the results show that the proposed method with random initialization is effective in detecting linear, spherical and ellipsoidal clusters. Besides, this method can also solve the problem of crossed clusters.

I. Introduction

The learning vector quantization (LVQ) algorithm can be performed through a supervised learning process using a competitive neural network whose weight vectors represent the prototypes [1][2]. In this formulation, the LVQ algorithm minimizes a loss function that measures the locally weighted error of the input vector with respect to the winning prototype [3]. Because the Euclidean distance is used to compute the distance between an input vector and the assigned prototype, the LVQ algorithm is suitable for detecting spherical clusters [4]. However, most clusters in real data sets may be linear, spherical or ellipsoidal in shape. Motivated by this problem, we propose a distance measure based on the concept of symmetry [5]. The LVQ algorithm incorporated with the symmetrical distance can detect linear, spherical and ellipsoidal clusters very well. Besides, this method can also solve the problem of crossed clusters. This paper is organized as follows. In Section II, the LVQ algorithm is briefly described. In Section III, we review the most common Minkowski metrics and then present the proposed symmetrical distance. In Section IV, the LVQ algorithm employing the symmetrical distance is discussed. In Section V, several examples are used to demonstrate the effectiveness of the new algorithm. Finally, Section VI concludes this paper.

II. Learning Vector Quantization Algorithm

The objective of the LVQ algorithm is the representation of a set of vectors $x_k \in X \subset R^n$ by a set of $c$ prototypes $V = \{v_1, v_2, \ldots, v_c\} \subset R^n$ [1]. The LVQ algorithm is associated with a competitive network that consists of an input layer and an output layer [1][2]. Each node in the input layer is connected directly to the cells or units in the output layer. A weight vector, also referred to as a prototype, is assigned to each cell in the output layer [3]. When an input vector $x$ is submitted to this network, distances are computed between each $v_j$ and $x$. The output nodes compete, a winner node (minimum distance) is found, and it is then updated using the rules below.

If the winner node has the same class as $x$,

$$v_t = v_{t-1} + \beta^+ (x - v_{t-1}); \quad (1a)$$

else

$$v_t = v_{t-1} - \beta^- (x - v_{t-1}), \quad (1b)$$

where $\beta^+$ and $\beta^-$ are the learning rates ($0 < \beta^+, \beta^- < 1$).



III. Symmetrical Distance

Unless a meaningful measure of distance, or proximity, between pairs of objects has been established, no meaningful cluster analysis is possible. The most common proximity index is the Minkowski metric, which measures dissimilarity. Given $n$ patterns $x_i = (x_{i1}, \ldots, x_{id})^T$, $1 \le i \le n$, the Minkowski metric for measuring the dissimilarity between the $i$th and $j$th patterns is defined by

$$d(i,j) = \left( \sum_{k=1}^{d} |x_{ik} - x_{jk}|^r \right)^{1/r}, \quad (2)$$

where $r \ge 1$. All Minkowski metrics satisfy the following five properties [6]:
1) $d(i,i) = 0, \forall i$
2) $d(i,j) = d(j,i), \forall i,j$
3) $d(i,j) \ge 0, \forall i,j$
4) $d(i,j) = 0$ only if $x_i = x_j$
5) $d(i,j) \le d(i,l) + d(l,j), \forall i,j,l$
The three most common Minkowski metrics are defined below.
1) $r = 2$ (Euclidean distance):

$$d(i,j) = \left( \sum_{k=1}^{d} |x_{ik} - x_{jk}|^2 \right)^{1/2} = \left[ (x_i - x_j)^T (x_i - x_j) \right]^{1/2} \quad (3a)$$

2) $r = 1$ (Manhattan or city block distance):

$$d(i,j) = \sum_{k=1}^{d} |x_{ik} - x_{jk}| \quad (3b)$$

3) $r \to \infty$ ("sup" distance):

$$d(i,j) = \max_{1 \le k \le d} |x_{ik} - x_{jk}| \quad (3c)$$
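The three special cases (3a)-(3c) translate directly into code; a small illustrative sketch, with test values that are ours:

```python
import numpy as np

def minkowski(xi, xj, r):
    """Minkowski dissimilarity between two patterns, r >= 1, Eq. (2)."""
    return np.sum(np.abs(xi - xj) ** r) ** (1.0 / r)

xi, xj = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(minkowski(xi, xj, 2))        # Euclidean (3a): 5.0
print(minkowski(xi, xj, 1))        # Manhattan (3b): 7.0
print(np.max(np.abs(xi - xj)))     # "sup" distance (3c): 4.0
```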
In order to translate the concept of symmetry about clusters into a reasonable mathematical formula and keep it computationally simple, we define the symmetry as follows. Given $n$ patterns $x_i$, $1 \le i \le n$, and a reference vector $c$, the symmetrical distance between pattern $x_i$ and the reference $c$ is defined as

$$d_s(x_i, c) = \min_{k=1,\ldots,n;\, k \ne i} \frac{\| (x_i - c) + (x_k - c) \|}{\| x_i - c \| + \| x_k - c \|}, \quad (4)$$

where the denominator term is used to normalize the symmetrical distance so as to make it insensitive to the Euclidean distances $\|x_i - c\|$ and $\|x_k - c\|$. The idea of the symmetrical distance is very simple and intuitive, and it is instructive to observe its geometrical interpretation; Fig. 1 gives the concept. Obviously, if the data set does contain the mirror pattern $x_{i^*}$, the value of $d_s(x_i, c)$ is zero because $x_i$ and $x_{i^*}$ are symmetrical with respect to $c$ (i.e. $(x_i - c) + (x_{i^*} - c) = 0$).
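Definition (4) can be transcribed almost directly into code. The vectorized sketch below is our reading of the formula; the function name, the `eps` guard against a zero denominator, and the use of NumPy are assumptions, not part of the paper.

```python
import numpy as np

def symmetrical_distance(x_i, c, X, i, eps=1e-12):
    """d_s(x_i, c) from Eq. (4): find the pattern x_k (k != i) that is
    most nearly mirror-symmetric to x_i with respect to reference c."""
    num = np.linalg.norm((x_i - c) + (X - c), axis=1)              # ||(x_i-c)+(x_k-c)||
    den = np.linalg.norm(x_i - c) + np.linalg.norm(X - c, axis=1)  # ||x_i-c|| + ||x_k-c||
    ratios = num / (den + eps)   # eps guards the degenerate x_i = x_k = c case
    ratios[i] = np.inf           # exclude k == i from the minimum
    return float(ratios.min())
```

For a perfectly mirrored pair, e.g. a data set containing both (1, 0) and (-1, 0) with the reference c at the origin, the numerator vanishes and the function returns 0, matching the geometric picture described above.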
IV. A New Generalized Learning Vector Quantization (GLVQ) Algorithm

The basic idea of the algorithm is to integrate the symmetrical distance with the LVQ algorithm. We assign patterns to the clusters that are closest to them in the symmetrical sense. The algorithm is summarized below.

The new GLVQ algorithm:

STEP 1: Fix $c > 0$; $0.0 \le \theta_0 \le 1.0$; the learning rates $\beta^+, \beta^-$ ($0.0 \le \beta^- \le \beta^+ \le 1.0$); the learning decay rates $\beta^+_{rate}, \beta^-_{rate}$ ($0.0 \le \beta^+_{rate}, \beta^-_{rate} \le 1.0$); $\varepsilon$, some small positive constant; and the iteration limit $t_{max}$.

STEP 2: Initialize $V_0 = (v_{1,0}, v_{2,0}, \ldots, v_{c,0})$.

STEP 3: Let $\beta^+_{new} = \beta^+$ and $\beta^-_{new} = \beta^-$.
For $t = 1, 2, \ldots, t_{max}$:
A. For $i = 1, \ldots, n$:
a. Compute the distances $d_{ij,t-1}$:
If $[\min_{j=1,\ldots,c} d_s(x_i, v_{j,t-1}) < \theta_t]$, then $d_{ij,t-1} = d_s(x_i, v_{j,t-1})$, $j = 1, \ldots, c$;
else $d_{ij,t-1} = \| x_i - v_{j,t-1} \|$, $j = 1, \ldots, c$,
where

$$d_s(x_i, v_{j,t-1}) = \min_{k=1,\ldots,n;\, k \ne i} \frac{\| (x_i - v_{j,t-1}) + (x_k - v_{j,t-1}) \|}{\| x_i - v_{j,t-1} \| + \| x_k - v_{j,t-1} \|}, \quad j = 1, \ldots, c,$$

and the threshold grows with time as

$$\theta_t = \theta_0 + (1 - \theta_0) \frac{t}{t_{max}}.$$

b. Locate the winner $j^* = \arg\min_j d_{ij,t-1}$.
c. If the winner $j^*$ has the same label as $x_i$,

$$v_{j^*,t} = v_{j^*,t-1} + \beta^+_{new} (x_i - v_{j^*,t-1});$$

else

$$v_{j^*,t} = v_{j^*,t-1} - \beta^-_{new} (x_i - v_{j^*,t-1})$$

and locate the nearest winner neuron $j^*_{near}$ with the same label as $x_i$:

$$v_{j^*_{near},t} = v_{j^*_{near},t-1} + \beta^+_{new} (x_i - v_{j^*_{near},t-1}).$$

d. Next $i$.
B. Compute $\beta^+_{new} = \beta^+_{new} \beta^+_{rate}$ and $\beta^-_{new} = \beta^-_{new} \beta^-_{rate}$.
C. Compute $E_t = \| V_t - V_{t-1} \|$.
D. If ($\varepsilon > E_t$) stop; else next $t$.
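Assembling the steps, here is a hedged sketch of the whole training loop, reusing `symmetrical_distance` from the sketch above. The loop structure follows STEPs 1-3; the default parameter values and the choice of $E_t$ as the norm of the total prototype movement are our assumptions, and we assume every class has at least one prototype.

```python
import numpy as np

def glvq_train(X, y, V, v_labels, beta_p=0.5, beta_m=0.1,
               beta_p_rate=0.95, beta_m_rate=0.95,
               theta0=0.5, eps=1e-4, t_max=100):
    """Sketch of the GLVQ loop: the symmetrical distance gates the winner search."""
    n, c = len(X), len(V)
    for t in range(1, t_max + 1):
        V_old = V.copy()
        theta_t = theta0 + (1.0 - theta0) * t / t_max   # threshold schedule
        for i in range(n):
            # Step a: symmetrical distance from x_i to every prototype ...
            d_s = np.array([symmetrical_distance(X[i], V[j], X, i)
                            for j in range(c)])
            # ... used when some prototype is "symmetric enough",
            # otherwise fall back to the Euclidean distance.
            d = d_s if d_s.min() < theta_t else np.linalg.norm(X[i] - V, axis=1)
            j_star = int(np.argmin(d))                  # step b: winner
            if v_labels[j_star] == y[i]:                # step c: reward ...
                V[j_star] += beta_p * (X[i] - V[j_star])
            else:                                       # ... or punish, and
                V[j_star] -= beta_m * (X[i] - V[j_star])
                # pull in the nearest prototype carrying the correct label
                same = np.flatnonzero(v_labels == y[i])
                j_near = same[np.argmin(d[same])]
                V[j_near] += beta_p * (X[i] - V[j_near])
        beta_p *= beta_p_rate                           # step B: decay the rates
        beta_m *= beta_m_rate
        E_t = np.linalg.norm(V - V_old)                 # step C (our choice of E_t)
        if E_t < eps:                                   # step D: stop test
            break
    return V
```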
V. Experimental Results

Several two-dimensional data sets are used so that the reader can "see" the effectiveness of the proposed symmetrical distance. The algorithm is then tested on Anderson's IRIS data to show that the algorithm has no restriction on the dimension of the pattern space and is applicable in higher-dimensional spaces. For comparison purposes, we also tested the data sets using the classical LVQ algorithm.

Example 1: The data set consists of two crossed lines as shown in Fig. 2(a). The total number of data points is 100. The clustering results achieved by the classical LVQ and new GLVQ algorithms are shown in Fig. 2(b-c), respectively.

Example 2: The data set consists of two crossed ring-shaped clusters as shown in Fig. 3(a). The total number of data points is 100. The clustering results achieved by the classical LVQ and new GLVQ algorithms are shown in Fig. 3(b-c), respectively.

Example 3: The data set consists of a compact linear cluster, a compact spherical cluster and a ring-shaped cluster as shown in Fig. 4(a). The total number of data points is 300. The clustering results achieved by the classical LVQ and new GLVQ algorithms are shown in Fig. 4(b-c), respectively.

Example 4: The data set consists of two compact linear clusters as shown in Fig. 5(a). The total number of data points is 400. The clustering results achieved by the classical LVQ and new GLVQ algorithms are shown in Fig. 5(b-c), respectively.

Example 5: The data set consists of a compact spherical cluster and two compact ellipsoidal clusters as shown in Fig. 6(a). The total number of data points is 579. The clustering results achieved by the classical LVQ and new GLVQ algorithms are shown in Fig. 6(b-c), respectively.

Example 6: Anderson's IRIS data has three subsets (i.e. Iris Setosa, Iris Versicolor and Iris Virginica), two of which are overlapping. The Anderson's IRIS data are in a four-dimensional space and there are 150 patterns in total in the set. The comparison results are tabulated in Table 1.

VI. Conclusion

In this paper, we propose a new version of the GLVQ algorithm using the concept of symmetry. This improved algorithm, employing the symmetrical distance, is capable of detecting linear, spherical and ellipsoidal clusters. Several two-dimensional data sets are used to demonstrate this method. The tests on Anderson's IRIS data also show that this method has no restriction on the dimension of the pattern space and is applicable in higher-dimensional spaces. Besides, this method can also solve the problem of crossed clusters. The price paid is an increase in computational time and complexity.
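For a sense of how the pieces fit together, here is a hypothetical driver in the spirit of Example 1 (two crossed lines), building on the two sketches above; the synthetic data, seed and parameter values are illustrative only and do not reproduce the paper's figures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two crossed lines through the origin, 50 points each (cf. Example 1).
t = rng.uniform(-1.0, 1.0, 50)
line1 = np.column_stack([t, t]) + 0.02 * rng.standard_normal((50, 2))
line2 = np.column_stack([t, -t]) + 0.02 * rng.standard_normal((50, 2))
X = np.vstack([line1, line2])
y = np.array([0] * 50 + [1] * 50)

# One prototype per cluster, randomly initialized (STEP 2).
V = rng.standard_normal((2, 2))
v_labels = np.array([0, 1])

V = glvq_train(X, y, V, v_labels)
print("final prototypes:\n", V)
```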
References

[1] T. Kohonen, "Learning Vector Quantization", Helsinki University of Technology, Laboratory of Computer and Information Science, Report TKK-F-A-601, 1986.
[2] J.C. Bezdek and N.R. Pal, "Two Soft Relatives of Learning Vector Quantization", Neural Networks, 8(5):729-743, 1995.
[3] R. Lippmann, "An Introduction to Neural Computing", IEEE ASSP Magazine, pp. 4-22, April 1987.
[4] N.R. Pal, J.C. Bezdek and E.C.K. Tsao, "Generalized Clustering Networks and Kohonen's Self-Organizing Scheme", IEEE Trans. Neural Networks, Vol. 4, pp. 549-557, 1993.
[5] K. Koffka, "Principles of Gestalt Psychology", London, Routledge & Kegan Paul, 1935.
[6] A.K. Jain and R.C. Dubes, "Algorithms for Clustering Data", New Jersey, Prentice Hall, 1988.

Table 1. Performance of the classical LVQ and new GLVQ algorithms for Anderson's IRIS data.

Fig. 2(a). The data set consisted of two crossed lines.
Fig. 2(b). The clustering result achieved by the classical LVQ algorithm (mistakes = 33).
Fig. 2(c). The clustering result achieved by the new GLVQ algorithm (mistakes = 0).
Fig. 1. An example of the symmetrical distance.

Fig. 3(a). The data set consisted of two crossed ring-shaped clusters.
Fig. 3(b). The clustering result achieved by the classical LVQ algorithm (mistakes = 26).
Fig. 3(c). The clustering result achieved by the new GLVQ algorithm (mistakes = 0).
Fig. 4(a). The data set consisted of a compact linear cluster, a compact spherical cluster and a ring-shaped cluster.
Fig. 4(b). The clustering result achieved by the classical LVQ algorithm (mistakes = 10).
Fig. 4(c). The clustering result achieved by the new GLVQ algorithm (mistakes = 0).
Fig. 5(a). The data set consisted of two compact linear clusters.
Fig. 5(b). The clustering result achieved by the classical LVQ algorithm (mistakes = 82).
Fig. 5(c). The clustering result achieved by the new GLVQ algorithm (mistakes = 0).
Fig. 6(a). The data set consisted of a compact spherical cluster and two compact ellipsoidal clusters.
Fig. 6(b). The clustering result achieved by the classical LVQ algorithm (mistakes = 17).
Fig. 6(c). The clustering result achieved by the new GLVQ algorithm (mistakes = 3).
