Self Organizing Maps
        | x_1 |
        | x_2 |
        |  .  |
x   =   |  .  |
        | x_k |
        |  .  |
        | x_N |
where N is the number of measurements (i.e., the dimensionality of the pattern space). It is often more compact to use the transpose of this vector, i.e.
x^t = [ x_1  x_2  ...  x_k  ...  x_N ]
Each element of this vector, x_k, is a random variable; this means that we cannot predict the value of x_k before we take the measurement. If a photodiode voltage lies between 0 and 1, then x_k will lie somewhere in this range.
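To make this concrete, here is a minimal Python sketch (an illustration only, assuming the 5 x 7 photodiode array of the OCR example, so N = 35) of how such a measurement vector might be formed:

    import numpy as np

    # Assumed sensor layout: a 7 x 5 grid of photodiodes, i.e. N = 35 measurements.
    # Each reading x_k is a voltage in [0, 1]; until it is measured we treat it as
    # a random variable, which is why random values stand in for it here.
    N_ROWS, N_COLS = 7, 5
    photodiode_voltages = np.random.uniform(0.0, 1.0, size=(N_ROWS, N_COLS))

    # Flatten the grid into the N-dimensional column vector x of the pattern space.
    x = photodiode_voltages.reshape(-1, 1)   # shape (35, 1)
    x_t = x.T                                # the compact transposed (row) form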
FIGURE 2.2.4
Delineation of pattern classes in pattern space.
The pattern space, in this example, possesses 35 dimensions, and each dimension is orthogonal to all the others. Clearly, we now refer to hyperspace, hyperspheres and hypersurfaces. All the normal rules of geometry apply to these high-dimensional spaces (we just cannot visualize them!). We may hope that there are 26 separate clusters, each representing a single letter of the alphabet. In this situation, it should be relatively easy to construct hypersurfaces that separate each cluster from the remainder. This may not be so.

Consider the two-dimensional pattern space of Figure 2.2.4, in which there are three pattern classes. The first case (a) is relatively simple, as only two decision surfaces are needed to delineate the pattern space into the three class regions.
The second case (b) looks more complicated, as many more decision surfaces are required to form uniquely identifiable individual class regions. However, if we transformed our two measurements in some way, we could form the two thicker curved decision surfaces. The last case (c) is different. Here, radically different measurements (patterns) are associated with the same class. There is no way to transform our two measurements to recover the situation where only two decision surfaces are sufficient. We are now associating patterns which bear no similarity to each other with the same pattern class. This is an example of hard learning, for which we must use supervised learning.
2.3 Features and decision spaces
As the previous section noted, it can sometimes be beneficial to transform the measurements that we make, as this can greatly simplify the task of classification. It can also be unwise to use the raw data, which in the case of the OCR system means the 35 individual photodiode voltages. Working in such a high-dimensional pattern space is not only wasteful in terms of computing resources but can easily lead to difficulties in accurate pattern classification. Consider the situation illustrated in Figure 2.3.1, where an OCR system has inadvertently been taught letter Os that are always circular and letter Qs that are always elliptical.

After training, the system is exposed to elliptical Os and circular Qs. As more picture elements (pixels) match for the main body of the letters than for the small cross-line, the system will make the incorrect classification. However, we know that the small cross-line is the one feature that distinguishes Qs from Os.
FIGURE 2.3.1
The problems of using raw measurement data are illustrated here for a pattern recognition system trained on unprocessed image data (image pixel intensity values). The system subsequently makes incorrect decisions because the measurement vectors are dominated by the round bodies of the letters and not by the crucial presence or absence of the small cross-line.
A more reliable classifier could be made if we emphasized the
significance of the cross-line by ensuring that it was an essential feature
in differentiating between Os and Qs.
Table 2.3.1
Possible feature table for the OCR system.
Pattern class   / line   \ line   | line   -- line   Partial circle   Closed circle   End-points
A                  1        1        0        1             0                0              2
B                  0        0        1        3             2                0              0
O                  0        0        0        0             0                1              0
P                  0        0        1        2             1                0              1
Q                  0        1        0        0             0                1              2
Z                  1        0        0        2             0                0              2
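As a rough illustration of how such a feature table could drive a classifier (a sketch only: the feature counts are copied from Table 2.3.1, while the query vector, distance measure and function names are assumptions), letters can be compared in feature space rather than in raw pixel space:

    import numpy as np

    # Feature counts from Table 2.3.1: /, \, |, --, partial circle, closed circle, end-points.
    feature_table = {
        "A": [1, 1, 0, 1, 0, 0, 2],
        "B": [0, 0, 1, 3, 2, 0, 0],
        "O": [0, 0, 0, 0, 0, 1, 0],
        "P": [0, 0, 1, 2, 1, 0, 1],
        "Q": [0, 1, 0, 0, 0, 1, 2],
        "Z": [1, 0, 0, 2, 0, 0, 2],
    }

    def classify(features):
        """Assign the class whose feature vector is nearest in Euclidean distance."""
        q = np.asarray(features, dtype=float)
        return min(feature_table,
                   key=lambda c: np.linalg.norm(q - np.asarray(feature_table[c], dtype=float)))

    # A hypothetical input with a closed circle, a '\' stroke and two end-points:
    print(classify([0, 1, 0, 0, 0, 1, 2]))   # -> 'Q', decided by the cross-line features

In feature space the cross-line contributes directly to the distance, so an elliptical O and a circular Q no longer confuse the classifier.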
Functional block diagram of a generalized pattern recognition
system
3. Unsupervised Pattern Classification
3.1 Introduction
We need to measure the similarity of the various input
pattern vectors so that we can determine which cluster or pattern class
each vector should be associated with. This chapter discusses a number
of the common metrics employed to measure this similarity. It also briefly mentions non-neural methods for unsupervised pattern classification.
3.2 Distance metrics
To record the similarity, or difference, between vectors we
need to measure the distance between these vectors. In conventional
Euclidean space, we can use Pythagoras's theorem (Figure 4.2.1(a)).
Normally, the distance between the two-dimensional vectors x and y is
given by
d(x, y) = | x - y | = [ (x_1 - y_1)^2 + (x_2 - y_2)^2 ]^(1/2)
This can be extended to N dimensions, to yield the general expression
for Euclidean distance,
d(x, y) = | x - y | = [ sum_{i=1..N} (x_i - y_i)^2 ]^(1/2)
We can use many different distance measures, each of which defines a different kind of metric space, that is, a space in which distance has meaning. For a space to be metric, it must satisfy the following three conditions:
* d(x, y) > 0 if x ≠ y, and d(x, y) = 0 if x = y. If the vectors x and y are different, the distance between them must be positive; if the vectors are identical, the distance must be zero.

* d(x, y) = d(y, x). The distance from x to y is the same as the distance from y to x.

* d(x, y) + d(y, z) >= d(x, z). The sum of the distance from x to y and the distance from y to z must be greater than or equal to the distance from x to z. This is the triangle inequality (Figure 4.2.1(b)).
FIGURE 4.2.1
(a) The Euclidean distance between vectors x and y.
(b) The triangle inequality d(x,y)+d(y,z)>=d(x,z).
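The following short Python sketch (illustrative only; the example vectors are arbitrary) computes this Euclidean distance for vectors of any dimension and checks the three metric conditions numerically:

    import numpy as np

    def euclidean_distance(x, y):
        """d(x, y) = |x - y| = [ sum_i (x_i - y_i)^2 ]^(1/2), for any dimension N."""
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        return np.sqrt(np.sum((x - y) ** 2))

    x, y, z = np.array([1.0, 2.0]), np.array([4.0, 6.0]), np.array([0.0, -1.0])

    assert euclidean_distance(x, x) == 0.0                       # identical vectors -> zero
    assert euclidean_distance(x, y) > 0.0                        # different vectors -> positive
    assert euclidean_distance(x, y) == euclidean_distance(y, x)  # symmetry
    assert (euclidean_distance(x, y) + euclidean_distance(y, z)
            >= euclidean_distance(x, z))                         # triangle inequality

    print(euclidean_distance(x, y))   # 5.0 (the familiar 3-4-5 triangle)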
3.3 Dimensionality Reduction
One important function of statistical pattern recognition is dimensionality reduction. The set of measurements (i.e., the pattern space) originally taken may be too large and not the most appropriate. What we require is a means of reducing the number of dimensions while at the same time minimizing any error resulting from discarding measurements. If the dimensionality of the pattern space is p, then we cannot simply keep m of the original measurements (where m < p). We require some kind of transformation that ranks its dimensions in terms of the variance of the data. Why should this be so?
FIGURE 4.3.1
The vectors v1 and v2 are the most appropriate for representing the data clusters 1 and 2 respectively, as these are the directions which account for the greatest variance in each of the clusters. But are they the best for discriminating the two clusters from each other?
Consider the scatter of data points in Figure 4.3.1. The vector along the direction of greatest scatter (i.e., variance) captures the most important aspects of a particular data class. There are several important statistical methods based on transforming the measurement space onto a different set of axes which may be more appropriate for discriminating the various data clusters (hopefully in fewer dimensions). The general approach is called multivariate analysis, with principal component analysis being the oldest and most common. There are many pitfalls for the novice in applying such techniques. The alternative approach, based on clustering methods, is perhaps easier to comprehend. It should be noted that there is no universal technique that is guaranteed to yield the best solution for all cases. So much in statistical analysis (and neural computing) is data dependent, as illustrated in Figure 4.3.2.
FIGURE 4.3.2
(a) Classes best separated using transform methods (e.g., principal component analysis).
(b) Classes best separated using clustering methods.
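As a small illustration of the transform approach (a sketch only, using synthetic data; the principal components are computed directly from the covariance matrix, which is one of several possible formulations):

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(size=(200, 5))        # 200 samples in a p = 5 dimensional pattern space
    data[:, 0] *= 4.0                       # give one direction a much larger variance

    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)    # p x p covariance matrix of the measurements
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues returned in ascending order

    order = np.argsort(eigvals)[::-1]       # rank the axes by the variance they account for
    m = 2
    components = eigvecs[:, order[:m]]      # keep only the m most significant directions
    reduced = centered @ components         # the data re-expressed in m dimensions

    print(reduced.shape)                    # (200, 2)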
4. Unsupervised Neural Networks
4.1 Introduction
At the start of Section One, we mentioned supervised
learning where the desired output response of a network is determined by
a set of targets. The general form of the relationship or mapping between
the input and output domains is established by the training data. We say
that an external teacher is needed to specify these input/output pairings.
Networks that are trained without such a teacher learn by evaluating the
similarity between the input patterns presented to the network. For
example, if the scalar product is used as a measure of the similarity
between the input vectors and the network's weight vectors, then an
unsupervised network could adapt its weights to become more like the
frequently occurring input vectors. So such networks can be used to
perform cluster analysis. They make use of the statistical properties of the
input data as frequently occurring patterns will have a greater influence
than infrequent ones. The final trained response of the network must
resemble in some way the underlying probability density function of the
data.
Unsupervised learning makes use of the redundancy present
in the input data in order to produce a more compact representation.
There is a sense of discovery with unsupervised learning. An
unsupervised network can discover the existence of unlabelled data
clusters, but it cannot give them meaningful names to these clusters nor
can it associate different clusters as representatives of the same class
(Figure 5.1.1)
FIGURE 5.1.1
(a) Cluster (class) discrimination in an unsupervised system.
(b) Class (cluster) discrimination in a supervised system.
4.2 Winner-take-all Network
The basic form of learning employed in unsupervised
networks is called competitive learning, and, as the name suggests, the neurons compete with each other to be the one that fires. All the neurons in the network are identical except that they are initialized with randomly distributed weights. As only one neuron will fire, it can be
declared the winner. A simple winner-take-all network is shown in
Figure 5.2.1.
FIGURE 5.2.1
This network of p neurons will classify the input data x into one of p clusters. The output of the network, y = [ y_1  y_2  ...  y_p ]^t, is given by

    y = W x
where W is the weight matrix
        | w_1^t |
        | w_2^t |
W   =   |   .   |
        | w_i^t |
        |   .   |
        | w_p^t |
and the individual weight vector of the ith neuron is given by
        | w_i1 |
        | w_i2 |
w_i  =  |  .   |
        |  .   |
        | w_in |
The first task is to discover the winning neuron, that is, the neuron with a weight vector most like the current input vector (remember that the weight vectors were all originally set to random values). This is usually measured in terms of the Euclidean distance: the winning neuron, m, is the neuron for which | x - w_i | is a minimum. Instead of measuring each of these Euclidean distances and then identifying the smallest, an equivalent operation (provided the weight vectors all have the same length) is to use the scalar product. That is, the winning neuron is the one for which x · w_i is a maximum.
This simple network can perform single-layer clustering analysis; the clusters must be linearly separable by hyperplanes passing through the origin of the encompassing hypersphere. We also need to specify a priori the number of neurons, that is, the number of identifiable data clusters.
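A compact sketch of such a winner-take-all layer is given below (illustrative only: the learning rate, the cluster centres and the amount of training are assumptions, and the winner is found here by Euclidean distance rather than by the scalar product):

    import numpy as np

    rng = np.random.default_rng(1)
    p, n = 3, 2                                    # p neurons, n-dimensional inputs
    W = rng.uniform(-0.1, 0.1, size=(p, n))        # row i is the weight vector of neuron i

    def train_step(x, W, lr=0.1):
        distances = np.linalg.norm(W - x, axis=1)  # distance from x to every weight vector
        winner = int(np.argmin(distances))         # only the most responsive neuron adapts
        W[winner] += lr * (x - W[winner])          # move its weights towards the input
        return winner

    centres = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])   # three assumed clusters
    for _ in range(1000):
        x = centres[rng.integers(len(centres))] + rng.normal(0.0, 0.05, size=2)
        train_step(x, W)

    print(np.round(W, 2))   # each row should settle near one cluster centre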
5. Kohonen Self-Organizing Map
5.1 Introduction
The SOM consists of a (usually) one or two-dimensional array
of identical neurons. The input vector is broadcast in parallel to all these
neurons. For each input vector, the most responsive neuron is located
(using a similar method to that of the winner-take-all network). The
weights of this neuron and those within a neighborhood around it are
adapted as in the winner-take-all network to reduce the distance between
its weight vector and the current input vector. The training algorithm will
be described in detail before discussing the operation of the SOM.
5.2 SOM Algorithm
Assume an output array (Figure 5.2.1) of two dimensions with K x K neurons, that the input samples, x, have dimensionality N, and that index n represents the nth presentation of an input sample.

1. All weight vectors are set to small random values. The only strict condition is that all the weights are different.

2. Select a sample input vector x and locate the most responsive (winning) neuron using some distance metric, usually the Euclidean distance. That is, the winning neuron is the one for which | x(n) - w_j | is a minimum (where j = 1, 2, ..., M and M = K x K).
3. Adapt all weight vectors, including that of the winning neuron, within the current neighborhood region. Those outside this neighborhood are left unchanged:
    w_i(n+1) = w_i(n) + α(n) [ x(n) - w_i(n) ]    if neuron i lies inside Λ(n)
    w_i(n+1) = w_i(n)                             otherwise

where α(n) is the current adaptation constant and Λ(n) is the current neighborhood size centered on the winning neuron.
4. Modify α(n) and Λ(n) as necessary and, unless no further change in the output feature map is observable (or some other termination condition is met), go to step two.
FIGURE 5.2.1
Two-dimensional self-organizing map.
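The algorithm above can be sketched in a few lines of Python (an illustration only: the map size, the input dimensionality and the linear decay of α(n) and Λ(n) are assumptions, and the neighborhood is taken as a circular region of shrinking radius around the winner):

    import numpy as np

    rng = np.random.default_rng(2)
    K, N, n_iter = 10, 3, 5000
    weights = rng.uniform(0.0, 1.0, size=(K, K, N))    # small random initial weight vectors
    grid = np.stack(np.meshgrid(np.arange(K), np.arange(K), indexing="ij"), axis=-1)

    def som_step(x, n):
        alpha = 0.9 * (1.0 - n / n_iter) + 0.01        # adaptation constant alpha(n), decaying
        radius = (K / 2.0) * (1.0 - n / n_iter) + 1.0  # neighborhood size Lambda(n), shrinking
        dists = np.linalg.norm(weights - x, axis=-1)   # distance of x to every weight vector
        winner = np.unravel_index(np.argmin(dists), (K, K))
        in_hood = np.linalg.norm(grid - np.array(winner), axis=-1) <= radius
        weights[in_hood] += alpha * (x - weights[in_hood])   # adapt only inside Lambda(n)

    for n in range(n_iter):
        som_step(rng.uniform(0.0, 1.0, size=N), n)     # e.g. organize a map of RGB colours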
Further details on parameter selection and map format:
- Adaptation constant
- Neighborhood function
- Adaptation processes
1. Adaptation constant, α(n)

For the first 1,000 or so iterations, α(n) should be close to (but less than) unity and then gradually decrease to about 0.1 or less. The first phase of the learning is sometimes referred to as the ordering phase. During this phase, the neighborhood is relatively large and the weight vectors are changed by large amounts; the output space undergoes global topological ordering. During the second stage, where α(n) is much smaller, the space is merely refined at a local level. This second stage is sometimes referred to as the convergence stage.
2. Neighborhood function, Λ(n)

The neighborhood usually assumes a size of between a third and a half of the full array at the start of training, and falls gradually during training to a size where only one layer of direct neighbors, or none at all, lies within the neighborhood. This reduction in Λ(n) can be stepped. In the above description of the SOM algorithm, it was assumed that the neighborhood was square and that all neurons within this region were adapted by the same amount. The neighborhood could instead be hexagonal (Figure 6.2.2), or the effective change in the weight vectors within the neighborhood could be weighted so that neurons close to the centre of the neighborhood are changed proportionally more than those at its boundary. A suitable profile for this neighborhood weighting would be Gaussian. There may be some advantages, depending on the application and the quality and quantity of the input data, in using a Gaussian profile.
Figure 6.2.2
Hexagonal neighbourhood
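A possible Gaussian weighting of the neighborhood could look like this (a sketch only; the width sigma and the function name are assumptions):

    import numpy as np

    def gaussian_neighbourhood(grid_positions, winner_position, sigma=2.0):
        """Adaptation weight for each neuron: 1 at the winner, falling off smoothly with
        grid distance, so central neurons are changed more than those at the boundary."""
        d = np.linalg.norm(grid_positions - winner_position, axis=-1)
        return np.exp(-(d ** 2) / (2.0 * sigma ** 2))

    # The weight update then becomes
    #   w_i(n+1) = w_i(n) + alpha(n) * h_i(n) * [ x(n) - w_i(n) ]
    # where h_i(n) is the Gaussian factor returned above for neuron i.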
3. Adaptation processes

A typical learning sequence could be as follows:

Phase         Neighbourhood radius                   Adaptation constant   Iterations
Ordering      Half map dimension (K/2) to 3,         0.9 to 0.1            5,000
              in discrete steps
Convergence   3 to 1                                 0.1 to 0              50,000
The convergence of the SOM is not critically dependent on the settings of these parameters, though the speed of convergence and the final state of the topological ordering do depend on the choice of the adaptation constant, the neighborhood size and its rate of shrinking. It is important that consecutive input samples are uncorrelated.
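The schedule in the table above could be realized, for example, as follows (a sketch; the linear interpolation within each phase is an assumption):

    def schedule(n, K, ordering_iters=5000, convergence_iters=50000):
        """Return (adaptation constant, neighbourhood radius) at iteration n."""
        if n < ordering_iters:                              # ordering phase
            frac = n / ordering_iters
            alpha = 0.9 + (0.1 - 0.9) * frac                # 0.9 -> 0.1
            radius = K / 2.0 + (3.0 - K / 2.0) * frac       # K/2 -> 3
        else:                                               # convergence phase
            frac = min((n - ordering_iters) / convergence_iters, 1.0)
            alpha = 0.1 * (1.0 - frac)                      # 0.1 -> 0
            radius = 3.0 + (1.0 - 3.0) * frac               # 3 -> 1
        return alpha, int(round(radius))                    # radius falls in discrete steps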
6. CONCLUSION
Devised by Kohonen in the early 1980s, the SOM is now one of the most popular and widely used types of unsupervised artificial neural network.

It is built on a one- or two-dimensional lattice of neurons that captures the important features contained in an input (data) space of interest. In so doing, it provides a structural representation of the input data, with the neurons' weight vectors serving as prototypes. The SOM algorithm is neurobiologically inspired, incorporating all the mechanisms that are basic to self-organization: competition, cooperation, and self-amplification.

The development of the self-organizing map as a neural model is motivated by a distinct feature of the human brain. What is astonishing about Kohonen's SOM algorithm is that it is simple to implement, yet its properties are mathematically very difficult to analyze in a general setting.
7. REFERENCES
Books:
1. Simon Haykin, Neural Networks.
2. M. H. Hassoun, Fundamentals of Artificial Neural Networks.
3. J. M. Zurada, Introduction to Artificial Neural Networks.
ACKNOWLEDGEMENTS
I express my sincere thanks to Prof. M. N. Agnisarman Namboothiri (Head of the Department, Computer Science and Engineering, MESCE) and Mr. Zainul Abid (staff in charge) for their kind co-operation in presenting the seminar.
I also extend my sincere thanks to all other members of the faculty
of Computer Science and Engineering Department and my friends for their
co-operation and encouragement.
Neelima V.V
ABSTRACT
Self organizing maps are used in many fields, such as bioinformatics and neural networks, and are basically used for the grouping of data. In decision making we reach a conclusion by applying certain tests to the input given to the system; we study the outputs obtained from these tests and reach a solution. Consider a decision-making system which checks whether a cloth is wet or not. Given the parameters of the cloth, we need to know whether it is wet. We could simply say that if moisture is present, the cloth is wet, but that is not always the case: the degree of wetness can vary, and we cannot draw a sharp boundary between a wet cloth and a dry one.

In a SOM we perform a similar sort of decision making. We present an input and take a set of random points in the space where the data lies, usually a two-dimensional plane. Each of these points has its own weight vector; the points are called neurons. A winner-take-all rule is then applied: the distance between the input and each of the neurons is calculated, and the neuron at the minimum distance is selected. The weight vectors of this winning neuron and of its neighborhood, the set of neurons within a certain distance of the winner, are then moved towards the input. The process is repeated until the similarity reaches a certain limit, at which point the input can be identified with the weight vector of one neuron. This automatic mapping of an input onto one of the neurons is what is called the SOM, and the technique can be used to find the cluster to which an unknown input belongs.
CONTENTS
1. Introduction
2. Statistical Pattern Recognition
   2.1 Elements of pattern recognition
   2.2 Pattern space and vectors
   2.3 Features and decision spaces
3. Unsupervised Pattern Classification
   3.1 Introduction
   3.2 Distance metrics
   3.3 Dimensionality Reduction
4. Unsupervised Neural Networks
   4.1 Introduction
   4.2 Winner-take-all Network
5. Kohonen Self-Organizing Map
   5.1 Introduction
   5.2 SOM Algorithm
6. CONCLUSION
7. REFERENCES