SOM
Unsupervised Learning
• Some problems require an algorithm to cluster or to
partition a given data set into disjoint subsets
("clusters"), such that patterns in the same cluster are
as alike as possible, and patterns in different clusters
are as dissimilar as possible.
• The application of a clustering procedure results in a
partition (function) that assigns each data point to a
unique cluster.
• A partition may be evaluated by measuring the
average squared distance between each input pattern
and the centroid of the cluster in which it is placed.
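Written as a formula (a standard formulation; the notation here is an assumption, not taken from the slide): for data points $x_1, \dots, x_N$, partition function $c(\cdot)$, and cluster centroids $\mu_k$,
$$\text{score} = \frac{1}{N}\sum_{i=1}^{N} \left\lVert x_i - \mu_{c(x_i)} \right\rVert^2,$$
where $c(x_i)$ is the cluster assigned to point $x_i$; lower scores indicate tighter clusters.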
Clustering
• Clustering is alternatively called "grouping".
• Intuitively, we want to assign the same label to data points that are close to each other.
• Thus, clustering algorithms rely on a distance metric between data points.
Issues
• Is the desired number of clusters given?
• Finding the "best" clusters
• Are the clusters semantically meaningful?
Cluster Analysis
• Finding groups of objects such that
– the objects in a group will be similar (or related) to one
another and
– different from (or unrelated to) the objects in other groups
K-means Algorithm
• Given the cluster count K, the K-means algorithm is carried out in three steps after initialisation:
Initialisation: set seed points (randomly selected as the means of the clusters)
1) Assign each object to the cluster of the nearest seed point, measured with a specific distance metric.
2) Compute new seed points as the centroids of the clusters of the current partition (the centroid is the centre, i.e., mean point, of the cluster).
3) Go back to Step 1); stop when there are no new assignments (i.e., membership in each cluster no longer changes).
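A minimal Python sketch of these three steps (Euclidean distance and the parameter names are illustrative assumptions, not part of the slides):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Plain K-means: assign to the nearest centroid, recompute centroids, repeat."""
    rng = np.random.default_rng(seed)
    # Initialisation: randomly select k data points as the seed means.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iters):
        # Step 1: assign each object to the cluster of the nearest seed point
        # (Euclidean distance here).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 3: stop when membership in each cluster no longer changes.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 2: compute new seed points as the centroids of the current partition.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# e.g. the class problem below:
# labels, centers = kmeans(np.array([[1., 1.], [1., 0.], [0., 2.], [2., 4.], [3., 5.]]), k=2)
```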
Class Problem
Training examples:

    Point   x1   x2
    A        1    1
    B        1    0
    C        0    2
    D        2    4
    E        3    5

• Let k = 2, meaning we are interested in two clusters.
• Let A and C be randomly selected as the initial means of the 2 clusters.
Class Problem

Iteration 1 – centers: center 1 = A = (1, 1), center 2 = C = (0, 2)

    Point   Distance from center 1   Distance from center 2   Assignment
    A       0.0                      1.4                      C1
    B       1.0                      2.2                      C1
    C       1.4                      0.0                      C2
    D       3.2                      2.8                      C2
    E       4.5                      4.2                      C2

New centers: center 1 = mean(A, B) = (1, 0.5); center 2 = mean(C, D, E) ≈ (1.7, 3.7)

Class Problem

Iteration 2 – centers: center 1 = (1, 0.5), center 2 = (1.7, 3.7)

    Point   Distance from center 1   Distance from center 2   Assignment
    A       0.5                      2.7                      C1
    B       0.5                      3.7                      C1
    C       1.8                      2.4                      C1
    D       3.6                      0.5                      C2
    E       4.9                      1.9                      C2

Since C changed clusters, the centers are recomputed and the procedure repeats until no membership changes.
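These distances can be checked with a few lines of Python (a quick verification, not part of the original slides); note the exact center 2 is (5/3, 11/3), which the slide rounds to (1.7, 3.7):

```python
import numpy as np

pts = {"A": (1, 1), "B": (1, 0), "C": (0, 2), "D": (2, 4), "E": (3, 5)}
c1 = np.array([1.0, 0.5])        # mean of A and B
c2 = np.array([5 / 3, 11 / 3])   # mean of C, D, E; rounds to (1.7, 3.7)
for name, p in pts.items():
    d1, d2 = np.linalg.norm(p - c1), np.linalg.norm(p - c2)
    print(f"{name}: d1 = {d1:.1f}, d2 = {d2:.1f} -> {'C1' if d1 < d2 else 'C2'}")
```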
Summary
• The K-means algorithm is a simple yet popular method for cluster analysis, but it fails for non-linear or complex data.
• The K-means algorithm is sensitive to outliers!
– An object with an extremely large value may substantially distort the distribution of the data.
• There are other limitations – there is still a need to reduce the cost of calculating distances to centroids.
• Its performance is determined by the initialisation and an appropriate distance measure.
• There are several variants of K-means that overcome its weaknesses:
– Kernel K-means
– K-Medoids or PAM (partitioning around medoids): resistant to noise and/or outliers
– K-Modes: extension to categorical data clustering
– CLARA: extension to deal with large data sets
– Mixture models (EM algorithm): handle uncertainty of cluster membership
SOM
Books:
Neural Networks by Simon Haykin
Fundamentals of Neural Networks by Laurene Fausett
SOM model
• The Self-Organizing Map (SOM) was introduced by Teuvo Kohonen in 1982.
• The SOM (also known as the Kohonen feature map) algorithm is one of the best-known artificial neural network algorithms.
• In contrast to many other neural networks that use supervised learning, the SOM is based on unsupervised learning.
• With the SOM algorithm, Teuvo Kohonen, a professor of the Academy of Finland, provided a way of representing multidimensional data in much lower-dimensional spaces – usually one or two dimensions.
SOM
• The SOM has been proven useful in many applications.
SOM
• It provides a topology-preserving mapping from the high-dimensional input space to the map units.
• The property of topology preservation means that the mapping preserves the relative distances between points.
– Points that are near each other in the input space
are mapped to nearby map units in the SOM.
– The SOM can thus serve as a cluster analyzing tool
of high-dimensional data.
– Also, the SOM has the capability to generalize.
SOM
• Generalization capability means that the
network can recognize or characterize inputs it
has never encountered before.
Competitive learning
• With backpropagation, when we applied a net trained to classify the input signal into one of the output categories A, B, C, …, Z, the net sometimes responded that the signal was both C and K, or both E and K.
• In such situations, where we know only one of several neurons should respond, we can include additional structure in the network so that the net is forced to decide which one unit will respond.
• The mechanism by which this is achieved is called competition.
• The most extreme form of competition among a group of neurons is called Winner-Take-All.
Competitive learning
• In competitive learning, neurons compete among
themselves to be activated.
• While in Hebbian learning, several output neurons
can be activated simultaneously, in competitive
learning, only a single output neuron is active at
any time.
• The output neuron that wins the competition is
called the winner-takes-all neuron.
SOM Overview
• SOM is based on three principles:
– Competition: each neuron calculates a discriminant function. The neuron with the highest value is declared the winner.
– Cooperation: neurons nearby the winner on the lattice get a chance to adapt.
– Adaptation: the winner and its neighbors increase their discriminant function values relative to the current input.
• Subsequent presentations of the current input should then result in an enhanced function value.
• Redundancy in the input is needed!
Architecture example
Feature map
• Consider 3-D input data with red, green, and blue (RGB) values for a particular color.
• For this dataset, a good mapping would place the red, green, and blue colors far away from one another and place the intermediate colors between their base colors.
– E.g., yellow should be mapped close to red and green.
– E.g., teal should be mapped close to green and blue.
Feature map
• Map nodes are not connected to each other.
Network Architecture
Algorithm
• Initialize weights
• For 0 to X training epochs:
– Select a sample from the input data set.
– Find the "winning" neuron for the sample input.
– Adjust the weights of the winning and nearby neurons.
• End for loop
This moves the weight vectors of the winning unit and its neighbours towards the presented input vector, so that similar inputs end up mapped to nearby units in the feature map; a sketch follows below.
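A minimal sketch of this training loop (a 2-D lattice, Euclidean winner selection, and a Gaussian neighbourhood; the map size, epoch count, and decay constants are illustrative assumptions):

```python
import numpy as np

def train_som(X, rows=10, cols=10, epochs=1000,
              eta0=0.1, sigma0=5.0, tau=1000.0, seed=0):
    """Train a SOM on data X (n_samples x n_features)."""
    rng = np.random.default_rng(seed)
    # One weight vector per map unit; lattice coordinates for neighbourhoods.
    W = rng.random((rows * cols, X.shape[1]))
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for n in range(epochs):
        eta = eta0 * np.exp(-n / tau)        # learning rate decays over time
        sigma = sigma0 * np.exp(-n / tau)    # neighbourhood width shrinks
        x = X[rng.integers(len(X))]          # select a sample from the data set
        # Competition: the winner is the unit whose weights are closest to x.
        i = np.argmin(np.linalg.norm(W - x, axis=1))
        # Cooperation: Gaussian neighbourhood around the winner on the lattice.
        d2 = np.sum((grid - grid[i]) ** 2, axis=1)
        h = np.exp(-d2 / (2 * sigma ** 2))
        # Adaptation (Eq. 9.13): w_j(n+1) = w_j(n) + eta * h_{j,i(x)} * (x - w_j(n)).
        W += eta * h[:, None] * (x - W)
    return W.reshape(rows, cols, -1)
```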
Cooperative Process
• The winning neuron locates the center of a topological
neighborhood of cooperating neurons.
Topographic Map
• This form of map, known as a topographic map, has two important properties:
– At each stage of representation, or processing, each piece of incoming information is kept in its proper context (neighbourhood).
– Neurons dealing with closely related pieces of information in the input space are kept close together in the topographic map, so that they can interact via short synaptic connections.
• "The spatial location of an output neuron in a topographic map corresponds to a particular domain or feature drawn from the input space."
$w_j(n+1) = w_j(n) + \eta(n)\, h_{j,i(x)}(n)\, [x - w_j(n)]$   (Eq. 9.13)
• This weight update is applied to all neurons in the lattice that lie inside the topological neighborhood of the winning neuron i.
Finding Neighbors
• The neurons close to the winning neuron are called its neighbors.
• Determining a neuron's neighbors can be achieved with
– concentric squares, hexagons, and other polygonal shapes, as well as Gaussian functions, Mexican-hat functions, etc.
• Generally, the neighborhood function is designed to have a global maximum at the winning neuron and to decrease with distance from it.
• As a result,
– neurons close to the "winning" neuron are scaled towards the sample input the most,
– while neurons far away are scaled the least; this creates groupings of similar neurons in the final map and is called the cooperative process.
Topological neighbourhood
• Let $d_{j,i}$ denote the lateral distance (measured in the output space, rather than by some distance measure in the original input space) between the winning neuron i and an excited neuron j.
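A typical choice, following Haykin, is the Gaussian neighbourhood $h_{j,i(x)}(n) = \exp(-d_{j,i}^2 / 2\sigma^2(n))$, whose width $\sigma(n) = \sigma_0 \exp(-n/\tau_1)$ shrinks as training proceeds. A minimal sketch (the constants $\sigma_0$ and $\tau_1$ are illustrative assumptions):

```python
import numpy as np

def neighborhood(d2, n, sigma0=5.0, tau1=1000.0):
    """Gaussian neighbourhood h_{j,i(x)}(n) for squared lateral distances d2."""
    sigma = sigma0 * np.exp(-n / tau1)   # width shrinks with iteration n
    return np.exp(-d2 / (2 * sigma ** 2))
```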
Gaussian Function
(figure: the Gaussian neighbourhood function plotted for n = 0, 1, 2, 3, its width shrinking at successive iterations)
Gaussian Distribution
(figures: a bean machine, in which balls dropped through rows of pins pile up in a Gaussian shape; plots of 1-D and 2-D Gaussians. From Wikipedia and http://home.dei.polimi.it)
Adaptation in SOM
• For the network to be self-organizing (adapting), the synaptic weight vector $w_j$ of neuron j in the network is required to change in relation to the input vector x.
Adaptation in SOM
• Equation 9.13 has the effect of moving the synaptic weight vector $w_i$ of the winning neuron i towards the input vector x.
• Upon repeated presentations of the training data, the synaptic weight vectors tend to follow the distribution of the input vectors, due to the neighborhood updating.
• The algorithm therefore leads to a topological ordering of the feature map in the input space, in the sense that neurons that are adjacent in the lattice will tend to have similar weight vectors.
Convergence in SOM
• As a general rule, the number of iterations constituting the convergence phase must be at least 500 times the number of neurons in the network; e.g., a 10 × 10 map (100 neurons) needs at least 50,000 iterations in this phase.
Cross-validation
• Cross-validation method:
– Divide a given data set into m parts.
– Use m − 1 parts to obtain a clustering model, i.e., hold out 1 part.
– Use the remaining part to test the quality of the clustering.
• E.g.:
I. For each point in the test set, find the closest centroid, and
II. use the sum of squared distances between all points in the test set and their closest centroids to measure how well the model fits the test set.
– For each candidate k > 0, repeat this m times, compare the overall quality measure w.r.t. different k's, and find the number of clusters that fits the data best (see the sketch below).
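A sketch of this procedure, reusing the kmeans() function from the K-means section above (the fold count and scoring details are illustrative assumptions):

```python
import numpy as np

def cv_score_for_k(X, k, m=5, seed=0):
    """m-fold cross-validated sum of squared distances for a given k."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), m)
    total = 0.0
    for f in range(m):
        test = folds[f]
        train = np.concatenate([folds[g] for g in range(m) if g != f])
        _, centroids = kmeans(X[train], k)   # fit the model on m - 1 parts
        # Squared distance from each test point to its closest centroid.
        d = np.linalg.norm(X[test][:, None, :] - centroids[None, :, :], axis=2)
        total += (d.min(axis=1) ** 2).sum()
    return total  # compare across candidate k's; smaller is better

# best_k = min(range(2, 10), key=lambda k: cv_score_for_k(X, k))
```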
BetaCV
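BetaCV is commonly defined (e.g., in Zaki and Meira's Data Mining and Analysis) as the ratio of the mean intra-cluster distance to the mean inter-cluster distance, with smaller values indicating better clustering. A sketch under that definition (an assumption; the slide's own formula is not reproduced here):

```python
import numpy as np

def beta_cv(X, labels):
    """BetaCV = mean intra-cluster distance / mean inter-cluster distance."""
    labels = np.asarray(labels)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    same = labels[:, None] == labels[None, :]                  # same-cluster mask
    iu = np.triu_indices(len(X), k=1)                          # count each pair once
    intra, inter = d[iu][same[iu]], d[iu][~same[iu]]
    return intra.mean() / inter.mean()                         # smaller is better
```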
Silhouette Coefficient
• The Silhouette Coefficient, or silhouette score, is a metric used to evaluate the goodness of a clustering technique. Its value ranges from −1 to 1.
1: clusters are well apart from each other and clearly distinguished.
0: clusters are indifferent, i.e., the distance between clusters is not significant.
−1: points have been assigned to the wrong clusters.
• Silhouette score = (b − a) / max(a, b), computed per point and averaged, where
– a = average intra-cluster distance, i.e., the average distance from the point to the other points in its own cluster, and
– b = average nearest-cluster distance, i.e., the average distance from the point to the points in the nearest other cluster.
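As a usage example with scikit-learn (assuming it is available; the calls below are from sklearn's public API):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.array([[1, 1], [1, 0], [0, 2], [2, 4], [3, 5]], dtype=float)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(silhouette_score(X, labels))  # closer to 1 = better-separated clusters
```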
• http://docs.unigrafia.fi/publications/kohonen_teuvo/MATLAB_implementations_and_applications_of_the_self_organizing_map.pdf
• https://in.mathworks.com/help/nnet/examples/iris-clustering.html
Applications of SOM
• The most important practical applications of SOMs are in:
– exploratory data analysis,
– pattern recognition,
– speech analysis,
– robotics,
– industrial and medical diagnostics,
– instrumentation and control,
– environmental protection
SOM
• The major disadvantage of the SOM is that it requires sufficient, representative data in order to develop meaningful clusters.