CI-10 Networks Based On Competition Learning - Clustering - K-means and SOM
Data Clustering
Clustering is the classification of objects into different groups:
the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure
Clustering is unsupervised classification
Input: a set of unlabeled examples (input vectors)
Task: cluster (group) the inputs into clusters
Output: cluster labels
Applications of Clustering
Imaging and Segmentation
3D brain imaging data segmented into three
tissue types (White Matter, Gray Matter and
Cerebrospinal Fluid)
Applications of Clustering
Gene Clustering
Market Surveys
Market researchers use
cluster analysis to
partition the general
population of consumers
into market segments
and to better understand
the relationships
between different groups
of consumers/potential
customers.
Applications of Clustering
Social Network Analysis
Applications of Clustering
Data Mining
Document Clustering on the
WWW
http://websom.hut.fi/websom/
Search Result Grouping
Image Clustering
Applications of Clustering
Land Use Analysis
Signal Analysis and Disputed Territory
Requirements for Clustering Algorithms
Scalability
Dealing with Different Types of Attributes
Discovery of Clusters with arbitrary shape
Minimal requirement for domain knowledge to determine input parameters
Robustness and noise tolerance
Insensitivity to the order of input records
Ability to handle high-dimensional data
Interpretability and usability
How to define Similarity or Difference?
Clustering is essentially based upon finding the
similarity/difference between given data points
Use of Distance Measures
Euclidean distance (2-norm): $d(p,q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$
Manhattan distance (1-norm): $d(p,q) = \sum_{i=1}^{n} |p_i - q_i|$
Minkowski distance: $d(p,q) = \left( \sum_{i=1}^{n} |p_i - q_i|^{m} \right)^{1/m}$
Mahalanobis distance: $d(p,q) = \sqrt{(p-q)^{T} P^{-1} (p-q)}$, where $P$ is the covariance matrix
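A minimal sketch of these distance measures with NumPy (the function names are illustrative, not part of the slides; for the Mahalanobis distance, $P$ would typically be estimated from the full data set, e.g. with np.cov):

```python
import numpy as np

def euclidean(p, q):
    # 2-norm: sqrt(sum_i (p_i - q_i)^2)
    return np.sqrt(np.sum((p - q) ** 2))

def manhattan(p, q):
    # 1-norm: sum_i |p_i - q_i|
    return np.sum(np.abs(p - q))

def minkowski(p, q, m=3):
    # (sum_i |p_i - q_i|^m)^(1/m); m=2 gives Euclidean, m=1 gives Manhattan
    return np.sum(np.abs(p - q) ** m) ** (1.0 / m)

def mahalanobis(p, q, P):
    # sqrt((p-q)^T P^{-1} (p-q)), with P the covariance matrix of the data
    d = p - q
    return np.sqrt(d @ np.linalg.inv(P) @ d)
```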
Quality of Clustering
A good clustering method will produce high quality clusters
with
High intra-class similarity (low intra-class variability)
Low inter-class similarity (high inter-class variability)
Measurement of Clustering Quality
As the process is unsupervised, there is no error measure directly specifying the quality of the clustering
Clustering Indices (Comparison of Different Clustering
Techniques)
Davies Bouldin Index
Dunn’s Index
Partition Coefficients
Classification Entropy
Separation Index
Fuzzy Hypervolume
…
Davies-Bouldin Index
A similarity measure $R_{ij}$ between the clusters $C_i$ and $C_j$ is defined based on a measure of dispersion of a cluster $C_i$, denoted $s_i$, and a dissimilarity measure between two clusters, $d_{ij}$
(Figure: three clusters $C_1$, $C_2$, $C_3$ with center distances $d_{12}$, $d_{13}$ and similarity values $R_{12}$, $R_{13}$)
$R_{ij} = \dfrac{s_i + s_j}{d_{ij}}$ (ratio of intra-class to inter-class variability)
$s_i$ = mean square distance from the points in cluster $i$ to the center of cluster $i$ (degree of intra-class variability)
$d_{ij}$ = distance between the centers of clusters $i$ and $j$ (degree of inter-class variability)
$R_i = \max_{j = 1, \dots, n_c,\ j \neq i} R_{ij}$
$DB_{n_c} = \dfrac{1}{n_c} \sum_{i=1}^{n_c} R_i$, where $n_c$ is the total number of clusters
A smaller DBI means better clustering
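A small sketch of the Davies-Bouldin computation following the definitions above (the array shapes and the root-mean-square form of $s_i$ are one common choice, assumed here for illustration):

```python
import numpy as np

def davies_bouldin(X, labels, centers):
    """X: (N, d) data, labels: (N,) cluster index per point, centers: (nc, d)."""
    nc = centers.shape[0]
    # s_i: root-mean-square distance from the points in cluster i to its center
    s = np.array([
        np.sqrt(np.mean(np.sum((X[labels == i] - centers[i]) ** 2, axis=1)))
        for i in range(nc)
    ])
    R = np.zeros(nc)
    for i in range(nc):
        # R_i = max over j != i of (s_i + s_j) / d_ij
        R[i] = max(
            (s[i] + s[j]) / np.linalg.norm(centers[i] - centers[j])
            for j in range(nc) if j != i
        )
    return R.mean()  # smaller DBI means better clustering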
Dunn's Index
$DI = \dfrac{\min_{i \neq j} \delta(C_i, C_j)}{\max_k \Delta(C_k)}$
Numerator: smallest distance between two different clusters (degree of inter-class variability)
Denominator: largest cluster diameter $\Delta(\cdot)$ (degree of intra-class variability)
A large DI means good clustering
(Figure: three clusters $C_1$, $C_2$, $C_3$ with diameters $\Delta(A_1)$, $\Delta(A_2)$, $\Delta(A_3)$)
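A corresponding sketch of Dunn's index, using the minimum point-to-point distance between clusters and the maximum pairwise distance as the diameter (these specific choices are assumptions; other variants of the index exist):

```python
import numpy as np

def dunn_index(X, labels):
    ids = np.unique(labels)
    clusters = [X[labels == k] for k in ids]

    def min_between(A, B):
        # smallest distance between a point of A and a point of B
        return np.min(np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2))

    def diameter(A):
        # largest distance between two points of the same cluster
        return np.max(np.linalg.norm(A[:, None, :] - A[None, :, :], axis=2))

    inter = min(min_between(clusters[i], clusters[j])
                for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
    intra = max(diameter(c) for c in clusters)
    return inter / intra  # larger DI means better clustering
```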
Clustering Using K-Means
One of the most commonly used algorithms for clustering
k-means Clustering
(Figure slides: the algorithm illustrated step by step)
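Since the algorithm itself appears only in the figures here, a minimal k-means sketch in NumPy (random initialization from the data points is an assumed choice; k-means++ or other schemes could be used instead):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # initialize centers with k randomly chosen data points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assignment step: each point joins its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: each center moves to the mean of its assigned points
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels
```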
Use of ANNs for Clustering: SOM
Effective Clustering
Adaptability
Is able to adapt itself to a variety of input data distributions
Topological Ordering
For example, a SOM can be used to divide web blogs into groups and at the same time provide a visualization capability along with navigation, because neighboring nodes of the SOM represent similar communities [Merelo-Guervós et al.]
The same can be said for document clustering (WEBSOM)
Neuro-Biologically Sound
Inspiration for the SOM
Topologically ordered computational mappings in the brain transform the input signal into a place-coded probability distribution: the information can be readily accessed and utilized
Building concepts for the SOM
Neurobiological studies indicate that different sensory
inputs are mapped onto corresponding areas of cerebral
cortex in an orderly fashion
This form of map is known as a topographic map and has
two important properties
At each stage of representation or processing, each
piece of information is kept in its proper
context/neighborhood
Neurons dealing with closely related pieces of
information are kept close together so that they can
interact via short synaptic connections
This gives rise to the principle of topographic map
formation
The spatial location of an output neuron in a
topographic map corresponds to a particular domain or
feature drawn from the input space
Objectives of SOM
To have artificial topographic maps that learn
through self organization in a neuro-biologically
inspired manner
To transform an incoming signal pattern of
arbitrary dimensions into a one or two
dimensional discrete map and perform this
transformation adaptively in a topologically
ordered fashion
See Video!
Architecture
To classify a given set of n-
dimensional features into m
clusters
The input neurons represent
the different components of
the input pattern
The output units are organized in the form of a 1D or 2D map
Each output unit represents a
cluster
Weight vector, wj, of a
cluster unit j serves as an
exemplar (representative)
of the input patterns
associated with that cluster
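A minimal sketch of this architecture as a data structure: a 2D output lattice in which each unit j holds a weight vector wj of the same dimension as the input (grid size and initialization range are illustrative assumptions):

```python
import numpy as np

rows, cols, n_features = 10, 10, 3          # 10x10 output map, 3-dimensional inputs
rng = np.random.default_rng(0)
# one weight vector (exemplar) per output unit, initialized with small random values
W = rng.uniform(0.0, 0.1, size=(rows, cols, n_features))
```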
Strategy
The neurons are placed at the nodes of a lattice
that is usually one or two dimensional. Higher
dimensional maps are also possible but not as
common. The neurons become selectively tuned
to various input patterns or classes of input
patterns in the course of a competitive learning
process
The locations of the neurons so tuned (i.e.
winning neurons) become ordered with respect to
each other in such a way that a meaningful
coordinate system for different input features is
created over the lattice
Training
Training:
Train the network such that the weight vector wj associated with output unit Yj becomes the representative vector of the class of input patterns that Yj is to represent
Processes in Training
Initialization
Initialize the weights of the network with small random
values
Competition
For each input pattern each output neuron calculates
how close it is to the input pattern using a Discriminant
or distance function and the neuron closest to the input
pattern is selected as the winner neuron
Cooperation
The winning neuron determines the spatial location of a
topological neighborhood (implemented through a
neighborhood function) of excited neurons thereby
providing a basis for cooperation among neighboring
neurons
Adaptation
Movement of neurons to better fit the data through
weight adjustment
Training…
During the competition phase the winner neuron is
selected on the basis of minimum distance from the input
pattern
$i(x) = \arg\min_{j} \left\| x(n) - w_j \right\|, \qquad j = 1, 2, \dots, l$
Adaptation
$w_j(n+1) = w_j(n) + \eta(n)\, h_{j,i(x)}(n)\, \big( x(n) - w_j(n) \big)$
where $\eta(n)$ is the learning rate and $h_{j,i(x)}(n)$ is the neighborhood function
(Figure: the weight vector $w_j$ moves toward the input $x$ by $\eta(n)\, h_{j,i(x)}(n)\, \big( x(n) - w_j(n) \big)$)
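A sketch of one SOM training step implementing the competition and adaptation equations above, using a Gaussian neighborhood over the 2D lattice (the weight array layout follows the architecture sketch earlier; the choice of Gaussian neighborhood and all parameter values are illustrative):

```python
import numpy as np

def som_step(W, x, eta, sigma):
    """W: (rows, cols, d) weight grid, x: (d,) input, eta: learning rate, sigma: neighborhood width."""
    rows, cols, _ = W.shape
    # competition: winner i(x) = argmin_j ||x - w_j||
    dist = np.linalg.norm(W - x, axis=2)
    wi, wj = np.unravel_index(dist.argmin(), (rows, cols))
    # cooperation: Gaussian neighborhood h_{j,i(x)} based on lattice distance to the winner
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    d2 = (ii - wi) ** 2 + (jj - wj) ** 2
    h = np.exp(-d2 / (2.0 * sigma ** 2))
    # adaptation: w_j <- w_j + eta * h * (x - w_j)
    W += eta * h[:, :, None] * (x - W)
    return W
```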
Neighborhood Function
Models the definition of the neighborhood
Symmetric about the maximum point (at the winning neuron)
Amplitude of the neighborhood decreases with increasing distance from the winning neuron
Hard Neighborhood (rectangular): $h_{j,i(x)}(n) = \begin{cases} 1 & \text{if } d_{j,i} \le r(n) \\ 0 & \text{otherwise} \end{cases}$
Soft Neighborhood (Gaussian, better): $h_{j,i(x)}(n) = \exp\!\left( -\dfrac{d_{j,i}^{2}}{2\sigma^{2}(n)} \right)$
The neighborhood width is decreased over time to preserve topology: $\sigma(n) = \sigma_0 \exp\!\left( -\dfrac{n}{\tau_1} \right)$
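The two neighborhood choices above, written as functions of the lattice distance $d_{j,i}$ (a small illustrative sketch; the function names are not from the slides):

```python
import numpy as np

def hard_neighborhood(d, r):
    # rectangular: 1 inside radius r(n), 0 outside
    return (d <= r).astype(float)

def soft_neighborhood(d, sigma):
    # Gaussian: exp(-d^2 / (2 sigma(n)^2)), decays smoothly with distance
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))
```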
Learning Rate
Upper bound on the amount of update
LR is decreased over time to preserve the
ordering and the topological structure once
established
We can use exponential decrease or linear
decrease
Exponential decrease: $\eta(n) = \eta_0 \exp\!\left( -\dfrac{n}{\tau_2} \right)$, e.g. $\eta_0 = 0.1$, $\tau_2 = 1000$
(Figure: $\eta(n)$ decaying from 0.1 toward 0 over 5000 iterations)
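The decay schedules for the neighborhood width and the learning rate as small helper functions (eta_0 = 0.1 and tau_2 = 1000 come from this slide; the sigma_0 and tau_1 values are illustrative assumptions):

```python
import numpy as np

def sigma_schedule(n, sigma0=5.0, tau1=1000.0):
    # sigma(n) = sigma_0 * exp(-n / tau_1): the neighborhood shrinks over time
    return sigma0 * np.exp(-n / tau1)

def eta_schedule(n, eta0=0.1, tau2=1000.0):
    # eta(n) = eta_0 * exp(-n / tau_2): the learning rate decays from 0.1 toward 0
    return eta0 * np.exp(-n / tau2)
```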
Different Topologies
Phases of the Adaptive Process
Self Organizing (Ordering)
May take ~1000 epochs
LR begins close to 0.1 and remains above 0.01
Neighborhood function should include almost
all neurons in the neighborhood
Convergence phase
LR is kept small (~0.01)
Neighborhood function should include only
close neighbors of the winning neuron
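Putting the pieces together, a sketch of a full training run with an ordering phase (wide neighborhood, learning rate near 0.1) followed by a convergence phase (small neighborhood, learning rate around 0.01); it reuses the som_step and schedule sketches above, and the grid size and epoch counts are illustrative:

```python
import numpy as np

def train_som(X, rows=10, cols=10, ordering_epochs=1000, convergence_epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.0, 0.1, size=(rows, cols, X.shape[1]))
    # ordering phase: decaying learning rate and a neighborhood covering much of the map
    for n in range(ordering_epochs):
        x = X[rng.integers(len(X))]
        W = som_step(W, x, eta=eta_schedule(n), sigma=max(sigma_schedule(n), 1.0))
    # convergence phase: small fixed learning rate, only close neighbors of the winner updated
    for n in range(convergence_epochs):
        x = X[rng.integers(len(X))]
        W = som_step(W, x, eta=0.01, sigma=1.0)
    return W
```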
Examples
Properties of SOM
(Figure slides)
Property-3 Density Matching
The feature map generated by the SOM reflects variations in the statistics of the input distribution
If $f_X(x)$ is the distribution of the input data and $m(x)$ is the map magnification factor, defined as the number of neurons in a small volume $dx$ of the input space, then
$\int f_X(x)\, dx = 1 \qquad \int m(x)\, dx = l$
For the SOM to match the input density exactly: $m(x) \propto f_X(x)$
But the SOM tends to over-represent regions of low input density and to under-represent regions of high density
Theoretically: $m(x) \propto f_X^{1/3}(x)$ or $m(x) \propto f_X^{2/3}(x)$
Property-4 Feature Selection
Given data from an input space with a nonlinear
distribution, SOM is able to select a set of best features
for approximating the underlying distribution
A discrete approximation of Principal Curves or
Principal Surfaces