K Mean Clustering
K Mean Clustering
K Mean Clustering
CLUSTERING
INTRODUCTION-
What is clustering?
Therefore, there is no
change in the cluster.
Thus, the algorithm comes
to a halt here and final
result consist of 2 clusters
{1,2} and {3,4,5,6,7}.
PLOT
(with K=3)
Step 1 Step 2
PLOT
Real-Life Numerical Example
of K-Means Clustering
We have 4 medicines as our training data points object
and each medicine has 2 attributes. Each attribute
represents coordinate of the object. We have to
determine which medicines belong to cluster 1 and
which medicines belong to the other cluster.
Attribute1 (X): Attribute 2 (Y): pH
Object
weight index
Medicine A 1 1
Medicine B 2 1
Medicine C 4 3
Medicine D 5 4
Step 1:
Initial value of
centroids : Suppose
we use medicine A and
medicine B as the first
centroids.
Let and c1 and c2
denote the coordinate
of the centroids, then
c1=(1,1) and c2=(2,1)
Objects-Centroids distance : we calculate the
distance between cluster centroid to each object.
Let us use Euclidean distance, then we have
distance matrix at iteration 0 is
Iteration 2, determine
centroids: Now we repeat step
4 to calculate the new centroids
coordinate based on the
clustering of previous iteration.
Group1 and group 2 both has
two members, thus the new
centroids are
and
Iteration-2, Objects-Centroids distances :
Repeat step 2 again, we have new distance
matrix at iteration 2 as
Iteration-2, Objects clustering: Again, we
assign each object based on the minimum
distance.
Dim i As Integer
Dim j As Integer
Dim X As Single
Dim Y As Single
Dim min As Single
Dim cluster As Integer
Dim d As Single
Dim sumXY()
For i = 1 To totalData
min = 10 ^ 10 'big number
X = Data(1, i)
Y = Data(2, i)
For j = 1 To numCluster
d = dist(X, Y, Centroid(1, j), Centroid(2, j))
If d < min Then
min = d
cluster = j
End If
Next j
If Data(0, i) <> cluster Then
Data(0, i) = cluster
isStillMoving = True
End If
Next i
Loop
End If
End Sub
Weaknesses of K-Mean Clustering
1. When the numbers of data are not so many, initial
grouping will determine the cluster significantly.
2. The number of cluster, K, must be determined before
hand. Its disadvantage is that it does not yield the same
result with each run, since the resulting clusters depend
on the initial random assignments.
3. We never know the real cluster, using the same data,
because if it is inputted in a different order it may
produce different cluster if the number of data is few.
4. It is sensitive to initial condition. Different initial condition
may produce different result of cluster. The algorithm
may be trapped in the local optimum.
Applications of K-Mean
Clustering
It is relatively efficient and fast. It computes result
at O(tkn), where n is number of objects or points, k
is number of clusters and t is number of iterations.
k-means clustering can be applied to machine
learning or data mining
Used on acoustic data in speech understanding to
convert waveforms into one of k categories (known
as Vector Quantization or Image Segmentation).
Also used for choosing color palettes on old
fashioned graphical display devices and Image
Quantization.
CONCLUSION
K-means algorithm is useful for undirected
knowledge discovery and is relatively simple.
K-means has found wide spread usage in lot
of fields, ranging from unsupervised learning
of neural network, Pattern recognitions,
Classification analysis, Artificial intelligence,
image processing, machine vision, and many
others.
References
Tutorial - Tutorial with introduction of Clustering Algorithms (k-means, fuzzy-c-means,
hierarchical, mixture of gaussians) + some interactive demos (java applets).
H. Zha, C. Ding, M. Gu, X. He and H.D. Simon. "Spectral Relaxation for K-means
Clustering", Neural Information Processing Systems vol.14 (NIPS 2001). pp. 1057-
1064, Vancouver, Canada. Dec. 2001.
www.wikipedia.com