ADB Ch07 - Data Mining Clustering K-Means

CISI612 December 03, 2022

7. Data Mining:
Clustering K-Means

Dr. Kadan Aljoumaa


Outline
• Introduction
• Types of Clustering
• Common Distance Measures
• K-means Clustering
• How the K-means Clustering Algorithm Works
• A Simple Example Showing the Implementation of the K-means Algorithm
• Weaknesses of K-means Clustering
• Applications of K-means Clustering
• Conclusion

2 Dr. Kadan ALJOUMAA Data Mining: K-Means Clustering


INTRODUCTION - What is clustering?

• Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often according to some defined distance measure.



Clustering types
• Two types of clustering:
  - Hierarchical methods: these find successive clusters using previously established clusters.
    - Agglomerative ("bottom-up"): agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters.
    - Divisive ("top-down"): divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.
  - Partitional methods: partitional algorithms determine all clusters at once. They include:
    - K-means and derivatives
    - Fuzzy c-means clustering
    - QT clustering algorithm

Clustering types - Hierarchical methods
[Figure: hierarchical clustering of five objects a, b, c, d, e. Read left to right (Step 0 to Step 4), the agglomerative direction merges a and b into ab, d and e into de, c and de into cde, and finally ab and cde into abcde. Read right to left (Step 0 to Step 4), the divisive direction splits abcde back into single objects.]



Similarity and Dissimilarity Between Objects
• The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters. Common measures include:
  - Euclidean distance
    $d(x, y) = \left( \sum_i (x_i - y_i)^2 \right)^{1/2}$
  - Chebyshev distance: separates objects by the dimension (attribute) in which they differ most
    $d(x, y) = \max_i |x_i - y_i|$
  - Hamming distance: the number of the m attributes in which x and y differ
    $d(x, y) = \sum_{i=1}^{m} [x_i \neq y_i]$, where $[\cdot]$ is 1 if the condition holds and 0 otherwise
  - Weighted Euclidean distance
    $d(x_i, x_j) = \sqrt{w_1 (x_{i1} - x_{j1})^2 + w_2 (x_{i2} - x_{j2})^2 + \dots + w_r (x_{ir} - x_{jr})^2}$
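The four distance measures can be written as a few lines of Python each. This is a minimal sketch, not part of the original slides: the function names are chosen here for illustration, and each function assumes that x and y are sequences of equal length.

```python
import math

def euclidean(x, y):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def chebyshev(x, y):
    # Largest absolute difference over all dimensions.
    return max(abs(a - b) for a, b in zip(x, y))

def hamming(x, y):
    # Number of positions in which the two sequences differ.
    return sum(a != b for a, b in zip(x, y))

def weighted_euclidean(x, y, w):
    # Euclidean distance with a weight w[i] applied to dimension i.
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y)))

print(euclidean((0, 0), (3, 4)))                   # 5.0
print(chebyshev((1, 5), (4, 1)))                   # 4
print(hamming("10110", "10011"))                   # 2
print(weighted_euclidean((0, 0), (3, 4), (4, 0)))  # 6.0
```

With all weights equal to 1, the weighted version reduces to the plain Euclidean distance.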
K-MEANS CLUSTERING

• The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, where k < n.
• It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that they both attempt to find the centers of natural clusters in the data.
• It assumes that the object attributes form a vector space.



• Simply put, k-means clustering is an algorithm that classifies or groups objects, based on their attributes/features, into K groups.
• K is a positive integer.
• The grouping is done by minimizing the sum of squared distances between the data points and the corresponding cluster centroid.
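The quantity being minimized can be written out explicitly. The notation below is an assumption introduced here, not taken from the slides: $C_j$ denotes the set of points assigned to cluster $j$, and $\mu_j$ denotes its centroid.

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

For a fixed assignment of points to clusters, choosing each $\mu_j$ as the mean of its cluster minimizes $J$, which is why the centroids are recomputed as means.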



How does the K-Means clustering algorithm work?



K-Means Clustering algorithm 1/2
• Step 1: Decide on the value of k, the number of clusters.
• Step 2: Choose any initial partition that classifies the data into k clusters. You may assign the training samples randomly, or systematically as follows:
  1. Take the first k training samples as single-element clusters.
  2. Assign each of the remaining (N-k) training samples to the cluster with the nearest centroid. After each assignment, recompute the centroid of the gaining cluster.



K-Means Clustering algorithm 2/2
• Step 3: Take each sample in sequence and compute its distance from the centroid of each cluster. If a sample is not currently in the cluster with the closest centroid, switch it to that cluster and update the centroids of both the cluster gaining the sample and the cluster losing it.
• Step 4: Repeat step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments.
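Steps 1 to 4 can be sketched in plain Python as follows. This is an illustrative sketch rather than a reference implementation. The function name `sequential_kmeans`, the `max_passes` safety limit, and the rule that a sample never leaves a single-element cluster are assumptions added here, since the slides do not say what to do when a cluster would become empty.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: math.dist(point, centroids[j]))

def sequential_kmeans(data, k, max_passes=100):
    # Step 2.1: the first k samples seed k single-element clusters.
    labels = list(range(k)) + [None] * (len(data) - k)
    members = [[i] for i in range(k)]
    centroids = [tuple(data[i]) for i in range(k)]
    # Step 2.2: each remaining sample joins the nearest cluster, and the
    # centroid of the gaining cluster is recomputed immediately.
    for i in range(k, len(data)):
        j = nearest(data[i], centroids)
        labels[i] = j
        members[j].append(i)
        centroids[j] = centroid([data[m] for m in members[j]])
    # Steps 3-4: sweep over the samples until a full pass moves nothing.
    for _ in range(max_passes):
        changed = False
        for i, point in enumerate(data):
            old, new = labels[i], nearest(point, centroids)
            # Assumption: never empty a cluster (not covered by the slides).
            if new == old or len(members[old]) == 1:
                continue
            members[old].remove(i)
            members[new].append(i)
            labels[i] = new
            centroids[old] = centroid([data[m] for m in members[old]])
            centroids[new] = centroid([data[m] for m in members[new]])
            changed = True
        if not changed:
            break
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
data = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels, centroids = sequential_kmeans(data, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Because the centroids are updated after every single move, the result can depend on the order of the samples, which is one of the weaknesses listed later in the deck.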



Example of k-means algorithm

(using K=2)
Step 1:
Initialization: we randomly choose two centroids (k=2) for the two clusters.
In this case the two centroids are m1=(1.0,1.0) and m2=(5.0,7.0).
Step 2:
• Thus, we obtain two clusters containing individuals {1,2,3} and {4,5,6,7}.
• Their new centroids are:



Step 3:
• Now, using these centroids, we compute the Euclidean distance from each object to both centroids, as shown in the table.
• Therefore, the new clusters are {1,2} and {3,4,5,6,7}.
• The next centroids are m1=(1.25,1.5) and m2=(3.9,5.1).



Step 4:
• The clusters obtained are {1,2} and {3,4,5,6,7}.
• Therefore, there is no change in the clusters.
• Thus, the algorithm halts here, and the final result consists of 2 clusters: {1,2} and {3,4,5,6,7}.
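The four steps of the example can be replayed in a short script. The data table of the slides did not survive extraction, so the seven points below are an ASSUMPTION: they are a commonly used textbook data set that is consistent with every value quoted above (the seeds m1 and m2 are individuals 1 and 4, and the final centroids come out as (1.25, 1.5) and (3.9, 5.1)). The variable names are likewise illustrative.

```python
import math

# ASSUMED data (not recoverable from the slides): individual id -> (x, y).
points = {1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
          5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5)}

def mean(ids):
    # Centroid of the individuals in `ids` (assumes `ids` is non-empty).
    return tuple(sum(points[i][d] for i in ids) / len(ids) for d in range(2))

def nearest(p, centroids):
    # Index of the centroid with the smallest Euclidean distance to p.
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))

# Steps 1-2: individuals 1 and 4 are the seeds; each remaining individual
# joins the nearest cluster, whose centroid is recomputed right away.
clusters = [[1], [4]]
centroids = [points[1], points[4]]
for i in (2, 3, 5, 6, 7):
    j = nearest(points[i], centroids)
    clusters[j].append(i)
    centroids[j] = mean(clusters[j])
step2_centroids = list(centroids)

# Steps 3-4: reassign all individuals against the current centroids and
# repeat until the partition stops changing.
history = [[sorted(c) for c in clusters]]
while True:
    new = [[], []]
    for i in sorted(points):
        new[nearest(points[i], centroids)].append(i)
    if new == history[-1]:
        break
    history.append(new)
    centroids = [mean(c) for c in new]

for step, partition in enumerate(history, start=2):
    print("Step", step, partition)
print("Final centroids:", centroids)
```

Under the assumed data the script reproduces the partitions {1,2,3}, {4,5,6,7} after step 2 and {1,2}, {3,4,5,6,7} after step 3, and it stops at step 4 because nothing moves.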

K-means Clustering - Example



Weaknesses of K-Means Clustering
1. When the data set is small, the initial grouping largely determines the final clusters.
2. The number of clusters, K, must be chosen beforehand. Moreover, the algorithm does not yield the same result on every run, since the resulting clusters depend on the initial random assignments.
3. We never know the true clusters: with a small data set, feeding the same data in a different order may produce different clusters.
4. It is sensitive to initial conditions. Different initial conditions may produce different clusterings, and the algorithm may become trapped in a local optimum.
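The sensitivity to initial conditions can be demonstrated on a tiny example. The sketch below uses the batch form of k-means (reassign all points, then recompute all centroids), which is a simplification of the sample-by-sample procedure in the slides. The four points and both sets of starting centroids are hypothetical values chosen so that one start converges to a poor local optimum.

```python
import math

def batch_kmeans(points, centroids, max_iter=100):
    """Batch k-means from given starting centroids; returns (centroids, SSE)."""
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)),
                    key=lambda c: math.dist(p, centroids[c]))
            groups[j].append(p)
        # Keep the old centroid if a cluster received no points.
        new = [tuple(sum(v) / len(g) for v in zip(*g)) if g else c
               for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    # Sum of squared distances from each point to its nearest centroid.
    sse = sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, sse

# Hypothetical data: the corners of a wide 4 x 1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

good = batch_kmeans(points, [(0.0, 0.0), (4.0, 1.0)])
bad = batch_kmeans(points, [(2.0, 0.0), (2.0, 1.0)])
print(good)  # ([(0.0, 0.5), (4.0, 0.5)], 1.0)
print(bad)   # ([(2.0, 0.0), (2.0, 1.0)], 16.0)
```

Both runs have converged, since no point changes cluster, yet the second partition (top row versus bottom row) has sixteen times the squared error of the first. A common remedy is to run the algorithm from several starting points and keep the result with the lowest error.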



Applications of K-Means Clustering

• It is relatively efficient and fast: it runs in O(tkn) time, where n is the number of objects or points, k is the number of clusters, and t is the number of iterations.
• K-means clustering can be applied in machine learning and data mining.
• It is used on acoustic data in speech understanding to convert waveforms into one of k categories (known as vector quantization). It is also used for image segmentation.
• It is also used for choosing color palettes on old-fashioned graphical display devices and for image quantization.



CONCLUSION

• The K-means algorithm is useful for undirected knowledge discovery and is relatively simple.
• K-means has found widespread use in many fields, including unsupervised learning of neural networks, pattern recognition, classification analysis, artificial intelligence, image processing, machine vision, and many others.
