K Mean Clustering

The K-means clustering algorithm is used to partition n observations into k clusters where each observation belongs to the cluster with the nearest mean. It works by iteratively assigning datapoints to centroids and updating the centroid positions until convergence is reached. The algorithm starts by randomly initializing k centroids and then calculates the distance between each datapoint and centroid, assigning each datapoint to its closest centroid. It then recalculates the positions of the k centroids as the means of the datapoints in each cluster and repeats this process until the centroids no longer move.

Uploaded by

Rexline S J

K-MEANS CLUSTERING

INTRODUCTION
What is clustering?

• Clustering is the classification of objects into different groups or, more precisely, the partitioning of a data set into subsets (clusters) so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure.
• The output image shows five clusters, each in a different color.
• The clusters are formed over two parameters of the dataset: the customer's annual income and spending.
• The colors and labels can be changed as required.
• Cluster 1 contains customers with average income and average spending.
• Cluster 2 contains customers with high income but low spending; they can be categorized as careful.
• Cluster 3 contains customers with low income and low spending; they can be categorized as sensible.
• Cluster 4 contains customers with low income and very high spending; they can be categorized as careless.
• Cluster 5 contains customers with high income and high spending; they can be categorized as target customers, and they are likely the most profitable customers for the mall owner.
How does the K-Means Algorithm Work?

• Step 1: Choose the number of clusters, K.
• Step 2: Select K random points as the initial centroids. (They need not be points from the input dataset.)
• Step 3: Assign each data point to its closest centroid; this forms the K clusters.
• Step 4: Compute the mean of the points in each cluster and place that cluster's new centroid there.
• Step 5: Repeat step 3, that is, reassign each data point to the new closest centroid.
• Step 6: If any reassignment occurred, go to step 4; otherwise, finish.
• Step 7: The model is ready.
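
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the `seed` and `max_iter` parameters, and the choice to draw the initial centroids from the data points are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 2: pick k distinct data points as initial centroids (an assumption;
    # the steps above also allow points outside the dataset).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Steps 3 and 5: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Step 6: stop when no point changed cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Two well-separated pairs of points should end up in two clusters.
centroids, labels = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
```

Keeping the old centroid when a cluster becomes empty is one simple convention; other implementations re-seed the empty cluster instead.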
K-MEANS CLUSTERING
• The k-means algorithm clusters n objects, based on their attributes, into k partitions, where k < n.
• It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that both attempt to find the centers of natural clusters in the data.
• It assumes that the object attributes form a vector space.
• It partitions (clusters) N data points into K disjoint subsets S_j so as to minimize the sum-of-squares criterion

\[ J = \sum_{j=1}^{K} \sum_{n \in S_j} \lVert x_n - \mu_j \rVert^2 \]

where x_n is a vector representing the nth data point and \mu_j is the geometric centroid of the data points in S_j.
• Simply put, k-means clustering is an algorithm that classifies or groups objects, based on their attributes/features, into K groups.
• K is a positive integer.
• The grouping is done by minimizing the sum of squared distances between the data points and their corresponding cluster centroids.
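
As a small illustration of the sum-of-squares criterion, the sketch below evaluates it for a given assignment. The function name `sum_of_squares` is hypothetical; the four points are the medicine data from the numerical example later in these slides, and the two candidate groupings are the ones that appear in that example.

```python
def sum_of_squares(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for p, j in zip(points, labels):
        total += sum((a - b) ** 2 for a, b in zip(p, centroids[j]))
    return total


points = [(1, 1), (2, 1), (4, 3), (5, 4)]  # medicines A, B, C, D

# Final grouping of the example: {A, B} and {C, D}.
j_final = sum_of_squares(points, [(1.5, 1.0), (4.5, 3.5)], [0, 0, 1, 1])

# Grouping after the first assignment: {A} and {B, C, D}.
j_first = sum_of_squares(points, [(1.0, 1.0), (11 / 3, 8 / 3)], [0, 1, 1, 1])
```

The final grouping gives a criterion value of 1.5, against roughly 9.33 for the first grouping, which is the sense in which the iterations improve the clustering.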
How does the K-Means Clustering algorithm work?
• Step 1: Decide on the value of k, the number of clusters.
• Step 2: Choose any initial partition that classifies the data into k clusters. You may assign the training samples randomly, or systematically as follows:
  1. Take the first k training samples as single-element clusters.
  2. Assign each of the remaining (N-k) training samples to the cluster with the nearest centroid. After each assignment, recompute the centroid of the gaining cluster.
• Step 3: Take each sample in sequence and compute its distance from the centroid of each of the clusters. If a sample is not currently in the cluster with the closest centroid, switch it to that cluster and update the centroids of both the cluster gaining the sample and the cluster losing it.
• Step 4: Repeat step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments.
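
This sequential variant, where centroids are updated after every single switch, can be sketched as follows. The names `mean` and `sequential_kmeans`, the `max_passes` cap, and the guard that refuses to empty a cluster are assumptions added for the example; they are not part of the steps above.

```python
import math


def mean(members):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(members) for c in zip(*members))


def sequential_kmeans(samples, k, max_passes=100):
    """Sequential-update k-means; returns (centroids, labels)."""
    # Step 2.1: the first k samples start as single-element clusters.
    clusters = [[i] for i in range(k)]
    labels = list(range(k)) + [None] * (len(samples) - k)
    centroids = [tuple(samples[i]) for i in range(k)]

    def nearest(p):
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    # Step 2.2: assign each remaining sample, updating the gaining centroid.
    for i in range(k, len(samples)):
        j = nearest(samples[i])
        clusters[j].append(i)
        labels[i] = j
        centroids[j] = mean([samples[m] for m in clusters[j]])

    # Steps 3-4: sweep the samples until a full pass makes no switch.
    for _ in range(max_passes):
        switched = False
        for i, p in enumerate(samples):
            j, old = nearest(p), labels[i]
            # Never empty a cluster (a guard added here, not part of the steps).
            if j == old or len(clusters[old]) == 1:
                continue
            clusters[old].remove(i)
            clusters[j].append(i)
            labels[i] = j
            centroids[old] = mean([samples[m] for m in clusters[old]])
            centroids[j] = mean([samples[m] for m in clusters[j]])
            switched = True
        if not switched:
            break
    return centroids, labels


centroids, labels = sequential_kmeans([(1, 1), (2, 1), (4, 3), (5, 4)], k=2)
```

Because centroids move after each switch, the result can depend on the order of the samples, unlike the batch version that reassigns all points before recomputing centroids.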
A simple example showing the implementation of the k-means algorithm (using K=2)
Step 1:
Initialization: We randomly choose the following two centroids (k=2) for the two clusters.
In this case the two centroids are m1=(1.0,1.0) and m2=(5.0,7.0).
Step 2:
• Thus, we obtain two clusters containing {1,2,3} and {4,5,6,7}.
• The new centroids are the means of these two clusters.
Step 3:
• Now, using these centroids, we compute the Euclidean distance of each object, as shown in the table.
• Therefore, the new clusters are {1,2} and {3,4,5,6,7}.
• The next centroids are m1=(1.25,1.5) and m2=(3.9,5.1).
Step 4:
• The clusters obtained are {1,2} and {3,4,5,6,7}.
• Therefore, there is no change in the clusters.
• Thus, the algorithm halts here, and the final result consists of the two clusters {1,2} and {3,4,5,6,7}.
[Plot of the K=2 example; plots for K=3: Step 1, Step 2]
Real-Life Numerical Example of K-Means Clustering
We have 4 medicines as our training data points, and each medicine has 2 attributes. Each attribute represents a coordinate of the object. We have to determine which medicines belong to cluster 1 and which belong to cluster 2.

Object        Attribute 1 (X): weight index    Attribute 2 (Y): pH
Medicine A    1                                1
Medicine B    2                                1
Medicine C    4                                3
Medicine D    5                                4
Step 1:
• Initial value of centroids: Suppose we use medicine A and medicine B as the first centroids.
• Let c1 and c2 denote the coordinates of the centroids; then c1=(1,1) and c2=(2,1).
• Objects-centroids distance: We calculate the distance from each cluster centroid to each object. Using Euclidean distance, the distance matrix at iteration 0 is

\[ D^0 = \begin{bmatrix} 0 & 1 & 3.61 & 5 \\ 1 & 0 & 2.83 & 4.24 \end{bmatrix} \]

• Each column in the distance matrix corresponds to an object (A, B, C, D).
• The first row of the distance matrix holds the distance of each object to the first centroid, and the second row holds the distance of each object to the second centroid.
• For example, the distance from medicine C = (4, 3) to the first centroid is \sqrt{(4-1)^2 + (3-1)^2} = 3.61, and its distance to the second centroid is \sqrt{(4-2)^2 + (3-1)^2} = 2.83.
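
The iteration-0 distances can be checked with a short sketch. The function name `distance_matrix` is hypothetical; the coordinates and the two centroids are those given in Step 1.

```python
import math


def distance_matrix(objects, centroids):
    """Rows are centroids, columns are objects; entries rounded to 2 decimals."""
    return [[round(math.dist(c, p), 2) for p in objects] for c in centroids]


medicines = [(1, 1), (2, 1), (4, 3), (5, 4)]  # A, B, C, D
d0 = distance_matrix(medicines, [(1, 1), (2, 1)])  # c1 = A, c2 = B
for row in d0:
    print(row)
# [0.0, 1.0, 3.61, 5.0]
# [1.0, 0.0, 2.83, 4.24]
```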
Step 2:
• Objects clustering: We assign each object to the group whose centroid is at the minimum distance.
• Medicine A is assigned to group 1, and medicines B, C, and D are assigned to group 2.
• An element of the group matrix below is 1 if and only if the object is assigned to that group.

\[ G^0 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix} \]

• Iteration 1, determine centroids: Group 1 has only one member, so c1 stays at (1,1). Group 2 has three members, so c2 = ((2+4+5)/3, (1+3+4)/3) = (11/3, 8/3), approximately (3.67, 2.67).
• Iteration 1, objects-centroids distances: The next step is to compute the distance of all objects to the new centroids. As in iteration 0, this gives the distance matrix at iteration 1:

\[ D^1 = \begin{bmatrix} 0 & 1 & 3.61 & 5 \\ 3.14 & 2.36 & 0.47 & 1.89 \end{bmatrix} \]

• Iteration 1, objects clustering: Based on the new distance matrix, we move medicine B to group 1 while all the other objects remain where they are. The group matrix is shown below.

\[ G^1 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} \]

• Iteration 2, determine centroids: We again calculate the new centroid coordinates based on the clustering of the previous iteration. Group 1 and group 2 each have two members, so the new centroids are c1 = ((1+2)/2, (1+1)/2) = (1.5, 1) and c2 = ((4+5)/2, (3+4)/2) = (4.5, 3.5).
• Iteration 2, objects-centroids distances: Computing the distances again gives the distance matrix at iteration 2:

\[ D^2 = \begin{bmatrix} 0.5 & 0.5 & 3.20 & 4.61 \\ 4.30 & 3.54 & 0.71 & 0.71 \end{bmatrix} \]

• Iteration 2, objects clustering: Again, we assign each object based on the minimum distance.
• We obtain G^2 = G^1. Comparing the grouping of the last iteration with this one shows that the objects no longer change groups.
• Thus, the k-means computation has reached stability and no more iterations are needed.
We get the final grouping as the result:

Object        Feature 1 (X): weight index    Feature 2 (Y): pH    Group (result)
Medicine A    1                              1                    1
Medicine B    2                              1                    1
Medicine C    4                              3                    2
Medicine D    5                              4                    2
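
The whole medicine example can be replayed with a short sketch. The function name `cluster_medicines` is hypothetical; the data and the initial centroids (medicines A and B) are taken from the example, and groups are reported as 1 and 2 to match the table.

```python
import math


def cluster_medicines():
    """Replay the worked example; returns (grouping per iteration, final centroids)."""
    points = [(1, 1), (2, 1), (4, 3), (5, 4)]  # medicines A, B, C, D
    centroids = [(1, 1), (2, 1)]               # initial c1, c2
    groups, history = None, []
    while True:
        # Assign each object to the group of its nearest centroid.
        new_groups = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_groups == groups:  # grouping unchanged: stable
            break
        groups = new_groups
        history.append([g + 1 for g in groups])
        # Recompute each centroid as the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, g in zip(points, groups) if g == j]
            centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return history, centroids


history, centroids = cluster_medicines()
print(history)    # [[1, 2, 2, 2], [1, 1, 2, 2]]
print(centroids)  # [(1.5, 1.0), (4.5, 3.5)]
```

The second grouping in `history` matches the final table: A and B in group 1, C and D in group 2.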
K-Means Clustering Visual Basic Code

Sub kMeanCluster(Data() As Variant, numCluster As Integer)
    ' Main routine to cluster data into k clusters.
    ' Input:
    '   + Data matrix (0 To 2, 1 To totalData);
    '     row 0 = cluster, 1 = X, 2 = Y; data in columns
    '   + numCluster: number of clusters the user wants
    '   + module-level variables: Centroid, totalData
    ' Output:
    '   o) updated centroids
    '   o) cluster number assigned to the data (= row 0 of Data)

    Dim i As Integer
    Dim j As Integer
    Dim X As Single
    Dim Y As Single
    Dim min As Single
    Dim cluster As Integer
    Dim d As Single
    Dim sumXY()
    Dim isStillMoving As Boolean

    isStillMoving = True
    If totalData <= numCluster Then
        ' Only the last data point is handled here because the routine is designed to be interactive
        Data(0, totalData) = totalData ' cluster number = total data
        Centroid(1, totalData) = Data(1, totalData) ' X
        Centroid(2, totalData) = Data(2, totalData) ' Y
    Else
        ' Calculate the minimum distance to assign the new data point
        min = 10 ^ 10 ' big number
        X = Data(1, totalData)
        Y = Data(2, totalData)
        For i = 1 To numCluster
            ' The source listing is truncated inside this loop; the body below is
            ' an assumed reconstruction that mirrors the assignment loop further down
            d = dist(X, Y, Centroid(1, i), Centroid(2, i))
            If d < min Then
                min = d
                cluster = i
            End If
        Next i
        Data(0, totalData) = cluster

        Do While isStillMoving
            ' This loop stops once no data point changes cluster
            ' Calculate new centroids
            ' 1 = X, 2 = Y, 3 = count of data points
            ReDim sumXY(1 To 3, 1 To numCluster)
            For i = 1 To totalData
                sumXY(1, Data(0, i)) = Data(1, i) + sumXY(1, Data(0, i))
                sumXY(2, Data(0, i)) = Data(2, i) + sumXY(2, Data(0, i))
                sumXY(3, Data(0, i)) = 1 + sumXY(3, Data(0, i))
            Next i
            For i = 1 To numCluster
                ' Skip empty clusters to avoid division by zero
                If sumXY(3, i) > 0 Then
                    Centroid(1, i) = sumXY(1, i) / sumXY(3, i)
                    Centroid(2, i) = sumXY(2, i) / sumXY(3, i)
                End If
            Next i

            ' Assign all data to the new centroids
            isStillMoving = False
            For i = 1 To totalData
                min = 10 ^ 10 ' big number
                X = Data(1, i)
                Y = Data(2, i)
                For j = 1 To numCluster
                    d = dist(X, Y, Centroid(1, j), Centroid(2, j))
                    If d < min Then
                        min = d
                        cluster = j
                    End If
                Next j
                If Data(0, i) <> cluster Then
                    Data(0, i) = cluster
                    isStillMoving = True
                End If
            Next i
        Loop
    End If
End Sub

' Euclidean distance helper; not shown in the source listing, assumed here
Function dist(ByVal X1 As Single, ByVal Y1 As Single, ByVal X2 As Single, ByVal Y2 As Single) As Single
    dist = Sqr((X2 - X1) ^ 2 + (Y2 - Y1) ^ 2)
End Function
Weaknesses of K-Means Clustering
1. When the number of data points is small, the initial grouping strongly determines the resulting clusters.
2. The number of clusters, K, must be determined beforehand. In addition, the algorithm does not yield the same result on every run, since the resulting clusters depend on the initial random assignments.
3. We never know the real clusters: with the same data, a different input order may produce different clusters when the number of data points is small.
4. It is sensitive to initial conditions. Different initial conditions may produce different clusterings, and the algorithm may become trapped in a local optimum.
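
The sensitivity to initial conditions can be seen on a tiny, made-up dataset. In the sketch below, the function name `lloyd`, the four corner points, and both sets of starting centroids are illustrative assumptions; the same batch k-means procedure is run twice from different starts.

```python
import math


def lloyd(points, centroids):
    """Run k-means from the given initial centroids; returns (centroids, sse)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    while True:
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Sum of squared distances of each point to its own centroid.
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, sse


corners = [(0, 0), (0, 1), (4, 0), (4, 1)]  # a 4-by-1 rectangle
good = lloyd(corners, [(0, 0.5), (4, 0.5)])  # converges to a left / right split
poor = lloyd(corners, [(2, 0), (2, 1)])      # converges to a bottom / top split
```

Both runs converge, but the left/right split has a sum of squares of 1 while the bottom/top split is stuck at 16: a local optimum reached purely because of the starting centroids.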
Applications of K-Means Clustering
• It is relatively efficient and fast: it runs in O(tkn) time, where n is the number of objects or points, k is the number of clusters, and t is the number of iterations.
• K-means clustering can be applied in machine learning and data mining.
• It is used on acoustic data in speech understanding to convert waveforms into one of k categories (known as vector quantization), and for image segmentation.
• It is also used for choosing color palettes on old-fashioned graphical display devices and for image quantization.
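
Once centroids are available, vector quantization reduces to replacing each vector with the index of its nearest centroid. The sketch below shows only that lookup step; the function name `quantize`, the three-color palette, and the pixel values are hypothetical stand-ins for a codebook that k-means would produce.

```python
import math


def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda j: math.dist(v, codebook[j]))
        for v in vectors
    ]


# Hypothetical 3-color palette (RGB) standing in for centroids found by k-means.
palette = [(0, 0, 0), (255, 0, 0), (255, 255, 255)]
pixels = [(10, 5, 0), (250, 20, 30), (240, 240, 235), (200, 10, 10)]
indices = quantize(pixels, palette)
print(indices)  # [0, 1, 2, 1]
```

Storing one small index per pixel instead of a full color is what makes a k-color palette useful on display devices with limited color depth.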
CONCLUSION
• The k-means algorithm is useful for undirected knowledge discovery and is relatively simple. It is widely used in many fields, including unsupervised learning in neural networks, pattern recognition, classification analysis, artificial intelligence, image processing, and machine vision.