K Mean Clustering

Hello6

Uploaded by

hello125643

K-MEANS CLUSTERING
INTRODUCTION
What is clustering?

• Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure.
Types of clustering:
1. Hierarchical algorithms: these find successive clusters using previously established clusters.
   a. Agglomerative ("bottom-up"): agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters.
   b. Divisive ("top-down"): divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.
2. Partitional clustering: partitional algorithms determine all clusters at once. They include:
   • K-means and derivatives
   • Fuzzy c-means clustering
   • QT clustering algorithm
Common distance measures:

• The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters.
They include:
1. The Euclidean distance (also called the 2-norm distance), given by: $d(x, y) = \sqrt{\sum_{i} (x_i - y_i)^2}$
2. The Manhattan distance (also called the taxicab norm or 1-norm), given by: $d(x, y) = \sum_{i} |x_i - y_i|$
3. The maximum norm, given by: $d(x, y) = \max_{i} |x_i - y_i|$
4. The Mahalanobis distance, which corrects data for different scales and correlations in the variables.
5. Inner product space: the angle between two vectors can be used as a distance measure when clustering high-dimensional data.
6. The Hamming distance (sometimes called edit distance), which measures the minimum number of substitutions required to change one member into another.
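The measures listed above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, the Mahalanobis distance is left out because it needs a covariance matrix, and the Hamming version assumes two sequences of equal length.

```python
import math

def euclidean(x, y):
    # 2-norm: square root of the summed squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # 1-norm: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))

def maximum_norm(x, y):
    # largest absolute coordinate difference
    return max(abs(a - b) for a, b in zip(x, y))

def angle(x, y):
    # angle (in radians) between two non-zero vectors, from the inner product
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def hamming(s, t):
    # number of positions at which two equal-length sequences differ
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s, t))

print(euclidean((1, 1), (4, 5)), manhattan((1, 1), (4, 5)), maximum_norm((1, 1), (4, 5)))
```

For the same pair of points the three norms generally give different values, which is why the choice of measure changes which elements count as "close" and therefore the shape of the clusters.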
K-MEANS CLUSTERING
• The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, where k < n.
• It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that they both attempt to find the centers of natural clusters in the data.
• It assumes that the object attributes form a vector space.
• It is an algorithm for partitioning (or clustering) N data points into K disjoint subsets $S_j$, each containing some of the data points, so as to minimize the sum-of-squares criterion
$$J = \sum_{j=1}^{K} \sum_{n \in S_j} \lVert x_n - \mu_j \rVert^2$$
where $x_n$ is a vector representing the nth data point and $\mu_j$ is the geometric centroid of the data points in $S_j$.
• Simply speaking, k-means clustering is an algorithm to classify or group objects based on their attributes/features into K groups.
• K is a positive integer.
• The grouping is done by minimizing the sum of squares of distances between the data and the corresponding cluster centroid.
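To make the sum-of-squares criterion concrete, the sketch below evaluates it for a given assignment of points to clusters. The function name `sum_of_squares` and the four sample points are illustrative assumptions, not part of the original slides.

```python
def sum_of_squares(points, labels, k):
    """Sum over clusters of squared distances from each member to its centroid."""
    dims = len(points[0])
    total = 0.0
    for j in range(k):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            continue  # an empty cluster contributes nothing
        # geometric centroid of cluster j
        mu = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        total += sum((p[d] - mu[d]) ** 2 for p in members for d in range(dims))
    return total

# illustrative data: two visually separate pairs of points
points = [(1, 1), (2, 1), (4, 3), (5, 4)]
good = sum_of_squares(points, [0, 0, 1, 1], k=2)
poor = sum_of_squares(points, [0, 1, 1, 1], k=2)
print(good, poor)
```

The partition that keeps nearby points together yields the smaller value, which is the quantity k-means tries to drive down.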
How does the K-Means clustering algorithm work?
• Step 1: Begin with a decision on the value of k = number of clusters.
• Step 2: Choose any initial partition that classifies the data into k clusters. You may assign the training samples randomly, or systematically as follows:
1. Take the first k training samples as single-element clusters.
2. Assign each of the remaining (N-k) training samples to the cluster with the nearest centroid. After each assignment, recompute the centroid of the gaining cluster.
• Step 3: Take each sample in sequence and compute its distance from the centroid of each of the clusters. If a sample is not currently in the cluster with the closest centroid, switch this sample to that cluster and update the centroids of both the cluster gaining the sample and the cluster losing it.
• Step 4: Repeat step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments.
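The four steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the names `sequential_kmeans`, `centroid` and `nearest` are hypothetical, the systematic initialization of Step 2 is used, and a sample is never moved out of a cluster in which it is the only member (a guard added here to avoid empty clusters, not something the steps specify).

```python
import math

def centroid(points):
    # coordinate-wise mean of a non-empty list of points
    return tuple(sum(c) / len(points) for c in zip(*points))

def nearest(x, centroids):
    # index of the closest centroid (first one wins on ties)
    return min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))

def sequential_kmeans(samples, k, max_passes=100):
    # Step 2.1: the first k samples start as single-element clusters
    labels = list(range(k)) + [None] * (len(samples) - k)
    members = [[i] for i in range(k)]
    centroids = [tuple(samples[i]) for i in range(k)]
    # Step 2.2: assign the rest, updating the gaining cluster's centroid each time
    for i in range(k, len(samples)):
        j = nearest(samples[i], centroids)
        labels[i] = j
        members[j].append(i)
        centroids[j] = centroid([samples[m] for m in members[j]])
    # Steps 3 and 4: reassign sample by sample until a full pass changes nothing
    for _ in range(max_passes):
        moved = False
        for i, x in enumerate(samples):
            old, new = labels[i], nearest(x, centroids)
            if new != old and len(members[old]) > 1:  # assumed guard, see lead-in
                members[old].remove(i)
                members[new].append(i)
                labels[i] = new
                centroids[old] = centroid([samples[m] for m in members[old]])
                centroids[new] = centroid([samples[m] for m in members[new]])
                moved = True
        if not moved:
            break
    return labels, centroids

labels, centroids = sequential_kmeans([(1, 1), (2, 1), (4, 3), (5, 4)], k=2)
print(labels, centroids)
```

Because centroids are updated after every single move, the result can depend on the order of the samples, unlike a batch variant that reassigns all samples before recomputing centroids.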
A simple example showing the implementation of the k-means algorithm (using K=2)
Step 1:
Initialization: we randomly choose the following two centroids (k=2) for the two clusters. In this case the two centroids are m1=(1.0,1.0) and m2=(5.0,7.0).
Step 2:
• After the initial assignment of objects to these centroids, we obtain two clusters containing {1,2,3} and {4,5,6,7}.
• Their new centroids are the means of the objects in each cluster.
Step 3:
• Now, using these centroids, we compute the Euclidean distance of each object to each centroid.
[Table: distance of each object to the two centroids]
• Therefore, the new clusters are {1,2} and {3,4,5,6,7}.
• The next centroids are m1=(1.25,1.5) and m2=(3.9,5.1).
Step 4:
• The clusters obtained are {1,2} and {3,4,5,6,7}.
• Therefore, there is no change in the clusters.
• Thus, the algorithm comes to a halt here, and the final result consists of 2 clusters, {1,2} and {3,4,5,6,7}.
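The reassignment in Steps 3 and 4 can be checked with a short script. The coordinates of the seven objects are not reproduced in this text, so the values below are an assumption: they were chosen to agree with the quoted initial centroids (objects 1 and 4) and with the centroids m1=(1.25,1.5) and m2=(3.9,5.1). The helper names are hypothetical.

```python
import math

# ASSUMED coordinates (not given in the text); consistent with the quoted centroids
points = {1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 7.0),
          5: (3.5, 5.0), 6: (4.5, 5.0), 7: (3.5, 4.5)}

def centroid(ids):
    # mean of the listed objects' coordinates
    return (sum(points[i][0] for i in ids) / len(ids),
            sum(points[i][1] for i in ids) / len(ids))

def reassign(centroids):
    # put every object into the cluster of its nearest centroid
    clusters = [[] for _ in centroids]
    for i in sorted(points):
        dists = [math.dist(points[i], c) for c in centroids]
        clusters[dists.index(min(dists))].append(i)
    return clusters

step2 = [[1, 2, 3], [4, 5, 6, 7]]                 # clusters after Step 2
step3 = reassign([centroid(c) for c in step2])    # Step 3 reassignment
m1, m2 = (centroid(c) for c in step3)             # next centroids
step4 = reassign([m1, m2])                        # Step 4: one more pass
print(step3, m1, m2, step4 == step3)
```

Under these assumed coordinates, object 3 is the only one that changes cluster in Step 3, and the extra pass in Step 4 leaves the clusters unchanged.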
[Plot: result of the K=2 example]
[Plots: the same example with K=3, panels "Step 1" and "Step 2", followed by the resulting plot]
Real-life numerical example of K-Means clustering
We have 4 medicines as our training data points, and each medicine has 2 attributes. Each attribute represents a coordinate of the object. We have to determine which medicines belong to cluster 1 and which medicines belong to the other cluster.

Object        Attribute 1 (X): weight index    Attribute 2 (Y): pH
Medicine A    1                                1
Medicine B    2                                1
Medicine C    4                                3
Medicine D    5                                4
Step 1:
• Initial value of centroids: suppose we use medicine A and medicine B as the first centroids.
• Let c1 and c2 denote the coordinates of the centroids; then c1=(1,1) and c2=(2,1).
• Objects-centroids distance: we calculate the distance from each cluster centroid to each object. Using the Euclidean distance, the distance matrix at iteration 0 is
$$D^0 = \begin{bmatrix} 0 & 1 & 3.61 & 5 \\ 1 & 0 & 2.83 & 4.24 \end{bmatrix}$$
• Each column in the distance matrix corresponds to one object (A, B, C, D).
• The first row of the distance matrix contains the distance of each object to the first centroid, and the second row contains the distance of each object to the second centroid.
• For example, the distance from medicine C = (4, 3) to the first centroid is $\sqrt{(4-1)^2 + (3-1)^2} = 3.61$, and its distance to the second centroid is $\sqrt{(4-2)^2 + (3-1)^2} = 2.83$, and so on.
Step 2:
• Objects clustering: we assign each object to the group whose centroid is at minimum distance.
• Medicine A is assigned to group 1, and medicines B, C and D are assigned to group 2.
• An element of the group matrix below is 1 if and only if the object is assigned to that group:
$$G^0 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix}$$
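The iteration-0 distance and group matrices can be recomputed directly from the medicine table and the initial centroids c1=(1,1) and c2=(2,1). The variable names below are illustrative; rows correspond to centroids and columns to medicines A to D.

```python
import math

objects = [(1, 1), (2, 1), (4, 3), (5, 4)]   # medicines A, B, C, D
centroids = [(1, 1), (2, 1)]                 # c1 = A, c2 = B

# distance matrix: one row per centroid, one column per object
dist = [[math.dist(c, p) for p in objects] for c in centroids]

# index of the nearest centroid for each object
nearest = [min(range(len(centroids)), key=lambda g: dist[g][i])
           for i in range(len(objects))]

# group matrix: 1 if and only if the object is assigned to that group
G0 = [[1 if nearest[i] == g else 0 for i in range(len(objects))]
      for g in range(len(centroids))]
D0 = [[round(d, 2) for d in row] for row in dist]

print(D0)
print(G0)
```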
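• Iteration 1, determine centroids: group 1 has only one member, so c1=(1,1) is unchanged; group 2 has three members, so its centroid is their average, $c_2 = \left(\frac{2+4+5}{3}, \frac{1+3+4}{3}\right) = (3.67, 2.67)$.
• Iteration 1, objects-centroids distances: the next step is to compute the distance of all objects to the new centroids. As before, the distance matrix at iteration 1 is
$$D^1 = \begin{bmatrix} 0 & 1 & 3.61 & 5 \\ 3.14 & 2.36 & 0.47 & 1.89 \end{bmatrix}$$
• Iteration 1, objects clustering: based on the new distance matrix, we move medicine B to group 1 while all the other objects remain where they are. The group matrix is
$$G^1 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}$$
• Iteration 2, determine centroids: we now recompute the centroid coordinates based on the clustering of the previous iteration. Group 1 and group 2 both have two members, thus the new centroids are $c_1 = \left(\frac{1+2}{2}, \frac{1+1}{2}\right) = (1.5, 1)$ and $c_2 = \left(\frac{4+5}{2}, \frac{3+4}{2}\right) = (4.5, 3.5)$.
• Iteration 2, objects-centroids distances: computing the distances again, the distance matrix at iteration 2 is
$$D^2 = \begin{bmatrix} 0.5 & 0.5 & 3.20 & 4.61 \\ 4.30 & 3.54 & 0.71 & 0.71 \end{bmatrix}$$
• Iteration 2, objects clustering: again, we assign each object based on the minimum distance.
• We obtain the same group matrix as in iteration 1, $G^2 = G^1$. Comparing the grouping of the last iteration with this iteration reveals that the objects no longer move between groups.
• Thus, the k-means computation has reached stability and no more iterations are needed.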
We get the final grouping as the result:

Object        Feature 1 (X): weight index    Feature 2 (Y): pH    Group (result)
Medicine A    1                              1                    1
Medicine B    2                              1                    1
Medicine C    4                              3                    2
Medicine D    5                              4                    2
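The whole medicine example can be replayed with a small batch loop: assign every object to its nearest centroid, recompute the centroids, and stop when the grouping no longer changes. This is a sketch under assumptions; the function name `kmeans` and its signature are hypothetical, and an empty group simply keeps its previous centroid.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    centroids = list(initial_centroids)
    groups = None
    for _ in range(max_iter):
        # assign each object to the nearest centroid
        new_groups = [min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_groups == groups:
            break  # no object moved: the grouping is stable
        groups = new_groups
        # recompute each centroid as the mean of its members
        for j in range(len(centroids)):
            members = [p for p, g in zip(points, groups) if g == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return groups, centroids

medicines = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
groups, centroids = kmeans(list(medicines.values()), [(1, 1), (2, 1)])
for name, g in zip(medicines, groups):
    print(f"Medicine {name}: group {g + 1}")
print(centroids)
```

Starting from medicines A and B as centroids, the loop stops with A and B in group 1 and C and D in group 2, matching the table above.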
K-Means Clustering Visual Basic Code

Sub kMeanCluster(Data() As Variant, numCluster As Integer)
    ' main function to cluster data into k number of clusters
    ' input:
    '   + Data matrix (0 To 2, 1 To totalData);
    '     row 0 = cluster, 1 = X, 2 = Y; data in columns
    '   + numCluster: number of clusters the user wants the data to be clustered into
    '   + private (module-level) variables: Centroid, totalData
    ' output:
    '   o) update centroid
    '   o) assign cluster number to the Data (= row 0 of Data)

    Dim i As Integer
    Dim j As Integer
    Dim X As Single
    Dim Y As Single
    Dim min As Single
    Dim cluster As Integer
    Dim d As Single
    Dim sumXY()
    Dim isStillMoving As Boolean

    isStillMoving = True
    If totalData <= numCluster Then
        ' only the last data point is put here because it is designed to be interactive
        Data(0, totalData) = totalData ' cluster No = total data
        Centroid(1, totalData) = Data(1, totalData) ' X
        Centroid(2, totalData) = Data(2, totalData) ' Y
    Else
        ' calculate minimum distance to assign the new data point
        min = 10 ^ 10 ' big number
        X = Data(1, totalData)
        Y = Data(2, totalData)
        For i = 1 To numCluster
            ' the body of this loop was cut off in the listing; it is restored here
            ' to mirror the assignment loop further down
            d = dist(X, Y, Centroid(1, i), Centroid(2, i))
            If d < min Then
                min = d
                cluster = i
            End If
        Next i
        Data(0, totalData) = cluster

        Do While isStillMoving
            ' this loop will surely converge
            ' calculate new centroids
            ' 1 = X, 2 = Y, 3 = count of data points
            ReDim sumXY(1 To 3, 1 To numCluster)
            For i = 1 To totalData
                sumXY(1, Data(0, i)) = Data(1, i) + sumXY(1, Data(0, i))
                sumXY(2, Data(0, i)) = Data(2, i) + sumXY(2, Data(0, i))
                sumXY(3, Data(0, i)) = 1 + sumXY(3, Data(0, i))
            Next i
            For i = 1 To numCluster
                ' skip empty clusters to avoid division by zero
                If sumXY(3, i) > 0 Then
                    Centroid(1, i) = sumXY(1, i) / sumXY(3, i)
                    Centroid(2, i) = sumXY(2, i) / sumXY(3, i)
                End If
            Next i

            ' assign all data to the new centroids
            isStillMoving = False
            For i = 1 To totalData
                min = 10 ^ 10 ' big number
                X = Data(1, i)
                Y = Data(2, i)
                For j = 1 To numCluster
                    d = dist(X, Y, Centroid(1, j), Centroid(2, j))
                    If d < min Then
                        min = d
                        cluster = j
                    End If
                Next j
                If Data(0, i) <> cluster Then
                    Data(0, i) = cluster
                    isStillMoving = True
                End If
            Next i
        Loop
    End If
End Sub

' dist is called above but not shown in the listing; this Euclidean version is an assumed helper
Function dist(X1 As Single, Y1 As Single, X2 As Single, Y2 As Single) As Single
    dist = Sqr((X2 - X1) ^ 2 + (Y2 - Y1) ^ 2)
End Function
Weaknesses of K-Means Clustering
1. When there are few data points, the initial grouping strongly determines the resulting clusters.
2. The number of clusters, K, must be determined beforehand. A further disadvantage is that the algorithm does not yield the same result on each run, since the resulting clusters depend on the initial random assignments.
3. We never know the real clusters: with the same data, a different input order may produce different clusters if the number of data points is small.
4. It is sensitive to initial conditions. Different initial conditions may produce different clustering results, and the algorithm may be trapped in a local optimum.
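The sensitivity to initial conditions can be seen on a tiny, made-up data set: four corners of a 4-by-1 rectangle clustered with K=2 from two different sets of starting centroids. Both the data and the helper names (`kmeans`, `sse`) are assumptions for illustration; the loop is a plain batch k-means.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    centroids = list(initial_centroids)
    groups = None
    for _ in range(max_iter):
        new_groups = [min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_groups == groups:
            break  # converged: no assignment changed
        groups = new_groups
        for j in range(len(centroids)):
            members = [p for p, g in zip(points, groups) if g == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return groups, centroids

def sse(points, groups, centroids):
    # sum of squared distances of each point to its assigned centroid
    return sum(math.dist(p, centroids[g]) ** 2 for p, g in zip(points, groups))

corners = [(0, 0), (0, 1), (4, 0), (4, 1)]
left_right = kmeans(corners, [(0, 0.5), (4, 0.5)])   # starts near the short sides
top_bottom = kmeans(corners, [(2, 0), (2, 1)])       # starts on the long sides

print(left_right[0], sse(corners, *left_right))
print(top_bottom[0], sse(corners, *top_bottom))
```

Both runs converge, but the second start settles on a top/bottom split whose sum of squares is far larger than that of the left/right split: a local optimum from which the algorithm does not escape.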
Applications of K-Means Clustering
• It is relatively efficient and fast. It computes the result in O(tkn), where n is the number of objects or points, k is the number of clusters and t is the number of iterations.
• k-means clustering can be applied to machine learning or data mining.
• It is used on acoustic data in speech understanding to convert waveforms into one of k categories (known as vector quantization), and it is also used for image segmentation.
• It is also used for choosing color palettes on old-fashioned graphical display devices and for image quantization.
CONCLUSION
• The K-means algorithm is useful for undirected knowledge discovery and is relatively simple. K-means has found widespread usage in many fields, including unsupervised learning of neural networks, pattern recognition, classification analysis, artificial intelligence, image processing, machine vision, and many others.
