CISI612 December 03, 2022

7. Data Mining:
Clustering K-Means

Dr. Kadan Aljoumaa

Outline
- Introduction
- Types of Clustering
- Common Distance Measures
- K-Means Clustering
- How Does the K-Means Clustering Algorithm Work?
- A Simple Example Showing the Implementation of the K-Means Algorithm
- Weaknesses of K-Means Clustering
- Applications of K-Means Clustering
- Conclusion

2 Dr. Kadan ALJOUMAA Data Mining: K-Means Clustering

INTRODUCTION - What is clustering?

- Clustering is the classification of objects into different groups or, more precisely, the partitioning of a data set into subsets (clusters) so that the data in each subset (ideally) share some common trait, often according to some defined distance measure.


Clustering types
- There are two types of clustering:
  - Hierarchical methods: these find successive clusters using previously established clusters. They include:
    - Agglomerative ("bottom-up"): agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters.
    - Divisive ("top-down"): divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.
  - Partitioning methods: partitional algorithms determine all clusters at once. They include:
    - K-means and its derivatives
    - Fuzzy c-means clustering
    - QT (quality threshold) clustering algorithm
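The agglomerative (bottom-up) procedure can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it uses single linkage (the distance between two clusters is the smallest distance between their members), which is only one of several possible linkage rules and is not specified on the slide, and the one-dimensional positions given to elements a to e are hypothetical values chosen for illustration. With these positions the merges happen in the order ab, de, cde, abcde.

```python
from typing import Dict, List, Optional, Tuple


def agglomerative_single_link(points: Dict[str, float]) -> List[str]:
    """Merge clusters bottom-up and return the name of each new cluster in merge order.

    Assumption: single linkage on 1-D positions (cluster distance = closest pair).
    """
    clusters: List[List[str]] = [[name] for name in points]  # every element starts alone
    merges: List[str] = []
    while len(clusters) > 1:
        best: Optional[Tuple[float, int, int]] = None  # (distance, i, j) of closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(points[a] - points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        merges.append("".join(sorted(merged)))
        # Replace the two merged clusters by their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges


# Hypothetical positions for the five elements a..e.
positions = {"a": 0.0, "b": 1.0, "c": 5.0, "d": 8.0, "e": 9.5}
print(agglomerative_single_link(positions))  # ['ab', 'de', 'cde', 'abcde']
```

A divisive method would traverse the same hierarchy in the opposite direction, starting from abcde and splitting.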


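Clustering types - Hierarchical methods
[Figure: agglomerative clustering runs from Step 0 to Step 4, merging a and b into ab, d and e into de, c with de into cde, and finally ab with cde into abcde; divisive clustering runs the same steps in reverse, from Step 4 (abcde) back to Step 0 (single elements).]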


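Similarity and Dissimilarity Between Objects
- The distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters. Common measures include:
  - Euclidean distance:
    $\mathrm{distance}(x, y) = \left( \sum_{i} (x_i - y_i)^2 \right)^{1/2}$
  - Chebyshev distance, which is dominated by the dimension (attribute) with the largest difference:
    $\mathrm{distance}(x, y) = \max_{i} |x_i - y_i|$
  - Hamming distance (for binary attributes this sum counts the positions in which x and y differ; applied to general numeric attributes, the same sum is the Manhattan distance):
    $\mathrm{distance}(x, y) = \sum_{i=1}^{m} |x_i - y_i|$
  - Weighted Euclidean distance:
    $\mathrm{dist}(x_i, x_j) = \sqrt{w_1 (x_{i1} - x_{j1})^2 + w_2 (x_{i2} - x_{j2})^2 + \cdots + w_r (x_{ir} - x_{jr})^2}$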
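The four distance measures translate directly into Python. This is a sketch: the function names are hypothetical, and the vectors are assumed to have equal length (`zip` silently stops at the shorter one).

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """(sum_i (x_i - y_i)^2)^(1/2)"""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def chebyshev(x: Sequence[float], y: Sequence[float]) -> float:
    """max_i |x_i - y_i|: only the most different attribute counts."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))


def hamming(x: Sequence[float], y: Sequence[float]) -> float:
    """sum_i |x_i - y_i|: the number of differing positions when attributes are 0/1."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


def weighted_euclidean(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """sqrt(w_1 (x_1 - y_1)^2 + ... + w_r (x_r - y_r)^2)"""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


print(euclidean((1.0, 1.0), (4.0, 5.0)))                       # 5.0
print(chebyshev((1.0, 1.0), (4.0, 5.0)))                       # 4.0
print(hamming((1, 0, 1, 1), (0, 0, 1, 0)))                     # 2
print(weighted_euclidean((1.0, 1.0), (4.0, 5.0), (1.0, 0.0)))  # 3.0
```

Setting a weight to zero, as in the last call, removes that attribute from the comparison entirely.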
K-MEANS CLUSTERING

- The k-means algorithm clusters n objects, based on their attributes, into k partitions, where k < n.
- It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that both attempt to find the centers of natural clusters in the data.
- It assumes that the object attributes form a vector space.


- Simply speaking, k-means clustering is an algorithm that classifies or groups objects, based on their attributes/features, into K groups.
- K is a positive integer.
- The grouping is done by minimizing the sum of squared distances between the data points and their corresponding cluster centroids.
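Written as a formula, the quantity being minimized is the within-cluster sum of squared Euclidean distances. The notation is ours, not the slide's: $C_j$ is the set of points currently in cluster $j$ and $m_j$ is that cluster's centroid (mean).

```latex
J = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - m_j \rVert^2,
\qquad
m_j = \frac{1}{|C_j|} \sum_{x \in C_j} x
```

For a fixed assignment of points to clusters, choosing each $m_j$ as the mean of its members minimizes that cluster's contribution to $J$, which is why the algorithm recomputes centroids as means after points are reassigned.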


How does the K-Means clustering algorithm work?


K-Means Clustering algorithm 1/2
- Step 1: Decide on the value of k, the number of clusters.
- Step 2: Choose any initial partition that classifies the data into k clusters. You may assign the training samples randomly, or systematically as follows:
  1. Take the first k training samples as single-element clusters.
  2. Assign each of the remaining (N - k) training samples to the cluster with the nearest centroid. After each assignment, recompute the centroid of the gaining cluster.


K-Means Clustering algorithm 2/2

- Step 3: Take each sample in sequence and compute its distance from the centroid of each cluster. If a sample is not currently in the cluster with the closest centroid, switch it to that cluster and update the centroids of both the cluster gaining the sample and the cluster losing it.
- Step 4: Repeat Step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments.
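Steps 1 to 4 can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: it uses the systematic initialization of Step 2 (the first k samples seed the clusters), squared Euclidean distance as the comparison, and a `max_passes` safety limit that is our own addition. A sample moves only when another centroid is strictly closer, so a single-element cluster (whose centroid is the sample itself) can never be emptied.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; enough for deciding which centroid is closest."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points: List[Sequence[float]]) -> List[float]:
    """Component-wise mean (centroid) of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def sequential_kmeans(data: List[Sequence[float]], k: int, max_passes: int = 100) -> List[int]:
    """Return a cluster label in 0..k-1 for every sample, following Steps 1 to 4."""
    if not 0 < k <= len(data):
        raise ValueError("k must satisfy 0 < k <= number of samples")
    # Step 2.1: the first k samples become single-element clusters.
    labels = list(range(k)) + [-1] * (len(data) - k)
    members = [[i] for i in range(k)]
    cents = [list(data[i]) for i in range(k)]
    # Step 2.2: give each remaining sample to the nearest centroid and
    # recompute the centroid of the gaining cluster after every assignment.
    for i in range(k, len(data)):
        j = min(range(k), key=lambda c: sq_dist(data[i], cents[c]))
        labels[i] = j
        members[j].append(i)
        cents[j] = mean([data[m] for m in members[j]])
    # Steps 3 and 4: sweep the samples in order, moving a sample when another
    # centroid is strictly closer, until a full pass changes nothing.
    for _ in range(max_passes):
        moved = False
        for i, x in enumerate(data):
            old = labels[i]
            new = min(range(k), key=lambda c: sq_dist(x, cents[c]))
            if sq_dist(x, cents[new]) < sq_dist(x, cents[old]):
                members[old].remove(i)
                members[new].append(i)
                labels[i] = new
                # Update both the losing and the gaining cluster.
                cents[old] = mean([data[m] for m in members[old]])
                cents[new] = mean([data[m] for m in members[new]])
                moved = True
        if not moved:
            break
    return labels


print(sequential_kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], 2))  # [0, 0, 1, 1]
```

In this usage example, (0.0, 1.0) starts as a seed of the second cluster, is absorbed into a poorly placed cluster during Step 2, and is switched to the first cluster during the first pass of Step 3.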


Example of k-means algorithm (using K = 2)

Step 1: Initialization. We randomly choose two centroids (k = 2), one for each of the two clusters. In this case the two centroids are m1 = (1.0, 1.0) and m2 = (5.0, 7.0).
Step 2:
- Thus, we obtain two clusters containing the individuals {1, 2, 3} and {4, 5, 6, 7}.
- Their new centroids are:


Step 3:
- Using these centroids, we compute the Euclidean distance of each object, as shown in the table.
- Therefore, the new clusters are {1, 2} and {3, 4, 5, 6, 7}.
- The next centroids are m1 = (1.25, 1.5) and m2 = (3.9, 5.1).


Step 4:
- The clusters obtained are {1, 2} and {3, 4, 5, 6, 7}.
- Therefore, there is no change in the clusters.
- Thus, the algorithm halts here, and the final result consists of two clusters, {1, 2} and {3, 4, 5, 6, 7}.
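The worked example can be replayed with a batch (Lloyd-style) sketch, in which every object is assigned using fixed centroids and the centroids are recomputed afterwards; this differs slightly from the one-sample-at-a-time updates described in Step 3 of the algorithm. The table of the seven individuals did not survive extraction, so the coordinates below are an assumption, chosen to be consistent with the centroids quoted in the example. With these points, individual 3 is exactly equidistant from m1 and m2 in the first assignment, and the sketch breaks ties in favor of the lower-numbered cluster.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (same ordering as the Euclidean distance)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def batch_kmeans(points: List[Point], centroids: List[Point], max_iter: int = 100):
    """Lloyd-style k-means; returns (labels, centroids) once the assignments stop changing."""
    centroids = list(centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign every point to its nearest centroid (ties go to the lower cluster index).
        new_labels = []
        for p in points:
            d = [sq_dist(p, c) for c in centroids]
            new_labels.append(d.index(min(d)))
        if new_labels == labels:
            break  # no change in the clusters: the algorithm halts
        labels = new_labels
        # Recompute each centroid as the mean of its members (keep it if the cluster is empty).
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return labels, centroids


# Assumed coordinates of individuals 1..7 (the original table was lost).
data = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
labels, cents = batch_kmeans(data, [(1.0, 1.0), (5.0, 7.0)])
clusters = [[i + 1 for i, lab in enumerate(labels) if lab == j] for j in range(2)]
print(clusters)  # [[1, 2], [3, 4, 5, 6, 7]]
print(cents)     # [(1.25, 1.5), (3.9, 5.1)]
```

On these assumed points, the first assignment gives {1, 2, 3} and {4, 5, 6, 7}, the second moves individual 3, and the third confirms that nothing changes.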

K-means Clustering - Example


Weaknesses of K-Means Clustering
1. When the number of data points is small, the initial grouping largely determines the resulting clusters.
2. The number of clusters, K, must be determined beforehand. A further disadvantage is that the algorithm does not yield the same result with each run, since the resulting clusters depend on the initial random assignments.
3. We never know the real clusters: the same data, if input in a different order, may produce different clusters when the number of data points is small.
4. It is sensitive to the initial conditions. Different initial conditions may produce different clusters, and the algorithm may be trapped in a local optimum.
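Weakness 4 can be reproduced on a tiny hypothetical data set: four points at the corners of a 4 x 1 rectangle. In this minimal sketch (the batch variant of k-means, chosen here for brevity), two different pairs of initial centroids converge to two different partitions. The left/right split has a sum of squared errors of 1.0, while the bottom/top split is a stable local optimum with a sum of squared errors of 16.0.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def lloyd(points: List[Point], init: List[Point], max_iter: int = 100):
    """Batch k-means from the given initial centroids; returns (labels, centroids)."""
    cents = list(init)
    labels: List[int] = []
    for _ in range(max_iter):
        new = [min(range(len(cents)), key=lambda j: sq_dist(p, cents[j])) for p in points]
        if new == labels:
            break
        labels = new
        for j in range(len(cents)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                cents[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, cents


def sse(points: List[Point], labels: List[int], cents: List[Point]) -> float:
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(sq_dist(p, cents[lab]) for p, lab in zip(points, labels))


corners = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
for init in ([(0.0, 0.5), (4.0, 0.5)], [(2.0, 0.0), (2.0, 1.0)]):
    labels, cents = lloyd(corners, init)
    print(init, "->", labels, sse(corners, labels, cents))
# prints: [(0.0, 0.5), (4.0, 0.5)] -> [0, 0, 1, 1] 1.0
# prints: [(2.0, 0.0), (2.0, 1.0)] -> [0, 1, 0, 1] 16.0
```

Neither run can escape its starting configuration, because in the second run every point is already closest to its own centroid. A common mitigation is to run the algorithm from several initializations and keep the result with the smallest sum of squared errors.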


Applications of K-Means Clustering

- It is relatively efficient and fast: it computes its result in O(tkn) time, where n is the number of objects or points, k is the number of clusters, and t is the number of iterations.
- K-means clustering can be applied in machine learning and data mining.
- It is used on acoustic data in speech understanding to convert waveforms into one of k categories (a technique known as vector quantization), and in image segmentation.
- It is also used for choosing color palettes on old-fashioned graphical display devices, and for image quantization.
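The palette and image-quantization application can be illustrated with a minimal sketch, assuming pixels are given as (R, G, B) tuples and using the batch variant of k-means: the k centroids become the palette, and every pixel is replaced by its nearest palette color. The function name `quantize_colors`, the evenly spaced choice of initial centroids, and the rounding of palette entries to integers are illustrative assumptions, as are the pixel values in the usage example.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def sq_dist(a, b) -> float:
    """Squared Euclidean distance in RGB space."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def quantize_colors(pixels: List[RGB], k: int, max_iter: int = 20) -> Tuple[List[RGB], List[RGB]]:
    """Return (palette, quantized pixels) using k-means on the pixel colors."""
    if not 0 < k <= len(pixels):
        raise ValueError("need 0 < k <= number of pixels")
    # Initial palette: k pixels spread evenly through the input (an arbitrary choice).
    palette = [tuple(map(float, pixels[i * len(pixels) // k])) for i in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        new = [min(range(k), key=lambda j: sq_dist(p, palette[j])) for p in pixels]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:  # keep the old palette entry if no pixel maps to it
                palette[j] = tuple(sum(c) / len(members) for c in zip(*members))
    int_palette = [tuple(int(round(c)) for c in color) for color in palette]
    # Vector quantization step: each pixel is replaced by its palette entry.
    return int_palette, [int_palette[lab] for lab in labels]


pixels = [(250, 0, 0), (240, 10, 10), (0, 0, 250), (10, 10, 240)]
palette, quantized = quantize_colors(pixels, 2)
print(palette)    # [(245, 5, 5), (5, 5, 245)]
print(quantized)  # [(245, 5, 5), (245, 5, 5), (5, 5, 245), (5, 5, 245)]
```

Storing only the palette plus one small index per pixel is what makes this useful for displays limited to k colors.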


CONCLUSION

- The k-means algorithm is useful for undirected knowledge discovery and is relatively simple.
- K-means has found widespread use in many fields, including unsupervised learning of neural networks, pattern recognition, classification analysis, artificial intelligence, image processing, machine vision, and many others.
