IntroML 8: K-means Clustering

The document provides an overview of machine learning concepts, focusing on clustering techniques such as K-means, C-means, and Gaussian Mixture Models (GMM). It outlines the processes involved in K-means clustering, including initialization, distance calculation, and convergence criteria, as well as methods for determining the optimal number of clusters. Additionally, it compares the performance of K-means and C-means, highlighting the advantages and disadvantages of each approach.


Introduction to Machine Learning
(5 ECTS)

Overview lecture

• Clustering
• K-means Clustering (hard clustering)

• C-means Clustering (soft clustering)

• Gaussian Mixture Models (GMM)

Supervised Learning

Applications:

- Image classification

- Natural language processing

- Medical Diagnosis

- …

Classification model: fit on labelled data, then applied to new unlabelled entries to determine their class.

“https://abeyon.com/how-do-machines-learn/”

Unsupervised Learning

Applications:

- Image segmentation

- Dimensionality reduction

- Clustering

- …

No labels; data are grouped based on (dis)similarity criteria

“https://abeyon.com/how-do-machines-learn/”

Clustering: k-means

“Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow”, Aurélien Géron, 2019

K-Means algorithm
Input:
• Training data: {x(1), x(2), x(3), …, x(m)}
• Number of clusters K, with K < m

Steps:
1- Randomly assign centres for the K clusters (µ(1), µ(2), µ(3), …, µ(K))
2- Calculate the distance of every point in the training data to each cluster centre
3- Assign each datapoint to the nearest cluster centre (c(1), c(2), c(3), …, c(m))
4- Update the centre of each cluster (the average (mean) of the datapoints assigned to it)
5- Repeat steps 2-4
6- Stop when the assignments no longer change (a minimal sketch follows below)
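A minimal NumPy sketch of these six steps (the function name, the random-datapoint initialization, and the iteration cap are illustrative choices, not from the slides):

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Plain k-means on an (m, n) data array X with K < m clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: pick K distinct datapoints as the initial centres
    centres = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    assignments = None
    for _ in range(max_iters):
        # Steps 2-3: distance of every point to every centre, then nearest-centre assignment
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        new_assignments = dists.argmin(axis=1)
        # Step 6: stop when the assignments no longer change
        if assignments is not None and np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Step 4: move each centre to the mean of its assigned points
        for k in range(K):
            if np.any(assignments == k):
                centres[k] = X[assignments == k].mean(axis=0)
    return centres, assignments
```

Calling centres, labels = kmeans(X, K=3) returns the final centres and the cluster index c(i) of each datapoint.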

Clustering: k-means

“Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow”, Aurélien Géron, 2019

K-Means algorithm
Cost Function:
• J(c(1), …, c(m), µ(1), …, µ(K)) = (1/m) Σᵢ ‖x(i) − µc(i)‖², summing over all m datapoints, where µc(i) is the centre of the cluster that x(i) is assigned to (see the numeric check below)

• The cost function is basically the average of the variance within each group; in scikit-learn the summed version is called the inertia

• The goal is to minimise the cost function

• The minimisation can get stuck in a local minimum; the reasons are:

• Bad initialization of cluster centres
• Bad choice of the number of clusters
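As a quick numeric check, the cost can be computed directly from the definition or read off scikit-learn's KMeans, whose inertia_ attribute is the summed (not averaged) squared distance; the data here are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(200, 2))
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# J as defined above: average squared distance of each point to its assigned centre
J = np.mean(np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1) ** 2)
print(J, km.inertia_ / len(X))  # the two values agree
```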

Clustering: k-means

• Bad initialization of cluster centres → repeat the algorithm with different initializations of the centroids and keep the solution with the lowest cost

“Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow”, Aurélien Géron, 2019

Clustering: k-means

Solutions for the optimum number of clusters:
• Silhouette criterion
• Calinski-Harabasz criterion
• Gap criterion

https://www.mathworks.com/help/stats/evalclusters.html

Silhouette criterion:

s(i) = (b(i) − a(i)) / max(a(i), b(i))

a(i) = average distance of datapoint i from the points in the same cluster
b(i) = average distance of datapoint i from the points in the second-best (nearest neighbouring) cluster

Steps:
1- Determine a set of cluster numbers to evaluate: K = {2, 3, 4, …, k}; 2 < k < n (number of datapoints)
2- Apply the K-Means algorithm until convergence for k clusters
3- Calculate the silhouette value for each datapoint and average over all datapoints
4- Repeat steps 2-3 for each k in the set
5- Select the number of clusters with the highest average silhouette coefficient (a sketch follows below)

“Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow”, Aurélien Géron, 2019
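A minimal sketch of this selection loop using scikit-learn (the dataset and the candidate range are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.default_rng(0).normal(size=(300, 2))

scores = {}
for k in range(2, 10):  # step 1: candidate cluster numbers
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)  # step 2
    scores[k] = silhouette_score(X, labels)  # step 3: silhouette averaged over all points
best_k = max(scores, key=scores.get)  # step 5: highest average silhouette wins
```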

Calinski-Harabasz criterion:

CH = (SSb / (K − 1)) / (SSw / (N − K))

SSb = between-cluster variance
SSw = within-cluster variance
N = number of datapoints
K = number of clusters

Steps:
1- Determine a set of cluster numbers to evaluate: K = {2, 3, 4, …, k}; 2 < k < N (number of datapoints)
2- Apply the K-Means algorithm until convergence for k clusters
3- Calculate the Calinski-Harabasz score for each number of clusters
4- Repeat steps 2-3 for each k in the set
5- Select the number of clusters with the highest Calinski-Harabasz score (a sketch follows below)

“Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow”, Aurélien Géron, 2019
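The same selection loop with scikit-learn's built-in score (a sketch under the same illustrative assumptions as above):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

X = np.random.default_rng(0).normal(size=(300, 2))

scores = {}
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = calinski_harabasz_score(X, labels)  # (SSb/(K-1)) / (SSw/(N-K))
best_k = max(scores, key=scores.get)  # highest Calinski-Harabasz score wins
```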

Gap criterion:

• A way of making the elbow method objective
• Compares the inertia of the real datapoints to the inertia under a null (reference) distribution

Steps:
1- Determine a set of cluster numbers to evaluate: K = {2, 3, 4, …, k}; 2 < k < n (number of datapoints)
2- Apply the K-Means algorithm until convergence for k clusters, on the real data
3- Calculate the inertia (real inertia)
4- Create a set of random points from a uniform distribution (i.e., fake datapoints)
5- Calculate the inertia for the fake datapoints (fake inertia)
6- Repeat steps 4-5 a reasonable number of times (e.g., 100)
7- Compare the real inertia and the averaged fake inertia: the gap statistic (a sketch follows below)

https://towardsdatascience.com/how-many-clusters-6b3f220f0ef5
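A compact sketch of the gap statistic for a single k (the uniform bounding-box reference and the number of draws are illustrative choices, following the linked article):

```python
import numpy as np
from sklearn.cluster import KMeans

def gap_statistic(X, k, n_refs=50, seed=0):
    """Gap(k) = mean(log fake inertia) - log(real inertia)."""
    rng = np.random.default_rng(seed)
    real = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    lo, hi = X.min(axis=0), X.max(axis=0)
    ref_log_inertias = []
    for _ in range(n_refs):
        # steps 4-5: fake datapoints, uniform over the bounding box of the real data
        fake = rng.uniform(lo, hi, size=X.shape)
        fake_inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(fake).inertia_
        ref_log_inertias.append(np.log(fake_inertia))
    # step 7: the gap between fake and real inertia (a large gap means real structure)
    return np.mean(ref_log_inertias) - np.log(real)

X = np.random.default_rng(1).normal(size=(200, 2))
gaps = {k: gap_statistic(X, k) for k in range(2, 8)}
best_k = max(gaps, key=gaps.get)  # simplest rule: take the largest gap
```

The full criterion also uses the standard deviation of the reference inertias (choose the smallest k with Gap(k) ≥ Gap(k+1) − s(k+1)); the sketch above uses the simpler largest-gap rule.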

Gap criterion:

[Figure: panels (A) and (B) illustrating the gap statistic]


Clustering: k-means

When k-means struggles (e.g., on clusters with very different scales or non-spherical shapes), we can either use other methods (e.g., Gaussian mixture models) or manipulate the features (e.g., rescaling them, as sketched below, or choosing other features altogether).
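For the rescaling option, a minimal sketch with scikit-learn (StandardScaler is one common choice; the badly scaled data are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One feature spans a range 100x larger than the other, so it would dominate the distances
X = np.random.default_rng(0).normal(size=(200, 2)) * [1.0, 100.0]

# Rescale each feature to zero mean / unit variance before clustering
model = make_pipeline(StandardScaler(), KMeans(n_clusters=3, n_init=10, random_state=0))
labels = model.fit_predict(X)
```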

“Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow”, Aurélien Géron, 2019

C-means Clustering (soft clustering)

• Unsupervised
• Each datapoint can belong to more than one cluster (probability assignment)
• Slower than k-means

https://medium.com/geekculture/fuzzy-c-means-clustering-fcm-algorithm-in-machine-learning-c2e51e586fff
C-means Clustering (soft clustering)
How it works:
• Same as K-means, except that the centroid of each cluster is updated as the average of all datapoints weighted by their membership degrees:

c(k) = Σx wk(x)^m · x / Σx wk(x)^m

m = fuzziness parameter (m > 1)
wk(x) = degree (probability) of datapoint x belonging to cluster k

• Update the degree values:

wk(x) = 1 / Σj (‖x − c(k)‖ / ‖x − c(j)‖)^(2/(m−1))

• Repeat until convergence (define a threshold of convergence; see the sketch below)
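A minimal NumPy sketch of these two update steps (the function name, random initialization of the degrees, and the tolerance are illustrative choices):

```python
import numpy as np

def fuzzy_cmeans(X, K, m=2.0, tol=1e-5, max_iters=200, seed=0):
    """Fuzzy c-means: returns the centres and the (n, K) membership-degree matrix."""
    rng = np.random.default_rng(seed)
    W = rng.random((len(X), K))
    W /= W.sum(axis=1, keepdims=True)  # each point's degrees sum to 1
    for _ in range(max_iters):
        Wm = W ** m
        # Centroid update: average of all points weighted by degree^m
        centres = (Wm.T @ X) / Wm.sum(axis=0)[:, None]
        # Degree update from the distances to every centroid
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        W_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        # Convergence: stop when the degrees change less than the threshold
        if np.abs(W_new - W).max() < tol:
            return centres, W_new
        W = W_new
    return centres, W
```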

K-means and C-means

• In the study cited below, the resulting performance of the two methods is significantly different
• The fuzzy c-means algorithm achieved better segmentation performance than k-means
• The weakness of fuzzy c-means is the computational time required: it takes longer than k-means

Figure: retinal blood vessel segmentation using k-means and fuzzy c-means clustering.
Wiharto W, Suryani E. The Comparison of Clustering Algorithms K-Means and Fuzzy C-Means for Segmentation Retinal Blood Vessels. Acta Inform Med. 2020 Mar;28(1):42-47. doi: 10.5455/aim.2020.28.42-47. PMID: 32210514; PMCID: PMC7085333.
Gaussian Mixture Models (GMM)

GMM
How it works: Expectation-Maximization (EM)

1- E-step: estimate the expected value (likelihood) of each variable (datapoint) belonging to each distribution/cluster, p(ck|xi):

• P(xi|ck) = (1 / √(2πσk²)) · exp(−(xi − µk)² / (2σk²))  (1-D Gaussian)

• p(ck|xi) = P(xi|ck) p(ck) / p(xi); in other words, the responsibility wik

2- M-step: estimate the parameters (mean and covariance) that maximize the likelihood (updating the parameters; see the sketch below):

µk = Σi wik xi / Σi wik ;  σk² = Σi wik (xi − µk)² / Σi wik
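A minimal 1-D EM sketch of these E- and M-steps (the initialization and the fixed iteration count are illustrative choices):

```python
import numpy as np

def gmm_em_1d(x, K, n_iters=100, seed=0):
    """EM for a 1-D Gaussian mixture: returns means, variances, and weights p(ck)."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=K, replace=False)  # initialise means at random datapoints
    var = np.full(K, x.var())
    pi = np.full(K, 1.0 / K)                   # mixing weights p(ck)
    for _ in range(n_iters):
        # E-step: responsibilities w[i, k] = p(ck | xi) = P(xi|ck) p(ck) / p(xi)
        pdf = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        w = pi * pdf
        w /= w.sum(axis=1, keepdims=True)      # divide by p(xi)
        # M-step: responsibility-weighted updates of mean, variance, and weight
        Nk = w.sum(axis=0)
        mu = (w * x[:, None]).sum(axis=0) / Nk
        var = (w * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / len(x)
    return mu, var, pi
```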

Gaussian Mixture Models (GMM); 1-D example

P(xi|ck) = (1 / √(2πσk²)) · exp(−(xi − µk)² / (2σk²))

wik = p(ck|xi) = P(xi|ck) p(ck) / p(xi)

µk = Σi wik xi / Σi wik

σk² = Σi wik (xi − µk)² / Σi wik

https://www.youtube.com/watch?v=iQoXFmbXRJA
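In practice the same 1-D fit is available from scikit-learn's GaussianMixture (a brief usage example with illustrative two-cluster data):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(3, 1.0, 150)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
print(gmm.means_.ravel(), gmm.covariances_.ravel(), gmm.weights_)  # µk, σk², p(ck)
```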
