IntroML 8: K-means Clustering

The document provides an overview of machine learning concepts, focusing on clustering techniques such as K-means, C-means, and Gaussian Mixture Models (GMM). It outlines the processes involved in K-means clustering, including initialization, distance calculation, and convergence criteria, as well as methods for determining the optimal number of clusters. Additionally, it compares the performance of K-means and C-means, highlighting the advantages and disadvantages of each approach.


Introduction to Machine Learning
(5 ECTS)

Overview lecture

• Clustering
• K-means Clustering (hard clustering)

• C-means Clustering (soft clustering)

• Gaussian Mixture Models (GMM)

Supervised Learning

Applications:

- Image classification

- Natural language processing

- Medical Diagnosis

- …

Classification model: fit on labelled data, then applied to new unlabelled entries to determine their class.

“https://abeyon.com/how-do-machines-learn/”

Unsupervised Learning

Applications:

- Image segmentation

- Dimensionality reduction

- Clustering

- …

No labels; data are grouped based on (dis)similarity criteria

“https://abeyon.com/how-do-machines-learn/”

Clustering: k-means

“Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow”, Aurélien Géron, 2019

K-Means algorithm
Input:
• Training data: {x(1), x(2), x(3), …, x(m)}
• Number of clusters K, with K < m

Steps:
1- Randomly assign centres for the K clusters (µ(1), µ(2), µ(3), …, µ(K))
2- Calculate the distance of every point in the training data to each cluster centre
3- Assign each datapoint to the nearest cluster centre (c(1), c(2), c(3), …, c(m))
4- Update the centre of each cluster (the average (mean) of the datapoints assigned to it)
5- Repeat steps 2-4
6- Stop when the assignments no longer change (a minimal sketch follows below)
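A minimal NumPy sketch of these six steps (the function name, the random-datapoint initialization, and the iteration cap are illustrative choices, not from the slides):

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Plain k-means on an (m, n) data array X with K < m clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: pick K distinct datapoints as the initial centres
    centres = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    assignments = None
    for _ in range(max_iters):
        # Steps 2-3: distance of every point to every centre, then nearest-centre assignment
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        new_assignments = dists.argmin(axis=1)
        # Step 6: stop when the assignments no longer change
        if assignments is not None and np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Step 4: move each centre to the mean of its assigned points
        for k in range(K):
            if np.any(assignments == k):
                centres[k] = X[assignments == k].mean(axis=0)
    return centres, assignments
```

Calling centres, labels = kmeans(X, K=3) returns the final centres and the cluster index c(i) of each datapoint.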

Clustering: k-means

“Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow”, Aurélien Géron, 2019

K-Means algorithm
Cost Function:
• J(c(1), …, c(m), µ(1), …, µ(K)) = (1/m) Σᵢ ‖x(i) − µc(i)‖², summing over all m datapoints, where µc(i) is the centre of the cluster that x(i) is assigned to (see the numeric check below)

• The cost function is basically the average of the variance within each group; in scikit-learn the summed version is called the inertia

• The goal is to minimise the cost function

• The minimisation can get stuck in a local minimum; the reasons are:

• Bad initialization of cluster centres
• Bad choice of the number of clusters
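As a quick numeric check, the cost can be computed directly from the definition or read off scikit-learn's KMeans, whose inertia_ attribute is the summed (not averaged) squared distance; the data here are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(200, 2))
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# J as defined above: average squared distance of each point to its assigned centre
J = np.mean(np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1) ** 2)
print(J, km.inertia_ / len(X))  # the two values agree
```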

Clustering: k-means

• Bad initialization of cluster centres → repeat the algorithm with different initializations of the centroids and keep the solution with the lowest cost

“Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow”, Aurélien Géron, 2019

Clustering: k-means

Solutions for the optimum number of clusters:
• Silhouette criterion
• Calinski-Harabasz criterion
• Gap criterion

https://www.mathworks.com/help/stats/evalclusters.html

Silhouette criterion:

s(i) = (b(i) − a(i)) / max(a(i), b(i))

a(i) = average distance of datapoint i from the points in the same cluster
b(i) = average distance of datapoint i from the points in the second-best (nearest neighbouring) cluster

Steps:
1- Determine a set of cluster numbers to evaluate: K = {2, 3, 4, …, k}; 2 < k < n (number of datapoints)
2- Apply the K-Means algorithm until convergence for k clusters
3- Calculate the silhouette value for each datapoint and average over all datapoints
4- Repeat steps 2-3 for each k in the set
5- Select the number of clusters with the highest average silhouette coefficient (a sketch follows below)

“Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow”, Aurélien Géron, 2019
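A minimal sketch of this selection loop using scikit-learn (the dataset and the candidate range are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.default_rng(0).normal(size=(300, 2))

scores = {}
for k in range(2, 10):  # step 1: candidate cluster numbers
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)  # step 2
    scores[k] = silhouette_score(X, labels)  # step 3: silhouette averaged over all points
best_k = max(scores, key=scores.get)  # step 5: highest average silhouette wins
```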

Calinski-Harabasz criterion:

CH = (SSb / (K − 1)) / (SSw / (N − K))

SSb = between-cluster variance
SSw = within-cluster variance
N = number of datapoints
K = number of clusters

Steps:
1- Determine a set of cluster numbers to evaluate: K = {2, 3, 4, …, k}; 2 < k < N (number of datapoints)
2- Apply the K-Means algorithm until convergence for k clusters
3- Calculate the Calinski-Harabasz score for each number of clusters
4- Repeat steps 2-3 for each k in the set
5- Select the number of clusters with the highest Calinski-Harabasz score (a sketch follows below)

“Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow”, Aurélien Géron, 2019
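The same selection loop with scikit-learn's built-in score (a sketch under the same illustrative assumptions as above):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

X = np.random.default_rng(0).normal(size=(300, 2))

scores = {}
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = calinski_harabasz_score(X, labels)  # (SSb/(K-1)) / (SSw/(N-K))
best_k = max(scores, key=scores.get)  # highest Calinski-Harabasz score wins
```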

Gap criterion:

• A way of making the elbow method objective
• Compares the inertia of the real datapoints to the inertia under a null (reference) distribution

Steps:
1- Determine a set of cluster numbers to evaluate: K = {2, 3, 4, …, k}; 2 < k < n (number of datapoints)
2- Apply the K-Means algorithm until convergence for k clusters, on the real data
3- Calculate the inertia (real inertia)
4- Create a set of random points from a uniform distribution (i.e., fake datapoints)
5- Calculate the inertia for the fake datapoints (fake inertia)
6- Repeat steps 4-5 a reasonable number of times (e.g., 100)
7- Compare the real inertia and the averaged fake inertia: the gap statistic (a sketch follows below)

https://towardsdatascience.com/how-many-clusters-6b3f220f0ef5
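A compact sketch of the gap statistic for a single k (the uniform bounding-box reference and the number of draws are illustrative choices, following the linked article):

```python
import numpy as np
from sklearn.cluster import KMeans

def gap_statistic(X, k, n_refs=50, seed=0):
    """Gap(k) = mean(log fake inertia) - log(real inertia)."""
    rng = np.random.default_rng(seed)
    real = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    lo, hi = X.min(axis=0), X.max(axis=0)
    ref_log_inertias = []
    for _ in range(n_refs):
        # steps 4-5: fake datapoints, uniform over the bounding box of the real data
        fake = rng.uniform(lo, hi, size=X.shape)
        fake_inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(fake).inertia_
        ref_log_inertias.append(np.log(fake_inertia))
    # step 7: the gap between fake and real inertia (a large gap means real structure)
    return np.mean(ref_log_inertias) - np.log(real)

X = np.random.default_rng(1).normal(size=(200, 2))
gaps = {k: gap_statistic(X, k) for k in range(2, 8)}
best_k = max(gaps, key=gaps.get)  # simplest rule: take the largest gap
```

The full criterion also uses the standard deviation of the reference inertias (choose the smallest k with Gap(k) ≥ Gap(k+1) − s(k+1)); the sketch above uses the simpler largest-gap rule.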

Gap criterion:

[Figure: panels (A) and (B) illustrating the gap statistic]


Clustering: k-means

When k-means struggles (e.g., on clusters with very different scales or non-spherical shapes), we can either use other methods (e.g., Gaussian mixture models) or manipulate the features (e.g., rescaling them, as sketched below, or choosing other features altogether).
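For the rescaling option, a minimal sketch with scikit-learn (StandardScaler is one common choice; the badly scaled data are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One feature spans a range 100x larger than the other, so it would dominate the distances
X = np.random.default_rng(0).normal(size=(200, 2)) * [1.0, 100.0]

# Rescale each feature to zero mean / unit variance before clustering
model = make_pipeline(StandardScaler(), KMeans(n_clusters=3, n_init=10, random_state=0))
labels = model.fit_predict(X)
```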

“Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow”, Aurélien Géron, 2019

C-means Clustering (soft clustering)

• Unsupervised
• Each datapoint can belong to more than one cluster (probability assignment)
• Slower than k-means

https://medium.com/geekculture/fuzzy-c-means-clustering-fcm-algorithm-in-machine-learning-c2e51e586fff
C-means Clustering (soft clustering)
How it works:
• Same as K-means, except that the centroid of each cluster is updated as the average of all datapoints weighted by their membership degrees:

c(k) = Σx wk(x)^m · x / Σx wk(x)^m

m = fuzziness parameter (m > 1)
wk(x) = degree (probability) of datapoint x belonging to cluster k

• Update the degree values:

wk(x) = 1 / Σj (‖x − c(k)‖ / ‖x − c(j)‖)^(2/(m−1))

• Repeat until convergence (define a threshold of convergence; see the sketch below)
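A minimal NumPy sketch of these two update steps (the function name, random initialization of the degrees, and the tolerance are illustrative choices):

```python
import numpy as np

def fuzzy_cmeans(X, K, m=2.0, tol=1e-5, max_iters=200, seed=0):
    """Fuzzy c-means: returns the centres and the (n, K) membership-degree matrix."""
    rng = np.random.default_rng(seed)
    W = rng.random((len(X), K))
    W /= W.sum(axis=1, keepdims=True)  # each point's degrees sum to 1
    for _ in range(max_iters):
        Wm = W ** m
        # Centroid update: average of all points weighted by degree^m
        centres = (Wm.T @ X) / Wm.sum(axis=0)[:, None]
        # Degree update from the distances to every centroid
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        W_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        # Convergence: stop when the degrees change less than the threshold
        if np.abs(W_new - W).max() < tol:
            return centres, W_new
        W = W_new
    return centres, W
```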

K-means and C-means

• In the study cited below, the resulting performance of the two methods is significantly different
• The fuzzy c-means algorithm achieved better segmentation performance than k-means
• The weakness of fuzzy c-means is the computational time required: it takes longer than k-means

Figure: retinal blood vessel segmentation using k-means and fuzzy c-means clustering.
Wiharto W, Suryani E. The Comparison of Clustering Algorithms K-Means and Fuzzy C-Means for Segmentation Retinal Blood Vessels. Acta Inform Med. 2020 Mar;28(1):42-47. doi: 10.5455/aim.2020.28.42-47. PMID: 32210514; PMCID: PMC7085333.
Gaussian Mixture Models (GMM)

GMM
How it works: Expectation-Maximization (EM)

1- E-step: estimate the expected value (likelihood) of each variable (datapoint) belonging to each distribution/cluster, p(ck|xi):

• P(xi|ck) = (1 / √(2πσk²)) · exp(−(xi − µk)² / (2σk²))  (1-D Gaussian)

• p(ck|xi) = P(xi|ck) p(ck) / p(xi); in other words, the responsibility wik

2- M-step: estimate the parameters (mean and covariance) that maximize the likelihood (updating the parameters; see the sketch below):

µk = Σi wik xi / Σi wik ;  σk² = Σi wik (xi − µk)² / Σi wik
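A minimal 1-D EM sketch of these E- and M-steps (the initialization and the fixed iteration count are illustrative choices):

```python
import numpy as np

def gmm_em_1d(x, K, n_iters=100, seed=0):
    """EM for a 1-D Gaussian mixture: returns means, variances, and weights p(ck)."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=K, replace=False)  # initialise means at random datapoints
    var = np.full(K, x.var())
    pi = np.full(K, 1.0 / K)                   # mixing weights p(ck)
    for _ in range(n_iters):
        # E-step: responsibilities w[i, k] = p(ck | xi) = P(xi|ck) p(ck) / p(xi)
        pdf = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        w = pi * pdf
        w /= w.sum(axis=1, keepdims=True)      # divide by p(xi)
        # M-step: responsibility-weighted updates of mean, variance, and weight
        Nk = w.sum(axis=0)
        mu = (w * x[:, None]).sum(axis=0) / Nk
        var = (w * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / len(x)
    return mu, var, pi
```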

Gaussian Mixture Models (GMM); 1-D example

P(xi|ck) = (1 / √(2πσk²)) · exp(−(xi − µk)² / (2σk²))

wik = p(ck|xi) = P(xi|ck) p(ck) / p(xi)

µk = Σi wik xi / Σi wik

σk² = Σi wik (xi − µk)² / Σi wik

https://www.youtube.com/watch?v=iQoXFmbXRJA
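In practice the same 1-D fit is available from scikit-learn's GaussianMixture (a brief usage example with illustrative two-cluster data):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(3, 1.0, 150)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
print(gmm.means_.ravel(), gmm.covariances_.ravel(), gmm.weights_)  # µk, σk², p(ck)
```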
