ML 8

This document discusses clustering algorithms, specifically k-means clustering. It begins with an introduction to clustering and unsupervised learning. It then describes the k-means clustering algorithm, including defining the objective function, explaining the steps of the algorithm, and discussing convergence. Examples are provided to demonstrate applying k-means to datasets. The document concludes by discussing the strengths and weaknesses of k-means clustering and techniques for evaluating clustering results.

Uploaded by

Tejas Sharma

Clustering Algorithm

Mr. Rohan Pillai


Assistant Professor
Department of Electrical Engineering, DTU
Supervised Learning
Unsupervised Learning
Introduction to Clustering
Can you spot the clusters here?
Group the following datapoints into 2 clusters using:

1. Single Linkage (Agglomerative) Algorithm
2. K-Means Clustering Algorithm (K = 2 here)
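The datapoints from the slide are not recoverable here, so the following is only a minimal sketch of single-linkage agglomerative clustering on a small made-up dataset. The function name `single_linkage` and the sample points are assumptions for illustration, not part of the original slides.

```python
from math import dist

def single_linkage(points, num_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    while len(clusters) > num_clusters:
        best = None  # (distance, i, j) for the closest pair of clusters so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster distance is the distance between
                # the closest pair of members, one from each cluster.
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Hypothetical 2-D data with two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
two_clusters = single_linkage(points, 2)
print(two_clusters)
```

On data this well separated, the two groups are recovered; on elongated or chained data, single linkage and k-means can disagree.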
Various aspects of Clustering
Distance (dissimilarity) measures
Cluster Evaluation (a hard problem)
Optimal Clusters?
Clustering techniques
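The slides do not say which distance measures were shown, so as an assumption the sketch below compares two common choices, Euclidean and Manhattan distance. The helper names and the sample points are illustrative only.

```python
from math import dist

def euclidean(a, b):
    """Straight-line distance between two points."""
    return dist(a, b)

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical points; the coordinate differences are 3 and 4.
p, q = (1.0, 2.0), (4.0, 6.0)
d_euclid = euclidean(p, q)     # sqrt(3^2 + 4^2)
d_manhattan = manhattan(p, q)  # 3 + 4
print(d_euclid, d_manhattan)
```

The choice of measure changes which points count as "close", and therefore which clusters an algorithm finds.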
1. k-means Clustering Algorithm
k-means objective function
k-means algorithm
k-means convergence (stopping criterion)
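The formula from the objective-function slide did not survive extraction. As an assumption, the form most commonly used for k-means is the within-cluster sum of squared errors, written here with illustrative symbols: C_j is the j-th cluster, mu_j its center, and x_i a datapoint.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

Under this form, a typical stopping criterion is that cluster memberships (or equivalently the centers) no longer change between iterations.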


k-means clustering example 1:
Use the k-means algorithm to divide the following dataset into three clusters

Step 1: Randomly initialize the cluster centers (synaptic weights)
Step 2: Determine cluster membership for each datapoint
Step 3: Re-estimate cluster centers (adapt synaptic weights)
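Steps 1 to 3 above can be sketched in plain Python as follows. This is a minimal illustration, not the code behind the slides: the dataset, the function names, the seed, and the rule that an empty cluster keeps its old center are all assumptions.

```python
import random
from math import dist

def assign(points, centers):
    """Step 2: index of the nearest center for every datapoint."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]

def update(points, labels, centers):
    """Step 3: move each center to the mean of the points assigned to it."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:  # assumption: an empty cluster keeps its old center
            new_centers.append(old)
        else:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centers

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # Step 1: k random datapoints as initial centers
    labels = assign(points, centers)
    for _ in range(max_iter):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:  # stop when memberships no longer change
            break
        labels = new_labels
    return centers, labels

# Hypothetical 2-D dataset with three visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (5.0, 7.0), (5.5, 6.5), (6.0, 7.0),
          (9.0, 1.0), (9.5, 1.5), (9.0, 2.0)]
centers, labels = kmeans(points, k=3)
print(centers)
print(labels)
```

Because the initial centers are random, a different `seed` can lead to a different final grouping.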
k-means clustering: Strengths & Weaknesses
Strengths

❑ Simple: easy to understand and to implement

❑ Relatively efficient: time complexity = O(tkn),
where n is the number of datapoints,
k is the number of clusters, and
t is the number of iterations. (Since k and t are usually much smaller than n, k-means is considered linear in the number of datapoints.)

❑ Procedure always terminates (the objective never increases from one iteration to the next)


Weaknesses
❑ Does not necessarily find the optimal configuration (it can converge to a local optimum)

❑ The algorithm is only applicable if the mean is defined.
- For categorical data, use k-modes, where the centroid is represented by the most frequent values.

❑ The user needs to specify k.

❑ The algorithm is sensitive to outliers.

❑ The result is highly sensitive to the randomly selected initial cluster centers.
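For the categorical case mentioned in the list above, a k-modes style centroid can be sketched as the most frequent value of each attribute. The records, the helper names, and the simple matching dissimilarity used here are assumptions for illustration.

```python
from collections import Counter

def mode_centroid(records):
    """Most frequent value of each attribute across one cluster's records."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

def mismatches(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical cluster of categorical records (colour, size, shape).
cluster = [("red", "small", "round"),
           ("red", "large", "round"),
           ("blue", "small", "round"),
           ("red", "small", "square")]
centroid = mode_centroid(cluster)
print(centroid)
print([mismatches(r, centroid) for r in cluster])
```

The mode plays the role that the mean plays in k-means, and the mismatch count replaces the Euclidean distance.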


Effects of Outliers
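The figure for this slide is missing, so here is a small numeric sketch of the effect, using made-up 1-D values. A single far-away point drags the cluster mean (the k-means center) well away from the bulk of the data. The median is shown only for contrast and is not something the slides discuss.

```python
from statistics import mean, median

# Hypothetical 1-D cluster, then the same cluster with one outlier added.
cluster = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = cluster + [100.0]

center_clean = mean(cluster)          # center of the clean cluster
center_outlier = mean(with_outlier)   # center after adding the outlier
median_outlier = median(with_outlier) # median barely moves, for comparison

print(center_clean, center_outlier, median_outlier)
```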
Sensitivity to initial seeds
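The slide's illustration is not available, so the sketch below shows the same idea on made-up 1-D data: the same k-means procedure, started from two different sets of initial centers, ends in two different configurations with very different sums of squared errors (SSE). The function names and the data are assumptions.

```python
def nearest(v, centers):
    """Index of the center closest to value v (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda j: abs(v - centers[j]))

def kmeans_1d(values, centers, max_iter=100):
    """k-means on 1-D data, starting from the given initial centers."""
    centers = list(centers)
    for _ in range(max_iter):
        labels = [nearest(v, centers) for v in values]
        new_centers = []
        for j, old in enumerate(centers):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Assumption: an empty cluster keeps its old center.
            new_centers.append(sum(members) / len(members) if members else old)
        if new_centers == centers:  # centers stopped moving
            break
        centers = new_centers
    sse = sum((v - centers[nearest(v, centers)]) ** 2 for v in values)
    return centers, sse

# Hypothetical data with three obvious groups around 2, 12 and 22.
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0, 21.0, 22.0, 23.0]

centers_good, sse_good = kmeans_1d(values, [2.0, 12.0, 22.0])  # one seed per group
centers_bad, sse_bad = kmeans_1d(values, [1.0, 2.0, 3.0])      # all seeds in one group

print(centers_good, sse_good)  # [2.0, 12.0, 22.0] 6.0
print(centers_bad, sse_bad)    # [1.0, 2.5, 17.0] 154.5
```

A common remedy is to run the algorithm from several different seeds and keep the run with the lowest SSE.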
Clustering validity problem
• Problem 1:
- A problem we face in clustering is deciding the optimal number of
clusters that fits a dataset
• Problem 2:
- Different clustering algorithms behave differently depending on
▪ The features of the dataset (geometry and density distribution of clusters)
▪ The input parameter values (e.g. for k-means, the initial cluster choices influence the
result)
• So how do we know which clustering method is better or more suitable?
• We need a clustering quality criterion!
Clustering quality criteria
One way to find the number of clusters: the ‘Elbow method’
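The elbow plot itself is not recoverable, so the following is a rough sketch of the idea on made-up 1-D data: run k-means for several values of k, record the sum of squared errors (SSE), and look for the k after which the SSE stops dropping sharply. The deterministic initialization from evenly spaced sorted values is an assumption chosen to keep the example reproducible; it is not taken from the slides.

```python
def nearest(v, centers):
    """Index of the center closest to value v (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda j: abs(v - centers[j]))

def kmeans_sse(values, k, max_iter=100):
    """Run 1-D k-means for a given k and return the final SSE."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: start from k evenly spaced values of the sorted data.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        labels = [nearest(v, centers) for v in values]
        new_centers = []
        for j, old in enumerate(centers):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else old)
        if new_centers == centers:  # centers stopped moving
            break
        centers = new_centers
    return sum((v - centers[nearest(v, centers)]) ** 2 for v in values)

# Hypothetical data with three groups around 2, 12 and 22.
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0, 21.0, 22.0, 23.0]
sse_by_k = {k: kmeans_sse(values, k) for k in range(1, 6)}

for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

For this data the SSE falls from 606 at k = 1 to 6 at k = 3 and changes very little afterwards, so the "elbow" sits at k = 3.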
References (slides adapted from):
• Andrew Moore, CMU (https://www.cs.cmu.edu/~./awm/tutorials/kmeans11.pdf)
• http://www.mit.edu/~9.54/fall14/slides/Class13.pdf
• https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf
• CC282 Unsupervised Learning (Clustering) Lecture 7, R. Palaniappan (2008)
