Unsupervised Learning - Clustering
Clustering
May 2020
SECRET
Knowledge Share: Session plan
Topic Application Schedule
Machine Learning Universe
What is clustering?
Clustering techniques
Divisive
K-means
K-Means clustering
• K-means (MacQueen, 1967) is a partitional clustering algorithm
• Let the set of data points D be {x_1, x_2, …, x_n},
where x_i = (x_i1, x_i2, …, x_ir) is a vector in X ⊆ R^r, and r is the
number of dimensions.
• The k-means algorithm partitions the given data into
k clusters:
– Each cluster has a cluster center, called the centroid.
– k is specified by the user.
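The partitioning described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the exact procedure from the slides: the function name kmeans, the parameters n_iter and seed, the random choice of initial centroids from the data, the Euclidean distance, and the stopping rule are all assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Minimal k-means sketch: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centroids are k distinct data points picked at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    def assign(cents):
        # Euclidean distance from every point to every centroid: shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - cents[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Update step: each centroid becomes the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return centroids, assign(centroids)

# Toy data: two well-separated groups of three points each, with r = 2.
data = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0],
                 [100.0, 100.0], [101.0, 101.0], [102.0, 102.0]])
centroids, labels = kmeans(data, k=2)
```

On this toy data the two groups are far enough apart that any choice of initial points should lead to the same partition. On real data the result can depend on the initial centroids.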
K-means clustering example: step 1
K-means clustering example: step 2
K-means clustering example: step 3
K-means clustering example
Weaknesses of K-means
• The algorithm is only applicable if the mean is
defined.
– For categorical data, use k-modes, where the centroid is
represented by the most frequent values.
• The user needs to specify k.
• The result is sensitive to the initial seeds (the initial centroids).
• The algorithm is sensitive to outliers
– Outliers are data points that are very far away
from other data points.
– Outliers could be errors in the data recording or
some special data points with very different values.
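The outlier sensitivity follows from the centroid being a mean, so one distant point can pull it away from the rest of its cluster. The sketch below illustrates this with made-up numbers. The four cluster points and the single outlier are hypothetical values chosen for the example, not data from the slides.

```python
import numpy as np

# Hypothetical tight cluster of four points around (1.5, 1.5).
cluster = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]])
# Hypothetical outlier, very far from the other points.
outlier = np.array([[50.0, 50.0]])

# Centroid (mean) without and with the outlier in the cluster.
centroid_clean = cluster.mean(axis=0)
centroid_with_outlier = np.vstack([cluster, outlier]).mean(axis=0)

# How far the centroid moved because of one extra point.
shift = np.linalg.norm(centroid_with_outlier - centroid_clean)
print(centroid_clean, centroid_with_outlier, shift)
```

Here a single outlier moves the centroid from (1.5, 1.5) to (11.2, 11.2), which is well outside the range of the original four points.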
Optimal number of clusters
K-means summary