Clustering
• What is a cluster?
Cluster Analysis
• In general, we want:
• Density-based methods
• Assigning each point to its nearest center partitions the space into Voronoi cells, which are our clusters
[Figure: data points colored by nearest center. Red cluster; blue cluster (the blue center is nearest); yellow cluster.]
What are the means of the data points in each
cluster?
These are the new centers.
Now which data points are closest to each center?
Redefine the clusters based on which center each
point is nearest to.
And repeat! Keep calculating the centers and
redefining the clusters until they stop changing.
The results once the clusters and centers are fixed
are your final k-means clusters.
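The loop walked through above (assign each point to its nearest center, move each center to the mean of its cluster, repeat until nothing changes) can be sketched in a few lines. This is a minimal illustration, not a production implementation: the function name `kmeans`, the random choice of initial centers from the data, and the `max_iter` cap are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: initial centers are k data points picked at random.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # the clusters stopped changing, so we are done
        labels = new_labels
        # Update step: each center moves to the mean of its cluster.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
```

The result depends on the starting centers, so in practice the algorithm is often run from several random starts.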
K-means clustering
• Relatively efficient
• Tends toward equal-sized clusters
Image: By Chire - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=11765684
• Start with all single point clusters
DBSCAN example
[Figure sequence, m = 3: a circle of fixed radius is drawn around each point in turn; points are labeled core, directly reachable, or reachable; the final frame shows Cluster 1, Cluster 2, and an outlier point.]
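The example above can be reproduced with a short sketch. It assumes the common convention that a point is a core point when at least m points (counting itself) lie within the radius, that clusters grow outward from core points, and that anything never reached is an outlier. The name `dbscan` and the label scheme (0, 1, ... for clusters, -1 for outliers) are choices made for this illustration.

```python
import math


def dbscan(points, radius, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for outliers."""

    def neighbors(i):
        # Assumption: a point counts as its own neighbor.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= radius]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional outlier; may turn out to be a border point
            continue
        cluster += 1  # i is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is core, so its neighbors are reachable too
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
labels = dbscan(points, radius=1.5, min_pts=3)
```

With these toy points the sketch finds two clusters and marks the isolated point at (50, 50) as an outlier.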
DBSCAN
https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68
• How to choose radius & min points? There are rules of thumb,
but it can be tricky! Often use min points = 2 x dim. For the radius,
can use the elbow of a k-distance graph, but that is harder to pin down.
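A k-distance graph is the sorted list of each point's distance to its k-th nearest neighbor; the radius is then read off near the "elbow" where the curve bends sharply upward. The sketch below only computes the sorted distances. The function name `k_distances` is hypothetical, and tying k to min points is an assumption about how the two rules of thumb are usually combined.

```python
import math


def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest other point."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


points = [(0,), (1,), (2,), (10,)]
curve = k_distances(points, k=1)
```

Plotting `curve` against its index gives the k-distance graph. Here the jump from 1 to 8 at the last point suggests a radius a little above 1, with the far point left as an outlier.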
Model based methods: Gaussian Mixture Models
https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68
Network methods: modularity maximization
• Community (cluster) detection approach for networks
Modularity and community structure in networks. M. E. J. Newman. PNAS 2006, 103 (23) 8577-8582; DOI: 10.1073/pnas.0601602103
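As a rough illustration of the quantity being maximized: modularity Q compares the fraction of edges that fall inside communities with the fraction expected if edges were placed at random while keeping node degrees fixed. In its usual form, Q is the sum over communities of (edges inside / m) - (total degree / 2m)^2, where m is the number of edges. The sketch below only scores a given partition and does not search for the best one. The function name `modularity` and the input format are assumptions, and it handles only undirected, unweighted graphs without self-loops.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    degree = {}
    internal = {}  # edges with both ends in the same community
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            internal[c] = internal.get(c, 0) + 1
    total_degree = {}  # sum of node degrees per community
    for node, d in degree.items():
        c = community[node]
        total_degree[c] = total_degree.get(c, 0) + d
    # Observed fraction of internal edges minus the expected fraction.
    return sum(
        internal.get(c, 0) / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q_split = modularity(edges, split)
q_single = modularity(edges, {n: "a" for n in range(6)})
```

Splitting the graph into its two triangles scores higher than lumping every node into one community, which is the behavior a modularity-maximizing method relies on.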
Network methods: assortativity
• https://en.wikipedia.org/wiki/DBSCAN
• https://medium.com/predict/three-popular-clustering-methods-and-when-to-use-each-4227c80ba2b6
• https://blog.dominodatalab.com/topology-and-density-based-clustering/
• https://shapeofdata.wordpress.com/2014/03/04/k-modes/
• https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68