Clustering
CO-3
AIM
To familiarize students with the concepts of unsupervised machine learning, its differences from supervised machine learning, and the uses of unsupervised learning, particularly clustering.
INSTRUCTIONAL OBJECTIVES
LEARNING OUTCOMES
• The data set has three natural groups of data points, i.e., 3 natural clusters (a synthetic version is sketched below).
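A minimal sketch of such a data set, assuming scikit-learn's make_blobs helper is available; the sample count, centers, and spread below are illustrative choices, not taken from the slides:

```python
# Generate a toy 2-D data set with 3 natural clusters.
# make_blobs and all parameter values here are illustrative assumptions.
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=150, centers=3, cluster_std=0.8, random_state=42)
print(X.shape)  # (150, 2): points that a clustering algorithm should group into 3 clusters
```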
What is clustering for?
• A clustering algorithm:
Partitional clustering
Hierarchical clustering
…
A disk version of k-means
• It can be used to cluster large data sets that do not fit in main memory.
• It is not the best method; there are other scale-up algorithms, e.g., BIRCH (sketched below).
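As a rough illustration of the scale-up idea (not the disk-based k-means itself), here is a hedged sketch using scikit-learn's Birch, which can consume the data in chunks so that only one chunk needs to be in memory at a time; the chunking loop and all parameter values are assumptions for illustration:

```python
# Sketch: cluster a large data set chunk by chunk with BIRCH.
# The synthetic chunks stand in for blocks read from disk.
import numpy as np
from sklearn.cluster import Birch

model = Birch(n_clusters=3)              # final clustering built on the CF-tree leaves
rng = np.random.default_rng(0)

for _ in range(10):                      # pretend each chunk is loaded from disk
    centers = rng.choice([-5.0, 0.0, 5.0], size=(1000, 1))
    chunk = centers + rng.normal(scale=0.5, size=(1000, 2))
    model.partial_fit(chunk)             # only the current chunk is held in memory

print(model.predict(np.array([[-5.0, -5.0], [0.0, 0.0], [5.0, 5.0]])))
```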
Strengths of k-means
• Strengths:
Simple: easy to understand and to implement
Efficient: time complexity is O(tkn), where n is the number of data points, k is the number of clusters, and t is the number of iterations.
Since both k and t are small, k-means is considered a linear algorithm (see the sketch after this list).
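To make the O(tkn) behaviour concrete, here is a minimal NumPy sketch of Lloyd's k-means (not the slides' exact pseudocode); the tolerance, seed, and the assumption that no cluster becomes empty are simplifications:

```python
import numpy as np

def kmeans(X, k, t=100, tol=1e-6, seed=0):
    """Basic Lloyd's k-means; each of the t iterations costs O(kn) distance computations."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]        # random initial centroids
    for _ in range(t):
        # distances of all n points to all k centroids: the O(kn) step
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                                # assign each point to its nearest centroid
        # recompute each centroid as the mean of its points (assumes no cluster is empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.linalg.norm(new_centroids - centroids) < tol:          # centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids
```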
Hierarchical Clustering
• Divisive (top-down) clustering: it starts with all data points in one cluster, the root.
Splits the root into a set of child clusters; each child cluster is recursively divided further.
Stops when only singleton clusters of individual data points remain, i.e., each cluster contains only a single point.
AGGLOMERATIVE CLUSTERING
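Agglomerative clustering works bottom-up: every point starts as its own cluster, and the two closest clusters are merged repeatedly until a single cluster (the root of the dendrogram) remains. A hedged sketch using SciPy's hierarchy module follows; the Ward linkage and the cut into 3 clusters are illustrative choices, not prescribed by the slides:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# three well-separated groups of 2-D points
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2)) for c in (0.0, 3.0, 6.0)])

Z = linkage(X, method="ward")                      # the full sequence of pairwise merges
labels = fcluster(Z, t=3, criterion="maxclust")    # cut the merge tree into 3 clusters
print(labels)
# Z can also be passed to scipy.cluster.hierarchy.dendrogram to draw the tree
```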
DIVISIVE HIERARCHICAL CLUSTERING
• This approach starts with all of the objects in the same cluster.
• In each iteration, a cluster is split into smaller clusters.
• This continues until each object is in its own cluster or the termination condition holds (one way to implement the split is sketched below).
• This method is rigid, i.e., once a merging or splitting is done, it can never be undone.
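One common way to realize this top-down scheme is recursive bisection with 2-means; the slides do not fix a particular splitting rule, so the sketch below, including the min_size termination condition, is an illustrative assumption:

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive(X, min_size=5):
    """Split clusters top-down until every leaf is small enough (or cannot be split)."""
    leaves, queue = [], [X]
    while queue:
        C = queue.pop()
        if len(C) <= min_size:                       # termination condition for this branch
            leaves.append(C)
            continue
        split = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(C)
        if 0 < split.sum() < len(C):                 # genuine two-way split
            queue.append(C[split == 0])              # each child cluster is divided further
            queue.append(C[split == 1])
        else:                                        # degenerate split: keep C as a leaf
            leaves.append(C)
    return leaves
```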
APPLICATIONS
Summary
The centroid representation alone works well if the clusters are hyper-spherical in shape.
Self-Assessment Questions
(a) DBSCAN
(b) Hierarchy
(c) Grid
(d) Project based
THANK YOU