Learning Objectives
• Introduction to Clustering
• Understand K-Means clustering
• Learn DBSCAN, a density-based clustering algorithm
• Compare K-Means clustering with DBSCAN
Let’s Look At…
Unsupervised Learning
We will take a look at an unsupervised learning algorithm to understand how data without labels can also be useful.
And Now Clustering…
What is Clustering?
The task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to the data points in other groups.
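The definition above can be checked numerically on a toy example. The sketch below is only an illustration: the two groups, their coordinates, and the helper name `mean_distance` are all hypothetical, and Euclidean distance is assumed as the similarity measure.

```python
import math
from itertools import combinations

# Hypothetical toy data: two visually obvious groups of 2-D points.
groups = {
    "A": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    "B": [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)],
}

def mean_distance(pairs):
    """Average Euclidean distance over a collection of point pairs."""
    pairs = list(pairs)
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Pairs of points taken from inside the same group.
within = [pair for pts in groups.values() for pair in combinations(pts, 2)]
# Pairs with one point in group A and the other in group B.
between = [(a, b) for a in groups["A"] for b in groups["B"]]

mean_within = mean_distance(within)
mean_between = mean_distance(between)
print(f"mean within-group distance:  {mean_within:.2f}")
print(f"mean between-group distance: {mean_between:.2f}")
```

A good grouping shows a small within-group distance and a large between-group distance, which is what this toy data produces.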
Clustering
Clustering: K Means
Limitations: K Means
Clustering: K Means
Hard clustering and soft clustering
In hard clustering, each data point belongs to exactly one cluster. K-Means is a hard clustering method.
In soft clustering, the output is instead a probability (likelihood) of the data point belonging to each of the predefined number of clusters.
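The difference can be seen on a single point. This is a minimal sketch with hypothetical cluster centres and a hypothetical query point. The soft memberships here use a softmax over negative squared distances, which is just one simple choice made for illustration; real soft clustering methods (for example Gaussian mixture models or fuzzy c-means) compute memberships differently.

```python
import math

# Hypothetical cluster centres and query point (illustrative values only).
centroids = {"c0": (0.0, 0.0), "c1": (4.0, 0.0), "c2": (0.0, 5.0)}
point = (1.5, 0.5)

distances = {name: math.dist(point, c) for name, c in centroids.items()}

# Hard clustering: the point receives exactly one label, the nearest centre.
hard_label = min(distances, key=distances.get)

# Soft clustering (assumed weighting): softmax over negative squared
# distances, giving one membership value per cluster that sums to 1.
weights = {name: math.exp(-d * d) for name, d in distances.items()}
total = sum(weights.values())
soft_membership = {name: w / total for name, w in weights.items()}

print("hard label:", hard_label)
for name, prob in soft_membership.items():
    print(f"soft membership in {name}: {prob:.4f}")
```

The hard label keeps only the winner, while the soft output also records how strongly the point relates to the other clusters.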
DBSCAN
Density-based spatial clustering of applications with noise (DBSCAN)
● DBSCAN is a density-based spatial clustering algorithm proposed in 1996
by Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu.
● It is a clustering method that identifies groups of data points that are
densely packed together while flagging outliers that are isolated in
low-density areas.
● DBSCAN is widely recognized and frequently cited as one of the most
common clustering algorithms in data analysis and machine learning!
Density Based Clustering
Basic idea
● Clusters are dense regions in
the data space, separated by
regions of lower object density
● Discovers clusters of arbitrary
shape
Where are the clusters?
Density Definition
● ε-Neighborhood: All points q within a radius of ε from a given point p.
Any distance function can be used based on the application
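As a small sketch of this definition, the function below collects the ε-neighborhood of a point under a pluggable distance function. The names `eps_neighborhood` and `manhattan` and the sample points are hypothetical. The point p is counted as part of its own neighborhood and the boundary is inclusive (distance ≤ ε), which is a common convention but still an assumption here.

```python
import math

def eps_neighborhood(points, p, eps, dist=math.dist):
    """Return all points q with dist(p, q) <= eps (p itself included)."""
    return [q for q in points if dist(p, q) <= eps]

def manhattan(a, b):
    """City-block distance, as an example of an alternative metric."""
    return sum(abs(x - y) for x, y in zip(a, b))

points = [(0, 0), (1, 0), (0, 1), (1, 1), (3, 3)]
p = (0, 0)

euclid_nbrs = eps_neighborhood(points, p, eps=1.0)
manhattan_nbrs = eps_neighborhood(points, p, eps=2.0, dist=manhattan)
print(euclid_nbrs)     # [(0, 0), (1, 0), (0, 1)]
print(manhattan_nbrs)  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Changing the distance function changes which points count as neighbors, so the metric should match what "close" means in the application.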
Density Definition
● High Density: ε-Neighborhood of a point contains at least MinPts
number of points.
Example:
Density of p is “high” (MinPts = 4)
Density of q is “low” (MinPts = 4)
Type of Points
● A point is a core point if it has at least MinPts number of points
within ε. These are points that are at the interior of a cluster.
● A border point has fewer than MinPts points within ε, but is in the
neighborhood of a core point.
● An outlier point is any point that is neither a core point nor a
border point.
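The three point types can be computed directly from these definitions. The following is a minimal sketch: `classify_points` and the sample data are hypothetical, Euclidean distance is assumed, and each point is counted in its own ε-neighborhood.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'outlier'."""
    # Neighborhoods stored as index lists; a point counts itself (assumption).
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_pts for n in neighbors]

    labels = []
    for i in range(len(points)):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense itself, but within eps of some core point.
            labels.append("border")
        else:
            labels.append("outlier")
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (6, 6)]
labels = classify_points(points, eps=1.0, min_pts=3)
for p, label in zip(points, labels):
    print(p, label)
```

Here the four corners of the unit square are core points, (2, 1) is a border point because its only neighbor (1, 1) is a core point, and (6, 6) is an outlier.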
Type of Points
MinPts = 5
Density Reachability
Directly density-reachable
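A point q is directly density-reachable from p if p is a core point and q lies in the ε-neighborhood of p.
Is q directly density-reachable from p? ✅
Is p directly density-reachable from q? ❌ (q is not a core point)
Is density-reachability symmetric? ❌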
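The asymmetry can be reproduced on a small hypothetical dataset. In this sketch, `is_core` and `directly_reachable` are illustrative helper names, the coordinates are made up, and MinPts = 4 with ε = 1 is assumed. Here p is a core point while q is not.

```python
import math

def is_core(points, p, eps, min_pts):
    """A point is core if its eps-neighborhood (itself included) has >= min_pts points."""
    return sum(math.dist(p, x) <= eps for x in points) >= min_pts

def directly_reachable(points, q, p, eps, min_pts):
    """True if q is directly density-reachable from p."""
    return is_core(points, p, eps, min_pts) and math.dist(p, q) <= eps

points = [(0, 0), (0, 1), (1, 0), (-1, 0), (2, 0)]
p, q = (0, 0), (1, 0)
eps, min_pts = 1.0, 4

q_from_p = directly_reachable(points, q, p, eps, min_pts)
p_from_q = directly_reachable(points, p, q, eps, min_pts)
print("q from p:", q_from_p)  # True: p is core and q is within eps of p
print("p from q:", p_from_q)  # False: q has only 3 points within eps
```

The distance between p and q is the same in both directions; only the core-point requirement on the starting point breaks the symmetry.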
Density Reachability
Density-reachable (Indirectly)
● A point p is directly density-reachable from p2
● p2 is directly density-reachable from p1
● p1 is directly density-reachable from q
p ← p2 ← p1 ← q form a chain
● p is (indirectly) density-reachable from q
● q is not density-reachable from p
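One way to test density-reachability is a breadth-first search that only continues through core points, since every link in the chain must start at a core point. The sketch below assumes Euclidean distance and hashable tuple points; the function name `density_reachable` and the collinear sample data are hypothetical.

```python
import math
from collections import deque

def density_reachable(points, target, source, eps, min_pts):
    """True if `target` is density-reachable from `source`."""
    def neighbors(p):
        return [x for x in points if math.dist(p, x) <= eps]

    seen = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        nbrs = neighbors(current)
        if len(nbrs) < min_pts:
            continue  # a non-core point cannot extend the chain
        for x in nbrs:
            if x == target:
                return True
            if x not in seen:
                seen.add(x)
                queue.append(x)
    return False

# Collinear toy data: q, p1, p2 are core points; p sits at the end of the chain.
points = [(-1, 0), (0, 0), (1, 0), (2, 0), (3, 0)]
q, p = (0, 0), (3, 0)

p_from_q = density_reachable(points, p, q, eps=1.0, min_pts=3)
q_from_p = density_reachable(points, q, p, eps=1.0, min_pts=3)
print("p from q:", p_from_q)  # True, via the chain q -> (1, 0) -> (2, 0) -> p
print("q from p:", q_from_p)  # False, because p is not a core point
```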
Density Connectivity
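Density-connected
● Two points p and q are density-connected if both are density-reachable from some common point o.
● Unlike density-reachability, density-connectivity is symmetric.
https://www.geeksforgeeks.org/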
DBSCAN Algorithm
● ε = 2 cm
● MinPts = 3
DBSCAN Algorithm
for each p ∈ D do
    if p is not yet classified then
        if p is a core-object then
            collect all objects density-reachable from p
            and assign them to a new cluster
        else
            assign p to outlier
            (provisionally: p becomes a border point if it is later
            reached from a core-object)
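The pseudocode above can be turned into a short runnable sketch. This is not an optimized or reference implementation: it assumes Euclidean distance, uses a brute-force neighborhood query, and the names `dbscan`, `region_query`, `NOISE`, and `UNCLASSIFIED` as well as the sample points are hypothetical. The parameters ε = 2 and MinPts = 3 simply echo the slide values.

```python
import math

NOISE = -1           # label for outliers
UNCLASSIFIED = None  # label for points not visited yet

def dbscan(points, eps, min_pts):
    """Return one cluster id per point; -1 marks outliers."""
    labels = [UNCLASSIFIED] * len(points)

    def region_query(i):
        # Brute-force eps-neighborhood of point i (indices, i itself included).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNCLASSIFIED:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # provisional; may become a border point later
            continue
        # Point i is a core object: grow a new cluster from it.
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former outlier becomes a border point
            if labels[j] is not UNCLASSIFIED:
                continue  # already assigned, do not expand again
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is core too, keep expanding
        cluster_id += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (10, 0)]
labels = dbscan(points, eps=2.0, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The two squares form clusters 0 and 1, and the isolated point (10, 0) is reported as an outlier. Each call to `region_query` scans every point, so this sketch performs a full pass over the data per visited point.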
What is the time complexity?
What is the space complexity?
DBSCAN is Sensitive to Parameters
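A toy experiment can make this sensitivity concrete. The sketch below counts clusters as connected components of core points, where two core points are linked if they lie within ε of each other; this matches the number of clusters DBSCAN finds but does not assign border points. The function name `count_clusters`, the data, and the chosen ε values are all hypothetical.

```python
import math

def count_clusters(points, eps, min_pts):
    """Count clusters as connected components of core points."""
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    core = [i for i in range(n) if len(nbrs[i]) >= min_pts]
    core_set = set(core)

    seen = set()
    clusters = 0
    for start in core:
        if start in seen:
            continue
        clusters += 1  # unseen core point starts a new component
        seen.add(start)
        stack = [start]
        while stack:
            i = stack.pop()
            for j in nbrs[i]:
                if j in core_set and j not in seen:
                    seen.add(j)
                    stack.append(j)
    return clusters

# Two tight groups of three points, separated by a moderate gap.
points = [(0.0, 0), (0.5, 0), (1.0, 0), (3.0, 0), (3.5, 0), (4.0, 0)]
results = {eps: count_clusters(points, eps, min_pts=3)
           for eps in (0.3, 0.6, 2.2)}
for eps, k in results.items():
    print(f"eps={eps}: {k} cluster(s)")
```

On this data a small ε finds no clusters at all (every point is an outlier), a moderate ε finds the two groups, and a large ε merges them into one.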
DBSCAN vs. K Means
DBSCAN
● Two parameters (ε and MinPts) are required for training the model.
● Clusters formed in DBSCAN can be of any arbitrary shape.
● DBSCAN can work well with datasets having noise and outliers.
K Means
● Only one parameter (the number of clusters) is required for training the model.
● Clusters formed in K-Means are spherical or convex in shape.
● K-Means is very sensitive to the number of clusters, so it needs to be specified.
● K-Means does not work well with data containing outliers.
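The outlier point in the comparison can be illustrated with a small sketch of the standard K-Means iteration (Lloyd's algorithm). The helper name `kmeans`, the data, and the fixed initial centroids are hypothetical; deterministic initialization is used only to keep the example reproducible, whereas practical implementations usually initialize randomly or with k-means++.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm with caller-supplied initial centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters

group_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
group_b = [(5, 5), (5, 6), (6, 5), (6, 6)]
outlier = (30, 30)
init = [(0, 0), (6, 6)]

clean_centroids, _ = kmeans(group_a + group_b, init)
noisy_centroids, noisy_clusters = kmeans(group_a + group_b + [outlier], init)
print("without outlier:", clean_centroids)
print("with outlier:   ", noisy_centroids)
```

Without the outlier, the centroids settle at the centres of the two groups. With it, the outlier drags one centroid away until it ends up with a cluster of its own, and the two real groups are merged under the other centroid. A density-based method would instead flag such a point as noise.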