Module 10
• Introduction to Clustering
• Understand K-Means clustering
• Learn DBSCAN, a density-based clustering algorithm
• Compare K-Means clustering with DBSCAN
Let’s Look At…
And Now Clustering…
Clustering
Clustering: K Means
Limitations: K Means
Clustering: K Means
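Hard clustering and soft clustering
In hard clustering, each data point belongs to exactly one cluster.
In soft clustering, a data point can belong to several clusters, with a degree of membership (for example, a probability) for each.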
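K-Means is a hard clustering method: every point ends up with exactly one cluster label. The sketch below is a minimal plain-Python version of Lloyd's algorithm (the standard K-Means iteration), assuming 2-D points, Euclidean distance, and, for simplicity, the first k points as initial centroids; practical implementations usually use random or k-means++ initialization instead. The function name `kmeans` and the sample points are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids); each point gets exactly one label."""
    # Simplifying assumption: use the first k points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step (hard): each point joins its single nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep the old centroid
        # Stop once neither the assignments nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Hypothetical 2-D data: two well-separated groups of three points each.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (1.0, 0.5), (9.0, 8.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Each entry of `labels` is a single cluster id, which is exactly what "hard" means here; a soft method would instead return a membership weight per cluster for every point.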
DBSCAN
Density-based spatial clustering of applications with noise (DBSCAN)
● DBSCAN is a density-based spatial clustering algorithm proposed in 1996
by Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu.
● It identifies groups of data points that are densely packed together and flags isolated points in low-density regions as outliers (noise).
● DBSCAN is one of the most widely used and most frequently cited clustering algorithms in data analysis and machine learning.
Density-Based Clustering
Basic idea
● Clusters are dense regions in
the data space, separated by
regions of lower object density
● Discovers clusters of arbitrary
shape
Where are the clusters?
Density Definition
● ε-Neighborhood: all points q within a distance ε of a given point p, written N_ε(p) = {q ∈ D | dist(p, q) ≤ ε}, where D is the dataset.
● High density: a point has high density if its ε-neighborhood contains at least MinPts points.
Example:
Density of p is “high” (MinPts = 4)
Density of q is “low” (MinPts = 4)
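Both definitions translate directly into code. A minimal sketch, assuming Euclidean distance and a brute-force search over all points; following the convention of the original DBSCAN paper, a point's ε-neighborhood includes the point itself. The helper names and the sample points (index 0 plays the role of a dense point like p, index 5 an isolated point like q) are hypothetical.

```python
import math


def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, other in enumerate(points) if math.dist(points[i], other) <= eps]


def is_high_density(points, i, eps, min_pts):
    """High density: the ε-neighborhood of points[i] holds at least min_pts points."""
    return len(eps_neighborhood(points, i, eps)) >= min_pts


# Hypothetical data: point 0 has four close neighbors, point 5 is isolated.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5), (3.0, 3.0)]
print(eps_neighborhood(points, 0, eps=1.0))            # [0, 1, 2, 3, 4]
print(is_high_density(points, 0, eps=1.0, min_pts=4))  # True  ("high", like p)
print(is_high_density(points, 5, eps=1.0, min_pts=4))  # False ("low", like q)
```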
Type of Points
● A point is a core point if its ε-neighborhood contains at least MinPts points. Core points lie in the interior of a cluster.
● A border point has fewer than MinPts points within ε, but lies in the ε-neighborhood of a core point.
● An outlier (noise) point is any point that is neither a core point nor a border point.
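Because every point falls into exactly one of the three categories, the classification can be sketched as below, assuming Euclidean distance, brute-force neighborhoods that include the point itself, and hypothetical sample data. The function name `classify_points` is illustrative only.

```python
import math


def classify_points(points, eps, min_pts):
    """Return 'core', 'border', or 'outlier' for every point."""
    n = len(points)
    # ε-neighborhood of every point (brute force, includes the point itself).
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    types = []
    for i in range(n):
        if is_core[i]:
            types.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not core, but inside the ε-neighborhood of at least one core point
            # (distance is symmetric, so j near i means i is near j).
            types.append("border")
        else:
            types.append("outlier")
    return types


# Hypothetical data: a dense group, one point at its edge, and one isolated point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5), (1.4, 0.0), (5.0, 5.0)]
print(classify_points(points, eps=1.1, min_pts=5))
# ['core', 'core', 'core', 'core', 'core', 'border', 'outlier']
```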
Example (figure): core, border, and outlier points for MinPts = 5
Density Reachability
Directly density-reachable: a point q is directly density-reachable from a point p if q lies in the ε-neighborhood of p and p is a core point.
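Density-reachable (indirectly): a point p is density-reachable from a point q if there is a chain of points from q to p in which each point is directly density-reachable from the previous one.
Example: p ← p2 ← p1 ← q form a chain (p1 is directly density-reachable from q, p2 from p1, and p from p2), so p is density-reachable from q.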
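Density-reachability can be checked with a breadth-first search that only continues through core points, since a direct step may only start from a core point. A minimal sketch, assuming Euclidean distance, brute-force neighborhoods, and hypothetical data laid out like the chain q → p1 → p2 → p; the function names are illustrative.

```python
import math
from collections import deque


def neighborhood(points, i, eps):
    """Indices within distance eps of points[i] (including i itself)."""
    return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]


def density_reachable(points, q, p, eps, min_pts):
    """True if points[p] is density-reachable from points[q] (for p != q)."""
    visited = {q}
    frontier = deque([q])
    while frontier:
        i = frontier.popleft()
        nbrs = neighborhood(points, i, eps)
        if len(nbrs) < min_pts:
            continue  # i is not a core point, so no direct step can start from it
        for j in nbrs:
            if j == p:
                return True  # p is directly density-reachable from core point i
            if j not in visited:
                visited.add(j)
                frontier.append(j)
    return False


# Hypothetical data on a line: q, p1, p2, p, plus one extra point next to q.
q, p = 0, 3  # indices; p1 = 1 and p2 = 2 are the intermediate chain points
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (-1.0, 0.0)]
print(density_reachable(points, q, p, eps=1.2, min_pts=3))  # True
print(density_reachable(points, p, q, eps=1.2, min_pts=3))  # False
```

The second call shows that the relation is not symmetric: p has only two points in its ε-neighborhood, so it is a border point and no chain can start from it.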
Density Connectivity
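Density-connected: two points p and q are density-connected if there is a point o from which both p and q are density-reachable.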
https://www.geeksforgeeks.org/
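Density-connectivity is symmetric even though density-reachability is not: two border points can never reach each other, yet they are density-connected through a core point between them. A minimal sketch, assuming Euclidean distance and brute-force neighborhoods, that tries every point as the middle point o; the function names and data are hypothetical.

```python
import math
from collections import deque


def reachable_from(points, o, eps, min_pts):
    """Set of indices density-reachable from points[o]; empty if o is not a core point."""
    n = len(points)

    def nbrs(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    if len(nbrs(o)) < min_pts:
        return set()
    reached, frontier = {o}, deque([o])
    while frontier:
        i = frontier.popleft()
        neighbors = nbrs(i)
        if len(neighbors) < min_pts:
            continue  # border points are reached but do not extend the chain
        for j in neighbors:
            if j not in reached:
                reached.add(j)
                frontier.append(j)
    return reached


def density_connected(points, p, q, eps, min_pts):
    """True if some point o has both p and q density-reachable from it."""
    return any(
        p in r and q in r
        for r in (reachable_from(points, o, eps, min_pts) for o in range(len(points)))
    )


# Hypothetical data: border, core, core, core, border on a line, plus a far point.
points = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 10.0)]
print(4 in reachable_from(points, 0, eps=1.2, min_pts=3))   # False: point 0 is not core
print(density_connected(points, 0, 4, eps=1.2, min_pts=3))  # True, e.g. via o = 2
print(density_connected(points, 0, 5, eps=1.2, min_pts=3))  # False
```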
DBSCAN Algorithm
● ε = 2 cm
● MinPts = 3
DBSCAN Algorithm
for each p ∈ D do
    if p is not yet classified then
        if p is a core object then
            collect all objects density-reachable from p
            and assign them to a new cluster
        else
            mark p as an outlier (noise)
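The pseudocode can be turned into a runnable sketch. This version assumes Euclidean distance, a brute-force region query, and the label -1 for outliers; like the original algorithm, it lets a point first marked as an outlier be relabeled as a border point when a later cluster reaches it. The names (`dbscan`, `region_query`, `absorb`, `UNCLASSIFIED`, `NOISE`) and the sample coordinates are hypothetical; the parameters reuse the example values ε = 2 and MinPts = 3.

```python
import math
from collections import deque

UNCLASSIFIED = None
NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: cluster ids 0, 1, ... or NOISE (-1) for outliers."""
    n = len(points)
    labels = [UNCLASSIFIED] * n

    def region_query(i):
        # Brute-force ε-neighborhood of points[i] (includes i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def absorb(neighbors, cluster_id, seeds):
        # Add the neighbors of a core object to the current cluster.
        for r in neighbors:
            if labels[r] is UNCLASSIFIED:
                labels[r] = cluster_id
                seeds.append(r)            # expanded later if it is a core object
            elif labels[r] == NOISE:
                labels[r] = cluster_id     # former outlier becomes a border point

    cluster_id = 0
    for p in range(n):                     # for each p ∈ D
        if labels[p] is not UNCLASSIFIED:  # skip points that are already classified
            continue
        neighbors = region_query(p)
        if len(neighbors) < min_pts:       # p is not a core object
            labels[p] = NOISE
            continue
        # p is a core object: collect everything density-reachable from p.
        labels[p] = cluster_id
        seeds = deque()
        absorb(neighbors, cluster_id, seeds)
        while seeds:
            q = seeds.popleft()
            q_neighbors = region_query(q)
            if len(q_neighbors) >= min_pts:  # only core objects extend the cluster
                absorb(q_neighbors, cluster_id, seeds)
        cluster_id += 1
    return labels


# Hypothetical coordinates (in cm) with ε = 2 and MinPts = 3.
points = [
    (2.8, 0.0),                                      # near group A, but not core
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),  # dense group A
    (10.0, 10.0), (11.0, 10.0), (10.0, 11.0),        # dense group B
    (5.0, 20.0),                                     # isolated point
]
print(dbscan(points, eps=2.0, min_pts=3))  # [0, 0, 0, 0, 0, 1, 1, 1, -1]
```

Point 0 is visited first and marked as an outlier, then relabeled as a border point of cluster 0 once the core point (1.0, 0.0) reaches it; the point at (5.0, 20.0) stays noise.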
What is the time complexity?
What is the space complexity?
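One common answer to the two questions, assuming a straightforward implementation with a brute-force region query, where each of the n points triggers at most one region query, each query computes n distances, and every point enters the seed list at most once:

```latex
\begin{aligned}
T(n) &= \underbrace{n}_{\text{region queries}} \cdot \underbrace{O(n)}_{\text{distances per query}} = O(n^2) \\
S(n) &= \underbrace{O(n)}_{\text{labels}} + \underbrace{O(n)}_{\text{seed list}} + \underbrace{O(n)}_{\text{one neighborhood}} = O(n)
\end{aligned}
```

Precomputing the full n × n distance matrix raises the memory to O(n²). A spatial index such as an R*-tree (used in the original paper) can make region queries much faster on low-dimensional data, but the worst case stays O(n²), for example when ε is so large that every neighborhood contains most of the points.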
DBSCAN is Sensitive to Parameters
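To see how strongly the result depends on ε and MinPts, the sketch below runs one compact DBSCAN on the same hypothetical data with several parameter settings. It assumes Euclidean distance and brute-force neighborhoods; the data and parameter values are illustrative only.

```python
import math


def dbscan(points, eps, min_pts):
    """Compact DBSCAN: cluster ids 0, 1, ... and -1 for noise."""
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    core = [len(nb) >= min_pts for nb in nbrs]
    labels = [-1] * n
    cluster_id = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue  # start new clusters only from unlabeled core points
        labels[i] = cluster_id
        stack = [i]
        while stack:
            k = stack.pop()
            for j in nbrs[k]:
                if labels[j] == -1:
                    labels[j] = cluster_id
                    if core[j]:
                        stack.append(j)  # only core points extend the cluster
        cluster_id += 1
    return labels


# Hypothetical data: two 2 x 2 grids of points 3 units apart, plus one isolated point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (4, 0), (5, 0), (4, 1), (5, 1), (10, 10)]
results = {}
for eps, min_pts in [(0.5, 3), (1.5, 3), (3.5, 3), (1.5, 5)]:
    labels = dbscan(points, eps, min_pts)
    n_clusters = len(set(labels) - {-1})
    results[(eps, min_pts)] = (n_clusters, labels.count(-1))
    print(f"eps={eps}, MinPts={min_pts}: {n_clusters} clusters, {labels.count(-1)} noise points")
```

On this data, ε = 0.5 finds no clusters (every point is noise), ε = 1.5 finds the two groups, ε = 3.5 merges them into a single cluster, and raising MinPts to 5 at ε = 1.5 turns every point into noise again.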
DBSCAN vs. K Means
DBSCAN:
● DBSCAN requires two parameters, ε and MinPts, to train the model.
● DBSCAN works well with datasets that contain noise and outliers.
K Means:
● K-Means requires only one parameter, the number of clusters k, to train the model.
● K-Means is very sensitive to the number of clusters, which must be specified in advance.
● K-Means does not work well with data that contain outliers.
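The parameter and outlier differences can be seen on one small, hypothetical dataset: two compact groups plus one distant outlier. K-Means is given k = 2 and, as a simplifying assumption, uses the first two points as initial centroids; DBSCAN is given ε = 1.5 and MinPts = 3. Both are compact plain-Python sketches, not library implementations.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's iterations; every point always receives one of k cluster labels."""
    centroids = list(points[:k])  # naive initialization (assumption)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def dbscan(points, eps, min_pts):
    """Compact DBSCAN: cluster ids 0, 1, ... and -1 for noise."""
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    core = [len(nb) >= min_pts for nb in nbrs]
    labels = [-1] * n
    cluster_id = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue  # start new clusters only from unlabeled core points
        labels[i] = cluster_id
        stack = [i]
        while stack:
            for j in nbrs[stack.pop()]:
                if labels[j] == -1:
                    labels[j] = cluster_id
                    if core[j]:
                        stack.append(j)  # only core points extend the cluster
        cluster_id += 1
    return labels


# Hypothetical data: group A, group B, and one far outlier at index 6.
points = [(0, 0), (5, 0), (0, 1), (1, 0), (5, 1), (6, 0), (20, 20)]
km_labels = kmeans(points, k=2)
db_labels = dbscan(points, eps=1.5, min_pts=3)
print(km_labels)  # [0, 0, 0, 0, 0, 0, 1]
print(db_labels)  # [0, 1, 0, 0, 1, 1, -1]
```

With this initialization the outlier pulls one centroid toward itself, so K-Means merges the two real groups into cluster 0 and gives the outlier a cluster of its own; DBSCAN recovers both groups and labels the outlier -1 (noise). A different K-Means initialization can give a different split, but K-Means never leaves a point unassigned.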