0% found this document useful (0 votes)
50 views

Clustering in Machine Learning

Density-based clustering locates regions of high density that are separated by regions of low density. DBSCAN is a density-based clustering algorithm that discovers clusters of arbitrary shapes and identifies outliers. It requires two parameters, Eps which defines the neighborhood distance, and MinPts which defines the minimum number of points in the neighborhood. A point is part of a cluster if it is density-reachable from another point in the cluster. DBSCAN can discover clusters of arbitrary shapes and handles outliers but is sensitive to parameter selection.

Uploaded by

Peter Machado
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
50 views

Clustering in Machine Learning

Density-based clustering locates regions of high density that are separated by regions of low density. DBSCAN is a density-based clustering algorithm that discovers clusters of arbitrary shapes and identifies outliers. It requires two parameters, Eps which defines the neighborhood distance, and MinPts which defines the minimum number of points in the neighborhood. A point is part of a cluster if it is density-reachable from another point in the cluster. DBSCAN can discover clusters of arbitrary shapes and handles outliers but is sensitive to parameter selection.

Uploaded by

Peter Machado
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 26

Density-Based Clustering

Izabela Moise, Evangelos Pournaras, Dirk Helbing

Izabela Moise, Evangelos Pournaras, Dirk Helbing 1


Reminder

Unsupervised data mining


X Clustering
→ k -Means

Izabela Moise, Evangelos Pournaras, Dirk Helbing 2


Main Clustering Approaches

• Partitioning method
→ constructs partitions of data points
→ evaluates the partitions by some criterion
→ k -means, k -medoids
• Density-based method:
→ based on connectivity and density functions
→ DBSCAN, DJCluster

Izabela Moise, Evangelos Pournaras, Dirk Helbing 3


Density-Based Clustering

Izabela Moise, Evangelos Pournaras, Dirk Helbing 4


Density-Based Clustering

Density-Based Clustering
locates regions of high density that are separated from one another
by regions of low density.

Izabela Moise, Evangelos Pournaras, Dirk Helbing 4


Main principles

• Two parameters:
1. maximum radius of the neighbourhood → Eps
2. minimum number of points in an Eps neighbourhood of a point
→ MinPts
• NEps (p) : {q ∈ D s.t. dist(p, q) ≤ Eps }
• Key idea: the density of the neighbourhood has to exceed
some threshold.
• The shape of a neighbourhood depends on the dist function

Izabela Moise, Evangelos Pournaras, Dirk Helbing 5


Main principles

• Two parameters:
1. maximum radius of the neighbourhood → Eps
2. minimum number of points in an Eps neighbourhood of a point
→ MinPts
• NEps (p) : {q ∈ D s.t. dist(p, q) ≤ Eps }
• Key idea: the density of the neighbourhood has to exceed
some threshold.
• The shape of a neighbourhood depends on the dist function

Izabela Moise, Evangelos Pournaras, Dirk Helbing 5


Main principles

• Two parameters:
1. maximum radius of the neighbourhood → Eps
2. minimum number of points in an Eps neighbourhood of a point
→ MinPts
• NEps (p) : {q ∈ D s.t. dist(p, q) ≤ Eps }
• Key idea: the density of the neighbourhood has to exceed
some threshold.
• The shape of a neighbourhood depends on the dist function

Izabela Moise, Evangelos Pournaras, Dirk Helbing 5


Core, Border and Noise/Outlier

Izabela Moise, Evangelos Pournaras, Dirk Helbing 6


1
Jing Gao, SUNY Buffalo
Directly Density-Reachable

Directly density-reachable:
→ A point p is directly density-reachable from a point q wrt. Eps,
MinPts if:

1. p ∈ NEps (q) and


2. |NEps (q)| ≥ MinPts

Izabela Moise, Evangelos Pournaras, Dirk Helbing 7


Directly Density-Reachable

Directly density-reachable:
→ A point p is directly density-reachable from a point q wrt. Eps,
MinPts if:

1. p ∈ NEps (q) and


2. |NEps (q)| ≥ MinPts

Izabela Moise, Evangelos Pournaras, Dirk Helbing 7


Density-Reachable

• Density-reachable:
→ A point p is density-reachable from a point q wrt. Eps,
MinPts if there is a chain of points p1 , ..., pn , with
p1 = q, pn = p, s.t.pi+1 is directly density reachable from pi
• transitive but not symmetric

Izabela Moise, Evangelos Pournaras, Dirk Helbing 8


Density-Connected

Density-connected:
→ A point p is density-connected from a point q wrt. Eps, MinPts if
there is a point o s.t. p and q are density-reachable from o wrt. Eps
and MinPts

Izabela Moise, Evangelos Pournaras, Dirk Helbing 9


Density-Connected

Density-connected:
→ A point p is density-connected from a point q wrt. Eps, MinPts if
there is a point o s.t. p and q are density-reachable from o wrt. Eps
and MinPts
→ symmetric

Izabela Moise, Evangelos Pournaras, Dirk Helbing 9


Density-Connected

Density-connected:
→ A point p is density-connected from a point q wrt. Eps, MinPts if
there is a point o s.t. p and q are density-reachable from o wrt. Eps
and MinPts
→ symmetric

Izabela Moise, Evangelos Pournaras, Dirk Helbing 9


DBSCAN - Density-Based Spatial Clustering of Applications
with Noise

Izabela Moise, Evangelos Pournaras, Dirk Helbing 10


Main Principles

One of the most cited clustering algorithms


Main principle:
a cluster is defined as a maximal set of density-connected points.

• Discovers clusters of arbitrary shapes (spherical, elongated,


linear), and noise
• Works with spatial datasets:
→ geomarketing, tomography, satellite images
• Requires only two parameters (no prior knowledge of the
number of clusters)

Izabela Moise, Evangelos Pournaras, Dirk Helbing 11


Definition: Cluster

2
Erik Kropat, University of the Bundeswehr Munich

Izabela Moise, Evangelos Pournaras, Dirk Helbing 12


Definition: Noise

3
Erik Kropat, University of the Bundeswehr Munich

Izabela Moise, Evangelos Pournaras, Dirk Helbing 13


The Algorithm

1. Randomly select a point p


2. Retrieve all points density-reachable from p wrt. Eps and
MinPts
3. If p is a core point, a cluster is formed
4. If p is a border point, no points are density-reachable from p →
visit the next data point
5. Continue the process until all points have been processed

Izabela Moise, Evangelos Pournaras, Dirk Helbing 14


Selecting Eps and MinPts

The two parameters can be determined by a heuristic

Observation:
• For points in a cluster their k -th nearest neighbours are at
roughly the same distance.
• Noise points have the k -th nearest neighbour at farther
distance.

Izabela Moise, Evangelos Pournaras, Dirk Helbing 15


4

4
Erik Kropat, University of the Bundeswehr Munich
Izabela Moise, Evangelos Pournaras, Dirk Helbing 16
5

5
Erik Kropat, University of the Bundeswehr Munich
Izabela Moise, Evangelos Pournaras, Dirk Helbing 17
6

6
Erik Kropat, University of the Bundeswehr Munich
Izabela Moise, Evangelos Pournaras, Dirk Helbing 18
Pros and Cons

Pros:
X discovers clusters of arbitrary shapes
X handles noise
X needs density parameters as termination condition

Izabela Moise, Evangelos Pournaras, Dirk Helbing 19


Pros and Cons

Cons:
X cannot handle varying densities
X sensitive to parameters → hard to determine the correct set of
parameters
X sampling affects density measures

Izabela Moise, Evangelos Pournaras, Dirk Helbing 20

You might also like