
11 Grid Based Methods 04-11-2024


Eick: Topics9---Clustering 2

Density-based Clustering


Density-Based Clustering Methods

• Clustering based on density (local cluster criterion), such as density-connected points or an explicitly constructed density function
• Major features:
  - Discover clusters of arbitrary shape
  - Handle noise
  - One scan
  - Need density parameters (e.g., Eps and MinPts in DBSCAN)
• Several interesting studies:
  - DBSCAN: Ester, et al. (KDD'96)
  - DENCLUE: Hinneburg & Keim (KDD'98/2006)
  - OPTICS: Ankerst, et al. (SIGMOD'99)
  - CLIQUE: Agrawal, et al. (SIGMOD'98)

DBSCAN
(http://www2.cs.uh.edu/~ceick/7363/Papers/dbscan.pdf)

• DBSCAN is a density-based algorithm.
• Density = number of points within a specified radius (Eps)
• A point is a core point if it has at least a specified number of points (MinPts) within Eps, counting the point itself
  - These are points in the interior of a cluster
• A border point has fewer than MinPts points within Eps, but is in the neighborhood of a core point
• A noise point is any point that is neither a core point nor a border point.
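To make the three point types concrete, here is a minimal Python sketch that labels 2-D points by brute-force neighborhood search. It assumes Euclidean distance and follows the original DBSCAN definition in which a point's Eps-neighborhood includes the point itself, so a core point needs at least MinPts points within Eps. The names `region_query` and `classify_points` are hypothetical helpers for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def classify_points(points: Sequence[Point], eps: float, min_pts: int) -> List[str]:
    """Label each point 'core', 'border' or 'noise'."""
    neighborhoods = [region_query(points, i, eps) for i in range(len(points))]
    # Core: at least min_pts points (itself included) in its eps-neighborhood.
    is_core = [len(nb) >= min_pts for nb in neighborhoods]
    labels = []
    for i, nb in enumerate(neighborhoods):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in nb):
            # Not dense itself, but lies in the neighborhood of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels


# Four corners of a unit square, one point just right of it, one far away.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0.5), (10, 10)]
print(classify_points(pts, eps=1.5, min_pts=4))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```

The point (2, 0.5) has only three points within Eps (itself and two square corners), so it is not a core point, but it lies within Eps of a core point and therefore counts as a border point.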

DBSCAN: Core, Border, and Noise Points

DBSCAN Algorithm (simplified view for teaching)


1. Create a graph whose nodes are the points to be clustered.
2. For each core point c, create an edge from c to every point p in the Eps-neighborhood of c.
3. Set N to the nodes of the graph.
4. If N does not contain any core points, terminate.
5. Pick a core point c in N.
6. Let X be the set of nodes that can be reached from c by going forward (following edges):
   6.1 create a cluster containing X ∪ {c};
   6.2 N = N \ (X ∪ {c}).
7. Continue with step 4.
Remark: points that are not assigned to any cluster are outliers.
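The graph-based steps can be sketched in Python roughly as follows. This is an illustrative, brute-force sketch assuming Euclidean distance and an Eps-neighborhood that includes the point itself; `dbscan_graph` is a hypothetical name. One interpretation is added where the steps leave a case open: reachability only walks through nodes still in N, so a border point reachable from two clusters stays with the first cluster that claims it.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]


def dbscan_graph(points: Sequence[Point], eps: float, min_pts: int) -> Tuple[List[Set[int]], Set[int]]:
    """Graph view of DBSCAN: returns (clusters, outliers) as sets of point indices."""
    n = len(points)
    # Eps-neighborhood of every point (brute force, includes the point itself).
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    core = {i for i in range(n) if len(nbrs[i]) >= min_pts}
    # Steps 1-2: nodes are point indices; edges go from each core point to its neighbors.
    edges: Dict[int, List[int]] = {c: [p for p in nbrs[c] if p != c] for c in core}
    remaining = set(range(n))  # Step 3: N = all nodes.
    clusters: List[Set[int]] = []
    while core & remaining:  # Step 4: stop when N contains no core points.
        c = min(core & remaining)  # Step 5: pick a core point (lowest index, for determinism).
        # Step 6: X ∪ {c} = everything reachable from c along forward edges,
        # visiting only nodes that are still in N (assumption for shared border points).
        reached = {c}
        stack = [c]
        while stack:
            u = stack.pop()
            for v in edges.get(u, []):
                if v in remaining and v not in reached:
                    reached.add(v)
                    stack.append(v)
        clusters.append(reached)  # 6.1: create the cluster
        remaining -= reached  # 6.2: N = N \ (X ∪ {c})
    # Remark: points never assigned to a cluster are outliers.
    return clusters, remaining


square_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
square_b = [(5, 5), (5, 6), (6, 5), (6, 6)]
clusters, outliers = dbscan_graph(square_a + square_b + [(10, 0)], eps=1.5, min_pts=4)
print(clusters)  # [{0, 1, 2, 3}, {4, 5, 6, 7}]
print(outliers)  # {8}
```

Edges only leave core points, so a border point can be reached but never extends a cluster further; this is what stops clusters from chaining through sparse regions.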

DBSCAN: Core, Border and Noise Points

[Figure: "Original Points" (left) and "Point types: core, border and noise" (right), with Eps = 10, MinPts = 4]



When DBSCAN Works Well

[Figure: "Original Points" (left) and "Clusters" (right)]

• Resistant to noise
• Can handle clusters of different shapes and sizes

When DBSCAN Does NOT Work Well

[Figure: "Original Points" and two DBSCAN results, one with MinPts=4, Eps=9.75 and one with MinPts=4, Eps=9.92]

• Varying densities
• High-dimensional data

DBSCAN: Determining Eps and MinPts

• The idea is that for points in a cluster, their kth nearest neighbors are at roughly the same distance
• Noise points have their kth nearest neighbor at a farther distance
• So, plot the sorted distance of every point to its kth nearest neighbor

[Figure: sorted kth-nearest-neighbor distances, with regions labeled "Core-points" and "Non-Core-points"; annotation: "Run K-means for Minp=4 and not fixed"]
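The k-distance idea can be sketched in a few lines of Python. This brute-force sketch assumes Euclidean distance and does not count a point as its own neighbor; conventions differ on whether k should equal MinPts or MinPts - 1 when MinPts counts the point itself. A common way to read the plot (a heuristic, not stated on the slide) is to take MinPts from k and choose Eps near where the sorted curve bends sharply upward. `sorted_k_distances` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sorted_k_distances(points: Sequence[Point], k: int) -> List[float]:
    """Distance from every point to its kth nearest neighbor, sorted ascending (needs k < len(points))."""
    kdist = []
    for i, p in enumerate(points):
        # Distances to all other points; the point itself is excluded here.
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kdist.append(others[k - 1])
    return sorted(kdist)


pts = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
print(sorted_k_distances(pts, k=2))  # [1.0, 1.0, 2.0, 2.0, 8.0]
```

In this toy example the four evenly spaced points have a 2nd-nearest-neighbor distance of 1 or 2, while the isolated point at x = 10 jumps to 8, which is the kind of jump the plot is meant to reveal.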



Complexity DBSCAN
• Time complexity: O(n²), since for each point it has to be determined whether it is a core point (n is the number of objects to be clustered); this can be reduced to O(n log n) in lower-dimensional spaces by using efficient data structures.
• Space complexity: O(n).
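One way to avoid comparing every pair of points in low dimensions is a spatial index. The sketch below swaps in a simple uniform grid for 2-D points (not the R*-tree used in the original DBSCAN paper): with cell side Eps, every point within distance Eps of a query point lies in the query's cell or one of its 8 neighboring cells, so only those cells are scanned. The helper names `cell_of`, `build_grid` and `grid_region_query` are hypothetical. This speeds up region queries when points are spread out, but gives no worst-case guarantee: if all points fall into one cell, each query still costs O(n).

```python
import math
from collections import defaultdict
from itertools import product
from typing import DefaultDict, List, Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def cell_of(p: Point, eps: float) -> Cell:
    # Cells are eps x eps squares; floor also handles negative coordinates.
    return (math.floor(p[0] / eps), math.floor(p[1] / eps))


def build_grid(points: Sequence[Point], eps: float) -> DefaultDict[Cell, List[int]]:
    """Hash every point index into its grid cell in a single O(n) pass."""
    grid: DefaultDict[Cell, List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        grid[cell_of(p, eps)].append(i)
    return grid


def grid_region_query(points: Sequence[Point], grid: DefaultDict[Cell, List[int]],
                      eps: float, i: int) -> List[int]:
    """Indices within eps of points[i], scanning only the 3 x 3 block of cells around it."""
    cx, cy = cell_of(points[i], eps)
    result = []
    for dx, dy in product((-1, 0, 1), repeat=2):
        # .get avoids inserting empty lists into the defaultdict.
        for j in grid.get((cx + dx, cy + dy), []):
            if math.dist(points[i], points[j]) <= eps:
                result.append(j)
    return sorted(result)


pts = [(0.0, 0.0), (0.5, 0.2), (3.0, 3.0), (-0.4, 0.1)]
grid = build_grid(pts, 1.0)
print(grid_region_query(pts, grid, 1.0, 0))  # [0, 1, 3]
```

The query returns the same neighbors as a brute-force scan; only the number of distance computations changes.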

Summary DBSCAN
• Good: can detect clusters of arbitrary shape, not very sensitive to noise, supports outlier detection, complexity is acceptable; besides K-means, the second most used clustering algorithm.
• Bad: does not work well on high-dimensional datasets; parameter selection is tricky; has problems identifying clusters of varying densities (cf. the SNN algorithm); density estimation is somewhat simplistic (it does not create a real density function, but rather a graph of density-connected points).

DBSCAN Algorithm Revisited

• Eliminate noise points
• Perform clustering on the remaining points:
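The two steps above can be sketched in Python as follows. How the remaining points are clustered is an assumption here, following one common textbook formulation: core points within Eps of each other are connected, each connected group of core points forms a cluster, and each border point is assigned to the cluster of one of its core neighbors (here simply the first one found). Euclidean distance, brute-force neighborhoods, and the names `dbscan_revisited`, `NOISE` and `UNLABELED` are illustrative choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1  # label for eliminated noise points
UNLABELED = -2  # temporary marker before a point is assigned


def dbscan_revisited(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in nbrs]
    labels = [UNLABELED] * n

    # Step 1: eliminate noise, i.e. points that are neither core nor near a core point.
    for i in range(n):
        if not is_core[i] and not any(is_core[j] for j in nbrs[i]):
            labels[i] = NOISE

    # Step 2a: core points within eps of each other are connected;
    # each connected group of core points becomes one cluster.
    next_id = 0
    for start in range(n):
        if is_core[start] and labels[start] == UNLABELED:
            labels[start] = next_id
            stack = [start]
            while stack:
                u = stack.pop()
                for v in nbrs[u]:
                    if is_core[v] and labels[v] == UNLABELED:
                        labels[v] = next_id
                        stack.append(v)
            next_id += 1

    # Step 2b: each remaining (border) point joins the cluster of its first core neighbor.
    for i in range(n):
        if labels[i] == UNLABELED:
            labels[i] = next(labels[j] for j in nbrs[i] if is_core[j])
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0.5), (5, 5), (5, 6), (6, 5), (6, 6), (10, 0)]
print(dbscan_revisited(pts, eps=1.5, min_pts=4))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Because a border point may lie within Eps of core points from two different clusters, the tie-breaking rule in step 2b decides which cluster it ends up in; only the core points and the noise points are assigned independently of processing order.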
