11 Grid Based Methods 04-11-2024
Density-based Clustering
Eick: Topics9---Clustering 2
Density-Based Clustering Methods
Clustering based on density (a local cluster criterion),
for example on density-connected points or on an
explicitly constructed density function
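The two notions of density named above can be contrasted in a small sketch. It is illustrative only: the function names, the `sigma` parameter, and the choice of a Gaussian kernel are assumptions made here, not something the slides prescribe. The first function counts neighbors within a radius (the DBSCAN-style view); the second builds an explicit density function as a sum of kernels (the DENCLUE-style view).

```python
import math

def eps_neighbor_count(x, points, eps):
    """Local density as a count: number of data points within distance eps of x."""
    return sum(1 for p in points if math.dist(x, p) <= eps)

def gaussian_density(x, points, sigma=1.0):
    """Explicit density function: sum of Gaussian kernels centred on each data point.

    sigma (an assumed smoothing parameter) controls how far each point's
    influence reaches.
    """
    return sum(
        math.exp(-math.dist(x, p) ** 2 / (2 * sigma ** 2))
        for p in points
    )

points = [(0, 0), (0, 1), (1, 0), (10, 10)]
dense_spot = gaussian_density((0, 0), points)     # close to three points
sparse_spot = gaussian_density((10, 10), points)  # close to one point only
```

Under these assumptions, a location surrounded by several points receives a higher value from both functions than an isolated one; the kernel version additionally yields a smooth function that can be evaluated anywhere in the space.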
Major features:
Discover clusters of arbitrary shape
Handle noise
One scan
Need density parameters
Several interesting studies:
DBSCAN: Ester et al. (KDD’96)
DENCLUE: Hinneburg & Keim (KDD’98/2006)
OPTICS: Ankerst et al. (SIGMOD’99)
CLIQUE: Agrawal et al. (SIGMOD’98)
DBSCAN
(http://www2.cs.uh.edu/~ceick/7363/Papers/dbscan.pdf)
• Resistant to Noise
• Can handle clusters of different shapes and sizes
DBSCAN does not work well with:
• Varying densities
• High-dimensional data
[Figure: original points and DBSCAN results for (MinPts=4, Eps=9.75) and (MinPts=4, Eps=9.92)]
[Figure: core points and non-core points]
Complexity of DBSCAN
Time complexity: O(n^2), because for each point
it has to be determined whether it is a core
point. This can be reduced to O(n*log(n)) in
lower-dimensional spaces by using
efficient data structures (n is the
number of objects to be clustered).
Space complexity: O(n).
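The quadratic cost comes from the core-point test itself, which a brute-force sketch makes visible. The function name `core_points` is hypothetical, and the sketch assumes the common convention that a point counts as a member of its own Eps-neighborhood.

```python
import math

def core_points(points, eps, min_pts):
    """Return the indices of all core points using a brute-force check.

    Every point is compared against every other point, which is the
    source of the O(n^2) running time.
    """
    core = []
    for i, p in enumerate(points):
        # Count the points within eps of p (p itself is included: assumed convention).
        count = sum(1 for q in points if math.dist(p, q) <= eps)
        if count >= min_pts:
            core.append(i)
    return core

sample = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
core = core_points(sample, eps=1.0, min_pts=3)
```

In this sample the four corner points of the unit square each have three points in their neighborhood and qualify as core points, while the isolated point at (5, 5) does not. Replacing the inner loop with a range query on a spatial index such as a k-d tree is the usual way to approach the O(n*log(n)) bound in low-dimensional data.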
Summary DBSCAN
Good: can detect clusters of arbitrary shape, is not
very sensitive to noise, supports outlier
detection, has acceptable complexity, and is,
after K-means, the second most widely used
clustering algorithm.
Bad: does not work well on high-dimensional
datasets, parameter selection is tricky, has
problems identifying clusters of varying
densities (SSN algorithm), and its density
estimation is rather simplistic (it does not
create a real density function, but rather a
graph of density-connected points)
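The "graph of density-connected points" can be made concrete with a minimal sketch of the clustering step. It is not the implementation from the DBSCAN paper: the names `dbscan` and `NOISE` are chosen here for illustration, a point is assumed to belong to its own Eps-neighborhood, and all neighborhoods are precomputed by brute force, which keeps the code short but uses more than O(n) memory.

```python
import math
from collections import deque

NOISE = -1  # label for points that end up in no cluster (assumed convention)

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE, by traversing the
    graph whose edges link points lying within eps of a core point."""
    n = len(points)
    # neighbors[i]: indices of all points within eps of point i (including i).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [NOISE] * n
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != NOISE:
            continue  # start new clusters only from unlabeled core points
        labels[i] = cluster
        queue = deque([i])
        while queue:
            cur = queue.popleft()
            for j in neighbors[cur]:
                if labels[j] == NOISE:
                    labels[j] = cluster
                    if is_core[j]:
                        queue.append(j)  # only core points extend the cluster
        cluster += 1
    return labels

data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (5, 5)]
labels = dbscan(data, eps=1.0, min_pts=3)
```

With these parameters the two unit squares form two separate clusters and the isolated point at (5, 5) keeps the `NOISE` label, which illustrates the outlier-detection property listed under "Good". No density value is ever computed; cluster membership follows purely from reachability in the graph.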