CLUSTERING
DENSITY-BASED METHODS
Elsayed Hemayed
Data Mining Course
Outline
Density-Based Clustering Methods
Density-Based Clustering Background
Terminology
How does DBSCAN find clusters?
DBSCAN
Density-based Clustering Methods
Clustering Methods
Partitioning methods
K-Means
Hierarchical methods
Agglomerative Hierarchical Clustering
Divisive hierarchical clustering
Density-based methods
DBSCAN: a Density-Based Spatial Clustering of Applications with Noise
Grid-based methods
STING: A Statistical Information Grid Approach to Spatial Data Mining
Model-based methods
Expectation-Maximization
Neural Network Approach
High Dimensional Data Clustering
CLIQUE: A Dimension-Growth Subspace Clustering Method
Density-based Clustering Methods
DBSCAN
Density-Based Clustering Methods
Clustering is based on density (e.g., density-connected points) rather than directly on a distance metric.
Cluster = set of “density connected” points.
Major features:
Discover clusters of arbitrary shape
Handle noise
Need “density parameters” (Eps, MinPts), which also give the termination condition: a cluster stops growing when no new objects can be added to it.
Examples:
DBSCAN (Ester et al., 1996)
OPTICS (Ankerst et al., 1999)
DENCLUE (Hinneburg & Keim, 1998)
Density-Based Clustering: Background
Eps-neighborhood: the set of points within radius Eps of a given object.
MinPts: the minimum number of points required in an Eps-neighborhood.
Core object: an object whose Eps-neighborhood contains at least MinPts points.
Directly density-reachable: a point p is directly density-reachable from a point q with respect to Eps and MinPts if
1) p is within the Eps-neighborhood of q, and
2) q is a core object.
[Figure: p lies in the Eps-neighborhood of core object q; MinPts = 5, Eps = 1]
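The three definitions above can be written as small Python predicates. This is a minimal sketch that assumes Euclidean distance and, following the original DBSCAN definition, counts an object as a member of its own Eps-neighborhood. The function names and the sample points (chosen to mirror the MinPts = 5, Eps = 1 setting of the figure) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def eps_neighborhood(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i] (includes i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core(points: Sequence[Point], i: int, eps: float, min_pts: int) -> bool:
    """Core object: its Eps-neighborhood holds at least MinPts points."""
    return len(eps_neighborhood(points, i, eps)) >= min_pts

def directly_density_reachable(points: Sequence[Point], p: int, q: int,
                               eps: float, min_pts: int) -> bool:
    """p is directly density-reachable from q if p lies in q's
    Eps-neighborhood and q is a core object."""
    return math.dist(points[p], points[q]) <= eps and is_core(points, q, eps, min_pts)

# Hypothetical data: index 0 plays "q" (dense centre), index 5 plays "p".
POINTS: List[Point] = [(0.0, 0.0), (0.5, 0.0), (-0.5, 0.0),
                       (0.0, 0.5), (0.0, -0.5), (0.9, 0.0)]
Q, P = 0, 5

print(is_core(POINTS, Q, 1.0, 5))                        # True: 6 points within Eps of q
print(is_core(POINTS, P, 1.0, 5))                        # False: only 3 points within Eps of p
print(directly_density_reachable(POINTS, P, Q, 1.0, 5))  # True
print(directly_density_reachable(POINTS, Q, P, 1.0, 5))  # False: p is not a core object
```

Direct density-reachability is not symmetric: here it holds from q to p but not from p to q, because only q is a core object.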
Density Reachability and Density Connectivity
M, P, O, and R are core objects, since each has an Eps-neighborhood containing at least 3 points.
[Figure: labeled points with their Eps-neighborhoods drawn as circles; MinPts = 3, Eps = radius of the circles]
Directly density reachable
Q is directly density-reachable from M.
M is directly density-reachable from P, and vice versa.
Indirectly density reachable
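Q is indirectly density-reachable from P, since Q is directly density-reachable from M and M is directly density-reachable from P. However, P is not density-reachable from Q, since Q is not a core object.
In general, p is density-reachable from q if there is a chain of points from q to p in which each point is directly density-reachable from the previous one. Two points are density-connected if both are density-reachable from a common object.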
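Density-reachability is a chain of direct density-reachability steps, and only core objects can extend a chain, so it can be checked with a breadth-first search that expands core objects only. This minimal sketch (hypothetical names, Euclidean distance, a point counts toward its own MinPts) rebuilds a chain from P through M to Q on four made-up collinear points with Eps = 1 and MinPts = 3, and also checks density-connectivity.

```python
import math
from collections import deque

def neighbors(points, i, eps):
    """Indices within Euclidean distance eps of points[i] (includes i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    return len(neighbors(points, i, eps)) >= min_pts

def density_reachable(points, p, q, eps, min_pts):
    """True if a chain q = p1, ..., pn = p exists in which each p(k+1)
    is directly density-reachable from p(k)."""
    seen = {q}
    frontier = deque([q])
    while frontier:
        cur = frontier.popleft()
        if not is_core(points, cur, eps, min_pts):
            continue  # only core objects can extend a chain
        for nb in neighbors(points, cur, eps):
            if nb == p:
                return True
            if nb not in seen:
                seen.add(nb)
                frontier.append(nb)
    return False

def density_connected(points, a, b, eps, min_pts):
    """True if some object o has both a and b density-reachable from it."""
    return any(density_reachable(points, a, o, eps, min_pts)
               and density_reachable(points, b, o, eps, min_pts)
               for o in range(len(points)))

# Hypothetical points on a line, spaced 0.8 apart: A, P, M, Q.
A, P, M, Q = 0, 1, 2, 3
POINTS = [(-0.8, 0.0), (0.0, 0.0), (0.8, 0.0), (1.6, 0.0)]
EPS, MIN_PTS = 1.0, 3

print(density_reachable(POINTS, Q, P, EPS, MIN_PTS))  # True: P -> M -> Q
print(density_reachable(POINTS, P, Q, EPS, MIN_PTS))  # False: Q is not a core object
print(density_connected(POINTS, Q, A, EPS, MIN_PTS))  # True: both reachable from P
```

Only P and M are core objects here, which reproduces the asymmetry on the slide: Q is reachable from P, but not the other way round.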
Core, border, and noise points
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
Density = number of points within a specified radius (Eps)
A point is a core point if it has at least a specified number of points (MinPts) within Eps.
These are points in the interior of a cluster.
A border point has fewer than MinPts points within Eps, but is in the neighborhood of a core point.
A noise point is any point that is neither a core point nor a border point.
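The three point types can be assigned in one pass over precomputed Eps-neighborhoods. A minimal sketch, assuming Euclidean distance and that a point counts toward its own MinPts; the function name and sample data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def classify_points(points: Sequence[Tuple[float, float]], eps: float,
                    min_pts: int) -> List[str]:
    """Label each point 'core', 'border', or 'noise'."""
    n = len(points)
    # Eps-neighborhood of every point (each point is in its own neighborhood).
    neigh = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
             for i in range(n)]
    core = [len(nb) >= min_pts for nb in neigh]
    labels = []
    for i in range(n):
        if core[i]:
            labels.append("core")
        elif any(core[j] for j in neigh[i]):
            labels.append("border")  # not core, but inside a core point's neighborhood
        else:
            labels.append("noise")   # neither core nor border
    return labels

pts = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),  # dense square
       (1.3, 0.5),                                      # close to the square
       (5.0, 5.0)]                                      # isolated
print(classify_points(pts, eps=1.0, min_pts=4))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```

The point (1.3, 0.5) has only 3 points within Eps, so it is not core, but two of them are core points, which makes it a border point.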
How does DBSCAN find clusters?
DBSCAN searches for clusters by checking the Eps-
neighborhood of each point in the database.
If the Eps-neighborhood of a point p contains at least MinPts points, a new cluster with p as a core object is created.
DBSCAN then iteratively collects directly density-reachable objects from these core objects, which may involve merging a few density-reachable clusters.
The process terminates when no new point can be added to any cluster.
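The merging step can be made explicit with a union-find (disjoint-set) structure. This is an alternative formulation swapped in for illustration, not the procedure on these slides: every pair of core objects within Eps of each other is merged into one cluster, then each non-core point joins the cluster of a core object whose neighborhood contains it. Names and data are hypothetical; Euclidean distance is assumed and a point counts toward its own MinPts.

```python
import math
from typing import List, Sequence, Tuple

def find(parent: List[int], i: int) -> int:
    """Root of i's set, with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def union(parent: List[int], a: int, b: int) -> None:
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def dbscan_union_find(points: Sequence[Tuple[float, float]], eps: float,
                      min_pts: int) -> List[int]:
    """Cluster id per point; -1 marks noise."""
    n = len(points)
    neigh = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
             for i in range(n)]
    core = [len(nb) >= min_pts for nb in neigh]

    # Merge core objects that are directly density-reachable from each other.
    parent = list(range(n))
    for i in range(n):
        if core[i]:
            for j in neigh[i]:
                if core[j]:
                    union(parent, i, j)

    # Number the merged groups in order of first appearance.
    labels = [-1] * n
    ids = {}
    for i in range(n):
        if core[i]:
            labels[i] = ids.setdefault(find(parent, i), len(ids))

    # Non-core points join the first core neighbor's cluster (border points);
    # points with no core neighbor keep -1 (noise).
    for i in range(n):
        if not core[i]:
            for j in neigh[i]:
                if core[j]:
                    labels[i] = labels[j]
                    break
    return labels

# Hypothetical points on a line, listed out of order.
pts = [(0.0, 0.0), (3.0, 0.0), (0.8, 0.0), (2.2, 0.0), (1.5, 0.0), (10.0, 0.0)]
print(dbscan_union_find(pts, eps=0.9, min_pts=3))  # [0, 0, 0, 0, 0, -1]
```

The core points (0.8, 0) and (2.2, 0) are 1.4 apart, farther than Eps, yet they end up in one cluster because both are merged through the core point (1.5, 0).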
DBSCAN Algorithm
Arbitrarily select a point p.
Retrieve all points density-reachable from p with respect to Eps and MinPts.
If p is a core point, a cluster is formed.
If p is a border point (or a noise point), no points are density-reachable from p, and DBSCAN visits the next point of the database.
Continue the process until all of the points have been
processed.
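The steps above can be sketched in Python as a seed-expansion loop. Assumptions: points are visited in index order (one concrete choice of the "arbitrary" point), Euclidean distance, a point counts toward its own MinPts, and cluster ids start at 0 with -1 marking noise; all names are hypothetical. A point first marked as noise can later be relabeled as a border point when a cluster reaches it.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple

UNVISITED = -2
NOISE = -1

def region_query(points, i, eps):
    """Indices within Euclidean distance eps of points[i] (includes i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points: Sequence[Tuple[float, float]], eps: float, min_pts: int) -> List[int]:
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for p in range(len(points)):
        if labels[p] != UNVISITED:
            continue
        seeds = region_query(points, p, eps)
        if len(seeds) < min_pts:
            labels[p] = NOISE           # not core; may become a border point later
            continue
        cluster_id += 1                 # p is core: start a new cluster
        labels[p] = cluster_id
        queue = deque(seeds)
        while queue:
            q = queue.popleft()
            if labels[q] == NOISE:
                labels[q] = cluster_id  # former noise is a border point of this cluster
            if labels[q] != UNVISITED:
                continue                # already assigned; do not expand again
            labels[q] = cluster_id
            q_neighbors = region_query(points, q, eps)
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)  # q is core: its neighbors are density-reachable
    return labels

pts = [(1.2, 0.5),                                        # visited first, not core
       (0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5),    # group 1
       (5.0, 5.0), (5.0, 5.5), (5.5, 5.0), (5.5, 5.5),    # group 2
       (10.0, 0.0)]                                       # isolated
print(dbscan(pts, eps=1.0, min_pts=4))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Here (1.2, 0.5) is marked noise when first visited, then relabeled as a border point of cluster 0 once that cluster expands. If a border point lies within Eps of core points from two clusters, this sketch keeps whichever cluster reached it first, so border assignments can depend on the visiting order.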
DBSCAN Summary
DBSCAN is a density-based clustering method based on connected regions with sufficiently high density.
The algorithm grows regions with sufficiently high
density into clusters and discovers clusters of arbitrary
shape in spatial databases with noise.
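It defines a cluster as a maximal set of density-connected points. So, unlike hierarchical methods, cluster membership is not decided by a distance metric between objects or clusters; distance only defines the Eps-neighborhoods, and membership follows from density connectivity.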
Summary
Density-Based Clustering Methods
Density-Based Clustering Background
Terminology
How does DBSCAN find clusters?
DBSCAN