Density-Based Methods: DBSCAN: Density-Based Clustering Based On Connected Regions With High Density
Density-Based Methods: DBSCAN: Density-Based Clustering Based On Connected Regions With High Density
parameter, MinPts, which specifies the density threshold of dense regions. An object is
a core object if the -neighborhood of the object contains at least MinPts objects. Core
objects are the pillars of dense regions.
Given a set, D, of objects, we can identify all core objects with respect to the given
parameters, and MinPts. The clustering task is therein reduced to using core objects
and their neighborhoods to form dense regions, where the dense regions are clusters.
For a core object q and an object p, we say that p is directly density-reachable from q
(with respect to and MinPts) if p is within the -neighborhood of q. Clearly, an object
p is directly density-reachable from another object q if and only if q is a core object and
p is in the -neighborhood of q. Using the directly density-reachable relation, a core
object can “bring” all objects from its -neighborhood into a dense region.
“How can we assemble a large dense region using small dense regions centered by core
objects?” In DBSCAN, p is density-reachable from q (with respect to and MinPts in
D) if there is a chain of objects p1 , . . . , pn , such that p1 = q, pn = p, and pi+1 is directly
density-reachable from pi with respect to and MinPts, for 1 ≤ i ≤ n, pi ∈ D. Note that
density-reachability is not an equivalence relation because it is not symmetric. If both o1
and o2 are core objects and o1 is density-reachable from o2 , then o2 is density-reachable
from o1 . However, if o2 is a core object but o1 is not, then o1 may be density-reachable
from o2 , but not vice versa.
To connect core objects as well as their neighbors in a dense region, DBSCAN uses
the notion of density-connectedness. Two objects p1 , p2 ∈ D are density-connected with
respect to and MinPts if there is an object q ∈ D such that both p1 and p2 are density-
reachable from q with respect to and MinPts. Unlike density-reachability, density-
connectedness is an equivalence relation. It is easy to show that, for objects o1 , o2 , and
o3 , if o1 and o2 are density-connected, and o2 and o3 are density-connected, then so are
o1 and o3 .
Example 10.7 Density-reachability and density-connectivity. Consider Figure 10.14 for a given
represented by the radius of the circles, and, say, let MinPts = 3.
Of the labeled points, m, p, o, r are core objects because each is in an -neighborhood
containing at least three points. Object q is directly density-reachable from m. Object m
is directly density-reachable from p and vice versa.
Object q is (indirectly) density-reachable from p because q is directly density-
reachable from m and m is directly density-reachable from p. However, p is not density-
reachable from q because q is not a core object. Similarly, r and s are density-reachable
from o and o is density-reachable from r. Thus, o, r, and s are all density-connected.
m
r
p
s
“How does DBSCAN find clusters?” Initially, all objects in a given data set D are
marked as “unvisited.” DBSCAN randomly selects an unvisited object p, marks p as
“visited,” and checks whether the -neighborhood of p contains at least MinPts objects.
If not, p is marked as a noise point. Otherwise, a new cluster C is created for p, and all
the objects in the -neighborhood of p are added to a candidate set, N . DBSCAN iter-
atively adds to C those objects in N that do not belong to any cluster. In this process,
for an object p0 in N that carries the label “unvisited,” DBSCAN marks it as “visited” and
checks its -neighborhood. If the -neighborhood of p0 has at least MinPts objects, those
objects in the -neighborhood of p0 are added to N . DBSCAN continues adding objects
to C until C can no longer be expanded, that is, N is empty. At this time, cluster C is
completed, and thus is output.
To find the next cluster, DBSCAN randomly selects an unvisited object from the
remaining ones. The clustering process continues until all objects are visited. The
pseudocode of the DBSCAN algorithm is given in Figure 10.15.
If a spatial index is used, the computational complexity of DBSCAN is O(n log n),
where n is the number of database objects. Otherwise, the complexity is O(n2 ). With
appropriate settings of the user-defined parameters, and MinPts, the algorithm is
effective in finding arbitrary-shaped clusters.