DBSCAN: Density-Based Spatial Clustering of Applications with Noise
[Figure: core, border, and outlier points (ε = 1 unit, MinPts = 5)]
Concepts: ε-Neighborhood
• ε-Neighborhood (epsilon-neighborhood): the objects within a radius of ε of a given object.
• Core object: an object whose ε-neighborhood contains at least MinPts objects.
[Figure: the ε-neighborhoods of p and q]
p is a core object (MinPts = 4); q is not a core object.
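A minimal sketch of the two definitions above, assuming points in the plane, Euclidean distance, and the common convention that an object's ε-neighborhood includes the object itself. The names `eps_neighborhood` and `is_core` and the toy points are hypothetical, chosen only for illustration.

```python
import math

Point = tuple[float, float]


def eps_neighborhood(points: list[Point], i: int, eps: float) -> list[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def is_core(points: list[Point], i: int, eps: float, min_pts: int) -> bool:
    """points[i] is a core object if its eps-neighborhood holds at least min_pts points."""
    return len(eps_neighborhood(points, i, eps)) >= min_pts


# Hypothetical toy data: four nearby points plus one far-away point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (3.0, 3.0)]
print(eps_neighborhood(points, 0, eps=1.0))    # [0, 1, 2, 3]
print(is_core(points, 0, eps=1.0, min_pts=4))  # True
print(is_core(points, 4, eps=1.0, min_pts=4))  # False
```

With MinPts = 4, point 0 is core because its neighborhood (itself plus three neighbors) reaches the threshold, while the isolated point 4 only counts itself.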
Concepts: Reachability
• Directly density-reachable: an object q is directly density-reachable from an object p if q is within the ε-neighborhood of p and p is a core object.
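Direct density-reachability can be sketched the same way, again assuming Euclidean distance; the function name and toy points are hypothetical. The check makes the asymmetry visible: a border point is directly density-reachable from a core point, but not the other way round.

```python
import math


def directly_density_reachable(points, q, p, eps, min_pts):
    """True if points[q] is directly density-reachable from points[p]:
    q lies in the eps-neighborhood of p, and p is a core object."""
    neighbors_of_p = [j for j, x in enumerate(points) if math.dist(points[p], x) <= eps]
    return q in neighbors_of_p and len(neighbors_of_p) >= min_pts


# Hypothetical data: point 0 sits in a dense group; point 4 lies on its edge.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (1.0, 0.0)]
print(directly_density_reachable(points, q=4, p=0, eps=1.0, min_pts=4))  # True
print(directly_density_reachable(points, q=0, p=4, eps=1.0, min_pts=4))  # False: 4 is not core
```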
Concepts: Reachability
• Density-reachable: an object p is density-reachable from an object q w.r.t. ε and MinPts if there is a chain of objects $p_1, \dots, p_n$ with $p_1 = q$ and $p_n = p$ such that $p_{i+1}$ is directly density-reachable from $p_i$ w.r.t. ε and MinPts for all $1 \le i < n$.
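q is density-reachable from p, but p is not density-reachable from q.
Density-reachability is the transitive closure of direct density-reachability. It is asymmetric because every object in the chain except the last one, $p_n$, must be a core object.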
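A possible sketch of density-reachability as a breadth-first search over chains, assuming Euclidean distance and counting only chains of at least one step. The function name `density_reachable_from` and the toy points are illustrative, not from any library.

```python
import math
from collections import deque


def density_reachable_from(points, q, eps, min_pts):
    """Indices of all objects density-reachable from points[q].

    Breadth-first search along chains q = p1, ..., pn: only core objects are
    expanded, because p_{i+1} must be directly density-reachable from p_i."""
    def neighborhood(i):
        return [j for j, x in enumerate(points) if math.dist(points[i], x) <= eps]

    if len(neighborhood(q)) < min_pts:
        return set()  # q is not a core object, so no chain can start at q
    reached = set()
    expanded = {q}
    frontier = deque([q])
    while frontier:
        x = frontier.popleft()  # x is always a core object here
        for y in neighborhood(x):
            reached.add(y)  # y is directly density-reachable from x
            if y not in expanded and len(neighborhood(y)) >= min_pts:
                expanded.add(y)
                frontier.append(y)
    return reached


# Hypothetical data: five points on a line, spaced 1 apart.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
print(sorted(density_reachable_from(points, 1, eps=1.0, min_pts=3)))  # [0, 1, 2, 3, 4]
print(density_reachable_from(points, 4, eps=1.0, min_pts=3))          # set()
```

Point 4 is density-reachable from point 1 through the core objects 2 and 3, yet nothing is density-reachable from point 4, because it is not a core object.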
Concepts: Connectivity
• Density-connectivity: an object p is density-connected to an object q w.r.t. ε and MinPts if there is an object o such that both p and q are density-reachable from o w.r.t. ε and MinPts. Unlike density-reachability, density-connectivity is symmetric.
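Density-connectivity can be checked by trying every candidate object o. This sketch includes its own density-reachability search so it runs standalone; it assumes Euclidean distance, the function names and data are hypothetical, and it mirrors the definition rather than aiming for efficiency.

```python
import math
from collections import deque


def density_reachable_from(points, o, eps, min_pts):
    """Indices density-reachable from points[o] (BFS that expands only core objects)."""
    def neighborhood(i):
        return [j for j, x in enumerate(points) if math.dist(points[i], x) <= eps]

    if len(neighborhood(o)) < min_pts:
        return set()
    reached, expanded, frontier = set(), {o}, deque([o])
    while frontier:
        for y in neighborhood(frontier.popleft()):
            reached.add(y)
            if y not in expanded and len(neighborhood(y)) >= min_pts:
                expanded.add(y)
                frontier.append(y)
    return reached


def density_connected(points, p, q, eps, min_pts):
    """True if some object o has both p and q density-reachable from it."""
    return any(
        p in r and q in r
        for r in (density_reachable_from(points, o, eps, min_pts) for o in range(len(points)))
    )


# Hypothetical data: a line of five points plus one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
print(density_connected(points, 0, 4, eps=1.0, min_pts=3))  # True (e.g. via o = 2)
print(density_connected(points, 0, 5, eps=1.0, min_pts=3))  # False
print(4 in density_reachable_from(points, 0, eps=1.0, min_pts=3))  # False: 0 is not core
```

The two end points 0 and 4 are not density-reachable from each other (neither is core), but they are density-connected through the core object 2.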
Concepts: cluster & noise
• Cluster: a cluster C in a set of objects D w.r.t. ε and MinPts is a non-empty subset of D satisfying:
• Maximality: for all p, q, if p ∈ C and q is density-reachable from p w.r.t. ε and MinPts, then q ∈ C.
• Connectivity: for all p, q ∈ C, p is density-connected to q w.r.t. ε and MinPts in D.
• Note: a cluster contains core objects as well as border objects.
• Noise: objects that are not directly density-reachable from any core object, i.e., objects that belong to no cluster.
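The cluster and noise definitions imply a three-way split of the objects into core, border, and noise. A minimal sketch, assuming Euclidean distance, neighborhoods that include the object itself, and a hypothetical function name `classify`:

```python
import math


def classify(points, eps, min_pts):
    """Label each object 'core', 'border', or 'noise'."""
    neighborhoods = [
        [j for j, y in enumerate(points) if math.dist(x, y) <= eps] for x in points
    ]
    core = {i for i, n in enumerate(neighborhoods) if len(n) >= min_pts}
    labels = []
    for i, n in enumerate(neighborhoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in n):  # directly density-reachable from a core object
            labels.append("border")
        else:                            # not directly density-reachable from any core object
            labels.append("noise")
    return labels


# Hypothetical data: a line of five points plus one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
print(classify(points, eps=1.0, min_pts=3))
# ['border', 'core', 'core', 'core', 'border', 'noise']
```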
[Figure: (indirectly) density-reachable, with p, p1, and q along a chain; density-connected, with p and q both density-reachable from o]
DBSCAN: The Algorithm
• Select an arbitrary point p.
• Retrieve all points density-reachable from p w.r.t. ε and MinPts.
• If p is a core point, a cluster is formed.
• If p is a border point (or noise), no points are density-reachable from p, and DBSCAN visits the next point of the database.
• Continue the process until all of the points have been processed.
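The result is independent of the order in which points are processed, except that a border point density-reachable from more than one cluster is assigned to whichever cluster reaches it first.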
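The steps above can be sketched as a compact DBSCAN in Python. This is an illustrative version under stated assumptions (Euclidean distance, neighborhoods that include the point itself, noise labeled -1 as scikit-learn's `DBSCAN` also does), not the reference implementation from the original paper; `region_query` and `dbscan` are hypothetical names.

```python
import math
from collections import deque

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points in the eps-neighborhood of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id 0, 1, ... or NOISE (-1)."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = 0
    for p in range(len(points)):
        if labels[p] is not None:
            continue
        seeds = region_query(points, p, eps)
        if len(seeds) < min_pts:
            labels[p] = NOISE  # not core; may later be claimed as a border point
            continue
        # p is a core object: grow a new cluster from everything density-reachable from p
        labels[p] = cluster_id
        queue = deque(seeds)
        while queue:
            q = queue.popleft()
            if labels[q] == NOISE:
                labels[q] = cluster_id  # border point: keep it, but do not expand it
                continue
            if labels[q] is not None:
                continue  # already assigned (to this or an earlier cluster)
            labels[q] = cluster_id
            q_neighbors = region_query(points, q, eps)
            if len(q_neighbors) >= min_pts:  # only core objects extend the cluster
                queue.extend(q_neighbors)
        cluster_id += 1
    return labels


# Hypothetical data: two square groups and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (5.0, 5.0)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

A point first marked as noise can later be absorbed as a border point, and a border point reachable from two clusters keeps the label it received first, which is the order-dependent part noted above. Each region query scans every point, so this version takes O(n²) time; spatial indexes are the usual way to speed up the neighborhood lookups.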
An Example
[Figure: DBSCAN result with MinPts = 4, showing cluster C1]
Sl. No. Elevation Date Flood Distance Price
1 10 -103 0 0.3 4.5
2 4 -103 0 2.5 10.6
3 0 -98 1 10.3 1.7
4 1 -93 0 14 5
5 1 -92 1 14 5
6 2 -86 0 0 3.3
7 4 -68 0 0 5.7
8 4 -64 0 0 6.2
9 20 -63 0 1.2 19.4
10 0 -62 0 0 3.2
11 0 -61 0 0 4.7
12 3 -60 0 0 6.9
13 5 -59 0 0.5 8.1
14 8 -59 0 4.4 11.6
15 10 -59 0 4.2 19.3
16 9 -59 0 4.5 11.7
17 8 -59 0 4.7 13.3
18 6 -59 0 4.9 15.1
19 11 -59 0 4.6 12.4
20 8 -59 0 5 15.3
21 0 -54 0 16.5 12.2
22 5 -54 0 5.2 18.1
23 2 -53 0 5.5 16.8
24 0 -49 1 11.9 5.9
25 2 -45 1 5.5 4
26 5 -39 0 7.2 37.2
27 5 -39 0 0.15 18.2
28 2 -35 0 10.2 15.2
29 5 -16 0 5.5 22.9
30 2 -5 1 5.5 15.2
31 2 -4 0 5.5 21.9