
Density-Based Methods

Rubén Sánchez Corcuera


[email protected]

Clustering

■ Partitioning and hierarchical methods are designed to find spherical-shaped
clusters.
■ They have difficulty finding clusters of arbitrary shape, such as "S"-shaped
and oval clusters.
○ Given such data, they would likely inaccurately identify convex regions,
where noise or outliers are included in the clusters.
■ To find clusters of arbitrary shape, we can instead model clusters as dense
regions in the data space, separated by sparse regions.
■ This is the main strategy behind density-based clustering methods, which
can discover clusters of nonspherical shape.

DBSCAN: Density-Based Spatial Clustering of Applications with Noise

■ How can we find dense regions in density-based clustering?
○ The density of an object o can be measured by the number of objects close
to o.
○ DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds
core objects, that is, objects that have dense neighborhoods.
○ It connects core objects and their neighborhoods to form dense regions as
clusters.
■ How does DBSCAN quantify the neighborhood of an object?
○ A user-specified parameter ϵ > 0 specifies the radius of the neighborhood
we consider for every object.
○ The ϵ-neighborhood of an object o is the space within radius ϵ centered
at o.
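The ϵ-neighborhood can be computed directly from this definition. A minimal Python sketch, assuming 2-D points and Euclidean distance (the function name and toy data are illustrative, not from the slides):

```python
from math import dist

def eps_neighborhood(points, o, eps):
    # All objects within distance eps of o, including o itself.
    # Euclidean distance is used here; any metric could be substituted.
    return [p for p in points if dist(o, p) <= eps]

print(eps_neighborhood([(0, 0), (0, 2), (3, 3)], (0, 0), eps=2.0))
# → [(0, 0), (0, 2)]
```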
DBSCAN

■ Due to the fixed neighborhood size parameterized by ϵ, the density of a
neighborhood can be measured simply by the number of objects in the
neighborhood.
■ To determine whether a neighborhood is dense or not, DBSCAN uses another
user-specified parameter, MinPts, which specifies the density threshold of
dense regions.
■ An object is a core object if the ϵ-neighborhood of the object contains at
least MinPts objects.
■ Core objects are the pillars of dense regions.

DBSCAN

■ Given a set, D, of objects, we can identify all core objects with respect
to the given parameters, ϵ and MinPts.
■ The clustering task is thereby reduced to using core objects and their
neighborhoods to form dense regions, where the dense regions are clusters.
■ For a core object q and an object p, we say that p is directly
density-reachable from q (with respect to ϵ and MinPts) if p is within the
ϵ-neighborhood of q.
■ Clearly, an object p is directly density-reachable from another object q if
and only if q is a core object and p is in the ϵ-neighborhood of q.
■ Using the directly density-reachable relation, a core object can "bring"
all objects from its ϵ-neighborhood into a dense region.
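Following the definition above, core objects can be identified in one pass over the data. A minimal sketch, assuming Euclidean distance and counting the object itself as part of its own ϵ-neighborhood (the function name and toy data are illustrative):

```python
from math import dist

def core_objects(points, eps, min_pts):
    # An object is a core object if its eps-neighborhood contains
    # at least min_pts objects (the object itself included).
    return [p for p in points
            if sum(1 for q in points if dist(p, q) <= eps) >= min_pts]

# A dense clump of four points plus one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
print(core_objects(pts, eps=1.5, min_pts=3))
# → [(0, 0), (0, 1), (1, 0), (1, 1)]; (8, 8) has no dense neighborhood
```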

DBSCAN

■ How can we assemble a large dense region using small dense regions centered
by core objects?
■ In DBSCAN, p is density-reachable from q (with respect to ϵ and MinPts in D)
if there is a chain of objects p1, …, pn such that p1 = q, pn = p, and pi+1
is directly density-reachable from pi with respect to ϵ and MinPts, for
1 ≤ i < n, where each pi ∈ D.
■ Note that density-reachability is not an equivalence relation because it is
not symmetric.
■ If both o1 and o2 are core objects and o1 is density-reachable from o2,
then o2 is density-reachable from o1. However, if o2 is a core object but o1
is not, then o1 may be density-reachable from o2, but not vice versa.

DBSCAN

■ To connect core objects as well as their neighbors in a dense region,
DBSCAN uses the notion of density-connectedness. Two objects p1, p2 ∈ D are
density-connected with respect to ϵ and MinPts if there is an object q ∈ D
such that both p1 and p2 are density-reachable from q with respect to ϵ and
MinPts.
■ Unlike density-reachability, density-connectedness is an equivalence
relation. It is easy to show that, for objects o1, o2, and o3, if o1 and o2
are density-connected, and o2 and o3 are density-connected, then so are o1
and o3.
Density-reachability and density-connectivity example

■ Consider the figure for a given ϵ, represented by the radius of the
circles, and, say, let MinPts = 3.
■ Of the labeled points, m, p, o, and r are core objects because each is in
an ϵ-neighborhood containing at least three points.
■ Object q is directly density-reachable from m. Object m is directly
density-reachable from p and vice versa.
■ Object q is (indirectly) density-reachable from p because q is directly
density-reachable from m and m is directly density-reachable from p.
However, p is not density-reachable from q because q is not a core object.

■ Similarly, r and s are density-reachable from o, and o is density-reachable
from r. Thus, o, r, and s are all density-connected.

DBSCAN

■ We can use the closure of density-connectedness to find connected dense
regions as clusters.
■ Each closed set is a density-based cluster. A subset C ⊆ D is a cluster if:
1. for any two objects o1, o2 ∈ C, o1 and o2 are density-connected;
2. there does not exist an object o ∈ C and another object o’ ∈ (D − C)
such that o and o’ are density-connected.
DBSCAN: How does it find clusters?

■ Initially, all objects in a given data set D are marked as "unvisited."
■ DBSCAN randomly selects an unvisited object p, marks p as "visited," and
checks whether the ϵ-neighborhood of p contains at least MinPts objects. If
not, p is marked as a noise point.
■ Otherwise, a new cluster C is created for p, and all the objects in the
ϵ-neighborhood of p are added to a candidate set, N. DBSCAN iteratively
adds to C those objects in N that do not belong to any cluster.
■ In this process, for an object p’ in N that carries the label "unvisited,"
DBSCAN marks it as "visited" and checks its ϵ-neighborhood.
■ If the ϵ-neighborhood of p’ has at least MinPts objects, those objects in
the ϵ-neighborhood of p’ are added to N.
■ DBSCAN continues adding objects to C until C can no longer be expanded,
that is, until N is empty. At this time, cluster C is completed and is
output.
■ To find the next cluster, DBSCAN randomly selects an unvisited object from
the remaining ones. The clustering process continues until all objects are
visited.
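The procedure above can be sketched in a few dozen lines of Python. This is an illustrative implementation, not an optimized one: it scans all points for every neighborhood query (the quadratic behavior noted later in the deck), and it uses a fixed visit order rather than random selection; all names and toy data are assumptions for the sketch.

```python
from math import dist

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (its eps-neighborhood)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return a label per point: a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)            # None means "unvisited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue                          # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1                    # noise (may later join a cluster as a border point)
            continue
        cluster += 1                          # start a new cluster C
        labels[i] = cluster
        seeds = list(neighbors)               # the candidate set N
        while seeds:                          # expand C until N is empty
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster           # border point reclaimed from noise
            if labels[j] is not None:
                continue
            labels[j] = cluster               # previously unvisited point joins C
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:   # j is also a core object: grow N
                seeds.extend(j_neighbors)
    return labels

# Two dense clumps and one isolated point (illustrative data).
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (8, 8), (8, 9), (9, 8), (9, 9),
       (4, 20)]
print(dbscan(pts, eps=1.5, min_pts=3))
# → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Note the noise re-labeling step: a point first marked -1 can still become a border point of a later cluster, exactly as in the description above, but it is never expanded because it is not a core object.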

DBSCAN

■ If a spatial index is used, the computational complexity of DBSCAN is
O(n log n), where n is the number of database objects. Otherwise, the
complexity is O(n²).
■ With appropriate settings of the user-defined parameters, ϵ and MinPts, the
algorithm is effective in finding arbitrary-shaped clusters.
DBSCAN: Advantages

■ Handles irregularly shaped and sized clusters. One of the main advantages
of DBSCAN is its ability to detect clusters that are irregularly shaped. Of
the common clustering algorithms, DBSCAN makes among the fewest assumptions
about the shape of your clusters, so it can detect clusters that are oddly
or irregularly shaped, such as ring-shaped clusters.
■ Robust to outliers. Another big advantage of DBSCAN is that it can detect
outliers and exclude them from the clusters entirely, which makes it well
suited to datasets with many outliers.
■ Does not require the number of clusters to be specified. DBSCAN does not
require the user to specify the number of clusters; instead, it
automatically detects the number of clusters in the data. This is useful
when you have little intuition about how many clusters there should be.
■ Less sensitive to initialization conditions. DBSCAN is less sensitive to
initialization conditions, such as the order of the observations in the
dataset and the random seed, than some other clustering algorithms. Some
points on the borders between clusters may shift when initialization
conditions change, but the majority of observations should remain in the
same cluster.
■ Relatively fast. While DBSCAN is not the fastest clustering algorithm, it
is certainly not the slowest either, and there are multiple implementations
that aim to optimize its time complexity. DBSCAN is generally slower than
k-means clustering but faster than hierarchical clustering and spectral
clustering.

DBSCAN: Disadvantages

■ Difficult to incorporate categorical features. One of the main
disadvantages of DBSCAN is that it does not perform well on datasets with
categorical features, so it is best used when most of your features are
numeric.
■ Requires a drop in density to detect cluster borders. With DBSCAN, there
must be a drop in the density of data points between clusters for the
algorithm to detect the boundaries between them. If multiple clusters
overlap without a drop in density between them, they may be grouped into a
single cluster.
■ Struggles with clusters of varying density. DBSCAN also has difficulty
detecting clusters of varying density, because it determines where clusters
start and stop by looking at places where the density of data points drops
below a certain threshold. It may be difficult to find a threshold that
captures all of the points in a less dense cluster without including too
many extraneous outliers from a denser cluster.
■ Sensitive to scale. Like many other clustering algorithms, DBSCAN is
sensitive to the scale of your variables, so you may need to rescale
variables that are on very different scales.
■ Struggles with high-dimensional data. Like many clustering algorithms,
DBSCAN tends to degrade when there are many features. In general, you are
better off using dimensionality reduction or feature selection techniques
to reduce the number of features if you have a high-dimensional dataset.
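As a concrete illustration of the scale-sensitivity point, features on very different scales can be z-score standardized before clustering so that a single ϵ is meaningful across all dimensions. A minimal stdlib-only sketch (the function name is illustrative; in practice a library scaler such as scikit-learn's StandardScaler would typically be used):

```python
def standardize(points):
    # Column-wise z-score standardization: subtract the mean and divide
    # by the (population) standard deviation of each feature.
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(p, means, stds))
            for p in points]

# One feature spanning ~2 units, the other ~1000: without standardization,
# Euclidean distance (and hence eps) is dominated by the second feature.
print(standardize([(0, 0), (2, 1000)]))
# → [(-1.0, -1.0), (1.0, 1.0)]
```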
DBSCAN: When to use it?

■ You suspect there may be irregularly shaped clusters. If you have reason to
expect that the clusters in your dataset may be irregularly shaped, DBSCAN
is a great option. It can identify clusters that are spherical or
ellipsoidal as well as clusters with more irregular shapes.
■ Data has outliers. DBSCAN is also a great option when there are many
outliers in your dataset. It can detect outlying data points that do not
belong to any cluster and exclude them from the clusters.
■ Anomaly detection. Since DBSCAN automatically detects outliers and excludes
them from all clusters, it is also a good option when you want to detect
outliers in your dataset.

DBSCAN: When NOT to use it?

■ No drop in density between clusters. In general, DBSCAN requires a drop in
the density of data points to detect boundaries between clusters, so you
should not use it if you do not expect much of a drop in density between
clusters. For example, if many of your clusters overlap, multiple clusters
might get grouped together into one large cluster.
■ Many categorical features. DBSCAN is generally intended for scenarios where
the majority of your features are numeric, so you should avoid it when you
have many categorical features. In these scenarios, you may be better off
using hierarchical clustering with an appropriate distance metric, or an
extension of k-means clustering such as k-modes or k-prototypes.

OPTICS (Ordering Points to Identify the Clustering Structure)

■ OPTICS is an extension of DBSCAN that performs better on datasets that have
clusters of varying densities.
■ It is also implemented in scikit-learn:
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.OPTICS.html

Further reading

■ Section 10.4 in [Han & Kamber, 2016]
