DM Lect 8 - Clustering - DBSCAN

The document discusses density-based clustering, specifically the DBSCAN algorithm, which effectively identifies clusters of arbitrary shapes in datasets containing noise and outliers. Unlike K-means, DBSCAN does not require the user to specify the number of clusters and can find clusters of varying shapes. The document also outlines the algorithm's parameters, classification of points, and provides examples of its application.

Uploaded by

mohamed2004mowaffak

Data Mining

Density Based Clustering

Dr. Wedad Hussein

Dr. Mahmoud Mounir
TYPES OF CLUSTERING

Clustering algorithms
➢ Connectivity-based Clustering
➢ Centroid-based Clustering
➢ Distribution-based Clustering
➢ Density-based Clustering
➢ Graph-based Clustering
DBSCAN
• K-Means is suitable for finding spherical or convex clusters.
• In other words, it works well for compact, well-separated clusters.
• It is also severely affected by the presence of noise and outliers in the data.
• Unfortunately, real-life data often looks different:
  • Clusters can be of arbitrary shape (oval, linear, or “S”-shaped).
  • The data may contain noise and outliers.
• The example plot contains 5 clusters and outliers, including:
  • 2 oval clusters.
  • 2 linear clusters.
  • 1 compact cluster.
DBSCAN
• Given such data, the k-means algorithm has difficulty identifying these
clusters with arbitrary shapes.
• We know there are 5 clusters in the data, but k-means identifies
them inaccurately.
DBSCAN
▪ DBSCAN performs better on this data set and identifies the correct set of
clusters, which k-means does not.
DBSCAN
▪ DBSCAN, a density-based clustering algorithm, can be used to identify
clusters of any shape in a data set containing noise and outliers.
▪ DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise.
▪ The advantages of DBSCAN:
  ▪ Unlike K-means, DBSCAN does not require the user to specify
  the number of clusters to be generated.
  ▪ DBSCAN can find clusters of any shape. A cluster doesn’t
  have to be circular.
  ▪ DBSCAN can identify outliers.
DBSCAN
▪ The basic idea behind density-based clustering is derived from the way
humans intuitively identify clusters.
  ▪ For instance, looking at the figure, one can easily identify four clusters
  along with several noise points, because of the differences in the density of points.
  ▪ As illustrated in the figure, clusters are dense regions in the
  data space, separated by regions of lower point density.
  ▪ The DBSCAN algorithm is based on this intuitive notion of
  “clusters” and “noise”. The key idea is that, for each point of a
  cluster, the neighborhood of a given radius has to contain at
  least a minimum number of points.
Algorithm of DBSCAN
▪ The goal is to identify dense regions, where density is
measured by the number of objects close to a given point.
▪ Two important parameters are required for DBSCAN:
  ◦ epsilon (“eps”)
  ◦ minimum points (“MinPts”).
▪ The parameter eps defines the radius of the neighborhood
around a point x. This neighborhood is called the epsilon-neighborhood of x.
▪ The parameter MinPts is the minimum number of
neighbors within the “eps” radius.
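
The epsilon-neighborhood can be sketched in a few lines of Python. This is only an illustration: the function name `eps_neighborhood`, the sample points, and the choice to exclude the point itself from the returned list (as the neighborhood lists in the worked example do) are assumptions, and the Euclidean distance is assumed as the distance measure.

```python
from math import dist  # Euclidean distance, available from Python 3.8

def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i], excluding i itself."""
    return [j for j, q in enumerate(points)
            if j != i and dist(points[i], q) <= eps]

# Illustrative data, not taken from the slides.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5), (5.0, 5.0)]
print(eps_neighborhood(pts, 0, eps=1.5))  # [1, 2]
print(eps_neighborhood(pts, 3, eps=1.5))  # []
```

A point on the boundary (distance exactly eps) is treated as a neighbor here, which matches the worked example, where A6 at distance 2 belongs to N(A3) for eps = 2.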

Algorithm of DBSCAN
▪ Any point x in the data set with a neighbor count greater
than or equal to MinPts is marked as a core point.
▪ We say that x is a border point if the number of its
neighbors is less than MinPts but it belongs to the
epsilon-neighborhood of some core point.
▪ Finally, if a point is neither a core nor a border point, then
it is called a noise point or an outlier.
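
The three point types can be sketched as follows. The function name `classify_points` and the sample data are hypothetical, and the neighbor count is assumed to include the point itself, following the "(including p)" convention in the definitions on a later slide.

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border' or 'noise'."""
    n = len(points)
    # eps-neighborhood of each point; here it contains the point itself.
    hoods = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
             for i in range(n)]
    core = {i for i in range(n) if len(hoods[i]) >= min_pts}
    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in hoods[i]):
            labels.append("border")   # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels

# Illustrative data: a unit square, one point hanging off it, one far away.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (6, 6)]
print(classify_points(pts, eps=1.0, min_pts=3))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```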

Algorithm of DBSCAN
▪ The figure shows the different types of points (core,
border and outlier points) using MinPts = 6.
  ▪ x is a core point because neighbours_epsilon(x) = 6 ≥ MinPts.
  ▪ y is a border point because neighbours_epsilon(y) < MinPts,
  but it belongs to the ϵ-neighborhood of the core point x.
  ▪ z is a noise point.
Algorithm of DBSCAN
The points are classified as follows:
▪ A point p is a core point if at least MinPts points
are within distance eps of it (including p itself). Those
points are said to be directly reachable from p.
▪ A point q is directly reachable from p if p is a core point
and q is within distance eps of p.
▪ A point q is density reachable from p if there is a
path p1, ..., pn with p1 = p and pn = q, where each
pi+1 is directly reachable from pi (so all points on the
path must be core points, with the possible
exception of q).
▪ Two points p and q are density connected if there
is a core point x such that both p and q are density
reachable from x.
▪ All points not reachable from any other point are
outliers or noise points.
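
Density reachability can be illustrated with a breadth-first search that only continues through core points. This is a sketch under stated assumptions: the function name `density_reachable` and the sample points are hypothetical, the neighbor count includes the point itself, and the starting point is included in its own reachable set.

```python
from collections import deque
from math import dist

def density_reachable(points, p, eps, min_pts):
    """Indices density reachable from points[p]; empty set if p is not a core point."""
    n = len(points)

    def hood(i):
        # eps-neighborhood of point i, including i itself
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    if len(hood(p)) < min_pts:
        return set()              # only core points can reach other points
    reached, queue = {p}, deque([p])
    while queue:
        neighbors = hood(queue.popleft())
        if len(neighbors) < min_pts:
            continue              # border point: reached, but the path stops here
        for j in neighbors:
            if j not in reached:
                reached.add(j)
                queue.append(j)
    return reached

# Illustrative chain: indices 1 and 2 are core, 0 and 3 are border, 4 is noise.
pts = [(0, 0), (1, 0), (2, 0), (3, 0), (5, 0)]
print(density_reachable(pts, 1, eps=1.0, min_pts=3))  # {0, 1, 2, 3}
print(density_reachable(pts, 0, eps=1.0, min_pts=3))  # set()
```

The two calls show that reachability is not symmetric: point 0 is reachable from the core point 1, but nothing is reachable from the border point 0. Points 0 and 3 are still density connected, because both are reachable from the core point 1.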
Algorithm of DBSCAN
MinPts = 4.
▪ Red points are core points.
▪ Points B and C are not core points but are reachable from
A (via other core points) and thus belong to the cluster as well.
▪ Point N is a noise point that is neither a core point nor
directly reachable from one.
Algorithm of DBSCAN
▪ A density-based cluster is defined as a group of
density connected points.
▪ If A is a core point, then it forms a cluster
together with all points (core or non-core) that are
reachable from it.
Algorithm of DBSCAN
o The DBSCAN algorithm works as follows:
1. For each point xi, compute the distance between xi and the
other points.
  • Find all neighbor points within distance eps of the starting
  point (xi).
  • Mark each point whose neighbor count is greater than or equal to
  MinPts as a core point and as visited.
2. For each core point that is not already assigned to a cluster,
create a new cluster. Recursively find all its density connected
points and assign them to the same cluster as the core point.
3. Iterate through the remaining unvisited points in the data set.
Those points that do not belong to any cluster are treated as
outliers or noise.
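
The three steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the names `dbscan` and `NOISE` and the sample data are assumptions, the neighbor count includes the point itself, and an explicit stack replaces the recursion mentioned in step 2.

```python
from math import dist

NOISE = -1  # label chosen here for outliers

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    n = len(points)
    # Step 1: eps-neighborhoods (each includes the point itself) and core points.
    hoods = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
             for i in range(n)]
    is_core = [len(h) >= min_pts for h in hoods]

    labels = [None] * n
    cluster = 0
    # Step 2: grow a new cluster from every core point not yet assigned.
    for i in range(n):
        if not is_core[i] or labels[i] is not None:
            continue
        labels[i] = cluster
        stack = [i]                      # holds core points still to expand
        while stack:
            for q in hoods[stack.pop()]:
                if labels[q] is None:
                    labels[q] = cluster
                    if is_core[q]:       # border points join but do not expand
                        stack.append(q)
        cluster += 1
    # Step 3: points that were never assigned are noise.
    return [NOISE if lab is None else lab for lab in labels]

# Illustrative data: two small groups and one isolated point.
pts = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (5, 15)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Computing every pairwise distance costs time quadratic in the number of points, which is fine for a classroom example but would need a spatial index for large data sets.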

DBSCAN Example
Given 8 data points:
A1 = (2, 10), A2 = (2, 5), A3 = (8, 4), A4 = (5, 8), A5 = (7, 5),
A6 = (6, 4), A7 = (1, 2), A8 = (4, 9).
Apply the DBSCAN algorithm to find the final clusters and
identify the outlier points in the given data.
1. Use epsilon (eps) = 2 and MinPts = 2, with the Euclidean
distance as the distance measure.
2. What if eps = √10?
3. Draw a 10 × 10 grid to illustrate your answer, showing the
discovered clusters along with the outliers for each
epsilon in 1 and 2.
DBSCAN Example (eps = 2 , Minpts = 2)
Step 1: Construct distance matrix
A1 = (2, 10)
A2 = (2, 5)
A3 = (8, 4)
A4 = (5, 8)
A5 = (7, 5)
A6 = (6 , 4)
A7 = (1, 2)
A8 = (4, 9)
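
The distance matrix for this step follows directly from the coordinates. A minimal sketch that computes and prints it, assuming the Euclidean distance named in the exercise (the variable names `points`, `names` and `matrix` are illustrative):

```python
from math import dist

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
names = list(points)

# matrix[r][c] is the Euclidean distance between the r-th and c-th point.
matrix = [[dist(points[a], points[b]) for b in names] for a in names]

# Print the matrix with a header row, rounded to two decimals.
print("    " + "".join(f"{name:>6}" for name in names))
for name, row in zip(names, matrix):
    print(f"{name:<4}" + "".join(f"{d:6.2f}" for d in row))
```

For example, the entry for A3 and A6 is exactly 2, and the entry for A1 and A8 is √5 ≈ 2.24.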

DBSCAN Example (eps = 2, MinPts = 2)
Step 2: Find the epsilon-neighborhood of each data point
eps = 2, MinPts = 2
N(A1) = {}
N(A2) = {}
N(A3) = {A5, A6}
N(A4) = {A8}
N(A5) = {A3, A6}
N(A6) = {A3, A5}
N(A7) = {}
N(A8) = {A4}
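
These neighborhoods can be checked mechanically. The sketch below compares squared distances against eps², an implementation choice made here so that the boundary case d(A3, A6) = 2 is tested in exact integer arithmetic; the helper name `sq_dist` is hypothetical.

```python
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
EPS_SQ = 2 ** 2  # eps = 2, squared

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

# Neighborhood of each point, excluding the point itself as on the slide.
neighborhoods = {
    a: [b for b in points if b != a and sq_dist(points[a], points[b]) <= EPS_SQ]
    for a in points
}
for name, hood in neighborhoods.items():
    print("N(" + name + ") = {" + ", ".join(hood) + "}")
```

With MinPts = 2 and the point itself counted, every point with a non-empty list here is a core point.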

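DBSCAN Example (eps = 2, MinPts = 2)
Step 3: Identify the final clusters and outliers
Counting each point itself, every point with at least one neighbor
reaches MinPts = 2 and is therefore a core point.
Cluster (1) = {A3, A5, A6}
Cluster (2) = {A4, A8}

Outliers: A1, A2, A7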
DBSCAN Example (eps = √10, MinPts = 2)
Step 1: Construct distance matrix
A1 = (2, 10)
A2 = (2, 5)
A3 = (8, 4)
A4 = (5, 8)
A5 = (7, 5)
A6 = (6 , 4)
A7 = (1, 2)
A8 = (4, 9)

DBSCAN Example (eps = √10, MinPts = 2)
Step 2: Find the epsilon-neighborhood of each data point
eps = √10 ≈ 3.16, MinPts = 2
N(A1) = {A8}
N(A2) = {A7}
N(A3) = {A5, A6}
N(A4) = {A8}
N(A5) = {A3, A6}
N(A6) = {A3, A5}
N(A7) = {A2}
N(A8) = {A1, A4}
DBSCAN Example (eps = √10, MinPts = 2)
Step 3: Identify the final clusters and outliers
Cluster (1) = {A1, A4, A8}
Cluster (2) = {A3, A5, A6}
Cluster (3) = {A2, A7}

No outliers
Parameter Estimation of DBSCAN
▪ DBSCAN requires the user to choose suitable
values for eps and MinPts.
▪ MinPts: As a general rule, a minimum MinPts can be derived
from the number of dimensions D in the data set, as MinPts
≥ D + 1.
  ▪ Larger values are usually better for data sets with noise
  and will yield more significant clusters.
  ▪ MinPts should be at least 3 (with MinPts ≤ 2 the result is the same
  as single-linkage hierarchical clustering cut at height eps), and it may be
  necessary to choose larger values for very large data sets.
▪ eps:
  ▪ If it is too small, a large part of the data will not be
  clustered and will be treated as outliers.
  ▪ On the other hand, if it is too large, clusters will merge
  and the majority of objects will end up in the same cluster.
  ▪ In general, small values of eps are preferable.
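
The effect of eps can be seen on the eight points of the worked example. The sketch below runs a basic DBSCAN for several radii with MinPts = 2 and reports the number of clusters and noise points. It takes eps² rather than eps so that boundary cases such as d(A2, A7) = √10 are compared in exact integer arithmetic; the names `summarize` and `sq_dist` and the extra radii 1 and 10 are assumptions added for illustration.

```python
points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]  # A1..A8

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def summarize(points, eps_sq, min_pts):
    """Return (number of clusters, number of noise points) for one DBSCAN run."""
    n = len(points)
    # Neighborhoods include the point itself.
    hoods = [[j for j in range(n) if sq_dist(points[i], points[j]) <= eps_sq]
             for i in range(n)]
    core = [len(h) >= min_pts for h in hoods]
    labels = [None] * n
    clusters = 0
    for i in range(n):
        if core[i] and labels[i] is None:
            labels[i] = clusters
            stack = [i]
            while stack:
                for q in hoods[stack.pop()]:
                    if labels[q] is None:
                        labels[q] = clusters
                        if core[q]:
                            stack.append(q)
            clusters += 1
    return clusters, labels.count(None)

for eps_sq in (1, 4, 10, 100):   # eps = 1, 2, sqrt(10), 10
    print(f"eps^2 = {eps_sq:>3}: clusters, noise = {summarize(points, eps_sq, 2)}")
# eps^2 =   1: clusters, noise = (0, 8)
# eps^2 =   4: clusters, noise = (2, 3)
# eps^2 =  10: clusters, noise = (3, 0)
# eps^2 = 100: clusters, noise = (1, 0)
```

With eps = 1 every point is an outlier, eps = 2 and eps = √10 reproduce the two answers of the worked example, and eps = 10 merges all points into a single cluster.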