DBSCAN Clustering

The document discusses DBSCAN clustering, an unsupervised clustering algorithm that allows clustering unlabeled data based on density. It works by grouping together points that are closely packed, marking points that lie alone in low-density regions as outliers. The algorithm requires defining parameters for neighborhood size and minimum number of points.

DBSCAN CLUSTERING

Jifry Issadeen
DBSCAN CLUSTERING Introduction

• Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is an unsupervised clustering algorithm that lets us cluster unlabeled data.

• Clustering in general is the process of grouping similar data points together to discover underlying patterns.
DBSCAN CLUSTERING Why DBSCAN

• For simple clustering, we can use partition-based clustering techniques like K-Means clustering.

DBSCAN CLUSTERING Why DBSCAN

• Compared to other clustering techniques, density-based techniques are more effective at finding arbitrarily shaped clusters and detecting outliers.

• Density-based clustering algorithms are highly effective at finding high-density regions and outliers.
DBSCAN CLUSTERING Ɛ – Epsilon (eps)

• Epsilon (Ɛ) is the distance that defines a point's neighborhood. Two points are considered neighbors if the distance between them is less than or equal to eps.

• We can also think of epsilon as the radius of the circle drawn around a point to mark the boundary of its neighborhood.
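
The neighborhood test can be sketched in a few lines of Python. This is only an illustration: the helper name region_query, the sample coordinates, and the choice of Euclidean distance are assumptions, not something the slides specify.

```python
import math

def region_query(points, i, eps):
    """Return the indices of all points within eps of points[i].

    The point itself is included, because its distance to itself is 0.
    Euclidean distance is an assumption made for this sketch.
    """
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

# Hypothetical sample data: three points close together and one far away.
points = [(0, 0), (1, 0), (0, 1), (3, 3)]

print(region_query(points, 0, eps=1.0))  # [0, 1, 2]
print(region_query(points, 3, eps=1.0))  # [3]
```

A larger eps makes neighborhoods grow, so more points count as neighbors of each other.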
DBSCAN CLUSTERING minPts – Minimum Points

• minPts is the minimum number of points required to form a dense region, that is, to define a cluster. In this example, minPts = 4.

• With minPts = 4, the red circle contains 4 points (counting the red point itself), so we define the red point as a core point.
DBSCAN CLUSTERING Different types of Points

The points are classified as:

1. Core Points

2. Border points

3. Outliers / Noise

DBSCAN CLUSTERING Core Point
• A point (p) is a core point if there are at least minPts points within its neighborhood (within a distance of eps), counting p itself. In this example, minPts = 4.

• Since there are 4 points within the circle, which equals the minimum number of points defined, we define the red point as a core point.
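
As a rough sketch, the core-point test is a count followed by a comparison. The function name is_core and the coordinates below are made up for illustration; the count includes the point itself, as in the slide's example.

```python
import math

def is_core(points, i, eps, min_pts):
    """True if at least min_pts points (including points[i]) lie within eps of points[i]."""
    count = sum(1 for q in points if math.dist(points[i], q) <= eps)
    return count >= min_pts

# Hypothetical data: four points on a unit square plus one distant point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5)]
eps, min_pts = 1.5, 4

flags = [is_core(points, i, eps, min_pts) for i in range(len(points))]
print(flags)  # [True, True, True, True, False]
```

Each corner of the square sees all four corners within eps = 1.5 (the diagonal is about 1.41), so all four are core points, while the distant point sees only itself.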
DBSCAN CLUSTERING Border Points

• Border points are points that are directly reachable from a core point (within eps of it) but have fewer than minPts points within their own neighborhood. In this example, minPts = 4.

DBSCAN CLUSTERING Border Points

• The yellow points don't meet the minimum-points requirement to become core points.

• There are fewer than minPts points within the eps radius of each yellow point.

DBSCAN CLUSTERING Noise/Outlier

• All points that are not reachable from any core point are outliers, or noise points. In this example, minPts = 4.

• The blue point is considered an outlier because it doesn't fall into any cluster.
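
The three point types can be told apart with one pass over the neighborhoods. The sketch below is illustrative only: the function name classify and the seven coordinates are assumptions, chosen so that the result has four core points, two border points and one outlier.

```python
import math

def classify(points, eps, min_pts):
    """Label every point as 'core', 'border' or 'noise'."""
    n = len(points)
    # Neighborhood of each point, including the point itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}

    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbors[i]):
            labels.append("border")  # within eps of a core point, but not core itself
        else:
            labels.append("noise")   # not within eps of any core point
    return labels

# Hypothetical data: a unit square, two points hanging off it, one far away.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2.2, 1), (-1.2, 0), (5, 5)]
labels = classify(points, eps=1.5, min_pts=4)
print(labels)
# ['core', 'core', 'core', 'core', 'border', 'border', 'noise']
```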

DBSCAN CLUSTERING Identify Core Points
minPts = 4

To begin with, we pick a point at random.
Draw a circle of the given radius (eps) around it.
Since the circle contains 4 points, which is equal to minPts, we select this point as a core point.

DBSCAN CLUSTERING Identify Core Points

minPts = 4

Next, we pick another point.
Draw a circle of the given radius (eps) around it.
Since the circle contains 4 points, which is equal to minPts, we select this point as a core point.

The slides repeat this step for three points in turn, and each one is selected as a core point.

DBSCAN CLUSTERING Identify Border Points

minPts = 4

Next, we pick this point.
Draw a circle of the given radius (eps) around it.
Since the circle contains only 2 points, which is less than minPts, this point is not a core point.
One of the points in the circle is a core point, and there are no more points connected to this point.
This time, we select this point as a border point.

The slides repeat this step for a second point, which is also selected as a border point.

DBSCAN CLUSTERING Identify Outlier Points
minPts = 4

Next, we pick this point.
Draw a circle of the given radius (eps) around it.
Since the circle contains only 1 point (the point itself), which is less than minPts, and there are no other neighbors in the circle, this time we select this point as a noise point (outlier).

DBSCAN CLUSTERING Identify Noise/Outlier
• DBSCAN clustered these 6 points, including the border points, into one group (minPts = 4).

• Since DBSCAN is very robust to noise, it has also successfully identified the noise point (outlier).

• Note: A point that is first marked as noise may be revisited later and become part of a cluster, as a border point, if it turns out to lie within eps of a core point.
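
The walkthrough can be put together into a minimal from-scratch sketch. This is not the author's code or any library's implementation; the function name dbscan, the label conventions (-1 for noise, None for unvisited) and the coordinates are assumptions. The data is ordered so that a border point is visited first, which shows a point being marked as noise and then revisited.

```python
import math

NOISE = -1        # label for noise points (an assumed convention)
UNVISITED = None  # marker for points not yet processed

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    n = len(points)
    labels = [UNVISITED] * n

    def region_query(i):
        # Indices of all points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = -1
    for i in range(n):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # provisional: may be claimed as a border point later
            continue
        # Point i is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        to_visit = list(neighbors)
        while to_visit:
            j = to_visit.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # noise revisited: it becomes a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                to_visit.extend(j_neighbors)  # j is also core, so keep expanding
    return labels

# Hypothetical data: a border point first, then a unit square of core points,
# a second border point, and one distant outlier.
points = [(-1.2, 0), (0, 0), (1, 0), (0, 1), (1, 1), (2.2, 1), (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=4)
print(labels)  # [0, 0, 0, 0, 0, 0, -1]
```

The first point has only 2 points in its circle, so it is initially labeled noise. It is relabeled once the neighboring core point starts cluster 0, giving six clustered points and one outlier.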

DBSCAN CLUSTERING Introduction

• The DBSCAN algorithm is able to find high-density regions and separate them from low-density regions (minPts = 4 in this example).
• In this example, DBSCAN has clustered the data into two clusters.
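• DBSCAN itself does not define a step for new data points. To place a new point, we either re-run the algorithm or assign the point to the closest cluster, provided it lies within eps of one of that cluster's core points.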
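
One possible way to place a new point in an existing clustering is sketched below. It assumes a convention that is not part of the slides: the new point takes the label of its nearest core point if that core point lies within eps, and is treated as noise otherwise. The helper assign_new_point and the data are hypothetical.

```python
import math

def assign_new_point(new_point, core_points, core_labels, eps):
    """Return the label of the nearest core point within eps, or -1 (noise)."""
    best_label, best_dist = -1, eps
    for p, label in zip(core_points, core_labels):
        d = math.dist(new_point, p)
        if d <= best_dist:
            best_label, best_dist = label, d
    return best_label

# Hypothetical core points of two clusters found earlier (labels 0 and 1).
core_points = [(0, 0), (1, 0), (0, 1), (1, 1),
               (10, 10), (11, 10), (10, 11), (11, 11)]
core_labels = [0, 0, 0, 0, 1, 1, 1, 1]

print(assign_new_point((1.5, 1.5), core_points, core_labels, eps=1.5))   # 0
print(assign_new_point((10.5, 9.5), core_points, core_labels, eps=1.5))  # 1
print(assign_new_point((5, 5), core_points, core_labels, eps=1.5))       # -1
```

This keeps the existing clusters fixed. A new point could in principle turn border points into core points or join two clusters, which only a re-run of the algorithm would capture.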

DBSCAN CLUSTERING Advantages

• It can discover arbitrarily shaped clusters.

• It can find a cluster that is completely surrounded by a different cluster.

• It is great at separating high-density regions from low-density regions within a given dataset.

• It is robust to noise and outliers.


DBSCAN CLUSTERING Disadvantages

• It doesn't work well with clusters of varying densities. While it is great at separating high-density clusters from low-density regions, DBSCAN struggles when clusters differ widely in density, because a single eps and minPts setting cannot suit all of them.

• It struggles with high-dimensional data, where distances between points become less informative.
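
The varying-density problem can be seen on a small one-dimensional example. The sketch below uses a compact, made-up DBSCAN for numbers on a line (dbscan_1d is a hypothetical helper, not a library function) and invented data: two dense groups separated by a small gap, plus one sparse group.

```python
def dbscan_1d(values, eps, min_pts):
    """Minimal DBSCAN for numbers on a line; -1 marks noise."""
    n = len(values)
    labels = [None] * n  # None means not yet visited

    def region(i):
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = region(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        stack = list(nb)
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise claimed as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = region(j)
            if len(nb_j) >= min_pts:
                stack.extend(nb_j)
    return labels

dense_a = [0, 1, 2, 3, 4]             # spacing 1
dense_b = [10, 11, 12, 13, 14]        # spacing 1, gap of 6 from dense_a
sparse_c = [100, 110, 120, 130, 140]  # spacing 10
values = dense_a + dense_b + sparse_c

small = dbscan_1d(values, eps=2, min_pts=3)
large = dbscan_1d(values, eps=10, min_pts=3)
print(small)  # dense_a -> 0, dense_b -> 1, sparse_c -> all -1 (noise)
print(large)  # dense_a and dense_b merged -> 0, sparse_c -> 1
```

With the small eps the sparse group is lost as noise, and with the large eps the two dense groups merge, so neither setting recovers all three groups in this example.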
