
Density Based Clustering Methods

Density-Based Clustering is an unsupervised learning method that groups data points based on their density, identifying core points, border points, and outliers. Key parameters include EPS, which defines the neighborhood radius, and MinPts, the minimum number of points required to form a core point. The DBSCAN algorithm implements this clustering approach by iterating through points, marking them as part of a cluster or noise based on their density connectivity.

Uploaded by

research.veltech

Density Based Clustering Methods
• Consider the following figures: the dataset in the figure below can easily be divided into three clusters using the k-means algorithm.
• Cluster analysis, or simply clustering, is an unsupervised learning method that divides the data points into a number of groups such that points in the same group have similar properties and points in different groups have different properties in some sense.
Density Based Clustering
• Density-based clustering is one of the most popular unsupervised learning methodologies used in model building and machine learning.
• Clusters are dense regions of points; the points in the low-density regions that separate two clusters are considered noise.
• The region within radius ε of a given object is known as the ε-neighborhood of the object.
• If the ε-neighborhood of an object contains at least a minimum number, MinPts, of objects, then the object is called a core object.
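The ε-neighborhood and the core-object test can be sketched in a few lines of Python. This is only an illustration: the function names `region_query` and `is_core`, the use of Euclidean distance, and the choice to count the object itself as part of its own neighborhood are assumptions made for the example.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i].

    Assumption: Euclidean distance, and the point itself is counted.
    """
    return [k for k, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """True if the eps-neighborhood of points[i] holds at least min_pts points."""
    return len(region_query(points, i, eps)) >= min_pts

# Hypothetical toy data: three nearby points and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
print(region_query(points, 0, eps=1.0))          # [0, 1, 2]
print(is_core(points, 0, eps=1.0, min_pts=3))    # True
print(is_core(points, 3, eps=1.0, min_pts=3))    # False
```

The scan over all points is quadratic overall; a spatial index would normally replace it for large datasets.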
Density-based clustering is governed by two parameters, eps and MinPts.
• EPS: the maximum radius of the neighborhood.
• It defines the neighborhood around a data point: if the distance between two points is less than or equal to ‘eps’, they are considered neighbors.
• If eps is chosen too small, a large part of the data will be considered outliers.
• If it is chosen too large, clusters will merge and the majority of the data points will fall into the same cluster.
• One way to choose eps is the k-distance graph: plot each point's distance to its k-th nearest neighbor in sorted order and pick the value where the curve bends most sharply (the "elbow").
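The values plotted in a k-distance graph can be computed as in the sketch below. It assumes Euclidean distance and a brute-force search; the name `k_distances` and the sample data are made up for the illustration, and picking the bend in the curve is left to visual inspection.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted descending.

    Assumption: the point itself is not counted as its own neighbor here,
    and k is at most len(points) - 1.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result, reverse=True)

# Hypothetical toy data: a tight square of four points plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
kd = k_distances(points, k=2)
for rank, d in enumerate(kd):
    print(rank, round(d, 3))
```

In this toy dataset the isolated point produces the single large value at the start of the curve, while the four square points all share a 2-distance of 1.0; an eps slightly above that flat part would keep the square together and leave the far point as an outlier.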
• MinPts: the minimum number of neighbors (data points) within the eps radius.
• The larger the dataset, the larger the value of MinPts that should be chosen.
• As a general rule, a lower bound for MinPts can be derived from the number of dimensions D in the dataset: MinPts >= D + 1.
• MinPts should be at least 3.
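The two rules of thumb above combine into a simple lower bound. The helper name `min_pts_lower_bound` is hypothetical, and the result is only a starting value, not a tuned setting.

```python
def min_pts_lower_bound(n_dims):
    """Smallest MinPts allowed by the rules MinPts >= D + 1 and MinPts >= 3."""
    return max(3, n_dims + 1)

print(min_pts_lower_bound(2))   # 3
print(min_pts_lower_bound(5))   # 6
```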
• Core point: a point is a core point if there are at least minPts points (including the point itself) within radius eps of it.
• Border point: a point is a border point if it is reachable from a core point but has fewer than minPts points within radius eps of it.
• Outlier: a point is an outlier if it is not a core point and is not reachable from any core point.
• In the example figure, minPts is 4.
• The red points are core points because there are at least 4 points within their surrounding area of radius eps. This area is shown by the circles in the figure.
• The yellow points are border points because they are reachable from a core point and have fewer than 4 points within their neighborhood.
• Reachable here means lying within the surrounding area of a core point. The points B and C each have two points (including the point itself) within their neighborhood (i.e. the surrounding area of radius eps).
• Finally, N is an outlier because it is not a core point and cannot be reached from a core point.
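The three point types can be assigned mechanically, as in the following sketch. It assumes Euclidean distance, counts a point as a member of its own neighborhood, and uses made-up coordinates rather than the points from the figure; `classify_points` is a hypothetical name.

```python
import math

def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border' or 'outlier'."""
    # eps-neighborhood of every point (each point includes itself).
    neighbors = [
        [k for k, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(k in core for k in nb):
            labels.append("border")   # not dense itself, but next to a core point
        else:
            labels.append("outlier")
    return labels

# Hypothetical toy data with minPts = 4, as in the example above.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (2.0, 0.0), (8.0, 8.0)]
print(classify_points(points, eps=1.0, min_pts=4))
# ['core', 'border', 'border', 'border', 'outlier', 'outlier']
```

The point (2.0, 0.0) lies within eps of a border point only, which is not enough: a border point must be in the neighborhood of a core point.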

In this algorithm, there are 3 types of data points.
Core point: a point that has at least MinPts points within eps.
Border point: a point that has fewer than MinPts points within eps but lies in the neighborhood of a core point.
Noise or outlier: a point that is neither a core point nor a border point.
• Eps-neighborhood: NEps(i) = { k ∈ D : dist(i, k) <= Eps }
• Directly density reachable: a point i is directly density reachable from a point k with respect to Eps and MinPts if i belongs to NEps(k) and k is a core point, i.e. |NEps(k)| >= MinPts.
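A small sketch of this relation follows. Besides membership in NEps(k), it also checks that k is a core point, which is the condition the usual DBSCAN definition adds; the function names, Euclidean distance, and the sample points are assumptions for the illustration.

```python
import math

def eps_neighborhood(points, i, eps):
    """NEps(i): indices k with dist(i, k) <= eps (includes i itself)."""
    return {k for k, q in enumerate(points) if math.dist(points[i], q) <= eps}

def directly_density_reachable(points, i, k, eps, min_pts):
    """True if point i is directly density reachable from point k."""
    n_k = eps_neighborhood(points, k, eps)
    # i must lie in the neighborhood of k, and k must be a core point.
    return i in n_k and len(n_k) >= min_pts

# Hypothetical toy data: four points on a line, one unit apart.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(directly_density_reachable(points, 0, 1, eps=1.0, min_pts=3))  # True
print(directly_density_reachable(points, 1, 0, eps=1.0, min_pts=3))  # False
```

The two calls show that the relation is not symmetric: the end point 0 is reachable from the core point 1, but not the other way round, because point 0 has only two points in its neighborhood.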
• Density reachable: a point i is density reachable from a point j with respect to Eps and MinPts if there is a chain of points i_1, ..., i_n with i_1 = j and i_n = i such that each i_(m+1) is directly density reachable from i_m.
Density connected:
• A point i is density connected to a point j with respect to Eps and MinPts if there is a point o such that both i and j are density reachable from o with respect to Eps and MinPts.
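Density reachability and density connectivity can be checked with a breadth-first search that only continues through core points. The sketch below is a brute-force illustration under stated assumptions: Euclidean distance, neighborhoods that include the point itself, and the hypothetical names `density_reachable_set` and `density_connected`.

```python
import math
from collections import deque

def density_reachable_set(points, j, eps, min_pts):
    """Indices of all points that are density reachable from points[j]."""
    nbrs = [
        [k for k, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    reached = set()
    queued = {j}
    queue = deque([j])
    while queue:
        k = queue.popleft()
        if len(nbrs[k]) < min_pts:
            continue  # a non-core point cannot extend the chain
        for i in nbrs[k]:
            reached.add(i)  # i is directly density reachable from core point k
            if i not in queued:
                queued.add(i)
                queue.append(i)
    return reached

def density_connected(points, i, j, eps, min_pts):
    """True if some point o has both i and j density reachable from it."""
    for o in range(len(points)):
        reached = density_reachable_set(points, o, eps, min_pts)
        if i in reached and j in reached:
            return True
    return False

# Hypothetical toy data: a chain of four points and one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (9.0, 9.0)]
print(density_reachable_set(points, 1, eps=1.0, min_pts=3))  # {0, 1, 2, 3}
print(density_reachable_set(points, 0, eps=1.0, min_pts=3))  # set()
print(density_connected(points, 0, 3, eps=1.0, min_pts=3))   # True
print(density_connected(points, 0, 4, eps=1.0, min_pts=3))   # False
```

The two end points of the chain are not core points, so neither is density reachable from the other, yet they are density connected through the core points between them.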
DBSCAN

• DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It relies on a density-based notion of a cluster and identifies clusters of arbitrary shape and size in spatial databases that contain outliers.
Algorithm - DBSCAN
• Let X = {x1, x2, x3, ..., xn} be the set of data points. DBSCAN requires two parameters: ε (eps) and the minimum number of points required to form a cluster (minPts).
• 1) Start with an arbitrary point that has not been visited.
• 2) Extract the neighborhood of this point using ε (all points within distance ε of it form its neighborhood).
• 3) If the neighborhood contains at least minPts points, a new cluster is started and the point is marked as visited; otherwise the point is labeled as noise (it may later become part of a cluster).
• 4) If a point in the cluster is itself a core point, its ε-neighborhood is also part of the cluster, and the procedure from step 2 is repeated for all points in that neighborhood. This continues until all points in the cluster have been determined.
• 5) A new unvisited point is then retrieved and processed, leading to the discovery of a further cluster or of noise.
• 6) The process continues until all points are marked as visited.
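The steps above can be sketched as a short, brute-force Python function. It is a minimal illustration rather than a reference implementation: Euclidean distance, the label value -1 for noise, integer cluster ids starting at 0, and the name `dbscan` are all choices made for the example.

```python
import math

NOISE = -1        # label for noise points (assumed convention)
UNVISITED = None  # marker for points not processed yet

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)

    def region_query(i):
        # Step 2: all points within eps of points[i], including itself.
        return [k for k, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):          # steps 1, 5, 6: visit every point once
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:      # step 3: not dense enough
            labels[i] = NOISE             # may be relabeled as a border point later
            continue
        cluster_id += 1                   # step 3: start a new cluster
        labels[i] = cluster_id
        seeds = list(neighbors)
        pos = 0
        while pos < len(seeds):           # step 4: grow the cluster
            k = seeds[pos]
            pos += 1
            if labels[k] == NOISE:
                labels[k] = cluster_id    # former noise becomes a border point
            if labels[k] is not UNVISITED:
                continue
            labels[k] = cluster_id
            k_neighbors = region_query(k)
            if len(k_neighbors) >= min_pts:   # only core points expand further
                seeds.extend(k_neighbors)
    return labels

# Hypothetical toy data: two small squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (20, 20)]
print(dbscan(points, eps=1.5, min_pts=3))
# [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Each neighborhood query scans the whole dataset, so the sketch runs in quadratic time; production implementations typically use a spatial index for the neighborhood search.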
