Density Based Clustering Methods
Consider the following figures. The dataset in the
figure below can easily be divided into three
clusters using the k-means algorithm.
• Cluster analysis, or simply clustering, is an
unsupervised learning method that divides data
points into groups such that points in the same
group have similar properties and points in
different groups have dissimilar properties.
Density Based Clustering
• Density-based clustering is one of the most popular
families of unsupervised learning methods. It treats
clusters as regions of high point density.
• Data points that lie in the low-density regions
separating clusters are considered noise.
• The region within radius ε of a given object is
known as the ε-neighborhood of the object.
• If the ε-neighborhood of an object contains at least
a minimum number, MinPts, of objects, then the object
is called a core object.
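The two definitions above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names `eps_neighborhood` and `is_core`, the Euclidean distance, and the sample coordinates are assumptions, and the point is counted as a member of its own neighborhood.

```python
import math

def eps_neighborhood(points, index, eps):
    """Indices of all points within distance eps of points[index].

    Assumption: the point itself is counted as part of its neighborhood.
    """
    center = points[index]
    return [j for j, p in enumerate(points) if math.dist(center, p) <= eps]

def is_core(points, index, eps, min_pts):
    """True if the eps-neighborhood holds at least min_pts points."""
    return len(eps_neighborhood(points, index, eps)) >= min_pts

# Hypothetical data: four points close together and one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (5.0, 5.0)]

print(eps_neighborhood(points, 0, eps=1.0))    # [0, 1, 2, 3]
print(is_core(points, 0, eps=1.0, min_pts=4))  # True
print(is_core(points, 4, eps=1.0, min_pts=4))  # False
```

The brute-force scan over all points keeps the sketch short; a spatial index would be the usual choice for larger datasets.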
Density-based clustering is controlled by two parameters, eps and
MinPts.
• eps: the maximum radius of the neighborhood around a data
point.
• It defines the neighborhood around a data point, i.e. if the
distance between two points is less than or equal to eps, then they
are considered neighbors.
• If eps is chosen too small, a large part of the data will be
considered outliers.
• If it is chosen too large, clusters will merge and the majority
of the data points will end up in the same cluster.
• One way to choose eps is to use the k-distance graph.
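A k-distance graph plots, for every point, the distance to its k-th nearest neighbor, sorted in ascending order. A sharp bend ("knee") in that curve is often read as a candidate eps. The sketch below only computes the sorted values. The helper name `k_distances`, the choice k = 2, and the sample coordinates are assumptions made for illustration.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted ascending.

    The point itself is excluded when ranking its neighbors.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Hypothetical data: a unit square plus one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]

curve = k_distances(points, k=2)
print(curve)  # four values of 1.0, then one value above 13
```

Here the jump from 1.0 to about 13.45 marks the knee, so an eps slightly above 1.0 would keep the square together and leave the distant point as an outlier.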
• MinPts: the minimum number of neighbors (data
points) within the eps radius.
• The larger the dataset, the larger the value of
MinPts should be.
• As a general rule, a lower bound for MinPts can
be derived from the number of dimensions D in
the dataset: MinPts >= D + 1.
• MinPts should be at least 3.
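The two rules of thumb for the lower bound can be combined into one small helper. This is a sketch only: the name `suggest_min_pts` is hypothetical, and it returns a minimum, not a tuned value, since larger datasets generally call for larger settings.

```python
def suggest_min_pts(n_dimensions, floor=3):
    """Smallest MinPts allowed by both rules: at least D + 1 and at least 3."""
    return max(n_dimensions + 1, floor)

for d in (1, 2, 5):
    print(d, suggest_min_pts(d))
# 1 3
# 2 3
# 5 6
```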
• Core point: A point is a core point if there are at
least minPts points (including the point itself)
within its surrounding area of radius eps.
• Border point: A point is a border point if it is
reachable from a core point and there are fewer than
minPts points within its surrounding area.
• Outlier: A point is an outlier if it is not a core point
and is not reachable from any core point.
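The three definitions translate directly into a classification routine. The sketch below is one possible reading, not a reference implementation: the function name `classify_points`, the Euclidean distance, and the sample coordinates are assumptions, and "reachable" is taken to mean lying within eps of at least one core point.

```python
import math

def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border', or 'outlier'."""
    # Neighborhood of each point, with the point itself included.
    neighborhoods = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighborhoods) if len(nb) >= min_pts}

    labels = []
    for i, nb in enumerate(neighborhoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")   # not core, but within eps of a core point
        else:
            labels.append("outlier")  # not core and not near any core point
    return labels

# Hypothetical data: a dense square, one point on its edge, one isolated point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (6, 6)]

labels = classify_points(points, eps=1.5, min_pts=4)
print(labels)
# ['core', 'core', 'core', 'core', 'border', 'outlier']
```

The point (2, 1) has only three points in its neighborhood, so it is not a core point, but it lies within eps of the core point (1, 1) and is therefore a border point.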
• In this example, minPts is 4.
• The red points are core points because there
are at least 4 points within their
surrounding area of radius eps. This area
is shown by the circles in the figure.
• The yellow points are border points because
they are reachable from a core point and
have fewer than 4 points within their
neighborhood.
• Reachable means lying within the surrounding
area of a core point. The points B and C each
have two points (including the point itself)
within their neighborhood (i.e. the
surrounding area with a radius of eps).
• Finally, N is an outlier because it is not a
core point and cannot be reached from a
core point.
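The example stops at labelling individual points. One common way to turn those labels into clusters is the DBSCAN-style expansion sketched below, where each unvisited core point starts a cluster that grows through the neighborhoods of the core points it reaches. Everything here is an assumption made for illustration: the function name `density_cluster`, the coordinates (they are not the points of the figure), and the convention of marking outliers with -1.

```python
import math
from collections import deque

def density_cluster(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for outliers."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [-1] * n
    cluster_id = 0
    for start in range(n):
        # Only an unassigned core point may start a new cluster.
        if not is_core[start] or labels[start] != -1:
            continue
        labels[start] = cluster_id
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for j in neighbors[current]:
                if labels[j] == -1:
                    labels[j] = cluster_id
                    # Border points join the cluster but do not extend it.
                    if is_core[j]:
                        queue.append(j)
        cluster_id += 1
    return labels

# Hypothetical data: two dense squares, one border point, one isolated point.
points = [
    (0, 0), (1, 0), (0, 1), (1, 1), (2, 1),
    (10, 10), (11, 10), (10, 11), (11, 11),
    (20, 0),
]

labels = density_cluster(points, eps=1.5, min_pts=4)
print(labels)
# [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With minPts set to 4 as in the example, the isolated point at (20, 0) keeps the label -1, which corresponds to the outlier N in the figure.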