
Density Based Clustering Methods

The document discusses density-based clustering methods, particularly DBSCAN and OPTICS, which are designed to identify clusters of arbitrary shapes. DBSCAN classifies points as core, border, or outlier based on their density and proximity to other points, while OPTICS enhances this by introducing core distance and reachability distance concepts. Both methods aim to effectively discover dense regions in data, improving upon traditional clustering techniques that struggle with non-spherical clusters.

Clustering

• Density based methods


Density Based Methods
• Partitioning and hierarchical methods are designed to find
spherical-shaped clusters. They have difficulty finding clusters of
arbitrary shape, such as “S”-shaped and oval clusters.
• Given such data, they would likely inaccurately identify convex
regions, where noise or outliers are included in the clusters.
• To find clusters of arbitrary shape, alternatively, we can model
clusters as dense regions in the data space, separated by sparse
regions.
• This is the main strategy behind density-based clustering
methods, which can discover clusters of nonspherical shape.
DBSCAN: Density-Based Spatial Clustering of
Applications with Noise

• “How can we find dense regions in density-based
clustering?”
• The density of an object o can be measured by the
number of objects close to o. DBSCAN finds core
objects, that is, objects that have dense
neighborhoods. It connects core objects and their
neighborhoods to form dense regions as clusters.
• “How does DBSCAN quantify the neighborhood of an
object?”
• A user-specified parameter ε > 0 is used to specify
the radius of a neighborhood we consider for every
object. The ε-neighborhood of an object o is the
space within a radius ε centered at o.
• Due to the fixed neighborhood size parameterized by
ε, the density of a neighborhood can be measured
simply by the number of objects in the neighborhood.
• The main idea behind DBSCAN is that a point
belongs to a cluster if it is close to many points
from that cluster.
• There are two key parameters of DBSCAN:
• eps: The distance that specifies the
neighborhoods. Two points are considered to be
neighbors if the distance between them is less
than or equal to eps.
• minPts: The minimum number of data points (including
the point itself) that must lie within eps of a point
for its neighborhood to count as dense.
• Based on these two parameters, points are
classified as core points, border points, or outliers:
• Core point: A point is a core point if there are at
least minPts points (including the point
itself) in its surrounding area with radius eps.
• Border point: A point is a border point if it is
reachable from a core point and there are fewer
than minPts points within its
surrounding area.
• Outlier: A point is an outlier if it is not a core
point and not reachable from any core point.
In this case, minPts is 4. Red points are core points because
there are at least 4 points within their surrounding area
with radius eps. This area is shown with the circles in the
figure. The yellow points are border points because they
are reachable from a core point and have fewer than 4 points
within their neighborhood. Reachable means being in the
surrounding area of a core point. The points B and C have
two points (including the point itself) within their
neighborhood (i.e., the surrounding area with a radius of eps).
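The core, border, and outlier rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_points`, the use of NumPy, the Euclidean metric, and the six sample points are all hypothetical and do not come from the slides or their figure.

```python
import numpy as np

def classify_points(X, eps, min_pts):
    """Label each row of X as 'core', 'border', or 'outlier'."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances (fine for small data; O(n^2) memory).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # neighbors[i, j] is True when j lies within eps of i (i counts itself).
    neighbors = dists <= eps
    is_core = neighbors.sum(axis=1) >= min_pts
    labels = []
    for i in range(len(X)):
        if is_core[i]:
            labels.append("core")
        elif np.any(neighbors[i] & is_core):
            # Not dense itself, but inside the eps-area of some core point.
            labels.append("border")
        else:
            labels.append("outlier")
    return labels

# Hypothetical sample data (not the points from the figure).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
print(classify_points(points, eps=1.0, min_pts=4))
# ['outlier', 'border', 'border', 'core', 'border', 'outlier']
```

Here only (1, 1) has four points within distance 1 of it (itself included), so it is the single core point. The point (0, 0) has three neighbors but none of them is a core point, so it ends up as an outlier.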
• Given a set, D, of objects, we can identify all core
objects with respect to the given parameters, ε and
MinPts. The clustering task is therein reduced to using
core objects and their neighborhoods to form dense
regions, where the dense regions are clusters.
• For a core object q and an object p, we say that p is
directly density-reachable from q (with respect to
ε and MinPts) if p is within the ε-neighborhood of q.
Clearly, an object p is directly density-reachable from
another object q if and only if q is a core object and p
is in the ε-neighborhood of q. Using the directly
density-reachable relation, a core object can “bring”
all objects from its ε-neighborhood into a dense
region.
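The way core objects "bring" their ε-neighborhoods into a dense region can be sketched as a breadth-first expansion. This is a minimal sketch, not a reference implementation: the name `dbscan`, the `NOISE = -1` label convention, the brute-force distance matrix, and the sample data are assumptions made for illustration.

```python
from collections import deque
import numpy as np

NOISE = -1  # assumed label for objects that end up in no cluster

def dbscan(X, eps, min_pts):
    """Return one cluster label per row of X (NOISE for outliers)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # eps-neighborhood of every object (each object includes itself).
    neighborhoods = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_pts for nb in neighborhoods])

    labels = np.full(n, NOISE)
    cluster_id = 0
    for seed in range(n):
        # Only an unassigned core object can start a new dense region.
        if labels[seed] != NOISE or not is_core[seed]:
            continue
        labels[seed] = cluster_id
        queue = deque([seed])  # holds core objects only
        while queue:
            q = queue.popleft()
            # Every p here is directly density-reachable from core object q.
            for p in neighborhoods[q]:
                if labels[p] == NOISE:
                    labels[p] = cluster_id
                    if is_core[p]:
                        queue.append(p)  # p can extend the region further
        cluster_id += 1
    return labels.tolist()

# Hypothetical data: two small squares and one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
print(dbscan(data, eps=1.5, min_pts=3))
# [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this sketch a border object joins whichever cluster reaches it first, and it is never expanded further because it is not a core object.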
https://www.youtube.com/watch?v=RDZUdRSDOok
OPTICS stands for Ordering Points To Identify the Clustering
Structure. It draws inspiration from the DBSCAN clustering
algorithm and adds two more terms to the concepts of DBSCAN
clustering:
1. Core Distance: The minimum radius required to classify a
given point as a core point. If the given point is not a core
point, then its Core Distance is undefined.
2. Reachability Distance: Defined with respect to another data
point, say q. The Reachability Distance of q from a point p is
the maximum of the Core Distance of p and the Euclidean
distance (or some other distance metric) between p and q. Note
that the Reachability Distance is not defined if p is not a
core point.
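Written as formulas, the two terms look roughly as follows. The notation is an assumption chosen for illustration: N_ε(p) denotes the ε-neighborhood of p (counting p itself, as in the DBSCAN definition of a core point), and d(p, q) is the chosen distance metric. The point whose core distance enters the maximum is the one that must be a core point.

```latex
\[
\text{core-dist}_{\varepsilon,\mathit{MinPts}}(p) =
\begin{cases}
\text{undefined}, & \text{if } |N_{\varepsilon}(p)| < \mathit{MinPts},\\[4pt]
\min\{\varepsilon' \le \varepsilon : |N_{\varepsilon'}(p)| \ge \mathit{MinPts}\}, & \text{otherwise,}
\end{cases}
\]
\[
\text{reach-dist}_{\varepsilon,\mathit{MinPts}}(q, p) =
\begin{cases}
\text{undefined}, & \text{if } |N_{\varepsilon}(p)| < \mathit{MinPts},\\[4pt]
\max\{\text{core-dist}_{\varepsilon,\mathit{MinPts}}(p),\ d(p, q)\}, & \text{otherwise.}
\end{cases}
\]
```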
• Core-distance and reachability-distance. Figure
10.16 illustrates the concepts of core-distance and
reachability-distance. Suppose that ε = 6 mm and
MinPts = 5. The core-distance of p is the distance, ε′,
between p and the fourth closest data object from p.
The reachability-distance of q1 from p is the
core-distance of p (i.e., ε′ = 3 mm) because this is greater
than the Euclidean distance from p to q1. The
reachability-distance of q2 with respect to p is the
Euclidean distance from p to q2 because this is
greater than the core-distance of p.
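The example above can be reproduced numerically. The sketch below assumes a hypothetical layout, since the coordinates in Figure 10.16 are not given in the text: p sits at the origin, its four nearest objects lie at 1, 2, 2.5, and 3 mm, q1 is the object at 2 mm, and q2 lies at 5 mm. The function names are likewise illustrative.

```python
import math

def core_distance(p, data, eps, min_pts):
    """Smallest radius that makes p a core object; None if p is not core within eps.

    `data` is assumed to contain p itself, so p contributes a distance of 0.
    """
    dists = sorted(math.dist(p, o) for o in data)
    if len(dists) < min_pts:
        return None
    d = dists[min_pts - 1]  # distance to the (min_pts - 1)-th closest other object
    return d if d <= eps else None

def reachability_distance(q, p, data, eps, min_pts):
    """Reachability-distance of q from p; None if p is not a core object."""
    cd = core_distance(p, data, eps, min_pts)
    if cd is None:
        return None
    return max(cd, math.dist(p, q))

# Hypothetical coordinates in mm (not taken from Figure 10.16).
p = (0.0, 0.0)
q1 = (0.0, 2.0)
q2 = (5.0, 0.0)
data = [p, (1.0, 0.0), q1, (-2.5, 0.0), (0.0, -3.0), q2]

print(core_distance(p, data, eps=6, min_pts=5))              # 3.0
print(reachability_distance(q1, p, data, eps=6, min_pts=5))  # 3.0
print(reachability_distance(q2, p, data, eps=6, min_pts=5))  # 5.0
```

With MinPts = 5 and p counted as its own neighbor, the core-distance is the distance to the fourth closest other object, which matches the 3 mm in the text.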
• OPTICS computes an ordering of all objects in a given database
and, for each object in the database, stores the core-distance and
a suitable reachability-distance. OPTICS maintains a list called
OrderSeeds to generate the output ordering. Objects in
OrderSeeds are sorted by the reachability-distance from their respective
closest core objects, that is, by the smallest reachability-distance
of each object.
• OPTICS begins with an arbitrary object from the input database as
the current object, p. It retrieves the ε-neighborhood of p,
determines the core-distance, and sets the reachability-distance to
undefined. The current object, p, is then written to output.
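The ordering procedure described above can be sketched as follows. This is a simplified illustration, not the published algorithm verbatim: the name `optics_order`, the brute-force neighborhood search, and the use of a binary heap with lazily skipped stale entries as the OrderSeeds list are all assumptions made here.

```python
import heapq
import math

def optics_order(data, eps, min_pts):
    """Return [(index, reachability, core_distance)] in output order."""
    n = len(data)
    processed = [False] * n
    reach = [None] * n  # None stands for "undefined"
    output = []

    def dist(i, j):
        return math.dist(data[i], data[j])

    def visit(i, seeds):
        """Write object i to the output and update OrderSeeds from it."""
        nbrs = [j for j in range(n) if dist(i, j) <= eps]  # includes i
        processed[i] = True
        # Core-distance: distance to the min_pts-th closest object, i included.
        cd = None
        if len(nbrs) >= min_pts:
            cd = sorted(dist(i, j) for j in nbrs)[min_pts - 1]
        output.append((i, reach[i], cd))
        if cd is None:
            return  # not a core object: nothing to add to OrderSeeds
        for o in nbrs:
            if processed[o]:
                continue
            new_reach = max(cd, dist(i, o))
            # Keep only the smallest reachability-distance seen so far.
            if reach[o] is None or new_reach < reach[o]:
                reach[o] = new_reach
                heapq.heappush(seeds, (new_reach, o))

    for start in range(n):
        if processed[start]:
            continue
        seeds = []  # plays the role of OrderSeeds
        visit(start, seeds)  # reachability of the start object is undefined
        while seeds:
            _, q = heapq.heappop(seeds)
            if not processed[q]:  # skip stale heap entries
                visit(q, seeds)
    return output

# Hypothetical data: two groups of three points on a line.
pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
result = optics_order(pts, eps=3, min_pts=2)
print([i for i, _, _ in result])  # [0, 1, 2, 3, 4, 5]
print([r for _, r, _ in result])  # [None, 1.0, 1.0, None, 1.0, 1.0]
```

The undefined reachability values mark the points where a new dense region starts in the ordering. A heap is only one way to keep OrderSeeds sorted; a plain sorted list would behave the same on small inputs.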
• https://in.coursera.org/lecture/cluster-analysis/5-3-optics-ordering-points-to-identify-clustering-structure-JiYeI
