The document discusses density-based clustering methods, particularly DBSCAN and OPTICS, which are designed to identify clusters of arbitrary shapes. DBSCAN classifies points as core, border, or outlier based on their density and proximity to other points, while OPTICS enhances this by introducing core distance and reachability distance concepts. Both methods aim to effectively discover dense regions in data, improving upon traditional clustering techniques that struggle with non-spherical clusters.
Density Based Clustering Methods
Clustering
• Density based methods
Density Based Methods
• Partitioning and hierarchical methods are designed to find spherical-shaped clusters. They have difficulty finding clusters of arbitrary shape, such as “S”-shaped and oval clusters.
• Given such data, they would likely identify convex regions inaccurately, with noise or outliers included in the clusters.
• To find clusters of arbitrary shape, we can instead model clusters as dense regions in the data space, separated by sparse regions.
• This is the main strategy behind density-based clustering methods, which can discover clusters of nonspherical shape.
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
• “How can we find dense regions in density-based clustering?”
• The density of an object o can be measured by the number of objects close to o. DBSCAN finds core objects, that is, objects that have dense neighborhoods. It connects core objects and their neighborhoods to form dense regions as clusters.
• “How does DBSCAN quantify the neighborhood of an object?”
• A user-specified parameter ε > 0 is used to specify the radius of the neighborhood we consider for every object. The ε-neighborhood of an object o is the space within a radius ε centered at o.
• Due to the fixed neighborhood size parameterized by ε, the density of a neighborhood can be measured simply by the number of objects in the neighborhood.
• The main idea behind DBSCAN is that a point belongs to a cluster if it is close to many points from that cluster.
• There are two key parameters of DBSCAN (written ε and MinPts elsewhere in these slides):
• eps: The distance that specifies the neighborhoods. Two points are considered neighbors if the distance between them is less than or equal to eps.
• minPts: The minimum number of data points that a neighborhood of radius eps must contain to count as dense.
• Based on these two parameters, points are classified as core point, border point, or outlier:
• Core point: A point is a core point if there are at least minPts points (including the point itself) in its surrounding area with radius eps.
• Border point: A point is a border point if it is reachable from a core point and there are fewer than minPts points within its surrounding area.
• Outlier: A point is an outlier if it is not a core point and not reachable from any core point.
In the figure, minPts is 4. The red points are core points because there are at least 4 points within their surrounding area with radius eps. This area is shown with the circles in the figure. The yellow points are border points because they are reachable from a core point and have fewer than 4 points within their neighborhood. Reachable means being in the surrounding area of a core point. The points B and C have two points (including the point itself) within their neighborhood (i.e., the surrounding area with a radius of eps).
• Given a set, D, of objects, we can identify all core objects with respect to the given parameters, ε and MinPts. The clustering task is thereby reduced to using core objects and their neighborhoods to form dense regions, where the dense regions are clusters.
• For a core object q and an object p, we say that p is directly density-reachable from q (with respect to ε and MinPts) if p is within the ε-neighborhood of q. Clearly, an object p is directly density-reachable from another object q if and only if q is a core object and p is in the ε-neighborhood of q. Using the directly density-reachable relation, a core object can “bring” all objects from its ε-neighborhood into a dense region.
• https://www.youtube.com/watch?v=RDZUdRSDOok
OPTICS stands for Ordering Points To Identify the Clustering Structure. It draws inspiration from the DBSCAN clustering algorithm and adds two more terms to the concepts of DBSCAN clustering:
1. Core Distance: The minimum value of the radius required to classify a given point as a core point. If the given point is not a core point, then its core distance is undefined.
2. Reachability Distance: It is defined with respect to another data point. The reachability distance of a point q from a point p is the maximum of the core distance of p and the Euclidean distance (or some other distance metric) between p and q. Note that the reachability distance is not defined if p is not a core point.
• Core-distance and reachability-distance. Figure 10.16 illustrates the concepts of core-distance and reachability-distance. Suppose that ε = 6 mm and MinPts = 5. The core-distance of p is the distance, ε′, between p and the fourth closest data object from p (p itself counts as one of the MinPts objects). The reachability-distance of q1 from p is the core-distance of p (i.e., ε′ = 3 mm) because this is greater than the Euclidean distance from p to q1. The reachability-distance of q2 with respect to p is the Euclidean distance from p to q2 because this is greater than the core-distance of p.
• OPTICS computes an ordering of all objects in a given database and, for each object in the database, stores the core-distance and a suitable reachability-distance. OPTICS maintains a list called OrderSeeds to generate the output ordering. Objects in OrderSeeds are sorted by the reachability-distance from their respective closest core objects, that is, by the smallest reachability-distance of each object.
• OPTICS begins with an arbitrary object from the input database as the current object, p. It retrieves the ε-neighborhood of p, determines the core-distance, and sets the reachability-distance to undefined. The current object, p, is then written to output.
• https://in.coursera.org/lecture/cluster-analysis/5-3-optics-ordering-points-to-identify-clustering-structure-JiYeI
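The DBSCAN point classification and the OPTICS core-distance and reachability-distance described above can be sketched in a few lines of Python. This is a minimal illustration, not a full DBSCAN or OPTICS implementation: it does not form clusters or produce the OPTICS ordering. The function names (`neighbors`, `classify`, `core_distance`, `reachability_distance`) and the sample points are assumptions made for this example, and a point is counted as a member of its own neighborhood, matching the "including the point itself" convention used in the slides.

```python
import math

def neighbors(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def classify(points, eps, min_pts):
    """Label each point 'core', 'border', or 'outlier' (hypothetical helper)."""
    hoods = [neighbors(points, i, eps) for i in range(len(points))]
    core = {i for i, hood in enumerate(hoods) if len(hood) >= min_pts}
    labels = []
    for i, hood in enumerate(hoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in hood):
            # Not dense itself, but inside the eps-neighborhood of a core point.
            labels.append("border")
        else:
            labels.append("outlier")
    return labels

def core_distance(points, i, eps, min_pts):
    """Smallest radius that makes points[i] a core point; None if it is not core within eps."""
    # The sorted list starts with 0.0, the distance from the point to itself.
    dists = sorted(math.dist(points[i], q) for q in points)
    if len(dists) < min_pts or dists[min_pts - 1] > eps:
        return None
    return dists[min_pts - 1]

def reachability_distance(points, p, q, eps, min_pts):
    """Reachability-distance of points[q] from points[p]; None if points[p] is not core."""
    cd = core_distance(points, p, eps, min_pts)
    if cd is None:
        return None
    return max(cd, math.dist(points[p], points[q]))

# Made-up sample data: a dense group of four, one point on its edge, one far away.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (8, 8)]
eps, min_pts = 1.5, 4

labels = classify(points, eps, min_pts)
print(labels)  # ['core', 'core', 'core', 'core', 'border', 'outlier']
print(core_distance(points, 0, eps, min_pts))
print(reachability_distance(points, 0, 1, eps, min_pts))  # core-distance dominates
print(reachability_distance(points, 0, 4, eps, min_pts))  # actual distance dominates
```

In this sample, the reachability-distance of point 1 from point 0 equals the core-distance of point 0 (the q1 case above), while for point 4 the actual distance is larger and is used instead (the q2 case). The brute-force distance scans are quadratic in the number of points; practical implementations typically use a spatial index for the neighborhood queries.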
David Taniar - Research and Trends in Data Mining Technologies and Applications. Advances in Data Warehousing and Mining Series, Volume 1. Idea Group Pub (2007)