0% found this document useful (0 votes)
23 views14 pages

DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that identifies clusters based on data density without requiring a pre-defined number of clusters. It categorizes points into core points, border points, and noise points, using parameters ε (epsilon) and MinPts to define cluster density. While it effectively handles noise and can identify arbitrarily shaped clusters, its performance is sensitive to the choice of parameters and may struggle with clusters of varying densities.

Uploaded by

tiya Abid
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
23 views14 pages

DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that identifies clusters based on data density without requiring a pre-defined number of clusters. It categorizes points into core points, border points, and noise points, using parameters ε (epsilon) and MinPts to define cluster density. While it effectively handles noise and can identify arbitrarily shaped clusters, its performance is sensitive to the choice of parameters and may struggle with clusters of varying densities.

Uploaded by

tiya Abid
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 14

DBSCAN

Clustering
DBSCAN
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
is a popular clustering algorithm that groups together closely packed
points in a dataset and marks points in low-density regions as outliers
or noise. Unlike traditional clustering methods like K-means, which
require the number of clusters to be pre-defined, DBSCAN uses
density to determine the number of clusters and the size of each
cluster.
Key Concepts:
1.Core Points: Points that have at least a specified number of
neighboring points within a given radius (ε). These points form the
"core" of a cluster.
2.Border Points: Points that are within the ε radius of a core point but
do not have enough neighbors to be core points themselves.
3.Noise Points (Outliers): Points that do not belong to any cluster.
These points do not meet the requirements for being core or border
points.
Parameters:
• ε (epsilon): The maximum distance between two points to be
considered neighbors.
• MinPts (minimum points): The minimum number of points required
to form a dense region or a core point.
How DBSCAN Works:
1. Initialize: Start with an arbitrary unvisited point.
2. Find neighbors: Identify all points within the ε radius.
3. Core point identification: If the number of neighbors (including the point
itself) is greater than or equal to MinPts, the point becomes a core point,
and a cluster is formed.
4. Expand clusters: All points that are directly reachable from core points
(including border points) are added to the cluster.
5. Mark outliers: Points that are not reachable from any core points (i.e.,
neither core nor border points) are classified as outliers or noise.
6. Repeat: Continue the process for all points until all points are either
assigned to a cluster or marked as noise.
Advantages
• No need to pre-define the number of clusters: Unlike algorithms like
K-means, DBSCAN automatically finds clusters based on data density.
• Can identify arbitrarily shaped clusters: It works well for clusters of
different shapes, unlike K-means which assumes clusters are
spherical.
• Handles noise well: DBSCAN naturally identifies noise points, making
it robust to outliers.
Disadvantage
• Sensitive to ε and MinPts values: The algorithm’s performance can be
sensitive to the choice of ε and MinPts. Setting them incorrectly can
lead to poor clustering results.
• Doesn't perform well with clusters of varying densities: If the dataset
contains clusters with varying densities, DBSCAN may not perform
well, as a single set of parameters may not work for all clusters.
How to Find ε (Epsilon)? Step 1
• Finding an optimal ε can be challenging
because it depends on the dataset's
structure. A typical method to
determine an appropriate ε is the k-
distance graph approach.
• Step 1: Calculate the Distance Between
Each Point and Its Nearest Neighbors
• We need to compute the distance
between each point and its k-th nearest
neighbor, where k = MinPts - 1 = 1 in our
case.
• We’ll calculate the Euclidean distance
between each pair of points.
How to Find ε (Epsilon)? Step 2
• Step 2: Find the k-th Nearest Neighbor Distances (k = 1)
• Now, for each point, we will find the distance to the
nearest neighbor (the smallest distance, since k = 1).
We will sort these distances.
• (1, 2): Nearest neighbor is (2, 2) with a distance of 1.0.
• (2, 2): Nearest neighbor is (1, 2) with a distance of 1.0.
• (2, 3): Nearest neighbor is (2, 2) with a distance of 1.0.
• (8, 7): Nearest neighbor is (8, 8) with a distance of 1.41.
• (8, 8): Nearest neighbor is (8, 7) with a distance of 1.41.
• (25, 80): Nearest neighbor is (8, 8) with a distance of
72.35.
How to Find ε (Epsilon)? Step 3
How to Find ε (Epsilon)? Step 3
• Step 4: Choose ε from the Elbow Point
• Based on the k-distance graph, the elbow point is just before the
large jump in distance. Thus, we choose ε = 1.5 as the optimal value.
Final Example of Using DBSCAN on
Data:
Chosen Parameters:
• ε = 2 (chosen from the k-distance graph method)
• MinPts = 2
DBSCAN Algorithm Steps:
•Point (1, 2):
•Distance to nearest neighbors: (2, 2), (2, 3) within ε = 2 → Core point.
•Assign it to a cluster.
•Point (2, 2):
•Distance to nearest neighbors: (1, 2), (2, 3) within ε = 2 → Core point.
•Assign it to the same cluster.
•Point (2, 3):
•Distance to nearest neighbors: (1, 2), (2, 2) within ε = 2 → Core point.
•Assign it to the same cluster.
•Point (8, 7):
•Distance to nearest neighbors: (8, 8) within ε = 2 → Border point.
•Assign it to a new cluster (Cluster 2).
•Point (8, 8):
•Distance to nearest neighbors: (8, 7) within ε = 2 → Border point.
•Assign it to Cluster 2.
•Point (25, 80):
•Distance to nearest neighbors: No points within ε = 2 → Noise.
•Mark it as outlier.

You might also like