DBSCAN Clustering

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that identifies clusters based on point density without requiring the number of clusters to be specified in advance. It categorizes points into core, border, and noise points, and uses parameters like Epsilon (ϵ) and MinPts to define clusters. While it effectively detects arbitrary shapes and handles noise, it is sensitive to parameter choices and struggles with varying densities and high-dimensional data.

Uploaded by

Rana Ben Fraj
DBSCAN Clustering

Definition:
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a
density-based clustering algorithm that identifies clusters in data by grouping
points that are close together and marking points in low-density regions as
noise or outliers. It does not require specifying the number of clusters in
advance and works well for clusters of arbitrary shape, including nested
clusters.

Key Concepts of DBSCAN:


1. Core Points, Border Points, and Noise
Core Points:

Points with at least a minimum number of neighboring points (MinPts)
within a specified distance (ϵ).

These points are considered central to a cluster.

Border Points:

Points that lie within the ϵ-neighborhood of a core point but do not
themselves have enough neighbors to be a core point.

They "belong" to the cluster of the core point.

Noise Points:

Points that are not core points and are not within the ϵ-neighborhood of any
core point.

Treated as outliers.
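
The three point types above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function name
classify_points, the sample coordinates, and the values eps=1.1 and min_pts=3
are made up for the example, distances are Euclidean, and the neighbor count
includes the point itself.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' (hypothetical helper)."""
    n = len(points)
    # neighbors[i] holds the indices of all points within eps of point i,
    # including i itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense enough itself, but inside a core point's neighborhood.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Illustrative data: a small square, one point hanging off it, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
labels = classify_points(points, eps=1.1, min_pts=3)
print(labels)
```

Here the four corners of the square are core points, (2.0, 1.0) is a border
point because its only neighbor (1.0, 1.0) is a core point, and (8.0, 8.0) is
noise.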

2. Parameters
Epsilon (ϵ):

Maximum distance between two points for them to be considered neighbors.

MinPts:

The minimum number of points required to form a dense region
(including the point itself).
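
To see how the two parameters interact, the sketch below counts the points in
the ϵ-neighborhood of a single point for several values of ϵ. The helper name
neighborhood_size and the one-dimensional sample data are assumptions made for
illustration only.

```python
import math

def neighborhood_size(points, index, eps):
    """Count points within eps of points[index], the point itself included."""
    center = points[index]
    return sum(1 for p in points if math.dist(center, p) <= eps)

# Illustrative one-dimensional data, stored as 1-tuples.
points = [(0.0,), (0.5,), (1.2,), (2.5,), (6.0,)]
min_pts = 3

for eps in (0.4, 1.0, 3.0):
    size = neighborhood_size(points, 0, eps)
    status = "core" if size >= min_pts else "not core"
    print(f"eps={eps}: {size} point(s) around {points[0]} -> {status}")
```

With MinPts = 3, the first point only becomes a core point at ϵ = 3.0, where
its neighborhood holds 4 points; at ϵ = 0.4 and ϵ = 1.0 it holds 1 and 2.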

Steps:

1. Identify Core Points:

For each point, count how many points fall within its ϵ-neighborhood.

If the count ≥ MinPts, the point is a core point.

In other words, a point is a core point if it has at least MinPts points
(including itself) within the given radius ϵ.

2. Expand Clusters:

Start with an unvisited core point.

Create a new cluster and include all points in its ϵ-neighborhood.

Recursively add all neighboring core points and their neighbors to the
cluster.

3. Classify Points:

Border points (non-core points) are added to the cluster of a core point
whose ϵ-neighborhood contains them, but they are not used to expand the
cluster: a non-core point can join a cluster, but expansion stops at it.

Points not belonging to any cluster are classified as noise.

Remaining points are called outliers/noise points.
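
The steps above can be put together in a short pure-Python sketch. It is a
minimal illustration, not a reference implementation: the names dbscan,
region_query, and NOISE are chosen for this example, distances are Euclidean,
neighborhoods are found by brute force, and a border point is given to the
first cluster that reaches it. A queue is used instead of recursion.

```python
import math
from collections import deque

NOISE = -1  # label used here for outliers (an assumption of this sketch)

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    n = len(points)

    def region_query(i):
        # Indices of all points within eps of point i (i itself included).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n            # None means "not visited yet"
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE      # may still become a border point later
            continue
        # Point i is an unvisited core point: start a new cluster from it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id     # border point: joins, never expands
                continue
            if labels[j] is not None:
                continue                   # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # only core points expand the cluster
    return labels

# Illustrative data: two groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1),
          (10, 10), (10, 11), (11, 10), (5, 5)]
labels = dbscan(points, eps=1.1, min_pts=3)
print(labels)
```

On this data the first five points form cluster 0, the next three form
cluster 1, and (5, 5) is labeled as noise. The number of clusters is never
passed in; it falls out of ϵ and MinPts.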

Advantages
1. No Need to Specify K:

Unlike K-Means, DBSCAN automatically determines the number of
clusters based on the data.

2. Detects Arbitrary Shapes:

Can identify clusters of irregular shapes (e.g., spirals, concentric
circles).

3. Handles Noise:

Effectively identifies outliers as noise points.

4. Works Well for Density-Based Clusters:

Clusters are defined by dense regions of data.
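
The concentric-circles case can be tried with a compact sketch that treats
clusters as connected groups of core points plus the non-core points they
reach. Everything here is an assumption for illustration: the helper names
ring and density_clusters, the ring sizes, and the values eps=0.5 and
min_pts=3.

```python
import math

def ring(radius, count):
    """Evenly spaced points on a circle centered at the origin."""
    return [
        (radius * math.cos(2 * math.pi * k / count),
         radius * math.sin(2 * math.pi * k / count))
        for k in range(count)
    ]

def density_clusters(points, eps, min_pts):
    """Label connected groups of core points; -1 marks points in no cluster."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    next_id = 0
    for start in range(n):
        if not core[start] or labels[start] != -1:
            continue
        labels[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in neighbors[i]:
                if labels[j] == -1:
                    labels[j] = next_id
                    if core[j]:
                        stack.append(j)   # only core points keep expanding
        next_id += 1
    return labels

inner = ring(1.0, 20)                     # small circle
outer = ring(3.0, 60)                     # larger circle around it
points = inner + outer + [(10.0, 10.0)]   # plus one far-away outlier
labels = density_clusters(points, eps=0.5, min_pts=3)
print(sorted(set(labels)))
```

The inner ring ends up as cluster 0, the outer ring as cluster 1, and the
far-away point as noise (-1). Both rings share the same center, so their
centroids nearly coincide; the clusters are separated by density, not by
distance to a centroid.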

Limitations
1. Parameter Sensitivity:

The results depend heavily on the choice of ϵ and MinPts.

An ϵ that is too small results in many small clusters or mostly noise,
while an ϵ that is too large may merge distinct clusters.

2. Varying Densities:

Struggles when clusters have different densities. A single ϵ value may
not work well for all clusters.

3. High Dimensionality:

Computing distances becomes less meaningful in high-dimensional
data.
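
The high-dimensionality limitation can be illustrated with a small experiment.
The sketch below measures how much farther the farthest point is than the
nearest one, relative to the nearest, for random data of growing dimension.
The function name relative_contrast, the uniform random data, the sample size,
and the seed are all assumptions made for this example.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:4d}  relative contrast={relative_contrast(dim):.2f}")
```

As the dimension grows, the contrast shrinks: the nearest and farthest points
end up at almost the same distance, so a single radius ϵ separates "dense"
from "sparse" regions less and less clearly.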
