DBSCAN Clustering

DBSCAN is a density-based clustering algorithm that identifies clusters as dense regions and marks outliers as noise, effectively handling arbitrary-shaped clusters and noise. It requires two key parameters: eps, which defines the neighborhood radius, and MinPts, the minimum number of points needed to form a dense region. Unlike K-Means, DBSCAN does not require specifying the number of clusters and is robust to outliers.

Uploaded by

sarey74393

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a
density-based clustering algorithm that groups data points that are closely
packed together and marks points in low-density regions as noise. It identifies
clusters as dense regions in the feature space, separated by areas of lower
density.
Unlike K-Means and many hierarchical clustering variants, which tend to favor
compact, roughly spherical clusters, DBSCAN copes well with real-world data
irregularities such as:
• Arbitrary-Shaped Clusters: Clusters can take any shape, not just circular
or convex.
• Noise and Outliers: Noise points are identified and left unassigned rather
than being forced into a cluster.
Key Parameters in DBSCAN
1. eps: The radius of the neighborhood around a data point. Two points are
considered neighbors if the distance between them is less than or equal to eps.
Choosing the right eps is crucial:
• If eps is too small, most points will be classified as noise.
• If eps is too large, clusters may merge, and the algorithm may fail to
distinguish between them.
A common method for choosing eps is to analyze the k-distance graph: plot each
point's distance to its k-th nearest neighbor in sorted order and look for the
"elbow" where the distance rises sharply.
2. MinPts: The minimum number of points required within the eps radius of a
point for its neighborhood to count as a dense region. Implementations commonly
count the point itself toward this total.
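The k-distance idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper name k_distances and the toy points are hypothetical, Euclidean distance is assumed, and the choice of k (often tied to MinPts, though conventions vary) is left to the reader.

```python
import math

def k_distances(points, k):
    """Return each point's distance to its k-th nearest neighbor, sorted ascending.

    Hypothetical helper for illustration; uses brute-force Euclidean distances.
    """
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Toy data (assumed for the example): two tight groups plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]

kd = k_distances(points, k=3)
for rank, d in enumerate(kd):
    print(rank, round(d, 3))
```

In this toy data the sorted values stay flat for the eight grouped points and then jump for the isolated one. An eps just above the flat part of the curve would be a reasonable first guess.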
How Does DBSCAN Work?
DBSCAN works by categorizing data points into three types:
1. Core points, which have at least MinPts neighbors within the eps (epsilon)
radius.
2. Border points, which lie within eps of a core point but have too few
neighbors to be core points themselves.
3. Noise points, which are neither core nor border points and so do not belong
to any cluster.
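The three point types can be illustrated with a small brute-force classifier. This is a minimal sketch, not a reference implementation: the function name classify_points and the sample data are hypothetical, Euclidean distance is assumed, and a point is counted as its own neighbor.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise' (illustrative helper)."""
    # Indices of all points within eps of each point, including the point itself.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nbrs in enumerate(neighbors) if len(nbrs) >= min_pts}

    labels = []
    for i, nbrs in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nbrs):
            labels.append("border")   # near a core point, but not dense itself
        else:
            labels.append("noise")    # not within eps of any core point
    return labels

# Toy data (assumed): a small square, one point hanging off it, one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
labels = classify_points(points, eps=1.0, min_pts=3)
print(labels)
```

With these assumed values the four square corners come out as core points, (2, 1) as a border point attached through (1, 1), and (8, 8) as noise.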
Steps in the DBSCAN Algorithm
1. Identify Core Points: For each point in the dataset, count the number of
points within its eps neighborhood. If the count meets or exceeds MinPts,
mark the point as a core point.
2. Form Clusters: For each core point that is not already assigned to a cluster,
create a new cluster. Add every point within the eps radius of that core point,
then repeat the expansion from each newly added core point until no more points
can be reached. Border points are added to the cluster but are not expanded
further.
3. Density Connectivity: Two points, a and b, are density-connected if both can
be reached from a common core point through chains of points in which each
point lies within the eps radius of the previous one and every point except
possibly the last is a core point. This chaining ensures that all points in a
cluster are linked through a series of dense regions.
4. Label Noise Points: After processing all points, any point that does not
belong to a cluster is labeled as noise.
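The steps above can be sketched as a short brute-force implementation. It is a teaching sketch under several assumptions: the names dbscan and NOISE are hypothetical, Euclidean distance is used, each point counts as its own neighbor, and the expansion uses an explicit queue where the text describes recursion. A border point within reach of two clusters is assigned to whichever cluster reaches it first.

```python
import math
from collections import deque

NOISE = -1  # label used here for points that end up in no cluster

def dbscan(points, eps, min_pts):
    """Return one cluster label per point (0, 1, ...) or NOISE."""
    n = len(points)
    # Step 1: eps-neighborhoods (brute force, including the point itself).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster_id = 0

    for i in range(n):
        # Only an unassigned core point can start a new cluster.
        if labels[i] is not None or len(neighbors[i]) < min_pts:
            continue
        # Steps 2 and 3: grow the cluster through chains of core points.
        labels[i] = cluster_id
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_pts:   # core point: keep expanding
                queue.extend(neighbors[j])
        cluster_id += 1

    # Step 4: anything never reached from a core point is noise.
    return [NOISE if lab is None else lab for lab in labels]

# Toy data (assumed): two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1),
          (8, 8),
          (20, 20), (20, 21), (21, 20)]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)
```

The pairwise distance computation is quadratic in the number of points, which is fine for a toy example. Practical implementations typically rely on a spatial index for the neighborhood queries.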
DBSCAN | K-Means
The number of clusters does not need to be specified. | The number of clusters must be specified, and results are very sensitive to it.
Clusters can be of any arbitrary shape. | Clusters are spherical or convex in shape.
Works well with datasets containing noise and outliers. | Does not work well with outliers; they can skew the clusters to a very large extent.
Two parameters (eps and MinPts) are required to train the model. | Only one parameter (the number of clusters) is required to train the model.
