
Density Based Clustering

The document discusses Density-Based Clustering, specifically the DBSCAN algorithm, which groups similar data points based on density and does not require pre-specification of the number of clusters. It outlines the advantages of DBSCAN, such as robustness to outliers and the ability to find arbitrarily shaped clusters, while also addressing its limitations, including sensitivity to hyperparameters and challenges with varying density clusters. The conclusion emphasizes DBSCAN's utility in various fields and suggests potential improvements for future applications.

Uploaded by

SRI RAM

National Institute of Technology, Arunachal Pradesh

Density Based Clustering

Ziaur Rahman Ansari


Daulat Jaduan
Ajay Kumar Yadav
Introduction to Clustering

• What is Clustering?
– Clustering groups similar data points together.
– Various types: centroid-based, hierarchical, density-based.

• Why Clustering?
– Helps in pattern recognition, data compression, and anomaly detection.
Types of Clustering Algorithms

• Centroid-Based Clustering: K-Means (tends to form spherical clusters).
• Hierarchical Clustering: Builds a tree of clusters.
• Density-Based Clustering: Focus of this presentation (e.g., DBSCAN).
Figure: DBSCAN Clustering
Why DBSCAN?

1. Specify Number of Clusters
2. Sensitive to Outliers
DBSCAN
› DBSCAN is a density-based algorithm.
● Density = number of points within a specified radius r (Eps).
● A point is a core point if it has at least a specified number of points (MinPts) within Eps, counting the point itself.
› These are points that are at the interior of a cluster.
● A border point has fewer than MinPts points within Eps, but is in the neighborhood of a core point.
● A noise point is any point that is neither a core point nor a border point.
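The three point types above can be sketched in a few lines of Python. This is a minimal brute-force illustration, not a reference implementation: the function name `classify_points`, the toy coordinates, and the choice to count a point as a member of its own neighbourhood (with the test "at least MinPts") are all assumptions made for the example.

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'.

    Assumption: the eps-neighbourhood of a point includes the point itself.
    """
    # Indices of all points within eps of each point (brute force, O(n^2)).
    neighbours = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(nbrs) >= min_pts for nbrs in neighbours]

    labels = []
    for i, nbrs in enumerate(neighbours):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in nbrs):
            # Not dense enough itself, but inside a core point's neighbourhood.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Hypothetical toy data: a tight group, one point on its edge, one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
labels = classify_points(points, eps=1.0, min_pts=3)
print(labels)
```

With these values the four corner points are core points, (2, 1) is a border point because it lies within Eps of the core point (1, 1), and (8, 8) is noise.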
DBSCAN
› Two parameters (ε and MinPts):
● ε (Eps): Maximum radius of the neighbourhood
● MinPts: Minimum number of points in an Eps-neighbourhood of that point
● Nε(p) = {q ∈ D | dist(p, q) ≤ ε}
› Directly density-reachable: A point p is directly density-reachable from a point q w.r.t. ε, MinPts if
1) p belongs to Nε(q)
2) core point condition: |Nε(q)| ≥ MinPts
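The two conditions for direct density-reachability translate almost literally into code. The sketch below assumes Euclidean distance and a small hypothetical dataset `data` standing in for D; the function names are illustrative only.

```python
from math import dist

def eps_neighbourhood(p, data, eps):
    """N_eps(p): every point q in data with dist(p, q) <= eps."""
    return [q for q in data if dist(p, q) <= eps]

def directly_density_reachable(p, q, data, eps, min_pts):
    """True if p is directly density-reachable from q w.r.t. eps and min_pts."""
    n_q = eps_neighbourhood(q, data, eps)
    # Condition 1: p lies in N_eps(q). Condition 2: q satisfies the core point condition.
    return p in n_q and len(n_q) >= min_pts

# Hypothetical dataset D.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]

forward = directly_density_reachable((2, 1), (1, 1), data, eps=1.0, min_pts=3)
backward = directly_density_reachable((1, 1), (2, 1), data, eps=1.0, min_pts=3)
print(forward, backward)
```

The relation is not symmetric: (2, 1) is directly density-reachable from the core point (1, 1), but not the other way round, because (2, 1) has only two points in its neighbourhood and so fails the core point condition.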
Core Points, Border Points and Noise Points
Density Connected Points
DBSCAN: Large Eps
DBSCAN: Optimal Eps
DBSCAN Algorithm

› Step 1 - Label every point as a core point, border point, or noise point.
› Step 2 - For each core point that has not yet been assigned to a cluster:
– Step 2a - Create a new cluster.
– Step 2b - Add all unclustered points that are density-connected to the current point to this cluster.
› Step 3 - Assign each unclustered border point to the cluster of its nearest core point.
› Step 4 - Leave all noise points as they are.
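The four steps above can be sketched as a short brute-force routine. This is an illustrative O(n²) version under stated assumptions (Euclidean distance, neighbourhoods that include the point itself, noise encoded as -1); the function name `dbscan` and the toy data are hypothetical, and production libraries organise the work differently.

```python
from math import dist

NOISE = -1  # label used here for points that end up in no cluster

def dbscan(points, eps, min_pts):
    """Return one cluster label per point (0, 1, ...) or NOISE."""
    n = len(points)
    # Step 1: eps-neighbourhoods (each includes the point itself) and core flags.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nbrs) >= min_pts for nbrs in neighbours]

    labels = [NOISE] * n
    cluster = 0
    # Step 2: grow one cluster from each core point that is still unclustered.
    for i in range(n):
        if not is_core[i] or labels[i] != NOISE:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            current = stack.pop()
            for j in neighbours[current]:
                if is_core[j] and labels[j] == NOISE:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1

    # Step 3: attach each border point to the cluster of its nearest core point.
    for i in range(n):
        if is_core[i]:
            continue
        core_nbrs = [j for j in neighbours[i] if is_core[j]]
        if core_nbrs:
            nearest = min(core_nbrs, key=lambda j: dist(points[i], points[j]))
            labels[i] = labels[nearest]

    # Step 4: points that are neither core nor border keep the NOISE label.
    return labels

# Hypothetical toy data: two groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1),
          (10, 10), (10, 11), (11, 10), (5, 5)]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)
```

Note that the number of clusters is never passed in: two clusters emerge from the data, and the isolated point (5, 5) is left as noise.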
DBSCAN Visualization

APPLICATION OF DBSCAN
1. City Planning:
› Helps city planners find areas with a lot of people or buildings to decide where to put services like schools and parks.

2. Image Analysis:
› Used to detect objects in pictures or to separate different parts of an image, like finding tumors in medical scans.

3. Fraud Detection:
› Helps find unusual activities in banking transactions that might indicate fraud.

4. Studying Genes:
› Groups genes with similar behavior to understand how they work in the body.

5. Understanding Customers:
› Businesses use it to group customers with similar shopping habits to create targeted marketing.

6. Social Media:
› Helps find communities of users with similar interests on platforms like Facebook or Twitter.

7. Organizing Documents:
› Groups similar documents together, which is useful for searching and organizing large collections of text.

8. Traffic Analysis:
› Analyzes traffic data to find congestion spots and improve traffic flow.

9. Quality Control in Manufacturing:
› Helps identify patterns in product defects to improve manufacturing processes.
Advantages of DBSCAN Algorithm
1. Robust to outliers: DBSCAN can effectively identify and handle noise (outliers) in the data. Points that don't belong to any cluster are labeled as noise, allowing for cleaner results.
2. No need to specify the number of clusters: Unlike K-means, where you must specify the number of clusters beforehand, DBSCAN automatically determines the number of clusters based on the data.
3. Can find arbitrarily shaped clusters: Unlike algorithms that tend to find only roughly spherical clusters (like K-means), DBSCAN can discover clusters of any shape, making it more versatile for real-world data.
Limitations of DBSCAN Algorithm
1. Sensitivity to hyperparameters: The algorithm requires setting two parameters: the radius (epsilon) and the minimum number of points required to form a dense region (minPts). Choosing inappropriate values can lead to poor clustering results.

2. Difficulty with varying density clusters: While DBSCAN can handle clusters of different shapes, it may have difficulty with clusters of varying densities. If clusters differ significantly in density, a single setting of epsilon and minPts may miss the sparser clusters or merge the denser ones.

3. Spatial indexing limitations: The performance of DBSCAN can degrade if the dataset is not well-distributed. Spatial indexing structures may not be effective if the data is unevenly distributed.
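The varying-density limitation can be seen on a tiny one-dimensional example. The sketch below only counts neighbours, which is enough to show the effect; the data, the helper name `neighbour_counts`, and the two radii are hypothetical values chosen for illustration, and neighbourhoods are assumed to include the point itself.

```python
def neighbour_counts(xs, eps):
    """Size of each point's eps-neighbourhood on a line (the point itself included)."""
    return [sum(1 for y in xs if abs(x - y) <= eps) for x in xs]

# Hypothetical 1-D data: a dense group (spacing 1) next to a sparse group (spacing 10).
dense = list(range(0, 11))      # 0, 1, ..., 10
sparse = [20, 30, 40, 50, 60]
xs = dense + sparse
min_pts = 3

# Small eps: every sparse point sees only itself, so the whole sparse group is noise.
small = neighbour_counts(xs, eps=3)
sparse_is_noise = all(count < min_pts for count in small[len(dense):])

# Large eps: the sparse points at 20..50 now satisfy the core point condition,
# but the gap between the groups (10 to 20) is also within eps, so the dense
# and sparse groups are chained into a single cluster.
large = neighbour_counts(xs, eps=10)
sparse_has_core = all(count >= min_pts for count in large[len(dense):-1])
groups_bridged = abs(sparse[0] - dense[-1]) <= 10 and large[len(dense)] >= min_pts

print(sparse_is_noise, sparse_has_core, groups_bridged)
```

In this example no single radius recovers both groups as separate clusters: the small radius discards the sparse group, and the large radius merges the two.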
CHALLENGES
› Some of the challenges that come with using DBSCAN, and best practices for overcoming them:
› Selecting Epsilon (ε) and minPts:
– Challenge: Choosing appropriate values for these parameters is crucial. If ε is too small, many points will be marked as noise. If it is too large, distinct clusters may merge. Similarly, setting minPts too low might lead to identifying too many small clusters.
› Dealing with High-Dimensional Data:
– Challenge: DBSCAN may struggle in high-dimensional spaces because distance metrics like Euclidean distance become less meaningful, leading to poor clustering results.
– Best Practice: Use dimensionality reduction techniques like PCA (Principal Component Analysis) or t-SNE before applying DBSCAN. This can simplify the data and make clustering more effective.
› Using Domain Knowledge:
– Challenge: Without proper understanding of the data, parameter tuning can be difficult.
– Best Practice: Leverage domain knowledge to adjust the parameters. For example, in geographic clustering, you might have an idea of reasonable distances (ε) based on physical locations.
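For the ε selection challenge above, one commonly used heuristic (not described in these slides, so treat it as an outside suggestion) is to sort every point's distance to its k-th nearest neighbour and look for a sharp bend in the resulting curve. The sketch below is a minimal brute-force version; the function name `k_distances`, the toy data, and the choice k = 2 are assumptions, and conventions for relating k to minPts vary.

```python
from math import dist

def k_distances(points, k):
    """Distance from each point to its k-th nearest other point, sorted descending."""
    result = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result, reverse=True)

# Hypothetical toy data: one compact group plus a distant outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (9, 9)]
curve = k_distances(points, k=2)
print([round(d, 2) for d in curve])
```

Here the curve drops sharply after the first value, which belongs to the outlier. A candidate ε would sit just above the flat part of the curve, so that grouped points have enough neighbours while the outlier does not.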
Conclusion

› Key Takeaways:
– DBSCAN is a powerful algorithm for finding clusters of arbitrary shapes and handling noise.
– It doesn't require the user to pre-specify the number of clusters, making it versatile in exploratory data analysis.
– Choosing the right parameters (ε and minPts) is critical for effective clustering, and DBSCAN may struggle with high-dimensional or varied density datasets.
› Importance:
– DBSCAN's ability to find clusters in real-world, noisy datasets makes it highly useful in fields such as urban planning, market analysis, and anomaly detection.
› Final Thought:
– As machine learning evolves, density-based clustering algorithms may improve with more adaptive methods to handle high-dimensional data, auto-tune parameters, or work with more complex data structures like graphs and networks. Extensions of DBSCAN (e.g., HDBSCAN) and hybrid approaches could address existing challenges and expand the algorithm's applicability further.
THANK YOU!
