DBSCAN
Introduction
• Clustering analysis, or simply clustering, is an unsupervised learning method that divides the data points into a number of groups. Data points in the same group have similar properties, and data points in different groups have dissimilar properties.
• Partitioning methods (such as K-means) and hierarchical clustering are best suited to finding spherical or convex clusters. In other words, they work well only for compact, well-separated clusters.
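The sketch below illustrates why K-means favours convex clusters. It implements Lloyd's algorithm, the standard K-means iteration, in plain Python and runs it with k = 2 on two concentric rings, which form a hypothetical non-convex data set. Every point goes to its nearest centroid, so the boundary between two clusters is a straight line. No straight line can separate the inner ring from the outer ring that surrounds it, so K-means cuts both rings in half instead. The `ring` helper and the hand-picked starting centroids are illustrative assumptions.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return labels, centroids


def ring(radius, n):
    """n evenly spaced points on a circle (hypothetical test data)."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


points = ring(1.0, 40) + ring(3.0, 40)  # inner ring first, then outer ring
labels, centroids = kmeans(points, centroids=[(-1.0, 0.0), (1.0, 0.0)])
print(sorted(set(labels[:40])), sorted(set(labels[40:])))  # prints [0, 1] [0, 1]
```

Each ring ends up split between both clusters, with left halves in one cluster and right halves in the other. This outcome does not depend on the starting centroids chosen here.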
K-Means | DBSCAN
K-means generally assigns every object to a cluster. | DBSCAN discards objects that it classifies as noise.
K-means needs a prototype-based concept of a cluster. | DBSCAN needs a density-based concept of a cluster.
K-means has difficulty with non-globular clusters and with clusters of different sizes. | DBSCAN can handle clusters of different sizes and shapes and is not strongly affected by noise or outliers.
K-means can be used for data that has a well-defined centroid, such as a mean or median. | DBSCAN requires that its definition of density, which is based on the traditional Euclidean notion of density, be meaningful for the data.
K-means can be applied to sparse, high-dimensional data, such as document data. | DBSCAN typically performs poorly on such data because the traditional Euclidean definition of density does not work well for high-dimensional data.
The basic K-means algorithm is closely related to a statistical clustering approach (mixture models) that assumes all clusters come from spherical Gaussian distributions with different means but the same covariance matrix. | DBSCAN makes no assumption about the distribution of the data.
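The fifth row of the table rests on a common observation: Euclidean distances lose contrast as the number of dimensions grows. The following sketch is an illustration and not part of DBSCAN itself. It measures the relative contrast (max distance - min distance) / min distance from one query point to random points. As this ratio shrinks toward zero, "near" and "far" become hard to tell apart, and so does any density defined by counting points within a fixed Euclidean radius. The uniform-hypercube data and the `relative_contrast` helper are hypothetical choices for this demonstration.

```python
import math
import random


def relative_contrast(dim, n=200, seed=0):
    """(max - min) / min of Euclidean distances from one query point to n
    points drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed so the numbers are reproducible
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)]) for _ in range(n)]
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    # Contrast falls sharply as dim grows: distances bunch up around one value.
    print(dim, round(relative_contrast(dim), 3))
```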
• Real-life data may contain irregularities, such as:
  • Clusters of arbitrary shape, such as those shown in the figure below.
  • Noise.
• The figure shows a data set containing non-convex clusters (shapes with an interior angle greater than 180 degrees) and outliers. Given such data, the k-means algorithm has difficulty identifying these arbitrarily shaped clusters.
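Here is a minimal DBSCAN sketch in plain Python that shows how a density-based method handles both irregularities. It uses a brute-force O(n²) neighbour search with no spatial index. The data are two concentric rings plus one isolated point, standing in hypothetically for the figure. In this sketch, `eps` is the neighbourhood radius and `min_pts` is the minimum neighbourhood size (the point itself included) needed for a point to count as dense. These names and values are assumptions chosen for the demo. Noise is labelled -1, following a common convention.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: label each point with a cluster id (0, 1, ...) or NOISE."""
    def neighbours(i):
        # Euclidean eps-neighbourhood of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a dense (core) point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is also dense: keep growing the cluster
    return labels


def ring(radius, n):
    """n evenly spaced points on a circle (hypothetical test data)."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


points = ring(1.0, 40) + ring(3.0, 40) + [(6.0, 6.0)]  # two rings plus an outlier
labels = dbscan(points, eps=0.5, min_pts=3)
print(sorted(set(labels[:40])), sorted(set(labels[40:80])), labels[80])  # prints [0] [1] -1
```

Each ring becomes its own cluster because neighbouring points on a ring lie within `eps` of each other, while the 2-unit gap between the rings does not. The isolated point is reported as noise instead of being forced into a cluster.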
Parameters Required For DBSCAN Algorithm