Clustering
Introduction to Clustering
Clustering is a type of unsupervised learning method.
An unsupervised learning method draws inferences from
datasets consisting of input data without labeled responses.
It is generally used to find meaningful structure,
explanatory underlying processes, generative features,
and groupings inherent in a set of examples.
Overview
Clustering is the task of dividing a population or set of
data points into groups such that data points in the same
group are more similar to each other than to data points
in other groups.
In other words, it groups objects on the basis of the
similarity and dissimilarity between them.
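As a rough illustration of this definition, the sketch below compares the average distance between points inside one group with the average distance between points of two different groups. The sample coordinates, the helper names `mean_within` and `mean_between`, and the use of Euclidean distance as the similarity measure are all assumptions made for this example.

```python
import math
from itertools import combinations, product

def mean_within(cluster):
    """Average pairwise Euclidean distance between points of one group."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def mean_between(cluster_1, cluster_2):
    """Average Euclidean distance between points taken from two different groups."""
    pairs = list(product(cluster_1, cluster_2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Hypothetical 2-D data: two visibly separated groups.
group_a = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
group_b = [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

print("within A: ", mean_within(group_a))
print("within B: ", mean_within(group_b))
print("between:  ", mean_between(group_a, group_b))
```

For a grouping that fits the definition, both within-group averages should come out clearly smaller than the between-group average.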
Clustering is a main task of exploratory data analysis and a
common technique for statistical data analysis, used in
many fields, including
pattern recognition,
image analysis,
and machine learning.
Why Clustering
Clustering is important because it determines the
intrinsic grouping among unlabelled data.
There is no single criterion for a good clustering. It depends on
the user and on which criteria satisfy their need. For instance,
we could be interested in finding
representatives for homogeneous groups (data reduction),
in finding “natural clusters” and describing their unknown
properties (“natural” data types), in finding useful and
suitable groupings (“useful” data classes), or in finding
unusual data objects (outlier detection).
Every clustering algorithm must make some assumptions about
what constitutes the similarity of points, and different
assumptions produce different but equally valid clusters.
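The effect of the similarity assumption can be seen in a very small sketch: the same point is assigned to the nearer of two cluster centers, once under Euclidean distance and once under Manhattan distance. The center coordinates, the labels "A" and "B", and the helper `nearest_center` are hypothetical and chosen only so that the two measures disagree.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.dist(p, q)

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def nearest_center(point, centers, distance):
    """Return the label of the center closest to `point` under `distance`."""
    return min(centers, key=lambda label: distance(point, centers[label]))

# Hypothetical cluster centers and a point to assign.
centers = {"A": (0.0, 0.0), "B": (3.0, -2.0)}
point = (3.0, 3.0)

by_euclidean = nearest_center(point, centers, euclidean)
by_manhattan = nearest_center(point, centers, manhattan)
print(by_euclidean, by_manhattan)  # prints: A B
```

Neither assignment is wrong; each follows from a different notion of what "close" means.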
Cluster Model types
Typical cluster models include:
Connectivity models: for example, hierarchical
clustering builds models based on distance connectivity.
Centroid models: for example, the k-means
algorithm represents each cluster by a single mean vector.
Distribution models: clusters are modeled using statistical
distributions, such as multivariate normal
distributions used by the expectation-maximization
algorithm.
Density models: for example, DBSCAN and OPTICS define
clusters as connected dense regions in the data space.
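To make the centroid model concrete, here is a minimal sketch of the k-means idea in plain Python: each cluster is represented by the mean of its members, and the algorithm alternates between assigning points to the nearest mean and recomputing the means. The function name `kmeans`, the sample points, and the choice of starting centroids are assumptions for this example; practical implementations usually add random restarts and smarter initialization.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Alternate nearest-centroid assignment and mean updates until stable."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(coord) / len(members) for coord in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

The starting centroids are passed in explicitly so the run is reproducible; the result of k-means in general depends on this initialization.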
Clustering Uses
Clustering is widely used across a variety of
tasks. Some of the most common uses of this technique are:
Market Segmentation
Statistical data analysis
Social network analysis
Image segmentation
Anomaly detection, etc.