Unit 3 Clustering Algorithm
Dr.M.Thamarai
Professor/ECE
SVEC
Introduction
• Cluster analysis is a technique for partitioning a
collection of unlabelled objects (data points) with many
attributes into meaningful, disjoint groups, or clusters.
• Cluster analysis is a fundamental task of unsupervised
learning.
• Visually identifying and grouping similar data points is
easy when the dataset has few attributes (for example,
only two features).
• A dataset with n features, however, requires automatic
clustering algorithms.
Clustering..
• "A way of grouping the data points into
different clusters, consisting of similar data
points. The objects with the possible
similarities remain in a group that has less or
no similarities with another group."
• Clustering works by finding similar patterns in the
unlabelled dataset, such as shape, size, color, or
behavior, and divides the data points according to the
presence or absence of those patterns.
Clustering…
• Each cluster is represented by its centroid.
• Example: for the data points (3,3), (2,6), and (7,9), the
centroid is their mean, (4,6).
• The clusters should not overlap and every cluster should
represent only one class.
• Clustering algorithms use a trial-and-error method to form
clusters, which can then be converted into labels.
• After applying this clustering technique, each cluster or
group is provided with a cluster-ID.
• An ML system can use this ID to simplify the processing of
large and complex datasets.
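The centroid example above can be checked with a few lines of Python. This is a minimal sketch that assumes the centroid is the coordinate-wise mean of the points; the helper name `centroid` is chosen here for illustration only.

```python
def centroid(points):
    """Return the coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


points = [(3, 3), (2, 6), (7, 9)]
print(centroid(points))  # (4.0, 6.0)
```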
Difference between classification and
clustering
Types of clustering methods
• Partitioning Clustering
• Density-Based Clustering
• Distribution Model-Based Clustering
• Hierarchical Clustering
• Fuzzy Clustering
Partitioning Clustering
• It is a type of clustering that divides the data into non-
hierarchical groups. It is also known as the centroid-
based clustering method.
• The most common example of partitioning clustering is
the K-Means Clustering algorithm.
• In this type, the dataset is divided into k groups,
where k is the pre-defined number of groups.
• The cluster centers are chosen so that each data point
is closer to the centroid of its own cluster than to the
centroid of any other cluster.
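The partitioning idea above can be sketched in plain Python. This is an illustrative K-Means loop, not a reference implementation: the function name `kmeans`, the random initialisation from k data points, and the sample data are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Basic K-Means on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Assumption: start from k randomly chosen data points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: give each point the ID of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, labels


data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(data, k=2)
print(centroids, labels)
```

On this small dataset the first three points should share one cluster ID and the last three the other; which group receives ID 0 depends on the random initialisation.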
Density-Based Clustering
• This is given as
$$p(x \mid \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}$$
Expectation Maximization..
• The parameters μ and σ² should be chosen such
that the above equation is maximized.
• This is known as the maximum likelihood principle.
• The objective of the EM algorithm is to maximize the
likelihood of the observations by selecting suitable
parameters.
• The EM algorithm works in two stages.
• 1. Expectation step
• 2. Maximization step
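The two stages can be illustrated with a small sketch. It assumes a mixture of two one-dimensional Gaussian components with mixing weights, which is one common setting for EM and not necessarily the exact model used in these slides; the function names, the initial guesses, and the sample data are hypothetical.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(xs, iterations=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative only)."""
    # Assumed initial guesses: means at the extremes, a shared variance,
    # and equal mixing weights.
    mean_all = sum(xs) / len(xs)
    var_all = sum((x - mean_all) ** 2 for x in xs) / len(xs)
    mu = [min(xs), max(xs)]
    var = [var_all, var_all]
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # Expectation step: how responsible is each component for each point?
        resp = []
        for x in xs:
            dens = [weight[k] * gaussian_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # Maximization step: re-estimate parameters from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            var[k] = max(var[k], 1e-6)  # guard against a collapsing component
            weight[k] = nk / len(xs)
    return mu, var, weight


xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
mu, var, weight = em_two_gaussians(xs)
print(mu, var, weight)
```

For this well-separated sample the estimated means should settle near 1 and 5 with roughly equal weights. Real data usually needs a convergence check on the log-likelihood instead of a fixed iteration count.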
Expectation Maximization..