6 Clustering
Supervised learning: Prediction (Classification, Regression)
Unsupervised learning: Description (Clustering)
What is Clustering?
“It is an unsupervised, descriptive data analytics technique.”
Definition: Clustering is the task of dividing a population of data
points into a number of groups such that data points in the same
group are more similar to each other than to data points in other
groups.
Clustering Methods
Clustering methods can be classified into the following categories:
1-Partitioning Method
Key points:
The k-means algorithm assigns each of the n examples
to one of the k clusters.
The number of clusters k must be chosen in advance,
before the algorithm starts.
The goal is to minimize the differences within each
cluster (the sum of squared distances from each point
to its cluster's centroid), which is equivalent to
maximizing the differences between the clusters.
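The key points above describe the standard k-means (Lloyd's) loop: fix k, pick initial centroids, assign every point to its nearest centroid, recompute each centroid as the mean of its members, and repeat until no point changes cluster. Below is a minimal pure-Python sketch. It assumes random initialization from the data points, because the slides do not say how the initial centroids are chosen. The names `kmeans`, `max_iters`, and `seed` are illustrative, and `math.dist` requires Python 3.8+.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) sketch.

    Returns (centroids, labels) where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # k is fixed in advance; sample k data points (without replacement) as initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

For example, `kmeans([(1, 1), (2, 1), (4, 3), (5, 4)], k=2)` separates the points into {(1, 1), (2, 1)} and {(4, 3), (5, 4)}. The cluster numbering (0 or 1) may differ between seeds.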
Procedure:
Euclidean distance
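Consider two points p = (x_p, y_p) and q = (x_q, y_q), such as two medicines described by weight x and pH y. Their Euclidean distance is given by the first formula below. The second formula is a worked instance using medicines A = (1, 1) and C = (4, 3) from the data table.

```latex
d(p, q) = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2}
d(A, C) = \sqrt{(4 - 1)^2 + (3 - 1)^2} = \sqrt{13} \approx 3.61
```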
Medicine data
Medicine   Weight (x)   pH (y)
A          1            1
B          2            1
C          4            3
D          5            4
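The slides do not show how the first grouping was obtained. This minimal sketch assumes that medicines A and B were the initial centroids. That is an assumption, chosen because it reproduces the grouping used in Step-Iteration (1): Group-1 = {A} and Group-2 = {B, C, D}. The code computes each medicine's distance to both centroids and assigns it to the nearer one.

```python
import math

# Medicine data: name -> (weight x, pH y)
data = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
# Assumption: medicines A and B are the initial centroids of Group-1 and Group-2.
centroids = [data["A"], data["B"]]

groups = {}
for name, point in data.items():
    d1, d2 = (math.dist(point, c) for c in centroids)
    groups[name] = 1 if d1 <= d2 else 2  # nearest centroid wins; ties go to Group-1
    print(f"{name}: d1={d1:.2f}, d2={d2:.2f} -> Group-{groups[name]}")
```

Under this assumption, only A is closer to the first centroid. For example, C is 3.61 from A but 2.83 from B, so it joins Group-2.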
Step-Iteration (1)
New centroid
Now recalculate the centroid of each cluster based on its new
members.
Group-1 has one member (A), so its centroid is (1, 1).
Group-2 has three members (B, C, D), so its centroid is
((2 + 4 + 5)/3, (1 + 3 + 4)/3) = (11/3, 8/3) ≈ (3.67, 2.67).
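The centroid of a group is the coordinate-wise mean of its members. The small sketch below computes it for Group-1 = {A} and Group-2 = {B, C, D}. It uses Python's `fractions.Fraction` so the result stays exact. The helper name `centroid` is illustrative.

```python
from fractions import Fraction


def centroid(points):
    """Coordinate-wise mean of a list of (x, y) points, kept exact with Fraction."""
    n = len(points)
    return (Fraction(sum(p[0] for p in points), n),
            Fraction(sum(p[1] for p in points), n))


group1 = [(1, 1)]                  # A
group2 = [(2, 1), (4, 3), (5, 4)]  # B, C, D
print(centroid(group1))  # (Fraction(1, 1), Fraction(1, 1))
print(centroid(group2))  # (Fraction(11, 3), Fraction(8, 3))
```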
Distance calculation
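With the new centroids (1, 1) and (11/3, 8/3), the distance calculation repeats the nearest-centroid assignment for every medicine. In the sketch below, ties would go to Group-1. That choice is arbitrary, and no ties occur with this data.

```python
import math

# Medicine data: name -> (weight x, pH y)
data = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
# Centroids after Step-Iteration (1): Group-1 = (1, 1), Group-2 = (11/3, 8/3)
centroids = [(1, 1), (11 / 3, 8 / 3)]

new_groups = {}
for name, point in data.items():
    d1, d2 = (math.dist(point, c) for c in centroids)
    new_groups[name] = 1 if d1 <= d2 else 2  # nearest centroid wins
    print(f"{name}: d1={d1:.2f}, d2={d2:.2f} -> Group-{new_groups[name]}")
```

B is now closer to Group-1 (1.00 versus 2.36), so the groups become {A, B} and {C, D}.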
Step-Iteration (2)
New centroid
Distance calculation
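Step-Iteration (2) follows the same pattern. k-means stops when a distance calculation leaves every group unchanged. The sketch below runs the whole worked example as a loop. It again assumes A and B as the initial centroids, and the helper name `nearest` is illustrative.

```python
import math

# Medicine data: name -> (weight x, pH y)
data = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
# Assumption: A and B are the initial centroids (index 0 = Group-1, 1 = Group-2).
centroids = [(1.0, 1.0), (2.0, 1.0)]


def nearest(point, centroids):
    """Index of the closest centroid (lowest index wins ties)."""
    dists = [math.dist(point, c) for c in centroids]
    return dists.index(min(dists))


groups = None
iteration = 0
while True:
    # Distance calculation: assign each medicine to its nearest centroid.
    new_groups = {name: nearest(p, centroids) for name, p in data.items()}
    if new_groups == groups:
        break  # no medicine changed group, so the algorithm has converged
    groups = new_groups
    iteration += 1
    # New centroid: mean of each group's members (keep the old one if a group is empty).
    for j in range(len(centroids)):
        members = [data[n] for n, g in groups.items() if g == j]
        if members:
            centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    print(f"Step-Iteration ({iteration}): groups={groups}, centroids={centroids}")
```

Under that assumption, the loop stops after two centroid updates. The final groups are Group-1 = {A, B} with centroid (1.5, 1) and Group-2 = {C, D} with centroid (4.5, 3.5).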