Clustering
K-Means Clustering is an unsupervised learning algorithm that is used to solve clustering problems in machine learning
or data science. In this topic, we will learn what the K-means clustering algorithm is, how the algorithm works, and how to
implement K-means clustering in Python.
It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs
only to one group with similar properties. It allows us to cluster the data into different groups and is a convenient way to discover
the categories of groups in the unlabeled dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to
minimize the sum of distances between the data points and their corresponding cluster centroids.
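In standard notation (the symbols below are not taken from this tutorial), this objective is usually written as the within-cluster sum of squared distances, where C_i denotes the i-th cluster and \mu_i its centroid:

J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2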
The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process
until it finds the best clusters. The value of k should be predetermined in this algorithm.
The k-means clustering algorithm mainly performs two tasks:
o Determines the best value for K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center. Those data points which are near to a particular k-center create a
cluster.
Hence each cluster has data points with some commonalities, and it is away from other clusters.
The below diagram explains the working of the K-means Clustering Algorithm:
How does the K-Means Algorithm Work?
The working of the K-Means algorithm is explained in the below steps:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select random K points or centroids. (They can be points other than those from the input dataset.)
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, which means reassign each data point to the new closest centroid of each cluster.
Step-6: If any reassignment occurs, go to Step-4; otherwise, the clusters are final.
Step-7: The model is ready.
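As a rough illustration of these steps, here is a minimal from-scratch sketch in Python using NumPy; the function name kmeans and its arguments are placeholders rather than part of any particular library:

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-Means sketch: X is an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each point to its closest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points
        # (assumes no cluster becomes empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 6: stop when no centroid moves any more
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids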
Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two variables is given below:
o Let's take the number of clusters k, i.e., K=2, to identify the dataset and to put the data points into different clusters. It means
here we will try to group these data points into two different clusters.
o We need to choose some random k points or centroids to form the clusters. These points can be either points
from the dataset or any other points. So, here we are selecting the below two points as k points, which are not
part of our dataset. Consider the below image:
o Now we will assign each data point of the scatter plot to its closest K-point or centroid. We will compute it by
applying the mathematics that we have studied to calculate the distance between two points. So, we will draw a
median line (the perpendicular bisector) between both centroids. Consider the below image:
From the above image, it is clear that points on the left side of the line are near to the K1 or blue centroid, and points to the right of
the line are close to the yellow centroid. Let's color them blue and yellow for clear visualization.
o As we need to find the closest cluster, we will repeat the process by choosing new centroids. To choose the
new centroids, we will compute the center of gravity of the points in each cluster and place the new centroids there, as below:
o Next, we will reassign each data point to the new centroid. For this, we will repeat the same process of finding a
median line. The median line will be as shown in the below image:
From the above image, we can see that one yellow point is on the left side of the line, and two blue points are to the right of the line.
So, these three points will be assigned to new centroids.
As reassignment has taken place, we will again go to Step-4, which is finding new centroids or K-points.
o We will repeat the process by finding the center of gravity of the points in each cluster, so the new centroids will be as shown in
the below image:
o As we have got the new centroids, we will again draw the median line and reassign the data points. So, the image will be:
o We can see in the above image that there are no dissimilar data points on either side of the line, which means our
model is formed. Consider the below image:
As our model is ready, we can now remove the assumed centroids, and the two final clusters will be as shown in the
below image:
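In practice, K-means is usually run through a library rather than written by hand. A minimal scikit-learn sketch is shown below; the six two-dimensional points stand in for the variables M1 and M2 and are made up for illustration:

import numpy as np
from sklearn.cluster import KMeans

# Made-up points for the two variables M1 and M2
X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(model.labels_)           # cluster index assigned to each point
print(model.cluster_centers_)  # the two final centroids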
K-Medoids Clustering
Two statisticians, Leonard Kaufman and Peter J. Rousseeuw, came up with this method. This tutorial explains what K-Medoids
clustering does, its applications, and the difference between K-Means and K-Medoids.
K-Medoids is an unsupervised method that clusters unlabelled data. It is an improved version of the K-Means
algorithm, mainly designed to deal with K-Means' sensitivity to outlier data. Compared to other partitioning algorithms, the algorithm is
simple, fast, and easy to implement.
The K-Means algorithm proceeds as follows:
1. Choose k number of random points (data points from the data set or some other points). These points are also
called "Centroids" or "Means".
2. Assign all the data points in the data set to the closest centroid by applying any distance formula like Euclidean
distance, Manhattan distance, etc.
3. Now, choose new centroids by calculating the mean of all the data points in the clusters and go to step 2.
4. Continue step 3 until no data point changes classification between two iterations.
The problem with the K-Means algorithm is that it cannot handle outlier data well. An outlier is a point very different
from the rest of the points. Outlier data points tend to end up in their own cluster and attract other clusters to merge
with them, and because every point contributes to the cluster mean, a single outlier can shift the mean of a cluster
substantially. Hence, K-Means clustering is highly affected by outlier data.
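A quick numeric illustration of this sensitivity (the values are made up for this example):

import numpy as np

points = np.array([2.0, 3.0, 4.0, 5.0])       # a tight group of values on one axis
with_outlier = np.append(points, 50.0)        # add one extreme outlier

print(points.mean())            # 3.5  -> the mean sits inside the group
print(with_outlier.mean())      # 12.8 -> the mean is dragged far from every real point
print(np.median(with_outlier))  # 4.0  -> a medoid-like statistic barely moves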
K-Medoids:
Medoid: A Medoid is a point in the cluster from which the sum of distances to other data points is minimal.
(or)
A Medoid is a point in the cluster from which dissimilarities with all the other points in the clusters are minimal.
Instead of centroids as reference points in K-Means algorithms, the K-Medoids algorithm takes a Medoid as a reference
point.
There are three main K-Medoids algorithms: PAM (Partitioning Around Medoids), CLARA, and CLARANS. PAM is the most powerful
of the three algorithms but has the disadvantage of time complexity. The following K-Medoids clustering is performed using PAM.
In the further parts, we'll see what CLARA and CLARANS are.
Algorithm:
1. Choose k number of random points from the data and assign these k points to k number of clusters. These are the
initial medoids.
2. For all the remaining data points, calculate the distance from each medoid and assign each point to the cluster with the
nearest medoid.
3. Calculate the total cost (the sum of the distances from all the data points to their medoids).
4. Select a random non-medoid point as a new medoid and swap it with one of the previous medoids. Repeat steps 2 and 3.
5. If the total cost with the new medoid is less than that with the previous medoid, make the new medoid permanent and
repeat step 4.
6. If the total cost with the new medoid is greater than the cost with the previous medoid, undo the swap and repeat step
4.
7. The repetitions have to continue until no change in the classification of data points is encountered with new medoids.
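A minimal from-scratch sketch of this swap procedure, assuming Manhattan distance as in the worked example below; the function name pam, its arguments, and the random-trial loop are placeholders rather than the canonical PAM implementation:

import numpy as np

def pam(X, k, n_trials=200, seed=0):
    """Minimal PAM sketch: medoids are always actual rows of X."""
    rng = np.random.default_rng(seed)
    n = len(X)

    def total_cost(medoid_idx):
        # Manhattan distance from every point to its nearest medoid
        d = np.abs(X[:, None, :] - X[medoid_idx][None, :, :]).sum(axis=2)
        return d.min(axis=1).sum()

    medoids = list(rng.choice(n, size=k, replace=False))   # step 1: random initial medoids
    cost = total_cost(medoids)
    for _ in range(n_trials):                               # steps 4-7: try random swaps
        i = rng.integers(k)          # which medoid to replace
        j = rng.integers(n)          # random candidate point
        if j in medoids:
            continue
        candidate = medoids.copy()
        candidate[i] = j
        new_cost = total_cost(candidate)
        if new_cost < cost:          # keep the swap only if the total cost drops
            medoids, cost = candidate, new_cost
    # final assignment of every point to its nearest medoid
    d = np.abs(X[:, None, :] - X[medoids][None, :, :]).sum(axis=2)
    return np.array(medoids), d.argmin(axis=1), cost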
Data set:

Point   x   y
0       5   4
1       7   7
2       1   3
3       8   6
4       4   9
Scatter plot:
If k is given as 2, we need to break down the data points into 2 clusters.
Manhattan distances with medoids at point 2 (1, 3) and point 4 (4, 9):

Point   x   y   Distance from (1, 3)   Distance from (4, 9)
0       5   4   5                      6
1       7   7   10                     5
2       1   3   -                      -
3       8   6   10                     7
4       4   9   -                      -

Cluster 1 (medoid at point 2): 0
Cluster 2 (medoid at point 4): 1, 3
Manhattan distances with medoids at point 0 (5, 4) and point 4 (4, 9):

Point   x   y   Distance from (5, 4)   Distance from (4, 9)
0       5   4   -                      -
1       7   7   5                      5
2       1   3   5                      9
3       8   6   5                      7
4       4   9   -                      -

Cluster 1 (medoid at point 0): 2, 3
Cluster 2 (medoid at point 4): 1
Manhattan distances with medoids at point 0 (5, 4) and point 1 (7, 7):

Point   x   y   Distance from (5, 4)   Distance from (7, 7)
0       5   4   -                      -
1       7   7   -                      -
2       1   3   5                      10
3       8   6   5                      2
4       4   9   6                      5

Cluster 1 (medoid at point 0): 2
Cluster 2 (medoid at point 1): 3, 4
Manhattan distances with medoids at point 1 (7, 7) and point 3 (8, 6):

Point   x   y   Distance from (7, 7)   Distance from (8, 6)
0       5   4   5                      5
1       7   7   -                      -
2       1   3   10                     10
3       8   6   -                      -
4       4   9   5                      7

Cluster 1 (medoid at point 1): 4
Cluster 2 (medoid at point 3): 0, 2
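To make the cost comparison concrete, a small NumPy sketch (the function name cost is just a placeholder) computes the total Manhattan-distance cost for each of the medoid pairs used in the tables above:

import numpy as np

X = np.array([[5, 4], [7, 7], [1, 3], [8, 6], [4, 9]])  # the data set above

def cost(medoid_rows):
    d = np.abs(X[:, None, :] - X[medoid_rows][None, :, :]).sum(axis=2)
    return d.min(axis=1).sum()

print(cost([2, 4]))  # medoids (1, 3) and (4, 9) -> 17
print(cost([0, 4]))  # medoids (5, 4) and (4, 9) -> 15
print(cost([0, 1]))  # medoids (5, 4) and (7, 7) -> 12, the cheapest configuration
print(cost([1, 3]))  # medoids (7, 7) and (8, 6) -> 20, so this swap would be undone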
Limitation of PAM:
PAM evaluates every candidate swap of a medoid with a non-medoid point, which makes it computationally expensive and slow on
large data sets. Hence, PAM is suitable and recommended to be used for small data sets.
CLARA:
It is an extension of PAM that supports Medoid clustering for large data sets. This algorithm selects multiple samples from the
data set, applies PAM to each sample, and outputs the best clustering out of these samples. This is more efficient than
PAM for large data sets. We should ensure that the selected samples aren't biased, as they affect the clustering of the whole data.
CLARANS:
This algorithm selects a sample of neighbors to examine instead of selecting samples from the data set. In every step, it
examines the neighbors of every node. The time complexity of this algorithm is O(n²), and it is the best and most efficient
K-Medoids algorithm of all.
Disadvantages:
1. Not suitable for Clustering arbitrarily shaped groups of data points.
2. As the initial medoids are chosen randomly, the results might vary based on the choice in different runs.
Difference between K-Means and K-Medoids:
Both algorithms group n objects into k clusters based on similar traits, where k is pre-defined.
o K-Means: Clustering is done based on distance from centroids. K-Medoids: Clustering is done based on distance from medoids.
o K-Means: A centroid can be a data point or some other point in the cluster. K-Medoids: A medoid is always a data point in the cluster.
o K-Means: Can't cope with outlier data. K-Medoids: Can manage outlier data too.
o K-Means: Sometimes, outlier sensitivity can turn out to be useful. K-Medoids: Has a tendency to ignore meaningful clusters in outlier data.
Hierarchical Clustering in Machine Learning
Hierarchical clustering is another unsupervised machine learning algorithm, which is used to group the unlabeled datasets
into clusters; it is also known as hierarchical cluster analysis or HCA.
In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-shaped structure is known as
the dendrogram.
Sometimes the results of K-means clustering and hierarchical clustering may look similar, but the two differ in how they work,
as there is no requirement to predetermine the number of clusters as we did in the K-Means algorithm.
The hierarchical clustering technique has two approaches:
1. Agglomerative: Agglomerative is a bottom-up approach, in which the algorithm starts by taking all data points
as single clusters and merging them until one cluster is left.
2. Divisive: The divisive algorithm is the reverse of the agglomerative algorithm, as it is a top-down approach.
As we already have other clustering algorithms such as K-Means Clustering, why do we need hierarchical clustering? As we
have seen, K-means clustering has some challenges: it requires a predetermined number of clusters, and it always tries to
create clusters of the same size. To solve these two challenges, we can opt for the hierarchical clustering algorithm because,
in this algorithm, we don't need to have knowledge about the predefined number of clusters.
The working of the agglomerative hierarchical clustering algorithm can be explained using the below steps:
o Step-1: Create each data point as a single cluster. If there are N data points, the number of clusters will also be N.
o Step-2: Take the two closest data points or clusters and merge them to form one cluster. So, there will now be N-1
clusters.
o Step-3: Again, take the two closest clusters and merge them together to form one cluster. There will be N-2
clusters.
o Step-4: Repeat Step 3 until only one cluster is left. So, we will get the following clusters. Consider the below images:
o Step-5: Once all the clusters are combined into one big cluster, develop the dendrogram to divide the clusters as
per the problem.
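As a quick illustration, agglomerative clustering is available in scikit-learn; the points below are made up, and the linkage parameter corresponds to the linkage methods discussed next:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])  # made-up points

# linkage can be 'single', 'complete', 'average', or 'ward' in scikit-learn
model = AgglomerativeClustering(n_clusters=2, linkage='single').fit(X)
print(model.labels_)  # cluster index assigned to each point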
As we have seen, the closest distance between the two clusters is crucial for the hierarchical clustering. There are various
ways to calculate the distance between two clusters, and these ways decide the rule for clustering. These measures are
called Linkage methods. Some of the popular linkage methods are given below:
1. Single Linkage: It is the shortest distance between the closest points of the clusters. Consider the below image:
2. Complete Linkage: It is the farthest distance between the two points of two different clusters. It is one of the
popular linkage methods as it forms tighter clusters than single-linkage.
3. Average Linkage: It is the linkage method in which the distance between each pair of data points (one from each cluster)
is added up and then divided by the total number of pairs to calculate the average distance between two clusters. It is also one
of the most popular linkage methods.
4. Centroid Linkage: It is the linkage method in which the distance between the centroids of the two clusters is calculated.
Consider the below image:
From the above-given approaches, we can apply any of them according to the type of problem or business requirement.
The dendrogram is a tree-like structure that is mainly used to record each merge step that the HC algorithm performs.
In the dendrogram plot, the Y-axis shows the Euclidean distances between the data points, and the x-axis shows all the data
points of the given dataset.
The working of the dendrogram can be explained using the below diagram:
In the above diagram, the left part is showing how clusters are created in agglomerative clustering, and the right part is
showing the corresponding dendrogram.
o As we have discussed above, firstly, the data points P2 and P3 combine together and form a cluster;
correspondingly a dendrogram is created, which connects P2 and P3 with a rectangular shape. The height is decided
according to the Euclidean distance between the data points.
o In the next step, P5 and P6 form a cluster, and the corresponding dendrogram is created. It is higher than the
previous one, as the Euclidean distance between P5 and P6 is a little bit greater than the distance between P2 and P3.
o Again, two new dendrograms are created that combine P1, P2, and P3 in one dendrogram, and P4, P5, and P6, in
another dendrogram.
o At last, the final dendrogram is created that combines all the data points together.
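A minimal SciPy sketch that builds and plots such a dendrogram; the six points are made up to stand in for P1 to P6:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Six made-up points standing in for P1..P6
X = np.array([[1, 1], [1.2, 1.1], [1.1, 1.3], [5, 5], [5.3, 5.2], [5.1, 5.5]])

Z = linkage(X, method='single')           # merge history of agglomerative clustering
dendrogram(Z, labels=['P1', 'P2', 'P3', 'P4', 'P5', 'P6'])
plt.ylabel('Euclidean distance')          # the y-axis shows the merge distances
plt.show()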