
Experiment No. 10

Objective:
Program to implement K-Medoids in machine learning.

Apparatus required:
PC with Jupyter Notebook or Google Colab.

Theory:

K-Medoids clustering is a data mining technique that groups data points into a predefined
number of clusters (k). It shares similarities with the widely used K-Means algorithm, but
with a key distinction: K-Medoids employs actual data points (medoids) as cluster centers
instead of means (centroids) calculated from data points within a cluster. This approach
makes K-Medoids particularly advantageous in scenarios where datasets exhibit:

 Non-Euclidean Distances: When data points are compared using distance metrics
other than Euclidean distance (e.g., Manhattan distance, Hamming distance), K-Medoids
is often more suitable, because it requires only pairwise dissimilarities between points
and never computes a mean, which may not even be well defined under such metrics.
 Presence of Outliers: Outliers can pull K-Means centroids far away from the bulk of a
cluster, potentially leading to suboptimal cluster formation. Because K-Medoids centers
must be actual data points, it is less susceptible to such distortions, as the sketch
below illustrates.

Key concepts:

K-Medoids clustering is grounded in the principle of minimizing dissimilarity within
clusters. Here's a breakdown of the core concepts:

 Distance Metric: A distance metric, denoted by d(x, y), quantifies the dissimilarity
between two data points x and y. Common choices include Euclidean distance (||x -
y||), Manhattan distance (Σ |x_i - y_i|), and Hamming distance (number of differing
elements).
 Medoid: A medoid is a data point within a cluster that is centrally located relative to
other points in that cluster. It serves as the representative point for the cluster,
minimizing the sum of distances between the medoid and all other points within the
cluster.
 Cost Function: The cost function, denoted by J(C), measures the total dissimilarity
within all clusters. In K-Medoids it is the sum of distances between each data point
and the medoid of its assigned cluster: J(C) = Σ over clusters C_i of Σ over points
x in C_i of d(x, m_i), where m_i is the medoid of cluster C_i. A small worked sketch
follows this list.
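To make these definitions concrete, here is a minimal sketch in plain NumPy (function names are illustrative, not from any library) that computes the three distance metrics above and evaluates the cost function J(C) for a given assignment of points to medoids:

import numpy as np

def euclidean(x, y):
    return np.linalg.norm(x - y)    # ||x - y||

def manhattan(x, y):
    return np.sum(np.abs(x - y))    # Σ |x_i - y_i|

def hamming(x, y):
    return np.sum(x != y)           # number of differing elements

def kmedoids_cost(X, medoid_indices, labels, d=euclidean):
    """J(C): sum of distances from each point to its cluster's medoid."""
    return sum(d(x, X[medoid_indices[labels[i]]]) for i, x in enumerate(X))

# Tiny example: 4 points, 2 clusters whose medoids are points 0 and 3
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 4.0]])
labels = np.array([0, 0, 1, 1])          # cluster assignment per point
print(kmedoids_cost(X, [0, 3], labels))  # -> 2.0 (0 + 1 + 1 + 0)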

K-Medoids Algorithm

The K-Medoids algorithm follows a step-by-step process to partition data points into k
clusters:

1. Initialization:
o Define the number of clusters (k).
o Select k data points randomly (or using an initialization strategy) as initial
medoids.
2. Assignment Step:
o For each data point x:
 Calculate the distance between x and each medoid using the chosen
distance metric.
 Assign x to the cluster that has the medoid closest to x.
3. Swapping Step (Optimization):
o For each cluster c:
 For each non-medoid data point x in c:
 Temporarily swap x with the current medoid of c.
 Recompute the cost function J(C) after the swap.
 If the swap reduces the cost function, make the swap permanent (update
the medoid of c).
4. Termination:
o Repeat steps 2 and 3 until no swap results in a lower cost function (convergence is
achieved). A compact from-scratch sketch of this loop is given below.
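The following is a minimal from-scratch sketch of the loop just described (pure NumPy; function and variable names are illustrative, not from any library). It uses Euclidean distance and the within-cluster swap rule from step 3:

import numpy as np

def simple_kmedoids(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # Precompute the full pairwise Euclidean distance matrix
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    medoids = rng.choice(n, size=k, replace=False)   # step 1: random initialization

    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)    # step 2: assignment
        improved = False
        for c in range(k):                           # step 3: swapping
            for x in np.where(labels == c)[0]:
                if x in medoids:
                    continue                         # only non-medoid points
                candidate = medoids.copy()
                candidate[c] = x                     # tentative swap
                # Cost J(C): each point's distance to its nearest medoid
                if D[:, candidate].min(axis=1).sum() < D[:, medoids].min(axis=1).sum():
                    medoids = candidate              # swap reduces cost: keep it
                    improved = True
        if not improved:                             # step 4: convergence
            break
    return medoids, np.argmin(D[:, medoids], axis=1)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.3, size=(20, 2)) for loc in ([0, 0], [3, 3], [0, 3])])
medoids, labels = simple_kmedoids(X, k=3)
print("Medoid indices:", medoids)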

Key Considerations and Advantages

 Choice of Distance Metric: Selecting an appropriate distance metric is crucial for
effective clustering. Consider the nature of your data and the relationships between
data points when making this decision.

 Initialization Strategies: While random initialization is a common starting point,
strategies that select medoids likely to be well positioned within clusters (e.g.,
k-medoids++-style seeding) can improve the algorithm's efficiency and lead to better
clusterings; see the sketch after this list.

 Time Complexity: The swap step makes K-Medoids computationally expensive: the classic
PAM algorithm costs on the order of O(k(n - k)^2) distance evaluations per iteration,
where n is the number of data points and k is the number of clusters, repeated for T
iterations until convergence. This makes K-Medoids considerably more expensive than
K-Means, especially for large datasets.

Applications of K-Medoids

K-Medoids clustering finds applications in various domains, including:

 Customer Segmentation: Grouping customers based on purchase history, demographics,
or behavior to personalize marketing campaigns.

 Image Segmentation: Identifying and grouping regions within an image that share
similar characteristics (e.g., color, texture) for object recognition.

 Document Clustering: Grouping documents based on content similarity for information
retrieval or topic modeling.

 Gene Expression Analysis: Identifying patterns in gene expression data to understand
biological processes or disease mechanisms.

Implementation Using Code:

!pip install https://github.com/scikit-learn-contrib/scikit-learn-extra/archive/master.zip

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn_extra.cluster import KMedoids

# Load the Iris dataset
iris = load_iris()
data = iris.data
target = iris.target  # Actual species labels (for evaluation)

# Define the number of clusters (k)
k = 3

# Initialize K-Medoids model
kmedoids = KMedoids(n_clusters=k, metric='euclidean', random_state=0)

# Fit the model to the data
kmedoids.fit(data)

# Get the cluster labels for each data point
predicted_cluster = kmedoids.labels_

# Print some results
print("Predicted cluster labels:", predicted_cluster)

# (Optional) Evaluate clustering performance (e.g., silhouette score)
from sklearn.metrics import silhouette_score

silhouette_coeff = silhouette_score(data, predicted_cluster)
print("Silhouette Coefficient:", silhouette_coeff)

# (Optional) Compare predicted clusters with actual species labels.
# Note: cluster IDs are arbitrary, so rows and columns may be permuted
# relative to the species labels.
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(target, predicted_cluster)
print("Confusion Matrix:\n", cm)

# Visualize the clustered data
plt.figure(figsize=(10, 5))

# Plot the original data (first two features: sepal length and width)
plt.subplot(1, 2, 1)
plt.scatter(data[:, 0], data[:, 1], c=target, cmap='viridis', edgecolor='k')
plt.title('Original Data')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')

# Plot the clustered data
plt.subplot(1, 2, 2)
plt.scatter(data[:, 0], data[:, 1], c=predicted_cluster, cmap='viridis', edgecolor='k')
plt.title('Clustered Data (K-Medoids)')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')

plt.tight_layout()
plt.show()

Result:
The program to implement K-Medoids clustering has been implemented and executed successfully.
