ML Exp 10
Objective:
Program to implement K-Medoids in machine learning.
Apparatus required:
PC with Jupyter Notebook or Google Colab.
Theory:
K-Medoids clustering is a data mining technique that groups data points into a predefined
number of clusters (k). It shares similarities with the widely used K-Means algorithm, but
with a key distinction: K-Medoids employs actual data points (medoids) as cluster centers
instead of means (centroids) calculated from data points within a cluster. This approach
makes K-Medoids particularly advantageous in scenarios where datasets exhibit:
Non-Euclidean Distances: When data points are compared using distance metrics other
than Euclidean distance (e.g., Manhattan distance, Hamming distance), K-Medoids is
often more suitable, because it only requires pairwise dissimilarities between points
rather than a mean, which may not be meaningful under such metrics.
Presence of Outliers: Outliers can pull K-Means centroids away from the bulk of a
cluster, potentially leading to suboptimal cluster formation. Because K-Medoids uses
actual data points as cluster centers, it is less susceptible to such distortions.
Key concepts:
Distance Metric: A distance metric, denoted by d(x, y), quantifies the dissimilarity
between two data points x and y. Common choices include Euclidean distance (||x -
y||), Manhattan distance (Σ |x_i - y_i|), and Hamming distance (number of differing
elements).
Medoid: A medoid is a data point within a cluster that is centrally located relative to
other points in that cluster. It serves as the representative point for the cluster,
minimizing the sum of distances between the medoid and all other points within the
cluster.
Cost Function: The cost function, denoted by J(C), measures the total dissimilarity
within all clusters. In K-Medoids it is the sum of distances between each data point and
the medoid of its assigned cluster: J(C) = Σ_c Σ_{x in c} d(x, m_c), where m_c is the
medoid of cluster c. (The metrics and this cost are illustrated in the snippet after
this list.)
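As a quick illustration of the metrics and the cost function above, a minimal sketch on
toy points (the vectors, the Manhattan metric for the cluster cost, and the medoid
choice are all illustrative assumptions):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.0])

euclidean = np.linalg.norm(x - y)   # ||x - y|| = sqrt(1 + 4 + 0) ≈ 2.236
manhattan = np.abs(x - y).sum()     # Σ |x_i - y_i| = 1 + 2 + 0 = 3
hamming = (x != y).sum()            # number of differing elements = 2

# Cost of one cluster: sum of (here Manhattan) distances from each point
# to the cluster's medoid m.
cluster = np.array([[1.0, 1.0], [2.0, 1.0], [8.0, 9.0]])
m = cluster[0]                      # illustrative medoid choice
J = np.abs(cluster - m).sum()       # = 0 + 1 + 15 = 16
print(euclidean, manhattan, hamming, J)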
K-Medoids Algorithm
The K-Medoids algorithm follows a step-by-step process to partition data points into k
clusters:
1. Initialization:
o Define the number of clusters (k).
o Select k data points randomly (or using an initialization strategy) as the initial
medoids.
2. Assignment Step:
o For each data point x:
▪ Calculate the distance between x and each medoid using the chosen distance metric.
▪ Assign x to the cluster whose medoid is closest to x.
3. Swapping Step (Optimization):
o For each cluster c:
▪ For each non-medoid data point x in c:
- Temporarily swap x with the current medoid of c.
- Recompute the cost function J(C) after the swap.
- If the swap reduces the cost function, make it permanent (update the medoid of c).
4. Termination:
o Repeat steps 2 and 3 until no swap results in a lower cost function (convergence is
achieved).
These steps are implemented in the Program section below.
Applications of K-Medoids
Image Segmentation: Identifying and grouping regions within an image that share
similar characteristics (e.g., color, texture) for object recognition.
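Program:
A minimal, self-contained sketch implementing the assignment and swap steps described
above from scratch (the make_blobs synthetic dataset, the Manhattan metric, k = 3, and
the function name k_medoids are illustrative choices, not prescribed by the experiment):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

def k_medoids(X, k, max_iter=100, random_state=0):
    """Partition X into k clusters using the assignment/swap loop above."""
    rng = np.random.default_rng(random_state)
    n = X.shape[0]
    # Step 1: pick k random data points as the initial medoids.
    medoids = rng.choice(n, size=k, replace=False)
    # Precompute all pairwise Manhattan distances (illustrative metric choice).
    dist = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=-1)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        cost = dist[np.arange(n), medoids[labels]].sum()
        improved = False
        # Step 3: try swapping each medoid with the non-medoid points
        # assigned to it; keep a swap only if it lowers the cost J(C).
        for ci in range(k):
            for x in np.where(labels == ci)[0]:
                if x == medoids[ci]:
                    continue
                trial = medoids.copy()
                trial[ci] = x
                trial_labels = np.argmin(dist[:, trial], axis=1)
                trial_cost = dist[np.arange(n), trial[trial_labels]].sum()
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True
        # Step 4: stop when no swap lowers the cost function.
        if not improved:
            break
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)
medoid_idx, labels = k_medoids(X, k=3)

plt.figure(figsize=(6, 5))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=20)
plt.scatter(X[medoid_idx, 0], X[medoid_idx, 1], c='red', marker='X', s=200,
            label='Medoids')
plt.title('K-Medoids Clustering (k = 3)')
plt.legend()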
plt.tight_layout()
plt.show()
Result:
The K-Medoids clustering algorithm was implemented successfully, and the resulting
clusters and medoids were visualized.