Overlapping Clustering
Algorithm:
Step 2. Initialization
Set Parameters: Choose the number of clusters k (which may not be fixed if the algorithm is
adaptive), a threshold for cluster assignment, and the maximum number of iterations.
Centroids: Randomly initialize k cluster centroids or select initial cluster centers based on a
heuristic (like k-means++ initialization).
Soft Assignment: For each data point, assign a soft membership to each cluster based on similarity:
Calculate the distance/similarity of the data point to each centroid.
Convert the distance into a degree of membership to each cluster (e.g., using a Gaussian
function or a normalized similarity measure).
Ensure that the membership for each data point across all clusters sums up to 1.
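The distance-to-membership conversion can be sketched as follows. This is a minimal illustration assuming a Gaussian kernel; the bandwidth `sigma` is a hypothetical free parameter not specified above:

```python
import numpy as np

def soft_memberships(X, centroids, sigma=1.0):
    """Turn point-to-centroid distances into per-point memberships that sum to 1."""
    # Euclidean distance of every point to every centroid: shape (n_points, k)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # Gaussian function converts each distance into a similarity in (0, 1]
    sims = np.exp(-(dists ** 2) / (2 * sigma ** 2))
    # Normalize each row so a point's memberships across all clusters sum to 1
    return sims / sims.sum(axis=1, keepdims=True)

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
U = soft_memberships(X, centroids)
```

A point near one centroid gets a membership close to 1 for that cluster, while a point midway between centroids gets memberships close to 0.5 each.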
Centroid Update: Update the centroid of each cluster by taking the weighted average of all
points, using their membership values as weights.
For each cluster Cj, compute the new centroid as

cj = ( Σi uij xi ) / ( Σi uij ),

where uij is the membership degree of point xi to cluster Cj.
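The weighted-average update can be sketched as follows (note this is the plain membership-weighted mean; fuzzy c-means would additionally raise each uij to a fuzzifier exponent m):

```python
import numpy as np

def update_centroids(X, U):
    """c_j = sum_i(u_ij * x_i) / sum_i(u_ij): membership-weighted mean per cluster."""
    # U.T @ X gives the weighted coordinate sums; divide by total membership per cluster
    return (U.T @ X) / U.sum(axis=0)[:, None]

X = np.array([[0.0, 0.0], [2.0, 0.0]])
U = np.array([[1.0, 0.0], [0.0, 1.0]])  # hard memberships, for a checkable example
C = update_centroids(X, U)
```

With hard (0/1) memberships this reduces to the ordinary k-means centroid update, which is a useful sanity check.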
Stopping Criteria: Check whether the centroids changed by less than a defined threshold or the
maximum number of iterations was reached. If neither condition holds, return to the soft-assignment step.
Output
Final overlapping clusters, where each point may belong to multiple clusters based on its
degree of membership.
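One possible sketch of the whole loop, combining Gaussian soft assignment, the weighted centroid update, and a centroid-shift stopping test; `sigma`, `tol`, `max_iter`, and `seed` are assumed parameters, not prescribed by the algorithm above:

```python
import numpy as np

def soft_kmeans(X, k, sigma=1.0, tol=1e-4, max_iter=100, seed=0):
    """Iterative soft clustering: Gaussian memberships + weighted centroid updates."""
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Soft assignment: Gaussian similarity to each centroid, rows sum to 1
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        U = np.exp(-(d ** 2) / (2 * sigma ** 2))
        U /= U.sum(axis=1, keepdims=True)
        # Centroid update: membership-weighted mean of all points
        new_centroids = (U.T @ X) / U.sum(axis=0)[:, None]
        # Stopping criterion: centroids barely moved
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return new_centroids, U

X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 4.0)])
centroids, U = soft_kmeans(X, k=2)
```

As with k-means, the result depends on the random initialization; k-means++-style seeding, mentioned earlier, reduces that sensitivity.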
Hierarchical Clustering
Algorithm:
Step 1. Initialization
Start with each data point as a separate cluster. If you have N data points, initialize with N
clusters (each cluster containing one point).
Step 2. Compute Distances
Compute the distance (or similarity) between every pair of clusters. Use a distance metric like
Euclidean distance, Manhattan distance, or others depending on your data.
Store the distances in a distance matrix.
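Building the initial distance matrix over singleton clusters can be sketched with Euclidean distances and NumPy broadcasting:

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])  # three singleton clusters
# Full symmetric matrix of pairwise Euclidean distances:
# entry D[i, j] is the distance between points i and j
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
```

For larger datasets, `scipy.spatial.distance.pdist` with `squareform` builds the same matrix while computing each pair only once.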
Step 3. Merge Closest Clusters
Find the pair of clusters that are closest (have the smallest distance) and merge them into a
single cluster.
This reduces the number of clusters by 1.
Step 4. Update the Distance Matrix
After merging, update the distance matrix to reflect the new cluster distances.
The distance between the new cluster and the remaining clusters is calculated using a linkage
criterion such as:
Single Linkage (Minimum): Distance between two clusters is the minimum distance between
any pair of points in the two clusters.
Complete Linkage (Maximum): Distance is the maximum distance between any pair of points in
the clusters.
Average Linkage: Distance is the average of all pairwise distances between points in the
clusters.
Centroid Linkage: Distance between the centroids (mean points) of the clusters.
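The four linkage criteria can be sketched directly from their definitions (`A` and `B` hold the points of two clusters; real libraries use more efficient update formulas rather than recomputing all pairs):

```python
import numpy as np

def pairwise(A, B):
    """All pairwise Euclidean distances between members of clusters A and B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def single_linkage(A, B):
    return pairwise(A, B).min()   # minimum over all cross-cluster pairs

def complete_linkage(A, B):
    return pairwise(A, B).max()   # maximum over all cross-cluster pairs

def average_linkage(A, B):
    return pairwise(A, B).mean()  # average of all cross-cluster pairs

def centroid_linkage(A, B):
    return np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))  # distance of means

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[3.0, 0.0], [5.0, 0.0]])
```

For these two clusters the cross-pair distances are 3, 5, 2, and 4, so single, complete, average, and centroid linkage give 2, 5, 3.5, and 3.5 respectively.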
Step 5. Repeat
Repeat steps 3 and 4 until all data points are in a single cluster, or a predefined number of
clusters is reached.
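The merge loop of steps 3 and 4 can be sketched naively (single linkage, stopping at a target cluster count; this rescans all pairs on every merge, so it is O(n^3) and for illustration only):

```python
import numpy as np

def agglomerate(X, n_clusters):
    """Naive agglomerative clustering with single linkage."""
    clusters = [[i] for i in range(len(X))]  # start: one cluster per point
    while len(clusters) > n_clusters:
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: minimum distance over all cross-cluster pairs
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] += clusters[b]   # merge the closest pair of clusters
        del clusters[b]              # number of clusters drops by 1
    return clusters

X = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 0.0], [5.2, 0.0]])
groups = agglomerate(X, 2)
```

Swapping the inner `min` for `max` or a mean gives complete or average linkage without changing the loop structure.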
Step 6. Build the Dendrogram
During the merging process, keep track of the order in which clusters are merged.
Construct a dendrogram (a tree-like diagram) that shows the hierarchical relationship between
clusters at different levels of similarity.
Step 7. Extract Clusters
To obtain a final clustering, "cut" the dendrogram at a chosen height: cutting higher in the
tree yields fewer, larger clusters, while cutting lower yields more, smaller ones.
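SciPy provides this whole procedure, including the dendrogram cut; a minimal sketch using `scipy.cluster.hierarchy`:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.5, 0.0], [5.0, 5.0], [5.5, 5.0]])
# Z encodes the merge order and merge heights (here: average linkage)
Z = linkage(X, method="average")
# "Cut" the dendrogram so that exactly 2 clusters remain
labels = fcluster(Z, t=2, criterion="maxclust")
```

Passing `criterion="distance"` instead cuts at an explicit height `t`, matching the "cut at a specific height" view described above; `scipy.cluster.hierarchy.dendrogram` draws the tree itself.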