
Overlapping Clustering Algorithm:

Step 1. Data Preparation

Input: A dataset D with n data points.


Features: Each data point is represented by a feature vector.
Distance Metric: Define a similarity/distance metric (Euclidean, Cosine, etc.).
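The two metrics named above can be sketched in plain Python; the function names here are illustrative, not from the source:

```python
import math

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors (1.0 = same direction).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```

Note that Euclidean distance grows with dissimilarity while cosine similarity shrinks, so the two are used in opposite directions when ranking neighbors.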

Step 2. Initialization

Set Parameters: Choose the number of clusters k (may not be fixed if it’s adaptive), a threshold
for cluster assignment, and maximum iterations.
Centroids: Randomly initialize k cluster centroids or select initial cluster centers based on a
heuristic (like k-means++ initialization).
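A minimal initialization sketch, using simple random sampling of data points as centroids rather than full k-means++ seeding (the `seed` parameter is an assumption added for reproducibility):

```python
import random

def init_centroids(points, k, seed=0):
    # Pick k distinct data points as the initial centroids --
    # a simple alternative to k-means++ seeding.
    rng = random.Random(seed)
    return rng.sample(points, k)
```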

Step 3. Cluster Membership Assignment

For each data point, assign soft membership to each cluster based on similarity:
Calculate the distance/similarity of the data point to each centroid.
Convert the distance into a degree of membership to each cluster (e.g., using a Gaussian function
or a normalized similarity measure).
Ensure that the membership for each data point across all clusters sums up to 1.
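The Gaussian-membership idea can be sketched as follows; `sigma` (the kernel width) is an assumed parameter not specified in the text:

```python
import math

def memberships(point, centroids, sigma=1.0):
    # Distance to each centroid -> Gaussian weight -> normalize,
    # so the memberships across all clusters sum to 1.
    dists = [math.sqrt(sum((x - y) ** 2 for x, y in zip(point, c)))
             for c in centroids]
    weights = [math.exp(-d ** 2 / (2 * sigma ** 2)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]
```

A nearby centroid thus receives a membership close to 1 and distant centroids receive values close to 0, while the normalization keeps the row a valid probability distribution.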

Step 4. Membership Update

Iterate over all points to update memberships:


For a given data point xi, calculate the probability of belonging to each cluster based on
distances or similarities.
If the similarity to multiple clusters exceeds a given threshold, the data point is considered to
belong to those clusters, allowing for overlap.

Step 5. Centroid Update

Update the centroid of each cluster by calculating the weighted average of all points based on
their membership values:
For each cluster Cj, compute the new centroid as cj = Σi (uij · xi) / Σi uij, where uij is the
membership degree of point xi to cluster Cj.
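This weighted average can be sketched directly (here `u_j` holds the memberships uij of every point in one cluster j):

```python
def update_centroid(points, u_j):
    # Weighted mean of all points, weighted by membership:
    #   c_j = sum_i(u_ij * x_i) / sum_i(u_ij)
    dim = len(points[0])
    total = sum(u_j)
    return [sum(u * p[d] for u, p in zip(u_j, points)) / total
            for d in range(dim)]
```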

Step 6. Convergence Check

Stopping Criteria: Check if the centroids change less than a defined threshold or if the maximum
number of iterations is reached. If not, go back to step 4.

Step 7. Cluster Assignment


Assign each point to one or more clusters where its membership is above a certain threshold. If
the point has significant membership in multiple clusters, it belongs to those clusters (allowing
overlap).
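A threshold-based assignment sketch; the 0.3 cutoff is an illustrative default, not a value from the source:

```python
def assign_overlapping(membership_matrix, threshold=0.3):
    # For each point, return the indices of every cluster whose
    # membership degree meets the threshold -- a point may land
    # in more than one cluster.
    return [[j for j, u in enumerate(row) if u >= threshold]
            for row in membership_matrix]
```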

Step 8. Post-processing (Optional)

Refine Memberships: If needed, refine memberships based on additional criteria (such as
reducing overlap by pruning low-membership assignments).
Outlier Detection: Identify points with very low membership in all clusters and treat them as
outliers.
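The outlier rule can be sketched as below; the cutoff value is an assumption, and with normalized memberships the largest value per point is at least 1/k, so the cutoff only makes sense when k is reasonably large:

```python
def find_outliers(membership_matrix, min_membership=0.25):
    # Flag points whose strongest membership is still below the
    # cutoff: they belong firmly to no cluster.
    return [i for i, row in enumerate(membership_matrix)
            if max(row) < min_membership]
```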

Step 9. Output
Final overlapping clusters, where each point may belong to multiple clusters based on its
degree of membership.
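Putting steps 2 through 6 together, a compact sketch of the whole loop; initial centroids are passed in explicitly to keep the example deterministic, and `sigma` and `iters` are assumed values rather than parameters from the source:

```python
import math

def overlapping_cluster(points, centroids, sigma=1.0, iters=20):
    # Alternate membership updates (steps 3-4) and centroid
    # updates (step 5) for a fixed number of iterations.
    dim = len(points[0])
    k = len(centroids)
    u = []
    for _ in range(iters):
        u = []
        for p in points:
            w = [math.exp(-sum((a - b) ** 2 for a, b in zip(p, c))
                          / (2 * sigma ** 2))
                 for c in centroids]
            s = sum(w)
            u.append([x / s for x in w])  # memberships sum to 1 per point
        for j in range(k):
            total = sum(row[j] for row in u)
            centroids[j] = [sum(u[i][j] * points[i][d]
                                for i in range(len(points))) / total
                            for d in range(dim)]
    return centroids, u
```

Running it on two well-separated groups pulls each centroid to the mean of its group while every point keeps a (possibly tiny) membership in both clusters.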

Hierarchical Clustering Algorithm:

Step 1. Initialization

Start with each data point as a separate cluster. If you have N data points, initialize with N
clusters (each cluster containing one point).

Step 2. Calculate Distance Matrix

Compute the distance (or similarity) between every pair of clusters. Use a distance metric like
Euclidean distance, Manhattan distance, or others depending on your data.
Store the distances in a distance matrix.
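A sketch of the distance matrix, assuming Euclidean distance over numeric feature vectors:

```python
import math

def distance_matrix(points):
    # Symmetric matrix of pairwise Euclidean distances;
    # the diagonal is zero by construction.
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dij = math.sqrt(sum((a - b) ** 2
                                for a, b in zip(points[i], points[j])))
            d[i][j] = d[j][i] = dij
    return d
```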

Step 3. Merge Closest Clusters

Find the pair of clusters that are closest (have the smallest distance) and merge them into a
single cluster.
This reduces the number of clusters by 1.

Step 4. Update Distance Matrix

After merging, update the distance matrix to reflect the new cluster distances.

The distance between the new cluster and the remaining clusters is calculated using a linkage
criterion such as:
Single Linkage (Minimum): Distance between two clusters is the minimum distance between
any pair of points in the two clusters.

Complete Linkage (Maximum): Distance is the maximum distance between any pair of points in
the clusters.

Average Linkage: Distance is the average of all pairwise distances between points in the
clusters.

Centroid Linkage: Distance between the centroids (mean points) of the clusters.
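The first three linkage criteria can be sketched over a precomputed point-to-point distance matrix `d`, with clusters given as lists of point indices:

```python
def single_linkage(d, cluster_a, cluster_b):
    # Minimum pairwise distance between the two clusters.
    return min(d[i][j] for i in cluster_a for j in cluster_b)

def complete_linkage(d, cluster_a, cluster_b):
    # Maximum pairwise distance between the two clusters.
    return max(d[i][j] for i in cluster_a for j in cluster_b)

def average_linkage(d, cluster_a, cluster_b):
    # Mean of all pairwise distances between the two clusters.
    return (sum(d[i][j] for i in cluster_a for j in cluster_b)
            / (len(cluster_a) * len(cluster_b)))
```

The choice of linkage changes the shape of the resulting clusters: single linkage tends to produce long chains, while complete linkage favors compact, roughly equal-diameter clusters.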

Step 5. Repeat

Repeat steps 3 and 4 until all data points are in a single cluster, or a predefined number of
clusters is reached.
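The merge loop of steps 1 through 5 can be sketched end to end; this version uses single linkage and stops at a target number of clusters `k` (an O(n^3) illustration, not an optimized implementation):

```python
def agglomerative(points, k):
    # Single-linkage agglomerative clustering: start with one
    # cluster per point, then repeatedly merge the closest pair
    # until only k clusters remain.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    clusters = [[i] for i in range(len(points))]  # indices into points
    while len(clusters) > k:
        best = None  # (distance, index_a, index_b) of closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge b into a
    return clusters
```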

Step 6. Build a Dendrogram

During the merging process, keep track of the order in which clusters are merged.
Construct a dendrogram (a tree-like diagram) that shows the hierarchical relationship between
clusters at different levels of similarity.

Step 7. Cut the Dendrogram

To get a final clustering solution, you can "cut" the dendrogram at a specific height. This will
result in a specified number of clusters, depending on where the cut is made.
