Fuzzy Clustering
Clustering is an unsupervised machine learning technique that divides the given data into different clusters
based on the distances (similarity) between the data points.
The unsupervised k-means clustering algorithm assigns each data point a membership of either 0 or 1 in a
given cluster, i.e., either true or false. Fuzzy logic, by contrast, assigns each data point a fuzzy membership
value in every cluster.
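As a small illustration (with made-up membership numbers), the two schemes can be contrasted in a few
lines of Python:

import numpy as np

# Hard (k-means style) membership: each column is one data point and
# contains exactly one 1 -- every point belongs to a single cluster.
hard = np.array([[1, 0, 0],
                 [0, 1, 1]])

# Fuzzy membership: entries lie in [0, 1] and each column sums to 1,
# so a point can partially belong to several clusters.
fuzzy = np.array([[0.9, 0.4, 0.1],
                  [0.1, 0.6, 0.9]])

assert np.allclose(fuzzy.sum(axis=0), 1.0)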
In fuzzy c-means clustering, we compute the centroid of each cluster and then the distance of each data
point from these centroids, repeating the process until the cluster memberships become stable.
Fuzzy clustering is a type of clustering algorithm in machine learning that allows a data point to belong to
more than one cluster with different degrees of membership. Unlike other clustering algorithms, such as k-
means or hierarchical clustering, which assign each data point to a single cluster, fuzzy clustering assigns a
membership degree between 0 and 1 to each data point for each cluster.
One practical difficulty is model selection: choosing the right number of clusters and membership functions
can be challenging and may require expert knowledge or trial and error.
Suppose the given data points are {(1, 3), (2, 5), (4, 8), (7, 9)}.
Step 1: Randomly initialize the membership values. Here we take the memberships of the four points in
cluster 1 to be (0.8, 0.7, 0.2, 0.1) and in cluster 2 to be (0.2, 0.3, 0.8, 0.9), so, for example, point (1, 3)
belongs to cluster 1 with degree 0.8 and to cluster 2 with degree 0.2.
Step 2: Compute the centroid of each cluster:
Vij = ( Σk (µik)^m * xkj ) / ( Σk (µik)^m )
Where, µik is the fuzzy membership value of data point k in cluster i, m is the fuzziness parameter (generally
taken as 2), and xkj is the j-th coordinate of data point k.
Here,
V11 = (0.8^2 * 1 + 0.7^2 * 2 + 0.2^2 * 4 + 0.1^2 * 7) / (0.8^2 + 0.7^2 + 0.2^2 + 0.1^2) = 1.568
V12 = (0.8^2 * 3 + 0.7^2 * 5 + 0.2^2 * 8 + 0.1^2 * 9) / (0.8^2 + 0.7^2 + 0.2^2 + 0.1^2) = 4.051
V21 = (0.2^2 * 1 + 0.3^2 * 2 + 0.8^2 * 4 + 0.9^2 * 7) / (0.2^2 + 0.3^2 + 0.8^2 + 0.9^2) = 5.35
V22 = (0.2^2 * 3 + 0.3^2 * 5 + 0.8^2 * 8 + 0.9^2 * 9) / (0.2^2 + 0.3^2 + 0.8^2 + 0.9^2) = 8.215
Centroids are: (1.568, 4.051) and (5.35, 8.215)
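The same centroid computation can be verified with a short NumPy sketch (using the data points and
initial membership values from above):

import numpy as np

X = np.array([[1, 3], [2, 5], [4, 8], [7, 9]], dtype=float)  # data points
U = np.array([[0.8, 0.7, 0.2, 0.1],   # memberships for cluster 1
              [0.2, 0.3, 0.8, 0.9]])  # memberships for cluster 2
m = 2                                 # fuzziness parameter

W = U ** m                                   # weight = membership^m
V = (W @ X) / W.sum(axis=1, keepdims=True)   # one centroid per row
print(V)  # approx. [[1.568 4.051], [5.348 8.215]]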
Step 3: Find out the distance of each point from each centroid.
D11 = ((1 - 1.568)^2 + (3 - 4.051)^2)^0.5 = 1.2
D12 = ((1 - 5.35)^2 + (3 - 8.215)^2)^0.5 = 6.79
Similarly, the distance of all other points is computed from both the centroids.
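Continuing the NumPy sketch, all point-to-centroid distances can be computed at once:

# Broadcasting: D[i, k] is the distance of point k from centroid i.
D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
print(D)  # D[0, 0] = 1.2, D[1, 0] = 6.79, etc.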
Step 4: Update the membership values using the computed distances:
µik = 1 / Σj ( (Dki / Dkj)^(2/(m-1)) )
Where, Dki is the distance of data point k from centroid i, and the sum runs over all c clusters. For example,
for the first point and the first cluster:
µ11 = 1 / ( (1.2/1.2)^2 + (1.2/6.79)^2 ) = 1 / (1 + 0.031) ≈ 0.97
Similarly, compute all the other membership values, and update the matrix.
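In code, this update rule (again with m = 2, continuing the sketch; it assumes no point coincides exactly
with a centroid) is:

# ratio[i, j, k] = (D[i, k] / D[j, k]) ** (2 / (m - 1))
ratio = (D[:, None, :] / D[None, :, :]) ** (2 / (m - 1))
U = 1.0 / ratio.sum(axis=1)   # updated memberships; columns sum to 1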
Step 5: Repeat steps 2-4 until the membership values become constant, or until the difference between two
consecutive updates is less than the tolerance value (a small threshold up to which a change in the
membership values is accepted).
Step 6: Defuzzify the obtained membership values, for example by assigning each data point to the cluster
in which it has the highest membership.
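Putting steps 2-6 together, a minimal end-to-end sketch of the iteration (with an assumed tolerance of
0.005, and again assuming no point lands exactly on a centroid) could look like this:

for _ in range(100):
    W = U ** m
    V = (W @ X) / W.sum(axis=1, keepdims=True)                 # Step 2: centroids
    D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)  # Step 3: distances
    U_new = 1.0 / ((D[:, None, :] / D[None, :, :]) ** (2 / (m - 1))).sum(axis=1)  # Step 4
    if np.abs(U_new - U).max() < 0.005:                        # Step 5: tolerance check
        U = U_new
        break
    U = U_new
labels = U.argmax(axis=0)  # Step 6: defuzzify to hard cluster labels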
Implementation: The scikit-fuzzy library provides a pre-defined function for fuzzy c-means which can be
used in Python. To use fuzzy c-means, install the scikit-fuzzy package:
pip install scikit-fuzzy
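A minimal usage sketch follows (argument names per scikit-fuzzy's documented cmeans signature; note
that the data must be shaped features x samples):

import numpy as np
import skfuzzy as fuzz  # provided by the scikit-fuzzy package

# The example data points, transposed to shape (features, samples).
data = np.array([[1, 2, 4, 7],
                 [3, 5, 8, 9]], dtype=float)

cntr, u, u0, d, jm, p, fpc = fuzz.cmeans(
    data, c=2, m=2, error=0.005, maxiter=1000, seed=0)

print("Centroids:\n", cntr)                  # one row per cluster
print("Memberships:\n", u)                   # shape (2, 4); columns sum to 1
print("Hard labels:", np.argmax(u, axis=0))  # defuzzified assignment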
OPTICS
This clustering technique is different from other clustering techniques in the sense that it does not explicitly
segment the data into clusters. Instead, it produces a visualization of reachability distances and uses this
reachability plot to cluster the data.
Agglomerative Clustering
Agglomerative clustering is a type of hierarchical clustering algorithm. It starts with each data point as its
own cluster and then iteratively merges the most similar pairs of clusters, building a hierarchy of clusters,
until all the data points belong to a single cluster.
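As a quick illustration, scikit-learn's AgglomerativeClustering can be applied to the example points from
earlier (a minimal sketch; the average linkage here is an arbitrary choice):

import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 3], [2, 5], [4, 8], [7, 9]], dtype=float)

# Merge the closest pairs of clusters until 2 clusters remain.
agg = AgglomerativeClustering(n_clusters=2, linkage="average")
print(agg.fit_predict(X))  # one hard cluster label per point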
Divisive Clustering
Divisive clustering is a hierarchical technique that starts with all data points in a single cluster and
recursively splits the clusters into smaller sub-clusters based on their dissimilarity. It is also known as
"top-down" clustering.
Unlike agglomerative clustering, which starts with each data point as its own cluster and iteratively merges
the most similar pairs of clusters, divisive clustering is a "divide and conquer" approach that breaks a large
cluster into smaller sub-clusters, as sketched below.
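scikit-learn has no dedicated divisive-clustering class, but the top-down idea can be sketched by recursively
bisecting clusters with k-means (a bisecting k-means strategy; splitting the largest cluster first is an
illustrative assumption, not a fixed rule):

import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(X, n_clusters):
    """Recursively bisect the largest cluster until n_clusters remain."""
    clusters = [np.arange(len(X))]  # start with one cluster holding all points
    while len(clusters) < n_clusters:
        i = max(range(len(clusters)), key=lambda j: len(clusters[j]))
        idx = clusters.pop(i)                                  # largest cluster
        halves = KMeans(n_clusters=2, n_init=10).fit_predict(X[idx])
        clusters.append(idx[halves == 0])                      # split it in two
        clusters.append(idx[halves == 1])
    labels = np.empty(len(X), dtype=int)
    for label, idx in enumerate(clusters):
        labels[idx] = label
    return labels

X = np.array([[1, 3], [2, 5], [4, 8], [7, 9]], dtype=float)
print(divisive_clustering(X, 2))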
Difference between agglomerative clustering and divisive clustering:
3. Complexity level:
Agglomerative Clustering: generally more computationally expensive, especially for large datasets, as this
approach requires the calculation of all pairwise distances between data points.
Divisive Clustering: comparatively less expensive, as it only requires the calculation of distances between
sub-clusters, which can reduce the computational burden.