Fuzzy C-Means
1. When to Use:
When you need overlapping clusters.
In fields like image segmentation, bioinformatics, and market segmentation.
When data has ambiguous or fuzzy boundaries.
2. Principle:
Assign each data point a membership value to every cluster.
Cluster centroids are computed based on weighted averages of data points.
Iteratively update memberships and centroids to minimize the objective function.
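The resulting clusters balance each point's distance to the centroids against its degrees of membership. The iteration converges to a local minimum of the objective function, which is not necessarily the global one.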
3. Mathematical Foundation:
Objective Function:
$$J_m = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2}$$
Where:
𝐽𝑚 : Objective function
𝑢ⅈ𝑗 : Membership of point ⅈ in cluster 𝑗
𝑥ⅈ : Data point
𝑐𝑗 : Centroid of cluster 𝑗
𝑚: Fuzziness parameter, 𝑚 > 1 (controls cluster softness; larger values give softer, more evenly shared memberships)
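To make the objective concrete, here is a minimal sketch that evaluates it for a fixed set of memberships and centroids. The function name `fcm_objective` and the toy data are assumptions made for illustration; plain Python lists are used so the example has no dependencies.

```python
def fcm_objective(points, centroids, memberships, m=2.0):
    """Return J_m = sum_i sum_j u_ij^m * ||x_i - c_j||^2."""
    total = 0.0
    for x, u_row in zip(points, memberships):
        for c, u in zip(centroids, u_row):
            # Squared Euclidean distance between point x and centroid c.
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, c))
            total += (u ** m) * sq_dist
    return total


# Hypothetical toy data: three 2-D points and two centroids.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centroids = [(0.0, 0.0), (4.0, 0.0)]
# Each row holds one point's memberships and sums to 1.
memberships = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

j_value = fcm_objective(points, centroids, memberships, m=2.0)
print(j_value)
```

Only the middle point contributes here: 0.5² · 1 + 0.5² · 9 = 2.5. The two outer points sit exactly on their centroids with full membership, so their terms are zero.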
Membership Update:
$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}$$
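The membership update can be sketched in a few lines of Python. The name `update_memberships` is hypothetical. The handling of a point that coincides with a centroid (full membership in that cluster) is a common convention assumed here, since the formula itself would divide by zero in that case.

```python
import math


def update_memberships(points, centroids, m=2.0, eps=1e-12):
    """Compute u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centroids]
        # Assumed convention: a point lying on one or more centroids
        # shares its full membership among those clusters.
        on_centroid = [j for j, d in enumerate(dists) if d < eps]
        if on_centroid:
            row = [0.0] * len(centroids)
            for j in on_centroid:
                row[j] = 1.0 / len(on_centroid)
        else:
            row = [
                1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
                for d_j in dists
            ]
        memberships.append(row)
    return memberships


# Hypothetical example: one point at distance 1 and 3 from two centroids.
u = update_memberships([(1.0, 0.0)], [(0.0, 0.0), (4.0, 0.0)], m=2.0)
print(u)
```

With 𝑚 = 2 the exponent is 2, so the point's memberships are 1 / (1 + 1/9) = 0.9 for the near centroid and 1 / (9 + 1) = 0.1 for the far one, up to floating-point rounding.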
Centroid Update:
$$c_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} \, x_i}{\sum_{i=1}^{n} u_{ij}^{m}}$$
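The centroid update is a weighted mean of the data points, with weights equal to the memberships raised to the power 𝑚. A minimal sketch follows; the function name `update_centroids` and the sample values are assumptions for illustration.

```python
def update_centroids(points, memberships, m=2.0):
    """Compute c_j = sum_i u_ij^m * x_i / sum_i u_ij^m for every cluster j."""
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centroids = []
    for j in range(n_clusters):
        # Weight of each point for cluster j.
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)
        centroids.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centroids


# Hypothetical example: three points on a line, two clusters.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
memberships = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
new_centroids = update_centroids(points, memberships, m=2.0)
print(new_centroids)
```

With 𝑚 = 2 the shared middle point carries weight 0.25 in both clusters, so the centroids land near 0.4 and 8.4 on the x-axis rather than at the plain means a hard assignment would give.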
4. Algorithm Description:
Initialize:
Define 𝑐 (number of clusters), 𝑚 (fuzziness parameter), and a stopping criterion.
Randomly initialize the membership matrix 𝑈 so that each point's memberships sum to 1.
Update Centroids:
Compute 𝑐𝑗 for each cluster.
Update Memberships:
Recalculate 𝑢ⅈ𝑗 for each point and cluster.
Check Convergence:
If the change in 𝑈 or in the centroids 𝑐𝑗 between iterations falls below a threshold, stop. Otherwise, repeat from the centroid update.
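The four steps can be put together as in the sketch below. It is a minimal illustration under stated assumptions, not a reference implementation: the name `fuzzy_c_means`, the default values for `tol`, `max_iter`, and `seed`, the use of the largest change in 𝑈 as the stopping test, and the clamping of zero distances are all choices made here for the example.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Basic FCM loop: returns (centroids, membership matrix U)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    exponent = 2.0 / (m - 1.0)

    # Initialize: random memberships, each row normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        row_sum = sum(row)
        U.append([v / row_sum for v in row])

    centroids = []
    for _ in range(max_iter):
        # Update centroids: weighted mean with weights u_ij^m.
        centroids = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            centroids.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ))

        # Update memberships; distances are clamped away from zero
        # (an assumption) so a point on a centroid does not divide by zero.
        new_U = []
        for x in points:
            dists = [max(math.dist(x, cj), 1e-12) for cj in centroids]
            new_U.append([
                1.0 / sum((dj / dk) ** exponent for dk in dists)
                for dj in dists
            ])

        # Check convergence: largest change in any membership value.
        change = max(
            abs(new_U[i][j] - U[i][j]) for i in range(n) for j in range(c)
        )
        U = new_U
        if change < tol:
            break
    return centroids, U


# Hypothetical data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, U = fuzzy_c_means(data, c=2)
for point, row in zip(data, U):
    print(point, [round(v, 3) for v in row])
```

On data this well separated, each point should end up with a membership close to 1 in one cluster and close to 0 in the other. Which cluster index corresponds to which group depends on the random initialization.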
5. Hyperparameters and Their Values:
Number of Clusters (𝒄):
Usually determined based on the application or using techniques like the elbow method.
Distance Metric:
Euclidean distance is most common.
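One way to apply the elbow method here is to run the algorithm for several values of 𝑐 and look for the point where the final objective stops dropping sharply. Using the FCM objective as the elbow criterion is an assumption of this sketch, as are the helper name `final_objective`, its defaults, and the toy data. The function is a compact, self-contained FCM run with Euclidean distance.

```python
import math
import random


def final_objective(points, c, m=2.0, tol=1e-6, max_iter=200, seed=0):
    """Run a basic FCM with c clusters and return the final J_m."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    exponent = 2.0 / (m - 1.0)
    # Random memberships, each row normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        row_sum = sum(row)
        U.append([v / row_sum for v in row])
    dists = []
    for _ in range(max_iter):
        centroids = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            centroids.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ))
        # Euclidean distances, clamped away from zero (an assumption).
        dists = [
            [max(math.dist(x, cj), 1e-12) for cj in centroids]
            for x in points
        ]
        new_U = [
            [1.0 / sum((dj / dk) ** exponent for dk in row) for dj in row]
            for row in dists
        ]
        change = max(
            abs(a - b) for ra, rb in zip(new_U, U) for a, b in zip(ra, rb)
        )
        U = new_U
        if change < tol:
            break
    # J_m evaluated with the final memberships and centroids.
    return sum(
        U[i][j] ** m * dists[i][j] ** 2 for i in range(n) for j in range(c)
    )


# Hypothetical data with two obvious groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
objectives = {c: final_objective(data, c) for c in range(1, 5)}
for c, j_value in objectives.items():
    print(c, round(j_value, 3))
```

For this data the objective should fall steeply from 𝑐 = 1 to 𝑐 = 2 and change comparatively little afterwards, which is the "elbow" suggesting two clusters. Because results depend on the random start, averaging over several seeds is a reasonable refinement.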
6. Cons:
Computationally expensive for large datasets.
Sensitive to initialization and noise.
Requires a predefined number of clusters.