Fuzzy C-Means Clustering

The document describes fuzzy c-means clustering, an extension of k-means clustering that allows data points to belong to multiple clusters simultaneously. It presents the fuzzy c-means algorithm, which assigns data points membership levels in clusters based on distance from cluster centers. The algorithm aims to minimize an objective function to find the optimal fuzzy partition and cluster centers. It iterates between updating membership levels and recalculating cluster centers until convergence is reached.

Lecture-8 and 10: Fuzzy c-Means Clustering

Fuzzy C-means Clustering

[Figure: seven data points, labeled 1-7, grouped into two clusters. Points close to a cluster center belong to that cluster almost entirely, while boundary points carry split memberships between the two clusters, e.g. 95%/5%, 90%/10%, and 55%/45%.]

Reference: J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: The fuzzy c-means clustering algorithm,” Comput. Geosci., vol. 10, nos. 2–3, pp. 191–203, 1984.
History
❖ Fuzzy C-Means (FCM) clustering is an extension of K-means
clustering. It was developed by J. C. Dunn in 1973 and improved by
J. C. Bezdek in 1981.
❖ FCM clustering allows data points to be assigned to more than
one cluster.
❖ The algorithm works by assigning each data point a membership
degree in each cluster, based on the distance between the cluster
center and the data point. The nearer a data point is to a cluster
center, the higher its membership in that cluster. The memberships
of each data point across all clusters must sum to one.
1. J. C. Dunn (1973): "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters", Journal of Cybernetics, 3: 32–57.
2. J. C. Bezdek (1981): "Pattern Recognition with Fuzzy Objective Function Algorithms", Plenum Press, New York.
FCM Algorithm Introduction
❖ To introduce this method, we define a sample set of n data points that we
want to classify:

    X = {x_1, x_2, ..., x_n}                                             (1)

Each data point x_i is described by D features, i.e.

    x_i = {x_i1, x_i2, ..., x_iD}, where D is the number of features     (2)

❖ Define a family of fuzzy sets A_j, j = 1, 2, ..., C, forming a fuzzy
C-partition of the universe of data points X. Here c_j is the D-dimensional
center of the j-th cluster.
❖ Assign a membership degree to each data point in X in each fuzzy set
(fuzzy class). Hence, a single data point can have partial membership
in more than one class. For example, the i-th data point has membership
degree in the j-th cluster:

    μ_ij ∈ [0, 1]                                                        (3)

❖ The condition is that the membership degrees of a single data point,
summed over all classes, must be unity:

    Σ_{j=1}^{C} μ_ij = 1   for all i = 1, 2, ..., n                      (4)

Reference: J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: The fuzzy c-means clustering algorithm,” Comput. Geosci., vol. 10, nos. 2–3, pp. 191–203, 1984.
❖ Define a fuzzy C-partition matrix U for grouping a collection of n data points
into C clusters. The objective function J for a fuzzy C-partition is given as:

    J(U, c) = Σ_{i=1}^{n} Σ_{j=1}^{C} (μ_ij)^m (d_ij)^2                  (5)

    d_ij = d(x_i, c_j) = ||x_i − c_j|| = [Σ_{d=1}^{D} (x_id − c_jd)^2]^{1/2}   (6)

▪ μ_ij is the membership of the i-th data point in the j-th cluster.
▪ d_ij is the Euclidean distance between the j-th cluster center and the i-th data point.
▪ x_id is the d-th feature of the i-th data point.
▪ m ∈ (1, ∞) is a weighting parameter that controls the amount of fuzziness in the
classification process.
▪ c_j is the j-th cluster center, described by D features and represented in vector
form as c_j = {c_j1, c_j2, ..., c_jD}.
▪ Each coordinate of every cluster center can be calculated as follows:

    c_jd = [Σ_{i=1}^{n} (μ_ij)^m x_id] / [Σ_{i=1}^{n} (μ_ij)^m]          (7)

where d indexes the feature space, that is, d = 1, 2, ..., D.

❖ The optimum fuzzy C-partition is the one that minimizes J, eq. (8):

    J*(U*, c*) = min J(U, c)                                             (8)

Reference: J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: The fuzzy c-means clustering algorithm,” Comput. Geosci., vol. 10, nos. 2–3, pp. 191–203, 1984.
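The objective function in eq. (5) can be evaluated directly from a data matrix, a membership matrix, and the cluster centers. A minimal NumPy sketch (the function name and array layout are my own choices, not from the source):

```python
import numpy as np

def fcm_objective(X, U, centers, m=2.0):
    """J(U, c) = sum_i sum_j (mu_ij)^m * (d_ij)^2.

    X: (n, D) data matrix, U: (n, C) membership matrix,
    centers: (C, D) cluster centers, m: fuzziness weight.
    """
    # Squared Euclidean distance from every point to every center, via broadcasting
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(((U ** m) * d2).sum())
```

With a hard partition (memberships exactly 0 or 1), this reduces to the familiar k-means objective.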
Fuzzy C-means Clustering
1. Initialize the membership matrix U = [μ_ij] as U^(0).

2. At step k, calculate the cluster centers c_j for j = 1, 2, ..., C using U^(k):

    c_j = [Σ_{i=1}^{n} (μ_ij)^m x_i] / [Σ_{i=1}^{n} (μ_ij)^m]

   where n is the number of data points and C is the total number of clusters.
   Coordinate-wise, each cluster center is calculated as:

    c_jd = [Σ_{i=1}^{n} (μ_ij)^m x_id] / [Σ_{i=1}^{n} (μ_ij)^m]

   where d indexes the feature space, that is, d = 1, 2, ..., D.

3. Update U^(k) to U^(k+1):

    μ_ij = 1 / Σ_{k=1}^{C} (||x_i − c_j|| / ||x_i − c_k||)^{2/(m−1)}

   with d_ij = d(x_i, c_j) = ||x_i − c_j|| = [Σ_{d=1}^{D} (x_id − c_jd)^2]^{1/2}.

4. If ||U^(k+1) − U^(k)|| < ε, then STOP; otherwise return to step 2.

Reference: J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: The fuzzy c-means clustering algorithm,” Comput. Geosci., vol. 10, nos. 2–3, pp. 191–203, 1984.
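The four steps above can be sketched end-to-end in NumPy. This is a hedged sketch rather than the authors' code; the random initialization, tolerance, and the small epsilon guarding against zero distances are my own choices:

```python
import numpy as np

def fcm(X, C, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy c-means sketch. X: (n, D) data; returns (U, centers),
    where U[i, j] is the membership of point i in cluster j."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: random initial partition matrix with rows summing to 1
    U = rng.random((n, C))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Step 2: cluster centers as membership-weighted means
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 3: update memberships from distances to each center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against a point sitting exactly on a center
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        # Step 4: stop once the partition matrix stabilizes
        if np.linalg.norm(U_new - U) < eps:
            U = U_new
            break
        U = U_new
    return U, centers
```

On well-separated data the centers settle near the group means, and every row of U sums to 1 by construction of the update in step 3.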
Fuzzy C-means Clustering
❖ For example: we have initial centroids 3 and 11 (with m = 2).

μ_ij is the degree of membership of x_i in cluster j:

    μ_ij = 1 / Σ_{k=1}^{C} (||x_i − c_j|| / ||x_i − c_k||)^{2/(m−1)}

❖ For data point 2 (1st element):

    μ_11 = 1 / [ (|2−3|/|2−3|)^{2/(2−1)} + (|2−3|/|2−11|)^{2/(2−1)} ]
         = 1 / (1 + 1/81) = 81/82 = 98.78%

The membership of the first data point (i.e. 2) in the first cluster.

    μ_12 = 1 / [ (|2−11|/|2−3|)^{2/(2−1)} + (|2−11|/|2−11|)^{2/(2−1)} ]
         = 1 / (81 + 1) = 1/82 = 1.22%

The membership of the first data point (i.e. 2) in the second cluster.

Reference: J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: The fuzzy c-means clustering algorithm,” Comput. Geosci., vol. 10, nos. 2–3, pp. 191–203, 1984.
Fuzzy C-means Clustering
❖ For example: we have initial centroids 3 and 11 (with m = 2).

μ_ij is the degree of membership of x_i in cluster j:

    μ_ij = 1 / Σ_{k=1}^{C} (||x_i − c_j|| / ||x_i − c_k||)^{2/(m−1)}

❖ For data point 3 (2nd element):

    μ_21 = 100%

The membership of the second data point in the first cluster (the point
coincides with the first centroid).

    μ_22 = 0%

The membership of the second data point in the second cluster (since it
completely belongs to the first cluster).

Reference: J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: The fuzzy c-means clustering algorithm,” Comput. Geosci., vol. 10, nos. 2–3, pp. 191–203, 1984.
Fuzzy C-means Clustering
❖ For example: we have initial centroids 3 and 11 (with m = 2).

μ_ij is the degree of membership of x_i in cluster j:

    μ_ij = 1 / Σ_{k=1}^{C} (||x_i − c_j|| / ||x_i − c_k||)^{2/(m−1)}

❖ For data point 4 (3rd element):

    μ_31 = 1 / [ (|4−3|/|4−3|)^{2/(2−1)} + (|4−3|/|4−11|)^{2/(2−1)} ]
         = 1 / (1 + 1/49) = 49/50 = 98%

The membership of the third data point in the first cluster.

    μ_32 = 1 / [ (|4−11|/|4−3|)^{2/(2−1)} + (|4−11|/|4−11|)^{2/(2−1)} ]
         = 1 / (49 + 1) = 1/50 = 2%

The membership of the third data point in the second cluster.

Reference: J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: The fuzzy c-means clustering algorithm,” Comput. Geosci., vol. 10, nos. 2–3, pp. 191–203, 1984.
Fuzzy C-means Clustering
❖ For example: we have initial centroids 3 and 11 (with m = 2).

μ_ij is the degree of membership of x_i in cluster j:

    μ_ij = 1 / Σ_{k=1}^{C} (||x_i − c_j|| / ||x_i − c_k||)^{2/(m−1)}

❖ For data point 7 (4th element):

    μ_41 = 1 / [ (|7−3|/|7−3|)^{2/(2−1)} + (|7−3|/|7−11|)^{2/(2−1)} ]
         = 1 / (1 + 1) = 1/2 = 50%

The membership of the fourth data point in the first cluster.

    μ_42 = 1 / [ (|7−11|/|7−3|)^{2/(2−1)} + (|7−11|/|7−11|)^{2/(2−1)} ]
         = 1 / (1 + 1) = 1/2 = 50%

The membership of the fourth data point in the second cluster.

Reference: J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: The fuzzy c-means clustering algorithm,” Comput. Geosci., vol. 10, nos. 2–3, pp. 191–203, 1984.
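The four hand computations above (points 2, 3, 4, and 7 against centroids 3 and 11, with m = 2) can be checked with a short script. The zero-distance case (point 3 sits exactly on a centroid) is handled separately, since the update formula would otherwise divide by zero; the helper function is my own, not from the source:

```python
def memberships(x, centers, m=2.0):
    """Membership of a scalar point x in each cluster (1-D sketch)."""
    d = [abs(x - c) for c in centers]
    # A point sitting exactly on a center belongs to that cluster completely
    if 0.0 in d:
        return [1.0 if di == 0.0 else 0.0 for di in d]
    return [1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1)) for k in range(len(centers)))
            for j in range(len(centers))]

for x in (2, 3, 4, 7):
    print(x, [round(100 * u, 2) for u in memberships(x, (3, 11))])
    # 2 -> [98.78, 1.22], 3 -> [100.0, 0.0], 4 -> [98.0, 2.0], 7 -> [50.0, 50.0]
```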
Fuzzy C-means Clustering
Updated cluster center c_j:

    c_j = [Σ_{i=1}^{n} (μ_ij)^m x_i] / [Σ_{i=1}^{n} (μ_ij)^m]

    c_1 = [ (98.78%)² × 2 + (100%)² × 3 + (98%)² × 4 + (50%)² × 7 + ⋯ ]
        / [ (98.78%)² + (100%)² + (98%)² + (50%)² + ⋯ ]

Similarly, c_2 can be found, and these cluster centers help in updating μ_ij
further until the algorithm converges.

Reference: J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: The fuzzy c-means clustering algorithm,” Comput. Geosci., vol. 10, nos. 2–3, pp. 191–203, 1984.
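Plugging the memberships computed earlier into the center-update formula gives a new c_1. The slide's "⋯" suggests the full dataset may contain further points; the sketch below uses only the four points shown (2, 3, 4, 7), so the resulting value is illustrative:

```python
# Memberships of points 2, 3, 4, 7 in cluster 1 (from the worked example)
mu1 = [81 / 82, 1.0, 0.98, 0.5]
xs = [2, 3, 4, 7]
m = 2

# c_1 = sum(mu^m * x) / sum(mu^m), the membership-weighted mean
num = sum(u ** m * x for u, x in zip(mu1, xs))
den = sum(u ** m for u in mu1)
c1 = num / den
print(round(c1, 3))
```

Note how the new center is pulled toward the points with the strongest memberships (2, 3, and 4) and only weakly toward the boundary point 7.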
Example: Given the fuzzy clusters below,
convert them to crisp clusters.

Fuzzy Clusters
C_1: [0.991 0.986 0.993 0]
C_2: [0.009 0.014 0.007 1]

Crisp Clusters from Fuzzy Clusters (each point is assigned to the cluster in
which its membership is largest)

C_1: [1 1 1 0]
C_2: [0 0 0 1]

Answer:
Crisp cluster C_1: {x_1, x_2, x_3}
Crisp cluster C_2: {x_4}

Reference: T. J. Ross, "Fuzzy Logic with Engineering Applications," Vol. 2. New York: Wiley, 2004.
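This max-membership defuzzification is a one-line decision per point. A small sketch with the example's numbers (the variable names are mine):

```python
# Membership of each of the four points in each fuzzy cluster
fuzzy = {
    "C1": [0.991, 0.986, 0.993, 0.0],
    "C2": [0.009, 0.014, 0.007, 1.0],
}

# Each point joins the cluster in which its membership is largest
crisp = {name: [] for name in fuzzy}
for i in range(len(fuzzy["C1"])):
    winner = max(fuzzy, key=lambda name: fuzzy[name][i])
    crisp[winner].append(f"x{i + 1}")

print(crisp)  # {'C1': ['x1', 'x2', 'x3'], 'C2': ['x4']}
```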
Apply the FCM algorithm to D-dimensional feature-space data sets.

Apply FCM to the following dataset to find two clusters:

    x1  x2  x3  x4
     1   5   2   1
     0   5   0   0
     4   9   9   0
     5   9   0   2
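As a starting point for this exercise, the table can be read with the columns x1-x4 as the four data points and the rows as their D = 4 features (my reading; the source does not label the rows). A compact FCM run on that matrix, following the update rules derived earlier:

```python
import numpy as np

# Columns x1..x4 taken as the four data points (assumption), rows as features
X = np.array([[1, 5, 2, 1],
              [0, 5, 0, 0],
              [4, 9, 9, 0],
              [5, 9, 0, 2]], dtype=float).T  # shape (4 points, 4 features)

rng = np.random.default_rng(0)
C, m = 2, 2.0
U = rng.random((X.shape[0], C))
U /= U.sum(axis=1, keepdims=True)  # rows of U must sum to 1
for _ in range(100):
    # Centers as membership-weighted means
    W = U ** m
    centers = (W.T @ X) / W.sum(axis=0)[:, None]
    # Membership update from distances to each center
    d = np.fmax(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), 1e-12)
    U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)
    if np.linalg.norm(U_new - U) < 1e-6:
        U = U_new
        break
    U = U_new

print(np.round(U, 3))
```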
Some relevant Data Clustering works that you may try:
Thanks
