Unit 3-Fuzzy Clustering
Fuzzy C-means
Clustering
Clustering is an unsupervised machine learning technique that
divides the population into several groups or clusters such that
data points in the same group are similar to each other, and data
points in different groups are dissimilar.
Hard Clustering
In hard clustering, each data point either belongs to a cluster completely or not.
For example, each customer is put into one group out of the 10 groups.
K-Means Clustering is a hard clustering algorithm.
Soft Clustering
In soft clustering, instead of putting each data point into exactly one cluster, each
data point is assigned a probability or likelihood of belonging to each cluster.
For example, each customer is assigned a probability of being in each of the 10 clusters
of the retail store.
Fuzzy C-means clustering (FCM) is a widely used soft clustering algorithm.
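A small sketch of the difference between hard and soft assignment. The distances are made-up values, and the inverse-distance weighting is only an illustrative way to get soft scores, not the FCM formula itself.

```python
import numpy as np

# Hypothetical distances from 3 customers (rows) to 2 cluster centers (columns).
distances = np.array([
    [1.0, 4.0],
    [2.0, 2.0],
    [5.0, 1.0],
])

# Hard clustering: every customer gets exactly one cluster label.
hard_labels = distances.argmin(axis=1)

# Soft clustering (illustrative): inverse-distance scores, normalized per customer
# so that each row sums to 1.
inverse = 1.0 / distances
soft_memberships = inverse / inverse.sum(axis=1, keepdims=True)

print(hard_labels)
print(soft_memberships)
```

The second customer is equally far from both centers: hard clustering still has to pick one label, while the soft scores show the tie.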
Why?
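It is difficult to assign every data point rigidly to a single cluster.
Example: a customer may fit more than one group.
Soft clustering is a good fit for such scenarios.
Fuzzy C-means can describe overlapping data more accurately than hard clustering.
It is used in scenarios where data points overlap.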
What is Fuzzy?
Fuzzy
The term fuzzy refers to things that are not clear or are
vague.
In the real world, we often encounter situations
in which we cannot determine whether a statement is true or
false; there, fuzzy logic provides valuable flexibility
for reasoning.
In a Boolean system, the truth value 1.0 represents absolute
truth and 0.0 represents absolute falsehood.
A fuzzy system is not restricted to these two absolute
values.
In fuzzy logic, a truth value can also be an intermediate value between 0.0 and 1.0, which is partially
true and partially false.
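A minimal sketch of reasoning with intermediate truth values. It assumes the commonly used min/max/complement operators for AND, OR and NOT; the text above does not specify any operators, and the truth values are made up.

```python
def fuzzy_and(a: float, b: float) -> float:
    """Fuzzy AND as the minimum of two truth values (an assumed, common choice)."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Fuzzy OR as the maximum of two truth values."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Fuzzy NOT as the complement of a truth value."""
    return 1.0 - a


# Hypothetical truth values: "it is hot" is 0.7 true, "it is humid" is 0.4 true.
hot, humid = 0.7, 0.4
print(fuzzy_and(hot, humid), fuzzy_or(hot, humid), fuzzy_not(hot))
```

With inputs restricted to 0.0 and 1.0, these operators behave exactly like Boolean AND, OR and NOT.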
Fuzzy terms
FUZZIFICATION:
It converts inputs, i.e. crisp numbers, into fuzzy
sets.
Crisp inputs are the exact values measured by
sensors and passed into the control system for
processing, such as temperature, pressure, rpm, etc.
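A minimal sketch of fuzzification, assuming triangular membership functions. The set names ("cold", "warm", "hot") and their breakpoints are hypothetical and chosen only for illustration.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), rising to 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets for temperature in degrees Celsius: (left, peak, right).
TEMPERATURE_SETS = {
    "cold": (-10.0, 0.0, 15.0),
    "warm": (10.0, 20.0, 30.0),
    "hot": (25.0, 35.0, 50.0),
}


def fuzzify(value):
    """Map one crisp reading to a degree of membership in every fuzzy set."""
    return {name: triangular(value, *params)
            for name, params in TEMPERATURE_SETS.items()}


memberships = fuzzify(12.0)
print(memberships)
```

A reading of 12 degrees ends up partly "cold" and partly "warm" at the same time, which is the point of fuzzification.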
DEFUZZIFICATION:
It is used to convert the fuzzy sets obtained by the
inference engine into a crisp value.
There are several defuzzification methods available and
the best-suited one is used with a specific expert system
to reduce the error.
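A minimal sketch of one defuzzification method, the centroid (center of gravity) over sampled output values. The text does not name a specific method, so this choice is an assumption, and the fan-speed numbers are hypothetical.

```python
import numpy as np


def centroid_defuzzify(values, memberships):
    """Return the membership-weighted average of the sampled output values."""
    values = np.asarray(values, dtype=float)
    memberships = np.asarray(memberships, dtype=float)
    total = memberships.sum()
    if total == 0:
        raise ValueError("all membership degrees are zero")
    return float((values * memberships).sum() / total)


# Hypothetical output fuzzy set over fan speed (percent).
fan_speeds = [0.0, 25.0, 50.0, 75.0, 100.0]
output_memberships = [0.0, 0.2, 0.6, 0.2, 0.0]

crisp_speed = centroid_defuzzify(fan_speeds, output_memberships)
print(crisp_speed)
```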
RULE BASE:
It contains the set of rules and the IF-THEN conditions
provided by the experts to govern the decision-making
system.
INFERENCE ENGINE:
It determines the matching degree of the current fuzzy input
with respect to each rule and decides which rules are to be
fired according to the input field.
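A minimal sketch of how a rule base and an inference engine can fit together. It assumes that the matching degree of a rule is the minimum membership over its IF conditions, which is a common convention and not something the text specifies. All rule names and input values are hypothetical.

```python
# Hypothetical fuzzified inputs: degree of membership of each input in each fuzzy set.
fuzzy_inputs = {
    "temperature": {"cold": 0.0, "warm": 0.3, "hot": 0.7},
    "humidity": {"low": 0.1, "high": 0.9},
}

# Hypothetical rule base: (IF conditions, THEN consequent).
RULES = [
    ({"temperature": "hot", "humidity": "high"}, "fan_fast"),
    ({"temperature": "warm", "humidity": "low"}, "fan_slow"),
    ({"temperature": "cold", "humidity": "low"}, "fan_off"),
]


def firing_strengths(inputs, rules):
    """Return (consequent, strength) for every rule with a nonzero matching degree."""
    fired = []
    for conditions, consequent in rules:
        # Matching degree of the rule: the weakest of its conditions.
        strength = min(inputs[variable][term] for variable, term in conditions.items())
        if strength > 0.0:
            fired.append((consequent, strength))
    return fired


fired_rules = firing_strengths(fuzzy_inputs, RULES)
print(fired_rules)
```

The third rule is not fired because "cold" has a membership of 0 for the current input.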
Fuzzy set
A fuzzy set is a set in which every element is associated with a
value between 0 and 1 that reflects how certain its membership is.
This value is called the degree of membership.
A fuzzy set is denoted with a tilde sign on top of the
normal set notation.
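A minimal sketch of a fuzzy set as a mapping from elements to degrees of membership. The set "tall" and its values are hypothetical.

```python
# Hypothetical fuzzy set "tall": each person (key) maps to a degree of membership.
tall = {"Ann": 0.9, "Bob": 0.5, "Cid": 0.1}


def is_valid_fuzzy_set(fuzzy_set):
    """Check that every degree of membership lies between 0 and 1."""
    return all(0.0 <= degree <= 1.0 for degree in fuzzy_set.values())


def degree_of_membership(fuzzy_set, element):
    """Elements that are not listed are treated as having membership 0."""
    return fuzzy_set.get(element, 0.0)


print(is_valid_fuzzy_set(tall), degree_of_membership(tall, "Bob"))
```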
Fuzzy C Means
Fuzzy C-means is a well-known soft clustering algorithm.
It is based on fuzzy logic and is often referred to as the
FCM algorithm.
FCM assigns each item a membership degree for every cluster
(often loosely called a probability), which expresses how strongly
the item belongs to that cluster.
Each data point is assigned a likelihood or probability score of belonging
to each cluster.
The data set is grouped into c clusters, with every data point in the dataset
belonging to every cluster to a certain degree.
For example, a data point that lies close to the center of a cluster will
have a high degree of membership in that cluster, and another data
point that lies far away from the center of a cluster will have a low
degree of membership to that cluster.
A membership vector is created during the FCM process.
Each entry expresses a degree of membership, ranging
from 0 to 1, that indicates how similar an item is to the mean
of the corresponding cluster:
In the vector above, we can see that a data item belongs to
two clusters named m2 and m3. Such a membership vector is
created for each data item.
Membership function
A membership function is a curve that defines how each point in the input space is
mapped to a membership value between 0 and 1.
The input space is often referred to as the universe of
discourse or universal set (U), which contains all the
possible elements of concern in each particular
application.
Let A be a given set.
The membership function that defines the set A is given by:
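In general form, assuming A is a fuzzy set over a universe of discourse written here as U, the membership function and the set it defines can be stated as:

```latex
\mu_A : U \to [0, 1], \qquad
A = \{\, (x, \mu_A(x)) \mid x \in U \,\}
```

An ordinary (crisp) set is the special case in which \mu_A(x) takes only the values 0 and 1.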
This algorithm works by assigning each data point a membership
for each cluster center on the
basis of the distance between the cluster center and the data
point.
The nearer a data point is to a cluster center, the higher its
membership in that cluster.
For each data point, the memberships over all clusters
must sum to one.
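A minimal sketch of distance-based memberships. It assumes the standard FCM membership formula with fuzziness m = 2 and strictly positive distances; the distance values are made up.

```python
import numpy as np


def memberships_from_distances(distances, m=2.0):
    """distances has shape (n_points, n_clusters); returns memberships of the same shape."""
    distances = np.asarray(distances, dtype=float)
    power = 2.0 / (m - 1.0)
    # ratio[i, j, k] = d_ij / d_ik for every pair of clusters j and k.
    ratio = distances[:, :, None] / distances[:, None, :]
    return 1.0 / (ratio ** power).sum(axis=2)


# Hypothetical distances of 2 data points to 2 cluster centers.
distances = np.array([[1.0, 3.0],
                      [2.0, 2.0]])
u = memberships_from_distances(distances)
print(u)
print(u.sum(axis=1))  # memberships of each data point sum to one
```

The first point is three times closer to the first center and receives most of its membership there; the second point is equidistant and splits its membership evenly.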
FCM starts with a random initial guess for the cluster centers, that is,
the mean location of each cluster.
Next, FCM assigns every data point a random membership grade
for each cluster. By iteratively updating the cluster centers and the
membership grades for each data point, FCM moves the cluster
centers toward the right location within the data set and, for each data
point, finds the degree of membership in each cluster.
This iteration minimizes an objective function that represents the
distance from any given data point to a cluster center, weighted by
the membership of that data point in the cluster.
The procedure is similar to the K-Means algorithm, except that memberships are graded rather than all-or-nothing.
A data point can theoretically belong to all groups, with a
membership grade (the value of the membership function) between
0 and 1, where
0 corresponds to a data point at the farthest possible distance from a
cluster's center and
1 corresponds to a data point closest to the center.
Fuzzy C means
Let X = {x1, x2, ..., xn} be the set of data points
Let V = {v1, v2, ..., vc} be the set of cluster centers
Algorithmic steps for Fuzzy c-means clustering
1) Randomly select ‘c’ cluster centers.
2) Calculate the membership µij of the ith data
point in the jth cluster:
Where:
i is the index of the data point and j is the index of the cluster
dij is the Euclidean distance between the ith data point and the jth cluster center
m is the fuzziness parameter, m in (1, ∞), i.e. how much sharing of data points is allowed
between clusters (a larger m spreads each point's membership more evenly across clusters)
c is the number of clusters
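The membership update in step 2 is commonly written as follows. This is the standard FCM form expressed with the symbols defined above, with k as an extra summation index over the clusters:

```latex
\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \dfrac{d_{ij}}{d_{ik}} \right)^{\frac{2}{m-1}}}
```

The exponent 2/(m-1) is why m must be strictly greater than 1.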
3) Compute the cluster centers vj from the current memberships,
where,
n is the number of data points
vj is the jth cluster center
m is the fuzziness index, m ∈ (1, ∞)
c is the number of cluster centers
µij is the membership of the ith data point in the jth cluster
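The center update in step 3 is commonly written as a membership-weighted mean of the data points. A sketch in the standard FCM form, using the symbols above:

```latex
v_j = \frac{\sum_{i=1}^{n} \mu_{ij}^{m} \, x_i}{\sum_{i=1}^{n} \mu_{ij}^{m}},
\qquad j = 1, 2, \dots, c
```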
4) Repeat steps 2 and 3 until the minimum value of J is
achieved, where J is the objective function.
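In the standard formulation, the objective function is the membership-weighted sum of squared distances, written here with the same symbols as the steps above:

```latex
J = \sum_{i=1}^{n} \sum_{j=1}^{c} \mu_{ij}^{m} \, \lVert x_i - v_j \rVert^{2}
```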
Where,
||xi - vj|| is the Euclidean distance between the ith data point
and the jth cluster center.
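The four steps can be sketched in NumPy as below. This is a minimal illustration, not a reference implementation. It assumes the standard FCM update formulas, picks random data points as the initial centers, and stops when the memberships change by less than a tolerance (a practical stand-in for "until J is minimal"). The function name, parameters and sample data are hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Return (centers, memberships, objective) for the data points in the rows of X."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: randomly select c data points as the initial cluster centers.
    centers = X[rng.choice(len(X), size=c, replace=False)]
    power = 2.0 / (m - 1.0)
    u = np.zeros((len(X), c))
    for _ in range(max_iter):
        # Step 2: memberships from the distances d_ij to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero when a point sits on a center
        ratio = d[:, :, None] / d[:, None, :]
        new_u = 1.0 / (ratio ** power).sum(axis=2)
        # Step 3: centers as membership-weighted means of the data points.
        weights = new_u ** m
        centers = (weights.T @ X) / weights.sum(axis=0)[:, None]
        # Step 4: stop once the memberships barely change (assumed criterion).
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    objective = float(((u ** m) * d ** 2).sum())
    return centers, u, objective


# Hypothetical data: two small groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
centers, u, J = fuzzy_c_means(X, c=2)
print(centers)
print(u.round(3))
print(J)
```

Each row of u is the membership vector of one data point; taking the largest entry per row turns the fuzzy result into hard labels if needed.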
Result of Fuzzy c-means clustering