
Fuzzy Clustering

Clustering is an unsupervised machine learning technique that divides the given data into different clusters
based on their distances (similarity) from each other.
The k-means clustering algorithm assigns each data point to exactly one cluster, so its membership value in any particular cluster is either 0 or 1, i.e., either true or false. Fuzzy clustering instead gives each data point a fuzzy membership value in every cluster.
In fuzzy c-means clustering, we find the centroid of each cluster, calculate the distance of each data point from the centroids, and update the memberships, repeating until the clusters formed become constant.
Suppose the given data points are {(1, 3), (2, 5), (4, 8), (7, 9)}; these are used in the worked example below.
Fuzzy Clustering is a type of clustering algorithm in machine learning that allows a data point to belong to
more than one cluster with different degrees of membership. Unlike other clustering algorithms, such as k-
means or hierarchical clustering, which assign each data point to a single cluster, fuzzy clustering assigns a
membership degree between 0 and 1 for each data point for each cluster.

Applications of fuzzy clustering in several fields:


1. Image segmentation: Fuzzy clustering can be used to segment images by grouping pixels with similar
properties together, such as color or texture.
2. Pattern recognition: Fuzzy clustering can be used to identify patterns in large datasets by grouping
similar data points together.
3. Marketing: Fuzzy clustering can be used to segment customers based on their preferences and
purchasing behavior, allowing for more targeted marketing campaigns.
4. Medical diagnosis: Fuzzy clustering can be used to diagnose diseases by grouping patients with similar
symptoms together.
5. Environmental monitoring: Fuzzy clustering can be used to identify areas of environmental concern by
grouping together areas with similar pollution levels or other environmental indicators.
6. Traffic flow analysis: Fuzzy clustering can be used to analyze traffic flow patterns by grouping similar
traffic patterns together, allowing for better traffic management and planning.
7. Risk assessment: Fuzzy clustering can be used to identify and quantify risks in various fields, such as
finance, insurance, and engineering.

Advantages of Fuzzy Clustering:


1. Flexibility: Fuzzy clustering allows for overlapping clusters, which can be useful when the data has a
complex structure or when there are ambiguous or overlapping class boundaries.
2. Robustness: Fuzzy clustering can be more robust to outliers and noise in the data, as it allows for a more
gradual transition from one cluster to another.
3. Interpretability: Fuzzy clustering provides a more nuanced understanding of the structure of the data, as it
allows for a more detailed representation of the relationships between data points and clusters.

Disadvantages of Fuzzy Clustering:


1. Complexity: Fuzzy clustering algorithms can be computationally more expensive than traditional
clustering algorithms, as they require optimization over multiple membership degrees.

2. Model selection: Choosing the right number of clusters and membership functions can be challenging,
and may require expert knowledge or trial and error.

The steps to perform the algorithm are:


Step 1: Randomly initialize the membership of the data points in the desired number of clusters.
Let us assume the data is to be divided into 2 clusters, and initialize the memberships randomly. Each data point lies in both clusters with some membership value, which can be set to anything in the initial state.
The table below shows the data points along with their membership (gamma) in each cluster.

Cluster    (1, 3)    (2, 5)    (4, 8)    (7, 9)
1          0.8       0.7       0.2       0.1
2          0.2       0.3       0.8       0.9
Step 2: Find out the centroid.
The formula for finding out the centroid (V) of cluster i along dimension j is:
V_ij = ( Σ_k (γ_ik)^m * x_kj ) / ( Σ_k (γ_ik)^m )
where γ (also written µ) is the fuzzy membership value of the data point in the cluster, m is the fuzziness parameter (generally taken as 2), x_k is the data point, and the sums run over all data points k.
Here,
V11 = (0.8^2 * 1 + 0.7^2 * 2 + 0.2^2 * 4 + 0.1^2 * 7) / (0.8^2 + 0.7^2 + 0.2^2 + 0.1^2) = 1.568
V12 = (0.8^2 * 3 + 0.7^2 * 5 + 0.2^2 * 8 + 0.1^2 * 9) / (0.8^2 + 0.7^2 + 0.2^2 + 0.1^2) = 4.051
V21 = (0.2^2 * 1 + 0.3^2 * 2 + 0.8^2 * 4 + 0.9^2 * 7) / (0.2^2 + 0.3^2 + 0.8^2 + 0.9^2) = 5.35
V22 = (0.2^2 * 3 + 0.3^2 * 5 + 0.8^2 * 8 + 0.9^2 * 9) / (0.2^2 + 0.3^2 + 0.8^2 + 0.9^2) = 8.215
Centroids are: (1.568, 4.051) and (5.35, 8.215)
Step 3: Find out the distance of each point from the centroid.
D11 = ((1 - 1.568)^2 + (3 - 4.051)^2)^0.5 = 1.2
D12 = ((1 - 5.35)^2 + (3 - 8.215)^2)^0.5 = 6.79
Similarly, the distance of all other points is computed from both the centroids.
Step 4: Update the membership values.
The new membership of data point k in cluster i is:
γ_ik = [ Σ_j ( D_ik^2 / D_jk^2 )^(1 / (m - 1)) ]^-1
where D_ik is the distance of data point k from the centroid of cluster i and the sum runs over all clusters j.
For point 1 the new membership values are:
γ_11 = [ { (1.2)^2 / (1.2)^2 + (1.2)^2 / (6.79)^2 } ^ (1 / (2 - 1)) ]^-1 = 0.96
γ_12 = [ { (6.79)^2 / (6.79)^2 + (6.79)^2 / (1.2)^2 } ^ (1 / (2 - 1)) ]^-1 = 0.04
Alternatively, since the memberships of a point across all clusters sum to 1, γ_12 = 1 - γ_11 = 1 - 0.96 = 0.04.
Similarly, compute all other membership values, and update the matrix.
Step 5: Repeat steps 2-4 until the membership values become constant, or until the change between two consecutive updates is smaller than a tolerance value (a small value up to which a difference in the membership values of two consecutive updates is accepted).
Step 6: Defuzzify the obtained membership values.
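To make the steps concrete, here is a minimal NumPy sketch (my own illustration, not from the original text) that runs Steps 2-5 on the example data, assuming 2 clusters, fuzziness m = 2, the initial memberships from Step 1, and no guard for a point coinciding exactly with a centroid:

import numpy as np

# Example data points (n_points, n_dims) and initial memberships (clusters, n_points) from Step 1
X = np.array([[1, 3], [2, 5], [4, 8], [7, 9]], dtype=float)
gamma = np.array([[0.8, 0.7, 0.2, 0.1],
                  [0.2, 0.3, 0.8, 0.9]])
m = 2  # fuzziness parameter

for _ in range(100):                                    # Step 5: repeat until convergence
    w = gamma ** m                                      # membership weights
    centroids = (w @ X) / w.sum(axis=1, keepdims=True)  # Step 2: weighted centroids
    # Step 3: distance of every point from every centroid, shape (clusters, n_points)
    dist = np.linalg.norm(X[None, :, :] - centroids[:, None, :], axis=2)
    # Step 4: gamma_ik = [ sum_j (D_ik^2 / D_jk^2)^(1/(m-1)) ]^-1
    new_gamma = 1.0 / ((dist[:, None, :]**2 / dist[None, :, :]**2) ** (1 / (m - 1))).sum(axis=1)
    if np.abs(new_gamma - gamma).max() < 1e-6:          # tolerance check
        gamma = new_gamma
        break
    gamma = new_gamma

print(np.round(centroids, 3))  # the first iteration matches the worked values (1.568, 4.051) and (5.35, 8.215)
print(np.round(gamma, 2))      # converged membership matrix
print(gamma.argmax(axis=0))    # Step 6: defuzzify by taking the highest-membership cluster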
Implementation: The scikit-fuzzy library provides a pre-defined function for fuzzy c-means that can be used in Python. For using fuzzy c-means you need to install the scikit-fuzzy package (imported as skfuzzy):
pip install scikit-learn
pip install scikit-fuzzy
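A minimal usage sketch (my own illustration, assuming the skfuzzy.cluster.cmeans interface, which expects the data array with shape (n_features, n_samples)):

import numpy as np
import skfuzzy as fuzz

# Same example data, transposed to (n_features, n_samples) as skfuzzy expects
data = np.array([[1, 3], [2, 5], [4, 8], [7, 9]], dtype=float).T

# c = number of clusters, m = fuzziness, error = tolerance, maxiter = iteration cap
cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(data, c=2, m=2, error=1e-5, maxiter=100, seed=0)

print(cntr)                # cluster centroids
print(np.round(u, 2))      # fuzzy membership matrix, shape (c, n_samples)
print(u.argmax(axis=0))    # defuzzified hard labels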

OPTICS Clustering Explanation


OPTICS stands for Ordering Points To Identify the Clustering Structure. It draws inspiration from the DBSCAN clustering algorithm and adds two more terms to the concepts of DBSCAN clustering, which are defined after the overview below.
OPTICS (Ordering Points To Identify the Clustering Structure) is a density-based clustering algorithm, similar
to DBSCAN (Density-Based Spatial Clustering of Applications with Noise), but it can extract clusters of
varying densities and shapes. It is useful for identifying clusters of different densities in large, high-
dimensional datasets.
The main idea behind OPTICS is to extract the clustering structure of a dataset by identifying the density-
connected points. The algorithm builds a density-based representation of the data by creating an ordered list
of points called the reachability plot. Each point in the list is associated with a reachability distance, which is
a measure of how easy it is to reach that point from other points in the dataset. Points with similar
reachability distances are likely to be in the same cluster.
The OPTICS algorithm follows these main steps:
1. Define a density threshold parameter, Eps, which controls the minimum density of clusters.
2. For each point in the dataset, calculate the distance to its k-nearest neighbors.
3. Starting with an arbitrary point, calculate the reachability distance of each point in the dataset, based on the density of its neighbors.
4. Order the points based on their reachability distance and create the reachability plot.
5. Extract clusters from the reachability plot by grouping points that are close to each other and have similar reachability distances.
One of the main advantages of OPTICS over DBSCAN is that it does not require a single, fixed neighborhood radius (epsilon) to be chosen in advance; instead, it extracts the clustering structure of the data and produces the reachability plot. This gives the user more flexibility in selecting the number of clusters, by cutting the reachability plot at a certain point.
Also, unlike other density-based clustering algorithms such as DBSCAN, it can handle clusters of different densities and shapes and can identify hierarchical structure.
OPTICS is implemented in Python using the sklearn.cluster.OPTICS class in the scikit-learn library. Its main parameters include the minimum number of neighbors required for a point to be considered a core point (min_samples), an optional maximum neighborhood radius that bounds the neighbor search (max_eps), and the minimum steepness on the reachability plot used to extract clusters (xi). A short usage sketch is given after the definitions below.
The two terms added to the concepts of DBSCAN are:
1. Core Distance: It is the minimum value of radius required to classify a given point as a core point. If the given point is not a core point, then its core distance is undefined.
2. Reachability Distance: It is defined for a point p with respect to another data point q. The reachability distance of p from q is the maximum of the core distance of q and the Euclidean distance (or some other distance metric) between p and q. Note that the reachability distance is not defined if q is not a core point.

This clustering technique is different from other clustering techniques in the sense that this technique does
not explicitly segment the data into clusters. Instead, it produces a visualization of Reachability distances
and uses this visualization to cluster the data.
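As a rough illustration (not part of the original text), here is a minimal scikit-learn sketch, assuming synthetic data generated with make_blobs; the reachability_ and ordering_ attributes hold the reachability plot described above:

import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

# Synthetic data with clusters of different densities (illustrative only)
X, _ = make_blobs(n_samples=300, centers=[(0, 0), (5, 5), (10, 0)],
                  cluster_std=[0.4, 1.0, 2.0], random_state=42)

# min_samples: neighbors needed for a core point; xi: steepness threshold for cluster extraction
optics = OPTICS(min_samples=10, xi=0.05, min_cluster_size=0.05)
optics.fit(X)

# Reachability distances in processing order, i.e. the reachability plot
reachability = optics.reachability_[optics.ordering_]
print(np.round(reachability[:10], 3))

# Cluster labels extracted from the reachability plot (-1 marks points left unassigned)
print(np.unique(optics.labels_))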

OPTICS Clustering v/s DBSCAN Clustering:


1. Memory Cost : The OPTICS clustering technique requires more memory as it maintains a priority queue
(Min Heap) to determine the next data point which is closest to the point currently being processed in
terms of Reachability Distance. It also requires more computational power because the nearest neighbour
queries are more complicated than radius queries in DBSCAN.
2. Fewer Parameters : The OPTICS clustering technique does not need the epsilon parameter; when one is supplied, it serves only to reduce the running time. This leads to a reduction of the analytical process of parameter tuning.
3. This technique does not segregate the given data into clusters. It merely produces a Reachability
distance plot and it is upon the interpretation of the programmer to cluster the points accordingly.
4. Handling varying densities: DBSCAN clustering can struggle to handle datasets with varying densities,
as it requires a single value of epsilon to define the neighborhood size for all points. In contrast, OPTICS
can handle varying densities by using the concept of reachability distance, which adapts to the local
density of the data. This means that OPTICS can identify clusters of different sizes and shapes more
effectively than DBSCAN in datasets with varying densities.
5. Cluster extraction: While both OPTICS and DBSCAN can identify clusters, OPTICS produces a reachability distance plot that can be used to extract clusters at different levels of granularity (see the sketch after this list). This allows for more flexible clustering and can reveal clusters that may not be apparent with a fixed epsilon value in DBSCAN. However, this also requires more manual interpretation and decision-making on the part of the programmer.
6. Noise handling: DBSCAN explicitly distinguishes between core points, boundary points, and noise
points, while OPTICS does not explicitly identify noise points. Instead, points with high reachability
distances can be considered as potential noise points. However, this also means that OPTICS may be
less effective at identifying small clusters that are surrounded by noise points, as these clusters may be
merged with the noise points in the reachability distance plot.
7. Runtime complexity: The runtime complexity of OPTICS is generally higher than that of DBSCAN, due
to the use of a priority queue to maintain the reachability distances. However, recent research has
proposed optimizations to reduce the computational complexity of OPTICS, making it more scalable for
large datasets.
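To illustrate the cluster-extraction point above, here is a small sketch (my own, assuming the sklearn.cluster.cluster_optics_dbscan helper) that extracts DBSCAN-style clusterings at two different epsilon cuts from the single OPTICS run fitted in the earlier sketch:

from sklearn.cluster import cluster_optics_dbscan

# Reuse the fitted `optics` model from the previous sketch and cut at two eps values
for eps in (0.5, 2.0):
    labels = cluster_optics_dbscan(
        reachability=optics.reachability_,
        core_distances=optics.core_distances_,
        ordering=optics.ordering_,
        eps=eps,
    )
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    print(f"eps={eps}: {n_clusters} clusters, {list(labels).count(-1)} noise points")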

Difference Between Agglomerative Clustering and Divisive Clustering

Hierarchical clustering is a popular unsupervised machine learning technique used to group similar data points into clusters based on their similarity or dissimilarity. It is called “hierarchical” because it creates a tree-like hierarchy of clusters, where each node represents a cluster that can be further divided into smaller sub-clusters.
There are two types of hierarchical clustering techniques:
1. Agglomerative and
2. Divisive clustering

Agglomerative Clustering

Agglomerative clustering is a type of hierarchical clustering algorithm that starts with each data point as its own cluster and then iteratively merges the most similar pairs of data points or clusters, building a hierarchy of clusters until all the data points belong to a single cluster.

Divisive Clustering

Divisive clustering is the technique that starts with all data points in a single cluster and recursively splits the clusters into smaller sub-clusters based on their dissimilarity. It is also known as “top-down” clustering.
Unlike agglomerative clustering, which starts with each data point as its own cluster and iteratively merges the most similar pairs of clusters, divisive clustering is a “divide and conquer” approach that breaks a large cluster into smaller sub-clusters.
Difference between agglomerative clustering and divisive clustering:
1. Category: Agglomerative clustering is a bottom-up approach; divisive clustering is a top-down approach.
2. Approach: In agglomerative clustering, each data point starts in its own cluster, and the algorithm recursively merges the closest pairs of clusters until a single cluster containing all the data points is obtained. In divisive clustering, all data points start in a single cluster, and the algorithm recursively splits the cluster into smaller sub-clusters until each data point is in its own cluster.
3. Complexity level: Agglomerative clustering is generally more computationally expensive, especially for large datasets, as this approach requires the calculation of all pairwise distances between data points. Divisive clustering is comparatively less expensive, as it only requires the calculation of distances between sub-clusters, which can reduce the computational burden.
4. Outliers: Agglomerative clustering can handle outliers better than divisive clustering, since outliers can be absorbed into larger clusters. Divisive clustering may create sub-clusters around outliers, leading to suboptimal clustering results.
5. Interpretability: Agglomerative clustering tends to produce more interpretable results, since the dendrogram shows the merging process of the clusters and the user can choose the number of clusters based on the desired level of granularity. Divisive clustering can be more difficult to interpret, since the dendrogram shows the splitting process of the clusters and the user must choose a stopping criterion to determine the number of clusters.
6. Implementation: Scikit-learn provides multiple linkage methods for agglomerative clustering, such as “ward,” “complete,” “average,” and “single.” Divisive clustering is not currently implemented in scikit-learn.
7. Example: Agglomerative clustering is used in image segmentation, customer segmentation, social network analysis, document clustering, genetics, genomics, and many more. Divisive clustering is used in market segmentation, anomaly detection, biological classification, natural language processing, etc.
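As a brief illustration of the implementation point above, here is a minimal sketch (my own, assuming synthetic data from make_blobs) using scikit-learn's AgglomerativeClustering with the “ward” linkage; the other linkage options (“complete,” “average,” “single”) can be swapped in via the linkage parameter:

from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Synthetic data for illustration only
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.8, random_state=7)

# Bottom-up (agglomerative) clustering; linkage can be "ward", "complete", "average", or "single"
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = agg.fit_predict(X)

print(labels[:10])       # cluster label of the first ten points
print(agg.n_clusters_)   # number of clusters found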
