DEU CSC5045 Intelligent System Applications Using Fuzzy - 4: Clustering

The document discusses different clustering techniques, including hierarchical clustering, k-means clustering, and measures of similarity used in clustering such as distance functions and metrics. It provides examples of calculating distances between data points and categorical variables, and describes hierarchical, partitioning, density-based, grid-based, and model-based clustering methods.


Clustering

Measure of Similarity,
Hierarchical Clustering,
K-means Clustering

References:
Han, J., Kamber, M., Pei, J. (2011). Data Mining: Concepts and Techniques.
Larose, D. T. (2005). Discovering Knowledge in Data: An Introduction to Data Mining.
Tan, P., Steinbach, M., Kumar, V. (2006). Introduction to Data Mining.
Bramer, M. (2007). Principles of Data Mining.
Birant, D. (2012). Lecture Notes.
Vahaplar, A. (2012). Lecture Notes.
Clustering
• Clustering is the process of grouping a set of physical or abstract
unlabelled objects into classes of similar objects.
• A Cluster is a collection of data objects that are similar to one
another within the same cluster, and dissimilar to the objects in
other clusters.
• Clustering is an important human activity:
o Distinguishing animals and plants, male and female, cars and buses, etc.

• Goals:
o Detecting natural groups in data,
o Creating homogeneous classes,
o Data reduction, outlier detection.
Clustering
• Measuring similarity or measuring dissimilarity?
• A distance measure d(x, y) used to calculate the difference between two objects should have the properties:
1. d(x, y) ≥ 0 for all x and y, and d(x, y) = 0 only if x = y. (Positive definiteness)
2. d(x, y) = d(y, x) for all x and y. (Symmetry)
3. d(x, z) ≤ d(x, y) + d(y, z) for all points x, y, and z. (Triangle Inequality)
Clustering
• Distance Functions
• Euclidean Distance: $d_{Euc}(x, y) = \sqrt{\sum_i (x_i - y_i)^2}$
• Manhattan (City Block) Distance: $d_{Man}(x, y) = \sum_i |x_i - y_i|$
• Minkowski Distance: $d_{Min}(x, y) = \left( \sum_i |x_i - y_i|^p \right)^{1/p}$
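A minimal Python sketch of these three distance functions (illustrative only, not part of the original slides; the value of p for the Minkowski distance is chosen arbitrarily here):

```python
# Sketch of the three distance functions, using only the standard library.

def euclidean(x, y):
    # d_Euc(x, y) = sqrt(sum_i (x_i - y_i)^2)
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

def manhattan(x, y):
    # d_Man(x, y) = sum_i |x_i - y_i|
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def minkowski(x, y, p=3):
    # d_Min(x, y) = (sum_i |x_i - y_i|^p)^(1/p); p = 2 gives Euclidean, p = 1 Manhattan
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)

print(euclidean((0, 2), (2, 0)))       # 2.828...
print(manhattan((0, 2), (2, 0)))       # 4
print(minkowski((0, 2), (2, 0), p=2))  # same as Euclidean: 2.828...
```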
Clustering
• Distance Measure
• Example:

point   x   y
p1      0   2
p2      2   0
p3      3   1
p4      5   1

[Figure: the four points plotted in the (x, y) plane.]

Euclidean Distance Matrix:
      p1      p2      p3      p4
p1    0       2.828   3.162   5.099
p2    2.828   0       1.414   3.162
p3    3.162   1.414   0       2
p4    5.099   3.162   2       0

Manhattan Distance Matrix:
      p1   p2   p3   p4
p1    0    4    4    6
p2    4    0    2    4
p3    4    2    0    2
p4    6    4    2    0
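The two matrices can be reproduced with SciPy's pairwise-distance routine; the snippet below is a sketch assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.spatial.distance import cdist

points = np.array([[0, 2],   # p1
                   [2, 0],   # p2
                   [3, 1],   # p3
                   [5, 1]])  # p4

print(np.round(cdist(points, points, metric="euclidean"), 3))  # Euclidean distance matrix
print(cdist(points, points, metric="cityblock"))               # Manhattan distance matrix
```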


Clustering
• Problems in distance measure
o Different ranges in data
• Normalization (min-max, Z-score, etc)
o Categorical variables
Clustering
• Find the distance between Ali and Ayşe, Ali and Veli, and Ayşe and Veli.

Name   Age   Weight   Eye colour
Ali    22    65       Black
Ayşe   19    52       Hazel
Veli   23    60       Black

Variable   Age   Weight
Min        18    50
Max        30    85
Clustering
• Find the distance between Ali and Ayşe, Ali and Veli, and Ayşe and Veli (min-max normalized values in parentheses).

Name   Age         Weight      Eye colour
Ali    22 (0.33)   65 (0.43)   Black
Ayşe   19 (0.08)   52 (0.06)   Hazel
Veli   23 (0.42)   60 (0.29)   Black

Variable   Age   Weight
Min        18    50
Max        30    85

d      Ali     Ayşe    Veli
Ali    0       1.096   0.165
Ayşe   1.096   0       1.079
Veli   0.165   1.079   0
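A sketch of how these distances can be reproduced in Python. The reading of the slide assumed here: numeric attributes are min-max normalized, the eye-colour attribute contributes 0 when equal and 1 when different, and the components are combined with Euclidean distance; the English attribute values are translations of the Turkish originals:

```python
def min_max(value, lo, hi):
    # Min-max normalization to [0, 1]
    return (value - lo) / (hi - lo)

people = {
    "Ali":  (22, 65, "Black"),
    "Ayse": (19, 52, "Hazel"),
    "Veli": (23, 60, "Black"),
}
AGE_MIN, AGE_MAX = 18, 30
WEIGHT_MIN, WEIGHT_MAX = 50, 85

def distance(a, b):
    age_a, w_a, eye_a = people[a]
    age_b, w_b, eye_b = people[b]
    d_age = min_max(age_a, AGE_MIN, AGE_MAX) - min_max(age_b, AGE_MIN, AGE_MAX)
    d_w   = min_max(w_a, WEIGHT_MIN, WEIGHT_MAX) - min_max(w_b, WEIGHT_MIN, WEIGHT_MAX)
    d_eye = 0.0 if eye_a == eye_b else 1.0      # simple mismatch for the categorical attribute
    return (d_age ** 2 + d_w ** 2 + d_eye ** 2) ** 0.5

print(round(distance("Ali", "Ayse"), 3))   # ~1.096
print(round(distance("Ali", "Veli"), 3))   # ~0.165
print(round(distance("Ayse", "Veli"), 3))  # ~1.079
```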
Clustering
• Distance measure for categorical variables
• Binary data (0/1, presence/absence, Yes/No)
• Jaccard's Distance

Contingency table for objects i and j:

                    Object j
             1      0      sum
Object i  1  a      b      a + b
          0  c      d      c + d
        sum  a + c  b + d  p

d(i, j) = (b + c) / (a + b + c)
Example for Clustering Categorical Data

• Find the Jaccard's distance between Apple and Banana.

Feature of Fruit    Sphere shape   Sweet   Sour   Crunchy
Object i = Apple    Yes            Yes     Yes    Yes
Object j = Banana   No             Yes     No     No

(a = 1, b = 3, c = 0, d = 0)

d(i, j) = (b + c) / (a + b + c) = (3 + 0) / (1 + 3 + 0) = 3/4 = 0.75
Example for Clustering Categorical Data

• Who are the most likely to have a similar disease?


Name Fever Cough Test-1 Test-2 Test-3 Test-4
Jack Y N P N N N
Mary Y N P N P N
Jim Y P N N N N
Let the values Y and P be set to 1, and the value N be set to 0

Binary vectors: Jack = (1, 0, 1, 0, 0, 0), Mary = (1, 0, 1, 0, 1, 0), Jim = (1, 1, 0, 0, 0, 0).

Using d(i, j) = (b + c) / (a + b + c):

d(Jack, Mary) = (0 + 1) / (2 + 0 + 1) = 0.33
d(Jack, Jim)  = (1 + 1) / (1 + 1 + 1) = 0.67
d(Jim, Mary)  = (1 + 2) / (1 + 1 + 2) = 0.75

Result: Jim and Mary are unlikely to have a similar disease.

Jack and Mary are the most likely to have a similar disease.
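A short sketch that reproduces these Jaccard distances from the binary vectors (Y and P mapped to 1, N mapped to 0):

```python
def jaccard_distance(i, j):
    # d(i, j) = (b + c) / (a + b + c), ignoring 0/0 matches (d)
    a = sum(1 for x, y in zip(i, j) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(i, j) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(i, j) if x == 0 and y == 1)
    return (b + c) / (a + b + c)

jack = [1, 0, 1, 0, 0, 0]
mary = [1, 0, 1, 0, 1, 0]
jim  = [1, 1, 0, 0, 0, 0]

print(round(jaccard_distance(jack, mary), 2))  # 0.33
print(round(jaccard_distance(jack, jim), 2))   # 0.67
print(round(jaccard_distance(jim, mary), 2))   # 0.75
```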
Clustering Methods
• Hierarchical Methods
o AGNES, DIANA, BIRCH, Fuzzy Joint Points (FJP), ...

• Partitioning Methods
o K-Means, K-Medoids, Fuzzy c-Means, ...

• Density-Based Methods
o DBSCAN, OPTICS, Fuzzy Joint Points (FJP), ...

• Grid-Based Methods
o STING, WaveCluster, CLIQUE ...

• Model-Based Methods
o COBWEB, CLASSIT, SOM (Self-Organizing Feature Maps) ...
Hierarchical Clustering
• A tree-like cluster structure (dendrogram)
• Agglomerative methods
o Each item is a tiny cluster of its own at the beginning,
o The two closest clusters are merged at each step,
o At the end, all items are in one cluster.

• Divisive methods
o All items are in one cluster at the beginning,
o The most dissimilar cluster is separated at each step,
o At the end, each record represents its own cluster.
Hierarchical Clustering
• Measuring distance between clusters in Hierarchical Clustering
• Single linkage,
o the nearest-neighbor approach,
o based on the minimum distance between any record in two clusters

• Complete linkage,
o the farthest-neighbor approach,
o based on the maximum distance between any record in two clusters.

• Average linkage
o is designed to reduce the dependence of the cluster-linkage criterion on extreme values, such as the most similar or dissimilar records.
o The criterion is the average distance of all the records in cluster A from all the records in cluster B.
Hierarchical Clustering
• Single link: smallest distance between an element in one cluster and an element in the other.
• Complete link: largest distance between an element in one cluster and an element in the other.
• Average link: average distance between an element in one cluster and an element in the other.
Single-Linkage Clustering - Example
Dataset: 2,5,9,15,16,18,25,33,33,45
Complete-Linkage Clustering - Example
• Dataset: 2,5,9,15,16,18,25,33,33,45
Average-Linkage Clustering - Example
• Dataset: 2, 5, 9, 15, 16, 18, 25, 33, 33, 45

The average distance between clusters A and B:
$d_{avg}(A, B) = \frac{\sum_{x \in A} \sum_{y \in B} d(x, y)}{|A| \cdot |B|}$
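A sketch of all three linkage strategies applied to this dataset using SciPy's hierarchical-clustering routines (assumes SciPy is installed; cutting the tree into 3 clusters is an arbitrary choice for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([2, 5, 9, 15, 16, 18, 25, 33, 33, 45], dtype=float).reshape(-1, 1)

for method in ("single", "complete", "average"):
    Z = linkage(data, method=method)                  # agglomerative merge history (the dendrogram)
    labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 clusters
    print(method, labels)
```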
How the Clusters are Merged?

[Figure: nested cluster views and dendrograms of the same six points (1-6) under single link, complete link, and average link; the vertical axis of each dendrogram shows the distance at which clusters are merged.]
Hierarchical Clustering
• Single Linkage
o Can handle non-elliptical shapes
o Sensitive to noise and outliers

• Complete Linkage
o Less sensitive to noise and outliers
o Tends to break large clusters and to form more compact, globular clusters

• Average Linkage
o Less sensitive to noise and outliers
o Tends to form more compact, globular clusters (similar to complete
linkage)
Hierarchical Clustering
• Advantages
o Does not require the number of clusters
o Easy to implement
o Fast and less complex

• Disadvantages
o Need to know where to cut the tree
o Sensitivity to noise and outliers
o Difficulty handling different sized clusters and convex shapes
o Tend to break large clusters
Partition Based Clustering
• Aims to construct a partition of a database D of n objects into a set
of k clusters such that the sum of squared distances is minimized.
• Given a k, find a partition of k clusters that optimizes the chosen
partitioning criterion e.g. minimize SSE.
Partition Based Clustering
• Within-Cluster Variation (WCV):

$SSE = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, c_i)^2$

• Between-Cluster Variation (BCV), e.g. for two clusters:

BCV = d(c1, c2)

• Maximize the between-cluster variation with respect to the within-cluster variation:

$\frac{BCV}{WCV} = \frac{d(c_1, c_2)}{SSE}$
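A small sketch of these two quantities in Python (the toy points, labels, and centers below are made up for illustration):

```python
import numpy as np

def sse(points, labels, centers):
    # SSE = sum over clusters i, sum over p in C_i, of d(p, c_i)^2
    return sum(np.sum((points[labels == i] - c) ** 2) for i, c in enumerate(centers))

def bcv_over_wcv(points, labels, centers):
    bcv = np.linalg.norm(centers[0] - centers[1])  # BCV = d(c1, c2) for two clusters
    return bcv / sse(points, labels, centers)

pts = np.array([[1., 1.], [1., 2.], [5., 5.], [6., 5.]])
lab = np.array([0, 0, 1, 1])
cen = np.array([[1., 1.5], [5.5, 5.]])
print(sse(pts, lab, cen))                      # 1.0
print(round(bcv_over_wcv(pts, lab, cen), 2))   # ~5.7
```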
Partition Based Clustering

• k-means Clustering
• is an algorithm to cluster n objects, based on their attributes, into k partitions, k < n.
• Step 1: Ask for k,
• Step 2: Randomly assign k points as the initial cluster centers,
• Step 3: For each data point, find the nearest cluster center and assign the point to that cluster,
• Step 4: For each of the k clusters, find the new cluster center,
• Step 5: Repeat Steps 3-4 until
o Centers do not move,
o No data point changes cluster,
o The desired SSE is obtained.
• Step 1: let k be 2.
• Step 2: Randomly assign initial cluster centers; let c1 = (1, 1) and c2 = (2, 1).
• Step 3 (first pass): for each record, find the nearest cluster center (c1 = (1, 1), c2 = (2, 1)).

[Table: distance of each data point to c1 and c2, and the resulting cluster assignment.]

$SSE = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, c_i)^2 = 2^2 + 2.24^2 + 2.83^2 + 3.61^2 + 1^2 + 2.24^2 + 0^2 + 0^2 = 36.08$

$\frac{BCV}{WCV} = \frac{d(c_1, c_2)}{SSE} = \frac{1}{36.08} \approx 0.0278$

• We expect this ratio to increase with successive passes.
• Step 4: For each of the k clusters, find the cluster centroid and update the location of each cluster center to the new value of the centroid.

new c1 = ((1 + 1 + 1)/3, (3 + 2 + 1)/3) = (1, 2)
new c2 = ((3 + 4 + 5 + 4 + 2)/5, (3 + 3 + 3 + 2 + 1)/5) = (3.6, 2.4)
• Step 5: repeat Steps 3 and 4 until convergence.
• Step 3 (second pass): update cluster centers to c1 = (1, 2) and c2 = (3.6, 2.4). Calculate the distances between each point and the updated cluster centers.

[Table: distance of each data point to c1 and c2, and the resulting cluster assignment.]

$SSE = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, c_i)^2 = 1^2 + 0.85^2 + 0.72^2 + 1.52^2 + 0^2 + 0.57^2 + 1^2 + 1.41^2 = 7.88$

$\frac{BCV}{WCV} = \frac{d(c_1, c_2)}{SSE} = \frac{2.63}{7.88} \approx 0.3338$
• Step 4 (second pass): For each of the k clusters, find the cluster centroid and update the location of each cluster center to the new value of the centroid.

new c1 = ((1 + 1 + 1 + 2)/4, (3 + 2 + 1 + 1)/4) = (1.25, 1.75)
new c2 = ((3 + 4 + 5 + 4)/4, (3 + 3 + 3 + 2)/4) = (4, 2.75)

• Step 5: repeat Steps 3 and 4 until convergence.
• Step 3 (third pass): update cluster centers to c1 = (1.25, 1.75) and c2 = (4, 2.75). Calculate the distances between each point and the updated cluster centers.

[Table: distance of each data point to c1 and c2, and the resulting cluster assignment.]

$SSE = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, c_i)^2 = 6.25$

$\frac{BCV}{WCV} = \frac{d(c_1, c_2)}{SSE} = \frac{2.93}{6.25} \approx 0.4688$
• Step 4 (third pass): For each of the k clusters, find the cluster centroid and update the location of each cluster center to the new value of the centroid. Since no records have shifted cluster membership, the cluster centroids also remain unchanged.
• Step 5: Repeat Steps 3 and 4 until convergence or termination. Since the centroids remain unchanged, the algorithm terminates.
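The worked example can be reproduced with a few lines of NumPy. The eight data points below are not listed explicitly on the slides; they are inferred from the centroid sums above, so treat them as an assumption:

```python
import numpy as np

# Points inferred from the centroid computations in the worked example (assumption).
X = np.array([[1., 3.], [1., 2.], [1., 1.], [2., 1.],
              [3., 3.], [4., 3.], [5., 3.], [4., 2.]])
centers = np.array([[1., 1.], [2., 1.]])   # initial centers from Step 2

for _ in range(10):
    # Step 3: assign each point to the nearest center
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Step 4: recompute each center as the mean of its cluster
    new_centers = np.array([X[labels == j].mean(axis=0) for j in range(2)])
    if np.allclose(new_centers, centers):   # Step 5: stop when the centers stop moving
        break
    centers = new_centers

print(centers)                      # ~[[1.25, 1.75], [4.0, 2.75]]
print((d.min(axis=1) ** 2).sum())   # final SSE ~6.25
```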
K-means example, step 1
• Pick 3 initial cluster centers k1, k2, k3 (randomly).
[Figure: data points in the (X, Y) plane with the three initial centers; the same plot is updated in the following steps.]

K-means example, step 2
• Assign each point to the closest cluster center.

K-means example, step 3
• Move each cluster center to the mean of each cluster.

K-means example, step 4
• Reassign the points that are now closest to a different cluster center.
• Q: Which points are reassigned?

K-means example, step 4 (continued)
• A: three points are reassigned.

K-means example, step 4b
• Re-compute the cluster means.

K-means example, step 5
• Move the cluster centers to the cluster means.
k-means Clustering
• Strength:
o Relatively efficient and fast: O(tkn)
o Easy to understand
o Often terminates at a local optimum

• Weakness
o Applicable only when mean is defined, then what about categorical data?
o Need to specify k, the number of clusters, in advance
o Unable to handle noisy data and outliers
o Not suitable to discover clusters with non-convex shapes
o Result can vary significantly depending on initial choice of centroids
o Total steps can vary depending on initial choice of centroids
k-means Clustering types
• Alternatives
• K-medians – instead of the mean, use the median of each cluster
o Mean of 1, 3, 5, 7, 1009 is 205
o Median of 1, 3, 5, 7, 1009 is 5

• K-modes – to cluster categorical data by using modes instead of means for clusters.
• K-medoids
o A medoid can be defined as the object of a cluster whose average dissimilarity to all the objects in the cluster is minimal (see the sketch after this list).
o PAM (Partitioning Around Medoids) Algorithm

• Fuzzy c-means
o a method of clustering which allows one piece of data to belong to two or more clusters.
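A minimal sketch of the medoid idea (the cluster points are made up; note how the outlier pulls the mean but not the medoid):

```python
import numpy as np

def medoid(points):
    # Pairwise Euclidean distances; the medoid is the row with the smallest total distance.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return points[d.sum(axis=1).argmin()]

cluster = np.array([[0., 0.], [1., 0.], [2., 0.], [10., 0.]])
print(cluster.mean(axis=0))  # [3.25 0.]  -- the mean is dragged toward the outlier
print(medoid(cluster))       # [1. 0.]    -- the medoid stays inside the dense region
```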
Fuzzy c-Means Clustering
• Step 1: Ask for k,
• Step 2: Randomly assign k points as the initial cluster centers,
• Step 3: For each data point, find its membership degree to each cluster according to the following formula:

$u_{ij} = \left[ \sum_{k=1}^{K} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{2/(m-1)} \right]^{-1}$

where m > 1, K is the number of clusters, and u_{ij} is the membership degree of x_i to the j-th cluster,
• Step 4: For each of the k clusters, find the new cluster center as the membership-weighted mean of the data points,
• Step 5: Repeat Steps 3-4 until the desired SSE is obtained.
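A compact fuzzy c-means sketch following the steps above (assumes NumPy; the membership and centre updates are the standard FCM formulas, and the sample data and m = 2 are illustrative):

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)]          # Step 2: random initial centers
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Step 3: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
        # Step 4: c_j = sum_i u_ij^m x_i / sum_i u_ij^m  (membership-weighted mean)
        um = u ** m
        new_centers = (um.T @ X) / um.sum(axis=0)[:, None]
        if np.allclose(new_centers, centers):                       # Step 5: stop when converged
            break
        centers = new_centers
    return centers, u

X = np.array([[1., 1.], [1.5, 2.], [3., 4.], [5., 7.], [3.5, 5.], [4.5, 5.], [3.5, 4.5]])
centers, u = fuzzy_c_means(X, c=2)
print(centers)          # two fuzzy cluster centers
print(np.round(u, 2))   # membership of each point in each cluster
```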


Density Based Clustering
• Clustering based on density (local cluster criterion), such as density-
connected points.
• Major features:
o Discover clusters of arbitrary shape
o Handle noise
o One scan
o Need density parameters as termination condition

• Several interesting studies:
o DBSCAN: Ester et al. (KDD’96)
o OPTICS: Ankerst et al. (SIGMOD’99)
o DENCLUE: Hinneburg & Keim (KDD’98)
DBSCAN

• Density-Based Spatial Clustering of Applications with Noise.
– Density = number of points within a specified radius (Eps)
– A point is a core point if it has more than a specified number of points (MinPts) within Eps
• These are points that are at the interior of a cluster
– A border point has fewer than MinPts within Eps, but is in the neighborhood of a core point
– A noise point is any point that is not a core point or a border point.
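A sketch of DBSCAN with scikit-learn, recovering the core/border/noise split described above (assumes scikit-learn is installed; the data, Eps, and MinPts values are illustrative, not the ones from the slides' figures):

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(30, 2)),   # dense blob 1
    rng.normal(loc=(3.0, 3.0), scale=0.3, size=(30, 2)),   # dense blob 2
    [[10.0, 10.0]],                                        # isolated point -> noise
])

db = DBSCAN(eps=0.5, min_samples=4).fit(X)    # Eps and MinPts
labels = db.labels_                           # cluster labels; -1 marks noise points
core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True     # core points
border_mask = (labels != -1) & ~core_mask     # clustered but not core -> border points

print("clusters found:", len(set(labels) - {-1}))
print("core:", core_mask.sum(), "border:", border_mask.sum(), "noise:", int((labels == -1).sum()))
```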
DBSCAN: Core, Border, and Noise Points
[Figure: example points labelled as core, border, and noise.]

DBSCAN Algorithm
• Eliminate noise points
• Perform clustering on the remaining points

DBSCAN: Core, Border and Noise Points
[Figure: original points and their point types (core, border, noise) for Eps = 10, MinPts = 4.]
When DBSCAN Works Well
[Figure: original points and the resulting clusters.]
• Resistant to noise
• Can handle clusters of different shapes and sizes

When DBSCAN Does NOT Work Well
[Figure: original points and the resulting clusters for (MinPts = 4, Eps = 9.75) and (MinPts = 4, Eps = 9.92).]
• Varying densities
• High-dimensional data
Fuzzy Joint Points Clustering (FJP)
[Figure slides: graphical illustration of the Fuzzy Joint Points (FJP) clustering method, including a "max" selection step.]
Model Based Methods
• Attempt to optimize the fit between the given data and some mathematical model
• Use statistical functions
