0% found this document useful (0 votes)

101 views59 pages

UNIT 3-Clustering Metrics

Clustering metrics are quantitative measures to evaluate clustering algorithms, categorized into internal metrics (like Silhouette Score and Davies-Bouldin Index) and external metrics (like Adjusted Rand Index and Normalized Mutual Information). The Silhouette Score assesses cluster quality based on cohesion and separation, while the Davies-Bouldin Index evaluates cluster compactness and separation. The Dunn Index, ARI, and NMI further provide insights into clustering effectiveness, with higher values indicating better clustering performance.

Uploaded by

Priyanshu Gupta

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

101 views59 pages

UNIT 3-Clustering Metrics

Uploaded by

Priyanshu Gupta

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

You are on page 1/ 59

Clustering metrics

Clustering metrics
• Clustering metrics are quantitative measures used to evaluate the quality of
clustering algorithms and the resulting clusters.
• Internal Evaluation Metrics: These metrics evaluate the quality of clusters without
any external reference. They measure the compactness of data points within the
same cluster and the separation between different clusters.
• Such as: Silhouette Score, Davies-Bouldin Index, Calinski-Harabasz Index (Variance Ratio Criterion),
Dunn Index
• External Evaluation Metrics: These metrics require a ground truth or a reference set
of labels to compare the clustering results against. They measure the agreement
between the generated clusters and the true classes in the reference set.
• Such as: Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), Fowlkes-Mallows Index
Silhouette Score
• Silhouette Coefficient or silhouette score is a metric used to
calculate the goodness of a clustering technique. This score is
calculated by measuring each data point’s similarity to the cluster
it belongs to and how different it is from other clusters. The
Silhouette score is commonly used to assess the performance of
clustering algorithms like K-Means.
• Its value ranges from -1 to 1.
• 1: Means clusters are well apart from each other and clearly
distinguished (Best Case)
• 0: Means clusters are indifferent, or we can say that the distance
between clusters is not significant (Overlapping Clusters)
• -1: Means clusters are assigned in the wrong way (Worst Case)
It is computed as:
• The silhouette value is a measure of how similar an
object is to its own cluster (cohesion) compared to other
clusters (separation).
• Silhouette Score = (b-a)/max(a,b)
where
• a= average intra-cluster distance i.e the average
distance between each point within a cluster
• b= average inter-cluster distance i.e the average
distance between all clusters
Example:
Datapoints Cluster Label
A1 C1
A2 C1
A3 C2
A4 C2

Datapoint A1 A2 A3 A4
A1 0 0.10 0.65 0.55
A2 0.10 0 0.70 0.60
A3 0.65 0.70 0 0.30
A4 0.55 0.60 0.30 0
For point A1:
a= 0.1/1=0.1
b= (0.65+0.55)/2=0.6
Silhouette Score = (b-a)/max(a,b)
(0.6-0.1)/ 0.6= 0.833

For point A2:

a= 0.1/1=0.1
b= (0.70+0.60)/2=0.65
Silhouette Score = (b-a)/max(a,b)
(0.65-0.1)/ 0.65= 0.846

For point A3:

a= 0.30/1=0.30
b= (0.65+0.70)/2=0.675
Silhouette Score = (b-a)/max(a,b)
(0.675-0.30)/ 0.675= 0.555

For point A4:

a= 0.30/1=0.30
b= (0.55+0.60)/2=0.575
Silhouette Score = (b-a)/max(a,b)
(0.575-0.30)/ 0.575= 0.478
Point A1 and A2 are lying cluster C1 , so for computing Silhouette
Score for cluster C1,
(0.833+0.846)/2= 1.679/2= 0.839

Point A3 and A4 are lying cluster C2 , so for computing Silhouette

Score for cluster C2,
(0.555+0.478)/2= 1.033/2= 0.5165

Silhouette Score/Coefficient for overall clustering

problem is:
(0.839+0.5165)/2= 0.6775 or 0.678
Davies-Bouldin Index

• The Davies-Bouldin index (DBI) is a metric for assessing

the separation and compactness of clusters.
• It is based on the idea that good clusters are those that
have low within-cluster variation and high between-
cluster separation.
• The minimum score is zero, with lower values indicating
better clustering.
• Lower the DB index value, better is the clustering.
Where,
Example
• Data Point 1: (2, 3)
• Data Point 2: (3, 2)
• Data Point 3: (8, 8)
• Data Point 4: (9, 7)
• Data Point 5: (15, 14)
• Data Point 6: (16, 13)
• Let's say we want to perform K-means clustering on this dataset with
K = 2. After clustering, the data points are divided into two clusters:

• Cluster 1: {Data Point 1, Data Point 2} Cluster 2: {Data Point 3, Data

Point 4, Data Point 5, Data Point 6}

• The centroids of these clusters are approximately:

• Centroid of Cluster 1: (2.5, 2.5) Centroid of Cluster 2: (12, 10.5)

• Now, let's calculate the pairwise distances between the centroids of

each cluster:
• Now, let's calculate the pairwise distances between the centroids of
each cluster:
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import
davies_bouldin_score
data1 = np.array([[6, 8], [9, 5], [5, 4], [2,
6], [5,6], [3,4]])
labels1 = np.array([0, 0, 1, 1, 1, 0])
davies_bouldin_index =
davies_bouldin_score(data1, labels1)
print("Davies-Bouldin Index:",
davies_bouldin_index)
Dunn index:
• A metric for evaluating clustering algorithms, is an internal evaluation scheme,
where the result is based on the clustered data itself.
• Like all other such indices, the aim of this Dunn index to identify sets of clusters
that are compact, with a small variance between members of the cluster, and well
separated, where the means of different clusters are sufficiently far apart, as
compared to the within cluster variance.
• Higher the Dunn index value, better is the clustering.
• The number of clusters that maximizes Dunn index is taken as the optimal number
of clusters k.
Where,
• A higher Dunn Index value indicates better clustering, as it implies
that the clusters are compact and well-separated. Let's go through a
numerical example to illustrate the concept.
• Consider a dataset with 8 data points in a 2D space:

• Data Point 1: (2, 3) Data Point 2: (3, 2) Data Point 3: (4, 3) Data Point
4: (8, 8) Data Point 5: (9, 7) Data Point 6: (10, 8) Data Point 7: (15, 14)
Data Point 8: (16, 13)
• Let's say we want to perform K-means clustering on this dataset with
K = 2. After clustering, the data points are divided into two clusters:
• Cluster 1: {Data Point 1, Data Point 2, Data Point 3} Cluster 2: {Data
Point 4, Data Point 5, Data Point 6, Data Point 7, Data Point 8}
• Now, let's calculate the inter-cluster distances (minimum distance
between any two points from different clusters) and intra-cluster
distances (maximum distance between any two points within the
same cluster):
Example
• Let's work through a numerical example to calculate the Dunn Index for a simple set of clusters.
Suppose you have the following data points and their corresponding clusters:
Data Points:
• A(1, 2)
• B(2, 2)
• C(5, 8)
• D(6, 8)
• E(10, 12)
• F(11, 12)
Clusters:
• Cluster 1: {A, B}
• Cluster 2: {C, D}
• Cluster 3: {E, F}
• Distance between Cluster 1 and Cluster 2 (d(1, 2)):
• d(A, C) = sqrt((1-5)^2 + (2-8)^2) = sqrt(32) = 4√2
• d(A, D) = sqrt((1-6)^2 + (2-8)^2) = sqrt(41) ≈ 6.40
• d(B, C) = sqrt((2-5)^2 + (2-8)^2) = sqrt(29) ≈ 5.39
• d(B, D) = sqrt((2-6)^2 + (2-8)^2) = sqrt(36) = 6
• So, d(1, 2) = 4√2
• Distance between Cluster 1 and Cluster 3 (d(1, 3)):
• d(A, E) = sqrt((1-10)^2 + (2-12)^2) = sqrt(170) ≈ 13.04
• d(A, F) = sqrt((1-11)^2 + (2-12)^2) = sqrt(200) ≈ 14.14
• d(B, E) = sqrt((2-10)^2 + (2-12)^2) = sqrt(164) ≈ 12.81
• d(B, F) = sqrt((2-11)^2 + (2-12)^2) = sqrt(194) ≈ 13.93
• So, d(1, 3) = 12.81
• Distance between Cluster 2 and Cluster 3 (d(2, 3)):
• d(C, E) = sqrt((5-10)^2 + (8-12)^2) = sqrt(20) ≈ 4.47
• d(C, F) = sqrt((5-11)^2 + (8-12)^2) = sqrt(32) ≈ 5.66
• d(D, E) = sqrt((6-10)^2 + (8-12)^2) = sqrt(16) = 4
• d(D, F) = sqrt((6-11)^2 + (8-12)^2) = sqrt(29) ≈ 5.39
• So, d(2, 3) = 4.47
• Now, let's calculate the minimum inter-cluster distance and the
maximum intra-cluster distance:
• Minimum Inter-cluster Distance: min(d(1, 2), d(1, 3), d(2, 3)) =
min(4√2, 12.81, 4.47) = 4√2
• Maximum Intra-cluster Distance: max(max(d(A, B), d(C, D), d(E, F))) =
max(6.40, 6.40, 14.14) = 14.14
• Finally, we can calculate the Dunn Index:
• Dunn Index = min(d(1, 2), d(1, 3), d(2, 3)) / max(max(d(A, B), d(C, D),
d(E, F))) Dunn Index = (4√2) / 14.14 ≈ 0.2828
• So, the Dunn Index for these clusters is approximately 0.2828.
Data Point Coordinates Cluster
1 (2, 3) A
2 (2, 5) A
3 (3, 8) B
4 (6, 5) B
5 (8, 8) C
6 (9, 6) C
7 (10, 2) A
8 (12, 4) A
9 (15, 7) C
10 (17, 5) C
Drawbacks of Dunn index:
• As the number of clusters and dimensionality of the
data increase, the computational cost also increases.
Adjusted Rand Index (ARI)
• It measures the similarity between the true class labels and the
clusters generated by a clustering algorithm while accounting for
chance agreement.
• The ARI produces a score between -1 and 1, where higher values
indicate better agreement between the predicted clusters and the
true labels.
• Contigency table- m x n
• m= number of clusters by c1 algo.
• n= number of clusters by c2 algo (Ground Truth)
To use the Adjusted Rand Index for evaluating clustering, follow these steps:

Perform Clustering: Apply a clustering algorithm to your dataset to create clusters of data points. This
could be an algorithm like k-means, hierarchical clustering, DBSCAN, or any other clustering technique.

Obtain True Class Labels: If you have access to ground truth labels or class assignments for your data,
this is ideal. These true labels represent the actual groups or categories that your data points belong to.
Example
• Suppose you have two clustering results for a dataset with 100 data points:
• Create a contingency table that shows the number of data points in common
between the two clusterings. Rows represent clusters in Clustering A, and
columns represent clusters in Clustering B. The table might look like this:
• | Cluster 1 | Cluster 2 |
• -----------------------------------
• Cluster 1 | 30 | 20 |
• -----------------------------------
• Cluster 2 | 40 | 10 |
• -----------------------------------
• Calculate the sum of squares for the rows and columns of the
contingency table. Let's denote these as SSR (Sum of Squares for
Rows) and SSC (Sum of Squares for Columns):
• SSR = (30^2 + 20^2) + (40^2 + 10^2) = (900 + 400)+(1600+100) = 3000
• SSC = (30^2 + 40^2) + (20^2 + 10^2) = 2500 + 500 = 3000
• Now Calculate ARI = [ (RI - Expected_RI) ] / [ max(RI) - Expected_RI ) ]
• The Rand Index (RI) is calculated as (SSR + SSC) / [2 * (N choose 2)], where N is the total number of data
points (100 in this case).
• Choose (100,2)

• There are 4,950 ways that 2 items chosen from a set of 100 can be combined.
• RI = (3000 + 3000) / [2 * (100 choose 2)] = 6000 / [2 * 4950] ≈ 0.606
• Expected_RI = (SSR * SSC) / [2 * (N choose 2)^2] = (3000 * 3000) / [2 * 4950^2] ≈ 0.183

• ARI = [ (0.606 - 0.183) ] / [ 1 - 0.183 ] ≈ 0.517

• So, the Adjusted Rand Index (ARI) for the given clustering A and B is approximately 0.51. This indicates a
moderate agreement between the two clusterings, where a higher ARI value would indicate better
agreement.
Compute Adjusted rand Index
Normalized Mutual Information
(NMI)
• Normalized Mutual Information (NMI) is a normalization
of the Mutual Information (MI) score to scale the results
between 0 (no mutual information) and 1 (perfect
correlation).
NMI
• NMI is a good measure for determining the
quality of clustering.
• It is an external measure because we need the
class labels of the instances to determine the
NMI.
• Since it’s normalized we can measure and
compare the NMI between different
clusterings having different number of
clusters.
Normalized Mutual
Information
2×
• Normalized Mutual Information:

𝑁𝑀𝐼 𝐼(𝑌;
𝐻 𝑌 +𝐶)
𝐻
𝑌, 𝐶=
•where, 𝐶
1) Y = class labels
2) C = cluster labels
3) H(.) = Entropy
4) I(Y;C) = Mutual Information b/w Y and C Note: All logs are base-2.
Calculating NMI for
Clustering
• Assume m=3 classes and k=2 clusters

Cluster-1 (C=1) Cluster-2 (C=2)

Class-1 (Y=1) Class-2 (Y=2) Class-3 (Y=3)

H(Y) = Entropy of Class
Labels
• P(Y=1) = 5/20 = ¼
• P(Y=2) = 10/20 = ½
• P(Y=3) = 5/20 =¼
=
1 1 1
• H(Y) = − 41
4
−41
4
−21
2
log log log 1.5
This is calculated for the entire dataset and can be
calculated prior to clustering, as it will not change
depending on the clustering output.
H(C) = Entropy of Cluster
Labels
• P(C=1) = 10/20 = 1/2
• P(C=2) = 10/20 = ½
1
1
− 1 1
=
log 2
• H(Y) =− 2
log 2 2 1
This will be calculated every time the clustering
changes. You can see from the figure that the
clusters are balanced (have equal number of
instances).
I(Y;C)= Mutual
Information
• Mutual information is given as:
–𝐼 𝑌; 𝐶= 𝐻 𝑌 − 𝐻 𝑌 𝐶
– We already know H(Y)
– H(Y|C) is the entropy of class labels within each
cluster, how do we calculate this??

Mutual Information tells us the reduction in the

entropy of class labels that we get if we know the
cluster labels. (Similar to Information gain in
deicison trees)
H(Y|C): conditional entropy of class
labels for clustering C
• Consider Cluster-1:
– P(Y=1|C=1)=3/10 (three triangles in cluster-1)
– P(Y=2|C=1)=3/10 (three rectangles in cluster-1)
– P(Y=3|C=1)=4/10 (four stars in cluster-1)
– Calculate conditional entropy as:

𝐻 𝑌 𝐶 = −𝑃(𝐶 = ∑ 𝑃 𝑌 = 𝑦 𝐶 = 1 log(𝑃 𝑌 = 𝑦
=1 1) Y∈ 𝐶 =1
{1,2,3})
+ log(4 )+ log( )] =
1 3 3 3 4
=− 32 × [1 1 1 10 1
log 0 0 0.7855
0 10 0
H(Y|C): conditional entropy of class
labels for clustering C
• Now, consider Cluster-2:
– P(Y=1|C=2)=2/10 (two triangles in cluster-2)
– P(Y=2|C=2)=7/10 (seven rectangles in cluster-2)
– P(Y=3|C=2)=1/10 (one star in cluster-2)
– Calculate conditional entropy as:

𝐻 𝑌 𝐶 = −𝑃(𝐶 = ∑ 𝑃 𝑌 = 𝑦 𝐶 = 2 log(𝑃 𝑌 = 𝑦
=2 2) Y∈ 𝐶 =2
{1,2,3})
+ log(1 )+ log( )] =
1 2 7 7 1
=− 22 × [1 1 1 10 1
log 0 0 0.5784
0 10 0
I(Y;
C)
𝐼 = 𝐻𝑌 − 𝐻 𝑌 𝐶
• Finally the mutual information is:

𝑌; 𝐶 = 1.5 − 0.7855 +
=
0.5784
0.1361

2×
The NMI is therefore,
𝑁𝑀𝐼 𝐼(𝑌; 𝐶)
𝐻𝑌 + 𝐻
𝑌, 𝐶
𝐶
𝑁𝑀𝐼 2 × 0.1361
= 1.5 + =
=
𝑌, 𝐶
1
Calculate NMI for the following
Homogeneity
• A clustering result satisfies homogeneity if all of its
clusters contain only data points which are members of
a single class.
• This metric is independent of the absolute values of the
labels: a permutation of the class or cluster label values
won’t change the score value in any way.
• Syntax : sklearn.metrics.homogeneity_score(labels_tru
e, labels_pred)
• The Metric is not symmetric,
switching label_true with label_pred will return
the completeness_score.
Parameters :
•labels_true:<int array, shape = [n_samples]> : It accept the
ground truth class labels to be used as a reference.
•labels_pred: <array-like of shape (n_samples,)>: It accepts the
cluster labels to evaluate.
Returns:
homogeneity:<float>: Its return the score between 0.0 and 1.0
stands for perfectly homogeneous labeling.
• To calculate homogeneity numerically, you can use the following formula:
Completeness score
• This score is complementary to the previous one. Its
purpose is to provide a piece of information about the
assignment of samples belonging to the same class.
• More precisely, a good clustering algorithm should
assign all samples with the same true label to the same
cluster.
Completeness portrays the closeness of the clustering algorithm to this
(completeness_score) perfection.
This metric is autonomous of the outright values of the labels. A permutation of the
cluster label values won’t change the score value in any way.

sklearn.metrics.completeness_score()
Syntax: sklearn.metrics.completeness_score(labels_true, labels_pred)

•labels_true:<int array, shape = [n_samples]>: It

accepts the ground truth class labels to be used as a
reference.
•labels_pred: <array-like of shape (n_samples,)>: It
accepts the cluster labels to evaluate.
Returns: completeness score between 0.0 and 1.0.
1.0 stands for perfectly completeness labeling.
V-Measure
One of the primary disadvantages of any clustering technique is that it is difficult to
evaluate its performance. To tackle this problem, the metric of V-Measure was
developed. The calculation of the V-Measure first requires the calculation of two
terms:-

1.Homogeneity: A perfectly homogeneous clustering is one where each cluster has

data-points belonging to the same class label. Homogeneity describes the closeness
of the clustering algorithm to this perfection.
2.Completeness: A perfectly complete clustering is one where all data-points
belonging to the same class are clustered into the same cluster. Completeness
describes the closeness of the clustering algorithm to this perfection.

sklearn.metrics.v_measure_score(labels_true, labels_pred, *, beta=1.0)

The V-measure is the harmonic mean between homogeneity and completeness:

v = (1 + beta) * homogeneity * completeness

/ (beta * homogeneity + completeness)
Fowlkes-Mallows Score
• The Fowlkes-Mallows Score is an evaluation metric to evaluate the
similarity among clustering's obtained after applying different
clustering algorithms.
• Although technically it is used to quantify the similarity between two
clustering's, it is typically used to evaluate the clustering performance
of a clustering algorithm by assuming the second clustering to be the
ground-truth i.e. the observed data and assuming it to be the perfect
clustering.
The Fowlkes-Mallows index (FMI) is defined as the geometric mean between of the precision
and recall:

FMI = TP / sqrt((TP + FP) * (TP + FN))

• Where TP is the number of True Positive (i.e. the number of pair of points that belongs in the
same clusters in both labels_true and labels_pred),
• FP is the number of False Positive (i.e. the number of pair of points that belongs in the same
clusters in labels_true and not in labels_pred)
• and FN is the number of False Negative (i.e the number of pair of points that belongs in the
same clusters in labels_pred and not in labels_True).

The score ranges from 0 to 1. A high value indicates a good similarity between two clusters.
Adjusted Mutual Information (AMI)
• Adjusted Mutual Information (AMI) is an adjustment of
the Mutual Information (MI) score to account for chance.
• It accounts for the fact that the MI is generally higher
for two clusterings with a larger number of clusters,
regardless of whether there is actually more information
shared.
• For two clusterings U and V, the AMI is given as:

• AMI(U, V) = [MI(U, V) - E(MI(U, V))] / [avg(H(U), H(V)) - E(MI(U, V))]

Thank you!

Psychopathology An Integrative Approach To Mental Disorders 9th Edition by David H Barlow V Mark Durand Stefan G Hofmann
No ratings yet
Psychopathology An Integrative Approach To Mental Disorders 9th Edition by David H Barlow V Mark Durand Stefan G Hofmann
351 pages
5 - CH 5-K-Means Clustering
No ratings yet
5 - CH 5-K-Means Clustering
54 pages
(E-Book PDF) The Medical Examiner Service A Practical Guide For England and Wales 1st Edition Fast Download
100% (2)
(E-Book PDF) The Medical Examiner Service A Practical Guide For England and Wales 1st Edition Fast Download
15 pages
K - Means Clustering
No ratings yet
K - Means Clustering
34 pages
Unit IV
No ratings yet
Unit IV
51 pages
Unsupervised Learning
No ratings yet
Unsupervised Learning
79 pages
Clustering
No ratings yet
Clustering
55 pages
Mary Nirmala MSN Synopsis
No ratings yet
Mary Nirmala MSN Synopsis
25 pages
Chapter 3 Unsupervised Learning
No ratings yet
Chapter 3 Unsupervised Learning
45 pages
Unit 3
No ratings yet
Unit 3
130 pages
Lecture 1 (UNIT 1)
No ratings yet
Lecture 1 (UNIT 1)
68 pages
3.k-Metoids and Hierarchical Updated
No ratings yet
3.k-Metoids and Hierarchical Updated
50 pages
MachineLearning Unit IV
No ratings yet
MachineLearning Unit IV
51 pages
UNIT 3-Clustering Metrics
No ratings yet
UNIT 3-Clustering Metrics
54 pages
Data Mining - Clustering
No ratings yet
Data Mining - Clustering
90 pages
K Means
No ratings yet
K Means
66 pages
Unit4 Clustering Evaluation
No ratings yet
Unit4 Clustering Evaluation
53 pages
PART2
No ratings yet
PART2
61 pages
Agglomerative Clustering
No ratings yet
Agglomerative Clustering
44 pages
EML %TH Module
No ratings yet
EML %TH Module
40 pages
ML CH 4
No ratings yet
ML CH 4
65 pages
Lecture 18 K Means Clustering
No ratings yet
Lecture 18 K Means Clustering
77 pages
Department of Management Presentation
No ratings yet
Department of Management Presentation
84 pages
Unit 7 Clustering
No ratings yet
Unit 7 Clustering
56 pages
ML - 8
No ratings yet
ML - 8
70 pages
Clustering
No ratings yet
Clustering
125 pages
Clustering Performance Evaluation Metrics1
No ratings yet
Clustering Performance Evaluation Metrics1
19 pages
Clustering
No ratings yet
Clustering
75 pages
AI-AG-Day-2-28th Feb 2023
No ratings yet
AI-AG-Day-2-28th Feb 2023
44 pages
Clustering TNP
No ratings yet
Clustering TNP
53 pages
Clustering - Introduction J Evaluation Metrics
No ratings yet
Clustering - Introduction J Evaluation Metrics
19 pages
M5
No ratings yet
M5
40 pages
Clustering (Introduction, Evaluation Metrics)
No ratings yet
Clustering (Introduction, Evaluation Metrics)
21 pages
CS8091 - Big Data Analytics - Unit 2
No ratings yet
CS8091 - Big Data Analytics - Unit 2
44 pages
Unit - 4 DWDM
No ratings yet
Unit - 4 DWDM
27 pages
Test 1A DF
No ratings yet
Test 1A DF
11 pages
20 - 1 - ML - Unsup - 01 - Partition Based - Kmeans
No ratings yet
20 - 1 - ML - Unsup - 01 - Partition Based - Kmeans
20 pages
Week 9 - Clustering
No ratings yet
Week 9 - Clustering
63 pages
Week 07 Lecture Material
No ratings yet
Week 07 Lecture Material
49 pages
Clustering Part-1
No ratings yet
Clustering Part-1
48 pages
08 K-Means
No ratings yet
08 K-Means
19 pages
Chapter 5 Clustering
No ratings yet
Chapter 5 Clustering
40 pages
Clustering FinancialData
No ratings yet
Clustering FinancialData
38 pages
Unit 4 Machine Learning
No ratings yet
Unit 4 Machine Learning
12 pages
DW&M Unit 3 Part II
No ratings yet
DW&M Unit 3 Part II
50 pages
Navigating Veterinary Practice in The Digital Age: Implementing A Web-Based Information Management System at Animals' Choice Clinic
No ratings yet
Navigating Veterinary Practice in The Digital Age: Implementing A Web-Based Information Management System at Animals' Choice Clinic
9 pages
5 - Clustering
No ratings yet
5 - Clustering
13 pages
AI Chapter 3 Part 5
No ratings yet
AI Chapter 3 Part 5
30 pages
K-Means Clustering
No ratings yet
K-Means Clustering
13 pages
K-Means Clustering
No ratings yet
K-Means Clustering
38 pages
Brochure Digital Showroom Plans
No ratings yet
Brochure Digital Showroom Plans
14 pages
Cluster Validation: Presented By:Rohit Paul
No ratings yet
Cluster Validation: Presented By:Rohit Paul
22 pages
Mod 4 - CLustering
No ratings yet
Mod 4 - CLustering
55 pages
Clustering Analysis
No ratings yet
Clustering Analysis
30 pages
CH-6 DM Clustering
No ratings yet
CH-6 DM Clustering
28 pages
K Means Algorithms
No ratings yet
K Means Algorithms
27 pages
TOPIC 7 Unemployment
No ratings yet
TOPIC 7 Unemployment
13 pages
Lecture 11 K Means Clustering
No ratings yet
Lecture 11 K Means Clustering
8 pages
Ultrasonic Sensors: USA Series US-T50/R25 US-S25AN US-S300 Series US-1AH
No ratings yet
Ultrasonic Sensors: USA Series US-T50/R25 US-S25AN US-S300 Series US-1AH
19 pages
MINERALS
No ratings yet
MINERALS
4 pages
Cluster Analysis
No ratings yet
Cluster Analysis
29 pages
Unit-7 Finalized
No ratings yet
Unit-7 Finalized
20 pages
ML Module 4 2022 1 PDF
No ratings yet
ML Module 4 2022 1 PDF
31 pages
Michael's Resume 2024
No ratings yet
Michael's Resume 2024
3 pages
TC Electronic Hall of Fame Reverb Manual English PDF
No ratings yet
TC Electronic Hall of Fame Reverb Manual English PDF
28 pages
Iphellstar Shirt Hellstar Studios Short Sleeve Tee Shirt6644203228classType VARIANT&From Search
No ratings yet
Iphellstar Shirt Hellstar Studios Short Sleeve Tee Shirt6644203228classType VARIANT&From Search
1 page
Spatial Data Mining: Clustering Techniques
No ratings yet
Spatial Data Mining: Clustering Techniques
56 pages
Calculation of Electrical Induction Near Power Lines
No ratings yet
Calculation of Electrical Induction Near Power Lines
22 pages
SBR - Chapter 1
No ratings yet
SBR - Chapter 1
2 pages
Island Agriculture Assessment - TOR
No ratings yet
Island Agriculture Assessment - TOR
2 pages
New Methos For Granulating Diamond and Powder
No ratings yet
New Methos For Granulating Diamond and Powder
2 pages
MLT Unit 3 Notes
No ratings yet
MLT Unit 3 Notes
19 pages
Acob, Jonalyn C. Bsed Iii-English Engl 105 Module 1 Midterm
No ratings yet
Acob, Jonalyn C. Bsed Iii-English Engl 105 Module 1 Midterm
3 pages
Laag 1
No ratings yet
Laag 1
12 pages
Bryson Yee Resume 2018-2019 Updated
No ratings yet
Bryson Yee Resume 2018-2019 Updated
2 pages
Surface Engineering Industry Germany
No ratings yet
Surface Engineering Industry Germany
27 pages
Msme Tool Room, Indore: Bio-Data
No ratings yet
Msme Tool Room, Indore: Bio-Data
2 pages
RIPMWC Round 2 Sample Questions 2019
100% (3)
RIPMWC Round 2 Sample Questions 2019
2 pages
India - Gratuity - Form - V2 - Signed PDF
No ratings yet
India - Gratuity - Form - V2 - Signed PDF
2 pages
Lect 4
No ratings yet
Lect 4
34 pages
EMS-Motor Starter Wiring and Enclosuers
No ratings yet
EMS-Motor Starter Wiring and Enclosuers
3 pages
K Means Clustering Lecture
No ratings yet
K Means Clustering Lecture
32 pages
C Lab Manual
No ratings yet
C Lab Manual
43 pages
Silhouette (Clustering) : Method
No ratings yet
Silhouette (Clustering) : Method
7 pages
Substitute Leadership
100% (1)
Substitute Leadership
1 page
Digiscope Slimhole MWD Ps
No ratings yet
Digiscope Slimhole MWD Ps
2 pages
Baroda Companies
100% (1)
Baroda Companies
25 pages
ALPFA Brings Top Latino Professionals Together For 2011 Annual Convention in Anaheim, CA.
No ratings yet
ALPFA Brings Top Latino Professionals Together For 2011 Annual Convention in Anaheim, CA.
1 page
Linear Algebra Fundamentals
From Everand
Linear Algebra Fundamentals
Kartikeya Dutta
No ratings yet
Apache Cassandra Administrator Associate - Exam Practice Tests
From Everand
Apache Cassandra Administrator Associate - Exam Practice Tests
Cristian Scutaru
No ratings yet

UNIT 3-Clustering Metrics

Uploaded by

UNIT 3-Clustering Metrics

Uploaded by

Clustering metrics

For point A2:

For point A3:

For point A4:

Point A3 and A4 are lying cluster C2 , so for computing Silhouette

Silhouette Score/Coefficient for overall clustering

• The Davies-Bouldin index (DBI) is a metric for assessing

• Cluster 1: {Data Point 1, Data Point 2} Cluster 2: {Data Point 3, Data

• The centroids of these clusters are approximately:

• Centroid of Cluster 1: (2.5, 2.5) Centroid of Cluster 2: (12, 10.5)

• Now, let's calculate the pairwise distances between the centroids of

• ARI = [ (0.606 - 0.183) ] / [ 1 - 0.183 ] ≈ 0.517

Cluster-1 (C=1) Cluster-2 (C=2)

Class-1 (Y=1) Class-2 (Y=2) Class-3 (Y=3)

Mutual Information tells us the reduction in the

•labels_true:<int array, shape = [n_samples]>: It

1.Homogeneity: A perfectly homogeneous clustering is one where each cluster has

sklearn.metrics.v_measure_score(labels_true, labels_pred, *, beta=1.0)

v = (1 + beta) * homogeneity * completeness

FMI = TP / sqrt((TP + FP) * (TP + FN))

• AMI(U, V) = [MI(U, V) - E(MI(U, V))] / [avg(H(U), H(V)) - E(MI(U, V))]

You might also like