
Clustering

Gilles Gasso

INSA Rouen - ASI Department


Laboratory LITIS

18 September 2019



Plan

1 Introduction
    Notion of dissimilarity
    Quality of clusters

2 Methods of clustering
    Hierarchical clustering
        Principle and algorithm
    K-means
        Principle and algorithm
        Variations


Introduction

D = \{x_i \in \mathbb{R}^d\}_{i=1}^N: the set of training samples

Goal: structure the data into homogeneous categories

Group the samples into clusters so that the data within a cluster are as similar as possible.

Clustering ≡ unsupervised learning.

[Figure: clustering images; source: https://courses.cs.washington.edu/courses/cse416/18sp/slides/L11_kmeans.pdf]


Applications
Field               Data type                      Clusters
Text mining         Texts                          Close texts
                    E-mails                        Automatic folders
Graph mining        Graphs                         Social sub-networks
BioInformatics      Genes                          Resembling genes
Marketing           Client profiles,               Customer segmentation
                    purchased products
Image segmentation  Images                         Homogeneous areas in an image
Web log analysis    Clickstreams                   User profiles

Source: http://images2.programmersought.com/267/fc/fc00092c0966ec1d4b726f60880f9703.png

What is clustering?

How to define similarity or dissimilarity between samples?
How to characterize a cluster?
How many clusters should we look for?
Which clustering algorithms?
How to assess a clustering result?

Source: https://image.slidesharecdn.com/k-means-130411020903-phpapp01/95/k-means-clustering-algorithm-4-638.jpg?cb=1365646184

Dissimilarity measure (1)

Concept of dissimilarity
A dissimilarity is a function of the pair (x_1, x_2) with values in \mathbb{R}^+ such that
D(x_1, x_2) = D(x_2, x_1) \geq 0 and D(x_1, x_2) = 0 \Rightarrow x_1 = x_2

Measure of dissimilarity: a distance D(x_1, x_2) between two points x_1 and x_2


Minkowski's distance: D(x_1, x_2)^q = \sum_{j=1}^{d} |x_{1,j} - x_{2,j}|^q

Euclidean distance corresponds to q = 2:
D(x_1, x_2)^2 = \sum_{j=1}^{d} (x_{1,j} - x_{2,j})^2 = (x_1 - x_2)^\top (x_1 - x_2)

Manhattan distance (q = 1): D(x_1, x_2) = \sum_{j=1}^{d} |x_{1,j} - x_{2,j}|

Metric linked to a positive definite matrix W:

D^2(x_1, x_2) = (x_1 - x_2)^\top W (x_1 - x_2)
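As a concrete illustration, these distances could be written as follows (a minimal NumPy sketch of ours, not part of the original slides; the function names and test points are made up):

    import numpy as np

    def minkowski(x1, x2, q=2):
        # Minkowski distance: q-th root of sum_j |x1_j - x2_j|^q.
        return float(np.sum(np.abs(x1 - x2) ** q) ** (1.0 / q))

    def w_metric(x1, x2, W):
        # Distance induced by a positive definite matrix W.
        diff = x1 - x2
        return float(np.sqrt(diff @ W @ diff))

    x1, x2 = np.array([0.0, 1.0]), np.array([3.0, 5.0])
    print(minkowski(x1, x2, q=2))       # Euclidean: 5.0
    print(minkowski(x1, x2, q=1))       # Manhattan: 7.0
    print(w_metric(x1, x2, np.eye(2)))  # W = I recovers the Euclidean distance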


Dissimilarity measure (2)

The d-dimensional samples x_1 and x_2 take discrete values.

Compute the contingency matrix A(x_1, x_2), whose entry a_{uv} counts the indices j such that x_{1,j} = u and x_{2,j} = v (the values here lie in \{0, 1, 2\}, so A is 3 x 3):

x_1 = (0, 1, 2, 1, 2, 1)^\top and x_2 = (1, 0, 2, 1, 0, 1)^\top

A(x_1, x_2) = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 2 & 0 \\ 1 & 0 & 1 \end{pmatrix}

Hamming's distance: the number of indices where the two vectors differ, i.e. the sum of the off-diagonal entries of A:

D(x_1, x_2) = \sum_{i} \sum_{j \neq i} a_{ij}

Example: D(x_1, x_2) = 3
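A minimal sketch of ours (not from the slides) computing the Hamming distance directly as the count of differing positions:

    import numpy as np

    def hamming(x1, x2):
        # Number of indices where the two discrete vectors differ.
        return int(np.sum(x1 != x2))

    x1 = np.array([0, 1, 2, 1, 2, 1])
    x2 = np.array([1, 0, 2, 1, 0, 1])
    print(hamming(x1, x2))  # 3, the off-diagonal sum of A(x_1, x_2) above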


Dissimilarity between clusters (1)

Distance D(C_1, C_2) between two clusters C_1 and C_2

minimum diameter (nearest neighbor):
D_min(C_1, C_2) = \min \{D(x_i, x_j) : x_i \in C_1, x_j \in C_2\}

maximum diameter:
D_max(C_1, C_2) = \max \{D(x_i, x_j) : x_i \in C_1, x_j \in C_2\}

[Figures: minimum diameter (left), maximum diameter (right)]


Dissimilarity between clusters (2)


Distance D(C_1, C_2) between two clusters C_1 and C_2

average distance:
D_avg(C_1, C_2) = \frac{1}{n_1 n_2} \sum_{x_i \in C_1} \sum_{x_j \in C_2} D(x_i, x_j)

Ward's distance (between centers):
D_Ward(C_1, C_2) = \sqrt{\frac{n_1 n_2}{n_1 + n_2}} D(\mu_1, \mu_2)

[Figures: average distance (left), distance between centroids (right)]
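The four between-cluster distances might be computed as follows (a minimal NumPy sketch under our own naming, not from the slides; clusters are (n, d) arrays):

    import numpy as np

    def pairwise(C1, C2):
        # Euclidean distances between all pairs (x_i in C1, x_j in C2).
        return np.linalg.norm(C1[:, None, :] - C2[None, :, :], axis=-1)

    def d_min(C1, C2): return pairwise(C1, C2).min()   # minimum diameter
    def d_max(C1, C2): return pairwise(C1, C2).max()   # maximum diameter
    def d_avg(C1, C2): return pairwise(C1, C2).mean()  # average distance

    def d_ward(C1, C2):
        # Ward's distance between the two cluster centroids.
        n1, n2 = len(C1), len(C2)
        mu1, mu2 = C1.mean(axis=0), C2.mean(axis=0)
        return np.sqrt(n1 * n2 / (n1 + n2)) * np.linalg.norm(mu1 - mu2)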


Characteristics of a good clustering

Every cluster C_\ell is characterized by:

Its centroid: \mu_\ell = \frac{1}{N_\ell} \sum_{i \in C_\ell} x_i, with N_\ell = card(C_\ell)

Its inertia: J_\ell = \sum_{i \in C_\ell} D^2(x_i, \mu_\ell)
It measures how closely the points gather around \mu_\ell: the lower J_\ell, the smaller the dispersion of the points around \mu_\ell.

Within-cluster distance: J_w = \sum_{\ell} \sum_{i \in C_\ell} D^2(x_i, \mu_\ell) = \sum_{\ell} J_\ell

Let \mu be the mean of all the samples: \mu = \frac{1}{N} \sum_i x_i

Between-cluster distance: J_b = \sum_{\ell} N_\ell D^2(\mu_\ell, \mu)
It measures the "distance" between the clusters: the greater J_b, the better separated the clusters.
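A minimal sketch of ours (not from the slides, toy data made up) computing J_w and J_b with Euclidean D, and checking the decomposition used on the next slide:

    import numpy as np

    def inertias(X, labels):
        # X: (N, d) samples; labels: (N,) cluster index of each sample.
        mu = X.mean(axis=0)                        # global mean
        Jw = Jb = 0.0
        for l in np.unique(labels):
            Xl = X[labels == l]
            mu_l = Xl.mean(axis=0)                 # centroid of cluster l
            Jw += np.sum((Xl - mu_l) ** 2)         # inertia J_l accumulated into Jw
            Jb += len(Xl) * np.sum((mu_l - mu) ** 2)
        return Jw, Jb

    X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
    Jw, Jb = inertias(X, np.array([0, 0, 1, 1]))
    total = np.sum((X - X.mean(axis=0)) ** 2)
    print(Jw, Jb, np.isclose(Jw + Jb, total))  # 1.0 50.0 True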


Illustration

[Figure: four clusters C1-C4 with centroids g1-g4 and the global centroid g]

Total inertia of the points = within-cluster inertia (J_w) + between-cluster inertia (J_b)

A good clustering . . .
is one that minimizes the within-cluster distance J_w and maximizes the between-cluster distance J_b.

Approaches of clustering

Hierarchical clustering

K-means clustering


Hierarchical clustering: principle

Principle: bottom-up approach

Each point or cluster is gradually "agglomerated" into the nearest cluster.

Algorithm
Initialization:
    Each sample represents a cluster.
    Compute the pairwise distance matrix M with M_{ij} = D(x_i, x_j)
Repeat
    Select from M the two closest clusters C_I and C_J
    Merge C_I and C_J into a cluster C_G
    Update M by computing the distance between C_G and the remaining clusters
Until all samples are merged into one cluster
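In practice this bottom-up procedure is available, e.g., in SciPy; a minimal sketch of ours (the toy data are made up, not from the slides):

    import numpy as np
    from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.5, (20, 2)),    # two well-separated blobs
                   rng.normal(3, 0.5, (20, 2))])

    Z = linkage(X, method="ward")                    # the successive merges
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
    # dendrogram(Z) draws the merge tree shown on the next slide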


Hierarchical clustering: illustration

[Figure: samples (left), indexed dendrogram of the successive merges C1-C10 (middle), resulting clustering (right)]

Dendrogram = representation of the successive merges

Height of a cluster in the dendrogram = distance between the two clusters before they merge


Linkage criterion

Common metrics used for merging two clusters (see the sketch after this list)

Single linkage (minimum), based on D_min(C_1, C_2):
    tends to produce a few large clusters (chaining effect)
    sensitive to noisy data

Complete linkage (maximum), based on D_max(C_1, C_2):
    tends to produce compact, specific clusters (only very close clusters are merged)
    sensitive to noisy data

Average linkage, based on D_avg(C_1, C_2):
    tends to produce classes with similar variance

Ward's distance D_Ward(C_1, C_2):
    tends to minimize the within-cluster variance of the merged clusters
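A minimal scikit-learn sketch of ours (made-up data, not from the slides) showing that the linkage choice can change the resulting partition:

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.6, (30, 2)), rng.normal(4, 0.6, (30, 2))])

    for link in ("single", "complete", "average", "ward"):
        labels = AgglomerativeClustering(n_clusters=2, linkage=link).fit_predict(X)
        print(link, np.bincount(labels))  # cluster sizes under each criterion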


Influence of linkage criterion (1)

[Figure: toy dataset of points A-F (Euclidean metric); single linkage merges at heights 0.5, 0.7, 0.9, 1.1 (leaf order F, E, A, B, C, D), while complete linkage merges at heights 0.5, 1.7, 2.8, 4.0 (leaf order C, D, A, B, E, F)]

The clustering result may change depending on the selected linkage criterion.


Influence of linkage criterion (2)

[Figure only: further comparison of linkage criteria]



Clustering by data partitioning

Goal
D = \{x_i \in \mathbb{R}^d\}_{i=1}^N: a set of training samples

Search for a partition into K clusters (with K < N)

Direct approach
Build all possible partitions
Retain the best partition among them

NP-hard problem
The number of possible partitions grows exponentially:

\#\text{Partitions} = \frac{1}{K!} \sum_{k=1}^{K} (-1)^{K-k} \binom{K}{k} k^N

For N = 10 and K = 4, there are already 34105 possible partitions!
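The count can be checked directly (a short sketch of ours; it evaluates the formula above, which is the Stirling number of the second kind):

    from math import comb, factorial

    def n_partitions(N, K):
        # Number of partitions of N samples into K non-empty clusters.
        s = sum((-1) ** (K - k) * comb(K, k) * k ** N for k in range(1, K + 1))
        return s // factorial(K)

    print(n_partitions(10, 4))  # 34105, as stated above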


Data partitioning

Workaround solution
Determine the K clusters \{C_\ell\}_{\ell=1}^K and their centers \{\mu_\ell\}_{\ell=1}^K that minimize the within-cluster distance J_w:

\min_{\{C_\ell\}_{\ell=1}^K, \{\mu_\ell\}_{\ell=1}^K} \sum_{\ell=1}^{K} \sum_{i \in C_\ell} \|x_i - \mu_\ell\|^2

Finding the global solution remains an NP-hard problem.

A local solution (not necessarily the optimal partition) can be reached using a simple algorithm: K-means.


K-means clustering

A popular clustering algorithm

Principle
Assume the centroids \mu_\ell, \ell = 1, \dots, K are fixed:
    assign each point x_i to one and only one cluster;
    x_i is assigned to the closest cluster C_k (according to the distance between x_i and the cluster centers \mu_\ell).
Given the clusters C_\ell, \ell = 1, \dots, K:
    estimate their centers \mu_\ell, \ell = 1, \dots, K.
Repeat the previous steps until convergence.


K-Means: illustration
Clustering into K = 2 classes

[Figure: the true classes, the initialization, and the clusters obtained at iterations 1, 2, 3 and 5]

K-Means: Lloyd's algorithm

Initialize the centers \mu_1, \dots, \mu_K

Repeat
    Assign each point x_i to the closest cluster:

    \forall i \in \{1, \dots, N\}: s_i \leftarrow \arg\min_{\ell} \|x_i - \mu_\ell\|^2, and C_k = \{i : s_i = k\}

    Compute the center \mu_\ell of each cluster:

    \mu_\ell = \frac{1}{N_\ell} \sum_{i \in C_\ell} x_i, with N_\ell = card(C_\ell)

Until convergence
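A minimal NumPy sketch of Lloyd's algorithm (ours, not the author's code; centers are initialized by sampling K data points, and empty clusters are not handled):

    import numpy as np

    def kmeans(X, K, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        mu = X[rng.choice(len(X), size=K, replace=False)]  # initial centers
        for _ in range(n_iter):
            # Assignment step: index of the closest center for each point.
            s = np.argmin(((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1), axis=1)
            # Update step: each center becomes the mean of its cluster.
            new_mu = np.array([X[s == k].mean(axis=0) for k in range(K)])
            if np.allclose(new_mu, mu):  # convergence: centers stop moving
                break
            mu = new_mu
        return s, mu

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(4, 0.5, (30, 2))])
    labels, centers = kmeans(X, K=2)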


K-Means: example

[Figure: initial centers shown as plain yellow squares]


K-Means: example

⇒ Different initializations lead to different partitions!



K-Means: remarks and limitations

The criterion J_w decreases at each iteration.

The algorithm converges to (at least) a local minimum of J_w.

Initialization of the \mu_k:
    select random points within the domain of the x_i, or
    select randomly among the samples x_i.
Different initializations can lead to different clusters (convergence to a local minimum).

K-Means: some issues

Number of clusters
    Difficult to assess the number of clusters because there is no ground truth to compare with
    Fix it a priori (e.g., we want to split customers into K segments)
    Use the "elbow trick" on the variation of J_w(K) w.r.t. K
    Use ad-hoc metrics such as the silhouette score (see the sketch below)
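A minimal scikit-learn sketch of ours (toy data made up) combining the elbow trick on J_w(K) with the silhouette score:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(c, 0.5, (30, 2)) for c in (0, 4, 8)])  # 3 blobs

    for K in range(2, 7):
        km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
        # km.inertia_ is Jw(K): look for the elbow where it stops dropping fast;
        # the silhouette score should peak near the true number of clusters.
        print(K, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))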


Sequential K-Means
Adaptation of K-means to streaming samples (see the sketch below)

Initialize \mu_1, \dots, \mu_K
Repeat
    acquire a new sample x
    assign x to the nearest cluster:

    C_\ell \leftarrow x, with \ell = \arg\min_{k} \|x - \mu_k\|^2

    increment n_\ell
    recompute the center \mu_\ell of this cluster:

    \mu_\ell \leftarrow \mu_\ell + \frac{1}{n_\ell} (x - \mu_\ell)
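A minimal sketch of ours (class name and toy stream made up) of this online update; each incoming sample nudges its nearest center by a 1/n_\ell step, which maintains the running mean of the cluster:

    import numpy as np

    class SequentialKMeans:
        def __init__(self, init_centers):
            self.mu = np.asarray(init_centers, dtype=float)
            self.n = np.zeros(len(self.mu), dtype=int)   # samples seen per cluster

        def update(self, x):
            l = np.argmin(((self.mu - x) ** 2).sum(axis=1))  # nearest center
            self.n[l] += 1
            self.mu[l] += (x - self.mu[l]) / self.n[l]       # running-mean update
            return l

    skm = SequentialKMeans([[0.0, 0.0], [4.0, 4.0]])
    for x in np.random.default_rng(0).normal(4, 0.5, (100, 2)):
        skm.update(x)   # the stream pulls the second center toward (4, 4)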


Conclusion

Clustering: unsupervised learning

Grouping of data into homogeneous clusters
The number of clusters is a hyper-parameter that depends on the application, but it can be determined using metrics such as the silhouette score
Several algorithms: hierarchical clustering, K-means, but also DBSCAN, spectral clustering, . . .
