Clustering: CMPUT 466/551
Nilanjan Ray
What is Clustering?
Attach a label to each observation (data point) in a set
You can think of this as unsupervised classification
Clustering is alternatively called grouping
Intuitively, we want to assign the same label to data points that are close to each other
Thus, clustering algorithms rely on a distance metric between data points
It is sometimes said that for clustering, the distance metric is more important than the clustering algorithm
Distances: Quantitative Variables
Data point: x_i = [x_{i1}, x_{i2}, \ldots, x_{ip}]^T
Some examples of distances between two such points x_i and x_j include the squared Euclidean distance \sum_{l=1}^{p} (x_{il} - x_{jl})^2
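For concreteness, a tiny Matlab sketch of two common distances between two made-up 3-dimensional points (the vectors and their values are chosen here for illustration, not taken from the slides):
xi = [1.0 2.5 0.3];          % x_i = [x_i1, ..., x_ip], here p = 3
xj = [0.5 2.0 1.3];          % x_j
d_euc = norm(xi - xj);       % Euclidean distance
d_sq  = sum((xi - xj).^2);   % squared Euclidean distance (used later by K-means)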
Distances: Ordinal and Categorical Variables
Ordinal variables with M levels can be forced to lie within (0, 1), and then a quantitative metric can be applied:
\frac{k - 1/2}{M}, \qquad k = 1, 2, \ldots, M
For categorical variables, distances must be specified by the user between each pair of categories.
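A small Matlab sketch of both ideas; the number of levels M and the categorical distance table below are illustrative assumptions, not values from the slides:
M = 5;                          % an ordinal variable with M levels
k = 1:M;
ordinal_vals = (k - 0.5) / M;   % maps the levels to 0.1, 0.3, 0.5, 0.7, 0.9
D_cat = [0 1 2;                 % user-specified pairwise distances between
         1 0 1;                 % three categories A, B, C
         2 1 0];
d_AB = D_cat(1, 2);             % distance between categories A and B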
Combining Distances
Often a weighted sum is used:
D(x_i, x_j) = \sum_{l=1}^{p} w_l \, d(x_{il}, x_{jl}), \qquad \sum_{l=1}^{p} w_l = 1, \quad w_l > 0
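A minimal Matlab sketch of the weighted sum; the per-attribute distances and weights below are arbitrary illustrative numbers:
d_attr = [0.2 1.5 0.7];      % d(x_il, x_jl) for attributes l = 1, ..., p
w      = [0.5 0.3 0.2];      % weights with w_l > 0 and sum(w) = 1
D_ij   = sum(w .* d_attr);   % D(x_i, x_j) = sum_l w_l * d(x_il, x_jl)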
Combinatorial Approach
In how many ways can we assign K labels to N
observations?
For each such possibility, we can compute a cost. Pick the assignment with the best cost.
Formidable number of possible assignments:
S(N, K) = \frac{1}{K!} \sum_{k=1}^{K} (-1)^{K-k} \binom{K}{k} k^N
(I'll post a page about the origin of this formula)
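To get a sense of how formidable this number is, the formula can be evaluated directly; the Matlab sketch below uses N = 19 and K = 4 as an example (these sizes are chosen only for illustration):
N = 19; K = 4;                                                % example sizes
k = 1:K;
binom = factorial(K) ./ (factorial(k) .* factorial(K - k));   % C(K, k)
S = sum((-1).^(K - k) .* binom .* k.^N) / factorial(K);
% S is about 1.13e10 -- already far too many assignments to enumerate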
K-means Overview
An unsupervised clustering algorithm
K stands for the number of clusters; it is typically a user input to the algorithm, although some criteria can be used to estimate K automatically
It is an approximation to an NP-hard combinatorial optimization problem
The K-means algorithm is iterative in nature
It converges; however, only a local minimum is obtained
Works only for numerical data
Easy to implement
K-means: Setup
x_1, \ldots, x_N are data points or vectors of observations
Each observation (vector x_i) will be assigned to one and only one cluster
C(i) denotes the cluster number for the i-th observation
Dissimilarity measure: Euclidean distance metric
K-means minimizes within-cluster point scatter:
W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(j)=k} \|x_i - x_j\|^2 = \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - m_k\|^2
where
m_k is the mean vector of the k-th cluster
N_k is the number of observations in the k-th cluster
(Exercise)
Within and Between Cluster Criteria
Let's consider the total point scatter for a set of N data points:
T = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} d(x_i, x_j)
where d(x_i, x_j) is the distance between two points.
T can be re-written as:
T = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \Big( \sum_{C(j)=k} d(x_i, x_j) + \sum_{C(j) \ne k} d(x_i, x_j) \Big) = W(C) + B(C)
where the within-cluster scatter is
W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(j)=k} d(x_i, x_j)
and the between-cluster scatter is
B(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(j) \ne k} d(x_i, x_j)
If d is the squared Euclidean distance, then
W(C) = \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - m_k\|^2
and
B(C) = \sum_{k=1}^{K} N_k \|m_k - \bar{m}\|^2
where \bar{m} is the grand mean.
Minimizing W(C) is equivalent to maximizing B(C). (Exercise)
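The decomposition can be checked numerically. The Matlab sketch below uses synthetic data and an arbitrary two-cluster assignment (all names and values are chosen for illustration) to verify that T = W(C) + B(C), and that for squared-Euclidean distance the pair-based W(C) matches the mean-based form:
rng(0);
X = [randn(20,2); randn(20,2) + 4];            % N = 40 synthetic points in 2-D
C = [ones(20,1); 2*ones(20,1)];                % an arbitrary assignment, K = 2
K = 2;
D = squareform(pdist(X).^2);                   % pairwise squared-Euclidean distances
T = 0.5 * sum(D(:));                           % total point scatter
W = 0; B = 0; W_mean = 0;
for kk = 1:K
    in_k = (C == kk);
    N_k  = sum(in_k);
    m_k  = mean(X(in_k, :), 1);
    W = W + 0.5 * sum(sum(D(in_k, in_k)));     % within-cluster pairs
    B = B + 0.5 * sum(sum(D(in_k, ~in_k)));    % between-cluster pairs
    W_mean = W_mean + N_k * sum(sum((X(in_k, :) - m_k).^2));
end
% T - (W + B) and W - W_mean should both be zero up to rounding error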
K-means Algorithm
For a given cluster assignment C of the data points, compute the cluster means m_k:
m_k = \frac{\sum_{i:\, C(i)=k} x_i}{N_k}, \qquad k = 1, \ldots, K
For the current set of cluster means, assign each observation as:
C(i) = \arg\min_{1 \le k \le K} \|x_i - m_k\|^2, \qquad i = 1, \ldots, N
Iterate the above two steps until convergence.
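A minimal from-scratch Matlab sketch of this two-step iteration on synthetic data (the data and variable names are illustrative assumptions; in practice Matlab's built-in kmeans can be used instead):
rng(0);
X = [randn(50,2); randn(50,2) + 5];          % synthetic data, N x p
K = 2;
C = randi(K, size(X,1), 1);                  % random initial assignment
for iter = 1:100
    % Step 1: recompute the cluster means for the current assignment
    m = zeros(K, size(X,2));
    for kk = 1:K
        m(kk, :) = mean(X(C == kk, :), 1);   % (empty clusters are not handled here)
    end
    % Step 2: reassign each observation to its nearest mean
    [~, Cnew] = min(pdist2(X, m).^2, [], 2);
    if isequal(Cnew, C), break; end          % converged: assignment unchanged
    C = Cnew;
end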
K-means clustering example
K-means Image Segmentation
Figure: an image (I) and the three-cluster image (J) obtained by running K-means on the gray values of I.
Matlab code:
I = double(imread('...'));   % filename placeholder
J = reshape(kmeans(I(:),3),size(I));
Note that the K-means result is noisy.
K-means: summary
Algorithmically, very simple to implement
K-means converges, but it finds a local minimum of the
cost function
Works only for numerical observations
K is a user input; alternatively BIC (Bayesian information
criterion) or MDL (minimum description length) can be
used to estimate K
Outliers can cause considerable trouble for K-means
K-medoids Clustering
K-means is appropriate when we can work with
Euclidean distances
Thus, K-means can work only with numerical,
quantitative variable types
Euclidean distances do not work well in at least two
situations
Some variables are categorical
Outliers can be potential threats
A generalized version of the K-means algorithm, called K-medoids, can work with any distance measure
K-medoids clustering is computationally more intensive
K-medoids Algorithm
Step 1: For a given cluster assignment C, find the observation in each cluster minimizing the total distance to the other points in that cluster:
i_k^* = \arg\min_{\{i:\, C(i)=k\}} \sum_{C(j)=k} d(x_i, x_j)
Step 2: Assign
m_k = x_{i_k^*}, \qquad k = 1, 2, \ldots, K
Step 3: Given the current set of cluster centers \{m_1, \ldots, m_K\}, minimize the total error by assigning each observation to the closest (current) cluster center:
C(i) = \arg\min_{1 \le k \le K} d(x_i, m_k), \qquad i = 1, \ldots, N
Iterate steps 1 to 3.
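A minimal Matlab sketch of these three steps that works directly from a pairwise distance matrix (the data, the city-block distance, and the variable names are chosen here only for illustration):
rng(0);
X = [randn(30,2); randn(30,2) + 5];
D = squareform(pdist(X, 'cityblock'));   % any pairwise distance measure works
N = size(D, 1);  K = 2;
C = randi(K, N, 1);                      % random initial assignment
m_idx = zeros(K, 1);                     % indices of the current medoids
for iter = 1:100
    % Steps 1-2: in each cluster, the point with the smallest total distance
    % to the other points in the cluster becomes the new medoid
    for kk = 1:K
        members = find(C == kk);
        [~, best] = min(sum(D(members, members), 2));
        m_idx(kk) = members(best);
    end
    % Step 3: reassign each observation to the closest medoid
    [~, Cnew] = min(D(:, m_idx), [], 2);
    if isequal(Cnew, C), break; end
    C = Cnew;
end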
K-medoids Summary
Generalized K-means
Computationally much costlier than K-means
Apply when dealing with categorical data
Apply when data points are not available, but
only pair-wise distances are available
Converges to local minimum
Choice of K?
Can W_K(C), i.e., the within-cluster distance as a function of K, serve as an indicator?
Note that W_K(C) decreases monotonically with increasing K; that is, the within-cluster scatter decreases as the number of centroids increases.
Instead, look for gap statistics (the successive differences of W_K(C)):
\{ W_K - W_{K+1} : K < K^* \} \gg \{ W_K - W_{K+1} : K \ge K^* \}
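A Matlab sketch of this heuristic on synthetic data drawn from two densities (the data and the range of K are illustrative choices); the elbow in log(W_K) and the drop in the successive differences both point at K = 2:
rng(0);
X = [randn(50,2); randn(50,2) + 6];          % data simulated from two pdfs
Kmax = 8;
W = zeros(Kmax, 1);
for K = 1:Kmax
    [~, ~, sumd] = kmeans(X, K, 'Replicates', 5);
    W(K) = sum(sumd);                        % total within-cluster scatter
end
gap = W(1:end-1) - W(2:end);                 % large for K < K*, small afterwards
plot(1:Kmax, log(W), '-o');                  % look for the elbow/knee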
Choice of K
Figure: data points simulated from two pdfs, with the corresponding log(W_K) curve and gap curve.
This is essentially a visual heuristic.
Vector Quantization
A codebook (a set of centroids/codewords): \{m_1, m_2, \ldots, m_K\}
A quantization function: q(x_i) = m_k, often the nearest-neighbor function
K-means can be used to construct the codebook
Image Compression by VQ
Figure: original image at 8 bits/pixel; compressed to 1.9 bits/pixel using 200 codewords; compressed to 0.5 bits/pixel using 4 codewords.
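A Matlab sketch of how such a VQ compression could be done with a K-means codebook over 2x2 pixel blocks. The image file, block size, and codebook size are assumptions (200 codewords over 2x2 blocks gives roughly log2(200)/4, about 1.9 bits/pixel), and im2col/col2im need the Image Processing Toolbox:
I = double(imread('cameraman.tif'));            % a standard demo image (assumed available)
B = 2;                                          % block size: 2x2 blocks -> 4-D vectors
[r, c] = size(I);
blocks = im2col(I, [B B], 'distinct')';         % each row is one block/vector
K = 200;                                        % number of codewords
[idx, codebook] = kmeans(blocks, K, 'MaxIter', 200, 'EmptyAction', 'singleton');
Jblocks = codebook(idx, :)';                    % quantize: replace each block by its codeword
J = col2im(Jblocks, [B B], [r c], 'distinct');  % reconstructed (compressed) image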
Otsu's Image Thresholding Method
Based on the clustering idea: Find the threshold that
minimizes the weighted within-cluster point scatter.
This turns out to be the same as maximizing the
between-class scatter.
Operates directly on the gray-level histogram [e.g., 256 numbers, P(i)], so it's fast (once the histogram is computed).
Otsu's Method
The histogram (and the image) is assumed to be bimodal.
No use of spatial coherence, nor any other
notion of object structure.
Assumes uniform illumination (implicitly), so
the bimodal brightness behavior arises from
object appearance differences only.
The weighted within-class variance is:
\sigma_w^2(t) = q_1(t)\,\sigma_1^2(t) + q_2(t)\,\sigma_2^2(t)
where the class probabilities are estimated as:
q_1(t) = \sum_{i=1}^{t} P(i), \qquad q_2(t) = \sum_{i=t+1}^{I} P(i)
the class means are:
\mu_1(t) = \sum_{i=1}^{t} \frac{i\,P(i)}{q_1(t)}, \qquad \mu_2(t) = \sum_{i=t+1}^{I} \frac{i\,P(i)}{q_2(t)}
and the class variances are:
\sigma_1^2(t) = \sum_{i=1}^{t} [i - \mu_1(t)]^2 \frac{P(i)}{q_1(t)}, \qquad \sigma_2^2(t) = \sum_{i=t+1}^{I} [i - \mu_2(t)]^2 \frac{P(i)}{q_2(t)}
Initialization:
q_1(1) = P(1); \qquad \mu_1(0) = 0
Recursion:
q_1(t+1) = q_1(t) + P(t+1)
\mu_1(t+1) = \frac{q_1(t)\,\mu_1(t) + (t+1)\,P(t+1)}{q_1(t+1)}
\mu_2(t+1) = \frac{\mu - q_1(t+1)\,\mu_1(t+1)}{1 - q_1(t+1)}
where \mu is the total mean of the histogram.
After some algebra, we can express the total variance as:
\sigma^2 = \sigma_w^2(t) + q_1(t)\,[1 - q_1(t)]\,[\mu_1(t) - \mu_2(t)]^2
The first term is the within-class variance from before; the second term is the between-class variance \sigma_B^2(t).
Since the total is constant and independent of t, the effect of changing the threshold is merely to move the contributions of the two terms back and forth.
So, minimizing the within-class variance is the same as maximizing the between-class variance.
The nice thing about this is that we can compute the quantities in \sigma_B^2(t) recursively as we run through the range of t values.
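A from-scratch Matlab sketch of Otsu's method that maximizes q_1(t)[1 - q_1(t)][mu_1(t) - mu_2(t)]^2 over all thresholds. For clarity it recomputes the sums in a loop instead of using the recursion; the image file is an assumed demo image, and the built-in graythresh (used on the next slide) does the same job:
I = imread('cameraman.tif');                % assumed 8-bit grayscale demo image
P = imhist(I);  P = P / sum(P);             % gray-level histogram, I = 256 bins
i = (1:256)';
sigma_B = zeros(256, 1);
for t = 1:255
    q1 = sum(P(1:t));  q2 = 1 - q1;
    if q1 == 0 || q2 == 0, continue; end
    mu1 = sum(i(1:t) .* P(1:t)) / q1;
    mu2 = sum(i(t+1:end) .* P(t+1:end)) / q2;
    sigma_B(t) = q1 * q2 * (mu1 - mu2)^2;   % between-class variance
end
[~, tbest] = max(sigma_B);                  % threshold maximizing sigma_B^2(t)
J = I > (tbest - 1);                        % binarize (bin t holds gray level t-1)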
Result of Otsu's Algorithm
Figure: an image, its gray-level histogram, and the binary image produced by Otsu's method.
Matlab code:
I = double(imread('...'));   % filename placeholder
I = (I-min(I(:)))/(max(I(:))-min(I(:)));
J = I>graythresh(I);
Hierarchical Clustering
Two types: (1) agglomerative (bottom up), (2) divisive (top down)
Agglomerative: two groups are merged if the distance between them is less than a threshold
Divisive: one group is split into two if the intergroup distance is more than a threshold
Can be expressed by an excellent graphical representation called a dendrogram when the process is monotonic, i.e., the dissimilarity between merged clusters increases at each step. Agglomerative clustering possesses this property; not all divisive methods do.
Heights of the nodes in a dendrogram are proportional to the threshold value that produced them.
An Example Hierarchical Clustering
Linkage Functions
Linkage functions compute the dissimilarity between two groups of data points G and H:
Single linkage (minimum distance between the two groups):
d_{SL}(G, H) = \min_{i \in G,\, j \in H} d_{ij}
Complete linkage (maximum distance between the two groups):
d_{CL}(G, H) = \max_{i \in G,\, j \in H} d_{ij}
Group average (average distance between the two groups):
d_{GA}(G, H) = \frac{1}{N_G N_H} \sum_{i \in G} \sum_{j \in H} d_{ij}
Linkage Functions
SL considers only a single pair of data points; if this pair is close enough, then action is taken. So, SL can form a chain by combining relatively far-apart data points.
SL often violates the compactness property of a cluster; it can produce clusters with large diameters:
D_G = \max_{i \in G,\, j \in G} d_{ij}
CL is just the opposite of SL; it produces many clusters with small diameters.
CL can violate the closeness property: two close data points may be assigned to different clusters.
GA is a compromise between SL and CL.
Different Dendrograms
Hierarchical Clustering on Microarray Data
Hierarchical Clustering Matlab Demo
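A minimal sketch of what such a demo might look like, using pdist, linkage, and dendrogram from the Statistics Toolbox on made-up data (the data and the three-cluster cut are illustrative choices):
rng(0);
X = [randn(10,2); randn(10,2) + 4; randn(10,2) + 8];   % three loose groups
D = pdist(X);                         % pairwise Euclidean distances
Zs = linkage(D, 'single');            % single linkage (minimum distance)
Zc = linkage(D, 'complete');          % complete linkage (maximum distance)
Za = linkage(D, 'average');           % group average
subplot(1,3,1); dendrogram(Zs); title('single');
subplot(1,3,2); dendrogram(Zc); title('complete');
subplot(1,3,3); dendrogram(Za); title('average');
labels = cluster(Za, 'maxclust', 3);  % cut the average-linkage tree into 3 clusters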