
Clustering

MIT 15.097 Course Notes


Cynthia Rudin and Şeyda Ertekin
Credit: Dasgupta, Hastie, Tibshirani, Friedman

Clustering (a.k.a. data segmentation): let's segment a collection of examples into "clusters" so that objects within a cluster are more closely related to one another than objects assigned to different clusters. We want to assign each example x_i to a cluster k ∈ {1, ..., K}.

The K-Means algorithm is a very popular way to do this. It assumes points lie
in Euclidean space.

Input: a finite set {x_i}_{i=1}^m, with each x_i ∈ R^n.

Output: cluster centers z_1, ..., z_K.


Goal: Minimize

cost(z_1, ..., z_K) := \sum_i \min_k \|x_i - z_k\|_2^2.
The choice of the squared norm is fortuitous: it really helps simplify the math!

If we're given points {z_k}_k, they induce a Voronoi partition of R^n: they break the space into cells, where each cell corresponds to one of the z_k's. That is, each cell is the region of space whose nearest representative is z_k.

Draw a picture

We can look at the examples in each of these regions of space, which are the
clusters. Specifically,
C_k := {x_i : the closest representative to x_i is z_k}.
Let’s compute the cost another way. Before, we summed over examples, and
then picked the right representative zk for each example. This time, we’ll sum
over clusters, and look at all the examples in that cluster:
cost(z_1, ..., z_K) = \sum_k \sum_{\{i: x_i \in C_k\}} \|x_i - z_k\|_2^2.

While we’re analyzing, we’ll need to consider suboptimal partitions of the data,
where an example might not be assigned to the nearest representative. So we
redefine the cost:
cost(C_1, ..., C_K; z_1, ..., z_K) = \sum_k \sum_{\{i: x_i \in C_k\}} \|x_i - z_k\|_2^2.      (1)
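
As a concrete illustration, here is a minimal NumPy sketch (the function and variable names are my own, not from the notes) that computes this cost both ways: by taking the minimum over centers for each example, and by summing within each induced cluster C_k. Both return the same number.

```python
import numpy as np

def kmeans_cost(X, Z):
    """cost(z_1, ..., z_K) = sum_i min_k ||x_i - z_k||_2^2, for data X (m x n) and centers Z (K x n)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)  # d2[i, k] = ||x_i - z_k||_2^2
    return d2.min(axis=1).sum()                              # each example picks its closest center

def kmeans_cost_by_cluster(X, Z):
    """The same cost, summed cluster by cluster: sum_k sum_{i: x_i in C_k} ||x_i - z_k||_2^2."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)                               # C_k = {x_i : closest representative is z_k}
    return sum(d2[assign == k, k].sum() for k in range(len(Z)))
```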

Let's say we only have one cluster to deal with. Call it C. The representative is z. The cost is then:

cost(C; z) = \sum_{\{i: x_i \in C\}} \|x_i - z\|_2^2.
Where should we place z?

As you probably guessed, we would put it at the mean of the examples in C. But also, the additional cost incurred by picking z ≠ mean(C) can be characterized very simply:

Lemma 1. For any set C ⊂ R^n and any z ∈ R^n,

cost(C; z) = cost(C, mean(C)) + |C| \cdot \|z - mean(C)\|_2^2.
Let’s go ahead and prove it. In order to do that, we need to do another bias-
variance decomposition (this one’s pretty much identical to one of the ones we
did before).

Lemma 2. Let X ∈ R^n be any random variable. For any z ∈ R^n, we have:

\mathbb{E}_X \|X - z\|_2^2 = \mathbb{E}_X \|X - \mathbb{E}_X X\|_2^2 + \|z - \mathbb{E}_X X\|_2^2.
Proof. Let \bar{x} := \mathbb{E}_X X. Then

\mathbb{E}_X \|X - z\|_2^2
  = \mathbb{E}_X \sum_j (X^{(j)} - z^{(j)})^2
  = \mathbb{E}_X \sum_j (X^{(j)} - \bar{x}^{(j)} + \bar{x}^{(j)} - z^{(j)})^2
  = \mathbb{E}_X \sum_j (X^{(j)} - \bar{x}^{(j)})^2 + \sum_j (\bar{x}^{(j)} - z^{(j)})^2
      + 2 \, \mathbb{E}_X \sum_j (X^{(j)} - \bar{x}^{(j)})(\bar{x}^{(j)} - z^{(j)})
  = \mathbb{E}_X \|X - \bar{x}\|_2^2 + \|\bar{x} - z\|_2^2 + 0,

where the cross term is 0 because each \bar{x}^{(j)} - z^{(j)} is a constant and \mathbb{E}_X (X^{(j)} - \bar{x}^{(j)}) = 0. □

To prove Lemma 1, pick a specific choice for X: let X be a uniform random draw from the points x_i in the set C, so X has a discrete distribution. With this choice of X, the expectation reduces (up to a factor of 1/|C|) to the cost we already defined above.
\mathbb{E}_X \|X - z\|_2^2 = \sum_{\{i: x_i \in C\}} (\text{prob. that point } i \text{ is chosen}) \, \|x_i - z\|_2^2
                           = \sum_{\{i: x_i \in C\}} \frac{1}{|C|} \|x_i - z\|_2^2 = \frac{1}{|C|} \, cost(C, z)      (2)

and if we substitute z = \bar{x} (a.k.a. \mathbb{E}_X X, or mean(C)) into (2) and simplify:

\mathbb{E}_X \|X - \bar{x}\|_2^2 = \frac{1}{|C|} \, cost(C, mean(C)).      (3)
We had already defined the cost earlier, and the choice of X was nice because its expectation is just the cost (up to the factor 1/|C|). Let's recopy Lemma 2's statement here, using the \bar{x} notation:

\mathbb{E}_X \|X - z\|_2^2 = \mathbb{E}_X \|X - \bar{x}\|_2^2 + \|z - \bar{x}\|_2^2.
Plugging in (2) and (3),

\frac{1}{|C|} \, cost(C, z) = \frac{1}{|C|} \, cost(C, mean(C)) + \|z - \bar{x}\|_2^2.      (4)

Multiplying through by |C|,

cost(C; z) = cost(C, mean(C)) + |C| \cdot \|z - mean(C)\|_2^2.

And that's the statement of Lemma 1. □
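
Lemma 1 is also easy to sanity-check numerically. A quick sketch, assuming NumPy; the cluster C and the candidate z below are made-up random values, not data from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.normal(size=(50, 3))        # a made-up cluster of 50 points in R^3
z = rng.normal(size=3)              # an arbitrary candidate representative

def cost(C, z):
    # cost(C; z) = sum over points x_i in C of ||x_i - z||_2^2
    return ((C - z) ** 2).sum()

mean_C = C.mean(axis=0)
lhs = cost(C, z)
rhs = cost(C, mean_C) + len(C) * ((z - mean_C) ** 2).sum()
print(np.isclose(lhs, rhs))         # True, exactly as Lemma 1 predicts
```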

To really minimize the cost (1), you’d need to try all possible assignments of the
m data points to K clusters. Uck! The number of distinct assignments is (Jain
and Dubes 1988):
S(m, K) = \frac{1}{K!} \sum_{k=1}^{K} (-1)^{K-k} \binom{K}{k} k^m

S(10, 4) = 34,105, S(19, 4) ≈ 10^{10}, ... so not doable.
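
For reference, S(m, K) is easy to evaluate exactly with a few lines of Python (this just transcribes the formula above; integer arithmetic keeps it exact):

```python
from math import comb, factorial

def S(m, K):
    """Number of ways to partition m labeled points into K nonempty clusters."""
    total = sum((-1) ** (K - k) * comb(K, k) * k ** m for k in range(1, K + 1))
    return total // factorial(K)    # the alternating sum is always divisible by K!

print(S(10, 4))   # 34105
print(S(19, 4))   # 11259666950, i.e. about 10^10
```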


Let’s try some heuristic gradient-descent-ish method instead.

The K-Means Algorithm

Choose the value of K before you start.

Initialize centers z_1, ..., z_K ∈ R^n and clusters C_1, ..., C_K in any way.

Repeat until there is no further change in cost:
  for each k: C_k ← {x_i : the closest representative to x_i is z_k}
  for each k: z_k ← mean(C_k)

This is simple enough, and takes O(Km) time per iteration.
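
A minimal NumPy sketch of this loop (the function name, the random initialization at K distinct data points, and the empty-cluster handling are my own choices; the notes leave initialization open). It records the cost after every assignment step, which is handy for the convergence discussion below:

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Plain K-Means on data X (m x n). Returns centers, assignments, and per-iteration costs."""
    rng = np.random.default_rng(seed)
    Z = X[rng.choice(len(X), size=K, replace=False)].astype(float)  # initialize at K random data points
    costs = []
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)                        # step 1: C_k <- points whose closest center is z_k
        costs.append(d2[np.arange(len(X)), assign].sum())
        for k in range(K):                                # step 2: z_k <- mean(C_k)
            if np.any(assign == k):                       # keep the old center if C_k happens to be empty
                Z[k] = X[assign == k].mean(axis=0)
        if len(costs) > 1 and costs[-1] >= costs[-2]:     # stop once the cost no longer decreases
            break
    return Z, assign, costs
```

Consistent with Lemma 3 below, the recorded costs never increase from one iteration to the next.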

PPT demo

Of course, it doesn’t always converge to the optimal solution.

But does the cost converge?

Lemma 3. During the course of the K-Means algorithm, the cost monotonically
decreases.
Proof. Let z_1^{(t)}, ..., z_K^{(t)}, C_1^{(t)}, ..., C_K^{(t)} denote the centers and clusters at the start of the t-th iteration of K-Means. The first step of the iteration assigns each data point to its closest center; therefore the cluster assignment can only improve:

cost(C_1^{(t+1)}, ..., C_K^{(t+1)}, z_1^{(t)}, ..., z_K^{(t)}) \le cost(C_1^{(t)}, ..., C_K^{(t)}, z_1^{(t)}, ..., z_K^{(t)}).

In the second step, each cluster is re-centered at its mean, so the representatives can only improve. By Lemma 1,

cost(C_1^{(t+1)}, ..., C_K^{(t+1)}, z_1^{(t+1)}, ..., z_K^{(t+1)}) \le cost(C_1^{(t+1)}, ..., C_K^{(t+1)}, z_1^{(t)}, ..., z_K^{(t)}). □

So does the cost converge?

Example of how K-Means could converge to the wrong thing

How might you make K-Means more likely to converge to the optimal?

How might you choose K? (Why can’t you measure test error?)

Other ways to evaluate clusters (“cluster validation”)

There are loads of cluster validity measures, alternatives to the cost.

Draw a picture

• Davies-Bouldin Index - looks at the average intracluster distance (within-cluster distance) to the centroid (want it to be small), and the intercluster distances between centroids (want them to be large); a sketch of computing it follows this list.
• Dunn Index - looks pairwise at minimal intercluster distance (want it to be
large) and maximal intracluster distance (want it to be small).
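
To make the first of these concrete, here is a rough NumPy sketch of the Davies-Bouldin index in its standard form (smaller is better). The names are mine; scikit-learn's davies_bouldin_score (in sklearn.metrics) provides a library implementation if you'd rather not hand-roll it.

```python
import numpy as np

def davies_bouldin(X, assign, Z):
    """Davies-Bouldin index: average over clusters of the worst (spread_i + spread_j) / separation_ij."""
    K = len(Z)
    # s[k] = average intracluster distance from the points of cluster k to its centroid z_k
    s = np.array([np.linalg.norm(X[assign == k] - Z[k], axis=1).mean() for k in range(K)])
    total = 0.0
    for i in range(K):
        # compare cluster i against its most "confusable" neighbor
        ratios = [(s[i] + s[j]) / np.linalg.norm(Z[i] - Z[j]) for j in range(K) if j != i]
        total += max(ratios)
    return total / K    # small when clusters are tight and centroids are far apart
```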

Example: Microarray data. We have 6830 genes (rows) and 64 patients (columns). The color of each box is a measurement of the expression level of a gene. The expression level of a gene is basically how much of its special protein it is producing. The physical chip itself doesn't actually measure protein levels, but a proxy for them (namely RNA, which sticks to the DNA on the chip). If the color is green, it means low expression levels; if the color is red, it means higher expression levels. Each patient is represented by a vector, which is the expression levels of their genes. It's a column vector with values given in color:

© source unknown. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/fairuse.

Each patient (column) has some type of cancer. We want to cluster patients to see whether patients with the same types of cancer cluster together. So each cluster center is an "average" patient expression level vector for some type of cancer. It's also a column vector.
[Figure: total within-cluster sum of squares (×10^4) versus the number of clusters K, for K = 2, ..., 10; the sum of squares decreases steadily from about 26 to 16 (×10^4) as K grows.]
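
A curve like the one in this figure can be produced by running K-means over a range of K and recording the final cost. A sketch, assuming scikit-learn is available; X below is a random stand-in for the real 64 × 6830 patient matrix, which is not reproduced in these notes:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(64, 100))   # stand-in for the 64 x 6830 expression matrix

costs = []
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    costs.append(km.inertia_)    # inertia_ is the within-cluster sum of squares, i.e. our cost
```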

Hm, there's no kink in this figure. Compare the K = 3 solution with the "true" clusters:

Cluster   Breast   CNS   Colon   K562   Leukemia   MCF7
1         3        5     0       0      0          0
2         2        0     0       2      6          2
3         2        0     7       0      0          0

Cluster   Melanoma   NSCLC   Ovarian   Prostate   Renal   Unknown
1         1          7       6         2          9       1
2         7          2       0         0          0       0
3         0          0       0         0          0       0

Images by MIT OpenCourseWare, adapted from Hastie et al., The Elements of Statistical Learning,
Springer, 2009.
It’s pretty good at keeping the same cancers in the same cluster. The two breast
cancers in the 2nd cluster were actually melanomas that metastasized.

Generally we cluster genes, not patients. Would really like to get something like
this in practice:

Courtesy of the Rockefeller University Press. Used with permission.


Figure 7 from Rumfelt, Lynn, et al. "Lineage Specification and Plasticity in CD19- Early B
cell Precursors." Journal of Experimental Medicine 203 (2006): 675-87.

where each row is a gene, and the columns are different immune cell types.

A major issue with K-means: as K changes, cluster membership can change arbitrarily. A solution is Hierarchical Clustering (a small code sketch follows this list).
• clusters at the next level of the hierarchy are created by merging clusters at the next lowest level.
  – lowest level: each cluster has 1 example
  – highest level: there's only 1 cluster, containing all of the data.
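
A minimal sketch of this bottom-up (agglomerative) scheme using SciPy; the "average" linkage rule and the random stand-in data are my own choices, not something the notes prescribe. The point to notice is that, unlike K-means, the K-cluster and (K+1)-cluster solutions are nested: one is obtained from the other by a single merge, so memberships never get reshuffled arbitrarily as K changes.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(64, 20))      # stand-in data: 64 examples in R^20

tree = linkage(X, method="average")                     # repeatedly merge the two closest clusters
labels_3 = fcluster(tree, t=3, criterion="maxclust")    # cut the tree into 3 clusters...
labels_4 = fcluster(tree, t=4, criterion="maxclust")    # ...or into 4; the 4-cluster solution refines the 3-cluster one
```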

[Figure: dendrogram from hierarchical clustering of the 64 cancer samples, with leaves labeled by cancer type (BREAST, CNS, COLON, K562, LEUKEMIA, MCF7, MELANOMA, NSCLC, OVARIAN, PROSTATE, RENAL, UNKNOWN); samples of the same cancer type mostly sit next to each other in the tree.]

Image by MIT OpenCourseWare, adapted from Hastie et al., The Elements of


Statistical Learning, Springer, 2009.

Application Slides

MIT OpenCourseWare
http://ocw.mit.edu

15.097 Prediction: Machine Learning and Statistics


Spring 2012

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
