
Unsupervised learning: PCA and k-means clustering

Seth Flaxman

Imperial College London

3 July 2019

Based on slides from Simon Rogers & Maurizio Filippone
A problem - too many features

I Aim: to build a classifier that can diagnose leukaemia using gene expression data.
I Data: 27 healthy samples, 11 leukaemia samples (N = 38). Each sample is the expression (activity) level for 3751 genes. (Also have an independent test set.)
I In general, the number of parameters will increase with the number of features – d = 3751.
I e.g. Logistic regression – w would have length 3751!
I Fitting lots of parameters is hard.

Features

I For visualisation, most examples we've seen have had only 2 features x = [x1 , x2 ]T.
I Now, we've been given lots (3751) to start with.
I We need to reduce this number.
I 2 general schemes:
I Use a subset of the originals.
I Make new ones by combining the originals.
Making new features

I An alternative to choosing features is making new ones.
I Cluster:
I Cluster the features (turn our clustering problem around).
I If we use say K-means, our new features will be the K mean vectors (a sketch follows this slide).
I Projection/combination:
I Reduce the number of features by projecting into a lower dimensional space.
I Do this by making new features that are (linear) combinations of the old ones.
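A minimal sketch (not from the slides) of the clustering idea, assuming the data sit in an N × d numeric matrix X with samples in rows and genes in columns; the toy matrix below is a stand-in for the real expression data:

# Sketch: make K new features by clustering the original features with K-means.
# X is a random stand-in for the N x d (38 x 3751) expression matrix.
set.seed(1)
X <- matrix(rnorm(38 * 200), nrow = 38)
K <- 5
fit <- kmeans(t(X), centers = K)   # cluster the columns (features) of X
Z <- t(fit$centers)                # N x K: each sample described by K feature-cluster means
dim(Z)                             # 38 x 5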
Projection

[Figure: a 3-dimensional object (a hand) and its 2-dimensional projection (a shadow)]
Projection

I We can project data (d dimensions) into a lower number of dimensions (m).
I Z = XW
I X is N × d
I W is d × m
I Z is N × m – an m-dimensional representation of our N objects.
I W defines the projection.
I Changing W is like changing where the light is coming from for the shadow (or rotating the hand). (X is the hand, Z is the shadow.)
I Once we've chosen W we can project test data into this new space too: Znew = Xnew W
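A minimal sketch of the projection itself; the matrices below are random stand-ins (a sensible W would come from PCA, next):

set.seed(1)
N <- 38; d <- 10; m <- 2
X <- matrix(rnorm(N * d), nrow = N)    # N x d data
W <- matrix(rnorm(d * m), nrow = d)    # d x m projection matrix (arbitrary here)
Z <- X %*% W                           # N x m representation of the N objects
Xnew <- matrix(rnorm(5 * d), nrow = 5)
Znew <- Xnew %*% W                     # test data projected into the same space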
Choosing W

I Different W will give us different projections (imagine moving the light).
I Which should we use?
I Not all will represent our data well...

[Figure: a badly chosen projection – "This doesn't look like a hand!"]
Principal Components Analysis

I Principal Components Analysis (PCA) is a method for choosing W.
I It finds the columns of W one at a time (define the jth column as wj).
I Each d × 1 column defines one new dimension.
I Consider one of the new dimensions (columns of Z): zj = Xwj
I PCA chooses wj to maximise the variance of zj:

\frac{1}{N} \sum_{n=1}^{N} (z_{jn} - \mu_j)^2, \qquad \mu_j = \frac{1}{N} \sum_{n=1}^{N} z_{jn}

I Once the first column has been found, w2 is chosen to maximise the variance while being orthogonal to w1, and so on.
PCA – a visualisation

[Figure: scatter plot of the original data in two dimensions, x1 vs x2]

I Original data in 2 dimensions.
I We'd like a 1-dimensional projection.
PCA – a visualisation

[Figure: the same data with three candidate projection directions drawn through it; the projected variances are σz² = 0.39, 1.2 and 1.9]

I Pick some arbitrary w.
I Project the data onto it.
I Compute the variance (on the line).
I The position on the line is our 1-dimensional representation.
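A minimal sketch of the "pick w, project, compute the variance" step on toy 2-d data; the candidate directions below are illustrative, not the ones in the figure:

set.seed(1)
X <- matrix(rnorm(200), ncol = 2)       # toy 2-d data, one row per point
proj_var <- function(w, X) {
  w <- w / sqrt(sum(w^2))               # unit-length direction
  z <- drop(X %*% w)                    # position of each point on the line
  var(z)                                # variance along the line (sample variance)
}
proj_var(c(1, 0), X)                    # variance along one candidate direction
proj_var(c(1, 1), X)                    # another; PCA picks the w maximising this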
PCA – analytic solution

I Could search for w1 , . . . , wM.
I But an analytic solution is available.
I The wj are the eigenvectors of the covariance matrix of X.
I R: prcomp or princomp
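A minimal sketch checking the analytic solution on toy data: the principal directions returned by prcomp agree (up to sign) with the eigenvectors of the covariance matrix:

set.seed(1)
X <- matrix(rnorm(200), ncol = 2)
pca <- prcomp(X)                       # centres X by default
W <- pca$rotation                      # columns are w1, w2
eigen(cov(X))$vectors                  # the same directions, up to sign
Z <- pca$x                             # the projected data: centred X %*% W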
PCA – analytic solution

[Figure: the data with the first principal component drawn along the direction of maximum variance, σz² = 1.9]

I What would be the second component?


PCA – leukaemia data

[Figure: scatter plot of the first two principal components, z1 vs z2]

First two principal components in our leukaemia data (points labeled by class).
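A hedged sketch of how such a plot could be produced with prcomp; expr and labels below are hypothetical stand-ins for the 38 × 3751 expression matrix and the class labels, which aren't reproduced here:

set.seed(1)
expr <- matrix(rnorm(38 * 3751), nrow = 38)             # stand-in for the expression data
labels <- c(rep("healthy", 27), rep("leukaemia", 11))   # stand-in class labels
pca <- prcomp(expr)
plot(pca$x[, 1], pca$x[, 2], pch = 19,
     col = ifelse(labels == "healthy", "blue", "red"),
     xlab = "z1", ylab = "z2")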
Summary

I Sometimes we have too much data (too many dimensions).
I Need to select features.
I Features can be dimensions that already exist.
I Or we can make new ones.
I We've seen one example of each.
Clustering

I What if we just have the xn (and no labels)?
Clustering

I For example:
I xn is a binary vector indicating the products customer n has bought.
I Can group customers that buy similar products.
I Can group products bought together.
I Known as Clustering.
I And is an example of unsupervised learning.
Clustering

[Figure: two scatter plots of the same 2-d data – left: the raw data; right: the data after clustering]

I In this example each object has two attributes: xn = [xn1 , xn2 ]T
I Left: data.
I Right: data after clustering (points coloured according to cluster membership).
What we’ll cover

I K-means
I But note: there are dozens and dozens of other clustering
methods out there!
K-means

I Assume that there are K clusters.
I Each cluster is defined by a position in the input space: µk = [µk1 , µk2 ]T
I Each xn is assigned to its closest cluster:

[Figure: 2-d data points coloured according to the cluster centre they are closest to]

I Distance is normally the squared Euclidean distance: dnk = (xn − µk )T (xn − µk )
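A minimal sketch of this assignment step: squared Euclidean distances from every point to every centre, then the closest centre for each point (the toy data and centres are illustrative):

set.seed(1)
X <- matrix(rnorm(200), ncol = 2)            # N x 2 data
mu <- X[sample(nrow(X), 3), ]                # K = 3 centres, here just random points
# dnk = (xn - muk)^T (xn - muk) for every point n and centre k
D <- sapply(1:nrow(mu), function(k) colSums((t(X) - mu[k, ])^2))
cluster <- max.col(-D)                       # index of the closest centre for each point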
How do we find µk ?

I No analytical solution – we can't write down µk as a function of X.
I Use an iterative algorithm (a sketch in R follows):
1. Guess µ1 , µ2 , . . . , µK
2. Assign each xn to its closest µk
3. znk = 1 if xn is assigned to µk (0 otherwise)
4. Update µk to the average of the xn assigned to µk :

\mu_k = \frac{\sum_{n=1}^{N} z_{nk} x_n}{\sum_{n=1}^{N} z_{nk}}

5. Return to 2 until the assignments do not change.
I The algorithm will converge: it will reach a point where the assignments don't change.
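A minimal sketch of the loop above on toy data (in practice R's built-in kmeans() does this; the helper name my_kmeans is just illustrative, and for simplicity it assumes no cluster ever ends up empty):

my_kmeans <- function(X, K, max_iter = 100) {
  mu <- X[sample(nrow(X), K), , drop = FALSE]    # 1. guess the K means
  assign <- rep(0L, nrow(X))
  for (iter in 1:max_iter) {
    # 2./3. assign each xn to its closest mu_k (squared Euclidean distance)
    D <- sapply(1:K, function(k) colSums((t(X) - mu[k, ])^2))
    new_assign <- max.col(-D)
    if (all(new_assign == assign)) break         # 5. stop when assignments don't change
    assign <- new_assign
    # 4. update each mu_k to the average of the points assigned to it
    for (k in 1:K) mu[k, ] <- colMeans(X[assign == k, , drop = FALSE])
  }
  list(centers = mu, cluster = assign)
}
set.seed(1)
X <- rbind(matrix(rnorm(100), ncol = 2), matrix(rnorm(100, mean = 4), ncol = 2))
fit <- my_kmeans(X, K = 2)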
K-means – example

[Figures: a sequence of scatter plots of 2-d data showing the algorithm step by step]

I Cluster means randomly assigned (top left); points assigned to their closest mean.
I Cluster means updated to the mean of their assigned points.
I Points re-assigned to the closest mean and the means updated again – the two steps alternate.
I Solution at convergence.
When does K-means break?

[Figure: data arranged in two concentric rings]

I Data has clear cluster structure.
I Outer cluster cannot be represented as a single point.
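A minimal sketch of this failure mode on synthetic two-ring data: kmeans with K = 2 cuts straight across both rings instead of separating the inner ring from the outer one:

set.seed(1)
theta <- runif(200, 0, 2 * pi)
ring <- rep(c(0.3, 1.0), each = 100)                      # inner and outer radii
X <- cbind(ring * cos(theta), ring * sin(theta)) +
     matrix(rnorm(400, sd = 0.05), ncol = 2)              # noisy concentric rings
fit <- kmeans(X, centers = 2)
table(fit$cluster, rep(c("inner", "outer"), each = 100))  # each cluster contains both rings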
