
PRINCIPLES OF MACHINE LEARNING

UNSUPERVISED LEARNING
ACADEMIC YEAR 2021/2022
QUEEN MARY UNIVERSITY OF LONDON

SOLUTIONS

EXERCISE #1 (SOL): Let's plot the dataset first:


[Figure: scatter plot of the dataset, x and y axes from 0 to 8]

The histograms for the marginal densities p(x) and p(y) are:
[Figure: histograms of the marginals p(x) (left) and p(y) (right); bin heights are multiples of 1/18, bins span 0 to 7]

The histograms for the marginal densities p(x|z = A) and p(x|z = B) are:

[Figure: histograms of p(x|z = A) (bin heights in multiples of 1/4) and p(x|z = B) (bin heights in multiples of 1/14); bins span 0 to 7]

The mean µ of p(x, y) can be calculated by averaging x and y. The result is µ = [4.3, 4.3]^T. We shouldn't expect Σ to be diagonal: the plot shows that x and y are not independent; specifically, higher values of x are associated with higher values of y.

As previously mentioned, µ = [µx, µy]^T, where µx and µy are the means of the marginal densities p(x) and p(y). The probability density p(x, y) cannot be expressed as the product of p(x) and p(y), as x and y are not independent.
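
As a quick illustration of the factorisation test, the following Python sketch uses a small hypothetical 2x2 joint probability table (not the exercise's data): if x and y were independent, every joint probability would equal the product of its marginals.

    import numpy as np

    # Hypothetical 2x2 joint probability table p(x, y); NOT the exercise data.
    joint = np.array([[0.4, 0.1],
                      [0.1, 0.4]])

    p_x = joint.sum(axis=1)            # marginal p(x) = [0.5, 0.5]
    p_y = joint.sum(axis=0)            # marginal p(y) = [0.5, 0.5]
    independent = np.outer(p_x, p_y)   # the joint IF x and y were independent

    print(np.allclose(joint, independent))   # False: p(x, y) != p(x)p(y)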

If we treat the samples where z = A and z = B separately, we can build two new probability densities, namely p(x, y|z = A) and p(x, y|z = B). Their means are µA = [2, 2]^T and µB = [5, 5]^T. Note that µ = (4 × µA + 14 × µB)/18, where 4 is the number of A samples, 14 is the number of B samples and 18 is the total number of samples in the dataset.
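
This weighted-average relationship is easy to verify numerically; here is a minimal Python sketch using only the means and sample counts quoted above:

    import numpy as np

    # Conditional means and sample counts taken from the solution above.
    mu_A, n_A = np.array([2.0, 2.0]), 4    # mean of p(x, y | z = A)
    mu_B, n_B = np.array([5.0, 5.0]), 14   # mean of p(x, y | z = B)

    # The overall mean is the count-weighted average of the conditional means.
    mu = (n_A * mu_A + n_B * mu_B) / (n_A + n_B)
    print(mu)   # [4.333... 4.333...], i.e. approximately [4.3, 4.3]^T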

After applying PCA we obtain:


[Figure: scatter plot of the PCA-transformed dataset; the first component spans roughly -2.5 to 2.5]

Note that PCA rotates the dataset so that the directions along which the data spreads are aligned with the new axes. We can obtain the transformation visually or by using Python (Matlab, R...).

In this case, we would expect the covariance matrix of the new probability distribution p(x′, y′) to be diagonal, as the new attributes (components) are independent. Independence also means that p(x′, y′) = p(x′)p(y′).
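
As a sketch of this check in Python (the 18 original points are only available in the figure, so the data below is an illustrative correlated sample, not the exercise's dataset):

    import numpy as np
    from sklearn.decomposition import PCA

    # Illustrative correlated 2D data with the same qualitative behaviour
    # as the exercise's plot: higher x tends to give higher y.
    rng = np.random.default_rng(0)
    x = rng.uniform(1, 7, size=50)
    y = x + rng.normal(0, 0.5, size=50)
    X = np.column_stack([x, y])

    print(np.cov(X.T))       # off-diagonal entries clearly non-zero

    # PCA rotates the data so the directions of spread align with the axes.
    X_new = PCA(n_components=2).fit_transform(X)
    print(np.cov(X_new.T))   # off-diagonal entries are numerically ~0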

EXERCISE #2 (SOL): The first step is always to plot the dataset:


[Figure: scatter plot of the dataset, x and y axes from 0 to 7]

If the initial values of the prototypes are µ1 = [0, 0]^T and µ2 = [7, 7]^T, the two identified clusters will be as follows:
[Figure: the two clusters identified from this initialisation, axes from 0 to 7]

The final prototypes will be in the locations µ1 = [2, 2]^T and µ2 = [5, 5]^T. The intra-cluster sample scatter can be calculated by adding the squared distances between all pairs of samples within a cluster, or by adding the squared distances between the samples and the cluster centre and multiplying by the cluster size. Let's do it both ways for the first cluster:

I1 = (1² + 1²) + (2² + 0²) + (1² + 1²) + (1² + 1²) + (0² + 2²) + (1² + 1²)
   = 2 + 4 + 2 + 2 + 4 + 2 = 16    (squared distances between all pairs of samples)

I1 = 4 × [(1² + 0²) + (1² + 0²) + (1² + 0²) + (1² + 0²)] = 16    (squared distances to the cluster centre)
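
The agreement between the two ways of computing the scatter can be checked in Python. In the sketch below, the cluster coordinates are hypothetical, read off the figure, but consistent with the squared distances used above:

    import numpy as np
    from itertools import combinations

    def scatter_pairs(points):
        # Sum of squared distances over all pairs of samples in the cluster.
        return sum(np.sum((p - q) ** 2) for p, q in combinations(points, 2))

    def scatter_centroid(points):
        # Cluster size times the sum of squared distances to the centroid.
        centroid = points.mean(axis=0)
        return len(points) * np.sum((points - centroid) ** 2)

    # Hypothetical coordinates for cluster 1, consistent with the squared
    # distances used in the solution (the true points are in the figure).
    cluster1 = np.array([[1, 2], [2, 1], [3, 2], [2, 3]], dtype=float)

    print(scatter_pairs(cluster1), scatter_centroid(cluster1))   # 16.0 16.0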


Cluster 2 has the same intra-cluster sample scatter, I2 = 16. Therefore, the overall quality is I1 + I2 = 32. If the initial values of the prototypes are µ1 = [1, 6]^T and µ2 = [6, 1]^T, the two identified clusters will be:
[Figure: the two clusters identified from the second initialisation, axes from 0 to 7]

The final prototypes will be in the locations µ1 = [4, 3]^T and µ2 = [3, 4]^T. The intra-cluster sample scatter will be the same for both. Let's calculate it for the first cluster based on the squared distances to the cluster centre:

I1 = 4 × [(1² + 1²) + (2² + 2²) + (1² + 1²) + (2² + 2²)] = 4 × (2 + 8 + 2 + 8) = 80


The quality of this clustering arrangement is I1 + I2 = 160.

k-means might provide different final solutions depending on the initial location of the prototypes. k-means solutions can be seen as local minima where the algorithm gets stuck. In this case, the first clustering arrangement (with the lower overall scatter, I1 + I2 = 32) happens to also be the global minimum.
