ECS7020P Unsupervised Exercises Solutions
UNSUPERVISED LEARNING
ACADEMIC YEAR 2021/2022
QUEEN MARY UNIVERSITY OF LONDON
SOLUTIONS

1
[Scatter plot of the dataset, axes 0 to 8.]
The histograms for the marginal densities p(x) and p(y) are:
[Two histograms: p(x) and p(y), bar heights in eighteenths (up to 6/18), over the values 0 to 7.]
The histograms for the conditional densities p(x|z = A) and p(x|z = B) are:
[Two histograms: p(x|z = A) with heights in quarters (up to 2/4) and p(x|z = B) with heights in fourteenths (up to 6/14), over the values 0 to 7.]
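As a minimal sketch, both kinds of histograms can be computed with numpy; the sample values and labels below are hypothetical stand-ins for the exercise's dataset, not its actual coordinates:

```python
import numpy as np

# Hypothetical stand-in for the exercise dataset: 18 values of x with
# class labels z (4 samples of A, 14 of B).
x = np.array([1, 2, 2, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7])
z = np.array(['A'] * 4 + ['B'] * 14)

bins = np.arange(0, 9)  # one bin per integer value 0..7

# Marginal histogram p(x): counts normalised by the total sample size (18).
counts, _ = np.histogram(x, bins=bins)
print(counts / len(x))

# Conditional histograms p(x|z = A) and p(x|z = B): restrict to one class
# and normalise by that class's size (4 and 14 respectively).
for label in ('A', 'B'):
    c, _ = np.histogram(x[z == label], bins=bins)
    print(label, c / c.sum())
```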
The mean µ of p(x, y) can be calculated by averaging x and y. The result is µ = [4.3, 4.3]^T. We shouldn't expect Σ to be diagonal: the plot shows that x and y are not independent; specifically, higher values of x are associated with higher values of y.
As previously mentioned, µ = [µ_x, µ_y]^T, where µ_x and µ_y are the means of the marginal densities p(x) and p(y). The probability density p(x, y) cannot be expressed as the product of p(x) and p(y), as x and y are not independent.
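Both µ and Σ can be obtained in one line each with numpy; the coordinates below are again a hypothetical stand-in for the real data matrix:

```python
import numpy as np

# Hypothetical 18 x 2 data matrix with columns x and y.
data = np.array([[1, 1], [2, 2], [2, 3], [3, 2],
                 [4, 4], [4, 5], [5, 4], [5, 5], [5, 6], [6, 5],
                 [5, 5], [6, 6], [4, 5], [5, 4], [6, 5], [5, 6],
                 [4, 4], [6, 6]])

mu = data.mean(axis=0)              # µ = [µ_x, µ_y]
Sigma = np.cov(data, rowvar=False)  # 2 x 2 covariance matrix Σ

print(mu)     # approximately [4.3, 4.3] for the real dataset
print(Sigma)  # non-zero off-diagonal entries: x and y co-vary
```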
2
If we treat the samples where z = A and z = B separately, we can build two new probability densities, namely p(x, y|z = A) and p(x, y|z = B). Their means are µ_A = [2, 2]^T and µ_B = [5, 5]^T. Note that µ = (4 × µ_A + 14 × µ_B)/18 = [78, 78]^T/18 ≈ [4.33, 4.33]^T, where 4 is the number of A samples, 14 is the number of B samples and 18 is the total number of samples in the dataset.
[Scatter plot of the PCA-transformed dataset, axes from -2.5 to 2.5.]
Note that PCA rotates the dataset so that the directions along which the data spreads are aligned with the new axes. We can obtain the transformation visually or use Python (Matlab, R...).
In this case, we would expect the covariance matrix of the new probability distribution p(x′, y′) to be diagonal, as the new attributes (components) are uncorrelated; under a Gaussian model this also implies independence, which means that p(x′, y′) = p(x′)p(y′).
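As a sketch, the transformation can be computed with scikit-learn's PCA, here applied to the same hypothetical data matrix used above:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical (x, y) data matrix standing in for the exercise dataset.
data = np.array([[1, 1], [2, 2], [2, 3], [3, 2],
                 [4, 4], [4, 5], [5, 4], [5, 5], [5, 6], [6, 5],
                 [5, 5], [6, 6], [4, 5], [5, 4], [6, 5], [5, 6],
                 [4, 4], [6, 6]])

# fit_transform centres the data and rotates it onto the principal axes,
# producing the new attributes x' and y'.
transformed = PCA(n_components=2).fit_transform(data)

# The covariance matrix of the rotated data is diagonal (up to numerical
# precision): the new components are uncorrelated.
print(np.cov(transformed, rowvar=False).round(6))
```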
3
If the initial values of the prototypes are µ1 = [0, 0]^T and µ2 = [7, 7]^T, the two identified clusters will be as follows:
[Plot of the two identified clusters, axes 0 to 7.]
The final prototypes will be in the locations µ1 = [2, 2]^T and µ2 = [5, 5]^T. The intra-cluster sample scatter can be calculated by adding the squared distances between the samples and the cluster centre or, up to a constant factor (2N for a cluster of N samples), by adding the squared distances between every pair of samples within a cluster. Let's calculate it for the first cluster using the distances to the centre:
I1 = (1² + 1²) + (2² + 0²) + (1² + 1²) + (1² + 1²) + (0² + 2²) + (1² + 1²) = 2 + 4 + 2 + 2 + 4 + 2 = 16
Cluster 2 has the same intra-cluster sample scatter, I2 = 16. Therefore, the overall quality is I1 + I2 = 32. If the initial values of the prototypes are µ1 = [1, 6]^T and µ2 = [6, 1]^T, the two identified clusters will be:
[Plot of the two identified clusters, axes 0 to 7.]
The final prototypes will be in the locations µ1 = [4, 3]^T and µ2 = [3, 4]^T. The intra-cluster sample scatter will be the same for both. Let's calculate it for the first cluster based on the squared distances to the cluster centre:
I1 = 4 × [(1² + 1²) + (2² + 2²) + (1² + 1²) + (2² + 2²)] = 4 × (2 + 8 + 2 + 8) = 80
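As a numerical check, the sketch below computes the scatter of a small hypothetical cluster, chosen so that its centroid is [4, 3] and its offsets reproduce one set of the (1² + 1²) and (2² + 2²) terms above, both from the distances to the centre and from the pairwise distances (which total 2N times the centre-based value):

```python
import numpy as np

# Hypothetical cluster whose centroid is [4, 3]; the squared offsets from
# the centre are (1²+1²), (1²+1²), (2²+2²), (2²+2²) = 2, 2, 8, 8.
points = np.array([[3, 4], [5, 2], [6, 5], [2, 1]], dtype=float)
centre = points.mean(axis=0)  # [4, 3]

# Scatter as the sum of squared distances to the cluster centre.
I_centre = np.sum((points - centre) ** 2)  # 2 + 2 + 8 + 8 = 20

# Scatter as the sum of squared distances over all ordered pairs of
# samples; it equals 2N times the centre-based value.
diffs = points[:, None, :] - points[None, :, :]
I_pairs = np.sum(diffs ** 2)

print(I_centre, I_pairs, 2 * len(points) * I_centre)  # 20.0 160.0 160.0
```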
k-means might provide different final solutions depending on the initial location of the prototypes. k-means solutions can be seen as local minima where the algorithm gets stuck. In this case, the first clustering arrangement happens to also be the global minimum.
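This behaviour is easy to reproduce with scikit-learn's KMeans. The data below is a hypothetical stand-in (two blobs centred at [2, 2] and [5, 5], symmetric about the diagonal); the two runs use the initialisations discussed above and report the quality I1 + I2, which KMeans exposes as inertia_:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical dataset: two blobs centred at [2, 2] and [5, 5],
# symmetric about the main diagonal.
data = np.array([[1, 2], [2, 1], [2, 3], [3, 2],
                 [4, 5], [5, 4], [5, 6], [6, 5]], dtype=float)

# Two runs that differ only in the initial prototype locations.
for init in (np.array([[0.0, 0.0], [7.0, 7.0]]),
             np.array([[1.0, 6.0], [6.0, 1.0]])):
    km = KMeans(n_clusters=2, init=init, n_init=1).fit(data)
    # inertia_ is the total squared distance of the samples to their
    # cluster centres, i.e. the overall quality I1 + I2.
    print(km.cluster_centers_, km.inertia_)

# The first initialisation recovers the two blobs (the global minimum);
# the second converges to a worse local minimum that splits the data
# across the diagonal.
```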