ECS7020P Classification Exercises Solutions II
CLASSIFICATION II
ACADEMIC YEAR 2022/2023
QUEEN MARY UNIVERSITY OF LONDON
SOLUTIONS
EXERCISE 1 (SOL): Let's start by plotting the dataset (we will use the symbol X for class A and O for class B):
Both classes overlap. In fact, two samples have the same predictor value but different labels.
• A Gaussian distribution has two parameters, namely the mean µ and standard deviation
σ. The estimator for the mean is:
$$\hat{\mu} = \frac{1}{N}\sum_i x_i,$$
The estimator for the standard deviation can be obtained as the square root of the estimator of the variance σ². There are two estimators for the variance, one biased and one unbiased:
$$\hat{\sigma}^2 = \frac{1}{N}\sum_i (x_i - \hat{\mu})^2 \quad\text{(biased)} \qquad\text{or}\qquad \hat{\sigma}^2 = \frac{1}{N-1}\sum_i (x_i - \hat{\mu})^2 \quad\text{(unbiased)}$$
Using the estimator of the mean and the square root of the unbiased estimator of the
variance we get for each class:
$$\mu_A = (-2 - 1 + 0 + 1 + 2)/5 = 0$$
$$\sigma_A = \sqrt{\big((-2-0)^2 + (-1-0)^2 + (0-0)^2 + (1-0)^2 + (2-0)^2\big)/4} = 1.58$$
$$\mu_B = (1 + 2 + 3 + 4 + 5)/5 = 3$$
$$\sigma_B = \sqrt{\big((1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2\big)/4} = 1.58$$
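As a quick numerical check, a minimal NumPy sketch using the five samples of each class reproduces these estimates:

```python
import numpy as np

# Samples of the predictor for each class, as given in the exercise.
x_A = np.array([-2, -1, 0, 1, 2])
x_B = np.array([1, 2, 3, 4, 5])

for label, x in [("A", x_A), ("B", x_B)]:
    mu_hat = x.mean()            # (1/N) * sum of the samples
    sigma_hat = x.std(ddof=1)    # square root of the unbiased variance estimate (divides by N - 1)
    print(f"Class {label}: mean = {mu_hat:.2f}, std = {sigma_hat:.2f}")
# Expected: mean 0.00 and std 1.58 for class A, mean 3.00 and std 1.58 for class B.
```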
Note that both classes have the same standard deviation. If we plot them, we get:
[Figure: the two estimated Gaussian class densities for A and B, plotted over the range -3 to 6.]
• Given a sample xi, the Bayes classifier compares the posterior probabilities P(A|xi) and P(B|xi) to classify it:
$$\frac{P(A|x_i)}{P(B|x_i)} > 1 \;\rightarrow\; \hat{y}_i = A, \qquad \frac{P(A|x_i)}{P(B|x_i)} < 1 \;\rightarrow\; \hat{y}_i = B$$
Using Bayes' rule, we can express the posterior probabilities in terms of the priors P(A) and P(B) and the class densities p(x|A) and p(x|B). The class densities are Gaussian and have the same standard deviation (as in linear discriminant analysis), and the priors are P(A) = 0.5 and P(B) = 0.5. Since the priors and the standard deviations are equal, the classifier assigns each sample to the class with the closest mean, so the decision boundary is the midpoint (µA + µB)/2 = 1.5: predict A if xi < 1.5 and B if xi > 1.5.
• If the priors are P(A) = 0.1 and P(B) = 0.9 instead and the class densities (also known as likelihoods) are the same, the boundary shifts towards class A: solving P(A)p(x|A) = P(B)p(x|B) gives x = (µA + µB)/2 − σ² ln(P(B)/P(A))/(µB − µA) = 1.5 − 2.5 ln(9)/3 ≈ −0.33, so A is predicted only when xi < −0.33 (see the sketch below).
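A minimal numerical sketch of this comparison, assuming the Gaussian parameters estimated above and using scipy.stats.norm, locates the boundary for both choices of priors:

```python
import numpy as np
from scipy.stats import norm

# Gaussian class densities estimated above: same standard deviation, different means.
mu_A, mu_B, sigma = 0.0, 3.0, np.sqrt(2.5)

def bayes_boundary(prior_A, prior_B):
    """Find (numerically) the x where prior_A * p(x|A) equals prior_B * p(x|B)."""
    xs = np.linspace(-5.0, 8.0, 100001)
    diff = prior_A * norm.pdf(xs, mu_A, sigma) - prior_B * norm.pdf(xs, mu_B, sigma)
    crossing = np.where(np.diff(np.sign(diff)) != 0)[0][0]   # index where the sign flips
    return xs[crossing]

print(bayes_boundary(0.5, 0.5))   # approx 1.5   -> predict A for xi below this value
print(bayes_boundary(0.1, 0.9))   # approx -0.33 -> the boundary shifts towards class A
```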
The covariance matrix of each class density is diagonal (the two predictors are treated as independent within each class), and its diagonal entries are the variances of the marginal class densities:
$$\Sigma = \begin{pmatrix} \sigma_A^2 & 0 \\ 0 & \sigma_B^2 \end{pmatrix}$$
Then, for each class we simply need to estimate the parameters of the marginal class densities.
In total, there are 6 class densities (3 classes × 2 predictors). Note that in this problem the
subindices A and B identify each predictor, whereas in the previous problem they were
used to identify each class instead. The estimated means are:
$$\mu_{\bullet} = \begin{pmatrix} 2 \\ 8 \end{pmatrix}, \qquad \mu_{\bullet} = \begin{pmatrix} 7 \\ 5 \end{pmatrix}, \qquad \mu_{\bullet} = \begin{pmatrix} 3 \\ 3 \end{pmatrix}$$
And the covariance matrices:
$$\Sigma_{\bullet} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad \Sigma_{\bullet} = \begin{pmatrix} 1.1 & 0 \\ 0 & 8.9 \end{pmatrix}, \qquad \Sigma_{\bullet} = \begin{pmatrix} 0.5 & 0 \\ 0 & 0.5 \end{pmatrix}$$
After obtaining the mean and covariance matrix for each class density, it is convenient to check that the results make sense: the means should correspond to the centre of each class, and the variances should describe the spread of the samples around their centres.
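A minimal sketch of this estimation step, assuming hypothetical arrays X (one row per sample, columns xA and xB) and y (class labels), since the raw samples are only given graphically:

```python
import numpy as np

def estimate_class_gaussians(X, y):
    """Estimate a mean vector and a diagonal covariance matrix per class.

    X: hypothetical (N, 2) array with columns xA and xB; y: (N,) array of class labels.
    """
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mu = Xc.mean(axis=0)                # centre of the class
        var = Xc.var(axis=0, ddof=1)        # unbiased variance of each predictor
        params[label] = (mu, np.diag(var))  # diagonal covariance: predictors treated as independent
    return params
```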
[Figure 1: scatter plot of the three classes in the (xA, xB) plane, with both axes ranging from 0 to 10.]
The boundaries of the classifier consist of the points where two or more posterior probabilities
are equal. The posterior probabilities can be expressed in terms of the priors and the class
densities. Note that in this exercise the priors are P(•) = 5/20 = 1/4, P(•) = 5/20 = 1/4 and P(•) = 10/20 = 1/2.
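A sketch of how these boundaries could be traced numerically, assuming the means, covariance matrices and priors listed above (the assignment of parameters to the three coloured classes is not reproduced here, so the ordering below is arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Estimated parameters from above; the mapping to the three coloured classes is assumed.
means  = [np.array([2.0, 8.0]), np.array([7.0, 5.0]), np.array([3.0, 3.0])]
covs   = [np.diag([1.0, 1.0]), np.diag([1.1, 8.9]), np.diag([0.5, 0.5])]
priors = [1/4, 1/4, 1/2]

# Evaluate prior * class density on a grid of (xA, xB) points.
xA, xB = np.meshgrid(np.linspace(0, 10, 401), np.linspace(0, 10, 401))
grid = np.dstack([xA, xB])
scores = np.stack([p * multivariate_normal(m, c).pdf(grid)
                   for m, c, p in zip(means, covs, priors)])

# The predicted class at each grid point; the boundaries are where the argmax changes.
predicted = scores.argmax(axis=0)
```

Plotting the predicted class over the grid (for instance as a filled contour) shows the three decision regions; the boundaries are the curves along which the predicted class changes.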
The confusion matrices for the classifiers xB = 5.5, xB = 7.5 and xB = 9.5 are:

xB = 5.5          Actual •   Actual •
  Predicted •         20         14
  Predicted •          3         16

xB = 7.5          Actual •   Actual •
  Predicted •         21         24
  Predicted •          2          6

xB = 9.5          Actual •   Actual •
  Predicted •         23         30
  Predicted •          0          0
We will assume • is the positive class and • the negative class. The sensitivity is calculated as the fraction of positive samples that are correctly classified, Sensitivity = TP/(TP + FN), and the specificity as the fraction of negative samples that are correctly classified, Specificity = TN/(TN + FP).
For each classifier, we have the following values of sensitivity and specificity:
xB = 0.5:   Sensitivity = 0/23 = 0,    Specificity = 30/30 = 1
xB = 1.5:   Sensitivity = 5/23,        Specificity = 28/30
xB = 3.5:   Sensitivity = 15/23,       Specificity = 25/30
xB = 5.5:   Sensitivity = 20/23,       Specificity = 16/30
xB = 7.5:   Sensitivity = 21/23,       Specificity = 6/30
xB = 9.5:   Sensitivity = 23/23 = 1,   Specificity = 0/30 = 0
Plotting the sensitivity against 1 − specificity we obtain the ROC curve.
[Figure: ROC curve obtained from the classifiers above (sensitivity vs 1 − specificity).]
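A minimal matplotlib sketch, using the sensitivity and specificity values tabulated above, reproduces this curve:

```python
import matplotlib.pyplot as plt

# (sensitivity, specificity) pairs for the thresholds xB = 0.5, 1.5, 3.5, 5.5, 7.5, 9.5.
sens = [0/23, 5/23, 15/23, 20/23, 21/23, 23/23]
spec = [30/30, 28/30, 25/30, 16/30, 6/30, 0/30]

plt.plot([1 - s for s in spec], sens, marker="o")   # x-axis: 1 - specificity
plt.xlabel("1 - specificity")
plt.ylabel("sensitivity")
plt.title("ROC curve for the classifiers xB = c")
plt.show()
```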
For each classifier (there are 6) we need to plot the boundary, count the number of samples
correctly and incorrectly classified for each class, and fill in their confusion matrices.
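A minimal sketch of this counting step for the family xB = xA + c, assuming hypothetical arrays X and y holding the dataset, and assuming a sample is predicted as the positive class (•) when xB < xA + c (the orientation consistent with the sensitivity values below increasing with c):

```python
import numpy as np

def confusion_counts(X, y, c, positive_label):
    """Confusion-matrix counts for the classifier whose boundary is xB = xA + c.

    X: hypothetical (N, 2) array with columns xA, xB; y: (N,) array of labels.
    A sample is predicted positive when xB < xA + c (assumed orientation).
    """
    predicted_pos = X[:, 1] < X[:, 0] + c
    actual_pos = (y == positive_label)
    TP = int(np.sum(predicted_pos & actual_pos))
    FP = int(np.sum(predicted_pos & ~actual_pos))
    FN = int(np.sum(~predicted_pos & actual_pos))
    TN = int(np.sum(~predicted_pos & ~actual_pos))
    return TP, FP, FN, TN

# Calling this for c in (-8.5, -4.5, -1.5, 1.5, 4.5, 8.5) fills in the six confusion matrices.
```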
The sensitivity and specificity values are:
c = -8.5:   Sensitivity = 0/23 = 0,    Specificity = 30/30 = 1
c = -4.5:   Sensitivity = 8/23,        Specificity = 29/30
c = -1.5:   Sensitivity = 16/23,       Specificity = 24/30
c = 1.5:    Sensitivity = 20/23,       Specificity = 12/30
c = 4.5:    Sensitivity = 22/23,       Specificity = 3/30
c = 8.5:    Sensitivity = 23/23 = 1,   Specificity = 0/30 = 0
Plotting the sensitivity against 1 − specificity we obtain the ROC curves for both families. The red curve corresponds to the family xB = xA + c, the blue one to xB = c. The family of classifiers xB = xA + c is slightly better than xB = c.
The area under the curve (AUC) is a measure of goodness for a classifier that can be calibrated, i.e. for a whole family of classifiers obtained by sweeping a threshold: the larger the AUC, the better the family.
[Figure: ROC curves for the two families (sensitivity vs 1 − specificity); red: xB = xA + c, blue: xB = c.]
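As an illustration, the AUC of each family can be approximated from the six tabulated operating points with the trapezoidal rule; this is only a coarse approximation, since the ROC curves are sampled at a few thresholds:

```python
import numpy as np

def auc_from_points(sens, spec):
    """Trapezoidal-rule AUC from (sensitivity, specificity) pairs, ordered by 1 - specificity."""
    fpr = 1 - np.asarray(spec, dtype=float)
    tpr = np.asarray(sens, dtype=float)
    order = np.argsort(fpr)
    return np.trapz(tpr[order], fpr[order])

# Blue family, xB = c (thresholds 0.5 ... 9.5).
auc_blue = auc_from_points([0/23, 5/23, 15/23, 20/23, 21/23, 1], [1, 28/30, 25/30, 16/30, 6/30, 0])
# Red family, xB = xA + c (c = -8.5 ... 8.5).
auc_red = auc_from_points([0/23, 8/23, 16/23, 20/23, 22/23, 1], [1, 29/30, 24/30, 12/30, 3/30, 0])
print(auc_red, auc_blue)   # the red family comes out slightly larger, matching the conclusion above
```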