IML21 Term1
PAPER COMP70050=COMP97101=COMP97151
This time-limited remote assessment has been designed to be open book. You may use resources which have been identified by the examiner to
complete the assessment and are included in the instructions for the examination. You must not use any additional resources when completing
this assessment.
The use of the work of another student, past or present, constitutes plagiarism. Giving your work to another student to use constitutes an
offence. Collusion is a form of plagiarism and will be treated in a similar manner. This is an individual assessment and thus should be completed
solely by you. The College will investigate all instances where an examination or assessment offence is reported or suspected, using plagiarism
software, vivas and other tools, and apply appropriate penalties to students. In all examinations we will analyse exam performance against
previous performance and against data from previous years and use an evidence-based approach to maintain a fair and robust examination. As
with all exams, the best strategy is to read the question carefully and answer as fully as possible, taking account of the time and number of marks
available.
                                                Predictions
#   NN1   NN2   NN3   NN4   NN5   Label   (1-NN)   (3-NN)   (5-NN)
1    +     +     +     ×     +      +       +
2    +     ×     +     +     ×      +       +
3    ×     +     ×     +     +      +       ×
4    ×     +     ×     ×     ×      ×       ×
5    +     ×     ×     ×     +      ×       +
6    ×     +     +     ×     +      ×       ×
i) Compute the predicted labels (+ or ×) for a 3-nearest neighbours and
5-nearest neighbours classifier for each test instance in the table above.
The predictions for a 1-nearest neighbour are provided as an example.
There is no need to provide any calculations for this question. Just provide
the predicted labels (either by filling in the table above or by writing them
down on a blank sheet).
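The majority vote behind a k-nearest neighbours prediction can be sketched in a few lines of Python. This is only an illustration: it assumes the columns NN1 to NN5 are ordered from closest to farthest (consistent with the 1-NN column matching NN1), and the names `neighbours` and `knn_predict` are made up for this sketch.

```python
from collections import Counter

# Neighbour labels copied from the table, assumed ordered NN1 (closest) to NN5.
neighbours = {
    1: ["+", "+", "+", "×", "+"],
    2: ["+", "×", "+", "+", "×"],
    3: ["×", "+", "×", "+", "+"],
    4: ["×", "+", "×", "×", "×"],
    5: ["+", "×", "×", "×", "+"],
    6: ["×", "+", "+", "×", "+"],
}


def knn_predict(labels, k):
    """Majority vote over the k nearest labels (odd k, two classes: no ties)."""
    return Counter(labels[:k]).most_common(1)[0][0]


# predictions[i][k] is the k-NN prediction for test instance i.
predictions = {
    i: {k: knn_predict(labels, k) for k in (1, 3, 5)}
    for i, labels in neighbours.items()
}

for i, p in predictions.items():
    print(i, p[1], p[3], p[5])
```

With k = 1 the vote reduces to the label of NN1, which is a quick check against the example column in the table.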
iii) Assume the weights for the five neighbours of test instance #6 are 0.6,
0.15, 0.1, 0.1, and 0.05 respectively. What is the prediction for a
distance-weighted nearest neighbour classifier for this test instance (+ or
×)? Justify your answer in one sentence (there is no need to show your
calculations).
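A distance-weighted vote sums the weight of each neighbour per class and picks the class with the larger total. The sketch below assumes the five weights apply to NN1 to NN5 of test instance #6 in that order; `weighted_vote` is a hypothetical helper name.

```python
from collections import defaultdict

# Test instance #6: neighbour labels (NN1..NN5) and the weights from the question.
labels = ["×", "+", "+", "×", "+"]
weights = [0.6, 0.15, 0.1, 0.1, 0.05]


def weighted_vote(labels, weights):
    """Return the label with the largest summed weight, plus the per-label totals."""
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return max(totals, key=totals.get), dict(totals)


prediction, totals = weighted_vote(labels, weights)
print(prediction, totals)
```

Here the closest neighbour carries more weight than the three + neighbours combined, so the weighted vote can differ from the unweighted 5-NN majority.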
The network takes the three features as input. There is one hidden neuron (h)
with tanh activation. The output (ŷ) is a single neuron with linear activation,
predicting the resulting mark in the range 0 to 10. The weights of the
network have been randomly initialised and can be seen on the diagram. The
network does not have any bias parameters.
If you need to make any assumptions, state them clearly in your answer.
a You are given normalised feature values for one test taker: 0.5 for the number of
years the person has studied French, 0.6 for the number of years they have
studied any foreign language, and 0.7 for the number of hours they spent
preparing for this test.
Find the result that the model predicts for this person. Show your
calculations and intermediate steps.
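The forward pass for this architecture (three inputs, one tanh hidden neuron, one linear output, no biases) can be sketched as follows. The weights in the diagram are not reproduced in this text, so the values of `w` and `v` below are placeholders chosen only to make the sketch runnable; substitute the values from the diagram.

```python
import math


def forward(x, w, v):
    """Return (z, h, y_hat) for inputs x, input-to-hidden weights w, output weight v."""
    z = sum(wi * xi for wi, xi in zip(w, x))  # pre-activation of the hidden neuron
    h = math.tanh(z)                          # hidden activation
    y_hat = v * h                             # linear output, no bias
    return z, h, y_hat


x = [0.5, 0.6, 0.7]   # normalised features from the question
w = [1.0, 1.0, 1.0]   # PLACEHOLDER weights, not the values from the diagram
v = 1.0               # PLACEHOLDER weight, not the value from the diagram

z, h, y_hat = forward(x, w, v)
print(z, h, y_hat)
```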
b After the person takes the test, they receive a mark of 5.0 as their official result. You
now want to use this knowledge to improve your model.
Calculate the updated values for all the trainable parameters in this network after
one step of stochastic gradient descent using this datapoint. Use mean squared
error as the loss function and 0.5 as the learning rate.
Show the path of your calculations.
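One step of stochastic gradient descent for this network follows from the chain rule. The sketch below makes two assumptions that should be stated in a written answer: the loss for a single datapoint is taken as L = (ŷ - y)², without a factor of 1/2, and the initial weights `w` and `v` are placeholders because the diagram values are not reproduced in this text.

```python
import math


def forward(x, w, v):
    """Hidden activation h and output y_hat for a bias-free 3-1-1 network."""
    h = math.tanh(sum(wi * xi for wi, xi in zip(w, x)))
    return h, v * h


def loss(x, y, w, v):
    """Squared error for one datapoint (assumed form: no 1/2 factor)."""
    return (forward(x, w, v)[1] - y) ** 2


def sgd_step(x, y, w, v, lr):
    """Return updated (w, v) and the gradients after one SGD step."""
    h, y_hat = forward(x, w, v)
    dL_dy = 2.0 * (y_hat - y)             # dL/dy_hat
    grad_v = dL_dy * h                    # dL/dv
    dL_dz = dL_dy * v * (1.0 - h * h)     # back through tanh: 1 - tanh(z)^2
    grad_w = [dL_dz * xi for xi in x]     # dL/dw_i
    new_w = [wi - lr * g for wi, g in zip(w, grad_w)]
    new_v = v - lr * grad_v
    return new_w, new_v, grad_w, grad_v


x, y = [0.5, 0.6, 0.7], 5.0   # datapoint from the question
w, v = [0.1, 0.2, 0.3], 2.0   # PLACEHOLDER weights, not the diagram values
new_w, new_v, grad_w, grad_v = sgd_step(x, y, w, v, lr=0.5)
print(new_w, new_v)
```

If the course convention includes the 1/2 factor in the loss, every gradient above is halved and the updates shrink accordingly.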
c Below is a table with predicted scores and true results for 5 test takers. You want
to evaluate how well the model is able to predict who passes the test (receives a
score ≥ 6.0). Calculate precision, recall and F-score for the system.
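Precision, recall and F-score for the pass/fail decision can be computed by thresholding both the predicted and the true scores at 6.0. The table of scores is not reproduced in this text, so the five score pairs below are invented purely for illustration, and the F-score is assumed to be F1 (the harmonic mean of precision and recall).

```python
def precision_recall_f1(predicted, actual, threshold=6.0):
    """Treat score >= threshold as the positive class ("pass")."""
    pred_pass = [p >= threshold for p in predicted]
    true_pass = [a >= threshold for a in actual]
    tp = sum(p and t for p, t in zip(pred_pass, true_pass))
    fp = sum(p and not t for p, t in zip(pred_pass, true_pass))
    fn = sum(t and not p for p, t in zip(pred_pass, true_pass))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# HYPOTHETICAL scores, not the values from the exam table.
predicted = [7.0, 5.0, 6.5, 4.0, 8.0]
actual = [6.0, 6.5, 5.0, 3.0, 9.0]

precision, recall, f1 = precision_recall_f1(predicted, actual)
print(precision, recall, f1)
```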
The three parts carry, respectively, 30%, 50%, and 20% of the marks.
a Initially, the grid was composed of 100 cells (i.e., 10x10 resolution). However,
we can observe that significantly fewer solutions have been reported.
To avoid this, we want to group all the found solutions into three clusters.
Apply one iteration of the k-Means algorithm with k = 3 and the following
data-points as initial values for the centroids. Show the details of your
calculations.
Please use the Manhattan distance (the L1-norm) as the distance metric for
this question.
Initial Centroids
c1 1 1
c2 9 1
c3 8 9
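One k-Means iteration consists of an assignment step followed by a centroid update. The sketch below uses the Manhattan distance for the assignment and the per-coordinate mean for the update, which is an assumption about the intended procedure. The data-points of the question are not reproduced in this text, so `points` is a made-up list used only to exercise the code.

```python
def manhattan(p, q):
    """L1 distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kmeans_iteration(points, centroids):
    """Assign each point to its nearest centroid (L1), then recompute the means."""
    clusters = [[] for _ in centroids]
    for p in points:
        distances = [manhattan(p, c) for c in centroids]
        # Ties go to the lowest-index centroid (an arbitrary convention).
        clusters[distances.index(min(distances))].append(p)
    new_centroids = []
    for members, old in zip(clusters, centroids):
        if members:
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
            )
        else:
            new_centroids.append(old)  # keep an empty cluster's centroid unchanged
    return clusters, new_centroids


centroids = [(1, 1), (9, 1), (8, 9)]  # c1, c2, c3 from the question
points = [(1, 2), (2, 1), (8, 2), (9, 3), (7, 8), (9, 9)]  # HYPOTHETICAL data

clusters, new_centroids = kmeans_iteration(points, centroids)
print(clusters)
print(new_centroids)
```

In a written answer, the table of distances from every point to each centroid is the detail the question asks to see; the code only automates that table.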
b To make this question independent from the previous one, let’s assume that we
ran k-Means with k = 4 and obtained the following centroids:
c1 1 3
c2 7 3
c3 5 7
c4 8 9
iii) Quantify the diversity, the performance and the QD score of the
collection created above.
The three parts carry, respectively, 40%, 40%, and 20% of the marks.