IML19 Term1
b You are given the dataset below. Each row is a sample email annotated as spam
(or not), given whether or not a word appears in the email (0 indicates that the
word does not appear in the email, 1 indicates that it does).
i) Using the Information Gain metric, which attribute will be selected as the
root node of a decision tree classifier? Please show all calculations
(including the Information Gain of all candidate nodes) to justify your
answer.
ii) Give one reason why one would need to prune a decision tree. Also
describe (in one or two sentences) how a validation set can be useful in
performing pruning.
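As a way of checking hand calculations for part i), the Information Gain of a binary word attribute can be sketched as below. This is a minimal illustration, not a model answer: the helper names `entropy` and `information_gain` are chosen here, and the toy lists are hypothetical because the dataset from the question is not reproduced in this text.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on `feature`."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for f, y in zip(feature, labels) if f == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data, NOT the dataset from the question.
spam = [1, 1, 0, 0]
word_a = [1, 1, 0, 0]  # appears exactly in the spam emails
word_b = [1, 0, 1, 0]  # appears equally often in both classes

gains = {
    "word_a": information_gain(word_a, spam),
    "word_b": information_gain(word_b, spam),
}
root = max(gains, key=gains.get)  # attribute with the highest gain becomes the root
```

The attribute with the largest gain is the one selected as the root node; the same comparison applies to every candidate word in the real dataset.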
$i$    $x^{(i)}$    $y^{(i)}$    $d(x^{(q)}, x^{(i)})$    $w_q^{(i)}$
1      1.5          3.16         ???                      ???
2      2.3          1.45         ???                      ???
3      3.0          1.07         ???                      ???
4      3.8          2.01         ???                      ???
5      4.9          4.51         ???                      ???
ii) Predict the output $y^{(q)}$ for $x^{(q)} = 4.2$ using the k-nearest neighbours
regression algorithm with $d(x^{(q)}, x^{(i)})$ as its distance measure, and
assuming $k = 3$. Show your calculation.
iii) Now predict the output $y^{(q)}$ for $x^{(q)} = 4.2$ using the locally weighted
k-nearest neighbours regression algorithm. Use $k = 3$, the distance
measure $d(x^{(q)}, x^{(i)})$, and the weights $w_q^{(i)}$. Show your calculation.
The three parts carry, respectively, 30%, 40%, and 30% of the marks.
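The unweighted prediction in part ii) can be checked with a short sketch such as the one below. It assumes the distance measure is the absolute difference between the query and each sample, because the definition from part i) is not reproduced in this text; the function name `knn_regress` is chosen here for illustration. The locally weighted variant is left out, since it depends on the weight definition that is also missing.

```python
def knn_regress(x_query, xs, ys, k=3):
    """Average the targets of the k training points closest to x_query.

    Assumption: distance is |x_query - x_i|; the exam's own distance
    measure is defined in a part that is not shown here.
    """
    order = sorted(range(len(xs)), key=lambda i: abs(x_query - xs[i]))
    neighbours = order[:k]  # indices of the k nearest samples
    prediction = sum(ys[i] for i in neighbours) / k
    return prediction, neighbours


# Values taken from the table in the question.
xs = [1.5, 2.3, 3.0, 3.8, 4.9]
ys = [3.16, 1.45, 1.07, 2.01, 4.51]

prediction, neighbours = knn_regress(4.2, xs, ys, k=3)
```

Under that assumption the three nearest samples are those with $x^{(i)}$ equal to 3.8, 4.9 and 3.0, and the prediction is the mean of their targets.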
b Explain the concept of overfitting. Name 3 methods you can use to deal with
overfitting and explain how each of them helps.
ii) Calculate the classification accuracy, along with precision, recall and F1
for both classes.
iii) Analyse the results. Are there any issues? If so, which metrics identify
them?
The three parts carry, respectively, 40%, 30%, and 30% of the marks.
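For part ii), the metrics follow directly from the four cells of a binary confusion matrix. The sketch below uses hypothetical counts, because the confusion matrix from the question is not reproduced in this text; `per_class_metrics` is a helper name chosen here.

```python
def per_class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class, given its TP, FP and FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, NOT the values from the exam question.
tp, fp, fn, tn = 8, 2, 4, 86

accuracy = (tp + tn) / (tp + fp + fn + tn)

# Positive class: uses the counts as given.
positive = per_class_metrics(tp, fp, fn)
# Negative class: the roles swap, so TN acts as TP, FN as FP, and FP as FN.
negative = per_class_metrics(tn, fn, fp)
```

With imbalanced counts like these, accuracy can look high while the recall of the minority class stays noticeably lower, which is the kind of issue part iii) asks about.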
$k$    $\pi_k$    $\mu_k$    $\sigma_k^2$
1      0.5        -2         1
2      0.3        1          4
3      0.2        4          0.25
Suppose you are given an example $x^{(i)} = 0$ at test time. Compute the probability
density $p(x^{(i)} \mid \theta)$ given the parameters of the Gaussian Mixture Model above.
Show your calculations.
Hint: the Gaussian distribution is defined as: $\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
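A hand calculation can be cross-checked numerically with a sketch like the following. It relies on the standard mixture density, $p(x \mid \theta) = \sum_k \pi_k \, \mathcal{N}(x \mid \mu_k, \sigma_k^2)$, and on the parameters from the table above; the names `gaussian_pdf` and `gmm_density` are chosen here for illustration.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean `mu` and variance `var`."""
    return exp(-((x - mu) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def gmm_density(x, components):
    """Mixture density: sum of weight * Gaussian density over all components."""
    return sum(weight * gaussian_pdf(x, mu, var) for weight, mu, var in components)


# (pi_k, mu_k, sigma_k^2) for k = 1, 2, 3, copied from the table in the question.
components = [(0.5, -2.0, 1.0), (0.3, 1.0, 4.0), (0.2, 4.0, 0.25)]

density = gmm_density(0.0, components)
per_component = [w * gaussian_pdf(0.0, mu, var) for w, mu, var in components]
```

Printing `per_component` shows how much each Gaussian contributes at $x = 0$; the third component lies many standard deviations away, so its term is negligible.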
c When developing a neural network, which activation function and loss function
would you use in the output layer for the following applications? Justify your
decisions.
ii) Generating text by predicting the next word in the sequence based on the
previous words.
iii) Detecting whether the camera image from a self-driving car contains a stop
sign.
ii) Given a dataset of 10,000 datapoints, how would you use it to find good
hyperparameters?
The four parts carry, respectively, 20%, 30%, 30%, and 20% of the marks.
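One common way to approach the hyperparameter question is to hold out separate validation and test subsets. The sketch below is only an illustration of that idea: the 80/10/10 proportions, the fixed seed and the function name `split_indices` are assumptions made here, not values given in the question.

```python
import random


def split_indices(n, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle the indices 0..n-1 and cut them into train/validation/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, so the split is reproducible
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains is the test set
    return train, val, test


train, val, test = split_indices(10_000)
```

Candidate hyperparameters would then be fitted on `train` and compared on `val`, with `test` kept aside for a single final evaluation of the chosen configuration.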