
Technical University of Denmark

Written examination: December 18th 2018, 9 AM - 1 PM.

Course name: Introduction to Machine Learning and Data Mining.

Course number: 02450.

Aids allowed: All aids permitted.

Exam duration: 4 hours.

Weighting: The individual questions are weighted equally.

Please hand in your answers using the electronic file. Only use this page in the case where digital hand-in is unavailable. If you have to hand in the answers using the form on this sheet, please follow these instructions:
Print your name and study number clearly. The exam is multiple choice. Each question has four possible answers marked by the letters A, B, C, and D, as well as the answer "Don't know" marked by the letter E. A correct answer gives 3 points, a wrong answer gives -1 point, and "Don't know" (E) gives 0 points.
The individual questions are answered by filling in the answer fields with one of the letters A, B, C, D, or E.

Answers:

1 2 3 4 5 6 7 8 9 10

11 12 13 14 15 16 17 18 19 20

21 22 23 24 25 26 27

Name:

Student number:

PLEASE HAND IN YOUR ANSWERS DIGITALLY.


USE ONLY THIS PAGE FOR HAND IN IF YOU ARE
UNABLE TO HAND IN DIGITALLY.

No. Attribute description Abbrev.
x1 intercolumnar distance interdist
x2 upper margin upperm
x3 lower margin lowerm
x4 exploitation exploit
x5 row number row nr.
x6 modular ratio modular
x7 interlinear spacing interlin
x8 weight weight
x9 peak number peak nr.
x10 modular ratio/ interlinear spacing mr/is
y Who copied the text? Copyist

Table 1: Description of the features of the Avila Bible dataset used in this exam. The dataset has been extracted from images of the 'Avila Bible', a XII-century giant Latin copy of the Bible. The prediction task consists of associating each pattern with one of three copyists (a copyist is the monk who copied the text in the Bible), indicated by the y-value. Note that only a subset of the dataset is used. The dataset used here consists of N = 525 observations, and the attribute y is discrete, taking values y = 1, 2, 3 corresponding to the three different copyists.

Figure 1: Plot of observations x2, x3, x9, x10 of the Avila Bible dataset of Table 1 as percentile plots.
Question 1.
The main dataset used in this exam is the Avila Bible dataset¹ shown in Table 1.
In Figure 1 and Figure 2 are shown, respectively, percentile plots and boxplots of the Avila Bible dataset based on the attributes x2, x3, x9, x10 found in Table 1. Which percentile plots match which boxplots?

A. Boxplot 1 is mr/is, Boxplot 2 is lowerm, Boxplot 3 is upperm and Boxplot 4 is peak nr.

B. Boxplot 1 is upperm, Boxplot 2 is lowerm, Boxplot 3 is peak nr. and Boxplot 4 is mr/is

C. Boxplot 1 is upperm, Boxplot 2 is peak nr., Boxplot 3 is mr/is and Boxplot 4 is lowerm

D. Boxplot 1 is mr/is, Boxplot 2 is lowerm, Boxplot 3 is peak nr. and Boxplot 4 is upperm

E. Don't know.

Figure 2: Boxplots corresponding to the variables plotted in Figure 1, but not necessarily in that order.

¹ Dataset obtained from https://archive.ics.uci.edu/ml/datasets/Avila

Question 2.
A Principal Component Analysis (PCA) is carried out on the Avila Bible dataset in Table 1 based on the attributes x1, x3, x5, x6, x7.
The data is standardized by (i) subtracting the mean and (ii) dividing each column by its standard deviation to obtain the standardized matrix X̃. A singular value decomposition is then carried out on the standardized matrix to obtain the decomposition U S V^T = X̃, where

V = \begin{bmatrix}
 0.04 & -0.12 & -0.14 &  0.35 &  0.92 \\
 0.06 &  0.13 &  0.05 & -0.92 &  0.37 \\
-0.03 & -0.98 &  0.08 & -0.16 & -0.05 \\
-0.99 &  0.03 &  0.06 & -0.02 &  0.07 \\
-0.07 & -0.05 & -0.98 & -0.11 & -0.11
\end{bmatrix}   (1)

S = \begin{bmatrix}
14.4 & 0.0  & 0.0  & 0.0  & 0.0 \\
 0.0 & 8.19 & 0.0  & 0.0  & 0.0 \\
 0.0 & 0.0  & 7.83 & 0.0  & 0.0 \\
 0.0 & 0.0  & 0.0  & 6.91 & 0.0 \\
 0.0 & 0.0  & 0.0  & 0.0  & 6.01
\end{bmatrix}

Figure 3: Black dots show attributes x5 and x7 of the Avila Bible dataset from Table 1. The two points corresponding to the colored markers indicate two specific observations A, B.

Which one of the following statements is true?

A. The variance explained by the first principal component is greater than 0.45

B. The variance explained by the first four principal components is less than 0.85

C. The variance explained by the last four principal components is greater than 0.56

D. The variance explained by the first three principal components is less than 0.75

E. Don't know.
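For questions of this type, the variance explained by a set of principal components follows from the squared singular values on the diagonal of S; a minimal Python sketch using the values above:

import numpy as np

# Singular values from the diagonal of S in Question 2.
s = np.array([14.4, 8.19, 7.83, 6.91, 6.01])

# The variance captured by each component is proportional to its squared singular value.
var_explained = s**2 / np.sum(s**2)

print(var_explained)            # fraction of variance per principal component
print(var_explained[:1].sum())  # first component
print(var_explained[:3].sum())  # first three components
print(var_explained[:4].sum())  # first four components
print(var_explained[1:].sum())  # last four components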

Question 3.
Consider again the PCA analysis of the Avila Bible dataset. In Figure 3 the features x5 and x7 from Table 1 are plotted as black dots. We have indicated two special observations as colored markers (Point A and Point B). We can imagine that the dataset, along with the two special observations, is projected onto the first two principal component directions given in V as computed earlier (see Equation (1)). Which one of the four plots in Figure 4 shows the correct PCA projection?

A. Plot A

B. Plot B

C. Plot C

D. Plot D

E. Don't know.

Figure 4: Candidate plots of the observations and path shown in Figure 3 projected onto the first two principal components considered in Equation (1). The colored markers still refer to points A and B, now in the coordinate system corresponding to the PCA projection.
        o1    o2    o3    o4    o5    o6    o7    o8    o9    o10
o1      0.0   2.91  0.63  1.88  1.02  1.82  1.92  1.58  1.08  1.43
o2      2.91  0.0   3.23  3.9   2.88  3.27  3.48  4.02  3.08  3.47
o3      0.63  3.23  0.0   2.03  1.06  2.15  2.11  1.15  1.09  1.65
o4      1.88  3.9   2.03  0.0   2.52  1.04  2.25  2.42  2.18  2.17
o5      1.02  2.88  1.06  2.52  0.0   2.44  2.38  1.53  1.71  1.94
o6      1.82  3.27  2.15  1.04  2.44  0.0   1.93  2.72  1.98  1.8
o7      1.92  3.48  2.11  2.25  2.38  1.93  0.0   2.53  2.09  1.66
o8      1.58  4.02  1.15  2.42  1.53  2.72  2.53  0.0   1.68  2.06
o9      1.08  3.08  1.09  2.18  1.71  1.98  2.09  1.68  0.0   1.48
o10     1.43  3.47  1.65  2.17  1.94  1.8   1.66  2.06  1.48  0.0

Table 2: The pairwise Euclidean distances, d(o_i, o_j) = \|x_i - x_j\|_2 = \sqrt{\sum_{k=1}^{M} (x_{ik} - x_{jk})^2}, between 10 observations from the Avila Bible dataset (recall M = 10). Each observation o_i corresponds to a row of the data matrix X of Table 1 (the data has been standardized). The colors indicate classes such that the black observations {o1, o2, o3} belong to class C1 (corresponding to copyist one), the red observations {o4, o5, o6, o7, o8} belong to class C2 (corresponding to copyist two), and the blue observations {o9, o10} belong to class C3 (corresponding to copyist three).

Question 4. To examine whether observation o4 may be an outlier, we will calculate the average relative density based on Euclidean distance and the observations given in Table 2 only. We recall that the KNN density and average relative density (ard) for the observation x_i are given by:

\mathrm{density}_{X\setminus i}(x_i, K) = \frac{1}{\frac{1}{K} \sum_{x' \in N_{X\setminus i}(x_i, K)} d(x_i, x')},

\mathrm{ard}_{X}(x_i, K) = \frac{\mathrm{density}_{X\setminus i}(x_i, K)}{\frac{1}{K} \sum_{x_j \in N_{X\setminus i}(x_i, K)} \mathrm{density}_{X\setminus j}(x_j, K)},

where N_{X\setminus i}(x_i, K) is the set of K nearest neighbors of observation x_i excluding the i'th observation, and ard_X(x_i, K) is the average relative density of x_i using K nearest neighbors. What is the average relative density for observation o4 for K = 2 nearest neighbors?

A. 1.0

B. 0.71

C. 0.68

D. 0.36

E. Don't know.

Question 5.
Suppose a GMM model is applied to the Avila Bible dataset in the processed version shown in Table 2. The GMM is constructed as having K = 3 components, and each component k of the GMM is fitted by letting its mean vector µ_k be equal to the location of one of the observations o7, o8, o9 (i.e. each observation corresponds to exactly one mean vector) and setting the covariance matrix equal to Σ_k = σ²I, where I is the identity matrix:

N(o_i; \mu_k, \Sigma_k) = \frac{1}{\sqrt{|2\pi\Sigma_k|}} e^{-\frac{d(o_i, \mu_k)^2}{2\sigma^2}}

where |·| is the determinant. The components of the GMM are weighted evenly. If σ = 0.5, and denoting the density of the GMM as p(x), what is the density evaluated at observation o3?

A. p(o3) = 0.048402

B. p(o3) = 0.076

C. p(o3) = 0.005718

D. p(o3) = 0.114084

E. Don't know.
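The definitions in Question 4 translate directly into code operating on a pairwise distance matrix; a minimal sketch (D is assumed to be a NumPy array holding distances such as those in Table 2, and indices are 0-based):

import numpy as np

def knn_density(D, i, K):
    """KNN density of observation i, computed from the pairwise distance matrix D."""
    d = np.delete(D[i], i)         # distances from i to all other observations
    nearest = np.sort(d)[:K]       # K smallest distances
    return 1.0 / nearest.mean()

def avg_rel_density(D, i, K):
    """Average relative density (ard) of observation i, as defined in Question 4."""
    d = D[i].copy()
    d[i] = np.inf                  # exclude the observation itself
    neighbors = np.argsort(d)[:K]  # indices of the K nearest neighbors
    dens_i = knn_density(D, i, K)
    dens_neighbors = np.mean([knn_density(D, j, K) for j in neighbors])
    return dens_i / dens_neighbors

# Example usage, once D contains the 10 x 10 distances of Table 2:
# print(avg_rel_density(D, i=3, K=2))   # o4 is row index 3 when counting from zero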
Figure 5: Proposed hierarchical clustering of the 10 observations in Table 2.

Figure 6: Dendrogram 1 from Figure 5 with a cutoff indicated by the dotted line, thereby generating 3 clusters.

Question 6. A hierarchical clustering is applied to the 10 observations in Table 2 using minimum linkage. Which of the dendrograms shown in Figure 5 corresponds to the clustering?

A. Dendrogram 1

B. Dendrogram 2

C. Dendrogram 3

D. Dendrogram 4

E. Don't know.

Question 7.
Consider dendrogram 1 from Figure 5. Suppose we apply a cutoff (indicated by the black line), thereby generating three clusters. We wish to compare the quality of this clustering, Q, to the ground-truth clustering, Z, indicated by the colors in Table 2. Recall the normalized mutual information of the two clusterings Z and Q is defined as

\mathrm{NMI}[Z, Q] = \frac{\mathrm{MI}[Z, Q]}{\sqrt{H[Z]}\,\sqrt{H[Q]}}

where MI is the mutual information and H is the entropy. Assuming we always use an entropy based on the natural logarithm,

H = -\sum_{i=1}^{n} p_i \log p_i, \qquad \log(e) = 1,

what is the normalized mutual information of the two clusterings?

A. NMI[Z, Q] ≈ 0.313

B. NMI[Z, Q] ≈ 0.302

C. NMI[Z, Q] ≈ 0.32

D. NMI[Z, Q] ≈ 0.274

E. Don't know.
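The NMI in Question 7 can be computed from two label vectors; a minimal sketch using natural-logarithm entropies as stated above (Z below is the ground-truth labelling from Table 2's colors, while Q is only a placeholder, since the actual three-cluster assignment must be read off Figure 6):

import numpy as np

def entropy(labels):
    """Entropy (natural logarithm) of a discrete labelling."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def mutual_information(Z, Q):
    """Mutual information between two labellings of the same observations."""
    mi = 0.0
    for z in np.unique(Z):
        for q in np.unique(Q):
            p_zq = np.mean((Z == z) & (Q == q))
            if p_zq > 0:
                mi += p_zq * np.log(p_zq / (np.mean(Z == z) * np.mean(Q == q)))
    return mi

def nmi(Z, Q):
    return mutual_information(Z, Q) / (np.sqrt(entropy(Z)) * np.sqrt(entropy(Q)))

Z = np.array([1, 1, 1, 2, 2, 2, 2, 2, 3, 3])   # classes of o1..o10 from Table 2
Q = np.array([1, 1, 1, 2, 1, 2, 2, 1, 3, 3])   # placeholder cutoff clustering
print(nmi(Z, Q))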

x9-interval     y = 1   y = 2   y = 3
x9 ≤ 0.13        108     112      56
0.13 < x9         58      75     116

Table 3: Proposed split of the Avila Bible dataset based on the attribute x9. We consider a 2-way split where for each interval we count how many observations belonging to that interval have the given class label.

Question 8. Consider the distances in Table 2 based on 10 observations from the Avila Bible dataset. The class labels C1, C2, C3 (see the table caption for details) will be predicted using a k-nearest neighbour classifier based on the distances given in Table 2. Suppose we use leave-one-out cross-validation (i.e. the observation that is being predicted is left out) and a 1-nearest neighbour classifier (i.e. k = 1). What is the error rate computed for all N = 10 observations?

A. error rate = 4/10

B. error rate = 9/10

C. error rate = 2/10

D. error rate = 6/10

E. Don't know.

Question 9.
Suppose we wish to build a classification tree based on Hunt's algorithm where the goal is to predict Copyist, which can belong to three classes, y = 1, y = 2, y = 3. The first split we consider is a two-way split based on the value of x9 into the intervals indicated in Table 3. For each interval, we count how many observations belong to each of the three classes; the result is indicated in Table 3. Suppose we use the classification error impurity measure; what is then the purity gain ∆?

A. ∆ ≈ 0.485

B. ∆ ≈ 0.078

C. ∆ ≈ 0.566

D. ∆ ≈ 1.128

E. Don't know.

Question 10. Consider the split in Table 3. Suppose we build a classification tree with only this split and evaluate it on the same data it was trained on. What is the accuracy?

A. Accuracy is 0.64

B. Accuracy is 0.29

C. Accuracy is 0.35

D. Accuracy is 0.43

E. Don't know.

Question 11. Suppose s1 and s2 are two text documents containing the text:

s1 = "the bag of words representation should not give you a hard time"
s2 = "remember the representation should be a vector"

The documents are encoded using a bag-of-words encoding assuming a total vocabulary size of M = 10000. No stopword lists or stemming are applied to the dataset. What is the cosine similarity between documents s1 and s2?

A. cosine similarity of s1 and s2 is 0.047619

B. cosine similarity of s1 and s2 is 0.000044

C. cosine similarity of s1 and s2 is 0.000400

D. cosine similarity of s1 and s2 is 0.436436

E. Don't know.
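The cosine similarity in Question 11 depends only on the words the two documents share, since coordinates that are zero in both bag-of-words vectors contribute nothing to the dot product or to the norms (so the vocabulary size M = 10000 does not affect the result); a minimal Python sketch:

import numpy as np
from collections import Counter

def cosine_similarity(doc1, doc2):
    """Cosine similarity of two documents under a bag-of-words encoding."""
    c1, c2 = Counter(doc1.split()), Counter(doc2.split())
    vocab = sorted(set(c1) | set(c2))
    v1 = np.array([c1[w] for w in vocab], dtype=float)
    v2 = np.array([c2[w] for w in vocab], dtype=float)
    return v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))

s1 = "the bag of words representation should not give you a hard time"
s2 = "remember the representation should be a vector"
print(cosine_similarity(s1, s2))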

Figure 7: Output of a logistic regression classifier trained on 7 observations from the dataset.

Figure 8: Proposed ROC curves for the logistic regression classifier in Figure 7.

Question 12. Consider again the Avila Bible dataset. We are particularly interested in predicting whether a bible copy was written by copyist 1, and we therefore wish to train a logistic regression classifier to distinguish between copyist one vs. copyists two and three. To simplify the setup further, we select just 7 observations and train a logistic regression classifier using only the feature x8 as input (as usual, we apply a simple feature transformation to the inputs to add a constant feature in the first coordinate to handle the intercept term). To be consistent with the lecture notes, we label the output as y = 0 (corresponding to copyist one) and y = 1 (corresponding to copyists two and three). In Figure 7 is shown the predicted probability that an observation belongs to the positive class, p(y = 1|x8). What are the weights?

A. w = [-0.93, 1.72]^T

B. w = [-2.82, 0.0]^T

C. w = [1.36, 0.4]^T

D. w = [-0.65, 0.0]^T

E. Don't know.

Question 13.
To evaluate the classifier in Figure 7, we will use the area under curve (AUC) of the receiver operating characteristic (ROC) curve as computed on the 7 observations in Figure 7. In Figure 8 four proposed ROC curves are given; which one of the curves corresponds to the classifier?

A. ROC curve 1

B. ROC curve 2

C. ROC curve 3

D. ROC curve 4

E. Don't know.
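For a small sample such as the 7 observations in Question 13, the AUC equals the fraction of (positive, negative) pairs that the classifier ranks correctly; a minimal sketch with hypothetical labels and predicted probabilities (not the ones shown in Figure 7):

import numpy as np

def auc_from_scores(y_true, scores):
    """AUC as the probability that a random positive is scored above a random negative."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()   # correctly ordered pairs
    ties = (pos[:, None] == neg[None, :]).sum()     # ties count one half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical example: 7 class labels and predicted probabilities p(y = 1 | x8).
y = np.array([0, 0, 1, 0, 1, 1, 1])
p = np.array([0.10, 0.30, 0.35, 0.60, 0.70, 0.80, 0.90])
print(auc_from_scores(y, p))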

      f1  f2  f3  f4  f5  f6  f7  f8  f9  f10
o1     1   1   0   0   0   1   0   0   0   1
o2     1   0   0   0   0   0   0   0   0   0
o3     1   1   0   0   0   1   0   0   0   1
o4     0   1   1   1   0   0   0   1   1   0
o5     1   1   0   0   0   1   0   0   0   1
o6     0   1   1   1   0   0   1   1   1   0
o7     1   1   1   0   0   1   1   1   1   0
o8     0   1   1   1   0   1   1   0   0   1
o9     0   0   0   0   1   1   1   0   1   1
o10    1   0   0   0   0   1   1   1   1   0

Table 4: Binarized version of the Avila Bible dataset. Each of the features fi is obtained by taking a feature xi and letting fi = 1 correspond to a value of xi greater than the median (otherwise fi = 0). The colors indicate classes such that the black observations {o1, o2, o3} belong to class C1 (corresponding to copyist one), the red observations {o4, o5, o6, o7, o8} belong to class C2 (corresponding to copyist two), and the blue observations {o9, o10} belong to class C3 (corresponding to copyist three).

Question 14. We again consider the Avila Bible dataset from Table 1 and the N = 10 observations we already encountered in Table 2. The data is processed to produce 10 new, binary features such that fi = 1 corresponds to a value of xi greater than the median², and we thereby arrive at the N × M = 10 × 10 binary matrix in Table 4. Suppose we train a naïve-Bayes classifier to predict the class label y from only the features f1, f2, f6. If for an observation we observe

f1 = 1, f2 = 1, f6 = 0,

what is then the probability that y = 1 according to the naïve-Bayes classifier?

A. pNB(y = 1|f1 = 1, f2 = 1, f6 = 0) = 50/77

B. pNB(y = 1|f1 = 1, f2 = 1, f6 = 0) = 25/43

C. pNB(y = 1|f1 = 1, f2 = 1, f6 = 0) = 5/11

D. pNB(y = 1|f1 = 1, f2 = 1, f6 = 0) = 10/19

E. Don't know.

Question 15.
Consider the binarized version of the Avila Bible dataset shown in Table 4. The matrix can be considered as representing N = 10 transactions o1, o2, ..., o10 and M = 10 items f1, f2, ..., f10. Which of the following options represents all (non-empty) itemsets with support greater than 0.55 (and only itemsets with support greater than 0.55)?

A. {f1}, {f2}, {f6}, {f7}, {f9}, {f10}, {f1, f6}, {f2, f6}, {f6, f10}

B. {f1}, {f2}, {f6}

C. {f1}, {f2}, {f3}, {f4}, {f6}, {f7}, {f8}, {f9}, {f10}, {f1, f2}, {f2, f3}, {f2, f4}, {f3, f4}, {f1, f6}, {f2, f6}, {f2, f7}, {f3, f7}, {f6, f7}, {f2, f8}, {f3, f8}, {f7, f8}, {f2, f9}, {f3, f9}, {f6, f9}, {f7, f9}, {f8, f9}, {f1, f10}, {f2, f10}, {f6, f10}, {f2, f3, f4}, {f1, f2, f6}, {f2, f3, f7}, {f2, f3, f8}, {f2, f3, f9}, {f6, f7, f9}, {f2, f8, f9}, {f3, f8, f9}, {f7, f8, f9}, {f1, f2, f10}, {f1, f6, f10}, {f2, f6, f10}, {f2, f3, f8, f9}, {f1, f2, f6, f10}

D. {f1}, {f2}, {f3}, {f6}, {f7}, {f8}, {f9}, {f10}, {f1, f2}, {f2, f3}, {f1, f6}, {f2, f6}, {f6, f7}, {f7, f9}, {f8, f9}, {f2, f10}, {f6, f10}, {f1, f2, f6}, {f2, f6, f10}

E. Don't know.

Question 16. We again consider the binary matrix from Table 4 as a market basket problem consisting of N = 10 transactions o1, ..., o10 and M = 10 items f1, ..., f10. What is the confidence of the rule {f1, f3, f8, f9} → {f2, f6, f7}?

A. Confidence is 1/10

B. Confidence is 1

C. Confidence is 1/2

D. Confidence is 3/20

E. Don't know.

² Note that in association mining, we would normally also include features fi such that fi = 1 if the corresponding feature is less than the median; for brevity we will not consider features of this kind in this problem.
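For Questions 15 and 16, support and confidence follow directly from the binary transaction matrix; a minimal sketch on a small hypothetical 4-transaction, 3-item matrix (the same functions apply unchanged to Table 4):

import numpy as np

def support(X, itemset):
    """Fraction of transactions (rows of X) that contain every item in the itemset."""
    return np.mean(np.all(X[:, sorted(itemset)] == 1, axis=1))

def confidence(X, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: supp(A u B) / supp(A)."""
    return support(X, antecedent | consequent) / support(X, antecedent)

# Hypothetical matrix with items indexed 0, 1, 2 (columns) over 4 transactions (rows).
X = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1],
              [1, 0, 1]])
print(support(X, {0, 1}))        # support of the itemset {item 0, item 1}
print(confidence(X, {0}, {1}))   # confidence of the rule {item 0} -> {item 1}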

Figure 9: Example classification tree.

Figure 10: Classification boundary.

Question 17.
Consider again the Avila Bible dataset. Suppose we train a decision tree to classify which of the 3 classes, Copyist 1, Copyist 2, Copyist 3, an observation belongs to. Since the attributes of the dataset are continuous, we will consider binary splits of the form xi ≥ z for different values of i and z, and for simplicity we limit ourselves to the attributes x7 and x9. Suppose the trained decision tree has the form shown in Figure 9, and that according to the tree the predicted label assignment for the N = 525 observations is as given in Figure 10. What is then the correct rule assignment to the nodes in the decision tree?

A. A: x7 ≥ 0.5, B: x9 ≥ 0.54, C: x9 ≥ 0.35, D: x9 ≥ 0.26

B. A: x7 ≥ 0.5, B: x9 ≥ 0.26, C: x9 ≥ 0.54, D: x9 ≥ 0.35

C. A: x9 ≥ 0.54, B: x7 ≥ 0.5, C: x9 ≥ 0.35, D: x9 ≥ 0.26

D. A: x9 ≥ 0.26, B: x7 ≥ 0.5, C: x9 ≥ 0.35, D: x9 ≥ 0.54

E. Don't know.

Question 18. We will again consider the binarized version of the Avila Bible dataset already encountered in Table 4; however, we will now only consider the first M = 6 features f1, f2, f3, f4, f5, f6.
We wish to apply the Apriori algorithm (the specific variant encountered in chapter 19 of the lecture notes) to find all itemsets with support greater than ε = 0.15. Suppose at iteration k = 3 we know that:

L2 = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1
\end{bmatrix}

Recall the key step in the Apriori algorithm is to construct L3 by first considering a large number of candidate itemsets C3′, and then rule out some of them using the downwards-closure principle, thereby saving many (potentially costly) evaluations of support. Suppose L2 is given as above; which of the following itemsets does the Apriori algorithm not have to evaluate the support of?

A. {f2, f3, f4}

B. {f1, f2, f6}

C. {f2, f3, f6}

D. {f1, f3, f4}

E. Don't know.
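The downwards-closure check in Question 18 is easy to automate: a candidate k-itemset only needs its support evaluated if every one of its (k-1)-subsets is frequent; a minimal sketch with L2 written as a list of 2-itemsets of feature indices (one per row of the matrix above):

from itertools import combinations

# Frequent 2-itemsets corresponding to the rows of L2 (1-based feature indices).
L2 = [{1, 2}, {1, 6}, {2, 3}, {2, 4}, {2, 6}, {3, 4}, {3, 6}]

def must_evaluate(candidate, frequent_smaller):
    """A candidate survives pruning only if all of its (k-1)-subsets are frequent."""
    k = len(candidate)
    return all(set(sub) in frequent_smaller for sub in combinations(candidate, k - 1))

for cand in [{2, 3, 4}, {1, 2, 6}, {2, 3, 6}, {1, 3, 4}]:
    verdict = "evaluate support" if must_evaluate(cand, L2) else "pruned by downwards closure"
    print(sorted(cand), verdict)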

Question 19.
Consider again the Avila Bible dataset in Table 1. We would like to predict the copyist using a linear regression, and since we would like the model to be as interpretable as possible we will use variable selection to obtain a parsimonious model. We limit ourselves to the 5 features x1, x5, x6, x8, x9, and in Table 5 we have pre-computed the estimated training and test error for different variable combinations of the dataset. Which of the following statements is correct?

A. Backward selection will select attributes x1

B. Backward selection will select attributes x1, x5, x6, x8

C. Forward selection will select attributes x1, x8

D. Forward selection will select attributes x1, x5, x6, x8

E. Don't know.

Feature(s)              Training RMSE   Test RMSE
none                    3.429           4.163
x1                      3.043           3.252
x5                      3.303           4.52
x6                      3.424           4.274
x8                      3.399           4.429
x9                      2.866           5.016
x1, x5                  3.001           3.44
x1, x6                  3.031           3.423
x5, x6                  3.297           4.641
x1, x8                  3.017           3.42
x5, x8                  3.299           4.485
x6, x8                  3.396           4.519
x1, x9                  2.644           4.267
x5, x9                  2.645           5.495
x6, x9                  2.787           5.956
x8, x9                  2.71            5.536
x1, x5, x6              2.988           3.607
x1, x5, x8              3.0             3.453
x1, x6, x8              3.007           3.574
x5, x6, x8              3.292           4.61
x1, x5, x9              2.523           4.704
x1, x6, x9              2.562           5.184
x5, x6, x9              2.544           6.552
x1, x8, x9              2.517           4.686
x5, x8, x9              2.628           5.532
x6, x8, x9              2.629           6.569
x1, x5, x6, x8          2.988           3.614
x1, x5, x6, x9          2.425           5.725
x1, x5, x8, x9          2.491           4.734
x1, x6, x8, x9          2.433           5.687
x5, x6, x8, x9          2.53            6.597
x1, x5, x6, x8, x9      2.398           5.766

Table 5: Root-mean-square error (RMSE) for the training and test set when using least squares regression to predict y in the Avila dataset using different combinations of the features x1, x5, x6, x8, x9.

Question 20.
Consider the Avila Bible dataset from Table 1. We wish to predict the copyist based on the attributes upperm and mr/is.
Therefore, suppose the attributes have been binarized such that x̃2 = 0 corresponds to x2 ≤ -0.056 (and otherwise x̃2 = 1) and x̃10 = 0 corresponds to x10 ≤ -0.002 (and otherwise x̃10 = 1). Suppose the probabilities for each of the configurations of x̃2 and x̃10 conditional on the copyist y are as given in Table 6, and the prior probabilities of the copyists are

p(y = 1) = 0.316, p(y = 2) = 0.356, p(y = 3) = 0.328.

Using this, what is then the probability an observation was authored by copyist 1 given that x̃2 = 1 and x̃10 = 0?

A. p(y = 1|x̃2 = 1, x̃10 = 0) = 0.25

B. p(y = 1|x̃2 = 1, x̃10 = 0) = 0.313

C. p(y = 1|x̃2 = 1, x̃10 = 0) = 0.262

D. p(y = 1|x̃2 = 1, x̃10 = 0) = 0.298

E. Don't know.

p(x̃2, x̃10|y)          y = 1   y = 2   y = 3
x̃2 = 0, x̃10 = 0        0.19    0.3     0.19
x̃2 = 0, x̃10 = 1        0.22    0.3     0.26
x̃2 = 1, x̃10 = 0        0.25    0.2     0.35
x̃2 = 1, x̃10 = 1        0.34    0.2     0.2

Table 6: Probability of observing particular values of x̃2 and x̃10 conditional on y.
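Question 20 is a direct application of Bayes' theorem, p(y | x̃2, x̃10) ∝ p(x̃2, x̃10 | y) p(y); a minimal sketch using the x̃2 = 1, x̃10 = 0 row of Table 6 and the stated priors:

import numpy as np

# Class-conditional probabilities p(x2~ = 1, x10~ = 0 | y) for y = 1, 2, 3 (row of Table 6).
likelihood = np.array([0.25, 0.2, 0.35])
# Prior probabilities p(y) for y = 1, 2, 3.
prior = np.array([0.316, 0.356, 0.328])

# Bayes' theorem: posterior is proportional to likelihood times prior; normalize to sum to 1.
joint = likelihood * prior
posterior = joint / joint.sum()
print(posterior[0])   # p(y = 1 | x2~ = 1, x10~ = 0)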

Variable   t = 1   t = 2   t = 3   t = 4
y1         1       2       2       2
y2         1       2       2       1
y3         2       2       2       1
y4         1       1       1       2
y5         1       1       1       1
y6         2       2       2       1
y7         1       2       2       1
y8         2       1       1       2
y9         2       2       2       2
y10        1       1       2       2
y11        2       2       1       2
y12        2       1       1       2
y1test     2       1       1       2
y2test     2       2       1       2
εt         0.583   0.657   0.591   0.398
αt         -0.168  -0.325  -0.185  0.207

Table 7: Tabulation of the predicted outputs of the AdaBoost classifiers, as well as the intermediate values αt and εt, when the AdaBoost algorithm is evaluated for T = 4 steps. Note the table includes the predictions for the two test points in Figure 11.

Figure 11: Decision boundaries for a KNN classifier for the first T = 4 rounds of boosting. Notice that in addition to the training data, the plot also indicates the location of two test points.

Question 21.
Consider again the Avila Bible dataset of Table 1. Suppose we limit ourselves to N = 12 observations from the original dataset, and furthermore suppose we limit ourselves to class y = 1 or y = 2 and only consider the features x6 and x8. We wish to apply a KNN classification model (K = 2) to this dataset and apply AdaBoost to improve the performance. During the first T = 4 rounds of boosting, we obtain the decision boundaries shown in Figure 11. The figure also contains two test observations (marked by a cross and a square).
The predictions of the intermediate AdaBoost classifiers, as well as the values of αt and εt, are given in Table 7. Given this information, how will the AdaBoost classifier, as obtained by combining the T = 4 weak classifiers, classify the two test observations?

A. [ỹ1test, ỹ2test] = [1, 1]

B. [ỹ1test, ỹ2test] = [2, 1]

C. [ỹ1test, ỹ2test] = [1, 2]

D. [ỹ1test, ỹ2test] = [2, 2]

E. Don't know.
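For Question 21, the combined AdaBoost prediction is a weighted vote: each round's weak classifier votes for its predicted class with weight αt, and the class with the largest total weight wins; a minimal sketch using the test-point rows and the αt values from Table 7:

import numpy as np

alphas = np.array([-0.168, -0.325, -0.185, 0.207])   # importance weights alpha_t from Table 7

def adaboost_vote(predictions, alphas, classes=(1, 2)):
    """Weighted majority vote over the T weak-classifier predictions for one observation."""
    predictions = np.array(predictions)
    totals = {c: alphas[predictions == c].sum() for c in classes}
    return max(totals, key=totals.get)

y1_test_preds = [2, 1, 1, 2]   # round-by-round predictions for y1test (Table 7)
y2_test_preds = [2, 2, 1, 2]   # round-by-round predictions for y2test (Table 7)
print(adaboost_vote(y1_test_preds, alphas), adaboost_vote(y2_test_preds, alphas))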

Figure 12: Suggested activation curves for an ANN applied to the feature x7 from the Avila Bible dataset.

Question 22.
We will consider an artificial neural network (ANN) applied to the Avila Bible dataset described in Table 1 and trained to predict based on just the feature x7; that is, the neural network is a function that maps from a single real number to a single real number: f(x7) = y. Suppose the neural network takes the form:

f(x, w) = w_0^{(2)} + \sum_{j=1}^{2} w_j^{(2)} h^{(1)}\big( [1\ x]\, w_j^{(1)} \big),

where h^{(1)}(x) = max(x, 0) is the rectified linear function used as activation function in the hidden layer, and the weights are given as:

w_1^{(1)} = \begin{bmatrix} -1.8 \\ -1.1 \end{bmatrix}, \quad
w_2^{(1)} = \begin{bmatrix} -0.6 \\ 3.8 \end{bmatrix}, \quad
w^{(2)} = \begin{bmatrix} -0.1 \\ 2.1 \end{bmatrix}, \quad
w_0^{(2)} = -0.8.

Which of the curves in Figure 12 will then correspond to the function f?

A. ANN output 4

B. ANN output 1

C. ANN output 3

D. ANN output 2

E. Don't know.

Question 23. Suppose a neural network is trained to translate documents. As part of training the network, we wish to select between four different ways to encode the documents (i.e., S = 4 models) and estimate the generalization error of the optimal choice. In the outer loop we opt for K1 = 3-fold cross-validation, and in the inner loop K2 = 4-fold cross-validation. The time taken to train a single model is 20 minutes, and this can be assumed constant for each fold. If the time taken to test a model is negligible, what is the total time required for the 2-level cross-validation procedure?

A. 1020 minutes

B. 2040 minutes

C. 300 minutes

D. 960 minutes

E. Don't know.
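One way to see which curve in Figure 12 the network of Question 22 traces is to evaluate its forward pass on a grid of x7 values; a minimal sketch with the weights stated above:

import numpy as np

w1_1 = np.array([-1.8, -1.1])   # hidden unit 1: [bias term, weight on x7]
w1_2 = np.array([-0.6, 3.8])    # hidden unit 2: [bias term, weight on x7]
w2 = np.array([-0.1, 2.1])      # output-layer weights for the two hidden units
w2_0 = -0.8                     # output-layer bias

def relu(z):
    return np.maximum(z, 0.0)

def f(x):
    """Forward pass: f(x, w) = w0(2) + sum_j wj(2) * relu([1, x] @ wj(1))."""
    h1 = relu(np.array([1.0, x]) @ w1_1)
    h2 = relu(np.array([1.0, x]) @ w1_2)
    return w2_0 + w2[0] * h1 + w2[1] * h2

xs = np.linspace(-2.0, 2.0, 9)               # grid over the x7 range of interest
print([round(f(x), 3) for x in xs])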

Figure 13: Mixture components in a GMM with K = 3.

Figure 14: Scatter plots of each pair of attributes of vectors x drawn from a multivariate normal distribution of 3 dimensions.

Question 24.
We wish to apply the EM algorithm to fit a 1D Gaussian mixture model (GMM) to the single feature x3 from the Avila Bible dataset. At the first step of the EM algorithm, the K = 3 mixture components have densities as indicated by each of the curves in Figure 13 (i.e. each curve is a normalized Gaussian density N(x; µk, σk)). In the figure, we have indicated the x3-value of a single observation i from the dataset as a black cross.
Suppose we wish to apply the EM algorithm to this mixture model beginning with the E-step. We assume the weights of the components are

π = [0.15, 0.53, 0.32]

and the means/variances of the components are those indicated in the figure. According to the EM algorithm, what is the (approximate) probability the black cross is assigned to mixture component 3 (γik)?

A. 0.4

B. 0.86

C. 0.28

D. 0.58

E. Don't know.

Question 25. Consider a multivariate normal distribution with covariance matrix Σ and mean µ, and suppose we generate 1000 random samples from it:

x = [x1, x2, x3]^T ∼ N(µ, Σ).

Plots of each pair of coordinates of the draws x are shown in Figure 14. What is the most plausible covariance matrix?

A. Σ = \begin{bmatrix} 1.0 & 0.65 & -0.65 \\ 0.65 & 1.0 & 0.0 \\ -0.65 & 0.0 & 1.0 \end{bmatrix}

B. Σ = \begin{bmatrix} 1.0 & 0.0 & 0.65 \\ 0.0 & 1.0 & -0.65 \\ 0.65 & -0.65 & 1.0 \end{bmatrix}

C. Σ = \begin{bmatrix} 1.0 & -0.65 & 0.0 \\ -0.65 & 1.0 & 0.65 \\ 0.0 & 0.65 & 1.0 \end{bmatrix}

D. Σ = \begin{bmatrix} 1.0 & 0.0 & -0.65 \\ 0.0 & 1.0 & 0.65 \\ -0.65 & 0.65 & 1.0 \end{bmatrix}

E. Don't know.
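In the E-step of Question 24, the responsibility of component k for observation xi is its weighted density normalized over all components, γik = πk N(xi; µk, σk) / Σj πj N(xi; µj, σj); a minimal sketch in which the component means, standard deviations and the x3-value of the black cross are placeholders (the actual values must be read off Figure 13):

import numpy as np
from scipy.stats import norm

pi = np.array([0.15, 0.53, 0.32])    # mixture weights from Question 24

# Placeholder 1D component parameters standing in for the curves of Figure 13.
mu = np.array([-1.0, 0.0, 1.5])
sigma = np.array([0.5, 1.0, 0.8])

x_i = 1.0                            # placeholder x3-value of the black cross

# E-step responsibilities.
weighted = pi * norm.pdf(x_i, loc=mu, scale=sigma)
gamma = weighted / weighted.sum()
print(gamma)                         # gamma[2] is the responsibility of component 3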

Figure 15: Decision boundaries for a KNN classifier, K = 1, computed for the two observations marked by circles (the colors indicate class labels), but using four different p-distances dp(·, ·) to compute the nearest neighbors.

Question 26.
We consider a K-nearest neighbor (KNN) classifier with K = 1. Recall that in a KNN classifier, we find the nearest neighbors by computing the distances using a distance measure d(x, y). For this problem, we will consider KNN classifiers based on distance measures given by p-norms,

d_p(x, y) = \left( \sum_{j=1}^{M} |x_j - y_j|^p \right)^{1/p}, \quad p \ge 1,

and the decision surfaces they induce. In Figure 15 are shown four different decision boundaries obtained by training the KNN (K = 1) classifiers using the training observations (marked by the two circles in the figure):

x1 = [0.301, 0.514]^T, x2 = [0.34, 0.672]^T,

with corresponding class labels y1 = 0 and y2 = 1, but with distance measures based on p = 1, 2, 4, ∞ (not necessarily plotted in that order). Which norms were used in the four KNN classifiers?

A. KNN classifier 1 corresponds to p = ∞, KNN classifier 2 corresponds to p = 2, KNN classifier 3 corresponds to p = 4, KNN classifier 4 corresponds to p = 1

B. KNN classifier 1 corresponds to p = 4, KNN classifier 2 corresponds to p = 2, KNN classifier 3 corresponds to p = 1, KNN classifier 4 corresponds to p = ∞

C. KNN classifier 1 corresponds to p = 4, KNN classifier 2 corresponds to p = 1, KNN classifier 3 corresponds to p = 2, KNN classifier 4 corresponds to p = ∞

D. KNN classifier 1 corresponds to p = ∞, KNN classifier 2 corresponds to p = 1, KNN classifier 3 corresponds to p = 2, KNN classifier 4 corresponds to p = 4

E. Don't know.

Question 27. Consider a small dataset comprised of N = 9 observations

x = [0.1, 0.3, 0.5, 1.0, 2.2, 3.0, 4.1, 4.4, 4.7].

Suppose a k-means algorithm is applied to the dataset with K = 4 and using Euclidean distances. At a given stage of the algorithm the data is partitioned into the blocks:

{0.1, 0.3}, {0.5, 1}, {2.2, 3, 4.1}, {4.4, 4.7}

What clustering will the k-means algorithm eventually converge to?

A. {0.1, 0.3, 0.5, 1}, {2.2}, {}, {3, 4.1, 4.4, 4.7}

B. {0.1, 0.3}, {0.5, 1}, {2.2, 3}, {4.1, 4.4, 4.7}

C. {0.1, 0.3}, {0.5}, {1, 2.2}, {3, 4.1, 4.4, 4.7}

D. {0.1, 0.3}, {0.5, 1, 2.2, 3}, {4.1, 4.4}, {4.7}

E. Don't know.
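Question 27 can be checked by iterating the two k-means steps (assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster) starting from the given partition; a minimal 1D sketch:

import numpy as np

x = np.array([0.1, 0.3, 0.5, 1.0, 2.2, 3.0, 4.1, 4.4, 4.7])

# Centroids of the partition given in Question 27.
clusters = [[0.1, 0.3], [0.5, 1.0], [2.2, 3.0, 4.1], [4.4, 4.7]]
centroids = np.array([np.mean(c) for c in clusters])

for _ in range(100):                 # iterate until the centroids stop moving
    assign = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
    new_centroids = np.array([x[assign == k].mean() if np.any(assign == k) else centroids[k]
                              for k in range(len(centroids))])
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

for k in range(len(centroids)):
    print(k, x[assign == k])         # final clusters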
