Endsem ML Regular AK
Q.1 The results of an election are to be predicted for candidates based on a dataset D.
Three different hypotheses h1, h2 and h3 are used to predict whether a candidate will
win or lose the election. The probability of h1 given dataset D is 0.5, the probability
of h2 given dataset D is 0.3 and the probability of h3 given dataset D is 0.2. Given a
new candidate, h1 predicts that the candidate will win the election, whereas h2 and h3
predict that the candidate will lose. What is the most probable classification of the
new candidate? [3 Marks]
Solution:
+ = win, - = lose
P(+|D) = P(h1|D) = 0.5; P(-|D) = P(h2|D) + P(h3|D) = 0.3 + 0.2 = 0.5.
The posterior-weighted votes for the two classes are equal, so under Bayes optimal
classification both classifications are equally probable.
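The posterior-weighted voting can be sketched in a few lines of Python (the posteriors and per-hypothesis predictions are taken directly from the question):

```python
# Bayes optimal classification: each hypothesis votes for a class,
# weighted by its posterior probability P(h | D).
posteriors = {"h1": 0.5, "h2": 0.3, "h3": 0.2}        # P(h | D)
predictions = {"h1": "win", "h2": "lose", "h3": "lose"}

votes = {"win": 0.0, "lose": 0.0}
for h, p in posteriors.items():
    votes[predictions[h]] += p
# votes["win"] = 0.5 and votes["lose"] = 0.3 + 0.2 = 0.5
```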
Q.2 Use the kernel trick and find the equation of the hyperplane using a nonlinear SVM.
Positive Points: {(1,0), (3,0), (5,0)} Negative Points: {(0,0), (2,0), (4,0), (6,0)}. Plot
the points before and after the transformation. [5 Marks]
Solution:
φ(x) = x mod 2 [3M]
Equation of hyperplane: y = 0.5 [2M]
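The transformation can be checked in a couple of lines (a minimal sketch; the point coordinates follow the question):

```python
# phi(x) = x mod 2 maps the alternating 1-D points so that all positives
# land at 1 and all negatives at 0; the line y = 0.5 then separates them.
pos = [1, 3, 5]      # x-coordinates of the positive points
neg = [0, 2, 4, 6]   # x-coordinates of the negative points

def phi(x):
    return x % 2

pos_mapped = [phi(x) for x in pos]   # all above y = 0.5
neg_mapped = [phi(x) for x in neg]   # all below y = 0.5
```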
Q.3 Suppose we have the following one-dimensional data: -4.0, -3.0, -2.0, -1.0, 0.0,
1.0, 2.0, 3.0, 4.0. Use the EM algorithm to find a Gaussian mixture model consisting
of exactly one Gaussian that fits the data. Assume that the initial mean of the
Gaussian is 10.0 and the initial variance is 1.0. [7 Marks]
Answer
First we note that π1 = 1 since there is only one Gaussian in the mixture model.
Computing the posterior probabilities P(z_n1 = 1 | x_n) = γ(z_n1), we see that the
posterior probabilities are all equal to 1, since both the numerator and the
denominator are equal to π1 N(x_n | μ1, Σ1). [1.5 M]
The M-step then computes
μ1_new = (1/N1) Σ_{n=1}^{N} γ(z_n1) x_n and
Σ1_new = (1/N1) Σ_{n=1}^{N} γ(z_n1) (x_n − μ1_new)(x_n − μ1_new)^T.
Since all the γ(z_n1) equal 1, we have N1 = N = 9 and μ1_new is the sample mean, 0.0.
Here the x_n and μ1_new are 1×1 matrices, so the expression for Σ1_new simplifies to
(Σ_{n=1}^{N} x_n²) / N = 2·(4.0² + 3.0² + 2.0² + 1.0²) / 9 = 60/9 ≈ 6.67. [2M]
In the next iteration the E-step computes the posterior probabilities to be 1 and
the M-step computes the same mean and covariance matrix as above, so the
algorithm converges.[1M]
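The single-component EM updates can be verified numerically (a minimal sketch; with one component the responsibilities are always 1, so one M-step already lands on the sample statistics):

```python
# EM for a one-component Gaussian mixture: every responsibility gamma(z_n1)
# is 1, so the M-step sets the mean and variance to the sample mean and
# (biased) sample variance, regardless of the initial mu = 10.0, var = 1.0.
data = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
mu, var = 10.0, 1.0
for _ in range(2):                      # the second pass shows convergence
    gamma = [1.0] * len(data)           # E-step: responsibilities all 1
    N1 = sum(gamma)
    mu = sum(g * x for g, x in zip(gamma, data)) / N1          # M-step mean
    var = sum(g * (x - mu) ** 2 for g, x in zip(gamma, data)) / N1
# mu converges to 0.0 and var to 60/9
```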
Q.4 A dataset 𝐷 consists of the results of 100 independent coin tosses of the same
coin where 30 turn out to be heads and 70 turn out to be tails. Let 𝑝 be the
probability of tossing a head. How many datasets of 100 coin tosses are possible
that have the same likelihood as the given dataset 𝐷? Determine the maximum
likelihood estimate of the parameter 𝑝 using appropriate calculations. [5 Marks]
Answer: The likelihood of the given data 𝐷 given 𝑝 is 𝑃(𝐷|𝑝) = 𝑝³⁰(1 − 𝑝)⁷⁰.
Any other dataset 𝐷′ with the same number of heads and tails as 𝐷 will have the
same likelihood given the same probability 𝑝 of tossing a head, so the number of
such datasets is C(100, 30), the number of ways of placing the 30 heads among the
100 tosses. [1M]
For the maximum likelihood estimate, set the derivative of the log-likelihood to
zero: d/dp [30 ln p + 70 ln(1 − p)] = 30/p − 70/(1 − p) = 0, which gives
p̂ = 30/100 = 0.3.
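The MLE can be sanity-checked with a coarse grid search (a sketch; the grid resolution of 0.001 is an arbitrary choice):

```python
import math

# Log-likelihood of 30 heads and 70 tails: 30*ln(p) + 70*ln(1 - p).
def loglik(p):
    return 30 * math.log(p) + 70 * math.log(1 - p)

# Number of datasets with the same likelihood = ways to place the 30 heads.
n_datasets = math.comb(100, 30)

# A grid search confirms the analytic maximum at p = 30/100 = 0.3.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=loglik)
```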
Q.5 Consider the following dataset: [5 Marks]
X1 X2 Y
-1 -1 Positive class
-1 1 Negative class
1 -1 Negative class
1 1 Positive class
Answer the following with respect to above dataset
a) Comment on the separability of the dataset with an explanation. [0.5M]
b) Provide a 1-dimensional transformation of this dataset for each of the linearly
and non-linearly separable cases, with justification. [2M]
c) Model the above dataset with an Artificial Neural Network which has two hidden
layers, each of which contains two units. Assume that the weights in each layer are set
to 1 so that the top unit in each layer applies sigmoid activation to the sum of its inputs
and the bottom unit in each layer applies tanh activation to the sum of its inputs.
Finally, the single output node applies ReLU activation to the sum of its two inputs.
Write the output of this neural network in closed form as a function of x1 and x2
(no need to calculate exact values). [2.5M]
Solution:
In b), a transformation given with no explanation: deduct 0.5M in each case.
In c), any mistake in the function: 0M.
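One reading of the architecture in c) (all weights 1, no biases assumed) gives the closed form out = ReLU(σ(z2) + tanh(z2)) with z2 = σ(x1 + x2) + tanh(x1 + x2); a sketch of the forward pass:

```python
import math

def sigma(z):
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid activation

def network(x1, x2):
    # Hidden layer 1: both units receive x1 + x2 (all weights are 1).
    z1 = x1 + x2
    s1, t1 = sigma(z1), math.tanh(z1)
    # Hidden layer 2: both units receive the sum of the layer-1 outputs.
    z2 = s1 + t1
    s2, t2 = sigma(z2), math.tanh(z2)
    # Output node: ReLU of the sum of its two inputs.
    return max(0.0, s2 + t2)
```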
Q.6 Solve the following and find the equation of the hyperplane using the linear Support
Vector Machine method. Positive Points: {(3, 2), (4, 3), (2, 3), (3, -1)} Negative Points:
{(1, 0), (-1, -1), (0, 2), (-1, 2)} [5 Marks]
A. Find the support vectors
B. Determine the equation of hyperplane if it is changed and give a reason if it is not
changed for the following two cases
a. If the point (2, 3) is removed.
b. If the point (5,4) is added
Solution:
A. Support vectors are (2,3), (1,0) and (3,-1). [1M; if any of the SVs is wrong then 0M]
Equation of the decision hyperplane - 2M
A solution obtained using the Lagrange method or by geometrical inspection is
acceptable.
B. a. If the point (2, 3) is removed [1M; 0.5M if no reason is given]:
The equation of the decision hyperplane will change, as the removed point is a
support vector.
b. If the point (5,4) is added [1M; 0.5M if no reason is given]: The equation of the
decision hyperplane will not change, as the added point is not a support vector.
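The support vectors can be checked with a brute-force maximum-margin search over hyperplane directions (a sketch, not the Lagrange method; the angular resolution of 20000 steps is an arbitrary choice):

```python
import math

pos = [(3, 2), (4, 3), (2, 3), (3, -1)]
neg = [(1, 0), (-1, -1), (0, 2), (-1, 2)]

# Scan unit normals w = (cos t, sin t); for each direction the widest slab
# separating the classes has margin (min over pos of w.x - max over neg of w.x)/2.
best = (-math.inf, None, None)
steps = 20000
for i in range(steps):
    t = 2 * math.pi * i / steps
    w = (math.cos(t), math.sin(t))
    p = min(w[0] * x + w[1] * y for x, y in pos)
    n = max(w[0] * x + w[1] * y for x, y in neg)
    margin = (p - n) / 2
    if margin > best[0]:
        best = (margin, w, -(p + n) / 2)   # b centres the decision hyperplane

margin, w, b = best
# Support vectors are the points lying (numerically) on the margin.
svs = {pt for pt in pos + neg
       if abs(w[0] * pt[0] + w[1] * pt[1] + b) < margin + 0.05}
```

The search recovers the three support vectors listed above; the remaining points sit well outside the margin, which is why adding a non-support point such as (5,4) leaves the hyperplane unchanged.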
Q.7 Consider training a boosting classifier using decision stumps on the following dataset.
Circle the examples whose weights will be increased at the end of each iteration. Run
the iterations until zero training error is achieved. [3 Marks]
Solution:
Number of iterations: 3
Q.8 Consider the following dataset. [7 Marks]
x1:          -1  -1  -1  -1   0   4   4
x2:           2   1  -1  -2   0   2  -2
Class label:  1   1   1   1   1   2   2
The given class labels indicate two natural clusters in the dataset and act as
ground truth. Now remove the class labels and use the K-means clustering algorithm to
find the 2 clusters, initializing the two cluster centres as follows:
A. C1(-1,2) and C2(0,0)
B. C1(-0.5,0) and C2(0,0)
For both of the above cases run the algorithm until the centres do not change
(convergence criterion) and give the final cluster assignment [2+2M]. In each case,
comment on the correctness of the cluster assignment. [1+1M] Also, comment in no more
than 20 words on the drawback of k-means depicted in the above two cases. [1M]
Solution:
A. Initial centres: C1 = (-1, 2), C2 = (0, 0). (d1, d2 = distance to C1, C2)
Iteration 1:
x1        -1        -1        -1        -1         0         4         4
x2         2         1        -1        -2         0         2        -2
d1         0         1         3         4         2.236068  5         6.403124
d2         2.236068  1.414214  1.414214  2.236068  0         4.472136  4.472136
cluster    1         1         2         2         2         2         2
New centres: C1 = (-1, 1.5), C2 = (1.2, -0.6).
Iteration 2:
d1         0.5       0.5       2.5       3.5       1.802776  5.024938  6.103278
d2         3.405877  2.720294  2.236068  2.607681  1.341641  3.820995  3.130495
cluster    1         1         2         2         2         2         2
Recomputing the centres gives C1 = (-1, 1.5) and C2 = (1.2, -0.6) again, so the
centres are unchanged.
Comment on cluster assignment:
The algorithm has converged after 2 iterations, but the cluster assignment does not
depict the natural clusters in the dataset as given by the ground truth.
B. Initial centres: C1 = (-0.5, 0), C2 = (0, 0).
Iteration 1:
x1        -1        -1        -1        -1         0         4         4
x2         2         1        -1        -2         0         2        -2
d1         2.061553  1.118034  1.118034  2.061553  0.5       4.924429  4.924429
d2         2.236068  1.414214  1.414214  2.236068  0         4.472136  4.472136
cluster    1         1         1         1         2         2         2
New centres: C1 = (-1, 0), C2 = (2.666667, 0).
Iteration 2:
d1         2         1         1         2         1         5.385165  5.385165
d2         4.176655  3.800585  3.800585  4.176655  2.666667  2.403701  2.403701
cluster    1         1         1         1         1         2         2
New centres: C1 = (-0.8, 0), C2 = (4, 0).
Iteration 3:
d1         2.009975  1.019804  1.019804  2.009975  0.8       5.2       5.2
d2         5.385165  5.099020  5.099020  5.385165  4         2         2
cluster    1         1         1         1         1         2         2
The centres remain C1 = (-0.8, 0) and C2 = (4, 0), so the algorithm has converged.
Comment on cluster assignment:
The final assignment matches the natural clusters given by the ground truth, so it
is correct.
Drawback of k-means: K-means is sensitive to the initial choice of cluster centres
and may converge to a poor local optimum.
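Both runs can be reproduced with a minimal k-means implementation (a sketch; it assumes no cluster ever becomes empty, which holds for this data):

```python
data = [(-1, 2), (-1, 1), (-1, -1), (-1, -2), (0, 0), (4, 2), (4, -2)]

def kmeans(centres):
    while True:
        # Assignment step: each point joins its nearest centre
        # (squared Euclidean distance; ties go to the lower index).
        assign = [min(range(len(centres)),
                      key=lambda k: (x - centres[k][0]) ** 2
                                    + (y - centres[k][1]) ** 2)
                  for x, y in data]
        # Update step: move each centre to the mean of its points.
        new = []
        for k in range(len(centres)):
            pts = [p for p, a in zip(data, assign) if a == k]
            new.append((sum(x for x, _ in pts) / len(pts),
                        sum(y for _, y in pts) / len(pts)))
        if new == centres:            # converged: centres unchanged
            return assign
        centres = new

case_a = kmeans([(-1, 2), (0, 0)])     # 0-indexed clusters: 0 0 1 1 1 1 1
case_b = kmeans([(-0.5, 0), (0, 0)])   # 0-indexed clusters: 0 0 0 0 0 1 1
```

Run on the two initializations from the question, it reproduces the two different final assignments above, illustrating the sensitivity to initial centres.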