Nptel ML Questions
1) … the training data points are linearly separable. In general, will the classifier trained in this manner always be the same as the classifier trained using the perceptron training algorithm on the same training data?
No
2) Consider the case where two classes follow Gaussian distributions centred at (−1, 2) and (1, 4), each with identity covariance matrix. Which of the following is the separating
decision boundary using LDA?
x+y=3
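A quick check of the stated answer: with equal priors and a shared identity covariance, the LDA boundary is the set of points equidistant from the two class means, i.e. the perpendicular bisector of the segment joining them.

```latex
% LDA boundary with equal priors and shared covariance \Sigma = I:
% points equidistant from the two means.
(\mu_2 - \mu_1)^{\top}\mathbf{x} = \tfrac{1}{2}\bigl(\lVert\mu_2\rVert^2 - \lVert\mu_1\rVert^2\bigr)
\quad\Longrightarrow\quad
(2,\,2)\cdot(x,\,y) = \tfrac{17 - 5}{2} = 6
\quad\Longrightarrow\quad
x + y = 3
```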
3) Test specification
4) Consider a modified k-NN method in which once the k nearest neighbours to the query point
are identified, you do a linear regression fit on them and output the fitted value for the
query point. Which of the following is/are true regarding this method?
5) The lasso constraint region is a high-dimensional rhomboid, while the ridge constraint region is a high-dimensional ellipsoid
6) Suppose we are trying to model a p-dimensional Gaussian distribution. What is the actual
number of independent parameters that need to be estimated?
p(p+3)/2
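The count comes from p parameters for the mean vector plus p(p+1)/2 free entries of the symmetric covariance matrix, which sums to p(p+3)/2. A quick sanity check:

```python
def gaussian_param_count(p):
    # mean vector: p entries; symmetric covariance: p*(p+1)//2 free entries
    return p + p * (p + 1) // 2

# the closed form p(p+3)/2 matches the direct count
for p in range(1, 8):
    assert gaussian_param_count(p) == p * (p + 3) // 2

print(gaussian_param_count(3))  # p = 3 -> 9
```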
7) If the number of features is larger than the number of training data points, to identify a
suitable subset of the features for use with linear regression, we would prefer
8) We have seen methods like ridge and lasso to reduce variance among the coefficients. We
can use these methods to do feature selection also. Which one of them is more appropriate?
Lasso
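One way to see why the lasso is the natural choice for feature selection: for an orthonormal design, the lasso solution soft-thresholds the OLS coefficients (small ones become exactly zero), while ridge only rescales them. A minimal sketch, with made-up coefficient values:

```python
def soft_threshold(b, lam):
    # Lasso solution under an orthonormal design: shrink toward zero,
    # clipping small coefficients to exactly zero.
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

def ridge_shrink(b, lam):
    # Ridge solution under an orthonormal design: proportional shrinkage,
    # coefficients never become exactly zero.
    return b / (1.0 + lam)

ols = [2.5, -0.3, 0.1, -1.8]   # hypothetical OLS coefficients
lam = 0.5
lasso = [soft_threshold(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]
print(lasso)  # the two small coefficients are set exactly to 0 -> feature selection
print(ridge)  # every coefficient is shrunk but stays nonzero
```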
9) What assumption does the CURE clustering algorithm make with regard to the shape of the
clusters?
No assumption
10) Which method among bagging and stacking should be chosen in case of limited training data, and what is the appropriate reason for your preference?
11) Consider the following distribution of training data: Which method would you choose for
dimensionality reduction?
12) For training a binary classification model with three independent variables, you choose to
use neural networks. You apply one hidden layer with three neurons. What is the number of parameters to be estimated? (Consider the bias term as a parameter.)
21
13) Given N samples x1, x2, . . . , xN drawn independently from a Gaussian distribution with variance σ² and unknown mean µ, find the MLE of the mean.
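The answer is the sample mean; a one-line derivation from the log-likelihood:

```latex
\ell(\mu) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i-\mu)^2,
\qquad
\frac{d\ell}{d\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{N}(x_i-\mu) = 0
\;\Longrightarrow\;
\hat{\mu}_{\mathrm{MLE}} = \frac{1}{N}\sum_{i=1}^{N} x_i
```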
14) In the context of Reinforcement Learning algorithms, which of the following definitions
constitutes a valid Markov State?
15) Given below are some properties of different classification algorithms. In which among the
following would you expect feature normalisation to be useful?
uses a measure of distance between points
uses ridge regression
attempts to identify the maximum-margin hyperplane
16)
17)
18) In Gaussian Mixture Models, πi are the mixing coefficients. Select the correct conditions that
the mixing coefficients need to satisfy for a valid GMM model.
−1 ≤ πi ≤ 1, ∀i
0 ≤ πi ≤ 1, ∀i
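For reference, a valid GMM requires the mixing coefficients to form a discrete probability distribution, i.e. the range condition above together with normalisation:

```latex
0 \le \pi_i \le 1 \;\;\forall i,
\qquad
\sum_{i} \pi_i = 1
```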
19) Which of the following properties are true in the context of decision trees?
High variance
Lack of smoothness of prediction surfaces
20)
21)
22) Which of the following graphical models capture the Naive Bayes assumption, where c represents the class label and fi are the features?
Solution: A
23) Based on a survey, it was found that the probability that a student likes to play football is 0.25 and the probability that a student likes to play cricket is 0.43. It was also found that the
probability that a student likes to play both football and cricket is 0.12. What is the
probability that a student does not like to play either?
0.44
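The answer follows from inclusion-exclusion on the two events; a quick check:

```python
# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)
p_football, p_cricket, p_both = 0.25, 0.43, 0.12
p_either = p_football + p_cricket - p_both   # 0.56
p_neither = 1.0 - p_either
print(round(p_neither, 2))  # -> 0.44
```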
24) For the ROC curve of True positive rate vs False positive rate, which of the following are
true?
25) Which of the following are true about bias and variance of overfitted and underfitted
models?
26) Considering the AdaBoost algorithm, which among the following statements are false?
In each stage, we try to train a classifier which makes accurate predictions on any subset of the data points, where the subset size is at least half the size of the data set
The weight assigned to an individual classifier depends upon the number of data points
correctly classified by the classifier
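For reference, the weight AdaBoost assigns to the stage-t classifier is a function of its weighted error on the current point weights, not of a raw count of correct predictions:

```latex
\varepsilon_t = \sum_{i=1}^{N} w_i \,\mathbb{1}\bigl[h_t(x_i) \ne y_i\bigr],
\qquad
\alpha_t = \tfrac{1}{2}\ln\!\frac{1-\varepsilon_t}{\varepsilon_t}
```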
27)
28) Consider the Bayesian network given in the previous question. Let ‘A’, ‘B’, ‘C’, ‘D’ and ‘E’ denote the random variables shown in the network. Which of the following can be
inferred from the network structure?
none of the above can be inferred
29) Consider the following one-dimensional data set: 12, 22, 2, 3, 33, 27, 5, 16, 6, 31, 20, 37, 8
and 18. Given k = 3 and initial cluster centers to be 5, 6 and 31, what are the final cluster
centres obtained on applying the k-means algorithm?
4.8, 17.6, 32
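The stated centres can be reproduced with a plain k-means sketch (no empty-cluster handling, which this data never triggers):

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain 1-D k-means: assign each point to its nearest centre,
    then recompute each centre as the mean of its assigned points."""
    centers = list(centers)
    for it in range(max_iter):
        clusters = [[] for _ in centers]
        for x in points:
            # index of the nearest centre
            j = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[j].append(x)
        new_centers = [sum(c) / len(c) for c in clusters]
        if new_centers == centers:   # assignments stable -> converged
            return centers, it       # `it` centre updates were performed
        centers = new_centers
    return centers, max_iter

data = [12, 22, 2, 3, 33, 27, 5, 16, 6, 31, 20, 37, 8, 18]
final, n_updates = kmeans_1d(data, [5, 6, 31])
print(final)  # [4.8, 17.6, 32.0]
```

With this data the centres stop changing after three updates and a fourth assignment pass confirms convergence; how that maps onto question 30's answer depends on whether the confirming pass is counted as an iteration.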
30) For the previous question, in how many iterations will the k-means algorithm converge?
31)
0.098
32)
None of these
33)
0.006144
34) Using the data given in the previous question, compute the probability of the following assignment, P(i = 1, g = 1, s = 1, l = 0), irrespective of the difficulty of the course. (up to 3 decimal places)
0.047
35)
36) Does there exist a more compact factorization involving a smaller number of factors for the distribution given in the previous question?
No
37) Considering ‘profitable’ as the binary-valued attribute we are trying to predict, which of the
attributes would you select as the root in a decision tree with multi-way splits using the
information gain measure?
capacity
38) For the same data set, suppose we decide to construct a decision tree using binary splits and
the Gini index impurity measure. Which among the following feature and split point
combinations would be the best to use as the root node assuming that we consider each of
the input features to be unordered?
39) In the above data set, what is the value of cross entropy when we consider capacity as the
attribute to split on (multi-way splits)? (You can round off the cross-entropy value to the nearest 4-decimal-place number)
0.8382
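The data table from the earlier question is not reproduced here, but the quantity being asked for, the weighted average entropy of the children after a multi-way split, can be sketched generically (the label counts below are made up for illustration):

```python
from math import log2

def entropy(counts):
    # Shannon entropy (in bits) of a label distribution given as counts
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def split_entropy(children):
    # Weighted average entropy over the children of a multi-way split
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * entropy(c) for c in children)

# hypothetical split: two children with class counts [3, 1] and [2, 2]
print(round(split_entropy([[3, 1], [2, 2]]), 4))  # -> 0.9056
```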
40)
0.4615