Ans: b
Ans: a
3. Friendship structure of users in a social networking site can be considered as an example of:
a) Record data
b) Ordered data
c) Graph data
d) None of the above
Ans: c
Ans: a
Explanation: Nominal relates to names. The values of a nominal attribute are names of things or
some kind of symbols. There is no order (rank, position) among the values of a nominal attribute.
Ans: b
Ans: a
Explanation: A sample is a subset of the population. The process of selecting a sample is known as
sampling.
Ans: a
8. Based on the results in (7), confidence of association rules {b,d}->{e} and {e}->{b,d} are:
a) 0.5, 0.5
b) 1, 0.25
c) 0.25, 1
d) 0.75, 0.25
Ans: b
Ans: d
Support({b,d}) = 5/5 = 1
10. Based on the results in (9), confidence of association rules {b,d}->{e} and {e}->{b,d} are:
a) 0.8, 1
b) 1, 0.8
c) 0.25, 1
d) 1, 0.25
Ans: a
A. Regression problems
B. Classification problems
Ans: C
2. Entropy value of ____ represents that the data sample is pure or homogenous: (1 Mark)
A. 1
B. 0
C. 0.5
Ans: B
3. Entropy value of _____ represents that the data sample has a 50-50 split belonging to two categories: (1
mark)
A. 1
B. 0
C. 0.5
Ans: A
A. the internal nodes in a branch are connected by AND and the branches by AND
C. the internal nodes in a branch are connected by AND and the branches by OR
D. the internal nodes in a branch are connected by OR and the branches by AND
Ans: C
Explanation: By the definition of a decision tree, the internal nodes along a branch are connected by AND and the branches are connected by OR.
if C2 then
    if C1 then A3
    else A2
    endif
else A1, A3
endif
(Answer options A to D were decision-tree diagrams and are not reproduced here.)
Ans: C
GPA     Studied  Passed
Low     F        F
Low     T        T
Medium  F        F
Medium  T        T
High    F        T
High    T        T
6. What is the entropy of the dataset? (1 Mark)
A. 0.50
B. 0.92
C. 1
D. 0
Ans: B
7. Which attribute would information gain choose as the root of the tree? (2 Marks)
A. GPA
B. Studied
C. Passed
Ans: B
Explanation: From information gain criterion. The Studied has the highest information gain.
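Questions 6 and 7 can be checked with a short script; a minimal sketch, assuming the table above is encoded as (GPA, Studied, Passed) tuples:

```python
from collections import Counter
from math import log2

# The six training examples from the table above.
rows = [("Low", "F", "F"), ("Low", "T", "T"), ("Medium", "F", "F"),
        ("Medium", "T", "T"), ("High", "F", "T"), ("High", "T", "T")]

def entropy(labels):
    # Shannon entropy of a list of class labels, in bits.
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr_idx):
    # Entropy of the class column minus the weighted entropy after the split.
    labels = [r[-1] for r in rows]
    total = entropy(labels)
    for v in set(r[attr_idx] for r in rows):
        subset = [r[-1] for r in rows if r[attr_idx] == v]
        total -= len(subset) / len(rows) * entropy(subset)
    return total

print(round(entropy([r[-1] for r in rows]), 2))   # 0.92 (question 6)
print(round(info_gain(rows, 0), 3))               # 0.252 for GPA
print(round(info_gain(rows, 1), 3))               # 0.459 for Studied (question 7)
```

Studied has the larger gain, so information gain picks it as the root.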
8. A chemical company has three options: (i) commercial production, (ii) pilot plant and (iii) no
production. The cost of constructing a pilot plant is Rs 3 lacs. If a pilot plant is built, chances of high
and low yield are 80% and 20% respectively. In the case of high yield from the pilot plant, there is
a 75% chance of high yield from the commercial plant. In the case of low yield from the pilot plant,
there is only a 10% chance of high yield from the commercial plant. If the company goes for the
commercial plant directly without constructing a pilot plant, then there is a 60% chance of high
yield. The company earns Rs 1,20,00,000 in high yield and loses Rs 12,00,000 in low yield. The
optimum decision for the company is: (2 marks)
A. Commercial Production.
B. Pilot plant
C. No Production
Ans: A
Explanation: The company should go for commercial production directly. Its expected payoff is
0.6 × Rs 1,20,00,000 − 0.4 × Rs 12,00,000 = Rs 67,20,000, which exceeds the expected payoff of the
pilot-plant option.
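A sketch of the expected-payoff comparison behind this answer (amounts in Rs lakhs, taken from the problem statement; it assumes the company still produces commercially after a low pilot yield, since that branch's expected payoff is positive):

```python
# Gain, loss, and pilot-plant cost from the problem, in Rs lakhs.
GAIN, LOSS, PILOT_COST = 120.0, 12.0, 3.0

# Option A: commercial production directly (60% high yield).
ev_commercial = 0.6 * GAIN - 0.4 * LOSS

# Option B: pilot plant first (80% high / 20% low pilot yield),
# then commercial production in either case.
ev_given_high = 0.75 * GAIN - 0.25 * LOSS   # after a high pilot yield
ev_given_low = 0.10 * GAIN - 0.90 * LOSS    # after a low pilot yield
ev_pilot = 0.8 * ev_given_high + 0.2 * ev_given_low - PILOT_COST

# Option C: no production.
ev_none = 0.0

print(ev_commercial, round(ev_pilot, 2), ev_none)   # 67.2 66.84 0.0
```

Direct commercial production (Rs 67.2 lakhs) beats the pilot plant (Rs 66.84 lakhs), so option A is optimal.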
1. In a multiclass classification problem, Bayes classifier assigns an instance to the class corresponding
to: (1 Mark)
Ans: A
B. Attributes are statistically dependent on one another given the class value.
Ans: B
Explanation: Attributes are statistically independent of one another given the class value.
3. A fair coin is tossed n times. The probability that the difference between the number of heads and
tails is (n-3) is: (1 mark)
A. 2^(-n)
B. 0
C. C(n, n-3) · 2^(-n)
D. 2^(-n+3)
Ans: B
Explanation: Let the number of heads be h; then the number of tails is n - h. For the difference
between them to be n - 3 we need
h - (n - h) = n - 3
h = (2n - 3)/2 = n - 3/2, which is not an integer; therefore, the probability of the event is 0.
4. Three companies supply bulbs. The percentage of bulbs supplied by them and the probability of them
being defective is given below:
A. 0.1
B. 0.2
C. 0.3
D. 0.4
Ans: D
Explanation: By the total probability rule,
P(D) = P(D|A)·P(A) + P(D|B)·P(B) + P(D|C)·P(C) = 0.01×0.6 + 0.02×0.3 + 0.03×0.1 = 0.015
The required posterior is P(A|D) = P(D|A)·P(A) / P(D) = 0.006/0.015 = 0.4.
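The computation can be sketched as follows, assuming the missing table gave supply shares of 60%, 30%, and 10% with defect rates 1%, 2%, and 3% (the figures used in the explanation), and that the question asks for P(A | defective):

```python
# Supply shares and defect rates assumed from the explanation above.
prior = {"A": 0.6, "B": 0.3, "C": 0.1}
defect = {"A": 0.01, "B": 0.02, "C": 0.03}

# Total probability that a randomly chosen bulb is defective.
p_d = sum(defect[c] * prior[c] for c in prior)

# Posterior probability that a defective bulb came from company A (Bayes' theorem).
p_a_given_d = defect["A"] * prior["A"] / p_d

print(round(p_d, 3), round(p_a_given_d, 1))   # 0.015 0.4
```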
A. 0
B. 2/3
Ans: B
Explanation: P(Z|X∩Y) = P(Z|X) since P(Y) = 1. Therefore, P(Z|X∩Y) = P(Z∩X)/P(X) = 0.2/0.3= 2/3
For questions 6-7, consider the following hypothetical data regarding the hiring of a person.
6. Using Naïve Bayes determine whether a person with GPA=High, Effort=Some, and Confidence=Yes be
hired: (2 marks)
A. Yes
B. No
Ans: A
7. Using Naïve Bayes determine whether a person with Effort=lots, and Confidence=No be hired: (2
marks)
A. Yes
B. No
Ans: A
(Bayesian network diagram over the nodes x, y, z, w not reproduced here.)
The Bayesian Network is fully specified by the marginal probabilities of the root node(x) and the
conditional probabilities.
A. 0.70
B. 0.12
C. 0.64
D. 0.36
Ans: C
A. 0.50
B. 0.60
C. 0.46
D. 0
Ans: C
B. 0.63
C. 1
Ans: B
4. Consider a binary classification problem with two classes C1 and C2. Class labels of ten other training
set instances sorted in increasing order of their distance to an instance x is as follows: {C1, C2, C1, C2,
C2, C2, C1, C2, C1, C2}. How will a K=5 nearest neighbor classifier classify x? (1 mark)
B. C1
C. C2
Ans: C
Explanation: The closest 5 neighbours are C1, C2, C1, C2, C2. Among these, C1 occurs twice and
C2 three times; therefore, by majority voting, x is classified as C2.
You are given the following set of training examples. Each attribute can take value either 0 or 1.
A1 A2 A3 Class
0 0 1 C1
0 1 0 C1
0 1 1 C1
1 0 0 C2
1 1 0 C1
1 1 1 C2
5. How would a 3-NN classify the example A1 = 1, A2 = 0, A3 = 1 if the distance metric is Euclidean
distance? (1 mark)
A. C1
B. C2
Ans: B
Explanation: We get the minimum distance of 1 with the points (0,0,1), (1,0,0), (1,1,1), which are
labelled C1, C2, C2; since the majority is C2, the class is C2.
6. How would a 3-NN classify the example A1 = 0, A2 = 0, A3 = 0 if the distance metric is Euclidean
distance? (1 mark)
A. C1
B. C2
Ans: A
Explanation: We get the minimum distance of 1 with the points (0,0,1), (0,1,0), (1,0,0), which are
labelled C1, C1, C2; since the majority is C1, the class is C1.
Ans: D
Ans: C
Ans: A
3. The primal optimization problem solved to obtain the hard margin optimal
separating hyperplane is:
Ans: A
4. The dual optimization problem solved to obtain the hard margin optimal separating
hyperplane is:
A. equal to zero
Ans: C
Ans: C
A. Linear programming
B. Quadratic programming
C. Dynamic programming
D. Integer programming
Ans: B
8. The relative performance of a SVM on training set and unknown samples is
controlled by:
A. Lagrange multipliers
B. Margin
C. Slack
D. Generalization constant C
Ans: D
9. The primal optimization problem that is solved to obtain the optimal separating
hyperplane in soft margin SVM is:
Ans: B
10. We are designing an SVM W^T X + b = 0. Suppose the Xj are the support vectors and the αj
the corresponding Lagrange multipliers. Which of the following statements are
correct:
A. W = Σj αj yj Xj
B. Σj αj yj = 0
C. Either A or B
D. Both A and B
Ans: D
Data Mining: Assignment Week 6: ANN
A. Pattern Recognition
B. Classification
C. Clustering
D. All of the above
Ans: D
Explanation: ANNs are used for all of the tasks listed in the options.
2. A perceptron can correctly classify instances into two classes where the classes are:
A. Overlapping
B. Linearly separable
C. Non-linearly separable
Ans: B
3. The logic function that cannot be implemented by a perceptron having two inputs
is?
A. AND
B. OR
C. NOR
D. XOR
Ans: D
A. wi ← wi + η(t - o)
B. wi ← wi + η(t - o)xi
C. wi ← η(t - o)xi
D. wi ← wi + (t - o)xi
Ans: B
Explanation: The perceptron training rule is
Δwi = η(t - o)xi
where t is the target output for the current training example, o is the output generated
by the perceptron, and η is a positive constant called the learning rate.
5. A neuron with 3 inputs has the weight vector [0.2 -0.1 0.1]^T and a bias θ = 0. If
the input vector is X = [0.2 0.4 0.2]^T , then the total input to the neuron is:
A. 0.2
B. 0.02
C. 0.4
D. 0.10
Ans: B
Explanation: Total input = 0.2×0.2 + (-0.1)×0.4 + 0.1×0.2 = 0.04 - 0.04 + 0.02 = 0.02.
A. E ≡ (1/2) Σi=1..n (ti - oi)
B. E ≡ (1/2) Σi=1..n (ti - oi)²
C. E ≡ (1/2) Σi=1..n (ti + oi)²
D. E ≡ (1/2) Σi=1..n (ti + oi)
Ans: B
Explanation: The error function is E ≡ (1/2) Σi=1..n (ti - oi)²,
where ti is the target output for training example i and oi is the output generated
by the perceptron.
7. The tanh activation function h(z) = 2/(1 + e^(-2z)) - 1 is:
Ans: D
8. The neural network given below takes two binary-valued inputs x1, x2 ∈ {0, 1}, and
the activation function is the binary threshold function (h(z) = 1 if z > 0; 0 otherwise).
Which of the following logical functions does it compute?
(Network diagram: a constant input 1 with weight -1, and inputs X1, X2 each connected to h(X) with weight 5.)
A. OR
B. AND
C. NAND
D. NOR
Ans: A
For the different values of X1 and X2 we obtain h(X); this matches the truth table of OR.
9. The neural network given below takes two binary-valued inputs x1, x2 ∈ {0, 1}, and
the activation function is the binary threshold function (h(z) = 1 if z > 0; 0 otherwise).
Which of the following logical functions does it compute?
(Network diagram: a constant input 1 with weight -8, and inputs X1, X2 each connected to h(x) with weight 5.)
A. OR
B. AND
C. NAND
D. NOR
Ans: B
For the different values of X1 and X2 we obtain h(X); this matches the truth table of AND.
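Assuming the numbers shown in the two diagrams are a bias weight (-1 and -8 respectively) plus input weights of 5, both truth tables can be verified directly:

```python
# A binary threshold unit: output 1 when the weighted sum exceeds 0.
def threshold_net(bias_w, w1, w2):
    return lambda x1, x2: 1 if bias_w + w1 * x1 + w2 * x2 > 0 else 0

net_or = threshold_net(-1, 5, 5)    # question 8
net_and = threshold_net(-8, 5, 5)   # question 9

table = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([net_or(a, b) for a, b in table])    # [0, 1, 1, 1] -> OR
print([net_and(a, b) for a, b in table])   # [0, 0, 0, 1] -> AND
```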
A. With training iterations, error on the training set as well as the test set decreases
B. With training iterations, error on the training set decreases but on the test set increases
C. With training iterations, error on the training set as well as the test set increases
D. With training iterations, training-set as well as test-set error remains constant
Ans: B
Explanation: Overfitting is when training error decreases and test error increases.
Data Mining: Assignment Week 7: Clustering
Ans: C
Explanation: A good clustering technique is one which produces high-quality clusters
in which intra-cluster similarity is high (i.e. intra-cluster distance is low) and inter-cluster
similarity is low (i.e. inter-cluster distance is high).
Ans: A
B. K-means clustering
C. DBSCAN
Ans: A
Ans: C
5. Which of the following clustering algorithm uses a minimal spanning tree concept?
C. DBSCAN
Ans: B
Explanation: The naive algorithm for single-linkage clustering has time complexity O(n³). An alternative algorithm
is based on the equivalence between the naive algorithm and Kruskal's algorithm for minimum spanning trees.
Instead of Kruskal's algorithm, Prim's algorithm can also be used.
C. Distance between the most centrally located pair of points in the clusters
Ans: A
D(X, Y) = min d(x, y) such that x ∈ X and y ∈ Y, where X and Y are any two sets of elements
considered as clusters, and d(x, y) denotes the distance between the two elements x
and y.
7. Distance between two clusters in complete linkage clustering is defined as:
C. Distance between the most centrally located pair of points in the clusters
Ans : B
D(X, Y) = max d(x, y) such that x ∈ X and y ∈ Y, where X and Y are any two sets of elements
considered as clusters, and d(x, y) denotes the distance between the two elements x
and y.
8. Consider a set of five 2-dimensional points p1=(0, 0), p2=(0, 1), p3=(5, 8), p4=(5, 7),
and p5=(0, 0.5). Euclidean distance is the distance function used. Single linkage clus-
tering is used to cluster the points into two clusters. The clusters are:
A. {p1, p2, p3} {p4, p5}
Ans : C
Explanation: find the Euclidean distance between the points and cluster together
points having minimum Euclidean distance.
     P1     P2     P3     P4    P5
P1   0
P2   1      0
P3   9.434  8.602  0
P4   8.602  7.81   1      0
P5   0.5    0.5    9.014  8.2   0
{P1, P5} and {P2, P5} both have the minimum distance (0.5). We choose {P1, P5} and cluster
them together.
We then evaluate the distance of every point from the cluster {P1, P5}, taking the
minimum distance.
         {P1,P5}  P2     P3     P4
{P1,P5}  0
P2       0.5      0
P3       9.014    8.602  0
P4       8.2      7.81   1      0
{P1, P5} and P2 have the minimum distance. We cluster them together.
            {P1,P2,P5}  P3     P4
{P1,P2,P5}  0
P3          8.602       0
P4          7.81        1      0
(P3, P4) has minimum distance. They will be clustered together.
Two clusters obtained are {P1, P2, P5} and {P3, P4}.
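The merge sequence above can be reproduced with a short single-linkage sketch:

```python
from math import dist

# The five points from question 8.
points = {"p1": (0, 0), "p2": (0, 1), "p3": (5, 8), "p4": (5, 7), "p5": (0, 0.5)}

def single_link(c1, c2):
    # Cluster distance = minimum pairwise point distance (single linkage).
    return min(dist(points[a], points[b]) for a in c1 for b in c2)

clusters = [[name] for name in points]
while len(clusters) > 2:
    # Find the closest pair of clusters and merge them.
    i, j = min(((i, j) for i in range(len(clusters))
                for j in range(i + 1, len(clusters))),
               key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
    clusters[i] += clusters.pop(j)
    print(clusters)
```

The final two clusters are {p1, p2, p5} and {p3, p4}, matching the worked example.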
B. The final cluster obtained depends on the choice of initial cluster centres
Ans: D
10. Consider a set of five 2-dimensional points p1=(0, 0), p2=(0, 1), p3=(5, 8), p4=(5,
7), and p5=(0, 0.5). Euclidean distance is the distance function. The k-means algorithm
is used to cluster the points into two clusters. The initial cluster centers are p1 and p4.
The clusters after two iterations of k-means are:
A. {p1, p4, p5} {p2, p3}
Ans: B
1st iteration (initial centres c1 = P1 = (0, 0), c2 = P4 = (5, 7)):
     d(·, c1)  d(·, c2)  assigned to
P1   0         8.602     c1
P2   1         7.81      c1
P3   9.434     1         c2
P4   8.602     0         c2
P5   0.5       8.2       c1
Clusters after the 1st iteration:
C1 = {P1, P2, P5} with centre c1 = (0, 0.5)
C2 = {P3, P4} with centre c2 = (5, 7.5)
2nd iteration:
     d(·, c1)  d(·, c2)  assigned to
P1   0.5       9.014     c1
P2   0.5       8.2       c1
P3   9.014     0.5       c2
P4   8.2       0.5       c2
P5   0         8.602     c1
Clusters formed after 2nd iteration are {P1, P2, P5} and {P3, P4}.
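The two iterations can be reproduced with a minimal k-means sketch:

```python
from math import dist

# The five points from question 10; p1 and p4 are the initial centres.
points = {"p1": (0, 0), "p2": (0, 1), "p3": (5, 8), "p4": (5, 7), "p5": (0, 0.5)}
centres = [points["p1"], points["p4"]]

for it in range(2):
    # Assignment step: each point goes to its nearest centre.
    clusters = [[], []]
    for name, p in points.items():
        k = min((0, 1), key=lambda c: dist(p, centres[c]))
        clusters[k].append(name)
    # Update step: each centre moves to the mean of its cluster.
    centres = [tuple(sum(points[n][d] for n in cl) / len(cl) for d in (0, 1))
               for cl in clusters]
    print(it + 1, clusters, centres)
```

After both iterations the clusters are {p1, p2, p5} and {p3, p4}, with centres (0, 0.5) and (5, 7.5).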
Data Mining: Assignment Week 8: Regression
Ans: A
A. 21
B. -21
C. 3
D. -3
Ans: D
Explanation: The slope-intercept form of a line is y = mx + c.
A. real variable
B. integer variable
C. character variable
D. string variable
Ans: A
Ans: C
5. The linear regression model y = a0 + a1x is applied to the data in the table
shown below. What is the value of the sum squared error function S(a0, a1),
when a0 = 1, a1 = 2?
x y
1 1
2 1
4 6
3 2
A. 0.0
B. 27
C. 13.5
D. 54
Ans: D
Explanation: y' = 1 + 2x is the predicted output.
x  y  y'  (y - y')²
1  1  3   4
2  1  5   16
4  6  9   9
3  2  7   25
S(a0, a1) = 4 + 16 + 9 + 25 = 54
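The sum of squared errors can be verified directly:

```python
# Data from the table in question 5, and the given parameters a0 = 1, a1 = 2.
data = [(1, 1), (2, 1), (4, 6), (3, 2)]
a0, a1 = 1, 2

# S(a0, a1) = sum of squared residuals between y and the prediction a0 + a1*x.
S = sum((y - (a0 + a1 * x)) ** 2 for x, y in data)
print(S)   # (1-3)^2 + (1-5)^2 + (6-9)^2 + (2-7)^2 = 4 + 16 + 9 + 25 = 54
```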
A. y = a0 + a1/x1 + a2/x2
B. y = a0 + a1x1 + a2x2
C. y = a0 + a1x1 + a2x22
D. y = a0 + a1x12 + a2x2
Ans: B
A. 1,3
B. 2,3
C. 1,2,3
D. Eigenvalues cannot be found.
Ans: C
Explanation: If A is an n × n triangular matrix (upper triangular, lower
triangular, or diagonal), then the eigenvalues of A are entries of the main
diagonal of A. Therefore, eigenvalues are 1,2,3.
8. In the figures below the training instances for classification problems are
described by dots. The blue dotted lines indicate the actual functions and the
red lines indicate the regression model. Which of the following statement is
correct?
A. Figure 1 represents overfitting and Figure 2 represents underfitting
Ans: B
Ans: B
Explanation: We must first subtract the mean of each variable from the dataset to center
the data around the origin. Then we compute the covariance matrix of the data and
calculate the eigenvalues and corresponding eigenvectors of this covariance matrix.
Next we normalize each of the orthogonal eigenvectors to obtain unit vectors.
Once this is done, each of the mutually orthogonal unit eigenvectors can be interpreted as
an axis of the ellipsoid fitted to the data. This choice of basis transforms the covariance
matrix into a diagonalised form, with the diagonal elements representing the variance
along each axis.
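The steps above can be sketched with NumPy on made-up data (the data matrix and seed are illustrative):

```python
import numpy as np

# Illustrative correlated 2-D data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ np.array([[2.0, 0.5], [0.0, 1.0]])

X_centred = X - X.mean(axis=0)            # 1. centre the data around the origin
cov = np.cov(X_centred, rowvar=False)     # 2. covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # 3. eigen-decomposition
                                          #    (eigh returns unit eigenvectors)
order = np.argsort(eigvals)[::-1]         # 4. sort axes by decreasing variance
components = eigvecs[:, order]

# Projecting onto the components diagonalises the covariance matrix.
Z = X_centred @ components
print(np.round(np.cov(Z, rowvar=False), 6))   # off-diagonal entries ~ 0
```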
10. A time series prediction problem is often best solved using?
A. Multivariate regression
B. Autoregression
C. Logistic regression
D. Sinusoidal regression
Ans : B