
Assignment 1 (2024)

All Questions are of 1 mark.

1. The earliest step in the data mining process is usually?


a) Visualization
b) Preprocessing
c) Modelling
d) Deployment

Ans: b

Explanation: Preprocessing is the earliest step in data mining.

2. Which of the following is an example of a continuous attribute?


a) Height of a person
b) Name of a person
c) Gender of a person
d) None of the above

Ans: a

Explanation: Height of a person is a real number, hence a continuous attribute.

3. Friendship structure of users in a social networking site can be considered as an example of:
a) Record data
b) Ordered data
c) Graph data
d) None of the above

Ans: c

Explanation: Friendship is an edge in a graph with users as nodes.

4. Name of a person, can be considered as an attribute of type?


a) Nominal
b) Ordinal
c) Interval
d) Ratio

Ans: a

Explanation: Nominal means "relating to names." The values of a nominal attribute are names of things or symbols of some kind. There is no order (rank, position) among the values of a nominal attribute.

5. A store sells 15 items. Maximum possible number of candidate 2-itemsets is:


a) 120
b) 105
c) 150
d) 2

Ans: b

Explanation: The number of ways of choosing 2 items from 15 items is 15C2 = (15×14)/2 = 105.


6. If a record data matrix has reduced number of rows after a transformation, the
transformation has performed:
a) Data Sampling
b) Dimensionality Reduction
c) Noise Cleaning
d) Discretization

Ans: a

Explanation: A sample is a subset of the population; the process of selecting a sample is known as sampling. Fewer rows means fewer records (samples), whereas dimensionality reduction would reduce the number of columns (attributes).

Answer Q7-Q10 based on the following table:

Customer ID Transaction ID Items Bought


1 1 {a,d,e}
1 2 {a,b,c,e}
2 3 {a,b,d,e}
2 4 {a,c,d,e}
3 5 {b,c,e}
3 6 {b,d,e}
4 7 {c,d}
4 8 {a,b,c}
5 9 {a,d,e}
5 10 {a,b,e}
7. Taking transaction ID as a market basket, support for each itemset {e}, {b,d}, and {b,d,e} is:
a) 0.8, 0.2, 0.2
b) 0.3, 0.3, 0.4
c) 0.25, 0.25, 0.5
d) 1,0,0

Ans: a

Explanation: support({e}) = 8/10 = 0.8, support({b,d}) = 2/10 = 0.2, support({b,d,e}) = 2/10 = 0.2.

8. Based on the results in (7), confidence of association rules {b,d}->{e} and {e}->{b,d} are:
a) 0.5, 0.5
b) 1, 0.25
c) 0.25, 1
d) 0.75, 0.25

Ans: b

Explanation: Confidence(X->Y) = support({X,Y})/support({X}).

Confidence({b,d}->{e}) = support({b,d,e})/support({b,d}) = 0.2/0.2 = 1.

Confidence({e}->{b,d}) = support({b,d,e})/support({e}) = 0.2/0.8 = 0.25.
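These support and confidence values can be checked with a few lines of Python; a minimal sketch (function names are illustrative):

```python
# Transactions from the table above, keyed by transaction ID.
transactions = {
    1: {'a', 'd', 'e'}, 2: {'a', 'b', 'c', 'e'}, 3: {'a', 'b', 'd', 'e'},
    4: {'a', 'c', 'd', 'e'}, 5: {'b', 'c', 'e'}, 6: {'b', 'd', 'e'},
    7: {'c', 'd'}, 8: {'a', 'b', 'c'}, 9: {'a', 'd', 'e'}, 10: {'a', 'b', 'e'},
}

def support(itemset, baskets):
    """Fraction of baskets that contain every item of the itemset."""
    return sum(set(itemset) <= b for b in baskets.values()) / len(baskets)

def confidence(lhs, rhs, baskets):
    """confidence(X -> Y) = support(X union Y) / support(X)."""
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)

print(support({'e'}, transactions))                 # 0.8
print(support({'b', 'd'}, transactions))            # 0.2
print(support({'b', 'd', 'e'}, transactions))       # 0.2
print(confidence({'b', 'd'}, {'e'}, transactions))  # 1.0
print(confidence({'e'}, {'b', 'd'}, transactions))  # 0.25
```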

9. Repeat (7) by taking customer ID as the market basket. An item is treated as 1 if it appears in at least one transaction done by the customer, 0 otherwise. Supports of itemsets {e}, {b,d}, {b,d,e} are:
a) 0.3, 0.5, 0.2
b) 0.8, 1, 0.2
c) 1, 0.2, 0.8
d) 0.8, 1, 0.8

Ans: d

Explanation: Treating each customer id as a market basket.

Customer ID Items Bought


1 {a,d,e}, {a,b,c,e}
2 {a,b,d,e}, {a,c,d,e}
3 {b,c,e}, {b,d,e}
4 {c,d}, {a,b,c}
5 {a,d,e}, {a,b,e}
Support({e}) = 4/5 = 0.8

Support({b,d}) = 5/5 = 1

Support({b,d,e}) = 4/5 = 0.8

10. Based on the results in (9), confidence of association rules {b,d}->{e} and {e}->{b,d} are:
a) 0.8, 1
b) 1, 0.8
c) 0.25, 1
d) 1, 0.25

Ans: a

Explanation: Confidence(X->Y) = support({X,Y})/support({X}).

Confidence({b,d}->{e}) = support({b,d,e})/support({b,d}) = 0.8/1 = 0.8.

Confidence({e}->{b,d}) = support({b,d,e})/support({e}) = 0.8/0.8 = 1.
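The customer-level results in Q9 and Q10 can be checked the same way once each customer's transactions are merged into a single basket; a sketch:

```python
# Merge each customer's transactions into one basket: an item counts
# if it appears in at least one of the customer's transactions.
customer_baskets = {
    1: {'a', 'd', 'e'} | {'a', 'b', 'c', 'e'},
    2: {'a', 'b', 'd', 'e'} | {'a', 'c', 'd', 'e'},
    3: {'b', 'c', 'e'} | {'b', 'd', 'e'},
    4: {'c', 'd'} | {'a', 'b', 'c'},
    5: {'a', 'd', 'e'} | {'a', 'b', 'e'},
}

def support(itemset, baskets):
    """Fraction of baskets that contain every item of the itemset."""
    return sum(set(itemset) <= b for b in baskets.values()) / len(baskets)

print(support({'e'}, customer_baskets))            # 0.8
print(support({'b', 'd'}, customer_baskets))       # 1.0
s_bde = support({'b', 'd', 'e'}, customer_baskets) # 0.8
print(s_bde / support({'b', 'd'}, customer_baskets))  # conf({b,d}->{e}) = 0.8
print(s_bde / support({'e'}, customer_baskets))       # conf({e}->{b,d}) = 1.0
```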


Assignment 2 (2024)

1. A decision tree can be used to build models for: (1 Mark)

A. Regression problems

B. Classification problems

C. Both of the above

D. None of the above

Ans: C

Explanation: Decision trees are used for both regression and classification problems.

2. Entropy value of ____ represents that the data sample is pure or homogeneous: (1 Mark)

A. 1

B. 0

C. 0.5

D. None of the above.

Ans: B

Explanation: A pure or homogeneous data sample has entropy 0.

3. Entropy value of _____ represents that the data sample has a 50-50 split belonging to two categories: (1
mark)

A. 1

B. 0

C. 0.5

D. None of the above

Ans: A

Explanation: Entropy = −0.5·log₂(0.5) − 0.5·log₂(0.5) = 1
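This entropy arithmetic is easy to verify in a few lines of Python; a minimal sketch (the function name is illustrative):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class-count distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([3, 3]))  # 50-50 split -> 1.0
print(entropy([6, 0]))  # pure sample -> 0.0 (zero-count terms are skipped)
```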

4. If a decision tree is expressed as a set of logical rules, then: (1 Mark)

A. the internal nodes in a branch are connected by AND and the branches by AND

B. the internal nodes in a branch are connected by OR and the branches by OR

C. the internal nodes in a branch are connected by AND and the branches by OR

D. the internal nodes in a branch are connected by OR and the branches by AND

Ans: C
Explanation: definition of decision tree.

5. The Decision tree corresponding to the following is? (1 Mark)

if C2 then
if C1 then A3
else A2
endif
else A1, A3
endif

A.

B.

C.

D.

[The four candidate decision-tree diagrams are not reproduced in this copy.]

Ans: C

Explanation: Option C is the only tree consistent with the rule: the root tests C2; its true branch tests C1 (A3 if C1 is true, A2 otherwise) and its false branch leads to A1, A3.


For questions 6-7, consider the following table depicting whether a student passed or not.

GPA     Studied  Passed
Low     F        F
Low     T        T
Medium  F        F
Medium  T        T
High    F        T
High    T        T

6. What is the entropy of the dataset? (1 Mark)

A. 0.50

B. 0.92

C. 1

D. 0

Ans: B

Explanation: The dataset has 2 instances with Passed=F and 4 with Passed=T. Entropy(2,4) = −(2/6)log₂(2/6) − (4/6)log₂(4/6) = 0.92

7. Which attribute would information gain choose as the root of the tree? (2 Marks)

A. GPA

B. Studied

C. Passed

D. None of the above

Ans: B

Explanation: By the information gain criterion, Studied has the highest information gain (≈ 0.46 versus ≈ 0.25 for GPA): Studied=T gives a pure subset (all pass), leaving only the Studied=F subset impure.
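The gain comparison can be reproduced directly from the six-row table; a sketch (helper names are illustrative):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# (GPA, Studied, Passed) rows from the table above.
rows = [('Low', 'F', 'F'), ('Low', 'T', 'T'), ('Medium', 'F', 'F'),
        ('Medium', 'T', 'T'), ('High', 'F', 'T'), ('High', 'T', 'T')]
labels = [r[2] for r in rows]

def info_gain(attr_index):
    # Partition the class labels by the attribute's values.
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr_index]].append(r[2])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

print(round(entropy(labels), 2))  # 0.92
print(round(info_gain(0), 2))     # GPA     -> 0.25
print(round(info_gain(1), 2))     # Studied -> 0.46  (highest, chosen as root)
```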

8. A chemical company has three options: (i) commercial production, (ii) pilot plant and (iii) no
production. The cost of constructing a pilot plant is Rs 3 lacs. If a pilot plant is built, chances of high
and low yield are 80% and 20% respectively. In the case of high yield from the pilot plant, there is
a 75% chance of high yield from the commercial plant. In the case of low yield from the pilot plant,
there is only a 10% chance of high yield from the commercial plant. If the company goes for
commercial plant directly without constructing a pilot plant, then there is a 60% chance of high
yield. The company earns Rs 1,20,00,000 in high yield and loses Rs 12,00,000 in low yield. The
optimum decision for the company is: (2 marks)

A. Commercial Production.

B. Pilot plant
C. No Production

D. None of the above.

Ans: A

Explanation: The company should produce commercially. The expected payoff from commercial production is Rs 67,20,000.

For Commercial Production:

Expected payoff = 0.6×1,20,00,000 − 0.4×12,00,000 = 67,20,000

For Pilot Plant:

Expected payoff = 0.8×(0.75×1,20,00,000 − 0.25×12,00,000) + 0.2×(0.10×1,20,00,000 − 0.90×12,00,000) − 3,00,000 = 66,84,000

Since 67,20,000 > 66,84,000, commercial production is the optimum decision.
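The expected payoffs behind this answer can be checked in a few lines; a sketch in Python (amounts in rupees, using the pilot-plant arithmetic above):

```python
HIGH_YIELD, LOW_YIELD, PILOT_COST = 12_000_000, -1_200_000, 300_000

# Option (i): commercial production directly.
commercial = 0.6 * HIGH_YIELD + 0.4 * LOW_YIELD

# Option (ii): build the pilot plant first, then go commercial.
after_high_pilot = 0.75 * HIGH_YIELD + 0.25 * LOW_YIELD
after_low_pilot = 0.10 * HIGH_YIELD + 0.90 * LOW_YIELD
pilot = 0.8 * after_high_pilot + 0.2 * after_low_pilot - PILOT_COST

print(commercial)  # 6,720,000  (Rs 67,20,000)
print(pilot)       # 6,684,000  (Rs 66,84,000) -> commercial production wins
```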
Assignment Week 3: Bayes Classifier

1. In a multiclass classification problem, Bayes classifier assigns an instance to the class corresponding
to: (1 Mark)

A. Maximum aposteriori probability

B. Maximum apriori probability

C. Lowest aposteriori probability

D. Lowest apriori probability

Ans: A

Explanation: The Bayes classifier is also known as the MAP (maximum a posteriori probability) classifier.

2. Which of the following is incorrect about Naive Bayes: (1 mark)

A. Attributes can be nominal or numeric

B. Attributes are statistically dependent on one another given the class value.

C. Attributes are equally likely.

D. All of the above.

Ans: B

Explanation: Attributes are statistically independent of one another given the class value.

3. A fair coin is tossed n times. The probability that the difference between the number of heads and
tails is (n-3) is: (1 mark)

A. 2^(−n)

B. 0

C. C(n, n−3)·2^(−n)

D. 2^(−n+3)

Ans: B

Explanation: Let the number of heads be h; then the number of tails is n − h. For the difference to be n − 3 we need

h − (n − h) = n − 3

h = (2n − 3)/2 = n − 3/2, which is not an integer value. Therefore the probability of the event is 0.
4. Three companies supply bulbs. The percentage of bulbs supplied by them and the probability of them
being defective is given below:

Company % of bulbs supplied Probability of defective


A 60 0.01
B 30 0.02
C 10 0.03
Given that the bulb is defective probability that it is supplied by B is: (2 marks)

A. 0.1

B. 0.2

C. 0.3

D. 0.4

Ans: D

Explanation: P(B|D) = (P(D|B)*P(B))/P(D)

P(D|B) * P(B) = 0.02 * 0.3 = 0.006

P(D) = P(D|A) * P(A) + P(D|B) * P(B) + P(D|C) * P(C) = 0.01*0.6 + 0.02*0.3 + 0.03*0.10 = 0.015

P(B|D) = 0.006/0.015 = 0.4
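The Bayes' theorem computation can be verified directly; a minimal sketch:

```python
# Prior probability of each supplier and P(defective | supplier).
prior = {'A': 0.60, 'B': 0.30, 'C': 0.10}
p_defective = {'A': 0.01, 'B': 0.02, 'C': 0.03}

# Total probability of a defective bulb.
p_d = sum(prior[s] * p_defective[s] for s in prior)  # 0.015

# Posterior P(B | defective) by Bayes' theorem.
print(p_defective['B'] * prior['B'] / p_d)           # 0.4
```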

5. If P(Z∩X) = 0.2, P(X) = 0.3, P(Y) = 1 then P(Z|X∩Y) is: (1 mark)

A. 0

B. 2/3

C. Not enough data.

D. None of the above.

Ans: B

Explanation: P(Z|X∩Y) = P(Z|X) since P(Y) = 1. Therefore, P(Z|X∩Y) = P(Z∩X)/P(X) = 0.2/0.3= 2/3
For questions 6-7, consider the following hypothetical data regarding the hiring of a person.

GPA Effort Confidence Hire


Low Some Yes No
Low Lots Yes Yes
High Lots No No
High Some No Yes
High Lots Yes Yes

6. Using Naïve Bayes determine whether a person with GPA=High, Effort=Some, and Confidence=Yes be
hired: (2 marks)

A. Yes

B. No

C. The example cannot be classified.

D. Both classes are equally likely

Ans: A

Explanation:

P(Hire=Yes|High, Some, Yes) ∝ P(High|Yes)·P(Some|Yes)·P(Conf=Yes|Yes)·P(Hire=Yes) = (2/3)(1/3)(2/3)(3/5) = 4/45

P(Hire=No|High, Some, Yes) ∝ P(High|No)·P(Some|No)·P(Conf=Yes|No)·P(Hire=No) = (1/2)(1/2)(1/2)(2/5) = 1/20

Since 4/45 > 1/20, P(Hire=Yes|High, Some, Yes) > P(Hire=No|High, Some, Yes), so the person is hired.

7. Using Naïve Bayes determine whether a person with Effort=lots, and Confidence=No be hired: (2
marks)

A. Yes

B. No

C. The example cannot be classified

D. Both classes are equally likely

Ans: A

Explanation: P(Hire=Yes|Lots, No) ∝ P(Lots|Yes)·P(Conf=No|Yes)·P(Hire=Yes) = (2/3)(1/3)(3/5) = 2/15 ≈ 0.133

P(Hire=No|Lots, No) ∝ P(Lots|No)·P(Conf=No|No)·P(Hire=No) = (1/2)(1/2)(2/5) = 0.1

Since 0.133 > 0.1, the person is hired.
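Both Naive Bayes answers can be verified by multiplying the empirical frequencies from the table; a sketch (no smoothing, matching the hand calculation; names are illustrative):

```python
# (GPA, Effort, Confidence, Hire) rows from the table above.
rows = [('Low', 'Some', 'Yes', 'No'), ('Low', 'Lots', 'Yes', 'Yes'),
        ('High', 'Lots', 'No', 'No'), ('High', 'Some', 'No', 'Yes'),
        ('High', 'Lots', 'Yes', 'Yes')]

def nb_score(evidence, hire):
    """Unnormalised posterior: P(class) * product of P(attr=value | class).
    `evidence` maps attribute index -> observed value."""
    in_class = [r for r in rows if r[3] == hire]
    score = len(in_class) / len(rows)  # class prior
    for idx, value in evidence.items():
        score *= sum(r[idx] == value for r in in_class) / len(in_class)
    return score

q6 = {0: 'High', 1: 'Some', 2: 'Yes'}
print(nb_score(q6, 'Yes'), nb_score(q6, 'No'))  # 4/45 ~ 0.089 > 0.05 -> Yes

q7 = {1: 'Lots', 2: 'No'}
print(nb_score(q7, 'Yes'), nb_score(q7, 'No'))  # 2/15 ~ 0.133 > 0.1 -> Yes
```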


Assignment Week 4: Bayes Classifier and KNN (2024)

Q1-3 are based on the simple Bayesian network shown below: a chain x → y → z → w. The Bayesian network is fully specified by the marginal probability of the root node (x) and the conditional probabilities:

P(x=1) = 0.60
P(y=1|x=1) = 0.40, P(y=1|x=0) = 0.30
P(z=1|y=1) = 0.25, P(z=1|y=0) = 0.60
P(w=1|z=1) = 0.45, P(w=1|z=0) = 0.30

1. P(y=0) is: (2 marks)

A. 0.70

B. 0.12

C. 0.64

D. 0.36

Ans: C

Explanation: P(y=0) = 1-P(y=1)

P(y=1) = P(y=1|x=0)*P(x=0) + P(y=1|x=1)*P(x=1) = 0.30*0.40 + 0.40*0.60 = 0.36

P(y=0) = 1-0.36 = 0.64

2. P(z=1|x=1) is: (2 marks)

A. 0.50

B. 0.60

C. 0.46

D. 0

Ans: C

Explanation:

P(z=1|x=1) = P(z=1|y=0)*P(y=0|x=1) + P(z=1|y=1)*P(y=1|x=1) = 0.60*0.60 + 0.25*0.40 = 0.46

3. P(w=0|x=1) is: (2 marks)


A. 0.37

B. 0.63

C. 1

D. None of the above

Ans: B

Explanation: P(w=0|x=1) = P(w=0|z=1)·P(z=1|x=1) + P(w=0|z=0)·P(z=0|x=1)

= (1 − 0.45)×0.46 + (1 − 0.30)×0.54 = 0.55×0.46 + 0.70×0.54 ≈ 0.63
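All three network queries amount to propagating probabilities along the chain x → y → z → w; a minimal sketch:

```python
p_x1 = 0.60
p_y1 = {1: 0.40, 0: 0.30}  # P(y=1 | x)
p_z1 = {1: 0.25, 0: 0.60}  # P(z=1 | y)
p_w1 = {1: 0.45, 0: 0.30}  # P(w=1 | z)

# Q1: marginalise over x.
py1 = p_y1[1] * p_x1 + p_y1[0] * (1 - p_x1)
print(1 - py1)                                   # P(y=0) = 0.64

# Q2: marginalise over y, conditioned on x=1.
pz1_x1 = p_z1[1] * p_y1[1] + p_z1[0] * (1 - p_y1[1])
print(pz1_x1)                                    # P(z=1|x=1) = 0.46

# Q3: marginalise over z, conditioned on x=1.
pw0_x1 = (1 - p_w1[1]) * pz1_x1 + (1 - p_w1[0]) * (1 - pz1_x1)
print(round(pw0_x1, 2))                          # P(w=0|x=1) = 0.63
```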

4. Consider a binary classification problem with two classes C1 and C2. Class labels of ten other training
set instances sorted in increasing order of their distance to an instance x is as follows: {C1, C2, C1, C2,
C2, C2, C1, C2, C1, C2}. How will a K=5 nearest neighbor classifier classify x? (1 mark)

A. There will be a tie

B. C1

C. C2

D. Not enough information to classify

Ans: C

Explanation: The closest 5 neighbors are C1, C2, C1, C2, C2. Among these, C1 occurs 2 times and C2 occurs 3 times; therefore, by majority voting, x is classified as C2.

Consider the following data for questions 5-6.

You are given the following set of training examples. Each attribute can take value either 0 or 1.

A1 A2 A3 Class
0 0 1 C1
0 1 0 C1
0 1 1 C1
1 0 0 C2
1 1 0 C1
1 1 1 C2

5. How would a 3-NN classify the example A1 = 1, A2 = 0, A3 = 1 if the distance metric is Euclidean
distance? (1 mark)

A. C1
B. C2

C. There will be a tie

D. Not enough information to classify

Ans: B

Explanation: We get a minimum distance of 1 with the points (0,0,1), (1,0,0) and (1,1,1), whose classes are C1, C2, C2; since the majority is C2, the example is classified as C2.

6. How would a 3-NN classify the example A1 = 0, A2 = 0, A3 = 0 if the distance metric is Euclidean
distance? (1 mark)

A. C1

B. C2

C. There will be a tie

D. Not enough information to classify

Ans: A

Explanation: We get a minimum distance of 1 with the points (0,0,1), (0,1,0) and (1,0,0), whose classes are C1, C1, C2; since the majority is C1, the example is classified as C1.
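Both 3-NN answers follow from sorting Euclidean distances and majority voting; a sketch:

```python
import math
from collections import Counter

train = [((0, 0, 1), 'C1'), ((0, 1, 0), 'C1'), ((0, 1, 1), 'C1'),
         ((1, 0, 0), 'C2'), ((1, 1, 0), 'C1'), ((1, 1, 1), 'C2')]

def knn(query, k=3):
    # Sort training points by Euclidean distance to the query.
    ranked = sorted(train, key=lambda p: math.dist(p[0], query))
    # Majority vote among the k nearest labels.
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

print(knn((1, 0, 1)))  # C2  (neighbours at distance 1: C1, C2, C2)
print(knn((0, 0, 0)))  # C1  (neighbours at distance 1: C1, C1, C2)
```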

7. Issues with Euclidean measure are: (1 mark)

A. High dimensional data.

B. Can produce counter-intuitive results.

C. Shrinking density – sparsification effect

D. All of the above.

Ans: D

Explanation: All of the above are issues with the Euclidean measure.


Data Mining: Assignment Week 5: Support Vector Machine

1. Margin of a hyperplane is defined as:

A. The angle it makes with the axes

B. The intercept it makes on the axes

C. Perpendicular distance from its closest point

D. Perpendicular distance from origin

Ans: C

2. In a hard margin support vector machine:

A. No training instances lie inside the margin

B. All the training instances lie inside the margin

C. Only few training instances lie inside the margin

D. None of the above

Ans: A

3. The primal optimization problem solved to obtain the hard margin optimal
separating hyperplane is:

A. Minimize ½ WᵀW, such that yᵢ(WᵀXᵢ + b) ≥ 1 for all i

B. Maximize ½ WᵀW, such that yᵢ(WᵀXᵢ + b) ≥ 1 for all i

C. Minimize ½ WᵀW, such that yᵢ(WᵀXᵢ + b) ≤ 1 for all i

D. Maximize ½ WᵀW, such that yᵢ(WᵀXᵢ + b) ≤ 1 for all i

Ans: A

4. The dual optimization problem solved to obtain the hard margin optimal separating
hyperplane is:

A. Maximize ½ WᵀW, such that yᵢ(WᵀXᵢ + b) ≥ 1 − αᵢ for all i

B. Minimize ½ WᵀW − Σᵢ αᵢ(yᵢ(WᵀXᵢ + b) − 1), such that αᵢ ≥ 0 for all i

C. Minimize ½ WᵀW − Σᵢ αᵢ, such that yᵢ(WᵀXᵢ + b) ≤ 1 for all i

D. Maximize ½ WᵀW + Σᵢ αᵢ, such that yᵢ(WᵀXᵢ + b) ≤ 1 for all i


Ans: B

5. The Lagrange multipliers corresponding to the support vectors have a value:

A. equal to zero

B. less than zero

C. greater than zero

D. can take on any value

Ans: C

6. The SVM’s are less effective when:

A. The data is linearly separable


B. The data is clean and ready to use
C. The data is noisy and contains overlapping points
D. None of the above

Ans: C

7. The dual optimization problem in SVM design is solved using:

A. Linear programming

B. Quadratic programming

C. Dynamic programming

D. Integer programming

Ans: B
8. The relative performance of an SVM on the training set and unknown samples is controlled by:

A. Lagrange multipliers

B. Margin

C. Slack

D. Generalization constant C

Ans: D

9. The primal optimization problem that is solved to obtain the optimal separating
hyperplane in soft margin SVM is:

A. Minimize ½ WᵀW, such that yᵢ(WᵀXᵢ + b) ≥ 1 − ξᵢ for all i

B. Minimize ½ WᵀW + C Σᵢ ξᵢ², such that yᵢ(WᵀXᵢ + b) ≥ 1 − ξᵢ for all i

C. Minimize ½ WᵀW, such that yᵢ(WᵀXᵢ + b) ≥ 1 − ξᵢ² for all i

D. Minimize ½ WᵀW + C Σᵢ ξᵢ², such that yᵢ(WᵀXᵢ + b) ≥ 1 for all i

Ans: B

10. We are designing an SVM WᵀX + b = 0; suppose the Xⱼ's are the support vectors and the αⱼ's the corresponding Lagrange multipliers, then which of the following statements are correct:

A. W = Σⱼ αⱼyⱼXⱼ

B. Σⱼ αⱼyⱼ = 0

C. Either A or B

D. Both A and B

Ans: D
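The identities in Q5 and Q10 (only support vectors have nonzero αⱼ, W = Σⱼ αⱼyⱼXⱼ, and Σⱼ αⱼyⱼ = 0) can be observed with scikit-learn's linear SVC; a sketch on made-up, linearly separable toy data (a large C approximates the hard margin):

```python
import numpy as np
from sklearn.svm import SVC

# A tiny linearly separable toy problem (illustrative data).
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)  # large C ~ hard margin

# dual_coef_ holds alpha_j * y_j for the support vectors only.
print(clf.support_vectors_)
w = clf.dual_coef_ @ clf.support_vectors_    # W = sum_j alpha_j y_j X_j
print(w, clf.coef_)                          # the two agree
print(clf.dual_coef_.sum())                  # sum_j alpha_j y_j = 0
```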
Data Mining: Assignment Week 6: ANN

1. Artificial neural networks can be used for:

A. Pattern Recognition

B. Classification

C. Clustering

D. All of the above

Ans: D

Explanation: ANNs are used for all the tasks given in the options.

2. A perceptron can correctly classify instances into two classes where the classes are:

A. Overlapping

B. Linearly separable

C. Non-linearly separable

D. None of the above

Ans: B

Explanation: Perceptron is a linear classifier.

3. The logic function that cannot be implemented by a perceptron having two inputs
is?

A. AND

B. OR

C. NOR

D. XOR

Ans: D

Explanation: XOR is not linearly separable.


4. A training input x is used for a perceptron learning rule. The desired output is t and
the actual output is o. If learning rate is η, the weight (w) update performed by the
perceptron learning rule is described by?

A. wᵢ ← wᵢ + η(t − o)

B. wᵢ ← wᵢ + η(t − o)xᵢ

C. wᵢ ← η(t − o)xᵢ

D. wᵢ ← wᵢ + (t − o)xᵢ

Ans: B

Explanation: Perceptron training rule: wᵢ ← wᵢ + Δwᵢ, where

Δwᵢ = η(t − o)xᵢ

and t is the target output for the current training example, o is the output generated by the perceptron, and η is a positive constant called the learning rate.
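The update rule is a one-line loop to implement; a minimal sketch that learns the OR function (the learning rate and data are illustrative):

```python
# Perceptron learning rule: w_i <- w_i + eta * (t - o) * x_i.
# Inputs are augmented with a constant 1 so the bias is learned as w[0].
data = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]  # OR
w, eta = [0.0, 0.0, 0.0], 0.5

for _ in range(10):  # a few epochs suffice for this separable problem
    for x, t in data:
        o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
        w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]

print(w)  # weights implementing OR, with the learned bias in w[0]
```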

5. A neuron with 3 inputs has the weight vector [0.2 -0.1 0.1]^T and a bias θ = 0. If
the input vector is X = [0.2 0.4 0.2]^T , then the total input to the neuron is:

A. 0.2

B. 0.02

C. 0.4

D. 0.10

Ans: B

Explanation: input to neuron = w1·x1 + w2·x2 + w3·x3 = 0.2×0.2 − 0.1×0.4 + 0.1×0.2 = 0.02

6. Suppose we have n training examples xᵢ, i = 1…n, whose desired outputs are tᵢ, i = 1…n. The outputs of a perceptron for these training examples are oᵢ, i = 1…n. The error function minimised by the gradient descent perceptron learning algorithm is:

A. E ≡ ½ Σᵢ (tᵢ − oᵢ)

B. E ≡ ½ Σᵢ (tᵢ − oᵢ)²

C. E ≡ ½ Σᵢ (tᵢ + oᵢ)²

D. E ≡ ½ Σᵢ (tᵢ + oᵢ)

Ans: B

Explanation: The error function is E ≡ ½ Σᵢ (tᵢ − oᵢ)², where tᵢ is the target output for training example i and oᵢ is the output generated by the perceptron.

7. The tanh activation function h(z) = 2/(1 + e^(−2z)) − 1 is:

A. Discontinuous and not differentiable

B. Discontinuous but differentiable

C. Continuous but not differentiable

D. Continuous and differentiable

Ans: D

Explanation: tanh is continuous and differentiable.

8. The neural network given below takes two binary-valued inputs x₁, x₂ ∈ {0,1} and the activation function is the binary threshold function (h(z) = 1 if z > 0; 0 otherwise). Which of the following logical functions does it compute?

[Network diagram: inputs X1 and X2 connect to the output unit h(X) with weights 5 and 5; a constant bias input contributes weight −1.]

A. OR

B. AND

C. NAND

D. NOR
Ans: A

Explanation: h(X) is the thresholded value of 5·X1 + 5·X2 − 1, where X1, X2 ∈ {0,1}. Evaluating it for all four input combinations reproduces the truth table of OR.

9. The neural network given below takes two binary-valued inputs x₁, x₂ ∈ {0,1} and the activation function is the binary threshold function (h(z) = 1 if z > 0; 0 otherwise). Which of the following logical functions does it compute?

[Network diagram: inputs X1 and X2 connect to the output unit h(X) with weights 5 and 5; a constant bias input contributes weight −8.]

A. OR

B. AND

C. NAND

D. NOR

Ans: B

Explanation: h(X) is the thresholded value of 5·X1 + 5·X2 − 8, where X1, X2 ∈ {0,1}. Evaluating it for all four input combinations reproduces the truth table of AND.
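A quick enumeration confirms that a bias weight of −1 yields OR and −8 yields AND; a sketch:

```python
def threshold_net(x1, x2, bias):
    """Binary threshold unit with input weights 5 and 5: 1 if z > 0 else 0."""
    z = 5 * x1 + 5 * x2 + bias
    return 1 if z > 0 else 0

for bias, name in [(-1, 'OR'), (-8, 'AND')]:
    table = [threshold_net(x1, x2, bias) for x1 in (0, 1) for x2 in (0, 1)]
    print(name, table)  # OR -> [0, 1, 1, 1], AND -> [0, 0, 0, 1]
```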

10. Overfitting is expected when we observe that?

A. With training iterations, error on the training set as well as the test set decreases

B. With training iterations, error on the training set decreases but error on the test set increases

C. With training iterations, error on the training set as well as the test set increases

D. With training iterations, training-set as well as test-set error remains constant

Ans: B

Explanation: Overfitting is when training error decreases and test error increases.
Data Mining: Assignment Week 7: Clustering

1. A good clustering is one with_______?

A. Low inter-cluster distance and low intra-cluster distance

B. Low inter-cluster distance and high intra-cluster distance

C. High inter-cluster distance and low intra-cluster distance

D. High inter-cluster distance and high intra-cluster distance

Ans: C

Explanation: A good clustering technique is one which produces high quality clusters in which the intra-cluster distance is low (i.e. high intra-cluster similarity) and the inter-cluster distance is high (i.e. low inter-cluster similarity).

2. The leaves of a dendrogram in hierarchical clustering represent?

A. Individual data points

B. Clusters of multiple data points

C. Distances between data points

D. Cluster membership value of the data points

Ans: A

Explanation: In a dendrogram produced by hierarchical agglomerative clustering (HAC), each leaf represents a single data point; internal nodes represent merges of clusters.

3. Which of the following is a hierarchical clustering algorithm?

A. Single linkage clustering

B. K-means clustering

C. DBSCAN

D. None of the above

Ans: A

Explanation: single-linkage clustering is one of several methods of hierarchical


clustering. It is based on grouping clusters in bottom-up fashion (agglomerative
clustering), at each step combining two clusters that contain the closest pair of
elements not yet belonging to the same cluster as each other.
4. Which of the following is not true about the DBSCAN algorithm?
A. It is a density based clustering algorithm

B. It requires two parameters MinPts and epsilon

C. The number of clusters need to be specified in advance

D. It can produce non-convex shaped clusters

Ans: C

Explanation: Density-based spatial clustering of applications with noise (DBSCAN) is


a density-based clustering non-parametric algorithm. DBSCAN requires two parameters: ε (epsilon) and the
minimum number of points required to form a dense region (minPts).

5. Which of the following clustering algorithms uses a minimal spanning tree concept?

A. Complete linkage clustering

B. Single linkage clustering

C. Average linkage clustering

D. DBSCAN

Ans: B

Explanation: The naive algorithm for single-linkage clustering has time complexity O(n³). An alternative algorithm is based on the equivalence between the naive algorithm and Kruskal's algorithm for minimum spanning trees. Instead of Kruskal's algorithm, Prim's algorithm can also be used.

6. Distance between two clusters in single linkage clustering is defined as:

A. Distance between the closest pair of points between the clusters

B. Distance between the furthest pair of points between the clusters

C. Distance between the most centrally located pair of points in the clusters

D. None of the above

Ans: A

Explanation: Mathematically, the linkage function – the distance D(X,Y) between clusters X and Y – is described by the expression:

D(X,Y) = min d(x,y) s.t. x ∈ X and y ∈ Y, where X and Y are any two sets of elements considered as clusters, and d(x,y) denotes the distance between the two elements x and y.
7. Distance between two clusters in complete linkage clustering is defined as:

A. Distance between the closest pair of points between the clusters

B. Distance between the furthest pair of points between the clusters

C. Distance between the most centrally located pair of points in the clusters

D. None of the above

Ans : B

Explanation: Mathematically, the linkage function – the distance D(X,Y) between clusters X and Y – is described by the expression:

D(X,Y) = max d(x,y) s.t. x ∈ X and y ∈ Y, where X and Y are any two sets of elements considered as clusters, and d(x,y) denotes the distance between the two elements x and y.

8. Consider a set of five 2-dimensional points p1=(0, 0), p2=(0, 1), p3=(5, 8), p4=(5, 7), and p5=(0, 0.5). Euclidean distance is the distance function used. Single linkage clustering is used to cluster the points into two clusters. The clusters are:
A. {p1, p2, p3} {p4, p5}

B. {p1, p4, p5} {p2, p3}

C. {p1, p2, p5} {p3, p4}

D. {p1, p2, p4} {p3, p5}

Ans : C

Explanation: find the Euclidean distance between the points and cluster together
points having minimum Euclidean distance.

      P1     P2     P3     P4     P5
P1    0
P2    1      0
P3    9.434  8.602  0
P4    8.602  7.81   1      0
P5    0.5    0.5    9.014  8.2    0

{P1, P5} and {P2, P5} both attain the minimum distance of 0.5. We choose {P1, P5} and cluster them together.

We then evaluate the distance of all the points from the cluster {P1, P5}, taking the minimum distance (single linkage):

         {P1,P5}  P2     P3     P4
{P1,P5}  0
P2       0.5      0
P3       9.014    8.602  0
P4       8.2      7.81   1      0

{P1, P5} and P2 have the minimum distance. We cluster them together.

            {P1,P2,P5}  P3  P4
{P1,P2,P5}  0
P3          8.602       0
P4          7.81        1   0

P3 and P4 have the minimum distance and are clustered together.

We now have two clusters, so the clustering stops.

The two clusters obtained are {P1, P2, P5} and {P3, P4}.
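The same two clusters come out of SciPy's hierarchical clustering routines; a sketch:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[0, 0], [0, 1], [5, 8], [5, 7], [0, 0.5]])  # p1..p5
Z = linkage(points, method='single')             # single-linkage merge tree
labels = fcluster(Z, t=2, criterion='maxclust')  # cut into two clusters
print(labels)  # p1, p2, p5 share one label; p3, p4 share the other
```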

9. Which of the following is not true about K-means clustering algorithm?

A. It is a partitional clustering algorithm

B. The final cluster obtained depends on the choice of initial cluster centres

C. Number of clusters need to be specified in advance

D. It can generate non-convex cluster shapes

Ans: D

Explanation: K-means clustering cannot generate non-convex cluster shapes.

10. Consider a set of five 2-dimensional points p1=(0, 0), p2=(0, 1), p3=(5, 8), p4=(5,
7), and p5=(0, 0.5). Euclidean distance is the distance function. The k-means algorithm
is used to cluster the points into two clusters. The initial cluster centers are p1 and p4.
The clusters after two iterations of k-means are:
A. {p1, p4, p5} {p2, p3}

B. {p1, p2, p5} {p3, p4}

C. {p3, p4, p5} {p1, p2}

D. {p1, p2, p4} {p3, p5}

Ans: B

Explanation: 1st iteration

Initial centres are c1 = P1 = (0, 0) and c2 = P4 = (5, 7).

      dist to c1=(0,0)   dist to c2=(5,7)   Closest centre
P1    0                  8.602              c1
P2    1                  7.81               c1
P3    9.434              1                  c2
P4    8.602              0                  c2
P5    0.5                8.2                c1

Clusters after the 1st iteration are:
C1 = {P1, P2, P5} with cluster centre c1 = (0, 0.5)
C2 = {P3, P4} with cluster centre c2 = (5, 7.5)

2nd iteration

      dist to c1=(0,0.5)   dist to c2=(5,7.5)   Closest centre
P1    0.5                  9.014                c1
P2    0.5                  8.2                  c1
P3    9.014                0.5                  c2
P4    8.2                  0.5                  c2
P5    0                    8.602                c1

Clusters formed after the 2nd iteration are {P1, P2, P5} and {P3, P4}.
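The two iterations can be replayed in a few lines of NumPy; a sketch starting from the given initial centres:

```python
import numpy as np

points = np.array([[0, 0], [0, 1], [5, 8], [5, 7], [0, 0.5]])  # p1..p5
centres = points[[0, 3]].astype(float)  # initial centres: p1 and p4

for it in range(2):
    # Assign each point to its nearest centre (Euclidean distance).
    d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    # Recompute each centre as the mean of its assigned points.
    centres = np.array([points[assign == k].mean(axis=0) for k in range(2)])
    print(it + 1, assign)  # both iterations: [0 0 1 1 0] -> {p1,p2,p5}, {p3,p4}
```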
Data Mining: Assignment Week 8: Regression

(Each question carries 1 mark)

1. Regression is used in:

A. predictive data mining

B. exploratory data mining

C. descriptive data mining

D. explanative data mining

Ans: A

Explanation: Regression is used for prediction.

2. In the regression equation Y = 21 - 3X, the slope is

A. 21
B. -21
C. 3
D. -3

Ans: D
Explanation: The slope-intercept form of a line is y = mx + c; here the slope m = −3.

3. The output of a regression algorithm is usually a:

A. real variable

B. integer variable

C. character variable

D. string variable

Ans: A

Explanation: Regression outputs real variables.


4. Regression finds out the model parameters which produce the least squared error between:

A. input value and output value

B. input value and target value

C. output value and target value

D. model parameters and output value

Ans: C

Explanation: Regression finds the model parameters that minimise the error between the output value and the target value.

5. The linear regression model y = a0 + a1x is applied to the data in the table
shown below. What is the value of the sum squared error function S(a0, a1),
when a0 = 1, a1 = 2?

x y
1 1
2 1
4 6
3 2

A. 0.0
B. 27
C. 13.5
D. 54

Ans: D
Explanation: y’ is the predicted output.

y’ = 1+2x

x y y’
1 1 3
2 1 5
4 6 9
3 2 7

sum of squared error = (1−3)² + (1−5)² + (6−9)² + (2−7)² = 4 + 16 + 9 + 25 = 54
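The sum-squared-error value is a one-liner to check; a sketch:

```python
xs, ys = [1, 2, 4, 3], [1, 1, 6, 2]
a0, a1 = 1, 2

# S(a0, a1) = sum over the data of (y - (a0 + a1 * x))^2
sse = sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))
print(sse)  # 54
```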


6. Consider x1, x2 to be the independent variables and y the dependent
variable, which of the following represents a linear regression model?

A. y = a0 + a1/x1 + a2/x2

B. y = a0 + a1x1 + a2x2

C. y = a0 + a1x1 + a2x2²

D. y = a0 + a1x1² + a2x2

Ans: B

Explanation: In option B, y is a linear function of the independent variables x1 and x2.

7. Find all the eigenvalues of the following matrix A. [The matrix itself is not reproduced in this copy; per the explanation below, it is a triangular matrix with diagonal entries 1, 2, 3.]
A. 1,3
B. 2,3
C. 1,2,3
D. Eigenvalues cannot be found.
Ans: C
Explanation: If A is an n × n triangular matrix (upper triangular, lower
triangular, or diagonal), then the eigenvalues of A are entries of the main
diagonal of A. Therefore, eigenvalues are 1,2,3.
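Since the matrix is not reproduced here, the property can be illustrated with any triangular matrix having diagonal 1, 2, 3; a sketch (the off-diagonal entries are made up):

```python
import numpy as np

# An upper-triangular matrix with diagonal entries 1, 2, 3
# (off-diagonal values are arbitrary, for illustration only).
A = np.array([[1, 4, 5],
              [0, 2, 6],
              [0, 0, 3]], dtype=float)
print(np.linalg.eigvals(A))  # [1. 2. 3.] -- the diagonal entries
```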

8. In the figures below the training instances for regression problems are described by dots. The blue dotted lines indicate the actual functions and the red lines indicate the regression models. Which of the following statements is correct?
A. Figure 1 represents overfitting and Figure 2 represents underfitting

B. Figure 1 represents underfitting and Figure 2 represents overfitting

C. Both Figure 1 and Figure 2 represents underfitting

D. Both Figure 1 and Figure 2 represents overfitting

Ans: B

Explanation: An underfitted model (Figure 1) is too simple to capture the underlying function, while an overfitted model (Figure 2) also fits the noise in the training data.

9. In principal component analysis, the projected lower dimensional space


corresponds to –

A. subset of the original co-ordinate axis

B. eigenvectors of the data covariance matrix

C. eigenvectors of the data distance matrix

D. orthogonal vectors to the original co-ordinate axis

Ans: B

Explanation: We must first subtract the mean of each variable from the dataset to center the data around the origin. Then we compute the covariance matrix of the data and calculate its eigenvalues and corresponding eigenvectors. We then normalize each of the orthogonal eigenvectors to become unit vectors. Once this is done, each of the mutually orthogonal unit eigenvectors can be interpreted as an axis of the ellipsoid fitted to the data. This choice of basis transforms the covariance matrix into a diagonalised form, with the diagonal elements representing the variance along each axis.
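The procedure described above maps directly onto NumPy; a small sketch on made-up 2-D data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3, 1], [0, 0.5]])  # made-up data

Xc = X - X.mean(axis=0)               # centre the data around the origin
C = np.cov(Xc, rowvar=False)          # covariance matrix of the data
eigvals, eigvecs = np.linalg.eigh(C)  # eigen-decomposition (symmetric matrix)

# Principal axes = eigenvectors, ordered by decreasing eigenvalue (variance).
order = np.argsort(eigvals)[::-1]
projected = Xc @ eigvecs[:, order]    # data expressed in the PCA basis
print(eigvals[order])                 # variance captured along each axis
```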
10. A time series prediction problem is often best solved using:

A. Multivariate regression

B. Autoregression

C. Logistic regression

D. Sinusoidal regression

Ans : B

Explanation: Autoregression is a time series model that uses observations from previous time steps as input to a regression equation to predict the value at the next time step.
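An AR(1) model fitted by ordinary least squares illustrates the idea; a sketch on a synthetic series (the coefficient and noise scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic AR(1) series: y_t = 0.8 * y_{t-1} + noise.
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.8 * y[t - 1] + rng.normal(scale=0.1)

# Regress y_t on y_{t-1} (least squares) to recover the AR coefficient.
X = np.column_stack([np.ones(199), y[:-1]])
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print(coef)                       # approximately [0, 0.8]
print(coef[0] + coef[1] * y[-1])  # one-step-ahead prediction
```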
