
Assignment 1 (2024)

All Questions are of 1 mark.

1. The earliest step in the data mining process is usually?


a) Visualization
b) Preprocessing
c) Modelling
d) Deployment

Ans: b

Explanation: Preprocessing is the earliest step in data mining.

2. Which of the following is an example of a continuous attribute?


a) Height of a person
b) Name of a person
c) Gender of a person
d) None of the above

Ans: a

Explanation: Height of a person is a real number, hence a continuous attribute.

3. Friendship structure of users in a social networking site can be considered as an example of:
a) Record data
b) Ordered data
c) Graph data
d) None of the above

Ans: c

Explanation: Friendship is an edge in a graph with users as nodes.

4. Name of a person, can be considered as an attribute of type?


a) Nominal
b) Ordinal
c) Interval
d) Ratio

Ans: a

Explanation: Nominal means "relating to names." The values of a nominal attribute are names of things or symbols of some kind. There is no order (rank, position) among the values of a nominal attribute.

5. A store sells 15 items. Maximum possible number of candidate 2-itemsets is:


a) 120
b) 105
c) 150
d) 2

Ans: b

Explanation: The number of ways of choosing 2 items from 15 items is 15C2 = (15×14)/2 = 105.


6. If a record data matrix has reduced number of rows after a transformation, the
transformation has performed:
a) Data Sampling
b) Dimensionality Reduction
c) Noise Cleaning
d) Discretization

Ans: a

Explanation: A sample is a subset of the population; the process of selecting a sample is known as sampling. Fewer rows means fewer records (samples), whereas dimensionality reduction would reduce the number of columns (attributes).

Answer Q7-Q10 based on the following table:

Customer ID Transaction ID Items Bought


1 1 {a,d,e}
1 2 {a,b,c,e}
2 3 {a,b,d,e}
2 4 {a,c,d,e}
3 5 {b,c,e}
3 6 {b,d,e}
4 7 {c,d}
4 8 {a,b,c}
5 9 {a,d,e}
5 10 {a,b,e}
7. Taking transaction ID as a market basket, support for each itemset {e}, {b,d}, and {b,d,e} is:
a) 0.8, 0.2, 0.2
b) 0.3, 0.3, 0.4
c) 0.25, 0.25, 0.5
d) 1,0,0

Ans: a

Explanation: support({e}) = 8/10 = 0.8, support({b,d}) = 2/10 = 0.2, support({b,d,e}) = 2/10 = 0.2.

8. Based on the results in (7), confidence of association rules {b,d}->{e} and {e}->{b,d} are:
a) 0.5, 0.5
b) 1, 0.25
c) 0.25, 1
d) 0.75, 0.25

Ans: b

Explanation: Confidence(X->Y) = support({X,Y})/support({X}).

Confidence({b,d}->{e}) = support({b,d,e})/support({b,d}) = 0.2/0.2 = 1.

Confidence({e}->{b,d}) = support({b,d,e})/support({e}) = 0.2/0.8 = 0.25.
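These support and confidence values can be checked with a few lines of Python; a minimal sketch (function names are illustrative):

```python
# Transactions from the table above, keyed by transaction ID.
transactions = {
    1: {'a', 'd', 'e'}, 2: {'a', 'b', 'c', 'e'}, 3: {'a', 'b', 'd', 'e'},
    4: {'a', 'c', 'd', 'e'}, 5: {'b', 'c', 'e'}, 6: {'b', 'd', 'e'},
    7: {'c', 'd'}, 8: {'a', 'b', 'c'}, 9: {'a', 'd', 'e'}, 10: {'a', 'b', 'e'},
}

def support(itemset, baskets):
    """Fraction of baskets that contain every item of the itemset."""
    return sum(set(itemset) <= b for b in baskets.values()) / len(baskets)

def confidence(lhs, rhs, baskets):
    """confidence(X -> Y) = support(X union Y) / support(X)."""
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)

print(support({'e'}, transactions))                 # 0.8
print(support({'b', 'd'}, transactions))            # 0.2
print(support({'b', 'd', 'e'}, transactions))       # 0.2
print(confidence({'b', 'd'}, {'e'}, transactions))  # 1.0
print(confidence({'e'}, {'b', 'd'}, transactions))  # 0.25
```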

9. Repeat (7) by taking customer ID as the market basket. An item is treated as 1 if it appears in at least one transaction done by the customer, 0 otherwise. Supports of itemsets {e}, {b,d}, {b,d,e} are:
a) 0.3, 0.5, 0.2
b) 0.8, 1, 0.2
c) 1, 0.2, 0.8
d) 0.8, 1, 0.8

Ans: d

Explanation: Treating each customer id as a market basket.

Customer ID Items Bought


1 {a,d,e}, {a,b,c,e}
2 {a,b,d,e}, {a,c,d,e}
3 {b,c,e}, {b,d,e}
4 {c,d}, {a,b,c}
5 {a,d,e}, {a,b,e}
Support({e}) = 4/5 = 0.8

Support({b,d}) = 5/5 = 1

Support({b,d,e}) = 4/5 = 0.8

10. Based on the results in (9), confidence of association rules {b,d}->{e} and {e}->{b,d} are:
a) 0.8, 1
b) 1, 0.8
c) 0.25, 1
d) 1, 0.25

Ans: a

Explanation: Confidence(X->Y) = support({X,Y})/support({X}).

Confidence({b,d}->{e}) = support({b,d,e})/support({b,d}) = 0.8/1 = 0.8.

Confidence({e}->{b,d}) = support({b,d,e})/support({e}) = 0.8/0.8 = 1.
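The customer-level results in Q9 and Q10 can be checked the same way once each customer's transactions are merged into a single basket; a sketch:

```python
# Merge each customer's transactions into one basket: an item counts
# if it appears in at least one of the customer's transactions.
customer_baskets = {
    1: {'a', 'd', 'e'} | {'a', 'b', 'c', 'e'},
    2: {'a', 'b', 'd', 'e'} | {'a', 'c', 'd', 'e'},
    3: {'b', 'c', 'e'} | {'b', 'd', 'e'},
    4: {'c', 'd'} | {'a', 'b', 'c'},
    5: {'a', 'd', 'e'} | {'a', 'b', 'e'},
}

def support(itemset, baskets):
    """Fraction of baskets that contain every item of the itemset."""
    return sum(set(itemset) <= b for b in baskets.values()) / len(baskets)

print(support({'e'}, customer_baskets))            # 0.8
print(support({'b', 'd'}, customer_baskets))       # 1.0
s_bde = support({'b', 'd', 'e'}, customer_baskets) # 0.8
print(s_bde / support({'b', 'd'}, customer_baskets))  # conf({b,d}->{e}) = 0.8
print(s_bde / support({'e'}, customer_baskets))       # conf({e}->{b,d}) = 1.0
```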


Assignment 2 (2024)

1. A decision tree can be used to build models for: (1 Mark)

A. Regression problems

B. Classification problems

C. Both of the above

D. None of the above

Ans: C

Explanation: Decision trees are used for both regression and classification problems.

2. Entropy value of ____ represents that the data sample is pure or homogeneous: (1 Mark)

A. 1

B. 0

C. 0.5

D. None of the above.

Ans: B

Explanation: A pure or homogeneous data sample has entropy 0.

3. Entropy value of _____ represents that the data sample has a 50-50 split belonging to two categories: (1
mark)

A. 1

B. 0

C. 0.5

D. None of the above

Ans: A

Explanation: Entropy = −0.5·log₂(0.5) − 0.5·log₂(0.5) = 1
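This entropy arithmetic is easy to verify in a few lines of Python; a minimal sketch (the function name is illustrative):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class-count distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([3, 3]))  # 50-50 split -> 1.0
print(entropy([6, 0]))  # pure sample -> 0.0 (zero-count terms are skipped)
```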

4. If a decision tree is expressed as a set of logical rules, then: (1 Mark)

A. the internal nodes in a branch are connected by AND and the branches by AND

B. the internal nodes in a branch are connected by OR and the branches by OR

C. the internal nodes in a branch are connected by AND and the branches by OR

D. the internal nodes in a branch are connected by OR and the branches by AND

Ans: C
Explanation: definition of decision tree.

5. The Decision tree corresponding to the following is? (1 Mark)

if C2 then
if C1 then A3
else A2
endif
else A1, A3
endif

A.

B.

C.

D.

[The four candidate decision-tree diagrams are not reproduced in this copy.]

Ans: C

Explanation: Option C is the only tree consistent with the rule: the root tests C2; its true branch tests C1 (A3 if C1 is true, A2 otherwise) and its false branch leads to A1, A3.


For questions 6-7, consider the following table depicting whether a student passed or not.

GPA     Studied  Passed
Low     F        F
Low     T        T
Medium  F        F
Medium  T        T
High    F        T
High    T        T

6. What is the entropy of the dataset? (1 Mark)

A. 0.50

B. 0.92

C. 1

D. 0

Ans: B

Explanation: The dataset has 2 instances with Passed=F and 4 with Passed=T. Entropy(2,4) = −(2/6)log₂(2/6) − (4/6)log₂(4/6) = 0.92

7. Which attribute would information gain choose as the root of the tree? (2 Marks)

A. GPA

B. Studied

C. Passed

D. None of the above

Ans: B

Explanation: By the information gain criterion, Studied has the highest information gain (≈ 0.46 versus ≈ 0.25 for GPA): Studied=T gives a pure subset (all pass), leaving only the Studied=F subset impure.
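The gain comparison can be reproduced directly from the six-row table; a sketch (helper names are illustrative):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# (GPA, Studied, Passed) rows from the table above.
rows = [('Low', 'F', 'F'), ('Low', 'T', 'T'), ('Medium', 'F', 'F'),
        ('Medium', 'T', 'T'), ('High', 'F', 'T'), ('High', 'T', 'T')]
labels = [r[2] for r in rows]

def info_gain(attr_index):
    # Partition the class labels by the attribute's values.
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr_index]].append(r[2])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

print(round(entropy(labels), 2))  # 0.92
print(round(info_gain(0), 2))     # GPA     -> 0.25
print(round(info_gain(1), 2))     # Studied -> 0.46  (highest, chosen as root)
```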

8. A chemical company has three options: (i) commercial production, (ii) pilot plant and (iii) no
production. The cost of constructing a pilot plant is Rs 3 lacs. If a pilot plant is built, chances of high
and low yield are 80% and 20% respectively. In the case of high yield from the pilot plant, there is
a 75% chance of high yield from the commercial plant. In the case of low yield from the pilot plant,
there is only a 10% chance of high yield from the commercial plant. If the company goes for
commercial plant directly without constructing a pilot plant, then there is a 60% chance of high
yield. The company earns Rs 1,20,00,000 in high yield and loses Rs 12,00,000 in low yield. The
optimum decision for the company is: (2 marks)

A. Commercial Production.

B. Pilot plant
C. No Production

D. None of the above.

Ans: A

Explanation: The company should produce commercially. The expected payoff from commercial production is Rs 67,20,000.

For Commercial Production:

Expected payoff = 0.6×1,20,00,000 − 0.4×12,00,000 = 67,20,000

For Pilot Plant:

Expected payoff = 0.8×(0.75×1,20,00,000 − 0.25×12,00,000) + 0.2×(0.10×1,20,00,000 − 0.90×12,00,000) − 3,00,000 = 66,84,000

Since 67,20,000 > 66,84,000, commercial production is the optimum decision.
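The expected payoffs behind this answer can be checked in a few lines; a sketch in Python (amounts in rupees, using the pilot-plant arithmetic above):

```python
HIGH_YIELD, LOW_YIELD, PILOT_COST = 12_000_000, -1_200_000, 300_000

# Option (i): commercial production directly.
commercial = 0.6 * HIGH_YIELD + 0.4 * LOW_YIELD

# Option (ii): build the pilot plant first, then go commercial.
after_high_pilot = 0.75 * HIGH_YIELD + 0.25 * LOW_YIELD
after_low_pilot = 0.10 * HIGH_YIELD + 0.90 * LOW_YIELD
pilot = 0.8 * after_high_pilot + 0.2 * after_low_pilot - PILOT_COST

print(commercial)  # 6,720,000  (Rs 67,20,000)
print(pilot)       # 6,684,000  (Rs 66,84,000) -> commercial production wins
```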
Assignment Week 3: Bayes Classifier

1. In a multiclass classification problem, Bayes classifier assigns an instance to the class corresponding
to: (1 Mark)

A. Maximum aposteriori probability

B. Maximum apriori probability

C. Lowest aposteriori probability

D. Lowest apriori probability

Ans: A

Explanation: The Bayes classifier is also known as the MAP (maximum a posteriori probability) classifier.

2. Which of the following is incorrect about Naive Bayes: (1 mark)

A. Attributes can be nominal or numeric

B. Attributes are statistically dependent on one another given the class value.

C. Attributes are equally likely.

D. All of the above.

Ans: B

Explanation: Attributes are statistically independent of one another given the class value.

3. A fair coin is tossed n times. The probability that the difference between the number of heads and
tails is (n-3) is: (1 mark)

A. 2^(−n)

B. 0

C. C(n, n−3)·2^(−n)

D. 2^(−n+3)

Ans: B

Explanation: Let the number of heads be h; then the number of tails is n − h. For the difference to be n − 3 we need

h − (n − h) = n − 3

h = (2n − 3)/2 = n − 3/2, which is not an integer value. Therefore the probability of the event is 0.
4. Three companies supply bulbs. The percentage of bulbs supplied by them and the probability of them
being defective is given below:

Company % of bulbs supplied Probability of defective


A 60 0.01
B 30 0.02
C 10 0.03
Given that the bulb is defective probability that it is supplied by B is: (2 marks)

A. 0.1

B. 0.2

C. 0.3

D. 0.4

Ans: D

Explanation: P(B|D) = (P(D|B)*P(B))/P(D)

P(D|B) * P(B) = 0.02 * 0.3 = 0.006

P(D) = P(D|A) * P(A) + P(D|B) * P(B) + P(D|C) * P(C) = 0.01*0.6 + 0.02*0.3 + 0.03*0.10 = 0.015

P(B|D) = 0.006/0.015 = 0.4
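The Bayes' theorem computation can be verified directly; a minimal sketch:

```python
# Prior probability of each supplier and P(defective | supplier).
prior = {'A': 0.60, 'B': 0.30, 'C': 0.10}
p_defective = {'A': 0.01, 'B': 0.02, 'C': 0.03}

# Total probability of a defective bulb.
p_d = sum(prior[s] * p_defective[s] for s in prior)  # 0.015

# Posterior P(B | defective) by Bayes' theorem.
print(p_defective['B'] * prior['B'] / p_d)           # 0.4
```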

5. If P(Z∩X) = 0.2, P(X) = 0.3, P(Y) = 1 then P(Z|X∩Y) is: (1 mark)

A. 0

B. 2/3

C. Not enough data.

D. None of the above.

Ans: B

Explanation: P(Z|X∩Y) = P(Z|X) since P(Y) = 1. Therefore, P(Z|X∩Y) = P(Z∩X)/P(X) = 0.2/0.3= 2/3
For questions 6-7, consider the following hypothetical data regarding the hiring of a person.

GPA Effort Confidence Hire


Low Some Yes No
Low Lots Yes Yes
High Lots No No
High Some No Yes
High Lots Yes Yes

6. Using Naïve Bayes determine whether a person with GPA=High, Effort=Some, and Confidence=Yes be
hired: (2 marks)

A. Yes

B. No

C. The example cannot be classified.

D. Both classes are equally likely

Ans: A

Explanation:

P(Hire=Yes|High, Some, Yes) ∝ P(High|Yes)·P(Some|Yes)·P(Conf=Yes|Yes)·P(Hire=Yes) = (2/3)(1/3)(2/3)(3/5) = 4/45

P(Hire=No|High, Some, Yes) ∝ P(High|No)·P(Some|No)·P(Conf=Yes|No)·P(Hire=No) = (1/2)(1/2)(1/2)(2/5) = 1/20

Since 4/45 > 1/20, P(Hire=Yes|High, Some, Yes) > P(Hire=No|High, Some, Yes), so the person is hired.

7. Using Naïve Bayes determine whether a person with Effort=lots, and Confidence=No be hired: (2
marks)

A. Yes

B. No

C. The example cannot be classified

D. Both classes are equally likely

Ans: A

Explanation: P(Hire=Yes|Lots, No) ∝ P(Lots|Yes)·P(Conf=No|Yes)·P(Hire=Yes) = (2/3)(1/3)(3/5) = 2/15 ≈ 0.133

P(Hire=No|Lots, No) ∝ P(Lots|No)·P(Conf=No|No)·P(Hire=No) = (1/2)(1/2)(2/5) = 0.1

Since 0.133 > 0.1, the person is hired.
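Both Naive Bayes answers can be verified by multiplying the empirical frequencies from the table; a sketch (no smoothing, matching the hand calculation; names are illustrative):

```python
# (GPA, Effort, Confidence, Hire) rows from the table above.
rows = [('Low', 'Some', 'Yes', 'No'), ('Low', 'Lots', 'Yes', 'Yes'),
        ('High', 'Lots', 'No', 'No'), ('High', 'Some', 'No', 'Yes'),
        ('High', 'Lots', 'Yes', 'Yes')]

def nb_score(evidence, hire):
    """Unnormalised posterior: P(class) * product of P(attr=value | class).
    `evidence` maps attribute index -> observed value."""
    in_class = [r for r in rows if r[3] == hire]
    score = len(in_class) / len(rows)  # class prior
    for idx, value in evidence.items():
        score *= sum(r[idx] == value for r in in_class) / len(in_class)
    return score

q6 = {0: 'High', 1: 'Some', 2: 'Yes'}
print(nb_score(q6, 'Yes'), nb_score(q6, 'No'))  # 4/45 ~ 0.089 > 0.05 -> Yes

q7 = {1: 'Lots', 2: 'No'}
print(nb_score(q7, 'Yes'), nb_score(q7, 'No'))  # 2/15 ~ 0.133 > 0.1 -> Yes
```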


Assignment Week 4: Bayes Classifier and KNN (2024)

Q1-3 are based on the simple Bayesian network shown below: a chain x → y → z → w. The Bayesian network is fully specified by the marginal probability of the root node (x) and the conditional probabilities:

P(x=1) = 0.60
P(y=1|x=1) = 0.40, P(y=1|x=0) = 0.30
P(z=1|y=1) = 0.25, P(z=1|y=0) = 0.60
P(w=1|z=1) = 0.45, P(w=1|z=0) = 0.30

1. P(y=0) is: (2 marks)

A. 0.70

B. 0.12

C. 0.64

D. 0.36

Ans: C

Explanation: P(y=0) = 1-P(y=1)

P(y=1) = P(y=1|x=0)*P(x=0) + P(y=1|x=1)*P(x=1) = 0.30*0.40 + 0.40*0.60 = 0.36

P(y=0) = 1-0.36 = 0.64

2. P(z=1|x=1) is: (2 marks)

A. 0.50

B. 0.60

C. 0.46

D. 0

Ans: C

Explanation:

P(z=1|x=1) = P(z=1|y=0)*P(y=0|x=1) + P(z=1|y=1)*P(y=1|x=1) = 0.60*0.60 + 0.25*0.40 = 0.46

3. P(w=0|x=1) is: (2 marks)


A. 0.37

B. 0.63

C. 1

D. None of the above

Ans: B

Explanation: P(w=0|x=1) = P(w=0|z=1)·P(z=1|x=1) + P(w=0|z=0)·P(z=0|x=1)

= (1 − 0.45)×0.46 + (1 − 0.30)×0.54 = 0.55×0.46 + 0.70×0.54 ≈ 0.63
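All three network queries amount to propagating probabilities along the chain x → y → z → w; a minimal sketch:

```python
p_x1 = 0.60
p_y1 = {1: 0.40, 0: 0.30}  # P(y=1 | x)
p_z1 = {1: 0.25, 0: 0.60}  # P(z=1 | y)
p_w1 = {1: 0.45, 0: 0.30}  # P(w=1 | z)

# Q1: marginalise over x.
py1 = p_y1[1] * p_x1 + p_y1[0] * (1 - p_x1)
print(1 - py1)                                   # P(y=0) = 0.64

# Q2: marginalise over y, conditioned on x=1.
pz1_x1 = p_z1[1] * p_y1[1] + p_z1[0] * (1 - p_y1[1])
print(pz1_x1)                                    # P(z=1|x=1) = 0.46

# Q3: marginalise over z, conditioned on x=1.
pw0_x1 = (1 - p_w1[1]) * pz1_x1 + (1 - p_w1[0]) * (1 - pz1_x1)
print(round(pw0_x1, 2))                          # P(w=0|x=1) = 0.63
```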

4. Consider a binary classification problem with two classes C1 and C2. Class labels of ten other training
set instances sorted in increasing order of their distance to an instance x is as follows: {C1, C2, C1, C2,
C2, C2, C1, C2, C1, C2}. How will a K=5 nearest neighbor classifier classify x? (1 mark)

A. There will be a tie

B. C1

C. C2

D. Not enough information to classify

Ans: C

Explanation: The closest 5 neighbors are C1, C2, C1, C2, C2. Among these, C1 occurs 2 times and C2 occurs 3 times; therefore, by majority voting, x is classified as C2.

Consider the following data for questions 5-6.

You are given the following set of training examples. Each attribute can take value either 0 or 1.

A1 A2 A3 Class
0 0 1 C1
0 1 0 C1
0 1 1 C1
1 0 0 C2
1 1 0 C1
1 1 1 C2

5. How would a 3-NN classify the example A1 = 1, A2 = 0, A3 = 1 if the distance metric is Euclidean
distance? (1 mark)

A. C1
B. C2

C. There will be a tie

D. Not enough information to classify

Ans: B

Explanation: We get a minimum distance of 1 with the points (0,0,1), (1,0,0) and (1,1,1), whose classes are C1, C2, C2; since the majority is C2, the example is classified as C2.

6. How would a 3-NN classify the example A1 = 0, A2 = 0, A3 = 0 if the distance metric is Euclidean
distance? (1 mark)

A. C1

B. C2

C. There will be a tie

D. Not enough information to classify

Ans: A

Explanation: We get a minimum distance of 1 with the points (0,0,1), (0,1,0) and (1,0,0), whose classes are C1, C1, C2; since the majority is C1, the example is classified as C1.
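Both 3-NN answers follow from sorting Euclidean distances and majority voting; a sketch:

```python
import math
from collections import Counter

train = [((0, 0, 1), 'C1'), ((0, 1, 0), 'C1'), ((0, 1, 1), 'C1'),
         ((1, 0, 0), 'C2'), ((1, 1, 0), 'C1'), ((1, 1, 1), 'C2')]

def knn(query, k=3):
    # Sort training points by Euclidean distance to the query.
    ranked = sorted(train, key=lambda p: math.dist(p[0], query))
    # Majority vote among the k nearest labels.
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

print(knn((1, 0, 1)))  # C2  (neighbours at distance 1: C1, C2, C2)
print(knn((0, 0, 0)))  # C1  (neighbours at distance 1: C1, C1, C2)
```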

7. Issues with Euclidean measure are: (1 mark)

A. High dimensional data.

B. Can produce counter-intuitive results.

C. Shrinking density – sparsification effect

D. All of the above.

Ans: D

Explanation: All of the above are issues with the Euclidean measure.


Data Mining: Assignment Week 5: Support Vector Machine

1. Margin of a hyperplane is defined as:

A. The angle it makes with the axes

B. The intercept it makes on the axes

C. Perpendicular distance from its closest point

D. Perpendicular distance from origin

Ans: C

2. In a hard margin support vector machine:

A. No training instances lie inside the margin

B. All the training instances lie inside the margin

C. Only few training instances lie inside the margin

D. None of the above

Ans: A

3. The primal optimization problem solved to obtain the hard margin optimal
separating hyperplane is:

A. Minimize ½ WᵀW, such that yᵢ(WᵀXᵢ + b) ≥ 1 for all i

B. Maximize ½ WᵀW, such that yᵢ(WᵀXᵢ + b) ≥ 1 for all i

C. Minimize ½ WᵀW, such that yᵢ(WᵀXᵢ + b) ≤ 1 for all i

D. Maximize ½ WᵀW, such that yᵢ(WᵀXᵢ + b) ≤ 1 for all i

Ans: A

4. The dual optimization problem solved to obtain the hard margin optimal separating
hyperplane is:

A. Maximize ½ WᵀW, such that yᵢ(WᵀXᵢ + b) ≥ 1 − αᵢ for all i

B. Minimize ½ WᵀW − Σᵢ αᵢ(yᵢ(WᵀXᵢ + b) − 1), such that αᵢ ≥ 0 for all i

C. Minimize ½ WᵀW − Σᵢ αᵢ, such that yᵢ(WᵀXᵢ + b) ≤ 1 for all i

D. Maximize ½ WᵀW + Σᵢ αᵢ, such that yᵢ(WᵀXᵢ + b) ≤ 1 for all i


Ans: B

5. The Lagrange multipliers corresponding to the support vectors have a value:

A. equal to zero

B. less than zero

C. greater than zero

D. can take on any value

Ans: C

6. The SVM’s are less effective when:

A. The data is linearly separable


B. The data is clean and ready to use
C. The data is noisy and contains overlapping points
D. None of the above

Ans: C

7. The dual optimization problem in SVM design is solved using:

A. Linear programming

B. Quadratic programming

C. Dynamic programming

D. Integer programming

Ans: B
8. The relative performance of an SVM on the training set and unknown samples is controlled by:

A. Lagrange multipliers

B. Margin

C. Slack

D. Generalization constant C

Ans: D

9. The primal optimization problem that is solved to obtain the optimal separating
hyperplane in soft margin SVM is:

A. Minimize ½ WᵀW, such that yᵢ(WᵀXᵢ + b) ≥ 1 − ξᵢ for all i

B. Minimize ½ WᵀW + C Σᵢ ξᵢ², such that yᵢ(WᵀXᵢ + b) ≥ 1 − ξᵢ for all i

C. Minimize ½ WᵀW, such that yᵢ(WᵀXᵢ + b) ≥ 1 − ξᵢ² for all i

D. Minimize ½ WᵀW + C Σᵢ ξᵢ², such that yᵢ(WᵀXᵢ + b) ≥ 1 for all i

Ans: B

10. We are designing an SVM WᵀX + b = 0; suppose the Xⱼ's are the support vectors and the αⱼ's the corresponding Lagrange multipliers, then which of the following statements are correct:

A. W = Σⱼ αⱼyⱼXⱼ

B. Σⱼ αⱼyⱼ = 0

C. Either A or B

D. Both A and B

Ans: D
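The identities in Q5 and Q10 (only support vectors have nonzero αⱼ, W = Σⱼ αⱼyⱼXⱼ, and Σⱼ αⱼyⱼ = 0) can be observed with scikit-learn's linear SVC; a sketch on made-up, linearly separable toy data (a large C approximates the hard margin):

```python
import numpy as np
from sklearn.svm import SVC

# A tiny linearly separable toy problem (illustrative data).
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)  # large C ~ hard margin

# dual_coef_ holds alpha_j * y_j for the support vectors only.
print(clf.support_vectors_)
w = clf.dual_coef_ @ clf.support_vectors_    # W = sum_j alpha_j y_j X_j
print(w, clf.coef_)                          # the two agree
print(clf.dual_coef_.sum())                  # sum_j alpha_j y_j = 0
```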
Data Mining: Assignment Week 6: ANN

1. Artificial neural networks can be used for:

A. Pattern Recognition

B. Classification

C. Clustering

D. All of the above

Ans: D

Explanation: ANNs are used for all the tasks given in the options.

2. A perceptron can correctly classify instances into two classes where the classes are:

A. Overlapping

B. Linearly separable

C. Non-linearly separable

D. None of the above

Ans: B

Explanation: Perceptron is a linear classifier.

3. The logic function that cannot be implemented by a perceptron having two inputs
is?

A. AND

B. OR

C. NOR

D. XOR

Ans: D

Explanation: XOR is not linearly separable.


4. A training input x is used for a perceptron learning rule. The desired output is t and
the actual output is o. If learning rate is η, the weight (w) update performed by the
perceptron learning rule is described by?

A. wᵢ ← wᵢ + η(t − o)

B. wᵢ ← wᵢ + η(t − o)xᵢ

C. wᵢ ← η(t − o)xᵢ

D. wᵢ ← wᵢ + (t − o)xᵢ

Ans: B

Explanation: Perceptron training rule: wᵢ ← wᵢ + Δwᵢ, where

Δwᵢ = η(t − o)xᵢ

and t is the target output for the current training example, o is the output generated by the perceptron, and η is a positive constant called the learning rate.
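The update rule is a one-line loop to implement; a minimal sketch that learns the OR function (the learning rate and data are illustrative):

```python
# Perceptron learning rule: w_i <- w_i + eta * (t - o) * x_i.
# Inputs are augmented with a constant 1 so the bias is learned as w[0].
data = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]  # OR
w, eta = [0.0, 0.0, 0.0], 0.5

for _ in range(10):  # a few epochs suffice for this separable problem
    for x, t in data:
        o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
        w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]

print(w)  # weights implementing OR, with the learned bias in w[0]
```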

5. A neuron with 3 inputs has the weight vector [0.2 -0.1 0.1]^T and a bias θ = 0. If
the input vector is X = [0.2 0.4 0.2]^T , then the total input to the neuron is:

A. 0.2

B. 0.02

C. 0.4

D. 0.10

Ans: B

Explanation: input to neuron = w1·x1 + w2·x2 + w3·x3 = 0.2×0.2 − 0.1×0.4 + 0.1×0.2 = 0.02

6. Suppose we have n training examples xᵢ, i = 1…n, whose desired outputs are tᵢ, i = 1…n. The outputs of a perceptron for these training examples are oᵢ, i = 1…n. The error function minimised by the gradient descent perceptron learning algorithm is:

A. E ≡ ½ Σᵢ (tᵢ − oᵢ)

B. E ≡ ½ Σᵢ (tᵢ − oᵢ)²

C. E ≡ ½ Σᵢ (tᵢ + oᵢ)²

D. E ≡ ½ Σᵢ (tᵢ + oᵢ)

Ans: B

Explanation: The error function is E ≡ ½ Σᵢ (tᵢ − oᵢ)², where tᵢ is the target output for training example i and oᵢ is the output generated by the perceptron.

7. The tanh activation function h(z) = 2/(1 + e^(−2z)) − 1 is:

A. Discontinuous and not differentiable

B. Discontinuous but differentiable

C. Continuous but not differentiable

D. Continuous and differentiable

Ans: D

Explanation: tanh is continuous and differentiable.

8. The neural network given below takes two binary-valued inputs x₁, x₂ ∈ {0,1} and the activation function is the binary threshold function (h(z) = 1 if z > 0; 0 otherwise). Which of the following logical functions does it compute?

[Network diagram: inputs X1 and X2 connect to the output unit h(X) with weights 5 and 5; a constant bias input contributes weight −1.]

A. OR

B. AND

C. NAND

D. NOR
Ans: A

Explanation: h(X) is the thresholded value of 5·X1 + 5·X2 − 1, where X1, X2 ∈ {0,1}. Evaluating it for all four input combinations reproduces the truth table of OR.

9. The neural network given below takes two binary-valued inputs x₁, x₂ ∈ {0,1} and the activation function is the binary threshold function (h(z) = 1 if z > 0; 0 otherwise). Which of the following logical functions does it compute?

[Network diagram: inputs X1 and X2 connect to the output unit h(X) with weights 5 and 5; a constant bias input contributes weight −8.]

A. OR

B. AND

C. NAND

D. NOR

Ans: B

Explanation: h(X) is the thresholded value of 5·X1 + 5·X2 − 8, where X1, X2 ∈ {0,1}. Evaluating it for all four input combinations reproduces the truth table of AND.
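A quick enumeration confirms that a bias weight of −1 yields OR and −8 yields AND; a sketch:

```python
def threshold_net(x1, x2, bias):
    """Binary threshold unit with input weights 5 and 5: 1 if z > 0 else 0."""
    z = 5 * x1 + 5 * x2 + bias
    return 1 if z > 0 else 0

for bias, name in [(-1, 'OR'), (-8, 'AND')]:
    table = [threshold_net(x1, x2, bias) for x1 in (0, 1) for x2 in (0, 1)]
    print(name, table)  # OR -> [0, 1, 1, 1], AND -> [0, 0, 0, 1]
```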

10. Overfitting is expected when we observe that?

A. With training iterations, error on the training set as well as the test set decreases

B. With training iterations, error on the training set decreases but error on the test set increases

C. With training iterations, error on the training set as well as the test set increases

D. With training iterations, training-set as well as test-set error remains constant

Ans: B

Explanation: Overfitting is when training error decreases and test error increases.
Data Mining: Assignment Week 7: Clustering

1. A good clustering is one with_______?

A. Low inter-cluster distance and low intra-cluster distance

B. Low inter-cluster distance and high intra-cluster distance

C. High inter-cluster distance and low intra-cluster distance

D. High inter-cluster distance and high intra-cluster distance

Ans: C

Explanation: A good clustering technique is one which produces high quality clusters in which the intra-cluster distance is low (i.e. high intra-cluster similarity) and the inter-cluster distance is high (i.e. low inter-cluster similarity).

2. The leaves of a dendrogram in hierarchical clustering represent?

A. Individual data points

B. Clusters of multiple data points

C. Distances between data points

D. Cluster membership value of the data points

Ans: A

Explanation: In a dendrogram produced by hierarchical agglomerative clustering (HAC), each leaf represents a single data point; internal nodes represent merges of clusters.

3. Which of the following is a hierarchical clustering algorithm?

A. Single linkage clustering

B. K-means clustering

C. DBSCAN

D. None of the above

Ans: A

Explanation: single-linkage clustering is one of several methods of hierarchical


clustering. It is based on grouping clusters in bottom-up fashion (agglomerative
clustering), at each step combining two clusters that contain the closest pair of
elements not yet belonging to the same cluster as each other.
4. Which of the following is not true about the DBSCAN algorithm?
A. It is a density based clustering algorithm

B. It requires two parameters MinPts and epsilon

C. The number of clusters need to be specified in advance

D. It can produce non-convex shaped clusters

Ans: C

Explanation: Density-based spatial clustering of applications with noise (DBSCAN) is


a density-based clustering non-parametric algorithm. DBSCAN requires two parameters: ε (epsilon) and the
minimum number of points required to form a dense region (minPts).

5. Which of the following clustering algorithms uses a minimal spanning tree concept?

A. Complete linkage clustering

B. Single linkage clustering

C. Average linkage clustering

D. DBSCAN

Ans: B

Explanation: The naive algorithm for single-linkage clustering has time complexity O(n³). An alternative algorithm is based on the equivalence between the naive algorithm and Kruskal's algorithm for minimum spanning trees. Instead of Kruskal's algorithm, Prim's algorithm can also be used.

6. Distance between two clusters in single linkage clustering is defined as:

A. Distance between the closest pair of points between the clusters

B. Distance between the furthest pair of points between the clusters

C. Distance between the most centrally located pair of points in the clusters

D. None of the above

Ans: A

Explanation: Mathematically, the linkage function – the distance D(X,Y) between clusters X and Y – is described by the expression:

D(X,Y) = min d(x,y) s.t. x ∈ X and y ∈ Y, where X and Y are any two sets of elements considered as clusters, and d(x,y) denotes the distance between the two elements x and y.
7. Distance between two clusters in complete linkage clustering is defined as:

A. Distance between the closest pair of points between the clusters

B. Distance between the furthest pair of points between the clusters

C. Distance between the most centrally located pair of points in the clusters

D. None of the above

Ans : B

Explanation: Mathematically, the linkage function – the distance D(X,Y) between clusters X and Y – is described by the expression:

D(X,Y) = max d(x,y) s.t. x ∈ X and y ∈ Y, where X and Y are any two sets of elements considered as clusters, and d(x,y) denotes the distance between the two elements x and y.

8. Consider a set of five 2-dimensional points p1=(0, 0), p2=(0, 1), p3=(5, 8), p4=(5, 7), and p5=(0, 0.5). Euclidean distance is the distance function used. Single linkage clustering is used to cluster the points into two clusters. The clusters are:
A. {p1, p2, p3} {p4, p5}

B. {p1, p4, p5} {p2, p3}

C. {p1, p2, p5} {p3, p4}

D. {p1, p2, p4} {p3, p5}

Ans : C

Explanation: find the Euclidean distance between the points and cluster together
points having minimum Euclidean distance.

      P1     P2     P3     P4     P5
P1    0
P2    1      0
P3    9.434  8.602  0
P4    8.602  7.81   1      0
P5    0.5    0.5    9.014  8.2    0

{P1, P5} and {P2, P5} both attain the minimum distance of 0.5. We choose {P1, P5} and cluster them together.

We then evaluate the distance of all the points from the cluster {P1, P5}, taking the minimum distance (single linkage):

         {P1,P5}  P2     P3     P4
{P1,P5}  0
P2       0.5      0
P3       9.014    8.602  0
P4       8.2      7.81   1      0

{P1, P5} and P2 have the minimum distance. We cluster them together.

            {P1,P2,P5}  P3  P4
{P1,P2,P5}  0
P3          8.602       0
P4          7.81        1   0

P3 and P4 have the minimum distance and are clustered together.

We now have two clusters, so the clustering stops.

The two clusters obtained are {P1, P2, P5} and {P3, P4}.
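The same two clusters come out of SciPy's hierarchical clustering routines; a sketch:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[0, 0], [0, 1], [5, 8], [5, 7], [0, 0.5]])  # p1..p5
Z = linkage(points, method='single')             # single-linkage merge tree
labels = fcluster(Z, t=2, criterion='maxclust')  # cut into two clusters
print(labels)  # p1, p2, p5 share one label; p3, p4 share the other
```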

9. Which of the following is not true about K-means clustering algorithm?

A. It is a partitional clustering algorithm

B. The final cluster obtained depends on the choice of initial cluster centres

C. Number of clusters need to be specified in advance

D. It can generate non-convex cluster shapes

Ans: D

Explanation: K-means clustering cannot generate non-convex cluster shapes.

10. Consider a set of five 2-dimensional points p1=(0, 0), p2=(0, 1), p3=(5, 8), p4=(5,
7), and p5=(0, 0.5). Euclidean distance is the distance function. The k-means algorithm
is used to cluster the points into two clusters. The initial cluster centers are p1 and p4.
The clusters after two iterations of k-means are:
A. {p1, p4, p5} {p2, p3}

B. {p1, p2, p5} {p3, p4}

C. {p3, p4, p5} {p1, p2}

D. {p1, p2, p4} {p3, p5}

Ans: B

Explanation: 1st iteration

Initial centres are c1 = P1 = (0, 0) and c2 = P4 = (5, 7).

      dist to c1=(0,0)   dist to c2=(5,7)   Closest centre
P1    0                  8.602              c1
P2    1                  7.81               c1
P3    9.434              1                  c2
P4    8.602              0                  c2
P5    0.5                8.2                c1

Clusters after the 1st iteration are:
C1 = {P1, P2, P5} with cluster centre c1 = (0, 0.5)
C2 = {P3, P4} with cluster centre c2 = (5, 7.5)

2nd iteration

      dist to c1=(0,0.5)   dist to c2=(5,7.5)   Closest centre
P1    0.5                  9.014                c1
P2    0.5                  8.2                  c1
P3    9.014                0.5                  c2
P4    8.2                  0.5                  c2
P5    0                    8.602                c1

Clusters formed after the 2nd iteration are {P1, P2, P5} and {P3, P4}.
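The two iterations can be replayed in a few lines of NumPy; a sketch starting from the given initial centres:

```python
import numpy as np

points = np.array([[0, 0], [0, 1], [5, 8], [5, 7], [0, 0.5]])  # p1..p5
centres = points[[0, 3]].astype(float)  # initial centres: p1 and p4

for it in range(2):
    # Assign each point to its nearest centre (Euclidean distance).
    d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    # Recompute each centre as the mean of its assigned points.
    centres = np.array([points[assign == k].mean(axis=0) for k in range(2)])
    print(it + 1, assign)  # both iterations: [0 0 1 1 0] -> {p1,p2,p5}, {p3,p4}
```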
Data Mining: Assignment Week 8: Regression

(Each question carries 1 mark)

1. Regression is used in:

A. predictive data mining

B. exploratory data mining

C. descriptive data mining

D. explanative data mining

Ans: A

Explanation: Regression is used for prediction.

2. In the regression equation Y = 21 - 3X, the slope is

A. 21
B. -21
C. 3
D. -3

Ans: D
Explanation: The slope-intercept form of a line is y = mx + c; here the slope m = −3.

3. The output of a regression algorithm is usually a:

A. real variable

B. integer variable

C. character variable

D. string variable

Ans: A

Explanation: Regression outputs real variables.


4. Regression finds out the model parameters which produce the least squared error between:

A. input value and output value

B. input value and target value

C. output value and target value

D. model parameters and output value

Ans: C

Explanation: Regression finds the model parameters that minimise the error between the output value and the target value.

5. The linear regression model y = a0 + a1x is applied to the data in the table
shown below. What is the value of the sum squared error function S(a0, a1),
when a0 = 1, a1 = 2?

x y
1 1
2 1
4 6
3 2

A. 0.0
B. 27
C. 13.5
D. 54

Ans: D
Explanation: y’ is the predicted output.

y’ = 1+2x

x y y’
1 1 3
2 1 5
4 6 9
3 2 7

sum of squared error = (1−3)² + (1−5)² + (6−9)² + (2−7)² = 4 + 16 + 9 + 25 = 54
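The sum-squared-error value is a one-liner to check; a sketch:

```python
xs, ys = [1, 2, 4, 3], [1, 1, 6, 2]
a0, a1 = 1, 2

# S(a0, a1) = sum over the data of (y - (a0 + a1 * x))^2
sse = sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))
print(sse)  # 54
```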


6. Consider x1, x2 to be the independent variables and y the dependent
variable, which of the following represents a linear regression model?

A. y = a0 + a1/x1 + a2/x2

B. y = a0 + a1x1 + a2x2

C. y = a0 + a1x1 + a2x2²

D. y = a0 + a1x1² + a2x2

Ans: B

Explanation: In option B, y is a linear function of the independent variables x1 and x2.

7. Find all the eigenvalues of the following matrix A. [The matrix itself is not reproduced in this copy; per the explanation below, it is a triangular matrix with diagonal entries 1, 2, 3.]
A. 1,3
B. 2,3
C. 1,2,3
D. Eigenvalues cannot be found.
Ans: C
Explanation: If A is an n × n triangular matrix (upper triangular, lower
triangular, or diagonal), then the eigenvalues of A are entries of the main
diagonal of A. Therefore, eigenvalues are 1,2,3.
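Since the matrix is not reproduced here, the property can be illustrated with any triangular matrix having diagonal 1, 2, 3; a sketch (the off-diagonal entries are made up):

```python
import numpy as np

# An upper-triangular matrix with diagonal entries 1, 2, 3
# (off-diagonal values are arbitrary, for illustration only).
A = np.array([[1, 4, 5],
              [0, 2, 6],
              [0, 0, 3]], dtype=float)
print(np.linalg.eigvals(A))  # [1. 2. 3.] -- the diagonal entries
```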

8. In the figures below the training instances for regression problems are described by dots. The blue dotted lines indicate the actual functions and the red lines indicate the regression models. Which of the following statements is correct?
A. Figure 1 represents overfitting and Figure 2 represents underfitting

B. Figure 1 represents underfitting and Figure 2 represents overfitting

C. Both Figure 1 and Figure 2 represents underfitting

D. Both Figure 1 and Figure 2 represents overfitting

Ans: B

Explanation: An underfitted model (Figure 1) is too simple to capture the underlying function, while an overfitted model (Figure 2) also fits the noise in the training data.

9. In principal component analysis, the projected lower dimensional space


corresponds to –

A. subset of the original co-ordinate axis

B. eigenvectors of the data covariance matrix

C. eigenvectors of the data distance matrix

D. orthogonal vectors to the original co-ordinate axis

Ans: B

Explanation: We must first subtract the mean of each variable from the dataset to center the data around the origin. Then we compute the covariance matrix of the data and calculate its eigenvalues and corresponding eigenvectors. We then normalize each of the orthogonal eigenvectors to become unit vectors. Once this is done, each of the mutually orthogonal unit eigenvectors can be interpreted as an axis of the ellipsoid fitted to the data. This choice of basis transforms the covariance matrix into a diagonalised form, with the diagonal elements representing the variance along each axis.
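The procedure described above maps directly onto NumPy; a small sketch on made-up 2-D data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3, 1], [0, 0.5]])  # made-up data

Xc = X - X.mean(axis=0)               # centre the data around the origin
C = np.cov(Xc, rowvar=False)          # covariance matrix of the data
eigvals, eigvecs = np.linalg.eigh(C)  # eigen-decomposition (symmetric matrix)

# Principal axes = eigenvectors, ordered by decreasing eigenvalue (variance).
order = np.argsort(eigvals)[::-1]
projected = Xc @ eigvecs[:, order]    # data expressed in the PCA basis
print(eigvals[order])                 # variance captured along each axis
```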
10. A time series prediction problem is often best solved using:

A. Multivariate regression

B. Autoregression

C. Logistic regression

D. Sinusoidal regression

Ans : B

Explanation: Autoregression is a time series model that uses observations from previous time steps as input to a regression equation to predict the value at the next time step.
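An AR(1) model fitted by ordinary least squares illustrates the idea; a sketch on a synthetic series (the coefficient and noise scale are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic AR(1) series: y_t = 0.8 * y_{t-1} + noise.
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.8 * y[t - 1] + rng.normal(scale=0.1)

# Regress y_t on y_{t-1} (least squares) to recover the AR coefficient.
X = np.column_stack([np.ones(199), y[:-1]])
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print(coef)                       # approximately [0, 0.8]
print(coef[0] + coef[1] * y[-1])  # one-step-ahead prediction
```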
