Data Mining and Warehousing
f. Find the cosine similarity between the following two term-frequency vectors: CO-2 PO-1
X=[3,2,0,5,0,0,0,2,0,0]
Y=[1,0,0,0,0,0,0,1,0,2]
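For reference only, a minimal Python sketch of the cosine-similarity computation in part (f) above; the vectors come from the question, and the formula used is cos(X, Y) = (X . Y) / (||X|| ||Y||):

```python
import math

X = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
Y = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]

dot = sum(x * y for x, y in zip(X, Y))        # X . Y
norm_x = math.sqrt(sum(x * x for x in X))     # ||X||
norm_y = math.sqrt(sum(y * y for y in Y))     # ||Y||

cosine = dot / (norm_x * norm_y)
print(round(cosine, 3))   # dot = 5, ||X|| = sqrt(42), ||Y|| = sqrt(6), so about 0.315
```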
g. What is an attribute selection measure? CO-3 PO-1
i. Give two examples of activation functions used in neural networks. CO-3 PO-2
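Purely as an illustration for part (i) above, two activation functions commonly given as answers (the logistic sigmoid and ReLU), written as a small Python sketch:

```python
import math

def sigmoid(x):
    # logistic activation, output in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # rectified linear unit, output in [0, infinity)
    return max(0.0, x)

print(sigmoid(0.0), relu(-2.0), relu(3.0))   # 0.5 0.0 3.0
```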
3.a. Briefly outline how to compute the dissimilarity between objects described by 5 CO-1 PO-2
the following types of variables:
i. Numerical (interval-scaled) variables
ii. Categorical variables
iii. Ratio-scaled variables
iv. Nonmetric vector objects
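As an informal companion to 3(a) (not part of the paper), one plausible coding of the four cases, assuming Euclidean distance for interval-scaled variables, simple matching for categorical variables, a log transform followed by Euclidean distance for ratio-scaled variables, and 1 - cosine similarity for nonmetric vector objects:

```python
import math

def interval_dissim(x, y):
    # i. Interval-scaled: Euclidean distance (Minkowski with q = 2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def categorical_dissim(x, y):
    # ii. Categorical: simple matching, d = (p - m) / p
    p = len(x)
    m = sum(1 for a, b in zip(x, y) if a == b)
    return (p - m) / p

def ratio_scaled_dissim(x, y):
    # iii. Ratio-scaled: log-transform, then treat as interval-scaled
    return interval_dissim([math.log(a) for a in x], [math.log(b) for b in y])

def vector_dissim(x, y):
    # iv. Nonmetric vector objects: 1 - cosine similarity
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1 - dot / norms

print(categorical_dissim(["red", "M"], ["red", "S"]))   # 0.5 on a toy pair
```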
b. Explain the steps of KDD, with the help of a diagram. 5 CO-1 PO-1
(OR)
c. Suppose that a hospital tested the age and body fat data for 18 randomly 10 CO-2 PO-2
selected adults with the following results:
Age 23 23 27 27 39 41 47 49 50
% fat 9.5 26.5 7.8 17.8 31.4 25.9 27.4 27.2 31.2
Age 52 54 54 56 57 58 58 60 61
% fat 34.6 42.5 28.8 33.4 30.2 34.1 32.9 41.2 35.7
i. Calculate the mean, median, and standard deviation of age and % fat.
ii. Find the covariance and correlation between these two attributes.
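A quick Python sketch for 3(c); the age and % fat values are copied from the question, and the population (n) formulas are assumed here, although the expected answer may use the sample (n - 1) versions instead:

```python
import statistics as st

age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2,
       34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]

n = len(age)
mean_age, mean_fat = st.mean(age), st.mean(fat)
print(mean_age, st.median(age), st.pstdev(age))   # mean, median, population std of age
print(mean_fat, st.median(fat), st.pstdev(fat))   # mean, median, population std of % fat

cov = sum((a - f_mean) for a, f_mean in [(0, 0)])  # placeholder removed below
cov = sum((a - mean_age) * (f - mean_fat) for a, f in zip(age, fat)) / n
corr = cov / (st.pstdev(age) * st.pstdev(fat))
print(cov, corr)                                  # covariance and Pearson correlation
```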
4.a. Explain how the Apriori algorithm is used for mining frequent itemsets. 5 CO-2 PO-1
b. What are the measures of interestingness for an association rule? Define a 5 CO-2 PO-2
strong association rule.
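To make the measures in 4(b) concrete, a toy Python sketch (the transactions here are invented for illustration): support(A => B) = P(A and B together) and confidence(A => B) = P(B | A); a rule is strong when both meet the user-specified minimum support and minimum confidence thresholds.

```python
# Toy transaction database, not taken from the paper.
transactions = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]

def support(itemset):
    # fraction of transactions containing every item in the itemset
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    # support of the whole rule divided by support of the antecedent
    return support(antecedent | consequent) / support(antecedent)

print(support({"A", "C"}))        # 0.5
print(confidence({"A"}, {"C"}))   # 1.0
```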
(OR)
c. There are five transactions (T1, T2, T3, T4, T5) with items (A, B, C, D) purchased 10 CO-3 PO-2
as T1(B, C), T2(A, C, D), T3(B, C), T4(A, B, C, D), T5(B, D). The min_sup = 2.
Show how the Apriori rule mining algorithm generates the association rules
for the above dataset.
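A rough Apriori sketch for 4(c); the transactions and min_sup = 2 are taken from the question, while the join/prune coding details are only illustrative:

```python
from itertools import combinations

transactions = [{"B", "C"}, {"A", "C", "D"}, {"B", "C"}, {"A", "B", "C", "D"}, {"B", "D"}]
MIN_SUP = 2   # absolute support count from the question

def count(itemset):
    return sum(1 for t in transactions if itemset <= t)

# L1: frequent 1-itemsets
items = sorted({i for t in transactions for i in t})
frequent = [{frozenset([i]) for i in items if count(frozenset([i])) >= MIN_SUP}]

# Lk: join L(k-1) with itself, then prune candidates below min_sup
k = 2
while frequent[-1]:
    candidates = {a | b for a in frequent[-1] for b in frequent[-1] if len(a | b) == k}
    frequent.append({c for c in candidates if count(c) >= MIN_SUP})
    k += 1

# Rule generation: X -> (itemset - X) with confidence = sup(itemset) / sup(X)
all_frequent = set().union(*frequent)
for itemset in sorted(all_frequent, key=len):
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            conf = count(itemset) / count(antecedent)
            print(set(antecedent), "->", set(itemset - antecedent),
                  "sup =", count(itemset), "conf =", round(conf, 2))
```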
5.a. What is the decision tree algorithm? List the attribute selection measures 5 CO-2 PO-2
used by the ID3 algorithm to construct a decision tree.
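For 5(a), a small Python sketch of information gain, the attribute selection measure used by ID3; the toy rows and labels below are invented purely for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    # Info(D) = -sum p_i * log2(p_i) over the class proportions
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    # Gain(A) = Info(D) - sum_j (|D_j| / |D|) * Info(D_j), splitting on attribute A
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    expected = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - expected

rows = [("yes",), ("yes",), ("no",), ("no",)]   # one toy binary attribute
labels = ["+", "+", "-", "+"]
print(round(information_gain(rows, labels, 0), 3))   # about 0.311
```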
b. Write a short note on the Naïve Bayes classifier. 5 CO-2 PO-1
(OR)
c. A multilayer feed-forward neural network is shown in the figure below. Let the 10 CO-3 PO-2
learning rate be 0.9. The initial weight and bias values of the network are given
in the table below, along with the first training tuple, X = (1, 0, 1), with a class
label of 1. Compute the net input, output, and error at each node, and update the
weight and bias values just once. Use the logistic activation function at nodes 4, 5, and 6.
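For 5(c), a sketch of one forward and one backward pass with the logistic activation and learning rate 0.9; because the figure and table are not reproduced in this text, the weight and bias values below are placeholders and must be replaced by the ones given in the question:

```python
import math

LEARNING_RATE = 0.9
x = {1: 1.0, 2: 0.0, 3: 1.0}            # training tuple X = (1, 0, 1)
target = 1.0                            # class label

inputs = {4: (1, 2, 3), 5: (1, 2, 3), 6: (4, 5)}   # assumed feed-forward topology
# PLACEHOLDER weights w[(i, j)] (node i -> node j) and biases b[j]:
# substitute the values from the table in the question.
w = {(1, 4): 0.1, (2, 4): 0.1, (3, 4): 0.1,
     (1, 5): 0.1, (2, 5): 0.1, (3, 5): 0.1,
     (4, 6): 0.1, (5, 6): 0.1}
b = {4: 0.1, 5: 0.1, 6: 0.1}

sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))

# Forward pass: net input I_j = sum_i w_ij * O_i + b_j, output O_j = sigmoid(I_j)
out = dict(x)
for j in (4, 5, 6):
    net = sum(w[(i, j)] * out[i] for i in inputs[j]) + b[j]
    out[j] = sigmoid(net)
    print(f"node {j}: net input = {net:.4f}, output = {out[j]:.4f}")

# Backward pass: error at the output node, then at the hidden nodes (old weights)
err = {6: out[6] * (1 - out[6]) * (target - out[6])}
for j in (4, 5):
    err[j] = out[j] * (1 - out[j]) * err[6] * w[(j, 6)]
print("errors:", {j: round(e, 4) for j, e in err.items()})

# Single update of weights and biases
for (i, j) in w:
    w[(i, j)] += LEARNING_RATE * err[j] * out[i]
for j in b:
    b[j] += LEARNING_RATE * err[j]
```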
6.a. Why is outlier mining important? Briefly describe the different approaches 5 CO-2 PO-2
behind distance-based outlier detection and density-based local outlier
detection.
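To illustrate the distance-based approach in 6(a), a toy Python sketch of the DB(pct, dmin) outlier notion: an object is flagged when at least a fraction pct of the remaining objects lie more than dmin away (the data points and thresholds here are invented):

```python
import math

points = [(1, 1), (1, 2), (2, 1), (2, 2), (10, 10)]   # toy data; (10, 10) is isolated
PCT, DMIN = 0.8, 3.0                                  # illustrative thresholds

for i, p in enumerate(points):
    others = [q for j, q in enumerate(points) if j != i]
    far = sum(1 for q in others if math.dist(p, q) > DMIN)
    if far / len(others) >= PCT:
        print(p, "is a DB(0.8, 3.0) distance-based outlier")
```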
b. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8): 5 CO-2 PO-1
Compute the Minkowski distance between the two objects, using q = 3.
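A quick numeric check for 6(b), using the two tuples and q = 3 from the question:

```python
x = (22, 1, 42, 10)
y = (20, 0, 36, 8)
q = 3

# Minkowski distance: (sum |x_i - y_i|^q) ^ (1/q)
d = sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)
print(round(d, 3))   # 2^3 + 1^3 + 6^3 + 2^3 = 233, cube root is about 6.153
```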
(OR)
c. Both the k-means and k-medoids algorithms can perform effective clustering. 5 CO-3 PO-2
Illustrate the strengths and weaknesses of k-means in comparison with the
k-medoids algorithm.
d. Suppose that the data mining task is to cluster the following eight points (with 5 CO-3 PO-2
(x, y) representing location) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9).
The distance function is Euclidean distance. Suppose initially we assign A1,
B1, and C1 as the center of each cluster, respectively.
Use the k-means algorithm to show only
i. The three cluster centers after the first round of execution
ii. The final three clusters
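A Python sketch for 6(d); the eight points, the Euclidean distance function, and the initial centers A1, B1, C1 are taken from the question, and the loop prints the centers after each round until they stop changing (the coding details are only illustrative):

```python
import math

points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
          "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9)}
centers = [points["A1"], points["B1"], points["C1"]]   # initial centers

for round_no in range(1, 20):
    # assign each point to its nearest center (Euclidean distance)
    clusters = {i: [] for i in range(3)}
    for name, p in points.items():
        nearest = min(range(3), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(name)
    # recompute each center as the mean of its cluster
    new_centers = [tuple(sum(points[n][d] for n in clusters[i]) / len(clusters[i])
                         for d in (0, 1)) for i in range(3)]
    print(f"round {round_no}: centers = {new_centers}, clusters = {clusters}")
    if new_centers == centers:      # converged: these are the final three clusters
        break
    centers = new_centers
```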
--- End of Paper ---