Data Mining End 23 24
Note: Attempt all questions. In case of any doubt in any question, make suitable assumptions, state them, and
justify them.
Marks
Q.1 (a) Discuss whether each of the following activities is a data mining task: (a) Dividing the [6]
customers of a company according to their profitability, (b) Sorting a student database
based on student identification numbers, (c) Predicting the outcomes of tossing a (fair)
pair of dice, (d) Predicting the future stock price of a company using historical records,
(e) Monitoring the heart rate of a patient for abnormalities, (f) Extracting the frequencies
of a sound wave.
(b) Use the following data (in increasing order) for the attribute age: 13, 15, 16, 16, 19, [6]
20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70. Sketch
examples of each sampling technique: Simple Random Sampling without Replacement (SRSWOR),
Simple Random Sampling with Replacement (SRSWR), and stratified sampling. Use samples of size five
and the strata "youth," "middle-aged," and "senior."
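The three sampling schemes can be sketched as follows. This is only an illustration, not a model answer: the stratum cut-offs (24 and 49), the per-stratum allocation of 2/2/1, and the helper names are assumptions, since the question names the strata but does not define their boundaries.

```python
import random

AGES = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

def srswor(data, n, rng):
    """Simple random sample without replacement: each tuple is drawn at most once."""
    return rng.sample(data, n)

def srswr(data, n, rng):
    """Simple random sample with replacement: a drawn tuple returns to the pool."""
    return [rng.choice(data) for _ in range(n)]

def stratum(age):
    # Assumed cut-offs; the question does not specify them.
    if age <= 24:
        return "youth"
    if age <= 49:
        return "middle-aged"
    return "senior"

def stratified(data, allocation, rng):
    """Draw an SRSWOR of the allocated size inside each stratum."""
    groups = {}
    for age in data:
        groups.setdefault(stratum(age), []).append(age)
    return {name: rng.sample(groups[name], k) for name, k in allocation.items()}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
wor = srswor(AGES, 5, rng)
wr = srswr(AGES, 5, rng)
# Assumed allocation: at least one tuple per stratum, five tuples in total.
strat = stratified(AGES, {"youth": 2, "middle-aged": 2, "senior": 1}, rng)
print("SRSWOR:", wor)
print("SRSWR:", wr)
print("Stratified:", strat)
```

A strictly proportional allocation would give the two-tuple "senior" stratum less than one slot, which is why this sketch fixes the allocation by hand.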
Q.2 (a) A database has 5 transactions. Let min_sup = 60% and min_conf = 80%. [8]
TID    Items_bought
T100   {M, O, N, K, E, Y}
T200   {D, O, N, K, E, Y}
T300   {M, A, K, E}
T400   {M, U, C, K, Y}
T500   {C, O, O, K, I, E}
Find all frequent itemsets using Apriori and FP-growth, respectively.
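A minimal level-wise Apriori sketch for this database is given below, for checking a hand-worked answer. The function name `apriori` and the integer-percentage threshold are illustrative choices; the repeated O in T500 is treated as a single item. FP-growth is not shown, but it should yield the same frequent itemsets.

```python
from itertools import combinations

TRANSACTIONS = {
    "T100": {"M", "O", "N", "K", "E", "Y"},
    "T200": {"D", "O", "N", "K", "E", "Y"},
    "T300": {"M", "A", "K", "E"},
    "T400": {"M", "U", "C", "K", "Y"},
    "T500": {"C", "O", "K", "I", "E"},  # duplicate O collapsed
}

def apriori(transactions, min_sup_pct):
    """Return {frozenset: support_count} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def count(itemset):
        return sum(1 for b in baskets if itemset <= b)

    def frequent_only(candidates):
        counted = {c: count(c) for c in candidates}
        # Integer comparison avoids floating-point issues at the threshold.
        return {c: s for c, s in counted.items() if s * 100 >= min_sup_pct * n}

    level = frequent_only({frozenset([i]) for b in baskets for i in b})
    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = frequent_only(candidates)
    return frequent

result = apriori(TRANSACTIONS.values(), 60)
for itemset, support in sorted(result.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), support)
```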
(b) Discuss, in brief, the advantages and disadvantages of the Cosine, Jaccard, and Simple Matching
Coefficient similarity measures.
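As a small illustration of how the three measures can disagree, the sketch below evaluates them on two sparse binary vectors. The vectors and function names are made up for this example and are not part of the question.

```python
import math

def match_counts(x, y):
    """Return (f11, f10, f01, f00) for two equal-length binary vectors."""
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    f10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    f01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    f00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    return f11, f10, f01, f00

def smc(x, y):
    """Simple Matching Coefficient: counts 1-1 and 0-0 matches alike."""
    f11, f10, f01, f00 = match_counts(x, y)
    return (f11 + f00) / (f11 + f10 + f01 + f00)

def jaccard(x, y):
    """Jaccard coefficient: ignores 0-0 matches."""
    f11, f10, f01, _ = match_counts(x, y)
    denom = f11 + f10 + f01
    return f11 / denom if denom else 0.0

def cosine(x, y):
    """Cosine similarity: also defined for non-binary vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

# Hypothetical sparse vectors: one shared 1, one unshared 1 each, seven shared 0s.
x = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(round(smc(x, y), 3), round(jaccard(x, y), 3), round(cosine(x, y), 3))
```

On sparse data the shared zeros push SMC up to 0.8, while Jaccard (about 0.333) and cosine (0.5) ignore them.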
Q.3 (a) The support vector machine is a highly accurate classification method. However, SVM [6]
classifiers suffer from slow processing when training with a large set of data tuples.
Discuss how to overcome this difficulty and develop a scalable SVM algorithm for
efficient SVM classification in large data sets.
(b) [6]
RID  age          income  student  credit_rating  Class: buys_computer
1    youth        high    no       fair           no
2    youth        high    no       excellent      no
3    middle_aged  high    no       fair           yes
4    senior       medium  no       fair           yes
5    senior       low     yes      fair           yes
6    senior       low     yes      excellent      no
7    middle_aged  low     yes      excellent      yes
8    youth        medium  no       fair           no
9    youth        low     yes      fair           yes
10   senior       medium  yes      fair           yes
11   youth        medium  yes      excellent      yes
12   middle_aged  medium  no       excellent      yes
13   middle_aged  high    yes      fair           yes
14   senior       medium  no       excellent      no
Q.5 (a) Consider a multilayer feed-forward neural network that uses the backpropagation [8]
algorithm with the given weight and bias values: $w_{14} = 0.2$, $w_{15} = -0.3$, $w_{24} = 0.4$, $w_{25} = 0.1$,
$w_{34} = -0.5$, $w_{35} = 0.2$, $w_{46} = -0.3$, $w_{56} = -0.2$, $\theta_4 = -0.4$, $\theta_5 = 0.2$, $\theta_6 = 0.1$.
The activation function $O_j = \frac{1}{1 + e^{-I_j}}$ is used on each hidden or output unit $j$ that receives an
input $I_j = \sum_i w_{ij} O_i + \theta_j$ with respect to the previous layer, $i$.
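[Figure: diagram of the multilayer feed-forward neural network, with the connection weights labelled on the edges.]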
What would be the predicted class for the test sample X = (1, 0, 0) using the multilayer
feed-forward neural network classifier?
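The forward pass for this sample can be sketched as below. Several points are assumptions rather than facts from the paper: the weight and bias values are as read from the question (partly reconstructed from a damaged scan), the topology is taken to be inputs 1 to 3, hidden units 4 and 5, and output unit 6, and the 0.5 decision threshold is assumed because the question does not state how the output maps to a class.

```python
import math

# Weights w[(i, j)] from unit i to unit j, and biases theta[j].
W = {(1, 4): 0.2, (1, 5): -0.3, (2, 4): 0.4, (2, 5): 0.1,
     (3, 4): -0.5, (3, 5): 0.2, (4, 6): -0.3, (5, 6): -0.2}
THETA = {4: -0.4, 5: 0.2, 6: 0.1}

def sigmoid(z):
    """Logistic activation O_j = 1 / (1 + e^{-I_j})."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x):
    """Propagate input x = (x1, x2, x3) and return the output of every unit."""
    out = {1: x[0], 2: x[1], 3: x[2]}
    # Units are visited layer by layer: hidden units 4 and 5, then output 6.
    for j, inputs in ((4, (1, 2, 3)), (5, (1, 2, 3)), (6, (4, 5))):
        net = sum(W[(i, j)] * out[i] for i in inputs) + THETA[j]
        out[j] = sigmoid(net)
    return out

out = forward((1, 0, 0))
predicted = 1 if out[6] >= 0.5 else 0  # assumed threshold of 0.5
print({j: round(out[j], 4) for j in (4, 5, 6)}, "class:", predicted)
```

With these values the net inputs are $I_4 = -0.2$ and $I_5 = -0.1$, giving $O_4 \approx 0.4502$, $O_5 \approx 0.4750$, and $O_6 \approx 0.4675$. Under the assumed threshold that falls on the class-0 side.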
(b) Outline methods for addressing the class imbalance problem. Suppose a bank would like [4]
to develop a classifier that guards against fraudulent credit card transactions. Illustrate
how you can induce a quality classifier based on a large set of non-fraudulent examples
and a very small set of fraudulent cases.
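One of the standard remedies, random oversampling of the rare class, can be sketched as follows. The toy transaction IDs and labels are hypothetical, and `random_oversample` is an illustrative helper, not a library function. Undersampling, threshold moving, and ensemble methods are alternatives not shown here.

```python
import random

def random_oversample(examples, labels, minority_label, rng):
    """Resample minority tuples with replacement until both classes are equal in size."""
    pairs = list(zip(examples, labels))
    minority = [p for p in pairs if p[1] == minority_label]
    majority = [p for p in pairs if p[1] != minority_label]
    # No extra draws are made if the minority is already as large as the majority.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced

# Hypothetical training set: 8 legitimate transactions and 2 fraudulent ones.
transaction_ids = list(range(10))
classes = ["legit"] * 8 + ["fraud"] * 2

balanced = random_oversample(transaction_ids, classes, "fraud", random.Random(0))
fraud_count = sum(1 for _, label in balanced if label == "fraud")
legit_count = sum(1 for _, label in balanced if label == "legit")
print(legit_count, fraud_count)
```

Oversampling should be applied to the training split only, so that duplicated fraud cases do not leak into the test data.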