
Data Mining End 23 24

The document outlines the end semester examination for the Data Mining course at Motilal Nehru National Institute of Technology, covering various topics such as data mining tasks, frequent itemsets, SVM classification, decision tree algorithms, and neural networks. It includes specific questions related to data analysis techniques, sampling methods, and classification algorithms. The exam is structured to assess students' understanding of theoretical concepts and practical applications in data mining.

Uploaded by

Zeke. 1232


Motilal Nehru National Institute of Technology Allahabad
Prayagraj-211004 [India]

Computer Science & Engineering


End Semester (Even) Examination 2024
Programme Name: B.Tech. Semester: VI

Course Code: CS16103 Course Name: Data Mining

Branch: Computer Science & Engineering Student Reg. No.: ____________


Duration: 03 Hours Max. Marks: 60

Note: Attempt all questions. In case of any doubt about a question, make suitable assumptions, state them, and
justify them.
Marks

Q.1 (a) Discuss whether each of the following activities is a data mining task: (a) Dividing the
customers of a company according to their profitability, (b) Sorting a student database
based on student identification numbers, (c) Predicting the outcomes of tossing a (fair)
pair of dice, (d) Predicting the future stock price of a company using historical records,
(e) Monitoring the heart rate of a patient for abnormalities, (f) Extracting the frequencies
of a sound wave. [6]
(b) Using the following data for the attribute age (given in increasing order): 13, 15, 16, 16, 19,
20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70, sketch
examples of each of the following sampling techniques: simple random sampling without
replacement (SRSWOR), simple random sampling with replacement (SRSWR), and stratified
sampling. Use samples of size five and the strata "youth," "middle-aged," and "senior." [6]
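A minimal Python sketch of the three sampling schemes on the age data above. The stratum boundaries (youth under 30, middle-aged 30 to 49, senior 50 and over) and the 2/2/1 allocation of the five draws are assumptions, since the question does not fix them; the seed only makes the draws reproducible.

```python
import random

AGES = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

def srswor(data, n, rng):
    """SRSWOR: each tuple (position) can be drawn at most once."""
    return rng.sample(data, n)

def srswr(data, n, rng):
    """SRSWR: each drawn tuple is put back, so repeats are possible."""
    return [rng.choice(data) for _ in range(n)]

def stratum(age):
    """Assumed age bands; the question does not define them."""
    if age < 30:
        return "youth"
    if age < 50:
        return "middle-aged"
    return "senior"

def stratified(data, allocation, rng):
    """Group tuples by stratum, then draw an SRSWOR of allocation[s] tuples from each stratum s."""
    strata = {}
    for age in data:
        strata.setdefault(stratum(age), []).append(age)
    sample = []
    for name, k in allocation.items():
        sample.extend(rng.sample(strata[name], k))
    return sample

rng = random.Random(42)  # fixed seed so the draws are reproducible
print("SRSWOR:", srswor(AGES, 5, rng))
print("SRSWR: ", srswr(AGES, 5, rng))
# Assumed allocation of the five draws: 2 youth, 2 middle-aged, 1 senior.
print("Stratified:", stratified(AGES, {"youth": 2, "middle-aged": 2, "senior": 1}, rng))
```

A fixed per-stratum allocation is used because proportional allocation of only five draws would round the two-tuple senior stratum down to zero.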

Q.2 (a) A database has 5 transactions. Let min_sup = 60% and min_conf = 80%. [8]
TID    Items_bought
T100   {M, O, N, K, E, Y}
T200   {D, O, N, K, E, Y}
T300   {M, A, K, E}
T400   {M, U, C, K, Y}
T500   {C, O, O, K, I, E}
Find all frequent itemsets using Apriori and FP-growth, respectively.
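The level-wise Apriori search (join, prune, count) can be sketched in Python as below. It covers Apriori only; FP-growth should return the same itemsets. Transactions are stored as sets, so the repeated O in T500 counts once, and min_sup = 60% of 5 transactions is taken as a minimum support count of 3.

```python
from itertools import combinations

# Transactions from Q.2(a); Python sets drop the repeated O in T500.
TRANSACTIONS = [
    {"M", "O", "N", "K", "E", "Y"},  # T100
    {"D", "O", "N", "K", "E", "Y"},  # T200
    {"M", "A", "K", "E"},            # T300
    {"M", "U", "C", "K", "Y"},       # T400
    {"C", "O", "O", "K", "I", "E"},  # T500 (duplicate O collapses)
]
MIN_SUP_COUNT = 3  # 60% of 5 transactions

def apriori(db, min_count):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    # L1: count single items and keep the frequent ones.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # One database scan to count the surviving candidates.
        current = {}
        for c in candidates:
            n = sum(1 for t in db if c <= t)
            if n >= min_count:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent

for itemset, count in sorted(apriori(TRANSACTIONS, MIN_SUP_COUNT).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Under these assumptions the sketch reports {E}, {K}, {M}, {O}, {Y}, {E, K}, {E, O}, {K, M}, {K, O}, {K, Y} and {E, K, O}; the only 3-itemset candidate that survives pruning is {E, K, O}, with support count 3.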
(b) Discuss, in brief, the advantages and disadvantages of the cosine, Jaccard, and simple matching
coefficient (SMC) similarity measures.
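A small Python sketch that computes the three measures on binary vectors makes their main trade-off visible. The two example vectors are hypothetical sparse records, and returning 0.0 for an all-zero input is an assumed convention, since both cosine and Jaccard are undefined there.

```python
import math

def cosine(x, y):
    """cos(x, y) = x.y / (||x|| ||y||); ignores 0-0 matches and vector length."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0  # assumed convention for zero vectors

def jaccard(x, y):
    """Binary data: f11 / (f01 + f10 + f11); 0-0 matches are ignored."""
    f11 = sum(1 for a, b in zip(x, y) if a == b == 1)
    mismatches = sum(1 for a, b in zip(x, y) if a != b)
    denom = f11 + mismatches
    return f11 / denom if denom else 0.0  # assumed convention for all-zero pairs

def smc(x, y):
    """Simple matching coefficient: (f11 + f00) / n; 0-0 matches count as agreement."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)

# Hypothetical sparse binary records (e.g., items bought by two customers).
x = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print("SMC:", smc(x, y), "Jaccard:", jaccard(x, y), "cosine:", cosine(x, y))
```

For these two records SMC is 0.7 while Jaccard and cosine are both 0: the shared absences inflate SMC, which is why SMC suits symmetric binary attributes, whereas Jaccard and cosine suit sparse, asymmetric data. Cosine additionally handles non-binary (e.g., term-frequency) vectors.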

Q.3 (a) The support vector machine is a highly accurate classification method. However, SVM
classifiers suffer from slow processing when training with a large set of data tuples.
Discuss how to overcome this difficulty and develop a scalable SVM algorithm for
efficient SVM classification in large data sets. [6]
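One possible route, sketched below in Python, swaps in a technique the question does not name: training a linear SVM with Pegasos-style stochastic sub-gradient descent, which uses one tuple per step instead of solving the full quadratic program over all tuples at once. It is a minimal sketch assuming numeric features with a constant 1.0 appended as the bias input, labels in {-1, +1}, and hypothetical parameters `lam` (regularisation strength) and `epochs`; other routes, such as cluster-based or decomposition-based training, are not shown.

```python
import random

def pegasos_train(data, lam=0.01, epochs=20, seed=0):
    """Linear SVM via Pegasos-style stochastic sub-gradient descent.

    data: list of (features, label) pairs, label in {-1, +1}.
    Each step uses one randomly chosen tuple, so the cost per step does not
    depend on the size of the training set.
    """
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for _ in range(len(data)):
            t += 1
            x, y = data[rng.randrange(len(data))]
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Sub-gradient step on the regulariser lam/2 * ||w||^2 ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... plus the hinge-loss term when the tuple violates the margin.
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

# Toy, linearly separable data; the last feature is the constant bias input.
train = [([2.0, 2.0, 1.0], 1), ([3.0, 1.0, 1.0], 1),
         ([-2.0, -2.0, 1.0], -1), ([-1.0, -3.0, 1.0], -1)]
w = pegasos_train(train)
print(w, [predict(w, x) for x, _ in train])
```

Because each update touches a single tuple, the same loop can read tuples from disk or a stream rather than holding the whole training set in memory; the bias is regularised along with the weights here, which is a simplification.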
(b) [6]
RID  age          income  student  credit_rating  Class: buys_computer
1    youth        high    no       fair           no
2    youth        high    no       excellent      no
3    middle-aged  high    no       fair           yes
4    senior       medium  no       fair           yes
5    senior       low     yes      fair           yes
6    senior       low     yes      excellent      no
7    middle-aged  low     yes      excellent      yes
8    youth        medium  no       fair           no
9    youth        low     yes      fair           yes
10   senior       medium  yes      fair           yes
11   youth        medium  yes      excellent      yes
12   middle-aged  medium  no       excellent      yes
13   middle-aged  high    yes      fair           yes
14   senior       medium  no       excellent      no

Table 1: Computer Purchase Data


What would be the class label of the test tuple (age = youth, income = medium, student = yes,
credit_rating = excellent) predicted by a naive Bayesian classifier trained on the data in Table 1?
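A minimal Python sketch of the naive Bayesian computation, using exact fractions so the class scores can be compared without rounding. The rows transcribe Table 1; several cells were unreadable in the scanned paper and are read as in the standard AllElectronics training data, which should be treated as an assumption. No Laplace correction is applied.

```python
from fractions import Fraction

# Table 1 rows: (age, income, student, credit_rating, buys_computer)
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle-aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle-aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle-aged", "medium", "no", "excellent", "yes"),
    ("middle-aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]

def naive_bayes_scores(rows, x):
    """Return {C: P(C) * prod_k P(x_k | C)} from raw relative frequencies
    (no Laplace correction; every count needed for this x is non-zero)."""
    scores = {}
    for c in sorted({r[-1] for r in rows}):
        in_c = [r for r in rows if r[-1] == c]
        score = Fraction(len(in_c), len(rows))      # prior P(C)
        for k, value in enumerate(x):
            matches = sum(1 for r in in_c if r[k] == value)
            score *= Fraction(matches, len(in_c))   # likelihood P(x_k | C)
        scores[c] = score
    return scores

x = ("youth", "medium", "yes", "excellent")
scores = naive_bayes_scores(ROWS, x)
print({c: round(float(s), 4) for c, s in scores.items()})
print("predicted buys_computer =", max(scores, key=scores.get))
```

With the table as transcribed, P(X|yes)P(yes) = (2/9)(4/9)(6/9)(3/9)(9/14) = 8/567 ≈ 0.0141 and P(X|no)P(no) = (3/5)(2/5)(1/5)(3/5)(5/14) = 9/875 ≈ 0.0103, so the sketch predicts buys_computer = yes.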
Q.4 (a) Consider a traditional decision tree algorithm of your choice and provide solutions
to the following issues related to it: (a) Handling continuous attributes, (b) Dealing with
attributes that have associated costs, (c) Handling the inherent bias of the information gain
measure, (d) Handling missing values. [8]
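Two of these issues, continuous attributes and the bias of information gain toward many-valued attributes, can be sketched in a few lines of Python. Splitting at the midpoint between adjacent sorted values is an assumed convention (implementations differ), the gain ratio is the C4.5 normalisation by split information, and the toy values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D) = -sum p_i log2 p_i over the class distribution of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Continuous attribute: try the midpoint between each pair of adjacent
    distinct sorted values as a binary split (A <= t vs. A > t) and return
    (t, information_gain) for the split with the highest gain."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_t, best_gain = None, -1.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue
        t = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

def gain_ratio(gain, partition_sizes):
    """C4.5 gain ratio: gain divided by the split information of the partition,
    which penalises splits into many small partitions."""
    n = sum(partition_sizes)
    split_info = -sum(s / n * math.log2(s / n) for s in partition_sizes if s)
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy attribute: the classes separate cleanly at 3.5.
print(best_threshold([1, 2, 3, 4, 5, 6], ["no", "no", "no", "yes", "yes", "yes"]))
# Same gain (1.0), but a 6-way split on a unique ID scores lower than a 2-way split.
print(gain_ratio(1.0, [1] * 6), gain_ratio(1.0, [3, 3]))
```

Both splits have information gain 1.0 on this toy data, so information gain alone cannot tell them apart; the gain ratio drops to 1/log2(6) ≈ 0.387 for the six singleton partitions, while the clean binary split keeps 1.0.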
(b) Use the following methods to normalize the group of data 200, 300, 400, 600, 1000: (a)
min-max normalization by setting min = 0 and max = 1, (b) z-score normalization, (c) z-score
normalization using the mean absolute deviation instead of the standard deviation, (d)
normalization by decimal scaling. [4]
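The four normalizations can be sketched in Python as follows. Using the population standard deviation (`statistics.pstdev`) for the z-score is an assumption; some texts use the sample standard deviation, which would give slightly smaller magnitudes.

```python
import statistics

DATA = [200, 300, 400, 600, 1000]

def min_max(data, new_min=0.0, new_max=1.0):
    """v' = (v - min) / (max - min) * (new_max - new_min) + new_min."""
    lo, hi = min(data), max(data)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in data]

def z_score(data):
    """v' = (v - mean) / sigma, using the population standard deviation (assumption)."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    return [(v - mu) / sigma for v in data]

def z_score_mad(data):
    """v' = (v - mean) / s, where s is the mean absolute deviation."""
    mu = statistics.mean(data)
    s = sum(abs(v - mu) for v in data) / len(data)
    return [(v - mu) / s for v in data]

def decimal_scaling(data):
    """v' = v / 10**j for the smallest integer j with max(|v'|) < 1."""
    m = max(abs(v) for v in data)
    j = 0
    while m / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in data], j

print("min-max:", min_max(DATA))
print("z-score:", [round(v, 3) for v in z_score(DATA)])
print("z-score (MAD):", [round(v, 3) for v in z_score_mad(DATA)])
print("decimal scaling:", decimal_scaling(DATA))
```

For this data the mean is 500, the population standard deviation is about 282.84, and the mean absolute deviation is 240; decimal scaling needs j = 4 because 1000 / 10^3 = 1 is not strictly below 1.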

Q.5 (a) Consider a multilayer feed-forward neural network that uses the backpropagation
algorithm with the following weight and bias values: $w_{14} = 0.2$, $w_{15} = -0.3$, $w_{24} = 0.4$,
$w_{25} = 0.1$, $w_{34} = -0.5$, $w_{35} = 0.2$, $w_{46} = -0.3$, $w_{56} = -0.2$, $\theta_4 = -0.4$,
$\theta_5 = 0.2$, $\theta_6 = 0.1$. [8]
The activation function $O_j = \frac{1}{1 + e^{-I_j}}$ is used on each hidden or output unit $j$, which
receives the net input $I_j = \sum_i w_{ij} O_i + \theta_j$ from the units $i$ of the previous layer.
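[Figure: the multilayer feed-forward network, with input units 1, 2, 3, hidden units 4, 5, and output unit 6; edge labels such as $w_{15}$, $w_{46}$, and $w_{56}$ give the weights.]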
What would be the predicted class for the test sample X = (1, 0, 0) using the multilayer
feed-forward neural network classifier?
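Classifying a test sample only needs the forward pass (backpropagation is used for training), so a minimal Python sketch just propagates X through the weights and biases listed in Q.5(a). Mapping an output of 0.5 or more to class 1 and anything lower to class 0 is an assumption, as the question does not state the decision rule.

```python
import math

def sigmoid(i):
    """O_j = 1 / (1 + e^(-I_j))."""
    return 1.0 / (1.0 + math.exp(-i))

# Weight w_ij on the edge from unit i to unit j, and bias theta_j, as given in Q.5(a).
W = {(1, 4): 0.2, (1, 5): -0.3, (2, 4): 0.4, (2, 5): 0.1,
     (3, 4): -0.5, (3, 5): 0.2, (4, 6): -0.3, (5, 6): -0.2}
THETA = {4: -0.4, 5: 0.2, 6: 0.1}

def forward(x):
    """Propagate the inputs forward and return the output O_j of every unit."""
    out = {1: x[0], 2: x[1], 3: x[2]}  # input units pass their values through
    for j, inputs in ((4, (1, 2, 3)), (5, (1, 2, 3)), (6, (4, 5))):
        net = sum(W[(i, j)] * out[i] for i in inputs) + THETA[j]  # I_j
        out[j] = sigmoid(net)                                     # O_j
    return out

out = forward((1, 0, 0))
print(round(out[4], 4), round(out[5], 4), round(out[6], 4))
print("predicted class:", 1 if out[6] >= 0.5 else 0)  # assumed 0.5 threshold
```

For X = (1, 0, 0) the net inputs are I_4 = -0.2 and I_5 = -0.1, giving O_4 ≈ 0.450, O_5 ≈ 0.475 and O_6 ≈ 0.468, so under the assumed threshold the sample is assigned to class 0.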
(b) Outline methods for addressing the class imbalance problem. Suppose a bank would like
to develop a classifier that guards against fraudulent credit card transactions. Illustrate
how you can induce a quality classifier based on a large set of non-fraudulent examples
and a very small set of fraudulent cases. [4]
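Random oversampling of the minority class and random undersampling of the majority class are two common remedies (threshold moving and ensemble methods are others). The Python sketch below shows both on a hypothetical data set of 995 legitimate and 5 fraudulent transactions; the (features, label) tuple layout and the class sizes are illustrative assumptions.

```python
import random
from collections import Counter

def _group(rows):
    """Group (features, label) tuples by class label."""
    groups = {}
    for features, label in rows:
        groups.setdefault(label, []).append((features, label))
    return groups

def random_oversample(rows, seed=0):
    """Replicate randomly chosen tuples of each smaller class until every
    class is as large as the biggest one."""
    rng = random.Random(seed)
    groups = _group(rows)
    target = max(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(g)
        out.extend(rng.choice(g) for _ in range(target - len(g)))
    return out

def random_undersample(rows, seed=0):
    """Keep a random subset of each larger class, matching the smallest class."""
    rng = random.Random(seed)
    groups = _group(rows)
    target = min(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(rng.sample(g, target))
    return out

# Hypothetical data: 995 legitimate (label 0) and 5 fraudulent (label 1) transactions.
rows = [((i,), 0) for i in range(995)] + [((1000 + i,), 1) for i in range(5)]
print(Counter(label for _, label in random_oversample(rows)))
print(Counter(label for _, label in random_undersample(rows)))
```

Oversampling keeps every legitimate example but repeats the five fraud cases many times, which risks overfitting to them; undersampling avoids duplicates but discards most of the legitimate data. Either way, the resulting classifier is better judged with recall or precision on the fraud class than with plain accuracy.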
