Data Mining and Warehousing22

This document is an examination paper for the B. Tech (Fifth Semester – Regular) on Data Mining & Data Warehousing at GIET University. It consists of multiple choice questions, short answer questions, and long answer questions covering various topics in data mining and data warehousing. The exam is structured to assess knowledge on algorithms, data structures, and statistical measures relevant to the field.

QPC: RD20BTECH327    AR 20    Reg. No

GIET UNIVERSITY, GUNUPUR – 765022


B. Tech (Fifth Semester – Regular) Examinations, December – 2022
BPCCS5010 / BPCCT5010 - Data Mining & Data Warehousing
(CSE & CST)
Time: 3 hrs Maximum: 70 Marks
Answer ALL Questions
The figures in the right hand margin indicate marks.
PART – A: (Multiple Choice Questions) (1 x 10 = 10 Marks)

Q.1. Answer ALL questions CO # PO #


a. What does Apriori algorithm do?    CO-3 PO-1
    i. It mines all frequent patterns through pruning rules with lesser support
    ii. It mines all frequent patterns through pruning rules with higher support
    iii. Both i and ii
    iv. None of the above
b. What is not true about FP growth algorithms?    CO-2 PO-2
    i. It mines frequent itemsets without candidate generation.
    ii. There are chances that FP trees may not fit in the memory
    iii. FP trees are very expensive to build
    iv. It expands the original database to build FP trees.
c. What is Gini index?    CO-3 PO-1
    i. It is a type of index structure
    ii. It is a measure of purity
    iii. Both options except none
    iv. None of the options
d. Which one of these is not a tree based learner?    CO-2 PO-2
    i. CART
    ii. ID3
    iii. Bayesian classifier
    iv. Random Forest
e. The following technology is not well-suited for data mining:    CO-3 PO-1
    i. Expert system technology
    ii. Data visualization
    iii. Technology limited to specific data types such as numeric data types
    iv. Parallel architecture
f. Which of the following features usually applies to data in a data warehouse?    CO-3 PO-1
    i. Data are often deleted
    ii. Most applications consist of transactions
    iii. Data are rarely deleted
    iv. Relatively few records are processed by applications
g. In the relational database terminology, a table is synonymous with:    CO-1 PO-1
    i. A column
    ii. A row
    iii. An attribute
    iv. A relation
h. A null value indicates:    CO-1 PO-1
    i. A numeric value with value 0
    ii. The absence of a value
    iii. A very small value
    iv. An erroneous value
i. The following is a major disadvantage while using a neural network    CO-2 PO-2
    i. It is very difficult to find optimal or near optimal parameters for the network
    ii. Interpretation of the model becomes very difficult
    iii. It becomes difficult to model non-linear relation between input and output variables
    iv. The number of inputs it can handle is limited
j. In training a neural network using back propagation algorithm    CO-2 PO-2
    i. Chain rule of differentiation is used in computing gradient of the error surface
    ii. Activation functions are chosen so that they are differentiable in nature
    iii. The connecting weights can be generated initially at random in the range of (0.0, 1.0)
    iv. All of the above
PART – B: (Short Answer Questions) (2 x 10 = 20 Marks)

Q2. Answer ALL questions CO # PO #

a. What is Knowledge Discovery? CO-1 PO-1

b. What is the need of data warehouses? CO-2 PO-2

c. Define fact table. CO-4 PO-1

d. Define metadata and explain the types of metadata CO-3 PO-1

e. Define support and confidence. CO-3 PO-1

f. Find the cosine similarity between the given two term frequency vectors: CO-2 PO-1

X=[3,2,0,5,0,0,0,2,0,0]
Y=[1,0,0,0,0,0,0,1,0,2]
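For reference, the computation asked for in question f can be sketched in Python as below. This is only an illustrative sketch: the helper name `cosine_similarity` is an assumption and not part of the paper. It divides the dot product of the two vectors by the product of their Euclidean norms.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))       # X . Y
    norm_x = math.sqrt(sum(a * a for a in x))    # ||X||
    norm_y = math.sqrt(sum(b * b for b in y))    # ||Y||
    return dot / (norm_x * norm_y)

# Term frequency vectors from the question.
X = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
Y = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]

similarity = cosine_similarity(X, Y)
print(round(similarity, 4))
```

Here X . Y = 5, ||X|| = sqrt(42) and ||Y|| = sqrt(6), so the result is 5 / sqrt(252), roughly 0.315.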
g. What is attribute selection measure? CO-3 PO-1

h. Briefly describe the k-NN classification algorithm. CO-3 PO-3

i. Give two examples of activation function used in neural networks. CO-3 PO-2

j. Explain the principle of hierarchical clustering. CO-3 PO-1

PART – C: (Long Answer Questions) (10 x 4 = 40 Marks)

Answer ALL questions Marks CO # PO #

3.a. Briefly outline how to compute the dissimilarity between objects described by the following types of variables:    5 CO-1 PO-2
i. Numerical (interval-scaled) variables
ii. Categorical variables
iii. Ratio-scaled variables
iv. Nonmetric vector objects
b. Explain the steps of KDD, with the help of a diagram. 5 CO-1 PO-1

(OR)
c. Suppose that a hospital tested the age and body fat data for 18 randomly selected adults with the following results:    10 CO-2 PO-2

Age     23    23    27    27    39    41    47    49    50
% fat   9.5   26.5  7.8   17.8  31.4  25.9  27.4  27.2  31.2

Age     52    54    54    56    57    58    58    60    61
% fat   34.6  42.5  28.8  33.4  30.2  34.1  32.9  41.2  35.7

i. Calculate the mean, median, and standard deviation of age and %fat.
ii. Find out the covariance and correlation among these two attributes.
4.a. Explain how Apriori Algorithm is used for mining frequent item sets. 5 CO-2 PO-1

b. What are the measures of interestingness for an association rule? Define a strong association rule.    5 CO-2 PO-2
(OR)
c. There are five transactions (T1, T2, T3, T4, T5) with items (A, B, C, D) purchased as T1(B,C), T2(A,C,D), T3(B,C), T4(A,B,C,D), T5(B,D). The min_sup = 2. Show how the Apriori rule mining algorithm can generate the association rules for the above dataset.    10 CO-3 PO-2
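The level-wise procedure that question c refers to can be sketched in Python as follows. This is a minimal sketch under stated assumptions: min_sup = 2 is read as an absolute support count, the function names `apriori` and `association_rules` are illustrative, and because the question gives no minimum confidence, every candidate rule is listed with its confidence instead of being filtered.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {itemset: support count} for every frequent itemset."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]   # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count the support of each candidate by scanning the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def association_rules(frequent):
    """List (antecedent, consequent, confidence) for each frequent itemset."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                rules.append((lhs, itemset - lhs, count / frequent[lhs]))
    return rules

# Transactions T1..T5 from the question.
transactions = [{"B", "C"}, {"A", "C", "D"}, {"B", "C"},
                {"A", "B", "C", "D"}, {"B", "D"}]

frequent = apriori(transactions, min_sup=2)
rules = association_rules(frequent)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

With these transactions the only frequent 3-itemset is {A, C, D}: {A, B} has support 1, so candidates containing it are pruned, and {B, C, D} is counted but appears only in T4.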
5.a. What is a decision tree algorithm? List the attribute selection measures used by the ID3 algorithm to construct a decision tree.    5 CO-2 PO-2
b. Write a short note on the Naïve Bayes classifier.    5 CO-2 PO-1

(OR)
c. A multilayer feed-forward neural network is shown in the figure below. Let the learning rate be 0.9. The initial weight and bias values of the network are given in the table below, along with the first training tuple, X = (1, 0, 1), with a class label of 1. Compute the net input, output and error at each node, and update the weight and bias values just once. Use the logistic activation function at nodes 4, 5 and 6.    10 CO-3 PO-2

Initial input, weight and bias values:

x1  x2  x3  w14  w15   w24  w25  w34   w35  w46   w56   θ4    θ5   θ6
1   0   1   0.2  -0.3  0.4  0.1  -0.5  0.2  -0.3  -0.2  -0.4  0.2  0.1

6.a. Why is outlier mining important? Briefly describe the different approaches behind distance-based outlier detection and density-based local outlier detection.    5 CO-2 PO-2
b. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8): Compute the Minkowski distance between the two objects, using q = 3.    5 CO-2 PO-1
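As a reference for question b, the Minkowski distance of order q is the q-th root of the sum of the absolute coordinate differences raised to the power q. A minimal Python sketch follows; the function name `minkowski_distance` is an illustrative assumption, not part of the paper.

```python
def minkowski_distance(p, r, q):
    """Minkowski distance of order q between two equal-length tuples."""
    return sum(abs(a - b) ** q for a, b in zip(p, r)) ** (1.0 / q)

# Objects from the question.
obj1 = (22, 1, 42, 10)
obj2 = (20, 0, 36, 8)

distance = minkowski_distance(obj1, obj2, q=3)
print(round(distance, 4))
```

The absolute differences are 2, 1, 6 and 2, so the sum of cubes is 8 + 1 + 216 + 8 = 233 and the distance is the cube root of 233, a little over 6.15.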
(OR)
c. Both k-means and k-medoids algorithms can perform effective clustering. Illustrate the strength and weakness of k-means in comparison with the k-medoids algorithm.    5 CO-3 PO-2
d. Suppose that the data mining task is to cluster the following eight points (with (x, y) representing location) into three clusters:    5 CO-3 PO-2
A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9).
The distance function is Euclidean distance. Suppose initially we assign A1, B1, and C1 as the center of each cluster, respectively. Use the k-means algorithm to show only
i. The three cluster centers after the first round of execution
ii. The final three clusters
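The two k-means steps that question d exercises (assign each point to its nearest center, then recompute each center as the mean of its cluster) can be sketched in Python as below. This is an illustrative sketch only: the helper names `assign` and `update` are assumptions, and the code assumes that no cluster ever becomes empty, which holds for this data.

```python
import math

# Points from the question, keyed by label.
points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
          "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9)}

def assign(points, centers):
    """Put each point label into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)),
                      key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(name)
    return clusters

def update(points, clusters):
    """Recompute each center as the mean of its members (non-empty assumed)."""
    return [tuple(sum(points[n][d] for n in c) / len(c) for d in range(2))
            for c in clusters]

centers = [points["A1"], points["B1"], points["C1"]]   # initial centers
first_round_centers = None
while True:
    clusters = assign(points, centers)
    new_centers = update(points, clusters)
    if first_round_centers is None:
        first_round_centers = new_centers
    if new_centers == centers:       # centers unchanged: converged
        break
    centers = new_centers

print("Centers after round 1:", first_round_centers)
print("Final clusters:", clusters)
```

After the first round the centers are (2, 10), (6, 6) and (1.5, 3.5). The loop stops once the centers no longer move, with clusters {A1, B1, C2}, {A3, B2, B3} and {A2, C1}.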
--- End of Paper ---
