DWDM Unit Wise Question Bank


Unit-1
1 Define Data Warehouse? Differentiate between OLAP and OLTP?

2 Discuss 3-tier Data Warehouse architecture with a neat diagram?

3 Explain the role of different types of Data Warehouse models?

4 What is a Data Cube in a multidimensional data model? Briefly explain the Data Cube with a suitable example.

5 Summarize the different schemas available for multidimensional data modelling.

6 Discuss typical OLAP operations with suitable examples?

7 Explain the role of the Star net query model in the multidimensional data model?

8 Describe Data warehouse design and usage?

9 Explain the role of materialization in efficient Data cube computation?

10 Discuss different indexing techniques available for OLAP data?

11 Summarize OLAP Server architectures.


Unit-2

1 What is Data mining? Explain the role of Data mining in KDD with a neat diagram?

2 Discuss the challenges that motivated the development of data mining

3 Illustrate briefly about any 4 Data mining tasks with suitable examples?

4 What are the properties of attribute values? Examine each attribute type?

5 What are the general characteristics of the dataset? Examine different types of dataset?

6 Examine the role of the following terms with respect to quality of data.
I. Measurement Error
II. Data Collection Error
III. Noise & Artifacts
IV. Precision
V. Bias
VI. Accuracy
VII. Outliers
VIII. Missing Values
IX. Duplicate data
7 Discuss various data sampling approaches with relevant examples

8 What is Discretization? Discuss Supervised Discretization and Unsupervised Discretization.

9 Explain the role of variable transformation in data preprocessing

10 What is the Curse of Dimensionality? Examine Dimensionality reduction, Feature Subset selection and Feature Creation?
11 a) Calculate the Euclidean distance, Manhattan distance and cosine similarity between two vectors (3,2,0,5) and (1,0,0,0)
b) Find the Jaccard Similarity Coefficient between the two sets A = {0,1,2,5,6} and B = {0,2,3,4,5,7,9}
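
Answers to question 11 can be checked with a short script. The following is a minimal Python sketch, assuming the standard textbook definitions of each measure; the function names are illustrative and not part of the question bank.

```python
import math

def euclidean(x, y):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_similarity(x, y):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def jaccard(s, t):
    """Size of the intersection divided by the size of the union."""
    return len(s & t) / len(s | t)

x, y = (3, 2, 0, 5), (1, 0, 0, 0)
A, B = {0, 1, 2, 5, 6}, {0, 2, 3, 4, 5, 7, 9}

print(round(euclidean(x, y), 3))          # sqrt(33)   -> 5.745
print(manhattan(x, y))                    # 2+2+0+5    -> 9
print(round(cosine_similarity(x, y), 3))  # 3/sqrt(38) -> 0.487
print(round(jaccard(A, B), 3))            # 3/9        -> 0.333
```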
12 Calculate the correlation coefficient between two vectors (1, 2, 3, 4) and (3, 8, 7, 6)
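
For question 12, a minimal sketch assuming that "correlation coefficient" means the Pearson correlation; the helper name `pearson` is illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n (or 1/(n-1)) factors cancel, so raw sums are enough here.
    return cov / math.sqrt(ss_x * ss_y)

r = pearson((1, 2, 3, 4), (3, 8, 7, 6))
print(round(r, 3))  # 4 / sqrt(70) -> 0.478
```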
Unit-3
1 Define the following terms:
1. Classification Model
2. Classifier
3. Learning Algorithm
4. Training set
5. Confusion Matrix
6. Accuracy & Error Rate
2 Explain the working of a Decision Tree with an example
3 Write Hunt’s algorithm and apply it on the following dataset to construct a decision tree?

4 Discuss the methods for expressing attribute test conditions


5 Discuss various measures for selecting the best split for the following attribute types:
a) binary attribute b) nominal (or categorical) attribute c) continuous attribute
6

7 List the reasons for Model Overfitting and explain the methods for evaluating the performance of a classifier
8 Discuss the working of Naïve Bayes Classifier in finding the conditional probability
9 Using the Naïve Bayes Classifier on the above dataset, predict the class label of the test record X = (Home Owner = No, Marital Status = Married, Income = $120K)
10 Find the class label (Play Golf) for the record today = (Sunny, Cool, High, True) using Naive Bayes Classification for the golf dataset
Unit-4
1. Explain the terms
i) support ii) confidence iii) association rule iv) apriori principle
2. Find all frequent itemsets using Apriori Algorithm with min support of 40% for the
below transaction database

TID   items
100   B,C,E,J
200   B,C,J
300   B,M,Y
400   B,J,M
500   C,J,M
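
A hand-worked answer to question 2 can be cross-checked with a small Apriori sketch. This is a minimal illustration rather than a reference implementation: it assumes that 40% of 5 transactions means a minimum support count of 2, and the function name `apriori` is illustrative.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: support count} for every itemset with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Support counting: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets whose union has exactly k items.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

database = [list("BCEJ"), list("BCJ"), list("BMY"), list("BJM"), list("CJM")]
frequent = apriori(database, min_count=2)  # assumed: 40% of 5 transactions = 2

for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With this database the sketch reports ten frequent itemsets; {B, C, J} is the only frequent 3-itemset, because {B, J, M} appears in just one transaction.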

3. Find all frequent itemsets and generate association rules using Apriori Algorithm with
min support=2 and minimum confidence of 50% for the below transactions

4. Discuss various procedures for candidate set generation.


5. Explain the procedure of Support Counting using Hash Tree with suitable example
6. Explain the methods for compact representation of frequent itemsets with relevant
examples (closed frequent itemsets, maximal frequent itemsets)
7. a) Write the steps for generating frequent itemsets using FP-Growth algorithm
b) Generate frequent itemsets for the following data using FP-Growth algorithm with
min support=5
TID   items
1     1,2,3,5
2     2,5,7,9
3     1,3,5,7
4     2,4,6,8
5     1,2,3,4
6     2,3,4,5
7     3,4,5,6
8     4,5,6,7
9     5,6,7,8,9
10    9,1,2,5
11    8,2,9,7
12    5,6,3,2
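
The steps asked for in question 7 can be sketched in code: count item supports, build the FP-tree with items ordered by descending support, then mine each item's conditional pattern base recursively. The following is a compact Python sketch, assuming "min support=5" is an absolute count; the names `Node`, `build_tree` and `fp_growth` are illustrative.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(weighted, min_count):
    """Build an FP-tree from (items, count) pairs; return (header table, supports)."""
    support = defaultdict(int)
    for items, count in weighted:
        for item in set(items):
            support[item] += count
    header, root = defaultdict(list), Node(None, None)
    for items, count in weighted:
        # Keep frequent items only, ordered by descending support (ties by item).
        ordered = sorted((i for i in set(items) if support[i] >= min_count),
                         key=lambda i: (-support[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, support

def fp_growth(weighted, min_count, suffix=frozenset()):
    """Return {itemset: support count} mined recursively from conditional trees."""
    header, support = build_tree(weighted, min_count)
    result = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        result[itemset] = support[item]
        # Conditional pattern base: the prefix path of each node carrying this item.
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        result.update(fp_growth(base, min_count, itemset))
    return result

rows = [[1, 2, 3, 5], [2, 5, 7, 9], [1, 3, 5, 7], [2, 4, 6, 8], [1, 2, 3, 4],
        [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8, 9], [9, 1, 2, 5],
        [8, 2, 9, 7], [5, 6, 3, 2]]
patterns = fp_growth([(r, 1) for r in rows], min_count=5)

for itemset, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

For the twelve transactions above, the sketch finds six frequent single items and two frequent pairs, {2, 5} and {3, 5}, each with support 5.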

8. Generate frequent itemsets for the following data using FP-Growth algorithm with min
support=3

9. Generate frequent itemsets for the following data using FP-Growth algorithm with min
support=50%

Transaction List of items


T1 I1,I2,I3
T2 I2,I3,I4
T3 I4,I5
T4 I1,I2,I4
T5 I1,I2,I3,I5
T6 I1,I2,I3,I4
Unit-5

1. What is cluster analysis? Discuss its applications.


2. Illustrate the types of clustering with a neat sketch.
3. Explain in detail the various types of clusters with a neat sketch.
4. Explain in detail the K-Means clustering algorithm and discuss its limitations.
5. Discuss the problem of selecting initial centroids in K-Means and the solutions for it.
6. Cluster the following points into 3 clusters using K-Means
A(2,10) B(2,5) C(8,4) D(5,8) E(7,5) F(6,4) G(1,2) H(4,9)
Initial cluster centers are A, D, G
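
The iterations for question 6 can be verified with a basic K-Means sketch. The question does not name a distance measure, so this sketch assumes Euclidean distance and stops when the assignments no longer change; the function names are illustrative.

```python
def sq_dist(p, q):
    """Squared Euclidean distance (enough for picking the nearest centroid)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, initial_centroids, max_iter=100):
    """Basic K-Means: alternate assignment and centroid update until stable."""
    centroids = list(initial_centroids)
    assignment = {}
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = {
            name: min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for name, p in points.items()
        }
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [points[n] for n, c in assignment.items() if c == k]
            if members:
                centroids[k] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids

points = {"A": (2, 10), "B": (2, 5), "C": (8, 4), "D": (5, 8),
          "E": (7, 5), "F": (6, 4), "G": (1, 2), "H": (4, 9)}
assignment, centroids = kmeans(points, [points["A"], points["D"], points["G"]])

clusters = [sorted(n for n, c in assignment.items() if c == k) for k in range(3)]
print(clusters)   # [['A', 'D', 'H'], ['C', 'E', 'F'], ['B', 'G']]
print(centroids)
```

Under the Euclidean assumption the assignments stabilise after a few passes; a different distance measure may follow a different path through the intermediate iterations.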
7. Explain the types of Hierarchical clustering with a neat diagram and discuss its limitations
8. Discuss the following methods for finding inter-cluster distance in hierarchical clustering
a) MIN (or) Single Linkage b) MAX (or) Complete Linkage c) Grouping d) Ward's method
9. Find the clusters for the below points using Hierarchical Clustering. Use Euclidean distance as the metric and draw the dendrogram

10. Explain the DBSCAN algorithm and write its strengths and weaknesses

11. Problems on DBSCAN
