DWDM-CSE-Question Bank
13. State the differences between Data Mart & Data Warehouse. (5)
14. Distinguish between OLTP and OLAP systems. (5)
15. How is data warehouse different from a database? (3)
16. Explain Metadata in brief. Explain different types of Metadata. (5)
17. Define Data Lake. What is a Data Mart? Define the types of Data Marts. (2.5+2.5)
18. What is the significance of a multi-dimensional data model in data-warehousing?
Briefly compare the snowflake schema and fact constellation concepts with a suitable
example. (3+3)
19. Discuss the steps of the Apriori Algorithm for mining frequent itemsets. (5)
20. Generate FP-Tree for the following Transaction dataset. [Min. Support Count= 3]
Show the Conditional Pattern Base, Conditional FP-Tree and Frequent Item set. (10)
Transaction ID    Items
T1                {E, K, M, N, O, Y}
T2                {D, E, K, N, O, Y}
T3                {A, E, K, M}
T4                {C, K, M, U, Y}
T5                {C, E, I, K, O}
21. Define, with a suitable example of each, the following data mining functionalities:
data characterization, data association and data discrimination. (3)
22. Explain the architecture of a typical data mining system. (5)
23. What is meant by slice-and-dice? Give an example. (5)
24. Define Roll-up and Drill-down process with a suitable example. (5)
25. Explain the three-tier data warehousing architecture. (5)
26. What is ETL? Explain each of the terms clearly. (5)
27. Differentiate among ROLAP, MOLAP and HOLAP. (5)
28. Discuss the different phases of FP-tree growth algorithm. (5)
29. What do you mean by OLAP? What are the various OLAP operations in
multidimensional data models? Describe them briefly. (10)
30. Write short notes on Snowflake Schema and Galaxy Schema. (3+3)
31. Discuss Star schema with a suitable example. (5)
32. Explain Jaccard similarity index. Find the Jaccard similarity index and Jaccard
distance for the following data: (5)
A = {0, 1, 2, 5, 6}
B = {0, 2, 3, 4, 5, 7, 9}
33. The rating data for 4 colleges is given. (5)
Sl. No.  Engg. College  Teaching  Fees  Placements  Internship  Infrastructure
1.       A              5         2     5           5           3
2.       B              4         5     5           4           5
3.       C              3         4     4           3           4
4.       D              1         3     1           1           2
Find the Association Rules from the above frequent sets at minimum 50% confidence. (10)
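A minimal sketch for Question 32, assuming the usual definition of the Jaccard index as |A ∩ B| / |A ∪ B| and of the Jaccard distance as one minus the index. The function names are illustrative, and returning 1.0 for two empty sets is a convention chosen here, not something the question specifies.

```python
def jaccard_index(a: set, b: set) -> float:
    """Return |a ∩ b| / |a ∪ b| (1.0 by convention when both sets are empty)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a: set, b: set) -> float:
    """Return the Jaccard distance, 1 - Jaccard index."""
    return 1.0 - jaccard_index(a, b)


# Sets from Question 32.
A = {0, 1, 2, 5, 6}
B = {0, 2, 3, 4, 5, 7, 9}

# Intersection {0, 2, 5} has 3 elements; the union has 9 elements.
print("Jaccard index:   ", jaccard_index(A, B))
print("Jaccard distance:", jaccard_distance(A, B))
```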
Module 2
1. Define decision tree. (3)
2. What are the advantages and disadvantages of the decision tree approach over other
approaches for data mining? (3)
3. What is clustering? What are the different clustering techniques? Write some
applications of cluster analysis. (6)
4. Define Entropy and Information Gain with suitable examples. (5)
5. Describe the working of the PAM K-medoids clustering algorithm. (5)
6. Define Classification and Prediction. (5)
7. Describe K-medoids algorithm in brief. (5)
8. Using K-means clustering algorithm, determine 3 clusters for the following eight data
points: A1(2,10), A2(2,5), A3(8,4), B1(5,8), B2(7,5), B3(6,4), C1(1,2), C2(4,9). Distance
function is Euclidean distance. Do it for 3 iterations. (10)
9. Define Jaccard coefficient. (2)
10. Apply K-means clustering to the following dataset for two clusters. Take data
points S1 and S2 as the initial centroids of the respective clusters. Continue the
procedure for three iterations. (12)
Sample No. X Y
S1 185 72
S2 170 56
S3 168 60
S4 179 68
S5 182 72
S6 188 77
S7 180 71
S8 180 70
S9 183 84
S10 180 88
S11 180 67
S12 177 76
11. What do you mean by attribute selection measure with respect to decision tree
induction? (3)
12. Suppose that the data mining task is to cluster the following ten points, representing
locations, into two clusters:
X1 2 6
X2 3 4
X3 3 8
X4 4 7
X5 6 2
X6 6 4
X7 7 3
X8 7 4
X9 8 5
X10 7 6
The distance function is defined as |Xi – Xj| + |Yi – Yj|. Use K-medoids algorithm to
determine the two clusters. (10)
13. Write down the algorithm for K-means clustering. (5)
14. What is hierarchical clustering technique? (5)
15. Distinguish between partitional clustering and hierarchical clustering. (5)
16. What are Classification and Clustering? Explain the key differences between them. (5)
17. What is a classification problem? What is the difference between Supervised and
Unsupervised Learning? (5)
18. Differentiate between agglomerative hierarchical clustering and divisive hierarchical
clustering. (5)
19. Explain the ID3 algorithm for Decision Trees. (5)
20. What is a dendrogram? Explain it with the help of an example. (5)
21. Define the Euclidean and Manhattan distance metrics. (5)
22. What is a centroid point in K-means clustering? (2)
23. Apply Hierarchical Agglomerative clustering technique on the following dataset (Use
Complete Linkage method). Draw the corresponding dendrogram. (8)
Sample No. X Y
S1 40 53
S2 22 38
S3 35 32
S4 26 19
S5 8 41
S6 45 30
S7 40 50
24. Use single and complete linkage agglomerative clustering to group the data described
by the following distance matrix. Show the dendrograms. (5+5)
      P1   P2   P3   P4   P5
P1     0    9    3    6   11
P2     9    0    7    5   10
P3     3    7    0    9    2
P4     6    5    9    0    8
P5    11   10    2    8    0
34. The table below shows the heights and weights of a sample of 10 male
students drawn at random from the 1st year students of an Engineering College. Construct
the regression line that approximates the data set: (10)
X (height in inches)  63 59 62 65 61 64 65 62 60 58
Y (weight in kg)      55 52 54 58 63 60 59 53 60 51
35. A random sample of 15 students in that class was selected and the data is given
below:
Internal Exam  15 23 18 23 24 22 22 19 19 16 24 11 24 16 23
External Exam  49 63 58 60 58 61 60 63 60 52 62 30 59 49 68
Construct the regression line that approximates the data set. (10)
44. Create a decision tree for the following data given below. The objective is to predict
the class category (Play Tennis or not?). (10)
X Y Class
7 7 False
7 4 False
3 4 True
1 4 True
49. Apply kNN classification algorithm on the following dataset and predict the class for
P (X = 5 and Y = 7), where k = 3. (5)
X Y Class
7 7 False
7 5 False
3 4 True
4 4 True
4 3 False
50. Use the data set of Question 44 for a Naïve Bayes classification problem. The
objective is to predict the class category (Play Tennis or not?). (10)
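A minimal sketch of the kNN procedure asked for in Question 49. The question does not name a distance metric, so Euclidean distance is assumed here; a different metric can change which neighbours are selected. The function name `knn_predict` is illustrative.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)


def knn_predict(train, query, k):
    """Predict the label of `query` by majority vote among its k nearest points.

    `train` is a list of ((x, y), label) pairs. Euclidean distance is an
    assumption; the question does not specify the metric.
    """
    nearest = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Data set from Question 49.
train = [
    ((7, 7), False),
    ((7, 5), False),
    ((3, 4), True),
    ((4, 4), True),
    ((4, 3), False),
]

# The three nearest points to P(5, 7) are (7, 7), (7, 5) and (4, 4).
print(knn_predict(train, (5, 7), k=3))  # False
```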
Module 3
1. What do you understand by ‘Secular Trend’ in the analysis of a time series? Explain
with examples. (5)
2. Explain the process of Exponential Smoothing with an example. (5)
3. Mention the merits and demerits of Moving Average Method. (5)
4. Distinguish between ‘seasonal’ and ‘cyclical’ fluctuations in time series data. (5)
5. Find the trend for the following series using a three-year weighted moving average
with weights 1, 2, 1. (5)
Year 1 2 3 4 5 6 7
Values 2 4 5 7 8 10 13
6. Discuss the method of fitting mathematical curves for determining the trend in time
series data. (5)
7. Fit a straight-line trend equation by the method of least squares from the following
data and then estimate the trend value for the year 2025. (5)
Year 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
Value 65 80 84 75 77 71 76 74 70 68
8. With which component of the time series would you associate each of the following?
Why? (2 X 5=10)
(i) The rainfall that occurred in Calcutta for four days in February, 1981.
(ii) A decline in ice cream sales during November to March.
(iii) An era of prosperity.
(iv) Increase in garment sales in October.
(v) General increase in sale of T.V. sets.
9. Explain full periodic pattern and partial periodic pattern for time-related sequence
data with examples. (5)
10. Assuming a four-yearly cycle, calculate the trend by the method of moving averages
from the following data relating to the production of tea in India: (5)
Year               2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Production (lbs.)  464  515  518  467  502  540  557  571  586  612
11. The trend equation fitted to annual average sales is y = 230 + 20x (unit of x: one
year; origin: 30th June 2008). Adjust the trend equation to obtain monthly trend values
and find the trend value for January 2020. (5)
12. Using 1964 as the origin, obtain a straight-line trend equation by the method of least
squares. Find the trend value of the missing year 1961. (5)
Year 1960 1962 1963 1964 1965 1966 1969
Value 140 144 160 152 168 176 180
14. Discuss briefly how the monthly trend can be obtained from annual data when an odd
or an even number of years is given. (5)
15. a) Show that the sum of weights in exponential smoothing is one. (3)
b) The last period’s forecast was 70 and demand was 60. What is the simple
exponential smoothing forecast for the next period with a smoothing coefficient of 0.4? (2)
16. Fit a straight-line trend by the least squares method to the following figures of
production of a sugar factory: (5)
Year                    1969 1970 1971 1972 1973 1974 1975
Production ('000 tons)  76   87   95   81   91   96   90
Estimate the production for 1976.
17. Explain in brief Similarity Search in Time-Series Analysis. (5)
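Two of the calculations in this module can be sketched in a few lines, assuming the standard textbook forms: the simple exponential smoothing update F(t+1) = F(t) + alpha * (D(t) - F(t)) used in Question 15(b), and a centred weighted moving average as in Question 5. The function names are illustrative, not taken from any library.

```python
def ses_next_forecast(prev_forecast, actual, alpha):
    """Simple exponential smoothing: move the forecast toward the actual by alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return prev_forecast + alpha * (actual - prev_forecast)


def weighted_moving_average(values, weights):
    """Centred weighted moving average; one trend value per full window."""
    width = len(weights)
    total = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, values[i:i + width])) / total
        for i in range(len(values) - width + 1)
    ]


# Question 15(b): last forecast 70, demand 60, smoothing coefficient 0.4.
print(ses_next_forecast(70, 60, 0.4))  # 66.0

# Question 5: weights 1, 2, 1 give trend values for years 2 to 6.
print(weighted_moving_average([2, 4, 5, 7, 8, 10, 13], [1, 2, 1]))
# [3.75, 5.25, 6.75, 8.25, 10.25]
```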
Module 4
1. Define Precision, Recall and F1 score in the context of evaluation of performance of a
machine learning model. (5)
2. A model makes predictions and predicts 120 examples as belonging to the minority
class, 90 of which are correct, and 30 of which are incorrect. Find the Precision of the
model. (3)
3. The Precision of a model is 0.75 and its Recall is 0.43. Find the F-score. (2)
4. What is a Class Imbalance problem in the context of data analysis? (5)
5. Explain confusion matrix. (5)
6. Describe in brief the methodologies for Stream Data Processing and Stream Data
Systems. (10)
7. Explain Random Sampling, Sliding Window and Histogram concept with respect to
mining data streams. (6)
8. Explain Graph Mining. (5)
9. What is Social Network Analysis? (5)
10. What are the characteristics of Social Networks? Explain each of them briefly. (5)
11. What is frequent pattern mining in data stream? (5)
12. What is sequential pattern mining in data stream? (5)
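A minimal sketch of the metrics used in Questions 2 and 3, assuming the usual definitions: precision = TP / (TP + FP), and the F-score taken as the F1 score, the harmonic mean of precision and recall. Returning 0.0 when a denominator is zero is a convention chosen here.

```python
def precision(true_positives, false_positives):
    """Fraction of positive predictions that are correct."""
    predicted_positive = true_positives + false_positives
    return true_positives / predicted_positive if predicted_positive else 0.0


def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Question 2: 120 predicted as minority class, 90 correct and 30 incorrect.
print(precision(90, 30))  # 0.75

# Question 3: precision 0.75, recall 0.43 (result is roughly 0.547).
print(round(f1_score(0.75, 0.43), 4))
```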
Module 5
1. What do you understand by Web Mining? What are the three types of web mining? (5)
2. Compare Web Mining with Data Mining. (5)
3. Explain the challenges for mining the World Wide Web. (5)
4. Explain the HITS Algorithm with an example. (5)
5. Explain in brief Web Structure Mining. (5)
6. Explain in brief Web Content Mining. (5)
7. Explain in brief Web Usage Mining. (5)
8. What is Vision-based Page Segmentation (VIPS)? (5)
9. What is a hub in the context of web pages? (3)
10. What is meant by authoritative Web pages? (3)
11. Write a short note on Automatic Classification of Web documents. (5)
12. Discuss mining multimedia data on the web. (5)
13. There are 3 pages in a web graph: A, B and C. A points to B and C but has no
incoming links itself. B and C have no outgoing links. For a damping factor “d” of
0.6, find the Page Ranks of A, B and C. (5)
14. There are 3 pages in a web graph: A, B and C. A and B both point to C but have no
incoming links themselves. C has no outgoing links. For a damping factor “d” of
0.6, find the Page Ranks of A, B and C. (5)
15. Explain the Page Rank algorithm. (5)
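A rough sketch for Questions 13 and 14, assuming the non-normalized form PR(p) = (1 - d) + d * sum(PR(q) / C(q)) over the pages q that link to p, where C(q) is the number of outgoing links of q. This is only one common convention: other texts divide (1 - d) by the number of pages or redistribute the rank of pages with no outgoing links, and those variants give different numbers. The function name and the fixed iteration count are illustrative choices.

```python
def pagerank(links, d, iterations=50):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)) for q linking to p.

    `links` maps each page to the list of pages it points to. Pages without
    outgoing links get no special treatment (an assumption of this sketch),
    so the ranks are not normalized to sum to one.
    """
    pages = list(links)
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new_rank[p] = (1 - d) + d * incoming
        rank = new_rank
    return rank


# Question 13: A points to B and C.
q13 = pagerank({"A": ["B", "C"], "B": [], "C": []}, d=0.6)
print({p: round(r, 4) for p, r in q13.items()})

# Question 14: A and B both point to C.
q14 = pagerank({"A": ["C"], "B": ["C"], "C": []}, d=0.6)
print({p: round(r, 4) for p, r in q14.items()})
```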
Module 6
1. Why is dimensionality reduction required, and how does PCA help in that
scenario? (5)
2. Explain the steps of PCA Algorithm. (5)
3. Explain Curse of Dimensionality. (5)
4. Explain the differences between Social Network Analysis and Traditional Data
Mining. (5)
5. What does Social Network Analysis (SNA) mean? (5)
6. Write a short note on issues and challenges in data mining. (5)
7. What are the recent trends in data mining? (10)
8. What do you understand by the term “Graph Mining”? (5)
9. Why is Class Imbalance a problem? Explain with an example. (10)
10. What are the recent developments in distributed data warehouse environments? (10)
11. Explain the concept of distributed data mining. (5)
12. What are the issues relating to the diversity of data types? (5)
13. Find the covariance matrix of the following data:
X 2.5 0.5 2.2 1.9 3.1 2.3 2 1 1.5 1.1
Y 2.4 0.7 2.9 2.2 3.0 2.7 1.6 1.1 1.6 0.9
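A minimal sketch for Question 13, assuming the sample covariance with divisor n - 1, which is the form commonly used as the first step of PCA. If the population covariance is intended, the divisor would be n instead. The function name is illustrative.

```python
def covariance_matrix(xs, ys):
    """Return the 2x2 sample covariance matrix [[var(x), cov], [cov, var(y)]]."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Divisor n - 1 (sample covariance) is an assumption of this sketch.
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return [[var_x, cov_xy], [cov_xy, var_y]]


# Data from Question 13.
X = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2, 1, 1.5, 1.1]
Y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

for row in covariance_matrix(X, Y):
    print([round(value, 4) for value in row])
```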