DWDM-CSE-Question Bank

The document contains questions about data mining techniques and processes. It covers topics like data warehousing, association rule mining, clustering algorithms like k-means and k-medoids, and decision trees. Several questions involve applying these techniques to datasets and analyzing the results.

Uploaded by

MD SAQLAIN AHMAD KHAN


Module 1

1. What is Data Mining? (3)

2. Define data warehouse. What is the purpose of it? (2+3)
3. What are the key elements of a data warehouse? Explain each of them. (6)
4. Describe the key steps in the data mining process. Why is it important to follow these
steps? (5)
5. What are the major mistakes to be avoided when doing data mining? (3)
6. Why is data cleaning so important? (3)
7. Define support, confidence and lift in Association Rule Mining. What are the demerits
of the Apriori Algorithm? (3+2)
8. What is an Association Rule? What is the importance of Association Rules in Data
Mining? (5)
9. Find the cosine similarity and the cosine dissimilarity between the two vectors X and Y,
where X = {3, 2, 0, 5} and Y = {1, 0, 0, 0}. (5)
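A minimal Python check of Question 9. It assumes the dissimilarity is taken as 1 minus the cosine similarity, which is the usual convention but is not stated in the question.

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity: dot(x, y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

X = [3, 2, 0, 5]
Y = [1, 0, 0, 0]
sim = cosine_similarity(X, Y)   # 3 / sqrt(38)
dissim = 1 - sim                # assumed convention: dissimilarity = 1 - similarity
print(f"cosine similarity = {sim:.4f}, dissimilarity = {dissim:.4f}")
```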
10. For the following transaction data set, generate association rules using the Apriori
Algorithm, with minimum support = 22% and minimum confidence = 70%. (10)

11. Explain the KDD process in detail. (5)

12. Differentiate among Enterprise Warehouse, Data Mart and Virtual Warehouse. (3)

13. State the differences between Data Mart & Data Warehouse. (5)
14. Distinguish between OLTP and OLAP systems. (5)
15. How is a data warehouse different from a database? (3)
16. Explain Metadata in brief. Explain the different types of Metadata. (5)
17. Define a Data Lake. What is a Data Mart? Describe the types of Data Marts. (2.5+2.5)
18. What is the significance of a multi-dimensional data model in data-warehousing?
Briefly compare the snowflake schema and fact constellation concepts with a suitable
example. (3+3)
19. Discuss the steps of the Apriori Algorithm for mining frequent itemsets. (5)
20. Generate the FP-Tree for the following transaction dataset [minimum support count = 3].
Show the Conditional Pattern Base, Conditional FP-Tree and Frequent Itemsets. (10)

Transaction ID Items
T1 {E, K, M, N, O, Y}
T2 {D, E, K, N, O, Y}
T3 {A, E, K, M}
T4 {C, K, M, U, Y}
T5 {C, E, I, K, O}
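A compact FP-growth sketch in Python for checking the answer to Question 20. It builds the FP-tree with a header table of node links, extracts each item's conditional pattern base (`cond_base`) and recurses on the conditional FP-tree. Frequency ties are broken alphabetically here, which is an assumption; a hand solution that orders ties differently gets a differently shaped tree but the same frequent itemsets.

```python
from collections import defaultdict

class FPNode:
    """One FP-tree node: an item, its count, a parent link and child links."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_fp_tree(weighted_transactions, min_count):
    """Build an FP-tree; returns item frequencies and a header table of node links."""
    freq = defaultdict(int)
    for items, count in weighted_transactions:
        for item in items:
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root, header = FPNode(None, None), defaultdict(list)
    for items, count in weighted_transactions:
        # keep only frequent items, ordered by descending frequency (ties alphabetical)
        ordered = sorted((i for i in items if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return freq, header

def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Mine all frequent itemsets (with support counts) recursively."""
    freq, header = build_fp_tree(weighted_transactions, min_count)
    patterns = {}
    for item in sorted(freq, key=lambda i: (freq[i], i)):  # least frequent item first
        itemset = suffix | {item}
        patterns[itemset] = freq[item]
        # conditional pattern base: the prefix path above every node of this item
        cond_base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                cond_base.append((path, node.count))
        if cond_base:
            patterns.update(fp_growth(cond_base, min_count, itemset))
    return patterns

transactions = ["EKMNOY", "DEKNOY", "AEKM", "CKMUY", "CEIKO"]  # T1..T5, one letter per item
frequent = fp_growth([(list(t), 1) for t in transactions], min_count=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("{" + ", ".join(sorted(itemset)) + "}", count)
```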

21. Define each of the following data mining functionalities with a suitable example:
data characterization, data association and data discrimination. (3)
22. Explain the architecture of a typical data mining system. (5)
23. What is meant by slice-and-dice? Give an example. (5)
24. Define Roll-up and Drill-down process with a suitable example. (5)
25. Explain the three-tier data warehousing architecture. (5)
26. What is ETL? Explain each of the terms clearly. (5)
27. Differentiate among ROLAP, MOLAP and HOLAP. (5)
28. Discuss the different phases of the FP-growth algorithm. (5)
29. What do you mean by OLAP? What are the various OLAP operations in
multidimensional data models? Describe them briefly. (10)
30. Write a short note on Snowflake Schema and Galaxy Schema. (3+3)
31. Discuss Star schema with suitable example. (5)
32. Explain Jaccard similarity index. Find the Jaccard similarity index and Jaccard
distance for the following data: (5)
A = {0, 1, 2, 5, 6}
B = {0, 2, 3, 4, 5, 7, 9}
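A short Python check for Question 32, computing the Jaccard similarity as |A ∩ B| / |A ∪ B| and the Jaccard distance as 1 minus that value.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| and Jaccard distance 1 - similarity."""
    a, b = set(a), set(b)
    sim = len(a & b) / len(a | b)
    return sim, 1 - sim

A = {0, 1, 2, 5, 6}
B = {0, 2, 3, 4, 5, 7, 9}
sim, dist = jaccard(A, B)
print(f"|A ∩ B| = {len(A & B)}, |A ∪ B| = {len(A | B)}")
print(f"Jaccard similarity = {sim:.4f}, Jaccard distance = {dist:.4f}")
```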
33. The rating data for 4 colleges is given. (5)
Sl. No.  Engg. College  Teaching  Fees  Placements  Internship  Infrastructure
1.       A              5         2     5           5           3
2.       B              4         5     5           4           5
3.       C              3         4     4           3           4
4.       D              1         3     1           1           2

a) Find the Euclidean distance between

i) College A-B
ii) College B-C
iii) College C-D
iv) College A-D
b) Among the above pairs of colleges, which pair has the shortest Euclidean distance
between them?
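The distances in Question 33 can be checked with `math.dist` (Python 3.8+), treating each college's five ratings as a point in five-dimensional space.

```python
import math

ratings = {  # Teaching, Fees, Placements, Internship, Infrastructure
    "A": (5, 2, 5, 5, 3),
    "B": (4, 5, 5, 4, 5),
    "C": (3, 4, 4, 3, 4),
    "D": (1, 3, 1, 1, 2),
}
pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
distances = {p: math.dist(ratings[p[0]], ratings[p[1]]) for p in pairs}
for (u, v), d in distances.items():
    print(f"{u}-{v}: {d:.3f}")
closest = min(distances, key=distances.get)
print("Shortest distance:", "-".join(closest))
```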
34. Generate all Frequent Itemsets from the following transaction data given minimum
support = 0.3.
TID  Items
1    A, B, C, E
2    B, D, E
3    B, C
4    A, B, D
5    A, C
6    B, C
7    A, C, E
8    A, B, C, E
9    A, B, C
10   C, D, E

Find the association rules from the above frequent itemsets at a minimum confidence of 50%.
(10)
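A minimal level-wise Apriori sketch in Python for Question 34: count candidates, keep those with support of at least 0.3, join frequent (k-1)-itemsets into k-itemset candidates, prune any candidate with an infrequent subset, then derive rules with confidence = support(X ∪ Y) / support(X). The function names are illustrative, not taken from any library.

```python
from itertools import combinations

transactions = [
    {"A", "B", "C", "E"}, {"B", "D", "E"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C", "E"}, {"A", "B", "C", "E"}, {"A", "B", "C"}, {"C", "D", "E"},
]

def apriori(transactions, min_support):
    """Level-wise Apriori: returns {frozenset itemset: support count}."""
    n = len(transactions)
    level = [frozenset([i]) for i in sorted(set().union(*transactions))]
    frequent, k = {}, 1
    while level:
        # count candidates and keep those meeting minimum support
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(current)
        k += 1
        # join step: merge frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must itself be frequent
        level = [c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent

def association_rules(frequent, min_confidence):
    """Rules X -> Y with confidence = support(X ∪ Y) / support(X)."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules

frequent = apriori(transactions, min_support=0.3)
rules = association_rules(frequent, min_confidence=0.5)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
for lhs, rhs, conf in rules:
    print(f"{sorted(lhs)} -> {sorted(rhs)}  confidence = {conf:.2f}")
```

With ten transactions, a minimum support of 0.3 means an itemset must appear in at least three of them.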

Module 2
1. Define decision tree. (3)
2. What are the advantages and disadvantages of the decision tree approach over other
approaches for data mining? (3)
3. What is clustering? What are the different clustering techniques? Write some
applications of cluster analysis. (6)
4. Define Entropy and Information Gain with suitable examples. (5)
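A small Python sketch of entropy and information gain for Question 4. The class counts are an assumed illustration: 9 "yes" and 5 "no" examples, split by a hypothetical attribute into partitions of [6 yes, 2 no] and [3 yes, 3 no] (the same counts as the Wind attribute in the common Play Tennis textbook example).

```python
import math

def entropy(*class_counts):
    """Shannon entropy (bits) of a class distribution given as raw counts."""
    total = sum(class_counts)
    return -sum(c / total * math.log2(c / total) for c in class_counts if c > 0)

def information_gain(parent_counts, child_partitions):
    """Entropy of the parent minus the weighted entropy of its partitions."""
    total = sum(parent_counts)
    weighted = sum(sum(part) / total * entropy(*part) for part in child_partitions)
    return entropy(*parent_counts) - weighted

# Illustrative (assumed) counts: 9 yes / 5 no, split into [6 yes, 2 no] and [3 yes, 3 no]
print(f"Entropy(S) = {entropy(9, 5):.3f}")
print(f"Gain = {information_gain((9, 5), [(6, 2), (3, 3)]):.3f}")
```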
5. Describe the working of the PAM K-medoids clustering algorithm. (5)
6. Define Classification and Prediction. (5)
7. Describe K-medoids algorithm in brief. (5)
8. Using the K-means clustering algorithm, determine three clusters for the following eight
data points: A1(2,10), A2(2,5), A3(8,4), B1(5,8), B2(7,5), B3(6,4), C1(1,2), C2(4,9). The
distance function is Euclidean distance. Perform three iterations. (10)
9. Define Jaccard coefficient. (2)
10. Apply K-means clustering to the following dataset to form two clusters. Take data
points S1 and S2 as the initial centroids of the respective clusters. Continue the
procedure for three iterations. (12)
Sample No. X Y
S1 185 72
S2 170 56
S3 168 60
S4 179 68
S5 182 72
S6 188 77
S7 180 71
S8 180 70
S9 183 84
S10 180 88
S11 180 67
S12 177 76
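A batch (Lloyd's) K-means sketch in Python that runs a fixed number of iterations from given initial centroids, printing the clusters and centroids after each one. It covers Question 10 directly (seeds S1 and S2) and also Question 8, where the initial centroids are not stated; A1, B1 and C1 are used there as an assumption. Some hand solutions update a centroid after every single assignment instead, which can change the intermediate values.

```python
import math

def kmeans(points, centroids, iterations):
    """Batch (Lloyd's) K-means with given initial centroids and a fixed iteration count."""
    centroids = [tuple(c) for c in centroids]
    for it in range(1, iterations + 1):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for name, p in points.items():
            j = min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            clusters[j].append(name)
        # Update step: move each centroid to the mean of its members
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(points[m][d] for m in members) / len(members)
                                     for d in range(2))
        print(f"Iteration {it}: {clusters} centroids="
              f"{[tuple(round(v, 3) for v in c) for c in centroids]}")
    return clusters, centroids

q8 = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
      "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9)}
q8_clusters, q8_centroids = kmeans(q8, [q8["A1"], q8["B1"], q8["C1"]], 3)  # assumed seeds

q10 = {"S1": (185, 72), "S2": (170, 56), "S3": (168, 60), "S4": (179, 68),
       "S5": (182, 72), "S6": (188, 77), "S7": (180, 71), "S8": (180, 70),
       "S9": (183, 84), "S10": (180, 88), "S11": (180, 67), "S12": (177, 76)}
q10_clusters, q10_centroids = kmeans(q10, [q10["S1"], q10["S2"]], 3)
```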

11. What do you mean by attribute selection measure with respect to decision tree
induction? (3)
12. Suppose that the data mining task is to cluster the following ten points, representing
locations, into two clusters:
Point X Y
X1 2 6
X2 3 4
X3 3 8
X4 4 7
X5 6 2
X6 6 4
X7 7 3
X8 7 4
X9 8 5
X10 7 6

The distance function is defined as |Xi - Xj| + |Yi - Yj|. Use the K-medoids algorithm to
determine the two clusters. (10)
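A PAM-style K-medoids sketch in Python for Question 12, using the Manhattan distance defined in the question. The initial medoids X2 and X8 are an assumption, since the question does not fix them. This version tries every medoid/non-medoid swap and applies the best improving one until none helps. A hand solution that tests only one random swap (for example X8 replaced by X7, total cost 22) stops at {X2, X8} with cost 20, whereas the exhaustive search continues to a cost of 18, so the final medoids depend on the search strategy.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

points = {"X1": (2, 6), "X2": (3, 4), "X3": (3, 8), "X4": (4, 7), "X5": (6, 2),
          "X6": (6, 4), "X7": (7, 3), "X8": (7, 4), "X9": (8, 5), "X10": (7, 6)}

def total_cost(medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(manhattan(points[p], points[m]) for m in medoids) for p in points)

def pam(initial_medoids):
    """Repeatedly apply the single medoid/non-medoid swap that lowers the cost most."""
    medoids, cost = list(initial_medoids), total_cost(initial_medoids)
    print(f"start {medoids} cost={cost}")
    while True:
        best = None
        for i in range(len(medoids)):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                trial_cost = total_cost(trial)
                if trial_cost < cost:
                    cost, best = trial_cost, trial
        if best is None:  # no swap improves the cost: stop
            return medoids, cost
        medoids = best
        print(f"swap  -> {medoids} cost={cost}")

medoids, cost = pam(["X2", "X8"])  # assumed initial medoids
clusters = {m: [p for p in points
                if min(medoids, key=lambda k: manhattan(points[p], points[k])) == m]
            for m in medoids}
print(medoids, cost, clusters)
```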
13. Write down the algorithm for K-means clustering. (5)
14. What is hierarchical clustering technique? (5)
15. Distinguish between partitional clustering and hierarchical clustering. (5)
16. What are Classification and Clustering? Explain the key differences between them. (5)
17. What is a classification problem? What is the difference between Supervised and
Unsupervised Learning? (5)
18. Differentiate agglomerative hierarchical clustering and divisive hierarchical
clustering. (5)
19. Explain the ID3 algorithm for Decision Trees. (5)
20. What is a dendrogram? Explain it with the help of an example. (5)
21. Define Euclidean and Manhattan distance metric. (5)
22. What is a centroid point in K-means clustering? (2)
23. Apply the Hierarchical Agglomerative clustering technique to the following dataset (use
the Complete Linkage method). Draw the corresponding dendrogram. (8)
Sample No. X Y
S1 40 53
S2 22 38
S3 35 32
S4 26 19
S5 8 41
S6 45 30
S7 40 50

24. Use single and complete linkage agglomerative clustering to group the data described
by the following distance matrix. Show the dendrograms. (5+5)
P1 P2 P3 P4 P5
P1 0 9 3 6 11
P2 9 0 7 5 10
P3 3 7 0 9 2
P4 6 5 9 0 8
P5 11 10 2 8 0
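A naive agglomerative clustering sketch in Python for Questions 23 and 24. At each step it merges the two closest clusters, where the cluster distance is the minimum (single linkage) or maximum (complete linkage) of the pairwise point distances; the merge heights it prints are the heights of the joins in the dendrogram. Question 24 uses the given matrix; Question 23 uses Euclidean distances between the sample points.

```python
import math
from itertools import combinations

def agglomerative(labels, dist, linkage):
    """Naive agglomerative clustering; dist maps frozenset({a, b}) to the distance a-b."""
    combine = max if linkage == "complete" else min   # complete vs single linkage
    clusters = [frozenset([label]) for label in labels]
    merges = []

    def cluster_distance(c1, c2):
        return combine(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        height = cluster_distance(c1, c2)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1 | c2), height))  # one dendrogram join per merge
    return merges

# Question 24: distances given directly as a matrix
names = ["P1", "P2", "P3", "P4", "P5"]
matrix = [[0, 9, 3, 6, 11], [9, 0, 7, 5, 10], [3, 7, 0, 9, 2],
          [6, 5, 9, 0, 8], [11, 10, 2, 8, 0]]
d24 = {frozenset((names[i], names[j])): matrix[i][j] for i, j in combinations(range(5), 2)}
single24 = agglomerative(names, d24, "single")
complete24 = agglomerative(names, d24, "complete")

# Question 23: Euclidean distances between the sample points
samples = {"S1": (40, 53), "S2": (22, 38), "S3": (35, 32), "S4": (26, 19),
           "S5": (8, 41), "S6": (45, 30), "S7": (40, 50)}
d23 = {frozenset((a, b)): math.dist(samples[a], samples[b])
       for a, b in combinations(samples, 2)}
complete23 = agglomerative(list(samples), d23, "complete")

for title, merges in [("Q24 single", single24), ("Q24 complete", complete24),
                      ("Q23 complete", complete23)]:
    print(title)
    for members, height in merges:
        print(f"  merge -> {members} at height {height:.2f}")
```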

25. How does agglomerative hierarchical clustering work? (5)

26. How does divisive hierarchical clustering work? (5)
27. Explain Bayesian probability theory (Bayes' theorem). (5)
28. What is a regression model? (3)
29. What are the different types of regression? (5)
30. Explain simple linear regression. (5)
31. Explain multiple linear regression. (5)
32. How can the accuracy of a linear regression model be improved? (5)
33. Using the dataset shown below, create a regression model to predict the Test2 score from
the Test1 score. Then predict the Test2 score of a student who scored 46 in Test1. (10)
Test1 Test2
59 56
52 63
44 55
51 50
42 66
42 48
41 58
45 36
27 13
63 50
54 81
44 56
50 64
47 50

34. The table below shows the height (to the nearest inch) and weight (to the nearest kg) of
a sample of 10 male students drawn at random from the first-year students of an
Engineering College. Construct the regression line that approximates the data set: (10)
X (height in inches) 63 59 62 65 61 64 65 62 60 58
Y (weight in kg)     55 52 54 58 63 60 59 53 60 51

35. A random sample of 15 students was selected from a class, and their internal and
external exam marks are given below:
Internal Exam 15 23 18 23 24 22 22 19 19 16 24 11 24 16 23
External Exam 49 63 58 60 58 61 60 63 60 52 62 30 59 49 68
Construct the regression line that approximates the data set. (10)
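The least-squares lines asked for in Questions 33 to 35 can be checked with a short Python sketch of the normal-equation formulas b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = (Σy − bΣx) / n. The variable names are illustrative.

```python
def least_squares(xs, ys):
    """Fit y = a + b*x by ordinary least squares (normal equations)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
    a = (sy - b * sx) / n                            # intercept: line passes through the means
    return a, b

test1 = [59, 52, 44, 51, 42, 42, 41, 45, 27, 63, 54, 44, 50, 47]
test2 = [56, 63, 55, 50, 66, 48, 58, 36, 13, 50, 81, 56, 64, 50]
height = [63, 59, 62, 65, 61, 64, 65, 62, 60, 58]
weight = [55, 52, 54, 58, 63, 60, 59, 53, 60, 51]
internal = [15, 23, 18, 23, 24, 22, 22, 19, 19, 16, 24, 11, 24, 16, 23]
external = [49, 63, 58, 60, 58, 61, 60, 63, 60, 52, 62, 30, 59, 49, 68]

a33, b33 = least_squares(test1, test2)
print(f"Q33: Test2 = {a33:.3f} + {b33:.4f} * Test1; prediction for 46: {a33 + b33 * 46:.2f}")
a34, b34 = least_squares(height, weight)
print(f"Q34: weight = {a34:.3f} + {b34:.4f} * height")
a35, b35 = least_squares(internal, external)
print(f"Q35: external = {a35:.3f} + {b35:.4f} * internal")
```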

36. Explain logistic regression with example. (5)

37. Explain the Ordinary Least Squares (OLS) algorithm in the context of regression analysis. (5)
38. Explain the key differences between classification and regression. (5)
39. What are the advantages of Logistic Regression? (3)
40. What are the disadvantages of Logistic Regression? (3)
41. What is sigmoid function? (3)
42. List down the advantages of the Decision Trees. (3)
43. List down the disadvantages of the Decision Trees. (3)

44. Create a decision tree for the data given below. The objective is to predict
the class category (Play Tennis or not?). (10)

45. Write down the kNN Algorithm. (5)

46. Why is kNN known as a lazy learning and non-parametric algorithm? (2.5+2.5)
47. List down the advantages and disadvantages of kNN algorithm. (5)
48. Apply the kNN classification algorithm to the following dataset and predict the class of
P (X = 3, Y = 7), where k = 3. (5)

X Y Class
7 7 False
7 4 False
3 4 True
1 4 True
49. Apply the kNN classification algorithm to the following dataset and predict the class of
P (X = 5, Y = 7), where k = 3. (5)

X Y Class
7 7 False
7 5 False
3 4 True
4 4 True
4 3 False
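A minimal kNN sketch in Python for Questions 48 and 49: sort the training points by Euclidean distance to the query, take the k nearest, and return the majority class. With k = 3 and two classes a vote tie cannot occur.

```python
import math
from collections import Counter

def knn_predict(training, query, k):
    """Majority vote among the k training points nearest (Euclidean) to the query."""
    nearest = sorted(training, key=lambda row: math.dist(row[:2], query))[:k]
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0], nearest

q48 = [(7, 7, False), (7, 4, False), (3, 4, True), (1, 4, True)]
q49 = [(7, 7, False), (7, 5, False), (3, 4, True), (4, 4, True), (4, 3, False)]
for name, data, p in [("Q48", q48, (3, 7)), ("Q49", q49, (5, 7))]:
    label, nearest = knn_predict(data, p, k=3)
    print(name, "nearest:", nearest, "-> predicted class:", label)
```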

50. Apply Naïve Bayes classification to the dataset of Question 44. The objective is to
predict the class category (Play Tennis or not?). (10)

Module 3
1. What do you understand by ‘Secular Trend’ in the analysis of a time series? Explain
with examples. (5)
2. Explain the process of Exponential Smoothing with an example. (5)
3. Mention the merits and demerits of Moving Average Method. (5)
4. Distinguish between ‘seasonal’ and ‘cyclical’ fluctuations in time series data. (5)
5. Find the trend for the following series using a three-year weighted moving average
with weights 1, 2, 1. (5)
Year 1 2 3 4 5 6 7
Values 2 4 5 7 8 10 13
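A short Python sketch of the three-year weighted moving average in Question 5: each trend value is (1·y(t−1) + 2·y(t) + 1·y(t+1)) / 4 and is placed against the middle year of its window, so years 1 and 7 get no trend value.

```python
def weighted_moving_average(values, weights):
    """Centred weighted moving average; returns one trend value per full window."""
    span, total = len(weights), sum(weights)
    return [sum(w * v for w, v in zip(weights, values[i:i + span])) / total
            for i in range(len(values) - span + 1)]

years = [1, 2, 3, 4, 5, 6, 7]
values = [2, 4, 5, 7, 8, 10, 13]
trend = weighted_moving_average(values, [1, 2, 1])
for year, t in zip(years[1:-1], trend):  # each value is centred on the middle year of its window
    print(f"Year {year}: trend = {t}")
```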

6. Discuss the method of fitting mathematical curves for determining the trend in time
series data. (5)

7. Fit a straight-line trend equation by the method of least squares from the following
data and then estimate the trend value for the year 2025. (5)
Year 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
Value 65 80 84 75 77 71 76 74 70 68

8. With which component of the time series would you associate each of the following?
Why? (2 X 5=10)
(i) The rainfall that occurred in Calcutta for four days in February, 1981.
(ii) A decline in ice cream sales during November to March.
(iii) An era of prosperity.
(iv) Increase in garment sales in October.
(v) General increase in sale of T.V. sets.
9. Explain full periodic pattern and partial periodic pattern for time-related sequence
data with examples. (5)
10. Assuming a four-yearly cycle, calculate the trend by the method of moving averages
from the following data relating to the production of tea in India: (5)
Year 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010
Production (lbs.) 464 515 518 467 502 540 557 571 586 612
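For an even period such as four years, each moving average falls between two years, so the usual approach is to centre it by averaging consecutive four-year averages. A minimal Python sketch of that procedure for Question 10:

```python
def centred_moving_average(values, period):
    """Even-period moving average, centred by averaging consecutive moving averages."""
    moving = [sum(values[i:i + period]) / period for i in range(len(values) - period + 1)]
    return [(a + b) / 2 for a, b in zip(moving, moving[1:])]

years = list(range(2001, 2011))
production = [464, 515, 518, 467, 502, 540, 557, 571, 586, 612]
trend = centred_moving_average(production, 4)
offset = 4 // 2  # the first centred value belongs to the third year (2003)
for year, t in zip(years[offset:], trend):
    print(f"{year}: {t:.3f}")
```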

11. The trend equation fitted to annual average sales is y = 230 + 20x (unit of x: one year;
origin: 30th June, 2008). Adjust the trend equation to obtain monthly trend values, and
find the trend value for the month of January 2020. (5)
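A worked sketch for Question 11, assuming the usual textbook convention: because the equation is fitted to annual averages, only the slope is divided by 12 (for annual totals the intercept and slope would instead be divided by 12 and 144). The origin is then moved from 30 June 2008 to the middle of July 2008 so that x counts whole months to mid-month points.

```latex
\begin{aligned}
y &= 230 + \tfrac{20}{12}\,x && (x \text{ in months, origin 30 June 2008})\\
y &= 230 + \tfrac{20}{12}\,(x + 0.5) = 230.833 + \tfrac{20}{12}\,x && (\text{origin moved to 15 July 2008})\\
x_{\text{Jan 2020}} &= 138 && (\text{months from 15 July 2008 to 15 January 2020})\\
y_{\text{Jan 2020}} &= 230.833 + \tfrac{20}{12}\times 138 = 230.833 + 230 \approx 460.83
\end{aligned}
```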

12. Using 1964 as the origin, obtain a straight-line trend equation by the method of least
squares. Find the trend value of the missing year 1961. (5)
Year 1960 1962 1963 1964 1965 1966 1969
Value 140 144 160 152 168 176 180

13. Fit a second-degree polynomial to the following data: (5)

Year 1882 1883 1884 1885 1886 1887 1888 1889 1890
Price index 84 82 76 72 69 68 70 72 73
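A Python sketch for Question 13 that fits y = a + bx + cx² by solving the three normal equations with Cramer's rule. Years are coded as x = year − 1886 (the middle year), an assumed but conventional choice that makes Σx and Σx³ vanish and keeps the arithmetic small.

```python
def fit_quadratic(xs, ys):
    """Least-squares parabola y = a + b*x + c*x^2 via the normal equations (Cramer's rule)."""
    n = len(xs)

    def s(p):  # sum of x^p
        return sum(x ** p for x in xs)

    def t(p):  # sum of x^p * y
        return sum(x ** p * y for x, y in zip(xs, ys))

    A = [[n, s(1), s(2)], [s(1), s(2), s(3)], [s(2), s(3), s(4)]]
    rhs = [t(0), t(1), t(2)]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(A)
    # Cramer's rule: replace column k of A by the right-hand side
    return [det3([row[:k] + [rhs[i]] + row[k + 1:] for i, row in enumerate(A)]) / d
            for k in range(3)]

years = list(range(1882, 1891))
price = [84, 82, 76, 72, 69, 68, 70, 72, 73]
x = [year - 1886 for year in years]   # origin at the middle year, 1886
a, b, c = fit_quadratic(x, price)
print(f"y = {a:.3f} + ({b:.3f})x + {c:.4f}x^2, x = year - 1886")
```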

14. Discuss briefly how the monthly trend can be obtained from annual data when an odd or an
even number of years is given. (5)
15. a) Show that the sum of weights in an exponential smoothing is one. (3)
b) The last period’s forecast was 70 and the demand was 60. What is the simple
exponential smoothing forecast for the next period with a smoothing coefficient of 0.4? (2)
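A minimal Python sketch for Question 15. Part (b) uses the update F(t+1) = F(t) + α(D(t) − F(t)). For part (a), expanding that recursion puts weights α, α(1 − α), α(1 − α)², … on past demands, a geometric series with sum α / (1 − (1 − α)) = 1; the code only checks this numerically with a partial sum.

```python
def ses_forecast(last_forecast, last_demand, alpha):
    """Simple exponential smoothing: F(t+1) = F(t) + alpha * (D(t) - F(t))."""
    return last_forecast + alpha * (last_demand - last_forecast)

print(ses_forecast(70, 60, 0.4))  # Question 15(b)

# Question 15(a): the weights alpha * (1 - alpha)^k on past demands sum to one
alpha = 0.4
partial = sum(alpha * (1 - alpha) ** k for k in range(200))
print(f"sum of first 200 weights = {partial:.12f}")
```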
16. Fit a straight-line trend by the least squares method to the following figures of
production of a sugar factory: (5)
Year 1969 1970 1971 1972 1973 1974 1975
Production (‘000 tons) 76 87 95 81 91 96 90
Estimate the production for 1976.
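A Python sketch of the least-squares straight-line trend used in Questions 7, 12 and 16, with years coded as x = year − origin. For Question 7 (an even number of years) the origin is assumed to lie midway between 2015 and 2016; coding x as 2(year − 2015.5) instead gives the same line. Question 12 fixes the origin at 1964 and has unevenly spaced years, so Σx is not zero and the general formula is needed.

```python
def trend_line(years, values, origin):
    """Least-squares straight-line trend y = a + b*x with x = year - origin."""
    xs = [year - origin for year in years]
    n, sx, sy = len(xs), sum(xs), sum(values)
    sxy = sum(x * y for x, y in zip(xs, values))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

def estimate(a, b, origin, year):
    return a + b * (year - origin)

# Question 7 (even number of years): origin midway between 2015 and 2016
q7 = trend_line(range(2011, 2021), [65, 80, 84, 75, 77, 71, 76, 74, 70, 68], origin=2015.5)
# Question 12 (origin fixed at 1964 by the question; years are unevenly spaced)
q12 = trend_line([1960, 1962, 1963, 1964, 1965, 1966, 1969],
                 [140, 144, 160, 152, 168, 176, 180], origin=1964)
# Question 16 (odd number of years): origin at the middle year, 1972
q16 = trend_line(range(1969, 1976), [76, 87, 95, 81, 91, 96, 90], origin=1972)

print(f"Q7:  y = {q7[0]:.3f} + {q7[1]:.4f}x; 2025 -> {estimate(*q7, 2015.5, 2025):.2f}")
print(f"Q12: y = {q12[0]:.3f} + {q12[1]:.4f}x; 1961 -> {estimate(*q12, 1964, 1961):.2f}")
print(f"Q16: y = {q16[0]:.3f} + {q16[1]:.4f}x; 1976 -> {estimate(*q16, 1972, 1976):.2f}")
```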
17. Explain in brief Similarity Search in Time-Series Analysis. (5)

Module 4
1. Define Precision, Recall and F1 score in the context of evaluation of performance of a
machine learning model. (5)
2. A model makes predictions and predicts 120 examples as belonging to the minority
class, 90 of which are correct, and 30 of which are incorrect. Find the Precision of the
model. (3)
3. The Precision of a model is 0.75 and its Recall is 0.43. Find the F1 score. (2)
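A small Python sketch of the definitions behind Questions 1 to 3: precision = TP / (TP + FP), recall = TP / (TP + FN) and F1 = 2PR / (P + R). In Question 2 the 90 correct minority-class predictions are the true positives and the 30 incorrect ones are the false positives.

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

print(f"Q2 precision = {precision(tp=90, fp=30):.2f}")   # 90 / 120
print(f"Q3 F1 = {f1_score(0.75, 0.43):.4f}")
```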
4. What is a Class Imbalance problem in the context of data analysis? (5)
5. Explain confusion matrix. (5)
6. Describe in brief the methodologies for Stream Data Processing and Stream Data
Systems. (10)
7. Explain Random Sampling, Sliding Window and Histogram concept with respect to
mining data streams. (6)
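For the random-sampling part of Question 7, one common technique is reservoir sampling (Algorithm R), which keeps a uniform sample of k items from a stream whose length is not known in advance. A minimal Python sketch; the seed is only there to make the run repeatable.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Algorithm R: a uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)            # fill the reservoir first
        else:
            j = rng.randint(0, i)             # inclusive bounds: 0..i
            if j < k:
                reservoir[j] = item           # keep the new item with probability k/(i+1)
    return reservoir

sample = reservoir_sample(range(100_000), k=5, rng=random.Random(42))
print(sample)
```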
8. Explain Graph Mining. (5)
9. What is Social Network Analysis? (5)
10. What are the characteristics of Social Networks? Explain each of them briefly. (5)
11. What is frequent pattern mining in data stream? (5)
12. What is sequential pattern mining in data stream? (5)

Module 5
1. What do you understand by Web Mining? What are the three types of web mining? (5)
2. Compare Web Mining with Data Mining. (5)
3. Explain the challenges for mining the World Wide Web. (5)
4. Explain the HITS Algorithm with an example. (5)
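A minimal HITS sketch in Python on a hypothetical three-page graph (A links to B and C, B links to C), chosen only for illustration. Each round sets a page's authority score to the sum of the hub scores of pages linking to it, sets its hub score to the sum of the authority scores of pages it links to, and L2-normalises both.

```python
import math

def hits(links, iterations=50):
    """HITS: authority = sum of hub scores of in-linking pages;
    hub = sum of authority scores of out-linked pages; both L2-normalised each round."""
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
        norm = math.sqrt(sum(v * v for v in auth.values()))
        auth = {p: v / norm for p, v in auth.items()}
        hub = {p: sum(auth[r] for r in links[p]) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values()))
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

# Hypothetical three-page graph: A -> B, A -> C, B -> C
hub, auth = hits({"A": ["B", "C"], "B": ["C"], "C": []})
print("hubs:", {p: round(v, 3) for p, v in hub.items()})
print("authorities:", {p: round(v, 3) for p, v in auth.items()})
```

Page A, which links to both others, ends up as the strongest hub, and page C, which both others link to, ends up as the strongest authority.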
5. Explain in brief Web Structure Mining. (5)
6. Explain in brief Web Content Mining. (5)
7. Explain in brief Web Usage Mining. (5)
8. What is Vision-based Page Segmentation (VIPS)? (5)
9. What is a hub in the context of web pages? (3)
10. What is meant by authoritative Web pages? (3)
11. Write a short note on Automatic Classification of Web documents. (5)
12. Discuss mining multimedia data on the Web. (5)
13. A web graph has three pages: A, B and C. A points to B and C but has no incoming links
itself. B and C have no outgoing links. Taking the damping factor d = 0.6, find the
PageRanks of A, B and C. (5)
14. A web graph has three pages: A, B and C. A and B both point to C but have no incoming
links themselves. C has no outgoing links. Taking the damping factor d = 0.6, find the
PageRanks of A, B and C. (5)
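A Python sketch for Questions 13 and 14. It assumes the original non-normalised formula PR(p) = (1 − d) + d · Σ PR(q) / C(q), summed over pages q linking to p, where C(q) is the number of outgoing links of q, and it does not redistribute the rank of pages with no outgoing links. Variants that use (1 − d)/N or redistribute dangling-page rank give different numbers.

```python
def pagerank(links, d, iterations=50):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)) over pages q linking to p,
    where C(q) is the number of outgoing links of q."""
    pages = list(links)
    incoming = {p: [q for q in pages if p in links[q]] for p in pages}
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        pr = {p: (1 - d) + d * sum(pr[q] / len(links[q]) for q in incoming[p])
              for p in pages}
    return pr

q13 = pagerank({"A": ["B", "C"], "B": [], "C": []}, d=0.6)   # Question 13
q14 = pagerank({"A": ["C"], "B": ["C"], "C": []}, d=0.6)     # Question 14
print(q13)
print(q14)
```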
15. Explain the Page Rank algorithm. (5)

Module 6
1. What is the requirement of dimensionality reduction and explain how PCA helps in
that scenario? (5)
2. Explain the steps of PCA Algorithm. (5)
3. Explain Curse of Dimensionality. (5)
4. Explain the differences between Social Network Analysis and Traditional Data
Mining. (5)
5. What does Social Network Analysis (SNA) mean? (5)
6. Write a short note on issues and challenges in data mining. (5)
7. What are the recent trends in data mining? (10)
8. What do you understand by the term “Graph Mining”? (5)
9. Why is Class Imbalance a problem? Explain with an example. (10)
10. What are the recent developments in distributed data warehouse environments? (10)
11. Explain the concept of distributed data mining. (5)
12. What are the issues relating to the diversity of data types? (5)
13. Find the covariance matrix of the following data:
X 2.5 0.5 2.2 1.9 3.1 2.3 2 1 1.5 1.1
Y 2.4 0.7 2.9 2.2 3.0 2.7 1.6 1.1 1.6 0.9
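A Python sketch for Question 13 that computes the sample covariance matrix (dividing by n − 1, an assumption since the question does not specify). As the next PCA step (see Question 2), it also finds the eigenvalues of the resulting symmetric 2×2 matrix in closed form.

```python
import math

X = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
Y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

def covariance(u, v):
    """Sample covariance (divides by n - 1)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

cov = [[covariance(X, X), covariance(X, Y)],
       [covariance(Y, X), covariance(Y, Y)]]
print("covariance matrix:", [[round(v, 6) for v in row] for row in cov])

# Next PCA step: eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]]
a, b, c = cov[0][0], cov[0][1], cov[1][1]
mid, radius = (a + c) / 2, math.sqrt(((a - c) / 2) ** 2 + b ** 2)
eigenvalues = (mid + radius, mid - radius)
print("eigenvalues:", [round(v, 6) for v in eigenvalues])
```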
