SemSuggestions DM

The document contains a comprehensive list of questions and topics related to data mining, covering concepts such as data cleaning, OLAP, classification, clustering, and various algorithms like Naïve Bayes and decision trees. It also includes practical exercises on distance calculations, association rule mining, and the application of algorithms to datasets. Additionally, it addresses challenges in data mining and the importance of data preprocessing.

Uploaded by

rounaksainbwn17

Suggestions (Part 1):

1. Define data mining. Explain the process of Knowledge Discovery in Databases (KDD). (5)
2. Define Data Cleaning. What do you mean by Feature Engineering? (2.5+2.5)
3. Define the terms ROLAP, MOLAP and HOLAP. Write two applications of OLAP. (3+2)
4. What is Feature Selection? Define Regression. (2+3)
5. How is a data warehouse different from a DBMS? How is prediction different from
classification? (2+3)
6. Define Support and Confidence. What is pruning? (4+1)

7. Given two objects represented by the tuples (22,1,42,10) and (20,0,36,8): (5)
a. Compute the Euclidean distance between the two objects.
b. Compute the Manhattan distance between the two objects.
c. Compute the Minkowski distance between the two objects using p=3.
8. How can we compute the dissimilarity between two binary objects? (5)
9. Write the k-nearest neighbour (kNN) classification algorithm. (5)
10. How can outliers affect the performance of a model? Explain with a suitable example. (5)
11. What is a decision tree? Define the concept of classification. (2+3)
12. Define and differentiate Entropy and Gini Impurity in decision tree algorithms. Which of
the two is more efficient, and why? (5)
13. Explain the concepts of overfitting and underfitting. How do they affect the performance of a
model? (5)
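
For the two tuples in question 7, the three distances are all special cases of the Minkowski distance (p=1 gives Manhattan, p=2 gives Euclidean). The following is a minimal sketch for checking hand calculations; the function name `minkowski` is an illustrative choice and is not part of the question paper.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length tuples."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

# Attribute-wise absolute differences are 2, 1, 6 and 2.
manhattan = minkowski(x, y, 1)    # 2 + 1 + 6 + 2 = 11
euclidean = minkowski(x, y, 2)    # square root of 4 + 1 + 36 + 4 = 45
minkowski_3 = minkowski(x, y, 3)  # cube root of 8 + 1 + 216 + 8 = 233

print(f"Manhattan:       {manhattan:.4f}")
print(f"Euclidean:       {euclidean:.4f}")
print(f"Minkowski (p=3): {minkowski_3:.4f}")
```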

14. Explain different OLAP operations on multi-dimensional data with suitable examples and
necessary diagrams of data cubes. (10)

15. Describe the difference between ROLAP and MOLAP. (5)

16. Suppose we have data on a few randomly surveyed individuals. The data records each
individual's interest in promotional offers made in the areas of Finance, Travel and Health.
Gender is the output attribute to be predicted. Apply the Naïve Bayesian classification
algorithm to classify the new instance:

(Finance=No, Travel=Yes, Health=No) (10)

Finance  Travel  Health  Gender

Yes      No      No      Male
Yes      Yes     No      Male
No       Yes     Yes     Female
No       Yes     Yes     Male
Yes      Yes     Yes     Female
No       No      No      Female
Yes      No      No      Male
Yes      Yes     No      Male
No       No      Yes     Female
Yes      No      No      Male
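
The Naïve Bayes calculation for this table can be checked with a short script. This is a sketch under stated assumptions: it uses plain relative frequencies with no Laplace smoothing, and the names `rows`, `score` and `query` are illustrative rather than prescribed by the question. Each class score is the prior P(Gender) multiplied by the three conditional probabilities P(attribute=value | Gender).

```python
from fractions import Fraction

# (Finance, Travel, Health, Gender), copied from the table in question 16.
rows = [
    ("Yes", "No", "No", "Male"),
    ("Yes", "Yes", "No", "Male"),
    ("No", "Yes", "Yes", "Female"),
    ("No", "Yes", "Yes", "Male"),
    ("Yes", "Yes", "Yes", "Female"),
    ("No", "No", "No", "Female"),
    ("Yes", "No", "No", "Male"),
    ("Yes", "Yes", "No", "Male"),
    ("No", "No", "Yes", "Female"),
    ("Yes", "No", "No", "Male"),
]
ATTRS = ("Finance", "Travel", "Health")
query = {"Finance": "No", "Travel": "Yes", "Health": "No"}


def score(label):
    """Unnormalised posterior: prior times the product of likelihoods."""
    subset = [r for r in rows if r[3] == label]
    result = Fraction(len(subset), len(rows))  # prior P(label)
    for idx, attr in enumerate(ATTRS):
        matches = sum(1 for r in subset if r[idx] == query[attr])
        result *= Fraction(matches, len(subset))  # P(attr=value | label)
    return result


scores = {label: score(label) for label in ("Male", "Female")}
prediction = max(scores, key=scores.get)

for label, value in scores.items():
    print(f"{label}: {value} = {float(value):.4f}")
print("Predicted class:", prediction)
```

Using exact fractions keeps the intermediate values comparable with a hand-worked answer. If smoothing is required, the counts in `score` would need to be adjusted accordingly.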

17. Explain Association Rule Mining. (5)

18. Describe different data Pre-processing techniques. (10)

19. Explain the concepts of Overfit and Underfit. (5)

20. Consider the following transaction dataset ‘D’, which shows 9 transactions with the items I1,
I2, I3, I4 and I5. Apply the Apriori algorithm to find the frequent itemsets and strong association
rules for the following table with a minimum support count of 3 and a minimum confidence of 60%. (15)

Tid  List of items

T1   I1, I2, I5
T2   I2, I4
T3   I2, I3
T4   I1, I2, I4
T5   I1, I3
T6   I2, I3
T7   I1, I3
T8   I1, I2, I3, I5
T9   I1, I2, I3
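
A level-wise Apriori pass over this table can be sketched as follows. It is a small illustrative version, assuming that "minimum support 3" means a support count of 3 transactions; the helper names (`support`, `apriori`, `strong_rules_from`) are hypothetical. Candidates of size k are formed by joining frequent (k-1)-itemsets and pruned when any (k-1)-subset is not frequent.

```python
from itertools import combinations

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_SUPPORT = 3   # support count (assumed interpretation)
MIN_CONF = 0.6


def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori():
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= MIN_SUPPORT]
    found = {}
    k = 1
    while level:
        for s in level:
            found[s] = support(s)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must already be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in found
                        for sub in combinations(c, k - 1))
                 and support(c) >= MIN_SUPPORT]
    return found


def strong_rules_from(found):
    rules = []
    for itemset, count in found.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = count / found[lhs]  # support(itemset) / support(lhs)
                if conf >= MIN_CONF:
                    rules.append((sorted(lhs), sorted(itemset - lhs), conf))
    return rules


frequent = apriori()
strong_rules = strong_rules_from(frequent)

for s, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(s), n)
for lhs, rhs, conf in strong_rules:
    print(f"{lhs} -> {rhs}  confidence={conf:.2f}")
```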

21. Define Classification. Explain the general approach for solving classification models. (2+7)
22. Explain confusion matrix. (6)
23. Illustrate Association Rule Mining with a suitable example. (10)
24. Explain Support and Confidence. (5)
25. Explain various data mining techniques. (9)
26. Differentiate OLTP and OLAP. (6)
27. Explain Naïve Bayesian Classification in detail with an example. (15)
28. Explain in detail about Outlier Analysis. (9)
29. How can outliers affect the performance of a model? (6)
30. What is the role of Bayes theorem in Naive Bayes Algorithm? (5)
31. Why is naïve Bayesian classification called “naive”? (5)
32. Explain independent events and mutually exclusive events. (5)
33. Compare and contrast classification and clustering techniques with suitable illustrations. (10)
34. Explain the significance of cross validation. (5)
35. Define information gain and explain its importance in the decision tree algorithm. (8)

36. Explain the conditions for overfitting and underfitting in decision tree classification
algorithm. (7)
37. The following data records whether tennis is played, based on the attributes
Outlook and Humidity. Apply the Decision Tree algorithm to classify the following new
instance:
(Outlook=Overcast, Humidity=High) (15)

OUTLOOK   HUMIDITY  PLAY TENNIS

Overcast  High      Yes
Rainy     High      Yes
Rainy     Normal    Yes
Rainy     Normal    No
Overcast  Normal    Yes
Sunny     High      No
Sunny     Normal    Yes
Overcast  High      Yes
Overcast  Normal    Yes
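
The root split for this table can be checked by computing the information gain of each attribute. The sketch below assumes an ID3-style tree that uses entropy as the impurity measure (the question does not fix the measure); the names `entropy`, `info_gain` and `gains` are illustrative.

```python
from collections import Counter
from math import log2

# (Outlook, Humidity, PlayTennis), copied from the table in question 37.
rows = [
    ("Overcast", "High", "Yes"), ("Rainy", "High", "Yes"),
    ("Rainy", "Normal", "Yes"), ("Rainy", "Normal", "No"),
    ("Overcast", "Normal", "Yes"), ("Sunny", "High", "No"),
    ("Sunny", "Normal", "Yes"), ("Overcast", "High", "Yes"),
    ("Overcast", "Normal", "Yes"),
]
ATTRS = {"Outlook": 0, "Humidity": 1}


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def info_gain(attr):
    """Entropy of the whole set minus the weighted entropy after the split."""
    idx = ATTRS[attr]
    base = entropy([r[2] for r in rows])
    remainder = 0.0
    for value in {r[idx] for r in rows}:
        part = [r[2] for r in rows if r[idx] == value]
        remainder += len(part) / len(rows) * entropy(part)
    return base - remainder


gains = {attr: info_gain(attr) for attr in ATTRS}
root = max(gains, key=gains.get)

# Class labels that reach the Outlook=Overcast branch.
overcast_labels = {r[2] for r in rows if r[0] == "Overcast"}

for attr, gain in gains.items():
    print(f"Gain({attr}) = {gain:.4f}")
print("Root attribute:", root)
print("Labels under Outlook=Overcast:", overcast_labels)
```

When the attribute with the higher gain is placed at the root and the branch for the query's value contains a single class, that class is the prediction for the new instance and no further split is needed on that branch.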

38. Apply the FP-Growth algorithm to the following transactional data to find the frequent itemsets. List
all frequent itemsets with their support counts. Generate the association rules. The minimum
support count is 3 and the minimum confidence is 75%. (15)

Tid  List of Items

1    f, a, c, d, g, m, p
2    a, b, c, f, l, m, o
3    b, f, h, o
4    b, f, c, p
5    a, f, c, l, p, m, n
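
The first stage of FP-Growth for this table (counting items, dropping infrequent ones, reordering each transaction by descending support, and inserting the result into a prefix tree) can be sketched as follows. This covers tree construction only, not the recursive mining of conditional pattern bases. Ties in support are broken alphabetically here, which is an assumption; any fixed order works. The nested-dictionary tree and the name `build_tree` are illustrative.

```python
from collections import Counter

transactions = [
    ["f", "a", "c", "d", "g", "m", "p"],
    ["a", "b", "c", "f", "l", "m", "o"],
    ["b", "f", "h", "o"],
    ["b", "f", "c", "p"],
    ["a", "f", "c", "l", "p", "m", "n"],
]
MIN_SUPPORT = 3

# Pass 1: support count of every single item.
counts = Counter(item for t in transactions for item in t)
frequent_items = {i: n for i, n in counts.items() if n >= MIN_SUPPORT}

# Pass 2: keep frequent items only, sorted by descending support
# (alphabetical order breaks ties; this tie rule is an assumption).
ordered = [
    sorted((i for i in t if i in frequent_items),
           key=lambda i: (-frequent_items[i], i))
    for t in transactions
]


def build_tree(item_lists):
    """Insert each ordered transaction into a prefix tree with counts."""
    root = {}
    for items in item_lists:
        node = root
        for item in items:
            child = node.setdefault(item, {"count": 0, "children": {}})
            child["count"] += 1
            node = child["children"]
    return root


tree = build_tree(ordered)

print("Frequent items:", dict(sorted(frequent_items.items(),
                                     key=lambda kv: (-kv[1], kv[0]))))
for tid, items in enumerate(ordered, start=1):
    print(tid, items)
```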

39. Why is the similarity measure important? (3)

40. What are the types of similarity? Explain any two in detail. (6)
41. “Data pre-processing is necessary before the data mining process.” Justify your answer. (6)
42. Explain Market Basket Analysis. (4)
43. Describe four challenges of Data Mining. (4)
44. How is Hamming Distance calculated? Explain with an example. (7)
45. Write short notes on any three of the following: (3×5)
a. Supervised vs unsupervised learning
b. kNN algorithm
c. Confusion matrix
d. k-fold cross validation technique
e. PageRank algorithm

Suggestions (Part 2):

1. What is cluster analysis? Explain different types of clustering. (5)

2. What are the advantages of DBSCAN over the k-Means clustering algorithm? How is the entropy of a
dataset calculated? (2.5+2.5)
3. Explain the basics of the Agglomerative Hierarchical clustering algorithm. (5)
4. What are the advantages of the PAM algorithm over the k-Means algorithm? (5)
5. Describe the following activities involved in web usage mining: i) Pre-processing activity
ii) Pattern analysis activity (5)
6. Explain text mining with suitable examples. (5)
7. Explain the HITS algorithm with a suitable example. (5)
8. What is clustering? Compare and contrast k-Means and k-Medoid algorithms. How do you
determine the best value of k in these algorithms? (2+10+3)
9. What is hierarchical clustering? Explain the concepts of Agglomerative and Divisive methods
of clustering with suitable examples. (2+13)
10. What is web mining? What are the challenges in web mining? Explain the HITS algorithm with
suitable illustrations. (3+5+7)
