Data Mining: List of Important Questions

Uploaded by Amrit Sapkota

Probable Questions Collection for Data Mining (Elective-I)

Chapter 1: Introduction [4 Marks]

1. What is Data Mining? Differentiate between descriptive and predictive Data Mining.
2. Explain the architecture of a typical data mining system.
3. What are the main requirements of data mining?
4. What are the key steps in Knowledge Discovery in Databases (KDD)? Explain.
5. Is KDD the same as Data Mining? Explain the phases of KDD with an example.
6. Explain Data Mining as a step in KDD.
7. How is Data Mining different from query tools?
8. Describe Data Warehouse and explain its characteristics.
9. What is metadata? Briefly explain the architecture of a Data Warehouse.
10. Explain Different Data Warehouse Models.
11. What are Virtual Warehouse, Data Mart and Enterprise Warehouse? Explain.
12. What do you mean by “schema”? Briefly describe different Data Warehouse schemas.
13. Compare Star, Snowflake and Fact Constellation schemas with examples.
14. Explain symmetric and asymmetric data with examples.

Chapter 2: Data Pre-processing [10 marks]

1. Why is data preprocessing required?
2. What do you mean by attribute? Explain Nominal, Ordinal, Interval and Ratio attribute types.
3. Differentiate discrete and continuous attributes with examples.
4. What are the types of data sets? What are spatial and temporal data? Give examples.
5. What do you mean by the dimension of data? Briefly explain the curse of dimensionality.
6. Explain why the curse of dimensionality should be avoided and how to avoid it.
7. Explain the techniques for dimensionality reduction.
8. Differentiate between feature selection and feature extraction.
9. What do you mean by skewed data? Describe positive and negative skewness. Why are real-life data skewed?
10. Explain different processes involved in data preprocessing.
11. What is data cleaning? Explain different approaches to handle missing data, noisy data and outliers.
12. What is data integration? What are the challenges of Data Integration? How can they be handled?
13. Explain data reduction with strategies.
14. What do you mean by Data Sampling? Explain various ways of sampling the data.
15. Briefly describe Data Discretization and how it can be achieved.
16. What is Online Analytical Processing (OLAP)? Explain various OLAP operations with suitable examples.
17. Differentiate between OLAP and OLTP. Define data cube, OLAP operations, fact table and dimension table.
18. Differentiate between Data Warehouse and Database. (Note: The answer is the same as that of Qn.14)
19. What are the approaches to measure similarity between data?
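
For Qn. 19, three commonly used measures can be sketched as below. This is a minimal illustration rather than a complete list of approaches; the function names are hypothetical and chosen only for this example.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two numeric vectors (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero numeric vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def jaccard_similarity(s, t):
    """Shared items divided by all distinct items, for two non-empty sets."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t)

# Illustrative inputs (not taken from the question bank).
d = euclidean_distance((0, 0), (3, 4))
c = cosine_similarity((1, 0), (1, 0))
j = jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"})
```

Euclidean distance suits numeric attributes on comparable scales, cosine similarity ignores vector length, and Jaccard similarity applies to set-valued or asymmetric binary data.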

Chapter 3: Classification [20 marks]

1. Define Classification and prediction with examples. Explain the different stages in Classification with a clear block diagram.
2. Differentiate supervised and unsupervised learning with suitable examples.
3. Describe Decision Tree Classifier with example. Explain Hunt’s Algorithm.
4. Describe Iterative Dichotomizer 3 (ID3) algorithm. What makes ID3 Algorithm “greedy”?
5. How does C4.5 overcome the problems of the ID3 Algorithm? Compare Information Gain, Gain Ratio and Gini Index.
6. What do you mean by overfitting? What are the approaches to avoid it? Differentiate Pre-pruning and Post-pruning.
7. What are the advantages and disadvantages of Decision Tree Classifier?
8. Define rule based classifier with example. Describe the rule assessment measures: Coverage and Accuracy.
9. How can rules be extracted from a decision tree? Explain with an example.
10. Explain the Sequential covering algorithm for Rule Induction. What are the characteristics of a Rule Based Classifier?
11. Describe the CN2 and RIPPER algorithms for rule growing. What are the measures for Rule Evaluation?
12. What are the advantages and disadvantages of rule based classifier?
13. What is a rote learner? Describe K-Nearest Neighbor Classifier with an example. Explain the issues in choosing K.
14. What are the advantages and disadvantages of Nearest Neighbor Classifier?
15. Describe Naïve Bayes Classifier. Why is it “Naïve”? Explain Laplacian Correction for the zero-probability problem.
16. What are the advantages and disadvantages of Naïve Bayes Classifier?
17. Define Artificial Neural Network (ANN). Explain Back Propagation Algorithm for training an ANN.
18. How can you measure classifier accuracy by Holdout and Cross validation methods?
19. Explain the role of Receiver Operating Characteristics (ROC) Curve for classifier model selection.
20. Describe Confusion Matrix with example. Define Accuracy, Error rate, Sensitivity, Specificity, Precision, Recall, Positive Predictive Value and Negative Predictive Value, TPR, TNR, FPR, FNR of the classifier model.
21. Explain the inverse relation between Precision and Recall of classifier model.
22. Consider the following training data (Buys Computer Data):
SN  Age          Income  Student  Credit rating  Class: buys_computer
1   young        high    no       fair           no
2   young        high    no       excellent      no
3   middle_aged  high    no       fair           yes
4   senior       medium  no       fair           yes
5   senior       low     yes      fair           yes
6   senior       low     yes      excellent      no
7   middle_aged  low     yes      excellent      yes
8   young        medium  no       fair           no
9   young        low     yes      fair           yes
10  senior       medium  yes      fair           yes
11  young        medium  yes      excellent      yes
12  middle_aged  medium  no       excellent      yes
13  middle_aged  high    yes      fair           yes
14  senior       medium  no       excellent      no
a. Determine the root node of the decision tree for the above data set using the ID3 algorithm.
b. Draw the complete Decision Tree using the ID3 algorithm and determine the class for X={age = young, income = low, student = yes and credit rating = fair}
c. Use Naïve Bayes Classifier to determine class for
i. X={age = young, income = low, student = yes and credit rating = fair}
ii. X={age = middle_aged, income = low and credit rating = excellent}
iii. X={age = senior and credit rating = excellent}
23. Consider the following data set (Play Golf Data). Use the “ID3 algorithm” and the “Naïve Bayes Classifier” to predict whether people will play golf on a
a. Hot, sunny day with high humidity and no wind.
b. Cool, rainy day with normal humidity and no wind.
c. Mild, overcast, windy day with normal humidity.
SN  Outlook   Temperature  Temperature  Humidity   Humidity   Windy  Class:
              (Numeric)    (Nominal)    (Numeric)  (Nominal)         Play
1   Overcast  83           Hot          86         High       False  Yes
2   Overcast  64           Cool         65         Normal     True   Yes
3   Overcast  72           Mild         90         High       True   Yes
4   Overcast  81           Hot          75         Normal     False  Yes
5   Rainy     70           Mild         96         High       False  Yes
6   Rainy     68           Cool         80         Normal     False  Yes
7   Rainy     65           Cool         70         Normal     True   No
8   Rainy     75           Mild         80         Normal     False  Yes
9   Rainy     71           Mild         91         High       True   No
10  Sunny     85           Hot          85         High       False  No
11  Sunny     80           Hot          90         High       True   No
12  Sunny     72           Mild         95         High       False  No
13  Sunny     69           Cool         70         Normal     False  Yes
14  Sunny     75           Mild         70         Normal     True   Yes
24. Given the following confusion matrix, determine Accuracy, Error rate, Sensitivity, Specificity, Precision, Recall, Positive Predictive Value and Negative Predictive Value, TPR, TNR, FPR, FNR of the classifier model.
               Predicted
               +ve    -ve
Actual  +ve    152    130
        -ve     88    630
25. Given the following confusion matrix, determine Accuracy, Error rate, Sensitivity, Specificity, Precision, Recall, Positive Predictive Value and Negative Predictive Value, TPR, TNR, FPR, FNR of the classifier model.
                      Predicted
                      cancer = yes  cancer = no
Actual  cancer = yes  90            210
        cancer = no   140           9560
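
The measures asked for in Qn. 24 and Qn. 25 all follow from the four cells of the confusion matrix. The sketch below assumes rows are actual classes and columns are predicted classes, as laid out in the questions; the function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard measures derived from a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),  # same as Recall and TPR
        "specificity": tn / (tn + fp),  # same as TNR
        "precision": tp / (tp + fp),    # same as Positive Predictive Value
        "npv": tn / (tn + fn),          # Negative Predictive Value
        "fpr": fp / (fp + tn),          # 1 - specificity
        "fnr": fn / (fn + tp),          # 1 - sensitivity
    }

# Qn. 24: actual +ve row is (152, 130), actual -ve row is (88, 630).
q24 = confusion_metrics(tp=152, fn=130, fp=88, tn=630)
# Qn. 25: actual cancer = yes row is (90, 210), cancer = no row is (140, 9560).
q25 = confusion_metrics(tp=90, fn=210, fp=140, tn=9560)

for name, value in q25.items():
    print(f"{name}: {value:.4f}")
```

In Qn. 25 the accuracy is high while the sensitivity is low, which shows why accuracy alone can mislead on imbalanced classes.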

Chapter 4: Association Analysis [18 marks]

1. Explain Association rule mining with example. Define Support, frequent item set and confidence with example.
2. Describe the apriori principle and explain the apriori algorithm for frequent item set generation.
3. Discuss the advantages and disadvantages of the apriori algorithm.
4. Define the FP-Growth algorithm for frequent item set generation. Explain how the FP-Growth approach overcomes the disadvantages of the Apriori algorithm.
5. Define the measure “Lift” with a suitable example. What does it signify?
6. What are FP-Tree and Conditional FP-Tree? Explain with example.
7. Discuss the advantages and disadvantages of the FP-Growth algorithm.
8. Describe the issues related to categorical data. Explain Sequential, Sub-graph and infrequent patterns.
9. Given min_sup=33.34% and min_conf=60%, use the apriori algorithm on the following transaction data to determine frequent item sets. Also, indicate the association rules generated, underline the strong ones and sort them by confidence.
Transaction ID  Item set
TID1            HotDogs, Buns, Ketchup
TID2            HotDogs, Buns
TID3            HotDogs, Coke, Chips
TID4            Coke, Chips
TID5            Chips, Ketchup
TID6            HotDogs, Coke, Chips
10. Use the data set in Qn.9 with the same min_sup to build an FP-tree. Show, for each transaction, how the tree evolves. Then use the FP-Growth approach to discover the frequent item sets from this FP-tree.
11. Identify the candidate and large item sets of the following transaction table. Use the Apriori algorithm with minimum support 2. Also, indicate the association rules generated, underline the strong ones and sort them by confidence. (min_conf = 60%)
Transaction ID  Items
t1              {A, C, D}
t2              {B, C, E}
t3              {A, B, C, E}
t4              {B, E}
t5              {A, B, C, E}
12. Use the data set in Qn.11 with the same minimum support to build an FP-tree. Show, for each transaction, how the tree evolves. Then use the FP-Growth approach to discover the frequent item sets from this FP-tree.
13. Write a short note on improving the efficiency of the apriori algorithm.
14. How can we handle categorical, sequential, graph and stream data using association mining?
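
The level-wise search described by the apriori principle can be sketched on the five-transaction table (t1 to t5) with a minimum support count of 2. This is a simplified illustration, not a tuned implementation; the names `apriori` and `rule_confidence` are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (apriori principle): every (k-1)-subset must be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

def rule_confidence(frequent, antecedent, consequent):
    """confidence(X -> Y) = support(X and Y together) / support(X)."""
    x, y = frozenset(antecedent), frozenset(consequent)
    return frequent[x | y] / frequent[x]

table = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
    {"A", "B", "C", "E"},
]
frequent = apriori(table, min_count=2)
conf_a_c = rule_confidence(frequent, {"A"}, {"C"})
```

Item D appears in only one transaction, so it is dropped at the first level and no candidate containing D is ever generated. For the HotDogs table, note that min_sup = 33.34% of 6 transactions is slightly more than 2, so an item set needs a support count of at least 3.
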

Chapter 5: Cluster Analysis [16 marks]

1. Define Clustering. Why do we need cluster analysis? Discuss the qualities of a good cluster.
2. Describe the major clustering approaches. Differentiate between hierarchical and partitioning clustering techniques.
3. Describe K-means algorithm for clustering and discuss its strengths and weaknesses.
4. Compare the k-means, k-medoids, and k-modes algorithms.
5. What is hierarchical clustering? Describe AGNES and DIANA methods of hierarchical clustering approach.
6. Describe DBSCAN Clustering. What are the advantages of density-based clustering?
7. Describe external, internal and relative measures of clustering quality.
8. Write short notes on
a. Partitioning Clustering
b. Hierarchical Clustering
c. Evaluation of Clustering
9. Identify the clusters of the following instances using the K-Means clustering algorithm (take K=2 and K=3).
X = {2, 24, 21, 5, 6, 41, 35, 36, 9, 26, 44, 7, 46, 26, 11, 1, 32, 43, 48, 13}
10. Identify the clusters of the following instances using the K-Means clustering algorithm (take K=2). [IOE 2063]
Instance  X    Y
1         1.0  1.5
2         2.5  5.5
3         1.5  1.0
4         2.0  3.0
5         2.5  3.5
6         4.0  6.2
11. Use K-means algorithm and Euclidean distance to cluster the following 8 examples into 3 clusters:
A1=(2, 10) , A2=(2, 5) , A3=(8, 4) , A4=(5, 8) , A5=(7, 5) , A6=(6, 4) , A7=(1, 2) , A8=(4, 9)
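
A minimal sketch of the K-means loop on the eight points of Qn. 11 is given below. The question does not state the initial centroids, so starting from A1, A4 and A7 is an assumption made only for this illustration; a different start can give different clusters. The function name `kmeans` is hypothetical.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Plain K-means with Euclidean distance; returns (assignment, centroids)."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids

# A1..A8 from Qn. 11, in order.
points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
# Assumed starting centroids: A1, A4, A7.
assignment, centroids = kmeans(points, [(2, 10), (5, 8), (1, 2)])
print(assignment)
print(centroids)
```

With this starting choice the loop settles on the clusters {A1, A4, A8}, {A3, A5, A6} and {A2, A7}.
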
Chapter 6: Anomaly/Fraud Detection [6 marks]

1. What do you mean by anomaly detection? Why is it important, and where is it applicable?
2. What are the challenges in Anomaly Detection? Explain the different types of anomaly detection schemes.
3. List the drawbacks of the graphical approach and describe statistical approaches for anomaly detection.
4. Write short notes on
a. Distance based approaches for anomaly detection
b. Likelihood approach for anomaly detection
c. Base rate fallacy
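
The base rate fallacy in item 4(c) can be illustrated with Bayes' theorem. All three numbers below are hypothetical values chosen for the illustration; they do not come from the question bank.

```python
def posterior_given_alarm(base_rate, detection_rate, false_alarm_rate):
    """P(anomaly | alarm) by Bayes' theorem."""
    true_alarms = detection_rate * base_rate
    false_alarms = false_alarm_rate * (1.0 - base_rate)
    return true_alarms / (true_alarms + false_alarms)

# Hypothetical detector: 1 in 1000 records is anomalous, 99% of anomalies
# raise an alarm, and 1% of normal records raise a false alarm.
p = posterior_given_alarm(base_rate=0.001, detection_rate=0.99, false_alarm_rate=0.01)
print(f"P(anomaly | alarm) = {p:.4f}")
```

Even with a seemingly accurate detector, only about 9% of alarms correspond to real anomalies under these assumed rates, because normal records vastly outnumber anomalous ones.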

Chapter 7: Advanced Applications [6 marks]

1. Describe web mining along with its structure. What are the challenges in web mining?
2. Briefly explain the page ranking algorithm.
3. Write short notes on
a. WWW Mining
b. Time series data and regression analysis
c. Multimedia Mining
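
For Qn. 2, the page ranking idea can be sketched as a power iteration. The three-page link graph, the damping factor of 0.85 and the function name `pagerank` are assumptions for illustration only.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively spread each page's rank over its outgoing links.

    `links` maps every page to the list of pages it links to; every
    link target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (random jump).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical web graph: A links to B and C, B links to C, C links to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(ranks)
```

Page C ends up ranked above page B because it receives links from both A and B, while B is linked only from A.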
