This document contains questions for an exam on data warehousing and mining. It covers topics such as:
1) Justifying the need for data mining and explaining principal component analysis, classification methods for categorical variables, naive Bayesian classification, and the definitions of frequent itemsets, support, and confidence.
2) The major issues in data mining, and the steps involved in the data mining process and KDD process.
3) Data preprocessing methods, the ID3 decision tree algorithm, converting decision trees to rules and pruning rules versus pruning trees first.
4) Bayes' theorem and its use for classification. Why naive Bayesian classification is called "naive".
5) The Apriori algorithm for generating frequent itemsets.
III B. Tech II Semester Regular/Supplementary Examinations, October/November - 2020
DATA WAREHOUSING AND MINING (Computer Science and Engineering)
Time: 3 hours                                                Max. Marks: 70
Note: 1. The question paper consists of two parts (Part-A and Part-B).
      2. Answer ALL the questions in Part-A.
      3. Answer any FOUR questions from Part-B.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PART-A (14 Marks)
1. a) Justify the need for data mining. [2M]
   b) What are principal components? How are principal components used in feature selection? [2M]
   c) What is a classification method that can be used even if some of the variables are categorical? [2M]
   d) What is the main idea of naïve Bayesian classification? [3M]
   e) Define frequent itemset, support and confidence. [3M]
   f) How would you measure the quality of clusters? [2M]
PART-B (56 Marks)
2. a) What are the major issues in data mining? Explain briefly. [7M]
   b) Explain the steps involved in the data mining process. Give a sketch of the KDD process. [7M]
3. Explain various data pre-processing methods with appropriate examples. [14M]
4. a) Explain the ID3 algorithm for the induction of decision trees. [6M]
   b) Given a decision tree, there is an option of (i) converting the decision tree to rules and then pruning the resulting rules, or (ii) pruning the decision tree first and then converting the pruned tree to rules. What advantage does (i) have over (ii)? [8M]
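For reference against question 4(a): ID3 grows the tree by repeatedly splitting on the attribute with maximum information gain. A minimal sketch of that splitting criterion follows; the four-row weather-style dataset at the bottom is invented purely for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label list: -sum_i p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr_index):
    """Information gain of splitting `rows` on the attribute at attr_index:
    entropy before the split minus the weighted entropy of each branch."""
    n = len(labels)
    splits = {}
    for row, y in zip(rows, labels):
        splits.setdefault(row[attr_index], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in splits.values())
    return entropy(labels) - remainder

# Illustrative toy data (not from the paper): attribute 0 separates the
# classes perfectly (gain = 1 bit), attribute 1 tells us nothing (gain = 0).
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
labels = ["no", "no", "yes", "yes"]
gain_outlook = info_gain(rows, labels, 0)
gain_temp = info_gain(rows, labels, 1)
```

ID3 would therefore split on attribute 0 here, recursing on each branch until the labels in a node are pure or the attributes are exhausted.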
5. a) What is Bayes' theorem? Show how it is used for classification. [7M]
   b) Why is naïve Bayesian classification called “naïve”? Explain. [7M]
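As background for question 5: applying Bayes' theorem to classification means scoring each class c by P(c) * P(x | c); the “naïve” step is factoring P(x | c) into a product of per-attribute likelihoods, i.e. assuming the attributes are conditionally independent given the class. A minimal sketch with made-up toy data (real implementations add Laplace smoothing so an unseen attribute value does not zero out a class):

```python
from collections import Counter

def naive_bayes_posteriors(train_rows, train_labels, x):
    """Score each class c by P(c) * prod_i P(x_i | c), then normalize
    the scores so they sum to 1 (the posterior distribution over classes)."""
    n = len(train_labels)
    class_counts = Counter(train_labels)
    scores = {}
    for c, nc in class_counts.items():
        p = nc / n  # prior P(c), estimated from class frequencies
        for i, v in enumerate(x):
            # likelihood P(x_i = v | c), estimated from within-class counts
            match = sum(1 for row, y in zip(train_rows, train_labels)
                        if y == c and row[i] == v)
            p *= match / nc
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()} if total else scores

# Illustrative toy data (not from the paper): classify ("sunny", "hot").
rows = [("sunny", "hot"), ("sunny", "cool"), ("rain", "cool"), ("rain", "hot")]
labels = ["no", "no", "yes", "yes"]
posteriors = naive_bayes_posteriors(rows, labels, ("sunny", "hot"))
```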
6. a) Develop the Apriori algorithm for generating frequent itemsets. [8M]
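The level-wise generation asked for in 6(a) can be sketched as follows: frequent k-itemsets are joined into (k+1)-candidates, and a candidate is kept only if every k-subset is frequent (the Apriori property). The transaction data at the bottom is invented for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent itemset: support} for all itemsets whose support
    (fraction of transactions containing them) is at least min_support."""
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in tsets if itemset <= t) / n

    items = sorted({i for t in tsets for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support]
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        # Join step: union pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: keep a candidate only if all its k-subsets are frequent,
        # then check its support against the database.
        level = [c for c in candidates
                 if all(frozenset(sub) in frequent for sub in combinations(c, k))
                 and support(c) >= min_support]
        k += 1
    return frequent

# Illustrative toy database (not from the paper), min_support = 0.5:
tx = [["bread", "milk"], ["bread", "butter"],
      ["bread", "milk", "butter"], ["milk"]]
freq = apriori(tx, 0.5)
```

Here {milk, butter} fails the support check, so the triple {bread, milk, butter} is pruned without ever scanning the database for it, which is exactly the saving Apriori is designed for.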
   b) What is association analysis? Explain. [6M]
7. What are the advantages and disadvantages of k-means clustering compared with model-based clustering? You are given the set of numbers {2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377}. Use the following techniques to find two clusters in this data set: (i) K-means with initial centroids {1} and {378}; (ii) K-means with initial centroids {21} and {34}. Explain the differences between K-means clustering and kernel K-means clustering. [14M]
*****
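The two runs asked for in question 7 can be checked with a plain 1-D k-means sketch (a minimal implementation, not the only valid one; ties and empty clusters are handled by simple conventions noted in the comments):

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain 1-D k-means: assign each point to its nearest centroid (first
    centroid wins ties), recompute each centroid as its cluster's mean
    (an empty cluster keeps its old centroid), repeat until stable."""
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids

data = [2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377]
# (i) far-apart seeds: converges to {2, ..., 144} and {233, 377}
clusters_i, _ = kmeans_1d(data, [1.0, 378.0])
# (ii) adjacent seeds: converges to {2, ..., 89} and {144, 233, 377}
clusters_ii, _ = kmeans_1d(data, [21.0, 34.0])
```

The two seedings reach different partitions of the same data, which illustrates k-means' sensitivity to initialization, one of the disadvantages the question asks about.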