SemSuggestions DM
1. Define Data Mining. Explain the process of Knowledge Discovery in Databases (KDD). (5)
2. Define Data Cleaning. What do you mean by Feature Engineering? (2.5+2.5)
3. Define the terms ROLAP, MOLAP and HOLAP. Write two applications of OLAP. (3+2)
4. What is Feature Selection? Define Regression. (2+3)
5. How is a data warehouse different from a DBMS? How is prediction different from
classification? (2+3)
6. Define Support and Confidence. What is pruning? (4+1)
7. Given two objects represented by the tuples (22,1,42,10) and (20,0,36,8): (5)
a. Compute the Euclidean distance between the two objects.
b. Compute the Manhattan distance between the two objects.
c. Compute the Minkowski distance between the two objects using p = 3.
8. How can we compute the dissimilarity between two binary objects? (5)
9. Write the K-nearest neighbour classification algorithm. (5)
10. How can outliers affect the performance of a model? Explain with a suitable example. (5)
11. What is a decision tree? Define the concept of classification. (2+3)
12. Define and differentiate Entropy and Gini Impurity in decision tree algorithms. Which of
the two is more efficient, and why? (5)
13. Explain the concepts of overfitting and underfitting. How do they affect the performance
of a model? (5)
14. Explain different OLAP operations on multi-dimensional data with suitable examples and
necessary diagrams of data cubes. (10)
16. Suppose we have data on a few randomly surveyed individuals. The data give their responses
regarding interest in promotional offers in the areas of Finance, Travel and Health.
Gender is the output attribute to be predicted. Apply the Naïve Bayesian classification
algorithm to classify the new instance:
20. Consider the following transaction dataset ‘D’, which shows 9 transactions with the items I1,
I2, I3, I4 and I5. Apply the Apriori algorithm to find the frequent itemsets and the strong
association rules, with a minimum support of 3 and a minimum confidence of 60%. (15)
21. Define Classification. Explain the general approach for solving classification models. (2+7)
22. Explain confusion matrix. (6)
23. Illustrate Association Rule Mining with a suitable example. (10)
24. Explain Support and Confidence. (5)
25. Explain various data mining techniques. (9)
26. Differentiate OLTP and OLAP. (6)
27. Explain Naïve Bayesian Classification in detail with an example. (15)
28. Explain in detail about Outlier Analysis. (9)
29. How can outliers affect the performance of a model? (6)
30. What is the role of Bayes' theorem in the Naïve Bayes algorithm? (5)
31. Why is Naïve Bayesian classification called “naïve”? (5)
32. Explain independent events and mutually exclusive events. (5)
33. Compare and contrast classification and clustering techniques with suitable illustrations. (10)
34. Explain the significance of cross validation. (5)
35. Define information gain and explain its importance in the decision tree algorithm. (8)
36. Explain the conditions for overfitting and underfitting in the decision tree classification
algorithm. (7)
37. The following data predict whether to play tennis based on individual attributes
such as Outlook and Humidity. Apply the decision tree algorithm to classify the following
new instance:
(Outlook = Overcast, Humidity = High) (15)
38. Apply the FP-Growth algorithm to the following transactional data to find the frequent
itemsets. List all frequent itemsets with their support counts. Generate the association rules.
The minimum support count is 3 and the minimum confidence is 75%. (15)
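
Worked check for question 7. The three distances asked for there are all special cases of the Minkowski distance (p = 1 gives Manhattan, p = 2 gives Euclidean), so a hand calculation can be verified with a short sketch like the one below. The function name `minkowski` and the variable names are illustrative choices, not part of the question paper; only the two tuples and p = 3 come from the question.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length tuples."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    # Sum |x_i - y_i|^p over all attributes, then take the p-th root.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


# The two objects given in question 7.
x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

manhattan = minkowski(x, y, 1)   # |2| + |1| + |6| + |2|
euclidean = minkowski(x, y, 2)   # sqrt(4 + 1 + 36 + 4) = sqrt(45)
minkowski_3 = minkowski(x, y, 3) # cube root of (8 + 1 + 216 + 8) = 233

print(f"Manhattan distance:       {manhattan:.4f}")
print(f"Euclidean distance:       {euclidean:.4f}")
print(f"Minkowski distance (p=3): {minkowski_3:.4f}")
```

The attribute-wise differences are 2, 1, 6 and 2, so the hand-computed values to compare against are 11 (Manhattan), the square root of 45 (Euclidean) and the cube root of 233 (Minkowski with p = 3).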