SemSuggestions DM

The document contains a comprehensive list of questions and topics related to data mining, covering concepts such as data cleaning, OLAP, classification, clustering, and various algorithms like Naïve Bayes and decision trees. It also includes practical exercises on distance calculations, association rule mining, and the application of algorithms to datasets. Additionally, it addresses challenges in data mining and the importance of data preprocessing.

Uploaded by

rounaksainbwn17
Copyright
© All Rights Reserved

Suggestions (Part 1):

1. Define data mining. Explain the process of Knowledge Discovery in Databases (KDD). (5)
2. Define Data Cleaning. What do you mean by Feature Engineering? (2.5+2.5)
3. Define the terms ROLAP, MOLAP and HOLAP. Write two applications of OLAP. (3+2)
4. What is Feature Selection? Define Regression. (2+3)
5. How is a data warehouse different from a DBMS? How is prediction different from
classification? (2+3)
6. Define Support and Confidence. What is pruning? (4+1)

7. Given two objects represented by the tuples (22,1,42,10) and (20,0,36,8): (5)
a. Compute the Euclidean distance between the two objects.
b. Compute the Manhattan distance between the two objects.
c. Compute the Minkowski distance between the two objects using p = 3.
8. How can we compute the dissimilarity between two binary objects? (5)
9. Write the k-nearest neighbour classification algorithm. (5)
10. How can outliers affect the performance of a model? Explain with a suitable example. (5)
11. What is a decision tree? Define the concept of classification. (2+3)
12. Define and differentiate Entropy and Gini Impurity in decision tree algorithms. Which of
the two is more efficient, and why? (5)
13. Explain the concepts of overfitting and underfitting. How do they affect the performance
of a model? (5)
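
For question 7, all three distances are special cases of the Minkowski distance of order p (p = 1 gives Manhattan, p = 2 gives Euclidean). The following is a minimal Python sketch for checking a hand calculation; the helper name `minkowski` is illustrative and not part of the question paper.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length tuples."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

# The two objects given in question 7.
x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

manhattan = minkowski(x, y, 1)    # sum of absolute differences
euclidean = minkowski(x, y, 2)    # square root of the sum of squared differences
minkowski_3 = minkowski(x, y, 3)  # cube root of the sum of cubed absolute differences

print(manhattan, euclidean, minkowski_3)
```

The per-attribute absolute differences are 2, 1, 6 and 2, so the Manhattan distance is 11, the Euclidean distance is the square root of 45, and the order-3 Minkowski distance is the cube root of 233.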

14. Explain different OLAP operations on multi-dimensional data with suitable examples and
necessary diagrams of data cubes. (10)

15. Describe the difference between ROLAP and MOLAP. (5)

16. Suppose we have survey data on a few randomly selected individuals. The data records their
interest in promotional offers in the areas of Finance, Travel and Health. Gender is the
output attribute to be predicted. Apply the Naïve Bayesian classification algorithm to
classify the new instance:

(Finance=No, Travel=Yes, Health= No) (10)

Finance  Travel  Health  Gender
Yes      No      No      Male
Yes      Yes     No      Male
No       Yes     Yes     Female
No       Yes     Yes     Male
Yes      Yes     Yes     Female
No       No      No      Female
Yes      No      No      Male
Yes      Yes     No      Male
No       No      Yes     Female
Yes      No      No      Male
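
One way to check a hand-worked answer to question 16 is the sketch below. It assumes plain relative-frequency estimates with no Laplace smoothing (the question does not specify either way), and the function name `naive_bayes_scores` is illustrative. Each class score is the prior multiplied by the per-attribute conditional probabilities.

```python
from collections import Counter

# (Finance, Travel, Health, Gender) rows copied from the table in question 16.
rows = [
    ("Yes", "No", "No", "Male"),
    ("Yes", "Yes", "No", "Male"),
    ("No", "Yes", "Yes", "Female"),
    ("No", "Yes", "Yes", "Male"),
    ("Yes", "Yes", "Yes", "Female"),
    ("No", "No", "No", "Female"),
    ("Yes", "No", "No", "Male"),
    ("Yes", "Yes", "No", "Male"),
    ("No", "No", "Yes", "Female"),
    ("Yes", "No", "No", "Male"),
]

def naive_bayes_scores(rows, instance):
    """Return the unnormalised score P(class) * prod P(attribute value | class)."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / len(rows)  # prior P(class)
        for i, value in enumerate(instance):
            matches = sum(1 for r in rows if r[-1] == label and r[i] == value)
            score *= matches / n_label  # P(attribute value | class), unsmoothed
        scores[label] = score
    return scores

# New instance: Finance=No, Travel=Yes, Health=No.
scores = naive_bayes_scores(rows, ("No", "Yes", "No"))
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

Under these assumptions the Male score is 0.6 × (1/6) × (3/6) × (5/6) and the Female score is 0.4 × (3/4) × (2/4) × (1/4), so the instance is classified as Male.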

17. Explain Association Rule Mining. (5)

18. Describe different data pre-processing techniques. (10)

19. Explain the concepts of overfitting and underfitting. (5)

20. Consider the following transaction dataset ‘D’, which shows 9 transactions with the items I1,
I2, I3, I4 and I5. Apply the Apriori algorithm to find the frequent itemsets and the strong
association rules for the following table, with a minimum support count of 3 and a minimum
confidence of 60%. (15)

Tid  List of items
T1   I1, I2, I5
T2   I2, I4
T3   I2, I3
T4   I1, I2, I4
T5   I1, I3
T6   I2, I3
T7   I1, I3
T8   I1, I2, I3, I5
T9   I1, I2, I3
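
The level-wise search asked for in question 20 can be sketched as follows. This is a simplified Apriori written for a dataset this small, assuming "minimum support 3" means a support count of 3; the function names `support`, `apriori` and `strong_rules` are illustrative.

```python
from itertools import combinations

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_SUPPORT = 3       # support count, as assumed above
MIN_CONFIDENCE = 0.6

def support(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def apriori():
    """Return {frequent itemset: support count}, built one level at a time."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and support(c) >= MIN_SUPPORT]
    return frequent

def strong_rules(frequent):
    """Return (antecedent, consequent, confidence) for rules meeting MIN_CONFIDENCE."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                confidence = count / frequent[a]
                if confidence >= MIN_CONFIDENCE:
                    rules.append((tuple(sorted(a)), tuple(sorted(itemset - a)), confidence))
    return rules

frequent = apriori()
rules = strong_rules(frequent)
print(frequent)
print(rules)
```

On this table the sketch finds three frequent 1-itemsets (I1, I2, I3) and three frequent 2-itemsets, each pair with support count 4; {I1, I2, I3} occurs only twice and is dropped. Rules with I2 as the antecedent have confidence 4/7 and fall below the 60% threshold.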

21. Define Classification. Explain the general approach for solving classification problems. (2+7)
22. Explain the confusion matrix. (6)
23. Illustrate Association Rule Mining with a suitable example. (10)
24. Explain Support and Confidence. (5)
25. Explain various data mining techniques. (9)
26. Differentiate between OLTP and OLAP. (6)
27. Explain Naïve Bayesian Classification in detail with an example. (15)
28. Explain in detail about Outlier Analysis. (9)
29. How can outliers affect the performance of a model? (6)
30. What is the role of Bayes' theorem in the Naïve Bayes algorithm? (5)
31. Why is naïve Bayesian classification called “naive”? (5)
32. Explain independent events and mutually exclusive events. (5)
33. Compare and contrast classification and clustering techniques with suitable illustrations. (10)
34. Explain the significance of cross validation. (5)
35. Define information gain and explain its importance in the decision tree algorithm. (8)

36. Explain the conditions for overfitting and underfitting in the decision tree classification
algorithm. (7)
37. The following data gives the decision to play tennis based on attributes such as Outlook
and Humidity. Apply the Decision Tree algorithm to classify the following new instance:
(Outlook=Overcast, Humidity=High) (15)

OUTLOOK   HUMIDITY  PLAY TENNIS
Overcast  High      Yes
Rainy     High      Yes
Rainy     Normal    Yes
Rainy     Normal    No
Overcast  Normal    Yes
Sunny     High      No
Sunny     Normal    Yes
Overcast  High      Yes
Overcast  Normal    Yes
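
The root-selection step for question 37 can be sketched as below. It assumes an ID3-style tree that splits on information gain (entropy); the question only says "Decision Tree", so Gini impurity would be an equally valid choice. All names in the code are illustrative.

```python
from collections import Counter
from math import log2

# (Outlook, Humidity, PlayTennis) rows copied from the table in question 37.
rows = [
    ("Overcast", "High", "Yes"),
    ("Rainy", "High", "Yes"),
    ("Rainy", "Normal", "Yes"),
    ("Rainy", "Normal", "No"),
    ("Overcast", "Normal", "Yes"),
    ("Sunny", "High", "No"),
    ("Sunny", "Normal", "Yes"),
    ("Overcast", "High", "Yes"),
    ("Overcast", "Normal", "Yes"),
]
attributes = {"Outlook": 0, "Humidity": 1}  # attribute name -> column index

def entropy(subset):
    """Shannon entropy of the class label (last column) in subset."""
    counts = Counter(r[-1] for r in subset)
    return -sum(c / len(subset) * log2(c / len(subset)) for c in counts.values())

def information_gain(subset, column):
    """Entropy of subset minus the weighted entropy after splitting on column."""
    remainder = 0.0
    for value in {r[column] for r in subset}:
        part = [r for r in subset if r[column] == value]
        remainder += len(part) / len(subset) * entropy(part)
    return entropy(subset) - remainder

gains = {name: information_gain(rows, col) for name, col in attributes.items()}
root = max(gains, key=gains.get)

# Follow the root branch that matches the new instance and take the majority class.
# A full tree would keep splitting impure branches; this branch turns out to be pure.
instance = {"Outlook": "Overcast", "Humidity": "High"}
branch = [r for r in rows if r[attributes[root]] == instance[root]]
prediction = Counter(r[-1] for r in branch).most_common(1)[0][0]
print(gains, root, prediction)
```

With entropy as the criterion, Outlook has a gain of roughly 0.236 against roughly 0.003 for Humidity, so Outlook becomes the root. All four Overcast rows are labelled Yes, so the new instance is classified as Yes without needing the Humidity value.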

38. Apply the FP-Growth algorithm to the following transactional data to find the frequent
itemsets. List all frequent itemsets with their support counts. Generate the association
rules. The minimum support count is 3 and the minimum confidence is 75%. (15)

Tid  List of Items
1    f, a, c, d, g, m, p
2    a, b, c, f, l, m, o
3    b, f, h, o
4    b, f, c, p
5    a, f, c, l, p, m, n
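
A compact FP-Growth sketch for the frequent-itemset part of question 38 follows. It builds an FP-tree, then recursively mines each item's conditional pattern base. The class and function names (`Node`, `build_tree`, `fp_growth`) are illustrative, ties in item frequency are broken alphabetically (an assumption; any fixed order works), and rule generation is left out.

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, a count, a parent link and child links."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(weighted, min_count):
    """Build an FP-tree from (items, weight) pairs; return (header table, item counts)."""
    counts = defaultdict(int)
    for items, weight in weighted:
        for item in items:
            counts[item] += weight
    counts = {i: c for i, c in counts.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, weight in weighted:
        # Keep frequent items only, most frequent first (alphabetical tie-break).
        ordered = sorted((i for i in items if i in counts), key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += weight
    return header, counts

def fp_growth(weighted, min_count, suffix=frozenset()):
    """Return {frequent itemset: support count} for itemsets that extend suffix."""
    header, counts = build_tree(weighted, min_count)
    result = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        result[itemset] = counts[item]
        # Conditional pattern base: the prefix path above every node holding item.
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        result.update(fp_growth(base, min_count, itemset))
    return result

transactions = ["facdgmp", "abcflmo", "bfho", "bfcp", "afclpmn"]  # one letter per item
frequent = fp_growth([(list(t), 1) for t in transactions], min_count=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), count)
```

For this table the frequent single items are f (5), c (4), a, b, m and p (3 each), and the largest frequent itemset is {f, c, a, m} with support count 3. The confidence of a rule A → B can then be read off as support(A ∪ B) / support(A) from the returned dictionary.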

39. Why is the similarity measure important? (3)


40. What are the types of similarity? Explain any two in detail. (6)
41. “Data pre-processing is necessary before the data mining process.” Justify your answer. (6)
42. Explain Market Basket Analysis. (4)
43. Describe four challenges of Data Mining. (4)
44. How is Hamming Distance calculated? Explain with an example. (7)
45. Write short notes on any three of the following: (3×5)
a. Supervised vs unsupervised learning
b. kNN algorithm
c. Confusion matrix
d. k-fold cross validation technique
e. PageRank algorithm
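
For question 44, the Hamming distance between two equal-length sequences is the number of positions at which they differ. A minimal sketch, with an illustrative function name and example strings that are not taken from the question paper:

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(1 for x, y in zip(a, b) if x != y)

print(hamming("1011101", "1001001"))  # differs at the 3rd and 5th bits
print(hamming("karolin", "kathrin"))  # differs at the 3rd, 4th and 5th letters
```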
Suggestions (Part 2):

1. What is cluster analysis? Explain different types of clustering. (5)


2. What are the advantages of DBSCAN over the k-Means clustering algorithm? How is the entropy
of a dataset calculated? (2.5+2.5)
3. Explain the basics of the Agglomerative Hierarchical clustering algorithm. (5)
4. What are the advantages of the PAM algorithm over the k-Means algorithm? (5)
5. Describe the following activities involved in web usage mining: i) Pre-processing activity
ii) Pattern analysis activity (5)
6. Explain text mining with suitable examples. (5)
7. Explain the HITS algorithm with a suitable example. (5)
8. What is clustering? Compare and contrast k-Means and k-Medoid algorithms. How do you
determine the best value of k in these algorithms? (2+10+3)
9. What is hierarchical clustering? Explain the concepts of Agglomerative and Divisive methods
of clustering with suitable examples. (2+13)
10. What is web mining? What are the challenges in web mining? Explain the HITS algorithm with
suitable illustrations. (3+5+7)
