Revision (Ques - Only)
Sample exam questions
1. Name five kinds of graphics/plots that can be used to represent data dispersion characteristics
effectively.
2. For the following group of data: 53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53
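The sub-questions for this data set are not reproduced here, so the measures computed below (mean, median, mode, range) are an assumption about what is typically asked; a minimal sketch:

```python
# Sketch: common central-tendency and dispersion statistics for the
# data in Question 2. Which measures are actually required is an
# assumption, since the sub-questions are not shown.
import statistics

data = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]

mean = statistics.mean(data)      # 657 / 11, approximately 59.73
median = statistics.median(data)  # middle of the 11 sorted values: 57
mode = statistics.mode(data)      # most frequent value: 53
spread = max(data) - min(data)    # range: 70 - 53 = 17

print(sorted(data))
print(mean, median, mode, spread)
```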
3. For the following vectors x and y, calculate the indicated similarity or distance measures.
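The vectors x and y are not reproduced here, so the values below are placeholders; a sketch of three measures commonly asked for (Euclidean distance, cosine similarity, and the Jaccard coefficient for binary vectors):

```python
# Sketch: similarity/distance measures for Question 3.
# The actual x and y are not shown, so these binary vectors are
# hypothetical stand-ins for illustration only.
import math

x = [1, 1, 0, 1, 0, 1]  # hypothetical
y = [1, 1, 1, 0, 0, 1]  # hypothetical

# Euclidean distance: sqrt of the sum of squared differences.
euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Cosine similarity: dot product over the product of vector norms.
dot = sum(a * b for a, b in zip(x, y))
cosine = dot / (math.sqrt(sum(a * a for a in x)) *
                math.sqrt(sum(b * b for b in y)))

# Jaccard coefficient for binary data: f11 / (f11 + f10 + f01).
f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
f10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
f01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
jaccard = f11 / (f11 + f10 + f01)
```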
4. Both decision-tree induction and associative classification may generate rules for classification.
What are their major differences? Why, in many cases, can associative classification lead to better
accuracy in prediction?
5. (a) Estimate the conditional probabilities for P(A|+), P(B|+), P(C|+), P(A|−), P(B|−), and P(C|−).
Introduction to Data Mining Course Lecturer: Sang Nguyen
(b) Use the estimate of conditional probabilities given in the previous question to predict the class
label for a test sample (A=0, B=1, C=0) using the Naïve Bayes approach.
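The training table for this question is not reproduced here, so the prior and conditional probabilities below are hypothetical placeholders; substitute the estimates from part (a). A sketch of the Naïve Bayes prediction for the test sample (A=0, B=1, C=0):

```python
# Sketch of the Naive Bayes prediction in part (b). The priors and
# conditionals below are hypothetical, since the training data for
# Question 5 is not shown; replace them with the part (a) estimates.
priors = {"+": 0.5, "-": 0.5}          # hypothetical P(+), P(-)
cond = {                               # hypothetical P(attr=1 | class)
    "+": {"A": 0.6, "B": 0.4, "C": 0.8},
    "-": {"A": 0.2, "B": 0.5, "C": 0.4},
}
sample = {"A": 0, "B": 1, "C": 0}

scores = {}
for label in priors:
    score = priors[label]
    for attr, value in sample.items():
        p1 = cond[label][attr]
        # For binary attributes, P(attr=0 | class) = 1 - P(attr=1 | class).
        score *= p1 if value == 1 else (1 - p1)
    scores[label] = score

# Predict the class with the larger posterior score.
prediction = max(scores, key=scores.get)
```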
6. Describe the two frequent itemset mining methods Apriori and FP-Growth, and discuss their
advantages and disadvantages.
7. (a) Compute the support for itemsets {e}, {b, d}, and {b, d, e} by treating each transaction ID as a
market basket.
(b) Use the results in part (a) to compute the confidence for the association rules: {b, d}→{e} and
{e}→{b, d}. Is confidence a symmetric measure?
(c) Repeat part (a) by treating each customer ID as a market basket. Each item should be treated as a
binary variable (1 if an item appears in at least one transaction bought by the customer, and 0
otherwise).
(d) Use the results in part (c) to compute the confidence for the association rules: {b, d}→{e} and
{e}→{b, d}.
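The transaction table for this question is not reproduced here, so the baskets below are a hypothetical stand-in; the sketch shows how parts (a) and (b) can be computed once the real transactions are substituted in:

```python
# Sketch for parts (a)-(b): support and confidence over market
# baskets. These baskets are hypothetical, since the transaction
# table for Question 7 is not shown.
baskets = [
    {"a", "b", "d", "e"},
    {"b", "c", "d"},
    {"b", "d", "e"},
    {"a", "c"},
    {"b", "d", "e"},
]

def support(itemset):
    # Fraction of baskets containing every item in the itemset.
    count = sum(1 for basket in baskets if itemset <= basket)
    return count / len(baskets)

def confidence(antecedent, consequent):
    # conf(X -> Y) = support(X u Y) / support(X).
    return support(antecedent | consequent) / support(antecedent)

s_e = support({"e"})
s_bd = support({"b", "d"})
s_bde = support({"b", "d", "e"})
conf_bd_e = confidence({"b", "d"}, {"e"})  # the two confidences differ,
conf_e_bd = confidence({"e"}, {"b", "d"})  # so confidence is not symmetric
```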
8. Suppose the following are all the frequent 3-itemsets in a data set:
{1,2,3}, {1,2,4}, {1,2,5}, {1,3,4}, {1,3,5}, {2,3,4}, {2,3,5}, {3,4,5}.
Assume that there are only five items in the data set.
(a) List all candidate 4-itemsets obtained by the candidate generation procedure in Apriori.
(b) List all candidate 4-itemsets that survive the candidate pruning step of the Apriori algorithm.
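One way to sketch the F(k-1) x F(k-1) candidate generation and pruning steps, applied directly to the frequent 3-itemsets listed above:

```python
# Sketch for Question 8: Apriori candidate generation and pruning.
# Two frequent 3-itemsets are merged into a candidate 4-itemset when
# they share their first two items; a candidate survives pruning only
# if every 3-subset of it is frequent.
from itertools import combinations

frequent3 = [
    (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4),
    (1, 3, 5), (2, 3, 4), (2, 3, 5), (3, 4, 5),
]

# (a) Merge pairs of sorted 3-itemsets sharing their first two items.
candidates = []
for a, b in combinations(frequent3, 2):
    if a[:2] == b[:2]:
        candidates.append(a + (b[2],))

# (b) Prune candidates that have an infrequent 3-subset.
freq_set = set(frequent3)
survivors = [c for c in candidates
             if all(sub in freq_set for sub in combinations(c, 3))]

print(candidates)  # [(1,2,3,4), (1,2,3,5), (1,2,4,5), (1,3,4,5), (2,3,4,5)]
print(survivors)   # [(1,2,3,4), (1,2,3,5)]
```

For example, {1,2,4,5} is generated in part (a) but pruned in part (b) because its subset {1,4,5} is not among the frequent 3-itemsets.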