
Introduction to Data Mining Course Lecturer: Sang Nguyen

Revision
Sample exam questions

1. Name five kinds of graphics/plots that can be used to represent data dispersion characteristics
effectively.

2. For the following group of data: 53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53

(a) What is the mean of the data? What is the median?
(b) What is the mode of the data?
(c) What is the midrange of the data?
(d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
(e) Give the five-number summary of the data.
(f) Show a boxplot of the data.
(g) What is the variance? What is the standard deviation?
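
One way to check hand-computed answers to question 2 is a short script using Python's standard statistics module. This is only a sketch: quartile conventions differ between textbooks, so the "exclusive" method used here is an assumption, and both the population and sample forms of the variance are shown because the question does not say which is intended.

```python
import statistics

# Data from question 2.
data = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
midrange = (min(data) + max(data)) / 2

# Assumption: the "exclusive" quartile convention; other conventions
# (e.g. method="inclusive") can give slightly different Q1 and Q3.
q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
five_number_summary = (min(data), q1, median, q3, max(data))

# Population forms divide by n, sample forms divide by n - 1.
pop_variance = statistics.pvariance(data)
sample_variance = statistics.variance(data)
pop_std = statistics.pstdev(data)
sample_std = statistics.stdev(data)

print(mean, median, mode, midrange)
print(five_number_summary)
print(pop_variance, pop_std, sample_variance, sample_std)
```

The five-number summary gives the values needed to draw the boxplot in part (f) by hand.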

3. For the following vectors x and y, calculate the indicated similarity or distance measures.

(a) x=(1,1,1,1), y=(2,2,2,2). Euclidean, Manhattan, Minkowski (h=3)

(b) x=(0,1,0,1), y=(1,0,1,0). Cosine, Euclidean, Jaccard
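
The measures in question 3 can be checked with a few small helper functions. The function names below are illustrative, not taken from the course material, and the Jaccard helper assumes binary vectors, counting only positions where at least one vector has a 1.

```python
import math

def minkowski(x, y, h):
    """Minkowski distance of order h (h=1 is Manhattan, h=2 is Euclidean)."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)

def cosine_similarity(x, y):
    """Cosine of the angle between x and y (both must be non-zero vectors)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def jaccard_similarity(x, y):
    """Jaccard coefficient for binary vectors: f11 / (f01 + f10 + f11)."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Assumption: return 0.0 when neither vector contains a 1.
    return both / either if either else 0.0

# Part (a)
xa, ya = (1, 1, 1, 1), (2, 2, 2, 2)
print(minkowski(xa, ya, 2), minkowski(xa, ya, 1), minkowski(xa, ya, 3))

# Part (b)
xb, yb = (0, 1, 0, 1), (1, 0, 1, 0)
print(cosine_similarity(xb, yb), minkowski(xb, yb, 2), jaccard_similarity(xb, yb))
```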

4. Both decision-tree induction and associative classification may generate rules for classification.
What are their major differences? Why is it that in many cases associative classification may lead to
better accuracy in prediction?

5. Consider the data set shown in the following:

(a) Estimate the conditional probabilities for P(A|+), P(B|+), P(C|+), P(A|−), P(B|−), and P(C|−).


(b) Use the estimates of the conditional probabilities from part (a) to predict the class
label for a test sample (A=0, B=1, C=0) using the Naïve Bayes approach.

6. Describe two frequent itemset mining methods, namely Apriori and FPGrowth, and discuss their
advantages and disadvantages.

7. Consider the data set shown below.

(a) Compute the support for itemsets {e}, {b, d}, and {b, d, e} by treating each transaction ID as a
market basket.

(b) Use the results in part (a) to compute the confidence for the association rules: {b, d}→{e} and
{e}→{b, d}. Is confidence a symmetric measure?

(c) Repeat part (a) by treating each customer ID as a market basket. Each item should be treated as a
binary variable (1 if an item appears in at least one transaction bought by the customer, and 0
otherwise.)

(d) Use the results in part (c) to compute the confidence for the association rules: {b, d}→{e} and
{e}→{b, d}.

8. Consider the following set of frequent 3-itemsets:

{1,2,3},{1,2,4},{1,2,5},{1,3,4},{1,3,5},{2,3,4},{2,3,5},{3,4,5}.

Assume that there are only five items in the data set.

(a) List all candidate 4-itemsets obtained by the candidate generation procedure in Apriori.

(b) List all candidate 4-itemsets that survive the candidate pruning step of the Apriori algorithm.
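
A small script can be used to check answers to question 8. It is a sketch under one assumption: that "the candidate generation procedure in Apriori" means the usual prefix join, in which two sorted frequent k-itemsets are merged only when their first k-1 items agree. The function names are illustrative.

```python
from itertools import combinations

# Frequent 3-itemsets from question 8, stored as sorted tuples.
frequent_3 = [
    (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4),
    (1, 3, 5), (2, 3, 4), (2, 3, 5), (3, 4, 5),
]

def generate_candidates(frequent):
    """Join pairs of sorted k-itemsets that share their first k-1 items."""
    candidates = []
    for a, b in combinations(sorted(frequent), 2):
        if a[:-1] == b[:-1]:
            candidates.append(a + (b[-1],))
    return candidates

def prune_candidates(candidates, frequent):
    """Keep a candidate only if every subset one item smaller is frequent."""
    frequent_set = set(frequent)
    return [
        c for c in candidates
        if all(s in frequent_set for s in combinations(c, len(c) - 1))
    ]

candidates_4 = generate_candidates(frequent_3)
survivors_4 = prune_candidates(candidates_4, frequent_3)
print(candidates_4)
print(survivors_4)
```

A candidate such as (1, 2, 4, 5) is generated by the join but dropped in pruning, because its subset (1, 4, 5) is not among the frequent 3-itemsets.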

9. Present some clustering algorithms, e.g., DBSCAN.

10. Review advanced classification algorithms.
