Introduction to Data Mining Assignment 2

This document is an assignment on data mining that includes questions about frequent itemset mining using Apriori and FP-growth algorithms, as well as implementation tasks for these algorithms in programming languages like C++ or Java. It also explores association rules, correlation relationships, and various measures of confidence in the context of supermarket transaction data. The assignment requires analysis of algorithm performance and correlation relationships based on given data sets.

Uploaded by

Ayesha Rahim
Copyright
© All Rights Reserved

Introduction to Data Mining

Assignment #2

Q#1: A database has five transactions. Let min-sup = 60% and min-conf = 80%.


TID     Items-bought
T100    {M, O, N, K, E, Y}
T200    {D, O, N, K, E, Y}
T300    {M, A, K, E}
T400    {M, U, C, K, Y}
T500    {C, O, O, K, I, E}
(a) Find all frequent itemsets using Apriori and FP-growth, respectively. Compare the efficiency of
the two mining processes.
(b) List all the strong association rules (with support s and confidence c) matching the following
metarule, where X is a variable representing customers and each item_i denotes a variable
representing an item (e.g., "A," "B"):
∀X ∈ transaction, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3) [s, c]
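A minimal Python sketch of the level-wise Apriori procedure on the five transactions above may help when checking hand-worked results. It is an illustration under stated assumptions, not a reference solution: the names `DB`, `apriori`, and `rules_two_to_one` are hypothetical, transactions are treated as sets (so the repeated "O" in T500 counts once), and support is compared as a fraction of the number of transactions.

```python
from itertools import combinations

# Transactions from Q#1, stored as sets (the repeated "O" in T500 collapses).
DB = {
    "T100": {"M", "O", "N", "K", "E", "Y"},
    "T200": {"D", "O", "N", "K", "E", "Y"},
    "T300": {"M", "A", "K", "E"},
    "T400": {"M", "U", "C", "K", "Y"},
    "T500": {"C", "O", "K", "I", "E"},
}
MIN_SUP = 0.6   # min-sup from the question
MIN_CONF = 0.8  # min-conf from the question


def apriori(transactions, min_sup):
    """Return {frozenset: support_count} for every frequent itemset."""
    n = len(transactions)
    # Level 1 candidates: every distinct item.
    level = [frozenset([i]) for i in sorted(set().union(*transactions))]
    frequent = {}
    k = 1
    while level:
        # Count each candidate with one pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: cnt for c, cnt in counts.items() if cnt / n >= min_sup}
        frequent.update(current)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        level = [c for c in joined
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent


def rules_two_to_one(frequent, n, min_conf):
    """Rules of the form {item1, item2} => item3 that meet min_conf."""
    rules = []
    for itemset, cnt in frequent.items():
        if len(itemset) != 3:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            conf = cnt / frequent[antecedent]
            if conf >= min_conf:
                rules.append((tuple(sorted(antecedent)), consequent, cnt / n, conf))
    return sorted(rules)


frequent = apriori(list(DB.values()), MIN_SUP)
rules = rules_two_to_one(frequent, len(DB), MIN_CONF)
for antecedent, consequent, s, c in rules:
    print(f"{antecedent} => {consequent}  [s={s:.0%}, c={c:.0%}]")
```

The sketch rescans the whole database at every level, which is the cost that FP-growth avoids by compressing the transactions into a tree first.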
Q#2: (Implementation project) Using a programming language that you are familiar with, such
as C++ or Java, implement three frequent itemset mining algorithms introduced in this chapter:
(1) Apriori [AS94b], (2) FP-growth [HPY00], and (3) Eclat [Zak00] (mining using the
vertical data format). Compare the performance of each algorithm with various kinds of large
data sets. Write a report to analyze the situations (e.g., data size, data distribution, minimal
support threshold setting, and pattern density) where one algorithm may perform better than the
others, and explain why.
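As a starting point for the vertical-format part of the project, the following is a minimal Python sketch of an Eclat-style search. It is an assumption-laden illustration rather than the published algorithm in full: the names `to_vertical` and `eclat` are hypothetical, the sample data reuses the Q#1 transactions, and the minimum support is passed as an absolute count (3 of 5 transactions for 60%).

```python
from collections import defaultdict

# Sample data: the five Q#1 transactions, as sets.
DB = {
    "T100": {"M", "O", "N", "K", "E", "Y"},
    "T200": {"D", "O", "N", "K", "E", "Y"},
    "T300": {"M", "A", "K", "E"},
    "T400": {"M", "U", "C", "K", "Y"},
    "T500": {"C", "O", "K", "I", "E"},
}


def to_vertical(db):
    """Convert {tid: items} into the vertical format {item: set of tids}."""
    vertical = defaultdict(set)
    for tid, items in db.items():
        for item in items:
            vertical[item].add(tid)
    return vertical


def eclat(prefix, items, min_count, out):
    """Depth-first search; `items` is a list of (item, tidset) pairs,
    each already known to be frequent together with `prefix`."""
    for i, (item, tids) in enumerate(items):
        new_prefix = prefix | {item}
        out[new_prefix] = len(tids)  # support count = size of the tidset
        suffix = []
        for other, other_tids in items[i + 1:]:
            common = tids & other_tids  # tidset intersection replaces a DB scan
            if len(common) >= min_count:
                suffix.append((other, common))
        eclat(new_prefix, suffix, min_count, out)


MIN_COUNT = 3  # 60% of 5 transactions
vertical = to_vertical(DB)
start = sorted((i, t) for i, t in vertical.items() if len(t) >= MIN_COUNT)
found = {}
eclat(frozenset(), start, MIN_COUNT, found)
print(len(found), "frequent itemsets")
```

Because support is read off tidset sizes, the database is scanned only once, when the vertical layout is built; the trade-off is the memory held by the intermediate tidsets, which is one of the factors worth measuring in the report.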
Q#3: Give a short example to show that items in a strong association rule actually may be
negatively correlated.
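One way to check a candidate example is to compute the rule's lift alongside its support and confidence; a lift below 1 indicates negative correlation. The sketch below is only an illustration: the function name `rule_stats`, the item labels A and B, all four counts, and both thresholds are hypothetical and are not taken from the assignment.

```python
def rule_stats(n_total, n_a, n_b, n_ab):
    """Support, confidence, and lift of the rule A => B from raw counts."""
    support = n_ab / n_total
    confidence = n_ab / n_a
    lift = confidence / (n_b / n_total)  # P(B|A) / P(B)
    return support, confidence, lift


# Hypothetical counts: 100 transactions, 60 contain A, 75 contain B, 40 contain both.
support, confidence, lift = rule_stats(n_total=100, n_a=60, n_b=75, n_ab=40)

# Hypothetical thresholds, chosen only for this illustration.
MIN_SUP, MIN_CONF = 0.30, 0.60
is_strong = support >= MIN_SUP and confidence >= MIN_CONF
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f} strong={is_strong}")
```

With these made-up counts the rule passes both thresholds, yet its confidence is lower than the overall frequency of B, so seeing A makes B less likely than it is in general.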
Q#4: The following contingency table summarizes supermarket transaction data, where "hot dogs"
refers to the transactions containing hot dogs, "¬hot dogs" refers to the transactions that do not
contain hot dogs, "hamburgers" refers to the transactions containing hamburgers, and "¬hamburgers"
refers to the transactions that do not contain hamburgers.

(a) Suppose that the association rule “hot dogs ⇒ hamburgers” is mined. Given a minimum
support threshold of 25% and a minimum confidence threshold of 50%, is this association rule
strong?
(b) Based on the given data, is the purchase of hot dogs independent of the purchase of
hamburgers? If not, what kind of correlation relationship exists between the two?
(c) Compare the use of the all_confidence, max_confidence, Kulczynski, and cosine measures
with lift and correlation on the given data.
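The measures named in part (c) can all be computed from the four cells of a 2×2 contingency table. The Python sketch below shows one way to do so, taking "correlation" to mean the χ² statistic. The function name `measures` and the cell counts are hypothetical: the counts are placeholders chosen for illustration and are not the assignment's table, which is not reproduced in this text.

```python
from math import sqrt


def measures(n_ab, n_a_only, n_b_only, n_neither):
    """Pattern-evaluation measures for items A and B from 2x2 cell counts."""
    n = n_ab + n_a_only + n_b_only + n_neither
    n_a = n_ab + n_a_only          # transactions containing A
    n_b = n_ab + n_b_only          # transactions containing B
    conf_a_b = n_ab / n_a          # P(B|A)
    conf_b_a = n_ab / n_b          # P(A|B)

    # Chi-square: sum of (observed - expected)^2 / expected over the four cells.
    observed = [n_ab, n_a_only, n_b_only, n_neither]
    expected = [n_a * n_b / n, n_a * (n - n_b) / n,
                (n - n_a) * n_b / n, (n - n_a) * (n - n_b) / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    return {
        "lift": (n_ab / n) / ((n_a / n) * (n_b / n)),
        "chi2": chi2,
        "all_confidence": min(conf_a_b, conf_b_a),
        "max_confidence": max(conf_a_b, conf_b_a),
        "kulczynski": 0.5 * (conf_a_b + conf_b_a),
        "cosine": n_ab / sqrt(n_a * n_b),
    }


# Hypothetical placeholder counts (NOT the assignment's table).
result = measures(n_ab=40, n_a_only=20, n_b_only=35, n_neither=5)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Note that `n_neither` enters only the lift and χ² computations; the other four measures depend solely on the two conditional probabilities, so they are unaffected by the number of transactions that contain neither item.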
