Lecture 7
• We have seen frequent itemset mining using
– Apriori Algorithm
– FP-Growth
• We will continue the discussion …
The itemset lattice
[Figure: the lattice of all itemsets over the item universe.]

Illustration of the Apriori principle
[Figure: an itemset found to be frequent; all of its subsets are therefore frequent.]

Illustration of the Apriori principle
[Figure: an itemset found to be infrequent; all of its supersets are infrequent and are pruned.]
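The pruning in the second figure is exactly what Apriori's candidate-generation step exploits. Below is a minimal Python sketch of that step (not the lecture's code; the transaction format and function names are illustrative): a size-k candidate is kept only if all of its (k−1)-subsets were found frequent at the previous level.

```python
from itertools import combinations

def prune_candidates(candidates, prev_frequent):
    """Apriori-principle pruning: drop any k-itemset that has an
    infrequent (k-1)-subset, since no superset of an infrequent
    itemset can be frequent."""
    pruned = []
    for cand in candidates:          # each candidate is a frozenset
        k = len(cand)
        if all(frozenset(sub) in prev_frequent
               for sub in combinations(cand, k - 1)):
            pruned.append(cand)
    return pruned

# {a,b} and {a,c} are frequent but {b,c} is not, so {a,b,c} is
# pruned without ever counting its support.
prev = {frozenset('ab'), frozenset('ac')}
print(prune_candidates([frozenset('abc')], prev))   # -> []
```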
• We use σ(X) to denote the frequency count of the itemset X (we also call this the support count of the itemset).
• We use s to denote the support (as a fraction) of an association rule.
• Similarly, c denotes the confidence of an association rule.
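To make the notation concrete, here is a small sketch on toy data of my own (not from the lecture): σ(X) is the number of transactions containing X, and for a rule X → Y, s = σ(X ∪ Y)/N and c = σ(X ∪ Y)/σ(X).

```python
transactions = [
    {'bread', 'milk'},
    {'bread', 'diapers', 'beer', 'eggs'},
    {'milk', 'diapers', 'beer', 'cola'},
    {'bread', 'milk', 'diapers', 'beer'},
    {'bread', 'milk', 'diapers', 'cola'},
]
N = len(transactions)

def sigma(itemset):
    """Support count: number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

X, Y = {'milk', 'diapers'}, {'beer'}
s = sigma(X | Y) / N         # support of the rule X -> Y
c = sigma(X | Y) / sigma(X)  # confidence of the rule X -> Y
print(s, c)                  # 0.4, 0.666...
```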
Maximal frequent itemsets
• A frequent itemset is maximal if no superset of it is frequent.
• The maximal frequent itemsets form the positive border: they sit just inside the border separating the frequent itemsets from the infrequent ones in the lattice.
[Figure: itemset lattice with the frequent/infrequent border marked; the maximal itemsets lie on the positive border.]
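A minimal sketch of the definition, assuming the frequent itemsets are given as a collection of frozensets (the input format is my choice, not the lecture's): an itemset is maximal iff no other frequent itemset is a proper superset of it.

```python
def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent
    proper superset."""
    frequent = set(frequent)
    return [x for x in frequent
            if not any(x < y for y in frequent)]  # x < y: proper subset

freq = [frozenset(s) for s in ('a', 'b', 'c', 'd', 'ab', 'ac', 'bd', 'abc')]
print(sorted(map(sorted, maximal_itemsets(freq))))
# [['a', 'b', 'c'], ['b', 'd']] -- every other frequent itemset
# is a subset of one of these two
```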
What is this?
• Maximal frequent itemsets effectively provide a compact representation of the frequent itemsets.
• That is, they form the smallest set of itemsets from which all frequent itemsets can be derived: every frequent itemset is a subset of some maximal frequent itemset (see the sketch below).

Where is it useful?
• It is very useful when very long frequent itemsets are present.
• A frequent itemset of length k has 2^k − 1 non-empty frequent subsets, so exponentially many frequent itemsets can be present!
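To see the compactness claim in code: given only the maximal frequent itemsets, every frequent itemset can be regenerated as a non-empty subset of one of them. A sketch (note that the support counts of the subsets are not recoverable this way; that is the price of the compression):

```python
from itertools import combinations

def frequent_from_maximal(maximal):
    """Enumerate all non-empty subsets of the maximal itemsets;
    together these are exactly the frequent itemsets (their
    individual support counts are NOT recovered)."""
    out = set()
    for m in maximal:
        items = sorted(m)
        for r in range(1, len(items) + 1):
            out.update(frozenset(c) for c in combinations(items, r))
    return out

maximal = [frozenset('abc'), frozenset('bd')]
print(len(frequent_from_maximal(maximal)))   # 9 frequent itemsets
```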
When is it useful?
• Efficient algorithms exist that explicitly find the maximal frequent itemsets without having to enumerate all of their subsets.
• Is FP-growth one such?
• There are other methods in the literature that work on the lattice of itemsets:
– Top-down (over the lattice)
– Bottom-up
– BFS
– DFS
Closed itemset
• An itemset X is closed iff none of its immediate supersets has exactly the same count (support count) as X.

Closed itemset
• Equivalently, an itemset X is closed iff every one of its immediate supersets has a strictly smaller support count than X (a superset's count can never be larger).
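A minimal sketch of this test, assuming we are given a dict mapping each itemset (as a frozenset) to its support count, covering at least all itemsets with non-zero support, plus the item universe; both assumptions are mine, not the lecture's:

```python
def closed_itemsets(counts, items):
    """counts: dict frozenset -> support count; items: item universe.
    X is closed iff every immediate superset X | {i} has a strictly
    smaller count."""
    closed = []
    for x, cnt in counts.items():
        supersets = (x | {i} for i in items if i not in x)
        if all(counts.get(s, 0) < cnt for s in supersets):
            closed.append(x)
    return closed

# Transactions {a,b}, {a,b}, {a,b,c}, {b}: {a} (count 3) is not
# closed because {a,b} also has count 3; {b}, {a,b}, {a,b,c} are closed.
counts = {frozenset('a'): 3, frozenset('b'): 4, frozenset('c'): 1,
          frozenset('ab'): 3, frozenset('ac'): 1, frozenset('bc'): 1,
          frozenset('abc'): 1}
print(sorted(map(sorted, closed_itemsets(counts, 'abc'))))
# [['a', 'b'], ['a', 'b', 'c'], ['b']]
```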
[Figure, repeated across three slides: the itemset lattice annotated with transactions; the shaded itemsets are not supported by any transaction.]
Closed and maximal
[Figure: example lattice marking the closed and maximal frequent itemsets; # closed = 9, # maximal = 4.]
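Combining the two sketches above on the same toy transactions (my own data, not the lecture's example, so the counts differ from the 9 and 4 in the figure) also illustrates a standard fact: every maximal frequent itemset is closed.

```python
from itertools import combinations

transactions = [{'a', 'b'}, {'a', 'b'}, {'a', 'b', 'c'}, {'b'}]
items = {'a', 'b', 'c'}

# Support counts for every non-empty itemset over the universe.
counts = {}
for r in range(1, len(items) + 1):
    for c in combinations(sorted(items), r):
        counts[frozenset(c)] = sum(1 for t in transactions if set(c) <= t)

minsup = 1
frequent = {x for x, n in counts.items() if n >= minsup}
closed = {x for x in closed_itemsets(counts, items) if x in frequent}
maximal = set(maximal_itemsets(frequent))
print(len(closed), len(maximal))   # 3 closed, 1 maximal
assert maximal <= closed           # maximal frequent itemsets are closed
```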