Lecture 7

Frequent Itemsets, Association Rules, some concise
representations, other interestingness measures

Frequent itemset mining …
• We have seen frequent itemset mining using
– Apriori Algorithm
– FP-Growth
• We will continue the discussion …
The itemset lattice

Given d items, there are 2^d possible itemsets.
Too expensive to test them all!
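The 2^d count can be checked directly by enumerating the lattice for a small d. A minimal sketch (the item names and function name are illustrative):

```python
from itertools import combinations

def all_itemsets(items):
    """Enumerate every subset of `items` (the full itemset lattice),
    including the empty set."""
    subsets = []
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            subsets.append(frozenset(combo))
    return subsets

lattice = all_itemsets(["A", "B", "C", "D", "E"])
print(len(lattice))  # 2^5 = 32 itemsets -- infeasible to test for large d
```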
The Apriori Principle
• Apriori principle (main observation):
– If an itemset is frequent, then all of its subsets must also be
frequent.
– If an itemset is not frequent, then none of its supersets can
be frequent.
– The support of an itemset never exceeds the support of its
subsets.
– This is known as the anti-monotone property of support.
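The anti-monotone property is what lets Apriori prune the lattice. A minimal sketch of just the pruning step (not the full algorithm; the function names are my own):

```python
from itertools import combinations

def prune_candidates(candidates, frequent_prev):
    """Apriori pruning: a size-k candidate survives only if every one of
    its (k-1)-subsets was found frequent at the previous level."""
    kept = []
    for cand in candidates:
        k = len(cand)
        subsets = [frozenset(s) for s in combinations(cand, k - 1)]
        if all(s in frequent_prev for s in subsets):
            kept.append(cand)
    return kept

# {B, C} was not frequent, so {A, B, C} is pruned without ever counting it.
frequent_2 = {frozenset("AB"), frozenset("AC")}
candidates_3 = [frozenset("ABC")]
print(prune_candidates(candidates_3, frequent_2))  # []
```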
Illustration of the Apriori principle

[Figure: itemset lattice; an itemset found to be frequent, and its
subsets, which are all frequent.]

Illustration of the Apriori principle

[Figure: itemset lattice; an itemset found to be infrequent, and its
infrequent supersets, which are pruned.]
• We use σ to mean the frequency count of an itemset
(we also call this the support count of the itemset).
• We use s to mean the support (as a fraction) of an
association rule.
• Similarly, c means the confidence of an association rule.
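With a toy transaction set, the three quantities can be computed directly. A sketch (the transactions and the rule are made up for illustration):

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def sigma(itemset):
    """Support count: number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

# Rule {bread} -> {milk}
X, Y = {"bread"}, {"milk"}
s = sigma(X | Y) / len(transactions)   # support of the rule
c = sigma(X | Y) / sigma(X)            # confidence of the rule
print(sigma(X | Y), s, c)  # sigma = 2, s = 0.5, c = 2/3
```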

Compact Representation of Frequent Itemsets
• In practice, the number of frequent itemsets
produced from a transaction data set can be very
large.
• It is useful to identify a small representative set of
itemsets from which all other frequent itemsets
can be derived.
• Two such representations:
• Maximal frequent itemsets
• Closed frequent itemsets

Compact Representation of Frequent Itemsets
• Some itemsets are redundant because they have the same
support as their supersets.
• The number of frequent itemsets can be very large.
• We therefore need a compact representation.

Maximal Frequent Itemset
• A maximal frequent itemset is defined as a
frequent itemset for which none of its immediate
supersets is frequent.
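Given the set of frequent itemsets, the maximal ones can be picked out with a direct check. A brute-force sketch (the itemsets are illustrative):

```python
def maximal_frequent(frequent):
    """Keep only the frequent itemsets with no frequent proper superset."""
    frequent = set(frequent)
    return {x for x in frequent
            if not any(x < y for y in frequent)}  # x < y: proper subset

freq = {frozenset("A"), frozenset("B"), frozenset("C"),
        frozenset("AB"), frozenset("AC")}
print(maximal_frequent(freq))  # {A,B} and {A,C}: neither has a frequent superset
```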

Maximal Frequent Itemset

An itemset is maximal frequent if none of its immediate supersets is
frequent.

[Figure: itemset lattice showing the border between frequent and
infrequent itemsets; the maximal itemsets form the positive border.
Maximal: no superset is frequent.]

What is this?
• Maximal frequent itemsets effectively provide a
compact representation of frequent itemsets.
• That is, they form the smallest set of itemsets
from which all frequent itemsets can be derived.
• An itemset and its immediate superset cannot
both be maximal frequent at the same time.
• All subsets of a maximal frequent itemset are frequent.

Where is it useful?
• It is very useful when very long frequent itemsets
are present.
• In that case, exponentially many frequent itemsets are present!

When is it useful?
• When efficient algorithms exist to explicitly find
the maximal frequent itemsets without having to
enumerate all their subsets.
• Is FP-growth one such algorithm?
• There are other methods in the literature that work
on the lattice of itemsets:
• Top-down (over the lattice)
• Bottom-up
• BFS
• DFS

What is the limitation of this?
• Maximal frequent itemsets do not contain the
support information of their subsets.
• An additional pass over the data set is therefore
needed to determine the support counts of the
non-maximal frequent itemsets.
• A minimal representation of frequent itemsets that
preserves the support information would be very
useful:
• Closed frequent itemsets

Closed itemset
• An itemset X is closed iff none of its immediate
supersets has exactly the same count (support
count) as X.
• How is this used?
• If an itemset Y is not closed, then support(Y) equals the
maximum support among Y's immediate supersets.

Closed itemset
• Equivalently: an itemset X is closed iff every one of its
immediate supersets has a strictly smaller count
(support count) than X.
• How is this used?
• If an itemset Y is not closed, then support(Y) equals the
maximum support among Y's immediate supersets.
• We maintain the support counts of only the closed
itemsets.
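The definition translates directly into a check against support counts. A sketch over a small transaction set (all names and data are illustrative):

```python
transactions = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}]
items = {"A", "B", "C"}

def sigma(itemset):
    """Support count of an itemset over the toy transactions."""
    return sum(1 for t in transactions if itemset <= t)

def is_closed(itemset):
    """Closed iff every immediate superset has strictly smaller support."""
    return all(sigma(itemset | {i}) < sigma(itemset)
               for i in items - itemset)

print(is_closed(frozenset("A")))  # True: sigma(A)=3 > sigma(AB)=2, sigma(AC)=2
print(is_closed(frozenset("B")))  # False: sigma(B)=2 == sigma(AB)=2
```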

• A closed itemset’s immediate superset can also
be closed.

How are closed itemsets useful?
• Let us say that an itemset X is not closed.
• That means we do not maintain X’s support
count.
• We search X’s supersets for the closed one with the
maximum support count. Let us say it is Y, with support
count s.
• Then X’s support count is s.
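The lookup described above can be sketched as follows: keep only the closed itemsets with their counts, and recover any other itemset's count as the maximum count over its stored supersets (the stored counts here are hypothetical):

```python
# Stored: closed itemsets with their support counts (hypothetical data).
closed_counts = {
    frozenset("A"): 4,
    frozenset("AB"): 3,
    frozenset("AC"): 2,
    frozenset("ABC"): 2,
}

def support(itemset):
    """Support of any itemset = max support among its closed supersets."""
    return max(cnt for cs, cnt in closed_counts.items() if itemset <= cs)

print(support(frozenset("C")))   # 2, inherited from the closed superset {A,C}
print(support(frozenset("AB")))  # 3, stored directly
```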

• What if X has two immediate supersets, say Y
and Z, which are closed and have different
support counts?
• There is no ambiguity: X’s support count equals the
maximum support count among all of its immediate
supersets.
• If every immediate superset had a count smaller than
X’s, then X would be closed, contradicting the premise
that X is not closed. So the superset with the maximum
count has exactly X’s count.

Closed frequent itemset
• An itemset that is both closed and frequent.
• We store the closed frequent itemsets along with their
support counts.
• From these we can recover all frequent itemsets along with
their support counts.
• The number of closed frequent itemsets can be larger
than the number of maximal frequent itemsets
(every maximal frequent itemset is closed, but not vice versa).
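Recovering the full set of frequent itemsets from the closed frequent ones can be sketched as: enumerate the subsets of each closed itemset and give each the maximum count among its closed supersets (hypothetical counts again):

```python
from itertools import combinations

closed_frequent = {
    frozenset("AB"): 3,
    frozenset("AC"): 2,
}

def all_frequent():
    """Expand closed frequent itemsets into all frequent itemsets,
    each with its support count."""
    result = {}
    for cs, cnt in closed_frequent.items():
        for k in range(1, len(cs) + 1):
            for sub in combinations(cs, k):
                sub = frozenset(sub)
                # An itemset's count is the max over its closed supersets.
                result[sub] = max(result.get(sub, 0), cnt)
    return result

print(all_frequent())  # {A}:3, {B}:3, {A,B}:3, {C}:2, {A,C}:2
```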

Can you find Closed Itemsets?

[Figure: itemset lattice labelled with transaction ids; the itemsets at
the bottom are not supported by any transactions.]

Can you find Closed Itemsets?

[Figure: the same lattice; looking at one level, C, D, and E are
closed.]

Can you find Closed and Maximal Frequent Itemsets?

[Figure: the same lattice labelled with transaction ids.]

Maximal vs Closed Frequent Itemsets

[Figure: itemset lattice with minimum support = 2; some itemsets are
closed but not maximal, others are both closed and maximal.
# Closed = 9, # Maximal = 4.]

Maximal vs Closed Itemsets

[Figure comparing maximal and closed itemsets.]
