The Concept of Maximal Frequent Itemsets

The document discusses algorithms for efficiently mining maximal frequent itemsets from transactional databases. It introduces Max-Miner, which identifies long frequent itemsets to prune subsets. MAFIA integrates depth-first search with pruning to mine maximal frequent itemsets. GenMax uses a backtracking search and techniques like superset checking to efficiently mine maximal patterns. The document analyzes different types of databases and distributions of maximal pattern lengths.

The Concept of Maximal Frequent Itemsets
NCU CSIE Database Laboratory
Kuo-Yu Huang
2002-04-15

Outline

Introduction
Max-Miner
MAFIA
GenMax
Conclusion

Introduction(1/2)
Interesting datasets with long patterns
Questionnaire results
Transaction databases
Contain many frequently occurring items
Have a long average record length

Apriori-like algorithms are inadequate for such data
They enumerate every single frequent itemset
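
To see why enumerating every frequent itemset is costly for long patterns: every nonempty subset of a frequent itemset is itself frequent, so a single frequent pattern of length k implies 2^k - 1 frequent itemsets. The following is a small illustrative sketch; the function names are hypothetical and not taken from the slides.

```python
from itertools import combinations


def nonempty_subsets(itemset):
    """Yield every nonempty subset of `itemset` as a frozenset."""
    items = sorted(itemset)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def implied_frequent_count(pattern_length):
    """Number of frequent itemsets implied by one frequent pattern of this length."""
    return 2 ** pattern_length - 1


if __name__ == "__main__":
    pattern = frozenset("abcde")
    print(len(list(nonempty_subsets(pattern))))  # 31 subsets for a length-5 pattern
    print(implied_frequent_count(40))            # grows exponentially with length
```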

Introduction(2/2)
Maximal Frequent Itemsets
A frequent itemset is maximal if it has no superset that is frequent.
e.g.
Items: a, b, c, d, e
Frequent itemset: {a, b, c}
{a, b, c, d}, {a, b, c, e}, {a, b, c, d, e} are not frequent itemsets.
Maximal frequent itemset: {a, b, c}
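
The definition can be checked by brute force on a toy database. This is only a sketch: the transactions and the absolute support threshold `min_count` below are made-up values chosen to reproduce the example above, and the exhaustive enumeration is exactly what the algorithms in this talk try to avoid.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute force: test every nonempty itemset against the support threshold."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(candidate <= t for t in transactions)
            if count >= min_count:
                frequent.append(candidate)
    return frequent


def maximal_itemsets(frequent):
    """Keep only the itemsets that have no proper superset in `frequent`."""
    return [f for f in frequent if not any(f < other for other in frequent)]


if __name__ == "__main__":
    # Hypothetical transactions over items a..e.
    transactions = [frozenset("abc"), frozenset("abc"), frozenset("abcd"),
                    frozenset("abce"), frozenset("de")]
    frequent = frequent_itemsets(transactions, min_count=3)
    print(maximal_itemsets(frequent))  # only {a, b, c} survives
```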

Max-Miner(1/4)
Efficiently mining long patterns from databases
R. J. Bayardo
ACM SIGMOD '98

Max-Miner
Abandons a strictly bottom-up traversal
Attempts to look ahead
Once a long frequent itemset is identified, all of its subsets can be pruned.

Max-Miner(2/4)
Set-enumeration tree
Breadth-first search
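
A set-enumeration tree over ordered items can be walked level by level, as in the sketch below. It only illustrates the traversal order (no support counting or pruning), and the function name is an assumption made for this example.

```python
from collections import deque


def set_enumeration_bfs(items):
    """Visit every node of the set-enumeration tree in breadth-first order.

    Each queue entry is (itemset, index of the first item still allowed),
    so a child only appends items that come later in the ordering.
    """
    order = list(items)
    queue = deque([((), 0)])
    visited = []
    while queue:
        head, start = queue.popleft()
        visited.append(head)
        for pos in range(start, len(order)):
            queue.append((head + (order[pos],), pos + 1))
    return visited


if __name__ == "__main__":
    for node in set_enumeration_bfs([1, 2, 3, 4]):
        print(node)
```

Each itemset appears exactly once, and all itemsets of length k are visited before any itemset of length k + 1.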

Max-Miner(3/4)
Candidate group g
Head: h(g)
The itemset enumerated by the node.

Tail: t(g)
An ordered set that contains all items not in h(g) that can potentially appear in any sub-node.

e.g. node {1}
h(g): {1}
t(g): {2, 3, 4}
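
A candidate group can be modelled as a head plus an ordered tail, with sub-nodes obtained by moving one tail item into the head and keeping only the items that follow it. The class and function names below are hypothetical, chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateGroup:
    head: tuple  # h(g): the itemset enumerated by the node
    tail: tuple  # t(g): ordered items that may still be appended


def sub_nodes(group):
    """Generate the child groups of `group` in tail order."""
    return [
        CandidateGroup(group.head + (item,), group.tail[pos + 1:])
        for pos, item in enumerate(group.tail)
    ]


if __name__ == "__main__":
    node = CandidateGroup(head=(1,), tail=(2, 3, 4))
    for child in sub_nodes(node):
        print(child)
```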

Max-Miner(4/4)
Support counting
For each candidate group g, count the support of h(g), h(g) ∪ t(g), and h(g) ∪ {i} for all i ∈ t(g).
If h(g) ∪ t(g) is frequent, then any itemset enumerated by a sub-node will also be frequent but not maximal.
If h(g) ∪ {i} is infrequent, then any head of a sub-node that contains item i will also be infrequent.
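
The two rules above can be sketched as a single expansion step for one candidate group. This is a simplified illustration, not Bayardo's implementation: support is recounted naively from the transactions, `min_count` is an assumed absolute threshold, and the function names are hypothetical.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions)


def expand_group(head, tail, transactions, min_count):
    """Apply the look-ahead and tail-pruning rules to one candidate group.

    Returns (long_itemset, sub_groups):
    - if h(g) ∪ t(g) is frequent, it is returned and no sub-groups are needed;
    - otherwise items i with infrequent h(g) ∪ {i} are dropped from the tail
      before the sub-groups are generated.
    """
    head = frozenset(head)
    head_union_tail = head | frozenset(tail)
    if support(head_union_tail, transactions) >= min_count:
        return head_union_tail, []
    kept = [i for i in tail if support(head | {i}, transactions) >= min_count]
    groups = [(head | {item}, kept[pos + 1:]) for pos, item in enumerate(kept)]
    return None, groups


if __name__ == "__main__":
    # Hypothetical toy database.
    transactions = [frozenset({1, 2, 3}), frozenset({1, 2, 3}),
                    frozenset({1, 2, 4}), frozenset({1, 3})]
    print(expand_group({1}, [2, 3, 4], transactions, min_count=2))
    print(expand_group({1, 2}, [3], transactions, min_count=2))
```

In the first call item 4 is removed from the tail because {1, 4} is infrequent; in the second call the look-ahead succeeds and {1, 2, 3} is reported without expanding further.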

MAFIA(1/4)
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases.
D. Burdick, M. Calimlim, and J. Gehrke.
ICDE '01

MAFIA
Integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms

MAFIA(2/4)

MAFIA(3/4)
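HUTMFI
Check whether the head union tail (HUT) of the node already has a superset in the MFI
If so, stop searching and return

PEP (parent equivalence pruning)
newNode = C ∪ {i}
Check newNode.support == C.support
If so, move i from C.tail to C.head

FHUT (frequent head union tail)
newNode = C ∪ {i}
Note whether i is the leftmost child in the tail; if the leftmost path shows that the HUT is frequent, the rest of the subtree can be pruned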
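
A depth-first search that combines the HUTMFI and PEP checks might look like the sketch below. It is a simplified illustration rather than the MAFIA implementation: FHUT is omitted, support is recounted naively instead of using MAFIA's vertical bitmaps, and the function names and `min_count` threshold are assumptions.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions)


def mafia_style_dfs(head, tail, transactions, min_count, mfi):
    """Depth-first search for maximal frequent itemsets; results go into `mfi`.

    `head` is assumed to be frequent and `tail` to follow a fixed item order.
    """
    head = frozenset(head)
    # HUTMFI: if head ∪ tail already has a superset in the MFI, prune the subtree.
    hut = head | frozenset(tail)
    if any(hut <= m for m in mfi):
        return
    head_support = support(head, transactions)
    remaining = []
    for i in tail:
        s = support(head | {i}, transactions)
        if s == head_support:
            head = head | {i}       # PEP: move i from the tail into the head
        elif s >= min_count:
            remaining.append(i)     # frequent extension, explore it below
    for pos, i in enumerate(remaining):
        mafia_style_dfs(head | {i}, remaining[pos + 1:], transactions, min_count, mfi)
    # After the subtree is done, head is maximal only if nothing in the MFI covers it.
    if not any(head <= m for m in mfi):
        mfi.append(head)


if __name__ == "__main__":
    # Hypothetical toy database.
    transactions = [frozenset("abc"), frozenset("abc"), frozenset("abd"),
                    frozenset("abd"), frozenset("cd")]
    mfi = []
    mafia_style_dfs((), ["a", "b", "c", "d"], transactions, 2, mfi)
    print(mfi)  # {a, b, c} and {a, b, d}
```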

MAFIA(4/4)

GenMax(1/2)
Efficiently Mining Maximal Frequent Itemsets
Karam Gouda and Mohammed J. Zaki.
ICDM '01

GenMax
A backtrack-search-based algorithm for mining maximal frequent itemsets.
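
The backtrack search can be sketched as follows, using the usual GenMax vocabulary of a current itemset, a combine set, and a possible set. This is a minimal illustration under stated assumptions: support is recounted from the transactions, the superset check is a linear scan over the MFI found so far, and the function names are hypothetical.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions)


def genmax_backtrack(itemset, combine, transactions, min_count, mfi):
    """Extend `itemset` with items from the ordered `combine` set."""
    for pos, x in enumerate(combine):
        extended = itemset | {x}
        possible = combine[pos + 1:]
        # If extended ∪ possible is already covered, so is every later branch.
        if any((extended | frozenset(possible)) <= m for m in mfi):
            return
        new_combine = [y for y in possible
                       if support(extended | {y}, transactions) >= min_count]
        if not new_combine:
            # Leaf: no frequent extension left, keep it unless it is covered.
            if not any(extended <= m for m in mfi):
                mfi.append(extended)
        else:
            genmax_backtrack(extended, new_combine, transactions, min_count, mfi)


if __name__ == "__main__":
    # Hypothetical toy database.
    transactions = [frozenset("abc"), frozenset("abc"), frozenset("abd"),
                    frozenset("abd"), frozenset("cd")]
    items = [i for i in "abcd" if support(frozenset(i), transactions) >= 2]
    mfi = []
    genmax_backtrack(frozenset(), items, transactions, 2, mfi)
    print(mfi)  # {a, b, c} and {a, b, d}
```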

GenMax(2/2)
Superset checking techniques
Do superset check only for I_{l+1} ∪ P_{l+1}
Using check_status flag
Local maximal frequent itemsets

Reordering the combine set

Diffsets propagation
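
Diffset propagation can be illustrated with plain Python sets. For a prefix P and items x, y, the diffset of Pxy is d(Pxy) = d(Py) - d(Px), and its support is sup(Pxy) = sup(Px) - |d(Pxy)|. The transaction ids and helper name below are made up for this sketch.

```python
def extend_diffset(diff_px, sup_px, diff_py):
    """Return (d(Pxy), sup(Pxy)) from d(Px), sup(Px) and d(Py)."""
    diff_pxy = diff_py - diff_px
    return diff_pxy, sup_px - len(diff_pxy)


if __name__ == "__main__":
    # Hypothetical tidsets for transactions 1..5:
    # 1: abc, 2: abc, 3: abd, 4: abd, 5: cd
    tids = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 4}, "c": {1, 2, 5}, "d": {3, 4, 5}}

    # First level under prefix a: d(ax) = t(a) - t(x), sup(ax) = sup(a) - |d(ax)|.
    sup_a = len(tids["a"])
    diff_ab = tids["a"] - tids["b"]
    diff_ac = tids["a"] - tids["c"]
    sup_ab = sup_a - len(diff_ab)

    # Next level only needs the diffsets, not the full tidsets.
    diff_abc, sup_abc = extend_diffset(diff_ab, sup_ab, diff_ac)
    print(diff_abc, sup_abc)  # {3, 4} 2
```

The appeal of diffsets is that they tend to shrink as itemsets grow on dense data, so deeper levels of the search touch less data than tidset intersections would.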

Conclusion(1/4)
Type I:
Normal MFI distribution with not too long maximal patterns.

Type II:
Left-skewed distribution with longer patterns.

Type III:
Exponential decay distribution with short maximal patterns.

Type      Database      # of items  Average length  # of records  Maximal pattern length (min. support)
Type I    Chess         76          37              3196          23 (20%)
Type I    Pumsb         7117        74              49046         27 (40%)
Type II   Connect       130         43              67557         31 (2.5%)
Type II   Pumsb*        7117        50              49046         43 (2.5%)
Type III  T10I4D100K    1000        10              100,000       13 (0.01%)
Type III  T40I10D100K   1000        40              100,000       25 (0.1%)

Conclusion(2/4)

Conclusion(3/4)

Conclusion(4/4)
