Lecture 13 14 FP
ALGORITHM
Bottleneck of Frequent-pattern Mining
Multiple database scans are costly
Mining long patterns needs many passes of
scanning and generates lots of candidates
To find the frequent itemset $i_1 i_2 \dots i_{100}$:
# of scans: 100
# of candidates: $\binom{100}{1} + \binom{100}{2} + \dots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30}$!
Bottleneck: candidate-generation-and-test
Can we avoid candidate generation?
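The candidate count on this slide can be checked directly. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the lecture.

```python
from math import comb

n = 100  # length of the long frequent itemset in the slide's example

# Candidate-generation-and-test has to consider every non-empty subset
# of the n items: C(n,1) + C(n,2) + ... + C(n,n).
candidates = sum(comb(n, k) for k in range(1, n + 1))

assert candidates == 2**n - 1
print(f"{candidates:.2e}")  # prints 1.27e+30
```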
[Figure: FP-tree construction; each node is labeled item:count, e.g. c:2, a:2, m:1, p:1, b:1.]
Completeness:
never breaks a long pattern of any transaction
preserves complete information for frequent pattern mining
Compactness:
reduces irrelevant information: infrequent items are gone
frequency-descending ordering: more frequent items are
more likely to be shared
never larger than the original database (not counting
node-links and counts)
Example: for the Connect-4 DB, the compression ratio could be
over 100
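The two properties above can be illustrated with a small tree builder. This is a sketch under stated assumptions, not the lecture's implementation: the five transactions and the minimum support of 3 are hypothetical values chosen to resemble the counts in the slides' figures, ties in frequency are broken alphabetically, and node-links are omitted for brevity.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # First scan: count item supports and drop infrequent items.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    def order(item):
        # Frequency-descending order; alphabetical tie-break (an assumption).
        return (-frequent[item], item)

    # Second scan: insert each filtered, reordered transaction as a path.
    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in frequent), key=order):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def count_nodes(node):
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical transaction database (one character per item).
transactions = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
root, frequent = build_fp_tree(transactions, min_support=3)

total_items = sum(len(t) for t in transactions)
print(count_nodes(root), "tree nodes for", total_items, "item occurrences")
```

Because shared prefixes are stored once, the tree here needs fewer nodes than the database has item occurrences, which is the compactness claim in miniature.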
[Figure: trees used while mining p; node labels include c:3, a:3, m:2, p:2, b:1, p:1, then c:2, a:2, m:2, b:1.]
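Items whose count is below the minimum support are pruned:
m needs to be pruned (m = 2)
a needs to be pruned (a = 2)
b needs to be pruned (b = 1)
f needs to be pruned (f = 2)
[Figure: two trees; one with nodes f:2, c:2, a:2, c:1, b:1, the other with the single node c:3.]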
All frequent patterns that include p: {p, cp}
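The pruning step for p can be sketched as follows. The conditional pattern base `p_base` and the minimum support of 3 are assumptions: they are chosen so that the item counts match the slide (m = 2, a = 2, b = 1, f = 2, and c = 3), but the slides do not state them explicitly. The function name is illustrative.

```python
from collections import Counter


def prune_conditional_base(base, min_support):
    """Split the items of a conditional pattern base into kept and pruned."""
    counts = Counter()
    for path, count in base:
        for item in path:
            counts[item] += count
    kept = {i: c for i, c in counts.items() if c >= min_support}
    pruned = {i: c for i, c in counts.items() if c < min_support}
    return kept, pruned


# Assumed conditional pattern base of p: (prefix path, count) pairs.
p_base = [(("f", "c", "a", "m"), 2), (("c", "b"), 1)]
kept, pruned = prune_conditional_base(p_base, min_support=3)

# Only c survives, so the patterns containing p are p itself and cp.
patterns = [{"p"}] + [{item, "p"} for item in kept]
print(patterns)
```

When more than one item survives pruning, the same step has to be repeated recursively on the smaller conditional tree; here a single surviving item ends the recursion immediately.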
[Figure: trees for the next mining steps; node labels include c:3, a:3, m:2, b:1, m:1, f:3, c:1, a:1.]
Prune infrequent
[Figure: trees after pruning; node labels include c:1, f:2, c:3, a:3, f:3.]
Prefix path for c
[Figure: tree with nodes f:4, c:1, c:3 under root.]
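Collecting the prefix paths of an item amounts to walking from each of its nodes up to the root. The sketch below assumes the tree shape suggested by the figure labels (c:3 under f:4, and c:1 directly under root); that layout, the `Node` class, and the function name are illustrative assumptions.

```python
class Node:
    def __init__(self, item, count, parent=None):
        self.item = item
        self.count = count
        self.parent = parent


def prefix_paths(nodes):
    """Return (prefix path, count) pairs for the given nodes of one item."""
    base = []
    for node in nodes:
        path = []
        parent = node.parent
        # Climb towards the root; the root carries item None.
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:  # a node directly under root has an empty prefix
            base.append((tuple(reversed(path)), node.count))
    return base


# Assumed tree: root -> f:4 -> c:3, and root -> c:1.
root = Node(None, 0)
f = Node("f", 4, root)
c_under_f = Node("c", 3, f)
c_under_root = Node("c", 1, root)

c_base = prefix_paths([c_under_f, c_under_root])
print(c_base)  # [(('f',), 3)]
```

Note that the prefix path inherits the count of the c node (3), not the count of f (4).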
[Figure: trees with the single node f:3, then the single node f:4, under root.]
DONE.
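The whole walk-through can be summarized as a recursive pattern-growth procedure. The sketch below is a simplification, not the lecture's algorithm verbatim: it recurses over conditional pattern bases stored as plain lists of (path, count) pairs instead of building a physical FP-tree at each step. The transactions and the minimum support of 3 are hypothetical, and ties in frequency are broken alphabetically.

```python
from collections import Counter


def pattern_growth(base, min_support, suffix=frozenset()):
    """Mine frequent patterns from (path, count) pairs without candidates.

    Every path must list its items in the same global order.
    """
    counts = Counter()
    for path, count in base:
        for item in path:
            counts[item] += count

    patterns = {}
    for item, support in counts.items():
        if support < min_support:
            continue  # prune infrequent items
        pattern = suffix | {item}
        patterns[pattern] = support
        # Conditional pattern base: the prefix of each path before `item`.
        conditional = [
            (path[: path.index(item)], count)
            for path, count in base
            if item in path
        ]
        patterns.update(pattern_growth(conditional, min_support, pattern))
    return patterns


# Hypothetical transaction database (one character per item).
transactions = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
global_counts = Counter(item for t in transactions for item in set(t))


def order(item):
    return (-global_counts[item], item)


base = [(tuple(sorted(set(t), key=order)), 1) for t in transactions]
patterns = pattern_growth(base, min_support=3)

with_p = sorted("".join(sorted(p)) for p in patterns if "p" in p)
print(with_p)  # ['cp', 'p']
```

On this data the patterns containing p come out as p and cp, in line with the slide's result, and no candidate itemsets are ever generated and tested against the database.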