Lecture 6

FP-tree and FP-growth

Frequent itemset mining …


Is Apriori Fast Enough? — Performance
Bottlenecks

■ The core of the Apriori algorithm:
■ Use frequent (k – 1)-itemsets to generate candidate frequent
k-itemsets
■ Use database scan and pattern matching to collect counts for the
candidate itemsets
■ The bottleneck of Apriori: candidate generation
■ Huge candidate sets:
■ 10^4 frequent 1-itemsets will generate 10^7 candidate 2-itemsets
■ To discover a frequent pattern of size 100, e.g., {a1, a2, …,
a100}, one needs to generate 2^100 ≈ 10^30 candidates
■ Multiple scans of the database:
■ Needs (n + 1) scans, where n is the length of the longest pattern

2
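The candidate-generation (join) step behind this blow-up can be sketched in Python. This is a minimal sketch: `apriori_gen` is an illustrative name, itemsets are sorted tuples, and the prune step (discarding candidates with an infrequent (k – 1)-subset) is omitted for brevity.

```python
from itertools import combinations

def apriori_gen(frequent_kminus1):
    """Join step of Apriori: build candidate k-itemsets from frequent
    (k-1)-itemsets that share their first k-2 items."""
    prev = sorted(frequent_kminus1)   # each itemset is a sorted tuple
    candidates = set()
    for a, b in combinations(prev, 2):
        if a[:-1] == b[:-1]:          # share the first k-2 items
            candidates.add(a[:-1] + (min(a[-1], b[-1]), max(a[-1], b[-1])))
    return candidates

# 4 frequent 1-itemsets already yield C(4,2) = 6 candidate 2-itemsets;
# with 10^4 frequent items this grows to roughly 5 * 10^7 candidates.
L1 = [("a",), ("b",), ("c",), ("d",)]
print(len(apriori_gen(L1)))   # 6
```

Every pair of frequent 1-itemsets joins, which is why the number of candidate 2-itemsets grows quadratically in the number of frequent items.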
Mining Frequent Patterns Without
Candidate Generation

■ Compress a large database into a compact
Frequent-Pattern tree (FP-tree) structure
■ highly condensed, but complete for frequent pattern
mining
■ avoid costly database scans
■ Develop an efficient, FP-tree-based frequent pattern
mining method
■ A divide-and-conquer methodology: decompose mining
tasks into smaller ones
■ Avoid candidate generation: sub-database test only!

3
Construct FP-tree from a
Transaction DB
TID  Items bought              (ordered) frequent items
100  {f, a, c, d, g, i, m, p}  {f, c, a, m, p}
200  {a, b, c, f, l, m, o}     {f, c, a, b, m}
300  {b, f, h, j, o}           {f, b}
400  {b, c, k, s, p}           {c, b, p}
500  {a, f, c, e, l, p, m, n}  {f, c, a, m, p}

min_support = 0.5

Steps:
1. Scan DB once, find frequent 1-itemsets (single-item patterns)
2. Order frequent items in frequency-descending order
3. Scan DB again, construct the FP-tree

Header Table:
Item  frequency
f     4
c     4
a     3
b     3
m     3
p     3

Resulting FP-tree (header-table node-links omitted):

{}
├─ f:4
│  ├─ c:3
│  │  └─ a:3
│  │     ├─ m:2
│  │     │  └─ p:2
│  │     └─ b:1
│  │        └─ m:1
│  └─ b:1
└─ c:1
   └─ b:1
      └─ p:1

4
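The two-pass construction above can be sketched in Python. This is a minimal sketch: class and function names are my own, and the header table with its node-links is omitted. Ties in frequency (f and c both appear 4 times) are broken alphabetically here, so c becomes the first branch where the slide puts f first; any fixed order works as long as it is applied consistently.

```python
from collections import defaultdict

class FPNode:
    """One node of the FP-tree: an item, a count, and child links."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}   # item -> FPNode

def build_fptree(transactions, min_count):
    # Pass 1: count single items and keep the frequent ones.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    frequent = {i for i, c in counts.items() if c >= min_count}
    # Pass 2: insert each transaction with its frequent items in
    # frequency-descending order, sharing prefixes along the way.
    root = FPNode(None, None)
    for t in transactions:
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            node = node.children.setdefault(item, FPNode(item, node))
            node.count += 1
    return root, counts

transactions = [list("facdgimp"), list("abcflmo"), list("bfhjo"),
                list("bcksp"), list("afcelpmn")]
root, counts = build_fptree(transactions, min_count=3)
print(root.children["c"].count)                 # 4
print(root.children["c"].children["f"].count)   # 3
```

Four of the five transactions share the c–f prefix, which is exactly the compression the frequency-descending order is designed to produce.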
Benefits of the FP-tree Structure

■ Completeness:
■ never breaks a long pattern of any transaction

■ preserves complete information for frequent pattern


mining
■ Compactness
■ reduce irrelevant information—infrequent items are gone

■ frequency descending ordering: more frequent items are


more likely to be shared
■ Not larger than the original database (node-links and
counts are extra space, but asymptotically the size is
upper-bounded by the DB size)
■ Example: For Connect-4 DB, compression ratio could be
over 100
6
Mining Frequent Patterns Using FP-tree

■ General idea (divide-and-conquer)
■ Recursively grow frequent pattern paths using the FP-tree
■ Method
■ For each item, construct its conditional pattern base,
and then its conditional FP-tree
■ Repeat the process on each newly created conditional
FP-tree
■ Until the resulting FP-tree is empty, or it contains only
one path (a single path generates all the combinations of its
sub-paths, each of which is a frequent pattern)

7
Major Steps to Mine FP-tree

1) Construct conditional pattern base for each node in the


FP-tree
2) Construct conditional FP-tree from each conditional
pattern-base
3) Recursively mine conditional FP-trees and grow
frequent patterns obtained so far
▪ If the conditional FP-tree contains a single path,
simply enumerate all the patterns

8
Step 1: From FP-tree to Conditional
Pattern Base
■ Start at the frequent-item header table in the FP-tree
■ Traverse the FP-tree by following the link of each frequent item
■ Accumulate all transformed prefix paths of that item to form its
conditional pattern base

Conditional pattern bases:

item  conditional pattern base
c     f:3
a     fc:3
b     fca:1, f:1, c:1
m     fca:2, fcab:1
p     fcam:2, cb:1
9
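The same conditional pattern bases can be derived directly from the ordered transactions, since each transaction contributes the prefix preceding an item as one path with count 1. This is a minimal sketch under that assumption (a real implementation follows the node-links and reads counts off the tree, which yields the same multiset):

```python
from collections import defaultdict

# Ordered frequent-item lists from the example DB (min_support = 0.5).
ordered = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]

def conditional_pattern_bases(ordered_transactions):
    """For each item, collect the prefix that precedes it in every
    ordered transaction (count 1 per transaction)."""
    bases = defaultdict(list)
    for t in ordered_transactions:
        for i, item in enumerate(t):
            if i:   # an empty prefix carries no information
                bases[item].append(tuple(t[:i]))
    return bases

bases = conditional_pattern_bases(ordered)
print(bases["m"])   # [('f', 'c', 'a'), ('f', 'c', 'a', 'b'), ('f', 'c', 'a')]
```

This reproduces the table's m entry (fca:2, fcab:1) and likewise p's entry (fcam:2, cb:1).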
Properties of FP-tree for Conditional
Pattern Base Construction

■ Node-link property
■ For any frequent item a_i, all the possible frequent
patterns that contain a_i can be obtained by following
a_i's node-links, starting from a_i's head in the FP-tree
header
■ Prefix path property
■ To calculate the frequent patterns for a node a_i in a
path P, only the prefix sub-path of a_i in P needs to be
accumulated, and its frequency count should carry the
same count as node a_i

10
Step 2: Construct Conditional FP-tree

■ For each pattern-base


■ Accumulate the count for each item in the base

■ Construct the FP-tree for the frequent items of the


pattern base

m-conditional pattern base: fca:2, fcab:1

m-conditional FP-tree (a single path):

{}
└─ f:3
   └─ c:3
      └─ a:3

All frequent patterns concerning m:
m, fm, cm, am, fcm, fam, cam, fcam
11
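The two bullets above can be sketched in Python. This is a minimal sketch with an illustrative function name: it accumulates counts over the pattern base and keeps the items frequent within it, which are exactly the items of the conditional FP-tree (the tree's path structure is omitted here).

```python
from collections import defaultdict

def conditional_fptree_items(pattern_base, min_count):
    """Accumulate counts over a conditional pattern base and keep only
    the items that are frequent within it."""
    counts = defaultdict(int)
    for prefix, count in pattern_base:
        for item in prefix:
            counts[item] += count
    return {i: c for i, c in counts.items() if c >= min_count}

# m's conditional pattern base from the slide: fca:2, fcab:1
m_base = [(("f", "c", "a"), 2), (("f", "c", "a", "b"), 1)]
print(conditional_fptree_items(m_base, min_count=3))
# {'f': 3, 'c': 3, 'a': 3}  -- b:1 falls below min_count and is dropped
```

Dropping b here is why the m-conditional FP-tree on the slide is the single path f:3 → c:3 → a:3.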
Mining Frequent Patterns by Creating
Conditional Pattern-Bases

Item  Conditional pattern-base   Conditional FP-tree
p     {(fcam:2), (cb:1)}         {(c:3)}|p
m     {(fca:2), (fcab:1)}        {(f:3, c:3, a:3)}|m
b     {(fca:1), (f:1), (c:1)}    Empty
a     {(fc:3)}                   {(f:3, c:3)}|a
c     {(f:3)}                    {(f:3)}|c
f     Empty                      Empty
12
Step 3: Recursively mine the
conditional FP-tree
Starting from the m-conditional FP-tree ({} → f:3 → c:3 → a:3):

Cond. pattern base of "am": (fc:3)
am-conditional FP-tree: {} → f:3 → c:3

Cond. pattern base of "cm": (f:3)
cm-conditional FP-tree: {} → f:3

Cond. pattern base of "cam": (f:3)
cam-conditional FP-tree: {} → f:3
13
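The whole recursion can be sketched end to end. This is a simplified sketch that mines projected (conditional) transaction lists rather than an explicit tree, so it is not the paper's implementation, but the divide-and-conquer structure mirrors FP-growth: each recursive call mines one item's conditional database with that item appended to the suffix.

```python
from collections import defaultdict

def fp_growth(transactions, min_count, suffix=()):
    """Recursive pattern growth over conditional (projected) databases.
    Transactions must already be ordered frequency-descending, so an
    item's conditional database holds only items preceding it."""
    counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            counts[item] += 1
    patterns = {}
    for item, c in counts.items():
        if c >= min_count:
            new_suffix = (item,) + suffix
            patterns[new_suffix] = c
            # Conditional database of `item`: the prefixes preceding it.
            cond = [t[:t.index(item)] for t in transactions if item in t]
            patterns.update(fp_growth(cond, min_count, new_suffix))
    return patterns

ordered = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]
result = fp_growth(ordered, min_count=3)
print(result[("f", "c", "a", "m")])   # 3
```

On the running example this yields 18 frequent patterns, including all eight patterns concerning m from the slide, and nothing longer than {b} for b, matching its empty conditional FP-tree.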
Single FP-tree Path Generation

■ Suppose an FP-tree T has a single path P


■ The complete set of frequent pattern of T can be
generated by enumeration of all the combinations of the
sub-paths of P

m-conditional FP-tree (a single path):
{} → f:3 → c:3 → a:3

All frequent patterns concerning m:
m, fm, cm, am, fcm, fam, cam, fcam
14
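The single-path enumeration can be sketched as follows (a minimal sketch with an illustrative function name; the path is given as (item, count) pairs from root to leaf, and a pattern's support is the minimum count among the nodes chosen):

```python
from itertools import combinations

def single_path_patterns(path, suffix):
    """Enumerate every nonempty subset of a single-path FP-tree,
    combined with the suffix item; support is the minimum node count."""
    patterns = {}
    for r in range(1, len(path) + 1):
        for combo in combinations(path, r):
            items = tuple(i for i, _ in combo) + (suffix,)
            patterns[items] = min(c for _, c in combo)
    return patterns

# m-conditional FP-tree from the slide: the single path f:3 -> c:3 -> a:3
pats = single_path_patterns([("f", 3), ("c", 3), ("a", 3)], "m")
print(len(pats))                       # 7 = 2^3 - 1 patterns ending in m
print(pats[("f", "c", "a", "m")])      # 3
```

Together with {m} itself, these are exactly the eight patterns concerning m listed above, obtained with no further recursion.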
Principles of Frequent Pattern
Growth
■ Pattern growth property
■ Let α be a frequent itemset in DB, B be α's
conditional pattern base, and β be an itemset in B.
Then α ∪ β is a frequent itemset in DB iff β is
frequent in B.
■ “abcdef” is a frequent pattern if and only if
■ “abcde” is a frequent pattern, and
■ “f” is frequent in the set of transactions containing
“abcde”

15
Why Is Frequent Pattern Growth
Fast?

■ Our performance study shows


■ FP-growth is an order of magnitude faster than
Apriori, and is also faster than tree-projection
■ Reasoning
■ No candidate generation, no candidate test
■ Use compact data structure
■ Eliminate repeated database scan
■ Basic operation is counting and FP-tree building

16
FP-growth vs. Apriori: Scalability With
the Support Threshold

Data set: T25I20D10K (performance chart not reproduced)

17
FP-growth vs. Tree-Projection: Scalability
with Support Threshold

Data set: T25I20D100K (performance chart not reproduced)

18
