Lecture 6
Mining Frequent Patterns Without
Candidate Generation
Construct FP-tree from a
Transaction DB
TID   Items bought                 (Ordered) frequent items
100   {f, a, c, d, g, i, m, p}     {f, c, a, m, p}
200   {a, b, c, f, l, m, o}        {f, c, a, b, m}
300   {b, f, h, j, o}              {f, b}
400   {b, c, k, s, p}              {c, b, p}
500   {a, f, c, e, l, p, m, n}     {f, c, a, m, p}
min_support = 0.5

Steps:
1. Scan DB once, find frequent 1-itemsets (single item patterns)
2. Order frequent items in frequency descending order
3. Scan DB again, construct FP-tree

Header Table (each entry's head points to the item's first node in the FP-tree)
Item   frequency
f      4
c      4
a      3
b      3
m      3
p      3

FP-tree
{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
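The three construction steps can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the `Node` class, the `build_fp_tree` and `show` helpers, and the string encoding of transactions are assumptions made for this example. The tie-breaking order f, c, a, b, m, p is copied from the slide, since items with equal frequency could be ordered either way. With five transactions, min_support = 0.5 is taken to mean a minimum count of 3.

```python
from collections import Counter

class Node:
    """One FP-tree node (hypothetical helper class for this sketch)."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count, order):
    # Scan 1: count single items and keep the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    # `order` must list every frequent item; it fixes the insertion order.
    rank = {item: r for r, item in enumerate(order)}
    root = Node(None, None)
    header = {item: [] for item in frequent}  # item -> nodes (node-links)
    # Scan 2: insert each transaction's frequent items in the fixed order.
    for t in transactions:
        node = root
        kept = sorted((i for i in set(t) if i in frequent), key=rank.__getitem__)
        for item in kept:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, frequent

def show(node, depth=0):
    """Print the tree with one 'item:count' entry per line."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

# Transactions 100..500 from the slide, one character per item.
DB = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]
root, header, frequent = build_fp_tree(DB, min_count=3, order="fcabmp")
show(root)
```

Each header entry is stored here as a plain list of nodes instead of a chained node-link; that keeps the sketch short while still supporting the traversal used later for conditional pattern bases.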
Benefits of the FP-tree Structure
■ Completeness:
■ never breaks a long pattern of any transaction
Mining the FP-tree
■ Method
■ For each item, construct its conditional pattern base, and
then its conditional FP-tree
■ Repeat until the resulting FP-tree is empty, or it contains only
one path (a single path generates all the combinations of its
sub-paths, each of which is a frequent pattern)
Major Steps to Mine FP-tree
Step 1: From FP-tree to Conditional
Pattern Base
■ Start at the frequent-item header table of the FP-tree
■ Traverse the FP-tree by following the node-links of each frequent item
■ Accumulate all the transformed prefix paths of that item to form its
conditional pattern base
Header Table
Item   frequency
f      4
c      4
a      3
b      3
m      3
p      3

FP-tree
{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

Conditional pattern bases
item   cond. pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1
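The traversal described in Step 1 can be sketched in Python as follows. The dictionary-based node layout and the helper names `build` and `conditional_pattern_base` are assumptions made for this example. The input is the list of ordered frequent-item transactions from the construction slide, one character per item.

```python
def build(paths):
    """Build an FP-tree from (ordered item string, count) pairs.

    Returns (root, header), where header maps each item to the list of
    its nodes, standing in for the node-link chain."""
    root = {"item": None, "count": 0, "parent": None, "children": {}}
    header = {}
    for items, count in paths:
        node = root
        for item in items:
            child = node["children"].get(item)
            if child is None:
                child = {"item": item, "count": 0, "parent": node, "children": {}}
                node["children"][item] = child
                header.setdefault(item, []).append(child)
            child["count"] += count
            node = child
    return root, header

def conditional_pattern_base(header, item):
    """Collect the prefix path of every node of `item`, with that node's count."""
    base = []
    for node in header.get(item, []):          # follow the item's node-links
        prefix, parent = [], node["parent"]
        while parent["item"] is not None:      # climb towards the root
            prefix.append(parent["item"])
            parent = parent["parent"]
        if prefix:                             # a node directly under the root has no prefix
            base.append(("".join(reversed(prefix)), node["count"]))
    return base

# Ordered frequent items of transactions 100..500.
ORDERED = [("fcamp", 1), ("fcabm", 1), ("fb", 1), ("cbp", 1), ("fcamp", 1)]
root, header = build(ORDERED)
for item in "cabmp":
    print(item, conditional_pattern_base(header, item))
```

Each prefix carries the count of the item's node, not the counts of the nodes above it, which is the prefix path property stated on the next slide.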
Properties of FP-tree for Conditional
Pattern Base Construction
■ Node-link property
■ For any frequent item a_i, all the possible frequent
patterns that contain a_i can be obtained by following
a_i's node-links, starting from a_i's head in the FP-tree
header table
■ Prefix path property
■ To calculate the frequent patterns for a node a_i in a
path P, only the prefix sub-path of a_i in P needs to be
accumulated, and its frequency count should carry the
same count as node a_i
Step 2: Construct Conditional FP-tree
Global FP-tree and header table: as shown in Step 1.

m-conditional pattern base: fca:2, fcab:1

m-conditional FP-tree
{}
  f:3
    c:3
      a:3

All frequent patterns concerning m:
m, fm, cm, am, fcm, fam, cam, fcam
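A minimal Python sketch of this step, under stated assumptions: the function name `conditional_fp_tree` and the nested-dictionary tree layout are invented for the example, and items keep the global order f, c, a, b, m, p instead of being re-sorted by their frequency inside the pattern base. The idea is to accumulate each item's count over the pattern base, drop the items that fall below the minimum count, and insert what remains into a new tree.

```python
from collections import Counter

def conditional_fp_tree(pattern_base, min_count):
    """pattern_base: list of (ordered item string, count) pairs.

    Returns a nested dict: item -> [count, children]."""
    # Accumulate the count of every item in the pattern base.
    counts = Counter()
    for prefix, count in pattern_base:
        for item in prefix:
            counts[item] += count
    # Insert each prefix, skipping the items that are infrequent in the base.
    tree = {}
    for prefix, count in pattern_base:
        level = tree
        for item in prefix:
            if counts[item] < min_count:
                continue
            entry = level.setdefault(item, [0, {}])
            entry[0] += count
            level = entry[1]
    return tree

# m-conditional pattern base from the slide, minimum count 3.
m_tree = conditional_fp_tree([("fca", 2), ("fcab", 1)], min_count=3)
print(m_tree)
```

In the m-conditional pattern base, b occurs only once, so it is dropped and both prefixes collapse onto the single path f:3, c:3, a:3.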
Step 3: Recursively mine the
conditional FP-tree
m-conditional FP-tree
{}
  f:3
    c:3
      a:3

Cond. pattern base of “am”: (fc:3)
am-conditional FP-tree
{}
  f:3
    c:3

Cond. pattern base of “cm”: (f:3)
cm-conditional FP-tree
{}
  f:3
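The recursion can be sketched compactly in Python. Note the simplification: this version recurses directly on conditional pattern bases (lists of prefix and count pairs) and does not materialise each conditional FP-tree, so it illustrates the divide-and-conquer structure without the compression the tree provides. The function name `fp_growth` and the string encoding are assumptions for this example, and every transaction must list its items in the same global order.

```python
from collections import Counter

def fp_growth(pattern_base, min_count, suffix=frozenset(), out=None):
    """pattern_base: list of (ordered item string, count) pairs.

    Returns a dict mapping each frequent itemset (frozenset) to its support."""
    if out is None:
        out = {}
    # Support of each item within the current (conditional) pattern base.
    counts = Counter()
    for items, count in pattern_base:
        for item in items:
            counts[item] += count
    for item, support in counts.items():
        if support < min_count:
            continue
        pattern = suffix | {item}
        out[pattern] = support
        # Conditional pattern base of `item`: the prefix before it in each path.
        cond = [(items[:items.index(item)], count)
                for items, count in pattern_base if item in items]
        cond = [(prefix, count) for prefix, count in cond if prefix]
        fp_growth(cond, min_count, pattern, out)
    return out

# Ordered frequent items of transactions 100..500, minimum count 3.
ORDERED_DB = [("fcamp", 1), ("fcabm", 1), ("fb", 1), ("cbp", 1), ("fcamp", 1)]
patterns = fp_growth(ORDERED_DB, min_count=3)
for itemset, support in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), support)
```

Because each item's conditional base only contains items that precede it in the global order, every frequent itemset is produced exactly once, starting from its last item in that order.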
Single FP-tree Path Generation
m-conditional FP-tree
{}
  f:3
    c:3
      a:3

All frequent patterns concerning m:
m, fm, cm, am, fcm, fam, cam, fcam
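When a conditional FP-tree is a single path, its patterns can be enumerated directly as combinations of the path's nodes. The sketch below assumes a hypothetical helper `single_path_patterns` that takes the path as (item, count) pairs from the root down; the support of each combination is taken as the smallest count among the chosen nodes.

```python
from itertools import combinations

def single_path_patterns(path, suffix, suffix_count):
    """path: list of (item, count) from root to leaf; suffix: the conditioning item(s).

    Returns a dict mapping each pattern string to its support."""
    patterns = {suffix: suffix_count}
    for r in range(1, len(path) + 1):
        for combo in combinations(path, r):
            items = "".join(item for item, _ in combo) + suffix
            # Support is the smallest count among the chosen nodes.
            patterns[items] = min(count for _, count in combo)
    return patterns

# m-conditional FP-tree: the single path f:3, c:3, a:3.
result = single_path_patterns([("f", 3), ("c", 3), ("a", 3)], "m", 3)
print(result)
```

For the m-conditional FP-tree this yields the eight patterns listed on the slide, each with support 3.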
Principles of Frequent Pattern
Growth
■ Pattern growth property
■ Let α be a frequent itemset in DB, B be α's
conditional pattern base, and β be an itemset in B.
Then α ∪ β is a frequent itemset in DB iff β is
frequent in B.
■ “abcdef” is a frequent pattern, if and only if
■ “abcde” is a frequent pattern, and
■ “f” is frequent in the set of transactions containing
“abcde”
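The property can be checked by brute force on the example database. In this sketch the set of transactions containing α stands in for α's conditional pattern base, which is an assumption of the example; the support of β is the same in both. The helper names `support` and `grows` are hypothetical.

```python
# Transactions 100..500 from the construction slide, one character per item.
DB = [set("facdgimp"), set("abcflmo"), set("bfhjo"), set("bcksp"), set("afcelpmn")]
MIN_COUNT = 3  # min_support = 0.5 over five transactions

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def grows(alpha, beta):
    """Return (alpha ∪ beta frequent in DB, beta frequent among transactions with alpha)."""
    projected = [t for t in DB if alpha <= t]
    return (support(alpha | beta, DB) >= MIN_COUNT,
            support(beta, projected) >= MIN_COUNT)

print(grows(set("cam"), set("f")))  # both sides agree: "fcam" is frequent
print(grows(set("cam"), set("b")))  # both sides agree: "cabm" is not frequent
```

Both sides of the equivalence count the same transactions, those containing α ∪ β, which is why the test can be carried out inside the much smaller conditional database.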
Why Is Frequent Pattern Growth
Fast?
FP-growth vs. Apriori: Scalability With
the Support Threshold
FP-growth vs. Tree-Projection: Scalability
with Support Threshold