FP-Tree
Presented By:
Xun Luo and Shun Liang
04/07/2005
Outline
Introduction
Constructing FP-Tree
Example 1
Performance Evaluation
Discussions
Introduction
Terminology
Apriori-like Algorithms
Generate-and-Test
Cost Bottleneck
Terminology
Item set
A set of items: I = {a1, a2, …, am}
Transaction database
DB = <T1, T2, …, Tn>
Pattern
A set of items: A
Support
The number of transactions containing A in DB
Frequent pattern
A’s support ≥ minimum support threshold ξ
Frequent Pattern Mining Problem
The problem of finding the complete set of frequent patterns
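The definitions above can be sketched in Python. The toy DB and pattern below are illustrative examples of mine, not taken from the paper:

```python
# Support of a pattern A = the number of transactions in DB containing A.
# Toy data for illustration (hypothetical, not Example 1's database).
DB = [{"f", "a", "c", "m", "p"}, {"f", "c", "b"}, {"c", "b", "p"}]
A = {"c", "p"}                            # the pattern

support = sum(1 for T in DB if A <= T)    # A <= T tests "A is a subset of T"
min_sup = 2                               # minimum support threshold (xi)
is_frequent = support >= min_sup
print(support, is_frequent)               # 2 True -> A is a frequent pattern
```

The frequent pattern mining problem is then: enumerate every pattern A with `support >= min_sup`.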
Apriori-like Algorithms
Algorithm
Anti-Monotone Heuristic
If any length-k pattern is not frequent in the database, its length-(k+1) super-pattern
can never be frequent
Generating the candidate set
Testing the candidate set via pattern matching
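The generate-and-test loop above can be sketched as follows. This is a minimal illustration on toy data (my own example); real Apriori implementations add hash-tree and other optimizations, and the repeated DB scans in the test step are exactly the cost bottleneck:

```python
from itertools import combinations

# Toy database (hypothetical, for illustration only).
DB = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
min_sup = 2

def support(pattern):
    # one pass over DB per candidate set: the cost bottleneck
    return sum(1 for T in DB if pattern <= T)

# L1: frequent length-1 patterns
items = {i for T in DB for i in T}
Lk = {frozenset({i}) for i in items if support(frozenset({i})) >= min_sup}

all_frequent = set(Lk)
k = 1
while Lk:
    k += 1
    # generate length-k candidates by joining frequent (k-1)-patterns
    candidates = {a | b for a in Lk for b in Lk if len(a | b) == k}
    # prune by the anti-monotone heuristic: every (k-1)-subset must be frequent
    candidates = {c for c in candidates
                  if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
    # test the surviving candidates against the database
    Lk = {c for c in candidates if support(c) >= min_sup}
    all_frequent |= Lk
```

On this toy DB the loop stops at k = 3: {a,b,c} survives pruning but fails the support test.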
FP-Tree and FP-Growth Algorithm
FP-Tree: Frequent Pattern Tree
Compact representation of the DB without information loss.
Easy to traverse, can quickly find out patterns associated with
a certain item.
Well-ordered by item frequency.
FP-Growth Algorithm
Start mining from length-1 patterns
Recursively, for each frequent item: construct its conditional pattern base, then its conditional FP-tree, and mine it
FP-Tree Definition
Three components:
One root: labeled as “null”
A set of item prefix subtrees
A frequent-item header table (each entry: item, head of node-links)
[Figure: example FP-tree — root “null”; paths f:4 → c:3 → a:3 → (m:2 → p:2 and b:1 → m:1), f:4 → b:1, and c:1 → b:1 → p:1; header table lists f, c, a, b, m, p, each with node-links into the tree]
FP-Tree Definition (cont.)
Each node in the item prefix subtree consists of
three fields:
item-name
count
node-link
Each entry in the header table consists of two fields: item-name and head of node-link
Example 1: FP-Tree Construction
The transaction database used (first two columns only):
TID Items Bought (Ordered) Frequent Items
100 f,a,c,d,g,i,m,p f,c,a,m,p
200 a,b,c,f,l,m,o f,c,a,b,m
300 b,f,h,j,o f,b
400 b,c,k,s,p c,b,p
500 a,f,c,e,l,p,m,n f,c,a,m,p
Example 1 (cont.)
First Scan: //count and sort
count the frequency of each item, keep items meeting the minimum support (here ξ = 3), and sort them in descending frequency order: f:4, c:4, a:3, b:3, m:3, p:3
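The first scan can be sketched as follows (transactions transcribed from Example 1's table; the order among items with equal counts is not fixed by the algorithm):

```python
from collections import Counter

# Example 1's transactions, as item lists.
DB = [list("facdgimp"), list("abcflmo"), list("bfhjo"),
      list("bcksp"), list("afcelpmn")]
min_sup = 3                                   # threshold xi

# first scan: count the frequency of each item
freq = Counter(item for T in DB for item in T)
# keep items meeting the threshold, in descending frequency order (the F-list)
flist = [i for i, c in freq.most_common() if c >= min_sup]
print(flist)   # f:4, c:4, a:3, b:3, m:3, p:3 (ties may order differently)
```

Items below the threshold (d, g, i, l, o, …) are discarded; they cannot appear in any frequent pattern.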
Example 1 (cont.)
Second Scan: //create the tree and header table
create the root, label it as “null”
for each transaction, insert its sorted frequent items as a path from the root, sharing common prefixes and incrementing counts
link all nodes carrying the same item via node-links from the header table
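The insertion step of the second scan can be sketched as below; the dict-based node layout is an assumption of this sketch, not the paper's representation:

```python
# Insert one transaction's frequent items (already in F-list order) into the
# tree, sharing common prefixes and extending node-links in the header table.

def insert_transaction(root, header, items):
    node = root
    for item in items:
        child = node["children"].get(item)
        if child is None:
            child = {"item": item, "count": 0, "children": {}}
            node["children"][item] = child
            header.setdefault(item, []).append(child)  # extend item's node-link
        child["count"] += 1
        node = child

root = {"item": None, "count": 0, "children": {}}   # the "null" root
header = {}
insert_transaction(root, header, ["f", "c", "a", "m", "p"])  # transaction 100
insert_transaction(root, header, ["f", "c", "a", "b", "m"])  # transaction 200
print(root["children"]["f"]["count"])  # 2: both transactions share prefix f,c,a
```

After the second insert, the shared prefix f → c → a carries count 2, and m now has two nodes on its node-link (one under a, one under b).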
Example 1 (cont.)
The building process of the tree
[Figure: (1) create root; (2) after transaction 1 (f,c,a,m,p): f:1 → c:1 → a:1 → m:1 → p:1; (3) after transaction 2 (f,c,a,b,m): f:2 → c:2 → a:2 → (m:1 → p:1 and b:1 → m:1); (4) after transaction 3 (f,b): f:3 gains child b:1]
Example 1 (cont.)
The building process of the tree (cont.)
[Figure: after transaction 4 (c,b,p), the root gains a second branch c:1 → b:1 → p:1 (tree rooted with f:3); after transaction 5 (f,c,a,m,p), counts along f → c → a → m → p are incremented, giving the final tree with f:4, c:3, a:3, m:2, p:2]
FP-Tree Properties
Completeness
Each transaction that contains frequent items is
mapped to a path.
Prefix sharing does not cause path ambiguity, as
only a path starting from the root represents a transaction.
Compactness
Number of nodes bounded by overall occurrence of
frequent items.
Height of tree bounded by maximal number of
frequent items in any transaction.
FP-Tree Properties (cont.)
Traversal Friendly (for mining task)
For any frequent item ai, all the possible frequent patterns that contain ai can be obtained by following ai’s node-links.
This property is important for divide-and-conquer:
it assures the soundness and completeness of problem reduction.
Outline
Introduction
Constructing FP-Tree
Example 1
Performance Evaluation
Discussions
FP-Growth Algorithm
Functionality:
Mining frequent patterns using FP-Tree generated before
Input:
FP-tree constructed earlier
Main algorithm:
Call FP-growth(FP-tree, null)
FP-growth(Tree, α)
Procedure FP-growth(Tree, α)
{
  if (Tree contains only a single path P)
  { for each combination β of the nodes in P
    { generate pattern β ∪ α;
      β.support = min(support of all nodes in β);
    }
  }
  else // Tree contains more than one path
  { for each ai in the header of Tree
    { generate pattern β = ai ∪ α;
      β.support = ai.support;
      construct β’s conditional pattern base;
      construct β’s conditional FP-tree Treeβ;
      if (Treeβ ≠ Φ)
        FP-growth(Treeβ, β);
    }
  }
}
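The pseudocode above can be sketched as a runnable Python program. This is a simplified variant of my own: instead of projecting along node-links, it rebuilds each conditional FP-tree from an explicit list of prefix paths, and it skips the single-path special case (the recursion produces the same result). The names `Node`, `build_tree`, and `fp_growth` are assumptions of this sketch, not the paper's:

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(transactions, min_sup):
    # first scan: item frequencies
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    keep = {i for i, c in freq.items() if c >= min_sup}
    # second scan: insert each transaction's frequent items in F-list order
    root, header = Node(None, None), defaultdict(list)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in keep),
                           key=lambda i: (-freq[i], i)):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])  # node-link
            node = node.children[item]
            node.count += 1
    return header

def fp_growth(transactions, min_sup, suffix=frozenset()):
    """Return {pattern: support} for all frequent patterns."""
    header = build_tree(transactions, min_sup)
    patterns = {}
    for item, links in header.items():
        pattern = suffix | {item}
        patterns[pattern] = sum(n.count for n in links)
        # conditional pattern base: each node's prefix path, count times
        base = []
        for n in links:
            path, p = [], n.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            base.extend([path] * n.count)
        patterns.update(fp_growth(base, min_sup, pattern))  # recurse
    return patterns

if __name__ == "__main__":
    db = [list("facdgimp"), list("abcflmo"), list("bfhjo"),
          list("bcksp"), list("afcelpmn")]
    print(len(fp_growth(db, 3)))   # 18 frequent patterns for Example 1, xi = 3
```

On Example 1's database with ξ = 3 this yields 18 frequent patterns, including (fcam:3) and (cp:3), matching the hand-traced results in Example 2 below.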
Example 2
Start from the bottom of the header table: node p
Two paths (transformed prefix paths)
p’s conditional pattern base
{(f:2, c:2, a:2, m:2), (c:1, b:1)}
p’s conditional FP-tree
Only one branch (c:3)
pattern (cp:3)
Patterns:
(p:3)
(cp:3)
[Figure: the FP-tree and header table from Example 1]
Example 2 (cont.)
Continue with node m
Two paths
m’s conditional pattern base
{(f:2, c:2, a:2), (f:1, c:1, a:1, b:1)}
m’s conditional FP-tree:
(f:3, c:3, a:3)
Call mine(<f:3, c:3, a:3> | m)
Patterns:
(m:3)
see next slide
mine(<f:3, c:3, a:3> | m)
node a:
(am:3)
call mine(<f:3, c:3> | am)
(cam:3)
call mine(<f:3> | cam)
(fcam:3)
(fam:3)
node c:
(cm:3)
call mine(<f:3> | cm)
(fcm:3)
node f:
(fm:3)
[Figure: conditional FP-tree of “m”: f:3 → c:3 → a:3]
All the patterns: (m:3, am:3, cm:3, fm:3, cam:3, fam:3, fcm:3, fcam:3)
Conclusion: A single-path FP-Tree can be mined by outputting all the
combinations of the items in the path.
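The conclusion above can be checked with a short sketch; `path` encodes m's conditional FP-tree, and the variable names are mine:

```python
from itertools import combinations

# Mine a single-path FP-tree by enumerating every non-empty combination of
# its items; each combination's support is the minimum count along it.
path = [("f", 3), ("c", 3), ("a", 3)]        # m's conditional FP-tree
counts = dict(path)

patterns = {}
for r in range(1, len(path) + 1):
    for combo in combinations(counts, r):
        patterns[frozenset(combo)] = min(counts[i] for i in combo)
print(len(patterns))   # 7 combinations: f, c, a, fc, fa, ca, fca (all :3)
```

Appending the suffix m to each combination gives exactly the eight patterns of the slide: (m:3), (fm:3), (cm:3), (am:3), (fcm:3), (fam:3), (cam:3), (fcam:3).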
Example 2 (cont.)
Continue with node b
Three paths
b’s conditional pattern base
{(f:1, c:1, a:1), (f:1), (c:1)}
b’s conditional FP-tree
Φ
Patterns:
(b:3)
Example 2 (cont.)
Continue with node a
One path
a’s conditional pattern base
{(f:3, c:3)}
a’s conditional FP-tree
{(f:3, c:3)}
Patterns:
(a:3)
(ca:3)
(fa:3)
(fca:3)
Example 2 (cont.)
Continue with node c
Two paths
c’s conditional pattern base
{(f:3)}
c’s conditional FP-tree
{(f:3)}
Patterns:
(c:4)
(fc:3)
Example 2 (cont.)
Continue with node f
One path
f’s conditional pattern base
Φ
f’s conditional FP-tree
Φ
Patterns:
(f:4)
Example 2 (cont.)
Final results:
item  conditional pattern base                  conditional FP-tree
p     {(f:2, c:2, a:2, m:2), (c:1, b:1)}        {(c:3)} | p
m     {(f:2, c:2, a:2), (f:1, c:1, a:1, b:1)}   {(f:3, c:3, a:3)} | m
b     {(f:1, c:1, a:1), (f:1), (c:1)}           Φ
a     {(f:3, c:3)}                              {(f:3, c:3)} | a
c     {(f:3)}                                   {(f:3)} | c
f     Φ                                         Φ
FP-Growth Properties
Property 3.2 : Prefix path property
To calculate the frequent patterns for a node ai in a path P, only the prefix subpath of node ai in P needs to be accumulated, and the frequency count of every node in the prefix path should carry the same count as node ai.
Lemma 3.1 : Fragment growth
Let α be an itemset in DB, B be α’s conditional pattern base, and β be an itemset in B. Then the support of α ∪ β in DB is equivalent to the support of β in B.
FP-Growth Properties (cont.)
Corollary 3.1 (Pattern growth)
Let α be a frequent itemset in DB, B be α’s conditional pattern base, and β be an itemset in B. Then α ∪ β is frequent in DB if and only if β is frequent in B.
Lemma 3.2 (Single FP-tree path pattern generation)
Suppose an FP-tree T has a single path P. The complete set of the frequent patterns of T can be generated by the enumeration of all the combinations of the subpaths of P, with the support being the minimum support of the items contained in the subpath.
Outline
Introduction
Constructing FP-Tree
Example 1
Performance Evaluation
Discussions
Performance Evaluation:
FP-Tree vs. Apriori
Scalability with Support Threshold
Performance Evaluation:
FP-Tree vs. Apriori (Cont.)
Runtime per frequent itemset actually decreases as
the support threshold decreases.
Performance Evaluation:
FP-Tree vs. Apriori (Cont.)
Scalability with DB size.
Outline
Introduction
Constructing FP-Tree
Example 1
Performance Evaluation
Discussions
Discussions
When database is extremely large.
Use FP-Tree on projected databases.
Materialization of an FP-Tree
Construct it once, independently of queries,
with a minimum support threshold low enough to fit the majority of expected queries.
Incremental updates of an FP-Tree.
Record the frequency count of every item, frequent or not.
Control rebuilding of the tree by a watermark on the support threshold.
Thank you!
Q & A.