
4.1) FP Growth Algorithm

The FP-Growth algorithm uses an FP-tree to efficiently mine frequent itemsets from transactional databases. It involves two steps: (1) building an FP-tree from the database by scanning it twice, and (2) mining the FP-tree by identifying conditional patterns and constructing conditional FP-trees.

Uploaded by

Sanjana Sairama
Copyright
© © All Rights Reserved

Association Analysis (3)

FP-Tree/FP-Growth Algorithm
• Uses a compressed representation of the database: the FP-tree.
• Once the FP-tree has been constructed, a recursive divide-and-conquer approach mines the frequent itemsets from it.

Building the FP-Tree


1. Scan the data to determine the support count of each item. Infrequent items are discarded, and the frequent items are sorted in decreasing order of support count.
2. Make a second pass over the data to construct the FP-tree. As each transaction is read, its items are sorted according to the above order before the transaction is inserted into the tree.
First scan: determine the frequent 1-itemsets, then build the header.

TID  Items
1    {A,B}
2    {B,C,D}
3    {A,C,D,E}
4    {A,D,E}
5    {A,B,C}
6    {A,B,C,D}
7    {B,C}
8    {A,B,C}
9    {A,B,D}
10   {B,C,E}

Item  Support count
B     8
A     7
C     7
D     5
E     3
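The first scan can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the names `first_scan` and `reorder` are hypothetical, the threshold `min_support = 2` is an assumption (the slides do not state one for this data set), and ties in support count are broken alphabetically, which happens to reproduce the order B, A, C, D, E used here.

```python
from collections import Counter

transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]
min_support = 2  # assumed threshold, not given on the slide


def first_scan(transactions, min_support):
    """Count item supports, drop infrequent items, sort by decreasing support."""
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    # Ties are broken alphabetically (an assumption).
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    return frequent, order


def reorder(transaction, order):
    """Keep only frequent items and sort them in header order."""
    rank = {item: pos for pos, item in enumerate(order)}
    return sorted((i for i in transaction if i in rank), key=rank.__getitem__)


frequent, order = first_scan(transactions, min_support)
ordered = [reorder(t, order) for t in transactions]
```

Here `ordered` holds the transactions in the form in which the second pass inserts them into the tree.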
FP-tree construction
After reading TID=1 ({A,B}):

null
  B:1
    A:1

After reading TID=2 ({B,C,D}):

null
  B:2
    A:1
    C:1
      D:1
FP-Tree Construction
FP-tree after reading all ten transactions of the transaction database:

null
  B:8
    A:5
      C:3
        D:1
      D:1
    C:3
      D:1
      E:1
  A:2
    C:1
      D:1
        E:1
    D:1
      E:1

Header table

Item  Count
B     8
A     7
C     7
D     5
E     3

Each header entry holds a pointer to the chain of nodes for its item. Chain pointers help in quickly finding all the paths of the tree containing some given item.
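The second pass can be sketched as below. This is a sketch under stated assumptions: the class `Node` and the functions `build_fp_tree` and `chain` are hypothetical names, and the transactions are written as strings whose characters are already sorted in header order (B, A, C, D, E).

```python
class Node:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> child Node
        self.link = None    # next node in the chain for the same item


def build_fp_tree(ordered_transactions):
    """Insert each ordered transaction as a path, sharing common prefixes."""
    root = Node(None, None)
    header = {}  # item -> first node of that item's chain
    tails = {}   # item -> last node of that item's chain
    for items in ordered_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                if item in tails:
                    tails[item].link = child  # extend the chain
                else:
                    header[item] = child      # first node for this item
                tails[item] = child
            child.count += 1
            node = child
    return root, header


def chain(header, item):
    """Follow the chain pointers to visit every node carrying `item`."""
    node = header.get(item)
    while node is not None:
        yield node
        node = node.link


ordered = ["BA", "BCD", "ACDE", "ADE", "BAC", "BACD", "BC", "BAC", "BAD", "BCE"]
root, header = build_fp_tree(ordered)
```

Summing `node.count` over `chain(header, item)` gives back the support count of the item, which is a quick consistency check on the tree.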
FP-Tree size
• The size of an FP-tree is typically smaller than the size of the uncompressed data because many transactions often share a few items in common.
• Best-case scenario:
– All transactions have the same set of items, and the FP-tree contains only a single branch of nodes.
• Worst-case scenario:
– Every transaction has a unique set of items.
– As none of the transactions have any items in common, the size of the FP-tree is effectively the same as the size of the original data.

• The size of an FP-tree also depends on how the items are ordered.
– If the ordering scheme in the preceding example is reversed, i.e., from lowest to highest support item, the resulting FP-tree is probably denser (shown in next slide).
• Not always, though: ordering is just a heuristic.
An FP-tree representation for the data set with a different item ordering scheme.
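The effect of the ordering can be checked by counting tree nodes under both schemes. The sketch below uses a nested dictionary as a stand-in for the FP-tree (counts and pointers are left out because only the number of nodes matters here); `count_nodes` is a hypothetical helper.

```python
def count_nodes(transactions, order):
    """Number of nodes (excluding the root) of the prefix tree built with `order`."""
    rank = {item: pos for pos, item in enumerate(order)}
    root = {}
    nodes = 0
    for t in transactions:
        level = root
        for item in sorted(t, key=rank.__getitem__):
            if item not in level:
                level[item] = {}
                nodes += 1
            level = level[item]
    return nodes


transactions = ["AB", "BCD", "ACDE", "ADE", "ABC", "ABCD", "BC", "ABC", "ABD", "BCE"]
descending = "BACDE"          # highest support first, as in the example
ascending = descending[::-1]  # lowest support first

nodes_descending = count_nodes(transactions, descending)
nodes_ascending = count_nodes(transactions, ascending)
```

For the ten example transactions the descending order gives 14 nodes and the reversed order gives 20, so the reversed ordering does produce the larger tree in this case.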
FP-Growth (I)
• FP-growth generates frequent itemsets from an FP-tree by exploring the tree in a bottom-up fashion.
• Given the example tree, the algorithm looks for frequent itemsets ending in E first, followed by D, C, A, and finally, B.
• Since every transaction is mapped onto a path in the FP-tree, we can derive the frequent itemsets ending with a particular item, say, E, by examining only the paths containing node E.
• These paths can be accessed rapidly using the pointers associated with node E.
Paths containing node E
null → B → C → E:1
null → A → C → D → E:1
null → A → D → E:1
Conditional FP-Tree for E
• We now need to build a conditional FP-tree for E, which is the tree of itemsets ending in E.
• It is not simply the tree obtained on the previous slide by deleting nodes from the original tree.
• Why? Because the order of the items changes.
– In this example, C has a higher count than B.
Conditional FP-Tree for E
The set of paths containing E, each truncated before E and carrying the count of its E node:

B, C     count 1
A, C, D  count 1
A, D     count 1

The new header, counted over these paths:

Item  Count
A     2
C     2
D     2
B     1

Insert each path (after truncating E) into a new tree, with its items sorted in the new order; items that are no longer frequent are left out. Adding up the counts for D we get 2, so {E,D} is a frequent itemset.

We continue recursively.
Base of recursion: when the tree has a single path only.
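The paths and the new counts for E can be reproduced with a short sketch. As a shortcut it reads the prefixes from the ordered transactions instead of following the chain pointers of the tree; both give the same set of prefix paths, since every transaction maps onto one path. The function names are hypothetical.

```python
from collections import Counter

# Transactions with items already sorted in header order (B, A, C, D, E).
ordered = ["BA", "BCD", "ACDE", "ADE", "BAC", "BACD", "BC", "BAC", "BAD", "BCE"]


def conditional_pattern_base(ordered_transactions, item):
    """Prefix paths that end just before `item`, with their counts."""
    base = Counter()
    for t in ordered_transactions:
        if item in t:
            prefix = tuple(t[: t.index(item)])
            if prefix:
                base[prefix] += 1
    return base


def conditional_counts(base):
    """Support of each item within the conditional pattern base."""
    counts = Counter()
    for prefix, n in base.items():
        for i in prefix:
            counts[i] += n
    return counts


base_e = conditional_pattern_base(ordered, "E")
counts_e = conditional_counts(base_e)
```

`counts_e["D"]` is the support count of {E,D}, which is the sum taken over the D nodes in the text above.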
FP-Tree Another Example
Minimum support count: 2.

Frequent 1-itemsets: A:8, C:8, E:8, G:5, B:2, D:2, F:2

Transactions, and the same transactions with items sorted based on frequencies and the infrequent items ignored:

Transaction  Sorted
ABCEFO       ACEBF
ACG          ACG
EI           E
ACDEG        ACEGD
ACEGL        ACEG
EJ           E
ABCEFP       ACEBF
ACD          ACD
ACEGM        ACEG
ACEGN        ACEG
FP-Tree after reading 1st transaction
null
  A:1
    C:1
      E:1
        B:1
          F:1
FP-Tree after reading 2nd transaction
null
  A:2
    C:2
      E:1
        B:1
          F:1
      G:1
FP-Tree after reading 3rd transaction
null
  A:2
    C:2
      E:1
        B:1
          F:1
      G:1
  E:1
FP-Tree after reading 4th transaction
null
  A:3
    C:3
      E:2
        B:1
          F:1
        G:1
          D:1
      G:1
  E:1
FP-Tree after reading 5th transaction
null
  A:4
    C:4
      E:3
        B:1
          F:1
        G:2
          D:1
      G:1
  E:1
FP-Tree after reading 6th transaction
null
  A:4
    C:4
      E:3
        B:1
          F:1
        G:2
          D:1
      G:1
  E:2
FP-Tree after reading 7th transaction
null
  A:5
    C:5
      E:4
        B:2
          F:2
        G:2
          D:1
      G:1
  E:2
FP-Tree after reading 8th transaction
null
  A:6
    C:6
      E:4
        B:2
          F:2
        G:2
          D:1
      G:1
      D:1
  E:2
FP-Tree after reading 9th transaction
null
  A:7
    C:7
      E:5
        B:2
          F:2
        G:3
          D:1
      G:1
      D:1
  E:2
FP-Tree after reading 10th transaction
null
  A:8
    C:8
      E:6
        B:2
          F:2
        G:4
          D:1
      G:1
      D:1
  E:2
Conditional FP-Trees
Build the conditional FP-Tree for each of the items.
For this:

1. Find the paths containing the item in focus. With those paths we build the conditional FP-tree for the item.
2. Read the tree again to determine the new counts of the items along those paths. Build a new header.
3. Insert the paths in the conditional FP-tree according to the new order.
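Steps 2 and 3 can be sketched as one function. The sketch assumes step 1 has already been done, so the paths for the focus item arrive as a mapping from prefix path to count. The name `conditional_tree_input` is hypothetical, ties in the new header are broken alphabetically (an assumption), and the sample input is the pair of prefix paths that end in D in the tree built above (A, C, E, G and A, C).

```python
from collections import Counter


def conditional_tree_input(base, min_support):
    """Return the new header and the reordered paths for a conditional FP-tree.

    `base` maps a prefix path (tuple of items) to the count of the focus
    item found at the end of that path.
    """
    # Step 2: new counts of the items along the paths.
    counts = Counter()
    for prefix, n in base.items():
        for item in prefix:
            counts[item] += n
    # New header: frequent items only, highest count first.
    header = sorted((i for i in counts if counts[i] >= min_support),
                    key=lambda i: (-counts[i], i))
    rank = {item: pos for pos, item in enumerate(header)}
    # Step 3: rewrite each path in the new order, dropping infrequent items.
    paths = Counter()
    for prefix, n in base.items():
        kept = tuple(sorted((i for i in prefix if i in rank), key=rank.__getitem__))
        if kept:
            paths[kept] += n
    return [(i, counts[i]) for i in header], paths


base_d = {("A", "C", "E", "G"): 1, ("A", "C"): 1}
header_d, paths_d = conditional_tree_input(base_d, min_support=2)
```

The paths in `paths_d` are what gets inserted into the conditional FP-tree, using the same insertion procedure as for the original tree.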
Conditional FP-Tree for F
There is only a single path containing F:

null → A:8 → C:8 → E:6 → B:2 → F:2

The same path after updating the counts and truncating F:

null → A:2 → C:2 → E:2 → B:2

New header: A:2, C:2, E:2, B:2


Recursion
• We continue recursively on the conditional FP-tree for F.
• However, when the tree is just a single path it is the base case for the recursion.
• So, we just produce all the subsets of the items on this path merged with F.

Conditional FP-tree for F: null → A:2 → C:2 → E:2 → B:2

{F} {A,F} {C,F} {E,F} {B,F}
{A,C,F}, …,
{A,C,E,F}
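The base case can be sketched with `itertools.combinations`: every subset of the items on the single path, including the empty one, is merged with the suffix item. The function name `itemsets_from_single_path` is hypothetical.

```python
from itertools import combinations


def itemsets_from_single_path(path, suffix):
    """All subsets of the items on a single path, each merged with the suffix."""
    result = []
    for k in range(len(path) + 1):
        for subset in combinations(path, k):
            result.append(frozenset(subset) | frozenset(suffix))
    return result


# Single path of the conditional FP-tree for F: A, C, E, B.
found = itemsets_from_single_path(["A", "C", "E", "B"], ["F"])
```

A path with four items yields 2^4 = 16 itemsets ending in F. Every node on this path has count 2, so each of these itemsets has support count 2.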
Conditional FP-Tree for D
Paths containing D:

null → A:8 → C:8 → E:6 → G:4 → D:1
null → A:8 → C:8 → D:1

Paths containing D after updating the counts:

null
  A:2
    C:2
      E:1
        G:1
          D:1
      D:1

New header: A:2, C:2
The other items (E and G) are removed as infrequent.

Conditional FP-tree for D: null → A:2 → C:2

The tree is just a single path; it is the base case for the recursion. So, we just produce all the subsets of the items on this path merged with D.

{D} {A,D} {C,D} {A,C,D}
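The whole recursion can be sketched compactly for checking results by hand. This sketch is a simplification and not the FP-tree algorithm itself: it keeps each conditional database as a mapping from ordered path to count instead of compressing it into a tree, and it recurses all the way down rather than stopping at single-path trees. The frequent itemsets it finds are the same. The function name `mine` is hypothetical.

```python
from collections import Counter


def mine(base, min_support, suffix=()):
    """Return {itemset: support} for all frequent itemsets ending in `suffix`.

    `base` maps a tuple of items (in the global frequency order) to a count.
    """
    counts = Counter()
    for items, n in base.items():
        for item in items:
            counts[item] += n
    result = {}
    for item, support in counts.items():
        if support < min_support:
            continue  # infrequent in this conditional database
        result[frozenset(suffix + (item,))] = support
        # Conditional database for `item`: the prefixes that precede it.
        cond = Counter()
        for items, n in base.items():
            if item in items:
                prefix = items[: items.index(item)]
                if prefix:
                    cond[prefix] += n
        result.update(mine(cond, min_support, suffix + (item,)))
    return result


# Sorted transactions of the second example (order A, C, E, G, B, D, F).
ordered = ["ACEBF", "ACG", "E", "ACEGD", "ACEG", "E", "ACEBF", "ACD", "ACEG", "ACEG"]
base = Counter(tuple(t) for t in ordered)
frequent = mine(base, min_support=2)
```

The itemsets containing D in `frequent` are {D}, {A,D}, {C,D} and {A,C,D}, each with support count 2, matching the conditional FP-tree for D.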
Exercise: Complete the example.
