FP Growth

The FP-Growth algorithm is a method for mining frequent patterns without candidate generation, consisting of two phases: frequent itemset generation and rule generation. It constructs a compact FP-tree from transaction data and mines frequent itemsets by exploring the tree in a bottom-up manner, utilizing conditional pattern bases. The algorithm efficiently identifies frequent itemsets and generates valid association rules based on the mined data.

Uploaded by

Jithin S

Frequent Pattern-Growth (FP-Growth) Approach: Mining Frequent Patterns Without Candidate Generation

FP-Growth Algorithm

• Like the Apriori algorithm, the FP-Growth algorithm follows a two-step procedure for association rule mining:

Phase 1: Frequent Itemset Generation

Phase 2: Rule Generation from Frequent Itemsets

• The FP-Growth algorithm differs from the Apriori algorithm in the way it generates the frequent itemsets.

• The FP-Growth algorithm generates all the frequent itemsets without candidate generation.

Phase 1: Frequent Itemset Generation in the FP-Growth Algorithm

• The FP-Growth algorithm follows a two-step approach to generate frequent itemsets from the given set of transactions:

1. It compresses a large database of transactions into a compact Frequent-Pattern tree (FP-tree) structure. The FP-tree retains the itemset association information.

2. It then divides the compressed database into a set of conditional pattern bases, each associated with one frequent item or “pattern fragment”, and mines each such database separately to extract frequent itemsets.

Step 1 of Phase 1: FP-Tree Construction

• The FP-tree is constructed using two passes over the given transaction database.

Pass 1:
i) Scan each transaction and find the support count (or support) of each item.
ii) Discard infrequent items.
iii) Sort the frequent items in decreasing order of their support count (or support).
The resulting ordered list is known as the L-list.

Use this ordered list or L-list in pass 2 to build the FP-Tree so that common prefixes can be
shared.
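
Pass 1 can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `build_l_list`, the toy transactions, and the tie-breaking rule (alphabetical order among items with equal support) are all assumptions.

```python
from collections import Counter

def build_l_list(transactions, min_support_count):
    """Pass 1: count items, discard infrequent ones, sort by descending support."""
    # set(t) ensures an item is counted at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_support_count]
    # ASSUMPTION: ties are broken alphabetically; the slides do not specify a rule.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return frequent

# Hypothetical toy transactions (not the database used in the slides).
toy = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b", "d"]]
l_list = build_l_list(toy, min_support_count=2)
print(l_list)  # item "d" occurs once and is discarded
```

Any consistent tie-breaking rule works, as long as the same order is used for every transaction in pass 2.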

Pass 2:
(Each node in the FP-tree corresponds to an item in the given transactions, and each node has a counter that records the number of transactions mapped onto the path through that node.)
i) The FP-Growth algorithm processes one transaction at a time, in L-list order, and maps it to a path in the FP-tree.
ii) Because a fixed order is used, paths overlap when transactions share a prefix (the same leading items).
– In this case, the counters of the shared nodes are incremented.
iii) Pointers are maintained between nodes containing the same item, creating singly linked lists (shown as dotted lines in the FP-tree).
– The more paths that overlap, the higher the compression.
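
The three points of pass 2 can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `FPNode`, `build_fp_tree`, `header`, and `chain_counts` are hypothetical, the toy transactions are not from the slides, and each transaction is assumed to be already filtered and sorted in L-list order.

```python
class FPNode:
    def __init__(self, item, parent=None):
        self.item = item            # item label (None for the root)
        self.count = 0              # number of transactions mapped through this node
        self.parent = parent
        self.children = {}          # item -> child FPNode
        self.next_same_item = None  # pointer to the next node holding the same item

def build_fp_tree(ordered_transactions):
    """Pass 2: map each L-ordered transaction onto a path, sharing common prefixes."""
    root = FPNode(None)
    header = {}  # item -> first node of that item's linked list
    tails = {}   # item -> last node of that list, for constant-time appends
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                # No shared prefix from here on: create a node and link it
                # into the singly linked list of nodes for this item.
                child = FPNode(item, parent=node)
                node.children[item] = child
                if item in tails:
                    tails[item].next_same_item = child
                else:
                    header[item] = child
                tails[item] = child
            child.count += 1  # shared or new, the counter goes up by one
            node = child
    return root, header

def chain_counts(header, item):
    """Follow the same-item pointers and collect the counter of each node."""
    counts, node = [], header.get(item)
    while node is not None:
        counts.append(node.count)
        node = node.next_same_item
    return counts

# Hypothetical transactions, already in L-list order (b, a, c).
toy_ordered = [["b", "a"], ["b", "c"], ["b", "a", "c"], ["b"]]
root, header = build_fp_tree(toy_ordered)
print(root.children["b"].count)   # all four transactions share the prefix b
print(chain_counts(header, "c"))  # c appears on two different paths
```

Here item `c` ends up in two separate nodes (under `b` and under `b, a`), which is the situation the same-item pointers are meant to handle.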
FP Growth Algorithm - Example

Consider the following transaction database (D). Let's assume that the minimum support count = 2 and the minimum confidence = 70%. Using the FP-Growth algorithm, (i) find all frequent itemsets and (ii) list all valid association rules.
Step 1 of Phase 1: FP-Tree Construction
Pass 1:
i) Scan each transaction and find the support count (or support) of each item.
ii) Discard infrequent items.
iii) Sort the frequent items in decreasing order of their support count (or support).
The resulting ordered list is known as the L-list.

• Here, the L-list is: {{I2: 7}, {I1: 6}, {I3: 6}, {I4: 2}, {I5: 2}}
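
The L-list above can be checked with a short script. The transaction table itself is not reproduced in this text, so the database `D` below is an assumption: it is the well-known nine-transaction textbook example, whose first two transactions and item counts agree with the values quoted in these slides. The identifiers T3 to T9 and the alphabetical tie-break between I1 and I3 are also assumptions.

```python
from collections import Counter

# ASSUMPTION: the nine-transaction textbook database; only T1 and T2 are
# confirmed by the text of the slides.
D = {
    "T1": ["I1", "I2", "I5"],
    "T2": ["I2", "I4"],
    "T3": ["I2", "I3"],
    "T4": ["I1", "I2", "I4"],
    "T5": ["I1", "I3"],
    "T6": ["I2", "I3"],
    "T7": ["I1", "I3"],
    "T8": ["I1", "I2", "I3", "I5"],
    "T9": ["I1", "I2", "I3"],
}
MIN_SUPPORT_COUNT = 2

# Support count of each item across all transactions.
counts = Counter(item for items in D.values() for item in items)

# Keep frequent items, sorted by descending count (ties broken by item name).
l_list = sorted(
    ((item, c) for item, c in counts.items() if c >= MIN_SUPPORT_COUNT),
    key=lambda pair: (-pair[1], pair[0]),
)
print(l_list)
```

With this data every item meets the minimum support count of 2, so nothing is discarded in step ii).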

Pass 2:

• Create the root of the FP-tree and label it NULL or {}.

• The items in each transaction are then processed in L-list order (i.e., sorted in descending order of support count), and a branch is created for each transaction.

• The scan of the first transaction, “T1: I1, I2, I5”, which contains three items (I2, I1, I5 in L-list order), leads to the construction of the first branch of the tree with three nodes, <I2: 1>, <I1: 1>, and <I5: 1>, where I2 is linked as a child of the root, I1 is linked to I2, and I5 is linked to I1.
After processing T1, the FP-tree consists of a single branch: {} → <I2: 1> → <I1: 1> → <I5: 1>.

• The second transaction, T2, contains the items I2 and I4 in L-list order, which would result in a branch where I2 is linked to the root and I4 is linked to I2.

• However, this branch would share a common prefix, I2, with the existing path for T1.

• Therefore, we increment the count of the I2 node by 1 and create a new node, <I4: 1>, which is linked as a child of <I2: 2>.

• In general, when considering the branch to be added for a transaction, the count of each node along a common prefix is incremented by 1, and nodes for the items following the prefix are created and linked accordingly.
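
The insertion of T1 and T2 described above can be traced with a small sketch. It uses nested dictionaries rather than a full tree class and omits the same-item pointers. The helper names `new_node` and `insert` are hypothetical, and each transaction is assumed to contain only frequent items.

```python
# L-list order taken from the example: I2, I1, I3, I4, I5.
L_ORDER = ["I2", "I1", "I3", "I4", "I5"]
RANK = {item: position for position, item in enumerate(L_ORDER)}

def new_node():
    return {"count": 0, "children": {}}

def insert(root, transaction):
    """Add one transaction: increment counts along the shared prefix,
    then create nodes for the remaining items."""
    node = root
    # Sort the items into L-list order (assumes every item is in L_ORDER).
    for item in sorted(transaction, key=RANK.__getitem__):
        child = node["children"].setdefault(item, new_node())
        child["count"] += 1
        node = child

root = new_node()                 # plays the role of the NULL / {} root
insert(root, ["I1", "I2", "I5"])  # T1 becomes the path I2 -> I1 -> I5
insert(root, ["I2", "I4"])        # T2 shares the prefix I2, then adds I4

i2 = root["children"]["I2"]
print(i2["count"])                                          # 2
print({k: v["count"] for k, v in i2["children"].items()})   # {'I1': 1, 'I4': 1}
```

The I2 node is shared by both transactions, so its count reaches 2, while I1, I5, and I4 each keep a count of 1.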
Step 1 of Phase 1: FP-Tree Construction

Pass 2: