
FP Growth

The FP-Growth algorithm is a method for mining frequent patterns without candidate generation, consisting of two phases: frequent itemset generation and rule generation. It constructs a compact FP-tree from transaction data and mines frequent itemsets by exploring the tree in a bottom-up manner, utilizing conditional pattern bases. The algorithm efficiently identifies frequent itemsets and generates valid association rules based on the mined data.

Uploaded by Jithin S
Copyright © All Rights Reserved

Frequent Pattern-Growth (FP-Growth) Approach: Mining Frequent Patterns Without Candidate Generation
FP-Growth Algorithm

• Like the Apriori algorithm, the FP-Growth algorithm follows a two-step procedure for association rule mining:

Phase 1: Frequent Itemset Generation

Phase 2: Rule Generation from Frequent Itemsets

• The FP-Growth algorithm differs from the Apriori algorithm in the way it generates the frequent itemsets.

• The FP-Growth algorithm generates all the frequent itemsets without candidate generation.
Phase 1: Frequent Itemset Generation in the FP-Growth Algorithm

• The FP-Growth algorithm follows a two-step approach to generate frequent itemsets from the given set of transactions:

1. It compresses a large database of transactions into a compact Frequent-Pattern tree (FP-tree) structure. The FP-tree retains the itemset association information.

2. It then divides the compressed database into a set of conditional pattern bases, each associated
with one frequent item or “pattern fragment”, and then mines each such database separately to
extract frequent itemsets.
Step 1 of Phase 1: FP-Tree Construction

• The FP-tree is constructed using two passes over the given transaction database.

Pass 1:
i) Scan each transaction and find the support count or support for each item.
ii) Discard infrequent items.
iii) Sort frequent items in decreasing order of their support count or support.
The resulting sorted list is known as the L-list.

Pass 2 uses this L-list to build the FP-tree, so that common prefixes can be shared.
Step 1 of Phase 1: FP-Tree Construction

Pass 2:
(Each node in the FP-tree corresponds to a particular item in the given transactions and has a counter that indicates the number of transactions mapped onto the path through that node.)
i) The FP-Growth algorithm processes one transaction at a time, with its items in L-list order, and maps it to a path in the FP-tree.
ii) Because a fixed order is used, paths overlap when transactions share items (i.e., when they have the same prefix).
– In this case, the counters along the shared prefix are incremented.
iii) Pointers are maintained between nodes containing the same item, creating singly linked lists (node-links; these pointers are drawn as dotted lines in the FP-tree).
– The more paths overlap, the higher the compression.
FP Growth Algorithm - Example

Consider the following transaction database (D). Let's assume that the minimum support count is 2 and the minimum confidence is 70%. Then, using the FP-Growth algorithm, (i) find all frequent itemsets and (ii) list all valid association rules.
[Table: transaction database D (not reproduced)]
Step 1 of Phase 1: FP-Tree Construction
Pass 1:
i) Scan each transaction and find the support count or support for each item.
ii) Discard infrequent items.
iii) Sort frequent items in decreasing order of their support count or support.
The resulting sorted list is known as the L-list.

• Here, the L-list is: {{I2:7}, {I1:6}, {I3:6}, {I4:2}, {I5:2}}
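Pass 1 on this example can be sketched in a few lines of Python. The transaction table itself is not reproduced here, so the nine transactions below are an assumption: T1 and T2 agree with the text, and the rest are chosen so that the supports reproduce the L-list above. `build_l_list` is a hypothetical helper name. Ties (I1 and I3 both have count 6) are broken alphabetically; any fixed order works as long as pass 2 uses the same one.

```python
from collections import Counter

# Hypothetical reconstruction of the missing transaction table: T1 and T2 match
# the text, the rest are assumed so that the supports match the L-list.
TRANSACTIONS = [
    ["I1", "I2", "I5"],        # T1
    ["I2", "I4"],              # T2
    ["I2", "I3"],              # T3
    ["I1", "I2", "I4"],        # T4
    ["I1", "I3"],              # T5
    ["I2", "I3"],              # T6
    ["I1", "I3"],              # T7
    ["I1", "I2", "I3", "I5"],  # T8
    ["I1", "I2", "I3"],        # T9
]
MIN_SUP = 2


def build_l_list(transactions, min_sup):
    """Pass 1: count supports, drop infrequent items, sort by descending support."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_sup]
    # Ties (e.g. I1 and I3, both 6) are broken alphabetically.
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


print(build_l_list(TRANSACTIONS, MIN_SUP))
# → [('I2', 7), ('I1', 6), ('I3', 6), ('I4', 2), ('I5', 2)]
```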


Step 1 of Phase 1: FP-Tree Construction
Pass 2:

• Create the root of the FP-tree and label it as NULL or {}.

• The items in each transaction are then processed in L-list order (i.e., sorted in descending order of support count), and a branch is created for each transaction.

• The scan of the first transaction, “T1: I1, I2, I5”, which contains three items (I2, I1, I5 in L-order), leads to the construction of the first branch of the tree with three nodes, <I2:1>, <I1:1> and <I5:1>, where I2 is linked as a child of the root, I1 is linked to I2, and I5 is linked to I1.
After processing T1, the tree is the single branch {} → <I2:1> → <I1:1> → <I5:1>.
• The second transaction, T2, contains the items I2 and I4 in L-order, which would result in a branch where I2 is linked to the root and I4 is linked to I2.

• However, this branch shares a common prefix, I2, with the existing path for T1.
• Therefore, we increment the count of the I2 node by 1 and create a new node, <I4:1>, which is linked as a child of <I2:2>.

• In general, when considering the branch to be added for a transaction, the count of each node along a common prefix is incremented by 1, and nodes for the items following the prefix are created and linked accordingly.
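The insertion rule above can be sketched in Python as follows. `FPNode`, `insert_transaction` and `dump` are hypothetical names used only for illustration. Only T1 and T2, already written in L-order as in the text, are inserted, so the result should match the shared-prefix case just described.

```python
class FPNode:
    """One FP-tree node: item label, counter, parent link and children keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def insert_transaction(root, ordered_items):
    """Map one transaction, already sorted in L-list order, onto a path from the root."""
    node = root
    for item in ordered_items:
        child = node.children.get(item)
        if child is None:  # prefix no longer shared: create and link a new node
            child = FPNode(item, parent=node)
            node.children[item] = child
        child.count += 1  # every node on the path (shared or new) counts this transaction
        node = child


def dump(node, depth=0):
    """Print the subtree below `node` as indented <item:count> labels."""
    for child in node.children.values():
        print("  " * depth + f"<{child.item}:{child.count}>")
        dump(child, depth + 1)


root = FPNode(None)                           # the NULL / {} root
insert_transaction(root, ["I2", "I1", "I5"])  # T1 in L-order
insert_transaction(root, ["I2", "I4"])        # T2 in L-order
dump(root)
```

Running it prints <I2:2> with two child branches, <I1:1> → <I5:1> and <I4:1>.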
After processing T2, the root's child <I2:2> has two branches: <I1:1> → <I5:1> and <I4:1>.

[Figures: the FP-tree after processing T3, T4, T5, T6 and T9 (not reproduced)]
• To facilitate tree traversal, an item header table is built so that each item points to its occurrences in the tree via a chain of node-links.
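A sketch of the complete two-pass build with an item header table follows. The transaction table is not reproduced here, so the nine transactions are an assumed reconstruction that agrees with T1, T2 and the L-list in the text. Each new node is appended to its item's node-link chain through a `link` attribute. `build_fp_tree` and `occurrences` are illustrative names, not a library API.

```python
from collections import Counter

# Hypothetical reconstruction of the missing transaction table (T1..T9).
TRANSACTIONS = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
MIN_SUP = 2


class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> child FPNode
        self.link = None    # node-link: next node holding the same item


def build_fp_tree(transactions, min_sup):
    """Two passes: L-list first, then insert each transaction in L-list order."""
    counts = Counter(item for t in transactions for item in set(t))
    order = sorted((i for i in counts if counts[i] >= min_sup),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = FPNode(None)                      # the NULL / {} root
    header = {item: None for item in order}  # item header table: head of each chain
    tails = {}                               # last node of each chain, for O(1) appends
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                if header[item] is None:
                    header[item] = child
                else:
                    tails[item].link = child
                tails[item] = child
            child.count += 1
            node = child
    return root, header, order


def occurrences(header, item):
    """Follow the item's node-link chain starting at the header table."""
    found, node = [], header[item]
    while node is not None:
        found.append(node)
        node = node.link
    return found


root, header, order = build_fp_tree(TRANSACTIONS, MIN_SUP)
for item in order:
    print(item, [n.count for n in occurrences(header, item)])
```

With these assumed transactions, summing the counts along each chain gives back the item's support from the L-list; for example, the two I1 nodes hold 4 and 2.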
Step 2 of Phase 1: Mining frequent itemsets from the FP-tree

• A frequent length-1 pattern is a single item from the dataset that satisfies the min_sup threshold. For example, in the given transaction database, each of the items I1, I2, I3, I4 and I5 is a frequent length-1 pattern.

• A suffix pattern is the pattern being considered for mining, starting from the simplest case of a frequent length-1 pattern. In this example, each of the length-1 patterns is treated as a suffix pattern in turn.

• For a given suffix pattern, its conditional pattern base is constructed. This is a sub-database consisting of all the paths in the FP-tree that co-occur with the suffix pattern (known as prefix paths). These prefix paths show how often the suffix pattern appears together with other items in the transactions.

• The conditional FP-tree is a smaller FP-tree built only from the items in the conditional pattern base that remain frequent, keeping track of their counts.

• The FP-growth algorithm generates frequent itemsets from an FP-tree by exploring the tree in a bottom-up fashion.

• The algorithm iterates over the item header table in reverse order.
• So, it first finds all the frequent itemsets ending in I5, then those ending in I4, I3, I1 and finally I2.
• Since every transaction is mapped onto a path in the FP-tree, the frequent itemsets ending with a particular item, say I5, can be derived by examining only the paths that contain I5 as the last node. These paths can be accessed rapidly by following the node-links associated with I5.
• The item currently being examined in the item header table is called the suffix pattern.

• Since the item header table is iterated in reverse order, the algorithm first searches for the frequent itemsets that end with I5 (i.e., the suffix pattern is I5).
• So, we gather all the paths in the FP-tree that start at the root and end at the suffix I5, excluding the root and the suffix itself. These paths are known as the prefix paths of I5. Since the item header table contains only frequent items, I5 itself is frequent, so itemsets ending with I5 are worth examining, although each of them must still meet the minimum support count.
• The occurrences of I5 can easily be found by following its chain of node-links from the item header table.

• Considering I5 as the suffix pattern, its two prefix paths are <I2, I1> and <I2, I1, I3>.
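Extracting prefix paths can be sketched as a walk from each I5 node, reached through the header table, up its parent pointers to the root. The L-list order comes from the text; the transactions are an assumed reconstruction of the table that is not reproduced here, and a plain Python list stands in for each node-link chain. `build` and `prefix_paths` are hypothetical helper names.

```python
# Hypothetical reconstruction of the missing transaction table (T1..T9).
TRANSACTIONS = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
# L-list order as given in the text.
L_ORDER = ["I2", "I1", "I3", "I4", "I5"]


class FPNode:
    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build(transactions, order):
    """Build the FP-tree; header[item] lists that item's nodes in creation order."""
    rank = {item: r for r, item in enumerate(order)}
    root = FPNode(None, None)
    header = {item: [] for item in order}
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header


def prefix_paths(header, suffix):
    """For each node of `suffix`, climb parent pointers, excluding the root and the suffix."""
    paths = []
    for node in header[suffix]:
        path, parent = [], node.parent
        while parent.item is not None:  # stop at the NULL root
            path.append((parent.item, parent.count))
            parent = parent.parent
        paths.append((list(reversed(path)), node.count))
    return paths


root, header = build(TRANSACTIONS, L_ORDER)
for path, suffix_count in prefix_paths(header, "I5"):
    print(path, "-> I5:", suffix_count)
```

With these transactions it reports <I2:7, I1:4> and <I2:7, I1:4, I3:2>, each ending in an I5 node with count 1: the two prefix paths named above, still carrying their untransformed counts.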

• The next step is to update the counts of the nodes in each prefix path so that they only count transactions that contain I5. For example, in the path <I2:7, I1:4, I3:2, I5:1>, the counts of I2, I1 and I3 also include transactions that do not contain I5 (for example, those mapped only onto <I2, I1>). We therefore give every node in a prefix path the count of the I5 node at its end, propagating that count to all of its ancestors up to the root.
• Updating all the prefix paths of I5 in this way yields the transformed prefix paths.

• Thus, for I5, the transformed prefix paths are:

<I2:1, I1:1> and <I2:1, I1:1, I3:1>

• Together, these transformed prefix paths form I5's conditional (sub-)pattern base.
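Once each prefix path is paired with the count of the suffix node that ends it, the count transformation is a one-liner. The input below is copied from the I5 example in the text; `transform` is a hypothetical helper name.

```python
# Prefix paths of I5 as read off the FP-tree in the text, each paired with
# the count of the I5 node that ends it.
raw_prefix_paths = [
    ([("I2", 7), ("I1", 4)], 1),
    ([("I2", 7), ("I1", 4), ("I3", 2)], 1),
]


def transform(paths):
    """Give every prefix node the count of the suffix node below it."""
    return [[(item, suffix_count) for item, _ in path] for path, suffix_count in paths]


conditional_pattern_base = transform(raw_prefix_paths)
print(conditional_pattern_base)
# → [[('I2', 1), ('I1', 1)], [('I2', 1), ('I1', 1), ('I3', 1)]]
```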

• The next step is to accumulate the conditional pattern base of I5 into I5's conditional FP-tree.

• The conditional pattern base is accumulated by summing, for each item, its counts across all the transformed prefix paths, and then eliminating every item whose accumulated count is less than the minimum support count.

• Thus, the conditional pattern base for I5 (i.e., <I2:1, I1:1> and <I2:1, I1:1, I3:1>) accumulates to <I2:2, I1:2>. I3 is discarded because its accumulated count is 1, which is less than the minimum support count of 2.

• This accumulated conditional pattern base of I5 forms the conditional FP-tree of I5.
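Accumulating a conditional pattern base amounts to summing each item's counts and pruning items below the minimum support count. This flat sketch (hypothetical helper `accumulate`) uses the I5 base from the text. A flat sum is enough here because the surviving items form a single path; when the base branches, as it does for I3, the counts have to be kept in a tree instead.

```python
from collections import Counter

MIN_SUP = 2
# Conditional pattern base of I5 as given in the text.
cond_pattern_base = [
    [("I2", 1), ("I1", 1)],
    [("I2", 1), ("I1", 1), ("I3", 1)],
]


def accumulate(base, min_sup):
    """Sum each item's counts over the base and keep the items meeting min_sup."""
    totals = Counter()
    for path in base:
        for item, count in path:
            totals[item] += count
    return {item: c for item, c in totals.items() if c >= min_sup}


print(accumulate(cond_pattern_base, MIN_SUP))
# → {'I2': 2, 'I1': 2}
```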

• All frequent itemsets for the suffix pattern I5 are generated by combining I5 with every non-empty combination of the items in its conditional FP-tree, <I2:2, I1:2>.

• This results in the following frequent patterns:

<I2, I5:2>, <I1, I5:2> and <I2, I1, I5:2>
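Because I5's conditional FP-tree is a single path, its frequent patterns can be enumerated directly: every non-empty combination of path nodes plus the suffix is frequent, with support equal to the smallest count among the chosen nodes. A sketch of this single-path case, using the path from the text (`patterns_from_single_path` is a hypothetical name):

```python
from itertools import combinations

suffix = "I5"
# Conditional FP-tree of I5 from the text: the single path <I2:2, I1:2>.
single_path = [("I2", 2), ("I1", 2)]


def patterns_from_single_path(path, suffix):
    """Every non-empty combination of path nodes, plus the suffix, is frequent;
    its support is the smallest count among the chosen nodes."""
    result = []
    for size in range(1, len(path) + 1):
        for combo in combinations(path, size):
            items = [item for item, _ in combo] + [suffix]
            support = min(count for _, count in combo)
            result.append((items, support))
    return result


for items, support in patterns_from_single_path(single_path, suffix):
    print(items, support)
```

It prints ['I2', 'I5'] 2, ['I1', 'I5'] 2 and ['I2', 'I1', 'I5'] 2, matching the three patterns above. The minimum rule only holds for a single path; a branching conditional tree must be mined recursively.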



• The same procedure is applied to the suffixes I4, I3 and I1 to generate the frequent itemsets ending with each of them.

• Note: The conditional pattern base for I3 contains two paths from the left subtree of the FP-tree ({I2, I1:2} and {I2:2}) and one path from the right subtree ({I1:2}). This must be kept in mind when the conditional pattern base is accumulated: {I2:2} shares the prefix I2 with {I2, I1:2} and can be merged with it, but {I1:2} cannot be merged with {I2, I1:2} because it does not start with I2. I3's conditional FP-tree therefore has two branches, <I2:4, I1:2> and <I1:2>.

• Note: I2 is not used as a suffix pattern because, as the first item in the L-list, its node has no prefix path at all.
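Putting the steps together, the recursive procedure can be sketched in Python. This is an illustrative implementation, not the slides' own code. Each call builds a (conditional) FP-tree from weighted paths, walks its header table bottom-up, and recurses on every item's conditional pattern base. The transactions are an assumed reconstruction of the table that is not reproduced here, and ties in each L-list are broken alphabetically. `build_tree` and `fp_growth` are hypothetical names.

```python
from collections import Counter

# Hypothetical reconstruction of the missing transaction table (T1..T9).
TRANSACTIONS = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
MIN_SUP = 2


class FPNode:
    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_tree(weighted_paths, min_sup):
    """Build an FP-tree from (items, count) pairs; return header table, L-list order, supports."""
    support = Counter()
    for items, count in weighted_paths:
        for item in set(items):
            support[item] += count
    order = sorted((i for i in support if support[i] >= min_sup),
                   key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = FPNode(None, None)
    header = {item: [] for item in order}  # a list stands in for each node-link chain
    for items, count in weighted_paths:
        node = root
        for item in sorted((i for i in set(items) if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, order, support


def fp_growth(weighted_paths, min_sup, suffix=()):
    """Yield (itemset, support) for every frequent itemset that ends with `suffix`."""
    header, order, support = build_tree(weighted_paths, min_sup)
    for item in reversed(order):  # bottom-up: least frequent item first
        pattern = (item,) + suffix
        yield pattern, support[item]
        # Conditional pattern base: each prefix path, weighted by the count of the item's node.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_sup, pattern)


results = {frozenset(s): c for s, c in fp_growth([(t, 1) for t in TRANSACTIONS], MIN_SUP)}
for itemset, count in sorted(results.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With min_sup = 2 and these assumed transactions, it yields the five frequent single items plus the eight multi-item itemsets listed in the slides, for example {I1, I3} with support 4 and {I1, I2, I5} with support 2.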

• Thus, the FP-growth algorithm finds the following frequent itemsets (besides the frequent single items in the L-list):

{I2, I5}, {I1, I5}, {I2, I1, I5}
{I2, I4}
{I2, I3}, {I1, I3}, {I2, I1, I3}
{I2, I1}
Phase II - Rule Generation from Frequent Itemsets
Valid association rules can be generated from the frequent itemsets obtained above by using the following steps:
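One common way to carry out Phase II is sketched here as an assumption, not as the steps the slides go on to list. For every frequent itemset l and every non-empty proper subset s of l, output the rule s → (l − s) if support_count(l) / support_count(s) ≥ min_conf. The itemsets are the ones listed above. Their support counts are computed from an assumed reconstruction of the transaction table, which is not reproduced here. `support_count` and `generate_rules` are hypothetical names.

```python
from itertools import combinations

# Hypothetical reconstruction of the missing transaction table (T1..T9).
TRANSACTIONS = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
MIN_CONF = 0.70
# Frequent itemsets with two or more items, as listed in the text.
FREQUENT = [
    {"I2", "I5"}, {"I1", "I5"}, {"I2", "I1", "I5"},
    {"I2", "I4"},
    {"I2", "I3"}, {"I1", "I3"}, {"I2", "I1", "I3"},
    {"I2", "I1"},
]


def support_count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= set(t))


def generate_rules(frequent, min_conf):
    """Emit s -> (l - s) for each non-empty proper subset s of each frequent itemset l."""
    rules = []
    for itemset in frequent:
        total = support_count(itemset)
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                lhs = set(combo)
                confidence = total / support_count(lhs)
                if confidence >= min_conf:
                    rules.append((sorted(lhs), sorted(itemset - lhs), confidence))
    return rules


for lhs, rhs, confidence in generate_rules(FREQUENT, MIN_CONF):
    print(f"{lhs} -> {rhs}  confidence = {confidence:.0%}")
```

With the assumed transactions, six rules pass the 70% threshold, all with 100% confidence: I5 → I2, I5 → I1, I5 → {I1, I2}, {I1, I5} → I2, {I2, I5} → I1 and I4 → I2. A rule such as I1 → I3 (4/6, about 67%) is rejected.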
