FP Growth
Like the Apriori algorithm, the FP-Growth algorithm follows a two-phase procedure for association rule mining:
1. Frequent itemset generation
2. Association rule generation
The FP-Growth algorithm differs from the Apriori algorithm in the way it generates the frequent itemsets: it generates all the frequent itemsets without candidate generation.
Phase 1: Frequent itemsets Generation in FP-Growth Algorithm
The FP-Growth algorithm follows a two-step approach to generate frequent itemsets from the given set of transactions:
1. It first compresses the database of frequent items into a frequent-pattern tree (FP-tree), which retains the itemset association information.
2. It then divides the compressed database into a set of conditional pattern bases, each associated with one frequent item or “pattern fragment”, and mines each such base separately to extract frequent itemsets.
Step 1 of Phase 1: FP-Tree Construction
Pass 1:
i) Scan each transaction and find the support count of each item.
ii) Discard infrequent items, i.e., items whose support count is below the minimum support count.
iii) Sort the frequent items in decreasing order of support count. The resulting list is known as the L-list.
The L-list is used in pass 2 to build the FP-tree, so that transactions with common prefixes can share nodes.
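The three steps of pass 1 can be sketched in a few lines of Python. This is only an illustration: the function name `build_l_list`, the toy database and the alphabetical tie-break are assumptions, not part of the original example.

```python
from collections import Counter

def build_l_list(transactions, min_count):
    """Return the frequent items as (item, support count) pairs,
    sorted by decreasing support count."""
    # i) count in how many transactions each item occurs
    counts = Counter(item for t in transactions for item in set(t))
    # ii) discard infrequent items
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    # iii) sort by decreasing count; ties are broken alphabetically,
    # which is an assumption made only to get a deterministic order
    return sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy database, not the database D of the example
toy = [["a", "b"], ["b", "c"], ["a", "b", "d"], ["b"]]
l_list = build_l_list(toy, min_count=2)
print(l_list)  # [('b', 4), ('a', 2)]
```

Items `c` and `d` occur only once each, so they are dropped before sorting.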
Pass 2:
(A node in the FP-tree corresponds to a particular item in the given transactions. Each node has a counter that indicates the number of transactions mapped onto the path through that node.)
i) The FP-Growth algorithm processes one transaction at a time, with its items in L-list order, and maps it to a path in the FP-tree.
ii) Because a fixed order is used, paths can overlap when transactions share items (i.e., when they have the same prefix).
– In this case, the counters of the shared nodes are incremented.
– The more paths that overlap, the higher the compression.
iii) Pointers are maintained between nodes containing the same item, creating singly linked lists.
(These pointers are drawn as dotted lines in the FP-tree.)
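A minimal sketch of pass 2 follows, assuming the L-list order is already known. The names `FPNode`, `build_fp_tree`, `header` and `tails`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}   # item -> child FPNode
        self.link = None     # next node that carries the same item

def build_fp_tree(transactions, order):
    """order: the frequent items in L-list order."""
    rank = {item: i for i, item in enumerate(order)}
    root = FPNode(None, None)
    header = {}  # item -> first node of its linked list
    tails = {}   # item -> last node of its linked list
    for t in transactions:
        # keep only frequent items and put them in L-list order
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                # no shared prefix here: create a node and append it
                # to the linked list of nodes for this item
                child = FPNode(item, node)
                node.children[item] = child
                if item in tails:
                    tails[item].link = child
                else:
                    header[item] = child
                tails[item] = child
            child.count += 1  # shared prefix nodes are just incremented
            node = child
    return root, header

# Hypothetical toy data, not the database D of the example
toy = [["b", "a"], ["b", "c"], ["a", "b", "c"]]
root, header = build_fp_tree(toy, order=["b", "a", "c"])
print(root.children["b"].count)                # 3
print(root.children["b"].children["a"].count)  # 2
```

In this toy run the seven item occurrences are stored in four nodes, and the two `c` nodes are chained through `header["c"]`.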
FP Growth Algorithm - Example
Consider the following transaction database (D). Let's assume that the minimum support count = 2 and the minimum confidence = 70%. Then, using the FP-Growth algorithm, (i) find all frequent itemsets and (ii) list all valid association rules.
Step 1 of Phase 1: FP-Tree Construction
Pass 1:
The three Pass 1 steps are applied to D: find the support count of each item, discard the items whose support count is below 2, and sort the remaining items in decreasing order of support count to obtain the L-list.
Pass 2:
The items in each transaction are then processed in L-list order (i.e., in descending order of support count), and a branch is created for each transaction.
The scan of the first transaction, “T1: I1, I2, I5,” which contains three items (I2, I1, I5
in L-order), leads to the construction of the first branch of the tree with three
nodes, <I2: 1>, <I1:1>, and <I5: 1>, where I2 is linked as a child of the root, I1 is
linked to I2, and I5 is linked to I1.
The second transaction, T2, contains the items I2 and I4 in L-order, which would result in a branch where I2 is linked to the root and I4 is linked to I2. However, this branch shares a common prefix, I2, with the existing path for T1.
Therefore, we increment the count of the I2 node by 1 and create a new node, <I4: 1>, which is linked as a child of <I2: 2>.
In general, when considering the branch to be added for a transaction, the count of each node
along a common prefix is incremented by 1, and nodes for the items following the prefix are
created and linked accordingly.
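The insertion of T1 and T2 described above can be reproduced with a small sketch. The nested-dictionary representation and the function name `insert` are assumptions made for brevity; node-links are left out.

```python
def insert(tree, ordered_items):
    """tree maps item -> [count, subtree]; items must be in L-order."""
    for item in ordered_items:
        # reuse the node if the prefix already exists, else create it
        entry = tree.setdefault(item, [0, {}])
        entry[0] += 1
        tree = entry[1]

root = {}
insert(root, ["I2", "I1", "I5"])  # T1 in L-order
insert(root, ["I2", "I4"])        # T2 in L-order
print(root)
# {'I2': [2, {'I1': [1, {'I5': [1, {}]}], 'I4': [1, {}]}]}
```

The I2 node is shared by both transactions and ends with count 2, while I4 becomes a new child of I2 with count 1.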
A frequent length-1 pattern is a single item from the dataset that satisfies the min_sup threshold. For example, in the given transaction database, each of the items I1, I2, I3, I4 and I5 is a frequent length-1 pattern.
A suffix pattern is the item being considered for mining, starting from the simplest case of a frequent length-1 pattern. In the previous example, each of the length-1 patterns is treated as a suffix pattern.
For a given suffix pattern, its conditional pattern base is constructed. This is essentially a sub-database consisting of all the paths in the FP-tree that co-occur with the suffix pattern (known as prefix paths). These prefix paths show how often the item (suffix pattern) appears together with other items in the transactions.
A conditional FP-tree is a smaller FP-tree built only from the items in the conditional pattern base, keeping track of their frequencies.
Step 2 of Phase 1: Mining frequent itemsets from the FP-tree
The item header table is traversed in reverse order, so the frequent itemsets that end with item I5 are searched first (i.e., the suffix pattern is I5).
We gather all the paths in the FP-tree that start at the root and end at I5, excluding the root and the suffix itself. These paths are known as the prefix paths of I5. Since the item header table contains only frequent items, I5 itself is frequent, and itemsets ending with I5 may be frequent as well.
The occurrences of I5 can easily be found by following its chain of node-links from the item header table.
Considering I5 as a suffix pattern, its two prefix paths are <I2, I1> and <I2, I1, I3>.
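The walk along the node-links can be sketched as follows. Only the part of the tree that the text mentions is rebuilt by hand here, and the class `Node`, the function `prefix_paths` and the variable names are hypothetical.

```python
class Node:
    def __init__(self, item, count, parent=None):
        self.item, self.count, self.parent = item, count, parent
        self.link = None  # next node carrying the same item

# Hand-built fragment of the FP-tree; all other branches are omitted
root = Node(None, 0)
i2 = Node("I2", 7, root)
i1 = Node("I1", 4, i2)
i3 = Node("I3", 2, i1)
i5_a = Node("I5", 1, i1)
i5_b = Node("I5", 1, i3)
i5_a.link = i5_b           # node-link chain for I5
header = {"I5": i5_a}      # item header table entry for I5

def prefix_paths(header, item):
    paths = []
    node = header[item]
    while node is not None:            # follow the node-links
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)   # climb towards the root
            parent = parent.parent
        paths.append(path[::-1])       # root-to-suffix order
        node = node.link
    return paths

paths_i5 = prefix_paths(header, "I5")
print(paths_i5)  # [['I2', 'I1'], ['I2', 'I1', 'I3']]
```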
The next step is to update the support counts of the nodes so that they represent only those transactions that contain I5. For example, in the path <I2:7, I1:4, I3:2, I5:1>, the counts of I2, I1 and I3 also include transactions without I5 (for example, those that contain only <I2, I1>). So, we have to update the support counts of all the nodes in the prefix paths. We do this by assigning the support count of the I5 node to each of its ancestor nodes up to the root.
All the prefix paths of I5 are updated in this way, and the resulting paths are referred to as transformed prefix paths.
The set of transformed prefix paths is also known as the conditional (sub-)pattern base.
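As a small illustration of this count update, the sketch below turns a root-to-suffix path into its transformed prefix path. The function name `transform` and the list-of-pairs representation are assumptions.

```python
def transform(path_with_counts):
    """path_with_counts: (item, count) pairs from the root down to
    the suffix node, which must be the last pair."""
    *prefix, (suffix, suffix_count) = path_with_counts
    # every node of the prefix receives the count of the suffix node
    return [(item, suffix_count) for item, _ in prefix]

raw = [("I2", 7), ("I1", 4), ("I3", 2), ("I5", 1)]
transformed = transform(raw)
print(transformed)  # [('I2', 1), ('I1', 1), ('I3', 1)]
```

Applied to both paths that end in I5, this gives the two transformed prefix paths <I2:1, I1:1> and <I2:1, I1:1, I3:1>.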
The next step is to accumulate the conditional pattern base of item I5 to form I5’s conditional FP-tree. The paths in the conditional pattern base are accumulated by adding up the support counts of each item across the paths, and then eliminating every item whose accumulated support count is less than the minimum support count.
Thus, the conditional pattern base for I5 (i.e., <I2:1, I1:1> and <I2:1, I1:1, I3:1>) is accumulated to form <I2:2, I1:2>. Here, I3 is discarded because its accumulated count is 1, which is less than the minimum support count of 2.
All frequent itemsets corresponding to suffix pattern I5 are generated by considering all possible combinations of I5 with the items of its conditional FP-tree <I2:2, I1:2>, namely {I2, I5: 2}, {I1, I5: 2} and {I2, I1, I5: 2}.
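The accumulation and the enumeration of combinations for I5 can be sketched as below. The function `mine_suffix` is hypothetical, and its shortcut of enumerating all combinations is only valid when the surviving items form a single path, as they do for I5.

```python
from collections import Counter
from itertools import combinations

def mine_suffix(pattern_base, suffix, min_count):
    """pattern_base: list of (items, count) transformed prefix paths."""
    totals = Counter()
    for items, count in pattern_base:
        for item in items:
            totals[item] += count      # accumulate counts per item
    kept = [i for i in totals if totals[i] >= min_count]
    # Assumption: the kept items lie on one path, so every combination
    # is frequent and its support is the smallest count among its items.
    result = {}
    for r in range(1, len(kept) + 1):
        for combo in combinations(kept, r):
            support = min(totals[i] for i in combo)
            result[frozenset(combo) | {suffix}] = support
    return result

base_i5 = [(["I2", "I1"], 1), (["I2", "I1", "I3"], 1)]
found = mine_suffix(base_i5, "I5", min_count=2)
for itemset, support in found.items():
    print(sorted(itemset), support)
```

For I5 this yields three itemsets, each with support count 2; I3 never appears because its accumulated count is only 1.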
The same procedure is applied to suffixes I4, I3 and I1 to generate the frequent itemsets
corresponding to these suffixes.
Note: In the conditional pattern base for item I3, two paths lie in the left subtree and one path lies in the right subtree of the FP-tree. This must be kept in mind when the paths are accumulated: {I1:2} cannot be merged with {I2, I1:2} because the two paths do not share a prefix, but {I2:2} can be merged with {I2, I1:2}.
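The note on I3 can be checked with a short sketch that inserts each path of the conditional pattern base into a tree, weighted by its count. The nested-dictionary representation and the name `build_conditional_tree` are assumptions, and the minimum-support filter is omitted because both I2 and I1 reach a total of 4 here.

```python
def build_conditional_tree(pattern_base):
    """pattern_base: list of (items, count); tree maps
    item -> [count, subtree]."""
    tree = {}
    for items, count in pattern_base:
        level = tree
        for item in items:
            entry = level.setdefault(item, [0, {}])
            entry[0] += count   # paths merge only along a shared prefix
            level = entry[1]
    return tree

base_i3 = [(["I2", "I1"], 2), (["I2"], 2), (["I1"], 2)]
tree_i3 = build_conditional_tree(base_i3)
print(tree_i3)
# {'I2': [4, {'I1': [2, {}]}], 'I1': [2, {}]}
```

The result has two branches: {I2:2} merges into the I2 node of {I2, I1:2}, giving I2 a count of 4, while {I1:2} starts a separate branch at the root.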
Note: I2 is not considered as a suffix pattern because it has no prefix path: its node is a direct child of the root.