FP Growth Algorithm
Lecture 33/15-10-09
Observations about the FP-tree
• The size of an FP-tree depends on how the items are ordered.
• In the previous example, if the items are ordered by increasing support count instead, the resulting FP-tree is different and, for this example, bushier (wider).
• The branching factor at the root node increases from 2 to 5, as shown on the next slide and in the sketch below.
• Also, ordering by decreasing support count does not always lead to the smallest tree.
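The effect of the ordering is easy to reproduce. Below is a minimal, illustrative FP-tree insertion routine; the Node class and the ten sample transactions are assumptions (the dataset is the ten-transaction example from Tan, Steinbach & Kumar, which appears to be the "previous example" these slides refer to).

class Node:
    def __init__(self):
        self.count, self.children = 0, {}

def insert(root, transaction, rank):
    # Insert one transaction, with its items sorted by the given ranking.
    node = root
    for item in sorted(transaction, key=rank.index):
        node = node.children.setdefault(item, Node())
        node.count += 1

txns = [{"a","b"}, {"b","c","d"}, {"a","c","d","e"}, {"a","d","e"},
        {"a","b","c"}, {"a","b","c","d"}, {"a"}, {"a","b","c"},
        {"a","b","d"}, {"b","c","e"}]
for label, rank in [("decreasing support", ["a","b","c","d","e"]),
                    ("increasing support", ["e","d","c","b","a"])]:
    root = Node()
    for t in txns:
        insert(root, t, rank)
    print(label, "->", len(root.children), "children at the root")
# decreasing support -> 2 children, increasing support -> 5 children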
FP-Growth Algorithm: Mining Frequent Patterns Using the FP-tree
Frequent itemset generation using the FP-growth algorithm
• The algorithm generates frequent itemsets from the FP-tree by traversing it in a bottom-up fashion.
• It extracts the frequent itemsets ending in 'e' first, then those ending in 'd', 'c', 'b' and 'a'.
• Since every transaction is mapped onto a single path in the FP-tree, the frequent itemsets ending in, say, 'e' can be found by examining only the paths that contain node 'e'.
Major Steps of the FP-Growth Algorithm
Starting the processing from the end of the ordered frequent-item list L:
Step 1:
Construct the conditional pattern base for each item in the header table.
Step 2:
Construct the conditional FP-tree from each conditional pattern base.
Step 3:
Recursively mine the conditional FP-trees and grow the frequent patterns obtained so far.
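Put together, the three steps form one short recursive procedure. The following is a minimal Python sketch, not the lecture's own notation: the Node class, build_tree and fp_growth names are assumptions, and the sample transactions are the classic illustrative dataset from Han et al. (which appears comparable to the one mined on the later slides).

from collections import defaultdict

class Node:
    # One FP-tree node (illustrative layout, not the lecture's notation).
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(weighted_txns, min_support):
    # Build an FP-tree from (itemset, count) pairs; return (root, header),
    # where header maps each frequent item to all of its nodes (node links).
    support = defaultdict(int)
    for items, count in weighted_txns:
        for item in items:
            support[item] += count
    frequent = {i for i, s in support.items() if s >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, count in weighted_txns:
        # Keep frequent items, ordered by decreasing support (the list L).
        path = sorted((i for i in items if i in frequent),
                      key=lambda i: (-support[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return root, header

def fp_growth(weighted_txns, min_support, suffix=()):
    # Yield (pattern, support) pairs by applying Steps 1-3 recursively.
    _, header = build_tree(weighted_txns, min_support)
    # Process items from the end of L: lowest support first.
    for item in sorted(header, key=lambda i: sum(n.count for n in header[i])):
        support = sum(n.count for n in header[item])
        pattern = (item,) + suffix
        yield pattern, support
        # Step 1: the conditional pattern base = the prefix path of every
        # node of 'item', each weighted by that node's count.
        base = []
        for node in header[item]:
            path, up = [], node.parent
            while up.item is not None:
                path.append(up.item)
                up = up.parent
            if path:
                base.append((tuple(reversed(path)), node.count))
        # Steps 2-3: mine the conditional FP-tree recursively with the
        # grown suffix.
        yield from fp_growth(base, min_support, pattern)

# Han et al.'s illustrative transactions, mined with minimum support 3.
txns = [({"f","c","a","m","p"}, 1), ({"f","c","a","b","m"}, 1),
        ({"f","b"}, 1), ({"c","b","p"}, 1), ({"f","c","a","m","p"}, 1)]
for pattern, s in sorted(fp_growth(txns, 3)):
    print(pattern, s)

Note that Step 2 needs no separate code path here: building a tree from a conditional pattern base is, by construction, the conditional FP-tree.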
Step 1: Construct the Conditional Pattern Base
• Start at the bottom of the frequent-item header table of the FP-tree.
• Traverse the FP-tree by following the node links of each frequent item.
• Accumulate all the transformed prefix paths of that item to form its conditional pattern base (illustrated below).
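As a concrete illustration, here is a hedged sketch of the counting step on m's conditional pattern base from the Han et al. example (the variable names are assumptions). Items whose support within the base falls below the minimum support are dropped before the conditional FP-tree is built:

from collections import Counter

# m's conditional pattern base: each transformed prefix path with the
# number of transactions it represents.
pattern_base_m = [(("f", "c", "a"), 2), (("f", "c", "a", "b"), 1)]

# Accumulate each item's support within the base.
support = Counter()
for path, count in pattern_base_m:
    for item in path:
        support[item] += count

min_support = 3
kept = {item for item, s in support.items() if s >= min_support}
print(support)  # Counter({'f': 3, 'c': 3, 'a': 3, 'b': 1})
print(kept)     # {'f', 'c', 'a'} -- 'b' (support 1 < 3) is pruned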
Principles of FP-Growth
(Why is 'b' not considered? Its support within m's conditional pattern base is only 1, below the minimum support, as computed in the sketch above.)
FP-Growth
Step 3: Recursively mine the conditional FP-tree
[Figure: recursive mining of the m-conditional FP-tree. The conditional FP-tree of "m" is the single path (f:3, c:3, a:3). Mining it yields the conditional FP-trees of "am": (f:3, c:3), "cm": (f:3) and "fm": {}, and recursing further yields "cam": (f:3), "fam": {}, "fcm": {} and finally the frequent pattern fcam.]

All frequent patterns containing m are the combinations of {f, c, a} joined with m:
m, fm, cm, am, fcm, fam, cam, fcam
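Because the m-conditional FP-tree consists of a single path, its frequent patterns can be enumerated directly as combinations of the path's items, without further tree building. A minimal sketch (variable names are illustrative):

from itertools import combinations

# Items on the single-path m-conditional FP-tree, each with support 3.
path_items = ["f", "c", "a"]

# Every combination of path items, suffixed with 'm', is frequent.
patterns = [("m",)]
for r in range(1, len(path_items) + 1):
    for combo in combinations(path_items, r):
        patterns.append(combo + ("m",))

print([",".join(p) for p in patterns])
# ['m', 'f,m', 'c,m', 'a,m', 'f,c,m', 'f,a,m', 'c,a,m', 'f,c,a,m']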
Summary of the FP-Growth Algorithm
• Mining frequent patterns can be viewed as first mining the 1-itemsets and then progressively growing each 1-itemset by recursively mining its conditional pattern base.
Evaluation of Association Patterns
• Objective interestingness measures:
– Use statistics derived from the data to determine whether a pattern is interesting or not.
– Examples are support, confidence and correlation (defined below).
• Subjective interestingness measures:
– A pattern is subjectively interesting if it reveals unexpected information about the data that can lead to profitable actions.
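For reference, the standard definitions of these measures for a rule X → Y over N transactions, writing σ(·) for the support count, can be stated as follows (lift is one common correlation-based measure):

\[
\mathrm{support}(X \to Y) = \frac{\sigma(X \cup Y)}{N}, \qquad
\mathrm{confidence}(X \to Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}, \qquad
\mathrm{lift}(X \to Y) = \frac{\mathrm{confidence}(X \to Y)}{\mathrm{support}(Y)}
\]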
• Example: {Butter} → {Bread} may not be interesting because the relationship represents obvious information.
– But {Diapers} → {Beer} can be interesting, as the relationship is quite unexpected and can really help retailers cross-sell for profit.
• Determining subjective knowledge is a little difficult, as it requires prior information from domain experts.
Different Approaches for Incorporating Subjective Knowledge
• 1. Visualization:
– Domain experts interact with the data mining system by interpreting and verifying the discovered patterns.
• 2. Template-based approach:
– Instead of considering all the rules, only those rules that match user-specified templates are considered.
• 3. Subjective interestingness measures:
– A subjective measure can be defined from domain information such as a concept hierarchy, and used to filter out patterns that are obvious or not required.
Objective Measures of Interestingness
• A data-driven approach for evaluating the quality of an association rule.
• Domain-independent, and needs minimal input from the user (such as a threshold for filtering out low-quality patterns).
            Coffee   ¬Coffee   Total
Tea           15        5        20
¬Tea          75        5        80
Total         90       10       100

Association Rule: Tea → Coffee
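A quick hedged check of this rule, with the numbers read straight from the table above, shows why confidence alone can mislead:

# Contingency counts for Tea -> Coffee (from the table above).
n = 100           # total transactions
tea = 20          # transactions containing Tea
tea_coffee = 15   # transactions containing both Tea and Coffee
coffee = 90       # transactions containing Coffee

support = tea_coffee / n          # 0.15
confidence = tea_coffee / tea     # 0.75 -- looks like a strong rule
lift = confidence / (coffee / n)  # 0.75 / 0.90 ≈ 0.83 < 1

print(support, confidence, lift)

Although the confidence is a high 75%, the lift is below 1: a tea drinker is actually less likely to buy coffee than the average customer (90%), so the rule is misleading despite its high confidence.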