Unit4 2 Association Rules FP Growth

4.3 FP-Growth, FP-Tree

Scalable Frequent Itemset Mining Methods

1. Apriori: A Candidate Generation-and-Test Approach

• Improving the efficiency of Apriori

2. FPGrowth: A Frequent Pattern-Growth Approach

Apriori vs FPGrowth
• Bottlenecks of the Apriori approach
• Breadth-first (i.e., level-wise) search
• Candidate generation and test
• Often generates a huge number of candidates
• The FPGrowth Approach (J. Han, J. Pei, and Y. Yin, SIGMOD'00)
• Depth-first search
• Avoid explicit candidate generation
• Major philosophy: Grow long patterns from short ones using local frequent items only
• “abc” is a frequent pattern
• Get all transactions having “abc”, i.e., project DB on abc: DB|abc
• “d” is a local frequent item in DB|abc → abcd is a frequent pattern
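The projection idea above can be sketched in a few lines of Python. This illustrates only the pattern-growth idea (no FP-tree is built). `project` and `local_frequent_items` are hypothetical helper names. The data are the nine transactions T100 to T900 from this unit's worked example. The minimum support count of 2 is an assumption.

```python
from collections import Counter


def project(db, pattern):
    """DB|pattern: the other items of every transaction that contains `pattern`."""
    pattern = set(pattern)
    return [set(t) - pattern for t in db if pattern <= set(t)]


def local_frequent_items(db, pattern, min_sup):
    """Items that are frequent inside the projected database DB|pattern."""
    counts = Counter(item for t in project(db, pattern) for item in t)
    return {item: c for item, c in counts.items() if c >= min_sup}


db = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]

# {I1, I2} is frequent; grow it with the items that are frequent in DB|{I1, I2}.
local = local_frequent_items(db, {"I1", "I2"}, min_sup=2)
grown = {frozenset({"I1", "I2", item}): c for item, c in local.items()}
print(sorted((sorted(p), c) for p, c in grown.items()))
```

I3 and I5 each occur twice in DB|{I1, I2}, so {I1, I2, I3} and {I1, I2, I5} are frequent. I4 occurs only once there, so {I1, I2, I4} is not.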
Frequent Pattern (FP) Growth Method
• Mines frequent itemsets without candidate generation.
• It is a divide-and-conquer strategy.
• It compresses the database representing frequent items into a frequent-pattern
tree (FP-tree), which retains the itemset association information.
• It divides the compressed database into a set of conditional databases, each
associated with one frequent item or pattern fragment, and then mines each such
database separately.
• The FP-Growth method transforms the problem of finding long frequent patterns into
searching for shorter ones recursively and then concatenating the suffix.
• It uses the least frequent items as suffixes.
Advantages: reduces search cost, has good selectivity, and is faster than Apriori.
Disadvantage: when the database is large, it is sometimes unrealistic to construct a
main-memory-based FP-tree.
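The FP-Growth algorithm has two steps:

Step 1: Build the FP-tree (FP-tree algorithm)

• Create the root node of the tree, labeled null.
• Scan the transactional database once to find the support count of each item; discard
infrequent items and sort the frequent ones in descending support order.
• Scan the database a second time: the frequent items of each transaction are processed in
descending support order, and a branch is created for each transaction. When a transaction
shares a prefix with an existing branch, the counts of the shared nodes are incremented
instead of creating new nodes.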
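Step 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions. `FPNode` and `build_fp_tree` are hypothetical names. Transactions are lists of distinct items, ties in support are broken by item name, and the minimum support count of 2 is assumed. The header table keeps each item's nodes in creation order, so it plays the role of the node-links.

```python
from collections import Counter, defaultdict


class FPNode:
    """FP-tree node: item label, count, parent link and child links."""

    def __init__(self, item, parent):
        self.item = item          # None marks the null root
        self.count = 0
        self.parent = parent
        self.children = {}        # item -> FPNode


def build_fp_tree(transactions, min_sup):
    """Build an FP-tree in two scans; returns (root, header table)."""
    # Scan 1: support count of every item; keep only frequent items.
    support = Counter(item for t in transactions for item in t)
    frequent = sorted((i for i in support if support[i] >= min_sup),
                      key=lambda i: (-support[i], i))  # ties broken by name (assumption)
    rank = {item: r for r, item in enumerate(frequent)}

    root = FPNode(None, None)
    header = defaultdict(list)    # item -> its nodes in creation order (node-links)
    # Scan 2: insert each transaction's frequent items in rank order.
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:     # no existing child for this item: start a new branch
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1      # shared prefix: increment the existing node
            node = child
    return root, header


transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],              # T100-T300
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],              # T400-T600
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],  # T700-T900
]
root, header = build_fp_tree(transactions, min_sup=2)
print({item: node.count for item, node in root.children.items()})
```

On these nine transactions the root gets two children, I2:7 and I1:2, and the I2 branch has the child I1:4.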

Step 2: Extract frequent itemsets (conditional FP-tree algorithm)

• Start from each frequent length-1 pattern as an initial suffix pattern.
• Construct its conditional pattern base (a sub-database that consists of the set of prefix
paths in the FP-tree co-occurring with the suffix pattern).
• Construct its conditional FP-tree and perform mining recursively on that tree.
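The conditional pattern base in Step 2 is collected by following an item's node-links and walking up from each node through its parent pointers to the root. The sketch below includes a compact two-scan tree builder so that it runs on its own. `build_fp_tree` and `conditional_pattern_base` are hypothetical names. The sketch assumes min_sup = 2 and breaks ties by item name.

```python
from collections import Counter, defaultdict


class FPNode:
    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_fp_tree(transactions, min_sup):
    """Two-scan FP-tree construction; returns (root, header table)."""
    support = Counter(item for t in transactions for item in t)
    frequent = sorted((i for i in support if support[i] >= min_sup),
                      key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(frequent)}
    root, header = FPNode(None, None), defaultdict(list)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header


def conditional_pattern_base(header, item):
    """Prefix paths (root side first) that co-occur with `item`, with counts."""
    base = []
    for node in header[item]:           # follow the item's node-links
        path, parent = [], node.parent
        while parent.item is not None:  # climb to the null root
            path.append(parent.item)
            parent = parent.parent
        if path:                        # a node directly under the root has no prefix
            base.append((tuple(reversed(path)), node.count))
    return base


transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
root, header = build_fp_tree(transactions, min_sup=2)
for item in ["I5", "I4", "I3", "I1"]:
    print(item, conditional_pattern_base(header, item))
```

For suffix I5 this returns the prefix paths (I2, I1):1 and (I2, I1, I3):1. Summing counts item by item gives I2:2, I1:2 and I3:1, so only I2 and I1 survive into the conditional FP-tree of I5.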
Tid    Items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3
Calculate Support Count (Descending order):
I2:7
I1:6
I3:6
I4:2
I5:2
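The support counts above come from a single pass over the database with a counter. Below is a minimal Python sketch. Breaking ties by item name is an assumption; it happens to reproduce the slide's order, with I1 before I3 and I4 before I5.

```python
from collections import Counter

transactions = {
    "T100": ["I1", "I2", "I5"], "T200": ["I2", "I4"], "T300": ["I2", "I3"],
    "T400": ["I1", "I2", "I4"], "T500": ["I1", "I3"], "T600": ["I2", "I3"],
    "T700": ["I1", "I3"], "T800": ["I1", "I2", "I3", "I5"], "T900": ["I1", "I2", "I3"],
}

# One pass over the database: count how many transactions contain each item.
support = Counter(item for items in transactions.values() for item in items)

# Descending support; equal counts are ordered by item name (assumed tie-break).
ordered = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
for item, count in ordered:
    print(f"{item}:{count}")
```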
Summary of the problem solution (from the book)
Write it this way in the exam:
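The complete solution for this example can be recomputed with a compact recursive FP-growth. This is a sketch under stated assumptions, not the book's code. `build_tree` and `fp_growth` are hypothetical names. The sketch assumes min_sup = 2 (a support count) and breaks ties by item name. Each conditional pattern base is passed to the recursive call as a list of weighted (prefix path, count) pairs.

```python
from collections import Counter, defaultdict


class FPNode:
    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_tree(weighted_db, min_sup):
    """FP-tree over (items, count) pairs; returns header table, supports, ranks."""
    support = Counter()
    for items, n in weighted_db:
        for item in items:
            support[item] += n
    frequent = {i: c for i, c in support.items() if c >= min_sup}
    rank = {i: r for r, i in enumerate(sorted(frequent, key=lambda i: (-frequent[i], i)))}
    root, header = FPNode(None, None), defaultdict(list)
    for items, n in weighted_db:
        node = root
        for item in sorted((i for i in items if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += n
    return header, frequent, rank


def fp_growth(weighted_db, min_sup, suffix=frozenset()):
    """Yield (itemset, support) for every frequent itemset that extends `suffix`."""
    header, frequent, rank = build_tree(weighted_db, min_sup)
    # Least frequent item first: it is appended to the current suffix.
    for item in sorted(frequent, key=rank.get, reverse=True):
        pattern = suffix | {item}
        yield pattern, frequent[item]
        # Conditional pattern base: prefix path of every node holding `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Mine the conditional FP-tree recursively with the grown suffix.
        yield from fp_growth(base, min_sup, pattern)


transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
patterns = dict(fp_growth([(t, 1) for t in transactions], min_sup=2))
for itemset, sup in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Under these assumptions it finds 13 frequent itemsets. These are the five single items plus {I2, I5}:2, {I1, I5}:2, {I2, I1, I5}:2, {I2, I4}:2, {I2, I3}:4, {I1, I3}:4, {I2, I1, I3}:2 and {I2, I1}:4.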
Benefits of the FP-tree Structure

• Completeness
• Preserve complete information for frequent pattern mining
• Never break a long pattern of any transaction
• Compactness
• Reduces irrelevant information: infrequent items are removed
• Items in frequency-descending order: the more frequently an item
occurs, the more likely it is to be shared
• Never larger than the original database (not counting node-links
and the count fields)
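The compactness claim can be checked on this unit's example by counting tree nodes and comparing the total with the item occurrences in the raw transactions. This is a minimal sketch. All names are hypothetical, and min_sup = 2 is assumed. Inserting a transaction creates at most one new node per item, so the node count can never exceed the number of item occurrences.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_fp_tree(transactions, min_sup):
    """Two-scan FP-tree construction; returns the null root."""
    support = Counter(item for t in transactions for item in t)
    frequent = sorted((i for i in support if support[i] >= min_sup),
                      key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(frequent)}
    root = FPNode(None, None)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root


def count_nodes(node):
    """Number of item nodes below `node` (the null root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
root = build_fp_tree(transactions, min_sup=2)
occurrences = sum(len(t) for t in transactions)   # items stored by the raw database
print(count_nodes(root), "tree nodes vs", occurrences, "item occurrences")
```

Here shared prefixes such as I2 and I2, I1 bring the 23 item occurrences down to 10 tree nodes.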

Advantages of the Pattern Growth Approach

• Divide-and-conquer:
• Decompose both the mining task and DB according to the frequent
patterns obtained so far
• Leads to a focused search of smaller databases
• Other factors
• No candidate generation, no candidate test
• Compressed database: FP-tree structure
• No repeated scan of entire database
• Basic operations: counting local frequent items and building sub-FP-trees; no pattern
search and matching
• A good open-source implementation and refinement of FPGrowth
• FPGrowth+ (Grahne and J. Zhu, FIMI'03)
Q: What is the most significant advantage of the FP-tree? Why is the FP-tree
complete with respect to frequent pattern mining?
• Efficiency: the most significant advantage of the FP-tree is that it requires
only two scans of the underlying database to construct. This efficiency is
even more apparent for databases with prolific and long patterns, or when mining
frequent patterns with a low support threshold.
• Completeness: each transaction in the database is mapped to one path in the
FP-tree, so the frequent-itemset information of each transaction is completely
stored in the FP-tree. Moreover, one path in the FP-tree may represent frequent
itemsets in multiple transactions without ambiguity, since the path representing
every transaction must start from the root of each item-prefix subtree.
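Both answers can be illustrated on this unit's example. In the sketch below, a wrapper counts how many times the builder iterates over the database. The counts on the root's children then confirm that each transaction was stored as exactly one path. All names are hypothetical, `ScanCountingDB` in particular, and min_sup = 2 is assumed.

```python
from collections import Counter


class ScanCountingDB:
    """Hypothetical wrapper that counts full passes over a list of transactions."""

    def __init__(self, transactions):
        self.transactions = transactions
        self.scans = 0

    def __iter__(self):
        self.scans += 1
        return iter(self.transactions)


class FPNode:
    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_fp_tree(db, min_sup):
    support = Counter(item for t in db for item in t)         # scan 1
    frequent = sorted((i for i in support if support[i] >= min_sup),
                      key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(frequent)}
    root = FPNode(None, None)
    for t in db:                                               # scan 2
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root


db = ScanCountingDB([
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
])
root = build_fp_tree(db, min_sup=2)
# Each transaction with a frequent item is one path starting at the root, so the
# counts on the root's children add up to the number of such transactions.
paths = sum(child.count for child in root.children.values())
print(db.scans, "scans;", paths, "transactions stored as paths")
```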