
Association Rule Mining-III

Maximal vs Closed Frequent Itemsets


• An itemset X is a max-pattern, or maximal frequent itemset, if X is
frequent and there exists no frequent super-pattern Y ⊃ X.
• An itemset X is closed if X is frequent and there exists no
super-pattern Y ⊃ X with the same support as X.

• Closed frequent itemsets are lossless: the support of any frequent
itemset can be deduced from the closed frequent itemsets (a
brute-force sketch of both notions follows).
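The two definitions are easy to confuse, so here is a minimal
brute-force sketch in Python (not from the slides; the toy database
and all names are illustrative) that computes both sets:

from itertools import combinations

# Toy transactional database (illustrative only, not the slides' T)
db = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'b'}, {'a', 'c'}]
min_sup = 2

def support(itemset):
    # Number of transactions containing every item of the itemset
    return sum(itemset <= t for t in db)

# Enumerate all frequent itemsets by brute force (fine for toy data)
items = sorted(set().union(*db))
frequent = {frozenset(c): support(frozenset(c))
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if support(frozenset(c)) >= min_sup}

# Closed: frequent, and no super-pattern has the SAME support
closed = {X for X, s in frequent.items()
          if not any(X < Y and frequent[Y] == s for Y in frequent)}

# Maximal: frequent, and no super-pattern is frequent at all
maximal = {X for X in frequent if not any(X < Y for Y in frequent)}

print('closed :', sorted(map(sorted, closed)))   # {a}, {a,b}, {a,c}
print('maximal:', sorted(map(sorted, maximal)))  # {a,b}, {a,c}

Every maximal itemset is closed, but not conversely: here {a} is
closed (no superset has its support of 4) yet not maximal.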
Frequent Pattern Growth Method
FP-growth (1)
• Example:
– Given a minimum support threshold of 3 and the transactional
database T:
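(The database itself appears as a figure on the original slide; the
transactions below are reconstructed from the support counts and
FP-tree paths used later, and match the classic example of Han and
Kamber, reference 2 below.)
TID 100: f, a, c, d, g, i, m, p
TID 200: a, b, c, f, l, m, o
TID 300: b, f, h, j, o
TID 400: b, c, k, s, p
TID 500: a, f, c, e, l, p, m, n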
FP-growth (2)
• The first scan of database T derives the list L of frequent
items, sorted in descending order of support:
L = {(f, 4), (c, 4), (a, 3), (b, 3), (m, 3), (p, 3)}
– The root of the tree, labeled ROOT, is created.

• Scan the database T a second time.


– The scan of the first transaction leads to the construction of
the first branch of the FP-tree:
{(f, 1), (c, 1), (a, 1), (m, 1), (p, 1)}
– The second transaction shares the common prefix f, c, a with
this branch, so the counts along the prefix are incremented and
a new sub-branch {(b, 1), (m, 1)} is attached under a; the first
branch becomes:
{(f, 2), (c, 2), (a, 2), (m, 1), (p, 1)}
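The node structure and the insertion step can be sketched in a few
lines of Python (the class and function names, FPNode and insert,
are my own, not from the slides); each transaction is inserted with
its frequent items ordered by L and infrequent items dropped:

class FPNode:
    # One FP-tree node: an item label, a count, and child links
    def __init__(self, item, count=0, parent=None):
        self.item, self.count, self.parent = item, count, parent
        self.children = {}

def insert(node, ordered_items, count=1):
    # Walk or extend the tree along the transaction's ordered items
    for item in ordered_items:
        if item not in node.children:       # new suffix: grow a branch
            node.children[item] = FPNode(item, parent=node)
        node = node.children[item]
        node.count += count                 # shared prefix: bump count

root = FPNode(None)                         # the ROOT node of the slide
for txn in (['f', 'c', 'a', 'm', 'p'],      # TID 100
            ['f', 'c', 'a', 'b', 'm'],      # TID 200
            ['f', 'b'],                     # TID 300
            ['c', 'b', 'p'],                # TID 400
            ['f', 'c', 'a', 'm', 'p']):     # TID 500
    insert(root, txn)

After the first two transactions the tree holds exactly the branches
shown above; after all five it contains the paths used in the
following slides.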
FP-growth (3)
• To facilitate tree traversal, an item header table is built, in
which each item of the list L is linked, through node-links, to
the nodes of the FP-tree carrying that item.
• According to the list of frequent items L, the complete set of
frequent itemsets can be divided into subsets (6 for our
example) without overlap; a sketch of the header table follows
this list:
1. frequent itemsets having item p (the end of the list L);
2. the itemsets having item m but no p;
3. the frequent itemsets with b and without both m and p;
4. the itemsets with a and without b, m, and p;
5. the itemsets with c and without a, b, m, and p;
6. the large itemsets only with f.
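Continuing the sketch above, the header table can be built with a
simple traversal (a minimal stand-in for the node-link chains on
the slide):

def build_header_table(root):
    # Map each item to all FP-tree nodes carrying it (the node-links)
    header, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node.item is not None:
            header.setdefault(node.item, []).append(node)
        stack.extend(node.children.values())
    return header

header = build_header_table(root)
# header['p'] now reaches every p-node directly, which is exactly
# what partition 1 (itemsets containing p) needs.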
FP-growth (4)
• For our example, two paths containing item p are selected in the
FP-tree:
{(f, 4), (c, 3), (a, 3), (m, 2), (p, 2)} and
{(c, 1), (b, 1), (p, 1)}
– The samples accumulated for the frequent item p (each prefix
path weighted by p's count on that path) are
{(f, 2), (c, 2), (a, 2), (m, 2), (p, 2)} and
{(c, 1), (b, 1), (p, 1)}
– Under the threshold (3), only c qualifies (2 + 1 = 3), so the
only frequent itemset found in this subset is
{(c, 3), (p, 3)}, or simplified {c, p}
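The same computation in the running sketch (again with my own
helper names): collect p's prefix paths weighted by p's node
counts, then keep the items that reach the threshold:

from collections import Counter

def conditional_pattern_base(header, item):
    # For each node of `item`, walk up to ROOT and record the prefix
    # path, weighted by that node's count (the accumulated samples)
    base = []
    for node in header[item]:
        path, up = [], node.parent
        while up is not None and up.item is not None:
            path.append(up.item)
            up = up.parent
        base.append((path, node.count))
    return base

def frequent_in_base(base, min_sup):
    counts = Counter()
    for path, count in base:
        counts.update({item: count for item in path})
    return {item: c for item, c in counts.items() if c >= min_sup}

base_p = conditional_pattern_base(header, 'p')  # f-c-a-m (x2), c-b (x1)
print(frequent_in_base(base_p, 3))              # {'c': 3}  ->  {c, p}

The full algorithm then recurses on a conditional FP-tree built from
this base, but with a single frequent item c the recursion stops
immediately and {c, p} is the only result.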
FP-growth (5)
• The next subset of frequent itemsets consists of those with m
and without p.
– The FP-tree yields the paths:
{(f, 4), (c, 3), (a, 3), (m, 2)} and
{(f, 4), (c, 3), (a, 3), (b, 1), (m, 1)}
– with the corresponding accumulated samples:
{(f, 2), (c, 2), (a, 2), (m, 2)} and
{(f, 1), (c, 1), (a, 1), (b, 1), (m, 1)}
– Analyzing these samples (f, c, and a each reach support
2 + 1 = 3), we discover the frequent itemset
{(f, 3), (c, 3), (a, 3), (m, 3)}, or simplified {f, c, a, m}
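With the helpers above, the same step for m is one call (the full
method recursively mines m's conditional FP-tree; for this base a
flat count already exposes the result):

base_m = conditional_pattern_base(header, 'm')  # f-c-a (x2), f-c-a-b (x1)
print(frequent_in_base(base_m, 3))              # a, c, f each with count 3
# So {f, c, a, m} is frequent. The recursion on m's conditional
# FP-tree also yields the sub-itemsets containing m, e.g. {f, m},
# {c, m}, {a, m}, {f, c, m}, ...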
FP-growth (6)
• Repeating the same process for subsets 3) to 6) of our example,
additional frequent itemsets could be mined.
• In our example these are the itemsets {f, c, a} and {f, c}, but
they are already subsets of the frequent itemset {f, c, a, m}.
• Therefore the final solution of the FP-growth method is the set
of frequent itemsets, which in our example is
{{c, p}, {f, c, a, m}}
(together with their subsets and the single frequent items of L).
• The FP-growth algorithm is typically about an order of magnitude
faster than the Apriori algorithm.
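In practice the algorithm is available off the shelf; here is a
short sketch using the mlxtend library (assuming it is installed),
reproducing the slide's example with min_support = 3/5:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

transactions = [['f', 'a', 'c', 'd', 'g', 'i', 'm', 'p'],
                ['a', 'b', 'c', 'f', 'l', 'm', 'o'],
                ['b', 'f', 'h', 'j', 'o'],
                ['b', 'c', 'k', 's', 'p'],
                ['a', 'f', 'c', 'e', 'l', 'p', 'm', 'n']]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
df = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

# min_support is a fraction here: 3 of 5 transactions = 0.6
print(fpgrowth(df, min_support=0.6, use_colnames=True))

Note that the library reports every frequent itemset (all subsets
of {f, c, a, m} and {c, p}, plus the single item b), whereas the
slides quote only the largest ones.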
References
You may follow the listed books for further reading.
1. Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, Vipin
Kumar, Pearson Education.
2. Data Mining: Concepts and Techniques, Jiawei Han, Micheline Kamber, 2nd
Edition, Morgan Kaufmann Publisher.
3. Data Warehousing Fundamentals for IT Professionals, Paulraj Ponniah,
Second Edition, Wiley India.
4. Introduction to Machine Learning with Python, A. C. Muller and S. Guido,
O’Reilly.
5. Data Mining: A Tutorial Based Primer, Richard Roiger, Michael Geatz,
Pearson Education.
6. Introduction to Data Mining with Case Studies, G.K. Gupta, PHI.
