The document discusses the FP-growth algorithm for frequent pattern mining. It explains the key concepts of maximal and closed frequent itemsets and how FP-growth works by building an FP-tree and dividing the search space. The algorithm is described with an example transactional database to show how it efficiently finds all frequent itemsets without candidate generation.
Association Rule Mining-III
Maximal vs Closed Frequent Itemsets
• An itemset X is a max-pattern or maximal frequent itemset if X is frequent and there exists no frequent super-pattern Y ⊃ X.
• An itemset X is closed if X is frequent and there exists no super-pattern Y ⊃ X with the same support as X.
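These two definitions can be made concrete with a small brute-force sketch (illustration only, not FP-growth; the helper names `frequent_itemsets`, `maximal`, `closed` and the toy database are made up for this example):

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    """Brute-force support counting over all candidate itemsets."""
    items = sorted({i for t in transactions for i in t})
    freq = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            support = sum(1 for t in transactions if set(cand) <= t)
            if support >= minsup:
                freq[frozenset(cand)] = support
    return freq

def maximal(freq):
    # maximal: no frequent proper superset exists at all
    return {x for x in freq if not any(x < y for y in freq)}

def closed(freq):
    # closed: no proper superset has the same support
    return {x for x in freq if not any(x < y and freq[y] == freq[x] for y in freq)}

# toy database of 4 transactions, minimum support 2
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
freq = frequent_itemsets(db, 2)
```

By construction maximal ⊆ closed ⊆ frequent: the support of any frequent itemset equals that of its smallest closed superset, which is why the closed sets are lossless while the maximal sets alone are not.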
• Closed frequent itemsets are lossless: the support of any frequent itemset can be deduced from the closed frequent itemsets.

Frequent Pattern Growth Method

FP-growth (1)
• Example: given a minimum support threshold of 3 and a transactional database T.

FP-growth (2)
• The first scan of database T derives the list L of frequent items, ordered by descending support:
L = {(f, 4), (c, 4), (a, 3), (b, 3), (m, 3), (p, 3)}
• The root of the tree, labeled ROOT, is created.
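The first scan can be reproduced in a few lines. Note that the slides do not reproduce database T itself; the five transactions below are an assumption, taken from the classic Han et al. example, which yields exactly the stated support counts:

```python
from collections import Counter

# Assumed data: the slide does not list T; these are the five classic
# transactions from Han et al. that produce the stated L list.
T = [
    ["f", "a", "c", "d", "g", "i", "m", "p"],
    ["a", "b", "c", "f", "l", "m", "o"],
    ["b", "f", "h", "j", "o"],
    ["b", "c", "k", "s", "p"],
    ["a", "f", "c", "e", "l", "p", "m", "n"],
]
MINSUP = 3

# First scan: count every item, keep those meeting the threshold,
# and order them by descending support (tie order is a fixed convention).
counts = Counter(item for t in T for item in t)
L = sorted(((i, c) for i, c in counts.items() if c >= MINSUP),
           key=lambda ic: -ic[1])
print(L)
```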
• Scan database T a second time.
– The scan of the first transaction leads to the construction of the first branch of the FP-tree: {(f, 1), (c, 1), (a, 1), (m, 1), (p, 1)}.
– The second transaction shares the items f, c, and a with the previous branch, so the counts along the common prefix f, c, a are incremented and the remaining items start a new branch: {(f, 2), (c, 2), (a, 2), (b, 1), (m, 1)}.

FP-growth (3)
• To facilitate tree traversal, an item header table is built, in which each item of the list L connects, through node-links, all nodes of the FP-tree carrying that item.
• According to the list of frequent items L, the complete set of frequent itemsets can be divided into non-overlapping subsets (6 in our example):
1. frequent itemsets containing item p (the end of the L list);
2. frequent itemsets containing m but not p;
3. frequent itemsets containing b but neither m nor p;
4.–5. ...;
6. frequent itemsets containing only f.

FP-growth (4)
• For our example, two paths ending in p are selected in the FP-tree: {(f, 4), (c, 3), (a, 3), (m, 2), (p, 2)} and {(c, 1), (b, 1), (p, 1)}.
– The samples with the frequent item p (each prefix carried with p's count) are {(f, 2), (c, 2), (a, 2), (m, 2), (p, 2)} and {(c, 1), (b, 1), (p, 1)}.
– Given the threshold value of 3, only the itemset {(c, 3), (p, 3)}, or simplified {c, p}, is frequent.

FP-growth (5)
• The next subset of frequent itemsets are those with m and without p.
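The second scan described above can be sketched as follows, assuming each transaction has already been filtered to its frequent items and reordered according to L (the header table and node-links are omitted for brevity; `FPNode` and `insert_transaction` are illustrative names):

```python
class FPNode:
    """One FP-tree node; children are keyed by item label."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def insert_transaction(root, items):
    # walk/extend one root-to-leaf path; a shared prefix only bumps counts
    node = root
    for item in items:
        node = node.children.setdefault(item, FPNode(item, node))
        node.count += 1

root = FPNode(None, None)
insert_transaction(root, ["f", "c", "a", "m", "p"])  # first transaction
insert_transaction(root, ["f", "c", "a", "b", "m"])  # shares prefix f, c, a
```

After the second insertion the shared prefix holds counts (f:2, c:2, a:2), with two branches below a: m–p from the first transaction and b–m from the second.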
– The FP-tree yields the paths:
{(f, 4), (c, 3), (a, 3), (m, 2)} and {(f, 4), (c, 3), (a, 3), (b, 1), (m, 1)}
– The corresponding accumulated samples are: {(f, 2), (c, 2), (a, 2), (m, 2)} and {(f, 1), (c, 1), (a, 1), (b, 1), (m, 1)}
– Analyzing these samples, we discover the frequent itemset {(f, 3), (c, 3), (a, 3), (m, 3)}, or simplified {f, c, a, m}.

FP-growth (6)
• Repeating the same process for subsets 3 to 6 of our example, additional frequent itemsets can be mined.
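The per-suffix mining steps above (for p, then for m without p) can be checked numerically; `conditional_frequent` is a made-up helper that simply accumulates the counts carried by each prefix path and keeps the items reaching the threshold:

```python
from collections import Counter

MINSUP = 3

def conditional_frequent(pattern_base, minsup):
    """Accumulate item counts over a conditional pattern base."""
    counts = Counter()
    for path, count in pattern_base:
        for item in path:
            counts[item] += count
    return {i: c for i, c in counts.items() if c >= minsup}

# prefix paths ending in p, each weighted by p's count on that path
base_p = [(["f", "c", "a", "m"], 2), (["c", "b"], 1)]
# prefix paths ending in m (paths without p), weighted by m's counts
base_m = [(["f", "c", "a"], 2), (["f", "c", "a", "b"], 1)]

print(conditional_frequent(base_p, MINSUP))  # only c survives -> itemset {c, p}
print(conditional_frequent(base_m, MINSUP))  # f, c, a survive -> {f, c, a, m}
```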
• In our example these are the itemsets {f, c, a} and {f, c}, but they are already subsets of the frequent itemset {f, c, a, m}.
• Therefore the final solution of the FP-growth method can be summarized by its maximal frequent itemsets, which in our example are {c, p} and {f, c, a, m} (the single item {b} also meets the threshold on its own, but extends to no larger frequent itemset).
• The FP-growth algorithm is about an order of magnitude faster than the Apriori algorithm.
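As a cross-check, a brute-force enumeration (again assuming the classic Han et al. transactions, since the slides do not list T) confirms the result, and shows why the singleton {b} also counts as maximal: b meets the threshold alone but pairs with nothing frequent:

```python
from itertools import combinations

# Assumed data: the slide does not list T; classic Han et al. transactions.
T = [set("facdgimp"), set("abcflmo"), set("bfhjo"), set("bcksp"), set("afcelpmn")]
MINSUP = 3

items = sorted(set().union(*T))
frequent = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        support = sum(1 for t in T if set(cand) <= t)
        if support >= MINSUP:
            frequent[frozenset(cand)] = support

# maximal = frequent itemsets with no frequent proper superset
maximal = {x for x in frequent if not any(x < y for y in frequent)}
```

Every other frequent itemset found this way, such as {f, c} or {f, c, a}, is a subset of one of the maximal itemsets, matching the FP-growth derivation above.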