Association Analysis-Part2 Notes

The document discusses techniques for efficient candidate generation and support counting in the Apriori algorithm for frequent itemset mining. It describes: 1) how candidate k-itemsets are generated by merging frequent (k-1)-itemsets, 2) how candidate pruning eliminates candidate itemsets that cannot be frequent because one of their subsets is infrequent, and 3) how a hash tree can be used to count support efficiently by hashing the itemsets of each transaction to the leaf nodes that store candidates, in a single scan of the transactions.

Association Analysis-part2

Candidate Generation and Pruning


• The apriori-gen function generates candidate itemsets by
  performing the following two operations:
  1. Candidate Generation. This operation generates new candidate
     k-itemsets based on the frequent (k-1)-itemsets found in the
     previous iteration.
  2. Candidate Pruning. This operation eliminates some of the
     candidate k-itemsets using the support-based pruning strategy.
Candidate Generation: Brute-force method
Candidate Generation: Merge Fk-1 and F1 itemsets
Candidate Generation: Fk-1 x Fk-1 Method

• Merge two frequent (k-1)-itemsets if their first (k-2) items are
  identical.

• F3 = {ABC, ABD, ABE, ACD, BCD, BDE, CDE}
  – Merge(ABC, ABD) = ABCD
  – Merge(ABC, ABE) = ABCE
  – Merge(ABD, ABE) = ABDE
  – Do not merge (ABD, ACD) because they share only a prefix of
    length 1 instead of length 2.
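The merge step might be sketched in Python as follows, assuming itemsets are represented as sorted tuples; the function name generate_candidates is an illustrative choice, not part of the original pseudocode.

    from itertools import combinations

    def generate_candidates(freq_k_minus_1, k):
        """F(k-1) x F(k-1) method: merge two frequent (k-1)-itemsets
        whose first (k-2) items are identical."""
        itemsets = sorted(tuple(sorted(s)) for s in freq_k_minus_1)
        candidates = []
        for a, b in combinations(itemsets, 2):
            # a and b are sorted, so a shared (k-2)-prefix is a direct comparison
            if a[:k - 2] == b[:k - 2]:
                candidates.append(tuple(sorted(set(a) | set(b))))
        return candidates

    # Example from the slide: F3 = {ABC, ABD, ABE, ACD, BCD, BDE, CDE}
    F3 = ["ABC", "ABD", "ABE", "ACD", "BCD", "BDE", "CDE"]
    print(generate_candidates(F3, 4))
    # [('A', 'B', 'C', 'D'), ('A', 'B', 'C', 'E'), ('A', 'B', 'D', 'E')]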
Candidate Pruning

• Let F3 = {ABC, ABD, ABE, ACD, BCD, BDE, CDE} be the set of
  frequent 3-itemsets.

• L4 = {ABCD, ABCE, ABDE} is the set of candidate 4-itemsets
  generated (from the previous slide).

• Candidate pruning
  – Prune ABCE because ACE and BCE are infrequent
  – Prune ABDE because ADE is infrequent

• After candidate pruning: L4 = {ABCD}
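The pruning step on the same example could look like the following sketch; the candidate set the slide calls L4 appears here as the variable C4, and prune_candidates is an illustrative name.

    from itertools import combinations

    def prune_candidates(candidates, freq_k_minus_1):
        """Support-based pruning: keep a candidate k-itemset only if every
        one of its (k-1)-subsets is itself frequent."""
        frequent = {tuple(sorted(s)) for s in freq_k_minus_1}
        kept = []
        for c in candidates:
            if all(sub in frequent for sub in combinations(c, len(c) - 1)):
                kept.append(c)
        return kept

    F3 = ["ABC", "ABD", "ABE", "ACD", "BCD", "BDE", "CDE"]
    C4 = [("A", "B", "C", "D"), ("A", "B", "C", "E"), ("A", "B", "D", "E")]
    print(prune_candidates(C4, F3))
    # [('A', 'B', 'C', 'D')]  -- ABCE is dropped because ACE and BCE are
    #                            infrequent, ABDE because ADE is infrequent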


Alternate Fk-1 x Fk-1 Method

• Merge two frequent (k-1)-itemsets if the last (k-2) items of the
  first one are identical to the first (k-2) items of the second.

• F3 = {ABC, ABD, ABE, ACD, BCD, BDE, CDE}
  – Merge(ABC, BCD) = ABCD
  – Merge(ABD, BDE) = ABDE
  – Merge(ACD, CDE) = ACDE
  – Merge(BCD, CDE) = BCDE
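A sketch of the alternate merge rule under the same sorted-tuple representation; the only change from the earlier merge is that the suffix of the first itemset is compared with the prefix of the second, so ordered pairs are considered.

    from itertools import permutations

    def generate_candidates_alt(freq_k_minus_1, k):
        """Alternate F(k-1) x F(k-1) method: merge itemsets a and b when
        the last (k-2) items of a equal the first (k-2) items of b."""
        itemsets = [tuple(sorted(s)) for s in freq_k_minus_1]
        candidates = set()
        for a, b in permutations(itemsets, 2):   # ordered pairs, a != b
            if a[1:] == b[:k - 2]:               # suffix of a vs. prefix of b
                candidates.add(tuple(sorted(set(a) | set(b))))
        return sorted(candidates)

    F3 = ["ABC", "ABD", "ABE", "ACD", "BCD", "BDE", "CDE"]
    print(generate_candidates_alt(F3, 4))
    # [('A', 'B', 'C', 'D'), ('A', 'B', 'D', 'E'),
    #  ('A', 'C', 'D', 'E'), ('B', 'C', 'D', 'E')]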
Candidate Pruning for Alternate Fk-1 x Fk-1 Method

• Let F3 = {ABC, ABD, ABE, ACD, BCD, BDE, CDE} be the set of
  frequent 3-itemsets.

• L4 = {ABCD, ABDE, ACDE, BCDE} is the set of candidate 4-itemsets
  generated (from the previous slide).

• Candidate pruning
  – Prune ABDE because ADE is infrequent
  – Prune ACDE because ACE and ADE are infrequent
  – Prune BCDE because BCE is infrequent

• After candidate pruning: L4 = {ABCD}
Support Counting

• One approach to support counting is to compare each transaction
  against every candidate itemset and update the support counts of
  the candidates contained in the transaction.

• This approach is computationally expensive, especially when the
  numbers of transactions and candidate itemsets are large.

• An alternative approach is to enumerate the itemsets contained in
  each transaction and use them to update the support counts of
  their respective candidate itemsets.
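A short sketch contrasting the two approaches just described, assuming transactions are lists of items and candidate k-itemsets are sorted tuples; the function names are illustrative.

    from itertools import combinations
    from collections import defaultdict

    def count_support_brute_force(transactions, candidates):
        """Compare every transaction against every candidate itemset."""
        counts = {c: 0 for c in candidates}
        for t in transactions:
            t_set = set(t)
            for c in candidates:
                if set(c) <= t_set:          # candidate contained in transaction
                    counts[c] += 1
        return counts

    def count_support_by_enumeration(transactions, candidates, k):
        """Enumerate the k-subsets of each transaction and update the
        counts of those subsets that are candidates."""
        counts = defaultdict(int)
        candidate_set = set(candidates)
        for t in transactions:
            for subset in combinations(sorted(t), k):
                if subset in candidate_set:
                    counts[subset] += 1
        return counts

    transactions = [[1, 2, 3, 5, 6], [1, 4, 5], [2, 3, 4]]
    candidates = [(1, 2, 5), (2, 3, 4), (3, 5, 6)]
    print(count_support_brute_force(transactions, candidates))
    print(dict(count_support_by_enumeration(transactions, candidates, 3)))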
Support Counting Using a Hash Tree

[Figure: a candidate hash tree built for 15 candidate 3-itemsets:
{1,4,5}, {1,2,4}, {4,5,7}, {1,2,5}, {4,5,8}, {1,5,9}, {1,3,6}, {2,3,4},
{5,6,7}, {3,4,5}, {3,5,6}, {3,5,7}, {6,8,9}, {3,6,7}, {3,6,8}.
The hash function sends items 1, 4, 7 to one branch, items 2, 5, 8 to a
second branch, and items 3, 6, 9 to a third. This slide highlights the
subtree reached by hashing on item 1, 4 or 7 at the root.]
Support Counting Using a Hash Tree

[Figure: the same candidate hash tree, first highlighting the subtree
reached by hashing on item 2, 5 or 8 at the root, then the subtree
reached by hashing on item 3, 6 or 9.]
Support Counting Using a Hash Tree

[Figure: matching the transaction {1, 2, 3, 5, 6} against the hash tree.
At the root level the transaction is partitioned into 1 + {2,3,5,6},
2 + {3,5,6} and 3 + {5,6}, and each partition is hashed on its leading
item.]
Support Counting Using a Hash Tree

[Figure: continuing the traversal for transaction {1, 2, 3, 5, 6}. At the
second level the partitions are split further, e.g. 1 2 + {3,5,6},
1 3 + {5,6} and 1 5 + {6}, until leaf nodes are reached and the candidates
stored there are compared against the transaction. In this example the
transaction is matched against 11 out of the 15 candidates.]
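The figures above trace the textbook hash tree. Below is a minimal, simplified Python sketch of the same idea: it assumes the hash function h(item) = item mod 3 (so items 1, 4, 7, items 2, 5, 8, and items 3, 6, 9 land in three separate branches), a leaf capacity of three itemsets, and 3-itemset candidates. The Node class, the method names, and the exact shape of the resulting tree are illustrative choices and will not reproduce the figure exactly, so the number of candidate comparisons may differ from the 11-out-of-15 quoted above.

    MAX_LEAF_SIZE = 3   # assumed leaf capacity
    K = 3               # candidates are 3-itemsets

    def h(item):
        """Assumed hash function: items 1,4,7 / 2,5,8 / 3,6,9 fall into
        three different buckets, as in the figure."""
        return item % 3

    class Node:
        def __init__(self):
            self.children = {}   # bucket -> child Node (internal node)
            self.itemsets = []   # candidate itemsets stored here (leaf node)

        def insert(self, itemset, depth=0):
            if self.children:                    # internal node: hash and descend
                bucket = h(itemset[depth])
                self.children.setdefault(bucket, Node()).insert(itemset, depth + 1)
                return
            self.itemsets.append(itemset)        # leaf node
            if len(self.itemsets) > MAX_LEAF_SIZE and depth < K:
                stored, self.itemsets = self.itemsets, []
                for s in stored:                 # split the overfull leaf
                    bucket = h(s[depth])
                    self.children.setdefault(bucket, Node()).insert(s, depth + 1)

        def match(self, transaction, start, matched):
            """Collect the candidates in this subtree that are contained in
            the (sorted) transaction.  'matched' is a set, so a leaf reached
            along several paths still contributes each candidate only once."""
            if self.children:                    # hash every remaining item
                for i in range(start, len(transaction)):
                    child = self.children.get(h(transaction[i]))
                    if child is not None:
                        child.match(transaction, i + 1, matched)
            else:                                # leaf: compare candidates with t
                t_set = set(transaction)
                for s in self.itemsets:
                    if set(s) <= t_set:
                        matched.add(s)

    # The 15 candidate 3-itemsets from the figure
    candidates = [(1, 4, 5), (1, 2, 4), (4, 5, 7), (1, 2, 5), (4, 5, 8),
                  (1, 5, 9), (1, 3, 6), (2, 3, 4), (5, 6, 7), (3, 4, 5),
                  (3, 5, 6), (3, 5, 7), (6, 8, 9), (3, 6, 7), (3, 6, 8)]
    root = Node()
    for c in candidates:
        root.insert(c)

    # Count support for the transaction from the slides in a single pass
    counts = {c: 0 for c in candidates}
    for t in [[1, 2, 3, 5, 6]]:
        matched = set()
        root.match(sorted(t), 0, matched)
        for c in matched:
            counts[c] += 1

    print([c for c, n in counts.items() if n > 0])
    # [(1, 2, 5), (1, 3, 6), (3, 5, 6)] -- the candidates contained in {1,2,3,5,6}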
Rule Generation


Confidence-based Pruning
