CSC-452 – Lecture 07
Association Rules
Aniqa Naeem
Email : [email protected]
Previous Lecture
Apriori example with Supmin = 2

Database TDB
Tid Items
10 A, C, D
20 B, C, E
30 A, B, C, E
40 B, E

1st scan → C1
Itemset sup
{A} 2
{B} 3
{C} 3
{D} 1
{E} 3

L1
Itemset sup
{A} 2
{B} 3
{C} 3
{E} 3

C2 (generated from L1)
{A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}

2nd scan → C2 with counts
Itemset sup
{A, B} 1
{A, C} 2
{A, E} 1
{B, C} 2
{B, E} 3
{C, E} 2

L2
Itemset sup
{A, C} 2
{B, C} 2
{B, E} 3
{C, E} 2

C3 (generated from L2)
{B, C, E}

3rd scan → L3
Itemset sup
{B, C, E} 2

C4 = ∅, so the algorithm terminates.
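Previous Lecture
Generate Ck from Lk-1 to find Lk:
• Join: join Lk-1 with itself to form the candidate k-itemsets
• Prune: remove every candidate that has an infrequent (k – 1)-subset
07/06/2021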
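The level-wise trace above can be reproduced with a short, unoptimized Python sketch. It covers the join and prune steps as well as the support-counting scans. It assumes the slide's TDB and an absolute Supmin of 2. The helper names `count_support`, `apriori_gen` and `apriori` are hypothetical and are not a library API.

```python
from itertools import combinations

# Transaction database TDB from the slide (Tid -> items)
TDB = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}
MIN_SUP = 2  # Supmin = 2 (absolute support count)


def count_support(candidates, db):
    """One database scan: count how many transactions contain each candidate."""
    counts = {c: 0 for c in candidates}
    for items in db.values():
        for c in candidates:
            if c <= items:
                counts[c] += 1
    return counts


def apriori_gen(prev_level, k):
    """Hypothetical helper: join L(k-1) with itself, then prune by the Apriori property."""
    prev = sorted(tuple(sorted(s)) for s in prev_level)
    candidates = set()
    for a, b in combinations(prev, 2):
        if a[:k - 2] == b[:k - 2]:  # join step: both share their first k-2 items
            cand = frozenset(a) | frozenset(b)
            # prune step: every (k-1)-subset of the candidate must be frequent
            if all(frozenset(sub) in prev_level for sub in combinations(cand, k - 1)):
                candidates.add(cand)
    return candidates


def apriori(db, min_sup):
    """Return [L1, L2, ...], each a dict mapping frozenset itemsets to support counts."""
    candidates = {frozenset([i]) for t in db.values() for i in t}
    levels, k = [], 1
    while candidates:
        counts = count_support(candidates, db)  # k-th database scan
        level = {c: n for c, n in counts.items() if n >= min_sup}
        if not level:
            break
        levels.append(level)
        k += 1
        candidates = apriori_gen(set(level), k)
    return levels


for k, level in enumerate(apriori(TDB, MIN_SUP), start=1):
    print(f"L{k}:", sorted((sorted(s), n) for s, n in level.items()))
```

Each pass of the `while` loop corresponds to one database scan in the trace. The loop stops when `apriori_gen` returns no candidates (C4 = ∅).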
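Previous Lecture
Generating association rules from frequent itemsets

TID List of items
T100 I1, I2, I5
T200 I2, I4
T300 I2, I3
T400 I1, I2, I4
T500 I1, I3
T600 I2, I3
T700 I1, I3
T800 I1, I2, I3, I5
T900 I1, I2, I3

Itemset Support count
{I1, I2, I3} 2
{I1, I2, I5} 2

For l = {I1, I2, I5}, each nonempty proper subset s gives the rule s ⇒ (l – s). Its confidence is support_count(l) / support_count(s).

Nonempty subset   Rule                Confidence
{I1, I2}          {I1, I2} ⇒ I5       2/4 = 50%
{I1, I5}          {I1, I5} ⇒ I2       2/2 = 100%
{I2, I5}          {I2, I5} ⇒ I1       2/2 = 100%
{I1}              I1 ⇒ {I2, I5}       2/6 = 33%
{I2}              I2 ⇒ {I1, I5}       2/7 = 29%
{I5}              I5 ⇒ {I1, I2}       2/2 = 100%

For a min_confidence = 70%, only the three rules with 100% confidence are strong.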
Data Mining 2013 – Mining Frequent Patterns, Association, and Correlations
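The confidence table can be checked with a small Python sketch. It enumerates every nonempty proper subset s of l and keeps s ⇒ (l – s) when support_count(l) / support_count(s) reaches min_confidence. The function names are hypothetical, and support is counted by a brute-force scan rather than looked up from stored frequent itemsets.

```python
from itertools import combinations

# Transactions T100..T900 from the slide
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
    {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]


def support_count(itemset):
    """Number of transactions containing every item of `itemset` (brute-force scan)."""
    return sum(1 for t in transactions if itemset <= t)


def rules_from(itemset, min_conf):
    """Hypothetical helper: yield (antecedent, consequent, confidence) for strong rules."""
    itemset = frozenset(itemset)
    sup_l = support_count(itemset)
    for r in range(1, len(itemset)):              # nonempty proper subsets only
        for s in combinations(sorted(itemset), r):
            s = frozenset(s)
            conf = sup_l / support_count(s)       # confidence(s => l - s)
            if conf >= min_conf:
                yield sorted(s), sorted(itemset - s), conf


for lhs, rhs, conf in rules_from({"I1", "I2", "I5"}, min_conf=0.70):
    print(f"{lhs} => {rhs}  confidence = {conf:.0%}")
```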
Agenda
• The Basics
– Market Basket Analysis
– Frequent Itemsets
– Association Rules
• Frequent Itemset Mining Methods
– Apriori Algorithm
– Generating Association Rules from Frequent Itemsets
– FP-Growth
• Pattern Evaluation Methods
Chapter 6: Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods
Book: Data Mining: Concepts and Techniques, 4th Edition, by J. Han, J. Pei, and M. Kamber
Association Rules
FP-Growth
Bottleneck of Frequent-pattern Mining
• The core of the Apriori algorithm:
– Use frequent (k – 1)-itemsets to generate candidate frequent k-itemsets
– Use database scans and pattern matching to collect counts for the candidate itemsets
• Multiple database scans are costly
– Needs (n + 1) scans, where n is the length of the longest pattern
• Mining long patterns needs many passes of scanning and generates lots of candidates
– To find the frequent itemset $i_1 i_2 \ldots i_{100}$
• # of scans: 100
• # of candidates: $\binom{100}{1} + \binom{100}{2} + \cdots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30}$!
• Bottleneck: candidate-generation-and-test
• Can we avoid candidate generation?
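The candidate count in the bullets above is the number of nonempty subsets of a 100-item set. A quick check with Python's standard library confirms the arithmetic.

```python
from math import comb

# Every nonempty subset of a 100-item frequent pattern is itself frequent,
# so Apriori has to generate and count each of them as a candidate
n = 100
candidates = sum(comb(n, k) for k in range(1, n + 1))

# Summing C(n, k) over k = 0..n gives 2^n; dropping k = 0 leaves 2^n - 1
assert candidates == 2 ** n - 1
print(f"{candidates:.2e}")  # 1.27e+30
```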
Mining Frequent Patterns Without Candidate Generation
• Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure
– highly condensed, but complete for frequent pattern mining
– avoid costly database scans
• Develop an efficient, FP-tree-based frequent pattern mining method
– A divide-and-conquer methodology: decompose mining tasks into smaller ones
– Avoid candidate generation: sub-database test only!
Mining Frequent Patterns Without Candidate Generation
• Grow long patterns from short ones using local frequent items
– “abc” is a frequent pattern
– Get all transactions having “abc”: DB|abc
– “d” is a local frequent item in DB|abc ⇒ abcd is a frequent pattern
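To make DB|abc concrete, here is a small Python sketch. It projects a database on a pattern and finds the local frequent items in the projection. It reuses the four-transaction TDB from the previous lecture's Apriori example purely as an illustration, since the slide uses abstract items. `project` and `local_frequent_items` are hypothetical helper names.

```python
from collections import Counter

# Illustrative data: the TDB from the previous lecture's Apriori example
TDB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
MIN_SUP = 2


def project(db, pattern):
    """DB|pattern: transactions containing `pattern`, with the pattern's own items removed."""
    return [t - pattern for t in db if pattern <= t]


def local_frequent_items(db, pattern, min_sup):
    """Items that are frequent inside DB|pattern; each one extends `pattern`."""
    counts = Counter(i for t in project(db, pattern) for i in t)
    return {i: n for i, n in counts.items() if n >= min_sup}


# C and E are local frequent items in DB|{B}, so {B, C} and {B, E} are frequent
print(local_frequent_items(TDB, {"B"}, MIN_SUP))
# Growing again: E is local frequent in DB|{B, C}, so {B, C, E} is frequent
print(local_frequent_items(TDB, {"B", "C"}, MIN_SUP))
```

The patterns grown this way ({B, C}, {B, E}, {B, C, E}) match the L2 and L3 entries found by Apriori, but no candidates were generated.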
Mining Frequent Itemsets
FP-Growth
Transactional data example: N = 9, min_supp count = 2

TID List of items
T100 I1, I2, I5
T200 I2, I4
T300 I2, I3
T400 I1, I2, I4
T500 I1, I3
T600 I2, I3
T700 I1, I3
T800 I1, I2, I3, I5
T900 I1, I2, I3

Scan the dataset for the count of each candidate (C1). Then compare each candidate's support with min_supp and reorder the frequent items by descending support count (L1 - Reordered).

C1
Itemset Support count
{I1} 6
{I2} 7
{I3} 6
{I4} 2
{I5} 2

L1 - Reordered
Itemset Support count
{I2} 7
{I1} 6
{I3} 6
{I4} 2
{I5} 2
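The first scan and the reordering step can be sketched in Python as follows. Breaking ties by item name is an assumption made here (the slide does not state a tie rule); on this data it happens to give the slide's order I2, I1, I3, I4, I5. The variable names are illustrative.

```python
from collections import Counter

# The nine transactions from the slide; N = 9, min_supp count = 2
transactions = {
    "T100": ["I1", "I2", "I5"], "T200": ["I2", "I4"], "T300": ["I2", "I3"],
    "T400": ["I1", "I2", "I4"], "T500": ["I1", "I3"], "T600": ["I2", "I3"],
    "T700": ["I1", "I3"], "T800": ["I1", "I2", "I3", "I5"], "T900": ["I1", "I2", "I3"],
}
MIN_SUP = 2

# First scan: C1 support counts
c1 = Counter(item for items in transactions.values() for item in items)

# L1 - Reordered: frequent items by descending count (ties by item name, an assumption)
f_list = sorted((i for i, n in c1.items() if n >= MIN_SUP), key=lambda i: (-c1[i], i))
rank = {item: r for r, item in enumerate(f_list)}

# Rewrite every transaction in f-list order, dropping infrequent items
reordered = {
    tid: sorted((i for i in items if i in rank), key=rank.__getitem__)
    for tid, items in transactions.items()
}
print(f_list)
print(reordered["T800"])
```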
Mining Frequent Itemsets
FP-Growth – FP-tree Construction
Start the FP-tree with only the root, null { }. The header table holds the items in L1 - Reordered order:
Itemset Support count
{I2} 7
{I1} 6
{I3} 6
{I4} 2
{I5} 2
Mining Frequent Itemsets
FP-Growth – FP-tree Construction
Rewrite each transaction with its items in L1 - Reordered order:
TID List of items
T100 I2, I1, I5
T200 I2, I4
T300 I2, I3
T400 I2, I1, I4
T500 I1, I3
T600 I2, I3
T700 I1, I3
T800 I2, I1, I3, I5
T900 I2, I1, I3

Insert T100 (I2, I1, I5) as the first path:
null { }
└─ I2:1
   └─ I1:1
      └─ I5:1

Order of items is kept throughout path construction, with common prefixes shared whenever applicable.
Mining Frequent Itemsets
FP-Growth – FP-tree Construction
Insert T200 (I2, I4). I2 is a prefix shared with the existing path. A new child I4:1 is therefore created under I2, and the count of the shared node is incremented to I2:2:
null { }
└─ I2:2
   ├─ I1:1
   │  └─ I5:1
   └─ I4:1
Mining Frequent Itemsets
FP-Growth – FP-tree Construction
Insert T300 (I2, I3). I2 is again shared, so a new child I3:1 is added under I2 and the shared node becomes I2:3:
null { }
└─ I2:3
   ├─ I1:1
   │  └─ I5:1
   ├─ I3:1
   └─ I4:1
Mining Frequent Itemsets
FP-Growth – FP-tree Construction
After all nine transactions are inserted, the complete FP-tree is:
null { }
├─ I2:7
│  ├─ I1:4
│  │  ├─ I5:1
│  │  ├─ I3:2
│  │  │  └─ I5:1
│  │  └─ I4:1
│  ├─ I3:2
│  └─ I4:1
└─ I1:2
   └─ I3:2

Header table (each entry's node-link chains together all nodes carrying that item, for tree traversal):
Itemset Support count
{I2} 7
{I1} 6
{I3} 6
{I4} 2
{I5} 2

Trace the node-link path for each entry and you get that item's support count. For example, I3 appears in three nodes: 2 + 2 + 2 = 6.
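The construction steps above can be sketched in Python. This minimal sketch assumes the transactions are already in L1 - Reordered order. `FPNode` and `build_fp_tree` are hypothetical names. The header table is modelled as a plain list of nodes per item rather than as linked node pointers.

```python
from collections import defaultdict


class FPNode:
    def __init__(self, item, parent):
        self.item = item      # None for the root, null { }
        self.count = 0
        self.parent = parent
        self.children = {}    # item -> FPNode


def build_fp_tree(ordered_transactions):
    """Insert each f-list-ordered transaction, sharing common prefixes."""
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> all nodes holding that item (stands in for node-links)
    for items in ordered_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:               # prefix not shared: start a new branch
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1                # shared or new node: count this transaction
            node = child
    return root, header


# Transactions rewritten in L1 - Reordered order, as on the slide
ordered = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"], ["I2", "I1", "I4"], ["I1", "I3"],
    ["I2", "I3"], ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]
root, header = build_fp_tree(ordered)

# Following an item's node-links and summing the node counts gives its support count
for item in ["I2", "I1", "I3", "I4", "I5"]:
    print(item, [n.count for n in header[item]], sum(n.count for n in header[item]))
```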
Mining Frequent Itemsets
FP-Growth – Frequent Patterns Mining
Bottom-up algorithm: start from the leaves and go up to the root. I5, for example, has two paths to the root:
• null { } → I2:7 → I1:4 → I5:1
• null { } → I2:7 → I1:4 → I3:2 → I5:1
{I3, I5} occurs only once, so its frequency is below the min_support count threshold.
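The I5 prefix paths can also be collected directly from the reordered transactions. Every root-to-I5 path in the FP-tree is the prefix that precedes I5 in some reordered transaction, and identical prefixes merge into one path with a summed count. The sketch below relies on that equivalence instead of walking an explicit tree. The variable names are illustrative.

```python
from collections import Counter

# Transactions in L1 - Reordered order, as listed on the construction slides
ordered = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"], ["I2", "I1", "I4"], ["I1", "I3"],
    ["I2", "I3"], ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]
MIN_SUP = 2
item = "I5"

# Prefix paths of I5 (I5 itself removed), merged with their counts
paths = Counter(tuple(t[:t.index(item)]) for t in ordered if item in t)
print(dict(paths))

# Support of each item inside I5's prefix paths equals the support of {item, I5}
local = Counter()
for path, n in paths.items():
    for i in path:
        local[i] += n
print({i: n for i, n in local.items() if n < MIN_SUP})  # too infrequent to combine with I5
```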
Mining Frequent Itemsets
FP-Growth – Conditional FP-tree Construction
For I5:
1. Eliminate transactions not including I5. This leaves T100 (I2, I1, I5) and T800 (I2, I1, I3, I5).
2. Eliminate I5 itself. This leaves the prefix paths (I2, I1) and (I2, I1, I3).
3. Build a tree from these paths, starting from null { }.
After T100:
null { }
└─ I2:1
   └─ I1:1
After T800:
null { }
└─ I2:2
   └─ I1:2
      └─ I3:1
Mining Frequent Itemsets
FP-Growth
Item | Conditional Pattern Base | Conditional FP-tree | Frequent Patterns Generated
I5 | {{I2, I1: 1}, {I2, I1, I3: 1}} | ⟨I2: 2, I1: 2⟩ | {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4 | {{I2, I1: 1}, {I2: 1}} | ⟨I2: 2⟩ | {I2, I4: 2}
I3 | {{I2, I1: 2}, {I2: 2}, {I1: 2}} | ⟨I2: 4, I1: 2⟩, ⟨I1: 2⟩ | {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}
I1 | {{I2: 4}} | ⟨I2: 4⟩ | {I2, I1: 4}
• Conditional pattern base: the paths in which the item is the suffix, i.e. the prefix paths to the item.
• Conditional FP-tree: those prefix paths after eliminating infrequent items.
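The whole table can be regenerated by a compact recursive sketch of the FP-growth idea. One simplification should be named plainly: each conditional pattern base is kept as a list of weighted prefix paths rather than as a compressed FP-tree. The sketch therefore shows the divide-and-conquer recursion but not the tree's memory savings. `prefix_paths`, `merge` and `fp_growth` are hypothetical names.

```python
from collections import Counter

# Transactions T100..T900 in L1 - Reordered order, each with count 1
ORDERED = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"], ["I2", "I1", "I4"], ["I1", "I3"],
    ["I2", "I3"], ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]
MIN_SUP = 2


def prefix_paths(base, item):
    """Conditional pattern base of `item`: the prefix before it in every path containing it."""
    paths = []
    for path, n in base:
        if item in path:
            prefix = path[:path.index(item)]
            if prefix:
                paths.append((prefix, n))
    return paths


def merge(paths):
    """Merge identical prefix paths and add up their counts, as the FP-tree would."""
    merged = Counter()
    for path, n in paths:
        merged[tuple(path)] += n
    return dict(merged)


def fp_growth(base, suffix, min_sup, results):
    """Record every frequent pattern that ends with `suffix`, growing it one item at a time."""
    counts = Counter()
    for path, n in base:
        for item in path:
            counts[item] += n
    # Conditional "FP-tree" step: drop items that are infrequent in this base
    base = [([i for i in path if counts[i] >= min_sup], n) for path, n in base]
    for item, sup in counts.items():
        if sup >= min_sup:
            pattern = suffix | {item}
            results[pattern] = sup
            fp_growth(prefix_paths(base, item), pattern, min_sup, results)
    return results


base = [(t, 1) for t in ORDERED]
for item in ["I5", "I4", "I3", "I1"]:
    print(item, merge(prefix_paths(base, item)))  # conditional pattern base column

patterns = fp_growth(base, frozenset(), MIN_SUP, {})
for p, sup in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(p), sup)
```

The recursion yields the same 13 frequent itemsets as Apriori on this data: 5 single items, 6 pairs and 2 triples.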
Mining Frequent Itemsets
FP-Growth – Conditional FP-tree Construction
For I4:
1. Eliminate transactions not including I4. This leaves T200 (I2, I4) and T400 (I2, I1, I4).
2. Eliminate I4. This leaves the prefix paths (I2) and (I2, I1):
null { }
└─ I2:2
   └─ I1:1
Mining Frequent Itemsets
FP-Growth – Conditional FP-tree Construction
For I3:
1. Eliminate transactions not including I3. This leaves T300, T500, T600, T700, T800 and T900.
2. Eliminate I3:
null { }
├─ I2:4
│  └─ I1:2
└─ I1:2
Exercise: Construct FP-Tree
TID Items bought
100 {f, a, c, d, g, i, m, p}
200 {a, b, c, f, l, m, o}
300 {b, f, h, j, o, w}
400 {b, c, k, s, p}
500 {a, f, c, e, l, p, m, n}
min_support = 3
Exercise: Construct FP-Tree (Solution)
min_support = 3
TID Items bought (Ordered) frequent items
100 {f, a, c, d, g, i, m, p} {f, c, a, m, p}
200 {a, b, c, f, l, m, o} {f, c, a, b, m}
300 {b, f, h, j, o, w} {f, b}
400 {b, c, k, s, p} {c, b, p}
500 {a, f, c, e, l, p, m, n} {f, c, a, m, p}

1. Scan DB once, find frequent 1-itemsets (single-item patterns)
2. Sort frequent items in frequency-descending order: the f-list
3. Scan DB again, construct the FP-tree

Header Table (item, frequency, head of its node-link)
Item frequency
f 4
c 4
a 3
b 3
m 3
p 3
F-list = f-c-a-b-m-p

FP-tree:
{}
├─ f:4
│  ├─ c:3
│  │  └─ a:3
│  │     ├─ m:2
│  │     │  └─ p:2
│  │     └─ b:1
│  │        └─ m:1
│  └─ b:1
└─ c:1
   └─ b:1
      └─ p:1
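The three solution steps can be checked with a short Python sketch. Items f and c tie at 4, and a, b, m, p tie at 3. The sketch therefore takes the F-list from the slide instead of deriving a tie order. Tree nodes are plain `[count, children]` lists here, a representation chosen only for brevity. `insert` and `show` are hypothetical helper names.

```python
from collections import Counter

db = {
    100: ["f", "a", "c", "d", "g", "i", "m", "p"],
    200: ["a", "b", "c", "f", "l", "m", "o"],
    300: ["b", "f", "h", "j", "o", "w"],
    400: ["b", "c", "k", "s", "p"],
    500: ["a", "f", "c", "e", "l", "p", "m", "n"],
}
MIN_SUP = 3
F_LIST = ["f", "c", "a", "b", "m", "p"]  # from the slide; fixes the order among tied items

# Step 1: first scan, frequent single items
counts = Counter(i for items in db.values() for i in items)
frequent = {i: n for i, n in counts.items() if n >= MIN_SUP}
assert set(frequent) == set(F_LIST)

# Step 2: keep each transaction's frequent items, ordered by the f-list
rank = {item: r for r, item in enumerate(F_LIST)}
ordered = {tid: sorted((i for i in items if i in rank), key=rank.__getitem__)
           for tid, items in db.items()}


# Step 3: second scan, insert each ordered list; a node is [count, children]
def insert(tree, items):
    for item in items:
        node = tree.setdefault(item, [0, {}])  # reuse the shared prefix node if present
        node[0] += 1
        tree = node[1]


fp_tree = {}
for items in ordered.values():
    insert(fp_tree, items)


def show(tree, depth=0):
    for item, (count, children) in tree.items():
        print("  " * depth + f"{item}:{count}")
        show(children, depth + 1)


show(fp_tree)
```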
Pattern Evaluation Methods
• Not all association rules are interesting
– buys(X, “computer games”) ⇒ buys(X, “videos”) [support = 40%, confidence = 66%]
– P(“videos”) is already 75% > 66%
– The two items are negatively associated ⇒ buying one decreases the likelihood of buying the other
• We need to measure the “real strength” of a rule
• Correlation analysis
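One standard correlation measure is lift: lift(A, B) = P(A ∪ B) / (P(A)P(B)), where P(A ∪ B) is the probability that a transaction contains both A and B. This simplifies to confidence(A ⇒ B) / P(B). A value below 1 indicates negative correlation. The slide itself does not name a measure, so choosing lift is an assumption. The sketch uses only the three percentages quoted above.

```python
# Numbers from the slide's rule: buys(X, "computer games") => buys(X, "videos")
support = 0.40     # P(games ∪ videos): share of transactions containing both
confidence = 0.66  # P(videos | games)
p_videos = 0.75    # P(videos)

# lift = P(A ∪ B) / (P(A) P(B)); P(games) is implied by support / confidence
p_games = support / confidence
lift = support / (p_games * p_videos)  # same value as confidence / p_videos
print(f"lift = {lift:.2f}")  # below 1: buying games makes buying videos less likely
```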
Pattern Evaluation Methods
References
1. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 3rd Edition, Elsevier, 2012
2. Ian H. Witten, Eibe Frank, and Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Elsevier, 2011
3. Markus Hofmann and Ralf Klinkenberg, RapidMiner: Data Mining Use Cases and Business Analytics Applications, CRC Press Taylor & Francis Group, 2014
4. Daniel T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005
5. Ethem Alpaydin, Introduction to Machine Learning, 3rd Edition, MIT Press, 2014
6. Florin Gorunescu, Data Mining: Concepts, Models and Techniques, Springer, 2011
7. Oded Maimon and Lior Rokach, Data Mining and Knowledge Discovery Handbook, 2nd Edition, Springer, 2010
8. Warren Liao and Evangelos Triantaphyllou (eds.), Recent Advances in Data Mining of Enterprise Data: Algorithms and Applications, World Scientific, 2007