ch06 Assocrules
Association Rules
Association Rule Discovery
Supermarket shelf management – Market-basket model:
Goal: Identify items that are bought together by sufficiently many customers
Approach: Process the sales data collected with barcode scanners to find dependencies among items
A classic rule:
If someone buys diapers and milk, then he/she is likely to buy beer
Don’t be surprised if you find six-packs next to diapers!
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 2
The Market-Basket Model
Input:
A large set of items
e.g., things sold in a supermarket
A large set of baskets
Each basket is a small subset of items
e.g., the things one customer buys on one day
Output:
Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
Want to discover
association rules
People who bought {x,y,z} tend to buy {v,w}
Amazon!
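As a rough illustration of "items bought together by sufficiently many customers", the sketch below counts, by brute force, how many baskets contain each pair of items and keeps the pairs that reach a threshold. The baskets are made-up toy data, the function name `frequent_pairs` is hypothetical, and the threshold `s` is assumed to mean "appears in at least s baskets". It is a naive in-memory count for small inputs, not a method for large datasets.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, s):
    """Return {pair: count} for item pairs found together in at least s baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so every unordered pair has a single canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= s}


# Hypothetical toy baskets (not taken from the slides).
baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

pairs = frequent_pairs(baskets, s=3)
for pair, count in sorted(pairs.items()):
    print(pair, count)
```

With this toy data and s = 3, only the pairs (Coke, Milk) and (Diaper, Milk) survive. Every other pair appears in at most two baskets.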
Applications – (1)
Items = products; Baskets = sets of products someone bought in one trip to the store
Real market baskets: Chain stores keep TBs of data about what customers buy together
Tells how typical customers navigate stores, lets them position tempting items
Suggests tie-in “tricks”, e.g., run a sale on diapers and raise the price of beer
Need the rule to occur frequently, or no $$’s
Amazon’s “people who bought X also bought Y”
For example:
Finding communities in graphs (e.g., Twitter)
[Figure: 4 bytes per pair vs. 12 per occurring pair]
Pass 2:
Only count pairs that hash to frequent buckets
PCY Algorithm – Between Passes
Replace the buckets by a bit-vector:
1 means the bucket count exceeded the support s
(call it a frequent bucket); 0 means it did not
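The two passes and the step between them can be sketched as follows. This is a minimal in-memory illustration, not the authors' implementation. The names `pcy_frequent_pairs`, `bucket`, and `num_buckets` are hypothetical, and the CRC32-based hash is an arbitrary choice that assumes items are strings. The sketch treats a count of at least s as frequent; use a strict comparison if "exceeded" is meant literally. It also requires both items of a pair to be frequent in Pass 2, which is part of the usual description of PCY but is not stated in the text above.

```python
import zlib
from collections import Counter
from itertools import combinations


def pcy_frequent_pairs(baskets, s, num_buckets):
    """Sketch of PCY: return {pair: count} for pairs with count >= s."""

    def bucket(pair):
        # Hypothetical hash function: any deterministic hash of the pair works.
        return zlib.crc32("|".join(pair).encode()) % num_buckets

    # Pass 1: count single items and hash every pair into a bucket counter.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket(pair)] += 1

    # Between passes: replace the bucket counts by a bit-vector.
    # A list of booleans stands in for a packed bitmap here.
    bitmap = [count >= s for count in bucket_counts]
    frequent_items = {i for i, c in item_counts.items() if c >= s}

    # Pass 2: count only candidate pairs, i.e. pairs of frequent items
    # that hash to a frequent bucket.
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket) & frequent_items)
        for pair in combinations(items, 2):
            if bitmap[bucket(pair)]:
                pair_counts[pair] += 1

    return {pair: c for pair, c in pair_counts.items() if c >= s}


# Hypothetical toy baskets (not taken from the slides).
baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

result = pcy_frequent_pairs(baskets, s=3, num_buckets=7)
```

The bitmap only prunes candidates. A frequent pair always lands in a frequent bucket, so the final counts match a brute-force count whatever hash function or number of buckets is chosen. Fewer buckets just means more collisions and less pruning.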
[Figure: main-memory layout. Pass 1: hash table for pairs. Pass 2: bitmap and counts of candidate pairs.]
[Figure: main-memory layout with the first hash table; then Bitmap 1 and the second hash table; then Bitmap 1, Bitmap 2, and counts of candidate pairs.]
[Figure: main-memory layout. Pass 1: first hash table and second hash table. Pass 2: Bitmap 1, Bitmap 2, and counts of candidate pairs.]