Mining Frequent Patterns, Association and Correlations
— Chapter 5 —
Jiawei Han
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
©2006 Jiawei Han and Micheline Kamber, All rights reserved
diaper}
i.e., every transaction having {beer, diaper, nuts} also
@SIGMOD’00)
Vertical data format approach (Charm: Zaki & Hsiao @SDM’02)
February 16, 2022 7
Apriori: A Candidate Generation-and-Test Approach
C3: Itemset {B, C, E}
3rd scan
L3: Itemset {B, C, E}, sup 2
The Apriori Algorithm
Pseudo-code:
Ck: Candidate itemset of size k
Lk : frequent itemset of size k
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1
        that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
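The pseudo-code can be turned into a short runnable sketch. The version below is one possible Python rendering, not the authors' implementation: the function name `apriori`, the use of absolute counts for `min_support`, and the four-transaction toy database are assumptions made for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset with count >= min_support.

    min_support is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # L1: count single items and keep the frequent ones
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 1
    while current:  # loop until Lk is empty
        # C(k+1): unions of two frequent k-itemsets that have k+1 items
        # and whose k-subsets are all frequent
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in current for s in combinations(union, k)
            ):
                candidates.add(union)

        # One database scan: count candidates contained in each transaction
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        # L(k+1): candidates that meet min_support
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent  # union of all Lk

# Hypothetical toy database, chosen so that {B, C, E} is frequent
db = [["A", "C", "D"], ["B", "C", "E"], ["A", "B", "C", "E"], ["B", "E"]]
result = apriori(db, min_support=2)
print(result[frozenset("BCE")])  # 2
```

Itemsets are stored as `frozenset` so they can serve as dictionary keys; a production implementation would typically use a hash tree or similar structure for the counting step.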
Important Details of Apriori
How to generate candidates?
Step 1: self-joining Lk
Step 2: pruning
How to count supports of candidates?
Example of Candidate-generation
L3={abc, abd, acd, ace, bcd}
Self-joining: L3*L3
abcd from abc and abd
acde from acd and ace
Pruning:
acde is removed because ade is not in L3
C4={abcd}
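The two steps can be checked with a small sketch. It assumes itemsets are represented as sorted tuples of items and joins two k-itemsets when they share their first k-1 items; the name `apriori_gen` is chosen here for illustration.

```python
from itertools import combinations

def apriori_gen(frequent_k):
    """Generate C(k+1) from L(k); itemsets are sorted tuples of items."""
    prev = set(frequent_k)
    if not prev:
        return []
    k = len(next(iter(prev)))
    ordered = sorted(prev)

    # Step 1: self-join itemsets that agree on their first k-1 items
    joined = []
    for i, p in enumerate(ordered):
        for q in ordered[i + 1:]:
            if p[:-1] == q[:-1]:
                joined.append(p + (q[-1],))

    # Step 2: prune candidates that have a k-subset not in L(k)
    return [c for c in joined
            if all(s in prev for s in combinations(c, k))]

l3 = [tuple(s) for s in ["abc", "abd", "acd", "ace", "bcd"]]
c4 = apriori_gen(l3)
print(["".join(c) for c in c4])  # ['abcd']
```

The join produces abcd and acde; acde is dropped in the pruning step because its subset ade is not in L3.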
support
Exploration of shared multi-level mining (Agrawal & Srikant@VLDB’95, Han & Fu@VLDB’95)
mined is maximized
2-D quantitative association rules: Aquan1 ∧ Aquan2 ⇒ Acat
Cluster adjacent association rules to form general rules using a 2-D grid
Example
age(X,"34-35") ∧ income(X,"30-50K") ⇒ buys(X,"high resolution TV")
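The idea of clustering adjacent rules on a 2-D grid can be illustrated with a minimal sketch. It only handles the simplest case, where the grid cells that satisfy the rule fill one complete rectangle. The cell coordinates, the 10K income bin width, and the function name `merge_cells` are assumptions made for this example and do not come from the slides.

```python
def merge_cells(cells):
    """Merge (age, income_bin) cells into one rectangle, or return None.

    Returns ((age_lo, age_hi), (bin_lo, bin_hi)) only if the cells fill
    the whole bounding rectangle, i.e. they are all adjacent.
    """
    cells = set(cells)
    if not cells:
        return None
    ages = [a for a, _ in cells]
    bins = [b for _, b in cells]
    a_lo, a_hi = min(ages), max(ages)
    b_lo, b_hi = min(bins), max(bins)
    full = {(a, b)
            for a in range(a_lo, a_hi + 1)
            for b in range(b_lo, b_hi + 1)}
    if cells != full:
        return None
    return (a_lo, a_hi), (b_lo, b_hi)

BIN_WIDTH_K = 10  # assumed: income bin i covers [10*i K, 10*(i+1) K)

# Hypothetical grid cells whose rules all predict the same purchase
cells = [(34, 3), (35, 3), (34, 4), (35, 4)]
merged = merge_cells(cells)
(age_lo, age_hi), (bin_lo, bin_hi) = merged
rule = 'age(X,"%d-%d") ∧ income(X,"%d-%dK") ⇒ buys(X,"high resolution TV")' % (
    age_lo, age_hi, bin_lo * BIN_WIDTH_K, (bin_hi + 1) * BIN_WIDTH_K)
print(rule)
```

With these four cells the merged rule covers ages 34-35 and incomes 30-50K, matching the example above.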
Dec.’02
Dimension/level constraint
in relevance to region, price, brand, customer category
$200)
Interestingness constraint
strong rules: min_support ≥ 3%, min_confidence ≥ 60%
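As a rough illustration of the interestingness constraint, the sketch below computes support and confidence for a single rule and tests it against the two thresholds. The transactions and the helper names `rule_stats` and `is_strong` are hypothetical; only the 3% and 60% thresholds come from the slide.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Transactions containing the antecedent
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    # Transactions containing both antecedent and consequent
    n_both = sum(1 for t in transactions
                 if (antecedent | consequent) <= set(t))
    support = n_both / n if n else 0.0
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence

def is_strong(support, confidence, min_support=0.03, min_confidence=0.60):
    """A rule is strong if it meets both thresholds."""
    return support >= min_support and confidence >= min_confidence

# Hypothetical transactions for illustration only
db = [
    {"beer", "diaper", "nuts"},
    {"beer", "diaper"},
    {"beer", "milk"},
    {"diaper", "milk"},
    {"nuts", "milk"},
]
support, confidence = rule_stats(db, {"beer"}, {"diaper"})
print(is_strong(support, confidence))  # True
```

Here beer ⇒ diaper holds in 2 of 5 transactions (support 40%) and in 2 of the 3 transactions containing beer (confidence about 67%), so it passes both thresholds.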
Constrained Mining vs. Constraint-Based Search
integrate them
Constrained mining vs. query processing in DBMS
Database query processing requires to find all