
CS 43105 Data Mining Techniques

Chapter 4: Frequent Itemset Mining

Xiang Lian
Department of Computer Science
Kent State University
Email: [email protected]
Homepage: http://www.cs.kent.edu/~xlian/
Outline

• Basic Concepts
• Frequent Itemset Mining

What is Data Mining?

• = Pattern Mining?
• What patterns?
• Why are they useful?

What Is Frequent Pattern Analysis?

• Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
• First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining
• Motivation: finding inherent regularities in data
  • What products were often purchased together? Beer and diapers?!
  • What are the subsequent purchases after buying a PC?
  • What kinds of DNA are sensitive to this new drug?
  • Can we automatically classify Web documents?
• Applications
  • Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis

Why Is Frequent Pattern Mining Important?

• Frequent pattern: an intrinsic and important property of datasets
• Foundation for many essential data mining tasks
  • Association, correlation, and causality analysis
  • Sequential, structural (e.g., sub-graph) patterns
  • Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
  • Classification: discriminative, frequent pattern analysis
  • Cluster analysis: frequent pattern-based clustering
  • Data warehousing: iceberg cube and cube-gradient
  • Semantic data compression: fascicles
• Broad applications

Frequent Patterns/Itemsets

• Itemset: a set of one or more items
  • Example: {Milk, Bread, Diaper}
• k-itemset X = {x1, …, xk}
  • An itemset that contains k items

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

(Figure: customers who buy beer, customers who buy diapers, and customers who buy both)

Definition: Frequent Itemset

• Absolute support or support count (σ)
  • Frequency of occurrence of an itemset
  • E.g. σ({Milk, Bread, Diaper}) = 2
• Relative support or support (s)
  • Fraction of transactions that contain an itemset
  • E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent itemset
  • An itemset whose support is greater than or equal to a minsup threshold

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
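The two support measures above can be computed directly. The following is a minimal illustrative sketch (not the lecture's code) over the slide's five example transactions:

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, db):
    # sigma(X): number of transactions containing every item of X
    return sum(1 for t in db if itemset <= t)

def support(itemset, db):
    # s(X): fraction of transactions containing X
    return support_count(itemset, db) / len(db)

X = {"Milk", "Bread", "Diaper"}
print(support_count(X, transactions))  # 2
print(support(X, transactions))        # 0.4 = 2/5
```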
Frequent Itemsets Mining

• Minimum support level 50%
  • Frequent itemsets: {A}, {B}, {C}, {A,B}, {A,C}

TID   Transactions
100   {A, B, E}
200   {B, D}
300   {A, B, E}
400   {A, C}
500   {B, C}
600   {A, C}
700   {A, B}
800   {A, B, C, E}
900   {A, B, C}
1000  {A, C, E}

Three Different Views of FIM

• Transactional Database
  • How do we store a transactional database?
  • Horizontal, Vertical, Transaction-Item Pair
• Binary Matrix
• Bipartite Graph (TIDs on one side, items on the other)

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Representation of Transactional Databases

• Horizontal vs. vertical data layout

Horizontal Data Layout         Vertical Data Layout
TID  Items                     Item  TID-list
1    A,B,E                     A     1, 4, 5, 6, 7, 8, 9
2    B,C,D                     B     1, 2, 5, 7, 8, 10
3    C,E                       C     2, 3, 4, 5, 8, 9
4    A,C,D                     D     2, 4, 5, 9
5    A,B,C,D                   E     1, 3, 6
6    A,E
7    A,B
8    A,B,C
9    A,C,D
10   B
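The vertical layout is just an inversion of the horizontal one. A small sketch (illustrative, assuming the slide's ten transactions) of that conversion:

```python
from collections import defaultdict

horizontal = {
    1: ["A", "B", "E"], 2: ["B", "C", "D"], 3: ["C", "E"],
    4: ["A", "C", "D"], 5: ["A", "B", "C", "D"], 6: ["A", "E"],
    7: ["A", "B"], 8: ["A", "B", "C"], 9: ["A", "C", "D"], 10: ["B"],
}

def to_vertical(db):
    # invert TID -> items into item -> sorted TID-list
    tidlists = defaultdict(list)
    for tid in sorted(db):
        for item in db[tid]:
            tidlists[item].append(tid)
    return dict(tidlists)

vertical = to_vertical(horizontal)
print(vertical["A"])  # [1, 4, 5, 6, 7, 8, 9]
print(vertical["E"])  # [1, 3, 6]
```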

Representation of Transactional
Databases (cont'd)
• Binary Matrix Representation
TID A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1

Frequent Itemset Generation

The itemset lattice over items {A, B, C, D, E}:

null
A   B   C   D   E
AB  AC  AD  AE  BC  BD  BE  CD  CE  DE
ABC  ABD  ABE  ACD  ACE  ADE  BCD  BCE  BDE  CDE
ABCD  ABCE  ABDE  ACDE  BCDE
ABCDE

Given d items, there are 2^d possible candidate itemsets.

Frequent Itemset Generation

• Brute-force approach:
  • Each itemset in the lattice is a candidate frequent itemset
  • Count the support of each candidate by scanning the database: N transactions of maximum width w, matched against a list of M candidates
  • Match each transaction against every candidate
  • Complexity ~ O(N M w) => expensive, since M = 2^d!
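The brute-force approach above can be sketched directly; this is an illustrative toy (not the lecture's code) that makes the O(N M w) cost visible, since every one of the 2^d − 1 candidates triggers a full scan:

```python
from itertools import combinations

def brute_force_frequent(db, minsup):
    # enumerate all M = 2^d - 1 nonempty candidates in the lattice and
    # count each one with a full scan of the N transactions
    items = sorted(set().union(*db))
    frequent = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            count = sum(1 for t in db if set(cand) <= t)
            if count >= minsup:
                frequent[frozenset(cand)] = count
    return frequent

db = [{"Bread", "Milk"},
      {"Bread", "Diaper", "Beer", "Eggs"},
      {"Milk", "Diaper", "Beer", "Coke"},
      {"Bread", "Milk", "Diaper", "Beer"},
      {"Bread", "Milk", "Diaper", "Coke"}]
result = brute_force_frequent(db, minsup=3)
print(result[frozenset({"Beer", "Diaper"})])  # 3
print(len(result))                            # 8 frequent itemsets
```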

Scalable Frequent Itemset Mining Methods

• Scalable mining methods: Three major approaches
  • Apriori (Agrawal & Srikant @VLDB'94)
  • Vertical data format approach (CHARM: Zaki & Hsiao @SDM'02)
  • Frequent pattern growth (FP-growth: Han, Pei & Yin @SIGMOD'00)

Reducing the Number of Candidates

• Apriori pruning principle:
  • If an itemset is frequent, then all of its subsets must also be frequent
  • Reason: every transaction having {beer, diaper, nuts} also contains the subset {beer, diaper}
  • If there is any itemset which is infrequent, its supersets should not be generated/tested!
• The Apriori pruning principle holds due to the following property of the support measure:
  ∀X, Y: (X ⊆ Y) ⇒ s(X) ≥ s(Y)
  • The support of an itemset never exceeds the support of its subsets
  • This is known as the anti-monotone property of support

Illustrating Apriori Principle

null
A   B   C   D   E
AB  AC  AD  AE  BC  BD  BE  CD  CE  DE
ABC  ABD  ABE  ACD  ACE  ADE  BCD  BCE  BDE  CDE
ABCD  ABCE  ABDE  ACDE  BCDE
ABCDE

Once an itemset in the lattice is found to be infrequent, all of its supersets are pruned without being generated or tested.

Illustrating Apriori Principle

Minimum Support = 3

Items (1-itemsets):
Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Pairs (2-itemsets; no need to generate candidates involving Coke or Eggs):
Itemset         Count
{Bread,Milk}    3
{Bread,Beer}    2
{Bread,Diaper}  3
{Milk,Beer}     2
{Milk,Diaper}   3
{Beer,Diaper}   3

Triplets (3-itemsets):
Itemset              Count
{Bread,Milk,Diaper}  2

If every subset is considered: C(6,1) + C(6,2) + C(6,3) = 6 + 15 + 20 = 41 candidates.
With support-based pruning: 6 + 6 + 1 = 13 candidates.

The Apriori Algorithm: Another Example

Supmin = 2

Database TDB:
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

C1 (after 1st scan): {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
L1: {A}:2, {B}:3, {C}:3, {E}:3

C2: {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}
C2 (after 2nd scan): {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2
L2: {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2

C3: {B,C,E}
C3 (after 3rd scan): {B,C,E}:2
L3: {B,C,E}:2

Apriori: A Candidate Generation & Test Approach

• Apriori method
  • Initially, scan DB once to get frequent 1-itemsets
  • Generate length-(k+1) candidate itemsets from length-k frequent itemsets
  • Test the candidates against DB
  • Terminate when no frequent or candidate set can be generated

The Apriori Algorithm (Pseudo-Code)

Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
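The pseudo-code above can be rendered compactly in Python. This is a sketch, not the original implementation; it runs on the TDB example from the earlier slide (Supmin = 2):

```python
from itertools import combinations

def apriori(db, min_support):
    # level-wise generate-and-test, following the pseudo-code above
    db = [frozenset(t) for t in db]
    counts = {}
    for t in db:                       # first scan: count 1-itemsets
        for item in t:
            c = frozenset([item])
            counts[c] = counts.get(c, 0) + 1
    Lk = {c: n for c, n in counts.items() if n >= min_support}
    result = dict(Lk)
    k = 1
    while Lk:
        # candidate generation: self-join Lk ...
        Ck1 = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        # ... then Apriori pruning of candidates with an infrequent k-subset
        Ck1 = {c for c in Ck1
               if all(frozenset(s) in Lk for s in combinations(c, k))}
        # one database scan counts all surviving (k+1)-candidates
        counts = {c: sum(1 for t in db if c <= t) for c in Ck1}
        Lk = {c: n for c, n in counts.items() if n >= min_support}
        result.update(Lk)
        k += 1
    return result

tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
freq = apriori(tdb, 2)
print(len(freq))                            # 9 frequent itemsets in total
print(freq[frozenset({"B", "C", "E"})])     # 2, matching L3 on the slide
```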

Apriori

R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB, 487-499, 1994.

Implementation of Apriori

• How to generate candidates?
  • Step 1: self-joining Lk
  • Step 2: pruning
• Example of candidate generation
  • L3 = {abc, abd, acd, ace, bcd}
  • Self-joining: L3 * L3
    • abcd from abc and abd
    • acde from acd and ace
  • Pruning:
    • acde is removed because ade is not in L3
  • C4 = {abcd}
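The two-step candidate generation can be sketched as follows (illustrative, with itemsets kept as sorted tuples so the self-join is prefix-based), reproducing the L3 example above:

```python
from itertools import combinations

def gen_candidates(Lk):
    # Lk: set of frequent k-itemsets, each a sorted tuple of items
    k = len(next(iter(Lk)))
    members = sorted(Lk)
    Ck1 = set()
    # Step 1: self-join -- merge pairs sharing their first k-1 items
    for i, p in enumerate(members):
        for q in members[i + 1:]:
            if p[:k - 1] == q[:k - 1]:
                Ck1.add(tuple(sorted(set(p) | set(q))))
    # Step 2: prune candidates having a k-subset that is not in Lk
    return {c for c in Ck1 if all(s in Lk for s in combinations(c, k))}

L3 = {("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"),
      ("a", "c", "e"), ("b", "c", "d")}
print(gen_candidates(L3))  # {('a', 'b', 'c', 'd')}: acde is pruned via ade
```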

How to Generate Candidates?

• Suppose the items in Lk-1 are listed in an order

• Step 1: self-joining Lk-1 (SQL implementation)

insert into Ck
select p.item1, p.item2, …, p.itemk-1, q.itemk-1
from Lk-1 p, Lk-1 q
where p.item1 = q.item1, …, p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1

• Step 2: pruning

forall itemsets c in Ck do
    forall (k-1)-subsets s of c do
        if (s is not in Lk-1) then delete c from Ck

Challenges of Frequent Itemset Mining

• Challenges
  • Multiple scans of transaction database
  • Huge number of candidates
  • Tedious workload of support counting for candidates
• Improving Apriori: general ideas
  • Reduce passes of transaction database scans
  • Shrink number of candidates
  • Facilitate support counting of candidates

Alternative Methods for Frequent Itemset Generation

• Representation of transactional database: horizontal vs. vertical data layout

Horizontal Data Layout         Vertical Data Layout
TID  Items                     Item  TID-list
1    A,B,E                     A     1, 4, 5, 6, 7, 8, 9
2    B,C,D                     B     1, 2, 5, 7, 8, 10
3    C,E                       C     2, 3, 4, 5, 8, 9
4    A,C,D                     D     2, 4, 5, 9
5    A,B,C,D                   E     1, 3, 6
6    A,E
7    A,B
8    A,B,C
9    A,C,D
10   B

ECLAT: Mining by Exploring Vertical Data Format

• For each item, store a list of transaction ids (tids): its TID-list

Horizontal Data Layout         Vertical Data Layout (TID-lists)
TID  Items                     Item  TID-list
1    A,B,E                     A     1, 4, 5, 6, 7, 8, 9
2    B,C,D                     B     1, 2, 5, 7, 8, 10
3    C,E                       C     2, 3, 4, 5, 8, 9
4    A,C,D                     D     2, 4, 5, 9
5    A,B,C,D                   E     1, 3, 6
6    A,E
7    A,B
8    A,B,C
9    A,C,D
10   B

ECLAT

• Determine the support of any k-itemset by intersecting the tid-lists of two of its (k-1)-subsets:

A: 1, 4, 5, 6, 7, 8, 9
B: 1, 2, 5, 7, 8, 10
A ∩ B = AB: 1, 5, 7, 8

• 3 traversal approaches:
  • top-down, bottom-up and hybrid
• Advantage: very fast support counting
• Disadvantage: intermediate tid-lists may become too large for memory
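ECLAT's core step above can be sketched in a few lines (illustrative, using the slide's TID-lists for A and B):

```python
# TID-lists for the slide's example items A and B
tidlist = {
    "A": {1, 4, 5, 6, 7, 8, 9},
    "B": {1, 2, 5, 7, 8, 10},
}

def tids(itemset, tidlist):
    # tid-list of an itemset = intersection of its items' tid-lists
    result = None
    for item in itemset:
        result = tidlist[item] if result is None else result & tidlist[item]
    return result

tid_AB = tids({"A", "B"}, tidlist)
print(sorted(tid_AB))  # [1, 5, 7, 8]
print(len(tid_AB))     # support count of {A, B} = 4
```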

FP-Growth Algorithm

• Use a compressed representation of the database: an FP-tree
• Once an FP-tree has been constructed, use a recursive divide-and-conquer approach to mine the frequent itemsets

Construct an FP-tree from a Transaction Database

min_support = 3

TID  Items bought                (ordered) frequent items
100  {f, a, c, d, g, i, m, p}    {f, c, a, m, p}
200  {a, b, c, f, l, m, o}       {f, c, a, b, m}
300  {b, f, h, j, o, w}          {f, b}
400  {b, c, k, s, p}             {c, b, p}
500  {a, f, c, e, l, p, m, n}    {f, c, a, m, p}

1. Scan DB once, find frequent 1-itemsets (single item patterns)
2. Sort frequent items in frequency-descending order: the f-list
3. Scan DB again, construct the FP-tree

Header table: f:4, c:4, a:3, b:3, m:3, p:3
F-list = f-c-a-b-m-p

FP-tree (paths share prefixes; header-table node-links not shown):
{}
├── f:4
│   ├── c:3 ── a:3
│   │          ├── m:2 ── p:2
│   │          └── b:1 ── m:1
│   └── b:1
└── c:1 ── b:1 ── p:1
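The three construction steps can be sketched as a small trie builder. This is an illustrative sketch (not the lecture's code; header-table node-links are omitted), and since the tie order among equally frequent items is arbitrary, the slide's f-list is passed in explicitly:

```python
from collections import Counter

class Node:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

def build_fp_tree(db, min_support, flist=None):
    # Step 1: count single items
    freq = Counter(item for t in db for item in t)
    if flist is None:
        # Step 2: frequency-descending f-list (ties broken alphabetically)
        flist = sorted((i for i in freq if freq[i] >= min_support),
                       key=lambda i: (-freq[i], i))
    # Step 3: insert each transaction's ordered frequent items into a trie,
    # sharing common prefixes and accumulating counts along the way
    root = Node(None)
    for t in db:
        node = root
        for item in (i for i in flist if i in t):
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root, flist

db = [set("facdgimp"), set("abcflmo"), set("bfhjow"),
      set("bcksp"), set("afcelpmn")]
root, flist = build_fp_tree(db, 3, flist=list("fcabmp"))  # slide's f-list
print("-".join(flist))           # f-c-a-b-m-p
print(root.children["f"].count)  # f:4 branch
print(root.children["c"].count)  # c:1 branch from TID 400
```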

Partition Patterns and Databases

• Frequent patterns can be partitioned into subsets according to the f-list
  • F-list = f-c-a-b-m-p
  • Patterns containing p
  • Patterns having m but no p
  • …
  • Patterns having c but none of a, b, m, p
  • Pattern f
• This partitioning is complete and non-redundant

Find Patterns Having p From p's Conditional Database

• Start at the frequent-item header table of the FP-tree
• Traverse the FP-tree by following the node-links of each frequent item p
• Accumulate all transformed prefix paths of item p to form p's conditional pattern base

Conditional pattern bases (from the FP-tree above):
item  conditional pattern base
c     f:3
a     fc:3
b     fca:1, f:1, c:1
m     fca:2, fcab:1
p     fcam:2, cb:1

From Conditional Pattern Bases to Conditional FP-trees

• For each pattern base:
  • Accumulate the count for each item in the base
  • Construct the FP-tree for the frequent items of the pattern base

m's conditional pattern base: fca:2, fcab:1

m-conditional FP-tree (b is dropped, since its count 1 < min_support):
{} ── f:3 ── c:3 ── a:3

All frequent patterns relating to m:
m, fm, cm, am, fcm, fam, cam, fcam

Recursion: Mining Each Conditional FP-tree

m-conditional FP-tree: {} ── f:3 ── c:3 ── a:3

• Conditional pattern base of "am": (fc:3)
  am-conditional FP-tree: {} ── f:3 ── c:3
• Conditional pattern base of "cm": (f:3)
  cm-conditional FP-tree: {} ── f:3
• Conditional pattern base of "cam": (f:3)
  cam-conditional FP-tree: {} ── f:3

Another Example: FP-Tree Construction

TID  Items
1    {A,B}
2    {B,C,D}
3    {A,C,D,E}
4    {A,D,E}
5    {A,B,C}
6    {A,B,C,D}
7    {B,C}
8    {A,B,C}
9    {A,B,D}
10   {B,C,E}

After reading TID=1:
{} ── A:1 ── B:1

After reading TID=2:
{}
├── A:1 ── B:1
└── B:1 ── C:1 ── D:1

Another Example: FP-Tree Construction (cont'd)

Final FP-tree after reading all 10 transactions:
{}
├── A:7
│   ├── B:5
│   │   ├── C:3 ── D:1
│   │   └── D:1
│   ├── C:1 ── D:1 ── E:1
│   └── D:1 ── E:1
└── B:3
    └── C:3
        ├── D:1
        └── E:1

Header table (Item: Pointer): A, B, C, D, E
The pointers (node-links) are used to assist frequent itemset generation.

FP-Growth

Conditional pattern base for D:
P = {(A:1, B:1, C:1),
     (A:1, B:1),
     (A:1, C:1),
     (A:1),
     (B:1, C:1)}

Recursively apply FP-growth on P.

Frequent itemsets found (with sup > 1):
AD, BD, CD, ACD, BCD

Compact Representation of Frequent Itemsets

• Some itemsets are redundant because they have identical support as their supersets
• Consider the 15-transaction binary matrix shown earlier: three groups of five transactions, each containing the same block of ten items (A1–A10, B1–B10, C1–C10)
• Number of frequent itemsets = 3 × Σ_{k=1}^{10} C(10, k) = 3 × (2^10 − 1)
• We need a compact representation

Maximal Frequent Itemset

An itemset is maximal frequent if none of its immediate supersets is frequent.

In the itemset lattice, the maximal frequent itemsets lie just inside the border separating the frequent itemsets from the infrequent ones; every itemset above the border (up to ABCDE) is infrequent.

Closed Itemset

• An itemset is closed if none of its immediate supersets has the same support as the itemset

TID  Items
1    {A,B}
2    {B,C,D}
3    {A,B,C,D}
4    {A,B,D}
5    {A,B,C,D}

Itemset  Support        Itemset    Support
{A}      4              {A,B,C}    2
{B}      5              {A,B,D}    3
{C}      3              {A,C,D}    2
{D}      4              {B,C,D}    3
{A,B}    4              {A,B,C,D}  2
{A,C}    2
{A,D}    3
{B,C}    3
{B,D}    4
{C,D}    3
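The closedness test above can be checked mechanically. A brute-force sketch (illustrative only, fine for this 5-transaction example) that flags an itemset as closed when no proper superset has identical support:

```python
from itertools import combinations

def all_supports(db, minsup):
    # support of every itemset meeting minsup (brute force; small example)
    items = sorted(set().union(*db))
    sup = {}
    for k in range(1, len(items) + 1):
        for c in combinations(items, k):
            n = sum(1 for t in db if set(c) <= t)
            if n >= minsup:
                sup[frozenset(c)] = n
    return sup

db = [{"A", "B"}, {"B", "C", "D"}, {"A", "B", "C", "D"},
      {"A", "B", "D"}, {"A", "B", "C", "D"}]
sup = all_supports(db, minsup=1)
# closed: no proper superset with identical support (by anti-monotonicity,
# checking all supersets is equivalent to checking immediate ones)
closed = {x for x in sup
          if not any(x < y and sup[y] == sup[x] for y in sup)}
print(frozenset({"A", "B"}) in closed)  # True: support 4, supersets drop to 3 or 2
print(frozenset({"A"}) in closed)       # False: sup({A}) = sup({A,B}) = 4
```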

Maximal vs. Closed Itemsets

TID  Items
1    ABC
2    ABCD
3    BCE
4    ACDE
5    DE

In the lattice, each itemset is annotated with the ids of the transactions that contain it, e.g. A: 124, B: 123, C: 1234, D: 245, E: 345, AB: 12, AC: 124, AD: 24, ABC: 12, ABCD: 2, ACDE: 4. Itemsets such as ABCDE are not supported by any transaction.

Maximal vs. Closed Frequent Itemsets

Minimum support = 2 (same transactions as the previous slide)

Frequent itemsets: A, B, C, D, E, AB, AC, AD, BC, CD, CE, DE, ABC, ACD
Closed frequent itemsets (9): C, D, E, AC, BC, CE, DE, ABC, ACD
Maximal frequent itemsets (4): ABC, ACD, CE, DE
(e.g. AC is closed but not maximal; ABC is both closed and maximal)

# Closed = 9
# Maximal = 4

Maximal vs. Closed Itemsets

Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets

Ref: Basic Concepts of Frequent Pattern Mining

• (Association Rules) R. Agrawal, T. Imielinski, and A. Swami. Mining


association rules between sets of items in large databases. SIGMOD'93
• (Max-pattern) R. J. Bayardo. Efficiently mining long patterns from databases.
SIGMOD'98
• (Closed-pattern) N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering
frequent closed itemsets for association rules. ICDT'99
• (Sequential pattern) R. Agrawal and R. Srikant. Mining sequential patterns.
ICDE'95

Ref: Apriori and Its Improvements


• R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB'94
• H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient algorithms for discovering
association rules. KDD'94
• A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining
association rules in large databases. VLDB'95
• J. S. Park, M. S. Chen, and P. S. Yu. An effective hash-based algorithm for mining
association rules. SIGMOD'95
• H. Toivonen. Sampling large databases for association rules. VLDB'96
• S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and
implication rules for market basket analysis. SIGMOD'97
• S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with
relational database systems: Alternatives and implications. SIGMOD'98

Ref: Depth-First, Projection-Based FP Mining


• R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of
frequent itemsets. J. Parallel and Distributed Computing, 2002.
• G. Grahne and J. Zhu, Efficiently Using Prefix-Trees in Mining Frequent Itemsets, Proc.
FIMI'03
• B. Goethals and M. Zaki. An introduction to workshop on frequent itemset mining
implementations. Proc. ICDM’03 Int. Workshop on Frequent Itemset Mining
Implementations (FIMI’03), Melbourne, FL, Nov. 2003
• J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation.
SIGMOD’ 00
• J. Liu, Y. Pan, K. Wang, and J. Han. Mining Frequent Item Sets by Opportunistic
Projection. KDD'02
• J. Han, J. Wang, Y. Lu, and P. Tzvetkov. Mining Top-K Frequent Closed Patterns without
Minimum Support. ICDM'02
• J. Wang, J. Han, and J. Pei. CLOSET+: Searching for the Best Strategies for Mining
Frequent Closed Itemsets. KDD'03

Ref: Vertical Format and Row Enumeration Methods

• M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. Parallel algorithms for discovery of association rules. DAMI'97.
• M. J. Zaki and C. J. Hsiao. CHARM: An efficient algorithm for closed itemset mining. SDM'02.
• C. Bucila, J. Gehrke, D. Kifer, and W. White. DualMiner: A dual-pruning algorithm for itemsets with constraints. KDD'02.
• F. Pan, G. Cong, A. K. H. Tung, J. Yang, and M. Zaki. CARPENTER: Finding closed patterns in long biological datasets. KDD'03.
• H. Liu, J. Han, D. Xin, and Z. Shao. Mining interesting patterns from very high dimensional data: A top-down row enumeration approach. SDM'06.

Ref: Mining Correlations and Interesting Rules


• S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing
association rules to correlations. SIGMOD'97.
• M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. Finding
interesting rules from large sets of discovered association rules. CIKM'94.
• R. J. Hilderman and H. J. Hamilton. Knowledge Discovery and Measures of Interest.
Kluwer Academic, 2001.
• C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining
causal structures. VLDB'98.
• P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the Right Interestingness Measure
for Association Patterns. KDD'02.
• E. Omiecinski. Alternative Interest Measures for Mining Associations. TKDE’03.
• T. Wu, Y. Chen, and J. Han. "Re-examination of interestingness measures in pattern mining: A unified framework". Data Mining and Knowledge Discovery, 21(3):371-397, 2010.
