CH 4

Uploaded by

Revathi

1

CS 43105 Data Mining Techniques

Chapter 4 Frequent Itemset Mining
Xiang Lian
Department of Computer Science
Kent State University
Email: [email protected]
Homepage: http://www.cs.kent.edu/~xlian/
2

Outline
• Basic Concepts
• Frequent Itemset Mining
3

What is Data Mining?

• =Pattern Mining?
• What patterns?
• Why are they useful?
4

What Is Frequent Pattern Analysis?

• Frequent pattern: a pattern (a set of items, subsequences,
substructures, etc.) that occurs frequently in a data set
• First proposed by Agrawal, Imielinski, and Swami [AIS93] in
the context of frequent itemsets and association rule mining
• Motivation: Finding inherent regularities in data
• What products were often purchased together?— Beer and diapers?!

• What are the subsequent purchases after buying a PC?

• What kinds of DNA are sensitive to this new drug?

• Can we automatically classify Web documents?

• Applications
• Basket data analysis, cross-marketing, catalog design, sale campaign
analysis, Web log (click stream) analysis, and DNA sequence analysis.
5

Why Is Frequent Pattern Mining Important?
• Frequent pattern: An intrinsic and important
property of datasets
• Foundation for many essential data mining tasks
• Association, correlation, and causality analysis
• Sequential, structural (e.g., sub-graph) patterns
• Pattern analysis in spatiotemporal, multimedia, time-
series, and stream data
• Classification: discriminative, frequent pattern analysis
• Cluster analysis: frequent pattern-based clustering
• Data warehousing: iceberg cube and cube-gradient
• Semantic data compression: fascicles
• Broad applications
6

Frequent Patterns/Itemsets

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

• Itemset: a set of one or more items
• Example: {Milk, Bread, Diaper}
• k-itemset X = {x1, …, xk}
• An itemset that contains k items

(Figure: Venn diagram of customers who buy beer, customers who buy diapers, and customers who buy both.)
7

Definition: Frequent Itemset

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

• Absolute support or support count (σ)
• Frequency of occurrence of an itemset
• E.g., σ({Milk, Bread, Diaper}) = 2
• Relative support or support
• Fraction of transactions that contain an itemset
• E.g., s({Milk, Bread, Diaper}) = 2/5
• Frequent itemset
• An itemset whose support is greater than or equal to a minsup threshold
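The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `support_count`, `support`, and `is_frequent` are hypothetical, and `minsup` is assumed to be a relative threshold between 0 and 1.

```python
# Transactions from the market-basket table on this slide (TID -> items).
transactions = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diaper", "Beer", "Eggs"},
    3: {"Milk", "Diaper", "Beer", "Coke"},
    4: {"Bread", "Milk", "Diaper", "Beer"},
    5: {"Bread", "Milk", "Diaper", "Coke"},
}

def support_count(itemset, transactions):
    """Absolute support: number of transactions that contain every item."""
    items = set(itemset)
    return sum(1 for t in transactions.values() if items <= t)

def support(itemset, transactions):
    """Relative support: fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)

def is_frequent(itemset, transactions, minsup):
    """Assumption of this sketch: minsup is a relative threshold in [0, 1]."""
    return support(itemset, transactions) >= minsup

x = {"Milk", "Bread", "Diaper"}
print(support_count(x, transactions))  # 2
print(support(x, transactions))        # 0.4
```

With `minsup = 0.4` the itemset {Milk, Bread, Diaper} is frequent; with `minsup = 0.5` it is not.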
8

Frequent Itemsets Mining

TID   Transactions
100   { A, B, E }
200   { B, D }
300   { A, B, E }
400   { A, C }
500   { B, C }
600   { A, C }
700   { A, B }
800   { A, B, C, E }
900   { A, B, C }
1000  { A, C, E }

• Minimum support level 50%
• Frequent itemsets: {A}, {B}, {C}, {A,B}, {A,C}
9

Three Different Views of FIM

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

• Transactional Database
• How do we store a transactional database?
• Horizontal, Vertical, Transaction-Item Pair
• Binary Matrix
• Bipartite Graph

(Figure: bipartite graph linking TIDs to items. This Photo by Unknown Author is licensed under CC BY-SA.)
10

Representation of Transactional Databases
• Horizontal vs. vertical data layout

Horizontal Data Layout
TID  Items
1    A,B,E
2    B,C,D
3    C,E
4    A,C,D
5    A,B,C,D
6    A,E
7    A,B
8    A,B,C
9    A,C,D
10   B

Vertical Data Layout
Item  TIDs
A     1, 4, 5, 6, 7, 8, 9
B     1, 2, 5, 7, 8, 10
C     2, 3, 4, 8, 9
D     2, 4, 5, 9
E     1, 3, 6
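As a rough sketch of how the layouts relate, the following Python converts a horizontal database into a vertical one (item to TID list) and into a binary matrix. The helper names `to_vertical` and `to_binary_matrix` are illustrative assumptions, not names used in the slides.

```python
from collections import defaultdict

# Horizontal layout: one row per transaction (TID -> items).
horizontal = {
    1: ["A", "B", "E"], 2: ["B", "C", "D"], 3: ["C", "E"],
    4: ["A", "C", "D"], 5: ["A", "B", "C", "D"], 6: ["A", "E"],
    7: ["A", "B"], 8: ["A", "B", "C"], 9: ["A", "C", "D"], 10: ["B"],
}

def to_vertical(horizontal):
    """Vertical layout: one sorted TID list per item."""
    vertical = defaultdict(list)
    for tid in sorted(horizontal):
        for item in horizontal[tid]:
            vertical[item].append(tid)
    return dict(vertical)

def to_binary_matrix(horizontal):
    """Binary matrix: row per TID, column per item, 1 if the item is present."""
    items = sorted({i for t in horizontal.values() for i in t})
    rows = {tid: [1 if i in t else 0 for i in items]
            for tid, t in horizontal.items()}
    return items, rows

vertical = to_vertical(horizontal)
items, rows = to_binary_matrix(horizontal)
print(vertical["A"])  # [1, 4, 5, 6, 7, 8, 9]
print(vertical["B"])  # [1, 2, 5, 7, 8, 10]
print(items, rows[1])  # ['A', 'B', 'C', 'D', 'E'] [1, 1, 0, 0, 1]
```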
11

Representation of Transactional
Databases (cont'd)
• Binary Matrix Representation
TID A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
12

Frequent Itemset Generation

null

A B C D E

AB AC AD AE BC BD BE CD CE DE

ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE

ABCD ABCE ABDE ACDE BCDE

ABCDE

Given d items, there are 2^d possible candidate itemsets
13

Frequent Itemset Generation

• Brute-force approach:
• Each itemset in the lattice is a candidate frequent itemset
• Count the support of each candidate by scanning the database

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

(Figure: N transactions, each of width up to w, are matched against a list of M candidates.)

• Match each transaction against every candidate
• Complexity ~ O(NMw) => expensive, since M = 2^d
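A minimal sketch of the brute-force approach, under the assumption that the threshold is given as an absolute count (`min_count`, a name introduced here for illustration). It enumerates every non-empty itemset in the lattice and scans the whole database for each one.

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def brute_force_frequent(transactions, min_count):
    """Test every non-empty itemset; return (frequent itemsets, #candidates)."""
    items = sorted(set().union(*transactions))
    frequent = {}
    n_candidates = 0
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            n_candidates += 1
            cand_set = set(cand)
            # One full database scan per candidate: this is the O(NMw) cost.
            count = sum(1 for t in transactions if cand_set <= t)
            if count >= min_count:
                frequent[frozenset(cand)] = count
    return frequent, n_candidates

frequent, m = brute_force_frequent(transactions, min_count=3)
print(m)              # 63 candidates = 2^6 - 1 (the empty set is skipped)
print(len(frequent))  # 8 frequent itemsets
```

Even with only 6 items, 63 candidates are tested to find 8 frequent itemsets; the candidate count doubles with every additional item.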
14

Scalable Frequent Itemset Mining Methods
• Scalable mining methods: Three major approaches
• Apriori (Agrawal & Srikant@VLDB’94)
• Vertical data format approach (Charm—Zaki & Hsiao
@SDM’02)
• Frequent pattern growth (FPgrowth—Han, Pei & Yin
@SIGMOD’00)
15

Reducing the Number of Candidates

• Apriori pruning principle:
• If an itemset is frequent, then all of its subsets must also be
frequent
• Reason: every transaction having {beer, diaper, nuts} also
contains the subset {beer, diaper}
• If there is any itemset which is infrequent, its superset
should not be generated/tested!
• Apriori pruning principle holds due to the following property of the support measure:
$\forall X, Y : (X \subseteq Y) \Rightarrow s(X) \geq s(Y)$
• Support of an itemset never exceeds the support of its
subsets
• This is known as the anti-monotone property of support
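The anti-monotone property can be checked empirically on a small database. The sketch below (function names are hypothetical) compares every itemset Y with each of its immediate subsets X; by transitivity this covers all subset pairs.

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def anti_monotone_holds(transactions):
    """True if the support of X is >= the support of Y whenever X is a subset of Y."""
    items = sorted(set().union(*transactions))
    for k in range(2, len(items) + 1):
        for y in combinations(items, k):
            count_y = support_count(y, transactions)
            # Checking the (k-1)-subsets suffices; smaller subsets follow by transitivity.
            for x in combinations(y, k - 1):
                if support_count(x, transactions) < count_y:
                    return False
    return True

print(anti_monotone_holds(transactions))  # True
```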
16

Illustrating Apriori Principle

null

A B C D E

AB AC AD AE BC BD BE CD CE DE

ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE

ABCD ABCE ABDE ACDE BCDE

ABCDE

(Figure: once an itemset is found to be infrequent, all of its supersets in the lattice are pruned.)
17

Illustrating Apriori Principle

Minimum Support = 3

Items (1-itemsets)
Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Pairs (2-itemsets) (No need to generate candidates involving Coke or Eggs)
Itemset         Count
{Bread,Milk}    3
{Bread,Beer}    2
{Bread,Diaper}  3
{Milk,Beer}     2
{Milk,Diaper}   3
{Beer,Diaper}   3

Triplets (3-itemsets)
Itemset              Count
{Bread,Milk,Diaper}  2

If every subset is considered, C(6,1) + C(6,2) + C(6,3) = 41
With support-based pruning, 6 + 6 + 1 = 13
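The two candidate counts can be reproduced with a short calculation. The variable names are illustrative; the only inputs are the 6 items and the 4 frequent items (Bread, Milk, Beer, Diaper) from the tables above.

```python
from math import comb

n_items = 6
# Without pruning: every itemset of size 1, 2, or 3 is a candidate.
without_pruning = sum(comb(n_items, k) for k in (1, 2, 3))

# With support-based pruning: 6 single items, pairs built only from the
# 4 frequent items, and the single triplet whose pairs are all frequent.
n_frequent_items = 4
with_pruning = n_items + comb(n_frequent_items, 2) + 1

print(without_pruning)  # 41
print(with_pruning)     # 13
```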
18

The Apriori Algorithm—Another Example

Supmin = 2

Database TDB
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

1st scan: C1
Itemset  sup
{A}      2
{B}      3
{C}      3
{D}      1
{E}      3

L1
Itemset  sup
{A}      2
{B}      3
{C}      3
{E}      3

C2 (generated from L1)
Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}

2nd scan: C2 with counts
Itemset  sup
{A, B}   1
{A, C}   2
{A, E}   1
{B, C}   2
{B, E}   3
{C, E}   2

L2
Itemset  sup
{A, C}   2
{B, C}   2
{B, E}   3
{C, E}   2

C3 (generated from L2)
Itemset
{B, C, E}

3rd scan: L3
Itemset    sup
{B, C, E}  2
19

Apriori: A Candidate Generation & Test Approach
• Apriori Method
• Initially, scan DB once to get frequent 1-itemset
• Generate length (k+1) candidate itemsets from length k
frequent itemsets
• Test the candidates against DB
• Terminate when no frequent or candidate set can be
generated
20

The Apriori Algorithm (Pseudo-Code)

Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
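The pseudo-code can be turned into a short runnable sketch. This is one possible rendering, not the authors' implementation: it assumes `min_count` is an absolute support threshold, and it generates candidates by taking unions of frequent k-itemsets instead of the prefix-based self-join, which yields the same candidate set after pruning.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # L1: frequent 1-itemsets from a single database scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)

    k = 1
    while level:
        prev = set(level)
        # C(k+1): unions of two frequent k-itemsets that have size k+1 and
        # whose k-subsets are all frequent (Apriori pruning).
        candidates = set()
        for a in prev:
            for b in prev:
                u = a | b
                if len(u) == k + 1 and all(
                    frozenset(s) in prev for s in combinations(u, k)
                ):
                    candidates.add(u)
        # Scan the database once to count every candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent

# Database TDB from the worked example, with Supmin = 2.
tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(tdb, 2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On TDB this reports nine frequent itemsets, ending with {B, C, E} at support 2.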
21

Apriori

R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB, 487-499, 1994
22
23

Implementation of Apriori
• How to generate candidates?
• Step 1: self-joining Lk
• Step 2: pruning
• Example of Candidate-generation
• L3={abc, abd, acd, ace, bcd}

• Self-joining: L3*L3
• abcd from abc and abd
• acde from acd and ace
• Pruning:
• acde is removed because ade is not in L3

• C4 = {abcd}
24

How to Generate Candidates?

• Suppose the items in Lk-1 are listed in an order

• Step 1: self-joining Lk-1 (SQL Implementation)

insert into Ck
select p.item1, p.item2, …, p.itemk-1, q.itemk-1
from Lk-1 p, Lk-1 q
where p.item1 = q.item1 and … and p.itemk-2 = q.itemk-2 and p.itemk-1 < q.itemk-1
• Step 2: pruning
forall itemsets c in Ck do
    forall (k-1)-subsets s of c do
        if (s is not in Lk-1) then delete c from Ck
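The two steps can be sketched in Python as follows. The function name `apriori_gen` and the representation of itemsets as sorted tuples are assumptions made for this illustration.

```python
from itertools import combinations

def apriori_gen(prev_level):
    """Generate k-candidates from frequent (k-1)-itemsets given as sorted tuples.

    Returns (joined, pruned): the candidates after the self-join, and the
    candidates that survive the pruning step.
    """
    prev = sorted(set(prev_level))
    prev_set = set(prev)
    size = len(prev[0]) if prev else 0

    # Step 1: self-join. Join p and q when their first k-2 items agree and
    # p's last item is smaller than q's last item.
    joined = []
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                joined.append(p + (q[-1],))

    # Step 2: prune. Drop a candidate if any (k-1)-subset is not frequent.
    pruned = [c for c in joined
              if all(s in prev_set for s in combinations(c, size))]
    return joined, pruned

L3 = [tuple("abc"), tuple("abd"), tuple("acd"), tuple("ace"), tuple("bcd")]
joined, C4 = apriori_gen(L3)
print(joined)  # [('a', 'b', 'c', 'd'), ('a', 'c', 'd', 'e')]
print(C4)      # [('a', 'b', 'c', 'd')]
```

Here acde is produced by the join but removed by pruning, because its subset ade is not in L3.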
25

Challenges of Frequent Itemset Mining

• Challenges
• Multiple scans of transaction database

• Huge number of candidates

• Tedious workload of support counting for candidates

• Improving Apriori: general ideas

• Reduce passes of transaction database scans

• Shrink number of candidates

• Facilitate support counting of candidates

Alternative Methods for Frequent Itemset Generation
• Representation of Transactional Database
• horizontal vs. vertical data layout

Horizontal Data Layout
TID  Items
1    A,B,E
2    B,C,D
3    C,E
4    A,C,D
5    A,B,C,D
6    A,E
7    A,B
8    A,B,C
9    A,C,D
10   B

Vertical Data Layout
Item  TIDs
A     1, 4, 5, 6, 7, 8, 9
B     1, 2, 5, 7, 8, 10
C     2, 3, 4, 8, 9
D     2, 4, 5, 9
E     1, 3, 6
27

ECLAT: Mining by Exploring Vertical Data Format
• For each item, store a list of transaction ids (tids)

Horizontal Data Layout
TID  Items
1    A,B,E
2    B,C,D
3    C,E
4    A,C,D
5    A,B,C,D
6    A,E
7    A,B
8    A,B,C
9    A,C,D
10   B

Vertical Data Layout
Item  TID-list
A     1, 4, 5, 6, 7, 8, 9
B     1, 2, 5, 7, 8, 10
C     2, 3, 4, 8, 9
D     2, 4, 5, 9
E     1, 3, 6
28

ECLAT
• Determine support of any k-itemset by intersecting tid-lists of two of its (k-1) subsets.

tid-list(A)  = 1, 4, 5, 6, 7, 8, 9
tid-list(B)  = 1, 2, 5, 7, 8, 10
tid-list(AB) = tid-list(A) ∩ tid-list(B) = 1, 5, 7, 8

• 3 traversal approaches:
• top-down, bottom-up and hybrid
• Advantage: very fast support counting
• Disadvantage: intermediate tid-lists may become too large for memory
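The tid-list intersection can be sketched with Python sets. The depth-first `eclat` function below is an illustrative simplification (the name, the tuple-based result keys, and the absolute `min_count` threshold are assumptions), not the published algorithm in full.

```python
def eclat(vertical, min_count):
    """Depth-first search over tid-set intersections.

    vertical: {item: set of tids}. Returns {tuple(itemset): support count}.
    """
    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that may extend the current prefix.
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_count:
                continue
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)  # support = size of the tid-set
            # Intersect with the tid-sets of later items to grow the itemset.
            suffix = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]]
            extend(itemset, suffix)

    extend((), sorted(vertical.items(), key=lambda kv: kv[0]))
    return frequent

# Tid-lists of A and B from the slide.
vertical = {"A": {1, 4, 5, 6, 7, 8, 9}, "B": {1, 2, 5, 7, 8, 10}}
ab = vertical["A"] & vertical["B"]
print(sorted(ab))          # [1, 5, 7, 8]
print(eclat(vertical, 3))  # {('A',): 7, ('A', 'B'): 4, ('B',): 6}
```

No database scan is needed after the vertical layout is built; each support count is simply the length of an intersected tid-set.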
29
30
31

FP-Growth Algorithm
• Use a compressed representation of the database using an FP-tree

• Once an FP-tree has been constructed, it uses a recursive divide-and-conquer approach to mine the frequent itemsets
32

Construct FP-tree from a Transaction Database

min_support = 3

TID  Items bought              (Ordered) frequent items
100  {f, a, c, d, g, i, m, p}  {f, c, a, m, p}
200  {a, b, c, f, l, m, o}     {f, c, a, b, m}
300  {b, f, h, j, o, w}        {f, b}
400  {b, c, k, s, p}           {c, b, p}
500  {a, f, c, e, l, p, m, n}  {f, c, a, m, p}

1. Scan DB once, find frequent 1-itemset (single item pattern)
2. Sort frequent items in frequency descending order, f-list
3. Scan DB again, construct FP-tree

F-list = f-c-a-b-m-p

Header Table (each entry's head pointer links to that item's nodes in the tree)
Item  frequency
f     4
c     4
a     3
b     3
m     3
p     3

FP-tree (children are indented under their parent)
{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
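A minimal sketch of the second database scan, which inserts each ordered transaction into a prefix tree. The class `FPNode` and the helpers `build_fp_tree` and `dump` are hypothetical names; the f-list is passed in explicitly (taken from the slide) because several items tie on frequency and the tie-breaking rule is not stated.

```python
class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode

def build_fp_tree(transactions, f_list):
    """Insert each transaction, filtered and ordered by f_list, into the tree."""
    rank = {item: i for i, item in enumerate(f_list)}
    root = FPNode(None)
    header = {item: [] for item in f_list}  # item -> nodes (the node-links)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1  # shared prefixes only increment counts
            node = child
    return root, header

def dump(node, depth=0):
    """Return the tree as indented 'item:count' lines."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(dump(child, depth + 1))
    return lines

db = [list("facdgimp"), list("abcflmo"), list("bfhjow"),
      list("bcksp"), list("afcelpmn")]
root, header = build_fp_tree(db, f_list=list("fcabmp"))
print("\n".join(dump(root)))
```

Summing the counts of the nodes linked from a header entry gives that item's total frequency, for example 3 for p.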
33

Partition Patterns and Databases

• Frequent patterns can be partitioned into subsets
according to f-list
• F-list = f-c-a-b-m-p
• Patterns containing p
• Patterns having m but no p
•…
• Patterns having c but no a nor b, m, p
• Pattern f
• Completeness and non-redundancy
34

Find Patterns Having P From P-conditional Database
• Starting at the frequent item header table in the FP-tree
• Traverse the FP-tree by following the link of each frequent item p
• Accumulate all of transformed prefix paths of item p to form p’s conditional pattern base

(Figure: the header table and FP-tree built from the transaction database, with node-links followed for each item.)

Conditional pattern bases
item  cond. pattern base
c     f:3
a     fc:3
b     fca:1, f:1, c:1
m     fca:2, fcab:1
p     fcam:2, cb:1
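The conditional pattern bases can be reproduced without walking the tree, because every tree path is a shared prefix of the ordered frequent transactions. The sketch below takes that shortcut; it is an equivalent computation for this example, not the node-link traversal the slide describes, and the function name is hypothetical.

```python
from collections import Counter

# Ordered frequent items of the five transactions (f-list order f-c-a-b-m-p).
ordered = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]

def conditional_pattern_base(item, ordered_transactions):
    """Count the prefix that precedes `item` in each ordered transaction."""
    base = Counter()
    for t in ordered_transactions:
        if item in t:
            prefix = tuple(t[:t.index(item)])
            if prefix:  # an empty prefix contributes nothing
                base[prefix] += 1
    return dict(base)

for item in "cabmp":
    base = conditional_pattern_base(item, ordered)
    print(item, {"".join(p): c for p, c in base.items()})
```

For p this yields fcam:2 and cb:1, and for m it yields fca:2 and fcab:1, matching the table.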
35

From Conditional Pattern-bases to Conditional FP-trees
• For each pattern-base
• Accumulate the count for each item in the base
• Construct the FP-tree for the frequent items of the pattern base

m-conditional pattern base: fca:2, fcab:1

m-conditional FP-tree
{}
  f:3
    c:3
      a:3

All frequent patterns related to m:
m, fm, cm, am, fcm, fam, cam, fcam
36

Recursion: Mining Each Conditional FP-tree

m-conditional FP-tree
{}
  f:3
    c:3
      a:3

Cond. pattern base of “am”: (fc:3)
am-conditional FP-tree
{}
  f:3
    c:3

Cond. pattern base of “cm”: (f:3)
cm-conditional FP-tree
{}
  f:3

Cond. pattern base of “cam”: (f:3)
cam-conditional FP-tree
{}
  f:3
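The recursion can be illustrated on pattern bases directly. The sketch below is a simplified pattern-growth routine that works on lists of (prefix, count) pairs instead of explicit conditional FP-trees; the function name, the data layout, and the absolute `min_count` threshold are assumptions made for this example.

```python
from collections import Counter

def pattern_growth(pattern_base, suffix, min_count, out):
    """Recursively grow frequent patterns that end with `suffix`.

    pattern_base: list of (prefix_tuple, count), prefixes in f-list order.
    out: dict collecting {pattern_tuple: support count}.
    """
    # Accumulate the count of each item in the base.
    counts = Counter()
    for items, c in pattern_base:
        for i in items:
            counts[i] += c
    for item, c in counts.items():
        if c < min_count:
            continue
        new_suffix = (item,) + suffix
        out[new_suffix] = c
        # Conditional pattern base of new_suffix: what precedes `item`.
        cond = [(items[:items.index(item)], cnt)
                for items, cnt in pattern_base if item in items]
        cond = [(p, cnt) for p, cnt in cond if p]
        pattern_growth(cond, new_suffix, min_count, out)

# m-conditional pattern base from the slides: fca:2, fcab:1 (min_support = 3).
m_base = [(("f", "c", "a"), 2), (("f", "c", "a", "b"), 1)]
patterns = {("m",): 3}
pattern_growth(m_base, ("m",), 3, patterns)
print(sorted("".join(p) for p in patterns))
```

The result contains the eight patterns m, fm, cm, am, fcm, fam, cam, and fcam, each with support 3.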
37

Another Example -- FP-Tree Construction

TID  Items
1    {A,B}
2    {B,C,D}
3    {A,C,D,E}
4    {A,D,E}
5    {A,B,C}
6    {A,B,C,D}
7    {B,C}
8    {A,B,C}
9    {A,B,D}
10   {B,C,E}

After reading TID=1:
null
  A:1
    B:1

After reading TID=2:
null
  A:1
    B:1
  B:1
    C:1
      D:1
38

Another Example -- FP-Tree Construction

Transaction Database
TID  Items
1    {A,B}
2    {B,C,D}
3    {A,C,D,E}
4    {A,D,E}
5    {A,B,C}
6    {A,B,C,D}
7    {B,C}
8    {A,B,C}
9    {A,B,D}
10   {B,C,E}

FP-tree after reading all transactions (children are indented under their parent)
null
  A:7
    B:5
      C:3
        D:1
      D:1
    C:1
      D:1
        E:1
    D:1
      E:1
  B:3
    C:3
      D:1
      E:1

Header table
Item  Pointer
A
B
C
D
E

Pointers are used to assist frequent itemset generation
39

FP-Growth

Conditional pattern base for D:
P = {(A:1,B:1,C:1),
     (A:1,B:1),
     (A:1,C:1),
     (A:1),
     (B:1,C:1)}

(Figure: the FP-tree restricted to the prefix paths that end in D.)

Recursively apply FP-growth on P

Frequent itemsets found (with sup > 1):
AD, BD, CD, ABD, ACD, BCD
41

Compact Representation of Frequent Itemsets
• Some itemsets are redundant because they have identical
support as their supersets
TID A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1

• Number of frequent itemsets $= 3 \times \sum_{k=1}^{10} \binom{10}{k}$

• Need a compact representation
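The size of this count can be checked with a short calculation. It assumes the minimum support is low enough (at most 5 transactions) that every non-empty subset of each 10-item block is frequent; itemsets that mix blocks never occur together.

```python
from math import comb

# Three disjoint blocks (A1..A10, B1..B10, C1..C10), each contained in 5 transactions.
per_block = sum(comb(10, k) for k in range(1, 11))  # 2**10 - 1 non-empty subsets
total = 3 * per_block

print(per_block)  # 1023
print(total)      # 3069
```

Only three itemsets, the full blocks themselves, are needed to summarize all 3069 of them, which is the motivation for a compact representation.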

Maximal Frequent Itemset

An itemset is maximal frequent if none of its
immediate supersets is frequent
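null

A B C D E

AB AC AD AE BC BD BE CD CE DE

ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE

ABCD ABCE ABDE ACDE BCDE

ABCDE

(Figure: a border in the lattice separates the frequent itemsets from the infrequent itemsets; the maximal frequent itemsets are the frequent itemsets adjacent to the border.)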
43

Closed Itemset
• An itemset is closed if none of its immediate supersets has the same support as the itemset

TID  Items
1    {A,B}
2    {B,C,D}
3    {A,B,C,D}
4    {A,B,D}
5    {A,B,C,D}

Itemset    Support
{A}        4
{B}        5
{C}        3
{D}        4
{A,B}      4
{A,C}      2
{A,D}      3
{B,C}      3
{B,D}      4
{C,D}      3
{A,B,C}    2
{A,B,D}    3
{A,C,D}    2
{B,C,D}    3
{A,B,C,D}  2
44

Maximal vs. Closed Itemsets

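TID  Items
1    ABC
2    ABCD
3    BCE
4    ACDE
5    DE

Lattice, with each itemset followed by the IDs of the transactions that contain it:
null
1-itemsets: A {1,2,4}, B {1,2,3}, C {1,2,3,4}, D {2,4,5}, E {3,4,5}
2-itemsets: AB {1,2}, AC {1,2,4}, AD {2,4}, AE {4}, BC {1,2,3}, BD {2}, BE {3}, CD {2,4}, CE {3,4}, DE {4,5}
3-itemsets: ABC {1,2}, ABD {2}, ABE {}, ACD {2,4}, ACE {4}, ADE {4}, BCD {2}, BCE {3}, BDE {}, CDE {4}
4-itemsets: ABCD {2}, ABCE {}, ABDE {}, ACDE {4}, BCDE {}
5-itemsets: ABCDE {}

Itemsets shown with {} are not supported by any transactions.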
45

Maximal vs. Closed Frequent Itemsets

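Minimum support = 2

Lattice, with each itemset followed by the IDs of the transactions that contain it:
null
1-itemsets: A {1,2,4}, B {1,2,3}, C {1,2,3,4}, D {2,4,5}, E {3,4,5}
2-itemsets: AB {1,2}, AC {1,2,4}, AD {2,4}, AE {4}, BC {1,2,3}, BD {2}, BE {3}, CD {2,4}, CE {3,4}, DE {4,5}
3-itemsets: ABC {1,2}, ABD {2}, ABE {}, ACD {2,4}, ACE {4}, ADE {4}, BCD {2}, BCE {3}, BDE {}, CDE {4}
4-itemsets: ABCD {2}, ABCE {}, ABDE {}, ACDE {4}, BCDE {}
5-itemsets: ABCDE {}

Closed but not maximal: C, D, E, AC, BC
Closed and maximal: CE, DE, ABC, ACD

# Closed = 9
# Maximal = 4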
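The counts of closed and maximal frequent itemsets can be verified by brute force on this five-transaction database. The sketch below is illustrative only (function names are hypothetical, and itemsets are encoded as sorted strings such as "ACD"); it enumerates all 31 itemsets, which is feasible only for tiny examples.

```python
from itertools import combinations

transactions = {1: "ABC", 2: "ABCD", 3: "BCE", 4: "ACDE", 5: "DE"}

def all_supports(transactions):
    """Support count of every non-empty itemset, keyed by a sorted string."""
    items = sorted(set("".join(transactions.values())))
    sup = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = set(cand)
            sup["".join(cand)] = sum(
                1 for t in transactions.values() if s <= set(t)
            )
    return sup, items

def closed_and_maximal(transactions, min_count):
    sup, items = all_supports(transactions)
    frequent = {x: c for x, c in sup.items() if c >= min_count}
    closed, maximal = [], []
    for x, c in frequent.items():
        # Immediate supersets: add one item that is not already in x.
        supersets = ["".join(sorted(x + i)) for i in items if i not in x]
        if all(sup[y] != c for y in supersets):
            closed.append(x)   # no immediate superset has the same support
        if all(sup[y] < min_count for y in supersets):
            maximal.append(x)  # no immediate superset is frequent
    return frequent, closed, maximal

frequent, closed, maximal = closed_and_maximal(transactions, min_count=2)
print(len(frequent), len(closed), len(maximal))  # 14 9 4
print(closed)
print(maximal)
```

Every maximal frequent itemset also appears in the closed list, consistent with maximal itemsets being a subset of closed itemsets.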
46

Maximal vs. Closed Itemsets

(Figure: nested sets. Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets.)
47

Ref: Basic Concepts of Frequent Pattern Mining

• (Association Rules) R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD'93
• (Max-pattern) R. J. Bayardo. Efficiently mining long patterns from databases.
SIGMOD'98
• (Closed-pattern) N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering
frequent closed itemsets for association rules. ICDT'99
• (Sequential pattern) R. Agrawal and R. Srikant. Mining sequential patterns.
ICDE'95
48

Ref: Apriori and Its Improvements

• R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB'94
• H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient algorithms for discovering
association rules. KDD'94
• A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining
association rules in large databases. VLDB'95
• J. S. Park, M. S. Chen, and P. S. Yu. An effective hash-based algorithm for mining
association rules. SIGMOD'95
• H. Toivonen. Sampling large databases for association rules. VLDB'96
• S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and
implication rules for market basket analysis. SIGMOD'97
• S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with
relational database systems: Alternatives and implications. SIGMOD'98
49

Ref: Depth-First, Projection-Based FP Mining

• R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of
frequent itemsets. J. Parallel and Distributed Computing, 2002.
• G. Grahne and J. Zhu, Efficiently Using Prefix-Trees in Mining Frequent Itemsets, Proc.
FIMI'03
• B. Goethals and M. Zaki. An introduction to workshop on frequent itemset mining
implementations. Proc. ICDM’03 Int. Workshop on Frequent Itemset Mining
Implementations (FIMI’03), Melbourne, FL, Nov. 2003
• J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation.
SIGMOD’ 00
• J. Liu, Y. Pan, K. Wang, and J. Han. Mining Frequent Item Sets by Opportunistic
Projection. KDD'02
• J. Han, J. Wang, Y. Lu, and P. Tzvetkov. Mining Top-K Frequent Closed Patterns without
Minimum Support. ICDM'02
• J. Wang, J. Han, and J. Pei. CLOSET+: Searching for the Best Strategies for Mining
Frequent Closed Itemsets. KDD'03
50

Ref: Vertical Format and Row Enumeration Methods

• M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. Parallel algorithm for discovery of association rules. DAMI'97.
• M. J. Zaki and C. J. Hsiao. CHARM: An Efficient Algorithm for Closed Itemset Mining. SDM'02.
• C. Bucila, J. Gehrke, D. Kifer, and W. White. DualMiner: A Dual-Pruning Algorithm for Itemsets with Constraints. KDD'02.
• F. Pan, G. Cong, A. K. H. Tung, J. Yang, and M. Zaki. CARPENTER: Finding Closed Patterns in Long Biological Datasets. KDD'03.
• H. Liu, J. Han, D. Xin, and Z. Shao. Mining Interesting Patterns from Very High Dimensional Data: A Top-Down Row Enumeration Approach. SDM'06.

Ref: Mining Correlations and Interesting Rules

• S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing
association rules to correlations. SIGMOD'97.
• M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. Finding
interesting rules from large sets of discovered association rules. CIKM'94.
• R. J. Hilderman and H. J. Hamilton. Knowledge Discovery and Measures of Interest.
Kluwer Academic, 2001.
• C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining
causal structures. VLDB'98.
• P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the Right Interestingness Measure
for Association Patterns. KDD'02.
• E. Omiecinski. Alternative Interest Measures for Mining Associations. TKDE’03.
• T. Wu, Y. Chen, and J. Han. Re-Examination of Interestingness Measures in Pattern Mining: A Unified Framework. Data Mining and Knowledge Discovery, 21(3):371-397, 2010.
